Revolution Analytics is proud to once again be a gold sponsor and Wi-Fi sponsor of the JSM 2012 conference in San Diego, the largest gathering of statisticians, biostatisticians, analysts, data miners and data scientists in the world. The conference begins on Sunday, and you'll find the Revolution Analytics team in the exhibit hall.

R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 - 1 (just over 2.1 billion). Now, at 8 bytes per element that's 16GB of data, so that wasn't a limitation until machines with massive amounts of RAM came along. And even then, compound objects like data frames and lists can contain multiple vectors (and so exceed the 16GB limit), so not many people noticed the issue.
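To make the arithmetic concrete, here's a rough sketch in Python (used purely for illustration; this isn't R's internal code). It assumes the vector length is stored as a signed 32-bit integer, so the largest length is 2^31 - 1, and that each element is an 8-byte double. The names below are made up for the example.

```python
# Illustrative arithmetic only: assumes a signed 32-bit vector length
# and 8-byte (double precision) elements.
MAX_ELEMENTS = 2**31 - 1      # largest value of a signed 32-bit integer
BYTES_PER_DOUBLE = 8


def vector_bytes(n_elements, bytes_per_element=BYTES_PER_DOUBLE):
    """Return the memory needed for the raw data of one vector, in bytes."""
    return n_elements * bytes_per_element


max_bytes = vector_bytes(MAX_ELEMENTS)
max_gib = max_bytes / 2**30   # convert bytes to binary gigabytes (GiB)

print(f"max elements: {MAX_ELEMENTS:,}")
print(f"max size of one numeric vector: {max_gib:.2f} GiB")
```

A data frame with, say, three full-length numeric columns would need three times that, which is how a compound object can exceed the single-vector ceiling.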

The R language gets a brief mention in an article in yesterday's New York Times on automated bond trading:

The traders here are mostly educated in math or physics, often outside the United States, and their desks are piled high with textbooks like the “R Graphs Cookbook,” for working with obscure computer programming languages.

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full July edition (with highlights from this blog and community events) online.

There are only a few days left to enter the Civic Data Challenge: entries are due before midnight EST on July 29 to qualify for the $100,000 in prizes. The competition, open to US residents, challenges participants to create applications and visualizations from civic health data.

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R.

The video below isn't just a tribute to the beauty of San Francisco coupled with a pretty decent indie-rock song. It also has some incredible photographic work:

 

The June 2012 issue of the R Journal, the peer-reviewed, open-access journal about R packages and applications of R, is now available. This issue includes articles about:

Growing up in Australia, for me a carbonated drink like Pepsi or Fanta or lemonade was always just a "soft drink". (Also, 'lemonade' in Australia was something different to 'lemonade' in the US; it's something close to 7-Up.) So when I moved to Seattle, it was surprising to me that all such things were called "pop". And then I travelled across the US, and realised it was also "soda" (which, to an Australian, is exclusively club soda), and even sometimes "coke". Not capital-C Coke, but "coke", meaning any generic soft drink. It's all very confusing.

In most data science applications, preparing the data is at least half the job. Finding where the data lives, figuring out how to access it, finding the right records, filtering, cleaning and transforming the data ... all of this has to be done before the statistical analysis can even begin.
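As a rough illustration of those steps, here's a minimal Python sketch that cleans, filters and transforms a tiny in-memory CSV before any analysis happens. The data, the column names (`city`, `temp_f`) and the `prepare` function are all invented for the example; a real pipeline would read from wherever the data actually lives.

```python
import csv
import io

# Hypothetical raw data: stray whitespace, inconsistent case, a missing value.
RAW = """city,temp_f
Seattle, 61
San Diego,72
Sydney,
seattle,59
"""


def prepare(raw_text, min_temp_f=60.0):
    """Clean, filter and transform raw CSV text into analysis-ready records."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        temp = row["temp_f"].strip()
        if not temp:
            continue                      # cleaning: drop rows with no reading
        city = row["city"].strip().title()  # cleaning: normalise city names
        temp_f = float(temp)
        if temp_f < min_temp_f:
            continue                      # filtering: keep only warm readings
        # transforming: convert Fahrenheit to Celsius
        cleaned.append({"city": city, "temp_c": round((temp_f - 32) * 5 / 9, 1)})
    return cleaned


records = prepare(RAW)
for record in records:
    print(record)
```

Even in this toy case, most of the code is preparation; the "analysis" would only start once `records` exists.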