The team at AMPLab has announced a developer preview of SparkR, an R package enabling R users to run jobs on an Apache Spark cluster. Spark is an open-source project that supports distributed in-memory computing for advanced analytics, such as fast queries, machine learning, streaming analytics and graph processing.

In case you missed them, here are some articles from December of particular interest to R users:

A ComputerWorld tutorial on basic data processing with R.

Prediction: R will replace legacy SAS solutions and go mainstream.

O'Reilly has just published the results of the Data Scientist Salary Survey, based on data collected from attendees of the O'Reilly Strata conferences in 2012 and 2013. There were some interesting results from the salary portion of the survey:

Our crack-shot R trainer Luba Gloukhov generated a spirited (pun intended!) discussion from her post K-means Clustering 86 Single Malt Scotch Whiskies, with mentions of her analysis at FlowingData and

We had a marvellous series of guest posts here on the blog over the past few weeks. I'd like to thank all of our guest bloggers for contributing, with special thanks to Joe Rickert for stepping in as our acting editor for the past three weeks. If you were celebrating or vacationing over the holidays, here's what you missed:

I tweeted out the image below earlier this month, and it quickly went viral:

We’re very happy to announce our recent publication with Steve Weston in the Journal of Statistical Software (JSS), “Scalable Strategies for Computing with Massive Data”, JSS Volume 55 Issue 14. In a nutshell:

Some Applications of the xts Time Series Package