I just got back from the Strata+Hadoop World conference in New York, and amongst the usual talks on the technology and applications of big data and data science ran a new thread: data ethics. DJ Patil, the US government's chief data scientist, made a call for comments on data ethics in his keynote and in a follow-up discussion session.

Hadley Wickham, RStudio's Chief Scientist and prolific author of R books and packages, conducted an AMA (Ask Me Anything) session on Reddit this past Monday. The session was tremendously popular: it generated more than 500 questions and comments and pushed the AMA onto Reddit's front page.

If you're not familiar with Hadley's work (which would be a surprise if you're an R user), his own introduction in the Reddit AMA post will fill you in:

by Andrie de Vries

A few weeks ago I wrote about the Jupyter notebooks project and the R kernel. In the comments, I was asked how to resize the plots in a Jupyter notebook.
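The excerpt stops before the answer, so here is a minimal sketch of one common approach, assuming the notebook runs the R kernel (IRkernel), which renders inline plots through the repr package. The option names `repr.plot.width` and `repr.plot.height` (in inches) come from repr; the 4 x 3 size and the `cars` example plot are arbitrary choices for illustration.

```r
# Set the default size (in inches) for inline plots rendered by IRkernel.
# The repr package reads these options when it displays a plot in the
# notebook; outside Jupyter they are simply stored as ordinary R options.
options(repr.plot.width = 4, repr.plot.height = 3)

# Plots created after this call are rendered at 4 x 3 inches in the notebook
plot(cars)
```

Because these are ordinary global options, you can call `options()` again in any later cell to change the size for subsequent plots.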

by Bob Horton
Microsoft Senior Data Scientist

Learning curves are an elaboration of the idea of validating a model on a test set, and have been widely popularized by Andrew Ng’s Machine Learning course on Coursera. Here I present a simple simulation that illustrates this idea.
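The excerpt does not include Bob's simulation code, so the following is only a rough sketch of the same idea, not his implementation. It assumes a simulated linear model `y = 1 + 2*x1 - 3*x2 + noise` with noise standard deviation 0.5; the helper names `simulate_data` and `rmse` and the list of training sizes are hypothetical choices. For each training size it fits `lm()` and records the error on the training data and on a large held-out test set.

```r
# Learning-curve sketch: training vs. test error as the training set grows.
# The data-generating model and sizes below are illustrative assumptions.
set.seed(42)

# Hypothetical helper: simulate n rows from a known linear model plus noise
simulate_data <- function(n) {
  x1 <- runif(n)
  x2 <- runif(n)
  y <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)
  data.frame(x1 = x1, x2 = x2, y = y)
}

# Hypothetical helper: root-mean-square error of a fitted model on a data set
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

test_set <- simulate_data(5000)    # large held-out test set
full_train <- simulate_data(2000)  # pool from which training sets are drawn
train_sizes <- c(10, 20, 50, 100, 200, 500, 1000, 2000)

# Fit on the first n training rows and score on both training and test data
learning_curve <- do.call(rbind, lapply(train_sizes, function(n) {
  train <- full_train[seq_len(n), ]
  fit <- lm(y ~ x1 + x2, data = train)
  data.frame(n = n,
             train_rmse = rmse(fit, train),
             test_rmse = rmse(fit, test_set))
}))

print(learning_curve)
```

With a correctly specified model, the test error should fall and the training error should rise as `n` grows, with both approaching the noise level of 0.5. That convergence is the pattern a learning curve is meant to reveal.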

If you've developed a useful function in R (say, a function to make a forecast or prediction from a statistical model), you may want to call that function from an application other than R. For example, you might want to display the forecast (calculated in R) as part of a desktop, web-based or mobile application. One solution is to install R alongside the application and call it directly, but that can be difficult — or impossible, in the case of mobile apps. (You also need to be careful to comply with R's open-source GPL2 license.)

Sure, the Solar System is big, but it's probably a lot bigger than you think, thanks to textbook representations that squeeze all the planets and their orbits into one page. Even at the speed of light, it takes more than 40 minutes to get from the Sun to Jupiter (a journey you can experience in real time here).
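The "more than 40 minutes" figure is easy to check with a back-of-the-envelope calculation. This sketch assumes a mean Sun-Jupiter distance of about 5.2 astronomical units, rounded for illustration; the actual distance varies along Jupiter's orbit. The astronomical unit and the speed of light are the standard defined values.

```r
# Light travel time from the Sun to Jupiter (rough estimate)
au_km <- 149597870.7   # 1 astronomical unit in km
c_km_s <- 299792.458   # speed of light in km/s
jupiter_au <- 5.2      # approximate mean Sun-Jupiter distance in AU (assumed)

seconds <- jupiter_au * au_km / c_km_s
minutes <- seconds / 60
print(round(minutes, 1))  # 43.2
```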

The RHadoop packages make it easy to connect R to Hadoop data (rhdfs), and write map-reduce operations in the R language (rmr2) to process that data using the power of the nodes in a Hadoop cluster. But getting the Hadoop cluster configured, with R and all the necessary packages installed on each node, hasn't always been so easy.

by Joseph Rickert

This week, the Infrastructure Steering Committee (ISC) of the R Consortium unanimously elected Hadley Wickham as its chair, thereby also giving Hadley a seat on the R Consortium board of directors. Congratulations, Hadley!

by Andrie de Vries

Every once in a while I try to remember how to do interpolation using R. This is not something I do frequently in my workflow, so I do the usual sequence of finding the appropriate help page:


Help pages:

          stats::approx Interpolation Functions
   stats::NLSstClosestX Inverse Interpolation
          stats::spline Interpolating Splines
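As a quick reminder of how the first and last of those help pages work in practice, here is a minimal sketch using a small made-up data set (the `x` and `y` values are illustrative assumptions). `approx()` performs linear or step-function interpolation, while `spline()` and `splinefun()` fit interpolating cubic splines that pass through every data point.

```r
# Illustrative data points (assumed for this sketch)
x <- 1:5
y <- c(2, 4, 3, 8, 10)

# Linear interpolation at chosen points with stats::approx
lin <- approx(x, y, xout = c(1.5, 2.5, 4.25))
print(lin$y)  # 3.0 3.5 8.5

# Step-function interpolation: with the default f = 0, the left value is carried forward
step <- approx(x, y, xout = c(1.5, 2.5), method = "constant")
print(step$y)  # 2 4

# Cubic spline interpolation with stats::spline, evaluated on a finer grid
spl <- spline(x, y, xout = seq(1, 5, by = 0.5))
print(spl$y)

# splinefun returns an interpolating function that can be evaluated later
f <- splinefun(x, y, method = "natural")
print(f(c(2.5, 3.5)))
```

`approx()` and `spline()` return a list with components `x` and `y`. `splinefun()` is convenient when you need to evaluate the same interpolant repeatedly.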