Computerworld's Sharon Machlis today published a very useful list of R packages that every R user should know. The list covers packages for data import, data wrangling, data visualization and package development, but for beginning R users the biggest challenge is usually just dealing with data. To that end, I thought it was worth listing the packages for data access and manipulation, which I thoroughly endorse:

by Gregory Vandenbrouck
Software Engineer at Microsoft

This post is the first in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL, or Microsoft SQL Server) to an R client on Windows or Linux.

We’ll start with a relatively simple case of pulling data from SQL Azure to an R client on Windows.

Creating the database

The Azure Management site changes quite often, so these instructions are valid only “at the time of this writing” :o)

KDnuggets is once again running its annual poll of data science software tools, now in its 16th year. If you'd like to participate, visit the KDnuggets poll page and answer the question, "What Predictive Analytics, Data Mining, Data Science software/tools you used in the past 12 months?". The poll allows you to select up to 20 tools from the following categories:

When I was 13 or 14 my school sent us on an astronomy camp near Port Augusta, South Australia. Seeing the rings of Saturn through a telescope for the first time was a huge thrill, but what I remember most is simply the skies at night. The camp was timed near the new moon, so the moon set early and after that: darkness. But looking up, so many stars! If you've lived in the city all your life you won't believe how many you can see just with your naked eyes. 

In case you missed them, here are some articles from April of particular interest to R users.

Joseph Rickert reviews the inaugural New York City R User Conference, featuring Andrew Gelman.

by Joseph Rickert

The following multi-panel graph, which graces the cover of the most recent issue of the Journal of Computational and Graphical Statistics (JCGS, Vol 24, Num 1, March 2015), is from the paper by Grolemund and Wickham entitled Visualizing Complex Data With Embedded Plots. The four plots are noteworthy for a couple of reasons:

Arthur Charpentier was trying to solve an interesting problem with R: given this data set of random walks in the 2-D plane, what is the likely origin of a pathway that ends in the black circle below?

It's pretty easy to generate random data like this with a few lines of code in R. And with 2 million trajectories of 80 points each, you have some moderately-sized data to analyze: about 4 GB.
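
As a rough illustration of what "a few lines of code" might look like, here is a minimal sketch that simulates a small batch of 2-D random walks. It is not the code from the original analysis: the Gaussian unit steps, the uniform starting square, the helper name `simulate_walk` and the reduced count of 1,000 walks are all assumptions chosen to keep the example quick to run.

```r
# Sketch only: simulate random walks in the 2-D plane.
# Assumptions (not from the original post): Gaussian steps with unit
# variance, starting points drawn uniformly from a 10 x 10 square,
# and 1,000 walks rather than 2 million.
set.seed(1)
n_walks <- 1000
n_steps <- 80

# Hypothetical helper: returns an n_steps x 2 matrix of (x, y) positions.
simulate_walk <- function(n_steps) {
  start <- runif(2, min = 0, max = 10)                 # random origin
  steps <- matrix(rnorm(2 * (n_steps - 1)), ncol = 2)  # step increments
  path  <- rbind(start, steps)                         # origin, then steps
  cbind(x = cumsum(path[, 1]), y = cumsum(path[, 2]))  # running positions
}

walks <- lapply(seq_len(n_walks), function(i) simulate_walk(n_steps))
```

Each element of `walks` is one trajectory, so `walks[[1]][1, ]` gives the origin of the first walk and `walks[[1]][n_steps, ]` gives its end point. Raising `n_walks` to 2 million would reproduce the scale described above, at a correspondingly higher memory cost.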

Since 2009, it has been possible to call R from SAS programs. However, this integration requires IML, an add-on matrix-object language for SAS which isn't available with all SAS installations and is separate from the standard SAS PROC execution model.

If you visit the site and upload a photo of yourself, a machine learning algorithm (the 'How Old Robot') will identify your gender and tell you how old you look. Here's how it did on a photo of me: