

By Srini Kumar, Director of Data Science at Microsoft

Who does not hate being stopped and given a traffic ticket? Invariably, we feel it is unfair that we got one when everyone else did not. I am no different, and living in the SF Bay Area, I have often wondered whether I could get data about traffic tickets, particularly since there may be some unusual patterns in it.

Airbnb, the property-rental marketplace that helps you find a place to stay when you're travelling, uses R to scale data science. Airbnb is a famously data-driven company, and has recently gone through a period of rapid growth.

The field of neuroscience -- the study of brains and the nervous system -- has taken some major leaps in recent years. Scientists can now record real-time electrical activity from the brain during actions and thoughts, which is helping to pinpoint the exact location of brain lesions caused by strokes, and is leading to promising treatments for epilepsy and even profound paralysis. Joseph Sirosh describes these advances in a keynote presented at Strata Hadoop World last week: 


If you've got a few minutes to kill and just want to immerse yourself in the stream of consciousness of the Internet, try watching Petit Tube. It's an unending sequence of YouTube videos that have exactly zero views. Because your view increments that counter, each video will never again be seen on Petit Tube (or, most likely, by anyone). You can be the first viewer of a bizarre but strangely engrossing stream of poorly produced real estate ads, video game clips, tourist videos, foreign commercials and -- yes -- cat videos.

Data visualization with R doesn't always have to be serious. Here are a couple of fun charts created recently by R users.

First, here's a minimalist rendition of the characters in The Simpsons, by an anonymous blogger:

by Joseph Rickert

Packages continue to flood into CRAN at a rate that challenges the sanity of anyone trying to keep up with what's new. So far this month, more than 190 packages have been added. Here is my view of what's interesting in this March madness.

by Andrie de Vries

Earlier today Microsoft announced that Jupyter Notebooks are now available with the R Kernel as a service in Azure Machine Learning (ML) Studio.

I wrote about Jupyter Notebooks in September 2015 (Using R with Jupyter Notebooks), where I noted some of the great benefits of using notebooks:

by Bob Horton, Senior Data Scientist, Microsoft

This is a follow-up to my earlier post on learning curves. A learning curve is a plot of predictive error for training and validation sets over a range of training set sizes. Here we’re using simulated data to explore some fundamental relationships between training set size, model complexity, and prediction error.

Start by simulating a dataset:
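The post's own code is not reproduced in this excerpt. As a hypothetical illustration of the idea only, here is a minimal Python sketch. It assumes a made-up linear data-generating model (y = 3 + 2x + Gaussian noise with standard deviation 1) and fits an ordinary least-squares line on increasingly large prefixes of a training set. For each size, it reports training and validation RMSE. The function names `simulate`, `fit_line`, `rmse`, and `learning_curve` are illustrative, and the sketch varies only training set size, not model complexity.

```python
import math
import random


def simulate(n, noise_sd=1.0, rng=None):
    """Draw n points from a hypothetical linear model y = 3 + 2x + noise."""
    rng = rng or random.Random()
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 + 2.0 * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(model, xs, ys):
    """Root-mean-square prediction error of a fitted line on (xs, ys)."""
    a, b = model
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def learning_curve(sizes, n_val=1000, noise_sd=1.0, seed=42):
    """Return (size, train RMSE, validation RMSE) for each training set size."""
    rng = random.Random(seed)
    train_x, train_y = simulate(max(sizes), noise_sd, rng)
    val_x, val_y = simulate(n_val, noise_sd, rng)  # fixed held-out validation set
    rows = []
    for n in sizes:
        # Fit on the first n training points, then score on those points and on validation data
        model = fit_line(train_x[:n], train_y[:n])
        rows.append((n, rmse(model, train_x[:n], train_y[:n]), rmse(model, val_x, val_y)))
    return rows


if __name__ == "__main__":
    for n, tr, va in learning_curve([5, 10, 20, 50, 100, 200, 500]):
        print(f"n={n:4d}  train RMSE={tr:.3f}  validation RMSE={va:.3f}")
```

As the training set grows, both errors should settle near the noise level (about 1.0 here). That noise level is the irreducible error that no model can remove. Plotting the two columns against `n` gives the learning curve described above.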