by Andrie de Vries

We have written on several occasions about AzureML, the Microsoft machine learning studio that is part of the Cortana Analytics suite:

Bob Horton
Sr Data Scientist, Microsoft

Wikipedia describes Simpson’s paradox as “a trend that appears in different groups of data but disappears or reverses when these groups are combined.” Here is the figure from the top of that article (you can click on the image in Wikipedia, then follow the “more details” link to find the R code used to generate it; there is a lot of R in Wikipedia).
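
To make the definition concrete, here is a minimal sketch in plain Python (not the R code behind the Wikipedia figure). The two groups and their values are made up for illustration, and `slope` is a hypothetical helper that computes an ordinary least-squares slope. Within each group y falls as x rises, yet the slope fitted to the pooled data is positive.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up data: inside each group, y decreases by 1 for every unit of x,
# but group B sits at both higher x and higher y than group A.
group_a = [(x, 10 - x) for x in range(1, 6)]
group_b = [(x, 20 - x) for x in range(6, 11)]

slope_a = slope(group_a)              # trend within group A
slope_b = slope(group_b)              # trend within group B
slope_all = slope(group_a + group_b)  # trend after pooling both groups

print("group A slope:", slope_a)
print("group B slope:", slope_b)
print("pooled slope: ", slope_all)
```

The sign flip comes entirely from the offset between the groups: pooling lets the between-group difference dominate the within-group trend.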

Two recent surveys — one based on LinkedIn skills data, and another a direct survey of data miners — show that R remains the most popular software for statistical data analysis.

Despite some skepticism (audio NSFW), we have a pretty good grasp on how magnets work, and some people have used that knowledge to do some pretty cool things with them. For example, a Norwegian startup is trying to launch a magnet-based game that allows you to create some spectacular chain reactions:

In case you missed them, here are some articles from October of particular interest to R users. 

A video from the PASS 2015 conference in Seattle shows R running within SQL Server 2016. The preview for SQL Server 2016 includes Revolution R Enterprise (as SQL Server R Services). 

by Andrie de Vries

For much of my data science work, I want to have the very latest packages from CRAN or GitHub. However, once any work finds its way onto a production server (where it runs on a regular schedule), I want my environment to be stable. Most importantly, for these projects I want to ensure I have reproducible results. In these cases I want to isolate the packages I use, and ensure I don't "pollute" my library with the most recent package versions. In this post I give some tips for keeping my libraries clean.

Betterment, the online automated investing service, uses R for modeling, analysis and reporting. In a recent blog post on the company website, data scientist Sam Swift suggests using R or Python as open data analysis platforms and goes on to reveal: