

It's not an overstatement to say that, at least for me personally, Edward Tufte's book The Visual Display of Quantitative Information was transformative. Reading this book got me and, I feel confident saying, many, many other data scientists passionate about visualizing data.

by Andrie de Vries

A few weeks ago I wrote about the growth of CRAN packages, where I demonstrated how to scrape CRAN archives to get an estimate of the number of packages over time. In that post I briefly mentioned that the Ecdat package contains a dataset, CRANpackages, with snapshots recorded by John Fox and Spencer Graves.

Here is a plot of the data they collected. The dataset contains data through 2014, so I manually added the package count as of today (8,329).

by Lixun Zhang, Data Scientist at Microsoft

As a data scientist, I have experience with R. Naturally, when I was first exposed to Microsoft R Open (MRO, formerly Revolution R Open) and Microsoft R Server (MRS, formerly Revolution R Enterprise), I wanted to know the answers to three questions:

Naomi Robbins, author of Creating More Effective Graphs and Forbes contributor, has teamed up with her daughter, Dr Joyce Robbins, to present a new webinar this Thursday, April 28: Creating Effective Graphs with Microsoft R Open.

England and America have been called "two countries separated by a common language", but if the befuddled looks I sometimes get are anything to go by, the same can be said of Australia and America. I've dropped most of these abbreviations from my speech (because of the aforementioned befuddlement), but I was still surprised by just how many there are:


You might think literary criticism is no place for statistical analysis, but given digital versions of the text you can, for example, use sentiment analysis to infer the dramatic arc of an Oscar Wilde novel.

As I mentioned yesterday, Microsoft R Server is now available for HDInsight, which means that you can now run R code (including the big-data algorithms of Microsoft R Server) on a managed, cloud-based Hadoop instance.