Statistics has many canonical data sets. For classification, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data.

This blog has its fair share of typos and homophones, I know. There's always room for more proofreading. (And don't get me started on the inconsistent use of "favorite" and "favourite" — my spelling locus is drifting somewhere in the mid-Pacific these days.) But I am a bit of a grammar nerd, so I appreciate Weird Al Yankovic's attempt to get the social media set to use words proper-like (and also get off my lawn!).


[Reposting to update with the new date for the webinar: Tuesday July 29.]

by Yaniv Mor, Co-founder & CEO of Xplenty

How do you get Big Data ready for R? Gigabytes or terabytes of raw data may need to be combined, cleaned, and aggregated before they can be analyzed. Processing such large amounts of data used to require installing Hadoop on a cluster of servers, not to mention coding MapReduce jobs in Pig or Java. Those days are over.

InsideBigData has published a new Guide to Machine Learning, in collaboration with Revolution Analytics.

The most entertaining book I've read in the past few months is The Martian, by Andy Weir. It tells the story of astronaut Mark Watney, a member of a six-person mission to Mars in the near future. After an accident early in the mission, Watney is stranded alone on the surface of Mars. The book is the story of his quest for survival.

IEEE, the world's largest professional association for the advancement of technology, recently published its ranking of the popularity of programming languages. The R language comes in at number 9 on the list.

As announced by Peter Dalgaard for the R Core Team today, R 3.1.1 has been released. Codenamed "Sock it to Me", this is a patch release for R 3.1 and consists mostly of minor bug fixes. It also brings some small improvements: easier access to package help files, improved accuracy when importing data with very large integers, and some clearer warnings and error messages.