

R is a functional language, which means that your code often contains a lot of ( parentheses ). And complex code often means nesting those parentheses together, which makes code hard to read and understand. But there's a very handy R package, magrittr by Stefan Milton Bache, which lets you transform nested function calls into a simple pipeline of operations that's easier to write and understand.
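
As a rough illustration of the idea, here is a minimal sketch comparing a nested call with the equivalent magrittr pipeline. The vector `x` and the choice of `sqrt`, `mean`, and `round` are made up for this example and do not come from the original post; only the `%>%` operator is magrittr's own.

```r
library(magrittr)

# Illustrative input (an assumption for this sketch)
x <- c(1, 4, 9, 16)

# Nested form: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped form: reads left to right, one step at a time.
# x %>% f(y) is evaluated as f(x, y).
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same value
identical(nested, piped)
```

Both versions run the same three steps; the piped one simply lists them in the order they happen, which is where the readability gain comes from.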

Statistics has many canonical data sets. For classification statistics, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data.

This blog has its fair share of typos and homophones, I know. There's always room for more proofreading. (And don't get me started on the inconsistent use of "favorite" and "favourite" — my spelling locus is drifting somewhere in the mid-Pacific these days.) But I am a bit of a grammar nerd, so I appreciate Weird Al Yankovic's attempt to get the social media set to use words proper-like (and also get off my lawn!).


[Reposting to update with the new date for the webinar: Tuesday July 29.]

by Yaniv Mor, Co-founder & CEO of Xplenty

How do you get Big Data ready for R? Gigabytes or terabytes of raw data may need to be combined, cleaned, and aggregated before they can be analyzed. Processing such large amounts of data used to require installing Hadoop on a cluster of servers, not to mention coding MapReduce jobs in Pig or Java. Those days are over.

InsideBigData has published a new Guide to Machine Learning, in collaboration with Revolution Analytics.

The most entertaining book I've read in the past few months is The Martian, by Andy Weir. It tells the story of astronaut Mark Watney, a member of a six-person mission to Mars in the near future. After an accident early in the mission, Watney is stranded alone on the surface of Mars. The book is the story of his quest for survival.