The following post by Norm Matloff originally appeared on his blog, Mad(Data)Scientist, on September 15th. We rarely republish posts that have appeared on other blogs; however, the questions that Norm raises about the teaching of statistics, and his assertion that "R's statistical procedures are centered far too much on significance testing", deserve a second look. Moreover, Norm's post elicited quite a few comments, many of which are at a high level of discourse.

Hadley Wickham's dplyr package is a great toolkit for getting data ready for analysis in R. If you haven't yet taken the plunge and started using dplyr, Kevin Markham has put together a hands-on video tutorial for his Data School blog, which you can see below.

I think I may be one of the few kids who actually liked the ET: The Extra-Terrestrial game for the Atari 2600. Sure it was frustrating, but so were most games of the era, and at least it wasn't a disappointing "recreation" of one of my arcade favourites.

The militarization of local police departments here in the US has been much in the news lately, and the New York Times published in June an in-depth article on how materiel from wars has ended up in the hands of US counties.

by Joseph Rickert

One of the most difficult things about R, a problem that is particularly vexing to beginners, is finding things. This is an unintended consequence of R's spectacular, but mostly uncoordinated, organic growth. The R core team does a superb job of maintaining the stability and growth of the R language itself, but the innovation engine for new functionality is largely in the hands of the global R community.

In discussion with several data scientists, Will Stanton (a data scientist with Return Path) learned that a common concern is: what software should I be using? There are many options out there, but what is the best platform to be an effective "data hacker"?

by Matt Sundquist, Plotly Co-founder

It's delightfully smooth to publish R code, plots, and presentations to the web. For example:

You're probably familiar with the classic Travelling Salesman problem: given (say) 20 cities, what is the shortest route you can take that passes through all 20 cities and returns to the starting point? It's a difficult problem to solve, because a brute-force search has to try every possible route to find the minimum, and there are a LOT of possibilities. For a 20-city tour there are more than 60 quadrillion distinct routes to try, and that's a fairly small problem!
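
To get a feel for how quickly the number of routes grows, here is a minimal Python sketch that counts distinct round trips under one common convention: the starting city is fixed, and a route and its reverse count as the same tour, which gives (n-1)!/2 tours for n cities. The function name `distinct_tours` is made up for illustration, and other counting conventions (for example, treating each direction as a separate route) give larger totals.

```python
from math import factorial


def distinct_tours(n_cities: int) -> int:
    """Count distinct round trips through n_cities.

    Assumptions (illustrative, not from the original post): distances are
    symmetric, the starting city is fixed, and a route and its reverse are
    treated as the same tour, so the count is (n - 1)! / 2.
    """
    if n_cities < 3:
        raise ValueError("need at least 3 cities for a meaningful tour count")
    return factorial(n_cities - 1) // 2


# Show how the count explodes as cities are added.
for n in (5, 10, 15, 20):
    print(f"{n:>2} cities: {distinct_tours(n):,} distinct tours")
```

Even under this conservative convention, going from 10 to 20 cities multiplies the number of tours by a factor of several hundred billion, which is why exhaustive search stops being practical so quickly.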

Rrrr! It's International Talk Like a Pirate day again, mateys, the day all landlubbers should talk in pirate lingo. (If you're unsure how, R can help.) It's also the day when you can pick up some great O'Reilly R books for half price.

A quick heads up that if you'd like to get a great introduction to doing data science with the R language, Joe Rickert will be giving a free webinar next Thursday, September 25: Data Science with R. Regular readers of the blog will be familiar with Joe's posts on this topic.