In most data science applications, preparing the data is at least half the job. Finding where the data lives, figuring out how to access it, finding the right records, filtering, cleaning and transforming the data ... all of this has to be done before the statistical analysis can even begin.

John Myles White, self-described "statistics hacker" and co-author of "Machine Learning for Hackers", was interviewed recently by The Setup. In the interview, he describes some of his go-to R packages for data science:

Linear programming is a mathematical technique for finding the values of a set of variables that maximize (or minimize) a linear quantity, subject to a set of linear constraints. For example, consider this problem from the FishyOperations blog:
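The FishyOperations problem itself is not reproduced in this excerpt. As a separate, made-up illustration of the technique, here is a minimal Python sketch that solves a tiny two-variable linear program by checking every corner of the feasible region. The constraints, the objective, and all function names are hypothetical, and the approach assumes the feasible region is bounded and non-empty; real problems would normally go to a dedicated solver.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical problem (not from the original post):
#   maximize 3x + 5y
#   subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
# Each constraint is (a, b, c), meaning a*x + b*y <= c.
CONSTRAINTS = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
OBJECTIVE = (3, 5)


def intersection(c1, c2):
    """Point where two constraint boundaries cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule, kept exact with Fraction to avoid rounding issues.
    x = Fraction(r1 * b2 - r2 * b1, det)
    y = Fraction(a1 * r2 - a2 * r1, det)
    return x, y


def is_feasible(point, constraints):
    """True if the point satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)


def solve(objective, constraints):
    """Return (best_value, best_point) over the feasible corner points.

    Assumes a bounded, non-empty feasible region, so the maximum
    is attained at one of the corners.
    """
    corners = []
    for c1, c2 in combinations(constraints, 2):
        p = intersection(c1, c2)
        if p is not None and is_feasible(p, constraints):
            corners.append(p)
    return max((objective[0] * x + objective[1] * y, (x, y)) for x, y in corners)


best_value, best_point = solve(OBJECTIVE, CONSTRAINTS)
```

Enumerating corners only scales to toy problems, but it makes the idea concrete: the constraints carve out a region, and the best value of a linear objective sits at one of its corners.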

In 2004, NASA sent two rovers to Mars. Each rover was scheduled for a three-month mission to explore the surface, after safely bouncing onto the surface of Mars in a cushion of airbag-like balloons. In a marvel of engineering and dedication, Spirit lasted six years, and Opportunity is still advancing our scientific knowledge about Mars to this day.

At a talk I saw at the useR!2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google's internal R support list. But what are all these Google employees doing with R? A post from the Google Research team published on Google+ yesterday sheds some light:

I got some great reactions to the Napa Valley wine tasting map I posted on Monday, made with the ggmap package. A couple of people asked if similar maps could be made for other wine regions (like Australia's Hunter Valley, or the Walla Walla region in Washington): provided you have a list of winery addresses, tweaks to the same R script should work just fine.

In case you missed them, here are some articles from June of particular interest to R users.

Many public agencies release data in a fixed-width ASCII format (FWF). But with the data all packed together without separators, you need a "data dictionary" defining the column widths (and metadata about the variables) to make sense of it. Unfortunately, many agencies make such information available only as a SAS script, with the column information embedded in a PROC IMPORT statement.
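To make the role of the data dictionary concrete, here is a minimal Python sketch of slicing fixed-width records once the column widths are known. The column names, widths, and sample records are invented for illustration and do not come from any agency's file. In R, the same widths would typically be passed to `read.fwf` through its `widths` argument.

```python
# Hypothetical data dictionary: (column name, width in characters).
# These names and widths are made up for illustration.
DATA_DICTIONARY = [("state", 2), ("year", 4), ("count", 5)]


def parse_fixed_width(line, dictionary):
    """Slice one fixed-width record into a dict keyed by column name."""
    record = {}
    start = 0
    for name, width in dictionary:
        # Each field occupies the next `width` characters; padding is stripped.
        record[name] = line[start:start + width].strip()
        start += width
    return record


# Invented sample records, 11 characters each (2 + 4 + 5).
sample_lines = ["CA2011  120", "WA2012   45"]
records = [parse_fixed_width(line, DATA_DICTIONARY) for line in sample_lines]
```

Without the dictionary, a line like `CA2011  120` is just an undifferentiated run of characters, which is why the widths have to be recovered from the SAS script before the file can be read.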

R has had a maps package available since the very early days. It's great for simple geographic maps, but the political boundaries can be out of date. For more detailed maps, you can also download shape files and use the sp package to draw borders directly.

Since the first Formula 1 auto race in 1950, the drive for faster lap times and more reliability, along with changes in the rules governing competition and safety, has had an evolutionary effect on car design. The bullet-shaped cars of the 1950s have morphed into the low-slung, spoiler-adorned, aerodynamic supercars of today. This animation from designer Rufus Blacklock captures the evolution of the chassis, spoilers, engine and steering wheels over 60 years in just 60 seconds: