by Joseph Rickert

We are pleased to announce that Jo-fai Chow is the winner of the Revolution Analytics contest. Jo-fai’s entry, which was implemented as a Shiny project, may be viewed by clicking on the figure below.

 

Hilary Parker has contributed a lovely article to Significance, the magazine of the American Statistical Association and the Royal Statistical Society, on using R to set your Google calendar to mark the time of sunsets.

by Nick Elprin, Co-Founder of Domino Data Lab

We built a platform that lets analysts deploy R code to an HTTP server with one click, and we describe it in detail below. If you have ever wanted to invoke your R model with a simple HTTP call, without dealing with any infrastructure setup or asking for help from developers (imagine Heroku for your R code), we hope you’ll enjoy this.

Introduction

A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of data science.

There's a lot of great video on YouTube, and also a lot of boring video that could be interesting if only it were shorter. But speeding up video just amplifies the camera shake to the point of motion sickness. Microsoft has solved that problem with Hyperlapse, which does for video what Panorama mode did for photography:

 

You can learn more about how it's done in this video, or at the Hyperlapse website.

If you're looking for just the right package to solve your R problem, you could always browse through the list of available packages on CRAN. But with almost 6000 entries, that's not going to be the most efficient process.

A statistical consultant known only as "Stanford PhD" has put together a table comparing the statistical capabilities of the software packages R, Matlab, SAS, Stata and SPSS. For each of 57 methods (including techniques like "ridge regression", "survival analysis" and "optimization"), the author rates the capabilities of each software package as "Yes" (fully supported), "Limited" or "Experimental". Here are the first few rows of the table:

Joe wrote about this already, but the recording of John Chambers' keynote presentation from the useR! 2014 conference, Interfaces, Efficiency and Big Data, is now available for viewing thanks to Data Science LA.