

The O'Reilly Strata conferences are always great fun to attend, and this latest installment in New York City is no exception. This one is super-busy though; the conference has been sold out for weeks -- and not just marketing-sold-out, it's fire-department-sold-out. It's non-stop conversations and presentations, and it's tough to move through the hallways in between sessions.

Nonetheless, I thought I'd pause for a couple of minutes and share some of the highlights for me so far.

On Thursday next week (November 1), I'll be giving a new webinar on the topic of Big Data, Data Science and R. Titled "The Rise of Data Science in the Age of Big Data Analytics: Why Data Distillation and Machine Learning Aren’t Enough", this is a provocative look at why data scientists cannot be replaced by technology, and why R is the ideal environment for building data science applications. Here's the abstract:

There are new local R user groups in eight (!) countries to announce this month:

The population of the world has been over 7 billion for about a year now. But those seven billion aren't distributed equally around the globe. About 1.2 billion people live in India alone (despite it having just 2% of the world's land area). At the other end of the spectrum, the entire continent of Australia houses about 0.3% of the world's population.

A new report from analyst firm Gartner forecasts that IT organizations will spend $232 billion (US) on hardware, software and services related to Big Data through 2016. Some key findings from the report:

The chart below comes by way of the is.R blog and shows the average ideology of the members of the United States House of Representatives within the Republican (red) and Democratic (blue) parties. (Other parties are shown in green.) The chart is shown as a time series, from the first US congress in 1789, to the most recent full congress (the 111th, from 2010). The 80th congress first met in 1947.

In a webinar today previewing Spotfire 5 (scheduled for release this November), TIBCO announced that it will include TERR: The TIBCO Enterprise Runtime for R. TERR is a closed-source reimplementation of the R language engine, and is not based on the GPL-licensed R project from the R Foundation. Here's the relevant slide from the webinar:

If you're reporting on the results of a statistical analysis for a journal or report, you'll probably be building a table comparing two or more models. Such tables may include the variables in the model, parameter estimates, p-values, and model summary statistics. If you want to include such tables based on lm, glm, svyglm, gee, gam, polr, survreg or coxph models in a LaTeX document, Marek Hlavac's stargazer package may save you some time.

You're probably familiar with the Game Of Life: create a grid sparsely populated with "cells"; apply a couple of simple rules (cells "die" if they have too few or too many neighbours; dead cells spring into life if they have exactly three); watch complex behaviour spontaneously emerge. This video explains the traditional Game of Life in more detail (watch to the end to see some interesting "creatures" that emerge from these simple rules).
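For anyone who wants to try the rules above, here is a minimal sketch in Python. It assumes the standard Conway thresholds (a live cell survives with two or three live neighbours, and a dead cell comes alive with exactly three), since the text only says "too few or too many". The function name `step`, the set-of-coordinates representation and the unbounded grid are choices made for this illustration and are not taken from the video.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Assumed standard rules: birth on exactly 3, survival on 2 or 3.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A "blinker": three cells in a row, which flips between horizontal and vertical.
blinker = {(1, 0), (1, 1), (1, 2)}
after_one = step(blinker)
after_two = step(after_one)

# A 2x2 "block", which should stay unchanged from one generation to the next.
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
```

Storing only the live cells keeps the sketch short and avoids any decision about what happens at the edges of a fixed grid; a bounded or wrap-around grid would need a different neighbour calculation.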