Blogs

Today, Teradata announced the new Teradata Database 14.10 and with it some exciting news for R programmers: the first next-generation in-database R analytics that are fully parallel and scalable.

The most recent edition of the Revolution Newsletter is now available. In case you missed it, the news section is below, and you can read the full September edition (with highlights from this blog and community events) online.

Datanami interviews Revolution Analytics' Bill Jacobs about the upcoming Revolution R Enterprise 7, which will be available later this year. A key feature of this release is that the big-data predictive analytics R functions in the ScaleR package will run on data stored in a Hadoop cluster, and will use the parallel computational power of the Hadoop nodes to do the work:

Here's some amazing new software ... that might also make you distrust every "real" photo you see from now on:

Here's a paper about geosemantic snapping if you want to learn more about how this wizardry happens.

That's all for this week! Don't forget you can find previous Because it's Friday posts here, and have a great weekend!

During the 2013 JSM (Joint Statistics Meetings) Conference in Montreal, Revolution Analytics conducted a survey of attendees from August 5 to August 8. The 865 respondents gave their opinions on the privacy and ethics related to data collection, and on their familiarity with statistical software used for the analysis of such data.

Out of the 865 statisticians surveyed:

The massive open online course (MOOC) platform Coursera has already delivered two essential free courses for anyone who wants to learn the R language. Computing for Data Analysis, presented by Roger Peng, covers the basics of R programming. The follow-up course Data Analysis, presented by Jeff Leek, covers statistical modeling and data visualization with R.

One of the practical challenges of Bayesian statistics is dealing with all of the complex probability distributions involved. You begin with the likelihood function of interest, but once you combine it with the prior distributions of all the parameters, you end up with a complex posterior distribution that you need to characterize. Since you usually can't calculate that distribution analytically, the best you can do is simulate from it (generally, using Markov chain Monte Carlo techniques).
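
To make the idea of simulating from a posterior concrete, here is a minimal sketch of a random-walk Metropolis sampler, one of the simplest Markov chain Monte Carlo methods. It is written in plain Python, not taken from the post, and everything in it is an assumption chosen for illustration: the binomial likelihood, the Beta(2, 2) prior, the data (7 successes in 10 trials), the step size, and the function names `log_posterior` and `metropolis`. This particular model was picked because its posterior is known exactly (Beta(9, 5)), so the simulated mean can be checked against the analytic one.

```python
import math
import random


def log_posterior(theta, successes, trials, a, b):
    """Unnormalized log posterior: binomial likelihood times Beta(a, b) prior."""
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero posterior density outside (0, 1)
    log_lik = successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)
    log_prior = (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
    return log_lik + log_prior


def metropolis(n_samples, successes, trials, a, b, step=0.2, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler; returns n_samples draws after burn-in."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting point inside the support
    current = log_posterior(theta, successes, trials, a, b)
    samples = []
    for i in range(n_samples + burn_in):
        # Symmetric Gaussian proposal centered on the current value.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials, a, b)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = metropolis(20000, successes=7, trials=10, a=2.0, b=2.0)
    simulated_mean = sum(draws) / len(draws)
    analytic_mean = 9.0 / 14.0  # mean of the exact Beta(9, 5) posterior
    print(f"simulated mean: {simulated_mean:.3f}, analytic mean: {analytic_mean:.3f}")
```

Note that the sampler only ever needs the posterior up to a constant, which is why it still works when the normalizing integral can't be computed. Real problems usually have many parameters and call for more careful tuning and convergence checks than this sketch includes.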

If you enjoy data visualization and schadenfreude, here's a great Tumblr to spend some time browsing through this weekend. WTF Visualizations features a steady diet of charts, graphs, infographics and data visualizations of all stripes that are poorly constructed, confusing, misleading or just make no sense. Here's just one example: