Since 2007, Karl Rexer has been collecting data on the tools, skills and practices of statisticians and data miners. Over the years, his semi-annual Data Miner Survey has expanded in scope, and now includes research on data science, big data, and the application of analytics in business.

by Herman Jopia

What is Binning?

Binning is the term used in scoring modeling for what is known in Machine Learning as Discretization: the process of transforming a continuous characteristic into a finite number of intervals (the bins), which allows for a better understanding of its distribution and its relationship with a binary variable. The bins generated by this process will eventually become the attributes of a predictive characteristic, the key component of a Scorecard.
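As a rough illustration of the idea, here is a minimal Python sketch that cuts a continuous characteristic into equal-frequency bins and reports the event rate of a binary variable in each bin. The function names, the equal-frequency cutting rule, and the toy age/default data are all assumptions made for this example; they are not taken from the post, and real scorecard binning usually adds further steps that are not shown here.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Return interior cut points that split the values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        # Skip duplicate cut points so every bin has a distinct lower bound.
        if not edges or cut > edges[-1]:
            edges.append(cut)
    return edges


def assign_bin(value, edges):
    """Bin 0 holds values below edges[0]; bin i holds edges[i-1] <= value < edges[i]."""
    return bisect_right(edges, value)


def bin_summary(values, targets, edges):
    """Return (count, events, event_rate) for each bin, given a 0/1 target."""
    counts = [0] * (len(edges) + 1)
    events = [0] * (len(edges) + 1)
    for value, target in zip(values, targets):
        b = assign_bin(value, edges)
        counts[b] += 1
        events[b] += target
    return [(c, e, e / c if c else None) for c, e in zip(counts, events)]


# Hypothetical toy data: a continuous characteristic and a binary outcome.
ages = [22, 25, 31, 38, 42, 47, 53, 58, 64]
defaulted = [1, 1, 0, 1, 0, 0, 0, 1, 0]

edges = equal_frequency_edges(ages, n_bins=3)
summary = bin_summary(ages, defaulted, edges)
for i, (count, event_count, rate) in enumerate(summary):
    print(f"bin {i}: n={count}, events={event_count}, event rate={rate:.2f}")
```

Each row of the summary corresponds to one bin, that is, one candidate attribute of the characteristic, and comparing event rates across bins is one simple way to look at its relationship with the binary variable.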

Why Binning?

A quick heads-up that tomorrow (Tuesday) at 10AM Pacific Time I'll be giving a live (and free) webinar: Reproducibility with Revolution R Open and the Checkpoint Package.

Think of a number. Now multiply it by itself. Now add the number you first thought of. Multiply the result by itself. Again add the number you first thought of. Keep repeating that process. If the number you have keeps getting larger and larger, the number you first thought of is not part of the Mandelbrot Set. (For example, starting with 0.5 you get 0.75 -> 1.0625 -> 1.629 -> 3.153 ... and so on off to infinity, so 0.5 is not part of the Mandelbrot Set.)
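The procedure above can be sketched in a few lines of Python. This is only an approximation of the membership test: the iteration cap (`max_iter`) and the escape radius of 2 are assumptions added for the example, since a program cannot iterate forever. The function names are likewise made up for illustration. Python's built-in complex numbers are used so the same code works for points off the real line.

```python
def orbit(c, steps):
    """Return the starting number followed by `steps` rounds of 'square, then add c'."""
    z = c
    values = [z]
    for _ in range(steps):
        z = z * z + c
        values.append(z)
    return values


def stays_bounded(c, max_iter=100, escape_radius=2.0):
    """Approximate test: True if the sequence has not escaped after max_iter rounds."""
    z = c
    for _ in range(max_iter):
        # Once the magnitude exceeds 2, the sequence is known to grow without bound.
        if abs(z) > escape_radius:
            return False
        z = z * z + c
    return True


print(orbit(0.5, 3))        # the first few numbers from the example in the text
print(stays_bounded(0.5))   # the sequence escapes
print(stays_bounded(-1))    # the sequence cycles between -1 and 0
```

A `True` result only means the sequence stayed small for `max_iter` rounds; a point very close to the boundary of the set may need a larger cap before it escapes.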

The On Broadway project collected more than 600,000 photographs taken near Broadway in New York City during a six-month period in 2014. If you're in New York, you can explore the images in an interactive installation at the New York Public Library through the end of this year. You can also explore them in your browser using this online app.

A new version of the checkpoint package for R has just been released on CRAN. With the checkpoint package, you can easily:

by Gary R. Moser
Director of Institutional Research and Planning
The California Maritime Academy

I recently contacted Joseph Rickert about inviting Vim guru Drew Neil (web:, book: "Practical Vim: Edit Text at the Speed of Thought") to speak at the Bay Area R User Group. Because Drew lives in Great Britain, that might not be easy to arrange, so Joe generously extended an invitation for me to share a bit about why I like Vim here.

It's hard to overstate the role of open-source software in the data science revolution. Tools like Hadoop, Spark, R, and Python are essential parts of the modern data science toolkit. These tools are likewise part of the solutions built by the Consulting Services group at Revolution Analytics.