R 2.15.1 includes performance improvements inspired by the dataframe package
The latest update to open-source R, R 2.15.1, was released this morning. (You can grab sources now, and binary versions will hit the CRAN mirrors over the next couple of days.) In addition to several new features and bug fixes (including the new globalVariables function, which will be a boon to package developers), this update also includes some significant performance improvements inspired by the dataframe package.
At the useR! 2012 conference last week, Google's Tim Hesterberg introduced the dataframe package (available now on CRAN), which has been in use for the last three years amongst Google's 500+ R users. (You can download Tim's PDF slides here.) The package makes no functional changes to R; instead, it improves the implementation of data frames to reduce the number of temporary copies made of data. Tim reported that using the dataframe package with R 2.15.0 improved performance by 21% for creation and column subscripting, and by 14% for row subscripting.
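If you'd like to check these operations on your own machine, here is a minimal sketch of a micro-benchmark for the three things Tim measured: data frame creation, column subscripting, and row subscripting. It is not Tim's benchmark. The data size, the repetition count, and the helper name `elapsed` are all assumptions made up for illustration, and it uses only base R.

```r
# Hypothetical micro-benchmark: sizes and repetition counts are arbitrary choices,
# not the ones used in the dataframe package talk.
n    <- 1e5   # rows in the test data frame
reps <- 50    # repetitions per operation

x <- rnorm(n)
y <- runif(n)
df <- data.frame(x = x, y = y)
rows <- seq(1, n, by = 2)  # select every other row

# Run a zero-argument function `reps` times and return the elapsed seconds.
elapsed <- function(f) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

timings <- c(
  creation = elapsed(function() data.frame(x = x, y = y)),  # build a data frame
  column   = elapsed(function() df[, "x"]),                 # column subscripting
  row      = elapsed(function() df[rows, ])                 # row subscripting
)

print(R.version.string)
print(timings)
```

Run the same script under R 2.15.0 and R 2.15.1 (or with and without the dataframe package loaded) and compare the timings. The numbers will vary with hardware and data size, so treat them as a rough indication only.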
Tim mentioned during the talk that R-core member Luke Tierney was in the process of incorporating performance improvements from the dataframe package into base R, and indeed several such improvements are noted in the NEWS file. All of them aim to reduce the number of temporary internal copies of data that R makes, which improves both the speed and the memory usage of R. And because these are low-level changes at R's core, they will affect many R functions, not just those related to data frames.
If you've built R 2.15.1 already, have you noticed performance improvements? Let us know in the comments.
R-announce mailing list: R 2.15.1 is released