ddply {plyr}
Description
For each subset of a data frame, apply function then combine results into a data frame.
Usage
ddply(.data, .variables, .fun = NULL, ...,
.progress = "none", .inform = FALSE, .drop = TRUE,
.parallel = FALSE, .paropts = NULL)
Arguments
- .fun
- function to apply to each piece
- ...
- other arguments passed on to .fun
- .progress
- name of the progress bar to use, see create_progress_bar
- .parallel
- if TRUE, apply function in parallel, using parallel backend provided by foreach
- .paropts
- a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.
- .inform
- produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging
- .data
- data frame to be processed
- .variables
- variables to split data frame by, as as.quoted variables, a formula or character vector
- .drop
- should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)
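As a rough illustration of the .variables and .drop arguments described above, the sketch below uses a small made-up data frame (toy, with columns group, sex and age; these names are assumptions, not part of plyr). It shows the three accepted forms of .variables, and the effect of .drop = FALSE when the grouping columns are factors with an unobserved combination.

```r
library(plyr)

# Hypothetical toy data: the combination group = "B", sex = "M"
# is deliberately absent from the rows.
toy <- data.frame(
  group = factor(c("A", "A", "B")),
  sex   = factor(c("M", "F", "F")),
  age   = c(30, 40, 50)
)

# Three ways to specify .variables: quoted with .(), a formula,
# or a character vector of column names.
by_quoted  <- ddply(toy, .(group), summarise, mean_age = mean(age))
by_formula <- ddply(toy, ~ group, summarise, mean_age = mean(age))
by_name    <- ddply(toy, "group", summarise, mean_age = mean(age))

# With .drop = FALSE, combinations of factor levels that never occur
# are kept, so .fun is also called on a zero-row piece.
counts <- ddply(toy, .(group, sex), summarise, n = length(age),
                .drop = FALSE)
```

Because .fun also receives empty pieces when .drop = FALSE, it is safest to use summaries that are well defined on zero rows, such as length().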
Value
A data frame, as described in the output section.
Input
This function splits data frames by variables.
Output
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.
If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
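The two supported return types can be sketched as follows. This is a minimal example on made-up data (the data frame df and its columns g and x are assumptions for illustration): the first call has .fun return a data frame, the second has it return a named atomic vector of fixed length.

```r
library(plyr)

# Hypothetical input: two groups, three rows.
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# .fun returns a data frame: pieces are combined with rbind.fill,
# and the grouping column is added as a label.
by_df <- ddply(df, .(g), function(piece) {
  data.frame(total = sum(piece$x))
})

# .fun returns an atomic vector of fixed length: the vectors are
# bound row-wise and converted to a data frame.
by_vec <- ddply(df, .(g), function(piece) {
  c(min = min(piece$x), max = max(piece$x))
})
```

In both cases the result has one row per group. Returning a data frame is the more explicit choice, since the column names and types are set directly by .fun.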
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.
See Also
tapply for similar functionality in the base package
Other data frame input: d_ply, daply, dlply
Other data frame output: adply, ldply, mdply
Examples
# Summarize a dataset by two variables
require(plyr)
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
  mean = round(mean(age), 2),
  sd = round(sd(age), 2))

# An example using a formula for .variables
ddply(baseball[1:100,], ~ year, nrow)
# Applying two functions; nrow and ncol
ddply(baseball, .(lg), c("nrow", "ncol"))

# Calculate mean runs batted in for each year
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
# Plot a line chart of the result
plot(mean_rbi ~ year, type = "l", data = rbi)

# make new variable career_year based on the
# start year for each player (id)
base2 <- ddply(baseball, .(id), mutate,
  career_year = year - min(year) + 1
)
Documentation reproduced from package plyr, version 1.8. License: MIT
