rpart {rpart}
Description
Fit an rpart model
Usage
rpart(formula, data, weights, subset, na.action = na.rpart, method,
model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)
Arguments
- formula
- a formula, with a response but no interaction terms. If this is a data frame, it is taken as the model frame (see model.frame).
- data
- an optional data frame in which to interpret the variables named in the formula.
- weights
- optional case weights.
- subset
- optional expression saying that only a subset of the rows of the data should be used in the fit.
- na.action
- the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
- method
- one of "anova", "poisson", "class" or "exp". If method is missing, the routine tries to make an intelligent guess: if y is a survival object, method = "exp" is assumed; if y has 2 columns, method = "poisson" is assumed; if y is a factor, method = "class" is assumed; otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in future.
Alternatively, method can be a list of functions named init, split and eval. Examples are given in the file ‘tests/usersplits.R’ in the sources, and in the vignette ‘User Written Split Functions’.
- model
- if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data.
- x
- keep a copy of the x matrix in the result.
- y
- keep a copy of the dependent variable in the result. If missing and model is supplied, this defaults to FALSE.
- parms
- optional parameters for the splitting function.
Anova splitting has no parameters.
Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates. The default value is 1.
Exponential splitting has the same parameter as Poisson.
For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini.
- control
- a list of options that control details of the rpart algorithm. See rpart.control.
- cost
- a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.
- ...
- arguments to rpart.control may also be specified in the call to rpart. They are checked against the list of valid arguments.
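The method-guessing rule described under method can be checked directly, because the fitted object records the method actually used in its method component. A minimal sketch, using the kyphosis and car.test.frame data sets that ship with rpart; the object names are illustrative only.

```r
library(rpart)

# Factor response (Kyphosis is "absent"/"present"), so "class" is guessed.
fit_guess_class <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Numeric response, so "anova" is guessed.
fit_guess_anova <- rpart(Mileage ~ Weight, data = car.test.frame)

# Stating the method explicitly, as recommended above.
fit_explicit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                      method = "class")

fit_guess_class$method   # "class"
fit_guess_anova$method   # "anova"
```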
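The classification components of parms and the per-variable cost vector can be supplied as sketched below. This assumes the usual convention that rows of the loss matrix index the true class and columns the predicted class, and that cost entries follow the order of the predictors in the formula; the numeric values are arbitrary illustrations, not recommendations.

```r
library(rpart)

# Loss matrix (assumed convention: rows = true class, columns = predicted).
# loss[2, 1] = 2 makes calling a true "present" case "absent" twice as
# costly as the reverse error. Diagonal is zero, off-diagonal positive.
loss <- matrix(c(0, 1,
                 2, 0), nrow = 2, byrow = TRUE)
fit_loss <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  parms = list(loss = loss, split = "gini"))

# Priors must be positive and sum to 1; their order follows
# levels(kyphosis$Kyphosis).
fit_prior <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   parms = list(prior = c(0.5, 0.5), split = "information"))

# One cost per predictor (Age, Number, Start). A cost of 2 on Start divides
# its split improvement by 2 when candidate splits are compared.
fit_cost <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", cost = c(1, 1, 2))
```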
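Control settings can be passed either through control = rpart.control(...) or directly through ..., and a misspelled name is rejected rather than silently ignored. A minimal sketch; cp and minsplit are genuine rpart.control arguments, while minsplt is a deliberate typo used only to trigger the check.

```r
library(rpart)

# Settings given through rpart.control() ...
fit_ctrl <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  control = rpart.control(cp = 0.05, minsplit = 10))

# ... or directly in the call, where they are forwarded to rpart.control().
fit_dots <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  cp = 0.05, minsplit = 10)

fit_dots$control$cp   # 0.05

# An unknown argument name produces an error.
bad <- try(rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 minsplt = 10), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```

Tree growth is deterministic for a given data set and control list, so both fits produce the same tree; only the cross-validated columns of the complexity table depend on the random folds.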
Details
This differs from the tree function in S mainly in its handling of surrogate variables. In most details it follows Breiman et al. (1984) quite closely. The R package tree provides a re-implementation of tree.
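To see the surrogate handling and the default na.action together, one can blank out a few predictor values and refit. A minimal sketch: the rows set to NA are arbitrary and hypothetical. Because na.rpart keeps rows whose predictors are missing, every row is still assigned to a leaf, with surrogate splits (or the majority direction) deciding where a row goes when its primary split variable is NA.

```r
library(rpart)

# Copy the data and introduce hypothetical missing Age values.
kyph_na <- kyphosis
kyph_na$Age[c(1, 5, 10)] <- NA

fit_na <- rpart(Kyphosis ~ Age + Number + Start, data = kyph_na)

# fit_na$where holds one leaf index per observation used in the fit;
# the rows with missing Age were kept.
length(fit_na$where) == nrow(kyph_na)   # TRUE

# summary() prints the primary and surrogate splits at each node.
summary(fit_na)
```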
Values
An object of class rpart. See rpart.object.
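A common use of the returned object is to read its complexity table and prune the tree. A minimal sketch, assuming the default 10-fold cross-validation (xval = 10) so that the xerror column is present; choosing the CP with the smallest xerror is one simple rule among several.

```r
library(rpart)

set.seed(1)  # xerror depends on the random cross-validation folds
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

printcp(fit)                 # CP, nsplit, rel error, xerror, xstd
fit$frame[, c("var", "n")]   # one row per node; "<leaf>" marks terminal nodes

# Prune back to the subtree with the smallest cross-validated error.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)
```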
References
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
See Also
rpart.control, rpart.object, summary.rpart, print.rpart
Examples
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              parms = list(prior = c(.65, .35), split = "information"))
fit3 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              control = rpart.control(cp = 0.05))
par(mfrow = c(1, 2), xpd = NA) # otherwise on some devices the text is clipped
plot(fit)
text(fit, use.n = TRUE)
plot(fit2)
text(fit2, use.n = TRUE)
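A fitted classification tree can also be applied to new data with predict(). A minimal sketch: the three new_patients rows are made-up values, and only their column names need to match the predictors in the formula.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Hypothetical new observations with the same predictor columns.
new_patients <- data.frame(Age = c(20, 100, 150),
                           Number = c(3, 5, 7),
                           Start = c(14, 8, 3))

pred_class <- predict(fit, newdata = new_patients, type = "class")  # factor
pred_prob  <- predict(fit, newdata = new_patients, type = "prob")   # matrix
pred_class
pred_prob
```

Each row of pred_prob gives the class proportions in the leaf that the observation falls into, and pred_class is the corresponding predicted level.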
Documentation reproduced from package rpart, version 4.1-1. License: GPL-2 | GPL-3
