# stepAIC {MASS}

### Description

Performs stepwise model selection by AIC.

### Usage

```r
stepAIC(object, scope, scale = 0,
        direction = c("both", "backward", "forward"),
        trace = 1, keep = NULL, steps = 1000, use.start = FALSE,
        k = 2, ...)
```

### Arguments

- object
- an object representing a model of an appropriate class. This is used as the initial model in the stepwise search.
- scope
- defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components `upper` and `lower`, both formulae. See the details for how to specify the formulae and how they are used.
- scale
- used in the definition of the AIC statistic for selecting the models, currently only for `lm` and `aov` models (see `extractAIC` for details).
- direction
- the mode of stepwise search, can be one of `"both"`, `"backward"`, or `"forward"`, with a default of `"both"`. If the `scope` argument is missing the default for `direction` is `"backward"`.
- trace
- if positive, information is printed during the running of `stepAIC`. Larger values may give more information on the fitting process.
- keep
- a filter function whose input is a fitted model object and the associated `AIC` statistic, and whose output is arbitrary. Typically `keep` will select a subset of the components of the object and return them. The default is not to keep anything.
- steps
- the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early.
- use.start
- if true, the updated fits are done starting at the linear predictor for the currently selected model. This may speed up the iterative calculations for `glm` (and other fits), but it can also slow them down. **Not used** in R.
- k
- the multiple of the number of degrees of freedom used for the penalty. Only `k = 2` gives the genuine AIC: `k = log(n)` is sometimes referred to as BIC or SBC.
- ...
- any additional arguments to `extractAIC`. (None are currently used.)
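As a rough illustration of how `scale` and `k` enter the criterion, the sketch below recomputes the `lm` criterion by hand and then runs the search with two penalties. The model formula and the object names (`fit`, `sel.aic`, `sel.bic`) are assumptions chosen for the example, and the hand formula assumes an `lm` fit with `scale = 0`.

```r
library(MASS)  # provides stepAIC() and the cpus data set

# Illustrative model, not taken from the text above.
fit <- lm(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax, data = cpus)

n   <- length(residuals(fit))   # observations used in the fit
rss <- sum(residuals(fit)^2)    # residual sum of squares
edf <- n - df.residual(fit)     # equivalent degrees of freedom

# For an lm fit with scale = 0 the criterion is n * log(RSS / n) + k * edf.
aic.by.hand <- n * log(rss / n) + 2 * edf
aic.by.fun  <- extractAIC(fit, scale = 0, k = 2)[2]
all.equal(aic.by.hand, aic.by.fun)

# k = 2 is the genuine AIC; k = log(n) applies the heavier BIC/SBC-style penalty.
sel.aic <- stepAIC(fit, trace = 0)
sel.bic <- stepAIC(fit, k = log(n), trace = 0)
formula(sel.aic)
formula(sel.bic)
```

The two searches need not select the same model, since the larger penalty makes each extra term harder to justify.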

### Details

The set of models searched is determined by the `scope` argument. The right-hand side of its `lower` component is always included in the model, and the right-hand side of the model is included in the `upper` component. If `scope` is a single formula, it specifies the `upper` component, and the `lower` model is empty. If `scope` is missing, the initial model is used as the `upper` model.
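The following sketch shows one way the `scope` rules above play out on the `cpus` data. The choice of predictors and the object names (`null.fit`, `fwd`, `bwd`) are assumptions made for illustration.

```r
library(MASS)  # provides stepAIC() and the cpus data set

# Forward search: start from the intercept-only model and let `upper`
# bound the terms that may be added.
null.fit <- lm(log10(perf) ~ 1, data = cpus)
fwd <- stepAIC(null.fit,
               scope = list(lower = ~ 1,
                            upper = ~ syct + mmin + mmax + cach + chmin + chmax),
               direction = "forward", trace = 0)
formula(fwd)

# Backward search: terms on the right-hand side of `lower` are never dropped,
# so `syct` stays in the model whatever its effect on the criterion.
full.fit <- lm(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax,
               data = cpus)
bwd <- stepAIC(full.fit,
               scope = list(lower = ~ syct,
                            upper = ~ syct + mmin + mmax + cach + chmin + chmax),
               direction = "backward", trace = 0)
formula(bwd)
```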

Models specified by `scope` can be templates to update `object` as used by `update.formula`.
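A small sketch of the template idea: a `.` in a scope formula stands for the corresponding part of the current model formula, exactly as in `update.formula`. The starting formula `base` and the added terms are assumptions chosen for the example.

```r
library(MASS)  # provides stepAIC() and the cpus data set

base <- log10(perf) ~ syct

# `.` on the right-hand side is replaced by the existing right-hand side.
expanded <- update.formula(base, ~ . + mmax + cach)
expanded

# The same template can be passed as the upper scope of a search.
fit <- lm(base, data = cpus)
sel <- stepAIC(fit,
               scope = list(upper = ~ . + mmax + cach, lower = ~ 1),
               trace = 0)
formula(sel)
```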

There is a potential problem in using `glm` fits with a variable `scale`, as in that case the deviance is not simply related to the maximized log-likelihood. The `glm` method for `extractAIC` makes the appropriate adjustment for a `gaussian` family, but may need to be amended for other cases. (The `binomial` and `poisson` families have fixed `scale` by default and do not correspond to a particular maximum-likelihood problem for variable `scale`.)

Where a conventional deviance exists (e.g. for `lm`, `aov` and `glm` fits) this is quoted in the analysis of variance table: it is the *unscaled* deviance.

### Values

The stepwise-selected model is returned, with up to two additional components. There is an `"anova"` component corresponding to the steps taken in the search, as well as a `"keep"` component if the `keep=` argument was supplied in the call. The `"Resid. Dev"` column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood: it will be a deviance only in cases where a saturated model is well-defined (thus excluding `lm`, `aov` and `survreg` fits, for example).
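To make the two extra components concrete, here is a minimal sketch that supplies a `keep` filter and then inspects the result. The filter `keep.summary` and what it records are hypothetical, chosen only to show the mechanism.

```r
library(MASS)  # provides stepAIC() and the cpus data set

fit <- lm(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax, data = cpus)

# Hypothetical filter: for each model kept during the search, record the
# criterion value and the number of estimated coefficients.
keep.summary <- function(model, aic) {
  list(aic = aic, ncoef = length(coef(model)))
}

sel <- stepAIC(fit, keep = keep.summary, trace = 0)

sel$anova  # table of the steps taken in the search
sel$keep   # values returned by keep.summary, one row per recorded quantity
```

Returning a named list of fixed length from the filter keeps the `"keep"` component easy to read, since each name becomes a row label.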

### References

Venables, W. N. and Ripley, B. D. (2002) *Modern Applied Statistics with S.* Fourth edition. Springer.

### Note

The model fitting must apply the models to the same dataset. This may be a problem if there are missing values and an `na.action` other than `na.fail` is used (as is the default in R). We suggest you remove the missing values first.
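One way to follow this advice is to drop incomplete rows before fitting, so that every candidate model sees the same observations. In the sketch below the missing values are introduced artificially, and the column subset and object names (`dat`, `dat.cc`) are assumptions made for the example.

```r
library(MASS)  # provides stepAIC() and the birthwt data set

dat <- birthwt[, c("low", "age", "lwt", "smoke")]

# Artificially blank out ten values of lwt to mimic missing data.
set.seed(1)
dat$lwt[sample(nrow(dat), 10)] <- NA

# Remove incomplete rows once, up front. Without this step, dropping lwt
# during the search would change the number of rows in use between models.
dat.cc <- na.omit(dat)

fit <- glm(low ~ ., family = binomial, data = dat.cc)
sel <- stepAIC(fit, trace = 0)
sel$anova
```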

### Examples

```r
quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)
quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)
quine.stp <- stepAIC(quine.nxt,
    scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1),
    trace = FALSE)
quine.stp$anova

cpus1 <- cpus
for(v in names(cpus)[2:7])
  cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])),
                    include.lowest = TRUE)
cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
cpus.samp <- sample(1:209, 100)
cpus.lm <- lm(log10(perf) ~ ., data = cpus1[cpus.samp,2:8])
cpus.lm2 <- stepAIC(cpus.lm, trace = FALSE)
cpus.lm2$anova

example(birthwt)
birthwt.glm <- glm(low ~ ., family = binomial, data = bwt)
birthwt.step <- stepAIC(birthwt.glm, trace = FALSE)
birthwt.step$anova
birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2)
    + I(scale(lwt)^2), trace = FALSE)
birthwt.step2$anova

quine.nb <- glm.nb(Days ~ .^4, data = quine)
quine.nb2 <- stepAIC(quine.nb)
quine.nb2$anova
```

Documentation reproduced from package MASS, version 7.3-29. License: GPL-2 | GPL-3