lm is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although
aov may provide a more convenient interface for these).
lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
- formula
- an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.
- data
- an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.
- subset
- an optional vector specifying a subset of observations to be used in the fitting process.
- weights
- an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’.
- na.action
- a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
- method
- the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).
- model, x, y, qr
- logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
- singular.ok
- logical. If FALSE (the default in S but not in R) a singular fit is an error.
- contrasts
- an optional list. See the contrasts.arg of model.matrix.default.
- offset
- this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.
- ...
- additional arguments to be passed to the low level regression fitting functions (see below).
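As a rough illustration of the subset, weights and na.action arguments, the sketch below fits a small simulated data set. The data frame d, its columns and the fit_* names are made up for this example; only lm, na.omit, na.exclude, residuals and nobs come from R itself.

```r
# Simulated toy data; every name defined here is hypothetical.
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 + 3 * d$x + rnorm(10)
d$y[4] <- NA                    # one missing response
d$w <- rep(c(1, 2), 5)          # arbitrary positive weights

# Same weighted fit, two ways of handling the missing row
fit_omit <- lm(y ~ x, data = d, weights = w, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, weights = w, na.action = na.exclude)

length(residuals(fit_omit))     # 9: the NA row is dropped
length(residuals(fit_excl))     # 10: padded with NA in position 4

# subset is evaluated in data, like the formula variables
fit_sub <- lm(y ~ x, data = d, subset = x > 5)
nobs(fit_sub)                   # 5
```

The coefficients of fit_omit and fit_excl agree; the two na.action choices differ only in how residuals and fitted values are reported.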
Models for lm are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
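The equivalence of the cross operator and its expanded form can be checked directly. This is a minimal sketch using the warpbreaks data set that ships with R; the object names fit_star and fit_long are chosen here for illustration.

```r
# warpbreaks (datasets package): breaks, wool (factor), tension (factor)
fit_star <- lm(breaks ~ wool * tension, data = warpbreaks)
fit_long <- lm(breaks ~ wool + tension + wool:tension, data = warpbreaks)

# Both formulas expand to the same set of terms
attr(terms(fit_star), "term.labels")   # "wool" "tension" "wool:tension"

# and therefore give the same coefficients
all.equal(coef(fit_star), coef(fit_long))
```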
If the formula includes an
offset, this is evaluated and subtracted from the response.
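One way to see what "subtracted from the response" means is to compare an offset fit with a fit to the adjusted response. The sketch below uses simulated vectors x, z and y, all hypothetical names; z plays the role of the known component.

```r
# Simulated data; z is a component of y with a known coefficient of 1.
set.seed(2)
x <- rnorm(20)
z <- rnorm(20)
y <- 1 + 2 * x + z + rnorm(20, sd = 0.1)

fit_off <- lm(y ~ x + offset(z))   # z enters with coefficient fixed at 1
fit_adj <- lm(I(y - z) ~ x)        # regress the offset-adjusted response

# The estimated coefficients coincide
all.equal(unname(coef(fit_off)), unname(coef(fit_adj)))

# Fitted values of the offset model are on the scale of y, so they include z
all.equal(unname(fitted(fit_off)), unname(fitted(fit_adj) + z))
```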
If response is a matrix, a linear model is fitted separately by least-squares to each column of the matrix.
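A short sketch of a matrix response, using simulated data; the matrix Y, its column names a and b, and fit_m are invented for the example.

```r
# Two simulated responses sharing one predictor
set.seed(3)
x <- rnorm(15)
Y <- cbind(a = 1 + x + rnorm(15), b = 2 - x + rnorm(15))

fit_m <- lm(Y ~ x)
class(fit_m)    # "mlm" "lm"
coef(fit_m)     # one column of coefficients per response column

# Each column matches the corresponding single-response fit
fit_a <- lm(Y[, "a"] ~ x)
all.equal(coef(fit_m)[, "a"], coef(fit_a))
```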
See model.matrix for some further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula (see demo(glm.vr) for an example).
A formula has an implied intercept term. To remove this use either
y ~ x - 1 or
y ~ 0 + x. See
formula for more details of allowed formulae.
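The two ways of dropping the intercept, and the re-ordering of terms, can be inspected with terms. This sketch uses the cars data set that ships with R; fit_a, fit_b and the symbolic variables a and b are placeholders for illustration.

```r
# Two equivalent ways to remove the intercept
fit_a <- lm(dist ~ speed - 1, data = cars)
fit_b <- lm(dist ~ 0 + speed, data = cars)

names(coef(fit_a))                    # "speed" only, no "(Intercept)"
attr(terms(fit_a), "intercept")       # 0
all.equal(coef(fit_a), coef(fit_b))

# Term re-ordering: main effects are moved ahead of the interaction.
# terms() works on the formula alone, so y, a and b need not exist.
attr(terms(y ~ a:b + a + b), "term.labels")   # "a" "b" "a:b"
```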
Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).
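The integer-weights interpretation can be illustrated by summarizing replicated observations into group means. The data frames full and summ and the fit names below are invented for this sketch.

```r
# Replicated observations: x = 1..4 observed 2, 3, 1 and 4 times
set.seed(4)
full <- data.frame(x = rep(1:4, times = c(2, 3, 1, 4)))
full$y <- 1 + 0.5 * full$x + rnorm(nrow(full))

# One row per x: response is the group mean, weight is the group size
summ <- data.frame(
  x = 1:4,
  y = as.numeric(tapply(full$y, full$x, mean)),
  w = as.numeric(table(full$x))
)

fit_full <- lm(y ~ x, data = full)               # all 10 observations
fit_summ <- lm(y ~ x, data = summ, weights = w)  # 4 weighted means

# Same coefficient estimates from both fits
all.equal(coef(fit_full), coef(fit_summ))
```

Only the coefficient estimates agree here. The residual degrees of freedom differ (8 versus 2), so standard errors from the summarized fit are not those of the full-data fit.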
The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.
An object of class
"lm" is a list containing at least the following components:
- coefficients
- a named vector of coefficients
- residuals
- the residuals, that is response minus fitted values.
- fitted.values
- the fitted mean values.
- rank
- the numeric rank of the fitted linear model.
- weights
- (only for weighted fits) the specified weights.
- df.residual
- the residual degrees of freedom.
- call
- the matched call.
- contrasts
- (only where relevant) the contrasts used.
- xlevels
- (only where relevant) a record of the levels of the factors used in fitting.
- offset
- the offset used (missing if none were used).
- y
- if requested, the response used.
- x
- if requested, the model matrix used.
- model
- if requested (the default), the model frame used.
- na.action
- (where relevant) information returned by model.frame on the special handling of NAs.
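The components listed above can be inspected on any fitted object. A brief sketch with the cars data set that ships with R follows; the object name fit is arbitrary, and x = TRUE and y = TRUE are passed so that the optional model matrix and response are stored.

```r
fit <- lm(dist ~ speed, data = cars, x = TRUE, y = TRUE)

names(fit)                 # component names of the fitted object

# Accessor functions return the stored components
all.equal(coef(fit), fit$coefficients)
all.equal(residuals(fit), fit$y - fitted(fit))   # response minus fitted values

fit$rank                   # 2: intercept and slope
fit$df.residual            # 48 = 50 observations - 2 coefficients
dim(fit$x)                 # 50 x 2 model matrix

summary(fit)               # coefficient table and fit statistics
anova(fit)                 # analysis of variance table
```

Using the accessor functions rather than `$` keeps code working for other model classes that store these quantities differently.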
Using time series
Considerable care is needed when using
lm with time series.
Unless na.action = NULL, the time series attributes are stripped from the variables before the regression is done. (This is necessary as omitting NAs would invalidate the time series attributes, and if NAs are omitted in the middle of the series the result would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to line up series, so that the time shift of a lagged or differenced regressor would be ignored. It is good practice to prepare a data argument by ts.intersect(..., dframe = TRUE), then apply a suitable na.action to that data frame and call lm with na.action = NULL so that residuals and fitted values are time series.
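The recommended workflow might look like the following sketch. The series x and y are simulated and the column name x_lag1 is invented; the example has no missing values, so no na.action is applied to the data frame before fitting.

```r
# Two simulated annual series over the same 30 years
set.seed(5)
x <- ts(rnorm(30), start = 2000)
y <- ts(rnorm(30), start = 2000)

# Line up y with the previous year's x explicitly; lm will not do this itself.
# stats::lag is written out in full because other packages also define lag().
d <- ts.intersect(y = y, x_lag1 = stats::lag(x, -1), dframe = TRUE)
nrow(d)                      # 29: the first year has no lagged value

fit_ts <- lm(y ~ x_lag1, data = d, na.action = NULL)
coef(fit_ts)
is.ts(residuals(fit_ts))     # check whether the time series attributes survived
```

Regressing y on stats::lag(x, -1) directly inside the formula, without ts.intersect, would pair the values row by row and silently ignore the shift.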
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models for analysis of variance. Applied Statistics, 22, 392–9.
Offsets specified by
offset will not be included in predictions by
predict.lm, whereas those specified by an offset term in the formula will be.
summary.lm for summaries and
anova.lm for the ANOVA table;
aov for a different interface.
biglm in package biglm for an alternative way to fit linear models to large datasets (especially those with many cases).
require(graphics)

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1) # omitting intercept

anova(lm.D9)
summary(lm.D90)

opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1)      # Residuals, Fitted, ...
par(opar)

### less simple examples in "See Also" above
Documentation reproduced from R 3.0.2. License: GPL-2.