# princomp {stats}

### Description

`princomp` performs a principal components analysis on the given numeric data matrix and returns the results as an object of class `princomp`.

### Usage

    princomp(x, ...)

    ## S3 method for class 'formula':
    princomp(formula, data = NULL, subset, na.action, ...)

    ## S3 method for class 'default':
    princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
             subset = rep(TRUE, nrow(as.matrix(x))), ...)

    ## S3 method for class 'princomp':
    predict(object, newdata, ...)

### Arguments

- formula
- a formula with no response variable, referring only to numeric variables.
- data
- an optional data frame (or similar: see `model.frame`) containing the variables in the formula `formula`. By default the variables are taken from `environment(formula)`.
- subset
- an optional vector used to select rows (observations) of the data matrix `x`.
- na.action
- a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`.
- x
- a numeric matrix or data frame which provides the data for the principal components analysis.
- cor
- a logical value indicating whether the calculation should use the correlation matrix or the covariance matrix. (The correlation matrix can only be used if there are no constant variables.)
- scores
- a logical value indicating whether the score on each principal component should be calculated.
- covmat
- a covariance matrix, or a covariance list as returned by `cov.wt` (and `cov.mve` or `cov.mcd` from package MASS). If supplied, this is used rather than the covariance matrix of `x`.
- ...
- arguments passed to or from other methods. If `x` is a formula one might specify `cor` or `scores`.
- object
- an object of class inheriting from `"princomp"`.
- newdata
- an optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, `newdata` must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
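The `newdata` rules for `predict` can be illustrated with a small sketch. The split of `USArrests` into the first 40 and last 10 rows, and the names `train`, `test`, `pc` and `new_scores`, are assumptions made purely for illustration.

```r
# Fit on the first 40 states, then score the remaining 10 (illustrative split).
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]

pc <- princomp(train, cor = TRUE)

# The fit used a data frame with column names, so newdata must
# contain columns with the same names.
new_scores <- predict(pc, newdata = test)
dim(new_scores)

# With newdata omitted, predict() returns the stored scores.
stored <- predict(pc)
```

The new observations are centred and scaled with the values stored in the fit, so they are expressed on the same component axes as the training scores.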

### Details

`princomp` is a generic function with `"formula"` and `"default"` methods.

The calculation is done using `eigen` on the correlation or covariance matrix, as determined by `cor`. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use `svd` on `x`, as is done in `prcomp`.
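The eigen-decomposition route described here can be reproduced by hand. The following is a minimal sketch for the covariance case (`cor = FALSE`), not the actual source of `princomp`; the names `X`, `S`, `e` and `pc` are illustrative.

```r
X <- as.matrix(USArrests)
n <- nrow(X)

# cov() divides by N - 1; rescale to the divisor N that princomp uses.
S <- cov(X) * (n - 1) / n

e  <- eigen(S, symmetric = TRUE)
pc <- princomp(USArrests)

# Component standard deviations are the square roots of the eigenvalues.
sdev_ok <- all.equal(unname(pc$sdev), sqrt(e$values))

# Loadings agree with the eigenvectors up to the sign of each column.
load_ok <- all.equal(abs(unclass(pc$loadings)), abs(e$vectors),
                     check.attributes = FALSE)
```

The comparison uses absolute values because the sign of each eigenvector is arbitrary.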

Note that the default calculation uses divisor `N` for the covariance matrix, rather than the `N - 1` used by `cov`.
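The effect of the divisor is easiest to see by comparing with `prcomp`, which divides by `N - 1`. A short sketch (variable names are illustrative):

```r
n <- nrow(USArrests)

sd_princomp <- unname(princomp(USArrests)$sdev)  # divisor N
sd_prcomp   <- prcomp(USArrests)$sdev            # divisor N - 1

# The two sets of standard deviations differ by sqrt((N - 1) / N).
same <- all.equal(sd_princomp, sd_prcomp * sqrt((n - 1) / n))
```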

The `print` method for these objects prints the results in a nice format and the `plot` method produces a scree plot (`screeplot`). There is also a `biplot` method.

If `x` is a formula then the standard NA-handling is applied to the scores (if requested): see `napredict`.

`princomp` only handles so-called R-mode PCA, that is, feature extraction of variables. If a data matrix is supplied (possibly via a formula) it is required that there are at least as many units as variables. For Q-mode PCA use `prcomp`.
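As a rough illustration of this restriction, the sketch below builds a random matrix with fewer units than variables (the matrix `wide` and its dimensions are made up for the example) and passes it to both functions.

```r
set.seed(1)
wide <- matrix(rnorm(5 * 8), nrow = 5, ncol = 8)  # 5 units, 8 variables

# princomp() stops with an error; capture the message instead of failing.
res <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))
res

# prcomp() works from the SVD of the data and accepts the same matrix.
pw <- prcomp(wide)
length(pw$sdev)
```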

### Values

`princomp` returns a list with class `"princomp"` containing the following components:

- sdev
- the standard deviations of the principal components.
- loadings
- the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class `"loadings"`: see `loadings` for its `print` method.
- center
- the means that were subtracted.
- scale
- the scalings applied to each variable.
- n.obs
- the number of observations.
- scores
- if `scores = TRUE`, the scores of the supplied data on the principal components. These are non-null only if `x` was supplied, and if `covmat` was also supplied if it was a covariance list. For the formula method, `napredict()` is applied to handle the treatment of values omitted by the `na.action`.
- call
- the matched call.
- na.action
- If relevant.
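The components listed above fit together in a simple way. The following sketch (names `pc`, `Z`, `manual` and `prop_var` are illustrative) rebuilds the scores from `center`, `scale` and `loadings`, and derives the proportion of variance from `sdev`.

```r
pc <- princomp(USArrests, cor = TRUE)

names(pc)  # lists the components of the returned object

# Scores are the centred and scaled data projected onto the loadings.
Z      <- scale(USArrests, center = pc$center, scale = pc$scale)
manual <- Z %*% pc$loadings
scores_ok <- all.equal(manual, pc$scores, check.attributes = FALSE)

# Proportion of total variance carried by each component.
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
```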

### References

Mardia, K. V., J. T. Kent and J. M. Bibby (1979). *Multivariate Analysis*, London: Academic Press.

Venables, W. N. and B. D. Ripley (2002). *Modern Applied Statistics with S*, Springer-Verlag.

### Note

The signs of the columns of the loadings and scores are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
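One way to see why the signs carry no information is to flip a component in both the loadings and the scores and check that the reconstructed data are unchanged. This is a small sketch with illustrative names, not part of the `princomp` interface.

```r
pc <- princomp(USArrests, cor = TRUE)
L  <- unclass(pc$loadings)
S  <- pc$scores

# Flip the sign of the first component in both loadings and scores.
L2 <- L; L2[, 1] <- -L2[, 1]
S2 <- S; S2[, 1] <- -S2[, 1]

# Both versions reconstruct the same standardised data matrix.
recon_same <- all.equal(S %*% t(L), S2 %*% t(L2))
```

When comparing results across programs or builds, it is therefore safer to compare components up to sign, for example via absolute values.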

### Examples

    require(graphics)

    ## The variances of the variables in the
    ## USArrests data vary by orders of magnitude, so scaling is appropriate
    (pc.cr <- princomp(USArrests))  # inappropriate
    princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
    ## Similar, but different:
    ## The standard deviations differ by a factor of sqrt(49/50)

    summary(pc.cr <- princomp(USArrests, cor = TRUE))
    loadings(pc.cr)  ## note that blank entries are small but not zero
    plot(pc.cr) # shows a screeplot.
    biplot(pc.cr)

    ## Formula interface
    princomp(~ ., data = USArrests, cor = TRUE)

    ## NA-handling
    USArrests[1, 2] <- NA
    pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                      data = USArrests, na.action = na.exclude, cor = TRUE)
    pc.cr$scores[1:5, ]

    ## (Simple) Robust PCA:
    ## Classical:
    (pc.cl  <- princomp(stackloss))
    ## Robust:
    (pc.rob <- princomp(stackloss, covmat = MASS::cov.rob(stackloss)))

Documentation reproduced from R 3.0.2. License: GPL-2.