Compute a multivariate location and scale estimate with a high breakdown point: this can be thought of as estimating the mean and covariance of the "good" part of the data.
cov.mve and cov.mcd are compatibility wrappers for cov.rob with method = "mve" and method = "mcd" respectively.
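To illustrate what estimating from the "good" part of the data means, the sketch below compares the classical estimate with the MCD estimate on a sample that contains a few gross outliers. It assumes MASS is installed. The simulated data (95 bivariate normal points plus 5 points at (10, -10)) is made up purely for illustration.

```r
library(MASS)

set.seed(1)
# Illustrative data: 95 "good" points from a correlated bivariate normal ...
good <- mvrnorm(95, mu = c(0, 0), Sigma = matrix(c(1, 0.5, 0.5, 1), 2))
# ... plus 5 gross outliers, every row equal to (10, -10)
bad <- matrix(c(10, -10), nrow = 5, ncol = 2, byrow = TRUE)
x <- rbind(good, bad)

classical <- cov.rob(x, method = "classical")  # ordinary mean and covariance
robust    <- cov.rob(x, method = "mcd")        # high-breakdown estimate

# The classical mean is dragged towards (10, -10); the MCD centre stays near (0, 0)
classical$center
robust$center
```

With 5% contamination the classical centre moves by roughly 0.5 in each coordinate. The robust centre is driven by the bulk of the data instead.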
cov.rob(x, cor = FALSE, quantile.used = floor((n + p + 1)/2),
        method = c("mve", "mcd", "classical"), nsamp = "best", seed)
cov.mve(...)
cov.mcd(...)
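The following sketch uses the built-in stackloss data to check that cov.mcd behaves as a wrapper for cov.rob with method = "mcd". It also evaluates the default quantile.used. The sketch fixes the random-number state with base R's set.seed() before each call rather than passing the seed argument. That is a choice made for this example only.

```r
library(MASS)

x <- as.matrix(stackloss)   # 21 observations on 4 variables

# cov.mcd() forces method = "mcd", so from the same RNG state
# the two calls below should return the same estimates.
set.seed(123)
a <- cov.mcd(x)
set.seed(123)
b <- cov.rob(x, method = "mcd")

all.equal(a$center, b$center)
all.equal(a$cov, b$cov)

# Default quantile.used = floor((n + p + 1)/2): 13 for stackloss (n = 21, p = 4)
floor((nrow(x) + ncol(x) + 1) / 2)
```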
- x: a matrix or data frame.
- cor: should the returned result include a correlation matrix?
- quantile.used: the minimum number of the data points regarded as good points.
- method: the method to be used: minimum volume ellipsoid, minimum covariance determinant or classical product-moment. Using cov.mve or cov.mcd forces method "mve" or "mcd" respectively.
- nsamp: the number of samples, or "best", "exact" or "sample". If "sample", the number chosen is min(5*p, 3000), taken from Rousseeuw and Hubert (1997). If "best", exhaustive enumeration is done up to 5000 samples; if "exact", exhaustive enumeration will be attempted however many samples are needed.
- seed: the seed to be used for random sampling: see RNGkind. The current value of .Random.seed will be preserved if it is set.
- ...: arguments to cov.rob other than method.

For method "mve", an approximate search is made for a subset of size quantile.used with an enclosing ellipsoid of smallest volume; for method "mcd" it is the volume of the Gaussian confidence ellipsoid, equivalently the determinant of the classical covariance matrix, that is minimized. The mean of the subset provides a first estimate of the location, and the rescaled covariance matrix a first estimate of scatter. The Mahalanobis distances of all the points from the location estimate for this covariance matrix are calculated, and those points within the 97.5% point under Gaussian assumptions are declared to be "good". The final estimates are the mean and rescaled covariance of the good points.
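The reweighting step just described can be sketched in base R as follows. reweight_step is a hypothetical helper and not part of MASS. The sketch makes three simplifying assumptions:

- It uses the 97.5% point of the chi-squared distribution with p degrees of freedom as the Gaussian cutoff for squared distances.
- It omits the rescaling and Marazzi's finite-sample correction.
- It substitutes a deliberately crude initial estimate (the first 50 rows) for the MVE/MCD subset.

```r
# Hypothetical helper (not MASS code): one reweighting step WITHOUT the
# consistency rescaling or the finite-sample correction used by cov.rob.
reweight_step <- function(x, center, scatter, level = 0.975) {
  x <- as.matrix(x)
  p <- ncol(x)
  # squared Mahalanobis distances of all points from the initial location
  d2 <- mahalanobis(x, center, scatter)
  # points inside the chi-squared(p) cutoff are declared "good"
  good <- d2 <= qchisq(level, df = p)
  list(center = colMeans(x[good, , drop = FALSE]),
       cov    = cov(x[good, , drop = FALSE]),
       good   = good)
}

set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2),   # 100 clean points
           matrix(8, nrow = 5, ncol = 2))  # 5 outliers at (8, 8)
# Crude initial estimate from 50 clean rows, for illustration only
init <- list(center = colMeans(x[1:50, ]), cov = cov(x[1:50, ]))
rw <- reweight_step(x, init$center, init$cov)
sum(!rw$good)   # number of points not declared "good"
```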
The rescaling is by the appropriate percentile under Gaussian data; in addition, the first covariance matrix has an ad hoc finite-sample correction given by Marazzi.
For method "mve" the search is made over ellipsoids determined by the covariance matrix of p of the data points. For method "mcd" an additional improvement step suggested by Rousseeuw and van Driessen (1999) is used: once a subset of size quantile.used is selected, an ellipsoid based on its covariance is tested (as this will have no larger a determinant, and may be smaller).
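A minimal sketch of an improvement step of this kind, written in the style of Rousseeuw and van Driessen's concentration step, follows. c_step is a hypothetical helper and not MASS's internal code. It takes a subset of size h and fits the mean and covariance to it. It then ranks all points by Mahalanobis distance under that fit and keeps the h closest. The determinant of the new subset's covariance should be no larger than that of the old one.

```r
# Hypothetical helper (not MASS code): one concentration step.
c_step <- function(x, subset, h) {
  m  <- colMeans(x[subset, , drop = FALSE])
  S  <- cov(x[subset, , drop = FALSE])
  d2 <- mahalanobis(x, m, S)          # distances of ALL points under this fit
  order(d2)[seq_len(h)]               # indices of the h closest points
}

set.seed(2)
x <- matrix(rnorm(200), ncol = 2)
n <- nrow(x); p <- ncol(x)
h  <- floor((n + p + 1) / 2)          # same formula as the quantile.used default
H0 <- sample(n, h)                    # an arbitrary starting subset
H1 <- c_step(x, H0, h)
c(old = det(cov(x[H0, ])), new = det(cov(x[H1, ])))
```

Both subsets have the same size h, so the n - 1 versus n divisor in cov() rescales both determinants by the same factor. The comparison is therefore unaffected.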
A list with components
- the final estimate of location.
- the final estimate of scatter.
- (only if cor = TRUE) the estimate of the correlation matrix.
- message giving the number of singular samples out of the total.
- the value of the criterion on the log scale. For MCD this is the determinant, and for MVE it is proportional to the volume.
- the subset used. For MVE the best sample, for MCD the best set of size quantile.used.
- total number of observations.
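The short sketch below shows how to inspect the returned list. It assumes the components are named center, cov, cor, crit and best; those names do not appear in the component list above. It also uses cor = TRUE so that the correlation matrix is present.

```r
library(MASS)

set.seed(42)
# Assumed component names: center, cov, cor, crit, best
fit <- cov.rob(stackloss, method = "mve", cor = TRUE)

fit$center   # robust location, one entry per variable
fit$cov      # robust scatter matrix (4 x 4 for stackloss)
fit$cor      # correlation matrix, returned because cor = TRUE
fit$crit     # criterion value on the log scale
fit$best     # the subset used
```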
P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.
A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth and Brooks/Cole.
P. J. Rousseeuw and B. C. van Zomeren (1990) Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85, 633–639.
P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223.
P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics, ed. Y. Dodge, IMS Lecture Notes volume 31, pp. 201–214.
Documentation reproduced from package MASS, version 7.3-45. License: GPL-2 | GPL-3