Skip to Content



Detection of outliers and influential errors using a latent variable model.


A collection of methods for automated data cleaning where all actions are logged.


Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like format). Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rules dependencies can be visualized with using the igraph package.


Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point for Incomplete Data


This package performs univariate stratification of survey populations. The main function implements a generalization of the Lavallee-Hidiroglou method of stratum construction. The generalized method takes into account a discrepancy between the stratification variable and the survey variable. The determination of the optimal boundaries also incorporate, if desired, an anticipated non-response, a take-all stratum for large units, a take-none stratum for small units, and a certainty stratum to ensure that some specific units are in the sample. The well known cumulative root frequency rule of Dalenius and Hodges and the geometric rule of Gunning and Horgan are also implemented.


Sampling procedures from the book 'Stichproben. Methoden und praktische Umsetzung mit R' by Goeran Kauermann and Helmut Kuechenhoff (2010)


Imputation of incomplete continuous or categorical datasets; Missing values are imputed with a principal component analysis (PCA), a multiple correspondence analysis (MCA) model or a multiple factor analysis (MFA) model; Perform multiple imputation with and in PCA


Estimation of indicators on social exclusion and poverty, as well as Pareto tail modeling for empirical income distributions.


Provides functions for linking and deduplicating data sets. Methods based on a stochastical approach are implemented as well as classification algorithms from the machine learning domain.


Simulate populations for surveys based on sample data with special application to EU-SILC.