read.dta {foreign}

Read Stata Binary Files
Package: foreign
Version: 0.8-59

Description

Reads a file in Stata version 5–12 binary format into a data frame. Frozen: will not support Stata formats after 12.

Usage

read.dta(file, convert.dates = TRUE, convert.factors = TRUE,
         missing.type = FALSE,
         convert.underscore = FALSE, warn.missing.labels = TRUE)

Arguments

file
a filename or URL as a character string.
convert.dates
Convert Stata dates to Date class, and date-times to POSIXct class?
convert.factors
Use Stata value labels to create factors? (Version 6.0 or later).
missing.type
For version 8 or later, store information about different types of missing data?
convert.underscore
Convert "_" in Stata variable names to "." in R names?
warn.missing.labels
Warn if a variable is specified with value labels and those value labels are not present in the file.
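
As a rough illustration of the convert.underscore argument, the sketch below writes a small data frame with write.dta and reads it back twice. The data frame toy and its column name body_mass are made up for this example and are not part of the package.

```r
library(foreign)

# Hypothetical toy data; the column name contains an underscore on purpose.
toy <- data.frame(id = 1:3, body_mass = c(61.2, 70.5, 58.9))
path <- tempfile(fileext = ".dta")
write.dta(toy, path)

# Default: Stata variable names are kept as they are.
kept <- names(read.dta(path))
print(kept)

# convert.underscore = TRUE maps "_" to "." in the R names.
dotted <- names(read.dta(path, convert.underscore = TRUE))
print(dotted)
```

The remaining arguments are left at their defaults here, so only the column names differ between the two reads.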

Details

If the filename appears to be a URL (of schemes http:, ftp: or https:) the URL is first downloaded to a temporary file and then read. (https: is only supported on some platforms.)

The variables in the Stata data set become the columns of the data frame. Missing values are correctly handled. The data label, variable labels, timestamp, and variable/dataset characteristics are stored as attributes of the data frame.
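
One way to see the stored attributes is to round-trip a data set through write.dta and inspect the result. This is only a sketch: it uses the swiss data set that ships with R, and the temporary file path is an arbitrary choice.

```r
library(foreign)

path <- tempfile(fileext = ".dta")
write.dta(swiss, path)   # swiss comes from R's datasets package
d <- read.dta(path)

# Which attributes did read.dta attach to the data frame?
print(names(attributes(d)))

# File format version and per-variable display formats.
print(attr(d, "version"))
print(attr(d, "formats"))
```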

By default Stata dates (%d and %td formats) are converted to R's Date class, and variables with Stata value labels are converted to factors. Ordinarily, read.dta will not convert a variable to a factor unless a label is present for every level. Use convert.factors = NA to override this. In any case the value label and format information is stored as attributes on the returned data frame. Stata's date formats are sketchily documented: if necessary use convert.dates = FALSE and examine the attributes to work out how to post-process the dates.

Stata 8 introduced a system of 27 different missing data values. If missing.type is TRUE a separate list is created with the same variable names as the loaded data. For string variables the list value is NULL. For other variables the value is NA where the observation is not missing and 0–26 when the observation is missing. This list is attached as the "missing" attribute of the returned value.
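
The behaviour described above can be sketched with a small round trip. The data frame toy is hypothetical, the file is written as version 8 so that missing.type applies, and the date step assumes a variable that counts days since 1960-01-01 (the usual origin for Stata %td dates), which is an assumption not stated on this page.

```r
library(foreign)

# Hypothetical toy data: a factor and a numeric column with one NA.
toy <- data.frame(grp = factor(c("a", "b", "a")), x = c(1.5, NA, 3))
path <- tempfile(fileext = ".dta")
write.dta(toy, path, version = 8L)  # missing.type needs a version 8+ file

# Value labels become factor levels by default ...
labelled <- read.dta(path)
# ... or the underlying codes are kept with convert.factors = FALSE.
coded <- read.dta(path, convert.factors = FALSE)
print(class(labelled$grp))
print(class(coded$grp))

# missing.type = TRUE attaches a "missing" attribute: NA where the
# observation is present, a code from 0 to 26 where it is missing.
typed <- read.dta(path, missing.type = TRUE)
miss_x <- attr(typed, "missing")$x
print(miss_x)

# Manual date post-processing, ASSUMING a day count from 1960-01-01.
raw_days <- as.numeric(as.Date("2010-01-01") - as.Date("1960-01-01"))
restored <- as.Date(raw_days, origin = "1960-01-01")
print(restored)
```

For real files, check attr(x, "formats") first: the day-count assumption only fits daily date formats.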

The default file format for Stata 13, format-117, is substantially different from those for Stata 5–12.

Value

A data frame with attributes. These will include "datalabel", "time.stamp", "formats", "types", "val.labels", "var.labels" and "version" and may include "label.table" and "expansion.table". Possible versions are 5, 6, 7, -7 (Stata 7SE, ‘format-111’), 8 (Stata 8 and 9, ‘format-113’), 10 (Stata 10 and 11, ‘format-114’), and 12 (Stata 12, ‘format-115’).

The value labels in attribute "val.labels" name a table for each variable, or are an empty string. The tables are elements of the named list attribute "label.table": each is an integer vector with names.
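
To make the relationship between "val.labels" and "label.table" concrete, the following sketch rebuilds a factor by hand from a file read with convert.factors = FALSE. The data frame toy and the column name grp are invented for illustration.

```r
library(foreign)

# Hypothetical toy data with a single labelled variable.
toy <- data.frame(grp = factor(c("a", "b", "a")))
path <- tempfile(fileext = ".dta")
write.dta(toy, path)
raw <- read.dta(path, convert.factors = FALSE)

# Name of the label table used by column "grp" ("" would mean none).
col <- match("grp", names(raw))
tab_name <- attr(raw, "val.labels")[col]

# The table itself: an integer vector whose names are the labels.
tab <- attr(raw, "label.table")[[tab_name]]
print(tab)

# Rebuild the factor by hand from the codes and their labels.
rebuilt <- factor(raw$grp, levels = tab, labels = names(tab))
print(rebuilt)
```

This manual lookup is mainly useful when only some codes carry labels and the automatic conversion was skipped.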

References

The Stata Users Manual (versions 5 & 6), the Programming manual (version 7), or the online help (version 8 and later) describe the format of the files. The format descriptions are also available directly at http://www.stata.com/help.cgi?dta_114 and http://www.stata.com/help.cgi?dta_113, but note that these have been changed since first published.

See Also

A different approach is available in package memisc: see its help for Stata.file. At the time of writing it does not support files from Stata 12 or later.

write.dta, attributes, Date, factor

Examples

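library(foreign)
data(swiss)
# write swiss to a temporary Stata file, then read it back
swissfile <- tempfile(fileext = ".dta")
write.dta(swiss, swissfile)
read.dta(swissfile)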

Author(s)

Thomas Lumley and R-core members: support for value labels by Brian Quistorff.

Documentation reproduced from package foreign, version 0.8-59. License: GPL (>= 2)