importdta.import_dta

importdta.import_dta(
    path,
    *,
    convert_categoricals=True,
    store_in_attrs=True,
    update_mtable_defaults=False,
    override=False,
    return_labels=False,
)

Import a Stata .dta into a pandas DataFrame.

Behavior

  • Preserves Stata value labels by reading labeled variables as pandas.Categorical when convert_categoricals=True.
  • Extracts variable (column) labels from the file.
  • Stores variable labels in df.attrs['variable_labels'] by default (store_in_attrs=True).
  • Optionally merges labels into MTable.DEFAULT_LABELS for package-wide defaults (update_mtable_defaults=True).
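
The behavior described above can be approximated with plain pandas. The following is a minimal sketch, not the package's actual implementation: read_dta_with_labels is a hypothetical helper, and the demo file and its labels are made up so that the example runs without external data.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader


def read_dta_with_labels(path, convert_categoricals=True):
    """Hypothetical helper: return (df, variable_labels) for a .dta file."""
    with StataReader(path) as reader:
        # Value labels become pandas.Categorical when convert_categoricals=True.
        df = reader.read(convert_categoricals=convert_categoricals)
        # Variable (column) labels come back as {column_name: label}.
        labels = reader.variable_labels()
    # Mirror the documented default of keeping labels on the DataFrame.
    df.attrs["variable_labels"] = dict(labels)
    return df, labels


# Build a small .dta file (illustrative data only).
demo = pd.DataFrame(
    {
        "price": [4099, 4749],
        "foreign": pd.Categorical(["Domestic", "Foreign"]),
    }
)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dta")
demo.to_stata(
    demo_path,
    write_index=False,
    variable_labels={"price": "Price", "foreign": "Car origin"},
)

df, labels = read_dta_with_labels(demo_path)
```

Here df["foreign"] is read back as a categorical column, and the labels are available both as the returned dict and under df.attrs["variable_labels"].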

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| path | str or os.PathLike | Local filesystem path to a .dta file. | required |
| convert_categoricals | bool | Convert Stata value labels to pandas.Categorical (recommended to preserve value labels). | True |
| store_in_attrs | bool | If True, save extracted variable labels under df.attrs['variable_labels']. | True |
| update_mtable_defaults | bool | If True, merge extracted labels into MTable.DEFAULT_LABELS. | False |
| override | bool | Controls merging into MTable.DEFAULT_LABELS when update_mtable_defaults=True. True: overwrite existing keys with new labels. False: only fill keys that are missing. | False |
| return_labels | bool | If True, return the labels dict in addition to the DataFrame. | False |
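
The effect of override on the merge can be illustrated with ordinary dicts. This is a sketch under the assumption that MTable.DEFAULT_LABELS behaves like a plain dict; merge_labels is a hypothetical helper, not part of the package.

```python
def merge_labels(defaults, new_labels, override=False):
    """Hypothetical helper illustrating the documented merge rule.

    Mutates and returns `defaults`.
    """
    for name, label in new_labels.items():
        # override=True always writes; override=False only fills missing keys.
        if override or name not in defaults:
            defaults[name] = label
    return defaults


existing = {"price": "Sale price"}                      # stand-in for DEFAULT_LABELS
extracted = {"price": "Price", "mpg": "Mileage (mpg)"}  # labels read from a file

filled = merge_labels(dict(existing), extracted, override=False)
replaced = merge_labels(dict(existing), extracted, override=True)
```

With override=False the existing "price" label is kept and only "mpg" is added; with override=True the "price" label is replaced as well.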

Returns

| Name | Type | Description |
|---|---|---|
| | DataFrame or (DataFrame, dict) | If return_labels=False: the DataFrame. If return_labels=True: the tuple (df, labels), where labels is {column_name: label}. |

Notes

  • Text encoding is handled by pandas, which determines it from the .dta file header.
  • The function is written to work across pandas versions: if the StataReader constructor does not accept convert_categoricals, the function falls back to applying the option at read time.
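
One way such a version fallback could look is sketched below. This is an assumption about the approach, not the package's actual code: read_with_fallback is a hypothetical helper that tries the constructor keyword first and, on a TypeError, passes the option to read() instead.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader


def read_with_fallback(path, convert_categoricals=True):
    """Hypothetical helper: prefer the constructor keyword, else use read()."""
    try:
        reader = StataReader(path, convert_categoricals=convert_categoricals)
        read_kwargs = {}
    except TypeError:
        # Constructor rejected the keyword: apply the option at read time.
        reader = StataReader(path)
        read_kwargs = {"convert_categoricals": convert_categoricals}
    with reader:
        return reader.read(**read_kwargs)


# Small illustrative file with one labeled (categorical) variable.
sample = pd.DataFrame({"foreign": pd.Categorical(["Domestic", "Foreign"])})
sample_path = os.path.join(tempfile.mkdtemp(), "sample.dta")
sample.to_stata(sample_path, write_index=False)

as_categories = read_with_fallback(sample_path, convert_categoricals=True)
as_codes = read_with_fallback(sample_path, convert_categoricals=False)
```

With convert_categoricals=False the column holds the raw integer codes rather than a categorical.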

Examples

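>>> from importdta import import_dta
>>> # Read the file; variable labels are stored in df.attrs by default.
>>> df = import_dta("data/auto.dta")
>>> df.attrs["variable_labels"]["price"]
>>> # Also merge the labels into MTable.DEFAULT_LABELS and return them.
>>> df, labels = import_dta("data/auto.dta", update_mtable_defaults=True, return_labels=True)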