Variable Labeling in MakeTables
This notebook explains how variable labels are handled in the maketables package and demonstrates the available helper functions for working with labeled data.
Overview
Variable labels provide human-readable descriptions for your variables, making tables more interpretable. The maketables package supports:
- Storing labels in DataFrame attributes
- Applying custom labels via dictionaries
- Default labels for common variable names
- Reading and writing DataFrames with variable labels (from and to Stata .dta files)
Setup
import pandas as pd
import numpy as np
import maketables as mt
from maketables.importdta import get_var_labels, import_dta
How Labels Work
Label Priority
When creating tables, maketables looks for variable labels in this order:
- User-provided labels (via the labels parameter)
- DataFrame attributes (stored in df.attrs['variable_labels'])
- Default labels (for common variables like 'age', 'wage', etc.)
- Variable name (as fallback)
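The lookup order above can be sketched in plain Python. This is only an illustration of the stated priority, not maketables' actual implementation; the helper name resolve_label and the example dictionaries are hypothetical.

```python
def resolve_label(var, user_labels=None, df_labels=None, default_labels=None):
    """Return the first label found for `var`, checking sources in priority order."""
    # Hypothetical helper: user labels first, then DataFrame attrs, then defaults.
    for source in (user_labels, df_labels, default_labels):
        if source and var in source:
            return source[var]
    # Fallback: no label found anywhere, so use the variable name itself.
    return var


# Illustrative label sources (made up for this sketch)
user_labels = {"wage": "Annual Salary ($)"}
df_labels = {"wage": "Wage", "tenure": "Years at Company"}
default_labels = {"age": "Age", "wage": "Hourly wage"}

resolved = {
    var: resolve_label(var, user_labels, df_labels, default_labels)
    for var in ["wage", "tenure", "age", "education"]
}
print(resolved)
```

Here 'wage' takes the user-provided label even though the other two sources also define it, while 'education' has no label in any source and falls back to its name.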
Example: Creating Sample Data
# Create sample data
np.random.seed(42)
df = pd.DataFrame({
    'age': np.random.randint(20, 65, 100),
    'wage': np.random.normal(50000, 15000, 100),
    'tenure': np.random.randint(0, 20, 100),
    'education': np.random.choice(['HS', 'College', 'Graduate'], 100)
})
Method 1: Custom Labels via Dictionary
You can provide custom labels using the labels parameter:
# Custom labels
custom_labels = {
    'age': 'Age (years)',
    'wage': 'Annual Salary ($)',
    'tenure': 'Years at Company',
}
mt.DTable(
    df,
    vars=['age', 'wage', 'tenure'],
    stats=['mean', 'std'],
    labels=custom_labels,
    caption="Table with Custom Labels"
)
Table with Custom Labels
| | Mean | Std. Dev. |
|---|---|---|
| Age (years) | 41.60 | 13.31 |
| Annual Salary ($) | 50,662 | 14,769 |
| Years at Company | 8.95 | 6.45 |
Method 2: Labels in DataFrame Attributes
You can store labels directly in the DataFrame's attributes. Once set, they are used automatically whenever a table is built from this DataFrame.
# Store labels in DataFrame attributes
df.attrs['variable_labels'] = {
    'age': 'Age (years)',
    'wage': 'Annual Salary ($)',
    'tenure': 'Years at Company',
    'education': 'Education Level'
}
# Now labels are automatically used
mt.DTable(
    df,
    vars=['age', 'wage', 'tenure'],
    stats=['mean', 'std'],
    caption="Table Using DataFrame Attributes"
)
Table Using DataFrame Attributes
| | Mean | Std. Dev. |
|---|---|---|
| Age (years) | 41.60 | 13.31 |
| Annual Salary ($) | 50,662 | 14,769 |
| Years at Company | 8.95 | 6.45 |
Method 3: Setting default labels
You can also specify default labels that apply across DataFrames by setting mt.MTable.DEFAULT_LABELS:
mt.MTable.DEFAULT_LABELS = {
    'age': 'Age (years)',
    'wage': 'Annual Salary ($)',
    'tenure': 'Years at Company',
    'education': 'Education Level'
}
No Labels
If you prefer to display the variable names instead of labels, pass an empty dictionary to the labels parameter: labels={}.
mt.DTable(
    df,
    vars=['age', 'wage', 'tenure'],
    stats=['mean', 'std'],
    labels={},  # Empty dict to override default labels
    caption="Table without labels"
)
Table without labels
| | Mean | Std. Dev. |
|---|---|---|
| age | 41.60 | 13.31 |
| wage | 50,662 | 14,769 |
| tenure | 8.95 | 6.45 |
Working with .dta Files
Importing Data with Labels
pandas DataFrames are often stored as .csv files, which cannot hold variable labels, yet it is often convenient to keep labels together with the data. For this purpose maketables offers two wrapper functions, import_dta() and export_dta(), which read and write Stata .dta files, a format that does store variable labels. Both are thin wrappers around pandas' Stata I/O (StataReader) that carry the variable label information along.
import_dta() reads a DataFrame from a Stata .dta file, imports its variable labels, and stores them in the DataFrame attributes so that they are used by default when creating tables. Note that the function uses StataReader's functionality to convert variables with value labels (i.e. categorical variables) to pandas Categorical data types. export_dta() writes a DataFrame to a Stata .dta file and exports the variable labels stored in the DataFrame attributes.
df = mt.import_dta("https://www.stata-press.com/data/r18/auto.dta")
# Create descriptive statistics table
mt.DTable(df, vars=["mpg","weight","length"], bycol=["foreign"])
| | Domestic | | | Foreign | | |
|---|---|---|---|---|---|---|
| | N | Mean | Std. Dev. | N | Mean | Std. Dev. |
| Mileage (mpg) | 52.00 | 19.83 | 4.74 | 22.00 | 24.77 | 6.61 |
| Weight (lbs.) | 52.00 | 3,317.12 | 695.36 | 22.00 | 2,315.91 | 433.00 |
| Length (in.) | 52.00 | 196.13 | 20.05 | 22.00 | 168.55 | 13.68 |
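To see why the .dta format is useful for labeled data, the following sketch uses only pandas (not maketables) to compare a CSV round trip with a .dta round trip. It is a rough illustration of the kind of work such a wrapper might do, not the package's actual code; the small DataFrame, the file name sample.dta, and the variable names are made up for the example.

```python
import io
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

# Illustrative data and labels (assumptions for this sketch)
labels = {"age": "Age (years)", "wage": "Annual Salary ($)"}
df_small = pd.DataFrame({"age": [25, 40, 58], "wage": [31000.0, 52000.0, 64000.0]})
df_small.attrs["variable_labels"] = labels

# CSV round trip: the values survive, but the labels in df.attrs are not written.
buffer = io.StringIO()
df_small.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)
csv_labels = from_csv.attrs.get("variable_labels", {})

# .dta round trip: pandas writes the labels into the file and can read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dta")
    df_small.to_stata(path, write_index=False, variable_labels=labels)
    with StataReader(path) as reader:
        from_dta = reader.read()
        dta_labels = reader.variable_labels()

# Store the recovered labels where the tables in this notebook look for them.
from_dta.attrs["variable_labels"] = dta_labels

print("Labels after CSV round trip:", csv_labels)
print("Labels after .dta round trip:", dta_labels)
```

The CSV copy comes back without any labels, while the .dta copy returns the same label dictionary that was written.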
Extracting Labels from DataFrames
Use get_var_labels() to retrieve labels from a DataFrame:
# Get labels (merges DataFrame attrs with default labels)
labels = get_var_labels(df)
print("Extracted labels:")
for var, label in labels.items():
    print(f" {var}: {label}")
Extracted labels:
make: Make and model
price: Price
mpg: Mileage (mpg)
rep78: Repair record 1978
headroom: Headroom (in.)
trunk: Trunk space (cu. ft.)
weight: Weight (lbs.)
length: Length (in.)
turn: Turn circle (ft.)
displacement: Displacement (cu. in.)
gear_ratio: Gear ratio
foreign: Car origin
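Since the labels are an ordinary dictionary, imported labels can be adjusted before building a table. The sketch below uses pandas only and a made-up two-row DataFrame (df_cars) standing in for imported data; it copies the stored labels, changes one entry, and writes the result back to df.attrs['variable_labels'].

```python
import pandas as pd

# Stand-in for an imported DataFrame with labels (illustrative values)
df_cars = pd.DataFrame({"mpg": [22, 17], "weight": [2930, 3350]})
df_cars.attrs["variable_labels"] = {
    "mpg": "Mileage (mpg)",
    "weight": "Weight (lbs.)",
}

# Copy the stored labels so the original dictionary is not modified in place
labels = dict(df_cars.attrs.get("variable_labels", {}))
labels["mpg"] = "Fuel economy (mpg)"  # override a single label

# Write the adjusted labels back to the DataFrame attributes
df_cars.attrs["variable_labels"] = labels
print(df_cars.attrs["variable_labels"])
```

Alternatively, the adjusted dictionary could be passed through the labels parameter for a single table, leaving the stored labels untouched.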