Variable Labeling in MakeTables

This notebook explains how variable labels are handled in the maketables package and demonstrates the available helper functions for working with labeled data.

Overview

Variable labels provide human-readable descriptions for your variables, making tables more interpretable. The maketables package supports:

  • Storing labels in DataFrame attributes
  • Applying custom labels via dictionaries
  • Default labels for common variable names
  • Reading and writing DataFrames with variable labels (from and to Stata .dta files)

Setup

import pandas as pd
import numpy as np
import maketables as mt
from maketables.importdta import get_var_labels, import_dta

How Labels Work

Label Priority

When creating tables, maketables looks for variable labels in this order:

  1. User-provided labels (via labels parameter)
  2. DataFrame attributes (stored in df.attrs['variable_labels'])
  3. Default labels (for common variables like ‘age’, ‘wage’, etc.)
  4. Variable name (as fallback)
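The lookup order above can be sketched in plain Python. This is only one reading of the list, not the package's implementation: resolve_label and its parameters are hypothetical names, and the special case for an empty dictionary mirrors the behaviour described in the "No Labels" section.

```python
def resolve_label(var, user_labels=None, df_labels=None, default_labels=None):
    """Return the display label for `var` (hypothetical helper, not maketables API)."""
    # An explicitly passed empty dict is treated as "show raw variable names".
    if user_labels is not None and len(user_labels) == 0:
        return var
    # Walk the sources from highest to lowest priority.
    for source in (user_labels, df_labels, default_labels):
        if source and var in source:
            return source[var]
    # Nothing matched: fall back to the variable name itself.
    return var


user = {"age": "Age (years)"}
attrs = {"age": "Age", "wage": "Annual Salary ($)"}
defaults = {"tenure": "Years at Company"}

for name in ["age", "wage", "tenure", "education"]:
    print(name, "->", resolve_label(name, user, attrs, defaults))
```

Each variable is resolved independently here, so a user dictionary that covers only some variables still lets the remaining ones fall through to the lower-priority sources. Whether maketables resolves labels per variable in exactly this way is an assumption.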

Example: Creating Sample Data

# Create sample data
np.random.seed(42)
df = pd.DataFrame({
    'age': np.random.randint(20, 65, 100),
    'wage': np.random.normal(50000, 15000, 100),
    'tenure': np.random.randint(0, 20, 100),
    'education': np.random.choice(['HS', 'College', 'Graduate'], 100)
})

Method 1: Custom Labels via Dictionary

You can provide custom labels using the labels parameter:

# Custom labels
custom_labels = {
    'age': 'Age (years)',
    'wage': 'Annual Salary ($)',
    'tenure': 'Years at Company',
}

mt.DTable(
    df,
    vars=['age', 'wage', 'tenure'],
    stats=['mean', 'std'],
    labels=custom_labels,
    caption="Table with Custom Labels"
)
Table with Custom Labels
                   Mean    Std. Dev.
Age (years)        41.60   13.31
Annual Salary ($)  50,662  14,769
Years at Company   8.95    6.45

Method 2: Labels in DataFrame Attributes

You can store labels directly in the DataFrame’s attributes. Once set, these labels are used automatically by any table built from this DataFrame.

# Store labels in DataFrame attributes
df.attrs['variable_labels'] = {
    'age': 'Age (years)',
    'wage': 'Annual Salary ($)',
    'tenure': 'Years at Company',
    'education': 'Education Level'
}

# Now labels are automatically used
mt.DTable(
    df,
    vars=['age', 'wage', 'tenure'],
    stats=['mean', 'std'],
    caption="Table Using DataFrame Attributes"
)
Table Using DataFrame Attributes
                   Mean    Std. Dev.
Age (years)        41.60   13.31
Annual Salary ($)  50,662  14,769
Years at Company   8.95    6.45
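
Because df.attrs['variable_labels'] is an ordinary dictionary, assigning a new dictionary to it replaces whatever labels were stored before. A minimal sketch of extending the stored labels instead, using a hypothetical helper add_labels (not part of maketables) and a small made-up DataFrame:

```python
import pandas as pd


def add_labels(df, new_labels):
    """Return a copy of `df` whose stored labels are extended by `new_labels`.

    Hypothetical helper for illustration; not part of maketables.
    """
    out = df.copy()
    existing = df.attrs.get("variable_labels", {})
    # Build a fresh dict so the original DataFrame's labels stay untouched;
    # entries in new_labels win when a key appears in both.
    out.attrs["variable_labels"] = {**existing, **new_labels}
    return out


people = pd.DataFrame({"age": [25, 40], "wage": [42000.0, 58000.0]})
people.attrs["variable_labels"] = {"age": "Age (years)"}

labeled = add_labels(people, {"wage": "Annual Salary ($)"})
print(labeled.attrs["variable_labels"])
```

Returning a copy keeps the original DataFrame unchanged, which is convenient in notebooks where cells may be re-run in any order.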

Method 3: Setting Default Labels

You can also specify default labels that apply across DataFrames by setting MTable.DEFAULT_LABELS:

mt.MTable.DEFAULT_LABELS = {
    'age': 'Age (years)',
    'wage': 'Annual Salary ($)',
    'tenure': 'Years at Company',
    'education': 'Education Level'
}

No Labels

If you do not want to display labels but rather keep the variable names, just pass an empty dictionary to the labels parameter: labels={}.

mt.DTable(
    df,
    vars=['age', 'wage', 'tenure'],
    stats=['mean', 'std'],
    labels={},  # Empty dict to override default labels
    caption="Table without labels"
)
Table without labels
        Mean    Std. Dev.
age     41.60   13.31
wage    50,662  14,769
tenure  8.95    6.45

Working with .dta Files

Importing Data with Labels

pandas DataFrames are often stored as .csv files, which do not carry variable labels. It is often convenient to keep the labels with the data, so maketables offers two wrapper functions, import_dta() and export_dta(), that read and write Stata .dta files, a format that does store variable labels. Both are thin wrappers around pandas' Stata I/O (such as StataReader) that preserve the variable label information.

import_dta() reads a DataFrame from a Stata .dta file, imports its variable labels, and stores them in the DataFrame attributes so that they are used by default when creating tables. Note that it relies on StataReader to convert variables with value labels (i.e. categorical variables) to the pandas Categorical data type. export_dta() writes a DataFrame to a Stata .dta file and exports the variable labels stored in the DataFrame attributes.

df = mt.import_dta("https://www.stata-press.com/data/r18/auto.dta")

# Create descriptive statistics table
mt.DTable(df, vars=["mpg","weight","length"], bycol=["foreign"])
               Domestic                    Foreign
               N      Mean      Std. Dev.  N      Mean      Std. Dev.
Mileage (mpg)  52.00  19.83     4.74       22.00  24.77     6.61
Weight (lbs.)  52.00  3,317.12  695.36     22.00  2,315.91  433.00
Length (in.)   52.00  196.13    20.05      22.00  168.55    13.68
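
To show what a label-preserving round trip involves, here is a sketch that uses pandas alone. It is not the maketables implementation: write_with_labels and read_with_labels are hypothetical names, the data is made up, and the sketch assumes labels live in df.attrs['variable_labels'] as described earlier. It relies on the variable_labels argument of DataFrame.to_stata and on StataReader.variable_labels().

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader


def write_with_labels(df, path):
    """Write `df` to a .dta file together with its stored variable labels."""
    labels = df.attrs.get("variable_labels", {})
    # Only pass labels for columns that are actually present.
    labels = {col: lab for col, lab in labels.items() if col in df.columns}
    df.to_stata(path, write_index=False, variable_labels=labels)


def read_with_labels(path):
    """Read a .dta file and store its variable labels in df.attrs."""
    with StataReader(path) as reader:
        df = reader.read()
        df.attrs["variable_labels"] = reader.variable_labels()
    return df


original = pd.DataFrame({"age": [25, 40, 31], "wage": [42000.0, 58000.0, 51000.0]})
original.attrs["variable_labels"] = {"age": "Age (years)", "wage": "Annual Salary ($)"}

with tempfile.TemporaryDirectory() as tmp:
    dta_path = os.path.join(tmp, "example.dta")
    write_with_labels(original, dta_path)
    round_trip = read_with_labels(dta_path)

print(round_trip.attrs["variable_labels"])
```

Stata restricts variable labels to 80 characters, so longer labels would need to be shortened before writing.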

Extracting Labels from DataFrames

Use get_var_labels() to retrieve labels from a DataFrame:

# Get labels (merges DataFrame attrs with default labels)
labels = get_var_labels(df)

print("Extracted labels:")
for var, label in labels.items():
    print(f"  {var}: {label}")
Extracted labels:
  make: Make and model
  price: Price
  mpg: Mileage (mpg)
  rep78: Repair record 1978
  headroom: Headroom (in.)
  trunk: Trunk space (cu. ft.)
  weight: Weight (lbs.)
  length: Length (in.)
  turn: Turn circle (ft.)
  displacement: Displacement (cu. in.)
  gear_ratio: Gear ratio
  foreign: Car origin
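
Since the extracted labels can be iterated with .items(), they behave like a dictionary and can be reused outside maketables, for example to relabel the rows of a plain pandas summary. The sketch below assumes a plain dict and uses toy values (not taken from auto.dta); df_demo, labels_demo and summary are illustrative names.

```python
import pandas as pd

# Toy data and a label dictionary shaped like the output of get_var_labels().
df_demo = pd.DataFrame({"mpg": [22, 17, 22], "weight": [2930, 3350, 2640]})
labels_demo = {"mpg": "Mileage (mpg)", "weight": "Weight (lbs.)"}

# One row per variable, one column per statistic; then swap names for labels.
summary = df_demo.agg(["mean", "std"]).T.rename(index=labels_demo)
print(summary)
```

Variables without an entry in the dictionary simply keep their original names, because rename leaves unmatched index values unchanged.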