Integrate Stata Results

This notebook demonstrates how to include Stata estimation results in MakeTables tables. To run it, you need a local Stata installation and a working pystata setup.

Basic Usage

import stata_setup

# Adjust the path to your Stata installation
stata_setup.config("C:/Program Files/Stata18", "mp")

import pystata
import maketables as mt

# Run a regression in Stata
pystata.stata.run('''
    sysuse auto, clear
    regress mpg weight length foreign
''')

# Extract results and labels for MakeTables
result = mt.extract_current_stata_results()

# Create table
mt.ETable([result], caption="Regression Results from Stata")


. 
.     sysuse auto, clear
(1978 automobile data)

.     regress mpg weight length foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =     48.10
       Model |   1645.2889         3  548.429632   Prob > F        =    0.0000
    Residual |  798.170563        70  11.4024366   R-squared       =    0.6733
-------------+----------------------------------   Adj R-squared   =    0.6593
       Total |  2443.45946        73  33.4720474   Root MSE        =    3.3767

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      weight |  -.0043656   .0016014    -2.73   0.008    -.0075595   -.0011718
      length |  -.0827432   .0547942    -1.51   0.136    -.1920267    .0265403
     foreign |  -1.707904    1.06711    -1.60   0.114    -3.836188    .4203806
       _cons |   50.53701   6.245835     8.09   0.000     38.08009    62.99394
------------------------------------------------------------------------------

. 
Regression Results from Stata
                 Mileage (mpg)
                 (1)
coef
Weight (lbs.)    -0.004***
                 (0.002)
Length (in.)     -0.083
                 (0.055)
Car origin       -1.708
                 (1.067)
Intercept        50.537***
                 (6.246)
stats
Observations     74
R2               0.673
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)
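The table note describes the cell convention: the coefficient rounded to three decimals with significance stars, and the standard error in parentheses underneath. The following minimal sketch reproduces that convention using the coefficients, standard errors, and p-values from the Stata log above. `significance_stars` and `format_cell` are hypothetical helpers written for illustration. They are not part of the MakeTables API.

```python
def significance_stars(pvalue: float) -> str:
    """Map a p-value to stars using the thresholds stated in the table note."""
    if pvalue < 0.01:
        return "***"
    if pvalue < 0.05:
        return "**"
    if pvalue < 0.1:
        return "*"
    return ""


def format_cell(coef: float, se: float, pvalue: float, digits: int = 3) -> tuple[str, str]:
    """Hypothetical helper: return the coefficient line and the (SE) line of one cell."""
    return f"{coef:.{digits}f}{significance_stars(pvalue)}", f"({se:.{digits}f})"


# Coefficient, standard error and p-value copied from the Stata log above
stata_rows = {
    "weight": (-0.0043656, 0.0016014, 0.008),
    "length": (-0.0827432, 0.0547942, 0.136),
    "foreign": (-1.707904, 1.06711, 0.114),
    "_cons": (50.53701, 6.245835, 0.000),
}

for name, (coef, se, p) in stata_rows.items():
    top, bottom = format_cell(coef, se, p)
    print(f"{name:<8} {top:>10}  {bottom}")
```

The printed cells match the MakeTables output above. The stars are assigned from the unrounded p-value, so a coefficient can carry stars even when its rounded value looks small, as with weight.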

rstata() Wrapper Function

The rstata() function combines both steps: it runs a Stata command and then extracts the resulting estimates in a single call.
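Conceptually, this chains the two calls from the basic example: run a command, then extract the estimates Stata left in memory. The sketch below is a hypothetical re-implementation of that idea. `rstata_sketch` is not part of MakeTables, and the real function accepts further options, such as `formulaic_names`, which is used later in this notebook. The run and extract callables are passed in so the logic can be exercised without a Stata session.

```python
from typing import Any, Callable


def rstata_sketch(
    cmd: str,
    run: Callable[..., Any],
    extract: Callable[[], Any],
    quietly: bool = False,
) -> Any:
    """Hypothetical sketch: run one Stata command, then extract its estimation results."""
    run(cmd, quietly=quietly)  # e.g. pystata.stata.run
    return extract()           # e.g. mt.extract_current_stata_results


# With a Stata session available, the sketch would be called as:
#   result = rstata_sketch("regress mpg weight", pystata.stata.run,
#                          mt.extract_current_stata_results, quietly=True)
```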

# Run the regression and extract the results in one step (quietly=True suppresses the Stata output)
result = mt.rstata("regress mpg weight length foreign", quietly=True)

# Create table
mt.ETable([result], caption="Regression Results from Stata")
Regression Results from Stata
                 Mileage (mpg)
                 (1)
coef
Weight (lbs.)    -0.004***
                 (0.002)
Length (in.)     -0.083
                 (0.055)
Car origin       -1.708
                 (1.067)
Intercept        50.537***
                 (6.246)
stats
Observations     74
R2               0.673
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)

Multiple Model Comparison

# Run multiple specifications with quietly=True for clean output
model1 = mt.rstata('regress mpg weight', quietly=True)
model2 = mt.rstata('regress mpg weight length', quietly=True)
model3 = mt.rstata('regress mpg weight length foreign', quietly=True)

# Create comparison table
mt.ETable([model1, model2, model3])
                 Mileage (mpg)
                 (1)          (2)          (3)
coef
Weight (lbs.)    -0.006***    -0.004**     -0.004***
                 (0.001)      (0.002)      (0.002)
Length (in.)                  -0.080       -0.083
                              (0.055)      (0.055)
Car origin                                 -1.708
                                           (1.067)
Intercept        39.440***    47.885***    50.537***
                 (1.614)      (6.088)      (6.246)
stats
Observations     74           74           74
R2               0.652        0.661        0.673
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)
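When a comparison table adds one regressor per column, you can generate the Stata commands instead of typing each one. `nested_specs` is a hypothetical helper written for this illustration. Only the commented-out lines touch Stata and MakeTables, and they use the same calls as the cell above.

```python
def nested_specs(depvar: str, regressors: list[str], command: str = "regress") -> list[str]:
    """Build Stata commands that add one regressor at a time to the specification."""
    return [
        f"{command} {depvar} {' '.join(regressors[:k])}"
        for k in range(1, len(regressors) + 1)
    ]


specs = nested_specs("mpg", ["weight", "length", "foreign"])
for cmd in specs:
    print(cmd)

# With Stata available, the three models above could then be built as:
#   models = [mt.rstata(cmd, quietly=True) for cmd in specs]
#   mt.ETable(models)
```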

Categorical Variables and Interactions

You can also use Stata’s factor-variable operators (i. and c., with ## for interactions) to create dummy variables and interaction terms. The MakeTables Stata extractor also picks up Stata value labels and converts Stata coefficient names into the formula notation used by Python regression packages. As a result, relabeling and formatting also work for these categorical variables and interaction terms.
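Stata names factor-variable coefficients as level.variable (for example `2.price_cat`). It marks base levels with `b` (`0b.foreign`) and omitted terms with `o`, and it joins interaction terms with `#`. The sketch below illustrates the kind of translation described above. It maps those names to a patsy/formulaic-style `C(var)[T.level]` notation with `:` for interactions and substitutes value labels where they are available. The exact notation MakeTables produces is not shown in this notebook, so treat the target format as an assumption. `stata_to_formulaic` is a hypothetical function, not the extractor's actual code.

```python
import re

# Factor-variable term such as "2.price_cat", "0b.foreign" (base) or "1o.foreign" (omitted)
FACTOR = re.compile(r"(\d+)([a-z]*)\.(\w+)")
# Continuous term inside an interaction, such as "c.weight" or "co.weight" (omitted)
CONTINUOUS = re.compile(r"c(o?)\.(\w+)")


def stata_to_formulaic(name, value_labels=None):
    """Illustrative translation of a Stata coefficient name into formula-style notation.

    Returns None for base or omitted terms, whose coefficients Stata fixes at zero.
    """
    if name == "_cons":
        return "Intercept"
    value_labels = value_labels or {}
    parts = []
    for term in name.split("#"):
        factor = FACTOR.fullmatch(term)
        continuous = CONTINUOUS.fullmatch(term)
        if factor:
            level, flags, var = factor.groups()
            if "o" in flags or flags == "b":
                return None  # base or omitted level
            # Use the value label if one is defined, else keep the numeric level
            label = value_labels.get(var, {}).get(int(level), level)
            parts.append(f"C({var})[T.{label}]")
        elif continuous:
            if continuous.group(1):
                return None  # omitted continuous interaction term
            parts.append(continuous.group(2))
        else:
            parts.append(term)  # plain continuous regressor, e.g. "weight"
    return ":".join(parts)
```

For example, with the value labels defined in the next cell, `1.foreign#c.weight` becomes `C(foreign)[T.Foreign]:weight`.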

# Setup data with categorical variables
pystata.stata.run('''
    sysuse auto, clear
    
    // Create categorical variables for demonstration
    gen price_cat = 1 if price < 5000
    replace price_cat = 2 if price >= 5000 & price < 10000  
    replace price_cat = 3 if price >= 10000 & price != .
    label define price_lbl 1 "Low" 2 "Medium" 3 "High"
    label values price_cat price_lbl
    label variable price_cat "Price category"
    
''', quietly=True)

model1 = mt.rstata('regress mpg i.price_cat weight foreign', quietly=True)
model2 = mt.rstata('regress mpg c.weight##i.foreign i.price_cat', quietly=True)

# Create comparison table
mt.ETable([model1, model2], cat_template="{value}")
                         Mileage (mpg)
                         (1)          (2)
coef
Medium                   -0.641       -0.386
                         (1.045)      (1.013)
High                     -0.085       0.705
                         (1.727)      (1.694)
Weight (lbs.)            -0.006***    -0.006***
                         (0.001)      (0.001)
Car origin               -1.353
                         (1.343)
Foreign                               9.604**
                                      (4.560)
Foreign × Weight (lbs.)               -0.005**
                                      (0.002)
Intercept                41.422***    40.121***
                         (2.809)      (2.757)
stats
Observations             74           74
R2                       0.665        0.693
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)
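If the data exist only in Python, you can rebuild the price_cat variable from the Stata cell above with pandas. The sketch below uses `pd.cut` with left-closed bins. The sample prices are made up for illustration and sit on the category boundaries. Stata treats a missing value as larger than any number, which is why the Stata code excludes it explicitly with `price != .`. By contrast, `pd.cut` leaves NaN as missing on its own.

```python
import numpy as np
import pandas as pd

# Made-up prices on the category boundaries; NaN mimics a missing price
prices = pd.Series([4999, 5000, 9999, 10000, np.nan])

# Same cut-offs as the Stata code: below 5000 Low, 5000 to under 10000 Medium, 10000+ High
price_cat = pd.cut(
    prices,
    bins=[-np.inf, 5000, 10000, np.inf],
    right=False,  # left-closed intervals, matching "price >= 5000 & price < 10000"
    labels=["Low", "Medium", "High"],
)
print(price_cat.tolist())
```

The categories come out in the order of `labels`, so Low is already the first (reference) level.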

Combining Results from Different Packages

This section runs the same regression specification in Stata and in PyFixest and shows both results side by side in one table.

# Stata vs PyFixest Side-by-Side Comparison
import pandas as pd
import pyfixest as pf

# Load the dataset currently in Stata memory into a pandas DataFrame
df = pystata.stata.pdataframe_from_data()

# Apply the same value labels as defined in Stata
df['price_cat'] = df['price_cat'].map({1: 'Low', 2: 'Medium', 3: 'High'}).astype('category')
df['foreign'] = df['foreign'].map({0: 'Domestic', 1: 'Foreign'}).astype('category')

# Order the categories so that the intended reference groups (Low, Domestic) are used
df['price_cat'] = df['price_cat'].cat.reorder_categories(['Low', 'Medium', 'High'])
df['foreign'] = df['foreign'].cat.reorder_categories(['Domestic', 'Foreign'])

# Run regressions
pyfixest_result = pf.feols("mpg ~ i(price_cat)*weight", data=df)
stata_result = mt.rstata('regress mpg c.weight##i.price_cat', quietly=True, formulaic_names=True)

# Create comparison table
mt.ETable([stata_result, pyfixest_result], model_heads=["Stata (PyStata)", "PyFixest"])
                                       Mileage (mpg)
                                       Stata (PyStata)  PyFixest
                                       (1)              (2)
coef
Weight (lbs.)                          -0.007***        -0.007***
                                       (0.001)          (0.001)
Price category=Medium                  -5.139           -5.139
                                       (3.797)          (3.797)
Price category=High                    -20.317**        -20.317**
                                       (9.061)          (9.061)
Price category=Medium × Weight (lbs.)  0.001            0.001
                                       (0.001)          (0.001)
Price category=High × Weight (lbs.)    0.005**          0.005**
                                       (0.002)          (0.002)
Intercept                              42.113***        42.113***
                                       (2.495)          (2.495)
stats
Observations                           74               74
R2                                     0.684            0.684
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)
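The `reorder_categories` step in the cell above matters because Stata's `i.` operator uses the lowest level (here 1 = Low) as the base by default. In contrast, `astype("category")` sorts string categories alphabetically. Treatment coding in formula-based packages typically uses the first category as the reference, so without reordering, "High" would become the reference level. The sketch below shows this with a small made-up series and also gives an equivalent one-step construction with `pd.Categorical`.

```python
import pandas as pd

codes = pd.Series([2, 1, 3, 1])  # made-up Stata codes: 1 Low, 2 Medium, 3 High
labels = codes.map({1: "Low", 2: "Medium", 3: "High"})

# astype("category") sorts string categories alphabetically, so "High" comes first
auto_order = labels.astype("category")
print(list(auto_order.cat.categories))  # ['High', 'Low', 'Medium']

# Reordering makes "Low" the first category, mirroring Stata's default base level
fixed_order = auto_order.cat.reorder_categories(["Low", "Medium", "High"])
print(list(fixed_order.cat.categories))  # ['Low', 'Medium', 'High']

# Equivalent one-step construction with an explicit category order
one_step = pd.Categorical(labels, categories=["Low", "Medium", "High"])
print(list(one_step.categories))  # ['Low', 'Medium', 'High']
```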