Translating Stata to PyFixest
How to Get Started
This guide will focus on how to replicate the regression results you would get in Stata with the Python package pyfixest, and assumes you know how to do things like install Python packages and load data into Pandas. For a broader introduction to doing econometrics in Python, you might check out Arthur Turrell's Coding for Economists, which includes a section on Coming from Stata, or Tidy Finance with Python by Christopher Scheuch, Stefan Voigt, Patrick Weiss, and Christoph Frey. You can also check out the fixest section of stata2r, as there is a lot of overlap between fixest and PyFixest syntax.
Data
pyfixest includes a function to generate a dataset for testing.
import pyfixest as pf
df = pf.get_data()
If you want to use the same dataset in Stata, you can save the data to your home directory as a .dta file with
import os
df.to_stata(os.path.join(os.path.expanduser("~"), "pyfixest-data.dta"))
and then load the data in Stata with
cd ~
use pyfixest-data.dta
Basic OLS
To do a basic linear regression in pyfixest you would simply use
fit1 = pf.feols("Y ~ X1", data = df)
fit2 = pf.feols("Y ~ X1 + X2", data = df)
which is equivalent to
reg Y X1
* Source | SS df MS Number of obs = 998
* -------------+---------------------------------- F(1, 996) = 139.30
* Model | 650.171727 1 650.171727 Prob > F = 0.0000
* Residual | 4648.80985 996 4.66747977 R-squared = 0.1227
* -------------+---------------------------------- Adj R-squared = 0.1218
* Total | 5298.98157 997 5.31492635 Root MSE = 2.1604
*
* ------------------------------------------------------------------------------
* Y | Coefficient Std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -1.000086 .0847353 -11.80 0.000 -1.166366 -.8338056
* _cons | .9185178 .1118212 8.21 0.000 .6990856 1.13795
* ------------------------------------------------------------------------------
reg Y X1 X2
* Source | SS df MS Number of obs = 998
* -------------+---------------------------------- F(2, 995) = 106.99
* Model | 937.866146 2 468.933073 Prob > F = 0.0000
* Residual | 4361.11543 995 4.38303058 R-squared = 0.1770
* -------------+---------------------------------- Adj R-squared = 0.1753
* Total | 5298.98157 997 5.31492635 Root MSE = 2.0936
*
* ------------------------------------------------------------------------------
* Y | Coefficient Std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9929358 .0821175 -12.09 0.000 -1.154079 -.8317925
* X2 | -.1763424 .021766 -8.10 0.000 -.2190549 -.1336299
* _cons | .8887791 .1084224 8.20 0.000 .6760163 1.101542
* ------------------------------------------------------------------------------
in Stata. Note, however, that this only runs the regressions and stores the results in fit1 and fit2. To display the results, you can use the following methods and functions.
fit1.summary() # Basic summary statistics
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: iid
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.919 | 0.112 | 8.214 | 0.000 | 0.699 | 1.138 |
| X1 | -1.000 | 0.085 | -11.802 | 0.000 | -1.166 | -0.834 |
---
RMSE: 2.158 R2: 0.123
fit1.tidy() # Estimates, std errors, t-values, etc. in a "tidy" table
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:------------|---------:|-----------:|--------:|---------:|-----:|------:|
| Intercept | 0.918518 | 0.111821 | 8.214166 | 6.661338e-16 | 0.699086 | 1.137950 |
| X1 | -1.000086 | 0.084735 | -11.802468 | 0.000000e+00 | -1.166366 | -0.833806 |
pf.report.etable([fit1, fit2]) # Customizable table that can include results for multiple models
|              | Y                 |                   |
|:-------------|:------------------|:------------------|
|              | (1)               | (2)               |
| coef         |                   |                   |
| X1           | -1.000*** (0.085) | -0.993*** (0.082) |
| X2           |                   | -0.176*** (0.022) |
| Intercept    | 0.919*** (0.112)  | 0.889*** (0.108)  |
| stats        |                   |                   |
| Observations | 998               | 998               |
| S.E. type    | iid               | iid               |
| R2           | 0.123             | 0.177             |
| Adj. R2      | 0.122             | 0.175             |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
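pyfixest also inherits fixest's multiple-estimation syntax, so the two models above can be fit in a single call. A minimal sketch, assuming your pyfixest version supports the stepwise csw() operator:
# csw(X1, X2) fits "Y ~ X1" and "Y ~ X1 + X2" in one call,
# returning a multi-estimation object rather than a single fit.
multi_fit = pf.feols("Y ~ csw(X1, X2)", data=df)
pf.report.etable(multi_fit.to_list())  # reproduces the two-column table above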
You can also access individual parts of the results with a variety of methods like
fit1.coef() # Get the coefficients
Coefficient
Intercept 0.918518
X1 -1.000086
Name: Estimate, dtype: float64
fit1.se() # Get the standard errors
Coefficient
Intercept 0.111821
X1 0.084735
Name: Std. Error, dtype: float64
fit1.pvalue() # Get the p-values
Coefficient
Intercept 6.661338e-16
X1 0.000000e+00
Name: Pr(>|t|), dtype: float64
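The same pattern extends to the other accessors; for example, assuming your pyfixest version exposes them, you can pull the t-statistics and confidence intervals directly:
fit1.tstat()   # Get the t-statistics as a pandas Series
fit1.confint() # Get the 2.5% / 97.5% confidence bounds as a DataFrame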
Robust Standard Errors
To get heteroskedasticity robust standard errors you can use
fit3 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC1")
fit3.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.889 | 0.108 | 8.249 | 0.000 | 0.677 | 1.100 |
| X1 | -0.993 | 0.080 | -12.439 | 0.000 | -1.150 | -0.836 |
| X2 | -0.176 | 0.022 | -8.129 | 0.000 | -0.219 | -0.134 |
---
RMSE: 2.09 R2: 0.177
which is equivalent to
reg Y X1 X2, robust
* Linear regression Number of obs = 998
* F(2, 995) = 107.91
* Prob > F = 0.0000
* R-squared = 0.1770
* Root MSE = 2.0936
*
* ------------------------------------------------------------------------------
* | Robust
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9929358 .0798259 -12.44 0.000 -1.149582 -.8362893
* X2 | -.1763424 .0216936 -8.13 0.000 -.2189129 -.1337719
* _cons | .8887791 .1077457 8.25 0.000 .6773442 1.100214
* ------------------------------------------------------------------------------
or
reg Y X1 X2, vce(robust)
* Identical output to above
or you can choose a different type of robust standard errors, like "HC3", using
fit4 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC3")
fit4.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC3
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.889 | 0.108 | 8.222 | 0.000 | 0.677 | 1.101 |
| X1 | -0.993 | 0.080 | -12.396 | 0.000 | -1.150 | -0.836 |
| X2 | -0.176 | 0.022 | -8.087 | 0.000 | -0.219 | -0.134 |
---
RMSE: 2.09 R2: 0.177
Note: This will not exactly match the output of the equivalent Stata command, which is
reg Y X1 X2, vce(hc3)
* Linear regression Number of obs = 998
* F(2, 995) = 107.38
* Prob > F = 0.0000
* R-squared = 0.1770
* Root MSE = 2.0936
*
* ------------------------------------------------------------------------------
* | Robust HC3
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9929358 .0799832 -12.41 0.000 -1.149891 -.8359807
* X2 | -.1763424 .0217734 -8.10 0.000 -.2190693 -.1336154
* _cons | .8887791 .1079372 8.23 0.000 .6769684 1.10059
* ------------------------------------------------------------------------------
This is because, by default, pyfixest applies two small-sample corrections to HC3 robust standard errors, while Stata only uses one of them. You can turn off the correction that Stata doesn't use with the ssc argument.
fit5 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC3", ssc=pf.ssc(k_adj = False))
fit5.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC3
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.889 | 0.108 | 8.234 | 0.000 | 0.677 | 1.101 |
| X1 | -0.993 | 0.080 | -12.414 | 0.000 | -1.150 | -0.836 |
| X2 | -0.176 | 0.022 | -8.099 | 0.000 | -0.219 | -0.134 |
---
RMSE: 2.09 R2: 0.177
which matches Stata exactly. You can read all about the small-sample corrections implemented by pyfixest at On Small Sample Corrections.
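To see what the extra correction does in practice, you can line the two sets of standard errors up side by side. A minimal sketch using the fits from above:
import pandas as pd
# fit4 uses pyfixest's default HC3 corrections; fit5 drops the one Stata
# omits, so its column should match Stata's vce(hc3) output.
pd.DataFrame({"HC3 default": fit4.se(), "HC3 k_adj=False": fit5.se()})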
Clustered Standard Errors
To cluster the standard errors by group you can use
fit6 = pf.feols("Y ~ X1 + X2", data=df.dropna(subset=['f1']), vcov={"CRV1": "f1"})
fit6.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: CRV1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.890 | 0.262 | 3.393 | 0.002 | 0.353 | 1.426 |
| X1 | -0.995 | 0.076 | -13.142 | 0.000 | -1.150 | -0.840 |
| X2 | -0.177 | 0.020 | -8.920 | 0.000 | -0.217 | -0.136 |
---
RMSE: 2.091 R2: 0.177
which is equivalent to
reg Y X1 X2, vce(cluster f1)
* Linear regression Number of obs = 997
* F(2, 29) = 102.51
* Prob > F = 0.0000
* R-squared = 0.1774
* Root MSE = 2.094
*
* (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
* | Robust
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9951969 .0757246 -13.14 0.000 -1.150071 -.8403227
* X2 | -.1766173 .019799 -8.92 0.000 -.2171109 -.1361237
* _cons | .8895569 .2622066 3.39 0.002 .3532841 1.42583
* ------------------------------------------------------------------------------
Note: clustered standard errors are not supported with missing values in the cluster variable, which is why we drop the rows with missing values for f1.
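If you are not sure whether your cluster variable contains missing values, a quick check before estimating is (a minimal pandas sketch):
# Count missing values in the cluster variable; one row is missing f1
# here, which is why the regression above uses 997 of the 998 observations.
df["f1"].isna().sum()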
For two way clustering you would need to use
fit7 = pf.feols(
"Y ~ X1 + X2",
data=df.dropna(subset=['f1', 'f2']),
vcov={"CRV1": "f1 + f2"}
)
fit7.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: CRV1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.890 | 0.301 | 2.958 | 0.006 | 0.274 | 1.505 |
| X1 | -0.995 | 0.073 | -13.570 | 0.000 | -1.145 | -0.845 |
| X2 | -0.177 | 0.023 | -7.535 | 0.000 | -0.225 | -0.129 |
---
RMSE: 2.091 R2: 0.177
Note: This will not exactly match the output of the equivalent Stata command, which is
reg Y X1 X2, vce(cluster f1 f2)
* Linear regression Number of obs = 997
* Clusters per comb.: Cluster comb. = 3
* min = 30 F(2, 29) = 88.58
* avg = 211 Prob > F = 0.0000
* max = 572 R-squared = 0.1774
* Adj R-squared = 0.1758
* Root MSE = 2.0940
*
* (Std. err. adjusted for multiway clustering)
* ------------------------------------------------------------------------------
* | Robust
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9951969 .074771 -13.31 0.000 -1.148121 -.8422731
* X2 | -.1766173 .02376 -7.43 0.000 -.225212 -.1280226
* _cons | .8895569 .3016464 2.95 0.006 .2726208 1.506493
* ------------------------------------------------------------------------------
* Cluster combinations formed by f1 and f2.
This is because, by default, pyfixest uses a small-sample correction that adjusts each clustering dimension by whichever dimension has the smallest number of clusters, while in Stata the default is to adjust each dimension based on the number of clusters in that dimension. You can use the same correction as Stata through the ssc argument.
fit8 = pf.feols(
"Y ~ X1 + X2",
df.dropna(subset=['f1', 'f2']),
vcov={"CRV1": "f1 + f2"},
ssc=pf.ssc(G_df="conventional")
)
fit8.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: CRV1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.890 | 0.302 | 2.949 | 0.006 | 0.273 | 1.506 |
| X1 | -0.995 | 0.075 | -13.310 | 0.000 | -1.148 | -0.842 |
| X2 | -0.177 | 0.024 | -7.433 | 0.000 | -0.225 | -0.128 |
---
RMSE: 2.091 R2: 0.177
As a reminder, for an excellent breakdown of the small-sample corrections in the pyfixest package, you can check out On Small Sample Corrections.
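As with the HC3 comparison above, you can also put both two-way clustered fits in one table to see the effect of the correction; a minimal sketch:
# fit7 uses pyfixest's default adjustment, fit8 the Stata-style one;
# fit8's standard errors should match Stata's vce(cluster f1 f2) output.
pf.report.etable([fit7, fit8])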
Fixed Effect Regressions
To do a fixed effect regression with one fixed effect you could use
fit9 = pf.feols("Y ~ X1 + X2 | f1", data=df, vcov="iid")
fit9.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: f1
Inference: iid
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -0.950 | 0.066 | -14.306 | 0.000 | -1.080 | -0.819 |
| X2 | -0.174 | 0.018 | -9.902 | 0.000 | -0.209 | -0.140 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239
which is equivalent to
xtset f1
xtreg Y X1 X2, fe
* Fixed-effects (within) regression Number of obs = 997
* Group variable: f1 Number of groups = 30
*
* R-squared: Obs per group:
* Within = 0.2388 min = 23
* Between = 0.0770 avg = 33.2
* Overall = 0.1774 max = 48
*
* F(2, 965) = 151.33
* corr(u_i, Xb) = 0.0268 Prob > F = 0.0000
*
* ------------------------------------------------------------------------------
* Y | Coefficient Std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9495256 .0663728 -14.31 0.000 -1.079777 -.8192739
* X2 | -.1742253 .0175957 -9.90 0.000 -.2087555 -.1396951
* _cons | .842222 .0872525 9.65 0.000 .6709955 1.013448
* -------------+----------------------------------------------------------------
* sigma_u | 1.2570454
* sigma_e | 1.6751049
* rho | .36026283 (fraction of variance due to u_i)
* ------------------------------------------------------------------------------
* F test that all u_i=0: F(29, 965) = 20.29 Prob > F = 0.0000
or
reghdfe Y X1 X2, absorb(f1)
* HDFE Linear regression Number of obs = 997
* Absorbing 1 HDFE group F( 2, 965) = 151.33
* Prob > F = 0.0000
* R-squared = 0.4890
* Adj R-squared = 0.4726
* Within R-sq. = 0.2388
* Root MSE = 1.6751
*
* ------------------------------------------------------------------------------
* Y | Coefficient Std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9495256 .0663728 -14.31 0.000 -1.079777 -.8192739
* X2 | -.1742253 .0175957 -9.90 0.000 -.2087555 -.1396951
* _cons | .842222 .0872525 9.65 0.000 .6709955 1.013448
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
* Absorbed FE | Categories - Redundant = Num. Coefs |
* -------------+---------------------------------------|
* f1 | 30 0 30 |
* -----------------------------------------------------+
To use cluster robust standard errors, specify the vcov argument:
# Standard iid inference
fit10 = pf.feols("Y ~ X1 + X2 | f1", data=df)
fit10.summary()
# Cluster robust standard errors
fit10_clustered = pf.feols("Y ~ X1 + X2 | f1", data=df, vcov={"CRV1": "f1"})
fit10_clustered.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: f1
Inference: iid
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -0.950 | 0.066 | -14.306 | 0.000 | -1.080 | -0.819 |
| X2 | -0.174 | 0.018 | -9.902 | 0.000 | -0.209 | -0.140 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239
###
Estimation: OLS
Dep. var.: Y, Fixed effects: f1
Inference: CRV1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -0.950 | 0.067 | -14.266 | 0.000 | -1.086 | -0.813 |
| X2 | -0.174 | 0.018 | -9.464 | 0.000 | -0.212 | -0.137 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239
The clustered version fit10_clustered is equivalent to
xtset f1
xtreg Y X1 X2, fe vce(cluster f1)
* Fixed-effects (within) regression Number of obs = 997
* Group variable: f1 Number of groups = 30
*
* R-squared: Obs per group:
* Within = 0.2388 min = 23
* Between = 0.0770 avg = 33.2
* Overall = 0.1774 max = 48
*
* F(2, 29) = 146.33
* corr(u_i, Xb) = 0.0268 Prob > F = 0.0000
*
* (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
* | Robust
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9495256 .0665572 -14.27 0.000 -1.08565 -.8134009
* X2 | -.1742253 .018409 -9.46 0.000 -.2118759 -.1365746
* _cons | .842222 .0694639 12.12 0.000 .7001523 .9842916
* -------------+----------------------------------------------------------------
* sigma_u | 1.2570454
* sigma_e | 1.6751049
* rho | .36026283 (fraction of variance due to u_i)
* ------------------------------------------------------------------------------
or
reghdfe Y X1 X2, absorb(f1) cluster(f1)
* HDFE Linear regression Number of obs = 997
* Absorbing 1 HDFE group F( 2, 29) = 146.33
* Statistics robust to heteroskedasticity Prob > F = 0.0000
* R-squared = 0.4890
* Adj R-squared = 0.4726
* Within R-sq. = 0.2388
* Number of clusters (f1) = 30 Root MSE = 1.6751
*
* (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
* | Robust
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9495256 .0665572 -14.27 0.000 -1.08565 -.8134009
* X2 | -.1742253 .018409 -9.46 0.000 -.2118759 -.1365746
* _cons | .842222 .0694639 12.12 0.000 .7001523 .9842916
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
* Absorbed FE | Categories - Redundant = Num. Coefs |
* -------------+---------------------------------------|
* f1 | 30 30 0 *|
* -----------------------------------------------------+
* * = FE nested within cluster; treated as redundant for DoF computation
For multiple fixed effects you could do
fit11 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=df)
fit11.summary()
###
Estimation: OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference: iid
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -0.924 | 0.056 | -16.483 | 0.000 | -1.034 | -0.814 |
| X2 | -0.174 | 0.015 | -11.717 | 0.000 | -0.203 | -0.145 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303
which produces the same point estimates as (note that the Stata call below clusters by f1, so its standard errors differ from the iid standard errors shown above)
reghdfe Y X1 X2, absorb(f1 f2) cluster(f1)
* HDFE Linear regression Number of obs = 997
* Absorbing 2 HDFE groups F( 2, 29) = 182.76
* Statistics robust to heteroskedasticity Prob > F = 0.0000
* R-squared = 0.6590
* Adj R-squared = 0.6372
* Within R-sq. = 0.3026
* Number of clusters (f1) = 30 Root MSE = 1.3893
*
* (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
* | Robust
* Y | Coefficient std. err. t P>|t| [95% conf. interval]
* -------------+----------------------------------------------------------------
* X1 | -.9240462 .0618743 -14.93 0.000 -1.050593 -.7974991
* X2 | -.1741073 .0148338 -11.74 0.000 -.2044458 -.1437689
* _cons | .8156588 .064596 12.63 0.000 .6835452 .9477723
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
* Absorbed FE | Categories - Redundant = Num. Coefs |
* -------------+---------------------------------------|
* f1 | 30 30 0 *|
* f2 | 30 1 29 |
* -----------------------------------------------------+
* * = FE nested within cluster; treated as redundant for DoF computation
Note: To cluster standard errors by a specific group, use the vcov argument. See the fit8 example above for how to specify two-way clustering.
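Putting the pieces together, a model with multiple fixed effects and two-way clustered standard errors might look like the sketch below, reusing the vcov specification from the fit8 example:
# Two fixed effects plus two-way clustering; as with fit7, rows with a
# missing value in either cluster variable are dropped first.
fit12 = pf.feols(
    "Y ~ X1 + X2 | f1 + f2",
    data=df.dropna(subset=["f1", "f2"]),
    vcov={"CRV1": "f1 + f2"},
)
fit12.summary()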