Translating Stata to PyFixest

How to Get Started

This guide focuses on how to replicate the regression results you would get in Stata with the Python package pyfixest, and assumes you already know how to do things like install Python packages and load data into pandas. For a broader introduction to doing econometrics in Python, you might check out Arthur Turrell’s Coding for Economists, which includes a section on Coming from Stata, or Tidy Finance with Python by Christopher Scheuch, Stefan Voigt, Patrick Weiss, and Christoph Frey. You can also check out the fixest section of stata2r, as there is a lot of overlap between fixest and PyFixest syntax.

Data

pyfixest includes a function to generate a dataset for testing.

import pyfixest as pf
df = pf.get_data()

If you want to use the same dataset in Stata, you can save the data to your home directory as a .dta file with

import os
df.to_stata(os.path.join(os.path.expanduser("~"), "pyfixest-data.dta"))

and then load the data in Stata with

cd ~
use pyfixest-data.dta
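
Going the other direction, if your data already lives in a Stata .dta file, you can read it straight into pandas with pd.read_stata:

import os
import pandas as pd

# Load the file we saved above back into a DataFrame
df = pd.read_stata(os.path.join(os.path.expanduser("~"), "pyfixest-data.dta"))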

Basic OLS

To do a basic linear regression in pyfixest you would simply use

fit1 = pf.feols("Y ~ X1", data=df)
fit2 = pf.feols("Y ~ X1 + X2", data=df)

which is equivalent to

reg Y X1

*       Source |       SS           df       MS      Number of obs   =       998
* -------------+----------------------------------   F(1, 996)       =    139.30
*        Model |  650.171727         1  650.171727   Prob > F        =    0.0000
*     Residual |  4648.80985       996  4.66747977   R-squared       =    0.1227
* -------------+----------------------------------   Adj R-squared   =    0.1218
*        Total |  5298.98157       997  5.31492635   Root MSE        =    2.1604
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -1.000086   .0847353   -11.80   0.000    -1.166366   -.8338056
*        _cons |   .9185178   .1118212     8.21   0.000     .6990856     1.13795
* ------------------------------------------------------------------------------

reg Y X1 X2

*       Source |       SS           df       MS      Number of obs   =       998
* -------------+----------------------------------   F(2, 995)       =    106.99
*        Model |  937.866146         2  468.933073   Prob > F        =    0.0000
*     Residual |  4361.11543       995  4.38303058   R-squared       =    0.1770
* -------------+----------------------------------   Adj R-squared   =    0.1753
*        Total |  5298.98157       997  5.31492635   Root MSE        =    2.0936
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9929358   .0821175   -12.09   0.000    -1.154079   -.8317925
*           X2 |  -.1763424    .021766    -8.10   0.000    -.2190549   -.1336299
*        _cons |   .8887791   .1084224     8.20   0.000     .6760163    1.101542
* ------------------------------------------------------------------------------

in Stata. Note, however, that this only runs the regressions and stores the results in fit1 and fit2. To display the results, you can use the following methods and functions.

fit1.summary() # Basic summary statistics
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.919 |        0.112 |     8.214 |      0.000 |  0.699 |   1.138 |
| X1            |     -1.000 |        0.085 |   -11.802 |      0.000 | -1.166 |  -0.834 |
---
RMSE: 2.158 R2: 0.123 
fit1.tidy() # Estimates, std errors, t-values, etc. in a "tidy" table

              Estimate  Std. Error    t value      Pr(>|t|)      2.5%     97.5%
Coefficient
Intercept     0.918518    0.111821   8.214166  6.661338e-16  0.699086  1.137950
X1           -1.000086    0.084735 -11.802468  0.000000e+00 -1.166366 -0.833806
pf.report.etable([fit1, fit2]) # Customizable table that can include results for multiple models

                           Y
                 (1)            (2)
-------------------------------------------
coef
X1           -1.000***      -0.993***
             (0.085)        (0.082)
X2                          -0.176***
                            (0.022)
Intercept     0.919***       0.889***
             (0.112)        (0.108)
stats
Observations  998            998
S.E. type     iid            iid
R2            0.123          0.177
Adj. R2       0.122          0.175
-------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
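
etable is also handy for exporting tables; for example, passing a type argument should render the same table as LaTeX instead. (The exact set of options, e.g. "tex", "md", or "df", can vary across pyfixest versions, so check the etable documentation before relying on this.)

pf.report.etable([fit1, fit2], type="tex") # render as a LaTeX table (option name assumed; see the docs)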

You can also access individual parts of the results with a variety of methods like

fit1.coef() # Get the coefficients
Coefficient
Intercept    0.918518
X1          -1.000086
Name: Estimate, dtype: float64
fit1.se() # Get the standard errors
Coefficient
Intercept    0.111821
X1           0.084735
Name: Std. Error, dtype: float64
fit1.pvalue() # Get the p-values
Coefficient
Intercept    6.661338e-16
X1           0.000000e+00
Name: Pr(>|t|), dtype: float64
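
A few other accessors follow the same pattern; for example, the following should work, though it is worth confirming the method names against the pyfixest API documentation for your installed version:

fit1.tstat()   # Get the t-statistics
fit1.confint() # Get the confidence intervals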

Robust Standard Errors

To get heteroskedasticity-robust standard errors you can use

fit3 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC1")
fit3.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  HC1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.889 |        0.108 |     8.249 |      0.000 |  0.677 |   1.100 |
| X1            |     -0.993 |        0.080 |   -12.439 |      0.000 | -1.150 |  -0.836 |
| X2            |     -0.176 |        0.022 |    -8.129 |      0.000 | -0.219 |  -0.134 |
---
RMSE: 2.09 R2: 0.177 

which is equivalent to

reg Y X1 X2, robust

* Linear regression                               Number of obs     =        998
*                                                 F(2, 995)         =     107.91
*                                                 Prob > F          =     0.0000
*                                                 R-squared         =     0.1770
*                                                 Root MSE          =     2.0936
*
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9929358   .0798259   -12.44   0.000    -1.149582   -.8362893
*           X2 |  -.1763424   .0216936    -8.13   0.000    -.2189129   -.1337719
*        _cons |   .8887791   .1077457     8.25   0.000     .6773442    1.100214
* ------------------------------------------------------------------------------

or

reg Y X1 X2, vce(robust)

* Identical output to above

or you can choose a different type of robust standard error, such as “HC3”, using

fit4 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC3")
fit4.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  HC3
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.889 |        0.108 |     8.222 |      0.000 |  0.677 |   1.101 |
| X1            |     -0.993 |        0.080 |   -12.396 |      0.000 | -1.150 |  -0.836 |
| X2            |     -0.176 |        0.022 |    -8.087 |      0.000 | -0.219 |  -0.134 |
---
RMSE: 2.09 R2: 0.177 

Note: This will not exactly match the output of the equivalent Stata command, which is

reg Y X1 X2, vce(hc3)

* Linear regression                               Number of obs     =        998
*                                                 F(2, 995)         =     107.38
*                                                 Prob > F          =     0.0000
*                                                 R-squared         =     0.1770
*                                                 Root MSE          =     2.0936
*
* ------------------------------------------------------------------------------
*              |             Robust HC3
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9929358   .0799832   -12.41   0.000    -1.149891   -.8359807
*           X2 |  -.1763424   .0217734    -8.10   0.000    -.2190693   -.1336154
*        _cons |   .8887791   .1079372     8.23   0.000     .6769684     1.10059
* ------------------------------------------------------------------------------

This is because, by default, pyfixest applies two small sample corrections to HC3 robust standard errors, while Stata applies only one of them. You can turn off the correction that Stata doesn’t use with the ssc argument.

fit5 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC3", ssc=pf.ssc(adj=False))
fit5.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  HC3
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.889 |        0.108 |     8.230 |      0.000 |  0.677 |   1.101 |
| X1            |     -0.993 |        0.080 |   -12.408 |      0.000 | -1.150 |  -0.836 |
| X2            |     -0.176 |        0.022 |    -8.095 |      0.000 | -0.219 |  -0.134 |
---
RMSE: 2.09 R2: 0.177 

which matches Stata exactly. You can read all about the small sample corrections implemented by pyfixest in On Small Sample Corrections.

Clustered Standard Errors

To cluster the standard errors by group you can use

fit6 = pf.feols("Y ~ X1 + X2", data=df.dropna(subset=['f1']), vcov={"CRV1": "f1"})
fit6.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.890 |        0.262 |     3.393 |      0.002 |  0.353 |   1.426 |
| X1            |     -0.995 |        0.076 |   -13.142 |      0.000 | -1.150 |  -0.840 |
| X2            |     -0.177 |        0.020 |    -8.920 |      0.000 | -0.217 |  -0.136 |
---
RMSE: 2.091 R2: 0.177 

which is equivalent to

reg Y X1 X2, vce(cluster f1)

* Linear regression                               Number of obs     =        997
*                                                 F(2, 29)          =     102.51
*                                                 Prob > F          =     0.0000
*                                                 R-squared         =     0.1774
*                                                 Root MSE          =      2.094
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9951969   .0757246   -13.14   0.000    -1.150071   -.8403227
*           X2 |  -.1766173    .019799    -8.92   0.000    -.2171109   -.1361237
*        _cons |   .8895569   .2622066     3.39   0.002     .3532841     1.42583
* ------------------------------------------------------------------------------

Note: Clustered standard errors are not supported when the cluster variable contains missing values, which is why we drop the rows where f1 is missing.

For two-way clustering you would need to use

fit7 = pf.feols(
  "Y ~ X1 + X2",
  data=df.dropna(subset=['f1', 'f2']),
  vcov={"CRV1": "f1 + f2"}
)
fit7.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.890 |        0.301 |     2.958 |      0.006 |  0.274 |   1.505 |
| X1            |     -0.995 |        0.073 |   -13.570 |      0.000 | -1.145 |  -0.845 |
| X2            |     -0.177 |        0.023 |    -7.535 |      0.000 | -0.225 |  -0.129 |
---
RMSE: 2.091 R2: 0.177 

Note: This will not exactly match the output of the equivalent Stata command, which is

reg Y X1 X2, vce(cluster f1 f2)

* Linear regression                                       Number of obs =    997
* Clusters per comb.:                                     Cluster comb. =      3
*   min =  30                                             F(2, 29)      =  88.58
*   avg = 211                                             Prob > F      = 0.0000
*   max = 572                                             R-squared     = 0.1774
*                                                         Adj R-squared = 0.1758
*                                                         Root MSE      = 2.0940
*
*                                   (Std. err. adjusted for multiway clustering)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9951969    .074771   -13.31   0.000    -1.148121   -.8422731
*           X2 |  -.1766173     .02376    -7.43   0.000     -.225212   -.1280226
*        _cons |   .8895569   .3016464     2.95   0.006     .2726208    1.506493
* ------------------------------------------------------------------------------
* Cluster combinations formed by f1 and f2.

This is because, by default, pyfixest uses a small sample correction that adjusts each clustering dimension by whichever dimension has the smallest number of clusters, while Stata’s default is to adjust each dimension based on the number of clusters in that dimension. You can match Stata’s correction through the ssc argument.

fit8 = pf.feols(
  "Y ~ X1 + X2",
  df.dropna(subset=['f1', 'f2']),
  vcov={"CRV1": "f1 + f2"},
  ssc=pf.ssc(cluster_df="conventional")
)
fit8.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.890 |        0.302 |     2.949 |      0.006 |  0.273 |   1.506 |
| X1            |     -0.995 |        0.075 |   -13.310 |      0.000 | -1.148 |  -0.842 |
| X2            |     -0.177 |        0.024 |    -7.433 |      0.000 | -0.225 |  -0.128 |
---
RMSE: 2.091 R2: 0.177 

As a reminder, for an excellent breakdown of the small sample corrections in the pyfixest package, you can check out On Small Sample Corrections.
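
Since both of the adjustments shown above are arguments of the same pf.ssc helper, they can be combined in a single call when a regression needs more than one correction. Here is a minimal sketch using only options that appeared above (fit_combined is just an illustrative name, and whether a given combination reproduces a particular Stata command should be checked case by case):

# Combine both corrections discussed above in one ssc call
my_ssc = pf.ssc(adj=False, cluster_df="conventional")
fit_combined = pf.feols(
  "Y ~ X1 + X2",
  data=df.dropna(subset=['f1']),
  vcov={"CRV1": "f1"},
  ssc=my_ssc
)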

Fixed Effect Regressions

To do a fixed effect regression with one fixed effect you could use

fit9 = pf.feols("Y ~ X1 + X2 | f1", data=df, vcov="iid")
fit9.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  iid
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.950 |        0.066 |   -14.306 |      0.000 | -1.080 |  -0.819 |
| X2            |     -0.174 |        0.018 |    -9.902 |      0.000 | -0.209 |  -0.140 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239 

which is equivalent to

xtset f1
xtreg Y X1 X2, fe

* Fixed-effects (within) regression               Number of obs     =        997
* Group variable: f1                              Number of groups  =         30
*
* R-squared:                                      Obs per group:
*      Within  = 0.2388                                         min =         23
*      Between = 0.0770                                         avg =       33.2
*      Overall = 0.1774                                         max =         48
*
*                                                 F(2, 965)         =     151.33
* corr(u_i, Xb) = 0.0268                          Prob > F          =     0.0000
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0663728   -14.31   0.000    -1.079777   -.8192739
*           X2 |  -.1742253   .0175957    -9.90   0.000    -.2087555   -.1396951
*        _cons |    .842222   .0872525     9.65   0.000     .6709955    1.013448
* -------------+----------------------------------------------------------------
*      sigma_u |  1.2570454
*      sigma_e |  1.6751049
*          rho |  .36026283   (fraction of variance due to u_i)
* ------------------------------------------------------------------------------
* F test that all u_i=0: F(29, 965) = 20.29                    Prob > F = 0.0000

or

reghdfe Y X1 X2, absorb(f1)

* HDFE Linear regression                            Number of obs   =        997
* Absorbing 1 HDFE group                            F(   2,    965) =     151.33
*                                                   Prob > F        =     0.0000
*                                                   R-squared       =     0.4890
*                                                   Adj R-squared   =     0.4726
*                                                   Within R-sq.    =     0.2388
*                                                   Root MSE        =     1.6751
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0663728   -14.31   0.000    -1.079777   -.8192739
*           X2 |  -.1742253   .0175957    -9.90   0.000    -.2087555   -.1396951
*        _cons |    .842222   .0872525     9.65   0.000     .6709955    1.013448
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
*  Absorbed FE | Categories  - Redundant  = Num. Coefs |
* -------------+---------------------------------------|
*           f1 |        30           0          30     |
* -----------------------------------------------------+

Note: You need to specify vcov="iid". This is because the default for pyfixest (following fixest) is to cluster standard errors by the first fixed effect. So

fit10 = pf.feols("Y ~ X1 + X2 | f1", data=df)
fit10.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.950 |        0.067 |   -14.266 |      0.000 | -1.086 |  -0.813 |
| X2            |     -0.174 |        0.018 |    -9.464 |      0.000 | -0.212 |  -0.137 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239 

is equivalent to

xtset f1
xtreg Y X1 X2, fe vce(cluster f1)

* Fixed-effects (within) regression               Number of obs     =        997
* Group variable: f1                              Number of groups  =         30
*
* R-squared:                                      Obs per group:
*      Within  = 0.2388                                         min =         23
*      Between = 0.0770                                         avg =       33.2
*      Overall = 0.1774                                         max =         48
*
*                                                 F(2, 29)          =     146.33
* corr(u_i, Xb) = 0.0268                          Prob > F          =     0.0000
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0665572   -14.27   0.000     -1.08565   -.8134009
*           X2 |  -.1742253    .018409    -9.46   0.000    -.2118759   -.1365746
*        _cons |    .842222   .0694639    12.12   0.000     .7001523    .9842916
* -------------+----------------------------------------------------------------
*      sigma_u |  1.2570454
*      sigma_e |  1.6751049
*          rho |  .36026283   (fraction of variance due to u_i)
* ------------------------------------------------------------------------------

or

reghdfe Y X1 X2, absorb(f1) cluster(f1)

* HDFE Linear regression                            Number of obs   =        997
* Absorbing 1 HDFE group                            F(   2,     29) =     146.33
* Statistics robust to heteroskedasticity           Prob > F        =     0.0000
*                                                   R-squared       =     0.4890
*                                                   Adj R-squared   =     0.4726
*                                                   Within R-sq.    =     0.2388
* Number of clusters (f1)      =         30         Root MSE        =     1.6751
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0665572   -14.27   0.000     -1.08565   -.8134009
*           X2 |  -.1742253    .018409    -9.46   0.000    -.2118759   -.1365746
*        _cons |    .842222   .0694639    12.12   0.000     .7001523    .9842916
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
*  Absorbed FE | Categories  - Redundant  = Num. Coefs |
* -------------+---------------------------------------|
*           f1 |        30          30           0    *|
* -----------------------------------------------------+
* * = FE nested within cluster; treated as redundant for DoF computation

For multiple fixed effects you could do

fit11 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=df)
fit11.summary()
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.924 |        0.062 |   -14.934 |      0.000 | -1.051 |  -0.797 |
| X2            |     -0.174 |        0.015 |   -11.737 |      0.000 | -0.204 |  -0.144 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303 

which is equivalent to

reghdfe Y X1 X2, absorb(f1 f2) cluster(f1)

* HDFE Linear regression                            Number of obs   =        997
* Absorbing 2 HDFE groups                           F(   2,     29) =     182.76
* Statistics robust to heteroskedasticity           Prob > F        =     0.0000
*                                                   R-squared       =     0.6590
*                                                   Adj R-squared   =     0.6372
*                                                   Within R-sq.    =     0.3026
* Number of clusters (f1)      =         30         Root MSE        =     1.3893
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9240462   .0618743   -14.93   0.000    -1.050593   -.7974991
*           X2 |  -.1741073   .0148338   -11.74   0.000    -.2044458   -.1437689
*        _cons |   .8156588    .064596    12.63   0.000     .6835452    .9477723
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
*  Absorbed FE | Categories  - Redundant  = Num. Coefs |
* -------------+---------------------------------------|
*           f1 |        30          30           0    *|
*           f2 |        30           1          29     |
* -----------------------------------------------------+
* * = FE nested within cluster; treated as redundant for DoF computation

Note: By default pyfixest will still cluster standard errors only by the first fixed effect specified. You can change this using the vcov argument; see the fit8 example above for how to specify two-way clustering, as in the sketch below.
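
For example, combining the two fixed effects from fit11 with the two-way clustering from fit7 would look something like this (a sketch built only from syntax shown above; fit12 is just an illustrative name):

fit12 = pf.feols(
  "Y ~ X1 + X2 | f1 + f2",
  data=df.dropna(subset=['f1', 'f2']),
  vcov={"CRV1": "f1 + f2"}  # two-way clustering instead of the default CRV1 on f1
)
fit12.summary()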