Translating Stata to PyFixest
How to Get Started
This guide focuses on how to replicate the regression results you would get in Stata with the Python package pyfixest, and assumes you already know how to do things like install Python packages and load data into Pandas. For a broader introduction to doing econometrics in Python you might check out Arthur Turrell’s Coding for Economists, which includes a section on Coming from Stata, or Tidy Finance with Python by Christopher Scheuch, Stefan Voigt, Patrick Weiss, and Christoph Frey. You can also check out the fixest section of stata2r, as there is a lot of overlap between fixest and PyFixest syntax.
Data
pyfixest includes a function to generate a dataset for testing.
import pyfixest as pf
df = pf.get_data()
If you want to use the same dataset in Stata, you can save the data to your home directory as a .dta file with
import os
df.to_stata(os.path.join(os.path.expanduser("~"), "pyfixest-data.dta"))
and then load the data in Stata with
cd ~
use pyfixest-data.dta
Basic OLS
To do a basic linear regression in pyfixest you would simply use
fit1 = pf.feols("Y ~ X1", data=df)
fit2 = pf.feols("Y ~ X1 + X2", data=df)
which is equivalent to
reg Y X1
*       Source |       SS           df       MS      Number of obs   =       998
* -------------+----------------------------------   F(1, 996)       =    139.30
*        Model |  650.171727         1  650.171727   Prob > F        =    0.0000
*     Residual |  4648.80985       996  4.66747977   R-squared       =    0.1227
* -------------+----------------------------------   Adj R-squared   =    0.1218
*        Total |  5298.98157       997  5.31492635   Root MSE        =    2.1604
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -1.000086   .0847353   -11.80   0.000    -1.166366   -.8338056
*        _cons |   .9185178   .1118212     8.21   0.000     .6990856     1.13795
* ------------------------------------------------------------------------------
reg Y X1 X2
*       Source |       SS           df       MS      Number of obs   =       998
* -------------+----------------------------------   F(2, 995)       =    106.99
*        Model |  937.866146         2  468.933073   Prob > F        =    0.0000
*     Residual |  4361.11543       995  4.38303058   R-squared       =    0.1770
* -------------+----------------------------------   Adj R-squared   =    0.1753
*        Total |  5298.98157       997  5.31492635   Root MSE        =    2.0936
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9929358   .0821175   -12.09   0.000    -1.154079   -.8317925
*           X2 |  -.1763424    .021766    -8.10   0.000    -.2190549   -.1336299
*        _cons |   .8887791   .1084224     8.20   0.000     .6760163    1.101542
* ------------------------------------------------------------------------------
in Stata. However, note that this only runs the regressions and stores the results in fit1 and fit2. To show the results, you can use some of the following methods and functions.
fit1.summary() # Basic summary statistics
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  iid
Observations:  998
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.919 |        0.112 |     8.214 |      0.000 |  0.699 |   1.138 |
| X1            |     -1.000 |        0.085 |   -11.802 |      0.000 | -1.166 |  -0.834 |
---
RMSE: 2.158 R2: 0.123
fit1.tidy() # Estimates, std errors, t-values, etc. in a "tidy" table
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|---|---|---|---|---|---|---|
| Intercept | 0.918518 | 0.111821 | 8.214166 | 6.661338e-16 | 0.699086 | 1.137950 |
| X1 | -1.000086 | 0.084735 | -11.802468 | 0.000000e+00 | -1.166366 | -0.833806 |
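Since tidy() returns an ordinary Pandas DataFrame, you can also slice individual numbers out of it; for example (assuming the coefficient names form the index, as in current versions of pyfixest):
fit1.tidy().loc["X1", "Estimate"] # Pull a single estimate out of the tidy table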
pf.report.etable([fit1, fit2]) # Customizable table that can include results for multiple models
|  | Y |  |
|---|---|---|
|  | (1) | (2) |
| coef |  |  |
| X1 | -1.000*** (0.085) | -0.993*** (0.082) |
| X2 |  | -0.176*** (0.022) |
| Intercept | 0.919*** (0.112) | 0.889*** (0.108) |
| stats |  |  |
| Observations | 998 | 998 |
| S.E. type | iid | iid |
| R2 | 0.123 | 0.177 |
| Adj. R2 | 0.122 | 0.175 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
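If you are building tables for a paper (the way you might with esttab in Stata), etable can also render to formats other than the default; for example, current versions accept a type argument, though you should check the pyfixest docs for the options available in your version:
pf.report.etable([fit1, fit2], type="tex") # LaTeX output instead of the default table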
You can also access individual parts of the results with a variety of methods like
fit1.coef() # Get the coefficients
Coefficient
Intercept    0.918518
X1          -1.000086
Name: Estimate, dtype: float64
fit1.se() # Get the standard errors
Coefficient
Intercept    0.111821
X1           0.084735
Name: Std. Error, dtype: float64
fit1.pvalue() # Get the p-values
Coefficient
Intercept    6.661338e-16
X1           0.000000e+00
Name: Pr(>|t|), dtype: float64
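The same pattern extends to other pieces of the results; for instance, current versions of pyfixest also provide a confint() method:
fit1.confint() # Get the confidence intervals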
Robust Standard Errors
To get heteroskedasticity-robust standard errors you can use
fit3 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC1")
fit3.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  HC1
Observations:  998
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.889 |        0.108 |     8.249 |      0.000 |  0.677 |   1.100 |
| X1            |     -0.993 |        0.080 |   -12.439 |      0.000 | -1.150 |  -0.836 |
| X2            |     -0.176 |        0.022 |    -8.129 |      0.000 | -0.219 |  -0.134 |
---
RMSE: 2.09 R2: 0.177
which is equivalent to
reg Y X1 X2, robust
* Linear regression                               Number of obs     =        998
*                                                 F(2, 995)         =     107.91
*                                                 Prob > F          =     0.0000
*                                                 R-squared         =     0.1770
*                                                 Root MSE          =     2.0936
*
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9929358   .0798259   -12.44   0.000    -1.149582   -.8362893
*           X2 |  -.1763424   .0216936    -8.13   0.000    -.2189129   -.1337719
*        _cons |   .8887791   .1077457     8.25   0.000     .6773442    1.100214
* ------------------------------------------------------------------------------
or
reg Y X1 X2, vce(robust)
* Identical output to above
Alternatively, you can choose a different flavor of robust standard errors, like “HC3”, using
fit4 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC3")
fit4.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  HC3
Observations:  998
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.889 |        0.108 |     8.222 |      0.000 |  0.677 |   1.101 |
| X1            |     -0.993 |        0.080 |   -12.396 |      0.000 | -1.150 |  -0.836 |
| X2            |     -0.176 |        0.022 |    -8.087 |      0.000 | -0.219 |  -0.134 |
---
RMSE: 2.09 R2: 0.177
Note: This will not exactly match the output of the equivalent Stata command, which is
reg Y X1 X2, vce(hc3)
* Linear regression                               Number of obs     =        998
*                                                 F(2, 995)         =     107.38
*                                                 Prob > F          =     0.0000
*                                                 R-squared         =     0.1770
*                                                 Root MSE          =     2.0936
*
* ------------------------------------------------------------------------------
*              |             Robust HC3
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9929358   .0799832   -12.41   0.000    -1.149891   -.8359807
*           X2 |  -.1763424   .0217734    -8.10   0.000    -.2190693   -.1336154
*        _cons |   .8887791   .1079372     8.23   0.000     .6769684     1.10059
* ------------------------------------------------------------------------------
This is because, by default, pyfixest applies two small sample corrections to HC3 robust standard errors, while Stata only uses one of them. You can turn off the correction that Stata doesn’t use with the ssc argument.
fit5 = pf.feols("Y ~ X1 + X2", data=df, vcov="HC3", ssc=pf.ssc(adj = False))
fit5.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  HC3
Observations:  998
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.889 |        0.108 |     8.230 |      0.000 |  0.677 |   1.101 |
| X1            |     -0.993 |        0.080 |   -12.408 |      0.000 | -1.150 |  -0.836 |
| X2            |     -0.176 |        0.022 |    -8.095 |      0.000 | -0.219 |  -0.134 |
---
RMSE: 2.09 R2: 0.177
which matches Stata exactly. You can read all about the small sample corrections implemented by pyfixest at On Small Sample Corrections.
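To make the difference concrete, here is a minimal numpy sketch of the HC3 sandwich for this regression. This is an illustration of the formulas, not pyfixest’s internal code, and it assumes the extra default adjustment is the usual (n - 1)/(n - k) degrees-of-freedom factor (see On Small Sample Corrections for the exact conventions).
import numpy as np
d = df.dropna(subset=["Y", "X1", "X2"])                # same estimation sample feols uses
X = np.column_stack([np.ones(len(d)), d["X1"], d["X2"]])
y = d["Y"].to_numpy()
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ (XtX_inv @ X.T @ y)                    # OLS residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)            # leverage of each observation
meat = X.T @ (X * (resid**2 / (1 - h) ** 2)[:, None])  # HC3 weighting
vcov_stata = XtX_inv @ meat @ XtX_inv                  # what vce(hc3) reports
vcov_pf = vcov_stata * (n - 1) / (n - k)               # pyfixest's default adds this factor
np.sqrt(np.diag(vcov_stata))                           # should match fit5.se()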
Clustered Standard Errors
To cluster the standard errors by group you can use
fit6 = pf.feols("Y ~ X1 + X2", data=df.dropna(subset=['f1']), vcov={"CRV1": "f1"})
fit6.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  CRV1
Observations:  997
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.890 |        0.262 |     3.393 |      0.002 |  0.353 |   1.426 |
| X1            |     -0.995 |        0.076 |   -13.142 |      0.000 | -1.150 |  -0.840 |
| X2            |     -0.177 |        0.020 |    -8.920 |      0.000 | -0.217 |  -0.136 |
---
RMSE: 2.091 R2: 0.177
which is equivalent to
reg Y X1 X2, vce(cluster f1)
* Linear regression                               Number of obs     =        997
*                                                 F(2, 29)          =     102.51
*                                                 Prob > F          =     0.0000
*                                                 R-squared         =     0.1774
*                                                 Root MSE          =      2.094
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9951969   .0757246   -13.14   0.000    -1.150071   -.8403227
*           X2 |  -.1766173    .019799    -8.92   0.000    -.2171109   -.1361237
*        _cons |   .8895569   .2622066     3.39   0.002     .3532841     1.42583
* ------------------------------------------------------------------------------
Note: clustered standard errors are not supported when the cluster variable contains missing values, which is why we drop the rows where f1 is missing.
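For intuition about what clustering changes, here is the analogous numpy sketch of the CRV1 sandwich, where the per-observation terms are replaced by per-cluster sums of scores and the Stata-style small sample factors are written out explicitly. Again, this illustrates the formulas rather than pyfixest’s internals.
import numpy as np
d = df.dropna(subset=["Y", "X1", "X2", "f1"])
X = np.column_stack([np.ones(len(d)), d["X1"], d["X2"]])
y = d["Y"].to_numpy()
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
Xe = X * (y - X @ (XtX_inv @ X.T @ y))[:, None]        # scores X_i * e_i
clusters = d["f1"].to_numpy()
G = len(np.unique(clusters))
S = sum(np.outer(s, s) for s in (Xe[clusters == g].sum(axis=0) for g in np.unique(clusters)))
vcov = (G / (G - 1)) * ((n - 1) / (n - k)) * XtX_inv @ S @ XtX_inv
np.sqrt(np.diag(vcov))                                 # should be close to fit6.se()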
For two-way clustering you would need to use
fit7 = pf.feols(
  "Y ~ X1 + X2",
  data=df.dropna(subset=['f1', 'f2']),
  vcov={"CRV1": "f1 + f2"}
)
fit7.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  CRV1
Observations:  997
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.890 |        0.301 |     2.958 |      0.006 |  0.274 |   1.505 |
| X1            |     -0.995 |        0.073 |   -13.570 |      0.000 | -1.145 |  -0.845 |
| X2            |     -0.177 |        0.023 |    -7.535 |      0.000 | -0.225 |  -0.129 |
---
RMSE: 2.091 R2: 0.177
Note: This will not exactly match the output of the equivalent Stata command, which is
reg Y X1 X2, vce(cluster f1 f2)
* Linear regression                                       Number of obs =    997
* Clusters per comb.:                                     Cluster comb. =      3
*   min =  30                                             F(2, 29)      =  88.58
*   avg = 211                                             Prob > F      = 0.0000
*   max = 572                                             R-squared     = 0.1774
*                                                         Adj R-squared = 0.1758
*                                                         Root MSE      = 2.0940
*
*                                   (Std. err. adjusted for multiway clustering)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9951969    .074771   -13.31   0.000    -1.148121   -.8422731
*           X2 |  -.1766173     .02376    -7.43   0.000     -.225212   -.1280226
*        _cons |   .8895569   .3016464     2.95   0.006     .2726208    1.506493
* ------------------------------------------------------------------------------
* Cluster combinations formed by f1 and f2.
This is because, by default, pyfixest uses a small sample correction that adjusts every clustering dimension by whichever dimension has the smallest number of clusters, while Stata’s default is to adjust each dimension by its own number of clusters. You can request the same correction as Stata through the ssc argument.
fit8 = pf.feols(
  "Y ~ X1 + X2",
  df.dropna(subset=['f1', 'f2']),
  vcov={"CRV1": "f1 + f2"},
  ssc=pf.ssc(cluster_df="conventional")
)
fit8.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: 0
Inference:  CRV1
Observations:  997
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.890 |        0.302 |     2.949 |      0.006 |  0.273 |   1.506 |
| X1            |     -0.995 |        0.075 |   -13.310 |      0.000 | -1.148 |  -0.842 |
| X2            |     -0.177 |        0.024 |    -7.433 |      0.000 | -0.225 |  -0.128 |
---
RMSE: 2.091 R2: 0.177
As a reminder, for an excellent breakdown of the small sample corrections in the pyfixest package, check out On Small Sample Corrections.
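If you want to inspect the two conventions side by side, you can reuse etable from above:
pf.report.etable([fit7, fit8]) # default "min" convention next to the Stata-style one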
Fixed Effect Regressions
To do a fixed effect regression with one fixed effect you could use
fit9 = pf.feols("Y ~ X1 + X2 | f1", data=df, vcov="iid")
fit9.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  iid
Observations:  997
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.950 |        0.066 |   -14.306 |      0.000 | -1.080 |  -0.819 |
| X2            |     -0.174 |        0.018 |    -9.902 |      0.000 | -0.209 |  -0.140 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239
which is equivalent to
xtset f1
xtreg Y X1 X2, fe
* Fixed-effects (within) regression               Number of obs     =        997
* Group variable: f1                              Number of groups  =         30
*
* R-squared:                                      Obs per group:
*      Within  = 0.2388                                         min =         23
*      Between = 0.0770                                         avg =       33.2
*      Overall = 0.1774                                         max =         48
*
*                                                 F(2, 965)         =     151.33
* corr(u_i, Xb) = 0.0268                          Prob > F          =     0.0000
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0663728   -14.31   0.000    -1.079777   -.8192739
*           X2 |  -.1742253   .0175957    -9.90   0.000    -.2087555   -.1396951
*        _cons |    .842222   .0872525     9.65   0.000     .6709955    1.013448
* -------------+----------------------------------------------------------------
*      sigma_u |  1.2570454
*      sigma_e |  1.6751049
*          rho |  .36026283   (fraction of variance due to u_i)
* ------------------------------------------------------------------------------
* F test that all u_i=0: F(29, 965) = 20.29                    Prob > F = 0.0000
or
reghdfe Y X1 X2, absorb(f1)
* HDFE Linear regression                            Number of obs   =        997
* Absorbing 1 HDFE group                            F(   2,    965) =     151.33
*                                                   Prob > F        =     0.0000
*                                                   R-squared       =     0.4890
*                                                   Adj R-squared   =     0.4726
*                                                   Within R-sq.    =     0.2388
*                                                   Root MSE        =     1.6751
*
* ------------------------------------------------------------------------------
*            Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0663728   -14.31   0.000    -1.079777   -.8192739
*           X2 |  -.1742253   .0175957    -9.90   0.000    -.2087555   -.1396951
*        _cons |    .842222   .0872525     9.65   0.000     .6709955    1.013448
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
*  Absorbed FE | Categories  - Redundant  = Num. Coefs |
* -------------+---------------------------------------|
*           f1 |        30           0          30     |
* -----------------------------------------------------+
Note: You need to specify vcov="iid" here. This is because pyfixest, like fixest, defaults to clustering standard errors by the first fixed effect. So
fit10 = pf.feols("Y ~ X1 + X2 | f1", data=df)
fit10.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV1
Observations:  997
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.950 |        0.067 |   -14.266 |      0.000 | -1.086 |  -0.813 |
| X2            |     -0.174 |        0.018 |    -9.464 |      0.000 | -0.212 |  -0.137 |
---
RMSE: 1.648 R2: 0.489 R2 Within: 0.239
is equivalent to
xtset f1
xtreg Y X1 X2, fe vce(cluster f1)
* Fixed-effects (within) regression               Number of obs     =        997
* Group variable: f1                              Number of groups  =         30
*
* R-squared:                                      Obs per group:
*      Within  = 0.2388                                         min =         23
*      Between = 0.0770                                         avg =       33.2
*      Overall = 0.1774                                         max =         48
*
*                                                 F(2, 29)          =     146.33
* corr(u_i, Xb) = 0.0268                          Prob > F          =     0.0000
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0665572   -14.27   0.000     -1.08565   -.8134009
*           X2 |  -.1742253    .018409    -9.46   0.000    -.2118759   -.1365746
*        _cons |    .842222   .0694639    12.12   0.000     .7001523    .9842916
* -------------+----------------------------------------------------------------
*      sigma_u |  1.2570454
*      sigma_e |  1.6751049
*          rho |  .36026283   (fraction of variance due to u_i)
* ------------------------------------------------------------------------------
or
reghdfe Y X1 X2, absorb(f1) cluster(f1)
* HDFE Linear regression                            Number of obs   =        997
* Absorbing 1 HDFE group                            F(   2,     29) =     146.33
* Statistics robust to heteroskedasticity           Prob > F        =     0.0000
*                                                   R-squared       =     0.4890
*                                                   Adj R-squared   =     0.4726
*                                                   Within R-sq.    =     0.2388
* Number of clusters (f1)      =         30         Root MSE        =     1.6751
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9495256   .0665572   -14.27   0.000     -1.08565   -.8134009
*           X2 |  -.1742253    .018409    -9.46   0.000    -.2118759   -.1365746
*        _cons |    .842222   .0694639    12.12   0.000     .7001523    .9842916
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
*  Absorbed FE | Categories  - Redundant  = Num. Coefs |
* -------------+---------------------------------------|
*           f1 |        30          30           0    *|
* -----------------------------------------------------+
* * = FE nested within cluster; treated as redundant for DoF computation
For multiple fixed effects you could do
fit11 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=df)
fit11.summary()
###
Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997
| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.924 |        0.062 |   -14.934 |      0.000 | -1.051 |  -0.797 |
| X2            |     -0.174 |        0.015 |   -11.737 |      0.000 | -0.204 |  -0.144 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303
which is equivalent to
reghdfe Y X1 X2, absorb(f1 f2) cluster(f1)
* HDFE Linear regression                            Number of obs   =        997
* Absorbing 2 HDFE groups                           F(   2,     29) =     182.76
* Statistics robust to heteroskedasticity           Prob > F        =     0.0000
*                                                   R-squared       =     0.6590
*                                                   Adj R-squared   =     0.6372
*                                                   Within R-sq.    =     0.3026
* Number of clusters (f1)      =         30         Root MSE        =     1.3893
*
*                                     (Std. err. adjusted for 30 clusters in f1)
* ------------------------------------------------------------------------------
*              |               Robust
*            Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
* -------------+----------------------------------------------------------------
*           X1 |  -.9240462   .0618743   -14.93   0.000    -1.050593   -.7974991
*           X2 |  -.1741073   .0148338   -11.74   0.000    -.2044458   -.1437689
*        _cons |   .8156588    .064596    12.63   0.000     .6835452    .9477723
* ------------------------------------------------------------------------------
*
* Absorbed degrees of freedom:
* -----------------------------------------------------+
*  Absorbed FE | Categories  - Redundant  = Num. Coefs |
* -------------+---------------------------------------|
*           f1 |        30          30           0    *|
*           f2 |        30           1          29     |
* -----------------------------------------------------+
* * = FE nested within cluster; treated as redundant for DoF computation
Note: By default pyfixest will still cluster standard errors only by the first fixed effect specified. You can change this through the vcov argument; see the fit8 example above for how to specify two-way clustering.
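For example, a regression with two fixed effects and two-way clustered standard errors would combine the pieces above (a sketch, reusing the dropna pattern from the clustering section):
fit12 = pf.feols(
  "Y ~ X1 + X2 | f1 + f2",
  data=df.dropna(subset=['f1', 'f2']),
  vcov={"CRV1": "f1 + f2"}
)
fit12.summary()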