Setup
import numpy as np
import pyfixest as pf
data = pf.get_data()
data.head()
|   | Y | Y2 | X1 | X2 | f1 | f2 | f3 | group_id | Z1 | Z2 | weights |
|--:|----------:|----------:|----:|----------:|-----:|-----:|-----:|---------:|----------:|----------:|---------:|
| 0 | NaN | 2.357103 | 0.0 | 0.457858 | 15.0 | 0.0 | 7.0 | 9.0 | -0.330607 | 1.054826 | 0.661478 |
| 1 | -1.458643 | 5.163147 | NaN | -4.998406 | 6.0 | 21.0 | 4.0 | 8.0 | NaN | -4.113690 | 0.772732 |
| 2 | 0.169132 | 0.751140 | 2.0 | 1.558480 | NaN | 1.0 | 7.0 | 16.0 | 1.207778 | 0.465282 | 0.990929 |
| 3 | 3.319513 | -2.656368 | 1.0 | 1.560402 | 1.0 | 10.0 | 11.0 | 3.0 | 2.869997 | 0.467570 | 0.021123 |
| 4 | 0.134420 | -1.866416 | 2.0 | -3.472232 | 19.0 | 20.0 | 6.0 | 14.0 | 0.835819 | -3.115669 | 0.790815 |
PyFixest specifies regression models via Wilkinson formulas, implemented through the formulaic package. Wilkinson formulas should be familiar to you if you have used R's lm() or the statsmodels formula API. Many additional ideas implemented in PyFixest were first developed in the fixest R package (most notably the multiple estimation syntax, the i() operator, and sample splitting). By default, all formula options presented here are supported by all models available via the pf.feols(), pf.feglm(), and pf.fepois() APIs.
Basic Syntax
In the simplest case, we regress Y on the covariates X1 and X2.
fit1 = pf.feols("Y ~ X1 + X2", data=data)
fit1.summary()
###
Estimation: OLS
Dep. var.: Y
sample: None = all
Inference: iid
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.889 | 0.108 | 8.197 | 0.000 | 0.676 | 1.102 |
| X1 | -0.993 | 0.082 | -12.092 | 0.000 | -1.154 | -0.832 |
| X2 | -0.176 | 0.022 | -8.102 | 0.000 | -0.219 | -0.134 |
---
RMSE: 2.09 R2: 0.177
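Under the hood, feols solves an ordinary least squares problem. As a point of reference, here is a minimal pure-Python sketch (not PyFixest's implementation) of the textbook formulas for the single-covariate special case:

```python
# Minimal sketch of single-covariate OLS:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
# PyFixest solves the general multi-covariate problem; this is only the
# one-regressor special case, shown for intuition.
def ols_simple(y, x):
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    alpha = my - beta * mx
    return alpha, beta
```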
All transformations supported by formulaic are also supported by PyFixest. To name just a few important ones, you can create categorical variables via the C() operator:
fit2 = pf.feols("Y ~ X1 + X2 + C(f1)", data=data)
You can interact variables via the * and : operators (: adds only the interaction term, while * additionally adds the main effects):
fit3 = pf.feols("Y ~ X1:X2", data=data)
fit4 = pf.feols("Y ~ X1*X2", data=data)
pf.etable([fit3, fit4])
Dep. var.: Y

|              | (1)            | (2)            |
|:-------------|:---------------|:---------------|
| X1 × X2      | -0.099 (0.018) | 0.02 (0.027)   |
| X1           |                | -0.992 (0.082) |
| X2           |                | -0.197 (0.036) |
| Intercept    | -0.136 (0.072) | 0.888 (0.108)  |
| Observations | 998            | 998            |
| R2           | 0.031          | 0.177          |
Format of coefficient cell: Coefficient (Std. Error)
To take the logarithm of a variable, just use
fit5 = pf.feols("Y ~ log(X1)", data=data)
or use any numpy transformation, e.g.
fit5 = pf.feols("Y ~ X1 + np.power(X1,2)", data=data)
Note: for the logarithm, we suggest not relying on np.log but using the built-in log operator instead.
Fixed Effects Syntax
We can add fixed effects after the | operator; here we add the two fixed effects f1 and f2.
fit6 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=data)
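For intuition: absorbing a fixed effect is numerically equivalent to demeaning each variable within the levels of the factor (the "within transformation"). A minimal sketch of that idea, not PyFixest's actual implementation:

```python
# Sketch of the within transformation: subtract the group mean of x for
# each level of the fixed effect. With a single factor, OLS on the
# demeaned variables reproduces the fixed-effects estimates.
def demean(x, groups):
    sums, counts = {}, {}
    for g, v in zip(groups, x):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, x)]
```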
We can interact two fixed effects via the ^ operator.
fit7 = pf.feols("Y ~ X1 + X2 | f1^f2", data=data)
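Conceptually, f1^f2 creates a single fixed effect whose levels are the pairs of levels of f1 and f2. A sketch of the equivalent manual construction (with hypothetical level values):

```python
# f1^f2 behaves like one combined factor built from the level pairs:
# each distinct string below is one level of the interacted fixed effect.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
combined = [f"{a}^{b}" for a, b in zip(f1, f2)]
```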
For details on fixed effects regression, take a look at the OLS with Fixed Effects vignette.
Instrumental Variables (IV) Syntax
For IV estimation, PyFixest uses a three-part formula syntax:
"Y ~ exogenous_controls | fixed_effects | endogenous ~ instruments"
Here is a minimal example with fixed effects:
fit_iv = pf.feols("Y ~ X2 | f1 + f2 | X1 ~ Z1", data=data)
fit_iv.summary()
###
Estimation: IV
Dep. var.: Y, Fixed effects: f1 + f2
sample: None = all
Inference: iid
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X2 | -0.174 | 0.015 | -11.701 | 0.000 | -0.204 | -0.145 |
| X1 | -1.050 | 0.089 | -11.793 | 0.000 | -1.225 | -0.875 |
---
For details on IV estimation, take a look at the Instrumental Variables vignette.
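For intuition, in the just-identified case (one endogenous regressor, one instrument, no controls) the IV estimate reduces to a ratio of covariances. A pure-Python sketch, not PyFixest's implementation, which handles the general 2SLS case:

```python
# Just-identified IV estimator: beta_IV = cov(z, y) / cov(z, x).
def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def iv_slope(y, x, z):
    # The instrument z must be correlated with x (relevance) and
    # uncorrelated with the error term (exogeneity).
    return cov(z, y) / cov(z, x)
```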
The i() operator for categorical variables and interactions
For categorical variables and their interactions, PyFixest includes a specialised i() operator.
If you simply wrap a variable into i(), it is treated just like the C() operator (see above).
fit_i = pf.feols("Y ~ i(f1)", data=data)
fit_c = pf.feols("Y ~ C(f1)", data=data)
But overall, i() is more powerful than C(). Most importantly, you can easily set the reference level of the categorical variable:
# set 1 as the reference level of f1
fit_i1 = pf.feols("Y ~ i(f1, ref = 1)", data=data)
You can also easily interact variables:
# interact f1 and f2
fit_i2 = pf.feols("Y ~ i(f1, f2)", data=data)
and set reference levels for both via the ref and ref2 arguments:
# set 1 as the reference level of f1 and 2 as the reference level of f2
fit_i3 = pf.feols("Y ~ i(f1, f2, ref = 1, ref2 = 2)", data=data)
This is particularly useful for difference-in-differences models.
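One way to think about i(f1, f2, ref = 1, ref2 = 2): it creates one dummy per pair of levels of the two variables, dropping any pair that involves a reference level. A sketch with hypothetical levels (this illustrates the bookkeeping, not PyFixest's internal encoding):

```python
# One dummy per (f1, f2) level pair, dropping pairs at a reference level.
f1_levels, f2_levels = [1, 2, 3], [1, 2]
ref, ref2 = 1, 2
pairs = [
    (a, b)
    for a in f1_levels
    for b in f2_levels
    if a != ref and b != ref2
]
```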
Last, you can bin levels of a variable via the bin argument. This groups multiple levels into a single category.
fit_bin = pf.feols(
    "Y ~ i(f1, bin={'low': list(range(0, 10)), 'mid': list(range(10, 20)), 'high': list(range(20, 30))}, ref='low')",
    data=data,
)
fit_bin.summary()
###
Estimation: OLS
Dep. var.: Y
sample: None = all
Inference: iid
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | -0.473 | 0.122 | -3.887 | 0.000 | -0.712 | -0.234 |
| f1::high | 0.110 | 0.174 | 0.630 | 0.529 | -0.232 | 0.451 |
| f1::mid | 0.968 | 0.176 | 5.503 | 0.000 | 0.623 | 1.313 |
---
RMSE: 2.264 R2: 0.035
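The bin argument is, conceptually, just a mapping from original levels to coarser groups applied before the dummies are created. A sketch of that relabeling:

```python
# Map each original level of f1 to its bin label; the binned labels then
# play the role of the factor levels.
bins = {"low": range(0, 10), "mid": range(10, 20), "high": range(20, 30)}
level_to_bin = {lvl: name for name, lvls in bins.items() for lvl in lvls}
f1 = [3, 12, 25]
binned = [level_to_bin[v] for v in f1]
```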
Multiple Estimation Syntax
Last, PyFixest provides syntactic sugar to fit multiple estimations in one go. This not only economizes on lines of code, but also allows for performance optimizations via caching: if you fit many regression models with a shared set of fixed effects and many overlapping covariates or dependent variables, and performance is poor, we highly recommend trying out the multiple estimation syntax.
For multiple estimations, we provide 5 custom operators: sw, csw, sw0, csw0 and mvsw. In addition, it is possible to specify multiple dependent variables.
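To build intuition for the sw, sw0, csw, and csw0 operators before the examples below, here is a pure-Python sketch of the expansion logic (not PyFixest's parser):

```python
def expand_stepwise(base, terms, cumulative=False, zero=False):
    """Sketch of stepwise expansion: returns the covariate set of each model.

    cumulative=False -> sw,  cumulative=True -> csw;
    zero=True additionally includes the base-only model (sw0 / csw0).
    """
    specs = [list(base)] if zero else []
    for i, term in enumerate(terms):
        extra = terms[: i + 1] if cumulative else [term]
        specs.append(list(base) + extra)
    return specs
```

For example, expand_stepwise(["x1"], ["x2", "x3"]) mirrors "y ~ x1 + sw(x2, x3)".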
Multiple dependent variables
Multiple depvars are expanded to multiple estimations: "Y1 + Y2 ~ X1" behaves like "sw(Y1, Y2) ~ X1".
fit_multi_dep = pf.feols("Y + Y2 ~ X1 + X2", data=data)
pf.etable(fit_multi_dep)
|              | Y (1)          | Y2 (2)         |
|:-------------|:---------------|:---------------|
| X1           | -0.993 (0.082) | -1.316 (0.214) |
| X2           | -0.176 (0.022) | -0.133 (0.057) |
| Intercept    | 0.889 (0.108)  | 1.042 (0.283)  |
| Observations | 998            | 999            |
| R2           | 0.177          | 0.042          |
Format of coefficient cell: Coefficient (Std. Error)
sw(): stepwise alternatives
y ~ x1 + sw(x2, x3) expands to y ~ x1 + x2 and y ~ x1 + x3.
fit_sw = pf.feols("Y ~ X1 + sw(X2, Z1)", data=data)
pf.etable(fit_sw)
Dep. var.: Y

|              | (1)            | (2)            |
|:-------------|:---------------|:---------------|
| X1           | -0.993 (0.082) | -0.991 (0.109) |
| X2           | -0.176 (0.022) |                |
| Z1           |                | -0.009 (0.068) |
| Intercept    | 0.889 (0.108)  | 0.918 (0.112)  |
| Observations | 998            | 998            |
| R2           | 0.177          | 0.123          |
Format of coefficient cell: Coefficient (Std. Error)
sw0(): stepwise with zero step
y ~ x1 + sw0(x2, x3) expands to y ~ x1, y ~ x1 + x2, and y ~ x1 + x3.
fit_sw0 = pf.feols("Y ~ X1 + sw0(X2, Z1)", data=data)
pf.etable(fit_sw0)
Dep. var.: Y

|              | (1)            | (2)            | (3)            |
|:-------------|:---------------|:---------------|:---------------|
| X1           | -1.000 (0.085) | -0.993 (0.082) | -0.991 (0.109) |
| X2           |                | -0.176 (0.022) |                |
| Z1           |                |                | -0.009 (0.068) |
| Intercept    | 0.919 (0.112)  | 0.889 (0.108)  | 0.918 (0.112)  |
| Observations | 998            | 998            | 998            |
| R2           | 0.123          | 0.177          | 0.123          |
Format of coefficient cell: Coefficient (Std. Error)
csw(): cumulative stepwise
y ~ x1 + csw(x2, x3) expands to y ~ x1 + x2 and y ~ x1 + x2 + x3.
fit_csw = pf.feols("Y ~ X1 + csw(X2, Z1)", data=data)
pf.etable(fit_csw)
Dep. var.: Y

|              | (1)            | (2)            |
|:-------------|:---------------|:---------------|
| X1           | -0.993 (0.082) | -1.010 (0.106) |
| X2           | -0.176 (0.022) | -0.177 (0.022) |
| Z1           |                | 0.017 (0.066)  |
| Intercept    | 0.889 (0.108)  | 0.889 (0.108)  |
| Observations | 998            | 998            |
| R2           | 0.177          | 0.177          |
Format of coefficient cell: Coefficient (Std. Error)
csw0(): cumulative stepwise with zero step
y ~ x1 + csw0(x2, x3) expands to y ~ x1, y ~ x1 + x2, and y ~ x1 + x2 + x3.
fit_csw0 = pf.feols("Y ~ X1 + csw0(X2, Z1)", data=data)
pf.etable(fit_csw0)
Dep. var.: Y

|              | (1)            | (2)            | (3)            |
|:-------------|:---------------|:---------------|:---------------|
| X1           | -1.000 (0.085) | -0.993 (0.082) | -1.010 (0.106) |
| X2           |                | -0.176 (0.022) | -0.177 (0.022) |
| Z1           |                |                | 0.017 (0.066)  |
| Intercept    | 0.919 (0.112)  | 0.889 (0.108)  | 0.889 (0.108)  |
| Observations | 998            | 998            | 998            |
| R2           | 0.123          | 0.177          | 0.177          |
Format of coefficient cell: Coefficient (Std. Error)
mvsw(): multiverse stepwise
y ~ mvsw(x1, x2, x3) expands to all non-empty combinations plus the zero step: y ~ 1, y ~ x1, y ~ x2, y ~ x3, y ~ x1 + x2, y ~ x1 + x3, y ~ x2 + x3, y ~ x1 + x2 + x3.
fit_mvsw = pf.feols("Y ~ mvsw(X1, X2, Z1)", data=data)
pf.etable(fit_mvsw)
Dep. var.: Y

|              | (1)            | (2)            | (3)            | (4)            | (5)            | (6)            | (7)            | (8)            |
|:-------------|:---------------|:---------------|:---------------|:---------------|:---------------|:---------------|:---------------|:---------------|
| X1           |                | -1.000 (0.085) |                |                | -0.993 (0.082) | -0.991 (0.109) |                | -1.010 (0.106) |
| X2           |                |                | -0.178 (0.023) |                | -0.176 (0.022) |                | -0.172 (0.023) | -0.177 (0.022) |
| Z1           |                |                |                | -0.396 (0.054) |                | -0.009 (0.068) | -0.378 (0.053) | 0.017 (0.066)  |
| Intercept    | -0.127 (0.073) | 0.919 (0.112)  | -0.15 (0.071)  | 0.286 (0.091)  | 0.889 (0.108)  | 0.918 (0.112)  | 0.246 (0.089)  | 0.889 (0.108)  |
| Observations | 999            | 998            | 999            | 998            | 998            | 998            | 998            | 998            |
| R2           | 0              | 0.123          | 0.055          | 0.05           | 0.177          | 0.123          | 0.102          | 0.177          |
Format of coefficient cell: Coefficient (Std. Error)
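The mvsw expansion is the power set of the listed terms. A sketch with itertools, not PyFixest's parser:

```python
from itertools import combinations

def expand_mvsw(terms):
    # All subsets of the terms, from the empty (intercept-only) model up
    # to the full model with every term included.
    specs = [[]]
    for k in range(1, len(terms) + 1):
        specs.extend(list(c) for c in combinations(terms, k))
    return specs
```

With three terms this yields 2^3 = 8 specifications, matching the eight columns above.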
Combining operators
Multiple estimation operators can be combined. For example, y ~ csw(x1, x2) + sw(z1, z2) expands to y ~ x1 + z1, y ~ x1 + z2, y ~ x1 + x2 + z1, y ~ x1 + x2 + z2.
fit_combo = pf.feols("Y ~ csw(X1, X2) + sw(Z1, X1:Z1)", data=data)
pf.etable(fit_combo)
Dep. var.: Y

|              | (1)            | (2)            | (3)            | (4)            |
|:-------------|:---------------|:---------------|:---------------|:---------------|
| X1           | -0.991 (0.109) | -1.014 (0.13)  | -1.010 (0.106) | -1.041 (0.126) |
| Z1           | -0.009 (0.068) |                | 0.017 (0.066)  |                |
| X1 × Z1      |                | 0.007 (0.049)  |                | 0.024 (0.047)  |
| X2           |                |                | -0.177 (0.022) | -0.177 (0.022) |
| Intercept    | 0.918 (0.112)  | 0.921 (0.113)  | 0.889 (0.108)  | 0.897 (0.11)   |
| Observations | 998            | 998            | 998            | 998            |
| R2           | 0.123          | 0.123          | 0.177          | 0.177          |
Format of coefficient cell: Coefficient (Std. Error)
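The combined expansion is the Cartesian product of the individual expansions. A sketch for the y ~ csw(x1, x2) + sw(z1, z2) example, not PyFixest's parser:

```python
from itertools import product

# Each operator expands on its own; combining them takes every pairing
# of a csw step with an sw alternative.
csw_part = [["x1"], ["x1", "x2"]]
sw_part = [["z1"], ["z2"]]
specs = [a + b for a, b in product(csw_part, sw_part)]
```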
Regressions on Multiple Samples
Via the split and fsplit arguments, you can easily estimate identical models on different subsamples.
split estimates a separate model for each subgroup.
fsplit does the same, but additionally keeps the full-sample fit.
fit_split = pf.feols("Y ~ X1 + X2 | f1", data=data, split="f2")
pf.etable(fit_split)
Dep. var.: Y

| Sample | X1             | X2             | f1 FE | Observations | R2    |
|:-------|:---------------|:---------------|:-----:|-------------:|------:|
| (1)    | -2.177 (0.638) | -0.145 (0.065) | x     | 24           | 0.924 |
| (2)    | -0.801 (0.466) | -0.106 (0.134) | x     | 32           | 0.788 |
| (3)    | 0.495 (0.344)  | -0.022 (0.093) | x     | 10           | 0.975 |
| (4)    | -2.044 (0.482) | -0.177 (0.144) | x     | 27           | 0.858 |
| (5)    | -0.519 (0.785) | -0.096 (0.212) | x     | 19           | 0.749 |
| (6)    | -0.974 (0.312) | -0.213 (0.086) | x     | 28           | 0.781 |
| (7)    | 0.056 (0.505)  | -0.23 (0.128)  | x     | 29           | 0.715 |
| (8)    | -0.222 (1.151) | -0.206 (0.248) | x     | 14           | 0.6   |
| (9)    | -0.69 (0.46)   | -0.135 (0.123) | x     | 18           | 0.754 |
| (10)   | 0.351 (0.42)   | -0.061 (0.099) | x     | 24           | 0.782 |
| (11)   | -0.986 (0.458) | -0.242 (0.102) | x     | 36           | 0.696 |
| (12)   | -0.466 (0.472) | -0.224 (0.149) | x     | 14           | 0.673 |
| (13)   | -0.879 (0.509) | -0.139 (0.115) | x     | 35           | 0.658 |
| (14)   | -1.851 (1.072) | -0.218 (0.194) | x     | 9            | 0.954 |
| (15)   | -2.697 (1.051) | -0.24 (0.147)  | x     | 20           | 0.798 |
| (16)   | -1.532 (0.387) | -0.182 (0.132) | x     | 30           | 0.834 |
| (17)   | -1.274 (0.329) | -0.093 (0.076) | x     | 24           | 0.8   |
| (18)   | -1.120 (0.838) | -0.254 (0.221) | x     | 19           | 0.51  |
| (19)   | -0.937 (0.529) | -0.171 (0.104) | x     | 23           | 0.841 |
| (20)   | -1.012 (0.531) | -0.189 (0.14)  | x     | 27           | 0.735 |
| (21)   | -1.315 (0.37)  | -0.102 (0.097) | x     | 23           | 0.598 |
| (22)   | -1.137 (0.527) | -0.323 (0.188) | x     | 23           | 0.771 |
| (23)   | -1.033 (0.447) | -0.033 (0.132) | x     | 25           | 0.597 |
| (24)   | -1.700 (0.5)   | -0.235 (0.146) | x     | 18           | 0.877 |
| (25)   | -0.43 (0.303)  | -0.059 (0.09)  | x     | 24           | 0.829 |
| (26)   | -1.065 (0.539) | -0.326 (0.093) | x     | 34           | 0.668 |
| (27)   | -0.065 (0.674) | 0.099 (0.185)  | x     | 16           | 0.514 |
| (28)   | -0.575 (0.412) | 0.03 (0.148)   | x     | 26           | 0.761 |
| (29)   | -0.659 (0.48)  | -0.256 (0.173) | x     | 22           | 0.788 |
| (30)   | -0.845 (0.411) | -0.026 (0.112) | x     | 35           | 0.654 |
Format of coefficient cell: Coefficient (Std. Error)
fit_fsplit = pf.feols("Y ~ X1 + X2 | f1", data=data, fsplit="f2")
pf.etable(fit_fsplit)
Dep. var.: Y

| Sample          | X1             | X2             | f1 FE | Observations | R2    |
|:----------------|:---------------|:---------------|:-----:|-------------:|------:|
| (1) full sample | -0.95 (0.066)  | -0.174 (0.018) | x     | 997          | 0.489 |
| (2)             | -2.177 (0.638) | -0.145 (0.065) | x     | 24           | 0.924 |
| (3)             | -0.801 (0.466) | -0.106 (0.134) | x     | 32           | 0.788 |
| (4)             | 0.495 (0.344)  | -0.022 (0.093) | x     | 10           | 0.975 |
| (5)             | -2.044 (0.482) | -0.177 (0.144) | x     | 27           | 0.858 |
| (6)             | -0.519 (0.785) | -0.096 (0.212) | x     | 19           | 0.749 |
| (7)             | -0.974 (0.312) | -0.213 (0.086) | x     | 28           | 0.781 |
| (8)             | 0.056 (0.505)  | -0.23 (0.128)  | x     | 29           | 0.715 |
| (9)             | -0.222 (1.151) | -0.206 (0.248) | x     | 14           | 0.6   |
| (10)            | -0.69 (0.46)   | -0.135 (0.123) | x     | 18           | 0.754 |
| (11)            | 0.351 (0.42)   | -0.061 (0.099) | x     | 24           | 0.782 |
| (12)            | -0.986 (0.458) | -0.242 (0.102) | x     | 36           | 0.696 |
| (13)            | -0.466 (0.472) | -0.224 (0.149) | x     | 14           | 0.673 |
| (14)            | -0.879 (0.509) | -0.139 (0.115) | x     | 35           | 0.658 |
| (15)            | -1.851 (1.072) | -0.218 (0.194) | x     | 9            | 0.954 |
| (16)            | -2.697 (1.051) | -0.24 (0.147)  | x     | 20           | 0.798 |
| (17)            | -1.532 (0.387) | -0.182 (0.132) | x     | 30           | 0.834 |
| (18)            | -1.274 (0.329) | -0.093 (0.076) | x     | 24           | 0.8   |
| (19)            | -1.120 (0.838) | -0.254 (0.221) | x     | 19           | 0.51  |
| (20)            | -0.937 (0.529) | -0.171 (0.104) | x     | 23           | 0.841 |
| (21)            | -1.012 (0.531) | -0.189 (0.14)  | x     | 27           | 0.735 |
| (22)            | -1.315 (0.37)  | -0.102 (0.097) | x     | 23           | 0.598 |
| (23)            | -1.137 (0.527) | -0.323 (0.188) | x     | 23           | 0.771 |
| (24)            | -1.033 (0.447) | -0.033 (0.132) | x     | 25           | 0.597 |
| (25)            | -1.700 (0.5)   | -0.235 (0.146) | x     | 18           | 0.877 |
| (26)            | -0.43 (0.303)  | -0.059 (0.09)  | x     | 24           | 0.829 |
| (27)            | -1.065 (0.539) | -0.326 (0.093) | x     | 34           | 0.668 |
| (28)            | -0.065 (0.674) | 0.099 (0.185)  | x     | 16           | 0.514 |
| (29)            | -0.575 (0.412) | 0.03 (0.148)   | x     | 26           | 0.761 |
| (30)            | -0.659 (0.48)  | -0.256 (0.173) | x     | 22           | 0.788 |
| (31)            | -0.845 (0.411) | -0.026 (0.112) | x     | 35           | 0.654 |
Format of coefficient cell: Coefficient (Std. Error)