Estimate a linear regression models with fixed effects using fixest formula syntax.
Parameters
Name
Type
Description
Default
fml
str
A three-sided formula string using fixest formula syntax. Syntax: “Y ~ X1 + X2 | FE1 + FE2 | X1 ~ Z1”. “|” separates dependent variable, fixed effects, and instruments. Special syntax includes stepwise regressions, cumulative stepwise regression, multiple dependent variables, interaction of variables (i(X1,X2)), and interacted fixed effects (fe1^fe2).
required
data
DataFrameType
A pandas or polars dataframe containing the variables in the formula.
required
vcov
Union[str, dict[str, str]]
Type of variance-covariance matrix for inference. Options include “iid”, “hetero”, “HC1”, “HC2”, “HC3”, or a dictionary for CRV1/CRV3 inference.
None
weights
Union[None, str], optional.
Default is None. Weights for WLS estimation. If None, all observations are weighted equally. If a string, the name of the column in data that contains the weights.
None
ssc
str
A ssc object specifying the small sample correction for inference.
ssc()
fixef_rm
str
Specifies whether to drop singleton fixed effects. Options: “none” (default), “singleton”.
'none'
collin_tol
float
Tolerance for collinearity check, by default 1e-10.
1e-10
fixef_tol
Tolerance for the fixed effects demeaning algorithm. Defaults to 1e-08.
1e-08
drop_intercept
bool
Whether to drop the intercept from the model, by default False.
False
i_ref1
Deprecated with pyfixest version 0.18.0. Please use i-syntax instead, i.e. feols(‘Y~ i(f1, ref=1)’, data = data) instead of the former feols(‘Y~ i(f1)’, data = data, i_ref=1).
None
copy_data
bool
Whether to copy the data before estimation, by default True. If set to False, the data is not copied, which can save memory but may lead to unintended changes in the input data outside of fepois. For example, the input data set is re-index within the function. As far as I know, the only other relevant case is when using interacted fixed effects, in which case you’ll find a column with interacted fixed effects in the data set.
True
store_data
bool
Whether to store the data in the model object, by default True. If set to False, the data is not stored in the model object, which can improve performance and save memory. However, it will no longer be possible to access the data via the data attribute of the model object. This has impact on post-estimation capabilities that rely on the data, e.g. predict() or vcov().
True
weights_type
str
Options include aweights or fweights. aweights implement analytic or precision weights, while fweights implement frequency weights. For details see this blog post: https://notstatschat.rbind.io/2020/08/04/weights-in-statistics/.
'aweights'
Returns
Type
Description
object
An instance of the [Feols(/reference/Feols.qmd) class or FixestMulti class for multiple models specified via fml.
Examples
As in fixest, the [Feols(/reference/Feols.qmd) function can be used to estimate a simple linear regression model with fixed effects. The following example regresses Y on X1 and X2 with fixed effects for f1 and f2: fixed effects are specified after the | symbol.
import pyfixest as pfdata = pf.get_data()fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)fit.summary()
Calling feols() returns an instance of the [Feols(/reference/Feols.qmd) class. The summary() method can be used to print the results.
An alternative way to retrieve model results is via the tidy() method, which returns a pandas dataframe with the estimated coefficients, standard errors, t-statistics, and p-values.
fit.tidy()
Estimate
Std. Error
t value
Pr(>|t|)
2.5%
97.5%
Coefficient
X1
-0.924046
0.060934
-15.164621
2.664535e-15
-1.048671
-0.799421
X2
-0.174107
0.014608
-11.918277
1.069367e-12
-0.203985
-0.144230
You can also access all elements in the tidy data frame by dedicated methods, e.g. fit.coef() for the coefficients, fit.se() for the standard errors, fit.tstat() for the t-statistics, and fit.pval() for the p-values, and fit.confint() for the confidence intervals.
The employed type of inference can be specified via the vcov argument. If vcov is not provided, PyFixest employs the fixest default of iid inference, unless there are fixed effects in the model, in which case feols() clusters the standard error by the first fixed effect (CRV1 inference).
Supported inference types are “iid”, “hetero”, “HC1”, “HC2”, “HC3”, and “CRV1”/“CRV3”. Clustered standard errors are specified via a dictionary, e.g. {"CRV1": "f1"} for CRV1 inference with clustering by f1 or {"CRV3": "f1"} for CRV3 inference with clustering by f1. For two-way clustering, you can provide a formula string, e.g. {"CRV1": "f1 + f2"} for CRV1 inference with clustering by f1.
The ssc argument specifies the small sample correction for inference. In general, feols() uses all of fixest::feols() defaults, but sets the fixef.K argument to "none" whereas the fixest::feols() default is "nested". See here for more details: link to github.
feols() supports a range of multiple estimation syntax, i.e. you can estimate multiple models in one call. The following example estimates two models, one with fixed effects for f1 and one with fixed effects for f2 using the sw() syntax.
The returned object is an instance of the FixestMulti class. You can access the results of the first model via fit.fetch_model(0) and the results of the second model via fit.fetch_model(1). You can compare the model results via the etable() function:
Model: Y~X1+X2|f1
Model: Y~X1+X2|f2
est1 est2
------------ ----------------- -----------------
depvar Y Y
--------------------------------------------------
X1 -0.950*** (0.067) -0.979*** (0.077)
X2 -0.174*** (0.018) -0.175*** (0.022)
--------------------------------------------------
f1 x -
f2 - x
--------------------------------------------------
R2 0.489 0.354
S.E. type by: f1 by: f2
Observations 997 998
--------------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient (Std. Error)
Other supported multiple estimation syntax include sw0(), csw() and csw0(). While sw() adds variables in a “stepwise” fashion, csw() does so cumulatively.
Model: Y~X1+X2|f1
Model: Y~X1+X2|f1+f2
est1 est2
------------ ----------------- -----------------
depvar Y Y
--------------------------------------------------
X1 -0.950*** (0.067) -0.924*** (0.061)
X2 -0.174*** (0.018) -0.174*** (0.015)
--------------------------------------------------
f1 x x
f2 - x
--------------------------------------------------
R2 0.489 0.659
S.E. type by: f1 by: f1
Observations 997 997
--------------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient (Std. Error)
The sw0() and csw0() syntax are similar to sw() and csw(), but start with a model that excludes the variables specified in sw() and csw():
Model: Y~X1+X2
Model: Y~X1+X2|f1
Model: Y~X1+X2|f2
est1 est2 est3
------------ ----------------- ----------------- -----------------
depvar Y Y Y
---------------------------------------------------------------------
Intercept 0.889*** (0.108)
X1 -0.993*** (0.082) -0.950*** (0.067) -0.979*** (0.077)
X2 -0.176*** (0.022) -0.174*** (0.018) -0.175*** (0.022)
---------------------------------------------------------------------
f1 - x -
f2 - - x
---------------------------------------------------------------------
R2 0.177 0.489 0.354
S.E. type iid by: f1 by: f2
Observations 998 997 998
---------------------------------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient (Std. Error)
The feols() function also supports multiple dependent variables. The following example estimates two models, one with Y1 as the dependent variable and one with Y2 as the dependent variable.
fit = pf.feols("Y + Y2 ~ X1 | f1 + f2", data)pf.etable([fit.fetch_model(0), fit.fetch_model(1)])
Model: Y~X1|f1+f2
Model: Y2~X1|f1+f2
est1 est2
------------ ----------------- -----------------
depvar Y Y2
--------------------------------------------------
X1 -0.919*** (0.065) -1.228*** (0.195)
--------------------------------------------------
f1 x x
f2 x x
--------------------------------------------------
R2 0.609 0.168
S.E. type by: f1 by: f1
Observations 997 998
--------------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient (Std. Error)
It is possible to combine different multiple estimation operators:
Model: Y~X1|f1
Model: Y2~X1|f1
Model: Y~X1|f2
Model: Y2~X1|f2
est1 est2 est3 est4
------------ ----------------- ----------------- ----------------- -----------------
depvar Y Y2 Y Y2
----------------------------------------------------------------------------------------
X1 -0.949*** (0.069) -1.266*** (0.176) -0.982*** (0.081) -1.301*** (0.205)
----------------------------------------------------------------------------------------
f1 x x - -
f2 - - x x
----------------------------------------------------------------------------------------
R2 0.437 0.115 0.302 0.090
S.E. type by: f1 by: f1 by: f2 by: f2
Observations 997 998 998 999
----------------------------------------------------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient (Std. Error)
In general, using muliple estimation syntax can improve the estimation time as covariates that are demeaned in one model and are used in another model do not need to be demeaned again: feols() implements a caching mechanism that stores the demeaned covariates.
Besides OLS, feols() also supports IV estimation via three part formulas:
fit = pf.feols("Y ~ X2 | f1 + f2 | X1 ~ Z1", data)fit.tidy()
Estimate
Std. Error
t value
Pr(>|t|)
2.5%
97.5%
Coefficient
X1
-1.050097
0.085493
-12.282912
5.133671e-13
-1.224949
-0.875245
X2
-0.174351
0.014779
-11.797039
1.369793e-12
-0.204578
-0.144124
Here, X1 is the endogenous variable and Z1 is the instrument. f1 and f2 are the fixed effects, as before. To estimate IV models without fixed effects, simply omit the fixed effects part of the formula:
fit = pf.feols("Y ~ X2 | X1 ~ Z1", data)fit.tidy()
Estimate
Std. Error
t value
Pr(>|t|)
2.5%
97.5%
Coefficient
Intercept
0.861939
0.151187
5.701137
1.567858e-08
0.565257
1.158622
X1
-0.967238
0.130078
-7.435847
2.238210e-13
-1.222497
-0.711980
X2
-0.176416
0.021769
-8.104001
1.554312e-15
-0.219134
-0.133697
Last, feols() supports interaction of variables via the i() syntax. Documentation on this is tba.
After fitting a model via feols(), you can use the predict() method to get the predicted values:
fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)fit.predict()[0:5]
Last, you can plot the results of a model via the coefplot() method:
fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)fit.coefplot()
Objects of type Feols support a range of other methods to conduct inference. For example, you can run a wild (cluster) bootstrap via the wildboottest() method:
fit.wildboottest(param ="X1", reps=1000)
param X1
t value -14.70814685400939
Pr(>|t|) 0.0
bootstrap_type 11
inference CRV(f1)
impose_null True
dtype: object
would run a wild bootstrap test for the coefficient of X1 with 1000 bootstrap repetitions.
For a wild cluster bootstrap, you can specify the cluster variable via the cluster argument:
/home/runner/work/pyfixest/pyfixest/pyfixest/estimation/feols_.py:1180: UserWarning:
The initial model was not clustered. CRV1 inference is computed and stored in the model object.