estimation.estimation.feols

estimation.estimation.feols(
    fml,
    data,
    vcov=None,
    weights=None,
    ssc=None,
    fixef_rm='none',
    fixef_tol=1e-08,
    fixef_maxiter=100000,
    collin_tol=1e-10,
    drop_intercept=False,
    copy_data=True,
    store_data=True,
    lean=False,
    weights_type='aweights',
    solver='scipy.linalg.solve',
    demeaner_backend='numba',
    use_compression=False,
    reps=100,
    context=None,
    seed=None,
    split=None,
    fsplit=None,
)

Estimate a linear regression models with fixed effects using fixest formula syntax.

Parameters

Name	Type	Description	Default
fml	str	A three-sided formula string using fixest formula syntax. Syntax: “Y ~ X1 + X2 \| FE1 + FE2 \| X1 ~ Z1”. “\|” separates dependent variable, fixed effects, and instruments. Special syntax includes stepwise regressions, cumulative stepwise regression, multiple dependent variables, interaction of variables (i(X1,X2)), and interacted fixed effects (fe1^fe2).	required
data	DataFrameType	A pandas or polars dataframe containing the variables in the formula.	required
vcov	Union[VcovTypeOptions, dict[str, str]]	Type of variance-covariance matrix for inference. Options include “iid”, “hetero”, “HC1”, “HC2”, “HC3”, or a dictionary for CRV1/CRV3 inference.	`None`
weights	Union[None, str], optional.	Default is None. Weights for WLS estimation. If None, all observations are weighted equally. If a string, the name of the column in `data` that contains the weights.	`None`
ssc	str	A ssc object specifying the small sample correction for inference.	`None`
fixef_rm	FixedRmOptions	Specifies whether to drop singleton fixed effects. Options: “none” (default), “singleton”.	`'none'`
collin_tol	float	Tolerance for collinearity check, by default 1e-10.	`1e-10`
fixef_tol		Tolerance for the fixed effects demeaning algorithm. Defaults to 1e-08.	`1e-08`
fixef_maxiter	int	Maximum number of iterations for the demeaning algorithm. Defaults to 100,000.	`100000`
drop_intercept	bool	Whether to drop the intercept from the model, by default False.	`False`
copy_data	bool	Whether to copy the data before estimation, by default True. If set to False, the data is not copied, which can save memory but may lead to unintended changes in the input data outside of `fepois`. For example, the input data set is re-index within the function. As far as I know, the only other relevant case is when using interacted fixed effects, in which case you’ll find a column with interacted fixed effects in the data set.	`True`
store_data	bool	Whether to store the data in the model object, by default True. If set to False, the data is not stored in the model object, which can improve performance and save memory. However, it will no longer be possible to access the data via the `data` attribute of the model object. This has impact on post-estimation capabilities that rely on the data, e.g. `predict()` or `vcov()`.	`True`
lean	bool	False by default. If True, then all large objects are removed from the returned result: this will save memory but will block the possibility to use many methods. It is recommended to use the argument vcov to obtain the appropriate standard-errors at estimation time, since obtaining different SEs won’t be possible afterwards.	`False`
weights_type	WeightsTypeOptions	Options include `aweights` or `fweights`. `aweights` implement analytic or precision weights, while `fweights` implement frequency weights. For details see this blog post: https://notstatschat.rbind.io/2020/08/04/weights-in-statistics/.	`'aweights'`
solver	SolverOptions, optional.	The solver to use for the regression. Can be “np.linalg.lstsq”, “np.linalg.solve”, “scipy.linalg.solve”, “scipy.sparse.linalg.lsqr” and “jax”. Defaults to “scipy.linalg.solve”.	`'scipy.linalg.solve'`
demeaner_backend	DemeanerBackendOptions	The backend to use for demeaning. Can be either “numba”, “jax”, or “rust”. Defaults to “numba”.	`'numba'`
use_compression	bool	Whether to use sufficient statistics to losslessly fit the regression model on compressed data. False by default. If True, the model is estimated on compressed data, which can lead to a significant speed-up for large data sets. See the paper by Wong et al (2021) for more details https://arxiv.org/abs/2102.11297. Note that if `use_compression = True`, inference is lossless. If standard errors are clustered, a wild cluster bootstrap is employed. Parameters for the wild bootstrap can be specified via the `reps` and `seed` arguments. Additionally, note that for one-way fixed effects, the estimation method uses a Mundlak transform to “control” for the fixed effects. For two-way fixed effects, a two-way Mundlak transform is employed. For two-way fixed effects, the Mundlak transform is only identical to a two-way fixed effects model if the data set is a panel. We do not provide any checks for the panel status of the data set.	`False`
reps	int	Number of bootstrap repetitions. Only relevant for boostrap inference applied to compute cluster robust errors when `use_compression = True`.	`100`
context	int or Mapping[str, Any]	A dictionary containing additional context variables to be used by formulaic during the creation of the model matrix. This can include custom factorization functions, transformations, or any other variables that need to be available in the formula environment.	`None`
seed	Optional[int]	Seed for the random number generator. Only relevant for boostrap inference applied to compute cluster robust errors when `use_compression = True`.	`None`
split	Optional[str]	A character string, i.e. ‘split = var’. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. If you also want to include the estimation for the full sample, use the argument fsplit instead.	`None`
fsplit	Optional[str]	This argument is the same as split but also includes the full sample as the first estimation.	`None`

Returns

Name	Type	Description
	object	An instance of the [Feols(/reference/Feols.qmd) class or `FixestMulti` class for multiple models specified via `fml`.

Examples

As in fixest, the [Feols(/reference/Feols.qmd) function can be used to estimate a simple linear regression model with fixed effects. The following example regresses Y on X1 and X2 with fixed effects for f1 and f2: fixed effects are specified after the | symbol.

import pyfixest as pf
import pandas as pd
import numpy as np

data = pf.get_data()

fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)
fit.summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.924 |        0.062 |   -14.934 |      0.000 | -1.051 |  -0.797 |
| X2            |     -0.174 |        0.015 |   -11.737 |      0.000 | -0.204 |  -0.144 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303

Calling feols() returns an instance of the [Feols(/reference/Feols.qmd) class. The summary() method can be used to print the results.

An alternative way to retrieve model results is via the tidy() method, which returns a pandas dataframe with the estimated coefficients, standard errors, t-statistics, and p-values.

fit.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	-0.924046	0.061874	-14.934259	3.774758e-15	-1.050593	-0.797499
X2	-0.174107	0.014834	-11.737229	1.548539e-12	-0.204446	-0.143769

You can also access all elements in the tidy data frame by dedicated methods, e.g. fit.coef() for the coefficients, fit.se() for the standard errors, fit.tstat() for the t-statistics, and fit.pval() for the p-values, and fit.confint() for the confidence intervals.

The employed type of inference can be specified via the vcov argument. If vcov is not provided, PyFixest employs the fixest default of iid inference, unless there are fixed effects in the model, in which case feols() clusters the standard error by the first fixed effect (CRV1 inference).

fit1 = pf.feols("Y ~ X1 + X2 | f1 + f2", data, vcov="iid")
fit2 = pf.feols("Y ~ X1 + X2 | f1 + f2", data, vcov="hetero")
fit3 = pf.feols("Y ~ X1 + X2 | f1 + f2", data, vcov={"CRV1": "f1"})

Supported inference types are “iid”, “hetero”, “HC1”, “HC2”, “HC3”, and “CRV1”/“CRV3”. Clustered standard errors are specified via a dictionary, e.g. {"CRV1": "f1"} for CRV1 inference with clustering by f1 or {"CRV3": "f1"} for CRV3 inference with clustering by f1. For two-way clustering, you can provide a formula string, e.g. {"CRV1": "f1 + f2"} for CRV1 inference with clustering by f1.

fit4 = pf.feols("Y ~ X1 + X2 | f1 + f2", data, vcov={"CRV1": "f1 + f2"})

Inference can be adjusted post estimation via the vcov method:

fit.summary()
fit.vcov("iid").summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.924 |        0.062 |   -14.934 |      0.000 | -1.051 |  -0.797 |
| X2            |     -0.174 |        0.015 |   -11.737 |      0.000 | -0.204 |  -0.144 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303 
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  iid
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1            |     -0.924 |        0.056 |   -16.483 |      0.000 | -1.034 |  -0.814 |
| X2            |     -0.174 |        0.015 |   -11.717 |      0.000 | -0.203 |  -0.145 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303

The ssc argument specifies the small sample correction for inference. In general, feols() uses all of fixest::feols() defaults, but sets the fixef.K argument to "none" whereas the fixest::feols() default is "nested". See here for more details: link to github.

feols() supports a range of multiple estimation syntax, i.e. you can estimate multiple models in one call. The following example estimates two models, one with fixed effects for f1 and one with fixed effects for f2 using the sw() syntax.

fit = pf.feols("Y ~ X1 + X2 | sw(f1, f2)", data)
type(fit)

pyfixest.estimation.FixestMulti_.FixestMulti

The returned object is an instance of the FixestMulti class. You can access the results of the first model via fit.fetch_model(0) and the results of the second model via fit.fetch_model(1). You can compare the model results via the etable() function:

pf.etable(fit)

	Y
	(1)	(2)
coef
X1	-0.950*** (0.067)	-0.979*** (0.077)
X2	-0.174*** (0.018)	-0.175*** (0.022)
fe
f1	x	-
f2	-	x
stats
Observations	997	998
S.E. type	by: f1	by: f2
R²	0.489	0.354
R² Within	0.239	0.204
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Other supported multiple estimation syntax include sw0(), csw() and csw0(). While sw() adds variables in a “stepwise” fashion, csw() does so cumulatively.

fit = pf.feols("Y ~ X1 + X2 | csw(f1, f2)", data)
pf.etable(fit)

	Y
	(1)	(2)
coef
X1	-0.950*** (0.067)	-0.924*** (0.062)
X2	-0.174*** (0.018)	-0.174*** (0.015)
fe
f1	x	x
f2	-	x
stats
Observations	997	997
S.E. type	by: f1	by: f1
R²	0.489	0.659
R² Within	0.239	0.303
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

The sw0() and csw0() syntax are similar to sw() and csw(), but start with a model that excludes the variables specified in sw() and csw():

fit = pf.feols("Y ~ X1 + X2 | sw0(f1, f2)", data)
pf.etable(fit)

	Y
	(1)	(2)	(3)
coef
X1	-0.993*** (0.082)	-0.950*** (0.067)	-0.979*** (0.077)
X2	-0.176*** (0.022)	-0.174*** (0.018)	-0.175*** (0.022)
Intercept	0.889*** (0.108)
fe
f1	-	x	-
f2	-	-	x
stats
Observations	998	997	998
S.E. type	iid	by: f1	by: f2
R²	0.177	0.489	0.354
R² Within	-	0.239	0.204
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

The feols() function also supports multiple dependent variables. The following example estimates two models, one with Y1 as the dependent variable and one with Y2 as the dependent variable.

fit = pf.feols("Y + Y2 ~ X1 | f1 + f2", data)
pf.etable(fit)

	Y	Y2
	(1)	(2)
coef
X1	-0.919*** (0.066)	-1.228*** (0.198)
fe
f1	x	x
f2	x	x
stats
Observations	997	998
S.E. type	by: f1	by: f1
R²	0.609	0.168
R² Within	0.200	0.035
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

It is possible to combine different multiple estimation operators:

fit = pf.feols("Y + Y2 ~ X1 | sw(f1, f2)", data)
pf.etable(fit)

	Y	Y2	Y	Y2
	(1)	(2)	(3)	(4)
coef
X1	-0.949*** (0.069)	-1.266*** (0.176)	-0.982*** (0.081)	-1.301*** (0.205)
fe
f1	x	x	-	-
f2	-	-	x	x
stats
Observations	997	998	998	999
S.E. type	by: f1	by: f1	by: f2	by: f2
R²	0.437	0.115	0.302	0.090
R² Within	0.161	0.036	0.141	0.036
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

In general, using muliple estimation syntax can improve the estimation time as covariates that are demeaned in one model and are used in another model do not need to be demeaned again: feols() implements a caching mechanism that stores the demeaned covariates.

Additionally, you can fit models on different samples via the split and fsplit arguments. The split argument splits the sample according to the variable specified in the argument, while the fsplit argument also includes the full sample in the estimation.

fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data, split = "f1")
pf.etable(fit)

/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)
/home/runner/work/pyfixest/pyfixest/pyfixest/utils/utils.py:198: RuntimeWarning: divide by zero encountered in scalar divide
  cluster_adj_value = G / (G - 1)

	Y
	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)	(12)	(13)	(14)	(15)	(16)	(17)	(18)	(19)	(20)	(21)	(22)	(23)	(24)	(25)	(26)	(27)	(28)	(29)	(30)
coef
X1	-1.357 (INF)	-1.137 (INF)	-0.455 (INF)	-1.138 (INF)	0.201 (INF)	-0.306 (INF)	-0.597 (INF)	-0.824 (INF)	-1.482 (INF)	-1.117 (INF)	-1.142 (INF)	-1.334 (INF)	-3.531 (INF)	-1.102 (INF)	-0.826 (INF)	-0.773 (INF)	-1.501 (INF)	-1.226 (INF)	-0.641 (INF)	-0.378 (INF)	-0.652 (INF)	-1.508 (INF)	-0.941 (INF)	-0.206 (INF)	-0.195 (INF)	-0.702 (INF)	-1.141 (INF)	-1.349 (INF)	-0.537 (INF)	-1.141 (INF)
X2	-0.250 (INF)	0.198 (INF)	-0.145 (INF)	-0.330 (INF)	-0.177 (INF)	-0.187 (INF)	-0.118 (INF)	-0.292 (INF)	-0.029 (INF)	-0.264 (INF)	-0.148 (INF)	-0.313 (INF)	-0.152 (INF)	-0.296 (INF)	0.130 (INF)	-0.059 (INF)	-0.223 (INF)	-0.113 (INF)	-0.261 (INF)	0.089 (INF)	-0.148 (INF)	-0.267 (INF)	-0.125 (INF)	-0.282 (INF)	-0.153 (INF)	0.004 (INF)	0.083 (INF)	-0.226 (INF)	-0.158 (INF)	-0.160 (INF)
fe
f1	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x
f2	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x	x
stats
Observations	30	29	44	30	31	36	36	30	36	35	32	30	23	28	34	34	48	40	36	34	35	37	27	35	29	27	43	36	24	28
S.E. type	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1	by: f1
R²	0.850	0.691	0.578	0.745	0.939	0.644	0.792	0.776	0.919	0.797	0.727	0.822	0.924	0.865	0.711	0.808	0.651	0.819	0.746	0.731	0.880	0.868	0.796	0.648	0.915	0.820	0.837	0.789	0.688	0.883
R² Within	0.485	0.296	0.140	0.471	0.314	0.258	0.205	0.396	0.561	0.590	0.448	0.440	0.781	0.329	0.102	0.172	0.397	0.367	0.233	0.079	0.278	0.660	0.255	0.392	0.252	0.118	0.459	0.359	0.144	0.415
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Besides OLS, feols() also supports IV estimation via three part formulas:

fit = pf.feols("Y ~  X2 | f1 + f2 | X1 ~ Z1", data)
fit.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	-1.050097	0.086811	-12.096326	7.460699e-13	-1.227646	-0.872548
X2	-0.174351	0.015007	-11.617833	1.980638e-12	-0.205044	-0.143658

Here, X1 is the endogenous variable and Z1 is the instrument. f1 and f2 are the fixed effects, as before. To estimate IV models without fixed effects, simply omit the fixed effects part of the formula:

fit = pf.feols("Y ~  X2 | X1 ~ Z1", data)
fit.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
Intercept	0.861939	0.151187	5.701137	1.567858e-08	0.565257	1.158622
X1	-0.967238	0.130078	-7.435847	2.238210e-13	-1.222497	-0.711980
X2	-0.176416	0.021769	-8.104001	1.554312e-15	-0.219134	-0.133697

Last, feols() supports interaction of variables via the i() syntax. Documentation on this is tba.

You can pass custom transforms via the context argument. If you set context = 0, all functions from the level of the call to feols() will be available:

def _lspline(series: pd.Series, knots: list[float]) -> np.array:
    'Generate a linear spline design matrix for the input series based on knots.'
    vector = series.values
    columns = []

    for i, knot in enumerate(knots):
        column = np.minimum(vector, knot if i == 0 else knot - knots[i - 1])
        columns.append(column)
        vector = vector - column

    # Add the remainder as the last column
    columns.append(vector)

    # Combine columns into a design matrix
    return np.column_stack(columns)

spline_split = _lspline(data["X2"], [0, 1])
data["X2_0"] = spline_split[:, 0]
data["0_X2_1"] = spline_split[:, 1]
data["1_X2"] = spline_split[:, 2]

explicit_fit = pf.feols("Y ~ X2_0 + 0_X2_1 + 1_X2 | f1 + f2", data=data)
# set context = 0 to make _lspline available for feols' internal call to Formulaic.model_matrix
context_captured_fit = pf.feols("Y ~ _lspline(X2,[0,1]) | f1 + f2", data=data, context = 0)
# or provide it as a dict / mapping
context_captured_fit_map = pf.feols("Y ~ _lspline(X2,[0,1]) | f1 + f2", data=data, context = {"_lspline":_lspline})

pf.etable([explicit_fit, context_captured_fit, context_captured_fit_map])

	Y
	(1)	(2)	(3)
coef
X2_0	-0.177*** (0.040)
0_X2_1	-0.240 (0.138)
1_X2	-0.140** (0.049)
_lspline(X2, [0, 1])[0]		-0.177*** (0.040)	-0.177*** (0.040)
_lspline(X2, [0, 1])[1]		-0.240 (0.138)	-0.240 (0.138)
_lspline(X2, [0, 1])[2]		-0.140** (0.049)	-0.140** (0.049)
fe
f1	x	x	x
f2	x	x	x
stats
Observations	998	998	998
S.E. type	by: f1	by: f1	by: f1
R²	0.560	0.560	0.560
R² Within	0.100	0.100	0.100
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

After fitting a model via feols(), you can use the predict() method to get the predicted values:

fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)
fit.predict()[0:5]

array([ 3.0633663 , -0.69574133, -0.91240433, -0.46370257, -1.67331154])

The predict() method also supports a newdata argument to predict on new data, which returns a numpy array of the predicted values:

fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)
fit.predict(newdata=data)[0:5]

array([ 2.14598761,         nan,         nan,  3.06336415, -0.69574276])

Last, you can plot the results of a model via the coefplot() method:

fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data)
fit.coefplot()

We can conduct a regression decomposition via the decompose() method, which implements a regression decomposition following the method developed in Gelbach (2016):

import re
import pyfixest as pf
from pyfixest.utils.dgps import gelbach_data

data_gelbach = gelbach_data(nobs = 1000)
fit = pf.feols("y ~ x1 + x21 + x22 + x23", data=data_gelbach)

# simple decomposition
res = fit.decompose(param = "x1")
pf.make_table(res)

# group covariates via "combine_covariates" argument
res = fit.decompose(param = "x1", combine_covariates={"g1": ["x21", "x22"], "g2": ["x23"]})
pf.make_table(res)

# group covariates via regex
res = fit.decompose(param="x1", combine_covariates={"g1": re.compile("x2[1-2]"), "g2": re.compile("x23")})

  0%|          | 0/1000 [00:00<?, ?it/s]  1%|          | 8/1000 [00:03<06:23,  2.59it/s]  1%|          | 12/1000 [00:03<03:55,  4.19it/s]  8%|▊         | 76/1000 [00:03<00:22, 41.29it/s] 20%|██        | 204/1000 [00:03<00:05, 135.30it/s] 40%|███▉      | 396/1000 [00:03<00:02, 288.24it/s] 52%|█████▏    | 524/1000 [00:03<00:01, 370.27it/s] 65%|██████▌   | 652/1000 [00:03<00:00, 442.44it/s] 78%|███████▊  | 780/1000 [00:04<00:00, 519.77it/s] 91%|█████████ | 908/1000 [00:04<00:00, 568.77it/s]100%|██████████| 1000/1000 [00:04<00:00, 231.60it/s]
  0%|          | 0/1000 [00:00<?, ?it/s] 12%|█▏        | 120/1000 [00:00<00:00, 1031.16it/s] 25%|██▍       | 248/1000 [00:00<00:00, 1070.07it/s] 50%|█████     | 504/1000 [00:00<00:00, 1119.99it/s] 63%|██████▎   | 632/1000 [00:00<00:00, 1016.53it/s] 76%|███████▌  | 760/1000 [00:00<00:00, 963.26it/s]  89%|████████▉ | 888/1000 [00:00<00:00, 925.32it/s]100%|██████████| 1000/1000 [00:00<00:00, 1090.20it/s]
  0%|          | 0/1000 [00:00<?, ?it/s] 12%|█▏        | 120/1000 [00:00<00:00, 1038.57it/s] 25%|██▍       | 248/1000 [00:00<00:00, 1076.54it/s] 50%|█████     | 504/1000 [00:00<00:00, 1111.17it/s] 63%|██████▎   | 632/1000 [00:00<00:00, 1023.98it/s] 76%|███████▌  | 760/1000 [00:00<00:00, 957.68it/s]  89%|████████▉ | 888/1000 [00:00<00:00, 931.36it/s]100%|██████████| 1000/1000 [00:00<00:00, 1101.81it/s]

Objects of type Feols support a range of other methods to conduct inference. For example, you can run a wild (cluster) bootstrap via the wildboottest() method:

fit = pf.feols("Y ~ X1 + X2", data)
fit.wildboottest(param = "X1", reps=1000)

param                    X1
t value          -12.444998
Pr(>|t|)                0.0
bootstrap_type           11
inference                HC
impose_null            True
ssc                 1.00201
dtype: object

would run a wild bootstrap test for the coefficient of X1 with 1000 bootstrap repetitions.

For a wild cluster bootstrap, you can specify the cluster variable via the cluster argument:

fit.wildboottest(param = "X1", reps=1000, cluster="group_id")

param                             X1
t value           -9.502246183519802
Pr(>|t|)                         0.0
bootstrap_type                    11
inference              CRV(group_id)
impose_null                     True
ssc                         1.057677
dtype: object

The ritest() method can be used to conduct randomization inference:

fit.ritest(resampvar = "X1", reps=1000)

H0                                      X1=0
ri-type                      randomization-c
Estimate                 -0.9929357698186859
Pr(>|t|)                                 0.0
Std. Error (Pr(>|t|))                    0.0
2.5% (Pr(>|t|))                          0.0
97.5% (Pr(>|t|))                         0.0
dtype: object

Last, you can compute the cluster causal variance estimator by Athey et al by using the ccv() method:

import numpy as np
rng = np.random.default_rng(1234)
data["D"] = rng.choice([0, 1], size = data.shape[0])
fit_D = pf.feols("Y ~ D", data = data)
fit_D.ccv(treatment = "D", cluster = "group_id")

/home/runner/work/pyfixest/pyfixest/pyfixest/estimation/feols_.py:1518: UserWarning: The initial model was not clustered. CRV1 inference is computed and stored in the model object.
  warnings.warn(

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
CCV	0.016087657906364183	0.251236	0.064034	0.949649	-0.51174	0.543915
CRV1	0.016088	0.13378	0.120254	0.905614	-0.264974	0.29715