estimation.Feols

estimation.Feols(self, Y, X, weights, collin_tol, coefnames, weights_name, weights_type, solver='np.linalg.solve')

Non-user-facing class to estimate a linear regression via OLS.

Users should not directly instantiate this class, but rather use the feols() function. Note that no demeaning is performed in this class: demeaning is performed in the FixestMulti class (to allow for caching of demeaned variables for multiple estimation).

Parameters

Name	Type	Description	Default
`Y`	np.ndarray	Dependent variable, a two-dimensional numpy array.	required
`X`	np.ndarray	Independent variables, a two-dimensional numpy array.	required
`weights`	np.ndarray	Weights, a one-dimensional numpy array.	required
`collin_tol`	float	Tolerance level for collinearity checks.	required
`coefnames`	list[str]	Names of the coefficients (of the design matrix X).	required
`weights_name`	Optional[str]	Name of the weights variable.	required
`weights_type`	Optional[str]	Type of the weights variable. Either “aweights” for analytic weights or “fweights” for frequency weights.	required
`solver`	str, optional.	The solver to use for the regression. Can be either “np.linalg.solve” or “np.linalg.lstsq”. Defaults to “np.linalg.solve”.	`'np.linalg.solve'`

Attributes

Name	Type	Description
_method	str	Specifies the method used for regression, set to “feols”.
_is_iv	bool	Indicates whether instrumental variables are used, initialized as False.
_Y	np.ndarray	The dependent variable array.
_X	np.ndarray	The independent variables array.
_X_is_empty	bool	Indicates whether the X array is empty.
_collin_tol	float	Tolerance level for collinearity checks.
_coefnames	list	Names of the coefficients (of the design matrix X).
_collin_vars	list	Variables identified as collinear.
_collin_index	list	Indices of collinear variables.
_Z	np.ndarray	Alias for the _X array, used for calculations.
_solver	str	The solver used for the regression.
_weights	np.ndarray	Array of weights for each observation.
_N	int	Number of observations.
_k	int	Number of independent variables (or features).
_support_crv3_inference	bool	Indicates support for CRV3 inference.
_data	Any	Data used in the regression, to be enriched outside of the class.
_fml	Any	Formula used in the regression, to be enriched outside of the class.
_has_fixef	bool	Indicates whether fixed effects are used.
_fixef	Any	Fixed effects used in the regression.
_icovars	Any	Internal covariates, to be enriched outside of the class.
_ssc_dict	dict	Dictionary with the small sample correction settings.
_tZX	np.ndarray	Transpose of Z multiplied by X, set in get_fit().
_tXZ	np.ndarray	Transpose of X multiplied by Z, set in get_fit().
_tZy	np.ndarray	Transpose of Z multiplied by Y, set in get_fit().
_tZZinv	np.ndarray	Inverse of the transpose of Z multiplied by Z, set in get_fit().
_beta_hat	np.ndarray	Estimated regression coefficients.
_Y_hat_link	np.ndarray	Predicted values of the dependent variable.
_Y_hat_response	np.ndarray	Response predictions of the model.
_u_hat	np.ndarray	Residuals of the regression model.
_scores	np.ndarray	Scores used in the regression analysis.
_hessian	np.ndarray	Hessian matrix used in the regression.
_bread	np.ndarray	Bread matrix, used in calculating the variance-covariance matrix.
_vcov_type	Any	Type of variance-covariance matrix used.
_vcov_type_detail	Any	Detailed specification of the variance-covariance matrix type.
_is_clustered	bool	Indicates if clustering is used in the variance-covariance calculation.
_clustervar	Any	Variable used for clustering in the variance-covariance calculation.
_G	Any	Group information used in clustering.
_ssc	Any	Small sample correction applied to the variance-covariance matrix.
_vcov	np.ndarray	Variance-covariance matrix of the estimated coefficients.
_se	np.ndarray	Standard errors of the estimated coefficients.
_tstat	np.ndarray	T-statistics of the estimated coefficients.
_pvalue	np.ndarray	P-values associated with the t-statistics.
_conf_int	np.ndarray	Confidence intervals for the estimated coefficients.
_F_stat	Any	F-statistic for the model, set in get_Ftest().
_fixef_dict	dict	Dictionary containing fixed effects estimates.
_sumFE	np.ndarray	Sum of all fixed effects for each observation.
_rmse	float	Root mean squared error of the model.
_r2	float	R-squared value of the model.
_r2_within	float	R-squared value computed on demeaned dependent variable.
_adj_r2	float	Adjusted R-squared value of the model.
_adj_r2_within	float	Adjusted R-squared value computed on demeaned dependent variable.

Methods

Name	Description
add_fixest_multi_context	Enrich Feols object.
ccv	Compute the Causal Cluster Variance following Abadie et al (QJE 2023).
coef	Fitted model coefficients.
confint	Fitted model confidence intervals.
fixef	Compute the coefficients of (swept out) fixed effects for a regression model.
get_fit	Fit an OLS model.
get_inference	Compute standard errors, t-statistics, and p-values for the regression model.
get_nobs	Fetch the number of observations used in fitting the regression model.
get_performance	Get Goodness-of-Fit measures.
plot_ritest	Plot the distribution of the Randomization Inference Statistics.
predict	Predict values of the model on new data.
pvalue	Fitted model p-values.
resid	Fitted model residuals.
ritest	Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.
se	Fitted model standard errors.
solve_ols	Solve the ordinary least squares problem using the specified solver.
tidy	Tidy model outputs.
tstat	Fitted model t-statistics.
vcov	Compute covariance matrices for an estimated regression model.
wald_test	Conduct Wald test.
wildboottest	Run a wild cluster bootstrap based on an object of type “Feols”.

add_fixest_multi_context

estimation.Feols.add_fixest_multi_context(fml, depvar, Y, _data, _ssc_dict, _k_fe, fval, store_data)

Enrich Feols object.

Enrich an instance of Feols Class with additional attributes set in the FixestMulti class.

Parameters

Name	Type	Description	Default
`fml`	str	The formula used for estimation.	required
`depvar`	str	The dependent variable of the regression model.	required
`Y`	pd.Series	The dependent variable of the regression model.	required
`_data`	pd.DataFrame	The data used for estimation.	required
`_ssc_dict`	dict	A dictionary with the small sample correction settings.	required
`_k_fe`	int	The number of fixed effects.	required
`fval`	str	The fixed effects formula.	required
`store_data`	bool	Indicates whether to save the data used for estimation in the object	required

Returns

Type	Description
None

ccv

estimation.Feols.ccv(treatment, cluster=None, seed=None, n_splits=8, pk=1, qk=1)

Compute the Causal Cluster Variance following Abadie et al (QJE 2023).

Parameters

Name	Type	Description	Default
`treatment`		The name of the treatment variable.	required
`cluster`	str	The name of the cluster variable. None by default. If None, uses the cluster variable from the model fit.	`None`
`seed`	int	An integer to set the random seed. Defaults to None.	`None`
`n_splits`	int	The number of splits to use in the cross-fitting procedure. Defaults to 8.	`8`
`pk`	float	The proportion of sampled clusters. Defaults to 1, which corresponds to all clusters of the population being sampled.	`1`
`qk`	float	The proportion of sampled observations within each cluster. Defaults to 1, which corresponds to all observations within each cluster being sampled.	`1`

Returns

Type	Description
pd.DataFrame	A DataFrame with inference based on the “Causal Cluster Variance” and “regular” CRV1 inference.

Examples

import numpy as np
from pyfixest.estimation import feols
from pyfixest.utils import get_data

data = get_data()
# Simulate a binary treatment indicator
data["D"] = np.random.choice([0, 1], size=data.shape[0])

fit = feols("Y ~ D", data=data, vcov={"CRV1": "group_id"})
fit.ccv(treatment="D", pk=0.05, qk=0.5, n_splits=8, seed=123).head()

coef

estimation.Feols.coef()

Fitted model coefficients.

Returns

Type	Description
pd.Series	A pd.Series with the estimated coefficients of the regression model.

confint

estimation.Feols.confint(alpha=0.05, keep=None, drop=None, exact_match=False, joint=False, seed=None, reps=10000)

Fitted model confidence intervals.

Parameters

Name	Type	Description	Default
`alpha`	float	The significance level for confidence intervals. Defaults to 0.05.	`0.05`
`joint`	bool	Whether to compute simultaneous confidence interval for joint null of parameters selected by `keep` and `drop`. Defaults to False. See https://www.causalml-book.org/assets/chapters/CausalML_chap_4.pdf, Remark 4.4.1 for details.	`False`
`keep`	Optional[Union[list, str]]	The pattern for retaining coefficient names. You can pass a string (one pattern) or a list (multiple patterns). By default, all coefficients are kept. Patterns are regular expressions: for example, `"age"` keeps all coefficients containing age, `r"^tr"` keeps all coefficients starting with tr, and `r"\d$"` keeps all coefficients ending with a number. Output will be in the order of the patterns.	`None`
`drop`	Optional[Union[list, str]]	The pattern for excluding coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Syntax is the same as for `keep`. By default, no coefficients are dropped. Parameters `keep` and `drop` can be used simultaneously.	`None`
`exact_match`	Optional[bool]	Whether to use exact match for `keep` and `drop`. Default is False. If True, the pattern will be matched exactly to the coefficient name instead of using regular expressions.	`False`
`reps`	int	The number of bootstrap iterations to run for joint confidence intervals. Defaults to 10_000. Only used if `joint` is True.	`10000`
`seed`	int	The seed for the random number generator. Defaults to None. Only used if `joint` is True.	`None`

Returns

Type	Description
pd.DataFrame	A pd.DataFrame with confidence intervals of the estimated regression model for the selected coefficients.

Examples

from pyfixest.utils import get_data
from pyfixest.estimation import feols

data = get_data()
fit = feols("Y ~ C(f1)", data=data)
fit.confint(alpha=0.10).head()
fit.confint(alpha=0.10, joint=True, reps=9999).head()
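
The `keep`, `drop`, and `exact_match` semantics described in the parameter table can be sketched in plain Python. The helper `select_coefnames` below is hypothetical and written only for illustration; it is not the library's implementation, and details such as how duplicates are handled are assumptions.

```python
import re


def select_coefnames(coefnames, keep=None, drop=None, exact_match=False):
    """Hypothetical helper mimicking the documented keep/drop behaviour."""
    keep = [keep] if isinstance(keep, str) else list(keep or [])
    drop = [drop] if isinstance(drop, str) else list(drop or [])

    def matches(pattern, name):
        # Exact comparison or regular-expression search, depending on the flag
        if exact_match:
            return pattern == name
        return re.search(pattern, name) is not None

    if keep:
        # Collect names pattern by pattern, so output follows the pattern order
        selected = []
        for pattern in keep:
            for name in coefnames:
                if matches(pattern, name) and name not in selected:
                    selected.append(name)
    else:
        selected = list(coefnames)

    # Remove every name that matches at least one drop pattern
    return [n for n in selected if not any(matches(p, n) for p in drop)]


names = ["Intercept", "age", "age_sq", "treat", "trend", "x1"]
kept = select_coefnames(names, keep=[r"^tr", "age"])
ends_with_digit = select_coefnames(names, keep=r"\d$")
without_age = select_coefnames(names, drop="age")
exact_age = select_coefnames(names, keep="age", exact_match=True)
```

Here `kept` lists the names starting with "tr" before those containing "age", reflecting the order of the patterns rather than the order of the coefficients.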

fixef

estimation.Feols.fixef()

Compute the coefficients of (swept out) fixed effects for a regression model.

This method creates the following attributes:
- alphaDF (pd.DataFrame): A DataFrame with the estimated fixed effects.
- sumFE (np.array): An array with the sum of fixed effects for each observation (i = 1, …, N).

Returns

Type	Description
None

get_fit

estimation.Feols.get_fit()

Fit an OLS model.

Returns

Type	Description
None

get_inference

estimation.Feols.get_inference(alpha=0.05)

Compute standard errors, t-statistics, and p-values for the regression model.

Parameters

Name	Type	Description	Default
`alpha`	float	The significance level for confidence intervals. Defaults to 0.05, which produces a 95% confidence interval.	`0.05`

Returns

Type	Description
None
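
As a rough illustration of these quantities, the sketch below derives standard errors, t-statistics, p-values, and a confidence interval from a coefficient vector and a covariance matrix. The inputs `beta_hat` and `vcov` are made-up numbers, and the normal approximation is an assumption made for brevity; the library may instead rely on a t-distribution with a degrees-of-freedom correction.

```python
import numpy as np
from statistics import NormalDist

# Made-up inputs for illustration only
beta_hat = np.array([1.5, -0.2])
vcov = np.array([[0.04, 0.0], [0.0, 0.01]])
alpha = 0.05

# Standard errors are the square roots of the diagonal of the covariance matrix
se = np.sqrt(np.diag(vcov))
tstat = beta_hat / se

# Two-sided p-values under a normal approximation (an assumption of this sketch)
norm = NormalDist()
pvalue = np.array([2 * (1 - norm.cdf(abs(float(t)))) for t in tstat])

# Symmetric (1 - alpha) confidence interval around each coefficient
crit = norm.inv_cdf(1 - alpha / 2)
conf_int = np.column_stack([beta_hat - crit * se, beta_hat + crit * se])
```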

get_nobs

estimation.Feols.get_nobs()

Fetch the number of observations used in fitting the regression model.

Returns

Type	Description
None

get_performance

estimation.Feols.get_performance()

Get Goodness-of-Fit measures.

Compute multiple additional measures commonly reported with linear regression output, including R-squared and adjusted R-squared. Note that variables with the suffix _within use demeaned dependent variables Y, while variables without do not or are invariant to demeaning.

Returns

Type	Description
None
Creates the following attributes:
- r2 (float): R-squared of the regression model.
- adj_r2 (float): Adjusted R-squared of the regression model.
- r2_within (float): R-squared of the regression model, computed on demeaned dependent variable.
- adj_r2_within (float): Adjusted R-squared of the regression model, computed on demeaned dependent variable.
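
The goodness-of-fit measures can be illustrated with textbook formulas on simulated data. This is a minimal sketch: the divisor used for the RMSE and the degrees-of-freedom adjustment in the adjusted R-squared are assumptions, and the library's exact conventions may differ.

```python
import numpy as np

# Simulated data: intercept plus one regressor
rng = np.random.default_rng(1)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

# OLS fit and residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
k = X.shape[1]

ssr = np.sum(u_hat**2)             # sum of squared residuals
sst = np.sum((Y - Y.mean()) ** 2)  # total sum of squares

rmse = np.sqrt(ssr / N)
r2 = 1 - ssr / sst
adj_r2 = 1 - (1 - r2) * (N - 1) / (N - k)
```

The within versions follow the same formulas but replace `Y` with the demeaned dependent variable.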

plot_ritest

estimation.Feols.plot_ritest(plot_backend='lets_plot')

Plot the distribution of the Randomization Inference Statistics.

Parameters

Name	Type	Description	Default
`plot_backend`	str	The plotting backend to use. Defaults to “lets_plot”. Alternatively, “matplotlib” is available.	`'lets_plot'`

Returns

Type	Description
A lets_plot or matplotlib figure with the distribution of the Randomization Inference Statistics.

predict

estimation.Feols.predict(newdata=None)

Predict values of the model on new data.

Return a flat np.array with predicted values of the regression model. If new fixed effect levels are introduced in newdata, predicted values for such observations will be set to NaN.

Parameters

Name	Type	Description	Default
`newdata`	Optional[DataFrameType]	A pd.DataFrame or pl.DataFrame with the data to be used for prediction. If None (default), the data used for fitting the model is used.	`None`

Returns

Type	Description
np.ndarray	A flat np.array with predicted values of the regression model.
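
The NaN behaviour for unseen fixed effect levels can be sketched as follows. The coefficient `beta` and the dictionary `fe_estimates` are hypothetical values chosen for illustration; the sketch does not reproduce how the library stores or looks up fixed effects.

```python
import numpy as np

# Hypothetical fitted quantities: one slope and two estimated fixed effect levels
beta = 2.0
fe_estimates = {"a": 0.5, "b": -1.0}

# New data containing a level ("c") that was not present during estimation
x_new = np.array([1.0, 2.0, 3.0])
fe_new = ["a", "c", "b"]

# Unknown levels have no estimate, so their contribution is NaN
fe_part = np.array([fe_estimates.get(level, np.nan) for level in fe_new])
y_hat = x_new * beta + fe_part
```

The second prediction is NaN because level "c" has no estimated fixed effect, while the first and third are computed as usual.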

pvalue

estimation.Feols.pvalue()

Fitted model p-values.

Returns

Type	Description
pd.Series	A pd.Series with p-values of the estimated regression model.

resid

estimation.Feols.resid()

Fitted model residuals.

Returns

Type	Description
np.ndarray	A np.ndarray with the residuals of the estimated regression model.

ritest

estimation.Feols.ritest(resampvar, cluster=None, reps=100, type='randomization-c', rng=None, choose_algorithm='auto', store_ritest_statistics=False, level=0.95)

Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.

Parameters

Name	Type	Description	Default
`resampvar`	str	The name of the variable to be resampled.	required
`cluster`	str	The name of the cluster variable in case of cluster random assignment. If provided, `resampvar` is held constant within each `cluster`. Defaults to None.	`None`
`reps`	int	The number of randomization iterations. Defaults to 100.	`100`
`type`	str	The type of the randomization inference test. Can be “randomization-c” or “randomization-t”. Note that the “randomization-c” is much faster, while the “randomization-t” is recommended by Wu & Ding (JASA, 2021).	`'randomization-c'`
`rng`	np.random.Generator	A random number generator. Defaults to None.	`None`
`choose_algorithm`	str	The algorithm to use for the computation. Defaults to “auto”. The alternatives are “fast” and “slow”, which are intended only for running CI tests. This argument is not validated for input errors, so avoid setting it.	`'auto'`
`include_plot`		Whether to include a plot of the distribution p-values. Defaults to False.	required
`store_ritest_statistics`	bool	Whether to store the simulated statistics of the RI procedure. Defaults to False. If True, stores the simulated statistics in the model object via the `ritest_statistics` attribute as a numpy array.	`False`
`level`	float	The level for the confidence interval of the randomization inference p-value. Defaults to 0.95.	`0.95`

Returns

Type	Description
pd.Series	A pd.Series with the regression coefficient of `resampvar` and the p-value of the RI test. Additionally, reports the standard error and the confidence interval of the p-value.
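
The idea behind the coefficient-based variant ("randomization-c") can be sketched with numpy alone: reshuffle the resampled variable, re-estimate its coefficient, and compare the resulting distribution with the observed estimate. The simulated data, the helper `slope`, and the p-value formula are assumptions of this sketch and need not match the library's implementation.

```python
import numpy as np


def slope(d, y):
    """OLS slope of y on d in a regression with an intercept."""
    d_c = d - d.mean()
    return float(d_c @ (y - y.mean()) / (d_c @ d_c))


rng = np.random.default_rng(3)
N = 200

# Simulated binary treatment with a true effect of 1.0
d = rng.permutation(np.repeat([0.0, 1.0], N // 2))
y = 1.0 * d + rng.normal(size=N)

coef_obs = slope(d, y)

# Re-estimate the coefficient under random reassignments of the treatment
reps = 500
coef_perm = np.array([slope(rng.permutation(d), y) for _ in range(reps)])

# Share of permuted coefficients at least as extreme as the observed one
p_value = float(np.mean(np.abs(coef_perm) >= np.abs(coef_obs)))
```

The "randomization-t" variant would compare t-statistics instead of coefficients, which requires a standard error in every iteration and is therefore slower.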

se

estimation.Feols.se()

Fitted model standard errors.

Returns

Type	Description
pd.Series	A pd.Series with the standard errors of the estimated regression model.

solve_ols

estimation.Feols.solve_ols(tZX, tZY, solver)

Solve the ordinary least squares problem using the specified solver.

Parameters

Name	Type	Default
`tZX`	np.ndarray	required
`tZY`	np.ndarray	required
`solver`	str	required

Returns

Type	Description
array-like	The solution to the ordinary least squares problem.

Raises

Type	Description
ValueError	If the specified solver is not supported.
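
A minimal sketch of the dispatch described here, using only numpy. The function `solve_ols_sketch` is a hypothetical stand-in and not the library's code; it assumes that `tZX` and `tZY` hold the cross-products of the normal equations.

```python
import numpy as np


def solve_ols_sketch(tZX, tZY, solver="np.linalg.solve"):
    """Hypothetical stand-in: solve tZX @ beta = tZY with the chosen solver."""
    if solver == "np.linalg.solve":
        return np.linalg.solve(tZX, tZY).flatten()
    if solver == "np.linalg.lstsq":
        # lstsq returns a tuple; the solution is its first element
        return np.linalg.lstsq(tZX, tZY, rcond=None)[0].flatten()
    raise ValueError(f"Solver {solver} is not supported.")


# Noiseless simulated data, so the true coefficients are recovered exactly
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true

tXX = X.T @ X
tXY = X.T @ Y
b_solve = solve_ols_sketch(tXX, tXY, "np.linalg.solve")
b_lstsq = solve_ols_sketch(tXX, tXY, "np.linalg.lstsq")
```

`np.linalg.solve` requires a square, non-singular matrix, whereas `np.linalg.lstsq` returns a minimum-norm solution even when the matrix is rank deficient.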

tidy

estimation.Feols.tidy(alpha=None)

Tidy model outputs.

Return a tidy pd.DataFrame with the point estimates, standard errors, t-statistics, and p-values.

Parameters

Name	Type	Description	Default
`alpha`	Optional[float]	The significance level for the confidence intervals. If None, computes a 95% confidence interval (alpha = 0.05).	`None`

Returns

Type	Description
pd.DataFrame	A tidy pd.DataFrame containing the regression results, including point estimates, standard errors, t-statistics, and p-values.

tstat

estimation.Feols.tstat()

Fitted model t-statistics.

Returns

Type	Description
pd.Series	A pd.Series with t-statistics of the estimated regression model.

vcov

estimation.Feols.vcov(vcov, data=None)

Compute covariance matrices for an estimated regression model.

Parameters

Name	Type	Description	Default
`vcov`	Union[str, dict[str, str]]	A string or dictionary specifying the type of variance-covariance matrix to use for inference. If a string, it can be one of “iid”, “hetero”, “HC1”, “HC2”, “HC3”. If a dictionary, it should have the format {“CRV1”: “clustervar”} for CRV1 inference or {“CRV3”: “clustervar”} for CRV3 inference. Note that CRV3 inference is currently not supported for IV estimation.	required
`data`	Optional[DataFrameType]	The data used for estimation. If None, tries to fetch the data from the model object. Defaults to None.	`None`

Returns

Type	Description
Feols	An instance of class [Feols](/reference/Feols.qmd) with updated inference.
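
To make the difference between the "iid" and heteroskedasticity-robust options concrete, the sketch below computes both covariance matrices by hand with numpy. The formulas are the textbook ones; the small sample corrections shown are assumptions and may not match the library's defaults.

```python
import numpy as np

# Simulated data with heteroskedastic errors
rng = np.random.default_rng(2)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
u = rng.normal(size=N) * (1 + np.abs(X[:, 1]))
Y = X @ np.array([0.5, 1.0]) + u

# OLS fit
bread = np.linalg.inv(X.T @ X)
beta_hat = bread @ X.T @ Y
u_hat = Y - X @ beta_hat
k = X.shape[1]

# "iid": a single error variance scales the bread matrix
sigma2 = u_hat @ u_hat / (N - k)
vcov_iid = sigma2 * bread

# "HC1": sandwich estimator with meat X' diag(u_hat^2) X and an N / (N - k) factor
meat = (X * u_hat[:, None] ** 2).T @ X
vcov_hc1 = N / (N - k) * bread @ meat @ bread
```

Cluster-robust variants (CRV1, CRV3) follow the same sandwich structure but aggregate the scores within each cluster before forming the meat matrix.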

wald_test

estimation.Feols.wald_test(R=None, q=None, distribution='F')

Conduct Wald test.

Compute a Wald test for a linear hypothesis of the form Rb = q. By default, tests the joint null hypothesis that all coefficients are zero.

Parameters

Name	Type	Description	Default
`R`	array-like	The matrix R of the linear hypothesis. If None, defaults to an identity matrix.	`None`
`q`	array-like	The vector q of the linear hypothesis. If None, defaults to a vector of zeros.	`None`
`distribution`	str	The distribution to use for the p-value. Can be either “F” or “chi2”. Defaults to “F”.	`'F'`

Returns

Type	Description
pd.Series	A pd.Series with the Wald statistic and p-value.
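
The statistic for the hypothesis Rb = q can be written as (Rb - q)' (R V R')^{-1} (Rb - q), where V is the covariance matrix of the coefficients. The sketch below computes it with numpy; the function `wald_statistic` and the example inputs are hypothetical, and any scaling or degrees-of-freedom choices made by the library are not reproduced.

```python
import numpy as np


def wald_statistic(beta_hat, vcov, R=None, q=None):
    """Hypothetical helper: Wald statistic for the linear hypothesis R b = q."""
    k = beta_hat.shape[0]
    # Defaults mirror the documented ones: identity matrix and zero vector
    R = np.eye(k) if R is None else np.atleast_2d(np.asarray(R, dtype=float))
    q = np.zeros(R.shape[0]) if q is None else np.asarray(q, dtype=float)
    diff = R @ beta_hat - q
    return float(diff @ np.linalg.solve(R @ vcov @ R.T, diff))


# Made-up estimates for illustration
beta_hat = np.array([1.0, 2.0])
vcov = np.array([[0.25, 0.0], [0.0, 1.0]])

# Joint null that all coefficients are zero: 1 / 0.25 + 4 / 1
w_joint = wald_statistic(beta_hat, vcov)

# Single restriction that holds exactly at the estimate
w_single = wald_statistic(beta_hat, vcov, R=[[0.0, 1.0]], q=[2.0])
```

Dividing the statistic by the number of restrictions gives the usual F-type version, while the unscaled statistic is compared against a chi-squared distribution.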

wildboottest

estimation.Feols.wildboottest(reps, cluster=None, param=None, weights_type='rademacher', impose_null=True, bootstrap_type='11', seed=None, adj=True, cluster_adj=True, parallel=False, return_bootstrapped_t_stats=False)

Run a wild cluster bootstrap based on an object of type “Feols”.

Parameters

Name	Type	Description	Default
`reps`	int	The number of bootstrap iterations to run.	required
`cluster`	Union[str, None]	The variable used for clustering. Defaults to None. If None, then uses the variable specified in the model’s `_clustervar` attribute. If no `_clustervar` attribute is found, runs a heteroskedasticity-robust bootstrap.	`None`
`param`	Union[str, None]	The name of the test parameter of interest. Defaults to None.	`None`
`weights_type`	str	The type of bootstrap weights. Options are ‘rademacher’, ‘mammen’, ‘webb’, or ‘normal’. Defaults to ‘rademacher’.	`'rademacher'`
`impose_null`	bool	Indicates whether to impose the null hypothesis on the bootstrap DGP. Defaults to True.	`True`
`bootstrap_type`	str	A string specifying the bootstrap type. Options are ‘11’, ‘31’, ‘13’, or ‘33’. Defaults to ‘11’.	`'11'`
`seed`	Union[int, None]	An option to provide a random seed. Defaults to None.	`None`
`adj`	bool	Indicates whether to apply a small sample adjustment for the number of observations and covariates. Defaults to True.	`True`
`cluster_adj`	bool	Indicates whether to apply a small sample adjustment for the number of clusters. Defaults to True.	`True`
`parallel`	bool	Indicates whether to run the bootstrap in parallel. Defaults to False.	`False`
`return_bootstrapped_t_stats`	bool, optional	If True, the method returns a tuple of the regular output and the bootstrapped t-stats. Defaults to False.	`False`

Returns

Type	Description
pd.DataFrame	A DataFrame with the original, non-bootstrapped t-statistic and bootstrapped p-value, along with the bootstrap type, inference type (HC vs CRV), and whether the null hypothesis was imposed on the bootstrap DGP. If `return_bootstrapped_t_stats` is True, the method returns a tuple of the regular output and the bootstrapped t-stats.
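
The mechanics of a wild bootstrap with Rademacher weights and the null imposed can be sketched for the non-clustered (heteroskedasticity-robust) case. This is a simplified illustration on simulated data with a hypothetical helper `ols`; it omits the small sample adjustments and the clustered variants, and it is not the algorithm used by the underlying bootstrap package.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients, HC0 standard errors, and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    meat = (X * u[:, None] ** 2).T @ X
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se, u


rng = np.random.default_rng(4)
N, reps, j = 200, 499, 1  # j is the index of the tested coefficient
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([0.0, 0.5]) + rng.normal(size=N)

# Observed t-statistic for H0: beta_j = 0
beta, se, _ = ols(X, y)
t_obs = beta[j] / se[j]

# Impose the null: fit the restricted model without regressor j
X_r = np.delete(X, j, axis=1)
beta_r, _, u_r = ols(X_r, y)
fitted_r = X_r @ beta_r

t_boot = np.empty(reps)
for b in range(reps):
    # Rademacher weights flip the sign of each restricted residual at random
    v = rng.choice([-1.0, 1.0], size=N)
    y_star = fitted_r + u_r * v
    beta_b, se_b, _ = ols(X, y_star)
    t_boot[b] = beta_b[j] / se_b[j]

# Two-sided bootstrap p-value
p_value = float(np.mean(np.abs(t_boot) >= np.abs(t_obs)))
```

In the clustered case, one weight is drawn per cluster rather than per observation, so that the dependence within clusters is preserved in every bootstrap sample.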