estimation.Feols

estimation.Feols(self, Y, X, weights, collin_tol, coefnames, weights_name, weights_type, solver='np.linalg.solve')

Non user-facing class to estimate a linear regression via OLS.

Users should not directly instantiate this class, but rather use the feols() function. Note that no demeaning is performed in this class: demeaning is performed in the FixestMulti class (to allow for caching of demeaned variables for multiple estimation).
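Feols receives inputs that have already been demeaned, so it helps to see why demeaning is enough. The following numpy sketch illustrates the Frisch–Waugh–Lovell result for a single fixed effect: OLS on group-demeaned Y and X gives the same slope as OLS with one dummy per group. It is not pyfixest's demeaning algorithm. The helper demean_by_group and the simulated data are hypothetical.

```python
import numpy as np

def demean_by_group(v, groups):
    """Subtract group means from v (one fixed effect); works for 1-D or 2-D v.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    v = np.asarray(v, dtype=float)
    out = v.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= v[mask].mean(axis=0)
    return out

rng = np.random.default_rng(0)
n, n_groups = 200, 5
groups = rng.integers(0, n_groups, size=n)
x = rng.normal(size=(n, 1))
alpha = rng.normal(size=n_groups)                 # true group effects
y = 2.0 * x[:, 0] + alpha[groups] + rng.normal(size=n)

# Slope from OLS on demeaned data (the kind of input Feols receives)
x_dm = demean_by_group(x, groups)
y_dm = demean_by_group(y, groups)
beta_within = np.linalg.solve(x_dm.T @ x_dm, x_dm.T @ y_dm)

# Slope from OLS with one dummy column per group (no separate intercept)
dummies = (groups[:, None] == np.arange(n_groups)).astype(float)
beta_dummies = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0]

print(beta_within[0], beta_dummies[0])  # equal up to floating-point error
```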

Parameters

Name Type Description Default
Y np.ndarray Dependent variable, a two-dimensional numpy array. required
X np.ndarray Independent variables, a two-dimensional numpy array. required
weights np.ndarray Weights, a one-dimensional numpy array. required
collin_tol float Tolerance level for collinearity checks. required
coefnames list[str] Names of the coefficients (of the design matrix X). required
weights_name Optional[str] Name of the weights variable. required
weights_type Optional[str] Type of the weights variable. Either “aweights” for analytic weights or “fweights” for frequency weights. required
solver str The solver to use for the regression. Can be either “np.linalg.solve” or “np.linalg.lstsq”. Defaults to “np.linalg.solve”. 'np.linalg.solve'

Attributes

Name Type Description
_method str Specifies the method used for regression, set to “feols”.
_is_iv bool Indicates whether instrumental variables are used, initialized as False.
_Y np.ndarray The dependent variable array.
_X np.ndarray The independent variables array.
_X_is_empty bool Indicates whether the X array is empty.
_collin_tol float Tolerance level for collinearity checks.
_coefnames list Names of the coefficients (of the design matrix X).
_collin_vars list Variables identified as collinear.
_collin_index list Indices of collinear variables.
_Z np.ndarray Alias for the _X array, used for calculations.
_solver str The solver used for the regression.
_weights np.ndarray Array of weights for each observation.
_N int Number of observations.
_k int Number of independent variables (or features).
_support_crv3_inference bool Indicates support for CRV3 inference.
_data Any Data used in the regression, to be enriched outside of the class.
_fml Any Formula used in the regression, to be enriched outside of the class.
_has_fixef bool Indicates whether fixed effects are used.
_fixef Any Fixed effects used in the regression.
_icovars Any Internal covariates, to be enriched outside of the class.
_ssc_dict dict Dictionary with the small sample correction settings.
_tZX np.ndarray Transpose of Z multiplied by X, set in get_fit().
_tXZ np.ndarray Transpose of X multiplied by Z, set in get_fit().
_tZy np.ndarray Transpose of Z multiplied by Y, set in get_fit().
_tZZinv np.ndarray Inverse of Z'Z (the transpose of Z multiplied by Z), set in get_fit().
_beta_hat np.ndarray Estimated regression coefficients.
_Y_hat_link np.ndarray Predicted values of the dependent variable.
_Y_hat_response np.ndarray Response predictions of the model.
_u_hat np.ndarray Residuals of the regression model.
_scores np.ndarray Scores used in the regression analysis.
_hessian np.ndarray Hessian matrix used in the regression.
_bread np.ndarray Bread matrix, used in calculating the variance-covariance matrix.
_vcov_type Any Type of variance-covariance matrix used.
_vcov_type_detail Any Detailed specification of the variance-covariance matrix type.
_is_clustered bool Indicates if clustering is used in the variance-covariance calculation.
_clustervar Any Variable used for clustering in the variance-covariance calculation.
_G Any Group information used in clustering.
_ssc Any Small sample correction factor.
_vcov np.ndarray Variance-covariance matrix of the estimated coefficients.
_se np.ndarray Standard errors of the estimated coefficients.
_tstat np.ndarray T-statistics of the estimated coefficients.
_pvalue np.ndarray P-values associated with the t-statistics.
_conf_int np.ndarray Confidence intervals for the estimated coefficients.
_F_stat Any F-statistic for the model, set in get_Ftest().
_fixef_dict dict Dictionary containing fixed effects estimates.
_sumFE np.ndarray Sum of all fixed effects for each observation.
_rmse float Root mean squared error of the model.
_r2 float R-squared value of the model.
_r2_within float R-squared value computed on demeaned dependent variable.
_adj_r2 float Adjusted R-squared value of the model.
_adj_r2_within float Adjusted R-squared value computed on demeaned dependent variable.

Methods

Name Description
add_fixest_multi_context Enrich Feols object.
ccv Compute the Causal Cluster Variance following Abadie et al. (QJE 2023).
coef Fitted model coefficients.
confint Fitted model confidence intervals.
fixef Compute the coefficients of (swept out) fixed effects for a regression model.
get_fit Fit an OLS model.
get_inference Compute standard errors, t-statistics, and p-values for the regression model.
get_nobs Fetch the number of observations used in fitting the regression model.
get_performance Get Goodness-of-Fit measures.
plot_ritest Plot the distribution of the Randomization Inference Statistics.
predict Predict values of the model on new data.
pvalue Fitted model p-values.
resid Fitted model residuals.
ritest Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.
se Fitted model standard errors.
solve_ols Solve the ordinary least squares problem using the specified solver.
tidy Tidy model outputs.
tstat Fitted model t-statistics.
vcov Compute covariance matrices for an estimated regression model.
wald_test Conduct Wald test.
wildboottest Run a wild cluster bootstrap based on an object of type “Feols”.

add_fixest_multi_context

estimation.Feols.add_fixest_multi_context(fml, depvar, Y, _data, _ssc_dict, _k_fe, fval, store_data)

Enrich Feols object.

Enrich an instance of Feols Class with additional attributes set in the FixestMulti class.

Parameters

Name Type Description Default
fml str The formula used for estimation. required
depvar str The dependent variable of the regression model. required
Y pd.Series The dependent variable of the regression model. required
_data pd.DataFrame The data used for estimation. required
_ssc_dict dict A dictionary with the small sample correction settings. required
_k_fe int The number of fixed effects. required
fval str The fixed effects formula. required
store_data bool Whether to store the data used for estimation in the object. required

Returns

Type Description
None

ccv

estimation.Feols.ccv(treatment, cluster=None, seed=None, n_splits=8, pk=1, qk=1)

Compute the Causal Cluster Variance following Abadie et al. (QJE 2023).

Parameters

Name Type Description Default
treatment str The name of the treatment variable. required
cluster str The name of the cluster variable. None by default. If None, uses the cluster variable from the model fit. None
seed int An integer to set the random seed. Defaults to None. None
n_splits int The number of splits to use in the cross-fitting procedure. Defaults to 8. 8
pk float The proportion of sampled clusters. Defaults to 1, which corresponds to all clusters of the population being sampled. 1
qk float The proportion of sampled observations within each cluster. Defaults to 1, which corresponds to all observations within each cluster being sampled. 1

Returns

Type Description
pd.DataFrame A DataFrame with inference based on the “Causal Cluster Variance” and “regular” CRV1 inference.

Examples

import numpy as np
from pyfixest.estimation import feols
from pyfixest.utils import get_data

data = get_data()
data["D"] = np.random.choice([0, 1], size=data.shape[0])

fit = feols("Y ~ D", data=data, vcov={"CRV1": "group_id"})
fit.ccv(treatment="D", pk=0.05, qk=0.5, n_splits=8, seed=123).head()

coef

estimation.Feols.coef()

Fitted model coefficients.

Returns

Type Description
pd.Series A pd.Series with the estimated coefficients of the regression model.

confint

estimation.Feols.confint(alpha=0.05, keep=None, drop=None, exact_match=False, joint=False, seed=None, reps=10000)

Fitted model confidence intervals.

Parameters

Name Type Description Default
alpha float The significance level for confidence intervals. Defaults to 0.05. 0.05
joint bool Whether to compute simultaneous confidence intervals for the joint null of the parameters selected by keep and drop. Defaults to False. See https://www.causalml-book.org/assets/chapters/CausalML_chap_4.pdf, Remark 4.4.1 for details. False
keep Optional[Union[list, str]] The pattern for retaining coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Default is keeping all coefficients. You should use regular expressions to select coefficients: "age" keeps all coefficients containing age; r"^tr" keeps all coefficients starting with tr; r"\d$" keeps all coefficients ending with a number. Output will be in the order of the patterns. None
drop Optional[Union[list, str]] The pattern for excluding coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Syntax is the same as for keep. Default is dropping no coefficients. The keep and drop parameters can be used simultaneously. None
exact_match Optional[bool] Whether to use exact match for keep and drop. Default is False. If True, the pattern will be matched exactly to the coefficient name instead of using regular expressions. False
reps int The number of bootstrap iterations to run for joint confidence intervals. Defaults to 10_000. Only used if joint is True. 10000
seed int The seed for the random number generator. Defaults to None. Only used if joint is True. None

Returns

Type Description
pd.DataFrame A pd.DataFrame with confidence intervals of the estimated regression model for the selected coefficients.

Examples

from pyfixest.utils import get_data
from pyfixest.estimation import feols

data = get_data()
fit = feols("Y ~ C(f1)", data=data)
fit.confint(alpha=0.10).head()
fit.confint(alpha=0.10, joint=True, reps=9999).head()
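The sketch below illustrates the two kinds of intervals in a simplified, hypothetical setting. Pointwise intervals are beta ± z·se with a normal critical value from Python's statistics.NormalDist (pyfixest may use a t distribution instead). For joint=True, it assumes a simulation-based sup-t construction in the spirit of the referenced Remark 4.4.1. The sketch draws from a normal distribution with the estimated correlation matrix of the coefficients and uses the (1 − alpha) quantile of the largest absolute draw as a common critical value. Whether this matches pyfixest's exact bootstrap is an assumption. The estimates are made up.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical helpers for illustration; not part of pyfixest.
def pointwise_ci(beta, se, alpha=0.05):
    """beta +/- z * se with a normal critical value z."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return np.column_stack([beta - z * se, beta + z * se])

def joint_ci(beta, vcov, alpha=0.05, reps=10_000, seed=None):
    """Simultaneous intervals with a simulated sup-|t| critical value."""
    se = np.sqrt(np.diag(vcov))
    corr = vcov / np.outer(se, se)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(len(beta)), corr, size=reps)
    crit = np.quantile(np.abs(draws).max(axis=1), 1 - alpha)
    return np.column_stack([beta - crit * se, beta + crit * se])

# Made-up estimates and covariance matrix
beta = np.array([1.20, -0.35, 0.05])
vcov = np.array([[0.010, 0.002, 0.000],
                 [0.002, 0.040, 0.001],
                 [0.000, 0.001, 0.020]])
se = np.sqrt(np.diag(vcov))

ci_pointwise = pointwise_ci(beta, se, alpha=0.10)
ci_joint = joint_ci(beta, vcov, alpha=0.10, reps=9_999, seed=123)
print(ci_pointwise)
print(ci_joint)
```

The joint intervals are wider because one critical value has to cover all selected coefficients at once.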

fixef

estimation.Feols.fixef()

Compute the coefficients of (swept out) fixed effects for a regression model.

This method creates the following attributes:
- alphaDF (pd.DataFrame): A DataFrame with the estimated fixed effects.
- sumFE (np.array): An array with the sum of fixed effects for each observation (i = 1, …, N).

Returns

Type Description
None
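With a single fixed effect, each fixed-effect coefficient is the group mean of Y − Xβ once the slopes β are known. The numpy sketch below covers only that one-fixed-effect case. pyfixest also handles several fixed effects, which require solving a larger system. For simplicity, the sketch plugs in the true β. The helper recover_single_fixef is hypothetical.

```python
import numpy as np

def recover_single_fixef(y, X, beta, groups):
    """Return {level: alpha_g} and the per-observation sum of fixed effects.

    Hypothetical helper for illustration; assumes exactly one fixed effect.
    """
    resid = y - X @ beta                       # part of y not explained by X
    levels = np.unique(groups)
    alpha = {g: resid[groups == g].mean() for g in levels}
    sum_fe = np.array([alpha[g] for g in groups])
    return alpha, sum_fe

rng = np.random.default_rng(1)
n = 300
groups = rng.integers(0, 4, size=n)
X = rng.normal(size=(n, 1))
true_alpha = np.array([-1.0, 0.0, 0.5, 2.0])
y = 1.5 * X[:, 0] + true_alpha[groups] + 0.1 * rng.normal(size=n)

alpha_hat, sum_fe = recover_single_fixef(y, X, np.array([1.5]), groups)
print(alpha_hat)
```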

get_fit

estimation.Feols.get_fit()

Fit an OLS model.

Returns

Type Description
None
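The following numpy sketch shows the core computation behind get_fit. It uses the attribute names listed above as a guide (tZX, tZy, beta_hat, Y_hat, u_hat). It assumes the OLS case where Z is an alias of X and solves the normal equations with np.linalg.solve. Weights are applied by scaling rows with the square root of the weights, a standard way to reduce weighted least squares to OLS. It is an illustration, not pyfixest's code.

```python
import numpy as np

def fit_ols(Y, X, weights=None):
    """Solve the normal equations (Z'X) beta = Z'Y with Z = X.

    Hypothetical helper for illustration. Y is (N, 1) and X is (N, k), matching
    the shapes described for Feols. Optional weights scale rows by sqrt(weights).
    """
    if weights is not None:
        sw = np.sqrt(np.asarray(weights, dtype=float)).reshape(-1, 1)
        Y, X = Y * sw, X * sw
    Z = X                           # plain OLS: Z is an alias of X
    tZX = Z.T @ X                   # (k, k)
    tZy = Z.T @ Y                   # (k, 1)
    beta_hat = np.linalg.solve(tZX, tZy).flatten()
    Y_hat = X @ beta_hat            # fitted values (on the weighted scale if weights given)
    u_hat = Y.flatten() - Y_hat     # residuals
    return beta_hat, Y_hat, u_hat

rng = np.random.default_rng(2)
N = 500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
Y = (1.0 + 2.0 * x + rng.normal(size=N)).reshape(-1, 1)

beta_hat, Y_hat, u_hat = fit_ols(Y, X)
print(beta_hat)
```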

get_inference

estimation.Feols.get_inference(alpha=0.05)

Compute standard errors, t-statistics, and p-values for the regression model.

Parameters

Name Type Description Default
alpha float The significance level for confidence intervals. Defaults to 0.05, which produces a 95% confidence interval. 0.05

Returns

Type Description
None
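The sketch below shows, for a plain OLS fit, how standard errors, t-statistics and p-values follow from the sandwich form bread · meat · bread with bread = (X'X)⁻¹. It covers an iid and an HC1 covariance matrix. P-values use a normal approximation via statistics.NormalDist to avoid a SciPy dependency. pyfixest reports t-based p-values and applies its own small sample corrections, so its numbers will differ slightly. The function ols_inference is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def ols_inference(Y, X, vcov_type="iid"):
    """Coefficients, standard errors, t-statistics and normal-approximation p-values.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    N, k = X.shape
    bread = np.linalg.inv(X.T @ X)                    # (X'X)^{-1}
    beta = bread @ X.T @ Y
    u = Y - X @ beta                                  # residuals
    if vcov_type == "iid":
        vcov = (u @ u) / (N - k) * bread
    elif vcov_type == "HC1":
        meat = (X * (u ** 2)[:, None]).T @ X          # sum_i u_i^2 x_i x_i'
        vcov = N / (N - k) * bread @ meat @ bread
    else:
        raise ValueError(f"vcov_type {vcov_type!r} is not supported in this sketch")
    se = np.sqrt(np.diag(vcov))
    tstat = beta / se
    pvalue = np.array([2 * (1 - NormalDist().cdf(abs(t))) for t in tstat])
    return beta, se, tstat, pvalue

rng = np.random.default_rng(4)
N = 1_000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
Y = 0.5 + 1.0 * x + rng.normal(size=N) * (1 + np.abs(x))   # heteroskedastic errors

for vcov_type in ("iid", "HC1"):
    beta, se, tstat, pvalue = ols_inference(Y, X, vcov_type)
    print(vcov_type, beta.round(3), se.round(3), pvalue.round(4))
```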

get_nobs

estimation.Feols.get_nobs()

Fetch the number of observations used in fitting the regression model.

Returns

Type Description
None

get_performance

estimation.Feols.get_performance()

Get Goodness-of-Fit measures.

Compute multiple additional measures commonly reported with linear regression output, including R-squared and adjusted R-squared. Note that measures with the suffix _within are computed on the demeaned dependent variable Y, while measures without the suffix are computed on the original Y or are invariant to demeaning.

Returns

Type Description
None
Creates the following attributes:
- r2 (float): R-squared of the regression model.
- adj_r2 (float): Adjusted R-squared of the regression model.
- r2_within (float): R-squared of the regression model, computed on the demeaned dependent variable.
- adj_r2_within (float): Adjusted R-squared of the regression model, computed on the demeaned dependent variable.
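The measures above follow the usual definitions R² = 1 − SSR/TSS and adjusted R² = 1 − (1 − R²)(N − 1)/(N − k). The numpy sketch below computes r2, adj_r2 and r2_within for a model with one fixed effect. The within version replaces TSS by the sum of squares of the demeaned Y. The sketch assumes these textbook formulas. pyfixest's degrees-of-freedom conventions may differ, in particular for adj_r2_within, which the sketch omits. All names are hypothetical.

```python
import numpy as np

def goodness_of_fit(resid, y, y_demeaned, n_params):
    """Textbook R2 / adjusted R2, plus R2 on the demeaned dependent variable.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    N = y.shape[0]
    ssr = np.sum(resid ** 2)
    r2 = 1 - ssr / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (N - 1) / (N - n_params)
    r2_within = 1 - ssr / np.sum(y_demeaned ** 2)
    return {"r2": r2, "adj_r2": adj_r2, "r2_within": r2_within}

rng = np.random.default_rng(5)
N, G = 400, 8
g = rng.integers(0, G, size=N)
x = rng.normal(size=N)
y = 0.5 * x + rng.normal(scale=2.0, size=G)[g] + rng.normal(size=N)

def demean(v):
    """Within transformation for the single fixed effect g."""
    means = np.bincount(g, weights=v, minlength=G) / np.bincount(g, minlength=G)
    return v - means[g]

x_dm, y_dm = demean(x), demean(y)
beta = (x_dm @ y_dm) / (x_dm @ x_dm)
resid = y_dm - beta * x_dm          # same residuals as a regression with G dummies

print(goodness_of_fit(resid, y, y_dm, n_params=1 + G))
```

Because the fixed effects absorb the between-group variation, r2 is at least as large as r2_within.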

plot_ritest

estimation.Feols.plot_ritest(plot_backend='lets_plot')

Plot the distribution of the Randomization Inference Statistics.

Parameters

Name Type Description Default
plot_backend str The plotting backend to use. Defaults to “lets_plot”. Alternatively, “matplotlib” is available. 'lets_plot'

Returns

Type Description
A lets_plot or matplotlib figure with the distribution of the Randomization Inference Statistics.

predict

estimation.Feols.predict(newdata=None)

Predict values of the model on new data.

Return a flat np.array with predicted values of the regression model. If new fixed effect levels are introduced in newdata, predicted values for such observations will be set to NaN.

Parameters

Name Type Description Default
newdata Optional[DataFrameType] A pd.DataFrame or pl.DataFrame with the data to be used for prediction. If None (default), the data used for fitting the model is used. None

Returns

Type Description
np.ndarray A flat np.array with predicted values of the regression model.
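A minimal sketch of the NaN rule for unseen fixed-effect levels. It assumes a single fixed effect whose estimates are stored in a plain dict, a hypothetical stand-in for the fitted fixed effects. predict_with_fixef is not part of pyfixest.

```python
import numpy as np

def predict_with_fixef(X_new, fe_new, beta, fixef):
    """Linear prediction X_new @ beta plus the estimated fixed effect.

    Hypothetical helper for illustration. Levels missing from fixef
    (i.e. not seen during fitting) get NaN.
    """
    fe_part = np.array([fixef.get(level, np.nan) for level in fe_new], dtype=float)
    return X_new @ beta + fe_part

beta = np.array([2.0])
fixef = {"a": 1.0, "b": -0.5}              # hypothetical fitted fixed effects
X_new = np.array([[1.0], [0.5], [3.0]])
fe_new = ["a", "b", "c"]                    # "c" was not seen during fitting
pred = predict_with_fixef(X_new, fe_new, beta, fixef)
print(pred)
```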

pvalue

estimation.Feols.pvalue()

Fitted model p-values.

Returns

Type Description
pd.Series A pd.Series with p-values of the estimated regression model.

resid

estimation.Feols.resid()

Fitted model residuals.

Returns

Type Description
np.ndarray A np.ndarray with the residuals of the estimated regression model.

ritest

estimation.Feols.ritest(resampvar, cluster=None, reps=100, type='randomization-c', rng=None, choose_algorithm='auto', store_ritest_statistics=False, level=0.95)

Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.

Parameters

Name Type Description Default
resampvar str The name of the variable to be resampled. required
cluster str The name of the cluster variable in case of cluster random assignment. If provided, resampvar is held constant within each cluster. Defaults to None. None
reps int The number of randomization iterations. Defaults to 100. 100
type str The type of the randomization inference test. Can be “randomization-c” or “randomization-t”. Note that the “randomization-c” is much faster, while the “randomization-t” is recommended by Wu & Ding (JASA, 2021). 'randomization-c'
rng np.random.Generator A random number generator. Defaults to None. None
choose_algorithm str The algorithm to use for the computation. Defaults to “auto”. The alternatives “fast” and “slow” should only be used for running CI tests. Note that this argument is not validated against invalid user input, so it is best left at its default. 'auto'
include_plot Whether to include a plot of the distribution of p-values. Defaults to False. required
store_ritest_statistics bool Whether to store the simulated statistics of the RI procedure. Defaults to False. If True, stores the simulated statistics in the model object via the ritest_statistics attribute as a numpy array. False
level float The level for the confidence interval of the randomization inference p-value. Defaults to 0.95. 0.95

Returns

Type Description
pd.Series A pd.Series with the regression coefficient of resampvar and the p-value of the RI test. Additionally, reports the standard error and the confidence interval of the p-value.
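The numpy sketch below illustrates what the “randomization-c” variant does. It permutes the treatment under the sharp null of no effect and re-estimates the coefficient each time. It then reports the share of permuted coefficients at least as large in absolute value as the observed one. The sketch covers only unclustered assignment in a bivariate regression. It computes the p-value's standard error as a binomial Monte Carlo error with a normal confidence interval, which is an assumption about what the level argument refers to. All names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical helpers for illustration; not part of pyfixest.
def slope(y, d):
    """OLS coefficient on d in a regression of y on a constant and d."""
    dc = d - d.mean()
    return dc @ (y - y.mean()) / (dc @ dc)

def ritest_c(y, d, reps=1_000, level=0.95, rng=None):
    """Randomization-c test of the sharp null of no effect of d (no clusters)."""
    rng = np.random.default_rng(rng)
    observed = slope(y, d)
    simulated = np.array([slope(y, rng.permutation(d)) for _ in range(reps)])
    pval = np.mean(np.abs(simulated) >= np.abs(observed))
    se = np.sqrt(pval * (1 - pval) / reps)          # Monte Carlo error of the p-value
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return observed, pval, se, (pval - z * se, pval + z * se)

rng = np.random.default_rng(6)
n = 200
d = rng.choice([0.0, 1.0], size=n)
y = 1.0 * d + rng.normal(size=n)
print(ritest_c(y, d, reps=500, rng=7))
```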

se

estimation.Feols.se()

Fitted model standard errors.

Returns

Type Description
pd.Series A pd.Series with the standard errors of the estimated regression model.

solve_ols

estimation.Feols.solve_ols(tZX, tZY, solver)

Solve the ordinary least squares problem using the specified solver.

Parameters

Name Type Description Default
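tZX np.ndarray The transpose of Z multiplied by X (see _tZX). required
tZY np.ndarray The transpose of Z multiplied by Y (see _tZy). required
solver str The solver to use, e.g. “np.linalg.solve” or “np.linalg.lstsq”. required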

Returns

Type Description
array-like The solution to the ordinary least squares problem.

Raises

Type Description
ValueError If the specified solver is not supported.
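The dispatch logic can be sketched as follows. The helper solve_ols_sketch is hypothetical and supports only the two numpy solvers named in the Feols parameter table. It also shows the practical difference between them. np.linalg.solve raises np.linalg.LinAlgError on an exactly singular Z'X, while np.linalg.lstsq returns a minimum-norm solution.

```python
import numpy as np

def solve_ols_sketch(tZX, tZY, solver):
    """Solve the normal equations (Z'X) beta = Z'Y (hypothetical helper)."""
    if solver == "np.linalg.solve":
        return np.linalg.solve(tZX, tZY)
    if solver == "np.linalg.lstsq":
        return np.linalg.lstsq(tZX, tZY, rcond=None)[0]
    raise ValueError(f"Solver {solver} not supported.")

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
Y = X @ np.array([1.0, -2.0]) + rng.normal(size=100)
tZX, tZY = X.T @ X, X.T @ Y

print(solve_ols_sketch(tZX, tZY, "np.linalg.solve"))
print(solve_ols_sketch(tZX, tZY, "np.linalg.lstsq"))

# With an exactly singular Z'X, solve raises while lstsq returns a minimum-norm solution.
singular = np.array([[1.0, 1.0], [1.0, 1.0]])
try:
    solve_ols_sketch(singular, np.array([2.0, 2.0]), "np.linalg.solve")
except np.linalg.LinAlgError as err:
    print("np.linalg.solve failed:", err)
print(solve_ols_sketch(singular, np.array([2.0, 2.0]), "np.linalg.lstsq"))
```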

tidy

estimation.Feols.tidy(alpha=None)

Tidy model outputs.

Return a tidy pd.DataFrame with the point estimates, standard errors, t-statistics, p-values, and confidence intervals.

Parameters

Name Type Description Default
alpha Optional[float] The significance level for the confidence intervals. If None, computes a 95% confidence interval (alpha = 0.05). None

Returns

Type Description
pd.DataFrame A tidy pd.DataFrame containing the regression results, including point estimates, standard errors, t-statistics, and p-values.

tstat

estimation.Feols.tstat()

Fitted model t-statistics.

Returns

Type Description
pd.Series A pd.Series with t-statistics of the estimated regression model.

vcov

estimation.Feols.vcov(vcov, data=None)

Compute covariance matrices for an estimated regression model.

Parameters

Name Type Description Default
vcov Union[str, dict[str, str]] A string or dictionary specifying the type of variance-covariance matrix to use for inference. If a string, it can be one of “iid”, “hetero”, “HC1”, “HC2”, “HC3”. If a dictionary, it should have the format {“CRV1”: “clustervar”} for CRV1 inference or {“CRV3”: “clustervar”} for CRV3 inference. Note that CRV3 inference is currently not supported for IV estimation. required
data Optional[DataFrameType] The data used for estimation. If None, tries to fetch the data from the model object. Defaults to None. None

Returns

Type Description
Feols An instance of class Feols with updated inference.
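The numpy sketch below shows CRV1 inference as requested via {“CRV1”: “clustervar”}. The meat sums outer products of within-cluster scores. The small sample factor G/(G − 1) · (N − 1)/(N − k) is the common fixest-style default, and treating it as pyfixest's default is an assumption. crv1_vcov and the simulated data are hypothetical.

```python
import numpy as np

def crv1_vcov(X, u, clusters):
    """CRV1 sandwich with the small-sample factor G/(G-1) * (N-1)/(N-k).

    Hypothetical helper for illustration; not part of pyfixest.
    """
    N, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    levels = np.unique(clusters)
    G = len(levels)
    meat = np.zeros((k, k))
    for level in levels:
        in_cluster = clusters == level
        score = X[in_cluster].T @ u[in_cluster]      # sum of x_i * u_i within the cluster
        meat += np.outer(score, score)
    factor = G / (G - 1) * (N - 1) / (N - k)
    return factor * bread @ meat @ bread

rng = np.random.default_rng(9)
N, G = 600, 30
clusters = rng.integers(0, G, size=N)
x = rng.normal(size=N) + rng.normal(size=G)[clusters]     # regressor correlated within clusters
X = np.column_stack([np.ones(N), x])
y = 1.0 + 0.5 * x + rng.normal(size=G)[clusters] + rng.normal(size=N)
u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

V = crv1_vcov(X, u, clusters)
print(np.sqrt(np.diag(V)))
```

With every observation in its own cluster, this formula reduces to the HC1 covariance matrix.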

wald_test

estimation.Feols.wald_test(R=None, q=None, distribution='F')

Conduct Wald test.

Compute a Wald test for a linear hypothesis of the form Rb = q. By default, tests the joint null hypothesis that all coefficients are zero.

Parameters

Name Type Description Default
R array-like The matrix R of the linear hypothesis. If None, defaults to an identity matrix. None
q array-like The vector q of the linear hypothesis. If None, defaults to a vector of zeros. None
distribution str The distribution to use for the p-value. Can be either “F” or “chi2”. Defaults to “F”. 'F'

Returns

Type Description
pd.Series A pd.Series with the Wald statistic and p-value.
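The Wald statistic for H0: Rb = q is W = (Rb − q)'(R V R')⁻¹(Rb − q), where V is the estimated covariance matrix. The F statistic divides W by the number of restrictions m (rows of R). The numpy sketch below uses made-up numbers. P-values could then be taken from scipy.stats.f.sf(F, m, df) with the residual degrees of freedom, or from scipy.stats.chi2.sf(W, m). The sketch leaves that step out. The helper wald_statistic is hypothetical.

```python
import numpy as np

def wald_statistic(beta, vcov, R=None, q=None):
    """Return (W, F) for H0: R @ beta = q, with F = W / (number of restrictions).

    Hypothetical helper for illustration; defaults mirror the documented ones
    (R = identity, q = zeros).
    """
    k = beta.shape[0]
    R = np.eye(k) if R is None else np.atleast_2d(np.asarray(R, dtype=float))
    q = np.zeros(R.shape[0]) if q is None else np.asarray(q, dtype=float)
    diff = R @ beta - q
    W = diff @ np.linalg.solve(R @ vcov @ R.T, diff)
    return W, W / R.shape[0]

beta = np.array([1.0, 2.0])
vcov = np.diag([0.25, 1.0])
print(wald_statistic(beta, vcov))                            # all coefficients zero
print(wald_statistic(beta, vcov, R=[[1.0, -1.0]], q=[0.0]))  # beta_1 = beta_2
```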

wildboottest

estimation.Feols.wildboottest(reps, cluster=None, param=None, weights_type='rademacher', impose_null=True, bootstrap_type='11', seed=None, adj=True, cluster_adj=True, parallel=False, return_bootstrapped_t_stats=False)

Run a wild cluster bootstrap based on an object of type “Feols”.

Parameters

Name Type Description Default
reps int The number of bootstrap iterations to run. required
cluster Union[str, None] The variable used for clustering. Defaults to None. If None, uses the variable specified in the model’s _clustervar attribute. If no _clustervar attribute is found, runs a heteroskedasticity-robust bootstrap. None
param Union[str, None] A string containing the name of the test parameter of interest. Defaults to None. None
weights_type str The type of bootstrap weights. Options are ‘rademacher’, ‘mammen’, ‘webb’, or ‘normal’. Defaults to ‘rademacher’. 'rademacher'
impose_null bool Indicates whether to impose the null hypothesis on the bootstrap DGP. Defaults to True. True
bootstrap_type str A string specifying the bootstrap type. Options are ‘11’, ‘31’, ‘13’, or ‘33’. Defaults to ‘11’. '11'
seed Union[int, None] An option to provide a random seed. Defaults to None. None
adj bool Indicates whether to apply a small sample adjustment for the number of observations and covariates. Defaults to True. True
cluster_adj bool Indicates whether to apply a small sample adjustment for the number of clusters. Defaults to True. True
parallel bool Indicates whether to run the bootstrap in parallel. Defaults to False. False
return_bootstrapped_t_stats bool If True, the method returns a tuple of the regular output and the bootstrapped t-stats. Defaults to False. False

Returns

Type Description
pd.DataFrame A DataFrame with the original, non-bootstrapped t-statistic and bootstrapped p-value, along with the bootstrap type, inference type (HC vs CRV), and whether the null hypothesis was imposed on the bootstrap DGP. If return_bootstrapped_t_stats is True, the method returns a tuple of the regular output and the bootstrapped t-stats.
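The numpy sketch below runs a heteroskedasticity-robust (unclustered) wild bootstrap with Rademacher weights and the null imposed (impose_null=True). It tests whether coefficient j is zero. The bootstrap samples are built from the restricted fit, and each sample is re-estimated with HC1 t-statistics. The p-value is the share of bootstrapped |t| values at least as large as the observed |t|. It is a simplified illustration rather than pyfixest's implementation, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of pyfixest.
def hc1_t(y, X, j):
    """HC1 t-statistic of coefficient j in an OLS regression of y on X."""
    N, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    u = y - X @ beta
    meat = (X * (u ** 2)[:, None]).T @ X
    vcov = N / (N - k) * bread @ meat @ bread
    return beta[j] / np.sqrt(vcov[j, j])

def wild_bootstrap_pvalue(y, X, j, reps=999, seed=None):
    """Wild bootstrap p-value for H0: beta_j = 0 with Rademacher weights."""
    rng = np.random.default_rng(seed)
    t_obs = hc1_t(y, X, j)
    X_r = np.delete(X, j, axis=1)                       # restricted model imposes beta_j = 0
    fitted_r = X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]
    u_r = y - fitted_r                                  # restricted residuals
    t_boot = np.empty(reps)
    for b in range(reps):
        v = rng.choice([-1.0, 1.0], size=y.shape[0])    # Rademacher weights
        t_boot[b] = hc1_t(fitted_r + u_r * v, X, j)     # H0 holds in the bootstrap DGP
    pval = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    return t_obs, pval, t_boot

rng = np.random.default_rng(10)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 0.3 + 1.0 * x + rng.normal(size=N) * (1 + 0.5 * np.abs(x))
t_obs, pval, t_boot = wild_bootstrap_pvalue(y, X, j=1, reps=499, seed=11)
print(t_obs, pval)
```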