estimation.Feols
estimation.Feols(self, Y, X, weights, collin_tol, coefnames, weights_name, weights_type, solver='np.linalg.solve')
Non-user-facing class to estimate a linear regression via OLS.
Users should not instantiate this class directly, but rather use the feols() function. Note that no demeaning is performed in this class: demeaning is performed in the FixestMulti class (to allow for caching of demeaned variables across multiple estimations).
Parameters
Name | Type | Description | Default |
---|---|---|---|
Y |
np.ndarray | Dependent variable, a two-dimensional numpy array. | required |
X |
np.ndarray | Independent variables, a two-dimensional numpy array. | required |
weights |
np.ndarray | Weights, a one-dimensional numpy array. | required |
collin_tol |
float | Tolerance level for collinearity checks. | required |
coefnames |
list[str] | Names of the coefficients (of the design matrix X). | required |
weights_name |
Optional[str] | Name of the weights variable. | required |
weights_type |
Optional[str] | Type of the weights variable. Either “aweights” for analytic weights or “fweights” for frequency weights. | required |
solver |
str, optional | The solver to use for the regression. Can be either “np.linalg.solve” or “np.linalg.lstsq”. Defaults to “np.linalg.solve”. | 'np.linalg.solve' |
Attributes
Name | Type | Description |
---|---|---|
_method | str | Specifies the method used for regression, set to “feols”. |
_is_iv | bool | Indicates whether instrumental variables are used, initialized as False. |
_Y | np.ndarray | The dependent variable array. |
_X | np.ndarray | The independent variables array. |
_X_is_empty | bool | Indicates whether the X array is empty. |
_collin_tol | float | Tolerance level for collinearity checks. |
_coefnames | list | Names of the coefficients (of the design matrix X). |
_collin_vars | list | Variables identified as collinear. |
_collin_index | list | Indices of collinear variables. |
_Z | np.ndarray | Alias for the _X array, used for calculations. |
_solver | str | The solver used for the regression. |
_weights | np.ndarray | Array of weights for each observation. |
_N | int | Number of observations. |
_k | int | Number of independent variables (or features). |
_support_crv3_inference | bool | Indicates support for CRV3 inference. |
_data | Any | Data used in the regression, to be enriched outside of the class. |
_fml | Any | Formula used in the regression, to be enriched outside of the class. |
_has_fixef | bool | Indicates whether fixed effects are used. |
_fixef | Any | Fixed effects used in the regression. |
_icovars | Any | Internal covariates, to be enriched outside of the class. |
_ssc_dict | dict | Dictionary with the small sample correction settings. |
_tZX | np.ndarray | Transpose of Z multiplied by X, set in get_fit(). |
_tXZ | np.ndarray | Transpose of X multiplied by Z, set in get_fit(). |
_tZy | np.ndarray | Transpose of Z multiplied by Y, set in get_fit(). |
_tZZinv | np.ndarray | Inverse of the transpose of Z multiplied by Z, set in get_fit(). |
_beta_hat | np.ndarray | Estimated regression coefficients. |
_Y_hat_link | np.ndarray | Predicted values of the dependent variable. |
_Y_hat_response | np.ndarray | Response predictions of the model. |
_u_hat | np.ndarray | Residuals of the regression model. |
_scores | np.ndarray | Scores used in the regression analysis. |
_hessian | np.ndarray | Hessian matrix used in the regression. |
_bread | np.ndarray | Bread matrix, used in calculating the variance-covariance matrix. |
_vcov_type | Any | Type of variance-covariance matrix used. |
_vcov_type_detail | Any | Detailed specification of the variance-covariance matrix type. |
_is_clustered | bool | Indicates if clustering is used in the variance-covariance calculation. |
_clustervar | Any | Variable used for clustering in the variance-covariance calculation. |
_G | Any | Group information used in clustering. |
_ssc | Any | Small sample correction factor. |
_vcov | np.ndarray | Variance-covariance matrix of the estimated coefficients. |
_se | np.ndarray | Standard errors of the estimated coefficients. |
_tstat | np.ndarray | T-statistics of the estimated coefficients. |
_pvalue | np.ndarray | P-values associated with the t-statistics. |
_conf_int | np.ndarray | Confidence intervals for the estimated coefficients. |
_F_stat | Any | F-statistic for the model, set in get_Ftest(). |
_fixef_dict | dict | Dictionary containing fixed effects estimates. |
_sumFE | np.ndarray | Sum of all fixed effects for each observation. |
_rmse | float | Root mean squared error of the model. |
_r2 | float | R-squared value of the model. |
_r2_within | float | R-squared value computed on demeaned dependent variable. |
_adj_r2 | float | Adjusted R-squared value of the model. |
_adj_r2_within | float | Adjusted R-squared value computed on demeaned dependent variable. |
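The cross-product attributes listed above can be illustrated with a short NumPy sketch. It only mirrors the descriptions in the table on synthetic data; the variable names follow the attribute names without the leading underscore, and nothing here is taken from the actual class internals.

```python
import numpy as np

# Synthetic data for illustration only (not pyfixest internals).
rng = np.random.default_rng(42)
N, k = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
Y = X @ np.array([[0.5], [1.0], [-2.0]]) + rng.normal(size=(N, 1))

Z = X                              # for OLS, Z is an alias for X
tZX = Z.T @ X                      # transpose of Z multiplied by X, shape (k, k)
tZy = Z.T @ Y                      # transpose of Z multiplied by Y, shape (k, 1)
tZZinv = np.linalg.inv(Z.T @ Z)    # inverse of Z'Z

beta_hat = np.linalg.solve(tZX, tZy).flatten()   # estimated coefficients
Y_hat = X @ beta_hat                             # fitted values
u_hat = Y.flatten() - Y_hat                      # residuals

print(beta_hat)
```

By construction, the residuals are orthogonal to the columns of X, which is a quick sanity check on any OLS fit.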
Methods
Name | Description |
---|---|
add_fixest_multi_context | Enrich Feols object. |
ccv | Compute the Causal Cluster Variance following Abadie et al (QJE 2023). |
coef | Fitted model coefficients. |
confint | Fitted model confidence intervals. |
fixef | Compute the coefficients of (swept out) fixed effects for a regression model. |
get_fit | Fit an OLS model. |
get_inference | Compute standard errors, t-statistics, and p-values for the regression model. |
get_nobs | Fetch the number of observations used in fitting the regression model. |
get_performance | Get Goodness-of-Fit measures. |
plot_ritest | Plot the distribution of the Randomization Inference Statistics. |
predict | Predict values of the model on new data. |
pvalue | Fitted model p-values. |
resid | Fitted model residuals. |
ritest | Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0. |
se | Fitted model standard errors. |
solve_ols | Solve the ordinary least squares problem using the specified solver. |
tidy | Tidy model outputs. |
tstat | Fitted model t-statistics. |
vcov | Compute covariance matrices for an estimated regression model. |
wald_test | Conduct Wald test. |
wildboottest | Run a wild cluster bootstrap based on an object of type “Feols”. |
add_fixest_multi_context
estimation.Feols.add_fixest_multi_context(fml, depvar, Y, _data, _ssc_dict, _k_fe, fval, store_data)
Enrich Feols object.
Enrich an instance of the Feols class with additional attributes set in the FixestMulti class.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fml |
str | The formula used for estimation. | required |
depvar |
str | The dependent variable of the regression model. | required |
Y |
pd.Series | The dependent variable of the regression model. | required |
_data |
pd.DataFrame | The data used for estimation. | required |
_ssc_dict |
dict | A dictionary with the small sample correction settings. | required |
_k_fe |
int | The number of fixed effects. | required |
fval |
str | The fixed effects formula. | required |
store_data |
bool | Indicates whether to save the data used for estimation in the object. | required |
Returns
Type | Description |
---|---|
None |
ccv
estimation.Feols.ccv(treatment, cluster=None, seed=None, n_splits=8, pk=1, qk=1)
Compute the Causal Cluster Variance following Abadie et al (QJE 2023).
Parameters
Name | Type | Description | Default |
---|---|---|---|
treatment |
str | The name of the treatment variable. | required |
cluster |
str | The name of the cluster variable. None by default. If None, uses the cluster variable from the model fit. | None |
seed |
int | An integer to set the random seed. Defaults to None. | None |
n_splits |
int | The number of splits to use in the cross-fitting procedure. Defaults to 8. | 8 |
pk |
float | The proportion of sampled clusters. Defaults to 1, which corresponds to all clusters of the population being sampled. | 1 |
qk |
float | The proportion of sampled observations within each cluster. Defaults to 1, which corresponds to all observations within each cluster being sampled. | 1 |
Returns
Type | Description |
---|---|
pd.DataFrame | A DataFrame with inference based on the “Causal Cluster Variance” and “regular” CRV1 inference. |
Examples
import numpy as np
from pyfixest.estimation import feols
from pyfixest.utils import get_data

data = get_data()
# Add a random binary treatment variable named "D", matching the formula below.
data["D"] = np.random.choice([0, 1], size=data.shape[0])

fit = feols("Y ~ D", data=data, vcov={"CRV1": "group_id"})
fit.ccv(treatment="D", pk=0.05, qk=0.5, n_splits=8, seed=123).head()
coef
estimation.Feols.coef()
Fitted model coefficients.
Returns
Type | Description |
---|---|
pd.Series | A pd.Series with the estimated coefficients of the regression model. |
confint
estimation.Feols.confint(alpha=0.05, keep=None, drop=None, exact_match=False, joint=False, seed=None, reps=10000)
Fitted model confidence intervals.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alpha |
float | The significance level for confidence intervals. Defaults to 0.05. | 0.05 |
joint |
bool | Whether to compute simultaneous confidence intervals for the joint null of the parameters selected by keep and drop. Defaults to False. See https://www.causalml-book.org/assets/chapters/CausalML_chap_4.pdf, Remark 4.4.1 for details. | False |
keep |
Optional[Union[list, str]] | The pattern for retaining coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Patterns are regular expressions: for example, "age" keeps all coefficients containing age, r"^tr" keeps all coefficients starting with tr, and r"\d$" keeps all coefficients ending with a digit. Output will be in the order of the patterns. By default, all coefficients are kept. | None |
drop |
Optional[Union[list, str]] | The pattern for excluding coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Syntax is the same as for keep. By default, all coefficients are kept. Parameters keep and drop can be used simultaneously. | None |
exact_match |
Optional[bool] | Whether to use exact match for keep and drop. Default is False. If True, the pattern will be matched exactly to the coefficient name instead of using regular expressions. | False |
reps |
int | The number of bootstrap iterations to run for joint confidence intervals. Defaults to 10_000. Only used if joint is True. | 10000 |
seed |
int | The seed for the random number generator. Defaults to None. Only used if joint is True. | None |
Returns
Type | Description |
---|---|
pd.DataFrame | A pd.DataFrame with confidence intervals of the estimated regression model for the selected coefficients. |
Examples
from pyfixest.utils import get_data
from pyfixest.estimation import feols

data = get_data()
fit = feols("Y ~ C(f1)", data=data)
fit.confint(alpha=0.10).head()
fit.confint(alpha=0.10, joint=True, reps=9999).head()
fixef
estimation.Feols.fixef()
Compute the coefficients of (swept out) fixed effects for a regression model.
This method creates the following attributes:
- alphaDF (pd.DataFrame): A DataFrame with the estimated fixed effects.
- sumFE (np.array): An array with the sum of fixed effects for each observation (i = 1, …, N).
Returns
Type | Description |
---|---|
None |
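To make the idea of recovering swept-out fixed effects concrete, here is a minimal NumPy sketch for a single fixed effect. It assumes a one-way fixed effects model on synthetic data; the helper `demean` and all variable names are illustrative and do not describe how the method is implemented in the class.

```python
import numpy as np

# Synthetic one-way fixed effects data (illustrative only).
rng = np.random.default_rng(1)
group = np.repeat(np.arange(4), 25)            # 4 groups, 25 observations each
alpha_true = np.array([0.0, 1.0, -1.0, 2.0])   # true fixed effects
x = rng.normal(size=group.size)
y = 1.5 * x + alpha_true[group] + rng.normal(scale=0.1, size=group.size)


def demean(v, g):
    """Subtract group means from v (hypothetical helper)."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]


# Step 1: estimate the slope on demeaned data (fixed effects swept out).
x_dm, y_dm = demean(x, group), demean(y, group)
beta_hat = (x_dm @ y_dm) / (x_dm @ x_dm)

# Step 2: recover the fixed effects as group means of y - x * beta_hat.
partial_resid = y - beta_hat * x
alpha_hat = np.bincount(group, weights=partial_resid) / np.bincount(group)

# Sum of fixed effects per observation (a single fixed effect here).
sumFE = alpha_hat[group]

print(beta_hat, alpha_hat)
```

With several fixed effects, the second step becomes a system of equations rather than simple group means; the sketch covers only the simplest case.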
get_fit
estimation.Feols.get_fit()
Fit an OLS model.
Returns
Type | Description |
---|---|
None |
get_inference
estimation.Feols.get_inference(alpha=0.05)
Compute standard errors, t-statistics, and p-values for the regression model.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alpha |
float | The significance level for confidence intervals. Defaults to 0.05, which produces a 95% confidence interval. | 0.05 |
Returns
Type | Description |
---|---|
None |
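The quantities computed by this method follow from the coefficient vector and its variance-covariance matrix. The sketch below uses made-up numbers and a normal approximation for p-values and critical values; that approximation is an assumption of the example, and the class may well use a t-distribution with a degrees-of-freedom correction instead.

```python
import numpy as np
from statistics import NormalDist

# Made-up estimates and covariance matrix (illustrative only).
beta_hat = np.array([1.2, -0.4])
vcov = np.array([[0.04, 0.0], [0.0, 0.01]])
alpha = 0.05

se = np.sqrt(np.diag(vcov))      # standard errors
tstat = beta_hat / se            # t-statistics

# Assumption: normal approximation instead of a t-distribution.
norm = NormalDist()
pvalue = np.array([2 * (1 - norm.cdf(abs(t))) for t in tstat])
crit = norm.inv_cdf(1 - alpha / 2)

# Rows: lower and upper bounds; columns: coefficients.
conf_int = np.vstack([beta_hat - crit * se, beta_hat + crit * se])

print(se, tstat, pvalue, conf_int)
```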
get_nobs
estimation.Feols.get_nobs()
Fetch the number of observations used in fitting the regression model.
Returns
Type | Description |
---|---|
None |
get_performance
estimation.Feols.get_performance()
Get Goodness-of-Fit measures.
Compute multiple additional measures commonly reported with linear regression output, including R-squared and adjusted R-squared. Note that measures with the suffix _within are computed on the demeaned dependent variable Y, while measures without the suffix are computed on the original Y or are invariant to demeaning.
Returns
Type | Description |
---|---|
None | |
Creates the following attributes:
- r2 (float): R-squared of the regression model.
- adj_r2 (float): Adjusted R-squared of the regression model.
- r2_within (float): R-squared of the regression model, computed on the demeaned dependent variable.
- adj_r2_within (float): Adjusted R-squared of the regression model, computed on the demeaned dependent variable.
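As a rough illustration of these measures, the sketch below computes the RMSE, R-squared, and adjusted R-squared with textbook formulas on a tiny data set. The function `goodness_of_fit` is hypothetical, and the exact degrees-of-freedom conventions used by the class (in particular with fixed effects) may differ.

```python
import numpy as np


def goodness_of_fit(y, u_hat, k):
    """Textbook RMSE, R2, and adjusted R2 (hypothetical helper).

    k counts all estimated coefficients, including the intercept.
    """
    n = y.shape[0]
    ssu = np.sum(u_hat**2)                  # residual sum of squares
    ssy = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    rmse = np.sqrt(ssu / n)
    r2 = 1 - ssu / ssy
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return rmse, r2, adj_r2


# Tiny illustrative data set with an intercept and one regressor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
X = np.column_stack([np.ones_like(x), x])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

rmse, r2, adj_r2 = goodness_of_fit(y, u_hat, k=X.shape[1])
print(rmse, r2, adj_r2)
```

Under the same textbook formulas, passing a demeaned dependent variable instead of `y` would give a "within" version of these measures.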
plot_ritest
estimation.Feols.plot_ritest(plot_backend='lets_plot')
Plot the distribution of the Randomization Inference Statistics.
Parameters
Name | Type | Description | Default |
---|---|---|---|
plot_backend |
str | The plotting backend to use. Defaults to “lets_plot”. Alternatively, “matplotlib” is available. | 'lets_plot' |
Returns
Type | Description |
---|---|
lets_plot or matplotlib figure | A figure with the distribution of the Randomization Inference Statistics. |
predict
estimation.Feols.predict(newdata=None)
Predict values of the model on new data.
Return a flat np.array with predicted values of the regression model. If new fixed effect levels are introduced in newdata, predicted values for such observations will be set to NaN.
Parameters
Name | Type | Description | Default |
---|---|---|---|
newdata |
Optional[DataFrameType] | A pd.DataFrame or pl.DataFrame with the data to be used for prediction. If None (default), the data used for fitting the model is used. | None |
Returns
Type | Description |
---|---|
np.ndarray | A flat np.array with predicted values of the regression model. |
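The NaN behavior for unseen fixed effect levels can be pictured with a small sketch. The coefficient, the dictionary of fixed effect estimates, and the function `predict_sketch` are all hypothetical and only illustrate the rule stated above, not the method's actual code.

```python
import numpy as np

# Hypothetical fitted quantities: one slope and fixed effects for levels "a" and "b".
beta_hat = np.array([2.0])
fixef_estimates = {"a": 1.0, "b": -0.5}


def predict_sketch(x, levels):
    """Predict x * beta + fixed effect; unseen levels yield NaN."""
    fe = np.array([fixef_estimates.get(level, np.nan) for level in levels])
    return x * beta_hat[0] + fe


# Level "c" was not present when the model was fitted.
y_new = predict_sketch(np.array([1.0, 2.0, 3.0]), ["a", "b", "c"])
print(y_new)
```

The first two predictions are finite, while the observation with the unseen level "c" is NaN.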
pvalue
estimation.Feols.pvalue()
Fitted model p-values.
Returns
Type | Description |
---|---|
pd.Series | A pd.Series with p-values of the estimated regression model. |
resid
estimation.Feols.resid()
Fitted model residuals.
Returns
Type | Description |
---|---|
np.ndarray | A np.ndarray with the residuals of the estimated regression model. |
ritest
estimation.Feols.ritest(resampvar, cluster=None, reps=100, type='randomization-c', rng=None, choose_algorithm='auto', store_ritest_statistics=False, level=0.95)
Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.
Parameters
Name | Type | Description | Default |
---|---|---|---|
resampvar |
str | The name of the variable to be resampled. | required |
cluster |
str | The name of the cluster variable in case of cluster random assignment. If provided, resampvar is held constant within each cluster. Defaults to None. | None |
reps |
int | The number of randomization iterations. Defaults to 100. | 100 |
type |
str | The type of the randomization inference test. Can be “randomization-c” or “randomization-t”. Note that the “randomization-c” is much faster, while the “randomization-t” is recommended by Wu & Ding (JASA, 2021). | 'randomization-c' |
rng |
np.random.Generator | A random number generator. Defaults to None. | None |
choose_algorithm |
str | The algorithm to use for the computation. Defaults to “auto”. The alternatives are “fast” and “slow”, which should only be used for running CI tests. Note that this argument is not validated against input errors, so please do not set it. | 'auto' |
include_plot |
bool | Whether to include a plot of the distribution of p-values. Defaults to False. | False |
store_ritest_statistics |
bool | Whether to store the simulated statistics of the RI procedure. Defaults to False. If True, stores the simulated statistics in the model object via the ritest_statistics attribute as a numpy array. | False |
level |
float | The level for the confidence interval of the randomization inference p-value. Defaults to 0.95. | 0.95 |
Returns
Type | Description |
---|---|
pd.Series | A pd.Series with the regression coefficient of resampvar and the p-value of the RI test. Additionally, reports the standard error and the confidence interval of the p-value. |
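The logic of a randomization inference test can be sketched in a few lines of NumPy. The example below permutes a binary treatment without clustering and uses the coefficient itself as the test statistic with a two-sided p-value; both choices are assumptions of this sketch, and the helper `ols_slope` is hypothetical rather than part of the library.

```python
import numpy as np


def ols_slope(y, d):
    """Slope of y on d with an intercept (hypothetical helper)."""
    X = np.column_stack([np.ones_like(d, dtype=float), d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Synthetic data with a true treatment effect of 1.0 (illustrative only).
rng = np.random.default_rng(123)
N = 300
d = rng.integers(0, 2, size=N).astype(float)
y = 1.0 + 1.0 * d + rng.normal(size=N)

observed = ols_slope(y, d)

# Re-randomize the treatment and re-estimate the coefficient each time.
reps = 500
ri_stats = np.array([ols_slope(y, rng.permutation(d)) for _ in range(reps)])

# Share of permuted statistics at least as extreme as the observed one.
p_value = np.mean(np.abs(ri_stats) >= np.abs(observed))
print(observed, p_value)
```

Replacing the coefficient with a t-statistic in each iteration would give a studentized variant, at the cost of also computing a standard error per draw.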
se
estimation.Feols.se()
Fitted model standard errors.
Returns
Type | Description |
---|---|
pd.Series | A pd.Series with the standard errors of the estimated regression model. |
solve_ols
estimation.Feols.solve_ols(tZX, tZY, solver)
Solve the ordinary least squares problem using the specified solver.
Parameters
Name | Type | Description | Default |
---|---|---|---|
tZX |
np.ndarray | Transpose of Z multiplied by X. | required |
tZY |
np.ndarray | Transpose of Z multiplied by Y. | required |
solver |
str | The solver to use. | required |
Returns
Type | Description |
---|---|
array-like | The solution to the ordinary least squares problem. |
Raises
Type | Description |
---|---|
ValueError | If the specified solver is not supported. |
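A solver dispatch of this kind can be sketched as follows. The function `solve_ols_sketch` is a hypothetical stand-in built from the two solver names documented for the class; the flattening of the result and the error message are assumptions of the example.

```python
import numpy as np


def solve_ols_sketch(tZX, tZY, solver="np.linalg.solve"):
    """Solve tZX @ beta = tZY with the named solver (hypothetical stand-in)."""
    if solver == "np.linalg.solve":
        # Direct solve: requires a square, non-singular tZX.
        return np.linalg.solve(tZX, tZY).flatten()
    if solver == "np.linalg.lstsq":
        # Least-squares solve: also handles rank-deficient systems.
        return np.linalg.lstsq(tZX, tZY, rcond=None)[0].flatten()
    raise ValueError(f"Solver {solver} is not supported.")


# Small example system with known solution [1.0, 2.0].
tZX = np.array([[2.0, 0.0], [0.0, 4.0]])
tZY = np.array([[2.0], [8.0]])

print(solve_ols_sketch(tZX, tZY, "np.linalg.solve"))
print(solve_ols_sketch(tZX, tZY, "np.linalg.lstsq"))
```

For a well-conditioned, full-rank system both solvers return the same coefficients; they differ mainly in how they behave when the system is singular or nearly so.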
tidy
estimation.Feols.tidy(alpha=None)
Tidy model outputs.
Return a tidy pd.DataFrame with the point estimates, standard errors, t-statistics, and p-values.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alpha |
Optional[float] | The significance level for the confidence intervals. If None, computes a 95% confidence interval (alpha = 0.05). | None |
Returns
Type | Description |
---|---|
pd.DataFrame | A tidy pd.DataFrame containing the regression results, including point estimates, standard errors, t-statistics, and p-values. |
tstat
estimation.Feols.tstat()
Fitted model t-statistics.
Returns
Type | Description |
---|---|
pd.Series | A pd.Series with t-statistics of the estimated regression model. |
vcov
estimation.Feols.vcov(vcov, data=None)
Compute covariance matrices for an estimated regression model.
Parameters
Name | Type | Description | Default |
---|---|---|---|
vcov |
Union[str, dict[str, str]] | A string or dictionary specifying the type of variance-covariance matrix to use for inference. If a string, it can be one of “iid”, “hetero”, “HC1”, “HC2”, “HC3”. If a dictionary, it should have the format {“CRV1”: “clustervar”} for CRV1 inference or {“CRV3”: “clustervar”} for CRV3 inference. Note that CRV3 inference is currently not supported for IV estimation. | required |
data |
Optional[DataFrameType] | The data used for estimation. If None, tries to fetch the data from the model object. Defaults to None. | None |
Returns
Type | Description |
---|---|
Feols | An instance of class Feols with updated inference. |
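To show what the “iid” and heteroskedasticity-robust options mean in formulas, here is a minimal sandwich-estimator sketch in NumPy. It uses textbook expressions with an N / (N - k) adjustment on synthetic data; the small sample corrections actually applied by the library may differ, so the numbers are not expected to match its output exactly.

```python
import numpy as np

# Synthetic heteroskedastic data (illustrative only).
rng = np.random.default_rng(7)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N) * (1 + np.abs(X[:, 1]))
k = X.shape[1]

bread = np.linalg.inv(X.T @ X)     # (X'X)^{-1}
beta_hat = bread @ X.T @ y
u_hat = y - X @ beta_hat

# "iid": sigma^2 * (X'X)^{-1}, with sigma^2 = u'u / (N - k).
vcov_iid = bread * (u_hat @ u_hat) / (N - k)

# HC1-style sandwich: bread @ meat @ bread, scaled by N / (N - k).
scores = X * u_hat[:, None]        # row i is x_i * u_i
meat = scores.T @ scores
vcov_hc1 = N / (N - k) * bread @ meat @ bread

print(np.sqrt(np.diag(vcov_iid)), np.sqrt(np.diag(vcov_hc1)))
```

Cluster-robust variants follow the same bread-meat-bread pattern, with the scores summed within clusters before forming the meat matrix.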
wald_test
estimation.Feols.wald_test(R=None, q=None, distribution='F')
Conduct Wald test.
Compute a Wald test for a linear hypothesis of the form Rb = q. By default, tests the joint null hypothesis that all coefficients are zero.
Parameters
Name | Type | Description | Default |
---|---|---|---|
R |
array-like | The matrix R of the linear hypothesis. If None, defaults to an identity matrix. | None |
q |
array-like | The vector q of the linear hypothesis. If None, defaults to a vector of zeros. | None |
distribution |
str | The distribution to use for the p-value. Can be either “F” or “chi2”. Defaults to “F”. | 'F' |
Returns
Type | Description |
---|---|
pd.Series | A pd.Series with the Wald statistic and p-value. |
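The Wald statistic for Rb = q has the standard form W = (Rb - q)' (R V R')^{-1} (Rb - q), where V is the covariance matrix of b. The sketch below evaluates it on made-up numbers; dividing W by the number of restrictions to obtain an F-type statistic is the textbook convention and an assumption here, as is the exact degrees-of-freedom handling in the library.

```python
import math

import numpy as np

# Made-up estimates and covariance matrix (illustrative only).
beta_hat = np.array([0.5, 1.0, -2.0])
vcov = np.diag([0.04, 0.01, 0.25])

# Joint null: the second and third coefficients are both zero.
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
q = np.zeros(2)

diff = R @ beta_hat - q
wald_stat = diff @ np.linalg.solve(R @ vcov @ R.T, diff)

# F-type statistic: Wald statistic divided by the number of restrictions.
f_stat = wald_stat / R.shape[0]

# With exactly 2 restrictions, the chi2 survival function is exp(-x / 2).
p_chi2 = math.exp(-wald_stat / 2)

print(wald_stat, f_stat, p_chi2)
```

The closed-form p-value only holds for two restrictions; in general a chi-squared or F distribution function from a statistics library would be needed.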
wildboottest
estimation.Feols.wildboottest(reps, cluster=None, param=None, weights_type='rademacher', impose_null=True, bootstrap_type='11', seed=None, adj=True, cluster_adj=True, parallel=False, return_bootstrapped_t_stats=False)
Run a wild cluster bootstrap based on an object of type “Feols”.
Parameters
Name | Type | Description | Default |
---|---|---|---|
reps |
int | The number of bootstrap iterations to run. | required |
cluster |
Union[str, None] | The variable used for clustering. Defaults to None. If None, then uses the variable specified in the model’s _clustervar attribute. If no _clustervar attribute is found, runs a heteroskedasticity-robust bootstrap. | None |
param |
Union[str, None] | A string of length one, containing the test parameter of interest. Defaults to None. | None |
weights_type |
str | The type of bootstrap weights. Options are ‘rademacher’, ‘mammen’, ‘webb’, or ‘normal’. Defaults to ‘rademacher’. | 'rademacher' |
impose_null |
bool | Indicates whether to impose the null hypothesis on the bootstrap DGP. Defaults to True. | True |
bootstrap_type |
str | A string to choose the bootstrap type. Options are ‘11’, ‘31’, ‘13’, or ‘33’. Defaults to ‘11’. | '11' |
seed |
Union[int, None] | An option to provide a random seed. Defaults to None. | None |
adj |
bool | Indicates whether to apply a small sample adjustment for the number of observations and covariates. Defaults to True. | True |
cluster_adj |
bool | Indicates whether to apply a small sample adjustment for the number of clusters. Defaults to True. | True |
parallel |
bool | Indicates whether to run the bootstrap in parallel. Defaults to False. | False |
return_bootstrapped_t_stats |
bool, optional | If True, the method returns a tuple of the regular output and the bootstrapped t-stats. Defaults to False. | False |
Returns
Type | Description |
---|---|
pd.DataFrame | A DataFrame with the original, non-bootstrapped t-statistic and bootstrapped p-value, along with the bootstrap type, inference type (HC vs CRV), and whether the null hypothesis was imposed on the bootstrap DGP. If return_bootstrapped_t_stats is True, the method returns a tuple of the regular output and the bootstrapped t-stats. |
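The mechanics of a wild bootstrap with Rademacher weights and the null imposed can be sketched in plain NumPy. The example covers only the non-clustered, heteroskedasticity-robust case for a single coefficient; the helper `t_stat`, the HC1-style variance, and the two-sided p-value are assumptions of this sketch and not a description of the method's internals.

```python
import numpy as np


def t_stat(X, y, j):
    """HC1-style t-statistic for coefficient j (hypothetical helper)."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    u = y - X @ beta
    scores = X * u[:, None]
    vcov = n / (n - k) * bread @ (scores.T @ scores) @ bread
    return beta[j] / np.sqrt(vcov[j, j])


# Synthetic data with a true slope of 0.5 (illustrative only).
rng = np.random.default_rng(2024)
N, reps, j = 300, 499, 1
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N)

t_obs = t_stat(X, y, j)

# Impose the null beta_j = 0: fit the restricted model without column j.
X_r = np.delete(X, j, axis=1)
beta_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
fitted_r = X_r @ beta_r
u_r = y - fitted_r

t_boot = np.empty(reps)
for b in range(reps):
    v = rng.choice([-1.0, 1.0], size=N)     # Rademacher weights
    y_star = fitted_r + u_r * v             # bootstrap sample under the null
    t_boot[b] = t_stat(X, y_star, j)

# Two-sided bootstrap p-value.
p_boot = np.mean(np.abs(t_boot) >= np.abs(t_obs))
print(t_obs, p_boot)
```

In the clustered case, one weight is drawn per cluster rather than per observation, so that residuals within a cluster are flipped together.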