estimation.feols_.Feols
estimation.feols_.Feols(
    self,
    FixestFormula,
    data,
    ssc_dict,
    drop_singletons,
    drop_intercept,
    weights,
    weights_type,
    collin_tol,
    fixef_tol,
    lookup_demeaned_data,
    solver='np.linalg.solve',
    store_data=True,
    copy_data=True,
    lean=False,
    sample_split_var=None,
    sample_split_value=None,
)
Non-user-facing class to estimate a linear regression via OLS.
Users should not instantiate this class directly, but rather use the feols() function. Note that no demeaning is performed in this class: demeaning is performed in the FixestMulti class, which allows demeaned variables to be cached across multiple estimations.
Parameters
Name | Type | Description | Default |
---|---|---|---|
Y | np.ndarray | Dependent variable, a two-dimensional numpy array. | required |
X | np.ndarray | Independent variables, a two-dimensional numpy array. | required |
weights | np.ndarray | Weights, a one-dimensional numpy array. | required |
collin_tol | float | Tolerance level for collinearity checks. | required |
coefnames | list[str] | Names of the coefficients (of the design matrix X). | required |
weights_name | Optional[str] | Name of the weights variable. | required |
weights_type | Optional[str] | Type of the weights variable. Either “aweights” for analytic weights or “fweights” for frequency weights. | required |
solver | str, optional. | The solver to use for the regression. Can be either “np.linalg.solve” or “np.linalg.lstsq”. Defaults to “np.linalg.solve”. | 'np.linalg.solve' |
Attributes
Name | Type | Description |
---|---|---|
_method | str | Specifies the method used for regression, set to “feols”. |
_is_iv | bool | Indicates whether instrumental variables are used, initialized as False. |
_Y | np.ndarray | The demeaned dependent variable, a two-dimensional numpy array. |
_X | np.ndarray | The demeaned independent variables, a two-dimensional numpy array. |
_X_is_empty | bool | Indicates whether the X array is empty. |
_collin_tol | float | Tolerance level for collinearity checks. |
_coefnames | list | Names of the coefficients (of the design matrix X). |
_collin_vars | list | Variables identified as collinear. |
_collin_index | list | Indices of collinear variables. |
_Z | np.ndarray | Alias for the _X array, used for calculations. |
_solver | str | The solver used for the regression. |
_weights | np.ndarray | Array of weights for each observation. |
_N | int | Number of observations. |
_k | int | Number of independent variables (or features). |
_support_crv3_inference | bool | Indicates support for CRV3 inference. |
_data | Any | Data used in the regression, to be enriched outside of the class. |
_fml | Any | Formula used in the regression, to be enriched outside of the class. |
_has_fixef | bool | Indicates whether fixed effects are used. |
_fixef | Any | Fixed effects used in the regression. |
_icovars | Any | Internal covariates, to be enriched outside of the class. |
_ssc_dict | dict | Dictionary with the small sample correction settings. |
_tZX | np.ndarray | Transpose of Z multiplied by X, set in get_fit(). |
_tXZ | np.ndarray | Transpose of X multiplied by Z, set in get_fit(). |
_tZy | np.ndarray | Transpose of Z multiplied by Y, set in get_fit(). |
_tZZinv | np.ndarray | Inverse of the transpose of Z multiplied by Z, set in get_fit(). |
_beta_hat | np.ndarray | Estimated regression coefficients. |
_Y_hat_link | np.ndarray | Prediction at the level of the explanatory variable, i.e., the linear predictor X @ beta. |
_Y_hat_response | np.ndarray | Prediction at the level of the response variable, i.e., the expected predictor E(Y|X). |
_u_hat | np.ndarray | Residuals of the regression model. |
_scores | np.ndarray | Scores used in the regression analysis. |
_hessian | np.ndarray | Hessian matrix used in the regression. |
_bread | np.ndarray | Bread matrix, used in calculating the variance-covariance matrix. |
_vcov_type | Any | Type of variance-covariance matrix used. |
_vcov_type_detail | Any | Detailed specification of the variance-covariance matrix type. |
_is_clustered | bool | Indicates if clustering is used in the variance-covariance calculation. |
_clustervar | Any | Variable used for clustering in the variance-covariance calculation. |
_G | Any | Group information used in clustering. |
_ssc | Any | Small sample correction applied to the variance-covariance matrix. |
_vcov | np.ndarray | Variance-covariance matrix of the estimated coefficients. |
_se | np.ndarray | Standard errors of the estimated coefficients. |
_tstat | np.ndarray | T-statistics of the estimated coefficients. |
_pvalue | np.ndarray | P-values associated with the t-statistics. |
_conf_int | np.ndarray | Confidence intervals for the estimated coefficients. |
_F_stat | Any | F-statistic for the model, set in get_Ftest(). |
_fixef_dict | dict | dictionary containing fixed effects estimates. |
_sumFE | np.ndarray | Sum of all fixed effects for each observation. |
_rmse | float | Root mean squared error of the model. |
_r2 | float | R-squared value of the model. |
_r2_within | float | R-squared value computed on demeaned dependent variable. |
_adj_r2 | float | Adjusted R-squared value of the model. |
_adj_r2_within | float | Adjusted R-squared value computed on demeaned dependent variable. |
_solver | str | The solver used to fit the normal equation. |
_data | pd.DataFrame | The data frame used in the estimation. None if lean=True or store_data=False. |
Methods
Name | Description |
---|---|
add_fixest_multi_context | Enrich Feols object. |
ccv | Compute the Causal Cluster Variance following Abadie et al (QJE 2023). |
coef | Fitted model coefficients. |
confint | Fitted model confidence intervals. |
demean | Demean the dependent variable and covariates by the fixed effect(s). |
drop_multicol_vars | Detect and drop multicollinear variables. |
fixef | Compute the coefficients of (swept out) fixed effects for a regression model. |
get_fit | Fit an OLS model. |
get_inference | Compute standard errors, t-statistics, and p-values for the regression model. |
get_performance | Get Goodness-of-Fit measures. |
plot_ritest | Plot the distribution of the Randomization Inference Statistics. |
predict | Predict values of the model on new data. |
prepare_model_matrix | Prepare model matrices for estimation. |
pvalue | Fitted model p-values. |
resid | Fitted model residuals. |
ritest | Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0. |
se | Fitted model standard errors. |
solve_ols | Solve the ordinary least squares problem using the specified solver. |
tidy | Tidy model outputs. |
to_array | Convert estimation data frames to np arrays. |
tstat | Fitted model t-statistics. |
update | Update coefficients for new observations using Sherman-Morrison formula. |
vcov | Compute covariance matrices for an estimated regression model. |
wald_test | Conduct Wald test. |
wildboottest | Run a wild cluster bootstrap based on an object of type “Feols”. |
wls_transform | Transform model matrices for WLS Estimation. |
add_fixest_multi_context
estimation.feols_.Feols.add_fixest_multi_context(
    depvar,
    Y,
    _data,
    _ssc_dict,
    _k_fe,
    fval,
    store_data,
)
Enrich Feols object.
Enrich an instance of the Feols class with additional attributes set in the FixestMulti class.
Parameters
Name | Type | Description | Default |
---|---|---|---|
FixestFormula | FixestFormula | The formula(s) used for estimation encoded in a FixestFormula object. | required |
depvar | str | The dependent variable of the regression model. | required |
Y | pd.Series | The dependent variable of the regression model. | required |
_data | pd.DataFrame | The data used for estimation. | required |
_ssc_dict | dict | A dictionary with the small sample correction settings. | required |
_k_fe | int | The number of fixed effects. | required |
fval | str | The fixed effects formula. | required |
store_data | bool | Indicates whether to save the data used for estimation in the object | required |
Returns
Name | Type | Description |
---|---|---|
None |
ccv
estimation.feols_.Feols.ccv(
    treatment,
    cluster=None,
    seed=None,
    n_splits=8,
    pk=1,
    qk=1,
)
Compute the Causal Cluster Variance following Abadie et al (QJE 2023).
Parameters
Name | Type | Description | Default |
---|---|---|---|
treatment | | The name of the treatment variable. | required |
cluster | str | The name of the cluster variable. None by default. If None, uses the cluster variable from the model fit. | None |
seed | int | An integer to set the random seed. Defaults to None. | None |
n_splits | int | The number of splits to use in the cross-fitting procedure. Defaults to 8. | 8 |
pk | float | The proportion of sampled clusters. Defaults to 1, which corresponds to all clusters of the population being sampled. | 1 |
qk | float | The proportion of sampled observations within each cluster. Defaults to 1, which corresponds to all observations within each cluster being sampled. | 1 |
Returns
Name | Type | Description |
---|---|---|
pd.DataFrame | A DataFrame with inference based on the “Causal Cluster Variance” and “regular” CRV1 inference. |
Examples
import numpy as np
from pyfixest.estimation import feols
from pyfixest.utils import get_data

data = get_data()
# Randomly assigned binary treatment, named "D" to match the formula below
data["D"] = np.random.choice([0, 1], size=data.shape[0])

fit = feols("Y ~ D", data=data, vcov={"CRV1": "group_id"})
fit.ccv(treatment="D", pk=0.05, qk=0.5, n_splits=8, seed=123).head()
coef
estimation.feols_.Feols.coef()
Fitted model coefficients.
Returns
Name | Type | Description |
---|---|---|
pd.Series | A pd.Series with the estimated coefficients of the regression model. |
confint
estimation.feols_.Feols.confint(
    alpha=0.05,
    keep=None,
    drop=None,
    exact_match=False,
    joint=False,
    seed=None,
    reps=10000,
)
Fitted model confidence intervals.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alpha | float | The significance level for confidence intervals. Defaults to 0.05. | 0.05 |
joint | bool | Whether to compute simultaneous confidence intervals for the joint null of the parameters selected by keep and drop. Defaults to False. See https://www.causalml-book.org/assets/chapters/CausalML_chap_4.pdf, Remark 4.4.1 for details. | False |
keep | Optional[Union[list, str]] | The pattern for retaining coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Default is keeping all coefficients. Patterns are regular expressions: "age" keeps all coefficients containing age, r"^tr" keeps all coefficients starting with tr, and r"\d$" keeps all coefficients ending with a number. Output will be in the order of the patterns. | None |
drop | Optional[Union[list, str]] | The pattern for excluding coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Syntax is the same as for keep. Default is keeping all coefficients. Parameters keep and drop can be used simultaneously. | None |
exact_match | Optional[bool] | Whether to use exact match for keep and drop. Default is False. If True, the pattern will be matched exactly to the coefficient name instead of using regular expressions. | False |
reps | int | The number of bootstrap iterations to run for joint confidence intervals. Defaults to 10_000. Only used if joint is True. | 10000 |
seed | int | The seed for the random number generator. Defaults to None. Only used if joint is True. | None |
Returns
Name | Type | Description |
---|---|---|
pd.DataFrame | A pd.DataFrame with confidence intervals of the estimated regression model for the selected coefficients. |
Examples
from pyfixest.utils import get_data
from pyfixest.estimation import feols

data = get_data()
fit = feols("Y ~ C(f1)", data=data)
fit.confint(alpha=0.10).head()
fit.confint(alpha=0.10, joint=True, reps=9999).head()
demean
estimation.feols_.Feols.demean()
Demean the dependent variable and covariates by the fixed effect(s).
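To illustrate what demeaning by a fixed effect means, here is a minimal NumPy sketch for a single fixed effect. The helper `demean_by_group` is hypothetical and is not the routine used by the library, which also has to handle several fixed effects at once. The sketch also checks that the slope from demeaned data matches the slope from a regression with explicit group dummies (Frisch-Waugh-Lovell).

```python
import numpy as np


def demean_by_group(a: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Subtract the group mean from every column of `a` (one-way fixed effect)."""
    codes = np.unique(groups, return_inverse=True)[1]
    counts = np.bincount(codes)
    out = a.astype(float)  # astype returns a copy, so `a` is not modified
    for j in range(out.shape[1]):
        means = np.bincount(codes, weights=out[:, j]) / counts
        out[:, j] -= means[codes]
    return out


rng = np.random.default_rng(0)
groups = rng.integers(0, 5, size=200)
x = rng.normal(size=(200, 1))
y = 2.0 * x + groups[:, None] + rng.normal(size=(200, 1))

# Within estimator: OLS on the demeaned variables
x_dm, y_dm = demean_by_group(x, groups), demean_by_group(y, groups)
beta_within = np.linalg.solve(x_dm.T @ x_dm, x_dm.T @ y_dm)

# Same slope from OLS with one dummy column per group
dummies = (groups[:, None] == np.arange(5)).astype(float)
beta_dummies = np.linalg.lstsq(np.hstack([x, dummies]), y, rcond=None)[0]
```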
drop_multicol_vars
estimation.feols_.Feols.drop_multicol_vars()
Detect and drop multicollinear variables.
fixef
estimation.feols_.Feols.fixef(atol=1e-06, btol=1e-06)
Compute the coefficients of (swept out) fixed effects for a regression model.
This method creates the following attributes:
- alphaDF (pd.DataFrame): A DataFrame with the estimated fixed effects.
- sumFE (np.array): An array with the sum of fixed effects for each observation (i = 1, …, N).
Returns
Name | Type | Description |
---|---|---|
None |
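As a rough illustration of recovering swept-out fixed effects, the sketch below handles the one-way case, where each fixed effect is the group mean of `y - x * beta`. This is an assumption-laden simplification: the atol and btol arguments above suggest the library solves a sparse least-squares problem instead, and all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_groups = 300, 4
groups = rng.integers(0, n_groups, size=n)
true_fe = np.array([-1.0, 0.0, 0.5, 2.0])
x = rng.normal(size=n)
y = 1.5 * x + true_fe[groups] + 0.1 * rng.normal(size=n)

counts = np.bincount(groups, minlength=n_groups)


def group_mean(v: np.ndarray) -> np.ndarray:
    """Mean of `v` within each group."""
    return np.bincount(groups, weights=v, minlength=n_groups) / counts


# Slope estimated on the demeaned data
x_dm = x - group_mean(x)[groups]
y_dm = y - group_mean(y)[groups]
beta = (x_dm @ y_dm) / (x_dm @ x_dm)

# Fixed effects: group means of the part of y not explained by x
alpha = group_mean(y - x * beta)
sum_fe = alpha[groups]  # one value per observation, analogous to sumFE
```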
get_fit
estimation.feols_.Feols.get_fit()
Fit an OLS model.
Returns
Name | Type | Description |
---|---|---|
None |
get_inference
estimation.feols_.Feols.get_inference(alpha=0.05)
Compute standard errors, t-statistics, and p-values for the regression model.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alpha | float | The significance level for confidence intervals. Defaults to 0.05, which produces a 95% confidence interval. | 0.05 |
Returns
Name | Type | Description |
---|---|---|
None |
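The quantities computed here can be sketched with NumPy as follows. This is only an illustration under two stated assumptions: an iid variance-covariance matrix, and a normal approximation for p-values and critical values (the library may use a t-distribution and other small sample corrections).

```python
import math
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# iid variance-covariance matrix: sigma^2 * (X'X)^{-1}
sigma2 = resid @ resid / (n - X.shape[1])
vcov = sigma2 * np.linalg.inv(X.T @ X)

se = np.sqrt(np.diag(vcov))
tstat = beta / se
# Two-sided p-value under the normal approximation: 2 * (1 - Phi(|t|))
pvalue = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in tstat])

alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha / 2)
conf_int = np.column_stack([beta - crit * se, beta + crit * se])
```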
get_performance
estimation.feols_.Feols.get_performance()
Get Goodness-of-Fit measures.
Compute multiple additional measures commonly reported with linear regression output, including R-squared and adjusted R-squared. Note that variables with the suffix _within use demeaned dependent variables Y, while variables without do not or are invariant to demeaning.
Returns
Name | Type | Description |
---|---|---|
None | | |

Creates the following attributes:
- r2 (float): R-squared of the regression model.
- adj_r2 (float): Adjusted R-squared of the regression model.
- r2_within (float): R-squared of the regression model, computed on the demeaned dependent variable.
- adj_r2_within (float): Adjusted R-squared of the regression model, computed on the demeaned dependent variable.
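The textbook formulas behind these measures can be sketched as below. The function `r2_measures` is hypothetical, and the degrees-of-freedom convention (`k` counts every estimated coefficient, including the intercept) is an assumption that may differ from the library's. Passing the demeaned dependent variable and its fitted values would give the "within" variants.

```python
import numpy as np


def r2_measures(y: np.ndarray, y_hat: np.ndarray, k: int):
    """Return (R-squared, adjusted R-squared, RMSE) for fitted values y_hat."""
    n = y.shape[0]
    ssr = np.sum((y - y_hat) ** 2)     # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    rmse = np.sqrt(ssr / n)
    return r2, adj_r2, rmse


y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
r2, adj_r2, rmse = r2_measures(y, y_hat, k=2)
```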
plot_ritest
estimation.feols_.Feols.plot_ritest(plot_backend='lets_plot')
Plot the distribution of the Randomization Inference Statistics.
Parameters
Name | Type | Description | Default |
---|---|---|---|
plot_backend | str | The plotting backend to use. Defaults to “lets_plot”. Alternatively, “matplotlib” is available. | 'lets_plot' |
Returns
Name | Type | Description |
---|---|---|
A lets_plot or matplotlib figure with the distribution of the Randomization Inference Statistics. | | |
predict
estimation.feols_.Feols.predict(
    newdata=None,
    atol=1e-06,
    btol=1e-06,
    type='link',
)
Predict values of the model on new data.
Return a flat np.array with predicted values of the regression model. If new fixed effect levels are introduced in newdata, predicted values for such observations will be set to NaN.
Parameters
Name | Type | Description | Default |
---|---|---|---|
newdata | Optional[DataFrameType] | A pd.DataFrame or pl.DataFrame with the data to be used for prediction. If None (default), the data used for fitting the model is used. | None |
type | str | The type of prediction to be computed. Can be either "link" (default) or "response". For linear models, both lead to identical results. | 'link' |
atol | float | Stopping tolerance for scipy.sparse.linalg.lsqr(). See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lsqr.html | 1e-6 |
btol | float | Another stopping tolerance for scipy.sparse.linalg.lsqr(). See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lsqr.html | 1e-6 |
Returns
Name | Type | Description |
---|---|---|
y_hat | np.ndarray | A flat np.array with predicted values of the regression model. |
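The NaN behaviour for unseen fixed effect levels can be pictured with a small sketch. The coefficient vector, the dictionary of fixed effect estimates, and the helper `predict_with_fe` are all hypothetical stand-ins, not the library's internals.

```python
import numpy as np

beta = np.array([2.0])                 # hypothetical slope estimate
fe_estimates = {"a": 1.0, "b": -0.5}   # hypothetical fixed effect per level


def predict_with_fe(X_new: np.ndarray, levels_new: list) -> np.ndarray:
    """Linear predictor plus fixed effect; NaN where the level was never seen."""
    fe = np.array([fe_estimates.get(level, np.nan) for level in levels_new])
    return X_new @ beta + fe


X_new = np.array([[1.0], [2.0], [3.0]])
y_hat = predict_with_fe(X_new, ["a", "b", "c"])  # level "c" was not in the fit
```

The first two predictions are 3.0 and 3.5, while the third is NaN because level "c" has no estimated fixed effect.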
prepare_model_matrix
estimation.feols_.Feols.prepare_model_matrix()
Prepare model matrices for estimation.
pvalue
estimation.feols_.Feols.pvalue()
Fitted model p-values.
Returns
Name | Type | Description |
---|---|---|
pd.Series | A pd.Series with p-values of the estimated regression model. |
resid
estimation.feols_.Feols.resid()
Fitted model residuals.
Returns
Name | Type | Description |
---|---|---|
np.ndarray | A np.ndarray with the residuals of the estimated regression model. |
ritest
estimation.feols_.Feols.ritest(
    resampvar,
    cluster=None,
    reps=100,
    type='randomization-c',
    rng=None,
    choose_algorithm='auto',
    store_ritest_statistics=False,
    level=0.95,
)
Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.
Parameters
Name | Type | Description | Default |
---|---|---|---|
resampvar | str | The name of the variable to be resampled. | required |
cluster | str | The name of the cluster variable in case of cluster random assignment. If provided, resampvar is held constant within each cluster. Defaults to None. | None |
reps | int | The number of randomization iterations. Defaults to 100. | 100 |
type | str | The type of the randomization inference test. Can be “randomization-c” or “randomization-t”. Note that the “randomization-c” is much faster, while the “randomization-t” is recommended by Wu & Ding (JASA, 2021). | 'randomization-c' |
rng | np.random.Generator | A random number generator. Defaults to None. | None |
choose_algorithm | str | The algorithm to use for the computation. Defaults to "auto". The alternatives are "fast" and "slow", which should only be used for running CI tests. Ironically, this argument is not tested for any input errors from the user! So please don't use it =) | 'auto' |
include_plot | | Whether to include a plot of the distribution p-values. Defaults to False. | required |
store_ritest_statistics | bool | Whether to store the simulated statistics of the RI procedure. Defaults to False. If True, stores the simulated statistics in the model object via the ritest_statistics attribute as a numpy array. | False |
level | float | The level for the confidence interval of the randomization inference p-value. Defaults to 0.95. | 0.95 |
Returns
Name | Type | Description |
---|---|---|
A pd.Series with the regression coefficient of resampvar and the p-value of the RI test. Additionally, reports the standard error and the confidence interval of the p-value. | | |
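The idea of a coefficient-based randomization test can be sketched in plain NumPy. The function `ri_test` is hypothetical: it reshuffles the treatment across units (no clustering), uses the bivariate OLS slope as the test statistic, and reports a simple binomial standard error for the p-value. The library's implementation may differ in all of these choices.

```python
import numpy as np


def ri_test(y, d, reps=1000, rng=None):
    """Permutation p-value for the slope of y on d under the null of no effect."""
    rng = np.random.default_rng() if rng is None else rng

    def slope(dv):
        dc = dv - dv.mean()
        return dc @ (y - y.mean()) / (dc @ dc)

    observed = slope(d)
    # Re-estimate the slope under many random reassignments of the treatment
    draws = np.array([slope(rng.permutation(d)) for _ in range(reps)])
    p_value = np.mean(np.abs(draws) >= np.abs(observed))
    se_p = np.sqrt(p_value * (1 - p_value) / reps)
    return observed, p_value, se_p, draws


rng = np.random.default_rng(4)
d = rng.integers(0, 2, size=200).astype(float)
y = 1.0 * d + rng.normal(size=200)
coef, p_value, se_p, draws = ri_test(y, d, reps=500, rng=rng)
```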
se
estimation.feols_.Feols.se()
Fitted model standard errors.
Returns
Name | Type | Description |
---|---|---|
pd.Series | A pd.Series with the standard errors of the estimated regression model. |
solve_ols
estimation.feols_.Feols.solve_ols(tZX, tZY, solver)
Solve the ordinary least squares problem using the specified solver.
Parameters
Name | Type | Description | Default |
---|---|---|---|
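tZX | np.ndarray | Transpose of Z multiplied by X. | required |
tZY | np.ndarray | Transpose of Z multiplied by Y. | required |
solver | str | The solver to use, either "np.linalg.solve" or "np.linalg.lstsq". | required |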
Returns
Name | Type | Description |
---|---|---|
array-like: The solution to the ordinary least squares problem. |
Raises
Name | Type | Description |
---|---|---|
ValueError: If the specified solver is not supported. |
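A minimal sketch of such a solver dispatch, assuming the two solver names listed in the class parameters. The function `solve_ols_sketch` is a hypothetical stand-in written for illustration, not the library's code.

```python
import numpy as np


def solve_ols_sketch(tZX, tZY, solver="np.linalg.solve"):
    """Solve tZX @ beta = tZY with the requested NumPy routine."""
    if solver == "np.linalg.solve":
        return np.linalg.solve(tZX, tZY).flatten()
    if solver == "np.linalg.lstsq":
        return np.linalg.lstsq(tZX, tZY, rcond=None)[0].flatten()
    raise ValueError(f"Solver {solver!r} is not supported.")


rng = np.random.default_rng(5)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

tZX, tZY = X.T @ X, X.T @ y  # with OLS, Z is an alias for X
beta_solve = solve_ols_sketch(tZX, tZY, "np.linalg.solve")
beta_lstsq = solve_ols_sketch(tZX, tZY, "np.linalg.lstsq")
```

Both routines agree when tZX is well conditioned; lstsq is the more forgiving choice when it is close to singular.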
tidy
estimation.feols_.Feols.tidy(alpha=None)
Tidy model outputs.
Return a tidy pd.DataFrame with the point estimates, standard errors, t-statistics, and p-values.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alpha | Optional[float] | The significance level for the confidence intervals. If None, computes a 95% confidence interval (alpha = 0.05). | None |
Returns
Name | Type | Description |
---|---|---|
tidy_df | pd.DataFrame | A tidy pd.DataFrame containing the regression results, including point estimates, standard errors, t-statistics, and p-values. |
to_array
estimation.feols_.Feols.to_array()
Convert estimation data frames to np arrays.
tstat
estimation.feols_.Feols.tstat()
Fitted model t-statistics.
Returns
Name | Type | Description |
---|---|---|
pd.Series | A pd.Series with t-statistics of the estimated regression model. |
update
estimation.feols_.Feols.update(X_new, y_new, inplace=False)
Update coefficients for new observations using Sherman-Morrison formula.
Returns
Name | Type | Description |
---|---|---|
np.ndarray | Updated coefficients |
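The Sherman-Morrison formula lets the inverse of X'X be updated for one new observation without refitting. The sketch below is a generic single-row version with a hypothetical helper name; how the library handles several new rows or the inplace flag is not shown here.

```python
import numpy as np


def sherman_morrison_update(XtX_inv, beta, x_new, y_new):
    """Rank-one update of (X'X)^{-1} and beta for a single new observation."""
    x = np.asarray(x_new, dtype=float).reshape(-1)
    Ax = XtX_inv @ x
    denom = 1.0 + x @ Ax
    XtX_inv_new = XtX_inv - np.outer(Ax, Ax) / denom
    # Correct beta by the (scaled) prediction error on the new observation
    beta_new = beta + (Ax / denom) * (y_new - x @ beta)
    return XtX_inv_new, beta_new


rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

x_new, y_new = np.array([1.0, 0.3]), 1.7
XtX_inv_upd, beta_upd = sherman_morrison_update(XtX_inv, beta, x_new, y_new)

# Reference: refit from scratch on the stacked data
X_full = np.vstack([X, x_new])
y_full = np.append(y, y_new)
beta_refit = np.linalg.solve(X_full.T @ X_full, X_full.T @ y_full)
```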
vcov
estimation.feols_.Feols.vcov(vcov, data=None)
Compute covariance matrices for an estimated regression model.
Parameters
Name | Type | Description | Default |
---|---|---|---|
vcov | Union[str, dict[str, str]] | A string or dictionary specifying the type of variance-covariance matrix to use for inference. If a string, it can be one of “iid”, “hetero”, “HC1”, “HC2”, “HC3”. If a dictionary, it should have the format {“CRV1”: “clustervar”} for CRV1 inference or {“CRV3”: “clustervar”} for CRV3 inference. Note that CRV3 inference is currently not supported for IV estimation. | required |
data | Optional[DataFrameType] | The data used for estimation. If None, tries to fetch the data from the model object. Defaults to None. | None |
Returns
Name | Type | Description |
---|---|---|
Feols | An instance of class Feols with updated inference. |
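For orientation, the HC1 and CRV1 estimators named above both follow the sandwich form bread @ meat @ bread. The sketch below uses common textbook small-sample factors, which are an assumption: the factors actually applied by the library depend on its small sample correction settings. The function name is hypothetical.

```python
import numpy as np


def sandwich_vcov(X, u, clusters=None):
    """HC1 (clusters=None) or CRV1 sandwich variance for OLS residuals u."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    scores = X * u[:, None]
    if clusters is None:
        meat = scores.T @ scores
        adj = n / (n - k)
    else:
        codes = np.unique(clusters, return_inverse=True)[1]
        G = codes.max() + 1
        cluster_scores = np.zeros((G, k))
        np.add.at(cluster_scores, codes, scores)  # sum scores within cluster
        meat = cluster_scores.T @ cluster_scores
        adj = G / (G - 1) * (n - 1) / (n - k)
    return adj * bread @ meat @ bread


rng = np.random.default_rng(7)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=n)
u = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

vcov_hc1 = sandwich_vcov(X, u)
vcov_crv1 = sandwich_vcov(X, u, clusters=rng.integers(0, 10, size=n))
# With one observation per cluster, these factors make CRV1 equal HC1
vcov_singletons = sandwich_vcov(X, u, clusters=np.arange(n))
```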
wald_test
estimation.feols_.Feols.wald_test(R=None, q=None, distribution='F')
Conduct Wald test.
Compute a Wald test for a linear hypothesis of the form R * β = q, where R is an m x k matrix, β is a k x 1 vector of coefficients, and q is an m x 1 vector. By default, tests the joint null hypothesis that all coefficients are zero.
This method produces the following attributes:
- _dfd (int): degrees of freedom in the denominator.
- _dfn (int): degrees of freedom in the numerator.
- _wald_statistic (scalar): Wald statistic computed for hypothesis testing.
- _f_statistic (scalar): Wald statistic (when R is an identity matrix and q is a zero vector) computed for hypothesis testing.
- _p_value (scalar): corresponding p-value for the statistic.
Parameters
Name | Type | Description | Default |
---|---|---|---|
R | array-like | The matrix R of the linear hypothesis. If None, defaults to an identity matrix. | None |
q | array-like | The vector q of the linear hypothesis. If None, defaults to a vector of zeros. | None |
distribution | str | The distribution to use for the p-value. Can be either “F” or “chi2”. Defaults to “F”. | 'F' |
Returns
Name | Type | Description |
---|---|---|
pd.Series | A pd.Series with the Wald statistic and p-value. |
Examples
import numpy as np
import pandas as pd

from pyfixest.estimation.estimation import feols
from pyfixest.utils import ssc  # assumed import path; ssc() is used below but was not imported

data = pd.read_csv("pyfixest/did/data/df_het.csv")
data = data.iloc[1:3000]

R = np.array([[1, -1]])
q = np.array([0.0])

fml = "dep_var ~ treat"
fit = feols(fml, data, vcov={"CRV1": "year"}, ssc=ssc(adj=False))

# Wald test
fit.wald_test(R=R, q=q, distribution="chi2")
f_stat = fit._f_statistic
p_stat = fit._p_value

print(f"Python f_stat: {f_stat}")
print(f"Python p_stat: {p_stat}")
The code above produces the following results :
Python f_stat: 256.55432910297003
Python p_stat: 9.67406627744023e-58
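The statistic itself is the quadratic form (Rβ - q)' (R V R')⁻¹ (Rβ - q), where V is the variance-covariance matrix. The sketch below evaluates it for a single restriction (m = 1) with made-up numbers; the function name is hypothetical and the chi-squared p-value shortcut only holds for one degree of freedom.

```python
import math

import numpy as np


def wald_chi2_single(beta, vcov, R, q):
    """Wald statistic and chi2(1) p-value for one linear restriction R beta = q."""
    diff = R @ beta - q
    W = float(diff @ np.linalg.solve(R @ vcov @ R.T, diff))
    # For one degree of freedom: P(chi2_1 > W) = erfc(sqrt(W / 2))
    p_value = math.erfc(math.sqrt(W / 2.0))
    return W, p_value


beta = np.array([1.0, 0.4])      # illustrative coefficients
vcov = np.diag([0.04, 0.05])     # illustrative variance-covariance matrix
R = np.array([[1.0, -1.0]])      # tests beta[0] - beta[1] = 0
q = np.array([0.0])

W, p_value = wald_chi2_single(beta, vcov, R, q)
```

Here the difference is 0.6 with variance 0.09, so W = 4.0 and the p-value is roughly 0.0455. Dividing W by m gives the F-type statistic for the "F" distribution option.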
wildboottest
estimation.feols_.Feols.wildboottest(
    reps,
    cluster=None,
    param=None,
    weights_type='rademacher',
    impose_null=True,
    bootstrap_type='11',
    seed=None,
    adj=True,
    cluster_adj=True,
    parallel=False,
    return_bootstrapped_t_stats=False,
)
Run a wild cluster bootstrap based on an object of type “Feols”.
Parameters
Name | Type | Description | Default |
---|---|---|---|
reps | int | The number of bootstrap iterations to run. | required |
cluster | Union[str, None] | The variable used for clustering. Defaults to None. If None, then uses the variable specified in the model's _clustervar attribute. If no _clustervar attribute is found, runs a heteroskedasticity-robust bootstrap. | None |
param | Union[str, None] | A string of length one, containing the test parameter of interest. Defaults to None. | None |
weights_type | str | The type of bootstrap weights. Options are ‘rademacher’, ‘mammen’, ‘webb’, or ‘normal’. Defaults to ‘rademacher’. | 'rademacher' |
impose_null | bool | Indicates whether to impose the null hypothesis on the bootstrap DGP. Defaults to True. | True |
bootstrap_type | str | A string of length one to choose the bootstrap type. Options are ‘11’, ‘31’, ‘13’, or ‘33’. Defaults to ‘11’. | '11' |
seed | Union[int, None] | An option to provide a random seed. Defaults to None. | None |
adj | bool | Indicates whether to apply a small sample adjustment for the number of observations and covariates. Defaults to True. | True |
cluster_adj | bool | Indicates whether to apply a small sample adjustment for the number of clusters. Defaults to True. | True |
parallel | bool | Indicates whether to run the bootstrap in parallel. Defaults to False. | False |
return_bootstrapped_t_stats | bool, optional | If True, the method returns a tuple of the regular output and the bootstrapped t-stats. Defaults to False. | False |
Returns
Name | Type | Description |
---|---|---|
pd.DataFrame | A DataFrame with the original, non-bootstrapped t-statistic and bootstrapped p-value, along with the bootstrap type, inference type (HC vs CRV), and whether the null hypothesis was imposed on the bootstrap DGP. If return_bootstrapped_t_stats is True, the method returns a tuple of the regular output and the bootstrapped t-stats. |
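To give a feel for the procedure, here is a compact sketch of a heteroskedasticity-robust wild bootstrap with Rademacher weights and the null imposed. It is a simplified illustration with hypothetical function names: it has no clustering, no small sample adjustments, and does not reproduce the bootstrap_type variants listed above.

```python
import numpy as np


def tstat_hc(X, y, j):
    """t-statistic for coefficient j using an HC0 sandwich variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    scores = X * (y - X @ beta)[:, None]
    V = XtX_inv @ (scores.T @ scores) @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])


def wild_bootstrap_p(X, y, j, reps=999, seed=None):
    """Bootstrap p-value for H0: beta_j = 0 with the null imposed."""
    rng = np.random.default_rng(seed)
    t_obs = tstat_hc(X, y, j)
    # Restricted fit: drop column j so that the null holds by construction
    X_r = np.delete(X, j, axis=1)
    fitted_r = X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]
    u_r = y - fitted_r
    t_boot = np.empty(reps)
    for b in range(reps):
        v = rng.choice([-1.0, 1.0], size=len(y))  # Rademacher weights
        t_boot[b] = tstat_hc(X, fitted_r + u_r * v, j)
    p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    return t_obs, p_value, t_boot


rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
t_obs, p_value, t_boot = wild_bootstrap_p(X, y, j=1, reps=199, seed=123)
```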
wls_transform
estimation.feols_.Feols.wls_transform()
Transform model matrices for WLS Estimation.
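A WLS transform is commonly implemented by scaling each row of the model matrices by the square root of its weight, after which plain OLS on the transformed data reproduces the weighted estimator. The sketch below checks that equivalence with NumPy; it illustrates the general technique and is not taken from the library.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)

# Transform: multiply every row by sqrt(weight)
sqrt_w = np.sqrt(w)
X_w = X * sqrt_w[:, None]
y_w = y * sqrt_w

# OLS on the transformed data ...
beta_transformed = np.linalg.solve(X_w.T @ X_w, X_w.T @ y_w)
# ... equals the direct WLS solution (X'WX)^{-1} X'Wy
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```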