estimation.feols_.Feols

estimation.feols_.Feols(
    self
    FixestFormula
    data
    ssc_dict
    drop_singletons
    drop_intercept
    weights
    weights_type
    collin_tol
    fixef_tol
    lookup_demeaned_data
    solver='np.linalg.solve'
    store_data=True
    copy_data=True
    lean=False
    sample_split_var=None
    sample_split_value=None
)

Non user-facing class to estimate a linear regression via OLS.

Users should not directly instantiate this class, but rather use the feols() function. Note that no demeaning is performed in this class: demeaning is performed in the FixestMulti class (to allow for caching of demeaned variables for multiple estimation).

Parameters

Name Type Description Default
Y np.ndarray Dependent variable, a two-dimensional numpy array. required
X np.ndarray Independent variables, a two-dimensional numpy array. required
weights np.ndarray Weights, a one-dimensional numpy array. required
collin_tol float Tolerance level for collinearity checks. required
coefnames list[str] Names of the coefficients (of the design matrix X). required
weights_name Optional[str] Name of the weights variable. required
weights_type Optional[str] Type of the weights variable. Either “aweights” for analytic weights or “fweights” for frequency weights. required
solver str, optional. The solver to use for the regression. Can be either “np.linalg.solve” or “np.linalg.lstsq”. Defaults to “np.linalg.solve”. 'np.linalg.solve'

Attributes

Name Type Description
_method str Specifies the method used for regression, set to “feols”.
_is_iv bool Indicates whether instrumental variables are used, initialized as False.
_Y np.ndarray The demeaned dependent variable, a two-dimensional numpy array.
_X np.ndarray The demeaned independent variables, a two-dimensional numpy array.
_X_is_empty bool Indicates whether the X array is empty.
_collin_tol float Tolerance level for collinearity checks.
_coefnames list Names of the coefficients (of the design matrix X).
_collin_vars list Variables identified as collinear.
_collin_index list Indices of collinear variables.
_Z np.ndarray Alias for the _X array, used for calculations.
_solver str The solver used for the regression.
_weights np.ndarray Array of weights for each observation.
_N int Number of observations.
_k int Number of independent variables (or features).
_support_crv3_inference bool Indicates support for CRV3 inference.
_data Any Data used in the regression, to be enriched outside of the class.
_fml Any Formula used in the regression, to be enriched outside of the class.
_has_fixef bool Indicates whether fixed effects are used.
_fixef Any Fixed effects used in the regression.
_icovars Any Internal covariates, to be enriched outside of the class.
_ssc_dict dict Dictionary with the small sample correction settings.
_tZX np.ndarray Transpose of Z multiplied by X, set in get_fit().
_tXZ np.ndarray Transpose of X multiplied by Z, set in get_fit().
_tZy np.ndarray Transpose of Z multiplied by Y, set in get_fit().
_tZZinv np.ndarray Inverse of the transpose of Z multiplied by Z, set in get_fit().
_beta_hat np.ndarray Estimated regression coefficients.
_Y_hat_link np.ndarray Prediction at the level of the explanatory variable, i.e., the linear predictor X @ beta.
_Y_hat_response np.ndarray Prediction at the level of the response variable, i.e., the expected predictor E(Y|X).
_u_hat np.ndarray Residuals of the regression model.
_scores np.ndarray Scores used in the regression analysis.
_hessian np.ndarray Hessian matrix used in the regression.
_bread np.ndarray Bread matrix, used in calculating the variance-covariance matrix.
_vcov_type Any Type of variance-covariance matrix used.
_vcov_type_detail Any Detailed specification of the variance-covariance matrix type.
_is_clustered bool Indicates if clustering is used in the variance-covariance calculation.
_clustervar Any Variable used for clustering in the variance-covariance calculation.
_G Any Group information used in clustering.
_ssc Any Small sample correction used in the variance-covariance calculation.
_vcov np.ndarray Variance-covariance matrix of the estimated coefficients.
_se np.ndarray Standard errors of the estimated coefficients.
_tstat np.ndarray T-statistics of the estimated coefficients.
_pvalue np.ndarray P-values associated with the t-statistics.
_conf_int np.ndarray Confidence intervals for the estimated coefficients.
_F_stat Any F-statistic for the model, set in get_Ftest().
_fixef_dict dict dictionary containing fixed effects estimates.
_sumFE np.ndarray Sum of all fixed effects for each observation.
_rmse float Root mean squared error of the model.
_r2 float R-squared value of the model.
_r2_within float R-squared value computed on demeaned dependent variable.
_adj_r2 float Adjusted R-squared value of the model.
_adj_r2_within float Adjusted R-squared value computed on demeaned dependent variable.
_solver str The solver used to fit the normal equation.
_data pd.DataFrame The data frame used in the estimation. None if arguments lean = True or store_data = False.

Methods

Name Description
add_fixest_multi_context Enrich Feols object.
ccv Compute the Causal Cluster Variance following Abadie et al (QJE 2023).
coef Fitted model coefficients.
confint Fitted model confidence intervals.
demean Demean the dependent variable and covariates by the fixed effect(s).
drop_multicol_vars Detect and drop multicollinear variables.
fixef Compute the coefficients of (swept out) fixed effects for a regression model.
get_fit Fit an OLS model.
get_inference Compute standard errors, t-statistics, and p-values for the regression model.
get_performance Get Goodness-of-Fit measures.
plot_ritest Plot the distribution of the Randomization Inference Statistics.
predict Predict values of the model on new data.
prepare_model_matrix Prepare model matrices for estimation.
pvalue Fitted model p-values.
resid Fitted model residuals.
ritest Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.
se Fitted model standard errors.
solve_ols Solve the ordinary least squares problem using the specified solver.
tidy Tidy model outputs.
to_array Convert estimation data frames to np arrays.
tstat Fitted model t-statistics.
update Update coefficients for new observations using the Sherman-Morrison formula.
vcov Compute covariance matrices for an estimated regression model.
wald_test Conduct Wald test.
wildboottest Run a wild cluster bootstrap based on an object of type “Feols”.
wls_transform Transform model matrices for WLS Estimation.

add_fixest_multi_context

estimation.feols_.Feols.add_fixest_multi_context(
    depvar
    Y
    _data
    _ssc_dict
    _k_fe
    fval
    store_data
)

Enrich Feols object.

Enrich an instance of Feols Class with additional attributes set in the FixestMulti class.

Parameters

Name Type Description Default
FixestFormula FixestFormula The formula(s) used for estimation encoded in a FixestFormula object. required
depvar str The dependent variable of the regression model. required
Y pd.Series The dependent variable of the regression model. required
_data pd.DataFrame The data used for estimation. required
_ssc_dict dict A dictionary with the small sample correction settings. required
_k_fe int The number of fixed effects. required
fval str The fixed effects formula. required
store_data bool Indicates whether to save the data used for estimation in the object required

Returns

Name Type Description
None

ccv

estimation.feols_.Feols.ccv(
    treatment
    cluster=None
    seed=None
    n_splits=8
    pk=1
    qk=1
)

Compute the Causal Cluster Variance following Abadie et al (QJE 2023).

Parameters

Name Type Description Default
treatment The name of the treatment variable. required
cluster str The name of the cluster variable. None by default. If None, uses the cluster variable from the model fit. None
seed int An integer to set the random seed. Defaults to None. None
n_splits int The number of splits to use in the cross-fitting procedure. Defaults to 8. 8
pk float The proportion of sampled clusters. Defaults to 1, which corresponds to all clusters of the population being sampled. 1
qk float The proportion of sampled observations within each cluster. Defaults to 1, which corresponds to all observations within each cluster being sampled. 1

Returns

Name Type Description
pd.DataFrame A DataFrame with inference based on the “Causal Cluster Variance” and “regular” CRV1 inference.

Examples

import numpy as np

from pyfixest.estimation import feols
from pyfixest.utils import get_data

data = get_data()
# randomly assigned binary treatment used in the regression below
data["D"] = np.random.choice([0, 1], size=data.shape[0])

fit = feols("Y ~ D", data=data, vcov={"CRV1": "group_id"})
fit.ccv(treatment="D", pk=0.05, qk=0.5, n_splits=8, seed=123).head()

coef

estimation.feols_.Feols.coef()

Fitted model coefficients.

Returns

Name Type Description
pd.Series A pd.Series with the estimated coefficients of the regression model.

confint

estimation.feols_.Feols.confint(
    alpha=0.05
    keep=None
    drop=None
    exact_match=False
    joint=False
    seed=None
    reps=10000
)

Fitted model confidence intervals.

Parameters

Name Type Description Default
alpha float The significance level for confidence intervals. Defaults to 0.05. 0.05
joint bool Whether to compute simultaneous confidence interval for joint null of parameters selected by keep and drop. Defaults to False. See https://www.causalml-book.org/assets/chapters/CausalML_chap_4.pdf, Remark 4.4.1 for details. False
keep Optional[Union[list, str]] The pattern for retaining coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Default is keeping all coefficients. You should use regular expressions to select coefficients: "age" keeps all coefficients containing "age", r"^tr" keeps all coefficients starting with "tr", and r"\d$" keeps all coefficients ending with a digit. Output will be in the order of the patterns. None
drop Optional[Union[list, str]] The pattern for excluding coefficient names. You can pass a string (one pattern) or a list (multiple patterns). Syntax is the same as for keep. Default is keeping all coefficients. Parameter keep and drop can be used simultaneously. None
exact_match Optional[bool] Whether to use exact match for keep and drop. Default is False. If True, the pattern will be matched exactly to the coefficient name instead of using regular expressions. False
reps int The number of bootstrap iterations to run for joint confidence intervals. Defaults to 10_000. Only used if joint is True. 10000
seed int The seed for the random number generator. Defaults to None. Only used if joint is True. None

Returns

Name Type Description
pd.DataFrame A pd.DataFrame with confidence intervals of the estimated regression model for the selected coefficients.

Examples

from pyfixest.utils import get_data
from pyfixest.estimation import feols

data = get_data()
fit = feols("Y ~ C(f1)", data=data)
fit.confint(alpha=0.10).head()
fit.confint(alpha=0.10, joint=True, reps=9999).head()
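
The keep, drop, and exact_match arguments described above select coefficients by name. As a rough illustration of how such pattern matching behaves, the following sketch filters a list of names with Python's re module. The helper select_coefs and the coefficient names are hypothetical and are not pyfixest's internal implementation.

```python
import re

def select_coefs(names, keep=None, drop=None, exact_match=False):
    """Hypothetical helper: filter coefficient names by keep/drop patterns."""
    keep = [keep] if isinstance(keep, str) else (keep or [])
    drop = [drop] if isinstance(drop, str) else (drop or [])

    def matches(pattern, name):
        # exact_match compares whole strings; otherwise run a regex search
        if exact_match:
            return pattern == name
        return re.search(pattern, name) is not None

    if keep:
        # the output follows the order of the patterns, without duplicates
        selected = []
        for pattern in keep:
            for name in names:
                if matches(pattern, name) and name not in selected:
                    selected.append(name)
    else:
        selected = list(names)

    return [n for n in selected if not any(matches(p, n) for p in drop)]

# made-up coefficient names for illustration
names = ["Intercept", "age", "age_sq", "treat", "trend", "x1"]
by_substring = select_coefs(names, keep="age")
by_prefix = select_coefs(names, keep=r"^tr")
by_digit = select_coefs(names, keep=r"\d$")
combined = select_coefs(names, keep=[r"^tr", "age"], drop="sq")
```

In this sketch, keep is applied first and drop afterwards, which is one plausible reading of using both parameters simultaneously.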

demean

estimation.feols_.Feols.demean()

Demean the dependent variable and covariates by the fixed effect(s).
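
To make the idea of demeaning concrete, here is a minimal NumPy sketch for the simplest case of a single fixed effect, where demeaning amounts to subtracting group means. The function demean_by_group is a hypothetical name; with several fixed effects an iterative scheme is generally required, and this sketch does not reproduce the library's own routine.

```python
import numpy as np

def demean_by_group(values, groups):
    """Subtract the group mean from each row (one fixed effect only)."""
    values = np.asarray(values, dtype=float)
    codes, inverse = np.unique(groups, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.zeros((codes.size, values.shape[1]))
    np.add.at(sums, inverse, values)  # accumulate column sums per group
    counts = np.bincount(inverse, minlength=codes.size)[:, None]
    return values - (sums / counts)[inverse]

groups = np.array(["a", "a", "b", "b", "b"])
X = np.array([[1.0, 10.0], [3.0, 20.0], [2.0, 0.0], [4.0, 3.0], [6.0, 6.0]])
X_demeaned = demean_by_group(X, groups)
```

After the transformation, every column has mean zero within each group.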

drop_multicol_vars

estimation.feols_.Feols.drop_multicol_vars()

Detect and drop multicollinear variables.

fixef

estimation.feols_.Feols.fixef(atol=1e-06, btol=1e-06)

Compute the coefficients of (swept out) fixed effects for a regression model.

This method creates the following attributes:
- alphaDF (pd.DataFrame): A DataFrame with the estimated fixed effects.
- sumFE (np.array): An array with the sum of fixed effects for each observation (i = 1, …, N).

Returns

Name Type Description
None

get_fit

estimation.feols_.Feols.get_fit()

Fit an OLS model.

Returns

Name Type Description
None

get_inference

estimation.feols_.Feols.get_inference(alpha=0.05)

Compute standard errors, t-statistics, and p-values for the regression model.

Parameters

Name Type Description Default
alpha float The significance level for confidence intervals. Defaults to 0.05, which produces a 95% confidence interval. 0.05

Returns

Name Type Description
None
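
The quantities named above follow from the coefficient vector and its variance-covariance matrix. The sketch below shows the textbook relationships under one loud assumption: critical values and p-values come from the standard normal distribution (via the standard library), whereas a regression package would typically use a t-distribution with appropriate degrees of freedom. The function basic_inference is hypothetical.

```python
from statistics import NormalDist

import numpy as np

def basic_inference(beta_hat, vcov, alpha=0.05):
    """Return se, t-stats, p-values and CIs under a normal approximation."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.sqrt(np.diag(vcov))
    tstat = beta_hat / se
    normal = NormalDist()
    # two-sided p-value: 2 * (1 - Phi(|t|)); normal approximation (assumption)
    pvalue = np.array([2.0 * (1.0 - normal.cdf(abs(float(t)))) for t in tstat])
    crit = normal.inv_cdf(1.0 - alpha / 2.0)
    conf_int = np.column_stack([beta_hat - crit * se, beta_hat + crit * se])
    return se, tstat, pvalue, conf_int

# made-up coefficients and covariance matrix
beta_hat = np.array([1.0, -0.5])
vcov = np.array([[0.25, 0.0], [0.0, 0.04]])
se, tstat, pvalue, conf_int = basic_inference(beta_hat, vcov, alpha=0.05)
```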

get_performance

estimation.feols_.Feols.get_performance()

Get Goodness-of-Fit measures.

Compute multiple additional measures commonly reported with linear regression output, including R-squared and adjusted R-squared. Note that variables with the suffix _within use demeaned dependent variables Y, while variables without do not or are invariant to demeaning.

Returns

Name Type Description
None
Creates the following attributes:
- r2 (float): R-squared of the regression model.
- adj_r2 (float): Adjusted R-squared of the regression model.
- r2_within (float): R-squared of the regression model, computed on the demeaned dependent variable.
- adj_r2_within (float): Adjusted R-squared of the regression model, computed on the demeaned dependent variable.
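
For orientation, the measures listed above can be sketched with the textbook formulas. The helper r2_measures is hypothetical. The degrees-of-freedom convention (k counts all regressors including the intercept) and the RMSE denominator (n) are assumptions that may differ from the library's definitions. The _within variants would apply the same formulas to the demeaned dependent variable.

```python
import numpy as np

def r2_measures(y, y_hat, k):
    """Textbook R-squared, adjusted R-squared and RMSE."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(y_hat, dtype=float)
    ssr = np.sum(resid ** 2)            # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    n = y.size
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    rmse = np.sqrt(ssr / n)
    return r2, adj_r2, rmse

# made-up outcomes and fitted values
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.5, 1.5, 3.0, 4.5, 4.5])
r2, adj_r2, rmse = r2_measures(y, y_hat, k=2)
```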

plot_ritest

estimation.feols_.Feols.plot_ritest(plot_backend='lets_plot')

Plot the distribution of the Randomization Inference Statistics.

Parameters

Name Type Description Default
plot_backend str The plotting backend to use. Defaults to “lets_plot”. Alternatively, “matplotlib” is available. 'lets_plot'

Returns

Name Type Description
A lets_plot or matplotlib figure with the distribution of the Randomization Inference Statistics.

predict

estimation.feols_.Feols.predict(
    newdata=None
    atol=1e-06
    btol=1e-06
    type='link'
)

Predict values of the model on new data.

Return a flat np.array with predicted values of the regression model. If new fixed effect levels are introduced in newdata, predicted values for such observations will be set to NaN.

Parameters

Name Type Description Default
newdata Optional[DataFrameType] A pd.DataFrame or pl.DataFrame with the data to be used for prediction. If None (default), the data used for fitting the model is used. None
type str The type of prediction to be computed. Can be either “link” (default) or “response”. For linear models, both lead to identical results. 'link'
atol float Stopping tolerance for scipy.sparse.linalg.lsqr(). See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lsqr.html 1e-6
btol float Another stopping tolerance for scipy.sparse.linalg.lsqr(). See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lsqr.html 1e-6

Returns

Name Type Description
y_hat np.ndarray A flat np.array with predicted values of the regression model.

prepare_model_matrix

estimation.feols_.Feols.prepare_model_matrix()

Prepare model matrices for estimation.

pvalue

estimation.feols_.Feols.pvalue()

Fitted model p-values.

Returns

Name Type Description
pd.Series A pd.Series with p-values of the estimated regression model.

resid

estimation.feols_.Feols.resid()

Fitted model residuals.

Returns

Name Type Description
np.ndarray A np.ndarray with the residuals of the estimated regression model.

ritest

estimation.feols_.Feols.ritest(
    resampvar
    cluster=None
    reps=100
    type='randomization-c'
    rng=None
    choose_algorithm='auto'
    store_ritest_statistics=False
    level=0.95
)

Conduct Randomization Inference (RI) test against a null hypothesis of resampvar = 0.

Parameters

Name Type Description Default
resampvar str The name of the variable to be resampled. required
cluster str The name of the cluster variable in case of cluster random assignment. If provided, resampvar is held constant within each cluster. Defaults to None. None
reps int The number of randomization iterations. Defaults to 100. 100
type str The type of the randomization inference test. Can be “randomization-c” or “randomization-t”. Note that the “randomization-c” is much faster, while the “randomization-t” is recommended by Wu & Ding (JASA, 2021). 'randomization-c'
rng np.random.Generator A random number generator. Defaults to None. None
choose_algorithm str The algorithm to use for the computation. Defaults to “auto”. The alternatives are “fast” and “slow”, which should only be used for running CI tests. Ironically, this argument is not tested for any input errors from the user! So please don’t use it =) 'auto'
include_plot Whether to include a plot of the distribution p-values. Defaults to False. required
store_ritest_statistics bool Whether to store the simulated statistics of the RI procedure. Defaults to False. If True, stores the simulated statistics in the model object via the ritest_statistics attribute as a numpy array. False
level float The level for the confidence interval of the randomization inference p-value. Defaults to 0.95. 0.95

Returns

Name Type Description
pd.Series A pd.Series with the regression coefficient of resampvar and the p-value of the RI test. Additionally, reports the standard error and the confidence interval of the p-value.
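
The logic of a coefficient-based randomization test can be sketched in a few lines of NumPy: permute the resampled variable, re-estimate the coefficient, and compare the permuted coefficients with the observed one. The functions ols_slope and randomization_c_pvalue are hypothetical, the data are simulated, and the two-sided p-value definition used here is an assumption rather than the library's exact rule.

```python
import numpy as np

def ols_slope(y, d):
    """Slope from regressing y on an intercept and d."""
    d_c = d - d.mean()
    return float(d_c @ (y - y.mean()) / (d_c @ d_c))

def randomization_c_pvalue(y, d, reps=1000, rng=None):
    """Permute d, re-estimate the slope, compare with the observed slope."""
    rng = np.random.default_rng() if rng is None else rng
    observed = ols_slope(y, d)
    simulated = np.array([ols_slope(y, rng.permutation(d)) for _ in range(reps)])
    # share of permuted |coefficients| at least as large as the observed one
    pvalue = float(np.mean(np.abs(simulated) >= abs(observed)))
    return observed, pvalue, simulated

# simulated data with a true treatment effect of 2
rng = np.random.default_rng(123)
d = rng.integers(0, 2, size=200).astype(float)
y = 1.0 + 2.0 * d + rng.normal(size=200)
observed, pvalue, simulated = randomization_c_pvalue(y, d, reps=500, rng=rng)
```

A t-statistic-based variant would replace the slope with its studentized version inside the loop, at the cost of computing a standard error in every iteration.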

se

estimation.feols_.Feols.se()

Fitted model standard errors.

Returns

Name Type Description
pd.Series A pd.Series with the standard errors of the estimated regression model.

solve_ols

estimation.feols_.Feols.solve_ols(tZX, tZY, solver)

Solve the ordinary least squares problem using the specified solver.

Parameters

Name Type Description Default
tZX np.ndarray required
tZY np.ndarray required
solver str required

Returns

Name Type Description
array-like: The solution to the ordinary least squares problem.

Raises

Name Type Description
ValueError: If the specified solver is not supported.
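
As an illustration of the two solver options named in this reference, the sketch below solves the normal equations tZX @ beta = tZY with either np.linalg.solve or np.linalg.lstsq. The function solve_normal_equations is a hypothetical stand-in written for this example, not the method's actual source.

```python
import numpy as np

def solve_normal_equations(tZX, tZY, solver="np.linalg.solve"):
    """Hypothetical stand-in: solve tZX @ beta = tZY with the chosen solver."""
    if solver == "np.linalg.solve":
        return np.linalg.solve(tZX, tZY).flatten()
    if solver == "np.linalg.lstsq":
        # least-squares solution; tolerates rank-deficient systems
        return np.linalg.lstsq(tZX, tZY, rcond=None)[0].flatten()
    raise ValueError(f"Solver {solver} is not supported.")

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])  # y = 1 + 2 * x exactly
tXX, tXy = X.T @ X, X.T @ y
beta_solve = solve_normal_equations(tXX, tXy, "np.linalg.solve")
beta_lstsq = solve_normal_equations(tXX, tXy, "np.linalg.lstsq")
```

For a well-conditioned, full-rank system both solvers return the same coefficients.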

tidy

estimation.feols_.Feols.tidy(alpha=None)

Tidy model outputs.

Return a tidy pd.DataFrame with the point estimates, standard errors, t-statistics, and p-values.

Parameters

Name Type Description Default
alpha Optional[float] The significance level for the confidence intervals. If None, computes a 95% confidence interval (alpha = 0.05). None

Returns

Name Type Description
tidy_df pd.DataFrame A tidy pd.DataFrame containing the regression results, including point estimates, standard errors, t-statistics, and p-values.

to_array

estimation.feols_.Feols.to_array()

Convert estimation data frames to np arrays.

tstat

estimation.feols_.Feols.tstat()

Fitted model t-statistics.

Returns

Name Type Description
pd.Series A pd.Series with t-statistics of the estimated regression model.

update

estimation.feols_.Feols.update(X_new, y_new, inplace=False)

Update coefficients for new observations using the Sherman-Morrison formula.

Returns

Name Type Description
np.ndarray Updated coefficients
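
The Sherman-Morrison formula updates the inverse of X'X after a rank-one change, so coefficients can be refreshed without refitting. The sketch below handles a single new observation and checks the result against a full refit. The function sherman_morrison_update and the simulated data are hypothetical, and how the method itself processes several new rows is not shown here.

```python
import numpy as np

def sherman_morrison_update(XtX_inv, beta, x_new, y_new):
    """Rank-one update of (X'X)^{-1} and beta for one new observation."""
    x = np.asarray(x_new, dtype=float)
    Ax = XtX_inv @ x
    denom = 1.0 + x @ Ax
    # (A + x x')^{-1} = A^{-1} - A^{-1} x x' A^{-1} / (1 + x' A^{-1} x)
    new_inv = XtX_inv - np.outer(Ax, Ax) / denom
    # move beta along the gain vector by the prediction error
    new_beta = beta + Ax / denom * (y_new - x @ beta)
    return new_inv, new_beta

# simulated data; the last row plays the role of the new observation
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
XtX_inv = np.linalg.inv(X[:-1].T @ X[:-1])
beta_old = XtX_inv @ X[:-1].T @ y[:-1]
new_inv, beta_updated = sherman_morrison_update(XtX_inv, beta_old, X[-1], y[-1])
beta_refit = np.linalg.solve(X.T @ X, X.T @ y)
```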

vcov

estimation.feols_.Feols.vcov(vcov, data=None)

Compute covariance matrices for an estimated regression model.

Parameters

Name Type Description Default
vcov Union[str, dict[str, str]] A string or dictionary specifying the type of variance-covariance matrix to use for inference. If a string, it can be one of “iid”, “hetero”, “HC1”, “HC2”, “HC3”. If a dictionary, it should have the format {“CRV1”: “clustervar”} for CRV1 inference or {“CRV3”: “clustervar”} for CRV3 inference. Note that CRV3 inference is currently not supported for IV estimation. required
data Optional[DataFrameType] The data used for estimation. If None, tries to fetch the data from the model object. Defaults to None. None

Returns

Name Type Description
Feols An instance of class Feols with updated inference.
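
To show what distinguishes the “iid” and heteroskedasticity-robust options, here is a minimal NumPy sketch of the textbook iid and HC1 sandwich estimators. The function vcov_iid_and_hc1 is hypothetical, and the n / (n - k) scaling is the textbook HC1 convention, which is an assumption here since the library's small sample adjustments are configurable.

```python
import numpy as np

def vcov_iid_and_hc1(X, y):
    """Textbook iid and HC1 covariance matrices for OLS."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    u = y - X @ beta
    vcov_iid = bread * (u @ u) / (n - k)
    meat = (X * u[:, None] ** 2).T @ X  # X' diag(u^2) X
    vcov_hc1 = n / (n - k) * bread @ meat @ bread
    return beta, vcov_iid, vcov_hc1

# simulated heteroskedastic data
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
y = 1.0 + 0.5 * x + rng.normal(size=200) * (1.0 + np.abs(x))
beta, vcov_iid, vcov_hc1 = vcov_iid_and_hc1(X, y)
```

Cluster-robust variants replace the observation-level meat with sums of score contributions within each cluster.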

wald_test

estimation.feols_.Feols.wald_test(R=None, q=None, distribution='F')

Conduct Wald test.

Compute a Wald test for a linear hypothesis of the form R * β = q, where R is an m x k matrix, β is a k x 1 vector of coefficients, and q is an m x 1 vector. By default, tests the joint null hypothesis that all coefficients are zero.

This method produces the following attributes:
- _dfd (int): Degrees of freedom in the denominator.
- _dfn (int): Degrees of freedom in the numerator.
- _wald_statistic (scalar): Wald statistic computed for hypothesis testing.
- _f_statistic (scalar): Wald statistic computed for hypothesis testing when R is an identity matrix and q is a zero vector.
- _p_value (scalar): Corresponding p-value for the statistic.

Parameters

Name Type Description Default
R array-like The matrix R of the linear hypothesis. If None, defaults to an identity matrix. None
q array-like The vector q of the linear hypothesis. If None, defaults to a vector of zeros. None
distribution str The distribution to use for the p-value. Can be either “F” or “chi2”. Defaults to “F”. 'F'

Returns

Name Type Description
pd.Series A pd.Series with the Wald statistic and p-value.

Examples

import numpy as np
import pandas as pd

from pyfixest.estimation.estimation import feols
from pyfixest.utils import ssc  # needed for the ssc(adj=False) call below

data = pd.read_csv("pyfixest/did/data/df_het.csv")
data = data.iloc[1:3000]

R = np.array([[1, -1]])
q = np.array([0.0])

fml = "dep_var ~ treat"
fit = feols(fml, data, vcov={"CRV1": "year"}, ssc=ssc(adj=False))

# Wald test
fit.wald_test(R=R, q=q, distribution="chi2")
f_stat = fit._f_statistic
p_stat = fit._p_value

print(f"Python f_stat: {f_stat}")
print(f"Python p_stat: {p_stat}")

The code above produces the following results :

Python f_stat: 256.55432910297003

Python p_stat: 9.67406627744023e-58
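
The statistic behind a test of R * β = q can be written as W = (Rβ - q)' (R V R')⁻¹ (Rβ - q), where V is the variance-covariance matrix of the coefficients. The NumPy sketch below computes W and the textbook F version W / m for made-up inputs. The function wald_statistic is hypothetical, and the p-value and degrees-of-freedom handling of the method itself are not reproduced.

```python
import numpy as np

def wald_statistic(beta, vcov, R=None, q=None):
    """Wald statistic for R @ beta = q and its textbook F version W / m."""
    beta = np.asarray(beta, dtype=float)
    k = beta.size
    # defaults mirror the joint null that all coefficients are zero
    R = np.eye(k) if R is None else np.atleast_2d(np.asarray(R, dtype=float))
    q = np.zeros(R.shape[0]) if q is None else np.asarray(q, dtype=float)
    diff = R @ beta - q
    wald = float(diff @ np.linalg.solve(R @ vcov @ R.T, diff))
    dfn = R.shape[0]  # number of restrictions m
    return wald, wald / dfn, dfn

# made-up coefficients and covariance matrix
beta = np.array([1.0, 3.0])
vcov = np.array([[0.5, 0.0], [0.0, 0.5]])
wald, f_stat, dfn = wald_statistic(beta, vcov, R=np.array([[1.0, -1.0]]), q=np.array([0.0]))
wald_joint, f_joint, dfn_joint = wald_statistic(beta, vcov)
```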

wildboottest

estimation.feols_.Feols.wildboottest(
    reps
    cluster=None
    param=None
    weights_type='rademacher'
    impose_null=True
    bootstrap_type='11'
    seed=None
    adj=True
    cluster_adj=True
    parallel=False
    return_bootstrapped_t_stats=False
)

Run a wild cluster bootstrap based on an object of type “Feols”.

Parameters

Name Type Description Default
reps int The number of bootstrap iterations to run. required
cluster Union[str, None] The variable used for clustering. Defaults to None. If None, then uses the variable specified in the model’s _clustervar attribute. If no _clustervar attribute is found, runs a heteroskedasticity-robust bootstrap. None
param Union[str, None] A string of length one, containing the test parameter of interest. Defaults to None. None
weights_type str The type of bootstrap weights. Options are ‘rademacher’, ‘mammen’, ‘webb’, or ‘normal’. Defaults to ‘rademacher’. 'rademacher'
impose_null bool Indicates whether to impose the null hypothesis on the bootstrap DGP. Defaults to True. True
bootstrap_type str A string selecting the bootstrap type. Options are ‘11’, ‘31’, ‘13’, or ‘33’. Defaults to ‘11’. '11'
seed Union[int, None] An option to provide a random seed. Defaults to None. None
adj bool Indicates whether to apply a small sample adjustment for the number of observations and covariates. Defaults to True. True
cluster_adj bool Indicates whether to apply a small sample adjustment for the number of clusters. Defaults to True. True
parallel bool Indicates whether to run the bootstrap in parallel. Defaults to False. False
return_bootstrapped_t_stats bool, optional: If True, the method returns a tuple of the regular output and the bootstrapped t-stats. Defaults to False. False

Returns

Name Type Description
pd.DataFrame A DataFrame with the original, non-bootstrapped t-statistic and bootstrapped p-value, along with the bootstrap type, inference type (HC vs CRV), and whether the null hypothesis was imposed on the bootstrap DGP. If return_bootstrapped_t_stats is True, the method returns a tuple of the regular output and the bootstrapped t-stats.
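
To convey the mechanics, the following NumPy sketch implements a simple heteroskedasticity-robust wild bootstrap with Rademacher weights and the null hypothesis imposed, for a test that one coefficient equals zero. It is a simplified illustration under stated assumptions (no clusters, HC0 standard errors, hypothetical function names hc_tstat and wild_bootstrap_pvalue, simulated data) and does not correspond to any specific bootstrap_type option.

```python
import numpy as np

def hc_tstat(X, y, j):
    """t-statistic of coefficient j with an HC0 sandwich standard error."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    u = y - X @ beta
    meat = (X * u[:, None] ** 2).T @ X
    vcov = bread @ meat @ bread
    return beta[j] / np.sqrt(vcov[j, j])

def wild_bootstrap_pvalue(X, y, j, reps=999, seed=None):
    """Wild bootstrap p-value for H0: beta_j = 0 with the null imposed."""
    rng = np.random.default_rng(seed)
    t_obs = hc_tstat(X, y, j)
    # impose the null beta_j = 0 by fitting the model without column j
    X_r = np.delete(X, j, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    u_r = y - fitted_r
    t_boot = np.empty(reps)
    for b in range(reps):
        v = rng.choice([-1.0, 1.0], size=y.size)  # Rademacher weights
        t_boot[b] = hc_tstat(X, fitted_r + v * u_r, j)
    pvalue = float(np.mean(np.abs(t_boot) >= abs(t_obs)))
    return t_obs, pvalue, t_boot

# simulated data with a true slope of 1.5
rng = np.random.default_rng(42)
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + 1.5 * x + rng.normal(size=100)
t_obs, pvalue, t_boot = wild_bootstrap_pvalue(X, y, j=1, reps=199, seed=7)
```

A cluster version would draw one weight per cluster instead of one per observation and use a cluster-robust t-statistic.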

wls_transform

estimation.feols_.Feols.wls_transform()

Transform model matrices for WLS Estimation.
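
A common way to express weighted least squares as an ordinary least squares problem is to scale each row of the model matrices by the square root of its weight. The NumPy sketch below shows this equivalence on simulated data. The function sqrt_weight_transform is a hypothetical name, and whether the method applies exactly this scaling is an assumption.

```python
import numpy as np

def sqrt_weight_transform(X, y, weights):
    """Scale rows by sqrt(weights) so that OLS on the result equals WLS."""
    w_sqrt = np.sqrt(np.asarray(weights, dtype=float))
    return X * w_sqrt[:, None], y * w_sqrt

# simulated data and weights
rng = np.random.default_rng(3)
x = rng.normal(size=80)
X = np.column_stack([np.ones(80), x])
y = 2.0 - 1.0 * x + rng.normal(size=80)
w = rng.uniform(0.5, 2.0, size=80)

X_w, y_w = sqrt_weight_transform(X, y, w)
beta_transformed = np.linalg.lstsq(X_w, y_w, rcond=None)[0]

# closed-form WLS estimate (X' W X)^{-1} X' W y for comparison
W = np.diag(w)
beta_closed_form = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```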