# PyFixest

> Fast high-dimensional fixed effects regression in Python, closely mirroring the syntax of the R package fixest.

## Docs

- [Getting Started](https://pyfixest.org/quickstart.html): [markdown](https://pyfixest.org/quickstart.html.md)
- [API Reference](https://pyfixest.org/reference/): [markdown](https://pyfixest.org/reference/index.html.md)
- [feols API](https://pyfixest.org/reference/estimation.api.feols.feols.html): [markdown](https://pyfixest.org/reference/estimation.api.feols.feols.html.md)
- [Regression Tables (etable)](https://pyfixest.org/table-layout.html): [markdown](https://pyfixest.org/table-layout.html.md)
- [Changelog](https://pyfixest.org/changelog.html): [markdown](https://pyfixest.org/changelog.html.md)

All documentation pages are available as clean markdown by appending `.md` to the HTML URL (e.g., `quickstart.html` -> `quickstart.html.md`).

## Core API (4 functions)

- `pyfixest.feols(fml, data, vcov, weights, ssc, fixef_rm, ...)`: OLS/WLS/IV with fixed effects.
- `pyfixest.fepois(fml, data, vcov, ...)`: Poisson regression with fixed effects.
- `pyfixest.feglm(fml, data, family, vcov, ...)`: GLM regression (family: "logit", "probit", "gaussian") with fixed effects.
- `pyfixest.quantreg(fml, data, quantile, ...)`: Quantile regression via interior point solver.

## Formula Syntax

Formulas follow fixest syntax and are split into 1-3 parts by `|`:

- One-part: `"Y ~ X1 + X2"` (no fixed effects, no IV)
- Two-part: `"Y ~ X1 + X2 | FE1 + FE2"` (fixed effects)
- Two-part IV: `"Y ~ X1 + X2 | X_endog ~ Z1 + Z2"` (IV without fixed effects)
- Three-part IV: `"Y ~ X1 + X2 | FE1 + FE2 | X_endog ~ Z1 + Z2"` (IV with fixed effects)

IV behavior:
- The IV part must be `endogenous ~ instruments`.
- Exogenous variables from the second-stage RHS are automatically added to the first stage.
- Endogenous variables are automatically added to the second stage.
- Multiple endogenous variables are not supported.
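
The part structure above can be sketched in a few lines of plain Python. This is only an illustration of how the `|`-separated parts are read, not PyFixest's actual parser; the function name `split_formula` and the dictionary keys are hypothetical.

```python
def split_formula(fml: str) -> dict:
    """Classify the `|`-separated parts of a fixest-style formula.

    Hypothetical helper for illustration; PyFixest's own parser is more involved.
    """
    parts = [p.strip() for p in fml.split("|")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected one to three parts separated by '|'")
    if "~" not in parts[0]:
        raise ValueError("first part must look like 'depvar ~ covariates'")

    depvar, covariates = (s.strip() for s in parts[0].split("~", 1))
    out = {
        "depvar": depvar,
        "covariates": covariates,
        "fixef": None,
        "endog": None,
        "instruments": None,
    }
    for part in parts[1:]:
        if "~" in part:
            # A later part containing `~` is the IV part: endogenous ~ instruments.
            endog, instruments = (s.strip() for s in part.split("~", 1))
            out["endog"], out["instruments"] = endog, instruments
        else:
            # Otherwise the part lists fixed effects.
            out["fixef"] = part
    return out


three_part = split_formula("Y ~ X1 + X2 | FE1 + FE2 | X_endog ~ Z1 + Z2")
two_part_iv = split_formula("Y ~ X1 + X2 | X_endog ~ Z1 + Z2")
```

Here `three_part` carries both a fixed-effects entry and an IV entry, while `two_part_iv` has an IV entry and no fixed effects.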

Other syntax:
- Multiple dependent variables are expanded to multiple estimations: `"Y1 + Y2 ~ X1"` behaves like `"sw(Y1, Y2) ~ X1"`.
- `i()` creates indicator expansions and interactions:
  - `i(cat)` expands to dummies for each level of `cat` (one omitted).
  - `i(cat, ref="Base")` sets the omitted reference level explicitly.
  - `i(cat, x)` interacts `cat` with `x`. If `x` is numeric, this yields category-specific slopes. If `x` is categorical, this yields cat-by-x indicators.
  - `i(cat1, cat2, ref2="Base")` interacts two categorical variables; `ref2` sets the omitted level of `cat2`.
  - Example (cat x numeric): `Y ~ i(industry, exposure)` creates industry-specific slopes on `exposure`.
  - Example (cat x cat): `Y ~ i(state, year, ref2=2000)` creates state-by-year indicators with 2000 as the base year.
- Standard interactions work as well:
  - `X1 * X2` expands to `X1 + X2 + X1:X2`.
  - `X1:X2` is the interaction term only (no main effects).
- Interacted FEs: `"Y ~ X1 | FE1 ^ FE2"` (creates a combined FE).
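
To make the `i()` expansions concrete, here is a rough plain-Python sketch of the columns they describe. The helper `indicator_columns`, the `name::level` column naming, and the toy data are assumptions for illustration only; they are not PyFixest's internals or its actual column names.

```python
def indicator_columns(name, values, ref=None, x=None):
    """Build one column per level of `values`, skipping `ref` if given.

    Without `x`, each column is a 0/1 dummy. With a numeric `x`, each column
    holds `x` for rows in that level and 0 elsewhere (a level-specific slope).
    Hypothetical helper; column names are illustrative.
    """
    levels = sorted(set(values))
    if ref is not None and ref not in levels:
        raise ValueError(f"reference level {ref!r} not found")
    multiplier = x if x is not None else [1] * len(values)
    columns = {}
    for level in levels:
        if level == ref:
            continue  # the reference level is omitted
        columns[f"{name}::{level}"] = [
            m if v == level else 0 for v, m in zip(values, multiplier)
        ]
    return columns


industry = ["agri", "tech", "tech", "retail"]
exposure = [0.5, 1.2, 0.3, 2.0]

# Roughly what i(industry, ref="agri") describes: dummies, "agri" omitted.
dummies = indicator_columns("industry", industry, ref="agri")

# Roughly what i(industry, exposure) describes: one slope column per industry.
slopes = indicator_columns("industry", industry, x=exposure)
```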

### Multiple Estimation Operators

Operators can appear anywhere in the formula (RHS, fixed effects, IV parts) and can be combined; expansion is recursive and produces all combinations. Multiple estimation can be significantly faster than separate model calls because demeaned covariates are cached internally and reused across models.

`sw` (sequential stepwise):
- `y ~ x1 + sw(x2, x3)` -> `y ~ x1 + x2` and `y ~ x1 + x3`

`sw0` (sequential stepwise with zero step):
- `y ~ x1 + sw0(x2, x3)` -> `y ~ x1`, `y ~ x1 + x2`, `y ~ x1 + x3`

`csw` (cumulative stepwise):
- `y ~ x1 + csw(x2, x3)` -> `y ~ x1 + x2`, `y ~ x1 + x2 + x3`

`csw0` (cumulative stepwise with zero step):
- `y ~ x1 + csw0(x2, x3)` -> `y ~ x1`, `y ~ x1 + x2`, `y ~ x1 + x2 + x3`

`mvsw` (multiverse stepwise):
- `y ~ mvsw(x1, x2, x3)` -> all non-empty combinations plus the zero step:
  `y ~ 1`, `y ~ x1`, `y ~ x2`, `y ~ x3`, `y ~ x1 + x2`, `y ~ x1 + x3`, `y ~ x2 + x3`, `y ~ x1 + x2 + x3`

Combining operators example:
- `y ~ csw(x1, x2) + sw(z1, z2)` expands to:
  `y ~ x1 + z1`, `y ~ x1 + z2`, `y ~ x1 + x2 + z1`, `y ~ x1 + x2 + z2`
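
The expansion rules listed above can be reproduced with a short plain-Python sketch. It mimics only the semantics of the operators on a list of RHS terms; the helper names (`expand` and the operator functions) are hypothetical and this is not how PyFixest implements expansion.

```python
from itertools import combinations, product


def sw(*terms):
    """Sequential stepwise: one term at a time."""
    return [[t] for t in terms]


def sw0(*terms):
    """Sequential stepwise with an initial empty step."""
    return [[]] + sw(*terms)


def csw(*terms):
    """Cumulative stepwise: growing prefixes of the term list."""
    return [list(terms[:i]) for i in range(1, len(terms) + 1)]


def csw0(*terms):
    """Cumulative stepwise with an initial empty step."""
    return [[]] + csw(*terms)


def mvsw(*terms):
    """Multiverse stepwise: the empty step plus every non-empty combination."""
    steps = [[]]
    for size in range(1, len(terms) + 1):
        steps += [list(c) for c in combinations(terms, size)]
    return steps


def expand(depvar, fixed_terms, *operators):
    """Combine always-included terms with every combination of operator steps."""
    formulas = []
    for combo in product(*operators):
        rhs = list(fixed_terms) + [t for step in combo for t in step]
        formulas.append(f"{depvar} ~ {' + '.join(rhs) if rhs else '1'}")
    return formulas


combined = expand("y", [], csw("x1", "x2"), sw("z1", "z2"))
multiverse = expand("y", [], mvsw("x1", "x2", "x3"))
```

With two operators, `itertools.product` yields every pairing of their steps, which is why `csw(x1, x2) + sw(z1, z2)` produces four formulas.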

To estimate on subsamples, pass a variable name to the `split` or `fsplit` argument. Both fit one model per level of that variable; `fsplit` additionally fits the full sample.
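
As a rough picture of the difference between the two arguments, the sketch below partitions a list of rows in plain Python. The function `subsamples`, its `include_full` flag, and the toy rows are hypothetical; the sketch shows only which samples get a fit, not how PyFixest estimates them.

```python
def subsamples(rows, by, include_full=False):
    """Return (label, rows) pairs: one per level of `by`, optionally the full sample first.

    include_full=False mirrors the idea of `split`; include_full=True mirrors `fsplit`.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row)
    samples = [(f"{by}={level}", members) for level, members in groups.items()]
    if include_full:
        samples.insert(0, ("full sample", list(rows)))
    return samples


rows = [
    {"Y": 1.0, "X1": 0.2, "region": "north"},
    {"Y": 2.0, "X1": 0.4, "region": "south"},
    {"Y": 3.0, "X1": 0.6, "region": "north"},
]

split_like = subsamples(rows, by="region")
fsplit_like = subsamples(rows, by="region", include_full=True)
```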

## Inference (vcov)

Pass to `vcov`:

- `"iid"` -- IID errors
- `"hetero"` -- HC1 heteroskedasticity-robust (alias: `"HC1"`)
- `"HC2"` -- HC2 robust (not supported with fixed effects or IV)
- `"HC3"` -- HC3 robust (not supported with fixed effects or IV)
- `{"CRV1": "cluster_var"}` -- Cluster-robust variance
- `{"CRV3": "cluster_var"}` -- Leave-one-cluster-out jackknife
- `"NW"` -- Newey-West HAC (requires `vcov_kwargs` with `time_id`; `panel_id` and `lag` are optional)
- `"DK"` -- Driscoll-Kraay HAC (requires `vcov_kwargs` with `time_id`; `panel_id` and `lag` are optional)

Two-way clustering: `{"CRV1": "var1 + var2"}`.

Inference can be adjusted post-estimation: `fit.vcov("hetero").summary()`.
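
To show what distinguishes `"iid"` from `"hetero"` (HC1), here is a hand-rolled sketch for a simple regression with an intercept and one regressor, in plain Python. It uses the textbook formulas with an `n / (n - k)` correction; PyFixest's own small-sample adjustments are configurable via `ssc` and may differ, so the numbers need not match library output. The function name and data are made up for illustration.

```python
import math


def simple_ols_se(x, y):
    """Slope of y on x (with intercept) plus IID and HC1 standard errors of the slope."""
    n = len(x)
    k = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # IID: one common error variance for all observations.
    sigma2 = sum(e * e for e in resid) / (n - k)
    se_iid = math.sqrt(sigma2 / sxx)

    # HC1: each observation contributes its own squared residual,
    # scaled by n / (n - k).
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_hc1 = math.sqrt(n / (n - k) * meat / sxx**2)

    return slope, se_iid, se_hc1


slope, se_iid, se_hc1 = simple_ols_se(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
)
```

The point estimate is identical under both choices; only the standard error changes, which is why inference can be swapped after estimation.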

## Post Processing

Model objects support:

- `.summary()` -- Print regression summary
- `.tidy()` -- Tidy DataFrame of coefficients, SEs, t-stats, p-values, CIs
- `.coef()` -- Coefficient values
- `.se()` -- Standard errors
- `.pvalue()` -- P-values
- `.confint()` -- Confidence intervals
- `.predict(newdata)` -- Predictions
- `.resid()` -- Residuals
- `.vcov(vcov)` -- Recompute the variance-covariance matrix with a different `vcov` type; returns the updated model
- `.tstat()` -- t-statistics
- `.fixef()` -- Extract fixed effect estimates
- `.wildboottest(param, reps, seed)` -- Wild cluster bootstrap inference
- `.ccv(treatment, pk, qk, ...)` -- Causal cluster variance estimator
- `.ritest(resampvar, reps, ...)` -- Randomization inference
- `.decompose(param, x1_vars, type, ...)` -- Gelbach (2016) decomposition
- `.wald_test(R, q)` -- Linear hypothesis testing
- `.first_stage()` -- First-stage results (IV only)
- `.IV_Diag()` -- IV diagnostic tests (IV only)

For IV models, show first and second stage together: `pf.etable([fit._model_1st_stage, fit])`.

## DiD / Causal Inference

- `pyfixest.did2s(data, yname, first_stage, second_stage, treatment, cluster)` -- Two-stage DID (Gardner 2022).
- `pyfixest.event_study(data, yname, idname, tname, gname, estimator="twfe")` -- Event study with multiple estimators.
- `pyfixest.lpdid(data, yname, idname, tname, gname)` -- Local projections DID.
- `pyfixest.SaturatedEventStudy(data, yname, idname, tname, gname)` -- Saturated event study with cohort-specific effects.
- `pyfixest.panelview(data, unit, time, treat)` -- Panel treatment visualization.

## Visualization

- `pyfixest.coefplot(models)` -- Plot coefficients with confidence intervals.
- `pyfixest.iplot(models)` -- Plot coefficients from `i()` interactions (event-study style).
- `pyfixest.qplot(models)` -- Plot quantile regression coefficients.

## Multiple Testing

- `pyfixest.bonferroni(models, param)` -- Bonferroni-adjusted p-values.
- `pyfixest.rwolf(models, param, reps, seed)` -- Romano-Wolf adjusted p-values.
- `pyfixest.wyoung(models, param, reps, seed)` -- Westfall-Young adjusted p-values.
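
For intuition, the Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. The plain-Python sketch below applies it to a list of raw p-values; `bonferroni_adjust` is a hypothetical helper, whereas `pyfixest.bonferroni` operates on fitted models and a parameter name.

```python
def bonferroni_adjust(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```

Romano-Wolf and Westfall-Young are resampling-based procedures and do not reduce to a one-line formula like this.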

## Utilities

- `pyfixest.get_data(N, seed)` -- Generate example dataset for testing.
- `pyfixest.ssc(k_adj, k_fixef, G_adj, G_df)` -- Configure small sample corrections.

## etable Basics

For regression tables, use `pf.etable()`.

- Build tables: `pf.etable([fit1, fit2, ...])` or `pf.etable(pf.feols("Y~csw(X1,X2)", data))`.
- Output formats: `type="gt"` (default), `"md"`, `"tex"`, `"df"`.
- Keep/drop variables: `keep="X1"` or `drop=["X2"]`.
- Labels: `labels={"X1": "Age"}`, `felabels={"f1": "Industry FE"}`.
- Coefficient format: `coef_fmt="b (se)\n[p]"` shows coefficient, SE in parentheses, p-value in brackets.
- Title: `caption="Regression Results"`.
- Column headers: `model_heads=[...]` and `head_order="hd"` or `"dh"` to control header order.