Instrumental Variables
Prerequisites: you have read the Getting Started page and have pyfixest installed.
Introduction
Ordinary Least Squares (OLS) delivers biased estimates when a regressor is correlated with the error term, whether because of omitted variables, simultaneity, or measurement error. Instrumental Variables (IV) estimation addresses this by finding a variable \(Z\), the instrument, that satisfies two conditions:
- Relevance: \(Z\) is correlated with the endogenous regressor \(X\).
- Exclusion: \(Z\) affects the outcome \(Y\) only through \(X\).
The two-stage least squares (2SLS) estimator first regresses \(X\) on \(Z\) (and controls), then uses the predicted \(\hat{X}\) in the outcome equation. In pyfixest, the IV syntax is:
Y ~ exogenous_controls | fixed_effects | endogenous ~ instrument
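To make the two stages concrete, here is a minimal sketch that runs 2SLS by hand on simulated data using only the Python standard library. The data-generating process, the coefficient values, and the helper `ols_fit` are assumptions made for illustration; none of this is pyfixest code.

```python
import random
from statistics import fmean


def ols_fit(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)
n = 20_000
true_beta = 2.0  # assumed causal effect of X on Y

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
# X depends on the instrument (relevance) and on the confounder (endogeneity)
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# Z enters Y only through X (exclusion); the confounder enters Y directly
y = [true_beta * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and keep the fitted values
a1, b1 = ols_fit(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress Y on the fitted values from stage 1
_, beta_2sls = ols_fit(x_hat, y)

# Naive OLS for comparison: biased upward by the confounder in this setup
_, beta_ols = ols_fit(x, y)

print(f"true effect: {true_beta:.2f}")
print(f"naive OLS:   {beta_ols:.2f}")
print(f"manual 2SLS: {beta_2sls:.2f}")
```

Running the second stage by hand reproduces the 2SLS point estimate, but the standard errors from that regression are not valid because they ignore that \(\hat{X}\) is itself estimated. A dedicated IV routine handles this correction.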
This tutorial walks through three applications — each highlighting a different reason researchers reach for IV.
import pyfixest as pf
Application 1: The Motherhood Penalty
Does having children reduce women’s earnings?
A naive regression of earnings on fertility is biased: women with stronger career ambitions may both earn more and have fewer children. Since career ambition is positively correlated with earnings but negatively correlated with fertility, OLS overstates the motherhood penalty. Lundborg, Plug & Rasmussen (2017) exploit the quasi-random success of IVF treatment as an instrument for fertility among women who sought treatment.
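The direction of the bias can be checked with a small simulation. The sketch below is an illustration under assumed parameters (it is not the process behind `pf.get_ivf_data()`): ambition raises earnings and lowers fertility, and the true effect of one more child is set to -0.15. It compares the OLS slope with the textbook omitted-variable-bias formula, bias \(= \gamma \cdot \text{Cov}(X, A) / \text{Var}(X)\), where \(\gamma\) is the effect of the omitted variable \(A\) on the outcome.

```python
import random
from statistics import fmean


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
n = 20_000
true_effect = -0.15          # causal effect of one more child on earnings
ambition_on_earnings = 0.5   # assumed: ambition raises earnings
ambition_on_children = -0.5  # assumed: ambition lowers fertility

ambition = [rng.gauss(0, 1) for _ in range(n)]
# Fertility is treated as continuous here purely to keep the sketch short
children = [2.0 + ambition_on_children * a + rng.gauss(0, 1) for a in ambition]
earnings = [
    10.0 + true_effect * c + ambition_on_earnings * a + rng.gauss(0, 1)
    for c, a in zip(children, ambition)
]

# Short regression that omits ambition
ols_slope = cov(children, earnings) / cov(children, children)

# Omitted-variable-bias formula: gamma * Cov(X, A) / Var(X)
predicted_bias = ambition_on_earnings * cov(children, ambition) / cov(children, children)

print(f"true effect:           {true_effect:.3f}")
print(f"OLS slope:             {ols_slope:.3f}")
print(f"true + predicted bias: {true_effect + predicted_bias:.3f}")
```

Because the two ambition effects have opposite signs, the predicted bias is negative, so the OLS slope is more negative than the true effect.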
Synthetic Data
ivf_df = pf.get_ivf_data()
ivf_df.head()
Naive OLS
Without accounting for endogeneity, OLS overstates the penalty because career ambition is an omitted variable that increases earnings while reducing fertility:
fit_ols = pf.feols("earnings ~ num_children", data=ivf_df)
fit_ols.summary()
IV Estimation
Using ivf_success as an instrument for num_children:
fit_iv = pf.feols("earnings ~ 1 | num_children ~ ivf_success", data=ivf_df)
fit_iv.summary()
The IV estimate is closer to the true effect of -0.15: it is less negative than the OLS estimate, which overstates the penalty due to omitted-variable bias from career ambition.
Compare OLS and IV
pf.etable(
[fit_ols, fit_iv],
labels={"earnings": "Earnings", "num_children": "Number of Children"},
caption="Motherhood Penalty: OLS vs IV",
)
First-Stage Diagnostics
A strong first stage is essential for IV to work. We check the first-stage F-statistic and run IV_Diag():
fit_iv.first_stage()
fit_iv.IV_Diag()
A first-stage F-statistic well above 10 indicates that IVF success is a strong predictor of fertility.
Application 2: A/B Encouragement Design
Estimating the effect of feature adoption on revenue when users don’t comply with treatment assignment.
A tech company runs an A/B test: half of users are encouraged (shown a banner) to try a new feature. But not everyone who sees the banner actually adopts the feature, and some control users discover it on their own. The simple intent-to-treat (ITT) comparison underestimates the effect on users who actually adopt. Random assignment serves as an instrument for actual adoption, and the IV estimate recovers the Local Average Treatment Effect (LATE) on compliers.
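The gap between the ITT and the LATE can be seen in a small simulation with two-sided noncompliance. The shares of compliers, always-takers, and never-takers, the effect size, and the helper `diff_in_means` below are all assumptions chosen for illustration; the sketch does not reproduce `pf.get_encouragement_data()`.

```python
import random
from statistics import fmean

rng = random.Random(1)
n = 40_000
effect = 2.0  # assumed revenue gain from adopting the feature

assigned, adopted, revenue = [], [], []
for _ in range(n):
    z = rng.random() < 0.5  # random encouragement
    r = rng.random()
    # Assumed compliance types: 60% compliers, 10% always-takers, 30% never-takers
    kind = "complier" if r < 0.6 else ("always" if r < 0.7 else "never")
    d = kind == "always" or (kind == "complier" and z)
    assigned.append(int(z))
    adopted.append(int(d))
    revenue.append(5.0 + effect * d + rng.gauss(0, 1))


def diff_in_means(outcome, group):
    """Mean outcome where group == 1 minus mean outcome where group == 0."""
    treated = [o for o, g in zip(outcome, group) if g == 1]
    control = [o for o, g in zip(outcome, group) if g == 0]
    return fmean(treated) - fmean(control)


itt = diff_in_means(revenue, assigned)          # reduced form
first_stage = diff_in_means(adopted, assigned)  # share of compliers
late = itt / first_stage                        # Wald ratio

print(f"ITT:         {itt:.3f}")
print(f"First stage: {first_stage:.3f}")
print(f"Wald / LATE: {late:.3f}")
```

The ITT is diluted because only compliers change their behaviour in response to the banner. Dividing by the first stage rescales the ITT to the effect per complier.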
Synthetic Data
ab_df = pf.get_encouragement_data()
ab_df.head()
Three Estimands
We estimate the reduced form (ITT), the first stage, and the IV/LATE:
# Intent-to-treat (reduced form)
fit_itt = pf.feols("revenue ~ assigned_treatment | user_type", data=ab_df)
# First stage
fit_fs = pf.feols("adopted_feature ~ assigned_treatment | user_type", data=ab_df)
# IV / LATE
fit_late = pf.feols("revenue ~ 1 | user_type | adopted_feature ~ assigned_treatment", data=ab_df)
The Wald estimator says: \(\text{LATE} = \frac{\text{ITT}}{\text{First Stage}} = \frac{\text{Cov}(Y, Z)}{\text{Cov}(D, Z)}\).
Let’s verify:
itt_coef = fit_itt.coef()["assigned_treatment"]
fs_coef = fit_fs.coef()["assigned_treatment"]
late_coef = fit_late.coef()["adopted_feature"]
print(f"ITT coefficient: {itt_coef:.4f}")
print(f"First-stage coefficient: {fs_coef:.4f}")
print(f"Wald ratio (ITT / FS): {itt_coef / fs_coef:.4f}")
print(f"IV/LATE coefficient: {late_coef:.4f}")
Compare All Three
pf.etable(
[fit_itt, fit_fs, fit_late],
labels={
"revenue": "Revenue",
"adopted_feature": "Adopted Feature",
"assigned_treatment": "Assigned Treatment",
},
felabels={"user_type": "User Type FE"},
caption="A/B Encouragement Design: ITT, First Stage, and LATE",
)
pf.coefplot([fit_itt, fit_late], keep="assigned_treatment|adopted_feature")
IV Diagnostics
fit_late.IV_Diag()
IV Diagnostics in PyFixest
Weak instruments — instruments that are only loosely correlated with the endogenous variable — lead to biased and unreliable IV estimates. PyFixest provides two key diagnostic tools to detect this problem.
The First-Stage F-Statistic
The .first_stage() method re-estimates the first-stage regression and computes the first-stage F-statistic, which tests \(H_0\colon \pi = 0\) (all instrument coefficients are jointly zero). The classic rule of thumb is \(F > 10\) (Staiger & Stock, 1997; see also Stock & Yogo, 2005).
The F-statistic adapts to your variance-covariance specification: if you fit the IV model with heteroskedasticity-robust or cluster-robust standard errors, the first-stage F-statistic is computed accordingly.
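For intuition, with a single instrument and homoskedastic errors the first-stage F-statistic is the squared t-statistic on the instrument. The sketch below computes it by hand on simulated data. The helper `first_stage_f` and the coefficient values are hypothetical and assume iid errors; this is not how pyfixest computes the statistic under robust or clustered variance estimators.

```python
import random
from statistics import fmean


def first_stage_f(z, x):
    """F-statistic for H0: pi = 0 in x = alpha + pi * z + e, assuming iid errors."""
    n = len(z)
    mz, mx = fmean(z), fmean(x)
    szz = sum((zi - mz) ** 2 for zi in z)
    pi_hat = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    alpha_hat = mx - pi_hat * mz
    rss = sum((xi - alpha_hat - pi_hat * zi) ** 2 for zi, xi in zip(z, x))
    sigma2 = rss / (n - 2)       # residual variance
    se = (sigma2 / szz) ** 0.5   # iid standard error of pi_hat
    return (pi_hat / se) ** 2    # F equals t squared with one instrument


rng = random.Random(7)
n = 2_000
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# Same instrument and noise, two assumed first-stage coefficients
x_strong = [0.5 * zi + e for zi, e in zip(z, noise)]
x_weak = [0.01 * zi + e for zi, e in zip(z, noise)]

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)

print(f"F with a strong instrument:      {f_strong:.1f}")
print(f"F with a nearly irrelevant one:  {f_weak:.1f}")
```

Holding the sample fixed, the F-statistic grows with the size of the first-stage coefficient relative to its standard error, which is why a near-zero coefficient yields a small F.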
# Re-use the motherhood penalty IV model
fit_iv.first_stage()
# The F-stat is stored as an attribute after calling first_stage()
print(f"First-stage F-statistic: {fit_iv._f_stat_1st_stage:.1f}")
print(f"First-stage p-value: {fit_iv._p_value_1st_stage:.4f}")
The Effective F-Statistic
The standard F-statistic can be misleading when there are multiple endogenous regressors or when errors are non-homoskedastic. The effective F-statistic (Montiel Olea & Pflueger, 2013) is a more robust measure of instrument strength that remains valid under heteroskedasticity:
\[ F_{\text{eff}} = \frac{\hat{\pi}' Q_{ZZ} \hat{\pi}}{\text{tr}(\hat{\Sigma} \, Q_{ZZ})} \]
where \(\hat{\pi}\) are the first-stage coefficients on the excluded instruments, \(Q_{ZZ} = Z'Z\), and \(\hat{\Sigma}\) is the robust variance-covariance matrix of \(\hat{\pi}\).
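As a quick sanity check on the formula, consider the special case of a single excluded instrument, so that \(\hat{\pi}\), \(Q_{ZZ}\), and \(\hat{\Sigma}\) are all scalars. The following short derivation uses only the definitions above and is not taken from the pyfixest source:

```latex
F_{\text{eff}}
  = \frac{\hat{\pi} \, Q_{ZZ} \, \hat{\pi}}{\hat{\Sigma} \, Q_{ZZ}}
  = \frac{\hat{\pi}^{2}}{\hat{\Sigma}}
  = \left( \frac{\hat{\pi}}{\sqrt{\hat{\Sigma}}} \right)^{2}
```

With one instrument, the effective F is therefore the squared robust t-statistic on that instrument, which is the heteroskedasticity-robust first-stage F-statistic.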
The .IV_Diag() method computes both the standard F-statistic and the effective F-statistic in one call:
fit_iv.IV_Diag()
print(f"Standard F-statistic: {fit_iv._f_stat_1st_stage:.1f}")
print(f"Effective F-statistic: {fit_iv._eff_F:.1f}")
As a rough guide to interpreting these statistics:
- \(F < 10\): instruments are likely too weak. Consider finding stronger instruments or using weak-instrument-robust inference.
- Large gap between standard and effective F: suggests that heteroskedasticity matters for instrument strength. Rely on the effective F in this case.
- \(F_{\text{eff}} \gg 10\): strong instruments under heteroskedasticity — you’re in good shape.
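The rules of thumb above can be written down as a small helper. The function `describe_instrument_strength`, its default `threshold`, and in particular the `gap_ratio` cutoff for what counts as a "large gap" are hypothetical choices for illustration; they are not part of pyfixest and should be adapted to the application.

```python
def describe_instrument_strength(f_stat, eff_f, threshold=10.0, gap_ratio=2.0):
    """Summarise instrument strength using the rules of thumb in this section.

    threshold and gap_ratio are illustrative defaults, not pyfixest settings.
    """
    larger, smaller = max(f_stat, eff_f), min(f_stat, eff_f)
    # Treat the gap as large when one statistic is at least gap_ratio times the other
    large_gap = smaller <= 0 or larger / smaller >= gap_ratio
    # With a large gap, rely on the effective F; otherwise be conservative
    trusted = eff_f if large_gap else smaller
    verdict = "strong" if trusted > threshold else "weak"
    return {"large_gap": large_gap, "trusted_f": trusted, "verdict": verdict}


# Hypothetical values, for illustration only
print(describe_instrument_strength(250.0, 240.0))
print(describe_instrument_strength(45.0, 6.0))
print(describe_instrument_strength(4.0, 3.5))
```

A value only slightly above the threshold still deserves caution, since the guidance above asks for an effective F comfortably above 10 rather than marginally so.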
Related pages:
- Standard Errors & Inference: learn about robust, cluster-robust, and bootstrap inference.
- Regression Tables: customize publication-ready output tables.
- Feiv API Reference: full documentation of the IV estimator class.