where \(\alpha_i\) is an individual fixed effect (constant across time) and \(\psi_t\) is a time fixed effect (constant across individuals). Fixed effects are not limited to panel data: any categorical grouping variable can serve as a fixed effect (for example, wage regressions with worker and firm fixed effects; more on that topic later).
PyFixest efficiently estimates fixed effects models by applying the Frisch-Waugh-Lovell theorem, which, among other things, avoids the need to create hundreds of dummy variables.
In the following section, we introduce two canonical use cases of fixed effects regression.
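To make the Frisch-Waugh-Lovell idea concrete, here is a minimal NumPy sketch on simulated data. It is an illustration of the theorem, not PyFixest's actual implementation; the helper `demean_by_group` and all simulated quantities are assumptions made for this example. It shows that demeaning the outcome and the regressor within groups gives the same slope as a regression that includes one dummy per group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (all names and numbers here are illustrative assumptions).
n_groups, n_per_group = 50, 4
group = np.repeat(np.arange(n_groups), n_per_group)
alpha = rng.normal(size=n_groups)                  # group fixed effects
x = alpha[group] + rng.normal(size=group.size)     # regressor correlated with the FE
y = 2.0 * x + alpha[group] + rng.normal(scale=0.1, size=group.size)


def demean_by_group(v, g):
    """Subtract the group mean from each observation (hypothetical helper)."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]


# Route 1: explicit dummy variables, one column per group.
dummies = (group[:, None] == np.arange(n_groups)[None, :]).astype(float)
X_dummy = np.column_stack([x, dummies])
beta_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0][0]

# Route 2: demean y and x within groups, then regress without any dummies.
x_tilde = demean_by_group(x, group)
y_tilde = demean_by_group(y, group)
beta_within = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

print(beta_dummy, beta_within)
```

The two routes agree up to floating-point error, but the second never builds the 50-column dummy matrix, which is what makes the approach scale to fixed effects with many levels.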
Application 1: Twin Studies and the Returns to Education
One of the most foundational questions in the economics of education is: “How much does an extra year of education raise wages?” If we simply regressed realized wages on years of education, we would likely overstate the return to education because of selection bias: ability drives both education and wages. In other words, kids with high innate (but unobserved) ability end up with more years of education, but also with higher wages. Part of the observed relation between education and wages might therefore be spurious, as both are driven by the same latent factor.
Twin studies aim to correct for this selection effect by comparing twins who share the same genetic endowment. If the latent innate ability is encoded in the genome, then twins with identical genes should have the same latent ability. Under this assumption, any difference in educational attainment between twins is not driven by innate ability. As a result, any within-twin difference in wages can be attributed to differences in schooling rather than unobserved ability.
In practice, twin fixed-effects regressions compare each twin to their sibling, netting out shared genes and family background. The estimated coefficient on schooling then captures the causal return to education under the assumption that any remaining differences between twins are not systematically related to both schooling and wages.
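The within-twin logic can be sketched with simulated data. The data-generating process below is purely hypothetical (the true return of 0.08, the ability loadings, and the noise levels are all assumptions chosen for illustration). With exactly two observations per pair, a twin-pair fixed effect is equivalent to regressing the within-pair wage difference on the within-pair education difference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data-generating process; every number below is an assumption.
n_pairs = 2000
true_return = 0.08
ability = rng.normal(size=n_pairs)                 # shared within a twin pair
# Each row is a pair, each column is one twin.
educ = 12 + 1.5 * ability[:, None] + rng.normal(size=(n_pairs, 2))
log_wage = (
    1.0
    + true_return * educ
    + 0.3 * ability[:, None]
    + rng.normal(scale=0.2, size=(n_pairs, 2))
)


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_c = x - x.mean()
    return (x_c @ (y - y.mean())) / (x_c @ x_c)


# Naive: pool all twins and ignore ability.
naive = ols_slope(educ.ravel(), log_wage.ravel())

# Within-pair: difference the two twins, which removes the shared ability term.
d_educ = educ[:, 0] - educ[:, 1]
d_wage = log_wage[:, 0] - log_wage[:, 1]
within = (d_educ @ d_wage) / (d_educ @ d_educ)

print(naive, within)
```

In this simulation the naive slope is biased upward because ability raises both education and wages, while the within-pair slope lands close to the assumed true return.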
import pyfixest as pf

twins = pf.get_twin_data(N_pairs=500, seed=42)
twins.head()
   twin_pair_id  twin_id    ability       educ   age  experience  log_wage
0             1        1   0.304717  14.880083  38.0   17.119917  3.241823
1             1        2   0.304717  13.942729  49.0   29.057271  3.379130
2             2        1  -1.039984  10.041047  33.0   16.958953  2.303006
3             2        2  -1.039984   8.475001  32.0   17.524999  2.057258
4             3        1   0.750451   8.000000  35.0   21.000000  3.449381
Naive OLS (biased)
In a first step, we estimate the “naive” regression of log_wage on educ and experience, without any fixed effects (the model referred to as fit_naive below).
Without controlling for ability, the coefficient on educ captures both the true return to education and the selection effect.
In the next regression (fit_fe below), we include a fixed effect for each twin pair, identified by twin_pair_id. This controls for everything the twins share, including genes and environment, so the estimate uses only differences in education between the twins.
The FE estimate (about 0.088) is smaller than the “naive” OLS estimate (about 0.114). Indeed, part of the raw correlation between education and wages reflects the fact that higher-ability students obtain more years of education.
pf.etable(
    [fit_naive, fit_fe],
    labels={
        "log_wage": "Log Hourly Wage",
        "educ": "Years of Education",
        "experience": "Experience",
    },
    felabels={"twin_pair_id": "Twin Pair FE"},
    caption="Returns to Education: Naive OLS vs Twin FE",
)
Returns to Education: Naive OLS vs Twin FE

                      Log Hourly Wage
                      (1)              (2)
coef
Years of Education    0.114 (0.007)    0.088 (0.007)
Experience            0.019 (0.001)    0.02 (0.002)
Intercept             1.113 (0.091)
fe
Twin Pair FE          -                x
stats
Observations          1,000            1,000
R2                    0.283            0.801

Format of coefficient cell: Coefficient (Std. Error)
pf.coefplot([fit_naive, fit_fe], keep="educ")
Application 2: AKM Worker-Firm Regressions
Wages of workers depend on both worker characteristics and workplace characteristics. Higher-skill workers might earn more, but there might also be workplace premia. A two-way fixed effects model, as formulated in Abowd, Kramarz & Margolis (AKM, 1999), separates these unobserved effects.
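As a rough sketch of how a two-way (worker and firm) fixed effects regression can be computed without building dummy matrices, the example below alternately demeans by worker and by firm until convergence and then runs a regression on the residualized variables. Everything here is an assumption made for illustration: the simulated panel, the parameter values, and the helper names `demean` and `sweep_out`. It only recovers the slope on a covariate, not the worker and firm effects themselves, and it is not meant to describe PyFixest's internals.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical worker-firm panel; sizes and parameters are illustrative assumptions.
n_workers, n_firms, n_periods = 200, 20, 5
worker = np.repeat(np.arange(n_workers), n_periods)
firm = rng.integers(0, n_firms, size=worker.size)   # frequent moves keep firms connected
alpha = rng.normal(size=n_workers)                   # worker effects
psi = rng.normal(scale=0.5, size=n_firms)            # firm wage premia
x = 0.5 * alpha[worker] + 0.5 * psi[firm] + rng.normal(size=worker.size)
beta_true = 0.05
log_wage = (
    alpha[worker] + psi[firm] + beta_true * x
    + rng.normal(scale=0.1, size=worker.size)
)


def demean(v, g, n):
    """Subtract group means of v, where g holds group ids in 0..n-1."""
    means = np.bincount(g, weights=v, minlength=n) / np.bincount(g, minlength=n)
    return v - means[g]


def sweep_out(v, tol=1e-10, max_iter=1000):
    """Alternately demean by worker and by firm until the values stop changing."""
    v = v.copy()
    for _ in range(max_iter):
        v_new = demean(demean(v, worker, n_workers), firm, n_firms)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


# Residualize both variables on the two sets of fixed effects, then regress.
x_res = sweep_out(x)
y_res = sweep_out(log_wage)
beta_hat = (x_res @ y_res) / (x_res @ x_res)

# Cross-check against the explicit dummy-variable regression.
D = np.column_stack([
    x,
    (worker[:, None] == np.arange(n_workers)).astype(float),
    (firm[:, None] == np.arange(n_firms)).astype(float),
])
beta_dummy = np.linalg.lstsq(D, log_wage, rcond=None)[0][0]

print(beta_hat, beta_dummy)
```

The dummy matrix here already has 221 columns for a toy panel of 1,000 rows; with realistic numbers of workers and firms it becomes impractical, whereas the alternating demeaning only needs group means.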
For some background reading on AKM models and their application, take a look at this slide deck: AKM Lecture Slides.
Here is an interesting historical fact: the first paper (to our knowledge) that fitted a three-way high-dimensional fixed effects model in an unbalanced panel was published in 2013, only four years before the transformer was invented! Before that, economists simply did not know how to fit three-way fixed effects regression models on unbalanced panels efficiently. See Guimarães, Portugal, and Torres, “The Sources of Wage Variation: A Three-Way High-Dimensional Fixed Effects Regression Model”, who more or less fit the model above on Portuguese data.