import pandas as pd
import pyfixest as pf
Causal Inference for the Brave and True
Chapter 14: Panel Data and Fixed Effects
In this example we replicate the results of Chapter 14 of Causal Inference for the Brave and True, a great and freely available reference. Please refer to the original text for a detailed explanation of the data.
# Download the wage panel data set shipped with linearmodels
data_path = "https://raw.githubusercontent.com/bashtage/linearmodels/main/linearmodels/datasets/wage_panel/wage_panel.csv.bz2"
data_df = pd.read_csv(data_path)
data_df.head()
| | nr | year | black | exper | hisp | hours | married | educ | union | lwage | expersq | occupation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | 1980 | 0 | 1 | 0 | 2672 | 0 | 14 | 0 | 1.197540 | 1 | 9 |
| 1 | 13 | 1981 | 0 | 2 | 0 | 2320 | 0 | 14 | 1 | 1.853060 | 4 | 9 |
| 2 | 13 | 1982 | 0 | 3 | 0 | 2940 | 0 | 14 | 0 | 1.344462 | 9 | 9 |
| 3 | 13 | 1983 | 0 | 4 | 0 | 2960 | 0 | 14 | 0 | 1.433213 | 16 | 9 |
| 4 | 13 | 1984 | 0 | 5 | 0 | 3071 | 0 | 14 | 0 | 1.568125 | 25 | 5 |
We have a classical panel data set with units (nr) and time (year).
We are interested in estimating the effect of marital status on log wage, using a set of controls (expersq, union, hours) and individual (nr) and year fixed effects.
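To make the role of the two sets of fixed effects concrete, here is a minimal sketch of the two-way within transformation on a made-up toy panel. It is not how pyfixest computes its estimates. It assumes a balanced panel and a single regressor, in which case the two-way demeaning has a simple closed form. All names and numbers (`true_beta`, `unit_effects`, `time_effects`, `demean_two_way`) are hypothetical and chosen only for illustration.

```python
# Toy balanced panel: 3 units observed over 4 periods (all values are made up).
true_beta = 0.5
unit_effects = [1.0, 2.0, 3.0]       # alpha_i, one per unit
time_effects = [0.0, 0.1, 0.2, 0.3]  # gamma_t, one per period
x = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]  # x[i][t], e.g. a 0/1 indicator

n_units, n_periods = len(x), len(x[0])

# Noise-free outcome: y_it = alpha_i + gamma_t + beta * x_it
y = [
    [unit_effects[i] + time_effects[t] + true_beta * x[i][t] for t in range(n_periods)]
    for i in range(n_units)
]


def demean_two_way(m):
    """Subtract unit means and period means, then add back the grand mean.

    This closed form is only valid for balanced panels.
    """
    rows, cols = len(m), len(m[0])
    unit_means = [sum(row) / cols for row in m]
    period_means = [sum(m[i][t] for i in range(rows)) / rows for t in range(cols)]
    grand_mean = sum(unit_means) / rows
    return [
        [m[i][t] - unit_means[i] - period_means[t] + grand_mean for t in range(cols)]
        for i in range(rows)
    ]


x_tilde = demean_two_way(x)
y_tilde = demean_two_way(y)

# OLS slope of demeaned y on demeaned x (single regressor, no intercept needed).
numerator = sum(xv * yv for xr, yr in zip(x_tilde, y_tilde) for xv, yv in zip(xr, yr))
denominator = sum(xv * xv for xr in x_tilde for xv in xr)
beta_hat = numerator / denominator

# beta_hat recovers true_beta up to floating-point error,
# even though the unit and period effects were never estimated explicitly.
print(beta_hat)
```

The demeaning removes anything that is constant within a unit or within a period, which is why only variation in x within units over time identifies the coefficient.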
# Two-way fixed effects regression with standard errors clustered by nr and year
panel_fit = pf.feols(
    fml="lwage ~ married + expersq + union + hours | nr + year",
    data=data_df,
    vcov={"CRV1": "nr + year"},
    demeaner_backend="rust",
)
pf.etable(panel_fit)
| | lwage |
|---|---|
| | (1) |
| coef | |
| married | 0.048* (0.018) |
| expersq | -0.006*** (0.001) |
| union | 0.073* (0.023) |
| hours | -0.000** (0.000) |
| fe | |
| nr | x |
| year | x |
| stats | |
| Observations | 4360 |
| S.E. type | by: nr+year |
| R2 | 0.631 |
| R2 Within | 0.047 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
We obtain the same results as in the book!
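Because the outcome is the log wage, the coefficient on married is on the log scale. As a small worked example, the sketch below converts the reported point estimate of 0.048 into an approximate percentage change in wages. The variable names are hypothetical, and the value is copied by hand from the table above rather than read from the fitted model.

```python
import math

# Point estimate for `married`, copied by hand from the regression table.
married_coef = 0.048

# For a log outcome, a coefficient b implies a (exp(b) - 1) * 100 percent change.
pct_change = (math.exp(married_coef) - 1) * 100

print(f"Implied wage change: {pct_change:.2f}%")
```

For small coefficients the exact conversion is close to simply reading the coefficient as a percentage (here about 4.9% versus 4.8%).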