```python
import pandas as pd
import pyfixest as pf
```

Causal Inference for the Brave and True
Chapter 14: Panel Data and Fixed Effects
In this example, we replicate the results of Chapter 14 of Causal Inference for the Brave and True, a great and freely available reference. Please refer to the original text for a detailed explanation of the data.
```python
data_path = "https://raw.githubusercontent.com/bashtage/linearmodels/main/linearmodels/datasets/wage_panel/wage_panel.csv.bz2"
data_df = pd.read_csv(data_path)
data_df.head()
```

| | nr | year | black | exper | hisp | hours | married | educ | union | lwage | expersq | occupation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | 1980 | 0 | 1 | 0 | 2672 | 0 | 14 | 0 | 1.197540 | 1 | 9 |
| 1 | 13 | 1981 | 0 | 2 | 0 | 2320 | 0 | 14 | 1 | 1.853060 | 4 | 9 |
| 2 | 13 | 1982 | 0 | 3 | 0 | 2940 | 0 | 14 | 0 | 1.344462 | 9 | 9 |
| 3 | 13 | 1983 | 0 | 4 | 0 | 2960 | 0 | 14 | 0 | 1.433213 | 16 | 9 |
| 4 | 13 | 1984 | 0 | 5 | 0 | 3071 | 0 | 14 | 0 | 1.568125 | 25 | 5 |
We have a classic panel data set, with individuals (nr) observed over several years (year).
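Before fitting a fixed-effects model, it can be worth confirming that the data really has this panel structure. The sketch below uses a hypothetical helper, `describe_panel` (not part of pandas or pyfixest), that counts units and periods, flags duplicated (unit, period) pairs, and reports whether every unit is observed in every period. It is demonstrated on a small toy frame rather than on the downloaded data.

```python
import pandas as pd


def describe_panel(df: pd.DataFrame, unit: str, time: str) -> dict:
    """Summarise a long-format panel (hypothetical helper, for illustration only)."""
    n_units = int(df[unit].nunique())
    n_periods = int(df[time].nunique())
    # Number of distinct periods observed for each unit.
    periods_per_unit = df.groupby(unit)[time].nunique()
    # Repeated (unit, time) pairs would mean the data is not a clean panel.
    n_duplicates = int(df.duplicated(subset=[unit, time]).sum())
    # Balanced: every unit appears once in every period.
    balanced = bool((periods_per_unit == n_periods).all()) and n_duplicates == 0
    return {
        "n_obs": len(df),
        "n_units": n_units,
        "n_periods": n_periods,
        "balanced": balanced,
        "n_duplicates": n_duplicates,
    }


# Toy example: two individuals, each observed in the same three years.
toy = pd.DataFrame({"nr": [13, 13, 13, 17, 17, 17], "year": [1980, 1981, 1982] * 2})
print(describe_panel(toy, unit="nr", time="year"))
```

On the real data, the same call would be `describe_panel(data_df, unit="nr", time="year")`.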
We are interested in estimating the effect of marital status (married) on log wage (lwage), controlling for experience squared (expersq), union membership (union), and hours worked (hours), and including individual (nr) and year fixed effects.
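For reference, the specification in the `feols` call that follows can be written out as below (the notation is ours, not taken from the book):

$$
\text{lwage}_{it} = \beta\,\text{married}_{it} + \gamma_1\,\text{expersq}_{it} + \gamma_2\,\text{union}_{it} + \gamma_3\,\text{hours}_{it} + \alpha_i + \lambda_t + \varepsilon_{it}
$$

Here $i$ indexes individuals (nr), $t$ indexes years, $\alpha_i$ and $\lambda_t$ are the individual and year fixed effects absorbed by the `| nr + year` part of the formula, and $\beta$ is the coefficient of interest.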
```python
panel_fit = pf.feols(
    # Covariates before "|", fixed effects (individual nr and year) after it.
    fml="lwage ~ married + expersq + union + hours | nr + year",
    data=data_df,
    # Standard errors clustered two-way, by individual and by year.
    vcov={"CRV1": "nr + year"},
    # Use pyfixest's Rust backend to demean the variables.
    demeaner_backend="rust",
)
```

```python
pf.etable(panel_fit)
```

| | lwage |
|---|---|
| | (1) |
| coef | |
| married | 0.048* (0.018) |
| expersq | -0.006*** (0.001) |
| union | 0.073* (0.023) |
| hours | -0.000** (0.000) |
| fe | |
| nr | x |
| year | x |
| stats | |
| Observations | 4360 |
| S.E. type | by: nr+year |
| R2 | 0.631 |
| R2 Within | 0.047 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
We obtain the same results as in the book!
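The `| nr + year` syntax absorbs the fixed effects instead of estimating one dummy per individual and per year. As a rough illustration of the idea, and not pyfixest's actual implementation (its optimized backends may differ in acceleration and convergence details), the sketch below demeans each variable by alternating between subtracting unit means and time means until nothing changes, then runs OLS on the demeaned data. By the Frisch-Waugh-Lovell theorem, this should reproduce the slope from OLS with explicit unit and year dummies. Everything here (`demean_two_way`, the simulated unbalanced panel, the true slope of 0.5) is a hypothetical setup chosen for the demonstration.

```python
import numpy as np
import pandas as pd


def demean_two_way(x, unit, time, tol=1e-12, max_iter=10_000):
    """Remove unit and time effects from x by alternating projections (sketch)."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        # Subtract the unit means, then the time means.
        x -= pd.Series(x).groupby(unit).transform("mean").to_numpy()
        x -= pd.Series(x).groupby(time).transform("mean").to_numpy()
        # Stop once a full sweep no longer changes the values.
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x


# Simulated panel: 40 units, 6 years, with about 20% of rows dropped (unbalanced).
rng = np.random.default_rng(0)
n_units, n_years = 40, 6
unit = np.repeat(np.arange(n_units), n_years)
time = np.tile(np.arange(n_years), n_units)
keep = rng.random(unit.size) > 0.2
unit, time = unit[keep], time[keep]
alpha = rng.normal(size=n_units)[unit]  # unit fixed effects
lam = rng.normal(size=n_years)[time]  # time fixed effects
x = rng.normal(size=unit.size) + alpha  # regressor correlated with the unit effect
y = 0.5 * x + alpha + lam + rng.normal(scale=0.1, size=unit.size)

# Within estimator: OLS of demeaned y on demeaned x.
x_dm = demean_two_way(x, unit, time)
y_dm = demean_two_way(y, unit, time)
beta_within = (x_dm @ y_dm) / (x_dm @ x_dm)

# Dummy-variable regression: all unit dummies plus year dummies minus one.
d_unit = pd.get_dummies(unit, dtype=float).to_numpy()
d_time = pd.get_dummies(time, dtype=float).to_numpy()[:, 1:]
design = np.column_stack([x, d_unit, d_time])
beta_lsdv = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(beta_within, beta_lsdv)
```

The two printed slopes should agree up to floating-point and convergence error. With a balanced panel a single sweep already removes both sets of means exactly; the loop matters for unbalanced data like the simulated one.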