Replicating Examples from “The Effect”

This notebook replicates code examples from Nick Huntington-Klein’s book on causal inference, The Effect.

from causaldata import Mroz, gapminder, organ_donations, restaurant_inspections

import pyfixest as pf

%load_ext watermark
%watermark --iversions

pyfixest  : 0.30.2
causaldata: 0.1.5

Chapter 4: Describing Relationships

# Read in data
dt = Mroz.load_pandas().data
# Keep just working women
dt = dt.query("lfp")
# Create unlogged earnings
dt.loc[:, "earn"] = dt["lwg"].apply("exp")

# 5. Run multiple linear regression models by succesively adding controls
fit = pf.feols(fml="lwg ~ csw(inc, wc, k5)", data=dt, vcov="iid")
pf.etable(fit)

/tmp/ipykernel_5080/786816010.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dt.loc[:, "earn"] = dt["lwg"].apply("exp")

	lwg
	(1)	(2)	(3)
coef
inc	0.010** (0.003)	0.005 (0.003)	0.005 (0.003)
wc		0.342*** (0.075)	0.349*** (0.075)
k5			-0.072 (0.087)
Intercept	1.007*** (0.071)	0.972*** (0.070)	0.982*** (0.071)
stats
Observations	428	428	428
S.E. type	iid	iid	iid
R²	0.020	0.066	0.068
Adj. R²	0.018	0.062	0.061
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Chapter 13: Regression

Example 1

res = restaurant_inspections.load_pandas().data
res.inspection_score = res.inspection_score.astype(float)
res.NumberofLocations = res.NumberofLocations.astype(float)
res.dtypes

fit = pf.feols(fml="inspection_score ~ NumberofLocations", data=res)
pf.etable([fit])

	inspection_score
	(1)
coef
NumberofLocations	-0.019*** (0.000)
Intercept	94.866*** (0.046)
stats
Observations	27178
S.E. type	iid
R²	0.065
Adj. R²	0.065
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Example 2

df = restaurant_inspections.load_pandas().data

fit1 = pf.feols(
    fml="inspection_score ~ NumberofLocations + I(NumberofLocations^2) + Year", data=df
)
fit2 = pf.feols(fml="inspection_score ~ NumberofLocations*Weekend + Year", data=df)

pf.etable([fit1, fit2])

	inspection_score
	(1)	(2)
coef
NumberofLocations	-0.075*** (0.019)	-0.019*** (0.000)
I(NumberofLocations ^ 2)	0.056** (0.019)
Year	-0.065*** (0.006)	-0.065*** (0.006)
Weekend		1.759*** (0.488)
NumberofLocations:Weekend		-0.010 (0.008)
Intercept	225.504*** (12.409)	225.126*** (12.415)
stats
Observations	27178	27178
S.E. type	iid	iid
R²	0.069	0.069
Adj. R²	0.069	0.069
Significance levels: * p < 0.05, p < 0.01, * p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Example 3: HC Standard Errors

pf.feols(fml="inspection_score ~ Year + Weekend", data=df, vcov="HC3").summary()

###

Estimation:  OLS
Dep. var.: inspection_score, Fixed effects: 0
Inference:  HC3
Observations:  27178

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |    2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept     |    185.380 |       12.150 |    15.257 |      0.000 | 161.564 | 209.196 |
| Year          |     -0.046 |        0.006 |    -7.551 |      0.000 |  -0.057 |  -0.034 |
| Weekend       |      2.057 |        0.353 |     5.829 |      0.000 |   1.365 |   2.749 |
---
RMSE: 6.248 R2: 0.003

Example 4: Clustered Standard Errors

pf.feols(
    fml="inspection_score ~ Year + Weekend", data=df, vcov={"CRV1": "Weekend"}
).tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
Intercept	185.380033	3.264345	56.789344	0.011209	143.902592	226.857474
Year	-0.045640	0.001624	-28.107556	0.022640	-0.066272	-0.025008
Weekend	2.057166	0.001401	1468.256801	0.000434	2.039364	2.074969

Example 5: Bootstrap Inference

fit = pf.feols(fml="inspection_score ~ Year + Weekend", data=df)
fit.wildboottest(reps=999, param="Year")

param                 Year
t value           -7.55233
Pr(>|t|)               0.0
bootstrap_type          11
inference               HC
impose_null           True
ssc               1.000074
dtype: object

Chapter 16: Fixed Effects

Example 1

tba

Example 2

gm = gapminder.load_pandas().data
gm["logGDPpercap"] = gm["gdpPercap"].apply("log")

fit = pf.feols(fml="lifeExp ~ C(country) + np.log(gdpPercap)", data=gm)
fit.tidy().head()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
Intercept	-27.773459	2.500533	-11.107015	0.000000e+00	-32.678217	-22.868701
C(country)[T.Albania]	17.782625	2.195160	8.100835	1.110223e-15	13.476853	22.088397
C(country)[T.Algeria]	5.241055	2.214496	2.366704	1.806875e-02	0.897356	9.584755
C(country)[T.Angola]	-13.907122	2.201727	-6.316460	3.481857e-10	-18.225777	-9.588468
C(country)[T.Argentina]	8.132158	2.272781	3.578065	3.567229e-04	3.674133	12.590183

Example 3: TWFE

# Set our individual and time (index) for our data
fit = pf.feols(fml="lifeExp ~ np.log(gdpPercap) | country + year", data=gm)
fit.summary()

###

Estimation:  OLS
Dep. var.: lifeExp, Fixed effects: country+year
Inference:  CRV1
Observations:  1704

| Coefficient       |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:------------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| np.log(gdpPercap) |      1.450 |        0.679 |     2.134 |      0.035 |  0.107 |   2.793 |
---
RMSE: 3.267 R2: 0.936 R2 Within: 0.019

Chapter 18: Difference-in-Differences

Example 1

od = organ_donations.load_pandas().data

# Create Treatment Variable
od["California"] = od["State"] == "California"
od["After"] = od["Quarter_Num"] > 3
od["Treated"] = 1 * (od["California"] & od["After"])

did = pf.feols(fml="Rate ~ Treated | State + Quarter", data=od)
did.summary()

###

Estimation:  OLS
Dep. var.: Rate, Fixed effects: State+Quarter
Inference:  CRV1
Observations:  162

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Treated       |     -0.022 |        0.006 |    -3.663 |      0.001 | -0.035 |  -0.010 |
---
RMSE: 0.022 R2: 0.979 R2 Within: 0.009

Example 3: Dynamic Treatment Effect

od = organ_donations.load_pandas().data

# Create Treatment Variable
od["California"] = od["State"] == "California"
# od["Quarter_Num"] = pd.Categorical(od.Quarter_Num)
od["California"] = od.California.astype(float)

did2 = pf.feols(
    fml="Rate ~ i(Quarter_Num, California,ref=3) | State + Quarter_Num", data=od
)

did2.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
C(Quarter_Num, contr.treatment(base=3))[1]:California	-0.002942	0.005084	-0.578719	0.567759	-0.013393	0.007508
C(Quarter_Num, contr.treatment(base=3))[2]:California	0.006296	0.002266	2.778832	0.009997	0.001639	0.010953
C(Quarter_Num, contr.treatment(base=3))[4]:California	-0.021565	0.005034	-4.284177	0.000222	-0.031912	-0.011218
C(Quarter_Num, contr.treatment(base=3))[5]:California	-0.020292	0.004473	-4.536282	0.000114	-0.029487	-0.011097
C(Quarter_Num, contr.treatment(base=3))[6]:California	-0.022165	0.010013	-2.213610	0.035835	-0.042748	-0.001583