Replicating Examples from “The Effect”

This notebook replicates code examples from Nick Huntington-Klein’s book on causal inference, The Effect.

from causaldata import Mroz, gapminder, organ_donations, restaurant_inspections

import pyfixest as pf

%load_ext watermark
%watermark --iversions
causaldata: 0.1.3
pyfixest  : 0.19.5

Chapter 4: Describing Relationships

# Read in data
dt = Mroz.load_pandas().data
# Keep just working women
dt = dt.query("lfp")
# Create unlogged earnings
dt.loc[:, "earn"] = dt["lwg"].apply("exp")

# 5. Run multiple linear regression models by succesively adding controls
fit = pf.feols(fml="lwg ~ csw(inc, wc, k5)", data=dt, vcov="iid")
fit.summary()
###

Estimation:  OLS
Dep. var.: lwg, Fixed effects: 
Inference:  iid
Observations:  428

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      1.007 |        0.071 |    14.180 |      0.000 |  0.868 |   1.147 |
| inc           |      0.010 |        0.003 |     2.947 |      0.003 |  0.003 |   0.016 |
---
RMSE: 0.715 R2: 0.02 
###

Estimation:  OLS
Dep. var.: lwg, Fixed effects: 
Inference:  iid
Observations:  428

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.972 |        0.070 |    13.909 |      0.000 |  0.834 |   1.109 |
| inc           |      0.005 |        0.003 |     1.640 |      0.102 | -0.001 |   0.012 |
| wc            |      0.342 |        0.075 |     4.595 |      0.000 |  0.196 |   0.489 |
---
RMSE: 0.698 R2: 0.066 
###

Estimation:  OLS
Dep. var.: lwg, Fixed effects: 
Inference:  iid
Observations:  428

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      0.982 |        0.071 |    13.819 |      0.000 |  0.843 |   1.122 |
| inc           |      0.005 |        0.003 |     1.590 |      0.113 | -0.001 |   0.012 |
| wc            |      0.349 |        0.075 |     4.656 |      0.000 |  0.202 |   0.497 |
| k5            |     -0.072 |        0.087 |    -0.825 |      0.410 | -0.243 |   0.099 |
---
RMSE: 0.697 R2: 0.068 
C:\Users\alexa\AppData\Local\Temp\ipykernel_26864\865424107.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dt.loc[:, "earn"] = dt["lwg"].apply("exp")

Chapter 13: Regression

Example 1

res = restaurant_inspections.load_pandas().data
res.inspection_score = res.inspection_score.astype(float)
res.NumberofLocations = res.NumberofLocations.astype(float)
res.dtypes

fit = pf.feols(fml="inspection_score ~ NumberofLocations", data=res)
pf.summary(fit)
###

Estimation:  OLS
Dep. var.: inspection_score, Fixed effects: 
Inference:  iid
Observations:  27178

| Coefficient       |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:------------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept         |     94.866 |        0.046 |  2049.047 |      0.000 | 94.775 |  94.956 |
| NumberofLocations |     -0.019 |        0.000 |   -43.321 |      0.000 | -0.020 |  -0.018 |
---
RMSE: 6.051 R2: 0.065 

Example 2

df = restaurant_inspections.load_pandas().data

fit1 = pf.feols(
    fml="inspection_score ~ NumberofLocations + I(NumberofLocations^2) + Year", data=df
)
fit2 = pf.feols(fml="inspection_score ~ NumberofLocations*Weekend + Year", data=df)

pf.summary([fit1, fit2])
###

Estimation:  OLS
Dep. var.: inspection_score, Fixed effects: 
Inference:  iid
Observations:  27178

| Coefficient              |   Estimate |   Std. Error |   t value |   Pr(>|t|) |    2.5% |   97.5% |
|:-------------------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept                |    225.504 |       12.409 |    18.172 |      0.000 | 201.181 | 249.827 |
| NumberofLocations        |     -0.075 |        0.019 |    -4.041 |      0.000 |  -0.111 |  -0.039 |
| I(NumberofLocations ^ 2) |      0.056 |        0.019 |     3.009 |      0.003 |   0.020 |   0.093 |
| Year                     |     -0.065 |        0.006 |   -10.527 |      0.000 |  -0.077 |  -0.053 |
---
RMSE: 6.038 R2: 0.069 
###

Estimation:  OLS
Dep. var.: inspection_score, Fixed effects: 
Inference:  iid
Observations:  27178

| Coefficient               |   Estimate |   Std. Error |   t value |   Pr(>|t|) |    2.5% |   97.5% |
|:--------------------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept                 |    225.126 |       12.415 |    18.134 |      0.000 | 200.793 | 249.460 |
| NumberofLocations         |     -0.019 |        0.000 |   -43.759 |      0.000 |  -0.020 |  -0.018 |
| Weekend                   |      1.759 |        0.488 |     3.606 |      0.000 |   0.803 |   2.715 |
| Year                      |     -0.065 |        0.006 |   -10.494 |      0.000 |  -0.077 |  -0.053 |
| NumberofLocations:Weekend |     -0.010 |        0.008 |    -1.307 |      0.191 |  -0.025 |   0.005 |
---
RMSE: 6.038 R2: 0.069 

Example 3: HC Standard Errors

pf.feols(fml="inspection_score ~ Year + Weekend", data=df, vcov="HC3").summary()
###

Estimation:  OLS
Dep. var.: inspection_score, Fixed effects: 
Inference:  HC3
Observations:  27178

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |    2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept     |    185.380 |       12.150 |    15.257 |      0.000 | 161.564 | 209.196 |
| Year          |     -0.046 |        0.006 |    -7.551 |      0.000 |  -0.057 |  -0.034 |
| Weekend       |      2.057 |        0.353 |     5.829 |      0.000 |   1.365 |   2.749 |
---
RMSE: 6.248 R2: 0.003 

Example 4: Clustered Standard Errors

pf.feols(
    fml="inspection_score ~ Year + Weekend", data=df, vcov={"CRV1": "Weekend"}
).tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
Intercept 185.380033 3.264345 56.789343 0.011209 143.902592 226.857474
Year -0.045640 0.001624 -28.107556 0.022640 -0.066272 -0.025008
Weekend 2.057166 0.001401 1468.256799 0.000434 2.039364 2.074969

Example 5: Bootstrap Inference

fit = pf.feols(fml="inspection_score ~ Year + Weekend", data=df)
fit.wildboottest(reps=999, param="Year")
param                  Year
t value           15.434326
Pr(>|t|)                0.0
bootstrap_type           11
inference                HC
impose_null            True
dtype: object

Chapter 16: Fixed Effects

Example 1

tba

Example 2

gm = gapminder.load_pandas().data
gm["logGDPpercap"] = gm["gdpPercap"].apply("log")

fit = pf.feols(fml="lifeExp ~ C(country) + np.log(gdpPercap)", data=gm)
fit.tidy().head()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
Intercept -27.773459 2.500533 -11.107015 0.000000e+00 -32.678217 -22.868701
C(country)[T.Albania] 17.782625 2.195160 8.100835 1.110223e-15 13.476853 22.088397
C(country)[T.Algeria] 5.241055 2.214496 2.366704 1.806875e-02 0.897356 9.584755
C(country)[T.Angola] -13.907122 2.201727 -6.316460 3.481857e-10 -18.225777 -9.588468
C(country)[T.Argentina] 8.132158 2.272781 3.578065 3.567229e-04 3.674133 12.590183

Example 3: TWFE

# Set our individual and time (index) for our data
fit = pf.feols(fml="lifeExp ~ np.log(gdpPercap) | country + year", data=gm)
fit.summary()
###

Estimation:  OLS
Dep. var.: lifeExp, Fixed effects: country+year
Inference:  CRV1
Observations:  1704

| Coefficient       |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:------------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| np.log(gdpPercap) |      1.450 |        0.677 |     2.141 |      0.034 |  0.111 |   2.788 |
---
RMSE: 3.267 R2: 0.936 R2 Within: 0.019 

Chapter 18: Difference-in-Differences

Example 1

od = organ_donations.load_pandas().data

# Create Treatment Variable
od["California"] = od["State"] == "California"
od["After"] = od["Quarter_Num"] > 3
od["Treated"] = 1 * (od["California"] & od["After"])

did = pf.feols(fml="Rate ~ Treated | State + Quarter", data=od)
did.summary()
###

Estimation:  OLS
Dep. var.: Rate, Fixed effects: State+Quarter
Inference:  CRV1
Observations:  162

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Treated       |     -0.022 |        0.006 |    -3.733 |      0.001 | -0.035 |  -0.010 |
---
RMSE: 0.022 R2: 0.979 R2 Within: 0.009 

Example 3: Dynamic Treatment Effect

od = organ_donations.load_pandas().data

# Create Treatment Variable
od["California"] = od["State"] == "California"
# od["Quarter_Num"] = pd.Categorical(od.Quarter_Num)
od["California"] = od.California.astype(float)

did2 = pf.feols(
    fml="Rate ~ i(Quarter_Num, California,ref=3) | State + Quarter_Num", data=od
)

did2.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
C(Quarter_Num, contr.treatment(base=3))[T.1]:California -0.002942 0.004986 -0.590105 0.560215 -0.013191 0.007307
C(Quarter_Num, contr.treatment(base=3))[T.2]:California 0.006296 0.002222 2.833502 0.008782 0.001729 0.010864
C(Quarter_Num, contr.treatment(base=3))[T.4]:California -0.021565 0.004937 -4.368464 0.000178 -0.031713 -0.011418
C(Quarter_Num, contr.treatment(base=3))[T.5]:California -0.020292 0.004387 -4.625529 0.000090 -0.029310 -0.011275
C(Quarter_Num, contr.treatment(base=3))[T.6]:California -0.022165 0.009820 -2.257160 0.032627 -0.042351 -0.001980