This notebook replicates code examples from Nick Huntington-Klein’s book on causal inference, The Effect .
from causaldata import Mroz, gapminder, organ_donations, restaurant_inspections
import pyfixest as pf
% load_ext watermark
% watermark -- iversions
causaldata: 0.1.3
pyfixest : 0.19.5
Chapter 4: Describing Relationships
# Read in data
dt = Mroz.load_pandas().data
# Keep just working women
dt = dt.query("lfp" )
# Create unlogged earnings
dt.loc[:, "earn" ] = dt["lwg" ].apply ("exp" )
# 5. Run multiple linear regression models by succesively adding controls
fit = pf.feols(fml= "lwg ~ csw(inc, wc, k5)" , data= dt, vcov= "iid" )
fit.summary()
###
Estimation: OLS
Dep. var.: lwg, Fixed effects:
Inference: iid
Observations: 428
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 1.007 | 0.071 | 14.180 | 0.000 | 0.868 | 1.147 |
| inc | 0.010 | 0.003 | 2.947 | 0.003 | 0.003 | 0.016 |
---
RMSE: 0.715 R2: 0.02
###
Estimation: OLS
Dep. var.: lwg, Fixed effects:
Inference: iid
Observations: 428
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.972 | 0.070 | 13.909 | 0.000 | 0.834 | 1.109 |
| inc | 0.005 | 0.003 | 1.640 | 0.102 | -0.001 | 0.012 |
| wc | 0.342 | 0.075 | 4.595 | 0.000 | 0.196 | 0.489 |
---
RMSE: 0.698 R2: 0.066
###
Estimation: OLS
Dep. var.: lwg, Fixed effects:
Inference: iid
Observations: 428
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.982 | 0.071 | 13.819 | 0.000 | 0.843 | 1.122 |
| inc | 0.005 | 0.003 | 1.590 | 0.113 | -0.001 | 0.012 |
| wc | 0.349 | 0.075 | 4.656 | 0.000 | 0.202 | 0.497 |
| k5 | -0.072 | 0.087 | -0.825 | 0.410 | -0.243 | 0.099 |
---
RMSE: 0.697 R2: 0.068
C:\Users\alexa\AppData\Local\Temp\ipykernel_26864\865424107.py:6: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dt.loc[:, "earn"] = dt["lwg"].apply("exp")
Chapter 13: Regression
Example 1
res = restaurant_inspections.load_pandas().data
res.inspection_score = res.inspection_score.astype(float )
res.NumberofLocations = res.NumberofLocations.astype(float )
res.dtypes
fit = pf.feols(fml= "inspection_score ~ NumberofLocations" , data= res)
pf.summary(fit)
###
Estimation: OLS
Dep. var.: inspection_score, Fixed effects:
Inference: iid
Observations: 27178
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:------------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 94.866 | 0.046 | 2049.047 | 0.000 | 94.775 | 94.956 |
| NumberofLocations | -0.019 | 0.000 | -43.321 | 0.000 | -0.020 | -0.018 |
---
RMSE: 6.051 R2: 0.065
Example 2
df = restaurant_inspections.load_pandas().data
fit1 = pf.feols(
fml= "inspection_score ~ NumberofLocations + I(NumberofLocations^2) + Year" , data= df
)
fit2 = pf.feols(fml= "inspection_score ~ NumberofLocations*Weekend + Year" , data= df)
pf.summary([fit1, fit2])
###
Estimation: OLS
Dep. var.: inspection_score, Fixed effects:
Inference: iid
Observations: 27178
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:-------------------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept | 225.504 | 12.409 | 18.172 | 0.000 | 201.181 | 249.827 |
| NumberofLocations | -0.075 | 0.019 | -4.041 | 0.000 | -0.111 | -0.039 |
| I(NumberofLocations ^ 2) | 0.056 | 0.019 | 3.009 | 0.003 | 0.020 | 0.093 |
| Year | -0.065 | 0.006 | -10.527 | 0.000 | -0.077 | -0.053 |
---
RMSE: 6.038 R2: 0.069
###
Estimation: OLS
Dep. var.: inspection_score, Fixed effects:
Inference: iid
Observations: 27178
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept | 225.126 | 12.415 | 18.134 | 0.000 | 200.793 | 249.460 |
| NumberofLocations | -0.019 | 0.000 | -43.759 | 0.000 | -0.020 | -0.018 |
| Weekend | 1.759 | 0.488 | 3.606 | 0.000 | 0.803 | 2.715 |
| Year | -0.065 | 0.006 | -10.494 | 0.000 | -0.077 | -0.053 |
| NumberofLocations:Weekend | -0.010 | 0.008 | -1.307 | 0.191 | -0.025 | 0.005 |
---
RMSE: 6.038 R2: 0.069
Example 3: HC Standard Errors
pf.feols(fml= "inspection_score ~ Year + Weekend" , data= df, vcov= "HC3" ).summary()
###
Estimation: OLS
Dep. var.: inspection_score, Fixed effects:
Inference: HC3
Observations: 27178
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept | 185.380 | 12.150 | 15.257 | 0.000 | 161.564 | 209.196 |
| Year | -0.046 | 0.006 | -7.551 | 0.000 | -0.057 | -0.034 |
| Weekend | 2.057 | 0.353 | 5.829 | 0.000 | 1.365 | 2.749 |
---
RMSE: 6.248 R2: 0.003
Example 4: Clustered Standard Errors
pf.feols(
fml= "inspection_score ~ Year + Weekend" , data= df, vcov= {"CRV1" : "Weekend" }
).tidy()
Coefficient
Intercept
185.380033
3.264345
56.789343
0.011209
143.902592
226.857474
Year
-0.045640
0.001624
-28.107556
0.022640
-0.066272
-0.025008
Weekend
2.057166
0.001401
1468.256799
0.000434
2.039364
2.074969
Example 5: Bootstrap Inference
fit = pf.feols(fml= "inspection_score ~ Year + Weekend" , data= df)
fit.wildboottest(reps= 999 , param= "Year" )
param Year
t value 15.434326
Pr(>|t|) 0.0
bootstrap_type 11
inference HC
impose_null True
dtype: object
Chapter 16: Fixed Effects
Example 2
gm = gapminder.load_pandas().data
gm["logGDPpercap" ] = gm["gdpPercap" ].apply ("log" )
fit = pf.feols(fml= "lifeExp ~ C(country) + np.log(gdpPercap)" , data= gm)
fit.tidy().head()
Coefficient
Intercept
-27.773459
2.500533
-11.107015
0.000000e+00
-32.678217
-22.868701
C(country)[T.Albania]
17.782625
2.195160
8.100835
1.110223e-15
13.476853
22.088397
C(country)[T.Algeria]
5.241055
2.214496
2.366704
1.806875e-02
0.897356
9.584755
C(country)[T.Angola]
-13.907122
2.201727
-6.316460
3.481857e-10
-18.225777
-9.588468
C(country)[T.Argentina]
8.132158
2.272781
3.578065
3.567229e-04
3.674133
12.590183
Example 3: TWFE
# Set our individual and time (index) for our data
fit = pf.feols(fml= "lifeExp ~ np.log(gdpPercap) | country + year" , data= gm)
fit.summary()
###
Estimation: OLS
Dep. var.: lifeExp, Fixed effects: country+year
Inference: CRV1
Observations: 1704
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:------------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| np.log(gdpPercap) | 1.450 | 0.677 | 2.141 | 0.034 | 0.111 | 2.788 |
---
RMSE: 3.267 R2: 0.936 R2 Within: 0.019
Chapter 18: Difference-in-Differences
Example 1
od = organ_donations.load_pandas().data
# Create Treatment Variable
od["California" ] = od["State" ] == "California"
od["After" ] = od["Quarter_Num" ] > 3
od["Treated" ] = 1 * (od["California" ] & od["After" ])
did = pf.feols(fml= "Rate ~ Treated | State + Quarter" , data= od)
did.summary()
###
Estimation: OLS
Dep. var.: Rate, Fixed effects: State+Quarter
Inference: CRV1
Observations: 162
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Treated | -0.022 | 0.006 | -3.733 | 0.001 | -0.035 | -0.010 |
---
RMSE: 0.022 R2: 0.979 R2 Within: 0.009
Example 3: Dynamic Treatment Effect
od = organ_donations.load_pandas().data
# Create Treatment Variable
od["California" ] = od["State" ] == "California"
# od["Quarter_Num"] = pd.Categorical(od.Quarter_Num)
od["California" ] = od.California.astype(float )
did2 = pf.feols(
fml= "Rate ~ i(Quarter_Num, California,ref=3) | State + Quarter_Num" , data= od
)
did2.tidy()
Coefficient
C(Quarter_Num, contr.treatment(base=3))[T.1]:California
-0.002942
0.004986
-0.590105
0.560215
-0.013191
0.007307
C(Quarter_Num, contr.treatment(base=3))[T.2]:California
0.006296
0.002222
2.833502
0.008782
0.001729
0.010864
C(Quarter_Num, contr.treatment(base=3))[T.4]:California
-0.021565
0.004937
-4.368464
0.000178
-0.031713
-0.011418
C(Quarter_Num, contr.treatment(base=3))[T.5]:California
-0.020292
0.004387
-4.625529
0.000090
-0.029310
-0.011275
C(Quarter_Num, contr.treatment(base=3))[T.6]:California
-0.022165
0.009820
-2.257160
0.032627
-0.042351
-0.001980