This vignette compares estimation results from fixest
with pyfixest
via the rpy2
package.
Setup
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
import pyfixest as pf
# Activate pandas2ri
pandas2ri.activate()
# Import R packages
fixest = importr("fixest" )
stats = importr("stats" )
broom = importr("broom" )
# IPython magic commands for autoreloading
% load_ext autoreload
% autoreload 2
# Get data using pyfixest
data = pf.get_data(model= "Feols" , N= 10_000 , seed= 99292 )
Ordinary Least Squares (OLS)
IID Inference
First, we estimate a model via `pyfixest. We compute “iid” standard errors.
fit = pf.feols(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "iid" )
We estimate the same model with weights:
fit_weights = pf.feols(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, weights= "weights" , vcov= "iid"
)
Via r-fixest
and rpy2
, we get
r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "iid" ,
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
weights= ro.Formula("~weights" ),
vcov= "iid" ,
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
Let’s compare how close the covariance matrices are:
fit_vcov = fit._vcov
r_vcov = stats.vcov(r_fit)
fit_vcov - r_vcov
array([[-5.42101086e-19, -6.31138807e-22],
[-6.31138807e-22, -1.32137140e-19]])
And for WLS:
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[ 1.68051337e-18, -2.11758237e-21],
[-2.11758237e-21, -1.52465931e-19]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
0.112019
0.016947
6.610087
4.036971e-11
0.078800
0.145238
X2
0.732788
0.004595
159.474281
0.000000e+00
0.723781
0.741795
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0
X1
0.112019
0.016947
6.610087
0.0
1
X2
0.732788
0.004595
159.474281
0.0
Coefficient
X1
0.123687
0.016880
7.327370
2.529088e-13
0.090598
0.156775
X2
0.732244
0.004584
159.741845
0.000000e+00
0.723258
0.741229
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0
X1
0.123687
0.01688
7.32737
0.0
1
X2
0.732244
0.004584
159.741845
0.0
Heteroskedastic Errors
We repeat the same exercise with heteroskedastic (HC1) errors:
fit = pf.feols(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "hetero" )
fit_weights = pf.feols(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "hetero" , weights= "weights"
)
r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "hetero" ,
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
weights= ro.Formula("~weights" ),
vcov= "hetero" ,
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
As before, we compare the variance covariance matrices:
fit._vcov - stats.vcov(r_fit)
array([[-1.61762964e-16, -2.13305660e-17],
[-2.13306190e-17, -5.39492225e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[-2.05022631e-16, -9.53695571e-18],
[-9.53674395e-18, -3.03068389e-17]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
0.112019
0.017009
6.585966
4.746648e-11
0.078678
0.145359
X2
0.732788
0.004553
160.940355
0.000000e+00
0.723863
0.741713
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0
X1
0.112019
0.017009
6.585966
0.0
1
X2
0.732788
0.004553
160.940355
0.0
Coefficient
X1
0.123687
0.019361
6.388512
1.749667e-10
0.085736
0.161638
X2
0.732244
0.005140
142.453692
0.000000e+00
0.722168
0.742319
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0
X1
0.123687
0.019361
6.388512
0.0
1
X2
0.732244
0.00514
142.453692
0.0
Cluster-Robust Errors
We conclude with cluster robust errors.
fit = pf.feols(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= {"CRV1" : "f1" })
fit_weights = pf.feols(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= {"CRV1" : "f1" }, weights= "weights"
)
r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= ro.Formula("~f1" ),
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
weights= ro.Formula("~weights" ),
vcov= ro.Formula("~f1" ),
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit._vcov - stats.vcov(r_fit)
array([[ 4.20670443e-16, -6.97565513e-17],
[-6.97565513e-17, -1.42166010e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[2.59070109e-16, 4.07324592e-16],
[4.07321204e-16, 1.79537104e-17]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
0.112019
0.015816
7.082626
4.456304e-09
0.080251
0.143786
X2
0.732788
0.004476
163.724207
0.000000e+00
0.723798
0.741778
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0
X1
0.112019
0.015816
7.082626
0.0
1
X2
0.732788
0.004476
163.724207
0.0
Coefficient
X1
0.123687
0.018311
6.754616
1.452848e-08
0.086907
0.160466
X2
0.732244
0.005249
139.495536
0.000000e+00
0.721700
0.742787
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0
X1
0.123687
0.018311
6.754616
0.0
1
X2
0.732244
0.005249
139.495536
0.0
Poisson Regression
data = pf.get_data(model= "Fepois" )
fit_iid = pf.fepois(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "iid" , iwls_tol= 1e-10 )
fit_hetero = pf.fepois(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "hetero" , iwls_tol= 1e-10
)
fit_crv = pf.fepois(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= {"CRV1" : "f1" }, iwls_tol= 1e-10
)
fit_r_iid = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "iid" ,
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
fit_r_hetero = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "hetero" ,
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
fit_r_crv = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= ro.Formula("~f1" ),
ssc= fixest.ssc(True , "none" , True , "min" , "min" , False ),
)
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit_iid._vcov - stats.vcov(fit_r_iid)
array([[ 1.20791284e-08, -6.55604931e-10],
[-6.55604931e-10, 1.69958097e-09]])
fit_hetero._vcov - stats.vcov(fit_r_hetero)
array([[ 2.18101847e-08, -7.38711972e-10],
[-7.38711972e-10, 3.07587753e-09]])
fit_crv._vcov - stats.vcov(fit_r_crv)
array([[ 1.58300904e-08, -1.20806815e-10],
[-1.20806815e-10, 3.17512746e-09]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
-0.006591
0.040758
-0.161714
0.871531
-0.086475
0.073293
X2
-0.014924
0.010994
-1.357466
0.174633
-0.036472
0.006624
pd.DataFrame(broom.tidy_fixest(fit_r_iid)).T
0
X1
-0.006591
0.040758
-0.161714
0.871531
1
X2
-0.014924
0.010994
-1.357476
0.17463
Coefficient
X1
-0.006591
0.039145
-0.168376
0.866287
-0.083314
0.070132
X2
-0.014924
0.010501
-1.421145
0.155275
-0.035506
0.005658
pd.DataFrame(broom.tidy_fixest(fit_r_hetero)).T
0
X1
-0.006591
0.039145
-0.168377
0.866286
1
X2
-0.014924
0.010501
-1.421165
0.155269
Coefficient
X1
-0.006591
0.034745
-0.189697
0.849547
-0.074691
0.061509
X2
-0.014924
0.010303
-1.448570
0.147458
-0.035117
0.005269
pd.DataFrame(broom.tidy_fixest(fit_r_crv)).T
0
X1
-0.006591
0.034745
-0.189698
0.849546
1
X2
-0.014924
0.010302
-1.448592
0.147452