This vignette compares estimation results from fixest
with pyfixest
via the rpy2
package.
Setup
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
import pyfixest as pf
# Activate pandas2ri
pandas2ri.activate()
# Import R packages
fixest = importr("fixest" )
stats = importr("stats" )
broom = importr("broom" )
# IPython magic commands for autoreloading
% load_ext autoreload
% autoreload 2
# Get data using pyfixest
data = pf.get_data(model= "Feols" , N= 10_000 , seed= 99292 )
Ordinary Least Squares (OLS)
IID Inference
First, we estimate a model via `pyfixest. We compute “iid” standard errors.
fit = pf.feols(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "iid" )
We estimate the same model with weights:
fit_weights = pf.feols(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, weights= "weights" , vcov= "iid"
)
Via r-fixest
and rpy2
, we get
r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "iid" ,
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
weights= ro.Formula("~weights" ),
vcov= "iid" ,
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
Let’s compare how close the covariance matrices are:
fit_vcov = fit._vcov
r_vcov = stats.vcov(r_fit)
fit_vcov - r_vcov
array([[-5.42101086e-19, -6.38169843e-22],
[-6.38169843e-22, -1.35525272e-19]])
And for WLS:
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[ 1.68051337e-18, -2.11758237e-21],
[-2.11758237e-21, -1.52465931e-19]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
0.112019
0.017042
6.572948
5.181855e-11
0.078612
0.145425
X2
0.732788
0.004621
158.578261
0.000000e+00
0.723730
0.741846
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0
X1
0.112019
0.017042
6.572948
0.0
1
X2
0.732788
0.004621
158.578261
0.0
Coefficient
X1
0.123687
0.016975
7.286200
3.432810e-13
0.090411
0.156962
X2
0.732244
0.004610
158.844322
0.000000e+00
0.723207
0.741280
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0
X1
0.123687
0.016975
7.2862
0.0
1
X2
0.732244
0.00461
158.844322
0.0
Heteroskedastic Errors
We repeat the same exercise with heteroskedastic (HC1) errors:
fit = pf.feols(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "hetero" )
fit_weights = pf.feols(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "hetero" , weights= "weights"
)
r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "hetero" ,
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
weights= ro.Formula("~weights" ),
vcov= "hetero" ,
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
As before, we compare the variance covariance matrices:
fit._vcov - stats.vcov(r_fit)
array([[-1.63606108e-16, -2.15722880e-17],
[-2.15723410e-17, -5.45624743e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[-2.07299455e-16, -9.64516417e-18],
[-9.64495241e-18, -3.06524283e-17]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
0.112019
0.017105
6.548962
6.082002e-11
0.078490
0.145548
X2
0.732788
0.004579
160.036098
0.000000e+00
0.723812
0.741763
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0
X1
0.112019
0.017105
6.548962
0.0
1
X2
0.732788
0.004579
160.036098
0.0
Coefficient
X1
0.123687
0.019470
6.352618
2.210045e-10
0.085521
0.161852
X2
0.732244
0.005169
141.653304
0.000000e+00
0.722111
0.742376
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0
X1
0.123687
0.01947
6.352618
0.0
1
X2
0.732244
0.005169
141.653304
0.0
Cluster-Robust Errors
We conclude with cluster robust errors.
fit = pf.feols(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= {"CRV1" : "f1" })
fit_weights = pf.feols(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= {"CRV1" : "f1" }, weights= "weights"
)
r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= ro.Formula("~f1" ),
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
weights= ro.Formula("~weights" ),
vcov= ro.Formula("~f1" ),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit._vcov - stats.vcov(r_fit)
array([[ 4.23326738e-16, -7.01927733e-17],
[-7.01936203e-17, -1.43046924e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[2.60533782e-16, 4.09879243e-16],
[4.09879243e-16, 1.80790712e-17]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
0.112019
0.015865
7.060624
4.823750e-09
0.080152
0.143885
X2
0.732788
0.004490
163.215618
0.000000e+00
0.723770
0.741806
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0
X1
0.112019
0.015865
7.060624
0.0
1
X2
0.732788
0.00449
163.215618
0.0
Coefficient
X1
0.123687
0.018368
6.733633
1.566958e-08
0.086792
0.160581
X2
0.732244
0.005266
139.062210
0.000000e+00
0.721667
0.742820
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0
X1
0.123687
0.018368
6.733633
0.0
1
X2
0.732244
0.005266
139.06221
0.0
Poisson Regression
data = pf.get_data(model= "Fepois" )
fit_iid = pf.fepois(fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "iid" , iwls_tol= 1e-10 )
fit_hetero = pf.fepois(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= "hetero" , iwls_tol= 1e-10
)
fit_crv = pf.fepois(
fml= "Y ~ X1 + X2 | f1 + f2" , data= data, vcov= {"CRV1" : "f1" }, iwls_tol= 1e-10
)
fit_r_iid = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "iid" ,
)
fit_r_hetero = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= "hetero" ,
)
fit_r_crv = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2" ),
data= data,
vcov= ro.Formula("~f1" ),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit_iid._vcov - stats.vcov(fit_r_iid)
array([[ 1.28680222e-08, -6.98422811e-10],
[-6.98422811e-10, 1.81058144e-09]])
fit_hetero._vcov - stats.vcov(fit_r_hetero)
array([[ 2.32346186e-08, -7.86957614e-10],
[-7.86957614e-10, 3.27676461e-09]])
fit_crv._vcov - stats.vcov(fit_r_crv)
array([[ 1.63391494e-08, -1.24691682e-10],
[-1.24691682e-10, 3.27723218e-09]])
We conclude by comparing all estimation results via the tidy
methods:
Coefficient
X1
-0.006591
0.042068
-0.156678
0.875499
-0.089042
0.075860
X2
-0.014924
0.011347
-1.315197
0.188444
-0.037164
0.007316
pd.DataFrame(broom.tidy_fixest(fit_r_iid)).T
0
X1
-0.006591
0.042068
-0.156679
0.875498
1
X2
-0.014924
0.011347
-1.315207
0.18844
Coefficient
X1
-0.006591
0.040403
-0.163133
0.870413
-0.085780
0.072598
X2
-0.014924
0.010839
-1.376894
0.168545
-0.036168
0.006320
pd.DataFrame(broom.tidy_fixest(fit_r_hetero)).T
0
X1
-0.006591
0.040403
-0.163134
0.870413
1
X2
-0.014924
0.010839
-1.376913
0.168539
Coefficient
X1
-0.006591
0.035300
-0.186718
0.851882
-0.075777
0.062595
X2
-0.014924
0.010467
-1.425826
0.153919
-0.035439
0.005591
pd.DataFrame(broom.tidy_fixest(fit_r_crv)).T
0
X1
-0.006591
0.035299
-0.186719
0.851881
1
X2
-0.014924
0.010467
-1.425847
0.153913