Does PyFixest match fixest?

This vignette compares estimation results from fixest with pyfixest via the rpy2 package.

Setup

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

import pyfixest as pf

# Activate pandas2ri
pandas2ri.activate()

# Import R packages
fixest = importr("fixest")
stats = importr("stats")
broom = importr("broom")

# IPython magic commands for autoreloading
%load_ext autoreload
%autoreload 2

# Get data using pyfixest
data = pf.get_data(model="Feols", N=10_000, seed=99292)

Ordinary Least Squares (OLS)

IID Inference

First, we estimate a model via `pyfixest. We compute “iid” standard errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid")

We estimate the same model with weights:

fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, weights="weights", vcov="iid"
)

Via r-fixest and rpy2, we get

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="iid",
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

Let’s compare how close the covariance matrices are:

fit_vcov = fit._vcov
r_vcov = stats.vcov(r_fit)
fit_vcov - r_vcov
array([[-5.42101086e-19, -6.31138807e-22],
       [-6.31138807e-22, -1.32137140e-19]])

And for WLS:

fit_weights._vcov - stats.vcov(r_fit_weights)
array([[ 1.68051337e-18, -2.11758237e-21],
       [-2.11758237e-21, -1.52465931e-19]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.016947 6.610087 4.036971e-11 0.078800 0.145238
X2 0.732788 0.004595 159.474281 0.000000e+00 0.723781 0.741795
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.016947 6.610087 0.0
1 X2 0.732788 0.004595 159.474281 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.016880 7.327370 2.529088e-13 0.090598 0.156775
X2 0.732244 0.004584 159.741845 0.000000e+00 0.723258 0.741229
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.01688 7.32737 0.0
1 X2 0.732244 0.004584 159.741845 0.0

Heteroskedastic Errors

We repeat the same exercise with heteroskedastic (HC1) errors:

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero")
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", weights="weights"
)
r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="hetero",
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

As before, we compare the variance covariance matrices:

fit._vcov - stats.vcov(r_fit)
array([[-1.61762964e-16, -2.13305660e-17],
       [-2.13306190e-17, -5.39492225e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[-2.05022631e-16, -9.53695571e-18],
       [-9.53674395e-18, -3.03068389e-17]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.017009 6.585966 4.746648e-11 0.078678 0.145359
X2 0.732788 0.004553 160.940355 0.000000e+00 0.723863 0.741713
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.017009 6.585966 0.0
1 X2 0.732788 0.004553 160.940355 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.019361 6.388512 1.749667e-10 0.085736 0.161638
X2 0.732244 0.005140 142.453692 0.000000e+00 0.722168 0.742319
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.019361 6.388512 0.0
1 X2 0.732244 0.00514 142.453692 0.0

Cluster-Robust Errors

We conclude with cluster robust errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"})
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, weights="weights"
)

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov=ro.Formula("~f1"),
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit._vcov - stats.vcov(r_fit)
array([[ 4.20670443e-16, -6.97565513e-17],
       [-6.97565513e-17, -1.42166010e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[2.59070109e-16, 4.07324592e-16],
       [4.07321204e-16, 1.79537104e-17]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.015816 7.082626 4.456304e-09 0.080251 0.143786
X2 0.732788 0.004476 163.724207 0.000000e+00 0.723798 0.741778
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.015816 7.082626 0.0
1 X2 0.732788 0.004476 163.724207 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.018311 6.754616 1.452848e-08 0.086907 0.160466
X2 0.732244 0.005249 139.495536 0.000000e+00 0.721700 0.742787
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.018311 6.754616 0.0
1 X2 0.732244 0.005249 139.495536 0.0

Poisson Regression

data = pf.get_data(model="Fepois")
fit_iid = pf.fepois(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid", iwls_tol=1e-10)
fit_hetero = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", iwls_tol=1e-10
)
fit_crv = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, iwls_tol=1e-10
)

fit_r_iid = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

fit_r_hetero = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

fit_r_crv = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
    ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit_iid._vcov - stats.vcov(fit_r_iid)
array([[ 1.20791284e-08, -6.55604931e-10],
       [-6.55604931e-10,  1.69958097e-09]])
fit_hetero._vcov - stats.vcov(fit_r_hetero)
array([[ 2.18101847e-08, -7.38711972e-10],
       [-7.38711972e-10,  3.07587753e-09]])
fit_crv._vcov - stats.vcov(fit_r_crv)
array([[ 1.58300904e-08, -1.20806815e-10],
       [-1.20806815e-10,  3.17512746e-09]])

We conclude by comparing all estimation results via the tidy methods:

fit_iid.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.040758 -0.161714 0.871531 -0.086475 0.073293
X2 -0.014924 0.010994 -1.357466 0.174633 -0.036472 0.006624
pd.DataFrame(broom.tidy_fixest(fit_r_iid)).T
0 1 2 3 4
0 X1 -0.006591 0.040758 -0.161714 0.871531
1 X2 -0.014924 0.010994 -1.357476 0.17463
fit_hetero.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.039145 -0.168376 0.866287 -0.083314 0.070132
X2 -0.014924 0.010501 -1.421145 0.155275 -0.035506 0.005658
pd.DataFrame(broom.tidy_fixest(fit_r_hetero)).T
0 1 2 3 4
0 X1 -0.006591 0.039145 -0.168377 0.866286
1 X2 -0.014924 0.010501 -1.421165 0.155269
fit_crv.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.034745 -0.189697 0.849547 -0.074691 0.061509
X2 -0.014924 0.010303 -1.448570 0.147458 -0.035117 0.005269
pd.DataFrame(broom.tidy_fixest(fit_r_crv)).T
0 1 2 3 4
0 X1 -0.006591 0.034745 -0.189698 0.849546
1 X2 -0.014924 0.010302 -1.448592 0.147452