Does PyFixest match fixest?

This vignette compares estimation results from fixest with pyfixest via the rpy2 package.

Setup

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

import pyfixest as pf

# Activate pandas2ri
pandas2ri.activate()

# Import R packages
fixest = importr("fixest")
stats = importr("stats")
broom = importr("broom")

# IPython magic commands for autoreloading
%load_ext autoreload
%autoreload 2

# Get data using pyfixest
data = pf.get_data(model="Feols", N=10_000, seed=99292)

Ordinary Least Squares (OLS)

IID Inference

First, we estimate a model via `pyfixest. We compute “iid” standard errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid")

We estimate the same model with weights:

fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, weights="weights", vcov="iid"
)

Via r-fixest and rpy2, we get

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="iid",
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

Let’s compare how close the covariance matrices are:

fit_vcov = fit._vcov
r_vcov = stats.vcov(r_fit)
fit_vcov - r_vcov
array([[-5.42101086e-19, -6.38169843e-22],
       [-6.38169843e-22, -1.35525272e-19]])

And for WLS:

fit_weights._vcov - stats.vcov(r_fit_weights)
array([[ 1.68051337e-18, -2.11758237e-21],
       [-2.11758237e-21, -1.52465931e-19]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.017042 6.572948 5.181855e-11 0.078612 0.145425
X2 0.732788 0.004621 158.578261 0.000000e+00 0.723730 0.741846
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.017042 6.572948 0.0
1 X2 0.732788 0.004621 158.578261 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.016975 7.286200 3.432810e-13 0.090411 0.156962
X2 0.732244 0.004610 158.844322 0.000000e+00 0.723207 0.741280
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.016975 7.2862 0.0
1 X2 0.732244 0.00461 158.844322 0.0

Heteroskedastic Errors

We repeat the same exercise with heteroskedastic (HC1) errors:

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero")
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", weights="weights"
)
r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="hetero",
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

As before, we compare the variance covariance matrices:

fit._vcov - stats.vcov(r_fit)
array([[-1.63606108e-16, -2.15722880e-17],
       [-2.15723410e-17, -5.45624743e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[-2.07299455e-16, -9.64516417e-18],
       [-9.64495241e-18, -3.06524283e-17]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.017105 6.548962 6.082002e-11 0.078490 0.145548
X2 0.732788 0.004579 160.036098 0.000000e+00 0.723812 0.741763
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.017105 6.548962 0.0
1 X2 0.732788 0.004579 160.036098 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.019470 6.352618 2.210045e-10 0.085521 0.161852
X2 0.732244 0.005169 141.653304 0.000000e+00 0.722111 0.742376
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.01947 6.352618 0.0
1 X2 0.732244 0.005169 141.653304 0.0

Cluster-Robust Errors

We conclude with cluster robust errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"})
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, weights="weights"
)

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
)
r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov=ro.Formula("~f1"),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit._vcov - stats.vcov(r_fit)
array([[ 4.23326738e-16, -7.01927733e-17],
       [-7.01936203e-17, -1.43046924e-17]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[2.60533782e-16, 4.09879243e-16],
       [4.09879243e-16, 1.80790712e-17]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.015865 7.060624 4.823750e-09 0.080152 0.143885
X2 0.732788 0.004490 163.215618 0.000000e+00 0.723770 0.741806
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.015865 7.060624 0.0
1 X2 0.732788 0.00449 163.215618 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.018368 6.733633 1.566958e-08 0.086792 0.160581
X2 0.732244 0.005266 139.062210 0.000000e+00 0.721667 0.742820
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.018368 6.733633 0.0
1 X2 0.732244 0.005266 139.06221 0.0

Poisson Regression

data = pf.get_data(model="Fepois")
fit_iid = pf.fepois(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid", iwls_tol=1e-10)
fit_hetero = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", iwls_tol=1e-10
)
fit_crv = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, iwls_tol=1e-10
)

fit_r_iid = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
)

fit_r_hetero = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
)

fit_r_crv = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit_iid._vcov - stats.vcov(fit_r_iid)
array([[ 1.28680222e-08, -6.98422811e-10],
       [-6.98422811e-10,  1.81058144e-09]])
fit_hetero._vcov - stats.vcov(fit_r_hetero)
array([[ 2.32346186e-08, -7.86957614e-10],
       [-7.86957614e-10,  3.27676461e-09]])
fit_crv._vcov - stats.vcov(fit_r_crv)
array([[ 1.63391494e-08, -1.24691682e-10],
       [-1.24691682e-10,  3.27723218e-09]])

We conclude by comparing all estimation results via the tidy methods:

fit_iid.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.042068 -0.156678 0.875499 -0.089042 0.075860
X2 -0.014924 0.011347 -1.315197 0.188444 -0.037164 0.007316
pd.DataFrame(broom.tidy_fixest(fit_r_iid)).T
0 1 2 3 4
0 X1 -0.006591 0.042068 -0.156679 0.875498
1 X2 -0.014924 0.011347 -1.315207 0.18844
fit_hetero.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.040403 -0.163133 0.870413 -0.085780 0.072598
X2 -0.014924 0.010839 -1.376894 0.168545 -0.036168 0.006320
pd.DataFrame(broom.tidy_fixest(fit_r_hetero)).T
0 1 2 3 4
0 X1 -0.006591 0.040403 -0.163134 0.870413
1 X2 -0.014924 0.010839 -1.376913 0.168539
fit_crv.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.035300 -0.186718 0.851882 -0.075777 0.062595
X2 -0.014924 0.010467 -1.425826 0.153919 -0.035439 0.005591
pd.DataFrame(broom.tidy_fixest(fit_r_crv)).T
0 1 2 3 4
0 X1 -0.006591 0.035299 -0.186719 0.851881
1 X2 -0.014924 0.010467 -1.425847 0.153913