Does PyFixest match fixest?

This vignette compares estimation results from fixest with pyfixest via the rpy2 package.

Setup

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

import pyfixest as pf

# Activate pandas2ri
pandas2ri.activate()

# Import R packages
fixest = importr("fixest")
stats = importr("stats")
broom = importr("broom")

# IPython magic commands for autoreloading
%load_ext autoreload
%autoreload 2

# Get data using pyfixest
data = pf.get_data(model="Feols", N=10_000, seed=99292)

Ordinary Least Squares (OLS)

IID Inference

First, we estimate a model via `pyfixest. We compute “iid” standard errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid")

We estimate the same model with weights:

fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, weights="weights", vcov="iid"
)

Via r-fixest and rpy2, we get

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="iid",
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

Let’s compare how close the covariance matrices are:

fit_vcov = fit._vcov
r_vcov = stats.vcov(r_fit)
fit_vcov - r_vcov
array([[-6.50521303e-19, -1.23953015e-21],
       [-1.23953015e-21, -1.32137140e-19]])

And for WLS:

fit_weights._vcov - stats.vcov(r_fit_weights)
array([[ 1.68051337e-18, -1.27054942e-21],
       [-1.27054942e-21, -1.52465931e-19]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.017042 6.572948 5.181855e-11 0.078612 0.145425
X2 0.732788 0.004621 158.578261 0.000000e+00 0.723730 0.741846
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.017042 6.572948 0.0
1 X2 0.732788 0.004621 158.578261 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.016975 7.286200 3.432810e-13 0.090411 0.156962
X2 0.732244 0.004610 158.844322 0.000000e+00 0.723207 0.741280
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.016975 7.2862 0.0
1 X2 0.732244 0.00461 158.844322 0.0

Heteroskedastic Errors

We repeat the same exercise with heteroskedastic (HC1) errors:

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero")
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", weights="weights"
)
r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="hetero",
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

As before, we compare the variance covariance matrices:

fit._vcov - stats.vcov(r_fit)
array([[-1.08100378e-14, -1.51722114e-15],
       [-1.51722119e-15, -3.55911386e-15]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[-7.68563815e-15,  2.60362300e-15],
       [ 2.60362300e-15, -1.52112210e-15]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.017105 6.548962 6.082002e-11 0.078490 0.145548
X2 0.732788 0.004579 160.036098 0.000000e+00 0.723812 0.741763
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.017105 6.548962 0.0
1 X2 0.732788 0.004579 160.036098 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.019470 6.352618 2.210045e-10 0.085521 0.161852
X2 0.732244 0.005169 141.653304 0.000000e+00 0.722111 0.742376
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.01947 6.352618 0.0
1 X2 0.732244 0.005169 141.653304 0.0

Cluster-Robust Errors

We conclude with cluster robust errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"})
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, weights="weights"
)

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
)
r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov=ro.Formula("~f1"),
)
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
fit._vcov - stats.vcov(r_fit)
array([[ 2.86079212e-14, -4.95778972e-15],
       [-4.95778972e-15, -9.43703123e-16]])
fit_weights._vcov - stats.vcov(r_fit_weights)
array([[ 7.00025975e-15,  1.26505760e-14],
       [ 1.26505760e-14, -1.11283189e-16]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.112019 0.015865 7.060624 4.823750e-09 0.080152 0.143885
X2 0.732788 0.004490 163.215618 0.000000e+00 0.723770 0.741806
pd.DataFrame(broom.tidy_fixest(r_fit)).T
0 1 2 3 4
0 X1 0.112019 0.015865 7.060624 0.0
1 X2 0.732788 0.00449 163.215618 0.0
fit_weights.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 0.123687 0.018368 6.733633 1.566958e-08 0.086792 0.160581
X2 0.732244 0.005266 139.062210 0.000000e+00 0.721667 0.742820
pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T
0 1 2 3 4
0 X1 0.123687 0.018368 6.733633 0.0
1 X2 0.732244 0.005266 139.06221 0.0

Poisson Regression

data = pf.get_data(model="Fepois")
fit_iid = pf.fepois(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid", iwls_tol=1e-10)
fit_hetero = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", iwls_tol=1e-10
)
fit_crv = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, iwls_tol=1e-10
)

fit_r_iid = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
)

fit_r_hetero = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
)

fit_r_crv = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
)
/home/runner/work/pyfixest/pyfixest/pyfixest/estimation/model_matrix_fixest_.py:215: UserWarning: 2 singleton fixed effect(s) detected. These observations are dropped from the model.
  warnings.warn(
/home/runner/work/pyfixest/pyfixest/pyfixest/estimation/model_matrix_fixest_.py:215: UserWarning: 2 singleton fixed effect(s) detected. These observations are dropped from the model.
  warnings.warn(
/home/runner/work/pyfixest/pyfixest/pyfixest/estimation/model_matrix_fixest_.py:215: UserWarning: 2 singleton fixed effect(s) detected. These observations are dropped from the model.
  warnings.warn(
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTES: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
       1/1 fixed-effects (2 observations) removed because of only 0 outcomes or singletons.

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTES: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
       1/1 fixed-effects (2 observations) removed because of only 0 outcomes or singletons.

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTES: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).
       1/1 fixed-effects (2 observations) removed because of only 0 outcomes or singletons.
fit_iid._vcov - stats.vcov(fit_r_iid)
array([[ 1.28462055e-08, -6.97520098e-10],
       [-6.97520098e-10,  1.80739283e-09]])
fit_hetero._vcov - stats.vcov(fit_r_hetero)
array([[ 2.31981933e-08, -7.87424679e-10],
       [-7.87424679e-10,  3.27120926e-09]])
fit_crv._vcov - stats.vcov(fit_r_crv)
array([[ 1.63339756e-08, -1.24809110e-10],
       [-1.24809110e-10,  3.27895328e-09]])

We conclude by comparing all estimation results via the tidy methods:

fit_iid.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.042025 -0.156836 0.875374 -0.088959 0.075777
X2 -0.014924 0.011336 -1.316520 0.188000 -0.037142 0.007294
pd.DataFrame(broom.tidy_fixest(fit_r_iid)).T
0 1 2 3 4
0 X1 -0.006591 0.042025 -0.156836 0.875374
1 X2 -0.014924 0.011336 -1.316529 0.187996
fit_hetero.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.040363 -0.163297 0.870284 -0.085700 0.072518
X2 -0.014924 0.010828 -1.378277 0.168118 -0.036146 0.006298
pd.DataFrame(broom.tidy_fixest(fit_r_hetero)).T
0 1 2 3 4
0 X1 -0.006591 0.040362 -0.163298 0.870284
1 X2 -0.014924 0.010828 -1.378296 0.168112
fit_crv.tidy()
Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
Coefficient
X1 -0.006591 0.035302 -0.186705 0.851892 -0.075782 0.062600
X2 -0.014924 0.010468 -1.425726 0.153947 -0.035440 0.005592
pd.DataFrame(broom.tidy_fixest(fit_r_crv)).T
0 1 2 3 4
0 X1 -0.006591 0.035302 -0.186706 0.851891
1 X2 -0.014924 0.010467 -1.425747 0.153941