Does `PyFixest` match `fixest`?

This vignette compares estimation results from fixest with pyfixest via the rpy2 package.

Setup

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

import pyfixest as pf

# Activate pandas2ri
pandas2ri.activate()

# Import R packages
fixest = importr("fixest")
stats = importr("stats")
broom = importr("broom")

# IPython magic commands for autoreloading
%load_ext autoreload
%autoreload 2

# Get data using pyfixest
data = pf.get_data(model="Feols", N=10_000, seed=99292)

Ordinary Least Squares (OLS)

IID Inference

First, we estimate a model via `pyfixest. We compute “iid” standard errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid")

We estimate the same model with weights:

fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, weights="weights", vcov="iid"
)

Via r-fixest and rpy2, we get

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="iid",
)

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

Let’s compare how close the covariance matrices are:

fit_vcov = fit._vcov
r_vcov = stats.vcov(r_fit)
fit_vcov - r_vcov

array([[-5.42101086e-19, -6.38169843e-22],
       [-6.38169843e-22, -1.35525272e-19]])

And for WLS:

fit_weights._vcov - stats.vcov(r_fit_weights)

array([[ 1.68051337e-18, -2.11758237e-21],
       [-2.11758237e-21, -1.52465931e-19]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	0.112019	0.017042	6.572948	5.181855e-11	0.078612	0.145425
X2	0.732788	0.004621	158.578261	0.000000e+00	0.723730	0.741846

pd.DataFrame(broom.tidy_fixest(r_fit)).T

	0	1	2	3	4
0	X1	0.112019	0.017042	6.572948	0.0
1	X2	0.732788	0.004621	158.578261	0.0

fit_weights.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	0.123687	0.016975	7.286200	3.432810e-13	0.090411	0.156962
X2	0.732244	0.004610	158.844322	0.000000e+00	0.723207	0.741280

pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T

	0	1	2	3	4
0	X1	0.123687	0.016975	7.2862	0.0
1	X2	0.732244	0.00461	158.844322	0.0

Heteroskedastic Errors

We repeat the same exercise with heteroskedastic (HC1) errors:

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero")
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", weights="weights"
)

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
)

r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov="hetero",
)

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

As before, we compare the variance covariance matrices:

fit._vcov - stats.vcov(r_fit)

array([[-1.63551898e-16, -2.15724469e-17],
       [-2.15724998e-17, -5.45556981e-17]])

fit_weights._vcov - stats.vcov(r_fit_weights)

array([[-2.07353665e-16, -9.64601120e-18],
       [-9.64579945e-18, -3.06524283e-17]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	0.112019	0.017105	6.548962	6.082002e-11	0.078490	0.145548
X2	0.732788	0.004579	160.036098	0.000000e+00	0.723812	0.741763

pd.DataFrame(broom.tidy_fixest(r_fit)).T

	0	1	2	3	4
0	X1	0.112019	0.017105	6.548962	0.0
1	X2	0.732788	0.004579	160.036098	0.0

fit_weights.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	0.123687	0.019470	6.352618	2.210045e-10	0.085521	0.161852
X2	0.732244	0.005169	141.653304	0.000000e+00	0.722111	0.742376

pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T

	0	1	2	3	4
0	X1	0.123687	0.01947	6.352618	0.0
1	X2	0.732244	0.005169	141.653304	0.0

Cluster-Robust Errors

We conclude with cluster robust errors.

fit = pf.feols(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"})
fit_weights = pf.feols(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, weights="weights"
)

r_fit = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
)
r_fit_weights = fixest.feols(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    weights=ro.Formula("~weights"),
    vcov=ro.Formula("~f1"),
)

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

fit._vcov - stats.vcov(r_fit)

array([[ 4.23272528e-16, -7.01699034e-17],
       [-7.01707504e-17, -1.42979161e-17]])

fit_weights._vcov - stats.vcov(r_fit_weights)

array([[2.60533782e-16, 4.09872467e-16],
       [4.09872467e-16, 1.80790712e-17]])

We conclude by comparing all estimation results via the tidy methods:

fit.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	0.112019	0.015865	7.060624	4.823750e-09	0.080152	0.143885
X2	0.732788	0.004490	163.215618	0.000000e+00	0.723770	0.741806

pd.DataFrame(broom.tidy_fixest(r_fit)).T

	0	1	2	3	4
0	X1	0.112019	0.015865	7.060624	0.0
1	X2	0.732788	0.00449	163.215618	0.0

fit_weights.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	0.123687	0.018368	6.733633	1.566958e-08	0.086792	0.160581
X2	0.732244	0.005266	139.062210	0.000000e+00	0.721667	0.742820

pd.DataFrame(broom.tidy_fixest(r_fit_weights)).T

	0	1	2	3	4
0	X1	0.123687	0.018368	6.733633	0.0
1	X2	0.732244	0.005266	139.06221	0.0

Poisson Regression

data = pf.get_data(model="Fepois")

fit_iid = pf.fepois(fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="iid", iwls_tol=1e-10)
fit_hetero = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov="hetero", iwls_tol=1e-10
)
fit_crv = pf.fepois(
    fml="Y ~ X1 + X2 | f1 + f2", data=data, vcov={"CRV1": "f1"}, iwls_tol=1e-10
)

fit_r_iid = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="iid",
)

fit_r_hetero = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov="hetero",
)

fit_r_crv = fixest.fepois(
    ro.Formula("Y ~ X1 + X2 | f1 + f2"),
    data=data,
    vcov=ro.Formula("~f1"),
)

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: NOTE: 3 observations removed because of NA values (LHS: 1, RHS: 1, Fixed-effects: 1).

fit_iid._vcov - stats.vcov(fit_r_iid)

array([[ 1.28680222e-08, -6.98422811e-10],
       [-6.98422811e-10,  1.81058144e-09]])

fit_hetero._vcov - stats.vcov(fit_r_hetero)

array([[ 2.32346186e-08, -7.86957614e-10],
       [-7.86957614e-10,  3.27676461e-09]])

fit_crv._vcov - stats.vcov(fit_r_crv)

array([[ 1.63391494e-08, -1.24691682e-10],
       [-1.24691682e-10,  3.27723218e-09]])

We conclude by comparing all estimation results via the tidy methods:

fit_iid.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	-0.006591	0.042068	-0.156678	0.875499	-0.089042	0.075860
X2	-0.014924	0.011347	-1.315197	0.188444	-0.037164	0.007316

pd.DataFrame(broom.tidy_fixest(fit_r_iid)).T

	0	1	2	3	4
0	X1	-0.006591	0.042068	-0.156679	0.875498
1	X2	-0.014924	0.011347	-1.315207	0.18844

fit_hetero.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	-0.006591	0.040403	-0.163133	0.870413	-0.085780	0.072598
X2	-0.014924	0.010839	-1.376894	0.168545	-0.036168	0.006320

pd.DataFrame(broom.tidy_fixest(fit_r_hetero)).T

	0	1	2	3	4
0	X1	-0.006591	0.040403	-0.163134	0.870413
1	X2	-0.014924	0.010839	-1.376913	0.168539

fit_crv.tidy()

	Estimate	Std. Error	t value	Pr(>\|t\|)	2.5%	97.5%
Coefficient
X1	-0.006591	0.035300	-0.186718	0.851882	-0.075777	0.062595
X2	-0.014924	0.010467	-1.425826	0.153919	-0.035439	0.005591

pd.DataFrame(broom.tidy_fixest(fit_r_crv)).T

	0	1	2	3	4
0	X1	-0.006591	0.035299	-0.186719	0.851881
1	X2	-0.014924	0.010467	-1.425847	0.153913