Changelog

PyFixest 0.28.0 (In Development, can be installed from github)

New features and bug fixes

  • Adds a function argument context, that allows to pass information / context to the formulaic.Formulaic.get_model_matrix() call that creates the model matrix.
  • Fix a bug that caused reindexing of LPDID._coeftable when calling LPDID.iplot(). As a result, a second call of LPDID.iplot() would fail.
  • Bumps the required formulaic version to 1.1.0 and fixes errors that arose when a) the ref argument was used for i() syntax, which led to a silent failure under formulaic >= 1.1.0, and fixef() / predict() with fixed effects, which led to a loud error.

New experimental Features

  • Adds a pf.feglm() function that supports GLMs with normal and binomial families (gaussian, logit, probit) without fixed effects. Fixed effects support is work in progress.
  • Adds options to run the demean function via JAX. This might speed up the model fit if GPU is available.

PyFixest 0.27.0

  • Adds support for Gelbach’s (JoLe 2016) Regression Decomposition method using a decompose() method for Feols.
  • Adds support for the multiple hypothesis correction by Westfall & Young via the pf.wyoung() function.
  • Input data frames to pf.feols() and pf.fepois() are now converted to pandas via narwhals. As a result, users can not provide duckdb or ibis tables as inputs, as well as pandas and polars data frames. polars and pyarrow are dropped as a dependencies.
  • Fixes a bug in the wildboottest method, which incorrectly used to run a regression on the demeaned dependend variable in case it was applied after a fixed effects regression. My apologies for that!
  • Fixes a bug in the ritest method, which would use randomization inference coefficients instead of t-statistics, leading to incorrect results. This has consequences for the rwolf() function, which, in case of running ri-inference, would default to run the randomization-t. My apolgies!
  • Adds a vignette on multiple testing corrections.
  • Adds a vignette on Gelbach’s regression decomposition.

PyFixest 0.22.0 - 0.25.4

See the github changelog for details: link.

PyFixest 0.22.0

Changes

  • Fix bug in wildboottest method @s3alfisc (#506)
  • docs: add sanskriti2005 as a contributor for infra @allcontributors (#503)
  • Infra: added the release-drafter for automation of release notes @sanskriti2005 (#502)
  • Fix broken link in contributing.md @s3alfisc (#499)
  • docs: add leostimpfle as a contributor for bug @allcontributors (#495)
  • Update justfile @leostimpfle (#494)
  • docs: add baggiponte as a contributor for doc @allcontributors (#490)
  • docs: improve installation section @baggiponte (#489)
  • Bump tornado from 6.4 to 6.4.1 @dependabot (#487)
  • docs: add leostimpfle as a contributor for code @allcontributors (#478)
  • Feols: speed up the creation of interacted fixed effects via fe1^fe2 syntax @leostimpfle (#475)
  • rename resampling iterations to ‘reps’ in all methods @s3alfisc (#474)
  • fix a lot of broken links throught the repo @s3alfisc (#472)
  • Multiple readme fixes required after package was moved to py-econometrics project @s3alfisc (#450)

Infrastructure

  • infrastructure: fix minor release drafter bugs @s3alfisc (#504)

PyFixest 0.21.0

  • Add support for randomization inference via the ritest() method:
import pyfixest as pf
data = pf.get_data()

fit = pf.feols("Y ~ X1", data = data)
fit.ritest(resampvar="X1=0", reps = 1000)

PyFixest 0.20.0

  • This version introduces MyPy type checks to the entire pyfixest codebase. Thanks to @juanitorduz for nudging me to get started with this =). It also fixes a handful of smaller bugs.

PyFixest 0.19.0

  • Fixes multiple smaller and larger performance regressions. The NYC-Taxi example regression now takes approximately 22 seconds to run (… if my laptopt is connected to a power charger)!
%load_ext autoreload
%autoreload 2

import duckdb
import time
import numpy as np
import pyfixest as pf

# %%
nyc = duckdb.sql(
    '''
    FROM 'C:/Users/alexa/Documents/nyc-taxi/**/*.parquet'
    SELECT
        tip_amount, trip_distance, passenger_count,
        vendor_id, payment_type, dropoff_at,
        dayofweek(dropoff_at) AS dofw
    WHERE year = 2012 AND month <= 3
    '''
    ).df()

# convert dowf, vendor_id, payment_type to categorical
tic = time.time()
nyc["dofw"] = nyc["dofw"].astype(int)
nyc["vendor_id"] = nyc["vendor_id"].astype("category")
nyc["payment_type"] = nyc["payment_type"].astype("category")
print(f"""
    I am convering columns of type 'objects' to 'categories' and 'int'data types outside
    of the regression, hence I am cheating a bit. This saves {np.round(time.time() - tic)} seconds.
    """
)
#    I am convering columns of type 'objects' to 'categories' and 'int'data types outside
#    of the regression, hence I am cheating a bit. This saves 7.0 seconds.

run = True
if run:

    # mock regression for JIT compilation
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc.iloc[1:10_000],
        copy_data = False,
        store_data = False
        )

    import time
    tic = time.time()
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc,
        copy_data = False, # saves a few seconds
        store_data = False # saves a few second
        )
    passed = time.time() - tic
    print(f"Passed time is {np.round(passed)}.")
    # Passed time is 22.
  • Adds three new function arguments to feols() and fepois(): copy_data, store_data, and fixef_tol.
  • Adds support for frequency weights with the weights_type function argument.
import pyfixest as pf

data = pf.get_data(N = 10000, model = "Fepois")
df_weighted = data[["Y", "X1", "f1"]].groupby(["Y", "X1", "f1"]).size().reset_index().rename(columns={0: "count"})
df_weighted["id"] = list(range(df_weighted.shape[0]))

print("Dimension of the aggregated df:", df_weighted.shape)
print(df_weighted.head())

fit = pf.feols(
    "Y ~ X1 | f1",
    data = data
)
fit_weighted = pf.feols(
    "Y ~ X1 | f1",
    data = df_weighted,
    weights = "count",
    weights_type = "fweights"
)
pf.etable([fit, fit_weighted], coef_fmt = "b(se) \n (t) \n (p)")
Dimension of the aggregated df: (1278, 5)
     Y   X1   f1  count  id
0  0.0  0.0  0.0     17   0
1  0.0  0.0  1.0     11   1
2  0.0  0.0  2.0     10   2
3  0.0  0.0  3.0     17   3
4  0.0  0.0  4.0     14   4
Y
(1) (2)
coef
X1 0.001(0.012)
(0.092)
(0.927)
0.001(0.012)
(0.092)
(0.927)
fe
f1 x x
stats
Observations 9997 9997
S.E. type by: f1 by: f1
R2 0.011 -
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient(Std. Error) (t-stats) (p-value)
  • Bugfix: Wild Cluster Bootstrap Inference with Weights would compute unweighted standard errors. Sorry about that! WLS is not supported for the WCB.
  • Adds support for CRV3 inference with weights.

PyFixest 0.18.0

  • Large Refactoring of Interal Processing of Model Formulas, in particular FixestFormulaParser and model_matrix_fixest. As a results, the code should be cleaner and more robust.
  • Thanks to the refactoring, we can now bump the required formulaic version to the stable 1.0.0 release.
  • The fml argument of model_matrix_fixest is deprecated. Instead, model_matrix_fixest now asks for a FixestFormula, which is essentially a dictionary with information on model specifications like a first stage formula (if applicable), dependent variables, fixed effects, etc.
  • Additionally, model_matrix_fixest now returns a dictionary instead of a tuple.
  • Brings back fixed effects reference setting via i(var1, var2, ref) syntax. Deprecates the i_ref1, i_ref2 function arguments. I.e. it is again possible to e.g. run
import pyfixest as pf
data = pf.get_data()

fit1 = pf.feols("Y ~ i(f1, X2)", data=data)
fit1.coef()[0:8]

Via the ref syntax, via can set the reference level:

fit2 = pf.feols("Y ~ i(f1, X2, ref = 1)", data=data)
fit2.coef()[0:8]

PyFixest 0.17.0

  • Restructures the codebase and reorganizes how users can interact with the pyfixest API. It is now recommended to use pyfixest in the following way:

    import numpy as np
    import pyfixest as pf
    data = pf.get_data()
    data["D"] = data["X1"] > 0
    fit = pf.feols("Y ~ D + f1", data = data)
    fit.tidy()
    Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
    Coefficient
    Intercept 0.778849 0.170261 4.574437 0.000005 0.444737 1.112961
    D -1.402617 0.152224 -9.214140 0.000000 -1.701335 -1.103899
    f1 0.004774 0.008058 0.592508 0.553645 -0.011038 0.020587

    The update should not inroduce any breaking changes. Thanks to @Wenzhi-Ding for the PR!

  • Adds support for simultaneous confidence intervals via a multiplier bootstrap. Thanks to @apoorvalal for the contribution!

    fit.confint(joint = True)
    2.5% 97.5%
    Intercept 0.383776 1.173922
    D -1.755837 -1.049396
    f1 -0.013923 0.023472
  • Adds support for the causal cluster variance estimator by Abadie et al. (QJE, 2023) for OLS via the .ccv() method.

    fit.ccv(treatment = "D", cluster = "group_id")
    /home/runner/work/pyfixest/pyfixest/pyfixest/estimation/feols_.py:1383: UserWarning: The initial model was not clustered. CRV1 inference is computed and stored in the model object.
      warnings.warn(
    Estimate Std. Error t value Pr(>|t|) 2.5% 97.5%
    CCV -1.4026168622179929 0.295399 -4.748209 0.000161 -2.023227 -0.782006
    CRV1 -1.402617 0.205132 -6.837621 0.000002 -1.833584 -0.97165

PyFixest 0.16.0

  • Adds multiple quality of life improvements for developers, thanks to NKeleher.
  • Adds more options to customize etable() output thanks to Wenzhi-Ding.
  • Implements Romano-Wolf and Bonferroni corrections for multiple testing in the multcomp module.

PyFixest 0.15.

  • Adds support for weighted least squares for feols().
  • Reduces testing time drastically by running tests on fewer random data samples. Qualitatively, the set of test remains identical.
  • Some updates for future pandas compatibility.

PyFixest 0.14.0

  • Moves the documentation to quartodoc.
  • Changes all docstrings to numpy format.
  • Difference-in-differences estimation functions now need to be imported via the pyfixest.did.estimation module:
from pyfixest.did.estimation import did2s, lpdid, event_study

PyFixest 0.13.5

  • Fixes a bug that lead to incorrect results when the dependent variable and all covariates (excluding the fixed effects) where integers.

PyFixest 0.13.4

  • Fixes a bug in etable() with IV’s that occurred because feols() does not report R2 statistics for IVs.

PyFixest 0.13.2

  • Fixes a bug in etable() and a warning in fixest_model_matrix that arose with higher pandas versions. Thanks to @aeturrell for reporting!

PyFixest 0.13.0

New Features

  • Introduces a new pyfixest.did module which contains routines for Difference-in-Differences estimation.
  • Introduces support for basic versions of the local projections DiD estimator following Dube et al (2023)
  • Adds a new vignette for Difference-in-Differences estimation.
  • Reports R2 values in etable().

PyFixest 0.12.0

Enhancements:

  • Good performance improvements for singleton fixed effects detection. Thanks to @styfenschaer for the PR! See #229.
  • Uses the r2u project for installing R and R packages on github actions, with great performance improvements.
  • Allows to pass polars data frames to feols(), fepois() and predict(). #232. Thanks to @vincentarelbundock for the suggestion!

Bug Fixes:

  • Missing variables in features were not always handled correctly in predict() with newdata not None in the presence of missing data, which would lead to an error. See #246 for details.
  • Categorical variables were not always handled correctly in predict() with newdata not None, because the number of fixed effects levels in newdata might be smaller than in data. In consequence, some levels were not found, which lead to an error. See #245 for details. Thanks to @jiafengkevinchen for the pointer!
  • Multicollinearity checks for over-identified IV was not implemented correctly, which lead to a dimension error. See #236 for details. Thanks to @jiafengkevinchen for the pointer!
  • The number of degrees of freedom k was computed incorrectly if columns were dropped from the design matrix X in the presence of multicollinearity. See #235 for details. Thanks to @jiafengkevinchen for the pointer!
  • If all variables were dropped due to multicollinearity, an unclear and imprecise error message was produced. See #228 for details. Thanks to @manferdinig for the pointer!
  • If selection fixef_rm = 'singleton', feols() and fepois() would fail, which has been fixed. #192

Dependency Requirements

  • For now, sets formulaic versions to be 0.6.6 or lower as version 1.0.0 seems to have introduced a problem with the i() operator, See #244 for details.
  • Drops dependency on pyhdfe.

PyFixest 0.11.1

  • Fixes some bugs around the computation of R-squared values (see issue #103).
  • Reports R-squared values again when calling .summary().

PyFixest 0.11.0

  • Significant speedups for CRV1 inference.

PyFixest 0.10.12

Fixes a small bug with the separation check for poisson regression #138.

PyFixest 0.10.11

Fixes bugs with i(var1, var2) syntax introduced with PyFixest 0.10.10.

PyFixest 0.10.10

Fixes a bug with variable interactions via i(var) syntax. See issue #221.

PyFixest 0.10.9

Makes etable() prettier and more informative.

PyFixest 0.10.8

Breaking changes

Reference levels for the i() formula syntax can no longer be set within the formula, but need to be added via the i_ref1 function argument to either feols() and fepois().

New feature

A dids2() function is added, which implements the 2-stage difference-in-differences procedure à la Gardner and follows the syntax of @kylebutts did2s R package.

from pyfixest.did.did import did2s
from pyfixest.estimation import feols
from pyfixest.visualize import iplot
import pandas as pd
import numpy as np

df_het = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/master/pyfixest/did/data/df_het.csv")

fit = did2s(
    df_het,
    yname = "dep_var",
    first_stage = "~ 0 | state + year",
    second_stage = "~i(rel_year)",
    treatment = "treat",
    cluster = "state",
    i_ref1 = [-1.0, np.inf],
)

fit_twfe = feols(
    "dep_var ~ i(rel_year) | state + year",
    df_het,
    i_ref1 = [-1.0, np.inf]
)

iplot([fit, fit_twfe], coord_flip=False, figsize = (900, 400), title = "TWFE vs DID2S")

PyFixest 0.10.7

  • Adds basic support for event study estimation via two-way fixed effects and Gardner’s two-stage “Did2s” approach. This is a beta version and experimental. Further updates (i.e. proper event studies vs “only” ATTs) and a more flexible did2s front end will follow in future releases.
%load_ext autoreload
%autoreload 2

from pyfixest.did.did import event_study
import pyfixest as pf
import pandas as pd
df_het = pd.read_csv("pyfixest/did/data/df_het.csv")

fit_twfe = event_study(
    data = df_het,
    yname = "dep_var",
    idname= "state",
    tname = "year",
    gname = "g",
    estimator = "twfe"
)

fit_did2s = event_study(
    data = df_het,
    yname = "dep_var",
    idname= "state",
    tname = "year",
    gname = "g",
    estimator = "did2s"
)

pf.etable([fit_twfe, fit_did2s])
# | Coefficient   | est1             | est2             |
# |:--------------|:-----------------|:-----------------|
# | ATT           | 2.135*** (0.044) | 2.152*** (0.048) |
# Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001

PyFixest 0.10.6

  • Adds an etable() function that outputs markdown, latex or a pd.DataFrame.

PyFixest 0.10.5

  • Fixes a big in IV estimation that would trigger an error. See here for details. Thanks to @aeturrell for reporting!

PyFixest 0.10.4

  • Implements a custom function to drop singleton fixed effects.
  • Additional small performance improvements.

PyFixest 0.10.3

  • Allows for white space in the multiway clustering formula.
  • Adds documentation for multiway clustering.

PyFixest 0.10.2

  • Adds support for two-way clustering.
  • Adds support for CRV3 inference for Poisson regression.

PyFixest 0.10.1

  • Adapts the internal fixed effects demeaning criteron to match `PyHDFE’s default.
  • Adds Styfen as coauthor.

PyFixest 0.10

  • Multiple performance improvements.
  • Most importantly, implements a custom demeaning algorithm in numba - thanks to Styfen Schaer (@styfenschaer), which leads to performance improvements of 5x or more:
%load_ext autoreload
%autoreload 2

import numpy as np
import time
import pyhdfe
from pyfixest.demean import demean

np.random.seed(1238)
N = 10_000_000
x = np.random.normal(0, 1, 10*N).reshape((N,10))
f1 = np.random.choice(list(range(1000)), N).reshape((N,1))
f2 = np.random.choice(list(range(1000)), N).reshape((N,1))

flist = np.concatenate((f1, f2), axis = 1)
weights = np.ones(N)

algorithm = pyhdfe.create(flist)

start_time = time.time()
res_pyhdfe = algorithm.residualize(x)
end_time = time.time()
print(end_time - start_time)
# 26.04527711868286


start_time = time.time()
res_pyfixest, success = demean(x, flist, weights, tol = 1e-10)
# Calculate the execution time
end_time = time.time()
print(end_time - start_time)
#4.334428071975708

np.allclose(res_pyhdfe , res_pyfixest)
# True

PyFixest 0.9.11

  • Bump required formulaic version to 0.6.5.
  • Stop copying the data frame in fixef().

PyFixest 0.9.10

  • Fixes a big in the wildboottest method (see #158).
  • Allows to run a wild bootstrap after fixed effect estimation.

PyFixest 0.9.9

  • Adds support for wildboottest for Python 3.11.

PyFixest 0.9.8

  • Fixes a couple more bugs in the predict() and fixef() methods.
  • The predict() argument data is renamed to newdata.

PyFixest 0.9.7

Fixes a bug in predict() produced when multicollinear variables are dropped.

PyFixest 0.9.6

Improved Collinearity handling. See #145

PyFixest 0.9.5

  • Moves plotting from matplotlib to lets-plot.
  • Fixes a few minor bugs in plotting and the fixef() method.

PyFixest 0.9.1

Breaking API changes

It is no longer required to initiate an object of type Fixest prior to running [Feols(/reference/Feols.qmd) or fepois. Instead, you can now simply use feols() and fepois() as functions, just as in fixest. Both function can be found in an estimation module and need to obtain a pd.DataFrame as a function argument:

from pyfixest.estimation import fixest, fepois
from pyfixest.utils import get_data

data = get_data()
fit = feols("Y ~ X1 | f1", data = data, vcov = "iid")

Calling feols() will return an instance of class [Feols(/reference/Feols.qmd), while calling fepois() will return an instance of class Fepois. Multiple estimation syntax will return an instance of class FixestMulti.

Post processing works as before via .summary(), .tidy() and other methods.

New Features

A summary function allows to compare multiple models:

from pyfixest.summarize import summary
fit2 = feols("Y ~ X1 + X2| f1", data = data, vcov = "iid")
summary([fit, fit2])

Visualization is possible via custom methods (.iplot() & .coefplot()), but a new module allows to visualize a list of [Feols(/reference/Feols.qmd) and/or Fepois instances:

from pyfixest.visualize import coefplot, iplot
coefplot([fit, fit2])

The documentation has been improved (though there is still room for progress), and the code has been cleaned up a bit (also lots of room for improvements).