News

PyFixest 0.22.0

Changes

  • Fix bug in wildboottest method @s3alfisc (#506)
  • docs: add sanskriti2005 as a contributor for infra @allcontributors (#503)
  • Infra: added the release-drafter for automation of release notes @sanskriti2005 (#502)
  • Fix broken link in contributing.md @s3alfisc (#499)
  • docs: add leostimpfle as a contributor for bug @allcontributors (#495)
  • Update justfile @leostimpfle (#494)
  • docs: add baggiponte as a contributor for doc @allcontributors (#490)
  • docs: improve installation section @baggiponte (#489)
  • Bump tornado from 6.4 to 6.4.1 @dependabot (#487)
  • docs: add leostimpfle as a contributor for code @allcontributors (#478)
  • Feols: speed up the creation of interacted fixed effects via fe1^fe2 syntax @leostimpfle (#475)
  • rename resampling iterations to ‘reps’ in all methods @s3alfisc (#474)
  • fix a lot of broken links throughout the repo @s3alfisc (#472)
  • Multiple readme fixes required after package was moved to py-econometrics project @s3alfisc (#450)
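The fe1^fe2 syntax from #475 interacts two fixed effects, so that every distinct pair of levels becomes its own group. A minimal plain-Python sketch of that idea follows. It is not PyFixest's implementation, and the helper name interact_fixed_effects is hypothetical:

```python
def interact_fixed_effects(fe1, fe2):
    """Map each distinct (fe1, fe2) pair to one integer group code."""
    codes = {}
    out = []
    for pair in zip(fe1, fe2):
        # codes are assigned in order of first appearance
        if pair not in codes:
            codes[pair] = len(codes)
        out.append(codes[pair])
    return out


fe1 = ["a", "a", "b", "b", "a"]
fe2 = [1, 2, 1, 1, 1]
combined = interact_fixed_effects(fe1, fe2)
print(combined)  # [0, 1, 2, 2, 0]
```

The resulting codes can then be treated like any single fixed effect column.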

Infrastructure

  • infrastructure: fix minor release drafter bugs @s3alfisc (#504)

PyFixest 0.21.0

  • Add support for randomization inference via the ritest() method:
import pyfixest as pf
data = pf.get_data()

fit = pf.feols("Y ~ X1", data = data)
fit.ritest(resampvar="X1=0", reps = 1000)
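To illustrate what randomization inference does conceptually, here is a simplified standard-library sketch. It re-shuffles the regressor under the sharp null of no effect and compares the observed slope with the shuffled slopes. The function names ols_slope and randomization_pvalue are hypothetical, and ritest() itself may work differently in its details:

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def randomization_pvalue(x, y, reps=1000, seed=0):
    """Two-sided p-value from re-shuffling x under the sharp null of no effect."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    x_perm = list(x)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(x_perm)
        # count shuffles that look at least as extreme as the real data
        if abs(ols_slope(x_perm, y)) >= observed:
            extreme += 1
    return extreme / reps


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y_effect = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # x matters
y_null = [rng.gauss(0, 1) for _ in x]                # x is irrelevant

p_effect = randomization_pvalue(x, y_effect)
p_null = randomization_pvalue(x, y_null)
print(p_effect, p_null)
```

With a strong true effect, almost no shuffled slope should be as large as the observed one, so p_effect should be close to zero.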

PyFixest 0.20.0

  • This version introduces MyPy type checks to the entire pyfixest codebase. Thanks to @juanitorduz for nudging me to get started with this =). It also fixes a handful of smaller bugs.

PyFixest 0.19.0

  • Fixes multiple smaller and larger performance regressions. The NYC-Taxi example regression now takes approximately 22 seconds to run (… if my laptop is connected to a power charger)!
import duckdb
import time
import numpy as np
import pyfixest as pf

# %%
nyc = duckdb.sql(
    '''
    FROM 'C:/Users/alexa/Documents/nyc-taxi/**/*.parquet'
    SELECT
        tip_amount, trip_distance, passenger_count,
        vendor_id, payment_type, dropoff_at,
        dayofweek(dropoff_at) AS dofw
    WHERE year = 2012 AND month <= 3
    '''
    ).df()

# convert dofw to int, and vendor_id and payment_type to categorical
tic = time.time()
nyc["dofw"] = nyc["dofw"].astype(int)
nyc["vendor_id"] = nyc["vendor_id"].astype("category")
nyc["payment_type"] = nyc["payment_type"].astype("category")
print(f"""
    I am converting columns of type 'object' to 'category' and 'int' data types outside
    of the regression, hence I am cheating a bit. This saves {np.round(time.time() - tic)} seconds.
    """
)
#    I am converting columns of type 'object' to 'category' and 'int' data types outside
#    of the regression, hence I am cheating a bit. This saves 7.0 seconds.

run = True
if run:

    # mock regression for JIT compilation
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc.iloc[1:10_000],
        copy_data = False,
        store_data = False
        )

    # time the full regression (time is already imported above)
    tic = time.time()
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc,
        copy_data = False, # saves a few seconds
        store_data = False # saves a few seconds
        )
    passed = time.time() - tic
    print(f"Passed time is {np.round(passed)}.")
    # Passed time is 22.
  • Adds three new function arguments to feols() and fepois(): copy_data, store_data, and fixef_tol.
  • Adds support for frequency weights with the weights_type function argument.
import pyfixest as pf

data = pf.get_data(N = 10000, model = "Fepois")
df_weighted = data[["Y", "X1", "f1"]].groupby(["Y", "X1", "f1"]).size().reset_index().rename(columns={0: "count"})
df_weighted["id"] = list(range(df_weighted.shape[0]))

print("Dimension of the aggregated df:", df_weighted.shape)
print(df_weighted.head())

fit = pf.feols(
    "Y ~ X1 | f1",
    data = data
)
fit_weighted = pf.feols(
    "Y ~ X1 | f1",
    data = df_weighted,
    weights = "count",
    weights_type = "fweights"
)
pf.etable([fit, fit_weighted], coef_fmt = "b(se) \n (t) \n (p)")
Dimension of the aggregated df: (1278, 5)
     Y   X1   f1  count  id
0  0.0  0.0  0.0     17   0
1  0.0  0.0  1.0     11   1
2  0.0  0.0  2.0     10   2
3  0.0  0.0  3.0     17   3
4  0.0  0.0  4.0     14   4
                       est1           est2
------------  -------------  -------------
depvar                    Y              Y
------------------------------------------
X1            0.001(0.012)   0.001(0.012)
                   (0.092)        (0.092)
                    (0.927)        (0.927)
------------------------------------------
f1                        x              x
------------------------------------------
R2                    0.011              -
S.E. type            by: f1         by: f1
Observations           9997           9997
------------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient(Std. Error) 
 (t-stats) 
 (p-value)
  • Bugfix: Wild Cluster Bootstrap Inference with Weights would compute unweighted standard errors. Sorry about that! WLS is not supported for the WCB.
  • Adds support for CRV3 inference with weights.
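The frequency weights example above relies on the fact that a row with weight count stands for count identical observations. A small standard-library sketch of that equivalence for a one-regressor model follows. The helper weighted_slope is hypothetical and is not how PyFixest computes its estimates:

```python
from collections import Counter


def weighted_slope(x, y, w):
    """Slope of a weighted simple regression of y on x with an intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# full data with repeated rows
x_full = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_full = [1, 1, 2, 2, 2, 3, 3, 4, 4]

# aggregate identical (x, y) rows and keep their frequency
counts = Counter(zip(x_full, y_full))
x_agg = [x for x, _ in counts]
y_agg = [y for _, y in counts]
w_agg = list(counts.values())

slope_full = weighted_slope(x_full, y_full, [1] * len(x_full))
slope_agg = weighted_slope(x_agg, y_agg, w_agg)
print(slope_full, slope_agg)
```

Both slopes agree up to floating point error, which matches the identical coefficients in the etable() output above.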

PyFixest 0.18.0

  • Large refactoring of the internal processing of model formulas, in particular FixestFormulaParser and model_matrix_fixest. As a result, the code should be cleaner and more robust.
  • Thanks to the refactoring, we can now bump the required formulaic version to the stable 1.0.0 release.
  • The fml argument of model_matrix_fixest is deprecated. Instead, model_matrix_fixest now asks for a FixestFormula, which is essentially a dictionary with information on model specifications like a first stage formula (if applicable), dependent variables, fixed effects, etc.
  • Additionally, model_matrix_fixest now returns a dictionary instead of a tuple.
  • Brings back fixed effects reference setting via i(var1, var2, ref) syntax. Deprecates the i_ref1, i_ref2 function arguments. For example, it is again possible to run:
import pyfixest as pf
data = pf.get_data()

fit1 = pf.feols("Y ~ i(f1, X2)", data=data)
fit1.coef()[0:8]

Via the ref syntax, you can set the reference level:

fit2 = pf.feols("Y ~ i(f1, X2, ref = 1)", data=data)
fit2.coef()[0:8]

PyFixest 0.17.0

  • Restructures the codebase and reorganizes how users can interact with the pyfixest API. It is now recommended to use pyfixest in the following way:

    import numpy as np
    import pyfixest as pf
    data = pf.get_data()
    data["D"] = data["X1"] > 0
    fit = pf.feols("Y ~ D + f1", data = data)
    fit.tidy()
                 Estimate  Std. Error   t value  Pr(>|t|)       2.5%      97.5%
    Coefficient
    Intercept    0.778849    0.170261  4.574437  0.000005   0.444737   1.112961
    D           -1.402617    0.152224 -9.214140  0.000000  -1.701335  -1.103899
    f1           0.004774    0.008058  0.592508  0.553645  -0.011038   0.020587

    The update should not introduce any breaking changes. Thanks to @Wenzhi-Ding for the PR!

  • Adds support for simultaneous confidence intervals via a multiplier bootstrap. Thanks to @apoorvalal for the contribution!

    fit.confint(joint = True)
                  0.025%     0.975%
    Intercept   0.384354   1.173344
    D          -1.755320  -1.049913
    f1         -0.013896   0.023444
  • Adds support for the causal cluster variance estimator by Abadie et al. (QJE, 2023) for OLS via the .ccv() method.

    fit.ccv(treatment = "D", cluster = "group_id")
    /home/runner/work/pyfixest/pyfixest/pyfixest/estimation/feols_.py:1179: UserWarning:
    
    The initial model was not clustered. CRV1 inference is computed and stored in the model object.
    
                     Estimate  Std. Error    t value  Pr(>|t|)       2.5%      97.5%
    CCV   -1.4026168622179929     0.25203  -5.565287  0.000028  -1.932111  -0.873122
    CRV1            -1.402617    0.205132  -6.837621  0.000002  -1.833584   -0.97165

PyFixest 0.16.0

  • Adds multiple quality of life improvements for developers, thanks to NKeleher.
  • Adds more options to customize etable() output thanks to Wenzhi-Ding.
  • Implements Romano-Wolf and Bonferroni corrections for multiple testing in the multcomp module.
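As a reminder of what the simpler of the two corrections does, here is a sketch of the textbook Bonferroni adjustment in plain Python. The function name bonferroni is hypothetical and does not reflect the signature of the multcomp module. The Romano-Wolf correction is resampling-based and is not sketched here:

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


pvals = [0.01, 0.04, 0.30]
adjusted = bonferroni(pvals)
print(adjusted)
```

With three tests, each p-value is tripled, so only the first hypothesis would still be rejected at the 5% level.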

PyFixest 0.15.

  • Adds support for weighted least squares for feols().
  • Reduces testing time drastically by running tests on fewer random data samples. Qualitatively, the set of tests remains identical.
  • Some updates for future pandas compatibility.

PyFixest 0.14.0

  • Moves the documentation to quartodoc.
  • Changes all docstrings to numpy format.
  • Difference-in-differences estimation functions now need to be imported via the pyfixest.did.estimation module:
from pyfixest.did.estimation import did2s, lpdid, event_study

PyFixest 0.13.5

  • Fixes a bug that led to incorrect results when the dependent variable and all covariates (excluding the fixed effects) were integers.

PyFixest 0.13.4

  • Fixes a bug in etable() with IV’s that occurred because feols() does not report R2 statistics for IVs.

PyFixest 0.13.2

  • Fixes a bug in etable() and a warning in fixest_model_matrix that arose with higher pandas versions. Thanks to @aeturrell for reporting!

PyFixest 0.13.0

New Features

  • Introduces a new pyfixest.did module which contains routines for Difference-in-Differences estimation.
  • Introduces support for basic versions of the local projections DiD estimator following Dube et al (2023)
  • Adds a new vignette for Difference-in-Differences estimation.
  • Reports R2 values in etable().

PyFixest 0.12.0

Enhancements:

  • Good performance improvements for singleton fixed effects detection. Thanks to @styfenschaer for the PR! See #229.
  • Uses the r2u project for installing R and R packages on github actions, with great performance improvements.
  • Allows passing polars data frames to feols(), fepois() and predict(). #232. Thanks to @vincentarelbundock for the suggestion!

Bug Fixes:

  • Missing variables in features were not always handled correctly in predict() with newdata not None in the presence of missing data, which would lead to an error. See #246 for details.
  • Categorical variables were not always handled correctly in predict() with newdata not None, because the number of fixed effects levels in newdata might be smaller than in data. In consequence, some levels were not found, which led to an error. See #245 for details. Thanks to @jiafengkevinchen for the pointer!
  • Multicollinearity checks for over-identified IV models were not implemented correctly, which led to a dimension error. See #236 for details. Thanks to @jiafengkevinchen for the pointer!
  • The number of degrees of freedom k was computed incorrectly if columns were dropped from the design matrix X in the presence of multicollinearity. See #235 for details. Thanks to @jiafengkevinchen for the pointer!
  • If all variables were dropped due to multicollinearity, an unclear and imprecise error message was produced. See #228 for details. Thanks to @manferdinig for the pointer!
  • When fixef_rm = 'singleton' was selected, feols() and fepois() would fail, which has been fixed. #192

Dependency Requirements

  • For now, sets formulaic versions to be 0.6.6 or lower, as version 1.0.0 seems to have introduced a problem with the i() operator. See #244 for details.
  • Drops dependency on pyhdfe.

PyFixest 0.11.1

  • Fixes some bugs around the computation of R-squared values (see issue #103).
  • Reports R-squared values again when calling .summary().

PyFixest 0.11.0

  • Significant speedups for CRV1 inference.

PyFixest 0.10.12

Fixes a small bug with the separation check for poisson regression #138.
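For context, one common form of separation in Poisson models with fixed effects arises when every outcome within a fixed effect level is zero. Those rows are perfectly predicted and are typically dropped before estimation. A minimal sketch of such a check follows, assuming non-negative outcomes. The helper separated_rows is hypothetical, and the check in PyFixest may differ in its details:

```python
from collections import defaultdict


def separated_rows(y, groups):
    """Indices of rows whose fixed-effect group has only zero outcomes."""
    totals = defaultdict(float)
    for yi, g in zip(y, groups):
        totals[g] += yi
    # with non-negative y, a zero group total means all outcomes are zero
    return [i for i, g in enumerate(groups) if totals[g] == 0]


y = [0, 0, 3, 1, 0, 0]
groups = ["a", "a", "b", "b", "c", "b"]
drop = separated_rows(y, groups)
print(drop)  # [0, 1, 4]
```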

PyFixest 0.10.11

Fixes bugs with i(var1, var2) syntax introduced with PyFixest 0.10.10.

PyFixest 0.10.10

Fixes a bug with variable interactions via i(var) syntax. See issue #221.

PyFixest 0.10.9

Makes etable() prettier and more informative.

PyFixest 0.10.8

Breaking changes

Reference levels for the i() formula syntax can no longer be set within the formula, but need to be added via the i_ref1 function argument to either feols() or fepois().

New feature

A did2s() function is added, which implements the 2-stage difference-in-differences procedure à la Gardner and follows the syntax of @kylebutts' did2s R package.

from pyfixest.did.did import did2s
from pyfixest.estimation import feols
from pyfixest.visualize import iplot
import pandas as pd
import numpy as np

df_het = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/master/pyfixest/did/data/df_het.csv")

fit = did2s(
    df_het,
    yname = "dep_var",
    first_stage = "~ 0 | state + year",
    second_stage = "~i(rel_year)",
    treatment = "treat",
    cluster = "state",
    i_ref1 = [-1.0, np.inf],
)

fit_twfe = feols(
    "dep_var ~ i(rel_year) | state + year",
    df_het,
    i_ref1 = [-1.0, np.inf]
)

iplot([fit, fit_twfe], coord_flip=False, figsize = (900, 400), title = "TWFE vs DID2S")

PyFixest 0.10.7

  • Adds basic support for event study estimation via two-way fixed effects and Gardner’s two-stage “Did2s” approach. This is a beta version and experimental. Further updates (i.e. proper event studies vs “only” ATTs) and a more flexible did2s front end will follow in future releases.
from pyfixest.did.did import event_study
import pyfixest as pf
import pandas as pd
df_het = pd.read_csv("pyfixest/did/data/df_het.csv")

fit_twfe = event_study(
    data = df_het,
    yname = "dep_var",
    idname= "state",
    tname = "year",
    gname = "g",
    estimator = "twfe"
)

fit_did2s = event_study(
    data = df_het,
    yname = "dep_var",
    idname= "state",
    tname = "year",
    gname = "g",
    estimator = "did2s"
)

pf.etable([fit_twfe, fit_did2s])
# | Coefficient   | est1             | est2             |
# |:--------------|:-----------------|:-----------------|
# | ATT           | 2.135*** (0.044) | 2.152*** (0.048) |
# Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001

PyFixest 0.10.6

  • Adds an etable() function that outputs markdown, latex or a pd.DataFrame.

PyFixest 0.10.5

  • Fixes a bug in IV estimation that would trigger an error. See here for details. Thanks to @aeturrell for reporting!

PyFixest 0.10.4

  • Implements a custom function to drop singleton fixed effects.
  • Additional small performance improvements.
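Dropping singleton fixed effects has to be done iteratively, because removing a singleton in one fixed effect can create a new singleton in another. The following standard-library sketch illustrates the idea. It is not the PyFixest implementation, and the function name drop_singletons is hypothetical:

```python
from collections import Counter


def drop_singletons(fixed_effects):
    """Return sorted row indices that survive iterative singleton removal.

    fixed_effects is a list of columns, each a list of group labels.
    """
    keep = set(range(len(fixed_effects[0])))
    changed = True
    while changed:
        changed = False
        for column in fixed_effects:
            # count group sizes among the rows that are still kept
            counts = Counter(column[i] for i in keep)
            singles = {i for i in keep if counts[column[i]] == 1}
            if singles:
                keep -= singles
                changed = True
    return sorted(keep)


f1 = ["a", "a", "b", "c", "c", "c"]
f2 = ["x", "y", "x", "y", "z", "z"]
kept = drop_singletons([f1, f2])
print(kept)  # [4, 5]
```

In this toy example, only row 2 is a singleton at first, but its removal triggers a chain that leaves just the last two rows.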

PyFixest 0.10.3

  • Allows for white space in the multiway clustering formula.
  • Adds documentation for multiway clustering.

PyFixest 0.10.2

  • Adds support for two-way clustering.
  • Adds support for CRV3 inference for Poisson regression.

PyFixest 0.10.1

  • Adapts the internal fixed effects demeaning criterion to match PyHDFE’s default.
  • Adds Styfen as coauthor.

PyFixest 0.10

  • Multiple performance improvements.
  • Most importantly, implements a custom demeaning algorithm in numba - thanks to Styfen Schaer (@styfenschaer), which leads to performance improvements of 5x or more:
import numpy as np
import time
import pyhdfe
from pyfixest.demean import demean

np.random.seed(1238)
N = 10_000_000
x = np.random.normal(0, 1, 10*N).reshape((N,10))
f1 = np.random.choice(list(range(1000)), N).reshape((N,1))
f2 = np.random.choice(list(range(1000)), N).reshape((N,1))

flist = np.concatenate((f1, f2), axis = 1)
weights = np.ones(N)

algorithm = pyhdfe.create(flist)

start_time = time.time()
res_pyhdfe = algorithm.residualize(x)
end_time = time.time()
print(end_time - start_time)
# 26.04527711868286


start_time = time.time()
res_pyfixest, success = demean(x, flist, weights, tol = 1e-10)
# Calculate the execution time
end_time = time.time()
print(end_time - start_time)
#4.334428071975708

np.allclose(res_pyhdfe , res_pyfixest)
# True
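For readers curious about what demeaning by several fixed effects involves, here is a pure-Python sketch of the alternating projections idea: sweep over the fixed effects, subtract group means, and repeat until nothing changes. It is unweighted and unoptimized, and it is only an illustration, not the numba implementation benchmarked above. The helper demean_by_group is hypothetical:

```python
def demean_by_group(values, groups):
    """Subtract the group mean from each value."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def demean(values, fixed_effects, tol=1e-10, max_iter=10_000):
    """Sweep over all fixed effects until the values stop changing."""
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in fixed_effects:
            current = demean_by_group(current, groups)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            return current, True
    return current, False


x = [1.0, 2.0, 4.0, 3.0, 7.0, 5.0]
f1 = [0, 0, 1, 1, 2, 2]
f2 = [0, 1, 0, 1, 0, 1]
resid, converged = demean(x, [f1, f2])
print(converged)
```

After convergence, the residuals sum to (numerically) zero within every level of f1 and f2.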

PyFixest 0.9.11

  • Bump required formulaic version to 0.6.5.
  • Stop copying the data frame in fixef().

PyFixest 0.9.10

  • Fixes a bug in the wildboottest method (see #158).
  • Allows running a wild bootstrap after fixed effect estimation.

PyFixest 0.9.9

  • Adds support for wildboottest for Python 3.11.

PyFixest 0.9.8

  • Fixes a couple more bugs in the predict() and fixef() methods.
  • The predict() argument data is renamed to newdata.

PyFixest 0.9.7

Fixes a bug in predict() produced when multicollinear variables are dropped.

PyFixest 0.9.6

Improved Collinearity handling. See #145

PyFixest 0.9.5

  • Moves plotting from matplotlib to lets-plot.
  • Fixes a few minor bugs in plotting and the fixef() method.

PyFixest 0.9.1

Breaking API changes

It is no longer required to initiate an object of type Fixest prior to running feols() or fepois(). Instead, you can now simply use feols() and fepois() as functions, just as in fixest. Both functions can be found in an estimation module and take a pd.DataFrame as a function argument:

from pyfixest.estimation import feols, fepois
from pyfixest.utils import get_data

data = get_data()
fit = feols("Y ~ X1 | f1", data = data, vcov = "iid")

Calling feols() will return an instance of class Feols, while calling fepois() will return an instance of class Fepois. Multiple estimation syntax will return an instance of class FixestMulti.

Post processing works as before via .summary(), .tidy() and other methods.

New Features

A summary function lets you compare multiple models:

from pyfixest.summarize import summary
fit2 = feols("Y ~ X1 + X2| f1", data = data, vcov = "iid")
summary([fit, fit2])

Visualization is possible via custom methods (.iplot() & .coefplot()), but a new module also lets you visualize a list of Feols and/or Fepois instances:

from pyfixest.visualize import coefplot, iplot
coefplot([fit, fit2])

The documentation has been improved (though there is still room for progress), and the code has been cleaned up a bit (also lots of room for improvements).