```python
import pyfixest as pf

data = pf.get_data()
fit = pf.feols("Y ~ X1", data = data)
fit.ritest(resampvar = "X1=0", reps = 1000)
```
Changelog
PyFixest 0.26.0 (In Development)
- Input data frames to `pf.feols()` and `pf.fepois()` are now converted to `pandas` via `narwhals`. As a result, users can now provide `duckdb` or `ibis` tables as inputs, in addition to `pandas` and `polars` data frames.
PyFixest 0.22.0 - 0.25.4
See the GitHub changelog for details.
PyFixest 0.22.0
Changes
- Fix bug in wildboottest method @s3alfisc (#506)
- docs: add sanskriti2005 as a contributor for infra @allcontributors (#503)
- Infra: added the release-drafter for automation of release notes @sanskriti2005 (#502)
- Fix broken link in contributing.md @s3alfisc (#499)
- docs: add leostimpfle as a contributor for bug @allcontributors (#495)
- Update justfile @leostimpfle (#494)
- docs: add baggiponte as a contributor for doc @allcontributors (#490)
- docs: improve installation section @baggiponte (#489)
- Bump tornado from 6.4 to 6.4.1 @dependabot (#487)
- docs: add leostimpfle as a contributor for code @allcontributors (#478)
- Feols: speed up the creation of interacted fixed effects via `fe1^fe2` syntax @leostimpfle (#475)
- rename resampling iterations to ‘reps’ in all methods @s3alfisc (#474)
- fix a lot of broken links throughout the repo @s3alfisc (#472)
- Multiple readme fixes required after package was moved to py-econometrics project @s3alfisc (#450)
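The `fe1^fe2` syntax combines two fixed effects into a single fixed effect with one level per observed pair. The NumPy snippet below is a minimal sketch of that idea, assuming both fixed effects are already coded as non-negative integers; `interact_fixed_effects` is a hypothetical helper, not the code from #475.

```python
import numpy as np

def interact_fixed_effects(fe1, fe2):
    """Map each observed (fe1, fe2) pair to one dense integer code (hypothetical helper)."""
    fe1 = np.asarray(fe1, dtype=np.int64)
    fe2 = np.asarray(fe2, dtype=np.int64)
    # encode each pair as a single integer (assumes non-negative integer codes)
    combined = fe1 * (fe2.max() + 1) + fe2
    # relabel to 0, 1, ..., (number of observed pairs - 1)
    _, codes = np.unique(combined, return_inverse=True)
    return codes

state = np.array([0, 0, 1, 1, 0])
year = np.array([2000, 2001, 2000, 2000, 2000])
print(interact_fixed_effects(state, year))  # [0 1 2 2 0]
```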
Infrastructure
- infrastructure: fix minor release drafter bugs @s3alfisc (#504)
PyFixest 0.21.0
- Add support for randomization inference via the `ritest()` method.
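Randomization inference compares the observed test statistic with its distribution under re-randomized treatment assignments. A minimal NumPy sketch of the idea, assuming a single regressor, no fixed effects, no clustering, and the regression coefficient as test statistic; the helpers `slope` and `randomization_pvalue` are hypothetical and not the `ritest()` internals.

```python
import numpy as np

def slope(y, x):
    """OLS slope of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def randomization_pvalue(y, x, reps=1000, seed=0):
    """Two-sided p-value for the sharp null that x has no effect on y."""
    rng = np.random.default_rng(seed)
    observed = slope(y, x)
    # re-estimate the slope after randomly permuting the treatment variable
    null_draws = np.array([slope(y, rng.permutation(x)) for _ in range(reps)])
    return observed, np.mean(np.abs(null_draws) >= np.abs(observed))

rng = np.random.default_rng(123)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
beta, p = randomization_pvalue(y, x)
print(f"coef = {beta:.3f}, RI p-value = {p:.3f}")
```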
PyFixest 0.20.0
- This version introduces MyPy type checks to the entire pyfixest codebase. Thanks to @juanitorduz for nudging me to get started with this =). It also fixes a handful of smaller bugs.
PyFixest 0.19.0
- Fixes multiple smaller and larger performance regressions. The NYC-Taxi example regression now takes approximately 22 seconds to run (… if my laptop is connected to a power charger)!
```python
%load_ext autoreload
%autoreload 2

import duckdb
import time
import numpy as np
import pyfixest as pf

# %%
nyc = duckdb.sql(
    '''
    FROM 'C:/Users/alexa/Documents/nyc-taxi/**/*.parquet'
    SELECT
        tip_amount, trip_distance, passenger_count,
        vendor_id, payment_type, dropoff_at,
        dayofweek(dropoff_at) AS dofw
    WHERE year = 2012 AND month <= 3
    '''
).df()

# convert dofw to int and vendor_id, payment_type to categorical
tic = time.time()
nyc["dofw"] = nyc["dofw"].astype(int)
nyc["vendor_id"] = nyc["vendor_id"].astype("category")
nyc["payment_type"] = nyc["payment_type"].astype("category")
print(f"""
    I am converting columns of type 'object' to 'category' and 'int' data types outside
    of the regression, hence I am cheating a bit. This saves {np.round(time.time() - tic)} seconds.
    """
)
# I am converting columns of type 'object' to 'category' and 'int' data types outside
# of the regression, hence I am cheating a bit. This saves 7.0 seconds.

run = True
if run:
    # mock regression for JIT compilation
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc.iloc[1:10_000],
        copy_data = False,
        store_data = False
    )

    tic = time.time()
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc,
        copy_data = False, # saves a few seconds
        store_data = False # saves a few seconds
    )
    passed = time.time() - tic
    print(f"Passed time is {np.round(passed)}.")
    # Passed time is 22.
```
- Adds three new function arguments to `feols()` and `fepois()`: `copy_data`, `store_data`, and `fixef_tol`.
- Adds support for frequency weights with the `weights_type` function argument.
```python
import pyfixest as pf

data = pf.get_data(N = 10000, model = "Fepois")
# aggregate identical (Y, X1, f1) rows and store how often each occurs
df_weighted = data[["Y", "X1", "f1"]].groupby(["Y", "X1", "f1"]).size().reset_index().rename(columns={0: "count"})
df_weighted["id"] = list(range(df_weighted.shape[0]))

print("Dimension of the aggregated df:", df_weighted.shape)
print(df_weighted.head())

fit = pf.feols(
    "Y ~ X1 | f1",
    data = data
)
fit_weighted = pf.feols(
    "Y ~ X1 | f1",
    data = df_weighted,
    weights = "count",
    weights_type = "fweights"
)
pf.etable([fit, fit_weighted], coef_fmt = "b(se) \n (t) \n (p)")
```
```
Dimension of the aggregated df: (1278, 5)
     Y   X1   f1  count  id
0  0.0  0.0  0.0     17   0
1  0.0  0.0  1.0     11   1
2  0.0  0.0  2.0     10   2
3  0.0  0.0  3.0     17   3
4  0.0  0.0  4.0     14   4
```
| | Y (1) | Y (2) |
|---|---|---|
| coef | | |
| X1 | 0.001 (0.012) (0.092) (0.927) | 0.001 (0.012) (0.092) (0.927) |
| fe | | |
| f1 | x | x |
| stats | | |
| Observations | 9997 | 9997 |
| S.E. type | by: f1 | by: f1 |
| R2 | 0.011 | - |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient(Std. Error) (t-stats) (p-value)
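Frequency weights should reproduce the point estimates from the disaggregated data. A NumPy sketch of that equivalence for plain weighted least squares, assuming no fixed effects; the helper `wls` is hypothetical, and the sketch does not cover standard errors, which additionally require treating the sum of the weights as the number of observations.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(0)
n = 1_000
x = rng.integers(0, 5, size=n).astype(float)
y = x + rng.integers(0, 3, size=n)  # discrete outcome, so many rows are duplicates

# raw data: one row per observation, unit weights
beta_raw = wls(np.column_stack([np.ones(n), x]), y, np.ones(n))

# aggregated data: one row per unique (y, x) pair, weighted by its frequency
rows, counts = np.unique(np.column_stack([y, x]), axis=0, return_counts=True)
X_agg = np.column_stack([np.ones(len(rows)), rows[:, 1]])
beta_agg = wls(X_agg, rows[:, 0], counts.astype(float))

print(len(rows), "rows instead of", n)
print(np.allclose(beta_raw, beta_agg))  # True
```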
- Bugfix: Wild Cluster Bootstrap Inference with Weights would compute unweighted standard errors. Sorry about that! WLS is not supported for the WCB.
- Adds support for CRV3 inference with weights.
PyFixest 0.18.0
- Large refactoring of the internal processing of model formulas, in particular `FixestFormulaParser` and `model_matrix_fixest`. As a result, the code should be cleaner and more robust.
- Thanks to the refactoring, we can now bump the required `formulaic` version to the stable `1.0.0` release.
- The `fml` argument of `model_matrix_fixest` is deprecated. Instead, `model_matrix_fixest` now asks for a `FixestFormula`, which is essentially a dictionary with information on model specifications like a first stage formula (if applicable), dependent variables, fixed effects, etc.
- Additionally, `model_matrix_fixest` now returns a dictionary instead of a tuple.
- Brings back reference level setting via the `i(var1, var2, ref)` syntax and deprecates the `i_ref1` and `i_ref2` function arguments. I.e. it is again possible to run, e.g.,
```python
import pyfixest as pf

data = pf.get_data()

fit1 = pf.feols("Y ~ i(f1, X2)", data=data)
fit1.coef()[0:8]
```
Via the `ref` syntax, you can set the reference level:
```python
fit2 = pf.feols("Y ~ i(f1, X2, ref = 1)", data=data)
fit2.coef()[0:8]
```
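Conceptually, `i(f1, X2)` creates one column per level of `f1`, equal to `X2` for observations at that level and zero otherwise, and `ref` drops the column of the reference level. A NumPy sketch of that construction; the helper `i_columns` is hypothetical, and pyfixest builds these columns through its formula machinery rather than this code.

```python
import numpy as np

def i_columns(f, x, ref=None):
    """Hypothetical helper: one column (f == level) * x per level of f, optionally dropping `ref`."""
    f = np.asarray(f)
    levels = np.unique(f)
    if ref is not None:
        levels = levels[levels != ref]  # the reference level gets no column
    X = np.column_stack([(f == level) * np.asarray(x, dtype=float) for level in levels])
    return X, levels

f1 = np.array([0, 1, 2, 1])
x2 = np.array([1.0, 2.0, 3.0, 4.0])
X, levels = i_columns(f1, x2, ref=1)
print(levels)  # [0 2]
print(X)
```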
PyFixest 0.17.0
Restructures the codebase and reorganizes how users can interact with the `pyfixest` API. It is now recommended to use `pyfixest` in the following way:

```python
import numpy as np
import pyfixest as pf

data = pf.get_data()
data["D"] = data["X1"] > 0
fit = pf.feols("Y ~ D + f1", data = data)
fit.tidy()
```
| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5% | 97.5% |
|---|---|---|---|---|---|---|
| Intercept | 0.778849 | 0.170261 | 4.574437 | 0.000005 | 0.444737 | 1.112961 |
| D | -1.402617 | 0.152224 | -9.214140 | 0.000000 | -1.701335 | -1.103899 |
| f1 | 0.004774 | 0.008058 | 0.592508 | 0.553645 | -0.011038 | 0.020587 |

The update should not introduce any breaking changes. Thanks to @Wenzhi-Ding for the PR!
Adds support for simultaneous confidence intervals via a multiplier bootstrap. Thanks to @apoorvalal for the contribution!
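One common way to build simultaneous (sup-t) intervals: bootstrap the maximum absolute t-statistic across all coefficients by perturbing the OLS scores with Gaussian multipliers, and use its 1 - alpha quantile as a common critical value. The sketch below assumes heteroskedasticity-robust (HC0) standard errors; `joint_confint` is a hypothetical helper, not the `confint(joint = True)` implementation.

```python
import numpy as np

def joint_confint(X, y, alpha=0.05, reps=5_000, seed=0):
    """Sup-t simultaneous confidence intervals via a Gaussian multiplier bootstrap (sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ (X.T @ y)
    scores = X * (y - X @ beta)[:, None]  # n x k contributions x_i * e_i
    se = np.sqrt(np.diag(bread @ (scores.T @ scores) @ bread))  # HC0 standard errors
    # each draw equals bread @ sum_i g_i * x_i * e_i with g_i iid N(0, 1)
    draws = rng.standard_normal((reps, n)) @ scores @ bread
    # common critical value: (1 - alpha) quantile of the max |t| across coefficients
    crit = np.quantile(np.max(np.abs(draws) / se, axis=1), 1 - alpha)
    return beta - crit * se, beta + crit * se, crit

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
lower, upper, crit = joint_confint(X, y)
print(crit)  # exceeds the pointwise critical value of about 1.96
```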
```python
fit.confint(joint = True)
```
| | 2.5% | 97.5% |
|---|---|---|
| Intercept | 0.381714 | 1.175984 |
| D | -1.757681 | -1.047552 |
| f1 | -0.014021 | 0.023569 |

Adds support for the causal cluster variance estimator by Abadie et al. (QJE, 2023) for OLS via the `.ccv()` method.

```python
fit.ccv(treatment = "D", cluster = "group_id")
```
```
/home/runner/work/pyfixest/pyfixest/pyfixest/estimation/feols_.py:1381: UserWarning: The initial model was not clustered. CRV1 inference is computed and stored in the model object.
  warnings.warn(
```

| | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5% | 97.5% |
|---|---|---|---|---|---|---|
| CCV | -1.4026168622179929 | 0.226959 | -6.180034 | 0.000008 | -1.879441 | -0.925793 |
| CRV1 | -1.402617 | 0.205132 | -6.837621 | 0.000002 | -1.833584 | -0.97165 |
PyFixest 0.16.0
- Adds multiple quality of life improvements for developers, thanks to NKeleher.
- Adds more options to customize `etable()` output, thanks to Wenzhi-Ding.
- Implements Romano-Wolf and Bonferroni corrections for multiple testing in the `multcomp` module.
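The Bonferroni correction is the simpler of the two: each p-value is multiplied by the number of hypotheses and capped at one. A minimal sketch; the helper `bonferroni` is hypothetical, and the Romano-Wolf correction, which additionally requires bootstrap draws, is not shown.

```python
import numpy as np

def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)

print(bonferroni([0.01, 0.02, 0.5]))
```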
PyFixest 0.15.
- Adds support for weighted least squares for `feols()`.
- Reduces testing time drastically by running tests on fewer random data samples. Qualitatively, the set of tests remains identical.
- Some updates for future `pandas` compatibility.
PyFixest 0.14.0
- Moves the documentation to quartodoc.
- Changes all docstrings to `numpy` format.
- Difference-in-differences estimation functions now need to be imported via the `pyfixest.did.estimation` module:

```python
from pyfixest.did.estimation import did2s, lpdid, event_study
```
PyFixest 0.13.5
- Fixes a bug that led to incorrect results when the dependent variable and all covariates (excluding the fixed effects) were integers.
PyFixest 0.13.4
- Fixes a bug in `etable()` with IVs that occurred because `feols()` does not report R2 statistics for IVs.
PyFixest 0.13.2
- Fixes a bug in `etable()` and a warning in `fixest_model_matrix` that arose with higher `pandas` versions. Thanks to @aeturrell for reporting!
PyFixest 0.13.0
New Features
- Introduces a new `pyfixest.did` module which contains routines for Difference-in-Differences estimation.
- Introduces support for basic versions of the local projections DiD estimator following Dube et al (2023).
- Adds a new vignette for Difference-in-Differences estimation.
- Reports R2 values in `etable()`.
PyFixest 0.12.0
Enhancements:
- Good performance improvements for singleton fixed effects detection. Thanks to @styfenschaer for the PR! See #229.
- Uses the r2u project for installing R and R packages on github actions, with great performance improvements.
- Allows passing `polars` data frames to `feols()`, `fepois()` and `predict()`. #232. Thanks to @vincentarelbundock for the suggestion!
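Singleton fixed effects are levels observed only once; such observations are fitted perfectly by their fixed effect. Dropping them can create new singletons in another fixed effect, so detection has to be repeated until nothing changes. A naive NumPy sketch of that loop; the helper `drop_singletons` is hypothetical and not the optimized routine from #229.

```python
import numpy as np

def drop_singletons(fe):
    """Boolean mask of rows to keep after iteratively removing singletons.

    fe: (n, g) integer array, one column per fixed effect.
    """
    keep = np.ones(fe.shape[0], dtype=bool)
    changed = True
    while changed:
        changed = False
        for j in range(fe.shape[1]):
            rows = np.flatnonzero(keep)
            _, inverse, counts = np.unique(fe[rows, j], return_inverse=True, return_counts=True)
            singleton = counts[inverse] == 1
            if singleton.any():
                keep[rows[singleton]] = False
                changed = True
    return keep

# dropping row 4 (f1 = 2) makes row 3 a singleton in f2, which in turn makes row 2 a singleton in f1
fe = np.array([[0, 0], [0, 0], [1, 0], [1, 1], [2, 1]])
keep = drop_singletons(fe)
print(keep)  # only the first two rows survive
```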
Bug Fixes:
- Missing variables in features were not always handled correctly in `predict()` with `newdata` not `None` in the presence of missing data, which would lead to an error. See #246 for details.
- Categorical variables were not always handled correctly in `predict()` with `newdata` not `None`, because the number of fixed effects levels in `newdata` might be smaller than in `data`. In consequence, some levels were not found, which led to an error. See #245 for details. Thanks to @jiafengkevinchen for the pointer!
- Multicollinearity checks for over-identified IV were not implemented correctly, which led to a dimension error. See #236 for details. Thanks to @jiafengkevinchen for the pointer!
- The number of degrees of freedom `k` was computed incorrectly if columns were dropped from the design matrix `X` in the presence of multicollinearity. See #235 for details. Thanks to @jiafengkevinchen for the pointer!
- If all variables were dropped due to multicollinearity, an unclear and imprecise error message was produced. See #228 for details. Thanks to @manferdinig for the pointer!
- If `fixef_rm = 'singleton'` was selected, `feols()` and `fepois()` would fail. This has been fixed. #192
Dependency Requirements
- For now, sets `formulaic` versions to be `0.6.6` or lower as version `1.0.0` seems to have introduced a problem with the `i()` operator. See #244 for details.
- Drops dependency on `pyhdfe`.
PyFixest 0.11.1
- Fixes some bugs around the computation of R-squared values (see issue #103).
- Reports R-squared values again when calling `.summary()`.
PyFixest 0.11.0
- Significant speedups for CRV1 inference.
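For reference, the CRV1 estimator sums the OLS score contributions within each cluster before forming the sandwich. A NumPy sketch with a common small-sample adjustment, G/(G-1) * (n-1)/(n-k); the helper `crv1_vcov` is hypothetical and not pyfixest's code.

```python
import numpy as np

def crv1_vcov(X, y, cluster):
    """CRV1 cluster-robust covariance matrix of OLS coefficients (sketch)."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    resid = y - X @ (bread @ (X.T @ y))
    scores = X * resid[:, None]  # n x k contributions x_i * e_i
    # sum the scores within each cluster
    _, groups = np.unique(cluster, return_inverse=True)
    G = groups.max() + 1
    cluster_scores = np.zeros((G, k))
    np.add.at(cluster_scores, groups, scores)
    meat = cluster_scores.T @ cluster_scores
    # small-sample adjustment G / (G - 1) * (n - 1) / (n - k)
    adj = G / (G - 1) * (n - 1) / (n - k)
    return adj * (bread @ meat @ bread)

rng = np.random.default_rng(0)
n = 400
cluster = rng.integers(0, 20, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
V = crv1_vcov(X, y, cluster)
print(np.sqrt(np.diag(V)))  # cluster-robust standard errors
```

With one cluster per observation, the adjustment reduces to n/(n-k) and the estimator coincides with the HC1 heteroskedasticity-robust covariance, which is a quick sanity check.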
PyFixest 0.10.12
Fixes a small bug with the separation check for Poisson regression #138.
PyFixest 0.10.11
Fixes bugs with i(var1, var2) syntax introduced with PyFixest 0.10.10.
PyFixest 0.10.10
Fixes a bug with variable interactions via `i(var)` syntax. See issue #221.
PyFixest 0.10.9
Makes `etable()` prettier and more informative.
PyFixest 0.10.8
Breaking changes
Reference levels for the `i()` formula syntax can no longer be set within the formula, but need to be added via the `i_ref1` function argument to either `feols()` or `fepois()`.
New feature
A `did2s()` function is added, which implements the 2-stage difference-in-differences procedure à la Gardner and follows the syntax of @kylebutts' `did2s` R package.
```python
from pyfixest.did.did import did2s
from pyfixest.estimation import feols
from pyfixest.visualize import iplot
import pandas as pd
import numpy as np

df_het = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/master/pyfixest/did/data/df_het.csv")

fit = did2s(
    df_het,
    yname = "dep_var",
    first_stage = "~ 0 | state + year",
    second_stage = "~i(rel_year)",
    treatment = "treat",
    cluster = "state",
    i_ref1 = [-1.0, np.inf],
)

fit_twfe = feols(
    "dep_var ~ i(rel_year) | state + year",
    df_het,
    i_ref1 = [-1.0, np.inf]
)

iplot([fit, fit_twfe], coord_flip=False, figsize = (900, 400), title = "TWFE vs DID2S")
```
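The idea behind Gardner's two-stage procedure: estimate unit and time fixed effects on untreated observations only, subtract them from the outcome, then regress the residualized outcome on the treatment indicator. A NumPy sketch of the point estimate for a static effect on simulated data; the helper `two_stage_did` and the simulated design are hypothetical, and the standard-error correction that `did2s` applies is omitted.

```python
import numpy as np

def two_stage_did(y, unit, time, treat):
    """Static ATT via a Gardner-style two-stage regression (point estimate only)."""
    n_units, n_periods = unit.max() + 1, time.max() + 1
    # unit dummies plus time dummies; the first period is dropped to avoid collinearity
    Z = np.hstack([np.eye(n_units)[unit], np.eye(n_periods)[time][:, 1:]])
    # stage 1: fit the fixed effects on untreated observations only
    untreated = treat == 0
    fe_coef = np.linalg.lstsq(Z[untreated], y[untreated], rcond=None)[0]
    # stage 2: regress the residualized outcome on the treatment indicator (no intercept)
    y_tilde = y - Z @ fe_coef
    return (treat @ y_tilde) / (treat @ treat)

rng = np.random.default_rng(1)
n_units, n_periods = 50, 10
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
# the first half of the units is treated from period 5 onwards
treat = ((unit < n_units // 2) & (time >= 5)).astype(float)
alpha = rng.normal(size=n_units)
gamma = rng.normal(size=n_periods)
y = alpha[unit] + gamma[time] + 2.0 * treat + rng.normal(scale=0.1, size=unit.size)

att = two_stage_did(y, unit, time, treat)
print(round(att, 2))  # close to the simulated effect of 2.0
```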
PyFixest 0.10.7
- Adds basic support for event study estimation via two-way fixed effects and Gardner’s two-stage “Did2s” approach. This is a beta version and experimental. Further updates (i.e. proper event studies vs “only” ATTs) and a more flexible did2s front end will follow in future releases.
```python
%load_ext autoreload
%autoreload 2

from pyfixest.did.did import event_study
import pyfixest as pf
import pandas as pd

df_het = pd.read_csv("pyfixest/did/data/df_het.csv")

fit_twfe = event_study(
    data = df_het,
    yname = "dep_var",
    idname= "state",
    tname = "year",
    gname = "g",
    estimator = "twfe"
)

fit_did2s = event_study(
    data = df_het,
    yname = "dep_var",
    idname= "state",
    tname = "year",
    gname = "g",
    estimator = "did2s"
)

pf.etable([fit_twfe, fit_did2s])
# | Coefficient   | est1             | est2             |
# |:--------------|:-----------------|:-----------------|
# | ATT           | 2.135*** (0.044) | 2.152*** (0.048) |
# Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
```
PyFixest 0.10.6
- Adds an `etable()` function that outputs markdown, latex or a `pd.DataFrame`.
PyFixest 0.10.5
- Fixes a bug in IV estimation that would trigger an error. See here for details. Thanks to @aeturrell for reporting!
PyFixest 0.10.4
- Implements a custom function to drop singleton fixed effects.
- Additional small performance improvements.
PyFixest 0.10.3
- Allows for white space in the multiway clustering formula.
- Adds documentation for multiway clustering.
PyFixest 0.10.2
- Adds support for two-way clustering.
- Adds support for CRV3 inference for Poisson regression.
PyFixest 0.10.1
- Adapts the internal fixed effects demeaning criterion to match `PyHDFE`'s default.
- Adds Styfen as coauthor.
PyFixest 0.10
- Multiple performance improvements.
- Most importantly, implements a custom demeaning algorithm in `numba`, thanks to Styfen Schaer (@styfenschaer), which leads to performance improvements of 5x or more:
```python
%load_ext autoreload
%autoreload 2

import numpy as np
import time
import pyhdfe
from pyfixest.demean import demean

np.random.seed(1238)
N = 10_000_000
x = np.random.normal(0, 1, 10*N).reshape((N,10))
f1 = np.random.choice(list(range(1000)), N).reshape((N,1))
f2 = np.random.choice(list(range(1000)), N).reshape((N,1))

flist = np.concatenate((f1, f2), axis = 1)
weights = np.ones(N)

algorithm = pyhdfe.create(flist)

start_time = time.time()
res_pyhdfe = algorithm.residualize(x)
end_time = time.time()
print(end_time - start_time)
# 26.04527711868286

start_time = time.time()
res_pyfixest, success = demean(x, flist, weights, tol = 1e-10)
# Calculate the execution time
end_time = time.time()
print(end_time - start_time)
# 4.334428071975708

np.allclose(res_pyhdfe, res_pyfixest)
# True
```
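One standard way to demean by several fixed effects is alternating projections: subtract the (weighted) group mean for each fixed effect in turn until the result stops changing. The plain NumPy sketch below illustrates that idea and cross-checks it against weighted least squares on dummy variables; `demean_alternating` is a hypothetical helper, not tuned for speed, and not the `numba` implementation benchmarked above.

```python
import numpy as np

def demean_alternating(x, flist, weights, tol=1e-10, maxiter=10_000):
    """Weighted demeaning by alternating projections (sketch). Returns (residuals, converged)."""
    x = np.array(x, dtype=float)  # work on a copy of the (n, k) input
    w = np.asarray(weights, dtype=float)
    # dense integer codes 0..L-1 for each fixed effect (column of flist)
    codes = [np.unique(flist[:, j], return_inverse=True)[1] for j in range(flist.shape[1])]
    for _ in range(maxiter):
        x_old = x.copy()
        for c in codes:
            wsum = np.bincount(c, weights=w)
            for col in range(x.shape[1]):
                # subtract the weighted mean of this column within each level
                group_means = np.bincount(c, weights=w * x[:, col]) / wsum
                x[:, col] -= group_means[c]
        if np.max(np.abs(x - x_old)) < tol:
            return x, True
    return x, False

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
flist = np.column_stack([rng.integers(0, 20, size=n), rng.integers(0, 15, size=n)])
weights = rng.uniform(0.5, 2.0, size=n)
res, success = demean_alternating(x, flist, weights)

# cross-check: residuals from weighted least squares on the two sets of dummies
Z = np.hstack([np.eye(20)[flist[:, 0]], np.eye(15)[flist[:, 1]]])
sw = np.sqrt(weights)[:, None]
coef = np.linalg.lstsq(Z * sw, x * sw, rcond=None)[0]
print(success, np.allclose(res, x - Z @ coef))
```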
PyFixest 0.9.11
- Bump required `formulaic` version to `0.6.5`.
- Stop copying the data frame in `fixef()`.
PyFixest 0.9.10
- Fixes a bug in the `wildboottest` method (see #158).
- Allows running a wild bootstrap after fixed effect estimation.
PyFixest 0.9.9
- Adds support for `wildboottest` for Python `3.11`.
PyFixest 0.9.8
- Fixes a couple more bugs in the `predict()` and `fixef()` methods.
- The `predict()` argument `data` is renamed to `newdata`.
PyFixest 0.9.7
Fixes a bug in `predict()` produced when multicollinear variables are dropped.
PyFixest 0.9.6
Improved Collinearity handling. See #145
PyFixest 0.9.5
- Moves plotting from `matplotlib` to `lets-plot`.
- Fixes a few minor bugs in plotting and the `fixef()` method.
PyFixest 0.9.1
Breaking API changes
It is no longer required to initiate an object of type `Fixest` prior to running [`Feols`](/reference/Feols.qmd) or `fepois`. Instead, you can now simply use `feols()` and `fepois()` as functions, just as in `fixest`. Both functions can be found in the `estimation` module and take a `pd.DataFrame` as a function argument:
```python
from pyfixest.estimation import feols, fepois
from pyfixest.utils import get_data

data = get_data()
fit = feols("Y ~ X1 | f1", data = data, vcov = "iid")
```
Calling `feols()` will return an instance of class [`Feols`](/reference/Feols.qmd), while calling `fepois()` will return an instance of class `Fepois`. Multiple estimation syntax will return an instance of class `FixestMulti`.
Post processing works as before via `.summary()`, `.tidy()` and other methods.
New Features
A summary function allows comparing multiple models:
```python
from pyfixest.summarize import summary

fit2 = feols("Y ~ X1 + X2 | f1", data = data, vcov = "iid")
summary([fit, fit2])
```
Visualization is possible via custom methods (`.iplot()` & `.coefplot()`), but a new module allows visualizing a list of [`Feols`](/reference/Feols.qmd) and/or `Fepois` instances:
```python
from pyfixest.visualize import coefplot, iplot

coefplot([fit, fit2])
```
The documentation has been improved (though there is still room for progress), and the code has been cleaned up a bit (also lots of room for improvements).