import numpy as np
import pandas as pd
import pylatex as pl # for the latex table; note: not a dependency of pyfixest - needs manual installation
from great_tables import loc, style
from IPython.display import FileLink, display
import pyfixest as pf
%load_ext autoreload
%autoreload 2
data = pf.get_data()
fit1 = pf.feols("Y ~ X1 + X2 | f1", data=data)
fit2 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=data)
fit3 = pf.feols("Y ~ X1 * X2 | f1 + f2", data=data)
fit4 = pf.feols("Y2 ~ X1 + X2 | f1", data=data)
fit5 = pf.feols("Y2 ~ X1 + X2 | f1 + f2", data=data)
fit6 = pf.feols("Y2 ~ X1 * X2 | f1 + f2", data=data)
Regression Tables via pf.etable()
Table Layout with PyFixest
PyFixest comes with functions to generate publication-ready tables. Regression tables are generated with pf.etable(), which supports several output formats, for instance Great Tables objects or formatted LaTeX tables based on booktabs. In addition, pf.dtable() displays descriptive statistics and pf.make_table() generates formatted tables from pandas dataframes in the same layout.
To begin, we load some libraries and fit a set of regression models.
Basic Usage
We can compare all regression models with the pf.etable() function:
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6])
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
X2 | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
X1:X2 | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
You can also estimate and display multiple regressions with one line of code using the (py)fixest stepwise notation:
pf.etable(pf.feols("Y+Y2~csw(X1,X2,X1:X2)", data=data))
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -1.000*** (0.085) | -0.993*** (0.082) | -0.992*** (0.082) | -1.322*** (0.215) | -1.316*** (0.214) | -1.316*** (0.215) |
X2 | | -0.176*** (0.022) | -0.197*** (0.036) | | -0.133* (0.057) | -0.132 (0.095) |
X1:X2 | | | 0.020 (0.027) | | | -0.001 (0.071) |
Intercept | 0.919*** (0.112) | 0.889*** (0.108) | 0.888*** (0.108) | 1.064*** (0.283) | 1.042*** (0.283) | 1.042*** (0.283) |
stats | ||||||
Observations | 998 | 998 | 998 | 999 | 999 | 999 |
S.E. type | iid | iid | iid | iid | iid | iid |
R2 | 0.123 | 0.177 | 0.177 | 0.037 | 0.042 | 0.042 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
Keep and drop variables
pf.etable() supports a few customizations out of the box. For example, the keep argument restricts the table to the variables whose names match the provided regular expression.
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], keep="X1")
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
X1:X2 | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
We can use the exact_match argument to select a specific set of variables:
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], keep=["X1", "X2"], exact_match=True)
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
X2 | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
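The difference between pattern matching and exact matching can be illustrated with Python's re module. The snippet below is only a sketch of the selection logic applied to a list of coefficient names. It is not pyfixest's internal code, and both the helper select_coefs and the use of re.search are assumptions made for illustration.

```python
import re


def select_coefs(names, keep, exact_match=False):
    """Hypothetical helper: return the coefficient names selected by `keep`."""
    patterns = [keep] if isinstance(keep, str) else list(keep)
    if exact_match:
        # Keep only the names that are literally listed.
        return [name for name in names if name in patterns]
    # Otherwise treat every entry as a regular expression.
    return [name for name in names if any(re.search(p, name) for p in patterns)]


coef_names = ["X1", "X2", "X1:X2"]
print(select_coefs(coef_names, "X1"))                            # ['X1', 'X1:X2']
print(select_coefs(coef_names, ["X1", "X2"], exact_match=True))  # ['X1', 'X2']
```

Under these assumptions the pattern "X1" also selects the interaction X1:X2, which mirrors the keep="X1" table above, whereas exact matching returns only the listed names.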
We can also easily drop variables via the drop argument:
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], drop=["X1"])
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X2 | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
Hide fixed effects or SE-type rows
We can hide the rows showing the fixed effects and the S.E. type by setting show_fe=False and show_se_type=False (for instance, when the set of fixed effects or the estimation method for the standard errors is the same for all models and you want to describe this in the text or table notes rather than in the table).
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], show_fe=False, show_se_type=False)
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
X2 | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
X1:X2 | | | 0.011 (0.018) | | | -0.041 (0.081) |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
Display p-values or confidence intervals
By default, pf.etable() reports standard errors. We can also report p-values or confidence intervals via the coef_fmt argument.
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], coef_fmt="b \n (se) \n [p]")
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.950*** (0.067) [0.000] | -0.924*** (0.061) [0.000] | -0.924*** (0.061) [0.000] | -1.267*** (0.174) [0.000] | -1.232*** (0.192) [0.000] | -1.231*** (0.192) [0.000] |
X2 | -0.174*** (0.018) [0.000] | -0.174*** (0.015) [0.000] | -0.185*** (0.025) [0.000] | -0.131** (0.042) [0.005] | -0.118** (0.042) [0.008] | -0.074 (0.104) [0.482] |
X1:X2 | | | 0.011 (0.018) [0.565] | | | -0.041 (0.081) [0.618] |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) [p-value] |
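To make the role of the format string more concrete, here is a minimal sketch of how a template such as "b \n (se) \n [p]" could be filled with numbers. The helper render_cell is hypothetical and only handles the tokens b, se and p; it ignores significance stars and is not how pyfixest implements coef_fmt.

```python
import re


def render_cell(coef_fmt, values, digits=3):
    """Hypothetical helper: fill a coef_fmt-style template with rounded numbers."""

    def substitute(match):
        # Look up the token (b, se or p) and format it with `digits` decimals.
        return f"{values[match.group(0)]:.{digits}f}"

    # Replace the whole-word tokens b, se and p; all other text is kept as is.
    return re.sub(r"\b(?:b|se|p)\b", substitute, coef_fmt)


# Illustrative values, taken from the X2 row of model (4) above
# (the p-value is the rounded one displayed in the table).
cell = render_cell("b \n (se) \n [p]", {"b": -0.13056, "se": 0.04239, "p": 0.005})
print(cell)
```

The brackets and line breaks in the template are passed through unchanged, which is why the p-value ends up in square brackets below the standard error.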
Significance levels and rounding
Additionally, we can override the defaults for the reported significance levels and control the rounding of results via the signif_code and digits arguments:
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], signif_code=[0.01, 0.05, 0.1], digits=5)
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.94953*** (0.06652) | -0.92405*** (0.06093) | -0.92417*** (0.06094) | -1.26655*** (0.17359) | -1.23153*** (0.19228) | -1.23100*** (0.19167) |
X2 | -0.17423*** (0.01840) | -0.17411*** (0.01461) | -0.18550*** (0.02516) | -0.13056*** (0.04239) | -0.11767*** (0.04152) | -0.07369 (0.10356) |
X1:X2 | | | 0.01057 (0.01818) | | | -0.04082 (0.08093) |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.48899 | 0.65904 | 0.65916 | 0.12017 | 0.17151 | 0.17180 |
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error) |
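The effect of signif_code can be sketched as a simple threshold lookup. The helper stars below is hypothetical; it assumes that the three thresholds correspond to three, two and one star in ascending order, which is what the table notes above suggest.

```python
def stars(p_value, signif_code=(0.001, 0.01, 0.05)):
    """Hypothetical helper: map a p-value to significance stars."""
    strict, medium, loose = sorted(signif_code)
    if p_value < strict:
        return "***"
    if p_value < medium:
        return "**"
    if p_value < loose:
        return "*"
    return ""


print(repr(stars(0.005)))                     # '**'
print(repr(stars(0.005, [0.01, 0.05, 0.1])))  # '***'
print(repr(stars(0.482, [0.01, 0.05, 0.1])))  # ''
```

This is consistent with the X2 coefficient of model (4), whose reported p-value of 0.005 earns two stars under the default thresholds and three stars under signif_code=[0.01, 0.05, 0.1].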
Other output formats
By default, pf.etable() returns a GT object (see the Great Tables package), but you can also opt for dataframe, markdown, or LaTeX output via the type argument.
# Pandas styler output:
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    signif_code=[0.01, 0.05, 0.1],
    digits=5,
    coef_fmt="b (se)",
    type="df",
)
est1 | est2 | est3 | est4 | est5 | est6 | |
---|---|---|---|---|---|---|
depvar | Y | Y | Y | Y2 | Y2 | Y2 |
X1 | -0.94953*** (0.06652) | -0.92405*** (0.06093) | -0.92417*** (0.06094) | -1.26655*** (0.17359) | -1.23153*** (0.19228) | -1.23100*** (0.19167) |
X2 | -0.17423*** (0.01840) | -0.17411*** (0.01461) | -0.18550*** (0.02516) | -0.13056*** (0.04239) | -0.11767*** (0.04152) | -0.07369 (0.10356) |
X1:X2 | | | 0.01057 (0.01818) | | | -0.04082 (0.08093) |
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.48899 | 0.65904 | 0.65916 | 0.12017 | 0.17151 | 0.17180 |
# Markdown output:
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    signif_code=[0.01, 0.05, 0.1],
    digits=5,
    type="md",
)
index est1 est2 est3 est4 est5 est6
------------ ------------ ------------ ------------ ------------ ------------ ------------
depvar Y Y Y Y2 Y2 Y2
------------------------------------------------------------------------------------------------
X1 -0.94953*** -0.92405*** -0.92417*** -1.26655*** -1.23153*** -1.23100***
(0.06652) (0.06093) (0.06094) (0.17359) (0.19228) (0.19167)
X2 -0.17423*** -0.17411*** -0.18550*** -0.13056*** -0.11767*** -0.07369
(0.01840) (0.01461) (0.02516) (0.04239) (0.04152) (0.10356)
X1:X2 0.01057 -0.04082
(0.01818) (0.08093)
------------------------------------------------------------------------------------------------
f2 - x x - x x
f1 x x x x x x
------------------------------------------------------------------------------------------------
Observations 997 997 997 998 998 998
S.E. type by: f1 by: f1 by: f1 by: f1 by: f1 by: f1
R2 0.48899 0.65904 0.65916 0.12017 0.17151 0.17180
------------------------------------------------------------------------------------------------
To obtain LaTeX output, use type="tex". If you want to save the table as a tex file, you can use the filename= argument to specify the path where it should be saved. If you want the LaTeX code to be displayed in the notebook, you can use the print_tex=True argument. etable uses the LaTeX packages booktabs, threeparttable, and makecell for the table layout, so don't forget to include these packages in your LaTeX document.
# LaTeX output (include latex packages booktabs, threeparttable, and makecell in your document):
tab = pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    signif_code=[0.01, 0.05, 0.1],
    digits=2,
    type="tex",
    print_tex=True,
)
The following code generates a PDF containing the regression table, which you can open by clicking on the link below the cell:
## Use pylatex to create a tex file with the table
def make_pdf(tab, file):
    "Create a PDF document with tex table."
    doc = pl.Document()
    doc.packages.append(pl.Package("booktabs"))
    doc.packages.append(pl.Package("threeparttable"))
    doc.packages.append(pl.Package("makecell"))

    with (
        doc.create(pl.Section("A PyFixest LaTeX Table")),
        doc.create(pl.Table(position="htbp")) as table,
    ):
        table.append(pl.NoEscape(tab))

    doc.generate_pdf(file, clean_tex=False)

# Compile latex to pdf & display a button with the hyperlink to the pdf
# requires tex installation
run = False
if run:
    make_pdf(tab, "latexdocs/SampleTableDoc")
    display(FileLink("latexdocs/SampleTableDoc.pdf"))
Rename variables
You can also rename variables for more readable output. Just pass a dictionary to the labels argument. Note that interaction terms are also relabeled using the labels specified for the interacted variables (if you want to label an interaction term differently, add it to the dictionary).
labels = {
    "Y": "Wage",
    "Y2": "Wealth",
    "X1": "Age",
    "X2": "Years of Schooling",
    "f1": "Industry",
    "f2": "Year",
}
pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], labels=labels)
Wage | Wealth | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
Age × Years of Schooling | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
Year | - | x | x | - | x | x |
Industry | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
If you want to label the rows indicating the inclusion of fixed effects with a custom label instead of the variable label, you can pass a separate dictionary to the felabels argument.
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    felabels={"f1": "Industry Fixed Effects", "f2": "Year Fixed Effects"},
)
Wage | Wealth | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
Age × Years of Schooling | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
Year Fixed Effects | - | x | x | - | x | x |
Industry Fixed Effects | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
Custom model headlines
You can also add custom headers for each model by passing a list of strings to the model_heads argument.
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    model_heads=["US", "China", "EU", "US", "China", "EU"],
)
Wage | Wealth | |||||
---|---|---|---|---|---|---|
US | China | EU | US | China | EU | |
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
Age × Years of Schooling | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
Year | - | x | x | - | x | x |
Industry | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
You can also change the ordering of the header rows using the head_order argument: “hd” shows headlines first and then dependent variables, “dh” shows dependent variables first and then headlines. Assigning “d” or “h” shows only dependent variables or only headlines. When head_order=“”, only model numbers are shown.
pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="hd",
)
US | China | EU | ||||
---|---|---|---|---|---|---|
Wage | Wealth | Wage | Wealth | Wage | Wealth | |
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -1.267*** (0.174) | -0.924*** (0.061) | -1.232*** (0.192) | -0.924*** (0.061) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.131** (0.042) | -0.174*** (0.015) | -0.118** (0.042) | -0.185*** (0.025) | -0.074 (0.104) |
Age × Years of Schooling | | | | | 0.011 (0.018) | -0.041 (0.081) |
fe | ||||||
Year | - | - | x | x | x | x |
Industry | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 998 | 997 | 998 | 997 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.120 | 0.659 | 0.172 | 0.659 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
Remove the dependent variables from the headers:
pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="",
)
(1) | (2) | (3) | (4) | (5) | (6) | |
---|---|---|---|---|---|---|
coef | ||||||
Age | -0.950*** (0.067) | -1.267*** (0.174) | -0.924*** (0.061) | -1.232*** (0.192) | -0.924*** (0.061) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.131** (0.042) | -0.174*** (0.015) | -0.118** (0.042) | -0.185*** (0.025) | -0.074 (0.104) |
Age × Years of Schooling | | | | | 0.011 (0.018) | -0.041 (0.081) |
fe | ||||||
Year | - | - | x | x | x | x |
Industry | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 998 | 997 | 998 | 997 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.120 | 0.659 | 0.172 | 0.659 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
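The head_order options can be summarized with a small lookup. The function header_rows is a hypothetical illustration of the rule described above, assuming that the row with model numbers always comes last, as in the tables shown here; it does not call pyfixest.

```python
def header_rows(head_order):
    """Hypothetical helper: list the header rows from top to bottom."""
    names = {"h": "headlines", "d": "dependent variables"}
    rows = [names[letter] for letter in head_order]
    rows.append("model numbers")  # assumed to be shown below all other headers
    return rows


for option in ["dh", "hd", "d", "h", ""]:
    print(repr(option), "->", header_rows(option))
```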
Further custom model information
You can add further custom model statistics or information to the bottom of the table with the custom_model_stats argument, to which you pass a dictionary mapping row names to lists of values. The length of each list must equal the number of models.
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    custom_model_stats={
        "Number of Clusters": [42, 42, 42, 37, 37, 37],
        "Additional Info": ["A", "A", "B", "B", "C", "C"],
    },
)
Wage | Wealth | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
Age × Years of Schooling | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
Year | - | x | x | - | x | x |
Industry | x | x | x | x | x | x |
stats | ||||||
Number of Clusters | 42 | 42 | 42 | 37 | 37 | 37 |
Additional Info | A | A | B | B | C | C |
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
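Because every custom row needs exactly one value per model, it can help to check the dictionary before building the table. The following check is a sketch with a hypothetical helper name; it is independent of pyfixest and only inspects the dictionary.

```python
def check_custom_model_stats(custom_model_stats, n_models):
    """Hypothetical helper: verify that every row has one value per model."""
    for row_name, values in custom_model_stats.items():
        if len(values) != n_models:
            raise ValueError(
                f"Row '{row_name}' has {len(values)} values, expected {n_models}."
            )


stats_rows = {
    "Number of Clusters": [42, 42, 42, 37, 37, 37],
    "Additional Info": ["A", "A", "B", "B", "C", "C"],
}
check_custom_model_stats(stats_rows, n_models=6)  # passes silently
```

A row with five values for six models would raise a ValueError naming the offending row.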
Custom table notes
You can replace the default table notes with your own notes using the notes argument.
mynotes = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="hd",
    notes=mynotes,
)
US | China | EU | ||||
---|---|---|---|---|---|---|
Wage | Wealth | Wage | Wealth | Wage | Wealth | |
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -1.267*** (0.174) | -0.924*** (0.061) | -1.232*** (0.192) | -0.924*** (0.061) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.131** (0.042) | -0.174*** (0.015) | -0.118** (0.042) | -0.185*** (0.025) | -0.074 (0.104) |
Age × Years of Schooling | | | | | 0.011 (0.018) | -0.041 (0.081) |
fe | ||||||
Year | - | - | x | x | x | x |
Industry | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 998 | 997 | 998 | 997 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.120 | 0.659 | 0.172 | 0.659 | 0.172 |
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. |
Publication-ready LaTeX tables
With a few lines of code, you thus obtain a publication-ready LaTeX table:
tab = pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="hd",
    type="tex",
    notes=mynotes,
    show_fe=True,
    show_se_type=False,
    custom_model_stats={
        "Number of Clusters": [42, 42, 42, 37, 37, 37],
    },
)
# Compile latex to pdf & display a button with the hyperlink to the pdf
run = False
if run:
    make_pdf(tab, "latexdocs/SampleTableDoc2")
    display(FileLink("latexdocs/SampleTableDoc2.pdf"))
Rendering Tables in Quarto
When you use Quarto, you can include LaTeX tables generated by pyfixest when rendering the qmd file as PDF. Just specify output: asis in the code block options of the respective chunk and print the LaTeX string returned by etable. Don't forget to include the \usepackage commands for the necessary LaTeX packages in the YAML block. Here you find a sample qmd file.
When you render either a Jupyter notebook or a qmd file to HTML, it is advisable to turn html-table-processing off in Quarto, as Quarto otherwise adds further formatting which alters how your tables look. You can do this in a raw cell at the top of your document.
---
format:
html:
html-table-processing: none
---
Descriptive Statistics via pf.dtable()
The function pf.dtable() displays descriptive statistics for a set of variables in the same layout.
Basic Usage of dtable
Specify the variables you want to display the descriptive statistics for. You can also use a dictionary to rename the variables and add a caption.
pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    caption="Descriptive statistics",
    digits=2,
)
Descriptive statistics | |||
N | Mean | Std. Dev. | |
---|---|---|---|
Wage | 999 | -0.13 | 2.30 |
Wealth | 1000 | -0.31 | 5.58 |
Age | 999 | 1.04 | 0.81 |
Years of Schooling | 1000 | -0.13 | 3.05 |
Choose the set of statistics to be displayed with stats. You can use any pandas aggregation functions.
pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    stats=["count", "mean", "std", "min", "max"],
    labels=labels,
    caption="Descriptive statistics",
)
Descriptive statistics | |||||
N | Mean | Std. Dev. | Min | Max | |
---|---|---|---|---|---|
Wage | 999 | -0.13 | 2.30 | -6.54 | 6.91 |
Wealth | 1000 | -0.31 | 5.58 | -16.97 | 17.16 |
Age | 999 | 1.04 | 0.81 | 0.00 | 2.00 |
Years of Schooling | 1000 | -0.13 | 3.05 | -9.67 | 10.99 |
Summarize by characteristics in columns and rows
You can summarize by characteristics using the bycol argument when groups are to be displayed in columns. When the number of observations is the same for all variables in a group, you can also opt to display the number of observations only once for each group in a separate line at the bottom of the table with counts_row_below=True.
# Generate some categorical data
data["country"] = np.random.choice(["US", "EU"], data.shape[0])
data["occupation"] = np.random.choice(["Blue collar", "White collar"], data.shape[0])
# Drop nan values to have balanced data
data.dropna(inplace=True)
pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country", "occupation"],
    stats=["count", "mean", "std"],
    caption="Descriptive statistics",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
)
Descriptive statistics | ||||||||
EU | US | |||||||
---|---|---|---|---|---|---|---|---|
Blue collar | White collar | Blue collar | White collar | |||||
Mean | Std. Dev. | Mean | Std. Dev. | Mean | Std. Dev. | Mean | Std. Dev. | |
stats | ||||||||
Wage | -0.13 | 2.41 | 0.13 | 2.27 | -0.04 | 2.31 | -0.43 | 2.22 |
Wealth | -1.02 | 5.35 | 0.03 | 5.53 | 0.38 | 5.68 | -0.62 | 5.72 |
Age | 1.06 | 0.77 | 0.99 | 0.81 | 1.00 | 0.81 | 1.12 | 0.83 |
Years of Schooling | 0.10 | 3.07 | -0.25 | 3.11 | -0.20 | 2.96 | -0.16 | 3.05 |
nobs | ||||||||
Number of observations | 245 | 248 | 235 | 269 | ||||
You can also use custom aggregation functions to compute further statistics or affect how statistics are presented. Pyfixest provides two such functions, mean_std and mean_newline_std, which compute the mean and standard deviation and display both in the same cell (without or with a line break between them). This allows for more compact tables when you want to show statistics for many characteristics in the columns.
You can also hide the statistics labels in the header with hide_stats=True. In that case, a table note naming the displayed statistics by their labels is added (unless you have provided a custom note).
pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country", "occupation"],
    stats=["mean_newline_std", "count"],
    caption="Descriptive statistics",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    hide_stats=True,
)
Descriptive statistics | ||||
EU | US | |||
---|---|---|---|---|
Blue collar | White collar | Blue collar | White collar | |
stats | ||||
Wage | -0.13 (2.41) | 0.13 (2.27) | -0.04 (2.31) | -0.43 (2.22) |
Wealth | -1.02 (5.35) | 0.03 (5.53) | 0.38 (5.68) | -0.62 (5.72) |
Age | 1.06 (0.77) | 0.99 (0.81) | 1.00 (0.81) | 1.12 (0.83) |
Years of Schooling | 0.10 (3.07) | -0.25 (3.11) | -0.20 (2.96) | -0.16 (3.05) |
nobs | ||||
Number of observations | 245 | 248 | 235 | 269 |
Note: Displayed statistics are Mean (Std. Dev.). |
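The idea behind such combined statistics can be sketched with the standard library. The function below is a hypothetical stand-in, not pyfixest's own mean_std; it assumes the sample standard deviation (dividing by n - 1), which is also the pandas default for std.

```python
import statistics


def mean_std(values, digits=2, newline=False):
    """Hypothetical helper: format 'mean (std)' for a single table cell."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    separator = "\n" if newline else " "
    return f"{mean:.{digits}f}{separator}({std:.{digits}f})"


print(mean_std([1, 2, 3, 4]))  # 2.50 (1.29)
```

Setting newline=True puts the standard deviation on its own line, which corresponds to the stacked cell layout of the table above.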
You can also split by characteristics in both columns and rows. Note that you can only use one grouping variable in rows, but several in columns (as shown above).
pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country"],
    byrow="occupation",
    stats=["count", "mean", "std"],
    caption="Descriptive statistics",
)
Descriptive statistics | ||||||
EU | US | |||||
---|---|---|---|---|---|---|
N | Mean | Std. Dev. | N | Mean | Std. Dev. | |
Blue collar | ||||||
Wage | 245 | -0.13 | 2.41 | 235 | -0.04 | 2.31 |
Wealth | 245 | -1.02 | 5.35 | 235 | 0.38 | 5.68 |
Age | 245 | 1.06 | 0.77 | 235 | 1.00 | 0.81 |
Years of Schooling | 245 | 0.10 | 3.07 | 235 | -0.20 | 2.96 |
White collar | ||||||
Wage | 248 | 0.13 | 2.27 | 269 | -0.43 | 2.22 |
Wealth | 248 | 0.03 | 5.53 | 269 | -0.62 | 5.72 |
Age | 248 | 0.99 | 0.81 | 269 | 1.12 | 0.83 |
Years of Schooling | 248 | -0.25 | 3.11 | 269 | -0.16 | 3.05 |
You can also export descriptive statistics tables to LaTeX:
dtab = pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country"],
    byrow="occupation",
    stats=["count", "mean", "std"],
    type="tex",
)
run = False
if run:
    make_pdf(dtab, "latexdocs/SampleTableDoc3")
    display(FileLink("latexdocs/SampleTableDoc3.pdf"))
Table Layout for DataFrames with pf.make_table()
pf.make_table() is called by pf.etable() and pf.dtable() to generate the tables in “gt” and “tex” format. You can also use it directly to generate tables in the same layout from other pandas dataframes.
Basic Usage of make_table
df = pd.DataFrame(np.random.randn(4, 4).round(2), columns=["A", "B", "C", "D"])
# Make Booktabs style table
pf.make_table(df=df, caption="This is a caption", notes="These are notes")
This is a caption | ||||
A | B | C | D | |
---|---|---|---|---|
0 | 0.98 | 3.24 | 0.47 | 1.4 |
1 | -0.37 | -0.2 | -0.75 | 0.0 |
2 | -0.26 | 0.91 | -0.29 | 0.4 |
3 | -1.25 | -0.08 | 0.26 | 0.14 |
These are notes |
MultiIndex DataFrames
When the dataframe has a multiindex for the columns, column spanners are generated from the index. The row index can also be a multiindex (of at most two levels). In this case, the first index level is used to generate group rows (for instance, using the index name as headline and separating the groups by a horizontal line) and the second index level is used to generate the row labels.
# Create a multiindex dataframe with random data
row_index = pd.MultiIndex.from_tuples(
    [
        ("Group 1", "Variable 1"),
        ("Group 1", "Variable 2"),
        ("Group 1", "Variable 3"),
        ("Group 2", "Variable 4"),
        ("Group 2", "Variable 5"),
        ("Group 3", "Variable 6"),
    ]
)
col_index = pd.MultiIndex.from_product([["A", "B"], ["X", "Y"], ["High", "Low"]])
df = pd.DataFrame(np.random.randn(6, 8).round(3), index=row_index, columns=col_index)
pf.make_table(df=df, caption="This is a caption", notes="These are notes")
This is a caption | ||||||||
A | B | |||||||
---|---|---|---|---|---|---|---|---|
X | Y | X | Y | |||||
High | Low | High | Low | High | Low | High | Low | |
Group 1 | ||||||||
Variable 1 | 0.408 | -1.374 | 1.978 | -0.279 | 0.669 | -1.898 | -0.293 | -0.294 |
Variable 2 | 0.624 | -0.189 | 1.039 | 1.474 | -0.31 | 1.563 | 0.188 | 0.675 |
Variable 3 | -1.8 | 0.463 | 0.119 | 1.157 | 0.357 | 0.179 | -0.095 | -1.205 |
Group 2 | ||||||||
Variable 4 | -0.164 | 0.28 | -0.251 | 1.021 | 0.571 | -0.227 | -0.272 | 1.251 |
Variable 5 | -1.472 | 0.089 | -0.751 | -1.83 | -0.329 | -0.204 | -2.019 | -1.371 |
Group 3 | ||||||||
Variable 6 | 1.524 | -0.811 | 1.433 | 0.694 | -1.267 | -0.004 | 0.237 | 0.35 |
These are notes |
You can also hide the row group names: this creates a table where the variables on the second level of the row index are displayed in groups based on the first level, separated by horizontal lines.
pf.make_table(
    df=df, caption="This is a caption", notes="These are notes", rgroup_display=False
).tab_style(style=style.text(style="italic"), locations=loc.body(rows=[1, 5]))
This is a caption | ||||||||
A | B | |||||||
---|---|---|---|---|---|---|---|---|
X | Y | X | Y | |||||
High | Low | High | Low | High | Low | High | Low | |
Group 1 | ||||||||
Variable 1 | 0.408 | -1.374 | 1.978 | -0.279 | 0.669 | -1.898 | -0.293 | -0.294 |
Variable 2 | 0.624 | -0.189 | 1.039 | 1.474 | -0.31 | 1.563 | 0.188 | 0.675 |
Variable 3 | -1.8 | 0.463 | 0.119 | 1.157 | 0.357 | 0.179 | -0.095 | -1.205 |
Group 2 | ||||||||
Variable 4 | -0.164 | 0.28 | -0.251 | 1.021 | 0.571 | -0.227 | -0.272 | 1.251 |
Variable 5 | -1.472 | 0.089 | -0.751 | -1.83 | -0.329 | -0.204 | -2.019 | -1.371 |
Group 3 | ||||||||
Variable 6 | 1.524 | -0.811 | 1.433 | 0.694 | -1.267 | -0.004 | 0.237 | 0.35 |
These are notes |
Custom Styling with Great Tables
You can use the rich set of methods offered by Great Tables to further customize the table display when the type is “gt”.
Example Styling
(
    pf.etable([fit1, fit2, fit3, fit4, fit5, fit6])
    .tab_options(
        column_labels_background_color="cornsilk",
        stub_background_color="whitesmoke",
    )
    .tab_style(
        style=style.fill(color="mistyrose"),
        locations=loc.body(columns="(3)", rows=["X2"]),
    )
)
Y | Y2 | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
X1 | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
X2 | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
X1:X2 | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
f2 | - | x | x | - | x | x |
f1 | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
Defining Table Styles: Some Examples
You can define table styles once and apply them to all tables in your project. Define a dictionary with the desired values for the tab options (see the Great Tables documentation) and apply it with .tab_options(**style_dict).
style_print = {
    "table_font_size": "12px",
    "heading_title_font_size": "12px",
    "source_notes_font_size": "8px",
    "data_row_padding": "3px",
    "column_labels_padding": "3px",
    "row_group_border_top_style": "hidden",
    "table_body_border_top_style": "None",
    "table_body_border_bottom_width": "1px",
    "column_labels_border_top_width": "1px",
    "table_width": "14cm",
}
style_presentation = {
    "table_font_size": "16px",
    "table_font_color_light": "white",
    "table_body_border_top_style": "hidden",
    "table_body_border_bottom_style": "hidden",
    "heading_title_font_size": "18px",
    "source_notes_font_size": "12px",
    "data_row_padding": "3px",
    "column_labels_padding": "6px",
    "column_labels_background_color": "midnightblue",
    "stub_background_color": "whitesmoke",
    "row_group_background_color": "whitesmoke",
    "table_background_color": "whitesmoke",
    "heading_background_color": "white",
    "source_notes_background_color": "white",
    "column_labels_border_bottom_color": "white",
    "column_labels_font_weight": "bold",
    "row_group_font_weight": "bold",
    "table_width": "18cm",
}
t1 = pf.dtable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    stats=["count", "mean", "std", "min", "max"],
    labels=labels,
    caption="Descriptive statistics",
)
t2 = pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    show_se=False,
    felabels={"f1": "Industry Fixed Effects", "f2": "Year Fixed Effects"},
    caption="Regression results",
)
display(t1.tab_options(**style_print))
display(t2.tab_options(**style_print))
Descriptive statistics | |||||
N | Mean | Std. Dev. | Min | Max | |
---|---|---|---|---|---|
Wage | 997 | -0.13 | 2.31 | -6.54 | 6.91 |
Wealth | 997 | -0.32 | 5.59 | -16.97 | 17.16 |
Age | 997 | 1.04 | 0.81 | 0.00 | 2.00 |
Years of Schooling | 997 | -0.13 | 3.05 | -9.67 | 10.99 |
Regression results | ||||||
Wage | Wealth | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
Age × Years of Schooling | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
Year Fixed Effects | - | x | x | - | x | x |
Industry Fixed Effects | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
style_printDouble = {
    "table_font_size": "12px",
    "heading_title_font_size": "12px",
    "source_notes_font_size": "8px",
    "data_row_padding": "3px",
    "column_labels_padding": "3px",
    "table_body_border_bottom_style": "double",
    "column_labels_border_top_style": "double",
    "column_labels_border_bottom_width": "0.5px",
    "row_group_border_top_style": "hidden",
    "table_body_border_top_style": "None",
    "table_width": "14cm",
}
display(t1.tab_options(**style_printDouble))
display(t2.tab_options(**style_printDouble))
Descriptive statistics | |||||
N | Mean | Std. Dev. | Min | Max | |
---|---|---|---|---|---|
Wage | 997 | -0.13 | 2.31 | -6.54 | 6.91 |
Wealth | 997 | -0.32 | 5.59 | -16.97 | 17.16 |
Age | 997 | 1.04 | 0.81 | 0.00 | 2.00 |
Years of Schooling | 997 | -0.13 | 3.05 | -9.67 | 10.99 |
Regression results | ||||||
Wage | Wealth | |||||
---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |
coef | ||||||
Age | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) | -1.267*** (0.174) | -1.232*** (0.192) | -1.231*** (0.192) |
Years of Schooling | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) | -0.131** (0.042) | -0.118** (0.042) | -0.074 (0.104) |
Age × Years of Schooling | | | 0.011 (0.018) | | | -0.041 (0.081) |
fe | ||||||
Year Fixed Effects | - | x | x | - | x | x |
Industry Fixed Effects | x | x | x | x | x | x |
stats | ||||||
Observations | 997 | 997 | 997 | 998 | 998 | 998 |
S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
R2 | 0.489 | 0.659 | 0.659 | 0.120 | 0.172 | 0.172 |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) |
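The style_print and style_printDouble dictionaries above share most of their entries. One possible way to avoid repeating them is to keep the shared options in a base dictionary and build each variant with dict unpacking. The sketch below is plain Python and does not depend on pyfixest or Great Tables; the name base_print is a hypothetical helper introduced here for illustration, and the option values are copied from the dictionaries above.

```python
# Options shared by both print styles (values copied from the dictionaries above).
# `base_print` is a hypothetical name, not part of pyfixest or Great Tables.
base_print = {
    "table_font_size": "12px",
    "heading_title_font_size": "12px",
    "source_notes_font_size": "8px",
    "data_row_padding": "3px",
    "column_labels_padding": "3px",
    "row_group_border_top_style": "hidden",
    "table_body_border_top_style": "None",
    "table_width": "14cm",
}

# Single-rule variant: the shared options plus two border-width settings.
style_print = {
    **base_print,
    "table_body_border_bottom_width": "1px",
    "column_labels_border_top_width": "1px",
}

# Double-rule variant: the shared options plus the double-border settings.
style_printDouble = {
    **base_print,
    "table_body_border_bottom_style": "double",
    "column_labels_border_top_style": "double",
    "column_labels_border_bottom_width": "0.5px",
}
```

Either dictionary can then be passed exactly as before, for example t1.tab_options(**style_print). Later keys override earlier ones in a dict literal, so a variant can also redefine a shared option by listing it after **base_print.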