Regression Tables

Publication-ready tables with Great Tables or LaTeX booktabs — customize labels, significance stars, and output formats.

Table Layout with PyFixest

Migration Notice

Starting with pyfixest 0.41.0 (currently in development), the table functionality is powered by maketables. The pf.etable() API remains unchanged. pf.dtable() is deprecated (use DTable() directly) and pf.make_table() has been removed (use maketables.MTable() directly).

PyFixest comes with functions to generate publication-ready tables. Regression tables are generated with pf.etable(), which supports several output formats, for instance Great Tables objects or formatted LaTeX tables using booktabs. Descriptive statistics tables can be created with DTable() and custom tables with maketables.MTable().

To begin, we load some libraries and fit a set of regression models.

import numpy as np
import pandas as pd
import pylatex as pl  # for the latex table; note: not a dependency of pyfixest - needs manual installation
from maketables import DTable
from great_tables import loc, style  # great_tables is used by maketables internally
from IPython.display import FileLink, display

import pyfixest as pf

%load_ext autoreload
%autoreload 2

data = pf.get_data()

fit1 = pf.feols("Y ~ X1 + X2 | f1", data=data)
fit2 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=data)
fit3 = pf.feols("Y ~ X1 *X2 | f1 + f2", data=data)
fit4 = pf.feols("Y2 ~ X1 + X2 | f1", data=data)
fit5 = pf.feols("Y2 ~ X1 + X2 | f1 + f2", data=data)
fit6 = pf.feols("Y2 ~ X1 *X2 | f1 + f2", data=data)

Regression Tables via `pf.etable()`

Basic Usage

We can compare all regression models via the pyfixest-internal pf.etable() function:

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6])

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
X2	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
X1 × X2			0.011 (0.019)			-0.041 (0.071)
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

You can also estimate and display multiple regressions with one line of code using the (py)fixest stepwise notation:

pf.etable(pf.feols("Y+Y2~csw(X1,X2,X1:X2)", data=data))

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-1.000*** (0.085)	-0.993*** (0.082)	-0.992*** (0.082)	-1.322*** (0.215)	-1.316*** (0.214)	-1.316*** (0.215)
X2		-0.176*** (0.022)	-0.197*** (0.036)		-0.133* (0.057)	-0.132 (0.095)
X1 × X2			0.02 (0.027)			-0.000746 (0.071)
Intercept	0.919*** (0.112)	0.889*** (0.108)	0.888*** (0.108)	1.064*** (0.283)	1.042*** (0.283)	1.042*** (0.283)
stats
Observations	998	998	998	999	999	999
R²	0.123	0.177	0.177	0.037	0.042	0.042
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Keep and drop variables

etable can do a few things out of the box. For example, the keep argument retains only the variables whose names match the provided regular expression.

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], keep="X1")

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
X1 × X2			0.011 (0.019)			-0.041 (0.071)
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

We can use the exact_match argument to select a specific set of variables:

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], keep=["X1", "X2"], exact_match=True)

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
X2	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

We can also easily drop variables via the drop argument:

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], drop=["X1"])

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X2	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
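
The selection behavior shown in this section can be mimicked in a few lines of plain Python. The following is only a sketch of the matching logic as described above (regular-expression search by default, literal comparison with exact_match=True). The helper name select_coefs is hypothetical, and this is not pyfixest's actual implementation.

```python
import re


def select_coefs(names, keep=None, drop=None, exact_match=False):
    """Hypothetical helper: filter coefficient names the way keep/drop are described."""
    # Accept a single pattern or a list of patterns
    keep = [keep] if isinstance(keep, str) else list(keep or [])
    drop = [drop] if isinstance(drop, str) else list(drop or [])

    def hit(pattern, name):
        # exact_match compares literally; otherwise the pattern is a regex search
        if exact_match:
            return pattern == name
        return re.search(pattern, name) is not None

    if keep:
        selected = []
        for pattern in keep:
            for name in names:
                if hit(pattern, name) and name not in selected:
                    selected.append(name)
    else:
        selected = list(names)

    # Remove everything that matches any drop pattern
    return [n for n in selected if not any(hit(p, n) for p in drop)]


coef_names = ["X1", "X2", "X1:X2"]
print(select_coefs(coef_names, keep="X1"))                            # ['X1', 'X1:X2']
print(select_coefs(coef_names, keep=["X1", "X2"], exact_match=True))  # ['X1', 'X2']
print(select_coefs(coef_names, drop=["X1"]))                          # ['X2']
```

The three calls reproduce the row selections of the three tables above: a regex pattern "X1" also matches the interaction term, while exact matching does not.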

Display p-values or confidence intervals

By default, pf.etable() reports standard errors. The coef_fmt argument lets you also report p-values or confidence intervals.

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], coef_fmt="b \n (se) \n [p]")

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-0.95*** (0.066) [0]	-0.924*** (0.056) [0]	-0.924*** (0.056) [0]	-1.267*** (0.211) [0]	-1.232*** (0.211) [0]	-1.231*** (0.211) [0]
X2	-0.174*** (0.018) [0]	-0.174*** (0.015) [0]	-0.185*** (0.025) [0]	-0.131* (0.056) [0.02]	-0.118* (0.056) [0.036]	-0.074 (0.094) [0.436]
X1 × X2			0.011 (0.019) [0.572]			-0.041 (0.071) [0.563]
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error) [p-value]
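
To make the role of the coef_fmt template more concrete, here is a minimal sketch of how such a template could be filled for a single cell. It only covers the b, se, and p tokens used above. The function fill_coef_fmt is hypothetical and the inputs are assumed to be already rounded strings; pyfixest's own formatting code may work differently.

```python
import re


def fill_coef_fmt(coef_fmt, b, se, p):
    """Hypothetical helper: substitute the tokens b, se and p in a cell template."""
    values = {"b": b, "se": se, "p": p}
    # \b ensures that only standalone tokens are replaced; "se" is listed first
    return re.sub(r"\b(se|b|p)\b", lambda m: values[m.group(1)], coef_fmt)


# Values taken from the X2 row of model (4) in the table above
cell = fill_coef_fmt("b \n (se) \n [p]", b="-0.131*", se="0.056", p="0.02")
print(cell)
```

Everything in the template that is not a token (parentheses, brackets, line breaks) is passed through unchanged, which is why the brackets around the p-value appear in the table.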

Significance levels and rounding

We can also override the default significance levels and control the rounding of results via the signif_code and digits arguments:

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], signif_code=[0.01, 0.05, 0.1], digits=5)

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-0.94953*** (0.06637)	-0.92405*** (0.05606)	-0.92417*** (0.05608)	-1.26655*** (0.21078)	-1.23153*** (0.21141)	-1.23100*** (0.21149)
X2	-0.17423*** (0.01760)	-0.17411*** (0.01486)	-0.18550*** (0.02502)	-0.13056** (0.05592)	-0.11767** (0.05610)	-0.07369 (0.09447)
X1 × X2			0.01057 (0.01868)			-0.04082 (0.07054)
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)
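
The mapping from p-values to stars implied by signif_code can be sketched as follows. This is an illustration under the assumption that the smallest threshold earns the most stars, as in the table notes above. The function name stars is hypothetical and not part of pyfixest.

```python
def stars(p_value, signif_code=(0.001, 0.01, 0.05)):
    """Hypothetical helper: return the significance stars for one p-value."""
    thresholds = sorted(signif_code)
    for i, threshold in enumerate(thresholds):
        if p_value < threshold:
            # The smallest threshold passed determines the number of stars
            return "*" * (len(thresholds) - i)
    return ""


# p = 0.02 is the p-value of X2 in model (4)
print(repr(stars(0.02)))                     # '*'  with the default levels
print(repr(stars(0.02, [0.01, 0.05, 0.1])))  # '**' with the custom levels
```

This matches the tables above, where the X2 coefficient in model (4) carries one star under the default levels and two stars with signif_code=[0.01, 0.05, 0.1].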

Other output formats

By default, pf.etable() returns a GT object (see the Great Tables package), but you can also request DataFrame, Markdown, or LaTeX output via the type argument.

# Pandas styler output:
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    signif_code=[0.01, 0.05, 0.1],
    digits=5,
    coef_fmt="b (se)",
    type="df",
)

		Y			Y2
		(1)	(2)	(3)	(4)	(5)	(6)
coef	X1	-0.94953*** (0.06637)	-0.92405*** (0.05606)	-0.92417*** (0.05608)	-1.26655*** (0.21078)	-1.23153*** (0.21141)	-1.23100*** (0.21149)
	X2	-0.17423*** (0.01760)	-0.17411*** (0.01486)	-0.18550*** (0.02502)	-0.13056** (0.05592)	-0.11767** (0.05610)	-0.07369 (0.09447)
	X1 × X2			0.01057 (0.01868)			-0.04082 (0.07054)
fe	f1	x	x	x	x	x	x
fe	f2	-	x	x	-	x	x
stats	Observations	997	997	997	998	998	998
stats	R²	0.489	0.659	0.659	0.12	0.172	0.172

# Markdown output:
pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    signif_code=[0.01, 0.05, 0.1],
    digits=5,
    type="md",
)

|                           | ('Y', '(1)')   | ('Y', '(2)')   | ('Y', '(3)')   | ('Y2', '(4)')   | ('Y2', '(5)')   | ('Y2', '(6)')   |
|:--------------------------|:---------------|:---------------|:---------------|:----------------|:----------------|:----------------|
| ('coef', 'X1')            | -0.94953***    | -0.92405***    | -0.92417***    | -1.26655***     | -1.23153***     | -1.23100***     |
|                           |  (0.06637)     |  (0.05606)     |  (0.05608)     |  (0.21078)      |  (0.21141)      |  (0.21149)      |
| ('coef', 'X2')            | -0.17423***    | -0.17411***    | -0.18550***    | -0.13056**      | -0.11767**      | -0.07369        |
|                           |  (0.01760)     |  (0.01486)     |  (0.02502)     |  (0.05592)      |  (0.05610)      |  (0.09447)      |
| ('coef', 'X1 × X2')       |                |                | 0.01057        |                 |                 | -0.04082        |
|                           |                |                |  (0.01868)     |                 |                 |  (0.07054)      |
| ('fe', 'f1')              | x              | x              | x              | x               | x               | x               |
| ('fe', 'f2')              | -              | x              | x              | -               | x               | x               |
| ('stats', 'Observations') | 997            | 997            | 997            | 998             | 998             | 998             |
| ('stats', 'R²')           | 0.489          | 0.659          | 0.659          | 0.12            | 0.172           | 0.172           |

To obtain LaTeX output, use type="tex". To save the table as a .tex file, pass the target path via the file_name argument. etable relies on the LaTeX packages booktabs, threeparttable, makecell, and tabularx for the table layout, so don’t forget to include these packages in your LaTeX document.

# LaTeX output (include the LaTeX packages booktabs, threeparttable, makecell, and tabularx in your document):
tab = pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    signif_code=[0.01, 0.05, 0.1],
    digits=2,
    type="tex",
)

The following code generates a PDF that includes the regression table, which you can open by clicking on the link below the cell:

## Use pylatex to create a tex file with the table


def make_pdf(tab, file):
    "Create a PDF document with tex table."
    doc = pl.Document()
    doc.packages.append(pl.Package("booktabs"))
    doc.packages.append(pl.Package("threeparttable"))
    doc.packages.append(pl.Package("makecell"))
    doc.packages.append(pl.Package("tabularx"))

    with (
        doc.create(pl.Section("A PyFixest LaTeX Table")),
        doc.create(pl.Table(position="htbp")) as table,
    ):
        table.append(pl.NoEscape(tab))

    doc.generate_pdf(file, clean_tex=False)


# Compile latex to pdf & display a button with the hyperlink to the pdf
# requires tex installation
run = False
if run:
    make_pdf(tab, "latexdocs/SampleTableDoc")
display(FileLink("latexdocs/SampleTableDoc.pdf"))

Path (latexdocs/SampleTableDoc.pdf) doesn't exist. It may still be in the process of being generated, or you may have the incorrect path.
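
If pylatex is not installed, the same idea can be sketched with the standard library alone: wrap the table string in a minimal document that loads the four required packages. The helper wrap_in_document and the placeholder table below are assumptions for illustration, and compiling the result still requires a TeX installation.

```python
REQUIRED_PACKAGES = ["booktabs", "threeparttable", "makecell", "tabularx"]


def wrap_in_document(table_tex: str) -> str:
    """Hypothetical helper: embed a LaTeX table string in a standalone document."""
    preamble = "\n".join(f"\\usepackage{{{name}}}" for name in REQUIRED_PACKAGES)
    return (
        "\\documentclass{article}\n"
        f"{preamble}\n"
        "\\begin{document}\n"
        "\\begin{table}[htbp]\n"
        f"{table_tex}\n"
        "\\end{table}\n"
        "\\end{document}\n"
    )


# Placeholder standing in for the string returned by pf.etable(..., type="tex")
example_table = (
    "\\begin{tabular}{lc}\n\\toprule\n & (1) \\\\\n\\midrule\n"
    "X1 & -0.95 \\\\\n\\bottomrule\n\\end{tabular}"
)
document = wrap_in_document(example_table)
print(document)
```

The resulting string can be written to disk with pathlib.Path(...).write_text(document) and compiled with any LaTeX engine.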

Rename variables

You can also rename variables if you want to have a more readable output. Just pass a dictionary to the labels argument. Note that interaction terms will also be relabeled using the specified labels for the interacted variables (if you want to manually relabel an interaction term differently, add it to the dictionary).

labels = {
    "Y": "Wage",
    "Y2": "Wealth",
    "X1": "Age",
    "X2": "Years of Schooling",
    "f1": "Industry",
    "f2": "Year",
}

pf.etable([fit1, fit2, fit3, fit4, fit5, fit6], labels=labels)

	Wage			Wealth
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
Age × Years of Schooling			0.011 (0.019)			-0.041 (0.071)
fe
Industry	x	x	x	x	x	x
Year	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

If you want to label the rows indicating the inclusion of fixed effects with a custom label rather than the variable label, pass a separate dictionary to the felabels argument.

pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    felabels={"f1": "Industry Fixed Effects", "f2": "Year Fixed Effects"},
)

	Wage			Wealth
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
Age × Years of Schooling			0.011 (0.019)			-0.041 (0.071)
fe
Industry Fixed Effects	x	x	x	x	x	x
Year Fixed Effects	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Rename categorical variables

By default, categorical variables are returned using the formulaic “C(variable)[T.value]” notation. Via the cat_template argument, you can rename categorical variables using a template such as {variable}={value}. This works when either the variable is categorical in the DataFrame, or the C() or i() operators are used in the regression formula.

# Add a categorical variable
data['job'] = np.random.choice(["Managerial", "Admin", "Blue collar"], size=len(data), p=[1/3, 1/3, 1/3])
# Add a label for this variable to the dictionary
labels['job']="Job Family"

fit7 = pf.feols("Y ~ X1 + X2 + job", data = data)

pf.etable([fit7], labels=labels, cat_template = "{variable}::{value}")

	Wage
	(1)
coef
Age	-0.994*** (0.082)
Years of Schooling	-0.177*** (0.022)
Job Family::Blue collar	-0.082 (0.164)
Job Family::Managerial	0.048 (0.164)
Intercept	0.901*** (0.143)
stats
Observations	998
R²	0.178
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

You can also remove the variable name and keep only the levels (categories) by specifying cat_template="{value}". Note that the labeling of categories also works in interaction terms:

fit7 = pf.feols("Y ~ X1 + X2 + job", data = data)
fit8 = pf.feols("Y ~ X1 + X2 + job*X2", data = data)

pf.etable([fit7, fit8], labels=labels, cat_template="{value}")

	Wage
	(1)	(2)
coef
Age	-0.994*** (0.082)	-0.99*** (0.082)
Years of Schooling	-0.177*** (0.022)	-0.164*** (0.038)
Blue collar	-0.082 (0.164)	-0.09 (0.164)
Managerial	0.048 (0.164)	0.052 (0.164)
Blue collar × Years of Schooling		-0.051 (0.053)
Managerial × Years of Schooling		0.016 (0.054)
Intercept	0.901*** (0.143)	0.897*** (0.143)
stats
Observations	998	998
R²	0.178	0.179
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
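
The relabeling of categorical terms can be illustrated with a small standalone sketch. It parses names in the “C(variable)[T.value]” style with a regular expression and fills the template with Python's str.format. The names CATEGORY_PATTERN, relabel_term, and relabel_coef are hypothetical, and the real implementation in pyfixest or maketables may handle more cases.

```python
import re

# Matches e.g. "C(job)[T.Admin]", "job[T.Admin]" or
# "C(job, contr.treatment(base='Managerial'))[T.Admin]"
CATEGORY_PATTERN = re.compile(
    r"^(?:C\()?(?P<variable>[^,)\[]+).*?\[(?:T\.)?(?P<value>[^\]]+)\]$"
)


def relabel_term(term, labels, cat_template):
    """Hypothetical helper: relabel one term of a coefficient name."""
    match = CATEGORY_PATTERN.match(term)
    if match is None:
        # Not a categorical term: fall back to the plain label, if any
        return labels.get(term, term)
    variable = match.group("variable").strip()
    return cat_template.format(
        variable=labels.get(variable, variable),
        value=match.group("value"),
    )


def relabel_coef(name, labels, cat_template="{variable}={value}"):
    """Hypothetical helper: relabel each part of an interaction separately."""
    parts = name.split(":")
    return " × ".join(relabel_term(part, labels, cat_template) for part in parts)


labels = {"X2": "Years of Schooling", "job": "Job Family"}
print(relabel_coef("C(job)[T.Blue collar]", labels, "{variable}::{value}"))
print(relabel_coef("C(job)[T.Blue collar]:X2", labels, "{value}"))
```

The two calls print "Job Family::Blue collar" and "Blue collar × Years of Schooling", mirroring the row labels in the tables above.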

Change reference category

You can also change the reference category of a categorical variable using the ref argument in the interaction i() operator. For example, repeating the last estimation but changing the reference category to “Managerial” instead of “Admin”:

fit9 = pf.feols("Y ~ X1 + X2 + i(job,ref='Managerial') + i(job,X2,ref='Managerial')", data = data)

pf.etable([fit9], labels=labels, cat_template="{value}")

	Wage
	(1)
coef
Age	-0.99*** (0.082)
Years of Schooling	-0.147*** (0.038)
Admin	-0.052 (0.164)
Blue collar	-0.143 (0.161)
Admin × Years of Schooling	-0.016 (0.054)
Blue collar × Years of Schooling	-0.068 (0.053)
Intercept	0.95*** (0.145)
stats
Observations	998
R²	0.179
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Notice that this process will change the _coefnames. In this example, the new _coefnames are:

fit9._coefnames

[np.str_('Intercept'),
 np.str_('X1'),
 np.str_('X2'),
 np.str_("C(job, contr.treatment(base='Managerial'))[T.Admin]"),
 np.str_("C(job, contr.treatment(base='Managerial'))[T.Blue collar]"),
 np.str_("C(job, contr.treatment(base='Managerial'))[T.Admin]:X2"),
 np.str_("C(job, contr.treatment(base='Managerial'))[T.Blue collar]:X2")]

Custom model headlines

You can also add custom headers for each model by passing a list of strings to the model_heads argument.

pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    model_heads=["US", "China", "EU", "US", "China", "EU"],
)

	Wage			Wealth
	US	China	EU	US	China	EU
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
Age × Years of Schooling			0.011 (0.019)			-0.041 (0.071)
fe
Industry	x	x	x	x	x	x
Year	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

You can also change the order of the header rows with the head_order argument: "hd" shows headlines first and then dependent variables, while "dh" shows dependent variables first and then headlines. Use "d" or "h" to show only dependent variables or only headlines. With head_order="", only model numbers are shown.

pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="hd",
)

	US		China		EU
	Wage	Wealth	Wage	Wealth	Wage	Wealth
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-1.267*** (0.211)	-0.924*** (0.056)	-1.232*** (0.211)	-0.924*** (0.056)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.131* (0.056)	-0.174*** (0.015)	-0.118* (0.056)	-0.185*** (0.025)	-0.074 (0.094)
Age × Years of Schooling					0.011 (0.019)	-0.041 (0.071)
fe
Industry	x	x	x	x	x	x
Year	-	-	x	x	x	x
stats
Observations	997	998	997	998	997	998
R²	0.489	0.12	0.659	0.172	0.659	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Show only the model numbers by setting head_order="":

pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="",
)

	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-1.267*** (0.211)	-0.924*** (0.056)	-1.232*** (0.211)	-0.924*** (0.056)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.131* (0.056)	-0.174*** (0.015)	-0.118* (0.056)	-0.185*** (0.025)	-0.074 (0.094)
Age × Years of Schooling					0.011 (0.019)	-0.041 (0.071)
fe
Industry	x	x	x	x	x	x
Year	-	-	x	x	x	x
stats
Observations	997	998	997	998	997	998
R²	0.489	0.12	0.659	0.172	0.659	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
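
The effect of head_order on the header can be summarized in a short sketch: each character of the string adds one header row, and the row of model numbers always comes last. The function build_header_rows is hypothetical and ignores details of the real tables, such as merging identical neighboring cells into one spanning header.

```python
def build_header_rows(depvars, model_heads, head_order="dh"):
    """Hypothetical helper: return the header rows implied by head_order."""
    if set(head_order) - {"d", "h"}:
        raise ValueError("head_order may only contain the characters 'd' and 'h'")
    sources = {"d": depvars, "h": model_heads}
    # One header row per character, in the order given
    rows = [list(sources[key]) for key in head_order]
    # Model numbers are always shown as the last header row
    rows.append([f"({i})" for i in range(1, len(depvars) + 1)])
    return rows


depvars = ["Wage", "Wealth", "Wage", "Wealth"]
heads = ["US", "US", "China", "China"]
for row in build_header_rows(depvars, heads, head_order="hd"):
    print(row)
print(build_header_rows(depvars, heads, head_order=""))
```

With head_order="hd" the headline row precedes the dependent variables, and with an empty string only the numbering row remains, as in the two tables above.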

Further custom model information

You can add further custom model statistics or information to the bottom of the table via the custom_model_stats argument, to which you pass a dictionary mapping each row name to a list of values. Each list must have one entry per model.

pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    custom_model_stats={
        "Number of Clusters": [42, 42, 42, 37, 37, 37],
        "Additional Info": ["A", "A", "B", "B", "C", "C"],
    },
)

	Wage			Wealth
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
Age × Years of Schooling			0.011 (0.019)			-0.041 (0.071)
fe
Industry	x	x	x	x	x	x
Year	-	x	x	-	x	x
stats
Number of Clusters	42	42	42	37	37	37
Additional Info	A	A	B	B	C	C
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
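
Since every list must contain one value per model, it can help to validate the dictionary before building the table. The check below is a small sketch with a hypothetical helper name. It is not part of pyfixest, which may report a mismatch differently.

```python
def check_custom_stats(custom_model_stats, n_models):
    """Hypothetical helper: ensure every custom row has one value per model."""
    for row_name, values in custom_model_stats.items():
        if len(values) != n_models:
            raise ValueError(
                f"Row '{row_name}' has {len(values)} values, expected {n_models}."
            )


custom_rows = {
    "Number of Clusters": [42, 42, 42, 37, 37, 37],
    "Additional Info": ["A", "A", "B", "B", "C", "C"],
}
check_custom_stats(custom_rows, n_models=6)  # passes silently
```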

Custom table notes

You can replace the default table notes with your own notes using the notes argument.

mynotes = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="hd",
    notes=mynotes,
)

	US		China		EU
	Wage	Wealth	Wage	Wealth	Wage	Wealth
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-1.267*** (0.211)	-0.924*** (0.056)	-1.232*** (0.211)	-0.924*** (0.056)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.131* (0.056)	-0.174*** (0.015)	-0.118* (0.056)	-0.185*** (0.025)	-0.074 (0.094)
Age × Years of Schooling					0.011 (0.019)	-0.041 (0.071)
fe
Industry	x	x	x	x	x	x
Year	-	-	x	x	x	x
stats
Observations	997	998	997	998	997	998
R²	0.489	0.12	0.659	0.172	0.659	0.172
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

Publication-ready LaTeX tables

With a few lines of code, you thus obtain a publication-ready LaTeX table:

tab = pf.etable(
    [fit1, fit4, fit2, fit5, fit3, fit6],
    labels=labels,
    model_heads=["US", "US", "China", "China", "EU", "EU"],
    head_order="hd",
    type="tex",
    notes=mynotes,
    show_fe=True,
    show_se_type=False,
    custom_model_stats={
        "Number of Clusters": [42, 42, 42, 37, 37, 37],
    },
)

# Compile latex to pdf & display a button with the hyperlink to the pdf
run = False
if run:
    make_pdf(tab, "latexdocs/SampleTableDoc2")
display(FileLink("latexdocs/SampleTableDoc2.pdf"))

Path (latexdocs/SampleTableDoc2.pdf) doesn't exist. It may still be in the process of being generated, or you may have the incorrect path.

Rendering Tables in Quarto

When you use Quarto, you can include LaTeX tables generated by pyfixest when rendering the qmd file as PDF. Just specify output: asis in the code block options of the respective chunk and print the LaTeX string returned by etable. Don’t forget to include the \usepackage commands for the necessary LaTeX packages in the YAML block. Here you find a sample qmd file.
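
As a rough sketch, the body of such a Python chunk in a qmd file could look like the following. The #| line is Quarto's cell-option syntax. The string assigned to tab is a placeholder that stands in for the output of pf.etable(..., type="tex"), so the snippet runs on its own.

```python
#| output: asis
# Body of a {python} chunk in a .qmd file that is rendered to PDF.
# `tab` is a placeholder for the LaTeX string returned by pf.etable(..., type="tex").
tab = (
    "\\begin{tabular}{lc}\n\\toprule\n & (1) \\\\\n\\midrule\n"
    "X1 & -0.95 \\\\\n\\bottomrule\n\\end{tabular}"
)
# With output: asis, the printed string is passed to LaTeX unchanged
print(tab)
```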

When you render either a Jupyter notebook or a qmd file to HTML, it is advisable to turn html-table-processing off in Quarto, as Quarto otherwise adds further formatting that alters how your tables look. You can do this in a raw cell at the top of your document.

---
format:
  html:
    html-table-processing: none
---

Descriptive Statistics via `DTable()`

Deprecation Notice

pf.dtable() is deprecated. Please use DTable from the maketables package.

The function DTable() lets you display descriptive statistics for a set of variables in the same layout.

Basic Usage of DTable

Specify the variables you want to display the descriptive statistics for. You can also use a dictionary to rename the variables and add a caption.

DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    caption="Descriptive statistics",
    digits=2,
)

	N	Mean	Std. Dev.
Descriptive statistics
Wage	999.00	-0.13	2.30
Wealth	1,000	-0.31	5.58
Age	999.00	1.04	0.81
Years of Schooling	1,000	-0.13	3.05

Choose the set of statistics to be displayed with stats. You can use any pandas aggregation functions.

DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    stats=["count", "mean", "std", "min", "max"],
    labels=labels,
    caption="Descriptive statistics",
)

	N	Mean	Std. Dev.	Min	Max
Descriptive statistics
Wage	999.00	-0.13	2.30	-6.54	6.91
Wealth	1,000	-0.31	5.58	-16.97	17.16
Age	999.00	1.04	0.81	0.00	2.00
Years of Schooling	1,000	-0.13	3.05	-9.67	10.99

Summarize by characteristics in columns and rows

You can summarize by characteristics using the bycol argument when groups are to be displayed in columns. When the number of observations is the same for all variables in a group, you can also opt to display the number of observations only once per group, in a separate line at the bottom of the table, with counts_row_below=True.

# Generate some categorical data
data["country"] = np.random.choice(["US", "EU"], data.shape[0])
data["occupation"] = np.random.choice(["Blue collar", "White collar"], data.shape[0])

# Drop nan values to have balanced data
data.dropna(inplace=True)

DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country", "occupation"],
    stats=["count", "mean", "std"],
    caption="Descriptive statistics",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
)

	EU				US
Descriptive statistics
	Blue collar		White collar		Blue collar		White collar
	Mean	Std. Dev.	Mean	Std. Dev.	Mean	Std. Dev.	Mean	Std. Dev.
stats
Wage	-0.20	2.25	-0.19	2.40	0.01	2.28	-0.13	2.30
Wealth	-0.84	5.43	0.24	5.71	-0.50	5.48	-0.20	5.69
Age	1.07	0.79	1.02	0.79	1.05	0.82	1.03	0.83
Years of Schooling	-0.31	3.22	0.15	2.98	-0.13	2.96	-0.22	3.04
nobs
Number of observations	233.00		247.00		254.00		263.00

You can also use custom aggregation functions to compute further statistics or affect how statistics are presented. Pyfixest provides two such functions, mean_std and mean_newline_std, which compute the mean and standard deviation and display both in the same cell (either with or without a line break between them). This allows for more compact tables when you want to show statistics for many characteristics in the columns.

You can also hide the statistics labels in the header with hide_stats=True. In that case, a table note naming the displayed statistics by their labels is added (unless you have provided a custom note).

DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country", "occupation"],
    stats=["mean_newline_std", "count"],
    caption="Descriptive statistics",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    hide_stats=True,
)

	EU		US
Descriptive statistics
	Blue collar	White collar	Blue collar	White collar
stats
Wage	-0.20 (2.25)	-0.19 (2.40)	0.01 (2.28)	-0.13 (2.30)
Wealth	-0.84 (5.43)	0.24 (5.71)	-0.50 (5.48)	-0.20 (5.69)
Age	1.07 (0.79)	1.02 (0.79)	1.05 (0.82)	1.03 (0.83)
Years of Schooling	-0.31 (3.22)	0.15 (2.98)	-0.13 (2.96)	-0.22 (3.04)
nobs
Number of observations	233	247	254	263
Note: Displayed statistics are Mean (Std. Dev.).
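
To illustrate what such a combined aggregation does, here is a minimal standard-library sketch of two functions that behave like the mean_std and mean_newline_std statistics described above. It assumes the sample standard deviation (the pandas default) and input without missing values. It is not the implementation shipped with pyfixest or maketables.

```python
from statistics import mean, stdev


def mean_std(values, digits=2):
    """Sketch: mean and sample standard deviation in one cell, same line."""
    return f"{mean(values):.{digits}f} ({stdev(values):.{digits}f})"


def mean_newline_std(values, digits=2):
    """Sketch: same content, with a line break between the two numbers."""
    return f"{mean(values):.{digits}f}\n({stdev(values):.{digits}f})"


sample = [1.0, 2.0, 3.0]
print(mean_std(sample))          # 2.00 (1.00)
print(mean_newline_std(sample))  # 2.00, then (1.00) on the next line
```

Returning a formatted string rather than a number is what lets one table cell carry two statistics at once.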

You can also split by characteristics in both columns and rows. Note that you can only use one grouping variable in rows, but several in columns (as shown above).

DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country"],
    byrow="occupation",
    stats=["count", "mean", "std"],
    caption="Descriptive statistics",
)

	EU			US
Descriptive statistics
	N	Mean	Std. Dev.	N	Mean	Std. Dev.
Blue collar
Wage	233.00	-0.20	2.25	254.00	0.01	2.28
Wealth	233.00	-0.84	5.43	254.00	-0.50	5.48
Age	233.00	1.07	0.79	254.00	1.05	0.82
Years of Schooling	233.00	-0.31	3.22	254.00	-0.13	2.96
White collar
Wage	247.00	-0.19	2.40	263.00	-0.13	2.30
Wealth	247.00	0.24	5.71	263.00	-0.20	5.69
Age	247.00	1.02	0.79	263.00	1.03	0.83
Years of Schooling	247.00	0.15	2.98	263.00	-0.22	3.04

You can also export descriptive statistics tables to LaTeX:

dtab = DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    labels=labels,
    bycol=["country"],
    byrow="occupation",
    stats=["count", "mean", "std"],
    type="tex",
)

run = False
if run:
    make_pdf(dtab, "latexdocs/SampleTableDoc3")
display(FileLink("latexdocs/SampleTableDoc3.pdf"))

Path (latexdocs/SampleTableDoc3.pdf) doesn't exist. It may still be in the process of being generated, or you may have the incorrect path.

Custom Styling with Great Tables

You can use the rich set of methods offered by Great Tables to further customize the table display when the type is “gt”.

Example Styling

(
    pf.etable([fit1, fit2, fit3, fit4, fit5, fit6])
    .tab_options(
        column_labels_background_color="cornsilk",
        stub_background_color="whitesmoke",
    )
    .tab_style(
        style=style.fill(color="mistyrose"),
        locations=loc.body(columns="(3)", rows=["X2"]),
    )
)

	Y			Y2
	(1)	(2)	(3)	(4)	(5)	(6)
coef
X1	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
X2	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
X1 × X2			0.011 (0.019)			-0.041 (0.071)
fe
f1	x	x	x	x	x	x
f2	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Defining Table Styles: Some Examples

You can easily define table styles that you can apply to all tables in your project. Just define a dictionary with the respective values for the tab options (see the Great Tables documentation) and use the style with .tab_options(**style_dict).

style_print = {
    "table_font_size": "12px",
    "heading_title_font_size": "12px",
    "source_notes_font_size": "8px",
    "data_row_padding": "3px",
    "column_labels_padding": "3px",
    "row_group_border_top_style": "hidden",
    "table_body_border_top_style": "None",
    "table_body_border_bottom_width": "1px",
    "column_labels_border_top_width": "1px",
    "table_width": "14cm",
}


style_presentation = {
    "table_font_size": "16px",
    "table_font_color_light": "white",
    "table_body_border_top_style": "hidden",
    "table_body_border_bottom_style": "hidden",
    "heading_title_font_size": "18px",
    "source_notes_font_size": "12px",
    "data_row_padding": "3px",
    "column_labels_padding": "6px",
    "column_labels_background_color": "midnightblue",
    "stub_background_color": "whitesmoke",
    "row_group_background_color": "whitesmoke",
    "table_background_color": "whitesmoke",
    "heading_background_color": "white",
    "source_notes_background_color": "white",
    "column_labels_border_bottom_color": "white",
    "column_labels_font_weight": "bold",
    "row_group_font_weight": "bold",
    "table_width": "18cm",
}

t1 = DTable(
    data,
    vars=["Y", "Y2", "X1", "X2"],
    stats=["count", "mean", "std", "min", "max"],
    labels=labels,
    caption="Descriptive statistics",
)

t2 = pf.etable(
    [fit1, fit2, fit3, fit4, fit5, fit6],
    labels=labels,
    show_se=False,
    felabels={"f1": "Industry Fixed Effects", "f2": "Year Fixed Effects"},
    caption="Regression results",
)

display(t1.make(type="gt", gt_style=style_print))
display(t2.tab_options(**style_print))

	N	Mean	Std. Dev.	Min	Max
Descriptive statistics
Wage	997.00	-0.13	2.31	-6.54	6.91
Wealth	997.00	-0.32	5.59	-16.97	17.16
Age	997.00	1.04	0.81	0.00	2.00
Years of Schooling	997.00	-0.13	3.05	-9.67	10.99

	Wage			Wealth
Regression results
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
Age × Years of Schooling			0.011 (0.019)			-0.041 (0.071)
fe
Industry Fixed Effects	x	x	x	x	x	x
Year Fixed Effects	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

style_printDouble = {
    "table_font_size": "12px",
    "heading_title_font_size": "12px",
    "source_notes_font_size": "8px",
    "data_row_padding": "3px",
    "column_labels_padding": "3px",
    "table_body_border_bottom_style": "double",
    "column_labels_border_top_style": "double",
    "column_labels_border_bottom_width": "0.5px",
    "row_group_border_top_style": "hidden",
    "table_body_border_top_style": "None",
    "table_width": "14cm",
}
display(t1.make(type="gt", gt_style=style_printDouble))
display(t2.tab_options(**style_printDouble))

	N	Mean	Std. Dev.	Min	Max
Descriptive statistics
Wage	997.00	-0.13	2.31	-6.54	6.91
Wealth	997.00	-0.32	5.59	-16.97	17.16
Age	997.00	1.04	0.81	0.00	2.00
Years of Schooling	997.00	-0.13	3.05	-9.67	10.99

	Wage			Wealth
Regression results
	(1)	(2)	(3)	(4)	(5)	(6)
coef
Age	-0.95*** (0.066)	-0.924*** (0.056)	-0.924*** (0.056)	-1.267*** (0.211)	-1.232*** (0.211)	-1.231*** (0.211)
Years of Schooling	-0.174*** (0.018)	-0.174*** (0.015)	-0.185*** (0.025)	-0.131* (0.056)	-0.118* (0.056)	-0.074 (0.094)
Age × Years of Schooling			0.011 (0.019)			-0.041 (0.071)
fe
Industry Fixed Effects	x	x	x	x	x	x
Year Fixed Effects	-	x	x	-	x	x
stats
Observations	997	997	997	998	998	998
R²	0.489	0.659	0.659	0.12	0.172	0.172
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
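
The two print styles above share most of their entries. One possible way to avoid repeating them is to keep the shared options in a base dictionary and derive each variant with dictionary unpacking. The names base_style, print_style, and print_double_style are hypothetical, and the values are copied from the dictionaries above.

```python
# Options shared by both print styles
base_style = {
    "table_font_size": "12px",
    "heading_title_font_size": "12px",
    "source_notes_font_size": "8px",
    "data_row_padding": "3px",
    "column_labels_padding": "3px",
    "row_group_border_top_style": "hidden",
    "table_body_border_top_style": "None",
    "table_width": "14cm",
}

# Variant with single 1px rules (same content as style_print)
print_style = {
    **base_style,
    "table_body_border_bottom_width": "1px",
    "column_labels_border_top_width": "1px",
}

# Variant with double rules (same content as style_printDouble)
print_double_style = {
    **base_style,
    "table_body_border_bottom_style": "double",
    "column_labels_border_top_style": "double",
    "column_labels_border_bottom_width": "0.5px",
}

print(sorted(set(print_double_style) - set(base_style)))
```

Either dictionary can then be passed on exactly like the originals, for example via .tab_options(**print_style).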