Descriptive Statistics & Balance Tables

DTable() displays descriptive statistics for a set of variables in a common layout. It inherits from the MTable base class, which provides all the core output functionality, so DTable can generate tables in multiple formats (HTML/GT, docx, LaTeX). BTable() inherits from DTable() and displays simple balance tables, adding statistical tests for treatment comparisons.

Basic Usage of `DTable()`

Specify the variables you want to display the descriptive statistics for. Here we also directly define variable labels and set these as default labels (see Setting defaults in the ETable documentation).

# Import necessary libraries
import numpy as np
import pandas as pd
import maketables as mt

# Load sample dataset
df = pd.read_csv("../data/salaries.csv")

# Define variable labels
labels = {
    "logwage": "ln(Wage)",
    "wage": "Wage",
    "age": "Age",
    "female": "Female",
    "tenure": "Years of Tenure",
    "occupation": "Occupation",
    "worker_type": "Worker Type",
    "education": "Education Level"
}

# Set default labels 
mt.MTable.DEFAULT_LABELS = labels

mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    caption="Descriptive statistics",
)

	N	Mean	Std. Dev.
Descriptive statistics
Wage	1,800	62,742	28,312
ln(Wage)	1,800	10.94	0.48
Age	1,800	40.77	11.10
Years of Tenure	1,800	17.62	11.18

Choose the statistics to display with the stats argument. You can use any pandas aggregation function.

mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    stats=["count", "mean", "std", "min", "max"],
    caption="Descriptive statistics",
)

	N	Mean	Std. Dev.	Min	Max
Descriptive statistics
Wage	1,800	62,742	28,312	25,000	166,589
ln(Wage)	1,800	10.94	0.48	10.13	12.02
Age	1,800	40.77	11.10	22.00	65.00
Years of Tenure	1,800	17.62	11.18	0.00	43.00
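
To make the meaning of the stats names concrete, the following minimal sketch calls the same pandas aggregation names directly. It uses a tiny made-up data frame (`toy` is a hypothetical name, not the salaries data), and it only illustrates what the statistics are; it is not how DTable builds its table.

```python
import pandas as pd

# Tiny made-up data set; the column names mirror the example above.
toy = pd.DataFrame(
    {
        "wage": [30000.0, 45000.0, 60000.0, 75000.0],
        "age": [25, 35, 45, 55],
    }
)

# Apply several pandas aggregation names at once, then transpose
# so that each variable is a row and each statistic is a column.
summary = toy[["wage", "age"]].agg(["count", "mean", "std", "min", "max"]).T
print(summary)
```

Any name that pandas accepts in `agg` (for example "median") could be used in the same way in this sketch.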

Summarize by characteristics in columns and rows

You can summarize by characteristics using the bycol argument, which displays the groups in columns. When the number of observations is the same for all variables in a group, you can also display the number of observations only once per group, in a separate line at the bottom of the table, by setting counts_row_below=True.

# Generate a categorical variable for gender from the dummy variable
df["gender"] = df["female"].map({0: "Male", 1: "Female"})

mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    bycol=["worker_type","gender"],
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    digits=2,
)

	Blue Collar				White Collar
Descriptive statistics by worker type and gender
	Female		Male		Female		Male
	Mean	Std. Dev.	Mean	Std. Dev.	Mean	Std. Dev.	Mean	Std. Dev.
stats
Wage	53,900	24,679	54,360	26,129	65,615	27,898	71,399	29,204
ln(Wage)	10.79	0.47	10.79	0.49	11.00	0.45	11.08	0.46
Age	41.10	10.96	39.83	11.14	41.79	11.02	40.20	11.17
Years of Tenure	17.86	11.19	16.73	11.15	18.59	11.08	17.10	11.23
nobs
Number of observations	357.00		368.00		530.00		545.00
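
As a rough illustration of what grouping in columns means, the sketch below computes statistics per group combination with plain pandas and then moves the group keys into the columns. The data frame `toy` is made up for this example, and the code is an assumption about the general idea, not DTable's internal implementation.

```python
import pandas as pd

# Made-up data; the column names mirror the example above.
toy = pd.DataFrame(
    {
        "worker_type": ["Blue Collar", "Blue Collar", "White Collar", "White Collar", "White Collar"],
        "gender": ["Female", "Male", "Female", "Male", "Male"],
        "wage": [40000.0, 42000.0, 60000.0, 70000.0, 80000.0],
    }
)

# One row per (worker_type, gender) combination, one column per statistic.
grouped = toy.groupby(["worker_type", "gender"])["wage"].agg(["count", "mean"])

# Transpose so that each group combination becomes a column
# and each statistic becomes a row.
wide = grouped.T
print(wide)
```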

You can also use custom aggregation functions to compute further statistics or to change how statistics are presented. Pyfixest provides two such functions, mean_std and mean_newline_std, which compute the mean and standard deviation and display both in the same cell (with or without a line break between them). This makes tables more compact when you want to show statistics for many characteristics in the columns.

You can also hide the statistics labels in the header with hide_stats=True. In that case, a table note naming the displayed statistics by their labels is added (unless you have provided a custom note).

mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    bycol=["worker_type", "gender"],
    stats=["mean_newline_std", "count"],
    caption="Descriptive statistics by worker type and gender",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    hide_stats=True,
)

	Blue Collar		White Collar
Descriptive statistics by worker type and gender
	Female	Male	Female	Male
stats
Wage	53,900 (24,679)	54,360 (26,129)	65,615 (27,898)	71,399 (29,204)
ln(Wage)	10.79 (0.47)	10.79 (0.49)	11.00 (0.45)	11.08 (0.46)
Age	41.10 (10.96)	39.83 (11.14)	41.79 (11.02)	40.20 (11.17)
Years of Tenure	17.86 (11.19)	16.73 (11.15)	18.59 (11.08)	17.10 (11.23)
nobs
Number of observations	357	368	530	545
Note: Displayed statistics are Mean (Std. Dev.).
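
The idea behind a combined statistic such as mean_std can be sketched with an ordinary pandas aggregation function that returns a formatted string. The function name `mean_std_sketch` and the data are hypothetical; this is an illustration of the concept under those assumptions, not the library's own implementation.

```python
import pandas as pd


def mean_std_sketch(x: pd.Series) -> str:
    """Hypothetical stand-in: return 'mean (std)' as one string per cell."""
    return f"{x.mean():.2f} ({x.std():.2f})"


# Made-up data for illustration.
toy = pd.DataFrame(
    {
        "gender": ["Female", "Female", "Male", "Male"],
        "age": [30, 40, 35, 45],
    }
)

# Each group yields a single cell that holds both statistics.
cells = toy.groupby("gender")["age"].agg(mean_std_sketch)
print(cells)
```

Because both numbers share one cell, the table needs only one column per group instead of two.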

You can also split by characteristics in both columns and rows. Note that you can only use one grouping variable in rows, but several in columns (as shown above).

mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    bycol=["worker_type"],
    byrow="gender",
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
)

	Blue Collar			White Collar
Descriptive statistics by worker type and gender
	N	Mean	Std. Dev.	N	Mean	Std. Dev.
Female
Wage	357.00	53,900	24,679	530.00	65,615	27,898
ln(Wage)	357.00	10.79	0.47	530.00	11.00	0.45
Age	357.00	41.10	10.96	530.00	41.79	11.02
Years of Tenure	357.00	17.86	11.19	530.00	18.59	11.08
Male
Wage	368.00	54,360	26,129	545.00	71,399	29,204
ln(Wage)	368.00	10.79	0.49	545.00	11.08	0.46
Age	368.00	39.83	11.14	545.00	40.20	11.17
Years of Tenure	368.00	16.73	11.15	545.00	17.10	11.23

Number formatting

DTable supports flexible number formatting via the format_spec argument. You can control formatting at three levels by passing a dictionary:

Key types accepted:
- ('var', 'stat') — per-variable and per-statistic (most specific)
- 'var' — all statistics for a specific variable
- 'stat' — that statistic for all variables
Lookup priority (applied in this order): (var,stat) → var → stat.

This logic ensures you can set global stat styles, per-variable styles, or very specific per-variable/stat styles — the most specific match wins.
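
The lookup order described above can be sketched in a few lines of plain Python. The helper `resolve_format` and its `default` fallback are hypothetical names introduced only for illustration; DTable's actual defaults and internals may differ.

```python
def resolve_format(var, stat, format_spec, default=".2f"):
    """Return the most specific format string for a (var, stat) cell.

    Hypothetical helper: checks (var, stat), then var, then stat,
    and falls back to an assumed default if nothing matches.
    """
    for key in ((var, stat), var, stat):
        if key in format_spec:
            return format_spec[key]
    return default


spec = {
    "std": ".4f",            # stat-level: all standard deviations
    "wage": ",.1f",          # var-level: all statistics of wage
    ("age", "mean"): ".3f",  # most specific: mean of age only
}

print(resolve_format("age", "mean", spec))   # (var, stat) match
print(resolve_format("wage", "std", spec))   # var match beats stat match
print(resolve_format("age", "std", spec))    # stat match
print(format(62741.8, resolve_format("wage", "mean", spec)))
```

The returned strings are standard Python format specifications, so they can be passed directly to `format()`.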

# Custom format specifications for variables/statistics
format_specs = {
    # Per-variable formats (applies to all stats for that variable unless overridden)
    'wage': ',.1f',     # Wage always with 1 decimal
    # Per-variable/statistic formats (most specific, takes precedence)
    ('age', 'mean'): '.3f',   # Age mean with 3 decimals
    ('tenure', 'std'): '.4f', # Tenure std with 4 decimals
}

mt.DTable(
    df,
    vars=["wage", "age", "tenure"],
    stats=["mean", "std", "min", "max", "count"],
    format_spec=format_specs,
    caption="Custom formatting example with per-variable/statistic logic"
)

	Mean	Std. Dev.	Min	Max	N
Custom formatting example with per-variable/statistic logic
Wage	62,741.8	28,312.4	25,000.0	166,589.0	1,800.0
Age	40.769	11.10	22.00	65.00	1,800
Years of Tenure	17.62	11.1762	0.00	43.00	1,800

Balance Tables with `BTable`

Balance tables can be displayed with BTable, which is based on DTable and therefore inherits most of its functionality. It constructs simple balance tables that show variables by group (such as treatments in an experiment) and performs statistical tests comparing these variables between the groups, displaying the respective p-values.

For two groups, it displays the p-value of the single group indicator (t-test); for more than two groups, it displays the p-value of a joint Wald test that all group indicators are zero. BTable uses pyfixest to perform the tests. You can add fixed effects via the fixed_effects argument and specify the vcov option, for instance to implement clustering (see the pyfixest documentation).

mt.BTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    group="worker_type",
    caption="Balance Table",
)

	Blue Collar		White Collar		p-value
Balance Table
	Mean	Std. Dev.	Mean	Std. Dev.	p-value
Wage	54,134	25,409	68,547	28,701	0.000
ln(Wage)	10.79	0.48	11.04	0.46	0.000
Age	40.46	11.06	40.98	11.12	0.324
Years of Tenure	17.29	11.18	17.84	11.18	0.308
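
For the two-group case, the test on the group indicator is a test of the difference in group means, because the OLS coefficient on a binary indicator equals that difference. The numpy sketch below demonstrates this equality on made-up data. It does not compute a p-value and is not BTable's implementation, which relies on pyfixest.

```python
import numpy as np

# Made-up outcome and a binary group indicator (0 = control, 1 = treated).
y = np.array([10.0, 12.0, 14.0, 20.0, 22.0, 24.0])
treated = np.array([0, 0, 0, 1, 1, 1])

# Regress y on a constant and the indicator via least squares.
X = np.column_stack([np.ones_like(y), treated])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The slope equals the difference in group means;
# the intercept equals the control-group mean.
diff_in_means = y[treated == 1].mean() - y[treated == 0].mean()
print(beta[1], diff_in_means)
```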
