# Import necessary libraries
import numpy as np
import pandas as pd
import maketables as mt
# Load sample dataset
df = pd.read_csv("../data/salaries.csv")
# Define variable labels
labels = {
    "logwage": "ln(Wage)",
    "wage": "Wage",
    "age": "Age",
    "female": "Female",
    "tenure": "Years of Tenure",
    "occupation": "Occupation",
    "worker_type": "Worker Type",
    "education": "Education Level",
}
# Set default labels
mt.MTable.DEFAULT_LABELS = labels
Descriptive Statistics & Balance Tables
DTable() displays descriptive statistics for a set of variables in a common layout. DTable() inherits from the MTable base class, which provides all the core output functionality, so DTable can generate tables in multiple formats (HTML/GT, docx, LaTeX). BTable() inherits from DTable() and displays simple balance tables, adding statistical tests for treatment comparisons.
Basic Usage of DTable()
Specify the variables you want to display the descriptive statistics for. Here we also directly define variable labels and set these as default labels (see Setting defaults in the ETable documentation).
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    caption="Descriptive statistics",
)
Descriptive statistics
| | N | Mean | Std. Dev. |
|---|---|---|---|
| Wage | 1,800 | 62,742 | 28,312 |
| ln(Wage) | 1,800 | 10.94 | 0.48 |
| Age | 1,800 | 40.77 | 11.10 |
| Years of Tenure | 1,800 | 17.62 | 11.18 |
Choose the set of statistics to display with the stats argument. You can use any pandas aggregation function.
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    stats=["count", "mean", "std", "min", "max"],
    caption="Descriptive statistics",
)
Descriptive statistics
| | N | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Wage | 1,800 | 62,742 | 28,312 | 25,000 | 166,589 |
| ln(Wage) | 1,800 | 10.94 | 0.48 | 10.13 | 12.02 |
| Age | 1,800 | 40.77 | 11.10 | 22.00 | 65.00 |
| Years of Tenure | 1,800 | 17.62 | 11.18 | 0.00 | 43.00 |
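To make the meaning of the stats names concrete, the following minimal sketch applies the same aggregation names with plain pandas. The toy data frame and its values are made up for illustration and are not the salaries data; the sketch only shows what the names compute, not how DTable is implemented.

```python
import pandas as pd

# Small synthetic frame (hypothetical values, not the salaries data)
toy = pd.DataFrame({
    "wage": [30_000, 45_000, 60_000, 75_000],
    "age": [25, 35, 45, 55],
})

# The names used in `stats` are ordinary pandas aggregation names.
# Transposing puts variables in rows and statistics in columns.
summary = toy.agg(["count", "mean", "std", "min", "max"]).T
print(summary)
```

Each row of summary corresponds to one variable and each column to one statistic, which is the same orientation as the tables above.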
Summarize by characteristics in columns and rows
You can summarize by characteristics using the bycol argument when groups are to be displayed in columns. When the number of observations is the same for all variables in a group, you can also display the number of observations only once per group, in a separate line at the bottom of the table, with counts_row_below=True.
# Generate a categorical variable for gender from the dummy variable
df["gender"] = df["female"].map({0: "Male", 1: "Female"})
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    bycol=["worker_type", "gender"],
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    digits=2,
)
Descriptive statistics by worker type and gender
| | Blue Collar | | | | White Collar | | | |
|---|---|---|---|---|---|---|---|---|
| | Female | | Male | | Female | | Male | |
| | Mean | Std. Dev. | Mean | Std. Dev. | Mean | Std. Dev. | Mean | Std. Dev. |
| stats | | | | | | | | |
| Wage | 53,900 | 24,679 | 54,360 | 26,129 | 65,615 | 27,898 | 71,399 | 29,204 |
| ln(Wage) | 10.79 | 0.47 | 10.79 | 0.49 | 11.00 | 0.45 | 11.08 | 0.46 |
| Age | 41.10 | 10.96 | 39.83 | 11.14 | 41.79 | 11.02 | 40.20 | 11.17 |
| Years of Tenure | 17.86 | 11.19 | 16.73 | 11.15 | 18.59 | 11.08 | 17.10 | 11.23 |
| nobs | | | | | | | | |
| Number of observations | 357.00 | | 368.00 | | 530.00 | | 545.00 | |
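As a rough illustration of what grouping in columns computes, the sketch below reproduces the idea with a plain pandas groupby on a made-up data frame. The toy values are hypothetical, and this is not how DTable builds its output; it only shows the per-group statistics that end up in the column groups.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the example above
toy = pd.DataFrame({
    "worker_type": ["Blue Collar", "Blue Collar",
                    "White Collar", "White Collar", "White Collar"],
    "female": [0, 1, 0, 1, 1],
    "wage": [40_000, 42_000, 60_000, 58_000, 62_000],
})
toy["gender"] = toy["female"].map({0: "Male", 1: "Female"})

# One row per (worker_type, gender) group, one column per statistic
by_group = toy.groupby(["worker_type", "gender"])["wage"].agg(["count", "mean"])

# Transpose so that the groups form the columns, as with bycol
in_columns = by_group.T
print(in_columns)
```

Because the count is identical for every variable within a group, reporting it once per group loses no information, which is the motivation for counts_row_below.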
You can also use custom aggregation functions to compute further statistics or to change how statistics are presented. PyFixest provides two such functions, mean_std and mean_newline_std, which compute the mean and standard deviation and display both in the same cell (with or without a line break between them). This gives more compact tables when you want to show statistics for many characteristics in the columns.
You can also hide the statistics labels in the header with hide_stats=True. In that case, a table note naming the displayed statistics by their labels is added (unless you have provided a custom note).
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    bycol=["worker_type", "gender"],
    stats=["mean_newline_std", "count"],
    caption="Descriptive statistics by worker type and gender",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    hide_stats=True,
)
Descriptive statistics by worker type and gender
| | Blue Collar | | White Collar | |
|---|---|---|---|---|
| | Female | Male | Female | Male |
| stats | | | | |
| Wage | 53,900 (24,679) | 54,360 (26,129) | 65,615 (27,898) | 71,399 (29,204) |
| ln(Wage) | 10.79 (0.47) | 10.79 (0.49) | 11.00 (0.45) | 11.08 (0.46) |
| Age | 41.10 (10.96) | 39.83 (11.14) | 41.79 (11.02) | 40.20 (11.17) |
| Years of Tenure | 17.86 (11.19) | 16.73 (11.15) | 18.59 (11.08) | 17.10 (11.23) |
| nobs | | | | |
| Number of observations | 357 | 368 | 530 | 545 |
Note: Displayed statistics are Mean (Std. Dev.).
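The idea behind a combined statistic such as mean_std can be sketched in a few lines. The helper name mean_std_sketch, the two-decimal format, and the toy data are assumptions made for illustration; this is not the library's own implementation, and whether DTable accepts a user-defined callable directly in stats is not confirmed here.

```python
import pandas as pd

def mean_std_sketch(x: pd.Series) -> str:
    """Hypothetical helper: put mean and standard deviation in one cell."""
    return f"{x.mean():,.2f} ({x.std():,.2f})"

# Made-up values, not the salaries data
toy = pd.DataFrame({"age": [25, 35, 45, 55], "tenure": [1, 3, 5, 7]})

# Apply the helper column by column; each result is a single string cell
cells = toy.apply(mean_std_sketch)
print(cells)
```

Returning one formatted string per variable halves the number of columns needed per group, which is why this style suits tables with many column groups.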
You can also split by characteristics in both columns and rows. Note that you can only use one grouping variable in rows, but several in columns (as shown above).
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    bycol=["worker_type"],
    byrow="gender",
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
)
Descriptive statistics by worker type and gender
| | Blue Collar | | | White Collar | | |
|---|---|---|---|---|---|---|
| | N | Mean | Std. Dev. | N | Mean | Std. Dev. |
| Female | | | | | | |
| Wage | 357.00 | 53,900 | 24,679 | 530.00 | 65,615 | 27,898 |
| ln(Wage) | 357.00 | 10.79 | 0.47 | 530.00 | 11.00 | 0.45 |
| Age | 357.00 | 41.10 | 10.96 | 530.00 | 41.79 | 11.02 |
| Years of Tenure | 357.00 | 17.86 | 11.19 | 530.00 | 18.59 | 11.08 |
| Male | | | | | | |
| Wage | 368.00 | 54,360 | 26,129 | 545.00 | 71,399 | 29,204 |
| ln(Wage) | 368.00 | 10.79 | 0.49 | 545.00 | 11.08 | 0.46 |
| Age | 368.00 | 39.83 | 11.14 | 545.00 | 40.20 | 11.17 |
| Years of Tenure | 368.00 | 16.73 | 11.15 | 545.00 | 17.10 | 11.23 |
Number formatting
DTable supports flexible number formatting via the format_spec argument. You can control formatting at three levels by passing a dictionary:
- Key types accepted:
  - ('var', 'stat'): per-variable and per-statistic (most specific)
  - 'var': all statistics for a specific variable
  - 'stat': that statistic for all variables
- Lookup priority (applied in this order): (var, stat) → var → stat.
This logic lets you set global stat styles, per-variable styles, or very specific per-variable/stat styles; the most specific match wins.
# Custom format specifications for variables/statistics
format_specs = {
    # Per-variable formats (apply to all stats for that variable unless overridden)
    'wage': ',.1f',  # Wage always with 1 decimal
    # Per-variable/statistic formats (most specific, take precedence)
    ('age', 'mean'): '.3f',  # Age mean with 3 decimals
    ('tenure', 'std'): '.4f',  # Tenure std with 4 decimals
}
mt.DTable(
    df,
    vars=["wage", "age", "tenure"],
    stats=["mean", "std", "min", "max", "count"],
    format_spec=format_specs,
    caption="Custom formatting example with per-variable/statistic logic",
)
Custom formatting example with per-variable/statistic logic
| | Mean | Std. Dev. | Min | Max | N |
|---|---|---|---|---|---|
| Wage | 62,741.8 | 28,312.4 | 25,000.0 | 166,589.0 | 1,800.0 |
| Age | 40.769 | 11.10 | 22.00 | 65.00 | 1,800 |
| Years of Tenure | 17.62 | 11.1762 | 0.00 | 43.00 | 1,800 |
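The lookup priority described above, (var, stat) → var → stat, can be sketched in plain Python. The helper resolve_format and its fallback default of ',.2f' are hypothetical names and values chosen for illustration; they are not part of the library, whose own defaults evidently differ by statistic.

```python
def resolve_format(var, stat, format_spec, default=",.2f"):
    """Hypothetical helper: return the most specific matching format.

    Keys are tried in the order (var, stat), then var, then stat.
    The fallback `default` is an assumption made for this sketch.
    """
    for key in ((var, stat), var, stat):
        if key in format_spec:
            return format_spec[key]
    return default

format_specs = {
    "wage": ",.1f",
    ("age", "mean"): ".3f",
    ("tenure", "std"): ".4f",
}

# Most specific key wins for the age mean
print(format(40.769, resolve_format("age", "mean", format_specs)))
# The per-variable key applies to every wage statistic
print(format(166589, resolve_format("wage", "max", format_specs)))
```

The format strings themselves are standard Python format specifications, so any string accepted by the built-in format() can be tried out this way before passing it to format_spec.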
Balance Tables with BTable
Balance tables can be displayed with BTable, which is based on DTable and so inherits most of the latter’s functionality. It constructs simple balance tables that show variables by group (such as treatments in an experiment) and performs statistical tests comparing these variables between the groups, displaying the respective p-values.
For two groups, it displays the p-value of the single group indicator (t-test); for more than two groups, it displays the p-value of a joint Wald test that all group indicators are zero. BTable uses PyFixest to perform the tests. You can add fixed effects via fixed_effects=... and specify the vcov option, for instance to implement clustering (see the PyFixest documentation).
mt.BTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    group="worker_type",
    caption="Balance Table",
)
Balance Table
| | Blue Collar | | White Collar | | p-value |
|---|---|---|---|---|---|
| | Mean | Std. Dev. | Mean | Std. Dev. | |
| Wage | 54,134 | 25,409 | 68,547 | 28,701 | 0.000 |
| ln(Wage) | 10.79 | 0.48 | 11.04 | 0.46 | 0.000 |
| Age | 40.46 | 11.06 | 40.98 | 11.12 | 0.324 |
| Years of Tenure | 17.29 | 11.18 | 17.84 | 11.18 | 0.308 |
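For intuition about the two-group case, the sketch below computes the t statistic of a single group indicator by hand. It is not BTable's implementation (which relies on PyFixest); it assumes homoskedastic (iid) errors and no fixed effects, in which case the regression t statistic on the indicator equals the pooled two-sample t statistic. The p-value uses a normal approximation, which is only reasonable for large samples, and the function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_group_pvalue(treated, control):
    """Hypothetical helper: pooled t statistic and approximate p-value."""
    n1, n0 = len(treated), len(control)
    diff = mean(treated) - mean(control)
    # Pooled variance equals the OLS residual variance with one group dummy
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n0))
    t_stat = diff / se
    # Normal approximation (assumption); a t distribution is exact here
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

# Made-up outcome values for two groups
t_stat, p_value = two_group_pvalue([12, 14, 16, 18], [10, 11, 12, 13])
print(round(t_stat, 3), round(p_value, 3))
```

With clustering or fixed effects, the standard error changes and this shortcut no longer applies, which is why the test is run as a regression in practice.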