Adding New Model Classes to MakeTables

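There are two ways to make a statistical package compatible with ETable in maketables for automatic table generation:

  1. Add a custom extractor for the model class to maketables itself.

  2. Implement the plug-in extractor format in your own package, so that maketables detects your model results without any changes to maketables.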



How Extractors Enable Table Display

Before diving into implementation details, it’s important to understand how extractors bridge statistical models and ETable visualizations.

The Core Workflow

When you call ETable(model), maketables uses an extractor to:

  1. Extract the coefficient table via coef_table(model) → Returns a DataFrame with columns like 'b' (estimates), 'se' (standard errors), 't' (t-stats), 'p' (p-values), and optional columns like confidence intervals

  2. Extract model statistics via stat(model, key) → Returns values for keys like 'N' (observations), 'r2' (R-squared), 'adj_r2' (adjusted R²), 'aic', 'bic', etc.

  3. Extract metadata → Dependent variable name, fixed effects specification, variable labels, and variance-covariance information
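
The three steps above can be sketched with a toy extractor. This is only an illustration: `ToyExtractor` and the stand-in `model` object are hypothetical, and the calls mirror the method names used in this guide rather than maketables internals.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical fitted-model stand-in; a real model would come from your package.
model = SimpleNamespace(
    params=pd.Series({"Intercept": 1.50, "x": 0.80}),
    bse=pd.Series({"Intercept": 0.30, "x": 0.20}),
    pvalues=pd.Series({"Intercept": 0.001, "x": 0.010}),
    nobs=120,
    depvar_name="y",
)


class ToyExtractor:
    """Minimal extractor exposing the three kinds of information listed above."""

    def coef_table(self, model) -> pd.DataFrame:
        # Step 1: coefficient table with canonical token names as columns
        return pd.DataFrame({"b": model.params, "se": model.bse, "p": model.pvalues})

    def stat(self, model, key: str):
        # Step 2: model-level statistics looked up by key; None if unknown
        return {"N": model.nobs}.get(key)

    def depvar(self, model) -> str:
        # Step 3: metadata, here only the dependent variable name
        return model.depvar_name


extractor = ToyExtractor()
coefs = extractor.coef_table(model)
n_obs = extractor.stat(model, "N")
dep = extractor.depvar(model)
```

Each coefficient becomes one row of `coefs`, indexed by its name, and every column is a token that a format string can refer to.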


Using coef_fmt to Access Coefficient Information

The coef_fmt parameter is a template string that lets users control which columns from the coefficient table appear in the output and how they’re formatted. Think of it as a specification language:

from maketables import ETable

# Users specify which tokens (column names) to display and how to format them
table = ETable(result, coef_fmt="b:.3f* \n (se:.3f)")

Breaking down this example:

  • b:.3f → Display the 'b' column (coefficient) with 3 decimal places

  • * → Add significance stars after the coefficient (based on the 'p' column)

  • \n → Line break (puts standard error on next line)

  • (se:.3f) → Display the 'se' column in parentheses with 3 decimals
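
To make the template idea concrete, here is a minimal sketch of how such a string could be filled in for a single coefficient row. It is not maketables' actual parser: `render_cell`, `stars`, and the significance cutoffs in `STAR_LEVELS` are assumptions made for this illustration.

```python
import re

# Hypothetical significance cutoffs; maketables' actual defaults may differ.
STAR_LEVELS = [(0.001, "***"), (0.01, "**"), (0.05, "*")]


def stars(p: float) -> str:
    """Return the star string for a p-value under the assumed cutoffs."""
    for cutoff, symbol in STAR_LEVELS:
        if p < cutoff:
            return symbol
    return ""


def render_cell(coef_fmt: str, row: dict) -> str:
    """Fill a coef_fmt-style template from one coefficient row."""

    def fill(match):
        token, spec = match.group(1), match.group(2)
        return format(row[token], spec)

    # Replace tokens such as "b:.3f" with the formatted column value
    text = re.sub(r"([A-Za-z_]\w*):(\.\d+[fe])", fill, coef_fmt)
    # Replace the star placeholder using the row's p-value
    return text.replace("*", stars(row["p"]))


row = {"b": 1.23456, "se": 0.4321, "t": 2.857, "p": 0.004}
cell = render_cell("b:.3f* \n (se:.3f)", row)
print(cell)
```

With the assumed cutoffs, the p-value of 0.004 earns two stars, so the cell reads `1.235**` on the first line and `(0.432)` on the second.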


Your extractor’s coef_table() method must return a DataFrame with these token names as columns. The standard/canonical tokens are:

| Token | Meaning | Source |
|---|---|---|
| b | Coefficient estimate | coef_table() |
| se | Standard error | coef_table() |
| t | t-statistic | coef_table() |
| p | p-value | coef_table() |
| ci95l, ci95u | 95% confidence interval bounds | coef_table() (optional) |
| Any other columns | Custom model-specific stats | coef_table() (optional) |

Users can reference any column returned by your coef_table() method in the coef_fmt string. This gives maximum flexibility—if your model has unique statistics, include them as columns and users can format them.
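
As a rough sketch of what "include them as columns" means, the snippet below builds a coefficient table with the canonical tokens plus optional and custom ones. The numbers are made up, the confidence bounds use a normal approximation (an assumption; your model may use t critical values), and `exp_b` is a hypothetical token name.

```python
import numpy as np
import pandas as pd

# Made-up estimates; in an extractor these would come from the fitted model.
coef_df = pd.DataFrame(
    {"b": [1.0, 0.5], "se": [0.25, 0.10], "p": [0.0001, 0.0001]},
    index=["Intercept", "x"],
)

# Optional canonical tokens
coef_df["t"] = coef_df["b"] / coef_df["se"]
coef_df["ci95l"] = coef_df["b"] - 1.96 * coef_df["se"]  # normal approximation
coef_df["ci95u"] = coef_df["b"] + 1.96 * coef_df["se"]

# Model-specific column under a hypothetical token name
coef_df["exp_b"] = np.exp(coef_df["b"])

# A user could then refer to the new tokens in a format string, presumably
# following the same pattern as above, e.g. "b:.3f \n [ci95l:.2f, ci95u:.2f]".
```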



Using model_stats to Access Model Statistics

Similarly, the model_stats parameter lets users specify which model-level statistics appear below the coefficient table:

table = ETable(result, model_stats=['N', 'r2', 'adj_r2', 'rmse'])
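
Each requested key is resolved through the extractor's stat() method. The following sketch shows that lookup in isolation; the `STAT_MAP` contents and the stand-in `model` are hypothetical, and the helper is a simplified version of the pattern used in the statsmodels example further down.

```python
from types import SimpleNamespace

# Hypothetical mapping from maketables stat keys to attribute names on the model
STAT_MAP = {"N": "nobs", "r2": "rsquared", "adj_r2": "rsquared_adj"}


def stat(model, key: str):
    """Return the statistic for `key`, or None when it is not available."""
    attr = STAT_MAP.get(key)
    if attr is None:
        return None
    return getattr(model, attr, None)


# Stand-in for a fitted model
model = SimpleNamespace(nobs=250, rsquared=0.41, rsquared_adj=0.40)

requested = ["N", "r2", "adj_r2", "rmse"]
values = {key: stat(model, key) for key in requested}
# 'rmse' has no entry in STAT_MAP, so it resolves to None
```

Returning None for unknown keys, rather than raising, lets one list of requested statistics be applied to models that support different subsets.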



Overview of Implementation Approaches

Below are detailed examples for both approaches.



Adding a custom extractor

Dev Environment Setup

To get started, we encourage you to set up a development environment, which starts by installing the package manager of our choice, pixi:

  1. Install pixi by following the steps described on their installation page.

  2. Clone maketables and create a dev environment with your package:

git clone git@github.com:py-econometrics/maketables.git # SSH
# ... or, alternatively, clone via HTTPS:
git clone https://github.com/py-econometrics/maketables.git # HTTPS
cd maketables
# create a new dev env named name_of_new_env and add the package
# for which you want to add an extractor, name_of_model_package
pixi add --pypi --feature name_of_new_env name_of_model_package
# activate the new env:
pixi shell -e name_of_new_env

Now you are good to go!


Example: Statsmodels OLS Extractor

Below is a simplified version of the extractor for statsmodels, which provides a good blueprint for adding other models. Once your extractor is implemented, please remember to update SupportedModelClasses.md and readme.md!

from typing import Any  # used in the stat() and supported_stats() signatures

from maketables.extractors import register_extractor, _get_attr
import pandas as pd

# Check if statsmodels is installed
try:
    from statsmodels.regression.linear_model import RegressionResultsWrapper
    HAS_STATSMODELS = True
except ImportError:
    HAS_STATSMODELS = False
    RegressionResultsWrapper = ()  # empty tuple for isinstance check


class MyStatsmodelsExtractor:
    """Extractor for statsmodels OLS results."""
    
    # dict that translates between maketables statistic keys (keys)
    # and statsmodels result attributes (values)
    STAT_MAP = {
        "N": "nobs",
        "r2": "rsquared",
        "adj_r2": "rsquared_adj",
        "aic": "aic",
        "bic": "bic",
        "fvalue": "fvalue",
        "se_type": "cov_type",
    }
    
    def can_handle(self, model) -> bool:
        # check if statsmodels is installed
        if not HAS_STATSMODELS:
            return False
        return isinstance(model, RegressionResultsWrapper)
    
    def coef_table(self, model) -> pd.DataFrame:
        # Return the coefficient table with canonical column names: b, se, t, p.
        # These tokens can be referenced directly in ETable's coef_fmt string.
        # Additional columns (e.g., confidence intervals) can be included by
        # adding a column to the df; its name is the token that users
        # reference in the format string.
        df = pd.DataFrame({
            "b": model.params,
            "se": model.bse,
            "t": model.tvalues,
            "p": model.pvalues,
        })
        
        df.index.name = "Coefficient"
        return df
    
    def depvar(self, model) -> str:
        # set the name of the dependent variable
        return getattr(model.model, "endog_names", "y")
    
    def fixef_string(self, model) -> str | None:
        # set the values of fixed effects as a string 
        # separated by a '+', ie 'f1+f2'. Only when 
        # fixed effects are supported
        return None

    def vcov_info(self, model) -> dict:
        # retrieve information on how the vcov matrix is computed
        return {"vcov_type": getattr(model, "cov_type", None), "clustervar": None}

    def var_labels(self, model) -> dict | None:
        # Extract variable labels from the model's data DataFrame when available.
        # Can be set to None for a MVP implementation
        return None

    # the remaining two methods can just be copied as stated below: 

    def stat(self, model: Any, key: str) -> Any:
        'Extract a statistic using STAT_MAP.'
        spec = self.STAT_MAP.get(key)
        if spec is None:
            return None
        val = _get_attr(model, spec)
        if key == "N" and val is not None:
            try:
                return int(val)
            except Exception:
                return val
        return val

    def supported_stats(self, model: Any) -> set[str]:
        'Return set of statistics available.'
        return {
            k for k, spec in self.STAT_MAP.items() if _get_attr(model, spec) is not None
        }

    # Optional methods for enhanced functionality:

    def stat_labels(self, model) -> dict[str, str] | None:
        '''Provide custom labels for statistics.
        
        These labels override ETable's default labels but are overridden by user-specified labels.
        For example, you might want to display 'Pseudo R²' instead of the default label.
        '''
        # Return None for OLS, but could customize for other model types
        return None

    def default_stat_keys(self, model) -> list[str] | None:
        '''Specify which statistics should be shown by default for this model type.
        
        When mixing model types in one table, ETable shows the union of all default stats.
        For example, logit/probit models might default to ['N', 'pseudo_r2', 'll'] while
        OLS models use ETable's standard defaults.
        '''
        # Check if this is a logit or probit model
        if hasattr(model, 'model') and model.model.__class__.__name__ in ['Logit', 'Probit']:
            return ['N', 'pseudo_r2', 'll']
        return None  # Use ETable's defaults for other models


# Register at the bottom of the script: 
if HAS_STATSMODELS:
    register_extractor(MyStatsmodelsExtractor())
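
To show why can_handle() and registration belong together, here is a small sketch of a registry that dispatches on can_handle(). It is an assumption about the general pattern, not a copy of maketables' implementation; `register`, `find_extractor`, and both extractor classes are hypothetical.

```python
# Hypothetical registry; maketables' real register_extractor may work differently.
_REGISTRY = []


def register(extractor) -> None:
    _REGISTRY.append(extractor)


def find_extractor(model):
    """Return the first registered extractor whose can_handle() accepts the model."""
    for extractor in _REGISTRY:
        if extractor.can_handle(model):
            return extractor
    raise TypeError(f"No extractor registered for {type(model).__name__}")


# Mimics the fallback used when an optional dependency is missing:
# isinstance(x, ()) is always False, so the extractor never matches.
MissingType = ()


class UnavailableExtractor:
    def can_handle(self, model) -> bool:
        return isinstance(model, MissingType)


class StrExtractor:
    def can_handle(self, model) -> bool:
        return isinstance(model, str)


register(UnavailableExtractor())
register(StrExtractor())

chosen = find_extractor("stand-in for a fitted model")
```

Because the empty tuple never matches in isinstance(), an extractor for a package that is not installed simply steps aside instead of raising.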

Methods Summary (Required and Optional)

Required Methods

These methods must be implemented for a functional extractor:

| Method | Returns | Purpose |
|---|---|---|
| can_handle(model) | bool | Return True if this extractor handles the model type |
| coef_table(model) | DataFrame | Columns (canonical tokens): b (estimate), se (std. error), p (p-value), optionally t (t-statistic). May include additional columns like ci95l, ci95u, etc. |
| stat(model, key) | Any | Extract stat by key: N, r2, adj_r2, se_type, etc. Return None if not available. |
| supported_stats(model) | set[str] | Set of available stat keys |
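
While developing an extractor, a quick check like the one below can flag required methods that are still missing. The helper `missing_methods` and the `HalfDone` class are hypothetical and only meant as a development aid.

```python
REQUIRED_METHODS = ("can_handle", "coef_table", "stat", "supported_stats")


def missing_methods(extractor) -> list:
    """List the required extractor methods that are absent or not callable."""
    return [
        name
        for name in REQUIRED_METHODS
        if not callable(getattr(extractor, name, None))
    ]


class HalfDone:
    """An extractor in progress: only two of the four required methods exist."""

    def can_handle(self, model) -> bool:
        return True

    def coef_table(self, model):
        raise NotImplementedError


gaps = missing_methods(HalfDone())
print(gaps)
```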

Methods with Fallback Defaults

These methods are part of the protocol but have sensible defaults if not fully implemented:

| Method | Returns | Purpose | Fallback |
|---|---|---|---|
| depvar(model) | str | Dependent variable name | "Dependent Variable" if not provided |
| fixef_string(model) | str \| None | Fixed effects spec (e.g., "entity+time") | None (no fixed effects) |
| vcov_info(model) | dict | Keys: vcov_type, clustervar | {} (empty dict) |
| var_labels(model) | dict \| None | Variable name → label mapping | None (no labels) |
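
The fallback behaviour in the table can be pictured as follows. This is a sketch under the assumption that a missing method, or one returning None, yields the fallback; `call_or_default` and `Partial` are hypothetical names, and maketables may implement this differently.

```python
# Fallback values taken from the table above
FALLBACKS = {
    "depvar": "Dependent Variable",
    "fixef_string": None,
    "vcov_info": {},
    "var_labels": None,
}


def call_or_default(extractor, method: str, model, default):
    """Call extractor.<method>(model) if it exists; otherwise use the fallback."""
    func = getattr(extractor, method, None)
    if not callable(func):
        return default
    result = func(model)
    return default if result is None else result


class Partial:
    """Extractor that only knows the dependent variable name."""

    def depvar(self, model) -> str:
        return "wage"


meta = {
    name: call_or_default(Partial(), name, None, fallback)
    for name, fallback in FALLBACKS.items()
}
```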

Optional Methods

| Method | Returns | Purpose |
|---|---|---|
| stat_labels(model) | dict[str, str] \| None | Custom labels for statistics (e.g., {'pseudo_r2': 'Pseudo R²'}). Override ETable defaults but user labels take priority. |
| default_stat_keys(model) | list[str] \| None | Default statistics to display for this model type (e.g., ['N', 'pseudo_r2', 'll'] for logit/probit). ETable shows union of all defaults when mixing model types. |
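
The two rules in this table, label priority and the union of default statistics, can be sketched in a few lines. The function names, the example default labels, and the first-seen ordering of the union are assumptions for illustration only.

```python
from typing import Optional


def resolve_labels(
    etable_defaults: dict,
    extractor_labels: Optional[dict],
    user_labels: Optional[dict],
) -> dict:
    """Merge labels so that later sources win: defaults < extractor < user."""
    return {**etable_defaults, **(extractor_labels or {}), **(user_labels or {})}


def union_default_keys(per_model_defaults: list) -> list:
    """Union of default stat keys across models, keeping first-seen order."""
    seen = []
    for keys in per_model_defaults:
        for key in keys:
            if key not in seen:
                seen.append(key)
    return seen


labels = resolve_labels(
    {"N": "Observations", "pseudo_r2": "pseudo_r2"},  # hypothetical defaults
    {"pseudo_r2": "Pseudo R²"},                       # from stat_labels()
    {"N": "Obs."},                                    # passed by the user
)
stats = union_default_keys([["N", "r2"], ["N", "pseudo_r2", "ll"]])
```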



Alternative: Plug-in Extractor Format

If you maintain your own package and want to make it compatible with maketables without requiring any code changes to maketables itself, you can use the plug-in extractor format.

Simply add specific attributes and methods to your model result class, and maketables will automatically detect and use them. This approach requires zero coupling between your package and maketables—your package never needs to import maketables.

Plug-in Format Specification

Required Attributes

1. Coefficient Table DataFrame (__maketables_coef_table__)

Add a property named __maketables_coef_table__ that returns a DataFrame with regression coefficients and statistics:

@property
def __maketables_coef_table__(self) -> pd.DataFrame:
    """
    Return a DataFrame with regression coefficients and statistics.
    
    Required columns:
    - 'b': coefficient estimates
    - 'se': standard errors
    - 'p': p-values
    
    Optional columns:
    - 't': t-statistics
    - 'ci95l', 'ci95u': 95% confidence interval bounds
    - 'ci90l', 'ci90u': 90% confidence interval bounds
    - Any other model-specific statistics
    
    Returns
    -------
    pd.DataFrame
        Index: coefficient names (str)
        Columns: canonical column names (str)
        Values: numeric (float or int)
    """
    coef_table = pd.DataFrame({
        'b': self.params,
        'se': self.bse,
        't': self.tvalues,
        'p': self.pvalues,
    })
    coef_table.index.name = 'Coefficient'
    return coef_table

2. Model Statistics Method (__maketables_stat__)

Add a method named __maketables_stat__ that returns model statistics by key:

def __maketables_stat__(self, key: str) -> float | str | int | None:
    """
    Return a model statistic by key.
    
    Common keys:
    - 'N': number of observations
    - 'r2': R-squared
    - 'adj_r2': adjusted R-squared
    - 'r2_within': within R-squared (panel models)
    - 'r2_between': between R-squared (panel models)
    - 'rmse': root mean squared error
    - 'aic': Akaike information criterion
    - 'bic': Bayesian information criterion
    - 'fvalue': F-statistic
    - 'f_pvalue': F-statistic p-value
    - 'se_type': type of standard errors (e.g., 'robust', 'clustered')
    - 'll': log-likelihood
    
    Parameters
    ----------
    key : str
        The statistic key to retrieve.
    
    Returns
    -------
    float, str, int, or None
        The statistic value, or None if not available.
    """
    stats = {
        'N': self.nobs,
        'r2': self.rsquared,
        'adj_r2': self.rsquared_adj,
        'aic': self.aic,
        'bic': self.bic,
    }
    return stats.get(key)

3. Dependent Variable Name (__maketables_depvar__)

Add a property named __maketables_depvar__:

@property
def __maketables_depvar__(self) -> str:
    """
    Return the name of the dependent variable.
    
    Returns
    -------
    str
        Name of the dependent variable (e.g., 'wage', 'log_income').
    """
    return self.model.endog_names  # or however you store this

Optional Attributes

4. Fixed Effects String (__maketables_fixef_string__)

Add a property for models that support fixed effects:

@property
def __maketables_fixef_string__(self) -> str | None:
    """
    Return a string describing fixed effects.
    
    Returns
    -------
    str or None
        Fixed effects as a '+'-separated string (e.g., 'firm+year'),
        or None if no fixed effects / not applicable.
    """
    if hasattr(self, 'fe_vars'):
        return '+'.join(self.fe_vars)
    return None

5. Variable Labels (__maketables_var_labels__)

Add a property to provide variable name mappings:

@property
def __maketables_var_labels__(self) -> dict[str, str] | None:
    """
    Return a mapping from variable names to human-readable labels.
    
    Returns
    -------
    dict or None
        Mapping like {'wage': 'Log Wage', 'educ': 'Years of Education'}.
        Return None if no labels available.
    """
    if hasattr(self, 'data') and hasattr(self.data, 'attrs'):
        return self.data.attrs.get('variable_labels')
    return None

6. Variance-Covariance Information (__maketables_vcov_info__)

Add a property for variance-covariance matrix metadata:

@property
def __maketables_vcov_info__(self) -> dict[str, str] | None:
    """
    Return information about the variance-covariance matrix.
    
    Returns
    -------
    dict or None
        A dictionary with optional keys:
        - 'se_type': e.g., 'iid', 'robust', 'clustered'
        - 'cluster_var': name of clustering variable (if clustered)
        - 'cluster_level': level of clustering (if applicable)
        
        Return None or empty dict if not applicable.
    """
    vcov_info = {}
    if hasattr(self, 'cov_type'):
        vcov_info['se_type'] = self.cov_type
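    if hasattr(self, 'cov_kwds') and 'groups' in self.cov_kwds:
        # Placeholder value: store the actual name of the clustering
        # variable here if your result object keeps track of it
        vcov_info['cluster_var'] = 'clustered'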
    return vcov_info if vcov_info else None

7. Custom Statistic Labels (__maketables_stat_labels__)

Add a property to provide custom labels for model statistics:

@property
def __maketables_stat_labels__(self) -> dict[str, str] | None:
    """
    Return custom labels for model statistics.
    
    Returns
    -------
    dict or None
        Mapping from stat keys to display labels.
        Example: {'pseudo_r2': 'Pseudo R²', 'll': 'Log-Likelihood'}
        
        These labels override ETable's default labels but are overridden by
        user-specified labels in the ETable constructor.
    """
    return {
        'pseudo_r2': 'Pseudo R²',
        'll': 'Log-Likelihood',
        'chi2': 'χ² Statistic'
    }

8. Default Statistics to Display (__maketables_default_stat_keys__)

Add a property to specify which statistics should be shown by default:

@property
def __maketables_default_stat_keys__(self) -> list[str] | None:
    """
    Return a list of statistics to show by default for this model type.
    
    Returns
    -------
    list[str] or None
        List of stat keys to display by default.
        Example: ['N', 'pseudo_r2', 'll'] for logit/probit models
        
        When mixing model types in one table, ETable shows the union of
        all default stats. User-specified model_stats always override.
    """
    # For logit/probit models, show observations, pseudo R², and log-likelihood
    if self.model_type in ['logit', 'probit']:
        return ['N', 'pseudo_r2', 'll']
    return None  # Use ETable's defaults for other models

How maketables Detects and Uses These

When you pass a model to ETable(), maketables will automatically:

  1. Check for __maketables_coef_table__ → Use it as the coefficient table
  2. Check for __maketables_stat__(key) → Call it for requested statistics
  3. Check for __maketables_depvar__ → Use as dependent variable label
  4. Check for __maketables_fixef_string__ → Use for fixed effects panel (if applicable)
  5. Check for __maketables_var_labels__ → Use for variable relabeling (if applicable)
  6. Check for __maketables_vcov_info__ → Use for SE type information (if applicable)
  7. Check for __maketables_stat_labels__ → Use for custom statistic labels (if applicable)
  8. Check for __maketables_default_stat_keys__ → Use to determine which stats to show by default (if applicable)
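
The detection steps above amount to duck typing on attribute names. A minimal sketch of such a check, assuming the attributes are defined on the class (as properties or methods), could look like this; `plugin_support` and `Demo` are hypothetical and not part of maketables.

```python
REQUIRED = (
    "__maketables_coef_table__",
    "__maketables_stat__",
    "__maketables_depvar__",
)
OPTIONAL = (
    "__maketables_fixef_string__",
    "__maketables_var_labels__",
    "__maketables_vcov_info__",
    "__maketables_stat_labels__",
    "__maketables_default_stat_keys__",
)


def plugin_support(obj) -> dict:
    """Report which plug-in attributes an object's class defines."""
    cls = type(obj)  # look on the class so that properties are not evaluated
    return {
        "is_plugin": all(hasattr(cls, name) for name in REQUIRED),
        "optional": [name for name in OPTIONAL if hasattr(cls, name)],
    }


class Demo:
    @property
    def __maketables_coef_table__(self):
        return None  # stand-in; a real class returns a DataFrame

    def __maketables_stat__(self, key):
        return None

    @property
    def __maketables_depvar__(self):
        return "y"

    @property
    def __maketables_fixef_string__(self):
        return "firm+year"


report = plugin_support(Demo())
```

Names that both start and end with two underscores are not subject to Python's name mangling, so they can be looked up from outside the class exactly as written.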

Implementation Example

Here’s a complete example of a model result class implementing the plug-in format:

# mymodels/results.py
import pandas as pd

class MyRegressionResult:
    """A regression result object from the 'mymodels' package."""
    
    def __init__(self, params, bse, tvalues, pvalues, nobs, rsquared, 
                 depvar_name, data=None):
        self.params = params
        self.bse = bse
        self.tvalues = tvalues
        self.pvalues = pvalues
        self.nobs = nobs
        self.rsquared = rsquared
        self._depvar_name = depvar_name
        self.data = data
    
    @property
    def __maketables_coef_table__(self) -> pd.DataFrame:
        """Standard maketables coefficient table."""
        return pd.DataFrame({
            'b': self.params,
            'se': self.bse,
            't': self.tvalues,
            'p': self.pvalues,
        })
    
    def __maketables_stat__(self, key: str):
        """Standard maketables statistics access."""
        stats = {
            'N': self.nobs,
            'r2': self.rsquared,
        }
        return stats.get(key)
    
    @property
    def __maketables_depvar__(self) -> str:
        """Standard maketables dependent variable."""
        return self._depvar_name
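
Before handing a result object to ETable, a package author may want to check that the plug-in attributes return what the specification above describes. The following is a hedged sketch of such a check; `check_plugin_result` and `TinyResult` are hypothetical and not part of maketables.

```python
import pandas as pd

REQUIRED_COLUMNS = {"b", "se", "p"}


def check_plugin_result(result) -> list:
    """Collect human-readable problems; an empty list means the basics look right."""
    problems = []
    table = result.__maketables_coef_table__
    if not isinstance(table, pd.DataFrame):
        problems.append("coef table is not a DataFrame")
    else:
        missing = REQUIRED_COLUMNS - set(table.columns)
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
    if not isinstance(result.__maketables_depvar__, str):
        problems.append("depvar is not a string")
    if result.__maketables_stat__("not_a_real_key") is not None:
        problems.append("unknown stat keys should return None")
    return problems


class TinyResult:
    """Stand-in result class whose coefficient table lacks the 'p' column."""

    @property
    def __maketables_coef_table__(self) -> pd.DataFrame:
        return pd.DataFrame({"b": [0.5], "se": [0.1]}, index=["x"])

    def __maketables_stat__(self, key: str):
        return {"N": 100}.get(key)

    @property
    def __maketables_depvar__(self) -> str:
        return "y"


problems = check_plugin_result(TinyResult())
print(problems)
```

A check of this kind fits naturally into the package's own test suite, since it needs pandas but not maketables.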

Using Your Plug-in Compatible Model

Once your model class implements these attributes, users can use it directly with maketables without any additional setup:

from mymodels import MyRegression
from maketables import ETable

# Fit your model
result = MyRegression(y, X)

# maketables automatically detects the plug-in format!
table = ETable(result)
table.save('my_table.tex')