Poisson & GLMs

Core Estimation
Poisson for nonnegative outcomes, DiD with counts, and logit/probit for propensity-score-based AB testing.
Note (Scope)

This vignette is practical by design. We focus on how to use pyfixest and when these models are useful in applied work.

1. Why Poisson? (and Why Wooldridge Likes It)

A common motivation for Poisson pseudo-maximum-likelihood (PPML) is that it handles nonnegative outcomes directly and remains consistent under fairly general forms of heteroskedasticity, as long as the conditional mean is correctly specified, a robustness property of pseudo-maximum-likelihood estimators that Wooldridge has long emphasized.

In practice, that makes Poisson attractive for many applied settings where:

  • outcomes are counts or nonnegative flows,
  • zeros are common,
  • log-linear OLS can be fragile.

pyfixest provides PPML with high-dimensional fixed effects via pf.fepois().
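Under the hood, PPML solves the Poisson score equations: the coefficients are chosen so that the residuals y − exp(x′β) are orthogonal to the regressors. The toy Newton solver below illustrates that estimating equation on simulated data; it is only a sketch, not pyfixest's actual implementation, which additionally absorbs high-dimensional fixed effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton/Fisher-scoring iterations on the Poisson score:
#   sum_i (y_i - exp(x_i' beta)) * x_i = 0
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)
    hess = X.T @ (mu[:, None] * X)  # Fisher information for the log link
    beta = beta + np.linalg.solve(hess, score)
```

After convergence, `beta` recovers the true parameters up to sampling noise, and the score is numerically zero at the solution.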

2. A Standard Gravity-Style Trade Example (Boring but Useful)

Below is a simple gravity-style trade panel (exporter-importer-year). This is a synthetic dataset, but the specification mirrors common trade applications.

import numpy as np
import pandas as pd
import pyfixest as pf

def make_trade_data(seed=123, n_exporters=15, n_importers=15, years=range(2010, 2016)):
    rng = np.random.default_rng(seed)

    exporters = [f"E{i}" for i in range(n_exporters)]
    importers = [f"I{j}" for j in range(n_importers)]

    rows = []
    for y in years:
        for e in exporters:
            for i in importers:
                dist = rng.uniform(200, 9000)
                fta = rng.binomial(1, 0.15)
                # hash() is salted per process; set PYTHONHASHSEED for exact reproducibility
                exporter_fe = hash(e) % 7
                importer_fe = hash(i) % 6
                year_fe = y - min(years)

                # mean function for PPML
                eta = 2.5 - 0.35 * np.log(dist) + 0.25 * fta + 0.08 * exporter_fe + 0.06 * importer_fe + 0.05 * year_fe
                mu = np.exp(eta)
                trade = rng.poisson(mu)

                rows.append((y, e, i, dist, fta, trade))

    return pd.DataFrame(rows, columns=["year", "exporter", "importer", "distance", "fta", "trade"])

trade = make_trade_data()
trade.head()
   year exporter importer     distance  fta  trade
0  2010       E0       I0  6204.696397    0      0
1  2010       E0       I1  1822.471934    0      2
2  2010       E0       I2  7413.840142    1      1
3  2010       E0       I3  7453.326046    0      1
4  2010       E0       I4  8361.183875    0      1
fit_trade = pf.fepois(
    "trade ~ np.log(distance) + fta | exporter + importer + year",
    data=trade,
    vcov={"CRV1": "exporter"},
)
fit_trade.summary()
###

Estimation:  Poisson
Dep. var.: trade, Fixed effects: exporter + importer + year
sample: None = all
Inference:  CRV1
Observations:  1350

| Coefficient      |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:-----------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| np.log(distance) |     -0.368 |        0.022 |   -16.829 |      0.000 | -0.411 |  -0.325 |
| fta              |      0.254 |        0.040 |     6.338 |      0.000 |  0.175 |   0.332 |
---
Deviance: 1504.676 

This is the basic PPML gravity workflow in pyfixest.

3. Handling Zeros: “The Log of Zeros”

One reason PPML is popular is its straightforward handling of zero outcomes, with no need for ad-hoc transformations like log(y + 1).

This is central in modern work on nonnegative outcomes, including recent discussion in:

  • Bellégo, Benatia, and Pape (2024), Dealing with Logs and Zeros in Regression Models (QJE).

Quick contrast:

share_zeros = (trade["trade"] == 0).mean()
share_zeros
np.float64(0.32296296296296295)

With PPML, zero observations are naturally part of the likelihood through the mean function.
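To see why, note that an observation's Poisson log-likelihood contribution is y·log(μ) − μ − log(y!), which at y = 0 collapses to just −μ: finite for any μ > 0, whereas log(0) in a log-linear regression is undefined. A quick numeric check:

```python
import numpy as np
from math import lgamma

def pois_ll(y, mu):
    """Poisson log-likelihood contribution of one observation."""
    if y == 0:
        return -mu  # finite: the zero enters the likelihood directly
    return y * np.log(mu) - mu - lgamma(y + 1)

print(pois_ll(0, 1.5))  # -1.5, perfectly well defined
print(pois_ll(3, 1.5))  # finite as well
```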

4. Poisson for a Simple DiD

Poisson DiD is useful when the outcome is a count or a nonnegative rate. The simulation below uses a single common adoption date, but the same formula syntax applies to staggered panel designs.

def make_count_did(seed=42, n_states=30, years=range(2010, 2018)):
    rng = np.random.default_rng(seed)
    states = [f"S{s}" for s in range(n_states)]

    rows = []
    treated_states = set(states[: n_states // 2])
    post_start = 2014

    for y in years:
        for s in states:
            treated = int(s in treated_states)
            post = int(y >= post_start)
            did = treated * post

            # latent state/year effects + treatment effect in post
            # (hash() is salted per process; set PYTHONHASHSEED for exact reproducibility)
            state_fe = (hash(s) % 9) / 10
            year_fe = (y - min(years)) / 10
            eta = 1.2 + state_fe + year_fe + 0.20 * did
            mu = np.exp(eta)
            y_count = rng.poisson(mu)

            rows.append((s, y, treated, post, did, y_count))

    return pd.DataFrame(rows, columns=["state", "year", "treated", "post", "did", "y_count"])

did_df = make_count_did()

fit_did_pois = pf.fepois(
    "y_count ~ did | state + year",
    data=did_df,
    vcov={"CRV1": "state"},
)
fit_did_pois.summary()
###

Estimation:  Poisson
Dep. var.: y_count, Fixed effects: state + year
sample: None = all
Inference:  CRV1
Observations:  240

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| did           |      0.172 |        0.088 |     1.955 |      0.051 | -0.000 |   0.345 |
---
Deviance: 244.495 

If your outcome is nonnegative and has many zeros, this can be a practical alternative to log-linear OLS DiD.
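Because fepois fits a log-link mean, a DiD coefficient β translates into a proportional effect of exp(β) − 1 on the conditional mean. For a point estimate like the 0.172 in the table above:

```python
import numpy as np

beta_hat = 0.172  # Poisson DiD coefficient from the summary above
effect = np.expm1(beta_hat)  # proportional change in the conditional mean
print(f"{effect:.1%}")  # roughly an 18.8% increase
```

The true effect in the simulation, 0.20, corresponds to about a 22% increase, so the estimate is in the right neighborhood.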

5. Logit/Probit for Doubly Debiased AB Testing with Propensity Scores

For observational AB tests (or experiments with imperfect randomization), a common strategy is to combine:

  • a propensity score model P(T=1|X), and
  • an outcome model E[Y|T,X].

This yields doubly robust / doubly debiased estimators (AIPW-style scores).

Simulate an AB setting

def make_ab_data(seed=1, n=5000):
    rng = np.random.default_rng(seed)

    segment = rng.integers(0, 20, size=n)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)

    # non-random treatment assignment
    p = 1 / (1 + np.exp(-(-0.2 + 0.7 * x1 - 0.4 * x2 + 0.15 * (segment % 4))))
    t = rng.binomial(1, p)

    # outcome with true effect tau
    tau = 0.25
    y = tau * t + 0.6 * x1 + 0.3 * x2 + 0.2 * (segment % 5) + rng.normal(scale=1.0, size=n)

    return pd.DataFrame({"y": y, "t": t, "x1": x1, "x2": x2, "segment": segment})

ab = make_ab_data()
ab.head()
          y  t        x1        x2  segment
0  1.403683  1  0.247452  0.620339        9
1 -0.698392  1  0.710294  1.163892       10
2  0.504507  0 -0.650713 -0.347768       15
3  1.283318  0 -0.229100  0.122556       19
4 -0.181560  0 -0.781772  0.866387        0

Propensity via logit/probit (with fixed effects)

# both support fixed effects via the | syntax
ps_logit = pf.feglm("t ~ x1 + x2 | segment", data=ab, family="logit")
ps_probit = pf.feglm("t ~ x1 + x2 | segment", data=ab, family="probit")

ab["ehat_logit"] = np.clip(ps_logit.predict(type="response"), 0.01, 0.99)
ab["ehat_probit"] = np.clip(ps_probit.predict(type="response"), 0.01, 0.99)
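Clipping the fitted propensities to [0.01, 0.99] caps the inverse-propensity weights at 1/0.01 = 100, which keeps a handful of near-violations of overlap from dominating the estimate. A small, self-contained illustration of what the clip does to the weights:

```python
import numpy as np

ehat = np.array([0.001, 0.05, 0.5, 0.95, 0.999])
raw_w = 1 / ehat                          # weights for treated units
clipped_w = 1 / np.clip(ehat, 0.01, 0.99)

print(raw_w.max())      # the most extreme unit would get weight 1/0.001 = 1000
print(clipped_w.max())  # after clipping, weights are capped at 1/0.01 = 100
```

The trade-off is a small bias in exchange for much lower variance; in practice, inspect the propensity distribution before choosing trimming bounds.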

Outcome regressions and a simple AIPW score

mu1 = pf.feols("y ~ x1 + x2 | segment", data=ab.loc[ab["t"] == 1]).predict(newdata=ab)
mu0 = pf.feols("y ~ x1 + x2 | segment", data=ab.loc[ab["t"] == 0]).predict(newdata=ab)

ab["mu1"] = mu1
ab["mu0"] = mu0

# AIPW score for ATE using logit propensity
ab["score_aipw_logit"] = (
    ab["mu1"] - ab["mu0"]
    + ab["t"] * (ab["y"] - ab["mu1"]) / ab["ehat_logit"]
    - (1 - ab["t"]) * (ab["y"] - ab["mu0"]) / (1 - ab["ehat_logit"])
)

ate_aipw_logit = ab["score_aipw_logit"].mean()
ate_aipw_logit
np.float64(0.2504501667772698)

This is the basic doubly debiased pattern: estimate treatment propensities with feglm (logit/probit), combine with outcome models, and form an orthogonal score.
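Because the AIPW score is (approximately) orthogonal to the nuisance estimates, a simple plug-in standard error treats the per-observation scores as i.i.d. draws: se = sd(score)/sqrt(n). A minimal sketch, run on synthetic scores so the snippet is self-contained; in practice you would pass a column like score_aipw_logit from above.

```python
import numpy as np

def aipw_ate_with_se(scores):
    """Point estimate and plug-in SE from per-observation AIPW scores."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    ate = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(n)
    return ate, se

# demo on synthetic scores centered at a true effect of 0.25
rng = np.random.default_rng(0)
fake_scores = 0.25 + rng.normal(scale=1.0, size=5000)
ate, se = aipw_ate_with_se(fake_scores)
ci = (ate - 1.96 * se, ate + 1.96 * se)
```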

6. GLMs with Marginal Effects (Placeholder)

Note (Coming Soon)

This section will add a full workflow for marginal effects after feglm() estimation, including:

  • average marginal effects (AMEs),
  • subgroup/conditional marginal effects,
  • inference with robust and clustered standard errors,
  • comparison of coefficient-scale vs probability-scale interpretation.

Planned examples:

  • Binary outcome with family="logit" and family="probit".
  • Post-estimation marginal effects tables suitable for reporting.
  • Practical guidance on when marginal effects change substantive conclusions relative to raw GLM coefficients.
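As a preview, the average marginal effect (AME) of a continuous regressor in a logit model is the sample mean of the logistic density at the linear index times the coefficient. A numpy-only sketch under assumed coefficients; this is illustrative arithmetic, not pyfixest post-estimation output:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([-0.5, 0.8])  # assumed logit coefficients, for illustration

# logistic density evaluated at the linear index
eta = X @ beta
dens = np.exp(eta) / (1 + np.exp(eta)) ** 2

# AME of x on P(Y = 1): average over the sample of dens * beta_x
ame_x = (dens * beta[1]).mean()
```

Since the logistic density is at most 0.25, the AME is bounded by 0.25·β, which is a useful sanity check when moving between coefficient scale and probability scale.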

7. Fixed Effects Support (Summary)

Both Poisson and GLM estimators in pyfixest support fixed effects using the same formula syntax:

  • Poisson: pf.fepois("y ~ x | fe1 + fe2", ...)
  • GLM families (logit, probit, gaussian): pf.feglm("y ~ x | fe1 + fe2", family="logit", ...)

That shared syntax makes it easy to move between linear, Poisson, and binary-response workflows while keeping your FE structure aligned.

Where to Go Next