On Small Sample Corrections

The fixest R package provides various options for small sample corrections. While it has an excellent vignette on the topic, reproducing its behavior in pyfixest took more time than expected. So that future developers (and my future self) can stay sane, I’ve compiled all of my hard-earned understanding of how small sample adjustments work in fixest and how they are implemented in pyfixest in this document.

In both fixest and pyfixest, small sample corrections are controlled by the ssc function. In pyfixest, ssc accepts four arguments: adj, cluster_adj, fixef_k and cluster_df.

Based on these inputs, the adjusted variance-covariance matrix is computed as:

vcov_adj = (adj_val(N, dof_k) if adj else 1)
          * (cluster_adj_val(G, cluster_df) if cluster_adj else 1)
          * vcov

Where:

adj: Enables or disables the first scalar adjustment.
cluster_adj: Enables or disables the second scalar adjustment.
vcov: The unadjusted variance-covariance matrix.
dof_k: The number of estimated parameters considered in the first adjustment. Impacts adj_val.
fixef_k: Determines how dof_k is computed (how fixed effects are counted).
cluster_df: Determines how cluster_adj_val is computed (only relevant for multi-way clustering).
G: The number of unique clusters (G = N for heteroskedastic errors).
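
To make the formula concrete, here is a minimal sketch in plain Python. It is not pyfixest's actual implementation: the helper name `scale_vcov` is hypothetical, and the two factors are passed in as precomputed numbers rather than derived from N, dof_k and G.

```python
def scale_vcov(vcov, adj, cluster_adj, adj_val, cluster_adj_val):
    """Scale an unadjusted vcov (a list of rows) by the two optional scalar factors.

    Hypothetical helper for illustration only; adj_val and cluster_adj_val
    are assumed to have been computed beforehand.
    """
    factor = (adj_val if adj else 1.0) * (cluster_adj_val if cluster_adj else 1.0)
    return [[factor * v for v in row] for row in vcov]


vcov = [[0.04, 0.01], [0.01, 0.09]]

# Only the first adjustment is switched on, so cluster_adj_val is ignored.
scaled = scale_vcov(vcov, adj=True, cluster_adj=False, adj_val=1.5, cluster_adj_val=2.0)

# With both switches off, the matrix is returned unchanged.
unscaled = scale_vcov(vcov, adj=False, cluster_adj=False, adj_val=1.5, cluster_adj_val=2.0)
```

Both adjustments are plain scalars, so switching one off simply replaces its factor with 1.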

Outside of this formula, we have df_t, which is the degrees of freedom used for p-values and confidence intervals:

df_t = N - dof_k for IID or heteroskedastic errors.
df_t = G - 1 for clustered errors.
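
The two rules for df_t can be sketched as a small function. The name `df_t` and the convention that `G=None` means "not clustered" are assumptions made for this illustration, not part of the pyfixest API.

```python
def df_t(N, dof_k, G=None):
    """Degrees of freedom for t-based p-values and confidence intervals.

    G is None for IID or heteroskedastic errors (an assumption of this sketch);
    otherwise G is the number of clusters.
    """
    if G is None:
        return N - dof_k
    return G - 1


df_iid = df_t(N=100, dof_k=3)            # 100 - 3
df_clustered = df_t(N=100, dof_k=3, G=10)  # 10 - 1
```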

Small Sample Adjustments

`adj = True`

If adj = True, the adjustment factor is:

adj_val = (N - 1) / (N - dof_k)

If adj = False, no adjustment is applied.
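
As a quick sketch of the first adjustment (the function signature is an assumption for illustration; only the formula comes from the text above):

```python
def adj_val(N, dof_k, adj=True):
    """First scalar adjustment: (N - 1) / (N - dof_k), or 1 if adj is False."""
    if not adj:
        return 1.0
    return (N - 1) / (N - dof_k)


# With N = 100 observations and dof_k = 5 counted parameters:
factor = adj_val(N=100, dof_k=5)                # 99 / 95
no_factor = adj_val(N=100, dof_k=5, adj=False)  # 1.0
```

Because dof_k sits in the denominator, counting more parameters (for example via fixef_k) makes the adjustment larger.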

`fixef_k`

The fixef_k argument controls how fixed effects contribute to dof_k, and thus to adj_val. It supports three options:

"none"
"full"
"nested"

`fixef_k = "none"`

Fixed effects are ignored when counting parameters, so dof_k = k, the number of non-fixed-effect coefficients:

Example:
- Y ~ X1 | f1 → k = 1
- Y ~ X1 + X2 | f1 → k = 2

`fixef_k = "full"`

Fixed effects are fully counted. With n_fe fixed effects f_1, ..., f_{n_fe}, we set dof_k = k + k_fe, where k_fe is computed as follows:

If there is more than one fixed effect, we drop one level from each fixed effect except the first (to avoid multicollinearity): k_fe = sum_{i=1}^{n_fe} levels(f_i) - (n_fe - 1)
If there is only one fixed effect: k_fe = levels(f_1)
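
The "none" and "full" rules can be sketched as follows. The function name `count_dof_k` and its inputs (a list holding the number of levels of each fixed effect) are hypothetical and chosen for illustration; pyfixest's internals may be organized differently.

```python
def count_dof_k(k, fe_levels, fixef_k):
    """Count dof_k for fixef_k = "none" or "full".

    k: number of non-fixed-effect coefficients.
    fe_levels: number of levels of each fixed effect, e.g. [10, 5].
    """
    if fixef_k == "none":
        return k
    if fixef_k == "full":
        n_fe = len(fe_levels)
        # Drop one level from every fixed effect except the first.
        k_fe = sum(fe_levels) - max(n_fe - 1, 0)
        return k + k_fe
    raise ValueError("this sketch only covers 'none' and 'full'")


dof_none = count_dof_k(k=2, fe_levels=[10, 5], fixef_k="none")   # 2
dof_one_fe = count_dof_k(k=2, fe_levels=[10], fixef_k="full")    # 2 + 10
dof_two_fe = count_dof_k(k=2, fe_levels=[10, 5], fixef_k="full") # 2 + 15 - 1
```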

`fixef_k = "nested"`

Fixed effects may be nested within cluster variables (e.g., district FEs nested in state clusters). If fixef_k = "nested", nested fixed effects do not count toward k_fe:

k_fe = sum_{i=1}^{n_fe} levels(f_i) - k_fe_nested - (n_fe - 1)

where k_fe_nested is the total number of levels of the nested fixed effects. If the cluster variable itself is used as a fixed effect, k_fe_nested = G, the number of clusters.

⚠️ Note: If you already subtracted a level from a nested FE, you may need to add it back.
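
A literal transcription of the nested formula might look like the sketch below. The function name and the `nested` flags are hypothetical, and the sketch assumes you already know which fixed effects are nested in the cluster variable. It deliberately does not handle the add-back case from the note above.

```python
def count_dof_k_nested(k, fe_levels, nested):
    """Count dof_k for fixef_k = "nested", following the formula above literally.

    fe_levels: number of levels of each fixed effect.
    nested: one bool per fixed effect, True if it is nested in the cluster variable.
    """
    n_fe = len(fe_levels)
    # Levels of nested fixed effects do not count toward k_fe.
    k_fe_nested = sum(lv for lv, is_nested in zip(fe_levels, nested) if is_nested)
    k_fe = sum(fe_levels) - k_fe_nested - max(n_fe - 1, 0)
    return k + k_fe


# A single fixed effect that is nested in the clusters contributes nothing.
dof_single = count_dof_k_nested(k=2, fe_levels=[50], nested=[True])            # 2
# Two fixed effects, the first one nested: 2 + (60 - 50 - 1)
dof_mixed = count_dof_k_nested(k=2, fe_levels=[50, 10], nested=[True, False])  # 11
```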

`cluster_adj`

If cluster_adj = True, we apply a second correction:

cluster_adj_val = G / (G - 1)

Where:

G is the number of clusters for clustered errors, or N for heteroskedastic errors.
This follows the approach in R’s sandwich package, interpreting heteroskedastic errors as “singleton clusters.”

Tip: If cluster_adj = True for IID errors, cluster_adj_val defaults to 1. For heteroskedastic errors, despite its name, cluster_adj = True will apply an adjustment of N / (N - 1), as there are G = N singleton clusters.
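
The three cases (IID, heteroskedastic, clustered) can be summarized in a short sketch that applies the G / (G - 1) formula. The function name and the `vcov_type` labels ("iid", "hetero", "cluster") are assumptions for this illustration.

```python
def cluster_adj_factor(vcov_type, N, G=None, cluster_adj=True):
    """Second scalar adjustment, G / (G - 1), for a given error type.

    vcov_type: "iid", "hetero" or "cluster" (labels assumed for this sketch).
    G: number of clusters; only needed when vcov_type == "cluster".
    """
    if not cluster_adj or vcov_type == "iid":
        return 1.0
    if vcov_type == "hetero":
        G = N  # every observation is its own singleton cluster
    return G / (G - 1)


f_iid = cluster_adj_factor("iid", N=100)              # 1.0
f_hetero = cluster_adj_factor("hetero", N=100)        # 100 / 99
f_cluster = cluster_adj_factor("cluster", N=100, G=10)  # 10 / 9
```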

`cluster_df`

Relevant only for multi-way clustering. Two-way clustering, for example, can be written as:

vcov = ssc_A * vcov_A + ssc_B * vcov_B - ssc_AB * vcov_AB

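where A and B are clustering dimensions with G_AB > G_A > G_B, vcov_AB is the variance-covariance matrix clustered on the intersection of A and B, and ssc_A, ssc_B and ssc_AB are the corresponding adjustment factors.

If cluster_df = "min", then the same G, the minimum of G_A, G_B, and G_AB, is used for all three adjustments.
If cluster_df = "conventional", each clustering dimension uses its own cluster count (G_A, G_B, etc.) for its respective adjustment.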
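
The difference between the two options can be illustrated with a small sketch. It uses scalar variances instead of full matrices, and it assumes that each ssc term consists only of the G / (G - 1) cluster factor (the first adjustment, adj_val, is left out for simplicity). The function names are hypothetical.

```python
def twoway_ssc(G_A, G_B, G_AB, cluster_df="min"):
    """Return the cluster factor for each of the three vcov components."""
    counts = {"A": G_A, "B": G_B, "AB": G_AB}
    if cluster_df == "min":
        # One shared G: the smallest cluster count across all components.
        G_min = min(counts.values())
        return {name: G_min / (G_min - 1) for name in counts}
    if cluster_df == "conventional":
        # Each component uses its own cluster count.
        return {name: G / (G - 1) for name, G in counts.items()}
    raise ValueError("cluster_df must be 'min' or 'conventional'")


def twoway_variance(v_A, v_B, v_AB, ssc):
    """Combine the three components: ssc_A * v_A + ssc_B * v_B - ssc_AB * v_AB."""
    return ssc["A"] * v_A + ssc["B"] * v_B - ssc["AB"] * v_AB


ssc_min = twoway_ssc(G_A=20, G_B=10, G_AB=150, cluster_df="min")
ssc_conv = twoway_ssc(G_A=20, G_B=10, G_AB=150, cluster_df="conventional")

var_min = twoway_variance(0.04, 0.05, 0.03, ssc_min)
var_conv = twoway_variance(0.04, 0.05, 0.03, ssc_conv)
```

With "min", all three components are scaled by the factor of the smallest dimension (here 10 / 9), whereas "conventional" scales the larger dimensions by factors closer to 1.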

In Code

All of the above logic is implemented here.

More on Inference