Estimator Overview

ModernDiD provides several estimators for different research designs. All estimators share a common API pattern, so once you learn one, the others follow naturally. This page gives an overview of each estimator, its key arguments, and important caveats. For detailed usage with real data, see the individual example pages: Staggered DiD, Continuous DiD, Triple DiD, Intertemporal DiD, and Sensitivity Analysis.

Choosing the right estimator

The choice of estimator depends on the structure of your treatment variable and research question.

  • Many applied settings involve a binary treatment that turns on permanently at staggered times across groups. att_gt handles this staggered adoption case and is a good starting point for most analyses.

  • etwfe provides an alternative regression-based approach to the same staggered adoption setting. It saturates a TWFE regression with cohort-by-time interactions and extends naturally to nonlinear models (Poisson, logit, probit) where the semiparametric methods are unavailable.

  • When treatment intensity varies continuously across units, cont_did extends the framework to recover dose-response functions showing how effects scale with dosage.

  • When a policy enables treatment for a group but only a subset of units within that group is actually eligible, ddd exploits this additional within-group variation. Parental leave affecting women but not men, or minimum wage affecting hourly but not salaried workers, are canonical examples.

  • When treatment is not permanent and can switch on and off or change intensity over time, did_multiplegt provides valid inference by comparing units whose treatment changed to those with the same baseline treatment that have not yet changed.

  • When units dynamically select into treatment based on past outcomes and covariates, dyn_balancing estimates the effect of specific treatment histories using sequential covariate balancing weights that do not require propensity score estimation. This is appropriate when parallel trends is violated by dynamic treatment selection.

  • After running any estimator, honest_did assesses how large violations of the parallel trends assumption would need to be to overturn your conclusions.

  • npiv estimates nonparametric structural functions using instrumental variables and B-spline sieves. It serves as a standalone tool for Engel curve estimation and similar problems, and also powers the nonparametric dose-response estimator in cont_did.
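
The guidance in this list can be condensed into a small lookup. The sketch below is purely illustrative: `suggest_estimator` and its keyword flags are hypothetical names that are not part of ModernDiD, and the returned strings are simply the function names mentioned above. The ordering of the checks is one reasonable reading of the list, not an official decision rule.

```python
def suggest_estimator(
    *,
    absorbing=True,               # treatment turns on and stays on
    continuous_dose=False,        # intensity varies across treated units
    eligibility_partition=False,  # only a subset within treated groups is eligible
    dynamic_selection=False,      # units select into treatment on past outcomes
    nonlinear_outcome=False,      # count/binary outcome needing Poisson, logit, or probit
):
    """Hypothetical helper: map design features to an estimator name."""
    if dynamic_selection:
        return "dyn_balancing"
    if not absorbing:
        return "did_multiplegt"
    if eligibility_partition:
        return "ddd"
    if continuous_dose:
        return "cont_did"
    if nonlinear_outcome:
        return "etwfe"
    return "att_gt"


print(suggest_estimator())                 # att_gt
print(suggest_estimator(absorbing=False))  # did_multiplegt
```

Whatever the choice, honest_did remains available afterwards as a sensitivity check.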

Staggered Difference-in-Differences

The att_gt function is the primary estimator for staggered treatment adoption with binary, absorbing treatment. It estimates group-time average treatment effects on the treated (ATT(g,t)) and is the recommended starting point for most DiD analyses. This implements the Callaway and Sant’Anna (2021) framework.

import moderndid as did

result = did.att_gt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    xformla="~ covariate",
    control_group="nevertreated",
    est_method="dr",
)

The result includes group-time ATT estimates, analytical standard errors, a variance-covariance matrix, and influence functions. A Wald pre-test for parallel trends is computed automatically from pre-treatment periods.
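
To make the target parameter concrete: without covariates and with a never-treated comparison group, each post-treatment ATT(g,t) reduces to a two-by-two comparison of mean outcome changes between cohort g and the never-treated units, measured from the period just before g. The sketch below computes that by hand on made-up numbers. It illustrates the estimand only and is not how att_gt is implemented (no covariates, no doubly robust step, no inference). The helper name `att_gt_2x2` and the convention that `first_treated == 0` marks never-treated units are assumptions for this example.

```python
from statistics import mean

# Toy balanced panel: unit -> (first_treated, {year: outcome}).
# first_treated == 0 marks never-treated units (assumed convention).
panel = {
    "a": (2004, {2003: 1.0, 2004: 3.0, 2005: 4.0}),
    "b": (2004, {2003: 2.0, 2004: 4.0, 2005: 6.0}),
    "c": (0,    {2003: 1.0, 2004: 1.5, 2005: 2.0}),
    "d": (0,    {2003: 3.0, 2004: 3.5, 2005: 4.0}),
}


def att_gt_2x2(panel, g, t):
    """Unconditional ATT(g, t) for t >= g, using g - 1 as the base period."""
    base = g - 1
    treated = [y[t] - y[base] for grp, y in panel.values() if grp == g]
    control = [y[t] - y[base] for grp, y in panel.values() if grp == 0]
    return mean(treated) - mean(control)


print(att_gt_2x2(panel, g=2004, t=2004))  # 1.5
print(att_gt_2x2(panel, g=2004, t=2005))  # 2.5
```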

Group-time estimates are typically aggregated into interpretable summary parameters using aggte:

# Event study (dynamic effects relative to treatment)
event_study = did.aggte(result, type="dynamic")

# Simple weighted average across all post-treatment (g,t) cells
simple_agg = did.aggte(result, type="simple")

# Group-level averages (one ATT per cohort)
group_agg = did.aggte(result, type="group")

# Calendar-time averages (one ATT per period)
calendar_agg = did.aggte(result, type="calendar")
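
As a rough picture of what the dynamic aggregation does, the sketch below averages hypothetical ATT(g,t) values by event time e = t - g, weighting each cohort by its size. The numbers, the `group_sizes` dictionary, and the helper `event_study` are invented for illustration. aggte additionally handles inference and the other aggregation types, none of which is reproduced here.

```python
from collections import defaultdict

# Hypothetical ATT(g, t) estimates keyed by (cohort, period).
att = {
    (2004, 2004): 1.0, (2004, 2005): 2.0, (2004, 2006): 3.0,
    (2005, 2005): 2.0, (2005, 2006): 4.0,
}
group_sizes = {2004: 30, 2005: 10}  # units per cohort (made up)


def event_study(att, group_sizes):
    """Cohort-size-weighted average of ATT(g, t) at each event time e = t - g."""
    num = defaultdict(float)
    den = defaultdict(float)
    for (g, t), value in att.items():
        e = t - g
        num[e] += group_sizes[g] * value
        den[e] += group_sizes[g]
    return {e: num[e] / den[e] for e in sorted(num)}


print(event_study(att, group_sizes))  # {0: 1.25, 1: 2.5, 2: 3.0}
```

Only the 2004 cohort is observed two periods after adoption, so the estimate at e = 2 rests on that cohort alone. This kind of composition change is worth keeping in mind when reading long event-study horizons.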

Clustered standard errors require boot=True. When clustervars is specified without the bootstrap, the reported standard errors do not account for clustering. At most two clustering variables are supported.

When a Dask or Spark DataFrame is passed as data, the estimator automatically routes to a distributed implementation. See Distributed Estimation for configuration details.

Extended Two-Way Fixed Effects (ETWFE)

The etwfe function provides a regression-based alternative to att_gt for the same staggered adoption setting. It saturates the model with cohort-by-time interaction terms so that each (cohort, period) cell gets its own treatment effect, avoiding the negative weighting problem of conventional TWFE. This implements the Wooldridge (2025) framework.

mod = did.etwfe(
    data=data,
    yname="outcome",
    tname="year",
    gname="first_treated",
    idname="unit_id",
    xformla="~ covariate",
)

The cell-level estimates are then aggregated using emfx, which plays the same role as aggte for the Callaway and Sant’Anna estimator.

simple = did.emfx(mod, type="simple")
event  = did.emfx(mod, type="event")
group  = did.emfx(mod, type="group")
cal    = did.emfx(mod, type="calendar")

A key advantage of ETWFE over the semiparametric approach is native support for nonlinear models. Setting family="poisson" imposes parallel trends on the log scale, which is often more plausible for count or nonnegative outcomes. "logit" and "probit" are also available. Heterogeneous treatment effects by a categorical covariate can be estimated with the xvar parameter.

mod_pois = did.etwfe(
    data=data,
    yname="count_outcome",
    tname="year",
    gname="first_treated",
    family="poisson",
)
did.emfx(mod_pois, type="event")
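
Parallel trends on the log scale means the untreated outcome of the treated cohort is assumed to grow by the same ratio as the comparison group, rather than by the same absolute amount. The arithmetic below uses invented cell means to contrast the two counterfactuals. It is a sketch of the identifying idea, not of the etwfe estimation routine.

```python
# Invented cell means of a count outcome.
treated_pre, treated_post = 10.0, 18.0
control_pre, control_post = 20.0, 30.0

# Linear parallel trends: add the control group's absolute change.
cf_linear = treated_pre + (control_post - control_pre)
att_linear = treated_post - cf_linear

# Log-scale (multiplicative) parallel trends: apply the control growth ratio.
cf_ratio = treated_pre * (control_post / control_pre)
att_ratio = treated_post - cf_ratio

print(cf_linear, att_linear)  # 20.0 -2.0
print(cf_ratio, att_ratio)    # 15.0 3.0
```

With these numbers the two assumptions imply effects of opposite sign, which is why the choice of scale deserves explicit justification when baseline levels differ a lot between cohorts.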

Triple Difference-in-Differences

The ddd function leverages an additional dimension of variation such as eligibility status. The API follows the same pattern as the other estimators. This implements the Ortiz-Villavicencio and Sant’Anna (2025) framework.

result = did.ddd(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    pname="eligible",              # partition/eligibility variable
    xformla="~ covariate",
    control_group="nevertreated",
    est_method="dr",
)

The triple DiD estimator adds pname to specify the partition variable that identifies eligible units within treatment groups. All other core arguments work the same as att_gt.

The estimator automatically detects whether the data has two periods or multiple periods, and whether the data is a balanced panel or repeated cross-sections. For two-period data the control_group and base_period parameters are ignored since there is only one possible comparison. Like att_gt, passing a Dask or Spark DataFrame automatically routes to a distributed implementation.
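
In the simplest case (two periods, no covariates), the triple difference compares the eligible-minus-ineligible gap in outcome changes inside the treated group with the same gap inside the comparison group. The sketch below works through that arithmetic on invented cell means. The cell labels and the helper `triple_diff` are hypothetical, and the ddd function itself does considerably more (covariates, multiple periods, inference).

```python
# Invented mean outcome changes (post minus pre) for the four cells.
delta = {
    ("treated_group", "eligible"): 5.0,
    ("treated_group", "ineligible"): 2.0,
    ("control_group", "eligible"): 1.5,
    ("control_group", "ineligible"): 1.0,
}


def triple_diff(delta):
    """Eligible-vs-ineligible gap in the treated group, net of the same gap in the control group."""
    within_treated = delta[("treated_group", "eligible")] - delta[("treated_group", "ineligible")]
    within_control = delta[("control_group", "eligible")] - delta[("control_group", "ineligible")]
    return within_treated - within_control


print(triple_diff(delta))  # 2.5
```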

Difference-in-Differences with Continuous Treatments

The cont_did function handles settings with treatment intensity rather than binary treatment. This implements the Callaway, Goodman-Bacon, and Sant’Anna (2024) framework.

result = did.cont_did(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    dname="dose",
    control_group="notyettreated",
    anticipation=0,
    base_period="varying",
    alp=0.05,
    boot=True,
    biters=1000,
    clustervars=["unit_id"],
    # Method-specific options
    target_parameter="level",      # level or slope
    aggregation="dose",            # dose or eventstudy
    dose_est_method="parametric",  # parametric or cck
)

All the inference options (alp, boot, biters, clustervars, cband) work the same way across estimators. The shared estimation options (control_group, anticipation, base_period) also behave identically.

Important

The continuous treatment estimator does not yet support covariates (only xformla="~1"), unbalanced panels, or discrete treatment values. Two-way clustering is not supported. The CCK estimation method (dose_est_method="cck") requires exactly two groups and two time periods, and cannot be combined with event study aggregation.
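
Some of these restrictions can be checked before calling the estimator. The sketch below transcribes the option-level rules from the note above into a hypothetical pre-flight helper. `check_cont_did_options` is not part of ModernDiD and does not call it. The `n_groups`, `n_periods`, and `n_cluster_vars` arguments are assumed to be counts the user computes from their own data. The data-level restrictions (unbalanced panels, discrete doses) are not covered.

```python
def check_cont_did_options(
    *,
    xformla="~1",
    dose_est_method="parametric",
    aggregation="dose",
    n_groups=2,
    n_periods=2,
    n_cluster_vars=1,
):
    """Hypothetical helper: return a list of conflicts with the documented caveats."""
    problems = []
    if xformla.replace(" ", "") != "~1":
        problems.append("covariates are not supported; use xformla='~1'")
    if n_cluster_vars > 1:
        problems.append("two-way clustering is not supported")
    if dose_est_method == "cck":
        if n_groups != 2 or n_periods != 2:
            problems.append("cck requires exactly two groups and two time periods")
        if aggregation == "eventstudy":
            problems.append("cck cannot be combined with event study aggregation")
    return problems


print(check_cont_did_options())  # []
print(check_cont_did_options(dose_est_method="cck", aggregation="eventstudy", n_periods=4))
```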

Dynamic Covariate Balancing DiD

The dyn_balancing function estimates treatment effects in panel data where treatments change dynamically over time and units select into treatment based on past outcomes and covariates. This implements the Viviano and Bradic (2026) framework.

from moderndid.diddynamic import dyn_balancing

result = dyn_balancing(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    treatment_name="treatment",
    ds1=[1, 1],                   # always treated for 2 periods
    ds2=[0, 0],                   # never treated for 2 periods
    xformla="~ covariate1 + covariate2",
    fixed_effects=["region"],
    balancing="dcb",
)

The result includes the ATE, potential outcomes under each treatment history, analytical standard errors, and covariate imbalance diagnostics. The ds1 and ds2 arguments specify the two treatment sequences to compare, where the last element corresponds to the final period.

Three estimation modes are available through additional arguments. histories_length traces out how the effect evolves with exposure length (1 through \(h\) periods), final_periods estimates effects at different final time points, and impulse_response=True measures the effect of a one-period treatment shock at varying horizons.

history = dyn_balancing(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    treatment_name="treatment",
    ds1=[1, 1, 1, 1, 1],
    ds2=[0, 0, 0, 0, 0],
    histories_length=[1, 2, 3, 4, 5],
    xformla="~ covariate1 + covariate2",
)

Unlike the staggered DiD estimators, dynamic covariate balancing does not require parallel trends or staggered adoption. Instead, it relies on sequential ignorability (no unobserved confounders conditional on past observables) and a high-dimensional linear model on potential outcomes. The DCB weights are constructed through a quadratic program that does not require estimating or specifying the propensity score.

Difference-in-Differences with Intertemporal Treatment Effects

The did_multiplegt function handles settings with non-binary, non-absorbing (time-varying) treatments where lagged treatments may affect outcomes. This implements the de Chaisemartin and D’Haultfoeuille (2024) framework.

result = did.did_multiplegt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    dname="treatment",            # treatment variable (can vary over time)
    effects=5,                    # number of post-treatment periods
    placebo=3,                    # number of placebo periods
    cluster="unit_id",
)

Unlike att_gt, which requires a gname (first treatment period), the intertemporal estimator uses dname directly since treatment can change multiple times. The estimator compares units whose treatment changes (“switchers”) to units with the same baseline treatment that have not yet switched. Setting effects=L produces estimates for each period of exposure from 1 through L, and placebo=K produces K pre-treatment placebo estimates for testing parallel trends.
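
The comparison logic can be pictured on a toy panel. The sketch below finds, for a given switcher and period, the units that share its baseline (period-one) treatment and have not yet switched. The treatment paths and the helpers `first_switch` and `comparison_units` are invented for illustration and say nothing about how did_multiplegt organizes the computation internally.

```python
# Toy panel: unit -> treatment path over periods 1..5 (made-up values).
paths = {
    "u1": [0, 0, 1, 1, 1],  # switches in period 3
    "u2": [0, 0, 0, 0, 2],  # switches in period 5
    "u3": [0, 0, 0, 0, 0],  # never switches
    "u4": [1, 1, 1, 1, 1],  # never switches, but has a different baseline
}


def first_switch(path):
    """1-based period at which treatment first differs from its period-1 value, else None."""
    for period, d in enumerate(path, start=1):
        if d != path[0]:
            return period
    return None


def comparison_units(paths, switcher, period):
    """Units with the switcher's baseline treatment that have not switched by `period`."""
    baseline = paths[switcher][0]
    out = []
    for unit, path in paths.items():
        f = first_switch(path)
        if unit != switcher and path[0] == baseline and (f is None or f > period):
            out.append(unit)
    return out


print(comparison_units(paths, "u1", period=3))  # ['u2', 'u3']
print(comparison_units(paths, "u1", period=5))  # ['u3']
```

The comparison set shrinks as more units switch, so longer horizons (larger effects) are typically estimated from fewer comparison units.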

result = did.did_multiplegt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    dname="treatment",
    # Effect options
    effects=5,
    placebo=3,
    normalized=True,              # normalize by cumulative treatment change
    effects_equal=True,           # chi-squared test for equal effects
    # Inference options
    cluster="unit_id",
    ci_level=95.0,
    boot=True,
    biters=1000,
    # Control options
    controls=["covariate1", "covariate2"],
    trends_lin=True,              # unit-specific linear trends
)

By default, units that experience both treatment increases and decreases (bidirectional switchers) are dropped. For these units, the estimand can be written as a linear combination of treatment effects in which some weights are negative, which makes the estimates difficult to interpret causally. Set keep_bidirectional_switchers=True to override this, but interpret the results with caution.
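
To see which units such a filter would affect, the sketch below flags treatment paths that move both above and below their period-one value. Treating "bidirectional" as relative to the baseline treatment is an assumption made for this example, and the helper `is_bidirectional` is hypothetical. Check the library's own definition before relying on it.

```python
def is_bidirectional(path):
    """True if treatment is at some point above AND at some point below its period-1 value."""
    baseline = path[0]
    return any(d > baseline for d in path) and any(d < baseline for d in path)


# Made-up treatment paths.
paths = {
    "up_only": [1, 2, 2, 3],
    "down_only": [1, 1, 0, 0],
    "both": [1, 2, 0, 1],
}

flagged = [unit for unit, path in paths.items() if is_bidirectional(path)]
print(flagged)  # ['both']
```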

The result includes an average total effect (ATE) per unit of treatment that accounts for both contemporaneous and lagged effects. The ATE is not computed when trends_lin=True. The estimator can also restrict to one direction of treatment change with switchers="in" or switchers="out", and test for effect heterogeneity across time-invariant covariates with predict_het. See the user guide for worked examples of these features.

Important

When the continuous option is set to a value greater than 0, the variance estimators are not backed by proven asymptotic normality. Bootstrap inference (boot=True) is recommended.

Tip

Both cont_did and did_multiplegt accept a dose/treatment variable dname, but they target fundamentally different settings.

  • cont_did assumes treatment is absorbing. Once treated, a unit stays treated. Units differ only in how much treatment they receive (e.g., different minimum wage amounts across counties). The goal is to recover a dose-response function.

  • did_multiplegt allows treatment to be non-absorbing. A unit’s treatment can change, reverse, or fluctuate over time (e.g., tax rates adjusted every year). The goal is to estimate the effect of a treatment change, accounting for dynamics and lagged effects.

As a rule of thumb, if each unit receives a fixed dose at adoption, use cont_did. If treatment values shift period to period, use did_multiplegt.
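
This rule of thumb can be checked directly on the treatment paths in a dataset. The sketch below is a hypothetical diagnostic (the helper names and toy paths are invented, and zero is assumed to mean untreated): it reports cont_did only if every unit is untreated until adoption and then holds a single constant dose.

```python
def dose_is_fixed_after_adoption(path):
    """True if the unit is untreated (0) until adoption and constant afterwards."""
    treated = [i for i, d in enumerate(path) if d != 0]
    if not treated:
        return True  # never treated
    start = treated[0]
    return all(d == path[start] for d in path[start:])


def rule_of_thumb(paths):
    """Return the estimator name suggested by the fixed-dose rule of thumb."""
    if all(dose_is_fixed_after_adoption(p) for p in paths.values()):
        return "cont_did"
    return "did_multiplegt"


fixed = {"a": [0, 0, 0.3, 0.3], "b": [0, 0.7, 0.7, 0.7], "c": [0, 0, 0, 0]}
moving = dict(fixed, d=[0.1, 0.2, 0.15, 0.3])

print(rule_of_thumb(fixed))   # cont_did
print(rule_of_thumb(moving))  # did_multiplegt
```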

Nonparametric Instrumental Variables

The npiv function estimates nonparametric structural functions using B-spline sieves and two-stage least squares, with uniform confidence bands from the weighted bootstrap. This implements the Chen, Christensen, and Kankanala (2024) methodology.

result = did.npiv(
    data=data,
    yname="food_share",
    xname="log_expenditure",
    wname="log_wages",
    j_x_segments=5,
    biters=200,
    seed=42,
)

The result contains the estimated function h, its derivative deriv, and 95% uniform confidence bands. When j_x_segments is omitted, the sieve dimension is selected automatically using the Lepski method, yielding adaptive confidence bands that are honest over a class of data-generating processes.

NPIV also serves as the estimation engine behind the nonparametric (CCK) dose-response estimator in cont_did. As a standalone tool, it is useful for Engel curve estimation, structural demand analysis, and other settings where the regressor is endogenous and the functional form is unknown.

Next steps

Each estimator has a dedicated example page that walks through a full analysis with real or simulated data.

For scaling any of these estimators to large datasets, see Distributed Estimation.