Estimator Overview#
ModernDiD provides several estimators for different research designs. All estimators share a common API pattern, so once you learn one the others follow naturally. This page provides an overview of each estimator, its key arguments, and important caveats. For detailed usage with real data, see the individual example pages: Staggered DiD, Continuous DiD, Triple DiD, Intertemporal DiD, and Sensitivity Analysis.
Choosing the right estimator#
The choice of estimator depends on the structure of your treatment variable and research question.
Many applied settings involve a binary treatment that turns on permanently at staggered times across groups. att_gt handles this staggered adoption case and is a good starting point for most analyses.
etwfe provides an alternative regression-based approach to the same staggered adoption setting. It saturates a TWFE regression with cohort-by-time interactions and extends naturally to nonlinear models (Poisson, logit, probit) where the semiparametric methods are unavailable.
When treatment intensity varies continuously across units, cont_did extends the framework to recover dose-response functions showing how effects scale with dosage.
When a policy enables treatment for a group but only a subset of units within that group is actually eligible, ddd exploits this additional within-group variation. Parental leave affecting women but not men, or minimum wage affecting hourly but not salaried workers, are canonical examples.
When treatment is not permanent and can switch on and off or change intensity over time, did_multiplegt provides valid inference by comparing units whose treatment changed to those with the same baseline treatment that have not yet changed.
When units dynamically select into treatment based on past outcomes and covariates, dyn_balancing estimates the effect of specific treatment histories using sequential covariate balancing weights that do not require propensity score estimation. This is appropriate when parallel trends is violated by dynamic treatment selection.
After running any estimator, honest_did assesses how large violations of the parallel trends assumption would need to be to overturn your conclusions.
npiv estimates nonparametric structural functions using instrumental variables and B-spline sieves. It serves as a standalone tool for Engel curve estimation and similar problems, and also powers the nonparametric dose-response estimator in cont_did.
Staggered Difference-in-Differences#
The att_gt function is the primary estimator for
staggered treatment adoption with binary, absorbing treatment. It estimates
group-time average treatment effects on the treated (ATT(g,t)) and
is the recommended starting point for most DiD analyses. This implements the
Callaway and Sant’Anna (2021)
framework.
import moderndid as did  # import alias assumed by the examples on this page

result = did.att_gt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    xformla="~ covariate",
    control_group="nevertreated",
    est_method="dr",
)
The result includes group-time ATT estimates, analytical standard errors, a variance-covariance matrix, and influence functions. A Wald pre-test for parallel trends is computed automatically from pre-treatment periods.
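To make the ATT(g,t) parameter concrete, the following is a minimal pure-Python sketch of the unconditional case for a post-treatment period, using never-treated units as the comparison group and g - 1 as the base period. The toy panel, the helper name att_gt_by_hand, and the convention that first_treated == 0 marks never-treated units are assumptions made for illustration; the sketch does not reproduce the doubly robust estimation or inference that att_gt performs.

```python
from statistics import mean

# Toy balanced panel: unit -> (first_treated, {year: outcome}).
# first_treated == 0 marks never-treated units (a convention assumed here).
panel = {
    "a": (2003, {2002: 1.0, 2003: 3.0, 2004: 4.0}),
    "b": (2003, {2002: 2.0, 2003: 4.0, 2004: 6.0}),
    "c": (0, {2002: 1.0, 2003: 1.5, 2004: 2.0}),
    "d": (0, {2002: 3.0, 2003: 3.5, 2004: 4.0}),
}


def att_gt_by_hand(panel, g, t):
    """Unconditional ATT(g, t) for t >= g with a never-treated comparison group."""
    base = g - 1  # last period before cohort g is treated
    treated = [y[t] - y[base] for first, y in panel.values() if first == g]
    control = [y[t] - y[base] for first, y in panel.values() if first == 0]
    # Difference between the mean outcome changes of the two groups.
    return mean(treated) - mean(control)


att_2003 = att_gt_by_hand(panel, g=2003, t=2003)
att_2004 = att_gt_by_hand(panel, g=2003, t=2004)
```

Each estimate is a two-group, two-period comparison, which is why a separate value exists for every (g, t) cell.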
Group-time estimates are typically aggregated into interpretable summary
parameters using aggte:
# Event study (dynamic effects relative to treatment)
event_study = did.aggte(result, type="dynamic")
# Simple weighted average across all post-treatment (g,t) cells
simple_agg = did.aggte(result, type="simple")
# Group-level averages (one ATT per cohort)
group_agg = did.aggte(result, type="group")
# Calendar-time averages (one ATT per period)
calendar_agg = did.aggte(result, type="calendar")
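As a rough illustration of what the dynamic aggregation does, the sketch below averages hypothetical ATT(g,t) cells by event time e = t - g, weighting each cohort by its size among the cohorts observed at that event time. The numbers, the cohort_size mapping, and the helper name event_study are made up for this example, and the weighting shown is a simplified reading of the aggregation rather than the exact code path inside aggte.

```python
# Hypothetical ATT(g, t) cells and cohort sizes (illustrative numbers only).
att = {
    (2003, 2003): 1.0, (2003, 2004): 2.0, (2003, 2005): 3.0,
    (2004, 2004): 2.0, (2004, 2005): 4.0,
}
cohort_size = {2003: 30, 2004: 10}


def event_study(att, cohort_size):
    """Average ATT(g, t) by event time e = t - g, weighting by cohort size."""
    by_e = {}
    for (g, t), value in att.items():
        by_e.setdefault(t - g, []).append((cohort_size[g], value))
    return {
        e: sum(n * v for n, v in cells) / sum(n for n, _ in cells)
        for e, cells in sorted(by_e.items())
    }


dynamic = event_study(att, cohort_size)
```

Only the 2003 cohort is observed two periods after treatment, so the estimate at e = 2 rests on that cohort alone.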
Clustered standard errors require boot=True. When clustervars is
specified without the bootstrap, the reported standard errors do not account
for clustering. At most two clustering variables are supported.
When a Dask or Spark DataFrame is passed as data, the estimator
automatically routes to a distributed implementation. See Distributed Estimation
for configuration details.
Extended Two-Way Fixed Effects (ETWFE)#
The etwfe function provides a regression-based
alternative to att_gt for the same staggered adoption
setting. It saturates the model with cohort-by-time interaction terms so that
each (cohort, period) cell gets its own treatment effect, avoiding the
negative weighting problem of conventional TWFE. This implements the
Wooldridge (2025) framework.
mod = did.etwfe(
    data=data,
    yname="outcome",
    tname="year",
    gname="first_treated",
    idname="unit_id",
    xformla="~ covariate",
)
The cell-level estimates are then aggregated using emfx,
which plays the same role as aggte for the Callaway and
Sant’Anna estimator.
simple = did.emfx(mod, type="simple")
event = did.emfx(mod, type="event")
group = did.emfx(mod, type="group")
cal = did.emfx(mod, type="calendar")
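The saturation idea can be pictured as one indicator per treated (cohort, period) cell. The sketch below enumerates those cells for a toy design and builds the indicator vector for a single observation. The cohort and period values and the helper name interaction_row are assumptions for illustration; the design matrix that etwfe actually builds (fixed effects, covariate interactions, reference categories) is not reproduced here.

```python
cohorts = [2003, 2004]               # first-treatment years; never-treated units have none
periods = [2002, 2003, 2004, 2005]

# One interaction term per post-treatment (cohort, period) cell.
cells = [(g, t) for g in cohorts for t in periods if t >= g]


def interaction_row(first_treated, year):
    """Indicator vector for one observation: 1 in its own treated cell, else 0."""
    return [int((first_treated, year) == cell) for cell in cells]


row_treated = interaction_row(2003, 2004)   # cohort 2003, one year after adoption
row_pre = interaction_row(2004, 2003)       # cohort 2004, before adoption
```

Because every treated cell has its own coefficient, no cell's effect is forced to equal another's, which is what removes the negative weighting of a single pooled TWFE coefficient.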
A key advantage of ETWFE over the semiparametric approach is native support
for nonlinear models. Setting family="poisson" imposes parallel trends on
the log scale, which is often more plausible for count or nonnegative
outcomes. "logit" and "probit" are also available. Heterogeneous
treatment effects by a categorical covariate can be estimated with the
xvar parameter.
mod_pois = did.etwfe(
    data=data,
    yname="count_outcome",
    tname="year",
    gname="first_treated",
    family="poisson",
)
did.emfx(mod_pois, type="event")
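To see what parallel trends on the log scale means, consider a two-group, two-period toy example with hypothetical cell means. Under log-scale parallel trends the treated group's counterfactual is obtained by applying the comparison group's growth rate, not its level change. This is a hand calculation under that assumption, not a description of how etwfe fits the Poisson model.

```python
import math

# Hypothetical cell means of a count outcome (illustrative numbers only).
treated_pre, treated_post = 10.0, 18.0
control_pre, control_post = 20.0, 24.0

# Log-scale parallel trends: the treated group's counterfactual growth rate
# equals the comparison group's growth rate.
counterfactual_post = treated_pre * (control_post / control_pre)
att_level = treated_post - counterfactual_post

# The same comparison expressed as a difference in log changes.
att_log = math.log(treated_post / treated_pre) - math.log(control_post / control_pre)
proportional_effect = math.exp(att_log) - 1

# Linear parallel trends would instead add the comparison group's level change.
linear_counterfactual = treated_pre + (control_post - control_pre)
```

The two assumptions imply different counterfactuals (12 versus 14 here), so the choice of scale matters whenever the groups start from different baseline levels.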
Triple Difference-in-Differences#
The ddd function leverages an additional
dimension of variation such as eligibility status. The API follows
the same pattern as the other estimators.
This implements the
Ortiz-Villavicencio and Sant’Anna (2025)
framework.
result = did.ddd(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    pname="eligible",  # partition/eligibility variable
    xformla="~ covariate",
    control_group="nevertreated",
    est_method="dr",
)
The triple DiD estimator adds pname to specify the partition variable
that identifies eligible units within treatment groups. All other core
arguments work the same as att_gt.
The estimator automatically detects whether the data has two periods or
multiple periods, and whether the data is a balanced panel or repeated
cross-sections. For two-period data the control_group and
base_period parameters are ignored since there is only one possible
comparison. Like att_gt, passing a Dask or Spark DataFrame automatically
routes to a distributed implementation.
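For intuition about the extra dimension of variation, the sketch below computes an unconditional two-period triple difference by hand: the DiD among eligible units minus the DiD among ineligible units. The rows and the helper name mean_change are invented for illustration. This covers only the simplest case without covariates and does not reproduce the doubly robust or multi-period machinery of ddd.

```python
from statistics import mean

# Rows: (treated_group, eligible, outcome_pre, outcome_post); illustrative numbers.
rows = [
    (1, 1, 10.0, 16.0), (1, 1, 12.0, 18.0),   # treated group, eligible
    (1, 0, 10.0, 12.0), (1, 0, 8.0, 10.0),    # treated group, ineligible
    (0, 1, 9.0, 11.0), (0, 1, 11.0, 13.0),    # comparison group, eligible
    (0, 0, 7.0, 8.0), (0, 0, 9.0, 10.0),      # comparison group, ineligible
]


def mean_change(group, eligible):
    """Mean pre-to-post outcome change within one (group, eligibility) cell."""
    return mean(post - pre for s, q, pre, post in rows if s == group and q == eligible)


did_eligible = mean_change(1, 1) - mean_change(0, 1)
did_ineligible = mean_change(1, 0) - mean_change(0, 0)
ddd_estimate = did_eligible - did_ineligible
```

The ineligible units absorb any shock that hits the whole treated group, so only the part of the change specific to eligible units in treated groups remains.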
Difference-in-Differences with Continuous Treatments#
The cont_did function handles settings with
treatment intensity rather than binary treatment. This implements the
Callaway, Goodman-Bacon, and Sant’Anna (2024)
framework.
result = did.cont_did(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    dname="dose",
    control_group="notyettreated",
    anticipation=0,
    base_period="varying",
    alp=0.05,
    boot=True,
    biters=1000,
    clustervars=["unit_id"],
    # Method-specific options
    target_parameter="level",  # level or slope
    aggregation="dose",  # dose or eventstudy
    dose_est_method="parametric",  # parametric or cck
)
All the inference options (alp, boot, biters, clustervars,
cband) work the same way across the estimators that accept them. The shared
estimation options (control_group, anticipation, base_period) also behave
identically.
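The difference between the "level" and "slope" targets in the call above can be illustrated with a toy two-period example. The sketch subtracts the untreated units' mean outcome change from each treated unit's change, then fits a straight line in dose by closed-form OLS. The data, the linear functional form, and the helper name level are assumptions for this example. Reading "level" as the effect at a given dose and "slope" as its derivative with respect to dose is an interpretation of the option names, not a statement of how cont_did computes them.

```python
from statistics import mean

# (dose, outcome change between the pre and post period); dose 0.0 means untreated.
units = [
    (0.0, 1.0), (0.0, 1.0),
    (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0),
]

# Trend of the untreated units, removed from every treated unit's change.
control_trend = mean(dy for d, dy in units if d == 0.0)
treated = [(d, dy - control_trend) for d, dy in units if d > 0.0]

# Closed-form simple OLS of the trend-adjusted change on dose.
d_bar = mean(d for d, _ in treated)
e_bar = mean(e for _, e in treated)
slope = sum((d - d_bar) * (e - e_bar) for d, e in treated) / sum(
    (d - d_bar) ** 2 for d, _ in treated
)
intercept = e_bar - slope * d_bar


def level(dose):
    """Fitted effect at a given dose under the assumed linear form."""
    return intercept + slope * dose
```

With a linear fit the slope is constant in dose; a more flexible parametric or nonparametric fit would let it vary.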
Important
The continuous treatment estimator does not yet support covariates (only
xformla="~1"), unbalanced panels, or discrete treatment values.
Two-way clustering is not supported. The CCK estimation method
(dose_est_method="cck") requires exactly two groups and two time
periods, and cannot be combined with event study aggregation.
Dynamic Covariate Balancing DiD#
The dyn_balancing function estimates treatment effects in panel data
where treatments change dynamically over time and units select into
treatment based on past outcomes and covariates. This implements the
Viviano and Bradic (2026)
framework.
from moderndid.diddynamic import dyn_balancing
result = dyn_balancing(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    treatment_name="treatment",
    ds1=[1, 1],  # always treated for 2 periods
    ds2=[0, 0],  # never treated for 2 periods
    xformla="~ covariate1 + covariate2",
    fixed_effects=["region"],
    balancing="dcb",
)
The result includes the ATE, potential outcomes under each treatment
history, analytical standard errors, and covariate imbalance diagnostics.
The ds1 and ds2 arguments specify the two treatment sequences to
compare, where the last element corresponds to the final period.
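The alignment of a treatment sequence with the panel can be sketched as follows: because the last element corresponds to the final period, a sequence of length k is compared with each unit's most recent k periods. The toy paths and the helper name units_following are assumptions for illustration. The sketch only identifies which units followed each history and does not compute balancing weights.

```python
# Observed treatment paths per unit, ordered by period (illustrative data).
paths = {
    "a": [0, 1, 1],
    "b": [1, 1, 1],
    "c": [0, 0, 0],
    "d": [1, 0, 0],
    "e": [0, 0, 1],
}


def units_following(paths, history):
    """Units whose most recent len(history) periods equal the given history."""
    k = len(history)
    return sorted(u for u, path in paths.items() if path[-k:] == list(history))


followers_ds1 = units_following(paths, [1, 1])   # treated in the last two periods
followers_ds2 = units_following(paths, [0, 0])   # untreated in the last two periods
```

Unit "e" matches neither history and therefore does not appear in either list.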
Three estimation modes are available through additional arguments.
histories_length traces out how the effect evolves with exposure
length (1 through \(h\) periods), final_periods estimates effects
at different final time points, and impulse_response=True measures
the effect of a one-period treatment shock at varying horizons.
history = dyn_balancing(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    treatment_name="treatment",
    ds1=[1, 1, 1, 1, 1],
    ds2=[0, 0, 0, 0, 0],
    histories_length=[1, 2, 3, 4, 5],
    xformla="~ covariate1 + covariate2",
)
Unlike the staggered DiD estimators, dynamic covariate balancing does not require parallel trends or staggered adoption. Instead, it relies on sequential ignorability (no unobserved confounders conditional on past observables) and a high-dimensional linear model on potential outcomes. The DCB weights are constructed through a quadratic program that does not require estimating or specifying the propensity score.
Difference-in-Differences with Intertemporal Treatment Effects#
The did_multiplegt function handles settings with
non-binary, non-absorbing (time-varying) treatments where lagged treatments
may affect outcomes. This implements the
de Chaisemartin and D’Haultfoeuille (2024)
framework.
result = did.did_multiplegt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    dname="treatment",  # treatment variable (can vary over time)
    effects=5,  # number of post-treatment periods
    placebo=3,  # number of placebo periods
    cluster="unit_id",
)
Unlike att_gt, which requires a gname (first treatment period), the
intertemporal estimator uses dname directly since treatment can change
multiple times. The estimator compares units whose treatment changes
(“switchers”) to units with the same baseline treatment that have not yet
switched. Setting effects=L produces estimates for each period of
exposure from 1 through L, and placebo=K produces K pre-treatment
placebo estimates for testing parallel trends.
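The comparison logic described above can be sketched in pure Python: find each unit's first switch, then, for a given exposure horizon, collect the units with the same baseline treatment that have not switched by the period in which that horizon is measured. The toy paths, the helper names first_switch and controls_for, and the convention that horizon 1 is the switch period itself are assumptions for illustration; this is not the library's implementation.

```python
# Treatment paths by unit, one value per period starting at period 0 (illustrative).
paths = {
    "a": [0, 0, 1, 1],
    "b": [0, 1, 1, 1],
    "c": [0, 0, 0, 0],
    "d": [1, 1, 1, 0],
    "e": [1, 1, 1, 1],
}


def first_switch(path):
    """Index of the first period whose treatment differs from the baseline, else None."""
    for period, value in enumerate(path):
        if value != path[0]:
            return period
    return None


def controls_for(unit, horizon):
    """Not-yet-switched units that share the switcher's baseline treatment."""
    switch = first_switch(paths[unit])
    if switch is None:
        raise ValueError(f"unit {unit!r} never switches")
    target = switch + horizon - 1  # horizon 1 is the switch period itself
    out = []
    for other, path in paths.items():
        other_switch = first_switch(path)
        same_baseline = path[0] == paths[unit][0]
        not_yet = other_switch is None or other_switch > target
        if other != unit and same_baseline and not_yet:
            out.append(other)
    return sorted(out)


controls_b_h1 = controls_for("b", 1)
controls_b_h2 = controls_for("b", 2)
```

The comparison set shrinks as the horizon grows, because units that switch in the meantime can no longer serve as controls.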
result = did.did_multiplegt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    dname="treatment",
    # Effect options
    effects=5,
    placebo=3,
    normalized=True,  # normalize by cumulative treatment change
    effects_equal=True,  # chi-squared test for equal effects
    # Inference options
    cluster="unit_id",
    ci_level=95.0,
    boot=True,
    biters=1000,
    # Control options
    controls=["covariate1", "covariate2"],
    trends_lin=True,  # unit-specific linear trends
)
By default, units that experience both treatment increases and decreases
(bidirectional switchers) are dropped. For these units the estimated effect
can be written as a linear combination of treatment effects in which some
weights are negative, which makes the estimates difficult to interpret
causally. Set keep_bidirectional_switchers=True to override this, but
interpret results with caution.
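One way to flag such units before estimation is sketched below. It takes "increase" and "decrease" to mean treatment values above and below the unit's first-period (baseline) treatment, which is an assumption about the definition; the library may apply a different rule, such as period-to-period changes. The helper name is_bidirectional and the toy paths are hypothetical.

```python
def is_bidirectional(path):
    """True if treatment is at some point above and at some point below its baseline."""
    baseline = path[0]
    above = any(value > baseline for value in path[1:])
    below = any(value < baseline for value in path[1:])
    return above and below


# Illustrative treatment paths, one value per period.
paths = {
    "a": [0, 1, 1, 1],   # only above baseline
    "b": [1, 2, 0, 1],   # above, then below baseline
    "c": [2, 1, 1, 0],   # only below baseline
    "d": [1, 1, 1, 1],   # never switches
}
dropped = sorted(u for u, p in paths.items() if is_bidirectional(p))
kept = sorted(u for u, p in paths.items() if not is_bidirectional(p))
```

Running a check like this first shows how much of the sample the default rule would remove.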
The result includes an average total effect (ATE) per unit of treatment
that accounts for both contemporaneous and lagged effects. The ATE is not
computed when trends_lin=True. The estimator can also restrict to
one direction of treatment change with switchers="in" or
switchers="out", and test for effect heterogeneity across
time-invariant covariates with predict_het. See the
user guide for worked examples of these features.
Important
When continuous > 0, the variance estimators are not backed by
proven asymptotic normality. Bootstrap inference (boot=True) is
recommended.
Tip
Both cont_did and did_multiplegt accept
a dose/treatment variable dname, but they target fundamentally
different settings.
cont_did assumes treatment is absorbing. Once treated, a unit stays treated. Units differ only in how much treatment they receive (e.g., different minimum wage amounts across counties). The goal is to recover a dose-response function.
did_multiplegt allows treatment to be non-absorbing. A unit’s treatment can change, reverse, or fluctuate over time (e.g., tax rates adjusted every year). The goal is to estimate the effect of a treatment change, accounting for dynamics and lagged effects.
As a rule of thumb, if each unit receives a fixed dose at adoption, use
cont_did. If treatment values shift period to period,
use did_multiplegt.
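The rule of thumb can be written as a small check on a single unit's treatment path. The helper suggest_estimator is hypothetical and not part of the package; it only encodes the heuristic stated above and ignores every other requirement of either estimator.

```python
def suggest_estimator(path):
    """Hypothetical helper: apply the rule of thumb to one unit's dose path."""
    if all(dose == 0 for dose in path):
        return "untreated"
    # Position of the first period with a nonzero dose (adoption).
    start = next(i for i, dose in enumerate(path) if dose != 0)
    # A fixed dose means every later period repeats the adoption dose.
    fixed_after_adoption = all(dose == path[start] for dose in path[start:])
    return "cont_did" if fixed_after_adoption else "did_multiplegt"


fixed_dose = suggest_estimator([0, 0, 2.5, 2.5])      # adopts once, dose stays put
shifting = suggest_estimator([0.10, 0.20, 0.15])      # dose changes every period
reversed_path = suggest_estimator([0, 1, 0])          # treatment switches off again
```

In practice the check would be applied to every unit, and a single unit with a shifting path is enough to rule out the absorbing-treatment setting.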
Nonparametric Instrumental Variables#
The npiv function estimates nonparametric structural
functions using B-spline sieves and two-stage least squares, with uniform
confidence bands from the weighted bootstrap. This implements the
Chen, Christensen, and Kankanala (2024)
methodology.
result = did.npiv(
    data=data,
    yname="food_share",
    xname="log_expenditure",
    wname="log_wages",
    j_x_segments=5,
    biters=200,
    seed=42,
)
The result contains the estimated function h, derivative deriv, and
95% uniform confidence bands. When j_x_segments is omitted, the sieve
dimension is selected automatically using the Lepski method, yielding
adaptive confidence bands that are honest over a class of data-generating
processes.
NPIV also serves as the estimation engine behind the nonparametric (CCK)
dose-response estimator in cont_did. As a standalone
tool, it is useful for Engel curve estimation, structural demand analysis,
and other settings where the regressor is endogenous and the functional
form is unknown.
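To show why an instrument is needed when the regressor is endogenous, the sketch below uses the simplest special case: a linear structural function with one instrument, where two-stage least squares reduces to a ratio of covariances. The data are constructed so that the confounder u is exactly uncorrelated with the instrument w in the sample. This is an illustration of the IV idea only; it uses no B-spline basis and does not reproduce what npiv computes.

```python
from statistics import mean

w = [-1.0, -1.0, 1.0, 1.0]   # instrument
u = [-1.0, 1.0, -1.0, 1.0]   # unobserved confounder, uncorrelated with w by construction
x = [wi + ui for wi, ui in zip(w, u)]               # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true slope on x is 2


def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    return mean((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


ols_slope = cov(y, x) / cov(x, x)   # biased upward by the confounder
iv_slope = cov(y, w) / cov(x, w)    # uses only instrument-driven variation in x
```

OLS attributes the confounder's effect to x, while the IV ratio recovers the slope used to generate the data. A sieve estimator applies the same logic with a richer basis in place of the single linear term.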
Sensitivity Analysis for Parallel Trends Violations#
The didhonest module assesses robustness to parallel
trends violations. It takes results from att_gt, or external event
study results, and produces confidence intervals that remain valid under
specified degrees of parallel trends violation. This follows the
Rambachan and Roth (2023)
framework.
from moderndid.didhonest import honest_did
# First estimate group-time effects
result = did.att_gt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
)
# Aggregate into an event study (required)
event_study = did.aggte(result, type="dynamic")
# Then conduct sensitivity analysis
sensitivity = honest_did(event_study, event_time=0, sensitivity_type="smoothness")
The input must be a dynamic event study aggregation (not group- or
calendar-level), and the event study must have influence functions computed.
Pre-treatment and post-treatment event times must be consecutive integers
with no gaps. The requested event_time must exist in the post-treatment
periods.
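The event-time requirements above can be checked before the sensitivity analysis is run. The helper check_event_study below is a hypothetical pre-flight check, not part of the package. It treats event times at or after 0 as post-treatment, which assumes no anticipation, and it does not verify that influence functions are present.

```python
def check_event_study(event_times, event_time):
    """Hypothetical check of the event-time requirements; returns True if they hold."""
    pre = sorted(e for e in event_times if e < 0)
    post = sorted(e for e in event_times if e >= 0)
    # Each block must consist of consecutive integers with no gaps.
    for label, block in (("pre-treatment", pre), ("post-treatment", post)):
        if any(b - a != 1 for a, b in zip(block, block[1:])):
            raise ValueError(f"{label} event times must be consecutive integers")
    if event_time not in post:
        raise ValueError("event_time must be one of the post-treatment periods")
    return True


ok = check_event_study([-3, -2, -1, 0, 1, 2], event_time=0)
```

A failed check points to the aggregation step, for example an event study with a dropped period.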
Next steps#
Each estimator has a dedicated example page that walks through a full analysis with real or simulated data.
Staggered Difference-in-Differences for staggered adoption with att_gt
Extended Two-Way Fixed Effects (ETWFE) for extended TWFE with etwfe
Continuous Difference-in-Differences for dose-response with cont_did
Triple Difference-in-Differences for triple differences with ddd
DiD with Intertemporal Treatment Effects for time-varying treatments with did_multiplegt
Dynamic Covariate Balancing DiD for dynamic treatments with dyn_balancing
Sensitivity Analysis for Parallel Trends for sensitivity analysis with honest_did
Nonparametric Instrumental Variables Estimation for nonparametric IV with npiv
For scaling any of these estimators to large datasets, see Distributed Estimation.