Estimator Overview
ModernDiD provides several estimators for different research designs. All estimators share a common API pattern, so once you learn one, the others follow naturally. This page gives an overview of each estimator, its key arguments, and important caveats. For detailed usage with real data, see the individual example pages: Staggered DiD, Continuous DiD, Triple DiD, Intertemporal DiD, and Sensitivity Analysis.
Choosing the right estimator
The choice of estimator depends on the structure of your treatment variable and research question.
Many applied settings involve a binary treatment that turns on permanently at staggered times across groups. att_gt handles this staggered adoption case and is a good starting point for most analyses.
When treatment intensity varies continuously across units, cont_did extends the framework to recover dose-response functions showing how effects scale with dosage.
When a policy enables treatment for a group but only a subset of units within that group is actually eligible, ddd exploits this additional within-group variation. Parental leave affecting women but not men, or minimum wage affecting hourly but not salaried workers, are canonical examples.
When treatment is not permanent and can switch on and off or change intensity over time, did_multiplegt provides valid inference by comparing units whose treatment changed to those with the same baseline treatment that have not yet changed.
After running any estimator, honest_did assesses how large violations of the parallel trends assumption would need to be to overturn your conclusions.
Staggered Difference-in-Differences
The att_gt function is the primary estimator for
staggered treatment adoption with binary, absorbing treatment. It estimates
group-time average treatment effects on the treated (ATT(g,t)) and
is the recommended starting point for most DiD analyses. This implements the
Callaway and Sant’Anna (2021)
framework.
result = did.att_gt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    xformla="~ covariate",
    control_group="nevertreated",
    est_method="dr",
)
The result includes group-time ATT estimates, analytical standard errors, a variance-covariance matrix, and influence functions. A Wald pre-test for parallel trends is computed automatically from pre-treatment periods.
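To make the group-time parameter concrete, here is a minimal sketch of the unconditional ATT(g,t) with never-treated controls and a base period of g-1: the mean outcome change for cohort g minus the mean change for never-treated units. It is an illustration only, not the library's implementation. The toy panel, the att_gt_unconditional helper, and the use of 0 to mark never-treated units are all assumptions made for this example.

```python
from statistics import mean

# Toy balanced panel: unit -> (first_treated, {year: outcome}).
# first_treated == 0 marks never-treated units (an assumption of this sketch).
panel = {
    "a": (2002, {2001: 1.0, 2002: 3.0, 2003: 4.0}),
    "b": (2002, {2001: 2.0, 2002: 4.0, 2003: 5.0}),
    "c": (0, {2001: 1.0, 2002: 1.5, 2003: 2.0}),
    "d": (0, {2001: 0.0, 2002: 0.5, 2003: 1.0}),
}


def att_gt_unconditional(panel, g, t):
    """Mean change from g-1 to t for cohort g, minus the same for never-treated."""
    base = g - 1
    treated = [y[t] - y[base] for first, y in panel.values() if first == g]
    control = [y[t] - y[base] for first, y in panel.values() if first == 0]
    return mean(treated) - mean(control)


att_2002 = att_gt_unconditional(panel, g=2002, t=2002)
att_2003 = att_gt_unconditional(panel, g=2002, t=2003)
print(att_2002, att_2003)
```

With covariates (xformla) and est_method="dr", each of the two means is replaced by a covariate-adjusted counterpart, but the comparison being made is the same.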
Group-time estimates are typically aggregated into interpretable summary
parameters using aggte:
# Event study (dynamic effects relative to treatment)
event_study = did.aggte(result, type="dynamic")
# Simple weighted average across all post-treatment (g,t) cells
simple_agg = did.aggte(result, type="simple")
# Group-level averages (one ATT per cohort)
group_agg = did.aggte(result, type="group")
# Calendar-time averages (one ATT per period)
calendar_agg = did.aggte(result, type="calendar")
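The four aggregation types differ only in which (g,t) cells they combine and how they weight them. The sketch below follows the weighting schemes described in Callaway and Sant'Anna (2021) on hypothetical numbers. Whether aggte applies exactly these weights internally is an assumption, and the att and group_size values are made up for illustration.

```python
# Hypothetical ATT(g, t) estimates keyed by (cohort, period), plus cohort sizes.
att = {
    (2002, 2002): 1.0,
    (2002, 2003): 2.0,
    (2003, 2003): 4.0,
}
group_size = {2002: 30, 2003: 10}


def weighted_mean(cells):
    """Average ATT(g, t) over the given cells, weighting by cohort size."""
    total = sum(group_size[g] for g, _ in cells)
    return sum(group_size[g] * att[(g, t)] for g, t in cells) / total


post_cells = [(g, t) for (g, t) in att if t >= g]

# "simple": one number, cohort-size weighted over all post-treatment cells.
simple = weighted_mean(post_cells)

# "dynamic": one number per event time e = t - g.
event_times = sorted({t - g for g, t in post_cells})
dynamic = {
    e: weighted_mean([(g, t) for g, t in post_cells if t - g == e])
    for e in event_times
}

# "group": one number per cohort, a plain average over its post periods.
group = {}
for g in group_size:
    values = [att[(gg, t)] for gg, t in post_cells if gg == g]
    group[g] = sum(values) / len(values)

# "calendar": one number per period, cohort-size weighted over treated cohorts.
periods = sorted({t for _, t in post_cells})
calendar = {
    t: weighted_mean([(g, tt) for g, tt in post_cells if tt == t])
    for t in periods
}

print(simple, dynamic, group, calendar)
```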
Clustered standard errors require boot=True. When clustervars is
specified without the bootstrap, the reported standard errors do not account
for clustering. At most two clustering variables are supported.
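One way to see why clustering goes through the bootstrap is a multiplier bootstrap on influence function values, where a single random multiplier is drawn per cluster and shared by its units. The sketch below is a simplified illustration with Rademacher multipliers and a plain standard deviation. The multiplier_bootstrap_se helper and the toy values are hypothetical, and the library may use a different multiplier distribution or scale estimate.

```python
import math
import random


def multiplier_bootstrap_se(influence, clusters=None, draws=2000, seed=0):
    """Bootstrap standard error from per-unit influence function values.

    If `clusters` is given, one multiplier is drawn per cluster and shared by
    all of its units, so within-cluster dependence is preserved.
    """
    rng = random.Random(seed)
    n = len(influence)
    if clusters is None:
        clusters = list(range(n))  # every unit is its own cluster
    # Sum influence values within each cluster once, up front.
    sums = {}
    for value, label in zip(influence, clusters):
        sums[label] = sums.get(label, 0.0) + value
    totals = list(sums.values())
    deviations = []
    for _ in range(draws):
        # Rademacher multipliers: +1 or -1 with equal probability.
        deviations.append(sum(rng.choice((-1.0, 1.0)) * s for s in totals) / n)
    centre = sum(deviations) / draws
    return math.sqrt(sum((d - centre) ** 2 for d in deviations) / (draws - 1))


# Toy influence values that are strongly correlated within each cluster.
influence = [1.0, 1.2, 0.8, -1.0, -0.9, -1.1, 0.5, 0.6, 0.4, -0.5, -0.6, -0.4]
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

se_unit = multiplier_bootstrap_se(influence)
se_cluster = multiplier_bootstrap_se(influence, clusters)
print(se_unit, se_cluster)
```

Because units in the same cluster move together here, the clustered standard error comes out larger than the unit-level one.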
When a Dask or Spark DataFrame is passed as data, the estimator
automatically routes to a distributed implementation. See Distributed Estimation
for configuration details.
Triple Difference-in-Differences
The ddd function leverages an additional
dimension of variation such as eligibility status. The API follows
the same pattern as the other estimators.
This implements the
Ortiz-Villavicencio and Sant’Anna (2025)
framework.
result = did.ddd(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    pname="eligible",  # partition/eligibility variable
    xformla="~ covariate",
    control_group="nevertreated",
    est_method="dr",
)
The triple DiD estimator adds pname to specify the partition variable
that identifies eligible units within treatment groups. All other core
arguments work the same as att_gt.
The estimator automatically detects whether the data has two periods or
multiple periods, and whether the data is a balanced panel or repeated
cross-sections. For two-period data the control_group and
base_period parameters are ignored since there is only one possible
comparison. Like att_gt, passing a Dask or Spark DataFrame automatically
routes to a distributed implementation.
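For intuition, the two-period case without covariates reduces to a difference of two eligible-versus-ineligible gaps. The sketch below computes that quantity by hand on toy data. The rows and the mean_change helper are hypothetical, and the library's doubly robust estimator with covariates is more involved than this.

```python
from statistics import mean

# Toy two-period data: (treated_group, eligible, outcome_pre, outcome_post).
rows = [
    (1, 1, 1.0, 5.0), (1, 1, 2.0, 6.0),  # treated group, eligible
    (1, 0, 1.0, 3.0), (1, 0, 2.0, 4.0),  # treated group, ineligible
    (0, 1, 1.0, 2.5), (0, 1, 2.0, 3.5),  # comparison group, eligible
    (0, 0, 1.0, 2.0), (0, 0, 2.0, 3.0),  # comparison group, ineligible
]


def mean_change(group, eligible):
    """Average post-minus-pre change within one (group, eligibility) cell."""
    return mean(
        post - pre for g, q, pre, post in rows if g == group and q == eligible
    )


# Eligible-minus-ineligible gap in outcome changes, within each group.
gap_treated = mean_change(1, 1) - mean_change(1, 0)
gap_comparison = mean_change(0, 1) - mean_change(0, 0)

# Triple difference: the gap in the treated group net of the gap elsewhere.
ddd_estimate = gap_treated - gap_comparison
print(ddd_estimate)
```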
Difference-in-Differences with Continuous Treatments
The cont_did function handles settings where treatment varies in
intensity across units rather than being binary. This implements the
Callaway, Goodman-Bacon, and Sant’Anna (2024)
framework.
result = did.cont_did(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
    dname="dose",
    control_group="notyettreated",
    anticipation=0,
    base_period="varying",
    alp=0.05,
    boot=True,
    biters=1000,
    clustervars=["unit_id"],
    # Method-specific options
    target_parameter="level",  # level or slope
    aggregation="dose",  # dose or eventstudy
    dose_est_method="parametric",  # parametric or cck
)
All the inference options (alp, boot, biters, clustervars,
cband) work the same way across estimators. The shared estimation
options (control_group, anticipation, base_period) also behave
identically.
Important
The continuous treatment estimator does not yet support covariates (only
xformla="~1"), unbalanced panels, or discrete treatment values.
Two-way clustering is not supported. The CCK estimation method
(dose_est_method="cck") requires exactly two groups and two time
periods, and cannot be combined with event study aggregation.
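As a rough picture of what a dose-response estimate looks like, the sketch below takes two-period outcome changes, nets out the untreated trend, and fits a straight line in the dose. The fitted value at a dose corresponds to a "level" effect and the fitted derivative to a "slope" effect. The linear fit is a stand-in chosen for brevity, not the basis the library uses for dose_est_method="parametric", and the data and att_level helper are hypothetical. Reading these quantities causally relies on the identifying assumptions in Callaway, Goodman-Bacon, and Sant'Anna (2024).

```python
from statistics import mean

# Toy two-period data: (dose, outcome_change). A dose of 0 marks untreated units.
units = [
    (0.0, 1.0), (0.0, 1.2), (0.0, 0.8),
    (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0),
]

untreated_change = mean(dy for d, dy in units if d == 0.0)
doses = [d for d, _ in units if d > 0.0]
# Outcome change for treated units, net of the untreated trend.
net_change = [dy - untreated_change for d, dy in units if d > 0.0]

# Closed-form least squares for net_change = intercept + slope * dose.
d_bar, y_bar = mean(doses), mean(net_change)
slope = (
    sum((d - d_bar) * (y - y_bar) for d, y in zip(doses, net_change))
    / sum((d - d_bar) ** 2 for d in doses)
)
intercept = y_bar - slope * d_bar


def att_level(dose):
    """Fitted level effect at a given dose."""
    return intercept + slope * dose


# With a linear fit, the slope effect is the same at every dose.
print(att_level(2.5), slope)
```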
Difference-in-Differences with Intertemporal Treatment Effects
The did_multiplegt function handles settings with
non-binary, non-absorbing (time-varying) treatments where lagged treatments
may affect outcomes. This implements the
de Chaisemartin and D’Haultfoeuille (2024)
framework.
result = did.did_multiplegt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    dname="treatment",  # treatment variable (can vary over time)
    effects=5,  # number of post-treatment periods
    placebo=3,  # number of placebo periods
    cluster="unit_id",
)
Unlike att_gt, which requires a gname (first treatment period), the
intertemporal estimator uses dname directly since treatment can change
multiple times. The estimator compares units whose treatment changes
(“switchers”) to units with the same baseline treatment that have not yet
switched. Setting effects=L produces estimates for each period of
exposure from 1 through L, and placebo=K produces K pre-treatment
placebo estimates for testing parallel trends.
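The switcher comparison can be sketched by hand on a toy panel. For each switcher, the outcome change from the period before its first switch to a given length of exposure is compared with the average change over the same window among units that share its baseline treatment and have not yet switched. This is a simplified illustration: all switchers here increase treatment, contrasts are equally weighted, and no normalization is applied. The panel, first_switch, and effect names are hypothetical, not library functions.

```python
from statistics import mean

# Toy panel: unit -> (treatment path, outcome path), periods indexed 0..3.
panel = {
    "a": ([0, 1, 1, 1], [1.0, 3.0, 4.0, 5.0]),  # switches at period 1
    "b": ([0, 0, 1, 1], [2.0, 2.5, 5.0, 6.5]),  # switches at period 2
    "c": ([0, 0, 0, 0], [1.0, 1.5, 2.0, 2.5]),  # never switches
    "d": ([0, 0, 0, 0], [3.0, 3.5, 4.0, 4.5]),  # never switches
}
n_periods = 4


def first_switch(treatment):
    """First period whose treatment differs from the baseline, else None."""
    for period, value in enumerate(treatment):
        if value != treatment[0]:
            return period
    return None


def effect(panel, exposure):
    """Average switcher outcome change after `exposure` periods, net of the
    change among not-yet-switched units with the same baseline treatment."""
    contrasts = []
    for treatment, outcome in panel.values():
        start = first_switch(treatment)
        if start is None:
            continue  # not a switcher
        end = start - 1 + exposure
        if end >= n_periods:
            continue  # this exposure length is not observed for this unit
        controls = [
            y[end] - y[start - 1]
            for d, y in panel.values()
            if d[0] == treatment[0]
            and (first_switch(d) is None or first_switch(d) > end)
        ]
        if controls:
            contrasts.append(outcome[end] - outcome[start - 1] - mean(controls))
    return mean(contrasts)


effect_1 = effect(panel, exposure=1)
effect_2 = effect(panel, exposure=2)
print(effect_1, effect_2)
```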
result = did.did_multiplegt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    dname="treatment",
    # Effect options
    effects=5,
    placebo=3,
    normalized=True,  # normalize by cumulative treatment change
    effects_equal=True,  # chi-squared test for equal effects
    # Inference options
    cluster="unit_id",
    ci_level=95.0,
    boot=True,
    biters=1000,
    # Control options
    controls=["covariate1", "covariate2"],
    trends_lin=True,  # unit-specific linear trends
)
By default, units that experience both treatment increases and decreases
(bidirectional switchers) are dropped. For these units the estimated effect
can be a linear combination of underlying treatment effects in which some
weights are negative, which makes the estimates difficult to interpret
causally. Set keep_bidirectional_switchers=True to override this, but
interpret the results with caution.
The result includes an average total effect (ATE) per unit of treatment
that accounts for both contemporaneous and lagged effects. The ATE is not
computed when trends_lin=True. The estimator can also restrict to
one direction of treatment change with switchers="in" or
switchers="out", and test for effect heterogeneity across
time-invariant covariates with predict_het. See the
user guide for worked examples of these features.
Important
When continuous > 0, the variance estimators are not backed by
proven asymptotic normality. Bootstrap inference (boot=True) is
recommended.
Tip
Both cont_did and did_multiplegt accept
a dose/treatment variable dname, but they target fundamentally
different settings.
cont_did assumes treatment is absorbing. Once treated, a unit stays treated. Units differ only in how much treatment they receive (e.g., different minimum wage amounts across counties). The goal is to recover a dose-response function.
did_multiplegt allows treatment to be non-absorbing. A unit’s treatment can change, reverse, or fluctuate over time (e.g., tax rates adjusted every year). The goal is to estimate the effect of a treatment change, accounting for dynamics and lagged effects.
As a rule of thumb, if each unit receives a fixed dose at adoption, use
cont_did. If treatment values shift period to period,
use did_multiplegt.
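The rule of thumb can be checked mechanically before choosing between the two. The helper below is purely illustrative and not part of ModernDiD. It assumes a dose of 0 means untreated and only distinguishes these two dname-based estimators.

```python
def is_fixed_dose(path):
    """True if the path is zeros followed by one constant non-zero dose."""
    nonzero = [i for i, dose in enumerate(path) if dose != 0]
    if not nonzero:
        return True  # never treated
    start = nonzero[0]
    return all(dose == path[start] for dose in path[start:])


def suggest_estimator(treatment_paths):
    """Suggest "cont_did" when every unit keeps a fixed dose after adoption,
    otherwise "did_multiplegt"."""
    if all(is_fixed_dose(path) for path in treatment_paths.values()):
        return "cont_did"
    return "did_multiplegt"


absorbing = {"a": [0, 0, 2.5, 2.5], "b": [0, 1.0, 1.0, 1.0], "c": [0, 0, 0, 0]}
fluctuating = {"a": [1.0, 1.5, 1.2, 2.0], "b": [0, 1, 0, 1]}

print(suggest_estimator(absorbing))
print(suggest_estimator(fluctuating))
```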
Sensitivity Analysis for Parallel Trends Violations
The didhonest module assesses robustness to parallel
trends violations. It takes results from att_gt, or external event
study results, and produces confidence intervals that remain valid under
specified degrees of parallel trends violation. This follows the
Rambachan and Roth (2023)
framework.
import moderndid as did
from moderndid.didhonest import honest_did

# First estimate group-time effects
result = did.att_gt(
    data=data,
    yname="outcome",
    tname="year",
    idname="unit_id",
    gname="first_treated",
)

# Aggregate into an event study (required)
event_study = did.aggte(result, type="dynamic")

# Then conduct sensitivity analysis
sensitivity = honest_did(event_study, event_time=0, sensitivity_type="smoothness")
The input must be a dynamic event study aggregation (not group- or
calendar-level), and the event study must have influence functions computed.
Pre-treatment and post-treatment event times must be consecutive integers
with no gaps. The requested event_time must exist in the post-treatment
periods.
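The event-time requirements above can be checked before calling honest_did. The check_event_times helper below is a hypothetical pre-flight check, not a library function. It assumes negative event times are pre-treatment and non-negative ones are post-treatment.

```python
def check_event_times(event_times, event_time):
    """Raise ValueError if the event times violate the stated requirements."""
    pre = sorted(e for e in event_times if e < 0)
    post = sorted(e for e in event_times if e >= 0)
    for label, times in (("pre-treatment", pre), ("post-treatment", post)):
        # Consecutive integers differ by exactly one.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if any(gap != 1 for gap in gaps):
            raise ValueError(f"{label} event times have gaps: {times}")
    if event_time not in post:
        raise ValueError(f"event_time={event_time} is not a post-treatment period")
    return pre, post


pre, post = check_event_times([-3, -2, -1, 0, 1, 2], event_time=0)
print(pre, post)
```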
Next steps
Each estimator has a dedicated example page that walks through a full analysis with real or simulated data.
Staggered Difference-in-Differences for staggered adoption with att_gt
Continuous Difference-in-Differences for dose-response with cont_did
Triple Difference-in-Differences for triple differences with ddd
DiD with Intertemporal Treatment Effects for time-varying treatments with did_multiplegt
Sensitivity Analysis for Parallel Trends for sensitivity analysis with honest_did
For scaling any of these estimators to large datasets, see Distributed Estimation.