ModernDiD Quickstart#
ModernDiD estimates causal effects using difference-in-differences methods.
The att_gt function computes group-time average treatment
effects for staggered adoption designs. Other estimators handle continuous
treatments (cont_did), triple differences
(ddd), intertemporal treatments
(did_multiplegt), and sensitivity analysis
(didhonest). Most estimators support both panel data and
repeated cross-sections.
All estimators share a consistent API built around core parameters that describe your data structure. For an introduction to difference-in-differences terminology and setup, see Introduction to Difference-in-Differences. For theoretical details on each estimator, see the Background section.
Dataframe Agnostic#
All causal inference studies begin with data, and in most cases ModernDiD can work with yours directly. Any DataFrame that implements the Arrow PyCapsule Interface can be passed directly to any estimator without manual conversion. This includes polars, pandas, pyarrow Table, duckdb, cudf, and other Arrow-compatible DataFrames.
import moderndid as did
# Works with pandas
import pandas as pd
df_pandas = pd.read_csv("data.csv")
result = did.att_gt(data=df_pandas, ...)
# Works with polars
import polars as pl
df_polars = pl.read_csv("data.csv")
result = did.att_gt(data=df_polars, ...)
# Works with pyarrow
import pyarrow.parquet as pq
table = pq.read_table("data.parquet")
result = did.att_gt(data=table, ...)
# Works with duckdb
import duckdb
df_duck = duckdb.query("SELECT * FROM 'data.parquet'").arrow()
result = did.att_gt(data=df_duck, ...)
Conversion happens automatically via narwhals, with all internal operations using Polars for performance. This means you get the speed benefits of Polars regardless of your input format.
Core Arguments#
Every estimator requires three variables that identify the panel structure.
ynameThe outcome variable you want to measure treatment effects on (such as employment, earnings, or test scores).
tnameThe time period variable that indexes when observations occur (such as year, quarter, or month).
idnameThe unit identifier that tracks the same unit across time periods (such as state, county, or individual ID).
The treatment variable differs by estimator. Most estimators use gname
to indicate when each unit was first treated (units never treated should
have a value of 0). The intertemporal estimator uses dname instead,
since treatment can change multiple times over the panel.
Options like xformla for covariates, est_method for estimation
approach, and boot for bootstrap inference work similarly across
all estimators. Once you learn one, the others follow the same patterns.
Basic Usage#
Estimation in ModernDiD is typically a two-step process. First, compute group-time
effects, then aggregate them into the summary you need. We will focus on the att_gt function
for now, but the same pattern generally applies to all estimators.
The att_gt function
estimates a separate average treatment effect for every combination of
treatment cohort (group) and calendar period. A treatment cohort is the set
of units that were first treated at the same time. The result is a full
matrix of effects that captures how treatment impacts evolve for each cohort
over time.
import moderndid as did
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
)
The group-time matrix is typically too detailed to
interpret directly. The aggte function collapses it
into one of four summary forms, each answering a different question.
# Event study: average effect at each time relative to treatment
event_study = did.aggte(result, type="dynamic")
# Simple: single weighted average across all groups and periods
overall = did.aggte(result, type="simple")
# Group: one average per treatment cohort
by_group = did.aggte(result, type="group")
# Calendar: one average per calendar period
by_time = did.aggte(result, type="calendar")
The "dynamic" aggregation is the most common starting point because the
resulting event study plot lets you visually assess pre-trends and trace how
treatment effects build or fade over time.
Estimation Options#
The att_gt function provides several options for customizing the
estimation procedure. The defaults reflect best practices, but you may
want to adjust them based on your data and research design.
Estimation Methods#
The est_method parameter controls how group-time effects are
estimated. The default is "dr" for doubly robust, but you can also
use inverse probability weighting or outcome regression.
# Doubly robust (default) - combines IPW and regression
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
est_method="dr",
)
# Inverse probability weighting
result = did.att_gt(..., est_method="ipw")
# Outcome regression
result = did.att_gt(..., est_method="reg")
Doubly robust estimation is recommended because it provides consistent estimates if either the propensity score model or the outcome regression model is correctly specified.
Covariates#
When treated and comparison groups differ on observable characteristics,
include covariates using the xformla parameter. This strengthens the
parallel trends assumption by conditioning on these differences.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
xformla="~ covariate1 + covariate2",
)
The formula uses R-style syntax. Covariates should be time-invariant or measured before treatment to avoid conditioning on post-treatment variables.
Control Group#
By default, att_gt uses never-treated units as the comparison group.
You can instead use not-yet-treated units, which includes units that will
be treated in the future but have not yet been treated at time t.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
control_group="notyettreated",
)
Using not-yet-treated units can increase precision when there are few never-treated units, but requires assuming that treatment timing is unrelated to potential outcomes.
Bootstrap Inference#
By default, att_gt uses the multiplier bootstrap to compute standard
errors and simultaneous confidence bands.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
boot=True, # Use bootstrap (default is False)
biters=1000, # Number of bootstrap iterations (default)
cband=True, # Compute uniform confidence bands (default)
alp=0.05, # Significance level (default)
random_state=42, # For reproducibility
)
Setting boot=False (the default) uses analytical standard errors instead
of bootstrap, which is faster but does not support clustering or uniform
confidence bands. Setting cband=False computes pointwise confidence
intervals instead of simultaneous bands.
Clustering#
Standard errors can be clustered at one or two levels using the
clustervars parameter. Clustering requires using the bootstrap.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
clustervars=["unit_id"],
boot=True,
)
You can cluster on at most two variables, and one of them must be the
unit identifier (idname).
Anticipation#
If units can anticipate treatment before it actually occurs, account for
this using the anticipation parameter. This shifts the base period
backward.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
anticipation=1, # Allow 1 period of anticipation
)
With anticipation=1, the base period for group g becomes period
g-2 instead of g-1, treating period g-1 as potentially affected
by anticipation of the upcoming treatment.
Base Period#
The base_period parameter controls which pre-treatment period is used
as the comparison. The default "varying" uses the period immediately
before treatment (or before anticipation). Setting base_period="universal"
uses the same base period for all groups.
# Varying base period (default)
result = did.att_gt(..., base_period="varying")
# Universal base period
result = did.att_gt(..., base_period="universal")
A universal base period can be useful when you want all pre-treatment estimates to be relative to the same reference point.
Panel Data Options#
By default, att_gt expects a balanced panel where all units are
observed in all time periods. If your panel is unbalanced, set
allow_unbalanced_panel=True.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
allow_unbalanced_panel=True,
)
For repeated cross-sections rather than panel data, set panel=False
and omit the idname parameter.
Sampling Weights#
If your data has sampling weights, specify them using weightsname.
result = did.att_gt(
data=data,
yname="outcome",
tname="year",
idname="unit_id",
gname="first_treated",
weightsname="weight_column",
)
Visualization#
ModernDiD provides built-in plotting functions that return
plotnine ggplot objects.
# Plot group-time effects
did.plot_gt(result)
# Plot event study
did.plot_event_study(event_study)
Because the return value is a standard ggplot object, you can customize
it with any plotnine layer using the + operator.
from plotnine import labs, theme, element_text
plot = did.plot_event_study(event_study, ref_period=-1)
plot = (
plot
+ labs(
title="Effect of Minimum Wage on Employment",
x="Years Relative to Treatment",
y="ATT (log points)",
)
+ theme(plot_title=element_text(size=14, weight="bold"))
)
plot.save("event_study.png", dpi=200, width=8, height=5)
ModernDiD also ships with ready-made themes for common use cases.
from moderndid.plots import theme_publication
did.plot_event_study(event_study) + theme_publication()
Next steps#
This page covered the core workflow for staggered DiD with att_gt.
From here you can explore several directions depending on your research
design.
Panel Data Utilities shows how to diagnose and clean messy panel data before estimation.
Estimator Overview surveys all available estimators, including continuous treatment, triple differences, intertemporal DiD, and sensitivity analysis.
Distributed Estimation explains how to scale estimation to datasets that do not fit on one machine using Dask.
The Examples section walks through each estimator end-to-end with real and simulated data.