moderndid.ddd_mp_rc

moderndid.ddd_mp_rc(data, y_col, time_col, id_col, group_col, partition_col, covariate_cols=None, control_group='nevertreated', base_period='universal', est_method='dr', boot=False, biters=1000, cband=False, cluster=None, alpha=0.05, trim_level=0.995, random_state=None, n_jobs=1)

Compute the multi-period doubly robust DDD estimator for the ATT with repeated cross-section data.

Implements the multi-period triple difference-in-differences estimator from [1] for repeated cross-section data with staggered treatment adoption. Unlike in panel data, units are not followed over time: a different sample is observed in each period.

The target parameters are the group-time average treatment effects

\[ATT(g, t) = \mathbb{E}[Y_t(g) - Y_t(\infty) \mid S=g, Q=1]\]

for all treatment cohorts \(g \in \mathcal{G}_{\mathrm{trt}}\) and time periods \(t \in \{2, \ldots, T\}\) such that \(t \geq g\).

For each \((g, t)\) cell, the estimator compares outcomes at time \(t\) to a base period. With base_period="universal", all comparisons use period \(g-1\) (the last pre-treatment period for cohort \(g\)). With base_period="varying", each comparison uses period \(t-1\).

For repeated cross-sections, the estimator follows the approach of [2], extending the DDD framework from [1]. With panel data, outcomes are differenced within units. With repeated cross-sections (RCS) this is not possible, so the estimator instead fits separate outcome regression models for the target period \(t\) and the base period for each subgroup.
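To make the repeated cross-section logic concrete, the sketch below computes a covariate-free triple difference from cell means, where the rows in the target period and the base period are different samples. It is a simplification for intuition only, not the library's doubly robust estimator: it assumes a single treated cohort compared with never-treated rows, and the function name `ddd_rcs_means` and the toy data are hypothetical.

```python
import numpy as np

def ddd_rcs_means(y, period, treated, eligible, t, base):
    """Unconditional triple difference built from repeated cross-section cell means."""
    def change(s, q):
        # Mean outcome at t minus mean outcome at the base period for one
        # (cohort, partition) cell; the two means come from different rows.
        cell = (treated == s) & (eligible == q)
        return y[cell & (period == t)].mean() - y[cell & (period == base)].mean()

    did_eligible = change(1, 1) - change(0, 1)    # DiD among eligible rows
    did_ineligible = change(1, 0) - change(0, 0)  # DiD among ineligible rows
    return float(did_eligible - did_ineligible)

# Toy data: each (period, cohort, partition) cell holds 3 distinct rows.
cells = [(p, s, q) for p in (1, 2) for s in (0, 1) for q in (0, 1)]
period, treated, eligible = (np.repeat(col, 3) for col in zip(*cells))
post = (period == 2).astype(float)

# Outcome with cohort, partition, and time effects plus a true effect of 2.0
# for treated, eligible rows in the post period.
y = (1.0 + 0.5 * treated + 0.3 * eligible + 0.7 * post
     + 0.2 * treated * post + 0.4 * eligible * post
     + 2.0 * treated * eligible * post)

ddd = ddd_rcs_means(y, period, treated, eligible, t=2, base=1)
```

With covariates, each cell mean would be replaced by a fitted outcome regression (and, for the doubly robust variant, propensity score weights), which is where the separate per-period models come in.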

When multiple comparison groups are available (not-yet-treated setting), the estimator combines them using optimal GMM weights (Equation 4.11 from [1])

\[\widehat{w}_{\mathrm{gmm}}^{g,t} = \frac{\widehat{\Omega}_{g,t}^{-1} \mathbf{1}} {\mathbf{1}' \widehat{\Omega}_{g,t}^{-1} \mathbf{1}}\]

where \(\widehat{\Omega}_{g,t}\) is the covariance matrix of \(\widehat{ATT}_{\mathrm{dr},g_c}(g,t)\) across comparison groups. The GMM estimator (Equation 4.12 from [1]) is then

\[\widehat{ATT}_{\mathrm{dr,gmm}}(g,t) = \frac{\mathbf{1}' \widehat{\Omega}_{g,t}^{-1}} {\mathbf{1}' \widehat{\Omega}_{g,t}^{-1} \mathbf{1}} \widehat{ATT}_{\mathrm{dr}}(g,t).\]
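The two formulas above can be sketched numerically as follows. This is a minimal NumPy illustration of the weight and estimator formulas, not the library's internal code; the function name `gmm_combine` and the example numbers are hypothetical.

```python
import numpy as np

def gmm_combine(att_by_comparison, omega):
    """Combine comparison-group-specific estimates with the weights w = Ω⁻¹1 / (1'Ω⁻¹1)."""
    att = np.asarray(att_by_comparison, dtype=float)
    omega = np.asarray(omega, dtype=float)
    ones = np.ones(att.shape[0])

    # Solve Ω x = 1 rather than forming the inverse explicitly.
    omega_inv_ones = np.linalg.solve(omega, ones)
    weights = omega_inv_ones / (ones @ omega_inv_ones)

    # Ω is symmetric, so w' ATT equals (1'Ω⁻¹ / 1'Ω⁻¹1) ATT.
    att_gmm = float(weights @ att)
    return weights, att_gmm

# Two comparison groups with made-up estimates; the second is noisier.
weights, att_gmm = gmm_combine([1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]])
```

In this example the weights are 0.8 and 0.2, so the combined estimate leans toward the more precisely estimated comparison. The weights always sum to one by construction.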

Parameters:

data (DataFrame): Repeated cross-section data in long format with columns for outcome, time, observation id, treatment group, and partition.
y_col (str): Name of the outcome variable column.
time_col (str): Name of the time period column.
id_col (str): Name of the observation identifier column. For RCS, this can be a row index since units are not tracked across periods.
group_col (str): Name of the treatment group column (the first period in which treatment is enabled). Use 0 or np.inf for never-treated units.
partition_col (str): Name of the partition/eligibility column (1 = eligible, 0 = ineligible).
covariate_cols (list of str or None, default None): Names of covariate columns in the data. If None, uses intercept only.
control_group ({"nevertreated", "notyettreated"}, default "nevertreated"): Which units to use as controls. With "notyettreated", multiple comparison groups may be available, triggering GMM aggregation.
base_period ({"universal", "varying"}, default "universal"): Base period selection. "universal" uses period g-1 as baseline for all comparisons; "varying" uses period t-1 for each t.
est_method ({"dr", "reg", "ipw"}, default "dr"): Estimation method for each two-period comparison.
boot (bool, default False): Whether to use multiplier bootstrap for inference.
biters (int, default 1000): Number of bootstrap repetitions (only used if boot=True).
cband (bool, default False): Whether to compute uniform confidence bands (only used if boot=True).
cluster (str or None, default None): Name of the column containing cluster identifiers for clustered standard errors. If provided, the bootstrap resamples at the cluster level (only used if boot=True).
alpha (float, default 0.05): Significance level for confidence intervals.
trim_level (float, default 0.995): Trimming level for propensity scores.
random_state (int, Generator, or None, default None): Controls random number generation for bootstrap reproducibility.
n_jobs (int, default 1): Number of parallel jobs for group-time estimation. 1 = sequential (default), -1 = all cores, >1 = that many workers.
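A call might look like the sketch below. The synthetic data generator, its column names, and the helper names `make_rcs_columns` and `estimate` are hypothetical; only the `ddd_mp_rc` keyword arguments come from the signature above. The sketch assumes a pandas DataFrame is an accepted input, and it imports pandas and moderndid inside `estimate` so that the data-generation part runs on its own.

```python
import numpy as np

def make_rcs_columns(n_per_period=200, periods=(1, 2, 3, 4), seed=0):
    """Simulate long-format repeated cross-section columns (fresh rows each period)."""
    rng = np.random.default_rng(seed)
    n = n_per_period * len(periods)
    time = np.repeat(np.array(periods), n_per_period)
    group = rng.choice([0, 3], size=n)        # 0 = never treated, 3 = first treated in period 3
    partition = rng.integers(0, 2, size=n)    # 1 = eligible, 0 = ineligible
    x1 = rng.normal(size=n)
    exposed = (group == 3) & (time >= 3) & (partition == 1)
    y = 1.0 + 0.5 * x1 + 0.3 * time + 2.0 * exposed + rng.normal(size=n)
    return {
        "id": np.arange(n),                   # row index; units are not tracked over time
        "time": time,
        "group": group,
        "partition": partition,
        "x1": x1,
        "y": y,
    }

def estimate(columns):
    """Run the estimator on the simulated columns (requires pandas and moderndid)."""
    import pandas as pd
    import moderndid

    data = pd.DataFrame(columns)
    return moderndid.ddd_mp_rc(
        data,
        y_col="y",
        time_col="time",
        id_col="id",
        group_col="group",
        partition_col="partition",
        covariate_cols=["x1"],
        control_group="nevertreated",
        base_period="universal",
        est_method="dr",
    )

columns = make_rcs_columns()
```

Calling `estimate(columns)` would then return the result object described under Returns. Passing `boot=True` together with `random_state` should make bootstrap-based inference reproducible, per the parameter descriptions above.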

Returns:

DDDMultiPeriodRCResult

A NamedTuple containing:

att: Array of ATT(g,t) point estimates
se: Standard errors for each ATT(g,t)
uci, lci: Upper and lower confidence interval bounds
groups: Treatment cohort for each estimate
times: Time period for each estimate
glist, tlist: Unique cohorts and periods
inf_func_mat: Influence function matrix (n_obs x k)
n: Number of observations
args: Estimation arguments
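One way to read the per-estimate fields is to zip them into rows, one per (g, t) cell. The sketch below uses a hypothetical stand-in namedtuple carrying a subset of the documented field names, with made-up numbers, so it runs without the library; it assumes `att`, `se`, `lci`, `uci`, `groups`, and `times` are aligned element by element.

```python
from collections import namedtuple

# Hypothetical stand-in for the returned result; field names follow the list above.
Result = namedtuple("Result", ["att", "se", "lci", "uci", "groups", "times"])

def tabulate(result):
    """Return one dict per ATT(g, t) estimate."""
    rows = []
    for g, t, att, se, lo, hi in zip(
        result.groups, result.times, result.att, result.se, result.lci, result.uci
    ):
        rows.append({
            "group": g,
            "time": t,
            "att": att,
            "se": se,
            "lci": lo,
            "uci": hi,
            "post": t >= g,                       # cell is at or after treatment start
            "excludes_zero": lo > 0 or hi < 0,    # interval does not cover zero
        })
    return rows

# Made-up values for illustration only.
demo = Result(
    att=[0.1, 2.0],
    se=[0.2, 0.3],
    lci=[-0.29, 1.41],
    uci=[0.49, 2.59],
    groups=[3, 3],
    times=[2, 3],
)
rows = tabulate(demo)
```

The same loop should work on an actual result, since NamedTuple fields are accessed by attribute in the same way.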