moderndid.ddd_mp
- moderndid.ddd_mp(data, y_col, time_col, id_col, group_col, partition_col, covariate_cols=None, control_group='nevertreated', base_period='universal', est_method='dr', boot=False, biters=1000, cband=False, cluster=None, alpha=0.05, random_state=None, n_jobs=1)
Compute the multi-period doubly robust DDD estimator for the ATT with panel data.
Implements the multi-period triple difference-in-differences estimator from [1]. The target parameters are the group-time average treatment effects
\[ATT(g, t) = \mathbb{E}[Y_t(g) - Y_t(\infty) \mid S=g, Q=1]\]
for all treatment cohorts \(g \in \mathcal{G}_{trt}\) and time periods \(t \in \{2, \ldots, T\}\) such that \(t \geq g\).
For each (g,t) cell with comparison group \(g_{\mathrm{c}}\), the doubly robust estimator (Equation 4.8 from [1]) is
\[\begin{split}\widehat{ATT}_{\mathrm{dr},g_{\mathrm{c}}}(g,t) &= \mathbb{E}_n\left[ \left(\widehat{w}_{\mathrm{trt}}^{S=g,Q=1}(S,Q) - \widehat{w}_{g,0}^{S=g,Q=1}(S,Q,X)\right) \left(Y_t - Y_{g-1} - \widehat{m}_{Y_t-Y_{g-1}}^{S=g,Q=0}(X)\right)\right] \\ &+ \mathbb{E}_n\left[ \left(\widehat{w}_{\mathrm{trt}}^{S=g,Q=1}(S,Q) - \widehat{w}_{g_{\mathrm{c}},1}^{S=g,Q=1}(S,Q,X)\right) \left(Y_t - Y_{g-1} - \widehat{m}_{Y_t-Y_{g-1}}^{S=g_{\mathrm{c}},Q=1}(X)\right)\right] \\ &- \mathbb{E}_n\left[ \left(\widehat{w}_{\mathrm{trt}}^{S=g,Q=1}(S,Q) - \widehat{w}_{g_{\mathrm{c}},0}^{S=g,Q=1}(S,Q,X)\right) \left(Y_t - Y_{g-1} - \widehat{m}_{Y_t-Y_{g-1}}^{S=g_{\mathrm{c}},Q=0}(X)\right)\right].\end{split}\]
When multiple comparison groups are available (not-yet-treated setting), the estimator combines them using optimal GMM weights (Equation 4.11 from [1]):
\[\widehat{w}_{gmm}^{g,t} = \frac{\widehat{\Omega}_{g,t}^{-1} \mathbf{1}} {\mathbf{1}' \widehat{\Omega}_{g,t}^{-1} \mathbf{1}}\]
where \(\widehat{\Omega}_{g,t}\) is the covariance matrix of \(\widehat{ATT}_{dr,g_c}(g,t)\) across comparison groups. The GMM estimator (Equation 4.12 from [1]) is then
\[\widehat{ATT}_{dr,gmm}(g,t) = \frac{\mathbf{1}' \widehat{\Omega}_{g,t}^{-1}} {\mathbf{1}' \widehat{\Omega}_{g,t}^{-1} \mathbf{1}} \widehat{ATT}_{dr}(g,t).\]
- Parameters:
- data
DataFrame Panel data in long format with columns for outcome, time, unit id, treatment group, and partition.
- y_col
str Name of the outcome variable column.
- time_col
str Name of the time period column.
- id_col
str Name of the unit identifier column.
- group_col
str Name of the treatment group column (the first period in which treatment is enabled). Use 0 or np.inf for never-treated units.
- partition_col
str Name of the partition/eligibility column (1 = eligible, 0 = ineligible).
- covariate_cols
list of str or None, default None Names of covariate columns in the data. If None, uses intercept only.
- control_group
{“nevertreated”, “notyettreated”}, default “nevertreated” Which units to use as controls. With “notyettreated”, multiple comparison groups may be available, triggering GMM aggregation.
- base_period
{“universal”, “varying”}, default “universal” Base period selection. “universal” uses period g-1 as baseline for all comparisons; “varying” uses period t-1 for each t.
- est_method
{“dr”, “reg”, “ipw”}, default “dr” Estimation method for each 2-period comparison.
- boot
bool, default False Whether to use multiplier bootstrap for inference.
- biters
int, default 1000 Number of bootstrap repetitions (only used if boot=True).
- cband
bool, default False Whether to compute uniform confidence bands (only used if boot=True).
- cluster
str or None, default None Name of the column containing cluster identifiers for clustered standard errors. If provided, the bootstrap resamples at the cluster level (only used if boot=True).
- alpha
float, default 0.05 Significance level for confidence intervals.
- random_state
int, Generator, or None, default None Controls random number generation for bootstrap reproducibility.
- n_jobs
int, default 1 Number of parallel jobs for group-time estimation. 1 = sequential (default), -1 = all cores, >1 = that many workers.
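The expected long-format layout can be illustrated with a small synthetic panel. This is only a sketch: the column names (`id`, `time`, `group`, `partition`, `x1`, `y`), the cohort values, and the data-generating process are invented for illustration, and it assumes a pandas DataFrame is an accepted input type. The commented call at the end simply mirrors the signature documented above and is not executed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_units, periods = 200, [1, 2, 3, 4]

# One row per unit with time-invariant attributes (all names are illustrative).
units = pd.DataFrame({
    "id": np.arange(n_units),
    "group": rng.choice([0, 3, 4], size=n_units),    # 0 marks never-treated units
    "partition": rng.integers(0, 2, size=n_units),   # 1 = eligible, 0 = ineligible
    "x1": rng.normal(size=n_units),
})

# Expand to long format: one row per (unit, period).
panel = units.merge(pd.DataFrame({"time": periods}), how="cross")

# A unit is effectively treated once its cohort's treatment is enabled AND it is eligible.
treated = (
    (panel["group"] > 0)
    & (panel["time"] >= panel["group"])
    & (panel["partition"] == 1)
)
panel["y"] = (
    panel["x1"] + 0.5 * panel["time"] + 2.0 * treated
    + rng.normal(size=len(panel))
)

# Hypothetical call using the documented signature (not run here):
# result = moderndid.ddd_mp(
#     panel, y_col="y", time_col="time", id_col="id",
#     group_col="group", partition_col="partition",
#     covariate_cols=["x1"], control_group="notyettreated",
# )
```

Each unit appears once per period, and `group` and `partition` are constant within a unit, which is what a balanced panel in long format looks like.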
- Returns:
DDDMultiPeriodResult A NamedTuple containing:
att: Array of ATT(g,t) point estimates
se: Standard errors for each ATT(g,t)
uci, lci: Upper and lower confidence interval bounds
groups: Treatment cohort for each estimate
times: Time period for each estimate
glist, tlist: Unique cohorts and periods
inf_func_mat: Influence function matrix (n x k)
n: Number of units
args: Estimation arguments
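As a rough guide to how `att`, `se`, `lci`, and `uci` relate, the sketch below builds pointwise normal-approximation intervals from point estimates and standard errors. The helper `pointwise_ci` and the example numbers are hypothetical, and the sketch assumes the analytical (non-bootstrap) case; uniform bands requested with `cband=True` use a different critical value and are not covered here.

```python
from statistics import NormalDist

import numpy as np


def pointwise_ci(att, se, alpha=0.05):
    """Return (lci, uci) for pointwise two-sided normal-approximation intervals."""
    att = np.asarray(att, dtype=float)
    se = np.asarray(se, dtype=float)
    # Two-sided critical value of the standard normal at level alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return att - z * se, att + z * se


# Illustrative values standing in for result.att and result.se.
att = np.array([0.8, 1.2])
se = np.array([0.1, 0.2])
lci, uci = pointwise_ci(att, se, alpha=0.05)
```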
See also
ddd_panel Two-period DDD estimator for panel data.
Notes
The influence functions are rescaled by \(n / n_{g,t}\) where \(n_{g,t}\) is the number of units in each (g,t) cell, following the approach in [1].
The standard errors are computed from the influence function matrix as
\[\widehat{V} = \frac{1}{n} \widehat{\Psi}' \widehat{\Psi}, \quad \widehat{se}_{g,t} = \sqrt{\widehat{V}_{g,t,g,t} / n}\]
where \(\widehat{\Psi}\) is the \(n \times k\) matrix of influence functions. For cells with GMM aggregation, the standard error formula from Equation 4.12 is used instead.
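The two formulas in this docstring can be checked numerically with NumPy. The sketch below is not the library's implementation: `se_from_influence` and `gmm_weights` are hypothetical helper names, the influence-function matrix is random noise, and the covariance matrix is made up. It only shows the standard error formula above and the optimal GMM weights of Equation 4.11 applied to a 2 x 2 example.

```python
import numpy as np


def se_from_influence(psi):
    """Standard errors from an (n x k) influence-function matrix.

    Computes V = Psi' Psi / n and returns sqrt(diag(V) / n).
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    v = psi.T @ psi / n
    return np.sqrt(np.diag(v) / n)


def gmm_weights(omega):
    """Optimal weights Omega^{-1} 1 / (1' Omega^{-1} 1) for a covariance matrix."""
    omega = np.asarray(omega, dtype=float)
    ones = np.ones(omega.shape[0])
    # Solve Omega x = 1 rather than forming the inverse explicitly.
    omega_inv_ones = np.linalg.solve(omega, ones)
    return omega_inv_ones / (ones @ omega_inv_ones)


# Illustrative influence functions for n = 500 units and k = 3 (g,t) cells.
rng = np.random.default_rng(0)
psi = rng.normal(size=(500, 3))
se = se_from_influence(psi)

# Illustrative covariance of two comparison-group-specific estimates.
omega = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
w = gmm_weights(omega)                      # array([0.25, 0.75])
att_by_comparison = np.array([1.0, 1.4])
att_gmm = w @ att_by_comparison             # 0.25 * 1.0 + 0.75 * 1.4 = 1.3
```

The weights sum to one and put more mass on the less noisy comparison group, which is the point of the GMM aggregation step.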
References