moderndid.gen_ddd_scalable
- moderndid.gen_ddd_scalable(n: int, dgp_type: int = 1, n_periods: int = 10, n_cohorts: int = 8, n_covariates: int = 20, att_base: float = 10.0, panel: bool = True, random_state=None) -> dict
Generate configurable staggered DDD data for stress-testing.
- Parameters:
- n
int Number of units (panel) or observations per period (repeated cross-section).
- dgp_type
{1, 2, 3, 4}, default=1 Controls the nuisance function specification:
1: Both propensity score and outcome regression use Z (both correct)
2: Propensity score uses X, outcome regression uses Z (OR correct)
3: Propensity score uses Z, outcome regression uses X (PS correct)
4: Both use X (both misspecified when estimating with Z)
- n_periods
int, default=10 Total number of time periods T (labeled 1..T). Must be >= 2.
- n_cohorts
int, default=8 Number of treated cohorts (excludes never-treated g=0). Must be >= 1 and < n_periods. Cohorts adopt treatment at times 2, 3, …, n_cohorts+1.
- n_covariates
int, default=20 Total number of covariates. Must be >= 4. The first 4 receive a nonlinear transform via _transform_covariates; the rest are raw standard normal draws.
- att_base
float, default=10.0 Base treatment effect. Cohort g at period t >= g gets att_base * g * (t - g + 1) * partition.
- panel
bool, default=True If True, generate panel data. If False, generate repeated cross-section data with disjoint units per period.
- random_state
int, Generator, or None, default=None Controls randomness for reproducibility.
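The constraints and the effect formula stated in the parameter descriptions can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `check_ddd_params` and `treatment_effect` are hypothetical helper names, and the sketch assumes that `partition` is a 0/1 indicator and that the effect is zero before adoption (t < g) and for the never-treated cohort g=0.

```python
def check_ddd_params(n_periods=10, n_cohorts=8, n_covariates=20):
    """Hypothetical helper: raise ValueError if a documented constraint fails."""
    if n_periods < 2:
        raise ValueError("n_periods must be >= 2")
    if not 1 <= n_cohorts < n_periods:
        raise ValueError("n_cohorts must be >= 1 and < n_periods")
    if n_covariates < 4:
        raise ValueError("n_covariates must be >= 4")


def treatment_effect(att_base, g, t, partition):
    """Hypothetical helper: effect for cohort g at period t.

    Uses the documented formula att_base * g * (t - g + 1) * partition
    for t >= g. Returning zero for g == 0 or t < g is an assumption.
    """
    if g == 0 or t < g:
        return 0.0
    return att_base * g * (t - g + 1) * partition


check_ddd_params()                     # the defaults satisfy every constraint
treatment_effect(10.0, 2, 4, 1)        # -> 60.0 (10 * 2 * 3 * 1)
```

Under this formula the effect grows with both the cohort label g and the number of periods since adoption, so later cohorts and longer exposure produce larger effects.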
- Returns:
dict Dictionary containing:
data: pl.DataFrame in long format with columns [id, group, partition, time, y, cov1..covK, cluster]
data_wide: pl.DataFrame in wide format (panel with n_periods <= 20 only)
att_config: dict mapping each treated cohort g to att_base * g
cohort_values: list of all cohort values [0, 2, 3, …, n_cohorts+1]
n_periods: number of periods
n_covariates: number of covariates
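As a rough guide to what the `cohort_values` and `att_config` entries should contain, the descriptions above can be reproduced without calling the library. The helper name `expected_metadata` is hypothetical, and the sketch assumes `att_config` is keyed by the integer cohort label.

```python
def expected_metadata(n_cohorts=8, att_base=10.0):
    """Hypothetical helper: rebuild cohort_values and att_config from the docs."""
    # Never-treated cohort 0, then treated cohorts adopting at 2, ..., n_cohorts + 1.
    cohort_values = [0] + list(range(2, n_cohorts + 2))
    # Each treated cohort g maps to att_base * g (assumed keyed by g itself).
    att_config = {g: att_base * g for g in cohort_values if g != 0}
    return cohort_values, att_config


cohort_values, att_config = expected_metadata(n_cohorts=3, att_base=10.0)
print(cohort_values)  # [0, 2, 3, 4]
print(att_config)     # {2: 20.0, 3: 30.0, 4: 40.0}
```

If the descriptions above hold, the `cohort_values` entry returned by `gen_ddd_scalable` with the same `n_cohorts` should match the first list; this is worth checking against the actual return value rather than taking on trust.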