moderndid.gen_ddd_scalable

moderndid.gen_ddd_scalable(n: int, dgp_type: int = 1, n_periods: int = 10, n_cohorts: int = 8, n_covariates: int = 20, att_base: float = 10.0, panel: bool = True, random_state=None) -> dict

Generate configurable staggered DDD data for stress-testing.

Parameters:
n : int

Number of units (panel) or observations per period (repeated cross-section).

dgp_type : {1, 2, 3, 4}, default=1

Controls nuisance function specification:

  • 1: Both propensity score and outcome regression use Z (both correct)

  • 2: Propensity score uses X, outcome regression uses Z (OR correct)

  • 3: Propensity score uses Z, outcome regression uses X (PS correct)

  • 4: Both use X (both misspecified when estimating with Z)

n_periods : int, default=10

Total number of time periods, labeled 1, …, n_periods. Must be >= 2.

n_cohorts : int, default=8

Number of treated cohorts (excludes never-treated g=0). Must be >= 1 and < n_periods. Cohorts adopt treatment at times 2, 3, …, n_cohorts+1.

n_covariates : int, default=20

Total number of covariates. Must be >= 4. The first 4 receive a nonlinear transform via _transform_covariates; the rest are raw standard normal draws.

att_base : float, default=10.0

Base treatment effect. Cohort g at period t >= g gets att_base * g * (t - g + 1) * partition.

panel : bool, default=True

If True, generate panel data. If False, generate repeated cross-section data with disjoint units per period.

random_state : int, Generator, or None, default=None

Controls randomness for reproducibility.
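
The documented constraints and the effect formula for att_base can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: `check_design` and `effect` are hypothetical helper names, `partition` is taken to be a 0/1 indicator, and returning zero for never-treated units (g = 0) and for pre-treatment periods (t < g) is an assumption, since the description above only covers t >= g.

```python
def check_design(n_periods: int, n_cohorts: int, n_covariates: int) -> None:
    """Raise ValueError if the documented parameter constraints are violated."""
    if n_periods < 2:
        raise ValueError("n_periods must be >= 2")
    if not 1 <= n_cohorts < n_periods:
        raise ValueError("n_cohorts must be >= 1 and < n_periods")
    if n_covariates < 4:
        raise ValueError("n_covariates must be >= 4")


def effect(att_base: float, g: int, t: int, partition: int) -> float:
    """Effect for cohort g at period t: att_base * g * (t - g + 1) * partition."""
    if g == 0 or t < g:
        # Assumption: no effect for never-treated units or before adoption.
        return 0.0
    return att_base * g * (t - g + 1) * partition


check_design(n_periods=10, n_cohorts=8, n_covariates=20)  # defaults are valid
print(effect(10.0, g=2, t=2, partition=1))  # 20.0, first treated period
print(effect(10.0, g=3, t=5, partition=1))  # 90.0, third treated period
print(effect(10.0, g=3, t=5, partition=0))  # 0.0, partition switched off
```

Under this reading the effect grows linearly with exposure time (t - g + 1) and is larger for cohorts that adopt later.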

Returns:
dict

Dictionary containing:

  • data: pl.DataFrame in long format with columns [id, group, partition, time, y, cov1..covK, cluster]

  • data_wide: pl.DataFrame in wide format (provided only for panel data with n_periods <= 20)

  • att_config: dict mapping each treated cohort g to att_base * g

  • cohort_values: list of all cohort values [0, 2, 3, …, n_cohorts+1]

  • n_periods: number of periods

  • n_covariates: number of covariates
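
The shape of the returned metadata can be reproduced from the descriptions above without calling the library. The following is a minimal sketch under stated assumptions: `cohort_values`, `att_config`, and `long_columns` are hypothetical helper functions named after the corresponding dictionary keys, K in `cov1..covK` is assumed to equal n_covariates, and the column order simply follows the listing above.

```python
def cohort_values(n_cohorts: int) -> list[int]:
    """Never-treated cohort 0 followed by adoption times 2..n_cohorts+1."""
    return [0] + list(range(2, n_cohorts + 2))


def att_config(att_base: float, n_cohorts: int) -> dict[int, float]:
    """Map each treated cohort g to att_base * g."""
    return {g: att_base * g for g in cohort_values(n_cohorts) if g != 0}


def long_columns(n_covariates: int) -> list[str]:
    """Long-format column names, assuming K == n_covariates."""
    covs = [f"cov{k}" for k in range(1, n_covariates + 1)]
    return ["id", "group", "partition", "time", "y", *covs, "cluster"]


print(cohort_values(8))          # [0, 2, 3, 4, 5, 6, 7, 8, 9]
print(att_config(10.0, 8)[9])    # 90.0
print(len(long_columns(20)))     # 26
```

With the default arguments, these values can be compared against the `cohort_values` and `att_config` entries and the columns of `data` in the returned dictionary as a quick sanity check.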