moderndid.gen_ddd_2periods
- moderndid.gen_ddd_2periods(n, dgp_type, panel=True, random_state=None) → dict
Generate synthetic data for two-period triple-difference (DDD) estimation.
Units fall into four subgroups defined by treatment status (state) and eligibility (partition):
Subgroup 4: Treated AND Eligible (state=1, partition=1)
Subgroup 3: Treated BUT Ineligible (state=1, partition=0)
Subgroup 2: Eligible BUT Untreated (state=0, partition=1)
Subgroup 1: Untreated AND Ineligible (state=0, partition=0)
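The subgroup numbering above follows a simple arithmetic pattern in the two indicators. The sketch below shows that pattern; `ddd_subgroup` is a hypothetical helper written for illustration and is not part of moderndid.

```python
def ddd_subgroup(state: int, partition: int) -> int:
    """Map (state, partition) indicators to the subgroup label 1-4 listed above.

    Hypothetical helper for illustration only; not a moderndid function.
    """
    if state not in (0, 1) or partition not in (0, 1):
        raise ValueError("state and partition must each be 0 or 1")
    # state contributes 2, partition contributes 1, and labels start at 1.
    return 1 + partition + 2 * state


# Enumerate all four combinations of the two indicators.
labels = {(s, p): ddd_subgroup(s, p) for s in (0, 1) for p in (0, 1)}
print(labels)  # {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
```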
- Parameters:
- n : int, default=5000
Number of units to simulate. For panel data, this is the total number of units, each observed in both periods. For repeated cross-section data, this is the number of observations per period.
- dgp_type : {1, 2, 3, 4}, default=1
Controls nuisance function specification:
1: Both propensity score and outcome regression use Z (both correct)
2: Propensity score uses X, outcome regression uses Z (OR correct)
3: Propensity score uses Z, outcome regression uses X (PS correct)
4: Both use X (both misspecified when estimating with Z)
- panel : bool, default=True
If True, generate panel data where each unit is observed in both periods. If False, generate repeated cross-section data where different units are sampled in each period.
- random_state : int, Generator, or None, default=None
Controls randomness for reproducibility.
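The four `dgp_type` settings can be read as a small lookup table. The sketch below transcribes the list in the `dgp_type` description and assumes, as item 4 states, that the estimator is fit with Z, so a nuisance model counts as correctly specified exactly when the data-generating process also uses Z. The names `NUISANCE_INPUTS` and `specification_status` are hypothetical and not part of moderndid.

```python
# Covariate set used by each nuisance function in the data-generating process,
# keyed by dgp_type: (propensity score, outcome regression).
# Transcribed from the dgp_type description; illustrative only.
NUISANCE_INPUTS = {
    1: ("Z", "Z"),
    2: ("X", "Z"),
    3: ("Z", "X"),
    4: ("X", "X"),
}


def specification_status(dgp_type: int) -> dict:
    """Report which nuisance models are correct when estimating with Z."""
    ps_input, or_input = NUISANCE_INPUTS[dgp_type]
    return {"ps_correct": ps_input == "Z", "or_correct": or_input == "Z"}


for dgp_type in sorted(NUISANCE_INPUTS):
    print(dgp_type, specification_status(dgp_type))
```

Settings 2 and 3 are the cases in which exactly one of the two nuisance models is correct.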
- Returns:
dict
Dictionary containing:
data: pl.DataFrame in long format with columns [id, state, partition, time, y, cov1, cov2, cov3, cov4, cluster]
true_att: True ATT (always 0)
oracle_att: Oracle ATT from potential outcomes
efficiency_bound: Theoretical efficiency bound
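A minimal usage sketch, assuming moderndid is installed and the function behaves as documented above. The helper `missing_fields` and the wrapper `demo` are hypothetical names introduced here. The argument values (`n=1000`, `random_state=42`) are arbitrary, and `demo` is defined but not called, so the block can be loaded without the library present.

```python
# Keys and columns taken from the Returns description above.
EXPECTED_KEYS = {"data", "true_att", "oracle_att", "efficiency_bound"}
EXPECTED_COLUMNS = [
    "id", "state", "partition", "time", "y",
    "cov1", "cov2", "cov3", "cov4", "cluster",
]


def missing_fields(result):
    """Return (missing_keys, missing_columns) relative to the documented layout."""
    missing_keys = sorted(EXPECTED_KEYS - set(result))
    # pl.DataFrame exposes its column names through the `columns` attribute.
    columns = list(result["data"].columns) if "data" in result else []
    missing_columns = [c for c in EXPECTED_COLUMNS if c not in columns]
    return missing_keys, missing_columns


def demo():
    # Imported lazily so that defining this function does not require moderndid.
    from moderndid import gen_ddd_2periods

    result = gen_ddd_2periods(n=1000, dgp_type=1, panel=True, random_state=42)
    print(missing_fields(result))
    print(result["true_att"], result["oracle_att"], result["efficiency_bound"])
```

Calling `demo()` in an environment with moderndid available prints any documented keys or columns that are absent, followed by the three scalar entries of the returned dictionary.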