moderndid.gen_ddd_2periods

moderndid.gen_ddd_2periods(n, dgp_type, panel=True, random_state=None) → dict

Generate synthetic data for 2-period DDD estimation.

Four subgroups are created based on treatment and partition status:

Subgroup 4: Treated AND Eligible (state=1, partition=1)
Subgroup 3: Treated BUT Ineligible (state=1, partition=0)
Subgroup 2: Eligible BUT Untreated (state=0, partition=1)
Subgroup 1: Untreated AND Ineligible (state=0, partition=0)
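
The four-way coding above follows directly from the two indicators. A minimal sketch, assuming `state` and `partition` are 0/1 integers; the helper name `subgroup_label` is hypothetical and not part of moderndid:

```python
def subgroup_label(state: int, partition: int) -> int:
    """Map 0/1 treatment and eligibility indicators to subgroups 1-4.

    Hypothetical helper for illustration; not part of moderndid.
    """
    if state not in (0, 1) or partition not in (0, 1):
        raise ValueError("state and partition must each be 0 or 1")
    # state contributes 2, partition contributes 1, and labels start at 1.
    return 1 + 2 * state + partition


# Print the mapping in the same order as the list above.
for state in (1, 0):
    for partition in (1, 0):
        print(f"state={state}, partition={partition} -> subgroup {subgroup_label(state, partition)}")
```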

Parameters:

n : int, default=5000

Number of units to simulate. For panel data, this is the total number of units observed in both periods. For repeated cross-section data, this is the number of observations per period.

dgp_type : {1, 2, 3, 4}, default=1

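Controls which covariates drive each nuisance function, and therefore which model is correctly specified when estimating with Z:

1: Both propensity score and outcome regression use Z (both correctly specified)
2: Propensity score uses X, outcome regression uses Z (only the outcome regression is correctly specified)
3: Propensity score uses Z, outcome regression uses X (only the propensity score is correctly specified)
4: Both use X (both misspecified when estimating with Z)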
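
The four settings can be read as a lookup of which nuisance model is correctly specified when the estimator is given Z. This is a sketch only; `DGP_SPEC` and `correctly_specified` are illustrative names, not moderndid objects:

```python
# dgp_type -> (covariates behind the propensity score,
#              covariates behind the outcome regression)
# Illustrative table transcribed from the list above; not a moderndid object.
DGP_SPEC = {
    1: ("Z", "Z"),
    2: ("X", "Z"),
    3: ("Z", "X"),
    4: ("X", "X"),
}


def correctly_specified(dgp_type: int) -> dict:
    """Report which nuisance models are correct when estimating with Z."""
    ps_cov, or_cov = DGP_SPEC[dgp_type]
    return {
        "propensity_score": ps_cov == "Z",
        "outcome_regression": or_cov == "Z",
    }


for dgp_type in sorted(DGP_SPEC):
    print(dgp_type, correctly_specified(dgp_type))
```

Settings 1 to 3 are the cases in which at least one nuisance model is correct, which is the condition under which doubly robust estimators are generally expected to remain consistent.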

panel : bool, default=True

If True, generate panel data where each unit is observed in both periods. If False, generate repeated cross-section data where different units are sampled in each period.

random_state : int, Generator, or None, default=None

Seed or random number generator used for the random draws. Pass an int to make the output reproducible.

Returns:

dict

Dictionary containing:

data: pl.DataFrame in long format with columns [id, state, partition, time, y, cov1, cov2, cov3, cov4, cluster]
true_att: True ATT (always 0)
oracle_att: Oracle ATT from potential outcomes
efficiency_bound: Theoretical efficiency bound
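
A short usage sketch based on the signature and return description above. The helper names `missing_keys` and `simulate_and_check` are hypothetical, and the argument values (`n=500`, `seed=42`) are arbitrary choices for illustration. The moderndid import is deferred into the function so the key check can be used on its own:

```python
# Keys listed in the "Returns" description above.
EXPECTED_KEYS = ("data", "true_att", "oracle_att", "efficiency_bound")


def missing_keys(result: dict) -> list:
    """Return the documented keys that are absent from `result`."""
    return [key for key in EXPECTED_KEYS if key not in result]


def simulate_and_check(n: int = 500, dgp_type: int = 1, seed: int = 42) -> dict:
    """Generate one panel dataset and confirm the documented keys exist.

    Hypothetical wrapper for illustration; requires moderndid to be installed.
    """
    # Imported lazily so missing_keys() works without moderndid installed.
    from moderndid import gen_ddd_2periods

    result = gen_ddd_2periods(n, dgp_type, panel=True, random_state=seed)
    absent = missing_keys(result)
    if absent:
        raise KeyError(f"missing keys: {absent}")
    return result
```

After `result = simulate_and_check()`, `result["data"]` holds the long-format frame and `result["true_att"]` should equal 0, per the description above.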