moderndid.gen_ddd_2periods

moderndid.gen_ddd_2periods(n, dgp_type, panel=True, random_state=None) -> dict

Generate synthetic data for 2-period DDD estimation.

Four subgroups are created based on treatment and partition status:

  • Subgroup 4: Treated AND Eligible (state=1, partition=1)

  • Subgroup 3: Treated BUT Ineligible (state=1, partition=0)

  • Subgroup 2: Eligible BUT Untreated (state=0, partition=1)

  • Subgroup 1: Untreated AND Ineligible (state=0, partition=0)
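The subgroup labels above follow directly from the two indicators. The sketch below is illustrative only: the helper name `subgroup` is hypothetical and not part of moderndid. It shows one arithmetic rule that reproduces the listed mapping.

```python
def subgroup(state: int, partition: int) -> int:
    """Map (state, partition) indicators to the subgroup label 1-4.

    Hypothetical helper for illustration; not a moderndid function.
    """
    if state not in (0, 1) or partition not in (0, 1):
        raise ValueError("state and partition must each be 0 or 1")
    # state contributes 2, partition contributes 1, labels start at 1
    return 1 + 2 * state + partition


# Print the mapping in the same order as the list above
for s in (1, 0):
    for p in (1, 0):
        print(f"state={s}, partition={p} -> subgroup {subgroup(s, p)}")
```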

Parameters:
n : int, default=5000

Number of units to simulate. For panel data, this is the total number of units observed in both periods. For repeated cross-section data, this is the number of observations per period.

dgp_type : {1, 2, 3, 4}, default=1

Controls how the nuisance functions (propensity score and outcome regression) are specified:

  • 1: Both propensity score and outcome regression use Z (both correct)

  • 2: Propensity score uses X, outcome regression uses Z (OR correct)

  • 3: Propensity score uses Z, outcome regression uses X (PS correct)

  • 4: Both use X (both misspecified when estimating with Z)
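The four settings can be summarized as a lookup of which nuisance model is correctly specified when estimation uses Z. This is a minimal sketch based only on the list above. The names `DGP_SPEC` and `at_least_one_correct` are hypothetical and not part of moderndid. The helper flags the settings in which at least one nuisance model is correct, which is the condition doubly robust estimators generally rely on.

```python
# Hypothetical lookup built from the dgp_type descriptions above
DGP_SPEC = {
    1: {"ps_correct": True, "or_correct": True},    # both use Z
    2: {"ps_correct": False, "or_correct": True},   # PS uses X, OR uses Z
    3: {"ps_correct": True, "or_correct": False},   # PS uses Z, OR uses X
    4: {"ps_correct": False, "or_correct": False},  # both use X
}


def at_least_one_correct(dgp_type: int) -> bool:
    """Return True if the propensity score or the outcome regression is correct."""
    spec = DGP_SPEC[dgp_type]
    return spec["ps_correct"] or spec["or_correct"]


for k in sorted(DGP_SPEC):
    print(k, DGP_SPEC[k], "at least one correct:", at_least_one_correct(k))
```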

panel : bool, default=True

If True, generate panel data where each unit is observed in both periods. If False, generate repeated cross-section data where different units are sampled in each period.

random_state : int, Generator, or None, default=None

Controls randomness for reproducibility.

Returns:
dict

Dictionary containing:

  • data: pl.DataFrame in long format with columns [id, state, partition, time, y, cov1, cov2, cov3, cov4, cluster]

  • true_att: True ATT (always 0)

  • oracle_att: Oracle ATT from potential outcomes

  • efficiency_bound: Theoretical efficiency bound