moderndid.core.panel.get_group

moderndid.core.panel.get_group(data: Any, idname: str, tname: str, treatname: str, treat_period: int | None = None) -> Any

Extract treatment-group timing into a "G" column.

Staggered difference-in-differences estimators like att_gt require a group variable (gname) that records the first period each unit receives treatment. Many real-world datasets instead contain a binary treatment indicator that switches from 0 to 1 when treatment begins. This function converts that indicator into the group-timing variable "G" expected by the estimator. For each treated unit, G equals the first period where the treatment indicator is positive. For never-treated units, G is 0.
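The first-switch rule described above can be sketched in plain Python. This is only an illustration of the rule, not moderndid's implementation: the helper name first_switch_group, the list-of-dicts layout, and the column names "unit", "year", and "d" are all assumptions made for the example.

```python
def first_switch_group(rows, idname, tname, treatname):
    """Hypothetical helper: add G = first period with a positive indicator, else 0."""
    first_treated = {}
    for row in rows:
        if row[treatname] > 0:
            unit = row[idname]
            # Keep the earliest treated period seen so far for this unit
            first_treated[unit] = min(first_treated.get(unit, row[tname]), row[tname])
    # Units absent from first_treated were never treated and receive G = 0
    return [{**row, "G": first_treated.get(row[idname], 0)} for row in rows]


# Toy panel (assumed data): unit 1 switches on in 2001, unit 2 is never treated
panel = [
    {"unit": 1, "year": 2000, "d": 0},
    {"unit": 1, "year": 2001, "d": 1},
    {"unit": 1, "year": 2002, "d": 1},
    {"unit": 2, "year": 2000, "d": 0},
    {"unit": 2, "year": 2001, "d": 0},
    {"unit": 2, "year": 2002, "d": 0},
]
with_g = first_switch_group(panel, "unit", "year", "d")
g_by_unit = {row["unit"]: row["G"] for row in with_g}
```

Here g_by_unit maps unit 1 to 2001 and unit 2 to 0, and G is constant across all rows of a unit.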

When the treatment indicator is static (e.g., a region dummy that equals 1 in every period for treated units), the first-switch logic would incorrectly assign G to the earliest observed period. In this case, pass treat_period to directly specify the known treatment onset: any unit with a positive value of treatname in any period receives G = treat_period, and all others receive G = 0.
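A rough sketch of the treat_period rule, again in plain Python and under assumed names (static_group, "unit", "year", "d" are hypothetical and not part of the library), shows why the override is needed. With a static indicator, the earliest treated period is simply the first observed period, so the onset has to be supplied by the caller.

```python
def static_group(rows, idname, treatname, treat_period):
    """Hypothetical helper: G = treat_period for ever-treated units, else 0."""
    # A unit counts as treated if the indicator is positive in any period
    ever_treated = {row[idname] for row in rows if row[treatname] > 0}
    return [
        {**row, "G": treat_period if row[idname] in ever_treated else 0}
        for row in rows
    ]


# Toy panel (assumed data): unit 1 carries d = 1 in every period, unit 2 never does
static_panel = [
    {"unit": 1, "year": 2000, "d": 1},
    {"unit": 1, "year": 2001, "d": 1},
    {"unit": 1, "year": 2002, "d": 1},
    {"unit": 2, "year": 2000, "d": 0},
    {"unit": 2, "year": 2001, "d": 0},
    {"unit": 2, "year": 2002, "d": 0},
]

# Earliest period with a positive indicator for unit 1: the first observed year
naive_onset = min(r["year"] for r in static_panel if r["unit"] == 1 and r["d"] > 0)

# Supplying the known onset (2001 is an assumed value for this toy panel)
static_with_g = static_group(static_panel, "unit", "d", treat_period=2001)
static_g_by_unit = {row["unit"]: row["G"] for row in static_with_g}
```

In this toy panel naive_onset is 2000, the first observed year, while static_g_by_unit assigns 2001 to unit 1 and 0 to unit 2.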

Parameters:
data : DataFrame

Panel data. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.

idname : str

Unit identifier column.

tname : str

Time period column.

treatname : str

Binary treatment indicator column.

treat_period : int or None

Known treatment onset period. Defaults to None, in which case G is detected from the first switch of treatname. When provided, units with any positive value of treatname are assigned G = treat_period and all others receive G = 0, bypassing the first-switch detection logic. Useful for static treatment indicators that do not switch on at a specific time.

Returns:
DataFrame

Original columns plus "G", in the same format as data.

See also

att_gt

Estimate group-time average treatment effects.

diagnose_panel

Check whether treatment varies within units.

Examples

When the treatment indicator switches on at a specific period, the default behaviour detects the first switch automatically:

In [1]: from moderndid import get_group, load_favara_imbs
   ...: 
   ...: df = load_favara_imbs()
   ...: df = get_group(df, idname="county", tname="year", treatname="inter_bra")
   ...: df.select("county", "year", "inter_bra", "G").head(10)
   ...: 
Out[1]: 
shape: (10, 4)
┌────────┬──────┬───────────┬──────┐
│ county ┆ year ┆ inter_bra ┆ G    │
│ ---    ┆ ---  ┆ ---       ┆ ---  │
│ i64    ┆ i64  ┆ i64       ┆ i64  │
╞════════╪══════╪═══════════╪══════╡
│ 1001   ┆ 1994 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1995 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1996 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1997 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1998 ┆ 1         ┆ 1998 │
│ 1001   ┆ 1999 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2000 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2001 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2002 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2003 ┆ 1         ┆ 1998 │
└────────┴──────┴───────────┴──────┘

When the treatment indicator is static (e.g., a region dummy), pass treat_period to specify the known onset:

In [2]: from moderndid import get_group, load_cai2016
   ...: 
   ...: df = load_cai2016()
   ...: df = get_group(df, idname="hhno", tname="year",
   ...:                treatname="treatment", treat_period=2003)
   ...: df.select("hhno", "year", "treatment", "G").head(10)
   ...: 
Out[2]: 
shape: (10, 4)
┌──────┬──────┬───────────┬──────┐
│ hhno ┆ year ┆ treatment ┆ G    │
│ ---  ┆ ---  ┆ ---       ┆ ---  │
│ i64  ┆ i64  ┆ i64       ┆ i64  │
╞══════╪══════╪═══════════╪══════╡
│ 1    ┆ 2000 ┆ 1         ┆ 2003 │
│ 1    ┆ 2001 ┆ 1         ┆ 2003 │
│ 1    ┆ 2002 ┆ 1         ┆ 2003 │
│ 1    ┆ 2003 ┆ 1         ┆ 2003 │
│ 1    ┆ 2004 ┆ 1         ┆ 2003 │
│ 1    ┆ 2005 ┆ 1         ┆ 2003 │
│ 1    ┆ 2006 ┆ 1         ┆ 2003 │
│ 1    ┆ 2007 ┆ 1         ┆ 2003 │
│ 1    ┆ 2008 ┆ 1         ┆ 2003 │
│ 2    ┆ 2000 ┆ 1         ┆ 2003 │
└──────┴──────┴───────────┴──────┘