moderndid.core.panel.get_group
- moderndid.core.panel.get_group(data: Any, idname: str, tname: str, treatname: str, treat_period: int | None = None) -> Any
Extract treatment-group timing into a "G" column.
Staggered difference-in-differences estimators like att_gt require a group variable (gname) that records the first period each unit receives treatment. Many real-world datasets instead contain a binary treatment indicator that switches from 0 to 1 when treatment begins. This function converts that indicator into the group-timing variable "G" expected by the estimator. For each treated unit, G equals the first period where the treatment indicator is positive. For never-treated units, G is 0.
When the treatment indicator is static (e.g., a region dummy that equals 1 in every period for treated units), the first-switch logic would incorrectly assign G to the earliest observed period. In this case, pass treat_period to directly specify the known treatment onset: any unit with a positive value of treatname in any period receives G = treat_period, and all others receive G = 0.
- Parameters:
- data
DataFrame Panel data. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.
- idname
str Unit identifier column.
- tname
str Time period column.
- treatname
str Binary treatment indicator column.
- treat_period
int or None Known treatment onset period. When provided, units with any positive value of treatname are assigned G = treat_period and all others receive G = 0, bypassing the first-switch detection logic. Useful for static treatment indicators that do not switch on at a specific time.
- Returns:
DataFrame Original columns plus "G", in the same format as data.
See also
att_gt: Estimate group-time average treatment effects.
diagnose_panel: Check whether treatment varies within units.
Examples
When the treatment indicator switches on at a specific period, the default behaviour detects the first switch automatically:
In [1]: from moderndid import get_group, load_favara_imbs
   ...:
   ...: df = load_favara_imbs()
   ...: df = get_group(df, idname="county", tname="year", treatname="inter_bra")
   ...: df.select("county", "year", "inter_bra", "G").head(10)
   ...:
Out[1]:
shape: (10, 4)
┌────────┬──────┬───────────┬──────┐
│ county ┆ year ┆ inter_bra ┆ G    │
│ ---    ┆ ---  ┆ ---       ┆ ---  │
│ i64    ┆ i64  ┆ i64       ┆ i64  │
╞════════╪══════╪═══════════╪══════╡
│ 1001   ┆ 1994 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1995 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1996 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1997 ┆ 0         ┆ 1998 │
│ 1001   ┆ 1998 ┆ 1         ┆ 1998 │
│ 1001   ┆ 1999 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2000 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2001 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2002 ┆ 1         ┆ 1998 │
│ 1001   ┆ 2003 ┆ 1         ┆ 1998 │
└────────┴──────┴───────────┴──────┘
When the treatment indicator is static (e.g., a region dummy), pass treat_period to specify the known onset:
In [2]: from moderndid import get_group, load_cai2016
   ...:
   ...: df = load_cai2016()
   ...: df = get_group(df, idname="hhno", tname="year",
   ...:                treatname="treatment", treat_period=2003)
   ...: df.select("hhno", "year", "treatment", "G").head(10)
   ...:
Out[2]:
shape: (10, 4)
┌──────┬──────┬───────────┬──────┐
│ hhno ┆ year ┆ treatment ┆ G    │
│ ---  ┆ ---  ┆ ---       ┆ ---  │
│ i64  ┆ i64  ┆ i64       ┆ i64  │
╞══════╪══════╪═══════════╪══════╡
│ 1    ┆ 2000 ┆ 1         ┆ 2003 │
│ 1    ┆ 2001 ┆ 1         ┆ 2003 │
│ 1    ┆ 2002 ┆ 1         ┆ 2003 │
│ 1    ┆ 2003 ┆ 1         ┆ 2003 │
│ 1    ┆ 2004 ┆ 1         ┆ 2003 │
│ 1    ┆ 2005 ┆ 1         ┆ 2003 │
│ 1    ┆ 2006 ┆ 1         ┆ 2003 │
│ 1    ┆ 2007 ┆ 1         ┆ 2003 │
│ 1    ┆ 2008 ┆ 1         ┆ 2003 │
│ 2    ┆ 2000 ┆ 1         ┆ 2003 │
└──────┴──────┴───────────┴──────┘
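The assignment rule documented for G can also be sketched in plain Python, which may help when checking results by hand. This is only an illustration under stated assumptions, not moderndid's implementation: assign_group, groups, and the toy panels are hypothetical names, and rows are plain dictionaries rather than a DataFrame.

```python
from collections import defaultdict


def assign_group(rows, idname, tname, treatname, treat_period=None):
    """Return copies of `rows` with a "G" key added (illustrative sketch)."""
    # Collect the periods in which each unit has a positive treatment value.
    treated_periods = defaultdict(list)
    for row in rows:
        if row[treatname] > 0:
            treated_periods[row[idname]].append(row[tname])

    out = []
    for row in rows:
        periods = treated_periods.get(row[idname])
        if not periods:
            g = 0  # never treated
        elif treat_period is not None:
            g = treat_period  # known onset supplied by the caller
        else:
            g = min(periods)  # first period with a positive indicator
        out.append({**row, "G": g})
    return out


def groups(rows):
    """Summarise the result as {unit id: G} (hypothetical helper)."""
    return {row["id"]: row["G"] for row in rows}


years = range(1996, 2001)

# Unit 1 switches on in 1998; unit 2 is never treated.
switching = [{"id": 1, "year": y, "treat": int(y >= 1998)} for y in years]
switching += [{"id": 2, "year": y, "treat": 0} for y in years]

# Unit 3 carries a static dummy equal to 1 in every period.
static = [{"id": 3, "year": y, "treat": 1} for y in years]

print(groups(assign_group(switching, "id", "year", "treat")))
# {1: 1998, 2: 0}
print(groups(assign_group(static, "id", "year", "treat")))
# {3: 1996}  (earliest observed period, not the true onset)
print(groups(assign_group(static, "id", "year", "treat", treat_period=1998)))
# {3: 1998}
```

The second call shows why a static indicator needs treat_period: without it, the first-switch rule falls back to the earliest observed period.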