moderndid.core.panel.complete_data
- moderndid.core.panel.complete_data(data: Any, idname: str, tname: str, min_periods: int | None = None) -> Any
Keep units observed in at least min_periods time periods.
Provides a flexible alternative to make_balanced_panel. Rather than requiring every unit to appear in all periods, you can set a threshold so that units observed in at least that many periods are retained. When min_periods is None, the behaviour is identical to make_balanced_panel.
- Parameters:
- data : DataFrame
  Panel data. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.
- idname : str
  Unit identifier column.
- tname : str
  Time period column.
- min_periods : int or None
  Minimum number of observed periods. None (default) means all periods, equivalent to make_balanced_panel.
- Returns:
DataFrame
  Filtered panel in the same format as data.
See also
make_balanced_panel : Strict balancing (all periods required).
Examples
In [1]: from moderndid import complete_data, load_favara_imbs
   ...:
   ...: df = load_favara_imbs()
   ...: filtered = complete_data(df, idname="county", tname="year", min_periods=10)
   ...: print(f"Before: {df.shape[0]} rows, After: {filtered.shape[0]} rows")
   ...:
Before: 12538 rows, After: 12527 rows
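The filtering rule described above can be illustrated with a small pure-Python sketch. This is not moderndid's implementation: the helper name keep_units_with_min_periods and the toy rows are hypothetical, and the sketch assumes that "observed periods" means distinct values of the time column per unit and that "all periods" means every distinct period present in the data.

```python
from collections import defaultdict


def keep_units_with_min_periods(rows, idname, tname, min_periods=None):
    """Hypothetical helper: keep rows of units seen in >= min_periods distinct periods."""
    # Collect the distinct periods observed for each unit.
    periods_by_unit = defaultdict(set)
    for row in rows:
        periods_by_unit[row[idname]].add(row[tname])

    # Assumption: None means "every period present in the data",
    # which corresponds to a strictly balanced panel.
    if min_periods is None:
        min_periods = len({row[tname] for row in rows})

    keep = {
        unit
        for unit, periods in periods_by_unit.items()
        if len(periods) >= min_periods
    }
    # Preserve the original row order for the retained units.
    return [row for row in rows if row[idname] in keep]


# Toy panel: A has 3 periods, B has 2, C has 1.
rows = [
    {"county": "A", "year": 2000},
    {"county": "A", "year": 2001},
    {"county": "A", "year": 2002},
    {"county": "B", "year": 2000},
    {"county": "B", "year": 2001},
    {"county": "C", "year": 2002},
]

relaxed = keep_units_with_min_periods(rows, "county", "year", min_periods=2)
balanced = keep_units_with_min_periods(rows, "county", "year")

print(sorted({r["county"] for r in relaxed}))   # ['A', 'B']
print(sorted({r["county"] for r in balanced}))  # ['A']
```

With a threshold of 2, the unit with a single observation is dropped while the partially observed unit B is kept; with the default, only the fully observed unit A survives, which mirrors the strict balancing case.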