moderndid.core.panel.complete_data

moderndid.core.panel.complete_data(data: Any, idname: str, tname: str, min_periods: int | None = None) -> Any

Keep units observed in at least min_periods time periods.

Provides a flexible alternative to make_balanced_panel. Rather than requiring every unit to appear in all periods, you can set a threshold so that any unit observed in at least min_periods periods is retained. When min_periods is None, the behaviour is identical to make_balanced_panel.

Parameters:
data : DataFrame

Panel data. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.

idname : str

Unit identifier column.

tname : str

Time period column.

min_periods : int or None

Minimum number of observed periods. None (default) means all periods, equivalent to make_balanced_panel.

Returns:
DataFrame

Filtered panel in the same format as data.

See also

make_balanced_panel

Strict balancing (all periods required).

Examples

In [1]: from moderndid import complete_data, load_favara_imbs
   ...: 
   ...: df = load_favara_imbs()
   ...: filtered = complete_data(df, idname="county", tname="year", min_periods=10)
   ...: print(f"Before: {df.shape[0]} rows, After: {filtered.shape[0]} rows")
   ...: 
Before: 12538 rows, After: 12527 rows
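
The filtering rule described above can be illustrated with a small pure-Python sketch. This is not moderndid's implementation: the helper name keep_units_with_min_periods and the toy rows are hypothetical, and the sketch assumes that each distinct (unit, period) pair counts once and that min_periods=None means every period appearing anywhere in the data.

```python
from collections import defaultdict


def keep_units_with_min_periods(rows, idname, tname, min_periods=None):
    """Hypothetical helper illustrating the rule; not the library's code."""
    # Collect the set of distinct periods in which each unit is observed.
    periods_by_unit = defaultdict(set)
    for row in rows:
        periods_by_unit[row[idname]].add(row[tname])

    # Assumption: None means "all periods that appear anywhere in the data",
    # which reduces to strict balancing.
    if min_periods is None:
        all_periods = set()
        for periods in periods_by_unit.values():
            all_periods |= periods
        min_periods = len(all_periods)

    # Keep every row belonging to a unit that meets the threshold.
    keep = {
        unit
        for unit, periods in periods_by_unit.items()
        if len(periods) >= min_periods
    }
    return [row for row in rows if row[idname] in keep]


# Toy panel: unit "a" has 3 periods, "b" has 2, "c" has 1.
rows = [
    {"county": "a", "year": 2001},
    {"county": "a", "year": 2002},
    {"county": "a", "year": 2003},
    {"county": "b", "year": 2001},
    {"county": "b", "year": 2002},
    {"county": "c", "year": 2003},
]

relaxed = keep_units_with_min_periods(rows, "county", "year", min_periods=2)
strict = keep_units_with_min_periods(rows, "county", "year")
print(len(relaxed), len(strict))  # prints: 5 3
```

With the threshold set to 2, units "a" and "b" survive; with the default, only "a" is observed in all three periods, so the result matches strict balancing under the assumptions stated above.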