moderndid.core.panel.diagnose_panel
- moderndid.core.panel.diagnose_panel(data: Any, idname: str, tname: str, treatname: str | None = None) -> PanelDiagnostics
Run a diagnostic battery on panel data.
Inspects the data for common issues that cause estimation to fail or produce misleading results: duplicate unit-time pairs, unbalanced units, gaps in the panel, missing values, single-period units, and early-treated units. When a treatment column is provided, the check also flags whether treatment varies within units over time, which usually indicates that the data needs get_group to derive the group-timing variable.
The returned PanelDiagnostics object includes a suggestions list that maps each detected problem to the appropriate remediation function (e.g., deduplicate_panel, fill_panel_gaps, make_balanced_panel), making it a natural first step before calling any estimator.
- Parameters:
- data : DataFrame
  Panel data. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.
- idname : str
  Unit identifier column.
- tname : str
  Time period column.
- treatname : str or None
  Treatment indicator column. If provided, checks whether treatment varies within units over time.
- Returns:
PanelDiagnostics
  Structured report with counts and actionable suggestions.
See also
- deduplicate_panel: Remove duplicate unit-time pairs.
- fill_panel_gaps: Insert null rows for missing pairs.
- make_balanced_panel: Drop units not observed in every period.
- get_group: Derive group-timing from a binary treatment indicator.
Examples
In [1]: from moderndid import diagnose_panel, load_favara_imbs
   ...:
   ...: df = load_favara_imbs()
   ...: diag = diagnose_panel(df, idname="county", tname="year", treatname="inter_bra")
   ...: diag
   ...:
Out[1]:
==========================================================================================
 Panel Diagnostics
==========================================================================================
┌───────────────────────────┬───────┐
│ Metric                    │ Value │
├───────────────────────────┼───────┤
│ Units                     │ 1048  │
│ Periods                   │ 12    │
│ Observations              │ 12538 │
│ Balanced                  │ No    │
│ Duplicate unit-time pairs │ 0     │
│ Unbalanced units          │ 5     │
│ Gaps                      │ 38    │
│ Rows with missing values  │ 524   │
│ Single-period units       │ 1     │
│ Early-treated units       │ 0     │
│ Treatment time-varying    │ Yes   │
└───────────────────────────┴───────┘
------------------------------------------------------------------------------------------
 Suggestions
------------------------------------------------------------------------------------------
 Call fill_panel_gaps() to fill 38 missing unit-time pairs
 Call make_balanced_panel() to drop 5 units not observed in all periods
 524 rows contain missing values and will be dropped during preprocessing
 Call complete_data() or make_balanced_panel() to drop 1 units observed in only one period
 Treatment varies within units — verify this is expected or call get_group()
==========================================================================================
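To make the reported metrics more concrete, the following is a minimal standard-library sketch of how such counts could be computed on a toy panel. It is not moderndid's implementation. The function name `sketch_panel_counts`, the dictionary keys, and the exact definitions are assumptions made for illustration: a gap is taken to be a unit-time pair missing from the full units × periods grid, and a duplicate is a unit-time pair that appears more than once.

```python
from collections import Counter, defaultdict


def sketch_panel_counts(rows):
    """Count common panel issues from (unit, period, treated) tuples.

    Hypothetical helper for illustration only; the definitions used
    here are assumptions, not moderndid's actual logic.
    """
    rows = list(rows)

    # How many times each unit-time pair occurs.
    pair_counts = Counter((unit, period) for unit, period, _ in rows)
    all_periods = {period for _, period in pair_counts}

    # Distinct periods in which each unit is observed.
    periods_by_unit = defaultdict(set)
    for unit, period in pair_counts:
        periods_by_unit[unit].add(period)

    # Distinct treatment values seen for each unit.
    treat_by_unit = defaultdict(set)
    for unit, _, treated in rows:
        treat_by_unit[unit].add(treated)

    n_units = len(periods_by_unit)
    n_periods = len(all_periods)
    return {
        "units": n_units,
        "periods": n_periods,
        "observations": len(rows),
        # Pairs that occur more than once.
        "duplicate_pairs": sum(1 for n in pair_counts.values() if n > 1),
        # Units missing at least one period.
        "unbalanced_units": sum(
            1 for seen in periods_by_unit.values() if len(seen) < n_periods
        ),
        # Cells of the full units x periods grid with no observation.
        "gaps": n_units * n_periods - len(pair_counts),
        "single_period_units": sum(
            1 for seen in periods_by_unit.values() if len(seen) == 1
        ),
        # True if any unit shows more than one treatment value.
        "treatment_time_varying": any(
            len(values) > 1 for values in treat_by_unit.values()
        ),
    }


# Toy panel: unit "a" has a duplicated (a, 2) row and switches into
# treatment, "b" is missing period 2, and "c" is observed only once.
rows = [
    ("a", 1, 0), ("a", 2, 1), ("a", 3, 1),
    ("b", 1, 0), ("b", 3, 0),
    ("c", 2, 0),
    ("a", 2, 1),
]
report = sketch_panel_counts(rows)
```

Under this full-grid reading, the example output above is internally consistent: 1048 units × 12 periods = 12576 possible pairs, and 12576 − 12538 observations = 38, which matches the reported Gaps value when there are no duplicates.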