moderndid.core.panel.diagnose_panel

moderndid.core.panel.diagnose_panel(data: Any, idname: str, tname: str, treatname: str | None = None) -> PanelDiagnostics

Run a diagnostic battery on panel data.

Inspects the data for common issues that would cause estimation to fail or produce misleading results, including duplicate unit-time pairs, unbalanced units, gaps in the panel, missing values, single-period units, and early-treated units. When a treatment column is provided, the check also flags whether treatment varies within units over time (which usually indicates the data needs get_group to derive the group-timing variable).
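To make the terms in this list concrete, here is a minimal standard-library sketch of how such counts could be computed from a list of row dictionaries. It is not moderndid's implementation: the function name panel_checks, the result keys, and the exact counting rules (for example, counting each repeated unit-time pair once) are assumptions made for illustration. The gap definition used here, units × periods minus distinct observed pairs, does agree with the example output further down, where 1048 × 12 − 12538 = 38.

```python
from collections import Counter


def panel_checks(rows, idname, tname):
    """Count common panel problems in a list of row dicts (illustrative only)."""
    # How often each (unit, time) pair occurs in the data.
    pair_counts = Counter((r[idname], r[tname]) for r in rows)
    # All distinct time periods seen anywhere in the panel.
    periods = {t for _, t in pair_counts}
    # Number of distinct periods in which each unit is observed.
    periods_per_unit = Counter(u for u, _ in pair_counts)

    n_units = len(periods_per_unit)
    n_periods = len(periods)
    return {
        "units": n_units,
        "periods": n_periods,
        "observations": len(rows),
        # Assumption: a pair that occurs several times is counted once.
        "duplicate_pairs": sum(1 for c in pair_counts.values() if c > 1),
        # Units observed in fewer than all periods.
        "unbalanced_units": sum(1 for n in periods_per_unit.values() if n < n_periods),
        # Cells of the full unit-by-period grid that have no row.
        "gaps": n_units * n_periods - len(pair_counts),
        # Rows in which at least one field is None.
        "rows_with_missing": sum(1 for r in rows if any(v is None for v in r.values())),
        "single_period_units": sum(1 for n in periods_per_unit.values() if n == 1),
    }


rows = [
    {"id": 1, "t": 2000, "y": 1.0},
    {"id": 1, "t": 2001, "y": 2.0},
    {"id": 1, "t": 2002, "y": None},  # missing outcome
    {"id": 2, "t": 2000, "y": 0.5},   # unit 2 has no row for 2001
    {"id": 2, "t": 2002, "y": 0.7},
    {"id": 2, "t": 2002, "y": 0.7},   # duplicate unit-time pair
    {"id": 3, "t": 2001, "y": 1.1},   # unit 3 is observed only once
]
report = panel_checks(rows, "id", "t")
```

Early-treated units are left out of this sketch because identifying them requires the group-timing variable.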

The returned PanelDiagnostics object includes a suggestions list that maps each detected problem to the appropriate remediation function (e.g., deduplicate_panel, fill_panel_gaps, make_balanced_panel), making it a natural first step before calling any estimator.
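The idea of mapping each detected problem to a remediation function can be sketched as a simple lookup. The dictionary keys and the suggest helper below are hypothetical and are not part of moderndid; only the three function names come from this page. The counts are taken from the example output below.

```python
# Hypothetical issue keys mapped to the remediation functions named on this page.
REMEDIES = {
    "duplicate_pairs": "deduplicate_panel",
    "gaps": "fill_panel_gaps",
    "unbalanced_units": "make_balanced_panel",
}


def suggest(counts):
    """Return (issue, function name) pairs for every issue with a nonzero count."""
    return [
        (issue, func)
        for issue, func in REMEDIES.items()
        if counts.get(issue, 0) > 0
    ]


# Counts from the example output: no duplicates, 38 gaps, 5 unbalanced units.
counts = {"duplicate_pairs": 0, "gaps": 38, "unbalanced_units": 5}
plan = suggest(counts)
```

In real use, read the suggestions list on the returned PanelDiagnostics object rather than rebuilding it by hand.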

Parameters:
data : DataFrame

Panel data. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.

idname : str

Unit identifier column.

tname : str

Time period column.

treatname : str or None

Treatment indicator column. If provided, checks whether treatment varies within units over time.

Returns:
PanelDiagnostics

Structured report with counts and actionable suggestions.

See also

deduplicate_panel

Remove duplicate unit-time pairs.

fill_panel_gaps

Insert null rows for missing unit-time pairs.

make_balanced_panel

Drop units not observed in every period.

get_group

Derive group-timing from a binary treatment indicator.
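The sketch below illustrates two ideas behind the treatment check: testing whether a binary treatment indicator varies within a unit, and deriving a group-timing value as the first period in which the unit is treated. It is plain Python written under stated assumptions, not the behavior of get_group: both helper names are hypothetical, and representing never-treated units as None is a convention chosen here for clarity.

```python
from collections import defaultdict


def treatment_varies_within_units(rows, idname, treatname):
    """True if any unit shows more than one distinct treatment value."""
    seen = defaultdict(set)
    for r in rows:
        seen[r[idname]].add(r[treatname])
    return any(len(values) > 1 for values in seen.values())


def first_treated_period(rows, idname, tname, treatname):
    """Map each unit to its earliest treated period, or None if never treated."""
    first = {}
    for r in rows:
        unit = r[idname]
        first.setdefault(unit, None)  # assumed convention: None means never treated
        if r[treatname] == 1 and (first[unit] is None or r[tname] < first[unit]):
            first[unit] = r[tname]
    return first


rows = [
    {"id": "a", "t": 1, "d": 0},
    {"id": "a", "t": 2, "d": 1},  # unit "a" switches into treatment at t = 2
    {"id": "a", "t": 3, "d": 1},
    {"id": "b", "t": 1, "d": 0},  # unit "b" is never treated
    {"id": "b", "t": 2, "d": 0},
    {"id": "b", "t": 3, "d": 0},
]
varies = treatment_varies_within_units(rows, "id", "d")
groups = first_treated_period(rows, "id", "t", "d")
```

Here varies is True because unit "a" changes status, and groups assigns "a" to period 2 and leaves "b" as None.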

Examples

In [1]: from moderndid import diagnose_panel, load_favara_imbs
   ...: 
   ...: df = load_favara_imbs()
   ...: diag = diagnose_panel(df, idname="county", tname="year", treatname="inter_bra")
   ...: diag
   ...: 
Out[1]: 
==========================================================================================
 Panel Diagnostics
==========================================================================================

┌───────────────────────────┬───────┐
│ Metric                    │ Value │
├───────────────────────────┼───────┤
│ Units                     │  1048 │
│ Periods                   │    12 │
│ Observations              │ 12538 │
│ Balanced                  │    No │
│ Duplicate unit-time pairs │     0 │
│ Unbalanced units          │     5 │
│ Gaps                      │    38 │
│ Rows with missing values  │   524 │
│ Single-period units       │     1 │
│ Early-treated units       │     0 │
│ Treatment time-varying    │   Yes │
└───────────────────────────┴───────┘

------------------------------------------------------------------------------------------
 Suggestions
------------------------------------------------------------------------------------------
 Call fill_panel_gaps() to fill 38 missing unit-time pairs
 Call make_balanced_panel() to drop 5 units not observed in all periods
 524 rows contain missing values and will be dropped during preprocessing
 Call complete_data() or make_balanced_panel() to drop 1 units observed in only one period
 Treatment varies within units - verify this is expected or call get_group()
==========================================================================================