ModernDiD 0.1.0 Release Notes#

ModernDiD 0.1.0 is the first major release of the package. It graduates from the 0.0.x pre-release series with a stable, feature-complete API for modern difference-in-differences estimation in Python. New capabilities include intertemporal DiD estimation, GPU acceleration via CuPy, distributed computing with Dask and Spark backends, and a DataFrame-agnostic interface powered by narwhals. The highlights of this release are:

  • Intertemporal DiD estimation for non-binary, non-absorbing treatments.

  • GPU acceleration via CuPy for NVIDIA GPUs.

  • Distributed computing backends for Dask and Apache Spark.

  • DataFrame-agnostic API via narwhals and the Arrow PyCapsule Interface.

  • Thread-based parallelism for group-time estimations.

  • Comprehensive R validation test suite.

  • Benchmarks demonstrating significant speed advantages over R counterparts.

  • Major documentation overhaul with plotting guides and background sections.

Python versions 3.11-3.13 are supported in this release.

New Features#

Intertemporal DiD Estimation#

A new didinter module implements the de Chaisemartin and D’Haultfoeuille (2024) estimator for difference-in-differences with non-binary, non-absorbing treatments via the did_multiplegt() function. This estimator handles settings where treatment can change over time (not just turn on), supporting both increases and decreases in treatment intensity. Bootstrap inference and dedicated plotting functions are included.

import moderndid as did

result = did.did_multiplegt(
    data=df,
    yname="Dl_vloans_b",
    idname="county",
    tname="year",
    dname="inter_bra",
    effects=5,
    cluster="state_n",
)

did.plot_multiplegt(result)

(#146, #147)

GPU Acceleration via CuPy#

Optional GPU acceleration is now available for NVIDIA GPUs via CuPy. When a CUDA-capable GPU is available, users can offload regression, propensity score estimation, and bootstrap inference to the GPU. A backend manager provides get_backend(), set_backend(), and use_backend() for controlling GPU usage.

import moderndid as did
from moderndid import use_backend

# Pass backend directly to any estimator
result = did.att_gt(
    data=data,
    yname="y",
    tname="time",
    idname="id",
    gname="group",
    backend="cupy",
)

# Or use a context manager for multiple calls
with use_backend("cupy"):
    result1 = did.att_gt(...)
    result2 = did.ddd(...)

# Or set globally
did.set_backend("cupy")
result = did.att_gt(...)
did.set_backend("numpy")  # revert
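
On machines where CuPy may not be installed, it can help to choose the backend string before calling an estimator. The following is a minimal sketch rather than part of the package: pick_backend is a hypothetical helper, and it only checks that the package is importable, not that a CUDA device is actually usable.

```python
import importlib.util


def pick_backend(preferred: str = "cupy", fallback: str = "numpy") -> str:
    """Return `preferred` if its package is importable, else `fallback`."""
    # find_spec only checks that the package is installed; it does not
    # verify that a CUDA-capable GPU is available at runtime.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


backend = pick_backend()
print(f"selected backend: {backend}")
```

The resulting string could then be passed as the backend argument in the calls above.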

Install with:

uv pip install moderndid[gpu]

(#175)

Distributed Computing with Dask#

A full Dask distributed backend enables scaling att_gt(), ddd(), cont_did(), and did_multiplegt() across multiple workers and machines. Multi-GPU parallelism is supported when combined with CuPy. Results are numerically identical to those of the local estimators. Pass a Dask DataFrame to any of these estimators and the distributed backend is used automatically.

import dask.dataframe as dd
import moderndid as did

ddf = dd.read_parquet("panel_data.parquet")

result = did.att_gt(
    data=ddf,
    yname="y",
    tname="time",
    idname="id",
    gname="group",
    xformla="~ x1 + x2",
    est_method="dr",
)

event_study = did.aggte(result, type="dynamic")
did.plot_event_study(event_study)

Install with:

uv pip install moderndid[dask]

(#185, #195)

Distributed Computing with Apache Spark#

A PySpark distributed backend provides the same scaling capabilities as Dask for att_gt(), ddd(), cont_did(), and did_multiplegt(). Shared distributed logic is consolidated in the moderndid.distributed module.

from pyspark.sql import SparkSession
import moderndid as did

spark = SparkSession.builder.master("local[*]").getOrCreate()
sdf = spark.read.parquet("panel_data.parquet")

result = did.att_gt(
    data=sdf,
    yname="y",
    tname="time",
    idname="id",
    gname="group",
    xformla="~ x1 + x2",
    est_method="dr",
)

event_study = did.aggte(result, type="dynamic")
did.plot_event_study(event_study)

Install with:

uv pip install moderndid[spark]

(#191, #195)

DataFrame-Agnostic Interface via Narwhals#

All estimators now accept polars, pandas, PyArrow, DuckDB, or any other Arrow-compatible DataFrame directly, without conversion, through narwhals and the Arrow PyCapsule Interface.

import moderndid as did

# Column names shared by every call below (placeholders for your own data)
did_args = dict(yname="y", tname="time", idname="id", gname="group")

# Works with pandas
import pandas as pd
df_pandas = pd.read_csv("data.csv")
result = did.att_gt(data=df_pandas, **did_args)

# Works with polars
import polars as pl
df_polars = pl.read_csv("data.csv")
result = did.att_gt(data=df_polars, **did_args)

# Works with pyarrow
import pyarrow.parquet as pq
table = pq.read_table("data.parquet")
result = did.att_gt(data=table, **did_args)

# Works with duckdb
import duckdb
df_duck = duckdb.query("SELECT * FROM 'data.parquet'").arrow()
result = did.att_gt(data=df_duck, **did_args)

(#144)

Panel Data Utilities#

New user-facing utilities cover common panel data operations: data validation, gap filling, and deduplication.
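
These notes do not list the new function names, so the sketch below illustrates the three operations with plain pandas rather than with the ModernDiD API. The toy data and column names are assumptions made for illustration.

```python
import pandas as pd

# Toy unbalanced panel: unit 1 has a duplicated row, unit 2 is missing 2021.
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2],
        "year": [2020, 2021, 2021, 2022, 2020, 2022],
        "y": [1.0, 2.0, 2.0, 3.0, 4.0, 6.0],
    }
)

# Deduplication: keep the first row for each (id, year) pair.
deduped = df.drop_duplicates(subset=["id", "year"], keep="first")

# Validation: every (id, year) pair should now be unique.
assert not deduped.duplicated(subset=["id", "year"]).any()

# Gap filling: reindex onto the full id x year grid; missing cells become NaN.
full_index = pd.MultiIndex.from_product(
    [deduped["id"].unique(), sorted(deduped["year"].unique())],
    names=["id", "year"],
)
balanced = deduped.set_index(["id", "year"]).reindex(full_index).reset_index()

print(balanced)
```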

(#182)

Optional Dependency Groups#

The package now supports granular installation of optional features via extras:

uv pip install moderndid[plots]      # plotnine visualizations
uv pip install moderndid[gpu]        # CuPy GPU acceleration
uv pip install moderndid[dask]       # Dask distributed backend
uv pip install moderndid[spark]      # Spark distributed backend
uv pip install moderndid[didcont]    # continuous treatment DiD
uv pip install moderndid[didhonest]  # sensitivity analysis
uv pip install moderndid[numba]      # Numba JIT acceleration
uv pip install moderndid[all]        # everything

(#159)

Improvements#

Thread-Based Parallelism for Group-Time Estimations#

Group-time estimation computations now use thread-based parallelism, providing significant speedups for estimators with many group-time cells.
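
Because each group-time cell can be estimated independently, the work maps naturally onto a thread pool. The sketch below shows that pattern with the standard library only. estimate_cell is a hypothetical stand-in with placeholder arithmetic, not the package's internal routine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def estimate_cell(cell):
    """Hypothetical stand-in for one group-time ATT(g, t) computation."""
    g, t = cell
    # Placeholder arithmetic; a real estimator would fit a model here.
    return {"group": g, "time": t, "att": float(t - g)}


groups = [2004, 2006, 2007]
periods = [2004, 2005, 2006, 2007]
cells = list(product(groups, periods))

# Cells are independent, so they can be mapped across worker threads.
# executor.map returns results in the same order as `cells`.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(estimate_cell, cells))

print(len(results))
```

Threads tend to pay off when the per-cell work spends most of its time in code that releases the GIL, such as NumPy linear algebra.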

(#172)

Faster HonestDiD Computations#

The didhonest sensitivity analysis module implementing Rambachan and Roth (2023) received significant performance improvements. The _construct_gamma function was rewritten to use direct NumPy matrix construction instead of SymPy RREF, yielding approximately 420x faster execution.
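
To illustrate why direct construction can replace a symbolic row reduction, the sketch below assumes the goal is an invertible matrix whose first row is a given nonzero vector. That is an assumption about what the function does, not something these notes state. construct_gamma_sketch is a hypothetical name and not the package's code.

```python
import numpy as np


def construct_gamma_sketch(l_vec):
    """Build an invertible matrix whose first row is the nonzero vector l_vec."""
    l_vec = np.asarray(l_vec, dtype=float).ravel()
    nonzero = np.flatnonzero(l_vec)
    if nonzero.size == 0:
        raise ValueError("l_vec must contain at least one nonzero entry")
    # Drop the standard basis vector at a nonzero position of l_vec (here the
    # last one); l_vec plus the remaining basis vectors are linearly independent.
    k = nonzero[-1]
    basis = np.eye(l_vec.size)
    return np.vstack([l_vec, np.delete(basis, k, axis=0)])


gamma = construct_gamma_sketch([0.0, 1.0, 0.0])
print(gamma)
```

No symbolic algebra is needed: the matrix is assembled from the input vector and rows of the identity.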

(#178)

PrettyTable Output for All Estimators#

All estimator result objects now display formatted output using PrettyTable, providing clean, readable summary tables in the console.

import moderndid as did

data = did.load_mpdta()
result = did.att_gt(
    data=data,
    yname="lemp",
    tname="year",
    idname="countyreal",
    gname="first.treat",
)

event_study = did.aggte(result, type="dynamic")
print(event_study)

(#170)

DDD Preprocessing Pipeline#

Triple difference-in-differences preprocessing following Ortiz-Villavicencio and Sant’Anna (2025) has been integrated into the core preprocessing logic, improving consistency and enabling shared validation across all estimators.

(#169)

Minimal Plotting Theme#

The plotting subsystem has been updated to use a minimal theme, producing cleaner publication-quality visualizations by default.

import moderndid as did

# Event study plot with reference period marker
event_study = did.aggte(result, type="dynamic")
did.plot_event_study(event_study, ref_period=-1)

# Group-time plot in a multi-panel layout
did.plot_gt(result, ncol=3)

# Aggregation plots
group_agg = did.aggte(result, type="group")
did.plot_agg(group_agg)

# Dose-response plot (continuous treatment)
did.plot_dose_response(dose_result, effect_type="att")

(#161)

Standardized Bootstrap Arguments#

Bootstrap-related arguments have been standardized across all estimators for a more consistent API. Linting has been consolidated on ruff with improved pre-commit hook coverage.

(#150)

Refactored Numba Integration#

The Numba JIT integration has been refactored along with improvements to how influence function aggregation is performed.

(#158)

Comprehensive Parameter Validation#

Validation checks have been added across all estimators for parameters including data types, column existence, treatment timing, and group definitions, with informative error messages.
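
As a rough illustration of a column-existence check with an informative message, consider the sketch below. check_columns is a hypothetical helper and the message wording is an assumption. The package's actual checks and messages may differ.

```python
def check_columns(available, required):
    """Raise a ValueError naming every required column that is missing."""
    missing = [name for name in required if name not in available]
    if missing:
        raise ValueError(
            f"Columns not found in data: {missing}. "
            f"Available columns: {sorted(available)}"
        )


columns = ["id", "time", "y", "group"]
check_columns(columns, ["y", "time", "id", "group"])  # passes silently

try:
    check_columns(columns, ["y", "treat"])
except ValueError as err:
    message = str(err)
    print(message)
```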

(#173, #177)

Propensity Score Trimming and Collinearity Checks for DDD#

The DDD estimators now include propensity score trimming functionality and collinearity checks for covariates with informative diagnostics.
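
Both checks can be sketched generically with NumPy. The example below is not the DDD implementation: the simulated covariates and the 0.995 trimming cutoff are illustrative assumptions, not documented defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The last covariate is an exact linear combination, so X is rank deficient.
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])

# Collinearity check: compare the numerical rank with the column count.
rank = np.linalg.matrix_rank(X)
is_collinear = rank < X.shape[1]

# Trimming: keep observations whose propensity score is below a cutoff.
# The 0.995 cutoff is an illustrative assumption.
pscore = np.array([0.10, 0.50, 0.90, 0.999])
keep = pscore < 0.995

print(is_collinear, keep)
```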

(#125)

Random State for Reproducibility#

A random_state parameter has been added to cont_did(), which implements the Callaway, Goodman-Bacon, and Sant’Anna (2024) estimator, so that bootstrap inference is reproducible.
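
The effect of a fixed seed can be shown with a generic multiplier bootstrap. This sketch is not the procedure cont_did() uses internally: bootstrap_se and the influence values are hypothetical, and the point is only that the same random_state reproduces the same draws.

```python
import numpy as np


def bootstrap_se(influence, n_boot=500, random_state=None):
    """Multiplier-bootstrap standard error of a mean from influence values."""
    rng = np.random.default_rng(random_state)
    n = influence.shape[0]
    # Rademacher weights (+1 or -1) perturb each unit's contribution.
    weights = rng.choice([-1.0, 1.0], size=(n_boot, n))
    draws = weights @ influence / n
    return draws.std(ddof=1)


influence = np.linspace(-1.0, 1.0, 50)

# Two runs with the same seed use identical weights and so agree exactly.
se_a = bootstrap_se(influence, random_state=42)
se_b = bootstrap_se(influence, random_state=42)
print(se_a == se_b)
```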

(#126)

Bug Fixes#

Benchmarks#

Speed benchmarks have been added comparing ModernDiD estimators against their R counterparts. The Python implementations are significantly faster in these benchmarks, and the R DIDmultiplegtDYN package could not handle datasets larger than 300k observations. The benchmarks were refreshed after the R package adopted Polars internally, which gives a fairer comparison.

(#160, #188)

Testing#

R Validation Suite#

The existing R validation tests have been significantly enhanced and consolidated into a dedicated tests/validation/ directory. These tests verify numerical accuracy against the canonical R implementations to ensure that ModernDiD produces equivalent results.

  • Enhance R validation tests for att_gt() and aggte() against the R did package.

    (#134)

  • Enhance R validation tests for DR-DiD estimators from Sant’Anna and Zhao (2020) against the R DRDID package.

    (#133)

  • Enhance R validation tests for HonestDiD sensitivity analysis against the R HonestDiD package.

    (#135)

  • Enhance R validation tests for the cont_did() estimator.

    (#132)

  • Consolidate all R validation tests into tests/validation/ and harden tolerances.

    (#194)

  • Add DDD-specific tests for improved coverage.

    (#122)

  • Improve overall test coverage across the codebase.

    (#128, #197, #199)

  • Stop ignoring distributed tests in CI for accurate coverage reporting.

    (#200)

  • Add validation-specific tox testing environment.

    (#180)

Documentation#

This release includes a major documentation overhaul following NumPy-style conventions, with new guides, background sections, and improved API reference.

  • Comprehensive documentation overhaul with NumPy-style release notes.

    (#140)

  • Add continuous DiD background section covering the Callaway, Goodman-Bacon, and Sant’Anna (2024) methodology.

    (#162)

  • Add user guide examples and background sections for all estimators.

    (#167)

  • Add plotting guide documentation.

    (#196)

  • Add GPU usage instructions and hardware specifications.

    (#176, #181)

  • Refine NPIV documentation.

    (#179)

  • Update architecture docs with estimator development workflow.

    (#171)

  • Add implementation standards and introduction to DiD section.

    (#138)

  • Reorganize and extend development documentation.

    (#136, #137)

  • Rework main function docstrings and result output formatting.

    (#121)

  • Add repeated cross-section examples for ddd().

    (#129)

  • Clean up HonestDiD documentation.

    (#124)

  • Update agg_ddd() function documentation.

    (#120)

  • Fix math formatting in didhonest background docs and refactor READMEs.

    (#201)

Contributors#

A total of 1 person contributed to this release.

  • Jordan Deklerk

Pull Requests Merged#

A total of 82 pull requests were merged for this release, of which 65 are feature or fix PRs (excluding automated dependency updates).

  • #201: Refactor READMEs and fix math rendering

  • #200: Stop ignoring distributed tests in CI

  • #199: Add more tests for coverage

  • #197: Improve test coverage

  • #196: Add plotting guide

  • #195: Add distributed did_multiplegt() and cont_did()

  • #194: Enhance R validation tests and fix bug in compute_conditional_cs_rmb()

  • #193: Update documentation images

  • #191: Add Spark distributed backend

  • #188: Refresh benchmarks against R packages

  • #187: Fix Dask test suite and update CI

  • #186: Fix README

  • #185: Add distributed computing via Dask and multi-GPU support

  • #182: Add panel data utilities

  • #181: Update README with hardware specifications

  • #180: Add validation tox testing environment

  • #179: Refine NPIV documentation

  • #178: Improve computations in didhonest module

  • #177: Update validation logic and error handling

  • #176: Update README for GPU instructions

  • #175: Add CuPy GPU acceleration

  • #173: Add validation checks for parameters

  • #172: Add thread-based parallelism for group-time estimations

  • #171: Update architecture documentation

  • #170: Add PrettyTable output for all estimators

  • #169: Implement DDD preprocessing pipeline

  • #168: Refactor documentation

  • #167: Add user guide examples and background sections

  • #166: Disable dependency dashboard

  • #165: Add benchmark commands

  • #162: Add continuous DiD background section

  • #161: Use minimal theme for plotting

  • #160: Add benchmarks against R packages

  • #159: Allow optional dependencies for installation

  • #158: Refactor Numba integration and update influence function aggregation

  • #150: Standardize bootstrap arguments and consolidate linting on ruff

  • #148: Update main README

  • #147: Add plotting and bootstrapping for intertemporal DiD

  • #146: Add didinter module for intertemporal DiD estimation

  • #145: Remove trailing commas from output

  • #144: Add DataFrame-agnostic functionality via narwhals

  • #143: Add version import to init

  • #142: Remove redundant section in getting started

  • #141: Fix linting imports

  • #140: Overhaul documentation style and add release notes

  • #139: Overhaul documentation styling and add release notes

  • #138: Add implementation standards and intro to DiD section

  • #137: Reorganize development documentation

  • #136: Add more detail to development documentation

  • #135: Fix computations in didhonest and add R validation

  • #134: Fix group aggregation in aggte() and add R validation

  • #133: Add R validation for drdid module

  • #132: Fix bugs in cont_did() and add R validation

  • #130: Update getting started docs and test suite

  • #129: Add repeated cross-section examples for ddd()

  • #128: Improve test coverage

  • #127: Update parameter names for drdid()

  • #126: Add random state for cont_did()

  • #125: Add propensity trimming and collinearity checks for DDD

  • #124: Clean up HonestDiD documentation

  • #123: Add DDD plotting functions

  • #122: Add DDD tests for coverage

  • #121: Rework main function docstrings

  • #120: Update agg_ddd() documentation

  • #119: Update changelog for v0.0.3