moderndid.etwfe

moderndid.etwfe(data, yname: str, tname: str, gname: str, idname: str | None = None, xformla: str | None = None, xvar: str | None = None, tref=None, gref=None, cgroup: str = 'notyet', fe: str = 'vs', family=None, weightsname: str | None = None, vcov: str | dict | None = None, alp: float = 0.05, backend=None) → EtwfeResult

Estimate the Extended Two-Way Fixed Effects model.

Implements the ETWFE methodology for difference-in-differences with staggered treatment adoption and heterogeneous treatment effects [1] [2]. Rather than discarding the TWFE estimator, the approach saturates the model with cohort-by-time interaction terms so that the coefficients on the treatment indicators directly recover the cohort-time-specific average treatment effects on the treated,

\[\tau_{g,t} \equiv E[y_t(g) - y_t(\infty) \mid d_g = 1], \quad t \ge g.\]

Under no anticipation (NA), conditional parallel trends (CPT), and linearity (LIN), the conditional expectation of the never-treated potential outcome is

\[E[y_t(\infty) \mid \mathbf{d}, \mathbf{x}] = \alpha + \sum_g \beta_g d_g + \mathbf{x}\boldsymbol{\kappa} + \sum_g (d_g \cdot \mathbf{x})\boldsymbol{\xi}_g + \sum_s \gamma_s f_{s,t} + \sum_s (f_{s,t} \cdot \mathbf{x})\boldsymbol{\pi}_s,\]

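where \(d_g\) are treatment cohort indicators, \(f_{s,t}\) are time dummies equal to one when \(s = t\), and \(\mathbf{x}\) is a row vector of time-constant covariates. Evaluating this expression for cohort \(g\) in period \(t\) (\(d_g = 1\), \(f_{t,t} = 1\)) and averaging over \(\mathbf{x}\) within the cohort gives the counterfactual mean, so the ATTs are identified as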

\[\tau_{g,t} = E(y_t \mid d_g = 1) - \bigl[(\alpha + \beta_g + \gamma_t) + E(\mathbf{x} \mid d_g = 1) \cdot (\boldsymbol{\kappa} + \boldsymbol{\xi}_g + \boldsymbol{\pi}_t)\bigr].\]
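To make the arithmetic concrete, the sketch below evaluates this expression for a cohort with two covariates. The helper tau_gt and all parameter values are hypothetical and chosen for illustration; in an application the parameters are estimated rather than known.

```python
def tau_gt(y_mean, alpha, beta_g, gamma_t, x_mean, kappa, xi_g, pi_t):
    """tau_{g,t} = E(y_t | d_g=1) - [(alpha + beta_g + gamma_t) + E(x | d_g=1)(kappa + xi_g + pi_t)].

    x_mean, kappa, xi_g, and pi_t are equal-length lists (one entry per covariate).
    """
    # Summed slope vector kappa + xi_g + pi_t for cohort g in period t.
    slopes = [k + xi + p for k, xi, p in zip(kappa, xi_g, pi_t)]
    # Inner product of the cohort covariate mean with the summed slopes.
    covariate_part = sum(xm * s for xm, s in zip(x_mean, slopes))
    # Imputed never-treated mean for cohort g in period t.
    counterfactual = alpha + beta_g + gamma_t + covariate_part
    return y_mean - counterfactual


# Hypothetical values with two covariates.
tau = tau_gt(
    y_mean=3.5, alpha=1.0, beta_g=0.5, gamma_t=0.25,
    x_mean=[2.0, 1.0], kappa=[0.25, 0.5], xi_g=[0.125, 0.0], pi_t=[0.125, -0.25],
)
print(tau)  # 0.5
```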

The ETWFE regression includes the full set of treatment interactions \(w_t \cdot d_g \cdot f_{s,t}\) for \(s \ge g\), where \(w_t\) is the treatment indicator, with the covariates entering in deviations from their cohort means, \(\dot{\mathbf{x}}_g = \mathbf{x} - \bar{\mathbf{x}}_g\). The pooled OLS estimates of \(\tau_{g,t}\) from this saturated regression are numerically identical to those from a cohort imputation procedure (Proposition 5.2 in [2]). Use emfx to aggregate the cell-level estimates into overall, group, calendar, or event-study summaries.
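The imputation equivalence can be checked by hand in the simplest case. Assuming a balanced panel, a single treated cohort \(g\), never-treated controls, and no covariates, fitting the untreated-outcome model with unit and time effects and imputing \(y_t(\infty)\) reduces to comparing each post-treatment period with the average pre-treatment period,

\[\hat{\tau}_{g,t} = (\bar{y}_{g,t} - \bar{y}_{g,\text{pre}}) - (\bar{y}_{\infty,t} - \bar{y}_{\infty,\text{pre}}).\]

The sketch below is a hand-rolled illustration of that special case, not how etwfe computes its estimates; the helper imputation_att and the toy panel are hypothetical.

```python
from statistics import mean


def imputation_att(rows, g, t):
    """Hand-computed tau_{g,t} for one treated cohort g versus never-treated units.

    rows: iterable of (unit, period, first_treat, y); first_treat == 0 marks never-treated.
    Assumes a balanced panel, a single treated cohort, and no covariates.
    """
    rows = list(rows)

    def cell_mean(cohort, periods):
        # Mean outcome over all units of `cohort` in the given periods.
        return mean(y for _, s, c, y in rows if c == cohort and s in periods)

    pre = {s for _, s, _, _ in rows if s < g}
    treated_change = cell_mean(g, {t}) - cell_mean(g, pre)
    control_change = cell_mean(0, {t}) - cell_mean(0, pre)
    return treated_change - control_change


# Toy panel: additive unit and time effects plus known treatment effects.
unit_effect = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
first_treat = {1: 0, 2: 0, 3: 2, 4: 2}
time_effect = {1: 0.0, 2: 0.5, 3: 1.0}
true_tau = {2: 0.25, 3: 0.5}
rows = [
    (i, s, first_treat[i],
     unit_effect[i] + time_effect[s]
     + (true_tau[s] if first_treat[i] and s >= first_treat[i] else 0.0))
    for i in unit_effect
    for s in time_effect
]
print(imputation_att(rows, g=2, t=2), imputation_att(rows, g=2, t=3))  # 0.25 0.5
```

Because the toy outcomes are exactly additive apart from the treatment effects, the sketch recovers the true values. With several cohorts, not-yet-treated controls, or covariates, simple cell means generally no longer suffice and the full regression is needed.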

Parameters:
data : DataFrame

Panel data in long format. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, and cuDF DataFrames and pyarrow Tables.

yname : str

The name of the outcome variable.

tname : str

The name of the column containing the time periods.

gname : str

The name of the variable containing the first period in which a unit is treated. It should be a positive number for all observations in treated cohorts and 0 for never-treated units. It defines which cohort a unit belongs to.
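Panels often store a binary treatment indicator rather than a cohort column. A minimal sketch for building the gname values from such an indicator, assuming absorbing (staggered) adoption and positive period labels; first_treat_periods is a hypothetical helper, not part of moderndid.

```python
def first_treat_periods(panel):
    """Map each unit to the first period in which it is treated, or 0 if never treated.

    panel: iterable of (unit, period, treated) tuples with treated in {0, 1}.
    Assumes positive period labels and absorbing (staggered) treatment.
    """
    first = {}
    for unit, period, treated in panel:
        first.setdefault(unit, 0)
        # Keep the earliest treated period seen so far for this unit.
        if treated and (first[unit] == 0 or period < first[unit]):
            first[unit] = period
    return first


panel = [
    ("a", 2003, 0), ("a", 2004, 1), ("a", 2005, 1),
    ("b", 2003, 0), ("b", 2004, 0), ("b", 2005, 0),
]
print(first_treat_periods(panel))  # {'a': 2004, 'b': 0}
```

The resulting mapping can then be joined back onto the panel as the gname column.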

idname : str or None, default=None

The individual (cross-sectional unit) id name. When provided, unit fixed effects are absorbed in the regression.

xformla : str or None, default=None

A formula for the covariates to include in the model, of the form "~ x1 + x2". With covariates included, the parallel trends assumption need only hold conditional on them. Following the Mundlak device, the controls are demeaned within cohorts so that the treatment coefficients remain interpretable as cohort-time ATTs.
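A minimal sketch of cohort demeaning, \(\dot{\mathbf{x}}_g = \mathbf{x} - \bar{\mathbf{x}}_g\), for one time-constant covariate. It averages over units; whether moderndid averages over units or over rows is an assumption here, though the two coincide in a balanced panel. The helper demean_within_cohort is hypothetical.

```python
from collections import defaultdict


def demean_within_cohort(units):
    """Return x minus its cohort mean for each unit.

    units: dict mapping unit -> (cohort, x) for a time-constant scalar covariate x.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cohort, x in units.values():
        totals[cohort] += x
        counts[cohort] += 1
    # Subtract each unit's cohort mean.
    return {
        unit: x - totals[cohort] / counts[cohort]
        for unit, (cohort, x) in units.items()
    }


units = {"a": (2004, 5.0), "b": (2004, 7.0), "c": (0, 4.0), "d": (0, 8.0), "e": (0, 6.0)}
print(demean_within_cohort(units))
# {'a': -1.0, 'b': 1.0, 'c': -2.0, 'd': 2.0, 'e': 0.0}
```

Each cohort's demeaned covariate averages to zero, so the treatment interactions are evaluated at the cohort's own covariate mean.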

xvar : str or None, default=None

Name of a covariate to interact with treatment for heterogeneous treatment effect analysis. The variable is demeaned within cohorts and interacted with the treatment and time indicators.

tref : numeric or None, default=None

Reference time period. Defaults to the minimum time period in the data.

gref : numeric or None, default=None

Reference cohort (control group). Auto-detected based on cgroup. For "never", selects the group beyond the last observed period. For "notyet", defaults to the latest-treated cohort.

cgroup : {'notyet', 'never'}, default='notyet'

Control group strategy:

  • "notyet": use not-yet-treated units as controls (drops observations once the reference cohort enters treatment)

  • "never": use never-treated units as controls

fe : {'vs', 'feo', 'none'}, default='vs'

Fixed effects specification:

  • "vs": varying slopes (controls interact with cohort and time FE)

  • "feo": fixed effects only

  • "none": no absorbed fixed effects

family : {None, 'gaussian', 'poisson', 'logit', 'probit'}, default=None

GLM family for nonlinear models. None and "gaussian" use OLS via feols. "poisson" uses Poisson QMLE via fepois. "logit" and "probit" use feglm. For non-Gaussian families, fe is set to "none" and idname is ignored (unit FE absorption is not supported for GLM).

weightsname : str or None, default=None

The name of the column containing sampling weights. If not set, all observations have equal weight.

vcov : str or dict or None, default=None

Variance-covariance specification passed to pyfixest. Defaults to "hetero" (heteroskedasticity-robust). Examples: "iid", "hetero", "HC1", {"CRV1": "cluster_var"}.

alp : float, default=0.05

The significance level.

backend : {'cupy', 'jax', 'numba', 'rust', 'scipy'} or None, default=None

Demeaner backend for pyfixest's fixed-effects absorption. "cupy" and "jax" enable GPU acceleration and require CuPy or JAX with GPU support; without a GPU, pyfixest falls back to the CPU. "numba", "rust", and "scipy" are CPU-only. None defers to pyfixest's default backend, "numba".

Returns:
EtwfeResult

Object containing ETWFE regression results:

  • coefficients: coefficient estimates for each interaction term

  • std_errors: standard errors for each coefficient

  • vcov: variance-covariance matrix

  • coef_names: coefficient names from pyfixest

  • gt_pairs: list of (group, time) pairs for each coefficient

  • n_obs: number of observations

  • n_units: number of unique cross-sectional units

  • r_squared: R-squared of the regression

  • data: fitted data (used internally by emfx)

  • config: configuration object (used internally by emfx)

  • estimation_params: dictionary with estimation details
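The cells behind gt_pairs can be enumerated by hand. A minimal sketch, assuming one coefficient per post-treatment cell \((g, t)\) with \(t \ge g\) for every treated cohort, which matches the seven cells in the example output below; post_treatment_cells is a hypothetical helper, not part of moderndid.

```python
def post_treatment_cells(cohorts, periods):
    """Return the (g, t) cells with t >= g for every treated cohort g.

    Cohort value 0 marks never-treated units and contributes no cells.
    """
    return [
        (g, t)
        for g in sorted(cohorts)
        if g != 0
        for t in sorted(periods)
        if t >= g
    ]


cells = post_treatment_cells({0, 2004, 2006, 2007}, range(2003, 2008))
print(len(cells))  # 7
print(cells[:2])   # [(2004, 2004), (2004, 2005)]
```

The sketch orders cells by cohort; the printed results table orders them by period instead.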

See also

emfx

Aggregate ETWFE cell-level estimates into treatment effect summaries.

att_gt

Group-time ATT estimation via Callaway and Sant’Anna (2021).

References

[1]

Wooldridge, J. M. (2021). “Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators.”

[2]

Wooldridge, J. M. (2023). “Simple Approaches to Nonlinear Difference-in-Differences with Panel Data.” The Econometrics Journal, 26(3), C31-C66.

Examples

The dataset below contains 500 counties observed from 2003 to 2007. Some counties are first treated in 2004, some in 2006, and some in 2007; the rest are never treated. The variable first.treat indicates the first period of treatment, with 0 marking never-treated counties:

In [1]: from moderndid import etwfe, emfx, load_mpdta
   ...: 
   ...: df = load_mpdta()
   ...: print(df.head())
   ...: 
shape: (5, 6)
┌──────┬────────────┬──────────┬──────────┬─────────────┬───────┐
│ year ┆ countyreal ┆ lpop     ┆ lemp     ┆ first.treat ┆ treat │
│ ---  ┆ ---        ┆ ---      ┆ ---      ┆ ---         ┆ ---   │
│ i64  ┆ i64        ┆ f64      ┆ f64      ┆ i64         ┆ i64   │
╞══════╪════════════╪══════════╪══════════╪═════════════╪═══════╡
│ 2003 ┆ 8001       ┆ 5.896761 ┆ 8.461469 ┆ 2007        ┆ 1     │
│ 2004 ┆ 8001       ┆ 5.896761 ┆ 8.33687  ┆ 2007        ┆ 1     │
│ 2005 ┆ 8001       ┆ 5.896761 ┆ 8.340217 ┆ 2007        ┆ 1     │
│ 2006 ┆ 8001       ┆ 5.896761 ┆ 8.378161 ┆ 2007        ┆ 1     │
│ 2007 ┆ 8001       ┆ 5.896761 ┆ 8.487352 ┆ 2007        ┆ 1     │
└──────┴────────────┴──────────┴──────────┴─────────────┴───────┘

Estimate the saturated ETWFE model and print the cohort-time ATTs:

In [2]: mod = etwfe(
   ...:     data=df,
   ...:     yname="lemp",
   ...:     tname="year",
   ...:     gname="first.treat",
   ...:     idname="countyreal",
   ...: )
   ...: print(mod)
   ...: 
==============================================================================
 Extended Two-Way Fixed Effects (ETWFE)
==============================================================================

┌───────┬──────┬──────────┬────────────┬────────────────────────────┐
│ Group │ Time │ ATT(g,t) │ Std. Error │ [95% Pointwise Conf. Band] │
├───────┼──────┼──────────┼────────────┼────────────────────────────┤
│  2004 │ 2004 │  -0.0194 │     0.0308 │ [-0.0798,  0.0410]         │
│  2004 │ 2005 │  -0.0783 │     0.0276 │ [-0.1323, -0.0243] *       │
│  2004 │ 2006 │  -0.1361 │     0.0304 │ [-0.1957, -0.0765] *       │
│  2006 │ 2006 │   0.0025 │     0.0181 │ [-0.0331,  0.0381]         │
│  2004 │ 2007 │  -0.1047 │     0.0329 │ [-0.1693, -0.0401] *       │
│  2006 │ 2007 │  -0.0392 │     0.0217 │ [-0.0816,  0.0033]         │
│  2007 │ 2007 │  -0.0431 │     0.0179 │ [-0.0782, -0.0080] *       │
└───────┴──────┴──────────┴────────────┴────────────────────────────┘

------------------------------------------------------------------------------
 Signif. codes: '*' confidence band does not cover 0

------------------------------------------------------------------------------
 Data Info
------------------------------------------------------------------------------
 Control Group:  Not Yet Treated
 Observations:  2500
 Units:  500
 Fixed Effects:  countyreal + year

------------------------------------------------------------------------------
 Estimation Details
------------------------------------------------------------------------------
 Estimation Method:  Extended TWFE (OLS)
 R-squared:  0.9933

------------------------------------------------------------------------------
 Inference
------------------------------------------------------------------------------
 Significance level: 0.05
 Std. errors: hetero
==============================================================================
 Reference: Wooldridge (2021, 2023)

Aggregate into an event study with emfx:

In [3]: es = emfx(mod, type="event")
   ...: print(es)
   ...: 
==============================================================================
 Aggregate Treatment Effects (Event Study)
==============================================================================

 Overall summary of ATT's based on event-study/dynamic aggregation:

┌─────────┬────────────┬────────────────────────┐
│     ATT │ Std. Error │ [95% Conf. Interval]   │
├─────────┼────────────┼────────────────────────┤
│ -0.0477 │     0.0123 │ [ -0.0719,  -0.0235] * │
└─────────┴────────────┴────────────────────────┘


 Dynamic Effects:

┌────────────┬──────────┬────────────┬────────────────────────────┐
│ Event time │ Estimate │ Std. Error │ [95% Pointwise Conf. Band] │
├────────────┼──────────┼────────────┼────────────────────────────┤
│          0 │  -0.0311 │     0.0132 │ [-0.0569, -0.0052] *       │
│          1 │  -0.0522 │     0.0171 │ [-0.0857, -0.0187] *       │
│          2 │  -0.1361 │     0.0304 │ [-0.1957, -0.0765] *       │
│          3 │  -0.1047 │     0.0329 │ [-0.1693, -0.0401] *       │
└────────────┴──────────┴────────────┴────────────────────────────┘

------------------------------------------------------------------------------
 Signif. codes: '*' confidence band does not cover 0

------------------------------------------------------------------------------
 Data Info
------------------------------------------------------------------------------
 Control Group:  Not Yet Treated
 Observations:  2500
 Units:  500

------------------------------------------------------------------------------
 Estimation Details
------------------------------------------------------------------------------
 Estimation Method:  Extended TWFE (OLS)

------------------------------------------------------------------------------
 Inference
------------------------------------------------------------------------------
 Significance level: 0.05
 Delta method standard errors
==============================================================================
 Reference: Wooldridge (2021, 2023)
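The point estimates from the aggregation step can be checked by hand. Assuming emfx weights each cell by the number of treated units in its cohort (an assumption about its internals), and taking the mpdta cohort sizes as 20, 40, and 131 counties for the 2004, 2006, and 2007 cohorts, the sketch below reproduces the printed dynamic effects and overall ATT up to rounding. It does not reproduce the delta-method standard errors; the helpers event_study and simple_att are hypothetical.

```python
# ATT(g, t) point estimates copied from the etwfe output above.
att = {
    (2004, 2004): -0.0194, (2004, 2005): -0.0783, (2004, 2006): -0.1361,
    (2004, 2007): -0.1047, (2006, 2006): 0.0025, (2006, 2007): -0.0392,
    (2007, 2007): -0.0431,
}
# Assumed cohort sizes: number of counties first treated in each year.
cohort_size = {2004: 20, 2006: 40, 2007: 131}


def event_study(att, size):
    """Cohort-size-weighted average of ATT(g, t) at each event time e = t - g."""
    num, den = {}, {}
    for (g, t), est in att.items():
        e = t - g
        num[e] = num.get(e, 0.0) + size[g] * est
        den[e] = den.get(e, 0) + size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


def simple_att(att, size):
    """Average over all post-treatment cells, weighting each cell by its cohort size."""
    total = sum(size[g] * est for (g, _), est in att.items())
    return total / sum(size[g] for (g, _) in att)


print({e: round(v, 4) for e, v in event_study(att, cohort_size).items()})
# {0: -0.0311, 1: -0.0522, 2: -0.1361, 3: -0.1047}
print(round(simple_att(att, cohort_size), 4))  # -0.0477
```

Under these weights, the overall ATT in the event-study printout matches the cell-weighted average across all post-treatment cells rather than the unweighted mean of the four event-time estimates.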