moderndid.etwfe
- moderndid.etwfe(data, yname: str, tname: str, gname: str, idname: str | None = None, xformla: str | None = None, xvar: str | None = None, tref=None, gref=None, cgroup: str = 'notyet', fe: str = 'vs', family=None, weightsname: str | None = None, vcov: str | dict | None = None, alp: float = 0.05, backend=None) → EtwfeResult
Estimate the Extended Two-Way Fixed Effects model.
Implements the ETWFE methodology for difference-in-differences with staggered treatment adoption and heterogeneous treatment effects [1] [2]. Rather than discarding the TWFE estimator, the approach saturates the model with cohort-by-time interaction terms so that the coefficients on the treatment indicators directly recover the cohort-time-specific average treatment effects on the treated,
\[\tau_{g,t} \equiv E[y_t(g) - y_t(\infty) \mid d_g = 1], \quad t \ge g.\]
Under no anticipation (NA), conditional parallel trends (CPT), and linearity (LIN), the conditional expectation of the never-treated potential outcome is
\[E[y_t(\infty) \mid \mathbf{d}, \mathbf{x}] = \alpha + \sum_g \beta_g d_g + \mathbf{x}\boldsymbol{\kappa} + \sum_g (d_g \cdot \mathbf{x})\boldsymbol{\xi}_g + \sum_s \gamma_s f_{s,t} + \sum_s (f_{s,t} \cdot \mathbf{x})\boldsymbol{\pi}_s,\]
where \(d_g\) are treatment cohort indicators, \(f_{s,t}\) are time dummies, and \(\mathbf{x}\) are time-constant covariates. The ATTs are identified as
\[\tau_{g,t} = E(y_t \mid d_g = 1) - \bigl[(\alpha + \beta_g + \gamma_t) + E(\mathbf{x} \mid d_g = 1) \cdot (\boldsymbol{\kappa} + \boldsymbol{\xi}_g + \boldsymbol{\pi}_t)\bigr].\]
The ETWFE regression includes the full set of \(w_t \cdot d_g \cdot f_{s,t}\) treatment interactions, where \(w_t\) is the time-varying treatment indicator, with covariates demeaned about their cohort means \(\dot{\mathbf{x}}_g = \mathbf{x} - \bar{\mathbf{x}}_g\). The pooled OLS estimates of \(\tau_{g,t}\) from this saturated regression are numerically identical to a cohort imputation procedure (Proposition 5.2 in [2]). Use emfx to aggregate the cell-level estimates into overall, group, calendar, or event-study summaries.
- Parameters:
- data
DataFrame Panel data in long format. Accepts any object implementing the Arrow PyCapsule Interface (__arrow_c_stream__), including polars, pandas, pyarrow Table, and cudf DataFrames.
- yname
str The name of the outcome variable.
- tname
str The name of the column containing the time periods.
- gname
str The name of the variable that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It should be 0 for units in the untreated group. It defines which “cohort” a unit belongs to.
- idname
str or None, default=None The individual (cross-sectional unit) id name. When provided, unit fixed effects are absorbed in the regression.
- xformla
str or None, default=None A formula for the covariates to include in the model. It should be of the form "~ x1 + x2". Controls are demeaned within cohort groups following the Mundlak device so that the parallel trends assumption need only hold conditional on covariates.
- xvar
str or None, default=None Name of a covariate to interact with treatment for heterogeneous treatment effect analysis. The variable is demeaned within cohorts and interacted with the treatment and time indicators.
- tref
numeric or None, default=None Reference time period. Defaults to the minimum time period in the data.
- gref
numeric or None, default=None Reference cohort (control group). Auto-detected based on cgroup. For "never", selects the group beyond the last observed period. For "notyet", defaults to the latest-treated cohort.
- cgroup {"notyet", "never"}, default="notyet"
Control group strategy:
"notyet": use not-yet-treated units as controls (drops observations once the reference cohort enters treatment)
"never": use never-treated units as controls
- fe {"vs", "feo", "none"}, default="vs"
Fixed effects specification:
"vs": varying slopes (controls interact with cohort and time FE)
"feo": fixed effects only
"none": no absorbed fixed effects
- family {None, "gaussian", "poisson", "logit", "probit"}, default=None
GLM family for nonlinear models. None and "gaussian" use OLS via feols. "poisson" uses Poisson QMLE via fepois. "logit" and "probit" use feglm. For non-Gaussian families, fe is set to "none" and idname is ignored (unit FE absorption is not supported for GLM).
- weightsname
str or None, default=None The name of the column containing sampling weights. If not set, all observations have equal weight.
- vcov
str or dict or None, default=None Variance-covariance specification passed to pyfixest. Defaults to "hetero" (heteroskedasticity-robust). Examples: "iid", "hetero", "HC1", {"CRV1": "cluster_var"}.
- alp
float, default=0.05 The significance level.
- backend {"cupy", "jax", "numba", "rust", "scipy"} or None, default=None
Demeaner backend for pyfixest’s fixed-effects absorption. "cupy" and "jax" enable GPU acceleration (require CuPy or JAX with GPU support; without a GPU, pyfixest falls back to CPU). "numba" (the default), "rust", and "scipy" are CPU-only. None uses pyfixest’s default.
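The gname coding described above (0 for never-treated units, otherwise the first treated period) can be derived from a unit-by-period treatment indicator. The following is a minimal sketch in plain Python; the helper name first_treated_period and the toy records are hypothetical and are not part of moderndid.

```python
def first_treated_period(rows):
    """Map each unit to its cohort: the first period with treated == 1, or 0 if never treated.

    `rows` is an iterable of (unit_id, period, treated) tuples in long format.
    Periods are assumed to be positive, so 0 can safely mean "never treated".
    """
    cohort = {}
    for unit, period, treated in rows:
        current = cohort.setdefault(unit, 0)  # 0 marks "not treated in any row seen so far"
        if treated and (current == 0 or period < current):
            cohort[unit] = period
    return cohort


# Hypothetical toy panel: unit "a" adopts in 2004, "b" in 2005, "c" never.
rows = [
    ("a", 2003, 0), ("a", 2004, 1), ("a", 2005, 1),
    ("b", 2003, 0), ("b", 2004, 0), ("b", 2005, 1),
    ("c", 2003, 0), ("c", 2004, 0), ("c", 2005, 0),
]
cohorts = first_treated_period(rows)
print(cohorts)
```

The resulting mapping can then be joined back onto the long-format data as the column passed to gname.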
- Returns:
EtwfeResult Object containing ETWFE regression results:
coefficients: coefficient estimates for each interaction term
std_errors: standard errors for each coefficient
vcov: variance-covariance matrix
coef_names: coefficient names from pyfixest
gt_pairs: list of (group, time) pairs for each coefficient
n_obs: number of observations
n_units: number of unique cross-sectional units
r_squared: R-squared of the regression
data: fitted data (used internally by emfx)
config: configuration object (used internally by emfx)
estimation_params: dictionary with estimation details
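As a rough illustration of how these fields fit together, the sketch below zips gt_pairs, coefficients and std_errors into one row per cohort-time cell and adds a normal-approximation interval. Several things here are assumptions: that the three fields line up by position, that the bands use a normal critical value, and the helper name cell_rows, which is hypothetical. A SimpleNamespace stands in for a real EtwfeResult so the snippet runs without fitting a model; its numbers are copied from the first two rows of the table in the Examples section.

```python
from statistics import NormalDist
from types import SimpleNamespace


def cell_rows(result, alp=0.05):
    """Return one dict per (group, time) cell with a normal-approximation interval."""
    z = NormalDist().inv_cdf(1 - alp / 2)  # two-sided critical value, about 1.96 for alp=0.05
    rows = []
    for (g, t), est, se in zip(result.gt_pairs, result.coefficients, result.std_errors):
        rows.append(
            {"group": g, "time": t, "att": est, "lower": est - z * se, "upper": est + z * se}
        )
    return rows


# Stand-in object (NOT a real EtwfeResult) carrying only the three fields used above.
fake_result = SimpleNamespace(
    gt_pairs=[(2004, 2004), (2004, 2005)],
    coefficients=[-0.0194, -0.0783],
    std_errors=[0.0308, 0.0276],
)
rows = cell_rows(fake_result)
for row in rows:
    print(row)
```

For the first cell this gives an interval close to the printed [-0.0798, 0.0410]; small differences in later rows come from the inputs being rounded to four decimals.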
See also
References
[1] Wooldridge, J. M. (2021). “Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators.”
Examples
The dataset below contains 500 counties observed from 2003 to 2007. Some counties are first treated in 2004, some in 2006, and some in 2007. The variable first.treat indicates the first period of treatment:
In [1]: from moderndid import etwfe, emfx, load_mpdta
   ...:
   ...: df = load_mpdta()
   ...: print(df.head())
shape: (5, 6)
┌──────┬────────────┬──────────┬──────────┬─────────────┬───────┐
│ year ┆ countyreal ┆ lpop     ┆ lemp     ┆ first.treat ┆ treat │
│ ---  ┆ ---        ┆ ---      ┆ ---      ┆ ---         ┆ ---   │
│ i64  ┆ i64        ┆ f64      ┆ f64      ┆ i64         ┆ i64   │
╞══════╪════════════╪══════════╪══════════╪═════════════╪═══════╡
│ 2003 ┆ 8001       ┆ 5.896761 ┆ 8.461469 ┆ 2007        ┆ 1     │
│ 2004 ┆ 8001       ┆ 5.896761 ┆ 8.33687  ┆ 2007        ┆ 1     │
│ 2005 ┆ 8001       ┆ 5.896761 ┆ 8.340217 ┆ 2007        ┆ 1     │
│ 2006 ┆ 8001       ┆ 5.896761 ┆ 8.378161 ┆ 2007        ┆ 1     │
│ 2007 ┆ 8001       ┆ 5.896761 ┆ 8.487352 ┆ 2007        ┆ 1     │
└──────┴────────────┴──────────┴──────────┴─────────────┴───────┘
Estimate the saturated ETWFE model and print the cohort-time ATTs:
In [2]: mod = etwfe(
   ...:     data=df,
   ...:     yname="lemp",
   ...:     tname="year",
   ...:     gname="first.treat",
   ...:     idname="countyreal",
   ...: )
   ...: print(mod)
==============================================================================
 Extended Two-Way Fixed Effects (ETWFE)
==============================================================================
┌───────┬──────┬──────────┬────────────┬────────────────────────────┐
│ Group │ Time │ ATT(g,t) │ Std. Error │ [95% Pointwise Conf. Band] │
├───────┼──────┼──────────┼────────────┼────────────────────────────┤
│ 2004  │ 2004 │ -0.0194  │ 0.0308     │ [-0.0798, 0.0410]          │
│ 2004  │ 2005 │ -0.0783  │ 0.0276     │ [-0.1323, -0.0243] *       │
│ 2004  │ 2006 │ -0.1361  │ 0.0304     │ [-0.1957, -0.0765] *       │
│ 2006  │ 2006 │ 0.0025   │ 0.0181     │ [-0.0331, 0.0381]          │
│ 2004  │ 2007 │ -0.1047  │ 0.0329     │ [-0.1693, -0.0401] *       │
│ 2006  │ 2007 │ -0.0392  │ 0.0217     │ [-0.0816, 0.0033]          │
│ 2007  │ 2007 │ -0.0431  │ 0.0179     │ [-0.0782, -0.0080] *       │
└───────┴──────┴──────────┴────────────┴────────────────────────────┘
------------------------------------------------------------------------------
 Signif. codes: '*' confidence band does not cover 0
------------------------------------------------------------------------------
 Data Info
------------------------------------------------------------------------------
 Control Group: Not Yet Treated
 Observations: 2500
 Units: 500
 Fixed Effects: countyreal + year
------------------------------------------------------------------------------
 Estimation Details
------------------------------------------------------------------------------
 Estimation Method: Extended TWFE (OLS)
 R-squared: 0.9933
------------------------------------------------------------------------------
 Inference
------------------------------------------------------------------------------
 Significance level: 0.05
 Std. errors: hetero
==============================================================================
 Reference: Wooldridge (2021, 2023)
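The optional arguments documented under Parameters can be combined in the same call. The sketch below wraps one such variant in a hypothetical helper, fit_with_controls, using only keyword names and value forms listed above. It has not been run against the package, and it assumes the data contain never-treated counties (first.treat equal to 0) so that cgroup="never" has a control group.

```python
def fit_with_controls(df):
    """Refit the example model with a covariate, never-treated controls and clustered errors."""
    # Imported inside the function so the sketch can be loaded without moderndid installed.
    from moderndid import etwfe

    return etwfe(
        data=df,
        yname="lemp",
        tname="year",
        gname="first.treat",
        idname="countyreal",
        xformla="~ lpop",             # time-constant control, demeaned within cohort
        cgroup="never",               # assumes never-treated counties are present
        vcov={"CRV1": "countyreal"},  # cluster-robust errors by county, per the vcov examples
    )
```

Calling fit_with_controls(df) on the frame loaded in the first example would be expected to return another EtwfeResult whose estimates differ from the table above, since both the control group and the conditioning set change.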
Aggregate into an event study with emfx:
In [3]: es = emfx(mod, type="event")
   ...: print(es)
==============================================================================
 Aggregate Treatment Effects (Event Study)
==============================================================================
 Overall summary of ATT's based on event-study/dynamic aggregation:
┌─────────┬────────────┬────────────────────────┐
│ ATT     │ Std. Error │ [95% Conf. Interval]   │
├─────────┼────────────┼────────────────────────┤
│ -0.0477 │ 0.0123     │ [ -0.0719, -0.0235] *  │
└─────────┴────────────┴────────────────────────┘
 Dynamic Effects:
┌────────────┬──────────┬────────────┬────────────────────────────┐
│ Event time │ Estimate │ Std. Error │ [95% Pointwise Conf. Band] │
├────────────┼──────────┼────────────┼────────────────────────────┤
│ 0          │ -0.0311  │ 0.0132     │ [-0.0569, -0.0052] *       │
│ 1          │ -0.0522  │ 0.0171     │ [-0.0857, -0.0187] *       │
│ 2          │ -0.1361  │ 0.0304     │ [-0.1957, -0.0765] *       │
│ 3          │ -0.1047  │ 0.0329     │ [-0.1693, -0.0401] *       │
└────────────┴──────────┴────────────┴────────────────────────────┘
------------------------------------------------------------------------------
 Signif. codes: '*' confidence band does not cover 0
------------------------------------------------------------------------------
 Data Info
------------------------------------------------------------------------------
 Control Group: Not Yet Treated
 Observations: 2500
 Units: 500
------------------------------------------------------------------------------
 Estimation Details
------------------------------------------------------------------------------
 Estimation Method: Extended TWFE (OLS)
------------------------------------------------------------------------------
 Inference
------------------------------------------------------------------------------
 Significance level: 0.05
 Delta method standard errors
==============================================================================
 Reference: Wooldridge (2021, 2023)
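To make the event-study aggregation concrete, the sketch below recomputes the dynamic point estimates by hand from the printed ATT(g,t) cells, averaging cells with the same event time t - g using cohort sizes as weights. This is an illustration, not the emfx implementation: the cohort sizes (20, 40 and 131 counties) are an assumption that does not appear in the output above, the weighting scheme is inferred because it matches the printed estimates, and the helper name event_study is hypothetical. Standard errors are not reproduced, since the delta method needs the full variance-covariance matrix.

```python
from collections import defaultdict

# Cell-level ATT(g, t) estimates copied from the printed ETWFE table.
att = {
    (2004, 2004): -0.0194, (2004, 2005): -0.0783, (2004, 2006): -0.1361, (2004, 2007): -0.1047,
    (2006, 2006): 0.0025, (2006, 2007): -0.0392,
    (2007, 2007): -0.0431,
}
# ASSUMPTION: number of counties per cohort; not shown in the printed output.
cohort_size = {2004: 20, 2006: 40, 2007: 131}


def event_study(att, cohort_size):
    """Cohort-size-weighted average of ATT(g, t) within each event time e = t - g."""
    num = defaultdict(float)
    den = defaultdict(float)
    for (g, t), tau in att.items():
        e = t - g
        num[e] += cohort_size[g] * tau
        den[e] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


dynamic = event_study(att, cohort_size)
for e, est in dynamic.items():
    print(e, round(est, 4))
```

Under these assumptions the four values agree with the Estimate column of the Dynamic Effects table to four decimals. Event times 2 and 3 each contain a single cell, so they simply repeat the 2004 cohort's estimates.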