moderndid.drdid

moderndid.drdid(data, yname, tname, idname=None, treatname=None, xformla=None, panel=True, est_method='imp', weightsname=None, boot=False, boot_type='weighted', n_boot=999, inf_func=False, trim_level=0.995)

Compute the locally efficient doubly robust DiD estimator for the ATT.

Implements doubly robust difference-in-differences estimation following [2]. These estimators combine inverse probability weighting and outcome regression, providing consistency if either the propensity score model or the outcome regression model is correctly specified, but not necessarily both.

The parameter of interest is the average treatment effect on the treated (ATT) in a two-period setting

\[\tau = \mathbb{E}[Y_1(1) - Y_1(0) \mid D = 1].\]

Identification relies on a conditional parallel trends assumption requiring that treated and comparison groups would have evolved similarly absent treatment, conditional on covariates \(X\)

\[\mathbb{E}[Y_1(0) - Y_0(0) \mid D=1, X] = \mathbb{E}[Y_1(0) - Y_0(0) \mid D=0, X].\]

For panel data, the doubly robust estimand combines outcome regression \(\mu_{0,\Delta}(X) = \mathbb{E}[\Delta Y \mid D=0, X]\) with propensity score weighting \(\pi(X) = \mathbb{P}(D=1 \mid X)\)

\[\tau^{dr} = \mathbb{E}\left[\left(\frac{D}{\mathbb{E}[D]} - \frac{\pi(X)(1-D)/(1-\pi(X))} {\mathbb{E}[\pi(X)(1-D)/(1-\pi(X))]}\right) (\Delta Y - \mu_{0,\Delta}(X))\right].\]
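As a minimal sketch of the estimand above, the snippet below evaluates its sample analogue for a single discrete covariate. To stay dependency-free it replaces the logistic propensity score and linear outcome regression with within-cell sample means (a saturated specification), so it illustrates the weighting structure rather than the library's estimation routine. The function name `dr_did_panel` and the toy data are hypothetical and not part of moderndid.

```python
from collections import defaultdict


def dr_did_panel(d, x, delta_y):
    """Sample analogue of the panel DR DiD estimand with cell-mean nuisances.

    d: treatment-group indicators (0/1); x: discrete covariate values;
    delta_y: outcome changes Y_1 - Y_0. Illustrative only: every covariate
    cell must contain both treated and comparison units (overlap).
    """
    n = len(d)
    n_cell = defaultdict(int)          # units per covariate cell
    n_treat = defaultdict(int)         # treated units per cell
    sum_dy_ctrl = defaultdict(float)   # sum of delta_y among comparison units

    for di, xi, dyi in zip(d, x, delta_y):
        n_cell[xi] += 1
        n_treat[xi] += di
        if di == 0:
            sum_dy_ctrl[xi] += dyi

    # Nuisance estimates: pi(x) = P(D=1 | X=x), mu(x) = E[delta_y | D=0, X=x].
    pi = {c: n_treat[c] / n_cell[c] for c in n_cell}
    mu = {c: sum_dy_ctrl[c] / (n_cell[c] - n_treat[c]) for c in n_cell}

    # Unnormalized weights: D for treated units, pi(1-D)/(1-pi) for comparisons.
    w1 = list(d)
    w0 = [pi[xi] * (1 - di) / (1 - pi[xi]) for di, xi in zip(d, x)]
    mean_w1 = sum(w1) / n
    mean_w0 = sum(w0) / n

    # Weighted average of outcome-regression residuals.
    resid = [dyi - mu[xi] for dyi, xi in zip(delta_y, x)]
    return sum(
        (a / mean_w1 - b / mean_w0) * r for a, b, r in zip(w1, w0, resid)
    ) / n


# Toy data (made up): two covariate cells, treated units gain 3 more than
# comparison units within each cell.
d = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
delta_y = [3, 5, 0, 2, 6, 8, 3, 5, 4, 4]
att_hat = dr_did_panel(d, x, delta_y)
```

In this toy data the within-cell difference in mean outcome changes is 3 in both cells, and `att_hat` recovers that value up to floating-point error.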

Parameters:

data : pandas.DataFrame | polars.DataFrame

The input data containing outcome, time, unit ID, treatment, and optionally covariates and weights. Accepts both pandas and polars DataFrames.

yname : str

Name of the column containing the outcome variable.

tname : str

Name of the column containing the time periods (must have exactly 2 periods).

idname : str | None, default None

Name of the column containing the unit ID. Required if panel=True.

treatname : str

Name of the column containing the treatment group indicator. For panel data: time-invariant indicator (1 if ever treated, 0 if never treated). For repeated cross-sections: treatment status in the post-period.

xformla : str | None, default None

A formula for the covariates to include in the model. Should be of the form "~ X1 + X2" (intercept is always included). If None, equivalent to "~ 1" (intercept only).

panel : bool, default True

Whether the data is panel (True) or repeated cross-sections (False). Panel data should be in long format with each row representing a unit-time observation.

est_method : {"imp", "trad", "imp_local", "trad_local"}, default "imp"

The method to estimate the nuisance parameters.

"imp": Uses weighted least squares to estimate outcome regressions and inverse probability tilting to estimate the propensity score, leading to the improved locally efficient DR DiD estimator. For panel data, this corresponds to equation (3.1) in Sant’Anna and Zhao (2020). For repeated cross-sections, this uses a single propensity score model.
"trad": Uses OLS to estimate outcome regressions and maximum likelihood to estimate the propensity score, leading to the “traditional” locally efficient DR DiD estimator.
"imp_local": For repeated cross-sections only. Implements the locally efficient estimator from equation (3.4) in Sant’Anna and Zhao (2020) with separate outcome regressions for each group and time period.
"trad_local": For repeated cross-sections only. Traditional DR DiD estimator from equation (3.3) in Sant’Anna and Zhao (2020) that is not locally efficient.

weightsname : str | None, default None

Name of the column containing sampling weights. If None, all observations have equal weight. Weights are normalized to have mean 1.

boot : bool, default False

Whether to compute bootstrap standard errors. If False, analytical standard errors are reported.

boot_type : {"weighted", "multiplier"}, default "weighted"

Type of bootstrap to perform (only relevant if boot=True).

n_boot : int, default 999

Number of bootstrap repetitions (only relevant if boot=True).

inf_func : bool, default False

Whether to return the influence function values.

trim_level : float, default 0.995

The level of trimming for the propensity score.
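As noted for weightsname, sampling weights are normalized to have mean 1. A minimal sketch of that rescaling follows; the helper `normalize_weights` and the example values are hypothetical and are not taken from moderndid's implementation.

```python
def normalize_weights(weights):
    """Rescale nonnegative sampling weights so that their mean equals 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n = len(weights)
    return [w * n / total for w in weights]


raw = [2.0, 2.0, 4.0, 8.0]           # made-up sampling weights
normalized = normalize_weights(raw)   # relative sizes kept, mean becomes 1
```

Because only relative weights matter after this step, multiplying every raw weight by the same positive constant leaves the normalized weights unchanged.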

Returns:

DRDIDResult

NamedTuple containing:

att: The DR DiD point estimate.
se: The DR DiD standard error.
uci: The upper bound of a 95% confidence interval.
lci: The lower bound of a 95% confidence interval.
boots: Bootstrap draws of the ATT if boot=True.
att_inf_func: Influence function values if inf_func=True.
call_params: Original function call parameters.
args: Arguments used in the estimation.

See also

ipwdid: Inverse propensity weighted DiD estimator.
ordid: Outcome regression DiD estimator.

Notes

When panel data are available (panel=True), the function implements the locally efficient doubly robust DiD estimator for the ATT defined in equation (3.1) in [2]. This estimator makes use of a logistic propensity score model for the probability of being in the treated group, and of a linear regression model for the outcome evolution among the comparison units.
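The "outcome evolution" in the panel case is the per-unit change in the outcome between the two periods. Assuming long-format input with one row per unit-period, as described for the panel parameter, the sketch below collapses a tiny panel to one change per unit. The rows, column names, and values are made up for illustration and do not show how the library reshapes data internally.

```python
# Hypothetical long-format panel: one row per (unit, period) observation.
rows = [
    {"id": 1, "year": 1975, "re": 100.0, "treated": 1},
    {"id": 1, "year": 1978, "re": 160.0, "treated": 1},
    {"id": 2, "year": 1975, "re": 90.0, "treated": 0},
    {"id": 2, "year": 1978, "re": 110.0, "treated": 0},
]

# The two-period estimator needs exactly two distinct time periods.
periods = sorted({r["year"] for r in rows})
if len(periods) != 2:
    raise ValueError("expected exactly 2 time periods")
pre, post = periods

# Index outcomes by (unit, period), then difference post minus pre per unit.
y = {(r["id"], r["year"]): r["re"] for r in rows}
units = sorted({r["id"] for r in rows})
delta_y = {u: y[(u, post)] - y[(u, pre)] for u in units}
```

Here unit 1 (treated) has a change of 60.0 and unit 2 (comparison) a change of 20.0; the outcome regression is fit to such changes among comparison units only.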

When only stationary repeated cross-section data are available (panel=False), the function implements the locally efficient doubly robust DiD estimator for the ATT defined in equation (3.4) in [2]. This estimator makes use of a logistic propensity score model for the probability of being in the treated group, and of (separate) linear regression models for the outcome of both treated and comparison units, in both pre- and post-treatment periods.

When est_method="imp" (the default), the nuisance parameters are estimated using the methods described in Sections 3.1 and 3.2 of [2]. The propensity score parameters are estimated using the inverse probability tilting estimator proposed by [1], and the outcome regression coefficients are estimated using weighted least squares.

When est_method="trad", the propensity score parameters are estimated using maximum likelihood, and the outcome regression coefficients are estimated using ordinary least squares.

The main advantage of using est_method="imp" is that the resulting estimator is not only locally efficient and doubly robust for the ATT, but it is also doubly robust for inference; see [2] for details.

References

[1]

Graham, B., Pinto, C., and Egel, D. (2012), “Inverse Probability Tilting for Moment Condition Models with Missing Data.” Review of Economic Studies, vol. 79 (3), pp. 1053-1079. https://doi.org/10.1093/restud/rdr047

[2]

Sant’Anna, P. H. C. and Zhao, J. (2020), “Doubly Robust Difference-in-Differences Estimators.” Journal of Econometrics, Vol. 219 (1), pp. 101-122. https://doi.org/10.1016/j.jeconom.2020.06.003

Examples

Estimate the average treatment effect on the treated (ATT) using panel data from a job training program. The data tracks the same individuals over time, before and after some received training.

In [1]: import moderndid
   ...: from moderndid import load_nsw
   ...: 
   ...: nsw_data = load_nsw()
   ...: 
   ...: att_result = moderndid.drdid(
   ...:     data=nsw_data,
   ...:     yname="re",
   ...:     tname="year",
   ...:     idname="id",
   ...:     treatname="experimental",
   ...:     xformla="~ age + educ + black + married + nodegree + hisp + re74",
   ...:     panel=True,
   ...:     est_method="imp",
   ...: )
   ...: 

In [2]: print(att_result)
==============================================================================
 Doubly Robust DiD Estimator (Improved Method)
==============================================================================
 Computed from 32834 observations and 12 covariates.

┌───────────┬────────────┬──────────┬───────────────────────────┐
│       ATT │ Std. Error │ Pr(>|t|) │ [95% Conf. Interval]      │
├───────────┼────────────┼──────────┼───────────────────────────┤
│ -901.2703 │   393.6125 │   0.0220 │ [-1672.7508, -129.7898] * │
└───────────┴────────────┴──────────┴───────────────────────────┘

------------------------------------------------------------------------------
 Data Info
------------------------------------------------------------------------------
 Data structure: Panel data

------------------------------------------------------------------------------
 Estimation Details
------------------------------------------------------------------------------
 Outcome regression: Weighted least squares
 Propensity score: Inverse probability tilting

------------------------------------------------------------------------------
 Inference
------------------------------------------------------------------------------
 Standard errors: Analytical
 Propensity score trimming: 0.995
==============================================================================
 Reference: Sant'Anna and Zhao (2020), Journal of Econometrics
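
The reported interval and p-value can be cross-checked from the point estimate and standard error alone. The sketch below assumes a normal approximation with the conventional 1.96 critical value; this is an assumption about how the printed table was produced, not something the documentation states. It uses only the two numbers copied from the table.

```python
import math

att, se = -901.2703, 393.6125  # values copied from the printed table

# Normal-approximation 95% interval with the conventional 1.96 critical value.
lci = att - 1.96 * se
uci = att + 1.96 * se

# Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
z = att / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"[{lci:.4f}, {uci:.4f}]  p = {p_value:.4f}")
```

Under that assumption the computed bounds and p-value agree with the table to the displayed precision, which is consistent with analytical standard errors combined with a normal critical value.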