moderndid.drdid
- moderndid.drdid(data, yname, tname, idname=None, treatname=None, xformla=None, panel=True, est_method='imp', weightsname=None, boot=False, boot_type='weighted', n_boot=999, inf_func=False, trim_level=0.995)
Compute the locally efficient doubly robust DiD estimator for the ATT.
Implements doubly robust difference-in-differences estimation following [2]. These estimators combine inverse probability weighting and outcome regression, providing consistency if either the propensity score model or the outcome regression model is correctly specified, but not necessarily both.
The parameter of interest is the average treatment effect on the treated (ATT) in a two-period setting
\[\tau = \mathbb{E}[Y_1(1) - Y_1(0) \mid D = 1].\]
Identification relies on a conditional parallel trends assumption: absent treatment, the treated and comparison groups would have evolved similarly, conditional on covariates \(X\)
\[\mathbb{E}[Y_1(0) - Y_0(0) \mid D=1, X] = \mathbb{E}[Y_1(0) - Y_0(0) \mid D=0, X].\]
For panel data, the doubly robust estimand combines the outcome regression \(\mu_{0,\Delta}(X) = \mathbb{E}[\Delta Y \mid D=0, X]\), where \(\Delta Y = Y_1 - Y_0\), with weighting by the propensity score \(\pi(X) = \mathbb{P}(D=1 \mid X)\)
\[\tau^{dr} = \mathbb{E}\left[\left(\frac{D}{\mathbb{E}[D]} - \frac{\pi(X)(1-D)/(1-\pi(X))} {\mathbb{E}[\pi(X)(1-D)/(1-\pi(X))]}\right) (\Delta Y - \mu_{0,\Delta}(X))\right].\]
- Parameters:
- data
pandas.DataFrame|polars.DataFrame The input data containing outcome, time, unit ID, treatment, and optionally covariates and weights. Accepts both pandas and polars DataFrames.
- yname
str Name of the column containing the outcome variable.
- tname
str Name of the column containing the time periods (must have exactly 2 periods).
- idname
str|None, default None Name of the column containing the unit ID. Required if panel=True.
- treatname
str Name of the column containing the treatment group indicator. For panel data: time-invariant indicator (1 if ever treated, 0 if never treated). For repeated cross-sections: treatment status in the post-period.
- xformla
str|None, default None A formula for the covariates to include in the model. Should be of the form “~ X1 + X2” (an intercept is always included). If None, this is equivalent to “~ 1” (intercept only).
- panel
bool, default True Whether the data is panel (True) or repeated cross-sections (False). Panel data should be in long format with each row representing a unit-time observation.
- est_method
{“imp”, “trad”, “imp_local”, “trad_local”}, default “imp” The method used to estimate the nuisance parameters.
“imp”: Uses weighted least squares to estimate outcome regressions and inverse probability tilting to estimate the propensity score, leading to the improved locally efficient DR DiD estimator. For panel data, this corresponds to equation (3.1) in Sant’Anna and Zhao (2020). For repeated cross-sections, this uses a single propensity score model.
“trad”: Uses OLS to estimate outcome regressions and maximum likelihood to estimate propensity score, leading to the “traditional” locally efficient DR DiD estimator.
“imp_local”: For repeated cross-sections only. Implements the locally efficient estimator from equation (3.4) in Sant’Anna and Zhao (2020) with separate outcome regressions for each group and time period.
“trad_local”: For repeated cross-sections only. Traditional DR DiD estimator from equation (3.3) in Sant’Anna and Zhao (2020) that is not locally efficient.
- weightsname
str|None, default None Name of the column containing sampling weights. If None, all observations have equal weight. Weights are normalized to have mean 1.
- boot
bool, default False Whether to compute bootstrap standard errors. If False, analytical standard errors are reported.
- boot_type
{“weighted”, “multiplier”}, default “weighted” Type of bootstrap to perform (only relevant if boot=True).
- n_boot
int, default 999 Number of bootstrap repetitions (only relevant if boot=True).
- inf_func
bool, default False Whether to return the influence function values.
- trim_level
float, default 0.995 The level of trimming for the propensity score.
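The weightsname entry above states that sampling weights are normalized to have mean 1. A minimal sketch of that rescaling in plain Python follows; the helper name normalize_weights is hypothetical and this is not the library's own code, only an illustration of what "mean 1" implies for a raw weight column.

```python
def normalize_weights(weights):
    """Rescale sampling weights so that their mean equals 1.

    Hypothetical helper for illustration; not part of moderndid.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    mean_w = sum(weights) / len(weights)
    if mean_w <= 0:
        raise ValueError("weights must have a positive mean")
    # Dividing by the mean preserves relative weights and fixes the mean at 1.
    return [w / mean_w for w in weights]


raw = [2.0, 1.0, 1.0, 4.0]           # toy weights with mean 2.0
normalized = normalize_weights(raw)
print(normalized)                     # [1.0, 0.5, 0.5, 2.0]
```

Because only relative weights matter after this step, multiplying the raw weight column by a positive constant should leave the normalized weights unchanged.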
- Returns:
DRDIDResult NamedTuple containing:
att: The DR DiD point estimate.
se: The DR DiD standard error.
uci: The upper bound of a 95% confidence interval.
lci: The lower bound of a 95% confidence interval.
boots: Bootstrap draws of the ATT if boot=True.
att_inf_func: Influence function values if inf_func=True.
call_params: Original function call parameters.
args: Arguments used in the estimation.
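The relation between att, se, lci and uci can be checked by hand. The sketch below assumes the interval is the usual normal-approximation interval att ± 1.96 · se and that the p-value is a two-sided normal-approximation p-value; neither convention is stated in this page, but both are consistent with the figures printed in the Examples section, which are copied in as inputs.

```python
import math

# Point estimate and standard error copied from the printed example output.
att = -901.2703
se = 393.6125

# Assumption: 95% interval built with the rounded normal critical value 1.96.
z_crit = 1.96
lci = att - z_crit * se
uci = att + z_crit * se

# Assumption: two-sided p-value from the standard normal distribution,
# computed as 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
t_stat = att / se
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))

print(f"[{lci:.4f}, {uci:.4f}]")
print(f"{p_value:.4f}")
```

Under these assumptions the computed bounds agree with the reported interval [-1672.7508, -129.7898] to four decimals.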
Notes
When panel data are available (panel=True), the function implements the locally efficient doubly robust DiD estimator for the ATT defined in equation (3.1) in [2]. This estimator uses a logistic propensity score model for the probability of being in the treated group and a linear regression model for the outcome evolution among the comparison units.
When only stationary repeated cross-section data are available (panel=False), the function implements the locally efficient doubly robust DiD estimator for the ATT defined in equation (3.4) in [2]. This estimator uses a logistic propensity score model for the probability of being in the treated group and separate linear regression models for the outcome of both treated and comparison units, in both the pre- and post-treatment periods.
When est_method="imp" (the default), the nuisance parameters are estimated using the methods described in Sections 3.1 and 3.2 of [2]. The propensity score parameters are estimated using the inverse probability tilting estimator proposed by [1], and the outcome regression coefficients are estimated using weighted least squares.
When est_method="trad", the propensity score parameters are estimated using maximum likelihood, and the outcome regression coefficients are estimated using ordinary least squares.
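To make the panel estimand \(\tau^{dr}\) concrete, the sketch below evaluates its sample analogue on toy data with a single binary covariate. It relies on a simplifying assumption: with one binary covariate and saturated models, the maximum likelihood logistic fit and the ordinary least squares fit both reduce to cell means, so no optimizer is needed. The data, the function name dr_did_panel_saturated, and the record layout are hypothetical; this is not moderndid's implementation and it ignores sampling weights and trimming.

```python
# Each record is (x, d, delta_y): a binary covariate X, the treatment-group
# indicator D, and the outcome change Delta Y = Y_1 - Y_0. Toy numbers only.
records = [
    (0, 1, 3.0), (0, 1, 5.0),
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 2.0), (0, 0, 0.0),
    (1, 1, 6.0), (1, 1, 8.0), (1, 1, 7.0),
    (1, 0, 4.0), (1, 0, 2.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def dr_did_panel_saturated(records):
    """Sample analogue of tau^dr using cell means as nuisance estimates.

    Requires every covariate cell to contain both treated and comparison
    units, so that 0 < pi(x) < 1 and mu_0(x) is defined.
    """
    cells = sorted({x for x, _, _ in records})
    # pi(x) = P(D = 1 | X = x): share of treated units in each cell.
    pscore = {x: mean(d for xi, d, _ in records if xi == x) for x in cells}
    # mu_{0,Delta}(x) = E[Delta Y | D = 0, X = x]: comparison-group cell mean.
    mu0 = {
        x: mean(dy for xi, d, dy in records if xi == x and d == 0)
        for x in cells
    }

    w_treat = [d for _, d, _ in records]
    w_comp = [pscore[x] * (1 - d) / (1 - pscore[x]) for x, d, _ in records]
    resid = [dy - mu0[x] for x, _, dy in records]

    mean_w_treat = mean(w_treat)   # E[D]
    mean_w_comp = mean(w_comp)     # E[pi(X)(1 - D) / (1 - pi(X))]
    return mean(
        (wt / mean_w_treat - wc / mean_w_comp) * r
        for wt, wc, r in zip(w_treat, w_comp, resid)
    )


att_hat = dr_did_panel_saturated(records)
print(round(att_hat, 6))  # 3.6
```

In this toy setup the comparison-group residuals average to zero within each cell, so the estimate equals the treated units' mean of \(\Delta Y - \mu_{0,\Delta}(X)\), which is 18 / 5 = 3.6.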
The main advantage of using est_method="imp" is that the resulting estimator is not only locally efficient and doubly robust for the ATT but also doubly robust for inference; see [2] for details.
References
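[1] Graham, B., Pinto, C., and Egel, D. (2012), “Inverse Probability Tilting for Moment Condition Models with Missing Data.” Review of Economic Studies, vol. 79 (3), pp. 1053-1079. https://doi.org/10.1093/restud/rdr047
[2] Sant’Anna, P. H. C., and Zhao, J. (2020), “Doubly Robust Difference-in-Differences Estimators.” Journal of Econometrics, vol. 219 (1), pp. 101-122. https://doi.org/10.1016/j.jeconom.2020.06.003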
Examples
Estimate the average treatment effect on the treated (ATT) using panel data from a job training program. The data tracks the same individuals over time, before and after some received training.
In [1]: import moderndid
   ...: from moderndid import load_nsw
   ...:
   ...: nsw_data = load_nsw()
   ...:
   ...: att_result = moderndid.drdid(
   ...:     data=nsw_data,
   ...:     yname="re",
   ...:     tname="year",
   ...:     idname="id",
   ...:     treatname="experimental",
   ...:     xformla="~ age + educ + black + married + nodegree + hisp + re74",
   ...:     panel=True,
   ...:     est_method="imp",
   ...: )
   ...:

In [2]: print(att_result)
==============================================================================
 Doubly Robust DiD Estimator (Improved Method)
==============================================================================
 Computed from 32834 observations and 12 covariates.

┌───────────┬────────────┬──────────┬───────────────────────────┐
│ ATT       │ Std. Error │ Pr(>|t|) │ [95% Conf. Interval]      │
├───────────┼────────────┼──────────┼───────────────────────────┤
│ -901.2703 │ 393.6125   │ 0.0220   │ [-1672.7508, -129.7898] * │
└───────────┴────────────┴──────────┴───────────────────────────┘

------------------------------------------------------------------------------
 Data Info
------------------------------------------------------------------------------
 Data structure: Panel data

------------------------------------------------------------------------------
 Estimation Details
------------------------------------------------------------------------------
 Outcome regression: Weighted least squares
 Propensity score: Inverse probability tilting

------------------------------------------------------------------------------
 Inference
------------------------------------------------------------------------------
 Standard errors: Analytical
 Propensity score trimming: 0.995
==============================================================================
 Reference: Sant'Anna and Zhao (2020), Journal of Econometrics