moderndid.npiv

moderndid.npiv(y, x, w, x_eval=None, x_grid=None, alpha=0.05, basis='tensor', boot_num=99, j_x_degree=3, j_x_segments=None, k_w_degree=4, k_w_segments=None, k_w_smooth=2, knots='uniform', ucb_h=True, ucb_deriv=True, deriv_index=1, deriv_order=1, check_is_fullrank=False, w_min=None, w_max=None, x_min=None, x_max=None, seed=None)

Estimate a nonparametric instrumental variables model with uniform confidence bands.

Estimates the structural function \(h_0\) and its derivatives in the nonparametric IV model

\[\mathbb{E}[Y - h_0(X) \mid W] = 0 \quad \text{(a.s.)}\]

where \(Y\) is a scalar outcome, \(X\) is a (possibly endogenous) regressor vector, and \(W\) is a vector of instrumental variables. The function is approximated by a B-spline sieve \(h_0(x) \approx (\psi^J(x))' c_J\) and coefficients are estimated by two-stage least squares using \(K\) B-spline basis functions of \(W\) as instruments

\[\hat{c}_J = (\boldsymbol{\Psi}_J' \mathbf{P}_K \boldsymbol{\Psi}_J)^{-} \boldsymbol{\Psi}_J' \mathbf{P}_K \mathbf{Y},\]

where \(\mathbf{P}_K = \mathbf{B}_K (\mathbf{B}_K' \mathbf{B}_K)^{-} \mathbf{B}_K'\) projects onto the instrument space. Function and derivative estimates are then given by

\[\hat{h}_J(x) = (\psi^J(x))' \hat{c}_J, \quad \partial^a \hat{h}_J(x) = (\partial^a \psi^J(x))' \hat{c}_J.\]
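
The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the moderndid implementation: the function name `sieve_tsls` and the simulated design are assumptions made for the example, and polynomial bases from `np.vander` stand in for the B-spline bases \(\psi^J\) and \(b^K\).

```python
import numpy as np


def sieve_tsls(psi, b, y):
    """Sieve 2SLS: c = (Psi' P_K Psi)^- Psi' P_K Y, with P_K = B (B'B)^- B'."""
    # Apply P_K to Psi and Y without forming the n x n projection matrix.
    btb_inv = np.linalg.pinv(b.T @ b)
    psi_proj = b @ (btb_inv @ (b.T @ psi))  # P_K Psi
    y_proj = b @ (btb_inv @ (b.T @ y))      # P_K Y
    # P_K is symmetric and idempotent, so Psi' P_K Psi = (P_K Psi)'(P_K Psi).
    return np.linalg.pinv(psi_proj.T @ psi_proj) @ (psi_proj.T @ y_proj)


# Toy data with an endogenous regressor (illustrative values only).
rng = np.random.default_rng(0)
n = 2000
w = rng.uniform(-1.0, 1.0, n)
v = rng.normal(scale=0.3, size=n)
u = 0.8 * v + rng.normal(scale=0.1, size=n)  # error shares the shock v with x
x = w + v
y = 1.0 + 2.0 * x - 0.5 * x**2 + u

psi = np.vander(x, 3, increasing=True)  # stand-in for psi^J(x): 1, x, x^2
b = np.vander(w, 4, increasing=True)    # stand-in for b^K(w): 1, w, w^2, w^3
c_hat = sieve_tsls(psi, b, y)
h_hat = psi @ c_hat                     # fitted h at the sample points
```

Replacing `psi` with a matrix of basis derivatives evaluated at the same points gives the derivative estimate \(\partial^a \hat{h}_J(x)\) from the same coefficient vector.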

When j_x_segments is None, a bootstrap implementation of Lepski’s method selects the sieve dimension \(\tilde{J}\) that adapts to the unknown smoothness of \(h_0\) and instrument strength, achieving the minimax sup-norm convergence rate for both \(h_0\) and its derivatives.

The adaptive CCK procedure then constructs honest uniform confidence bands that guarantee coverage uniformly over a class of data-generating processes. When a fixed j_x_segments is supplied, the standard undersmoothing approach of [1] is used instead.

Parameters:

y : numpy.ndarray of shape (n,)

Outcome variable.

x : numpy.ndarray of shape (n,) or (n, p_x)

Endogenous regressors. Automatically promoted to 2-d if needed.

w : numpy.ndarray of shape (n,) or (n, p_w)

Instrumental variables. Requires \(K \geq J\), that is, at least as many instrument basis functions as sieve basis functions for \(X\).

x_eval : numpy.ndarray of shape (m, p_x), optional

Points at which to evaluate \(\hat{h}\) and its derivatives. If None, evaluates at the sample points x.

x_grid : numpy.ndarray, optional

Alias for x_eval. Ignored when x_eval is provided.

alpha : float, default=0.05

Significance level for \(100(1-\alpha)\%\) confidence bands.

basis : {"tensor", "additive", "glp"}, default="tensor"

Multivariate basis construction for \(X\):

"tensor": Full tensor product of univariate B-splines.
"additive": Sum of univariate B-splines (additive model).
"glp": Generalized linear product (hierarchical interactions).

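To make the difference between the first two options concrete, here is a rough sketch for two regressors. The helper names are hypothetical and polynomial bases stand in for univariate B-splines; the GLP construction is omitted because its exact form is not described here, and the library may handle redundant intercept columns differently.

```python
import numpy as np


def tensor_basis(b1, b2):
    """Row-wise tensor product: all pairwise products b1[:, j] * b2[:, k]."""
    n = b1.shape[0]
    return np.einsum("ij,ik->ijk", b1, b2).reshape(n, -1)


def additive_basis(b1, b2):
    """Concatenate the univariate bases, so there are no interaction terms."""
    # Note: if both inputs contain a constant column, the result is rank deficient.
    return np.hstack([b1, b2])


rng = np.random.default_rng(0)
x = rng.uniform(size=(100, 2))
b1 = np.vander(x[:, 0], 4, increasing=True)  # stand-in univariate basis for x1
b2 = np.vander(x[:, 1], 4, increasing=True)  # stand-in univariate basis for x2

print(tensor_basis(b1, b2).shape)    # (100, 16)
print(additive_basis(b1, b2).shape)  # (100, 8)
```

The tensor basis grows multiplicatively in the number of regressors, while the additive basis grows additively.
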
boot_num : int, default=99

Number of multiplier bootstrap draws for critical value computation. Each draw generates i.i.d. \(N(0,1)\) weights \((\varpi_i)_{i=1}^n\) to form bootstrap sup-\(t\) statistics.
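
A generic multiplier-bootstrap critical value can be sketched as follows. This is an illustration of the idea only, not the internal computation in moderndid: the function name and the `scores` input (each observation's standardized contribution to the estimator at each evaluation point) are assumptions for this example.

```python
import numpy as np


def sup_t_critical_value(scores, alpha=0.05, boot_num=99, seed=None):
    """(1 - alpha) quantile of bootstrap sup-|t| statistics.

    scores[i, j] is a hypothetical standardized contribution of
    observation i to the estimator at evaluation point j.
    """
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    sup_stats = np.empty(boot_num)
    for draw in range(boot_num):
        weights = rng.standard_normal(n)        # i.i.d. N(0, 1) multipliers
        t_star = weights @ scores / np.sqrt(n)  # bootstrap t-process at each point
        sup_stats[draw] = np.max(np.abs(t_star))  # sup over evaluation points
    return np.quantile(sup_stats, 1.0 - alpha)


# Toy input: 500 observations, 50 roughly independent evaluation points.
rng = np.random.default_rng(0)
scores = rng.standard_normal((500, 50))
cv = sup_t_critical_value(scores, boot_num=999, seed=1)
```

Because the statistic is a supremum over evaluation points, the resulting critical value exceeds the pointwise normal quantile, which is what makes the bands uniform rather than pointwise.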

j_x_degree : int, default=3

Degree of B-spline basis for \(X\) (order \(r = \text{degree} + 1\)). For UCBs of first derivatives, degree \(\geq 2\) is required; for second derivatives, \(\geq 3\).

j_x_segments : int, optional

Number of segments for the \(X\) basis, determining sieve dimension \(J\). When None, the data-driven Lepski procedure selects \(\tilde{J}\) adaptively. Supplying a fixed value triggers the undersmoothing UCB approach.
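
To illustrate how degree and segments together fix the basis dimension, the sketch below builds a standard clamped B-spline basis with uniform knots using the Cox-de Boor recursion. In this textbook construction the dimension is segments + degree; whether moderndid counts \(J\) the same way (for example, with or without an intercept) is not stated here, so treat the helper `bspline_basis` as an assumption-laden example.

```python
import numpy as np


def bspline_basis(x, degree, segments, lo=0.0, hi=1.0):
    """Clamped uniform B-spline basis on [lo, hi] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    interior = np.linspace(lo, hi, segments + 1)[1:-1]
    knots = np.concatenate(
        [np.full(degree + 1, lo), interior, np.full(degree + 1, hi)]
    )
    n_intervals = len(knots) - 1

    # Degree 0: indicator of each half-open knot interval [t_j, t_{j+1}).
    basis = np.zeros((x.size, n_intervals))
    for j in range(n_intervals):
        basis[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Assign the right endpoint to the last non-degenerate interval.
    basis[x == hi, len(knots) - degree - 2] = 1.0

    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        raised = np.zeros((x.size, n_intervals - d))
        for j in range(n_intervals - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                raised[:, j] += (x - knots[j]) / left_den * basis[:, j]
            if right_den > 0:
                raised[:, j] += (knots[j + d + 1] - x) / right_den * basis[:, j + 1]
        basis = raised
    return basis


grid = np.linspace(0.0, 1.0, 11)
psi = bspline_basis(grid, degree=3, segments=4)
print(psi.shape)  # (11, 7): segments + degree columns
```

Each row of `psi` sums to one, the usual partition-of-unity property of B-splines.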

k_w_degree : int, default=4

Degree of B-spline basis for \(W\). The default (4) is one more than the default j_x_degree, because the reduced form \(\mathbb{E}[h_0(X) \mid W]\) is smoother than \(h_0\).

k_w_segments : int, optional

Number of segments for the instrument basis. When None, chosen proportionally to j_x_segments via the resolution-level mapping \(l_w = \lceil (l + q) \, d / d_w \rceil\), where \(q\) is controlled by k_w_smooth.
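
As a worked instance of the mapping, the snippet below evaluates \(l_w = \lceil (l + q) \, d / d_w \rceil\) for a few inputs. It assumes \(d\) and \(d_w\) denote the dimensions of \(X\) and \(W\) and takes \(q\) as a direct input; how k_w_smooth translates into \(q\), and how a resolution level translates into a segment count, is not specified here.

```python
import math


def instrument_resolution(level_x, q, d, d_w):
    """Resolution level for the W basis: ceil((level_x + q) * d / d_w)."""
    return math.ceil((level_x + q) * d / d_w)


print(instrument_resolution(level_x=2, q=2, d=1, d_w=1))  # 4
print(instrument_resolution(level_x=1, q=2, d=1, d_w=2))  # 2, the ceiling of 1.5
print(instrument_resolution(level_x=1, q=2, d=2, d_w=1))  # 6
```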

k_w_smooth : int, default=2

Controls the resolution gap \(q\) between the \(X\) and \(W\) bases in the data-driven procedure. Larger values yield more instrument basis functions relative to the \(X\) basis.

knots : {"uniform", "quantiles"}, default="uniform"

Knot placement strategy:

"uniform": Equally spaced knots on the support.
"quantiles": Knots at empirical quantiles of the data.

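The two strategies can be compared on skewed data with a short sketch. The helper `interior_knots` is hypothetical and only mirrors the descriptions above; the library's own knot routine may differ in details such as endpoint handling.

```python
import numpy as np


def interior_knots(data, segments, strategy="uniform"):
    """Interior knots that split the data range into `segments` pieces."""
    probs = np.linspace(0.0, 1.0, segments + 1)[1:-1]
    if strategy == "uniform":
        lo, hi = data.min(), data.max()
        return lo + probs * (hi - lo)    # equally spaced on [min, max]
    if strategy == "quantiles":
        return np.quantile(data, probs)  # equal share of observations per segment
    raise ValueError(f"unknown strategy: {strategy!r}")


rng = np.random.default_rng(0)
skewed = rng.exponential(size=1000)
uniform_knots = interior_knots(skewed, 4, "uniform")
quantile_knots = interior_knots(skewed, 4, "quantiles")
```

With right-skewed data the uniform knots leave few observations in the upper segments, while the quantile knots cluster where the data are dense.
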
ucb_h : bool, default=True

Compute uniform confidence bands for \(\hat{h}\).

ucb_deriv : bool, default=True

Compute uniform confidence bands for \(\partial^a \hat{h}\).

deriv_index : int, default=1

Which component of \(X\) to differentiate with respect to (1-based indexing).

deriv_order : int, default=1

Order \(|a|\) of the derivative (1 = first, 2 = second, etc.).

check_is_fullrank : bool, default=False

Verify that the basis matrices \(\boldsymbol{\Psi}_J\) and \(\mathbf{B}_K\) have full column rank before estimation.
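
A full-column-rank check of this kind can be sketched with `np.linalg.matrix_rank`. The helper name is an assumption for the example, and the library may use a different numerical test or tolerance.

```python
import numpy as np


def has_full_column_rank(matrix):
    """True when the numerical rank equals the number of columns."""
    return bool(np.linalg.matrix_rank(matrix) == matrix.shape[1])


rng = np.random.default_rng(0)
x = rng.uniform(size=50)
good = np.vander(x, 4, increasing=True)  # 1, x, x^2, x^3
bad = np.hstack([good, good[:, [1]]])    # duplicated column, so rank deficient

print(has_full_column_rank(good))  # True
print(has_full_column_rank(bad))   # False
```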

w_min, w_max : float, optional

Override the support bounds for \(W\). Defaults to data range.

x_min, x_max : float, optional

Override the support bounds for \(X\). Defaults to data range.

seed : int, optional

Random seed for bootstrap reproducibility.

Returns:

NPIVResult

Named tuple with the following fields:

h – Estimated \(\hat{h}_J(x)\) at evaluation points.
deriv – Estimated \(\partial^a \hat{h}_J(x)\).
h_lower, h_upper – Lower/upper UCB for \(h_0\).
h_lower_deriv, h_upper_deriv – Lower/upper UCB for \(\partial^a h_0\).
beta – Sieve coefficient vector \(\hat{c}_J\).
asy_se – Pointwise asymptotic standard errors \(\hat{\sigma}_J(x)\).
deriv_asy_se – Pointwise asymptotic standard errors \(\hat{\sigma}_J^a(x)\) for derivatives.
cv, cv_deriv – Bootstrap critical values \(z_{1-\alpha}^*\) used for band construction.
residuals – TSLS residuals \(\hat{u}_{i,J} = Y_i - \hat{h}_J(X_i)\).
j_x_degree, j_x_segments – Basis parameters for \(X\) (segments may differ from input when data-driven).
k_w_degree, k_w_segments – Basis parameters for \(W\).
args – Diagnostic dictionary. When data-driven selection is used, includes j_x_seg, k_w_seg, j_hat_max, theta_star, and other selection diagnostics.
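
A possible end-to-end call, using only the arguments and result fields documented above, might look like the following. The simulated design and the helper names `simulate_npiv_data` and `run_example` are assumptions made for illustration; the call itself sits inside a function so that the simulation can be run without moderndid installed.

```python
import numpy as np


def simulate_npiv_data(n=1000, seed=0):
    """Toy design: x is endogenous because it shares the shock v with the error."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, n)
    v = rng.normal(scale=0.2, size=n)
    x = w + v
    u = 0.5 * v + rng.normal(scale=0.1, size=n)
    y = np.sin(2.0 * np.pi * x) + u
    return y, x, w


def run_example():
    # Requires the moderndid package; imported here so the simulation
    # above can be used on its own.
    import moderndid

    y, x, w = simulate_npiv_data()
    x_eval = np.linspace(0.0, 1.0, 50).reshape(-1, 1)  # shape (m, p_x)
    # j_x_segments is left as None, so the data-driven selection is used.
    result = moderndid.npiv(
        y, x, w, x_eval=x_eval, alpha=0.05, boot_num=99, seed=42
    )
    return result.h, result.h_lower, result.h_upper, result.j_x_segments
```

Calling `run_example()` returns the estimate, the lower and upper uniform confidence bands, and the selected number of segments for \(X\).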