moderndid.npiv_choose_j

moderndid.npiv_choose_j(y, x, w, x_grid=None, j_x_degree=3, k_w_degree=4, k_w_smooth=2, knots='uniform', basis='tensor', x_min=None, x_max=None, w_min=None, w_max=None, grid_num=50, boot_num=99, check_is_fullrank=False, seed=None)

Select data-driven B-spline sieve dimensions for nonparametric instrumental variables (NPIV) estimation.

Implements the full data-driven selection procedure of [1], which combines a maximum-dimension selection step with a Lepski-style test. The procedure selects a sieve dimension \(\tilde{J}\) that is sup-norm rate-adaptive: it adapts to unknown features of the data-generating process (e.g., the smoothness of \(h_0\) and the strength of the instruments), so that \(\hat{h}_{\tilde{J}}\) attains the minimax-optimal convergence rate for the sup-norm error

\[\sup_x |\hat{h}_{\tilde{J}}(x) - h_0(x)|.\]
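In practice the supremum over \(x\) can only be approximated on a finite set of evaluation points. The sketch below is an illustration, not part of the moderndid API: it approximates the sup-norm error by its maximum over a grid, and the functions `h0` and `h_hat` are hypothetical stand-ins for the true and estimated structural functions.

```python
import numpy as np


def sup_norm_error(h_hat, h0, x_grid):
    """Approximate sup_x |h_hat(x) - h0(x)| by the maximum over a finite grid."""
    return float(np.max(np.abs(h_hat(x_grid) - h0(x_grid))))


def h0(x):
    # Hypothetical true structural function.
    return np.sin(2 * np.pi * x)


def h_hat(x):
    # Hypothetical estimate that is off by a linear error term.
    return np.sin(2 * np.pi * x) + 0.1 * x


x_grid = np.linspace(0.0, 1.0, 50)  # 50 points, mirroring the grid_num default
err = sup_norm_error(h_hat, h0, x_grid)  # approximately 0.1, attained at x = 1
```

A finer grid can only raise the grid maximum toward the true supremum, so the grid value is a lower bound on the sup-norm error.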

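The procedure has two main steps. First, it determines a maximum feasible dimension \(\hat{J}_{\max}\) from the sample size \(n\) and an estimate \(\hat{s}_J^{-1}\) of the sieve measure of ill-posedness:

\[\hat{J}_{\max} = \min \left\{ J \in \mathcal{T} : J \sqrt{\log J}\, \hat{s}_J^{-1} \leq c \sqrt{n} < J^{+} \sqrt{\log J^{+}}\, \hat{s}_{J^{+}}^{-1} \right\},\]

where \(\mathcal{T}\) is the set of candidate dimensions, \(J^{+}\) is the element of \(\mathcal{T}\) that follows \(J\), and \(c\) is a constant. The search grid \(\hat{\mathcal{J}}\) used in the second step is then restricted to candidate dimensions in \(\mathcal{T}\) that do not exceed \(\hat{J}_{\max}\).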
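The maximum-dimension step can be sketched as follows, reading \(\hat{J}_{\max}\) as the point where the bound stops holding: the first candidate \(J\) that satisfies \(J \sqrt{\log J}\, \hat{s}_J^{-1} \leq c \sqrt{n}\) while its successor in \(\mathcal{T}\) does not. The helper `select_j_max` and the ill-posedness values below are hypothetical; the library exposes this step separately as `npiv_jhat_max`, and this sketch does not reproduce its implementation.

```python
import math


def select_j_max(candidates, s_inv, n, c):
    """Hypothetical helper: first J in T that satisfies the bound while its successor does not.

    candidates : increasing candidate dimensions (the set T)
    s_inv      : estimated ill-posedness s_J^{-1}, one value per candidate
    n          : sample size
    c          : constant on the right-hand side of the bound
    """
    bound = c * math.sqrt(n)
    ok = [j * math.sqrt(math.log(j)) * s <= bound for j, s in zip(candidates, s_inv)]
    for i, j in enumerate(candidates):
        # The last candidate has no successor, so it stops the search if it satisfies the bound.
        next_ok = ok[i + 1] if i + 1 < len(candidates) else False
        if ok[i] and not next_ok:
            return j
    return None  # no candidate satisfies the bound


# Illustrative inputs: ill-posedness growing with J.
candidates = [4, 8, 16, 32, 64]
s_inv = [1.0, 2.0, 4.0, 8.0, 16.0]
j_max = select_j_max(candidates, s_inv, n=100, c=10.0)
search_grid = [j for j in candidates if j <= j_max]
```

With these inputs the bound \(c\sqrt{n} = 100\) holds at \(J = 8\) but fails at \(J = 16\), so the search grid keeps the two smallest candidates. A larger sample pushes the cutoff outward.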

Second, a Lepski-style method with a multiplier bootstrap selects \(\hat{J}\) from the grid \(\hat{\mathcal{J}}\). It compares estimates across dimensions and picks the smallest dimension whose estimate is not statistically different from the estimates at all larger dimensions in the grid:

\[\hat{J} = \min \left\{ J \in \hat{\mathcal{J}} : \sup_{x,\; J_2 \in \hat{\mathcal{J}},\, J_2 > J} \left| \frac{\hat{h}_J(x) - \hat{h}_{J_2}(x)}{\hat{\sigma}_{J, J_2}(x)} \right| \leq \theta_{1-\hat{\alpha}}^* \right\},\]

where \(\hat{\sigma}_{J, J_2}(x)\) is the estimated standard deviation of \(\hat{h}_J(x) - \hat{h}_{J_2}(x)\) and \(\theta_{1-\hat{\alpha}}^*\) is the \((1-\hat{\alpha})\) quantile of the multiplier-bootstrap distribution of the corresponding sup-statistic. The final choice is \(\tilde{J} = \min\{\hat{J}, \hat{J}_n\}\), where \(\hat{J}_n\) is a slightly smaller, more conservative dimension used for stability.

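The selection rule for \(\hat{J}\) can be sketched as below, assuming the estimates \(\hat{h}_J\), the standard deviations \(\hat{\sigma}_{J, J_2}\), and the bootstrap draws of the sup-statistic are already available on a common evaluation grid. In the actual procedure these come from the NPIV estimator and the multiplier bootstrap; here they are made-up arrays, and the helper names `bootstrap_critical_value` and `lepski_select` are hypothetical, not moderndid functions.

```python
import numpy as np


def bootstrap_critical_value(sup_draws, alpha):
    """(1 - alpha) quantile of bootstrap draws of the sup-statistic."""
    return float(np.quantile(sup_draws, 1.0 - alpha))


def lepski_select(dims, h_hat, sigma, theta):
    """Smallest J whose studentized sup-distance to every larger J2 is at most theta.

    dims  : candidate dimensions in increasing order (the search grid)
    h_hat : dict J -> estimates of h_J on a common evaluation grid
    sigma : dict (J, J2) -> standard deviations of h_J - h_J2 on that grid
    theta : critical value, e.g. from bootstrap_critical_value
    """
    for i, j in enumerate(dims):
        stat = max(
            (float(np.max(np.abs(h_hat[j] - h_hat[j2]) / sigma[(j, j2)])) for j2 in dims[i + 1:]),
            default=0.0,  # the largest dimension has no competitors and is always admissible
        )
        if stat <= theta:
            return j
    return dims[-1]


# Made-up inputs: the J = 4 estimate is visibly biased, J = 8 and J = 16 nearly agree.
x = np.linspace(0.0, 1.0, 5)
dims = [4, 8, 16]
h_hat = {4: x**2 + 0.5, 8: x**2 + 0.01, 16: x**2}
sigma = {(j, j2): np.full_like(x, 0.1) for j in dims for j2 in dims if j2 > j}
theta = bootstrap_critical_value(np.arange(1, 101) / 50.0, alpha=0.05)
j_hat = lepski_select(dims, h_hat, sigma, theta)
```

Here \(J = 4\) is rejected because its studentized distance to \(J = 8\) is about 4.9, well above \(\theta \approx 1.9\), while \(J = 8\) is within the critical value of \(J = 16\). The final \(\tilde{J}\) would then be the minimum of this \(\hat{J}\) and \(\hat{J}_n\).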
Parameters:
y : numpy.ndarray

Dependent variable vector.

x : numpy.ndarray

Endogenous regressor matrix.

w : numpy.ndarray

Instrument matrix.

x_grid : numpy.ndarray, optional

Grid points at which to evaluate the estimates. If None, a grid is created automatically.

j_x_degree : int, default=3

Degree of the B-spline basis for \(X\).

k_w_degree : int, default=4

Degree of the B-spline basis for \(W\).

k_w_smooth : int, default=2

Smoothness parameter used to select the dimension \(K\) of the instrument basis for \(W\).

knots : {"uniform", "quantiles"}, default="uniform"

Knot placement method: equally spaced knots ("uniform") or knots at empirical quantiles of the data ("quantiles").

basis : {"tensor", "additive", "glp"}, default="tensor"

Type of basis for multivariate \(X\):

  • "tensor": full tensor product of univariate bases

  • "additive": sum of univariate bases

  • "glp": generalized linear product (hierarchical)

x_min, x_max, w_min, w_max : float, optional

Lower and upper limits of \(X\) and \(W\) used to construct the B-spline bases.

grid_num : int, default=50

Number of grid points for evaluation.

boot_num : int, default=99

Number of multiplier bootstrap replications used to compute the critical value \(\theta_{1-\hat{\alpha}}^*\).

check_is_fullrank : bool, default=False

Whether to check that the basis matrices have full rank.

seed : int, optional

Random seed for reproducibility.
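The `knots`, degree, and `basis` options interact when the sieve is built. The sketch below illustrates them under stated assumptions and does not reproduce the library's internal construction. All helper names are hypothetical, and the sketch makes three further assumptions. A univariate B-spline with `segments` intervals and degree `degree` (intercept included) spans `segments + degree` functions. "quantiles" knots are placed at evenly spaced empirical quantiles. The additive basis drops one column per block because each B-spline block sums to one. The "glp" basis is not shown.

```python
import numpy as np


def interior_knots(x, segments, method="uniform", lo=None, hi=None):
    """Hypothetical helper: interior knots that split the support of x into `segments` pieces."""
    lo = float(np.min(x)) if lo is None else lo
    hi = float(np.max(x)) if hi is None else hi
    if method == "uniform":
        # Equally spaced breakpoints on [lo, hi]; drop the two boundary knots.
        return np.linspace(lo, hi, segments + 1)[1:-1]
    if method == "quantiles":
        # Breakpoints at empirical quantiles, so segments hold roughly equal numbers of points.
        probs = np.linspace(0.0, 1.0, segments + 1)[1:-1]
        return np.quantile(x, probs)
    raise ValueError(f"unknown knot method: {method!r}")


def bspline_dim(segments, degree):
    # Number of B-spline functions with `segments` intervals and polynomial degree `degree`.
    return segments + degree


def tensor_basis(b1, b2):
    # Row-wise Kronecker product: column (i, j) holds b1[:, i] * b2[:, j].
    n = b1.shape[0]
    return (b1[:, :, None] * b2[:, None, :]).reshape(n, -1)


def additive_basis(b1, b2):
    # Intercept plus each univariate block without its first column, avoiding the
    # collinearity that arises because every B-spline block sums to one.
    ones = np.ones((b1.shape[0], 1))
    return np.hstack([ones, b1[:, 1:], b2[:, 1:]])


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
knots_u = interior_knots(x, segments=4, method="uniform", lo=0.0, hi=1.0)  # [0.25, 0.5, 0.75]
knots_q = interior_knots(x, segments=4, method="quantiles")
k = bspline_dim(segments=2, degree=3)  # 5 functions per coordinate
b1 = np.full((200, k), 1.0 / k)  # placeholder rows summing to one, not real B-splines
b2 = np.full((200, k), 1.0 / k)
print(tensor_basis(b1, b2).shape)    # (200, 25)
print(additive_basis(b1, b2).shape)  # (200, 9)
```

The example shows why "tensor" grows quickly with the number of regressors (\(k^d\) columns), while "additive" grows roughly linearly in \(d\).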

Returns:
dict

Dictionary containing:

  • j_x_segments: Selected number of B-spline segments for \(X\)

  • k_w_segments: Corresponding number of segments for \(W\)

  • j_tilde: Selected sieve dimension \(\tilde{J}\)

  • theta_star: Bootstrap critical value \(\theta_{1-\hat{\alpha}}^*\)

  • j_hat_max: Maximum feasible dimension \(\hat{J}_{\max}\)

  • Additional diagnostic information

See also

npiv_jhat_max

Compute maximum feasible dimension

npiv_j

Lepski-style test for dimension selection

References

[1]

Chen, X., Christensen, T. M., & Kankanala, S. (2024). Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities. https://arxiv.org/abs/2107.11869.