moderndid.npiv_choose_j
- moderndid.npiv_choose_j(y, x, w, x_grid=None, j_x_degree=3, k_w_degree=4, k_w_smooth=2, knots='uniform', basis='tensor', x_min=None, x_max=None, w_min=None, w_max=None, grid_num=50, boot_num=99, check_is_fullrank=False, seed=None)
Select optimal B-spline dimensions.
Implements the full data-driven selection procedure from [1], which combines a maximum-dimension selection step with a Lepski-style test. The procedure selects a data-driven sieve dimension, \(\tilde{J}\), that is sup-norm rate-adaptive: it adapts to unknown features of the data-generating process (e.g., smoothness of \(h_0\), instrument strength) so that the estimator attains the minimax optimal convergence rate in the sup-norm loss
\[\sup_x |\hat{h}_{\tilde{J}}(x) - h_0(x)|.\]
The procedure involves two main steps. First, determine a maximum feasible dimension, \(\hat{J}_{\max}\), based on the sample size \(n\) and an estimate of the sieve measure of ill-posedness, \(\hat{s}_J^{-1}\). This dimension bounds the search grid \(\hat{\mathcal{J}}\) from above and is defined as
\[\hat{J}_{\max} = \min \left\{ J \in \mathcal{T} : J \sqrt{\log J} \hat{s}_J^{-1} \leq c \sqrt{n} \right\}.\]
Second, use a Lepski-style method with a multiplier bootstrap to select a dimension \(\hat{J}\) from the grid \(\hat{\mathcal{J}}\): compare estimates across dimensions and select the smallest dimension whose estimate is not statistically different from the estimates at larger dimensions,
\[\hat{J} = \min \left\{ J \in \hat{\mathcal{J}} : \sup_{x, J_2 > J} \left| \frac{\hat{h}_J(x) - \hat{h}_{J_2}(x)}{\hat{\sigma}_{J, J_2}(x)} \right| \leq \theta_{1-\hat{\alpha}}^* \right\},\]
where \(\theta_{1-\hat{\alpha}}^*\) is the bootstrap critical value. The final choice is \(\tilde{J} = \min\{\hat{J}, \hat{J}_n\}\), where \(\hat{J}_n\) is a slightly smaller, more conservative dimension used for stability.
- Parameters:
- y
numpy.ndarray Dependent variable vector.
- x
numpy.ndarray Endogenous regressor matrix.
- w
numpy.ndarray Instrument matrix.
- x_grid
numpy.ndarray, optional Grid points for evaluation. If None, created automatically.
- j_x_degree
int, default=3 Degree of B-spline basis for \(X\).
- k_w_degree
int, default=4 Degree of B-spline basis for \(W\).
- k_w_smooth
int, default=2 Smoothness parameter for \(K\) selection.
- knots
{"uniform", "quantiles"}, default="uniform" Knot placement method.
- basis
{"tensor", "additive", "glp"}, default="tensor" Type of basis for multivariate \(X\):
"tensor": Full tensor product of univariate bases
"additive": Sum of univariate bases
"glp": Generalized linear product (hierarchical)
- x_min, x_max, w_min, w_max
float, optional Range limits for basis construction.
- grid_num
int, default=50 Number of grid points for evaluation.
- boot_num
int, default=99 Number of bootstrap replications for confidence bands.
- check_is_fullrank
bool, default=False Whether to check if basis matrices have full rank.
- seed
int, optional Random seed for reproducibility.
- Returns:
dict Dictionary containing:
j_x_segments: Selected number of segments for \(X\)
k_w_segments: Corresponding segments for \(W\)
j_tilde: Selected dimension
theta_star: Bootstrap critical value
j_hat_max: Maximum feasible dimension
Additional diagnostic information
See also
npiv_jhat_max: Compute maximum feasible dimension
npiv_j: Lepski-style test for dimension selection
References
[1] Chen, X., Christensen, T. M., & Kankanala, S. (2024). Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities. https://arxiv.org/abs/2107.11869.
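Examples
The Lepski-style rule that defines \(\hat{J}\) can be illustrated with a small NumPy sketch. This is not the library's implementation and does not call moderndid: the helper `lepski_select`, the toy estimates, the constant standard errors, and the critical value of 3.0 are all hypothetical, chosen only to show how the smallest dimension passing the sup-statistic test is picked. In practice the estimates come from the sieve regressions and the critical value from the multiplier bootstrap.

```python
import numpy as np


def lepski_select(h_hat, sigma_hat, theta_star):
    """Return the smallest J whose sup t-statistic against all larger J
    is at most theta_star (hypothetical helper, for illustration only).

    h_hat:     dict {J: array of estimates h_J(x) on a common grid}
    sigma_hat: dict {(J, J2): array of standard errors of h_J(x) - h_J2(x)}
               for every pair with J < J2
    """
    grid = sorted(h_hat)
    for j in grid:
        larger = [j2 for j2 in grid if j2 > j]
        # Sup over x and over J2 > J of the studentized difference.
        sup_stat = max(
            (np.max(np.abs(h_hat[j] - h_hat[j2]) / sigma_hat[(j, j2)])
             for j2 in larger),
            default=0.0,  # the largest J has nothing to be compared with
        )
        if sup_stat <= theta_star:
            return j
    return grid[-1]


# Toy inputs (assumed values, not output of any estimator).
x_grid = np.linspace(0.0, 1.0, 50)
h0 = np.sin(2.0 * np.pi * x_grid)
h_hat = {
    4: h0 + 0.5 * x_grid,    # coarse basis: large approximation error
    8: h0 + 0.01 * x_grid,   # medium basis: small approximation error
    16: h0,                  # fine basis
}
dims = sorted(h_hat)
sigma_hat = {
    (j, j2): np.full_like(x_grid, 0.05)
    for j in dims for j2 in dims if j2 > j
}

j_hat = lepski_select(h_hat, sigma_hat, theta_star=3.0)
print(j_hat)  # 8
```

With these toy numbers, J = 4 is rejected because its estimate differs from the larger-dimension estimates by far more than the critical value allows, while J = 8 is statistically indistinguishable from J = 16 and is therefore selected.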