moderndid.npiv_choose_j
- moderndid.npiv_choose_j(y, x, w, x_grid=None, j_x_degree=3, k_w_degree=4, k_w_smooth=2, knots='uniform', basis='tensor', x_min=None, x_max=None, w_min=None, w_max=None, grid_num=50, biters=99, check_is_fullrank=False, seed=None)
Select optimal B-spline dimensions.
Implements the full data-driven selection procedure from [1], combining a maximum dimension selection step with a Lepski-style test. The procedure selects a data-driven sieve dimension, \(\tilde{J}\), that is sup-norm rate-adaptive: it adapts to unknown features of the data-generating process (e.g., smoothness of \(h_0\), instrument strength) so that the estimator attains the minimax optimal convergence rate for the sup-norm error
\[\sup_x |\hat{h}_{\tilde{J}}(x) - h_0(x)|.\]
The procedure involves two main steps. First, determine a maximum feasible dimension, \(\hat{J}_{\max}\), based on the sample size \(n\) and an estimate of the sieve measure of ill-posedness, \(\hat{s}_J^{-1}\). This dimension is the upper end of the search grid \(\hat{\mathcal{J}}\) and is defined over the set of candidate dimensions \(\mathcal{T}\), for a constant \(c\), as
\[\hat{J}_{\max} = \min \left\{ J \in \mathcal{T} : J \sqrt{\log J} \hat{s}_J^{-1} \leq c \sqrt{n} \right\}.\]
Second, use a Lepski-style method with a multiplier bootstrap to select the dimension \(\hat{J}\) from the grid \(\hat{\mathcal{J}}\). The method compares estimates across dimensions and selects the smallest dimension whose estimate is not statistically different from the estimates at all larger dimensions, using the bootstrap critical value \(\theta_{1-\hat{\alpha}}^*\):
\[\hat{J} = \min \left\{ J \in \hat{\mathcal{J}} : \sup_{x, J_2 > J} \left| \frac{\hat{h}_J(x) - \hat{h}_{J_2}(x)}{\hat{\sigma}_{J, J_2}(x)} \right| \leq \theta_{1-\hat{\alpha}}^* \right\}.\]
The final choice is \(\tilde{J} = \min\{\hat{J}, \hat{J}_n\}\), where \(\hat{J}_n\) is a slightly smaller, more conservative dimension used for stability.
- Parameters:
- y
numpy.ndarray Dependent variable vector.
- x
numpy.ndarray Endogenous regressor matrix.
- w
numpy.ndarray Instrument matrix.
- x_grid
numpy.ndarray, optional Grid points for evaluation. If None, created automatically.
- j_x_degree
int, default=3 Degree of B-spline basis for \(X\).
- k_w_degree
int, default=4 Degree of B-spline basis for \(W\).
- k_w_smooth
int, default=2 Smoothness parameter for \(K\) selection.
- knots
{"uniform", "quantiles"}, default="uniform" Knot placement method.
- basis
{"tensor", "additive", "glp"}, default="tensor" Type of basis for multivariate \(X\):
"tensor": Full tensor product of univariate bases
"additive": Sum of univariate bases
"glp": Generalized linear product (hierarchical)
- x_min, x_max, w_min, w_max
float, optional Range limits for basis construction.
- grid_num
int, default=50 Number of grid points for evaluation.
- biters
int, default=99 Number of bootstrap replications for confidence bands.
- check_is_fullrank
bool, default=False Whether to check if basis matrices have full rank.
- seed
int, optional Random seed for reproducibility.
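The two `knots` options can be illustrated with a minimal NumPy sketch. It is a generic illustration of uniform versus quantile knot placement, not moderndid's internal code; `interior_knots` is a hypothetical helper introduced only for this example.

```python
import numpy as np


def interior_knots(x, n_segments, method="uniform"):
    """Return the n_segments - 1 interior knots for a 1-D sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Interior break points expressed as fractions of the range / probability levels.
    probs = np.linspace(0.0, 1.0, n_segments + 1)[1:-1]
    if method == "uniform":
        # Equally spaced between the sample minimum and maximum.
        return x.min() + probs * (x.max() - x.min())
    if method == "quantiles":
        # Placed at empirical quantiles, so each segment holds a similar share of the data.
        return np.quantile(x, probs)
    raise ValueError(f"unknown method: {method!r}")


sample = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
uniform_knots = interior_knots(sample, n_segments=4, method="uniform")
quantile_knots = interior_knots(sample, n_segments=4, method="quantiles")
print(uniform_knots)   # [2.5 5.  7.5]
print(quantile_knots)  # [1. 2. 3.]
```

With a skewed sample such as this one, uniform knots leave most segments nearly empty, while quantile knots follow the data density.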
- Returns:
dict Dictionary containing:
j_x_seg: Selected number of segments for \(X\)
k_w_segments: Corresponding segments for \(W\)
j_tilde: Selected dimension
theta_star: Bootstrap critical value
j_hat_max: Maximum feasible dimension
Additional diagnostic information
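The relationship between the segment counts and the dimensions in the returned dictionary can be sketched as follows. For a univariate B-spline basis, a spline of degree d on s segments has s + d basis functions, which agrees with the example output on this page (4 segments, degree 3, dimension 7). The dyadic candidate grid (segment counts of the form 2^l) is an assumption based on the construction in [1], not something read from the moderndid source; `bspline_dim` and `candidate_dims` are hypothetical helpers.

```python
def bspline_dim(n_segments: int, degree: int) -> int:
    """Number of basis functions of a univariate B-spline with simple interior knots."""
    return n_segments + degree


def candidate_dims(degree: int, max_level: int) -> list[int]:
    """Candidate dimensions for dyadic segment counts 2**0, ..., 2**max_level (assumed grid)."""
    return [bspline_dim(2**level, degree) for level in range(max_level + 1)]


dims = candidate_dims(degree=3, max_level=7)
print(dims)  # [4, 5, 7, 11, 19, 35, 67, 131]
```

Under this assumption, both values reported in the example on this page (a selected dimension of 7 and a maximum feasible dimension of 131) lie on the cubic grid.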
See also
npiv_jhat_max: Compute maximum feasible dimension
npiv_j: Lepski-style test for dimension selection
References
[1] Chen, X., Christensen, T. M., & Kankanala, S. (2024). Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities. https://arxiv.org/abs/2107.11869.
Examples
Select the sieve dimension adaptively for the Engel curve IV problem. The procedure determines the maximum feasible dimension and then applies the Lepski test to choose the optimal number of B-spline segments:
In [1]: import numpy as np
   ...: from moderndid import load_engel
   ...: from moderndid.npiv import npiv_choose_j
   ...: 
   ...: df = load_engel()
   ...: y = df["food"].to_numpy()
   ...: x = df["logexp"].to_numpy().reshape(-1, 1)
   ...: w = df["logwages"].to_numpy().reshape(-1, 1)
   ...: sel = npiv_choose_j(y=y, x=x, w=w, biters=50, seed=42)
   ...: print(f"Selected segments: {sel['j_x_seg']}")
   ...: print(f"Selected dimension (J_tilde): {sel['j_tilde']}")
   ...: print(f"Max feasible dimension: {sel['j_hat_max']}")
   ...: print(f"Bootstrap critical value: {sel['theta_star']:.3f}")
   ...: 
Selected segments: 4
Selected dimension (J_tilde): 7
Max feasible dimension: 131
Bootstrap critical value: 3.461
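The Lepski-style rule for \(\hat{J}\) given in the description can also be illustrated on toy inputs. The sketch below is a direct transcription of that formula, assuming the estimates, pairwise standard errors, and critical value are already available; it is not the library's implementation, and `lepski_select` and all input values are hypothetical.

```python
import numpy as np


def lepski_select(dims, h_hat, sigma, theta_star):
    """Smallest J whose estimate is within theta_star of every larger-J estimate.

    dims:       iterable of candidate dimensions
    h_hat:      dict mapping J to the estimate evaluated on a common grid
    sigma:      dict mapping (J, J2) with J < J2 to standard errors on that grid
    theta_star: critical value for the sup-t statistic
    """
    dims = sorted(dims)
    for j in dims:
        # Sup over grid points and over all larger dimensions of the t-type contrast.
        sup_stat = max(
            (np.max(np.abs(h_hat[j] - h_hat[j2]) / sigma[(j, j2)])
             for j2 in dims if j2 > j),
            default=0.0,  # the largest dimension has nothing to be compared with
        )
        if sup_stat <= theta_star:
            return j
    return dims[-1]


# Toy inputs: the J=4 estimate is far from the others, J=5 already agrees with J=7 and J=11.
grid = np.linspace(0.0, 1.0, 5)
dims = [4, 5, 7, 11]
h_hat = {
    4: np.zeros_like(grid),
    5: np.full_like(grid, 1.00),
    7: np.full_like(grid, 1.05),
    11: np.full_like(grid, 1.00),
}
sigma = {(a, b): np.full_like(grid, 0.1) for a in dims for b in dims if a < b}

j_hat = lepski_select(dims, h_hat, sigma, theta_star=3.0)
print(j_hat)  # 5
```

In the toy data, J = 4 is rejected because its contrast with J = 5 equals 10, above the critical value of 3, while J = 5 differs from the larger dimensions by at most about 0.5 standard errors.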