moderndid.npiv_choose_j

moderndid.npiv_choose_j(y, x, w, x_grid=None, j_x_degree=3, k_w_degree=4, k_w_smooth=2, knots='uniform', basis='tensor', x_min=None, x_max=None, w_min=None, w_max=None, grid_num=50, biters=99, check_is_fullrank=False, seed=None)

Select optimal B-spline dimensions.

Implements the full data-driven selection procedure of [1], which combines a maximum-dimension selection step with a Lepski-style test. The procedure selects a data-driven sieve dimension \(\tilde{J}\) that is sup-norm rate-adaptive: it adapts to unknown features of the data-generating process (e.g., the smoothness of \(h_0\) or the strength of the instruments) and attains the minimax-optimal convergence rate for the sup-norm loss,

\[\sup_x |\hat{h}_{\tilde{J}}(x) - h_0(x)|.\]
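In practice a supremum like this is approximated by a maximum over a finite evaluation grid (compare the `x_grid` and `grid_num` parameters). The sketch below illustrates only that approximation; the true function and the "estimate" are made up for the example, and `sup_norm_error` is a hypothetical helper, not part of moderndid.

```python
import numpy as np


def sup_norm_error(h_hat, h0, x_grid):
    """Approximate sup_x |h_hat(x) - h0(x)| by its maximum over a finite grid."""
    diff = np.abs(h_hat(x_grid) - h0(x_grid))
    return float(np.max(diff))


def h0(x):
    # Hypothetical true structural function.
    return np.sin(x)


def h_hat(x):
    # Hypothetical estimate with a known bias of 0.1 * x**2.
    return np.sin(x) + 0.1 * x**2


# 50 equally spaced evaluation points, mirroring the default grid_num=50.
x_grid = np.linspace(0.0, 1.0, 50)
err = sup_norm_error(h_hat, h0, x_grid)
print(f"approximate sup-norm error: {err:.4f}")
```

Because the bias grows with \(x\), the grid maximum is attained at the right endpoint; a coarser grid can understate the true supremum when the largest error falls between grid points.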

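The procedure involves two main steps. First, it determines a maximum feasible dimension \(\hat{J}_{\max}\) from the sample size \(n\) and an estimate \(\hat{s}_J^{-1}\) of the sieve measure of ill-posedness. Let \(\mathcal{T}\) denote the set of candidate sieve dimensions, \(J^+ = \min\{j \in \mathcal{T} : j > J\}\) the candidate following \(J\), and \(c\) a positive constant. Then

\[\hat{J}_{\max} = \min \left\{ J \in \mathcal{T} : J \sqrt{\log J}\, \hat{s}_J^{-1} \leq c \sqrt{n} < J^+ \sqrt{\log J^+}\, \hat{s}_{J^+}^{-1} \right\},\]

that is, the first candidate at which the bound holds but fails at the next candidate. The search grid \(\hat{\mathcal{J}}\) is restricted to candidates in \(\mathcal{T}\) that do not exceed \(\hat{J}_{\max}\).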
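A rough sketch of this first step, under several assumptions. It estimates \(\hat{s}_J\) as the smallest singular value of \(\hat{G}_b^{-1/2}\hat{S}\hat{G}_\psi^{-1/2}\), the normalized cross-moment matrix of the \(X\) and \(W\) bases, following the usual definition in the sieve NPIV literature (whether the library computes it exactly this way is an assumption). It then applies the \(\hat{J}_{\max}\) rule, reading it as the first candidate whose bound holds while the next candidate's bound fails. Legendre polynomial bases stand in for the B-spline bases, the candidate set and `c` are illustrative, and `inv_sqrt`, `s_hat_inverse`, and `j_hat_max` are hypothetical helpers (not moderndid's `npiv_jhat_max`).

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def inv_sqrt(G):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(G)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def s_hat_inverse(Psi, B):
    """Estimate of the sieve measure of ill-posedness s_J^{-1}.

    Psi: (n, J) basis for the regressor X; B: (n, K) basis for the instrument W, K >= J.
    """
    n = Psi.shape[0]
    G_psi = Psi.T @ Psi / n  # Gram matrix of the X basis
    G_b = B.T @ B / n        # Gram matrix of the W basis
    S = B.T @ Psi / n        # cross-moment matrix
    M = inv_sqrt(G_b) @ S @ inv_sqrt(G_psi)
    # Singular values of M are uncentered canonical correlations, so s_J <= 1.
    s_min = np.linalg.svd(M, compute_uv=False).min()
    return 1.0 / s_min


def j_hat_max(dims, s_inv, n, c):
    """First candidate J whose bound holds while the next candidate's bound fails."""
    dims = np.asarray(dims, dtype=float)
    lhs = dims * np.sqrt(np.log(dims)) * np.asarray(s_inv)
    ok = lhs <= c * np.sqrt(n)
    for i in range(len(dims)):
        if ok[i] and (i == len(dims) - 1 or not ok[i + 1]):
            return int(dims[i])
    return int(dims[0])  # assumption: fall back to the smallest candidate


# Simulated data in which W is a strong instrument for X (illustrative only).
rng = np.random.default_rng(0)
n = 500
w = rng.uniform(size=n)
x = 0.7 * w + 0.3 * rng.uniform(size=n)

dims = [2, 3, 4, 5, 6]
s_inv = [
    # J columns for X, J + 2 columns for W, both on [-1, 1].
    s_hat_inverse(legvander(2 * x - 1, J - 1), legvander(2 * w - 1, J + 1))
    for J in dims
]
print(np.round(s_inv, 2))
print("J_hat_max:", j_hat_max(dims, s_inv, n, c=10.0))
```

Weaker instruments shrink \(\hat{s}_J\), inflate \(\hat{s}_J^{-1}\), and so make the bound bind at a smaller \(J\), which is how ill-posedness caps the search grid.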

Second, a Lepski-style method with a multiplier bootstrap selects the dimension \(\hat{J}\) from the grid \(\hat{\mathcal{J}}\). It compares estimates across dimensions and picks the smallest dimension whose estimate is not statistically different from the estimates at all larger dimensions in the grid:

\[\hat{J} = \min \left\{ J \in \hat{\mathcal{J}} : \sup_{x,\, J_2 \in \hat{\mathcal{J}},\, J_2 > J} \left| \frac{\hat{h}_J(x) - \hat{h}_{J_2}(x)}{\hat{\sigma}_{J, J_2}(x)} \right| \leq \theta_{1-\hat{\alpha}}^* \right\},\]

where \(\hat{\sigma}_{J, J_2}(x)\) is the estimated standard error of \(\hat{h}_J(x) - \hat{h}_{J_2}(x)\) and \(\theta_{1-\hat{\alpha}}^*\) is the \(1-\hat{\alpha}\) quantile of the bootstrapped supremum statistic. The final choice is \(\tilde{J} = \min\{\hat{J}, \hat{J}_n\}\), where \(\hat{J}_n\) is a slightly smaller, more conservative dimension used for stability.
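The selection rule can be sketched as a scan over the grid, assuming the estimates \(\hat{h}_J\), the standard errors \(\hat{\sigma}_{J, J_2}\), and the critical value are already available, with the supremum over \(x\) approximated on a finite grid. `lepski_select`, the toy estimates, and the value of \(\hat{J}_n\) are all hypothetical; this is not moderndid's internal code.

```python
import numpy as np


def lepski_select(dims, h, sigma, theta):
    """Smallest J whose estimate is within theta standard errors of every larger J.

    dims:  ascending candidate dimensions, length m
    h:     array (m, g) with h[i] = estimate for dims[i] on a grid of g points
    sigma: array (m, m, g) with sigma[i, j] = standard error of h[i] - h[j]
    theta: critical value (e.g. a bootstrap quantile)
    """
    m = len(dims)
    for i in range(m):
        stats = [np.max(np.abs(h[i] - h[j]) / sigma[i, j]) for j in range(i + 1, m)]
        # The largest candidate has no larger competitor, so it is always accepted.
        if not stats or max(stats) <= theta:
            return dims[i]


# Hypothetical estimates: the smallest dimension is badly biased, the rest agree.
dims = [4, 5, 7, 11]
base = np.array([0.0, 1.0, 2.0])
h = np.stack([base + 5.0, base + 0.5, base, base + 0.1])
sigma = np.ones((4, 4, 3))

j_hat = lepski_select(dims, h, sigma, theta=2.0)
j_n = 7  # illustrative value for the conservative cap J_n
j_tilde = min(j_hat, j_n)
print(j_hat, j_tilde)
```

A larger critical value accepts smaller dimensions sooner, while a smaller one pushes the choice toward the upper end of the grid; the final `min` with \(\hat{J}_n\) then caps the result.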

Parameters:

y : numpy.ndarray

Dependent variable vector.

x : numpy.ndarray

Endogenous regressor matrix.

w : numpy.ndarray

Instrument matrix.

x_grid : numpy.ndarray, optional

Grid points at which the estimates are evaluated. If None, a grid is created automatically.

j_x_degree : int, default=3

Degree of the B-spline basis for \(X\).

k_w_degree : int, default=4

Degree of the B-spline basis for \(W\).

k_w_smooth : int, default=2

Smoothness parameter for \(K\) selection.

knots : {"uniform", "quantiles"}, default="uniform"

Knot placement method: equally spaced knots ("uniform") or knots at sample quantiles ("quantiles").

basis : {"tensor", "additive", "glp"}, default="tensor"

Type of basis for multivariate \(X\):

"tensor": Full tensor product of univariate bases
"additive": Sum of univariate bases
"glp": Generalized linear product (hierarchical)

x_min, x_max, w_min, w_max : float, optional

Lower and upper limits of the ranges of \(X\) and \(W\) used to construct the bases.

grid_num : int, default=50

Number of grid points for evaluation.

biters : int, default=99

Number of multiplier bootstrap replications used to compute the critical value \(\theta_{1-\hat{\alpha}}^*\).

check_is_fullrank : bool, default=False

Whether to check if basis matrices have full rank.

seed : int, optional

Random seed for reproducibility.
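The degree, knot, and basis options above can be illustrated with SciPy's `BSpline.design_matrix`. This is a sketch under assumptions: clamped knot vectors with the boundary knots at the data range (standing in for the role of `x_min`/`x_max`), and "segments" counted as the number of knot intervals. Under these assumptions a degree-3 basis with 4 segments has 3 + 4 = 7 functions, consistent with the 4 segments and dimension 7 printed in the example at the end of this page. `bspline_basis`, `tensor_basis`, and `additive_basis` are hypothetical helpers; the "glp" option is not sketched, and the library's additive basis may remove the redundant constant differently.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(x, degree, segments, knots="uniform", lo=None, hi=None):
    """Clamped B-spline design matrix with `segments` knot intervals on [lo, hi]."""
    lo = x.min() if lo is None else lo
    hi = x.max() if hi is None else hi
    if knots == "uniform":
        interior = np.linspace(lo, hi, segments + 1)[1:-1]
    elif knots == "quantiles":
        interior = np.quantile(x, np.linspace(0.0, 1.0, segments + 1)[1:-1])
    else:
        raise ValueError(f"unknown knots option: {knots!r}")
    # Repeat each boundary knot degree + 1 times (clamped knot vector).
    t = np.concatenate([np.full(degree + 1, lo), interior, np.full(degree + 1, hi)])
    return BSpline.design_matrix(x, t, degree).toarray()


def tensor_basis(A, B):
    """Row-wise Kronecker product: every product of a column of A with a column of B."""
    return np.einsum("ij,ik->ijk", A, B).reshape(A.shape[0], -1)


def additive_basis(A, B):
    """Stack univariate bases, dropping one column of B.

    Each clamped B-spline basis sums to one, so keeping every column of both
    blocks would make the stacked matrix rank deficient.
    """
    return np.hstack([A, B[:, 1:]])


rng = np.random.default_rng(1)
x1, x2 = rng.uniform(size=(2, 200))
A = bspline_basis(x1, degree=3, segments=4)
B = bspline_basis(x2, degree=3, segments=4, knots="quantiles")
print(A.shape[1], tensor_basis(A, B).shape[1], additive_basis(A, B).shape[1])
```

The tensor basis grows multiplicatively with the number of regressors (49 columns here) while the additive basis grows only additively (13 columns), which is the usual trade-off between flexibility and dimension.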

Returns:

dict

Dictionary containing:

j_x_seg: Selected number of B-spline segments for \(X\)
k_w_segments: Corresponding number of segments for \(W\)
j_tilde: Selected sieve dimension \(\tilde{J}\)
theta_star: Bootstrap critical value \(\theta_{1-\hat{\alpha}}^*\)
j_hat_max: Maximum feasible dimension \(\hat{J}_{\max}\)
Additional diagnostic entries
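`theta_star` comes from a multiplier bootstrap whose size and randomness are set by `biters` and `seed`. The sketch below shows a generic Gaussian multiplier bootstrap for a sup-|t| statistic, under stated assumptions: the matrix `U` of per-observation contributions (one column per combination of \(x\), \(J\), and \(J_2\)) is taken as given, since how the library builds it from the sieve estimates is not shown here; the level \(\alpha\) is fixed rather than the \(\hat{\alpha}\) of the selection rule; and `multiplier_bootstrap_quantile` is a hypothetical helper, not moderndid's implementation.

```python
import numpy as np


def multiplier_bootstrap_quantile(U, alpha, biters=99, seed=None):
    """(1 - alpha) quantile of a bootstrapped sup-|t| statistic.

    U: array (n, m) of per-observation contributions, one column per
       (x, J, J2) combination whose studentized difference enters the sup.
    """
    rng = np.random.default_rng(seed)
    n = U.shape[0]
    # Studentize each column so every bootstrap draw is a vector of t-ratios.
    Z = U / np.sqrt(np.mean(U**2, axis=0))
    omega = rng.standard_normal((biters, n))  # Gaussian multipliers
    sup_t = np.max(np.abs(omega @ Z), axis=1) / np.sqrt(n)
    return float(np.quantile(sup_t, 1.0 - alpha))


rng = np.random.default_rng(2)
U = rng.standard_normal((400, 30))  # hypothetical contributions
theta_star = multiplier_bootstrap_quantile(U, alpha=0.05, biters=99, seed=42)
print(f"theta_star = {theta_star:.3f}")
```

With a single column the statistic is conditionally \(|N(0,1)|\), so the 95% critical value is close to 1.96; taking the supremum over many columns raises it, which is what makes the Lepski comparison hold uniformly over \(x\) and \(J_2\).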

See also

npiv_jhat_max: Compute maximum feasible dimension
npiv_j: Lepski-style test for dimension selection

References

[1]

Chen, X., Christensen, T. M., & Kankanala, S. (2024). Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities. https://arxiv.org/abs/2107.11869.

Examples

Select the sieve dimension adaptively for the Engel curve IV problem. The procedure determines the maximum feasible dimension and then applies the Lepski test to choose the optimal number of B-spline segments:

In [1]: import numpy as np
   ...: from moderndid import load_engel
   ...: from moderndid.npiv import npiv_choose_j
   ...: 
   ...: df = load_engel()
   ...: y = df["food"].to_numpy()
   ...: x = df["logexp"].to_numpy().reshape(-1, 1)
   ...: w = df["logwages"].to_numpy().reshape(-1, 1)
   ...: sel = npiv_choose_j(y=y, x=x, w=w, biters=50, seed=42)
   ...: print(f"Selected segments: {sel['j_x_seg']}")
   ...: print(f"Selected dimension (J_tilde): {sel['j_tilde']}")
   ...: print(f"Max feasible dimension: {sel['j_hat_max']}")
   ...: print(f"Bootstrap critical value: {sel['theta_star']:.3f}")
   ...: 
Selected segments: 4
Selected dimension (J_tilde): 7
Max feasible dimension: 131
Bootstrap critical value: 3.461