Abstract
Let $$ Y_j=f_{\ast}(X_j)+\xi_j,\qquad j=1,\dots, n, $$ where $X, X_1,\dots, X_n$ are i.i.d. random variables in a measurable space $(S,\mathcal{A})$ with distribution $\Pi$ and $\xi, \xi_1,\dots ,\xi_n$ are i.i.d. random variables with ${\mathbb E}\xi=0$ independent of $(X_1,\dots, X_n).$ Given a dictionary $h_1,\dots, h_{N}: S\mapsto{\mathbb R},$ let $ f_{\lambda}:=\sum_{j=1}^N \lambda_j h_j$, $ \lambda=(\lambda_1,\dots, \lambda_N)\in{\mathbb R}^N. $ Given $\varepsilon>0,$ define $$ \hat\Lambda_{\varepsilon}:=\Biggl\{\lambda\in{\mathbb R}^N: \max_{1\leq k\leq N} \Biggl|n^{-1}\sum_{j=1}^n \bigl(f_{\lambda}(X_j)-Y_j\bigr)h_k(X_j)\Biggr| \leq\varepsilon \Biggr\} $$ and $$\hat\lambda:=\hat\lambda^{\varepsilon}\in \operatorname{Argmin}_{\lambda\in\hat\Lambda_{\varepsilon}}\|\lambda\| _{\ell_1}. $$ In the case where $f_{\ast}:=f_{\lambda^{\ast}}, \lambda^{\ast}\in {\mathbb R}^N,$ Candes and Tao [Ann. Statist. 35 (2007) 2313-2351] suggested using $\hat\lambda$ as an estimator of $\lambda^{\ast}.$ They called this estimator “the Dantzig selector”. We study the properties of $f_{\hat\lambda}$ as an estimator of $f_{\ast}$ for regression models with random design, extending some of the results of Candes and Tao (and providing alternative proofs of these results).
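As an illustration of the definition (a sketch, not a statement taken from the paper): the set $\hat\Lambda_{\varepsilon}$ is cut out by $2N$ linear inequalities in $\lambda,$ so computing $\hat\lambda$ can be posed as a linear program. One standard way to write it, assuming the splitting $\lambda=u-v$ with auxiliary vectors $u,v\in{\mathbb R}^N$ (notation introduced here only for this example), is $$ \min_{u,v\in{\mathbb R}^N}\ \sum_{k=1}^N (u_k+v_k) \quad\text{subject to}\quad u_k\geq 0,\ \ v_k\geq 0,\ \ -\varepsilon\leq n^{-1}\sum_{j=1}^n \bigl(f_{u-v}(X_j)-Y_j\bigr)h_k(X_j)\leq\varepsilon, \qquad k=1,\dots, N. $$ At an optimal pair $(u,v)$ one has $u_kv_k=0$ for every $k$ (otherwise both coordinates could be decreased by the same amount without changing $u-v$), so $\sum_{k=1}^N(u_k+v_k)=\|u-v\|_{\ell_1}$ and $\hat\lambda=u-v$ is a minimizer of $\|\lambda\|_{\ell_1}$ over $\hat\Lambda_{\varepsilon}.$ The feasible set is nonempty because any least squares solution makes all $N$ empirical correlations $n^{-1}\sum_{j=1}^n (f_{\lambda}(X_j)-Y_j)h_k(X_j)$ equal to zero.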
Citation
Vladimir Koltchinskii. "The Dantzig selector and sparsity oracle inequalities." Bernoulli 15 (3) 799-828, August 2009. https://doi.org/10.3150/09-BEJ187
Information