Knockoff Statistic \(W_j\)

Generally speaking, the knockoff statistic \(W\) uses information from both the design matrix \(X\) and its knockoff copy \(\tilde{X}\). Once we have the knockoff statistics \(W_j\), we can compute a data-dependent threshold using the following formula: \[ T=\min \left\{t > 0: \frac{\#\left\{j: W_{j} \leq-t\right\}}{\#\left\{j: W_{j} \geq t\right\} \vee 1} \leq q\right\} \] where \(q\) is the target FDR level. Based on the threshold \(T\), the features \(j\) with \(W_j \geq T\) are selected.
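The threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the function names `knockoff_threshold` and `knockoff_select` are hypothetical, the search is restricted to the nonzero values \(|W_j|\) (an assumption, since the ratio only changes at those points), and selection uses the non-strict comparison \(W_j \geq T\) so that it matches the denominator of the ratio.

```python
def knockoff_threshold(W, q=0.1):
    """Smallest candidate t whose estimated false discovery proportion is <= q.

    Candidates are the distinct nonzero values |W_j| (an assumption of this
    sketch). Returns infinity when no candidate satisfies the bound.
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)  # #{j : W_j <= -t}
        positives = sum(1 for w in W if w >= t)   # #{j : W_j >= t}
        if negatives / max(positives, 1) <= q:    # "v 1" in the formula
            return t
    return float("inf")


def knockoff_select(W, q=0.1):
    """Indices (0-based) of the features with W_j >= T."""
    T = knockoff_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= T]


if __name__ == "__main__":
    # Made-up statistics, for illustration only.
    W = [5.0, 4.0, 3.0, -1.0, 2.5, 0.5, -0.2, 6.0]
    print(knockoff_threshold(W, q=0.2))
    print(knockoff_select(W, q=0.2))
```

When every candidate fails the bound, the threshold is infinite and nothing is selected, which is the conservative outcome for a small number of features or a very small \(q\).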

There are many ways to measure variable importance and compute \(W_j\); here we list only the knockoff statistics used in our project and interactive tools.

Advanced usage with custom knockoff statistic

Model data using GLM

If you believe the response variable \(y\) follows a generalized linear model, you can fit a penalized generalized linear model to obtain the coefficient estimates as well as the entering times \(\lambda_k\), that is, the value of the penalty parameter at which variable \(k\) first enters the model. For example, if \(y\) is binary, you can model the data with a GLM using family="binomial"; if \(y\) is a non-negative integer, consider family="poisson".
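As a rough illustration of how such a fit turns into \(W_j\), the sketch below assumes the penalized GLM was fitted on the augmented design \([X, \tilde{X}]\) with \(2p\) columns, so that position \(j\) holds the original feature and position \(j+p\) holds its knockoff. The fitting step itself is not shown; the inputs `beta` (fitted coefficients) and `entry` (entering times) and both function names are hypothetical. The two formulas are the coefficient-difference statistic \(W_j = |\hat\beta_j| - |\hat\beta_{j+p}|\) and the signed-max statistic \(W_j = \max(Z_j, \tilde{Z}_j) \cdot \operatorname{sign}(Z_j - \tilde{Z}_j)\), two common choices rather than necessarily the ones used here.

```python
def _half(values):
    """Number of original features p, given a vector of length 2p."""
    if len(values) % 2 != 0:
        raise ValueError("expected 2p values: p originals followed by p knockoffs")
    return len(values) // 2


def coefficient_difference(beta):
    """W_j = |beta_j| - |beta_{j+p}| from coefficients of the augmented fit."""
    p = _half(beta)
    return [abs(beta[j]) - abs(beta[j + p]) for j in range(p)]


def signed_max(entry):
    """W_j = max(Z_j, Z~_j) * sign(Z_j - Z~_j) from entering times.

    entry[j] is the penalty value at which column j first enters the model
    (0 if it never enters); a larger value means an earlier entry.
    """
    p = _half(entry)
    W = []
    for j in range(p):
        z, z_tilde = entry[j], entry[j + p]
        sign = (z > z_tilde) - (z < z_tilde)  # +1, 0, or -1
        W.append(max(z, z_tilde) * sign)
    return W


if __name__ == "__main__":
    # Made-up values for p = 3 features, for illustration only.
    beta = [1.5, 0.0, -0.75, 0.25, 0.5, 0.0]
    entry = [2.0, 0.5, 0.0, 1.0, 1.5, 0.0]
    print(coefficient_difference(beta))
    print(signed_max(entry))
```

In both cases a large positive \(W_j\) means the original feature looks more important than its knockoff, while a negative value means the knockoff won.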

Extra Method to compute knockoff statistic

Reference