Project 2 Simulation Study

Josh Day
April 21, 2014

Introduction

  • Setup of Problem
  • Test Statistics
  • Simulation Design
  • Results

Setup of Problem

  • \( H_0: X_i \sim N(0,1), \; i=1,\ldots,n \)
  • \( H_A: X_i \sim (1-\pi)N(0,1) + \pi N(A, 1), \; i=1,\ldots,n \)

  • Did we observe pure noise or noise plus signal?

Parameterization

How do we parameterize \( \pi, A \)?

  • \( \pi = \pi (n,\beta) = n^{-\beta} \)
    • Interpretation: large \( \beta \) \( \implies \) sparser signal (fewer non-null observations)
  • \( A = A (r, n) = \sqrt{2 r \log n}\; \) where \( 0 < r < 1 \)
    • Interpretation: large \( r \) \( \implies \) stronger signal
    • This interpretation does not hold in the dense regime
    • Only the sparse regime is considered here
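The parameterization above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names, the seed, and the choice of \( \beta = 0.7 \), \( r = 0.5 \) in the usage lines are assumptions made for the example.

```python
import math
import random

def mixing_proportion(n, beta):
    """pi = n^(-beta): expected fraction of observations carrying signal."""
    return n ** (-beta)

def signal_strength(n, r):
    """A = sqrt(2 r log n): mean shift of the signal component."""
    return math.sqrt(2.0 * r * math.log(n))

def draw_sample(n, beta, r, rng, null=False):
    """Draw X_1..X_n from H_0 (null=True) or from the mixture under H_A."""
    pi = mixing_proportion(n, beta)
    A = signal_strength(n, r)
    xs = []
    for _ in range(n):
        # Under H_A each observation is signal with probability pi.
        is_signal = (not null) and rng.random() < pi
        xs.append(rng.gauss(A if is_signal else 0.0, 1.0))
    return xs

rng = random.Random(2014)  # arbitrary seed, for reproducibility only
n = 10_000
print(mixing_proportion(n, 0.7))  # expected share of signal observations
print(signal_strength(n, 0.5))    # mean shift A
x = draw_sample(n, beta=0.7, r=0.5, rng=rng)
```

With \( n = 10{,}000 \), only a small share of the observations is shifted, which is what makes the signal hard to detect.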

Test Statistics

  • Proposed by Jager & Wellner (2007)
    • Let \( F_n(x) = \frac{1}{n}\sum_{i=1}^n 1_{(p_i \le x)} \) be the empirical CDF of the p-values \( p_i \) of the \( X_i \)
    • \( S_n(s) = \sup_{0 < x < 1} \; K_s(F_n(x), x) \)
    • Each value of \( s \) defines a different statistic

For example, \( s=2 \) gives Tukey's higher criticism statistic:

\[ K_2(u,v)=\frac{1}{2}\frac{(u-v)^2}{v(1-v)} \]

Test Statistics

(1) Obtain p-values \( p_i = P( N(0,1) > X_i) \)

(2) Order the p-values: \( p_{(1)} \le \ldots \le p_{(n)} \)

(3) Compute \( K_s(\;F_n(p_{(i)}), p_{(i)}\;) = K_s(\frac{i}{n}, p_{(i)}) \) for each \( i \)

(4) Compute \( S_n(s) = \max_{1 \le i \le n} K_s(\frac{i}{n}, p_{(i)}) \)

\( s=2 \implies \) \( S_n(2)=\max_{1 \le i \le n} \frac{1}{2}\frac{(i/n-p_{(i)})^2}{p_{(i)}(1-p_{(i)})} \)
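Steps (1) to (4) for \( s=2 \) can be sketched as follows. This is an illustrative version using only the Python standard library; the names `k2` and `sn_k2`, the seed, and the sample size in the usage lines are assumptions. P-values that round to exactly 0 or 1 in floating point are skipped to avoid dividing by zero.

```python
import random
from statistics import NormalDist

def k2(u, v):
    """K_2(u, v) = (1/2) (u - v)^2 / (v (1 - v))."""
    return 0.5 * (u - v) ** 2 / (v * (1.0 - v))

def sn_k2(xs):
    """S_n(2) computed from raw observations xs via steps (1)-(4)."""
    n = len(xs)
    std_normal = NormalDist()  # N(0, 1)
    # (1) upper-tail p-values; P(N(0,1) > x) = Phi(-x) by symmetry
    pvals = [std_normal.cdf(-x) for x in xs]
    # (2) order the p-values
    pvals.sort()
    # (3) K_2(i/n, p_(i)) for each i, skipping p-values equal to 0 or 1
    ks = [k2(i / n, p) for i, p in enumerate(pvals, start=1) if 0.0 < p < 1.0]
    # (4) maximise over i
    return max(ks, default=0.0)

rng = random.Random(1)  # arbitrary seed
null_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(sn_k2(null_sample))
```

Using \( \Phi(-x) \) rather than \( 1 - \Phi(x) \) keeps more precision for the small p-values that drive the statistic.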

Test Statistics

Asymptotic distribution under \( H_0 \): \[ nS_n(s) - r_n \xrightarrow{d} Y_4 \]

where \[ r_n = \log(\log n) +\frac{1}{2}\log(\log(\log n)) - \frac{1}{2}\log(4\pi) \] and \[ F_{Y_4}(x)=\exp(-4\exp(-x)) \]

(In \( r_n \), \( \pi \) is the constant 3.14159..., not the mixing proportion.)
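To make the centering constant and the limiting CDF concrete, here is a small sketch that evaluates both. The function names are assumptions for illustration; `math.pi` is the circle constant, which is how \( \pi \) is read in the formula for \( r_n \).

```python
import math

def centering_constant(n):
    """r_n = log log n + (1/2) log log log n - (1/2) log(4 pi); needs n > e."""
    loglog = math.log(math.log(n))
    return loglog + 0.5 * math.log(loglog) - 0.5 * math.log(4.0 * math.pi)

def limit_cdf(x):
    """F_{Y_4}(x) = exp(-4 exp(-x))."""
    return math.exp(-4.0 * math.exp(-x))

n = 10_000
print(centering_constant(n))  # centering applied to n * S_n(s)
print(limit_cdf(4.0))         # limiting P(n S_n(s) - r_n <= 4) under H_0
```

The constant \( r_n \) grows very slowly in \( n \), since it is built from iterated logarithms.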

Critical Value

  • Reject for large values of \( nS_n(s)-r_n \)
  • Solving \( \alpha = 1-\exp(-4\exp(-x)) \) for \( x \) gives the critical value:
    • \( x_\alpha = -\log\left(-\frac{\log(1-\alpha)}{4} \right) \)
  • We use \( \alpha=0.05 \)
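The critical value can be checked numerically. The sketch below evaluates the closed form and confirms that it inverts the limiting tail probability; the function names are assumptions for illustration.

```python
import math

def critical_value(alpha):
    """Solve alpha = 1 - exp(-4 exp(-x)) for x."""
    return -math.log(-math.log(1.0 - alpha) / 4.0)

def rejects(n_times_sn, r_n, alpha=0.05):
    """Reject H_0 when the centered statistic exceeds the critical value."""
    return n_times_sn - r_n > critical_value(alpha)

c = critical_value(0.05)
print(c)
# Round trip: the limiting tail probability at c should equal alpha.
print(1.0 - math.exp(-4.0 * math.exp(-c)))
```

For \( \alpha = 0.05 \) this gives a critical value of roughly 4.36.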

Simulation Design

Setup of Simulation:

  • We examine size and power for the five tests specifically mentioned by Jager & Wellner (2007)
  • \( n=10{,}000 \) observations and \( S=1000 \) replications
  • Standard error of an estimated rejection rate \( p \): \( SE=\sqrt{\frac{p(1-p)}{S}} \)
    • Worst case: \( p=\frac{1}{2} \)
    • \( \max (SE) = \sqrt{(0.5)^2/1000} \approx 0.0158 \)
    • So results can be reported to two decimal places
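The standard-error bound can be reproduced directly. This is a small sketch with an assumed function name; the second call evaluates the same formula at the nominal level \( p = 0.05 \) for comparison.

```python
import math

def mc_standard_error(p, S):
    """Standard error of a rejection rate estimated from S replications."""
    return math.sqrt(p * (1.0 - p) / S)

S = 1000
print(mc_standard_error(0.5, S))   # worst case, p = 1/2
print(mc_standard_error(0.05, S))  # near the nominal size
```

Because \( p(1-p) \) is largest at \( p = \frac{1}{2} \), the first value bounds the standard error of every estimated size and power in the study.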

Factors

  • levels of \( r \):
    • 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99
  • levels of \( \beta \):
    • 0.5, 0.6, 0.7, 0.8, 0.9
  • levels of \( s \):
    • 2, 1, ½, 0, -1
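The factor levels can be laid out as a grid of simulation cells. The sketch below assumes the design is fully crossed, which the slides do not state explicitly; the variable names are illustrative.

```python
from itertools import product

R_LEVELS = [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99]
BETA_LEVELS = [0.5, 0.6, 0.7, 0.8, 0.9]
S_LEVELS = [2, 1, 0.5, 0, -1]

# Assumption: every combination of r, beta and s is simulated.
design = [
    {"r": r, "beta": beta, "s": s}
    for r, beta, s in product(R_LEVELS, BETA_LEVELS, S_LEVELS)
]
print(len(design))  # 7 * 5 * 5 = 175 cells
```

Since \( s \) only selects the statistic, one possible design choice is to generate each data set once per \( (r, \beta) \) pair and evaluate all five statistics on it.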

Results

Size of Tests

[Figure: size of tests]

r=0.01 (essentially the null hypothesis)

[Figure: results for r=0.01]

r=0.1

[Figure: results for r=0.1]

r=0.3

[Figure: results for r=0.3]

r=0.5

[Figure: results for r=0.5]

r=0.7

[Figure: results for r=0.7]

r=0.9

[Figure: results for r=0.9]

r=0.99

[Figure: results for r=0.99]

Conclusion

  • \( s=1 \) performed best among the five statistics considered

\[ u=i/n \;\;\; v=p_{(i)} \]

\[ K_1(u, v) = u\log\left(\frac{u}{v}\right)+(1-u)\log\left(\frac{1-u}{1-v}\right) \]
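A minimal sketch of \( K_1 \) and the resulting \( S_n(1) \) follows. The function names are assumptions, and the code adopts the convention \( 0 \log 0 = 0 \), which matters at \( i = n \) where \( u = 1 \) makes the second term degenerate.

```python
import math

def k1(u, v):
    """K_1(u, v) = u log(u/v) + (1-u) log((1-u)/(1-v)), with 0 log 0 = 0."""
    total = 0.0
    if u > 0.0:
        total += u * math.log(u / v)
    if u < 1.0:
        total += (1.0 - u) * math.log((1.0 - u) / (1.0 - v))
    return total

def sn_k1(sorted_pvals):
    """S_n(1) = max_i K_1(i/n, p_(i)) for p-values sorted in increasing order."""
    n = len(sorted_pvals)
    ks = [
        k1(i / n, p)
        for i, p in enumerate(sorted_pvals, start=1)
        if 0.0 < p < 1.0  # skip p-values that equal 0 or 1
    ]
    return max(ks, default=0.0)

print(sn_k1([0.01, 0.2, 0.5, 0.9]))
```

\( K_1(u, v) \) is zero when \( u = v \) and grows as the empirical proportion \( i/n \) moves away from the p-value \( p_{(i)} \).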