July, 2019

Parametric & Non-Parametric Hypotheses

  • A hypothesis is parametric if the family of distributions it determines can be put into one-to-one correspondence with a subset of a finite-dimensional Euclidean space.
  • Otherwise, it is a non-parametric hypothesis.
  • Every simple hypothesis is a parametric hypothesis.
  • A non-parametric hypothesis is a composite hypothesis which is not parametric.

Examples

  • \(X \sim Bin(n,p)\). To test \(H_0 : p = 0.5\) v/s \(H_1 : p = 0.75\). Both are parametric hypotheses.
  • \(X \sim Bin(n,p)\). To test \(H_0 : p = 0.5\) v/s \(H_1 : p > 0.5\). Here, \(\Theta_1 = \{p : p > 0.5\}\). Both are parametric hypotheses.
  • \((X,Y) \sim F(x,y)\), where \(F\) is bivariate normal with marginals \(N(0,1)\) and \(N(1,1)\). To test \(H_0 : \rho = \rho_0\) v/s \(H_1 : \rho > \rho_0\). Here \(\Theta_1 = \{\rho : \rho > \rho_0\}\). Both are parametric hypotheses.
  • \(X \sim F\), where \(F\) belongs to a location family with location parameter \(\theta\). To test \(H_0 : \theta = 0\) v/s \(H_1 : \theta = 0.5\). Both are non-parametric hypotheses.

Distribution Free

  • Suppose \((X_1,X_2,...,X_n) \sim F\).
  • \(F \in \mathcal{F}\), a class of distributions.
  • \(T(X_1,X_2,...,X_n)\) is distribution free if its distribution is the same for all \(F \in \mathcal{F}\).
  • We will see examples of distribution-free statistics later on.
  • ‘Non-parametric’ and ‘distribution-free’ do not have the same meaning.
  • We will see why later on.

Non-Parametrics

  • No distributional assumptions.
  • Testing problems analogous to the parametric ones.
  • Test statistics are easy to calculate.
  • These need not be functions of the actual magnitudes of the observations; often they depend only on signs or ranks.
  • In many cases they are discrete random variables taking ‘few’ values.
  • Limiting distributions are typically normal or chi-square.
  • Robust procedures.

Tests for the One-Sample Location Family

  • Sign test.
  • Wilcoxon sign-rank test.

Sign Test

  • \(X_1,X_2,...,X_n\) IID \(F(x-\theta)\).
  • \(F \in \mathcal{F}_0 = \{F : F\) is absolutely continuous, and \(F(0) = \frac{1}{2}\}\).
  • To test
    • \(H_0 : \theta = 0\),
    • \(H_1 : \theta > 0\),
    • \(H_2 : \theta < 0\),
    • \(H_3 : \theta \neq 0\).

Remarks

  • To test \(H_0^* : \theta = \theta_0\) (known) v/s the corresponding alternative hypotheses, consider \(Y_i = X_i - \theta_0, \hspace{1mm} i =1(1)n\).
  • All the hypotheses are non-parametric.
  • One can make appropriate adjustments to test for quantiles.

Sign Statistic

  • \(S = \sum_{i=1}^{n} I_{(0,\infty)}(X_i)\).
  • \(B = \sum_{i=1}^{n} I_{(-\infty,0)}(X_i)\).
  • \(S\) counts the number of sample observations which are positive.
  • \(S \sim Bin(n,p)\).
  • \(p=P(X_1>0)=1-P(X_1 \leq 0)=1-F(-\theta)\).
  • \(S\) is not distribution free.
  • But under \(H_0\), \(S \sim Bin(n,\frac{1}{2})\).
  • The sign statistic is distribution free only under \(H_0\); a quick simulation check follows.
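
Since \(S\) is distribution free under \(H_0\), every \(F \in \mathcal{F}_0\) gives the same \(Bin(n,\frac{1}{2})\) behaviour when \(\theta = 0\). A minimal simulation sketch in base R (the two choices of \(F\) below are ours, for illustration only):

# Under H_0, S ~ Bin(n, 1/2) for every F in F_0 (any median-zero F).
# Compare two very different members of F_0 by Monte Carlo.
set.seed(1)
n = 20
S.norm = replicate(5000, sum(rnorm(n) > 0))      # F = N(0,1)
S.cauchy = replicate(5000, sum(rcauchy(n) > 0))  # F = Cauchy(0,1)
c(mean(S.norm), mean(S.cauchy))                  # both close to n/2 = 10
c(var(S.norm), var(S.cauchy))                    # both close to n/4 = 5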

Rejection rules

  • Reject \(H_0\) in favour of \(H_1\) for large values of \(S\).
  • Reject \(H_0\) in favour of \(H_2\) for small values of \(S\).
  • Reject \(H_0\) in favour of \(H_3\) for small or large values of \(S\).

Testing of \(H_0\) v/s \(H_1\) with respect to a given size \(\alpha\)

The test function is \[ \phi(S) = \begin{cases} 1, & \text{if } S > s,\\ a, & \text{if } S = s,\\ 0, & \text{if } S < s, \end{cases} \] where \(s\) is such that \(P_{H_0}(S>s) \leq \alpha < P_{H_0}(S \geq s)\) and \(a \in [0,1)\) is such that \(E_{H_0}\phi=\alpha\).
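
The cut-off \(s\) and the randomization constant \(a\) can be computed directly from the \(Bin(n,\frac{1}{2})\) null distribution. A sketch in base R (the function name sign.cutoff is ours, for illustration; the constructions for the tests against \(H_2\) and \(H_3\) below are analogous):

# Find s with P(S > s) <= alpha < P(S >= s), then a with E_{H_0}(phi) = alpha.
sign.cutoff = function(n, alpha)
{
  p.greater = 1 - pbinom(0:n, n, 0.5)       # P(S > s) for s = 0, 1, ..., n
  s = min(which(p.greater <= alpha)) - 1    # smallest s with P(S > s) <= alpha
  a = (alpha - p.greater[s + 1]) / dbinom(s, n, 0.5)
  list(s = s, a = a)
}
sign.cutoff(n = 10, alpha = 0.05)           # s = 8, a = (0.05 - 11/1024)/(45/1024)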

Testing of \(H_0\) v/s \(H_2\) with respect to a given size \(\alpha\)

The test function is \[ \phi(S) = \begin{cases} 1, & \text{if } S < s,\\ a, & \text{if } S = s,\\ 0, & \text{if } S > s, \end{cases} \] where \(s\) is such that \(P_{H_0}(S<s) \leq \alpha < P_{H_0}(S \leq s)\) and \(a \in [0,1)\) is such that \(E_{H_0}\phi=\alpha\).

Testing of \(H_0\) v/s \(H_3\) with respect to a given size \(\alpha\)

The test function is \[ \phi(S) = \begin{cases} 1, & \text{if } S < s_1 \text{ or } S > s_2,\\ a_1, & \text{if } S = s_1,\\ a_2, & \text{if } S = s_2,\\ 0, & \text{if } s_1<S<s_2, \end{cases} \] where \(s_1, s_2\) are such that \(P_{H_0}(S<s_1) \leq \alpha_1 < P_{H_0}(S \leq s_1)\), \(P_{H_0}(S>s_2) \leq \alpha_2 < P_{H_0}(S \geq s_2)\), and \(a_1, a_2 \in [0,1)\) are such that \(P_{H_0}(S<s_1)+a_1P_{H_0}(S=s_1)=\alpha_1\), \(P_{H_0}(S>s_2)+a_2P_{H_0}(S=s_2)=\alpha_2\), with \(0<\alpha_1, \alpha_2\) and \(\alpha_1+\alpha_2=\alpha<1\).

Limiting distribution

  • By CLT, \(T = \frac{S-\frac{n}{2}}{\sqrt{\frac{n}{4}}} \Longrightarrow Z, Z \sim N(0,1)\), under \(H_0\).
  • For testing \(H_0\) v/s \(H_1\), the test function is \[ \phi(t) = \begin{cases} 1, & \text{if } t>\tau_{\alpha},\\ 0, & \text{otherwise}. \end{cases} \]
  • For testing \(H_0\) v/s \(H_2\), the test function is \[ \phi(t) = \begin{cases} 1, & \text{if } t<-\tau_{\alpha},\\ 0, & \text{otherwise}. \end{cases} \]

  • For testing \(H_0\) v/s \(H_3\), the test function is \[ \phi(t) = \begin{cases} 1, & \text{if } |t|>\tau_{\alpha/2},\\ 0, & \text{otherwise}. \end{cases} \]
  • \(\tau_{\alpha}\) is the upper \(\alpha\) point of a \(N(0,1)\) distribution. A quick numerical check of the approximation follows.
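
As a quick check of the approximation, one can compare the exact binomial tail probability with its normal counterpart. A base R sketch (the values n = 25, S = 16 are ours, chosen only for illustration):

# Exact vs approximate one-sided p-value of the sign test (H_0 vs H_1).
n = 25; S.obs = 16
exact = 1 - pbinom(S.obs - 1, n, 0.5)        # P(S >= S.obs) under H_0
approx = 1 - pnorm((S.obs - n/2)/sqrt(n/4))  # normal approximation via T
c(exact = exact, approx = approx)
# Replacing S.obs by S.obs - 0.5 (continuity correction) sharpens the approximation.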

Remark

  • Suppose \((X_1,Y_1),(X_2,Y_2),...,(X_n,Y_n)\) is a random sample of size \(n\) from a bivariate population.
  • \(D_i = X_i - Y_i, \hspace{1mm} i = 1(1)n\).
  • The sign test can be used to test for the location of \(D\), as in the sketch below.
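
A minimal sketch of this paired use in base R (the data are simulated only for illustration):

# Sign test for the median of D = X - Y from paired data.
set.seed(2)
x = rnorm(15, 5, 1)
y = rnorm(15, 4.5, 1)
d = x - y
binom.test(sum(d > 0), length(d), p = 0.5, alternative = 'two.sided')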

Sign test in R

library(nonpar)
x = c(1.8, 3.3, 5.65, 2.25, 2.5, 3.5, 2.75, 3.25, 3.10, 2.70, 3, 4.75, 3.4)
signtest(x, m = 3.5, alpha = 0.05, alternative = 'two.sided', conf.level = 0.95, exact = TRUE)
## 
##  Large Sample Approximation for the Sign Test 
##  
##  H0: The population median is =  3.5 
##  HA: The population median is not equal to  3.5 
##  
##  B = 10 
##  
##  Significance Level = 0.05 
##  The p-value is  0.043308142810792 
##  There is enough evidence to conclude that the population median is different than 3.5 at a significance level of  0.05 
##  
##  The  95 % confidence interval is [ 2.25 ,  3.3 ]. 
## 

sign.test = function(data, quantile = 0.5, theta = 0, alternative = 'two.sided')
{
  n = length(data)
  y = data - theta                  # shift so that H_0 becomes a statement about 0
  S = sum(y > 0)                    # sign statistic: number of positive Y_i
  # Under H_0, S ~ Bin(n, 1 - quantile), so the exact test is a binomial test.
  b = binom.test(S, n, p = 1 - quantile, alternative = alternative)
  list('Exact sign test',
       'Quantile' = quantile,
       'Theta' = theta,
       'Alternative hypothesis' = alternative,
       'Value of the sign statistic' = S,
       'p-value of the test' = b$p.value)
}

x = c(1.8, 3.3, 5.65, 2.25, 2.5, 3.5, 2.75, 3.25, 3.10, 2.70, 3, 4.75, 3.4)
sign.test(data = x, quantile = 0.5, 
          theta = 3.5, alternative = 'two.sided')
## [[1]]
## [1] "Exact sign test"
## 
## $Quantile
## [1] 0.5
## 
## $Theta
## [1] 3.5
## 
## $`Alternative hypothesis`
## [1] "two.sided"
## 
## $`Value of the sign statistic`
## [1] 2
## 
## $`p-value of the test`
## [1] 0.02246094

Wilcoxon Sign-Rank Test

  • \(X_1,X_2,...,X_n\) IID \(F(x-\theta)\).
  • \(F \in \mathcal{F}_s = \{F : F \in \mathcal{F}_0, F(x) = 1 - F(-x) \forall x\}\).
  • To test
    • \(H_0 : \theta = 0\),
    • \(H_1 : \theta > 0\),
    • \(H_2 : \theta < 0\),
    • \(H_3 : \theta \neq 0\).

Remarks

  • To test \(H_0^* : \theta = \theta_0\) (known) v/s the corresponding alternative hypotheses, consider \(Y_i = X_i - \theta_0, \hspace{1mm} i =1(1)n\).
  • All the hypotheses are non-parametric.

Test Statistic

  • Find \(|X_1|,|X_2|,...,|X_n|\).
  • Let \(R_i = rank(|X_i|), i =1 (1) n\).
  • Let \(S_i = I_{(0,\infty)}(X_i), i = 1 (1) n\).
  • \(W = \sum_{i=1}^{n} R_i S_i\).
  • Let \(D_j = i\) when \(|X|_{(j)} = |X_i|\), where \(|X|_{(j)}\) denotes the \(j\)-th smallest of \(|X_1|,|X_2|,...,|X_n|\) (the \(D_j\) are the anti-ranks).
  • Let \(W_j = S_{D_j}, j = 1(1)n\).
  • \(T = \sum_{j=1}^{n} j W_j\), which equals \(W\) (see the sketch below).
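
\(W\) and \(T\) are the same statistic written two ways: summing rank times sign over observations, or summing \(j\) times the sign attached to the \(j\)-th smallest absolute value. A base R sketch verifying this numerically (the data are simulated only for illustration):

# W = sum_i R_i S_i and T = sum_j j W_j coincide (no ties, with probability 1 for continuous F).
set.seed(3)
x = rnorm(10, 0.5, 1)
r = rank(abs(x))                   # R_i
s = as.numeric(x > 0)              # S_i
W = sum(r * s)
d = order(abs(x))                  # D_j : the index i with |X|_(j) = |X_i|
T.stat = sum(seq_along(d) * s[d])  # T = sum_j j * W_j, with W_j = S_{D_j}
c(W = W, T = T.stat)               # identical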

Rejection rules

  • Reject \(H_0\) in favour of \(H_1\) for large values of \(W\) or \(T\).
  • Reject \(H_0\) in favour of \(H_2\) for small values of \(W\) or \(T\).
  • Reject \(H_0\) in favour of \(H_3\) for small or large values of \(W\) or \(T\).

Results

  • Under \(H_0\), \(S_1,S_2,...,S_n\) and the vector of ranks \(\textbf{R} = (R_1,R_2,...,R_n)\) are mutually independent.
  • \(S_1,S_2,...,S_n\) are independent of \(D_1,D_2,...,D_n\).
  • Under \(H_0\), \(W_1,W_2,...,W_n\) are IID random variables with \(P[W_i = 1] = P[W_i = 0] = \frac{1}{2}\).
  • Under \(H_0\), \(T\) is a linear combination of IID \(Ber(1/2)\) random variables.
  • \(T \in \mathcal{T} = \{0,1,2,...,\frac{n(n+1)}{2}\}\).
  • \(E(T) = \frac{n(n+1)}{4}\) and \(Var(T) = \frac{n(n+1)(2n+1)}{24}\), under \(H_0\).
  • \(T\) is distribution free under \(H_0\).

More Results

  • Under \(H_0\), the MGF of \(T\) is given by \(M(t) = \frac{1}{2^n} \prod_{j=1}^{n}(1+e^{tj})\); this factorization also yields the exact null distribution (see the sketch after this list).
  • Under \(H_0\), \(T\) is symmetric about its mean.
  • Under \(H_0\), \(T^* = \sum_{i=1}^{n} R_i (1-S_i)\) has the same distribution as \(T\).
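
Since the probability generating function of \(T\) under \(H_0\) is \(\prod_{j=1}^{n} \frac{1+x^j}{2}\), the exact null distribution can be tabulated by successive polynomial convolutions. A base R sketch (the function name wilcoxon.null is ours, for illustration; base R's dsignrank gives the same probabilities):

# Exact null pmf of T from the generating function prod_j (1 + x^j)/2.
wilcoxon.null = function(n)
{
  p = 1                                         # P(T = 0) = 1 for the empty sum
  for (j in 1:n)
    p = (c(p, rep(0, j)) + c(rep(0, j), p)) / 2 # convolve with (1 + x^j)/2
  p                                             # p[k + 1] = P(T = k), k = 0(1)n(n+1)/2
}
p = wilcoxon.null(10)
k = 0:(length(p) - 1)
c(sum(k * p), sum(k^2 * p) - sum(k * p)^2)      # 27.5 and 96.25, i.e. n(n+1)/4 and n(n+1)(2n+1)/24
all.equal(p, dsignrank(k, 10))                  # matches base R's exact distribution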

Large Sample Approximation

  • Suppose \(Z_1,Z_2,...,Z_n\) are IID with \(E(Z_i) = 0, Var(Z_i) = \sigma^2, 0 < \sigma^2 < \infty\). Let \(S = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} a_i Z_i\). If \(\frac{|a|_{(n)}}{\sqrt{\sum_{i=1}^{n} a_i^2}} \longrightarrow 0\) as \(n \longrightarrow \infty\), then \(\frac{S}{\sqrt{Var(S)}} \Longrightarrow Z, Z \sim N(0,1)\) as \(n \longrightarrow \infty\), where \(Var(S) = \frac{\sigma^2}{n} \sum_{i=1}^{n} a_i^2\).
  • \(Z_i = W_i - \frac{1}{2}, E(Z_i) = 0, Var(Z_i) = \frac{1}{4}, i=1(1)n\). Note that, \(\frac{n}{\sqrt{\frac{n(n+1)(2n+1)}{6}}} \longrightarrow 0\) as \(n \longrightarrow \infty\). Hence, \(\frac{T-\frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \Longrightarrow Z, Z \sim N(0,1)\) as \(n \longrightarrow \infty\).

Sign-Rank Test in R

wilcoxon.test = function(data, theta = 0, alternative = 'two.sided')
{
  n = length(data)
  y = data - theta
  s = as.numeric(y > 0)             # S_i: indicator that Y_i is positive
  r = rank(abs(y))                  # R_i: rank of |Y_i|
  W = sum(r * s)                    # Wilcoxon sign-rank statistic
  # Standardize with the null mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
  Z = (W - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
  p.value = switch(alternative,
                   'two.sided' = 2 * (1 - pnorm(abs(Z))),
                   'greater' = 1 - pnorm(Z),
                   'less' = pnorm(Z))
  list('Value of the Wilcoxon sign-rank statistic' = W,
       'p-value of the large sample test' = p.value)
}

x = rnorm(100, 3.5, 1)
wilcoxon.test(data = x, theta = 3, alternative = 'greater')
## $`Value of the Wilcoxon sign-rank statistic`
## [1] 3733
## 
## $`p-value of the large sample test`
## [1] 1.637167e-05

Two Sample Problems

  • Let \(X_1,X_2,...,X_m\) and \(Y_1,Y_2,...,Y_n\) be independent samples from two absolutely continuous distribution functions \(F_X\) and \(F_Y\), respectively.
  • Null hypothesis, \(H_0 : F_X(x) = F_Y(x), \forall x \in \mathbf{R}\).
  • Different types of alternatives, e.g.
  • Location alternative : \(F_Y(x) = F_X(x-\theta), \theta \neq 0\).
  • Scale alternative : \(F_Y(x) = F_X(x/\sigma), \sigma \neq 1\).
  • Lehmann alternative : \(F_Y(x) = 1 - [1-F_X(x)]^{\theta+1}, \theta+1>0\).
  • Stochastic alternative: \(F_Y(x) \geq F_X(x), \forall x\), and \(F_Y(x) > F_X(x)\) for at least one \(x\).
  • General alternative : \(F_Y(x) \neq F_X(x)\) for some \(x\).
  • A continuous random variable \(X\) is stochastically larger than a continuous random variable \(Y\) if \(P(X>x) \geq P(Y>x), \forall x\), with strict inequality for at least one \(x\).

Wald-Wolfowitz Runs Test

  • Combine and arrange the \(m\) \(X'\)s and \(n\) \(Y'\)s in increasing order of size.
  • Keep the identity of each value by noting the population from which it comes, and thus replace each value by the letter \(X\) or the letter \(Y\), as the case may be.
  • Consider this array and count the number of runs.
  • A run is a sequence of identical letters preceded and followed by a different letter or no letter at all.
  • If \(H_0\) is true, the \((m+n)\) values come from the same population.
  • \(X\) and \(Y\) values will be well mixed.
  • The total number of runs, \(R\), will be relatively large.
  • \(R\) will be small if the samples come from different populations, i.e. if \(H_0\) is false.
  • In the extreme case, if all the values of \(Y\) are greater than all the values of \(X\) or vice versa, then there will be only two runs.

Rejection Rule

  • When testing against \(H : F_X(x) \neq F_Y(x)\) for at least one \(x\), too few runs lead to rejection of \(H_0\).
  • The level \(\alpha\) rejection region will be \(R \leq r_{\alpha}\).
  • \(r_{\alpha}\) is the largest integer such that \(P_{H_0}(R \leq r_{\alpha}) \leq \alpha\).

Exact Null Distribution of \(R\)

  • Under \(H_0\), all the \(\binom{m+n}{m} = \binom{m+n}{n}\) distinguishable arrangements of \(m\) \(X'\)s and \(n\) \(Y'\)s in a line are equally likely.
  • Let \(R = 2d\) (even).
  • Happens iff one has \(d\) runs of \(X'\)s and \(d\) runs of \(Y'\)s.
  • First run may be either of \(X\) or of \(Y\).
  • To get \(d\) runs of \(X\), partition the \(m\) \(X\)’s into \(d\) non-empty groups (\(d \leq m\)).
  • This can be done by placing the \(X\)’s in a line and inserting \((d-1)\) bars into the gaps between adjacent \(X\)’s, with at most one bar per gap.
  • There are \((m-1)\) such gaps, and choosing \((d-1)\) of them gives \(\binom{m-1}{d-1}\) distinguishable arrangements of the \(m\) \(X\)’s in \(d\) non-empty groups.

Exact Null Distribution of \(R\) (Contd.)

  • Similarly, there are \(\binom{n-1}{d-1}\) distinguishable arrangements of the \(n\) \(Y\)’s in \(d\) groups.
  • Total number of distinguishable arrangements giving \(d\) runs of \(X\) and \(d\) runs of \(Y\), beginning with a run of \(X\) (or of \(Y\)), is \(\binom{m-1}{d-1} \binom{n-1}{d-1}\).
  • Under \(H_0\) they are equally likely.
  • \(P[R = 2d| H_0] = \frac{2 \binom{m-1}{d-1} \binom{n-1}{d-1}}{\binom{m+n}{m}}\).
  • For \(R = 2d + 1\) (odd), \(P[R = 2d+1|H_0] = \frac{\binom{m-1}{d} \binom{n-1}{d-1}+\binom{m-1}{d-1} \binom{n-1}{d}}{\binom{m+n}{m}}\).
  • Tables of critical values of \(R\) based on the exact distribution are given by Swed and Eisenhart; the sketch below computes the exact pmf directly.
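
A base R sketch of this computation (the function name runs.pmf is ours, for illustration):

# Exact null pmf of R; choose() returns 0 outside the valid range,
# so the formulas can be applied for all r = 2, ..., m + n.
runs.pmf = function(m, n)
{
  total = choose(m + n, m)
  r = 2:(m + n)
  d = r %/% 2
  even = r %% 2 == 0
  p = numeric(length(r))
  p[even] = 2 * choose(m - 1, d[even] - 1) * choose(n - 1, d[even] - 1) / total
  p[!even] = (choose(m - 1, d[!even]) * choose(n - 1, d[!even] - 1) +
              choose(m - 1, d[!even] - 1) * choose(n - 1, d[!even])) / total
  names(p) = r
  p
}
p = runs.pmf(10, 9)
sum(p[as.numeric(names(p)) <= 4])             # P(R <= 4) = 0.001764..., the exact p-value in the next section
max(as.numeric(names(p))[cumsum(p) <= 0.05])  # r_alpha for alpha = 0.05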

Moments of Exact Null Distribution of \(R\)

  • \(E(R|H_0) = \frac{2mn}{m+n} + 1\).
  • \(Var(R|H_0) = \frac{2mn(2mn-m-n)}{(m+n)^2 (m+n-1)}\).

Asymptotic Null Distribution

  • Valid for \(m>10, n>10\).
  • Assume \(\frac{m}{m+n}\) remains constant as \((m+n) \longrightarrow \infty\).
  • Then \(R \sim N(\frac{2mn}{m+n} + 1, \frac{2mn(2mn-m-n)}{(m+n)^2 (m+n-1)})\) asymptotically under \(H_0\).

Run Test in R

library(DescTools)
A = c(35, 44, 39, 50, 48, 29, 60, 75, 49, 66)
B = c(17, 23, 13, 24, 33, 21, 18, 16, 32)
RunsTest(A, B, alternative = 'less', exact = TRUE)
## 
##  Wald-Wolfowitz Runs Test
## 
## data:  A and B
## runs = 4, m = 10, n = 9, p-value = 0.001764
## alternative hypothesis: true number of runs is less than expected
RunsTest(A, B, alternative = 'less', exact = FALSE)
## 
##  Wald-Wolfowitz Runs Test
## 
## data:  A and B
## z = -2.8287, runs = 4, m = 10, n = 9, p-value = 0.002337
## alternative hypothesis: true number of runs is less than expected

run.test = function(x, y)
{
  m = length(x)
  n = length(y)
  N = m + n
  z = c(x, y)
  l = c(rep(1, m), rep(0, n))       # population labels: 1 for X, 0 for Y
  l = l[order(z)]                   # labels in increasing order of the pooled sample
  R = length(rle(l)$lengths)        # total number of runs in the label sequence
  # Standardize with the null mean 2mn/N + 1 and variance 2mn(2mn - N)/(N^2 (N - 1)).
  Z = (R - ((2 * m * n / N) + 1)) / sqrt(2 * m * n * (2 * m * n - N) / (N^2 * (N - 1)))
  p.value = pnorm(Z)                # one-sided: too few runs give small p-values
  list("Number of Runs" = R, "p-value of The Large Sample Run Test" = p.value)
}

x = rnorm(25, 3, 1)
y = rnorm(30, 5, 2)
run.test(x, y)
## $`Number of Runs`
## [1] 18
## 
## $`p-value of The Large Sample Run Test`
## [1] 0.002400373