Non Parametric Inference

July, 2019

Parametric & Non Parametric Hypothesis

A hypothesis is parametric if the family of distributions it determines can be put into one-to-one correspondence with a subset of a finite-dimensional Euclidean space.

Otherwise, the hypothesis is non parametric.

Every simple hypothesis is a parametric hypothesis.

A non parametric hypothesis is always a composite hypothesis; it is never simple.

Examples

\(X \sim Bin(n,p)\). To test \(H_0 : p = 0.5\) v/s \(H_1 : p = 0.75\). Both are parametric hypotheses.

\(X \sim Bin(n,p)\). To test \(H_0 : p = 0.5\) v/s \(H_1 : p > 0.5\). Here, \(\Theta_1 = \{p : p > 0.5\}\). Both are parametric hypotheses.

\((X,Y) \sim F(x,y)\), where \(F\) is bivariate normal with marginals \(N(0,1)\) and \(N(1,1)\). To test \(H_0 : \rho = \rho_0\) v/s \(H_1 : \rho > \rho_0\). Here \(\Theta_1 = \{\rho : \rho > \rho_0\}\). Both are parametric hypotheses.

\(X \sim F\), where \(F\) belongs to a location family with location parameter \(\theta\). To test \(H_0 : \theta = 0\) v/s \(H_1 : \theta = 0.5\). Both are non parametric hypotheses, since the form of \(F\) is left unspecified.

Distribution Free

Suppose \((X_1,X_2,...,X_n) \sim F\).

\(F \in \mathcal{F}\), a class of distributions.

\(T(X_1,X_2,...,X_n)\) is distribution free if its distribution is the same for all \(F \in \mathcal{F}\).

Examples of distribution free statistics appear later on.

'Non parametric' and 'distribution free' do not mean the same thing.

The reason will become clear later on.

Non Parametrics

No distributional assumptions.

Testing problems analogous to parametrics.

Test statistics are easy to calculate.

These need not be functions of the observed values themselves; often they depend only on signs or ranks.

In many cases these would be discrete random variables taking 'few' values.

Limiting distributions are usually normal or chi-square.

Robust procedures.

Test for One Sample Location Family

Sign test.

Wilcoxon signed-rank test.

Sign Test

\(X_1,X_2,...,X_n\) IID \(F(x-\theta)\).

\(F \in \mathcal{F}_0 = \{F : F \text{ is absolutely continuous and } F(0) = \frac{1}{2}\}\).

To test
- \(H_0 : \theta = 0\),
- \(H_1 : \theta > 0\),
- \(H_2 : \theta < 0\),
- \(H_3 : \theta \neq 0\).

Remarks

To test \(H_0^* : \theta = \theta_0\) (known) v/s the corresponding alternative hypotheses, consider \(Y_i = X_i - \theta_0\), \(i = 1, 2, \ldots, n\).

All the hypotheses are non parametric.

One can make appropriate adjustments to test for quantiles.
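The quantile adjustment can be sketched as follows. If \(H_0\) states that the \(q\)-th quantile equals \(\theta_0\), then under \(H_0\) the number of observations exceeding \(\theta_0\) is \(Bin(n, 1-q)\) rather than \(Bin(n, \frac{1}{2})\). The values of `n`, `S_obs` and `q` below are hypothetical and chosen only for illustration.

```r
# Sketch: sign-type test for a quantile.
# Under H0: (q-th quantile) = theta0, the number of observations
# exceeding theta0 is Bin(n, 1 - q).
# n, S_obs and q are hypothetical values chosen for illustration.
n     <- 20
S_obs <- 9      # number of observations above theta0
q     <- 0.75

# Upper-tailed p-value: large counts suggest the q-th quantile exceeds theta0.
p_direct <- pbinom(S_obs - 1, n, 1 - q, lower.tail = FALSE)   # P(S >= S_obs)
p_binom  <- binom.test(S_obs, n, p = 1 - q, alternative = "greater")$p.value

print(c(direct = p_direct, binom.test = p_binom))
```

Both computations give the same tail probability; `binom.test` is simply a convenient wrapper around it.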

Sign Statistic

\(S = \sum_{i=1}^{n} I_{(0,\infty)}(X_i)\).

\(B = \sum_{i=1}^{n} I_{(-\infty,0)}(X_i)\).

\(S\) counts the positive sample observations and \(B\) the negative ones. Since \(F\) is continuous, \(S + B = n\) with probability 1.

\(S \sim Bin(n,p)\).

\(p=P(X_1>0)=1-P(X_1 \leq 0)=1-F(-\theta)\).

In general \(S\) is not distribution free, because \(p\) depends on \(F\).

But under \(H_0\), \(S \sim Bin(n,\frac{1}{2})\) for every \(F \in \mathcal{F}_0\). The sign statistic is distribution free only under \(H_0\).
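To make the dependence on \(F\) concrete, the sketch below evaluates \(p = 1 - F(-\theta)\) for two members of the class, the standard normal and the standard Cauchy. Both choices, the value \(\theta = 0.5\), and the helper name `p_sign` are assumptions made only for illustration.

```r
# Success probability p = P(X_1 > 0) = 1 - F(-theta) for two choices of F.
# The normal and Cauchy cdfs and theta = 0.5 are illustrative assumptions.
p_sign <- function(theta, cdf) 1 - cdf(-theta)

theta  <- 0.5
p_alt  <- c(normal = p_sign(theta, pnorm), cauchy = p_sign(theta, pcauchy))
p_null <- c(normal = p_sign(0, pnorm),     cauchy = p_sign(0, pcauchy))

print(p_alt)   # differs across F, so Bin(n, p) depends on F when theta != 0
print(p_null)  # both equal 1/2, so S ~ Bin(n, 1/2) whatever F is
```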

Rejection rules

Reject \(H_0\) in favour of \(H_1\) for large values of \(S\).

Reject \(H_0\) in favour of \(H_2\) for small values of \(S\).

Reject \(H_0\) in favour of \(H_3\) for small or large values of \(S\).

Testing of \(H_0\) v/s \(H_1\) with respect to a given size \(\alpha\)

The test function is \[ \phi(s) = \begin{cases} 1, \text{if} \hspace{1mm} S > s,\\ a, \text{if} \hspace{1mm} S = s,\\ 0, \text{if} \hspace{1mm} S < s, \end{cases} \] where \(s\) is such that \(P_{H_0}(S>s) \leq \alpha < P_{H_0}(S \geq s)\) and \(a \in [0,1)\) is such that \(E_{H_0}\phi=\alpha\).
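The constants \(s\) and \(a\) can be computed directly from the \(Bin(n,\frac{1}{2})\) distribution. The following is a minimal sketch; the function name `upper_sign_test` and the values \(n = 12\), \(\alpha = 0.05\) are assumptions for illustration.

```r
# Sketch: critical value s and randomization constant a for the size-alpha
# upper-tailed sign test. The function name is hypothetical.
upper_sign_test <- function(n, alpha) {
  k     <- 0:n
  upper <- pbinom(k, n, 0.5, lower.tail = FALSE)   # P(S > k) under H0
  s     <- min(k[upper <= alpha])                  # smallest k with P(S > k) <= alpha
  a     <- (alpha - upper[s + 1]) / dbinom(s, n, 0.5)
  size  <- upper[s + 1] + a * dbinom(s, n, 0.5)    # E_{H0} phi, equals alpha
  list(s = s, a = a, size = size)
}

res <- upper_sign_test(n = 12, alpha = 0.05)
print(res)
```

For \(n = 12\) and \(\alpha = 0.05\) this gives \(s = 9\) and \(a \approx 0.572\): reject outright when \(S \geq 10\), and reject with probability \(a\) when \(S = 9\).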

Testing of \(H_0\) v/s \(H_2\) with respect to a given size \(\alpha\)

The test function is \[ \phi(s) = \begin{cases} 1, \text{if} \hspace{1mm} S < s,\\ a, \text{if} \hspace{1mm} S = s,\\ 0, \text{if} \hspace{1mm} S > s, \end{cases} \] where \(s\) is such that \(P_{H_0}(S<s) \leq \alpha < P_{H_0}(S \leq s)\) and \(a \in [0,1)\) is such that \(E_{H_0}\phi=\alpha\).

Testing of \(H_0\) v/s \(H_3\) with respect to a given size \(\alpha\)

The test function is \[ \phi(s) = \begin{cases} 1, \text{if} \hspace{1mm} S < s_1 \hspace{1mm} \text{or} \hspace{1mm} S>s_2,\\ a_1, \text{if} \hspace{1mm} S = s_1,\\ a_2, \text{if} \hspace{1mm} S = s_2,\\ 0, \text{if} \hspace{1mm} s_1<S<s_2, \end{cases} \] where \(s_1 < s_2\) are such that \(P_{H_0}(S<s_1) \leq \alpha_1 < P_{H_0}(S \leq s_1)\), \(P_{H_0}(S>s_2) \leq \alpha_2 < P_{H_0}(S \geq s_2)\), and \(a_1, a_2 \in [0,1)\) are such that \(P_{H_0}(S<s_1)+a_1P_{H_0}(S=s_1)=\alpha_1\), \(P_{H_0}(S>s_2)+a_2P_{H_0}(S=s_2)=\alpha_2\), with \(\alpha_1, \alpha_2 > 0\) and \(\alpha_1+\alpha_2=\alpha<1\).
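A sketch of how the two cutoffs and randomization constants might be computed is given below. The function name `two_sided_sign_test` and the equal split \(\alpha_1 = \alpha_2 = 0.025\) with \(n = 12\) are assumptions for illustration; any split with \(\alpha_1 + \alpha_2 = \alpha\) can be passed in. The lower cutoff computed this way with \(\alpha_1 = \alpha\) is also the cutoff of the one-sided test against \(H_2\).

```r
# Sketch: cutoffs (s1, s2) and randomization constants (a1, a2) for the
# two-sided sign test. The function name is hypothetical.
two_sided_sign_test <- function(n, alpha1, alpha2) {
  k     <- 0:n
  lower <- pbinom(k - 1, n, 0.5)                   # P(S < k) under H0
  upper <- pbinom(k, n, 0.5, lower.tail = FALSE)   # P(S > k) under H0
  s1 <- max(k[lower <= alpha1])   # largest k with P(S < k) <= alpha1
  s2 <- min(k[upper <= alpha2])   # smallest k with P(S > k) <= alpha2
  a1 <- (alpha1 - lower[s1 + 1]) / dbinom(s1, n, 0.5)
  a2 <- (alpha2 - upper[s2 + 1]) / dbinom(s2, n, 0.5)
  list(s1 = s1, a1 = a1, s2 = s2, a2 = a2)
}

res2 <- two_sided_sign_test(n = 12, alpha1 = 0.025, alpha2 = 0.025)
print(res2)
```

With an equal split the symmetry of \(Bin(n,\frac{1}{2})\) about \(\frac{n}{2}\) gives \(s_1 + s_2 = n\) and \(a_1 = a_2\); here \(s_1 = 3\) and \(s_2 = 9\). The test is well defined only when \(s_1 < s_2\), which can fail for very small \(n\).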

Limiting distribution

By CLT, \(T = \frac{S-\frac{n}{2}}{\sqrt{\frac{n}{4}}} \Longrightarrow Z, Z \sim N(0,1)\), under \(H_0\).

For testing \(H_0\) v/s \(H_1\), the test function is \[ \phi(t) = \begin{cases} 1, \text{if} \hspace{1mm} t>\tau_{\alpha},\\ 0, \text{Otherwise}. \end{cases} \]

For testing \(H_0\) v/s \(H_2\), the test function is \[ \phi(t) = \begin{cases} 1, \text{if} \hspace{1mm} t<-\tau_{\alpha},\\ 0, \text{Otherwise}. \end{cases} \]

For testing \(H_0\) v/s \(H_3\), the test function is \[ \phi(t) = \begin{cases} 1, \text{if} \hspace{1mm} |t|>\tau_{\alpha/2},\\ 0, \text{Otherwise}. \end{cases} \]
Here \(t\) is the observed value of \(T\), and \(\tau_{\alpha}\) is the upper \(\alpha\) point of the \(N(0,1)\) distribution, that is, \(P(Z > \tau_{\alpha}) = \alpha\).
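The three large-sample rules can be sketched in a few lines. The values \(n = 100\), \(S = 61\) and \(\alpha = 0.05\) are hypothetical, and `tau` is a helper name introduced here for the upper \(\alpha\) point.

```r
# Large-sample sign test via T = (S - n/2) / sqrt(n/4).
# n, S and alpha are hypothetical values chosen for illustration.
n     <- 100
S     <- 61
alpha <- 0.05

t_obs <- (S - n / 2) / sqrt(n / 4)   # observed value of T

tau <- function(a) qnorm(1 - a)      # upper a-point of N(0, 1)

reject_H1 <- t_obs >  tau(alpha)           # H0 vs H1: theta > 0
reject_H2 <- t_obs < -tau(alpha)           # H0 vs H2: theta < 0
reject_H3 <- abs(t_obs) > tau(alpha / 2)   # H0 vs H3: theta != 0

# Approximate p-values from the same normal limit (no continuity correction).
p_H1 <- pnorm(t_obs, lower.tail = FALSE)
p_H3 <- 2 * pnorm(abs(t_obs), lower.tail = FALSE)

print(c(t = t_obs, p_H1 = p_H1, p_H3 = p_H3))
```

This sketch uses the plain normal approximation. Some implementations apply a continuity correction to \(S\) before standardizing, which changes the p-value slightly for moderate \(n\).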

Remark

Suppose \((X_1,Y_1),(X_2,Y_2),...,(X_n,Y_n)\) is a random sample of size \(n\) from a bivariate population.

Let \(D_i = X_i - Y_i\), \(i = 1, 2, \ldots, n\).

The sign test can be applied to \(D_1, D_2, \ldots, D_n\) to test for the location of \(D = X - Y\).
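A minimal sketch of the paired version follows. The vectors `x` and `y` are made-up data used only to show the mechanics, and dropping zero differences is an assumption (under the continuous model they occur with probability 0).

```r
# Sketch: sign test on paired differences D_i = X_i - Y_i.
# x and y are hypothetical paired measurements.
x <- c(12.1, 11.4, 13.0, 10.9, 12.7, 11.8, 12.5, 13.2)
y <- c(11.6, 11.0, 12.2, 11.3, 12.0, 11.1, 12.1, 12.6)

d <- x - y
d <- d[d != 0]          # assumption: discard tied pairs, if any
n_eff <- length(d)
S <- sum(d > 0)         # number of positive differences

# H0: median of D is 0, against H1: median of D is > 0.
p_value <- binom.test(S, n_eff, p = 0.5, alternative = "greater")$p.value
print(c(S = S, n = n_eff, p_value = p_value))
```

Here 7 of the 8 differences are positive, so the exact upper-tail p-value is \(P(S \geq 7) = 9/256 \approx 0.035\).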

Sign test in R

library(nonpar)
x = c(1.8, 3.3, 5.65, 2.25, 2.5, 3.5, 2.75, 3.25, 3.10, 2.70, 3, 4.75, 3.4)
signtest(x, m = 3.5, alpha = 0.05, alternative = 'two.sided', 
         conf.level = 0.95, exact = TRUE)

## 
##  Large Sample Approximation for the Sign Test 
##  
##  H0: The population median is =  3.5 
##  HA: The population median is not equal to  3.5 
##  
##  B = 10 
##  
##  Significance Level = 0.05 
##  The p-value is  0.043308142810792 
##  There is enough evidence to conclude that the population median is different than 3.5 at a significance level of  0.05 
##  
##  The  95 % confidence interval is [ 2.25 ,  3.3 ]. 
##

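# Exact sign test of H0: the `quantile`-th population quantile equals `theta`.
sign.test = function(data, quantile = 0.5, theta = 0, alternative = 'two.sided')
{
  n = length(data)
  y = data - theta            # shift so that the hypothesised value becomes 0
  S = length(which(y > 0))    # sign statistic: number of observations above theta
  # Under H0, S ~ Bin(n, 1 - quantile). Observations equal to theta stay in n,
  # in line with the continuous model, where ties have probability 0.
  b = binom.test(S, n, p = 1 - quantile, alternative = alternative)
  p.value = b$p.value
  aa = list('Exact sign test', 'Quantile' = quantile, 'Theta' = theta, 
            'Alternative hypothesis' = alternative, 'Value of the sign statistic' = S,
            'p-value of the test' = p.value)
  aa
}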
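As a cross-check, the exact binomial computation can be applied by hand to the sample used with `signtest` above. This is only a sketch: dropping the observation equal to 3.5 is an assumption made here, not something the function above does.

```r
# Sketch: exact two-sided sign test for H0: median = 3.5 on the sample above.
x      <- c(1.8, 3.3, 5.65, 2.25, 2.5, 3.5, 2.75, 3.25, 3.10, 2.70, 3, 4.75, 3.4)
theta0 <- 3.5

d <- x - theta0
d <- d[d != 0]        # assumption: drop the observation tied with theta0
n_eff <- length(d)
S <- sum(d > 0)       # positive signs
B <- sum(d < 0)       # negative signs

exact_p <- binom.test(S, n_eff, p = 0.5, alternative = "two.sided")$p.value
print(c(n = n_eff, S = S, B = B, exact_p = exact_p))
```

With the tie removed there are 12 usable observations, 2 positive and 10 negative, and the exact two-sided p-value is \(158/4096 \approx 0.0386\). The count \(B = 10\) agrees with the value printed in the output above.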