Prove the statement
\[ \sup_{X_k \le x < X_{k+1}} \left|\frac{k}{N} -F(x)\right| =\max\left(F(X_{k+1})-\frac{k}{N},\ \frac{k}{N}-F(X_k)\right), \qquad k=1,2,\ldots, N-1, \]
which was used in the derivation of \(D_N\).
If we have a sorted sample \(X_1 \le X_2 \le X_3 \le \ldots \le X_N\), then the Kolmogorov-Smirnov test statistic \(D_N\) is defined as
\[ D_N = \sup_{-\infty < x < \infty} |F_N(x) - F(x)|, \]
where \[ F_N(x) = \begin{cases} 0 & \text{if $x<X_1$} \\ \frac{k}{N} & \text{if $X_k \le x < X_{k+1}$} \\ 1 & \text{if $x \ge X_N$} \end{cases} \]
Now, applying the conditions \(x<X_1,\ X_1 \le x < X_2,\ X_2 \le x < X_3,\ \ldots,\ x \ge X_N\), we get
\[ D_N=\max\left(\sup_{x<X_1} |0-F(x)|,\ \sup_{X_1 \le x < X_{2}} \left|\tfrac{1}{N} -F(x)\right|,\ \sup_{X_2 \le x < X_{3}} \left|\tfrac{2}{N} -F(x)\right|,\ \ldots,\ \sup_{x \ge X_N} |1-F(x)| \right). \]
Since \(F\) is non-decreasing (and continuous, as the KS test assumes), the first and last terms reduce to \[\sup_{x<X_1} |0-F(x)|=F(X_1)\] and \[\sup_{x \ge X_N} |1-F(x)|=1-F(X_N).\]
We know from the properties of absolute values that \[\left|\frac{k}{N}-F(x)\right|=\begin{cases} \frac{k}{N}-F(x) & \text{if $\frac{k}{N} \ge F(x)$}\\ F(x)-\frac{k}{N} & \text{if $\frac{k}{N} < F(x)$.}\\ \end{cases}\]
Since the cumulative distribution function (CDF) \(F\) is non-decreasing (and continuous, as the KS test assumes), for any \(X_k \le X_{k+1}\) we have \(F(X_k) \le F(X_{k+1})\); moreover \(\sup_{X_k \le x < X_{k+1}} F(x) = F(X_{k+1})\) and \(\inf_{X_k \le x < X_{k+1}} F(x) = F(X_k)\). Thus
\[\sup_{X_k \le x < X_{k+1}} \left(F(x)-\frac{k}{N}\right) = F(X_{k+1})-\frac{k}{N}\]
and
\[\sup_{X_k \le x < X_{k+1}} \left(\frac{k}{N}-F(x)\right) = \frac{k}{N}-F(X_k).\]
Both expressions measure the distance between the empirical CDF and \(F\) on this interval, and since they sum to \(F(X_{k+1})-F(X_k) \ge 0\), at least one of \(F(X_{k+1})-\frac{k}{N}\) and \(\frac{k}{N}-F(X_k)\) is non-negative. Hence the supremum of the absolute value is the larger of the two:
\[ \sup_{X_k \le x < X_{k+1}} \left|\frac{k}{N} -F(x)\right| =\max\left(F(X_{k+1})-\frac{k}{N},\ \frac{k}{N}-F(X_k)\right), \qquad k=1,2,\ldots, N-1. \]
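As a quick numerical sanity check (not a substitute for the proof), a fine grid approximation of the supremum can be compared with the closed form; the sample and the choice of F as the \(U(0,1)\) CDF below are illustrative:
X <- c(0.2, 0.6, 0.7); N <- length(X)                   # illustrative sorted sample
for (k in 1:(N - 1)) {
  grid <- seq(X[k], X[k + 1] - 1e-9, length.out = 1e5)  # fine grid on [X_k, X_{k+1})
  lhs <- max(abs(k / N - punif(grid)))                  # grid approximation of the sup
  rhs <- max(punif(X[k + 1]) - k / N, k / N - punif(X[k]))
  print(c(lhs, rhs))                                    # the two agree up to grid resolution
}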
Compute \(D_N=\max(D^{+},D^{-})\) for the data set \(x_1=0.2,\ x_2=0.6,\ x_3=0.7\). Take \(F\) to be the CDF of \(U(0,1)\), the uniform distribution on \((0,1)\). What do you think \(D^+, D^-, D_N\) measure, intuitively?
We know
\[D_N = \sup_{-\infty < x < \infty}|F_N(x) - F(x)|,\]
where \[ F_N(x) = \begin{cases} 0 & \text{if $x<X_1$} \\ \frac{k}{N} & \text{if $X_k \le x < X_{k+1}$} \\ 1 & \text{if $x \ge X_N$} \end{cases} \]
So,
\[F_{N}(x)=\begin{cases}
0&\text{ if }x<0.2\\
\frac{1}{3}&\text{ if }x\in[0.2,0.6)\\
\frac{2}{3}&\text{ if }x\in[0.6,0.7)\\
1&\text{ if }x\geq0.7\\
\end{cases}\]
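Incidentally, this step function is exactly what R's built-in ecdf produces, which gives a quick check of the cases above:
Fn <- ecdf(c(0.2, 0.6, 0.7))   # empirical CDF of the sample
Fn(c(0.1, 0.3, 0.65, 0.8))     # returns 0, 1/3, 2/3, 1, matching the cases above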
Now \(D_N\) can be written as
\(\begin{aligned}D_{N}&=\max\left[F(X_{1}),\ \max_{k=1,\ldots,N-1}\left(F(X_{k+1})-\tfrac{k}{N},\ \tfrac{k}{N}-F(X_{k})\right),\ 1-F(X_{N})\right]\\ &= \max\left[\underbrace{\max_{k=1,\ldots,N}\left(\tfrac{k}{N}-F(X_{k})\right)}_{D^{+}},\ \underbrace{\max_{k=1,\ldots,N}\left(F(X_{k})-\tfrac{k-1}{N}\right)}_{D^{-}}\right]\\&=\max\{D^{+},D^{-}\}\end{aligned}\)
Accordingly,
\(\begin{aligned} D^{+}&=\max\{\tfrac{1}{3}-0.2,\ \tfrac{2}{3}-0.6,\ \tfrac{3}{3}-0.7\}\approx\max\{0.133,\ 0.067,\ 0.3\}=0.3\\ D^{-}&=\max\{0.2-\tfrac{0}{3},\ 0.6-\tfrac{1}{3},\ 0.7-\tfrac{2}{3}\}\approx\max\{0.2,\ 0.267,\ 0.033\}\approx 0.267\\ \end{aligned}\)
And thus \(D_N=\max(0.3,\ 0.267)=0.3\). Intuitively, \(D^{+}\) measures the largest amount by which the empirical CDF \(F_N\) rises above the hypothesized CDF \(F\), \(D^{-}\) the largest amount by which it falls below \(F\), and \(D_N\) the largest vertical distance between the two curves.
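As a cross-check, R's built-in KS test on the same three points reproduces this value:
ks.test(c(0.2, 0.6, 0.7), "punif")  # the reported D equals 0.3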
Prove
\[\max\left[F(X_{1}),\ \max_{k=1,\ldots,N-1}\left(F(X_{k+1})-\tfrac{k}{N},\ \tfrac{k}{N}-F(X_{k})\right),\ 1-F(X_{N})\right] = \max\left[\max_{k=1,\ldots,N}\left(\tfrac{k}{N}-F(X_{k})\right),\ \max_{k=1,\ldots,N}\left(F(X_{k})-\tfrac{k-1}{N}\right)\right],\]
which was used in the derivation of \(D_N\).
If we have a sorted sample \(X_1 \le X_2 \le X_3 \le \ldots \le X_N\), then the Kolmogorov-Smirnov test statistic \(D_N\) is defined as
\[ D_N = \sup_{-\infty < x < \infty} |F_N(x) - F(x)|, \]
where \[ F_N(x) = \begin{cases} 0 & \text{if $x<X_1$} \\ \frac{k}{N} & \text{if $X_k \le x < X_{k+1}$} \\ 1 & \text{if $x \ge X_N$} \end{cases} \]
Now, applying the conditions \(x<X_1,\ X_1 \le x < X_2,\ X_2 \le x < X_3,\ \ldots,\ x \ge X_N\), we get
\[ D_N=\max\left(\sup_{x<X_1} |0-F(x)|,\ \sup_{X_1 \le x < X_{2}} \left|\tfrac{1}{N} -F(x)\right|,\ \sup_{X_2 \le x < X_{3}} \left|\tfrac{2}{N} -F(x)\right|,\ \ldots,\ \sup_{x \ge X_N} |1-F(x)| \right). \]
The first and last terms can be written as
\[\sup_{x<X_1} |0-F(x)|=F(X_1)\] and \[\sup_{x \ge X_N} |1-F(x)|=1-F(X_N),\]
since \(F\) is non-decreasing and continuous.
For the remaining terms, by the result proved above,
\[\sup_{X_k\le x < X_{k+1}}\left|\frac{k}{N} - F(x)\right| = \max\left(F(X_{k+1}) - \frac{k}{N},\ \frac{k}{N} - F(X_k)\right), \qquad k = 1, \ldots, N - 1.\]
So,
\(\begin{aligned}D_{N}&=\max\left[F(X_{1}),\ \max_{k=1,\ldots,N-1}\left(F(X_{k+1})-\tfrac{k}{N},\ \tfrac{k}{N}-F(X_{k})\right),\ 1-F(X_{N})\right]\\ &= \max\left[\underbrace{\max_{k=1,\ldots,N}\left(\tfrac{k}{N}-F(X_{k})\right)}_{D^{+}},\ \underbrace{\max_{k=1,\ldots,N}\left(F(X_{k})-\tfrac{k-1}{N}\right)}_{D^{-}}\right]\\&=\max\{D^{+},D^{-}\}\end{aligned}\)
The second equality is just a regrouping: writing \(F(X_1)=F(X_1)-\tfrac{0}{N}\) and re-indexing the terms \(F(X_{k+1})-\tfrac{k}{N}\) with \(j=k+1\) gives \(F(X_j)-\tfrac{j-1}{N}\) for \(j=1,\ldots,N\), whose maximum is \(D^{-}\); likewise, writing \(1-F(X_N)=\tfrac{N}{N}-F(X_N)\) extends the terms \(\tfrac{k}{N}-F(X_k)\) to \(k=1,\ldots,N\), whose maximum is \(D^{+}\). This proves the identity.
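The identity can also be spot-checked numerically; here F is the \(U(0,1)\) CDF and the seed and sample size are illustrative choices:
set.seed(1)                    # illustrative check of the identity
X <- sort(runif(10)); N <- length(X); FX <- punif(X); k <- 1:(N - 1)
lhs <- max(FX[1], max(pmax(FX[k + 1] - k / N, k / N - FX[k])), 1 - FX[N])
rhs <- max(max((1:N) / N - FX), max(FX - (0:(N - 1)) / N))
c(lhs, rhs)                    # both sides coincide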
Consider the MCG with parameters \(a=23,\ M=10^8+1\) and let the seed be \(47594118\). Apply the KS test to the first 1000 random numbers (including the seed) from this generator. Compute the KS statistic and find its p-value. What is your conclusion about the generator?
N <- 1000                      # total number of observations
a <- 23                        # multiplier
M <- (10^8) + 1                # modulus M
A <- matrix(0, ncol = 1, nrow = N)  # N-by-1 matrix to hold the sample
seed <- as.double(47594118)    # initial seed
A[1, 1] <- seed / M            # first number: the normalized seed itself
mcg <- function() {            # mcg() advances the generator and returns
  seed <<- (a * seed) %% M     # the next number in (0, 1)
  round(seed / M, 6)
}
for (i in 2:N) {
  A[i, ] <- mcg()
}
# CALCULATION OF KS-STATISTIC #######################################################
## STEP 1: Obtain the sample x and sort it ##########################################
x <- sort(A[, 1])
## STEP 2: Hypothesis H_0: x follows F(x) vs H_a: x does not follow F(x) ############
## STEP 3: Compute D_N = max(D^+, D^-) ##############################################
k <- 1:N
D_N <- max(max((k / N) - x),       # D^+
           max(x - ((k - 1) / N))) # D^-
D_N
## [1] 0.036956
## STEP 4: Run the test #############################################################
ks.test(x,"punif")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.036956, p-value = 0.1302
## alternative hypothesis: two-sided
The test statistic calculated manually matches the one produced by the built-in KS test, so the calculation is correct. The p-value (0.1302) is not significant, so we do not have enough evidence against the null hypothesis: the values produced by the generator are consistent with the uniform distribution on \((0,1)\).
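For intuition about the one-sided discrepancies, ks.test can also report them separately; a minimal sketch, assuming its convention that alternative = "greater" gives \(D^{+}\) (empirical CDF above \(F\)) and alternative = "less" gives \(D^{-}\):
ks.test(x, "punif", alternative = "greater")  # statistic should be D^+
ks.test(x, "punif", alternative = "less")     # statistic should be D^-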
Implement an obviously "bad" random number generator of your choice and explain why it is bad. Then apply the \(\chi^2\) test together with the KS statistic as explained in Remark 1. Take \(M=10,\ N=1000\) and \(k=10\). What are your conclusions?
N <- 1000                      # total number of observations
M <- 10
k <- 10
# INTENTIONALLY SELECTED BAD GENERATOR
a <- 1.5                       # multiplier a = 1.5
x_n <- as.double(5)            # initial value / seed = 5
rangen <- function() {         # rangen() implements the generator
  x_n <<- (a * x_n + 0.5) %% M # x_{i+1} = (1.5 x_i + 1/2) mod 10
}
A <- matrix(0, ncol = 1, nrow = N)  # N-by-1 matrix to hold the sample
for (i in 1:N) {
  A[i, ] <- rangen()
}
X <- A[, 1]                    # X holds the N random numbers
head(sort(round(unique(X), 2)))     # a glimpse of X (rounded)
## [1] 0.03 0.05 0.10 0.10 0.11 0.15
plot(X[-N] / M, X[-1] / M)     # plot of successive pairs (x_i, x_{i+1}), normalized
# The reason we claim our generator is bad is this graph: the successive pairs
# do not scatter randomly over the unit square but collapse onto a couple of
# parallel line segments. A good RNG should fill the square with a random
# scatter, which is clearly not the case here.
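# A quick numerical check of this lattice structure: after normalizing by M,
# the recurrence gives u_{i+1} = (1.5 u_i + 0.05) mod 1 exactly, so every
# successive pair must lie on a line of slope 1.5 (wrapped mod 1).
u <- X / M
dev <- (u[-1] - (1.5 * u[-N] + 0.05)) %% 1  # deviation from those lines, mod 1
max(pmin(dev, 1 - dev))                     # essentially 0 (floating-point noise)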
# CHI-SQUARED TEST ##################################################################
H <- hist(X, breaks = seq(0, M, by = M / k))  # histogram with k equal bins on [0, M]
Y <- H$counts                  # Y holds the observed count in each bin
Y
## [1] 76 107 140 129 128 106 90 85 87 52
p <- 1 / k                     # under uniformity each bin has probability 1/k
Exp <- N * p                   # expected count in each bin
## Chi-squared test statistic ########################################################
sm <- 0
for (i in 1:k) {
  sm <- sm + ((Y[i] - Exp)^2) / Exp  # sum of (observed - expected)^2 / expected
}
sprintf("Chi-square Statistic %f", sm)
## [1] "Chi-square Statistic 66.840000"
## Testing the hypothesis ###########################################################
critical_value <- qchisq(.95, df = (k - 1))
critical_value
## [1] 16.91898
options(warn = -1)
if (sm > critical_value) {
  sprintf("We reject the null hypothesis")
} else {
  sprintf("We do not reject the null hypothesis")
}
## [1] "We reject the null hypothesis"
# KS test ###########################################################################
x <- sort(unique(X / M))       # normalized sample (ties removed for ks.test)
n <- length(x)
k <- 1:n
D_N <- max(max((k / n) - x),       # D^+
           max(x - ((k - 1) / n))) # D^-
D_N
## [1] 0.108436
ks.test(x,"punif")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.10844, p-value = 1.224e-10
## alternative hypothesis: two-sided
# CONCLUSION ########################################################################
# Both tests reject the null hypothesis: the chi-squared statistic (66.84) far
# exceeds the critical value (16.92), and the KS p-value (1.2e-10) is extremely
# small. The generator does not produce uniformly distributed random numbers.