Class 4 (11th Jan)

Get random numbers from a Normal distribution with a mean of 25 and an sd of 4

Getting 100 Normal numbers

rnorm(100,mean=25,sd=4)->HundredvaluesNormal

Getting 500 normal numbers

rnorm(500,mean=25,sd=4)->FiveHundredvaluesNormal

Plotting both side by side

par(mfrow=c(1,2))
hist(HundredvaluesNormal)
hist(FiveHundredvaluesNormal)
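
Because par(mfrow=...) stays in effect for later plots, it can help to restore the old layout afterwards. The following is a minimal sketch, assuming fresh rnorm draws rather than the vectors above; pdf(NULL) is used only so the code also runs without a display.

```r
# Open a null graphics device so the sketch also runs without a screen.
pdf(NULL)

old_par <- par(mfrow = c(1, 2))   # par() returns the previous settings
hist(rnorm(100, mean = 25, sd = 4), main = "n = 100")
hist(rnorm(500, mean = 25, sd = 4), main = "n = 500")
par(old_par)                      # back to the layout that was active before

layout_after <- par("mfrow")      # layout after restoring
dev.off()
```

After par(old_par), new plots are drawn one per page again instead of being squeezed into the two-panel layout.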

The following tests check whether the data support a hypothesis

Shapiro test

The Shapiro-Wilk test checks whether a sample is consistent with a Normal distribution. Its output reports the name of the data, the value of the test statistic (W), and the p-value

shapiro.test(HundredvaluesNormal) 
## 
##  Shapiro-Wilk normality test
## 
## data:  HundredvaluesNormal
## W = 0.98556, p-value = 0.348
shapiro.test(FiveHundredvaluesNormal)
## 
##  Shapiro-Wilk normality test
## 
## data:  FiveHundredvaluesNormal
## W = 0.99721, p-value = 0.5602
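
The values printed by shapiro.test() can also be read directly from the returned object, which is handy when they are needed later in code. A small sketch, assuming a freshly drawn sample x and an arbitrary seed:

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(100, mean = 25, sd = 4)

result <- shapiro.test(x)         # an object of class "htest"

result$statistic                  # the W statistic (a named number)
result$p.value                    # the p-value
result$data.name                  # the name of the tested object, here "x"
```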
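  • W is our test statistic. It is not a degrees-of-freedom value: it measures how closely the ordered sample values match what a Normal distribution would produce, it lies between 0 and 1, and values close to 1 are consistent with normality

  • In this run both W and the p-value are higher for the 500-value sample than for the 100-value sample. W tends to move closer to 1 as the sample size grows when the data really are Normal, but the p-value does not grow systematically with sample size, so a different seed can give the opposite ordering

  • With a larger sample the histogram looks closer to the Normal bell shape

  • If the data are normally distributed, use a parametric test; if they are not, use a non-parametric test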
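
The last point can be sketched in code. The notes do not name specific tests, so t.test() (parametric) and wilcox.test() (non-parametric) are used here purely as an assumed example of comparing two groups; the samples a and b and the seed are also made up for illustration.

```r
set.seed(2)                        # arbitrary seed
a <- rnorm(50, mean = 25, sd = 4)  # two illustrative groups
b <- rnorm(50, mean = 27, sd = 4)
alpha <- 0.05

# Treat a group as "looking normal" when Shapiro-Wilk does not reject.
both_look_normal <- shapiro.test(a)$p.value > alpha &&
  shapiro.test(b)$p.value > alpha

if (both_look_normal) {
  comparison <- t.test(a, b)       # parametric comparison of the two groups
} else {
  comparison <- wilcox.test(a, b)  # non-parametric alternative
}

comparison$method                  # which test was actually run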

NULL HYPOTHESIS - at the start of an experiment we assume something to be true (the null hypothesis), and the hypothesis test is formulated so that the data can give us grounds to reject it

  • The null hypothesis depends on the test being run. For the Shapiro test, the null hypothesis is “the data are normally distributed”

  • When the p-value comes out at 0.2 or 0.3 (as with the rnorm samples), we cannot reject the null hypothesis. The data are consistent with a Normal distribution, although this does not prove normality

  • The smaller the p-value, the stronger the evidence against the null hypothesis
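
The decision rule described above can be written as a short function. This is only a sketch: normality_decision is a hypothetical helper, not part of R, and the cut-off alpha = 0.05 is the conventional choice rather than a fixed rule.

```r
# Hypothetical helper (not part of base R): turns a Shapiro-Wilk p-value
# into a reject / cannot-reject decision at significance level alpha.
normality_decision <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p < alpha) "reject normality" else "cannot reject normality"
}

set.seed(19)
decision_normal  <- normality_decision(rnorm(100, mean = 25, sd = 4))
decision_uniform <- normality_decision(runif(500))

decision_normal    # decision for a sample that really is Normal
decision_uniform   # decision for a sample that is not Normal
```

Even for truly Normal data the helper will return "reject normality" in roughly 5% of runs, because that is what a significance level of 0.05 means.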

Generating Random numbers

Q: Get random numbers from a Poisson distribution with mean 4

Solution: r = random, pois = Poisson distribution; lambda is the mean (and also the variance) of the distribution, so it sets where the distribution is centred

rpois(100,lambda = 4)->HundredvaluesPois
rpois(5000,lambda = 4)->FiveHundredvaluesPois # note: 5000 values are drawn here, despite the variable name
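
As a quick check on what lambda controls, the sample mean and sample variance of Poisson draws should both land near lambda. A minimal sketch, assuming a fresh sample called draws and an arbitrary seed:

```r
set.seed(3)                          # arbitrary seed
draws <- rpois(5000, lambda = 4)

mean(draws)                          # should be close to lambda = 4
var(draws)                           # for a Poisson, also close to lambda

# Poisson values are counts: non-negative whole numbers.
all(draws >= 0 & draws == round(draws))
```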

Now plotting the distribution

par(mfrow=c(1,2))
hist(HundredvaluesPois)
hist(FiveHundredvaluesPois)

Testing whether the distributions are normal or not

shapiro.test(HundredvaluesPois)
## 
##  Shapiro-Wilk normality test
## 
## data:  HundredvaluesPois
## W = 0.94154, p-value = 0.0002397
shapiro.test(FiveHundredvaluesPois)
## 
##  Shapiro-Wilk normality test
## 
## data:  FiveHundredvaluesPois
## W = 0.9613, p-value < 2.2e-16

As we can see, the p-values are very small, so we reject normality for both samples. This is expected, since the values come from a Poisson distribution

Get random numbers from a Uniform distribution

r = random, unif = uniform; in a uniform distribution all values in the range (0 to 1 by default for runif) are equally likely

runif(100)->HundredvaluesUnif 
runif(500)->FiveHundredvaluesUnif
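
To see what "equally likely" means in practice, the draws can be counted in equal-width bins: each bin should hold roughly the same number of values. The sketch below uses its own sample u and an arbitrary seed; the min and max arguments shown at the end are the standard way to change the range.

```r
set.seed(4)                          # arbitrary seed
u <- runif(5000)                     # defaults: min = 0, max = 1

range(u)                             # all values fall between 0 and 1

# Count the draws in ten equal-width bins; each count should be near 500.
bin_counts <- table(cut(u, breaks = seq(0, 1, by = 0.1)))
bin_counts

v <- runif(5, min = 10, max = 20)    # uniform draws on a different range
v
```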

Plotting

par(mfrow=c(1,2))
hist(HundredvaluesUnif)
hist(FiveHundredvaluesUnif)

Normality test

shapiro.test(HundredvaluesUnif)
## 
##  Shapiro-Wilk normality test
## 
## data:  HundredvaluesUnif
## W = 0.92375, p-value = 2.238e-05

The p-value is less than the alpha value of 0.05, so we reject the null hypothesis: the sample is not normally distributed

shapiro.test(FiveHundredvaluesUnif)
## 
##  Shapiro-Wilk normality test
## 
## data:  FiveHundredvaluesUnif
## W = 0.95907, p-value = 1.455e-10

The p-value is very small in this case, so our null hypothesis that this is a normal distribution can be rejected

Skellam distribution

All the above distributions come with base R, but the Skellam distribution does not, so we need to install the skellam package to use it.

  • install the skellam package to look at the Skellam distribution

  • the distribution of the difference between two independent Poisson variables is called the Skellam distribution

install.packages("skellam", repos = "http://cran.us.r-project.org") # done only once
## Installing package into 'C:/Users/govin/OneDrive/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'skellam' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\govin\AppData\Local\Temp\RtmpovHEsJ\downloaded_packages
library(skellam) # done every time R is started

Set a seed value so that the following functions give the same result every time

set.seed(19)
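
The effect of the seed can be checked directly: resetting it replays the same random numbers, while carrying on without resetting gives new ones. A small sketch using rnorm (any random generator would behave the same way):

```r
set.seed(19)
first <- rnorm(5)

set.seed(19)                 # same seed again
second <- rnorm(5)

third <- rnorm(5)            # drawn without resetting the seed

identical(first, second)     # TRUE: same seed, same numbers
identical(first, third)      # FALSE: the generator has moved on
```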

The two lambda values are the means of the two Poisson distributions; the Skellam values can be generated by taking the difference between draws from them

rskellam(100, lambda1= 4, lambda2 =5)->HundredvaluesSkell 
rskellam(500, lambda1= 4, lambda2 =5)->FiveHundredvaluesSkell
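
The "difference of two Poissons" idea can be checked in base R without the package. This is a sketch under the assumption that the two Poisson draws are independent; it will not reproduce the exact numbers rskellam returns, only the same distribution. The mean should be near lambda1 - lambda2 and the variance near lambda1 + lambda2.

```r
set.seed(19)
n <- 5000

# Difference of two independent Poisson samples (lambda1 = 4, lambda2 = 5).
skellam_like <- rpois(n, lambda = 4) - rpois(n, lambda = 5)

mean(skellam_like)       # should be near 4 - 5 = -1
var(skellam_like)        # should be near 4 + 5 = 9
any(skellam_like < 0)    # unlike a Poisson, negative values occur
```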

Plotting Skellam distributions

par(mfrow=c(1,2))
hist(HundredvaluesSkell)
hist(FiveHundredvaluesSkell)

Normality test

shapiro.test(HundredvaluesSkell) 
## 
##  Shapiro-Wilk normality test
## 
## data:  HundredvaluesSkell
## W = 0.9438, p-value = 0.0003314

Since the p-value is very small (<0.05), we reject normality: it is not a normal distribution

shapiro.test(FiveHundredvaluesSkell)
## 
##  Shapiro-Wilk normality test
## 
## data:  FiveHundredvaluesSkell
## W = 0.98726, p-value = 0.0002342

Since the p-value is very small (<0.05), we reject normality here as well

For loop

Using a for loop

for (i in 1:100)
{
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
## [1] 54
## [1] 55
## [1] 56
## [1] 57
## [1] 58
## [1] 59
## [1] 60
## [1] 61
## [1] 62
## [1] 63
## [1] 64
## [1] 65
## [1] 66
## [1] 67
## [1] 68
## [1] 69
## [1] 70
## [1] 71
## [1] 72
## [1] 73
## [1] 74
## [1] 75
## [1] 76
## [1] 77
## [1] 78
## [1] 79
## [1] 80
## [1] 81
## [1] 82
## [1] 83
## [1] 84
## [1] 85
## [1] 86
## [1] 87
## [1] 88
## [1] 89
## [1] 90
## [1] 91
## [1] 92
## [1] 93
## [1] 94
## [1] 95
## [1] 96
## [1] 97
## [1] 98
## [1] 99
## [1] 100
for (i in 100:1)
{
  print(i)
}
## [1] 100
## [1] 99
## [1] 98
## [1] 97
## [1] 96
## [1] 95
## [1] 94
## [1] 93
## [1] 92
## [1] 91
## [1] 90
## [1] 89
## [1] 88
## [1] 87
## [1] 86
## [1] 85
## [1] 84
## [1] 83
## [1] 82
## [1] 81
## [1] 80
## [1] 79
## [1] 78
## [1] 77
## [1] 76
## [1] 75
## [1] 74
## [1] 73
## [1] 72
## [1] 71
## [1] 70
## [1] 69
## [1] 68
## [1] 67
## [1] 66
## [1] 65
## [1] 64
## [1] 63
## [1] 62
## [1] 61
## [1] 60
## [1] 59
## [1] 58
## [1] 57
## [1] 56
## [1] 55
## [1] 54
## [1] 53
## [1] 52
## [1] 51
## [1] 50
## [1] 49
## [1] 48
## [1] 47
## [1] 46
## [1] 45
## [1] 44
## [1] 43
## [1] 42
## [1] 41
## [1] 40
## [1] 39
## [1] 38
## [1] 37
## [1] 36
## [1] 35
## [1] 34
## [1] 33
## [1] 32
## [1] 31
## [1] 30
## [1] 29
## [1] 28
## [1] 27
## [1] 26
## [1] 25
## [1] 24
## [1] 23
## [1] 22
## [1] 21
## [1] 20
## [1] 19
## [1] 18
## [1] 17
## [1] 16
## [1] 15
## [1] 14
## [1] 13
## [1] 12
## [1] 11
## [1] 10
## [1] 9
## [1] 8
## [1] 7
## [1] 6
## [1] 5
## [1] 4
## [1] 3
## [1] 2
## [1] 1

For loop with a conditional statement

for (i in 1:10)
{
  if (i %in% c(1,3,5,7,9))
  {print(i)}
  else
  {print(0)}
}
## [1] 1
## [1] 0
## [1] 3
## [1] 0
## [1] 5
## [1] 0
## [1] 7
## [1] 0
## [1] 9
## [1] 0
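
The same result can be produced without a loop. The sketch below is an alternative, not the method used in class: it uses the vectorised ifelse() and tests for odd numbers with the remainder operator %% instead of listing them.

```r
i <- 1:10

# Keep odd values of i, replace even values with 0, all in one call.
result <- ifelse(i %% 2 == 1, i, 0)

print(result)
# [1] 1 0 3 0 5 0 7 0 9 0
```

Unlike the loop, this returns all ten values as one vector, which can be stored and reused.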

Illustration of the central limit theorem

Creating (initializing) empty vectors so that we can populate them later

pvals <- c() 
wvals <- c()
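
Starting from c() works because R grows the vector each time a new index is assigned. When the number of iterations is known in advance, a common alternative is to create the vector at full length first. A minimal sketch with made-up names (squares, grown), comparing the two approaches:

```r
n_iter <- 10

squares <- numeric(n_iter)        # preallocated: ten zeros
for (i in seq_len(n_iter)) {
  squares[i] <- i^2               # fill position i
}

grown <- c()                      # empty, grows on each assignment
for (i in seq_len(n_iter)) {
  grown[i] <- i^2
}

identical(squares, grown)         # TRUE: both approaches give the same vector
```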

Setting the seed value

set.seed(19)

Get 5000 values from a uniform distribution

z = runif(5000)

Using a for loop

for (i in 1:10)
{
  z = z+runif(5000) # adds another uniformly distributed variable
  t = shapiro.test(z)
  pvals[i] = t$p.value
  wvals[i] = t$statistic
}

hist(z)

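The histogram looks like a normal distribution. This illustrates the central limit theorem: the sum of many independent random variables with finite variance tends towards a normal distribution, whatever the shape of the individual distributions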
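
The sum built in the loop has a known mean and variance, which gives a second check besides the histogram. A single Uniform(0,1) value has mean 1/2 and variance 1/12, so a sum of 11 independent ones (the starting runif plus ten added in the loop) should have mean 11/2 and variance 11/12. The sketch below rebuilds such a sum with rowSums instead of a loop, so the individual numbers differ from the z above.

```r
set.seed(19)
n_terms <- 11                        # one initial runif plus ten added ones

# Each row holds 11 independent uniform draws; the row sum is one value of z.
z_check <- rowSums(matrix(runif(5000 * n_terms), ncol = n_terms))

mean(z_check)                        # should be near 11/2 = 5.5
var(z_check)                         # should be near 11/12, about 0.917
```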

plot(c(1:10), pvals, type="b", xlab='iteration') # type="b" draws both lines and points; use type="l" for lines only or type="p" for points only

plot(c(1:10), wvals, type="b", xlab='iteration')

Adding a conditional statement to the loop above

For example, if the iteration number is between 1 and 50, add an exponentially distributed variable; if it is between 51 and 100, add a Poisson distributed variable

pvals <- c()
wvals <- c()
set.seed(19)

Get 5000 values from a uniform distribution

z = runif(5000)

Using a for loop

for (i in 1:100)
{
  if (i %in% 1:50)
  {
    z = z+rexp(5000) # adds an exponentially distributed variable
    t = shapiro.test(z)
    pvals[i] = t$p.value
    #wvals[i] = t$statistic # W is not stored in this branch, so wvals[1:50] stay NA
  }
  else
  {
    z = z+rpois(5000, lambda = 4) # adds a Poisson distributed variable
    t = shapiro.test(z)
    pvals[i] = t$p.value
    wvals[i] = t$statistic
  }
}

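Plot z

hist(z)

The histogram again looks like a normal distribution, even though exponential and Poisson variables were added: by the central limit theorem, the sum of many independent random variables with finite variance tends towards a normal distribution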

plot(c(1:100), pvals, type="b", xlab='iteration')
abline(h= 0.05, col="red") # red line marks the alpha = 0.05 threshold

plot(c(1:100), wvals, type="b", xlab='iteration')