Getting 100 random numbers from a normal distribution
rnorm(100,mean=25,sd=4)->HundredvaluesNormal
Getting 500 random numbers from the same normal distribution
rnorm(500,mean=25,sd=4)->FiveHundredvaluesNormal
Plotting both side by side
par(mfrow=c(1,2))
hist(HundredvaluesNormal)
hist(FiveHundredvaluesNormal)
To test whether a sample is consistent with a normal distribution, use shapiro.test(); it reports the name of the data, the value of the test statistic (W) and the p-value
shapiro.test(HundredvaluesNormal)
##
## Shapiro-Wilk normality test
##
## data: HundredvaluesNormal
## W = 0.98556, p-value = 0.348
shapiro.test(FiveHundredvaluesNormal)
##
## Shapiro-Wilk normality test
##
## data: FiveHundredvaluesNormal
## W = 0.99721, p-value = 0.5602
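The printed report can also be read programmatically. The sketch below is an illustration, not part of the original session: it shows that shapiro.test() returns an object of class "htest" whose `statistic` and `p.value` elements hold the two numbers printed above. The seed and the variable names `x`, `res`, `w_stat` and `p_val` are assumptions made for the example.

```r
# Minimal sketch: pull W and the p-value out of the test result.
set.seed(1)                          # arbitrary seed, only for reproducibility
x <- rnorm(100, mean = 25, sd = 4)   # same parameters as the samples above

res <- shapiro.test(x)
class(res)                           # "htest"

w_stat <- unname(res$statistic)      # W statistic, a number between 0 and 1
p_val  <- res$p.value                # p-value of the normality test

c(W = w_stat, p = p_val)
```

Storing the two values in this way is what makes it possible to collect them inside a loop later on.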
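W is our test statistic. It is not a degrees-of-freedom count: it measures how closely the ordered sample values agree with what a normal distribution would produce, and it lies between 0 and 1, with values near 1 indicating close agreement with normality.
Here W is larger for the 500-value sample than for the 100-value sample, and the p-value happens to be larger too. For normal data W does tend to move closer to 1 as the sample grows, but the p-value does not increase systematically with sample size, and a single pair of random samples cannot show a trend.
With a larger sample the histogram looks smoother and closer to the bell shape, while the test also becomes more sensitive to small departures from normality.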
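Because each sample is random, comparing one 100-value run with one 500-value run says little about how the test behaves in general. The sketch below is an illustration, not part of the original session: it repeats the test on many freshly drawn normal samples and records how often the p-value falls below 0.05. The seed, the 1000 repetitions and the names `p_small`, `p_large`, `rej_small` and `rej_large` are assumptions chosen for the example.

```r
# Minimal sketch: how often does shapiro.test() reject for truly normal data?
set.seed(42)   # arbitrary seed, only for reproducibility
n_rep <- 1000  # number of repeated samples per sample size (an assumption)

# p-values for repeated samples of 100 and of 500 normal values
p_small <- replicate(n_rep, shapiro.test(rnorm(100, mean = 25, sd = 4))$p.value)
p_large <- replicate(n_rep, shapiro.test(rnorm(500, mean = 25, sd = 4))$p.value)

# share of runs in which normality would be rejected at the 0.05 level
rej_small <- mean(p_small < 0.05)
rej_large <- mean(p_large < 0.05)
c(n100 = rej_small, n500 = rej_large)
```

For truly normal data both rates should sit near 0.05, because under the null hypothesis the p-value is roughly uniformly distributed whatever the sample size.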
If the data are approximately normally distributed, use a parametric test; if they are not, use a non-parametric test.
NULL HYPOTHESIS - at the start of an experiment we assume something to be true; the hypothesis test is formulated so that the data can lead us to reject this null hypothesis.
The null hypothesis depends on the test being used; for the Shapiro-Wilk test the null hypothesis is “the data are normally distributed”.
When the p-value comes out as 0.2 or 0.3 (as with the rnorm samples), we cannot reject the null hypothesis. This does not prove normality; it means the data are consistent with a normal distribution.
The smaller the p-value, the stronger the evidence against the null hypothesis.
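The decision rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended analysis pipeline: the significance level `alpha = 0.05`, the two made-up samples `a` and `b`, and the choice of t.test() as the parametric test and wilcox.test() as the non-parametric test are all assumptions for the example.

```r
# Minimal sketch: choose a parametric or non-parametric test from the
# outcome of the Shapiro-Wilk test.
alpha <- 0.05   # conventional significance level (an assumption)

set.seed(7)     # arbitrary seed, only for reproducibility
a <- rnorm(30, mean = 25, sd = 4)   # hypothetical sample 1
b <- rnorm(30, mean = 27, sd = 4)   # hypothetical sample 2

# Normality is not rejected only if both p-values are at or above alpha.
normal_enough <- shapiro.test(a)$p.value >= alpha &&
  shapiro.test(b)$p.value >= alpha

# Parametric test if normality was not rejected, non-parametric otherwise.
result <- if (normal_enough) t.test(a, b) else wilcox.test(a, b)
result$method
result$p.value
```

`result$method` names the test that was actually run, so the choice stays visible in the output.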
Q: Get random numbers from a Poisson distribution with mean 4.
Solution: in rpois, r = random and pois = Poisson distribution; lambda is the mean (and also the variance) of the distribution.
rpois(100,lambda = 4)->HundredvaluesPois
rpois(5000,lambda = 4)->FiveHundredvaluesPois # note: this draws 5000 values, although the variable name says five hundred
Now plotting the distribution
par(mfrow=c(1,2))
hist(HundredvaluesPois)
hist(FiveHundredvaluesPois)
Testing whether the distributions are normal or not
shapiro.test(HundredvaluesPois)
##
## Shapiro-Wilk normality test
##
## data: HundredvaluesPois
## W = 0.94154, p-value = 0.0002397
shapiro.test(FiveHundredvaluesPois)
##
## Shapiro-Wilk normality test
##
## data: FiveHundredvaluesPois
## W = 0.9613, p-value < 2.2e-16
As we can see, the p-values are very small, so normality is rejected for both samples; this is expected, since the samples come from a Poisson distribution, which is discrete and skewed
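The role of lambda can be checked directly on a sample. The sketch below is an illustration, not part of the original session: for a Poisson distribution both the mean and the variance equal lambda, so a large sample should show both close to 4. The seed, the sample size of 10000 and the name `x` are assumptions for the example.

```r
# Minimal sketch: lambda is both the mean and the variance of a Poisson sample.
set.seed(3)                     # arbitrary seed, only for reproducibility
x <- rpois(10000, lambda = 4)   # larger sample than above, to reduce noise

mean(x)                 # close to lambda = 4
var(x)                  # also close to lambda = 4
all(x == round(x))      # TRUE: Poisson values are whole numbers
```

The whole-number values are one reason a Poisson sample fails a normality test so clearly at these sample sizes.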
In runif, r = random and unif = uniform; in a uniform distribution all values in the range (0 to 1 by default) are equally likely
runif(100)->HundredvaluesUnif
runif(500)->FiveHundredvaluesUnif
Plotting
par(mfrow=c(1,2))
hist(HundredvaluesUnif)
hist(FiveHundredvaluesUnif)
Normality test
shapiro.test(HundredvaluesUnif)
##
## Shapiro-Wilk normality test
##
## data: HundredvaluesUnif
## W = 0.92375, p-value = 2.238e-05
The p-value is less than the alpha value of 0.05, so we reject the hypothesis that the sample is normally distributed
shapiro.test(FiveHundredvaluesUnif)
##
## Shapiro-Wilk normality test
##
## data: FiveHundredvaluesUnif
## W = 0.95907, p-value = 1.455e-10
The p-value is very small in this case, so our null hypothesis that the sample comes from a normal distribution can be rejected
All the above distributions come with the base R installation, but the Skellam distribution needs the skellam package, which has to be installed first.
The Skellam distribution is the distribution of the difference between two independent Poisson-distributed variables.
install.packages("skellam", repos = "http://cran.us.r-project.org") # done only once
## Installing package into 'C:/Users/govin/OneDrive/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'skellam' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\govin\AppData\Local\Temp\RtmpovHEsJ\downloaded_packages
library(skellam) # done everytime R is started
Setting a seed so that the following functions give the same result every time
set.seed(19)
lambda1 and lambda2 are the means of the two Poisson distributions; the Skellam distribution is generated by taking the difference between them
rskellam(100, lambda1= 4, lambda2 =5)->HundredvaluesSkell
rskellam(500, lambda1= 4, lambda2 =5)->FiveHundredvaluesSkell
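The definition of the Skellam distribution as a difference of two Poisson variables can be reproduced with base R alone. The sketch below is an illustration and does not use the skellam package: it subtracts two independent rpois() samples and compares the sample mean and variance with the theoretical values lambda1 - lambda2 and lambda1 + lambda2. The seed, the sample size and the names `n` and `d` are assumptions for the example.

```r
# Minimal sketch: a Skellam sample built as the difference of two Poisson samples.
set.seed(123)   # arbitrary seed, only for reproducibility
n <- 10000      # sample size (an assumption, large to reduce noise)

d <- rpois(n, lambda = 4) - rpois(n, lambda = 5)

mean(d)   # close to lambda1 - lambda2 = -1
var(d)    # close to lambda1 + lambda2 = 9
range(d)  # unlike a Poisson sample, the values can be negative
```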
Plotting Skellam distributions
par(mfrow=c(1,2))
hist(HundredvaluesSkell)
hist(FiveHundredvaluesSkell)
Normality test
shapiro.test(HundredvaluesSkell)
##
## Shapiro-Wilk normality test
##
## data: HundredvaluesSkell
## W = 0.9438, p-value = 0.0003314
Since the p-value is very small (< 0.05), we reject normality: this is not a normal distribution
shapiro.test(FiveHundredvaluesSkell)
##
## Shapiro-Wilk normality test
##
## data: FiveHundredvaluesSkell
## W = 0.98726, p-value = 0.0002342
Since the p-value is very small (< 0.05), we reject normality here as well
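The same three steps (draw a sample, plot a histogram, run the Shapiro-Wilk test) were repeated for every distribution above. A small helper can wrap them; the sketch below is one possible way to do it, and the function name `check_normality`, its arguments and the default `alpha = 0.05` are all hypothetical choices made for the example.

```r
# Minimal sketch of a helper that plots a sample and tests it for normality.
# check_normality() is a hypothetical name, not a function from any package.
check_normality <- function(x, alpha = 0.05, plot = TRUE) {
  if (plot) hist(x)                 # optional histogram of the sample
  res <- shapiro.test(x)
  list(
    W = unname(res$statistic),      # Shapiro-Wilk W statistic
    p_value = res$p.value,          # p-value of the test
    reject_normality = res$p.value < alpha
  )
}

set.seed(19)   # same seed value as used above, only for reproducibility
out <- check_normality(runif(500), plot = FALSE)
out
```

Returning a list keeps W, the p-value and the decision together, so the result of several samples can be compared without reading the printed reports one by one.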
for (i in 1:100)
{
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
## [1] 54
## [1] 55
## [1] 56
## [1] 57
## [1] 58
## [1] 59
## [1] 60
## [1] 61
## [1] 62
## [1] 63
## [1] 64
## [1] 65
## [1] 66
## [1] 67
## [1] 68
## [1] 69
## [1] 70
## [1] 71
## [1] 72
## [1] 73
## [1] 74
## [1] 75
## [1] 76
## [1] 77
## [1] 78
## [1] 79
## [1] 80
## [1] 81
## [1] 82
## [1] 83
## [1] 84
## [1] 85
## [1] 86
## [1] 87
## [1] 88
## [1] 89
## [1] 90
## [1] 91
## [1] 92
## [1] 93
## [1] 94
## [1] 95
## [1] 96
## [1] 97
## [1] 98
## [1] 99
## [1] 100
for (i in 100:1)
{
print(i)
}
## [1] 100
## [1] 99
## [1] 98
## [1] 97
## [1] 96
## [1] 95
## [1] 94
## [1] 93
## [1] 92
## [1] 91
## [1] 90
## [1] 89
## [1] 88
## [1] 87
## [1] 86
## [1] 85
## [1] 84
## [1] 83
## [1] 82
## [1] 81
## [1] 80
## [1] 79
## [1] 78
## [1] 77
## [1] 76
## [1] 75
## [1] 74
## [1] 73
## [1] 72
## [1] 71
## [1] 70
## [1] 69
## [1] 68
## [1] 67
## [1] 66
## [1] 65
## [1] 64
## [1] 63
## [1] 62
## [1] 61
## [1] 60
## [1] 59
## [1] 58
## [1] 57
## [1] 56
## [1] 55
## [1] 54
## [1] 53
## [1] 52
## [1] 51
## [1] 50
## [1] 49
## [1] 48
## [1] 47
## [1] 46
## [1] 45
## [1] 44
## [1] 43
## [1] 42
## [1] 41
## [1] 40
## [1] 39
## [1] 38
## [1] 37
## [1] 36
## [1] 35
## [1] 34
## [1] 33
## [1] 32
## [1] 31
## [1] 30
## [1] 29
## [1] 28
## [1] 27
## [1] 26
## [1] 25
## [1] 24
## [1] 23
## [1] 22
## [1] 21
## [1] 20
## [1] 19
## [1] 18
## [1] 17
## [1] 16
## [1] 15
## [1] 14
## [1] 13
## [1] 12
## [1] 11
## [1] 10
## [1] 9
## [1] 8
## [1] 7
## [1] 6
## [1] 5
## [1] 4
## [1] 3
## [1] 2
## [1] 1
for (i in 1:10)
{
if (i %in% c(1,3,5,7,9))
{print(i)}
else
{print(0)}
}
## [1] 1
## [1] 0
## [1] 3
## [1] 0
## [1] 5
## [1] 0
## [1] 7
## [1] 0
## [1] 9
## [1] 0
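The loops above can also be written without an explicit for loop, because most R operations work on whole vectors. The sketch below is an illustration of that alternative: rev() reverses a sequence and ifelse() applies the odd/even rule to all ten values at once. The names `i`, `down` and `odd_or_zero` are assumptions for the example.

```r
# Minimal sketch: vectorised versions of the loops above.
i <- 1:10

# Counting down, as in the second loop (shown here for 10 values only).
down <- rev(i)
down          # 10 9 8 7 6 5 4 3 2 1

# Keep odd values and replace even values with 0, as in the third loop.
odd_or_zero <- ifelse(i %% 2 == 1, i, 0)
odd_or_zero   # 1 0 3 0 5 0 7 0 9 0
```

The result is a single vector instead of ten separate printed lines, which is usually easier to reuse in later calculations.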
Creating (initializing) empty vectors so that we can populate them later
pvals <- c()
wvals <- c()
setting seed value
set.seed(19)
z = runif(5000)
for (i in 1:10)
{
z = z+runif(5000) # adds another uniformly distributed variable
t = shapiro.test(z)
pvals[i] = t$p.value
wvals[i] = t$statistic
}
hist(z)
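The histogram looks approximately normal: by the central limit theorem, the sum of many independent random variables with finite variance tends towards a normal distribution, whatever distribution the individual terms have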
plot(c(1:10), pvals, type="b", xlab='iteration') # type="b" means both lines and points; use type="l" for lines only or type="p" for points only
plot(c(1:10), wvals, type="b", xlab='iteration')
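After the loop, z is the sum of 11 independent uniform variables (the starting vector plus ten additions). The sketch below is an illustration that builds such a sum in one step and compares it with what theory predicts: a uniform variable on 0 to 1 has mean 1/2 and variance 1/12, so the sum of k of them has mean k/2 and variance k/12. The names `k` and `z_sum` and the use of rowSums() are assumptions for the example, and the random draws differ from those in the loop.

```r
# Minimal sketch: mean and variance of a sum of k uniform variables.
set.seed(19)   # same seed value as above, only for reproducibility
k <- 11        # 1 starting vector + 10 additions in the loop

# Each row holds k uniform draws; the row sum is one value of the summed variable.
z_sum <- rowSums(matrix(runif(5000 * k), ncol = k))

c(mean = mean(z_sum), expected = k / 2)    # both close to 5.5
c(var = var(z_sum), expected = k / 12)     # both close to 0.917

# A normal Q-Q plot is a visual check that complements the histogram.
qqnorm(z_sum)
qqline(z_sum, col = "red")
```

Points lying close to the red line in the Q-Q plot indicate that the sum is already close to a normal distribution.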
For example, in iterations 1 to 50 add exponentially distributed values, and in iterations 51 to 100 add Poisson-distributed values
pvals <- c()
wvals <- c()
set.seed(19)
get uniform distribution of 5000 values
z = runif(5000)
using a for loop
for (i in 1:100)
{
if (i %in% 1:50)
{
z = z+rexp(5000) # adds an exponentially distributed variable
}
else
{
z = z+rpois(5000, lambda = 4) # adds a Poisson-distributed variable
}
t = shapiro.test(z) # test the running sum after every addition
pvals[i] = t$p.value
wvals[i] = t$statistic
}
plot z
hist(z)
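The histogram again looks approximately normal: even when the terms come from different distributions (exponential and Poisson here), the sum of many independent terms with finite variance tends towards a normal distribution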
plot(c(1:100), pvals, type="b", xlab='iteration')
abline(h= 0.05, col="red") # red line marks the 0.05 significance level
plot(c(1:100), wvals, type="b", xlab='iteration')
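The plots can be complemented by a short numeric summary of where the p-value sits relative to the red 0.05 line. The sketch below is a shortened illustration of the same idea, not a rerun of the loop above: it uses 20 iterations instead of 100, preallocates the result vectors with numeric() instead of growing them from c(), and lists the iterations in which normality is not rejected. The names `n_iter`, `step`, `res` and `not_rejected` are assumptions for the example.

```r
# Minimal sketch: preallocated result vectors and a numeric summary of the p-values.
set.seed(19)   # same seed value as above, only for reproducibility
n_iter <- 20   # shortened run (an assumption; the loop above uses 100)

pvals <- numeric(n_iter)   # preallocate instead of growing from c()
wvals <- numeric(n_iter)

z <- runif(5000)
for (i in seq_len(n_iter)) {
  # first half: exponential terms, second half: Poisson terms
  step <- if (i <= n_iter / 2) rexp(5000) else rpois(5000, lambda = 4)
  z <- z + step
  res <- shapiro.test(z)
  pvals[i] <- res$p.value
  wvals[i] <- unname(res$statistic)
}

# iterations in which normality is NOT rejected at the 0.05 level
not_rejected <- which(pvals >= 0.05)
not_rejected
range(wvals)   # W should move towards 1 as more terms are added
```

Preallocating with numeric() fixes the length of the vectors up front, so every iteration is guaranteed a slot and no NA gaps appear.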