The normal distribution curve, or Gaussian distribution curve, is defined by the following equation:
\(Y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(X-\mu)^2}{2\sigma^2}}\)
where \(\mu\) is the mean, \(\sigma\) is the standard deviation, \(\pi \approx 3.14159\), and \(e \approx 2.71828\).
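As a quick sanity check, the normal density, with exponent \(-(X-\mu)^2/(2\sigma^2)\), can be coded by hand and compared with R's built-in `dnorm`. This is only a minimal sketch: the helper name `normal_density` and the test values (mean 10, standard deviation 2) are illustrative assumptions, not part of the original example.

```r
# Hand-coded normal density (hypothetical helper, for illustration only)
normal_density <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x_vals  <- seq(4, 16, by = 0.5)                        # arbitrary evaluation points
manual  <- normal_density(x_vals, mu = 10, sigma = 2)  # formula coded by hand
builtin <- dnorm(x_vals, mean = 10, sd = 2)            # R's built-in density

# TRUE when the two agree up to floating-point tolerance
isTRUE(all.equal(manual, builtin))
```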
For a standard normal distribution, the mean is 0 and the standard deviation and variance are both 1. With the standardized variable \(z = \frac{X-\mu}{\sigma}\), the equation becomes:
\(Y = \frac{1}{\sqrt{2\pi}} e^{\frac{-z^2}{2}}\)
The areas included between z = -1 and 1, z = -2 and 2, and z = -3 and 3 are equal, respectively, to 68.27%, 95.45%, and 99.73% of the total area, which is 1 (refer to the curve).
# Parameters of the standard normal distribution
mu <- 0
s <- 1
# Draw the density curve and mark the mean with a dashed line
curve(dnorm(x, mu, s), xlim = c(-3, 3), main = 'Standard Normal', xlab = "z")
abline(v = 0, lty = 2, col = "red")
codx2 <- c(-3,seq(-3,3,0.1),3)
cody2 <- c(0,dnorm(seq(-3,3,0.1),mu,s),0)
polygon(codx2,cody2,col='blue')
codx1 <- c(-2,seq(-2,2,0.1),2)
cody1 <- c(0,dnorm(seq(-2,2,0.1),mu,s),0)
polygon(codx1,cody1,col='green')
codx <- c(-1,seq(-1,1,0.1),1)
cody <- c(0,dnorm(seq(-1,1,0.1),mu,s),0)
polygon(codx,cody,col='red')
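The three shaded areas can also be checked numerically rather than read off the plot. A minimal sketch using the cumulative distribution function `pnorm` (the variable names `k` and `area` are chosen here for illustration):

```r
# Area within k standard deviations of the mean of a standard normal
k <- 1:3
area <- pnorm(k) - pnorm(-k)

# Express as percentages of the total area
round(100 * area, 2)
## [1] 68.27 95.45 99.73
```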
Some important properties of the normal distribution
Mean = \(\mu\)
Variance = \(\sigma^2\)
Standard deviation = \(\sigma\)
Moment coefficient of skewness = \(\alpha_3 = 0\)
Moment coefficient of kurtosis = \(\alpha_4 = 3\)
Mean deviation = \(\sigma\sqrt{\frac{2}{\pi}} = 0.7979\sigma\)
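These properties can be verified approximately by numerical integration. The sketch below works with the standard normal (so \(\mu = 0\) and \(\sigma = 1\)) and uses a small helper, `moment`, which is a name introduced here for illustration. The results should come out close to 1, 0, 3, and 0.798 respectively, up to integration error.

```r
# Expected value of f(Z) for a standard normal Z, by numerical integration
moment <- function(f) {
  integrate(function(z) f(z) * dnorm(z), -Inf, Inf)$value
}

variance <- moment(function(z) z^2)     # sigma^2
skewness <- moment(function(z) z^3)     # alpha_3 (third moment, sigma = 1)
kurtosis <- moment(function(z) z^4)     # alpha_4 (fourth moment, sigma = 1)
mean_dev <- moment(function(z) abs(z))  # mean (absolute) deviation

round(c(variance = variance, skewness = skewness,
        kurtosis = kurtosis, mean_dev = mean_dev), 3)
```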
Relation between the binomial and normal distributions
If N is large and neither p nor q is too close to 0, the binomial distribution can be closely approximated by a normal distribution with standardized variable given by \(z= \frac{X-Np}{\sqrt{Npq}}\), where q = 1 - p.
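To see how close the approximation is, one can compare an exact binomial probability with its normal counterpart. The values N = 100, p = 0.5, and the cutoff 55 are assumptions picked for illustration. The sketch also adds a continuity correction of 0.5, a common refinement that is not part of the formula above.

```r
N <- 100; p <- 0.5; q <- 1 - p   # illustrative values (assumptions)
k <- 55                          # we want P(X <= 55)

# Exact binomial probability
exact <- pbinom(k, size = N, prob = p)

# Normal approximation: standardize with mean Np and sd sqrt(Npq),
# using k + 0.5 as a continuity correction (an added refinement)
z <- (k + 0.5 - N * p) / sqrt(N * p * q)
normal_approx <- pnorm(z)

c(exact = exact, normal_approx = normal_approx)
```

With these values the two probabilities should agree to about three decimal places; the agreement typically worsens as p moves toward 0 or 1.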
Problem 1. In a math exam, the mean is 72 and the standard deviation is 15. Determine the standard scores (z-scores) of students receiving the grades (a) 60, (b) 93, (c) 72.
\(z = \frac{X - \bar{X}}{\sigma}\)
grade_mean <- 72 # mean
s <- 15 # standard deviation
x <- c(60, 93, 72) # grades to standardize
z <- (x - grade_mean) / s # vectorized z-scores, no loop needed
z
## [1] -0.8 1.4 0.0
Problem 2. Suppose the number of matches played by major cricket players during their careers is normally distributed with a mean of 1500 matches and a standard deviation of 350 matches. (a) What percentage play fewer than 750 matches? (b) What percentage play more than 2000 matches? (c) Find the 90th percentile for the number of matches played during a career.
mu = 1500
s =350
x <- seq(0,4000, by =150)
curve(dnorm(x,mu,s),xlim=c(0,4000),main='Matches Played During Career', xlab = "matches")
abline(v=1500, lty = 2, col ="red")
# percentage play fewer than 750 games
k = 750
codx <- c(0,seq(0,750,150),750)
cody <- c(0,dnorm(seq(0,750,150),mu,s),0)
polygon(codx,cody,col='green')
p750 = round(pnorm(k, mean =mu, sd =s), digit = 3)
# percentage play in more than 2000
k = 2000
codx <- c(2000,seq(2000,4000,150),4000)
cody <- c(0,dnorm(seq(2000,4000,150),mu,s),0)
polygon(codx,cody,col='pink')
# use the upper tail, since the question asks for MORE than 2000 matches
p2000 = round(pnorm(k, mean = mu, sd = s, lower.tail = FALSE), digits = 3)
p2000
## [1] 0.077
# find the 90th percentile for the number of matches played during a career
q90 <- qnorm(0.9, mean = mu, sd = s)
abline(v = q90, lty = 7, col = "blue")
results <- c(p750, p2000, q90)
results
## [1] 0.016 0.077 1948.543
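The same three answers can be reached by standardizing first and then working with the standard normal, which mirrors the table-based method. A minimal sketch (the variable names are chosen here for illustration):

```r
mu <- 1500   # mean number of matches
s  <- 350    # standard deviation

# Standardize the two cutoffs
z750  <- (750 - mu) / s
z2000 <- (2000 - mu) / s

p_below_750  <- pnorm(z750)                       # (a) lower tail
p_above_2000 <- pnorm(z2000, lower.tail = FALSE)  # (b) upper tail
q90_matches  <- mu + s * qnorm(0.9)               # (c) un-standardize the 90th percentile of z

round(c(p_below_750, p_above_2000, q90_matches), 3)
```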
Problem 3. Find the area under the standard normal curve in the following cases: (a) between z = 0.81 and z = 1.94, (b) to the right of z = -1.28, (c) to the right of z = 2.05 or to the left of z = -1.44.
mu = 0
s =1
par(mfrow=c(2,2))
x <- seq(-4,4, by =1)
curve(dnorm(x,mu,s),xlim=c(-4,4),
main='Standard Normal', xlab = "z")
abline(v=0, lty= 2, col = "red")
#Between z =0.81 and z =1.94
codx <- c(0.81, seq(0.81,1.94,0.1),1.94)
cody <- c(0,dnorm(seq(0.81,1.94,0.1),mu,s),0)
polygon(codx,cody,col='green')
r1 <- pnorm(1.94, mean = mu , sd = s)- pnorm(0.81, mean = mu , sd = s)
# (b) to the right of z = -1.28
curve(dnorm(x,mu,s),xlim=c(-4,4),
main='Standard Normal', xlab = "z")
codx <- c(-1.28, seq(-1.28,4,0.1),4)
cody <- c(0,dnorm(seq(-1.28,4,0.1),mu,s),0)
polygon(codx,cody,col='green')
r2 <- 1-pnorm(-1.28, mean = mu , sd = s)
#(c) To the right of z = 2.05 or to the left of z = -1.44.
curve(dnorm(x,mu,s),xlim=c(-4,4),
main='Standard Normal', xlab = "z")
codx <- c(2.05, seq(2.05,4,0.1),4)
cody <- c(0,dnorm(seq(2.05,4,0.1),mu,s),0)
polygon(codx,cody,col='green')
codx <- c(-4, seq(-4,-1.44,0.1),-1.44)
cody <- c(0,dnorm(seq(-4,-1.44,0.1),mu,s),0)
polygon(codx,cody,col='green')
r3 <- 1-pnorm(2.05, mean = mu , sd = s) + pnorm(-1.44, mean = mu , sd = s)
Results <- c(r1,r2,r3)
Results
## [1] 0.18278024 0.89972743 0.09511591
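Since each answer is an area under the density curve, the results can be cross-checked by integrating `dnorm` directly instead of using `pnorm`. This is a sketch for verification only; the names `area_a`, `area_b`, and `area_c` are introduced here for illustration.

```r
# (a) area between z = 0.81 and z = 1.94
area_a <- integrate(dnorm, 0.81, 1.94)$value

# (b) area to the right of z = -1.28
area_b <- integrate(dnorm, -1.28, Inf)$value

# (c) area to the right of z = 2.05 plus area to the left of z = -1.44
area_c <- integrate(dnorm, 2.05, Inf)$value +
  integrate(dnorm, -Inf, -1.44)$value

round(c(area_a, area_b, area_c), 4)
```

The values should match the `pnorm` results up to numerical integration error.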
Problem 4. The time spent watching TV per week by middle-school students is normally distributed with a mean of 20.5 hours and a standard deviation of 5.5 hours. Find the percentage of students who watch (a) less than 25 hours per week, (b) more than 30 hours per week.
mu = 20.5
s =5.5
par(mfrow=c(1,2))
# (a) Percentage of students who watch less than 25 hours per week
r25 <- pnorm(25, mean = mu , sd = s)
codx <- c(0,seq(0,25,1),25)
cody <- c(0,dnorm(seq(0,25,1),mu,s),0)
curve(dnorm(x,mu,s),xlim=c(0,40),main='Less than 25 hrs per week on TV')
abline(v=25, lty= 2, col = "blue")
polygon(codx,cody,col='red')
# (b) Percentage of students who watch more than 30 hours per week
r30 <- pnorm(30, mean = mu , sd = s, lower.tail = FALSE)
curve(dnorm(x,mu,s),xlim=c(0,40),main='More than 30 hrs per week on TV')
abline(v=30, lty= 2, col = "blue")
codx <- c(30,seq(30,40,1),40)
cody <- c(0,dnorm(seq(30,40,1),mu,s),0)
polygon(codx,cody,col='red')
results <- c(r25, r30)
results
## [1] 0.79337331 0.04205935