Chapter 3 - Distributions of Random Variables

load the packages

library(DATA606) 
## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo
library(ggplot2)

3.6.1 Normal distribution

3.2 Area under the curve, Part II. What percent of a standard normal distribution N(μ = 0, σ = 1) is found in each region? Be sure to draw a graph. (a) Z > -1.13 (b) Z < 0.18 (c) Z > 8 (d) |Z| < 0.5

plots and solutions

normalPlot(mean = 0,sd = 1,bounds=(c(-1.13,Inf)),tails = FALSE)

(a) P(Z > -1.13) ≈ 87.1%.

normalPlot(mean = 0,sd = 1,bounds=(c(-Inf,0.18)),tails = FALSE)

(b) P(Z < 0.18) ≈ 57.1%.

normalPlot(mean = 0, sd = 1,bounds=(c(8,Inf)),tails = FALSE)

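(c) P(Z > 8) ≈ 6.2e-16, which is effectively 0%. (Computing it as 1 - pnorm(8) gives 6.66e-16 because of floating-point rounding.)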

normalPlot(mean = 0, sd = 1, bounds = c(-0.5, 0.5), tails = FALSE)

(d) P(|Z| < 0.5) = P(-0.5 < Z < 0.5) ≈ 38.3%.
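
The same areas can be computed numerically with base R's `pnorm`, without the plotting helper. The following is a minimal sketch; the variable names (`p_a` to `p_d`) are illustrative and not part of the original solution.

```r
# Areas under the standard normal curve, computed with base R's pnorm().
# Variable names (p_a ... p_d) are illustrative only.
p_a <- pnorm(-1.13, lower.tail = FALSE)  # (a) P(Z > -1.13)
p_b <- pnorm(0.18)                       # (b) P(Z < 0.18)
# (c) P(Z > 8): lower.tail = FALSE avoids the rounding error of 1 - pnorm(8)
p_c <- pnorm(8, lower.tail = FALSE)
p_d <- pnorm(0.5) - pnorm(-0.5)          # (d) P(|Z| < 0.5)

# Parts (a), (b) and (d) as percentages; (c) printed separately because it is tiny
round(100 * c(a = p_a, b = p_b, d = p_d), 1)
p_c
```

Parts (a), (b) and (d) come out to roughly 87.1%, 57.1% and 38.3%, while part (c) is on the order of 1e-16, i.e. indistinguishable from zero for practical purposes.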

3.6.2 Evaluating the normal approximation

3.18 Heights of female college students. Below are heights of 25 female college students.

Student   1  2  3  4  5  6  7  8  9 10 11 12 13
Height   54 55 56 56 57 58 58 59 60 60 60 61 61

Student  14 15 16 17 18 19 20 21 22 23 24 25
Height   62 62 63 63 63 64 65 65 67 67 69 73

  1. The mean height is 61.52 inches with a standard deviation of 4.58 inches. Use this information to determine if the heights approximately follow the 68-95-99.7% Rule.
height <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
# proportion of observations within 1, 2 and 3 standard deviations of the mean
mean(abs(height - 61.52) <= 1 * 4.58)
## [1] 0.68

17 of the 25 heights (68%) fall within 1 standard deviation of the mean (56.94 to 66.10 inches), which matches the expected 68%.

mean(abs(height - 61.52) <= 2 * 4.58)
## [1] 0.96

24 of the 25 heights (96%) fall within 2 standard deviations of the mean (52.36 to 70.68 inches), which is close to 95%.

mean(abs(height - 61.52) <= 3 * 4.58)
## [1] 1

All 25 heights (100%) fall within 3 standard deviations of the mean (47.78 to 75.26 inches), which is close to 99.7%.

So the heights approximately follow the 68-95-99.7% Rule.
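
The percentages in the 68-95-99.7% Rule are two-sided areas, P(μ - kσ < X < μ + kσ), whereas `pnorm(μ + kσ, ...)` on its own gives the one-sided area to the left of μ + kσ. A short sketch of the distinction, using the mean and standard deviation quoted in the exercise (`mu`, `sigma`, `k`, `one_sided` and `two_sided` are illustrative names):

```r
mu <- 61.52    # mean height from the exercise
sigma <- 4.58  # standard deviation from the exercise
k <- 1:3       # number of standard deviations

# One-sided area: P(X < mu + k * sigma)
one_sided <- pnorm(mu + k * sigma, mean = mu, sd = sigma)

# Two-sided area: P(mu - k * sigma < X < mu + k * sigma)
two_sided <- one_sided - pnorm(mu - k * sigma, mean = mu, sd = sigma)

# Columns correspond to k = 1, 2, 3
round(rbind(one_sided, two_sided), 4)
```

The `two_sided` row reproduces the familiar 68%, 95% and 99.7% figures; these are the values that sample proportions should be compared against.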

  2. Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below.
height <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
hist(height, prob=TRUE, xlab="height")
curve(dnorm(x, 61.52, 4.58), min(height), max(height), add=TRUE, col="darkblue")

Based on the histogram, the distribution seems to be slightly skewed to the right.

qqnorm(height)
qqline(height)

The Q-Q plot of the data shows that the points tend to follow the line, with some deviation at both the low and high ends. To judge whether this much deviation is typical of data drawn from a known normal distribution, several simulated data sets and their Q-Q plots were generated:

qqnormsim(height)

Comparing the plot of the data with the plots of the simulated samples, the Q-Q plot for the data looks similar to those for the simulated data sets, so the deviations at the ends are consistent with ordinary sampling variation. I therefore conclude that the female students' heights are approximately normally distributed.
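
For readers without the DATA606 package, the comparison against simulated normal samples can be sketched in base R. This only approximates the idea behind `qqnormsim`; it is not that function's actual implementation, and the seed, the number of simulations and the names `sims`, `n_sim` and `old_par` are arbitrary choices made for illustration.

```r
# Base-R sketch: the data's Q-Q plot next to Q-Q plots of simulated normal
# samples that share the data's size, mean and standard deviation.
height <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62,
            63, 63, 63, 64, 65, 65, 67, 67, 69, 73)

set.seed(606)  # arbitrary seed so the simulated panels are reproducible
n_sim <- 8     # number of simulated samples (arbitrary choice)

# Each column of sims is one simulated sample
sims <- replicate(n_sim, rnorm(length(height), mean(height), sd(height)))

# 3 x 3 grid: observed data first, then the simulated samples
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
qqnorm(height, main = "Observed heights")
qqline(height)
for (i in seq_len(n_sim)) {
  qqnorm(sims[, i], main = paste("Simulation", i))
  qqline(sims[, i])
}
par(old_par)  # restore the previous plotting layout
```

If the observed panel is hard to pick out from the simulated ones, that supports treating the data as approximately normal.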