//Exercise 1.

Heights of 10-year-olds, regardless of gender, closely follow a normal distribution with mean \(µ\) = 55 inches and standard deviation \(\sigma\) = 6 inches, written \(X \sim N(55, 6)\), where the second parameter is the standard deviation.

a) What is the probability that a randomly chosen 10-year-old is shorter than 48 inches?

The probability that a randomly chosen 10-year-old is shorter than 48 inches is \[P(X<48) = P\left(Z<\frac{48-55}{6}\right) = P(Z<-1.166667),\] which we compute with pnorm(-1.166667). This gives 0.12.

(48-55)/6
## [1] -1.166667
round((pnorm((48-55)/6)), digits = 2)
## [1] 0.12
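As a cross-check, pnorm() also accepts mean and sd arguments, so the same probability can be computed on the original inch scale without standardizing first. A minimal sketch (the variable names are only for illustration):

```r
# P(X < 48) for X ~ N(mean = 55, sd = 6), computed two ways
p_direct <- pnorm(48, mean = 55, sd = 6)   # directly on the inch scale
p_z      <- pnorm((48 - 55) / 6)           # via the z-score
round(c(p_direct, p_z), digits = 2)
```

Both routes give the same value, because pnorm(x, mean, sd) standardizes internally.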

b) What is the probability that a randomly chosen 10-year-old is between 60 and 65 inches?

The probability that a randomly chosen 10-year-old is between 60 and 65 inches is \[P(60<X<65) = P(X<65) - P(X<60) = P\left(Z<\frac{65-55}{6}\right) - P\left(Z<\frac{60-55}{6}\right) = P(Z<1.666667) - P(Z<0.8333333),\] which gives 0.15.

(65-55)/6 
## [1] 1.666667
(60-55)/6
## [1] 0.8333333
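The z-scores above are only the intermediate step; the probability itself is the difference of the two lower-tail areas. A short sketch that finishes the calculation (p_between is an illustrative name):

```r
# P(60 < X < 65) for X ~ N(55, 6): difference of two lower-tail probabilities
p_between <- pnorm(65, mean = 55, sd = 6) - pnorm(60, mean = 55, sd = 6)
round(p_between, digits = 2)
```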

c) What is the height cutoff for the tallest 5% of 10-year-olds?

The cutoff \(c\) for the tallest 5% of 10-year-olds is the height with 5% of the distribution above it, so \[P(X>c)=0.05 \quad\text{or equivalently}\quad P(X<c)=0.95.\] Taking qnorm(0.95) gives the standard normal cutoff 1.6448536, so \(P(Z<1.644854)=0.95\). Substituting into \[z=\frac{x-µ}{\sigma}\] with \(x = c\) gives \[1.644854=\frac{c-55\text{ in}}{6\text{ in}},\] so \(c = 55 + 6(1.644854) \approx 64.9\) in. The tallest 5% of 10-year-olds are taller than about 64.9 inches.

(qnorm(0.95)*6)+55
## [1] 64.86912
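Because "tallest 5%" describes an upper-tail area, qnorm() can return the cutoff directly when lower.tail = FALSE is set. The sketch below computes it both ways and then checks with pnorm() that 5% of the distribution lies above it (c_upper and c_95th are illustrative names; c itself is avoided because it would shadow R's c() function):

```r
# Height cutoff with 5% of X ~ N(55, 6) above it
c_upper <- qnorm(0.05, mean = 55, sd = 6, lower.tail = FALSE)  # upper-tail area 0.05
c_95th  <- qnorm(0.95, mean = 55, sd = 6)                       # 95th percentile
round(c(c_upper, c_95th), digits = 2)

# Check: the area above the cutoff should be 0.05
pnorm(c_upper, mean = 55, sd = 6, lower.tail = FALSE)
```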

//Exercise 2.

Suppose the weights of checked baggage of airline passengers follow a nearly normal distribution with mean 45 pounds and standard deviation 3.2 pounds, or \(X \sim N(45, 3.2)\). Most airlines charge a fee for baggage that weighs more than 50 pounds. Determine what percent of airline passengers incur this fee.

The percentage of passengers who incur this fee is given by \[P(X>50) = P\left(Z>\frac{50-45}{3.2}\right) = P(Z>1.5625) = 1 - P(Z<1.5625) \approx 0.059,\] or about 6%.

(50-45)/3.2
## [1] 1.5625
1 - pnorm(1.5625)
## [1] 0.05908512
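The upper-tail probability can also be computed in one call by passing mean, sd and lower.tail = FALSE to pnorm(). A minimal sketch (p_fee is an illustrative name):

```r
# Share of passengers whose bags exceed 50 lb, X ~ N(45, 3.2)
p_fee <- pnorm(50, mean = 45, sd = 3.2, lower.tail = FALSE)
round(100 * p_fee, digits = 1)   # as a percentage
```

Using lower.tail = FALSE avoids the explicit 1 - pnorm(...) step and gives the same value.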

//Exercise 3.

Assume a random variable X follows a normal distribution with mean µ and standard deviation σ. What is the probability that an observation falls below µ + 2σ?

The probability that an observation falls below \(µ + 2σ\) is \(P(X< µ+2σ)\). Standardizing gives \(z = \frac{(µ+2σ)-µ}{σ} = 2\), so this equals \(P(Z<2)\) for any \(µ\) and \(σ\). By the empirical (68-95-99.7) rule, about 95% of observations lie within two standard deviations of the mean, which leaves about 2.5% in each tail. An observation falls below \(µ + 2σ\) if it lies within two standard deviations of the mean or more than two standard deviations below it, so the total probability is approximately 95% + 2.5% = 97.5%.

0.95+.025
## [1] 0.975
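The 68-95-99.7 rule is a rounded approximation, so it can help to compare it with the exact normal probabilities. The sketch below computes the exact central area and \(P(Z<2)\), and then confirms with a few arbitrary, purely illustrative \((µ, σ)\) pairs that \(P(X<µ+2σ)\) does not depend on \(µ\) or \(σ\):

```r
# Exact area within two SDs of the mean (the "95%" of the empirical rule)
round(pnorm(2) - pnorm(-2), digits = 4)

# Exact value of P(Z < 2); the empirical rule gives about 0.975
p_exact <- pnorm(2)
p_exact

# Standardizing shows P(X < mu + 2*sigma) = P(Z < 2) for any mu, sigma > 0.
# These (mu, sigma) pairs are arbitrary illustrative values.
mus    <- c(0, 55, 100)
sigmas <- c(1, 6, 20)
p_check <- pnorm(mus + 2 * sigmas, mean = mus, sd = sigmas)  # vectorized over mean/sd
p_check
```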

//Exercise 4.

Let \(Z ∼ N(0, 1)\) be a random variable following a standard normal distribution. Use qnorm() to find the value \(c\) such that:

a) \(P(Z<c)=0.8\)

Taking qnorm(0.8) gives \(c = 0.8416212\), so \(P(Z<0.8416212)=0.8\). Because \(Z\) is already standard normal (\(µ = 0\), \(\sigma = 1\)), the standardizing formula \(z=\frac{x-µ}{\sigma}\) reduces to \(z = x\), so no further conversion is needed and \(c \approx 0.84\).

qnorm(0.8)
## [1] 0.8416212
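Since qnorm() is the inverse of pnorm(), plugging the cutoff back into pnorm() should recover the requested area. A quick check (c_a is an illustrative name):

```r
# qnorm() inverts pnorm(): the cutoff returns the requested lower-tail area
c_a <- qnorm(0.8)
round(c_a, digits = 2)
pnorm(c_a)   # recovers 0.8
```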

b) \(P(Z<c)=0.01\)

Taking qnorm(0.01) gives \(c = -2.3263479\), so \(P(Z<-2.326348)=0.01\). Since \(Z\) is standard normal, no conversion with \(z=\frac{x-µ}{\sigma}\) is needed, so \(c \approx -2.33\).

qnorm(0.01)
## [1] -2.326348
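The negative cutoff reflects the symmetry of the standard normal curve about 0: the 1st percentile is the negative of the 99th percentile. A short sketch of that check (c_b is an illustrative name):

```r
# Symmetry of N(0, 1): the 1st percentile equals minus the 99th percentile
c_b <- qnorm(0.01)
all.equal(c_b, -qnorm(0.99))
round(c_b, digits = 2)
```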

c) \(P(-c<Z<c)=0.95\)

By the symmetry of the standard normal distribution, \(P(Z<-c) = 1 - P(Z<c)\), so \[P(-c<Z<c) = P(Z<c) - P(Z<-c) = P(Z<c) - \left(1 - P(Z<c)\right) = 2P(Z<c) - 1.\] Setting this equal to 0.95 gives \(P(Z<c) = 0.975\), which leaves 2.5% of the area in each tail.

Taking qnorm(0.975) gives \(c = 1.959964\), so \(c \approx 1.96\) and \(P(-1.96<Z<1.96) \approx 0.95\).

qnorm(.975)
## [1] 1.959964
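For a central interval, the tail area \(1 - 0.95\) is split evenly between the two tails, so the cutoff is the \(1 - 0.05/2 = 0.975\) quantile. The sketch below computes it from the coverage level and then checks that the area between \(-c\) and \(c\) is 0.95 (conf_level and c_c are illustrative names):

```r
# Central interval P(-c < Z < c) = conf_level: each tail holds (1 - conf_level)/2
conf_level <- 0.95
c_c <- qnorm(1 - (1 - conf_level) / 2)   # the 0.975 quantile
round(c_c, digits = 2)
pnorm(c_c) - pnorm(-c_c)                 # area between -c and c
```

Changing conf_level (for example to 0.99) gives the corresponding symmetric cutoff without redoing the algebra.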

//Exercise 5.

Generate 1000 random numbers from a normal distribution with mean µ = 100 and standard deviation σ = 20, and then plot the histogram. Also make a normal QQ plot with the random numbers you generated.

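set.seed(1)                              # make the random draws reproducible
g <- rnorm(1000, mean = 100, sd = 20)    # 1000 draws from N(100, 20)
hist(g, main = '')                       # histogram of the simulated values

qqnorm(g)                                # normal QQ plot of the simulated values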
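Alongside the plots, a quick numeric check is to compare the sample mean and standard deviation with the parameters used to generate the data. The seed value below is an arbitrary assumption, used only so the numbers are reproducible:

```r
# Simulate as in the exercise, then summarize the sample numerically
set.seed(1)                               # arbitrary seed for reproducibility
g <- rnorm(1000, mean = 100, sd = 20)
c(mean = mean(g), sd = sd(g))             # should be close to 100 and 20

# Standard error of the sample mean, sigma / sqrt(n), about 0.63 here
20 / sqrt(1000)
```

With a standard error of about 0.63, the sample mean will typically land within a couple of units of 100.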

//Exercise 6.

Subset the females from the cdc data set. Make a density histogram and normal QQ plot of the height and weight variables for the females. Superimpose a normal curve on the density histogram, using the sample mean and sample standard deviation as the parameters. Use qqline() to add a reference line to the QQ plot. Do the points on the QQ plots fall on the straight line? Comment on any deviations in the data from the normal distribution.

a) Density histogram and normal QQ plot of height for females.

cdc <- readRDS(url("https://ericwfox.github.io/data/cdc.rds"))

cdc_f <- subset(cdc, gender == "f")   # keep only the female respondents

# Density histogram with a normal curve using the sample mean and SD
hist(cdc_f$height, breaks = 30, freq = FALSE, xlab = "Female Heights", main = '')
x <- seq(0, 300, 0.01)
y <- dnorm(x, mean = mean(cdc_f$height), sd = sd(cdc_f$height))
matlines(x, y, col = "red", lwd = 2)

qqnorm(cdc_f$height)
qqline(cdc_f$height)
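Since parts a) and b) repeat the same plotting steps, one option is to wrap them in a small function. The sketch below is hypothetical: the name plot_normal_fit is not from the original, and it uses curve() so the normal density spans the plotted x-range instead of a hard-coded grid such as seq(0, 300, 0.01). It is demonstrated on simulated data with arbitrary parameters so it runs without downloading the cdc file.

```r
# Hypothetical helper: density histogram with a normal curve based on the
# sample mean and SD, then a normal QQ plot with a reference line.
# Returns the fitted mean and SD invisibly.
plot_normal_fit <- function(v, label) {
  v <- v[!is.na(v)]                      # drop missing values, if any
  m <- mean(v)
  s <- sd(v)
  hist(v, breaks = 30, freq = FALSE, xlab = label, main = "")
  # with add = TRUE, curve() evaluates over the current x-axis range
  curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
  qqnorm(v, main = label)
  qqline(v, col = "red")
  invisible(c(mean = m, sd = s))
}

# Usage on simulated data (arbitrary mean 64, sd 3). With the cdc data the
# call would be, e.g., plot_normal_fit(cdc_f$height, "Female Heights")
set.seed(2)
fit <- plot_normal_fit(rnorm(500, mean = 64, sd = 3), "Simulated heights")
fit
```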

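The points in the middle of the QQ plot fall close to the straight reference line, so the center of the female height distribution is approximately normal. The points drift away from the line toward both ends of the plot, which indicates some departure from normality in the tails.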

b) Density histogram and normal QQ plot of weight for females.

# Density histogram with a normal curve using the sample mean and SD
hist(cdc_f$weight, breaks = 30, freq = FALSE, xlab = "Female Weights", main = '')
x <- seq(0, 300, 0.01)
y <- dnorm(x, mean=mean(cdc_f$weight), sd=sd(cdc_f$weight))
matlines(x, y, col="red", lwd=2)

qqnorm(cdc_f$weight)
qqline(cdc_f$weight)
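To see numerically why a QQ plot bends away from qqline() when the data are skewed, the sketch below compares sample quantiles with the quantiles of a normal distribution that has the same mean and SD. It uses simulated log-normal data, an assumption chosen only because that distribution is right-skewed; it is not the cdc weight data. For right-skewed data the sample quantiles sit above the normal ones at both extremes and below them near the center, which produces the upward-curving pattern on a QQ plot.

```r
# Numeric QQ comparison on simulated right-skewed (log-normal) data.
# NOTE: illustrative data only, not the cdc weights.
set.seed(3)
v <- rlnorm(5000, meanlog = 0, sdlog = 0.5)
probs <- c(0.01, 0.25, 0.50, 0.75, 0.99)
qq_table <- data.frame(
  prob   = probs,
  sample = quantile(v, probs, names = FALSE),        # empirical quantiles
  normal = qnorm(probs, mean = mean(v), sd = sd(v))  # matching normal quantiles
)
round(qq_table, 3)
```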

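The points follow the straight reference line only through the middle of the plot. Toward the ends, the QQ plot curves away from the line, so female weights deviate from a normal distribution in the tails.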