knitr::opts_chunk$set(cache=TRUE, warning=FALSE, message=FALSE, fig.width=12)
library(EasieR)
## Loading required package: lattice
library(knitr)
What do you guess are the standard deviations of the two distributions in the figure below?
plote1 <- xyplot(e1b ~ e1a,
type = "l",
horizontal = FALSE,
scales = list(y = list(at = NULL)),
main = "Normal distribution, mean: 40, sd: 8, range: 10-70",
panel = function(x, ...) {
panel.xyplot(x, ...)
vlines <- c(32, 40, 48)
panel.abline( v = vlines, lty = 3)
})
plote2 <- xyplot(e1d ~ e1c,
type = "l",
horizontal = FALSE,
scales = list(y = list(at = NULL)),
main = "Normal distribution, mean: 40, sd: 4, range: 10-70",
panel = function(x, ...) {
panel.xyplot(x, ...)
vlines <- c(36, 40, 44)
panel.abline( v = vlines, lty = 3)
})
## position: c(xmin, ymin, xmax, ymax), as fractions of the page measured from the bottom-left corner
print(plote1, position=c(0, .5, 1, 1), more=TRUE)
print(plote2, position=c(0, 0, 1, .5))
Solution:
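The first distribution has a standard deviation of 8 and the second a standard deviation of 4: the dotted lines mark the mean and one standard deviation on either side of it. The standard deviation in the second plot is half that of the first.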
Draw a picture of a normal distribution with mean 70 and standard deviation 5.
xyplot(e2b ~ e2a,
type = "l",
main = "Normal distribution, mean 70",
scales = list(y = list(at = NULL), x = list(at = seq(0, 140, 10))),
panel = function(x, ...){
panel.xyplot(x, ...)
vmean <- 70
panel.abline(v = vmean, lty = 4)
})
Draw a picture with two normal distributions. Give each a mean of 70, with standard deviations of 5 and 10. How do the distributions differ?
e3a <- seq(40, 100, length = 1000)
e3b <- dnorm(e3a, 70, 5)
plot_e3a <- xyplot(e3b ~ e3a,
type = "l",
main = "Plot 1 - Mean 70, sd 5")
e3c <- seq(40, 100, length = 1000)
e3d <- dnorm(e3c, 70, 10)
plot_e3b <- xyplot(e3d ~ e3c,
type = "l",
main = "Plot 2 - Mean 70, sd 10",
ylim = range(e3b))
print(plot_e3a, position = c(0, .5, 1, 1), more = TRUE)
print(plot_e3b, position = c(0, 0, 1, 0.5))
Solution:
The second distribution is more widely spread around the mean, so its curve is flatter and its peak is lower.
Consider a normal distribution with mean 110 and standard deviation 15.
e4a <- seq(60, 170, length = 10000)
e4b <- dnorm(e4a, 110, 15)
plot_e4a <- xyplot(e4b ~ e4a,
type = "l",
main = "Plot 1",
scales = list(x = list(at = seq(60, 170, 10)), rot = 45),
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = 110, lty = 2)
})
plot_e4b <- xyplot(e4b ~ e4a,
type = "l",
main = "Plot 2",
scales = list(x = list(at = seq(60, 170, 10)), rot = 45),
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(110, 125), lty = 2)
xx <- c(110, x[x>=110 & x<=125], 125)
yy <- c(0, y[x>=110 & x<=125], 0)
panel.polygon(xx,yy, ..., col='red')
})
plot_e4c <- xyplot(e4b ~ e4a,
type = "l",
main = "Plot 3",
scales = list(x = list(at = seq(60, 170, 10)), rot = 45),
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(95, 110, 140), lty = 2)
xx <- c(95, x[x>=95 & x<=140], 140)
yy <- c(0, y[x>=95 & x<=140], 0)
panel.polygon(xx,yy, ..., col='red')
})
plot_e4d <- xyplot(e4b ~ e4a,
type = "l",
main = "Plot 4",
scales = list(x = list(at = seq(60, 170, 10)), rot = 45),
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(80, 95, 110), lty = 2)
xx <- c(80, x[x>=80 & x<=95], 95)
yy <- c(0, y[x>=80 & x<=95], 0)
panel.polygon(xx,yy, ..., col='red')
})
print(plot_e4a, split = c(1, 1, 2, 2), more = TRUE)
print(plot_e4b, split = c(2, 1, 2, 2), more = TRUE)
print(plot_e4c, split = c(1, 2, 2, 2), more = TRUE)
print(plot_e4d, split = c(2, 2, 2, 2), more = FALSE)
A. About what percentage of the values lie between 110 and 125?
Approximately 34% of the observations lie between 110 and 125 (from the mean to one standard deviation above it).
B. About what percentage of the values lie between 95 and 140?
Approximately 81.5% of the observations lie between 95 and 140 (from one standard deviation below the mean to two above it: 34% + 47.5%).
C. About what percentage of the values lie between 80 and 95?
Between 80 and 95 we should expect about 13.5% of the observations (from two standard deviations below the mean to one below it: 47.5% - 34%).
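The exact probabilities for the three intervals can be computed with base R's `pnorm`. This is a minimal sketch; the helper `interval_prob` is a name introduced here for illustration and is not part of the original exercises or of the EasieR package.

```r
# Hypothetical helper: P(lower < X < upper) for a normal distribution
interval_prob <- function(lower, upper, mean, sd) {
  pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
}

p_a <- interval_prob(110, 125, mean = 110, sd = 15)  # mean to +1 sd
p_b <- interval_prob(95, 140, mean = 110, sd = 15)   # -1 sd to +2 sd
p_c <- interval_prob(80, 95, mean = 110, sd = 15)    # -2 sd to -1 sd

# Percentages, rounded to one decimal place
round(100 * c(A = p_a, B = p_b, C = p_c), 1)
```

The exact values should come out close to the empirical-rule figures, which are themselves rounded approximations.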
Exam scores have a normal distribution with a mean of 70 and a standard deviation of 10. Bob’s score is 80. Find and interpret his standard score.
e5a <- seq(30, 110, length = 1000)
e5b <- dnorm(e5a, 70, 10)
e5c <- seq(-3, 3, length = 1000)
e5d <- dnorm(e5c, mean = 0, sd = 1)
# Bob's Z-score:
(80 - 70)/10
## [1] 1
Solution:
Bob’s z-score is 1, that is, his score is exactly one standard deviation above the mean.
plot_e5a <- xyplot(e5b ~ e5a,
type = "l",
main = "Bob's score",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(70, 80), lty = c(1, 2))
panel.text(lab = c("Mean", "Bob"), x = c(70, 80), y = 0.02,
adj = c(0.4, 0), cex = 1.5, srt = 45)
})
plot_e5b <- xyplot(e5d ~ e5c,
type = "l",
main = "Bob's standardized score",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(0, 1), lty = c(1, 2))
panel.text(lab = c("Mean", "Bob"), x = c(0, 1), y = 0.2,
adj = c(0.4, 0), cex = 1.5, srt = 45)
})
print(plot_e5a, split = c(1, 1, 1, 2), more = TRUE)
print(plot_e5b, split = c(1, 2, 1, 2), more = FALSE)
dev.off()
## null device
## 1
Bob scores 80 on both his math exam (which has a mean of 70 and a standard deviation of 10) and his English exam (which has a mean of 85 and a standard deviation of 5). Find and interpret Bob’s z-scores on both exams to let him know which exam (if either) he did better on.
e6a <- seq(0, 100, length = 1000)
e6b <- dnorm(e6a, mean = 70, sd = 10)
e6c <- seq(0, 100, length = 1000)
e6d <- dnorm(e6c, mean = 85, sd = 5)
plot_e6a <- xyplot(e6b ~ e6a,
type = "l",
main = "Math exam",
ylim = range(e6d),
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(70, 80), lty = c(2, 3))
panel.text(lab = c("Mean", "Bob"), x = c(70, 80), y = 0.03,
adj = c(0.2, 0), cex = 1.5, srt = 90)
})
plot_e6b <- xyplot(e6d ~ e6c,
type = "l",
main = "English exam",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(85, 80), lty = c(2, 3))
panel.text(lab = c("Mean", "Bob"), x = c(85, 80), y = 0.03,
adj = c(0.4, 0), cex = 1.5, srt = 90)
})
print(plot_e6a, split = c(1, 1, 1, 2), more = TRUE)
print(plot_e6b, split = c(1, 2, 1, 2), more = FALSE)
Now let's draw the standard normal distribution for each exam together with Bob's standardized results. Bob's standardized score is \((80 - 70)/10 = 1\) on the math exam and \((80 - 85)/5 = -1\) on the English exam. Relative to the other students, Bob did worse on the English exam.
e6e <- seq(-3, 3, length = 1000)
e6f <- dnorm(e6e, 0, 1)
plot_e6c <- xyplot(e6f ~ e6e,
type = "l",
main = "Math standardized scores",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(0, 1), lty = c(2, 3))
panel.text(lab = c("Mean", "Bob"), x = c(0, 1), y = 0.15, adj = c(0.2, -0.2), cex = 1.2, srt = 45)
})
plot_e6d <- xyplot(e6f ~ e6e,
type = "l",
main = "English standardized scores",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(0, -1), lty = c(2, 3))
panel.text(lab = c("Mean", "Bob"), x = c(0, -1 ), y = 0.25, adj = c(0.4, -0.2), cex = 1.3, srt = 90)
})
print(plot_e6c, split = c(1, 1, 1, 2), more = TRUE)
print(plot_e6d, split = c(1, 2, 1, 2), more = FALSE)
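Bob's two standardized scores can be computed directly. A minimal sketch follows; the helper `z_score` is a name introduced here for illustration only.

```r
# Hypothetical helper: standard score of x given mean mu and sd sigma
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_math    <- z_score(80, mu = 70, sigma = 10)  # math exam
z_english <- z_score(80, mu = 85, sigma = 5)   # English exam

c(math = z_math, english = z_english)
```

A positive z-score places Bob above the class mean and a negative one below it, so the same raw score of 80 ranks differently on the two exams.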
Sue’s math class has a mean of 70 with a standard deviation of 5. Her standard score is -2. What is her original score?
e7a <- seq(-3, 3, length = 1000)
e7b <- dnorm(e7a, 0, 1)
e7c <- seq(0, 100, length = 1000)
e7d <- dnorm(e7c, 70, 5)
plot_e7a <- xyplot(e7b ~ e7a,
type = "l",
main = "Sue's standard math score",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(0, -2), lty = c(2, 3))
panel.text(lab = c("Mean", "Sue"), x = c(0, -2), y = 0.03, adj = c(0.4, 0), cex = 1.2, srt = 45)
})
plot_e7b <- xyplot(e7d ~ e7c,
type = "l",
main = "Sue's math result",
scales = list(y = list(at = seq(0, 0.4, 0.1))),
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(70, 60), lty = c(2, 3))
panel.text(lab = c("Mean", "Sue"), x = c(70, 60), y = 0.03, adj = c(0.4, 0), cex = 1.2, srt = 45)
})
print(plot_e7a, split = c(1, 1, 1, 2), more = TRUE)
print(plot_e7b, split = c(1, 2, 1, 2), more = FALSE)
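Sue's original score follows from inverting the z-score formula, \(x = z\sigma + \mu\). A short sketch of the arithmetic, with variable names chosen here for illustration:

```r
z     <- -2   # Sue's standard score
mu    <- 70   # class mean
sigma <- 5    # class standard deviation

# Invert z = (x - mu) / sigma to recover the raw score
sue_score <- z * sigma + mu
sue_score
```

This gives 60, the value marked by the dotted line in the second plot.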
Suppose your score on the exam is directly at the mean. What is your standard score?
e8a <- seq(-3, 3, length = 1000)
e8b <- dnorm(e8a, 0, 1)
xyplot(e8b ~ e8a,
type = "l",
main = "Standard normal distribution",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = 0, lty = 2)
panel.text(lab = "Mean", x = 0, y = 0.2, adj = c(0.4, 0.2), cex = 1.2, srt = 45)
})
Solution:
The mean of a standard normal distribution is always zero. Since the z-score is \((x-\mu)/\sigma\) and here \(x = \mu\), the standard score is 0.
Suppose the weights of cereal boxes have a normal distribution with a mean of 20 ounces and a standard deviation of half an ounce. A box that has a standard score of zero weighs how much?
Solution:
The standard score is \(z = (x-\mu)/\sigma\). Solving for \(x\) gives \(x = z\sigma + \mu\), so a cereal box with a standard score of zero weighs \(0 \cdot 0.5 + 20\), that is, 20 ounces.
Suppose you want to put fat Fido on a weight-loss program. Before the program, his weight had a standard score of +2 compared to the dogs of its breed/age, and after the program, his weight has a standard score of -2. His weight before the program was 150 pounds, and the standard deviation for the breed is 5.
e10a <- seq(-4, 4, length = 1000)
e10b <- dnorm(e10a, 0, 1)
xyplot(e10b ~ e10a, type = "l",
main = "Fido",
panel = function(x, ...){
panel.xyplot(x, ...)
panel.abline(v = c(2, -2), lty = 3)
panel.text(lab = c("Before", "After"), x = c(2, -2), y = 0.2, adj = c(0.2, 0), cex = 1.3, srt = 45)
})
A. What is the mean weight for Fido’s breed/age?
Since \(z = (x - \mu)/\sigma\), the breed mean is \(\mu = x - z\sigma = 150 - 2 \cdot 5\), that is, 140 pounds.
B. What is its weight after the weight-loss program?
After the program the weight \(x_2\) is \(\mu - 2\sigma = 140 - 2 \cdot 5\), that is, 130 pounds.
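Both parts of the Fido exercise use the same rearrangement of \(z = (x - \mu)/\sigma\). A brief sketch of the arithmetic, with variable names chosen here for illustration:

```r
x_before <- 150  # weight before the program, in pounds
z_before <- 2    # standard score before the program
z_after  <- -2   # standard score after the program
sigma    <- 5    # breed standard deviation

# A: solve z = (x - mu) / sigma for mu
breed_mean <- x_before - z_before * sigma

# B: apply x = z * sigma + mu with the new standard score
x_after <- z_after * sigma + breed_mean

c(breed_mean = breed_mean, x_after = x_after)
```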
Bob’s commuting times to work have a normal distribution with a mean of 45 minutes and a standard deviation of 10 minutes.
A. What percentage of the time does Bob get to work in 30 minutes or less?
zPlot(30, 45, 10, p = 1)
Since \(z = (X - \mu)/\sigma = (30 - 45)/10 = -1.5\), the probability that Bob gets to work in 30 minutes or less is 0.0668072, that is, about 6.7%.
B. Bob’s workday starts at 9am. If he leaves at 8am, how often is he late?
zPlot(60, 45, 10, 2)
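Bob is late when his commute takes more than 60 minutes, that is, when \(z > (60 - 45)/10 = 1.5\). The probability is 0.0668072, so he is late approximately 6.7% of the time.
Times to complete a statistics exam have a normal distribution with a mean of 40 minutes and a standard deviation of 6 minutes. Deshawn’s time comes in at the 90th percentile. What percentage of the students are still working on their exams when Deshawn leaves?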
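The two tail probabilities for Bob's commute can be reproduced with base R's `pnorm`, independently of `zPlot`. A minimal sketch:

```r
# A: P(X <= 30), the lower tail
p_early <- pnorm(30, mean = 45, sd = 10)

# B: P(X > 60), the upper tail (Bob is late)
p_late <- pnorm(60, mean = 45, sd = 10, lower.tail = FALSE)

c(early = p_early, late = p_late)
```

Because 30 and 60 are both 1.5 standard deviations from the mean, the symmetry of the normal curve makes the two probabilities equal.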
Solution: 10%.
Suppose your exam score has a standard score of 0.9. Does this mean that 90% of the other exam scores are lower than yours?
Solution: No. It means that my score is 0.9 standard deviations above the mean. By the empirical rule, about 68% of the scores lie within one standard deviation of the mean, so my score falls inside that central group. Furthermore, a z-score of 0.9 corresponds to the 81.59th percentile, so about 82% of the scores are lower than mine.
If a baby’s weight is at the median, what is her percentile?
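The percentile that corresponds to a standard score can be read off the standard normal cumulative distribution function. A short sketch using base R's `pnorm`:

```r
z <- 0.9

# Proportion of scores below a z-score of 0.9
percentile <- pnorm(z)

# Proportion of scores within one standard deviation of the mean
within_one_sd <- pnorm(1) - pnorm(-1)

round(100 * c(percentile = percentile, within_one_sd = within_one_sd), 2)
```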
Solution: It is the 50th percentile.
Clint sleeps an average of 8 hours per night with a standard deviation of 15 minutes. What is the chance he will sleep less than 7.5 hours tonight?
zPlot(7.5, 8, 0.25, 1)
Solution:
The probability for Clint to sleep less than 7.5 hours is 2.3%.
Suppose you know that Bob’s test score is above the mean, but he does not remember by how much. At least what percentage of the students must have scored lower than Bob?
Solution:
At least 50%.
Bob’s commuting times to work have a normal distribution with a mean of 45 minutes and a standard deviation of 10 minutes. How often does Bob get to work in 30 to 45 minutes?
zPlot(30, 45, 10, 3, 45)
Solution:
Bob gets to work in 30 to 45 minutes about 43% of the time.
The times taken to complete a statistics exam has a normal distribution with a mean of 40 minutes and a standard deviation of 6 minutes. What is the chance of Deshawn completing the exam in 30 to 35 minutes?
zPlot(30, 40, 6, p = 3, y = 35)
Solution:
The probability of Deshawn completing the exam in 30 to 35 minutes is about 15.5%.
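The interval probability can also be obtained by standardizing both endpoints first and then taking the difference of two standard normal probabilities. A minimal sketch, with variable names chosen here for illustration:

```r
mu    <- 40  # mean completion time, in minutes
sigma <- 6   # standard deviation, in minutes

# Standardize the two endpoints
z_low  <- (30 - mu) / sigma
z_high <- (35 - mu) / sigma

# P(30 < X < 35) = P(z_low < Z < z_high)
p_interval <- pnorm(z_high) - pnorm(z_low)
p_interval
```

The result is roughly 0.155. Working from a printed z-table with rounded z-scores may give a slightly different last digit.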
Times until service at a restaurant have a normal distribution with a mean of 10 minutes and a standard deviation of 3 minutes. What is the chance of it taking longer than 15 minutes to get served?
zPlot(15, 10, 3, p = 2)
Solution:
With a standard deviation of 3 minutes, \(z = (15 - 10)/3 \approx 1.67\), so the probability of waiting more than 15 minutes to get served is about 4.8%.
At the same restaurant as in problem 19 with the same normal distribution, what is the chance of it taking no more than 15 minutes to get service?
zPlot(15, 10, 3, 1)
Solution:
With a standard deviation of 3 minutes, \(z = (15 - 10)/3 \approx 1.67\), so the probability of getting served in no more than 15 minutes is about 95.2%.
Clint, obviously not in college, sleeps an average of 8 hours per night with a standard deviation of 15 minutes. What is the chance of him sleeping between 7.5 and 8.5 hours on any given night?
zPlot(7.5, 8, 0.25, 3, 8.5)
Solution:
The probability of Clint sleeping between 7.5 and 8.5 hours per night is 95.4%.
One state’s annual rainfall has a normal distribution with a mean of 100 inches and a standard deviation of 25 inches. Suppose corn grows best when the annual rainfall is between 100 and 150 inches. What is the chance of achieving this amount of rainfall?
zPlot(100, 100, 25, 3, 150)
Solution:
The chance of achieving an amount of rainfall between 100 and 150 inches is 47.7%.
Weights have a normal distribution with a mean of 100 and a standard deviation of 10. What weight has 60 percent of the values lying below it?
Solution:
The z-score for the 60th (0.6) percentile is 0.25. The cutting point \(x\) is \(z\sigma + \mu\), that is 102.5.
zPlot(102.5, 100, 10, 1)
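Going from a percentile back to a value is the job of the quantile function. The sketch below uses base R's `qnorm` to compare the table-based z-score of 0.25 with the unrounded one; the variable names are chosen here for illustration.

```r
# Exact z-score for the 60th percentile
z_exact <- qnorm(0.60)

# Cutoff from the rounded table value, x = z * sigma + mu
cutoff_table <- 0.25 * 10 + 100

# Cutoff computed directly on the original scale
cutoff_exact <- qnorm(0.60, mean = 100, sd = 10)

c(z_exact = z_exact, cutoff_table = cutoff_table, cutoff_exact = cutoff_exact)
```

The two cutoffs agree to within a few hundredths; the small gap comes only from rounding the z-score to two decimals.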
Jimmy walks a mile, and his previous times have a normal distribution with a mean of 8 minutes and a standard deviation of 1 minute. What time does he have to make to get into the top 10 percent of his fastest times?
Solution:
Since we are talking about times, the lower the better. Jimmy’s time has to fall at or below the 10th percentile (0.1), whose z-score is -1.28. The cutoff \(x\) is \(z\sigma + \mu\), that is, 6.72 minutes.
zPlot(6.72, 8, 1, 1)
The times it takes to complete a statistics exam have a normal distribution with a mean of 40 minutes and standard deviation of 6 minutes. Deshawn’s time falls at the 42nd percentile. How long does Deshawn take to finish the exam?
Solution:
The z-score for the 42nd percentile (0.42) is -0.2. Hence, using the formula \(x = z\sigma + \mu\), the time \(x\) Deshawn takes to finish the exam is 38.8 minutes. Let’s check the result with the zPlot function.
zPlot(38.8, 40, 6)
Exam scores for a particular test have a normal distribution with a mean of 75 and standard deviation of 5. The instructor wants to give the top 20% of the scores an A. What is the cutoff for an A?
Solution:
Since we are looking for the top 20% of the distribution, we are looking for the 80th percentile. The z-score for the 80th percentile (0.8) is 0.84. Hence, the \(x\) representing the cutoff is, according to the formula \(x = z\sigma + \mu\), 79.2.
zPlot(79.2, 75, 5)
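For "top p%" questions, `qnorm` can be asked for the upper-tail cutoff directly with `lower.tail = FALSE`, which avoids converting to the complementary percentile by hand. A minimal sketch:

```r
# Score above which the top 20% of exam scores lie
cutoff_upper <- qnorm(0.20, mean = 75, sd = 5, lower.tail = FALSE)

# Equivalent formulation through the 80th percentile
cutoff_80th <- qnorm(0.80, mean = 75, sd = 5)

c(upper_tail = cutoff_upper, percentile_80 = cutoff_80th)
```

Both calls return the same cutoff, which rounds to the table-based value of 79.2.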
Service call times for one company have a normal distribution with a mean of 10 minutes and a standard deviation of 3 minutes. Researchers study the longest 10% of the calls to make improvements. How long do the longest 10 percent last?
Solution:
In this case, we are referring to the upper part of the distribution, that is, the calls above the 90th percentile (0.9). The z-score for the 90th percentile is 1.28 and the cutoff \(x\), according to the formula \(x = z\sigma + \mu\), is 13.84 minutes: the longest 10% of the calls last at least 13.84 minutes.
zPlot(13.84, 10, 3)
Statcars have a normal miles-per-gallon distribution with a mean of 75. Twenty percent of the vehicles get more than 100 miles per gallon. What is the standard deviation?
Solution:
Since we are interested in the top 20% of the distribution, we consider the 80th percentile. The z-score for the 80th percentile (0.8) is 0.84. The standard deviation \(\sigma\) is therefore \((x - \mu)/z = (100 - 75)/0.84 = 29.76\) miles per gallon.
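The same rearrangement, \(\sigma = (x - \mu)/z\), can be evaluated with the unrounded z-score from `qnorm` to see how much the two-decimal table value affects the answer. A short sketch, with variable names chosen here for illustration:

```r
x  <- 100  # value exceeded by 20% of the vehicles
mu <- 75   # mean miles per gallon

# Standard deviation using the rounded table z-score
sigma_table <- (x - mu) / 0.84

# Standard deviation using the exact 80th-percentile z-score
sigma_exact <- (x - mu) / qnorm(0.80)

round(c(table = sigma_table, exact = sigma_exact), 2)
```

As a sanity check, `pnorm(100, mean = 75, sd = sigma_exact, lower.tail = FALSE)` should return 0.2.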