if (Sys.info()["sysname"] == "Windows") {
setwd("~/Masters/DATA606/Week3/Homework")
} else {
setwd("~/Documents/Masters/DATA606/Week3/Homework")
}
library(DATA606)
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
require(ggplot2)
## Loading required package: ggplot2
Answer:
The probability is:
pnorm(-1.13, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.8707619
I graphed each answer twice: once using normalPlot() and again using ggplot().
normalPlot()
normalPlot(mean = 0, sd = 1, bounds = c(-1.13, 4))
ggplot()
lb <- -4
ub <- 4
z1 <- -1.13
z2 <- ub
pick_line1 <- z1
pick_line2 <- z1
Q1 <- ggplot(data.frame(x = c(lb, ub)), aes(x)) + stat_function(fun = dnorm) +
stat_function(fun = dnorm, xlim = c(z1, z2), geom = "area",
alpha = 0.5) + geom_vline(xintercept = pick_line1, color = "black",
alpha = 0.75) + geom_text(aes(x = pick_line1, y = 0.25, label = sprintf("Z = %s\n",
pick_line1)), color = "black", angle = 90) + geom_vline(xintercept = pick_line2,
color = "black", alpha = 0.75) + geom_text(aes(x = pick_line2,
y = 0.25, label = sprintf("Z = %s\n", pick_line2)), color = "black",
angle = 90)
Q1
Answer:
The probability is:
pnorm(0.18, mean = 0, sd = 1)
## [1] 0.5714237
Using normalPlot()
normalPlot(mean = 0, sd = 1, bounds = c(0.18), tails = TRUE)
Using ggplot()
lb <- -4
ub <- 4
z1 <- lb
z2 <- 0.18
pick_line1 <- z2
pick_line2 <- z2
Q2 <- ggplot(data.frame(x = c(lb, ub)), aes(x)) + stat_function(fun = dnorm) +
stat_function(fun = dnorm, xlim = c(z1, z2), geom = "area",
alpha = 0.5) + geom_vline(xintercept = pick_line1, color = "black",
alpha = 0.75) + geom_text(aes(x = pick_line1, y = 0.25, label = sprintf("\nZ = %s",
pick_line1)), color = "black", angle = 90) + geom_vline(xintercept = pick_line2,
color = "black", alpha = 0.75) + geom_text(aes(x = pick_line2,
y = 0.25, label = sprintf("\nZ = %s", pick_line2)), color = "black",
angle = 90)
Q2
The probability is:
pnorm(8, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 6.220961e-16
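For a tail probability this small, the way the upper tail is computed can matter. The following is a minimal sketch using only base R's pnorm(); the variable names are illustrative and not part of the original answer.

```r
# Upper tail computed directly, as in the answer above
upper_direct <- pnorm(8, mean = 0, sd = 1, lower.tail = FALSE)

# Complement form: pnorm(8) is so close to 1 that the subtraction
# loses most of its precision in double-precision arithmetic
upper_complement <- 1 - pnorm(8, mean = 0, sd = 1)

# By symmetry of the standard normal, P(Z > 8) equals P(Z < -8)
upper_symmetry <- pnorm(-8, mean = 0, sd = 1)

c(direct = upper_direct, complement = upper_complement, symmetry = upper_symmetry)
```

The direct and symmetry forms should agree, which is one reason to prefer lower.tail = FALSE over subtracting from 1 when working far out in the tail.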
Using normalPlot()
normalPlot(mean = 0, sd = 1, bounds = c(8, 9))
Using ggplot()
lb <- -10
ub <- 10
z1 <- 8
z2 <- ub
pick_line1 <- z1
pick_line2 <- z1
Q3 <- ggplot(data.frame(x = c(lb, ub)), aes(x)) + stat_function(fun = dnorm) +
stat_function(fun = dnorm, xlim = c(z1, z2), geom = "area",
alpha = 0.5) + geom_vline(xintercept = pick_line1, color = "black",
alpha = 0.75) + geom_text(aes(x = pick_line1, y = 0.25, label = sprintf("\nZ = %s",
pick_line1)), color = "black", angle = 90) + geom_vline(xintercept = pick_line2,
color = "black", alpha = 0.75) + geom_text(aes(x = pick_line2,
y = 0.25, label = sprintf("\nZ = %s", pick_line2)), color = "black",
angle = 90)
Q3
The probability is:
pnorm(0.5, mean = 0, sd = 1) - pnorm(-0.5, mean = 0, sd = 1)
## [1] 0.3829249
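Because the standard normal distribution is symmetric about 0, a central probability like this one can be cross-checked a second way. A minimal sketch using only pnorm(); the helper name central_prob is hypothetical.

```r
# Hypothetical helper: P(-z < Z < z) for a standard normal Z
central_prob <- function(z) {
  # By symmetry, pnorm(-z) = 1 - pnorm(z), so the difference
  # pnorm(z) - pnorm(-z) simplifies to 2 * pnorm(z) - 1
  2 * pnorm(z) - 1
}

central_prob(0.5)
```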
Using normalPlot()
normalPlot(mean = 0, sd = 1, bounds = c(-0.5, 0.5))
Using ggplot()
lb <- -4
ub <- 4
z1 <- -0.5
z2 <- 0.5
pick_line1 <- z1
pick_line2 <- z2
Q4 <- ggplot(data.frame(x = c(lb, ub)), aes(x)) + stat_function(fun = dnorm) +
stat_function(fun = dnorm, xlim = c(z1, z2), geom = "area",
alpha = 0.5) + geom_vline(xintercept = pick_line1, color = "black",
alpha = 0.75) + geom_text(aes(x = pick_line1, y = 0.25, label = sprintf("Z = %s\n",
pick_line1)), color = "black", angle = 90) + geom_vline(xintercept = pick_line2,
color = "black", alpha = 0.75) + geom_text(aes(x = pick_line2,
y = 0.25, label = sprintf("\nZ = %s", pick_line2)), color = "black",
angle = 90)
Q4
Answer:
Given that the data are normally distributed, the cutoff for the top 2% of the population corresponds to a known z value (the 98th percentile), which can then be used to solve the following equation for the standard deviation.
z_value <- qnorm(p = 0.98, mean = 0, sd = 1)
z_value
## [1] 2.053749
\[Z\quad =\quad \frac { x\quad -\quad \mu }{ \sigma } \\ \sigma \quad =\quad \frac { x\quad -\quad \mu }{ Z } \\ \sigma \quad =\quad \frac { 132\quad -\quad 100 }{ 2.054 }=15.58\]
Answer:
Using the same methods as described in part (a):
high_c <- 1 - 0.185
z_value <- qnorm(p = high_c, mean = 0, sd = 1)
z_value
## [1] 0.8964734
\[\sigma \quad =\quad \frac { x\quad -\quad \mu }{ Z } \\ \sigma \quad =\quad \frac { 220\quad -\quad 185 }{ 0.8965 } =39.04\]
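The rearranged z-score equation used in parts (a) and (b) can also be evaluated directly in R. This is a minimal sketch; the helper name sd_from_quantile is hypothetical. Because it carries the full-precision z value from qnorm(), the result can differ in the last digit from a hand calculation that uses a rounded z.

```r
# Hypothetical helper: solve Z = (x - mu) / sigma for sigma, where Z is
# the standard normal quantile for the cumulative probability p
sd_from_quantile <- function(x, mu, p) {
  (x - mu) / qnorm(p)
}

sigma_a <- sd_from_quantile(x = 132, mu = 100, p = 0.98)       # part (a)
sigma_b <- sd_from_quantile(x = 220, mu = 185, p = 1 - 0.185)  # part (b)

round(c(sigma_a = sigma_a, sigma_b = sigma_b), 2)
```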
Answer:
We can use qnorm to determine which heights correspond to given percentiles, use those cutoffs to subset the data, and compute the resulting proportions. If the calculated proportions are close to the corresponding percentages (68%, 95%, 99.7%), then the rule is followed.
college_f <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61,
61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
college_f_mean <- mean(college_f)
college_f_sd <- sd(college_f)
college_68_z <- qnorm(p = 0.68, mean = 61.52, sd = 4.58)
sum(college_68_z > college_f)/length(college_f)
## [1] 0.72
college_95_z <- qnorm(p = 0.95, mean = 61.52, sd = 4.58)
sum(college_95_z > college_f)/length(college_f)
## [1] 0.96
college_99.7_z <- qnorm(p = 0.997, mean = 61.52, sd = 4.58)
sum(college_99.7_z > college_f)/length(college_f)
## [1] 1
It appears the 68-95-99.7% rule is generally followed. However, each test returned a slightly higher proportion than expected, which indicates that more of the data lies to the left of the mean than would in a normal distribution; therefore, the data has a right skew. A histogram and a normal probability plot confirm these findings.
hist(college_f)
qqnorm(college_f)
qqline(college_f)
Using the methodology from the lab assignment, we can show the normal probability plot of simulated data that does follow a normal distribution:
qqnormsim(college_f)
We can see that simulations of college female heights can be just as extreme as the data set we were given. This analysis gives us more evidence that our data follows a normal distribution.
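As a complementary check, the 68-95-99.7% rule is usually stated as the share of observations within 1, 2, and 3 standard deviations of the mean. The sketch below computes those shares directly, assuming the same mean (61.52) and standard deviation (4.58) used in the qnorm() calls above; the variable names are illustrative.

```r
college_f <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61,
               61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)

# Same mean and standard deviation used in the qnorm() calls above
mu <- 61.52
sigma <- 4.58

# Proportion of heights within k standard deviations of the mean
within_k_sd <- sapply(1:3, function(k) mean(abs(college_f - mu) <= k * sigma))
names(within_k_sd) <- c("1 sd", "2 sd", "3 sd")
within_k_sd
```

For these 25 heights the proportions come out to 0.68, 0.96, and 1, which are close to the nominal 68%, 95%, and 99.7%.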
Answer:
Yes, the normal probability plot shows the sample generally follows a normal distribution. While the values bend up and to the left of the line, which indicates that a right skew is present, the qqnormsim() results show that this could simply be due to sampling variation.
Answer:
This scenario can be modeled with a geometric distribution, where the first success (here, the first defect) occurs on trial n. With p = 0.02 and n = 10, the probability is:
\[{ \left( 1-p \right) }^{ n-1 }\quad \ast \quad p\] \[{ \left( 1-0.02 \right) }^{ 10-1 }\quad \ast \quad 0.02\quad =\quad 0.0167\]
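The same geometric probability can be checked with base R's dgeom(). A minimal sketch; the variable names are illustrative, and note that dgeom() counts the failures before the first success rather than the trial number.

```r
p_defect <- 0.02

# dgeom() counts failures before the first success, so "first success
# on trial 10" corresponds to 9 failures first
first_on_10_builtin <- dgeom(10 - 1, prob = p_defect)

# The same value from the formula (1 - p)^(n - 1) * p
first_on_10_formula <- (1 - p_defect)^(10 - 1) * p_defect

c(builtin = first_on_10_builtin, formula = first_on_10_formula)
```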
Answer:
Given that the production of each transistor is independent, this is simply the probability of no defect (98%) on each of 100 trials.
\[{ \left( 1-0.02 \right) }^{ 100 }\quad =\quad 0.1326\]
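This value can be reproduced in R either directly or as a binomial probability of zero defects in 100 trials. A minimal sketch with illustrative variable names:

```r
p_defect <- 0.02

# Direct calculation: 100 independent non-defective transistors
no_defect_direct <- (1 - p_defect)^100

# Binomial view: zero defects in 100 independent trials
no_defect_binom <- dbinom(0, size = 100, prob = p_defect)

c(direct = no_defect_direct, binomial = no_defect_binom)
```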
Answer:
Since this is a geometric distribution, the average and standard deviation can be determined as follows:
Mean:
\[\mu \quad =\quad \frac { 1 }{ p } \quad =\quad \frac { 1 }{ .02 } \quad =\quad 50\]
Standard Deviation:
\[\sigma \quad =\quad \sqrt { \frac { 1-p }{ { p }^{ 2 } } } \quad =\quad \sqrt { \frac { 1-0.02 }{ 0.02^{ 2 } } } =\quad 49.50\]
Answer:
Mean:
\[\mu \quad =\quad \frac { 1 }{ p } \quad =\quad \frac { 1 }{ .05 } \quad =\quad 20\]
Standard Deviation:
\[\sigma \quad =\quad \sqrt { \frac { 1-p }{ { p }^{ 2 } } } \quad =\quad \sqrt { \frac { 1-0.05 }{ 0.05^{ 2 } } } =\quad 19.49\]
Answer:
As the probability of success increases, the mean decreases because it is inversely proportional to the probability. Likewise, because the standard deviation formula has the higher power of p in its denominator, the standard deviation also decreases as the probability increases.
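The relationship between p and the two summaries can be tabulated with a short sketch. The helper geom_summary is hypothetical, and the probabilities 0.10 and 0.25 are extra illustrative values that do not come from the exercise.

```r
# Hypothetical helper: mean and standard deviation of the number of
# trials until the first success, for success probability p
geom_summary <- function(p) {
  c(p = p, mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

# One row per probability; the mean and sd columns both shrink as p grows
geom_table <- t(sapply(c(0.02, 0.05, 0.10, 0.25), geom_summary))
geom_table
```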
Answer:
choose(3, 2)
## [1] 3
\[\left( \begin{matrix} n \\ k \end{matrix} \right) { p }^{ k }{ \left( 1-p \right) }^{ n-k }\quad =\quad \left( \begin{matrix} 3 \\ 2 \end{matrix} \right) { 0.51 }^{ 2 }{ \left( 1-0.51 \right) }^{ 3-2 }\quad =\quad 3\quad *\quad { 0.51 }^{ 2 }\quad *\quad { \left( { 1-0.51 } \right) }^{ 1 }\quad =\quad 0.3823\]
Answer:
\[BBG = 0.51 * 0.51 * 0.49 = .1274 \\BGB = 0.51 * 0.49 * 0.51 = .1274\\GBB = 0.49 * 0.51 * 0.51 = .1274\\0.1274 + 0.1274 + 0.1274 = 0.3822\]
The answers for parts (a) and (b) match. The small discrepancy is due to rounding.
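The agreement between the two approaches can be confirmed without rounding by enumerating every birth order in R. This is a minimal sketch; the variable names are illustrative, and boys are coded as 1 and girls as 0.

```r
p_boy <- 0.51

# All 2^3 birth orders; 1 = boy, 0 = girl
orders <- expand.grid(first = 0:1, second = 0:1, third = 0:1)
boys <- rowSums(orders)

# Probability of each specific order, assuming independent births
order_prob <- p_boy^boys * (1 - p_boy)^(3 - boys)

# Sum over the three orders with exactly two boys (BBG, BGB, GBB)
prob_enumerated <- sum(order_prob[boys == 2])

# The binomial formula from part (a)
prob_binomial <- dbinom(2, size = 3, prob = p_boy)

c(enumerated = prob_enumerated, binomial = prob_binomial)
```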
Answer:
As the number of trials increases, the number of possible orderings grows very quickly. With n = 8 trials, the smallest non-trivial case (k other than 0 or n) is k = 1 (or k = 7), which already gives 8 orderings. Each additional success increases the number of orderings sharply:
choose(8, 1)
## [1] 8
choose(8, 2)
## [1] 28
choose(8, 3)
## [1] 56
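To make the contrast concrete, the sketch below lists every way to place 3 successes among 8 trials with combn() and then computes the probability in a single dbinom() call. The success probability 0.51 is carried over from parts (a) and (b) as an assumption, and the variable names are illustrative.

```r
# Each column of combn(8, 3) is one choice of which 3 of the 8 trials succeed
success_positions <- combn(8, 3)

# Number of orderings that would have to be written out by hand
n_orderings <- ncol(success_positions)
n_orderings

# The binomial formula covers all of those orderings in one call
prob_3_of_8 <- dbinom(3, size = 8, prob = 0.51)
prob_3_of_8
```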
Answer:
This is a description of a negative binomial distribution. This probability can be calculated as follows:
choose(10 - 1, 3 - 1)
## [1] 36
\[\left( \begin{matrix} n-1 \\ k-1 \end{matrix} \right) { p }^{ k }{ \left( 1-p \right) }^{ n-k }\quad =\quad \left( \begin{matrix} 10-1 \\ 3-1 \end{matrix} \right) { 0.15 }^{ 3 }{ \left( 1-0.15 \right) }^{ 10-3 }\quad =\quad 36\quad *\quad { 0.15 }^{ 3 }\quad *\quad { \left( { 1-0.15 } \right) }^{ 7 }\quad =\quad 0.039\]
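The negative binomial probability can be cross-checked with base R's dnbinom(). A minimal sketch with illustrative variable names; note that dnbinom() takes the number of failures before the k-th success, not the trial number.

```r
p_success <- 0.15  # success probability from the formula above
n <- 10            # trial on which the k-th success occurs
k <- 3             # number of successes

# dnbinom() is parameterized by the number of failures before the
# k-th success, which is n - k here
prob_builtin <- dnbinom(n - k, size = k, prob = p_success)

# The same value from choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)
prob_formula <- choose(n - 1, k - 1) * p_success^k * (1 - p_success)^(n - k)

c(builtin = prob_builtin, formula = prob_formula)
```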
Answer:
Given that each serve is independent of the previous serves, the probability that the next (10th) serve will be successful is 15%.
Answer:
Part (a) asks for the probability that the third success will happen on the tenth attempt. Part (b) is not concerned with the outcome of the previous nine serves, only the tenth attempt. Therefore, it is expected that there would be a different probability for each part.