library('DATA606')
library('data.table')
What percent of a standard normal distribution N(μ = 0, σ = 1) is found in each region? Be sure to draw a graph.
(a) Z > -1.13: The area under the curve where the Z-score is greater than -1.13 is 0.8708. By the complement rule, P(Z > -1.13) = 1 - P(Z < -1.13) = 1 - 0.1292.
1-0.1292
## [1] 0.8708
normalPlot(mean = 0, sd = 1, bounds = c(-1.13, 5), tails = FALSE)
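As a check on the Z-table lookup, base R's `pnorm()` returns the lower-tail area of a normal distribution, and `lower.tail = FALSE` gives the upper tail directly. A minimal sketch; the variable names are illustrative:

```r
# Upper-tail area P(Z > -1.13) for the standard normal N(0, 1)
p_a <- pnorm(-1.13, mean = 0, sd = 1, lower.tail = FALSE)

# Same area via the complement rule: 1 - P(Z < -1.13)
p_a_complement <- 1 - pnorm(-1.13)

round(c(p_a, p_a_complement), 4)
```

Both forms agree with the table-based value of 0.8708 to four decimal places.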
(b) Z < 0.18: The area under the curve where the Z-score is less than 0.18 is 0.5714.
normalPlot(mean = 0, sd = 1, bounds = c(-5, 0.18), tails = FALSE)
normalPlot(mean = 0, sd = 1, bounds = c(0, 0), tails = FALSE)
The region |Z| < 0.5 can be rewritten as -0.5 < Z < 0.5, so P(|Z| < 0.5) = P(Z < 0.5) - P(Z < -0.5). The area under the curve is 0.383.
0.6915-0.3085
## [1] 0.383
normalPlot(mean = 0, sd = 1, bounds = c(-.5, .5), tails = FALSE)
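The same difference of lower-tail areas can be computed with `pnorm()` instead of reading both values from a table; a short sketch with an illustrative variable name:

```r
# P(|Z| < 0.5) = P(Z < 0.5) - P(Z < -0.5) for the standard normal
p_d <- pnorm(0.5) - pnorm(-0.5)
round(p_d, 3)
```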
## 3.4 Triathlon times, Part I

In triathlons, it is common for racers to be placed into age and gender groups. Friends Leo and Mary both completed the Hermosa Beach Triathlon, where Leo competed in the Men, Ages 30 - 34 group while Mary competed in the Women, Ages 25 - 29 group. Leo completed the race in 1:22:28 (4948 seconds), while Mary completed the race in 1:31:53 (5513 seconds). Obviously Leo finished faster, but they are curious about how they did within their respective groups. Can you help them? Here is some information on the performance of their groups:
• The finishing times of the Men, Ages 30 - 34 group have a mean of 4313 seconds with a standard deviation of 583 seconds.
• The finishing times of the Women, Ages 25 - 29 group have a mean of 5261 seconds with a standard deviation of 807 seconds.
• The distributions of finishing times for both groups are approximately Normal.
Remember: a better performance corresponds to a faster finish.

(a) Write down the short-hand for these two normal distributions.

### Answer
For men: N(μ = 4313, σ = 583). For women: N(μ = 5261, σ = 807).

Leo’s z-score is 1.09.
(4948-4313) / 583
## [1] 1.089194
Mary’s z-score is 0.312
(5513-5261) / 807
## [1] 0.3122677
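A z-score gives the number of standard deviations an observation lies above (positive) or below (negative) its group's mean. Both z-scores are positive, so Leo and Mary each finished slower than the average time in their group.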
Did Leo or Mary rank better in their respective groups? Explain your reasoning.

### Answer
Mary ranked better within her group. Because a faster finish is a better performance, a lower z-score is better here. Leo's time was about 1.09 standard deviations above (slower than) his group's mean, while Mary's time was only about 0.31 standard deviations above hers.
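What percent of the triathletes did Leo finish faster than in his group?

### Answer
Leo finished faster than about 13.8% of the triathletes in his group. His z-score of 1.09 has a lower-tail area of about 0.862, which is the share of his group that finished faster than he did, so only 1 - 0.862 = 0.138 of the group was slower.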
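What percent of the triathletes did Mary finish faster than in her group?

### Answer
Mary finished faster than about 37.7% of the triathletes in her group. Her z-score of 0.31 has a lower-tail area of about 0.623, so 1 - 0.623 = 0.377 of her group was slower.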
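The percentages above can be reproduced with `pnorm()`. Because a smaller time is better, the share of the group a racer beat is the upper-tail area above their z-score. A minimal sketch; `z_score` is a hypothetical helper defined here, not part of DATA606:

```r
# Hypothetical helper: standardize an observation x against N(mu, sigma)
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_leo  <- z_score(4948, mu = 4313, sigma = 583)
z_mary <- z_score(5513, mu = 5261, sigma = 807)

# A faster (smaller) time is better, so the share of the group each racer
# beat is the proportion with a LARGER time: the upper-tail area.
leo_beat  <- pnorm(z_leo,  lower.tail = FALSE)
mary_beat <- pnorm(z_mary, lower.tail = FALSE)

round(c(leo = leo_beat, mary = mary_beat), 3)
```

Using the exact z-scores rather than table values rounded to two decimals explains small differences in the third decimal place.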
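# Observed data (25 values)
data <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61,
          62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)

# Proportion of observations within 1 SD of the mean
# (data.table::between() includes both endpoints by default)
one_sd <- data[between(data, mean(data) - sd(data), mean(data) + sd(data))]
length(one_sd) / length(data)
## [1] 0.68
# Proportion within 2 SDs of the mean
two_sd <- data[between(data, mean(data) - 2 * sd(data), mean(data) + 2 * sd(data))]
length(two_sd) / length(data)
## [1] 0.96
# Proportion within 3 SDs of the mean
three_sd <- data[between(data, mean(data) - 3 * sd(data), mean(data) + 3 * sd(data))]
length(three_sd) / length(data)
## [1] 1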
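The three proportions can also be computed in one pass with base R alone, comparing each absolute deviation from the mean to k standard deviations. A sketch; `within_k_sd` and `obs` are illustrative names, and the data vector is repeated so the block runs on its own:

```r
# Fraction of observations within k standard deviations of the mean
# (inclusive, matching data.table::between()'s default)
within_k_sd <- function(x, k) mean(abs(x - mean(x)) <= k * sd(x))

obs <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61,
         62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)

props <- sapply(1:3, function(k) within_k_sd(obs, k))

# Side by side with the 68-95-99.7 rule
rbind(observed = props, rule = c(0.68, 0.95, 0.997))
```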
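Yes. The proportions within 1, 2, and 3 standard deviations of the mean (0.68, 0.96, and 1) closely follow the 68-95-99.7 rule, and the graphs show a unimodal distribution with the highest bar in the middle.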
qqnorm(data)
qqline(data)
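The first defect on the $k$-th transistor follows a geometric distribution, $P(X = k) = (1 - p)^{k-1}\,p$. Here $p = 0.02$, $1 - p = 0.98$, and $k = 10$, so $k - 1 = 9$ and $P(X = 10) = 0.98^{9} \times 0.02$.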
p <- .02
k<- 10
One_minus_p <- 1-p
prob_tenth_transistor_defect <- (One_minus_p^(k-1))*p
prob_tenth_transistor_defect
## [1] 0.01667496
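Base R also has the geometric density built in. Note that `dgeom()` counts the failures before the first success, so "first defect on the 10th transistor" corresponds to 9 failures. A short sketch with illustrative names:

```r
p <- 0.02   # defect rate
k <- 10     # first defect on the k-th transistor

# dgeom(x, prob) = P(x failures before the first success) = (1 - p)^x * p
p_first_defect_10 <- dgeom(k - 1, prob = p)
p_first_defect_10
```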
prob_of_defect <- .02
prob_of_no_defect <- 1-prob_of_defect
n <-100
prob_of_no_defect^n
## [1] 0.1326196
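"No defects in a batch of 100" can equivalently be framed as a binomial count with zero successes, which gives the same number as raising 0.98 to the 100th power. A one-line check:

```r
# P(0 defective transistors out of 100) with a 2% defect rate
p_none <- dbinom(0, size = 100, prob = 0.02)
p_none
```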
u<- 1/p
u
## [1] 50
What is the standard deviation?

### Answer
The standard deviation is about 49.5 (49.49747).
# Geometric variance (1 - p) / p^2; named to avoid masking var() and sd()
var_wait <- (1 - p) / p^2
sd_wait <- sqrt(var_wait)
sd_wait
## [1] 49.49747
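The formulas μ = 1/p and σ = sqrt((1 - p)/p²) can be sanity-checked by simulation. This is a Monte Carlo sketch, not an exact computation; the seed and sample size are arbitrary choices. Because `rgeom()` returns failures before the first success, 1 is added to get the number of transistors up to and including the first defect:

```r
# Simulated wait times (transistors until the first defect) for p = 0.02
set.seed(606)
waits <- rgeom(1e5, prob = 0.02) + 1

# Should land close to the theoretical mean 50 and SD 49.49747
c(mean = mean(waits), sd = sd(waits))
```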
p2 <- .05
u2 <- 1/p2
u2
## [1] 20
var2 <- (1-p2) / (p2^2)
sd2 <- sqrt(var2)
sd2
## [1] 19.49359
Based on your answers to parts (c) and (d), how does increasing the probability of an event affect the mean and standard deviation of the wait time until success?

### Answer
Increasing the probability of success lowers both the mean and the standard deviation of the wait time until success. Raising p from 0.02 to 0.05 drops the mean from 50 to 20 and the standard deviation from about 49.5 to about 19.5.
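Tabulating the geometric mean and standard deviation over a few more success probabilities shows the same trend beyond the two machines in parts (c) and (d). A sketch; `geom_wait_stats` is a hypothetical helper and the extra p values are illustrative:

```r
# Hypothetical helper: mean and SD of the geometric wait (trials until
# the first success) for success probability p
geom_wait_stats <- function(p) c(p = p, mean = 1 / p, sd = sqrt((1 - p) / p^2))

wait_table <- t(sapply(c(0.02, 0.05, 0.10, 0.25), geom_wait_stats))
wait_table
```

Both columns shrink as p grows, because a more likely success tends to arrive sooner and with less variability.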
With n = 3 children, k = 2 boys, and p = 0.51:
dbinom(2,size=3,prob = .51)
## [1] 0.382347
The three orderings with exactly two boys are GBB, BGB, and BBG, so the probability is P(G)×P(B)×P(B) + P(B)×P(G)×P(B) + P(B)×P(B)×P(G):
prob_g <- .49
prob_b <- .51
(prob_g*prob_b*prob_b) + (prob_b*prob_g*prob_b) +(prob_b*prob_b*prob_g)
## [1] 0.382347
If we wanted to calculate the probability that a couple who plans to have 8 kids will have 3 boys, briefly describe why the approach from part (b) would be more tedious than the approach from part (a).

### Answer
Writing out every ordering by hand would be tedious and error-prone: the three boys could be any three of the 8 children, so there are choose(8, 3) = 56 orderings to list and add. The binomial formula in part (a) counts these orderings directly.
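The part (b) approach can still be automated, which makes the growth in bookkeeping concrete. A base-R sketch with illustrative variable names; `combn()` lists every choice of positions for the three boys:

```r
p_boy  <- 0.51
n_kids <- 8

# Each column holds the positions (1..8) of the 3 boys in one ordering
boy_positions <- combn(n_kids, 3)

# Probability of each specific ordering: 0.51 for a boy, 0.49 for a girl
ordering_probs <- apply(boy_positions, 2, function(pos) {
  is_boy <- seq_len(n_kids) %in% pos
  prod(ifelse(is_boy, p_boy, 1 - p_boy))
})

length(ordering_probs)                  # orderings to write out by hand
sum(ordering_probs)                     # addition rule over disjoint orderings
dbinom(3, size = n_kids, prob = p_boy)  # binomial formula, same value
```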
dbinom(2,size=9,prob = .15) * .15
## [1] 0.03895012
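"Third success on the 10th serve" is a negative binomial event, and base R's `dnbinom()` computes it directly. It counts failures before the `size`-th success, so 10 serves with 3 successes means 7 failures. A short sketch with illustrative names:

```r
p_serve <- 0.15

# Negative binomial: 7 failures before the 3rd success
p_neg_binom <- dnbinom(7, size = 3, prob = p_serve)

# Same event built as "2 successes in the first 9 serves, then a success"
p_two_step <- dbinom(2, size = 9, prob = p_serve) * p_serve

c(p_neg_binom, p_two_step)
```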
Suppose she has made two successful serves in nine attempts. What is the probability that her 10th serve will be successful?

### Answer
Because the serves are independent, her earlier results do not change the chance on the next serve: her 10th serve is successful with probability 0.15 (15%).
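Even though parts (a) and (b) discuss the same scenario, the probabilities you calculated should be different. Can you explain the reason for this discrepancy?

### Answer
In part (a), none of the serves had happened yet, so we needed the probability of the whole sequence: exactly two successes in some order among the first nine serves, followed by a success on the 10th. In part (b), the first nine serves are already known to include two successes, so by independence only the 10th serve is uncertain, and its probability is simply 0.15.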