Graded: 3.2 (see normalPlot), 3.4, 3.18 (use qqnormsim from lab 3), 3.22, 3.38, 3.42
data(package = 'openintro')
library(DATA606)
(a) Z > -1.13
?normalPlot
?pnorm
normalPlot(mean = 0, bounds = c(-1.13,100000), sd = 1)
1-pnorm(-1.13, mean = 0, sd = 1 )
## [1] 0.8707619
(b) Z < 0.18
normalPlot(mean = 0, bounds = c(-100000,0.18), sd = 1)
pnorm(0.18, mean = 0, sd = 1 )
## [1] 0.5714237
(c) Z > 8
normalPlot(mean = 0, bounds = c(8,1000000), sd = 1)
pnorm(8, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 6.220961e-16
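Subtracting from 1 loses precision far out in the tail: `pnorm(8)` is a double extremely close to 1, so `1 - pnorm(8)` mostly reflects rounding error. A minimal sketch comparing that with the `lower.tail = FALSE` argument of `pnorm`, which computes the upper-tail area directly:

```r
# Upper-tail probability P(Z > 8) computed two ways
naive  <- 1 - pnorm(8)                 # subject to floating-point cancellation
direct <- pnorm(8, lower.tail = FALSE) # upper tail computed directly
c(naive = naive, direct = direct)
```

The direct value is about 6.22e-16; the subtraction gives a slightly different number because it is limited by the spacing of doubles near 1.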
(d) |Z| < 0.5
normalPlot(mean = 0, bounds = c(-0.5, 0.5), sd = 1)
pnorm(0.5, mean = 0, sd = 1) - pnorm(-0.5, mean = 0, sd = 1)
## [1] 0.3829249
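Because the standard normal curve is symmetric about 0, P(|Z| < z) = 2Φ(z) − 1, so part (d) can also be computed with a single `pnorm` call. A minimal sketch checking the identity against the difference of CDF values:

```r
# P(|Z| < 0.5) via the symmetry identity P(|Z| < z) = 2 * pnorm(z) - 1
p_symmetric <- 2 * pnorm(0.5) - 1
# Same probability as the difference of two CDF values
p_difference <- pnorm(0.5) - pnorm(-0.5)
all.equal(p_symmetric, p_difference)  # TRUE
```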
• The finishing times of the Men, Ages 30 - 34 group have a mean of 4313 seconds with a standard deviation of 583 seconds.
• The finishing times of the Women, Ages 25 - 29 group have a mean of 5261 seconds with a standard deviation of 807 seconds.
• The distributions of finishing times for both groups are approximately Normal.
Remember: a better performance corresponds to a faster finish.
Men, Ages 30 - 34: N(μ = 4313, σ = 583)
Women, Ages 25 - 29: N(μ = 5261, σ = 807)
Leo
z_leo <- (4948 - 4313)/583
z_leo
## [1] 1.089194
Mary
z_mary <- (5513-5261)/807
z_mary
## [1] 0.3122677
Mary ranked better in her group. Her Z-score (0.31) is smaller than Leo's (1.09), so her time is fewer standard deviations above her group's mean. Since a lower (faster) time is better, Mary performed better relative to her group than Leo did relative to his.
Leo
percent_leo <- (1- pnorm(4948,mean = 4313, sd = 583))*100
percent_leo
## [1] 13.80342
Mary
percent_mary <- (1 - pnorm(5513,mean = 5261, sd = 807))*100
percent_mary
## [1] 37.74186
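The percentages above are tied to the Z-scores: the percent of a group that a finisher beat is the upper-tail area beyond that finisher's Z-score on the standard normal curve. A minimal sketch that recomputes both percentages from the Z-scores alone:

```r
# Z-scores: (time - group mean) / group SD
z <- c(leo = (4948 - 4313) / 583, mary = (5513 - 5261) / 807)
# Percent of the group with slower (larger) times = upper-tail area beyond Z
percent_slower <- 100 * pnorm(z, lower.tail = FALSE)
round(percent_slower, 2)
```

This should reproduce roughly 13.80% for Leo and 37.74% for Mary.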
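The Z-scores in part (b) would not change, since a Z-score can be computed for any distribution from its mean and standard deviation. However, we could not answer parts (c)-(e) the same way, because those probabilities and percentiles come from the normal model, which would no longer apply.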
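To see why a Z-score alone does not determine a percentile without a normal model, consider a right-skewed distribution. As an illustration (the exponential distribution is an assumption chosen for this sketch, not part of the exercise), an Exponential(rate = 1) variable has mean 1 and standard deviation 1, so the value 2 has Z = 1, yet its upper-tail area differs from the normal one:

```r
# Exponential(rate = 1): mean = 1, sd = 1, so x = 2 has Z = (2 - 1) / 1 = 1
upper_exp  <- pexp(2, rate = 1, lower.tail = FALSE)  # exact tail area: exp(-2)
upper_norm <- pnorm(1, lower.tail = FALSE)           # normal-model tail area for Z = 1
round(c(exponential = upper_exp, normal = upper_norm), 4)
```

The same Z-score corresponds to about 13.5% in the upper tail under the exponential model but about 15.9% under the normal model.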
mean_heights <- 61.52
sd_heights <- 4.58
total_women <- 25
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
one_sd <- 100*sum(heights > (mean_heights-1*sd_heights) & heights < (mean_heights+1*sd_heights))/total_women
two_sd <- 100*sum(heights > (mean_heights-2*sd_heights) & heights < (mean_heights+2*sd_heights))/total_women
three_sd <- 100*sum(heights > (mean_heights-3*sd_heights) & heights < (mean_heights+3*sd_heights))/total_women
rule <- c(one_sd,two_sd,three_sd)
rule
## [1] 68 96 100
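Yes, the heights approximately follow the 68-95-99.7 rule: 68%, 96%, and 100% of the heights fall within 1, 2, and 3 standard deviations of the mean, respectively.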
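The three calculations above can be written as one vectorized expression and compared with the coverage a normal model predicts, P(|Z| < k) = 2Φ(k) − 1 for k = 1, 2, 3. A minimal sketch reusing the same heights, mean, and standard deviation:

```r
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
mean_heights <- 61.52
sd_heights <- 4.58
k <- 1:3
# Observed percent of heights strictly within k SDs of the mean
observed <- sapply(k, function(j) 100 * mean(abs(heights - mean_heights) < j * sd_heights))
# Percent expected under a normal model
expected <- 100 * (2 * pnorm(k) - 1)
round(rbind(observed, expected), 1)
```

The observed row should match the 68, 96, 100 computed above, against roughly 68.3, 95.4, and 99.7 for a normal distribution.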
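hist(heights, probability = TRUE)
# Overlay the normal density using the sample mean and standard deviation
x <- seq(min(heights) - 5, max(heights) + 5, length.out = 200)
lines(x = x, y = dnorm(x = x, mean = mean_heights, sd = sd_heights), col = 'red')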
qqnorm(heights)
qqline(heights)
The distribution appears to be nearly normal. In the plots from the homework, the histogram is unimodal and roughly symmetric and bell-shaped, and the points in the normal probability plot lie close to a straight line.
(0.98)^9 * 0.02
## [1] 0.01667496
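R's geometric distribution functions count the number of failures before the first success, so "the first defect is the 10th transistor" corresponds to 9 failures. A minimal sketch checking the formula above against `dgeom`:

```r
p_defect <- 0.02
by_formula <- (1 - p_defect)^9 * p_defect  # 9 good transistors, then a defect
by_dgeom   <- dgeom(9, prob = p_defect)    # dgeom(x) = P(x failures before first success)
all.equal(by_formula, by_dgeom)            # TRUE
```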
(0.98)^100
## [1] 0.1326196
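No defects in a batch of 100 is the zero-successes case of a binomial distribution with n = 100 and p = 0.02, so `dbinom` should give the same value. A minimal sketch:

```r
by_formula <- (1 - 0.02)^100                     # all 100 transistors are good
by_dbinom  <- dbinom(0, size = 100, prob = 0.02) # P(0 defects out of 100)
all.equal(by_formula, by_dbinom)                 # TRUE
```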
1/0.02
## [1] 50
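The number of transistors produced up to and including the first defect follows a geometric distribution, so the mean is 1/p and the standard deviation is sqrt((1 - p)/p^2).
sqrt((1 - 0.02)/0.02^2)
## [1] 49.49747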
1/.05
## [1] 20
sqrt((1 - 0.05)/0.05^2)
## [1] 19.49359
Increasing the probability of a defect shortens the wait for the first defect: the mean drops from 50 to 20 transistors, and the standard deviation also decreases.
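As a check on the geometric formulas μ = 1/p and σ = √((1 − p)/p²), the mean and standard deviation can be approximated by summing over a long but finite range of trial counts. A minimal sketch; the helper name `geom_moments` and the cutoff of 5000 trials are assumptions for illustration (the probability mass beyond the cutoff is negligible for these values of p):

```r
# Hypothetical helper: numerical mean and SD of the trial number of the first defect
geom_moments <- function(p, max_trials = 5000) {
  n <- 1:max_trials                # trial on which the first defect occurs
  probs <- dgeom(n - 1, prob = p)  # n - 1 good transistors before the defect
  mu <- sum(n * probs)
  sigma <- sqrt(sum((n - mu)^2 * probs))
  c(mean = mu, sd = sigma)
}
round(rbind(p_0.02 = geom_moments(0.02), p_0.05 = geom_moments(0.05)), 2)
```

Both rows should agree with 1/p and sqrt((1 - p)/p^2): about 50 and 49.50 for p = 0.02, and 20 and 19.49 for p = 0.05, showing that both the mean and the spread fall as p rises.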
boy <- 0.51
girl <- 0.49
prob_2_boys <- choose(3,2)*(boy^2)*(girl)
prob_2_boys
## [1] 0.382347
prob_2_boys <- boy*boy*girl + boy*girl*boy + girl*boy*boy
prob_2_boys
## [1] 0.382347
The approach in part (b) is tedious for 8 children because we would have to list every ordering of 3 boys and 5 girls, and there are 8 choose 3 = 56 of them. Using combinatorics, we can simplify this process as seen in part (a).
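To make the comparison concrete, a minimal sketch that enumerates every ordering of 3 boys among 8 children with `combn` (each column lists the birth positions of the boys), adds up the probabilities of those orderings, and compares the total with the binomial formula via `dbinom`:

```r
boy <- 0.51
girl <- 0.49
orderings <- combn(8, 3)  # each column: birth positions of the 3 boys
ncol(orderings)           # 56 orderings to list by hand
# Probability of one specific ordering: boy at the listed positions, girl elsewhere
prob_ordering <- function(pos) prod(ifelse(1:8 %in% pos, boy, girl))
by_enumeration <- sum(apply(orderings, 2, prob_ordering))
by_formula <- dbinom(3, size = 8, prob = boy)
all.equal(by_enumeration, by_formula)  # TRUE
```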
serve <- 0.15
no_serve <- 0.85
choose(9,2)*(serve^2)*(no_serve^7)*serve
## [1] 0.03895012
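The number of missed serves before the third success follows a negative binomial distribution, and R's `dnbinom(x, size, prob)` gives the probability of `x` failures before the `size`-th success. Making the third successful serve on the 10th attempt means 7 misses before it. A minimal sketch checking this against the formula above:

```r
serve <- 0.15
by_formula <- choose(9, 2) * serve^2 * (1 - serve)^7 * serve
by_dnbinom <- dnbinom(7, size = 3, prob = serve)  # 7 misses before the 3rd success
all.equal(by_formula, by_dnbinom)                 # TRUE
```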
0.15. Each serve is independent with the same success probability, so the probability that any given serve, including the 10th, is successful is 0.15.
In (a), we compute the probability of a whole sequence of independent events: exactly 2 successful serves in the first 9 attempts, followed by a successful 10th serve (probability 0.15). Because the serves are independent, we multiply the probability of the first part by the probability of the second. In (b), the outcomes of the first 9 serves are already given, so we only need the probability that the 10th serve is successful. Since each serve is independent of the others, the previous 9 serves do not affect the 10th, and the answer is simply 0.15. Part (a) describes a series of events, while part (b) describes a single event.
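The difference between (a) and (b) is the difference between a joint probability and a conditional one. A minimal sketch: dividing the joint probability from (a) by the probability of the given condition (2 successes in the first 9 serves) leaves exactly the single-serve probability from (b):

```r
serve <- 0.15
p_two_of_nine <- dbinom(2, size = 9, prob = serve)  # P(2 successes in first 9 serves)
p_joint <- p_two_of_nine * serve                    # part (a): joint probability
p_conditional <- p_joint / p_two_of_nine            # part (b): P(10th success | 2 of 9)
c(joint = p_joint, conditional = p_conditional)
```

The joint value matches the 0.03895 from part (a), and the conditional value is 0.15.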