library('DATA606')
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
normalPlot(bounds=c(-1.13,Inf))
round(1-pnorm(-1.13,0,1),2)
## [1] 0.87
normalPlot(bounds=c(-Inf,.18))
round(pnorm(.18,0,1),2)
## [1] 0.57
normalPlot(bounds=c(8,Inf))
round(1-pnorm(8,0,1),2)
## [1] 0
normalPlot(bounds=c(-.5,.5))
round(pnorm(.5,0,1) - pnorm(-.5,0,1),2)
## [1] 0.38
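The four areas can also be computed together. The following is a minimal sketch that uses only base R's pnorm(), and the variable names are illustrative. Passing lower.tail = FALSE returns the upper tail directly. For extreme cut-offs such as Z > 8, this avoids the precision loss of computing 1 - pnorm(...).

```r
# Areas under the standard normal curve N(0, 1) for the four regions above
areas <- c(
  z_above_neg1.13 = pnorm(-1.13, lower.tail = FALSE),  # P(Z > -1.13)
  z_below_0.18    = pnorm(0.18),                       # P(Z < 0.18)
  z_above_8       = pnorm(8, lower.tail = FALSE),      # P(Z > 8)
  abs_z_below_0.5 = pnorm(0.5) - pnorm(-0.5)           # P(|Z| < 0.5)
)
round(areas, 2)  # rounds to 0.87, 0.57, 0.00 and 0.38

# The area beyond Z = 8 is tiny but not exactly zero
pnorm(8, lower.tail = FALSE)
```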
\[\text{Men, Ages 30-34}: \quad N(\mu = 4313, \quad \sigma = 583)\]
\[\text{Women, Ages 25-29}: \quad N(\mu = 5261, \quad \sigma = 807)\]
### b)
Zleo <- (4948-4313)/583
Zleo
## [1] 1.089194
Zmary <- (5513-5261)/807
Zmary
## [1] 0.3122677
These Z-scores show that Leo's time was 1.09 standard deviations above the mean of his group, and Mary's time was 0.31 standard deviations above the mean of hers.
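The standardization below is used for both runners. Here x is a runner's time, and μ and σ are the mean and standard deviation of that runner's group:

\[Z = \frac{x - \mu}{\sigma}, \qquad Z_{\text{Leo}} = \frac{4948 - 4313}{583} \approx 1.09, \qquad Z_{\text{Mary}} = \frac{5513 - 5261}{807} \approx 0.31\]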
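1-pnorm(4948, 4313, 583)
## [1] 0.1380342
round(1-pnorm(5513, 5261, 807), 3)
## [1] 0.377
A lower finishing time is better, so the runner with the smaller Z-score ranked better, and that was Mary. Leo finished ahead of only about 14% of his group. Mary finished ahead of about 38% of hers.
pnorm(4948, 4313, 583)
## [1] 0.8619658
About 86% of Leo's group finished faster than he did, so Leo was faster than 1 - 0.862 ≈ 14% of his group.
round(pnorm(5513, 5261, 807), 3)
## [1] 0.623
About 62% of Mary's group finished faster than she did, so Mary was faster than about 38% of her group.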
The answer to b) would not change, because Z-scores only need the mean and standard deviation, not a normal distribution. Parts c) to e) could not be calculated this way, because their percentages come from the normal model through pnorm().
url <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%203%20Exercise%20Data/fheights.txt"
fheights <- read.table(url, header = TRUE, stringsAsFactors = FALSE)
# Cut-offs at 1, 2 and 3 standard deviations above the mean (mean 61.52, SD 4.58)
onesd <- 61.52 + (1*4.58)
twosd <- 61.52 + (2*4.58)
threesd <- 61.52 + (3*4.58)
# Proportion of heights below each cut-off
sum(fheights$heighs < onesd)/length(fheights$heighs)
## [1] 0.8333333
sum(fheights$heighs < twosd)/length(fheights$heighs)
## [1] 0.9583333
sum(fheights$heighs < threesd)/length(fheights$heighs)
## [1] 1
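These are the proportions of heights below the mean plus 1, 2 and 3 standard deviations. The normal-model values for the same cut-offs are about 84%, 97.7% and 99.9%. The familiar 68%, 95% and 99.7% instead refer to the two-sided intervals within 1, 2 and 3 standard deviations of the mean. The observed 83%, 96% and 100% are close to the normal-model values, which is consistent with the heights approximately following the 68-95-99.7 rule.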
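The 68-95-99.7 rule is stated for two-sided intervals, mean ± k standard deviations. The following is a minimal sketch of a two-sided check. It uses a hypothetical helper, empirical_rule_check(), and illustrates it on simulated heights rather than the course data. The final comment shows how it would be applied to fheights$heighs.

```r
# Hypothetical helper: share of observations within k SDs of the mean
# (two-sided, as the 68-95-99.7 rule is stated), next to the normal-model value.
empirical_rule_check <- function(x, mu, sigma, k = 1:3) {
  observed <- sapply(k, function(j) mean(abs(x - mu) < j * sigma))
  normal   <- pnorm(k) - pnorm(-k)  # about 0.683, 0.954, 0.997
  data.frame(k = k, observed = round(observed, 3), normal = round(normal, 3))
}

# Illustration on simulated heights (NOT the fheights data)
set.seed(606)
sim_heights <- rnorm(1000, mean = 61.52, sd = 4.58)
empirical_rule_check(sim_heights, mu = 61.52, sigma = 4.58)

# With the homework data loaded, the same call would be:
# empirical_rule_check(fheights$heighs, mu = 61.52, sigma = 4.58)
```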
qqnorm(fheights$heighs)
hist(fheights$heighs,probability = TRUE)
lines(50:75,dnorm(50:75,mean(fheights$heighs),sd(fheights$heighs)), col="blue")
qqnormsim(fheights$heighs)
The data appear to follow a normal distribution. The histogram is roughly symmetric and matches the overlaid normal curve. The normal probability plot also looks like the plots generated from simulated normal data.
((1-.02)^9) * (.02)
## [1] 0.01667496
(1-.02)^100
## [1] 0.1326196
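Base R's built-in distribution functions give the same two answers. Note that dgeom(x, prob) counts the failures before the first success. "The 10th transistor is the first defective" therefore corresponds to x = 9. The following is a minimal cross-check, and the variable names are illustrative.

```r
p_defect <- 0.02

# P(first defect is the 10th transistor): 9 good ones, then a defect
first_defect_10th <- dgeom(9, prob = p_defect)

# P(no defects among 100 transistors), two equivalent ways
no_defect_binom <- dbinom(0, size = 100, prob = p_defect)
no_defect_geom  <- pgeom(99, prob = p_defect, lower.tail = FALSE)  # first defect after the 100th

c(first_defect_10th, no_defect_binom, no_defect_geom)
```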
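# Geometric distribution: mean = 1/p, SD = sqrt((1-p)/p^2)
Ex <- 1/.02
Ex
## [1] 50
sd_orig <- sqrt((1-0.02)/0.02^2)  # named to avoid shadowing the sd() function
sd_orig
## [1] 49.49747
Ex_new <- 1/.05
Ex_new
## [1] 20
sd_new <- sqrt((1-0.05)/0.05^2)
sd_new
## [1] 19.49359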
Increasing the probability of a defect decreases both the mean and the standard deviation of the wait until the first defect.
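A quick simulation can confirm both formulas and the effect of a larger defect probability. The following is a minimal sketch. The helper simulate_wait(), the seed and the number of simulations are arbitrary choices. rgeom() returns the number of failures before the first success, so 1 is added to count the defective transistor itself.

```r
# Simulate waiting times (transistors up to and including the first defect)
# and compare with the formulas mean = 1/p and SD = sqrt((1 - p) / p^2).
simulate_wait <- function(p, n_sims = 1e5) {
  waits <- rgeom(n_sims, prob = p) + 1
  c(sim_mean = mean(waits), formula_mean = 1 / p,
    sim_sd = sd(waits), formula_sd = sqrt((1 - p) / p^2))
}

set.seed(606)
round(rbind(p_0.02 = simulate_wait(0.02), p_0.05 = simulate_wait(0.05)), 2)
```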
dbinom(x=2,size=3,prob=.51)
## [1] 0.382347
B <- .51 #probability of boy
G <- 1-B #probability of girl
# The three birth orders with exactly two boys
P_kid1 <- B*B*G
P_kid2 <- B*G*B
P_kid3 <- G*B*B
# Total probability
Total_kid <- P_kid1+P_kid2+P_kid3
Total_kid
## [1] 0.382347
The answers for a and b match.
The approach from part b would be more tedious because the number of possible birth orders grows quickly with the number of children. There are choose(n, k) orders with k boys among n children, and each one needs its own term in the sum.
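To see why the part b approach scales badly, the following minimal sketch lists every birth order explicitly with base R's expand.grid(), sums the matching ones, and compares the result with dbinom(). The helper name prob_by_enumeration is hypothetical. With n children there are 2^n orders to list.

```r
# Hypothetical helper: P(exactly k boys among n children) by listing every
# birth order (1 = boy, 0 = girl), as in part b, instead of using dbinom().
prob_by_enumeration <- function(n, k, p_boy) {
  orders <- expand.grid(rep(list(c(1, 0)), n))  # all 2^n birth orders
  boys <- rowSums(orders)
  per_order <- p_boy^boys * (1 - p_boy)^(n - boys)
  sum(per_order[boys == k])
}

prob_by_enumeration(3, 2, 0.51)  # three matching orders, as in part b
dbinom(2, size = 3, prob = 0.51)

# With 8 children and 3 boys there are already choose(8, 3) = 56 matching orders
choose(8, 3)
prob_by_enumeration(8, 3, 0.51)
dbinom(3, size = 8, prob = 0.51)
```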
p <- 0.15 #probability of a successful serve
n <- 10 #number of attempts
k <- 3 #number of successes
# Negative binomial: P(k-th success on the n-th attempt) = choose(n-1, k-1) * p^k * (1-p)^(n-k)
factorial(n - 1) / (factorial(k-1) * (factorial(n - k))) * p^k * (1-p)^(n-k)
## [1] 0.03895012
The probability that the 3rd successful serve comes on the 10th attempt is about 3.9%.
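Base R's dnbinom() gives the same negative binomial probability. It is parameterized by the number of failures before the size-th success, so a 3rd success on the 10th attempt means 10 - 3 = 7 failures. The following is a minimal cross-check.

```r
p <- 0.15  # probability of a successful serve
n <- 10    # attempt on which the k-th success occurs
k <- 3     # number of successes

by_formula <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)
by_dnbinom <- dnbinom(n - k, size = k, prob = p)  # n - k failures before the k-th success
round(c(by_formula, by_dnbinom), 4)
```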
Since the serves are independent, the probability that the 10th serve is successful is 15%, regardless of the earlier results.
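Part b looks at a single serve, and because each serve is independent of the others, the earlier results do not change its probability. Part a looks at a whole sequence of events: exactly 2 successes in the first 9 attempts followed by a success on the 10th, which is much less likely.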
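The discrepancy can be made concrete. The part a probability factors into "exactly 2 successes in the first 9 serves" times "success on the 10th serve", and only the second factor is the part b answer. The following is a minimal sketch, and the variable names are illustrative.

```r
p <- 0.15

two_in_nine <- dbinom(2, size = 9, prob = p)  # exactly 2 successes in 9 serves
tenth_serve <- p                              # part b: the 10th serve on its own
part_a      <- two_in_nine * tenth_serve      # part a: the whole sequence

round(c(two_in_nine = two_in_nine, part_b = tenth_serve, part_a = part_a), 4)
```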