90% confidence interval: (65, 77)
Distribution: approximately normal, standard deviation unknown
25 observations
n <- 25
ci1 <- 65
ci2 <- 77
Sample mean:
sample_mean <- (ci1 + ci2)/2
sample_mean
## [1] 71
Margin of error:
me <- (ci2 - ci1)/2
me
## [1] 6
Sample standard deviation:
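# A 90% CI leaves 5% in each tail; df = n - 1 = 24
t <- qt(0.95, n - 1)
# me = t * sd / sqrt(n), so sd = me * sqrt(n) / t
sd <- me * sqrt(n) / t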
sd
## [1] 17.53481
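As a sanity check, the interval can be rebuilt from the recovered values. This is a minimal, self-contained sketch; the names `n_obs`, `x_bar`, `margin`, `t_crit`, `s_hat`, and `rebuilt` are illustrative and do not appear in the original solution.

```r
# Rebuild the 90% interval from the recovered mean and standard deviation.
n_obs  <- 25
x_bar  <- (65 + 77) / 2                  # midpoint of the interval
margin <- (77 - 65) / 2                  # half-width of the interval
t_crit <- qt(0.95, df = n_obs - 1)       # 90% CI leaves 5% in each tail
s_hat  <- margin * sqrt(n_obs) / t_crit  # solve margin = t * s / sqrt(n) for s

# Plugging s_hat back in should return the original endpoints.
rebuilt <- x_bar + c(-1, 1) * t_crit * s_hat / sqrt(n_obs)
rebuilt
```

If the algebra is right, `rebuilt` should match the stated endpoints 65 and 77 up to floating-point error.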
Standard deviation: 250
Margin of error: no more than 25
z1 <- 1.645 #for 90% ci
me <- 25
sd <- 250
n1 <- (z1*sd/me)^2
n1
## [1] 270.6025
The sample size must be a whole number, so round up: at least 271 observations are needed.
99% confidence interval:
The sample size needs to be larger to keep the same margin of error at a higher confidence level. In terms of the formula, the \(z\)-score is larger for a 99% confidence interval, which makes the required sample size larger.
Minimum sample size for 99% confidence interval:
z2 <- 2.575
n2 <- (z2*sd/me)^2
n2
## [1] 663.0625
Rounding up, at least 664 observations are needed.
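The two sample-size calculations follow the same pattern, so they can be wrapped in one function. The sketch below assumes a known standard deviation and a z-based interval; `min_sample_size` and its arguments are hypothetical names introduced here, and it uses `qnorm()` for the critical value instead of the table values 1.645 and 2.575.

```r
# Hypothetical helper: smallest n for which a z-based interval
# has a margin of error no larger than `me`.
min_sample_size <- function(conf_level, sigma, me) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  ceiling((z * sigma / me)^2)           # round up to a whole observation
}

n_90 <- min_sample_size(0.90, sigma = 250, me = 25)
n_99 <- min_sample_size(0.99, sigma = 250, me = 25)
c(n_90, n_99)
```

Because `qnorm()` returns slightly different critical values than the rounded table entries, the unrounded results differ in the decimals, but they round up to the same whole numbers (271 and 664).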
Sample size: 200
Is there a clear difference in the average reading and writing scores?
There appears to be a small difference between the average reading and writing scores, but it is not pronounced.
Are the reading and writing scores of each student independent?
No. Each student took both exams, so a student's reading and writing scores are paired and likely related; they should not be treated as independent.
Is there an evident difference in the average scores of students in the reading and writing exams?
\(H_0\): there is no difference between the average reading and writing scores. \(\mu_r=\mu_w\)
\(H_A\): there is a difference between the average scores. \(\mu_r\neq\mu_w\)
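# Paired t-test on the differences: mean -0.545, SD 8.887, n = 200.
# H_A is two-sided, so double the lower-tail probability.
p <- 2 * pt((-0.545-0)/(8.887/sqrt(200)), 200-1)
round(p, 3)
## [1] 0.387
Since 0.387 > 0.05, we fail to reject \(H_0\): there is no convincing evidence of a difference between the average scores on the two exams.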
What type of error might we have made? Explain what the error means in the context of the application.
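A Type 1 error is rejecting the null hypothesis when \(H_0\) is actually true.
A Type 2 error is failing to reject the null hypothesis when the alternative is actually true.
Since we failed to reject \(H_0\), the only error we could have made is a Type 2 error. In this context, that would mean there really is a difference between the average reading and writing scores, but the test failed to detect it.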
Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.
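Yes. Because we failed to reject \(H_0\), a difference of 0 is a plausible value, so a confidence interval at the matching confidence level (95% for \(\alpha=0.05\)) should include 0.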
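The claim about the interval can be checked directly from the summary statistics used in the test (mean difference -0.545, standard deviation of the differences 8.887, n = 200). This is a minimal sketch assuming a 95% confidence level to match \(\alpha=0.05\); the variable names are illustrative.

```r
# 95% confidence interval for the mean paired difference.
d_bar <- -0.545            # mean of the paired differences
s_d   <- 8.887             # SD of the paired differences
n_d   <- 200               # number of students
se_d  <- s_d / sqrt(n_d)   # standard error of the mean difference

ci_diff <- d_bar + c(-1, 1) * qt(0.975, df = n_d - 1) * se_d
ci_diff

# TRUE when the interval straddles 0.
contains_zero <- ci_diff[1] < 0 && ci_diff[2] > 0
contains_zero
```

The interval runs from roughly -1.78 to 0.69, so it does contain 0, in agreement with failing to reject \(H_0\).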
Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.
\(H_0\): \(\mu_A=\mu_M\)
\(H_A\): \(\mu_A\neq\mu_M\)
n <- 26
sd_A <- 3.58
sd_M <- 4.51
mean_A <- 16.12
mean_M <- 19.85
mean_diff <- mean_A - mean_M
sd_diff <- sqrt((sd_A ^ 2 / n) + (sd_M ^ 2 /n))
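# H_A is two-sided, so double the lower-tail probability.
# df = n - 1 is the conservative choice for two samples of size n.
p <- 2 * pt((mean_diff - 0)/sd_diff, n - 1)
round(p, 4)
## [1] 0.0029
Since 0.0029 < 0.05, we reject \(H_0\). The data provide strong evidence of a difference between the average city mileage of cars with manual and automatic transmissions.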
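The test above uses \(n-1=25\) degrees of freedom, which is a conservative choice. As a cross-check, the sketch below repeats the calculation with the Welch-Satterthwaite approximation for the degrees of freedom. This is a swapped-in alternative, not the method used in the solution, and the variable names are illustrative.

```r
# Two-sample t-test from summary statistics with Welch-Satterthwaite df.
n_grp <- 26                  # cars per group
var_a <- 3.58^2 / n_grp      # automatic: s^2 / n
var_m <- 4.51^2 / n_grp      # manual:    s^2 / n

se     <- sqrt(var_a + var_m)      # standard error of the difference
t_stat <- (16.12 - 19.85) / se     # automatic mean minus manual mean

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (var_a + var_m)^2 /
  (var_a^2 / (n_grp - 1) + var_m^2 / (n_grp - 1))

# Two-sided p-value.
p_welch <- 2 * pt(-abs(t_stat), df = df_welch)
c(t = t_stat, df = df_welch, p = p_welch)
```

The approximate degrees of freedom come out larger than 25, so the p-value is smaller than with the conservative choice and the conclusion is the same.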
The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.
(a) Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.
\(H_0\): the average number of hours worked is the same across all five groups. \(\mu_1=\mu_2=\mu_3=\mu_4=\mu_5\)
\(H_A\): at least one group's average number of hours worked differs from the others.