Working backwards, Part II. (5.24, p. 203) A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.
n <- 25
x1 <- 65
x2 <- 77
x_bar <- (x1 + x2) / 2 # sample mean = midpoint of the interval = 71
ME <- (x2 - x1) / 2 # margin of error = half the interval width
ME
## [1] 6
df <- n - 1
p <- 0.9
p_2tails <- p + (1 - p)/2 # cumulative probability at the upper critical value (0.95)
t_val <- qt(p_2tails, df)
SE <- ME / t_val # ME = t * SE
sd <- SE * sqrt(n) # SE = sd/sqrt(n)
sd
## [1] 17.53481
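As a quick sanity check on the values above, the interval can be rebuilt from the recovered statistics. This is a minimal sketch; the names `t_star`, `s`, and `ci` are introduced here for illustration only.

```r
# Rebuild the 90% interval from the recovered sample statistics.
n <- 25
x_bar <- (65 + 77) / 2          # midpoint of the interval
ME <- (77 - 65) / 2             # half-width of the interval
t_star <- qt(0.95, df = n - 1)  # critical value for a two-sided 90% interval
s <- ME / t_star * sqrt(n)      # sample standard deviation implied by ME
ci <- x_bar + c(-1, 1) * t_star * s / sqrt(n)
ci
```

If the recovered mean and standard deviation are right, `ci` should reproduce the endpoints given in the problem.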
SAT scores. (7.14, p. 261) SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.
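Z <- 1.65 # z* for Raina's 90% confidence interval
ME <- 25
SD <- 250
sampl <- ((Z * SD) / ME)^2 # from ME = Z * SD / sqrt(n)
sampl
## [1] 272.25
Rounding up, Raina needs a sample of at least 273 students.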
Solution: Luke wants to use a 99% confidence interval, so his critical z-value is larger than the one for a 90% interval. His sample must therefore be larger, because the required sample size is ((z × SD)/ME)², which grows with z.
z <- 2.575 # z* for a 99% confidence interval
ME <- 25
sd <- 250
n <- ((z * sd) / ME)^2
n
## [1] 663.0625
Rounding up, Luke needs a sample of at least 664 students.
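The two sample-size calculations above follow the same formula, so they can be wrapped in one helper. This is a sketch only: `min_sample_size` is a hypothetical function name, and it uses exact `qnorm` quantiles rather than rounded table values, so the 90% result comes out slightly below the hand calculation with z = 1.65.

```r
# Minimum sample size for a target margin of error with a known population SD.
# min_sample_size is an illustrative helper, not part of the exercise.
min_sample_size <- function(conf_level, sd, me) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # exact two-sided critical value
  ceiling((z_star * sd / me)^2)              # round up to a whole student
}

n_90 <- min_sample_size(0.90, sd = 250, me = 25)
n_99 <- min_sample_size(0.99, sd = 250, me = 25)
c(n_90, n_99)
```

Rounding up inside the helper reflects that a fractional student cannot be sampled and that rounding down would exceed the target margin of error.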
High School and Beyond, Part I. (7.20, p. 266) The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 students from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.
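The means seem slightly different, but the box plots show no clear difference between average reading and writing scores, and the distribution of the differences is almost normal.
Are the reading and writing scores of each student independent of each other? No. Each student has both a reading and a writing score, so the two scores are paired and not independent of each other. The scores of different students, however, are independent of one another.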
Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?
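H0: The mean difference between reading and writing scores equals zero (μr−μw=0). HA: The mean difference between reading and writing scores does not equal zero (μr−μw≠0).
The conditions required for this test are independence and normality. The students come from a simple random sample, so the paired differences are independent of one another, and the histogram of differences appears approximately normal, so the conditions are satisfied.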
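n <- 200
mean_diff <- -0.545
df <- n - 1
SD <- 8.887
SE <- SD / sqrt(n)
t_stat <- (mean_diff - 0) / SE # named t_stat because T is shorthand for TRUE in R
pvalue <- 2 * pt(-abs(t_stat), df) # two-sided p-value, matching HA: difference != 0
pvalue
## [1] 0.3868364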
Since the p-value is greater than 0.05, we fail to reject H0: there is not sufficient evidence of a difference between the mean reading and writing exam scores of students.
Type I error: incorrectly rejecting the null hypothesis. Type II error: incorrectly failing to reject the null hypothesis when the alternative is true. Since we failed to reject H0, we may have made a Type II error, which would be the case if the mean scores actually differ.
Yes, I would expect a confidence interval for the mean difference to include 0, since the test found no sufficient evidence of a difference in the means.
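The expectation about the confidence interval can be checked directly from the summary statistics used in the test. This is a minimal sketch at the 95% level; the names `sd_diff`, `t_star`, `ci`, and `includes_zero` are introduced here for illustration.

```r
# 95% confidence interval for the mean paired difference (reading - writing).
n <- 200
mean_diff <- -0.545
sd_diff <- 8.887
se <- sd_diff / sqrt(n)
t_star <- qt(0.975, df = n - 1)            # two-sided 95% critical value
ci <- mean_diff + c(-1, 1) * t_star * se   # lower and upper bounds
ci
includes_zero <- ci[1] < 0 && ci[2] > 0    # TRUE when 0 lies inside the interval
includes_zero
```

An interval that contains 0 is consistent with failing to reject H0 at the 5% significance level.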
Fuel efficiency of manual and automatic cars, Part II. (7.28, p. 276) The table provides summary statistics on highway fuel economy of cars manufactured in 2012. Use these statistics to calculate a 98% confidence interval for the difference between average highway mileage of manual and automatic cars, and interpret this interval in the context of the data.
The hypotheses for this test are as follows:
H0: The difference in average mileage between manual and automatic cars is equal to zero. HA: The difference in average mileage between manual and automatic cars is not equal to zero.
n = 26 # sample size used for each group
mean_diff = 16.12 - 19.85
SE_diff = sqrt((3.58^2)/n + (4.51^2)/n) # standard error of the difference in means
t = (mean_diff-0)/SE_diff
p = round(2 * pt(-abs(t), n-1), 3) # two-sided p-value with df = n - 1
p
## [1] 0.003
With α = 0.05 and p = 0.003, I reject H0: there is evidence of a difference in fuel efficiency between manual and automatic transmissions.
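The exercise also asks for a 98% confidence interval, which can be sketched from the same summary statistics. This is a hedged example: it assumes the means, standard deviations, and group size used in the test above, takes the difference in the same order, and uses df = n - 1 as above; the names `se_diff`, `t_star`, and `ci_98` are introduced here.

```r
# 98% confidence interval for the difference in mean mileage,
# using the same summary statistics and df = n - 1 as the test above.
n <- 26
mean_diff <- 16.12 - 19.85
se_diff <- sqrt(3.58^2 / n + 4.51^2 / n)
t_star <- qt(0.99, df = n - 1)                  # two-sided 98% critical value
ci_98 <- mean_diff + c(-1, 1) * t_star * se_diff
ci_98
```

If both bounds fall on the same side of zero, we can be 98% confident that the two transmission types differ in average mileage, with the size of the difference lying between the two bounds.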
Email outreach efforts. (7.34, p. 284) A medical research group is recruiting people to complete short surveys about their medical history. For example, one survey asks for information on a person’s family history in regards to cancer. Another survey asks about what topics were discussed during the person’s last visit to a hospital. So far, as people sign up, they complete an average of just 4 surveys, and the standard deviation of the number of surveys is about 2.2. The research group wants to try a new interface that they think will encourage new enrollees to complete more surveys, where they will randomize each enrollee to either get the new interface or the current interface. How many new enrollees do they need for each interface to detect an effect size of 0.5 surveys per enrollee, if the desired power level is 80%?
# alpha = 0.05 (two-sided), z = 1.96
# power = 0.80 (beta = 0.20), z = 0.842
d = 0.5
sd = 2.2
samplSize <- 2 * ((1.96 + 0.842)^2) * sd^2 / d^2
samplSize
## [1] 303.9986
They will need 304 new enrollees for each interface.
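As a cross-check, base R's `power.t.test` solves the same problem using the t-distribution instead of the normal approximation, so it may report a sample size about one enrollee larger per group. This is a sketch assuming a two-sample, two-sided test at the 5% level; `res` and `n_per_group` are names introduced here.

```r
# Cross-check with stats::power.t.test (two-sample, two-sided by default).
res <- power.t.test(delta = 0.5, sd = 2.2, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(res$n)  # round up to whole enrollees per interface
n_per_group
```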
Work hours and education. The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.
#library(openintro)
#download.file("http://www.openintro.org/stat/data/gss2010.Rda",destfile="gss2010.Rda")
#load("gss2010.Rda")
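The hypotheses for this ANOVA test follow: H0: The average hours worked are equal across all educational attainment groups. HA: At least one group's average differs from the others.
ANOVA requires that the observations be independent, that the data within each group be nearly normal, and that the variability be similar across groups. The data in each group seem approximately normal and the variability is similar across the groups, so, provided the respondents were sampled independently, the conditions appear to be satisfied.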
mu <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
table <- data.frame (mu, sd, n)
n <- sum(table$n)
k <- length(table$mu)
# Find degrees of freedom
df <- k - 1
Residual <- n - k
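# p-value, Pr(>F), taken from the ANOVA table:
Prf <- 0.0682
# Recover the F-statistic as the upper-tail quantile for that p-value
F_stat <- qf( 1 - Prf, df , Residual)
# F-statistic = MSG/MSE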
MSG <- 501.54
MSE <- MSG / F_stat
SSG <- df * MSG
SSE <- 267382
SST <- SSG + SSE
Dft <- df + Residual
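The table entries above can be cross-checked by going the other way: compute MSE from SSE, form the F-statistic, and confirm that it reproduces the reported p-value. This is a minimal sketch using only the values already given; the names `df_group`, `df_resid`, `n_total`, and `p_value` are introduced here.

```r
# Cross-check: derive F and its p-value from MSG and SSE.
k <- 5                        # number of educational attainment groups
n_total <- 1172               # total respondents
df_group <- k - 1
df_resid <- n_total - k
MSG <- 501.54
SSE <- 267382
MSE <- SSE / df_resid         # mean square error from the residual sum of squares
F_stat <- MSG / MSE
p_value <- pf(F_stat, df_group, df_resid, lower.tail = FALSE)
c(F = F_stat, p = p_value)
```

If `p_value` lands close to 0.0682, the two routes agree up to rounding. At the 5% significance level such a p-value would not lead to rejecting H0.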