5.6 Working backwards, Part II.

A 90% confidence interval for a population mean is (65,77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

Solution to 5.6

ANSWER:

The sample mean is the midpoint of the interval: \(\bar{x} = \frac{x_1 + x_2}{2}\), where \(x_1 = 65\) and \(x_2 = 77\) are the lower and upper bounds.

n <- 25
x1 <- 65
x2 <- 77
samp_mean <- (x1+x2)/2
print(samp_mean)
## [1] 71

ANSWER: The sample mean is 71.

The margin of error is half the width of the interval: \(ME = \frac{x_2 - x_1}{2}\).

moe <- (x2-x1)/2
print(moe)
## [1] 6

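ANSWER: The margin of error is 6.

The sample standard deviation follows from the standard error, \(SE = \frac{s}{\sqrt{n}}\), and the critical value \(t_{24}^{*}\) for 90% confidence with \(df = n - 1 = 24\).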

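df <- n-1                         # degrees of freedom
conf <- 0.9                       # confidence level
upper_p <- conf + (1-conf)/2      # cumulative probability for the upper critical value (0.95)
t24 <- qt(upper_p, df)            # critical value t* with 24 degrees of freedom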

\(ME = t_{24}^{*} \cdot SE \to SE = \frac{ME}{t_{24}^{*}} \to s = \frac{ME\sqrt{n}}{t_{24}^{*}} = \frac{6\sqrt{25}}{t_{24}^{*}}\)

s <- (moe*sqrt(n))/t24
print(s)
## [1] 17.53481

ANSWER: The sample standard deviation is approximately 17.53.
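As a sanity check, the three answers can be plugged back into the interval formula \(\bar{x} \pm t_{24}^{*} \frac{s}{\sqrt{n}}\). The sketch below is self-contained; the names `t_star`, `ci_lower`, and `ci_upper` are illustrative and not part of the original solution. If the derivation is right, it should return the original interval (65, 77).

```r
# Rebuild the 90% confidence interval from the derived quantities
n <- 25
samp_mean <- (65 + 77) / 2          # midpoint of the interval
moe <- (77 - 65) / 2                # half the width of the interval
t_star <- qt(0.95, df = n - 1)      # critical value for 90% confidence
s <- moe * sqrt(n) / t_star         # sample standard deviation

se <- s / sqrt(n)                   # standard error of the mean
ci_lower <- samp_mean - t_star * se
ci_upper <- samp_mean + t_star * se
c(ci_lower, ci_upper)
```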

5.14 SAT scores.

SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.
(a) Raina wants to use a 90% confidence interval. How large a sample should she collect?
(b) Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.
(c) Calculate the minimum required sample size for Luke.

Answer (a)

\(ME = z^{*}\frac{\sigma}{\sqrt{n}}\), with \(ME = 25\) and \(\sigma = 250\).
For a 90% confidence interval, \(z^{*} = 1.645\), so we need \(1.645 \times \frac{250}{\sqrt{n}} \le 25\), that is, \(n \ge \left(\frac{1.645 \times 250}{25}\right)^2\).

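# z* = 1.645 is the critical value for a 90% confidence interval
z <- 1.645
me <- 25
sigma <- 250
# Solve 25 = 1.645*(250/sqrt(n)) for n
n <- (z*sigma/me)^2

ANSWER: 270.6025. Raina needs a sample of at least 271 students (rounding up, since the sample size must be a whole number).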

Answer (b):

Luke's sample should be larger than Raina's. A 99% confidence level requires a larger critical value \(z^{*}\) than a 90% level, so keeping the margin of error at 25 points requires a larger \(n\) to compensate. The stricter the confidence requirement, the bigger the sample size needed.
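The reasoning can be illustrated numerically. This is a minimal sketch, assuming the critical values are taken from `qnorm()` rather than from a rounded table; the names `conf_levels`, `z_star`, and `n_ratio` are illustrative. Because \(n\) is proportional to \((z^{*})^2\) when \(\sigma\) and \(ME\) are fixed, the ratio of the squared critical values shows how much larger Luke's sample must be.

```r
# Critical values for the two confidence levels
conf_levels <- c(0.90, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)
names(z_star) <- c("90%", "99%")
z_star

# With sigma and ME fixed, n scales with the square of z*
n_ratio <- unname((z_star["99%"] / z_star["90%"])^2)
n_ratio
```

The ratio comes out between 2 and 3, so Luke needs a bit more than twice as many students as Raina.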

Answer (c)

# z* = 2.58 is the critical value for a 99% confidence interval
z <- 2.58
n <- (z*sigma/me)^2

ANSWER: 665.64. Luke needs a minimum of 666 students (rounding 665.64 up).
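Both sample-size calculations follow the same pattern, so they can be wrapped in a small helper. The function `required_n()` below is a hypothetical name introduced for illustration, and it uses the exact normal quantile from `qnorm()` instead of the rounded table values 1.645 and 2.58 used above.

```r
# Minimum sample size for a given margin of error and confidence level,
# assuming the population standard deviation sigma is known
required_n <- function(sigma, me, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  ceiling((z_star * sigma / me)^2)      # round up to a whole observation
}

n_raina <- required_n(sigma = 250, me = 25, conf = 0.90)
n_luke  <- required_n(sigma = 250, me = 25, conf = 0.99)
c(raina = n_raina, luke = n_luke)
```

With the exact quantile, Raina's answer is still 271, while Luke's comes out as 664 rather than 666. The difference is due only to rounding \(z^{*}\) up to 2.58.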

5.20 High School and Beyond, Part I.

(a) Is there a clear difference in the average reading and writing scores?
(b) Are the reading and writing scores of each student independent of each other?
(c) Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?
(d) Check the conditions required to complete this test.
(e) The average observed difference in scores is \(\bar{x}_{\text{read}-\text{write}} = -0.545\), and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?
(f) What type of error might we have made? Explain what the error means in the context of the application.
(g) Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.

Answer (a):

With n = 200 students, the box plots show no clear difference in the average reading and writing scores: the centers of the two box plots are close to each other. In the histogram of the differences, the center also appears to be close to 0.

Answer (b):

No, the reading and writing scores are not independent of each other. Both scores come from the same student, so the two measurements are paired.

Answer (c)

\(H_0: \mu_{\text{diff}} = 0\). There is no difference in the average scores of students in the reading and writing exam.
\(H_A: \mu_{\text{diff}} \neq 0\). There is a difference in the average scores.

Answer (d)

Independence: the 200 students are fewer than 10% of all high school students, so, provided they were randomly sampled, the differences can be treated as independent of one another. Normality: the sample size of 200 is well above 30, and the histogram of the differences shows no strong skew. With both conditions met, the t-distribution can be applied to the differences.

Answer (e)

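n <- 200
df <- n-1                       # degrees of freedom for the paired t-test
sd_diff <- 8.887                # standard deviation of the differences (given)
avg_diff <- -0.545              # observed mean difference, read - write
se <- sd_diff / sqrt(n)         # standard error of the mean difference
tdf <- (avg_diff - 0)/(se)      # t statistic under H0: mu_diff = 0
p <- 2 * pt(-abs(tdf), df)      # two-sided p-value

The test statistic is about -0.87 and the two-sided p-value is about 0.387, so these data do not provide convincing evidence of a difference between the average scores on the two exams.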

Answer (f)

Since the p-value (about 0.387) is larger than the 0.05 significance level, we fail to reject the null hypothesis. If we made an error, it would be a Type 2 error: failing to reject \(H_0\) when there actually is a difference between the average reading and writing scores.
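To get a rough sense of how likely a Type 2 error is here, one can estimate the power of the test with `power.t.test()` from base R's `stats` package. This is only a sketch under a loud assumption: it supposes the true mean difference equals the observed 0.545 points in magnitude, which the data cannot confirm. The name `pw` is illustrative.

```r
# Approximate power of the t-test on the differences,
# ASSUMING the true mean difference is 0.545 points (hypothetical value)
pw <- power.t.test(
  n = 200,                 # number of paired differences
  delta = 0.545,           # assumed true mean difference
  sd = 8.887,              # standard deviation of the differences
  sig.level = 0.05,
  type = "one.sample",     # one sample of differences
  alternative = "two.sided"
)$power
pw
1 - pw                     # approximate probability of a Type 2 error
```

Under that assumption the power is low, so a true difference of this size would usually go undetected with 200 students.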

Answer (g)

Yes. We failed to reject the null hypothesis \(\mu_{\text{diff}} = 0\) at the 0.05 significance level, so 0 is a plausible value for the average difference, and a 95% confidence interval for the average difference should include 0.
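The interval can be computed directly from the summary statistics given in the problem. The sketch below assumes a 95% confidence level to match the 0.05 significance level; the names `sd_diff`, `t_star`, `ci`, and `contains_zero` are illustrative.

```r
# 95% confidence interval for the mean read - write difference
n <- 200
avg_diff <- -0.545
sd_diff <- 8.887
se <- sd_diff / sqrt(n)               # standard error of the mean difference
t_star <- qt(0.975, df = n - 1)       # critical value for 95% confidence

ci <- avg_diff + c(-1, 1) * t_star * se
ci

contains_zero <- ci[1] < 0 && ci[2] > 0
contains_zero
```

The interval runs from roughly -1.78 to 0.69, which includes 0 and agrees with the hypothesis test.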

5.32 Fuel efficiency of manual and automatic cars, Part I.

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.

Answer to 5.32

\(\bar{x}_{\text{Automatic}} = 16.12, \quad s_{\text{Automatic}} = 3.58\).

\(\bar{x}_{\text{Manual}} = 19.85, \quad s_{\text{Manual}} = 4.51\).
\(n = 26\).
\(df = n - 1 = 25\).

\(H_0: \mu_{A} - \mu_{M} = 0\). There is no difference in average mpg between automatic and manual cars.
\(H_A: \mu_{A} - \mu_{M} \neq 0\). There is a difference in average mpg.

\(\bar{x}_{\text{diff}} = \bar{x}_{A} - \bar{x}_{M} = 16.12 - 19.85 = -3.73\).

\(SE = \sqrt{\frac{s_{A}^2}{n}+\frac{s_{M}^2}{n}} = \sqrt{\frac{(3.58)^2}{26}+\frac{(4.51)^2}{26}} \approx 1.12927.\)

The test statistic is \(T = \frac{\bar{x}_{\text{diff}} - 0}{SE} = \frac{-3.73}{1.12927} \approx -3.30\). With \(df = 25\), the two-sided p-value is 0.00288, which is less than the 0.05 significance level. We reject the null hypothesis and conclude that there is a difference in average city fuel efficiency, measured in mpg, between automatic and manual cars.
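The calculation above can be reproduced from the summary statistics alone. This is a minimal sketch that follows the same approach as the solution (a t-test with \(df = n - 1 = 25\)); the variable names are illustrative.

```r
# Two-sample t-test from summary statistics (automatic vs. manual, city mpg)
x_bar_a <- 16.12; s_a <- 3.58     # automatic
x_bar_m <- 19.85; s_m <- 4.51     # manual
n <- 26                           # cars in each sample
df <- n - 1                       # degrees of freedom used in the solution

x_diff <- x_bar_a - x_bar_m               # point estimate of the difference
se <- sqrt(s_a^2 / n + s_m^2 / n)         # standard error of the difference
t_stat <- (x_diff - 0) / se               # test statistic under H0
p_value <- 2 * pt(-abs(t_stat), df)       # two-sided p-value
c(t_stat = t_stat, p_value = p_value)
```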

5.48 Work hours and education.

The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.

Answer (a)

\(H_0: \mu_1 = \mu_2 = \cdots = \mu_5\). The mean number of hours worked is the same across all five educational attainment groups.
\(H_A\): At least one of the group means differs from the others.

Answer (b)

We check three conditions. Independence: the 1,172 respondents make up less than 10% of the population, so, assuming the survey sampled respondents at random, the observations can be treated as independent. Approximate normality: the total sample size is large (\(n = 1,172\)), so moderate departures from normality within the groups are not a serious concern. Constant variance: the standard deviations in the summary statistics are similar across the groups, so the variability appears roughly equal.

Answer (c):

options(scipen = 999)  # avoid scientific notation so the output is readable
# see book p. 250 for these formulas
df <- 4        # group degrees of freedom: k - 1, with k = 5 groups
dfe <- 1167    # error degrees of freedom: n - k = 1172 - 5

MSG <- 501.54  # given values
SSE <- 267382

# Calculate the missing values
MSE <- SSE / dfe       # mean square error
SSG <- df * MSG        # sum of squares between groups
f_stat <- MSG / MSE    # F statistic
dft <- df + dfe        # total degrees of freedom
SST <- SSG + SSE       # total sum of squares

pr <- pf(q = f_stat, df, dfe, lower.tail = FALSE)
pr
## [1] 0.06819325

Answer (d)

Interpretation: the calculated p-value is 0.06819, which is greater than 0.05, so we cannot reject the null hypothesis. The data do not provide convincing evidence that the mean number of hours worked differs across the educational attainment groups.
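The same decision can be reached by comparing the F statistic with the critical value at the 0.05 level. The sketch below is self-contained and recomputes the statistic from the given values; `df_group`, `df_error`, `f_crit`, and `reject_h0` are illustrative names.

```r
# Compare the F statistic with the 5% critical value
df_group <- 4          # k - 1
df_error <- 1167       # n - k
MSG <- 501.54          # given
SSE <- 267382          # given

MSE <- SSE / df_error
f_stat <- MSG / MSE
f_crit <- qf(0.95, df_group, df_error)   # reject H0 if f_stat exceeds this

c(f_stat = f_stat, f_crit = f_crit)
reject_h0 <- f_stat > f_crit
reject_h0
```

The F statistic (about 2.19) falls below the critical value, which matches the p-value being larger than 0.05.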