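First check that the conditions for inference using the t-distribution are satisfied.
The 90% confidence interval is (65, 77), based on a sample of \(n = 25\) observations.
From the interval we can recover the sample mean, the margin of error, and the sample standard deviation:
Sample mean (the midpoint of the interval): \(\bar{x} = (65 + 77)/2 = 71\)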
ci <- c(65, 77)
(m <- (ci[1] + ci[2])/2)
## [1] 71
Margin of error (half the width of the interval): \(ME_{\bar{x}} = (77 - 65)/2 = 6\)
(me <- (ci[2] - ci[1])/2)
## [1] 6
Sample standard deviation: \(s = 17.5\)
This is calculated as \(s = SE_{\bar{x}} \cdot \sqrt{n}\), where the standard error is \(SE_{\bar{x}} = ME_{\bar{x}} / t^*\). The critical t-value \(t^* = 1.71\) corresponds to a 90% confidence level with \(df = n - 1 = 24\).
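The same three steps can be wrapped in a small helper that works for any two-sided t-interval. This is a minimal sketch: the function name `ci_to_stats` is hypothetical, and it assumes the interval was built as \(\bar{x} \pm t^* s / \sqrt{n}\) with \(df = n - 1\). Rebuilding the interval from the recovered \(s\) is a quick sanity check on the arithmetic.

```r
# Hypothetical helper: recover x-bar, ME and s from a two-sided t-interval,
# assuming the interval is x-bar +/- t* s / sqrt(n) with df = n - 1
ci_to_stats <- function(lower, upper, n, conf) {
  xbar  <- (lower + upper) / 2                 # midpoint of the interval
  me    <- (upper - lower) / 2                 # half-width of the interval
  tstar <- qt(1 - (1 - conf) / 2, df = n - 1)  # critical t-value
  s     <- me / tstar * sqrt(n)                # s = SE * sqrt(n), SE = ME / t*
  c(xbar = xbar, me = me, s = s)
}

stats <- ci_to_stats(65, 77, n = 25, conf = 0.90)
round(stats, 2)

# Sanity check: rebuild the interval from the recovered s
rebuilt <- stats[["xbar"]] + c(-1, 1) * qt(0.95, df = 24) * stats[["s"]] / sqrt(25)
rebuilt
```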
n <- 25
df <- n - 1
# critical t-value for 90% confidence interval
(t <- qt(0.95, df))
## [1] 1.710882
# standard error
(se <- me / t)
## [1] 3.506963
# sample std dev
(s <- se * sqrt(n))
## [1] 17.53481
The population standard deviation is known in this case, \(\sigma = 250\).
The margin of error should be \(ME_{\bar{x}} \le 25\).
90% confidence interval
\(ME_{\bar{x}} = t^* SE_{\bar{x}} = t^* \sigma / \sqrt{n} \le 25\)
Let’s assume the sample size is large enough that the t-distribution approximates the normal distribution, so that we can use z-scores instead of t-scores. Then we use \(t_{0.90}^* \approx z^*_{0.90} = 1.645\) and solve for \(n\): \[n \ge \left(\frac{z^*_{0.90} \sigma}{25}\right)^2 = 270.6\]
So the sample size should be at least 271.
s <- 250  # population standard deviation (sigma)
me <- 25  # target margin of error
(z <- qnorm(0.95))
## [1] 1.644854
(n <- (z * s / me)^2)
## [1] 270.5543
Let’s check whether the margin of error stays within 25 when we use the t-score with \(n=271\). The critical t-value is now \(t^*_{0.90} = 1.651\) for \(df = n-1 = 270\):
\[ME_{\bar{x}} = t^*_{0.90} SE_{\bar{x}} = \frac{t^*_{0.90} \sigma}{\sqrt{n}} = 25.06\]
n <- 271
df <- n - 1
(t <- qt(0.95, df))
## [1] 1.650517
(se <- s / sqrt(n))
## [1] 15.18642
(me <- t * se)
## [1] 25.06544
Note that for a sample size of \(n=271\), the margin of error \(ME_{\bar{x}}\) is slightly greater than 25. If the margin of error must not exceed 25, we need to increase the sample size to \(n=273\) (\(n=272\) still gives a margin of error just above 25), in which case the margin of error becomes \(ME_{\bar{x}} = 24.97\).
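Rather than trying candidate sizes by hand, a short loop can step up from the z-based estimate until the t-based margin of error drops to 25 or below. This is a minimal sketch that assumes \(\sigma = 250\), a 90% two-sided interval and \(df = n - 1\), as in the calculation above; the helper name `me_t` is illustrative.

```r
sigma     <- 250   # population standard deviation
me_target <- 25    # maximum allowed margin of error
conf      <- 0.90  # confidence level

# t-based margin of error for a given sample size (illustrative helper)
me_t <- function(n) qt(1 - (1 - conf) / 2, df = n - 1) * sigma / sqrt(n)

# Start from the z-based estimate, rounded up
n <- ceiling((qnorm(1 - (1 - conf) / 2) * sigma / me_target)^2)

# Step up one observation at a time until the target is met
while (me_t(n) > me_target) {
  n <- n + 1
}
n
me_t(n)
```

Starting from the z-based value keeps the loop short, because the t-based requirement can only be equal to or larger than the z-based one.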
n <- 273
df <- n - 1
(t <- qt(0.95, df))
## [1] 1.650475
(se <- s / sqrt(n))
## [1] 15.13069
(me <- t * se)
## [1] 24.97282
99% confidence interval
Holding the margin of error constant at 25, the sample size for a 99% confidence interval should be larger than that for a 90% confidence interval. This is because the margin of error equals the critical value times the standard error:
\[ME_{\bar{x}} = t^* SE_{\bar{x}} = \frac{t^* \sigma}{\sqrt{n}} \le 25\]
When the confidence level increases from 90% to 99%, the critical value increases, so the standard error must decrease to hold the margin of error constant. Since the population standard deviation is fixed, the only way to decrease the standard error is to increase the sample size.
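The same argument can be made quantitative with the z-approximation: since \(n \ge (z^* \sigma / 25)^2\), the required sample size grows with the square of the critical value. A short sketch, assuming \(\sigma = 250\) and a target margin of error of 25 (the 95% row is included only for comparison):

```r
sigma     <- 250
me_target <- 25
conf      <- c(0.90, 0.95, 0.99)

z_star <- qnorm(1 - (1 - conf) / 2)        # two-sided critical values
n_raw  <- (z_star * sigma / me_target)^2   # unrounded minimum sample sizes
data.frame(conf, z_star = round(z_star, 3), n_raw = round(n_raw, 1),
           n_min = ceiling(n_raw))

# The 99% requirement is (z*_0.99 / z*_0.90)^2 times the 90% requirement
n_raw[3] / n_raw[1]
```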
99% confidence interval
Following the same calculation as in part (a), where now \(z^*_{0.99} = 2.576\):
\[n \ge \left(\frac{z^*_{0.99} \sigma}{25}\right)^2 = 663.5\]
So now the sample size should be at least 664.
me <- 25  # reset the target margin of error (me was overwritten above)
(z <- qnorm(0.995))
## [1] 2.575829
(n <- (z * s / me)^2)
## [1] 663.4897
As before, let’s check with the t-distribution whether a sample size of 665 keeps the margin of error \(ME_{\bar{x}} \le 25\).
\[ME_{\bar{x}} = t^*_{0.99} SE_{\bar{x}} = \frac{t^*_{0.99} \sigma}{\sqrt{n}} = 25.04\]
n <- 665
df <- n - 1
(t <- qt(0.995, df))
## [1] 2.583254
(se <- s / sqrt(n))
## [1] 9.694584
(me <- t * se)
## [1] 25.04357
Again, to keep the margin of error at or below 25, we need to increase the sample size to \(n=668\) (\(n=667\) still gives a margin of error just above 25); in this case the margin of error becomes \(ME_{\bar{x}} = 24.99\).
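The search over sample sizes can also be vectorized: compute the t-based margin of error for a short range of candidate \(n\) and pick the first one at or below 25. This sketch assumes \(\sigma = 250\), a 99% two-sided interval and \(df = n - 1\); the range 664 to 670 is an arbitrary choice that brackets the answer.

```r
sigma <- 250
n     <- 664:670                                   # candidate sample sizes
me    <- qt(0.995, df = n - 1) * sigma / sqrt(n)   # t-based margin of error
data.frame(n, me = round(me, 3))

# Smallest candidate whose margin of error does not exceed 25
(n_min <- n[which(me <= 25)[1]])
```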
n <- 668
df <- n - 1
(t <- qt(0.995, df))
## [1] 2.58322
(se <- s / sqrt(n))
## [1] 9.67279
(me <- t * se)
## [1] 24.98695
This is a paired data set of reading and writing scores with a sample size of \(n=200\). The observations are the differences of reading and writing scores (reading - writing) for each student.
There is no clear difference between the average reading and writing scores. The median writing score appears to be higher than the median reading score, but the reading scores appear to be more dispersed. The histogram of differences in scores (reading - writing) for the paired data set appears roughly symmetric and centered around 0 (i.e., on average, no difference between a student's reading and writing scores).
The 200 students are randomly selected from the entire survey population, and the sample size of 200 is presumably much smaller than the population size. This implies that the observed differences of scores should be independent.
\(H_0: \mu_{read-write} = 0\)
\(H_A: \mu_{read-write} \neq 0\)
This is a two-tailed test.
Sample mean: \(\bar{x}_{read-write} = -0.545\)
Sample standard deviation: \(s = 8.887\)
Standard error: \(SE_{\bar{x}_{read-write}} = s / \sqrt{n} = 0.628\)
T-score: \(T = (\bar{x}_{read-write} - \mu_0) / SE_{\bar{x}_{read-write}} = -0.867\).
The two-tailed p-value based on the T-score above with \(df = 199\) is about 0.39 (39%), so we fail to reject the null hypothesis. In other words, the data do not provide convincing evidence, at a significance level of \(\alpha = 0.05\) (or even at \(\alpha = 0.20\)), that the mean difference of scores is different from 0.
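To see why the conclusion holds even at \(\alpha = 0.20\), the two-tailed p-value can be compared with several significance levels at once. A sketch using the summary statistics above; writing the p-value as `2 * pt(-abs(t_score), df)` keeps it correct whichever sign the T-score has. The list of significance levels is an illustrative choice.

```r
m  <- -0.545   # mean difference (reading - writing)
s  <- 8.887    # standard deviation of the differences
n  <- 200
df <- n - 1

t_score <- (m - 0) / (s / sqrt(n))     # null value mu_0 = 0
p_two   <- 2 * pt(-abs(t_score), df)   # two-tailed p-value

alpha <- c(0.01, 0.05, 0.10, 0.20)
data.frame(alpha, reject_H0 = p_two < alpha)
```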
m <- -0.545
s <- 8.887
n <- 200
df <- n - 1
(se <- s / sqrt(n))
## [1] 0.6284058
(t <- m / se)
## [1] -0.867274
# two-tailed p-value (stored in p, not just printed)
(p <- 2 * pt(-abs(t), df))
## [1] 0.3868365
We may have made a Type 2 error, i.e., we failed to reject the null hypothesis \(H_0\) when the alternative hypothesis \(H_A\) is in fact true. In that case, the mean difference in reading and writing scores for the population is different from 0, but the hypothesis test based on the sample mean failed to detect it.
Yes; the sample estimate \(\bar{x}_{read-write}\) is too close to the null value of 0 to reject the null hypothesis, which suggests that the 95% confidence interval around the sample estimate will include 0. We can confirm this:
Critical t-score: \(t^*_{0.95} = 1.972\)
95% confidence interval: \(\left(\bar{x}_{read-write} \pm t^*_{0.95} SE_{\bar{x}_{read-write}}\right) = (-1.784, 0.694)\)
The confidence interval, as expected, does include 0.
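The agreement between the test and the interval is not a coincidence: for a two-sided test of \(\mu_0\) at level \(\alpha\), the p-value exceeds \(\alpha\) exactly when the \((1 - \alpha)\) confidence interval contains \(\mu_0\). A quick check with the summary statistics from this problem (a sketch; the variable names are illustrative):

```r
m     <- -0.545
s     <- 8.887
n     <- 200
df    <- n - 1
se    <- s / sqrt(n)
mu0   <- 0
alpha <- 0.05

p  <- 2 * pt(-abs((m - mu0) / se), df)            # two-tailed p-value
ci <- m + c(-1, 1) * qt(1 - alpha / 2, df) * se   # 95% confidence interval

fail_to_reject <- p > alpha
ci_covers_mu0  <- ci[1] < mu0 && mu0 < ci[2]
c(fail_to_reject = fail_to_reject, ci_covers_mu0 = ci_covers_mu0)
```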
(t_crit <- qt(0.975, df))
## [1] 1.971957
m + c(-t_crit, t_crit) * se
## [1] -1.7841889 0.6941889
We can assume that the conditions for inference are satisfied, per the question, so we use the t-distribution to test the difference of two population means.
The p-value of 0.3% is well under \(\alpha = 0.05\), so we can reject \(H_0\) in favor of the alternative hypothesis \(H_A\). We conclude that the mean fuel efficiencies of automatic and manual cars are different.
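The calculation that follows uses the conservative choice \(df = \min(n_1, n_2) - 1 = 25\). R's `t.test()` instead defaults to the Welch–Satterthwaite approximation, which typically gives more degrees of freedom and therefore a slightly smaller p-value. The sketch below computes both from the same summary statistics, only to show that the conclusion does not depend on this choice.

```r
x1 <- 19.85; s1 <- 4.51; n1 <- 26
x2 <- 16.12; s2 <- 3.58; n2 <- 26

v1 <- s1^2 / n1                  # squared standard error, group 1
v2 <- s2^2 / n2                  # squared standard error, group 2
se <- sqrt(v1 + v2)
t_score <- (x1 - x2) / se

# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
df_cons  <- min(n1, n2) - 1      # conservative choice used in the main calculation

p_welch <- 2 * pt(-abs(t_score), df_welch)
p_cons  <- 2 * pt(-abs(t_score), df_cons)
round(c(df_welch = df_welch, p_welch = p_welch, df_cons = df_cons, p_cons = p_cons), 4)
```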
x1 <- 19.85
x2 <- 16.12
s1 <- 4.51
s2 <- 3.58
n1 <- 26
n2 <- 26
df <- min(n1, n2) - 1  # conservative degrees of freedom
(m <- x1 - x2)
## [1] 3.73
(se <- sqrt(s1^2 / n1 + s2^2 / n2))
## [1] 1.12927
(t <- m / se)
## [1] 3.30302
# two-tailed p-value
(p <- pt(t, df, lower.tail=FALSE) * 2)
## [1] 0.002883615
ANOVA: analysis of variance to compare means across many groups.
Here there are \(k=5\) groups, with \(n=1172\) observations.
\(H_0: \mu_{<HS} = \mu_{HS} = \mu_{JC} = \mu_{B} = \mu_{G}\)
Population means are the same for all groups
\(H_A: \mu_i \neq \mu_j\) for at least one pair of groups \(i\) and \(j\)
Population means are NOT the same for all groups
See table below.
|  | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| degree | 4 | 2,006 | 501.54 | 2.19 | 0.0682 |
| Residuals | 1,167 | 267,382 | 229.12 |  |  |
| Total | 1,171 | 269,388 |  |  |  |
n <- 1172
k <- 5
MSG <- 501.54
SSE <- 267382
(df_G <- k - 1)
## [1] 4
(df_E <- n - k)
## [1] 1167
(df_T <- n - 1)
## [1] 1171
(SSG <- MSG * df_G)
## [1] 2006.16
(SST <- SSG + SSE)
## [1] 269388.2
(MSE <- SSE / df_E)
## [1] 229.1191
(F_stat <- MSG / MSE)  # F statistic; avoids masking R's shorthand F for FALSE
## [1] 2.188992
The p-value of 0.068 is greater than the significance level of \(\alpha = 0.05\), so we fail to reject \(H_0\). The survey data are insufficient to demonstrate a statistically significant difference in population means across the groups.
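The p-value in the ANOVA table can be reproduced from the F statistic and its two degrees of freedom with `pf()`, taking the upper-tail area. A sketch using the values from the table above:

```r
n   <- 1172
k   <- 5
MSG <- 501.54
SSE <- 267382
df_G <- k - 1   # between-group degrees of freedom
df_E <- n - k   # residual degrees of freedom

F_stat <- MSG / (SSE / df_E)                            # MSG / MSE
p_val  <- pf(F_stat, df_G, df_E, lower.tail = FALSE)    # upper-tail area
round(c(F = F_stat, p = p_val), 4)
p_val < 0.05
```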