Homework 5

Problem 5.6

First, we calculate the sample mean, $\bar{x}$, which is the midpoint of the confidence interval (the average of its two limits):

$\bar{x} = \frac{\text{Lower Limit }+\text{ Upper Limit}}{2} = \frac{65 + 77}{2} = \boxed{71}$

We also know that the margin of error can be calculated with the following equation:

$E = t^{*}\frac{s}{\sqrt{n}}$

which can be rearranged to:

$s = \frac{E\sqrt{n}}{t^{*}}$

The margin of error, $E$, is half the width of the confidence interval:

$E = \frac{\text{Upper Limit }-\text{ Lower Limit}}{2} = \frac{77 - 65}{2} = \boxed{6}$

The sample size, $n$, is 25, and the critical value, $t^{*}$, can be found in a $t$-table with $df = n - 1 = 24$ degrees of freedom. Because the interval is two-sided, the total tail area of 0.10 ($1-\text{Confidence Level}$) is split evenly, leaving 0.05 in each tail. Alternatively, you can use the following R code:

conf <- 0.90
samp_size <- 25
crit_t <- abs(qt((1-conf)/2,samp_size-1)) # Divide by 2 for two-tailed distribution
crit_t

[1] 1.710882

Now, we can plug back into the previous equation for the standard deviation:

$s = \frac{E\sqrt{n}}{t^{*}} = \frac{6\sqrt{25}}{1.710882} = \frac{30}{1.710882} \approx \boxed{17.5348}$

$\boxed{\bar{x} = 71, E = 6, s = 17.5348}$
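The three steps above can be collected into a short R function. This is only a sketch that assumes a two-sided one-sample $t$-interval; the name `recover_t_summary` is hypothetical and not part of any package. It uses `qt((1 + conf) / 2, ...)`, which gives the same upper critical value as `abs(qt((1 - conf) / 2, ...))` above.

```r
# Sketch (hypothetical helper): recover the sample mean, margin of error,
# and sample SD from the limits of a two-sided one-sample t-interval.
recover_t_summary <- function(lower, upper, n, conf) {
  x_bar  <- (lower + upper) / 2               # midpoint of the interval
  E      <- (upper - lower) / 2               # half-width of the interval
  t_star <- qt((1 + conf) / 2, df = n - 1)    # upper critical value t*
  s      <- E * sqrt(n) / t_star              # rearranged from E = t* s / sqrt(n)
  c(x_bar = x_bar, E = E, s = s)
}

res <- recover_t_summary(lower = 65, upper = 77, n = 25, conf = 0.90)
round(res, 4)
```

For the limits (65, 77) with $n = 25$ at 90% confidence, it returns $\bar{x} = 71$, $E = 6$, and $s \approx 17.5348$, matching the hand calculation.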

Problem 5.14

We are given the following parameters:

$s = \$250, E = \$25$

Assuming the required sample size is greater than 30, the sampling distribution of the sample mean is approximately normal, so we can use the $z$-based equation for the margin of error:

$E = z^{*}\frac{s}{\sqrt{n}}$

We can rearrange this equation to get:

$n = \left(z^{*}\frac{s}{E}\right)^{2}$

The first step is to calculate the critical z value, $z^{*}$, at the 90% confidence level ($c = 0.90$). We can look it up in a $z$-table or use R:

conf <- 0.90
tail <- (1-conf)/2 # Divide by 2 for two-tailed distribution
crit_z <- qnorm(tail,lower.tail=FALSE)
crit_z

[1] 1.644854

Now, we can plug this back into the equation and get the sample size:

$n = \left(z^{*}\frac{s}{E}\right)^{2} = \left(1.644854\frac{250}{25}\right)^2 = 270.554$

Since $n$ must be a whole number and rounding down would make the margin of error slightly larger than $E$, we round up and get $\boxed{n = 271}$.

The required sample size should be larger: to be more confident while keeping the same margin of error, we need more data. To confirm this, we calculate the critical z value, $z^{*}$, at the 99% confidence level ($c = 0.99$). We can look it up in a $z$-table or use R:

conf <- 0.99
tail <- (1-conf)/2 # Divide by 2 for two-tailed distribution
crit_z <- qnorm(tail,lower.tail=FALSE)
crit_z

[1] 2.575829

Now, we can plug this back into the equation and get the sample size:

$n = \left(z^{*}\frac{s}{E}\right)^{2} = \left(2.575829\frac{250}{25}\right)^2 = 663.49$

Rounding up to the next whole number again, we get $\boxed{n = 664}$.
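Both parts of this problem apply the same formula, so a small helper makes the comparison between confidence levels explicit. This is a sketch under the same normal-approximation assumption as above; `required_n` is a hypothetical name, and `ceiling()` performs the round-up step.

```r
# Sketch (hypothetical helper): smallest whole n with z* * s / sqrt(n) <= E.
required_n <- function(s, E, conf) {
  z_star <- qnorm((1 - conf) / 2, lower.tail = FALSE)  # two-sided critical value
  ceiling((z_star * s / E)^2)                           # always round up
}

n_90 <- required_n(s = 250, E = 25, conf = 0.90)
n_99 <- required_n(s = 250, E = 25, conf = 0.99)
c(n_90, n_99)
```

It gives 271 at 90% confidence and 664 at 99% confidence, showing directly how the higher confidence level raises the required sample size.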

Problem 5.20

We are given the following parameters:

$n = 200, \bar{x}_{diff} = -0.545 \text{ (part e)}, s_{diff} = 8.887 \text{ (part e)}$

There does not appear to be a clear difference, since the two box plots and their medians are relatively close together. The histogram of the differences also appears to be centered around 0.
The reading and writing scores are not independent of each other, because students who are better readers tend to also be better writers, and vice versa.
The null hypothesis is that there is no difference between the average reading and writing scores, and the alternative hypothesis is that the average scores differ:

$H_{0}: \mu_{diff} = 0$

$H_{a}: \mu_{diff} \ne 0$

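Yes. Independence: the students are a simple random sample, and 200 students is far less than 10% of the student population, so the differences can be treated as independent of one another. Normality: the sample is large ($n = 200$), so by the Central Limit Theorem the sampling distribution of the mean difference is approximately normal; in addition, with 199 degrees of freedom the $t$-distribution is nearly identical to the standard normal distribution.
The $t$-statistic is calculated with the following equation: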

$t = \frac{\bar{x}_{diff} - 0}{s_{diff}/\sqrt{n}} = \frac{-0.545}{8.887/\sqrt{200}} \approx -0.867$

Because the alternative hypothesis is two-sided, the p-value is the probability in both tails beyond $|t|$ with $df = n - 1 = 199$. It can be calculated using the following R code (assuming a 5% significance level):

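x_diff <- -0.545
s_diff <- 8.887
size <- 200
t_stat <- x_diff/(s_diff/sqrt(size))  # t-statistic for H0: mu_diff = 0
df <- size-1
round(2*pt(-abs(t_stat), df), 4)  # two-sided p-value: both tails beyond |t|

[1] 0.3868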

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; there is not enough evidence to conclude that the average reading and writing scores differ.

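It is possible that we made a Type II error, which is failing to reject the null hypothesis when it is actually false. In our case, the average reading and writing scores may truly differ, but our test did not detect the difference.
Since we failed to reject the null hypothesis at $\alpha = 0.05$, we would expect a 95% confidence interval for the average difference to include 0: a two-sided test at level $\alpha$ rejects $H_{0}: \mu_{diff} = 0$ exactly when 0 lies outside the corresponding $100(1-\alpha)\%$ confidence interval.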
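The last point can be checked numerically with a 95% $t$-interval for the mean difference. This sketch assumes the summary statistics from part (e) and a 95% confidence level to match $\alpha = 0.05$; the variable names are illustrative.

```r
# Sketch: 95% t-interval for the mean difference from summary statistics.
x_diff <- -0.545
s_diff <- 8.887
n      <- 200

se     <- s_diff / sqrt(n)                 # standard error of the mean difference
t_star <- qt(0.975, df = n - 1)            # two-sided 95% critical value
ci     <- x_diff + c(-1, 1) * t_star * se  # lower and upper limits
round(ci, 3)
```

The interval is roughly $(-1.78, 0.69)$, which contains 0, consistent with failing to reject $H_0$.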

Problem 5.32

Let’s denote the Automatic data with the subscript “1” and the Manual data with the subscript “2”. The given summary statistics are:

$\text{Automatic}: n_{1} = 26, \bar{x}_{1} = 16.12, s_{1} = 3.58$

$\text{Manual}: n_{2} = 26, \bar{x}_{2} = 19.85, s_{2} = 4.51$

The problem asks whether there is a difference between the two means, so the hypotheses are two-sided:

$H_{0}: \mu_{1} = \mu_{2}$

$H_{a}: \mu_{1} \ne \mu_{2}$

We use a 5% significance level ($\alpha=0.05$). The $t$-statistic is calculated with:

$t = \frac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}$

Using the conservative approach, the degrees of freedom, $df$, is the smaller of the two sample sizes minus 1: $df = \min(n_{1}, n_{2}) - 1 = 25$. We can calculate the two-sided p-value with this information in R:

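x1 <- 16.12  # Automatic: sample mean
s1 <- 3.58   # Automatic: sample SD
n1 <- 26
x2 <- 19.85  # Manual: sample mean
s2 <- 4.51   # Manual: sample SD
n2 <- 26
t_stat <- abs(x1-x2)/sqrt(s1^2/n1 + s2^2/n2)  # |t| using the unpooled standard error
df <- min(n1-1, n2-1)  # conservative degrees of freedom
2*pt(t_stat, df, lower.tail=FALSE)  # two-sided p-value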

[1] 0.002883615

Since the p-value is less than 0.05, we reject the null hypothesis in favor of the alternative. There is strong evidence that the average fuel efficiency of cars with manual and automatic transmissions differs.
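The calculation above can be wrapped in a reusable R function that returns the signed $t$-statistic, the conservative degrees of freedom, and the two-sided p-value. The name `two_sample_t` is hypothetical; this is a sketch that assumes the unpooled standard error and $df = \min(n_1, n_2) - 1$ used above, not the Welch approximation that R's `t.test()` applies by default.

```r
# Sketch (hypothetical helper): two-sided two-sample t-test from summary
# statistics, using the conservative df = min(n1, n2) - 1.
two_sample_t <- function(x1, s1, n1, x2, s2, n2) {
  se     <- sqrt(s1^2 / n1 + s2^2 / n2)  # unpooled standard error of x1 - x2
  t_stat <- (x1 - x2) / se               # signed test statistic
  df     <- min(n1, n2) - 1              # conservative degrees of freedom
  p      <- 2 * pt(-abs(t_stat), df)     # two-sided p-value
  c(t = t_stat, df = df, p = p)
}

fuel <- two_sample_t(x1 = 16.12, s1 = 3.58, n1 = 26,
                     x2 = 19.85, s2 = 4.51, n2 = 26)
round(fuel, 4)
```

For the fuel efficiency data it gives $t \approx -3.30$ with $df = 25$ and $p \approx 0.0029$, the same p-value as above; $t$ is negative because the automatic mean is the smaller one.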

Problem 5.48

The null and alternative hypotheses are listed below:

$H_{0}: \mu_{1} = \mu_{2} = \mu_{3} = \mu_{4} = \mu_{5}$

$H_{a}: \text{Not all means are equal}$

The subscripts represent each of the groups.

The conditions for ANOVA are reasonably satisfied. Independence: the people in the different groups are independent of one another. Normality: each group contains many observations, so the group means are approximately normal. Constant variance: the ratio of the largest group standard deviation to the smallest is less than 2, so the variances can be treated as roughly equal.
First, we need to calculate the degrees of freedom for the groups, $df_{G}$, residuals, $df_{E}$, and the total, $df_{T}$. The degrees of freedom are calculated below:

$df_{G} = \text{Num. Groups} - 1 = 5 - 1 = \boxed{4}$

$df_{E} = n - \text{Num. Groups} = 1172 - 5 = \boxed{1167}$

$df_{T} = df_{G} + df_{E} = 4 + 1167 = \boxed{1171}$

The sum of squares for the groups, $SSG$, follows from the definition of the group mean square, $MSG$:

$MSG = \frac{SSG}{df_{G}} \rightarrow SSG = MSG \times df_{G} = 501.54 \times 4 = \boxed{2006.16}$

The total sum of squares, $SST$, is the sum of the group and residual sums of squares:

$SST = SSG + SSE = 2006.16 + 267382 = 269388.16 \approx \boxed{269388}$

The mean square of the residuals, $MSE$, is:

$MSE = \frac{SSE}{df_{E}} = \frac{267382}{1167} = \boxed{229.119}$

The $F$-statistic is the ratio of the two mean squares:

$F = \frac{MSG}{MSE} = \frac{501.54}{229.119} = \boxed{2.189}$
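The ANOVA-table steps above can be reproduced in R from four quantities. This is a sketch; the only inputs are $MSG = 501.54$, $SSE = 267382$, 5 groups, and $n = 1172$, all taken from the calculations above.

```r
# Sketch: fill in the one-way ANOVA table from MSG, SSE, k groups, and n.
msg <- 501.54
sse <- 267382
k   <- 5
n   <- 1172

df_g <- k - 1          # degrees of freedom for groups
df_e <- n - k          # degrees of freedom for residuals
df_t <- df_g + df_e    # total degrees of freedom (equivalently n - 1)
ssg  <- msg * df_g     # sum of squares between groups
sst  <- ssg + sse      # total sum of squares
mse  <- sse / df_e     # mean square error
f    <- msg / mse      # F statistic

round(c(df_g = df_g, df_e = df_e, df_t = df_t,
        ssg = ssg, sst = sst, mse = mse, F = f), 3)
```

It reproduces $df_T = 1171$, $SSG = 2006.16$, $MSE \approx 229.119$, and $F \approx 2.189$.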

The p-value for $F = 2.189$ with $df_{G} = 4$ and $df_{E} = 1167$ degrees of freedom is about 0.068, which is greater than the significance level ($\alpha=0.05$). Therefore, we fail to reject the null hypothesis: there is not enough evidence to conclude that at least one group mean differs from the others.
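The p-value can be computed from the upper tail of the $F$-distribution with $df_G = 4$ and $df_E = 1167$ degrees of freedom. This short sketch uses base R's `pf()` and recomputes the $F$-statistic from the values above.

```r
# Sketch: upper-tail p-value of the F statistic with (df_G, df_E) degrees of freedom.
f_stat <- 501.54 / (267382 / 1167)   # F = MSG / MSE
p_val  <- pf(f_stat, df1 = 4, df2 = 1167, lower.tail = FALSE)
round(p_val, 4)
```

The result is about 0.068, which is greater than $\alpha = 0.05$.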