DATA 606 HW 5

5.6 Working backwards, Part II.

A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

Identify the variables:
\[n = 25\] \[df = n - 1 = 25 - 1 = 24\]
SAMPLE MEAN (the midpoint of the interval):
\[\bar{x} = \frac{65+77}{2} = 71\]
The critical value \(t_{24}\) for 90% confidence (two-tail area 0.10) with \(df = 24\) is 1.71.

Therefore: \[ \begin{align} \bar{x} + t_{24} \times SE &= 77 \\ 71 + 1.71 \times SE &= 77 \\ SE &= \frac{77-71}{1.71} \approx 3.51 \end{align} \]

SAMPLE STANDARD DEVIATION:
\[ \begin{align} SE &= \frac{s}{\sqrt{n}} \\ s &= SE \times \sqrt{n} = \frac{77-71}{1.71} \times \sqrt{25} \approx 17.54 \end{align} \]
MARGIN OF ERROR:
\[ME = CI_{upper} - \bar{x} = 77 - 71 = 6\]
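The working-backwards arithmetic can be checked with a short Python sketch. It assumes the rounded table value \(t_{24} = 1.71\) quoted above; with SciPy installed, `scipy.stats.t.ppf(0.95, 24)` would give the unrounded critical value (about 1.711) instead.

```python
import math

# Values from the exercise: 90% CI of (65, 77) built from n = 25 observations.
lower, upper = 65.0, 77.0
n = 25
# Critical value t_24 for 90% confidence, taken from the t-table (rounded).
t_star = 1.71

x_bar = (lower + upper) / 2   # the interval is symmetric about the sample mean
margin = (upper - lower) / 2  # margin of error = half-width of the interval
se = margin / t_star          # ME = t * SE  =>  SE = ME / t
s = se * math.sqrt(n)         # SE = s / sqrt(n)  =>  s = SE * sqrt(n)

print(f"mean={x_bar}, ME={margin}, SE={se:.2f}, s={s:.2f}")
```

Carrying the unrounded SE into the last step avoids the small drift that comes from multiplying an already rounded SE by \(\sqrt{25}\).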

5.14 SAT scores.

SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.

Raina wants to use a 90% confidence interval. How large a sample should she collect?

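For the margin of error to be no more than 25 points, we need \[z^{\star} \times SE \leq 25\] Because the population standard deviation \(\sigma = 250\) is known, we use the normal critical value, which for 90% confidence is \(z^{\star} \approx 1.645\), and the standard error is \(SE = \frac{\sigma}{\sqrt{n}}\). \[1.645 \times \frac{250}{\sqrt{n}} \leq 25 \rightarrow n \geq 270.6 \rightarrow n \geq 271\] Raina should collect at least 271 students.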
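A minimal Python sketch of the sample-size calculation, assuming \(\sigma\) is known so the normal critical value applies. It uses the standard library's `statistics.NormalDist` (Python 3.8+); the function name `min_sample_size` is hypothetical, chosen for illustration. With the unrounded \(z^{\star} \approx 1.6449\) the answer is 271; rounding \(z^{\star}\) down to 1.64 gives a slightly smaller \(n\).

```python
import math
from statistics import NormalDist

def min_sample_size(confidence: float, sigma: float, max_me: float) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= max_me (sigma known)."""
    # Two-sided critical value: the (1 + confidence) / 2 quantile of N(0, 1).
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    # Solve z* * sigma / sqrt(n) <= max_me for n, then round up to a whole student.
    return math.ceil((z_star * sigma / max_me) ** 2)

n_raina = min_sample_size(0.90, sigma=250, max_me=25)
print(n_raina)  # 271
```

Rounding up (rather than to the nearest integer) is what guarantees the margin of error stays at or below 25.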

Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

Luke's sample should be larger. The critical value for 99% confidence is larger than the one for 90% confidence. Since the margin of error is \(z^{\star} \times \frac{\sigma}{\sqrt{n}}\), with the critical value in the numerator and \(\sqrt{n}\) in the denominator, a larger critical value requires a larger \(n\) to keep the margin of error at or below 25.
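Solving the margin-of-error condition for \(n\) makes this dependence explicit. Here \(z^{\star}\) is the critical value, \(\sigma = 250\) is the known population standard deviation, and \(ME = 25\) is the largest acceptable margin of error:

\[ \begin{align} z^{\star} \times \frac{\sigma}{\sqrt{n}} &\leq ME \\ \sqrt{n} &\geq \frac{z^{\star}\sigma}{ME} \\ n &\geq \left(\frac{z^{\star}\sigma}{ME}\right)^{2} \end{align} \]

With \(\sigma\) and \(ME\) fixed, the minimum \(n\) grows with the square of \(z^{\star}\), so any increase in the confidence level raises the required sample size.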

Calculate the minimum required sample size for Luke. The two-sided critical value for 99% confidence is \(z^{\star} \approx 2.576\): \[2.576 \times \frac{250}{\sqrt{n}} \leq 25 \rightarrow n \geq 663.6 \rightarrow n \geq 664\]
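A small Python sketch comparing the required sample size across confidence levels, assuming the known \(\sigma = 250\) and \(ME = 25\) from the exercise. Note that the two-sided 99% critical value is about 2.576; the value 2.33 is the upper 1% point of the normal distribution, which corresponds to a two-sided 98% level. The names `SIGMA`, `MAX_ME`, and `min_sample_size` are illustrative only.

```python
import math
from statistics import NormalDist

SIGMA = 250   # known population standard deviation of SAT scores
MAX_ME = 25   # largest acceptable margin of error, in points

def min_sample_size(confidence: float) -> int:
    # Two-sided critical value z* for the given confidence level.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    # Smallest whole n with z* * SIGMA / sqrt(n) <= MAX_ME.
    return math.ceil((z_star * SIGMA / MAX_ME) ** 2)

for level in (0.90, 0.95, 0.99):
    print(level, min_sample_size(level))
```

The required sizes come out to 271, 385, and 664 for the three levels, which matches the reasoning that Luke's 99% interval needs a larger sample than Raina's 90% interval.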

5.20 High School and Beyond, Part I.

The National Center for Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 students from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.

Is there a clear difference in the average reading and writing scores?

There is no clear difference. The median writing score is higher, but the IQRs of the two distributions are fairly similar, and the histogram of the differences looks roughly normal and centered around 0.

Are the reading and writing scores of each student independent of each other?

No. A student's reading and writing scores are two measurements on the same student, so they are paired rather than independent: the two skills are closely related and probably influence one another.

Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?

\[H_{0}: \mu_{read} - \mu_{write} = 0\] \[H_{A}: \mu_{read} - \mu_{write} \neq 0\]

Check the conditions required to complete this test.

Independence: the 200 students are a simple random sample, and 200 is far less than 10% of all high school seniors, so each student's difference in scores is independent of the other students'. Normality: the histogram of the differences shows no strong skew, and with 200 observations the test can tolerate moderate skew anyway. The conditions for the paired \(t\)-test are met.

The average observed difference in scores is \(\bar{x}_{read-write} = 0.545\), and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?

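\[SE = \frac{s}{\sqrt{n}} = \frac{8.887}{\sqrt{200}} \approx 0.6284\] \[t = \frac{\bar{x}_{read-write} - 0}{SE} = \frac{0.545-0}{0.6284} \approx 0.8673\] Here \(df = 200 - 1 = 199\), so I use the closest smaller row of the \(t\)-table, \(df = 150\). This \(t\) value is smaller than every entry in that row, so the two-sided \(p\)-value is larger than 0.20, well above the usual \(\alpha = 0.05\). We fail to reject the null hypothesis: the data do not provide convincing evidence of a difference between the average reading and writing scores.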
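The test statistic can be reproduced from the summary statistics with a short Python sketch. Since \(df = 199\) is large, it approximates the \(t\) distribution with the standard normal via `statistics.NormalDist`; that approximation is an assumption of this sketch, and the exact \(t\) tail would give a very slightly larger \(p\)-value. With SciPy available, `2 * scipy.stats.t.sf(t_stat, 199)` would give the exact value.

```python
import math
from statistics import NormalDist

n = 200
mean_diff = 0.545   # average of (read - write)
sd_diff = 8.887     # standard deviation of the differences

se = sd_diff / math.sqrt(n)        # standard error of the mean difference
t_stat = (mean_diff - 0) / se      # null value of the difference is 0
# Two-sided p-value, normal approximation to the t distribution with df = 199.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"SE={se:.4f}, t={t_stat:.4f}, p~{p_value:.2f}")
```

The approximate \(p\)-value comes out a little under 0.4, consistent with the table-based conclusion that it exceeds 0.20.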

What type of error might we have made? Explain what the error means in the context of the application.

We may have made a Type II error: failing to reject \(H_0\) when \(H_A\) is actually true. In this context, there would be a real difference between students' average reading and writing scores that our test failed to detect.

Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.

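Yes. Since the \(p\)-value was well above 0.05, we failed to reject \(H_0: \mu_{read} - \mu_{write} = 0\), so the corresponding 95% confidence interval for the average difference should include 0.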
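A quick Python sketch of that 95% interval from the summary statistics. It uses the normal critical value \(z^{\star} \approx 1.96\) from `statistics.NormalDist` as an approximation to \(t^{\star}_{199} \approx 1.97\); this substitution is an assumption of the sketch and narrows the interval only slightly.

```python
import math
from statistics import NormalDist

n = 200
mean_diff = 0.545   # average of (read - write)
sd_diff = 8.887     # standard deviation of the differences
se = sd_diff / math.sqrt(n)

# 95% two-sided critical value (normal approximation to t with df = 199).
z_star = NormalDist().inv_cdf(0.975)
lower = mean_diff - z_star * se
upper = mean_diff + z_star * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval is roughly (-0.69, 1.78), which straddles 0 as expected from failing to reject \(H_0\).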

Steve Tipton

March 25, 2018
