Data 606 HW 5

Graded: 5.6, 5.14, 5.20, 5.32, 5.48

5.6

A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

Because the population standard deviation is unknown and the sample is small (n = 25), we use a t-distribution with 24 degrees of freedom.

n<-25
(sample_mean <- (65+77)/2)

## [1] 71

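df <- 24
t <- 1.71                     # t* for a 90% interval with 24 df, rounded table value
se <- (77 - sample_mean)/t    # margin of error = t* * SE, so SE = ME / t*
(margin_of_error <- t*se)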

## [1] 6

(sample_std_dev <- se * sqrt(n))

## [1] 17.54386
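
The calculation above uses a rounded table value for the critical t. A minimal sketch of the same steps with `qt()` in place of the table lookup, assuming only the interval endpoints and sample size from the problem (the variable names are illustrative, not part of the original solution):

```r
# Recover the sample statistics from a 90% t-interval (65, 77) with n = 25.
lower <- 65
upper <- 77
n <- 25

sample_mean <- (lower + upper) / 2        # midpoint of the interval
margin_of_error <- (upper - lower) / 2    # half the interval width
t_star <- qt(0.95, df = n - 1)            # 90% two-sided critical value, 24 df
se <- margin_of_error / t_star            # ME = t* * SE
sample_sd <- se * sqrt(n)                 # SE = s / sqrt(n)

c(mean = sample_mean, me = margin_of_error, sd = sample_sd)
```

With the unrounded critical value the standard deviation comes out near 17.53, slightly below the value obtained with t = 1.71.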

5.14 SAT Scores

SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.

Raina wants to use a 90% confidence interval. How large a sample should she collect?
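Raina needs a sample of at least 271 students.

sd <- 250                        # population standard deviation
z <- qnorm(.95, 0,1)             # z* for a 90% confidence interval
n <- ((z^2) / (25^2)) * sd^2     # n = (z* * sd / ME)^2 with ME = 25
ceiling(n)                       # round up so the margin of error stays within 25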

## [1] 271

Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

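Luke needs a larger sample. A higher confidence level requires a larger critical value z*, so with the same standard deviation and the same 25-point margin of error, n must increase.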
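
The comparison can be checked without computing Luke's sample size. A small sketch, assuming two-sided intervals and the same standard deviation and margin of error for both students (`size_ratio` is an illustrative name):

```r
# Critical values for the two confidence levels (two-sided intervals).
z_90 <- qnorm(0.95)    # Raina: 90% confidence leaves 5% in each tail
z_99 <- qnorm(0.995)   # Luke: 99% confidence leaves 0.5% in each tail

# With sd and margin of error fixed, n is proportional to z*^2,
# so the ratio of required sample sizes is (z_99 / z_90)^2.
size_ratio <- (z_99 / z_90)^2
c(z_90 = z_90, z_99 = z_99, ratio = size_ratio)
```

The ratio is roughly 2.5, so Luke's sample has to be about two and a half times the size of Raina's.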

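Calculate the minimum required sample size for Luke.
Luke needs a sample size of at least 664. The formula gives about 663.49, and we round up because 663 students would leave the margin of error slightly above 25 points.

z <- qnorm(.995, mean = 0, sd = 1)
n <- ((z^2) / (25^2)) * 250^2
ceiling(n)

## [1] 664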
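
Both sample-size calculations follow the same formula, so they can be wrapped in one function. The helper below is a hypothetical sketch, not part of the original solution. It rounds up rather than to the nearest integer, since the minimum n is the smallest whole number that keeps the margin of error at or below the target:

```r
# Hypothetical helper: smallest n such that z* * sd / sqrt(n) <= me
# for a two-sided interval at confidence level `conf`.
min_sample_size <- function(sd, me, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)
  ceiling((z_star * sd / me)^2)
}

raina_n <- min_sample_size(sd = 250, me = 25, conf = 0.90)
luke_n  <- min_sample_size(sd = 250, me = 25, conf = 0.99)
c(raina = raina_n, luke = luke_n)

# Check: the margin of error at Luke's n does not exceed 25 points.
qnorm(0.995) * 250 / sqrt(luke_n)
```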

5.20

Is there a clear difference in the average reading and writing scores?
There is not a clear difference between the average reading and writing scores, based on the boxplots and the distribution of differences. The distribution of differences appears approximately normal and centered near zero, and the boxplots show similar medians and spreads.
Are the reading and writing scores of each student independent of each other?
No. The scores are paired, since each student contributes one reading score and one writing score.
Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?

\(H_0\) : \(\mu_{diff} = 0\)
\(H_A\) : \(\mu_{diff} \neq 0\)

Check the conditions required to complete this test.
1) The differences come from a random sample and are presumed to be less than 10% of the population, therefore they are independent.
2) The normal model is appropriate because the distribution of differences appears nearly normal and the sample size is greater than 30.
The average observed difference in scores is \(\bar{x}_{read-write} = -0.545\), and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?
Since the p-value (about .39) is greater than .05, we fail to reject the null hypothesis. The data do not provide convincing evidence of a difference between the average scores on the two exams.

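se <- 8.887/sqrt(200)                  # standard error of the mean difference, n = 200
t <- (-.545-0)/se                      # test statistic under H0: mu_diff = 0
(p <- pt(t,199,lower.tail=TRUE)*2)     # two-sided p-value with 199 df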

## [1] 0.3868365

What type of error might we have made? Explain what the error means in the context of the application.
We could have made a Type 2 error: failing to reject the null hypothesis when there is in fact a difference between the average reading and writing scores.
Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.
Yes. Because the two-sided test failed to reject \(H_0\) at the .05 level, a 95% confidence interval should include 0. The observed mean difference is less than 1 SE away from 0.
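
The confidence interval can be computed from the same summary statistics. A minimal sketch, assuming a 95% level to match the .05 significance level and n = 200 pairs as in the test (variable names are illustrative):

```r
# 95% confidence interval for the mean reading - writing difference.
x_bar_diff <- -0.545   # observed mean difference
s_diff <- 8.887        # standard deviation of the differences
n <- 200               # number of students (pairs)

se <- s_diff / sqrt(n)
t_star <- qt(0.975, df = n - 1)             # two-sided 95% critical value
ci <- x_bar_diff + c(-1, 1) * t_star * se   # lower and upper bounds
ci
```

The interval runs from roughly -1.78 to 0.69, so it contains 0, which agrees with the test result.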

5.32

Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.
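The one-sided p-value computed below is about .0014, so the two-sided p-value for a difference in either direction is about .0029. Either way it is well below .05, so there is strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions.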

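auto_mu <- 16.12   # automatic: mean city mileage
auto_sd <- 3.58
man_mu <- 19.85    # manual: mean city mileage
man_sd <- 4.51
n <- 26            # cars in each group

diff <- man_mu - auto_mu
se <- sqrt((auto_sd^2/n) + (man_sd^2/n))
t <- (diff - 0)/se
(p <- pt(t, n-1, lower.tail = FALSE))   # one-sided (upper-tail) p-value with 25 df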

## [1] 0.001441807
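
The question asks about a difference in either direction, so a two-sided p-value is the natural one to report. A short sketch using the same summary statistics and the same 25 degrees of freedom (`t_stat`, `p_one_sided` and `p_two_sided` are illustrative names):

```r
# Summary statistics from the exercise.
auto_mu <- 16.12; auto_sd <- 3.58
man_mu  <- 19.85; man_sd  <- 4.51
n <- 26

se <- sqrt(auto_sd^2 / n + man_sd^2 / n)   # SE of the difference in means
t_stat <- (man_mu - auto_mu) / se          # test statistic under H0: no difference
df <- n - 1                                # same degrees of freedom as above

p_one_sided <- pt(abs(t_stat), df, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided             # both tails of the t-distribution
c(t = t_stat, one_sided = p_one_sided, two_sided = p_two_sided)
```

The two-sided p-value is about .0029, still far below .05, so the conclusion does not change.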

5.48

Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.
\(H_0\) : Mean hours worked is the same across all groups.
\(H_A\) : At least one group has a different mean hours worked.
Check conditions and describe any assumptions you must make to proceed with the test.

We assume the sample is randomly selected, makes up less than 10% of the population, and contains more than 30 observations.
The boxplots appear roughly normal with similar spreads, except for the Bachelor’s group. Because that group has 253 observations, the skew is not a concern.

Below is part of the output associated with this test. Fill in the empty cells.
What is the conclusion of the test?
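Using a significance level of \(\alpha = .05\), our p-value of .07 is greater than the significance level, so we fail to reject the null hypothesis. The data do not provide convincing evidence that the average number of hours worked differs across the five groups.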
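
The empty cells of an ANOVA table follow from the degrees of freedom and sums of squares. The sketch below shows the arithmetic only. Every number except the group count is a hypothetical placeholder and NOT a value from the exercise output, and the row and column labels are illustrative:

```r
# Hypothetical inputs for illustration only; these are NOT the exercise values.
k <- 5               # number of groups (five, as in the exercise)
n_total <- 205       # hypothetical total sample size
ss_between <- 100    # hypothetical sum of squares between groups
ss_within <- 5000    # hypothetical sum of squares within groups (residuals)

df_between <- k - 1                       # groups minus one
df_within <- n_total - k                  # observations minus groups
ms_between <- ss_between / df_between     # mean square = sum of squares / df
ms_within <- ss_within / df_within
f_stat <- ms_between / ms_within          # F = MS between / MS within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

data.frame(
  Df = c(df_between, df_within),
  SumSq = c(ss_between, ss_within),
  MeanSq = c(ms_between, ms_within),
  F = c(f_stat, NA),
  p = c(p_value, NA),
  row.names = c("groups", "Residuals")
)
```

The p-value is the upper-tail area of the F distribution, which is then compared with the significance level exactly as in the conclusion above.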

John Perez

3/22/2019
