5.6 Working backwards, Part II. A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

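Answer: The sample mean is the midpoint of the interval, x̄ = (65 + 77)/2 = 71, and the margin of error is half its width, (77 - 65)/2 = 6. A 90% confidence level leaves 5% in each tail, and with n = 25 the degrees of freedom are df = n - 1 = 24. Calculating the critical t-value: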

t5.6 <- abs(qt(p=0.05, df=24))
t5.6
## [1] 1.710882

Calculating the standard error:

SE_5.6 <- (77-65)/(2 * t5.6)
SE_5.6
## [1] 3.506963

Calculating the sample standard deviation (s = SE × √n):

sd_5.6 <- SE_5.6 * sqrt(25)
sd_5.6
## [1] 17.53481

Calculating the margin of error (ME = t* × SE, which recovers the half-width of the interval):

ME_5.6 <- t5.6 * SE_5.6
ME_5.6
## [1] 6
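The three steps above can be collected into a single function. This is a minimal sketch, assuming a two-sided t-interval of the form x̄ ± t* · s/√n; `recover_from_t_ci` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: recover x_bar, ME and s from a two-sided t-interval (lower, upper)
recover_from_t_ci <- function(lower, upper, n, conf = 0.90) {
  x_bar  <- (lower + upper) / 2                  # midpoint of the interval
  me     <- (upper - lower) / 2                  # half-width = margin of error
  t_star <- qt(1 - (1 - conf) / 2, df = n - 1)   # two-sided critical t-value
  se     <- me / t_star                          # from ME = t* * SE
  s      <- se * sqrt(n)                         # from SE = s / sqrt(n)
  list(x_bar = x_bar, me = me, s = s)
}

res_5.6 <- recover_from_t_ci(65, 77, n = 25, conf = 0.90)
res_5.6$x_bar  # 71
res_5.6$me     # 6
res_5.6$s      # about 17.53
```

Computing the margin of error directly as the half-width avoids the round trip through t* and SE.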

5.14 SAT scores. SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points. (a) Raina wants to use a 90% confidence interval. How large a sample should she collect?

Answer: Calculating the critical z-value for a 90% CI (5% in each tail):

sd_5.14 <- 250
Z_5.14_Raina <- qnorm(0.95, mean = 0, sd = 1)

Deriving the sample-size formula. The margin of error must be no more than 25 points:

$$z^\star \cdot \frac{\sigma}{\sqrt{n}} \le 25 \quad\Longrightarrow\quad n \ge \left(\frac{z^\star \, \sigma}{25}\right)^2$$

Using this in R:

n_5.14_Raina <- (Z_5.14_Raina * sd_5.14 / 25)^2
n_5.14_Raina
## [1] 270.5543

Hence Raina should collect at least 271 students (sample sizes are always rounded up).
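The formula and the rounding-up rule can be wrapped into one reusable function. This is a sketch assuming a known σ and a two-sided z-interval; `min_sample_size` is a hypothetical name.

```r
# Hypothetical helper: smallest n such that z* * sigma / sqrt(n) <= max_me
min_sample_size <- function(sigma, max_me, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)    # two-sided critical z-value
  ceiling((z_star * sigma / max_me)^2)   # round up: n must be a whole number
}

min_sample_size(sigma = 250, max_me = 25, conf = 0.90)  # 271 for Raina
```

The `ceiling()` call encodes the rounding rule: any whole number below the unrounded bound of about 270.55 would leave the margin of error above 25 points.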

(b) Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

Answer: Calculating the critical z-value for a 99% CI (0.5% in each tail):

Z_5.14_Luke <- qnorm(0.995, mean = 0, sd = 1)
Z_5.14_Luke
## [1] 2.575829

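The required sample size grows with the square of the critical value, n ≥ (z* σ / 25)², so with the same σ and margin of error a higher confidence level requires a larger sample. Luke’s z* ≈ 2.576 is larger than Raina’s z* ≈ 1.645, so Luke’s sample must be larger than Raina’s.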
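The reasoning can be checked without computing either sample size: with σ and the target margin of error fixed, n is proportional to (z*)², so the ratio of the two required sample sizes depends only on the critical values. A short sketch:

```r
# With sigma and ME fixed, n is proportional to z*^2,
# so the ratio of required sample sizes depends only on the critical values.
z_90 <- qnorm(0.95)    # Raina: 90% two-sided
z_99 <- qnorm(0.995)   # Luke: 99% two-sided
size_ratio <- (z_99 / z_90)^2
size_ratio  # about 2.45: Luke needs roughly 2.45 times as many students
```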

(c) Calculate the minimum required sample size for Luke.

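Answer: Using the same formula with Luke’s z-value gives about 663.49; rounding up, Luke needs at least 664 students.

# Same sample-size formula, now with Luke's 99% z-value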
n_5.14_Luke <- (Z_5.14_Luke * sd_5.14 / 25)^2
n_5.14_Luke
## [1] 663.4897
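As a sanity check on rounding up, this sketch plugs candidate sample sizes back into ME = z* · σ/√n with σ = 250 and the 99% critical value. `me_for_n` is a hypothetical helper name.

```r
z_star <- qnorm(0.995)   # 99% two-sided critical z-value
sigma  <- 250            # known population standard deviation

# Hypothetical helper: margin of error for a given sample size n
me_for_n <- function(n) z_star * sigma / sqrt(n)

me_for_n(663)  # just above 25 points
me_for_n(664)  # just below 25 points
```

So 663 students would leave the margin of error slightly above the 25-point target, while 664 meets it.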

5.20 High School and Beyond, Part I. The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 students from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.

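(a) Is there a clear difference in the average reading and writing scores?

Answer: No clear difference is apparent. The histogram of the differences is nearly normal and centered close to zero (the observed mean difference given in part (e) is -0.545 points), which suggests the average reading and writing scores are similar.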

(b) Are the reading and writing scores of each student independent of each other?

Answer: No. The two sets of scores are paired: each student’s reading score has a corresponding writing score from the same student, so the two scores of a given student are not independent of each other.

(c) Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?

Answer:
$H_0: \mu_{read} - \mu_{write} = 0$. The average difference between students’ reading and writing scores is zero.
$H_A: \mu_{read} - \mu_{write} \neq 0$. The average difference between students’ reading and writing scores is not zero.

(d) Check the conditions required to complete this test.

Answer:
  1. Independence: the students form a simple random sample, and 200 is far less than 10% of all high school seniors, so the paired differences can be treated as independent.

  2. Sample size: n = 200 is large.

  3. Distribution: the histogram of the differences between reading and writing scores is nearly normal.

Hence, we can safely say that the inference conditions are met for this test.

(e) The average observed difference in scores is $\bar{x}_{read-write} = -0.545$, and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?

Answer:

x_bar_5.20 <- -0.545
x_null_5.20 <- 0
sd_5.20 <- 8.887
df <- 199

se_5.20 <- sd_5.20 / sqrt(200)
T_value_5.20 <- (x_bar_5.20 - x_null_5.20) / se_5.20

T_value_5.20
## [1] -0.867274
p_value_5.20 <- 2*pt(T_value_5.20, df = 199, lower.tail = TRUE)
p_value_5.20
## [1] 0.3868365
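The same test can be wrapped into a function of the summary statistics. This is a sketch; `paired_t_summary` is a hypothetical name. It uses `2 * pt(-abs(t), df)` for the two-sided p-value, which is correct for either sign of t, whereas `lower.tail = TRUE` above gives the right answer only because this t happens to be negative.

```r
# Hypothetical helper: two-sided paired t-test from summary statistics of the differences
paired_t_summary <- function(mean_diff, sd_diff, n, null_value = 0) {
  se      <- sd_diff / sqrt(n)                 # standard error of the mean difference
  t_stat  <- (mean_diff - null_value) / se     # test statistic
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value, valid for either sign of t
  c(t = t_stat, p = p_value)
}

res_5.20 <- paired_t_summary(mean_diff = -0.545, sd_diff = 8.887, n = 200)
res_5.20  # t about -0.867, p about 0.387
```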

The p-value is about 0.39, which is greater than the usual significance level of α = 0.05, so we fail to reject H0. The data do not provide convincing evidence of a difference between the average reading and writing scores.

(f) What type of error might we have made? Explain what the error means in the context of the application.

Answer: Since we failed to reject H0, we might have made a Type 2 error: failing to reject H0 when HA is actually true. In this context, it would mean that the average reading and writing scores really do differ, but our test did not detect the difference.

(g) Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.

Answer: Yes. The two-sided test at α = 0.05 did not reject H0, so the corresponding 95% confidence interval for the average difference should include the null value of 0.
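This can be confirmed by building the 95% interval from the summary statistics in (e). A minimal sketch, assuming the same paired t setup with df = n - 1:

```r
# 95% CI for the mean difference (read - write), from the summary statistics in (e)
mean_diff <- -0.545
sd_diff   <- 8.887
n         <- 200

se     <- sd_diff / sqrt(n)          # standard error of the mean difference
t_star <- qt(0.975, df = n - 1)      # two-sided 95% critical t-value
ci     <- mean_diff + c(-1, 1) * t_star * se
ci  # roughly (-1.78, 0.69): the interval contains 0
```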