DATA606 WK8 HMWK: Inference for Numerical Data

5.6. Working backwards, Part II. A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

n = 25
df: 25 - 1 =24

Mean is the midpoint in the interval

m <- (65 + 77) / 2
m

## [1] 71

Margin of Error

me <- m - 65 #OR
me <- 77 - m
me

## [1] 6

Standard Deviation

n <- 25
df <- 24
t_critical <- qt(0.95, df) # a two-sided 90% interval leaves 5% in each tail
se <- me/t_critical
s <- se * sqrt(n); s

## [1] 17.53481
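As a cross-check, the same steps can be wrapped in a small helper that works backwards from any t-based interval. This is only a sketch: `ci_backsolve` is a hypothetical name (not from any package), and it assumes a two-sided interval, so the critical value is taken at the `(1 + level) / 2` quantile.

```r
# Hypothetical helper: recover the sample mean, margin of error, and
# sample standard deviation from a two-sided t confidence interval.
ci_backsolve <- function(lower, upper, n, level) {
  xbar   <- (lower + upper) / 2               # midpoint of the interval
  me     <- (upper - lower) / 2               # half-width of the interval
  t_star <- qt((1 + level) / 2, df = n - 1)   # two-sided critical value
  s      <- me / t_star * sqrt(n)             # from ME = t* x s / sqrt(n)
  list(mean = xbar, me = me, t_star = t_star, sd = s)
}

res <- ci_backsolve(65, 77, n = 25, level = 0.90)
res$mean
res$me
res$sd

# Rebuilding the interval from the recovered values should give back (65, 77)
rebuilt <- res$mean + c(-1, 1) * res$t_star * res$sd / sqrt(25)
rebuilt
```

Rebuilding the interval at the end is a quick way to catch a wrong critical value: using the 0.90 quantile instead of the 0.95 quantile would not reproduce (65, 77) as a 90% interval.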

5.14. SAT scores. SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.

Raina wants to use a 90% confidence interval. How large a sample should she collect?

Confidence level = 0.90 => \(z^* = 1.65\). With population standard deviation \(\sigma = 250\), the sample size must satisfy \(1.65 \times \frac{\sigma}{\sqrt{n}} \le 25\).

me <- 25
me / 1.65

## [1] 15.15152

s <- 250
sqrt_n <- 250 / 15.15152
sqrt_n

## [1] 16.49999

samp_size <- sqrt_n^2
samp_size

## [1] 272.2498

Rounding up, Raina needs at least 273 students.

Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

Luke’s sample should be larger. A 99% confidence level uses a larger critical value (\(z^*\)) than a 90% level, so to keep the margin of error at 25 points the standard error must be smaller, which requires a larger \(n\).

Calculate the minimum required sample size for Luke.

Confidence level = 0.99 => 2.58 (z-score)

me <- 25
me / 2.58

## [1] 9.689922

s <- 250
sqrt_n <- 250 / 9.689922
sqrt_n

## [1] 25.8

samp_size <- sqrt_n^2
samp_size

## [1] 665.6401

Rounding up, Luke needs at least 666 students.
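The two sample-size calculations follow the same pattern, so they can be sketched as one function. `min_sample_size` is a hypothetical helper written for illustration; it either takes a rounded z-score, as in the work above, or derives the critical value from the confidence level with `qnorm`.

```r
# Hypothetical helper: smallest n such that z* x sigma / sqrt(n) <= me.
# Pass either a confidence level or a (possibly rounded) z-score.
min_sample_size <- function(sigma, me, level = NULL, z = NULL) {
  if (is.null(z)) z <- qnorm((1 + level) / 2)  # two-sided critical value
  ceiling((z * sigma / me)^2)                   # always round up
}

# Rounded table values, as used above
n_raina_rounded <- min_sample_size(250, 25, z = 1.65)
n_luke_rounded  <- min_sample_size(250, 25, z = 2.58)

# Unrounded critical values from qnorm()
n_raina_exact <- min_sample_size(250, 25, level = 0.90)
n_luke_exact  <- min_sample_size(250, 25, level = 0.99)

c(n_raina_rounded, n_luke_rounded, n_raina_exact, n_luke_exact)
```

The unrounded critical values give slightly smaller minimums than the rounded table values, because 1.65 and 2.58 are both rounded up. Either way, Luke's requirement is well above Raina's.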

5.20. High School and Beyond, Part I.

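Is there a clear difference in the average reading and writing scores? NO. The observed average difference (-0.545 points) is very small compared with the standard deviation of the differences (8.887 points).
Are the reading and writing scores of each student independent of each other?
NO. Both scores come from the same student, so they are paired and likely related. The scores of different students are independent of each other, since the sample is random and less than 10% of the population.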
Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?

\(H_0: \mu_{diff} = 0\) There is no difference in the average reading and writing scores among the students.
\(H_A: \mu_{diff} \neq 0\) There is a difference in the average reading and writing scores among the students.

Check the conditions required to complete this test.
Independence: The sample is random and less than 10% of the population, so the differences for different students are independent of each other.
Normality: The differences show a slight skew, but we can be lenient because the sample size is greater than 30.
The average observed difference in scores is \(\bar{x}_{read-write} = -0.545\), and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?

N.B. The negative sign means that writing scores are higher on average in this sample. The test below checks whether that difference is larger than chance alone would explain.

\(T = \frac{-0.545 - 0}{8.887 / \sqrt{200}}\)

t_stat <- (-0.545 - 0) / (8.887 / sqrt(200)) # test statistic, not a critical value
t_stat

## [1] -0.867274

df <- 200 - 1
p_value <- 2 * pt(-abs(t_stat), df) # two-sided p-value
p_value

## [1] 0.3868365

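The p-value (about 0.387) is greater than 0.05, so we fail to reject \(H_0\). The data do not provide convincing evidence of a difference between the average reading and writing scores.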

What type of error might we have made? Explain what the error means in the context of the application.

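Type 2 Error. Because we failed to reject \(H_0\), the only error we could have made is failing to reject a false null hypothesis. In this context, it would mean that there really is a difference between the average reading and writing scores of students, but our test did not detect it.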

Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.

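YES. We failed to reject \(H_0\) at the 5% significance level, so the corresponding 95% confidence interval for the average difference would include the null value of 0.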
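The reasoning can be checked by computing the interval directly from the summary statistics given in the exercise. This sketch assumes a 95% confidence level, chosen to match the 0.05 significance level used in the test.

```r
# Summary statistics from the exercise
xbar_diff <- -0.545   # mean of (read - write)
s_diff    <- 8.887    # standard deviation of the differences
n         <- 200

se     <- s_diff / sqrt(n)
t_star <- qt(0.975, df = n - 1)   # 95% two-sided critical value (assumed level)

ci <- xbar_diff + c(-1, 1) * t_star * se
ci

# TRUE if the interval contains the null value of 0
includes_zero <- ci[1] < 0 && ci[2] > 0
includes_zero
```

The interval runs from roughly -1.78 to 0.69, so it contains 0, in agreement with the failure to reject \(H_0\).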

5.32. Fuel efficiency of manual and automatic cars, Part I.

\(H_0: \mu_m = \mu_a\)
\(H_A: \mu_m \neq \mu_a\)

Point estimate: \(\bar{x}_m - \bar{x}_a\)
Standard error: \(SE = \sqrt{\frac{s_m^2}{n_m} + \frac{s_a^2}{n_a}}\)

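diff_means <- 19.85 - 16.12 # avoid masking the built-in mean()
SE <- sqrt(((3.58^2) / 26) + ((4.51^2) / 26))

t_stat <- (diff_means - 0) / SE # test statistic, not a critical value
t_stat

## [1] 3.30302

df <- 26 - 1 # conservative df: smaller sample size minus 1

p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE) # two-sided p-value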

The p-value (about 0.003) is less than 0.05, so we reject \(H_0\). The data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions.
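The test above can be sketched as a reusable function that works from summary statistics. `two_sample_t` is a hypothetical helper, not a base R function. It reports the conservative degrees of freedom used above and, for comparison, the Welch-Satterthwaite approximation, which the original solution does not use.

```r
# Hypothetical helper: two-sample t test from summary statistics.
two_sample_t <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se     <- sqrt(v1 + v2)
  t_stat <- (m1 - m2) / se

  df_cons  <- min(n1, n2) - 1                                  # conservative df
  df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)) # Welch df

  p_two_sided <- function(df) 2 * pt(-abs(t_stat), df)
  c(t = t_stat,
    df_conservative = df_cons, p_conservative = p_two_sided(df_cons),
    df_welch = df_welch,       p_welch = p_two_sided(df_welch))
}

# Values as entered in the calculation above. With equal sample sizes,
# which SD is paired with which mean does not change the result.
res <- two_sample_t(19.85, 3.58, 26, 16.12, 4.51, 26)
round(res, 4)
```

Both choices of degrees of freedom lead to the same decision here. The conservative choice gives the larger p-value, so rejecting \(H_0\) with it is the more cautious conclusion.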

5.48. Work hours and education

Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.

\(H_0: \mu_{<hs} = \mu_{hs} = ... = \mu_{grad}\) The average number of hours worked is the same in all five groups. Any observed difference is due to chance.
\(H_A:\) At least one group mean differs from the others. The average number of hours worked varies by education level.

Check conditions and describe any assumptions you must make to proceed with the test.

Independence: It was not stated whether the sample was random, so we must assume it was. The sample is less than 10% of the population, so under that assumption the observations are independent within and across groups.
Normality: There are some deviations from normality in all groups, but none is extreme. This is not a substantial concern since there are more than 30 observations in each group.
Variance: The standard deviation varies a bit from one group to the next, so we assume the variability is roughly constant across groups and note this when reporting the results.

Below is part of the output associated with this test. Fill in the empty cells.

	Df	Sum Sq	Mean Sq	F-Value	\(Pr(>F)\)
degree	4	2006.16	501.54	2.19	0.0682
Residuals	1167	267,382	229.12
Total	1171	269,388.16

What is the conclusion of the test?

Since the p-value (0.0682) is greater than 0.05, there is not enough evidence to reject \(H_0\). That is, the data do not provide strong evidence that the average number of hours worked varies by degree/education level.
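The empty cells of the ANOVA table can be derived from one another. The sketch below treats the degrees of freedom, the degree Mean Sq (501.54), and the residual Sum Sq (267,382) as the starting values and computes the rest. The variable names are illustrative only.

```r
# Starting values taken from the table
df_group <- 4          # number of groups minus 1
df_resid <- 1167       # total observations minus number of groups
ms_group <- 501.54     # Mean Sq for degree
ss_resid <- 267382     # Sum Sq for residuals

# Derived cells
df_total <- df_group + df_resid
ss_group <- ms_group * df_group     # Sum Sq = Mean Sq x Df
ss_total <- ss_group + ss_resid
ms_resid <- ss_resid / df_resid     # Mean Sq = Sum Sq / Df
f_stat   <- ms_group / ms_resid     # F = MSG / MSE
p_value  <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

round(c(df_total = df_total, ss_group = ss_group, ss_total = ss_total,
        ms_resid = ms_resid, f_stat = f_stat, p_value = p_value), 4)
```

Checking that the p-value computed with `pf` falls above 0.05 confirms the decision not to reject \(H_0\).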

Javern Wilson

March 24, 2019
