5.6, 5.14, 5.20, 5.32, 5.48

5.6

A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

n = 25
ME = (77-65)/2
t_value = qt(.95,n-1)
# ME = t_value * SE
# SE = ME/t_value
SE = ME/t_value

x_bar = (77+65)/2
sd = SE*sqrt(n)

Sample mean = 71, margin of error = 6, sample standard deviation ≈ 17.53.
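As a quick check, the interval can be rebuilt from the computed values. This is a minimal sketch; the names `s` and `ci` are introduced here for illustration and are not part of the original solution.

```r
# Values from the exercise and the solution above
n <- 25
x_bar <- 71
ME <- 6

# Critical value for a 90% interval with n - 1 degrees of freedom
t_value <- qt(0.95, df = n - 1)

# Sample standard deviation recovered from the margin of error
s <- ME / t_value * sqrt(n)

# Rebuild the interval: x_bar +/- t * s / sqrt(n)
ci <- x_bar + c(-1, 1) * t_value * s / sqrt(n)
ci
```

If the calculation is right, `ci` should reproduce the original bounds of 65 and 77.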

5.14

SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.

sd = 250
ME = 25

(a) Raina wants to use a 90% confidence interval. How large a sample should she collect?

z_coef = 1.645
# ME = z_coef*sd/sqrt(n), so n = (z_coef*sd/ME)^2
n = (z_coef*sd/ME)^2
n
## [1] 270.6025

Raina should sample at least 271 students, since n must be rounded up.

(b) Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

A 99% confidence level uses a larger critical value than a 90% level. With the standard deviation and the margin of error held fixed, increasing z_coef increases the value of ((z_coef)*SD/ME)^2, which equals n. Therefore I’d expect Luke to need a larger sample than Raina.

(c) Calculate the minimum required sample size for Luke.

z_coef = 2.576
sd = 250
ME= 25
n = (z_coef*sd/ME)^2

Luke would need a sample of at least 664 students (n = 663.58, rounded up).
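The two sample-size calculations follow the same formula, so they can be wrapped in one function. The sketch below is an illustration only: `min_sample_size` is a hypothetical helper, and it uses `qnorm()` for the critical value instead of the rounded table values 1.645 and 2.576.

```r
# Hypothetical helper: smallest n whose margin of error is at most ME
min_sample_size <- function(conf_level, sd, ME) {
  # Two-sided critical value for the requested confidence level
  z <- qnorm(1 - (1 - conf_level) / 2)
  # Solve ME = z * sd / sqrt(n) for n and round up
  ceiling((z * sd / ME)^2)
}

n_raina <- min_sample_size(0.90, sd = 250, ME = 25)
n_luke <- min_sample_size(0.99, sd = 250, ME = 25)
c(Raina = n_raina, Luke = n_luke)
```

Because the required n grows with the square of the critical value, a higher confidence level always demands a larger sample when the margin of error is fixed.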

5.20

High School and Beyond, Part I. The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 students from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.

(a) Is there a clear difference in the average reading and writing scores?

There does not appear to be a clear difference between the average reading and writing scores.

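(b) Are the reading and writing scores of each student independent of each other?

No. The reading and writing scores of a given student are paired, since both come from the same person, so they are not independent of each other. The scores of different students are independent of each other because the students were randomly sampled.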

(c) Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?

H0: The average difference between reading and writing scores is 0.

Ha: The average difference between reading and writing scores does not equal 0.

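(d) Check the conditions required to complete this test.

Independence: the students are a simple random sample, so the differences for different students are independent of each other.

Normal distribution: the histogram of the differences appears approximately normal, and the sample of 200 students is large.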

(e) The average observed difference in scores is x̄_read-write = -0.545, and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?

n = 200 
df = n-1
mean = -.545
SD = 8.887

SE = SD/sqrt(n)

t_value = (mean)/SE

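# Two-sided p-value, since Ha is "does not equal 0"
p_value = 2*pt(-abs(t_value),df)

round(p_value, 3)
## [1] 0.387

p_value > 0.05, therefore we fail to reject the null hypothesis. The data do not provide convincing evidence of a difference between the average reading and writing scores.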

(f) What type of error might we have made? Explain what the error means in the context of the application.

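A Type II error: failing to reject H0 when Ha is actually true. In this context, it would mean concluding that there is no convincing evidence of a difference in the average reading and writing scores when a difference actually exists.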

(g) Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.

Yes. I’d expect the confidence interval to include 0. Since we failed to reject H0 at the 5% significance level, 0 remains a plausible value for the average difference, so a 95% confidence interval should contain it.
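The claim in (g) can be checked directly by computing the interval. The sketch below assumes a 95% confidence level (matching a 5% significance level) and uses the summary statistics from part (e); the names `sd_diff`, `t_star`, `ci`, and `includes_zero` are illustrative.

```r
# Summary statistics of the paired differences (read - write)
n <- 200
mean_diff <- -0.545
sd_diff <- 8.887

# Standard error and critical value for a 95% interval
se <- sd_diff / sqrt(n)
t_star <- qt(0.975, df = n - 1)

# Confidence interval for the average difference
ci <- mean_diff + c(-1, 1) * t_star * se
round(ci, 3)

# TRUE when 0 lies inside the interval
includes_zero <- ci[1] < 0 && ci[2] > 0
includes_zero
```

An interval that straddles 0 agrees with the decision not to reject H0.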

5.32

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.

H_0: The difference between the average city mileage of automatic and manual cars is 0.

H_a: The difference between the average city mileage of automatic and manual cars does not equal 0.

n = 26  # cars sampled for each transmission type
sd_auto = 3.58
sd_man = 4.51
mean_auto = 16.12
mean_man = 19.85

mean_diff = mean_auto - mean_man

se_diff = sqrt((sd_auto^2/n) + (sd_man^2/n))

t_value = (mean_diff)/se_diff

# Two-sided p-value, since H_a is "does not equal 0"
p_value = 2*pt(-abs(t_value),n-1)
p_value < 0.05
## [1] TRUE

Since the p_value (roughly 0.003) is less than 0.05, we reject H_0 in favor of H_a. There is strong evidence of a difference between the average city mileage of automatic and manual cars.
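A confidence interval for the difference gives a sense of its size as well as its direction. The sketch below assumes 26 cars in each sample, which the `n` line above suggests, a 95% confidence level, and n - 1 = 25 degrees of freedom; the names `t_star` and `ci` are illustrative.

```r
# Summary statistics from the exercise; n = 26 per group is an assumption
n <- 26
sd_auto <- 3.58
sd_man <- 4.51
mean_auto <- 16.12
mean_man <- 19.85

# Point estimate and standard error of the difference (auto - manual)
mean_diff <- mean_auto - mean_man
se_diff <- sqrt(sd_auto^2 / n + sd_man^2 / n)

# 95% interval using the conservative df = n - 1
t_star <- qt(0.975, df = n - 1)
ci <- mean_diff + c(-1, 1) * t_star * se_diff
round(ci, 2)
```

Under these assumptions the whole interval lies below 0, which points to automatic cars having lower average city mileage than manual cars in these samples.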

5.48

The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.

(a) Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.

H_0: The average number of hours worked is the same across all five groups.

H_a: At least one group mean differs from the others.

(b) Check conditions and describe any assumptions you must make to proceed with the test.

We assume independence per the description.

The box plots look approximately normal, except for the Less than HS and Graduate groups, which have a slight skew, and the Bachelor’s group, which has an apparent skew.

Variability seems roughly equal across the groups.

(c) Below is part of the output associated with this test. Fill in the empty cells.

#Create the data table
educational_attainment <- as.data.frame(rbind(c(38.67,39.6,41.39,42.55,40.85,40.45), c(15.81,14.97,18.1,13.62,15.51,15.17), c(121,546,97,253,155,1172)))
colnames(educational_attainment) = c("Less than HS", "HS","Jr Coll","Bachelor's", "Graduate","Total")
row.names(educational_attainment) = c("Mean","SD","n")
educational_attainment
##      Less than HS     HS Jr Coll Bachelor's Graduate   Total
## Mean        38.67  39.60   41.39      42.55    40.85   40.45
## SD          15.81  14.97   18.10      13.62    15.51   15.17
## n          121.00 546.00   97.00     253.00   155.00 1172.00
# Degrees of freedom: k - 1 for degree, n - k for residuals
degree_Df <- 4
Residuals_Df <- educational_attainment["n","Total"]-(degree_Df+1)
Total_Df <- Residuals_Df + degree_Df

# Recover the F statistic from the reported p-value
pr_f <- 0.0682
f_stat <- qf(1-pr_f,degree_Df,Residuals_Df)

# F = Mean Sq (degree) / Mean Sq (Residuals)
mean_sq = 501.54
Residuals_Mean_sq = mean_sq/f_stat


# Sum Sq = Mean Sq * Df
sum_sq = 267382
degree_sum_sq = mean_sq*degree_Df
Total_sum_sq = sum_sq + degree_sum_sq

# Named anova_table to avoid masking base R's table()
anova_table <- as.data.frame(rbind(c(degree_Df,degree_sum_sq,mean_sq,f_stat,pr_f), c(Residuals_Df,sum_sq,Residuals_Mean_sq, NA,NA), c(Total_Df,Total_sum_sq,NA,NA,NA)))

colnames(anova_table) <- c("Df", "Sum Sq","Mean Sq", "F Value", "Pr(>F)")
row.names(anova_table) <- c("degree","Residuals","Total")

anova_table
##             Df    Sum Sq  Mean Sq  F Value Pr(>F)
## degree       4   2006.16 501.5400 2.188931 0.0682
## Residuals 1167 267382.00 229.1255       NA     NA
## Total     1171 269388.16       NA       NA     NA

(d) What is the conclusion of the test?

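Since the p-value 0.0682 > 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence that the average number of hours worked differs across the five educational attainment groups.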
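The filled-in cells can be cross-checked against the group summary statistics. The sketch below is an independent check, not the method used above: it rebuilds the between-group sum of squares from the rounded group means and sizes, and recomputes the p-value with `pf()`. All variable names are illustrative.

```r
# Group means and sample sizes from the summary table
means <- c(38.67, 39.60, 41.39, 42.55, 40.85)
sizes <- c(121, 546, 97, 253, 155)

# Between-group sum of squares: sum of n_i * (mean_i - grand mean)^2
grand_mean <- weighted.mean(means, sizes)
ss_between <- sum(sizes * (means - grand_mean)^2)

# Degrees of freedom
df_between <- length(means) - 1
df_within <- sum(sizes) - length(means)

# F statistic from the given Mean Sq (degree) and Sum Sq (Residuals)
ms_within <- 267382 / df_within
f_stat <- 501.54 / ms_within

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
c(ss_between = ss_between, f_stat = f_stat, p_value = p_value)
```

Because the group means are rounded to two decimals, `ss_between` should only roughly match the Sum Sq value for degree in the table, while `p_value` should land close to the reported 0.0682.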