Kuruvilla-HW-Chapter5

5.6 Working backwards, Part II. A 90% confidence interval for a population mean is (65,77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

Answer : First we can work out the sample mean, which is the midpoint of the confidence interval. Half the width of the interval is the margin of error, which is the T * SE term of the confidence interval equation.

Mean = (65 + 77)/2 = 71

Margin Of Error = (77 - 65)/2 = 6

n <- 25
ME <- 6
df <- n - 1
# a 90% interval leaves 5% in each tail, so the critical value is the 95th percentile
t <- qt(.95, df)

t

## [1] 1.710882

With the t value, we can separate the standard deviation from the standard error:

ME = t * SE = t * sd/sqrt(n), so sd = ME * sqrt(n)/t

sd <- ME*sqrt(n)/t

sd

## [1] 17.53481

Standard Deviation = 17.5348 Mean = 71 Margin Of Error = 6
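The same steps can be collected into a small helper as a cross-check. This is only a sketch: the function name recover_from_ci and its arguments are made up for illustration, and it assumes a two-sided t-interval, so for a 90% interval the critical value is the 95th percentile of the t distribution with n - 1 degrees of freedom.

```r
# Hypothetical helper: recover the sample mean, margin of error and sample
# standard deviation from the endpoints of a two-sided t confidence interval.
recover_from_ci <- function(lower, upper, n, conf) {
  xbar <- (lower + upper) / 2                    # midpoint of the interval
  me <- (upper - lower) / 2                      # half the width of the interval
  t_star <- qt(1 - (1 - conf) / 2, df = n - 1)   # two-sided critical value
  s <- me * sqrt(n) / t_star                     # solve ME = t* * s / sqrt(n) for s
  list(mean = xbar, me = me, t_star = t_star, sd = s)
}

res <- recover_from_ci(65, 77, n = 25, conf = 0.90)
res$mean  # 71
res$me    # 6
res$sd    # about 17.53
```

Plugging res$sd back into mean +/- t* * sd/sqrt(n) reproduces the interval (65, 77), which is a quick way to confirm the algebra.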

5.14 SAT scores. SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.

Raina wants to use a 90% confidence interval. How large a sample should she collect?

Answer :

# we will use the formula n = (z* * SD/ME)^2, where z* = 1.65 is the critical value for 90% confidence
z.star <- 1.65
ME <- 25
SD <- 250

sample.size <- (((z.star*SD)/(ME))^2)
sample.size

## [1] 272.25

A sample of 272 would leave the margin of error slightly above 25 points, so we round up: the sample size should be at least 273.

Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

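Answer : Luke’s sample should be larger than Raina’s. A 99% confidence level uses a larger critical value than a 90% level, and with the standard deviation and margin of error held fixed, the required sample size grows with the square of that critical value.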

Calculate the minimum required sample size for Luke.

Answer :

z.star_L <- 2.58
ME_L <- 25
SD_L <- 250

sample.size_L <- (((z.star_L*SD_L)/(ME_L))^2)
sample.size_L

## [1] 665.64

Rounding up again, the sample size for Luke should be at least 666.
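Both sample-size calculations follow the same pattern, so they can be sketched as one function. The name min_sample_size is hypothetical, and the sketch uses exact normal quantiles from qnorm() rather than the rounded table values 1.65 and 2.58, which is why it returns slightly smaller numbers than 273 and 666. Either way, Luke needs the larger sample.

```r
# Hypothetical helper: smallest n for which z* * sd / sqrt(n) <= me.
min_sample_size <- function(sd, me, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)  # exact two-sided critical value
  ceiling((z_star * sd / me)^2)        # always round up to a whole observation
}

raina <- min_sample_size(sd = 250, me = 25, conf = 0.90)
luke <- min_sample_size(sd = 250, me = 25, conf = 0.99)
raina  # 271
luke   # 664
```

Using ceiling() rather than round() matters here: rounding down would give a margin of error just above the 25-point target.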

5.20 High School and Beyond, Part I. The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.

Is there a clear difference in the average reading and writing scores?

Answer : There is no clear difference in the average reading and writing scores. The distribution of the differences is fairly normal and centered near zero, though there is a slight skew to the right.

Are the reading and writing scores of each student independent of each other?

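Answer : No. Each student has both a reading score and a writing score, so the two scores are paired and a student who does well on one exam is likely to do well on the other. What the simple random sampling does give us is independence between students: the scores of one student are independent of the scores of another.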

Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students on the reading and writing exams?

Answer :

H_0: mean_(read) - mean_(write) = 0

H_A: mean_(read) - mean_(write) does not equal 0

Check the conditions required to complete this test.

Answer : The conditions required for this test are independence and normality of the differences.

1. The students were selected by simple random sampling, so the differences are independent from one student to the next.

2. The histogram of the differences suggests they are reasonably normally distributed with no extreme outliers. The sample size of 200 is also large enough that mild skew would not be a concern.

The average observed difference in scores is -.545 and the standard deviation of the difference is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?

Answer :

n <- 200
mean.diff <- -.545
df <- n-1
SD <- 8.887
SE <- SD/sqrt(n)
T <- (mean.diff-0)/SE
# two-sided alternative, so we double the tail area beyond |T|
pvalue <- 2*pt(-abs(T), df)
round(pvalue, 4)

## [1] 0.3868

The p-value, 0.387, is greater than .05, so we fail to reject the null hypothesis. We do not have convincing evidence of a difference between the average reading and writing exam scores.
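The paired test only needs the mean and standard deviation of the differences, so it can be sketched as a reusable function. The name paired_t_summary is made up for illustration. Because the alternative hypothesis is two-sided, the sketch doubles the tail area beyond |T| rather than using a single tail.

```r
# Hypothetical helper: two-sided paired t-test from summary statistics
# of the within-pair differences.
paired_t_summary <- function(mean_diff, sd_diff, n, null_value = 0) {
  se <- sd_diff / sqrt(n)                           # standard error of the mean difference
  t_stat <- (mean_diff - null_value) / se           # test statistic
  p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)   # both tails
  list(se = se, t = t_stat, df = n - 1, p = p_two_sided)
}

out <- paired_t_summary(mean_diff = -0.545, sd_diff = 8.887, n = 200)
out$t  # about -0.87
out$p  # about 0.39, well above 0.05
```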

What type of error might we have made? Explain what the error means in the context of the application.

Answer : Since we did NOT reject the null hypothesis, we are at risk of having made a Type II error. In this context, a Type II error would mean that there really is a difference between the average reading and writing scores, but our sample did not provide enough evidence to detect it.
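To get a feel for how likely a Type II error is, one can compute the power of the test under an assumed true difference. The sketch below uses power.t.test() from base R and assumes, purely for illustration, that the true mean difference is 0.545 points and the standard deviation of the differences is 8.887 (the observed values). Neither assumption comes from the exercise.

```r
# Power of a two-sided one-sample t-test on the differences,
# under the ASSUMED true difference of 0.545 points.
pw <- power.t.test(n = 200, delta = 0.545, sd = 8.887,
                   sig.level = 0.05, type = "one.sample",
                   alternative = "two.sided")

power <- pw$power        # chance of rejecting H_0 if the assumed difference is real
type2_rate <- 1 - power  # chance of a Type II error under the same assumption
power
type2_rate
```

Under these assumptions the power is low, so a difference this small would usually go undetected with 200 students.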

(g) Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and the writing scores to include 0? Explain your reasoning.

Answer : Yes. We failed to reject the null hypothesis of no difference at the .05 significance level, which means 0 is still a plausible value for the average difference, so I would expect a 95% confidence interval to include 0.
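The expectation can be checked directly by computing the interval from the summary statistics given in the exercise. This is a minimal sketch assuming a 95% confidence level, which matches a two-sided test at the .05 significance level. The variable names are chosen here for illustration.

```r
# 95% confidence interval for the mean reading - writing difference.
n_students <- 200
mean_diff <- -0.545
sd_diff <- 8.887

se_diff <- sd_diff / sqrt(n_students)        # standard error of the mean difference
t_star <- qt(0.975, df = n_students - 1)     # two-sided 95% critical value
ci <- mean_diff + c(-1, 1) * t_star * se_diff
ci                                           # roughly (-1.78, 0.69)

includes_zero <- ci[1] < 0 && ci[2] > 0
includes_zero
```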

5.32 Fuel Efficiency of manual and automatic cars, Part I. Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmission in terms of their average city mileage? Assume that conditions of inference are satisfied.

Null: mean.A - mean.M = 0

Alternate: mean.A - mean.M does not equal 0

Auto_N <- 26
Man_N <- 26
Auto_SD <- 3.58
Man_SD <- 4.51
Auto_Mean <- 16.12
Man_Mean <- 19.85
alpha <- .05
Diff_Mean <- Auto_Mean - Man_Mean
Auto_SE <- Auto_SD/sqrt(Auto_N) 
Man_SE <- Man_SD/sqrt(Man_N)
Diff_SE <- sqrt(((Auto_SE)^2)+(Man_SE)^2)
T.1 <- (Diff_Mean-0)/Diff_SE

# conservative degrees of freedom: min(n1 - 1, n2 - 1) = 25
pvalue.1 <- 2*pt(-abs(T.1), min(Auto_N, Man_N) - 1) #because we are running a two tailed test, we multiply the tail area by 2
pvalue.1

## [1] 0.002883615

alpha = 0.05 and the p-value is 0.0029, which is less than alpha, so we reject the null hypothesis. There is sufficient evidence to say that there is a difference between the average fuel efficiency of cars with manual and automatic transmission in terms of their average city mileage.
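The two-sample calculation can be sketched as a function of the summary statistics. The name two_sample_t_summary is hypothetical. Besides the conservative degrees of freedom min(n1 - 1, n2 - 1) used above, the sketch also reports the Welch-Satterthwaite approximation, which is a different choice of degrees of freedom from the one in this solution and is shown only for comparison.

```r
# Hypothetical helper: two-sided two-sample t-test from summary statistics.
two_sample_t_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                        # squared standard error, group 1
  v2 <- s2^2 / n2                        # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)    # test statistic for H_0: no difference
  df_cons <- min(n1, n2) - 1             # conservative degrees of freedom
  # Welch-Satterthwaite approximation (comparison only)
  df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_at <- function(df) 2 * pt(-abs(t_stat), df)
  list(t = t_stat,
       df_conservative = df_cons, p_conservative = p_at(df_cons),
       df_welch = df_welch, p_welch = p_at(df_welch))
}

# automatic vs manual city mileage, values from the exercise
fuel <- two_sample_t_summary(16.12, 3.58, 26, 19.85, 4.51, 26)
fuel$t               # about -3.30
fuel$p_conservative  # about 0.0029
fuel$df_welch        # between 47 and 48
```

Both choices of degrees of freedom lead to the same decision here, since either p-value is far below 0.05.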

5.48 Work hours and education. The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.

Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.

Null: mean(lessthanhs) = mean(HS) = mean(JC) = mean(B) = mean(G)

Alternate: at least one of the means differs

Check conditions and describe any assumptions you must make to proceed with the test.

The observations must be independent. This is a representative survey of US residents and the 1,172 respondents are well below 10% of the population, so as long as the data are truly a random sample this condition should hold. The data within each group must also be approximately normal. The distributions appear normal enough from the boxplots, and for a sample size of over 1,000 this should be usable. The groups must also have comparable standard deviations. This condition also appears to be satisfied looking at the table, with junior college (standard deviation 18.1) as the potential outlier.

Below is part of the output associated with this test. Fill in the empty cells.

Credit to www.khanacademy.com. They had great explanations on how to calculate the MSG, MSE, and F values. Another website with great information on how to calculate the MSG, MSE, and F values is:
http://oak.ucc.nau.edu/rh232/courses/EPS625/Handouts/One-Way%20ANOVA/Hand%20Calculation%20of%20ANOVA.pdf

#            | Df    | Sum Sq  | Mean Sq | F value | Pr(>F) 
#---------------------------------------------------------------
#degree      | 4     | 2004.1  | 501.03  | 2.19    | 0.0682
#---------------------------------------------------------------
#Residuals   | 1167  | 267382  | 229.12  | NA      | NA
#---------------------------------------------------------------
#Total       | 1171  | 269386.1| NA      | NA      | NA
#---------------------------------------------------------------
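The remaining cells follow from the degrees of freedom and the sums of squares. The sketch below takes the two Sum Sq values from the table above as its inputs (an assumption about which cells are the starting point) and derives the rest. The variable names are chosen here for illustration.

```r
# Fill in an ANOVA table from the group and residual sums of squares.
k <- 5              # number of educational attainment groups
n_total <- 1172     # number of respondents
ssg <- 2004.1       # Sum Sq for degree, taken from the table above
sse <- 267382       # Sum Sq for residuals, taken from the table above

df_group <- k - 1           # 4
df_resid <- n_total - k     # 1167
df_total <- n_total - 1     # 1171

msg <- ssg / df_group       # mean square between groups
mse <- sse / df_resid       # mean square error
f_value <- msg / mse        # F statistic
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)  # upper tail area

c(MSG = msg, MSE = mse, F = f_value, p = p_value)
```

The p-value computed this way should land close to the 0.0682 in the table. Small differences come from rounding in the sums of squares.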

What is the conclusion of the test?

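At a significance level of 0.05, our p-value of 0.0682 is greater than 0.05, so we fail to reject the null hypothesis. The data do not provide convincing evidence that the average number of hours worked differs across the five educational attainment groups.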

James Kuruvilla

April 17, 2017