Homework5

5.6 Working backwards, Part II. A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

sample <- 25
margin_error <- (77-65)/2
margin_error

## [1] 6

sample_mean <- margin_error + 65
sample_mean

## [1] 71

tVal <- qt(.95, sample-1)
tVal

## [1] 1.710882

# margin of error = t * sd / sqrt(n), so sd = sqrt(n) * margin of error / t
sd <- (sqrt(sample)*margin_error)/tVal
sd

## [1] 17.53481

Therefore, the sample mean is 71, the margin of error is 6, and the sample standard deviation is approximately 17.53.
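As a quick sanity check, the interval can be rebuilt from the recovered values. This is only a sketch using the numbers from the exercise; the variable names (`t_star`, `s`, `ci`) are illustrative and not part of the original solution.

```r
# Rebuild the 90% confidence interval from the recovered statistics.
n <- 25
sample_mean <- 71
margin_error <- 6

t_star <- qt(0.95, df = n - 1)            # critical value for 90% confidence
s <- margin_error * sqrt(n) / t_star      # sample standard deviation

# mean +/- t* x s / sqrt(n) should reproduce the interval given in the exercise
ci <- sample_mean + c(-1, 1) * t_star * s / sqrt(n)
ci
```

If the recovered values are right, `ci` should match the endpoints given in the exercise.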

5.14 SAT scores. SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.

Raina wants to use a 90% confidence interval. How large a sample should she collect?

Answer:

# Margin of error formula with the z critical value (population SD is known)
z <- 1.65
sd <- 250
merror <- 25

(z * sd / merror)^2

## [1] 272.25

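Raina should collect at least 273 students. The result is rounded up because a sample of 272 would leave the margin of error slightly above 25 points.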

Luke wants to use a 99% confidence interval. Without calculating the actual sample size, determine whether his sample should be larger or smaller than Raina’s, and explain your reasoning.

Answer: Luke’s sample should be larger than Raina’s. A higher confidence level requires a larger critical value, so to keep the same margin of error with the same standard deviation, the sample size must increase.
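The reasoning can be illustrated by comparing the two critical values directly. This is a minimal sketch that assumes the normal model is appropriate because the population standard deviation is known; `z_90` and `z_99` are illustrative names.

```r
# Critical values for two-sided 90% and 99% confidence intervals
z_90 <- qnorm(0.95)
z_99 <- qnorm(0.995)
c(z_90 = z_90, z_99 = z_99)

# With sd and margin of error held fixed, n is proportional to z^2,
# so this ratio is how many times larger Luke's sample must be.
size_ratio <- (z_99 / z_90)^2
size_ratio
```

Because the required sample size grows with the square of the critical value, a modest increase in the critical value translates into a much larger sample.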

Calculate the minimum required sample size for Luke.

Answer:

# Margin of error formula with the z critical value (population SD is known)
z <- 2.575
sd <- 250
merror <- 25

(z * sd / merror)^2

## [1] 663.0625

Luke needs a sample of at least 664 students, since the result must be rounded up to keep the margin of error at or below 25 points.
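Both sample-size calculations follow the same pattern, so they can be wrapped in a small helper. This is a sketch only: `min_sample_size` is a hypothetical function name, and it uses exact normal quantiles from `qnorm()` instead of the table-rounded values 1.65 and 2.575, so its results can differ slightly from the hand calculations.

```r
# Hypothetical helper: smallest n such that z* x sigma / sqrt(n) <= margin.
min_sample_size <- function(conf_level, sigma, margin) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  ceiling((z_star * sigma / margin)^2)       # always round up
}

n_raina <- min_sample_size(0.90, sigma = 250, margin = 25)
n_luke  <- min_sample_size(0.99, sigma = 250, margin = 25)
c(raina = n_raina, luke = n_luke)
```

Using `ceiling()` makes the rounding rule explicit: any fractional result is rounded up so that the margin of error does not exceed the target.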

5.20 High School and Beyond, Part I. The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 students from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.

knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.20.JPG")

Is there a clear difference in the average reading and writing scores?

Answer: According to the plots, there is no clear difference: the histogram of the differences (read - write) is roughly symmetric and centered near zero.

Are the reading and writing scores of each student independent of each other?

Answer: No. The reading and writing scores of a given student are likely not independent, because both come from the same student. The scores of different students, however, can be treated as independent of each other.

Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?

Answer: $H_0: \mu_r - \mu_w = 0$; $H_a: \mu_r - \mu_w \neq 0$

Check the conditions required to complete this test.

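Answer: The conditions required to perform the test are: 1) Independence: the reading and writing scores are paired within each student, so they are not independent of each other, but the students come from a simple random sample, so the differences are independent across students. 2) Normality: the distribution of the differences looks approximately normal. Both conditions are reasonably met, so we can proceed with a test on the paired differences.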

The average observed difference in scores is $\bar{x}_{read-write} = -0.545$, and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams?

Answer:

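sd <- 8.887
xbar <- -0.545
n <- 200
# test statistic: observed mean difference divided by its standard error
tVal <- xbar / (sd / sqrt(n))
# two-sided p-value, since Ha is a two-sided alternative
p <- 2 * pt(-abs(tVal), n - 1)
round(p, 2)

## [1] 0.39

The p-value is about 0.39, which is greater than 0.05, so we fail to reject the null hypothesis. These data do not provide convincing evidence of a difference between the average scores on the two exams.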

What type of error might we have made? Explain what the error means in the context of the application.

Answer:

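There are two types of error:

Type 1 error: we reject the null hypothesis when it is actually true.

Type 2 error: we fail to reject the null hypothesis when there actually is a difference between the means.

Here, because we did not reject the null hypothesis, we may have made a Type 2 error: concluding that there is no difference between the average reading and writing scores when in fact there is one.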

Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? Explain your reasoning.

Answer: Yes, because we tested whether or not the means were different from each other, to reject the null hypothesis we would require the 95% Confidence Interval not to include zero. We did not reject the null, so we know that the 95% Confidence Interval does include zero.
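The interval itself can be computed from the summary statistics given in the exercise. This is a minimal sketch assuming a 95% confidence level and the paired t procedure; the names `xbar_diff`, `sd_diff`, and `contains_zero` are illustrative.

```r
# 95% confidence interval for the mean (read - write) difference
n <- 200
xbar_diff <- -0.545
sd_diff <- 8.887

se <- sd_diff / sqrt(n)              # standard error of the mean difference
t_star <- qt(0.975, df = n - 1)      # critical value for 95% confidence
ci <- xbar_diff + c(-1, 1) * t_star * se
ci

# TRUE when the interval spans zero
contains_zero <- ci[1] < 0 && ci[2] > 0
contains_zero
```

An interval that spans zero agrees with a two-sided test at the 5% level that fails to reject the null hypothesis.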

5.32 Fuel efficiency of manual and automatic cars, Part I. Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.

knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.32.JPG")

Answer:

mMean <- 19.85
mSD <- 4.51
aMean <- 16.12
aSD <- 3.58
n <- 26

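# Welch-Satterthwaite approximation for the degrees of freedom
degFreedom <- (aSD^2/n + mSD^2/n)^2 / ((aSD^2/n)^2/(n-1) + (mSD^2/n)^2/(n-1))

# standard error of the difference in means
se <- sqrt(aSD^2/n + mSD^2/n)

# test statistic and two-sided p-value (the alternative is a difference in either direction)
tStat <- (aMean - mMean)/se
2 * pt(-abs(tStat), degFreedom)

## [1] 0.00182174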

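Here, the p-value is less than 0.05, so we reject the null hypothesis. The data provide strong evidence of a difference between the average city fuel efficiency of cars with manual and automatic transmissions.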

5.48 Work hours and education. The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.

knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.48.JPG")

Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.

Answer: $H_0: \mu_{l} = \mu_{hs} = \mu_{JrCol} = \mu_{bachelors} = \mu_{grad}$; $H_a$: $\mu_i \neq \mu_j$ for at least one pair of groups $i$ and $j$.

Check conditions and describe any assumptions you must make to proceed with the test.

Answer: 1) Independence: the observations are most likely independent within and between groups. 2) Approximate normality: the data within each group look roughly normal; this is most likely true because the individual points shown on the plots do not look clumped. 3) Constant variance: the variability is roughly equal across groups; this also looks satisfied.

Below is part of the output associated with this test. Fill in the empty cells.

knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.48c.JPG")

Answer:

library(dplyr)

xbar <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
df <- data.frame (xbar, sd, n)
df %>% knitr::kable()

xbar	sd	n
38.67	15.81	121
39.60	14.97	546
41.39	18.10	97
42.55	13.62	253
40.85	15.51	155
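The empty cells of an ANOVA table can be derived from these group summaries. The following is a sketch under stated assumptions: it uses only the rounded means, standard deviations, and group sizes listed above, so the sums of squares may differ slightly from the values printed in the original output, and the row labels (`groups`, `Residuals`, `Total`) and variable names are illustrative.

```r
# Group summaries from the table above (s = group standard deviations)
xbar <- c(38.67, 39.6, 41.39, 42.55, 40.85)
s    <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n    <- c(121, 546, 97, 253, 155)

k <- length(n)                         # number of groups
N <- sum(n)                            # total sample size
grand_mean <- sum(n * xbar) / N

ssg <- sum(n * (xbar - grand_mean)^2)  # between-group sum of squares
sse <- sum((n - 1) * s^2)              # within-group (residual) sum of squares
df_g <- k - 1
df_e <- N - k
msg <- ssg / df_g
mse <- sse / df_e
f_stat <- msg / mse
p_value <- pf(f_stat, df_g, df_e, lower.tail = FALSE)

anova_tab <- data.frame(
  Df     = c(df_g, df_e, N - 1),
  SumSq  = c(ssg, sse, ssg + sse),
  MeanSq = c(msg, mse, NA),
  F      = c(f_stat, NA, NA),
  p      = c(p_value, NA, NA),
  row.names = c("groups", "Residuals", "Total")
)
anova_tab
```

Comparing `p_value` with the chosen significance level then gives the conclusion of the test.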

What is the conclusion of the test?

Answer:

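The summary statistics above give an F statistic of about 2.19 on 4 and 1,167 degrees of freedom, for a p-value of about 0.07. Since this exceeds 0.05, we fail to reject the null hypothesis: the data do not provide convincing evidence that the average number of hours worked differs across the five educational attainment groups.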

sudhan_maharjan

October 28, 2018