5.6 Working backwards, Part II. A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.
sample <- 25
margin_error <- (77-65)/2
margin_error
## [1] 6
sample_mean <- margin_error + 65
sample_mean
## [1] 71
tVal <- qt(.95, sample-1)
tVal
## [1] 1.710882
# margin of error = t*sd/sqrt(n), so sd = sqrt(n)*margin of error/t
sd <- (sqrt(sample)*margin_error)/tVal
sd
## [1] 17.53481
Therefore, the sample mean is 71, the margin of error is 6, and the sample standard deviation is about 17.53.
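As a quick check, the interval can be rebuilt from the recovered values. This is only a sketch: the variable names are illustrative, and the standard deviation is the rounded value printed above.

```r
# Rebuild the 90% confidence interval from the recovered statistics.
n <- 25
sample_mean <- 71
sample_sd <- 17.53481            # value recovered above (rounded)

t_star <- qt(0.95, df = n - 1)   # 90% CI leaves 5% in each tail
margin <- t_star * sample_sd / sqrt(n)

ci <- c(lower = sample_mean - margin, upper = sample_mean + margin)
round(ci, 2)
```

If the recovered values are right, the rounded interval should match the (65, 77) given in the question.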
5.14 SAT scores. SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students at this college as part of a class project. They want their margin of error to be no more than 25 points.
Answer:
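# Margin of error formula with a z score (the population SD is known): ME = z*sd/sqrt(n)
z <- 1.65  # z* for a 90% confidence interval
sd <-250
merror <-25
((z*sd)/merror)^2
## [1] 272.25
Raina should collect at least 273 observations. The sample size is rounded up so that the margin of error stays at or below 25 points.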
Answer: Luke’s sample should be larger than Raina’s. A 99% confidence interval uses a larger critical value than a 90% interval, so keeping the same 25-point margin of error requires more observations.
Answer:
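# Margin of error formula with a z score (the population SD is known): ME = z*sd/sqrt(n)
z <- 2.575  # z* for a 99% confidence interval
sd <-250
merror <-25
((z*sd)/merror)^2
## [1] 663.0625
Luke should collect at least 664 observations, again rounding up.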
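The rounding step is easy to get wrong, so both calculations can be wrapped in one small function. This is a sketch; `min_sample_size` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: smallest n with z * sigma / sqrt(n) <= max_margin.
min_sample_size <- function(z_star, sigma, max_margin) {
  # Solve z * sigma / sqrt(n) <= max_margin for n, then round up.
  ceiling((z_star * sigma / max_margin)^2)
}

n_raina <- min_sample_size(z_star = 1.65,  sigma = 250, max_margin = 25)
n_luke  <- min_sample_size(z_star = 2.575, sigma = 250, max_margin = 25)
c(raina = n_raina, luke = n_luke)
```

`ceiling()` is used rather than `round()` because rounding down would leave the margin of error slightly above the 25-point limit.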
5.20 High School and Beyond, Part I. The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects. Here we examine a simple random sample of 200 students from this survey. Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.
knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.20.JPG")
Answer: According to the plots, there is no clear difference in the average scores: the histogram of the differences (read - write) is approximately normal and centered near zero.
Answer: No. The reading and writing scores of a given student are paired, so they are not independent of each other. The scores of different students, however, are independent of one another.
Answer: $H_0: \mu_r - \mu_w = 0$, $H_A: \mu_r - \mu_w \neq 0$
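Answer: The conditions required to perform the test are: 1) Independence: each student's two scores are paired and not independent of each other, so we analyze the differences. The differences are independent across students because the 200 students are a simple random sample. 2) Normality: the distribution of the differences looks approximately normal. Both conditions are reasonably met, so a paired t-test is appropriate.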
Answer:
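sd <- 8.887
xbar <- -0.545
n <- 200
# the test statistic divides by the standard error sd/sqrt(n), not by sd
tVal <- xbar/(sd/sqrt(n))
# two-sided p-value, because the alternative is "not equal"
p <- 2*pt(-abs(tVal), n-1)
round(p, 2)
## [1] 0.39
The p-value is about 0.39, which is greater than 0.05, so we fail to reject the null hypothesis. The data do not provide convincing evidence of a difference between the average scores on the two exams.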
Answer:
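There are two types of error:
Type 1 error: we reject the null hypothesis when it is actually true, that is, when the means do not differ.
Type 2 error: we fail to reject the null hypothesis when there actually is a difference between the means.
Here, because we did not reject the null hypothesis, we may have made a Type 2 error: there may be a real difference between the average reading and writing scores that the test failed to detect.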
Answer: Yes. We ran a two-sided test of whether the means differ, and at the 5% significance level that test rejects the null hypothesis only when the 95% confidence interval for the mean difference excludes zero. We did not reject the null hypothesis, so we expect the 95% confidence interval to include zero.
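The claim about the confidence interval can be checked directly from the summary statistics used in the test. This is a sketch with illustrative variable names; it assumes the mean difference of -0.545 and the standard deviation of 8.887 from the answer above.

```r
# 95% confidence interval for the mean paired difference (read - write).
n <- 200
xbar_diff <- -0.545
sd_diff <- 8.887

se <- sd_diff / sqrt(n)             # standard error of the mean difference
t_star <- qt(0.975, df = n - 1)     # critical value for a 95% interval
ci <- xbar_diff + c(-1, 1) * t_star * se
round(ci, 2)

# TRUE if zero lies inside the interval
includes_zero <- ci[1] < 0 && ci[2] > 0
includes_zero
```

An interval that contains zero agrees with the decision not to reject the null hypothesis.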
5.32 Fuel efficiency of manual and automatic cars, Part I. Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.
knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.32.JPG")
Answer:
mMean <- 19.85
mSD <- 4.51
aMean <- 16.12
aSD <- 3.58
n <- 26
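# degrees of freedom from the Welch-Satterthwaite approximation
degFreedom <- (aSD**2/n + mSD**2/n)**2 / ((aSD**2/n)**2/(n-1) + (mSD**2/n)**2/(n-1))
# standard error and t statistic for the difference in means
se <- sqrt(aSD**2/n + mSD**2/n)
t <- (aMean - mMean)/se
# two-sided p-value, because the question asks about a difference in either direction
2*pt(-abs(t), degFreedom)
## [1] 0.00182174
Here the p-value is less than 0.05, so we reject the null hypothesis. The data provide strong evidence of a difference between the average city mileage of cars with manual and automatic transmissions.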
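The same calculation can be packaged so that the t statistic, degrees of freedom, p-value, and a confidence interval come out together. This is a sketch: `welch_from_summary` is a hypothetical helper written for this exercise, not a function from base R or any package.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics only.
welch_from_summary <- function(mean1, sd1, n1, mean2, sd2, n2, conf = 0.95) {
  v1 <- sd1^2 / n1
  v2 <- sd2^2 / n2
  se <- sqrt(v1 + v2)
  t_stat <- (mean1 - mean2) / se
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df)          # two-sided
  t_star <- qt(1 - (1 - conf) / 2, df)
  ci <- (mean1 - mean2) + c(-1, 1) * t_star * se
  list(t = t_stat, df = df, p_value = p_value, ci = ci)
}

# automatic minus manual, using the summary statistics from the exercise
res <- welch_from_summary(16.12, 3.58, 26, 19.85, 4.51, 26)
res
```

The confidence interval for the difference (automatic minus manual) lies entirely below zero, which agrees with rejecting the null hypothesis.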
5.48 Work hours and education. The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.
knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.48.JPG")
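Answer: $H_0: \mu_{l} = \mu_{hs} = \mu_{JrCol} = \mu_{bachelors} = \mu_{grad}$, $H_A$: at least one pair of means differs ($\mu_i \neq \mu_j$ for some $i \neq j$).
(b) Check conditions and describe any assumptions you must make to proceed with the test.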
Answer: 1) Independence: the observations are independent within and between groups. This is most likely true. 2) Normality: the data within each group are approximately normal. This is also most likely true; the plots show some individual outlying points, but they do not look clumped. 3) Equal variance: the variability is roughly equal across groups. This also looks satisfied, since the group standard deviations range only from 13.62 to 18.1.
knitr::include_graphics("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/5.48c.JPG")
Answer:
library(dplyr)
xbar <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
df <- data.frame (xbar, sd, n)
df %>% knitr::kable()
| xbar | sd | n |
|---|---|---|
| 38.67 | 15.81 | 121 |
| 39.60 | 14.97 | 546 |
| 41.39 | 18.10 | 97 |
| 42.55 | 13.62 | 253 |
| 40.85 | 15.51 | 155 |
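The ANOVA table can be filled in from these summary statistics alone. This is a sketch: the variable names are illustrative, and because the group means and standard deviations are rounded, the results may differ slightly from the values in the figure.

```r
# One-way ANOVA table built from group means, SDs, and sizes.
xbar <- c(38.67, 39.6, 41.39, 42.55, 40.85)
s    <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n    <- c(121, 546, 97, 253, 155)

k <- length(n)                       # number of groups
N <- sum(n)                          # total sample size
grand_mean <- sum(n * xbar) / N

ssg <- sum(n * (xbar - grand_mean)^2)   # between-group sum of squares
sse <- sum((n - 1) * s^2)               # within-group (residual) sum of squares
df_g <- k - 1
df_e <- N - k
msg <- ssg / df_g
mse <- sse / df_e
f_stat <- msg / mse
p_value <- pf(f_stat, df_g, df_e, lower.tail = FALSE)

anova_table <- data.frame(
  Df      = c(df_g, df_e, N - 1),
  Sum_Sq  = c(ssg, sse, ssg + sse),
  Mean_Sq = c(msg, mse, NA),
  F_value = c(f_stat, NA, NA),
  p_value = c(p_value, NA, NA),
  row.names = c("degree", "Residuals", "Total")
)
anova_table
```

The degrees of freedom are 4 for the groups and 1,167 for the residuals, and the F statistic is the ratio of the two mean squares.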
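Answer:
The F statistic computed from the summary statistics above is about 2.19 on 4 and 1,167 degrees of freedom, which gives a p-value between 0.05 and 0.10. Since the p-value is greater than 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence that the average hours worked differ across educational attainment levels.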