DATA606_Assignment5

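# The sample mean is the midpoint of the confidence interval (65, 77)
sample.mean <- (65 + 77)/2

# CI = mean +/- (Margin of Error) --> (Margin of Error) = (upper bound) - mean
margin.error <- 77 - sample.mean

# (Margin of Error) = T * (standard deviation)/sqrt(n observations)
# 2.06 is roughly the critical t value for df = 24 at 95% confidence, i.e. qt(0.975, 24);
# a different confidence level would need a different critical value
st.dev <- (margin.error/2.06) * sqrt(25)

paste("Sample Mean: ", sample.mean)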

## [1] "Sample Mean:  71"

paste("Margin of Error: ", margin.error)

## [1] "Margin of Error:  6"

paste("Sample Standard Deviation: ", round(st.dev,2))

## [1] "Sample Standard Deviation:  14.56"
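As a cross-check, the hard-coded 2.06 can be replaced by a critical value from `qt()`. The sketch below assumes a 95% confidence level, which is what 2.06 corresponds to at 24 degrees of freedom. The names `n`, `t.star`, `s`, and `ci` are illustrative and not part of the original solution.

```r
# Assumption: 95% confidence level, n = 25 observations
n <- 25
t.star <- qt(0.975, df = n - 1)   # critical t value, close to 2.06

# Back out the sample standard deviation from the margin of error of 6
s <- (6 / t.star) * sqrt(n)

# Rebuild the interval from its pieces; this should recover (65, 77)
ci <- 71 + c(-1, 1) * t.star * s / sqrt(n)
print(round(ci, 2))
```

Because `qt()` returns a slightly more precise critical value than 2.06, the standard deviation computed this way differs from the one above in the second decimal place.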

Ans a:

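# z value with 5% in the upper tail (a two-sided 90% interval)
z <- qnorm(.95, mean = 0, sd = 1)
sat.sd <- 250
margin.error2 <- 25
# Solve ME = z * sd/sqrt(n) for n, then round up to the next whole student
sat.size <- ceiling((sat.sd/(margin.error2/z))^2)
paste("Raina should sample at least", sat.size, "students.")

## [1] "Raina should sample at least 271 students."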

Ans b: Luke’s sample will need to be bigger, because a higher confidence level requires a larger critical value z for the same margin of error.

Ans c:

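# z value with 0.5% in the upper tail (a two-sided 99% interval)
z1 <- qnorm(.995, mean = 0, sd = 1)
sat.sd <- 250
margin.error3 <- 25
# Solve ME = z * sd/sqrt(n) for n, then round up to the next whole student
sat.size1 <- ceiling((sat.sd/(margin.error3/z1))^2)
paste("Luke should sample at least", sat.size1, "students.")

## [1] "Luke should sample at least 664 students."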
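Both sample size calculations follow the same formula, so they can be wrapped in one function. This is a minimal sketch; `sample.size` and its arguments are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: smallest n whose margin of error is at most `me`
# for a two-sided interval at confidence level `conf.level`
sample.size <- function(sd, me, conf.level) {
  z <- qnorm(1 - (1 - conf.level)/2)   # two-sided critical value
  ceiling((z * sd / me)^2)             # round up, since n must be a whole number
}

raina.n <- sample.size(250, 25, 0.90)
luke.n  <- sample.size(250, 25, 0.99)
print(c(Raina = raina.n, Luke = luke.n))
```

Using `ceiling()` rather than `round()` guarantees the margin of error is not exceeded, whichever way the fractional part falls.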

Ans a: There is no clear difference in the average reading and writing scores on visual inspection.

Ans b: Given that this is a sample of 200 students drawn from a large database from the National Center of Education Statistics, it is presumed that the students are independent of each other, and the sample is likely less than 10% of all students in the survey.

Ans c:

H0 (Null hypothesis): The difference in the average scores of students in the reading and writing exam == 0
HA (Alternative hypothesis): The difference in the average scores of students in the reading and writing exam != 0

Ans d: Because these students were sampled from a survey and represent less than 10% of the entire survey, the observations are likely independent. Given that this is a large database, it is presumed that the scores have a normal (or nearly normal) distribution. Even if there is some skew, it can be tolerated because there are 200 students in the sample. The box plots show little skew as well.

Ans e:

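# Standard error: standard deviation / sqrt(n)
st.er <- 8.887/sqrt(200)
# Test statistic: (observed mean difference - null value) / standard error
Tscore <- (0.545 - 0)/st.er
paste("T score:", round(Tscore,3))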

## [1] "T score: 0.867"
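The T score can be turned into a p-value with `pt()`. The sketch below assumes a two-sided test with n - 1 = 199 degrees of freedom; `t.score` and `p.value` are illustrative names.

```r
# Assumption: two-sided test, df = n - 1 = 199
st.er <- 8.887 / sqrt(200)
t.score <- (0.545 - 0) / st.er

# Two-sided p-value: probability of a T score at least this extreme under H0
p.value <- 2 * pt(-abs(t.score), df = 200 - 1)
print(round(p.value, 3))
```

A p-value well above 0.05 is consistent with failing to reject H0.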

Ans f: Because we failed to reject H0, this may potentially be a Type II error. This means there may be a real difference that our study did not detect, for example because the sample was too small to give the test enough power.

Ans g: Yes, a confidence interval for the average difference would be expected to include 0, since we failed to reject H0, which had a null value of 0.
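The link between the test result and a confidence interval can be checked directly. This sketch assumes a 95% confidence level and 199 degrees of freedom; `diff.mean`, `t.star`, and `ci.diff` are illustrative names.

```r
# Assumption: 95% confidence level, df = n - 1 = 199
diff.mean <- 0.545
st.er <- 8.887 / sqrt(200)
t.star <- qt(0.975, df = 199)

# Interval for the mean difference: point estimate +/- t* x SE
ci.diff <- diff.mean + c(-1, 1) * t.star * st.er
print(round(ci.diff, 2))

# TRUE when the interval contains the null value of 0
print(ci.diff[1] < 0 & ci.diff[2] > 0)
```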

manual.mean <- 19.85
manual.sd <- 4.51
manual.n <- 26
automatic.mean <- 16.12
automatic.sd <- 3.58
automatic.n <- 26
mean.diff.car <- manual.mean - automatic.mean

# Find the standard error of the difference in means
car.se <- sqrt((automatic.sd^2/automatic.n) + (manual.sd^2/manual.n))

# Compute the T score
car.T <- (mean.diff.car - 0)/car.se
paste("T score:", round(car.T,2))

## [1] "T score: 3.3"
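To interpret the T score, a p-value can be computed with `pt()`. The sketch below assumes a two-sided test and the conservative choice df = min(n1, n2) - 1 = 25; `car.df` and `car.p` are names introduced here.

```r
# Assumption: two-sided test, conservative df = min(n1, n2) - 1
car.se <- sqrt(3.58^2/26 + 4.51^2/26)
car.T <- (19.85 - 16.12) / car.se
car.df <- min(26, 26) - 1

# Two-sided p-value for the observed T score
car.p <- 2 * pt(-abs(car.T), df = car.df)
print(car.p < 0.05)
```

Under these assumptions the p-value falls below 0.05, which would point to a difference between the manual and automatic means.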

Ans a:

H0 (Null hypothesis): The average hours worked is equal across all 5 groups. In other words, the differences in mean hours worked == 0
HA (Alternative hypothesis): The mean hours worked of at least one group differs from the others, so the differences in mean hours worked among the 5 groups are not all 0

Ans b: It is reasonable to assume that the 1172 respondents did not know each other, making them independent. Also, given that these are 1172 respondents for the entire country, it is safe to assume that the sample is < 10% of the population. The standard deviations look fairly similar to each other, so the groups have similar variance. We also need to assume that the distribution within each group is nearly normal. However, even if the distribution were skewed, the ANOVA can still be performed given the large sample size of 1172 respondents.

Ans c:

#             | Df    | Sum Sq  | Mean Sq | F value | Pr(>F) 
#_____________________________________________________________________________________
# degree      | 4     | 2004.1  | 501.03  | 2.19    | 0.0682
# Residuals   | 1167  | 267382  | 229.12  |
# Total       | 1171  | 269386.1
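The remaining entries of the table follow from the degrees of freedom and sums of squares. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```r
# Values taken from the table: Df and Sum Sq for degree and residuals
df.group <- 4
df.resid <- 1167
ss.group <- 2004.1
ss.resid <- 267382

# Mean Sq = Sum Sq / Df; F = Mean Sq (degree) / Mean Sq (residuals)
ms.group <- ss.group / df.group
ms.resid <- ss.resid / df.resid
f.value <- ms.group / ms.resid

# Upper-tail probability of the F distribution gives Pr(>F)
p.value <- pf(f.value, df.group, df.resid, lower.tail = FALSE)
print(round(c(MeanSq.degree = ms.group, MeanSq.resid = ms.resid,
              F = f.value, p = p.value), 4))
```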

Ans d: Our p value here is 0.0682, which is greater than 0.05, so we fail to reject the null hypothesis.

Niteen Kumar

March 25, 2018