Q.5.6

#margin of error
upper = 77
lower = 65
ME = (upper - lower) / 2
ME
## [1] 6
#sample mean
mean = upper - ME
mean
## [1] 71
#standard deviation
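##upper = mean + t*(sd/sqrt(n))
##lower = mean - t*(sd/sqrt(n))

##n = 25 is small and the population sd is unknown, so use the t critical value (df = n - 1) for the 90% interval rather than z = 1.645
n = 25
t_star = qt(0.95, df = n - 1)

##ME = t*(sd/sqrt(n))

sd = (ME*sqrt(n))/t_star
sd
## [1] 17.53481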
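#A quick round-trip check, as a sketch: rebuild the interval from the recovered mean and sd and confirm it returns the original bounds (65, 77). The helper name rebuild_ci and its argument crit are made up for this illustration. crit is whatever critical value was used to solve for sd; here it is assumed to be the t value for a 90% interval with n - 1 degrees of freedom.

```r
# Hypothetical helper (not part of the original solution): rebuild a
# confidence interval from a center, sd, sample size, and critical value.
rebuild_ci <- function(center, s, n, crit) {
  half_width <- crit * s / sqrt(n)
  c(lower = center - half_width, upper = center + half_width)
}

lower_given <- 65
upper_given <- 77
n_obs <- 25
crit <- qt(0.95, df = n_obs - 1)  # assumed: 90% level, t with n - 1 df

half <- (upper_given - lower_given) / 2    # margin of error
center <- (upper_given + lower_given) / 2  # sample mean
s <- half * sqrt(n_obs) / crit             # solve ME = crit * s / sqrt(n) for s

# Plugging the recovered values back in should reproduce the given bounds.
ci_check <- rebuild_ci(center, s, n_obs, crit)
ci_check
```

#The check returns the given bounds for any critical value, because s was solved from the same equation. It only confirms the algebra, not the choice of multiplier.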

Q.5.14

#a)
sd = 250
ME = 25
##ME = z*(sd/sqrt(n))
z = 1.645

##sqrt(n) = (z*sd)/ME
n = ((z*sd)/ME)^2
n
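## [1] 270.6025
##n must be a whole number, so round up: Raina needs a sample of at least 271.
#b)
##A higher confidence level means a larger z, and n grows with z^2, so Luke's sample size should be larger than Raina's.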

#c)
z = 2.576
n = ((z*sd)/ME)^2
n
## [1] 663.5776
##Rounding up, Luke needs a sample of at least 664.
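#The sample-size formula used in a) and c) can be wrapped in a small function, sketched here. The name required_n and its arguments are hypothetical. It takes the critical value from qnorm() instead of a rounded table value and rounds up with ceiling(), since a fractional sample size is not possible.

```r
# Hypothetical helper: smallest whole n whose margin of error is at most
# `margin`, for a known sd `sigma` and a two-sided confidence level.
required_n <- function(sigma, margin, conf_level) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  ceiling((z_star * sigma / margin)^2)       # solve ME = z*sigma/sqrt(n) for n
}

n_90 <- required_n(sigma = 250, margin = 25, conf_level = 0.90)
n_99 <- required_n(sigma = 250, margin = 25, conf_level = 0.99)
c(n_90, n_99)
## [1] 271 664
```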

Q.5.20

#a)
#The distribution of differences in scores is fairly symmetric and centered near 0, so there is no clear difference between the average reading and writing scores.

#b)
#No. Each student contributes both a reading score and a writing score, so the two scores are paired and unlikely to be independent of each other. The students themselves can be treated as independent of one another if they were randomly sampled and make up less than 10% of the population.

#c)
#H0: mu_diff = 0 (mu_score_reading = mu_score_writing)
#Ha: mu_diff != 0 (mu_score_reading != mu_score_writing)

#d)
#The differences in scores should be independent of one another (students randomly selected and fewer than 10% of the population), and the distribution of differences should be nearly normal or, with a sample size larger than 30, at least free of strong skew.

#e)
x_diff = -0.545
null = 0
sd_diff = 8.887
n = 200
SE = sd_diff/sqrt(n)
df = 199

t_score = (x_diff - null)/SE
t_score
## [1] -0.867274
##two-sided p-value; -abs() keeps this correct whatever the sign of t_score
2*pt(-abs(t_score), df=df)
## [1] 0.3868365
#The p-value is larger than 0.05, so we fail to reject the null hypothesis. The sample does not provide convincing evidence of a difference between the average reading score and the average writing score.

#f)
#We may have made a Type 2 error: failing to reject the null hypothesis when a difference actually exists.

#g)
#Yes. A 95% confidence interval for the mean difference will include 0, because we failed to reject the null hypothesis at the 0.05 significance level.
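#The claim in g) can be checked directly. This is a sketch that assumes a 95% confidence level, matching the 0.05 significance level used in e). The variable names are new and chosen so they do not overwrite the values defined above.

```r
# Summary statistics from part e)
d_bar <- -0.545   # mean of the paired differences
s_d   <- 8.887    # sd of the paired differences
n_d   <- 200      # number of students

se_d     <- s_d / sqrt(n_d)
t_star_d <- qt(0.975, df = n_d - 1)  # assumed: 95% two-sided interval

# Confidence interval for the mean difference
ci_diff <- d_bar + c(-1, 1) * t_star_d * se_d
ci_diff

# TRUE when the interval contains 0
includes_zero <- ci_diff[1] < 0 && ci_diff[2] > 0
includes_zero
```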

Q.5.32

#H0: mu_mean_man = mu_mean_auto
#Ha: mu_mean_man != mu_mean_auto

mean_man = 19.85
mean_auto = 16.12
sd_man = 4.51
sd_auto = 3.58
n = 26

x_diff = (mean_man - mean_auto)
null = 0
##the sd of the difference is not sd_man - sd_auto; the standard error below combines the two variances instead

SE_diff = sqrt((sd_man^2/n)+(sd_auto^2/n))
##conservative df: min(n1 - 1, n2 - 1), and both groups have n = 26
df = n - 1

t_score = (x_diff - null)/SE_diff
t_score
## [1] 3.30302
(1 - pt(t_score, df=df)) * 2
## [1] 0.002883615
#Since the p-value is less than 0.05, we reject the null hypothesis in favor of the alternative hypothesis. The data provide convincing evidence that the manual and automatic group means differ.
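#The same test can be written as a reusable function, sketched here. The name two_sample_t and its arguments are hypothetical. It uses the conservative df = min(n1 - 1, n2 - 1) as above, and additionally reports the Welch-Satterthwaite df, which is an alternative approximation and not the method used in this solution.

```r
# Hypothetical helper: two-sample t test from summary statistics.
two_sample_t <- function(m1, m2, s1, s2, n1, n2, null_value = 0) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)                       # SE of the difference in means
  t_stat <- (m1 - m2 - null_value) / se
  df_cons <- min(n1 - 1, n2 - 1)            # conservative df
  # Welch-Satterthwaite df, shown only for comparison
  df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df = df_cons)  # two-sided p-value
  list(t = t_stat, df = df_cons, df_welch = df_welch, p = p_value)
}

res <- two_sample_t(m1 = 19.85, m2 = 16.12, s1 = 4.51, s2 = 3.58,
                    n1 = 26, n2 = 26)
res
```

#The Welch df is larger than the conservative df here, so the conservative choice gives a somewhat larger p-value. The conclusion at the 0.05 level is the same either way.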

Q.5.48

#a)
#H0: mu_<hs = mu_hs = mu_jr_col = mu_bac = mu_grad
#Ha: at least one of the means is different

#b)
#The observations should be independent within and across groups, the data within each group should be nearly normal, and the variability across the groups should be about equal.

#c)
k <- 5
n <- 1172
MSG <- 501.54
SSE <- 267382
p <- 0.0682


df_g <- k-1
df_e <- n-k
df_t <- df_g + df_e
df <- c(df_g, df_e, df_t)


MSE <- SSE / df_e
MS <- c(MSG, MSE, NA)


SSG <- df_g * MSG
SST <- SSG + SSE
SS <- c(SSG, SSE, SST)

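# F statistic; named F_stat because F is also R's shorthand for FALSE
F_stat <- MSG / MSE


# Assemble the ANOVA table (p is only defined for the group row)
anova_table <- data.frame(df, SS, MS, c(F_stat, NA, NA), c(p, NA, NA))
colnames(anova_table) <- c("df", "Sum_Sq", "Mean_Sq", "F_Value", "Pr(>F)")
rownames(anova_table) <- c("degree", "Residuals", "Total")

anova_table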
##             df    Sum_Sq  Mean_Sq  F_Value Pr(>F)
## degree       4   2006.16 501.5400 2.188992 0.0682
## Residuals 1167 267382.00 229.1191       NA     NA
## Total     1171 269388.16       NA       NA     NA
#d)
#Since the p-value (0.0682) is larger than 0.05, we fail to reject the null hypothesis at the 0.05 significance level. The data do not provide convincing evidence that the average number of hours worked differs across the five groups.
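#As a sanity check on the table, the given p-value can be compared with the upper tail of the F distribution. This is a sketch using the values from c); the variable names are new. Small differences from 0.0682 are expected because MSG and SSE are rounded inputs.

```r
k_groups <- 5
n_total  <- 1172
msg      <- 501.54
sse      <- 267382

df_group <- k_groups - 1
df_error <- n_total - k_groups
f_value  <- msg / (sse / df_error)  # F = MSG / MSE

# Upper-tail probability of the F distribution with (df_group, df_error) df
p_check <- pf(f_value, df1 = df_group, df2 = df_error, lower.tail = FALSE)
p_check
```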