Q.5.6
#margin of error
upper = 77
lower = 65
ME = (upper - lower) / 2
ME
## [1] 6
#sample mean
mean = upper - ME
mean
## [1] 71
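#standard deviation
##upper = mean + t*(sd/sqrt(n))
##lower = mean - t*(sd/sqrt(n))
##the sd is estimated from a small sample (n = 25), so use the t critical value with df = n - 1 instead of z
n = 25
t_star = qt(0.95, df = n - 1)
##ME = t*(sd/sqrt(n))
sd = (ME*sqrt(n))/t_star
round(sd, 2)
## [1] 17.53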
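#A minimal sketch, not part of the original solution: it compares the normal and t critical values for a 90% interval with n = 25 and the sample sd each one implies. The names crit_z, crit_t, sd_z and sd_t are illustrative.

```r
# Work backwards from the 90% confidence interval (65, 77) with n = 25
upper <- 77
lower <- 65
n <- 25
ME <- (upper - lower) / 2

crit_z <- qnorm(0.95)            # normal critical value (about 1.645)
crit_t <- qt(0.95, df = n - 1)   # t critical value with 24 df (about 1.711)

sd_z <- ME * sqrt(n) / crit_z    # sd implied if the interval used z
sd_t <- ME * sqrt(n) / crit_t    # sd implied if the interval used t

# The t critical value is larger, so the implied sd is smaller
round(c(z = sd_z, t = sd_t), 2)
```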
Q.5.14
#a)
sd = 250
ME = 25
##ME = z*(sd/sqrt(n))
z = 1.645
##sqrt(n) = (z*sd)/ME
n = ((z*sd)/ME)^2
n
## [1] 270.6025
##a sample size must be a whole number, so round up: n = 271
#b)
##A 99% confidence level needs a larger z than a 90% level, and n grows with z^2, so Luke's sample size should be larger than Raina's.
#c)
z = 2.576
n = ((z*sd)/ME)^2
n
## [1] 663.5776
##round up: n = 664
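#As a sketch of parts a) and c) in one place, a small helper can compute the minimum sample size and round it up. The function name sample_size and its arguments are hypothetical; it uses qnorm() for the critical value instead of the rounded table values 1.645 and 2.576.

```r
# Hypothetical helper: smallest n whose margin of error is at most ME
sample_size <- function(sd, ME, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value for the confidence level
  ceiling((z * sd / ME)^2)         # round up so the margin of error is not exceeded
}

n_raina <- sample_size(sd = 250, ME = 25, conf = 0.90)
n_luke  <- sample_size(sd = 250, ME = 25, conf = 0.99)
c(raina = n_raina, luke = n_luke)
```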
Q.5.20
#a)
#The distribution of differences in scores is fairly symmetric and centered near 0, so there is no clear difference between the average reading and writing scores.
#b)
#No. Each reading score is paired with the writing score of the same student, so the two scores are not independent of each other.
#c)
#H0: mu_score_reading = mu_score_writing
#Ha: mu_score_reading != mu_score_writing
#d)
#The differences in scores should be independent from one student to the next (students randomly selected and fewer than 10% of the population), and the distribution of differences should be nearly normal. With n = 200 (well over 30) and a fairly symmetric distribution of differences, a t-test for paired data is reasonable.
#e)
x_diff = -0.545
null = 0
sd_diff = 8.887
n = 200
SE = sd_diff/sqrt(n)
df = 199
t_score = (x_diff - null)/SE
t_score
## [1] -0.867274
##two-sided p-value; -abs() keeps this correct whatever the sign of t_score
2*pt(-abs(t_score), df=df)
## [1] 0.3868365
#The p-value (0.387) is larger than 0.05, so we fail to reject the null hypothesis. The sample does not provide convincing evidence of a difference between the average reading and writing scores.
#f)
#We may have made a Type 2 error: failing to reject the null hypothesis when there actually is a difference between the average reading and writing scores.
#g)
#Yes. Since we failed to reject the null hypothesis at the 0.05 level, a 95% confidence interval for the mean difference will include 0.
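#The claim in g) can be checked directly. This is a minimal sketch that rebuilds a 95% confidence interval for the mean difference from the summary statistics in e); the names t_star and ci are illustrative.

```r
# Summary statistics for the paired differences, as given in part e)
x_diff <- -0.545
sd_diff <- 8.887
n <- 200

SE <- sd_diff / sqrt(n)              # standard error of the mean difference
t_star <- qt(0.975, df = n - 1)      # critical value for 95% confidence
ci <- x_diff + c(-1, 1) * t_star * SE

round(ci, 3)
# TRUE when the interval contains 0
ci[1] < 0 & ci[2] > 0
```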
Q.5.32
#H0: mu_mean_man = mu_mean_auto
#Ha: mu_mean_man != mu_mean_auto
mean_man = 19.85
mean_auto = 16.12
sd_man = 4.51
sd_auto = 3.58
n = 26
x_diff = (mean_man - mean_auto)
null = 0
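##standard deviations do not subtract; the standard error combines the two sample variances
SE_diff = sqrt((sd_man^2/n)+(sd_auto^2/n))
##conservative df = min(n_man - 1, n_auto - 1), and both groups have n = 26
df = n - 1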
t_score = (x_diff - null)/SE_diff
t_score
## [1] 3.30302
(1 - pt(t_score, df=df)) * 2
## [1] 0.002883615
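#Since the p-value (0.0029) is less than 0.05, we reject the null hypothesis in favor of the alternative hypothesis. The data provide strong evidence that the mean for manual differs from the mean for automatic; in this sample the manual mean is higher.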
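#The test above uses the conservative df = 25. As a sketch for comparison only, the Welch-Satterthwaite approximation gives a larger df and a slightly smaller p-value. The helper name welch_df is hypothetical; the formula is the standard Welch-Satterthwaite one, not something taken from the original solution.

```r
# Summary statistics from the problem (n = 26 in each group)
mean_man <- 19.85; sd_man <- 4.51
mean_auto <- 16.12; sd_auto <- 3.58
n <- 26

# Hypothetical helper: Welch-Satterthwaite degrees of freedom
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

SE_diff <- sqrt(sd_man^2 / n + sd_auto^2 / n)
t_score <- (mean_man - mean_auto) / SE_diff

df_cons <- n - 1                                  # conservative choice used above
df_welch <- welch_df(sd_man, n, sd_auto, n)       # approximate df

p_cons <- 2 * pt(-abs(t_score), df = df_cons)
p_welch <- 2 * pt(-abs(t_score), df = df_welch)
c(df_cons = df_cons, df_welch = df_welch, p_cons = p_cons, p_welch = p_welch)
```

#Either choice of df leads to the same decision at the 0.05 level.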
Q.5.48
#a)
#H0: mu_<hs = mu_hs = mu_jr_col = mu_bac = mu_grad
#Ha: at least one of the means is different
#b)
#The observations should be independent within and across groups, the data within each group should be nearly normal, and the variability across the groups should be about equal.
#c)
k <- 5
n <- 1172
MSG <- 501.54
SSE <- 267382
p <- 0.0682
df_g <- k-1
df_e <- n-k
df_t <- df_g + df_e
df <- c(df_g, df_e, df_t)
MSE <- SSE / df_e
MS <- c(MSG, MSE, NA)
SSG <- df_g * MSG
SST <- SSG + SSE
SS <- c(SSG, SSE, SST)
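#use F_value rather than F, because F is R's built-in shorthand for FALSE
F_value <- MSG / MSE
anova_table <- data.frame(df, SS, MS, c(F_value, NA, NA), c(p, NA, NA))
colnames(anova_table) <- c("df", "Sum_Sq", "Mean_Sq", "F_Value", "Pr(>F)")
rownames(anova_table) <- c("degree", "Residuals", "Total")
anova_table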
##             df    Sum_Sq  Mean_Sq  F_Value Pr(>F)
## degree       4   2006.16 501.5400 2.188992 0.0682
## Residuals 1167 267382.00 229.1191       NA     NA
## Total     1171 269388.16       NA       NA     NA
#d)
#Since the p-value (0.0682) is larger than 0.05, we fail to reject the null hypothesis at the 0.05 significance level. The data do not provide convincing evidence that the average number of hours worked differs across the five groups.
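#The p-value in c) was taken as given. As a rough check, it can be recomputed from the F statistic with pf(); this is a sketch using the same k, n, MSG and SSE, and the names F_value and p_value are illustrative.

```r
# Quantities from part c)
k <- 5
n <- 1172
MSG <- 501.54
SSE <- 267382

MSE <- SSE / (n - k)
F_value <- MSG / MSE

# Upper-tail area of the F distribution with (k - 1, n - k) degrees of freedom
p_value <- pf(F_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
round(c(F_value = F_value, p_value = p_value), 4)
```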