Questions

5.6

n <- 25

# conf interval
ci <- c(65, 77)

# sample mean
smean <- (ci[2]+ci[1]) %>%
  divide_by(2)

# margin of error
me <- (ci[2]-ci[1]) %>%
  divide_by(2)

# standard deviation
p <- .9
p2 <- p + (1-p)/2
t <- qt(p2, n-1)

se <- me/t
sd <- se * sqrt(n)

df <- data_frame(sample_mean = smean, margin_of_error = me, std_dev = sd)

knitr::kable(df, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

sample_mean	margin_of_error	std_dev
71	6	17.53481

5.14

# me = z * se
# se = sd/sqrt(n)
# n = (z*sd/me) ^ 2

z <- 1.65
me <- 25
sd <- 250

n <- (((z * sd)/me)^2) %>%
  ceiling %>%
  print

## [1] 273

b.) Luke’s sample size should be larger because a higher Z value from the higher confidence interval, making the numerator a lot larger
c.)

# me = z * se
# se = sd/sqrt(n)
# n = (z*sd/me) ^ 2

z <- 2.575
me <- 25
sd <- 250

n <- (((z * sd)/me)^2) %>%
  ceiling %>%
  print

## [1] 664

5.20

a.) There does not seem to be a clear different in the average reading and writing scores. The difference distribution is approximately normal with a center around 0.
b.) On a student by student basis, the scores are independent, but within the same student, the scores do not appear to be independent.
c.) H0: Mr - Mw = 0; HA: Mr - Mw != 0
d.) The difference histogram gives the appearance of dependence, while the box plots show a reasonably normal distribution with no great outliers.
e.) The p-value is not less than .05 so there is not enough evidence to reject the NULL.

avg_diff <- -.545
sd_diff <- 8.887
n <- 200

se <- sd_diff/sqrt(n)

# t value
t <- (avg_diff-0)/se

# p value
p <- pt(t, n-1) %>%
  print

## [1] 0.1934182

f.) In this case, since we failed to reject the NULL, we may have incorrectly rejected the alternative hypothesis which would be a Type II error.
g.) I would expect the confidence interval for the average difference to include zero because we failed to reject the NULL which indicates that zero is within the normal realm of possibility.

5.32

H0: M_diff = 0; HA: M_diff <> 0
The p-value is less than .05 and we reject the null hypothesis. This would mean there is enough evidence to conclude that there is a difference in fuel efficiency.

n <- 26

# automatics
m_auto <- 16.12
sd_auto <- 3.58

# manuals
m_manu <- 19.85
sd_manu <- 4.51

# diff
m_diff <- m_auto - m_manu

# sd diff
se <- sqrt((sd_auto^2/n) + (sd_manu^2/n))

# t value
t <- (m_diff - 0)/se
p <- pt(t, n-1) %>%
  print

## [1] 0.001441807

5.48

a.) H0: All averages are equal; HA: At least one average is not equal
b.) The observations are independent within and across groups. There are numerous outliers within each group. The variability appears to be similar across groups based on the table of deviations.
c.)

mu <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
data <- data_frame(mu, sd, n)

n <- sum(data$n)
k <- length(data$mu)

# degrees of freedom
df <- k - 1
dfr <- n - k

# f-stat
prf <- .0682
f_stat <- qf(1 - prf, df, dfr)

# msd
msd <- 501.54
msr <- msd / f_stat

ssd <- df * msd
ssr <- 267382

# sst = ssd + ssr
sst <- ssd + ssr
dft <- df + dfr

final_df <- data_frame(df = c(df, dfr, dft), sum_sq = c(ssd, ssr, sst), mean_sq = c(msd, msr, NA), f_val = c(f_stat, NA, NA), prf = c(prf, NA, NA))

knitr::kable(final_df, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

df	sum_sq	mean_sq	f_val	prf
4	2006.16	501.5400	2.188931	0.0682
1167	267382.00	229.1255	NA	NA
1171	269388.16	NA	NA	NA

d.) Since the p-value is greater than .05, there is not enough evidence to reject the NULL.

Data 606 - HW5

Questions

5.6

5.14

5.20

5.32

5.48