HW 5

5.6

x1 <- 65 
x2 <- 77
n <- 25
SampleMean <- (x2+x1)/2
SampleMean
## [1] 71
MarginError <- (x2-x1)/2
MarginError
## [1] 6
df <- n-1
t <- qt(.95, df)
t
## [1] 1.710882
sd <- (MarginError/t)*sqrt(n) # ME = t*sd/sqrt(n), solved for sd
sd
## [1] 17.53481

\[\text{Sample Mean} = \frac{x_2 + x_1}{2} = 71\]
\[\text{Margin of Error} = \frac{x_2 - x_1}{2} = 6\]
\[SD = \frac{ME}{t} \times \sqrt{n} = 17.53\]
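The back-calculation above can also be wrapped in a small helper. This is only a sketch: the function name `ci_to_sd` and its arguments are hypothetical and not part of the original solution, and it assumes the interval is a t-based interval for a single mean.

```r
# Hypothetical helper: recover the sample mean, margin of error, and sample SD
# from a t-based confidence interval (lower, upper) with sample size n.
ci_to_sd <- function(lower, upper, n, conf = 0.90) {
  xbar   <- (upper + lower) / 2                  # midpoint of the interval
  me     <- (upper - lower) / 2                  # half-width of the interval
  t_star <- qt(1 - (1 - conf) / 2, df = n - 1)   # critical t value
  s      <- me * sqrt(n) / t_star                # solve ME = t* s / sqrt(n) for s
  list(mean = xbar, margin = me, sd = s)
}

res <- ci_to_sd(65, 77, n = 25)
res$sd
```

Writing `sqrt(n)` rather than a literal 5 keeps the helper usable for other sample sizes.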

5.14

a)

z <- 1.65 #90% CI
ME <- 25
SD <- 250

Raina <- ceiling(((z*SD)/ME)^2) # round up so the margin of error is at most 25
Raina
## [1] 273

b)

Luke’s sample size would need to be larger. A 99% confidence interval uses a larger z score than a 90% interval, so with the same SD and margin of error the required n increases.

c)

z <- 2.58 #99% CI
ME <- 25
SD <- 250

Luke <- ceiling(((z*SD)/ME)^2) # round up so the margin of error is at most 25
Luke
## [1] 666
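Parts a) and c) use the same formula with different z scores, so they can share one function. The sketch below is an assumption-laden illustration: `sample_size` is a hypothetical name, and it takes the critical value from `qnorm()` instead of the rounded table values 1.65 and 2.58, so its results are slightly smaller than the ones above.

```r
# Hypothetical helper: smallest n such that z* * sd / sqrt(n) <= me
sample_size <- function(me, sd, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  ceiling((z_star * sd / me)^2)         # always round up to a whole person
}

raina_n <- sample_size(me = 25, sd = 250, conf = 0.90)
luke_n  <- sample_size(me = 25, sd = 250, conf = 0.99)
c(Raina = raina_n, Luke = luke_n)
```

Whichever critical values are used, the 99% interval requires the larger sample, which agrees with part b).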

5.20

a)

There does not seem to be a clear difference in the average reading and writing scores.

b)

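The reading and writing scores of each student are not independent of each other: both scores come from the same student, so the data are paired. Scores of different students are independent.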

c)

\[{ H }_{ 0 }:\quad { \mu }_{ reading }-{ \mu }_{ writing }=\quad 0\]
\[{ H }_{ A }:\quad { \mu }_{ reading }-{ \mu }_{ writing }\neq \quad 0\]

d)

The observations (students) are independent of one another, and the distribution of the differences is approximately normal with no clear skew.

e)

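mu <- -.545 # observed mean difference (reading - writing)
SD <- 8.887 # standard deviation of the differences
n <- 200
df <- n-1 # must be computed after n is set to 200

SE <- SD/sqrt(n)

t <- (mu-0)/SE

p <- 2*pt(-abs(t), df) # two-sided p-value, matching the alternative in part c)
round(p, 2)
## [1] 0.39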

The p-value is greater than 0.05, so we fail to reject the null hypothesis. There is no convincing evidence of a difference between the average reading and writing exam scores.

f)

We may have made a Type II error: failing to reject the null hypothesis when there actually is a difference in the average reading and writing scores.

g)

Because we failed to reject the null hypothesis of no difference, we would expect the confidence interval for the average difference to include 0.
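The claim in part g) can be checked from the same summary statistics used in part e) (mean difference -0.545, SD of differences 8.887, n = 200). The following is a sketch only: `paired_t_summary` is a hypothetical name, and the 95% level is an assumption chosen to match the 0.05 significance level.

```r
# Hypothetical helper: paired t-test and confidence interval
# computed from summary statistics of the differences.
paired_t_summary <- function(dbar, s_d, n, conf = 0.95) {
  se     <- s_d / sqrt(n)                 # standard error of the mean difference
  df     <- n - 1
  t_stat <- dbar / se                     # test statistic under H0: mean diff = 0
  p      <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
  t_star <- qt(1 - (1 - conf) / 2, df)    # critical value for the interval
  ci     <- dbar + c(-1, 1) * t_star * se # lower and upper bounds
  list(t = t_stat, df = df, p = p, ci = ci)
}

out <- paired_t_summary(-0.545, 8.887, 200)
out$ci
```

The lower bound is negative and the upper bound is positive, so the interval contains 0.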

5.32

\[{ H }_{ 0 }:\quad { \mu }_{ Auto }-{ \mu }_{ manual }\quad =\quad 0\]

\[{ H }_{ A }:\quad { \mu }_{ Auto }-{ \mu }_{ manual }\quad \neq \quad 0\]

n <- 26

SDauto <- 3.58
SDmanual <- 4.51

mdiff <- 16.12 - 19.85



SEauto <- SDauto/sqrt(n) 
SEmanual <- SDmanual/sqrt(n)

SE <- sqrt(((SEauto)^2)+(SEmanual)^2)
T <- (mdiff-0)/SE
p <- pt(T, n-1)
p <- 2*p 
p
## [1] 0.002883615

The p-value is less than 0.05, so we reject the null hypothesis. There is convincing evidence of a difference in the average city MPG of automatic and manual vehicles.
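The steps of this test can be collected into one function. This is a sketch under the same choices as the solution above: `two_sample_t` is a hypothetical name, and the degrees of freedom are taken as the smaller sample size minus 1, which is a conservative choice and not the Welch approximation.

```r
# Hypothetical helper: two-sample t-test from summary statistics,
# using df = min(n1, n2) - 1.
two_sample_t <- function(m1, s1, n1, m2, s2, n2) {
  se     <- sqrt(s1^2 / n1 + s2^2 / n2)  # SE of the difference in means
  df     <- min(n1, n2) - 1
  t_stat <- (m1 - m2) / se
  p      <- 2 * pt(-abs(t_stat), df)     # two-sided p-value
  list(t = t_stat, df = df, p = p)
}

mpg <- two_sample_t(16.12, 3.58, 26, 19.85, 4.51, 26)
mpg$p
```

Using `-abs(t_stat)` makes the two-sided p-value correct whichever group is listed first.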

5.48

a)

\[{ H }_{ 0 }:\quad { \mu }_{ lessHS }={ \mu }_{ HS }={ \mu }_{ jrcol }={ \mu }_{ Bach }={ \mu }_{ Grad }\]
\[{ H }_{ A }:\quad At\quad least\quad one\quad mean\quad is\quad not\quad equal\]

b)

We will assume that the observations are independent within and across the groups. For the distribution, there are some outliers in each box plot and some skew in the bachelor’s plot. For variability, the standard deviations are similar across the groups, so we assume the variance is roughly constant.

c)

library(knitr)

# Name the columns with `=`; using `<-` inside data.frame() would also
# overwrite the global variables mu, sd and n.
work_hours <- data.frame(
  mean = c(38.67, 39.6, 41.39, 42.55, 40.85),
  sd = c(15.81, 14.97, 18.1, 13.62, 15.51),
  n = c(121, 546, 97, 253, 155)
)

knitr::kable(work_hours)
|  mean |    sd |   n |
|------:|------:|----:|
| 38.67 | 15.81 | 121 |
| 39.60 | 14.97 | 546 |
| 41.39 | 18.10 |  97 |
| 42.55 | 13.62 | 253 |
| 40.85 | 15.51 | 155 |

n <- sum(work_hours$n)

k <- length(work_hours$sd)
k
## [1] 5
df_deg <- k - 1
df_deg
## [1] 4
df_res <- n-k
df_res
## [1] 1167
prf <- 0.0682
total_mean <- 40.45

F <- qf( 1 - prf, df_deg , df_res)
F
## [1] 2.188931
SSG <- sum( work_hours$n * (work_hours$mean - total_mean)^2 )
SSG
## [1] 2004.101
MSG <- 501.54 # mean square for degree (given)
MSE <- MSG/F # since F = MSG/MSE
MSE
## [1] 229.1255
anova548 <- data.frame(
  names = c("degree","Residuals","Total"),
  Df = c("4","1167","1171"),
  SumSq = c("2004.1","267382","269386.1"),
  MeanSq = c("501.54","229.13",""),
  Fvalue = c("2.19","",""),
  prf = c("0.0682","","")
)

colnames(anova548) <- c("names","Df","Sum Sq","Mean Sq","F value","Pr(>F)")

knitr::kable(anova548)
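| names     | Df   | Sum Sq   | Mean Sq | F value | Pr(>F) |
|:----------|:-----|:---------|:--------|:--------|:-------|
| degree    | 4    | 2004.1   | 501.54  | 2.19    | 0.0682 |
| Residuals | 1167 | 267382   | 229.13  |         |        |
| Total     | 1171 | 269386.1 |         |         |        |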
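As a cross-check, the table can also be filled in from the other direction, starting from the mean square for degree and the residual sum of squares and recovering F and the p-value with `pf()`. This is a sketch, not the original solution: the variable names are hypothetical, and the two inputs are simply the values shown in the table.

```r
# Values read from the ANOVA table (treated here as given inputs)
df_g <- 4         # degrees of freedom for degree (k - 1)
df_e <- 1167      # residual degrees of freedom (n - k)
msg  <- 501.54    # mean square for degree
ss_e <- 267382    # residual sum of squares

mse    <- ss_e / df_e                                # residual mean square
f_stat <- msg / mse                                  # F = MSG / MSE
p_val  <- pf(f_stat, df_g, df_e, lower.tail = FALSE) # upper-tail probability
round(c(MSE = mse, F = f_stat, p = p_val), 4)
```

Up to rounding, the recovered F statistic and p-value agree with the 2.19 and 0.0682 in the table.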

d)

The p-value is greater than .05, so we do not reject the null hypothesis. The data do not provide convincing evidence that the average hours worked differ across the 5 groups.