Problem 5.6

Sample mean

(65 + 77) / 2

## [1] 71

Margin of Error = (77 - 65) / 2 = 6; critical t value for 90% confidence with df = 24

t.24 <- qt(0.95, 24)
t.24

## [1] 1.710882

Sample Standard Deviation (from ME = t * s / sqrt(n), with sqrt(25) = 5)

(6/t.24)*5

## [1] 17.53481
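The three steps above can be collected into one helper. This is only a sketch: the function name `ci_to_stats` and its arguments are illustrative, and n = 25 and the 90% level are inferred from the 24 degrees of freedom and the 0.95 quantile used above.

```r
# Recover the sample mean, margin of error and standard deviation
# from the endpoints of a t-based confidence interval.
ci_to_stats <- function(lower, upper, n, conf_level = 0.90) {
  x_bar  <- (lower + upper) / 2                      # midpoint of the interval
  me     <- (upper - lower) / 2                      # half the interval width
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1) # critical t value
  s      <- me / t_star * sqrt(n)                    # solve ME = t* * s / sqrt(n) for s
  list(mean = x_bar, margin = me, t_star = t_star, sd = s)
}

res <- ci_to_stats(65, 77, n = 25)
print(res)
```

Calling it with other endpoints or sample sizes shows how strongly the recovered standard deviation depends on n.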

Problem 5.14

z = 1.645 (90 percent confidence level)

ME = z * SD / sqrt(n), so solving for n gives

n = (z*SD/ME)^2

((1.645*250) / 25)^2

## [1] 270.6025

Rounding up, Luke needs a sample of at least 271. For a 99 percent confidence interval, he needs a larger sample, because the critical value increases:
z = 2.58

((2.58 * 250) / 25)^2

## [1] 665.64

Rounding up, Luke will need a sample of at least 666.
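The sample-size formula can be wrapped in a small function. This is a sketch with a hypothetical name, `required_n`. It uses the exact normal quantile from `qnorm` rather than the rounded table values, so its 99 percent answer is slightly smaller than the one obtained above with z = 2.58.

```r
# Smallest n for which z * sd / sqrt(n) does not exceed the target margin of error.
required_n <- function(sd, me, conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  ceiling((z * sd / me)^2)              # always round up, never down
}

n_90 <- required_n(sd = 250, me = 25, conf_level = 0.90)
n_99 <- required_n(sd = 250, me = 25, conf_level = 0.99)
print(c(n_90 = n_90, n_99 = n_99))
```

With the exact quantile (about 2.576) the 99 percent case gives 664 rather than 666; both are valid, and the larger figure is simply the more conservative one.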

Problem 5.20

The medians are very close, but the IQRs look slightly different. I cannot tell whether the difference is significant without testing it statistically.
Reading and writing scores are not independent of each other, because both come from the same student.
H0: The average difference between students’ reading and writing scores is equal to 0.
Ha: The average difference between students’ reading and writing scores is not equal to 0.
The sample was taken randomly and the sample size of 200 is greater than 30. The data also appear approximately normal, so all the conditions required for the test are satisfied.

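df <- 199
diff <- -0.545
sd <- 8.887
# n = 200 paired differences, so the standard error uses sqrt(200)
SE <- sd/sqrt(200)
t <- (diff - 0) / SE
# Ha is two-sided, so double the lower-tail probability
p <- 2 * pt(t, df)
round(p, 2)

## [1] 0.39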

Since the p-value is greater than 0.05, we fail to reject H0. The data do not provide convincing evidence of a difference between the average reading and writing scores.

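Because we failed to reject H0, we may have made a Type II error: failing to detect a real difference between the average reading and writing scores.
Yes, we would expect the confidence interval for the average difference to include 0, since we failed to reject H0.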
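The statement about the confidence interval can be checked directly from the summary statistics. The sketch below assumes a 95% level to match the 5% significance level; the variable names are illustrative.

```r
# 95% confidence interval for the mean paired difference (reading - writing).
n     <- 200      # number of students
d_bar <- -0.545   # mean of the differences
s_d   <- 8.887    # standard deviation of the differences

se     <- s_d / sqrt(n)
t_star <- qt(0.975, df = n - 1)          # two-sided 95% critical value
ci     <- d_bar + c(-1, 1) * t_star * se

print(round(ci, 3))
contains_zero <- ci[1] < 0 && ci[2] > 0
print(contains_zero)
```

An interval that straddles 0 is consistent with failing to reject H0 in the two-sided test.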

Problem 5.32

SE <- sqrt(((3.58)^2)/26 + ((4.51)^2)/26)
SE

## [1] 1.12927

mean_diff <- 16.12 - 19.85
t <- (mean_diff - 0) / SE
t

## [1] -3.30302

The critical value for a two-sided test at the 5% significance level is about 2.06 (df = 25). The absolute value of the t-score, 3.30302, is larger than the critical value, so we reject H0: the data provide strong evidence that the means differ between automatic and manual.
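The decision rule can be written out as a short function. This is a sketch: `two_sample_t` is a hypothetical helper, and it assumes the conservative degrees of freedom min(n1, n2) - 1 rather than the Welch approximation.

```r
# Two-sample t-test from summary statistics, using df = min(n1, n2) - 1.
two_sample_t <- function(m1, s1, n1, m2, s2, n2, alpha = 0.05) {
  se      <- sqrt(s1^2 / n1 + s2^2 / n2)
  t_stat  <- (m1 - m2) / se
  df      <- min(n1, n2) - 1
  t_crit  <- qt(1 - alpha / 2, df)        # two-sided critical value
  p_value <- 2 * pt(-abs(t_stat), df)     # two-sided p-value
  list(t = t_stat, df = df, t_crit = t_crit,
       p_value = p_value, reject = abs(t_stat) > t_crit)
}

res <- two_sample_t(16.12, 3.58, 26, 19.85, 4.51, 26)
print(res)
```

The comparison uses the absolute value of the t-score, which is what makes a large negative statistic count as evidence against H0.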

Problem 5.48

Ho: There is no difference in the average number of hours worked among the five groups.
Ha: There is a difference in the average number of hours worked among the five groups.
If the data were collected randomly, we may assume independence within each group and across groups. The boxplots show a few outliers, but the samples are large enough that these are not a concern, so we can treat the data as approximately normal. The standard deviations are broadly similar across the groups, so the constant-variance condition looks reasonable. The means and SDs do not appear to differ much across groups, but we cannot tell whether the difference is significant without testing it statistically.

mean <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
data_table <- data.frame (mean, sd, n)

n <- sum(data_table$n)
k <- length(data_table$mean)

Now let’s find degrees of freedom

df <- k - 1
dfResidual <- n - k

F-statistic

Prf <- 0.0682
F_statistic <- qf( 1 - Prf, df , dfResidual)

F-statistic = MSG/MSE

MSG <- 501.54
MSE <- MSG / F_statistic

MSG = SSG / df, so SSG = df * MSG

SSG <- df * MSG
SSE <- 267382

SST = SSG + SSE, and df_Total = df + dfResidual

SST <- SSG + SSE
dft <- df + dfResidual

# Use `=` (not `<-`) to name columns inside data.frame();
# Sum Sq values follow from SSG = df * MSG and SST = SSG + SSE
anova <- data.frame(
  names = c("degree","Residuals","Total"),
  Df = c("4","1167","1171"),
  SumSq = c("2006.16","267382","269388.16"),
  MeanSq = c("501.54","229.13",""),
  Fvalue = c("2.19","",""),
  prf = c("0.0682","","")
)

colnames(anova) <- c("names","Df","Sum Sq","Mean Sq","F value","Pr(>F)")
knitr::kable(anova)

names	Df	Sum Sq	Mean Sq	F value	Pr(>F)
degree	4	2006.16	501.54	2.19	0.0682
Residuals	1167	267382	229.13
Total	1171	269388.16

Since the p-value (0.0682) is greater than 0.05, we fail to reject Ho. The data do not provide convincing evidence that the average number of hours worked varies across the five groups.
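As a cross-check, the table can be rebuilt in the other direction, starting from MSG and SSE and computing the p-value with `pf`. This is a sketch under one assumption that differs from the steps above: MSE is taken as SSE / dfResidual instead of MSG / F. The variable names are illustrative.

```r
# Rebuild the ANOVA table numerically from MSG, SSE and the group sizes.
k   <- 5
n   <- sum(c(121, 546, 97, 253, 155))   # total sample size
MSG <- 501.54
SSE <- 267382

df_group <- k - 1
df_resid <- n - k
SSG      <- df_group * MSG
MSE      <- SSE / df_resid              # assumption: computed directly from SSE
F_stat   <- MSG / MSE
p_value  <- pf(F_stat, df_group, df_resid, lower.tail = FALSE)

anova_tab <- data.frame(
  Df     = c(df_group, df_resid, df_group + df_resid),
  SumSq  = c(SSG, SSE, SSG + SSE),
  MeanSq = c(MSG, MSE, NA),
  row.names = c("degree", "Residuals", "Total")
)
print(round(anova_tab, 2))
print(round(c(F = F_stat, p = p_value), 4))
```

Computed this way, MSE comes out near 229.12 rather than 229.13; the small gap most likely reflects rounding in the reported p-value and does not change the conclusion.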

Data 606 - Homework 5

Habib U Khan

March 24, 2019
