Question 1 Two Sample t test

dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/lifespans.csv")
boxplot(Lifespan~Supplier, 
        data = dat,
        main = "Boxplot of Lifespans by Supplier",
        xlab = "Supplier",
        ylab = "Lifespan (hours)")

library(car)

## Loading required package: carData

levene_test <- leveneTest(Lifespan ~ Supplier, data = dat)

## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

print(levene_test)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  2.0991 0.1646
##       18

The p-value is 0.1646, which is larger than 0.05, so we fail to reject the null hypothesis of equal variances. There is no significant evidence that the variances differ, so the equal-variance assumption is reasonable and we can use the pooled-variance t test.
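As a rough illustration of what the median-centered Levene test computes, the sketch below runs a one-way ANOVA on the absolute deviations from each group's median. The helper name `levene_median` and the six data values are hypothetical and are not the lifespan data. Passing the grouping variable as a factor, as done here, is also what avoids the "group coerced to factor" warning.

```r
# Minimal sketch of a median-centered Levene test (assumed equivalent to
# leveneTest(..., center = median); helper name is hypothetical).
levene_median <- function(y, group) {
  group <- as.factor(group)
  # Absolute deviation of each observation from its own group's median
  z <- abs(y - ave(y, group, FUN = median))
  # One-way ANOVA on the deviations; the F test compares group spreads
  anova(lm(z ~ group))
}

# Hypothetical toy data, only to show the mechanics
y <- c(1, 2, 3, 2, 4, 6)
g <- c("A", "A", "A", "B", "B", "B")

res <- levene_median(y, g)
print(res)
```

A large p-value in the `group` row means the group spreads are similar, which is the situation in which the pooled-variance t test is appropriate.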

\(H_0: \mu_A = \mu_B\)

\(H_a: \mu_A \neq \mu_B\)

t_test <- t.test(Lifespan ~ Supplier, data = dat, var.equal = TRUE, alternative = "two.sided")
print(t_test)

## 
##  Two Sample t-test
## 
## data:  Lifespan by Supplier
## t = 2.6682, df = 18, p-value = 0.01567
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##   2.411193 20.269776
## sample estimates:
## mean in group A mean in group B 
##        504.4806        493.1401

The p-value is 0.01567, which is lower than alpha = 0.05, so we reject the null hypothesis. There is a statistically significant difference between the mean lifespans of the two suppliers, with supplier A's sample mean (504.48 hours) higher than supplier B's (493.14 hours).
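The reported p-value and confidence interval can be checked from the printed summary alone. The sketch below uses only the rounded values from the t.test() output, so its results agree with the output to about three decimals rather than exactly. The variable names are illustrative.

```r
# Values copied from the printed t.test() output (rounded as printed)
mean_a <- 504.4806
mean_b <- 493.1401
t_stat <- 2.6682
df     <- 18

diff_means <- mean_a - mean_b      # estimated difference A - B
se_diff    <- diff_means / t_stat  # standard error implied by t = diff / SE

# Two-sided p-value from the t distribution with 18 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means
t_crit <- qt(0.975, df)
ci <- diff_means + c(-1, 1) * t_crit * se_diff

print(p_value)
print(ci)
```

Because the interval excludes 0, it leads to the same conclusion as the p-value.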

Question 2 Paired t test

\(H_0: \mu_{before} = \mu_{after}\)

\(H_a: \mu_{before} > \mu_{after}\)

before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after <- c(30, 38, 31, 36, 34, 40, 37, 39)
t_test <- t.test(before, after, paired = TRUE, alternative = "greater")
print(t_test)

## 
##  Paired t-test
## 
## data:  before and after
## t = 5.4628, df = 7, p-value = 0.0004715
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  1.469666      Inf
## sample estimates:
## mean difference 
##            2.25

The p-value is 0.0004715, which is much smaller than the significance level of 0.05. We reject the null hypothesis and conclude that the mean task completion time is significantly lower after the training program, with a mean reduction (before minus after) of 2.25.
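A paired t test is a one-sample t test on the within-subject differences. The sketch below recomputes the statistic by hand from the same `before` and `after` vectors. The intermediate names (`d`, `se_d`, and so on) are illustrative.

```r
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after  <- c(30, 38, 31, 36, 34, 40, 37, 39)

d <- before - after          # within-subject differences
n <- length(d)

mean_d <- mean(d)            # mean difference
se_d   <- sd(d) / sqrt(n)    # standard error of the mean difference
t_stat <- mean_d / se_d      # t statistic with n - 1 degrees of freedom

# One-sided p-value for H_a: mean difference > 0
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval
ci_lower <- mean_d - qt(0.95, df = n - 1) * se_d

print(c(mean_d = mean_d, t = t_stat, p = p_value, ci_lower = ci_lower))
```

The same numbers should come out of `t.test(d, mu = 0, alternative = "greater")`, which is an equivalent way to run the test.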

Question 3 CRD

\(H_0: \mu_A = \mu_B = \mu_C\)

\(H_a\): At least one of \(\mu_A\), \(\mu_B\), \(\mu_C\) is different.

library(GAD)
dat3 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/tensile_strength.csv")
str(dat3)

## 'data.frame':    15 obs. of  2 variables:
##  $ Material: chr  "A" "A" "A" "A" ...
##  $ Strength: num  522 497 501 479 492 ...

dat3$Material <- as.factor(dat3$Material)
aov.model <- aov(Strength~Material, data = dat3)
summary(aov.model)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Material     2  972.2   486.1   3.039 0.0855 .
## Residuals   12 1919.3   159.9                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(aov.model)

The p-value is 0.0855, which is smaller than 0.10 but larger than 0.05, so we reject the null hypothesis at the alpha = 0.10 level (we would not reject it at alpha = 0.05). At the 0.10 level, there is a significant difference in mean tensile strength among the materials.
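The F statistic and p-value follow directly from the sums of squares in the ANOVA table. The sketch below recomputes them from the printed (rounded) values, so small rounding differences are expected. The variable names are illustrative.

```r
# Values copied from the printed ANOVA table
ss_material <- 972.2
df_material <- 2
ss_resid    <- 1919.3
df_resid    <- 12

ms_material <- ss_material / df_material  # mean square between materials
ms_resid    <- ss_resid / df_resid        # mean square error

f_stat  <- ms_material / ms_resid
p_value <- pf(f_stat, df_material, df_resid, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))

# Decision at the two significance levels discussed in the text
print(c(reject_at_0.10 = p_value < 0.10, reject_at_0.05 = p_value < 0.05))
```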

# 90% family-wise confidence level, matching alpha = 0.10 used for the ANOVA
TukeyHSD(aov.model, conf.level = 0.90)

##   Tukey multiple comparisons of means
##     90% family-wise confidence level
## 
## Fit: aov(formula = Strength ~ Material, data = dat3)
## 
## $Material
##          diff         lwr      upr     p adj
## B-A 18.871780   0.7508066 36.99275 0.0853225
## C-A 14.389771  -3.7312026 32.51074 0.2113877
## C-B -4.482009 -22.6029824 13.63896 0.8432108

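From the result, only materials A and B differ significantly at the 90% family-wise confidence level (B-A interval 0.75 to 36.99, adjusted p = 0.0853), and material B has the higher mean tensile strength. The C-A and C-B intervals both contain 0, so those differences are not significant.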
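The B-A row of the Tukey table can be reproduced from the studentized range distribution. The sketch below assumes a balanced design with 5 observations per material (15 observations over 3 materials). That assumption is consistent with the equal interval widths in the table but is not stated in the data summary. The mean square error comes from the printed ANOVA table.

```r
mse         <- 1919.3 / 12   # residual mean square from the ANOVA table
df_resid    <- 12
k           <- 3             # number of materials
n_per_group <- 5             # ASSUMPTION: balanced design, 15 / 3

diff_ba <- 18.871780         # B - A difference copied from the Tukey output

# Standard error used by the studentized range statistic
se <- sqrt(mse / n_per_group)

# Adjusted p-value for the B - A comparison
q_stat <- diff_ba / se
p_adj  <- ptukey(q_stat, nmeans = k, df = df_resid, lower.tail = FALSE)

# 90% family-wise confidence interval
half_width <- qtukey(0.90, nmeans = k, df = df_resid) * se
ci <- diff_ba + c(-1, 1) * half_width

print(p_adj)
print(ci)
```

The same half-width applies to every pair in a balanced design, which is why only the pair with the largest difference produces an interval that excludes 0.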

Question 4 Factorial Design

The model equation is \(y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}\)

\(y_{ijk}\): Observed yield for the \(k\)-th replicate at the \(i\)-th level of factor A and the \(j\)-th level of factor B.

\(\mu\): Overall mean.

\(\alpha_i\): Effect of the \(i\)-th level of factor A (Temperature).

\(\beta_j\): Effect of the \(j\)-th level of factor B (Pressure).

\((\alpha\beta)_{ij}\): Interaction effect between the \(i\)-th level of factor A and the \(j\)-th level of factor B.

\(\epsilon_{ijk}\): Random error term.

dat4 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/yield_data.csv")
str(dat4)

## 'data.frame':    12 obs. of  3 variables:
##  $ Temperature: chr  "Low" "Low" "Low" "Low" ...
##  $ Pressure   : chr  "Low" "Low" "Low" "High" ...
##  $ Yield      : num  52.5 49.3 53.2 62.6 53.8 ...

dat4$Temperature <- as.fixed(dat4$Temperature)
dat4$Pressure <- as.fixed(dat4$Pressure)
# Temperature * Pressure expands to both main effects plus their interaction
model <- aov(Yield ~ Temperature * Pressure, data = dat4)
gad(model)

## $anova
## Analysis of Variance Table
## 
## Response: Yield
##                      Df Sum Sq Mean Sq F value    Pr(>F)    
## Temperature           1 433.78  433.78 26.6368 0.0008625 ***
## Pressure              1  95.90   95.90  5.8891 0.0414123 *  
## Temperature:Pressure  1   0.99    0.99  0.0605 0.8119149    
## Residuals             8 130.28   16.29                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Temperature and pressure both have significant effects on yield at the 0.05 level, with p-values of 0.0008625 and 0.0414123, respectively.

The interaction between these two factors is not significant (p = 0.8119), so there is no evidence that the effect of temperature depends on the pressure level.
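Each F value in the factorial ANOVA table is the term's mean square divided by the residual mean square, since both factors are fixed. The sketch below recomputes the three tests from the printed sums of squares. The data frame `anova_tab` is illustrative, and the interaction row differs slightly from the output because its sum of squares is printed to only two decimals.

```r
# Sums of squares and degrees of freedom copied from the gad() output
anova_tab <- data.frame(
  term = c("Temperature", "Pressure", "Temperature:Pressure"),
  df   = c(1, 1, 1),
  ss   = c(433.78, 95.90, 0.99)
)
ss_resid <- 130.28
df_resid <- 8

ms_resid <- ss_resid / df_resid   # residual mean square

# F = MS_term / MS_residual, then the upper-tail F probability
anova_tab$f <- (anova_tab$ss / anova_tab$df) / ms_resid
anova_tab$p <- pf(anova_tab$f, anova_tab$df, df_resid, lower.tail = FALSE)

# Flag terms that are significant at the 0.05 level
anova_tab$significant <- anova_tab$p < 0.05

print(anova_tab)
```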

Final Exam

Peihang Li

2024-12-06
