dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/lifespans.csv")
boxplot(Lifespan~Supplier,
data = dat,
main = "Boxplot of Lifespans by Supplier",
xlab = "Supplier",
ylab = "Lifespan (hours)")
library(car)
## Loading required package: carData
levene_test <- leveneTest(Lifespan ~ Supplier, data = dat)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  2.0991 0.1646
##       18
The p-value is 0.1646, which is larger than 0.05, so we fail to reject the null hypothesis of equal variances. The data give no evidence against the equal-variance assumption, so a pooled-variance t-test is reasonable.
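As a rough illustration of what the median-centered Levene test computes, the sketch below runs a one-way ANOVA on the absolute deviations of each observation from its group median. The data frame `toy` and its values are hypothetical and are not the lifespans data; only base R is used.

```r
# Hypothetical data for illustration only (not the lifespans.csv values)
toy <- data.frame(
  Supplier = factor(rep(c("A", "B"), each = 5)),
  Lifespan = c(505, 498, 510, 502, 507, 492, 495, 489, 497, 491)
)

# Absolute deviation of each observation from its own group's median
group_medians <- ave(toy$Lifespan, toy$Supplier, FUN = median)
toy$abs_dev <- abs(toy$Lifespan - group_medians)

# A one-way ANOVA on those deviations gives the Levene F statistic and p-value
levene_by_hand <- anova(lm(abs_dev ~ Supplier, data = toy))
print(levene_by_hand)
```

A large p-value in the `Supplier` row would be read the same way as above: no evidence that the group spreads differ.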
\(H_0: \mu_A = \mu_B\)
\(H_a: \mu_A \neq \mu_B\)
t_test <- t.test(Lifespan ~ Supplier, data = dat, var.equal = TRUE, alternative = "two.sided")
print(t_test)
##
## Two Sample t-test
##
## data: Lifespan by Supplier
## t = 2.6682, df = 18, p-value = 0.01567
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## 2.411193 20.269776
## sample estimates:
## mean in group A mean in group B
##        504.4806        493.1401
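The p-value is 0.01567, which is below \(\alpha = 0.05\), so we reject the null hypothesis: there is a statistically significant difference between the mean lifespans of the two suppliers. The sample means (504.48 hours for A versus 493.14 hours for B) and the 95% confidence interval for the difference (2.41 to 20.27 hours) indicate that Supplier A's mean lifespan is higher.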
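To make the pooled-variance calculation concrete, here is a minimal sketch that computes the two-sample t statistic by hand and compares it with `t.test(..., var.equal = TRUE)`. The vectors `a` and `b` are hypothetical values chosen for illustration, not the supplier data.

```r
# Hypothetical samples for illustration only
a <- c(505, 498, 510, 502, 507)
b <- c(492, 495, 489, 497, 491)
n_a <- length(a)
n_b <- length(b)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_a - 1) * var(a) + (n_b - 1) * var(b)) / (n_a + n_b - 2)

# Standard error of the difference in means, then the t statistic
std_error <- sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat <- (mean(a) - mean(b)) / std_error
deg_free <- n_a + n_b - 2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = deg_free)

# The built-in test should agree with the hand calculation
builtin <- t.test(a, b, var.equal = TRUE)
c(by_hand = t_stat, builtin = unname(builtin$statistic))
```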
\(H_0: \mu_{\text{before}} = \mu_{\text{after}}\)
\(H_a: \mu_{\text{before}} > \mu_{\text{after}}\)
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after <- c(30, 38, 31, 36, 34, 40, 37, 39)
t_test <- t.test(before, after, paired = TRUE, alternative = "greater")
print(t_test)
##
## Paired t-test
##
## data: before and after
## t = 5.4628, df = 7, p-value = 0.0004715
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
## 1.469666 Inf
## sample estimates:
## mean difference
## 2.25
The p-value is 0.0004715, far below the significance level of 0.05, so we reject the null hypothesis and conclude that mean task completion time is significantly lower after the training program.
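The paired test is a one-sample t-test on the within-subject differences. The sketch below reproduces the reported statistic from the same `before` and `after` vectors; the intermediate names (`d`, `std_error`, `lower_bound`) are introduced here for illustration.

```r
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after  <- c(30, 38, 31, 36, 34, 40, 37, 39)

# Within-subject differences and their standard error
d <- before - after
n <- length(d)
std_error <- sd(d) / sqrt(n)

# One-sided test of H_a: mean difference > 0
t_stat <- mean(d) / std_error
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Lower limit of the one-sided 95% confidence interval
lower_bound <- mean(d) - qt(0.95, df = n - 1) * std_error

round(c(t = t_stat, p = p_value, lower = lower_bound), 4)
```

These values should agree with the `t.test` output above: t = 5.4628, p = 0.0004715, and a lower confidence limit of about 1.47.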
\(H_0: \mu_A = \mu_B = \mu_C\)
\(H_a\): At least one of \(\mu_A\), \(\mu_B\), \(\mu_C\) differs from the others.
library(GAD)
dat3 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/tensile_strength.csv")
str(dat3)
## 'data.frame': 15 obs. of 2 variables:
## $ Material: chr "A" "A" "A" "A" ...
## $ Strength: num 522 497 501 479 492 ...
dat3$Material <- as.factor(dat3$Material)
aov.model <- aov(Strength~Material, data = dat3)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Material     2  972.2   486.1   3.039 0.0855 .
## Residuals   12 1919.3   159.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
The p-value is 0.0855, which is smaller than 0.10 but larger than 0.05. We therefore reject the null hypothesis at the \(\alpha = 0.10\) level but not at the 0.05 level: at the 10% level, there is evidence that at least one material's mean tensile strength differs from the others.
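The F statistic and p-value can be recovered from the sums of squares and degrees of freedom printed in the ANOVA table. The sketch below uses only those rounded table values, so small rounding differences are expected; the variable names are introduced here for illustration.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_material <- 972.2
df_material <- 2
ss_resid <- 1919.3
df_resid <- 12

# Mean squares are sums of squares divided by their degrees of freedom
ms_material <- ss_material / df_material
ms_resid <- ss_resid / df_resid

# F is the ratio of mean squares; the p-value is the upper tail of F(2, 12)
f_stat <- ms_material / ms_resid
p_value <- pf(f_stat, df_material, df_resid, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

Comparing `p_value` with 0.05 and 0.10 gives the same decision as stated above.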
?TukeyHSD
TukeyHSD(aov.model, conf.level = 0.90)
## Tukey multiple comparisons of means
## 90% family-wise confidence level
##
## Fit: aov(formula = Strength ~ Material, data = dat3)
##
## $Material
##          diff         lwr      upr     p adj
## B-A 18.871780   0.7508066 36.99275 0.0853225
## C-A 14.389771  -3.7312026 32.51074 0.2113877
## C-B -4.482009 -22.6029824 13.63896 0.8432108
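From the Tukey results, only the B-A comparison is statistically significant at the 90% family-wise confidence level (adjusted p = 0.0853; the interval from 0.75 to 36.99 excludes zero). Material B's mean tensile strength is about 18.9 higher than Material A's. The C-A and C-B differences are not significant, since their intervals include zero.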
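The width of each Tukey interval comes from the studentized range distribution. The sketch below rebuilds the B-A interval from the printed ANOVA values, assuming a balanced design with 5 observations per material (an assumption inferred from 15 observations across 3 levels and the equal interval widths, not something the output states directly).

```r
# Residual mean square and degrees of freedom from the ANOVA table above
mse <- 1919.3 / 12
df_resid <- 12
k <- 3              # number of materials
n_per_group <- 5    # ASSUMPTION: balanced design, 15 observations / 3 materials

# Critical value of the studentized range at 90% family-wise confidence
q_crit <- qtukey(0.90, nmeans = k, df = df_resid)

# Half-width of each pairwise interval for a balanced design
half_width <- q_crit * sqrt(mse / n_per_group)

# Rebuild the B-A interval from the reported difference
diff_BA <- 18.871780
round(c(lwr = diff_BA - half_width, upr = diff_BA + half_width), 2)
```

Under that assumption the rebuilt limits should be close to the reported 0.75 and 36.99, up to rounding of the table values.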
The model equation is \(y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}\)
\(y_{ijk}\): Observed yield for the \(k\)-th replicate at the \(i\)-th level of factor A and the \(j\)-th level of factor B.
\(\mu\): Overall mean.
\(\alpha_i\): Effect of the \(i\)-th level of factor A (Temperature).
\(\beta_j\): Effect of the \(j\)-th level of factor B (Pressure).
\((\alpha\beta)_{ij}\): Interaction effect between the \(i\)-th level of A and the \(j\)-th level of B.
\(\epsilon_{ijk}\): Random error term.
dat4 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/yield_data.csv")
str(dat4)
## 'data.frame': 12 obs. of 3 variables:
## $ Temperature: chr "Low" "Low" "Low" "Low" ...
## $ Pressure : chr "Low" "Low" "Low" "High" ...
## $ Yield : num 52.5 49.3 53.2 62.6 53.8 ...
dat4$Temperature <- as.fixed(dat4$Temperature)
dat4$Pressure <- as.fixed(dat4$Pressure)
model <- aov(Yield ~ Temperature * Pressure, data = dat4)
gad(model)
## $anova
## Analysis of Variance Table
##
## Response: Yield
##                      Df Sum Sq Mean Sq F value    Pr(>F)
## Temperature           1 433.78  433.78 26.6368 0.0008625 ***
## Pressure              1  95.90   95.90  5.8891 0.0414123 *
## Temperature:Pressure  1   0.99    0.99  0.0605 0.8119149
## Residuals             8 130.28   16.29
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Temperature and pressure both have significant main effects on yield at the 0.05 level, with p-values of 0.0008625 and 0.0414123, respectively.
The Temperature:Pressure interaction is not significant (p = 0.8119), so there is no evidence that the effect of one factor depends on the level of the other.
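One way to see what "no interaction" means is to compare the pressure effect at each temperature using the cell means. The sketch below uses a hypothetical, exactly additive 2x2 data set (`toy`, not the yield data) in which the pressure effect is the same at both temperatures, so the interaction contrast is zero.

```r
# Hypothetical 2x2 data with 3 replicates per cell, for illustration only
toy <- data.frame(
  Temperature = rep(c("Low", "High"), each = 6),
  Pressure = rep(rep(c("Low", "High"), each = 3), times = 2),
  Yield = c(50, 51, 52, 56, 57, 58, 62, 63, 64, 68, 69, 70)
)

# Cell means: rows are Temperature levels, columns are Pressure levels
cell_means <- with(toy, tapply(Yield, list(Temperature, Pressure), mean))

# Effect of raising pressure, computed separately at each temperature
pressure_effect_low_temp <- cell_means["Low", "High"] - cell_means["Low", "Low"]
pressure_effect_high_temp <- cell_means["High", "High"] - cell_means["High", "Low"]

# The interaction contrast is the difference between those two effects
interaction_contrast <- pressure_effect_high_temp - pressure_effect_low_temp
interaction_contrast
```

With real data the contrast is rarely exactly zero; the interaction F test asks whether it is larger than the residual variation would explain.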