Working with the data
# Working with the data
data1 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/lifespans.csv")
data1
## Supplier Lifespan
## 1 A 504.9671
## 2 A 498.6174
## 3 A 506.4769
## 4 A 515.2303
## 5 A 497.6585
## 6 A 497.6586
## 7 A 515.7921
## 8 A 507.6743
## 9 A 495.3053
## 10 A 505.4256
## 11 B 498.0487
## 12 B 498.0141
## 13 B 508.6294
## 14 B 476.3008
## 15 B 479.1262
## 16 B 496.5657
## 17 B 489.8075
## 18 B 509.7137
## 19 B 491.3796
## 20 B 483.8154
supplier_A <- data1$Lifespan[data1$Supplier == "A"]
supplier_B <- data1$Lifespan[data1$Supplier == "B"]
Check the variance
# Checking the variance
boxplot(supplier_A, supplier_B,
names = c("Supplier A", "Supplier B"),
main = "Boxplot of Lifespans by Supplier",
xlab = "Supplier",
ylab = "Lifespan (hours)",
col = c("blue", "red"))
Supplier A appears to have a slightly smaller spread than Supplier B.
To confirm, I apply Levene's test.
# Using Levene’s test (from the lawstat package) to test for equal variances
lawstat::levene.test(y = data1$Lifespan, group = data1$Supplier, location = "mean")
##
## Classical Levene's test based on the absolute deviations from the mean
## ( none not applied because the location is not set to median )
##
## data: data1$Lifespan
## Test Statistic = 2.0772, p-value = 0.1667
Since the p-value (0.1667) is greater than the significance level (0.05), I fail to reject the null hypothesis of equal variances, so a pooled-variance t-test is appropriate.
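The mean-based Levene test is equivalent to a one-way ANOVA on the absolute deviations of each observation from its group mean. The sketch below reproduces that calculation in base R. It assumes the lifespans typed in from the printed listing (rounded to four decimals), and the names `abs_dev`, `levene_F`, and `levene_p` are illustrative only.

```r
# Lifespans copied from the printed data listing (rounded to 4 decimals)
lifespan <- c(504.9671, 498.6174, 506.4769, 515.2303, 497.6585,
              497.6586, 515.7921, 507.6743, 495.3053, 505.4256,
              498.0487, 498.0141, 508.6294, 476.3008, 479.1262,
              496.5657, 489.8075, 509.7137, 491.3796, 483.8154)
supplier <- factor(rep(c("A", "B"), each = 10))

# Absolute deviation of each observation from its own group mean
abs_dev <- abs(lifespan - ave(lifespan, supplier))

# One-way ANOVA on the absolute deviations; its F test is the Levene test
levene_tab <- anova(lm(abs_dev ~ supplier))
levene_F <- levene_tab[["F value"]][1]
levene_p <- levene_tab[["Pr(>F)"]][1]
c(F = levene_F, p = levene_p)
```

Up to rounding of the input data, the statistic and p-value should agree with the `levene.test` output shown above.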
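The hypotheses to test are:
\[ H_0: \mu_A = \mu_B \qquad H_a: \mu_A \neq \mu_B \]
where \(\mu_A\) and \(\mu_B\) are the mean lifespans of components from Supplier A and Supplier B.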
Applying the Two-Sample T-Test
# Applying the Two-Sample T-Test
t.test(x = supplier_A, y = supplier_B, var.equal = TRUE, alternative = "two.sided", conf.level = 0.95)
##
## Two Sample t-test
##
## data: supplier_A and supplier_B
## t = 2.6682, df = 18, p-value = 0.01567
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.411193 20.269776
## sample estimates:
## mean of x mean of y
## 504.4806 493.1401
CONCLUSIONS:
Since the P-Value (0.01567) is less than significance level (0.05), I reject the null hypothesis. This indicates that there is a statistically significant difference in the mean lifespans of components between Supplier A and Supplier B.
The sample average for Supplier A is 504.4806
The sample average for Supplier B is 493.1401
The T-statistic value is 2.6682
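To make the reported statistic easier to trace, here is a minimal sketch of the pooled-variance calculation done by hand. It assumes the lifespans typed in from the printed listing (rounded to four decimals); `sp2`, `se`, `dof`, and `ci` are illustrative names, not part of the original analysis.

```r
# Lifespans copied from the printed data listing (rounded to 4 decimals)
supplier_A <- c(504.9671, 498.6174, 506.4769, 515.2303, 497.6585,
                497.6586, 515.7921, 507.6743, 495.3053, 505.4256)
supplier_B <- c(498.0487, 498.0141, 508.6294, 476.3008, 479.1262,
                496.5657, 489.8075, 509.7137, 491.3796, 483.8154)
n_A <- length(supplier_A)
n_B <- length(supplier_B)
dof <- n_A + n_B - 2

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n_A - 1) * var(supplier_A) + (n_B - 1) * var(supplier_B)) / dof

# Standard error of the difference in means, then the t statistic
se <- sqrt(sp2 * (1 / n_A + 1 / n_B))
mean_diff <- mean(supplier_A) - mean(supplier_B)
t_stat <- mean_diff / se

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(abs(t_stat), dof, lower.tail = FALSE)
ci <- mean_diff + c(-1, 1) * qt(0.975, dof) * se
c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2])
```

Up to rounding of the inputs, these values should match the `t.test` output with `var.equal = TRUE`.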
Creating the Dataframe:
# Measurements for the same eight workers before and after
worker <- 1:8
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after <- c(30, 38, 31, 36, 34, 40, 37, 39)
data2 <- data.frame(worker, before, after)
data2
## worker before after
## 1 1 35 30
## 2 2 40 38
## 3 3 32 31
## 4 4 38 36
## 5 5 36 34
## 6 6 42 40
## 7 7 39 37
## 8 8 41 39
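The hypotheses to test are:
\[ H_0: \mu_d = 0 \qquad H_a: \mu_d \neq 0 \]
where \(\mu_d\) is the mean of the before-minus-after differences across workers.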
Applying Paired T-Test.
# Applying Paired T-Test
t.test(x = data2$before, y = data2$after, alternative = "two.sided", conf.level = 0.95, paired = TRUE)
##
## Paired t-test
##
## data: data2$before and data2$after
## t = 5.4628, df = 7, p-value = 0.0009431
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 1.276065 3.223935
## sample estimates:
## mean difference
## 2.25
CONCLUSIONS:
The p-value is 0.0009431.
Since the p-value (0.0009431) is less than the significance level (0.05), I reject the null hypothesis. This indicates a statistically significant difference between the before and after means.
The mean difference is 2.25
The T-statistic value is 5.4628
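A paired t-test is the same as a one-sample t-test on the within-worker differences. The sketch below shows this using the before and after values from the data frame above; `d`, `t_manual`, and `one_sample` are illustrative names.

```r
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after  <- c(30, 38, 31, 36, 34, 40, 37, 39)

# Within-worker differences (before minus after)
d <- before - after

# t statistic by hand: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# The same test expressed as a one-sample t-test on the differences
one_sample <- t.test(d, mu = 0, alternative = "two.sided", conf.level = 0.95)
c(mean_diff = mean(d), t = t_manual, p = one_sample$p.value)
```

The degrees of freedom are the number of pairs minus one (7 here), not the total number of measurements minus two.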
Working with the data
# Working with the data
data3 <- read.csv("tensile_strength.csv")
data3$Material <- as.factor(data3$Material)
rmarkdown::paged_table(data3)
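The hypotheses to test are:
\[ H_0: \mu_A = \mu_B = \mu_C \qquad H_a: \text{at least one } \mu_i \text{ differs} \]
where \(\mu_A\), \(\mu_B\), and \(\mu_C\) are the mean tensile strengths of Materials A, B, and C.
To perform the ANOVA test, we assume that the assumptions of normality and homogeneity of variance are met.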
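The two assumptions could also be checked rather than assumed. The sketch below is one possible way, using the Shapiro-Wilk test on the model residuals and Bartlett's test across groups. The helper `check_anova_assumptions` is hypothetical, and the `demo` data frame is simulated placeholder data, not the contents of `tensile_strength.csv`; only the column names `Strength` and `Material` are taken from the analysis.

```r
# Hypothetical helper: returns p-values for the two assumption checks
check_anova_assumptions <- function(df) {
  fit <- aov(Strength ~ Material, data = df)
  list(
    # Shapiro-Wilk test on the residuals (H0: residuals are normal)
    normality = shapiro.test(residuals(fit))$p.value,
    # Bartlett's test (H0: group variances are equal)
    equal_variance = bartlett.test(Strength ~ Material, data = df)$p.value
  )
}

# Simulated placeholder data, NOT the real tensile strength measurements
set.seed(1)
demo <- data.frame(
  Material = factor(rep(c("A", "B", "C"), each = 5)),
  Strength = rnorm(15, mean = 100, sd = 10)
)
checks <- check_anova_assumptions(demo)
checks
```

With the real data, the call would be `check_anova_assumptions(data3)`; p-values above 0.05 would give no evidence against the corresponding assumption.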
Testing ANOVA
# Testing ANOVA
aov.model <- aov(Strength ~ Material, data = data3)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Material 2 972.2 486.1 3.039 0.0855 .
## Residuals 12 1919.3 159.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CONCLUSIONS:
Since the p-value (0.0855) is greater than the significance level (0.05), we fail to reject the null hypothesis. This indicates that there is no statistically significant difference in mean tensile strength among the materials.
The F-statistic value is 3.039
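As a worked check, the F statistic and p-value can be recomputed from the sums of squares and degrees of freedom printed in the ANOVA table. The sketch below uses only those rounded table values, so small rounding differences are expected; the variable names are illustrative.

```r
# Values read from the ANOVA table above (rounded as printed)
ss_material <- 972.2
df_material <- 2
ss_resid <- 1919.3
df_resid <- 12

# Mean squares are sums of squares divided by their degrees of freedom
ms_material <- ss_material / df_material
ms_resid <- ss_resid / df_resid

# F statistic, its upper-tail p-value, and the 5% critical value
f_value <- ms_material / ms_resid
p_value <- pf(f_value, df_material, df_resid, lower.tail = FALSE)
f_crit <- qf(0.95, df_material, df_resid)
c(F = f_value, p = p_value, F_crit = f_crit)
```

Because the F statistic falls below the 5% critical value, the comparison leads to the same decision as the p-value.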
To confirm, I examined the pairwise comparisons with Tukey’s test, even though the ANOVA result was not significant.
# Tukey's HSD pairwise comparisons
tukeymodel <- TukeyHSD(aov.model, conf.level = 0.95)
tukeymodel
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Strength ~ Material, data = data3)
##
## $Material
## diff lwr upr p adj
## B-A 18.871780 -2.467260 40.21082 0.0853225
## C-A 14.389771 -6.949269 35.72881 0.2113877
## C-B -4.482009 -25.821049 16.85703 0.8432108
plot(tukeymodel)
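CONCLUSIONS:
All three pairwise confidence intervals contain 0 and every adjusted p-value is greater than 0.05 (the smallest is 0.0853, for B-A), so no pair of materials differs significantly. This agrees with the ANOVA result.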
Working with the data
# Working with the data
data4 <- read.csv("yield_data.csv")
# as.fixed() comes from the GAD package and marks each factor as a fixed effect
data4$Temperature <- GAD::as.fixed(data4$Temperature)
data4$Pressure <- GAD::as.fixed(data4$Pressure)
rmarkdown::paged_table(data4)
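Linear effects equation:
\[ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} \]
Where:
\(y_{ijk}\) is the yield of replicate \(k\) at level \(i\) of Temperature and level \(j\) of Pressure
\(\mu\) is the grand mean
\(\alpha_i\) is the main effect of level \(i\) of factor A (Temperature)
\(\beta_j\) is the main effect of level \(j\) of factor B (Pressure)
\((\alpha\beta)_{ij}\) is the interaction effect of Temperature level \(i\) and Pressure level \(j\)
\(\epsilon_{ijk}\) is the random error
Hypothesis test for Main Effect A (Temperature):
\[ H_0: \alpha_i = 0 \ \text{for all } i \qquad H_a: \alpha_i \neq 0 \ \text{for at least one } i \]
Hypothesis test for Main Effect B (Pressure):
\[ H_0: \beta_j = 0 \ \text{for all } j \qquad H_a: \beta_j \neq 0 \ \text{for at least one } j \]
Hypothesis test for Interaction Effect AB (Temperature and Pressure):
\[ H_0: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j \qquad H_a: (\alpha\beta)_{ij} \neq 0 \ \text{for at least one } i, j \]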
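The model with two main effects and their interaction maps directly onto an R formula. The short sketch below shows that writing the main effects explicitly alongside `temperature*pressure` yields the same three terms as `temperature * pressure` alone, since `*` already expands to both main effects plus the interaction. The objects `full`, `short`, `labels_full`, and `labels_short` are illustrative names, and no data is needed because the formulas are only inspected symbolically.

```r
# Formula with explicit main effects plus the crossed term
full <- response ~ temperature + pressure + temperature * pressure

# Shorthand: a * b expands to a + b + a:b
short <- response ~ temperature * pressure

# Extract the model terms each formula expands to
labels_full <- attr(terms(full), "term.labels")
labels_short <- attr(terms(short), "term.labels")

labels_full
identical(labels_full, labels_short)
```

Either spelling therefore gives one row per main effect and one interaction row in the ANOVA table.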
Testing the model
# Testing the model
temperature <- data4$Temperature
pressure <- data4$Pressure
response <- data4$Yield
equation <- response ~ temperature + pressure + temperature*pressure
model <- aov(equation)
GAD::gad(model)
## $anova
## Analysis of Variance Table
##
## Response: response
## Df Sum Sq Mean Sq F value Pr(>F)
## temperature 1 433.78 433.78 26.6368 0.0008625 ***
## pressure 1 95.90 95.90 5.8891 0.0414123 *
## temperature:pressure 1 0.99 0.99 0.0605 0.8119149
## Residuals 8 130.28 16.29
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CONCLUSIONS:
Since the p-value (0.0008625) of Main Effect “A” (Temperature) is less than the significance level (0.05), we reject the null hypothesis. This indicates that there is a significant effect of Temperature on yield.
Since the p-value (0.0414123) of Main Effect “B” (Pressure) is less than the significance level (0.05), we reject the null hypothesis. This indicates that there is a significant effect of Pressure on yield.
Since the p-value (0.8119149) of Interaction Effect “AB” (Temperature and Pressure) is greater than the significance level (0.05), we fail to reject the null hypothesis. This indicates that there is no significant interaction effect between Temperature and Pressure on yield.
To check this result visually, we draw the interaction plot.
interaction.plot(x.factor = temperature,
trace.factor = pressure,
response = response,
col=c("red","blue")
)
The lines in the interaction plot are approximately parallel, which is consistent with the ANOVA finding of no significant interaction between factors A (Temperature) and B (Pressure).
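"Approximately parallel" can be put into numbers with the interaction contrast of the cell means: the Pressure effect at one Temperature minus the Pressure effect at the other. A value near zero corresponds to parallel lines. The sketch below illustrates the calculation on hypothetical placeholder yields for a 2 x 2 design with three replicates per cell; these numbers are NOT the contents of `yield_data.csv`, and the level names `low` and `high` are assumptions.

```r
# Hypothetical placeholder data: 2 x 2 design, 3 replicates per cell
demo <- data.frame(
  Temperature = factor(rep(c("low", "high"), each = 6)),
  Pressure = factor(rep(rep(c("low", "high"), each = 3), times = 2)),
  Yield = c(70, 72, 71, 75, 77, 76, 82, 84, 83, 87, 89, 88)
)

# Mean yield in each Temperature x Pressure cell
# (these are the points drawn by interaction.plot)
cell_means <- with(demo, tapply(Yield, list(Temperature, Pressure), mean))

# Pressure effect at each Temperature level
effect_at_high <- cell_means["high", "high"] - cell_means["high", "low"]
effect_at_low <- cell_means["low", "high"] - cell_means["low", "low"]

# Interaction contrast: zero when the two lines are exactly parallel
interaction_contrast <- effect_at_high - effect_at_low
interaction_contrast
```

With the real data, the same calculation would use `data4$Yield`, `data4$Temperature`, and `data4$Pressure`.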
# Question 1
# #####################################################################################################################
# Working with the data
data1 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/lifespans.csv")
data1
supplier_A <- data1$Lifespan[data1$Supplier == "A"]
supplier_B <- data1$Lifespan[data1$Supplier == "B"]
# Checking the variance
boxplot(supplier_A, supplier_B,
names = c("Supplier A", "Supplier B"),
main = "Boxplot of Lifespans by Supplier",
xlab = "Supplier",
ylab = "Lifespan (hours)",
col = c("blue", "red"))
# Using Levene’s Test to test the equal variance
lawstat::levene.test(y = data1$Lifespan, group = data1$Supplier, location = "mean")
# Applying the Two-Sample T-Test (pooled variance, as supported by Levene's test)
t.test(x = supplier_A, y = supplier_B, var.equal = TRUE, alternative = "two.sided", conf.level = 0.95)
# Question 2
# #####################################################################################################################
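# Creating the data frame
worker <- 1:8
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after <- c(30, 38, 31, 36, 34, 40, 37, 39)
data2 <- data.frame(worker, before, after)
# Applying Paired T-Test
t.test(x = data2$before, y = data2$after, alternative = "two.sided", conf.level = 0.95, paired = TRUE)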
# Question 3
# #####################################################################################################################
# Working with the data
data3 <- read.csv("tensile_strength.csv")
data3$Material <- as.factor(data3$Material)
rmarkdown::paged_table(data3)
# Testing ANOVA
aov.model <- aov(Strength ~ Material, data = data3)
summary(aov.model)
# Testing tukeymodel
tukeymodel <- TukeyHSD(aov.model, conf.level = 0.95)
tukeymodel
plot(tukeymodel)
# Question 4
# #####################################################################################################################
# Working with the data
data4 <- read.csv("yield_data.csv")
data4$Temperature <- GAD::as.fixed(data4$Temperature)
data4$Pressure <- GAD::as.fixed(data4$Pressure)
rmarkdown::paged_table(data4)
# Testing the model
temperature <- data4$Temperature
pressure <- data4$Pressure
response <- data4$Yield
equation <- response ~ temperature + pressure + temperature*pressure
model <- aov(equation)
GAD::gad(model)