1 QUESTION 1

Working with the data

# Working with the data

data1 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/lifespans.csv")
data1
##    Supplier Lifespan
## 1         A 504.9671
## 2         A 498.6174
## 3         A 506.4769
## 4         A 515.2303
## 5         A 497.6585
## 6         A 497.6586
## 7         A 515.7921
## 8         A 507.6743
## 9         A 495.3053
## 10        A 505.4256
## 11        B 498.0487
## 12        B 498.0141
## 13        B 508.6294
## 14        B 476.3008
## 15        B 479.1262
## 16        B 496.5657
## 17        B 489.8075
## 18        B 509.7137
## 19        B 491.3796
## 20        B 483.8154
supplier_A <- data1$Lifespan[data1$Supplier == "A"]
supplier_B <- data1$Lifespan[data1$Supplier == "B"]

Check the variance

# Checking the variance

boxplot(supplier_A, supplier_B,
        names = c("Supplier A", "Supplier B"),
        main = "Boxplot of Lifespans by Supplier",
        xlab = "Supplier",
        ylab = "Lifespan (hours)",
        col = c("blue", "red"))

Supplier A appears to have a slightly smaller spread than Supplier B.

To confirm this, I use Levene’s test for equality of variances.

# Using Levene’s Test to test the equal variance

library(lawstat)  # provides levene.test()
levene.test(y = data1$Lifespan, group = data1$Supplier, location = "mean")
## 
##  Classical Levene's test based on the absolute deviations from the mean
##  ( none not applied because the location is not set to median )
## 
## data:  data1$Lifespan
## Test Statistic = 2.0772, p-value = 0.1667

Since the p-value (0.1667) is greater than the significance level (0.05), I fail to reject the hypothesis of equal variances, so a pooled-variance t-test (var.equal = TRUE) is reasonable.
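As a cross-check, the classical Levene statistic can be reproduced in base R as a one-way ANOVA on the absolute deviations from each group mean. This is a minimal sketch, not the lawstat implementation; the lifespans are copied from the printout above, and the names abs_dev, levene_F, and levene_p are introduced here only for illustration.

```r
# Lifespans copied from the data1 printout (rows 1-10 = Supplier A, 11-20 = Supplier B)
lifespan <- c(504.9671, 498.6174, 506.4769, 515.2303, 497.6585,
              497.6586, 515.7921, 507.6743, 495.3053, 505.4256,
              498.0487, 498.0141, 508.6294, 476.3008, 479.1262,
              496.5657, 489.8075, 509.7137, 491.3796, 483.8154)
supplier <- factor(rep(c("A", "B"), each = 10))

# Absolute deviation of each observation from its own group mean
abs_dev <- abs(lifespan - ave(lifespan, supplier))

# One-way ANOVA on the absolute deviations; its F test is the Levene test
levene_table <- anova(lm(abs_dev ~ supplier))
levene_F <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]
round(c(statistic = levene_F, p.value = levene_p), 4)
```

The result should agree with the test statistic and p-value reported by levene.test() up to the rounding of the printed data.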

The hypotheses to test are:

  • \(H_0: \mu_{\text{SupplierA}} = \mu_{\text{SupplierB}}\)
  • \(H_a: \mu_{\text{SupplierA}} \neq \mu_{\text{SupplierB}}\)

Applying the Two-Sample T-Test

# Applying the Two-Sample T-Test

t.test(x = supplier_A,y = supplier_B, var.equal = TRUE, alternative = "two.sided", conf.level = 0.95)
## 
##  Two Sample t-test
## 
## data:  supplier_A and supplier_B
## t = 2.6682, df = 18, p-value = 0.01567
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   2.411193 20.269776
## sample estimates:
## mean of x mean of y 
##  504.4806  493.1401

CONCLUSIONS:

  • Since the p-value (0.01567) is less than the significance level (0.05), I reject the null hypothesis. This indicates that there is a statistically significant difference in the mean lifespans of components between Supplier A and Supplier B.

  • The sample average for Supplier A is 504.4806

  • The sample average for Supplier B is 493.1401

  • The T-statistic value is 2.6682
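The reported t-statistic can also be traced by hand from the pooled-variance formula. The sketch below assumes the lifespans as printed above (rounded to four decimals); the names sp2, se_diff, t_stat, and ci are illustrative only.

```r
# Lifespans copied from the data1 printout
supplier_A <- c(504.9671, 498.6174, 506.4769, 515.2303, 497.6585,
                497.6586, 515.7921, 507.6743, 495.3053, 505.4256)
supplier_B <- c(498.0487, 498.0141, 508.6294, 476.3008, 479.1262,
                496.5657, 489.8075, 509.7137, 491.3796, 483.8154)

n_A <- length(supplier_A)
n_B <- length(supplier_B)
df  <- n_A + n_B - 2

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n_A - 1) * var(supplier_A) + (n_B - 1) * var(supplier_B)) / df

# Standard error of the difference in means, then the t-statistic
se_diff <- sqrt(sp2 * (1 / n_A + 1 / n_B))
mean_diff <- mean(supplier_A) - mean(supplier_B)
t_stat <- mean_diff / se_diff

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
ci <- mean_diff + c(-1, 1) * qt(0.975, df = df) * se_diff

round(c(t = t_stat, df = df, p.value = p_value, lower = ci[1], upper = ci[2]), 4)
```

These values should match the t.test() output with var.equal = TRUE.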



2 QUESTION 2

Creating the Dataframe:

worker = 1:8
before = c(35, 40, 32, 38, 36, 42, 39, 41)
after = c(30, 38, 31, 36, 34, 40, 37, 39)

data2 <- data.frame(worker,before, after)
data2
##   worker before after
## 1      1     35    30
## 2      2     40    38
## 3      3     32    31
## 4      4     38    36
## 5      5     36    34
## 6      6     42    40
## 7      7     39    37
## 8      8     41    39

The hypotheses to test are:

  • \(H_0: \mu_{\text{before}} = \mu_{\text{after}}\)
  • \(H_a: \mu_{\text{before}} \neq \mu_{\text{after}}\)

Applying Paired T-Test.

# Applying Paired T-Test

t.test(x = data2$before, y = data2$after, alternative = "two.sided", conf.level = 0.95, paired = TRUE)
## 
##  Paired t-test
## 
## data:  data2$before and data2$after
## t = 5.4628, df = 7, p-value = 0.0009431
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  1.276065 3.223935
## sample estimates:
## mean difference 
##            2.25

CONCLUSIONS:

  • The p-value is 0.0009431

  • Since the p-value (0.0009431) is less than the significance level (0.05), I reject the null hypothesis. This indicates that there is a statistically significant difference between the before and after means.

  • The mean difference is 2.25

  • The T-statistic value is 5.4628
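A paired t-test is equivalent to a one-sample t-test on the per-worker differences, which makes the numbers above easy to verify. The following is a minimal sketch using the same before and after vectors; d, se_d, t_stat, and ci are names introduced here for illustration.

```r
before <- c(35, 40, 32, 38, 36, 42, 39, 41)
after  <- c(30, 38, 31, 36, 34, 40, 37, 39)

# Per-worker differences; the paired test only looks at these
d <- before - after
n <- length(d)

# Standard error of the mean difference and the t-statistic
se_d   <- sd(d) / sqrt(n)
t_stat <- mean(d) / se_d

# Two-sided p-value and 95% confidence interval with n - 1 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se_d

round(c(mean_diff = mean(d), t = t_stat, p.value = p_value,
        lower = ci[1], upper = ci[2]), 4)
```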



3 QUESTION 3

Working with the data

# Working with the data
data3 <- read.csv("tensile_strength.csv")
data3$Material <- as.factor(data3$Material)

rmarkdown::paged_table(data3)

The hypotheses to test are:

  • \(H_0: \mu_{\text{Material_A}} = \mu_{\text{Material_B}} = \mu_{\text{Material_C}}\)
  • \(H_a: \text{At least one } \mu_{\text{Material_i}}\text{ differs, } i \in \{A,B,C\}\)

To perform the ANOVA test, we assume that the normality and homogeneity-of-variance assumptions are met.
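If one wanted to check these assumptions rather than assume them, a common approach is a Shapiro-Wilk test on the model residuals and Bartlett's test across groups. The sketch below runs on simulated data, not on tensile_strength.csv; sim_data, sim_model, and the chosen mean and standard deviation are hypothetical and only demonstrate the calls.

```r
set.seed(1)  # reproducible simulated data, NOT the tensile_strength.csv values

# Hypothetical balanced layout: 3 materials, 5 specimens each
sim_data <- data.frame(
  Material = factor(rep(c("A", "B", "C"), each = 5)),
  Strength = rnorm(15, mean = 100, sd = 12)
)
sim_model <- aov(Strength ~ Material, data = sim_data)

# Normality of the residuals (H0: residuals are normally distributed)
normality <- shapiro.test(residuals(sim_model))

# Homogeneity of variance across materials (H0: equal variances)
homogeneity <- bartlett.test(Strength ~ Material, data = sim_data)

c(shapiro_p = normality$p.value, bartlett_p = homogeneity$p.value)
```

With the real data, p-values above the chosen significance level would be consistent with the assumptions stated above.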

Testing ANOVA

# Testing ANOVA
aov.model <- aov(Strength ~ Material, data = data3)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Material     2  972.2   486.1   3.039 0.0855 .
## Residuals   12 1919.3   159.9                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

CONCLUSIONS:

  • Since the p-value (0.0855) is greater than the significance level (0.05), we fail to reject the null hypothesis. This indicates that there is no statistically significant difference in mean tensile strength among the materials.

  • The F-statistic value is 3.039
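The F value and p-value follow directly from the sums of squares and degrees of freedom in the ANOVA table. As a small worked example, using only the rounded figures printed above (variable names are illustrative):

```r
# Values taken from the printed ANOVA table
ss_material <- 972.2;  df_material <- 2
ss_resid    <- 1919.3; df_resid    <- 12

# Mean squares are sums of squares divided by their degrees of freedom
ms_material <- ss_material / df_material
ms_resid    <- ss_resid / df_resid

# F is the ratio of the two mean squares; the p-value is the upper tail of F(2, 12)
f_value <- ms_material / ms_resid
p_value <- pf(f_value, df_material, df_resid, lower.tail = FALSE)

round(c(MS_material = ms_material, MS_resid = ms_resid,
        F = f_value, p.value = p_value), 4)
```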

To confirm, I explored the results using Tukey’s test (despite the previous results not being significant)

# Testing tukeymodel
tukeymodel <- TukeyHSD(aov.model, conf.level = 0.95)
tukeymodel
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Strength ~ Material, data = data3)
## 
## $Material
##          diff        lwr      upr     p adj
## B-A 18.871780  -2.467260 40.21082 0.0853225
## C-A 14.389771  -6.949269 35.72881 0.2113877
## C-B -4.482009 -25.821049 16.85703 0.8432108
plot(tukeymodel)

CONCLUSIONS:

  • For B-A: The difference in means between Material B and Material A is 18.87; however, it is not statistically significant (p-value: 0.0853).
  • For C-A: The difference in means between Material C and Material A is 14.39, but it is not statistically significant (p-value: 0.2114).
  • For C-B: The difference in means between Material C and Material B is -4.48, and it is also not statistically significant (p-value: 0.8432).
  • Tukey’s test confirms that there are no statistically significant pairwise differences in tensile strength among Materials A, B, and C, which aligns with the ANOVA results.
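The width of the Tukey intervals can be reconstructed from the ANOVA table and the studentized range distribution. The sketch below assumes a balanced design with 5 observations per material; that group size is not stated in the report and is inferred here from the 12 residual degrees of freedom and the identical interval widths.

```r
# Residual mean square from the ANOVA table (Sum Sq / Df)
ms_resid <- 1919.3 / 12
df_resid <- 12
k <- 3            # number of materials
n_per_group <- 5  # ASSUMPTION: balanced design, inferred from df = 12 and k = 3

# Critical value of the studentized range at the 95% family-wise level
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)

# For equal group sizes the half-width simplifies to q * sqrt(MSE / n)
half_width <- q_crit * sqrt(ms_resid / n_per_group)

# Interval for B-A, using the difference printed by TukeyHSD()
diff_BA <- 18.871780
ci_BA <- diff_BA + c(-1, 1) * half_width
round(c(half_width = half_width, lower = ci_BA[1], upper = ci_BA[2]), 3)
```

Because every half-width exceeds the corresponding absolute difference in means, all three intervals contain zero, which is why none of the pairwise comparisons is significant.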



4 QUESTION 4

Working with the data

# Working with the data
data4 <- read.csv("yield_data.csv")
data4$Temperature <- GAD::as.fixed(data4$Temperature)
data4$Pressure <- GAD::as.fixed(data4$Pressure)

rmarkdown::paged_table(data4)

Linear effects model:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} \]

where:

  • \(y_{ijk} = \text{Yield of replicate } k \text{ at temperature level } i \text{ and pressure level } j\)
  • \(\mu = \text{Grand Mean}\)
  • \(\alpha_i = \text{Main Effect of Temperature}\)
  • \(\beta_j = \text{Main Effect of Pressure}\)
  • \((\alpha\beta)_{ij} = \text{Interaction Effect of Temperature and Pressure}\)
  • \(\epsilon_{ijk} = \text{Random Error}\)
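To make the terms of the effects model concrete, the sketch below decomposes the cell means of a small balanced 2 x 2 layout into a grand mean, main effects, and interaction effects under the usual sum-to-zero constraints. The yields are hypothetical illustration values, not the contents of yield_data.csv, and all object names are introduced here.

```r
# HYPOTHETICAL data: 2 temperature levels x 2 pressure levels x 3 replicates
sim <- expand.grid(rep = 1:3,
                   temperature = c("low", "high"),
                   pressure = c("low", "high"))
sim$yield <- c(70, 72, 71,   # temperature low,  pressure low
               82, 84, 83,   # temperature high, pressure low
               75, 77, 76,   # temperature low,  pressure high
               88, 90, 89)   # temperature high, pressure high

# Cell means: rows = temperature level, columns = pressure level
cell_means <- tapply(sim$yield, list(sim$temperature, sim$pressure), mean)

mu    <- mean(cell_means)              # grand mean
alpha <- rowMeans(cell_means) - mu     # temperature main effects
beta  <- colMeans(cell_means) - mu     # pressure main effects

# Interaction effects: what is left after removing mu, alpha_i and beta_j
alpha_beta <- cell_means - mu - outer(alpha, beta, "+")

list(mu = mu, alpha = alpha, beta = beta, alpha_beta = alpha_beta)
```

Interaction effects close to zero relative to the main effects correspond to roughly parallel lines in an interaction plot.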

Hypothesis test for Main Effect A (Temperature):

  • \(H_0: \alpha_i = 0 \text{ for all } i\) (Temperature has no significant effect on yield.)
  • \(H_a: \alpha_i \neq 0 \text{ for at least one } i\) (Temperature has a significant effect on yield.)

Hypothesis test for Main Effect B (Pressure):

  • \(H_0: \beta_j = 0 \text{ for all } j\) (Pressure has no significant effect on yield.)
  • \(H_a: \beta_j \neq 0 \text{ for at least one } j\) (Pressure has a significant effect on yield.)

Hypothesis test for Interaction Effect AB (Temperature and Pressure):

  • \(H_0: (\alpha\beta)_{ij} = 0 \text{ for all } i, j\) (No interaction effect between temperature and pressure.)
  • \(H_a: (\alpha\beta)_{ij} \neq 0 \text{ for at least one pair } i, j\) (There is a significant interaction effect between temperature and pressure.)

Testing the model

# Testing the model
temperature <- data4$Temperature
pressure <- data4$Pressure
response <- data4$Yield
equation <- response ~ temperature + pressure + temperature*pressure
model <- aov(equation)
GAD::gad(model)
## $anova
## Analysis of Variance Table
## 
## Response: response
##                      Df Sum Sq Mean Sq F value    Pr(>F)    
## temperature           1 433.78  433.78 26.6368 0.0008625 ***
## pressure              1  95.90   95.90  5.8891 0.0414123 *  
## temperature:pressure  1   0.99    0.99  0.0605 0.8119149    
## Residuals             8 130.28   16.29                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
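A side note on the model formula: in R's formula syntax, temperature*pressure already expands to both main effects plus their interaction, so listing the main effects separately is harmless but redundant. A quick way to see this, without needing any data, is to compare the expanded term labels:

```r
# Formula as written in the analysis, and the shorter equivalent
full_formula  <- response ~ temperature + pressure + temperature*pressure
short_formula <- response ~ temperature * pressure

# terms() expands a formula symbolically; no data are required
labels_full  <- attr(terms(full_formula), "term.labels")
labels_short <- attr(terms(short_formula), "term.labels")

labels_full
identical(labels_full, labels_short)
```

Both formulas therefore fit the same three-term model shown in the ANOVA table.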

CONCLUSIONS:

  • Since the p-value (0.0008625) of Main Effect “A” (Temperature) is less than the significance level (0.05), we reject the null hypothesis. This indicates that Temperature has a significant effect on yield.

  • Since the p-value (0.0414123) of Main Effect “B” (Pressure) is less than the significance level (0.05), we reject the null hypothesis. This indicates that Pressure has a significant effect on yield.

  • Since the p-value (0.8119149) of Interaction Effect “AB” (Temperature and Pressure) is greater than the significance level (0.05), we fail to reject the null hypothesis. This indicates that there is no significant interaction effect between Temperature and Pressure on yield.

To confirm this result visually, we draw an interaction plot.

interaction.plot(x.factor     = temperature,
                 trace.factor = pressure,
                 response     = response,
                 col=c("red","blue")  
                 )

We can confirm that there is no interaction between factors A (Temperature) and B (Pressure) because the lines in the interaction plot are approximately parallel.

5 Complete R-Code

# Question 1
# #####################################################################################################################
# Working with the data
data1 <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/lifespans.csv")
data1
supplier_A <- data1$Lifespan[data1$Supplier == "A"]
supplier_B <- data1$Lifespan[data1$Supplier == "B"]

# Checking the variance
boxplot(supplier_A, supplier_B,
        names = c("Supplier A", "Supplier B"),
        main = "Boxplot of Lifespans by Supplier",
        xlab = "Supplier",
        ylab = "Lifespan (hours)",
        col = c("blue", "red"))

# Using Levene’s Test to test the equal variance

library(lawstat)  # provides levene.test()
levene.test(y = data1$Lifespan, group = data1$Supplier, location = "mean")

# Applying the Two-Sample T-Test
t.test(x = supplier_A, y = supplier_B, var.equal = TRUE, alternative = "two.sided", conf.level = 0.95)

# Question 2
# #####################################################################################################################
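# Creating the Dataframe
worker = 1:8
before = c(35, 40, 32, 38, 36, 42, 39, 41)
after = c(30, 38, 31, 36, 34, 40, 37, 39)
data2 <- data.frame(worker, before, after)
data2

# Applying Paired T-Test
t.test(x = data2$before, y = data2$after, alternative = "two.sided", conf.level = 0.95, paired = TRUE)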


# Question 3
# #####################################################################################################################
# Working with the data
data3 <- read.csv("tensile_strength.csv")
data3$Material <- as.factor(data3$Material)
rmarkdown::paged_table(data3)

# Testing ANOVA
aov.model <- aov(Strength ~ Material, data = data3)
summary(aov.model)

# Testing tukeymodel
tukeymodel <- TukeyHSD(aov.model, conf.level = 0.95)
tukeymodel
plot(tukeymodel)


# Question 4
# #####################################################################################################################
# Working with the data
data4 <- read.csv("yield_data.csv")
data4$Temperature <- GAD::as.fixed(data4$Temperature)
data4$Pressure <- GAD::as.fixed(data4$Pressure)
rmarkdown::paged_table(data4)


# Testing the model
temperature <- data4$Temperature
pressure <- data4$Pressure
response <- data4$Yield
equation <- response ~ temperature + pressure + temperature*pressure
model <- aov(equation)
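GAD::gad(model)

# Interaction plot
interaction.plot(x.factor     = temperature,
                 trace.factor = pressure,
                 response     = response,
                 col = c("red", "blue"))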