1. Data Loading and Cleaning

library(readxl)
WeeklyLab4Data <- read_excel("~/Downloads/WeeklyLab4Data.xlsx")
mydata <- WeeklyLab4Data
summary(mydata)
##  RunningTemperature StorageTemperature Formulation       
##  Length:54          Length:54          Length:54         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##    Machine           LifeinHours   
##  Length:54          Min.   :16.00  
##  Class :character   1st Qu.:25.25  
##  Mode  :character   Median :29.00  
##                     Mean   :30.07  
##                     3rd Qu.:33.00  
##                     Max.   :47.00

There are five variables: RunningTemperature, StorageTemperature, Formulation, and Machine are character (string) variables, and LifeinHours is a numeric variable. For the quantitative analysis, the character variables need to be treated as categorical (factor) variables rather than plain strings.
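The conversion described above could look roughly like the sketch below. It uses a small made-up data frame in place of mydata; the column names come from the summary output, while the values and the level names F1, M1, and M3 are assumptions for illustration only.

```r
# Toy stand-in for mydata: column names match the lab data, values are made up.
toy <- data.frame(
  RunningTemperature = c("low", "med", "high", "low"),
  StorageTemperature = c("high", "low", "med", "low"),
  Formulation        = c("F1", "F2", "F1", "F2"),   # F1 is an assumed level name
  Machine            = c("M1", "M2", "M3", "M1"),   # M1 and M3 are assumed level names
  LifeinHours        = c(30, 41, 22, 35),
  stringsAsFactors   = FALSE
)

# Convert every character column to a factor; numeric columns are left alone.
char_cols <- vapply(toy, is.character, logical(1))
toy[char_cols] <- lapply(toy[char_cols], factor)

# Optionally order the temperature levels from low to high so that plots and
# model output follow the natural order instead of alphabetical order.
temp_levels <- c("low", "med", "high")
toy$RunningTemperature <- factor(toy$RunningTemperature, levels = temp_levels)
toy$StorageTemperature <- factor(toy$StorageTemperature, levels = temp_levels)

str(toy)
```

Note that aov() and boxplot() with a formula also accept character columns and convert them internally, so this step mainly makes the encoding explicit and lets the level order be controlled.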

2. Plotting the Relationship Between Variables

plot(density(mydata$LifeinHours))

## The density plot suggests that LifeinHours is approximately normally distributed.
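A density plot alone is only a visual impression, so a more formal check may be useful. The sketch below shows one possible approach, a Q-Q plot plus a Shapiro-Wilk test, on simulated values; the mean and standard deviation used to simulate the data are assumptions, not estimates from the lab data.

```r
set.seed(1)
# Simulated stand-in for mydata$LifeinHours (54 values; mean and sd are assumptions).
life <- rnorm(54, mean = 30, sd = 6)

# Q-Q plot: points close to the reference line are consistent with normality.
qqnorm(life)
qqline(life)

# Shapiro-Wilk test: a small p-value (for example below 0.05) is evidence
# against normality; a large p-value does not prove normality.
sw <- shapiro.test(life)
print(sw)
```

For ANOVA the normality assumption applies to the model residuals, so the same two checks could also be run on the residuals of a fitted model, for example residuals(anovamodel1).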

boxplot(mydata$LifeinHours ~ mydata$RunningTemperature, data = mydata, main = "LifeinHours vs. RunningTemperature", xlab = "Running Temperature", ylab = "Life in Hours")

boxplot(mydata$LifeinHours ~ mydata$StorageTemperature, data = mydata, main = "LifeinHours vs. StorageTemperature", xlab = "Storage Temperature", ylab = "Life in Hours")

boxplot(mydata$LifeinHours ~ mydata$Formulation, data = mydata, main = "LifeinHours vs. Formulation", xlab = "Formulation", ylab = "Life in Hours")

boxplot(mydata$LifeinHours ~ mydata$Machine, data = mydata, main = "LifeinHours vs. Machine", xlab = "Machine", ylab = "Life in Hours")

From the plots above, and ignoring any correlation or interaction between the factors, we can tentatively conclude:

LOW RunningTemperature gave the longest Life in Hours

LOW StorageTemperature gave the longest Life in Hours

F2 gave the longest Life in Hours

M2 gave the longest Life in Hours
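Conclusions read from boxplots can be backed up with per-level means. The following is a minimal sketch on made-up numbers (not the lab data) showing how tapply() could be used to find the level with the highest mean for one factor.

```r
# Toy data: values are invented so the example is self-contained.
toy <- data.frame(
  RunningTemperature = c("low", "low", "med", "med", "high", "high"),
  LifeinHours        = c(36, 34, 31, 29, 25, 23)
)

# Mean life per running-temperature level.
means <- tapply(toy$LifeinHours, toy$RunningTemperature, mean)
print(means)

# Name of the level with the largest mean.
best <- names(which.max(means))
print(best)
```

On the real data the same call would be tapply(mydata$LifeinHours, mydata$RunningTemperature, mean), repeated for each of the four factors.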

Considering the interaction effects between variables

Running Temperature & Storage Temperature Interaction

interaction.plot(mydata$RunningTemperature, mydata$StorageTemperature, mydata$LifeinHours, fun = mean, fixed = TRUE, trace.label = "Storage Temperature", xlab = "Running Temperature", ylab = "Average Life in Hours")

interaction.plot(mydata$Formulation, mydata$Machine, mydata$LifeinHours, fun = mean, fixed = TRUE, trace.label = "Machine", xlab = "Formulation", ylab = "Average Life in Hours")

The plots above suggest that interactions exist between the variables. In addition, LOW running temperature, LOW storage temperature, formulation F2, and machine M2 appear to be the best choice for the longest Life in Hours.

3. Quantitative Analysis

anovamodel1 <- aov(mydata$LifeinHours ~ mydata$RunningTemperature*mydata$StorageTemperature + mydata$Formulation + mydata$Machine, data = mydata)
summary(anovamodel1)
##                                                     Df Sum Sq Mean Sq
## mydata$RunningTemperature                            2  770.3   385.1
## mydata$StorageTemperature                            2  585.5   292.7
## mydata$Formulation                                   1  363.0   363.0
## mydata$Machine                                       2   64.0    32.0
## mydata$RunningTemperature:mydata$StorageTemperature  4  201.6    50.4
## Residuals                                           42  487.3    11.6
##                                                     F value   Pr(>F)    
## mydata$RunningTemperature                            33.192 2.26e-09 ***
## mydata$StorageTemperature                            25.229 6.36e-08 ***
## mydata$Formulation                                   31.281 1.53e-06 ***
## mydata$Machine                                        2.759  0.07482 .  
## mydata$RunningTemperature:mydata$StorageTemperature   4.344  0.00497 ** 
## Residuals                                                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anovamodel2 <- aov(mydata$LifeinHours ~ mydata$RunningTemperature*mydata$StorageTemperature + mydata$Formulation, data = mydata)
summary(anovamodel2)
##                                                     Df Sum Sq Mean Sq
## mydata$RunningTemperature                            2  770.3   385.1
## mydata$StorageTemperature                            2  585.5   292.7
## mydata$Formulation                                   1  363.0   363.0
## mydata$RunningTemperature:mydata$StorageTemperature  4  201.6    50.4
## Residuals                                           44  551.4    12.5
##                                                     F value   Pr(>F)    
## mydata$RunningTemperature                            30.734 4.44e-09 ***
## mydata$StorageTemperature                            23.361 1.22e-07 ***
## mydata$Formulation                                   28.965 2.71e-06 ***
## mydata$RunningTemperature:mydata$StorageTemperature   4.023  0.00725 ** 
## Residuals                                                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anovamodel3 <- aov(mydata$LifeinHours ~ mydata$RunningTemperature*mydata$StorageTemperature, data = mydata)
summary(anovamodel3)
##                                                     Df Sum Sq Mean Sq
## mydata$RunningTemperature                            2  770.3   385.1
## mydata$StorageTemperature                            2  585.5   292.7
## mydata$RunningTemperature:mydata$StorageTemperature  4  201.6    50.4
## Residuals                                           45  914.3    20.3
##                                                     F value   Pr(>F)    
## mydata$RunningTemperature                            18.955 1.07e-06 ***
## mydata$StorageTemperature                            14.408 1.46e-05 ***
## mydata$RunningTemperature:mydata$StorageTemperature   2.481   0.0572 .  
## Residuals                                                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
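Nested models like the three fitted above can also be compared directly with an F-test through anova(). The sketch below illustrates the idea on a simulated 3 x 3 x 2 design; the data, the effect sizes, and the level name F1 are assumptions made for the example and are not the lab results.

```r
set.seed(42)
# Simulated 3 x 3 x 2 design with 3 replicates (54 rows); effect sizes are made up.
sim <- expand.grid(
  RunningTemperature = c("low", "med", "high"),
  StorageTemperature = c("low", "med", "high"),
  Formulation        = c("F1", "F2"),
  Replicate          = 1:3
)
sim$LifeinHours <- 30 +
  ifelse(sim$RunningTemperature == "low", 5, 0) +
  ifelse(sim$Formulation == "F2", 4, 0) +
  rnorm(nrow(sim), sd = 3)

# Smaller model: the two temperatures and their interaction only.
small <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = sim)
# Larger model: adds Formulation as a main effect.
large <- aov(LifeinHours ~ RunningTemperature * StorageTemperature + Formulation,
             data = sim)

# F-test of whether the extra term reduces the residual sum of squares
# by more than chance would explain.
cmp <- anova(small, large)
print(cmp)
```

Writing the formula with bare column names and data = sim, rather than sim$..., also keeps the term labels in the output short.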

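In model 1, the RunningTemperature:StorageTemperature interaction is significant (F = 4.344, p = 0.00497). RunningTemperature, StorageTemperature, and Formulation are also highly significant (p < 0.001), while Machine is not significant at the 0.05 level (p = 0.075).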

In models 2 and 3, Running Temperature and Storage Temperature remain highly significant. Their interaction is significant in model 2 (p = 0.00725) but only marginal in model 3 (p = 0.0572). I therefore ran Tukey's HSD test for the details of these two variables only:

TukeyHSD(anovamodel3, "mydata$RunningTemperature", conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mydata$LifeinHours ~ mydata$RunningTemperature * mydata$StorageTemperature, data = mydata)
## 
## $`mydata$RunningTemperature`
##               diff       lwr       upr     p adj
## low-high  9.111111  5.469545 12.752677 0.0000007
## med-high  5.944444  2.302879  9.586010 0.0007678
## med-low  -3.166667 -6.808232  0.474899 0.0996512
TukeyHSD(anovamodel3, "mydata$StorageTemperature", conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mydata$LifeinHours ~ mydata$RunningTemperature * mydata$StorageTemperature, data = mydata)
## 
## $`mydata$StorageTemperature`
##               diff       lwr        upr     p adj
## low-high  8.000000  4.358434 11.6415656 0.0000092
## med-high  4.888889  1.247323  8.5304545 0.0060161
## med-low  -3.111111 -6.752677  0.5304545 0.1075016
TukeyHSD(anovamodel3, "mydata$RunningTemperature:mydata$StorageTemperature", conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mydata$LifeinHours ~ mydata$RunningTemperature * mydata$StorageTemperature, data = mydata)
## 
## $`mydata$RunningTemperature:mydata$StorageTemperature`
##                           diff         lwr         upr     p adj
## low:high-high:high   5.8333333  -2.6432572 14.30992385 0.3979677
## med:high-high:high   4.0000000  -4.4765905 12.47659052 0.8320152
## high:low-high:high   5.0000000  -3.4765905 13.47659052 0.6032770
## low:low-high:high   18.8333333  10.3567428 27.30992385 0.0000002
## med:low-high:high   10.0000000   1.5234095 18.47659052 0.0103830
## high:med-high:high   2.6666667  -5.8099239 11.14325719 0.9813461
## low:med-high:high   10.3333333   1.8567428 18.80992385 0.0071759
## med:med-high:high   11.5000000   3.0234095 19.97659052 0.0018626
## med:high-low:high   -1.8333333 -10.3099239  6.64325719 0.9984846
## high:low-low:high   -0.8333333  -9.3099239  7.64325719 0.9999960
## low:low-low:high    13.0000000   4.5234095 21.47659052 0.0002984
## med:low-low:high     4.1666667  -4.3099239 12.64325719 0.7991722
## high:med-low:high   -3.1666667 -11.6432572  5.30992385 0.9484958
## low:med-low:high     4.5000000  -3.9765905 12.97659052 0.7258309
## med:med-low:high     5.6666667  -2.8099239 14.14325719 0.4370564
## high:low-med:high    1.0000000  -7.4765905  9.47659052 0.9999836
## low:low-med:high    14.8333333   6.3567428 23.30992385 0.0000290
## med:low-med:high     6.0000000  -2.4765905 14.47659052 0.3605510
## high:med-med:high   -1.3333333  -9.8099239  7.14325719 0.9998530
## low:med-med:high     6.3333333  -2.1432572 14.80992385 0.2916344
## med:med-med:high     7.5000000  -0.9765905 15.97659052 0.1206672
## low:low-high:low    13.8333333   5.3567428 22.30992385 0.0001044
## med:low-high:low     5.0000000  -3.4765905 13.47659052 0.6032770
## high:med-high:low   -2.3333333 -10.8099239  6.14325719 0.9920699
## low:med-high:low     5.3333333  -3.1432572 13.80992385 0.5189883
## med:med-high:low     6.5000000  -1.9765905 14.97659052 0.2604373
## med:low-low:low     -8.8333333 -17.3099239 -0.35674281 0.0352795
## high:med-low:low   -16.1666667 -24.6432572 -7.69007615 0.0000051
## low:med-low:low     -8.5000000 -16.9765905 -0.02340948 0.0488888
## med:med-low:low     -7.3333333 -15.8099239  1.14325719 0.1385659
## high:med-med:low    -7.3333333 -15.8099239  1.14325719 0.1385659
## low:med-med:low      0.3333333  -8.1432572  8.80992385 1.0000000
## med:med-med:low      1.5000000  -6.9765905  9.97659052 0.9996464
## low:med-high:med     7.6666667  -0.8099239 16.14325719 0.1046902
## med:med-high:med     8.8333333   0.3567428 17.30992385 0.0352795
## med:med-low:med      1.1666667  -7.3099239  9.64325719 0.9999465

The following row of the Tukey result shows the largest significant difference: Life in Hours is about 18.8 hours longer on average when both Running Temperature and Storage Temperature are low than when both are high.

## $`mydata$RunningTemperature:mydata$StorageTemperature`
##                           diff         lwr         upr     p adj
## low:low-high:high   18.8333333  10.3567428 27.30992385 0.0000002
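Rather than scanning a long Tukey table by eye, the significant comparisons can be filtered programmatically. The sketch below shows one way to do this on simulated one-factor data; the variable names Temp and Life and the group means are assumptions for illustration.

```r
set.seed(7)
# Simulated one-factor data; group means are made up for illustration.
sim <- data.frame(
  Temp = factor(rep(c("high", "low", "med"), each = 6)),
  Life = c(rnorm(6, 22, 2), rnorm(6, 38, 2), rnorm(6, 30, 2))
)

fit <- aov(Life ~ Temp, data = sim)
tk  <- TukeyHSD(fit, "Temp", conf.level = 0.95)

# Each element of the result is a matrix with columns diff, lwr, upr, "p adj".
cmp_table <- tk$Temp

# Keep only the comparisons with an adjusted p-value below 0.05.
significant <- cmp_table[cmp_table[, "p adj"] < 0.05, , drop = FALSE]

# Sort so the largest absolute difference comes first.
significant <- significant[order(-abs(significant[, "diff"])), , drop = FALSE]
print(significant)
```

For the interaction term in this report, the corresponding matrix would be accessed by its full name, for example TukeyHSD(anovamodel3)[["mydata$RunningTemperature:mydata$StorageTemperature"]].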

4. Conclusion

This data set from the chemical test is clean and tidy. In Step 2, I compared the relationship of each independent variable to the dependent variable, Life in Hours.

From Step 2, I obtained the following results for each single independent variable against the dependent variable:

LOW RunningTemperature gave the longest Life in Hours

LOW StorageTemperature gave the longest Life in Hours

F2 gave the longest Life in Hours

M2 gave the longest Life in Hours

Building on these results, I also examined the interactions between variables. My hypothesis was that the two temperature variables would interact most strongly, and I also plotted the Formulation and Machine interaction for reference.

In Step 3, I went further with ANOVA, building three models, and found that the two temperature variables accounted for the most variation in Life in Hours (the largest sums of squares).

Finally, I ran Tukey's HSD test only for Running Temperature and Storage Temperature.

Based on both the plots and the quantitative analysis, I suggest the following combination for the longest Life in Hours:

LOW RunningTemperature, LOW StorageTemperature, Formulation F2, Machine M2.

Thank you!