You work at a chemical processing firm. The engineers are testing new formulations of a commercial-grade lubricant that will be used in high-use heavy machinery. They want to know how long each formulation lasts in general, and also how the formulations last under different running temperatures and following different storage temperatures. Since these are initial tests, they have simplified the predictor variables by forming categories (low, medium, high heat). The tests were performed across three different machines, which might vary in some ways. Below are the data from this study. Your task is to provide answers to their questions. Make sure to check the assumptions of your model and to provide a well-written summary of the findings.
- How long does each formulation last in general?
- How do the formulations last under different running temperatures and following different storage temperatures?
##  RunningTemperature StorageTemperature Formulation Machine  LifeinHours
##  high:18            high:18            F1:27       M1:18   Min.   :16.00
##  low :18            low :18            F2:27       M2:18   1st Qu.:25.25
##  med :18            med :18                        M3:18   Median :29.00
##                                                            Mean   :30.07
##                                                            3rd Qu.:33.00
##                                                            Max.   :47.00
## 'data.frame': 54 obs. of 5 variables:
## $ RunningTemperature: Factor w/ 3 levels "high","low","med": 1 1 1 2 2 2 3 3 3 1 ...
## $ StorageTemperature: Factor w/ 3 levels "high","low","med": 1 1 1 1 1 1 1 1 1 2 ...
## $ Formulation : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Machine : Factor w/ 3 levels "M1","M2","M3": 1 3 2 2 3 1 2 3 1 2 ...
## $ LifeinHours : int 25 16 20 27 24 26 26 26 20 24 ...
## 'data.frame': 54 obs. of 5 variables:
## $ RunningTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 3 3 3 2 2 2 1 ...
## $ StorageTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 1 1 1 1 1 1 3 ...
## $ Formulation : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Machine : Factor w/ 3 levels "M1","M2","M3": 1 3 2 2 3 1 2 3 1 2 ...
## $ LifeinHours : int 25 16 20 27 24 26 26 26 20 24 ...
## 'data.frame': 27 obs. of 5 variables:
## $ RunningTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 3 3 3 2 2 2 1 ...
## $ StorageTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 1 1 1 1 1 1 3 ...
## $ Formulation : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Machine : Factor w/ 3 levels "M1","M2","M3": 1 3 2 2 3 1 2 3 1 2 ...
## $ LifeinHours : int 25 16 20 27 24 26 26 26 20 24 ...
## 'data.frame': 27 obs. of 5 variables:
## $ RunningTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 3 3 3 2 2 2 1 ...
## $ StorageTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 1 1 1 1 1 1 3 ...
## $ Formulation : Factor w/ 2 levels "F1","F2": 2 2 2 2 2 2 2 2 2 2 ...
## $ Machine : Factor w/ 3 levels "M1","M2","M3": 2 1 3 2 3 1 1 2 3 3 ...
## $ LifeinHours : int 26 24 24 34 28 31 26 31 30 27 ...
##  RunningTemperature StorageTemperature Formulation Machine  LifeinHours
##  high:9             high:9             F1:27       M1:9    Min.   :16.00
##  med :9             med :9             F2: 0       M2:9    1st Qu.:24.00
##  low :9             low :9                         M3:9    Median :26.00
##                                                            Mean   :27.48
##                                                            3rd Qu.:32.00
##                                                            Max.   :39.00
##  RunningTemperature StorageTemperature Formulation Machine  LifeinHours
##  high:9             high:9             F1: 0       M1:9    Min.   :21.00
##  med :9             med :9             F2:27       M2:9    1st Qu.:27.50
##  low :9             low :9                         M3:9    Median :31.00
##                                                            Mean   :32.67
##                                                            3rd Qu.:37.50
##                                                            Max.   :47.00
> The dataset is separated into two subsets by formulation, F1 and F2. Using `summary()` on each subset, the mean life in hours is 27.48 for F1 and 32.67 for F2. According to the boxplot, F2 outperforms F1: the lowest life in hours for F2 (21) is higher than the lowest for F1 (16).
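The split described above can be sketched as follows. This is a minimal sketch, not the original analysis code: the data frame name `lab2` is taken from the `aov` call in the output, but the `LifeinHours` values here are simulated placeholders, and the names `temp_levels` and `formulation_means` are introduced only for illustration.

```r
# Simulated stand-in with the same layout as the study data (54 rows, 5 columns).
# The LifeinHours values are random placeholders, NOT the real measurements.
set.seed(1)
lab2 <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# Reorder the temperature levels so they read high, med, low
# instead of the alphabetical high, low, med.
temp_levels <- c("high", "med", "low")
lab2$RunningTemperature <- factor(lab2$RunningTemperature, levels = temp_levels)
lab2$StorageTemperature <- factor(lab2$StorageTemperature, levels = temp_levels)

# Split by formulation. subset() keeps the unused factor level,
# which is why summary(F1) reports "F2: 0".
F1 <- subset(lab2, Formulation == "F1")
F2 <- subset(lab2, Formulation == "F2")

summary(F1)
summary(F2)

# Mean life per formulation in a single call.
formulation_means <- tapply(lab2$LifeinHours, lab2$Formulation, mean)
print(formulation_means)
```

Because the values are simulated, the printed means will not match the 27.48 and 32.67 reported for the real data.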
> Lower running temperature appears to extend formulation life.
Storage temperature appears to have a less drastic impact on formulation life than running temperature. However, F2 appears to perform better at medium and low storage temperatures, and F1 performs best at low storage temperature.
According to the boxplot, machine does not seem to greatly affect the performance of F1. For F2, however, machine 2 appears to extend formulation life. Nevertheless, machine appears to have less influence in general.
> The plot suggests some interaction between storage and running temperature, since the lines are not parallel. Continue with ANOVA.
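An interaction plot of this kind can be produced with base R's `interaction.plot()`. The sketch below assumes the same `lab2` layout and uses simulated `LifeinHours` values, so the lines it draws are illustrative only; `cell_means` is a name introduced here to show the numbers behind the plot.

```r
# Simulated stand-in for the study data; LifeinHours is random, NOT the real measurements.
set.seed(1)
lab2 <- expand.grid(
  RunningTemperature = c("high", "med", "low"),
  StorageTemperature = c("high", "med", "low"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# Mean life for each running x storage combination (rows: running, columns: storage).
cell_means <- with(
  lab2,
  tapply(LifeinHours, list(RunningTemperature, StorageTemperature), mean)
)
print(cell_means)

# One line per storage temperature; non-parallel lines hint at an interaction.
with(lab2, interaction.plot(
  x.factor     = RunningTemperature,
  trace.factor = StorageTemperature,
  response     = LifeinHours,
  fun          = mean,
  xlab         = "Running temperature",
  ylab         = "Mean life in hours",
  trace.label  = "Storage temperature"
))
```

Non-parallel lines are only a visual cue; whether the interaction is statistically distinguishable from noise is a question for the ANOVA.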
## Call:
## aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature,
## data = lab2)
##
## Terms:
##                 RunningTemperature StorageTemperature
## Sum of Squares            770.2593           585.4815
## Deg. of Freedom                  2                  2
##                 RunningTemperature:StorageTemperature Residuals
## Sum of Squares                               201.6296  914.3333
## Deg. of Freedom                                     4        45
##
## Residual standard error: 4.507607
## Estimated effects may be unbalanced
##                                       Df Sum Sq Mean Sq F value   Pr(>F)
## RunningTemperature                     2  770.3   385.1  18.955 1.07e-06 ***
## StorageTemperature                     2  585.5   292.7  14.408 1.46e-05 ***
## RunningTemperature:StorageTemperature  4  201.6    50.4   2.481   0.0572 .
## Residuals                             45  914.3    20.3
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA suggests that both running and storage temperature are significant (p < 0.001 for each). The interaction, on the other hand, is weaker than I had hoped: it falls just short of significance at the 0.05 level (p = 0.0572). Proceed to the TukeyHSD test.
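For reference, the two-way model shown in the `Call:` output can be fitted roughly as follows. The formula and the name `lab2` come from that output; the data are simulated placeholders, and `anova_table` and `p_interaction` are illustrative names, so the numbers printed will differ from the table above.

```r
# Simulated stand-in for the study data; LifeinHours is random, NOT the real measurements.
set.seed(1)
lab2 <- expand.grid(
  RunningTemperature = c("high", "med", "low"),
  StorageTemperature = c("high", "med", "low"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# Two-way ANOVA with interaction: A * B expands to A + B + A:B.
fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
print(summary(fit))

# The ANOVA table is the first element of the summary object.
anova_table <- summary(fit)[[1]]

# Row 3 holds the RunningTemperature:StorageTemperature interaction term.
p_interaction <- anova_table[["Pr(>F)"]][3]
print(p_interaction)
```

With 54 observations and 9 temperature combinations, the degrees of freedom are 2, 2, 4, and 45, matching the table above regardless of the response values.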
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
##
## $RunningTemperature
## diff lwr upr p adj
## med-high 5.944444 2.302879 9.586010 0.0007678
## low-high 9.111111 5.469545 12.752677 0.0000007
## low-med 3.166667 -0.474899 6.808232 0.0996512
For running temperature, lower temperatures help formulation life. Both med and low outlast high (p adj < 0.001 for each), while the low-med difference of 3.17 hours is not significant at the 0.05 level (p adj = 0.0997). Ordered by mean life across running temperatures: Low > Med > High
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
##
## $StorageTemperature
## diff lwr upr p adj
## med-high 4.888889 1.2473232 8.530455 0.0060161
## low-high 8.000000 4.3584344 11.641566 0.0000092
## low-med 3.111111 -0.5304545 6.752677 0.1075016
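The story is similar for storage temperature: med and low both outlast high (p adj = 0.006 and p adj < 0.001, respectively), while the low-med difference is not significant at the 0.05 level (p adj = 0.1075). Ordered by mean life: Low > Med > High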
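The per-factor Tukey tables above look like the result of calling `TukeyHSD()` with its `which` argument, one term at a time. A minimal sketch under that assumption, using simulated data in place of the real measurements (`tukey_run` and `tukey_store` are illustrative names):

```r
# Simulated stand-in for the study data; LifeinHours is random, NOT the real measurements.
set.seed(1)
lab2 <- expand.grid(
  RunningTemperature = c("high", "med", "low"),
  StorageTemperature = c("high", "med", "low"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)

# Restrict the comparisons to one term at a time with `which`.
tukey_run   <- TukeyHSD(fit, which = "RunningTemperature", conf.level = 0.95)
tukey_store <- TukeyHSD(fit, which = "StorageTemperature", conf.level = 0.95)

# Each element is a matrix with columns diff, lwr, upr, and "p adj".
print(tukey_run$RunningTemperature)
print(tukey_store$StorageTemperature)
```

Each row is labelled as later level minus earlier level, so with the levels ordered high, med, low a positive `diff` in the `low-high` row means low lasted longer than high.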
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
##
## $`RunningTemperature:StorageTemperature`
## diff lwr upr p adj
## med:high-high:high 4.0000000 -4.47659052 12.476591 0.8320152
## low:high-high:high 5.8333333 -2.64325719 14.309924 0.3979677
## high:med-high:high 2.6666667 -5.80992385 11.143257 0.9813461
## med:med-high:high 11.5000000 3.02340948 19.976591 0.0018626
## low:med-high:high 10.3333333 1.85674281 18.809924 0.0071759
## high:low-high:high 5.0000000 -3.47659052 13.476591 0.6032770
## med:low-high:high 10.0000000 1.52340948 18.476591 0.0103830
## low:low-high:high 18.8333333 10.35674281 27.309924 0.0000002
## low:high-med:high 1.8333333 -6.64325719 10.309924 0.9984846
## high:med-med:high -1.3333333 -9.80992385 7.143257 0.9998530
## med:med-med:high 7.5000000 -0.97659052 15.976591 0.1206672
## low:med-med:high 6.3333333 -2.14325719 14.809924 0.2916344
## high:low-med:high 1.0000000 -7.47659052 9.476591 0.9999836
## med:low-med:high 6.0000000 -2.47659052 14.476591 0.3605510
## low:low-med:high 14.8333333 6.35674281 23.309924 0.0000290
## high:med-low:high -3.1666667 -11.64325719 5.309924 0.9484958
## med:med-low:high 5.6666667 -2.80992385 14.143257 0.4370564
## low:med-low:high 4.5000000 -3.97659052 12.976591 0.7258309
## high:low-low:high -0.8333333 -9.30992385 7.643257 0.9999960
## med:low-low:high 4.1666667 -4.30992385 12.643257 0.7991722
## low:low-low:high 13.0000000 4.52340948 21.476591 0.0002984
## med:med-high:med 8.8333333 0.35674281 17.309924 0.0352795
## low:med-high:med 7.6666667 -0.80992385 16.143257 0.1046902
## high:low-high:med 2.3333333 -6.14325719 10.809924 0.9920699
## med:low-high:med 7.3333333 -1.14325719 15.809924 0.1385659
## low:low-high:med 16.1666667 7.69007615 24.643257 0.0000051
## low:med-med:med -1.1666667 -9.64325719 7.309924 0.9999465
## high:low-med:med -6.5000000 -14.97659052 1.976591 0.2604373
## med:low-med:med -1.5000000 -9.97659052 6.976591 0.9996464
## low:low-med:med 7.3333333 -1.14325719 15.809924 0.1385659
## high:low-low:med -5.3333333 -13.80992385 3.143257 0.5189883
## med:low-low:med -0.3333333 -8.80992385 8.143257 1.0000000
## low:low-low:med 8.5000000 0.02340948 16.976591 0.0488888
## med:low-high:low 5.0000000 -3.47659052 13.476591 0.6032770
## low:low-high:low 13.8333333 5.35674281 22.309924 0.0001044
## low:low-med:low 8.8333333 0.35674281 17.309924 0.0352795
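Time to check the interaction between running and storage temperature; each label reads running:storage. Of the 36 pairwise comparisons in the TukeyHSD output, 11 are significant at the 0.05 level by the adjusted p-value, and 7 of those involve the low:low combination. Low running with low storage temperature significantly outlasts every other combination except med:med (p adj = 0.1386).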
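With 36 rows to scan, it can help to filter the interaction table by adjusted p-value instead of reading it by eye. The sketch below shows one way to do that; it again uses simulated data, so it may return few or no rows, and the names `int_tab` and `sig` are illustrative.

```r
# Simulated stand-in for the study data; LifeinHours is random, NOT the real measurements.
set.seed(1)
lab2 <- expand.grid(
  RunningTemperature = c("high", "med", "low"),
  StorageTemperature = c("high", "med", "low"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)

# Pairwise comparisons among the 9 running:storage cells (choose(9, 2) = 36 rows).
tukey_int <- TukeyHSD(fit, which = "RunningTemperature:StorageTemperature")
int_tab   <- tukey_int[["RunningTemperature:StorageTemperature"]]

# Keep comparisons with adjusted p below 0.05, smallest p first.
# drop = FALSE keeps the matrix shape even when zero or one row survives.
sig <- int_tab[int_tab[, "p adj"] < 0.05, , drop = FALSE]
sig <- sig[order(sig[, "p adj"]), , drop = FALSE]
print(sig)
```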
The mean life in hours is 27.48 for Formulation 1 and 32.67 for Formulation 2, a difference of about 5.2 hours in favor of Formulation 2. The ANOVA model suggests that the formulations last longer at lower temperatures, and this holds for both running and storage temperature. The interaction between running and storage temperature falls short of significance at the 0.05 level (p = 0.0572), but the pairwise comparisons nonetheless point the same way: the low running, low storage combination lasts longest. Judging from the boxplots, machine did not play a big role in formulation performance; note that neither machine nor formulation was included as a term in the ANOVA model.