Lab Objective

You work at a chemical processing firm. The engineers are testing new formulations of a commercial-grade lubricant intended for high-use heavy machinery. They want to know how long each formulation lasts in general, and also how the formulations last under different running temperatures and after storage at different temperatures. Because these are initial tests, the predictor variables have been simplified into categories (low, medium, high heat). The tests were performed across three different machines, which might vary in some ways. Below is the data from this study. Your task is to answer the engineers' questions. Make sure to check the assumptions of your model and to provide a well-written summary of the findings.

Questions

  1. How long does each formulation last in general?
  2. How do the formulations last under different running temperatures and following different storage temperatures?

Understanding the Dataset

##  RunningTemperature StorageTemperature Formulation Machine  LifeinHours   
##  high:18            high:18            F1:27       M1:18   Min.   :16.00  
##  low :18            low :18            F2:27       M2:18   1st Qu.:25.25  
##  med :18            med :18                        M3:18   Median :29.00  
##                                                            Mean   :30.07  
##                                                            3rd Qu.:33.00  
##                                                            Max.   :47.00
## 'data.frame':    54 obs. of  5 variables:
##  $ RunningTemperature: Factor w/ 3 levels "high","low","med": 1 1 1 2 2 2 3 3 3 1 ...
##  $ StorageTemperature: Factor w/ 3 levels "high","low","med": 1 1 1 1 1 1 1 1 1 2 ...
##  $ Formulation       : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Machine           : Factor w/ 3 levels "M1","M2","M3": 1 3 2 2 3 1 2 3 1 2 ...
##  $ LifeinHours       : int  25 16 20 27 24 26 26 26 20 24 ...
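
The two printouts above have the shape of `summary()` and `str()` output. As a minimal sketch of that inspection step, assuming the data frame is called `lab2` (the name that appears in the `aov()` call later on): the study data file is not shown here, so the block builds a simulated stand-in with the same 54-run layout. The response values are made up and are not the lab's measurements.

```r
# Simulated stand-in for the study data: same 3 x 3 x 2 x 3 layout (54 runs),
# but the LifeinHours values are random placeholders, NOT the real measurements.
set.seed(42)
lab2 <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- as.integer(round(rnorm(nrow(lab2), mean = 30, sd = 5)))

summary(lab2)  # level counts for the factors, five-number summary for the response
str(lab2)      # column types and the order of the factor levels

# A balanced design has the same number of runs in every
# running x storage x formulation cell (here one run per machine).
cell_counts <- table(lab2$RunningTemperature,
                     lab2$StorageTemperature,
                     lab2$Formulation)
print(cell_counts)
```

Checking the cell counts first is useful because the later ANOVA and Tukey comparisons are simplest to interpret when the design is balanced.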

Subsetting the Dataset by Formulation and Defining Factor Order

## 'data.frame':    54 obs. of  5 variables:
##  $ RunningTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 3 3 3 2 2 2 1 ...
##  $ StorageTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 1 1 1 1 1 1 3 ...
##  $ Formulation       : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Machine           : Factor w/ 3 levels "M1","M2","M3": 1 3 2 2 3 1 2 3 1 2 ...
##  $ LifeinHours       : int  25 16 20 27 24 26 26 26 20 24 ...
## 'data.frame':    27 obs. of  5 variables:
##  $ RunningTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 3 3 3 2 2 2 1 ...
##  $ StorageTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 1 1 1 1 1 1 3 ...
##  $ Formulation       : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Machine           : Factor w/ 3 levels "M1","M2","M3": 1 3 2 2 3 1 2 3 1 2 ...
##  $ LifeinHours       : int  25 16 20 27 24 26 26 26 20 24 ...
## 'data.frame':    27 obs. of  5 variables:
##  $ RunningTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 3 3 3 2 2 2 1 ...
##  $ StorageTemperature: Factor w/ 3 levels "high","med","low": 1 1 1 1 1 1 1 1 1 3 ...
##  $ Formulation       : Factor w/ 2 levels "F1","F2": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Machine           : Factor w/ 3 levels "M1","M2","M3": 2 1 3 2 3 1 1 2 3 3 ...
##  $ LifeinHours       : int  26 24 24 34 28 31 26 31 30 27 ...
##  RunningTemperature StorageTemperature Formulation Machine  LifeinHours   
##  high:9             high:9             F1:27       M1:9    Min.   :16.00  
##  med :9             med :9             F2: 0       M2:9    1st Qu.:24.00  
##  low :9             low :9                         M3:9    Median :26.00  
##                                                            Mean   :27.48  
##                                                            3rd Qu.:32.00  
##                                                            Max.   :39.00
##  RunningTemperature StorageTemperature Formulation Machine  LifeinHours   
##  high:9             high:9             F1: 0       M1:9    Min.   :21.00  
##  med :9             med :9             F2:27       M2:9    1st Qu.:27.50  
##  low :9             low :9                         M3:9    Median :31.00  
##                                                            Mean   :32.67  
##                                                            3rd Qu.:37.50  
##                                                            Max.   :47.00

> The temperature levels are reordered to high, med, low, and the dataset is separated by formulation into two subsets, F1 and F2. According to summary(), the mean life in hours is 27.48 for F1 and 32.67 for F2. The boxplot shows F2 outperforming F1: even the shortest F2 life (21 hours) is above the shortest F1 life (16 hours).
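
The reordering and subsetting described above might look like the following sketch. The names `lab2`, `F1` and `F2` follow the text. The data are a simulated stand-in with made-up response values, so the printed means will not match the 27.48 and 32.67 reported for the real study.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lab2 <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# Put the temperature levels in a meaningful order instead of alphabetical.
temp_order <- c("high", "med", "low")
lab2$RunningTemperature <- factor(lab2$RunningTemperature, levels = temp_order)
lab2$StorageTemperature <- factor(lab2$StorageTemperature, levels = temp_order)

# One subset per formulation; subset() keeps the unused factor level,
# which is why summary() still lists the other formulation with a count of 0.
F1 <- subset(lab2, Formulation == "F1")
F2 <- subset(lab2, Formulation == "F2")

summary(F1)
summary(F2)

# Mean life per formulation and a side-by-side boxplot.
form_means <- tapply(lab2$LifeinHours, lab2$Formulation, mean)
print(form_means)
boxplot(LifeinHours ~ Formulation, data = lab2, ylab = "Life in hours")
```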

Evaluating Other Factors

Running Temperature

> Lower running temperature appears to extend formulation life.

Storage Temperature

Storage temperature does not appear to have as drastic an impact on formulation life as running temperature. Note, however, that F2 appears to perform better at medium and low storage temperatures, and F1 performs best at low storage temperature.

Machine

According to the boxplot, machine does not seem to greatly affect the performance of F1. For F2, however, machine 2 appears to help extend formulation life. Overall, machine appears to matter less than the temperature factors.
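
The per-factor boxplots discussed in this section could be drawn as in the sketch below, with one panel per factor and one box per formulation within each level. The data are again a simulated stand-in (placeholder values), and the panel layout is an assumption, not necessarily how the original plots were arranged.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lvl <- c("high", "med", "low")
lab2 <- expand.grid(
  RunningTemperature = lvl,
  StorageTemperature = lvl,
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# One panel per factor; within each panel, one box per formulation x level.
op <- par(mfrow = c(1, 3), las = 2)
for (f in c("RunningTemperature", "StorageTemperature", "Machine")) {
  groups <- split(lab2$LifeinHours, list(lab2$Formulation, lab2[[f]]))
  boxplot(groups, main = f, ylab = "Life in hours")
}
par(op)

# Numeric companion to the machine panel: mean life per formulation and machine.
machine_means <- aggregate(LifeinHours ~ Formulation + Machine,
                           data = lab2, FUN = mean)
print(machine_means)
```

The table of means gives a number to set beside each visual impression, for example the apparent advantage of machine 2 for F2.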

Interactions

> The interaction plot suggests some interaction between storage and running temperature, since the lines are not parallel. An ANOVA is used next to test this formally.
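
An interaction plot of this kind can be produced with base R's `interaction.plot()`. The sketch below uses a simulated stand-in for the data (placeholder values), so its lines will not look like the ones described above. The axis labels are illustrative choices.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lvl <- c("high", "med", "low")
lab2 <- expand.grid(
  RunningTemperature = lvl,
  StorageTemperature = lvl,
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# One line per storage temperature, running temperature on the x axis.
# Roughly parallel lines suggest no interaction; crossing or diverging
# lines suggest that the effect of one factor depends on the other.
with(lab2, interaction.plot(
  x.factor     = RunningTemperature,
  trace.factor = StorageTemperature,
  response     = LifeinHours,
  fun          = mean,
  xlab         = "Running temperature",
  ylab         = "Mean life in hours",
  trace.label  = "Storage"
))

# The plotted points are the 3 x 3 table of cell means.
cell_means <- with(lab2, tapply(LifeinHours,
                                list(RunningTemperature, StorageTemperature),
                                mean))
print(cell_means)
```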

ANOVA

## Call:
##    aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, 
##     data = lab2)
## 
## Terms:
##                 RunningTemperature StorageTemperature
## Sum of Squares            770.2593           585.4815
## Deg. of Freedom                  2                  2
##                 RunningTemperature:StorageTemperature Residuals
## Sum of Squares                               201.6296  914.3333
## Deg. of Freedom                                     4        45
## 
## Residual standard error: 4.507607
## Estimated effects may be unbalanced
##                                       Df Sum Sq Mean Sq F value   Pr(>F)    
## RunningTemperature                     2  770.3   385.1  18.955 1.07e-06 ***
## StorageTemperature                     2  585.5   292.7  14.408 1.46e-05 ***
## RunningTemperature:StorageTemperature  4  201.6    50.4   2.481   0.0572 .  
## Residuals                             45  914.3    20.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

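The ANOVA indicates that both running temperature (F = 18.955, p = 1.07e-06) and storage temperature (F = 14.408, p = 1.46e-05) are significant. The interaction is weaker than expected: with p = 0.0572 it falls just short of significance at the 0.05 level. Next, a TukeyHSD test compares the individual levels.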
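
The lab objective asks for the model assumptions to be checked. The following is a minimal sketch of how the two-way model could be fitted and its residuals examined, assuming base R tests are acceptable (Shapiro-Wilk for normality, Bartlett for equal variances). The choice of tests is an assumption, and the data are a simulated stand-in with placeholder values. The last lines recompute the interaction F statistic and p-value from the sums of squares printed in the table above.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lvl <- c("high", "med", "low")
lab2 <- expand.grid(
  RunningTemperature = lvl,
  StorageTemperature = lvl,
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# Two-way ANOVA with interaction, same formula as in the output above.
fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
print(summary(fit))

# Assumption checks on the fitted model.
print(shapiro.test(residuals(fit)))          # normality of residuals
cell <- interaction(lab2$RunningTemperature, lab2$StorageTemperature)
print(bartlett.test(lab2$LifeinHours, cell)) # equal variance across the 9 cells
plot(fit, which = 1:2)                       # residuals vs fitted, normal Q-Q

# Recompute the interaction test from the sums of squares reported above:
# F = (SS_interaction / df_interaction) / (SS_residual / df_residual).
f_int <- (201.6296 / 4) / (914.3333 / 45)
p_int <- pf(f_int, df1 = 4, df2 = 45, lower.tail = FALSE)
print(c(F = f_int, p = p_int))
```

If either check fails clearly on the real data, a transformation of the response or a rank-based alternative would be worth considering before the F tests are relied on.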

TukeyHSD

Running Temperature

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
## 
## $RunningTemperature
##              diff       lwr       upr     p adj
## med-high 5.944444  2.302879  9.586010 0.0007678
## low-high 9.111111  5.469545 12.752677 0.0000007
## low-med  3.166667 -0.474899  6.808232 0.0996512

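For running temperature, lower temperatures clearly help formulation life. Both med and low running temperatures last significantly longer than high (by about 5.9 and 9.1 hours, respectively). The observed ranking is Low > Med > High, although the difference between low and med (about 3.2 hours, p adj = 0.0997) is not significant at the 0.05 level.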

Storage Temperature

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
## 
## $StorageTemperature
##              diff        lwr       upr     p adj
## med-high 4.888889  1.2473232  8.530455 0.0060161
## low-high 8.000000  4.3584344 11.641566 0.0000092
## low-med  3.111111 -0.5304545  6.752677 0.1075016

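The story is similar for storage temperature: med and low storage both last significantly longer than high (by about 4.9 and 8.0 hours, respectively). The observed ranking is again Low > Med > High, with the low-med difference (about 3.1 hours, p adj = 0.1075) not significant at the 0.05 level.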
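
The main-effect comparisons above can be requested from `TukeyHSD()` by naming the terms in its `which` argument. The sketch below runs on a simulated stand-in for the data (placeholder values), so only the structure of its output matches the tables above. The final lines use the residual mean square from the printed ANOVA table to recompute the half-width of the main-effect intervals, as a check on where the `lwr` and `upr` columns come from.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lvl <- c("high", "med", "low")
lab2 <- expand.grid(
  RunningTemperature = lvl,
  StorageTemperature = lvl,
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)

# Pairwise comparisons for the two main effects only.
tk <- TukeyHSD(fit,
               which = c("RunningTemperature", "StorageTemperature"),
               conf.level = 0.95)
print(tk$RunningTemperature)
print(tk$StorageTemperature)

# Half-width of a main-effect interval for the reported study:
# q(0.95; 3 means, 45 df) * sqrt(MSE / n), with n = 18 runs per level
# and MSE = 914.3333 / 45 taken from the ANOVA table above.
mse_reported <- 914.3333 / 45
half_width   <- qtukey(0.95, nmeans = 3, df = 45) * sqrt(mse_reported / 18)
print(half_width)  # compare with upr - diff in the tables above
```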

Interaction

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)
## 
## $`RunningTemperature:StorageTemperature`
##                          diff          lwr       upr     p adj
## med:high-high:high  4.0000000  -4.47659052 12.476591 0.8320152
## low:high-high:high  5.8333333  -2.64325719 14.309924 0.3979677
## high:med-high:high  2.6666667  -5.80992385 11.143257 0.9813461
## med:med-high:high  11.5000000   3.02340948 19.976591 0.0018626
## low:med-high:high  10.3333333   1.85674281 18.809924 0.0071759
## high:low-high:high  5.0000000  -3.47659052 13.476591 0.6032770
## med:low-high:high  10.0000000   1.52340948 18.476591 0.0103830
## low:low-high:high  18.8333333  10.35674281 27.309924 0.0000002
## low:high-med:high   1.8333333  -6.64325719 10.309924 0.9984846
## high:med-med:high  -1.3333333  -9.80992385  7.143257 0.9998530
## med:med-med:high    7.5000000  -0.97659052 15.976591 0.1206672
## low:med-med:high    6.3333333  -2.14325719 14.809924 0.2916344
## high:low-med:high   1.0000000  -7.47659052  9.476591 0.9999836
## med:low-med:high    6.0000000  -2.47659052 14.476591 0.3605510
## low:low-med:high   14.8333333   6.35674281 23.309924 0.0000290
## high:med-low:high  -3.1666667 -11.64325719  5.309924 0.9484958
## med:med-low:high    5.6666667  -2.80992385 14.143257 0.4370564
## low:med-low:high    4.5000000  -3.97659052 12.976591 0.7258309
## high:low-low:high  -0.8333333  -9.30992385  7.643257 0.9999960
## med:low-low:high    4.1666667  -4.30992385 12.643257 0.7991722
## low:low-low:high   13.0000000   4.52340948 21.476591 0.0002984
## med:med-high:med    8.8333333   0.35674281 17.309924 0.0352795
## low:med-high:med    7.6666667  -0.80992385 16.143257 0.1046902
## high:low-high:med   2.3333333  -6.14325719 10.809924 0.9920699
## med:low-high:med    7.3333333  -1.14325719 15.809924 0.1385659
## low:low-high:med   16.1666667   7.69007615 24.643257 0.0000051
## low:med-med:med    -1.1666667  -9.64325719  7.309924 0.9999465
## high:low-med:med   -6.5000000 -14.97659052  1.976591 0.2604373
## med:low-med:med    -1.5000000  -9.97659052  6.976591 0.9996464
## low:low-med:med     7.3333333  -1.14325719 15.809924 0.1385659
## high:low-low:med   -5.3333333 -13.80992385  3.143257 0.5189883
## med:low-low:med    -0.3333333  -8.80992385  8.143257 1.0000000
## low:low-low:med     8.5000000   0.02340948 16.976591 0.0488888
## med:low-high:low    5.0000000  -3.47659052 13.476591 0.6032770
## low:low-high:low   13.8333333   5.35674281 22.309924 0.0001044
## low:low-med:low     8.8333333   0.35674281 17.309924 0.0352795

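The table above compares every pair of running and storage temperature combinations (each label reads running:storage). Judging by the p adj column, 11 of the 36 comparisons are significant at the 0.05 level, and most of them involve a low temperature. In particular, the low:low combination lasts significantly longer than every other combination except med:med, and it outlasts high:high by about 18.8 hours.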
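
With 36 rows, the interaction table is easier to read once it is filtered down to the significant pairs. A minimal sketch of that filtering step follows, using a simulated stand-in for the data (placeholder values). Because the stand-in responses are pure noise, the filtered table may well be empty here. On the real data it would list the significant rows.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lvl <- c("high", "med", "low")
lab2 <- expand.grid(
  RunningTemperature = lvl,
  StorageTemperature = lvl,
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature, data = lab2)

# All pairwise comparisons between the 9 running:storage combinations.
tk_int <- TukeyHSD(fit, which = "RunningTemperature:StorageTemperature")[[1]]

# Keep only pairs with an adjusted p-value below 0.05, smallest first.
sig <- tk_int[tk_int[, "p adj"] < 0.05, , drop = FALSE]
sig <- sig[order(sig[, "p adj"]), , drop = FALSE]
print(sig)

# Count how many of the significant pairs involve the low:low combination.
n_low_low <- sum(grepl("low:low", rownames(sig), fixed = TRUE))
print(n_low_low)
```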

Summary

The mean life in hours is 27.48 for Formulation 1 and 32.67 for Formulation 2, so F2 lasts about 5.2 hours longer on average. The ANOVA model, fitted to both formulations together, suggests that the lubricant performs better at lower temperatures, for both running and storage temperature. The interaction between running and storage temperature falls just short of significance (p = 0.0572), but the pairwise comparisons point the same way: the combination of low running and low storage temperature gives the longest life. Machine did not appear to play a big role in formulation performance, although this rests on the boxplots alone, since machine was not included in the ANOVA model.
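
The statements about formulation and machine rest on means and boxplots. One way to test them formally would be to add both factors to the model as additive terms. This is a sketch of a possible extension, not the model fitted in this report, and it runs on a simulated stand-in for the data (placeholder values), so its output says nothing about the real lubricants.

```r
# Simulated stand-in for the study data (placeholder values, not the lab's).
set.seed(42)
lvl <- c("high", "med", "low")
lab2 <- expand.grid(
  RunningTemperature = lvl,
  StorageTemperature = lvl,
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
lab2$LifeinHours <- round(rnorm(nrow(lab2), mean = 30, sd = 5))

# Extended model (an assumption of this sketch): formulation and machine
# enter additively, next to the temperature terms used above.
fit_full <- aov(LifeinHours ~ Formulation + Machine +
                  RunningTemperature * StorageTemperature,
                data = lab2)
print(summary(fit_full))

# Estimated F2 - F1 difference with a 95% confidence interval.
tk_form <- TukeyHSD(fit_full, which = "Formulation")
print(tk_form$Formulation)
```

A significant Formulation term would support the claim that F2 outlasts F1, and a non-significant Machine term would support treating machine as a minor factor.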