Objectives

You work at a chemical processing firm. The engineers are testing new formulations of a commercial-grade lubricant that will be used in high-use heavy machinery. They want to know how long each formulation lasts in general, and also how long it lasts under different running temperatures and after storage at different temperatures. Since these are initial tests, they have simplified the predictor variables into categories (low, medium, high heat). The tests were performed across three different machines, which might vary in some ways. Below is the data from this study. Your task is to provide answers to their questions. Make sure to check the assumptions of your model and to provide a well-written summary of the findings.

Loading the “moments” package

library(moments)

Loading the dataset

library(readxl)
Lab2 <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Labs/Lab 2/WeeklyLab2Data.xlsx")
Lab2
## # A tibble: 54 x 5
##    RunningTemperature StorageTemperature Formulation Machine LifeinHours
##    <chr>              <chr>              <chr>       <chr>         <dbl>
##  1 high               high               F1          M1               25
##  2 high               high               F1          M3               16
##  3 high               high               F1          M2               20
##  4 low                high               F1          M2               27
##  5 low                high               F1          M3               24
##  6 low                high               F1          M1               26
##  7 med                high               F1          M2               26
##  8 med                high               F1          M3               26
##  9 med                high               F1          M1               20
## 10 high               low                F1          M2               24
## # … with 44 more rows

Let’s get the mean life of each formulation using the aggregate function

aggregate(Lab2$LifeinHours ~ Lab2$Formulation, Lab2, mean)
##   Lab2$Formulation Lab2$LifeinHours
## 1               F1         27.48148
## 2               F2         32.66667

This answers the engineers’ first question: Formulation F1 lasts about 27.5 hours on average and Formulation F2 lasts about 32.7 hours on average
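The two means above are descriptive only. One way to check whether the gap between F1 and F2 is larger than run-to-run noise is a Welch two-sample t-test. The sketch below is not part of the original analysis: it uses a simulated stand-in called `sim` (a hypothetical name) with the same column names as `Lab2`, and it assumes the 54 runs form a full 3 x 3 x 2 x 3 layout. The `LifeinHours` values in it are random, not the study's measurements.

```r
# Simulated stand-in with the same column names as Lab2.
# LifeinHours is random noise here, NOT the study's data.
set.seed(510)
sim <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
sim$LifeinHours <- rnorm(nrow(sim), mean = 30, sd = 5)

# Welch two-sample t-test: does mean life differ between F1 and F2?
formulation_test <- t.test(LifeinHours ~ Formulation, data = sim)

formulation_test$estimate  # the two group means
formulation_test$conf.int  # 95% confidence interval for the difference
formulation_test$p.value   # p-value for the difference in means
```

With the real data, `sim` would be replaced by `Lab2`. This test ignores temperature and machine, so it is only a rough first look at the formulation difference.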

Now, let’s plot the distribution of life in hours to check for skewness

plot(density(Lab2$LifeinHours))

From the above plot, life in hours looks approximately normally distributed
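A density plot is only a visual check. The moments package loaded at the top of this lab can put numbers on the shape of the distribution. The sketch below wraps `skewness()` and `kurtosis()` in a small helper, `shape_summary` (a hypothetical name), and applies it to the ten `LifeinHours` values printed in the data preview above, since the full data set is not reproduced here.

```r
library(moments)  # provides skewness() and kurtosis()

# Summarise the shape of a numeric vector.
# For a normal distribution, skewness is near 0 and
# kurtosis (Pearson's, as returned by moments) is near 3.
shape_summary <- function(x) {
  c(skewness = skewness(x), kurtosis = kurtosis(x))
}

# The first ten LifeinHours values shown in the tibble preview
first_ten_life <- c(25, 16, 20, 27, 24, 26, 26, 26, 20, 24)
shape_summary(first_ten_life)
```

For the full check, the call would be `shape_summary(Lab2$LifeinHours)`.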

Let’s visualize the data on different factors such as running temperature, storage temperature, etc.

boxplot(LifeinHours ~ RunningTemperature, data = Lab2,
        main = "Impact of running temperature on duration of formulation",
        xlab = "Running Temperature", ylab = "Life in hours")

boxplot(LifeinHours ~ StorageTemperature, data = Lab2,
        main = "Impact of storage temperature on duration of formulation",
        xlab = "Storage Temperature", ylab = "Life in hours")

boxplot(LifeinHours ~ Machine, data = Lab2,
        main = "Impact of machine on duration of formulation",
        xlab = "Machine in use", ylab = "Life in hours")

Based on the box plots, the typical life of the lubricant varies across running temperatures and across storage temperatures

The variation across machines is relatively small, so life is similar from one machine to another

We therefore treat machine as a nuisance factor and use it as a blocking variable in the ANOVA model we will build later on. The main factors in that model are running temperature and storage temperature

The two temperatures might also interact in their effect on the life of the formulations. Let’s plot this interaction

interaction.plot(Lab2$RunningTemperature, Lab2$StorageTemperature, Lab2$LifeinHours, fun = mean, fixed = TRUE, trace.label = "Storage Temperature", xlab = "Running Temperature", ylab = "Average Life in Hours", leg.bg = par("bg"), leg.bty = "n")

Based on the above plot, there appears to be some interaction between running and storage temperature: the lines are not parallel, and they cross at the medium running temperature

Therefore, the life of a formulation appears to be affected in two ways: through the main effects of running temperature and storage temperature, and through an interaction between the two
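The interaction plot draws one line per storage temperature through the mean life at each running temperature. The same cell means can be tabulated directly, which makes it easier to see how far the lines are from parallel. This is a minimal sketch on a simulated stand-in, `sim` (a hypothetical name, with random `LifeinHours` values and an assumed full 3 x 3 x 2 x 3 layout), not on the study's data.

```r
# Simulated stand-in with the same column names as Lab2.
# LifeinHours is random noise here, NOT the study's data.
set.seed(510)
sim <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
sim$LifeinHours <- rnorm(nrow(sim), mean = 30, sd = 5)

# Mean life in each running x storage cell:
# rows are running temperatures, columns are storage temperatures.
cell_means <- with(sim, tapply(
  LifeinHours,
  list(Running = RunningTemperature, Storage = StorageTemperature),
  mean
))
round(cell_means, 2)

# Gap between low and high storage at each running temperature.
# With no interaction, this gap would be roughly constant down the rows.
storage_gap <- cell_means[, "low"] - cell_means[, "high"]
round(storage_gap, 2)
```

With the real data, `sim` would be replaced by `Lab2`.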

Let’s build our ANOVA model using the machine in use as a block

AnovaModel <- aov(Lab2$LifeinHours ~ Lab2$RunningTemperature*Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
summary(AnovaModel)
##                                                 Df Sum Sq Mean Sq F value
## Lab2$RunningTemperature                          2  770.3   385.1  19.476
## Lab2$StorageTemperature                          2  585.5   292.7  14.804
## Lab2$Machine                                     2   64.0    32.0   1.619
## Lab2$RunningTemperature:Lab2$StorageTemperature  4  201.6    50.4   2.549
## Residuals                                       43  850.3    19.8        
##                                                   Pr(>F)    
## Lab2$RunningTemperature                         9.51e-07 ***
## Lab2$StorageTemperature                         1.28e-05 ***
## Lab2$Machine                                      0.2099    
## Lab2$RunningTemperature:Lab2$StorageTemperature   0.0528 .  
## Residuals                                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA results show that both running temperature (p < 0.001) and storage temperature (p < 0.001) have a significant effect on life. The interaction is not significant at the 0.05 level, although it is close (p = 0.0528). Machine, the blocking factor, is not significant (p = 0.21).
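The objectives ask for the model's assumptions to be checked. A common way to do that for an ANOVA is to test the residuals for normality and the cells for equal variance. The sketch below shows one possible set of checks; it is not part of the original analysis. It fits the same model structure to a simulated stand-in, `sim` (a hypothetical name, with random `LifeinHours` values and an assumed full 3 x 3 x 2 x 3 layout), so the numbers it produces say nothing about the real study.

```r
# Simulated stand-in with the same column names as Lab2.
# LifeinHours is random noise here, NOT the study's data.
set.seed(510)
sim <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
sim$LifeinHours <- rnorm(nrow(sim), mean = 30, sd = 5)

# Same model structure as AnovaModel, fitted to the stand-in data
fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature + Machine,
           data = sim)

# Normality of residuals: Shapiro-Wilk test
# (a small p-value suggests the residuals are not normal)
normality <- shapiro.test(residuals(fit))

# Equal variance across the nine running x storage cells: Bartlett's test
sim$cell <- interaction(sim$RunningTemperature, sim$StorageTemperature)
homogeneity <- bartlett.test(LifeinHours ~ cell, data = sim)

normality$p.value
homogeneity$p.value

# Visual checks: residuals vs fitted values, and a normal Q-Q plot
plot(fit, which = 1:2)
```

On the real data, the same calls would take `AnovaModel` in place of `fit` and `Lab2` in place of `sim`.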

To see which levels of each factor differ from one another, we can perform Tukey’s HSD test

Tukey’s HSD test on Running Temperature

TukeyHSD(AnovaModel, "Lab2$RunningTemperature", conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Lab2$LifeinHours ~ Lab2$RunningTemperature * Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
## 
## $`Lab2$RunningTemperature`
##               diff       lwr       upr     p adj
## low-high  9.111111  5.512970 12.709252 0.0000007
## med-high  5.944444  2.346304  9.542585 0.0006825
## med-low  -3.166667 -6.764807  0.431474 0.0943866

The Tukey test shows that the formulations last significantly longer at low and medium running temperatures than at the high running temperature: about 9.1 hours longer at low and about 5.9 hours longer at medium. The difference between medium and low is not significant (p = 0.094).

Tukey’s HSD test on Storage Temperature

TukeyHSD(AnovaModel, "Lab2$StorageTemperature", conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Lab2$LifeinHours ~ Lab2$RunningTemperature * Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
## 
## $`Lab2$StorageTemperature`
##               diff       lwr        upr     p adj
## low-high  8.000000  4.401859 11.5981407 0.0000081
## med-high  4.888889  1.290748  8.4870296 0.0054519
## med-low  -3.111111 -6.709252  0.4870296 0.1019761

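The pattern for storage temperature is similar: life is about 8.0 hours longer after low-temperature storage and about 4.9 hours longer after medium-temperature storage than after high-temperature storage, while the difference between medium and low is not significant (p = 0.102).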

Let’s try Tukey on the interaction effect

TukeyHSD(AnovaModel, "Lab2$RunningTemperature:Lab2$StorageTemperature", conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Lab2$LifeinHours ~ Lab2$RunningTemperature * Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
## 
## $`Lab2$RunningTemperature:Lab2$StorageTemperature`
##                           diff         lwr        upr     p adj
## low:high-high:high   5.8333333  -2.5479702 14.2146369 0.3807252
## med:high-high:high   4.0000000  -4.3813036 12.3813036 0.8213051
## high:low-high:high   5.0000000  -3.3813036 13.3813036 0.5862276
## low:low-high:high   18.8333333  10.4520298 27.2146369 0.0000001
## med:low-high:high   10.0000000   1.6186964 18.3813036 0.0092777
## high:med-high:high   2.6666667  -5.7146369 11.0479702 0.9796005
## low:med-high:high   10.3333333   1.9520298 18.7146369 0.0063900
## med:med-high:high   11.5000000   3.1186964 19.8813036 0.0016438
## med:high-low:high   -1.8333333 -10.2146369  6.5479702 0.9983179
## high:low-low:high   -0.8333333  -9.2146369  7.5479702 0.9999955
## low:low-low:high    13.0000000   4.6186964 21.3813036 0.0002622
## med:low-low:high     4.1666667  -4.2146369 12.5479702 0.7871033
## high:med-low:high   -3.1666667 -11.5479702  5.2146369 0.9442452
## low:med-low:high     4.5000000  -3.8813036 12.8813036 0.7113060
## med:med-low:high     5.6666667  -2.7146369 14.0479702 0.4194657
## high:low-med:high    1.0000000  -7.3813036  9.3813036 0.9999816
## low:low-med:high    14.8333333   6.4520298 23.2146369 0.0000256
## med:low-med:high     6.0000000  -2.3813036 14.3813036 0.3438166
## high:med-med:high   -1.3333333  -9.7146369  7.0479702 0.9998356
## low:med-med:high     6.3333333  -2.0479702 14.7146369 0.2763025
## med:med-med:high     7.5000000  -0.8813036 15.8813036 0.1118403
## low:low-high:low    13.8333333   5.4520298 22.2146369 0.0000918
## med:low-high:low     5.0000000  -3.3813036 13.3813036 0.5862276
## high:med-high:low   -2.3333333 -10.7146369  6.0479702 0.9912726
## low:med-high:low     5.3333333  -3.0479702 13.7146369 0.5012580
## med:med-high:low     6.5000000  -1.8813036 14.8813036 0.2459501
## med:low-low:low     -8.8333333 -17.2146369 -0.4520298 0.0319946
## high:med-low:low   -16.1666667 -24.5479702 -7.7853631 0.0000046
## low:med-low:low     -8.5000000 -16.8813036 -0.1186964 0.0445592
## med:med-low:low     -7.3333333 -15.7146369  1.0479702 0.1288173
## high:med-med:low    -7.3333333 -15.7146369  1.0479702 0.1288173
## low:med-med:low      0.3333333  -8.0479702  8.7146369 1.0000000
## med:med-med:low      1.5000000  -6.8813036  9.8813036 0.9996056
## low:med-high:med     7.6666667  -0.7146369 16.0479702 0.0967455
## med:med-high:med     8.8333333   0.4520298 17.2146369 0.0319946
## med:med-low:med      1.1666667  -7.2146369  9.5479702 0.9999401

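Of the 36 pairwise comparisons among the nine running:storage combinations, 11 are significant at the 0.05 level. Ten of these involve either the high:high combination, which has the shortest mean life, or the low:low combination, which has the longest; the remaining one is med:med versus high:med. This is broadly consistent with the two Tukey tests above: life is longest when both temperatures are low and shortest when both are high. Because the interaction term itself was not significant in the ANOVA (p = 0.0528), these cell-level comparisons should be read with caution.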
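The count of 36 comes from comparing every pair among the nine running:storage cells. Scanning that many rows by eye is error-prone, so it can help to filter the Tukey table down to the significant comparisons. The sketch below shows one way to do this on a simulated stand-in, `sim` (a hypothetical name, with random `LifeinHours` values and an assumed full 3 x 3 x 2 x 3 layout); the rows it keeps are not the study's results.

```r
# Number of distinct pairs among the 9 running x storage cells
choose(9, 2)  # 36

# Simulated stand-in with the same column names as Lab2.
# LifeinHours is random noise here, NOT the study's data.
set.seed(510)
sim <- expand.grid(
  RunningTemperature = c("high", "low", "med"),
  StorageTemperature = c("high", "low", "med"),
  Formulation        = c("F1", "F2"),
  Machine            = c("M1", "M2", "M3")
)
sim$LifeinHours <- rnorm(nrow(sim), mean = 30, sd = 5)

fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature + Machine,
           data = sim)

# Tukey comparisons for the interaction term, as a numeric matrix
term <- "RunningTemperature:StorageTemperature"
comparisons <- TukeyHSD(fit, term, conf.level = 0.95)[[term]]

# Keep only the rows whose adjusted p-value is below 0.05
significant <- comparisons[comparisons[, "p adj"] < 0.05, , drop = FALSE]
nrow(significant)
significant
```

On the real model, the term name would be the one used in the fit above, `"Lab2$RunningTemperature:Lab2$StorageTemperature"`, and `fit` would be `AnovaModel`.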

Conclusion

Overall, Formulation F1 lasts about 27.5 hours on average and Formulation F2 lasts about 32.7 hours on average. Formulation was not a term in the ANOVA model, so this difference is descriptive rather than formally tested. The ANOVA model shows that the formulations last longer at lower running and storage temperatures, which suggests that they are best run and stored at lower temperatures. There is some indication of an interaction between running and storage temperature, but it is not significant at the 0.05 level (p = 0.0528). Machine does not have a significant effect on the life of a formulation (p = 0.21).