You work at a chemical processing firm. The engineers are testing new formulations of a commercial-grade lubricant that will be used in high-use heavy machinery. They want to know how long each formulation lasts in general, and also how long the formulations last under different running temperatures and after storage at different temperatures. Since these are initial tests, they have simplified the predictor variables into categories (low, medium, high heat). The tests were performed across three different machines, which might vary in some ways. Below is the data from this study. Your task is to provide answers to their questions. Make sure to check the assumptions of your model and to provide a well-written summary of the findings.
Loading the “moments” package
library(moments)
Loading the dataset
library(readxl)
Lab2 <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Labs/Lab 2/WeeklyLab2Data.xlsx")
Lab2
## # A tibble: 54 x 5
## RunningTemperature StorageTemperature Formulation Machine LifeinHours
## <chr> <chr> <chr> <chr> <dbl>
## 1 high high F1 M1 25
## 2 high high F1 M3 16
## 3 high high F1 M2 20
## 4 low high F1 M2 27
## 5 low high F1 M3 24
## 6 low high F1 M1 26
## 7 med high F1 M2 26
## 8 med high F1 M3 26
## 9 med high F1 M1 20
## 10 high low F1 M2 24
## # … with 44 more rows
Let’s get the mean life of each formulation using the aggregate function
aggregate(Lab2$LifeinHours ~ Lab2$Formulation, Lab2, mean)
## Lab2$Formulation Lab2$LifeinHours
## 1 F1 27.48148
## 2 F2 32.66667
The above answers the engineers' first question: Formulation F1 lasts about 27.5 hours on average and Formulation F2 lasts about 32.7 hours on average
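A mean on its own hides how much the runs vary, so it can help to report the spread and the number of runs next to it. The sketch below is only an illustration: `toy` is a simulated data frame that borrows the column names of the lab data (its values are random and are not the study's measurements), and `life_summary` is a hypothetical name.

```r
# Simulated stand-in with the same layout as the lab data (54 rows).
# The values are random; they are NOT the measurements from the study.
set.seed(510)
toy <- expand.grid(
  RunningTemperature = c("low", "med", "high"),
  StorageTemperature = c("low", "med", "high"),
  Formulation = c("F1", "F2"),
  Machine = c("M1", "M2", "M3")
)
toy$LifeinHours <- round(rnorm(nrow(toy), mean = 30, sd = 5))

# Mean, standard deviation and run count per formulation.
# Using bare column names with data = keeps the output labels short.
life_summary <- aggregate(
  LifeinHours ~ Formulation,
  data = toy,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
print(life_summary)
```

Applied to the real data, the same call would take `data = Lab2` in place of `data = toy`.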
Now, let’s plot the distribution of life in hours to check for skewness
plot(density(Lab2$LifeinHours))
From the above plot, life in hours looks roughly symmetric and approximately normally distributed
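The visual impression from a density plot can be backed up with numbers. The sketch below computes moment-based skewness and kurtosis by hand and, if the "moments" package loaded at the top is installed, prints its `skewness()` and `kurtosis()` values for comparison. It runs on simulated values rather than the lab data, and `central_moment` is a hypothetical helper introduced only for this illustration.

```r
# Simulated stand-in for Lab2$LifeinHours (random values, not the study data).
set.seed(510)
life <- round(rnorm(54, mean = 30, sd = 5))

# k-th central moment, dividing by n (the population form).
central_moment <- function(x, k) mean((x - mean(x))^k)

skew <- central_moment(life, 3) / central_moment(life, 2)^1.5
kurt <- central_moment(life, 4) / central_moment(life, 2)^2
cat("skewness:", skew, " kurtosis:", kurt, "\n")

# Cross-check against the moments package when it is available.
if (requireNamespace("moments", quietly = TRUE)) {
  cat("moments::skewness:", moments::skewness(life),
      " moments::kurtosis:", moments::kurtosis(life), "\n")
}

# Shapiro-Wilk test: a small p-value suggests a departure from normality.
normality <- shapiro.test(life)
print(normality)
```

Skewness near 0 and kurtosis near 3 are what a normal distribution would give. For the ANOVA itself, the normality assumption concerns the model residuals rather than the raw response, so this is only a first look.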
Let’s visualize the data on different factors such as running temperature, storage temperature, etc.
boxplot(Lab2$LifeinHours ~ Lab2$RunningTemperature, data = Lab2, main = "Impact of running temperature on duration of formulation", xlab = "Running Temperature" , ylab = "Life in hours")
boxplot(Lab2$LifeinHours ~ Lab2$StorageTemperature, data = Lab2, main = "Impact of storage temperature on duration of formulation", xlab = "Storage Temperature" , ylab = "Life in hours")
boxplot(Lab2$LifeinHours ~ Lab2$Machine, data = Lab2, main = "Impact of machine on duration of formulation", xlab = "Machine in use" , ylab = "Life in hours")
Based on the box plots, the typical life of a formulation varies across the different running and storage temperatures
However, the variation across machines is relatively small, so the typical life is similar from one machine to the next
Therefore, the machine in use can be treated as a nuisance factor and used as a block in the ANOVA model we will build later on. The main variables for the ANOVA model are running temperature and storage temperature
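Because the temperature columns are stored as text, R orders their categories alphabetically (high, low, med) in plots and model output. Converting them to factors with an explicit level order puts them in the natural low, med, high sequence. This is a minimal sketch on simulated data with the lab data's column names; `toy` and `temp_levels` are hypothetical names.

```r
# Simulated stand-in with character columns, like the imported spreadsheet.
# The values are random; they are NOT the measurements from the study.
set.seed(510)
toy <- expand.grid(
  RunningTemperature = c("low", "med", "high"),
  StorageTemperature = c("low", "med", "high"),
  Formulation = c("F1", "F2"),
  Machine = c("M1", "M2", "M3"),
  stringsAsFactors = FALSE
)
toy$LifeinHours <- round(rnorm(nrow(toy), mean = 30, sd = 5))

# Fix the category order once, then reuse it for both temperature columns.
temp_levels <- c("low", "med", "high")
toy$RunningTemperature <- factor(toy$RunningTemperature, levels = temp_levels)
toy$StorageTemperature <- factor(toy$StorageTemperature, levels = temp_levels)
toy$Machine <- factor(toy$Machine)

# The boxes now appear in low, med, high order.
boxplot(LifeinHours ~ RunningTemperature, data = toy,
        main = "Impact of running temperature on duration of formulation",
        xlab = "Running Temperature", ylab = "Life in hours")
```

The same conversion on `Lab2` would also change the order in which Tukey comparisons are listed, without changing the size of any difference.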
Now there could be interaction effects that might be impacting the duration of the formulations. Let’s plot these interaction effects
interaction.plot(Lab2$RunningTemperature, Lab2$StorageTemperature, Lab2$LifeinHours, fun = mean, fixed = TRUE, trace.label = "Storage Temperature", xlab = "Running Temperature", ylab = "Average Life in Hours", leg.bg = par("bg"), leg.bty = "n")
Based on the above plot, there appears to be some interaction between running and storage temperature: the lines are not parallel, and they cross at the medium running temperature
Therefore, the life of a formulation may be affected in two ways: by the main effects of running and storage temperature, and by an interaction between the two
Let’s build our anova model using the machine in use as a block
AnovaModel <- aov(Lab2$LifeinHours ~ Lab2$RunningTemperature*Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
summary(AnovaModel)
## Df Sum Sq Mean Sq F value
## Lab2$RunningTemperature 2 770.3 385.1 19.476
## Lab2$StorageTemperature 2 585.5 292.7 14.804
## Lab2$Machine 2 64.0 32.0 1.619
## Lab2$RunningTemperature:Lab2$StorageTemperature 4 201.6 50.4 2.549
## Residuals 43 850.3 19.8
## Pr(>F)
## Lab2$RunningTemperature 9.51e-07 ***
## Lab2$StorageTemperature 1.28e-05 ***
## Lab2$Machine 0.2099
## Lab2$RunningTemperature:Lab2$StorageTemperature 0.0528 .
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
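The ANOVA results show that both running temperature (F = 19.48, p < 0.001) and storage temperature (F = 14.80, p < 0.001) have a significant effect on life in hours. The machine block is not significant (p = 0.21). The interaction between running and storage temperature falls just short of the 0.05 threshold (p = 0.053), so it is not significant at that level, although it is close.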
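The ANOVA p-values rest on roughly normal residuals and similar variance across the temperature cells. One possible way to check both is sketched below, using a Shapiro-Wilk test on the residuals and Bartlett's test across the nine running/storage cells. It is fitted to simulated data with the lab data's column names, so its output says nothing about the real study; `toy`, `fit`, `res`, `normality` and `homogeneity` are hypothetical names.

```r
# Simulated stand-in with the same layout as the lab data (54 rows).
# The values are random; they are NOT the measurements from the study.
set.seed(510)
toy <- expand.grid(
  RunningTemperature = c("low", "med", "high"),
  StorageTemperature = c("low", "med", "high"),
  Formulation = c("F1", "F2"),
  Machine = c("M1", "M2", "M3")
)
toy$LifeinHours <- round(rnorm(nrow(toy), mean = 30, sd = 5))

# Same model structure as above: two temperature factors, their
# interaction, and Machine as a block.
fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature + Machine,
           data = toy)
res <- residuals(fit)

# Normality of residuals: Q-Q plot plus Shapiro-Wilk test.
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)
print(normality)

# Equal variance across the nine running/storage cells.
homogeneity <- bartlett.test(
  x = toy$LifeinHours,
  g = interaction(toy$RunningTemperature, toy$StorageTemperature)
)
print(homogeneity)
```

Large p-values on both tests would be consistent with the assumptions. Bartlett's test is itself sensitive to non-normality, so the Q-Q plot is worth reading alongside it.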
To examine the individual factors further, we can run Tukey’s HSD test on each of them
Tukey’s HSD test on Running Temperature
TukeyHSD(AnovaModel, "Lab2$RunningTemperature", conf.level = 0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Lab2$LifeinHours ~ Lab2$RunningTemperature * Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
##
## $`Lab2$RunningTemperature`
## diff lwr upr p adj
## low-high 9.111111 5.512970 12.709252 0.0000007
## med-high 5.944444 2.346304 9.542585 0.0006825
## med-low -3.166667 -6.764807 0.431474 0.0943866
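The Tukey test shows that the formulations last significantly longer at low and medium running temperatures than at a high running temperature: about 9.1 hours longer at low (p < 0.001) and about 5.9 hours longer at medium (p < 0.001). The difference between medium and low running temperature, about 3.2 hours, is not significant (p = 0.094).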
Tukey’s HSD test on Storage Temperature
TukeyHSD(AnovaModel, "Lab2$StorageTemperature", conf.level = 0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Lab2$LifeinHours ~ Lab2$RunningTemperature * Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
##
## $`Lab2$StorageTemperature`
## diff lwr upr p adj
## low-high 8.000000 4.401859 11.5981407 0.0000081
## med-high 4.888889 1.290748 8.4870296 0.0054519
## med-low -3.111111 -6.709252 0.4870296 0.1019761
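The pattern matches the running temperature results: compared with high-temperature storage, life is about 8.0 hours longer after low-temperature storage (p < 0.001) and about 4.9 hours longer after medium-temperature storage (p = 0.005), while medium and low storage temperatures do not differ significantly (p = 0.10).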
Let’s try Tukey on the interaction effect
TukeyHSD(AnovaModel, "Lab2$RunningTemperature:Lab2$StorageTemperature", conf.level = 0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Lab2$LifeinHours ~ Lab2$RunningTemperature * Lab2$StorageTemperature + Lab2$Machine, data = Lab2)
##
## $`Lab2$RunningTemperature:Lab2$StorageTemperature`
## diff lwr upr p adj
## low:high-high:high 5.8333333 -2.5479702 14.2146369 0.3807252
## med:high-high:high 4.0000000 -4.3813036 12.3813036 0.8213051
## high:low-high:high 5.0000000 -3.3813036 13.3813036 0.5862276
## low:low-high:high 18.8333333 10.4520298 27.2146369 0.0000001
## med:low-high:high 10.0000000 1.6186964 18.3813036 0.0092777
## high:med-high:high 2.6666667 -5.7146369 11.0479702 0.9796005
## low:med-high:high 10.3333333 1.9520298 18.7146369 0.0063900
## med:med-high:high 11.5000000 3.1186964 19.8813036 0.0016438
## med:high-low:high -1.8333333 -10.2146369 6.5479702 0.9983179
## high:low-low:high -0.8333333 -9.2146369 7.5479702 0.9999955
## low:low-low:high 13.0000000 4.6186964 21.3813036 0.0002622
## med:low-low:high 4.1666667 -4.2146369 12.5479702 0.7871033
## high:med-low:high -3.1666667 -11.5479702 5.2146369 0.9442452
## low:med-low:high 4.5000000 -3.8813036 12.8813036 0.7113060
## med:med-low:high 5.6666667 -2.7146369 14.0479702 0.4194657
## high:low-med:high 1.0000000 -7.3813036 9.3813036 0.9999816
## low:low-med:high 14.8333333 6.4520298 23.2146369 0.0000256
## med:low-med:high 6.0000000 -2.3813036 14.3813036 0.3438166
## high:med-med:high -1.3333333 -9.7146369 7.0479702 0.9998356
## low:med-med:high 6.3333333 -2.0479702 14.7146369 0.2763025
## med:med-med:high 7.5000000 -0.8813036 15.8813036 0.1118403
## low:low-high:low 13.8333333 5.4520298 22.2146369 0.0000918
## med:low-high:low 5.0000000 -3.3813036 13.3813036 0.5862276
## high:med-high:low -2.3333333 -10.7146369 6.0479702 0.9912726
## low:med-high:low 5.3333333 -3.0479702 13.7146369 0.5012580
## med:med-high:low 6.5000000 -1.8813036 14.8813036 0.2459501
## med:low-low:low -8.8333333 -17.2146369 -0.4520298 0.0319946
## high:med-low:low -16.1666667 -24.5479702 -7.7853631 0.0000046
## low:med-low:low -8.5000000 -16.8813036 -0.1186964 0.0445592
## med:med-low:low -7.3333333 -15.7146369 1.0479702 0.1288173
## high:med-med:low -7.3333333 -15.7146369 1.0479702 0.1288173
## low:med-med:low 0.3333333 -8.0479702 8.7146369 1.0000000
## med:med-med:low 1.5000000 -6.8813036 9.8813036 0.9996056
## low:med-high:med 7.6666667 -0.7146369 16.0479702 0.0967455
## med:med-high:med 8.8333333 0.4520298 17.2146369 0.0319946
## med:med-low:med 1.1666667 -7.2146369 9.5479702 0.9999401
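Each label reads running:storage. Of the 36 pairwise comparisons between the nine running/storage cells, 11 are significant at the 0.05 level. Seven of these involve the low:low cell, which lasts significantly longer than every other cell except med:med (p = 0.13). Four involve the high:high cell, which lasts significantly less than low:low, med:low, low:med and med:med. The one remaining significant comparison is med:med versus high:med (p = 0.032). This is broadly consistent with the two Tukey tests above: life is longest when both temperatures are low and shortest when both are high. Because the interaction term itself was not significant at the 0.05 level, these cell-level comparisons are best read as exploratory.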
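With 36 rows to scan, it is easy to miscount the significant comparisons by eye. A short filter on the matrix returned by `TukeyHSD` does the counting. The sketch below uses simulated data with the lab data's column names, so the rows it flags are not the study's results; `toy`, `fit`, `term`, `cells` and `significant` are hypothetical names.

```r
# Simulated stand-in with the same layout as the lab data (54 rows).
# The values are random; they are NOT the measurements from the study.
set.seed(510)
toy <- expand.grid(
  RunningTemperature = c("low", "med", "high"),
  StorageTemperature = c("low", "med", "high"),
  Formulation = c("F1", "F2"),
  Machine = c("M1", "M2", "M3")
)
toy$LifeinHours <- round(rnorm(nrow(toy), mean = 30, sd = 5))

fit <- aov(LifeinHours ~ RunningTemperature * StorageTemperature + Machine,
           data = toy)

# TukeyHSD returns a list with one matrix per requested term; each matrix
# has the columns diff, lwr, upr and "p adj".
term <- "RunningTemperature:StorageTemperature"
cells <- TukeyHSD(fit, which = term, conf.level = 0.95)[[term]]

# Keep only the comparisons whose adjusted p-value is below 0.05.
significant <- cells[cells[, "p adj"] < 0.05, , drop = FALSE]
cat(nrow(significant), "of", nrow(cells), "comparisons significant at 0.05\n")
print(significant)
```

Nine cells always give choose(9, 2) = 36 comparisons; how many pass the filter depends on the data.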
Overall, Formulation F1 lasts about 27.5 hours on average and Formulation F2 about 32.7 hours. Formulation was not included as a term in the ANOVA model, so this difference of about 5.2 hours is descriptive and has not been tested for significance. The ANOVA model indicates that the lubricant lasts longer at lower running and storage temperatures, so it should ideally be run and stored at low temperatures: both low and medium temperatures give significantly longer life than high temperatures, while low and medium do not differ significantly from each other. The interaction between running and storage temperature is not significant at the 0.05 level (p = 0.053). The machine in use does not have a significant effect on the life of a formulation (p = 0.21).
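The model above has no Formulation term, so the gap between F1 and F2 is reported but not tested. One way to test it, offered as a possible extension rather than as part of the original analysis, is to add Formulation as a further main effect. The sketch below does this on simulated data with the lab data's column names, so its p-values are not the study's; `toy` and `fit_formulation` are hypothetical names.

```r
# Simulated stand-in with the same layout as the lab data (54 rows).
# The values are random; they are NOT the measurements from the study.
set.seed(510)
toy <- expand.grid(
  RunningTemperature = c("low", "med", "high"),
  StorageTemperature = c("low", "med", "high"),
  Formulation = c("F1", "F2"),
  Machine = c("M1", "M2", "M3")
)
toy$LifeinHours <- round(rnorm(nrow(toy), mean = 30, sd = 5))

# Same structure as the model above, plus Formulation as a main effect.
fit_formulation <- aov(
  LifeinHours ~ Formulation + RunningTemperature * StorageTemperature + Machine,
  data = toy
)
summary(fit_formulation)

# With two formulations there is a single comparison, F2 minus F1.
TukeyHSD(fit_formulation, which = "Formulation", conf.level = 0.95)
```

On the real data the call would use `data = Lab2`, ideally after converting the text columns to factors. Whether Formulation should also interact with the temperature factors is a modelling choice this sketch leaves open.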