Introduction to the Problem
A company that manufactures large industrial fans recently acquired a smaller competitor manufacturing fans with stronger blades. The companies used different manufacturing methods to attach the fan blades (spyders) to the fan hubs, which give the assembly its strength. In an attempt to determine the best manufacturing process, an experiment was designed to investigate the better levels of:
Two types of hole in the spyder
Two assembly methods, and
Two hub (or barrel) shapes.
The full factorial design resulted in eight treatments.
Destructive testing was used to break the spyder from the hub and the Torque required to break the part was measured in footpounds. The levels of the three factors are denoted by plus(+) or minus(-) one. Here, we analyse the data to determine the optimal levels of the 3 factors resulting in the strongest spyders.
NOTE : The strongest Spyders are the ones which require the highest Torque that needs to be applied to break it. Essentially, we are seeking a combination of the 3 factors (Hub, Assembly, Hole) that will provide us with an indication of the highest Torque values.
Possible Questions to Answer
Data Analysis
inp.data <- read.table("C:\\Users\\Lenovo\\Desktop\\spyder_strength.data", header = TRUE)
inp.data$Hole <- as.factor(inp.data$Hole)
inp.data$Assembly <- as.factor(inp.data$Assembly)
inp.data$Hub <- as.factor(inp.data$Hub)
##Convert factors to readable formats
levels(inp.data$Hole)[levels(inp.data$Hole)=="-1"] <- "Hole1"
levels(inp.data$Hole)[levels(inp.data$Hole)=="1"] <- "Hole2"
levels(inp.data$Assembly)[levels(inp.data$Assembly)=="-1"] <- "Assembly1"
levels(inp.data$Assembly)[levels(inp.data$Assembly)=="1"] <- "Assembly2"
levels(inp.data$Hub)[levels(inp.data$Hub)=="-1"] <- "Hub1"
levels(inp.data$Hub)[levels(inp.data$Hub)=="1"] <- "Hub2"
##Creating Treatment Column (For comparison purposes)
for(i in 1:64){
if (inp.data$Hole[i] == "Hole1" & inp.data$Assembly[i] == "Assembly1" & inp.data$Hub[i]=="Hub1") inp.data$treatment[i] <- "H1A1H1"
if (inp.data$Hole[i] == "Hole1" & inp.data$Assembly[i] == "Assembly1" & inp.data$Hub[i]=="Hub2") inp.data$treatment[i] <- "H1A1H2"
if (inp.data$Hole[i] == "Hole1" & inp.data$Assembly[i] == "Assembly2" & inp.data$Hub[i]=="Hub1") inp.data$treatment[i] <- "H1A2H1"
if (inp.data$Hole[i] == "Hole1" & inp.data$Assembly[i] == "Assembly2" & inp.data$Hub[i]=="Hub2") inp.data$treatment[i] <- "H1A2H2"
if (inp.data$Hole[i] == "Hole2" & inp.data$Assembly[i] == "Assembly1" & inp.data$Hub[i]=="Hub1") inp.data$treatment[i] <- "H2A1H1"
if (inp.data$Hole[i] == "Hole2" & inp.data$Assembly[i] == "Assembly1" & inp.data$Hub[i]=="Hub2") inp.data$treatment[i] <- "H2A1H2"
if (inp.data$Hole[i] == "Hole2" & inp.data$Assembly[i] == "Assembly2" & inp.data$Hub[i]=="Hub1") inp.data$treatment[i] <- "H2A2H1"
if (inp.data$Hole[i] == "Hole2" & inp.data$Assembly[i] == "Assembly2" & inp.data$Hub[i]=="Hub2") inp.data$treatment[i] <- "H2A2H2"
}
inp.data$treatment <- as.factor(inp.data$treatment)
First we plot the Torque values for each of the 8 Treatment groups and visually inspect if there are any differences in the Mean Torque Values across the different groups.
bymedian <- with(inp.data, reorder(inp.data$treatment, inp.data$Torque, median))
boxplot(Torque~bymedian, data = inp.data, main = "Torque values for different Treatment Group", xlab = "Treatment Group", ylab ="Torque", cex.axis = 0.75)
As can be seen from the boxplot above for Torque values for different Treatment groups, there seems to be a significant effect of various combinations of Hole, Assembly, and Hub factor levels on Torque.
We would next graphically explore how the mean Torque value changes given specific levels of each of the given Factors (Hub, Assembly, Hole) considered individually.
plot.design(Torque ~ Hole*Assembly*Hub,data=inp.data,xlab="Factors",main = "Main Effects Model")
As can be seen from the plot, the mean value of Torque seems to change the most based on the level of Hub (Hub1 or Hub2) that is chosen.
We now proceed to see whether different combinations of levels of factors differently affect the Mean Torque value.
We can test for the statistical significance of “Treatment group affects Torque” by proceeding with a One-Way ANOVA Model for Treatment and Torque.
H0 : There is no different in Mean values of Torque across the different Treatment Groups.
\(\mu\) 1 = \(\mu\) 2 = \(\mu\) 3 = \(\mu\) 4 = \(\mu\) 5 = \(\mu\) 6 = \(\mu\) 7 = \(\mu\) 8
H1: There is a significant difference in Mean values of Torque across different Treatment Groups.
\(\mu\)l \(\ne\) \(\mu\)m for l \(\ne\) m. l = 1..8, m = 1..8
anovaModel1 <- anova(aov(Torque~treatment, data= inp.data))
anovaModel1
## Analysis of Variance Table
##
## Response: Torque
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 7 219653 31379 1013.3 < 2.2e-16 ***
## Residuals 56 1734 31
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The high F-value obtained above allows us to reject the Null Hypothesis that Mean values for Torques are equal across Treatment groups.
Now that we have confirmed that Mean values of Torque are different across different Treatment Groups, we can proceed with building an ANOVA model that helps us see the Main Effects and the Interaction Effects between Assembly Method, Hub Shape and Hole Type in the given data.
ANOVA Model
SST = SSTreatments + SSE
Assumptions for ANOVA Model
The dependent variable should be measured at the continuous level (i.e., it is an interval or ratio variable).
The three independent variables should each consist of two or more categorical, independent groups.
You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
There should be no significant outliers.
Your dependent variable should be approximately normally distributed for each combination of the groups of the three independent variables.
There needs to be homogeneity of variances for each combination of the groups of the three independent variables.
Assumptions 1, 2 and 3 are usually satisfied by the Design of the Experiment itself.
Assumption 4 can be tested as follows:
boxplot(inp.data$Torque, ylab = "Torque", main = "Boxplot of Torque values")
The boxplot indicates that there are No visible Outliers in the given data.
## P-Values for Shapiro Wilk Test for Normality for different Treatment Groups
## ---------------------------------------------------------------------------
## Treatment Group H1A1H1 : 0.9731535
## Treatment Group H1A1H2 : 0.278501
## Treatment Group H1A2H1 : 0.09648861
## Treatment Group H1A2H2 : 0.001685335
## Treatment Group H2A1H1 : 0.03940426
## Treatment Group H2A1H2 : 0.0004790566
## Treatment Group H2A2H1 : 0.09619312
## Treatment Group H2A2H2 : 0.0004790566
Analysis of the obtained p-values shows that barring 3 samples, the rest of the samples hold up to Assumption 5.
H0 : Variances are equal across Treatment Groups.
H1 : Variances are not equal across Treatment Groups.
##Bartlett Test
bartlett.test(Torque ~ treatment, data=inp.data)
##
## Bartlett test of homogeneity of variances
##
## data: Torque by treatment
## Bartlett's K-squared = 11.553, df = 7, p-value = 0.1162
The p-value obtained above indicates that we cannot reject the Null Hypothesis that Variance is the same for all the Treatment Groups. Therefore Assumption 6 also holds.
Since we have completed checking the Assumptions we have made regarding the given data, we can proceed with ANOVA Model building.
ANOVA Model - Three Way Analysis of Variance
anovaModel3 <- anova(aov(Torque ~ Hole*Hub*Assembly , data = inp.data))
anovaModel3
## Analysis of Variance Table
##
## Response: Torque
## Df Sum Sq Mean Sq F value Pr(>F)
## Hole 1 8258 8258 266.6837 < 2.2e-16 ***
## Hub 1 193050 193050 6234.1653 < 2.2e-16 ***
## Assembly 1 13369 13369 431.7289 < 2.2e-16 ***
## Hole:Hub 1 594 594 19.1865 5.250e-05 ***
## Hole:Assembly 1 2849 2849 91.9991 2.026e-13 ***
## Hub:Assembly 1 135 135 4.3641 0.04126 *
## Hole:Hub:Assembly 1 1397 1397 45.1097 1.017e-08 ***
## Residuals 56 1734 31
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-values and the corresponding p-values for each of the above term indicates the following :
There are significant Main Effects of the 3 Factors (Hub, Assembly, Hole) on mean Torque values.
There are significant 2-way Interactions between any 2 of the given Factors.
There is a significant 3-way Interaction Effect in the Model between Hub, Assembly, and Hole.
The 3-way Interaction Effect can be visualized as follows:
NOTE : The 3 way Interaction plot is built for Hole Type and Mean Torque values, after varying the levels for Hub shape and Assembly Method.
The statistical significance of the 3-way Interaction term and the plot above allows us to infer the following :
Choice of Hole Type affects the Mean Torque values differently based on the different combinations of Hub Shape and Assembly Method that is selected.
Choice of Hub Shape affects the Mean Torque values differently based on the different combinations of Assembly Method and Hole Type that is selected.
Choice of Assembly Method affects the Mean Torque values differently based on the different combinations of Hole Type and Hub Shape that is selected.
Another way of interpreting the 3 way Interaction in the above plot would be - The effect of interaction between Assembly Method and Hub shape, depends on the level of Hole Type.
For example, from the above plot, choice of Hole 2 produces a greater Mean Torque compared to Hole 1 when Hub 1 and Assembly 1 have been selected. Changing either Hub/Assembly levels changes the way Hole Type affects Torque values(Changing Hole Type from 1 to 2 doesn’t seem to produce any significant difference in Mean Torque values when Hub 1 and Assembly 2 are selected).
Basically, a three-way interaction means that one, or more, two-way interactions differ across the levels of a third variable.
We can recursively observe the 3-way Interactions by keeping Hole Type fixed at Hole 1 and Hole 2 and checking the Interactions between Hub Shape and Assembly Method. Then we can fix Hub Shape at 1 or 2 and observe how Assembly Method affects Mean Torque Values.
df1 <- subset(inp.data, inp.data$Hole=="Hole1", select = -Hole)
interaction.plot(df1$Hub,df1$Assembly,df1$Torque,main="2 Way Interaction for Hole 1", xlab = "Hub Type", ylab = "Mean Torque", legend =TRUE, trace.label = "Assembly Method")
df2 <- subset(inp.data, inp.data$Hole=="Hole2", select = -Hole)
interaction.plot(df2$Hub,df2$Assembly,df2$Torque,main="2 Way Interaction for Hole 2", xlab = "Hub Type", ylab = "Mean Torque", legend =TRUE, trace.label = "Assembly Method")
df3 <- subset(df1, df1$Hub=="Hub1", select = -Hub)
plotmeans(Torque~ Assembly,xlab="Assembly",ylab="Torque",main="For Hole 1 and Hub 1",data = df3)
df4 <- subset(df1, df1$Hub=="Hub2", select = -Hub)
plotmeans(Torque~ Assembly,xlab="Assembly",ylab="Torque",main="For Hole 1 and Hub 2",data = df4)
Above, we have plotted Interactions for :
Keeping Hole Type fixed at 1 and checking Interactions for Hub and Assembly.
Keeping Hole Type fixed at 2 and checking Interactions for Hub and Assembly.
The Interaction between Hub and Assembly seems to change for the two levels of Hole Type.
Next we plotted :
Hole Type = 1 and Hub Shape = 1 (fixed) to give Effect of Assembly on Mean Torque.
Hole Type = 1 and Hub Shape = 2 (fixed) to give Effect of Assembly on Mean Torque.
We can observe how fixing the levels of Hole and Hub factors impacts the choice of Assembly Method on Mean Torques.
The 2-way Interaction Effects can be visualized as follows:
interaction.plot(inp.data$Hole,inp.data$Assembly,inp.data$Torque,main="2 Way Interaction - Hole vs Assembly")
interaction.plot(inp.data$Assembly,inp.data$Hub,inp.data$Torque,main="2 Way Interaction - Assembly vs Hub")
interaction.plot(inp.data$Hole,inp.data$Hub,inp.data$Torque,main="2 Way Interaction - Hole vs Hub")
Each of the above plots indicates that there are 2-way Interactions between each of the Factors. While the plots don’t show “explicit” Interactions (indicated by intersection of lines), the non-parallel nature of the plot indicates Ordinal Interactions. We can infer the following from the ANOVA table and the above plots (keeping in mind the 3-way Interaction):
The relationship between Hole Type and Torque values depends on the value of Assembly Method. Similarly, the relationship between Assembly Method and Torque values depends on the value of Hole type.
The relationship between Assembly Method and Torque values depends on the value of Hub shape. Similarly, the relationship between Hub shape and Torque values depends on the value of Assembly Method.
The relationship between Hole Type and Torque values depends on the value of Hub shape. Similarly, the relationship between Hub Shape and Torque values depends on the value of Hole Type.
NOTE : All of the above inferences can be made keeping in mind the significance of the F-values obtained for 3-way and 2-way Interactions in the ANOVA table.
The Main Effects (keeping in mind the 2-way and 3-way Interaction Effects) can be visualized as follows:
plotmeans(Torque~ Hole,xlab="Hole",ylab="Torque",main="Mean Torque for Hole Levels",data = inp.data)
plotmeans(Torque~ Assembly,xlab="Assembly",ylab="Torque",main="Mean Torque for Assembly Levels",data = inp.data)
plotmeans(Torque~ Hub,xlab="Hole",ylab="Hub",main="Mean Torque for Hub Levels",data = inp.data )
A visual inspection indicates the following:
Mean Torque for Hole 2 is greater than Mean Torque for Hole 1.
Mean Torque for Assembly 1 is greater than Mean Torque for Assembly 2.
Mean Torque for Hub 2 is greater than Mean Torque for Hub 1.
We can infer statistical significance of the differences in Mean Torque values for each of the above case only after performing a Tukey Honest Significant Difference (HSD) Test.
A Table containing the Mean Torque Values for each Combination of the given factors as well as the Grand Mean is as shown below :
fm <- aov(Torque ~ Hole*Hub*Assembly , data = inp.data)
model.tables(fm, data = inp.data, type = "means")
## Tables of means
## Grand mean
##
## 118.0156
##
## Hole
## Hole
## Hole1 Hole2
## 106.66 129.38
##
## Hub
## Hub
## Hub1 Hub2
## 63.09 172.94
##
## Assembly
## Assembly
## Assembly1 Assembly2
## 132.47 103.56
##
## Hole:Hub
## Hub
## Hole Hub1 Hub2
## Hole1 48.69 164.62
## Hole2 77.50 181.25
##
## Hole:Assembly
## Assembly
## Hole Assembly1 Assembly2
## Hole1 114.44 98.88
## Hole2 150.50 108.25
##
## Hub:Assembly
## Assembly
## Hub Assembly1 Assembly2
## Hub1 79.00 47.19
## Hub2 185.94 159.94
##
## Hole:Hub:Assembly
## , , Assembly = Assembly1
##
## Hub
## Hole Hub1 Hub2
## Hole1 53.25 175.62
## Hole2 104.75 196.25
##
## , , Assembly = Assembly2
##
## Hub
## Hole Hub1 Hub2
## Hole1 44.12 153.62
## Hole2 50.25 166.25
Before we proceed with Tukey HSD Test, we need to test if the ANOVA Model we have fitted is adequate i.e. it holds up to Residual Analysis.
Residual Analysis
model <- aov(Torque~Hole*Assembly*Hub ,data =inp.data )
res<-resid(model)
plot(model,which=c(1,2))
From the above NQQ plot, it would appear the the Residuals are Normal. We can test this formally as follows:
H0 : The given data follows a Normal Distribution.
H1 : The given data doesn’t follow a Normal Distribution.
normtest(res)
## Warning: package 'nortest' was built under R version 3.4.1
## Method P.Value
## 1 Shapiro-Wilk normality test 0.172357678
## 2 Anderson-Darling normality test 0.043735579
## 3 Cramer-von Mises normality test 0.051058803
## 4 Lilliefors (Kolmogorov-Smirnov) normality test 0.004193211
## 5 Shapiro-Francia normality test 0.121477563
The p-value indicates the Residuals obtained follow a Normal Distribution.
Plotting the Residuals against various Factors :
plot(as.numeric(inp.data$Hole),res,xlab="Hole",ylab="Residuals")
plot(as.numeric(inp.data$Assembly),res,xlab="Assembly",ylab="Residuals")
plot(as.numeric(inp.data$Hub),res,xlab="Hub",ylab="Residuals")
While variance seems to be equal across the different groups, we can formally Test it as follows:
H0 : The variance is equal across the groups.
H1 : The variance is unequal across the groups.
Bartlett’s Test
bartlett.test(res~inp.data$Hole)
##
## Bartlett test of homogeneity of variances
##
## data: res by inp.data$Hole
## Bartlett's K-squared = 2.8068, df = 1, p-value = 0.09386
bartlett.test(res~inp.data$Assembly)
##
## Bartlett test of homogeneity of variances
##
## data: res by inp.data$Assembly
## Bartlett's K-squared = 0.3554, df = 1, p-value = 0.5511
bartlett.test(res~inp.data$Hub)
##
## Bartlett test of homogeneity of variances
##
## data: res by inp.data$Hub
## Bartlett's K-squared = 6.121, df = 1, p-value = 0.01336
Based on the p-values obtained for the 3 factors above, we can conclude that variance in Residuals across each of the different factor groups is equal.
Also, the Residuals plotted against Subjects gives no visible pattern so we can conclude that the ANOVA Model we have fitted is adequate.
Finally, we can check for statistical significance of differences in Mean Torque Values across different Treatment Groups using a Tukey HSD Test.
Significance of difference in Mean Torque values across Treatment groups
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Torque ~ Hole * Assembly * Hub, data = inp.data)
##
## $Hole
## diff lwr upr p adj
## H2-H1 22.71875 19.93186 25.50564 0
##
## $Assembly
## diff lwr upr p adj
## A2-A1 -28.90625 -31.69314 -26.11936 0
##
## $Hub
## diff lwr upr p adj
## Hu2-Hu1 109.8437 107.0569 112.6306 0
##
## $`Hole:Assembly`
## diff lwr upr p adj
## H2:A1-H1:A1 36.0625 30.852944 41.2720558 0.0000000
## H1:A2-H1:A1 -15.5625 -20.772056 -10.3529442 0.0000000
## H2:A2-H1:A1 -6.1875 -11.397056 -0.9779442 0.0137571
## H1:A2-H2:A1 -51.6250 -56.834556 -46.4154442 0.0000000
## H2:A2-H2:A1 -42.2500 -47.459556 -37.0404442 0.0000000
## H2:A2-H1:A2 9.3750 4.165444 14.5845558 0.0000799
##
## $`Hole:Hub`
## diff lwr upr p adj
## H2:Hu1-H1:Hu1 28.8125 23.60294 34.02206 0
## H1:Hu2-H1:Hu1 115.9375 110.72794 121.14706 0
## H2:Hu2-H1:Hu1 132.5625 127.35294 137.77206 0
## H1:Hu2-H2:Hu1 87.1250 81.91544 92.33456 0
## H2:Hu2-H2:Hu1 103.7500 98.54044 108.95956 0
## H2:Hu2-H1:Hu2 16.6250 11.41544 21.83456 0
##
## $`Assembly:Hub`
## diff lwr upr p adj
## A2:Hu1-A1:Hu1 -31.8125 -37.02206 -26.60294 0
## A1:Hu2-A1:Hu1 106.9375 101.72794 112.14706 0
## A2:Hu2-A1:Hu1 80.9375 75.72794 86.14706 0
## A1:Hu2-A2:Hu1 138.7500 133.54044 143.95956 0
## A2:Hu2-A2:Hu1 112.7500 107.54044 117.95956 0
## A2:Hu2-A1:Hu2 -26.0000 -31.20956 -20.79044 0
##
## $`Hole:Assembly:Hub`
## diff lwr upr p adj
## H2:A1:Hu1-H1:A1:Hu1 51.500 42.740286 60.2597138 0.0000000
## H1:A2:Hu1-H1:A1:Hu1 -9.125 -17.884714 -0.3652862 0.0354797
## H2:A2:Hu1-H1:A1:Hu1 -3.000 -11.759714 5.7597138 0.9587675
## H1:A1:Hu2-H1:A1:Hu1 122.375 113.615286 131.1347138 0.0000000
## H2:A1:Hu2-H1:A1:Hu1 143.000 134.240286 151.7597138 0.0000000
## H1:A2:Hu2-H1:A1:Hu1 100.375 91.615286 109.1347138 0.0000000
## H2:A2:Hu2-H1:A1:Hu1 113.000 104.240286 121.7597138 0.0000000
## H1:A2:Hu1-H2:A1:Hu1 -60.625 -69.384714 -51.8652862 0.0000000
## H2:A2:Hu1-H2:A1:Hu1 -54.500 -63.259714 -45.7402862 0.0000000
## H1:A1:Hu2-H2:A1:Hu1 70.875 62.115286 79.6347138 0.0000000
## H2:A1:Hu2-H2:A1:Hu1 91.500 82.740286 100.2597138 0.0000000
## H1:A2:Hu2-H2:A1:Hu1 48.875 40.115286 57.6347138 0.0000000
## H2:A2:Hu2-H2:A1:Hu1 61.500 52.740286 70.2597138 0.0000000
## H2:A2:Hu1-H1:A2:Hu1 6.125 -2.634714 14.8847138 0.3667120
## H1:A1:Hu2-H1:A2:Hu1 131.500 122.740286 140.2597138 0.0000000
## H2:A1:Hu2-H1:A2:Hu1 152.125 143.365286 160.8847138 0.0000000
## H1:A2:Hu2-H1:A2:Hu1 109.500 100.740286 118.2597138 0.0000000
## H2:A2:Hu2-H1:A2:Hu1 122.125 113.365286 130.8847138 0.0000000
## H1:A1:Hu2-H2:A2:Hu1 125.375 116.615286 134.1347138 0.0000000
## H2:A1:Hu2-H2:A2:Hu1 146.000 137.240286 154.7597138 0.0000000
## H1:A2:Hu2-H2:A2:Hu1 103.375 94.615286 112.1347138 0.0000000
## H2:A2:Hu2-H2:A2:Hu1 116.000 107.240286 124.7597138 0.0000000
## H2:A1:Hu2-H1:A1:Hu2 20.625 11.865286 29.3847138 0.0000000
## H1:A2:Hu2-H1:A1:Hu2 -22.000 -30.759714 -13.2402862 0.0000000
## H2:A2:Hu2-H1:A1:Hu2 -9.375 -18.134714 -0.6152862 0.0278403
## H1:A2:Hu2-H2:A1:Hu2 -42.625 -51.384714 -33.8652862 0.0000000
## H2:A2:Hu2-H2:A1:Hu2 -30.000 -38.759714 -21.2402862 0.0000000
## H2:A2:Hu2-H1:A2:Hu2 12.625 3.865286 21.3847138 0.0007601
In the above Plot, the Values marked in Black indicate those configurations of Factor Variables that have no Significant Difference in Mean Torque Values namely :
Hole Type 2, Assembly Method 2, Hub Shape 1 and Hole Type 1, Assembly Method 1, Hub Shape 1 have no significant difference in the Mean Torque Values.
Hole Type 2, Assembly Method 2, Hub Shape 1 and Hole Type 1, Assembly Method 2, Hub Shape 1 have no significant difference in the Mean Torque Values.
For the remaining 26 combinations of Factor levels, it can be seen that there are significant differences in the mean values for Torques produced.
Conclusion
We would recommend proceeding with Hole Type 2, Assembly Method 1 and Hub Shape 2 for obtaining Spyders that are the most resistant to destructive testing (high Torque values).
One of the main goals of ANOVA is establishing the order of importance between various Factors and their Interactions. Based on our Analysis, the following list has been created that ranks the various factors in descending order of Importance :
1.Hub
2.Assembly
3.Hole
4.Hole : Assembly
5.Hole : Assembly : Hub
6.Hole : Hub
7.Hub : Assembly