Exercise 1
#Source  DF  SS       MS       F     P
#A       1   0.322
#B           80.554   40.2771  4.59
#A:B
#Error   12  105.327  8.7773
#Total   17  231.551
#Fill in the blanks in the ANOVA table
#dfB = SSB/MSB = 80.554/40.2771 = 2
#dfAB = 17 - 12 - 1 - 2 = 2
# MSA = SSA/dfA = 0.322/1 = 0.322
# SStotal = SSE + SSA + SSB + SSAB
# SSAB = SStotal - SSE - SSA - SSB = 231.551 - 105.327 - 0.322 - 80.554 = 45.348
# MSAB = SSAB/dfAB = 45.348/ 2 = 22.674
# F_a = MSA / MSE = 0.322/8.7773 = 0.03668554
# F_ab = MSAB/ MSE = 22.674/8.7773 = 2.58
# P_a = 1 - pf(0.03668554,1,12)
# P_b = 1 - pf(4.59,2,12)
# P_ab = 1 - pf(2.58,2,12)
# Complete Table
#Source  DF  SS       MS       F      P
#A       1   0.322    0.322    0.037  0.85
#B       2   80.554   40.2771  4.59   0.033 *
#A:B     2   45.348   22.674   2.58   0.117
#Error   12  105.327  8.7773
#Total   17  231.551
# 3 levels of factorB
# abn - 1 = 17
# abn = 18
# n = 18/ab
n = 18/ (2*3)
n
## [1] 3
#n = 3
#There is no evidence of an interaction effect, since the p-value for the A:B term (0.117) is greater than 0.05.
#Factor A (main effect) is not significant (p = 0.85).
#Factor B (main effect) is significant, since its p-value (0.033) is less than 0.05.
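The arithmetic used to fill in the table can be reproduced in a few lines of R. This is a minimal sketch: the variable names (ss_total, df_ab, f_stat and so on) are chosen here for illustration, and the only inputs are the values printed in the partial table.

```r
# Known entries from the partial ANOVA table
ss_total <- 231.551; ss_a <- 0.322; ss_b <- 80.554; ss_e <- 105.327
ms_b <- 40.2771; ms_e <- 8.7773
df_a <- 1; df_e <- 12; df_total <- 17

# Recover the missing degrees of freedom
df_b  <- round(ss_b / ms_b)               # df = SS / MS
df_ab <- df_total - df_e - df_a - df_b    # remaining df go to A:B

# Recover the missing sum of squares and mean squares
ss_ab <- ss_total - ss_e - ss_a - ss_b
ms_a  <- ss_a / df_a
ms_ab <- ss_ab / df_ab

# F statistics and upper-tail p-values for A, B and A:B
f_stat <- c(A = ms_a, B = ms_b, AB = ms_ab) / ms_e
p_val  <- pf(f_stat, c(df_a, df_b, df_ab), df_e, lower.tail = FALSE)
round(cbind(F = f_stat, P = p_val), 3)
```

Using lower.tail = FALSE is equivalent to 1 - pf(...) but avoids the subtraction when p-values are very small.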
Exercise 2
# Two factors, with the following numbers of levels:
# Factor A - 2
# Factor B - 3
# no of replicates, n = 3
# Main effects (A,B) are significant
# Interaction effect (A:B) are not significant.
# The interaction term is dropped from the model.
# The dferror is ?
# dftotal = dfA + dfB + dferror
# dfA = 1, a=2
# dfB = 2, b =3
# dftotal = nab - 1 = 3*3*2 - 1 = 17
# dferror = 17-2-1 = 14
# Answer is (b) 14
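The degrees-of-freedom bookkeeping can be checked with a short sketch. The variable names are illustrative; the point is that dropping the interaction pools its degrees of freedom into the error term.

```r
a <- 2; b <- 3; n <- 3                  # levels of A, levels of B, replicates

df_total <- a * b * n - 1               # total df
df_a     <- a - 1
df_b     <- b - 1
df_ab    <- (a - 1) * (b - 1)

df_error_full    <- a * b * (n - 1)        # error df with the interaction in the model
df_error_reduced <- df_error_full + df_ab  # interaction df pooled into error

c(full = df_error_full, reduced = df_error_reduced)
```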
Exercise 3a
# Two factors, with the following numbers of levels:
# Factor A - 3
# Factor B - 2
# no of replicates, n = 2
# SST = 100, SSA = 40, SSB = 25, SSAB = 16.
# The estimate of the error variance (MSE) is ?
SSE = 100-40-25 -16
SSE
## [1] 19
MSE = SSE/(3*2*(2-1))
MSE
## [1] 3.166667
# (e) None of the Above.
Exercise 3b
MSE = 3.166667
MSAB = 16/((2)*(1))
FAB = MSAB/MSE
# The error df for the full model is ab(n-1) = 3*2*(2-1) = 6
P_ab = 1 - pf(FAB,(3-1)*(2-1),(3)*(2)*(2-1))
P_ab
## [1] 0.1599767
# P_ab is approximately 0.16
# Since the interaction is not significant (P_ab > 0.05), we can drop the interaction term.
Exercise 4a
#The residuals from the model fit to data from a designed experiment
#(a) are independent random variables. True False (Answer : True)
#(b) are always normally distributed. True False (Answer : True)
#(c) always have constant variance. True False (Answer : True)
Exercise 4b
# Summarize this section
#Residuals are independent random variables.
# In the design and analysis of experiments, it is important to draw conclusions that hold for the entire population. If the residuals are not independent, observations carry overlapping information, so the sample is a poor basis for estimating population quantities. For ANOVA, this is by far the most important assumption to meet.
# Residuals are always normally distributed
# This assumption is important because the ANOVA p-values are based on the F-distribution, which assumes normally distributed errors. If the residuals are not normal, the p-values are not reliable enough for interpretation.
# Residuals always have constant Variance
# This assumption is important because the F-tests pool the error variance across groups. If the residuals do not have constant variance across groups, the p-values are not reliable enough for interpretation.
Data Analysis section
Question 1: This question looks at the effects of temperature and copper content on warping. A two-way ANOVA model is used to assess the relationship between the main effects, their interaction and the response variable.
# Treatment A: Copper Content (%)(40,60,80,100).
# Treatment B: Temperature(50,75,100,125)
#a=4
#b=4
n=2
y <- c(17,20,16,21,24,22,28,27,12,9,18,13,17,12,27,31,16,12,18,21,25,23,30,23,21,17,23,21,23,22,29,31)
A <- factor(rep(c(rep("40",n),rep("60",n),rep("80",n),rep("100",n)),4))
B <- factor(c(rep("50",4*n),rep("75",4*n),rep("100",4*n),rep("125",4*n)))
X <- data.frame(y,A,B)
## two-factor model with interaction
model <- aov(y~A*B)
anova(model)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## A 3 698.34 232.781 34.3272 3.35e-07 ***
## B 3 156.09 52.031 7.6728 0.002127 **
## A:B 9 113.78 12.642 1.8643 0.132748
## Residuals 16 108.50 6.781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#################################################################################################################################################################################################################################
#PART A
#################################################################################################################################################################################################################################
#Is there any indication that either factor affects the amount of warping? Is there any interaction between the factors? Use α = 0.05.
# Ho: The factor (or interaction) has no effect on the response.
# Ha: The factor (or interaction) has an effect on the response.
# From the ANOVA table, at the 0.05 significance level both main effects (copper content, p = 3.35e-07, and temperature, p = 0.0021) are significant because their p-values are less than 0.05, while the interaction is not significant (p = 0.1327).
#################################################################################################################################################################################################################################
#PART B
#################################################################################################################################################################################################################################
#Analyze the residuals from this experiment.
#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)
shapiro.test(model$residuals)
##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.95013, p-value = 0.1454
# Most of the residual points lie on or near the line, suggesting the normality assumption is met. The Shapiro-Wilk test is consistent with this: its p-value (0.1454) is greater than 0.05, so normality is not rejected.
#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)
#The residuals-versus-fitted plot shows no funnel (cone) shape and the points are spread about evenly above and below zero, so we conclude that the constant variance assumption is met.
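As a complement to reading the plot, a formal test of equal variances can be run on the residuals. The sketch below uses Bartlett's test from base R, grouping the residuals by each factor in turn. This is an addition, not part of the original analysis, and Bartlett's test is itself sensitive to non-normality, so it is best read alongside the plot rather than in place of it. The names res, bt_A and bt_B are illustrative.

```r
# Rebuild the Question 1 data so the sketch is self-contained
n <- 2
y <- c(17,20,16,21,24,22,28,27,12,9,18,13,17,12,27,31,
       16,12,18,21,25,23,30,23,21,17,23,21,23,22,29,31)
A <- factor(rep(rep(c("40","60","80","100"), each = n), 4))   # copper content
B <- factor(rep(c("50","75","100","125"), each = 4 * n))      # temperature
model <- aov(y ~ A * B)

# Bartlett's test of equal residual variance across the levels of each factor
res  <- residuals(model)
bt_A <- bartlett.test(res, A)
bt_B <- bartlett.test(res, B)
c(copper = bt_A$p.value, temperature = bt_B$p.value)
```

A p-value above 0.05 for a factor would mean the test finds no evidence of unequal variance across that factor's levels.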
#################################################################################################################################################################################################################################
#PART C & D
#################################################################################################################################################################################################################################
interaction.plot(x.factor = A,
                 trace.factor = B,
                 response = y,
                 fun = mean,
                 type = "b",
                 col = c("black","red","green","blue"),
                 pch = c(19, 17, 15, 16),
                 fixed = TRUE,
                 leg.bty = "o")
#As seen in the interaction plot, the lowest warping is achieved at a copper content of 40%.
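The reading of the interaction plot can be backed up with the cell and marginal means it is drawn from. The sketch below rebuilds the Question 1 data and tabulates the means with tapply. The names cell_means and copper_means are illustrative.

```r
# Rebuild the Question 1 data so the sketch is self-contained
n <- 2
y <- c(17,20,16,21,24,22,28,27,12,9,18,13,17,12,27,31,
       16,12,18,21,25,23,30,23,21,17,23,21,23,22,29,31)
A <- factor(rep(rep(c("40","60","80","100"), each = n), 4))   # copper content
B <- factor(rep(c("50","75","100","125"), each = 4 * n))      # temperature

# Mean warping for every copper/temperature combination (the plotted points)
cell_means <- tapply(y, list(Copper = A, Temperature = B), mean)
cell_means

# Mean warping at each copper content, averaged over temperature
copper_means <- tapply(y, A, mean)
copper_means[c("40", "60", "80", "100")]

# Copper level with the lowest average warping
names(which.min(copper_means))   # "40"

# Combination with the lowest cell mean
which(cell_means == min(cell_means), arr.ind = TRUE)
```

Because the factor levels are character strings, R sorts them alphabetically ("100" comes before "40"), which is why the marginal means are indexed by name above.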
Question 2:
This question looks at the effects of pressure and temperature on a chemical process. A two-way ANOVA model is used to assess the relationship between the main effects, their interaction and the response variable. A block term is added to control for the variation introduced by days.
# Treatment A: Pressure (250,260,270).
# Treatment B: Temperature(low,medium,high)
#a=3
#b=3
n=1
y<-c(86.3 , 84.0, 85.8, 86.1, 85.2, 87.3,
88.5, 87.3, 89.0, 89.4, 89.9, 90.3,89.1,
90.2, 91.3, 91.7, 93.2, 93.7)
A <- factor(rep(c(rep("250",n),rep("260",n),rep("270",n)),6))
B <- factor(c(rep("Low",6*n),rep("Medium",6*n),rep("High",6*n)))
block <- factor(rep(c(rep("1",3),rep("2",3)),3))
X <- data.frame(y,A,B,block)
## two-factor model with interaction, plus a block (day) term
model <- aov(y~A*B + block)
anova(model)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## A 2 5.508 2.754 5.1838 0.035988 *
## B 2 99.854 49.927 93.9807 2.778e-06 ***
## block 1 13.005 13.005 24.4800 0.001124 **
## A:B 4 4.452 1.113 2.0952 0.173314
## Residuals 8 4.250 0.531
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# From the ANOVA table, at the 0.05 significance level both main effects (pressure, p = 0.036, and temperature, p = 2.8e-06) are significant because their p-values are less than 0.05, as is the block (day) term (p = 0.0011). The interaction is not significant (p = 0.173).
#Analyze the residuals from this experiment.
#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)
shapiro.test(model$residuals)
##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.97846, p-value = 0.9327
# Most of the residual points lie on or near the line, suggesting the normality assumption is met. The Shapiro-Wilk test is consistent with this: its p-value (0.9327) is greater than 0.05, so normality is not rejected.
#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)
#The residuals-versus-fitted plot shows no funnel (cone) shape and the points are spread about evenly above and below zero, so we conclude that the constant variance assumption is met.
Question 3a:
Brief summary of the problem: This question looks at the effects of two factors, each at two levels, on potential fatigue. A two-way ANOVA model is used to assess the relationship between the main effects, their interaction and the response variable.
# Treatment A: Bottle Type (Glass,Plastic).
# Treatment B: Worker(1,2)
y <- c(39,58,45,35,44,42,35,21,20,16,13,11,13,16,10,15)
A <- rep(c(rep("Glass",4),rep("Plastic",4)),2)
B <- c(rep("1",8),rep("2",8))
df <- data.frame(y,A,B)
model <- aov(y~A*B)
anova(model)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 105.06 105.06 1.8147 0.2028
## B 1 2626.56 2626.56 45.3670 2.087e-05 ***
## A:B 1 52.56 52.56 0.9079 0.3595
## Residuals 12 694.75 57.90
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# From the ANOVA table, at the 0.05 significance level worker (factor B, p = 2.09e-05) is the only significant term. Bottle type (factor A, p = 0.203) and the interaction (p = 0.360) are not significant because their p-values are greater than 0.05.
#Analyze the residuals from this experiment.
#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)
shapiro.test(model$residuals)
##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.9854, p-value = 0.9922
# Most of the residual points lie on or near the line, suggesting the normality assumption is met. The Shapiro-Wilk test is consistent with this: its p-value (0.9922) is greater than 0.05, so normality is not rejected.
#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)
#The residuals-versus-fitted plot shows no funnel (cone) shape and the points are spread about evenly above and below zero, so we conclude that the constant variance assumption is met.
Question 3b
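#Calculate approximate 95 percent confidence limits for the factor effects in Data Analysis 3a.
#Do the results of this analysis agree with the analysis of variance results you may have performed in that question?
#For Factor A
#Each effect estimate is recovered from its sum of squares in 3a: |effect| = sqrt(SS/(n*2^(k-2))) with n = 4 and k = 2. The sign is taken from the cell means. Despite their names, SE_A, SE_B and SE_AB hold effect estimates; SE (further down) is the standard error of an effect.
SE_A = sqrt((1/((4)*2^(2-2)))* 105.06) * - 1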
SE_A
## [1] -5.124939
#FOR Factor B
SE_B = sqrt((1/((4)*2^(2-2)))* 2626.56) * - 1
SE_B
## [1] -25.62499
#For Factor AB
SE_AB = sqrt((1/((4)*2^(2-2)))* 52.56)
SE_AB
## [1] 3.624914
#SE effect
SE = sqrt((1/((4)*2^(2-2)))* 57.90)
SE
## [1] 3.804602
#CI Factor effect for Factor A
upper_limit_A = SE_A + (SE * 1.96)
lower_limit_A = SE_A - (SE * 1.96)
CI_factorA=c(lower_limit_A,upper_limit_A)
CI_factorA
## [1] -12.581960 2.332082
#CI Factor effect for Factor B
upper_limit_B = SE_B + (SE * 1.96)
lower_limit_B = SE_B - (SE * 1.96)
CI_factorB=c(lower_limit_B,upper_limit_B)
CI_factorB
## [1] -33.08201 -18.16797
#CI Factor effect for interaction AB
upper_limit_AB = SE_AB + (SE * 1.96)
lower_limit_AB = SE_AB - (SE * 1.96)
CI_factorAB=c(lower_limit_AB,upper_limit_AB)
CI_factorAB
## [1] -3.832107 11.081935
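#Summary
#The 95% confidence intervals for factor A and for the AB interaction contain zero, so those effects are not significant. The interval for factor B does not contain zero, so B is significant. This agrees with the ANOVA results from 3a.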
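The effect estimates and their limits can also be computed directly from the data using contrast columns, which avoids copying sums of squares by hand and gives the signs automatically. The sketch below is one way to do it. The -1/+1 coding (Glass = -1, Plastic = +1, worker 1 = -1, worker 2 = +1) and all variable names are assumptions of this sketch, and it uses the t quantile on the error degrees of freedom instead of 1.96, so its intervals are somewhat wider than the approximate ones.

```r
# Data from Question 3a with coded contrast columns
y <- c(39,58,45,35,44,42,35,21,20,16,13,11,13,16,10,15)
a <- rep(c(rep(-1, 4), rep(1, 4)), 2)   # -1 = Glass, +1 = Plastic
b <- c(rep(-1, 8), rep(1, 8))           # -1 = worker 1, +1 = worker 2
n <- 4; k <- 2                          # replicates per cell, number of factors

# Effect = contrast / (n * 2^(k-1))
effects <- c(A = sum(a * y), B = sum(b * y), AB = sum(a * b * y)) / (n * 2^(k - 1))

# Mean square error from the within-cell deviations
cell <- interaction(a, b)
df_e <- length(y) - nlevels(cell)
mse  <- sum((y - ave(y, cell))^2) / df_e

# Standard error of an effect and t-based 95% limits
se_effect  <- sqrt(mse / (n * 2^(k - 2)))
half_width <- qt(0.975, df_e) * se_effect
ci <- cbind(lower = effects - half_width,
            effect = effects,
            upper = effects + half_width)
round(ci, 3)
```

With either multiplier the conclusion is the same: only the interval for B excludes zero.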
Question 4
This question looks at the effect of four factors on cracking.
#Exercise 4a
y <- c( 7.037, 6.376,
14.707, 15.219,
11.635, 12.089,
17.273, 17.815,
10.403, 10.151,
4.368, 4.098,
9.360, 9.253,
13.440, 12.923,
8.561, 8.951,
16.867, 17.052,
13.876, 13.658,
19.824, 19.639,
11.846, 12.337,
6.125, 5.904,
11.190, 10.935,
15.653, 15.053)
A <- rep(c(rep(-1,2),rep(1,2)),8)
B <- rep(c(rep(-1,4),rep(1,4)),4)
C <- rep(c(rep(-1,8),rep(1,8)),2)
D <- c(rep(-1,16),rep(1,16))
#Part a
## factor effects: contrast / (n * 2^(k-1)), with n = 2 replicates and k = 4 factors
fe <- y%*%cbind(A,B,C,D)/(2*2^(4-1))
fe
## A B C D
## [1,] 3.018875 3.975875 -3.59625 1.95775
#Part b
# number of variables
p <- 4
# binary -1/+1 matrix
bin <- matrix(0,2^p,p)
#
# fill in the matrix
for(k in 0:(2^p-1)) bin[k+1,] <- 2*as.numeric(intToBits(k)[1:p])-1
# name the columns
colnames(bin) <- LETTERS[1:p]
# name the rows
temp <- NULL
for(k in 1:(2^p)) temp <- c(temp,paste(letters[which(bin[k,]==1)],collapse=""))
rownames(bin) <- temp
## create full factorial table
# set up the extended matrix
binext <- matrix(0,2^p,2^p-1-p)
## fill it in
#
# storage for column names
temp <- NULL
# column index
h <- 1
for(k in 2:p){
combos <- t(combn(p,k))
for(j in 1:nrow(combos)){
binext[,h] <- apply(bin[,combos[j,]],1,prod)
temp <- c(temp,paste(LETTERS[combos[j,]],collapse=""))
h <- h+1
}
}
# cbind it to the original bin
bin <- cbind(bin,binext)
colnames(bin) <- c(LETTERS[1:p],temp)
df <- data.frame(y,bin)
## Warning in data.frame(y, bin): row names were found from a short variable
## and have been discarded
model <- aov(y~., data =df)
anova(model)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 0.016 0.016 0.0080 0.929702
## B 1 72.909 72.909 36.9633 1.594e-05 ***
## C 1 126.461 126.461 64.1130 5.482e-07 ***
## D 1 103.464 103.464 52.4542 1.966e-06 ***
## AB 1 0.005 0.005 0.0025 0.960863
## AC 1 0.035 0.035 0.0178 0.895524
## AD 1 0.236 0.236 0.1198 0.733747
## BC 1 29.927 29.927 15.1722 0.001287 **
## BD 1 128.496 128.496 65.1451 4.942e-07 ***
## CD 1 0.074 0.074 0.0374 0.849129
## ABC 1 0.024 0.024 0.0120 0.913966
## ABD 1 0.207 0.207 0.1050 0.750144
## ACD 1 0.061 0.061 0.0308 0.862918
## BCD 1 78.751 78.751 39.9253 1.021e-05 ***
## ABCD 1 0.021 0.021 0.0107 0.918880
## Residuals 16 31.559 1.972
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
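#As seen in the table, at the 0.05 level the terms labeled B, C, D, BC, BD and BCD are significant, so both main effects and interaction effects are related to cracking. A and the remaining interactions are not significant.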
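The warning printed when the data frame was built ("row names were found from a short variable and have been discarded") is worth a second look. The design matrix bin has 16 rows while y holds 32 observations, so data.frame() recycles the 16 rows to fill 32. Since the two replicates of each run sit next to each other in y, recycling may leave the factor labels in the table out of step with the hand-coded A, B, C and D vectors. The sketch below is a check under that assumption: it rebuilds the main-effect columns and compares two ways of expanding them. The names n_rep, bin_rep and bin_recycled are illustrative.

```r
p <- 4; n_rep <- 2

# Rebuild the 16-run design in the same bit order as the loop above (A changes fastest)
bin <- matrix(0, 2^p, p, dimnames = list(NULL, LETTERS[1:p]))
for (k in 0:(2^p - 1)) bin[k + 1, ] <- 2 * as.numeric(intToBits(k)[1:p]) - 1

# Hand-coded columns for the 32 observations (replicates are adjacent in y)
A <- rep(c(rep(-1, 2), rep(1, 2)), 8)
B <- rep(c(rep(-1, 4), rep(1, 4)), 4)
C <- rep(c(rep(-1, 8), rep(1, 8)), 2)
D <- c(rep(-1, 16), rep(1, 16))

# Option 1: repeat every run once per replicate, so rows line up with y
bin_rep <- bin[rep(seq_len(nrow(bin)), each = n_rep), ]
aligned <- all(bin_rep == cbind(A, B, C, D))

# Option 2: stack the 16 rows twice, which is what recycling does
bin_recycled <- bin[rep(seq_len(nrow(bin)), times = n_rep), ]
recycled_ok  <- all(bin_recycled == cbind(A, B, C, D))

c(aligned = aligned, recycled = recycled_ok)
```

If the first check is TRUE and the second FALSE, expanding the full design matrix (main effects and interaction columns) with each = n_rep before calling data.frame(y, bin) would keep every label attached to its intended factor.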
#c
summary(lm(y~.,data=df))
##
## Call:
## lm(formula = y ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2875 -0.9154 0.0000 0.9154 1.2875
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.98806 0.24827 48.286 < 2e-16 ***
## A -0.02225 0.24827 -0.090 0.92970
## B 1.50944 0.24827 6.080 1.59e-05 ***
## C 1.98794 0.24827 8.007 5.48e-07 ***
## D -1.79812 0.24827 -7.243 1.97e-06 ***
## AB -0.01238 0.24827 -0.050 0.96086
## AC -0.03313 0.24827 -0.133 0.89552
## AD -0.08594 0.24827 -0.346 0.73375
## BC 0.96706 0.24827 3.895 0.00129 **
## BD -2.00388 0.24827 -8.071 4.94e-07 ***
## CD 0.04800 0.24827 0.193 0.84913
## ABC -0.02725 0.24827 -0.110 0.91397
## ABD -0.08044 0.24827 -0.324 0.75014
## ACD -0.04356 0.24827 -0.175 0.86292
## BCD 1.56875 0.24827 6.319 1.02e-05 ***
## ABCD 0.02569 0.24827 0.103 0.91888
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.404 on 16 degrees of freedom
## Multiple R-squared: 0.9448, Adjusted R-squared: 0.8931
## F-statistic: 18.27 on 15 and 16 DF, p-value: 2.741e-07
# The regression model is given by the coefficient estimates above (factors in coded -1/+1 units).
# d
#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)
shapiro.test(model$residuals)
##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.78308, p-value = 1.991e-05
# Most of the residual points do not lie on the line, suggesting the normality assumption is not met. The Shapiro-Wilk test confirms this: its p-value (1.991e-05) is less than 0.05, so normality is rejected. Further work is required.
#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)
#The residuals-versus-fitted plot shows no funnel (cone) shape and the points are spread about evenly above and below zero, so we conclude that the constant variance assumption is met.
#e and f
#ran out of time.
#Exercise 4D
block <- factor (ifelse(bin[,ncol(bin)]>0,1,2))
new_data= data.frame(y,bin,block)
## Warning in data.frame(y, bin, block): row names were found from a short
## variable and have been discarded
fit2<-aov (y~.-ABCD,data=new_data)
anova(fit2)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 0.016 0.016 0.0080 0.929702
## B 1 72.909 72.909 36.9633 1.594e-05 ***
## C 1 126.461 126.461 64.1130 5.482e-07 ***
## D 1 103.464 103.464 52.4542 1.966e-06 ***
## AB 1 0.005 0.005 0.0025 0.960863
## AC 1 0.035 0.035 0.0178 0.895524
## AD 1 0.236 0.236 0.1198 0.733747
## BC 1 29.927 29.927 15.1722 0.001287 **
## BD 1 128.496 128.496 65.1451 4.942e-07 ***
## CD 1 0.074 0.074 0.0374 0.849129
## ABC 1 0.024 0.024 0.0120 0.913966
## ABD 1 0.207 0.207 0.1050 0.750144
## ACD 1 0.061 0.061 0.0308 0.862918
## BCD 1 78.751 78.751 39.9253 1.021e-05 ***
## block 1 0.021 0.021 0.0107 0.918880
## Residuals 16 31.559 1.972
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
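# Adding the block did not change the model: the block is confounded with ABCD, so it takes over that term's sum of squares (0.021) and is not significant (p = 0.919). All other rows are unchanged.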
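The confounding of the block with ABCD can be seen on the 16-run design itself. The sketch below builds a generic 2^4 design with expand.grid (not the bin matrix used above), assigns runs to blocks by the sign of the ABCD product as in the code above, and confirms that the block contrast and the ABCD contrast are the same column. The names design and block_contrast are illustrative.

```r
# All 16 runs of a 2^4 factorial in coded units
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))

# Highest-order interaction column and the block it defines
design$ABCD  <- with(design, A * B * C * D)
design$block <- factor(ifelse(design$ABCD > 0, 1, 2))

# Each block receives half of the runs
table(design$block)

# The block contrast is identical to the ABCD contrast, so the two
# effects cannot be estimated separately
block_contrast <- ifelse(design$block == 1, 1, -1)
confounded <- all(block_contrast == design$ABCD)
confounded
```

This is why the block row in the table reproduces the ABCD row from the earlier fit.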
#I ran out of time, so I wasn't able to complete parts b and c.