Exercise 1

#Source  DF    SS        MS       F       P
#A       1    0.322
#B            80.554  40.2771   4.59
#A:B
#Error  12    105.327   8.7773
#Total  17    231.551
  1. Fill in the blanks in the ANOVA table

#dfB =  SSB/MSB = 80.554/40.2771 = 2

#dfAB = 17 - 12 - 1 - 2 = 2

# MSA = SSA/dfA = 0.322/1 = 0.322

# SStotal = SSE + SSA + SSB + SSAB 

# SSAB = SStotal - SSE - SSA - SSB = 231.551 - 105.327 - 0.322 - 80.554  = 45.348

# MSAB = SSAB/dfAB = 45.348/ 2 = 22.674

# F_a = MSA /  MSE = 0.322/8.7773 = 0.03668554
 
# F_ab = MSAB/ MSE = 22.674/8.7773 = 2.58

# P_a = 1 - pf(0.03668554,1,12)
# P_b = 1 - pf(4.59,2,12)
# P_ab =  1 - pf(2.58,2,12)


# Complete Table

#Source  DF    SS        MS       F       P
#A       1    0.322    0.322     0.037   0.85
#B       2    80.554   40.2771   4.59    0.033 *
#A:B     2    45.348   22.674    2.58    0.117
#Error  12    105.327  8.7773    
#Total  17    231.551
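The hand calculations above can be cross-checked with a short script. This is a minimal sketch that starts only from the entries given in the partial table; the variable names (`anova_tab`, `ss_ab`, and so on) are illustrative and not part of the original exercise.

```r
# Known entries from the partially filled table
df_a <- 1;  ss_a <- 0.322
ss_b <- 80.554; ms_b <- 40.2771
df_e <- 12; ss_e <- 105.327
df_t <- 17; ss_t <- 231.551

# Degrees of freedom and sums of squares recovered by subtraction
df_b  <- round(ss_b / ms_b)
df_ab <- df_t - df_e - df_a - df_b
ss_ab <- ss_t - ss_e - ss_a - ss_b
ms_e  <- ss_e / df_e

# Assemble the treatment rows and derive MS, F and the upper-tail p-values
anova_tab <- data.frame(
  Source = c("A", "B", "A:B"),
  DF     = c(df_a, df_b, df_ab),
  SS     = c(ss_a, ss_b, ss_ab)
)
anova_tab$MS      <- anova_tab$SS / anova_tab$DF
anova_tab$F_value <- anova_tab$MS / ms_e
anova_tab$P       <- pf(anova_tab$F_value, anova_tab$DF, df_e, lower.tail = FALSE)
print(anova_tab)
```

Using `lower.tail = FALSE` is equivalent to `1 - pf(...)` as written above.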
  1. How many levels were used for factor B?
# Factor B has 3 levels, since dfB = b - 1 = 2
  1. How many replicates of the experiment were performed?
# abn - 1 = 17
# abn = 18
# n = 18/ab
n = 18/ (2*3)
n
## [1] 3
#n = 3
  1. What conclusions would you draw about this experiment?
#There is no evidence of an interaction effect, since the p-value for the A:B term (0.117) is greater than 0.05.

#Factor A (main effect) is not significant (p = 0.85).

#Factor B (main effect) is significant, since its p-value (0.033) is less than 0.05.

Exercise 2

# Two factors
# Factor A - 2 levels
# Factor B - 3 levels
# no of replicates, n = 3
# Main effects (A,B) are significant
# Interaction effect (A:B) is not significant.
# The interaction term is dropped from the model.

# The dferror is ?

# dftotal = dfA + dfB + dferror

# dfA =  1, a=2
# dfB =  2, b =3
# dftotal = nab - 1 = 3*3*2  - 1 = 17
# dferror = dftotal - dfA - dfB = 17 - 1 - 2 = 14

# Answer is (b) 14
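The degrees-of-freedom count can also be checked by fitting both models to a balanced layout of the same size. This is only a sketch: the response is simulated noise, and the names `design`, `fit_full` and `fit_reduced` are assumptions made for illustration.

```r
set.seed(1)                       # arbitrary seed; the response is simulated
a <- 2; b <- 3; n <- 3

# Balanced a x b layout with n replicates per cell
design <- expand.grid(rep = seq_len(n),
                      A   = factor(seq_len(a)),
                      B   = factor(seq_len(b)))
design$y <- rnorm(nrow(design))   # placeholder response, not real data

fit_full    <- aov(y ~ A * B, data = design)  # with interaction
fit_reduced <- aov(y ~ A + B, data = design)  # interaction dropped

df.residual(fit_full)     # error df with the interaction in the model
df.residual(fit_reduced)  # error df after dropping the interaction
```

Dropping A:B returns its 2 degrees of freedom to the error term, which is why the count goes from 12 to 14.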

Exercise 3a

# Two factors
# Factor A - 3
# Factor B - 2
# no of replicates, n = 2
# SST = 100, SSA = 40, SSB = 25, SSAB = 16.

# The estimate of the error variance (MSE) is ?

 SSE = 100-40-25 -16
 SSE
## [1] 19
 MSE = SSE/(3*2*(2-1))
 MSE
## [1] 3.166667
 # (e) None of the Above.

Exercise 3b

MSE =  3.166667
MSAB = 16/((2)*(1))
FAB = MSAB/MSE
# The denominator df for the interaction test is the error df, ab(n-1) = 3*2*(2-1) = 6
P_ab =  1 - pf(FAB,(3-1)*(2-1),(3)*(2)*(2-1))
P_ab
## [1] 0.1599767
# P_ab is about 0.16
# Since P_ab is not significant (P_ab > 0.05), we can drop the interaction term.

Exercise 4a

#The residuals from the model fit to data from a designed experiment
#(a) are independent random variables. True False (Answer : False)
#(b) are always normally distributed. True False  (Answer : False)
#(c) always have constant variance. True False    (Answer : False)

Exercise 4b

# Summarize this section

# Independence, normality and constant variance are assumptions made about the model errors. The residuals only estimate those errors, so none of the three properties is guaranteed; each has to be checked after fitting.

# Residuals are not independent random variables

# In the design and analysis of experiments, it is important to draw conclusions that hold for the entire population. The errors are assumed to be independent, which randomization is meant to protect; if they are not, the sample is not a good estimator of the population quantities. The residuals themselves are tied together by the fit (for example, they sum to zero), so they are not strictly independent. As it relates to ANOVA, independence of the errors is by far the most important assumption to meet.

# Residuals are not always normally distributed

# This assumption is important since the ANOVA p-values are based on the F-distribution, which relies on normally distributed errors. If the residuals are not approximately normal, the p-values are not reliable enough for interpretation.

# Residuals do not always have constant variance

# This assumption is important for the same reason: the F-tests assume a common error variance. If the residuals do not have constant variance across groups, the p-values are not reliable enough for interpretation.
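A small simulation can help separate what is assumed about the errors from what is true of the residuals. The sketch below uses simulated data only (the layout, seed and object names are assumptions for illustration): the errors are independent by construction, and the residuals in every cell of the fitted two-factor model still sum to zero.

```r
set.seed(42)                     # arbitrary seed; simulated data only

# 2 x 3 layout with 3 replicates per cell
d <- expand.grid(rep = 1:3, A = factor(1:2), B = factor(1:3))
d$y <- 10 + rnorm(nrow(d))       # independent N(0, 1) errors by construction

fit <- aov(y ~ A * B, data = d)
res <- residuals(fit)

# Sum of the residuals within each A x B cell
cell_sums <- tapply(res, list(A = d$A, B = d$B), sum)
round(cell_sums, 10)
```

Because each cell's residuals are forced to sum to zero, knowing all but one residual in a cell determines the last one.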

Data Analysis section

Question 1: This question looks at the effects of temperature and copper content on warping. A two-way ANOVA model is used to assess the main effects and the interaction effect on the response variable.

# Treatment A: Copper Content (%)(40,60,80,100).
# Treatment B: Temperature(50,75,100,125)
#a=4
#b=4
n=2
y <- c(17,20,16,21,24,22,28,27,12,9,18,13,17,12,27,31,16,12,18,21,25,23,30,23,21,17,23,21,23,22,29,31)
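# explicit levels keep the factor levels in numeric order instead of alphabetical order
A <- factor(rep(c(rep("40",n),rep("60",n),rep("80",n),rep("100",n)),4), levels=c("40","60","80","100"))
B <- factor(c(rep("50",4*n),rep("75",4*n),rep("100",4*n),rep("125",4*n)), levels=c("50","75","100","125"))
X <- data.frame(y,A,B)
## two-factor model with interaction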
model <- aov(y~A*B)
anova(model)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## A          3 698.34 232.781 34.3272 3.35e-07 ***
## B          3 156.09  52.031  7.6728 0.002127 ** 
## A:B        9 113.78  12.642  1.8643 0.132748    
## Residuals 16 108.50   6.781                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#################################################################################################################################################################################################################################

#PART A

#################################################################################################################################################################################################################################


#Is there any indication that either factor affects the amount of warping? Is there any interaction between the factors? Use α = 0.05.

#   Ho: The factor (or interaction) has no effect on the response.
#   Ha: The factor (or interaction) has an effect on the response.

# Looking at the ANOVA table at the 0.05 significance level, both main effects (temperature and copper content) are significant since their p-values are less than 0.05, while the interaction term is not significant since its p-value (0.133) is greater than 0.05.


#################################################################################################################################################################################################################################

#PART B

#################################################################################################################################################################################################################################

#Analyze the residuals from this experiment.

#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)

shapiro.test(model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model$residuals
## W = 0.95013, p-value = 0.1454
# Most of the residual points lie on or near the line, suggesting the normality assumption is met. The Shapiro-Wilk test is consistent with this: its p-value (0.1454) is greater than 0.05, so normality is not rejected.

#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)

#The residuals show no distinct pattern (such as a cone shape) and are spread roughly evenly above and below zero, so we conclude that the constant variance assumption is met.
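The visual check can be backed up with a numeric summary of the residual spread at each factor level. This is a rough sketch, not a formal test; it rebuilds the Question 1 data so that it runs on its own, and the names `fit`, `sd_by_copper` and `sd_by_temp` are illustrative.

```r
# Question 1 data, rebuilt so the block is self-contained
n <- 2
y <- c(17,20,16,21,24,22,28,27,12,9,18,13,17,12,27,31,
       16,12,18,21,25,23,30,23,21,17,23,21,23,22,29,31)
A <- factor(rep(rep(c("40","60","80","100"), each = n), 4),
            levels = c("40","60","80","100"))          # copper content
B <- factor(rep(c("50","75","100","125"), each = 4 * n),
            levels = c("50","75","100","125"))         # temperature

fit <- aov(y ~ A * B)
res <- residuals(fit)

# Residual standard deviation at each level of each factor;
# broadly similar values support the constant variance assumption
sd_by_copper <- tapply(res, A, sd)
sd_by_temp   <- tapply(res, B, sd)
sd_by_copper
sd_by_temp
```

With only two observations per cell these spreads are noisy, so they are best read alongside the residual plot rather than on their own.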



#################################################################################################################################################################################################################################

#PART C & D

#################################################################################################################################################################################################################################







interaction.plot(x.factor     = A,
                 trace.factor = B,
                 response     = y,
                 fun = mean,
                 type="b",
                 col=c("black","red","green","blue"),  
                 pch=c(19, 17, 15,16),             
                 fixed=TRUE,                   
                 leg.bty = "o")

#As seen in the interaction plot, the lowest warping is achieved at a copper content of 40.
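The reading of the interaction plot can be checked against the averages it is drawn from. The sketch below rebuilds the Question 1 data so that it runs on its own; `cell_means` and `copper_means` are illustrative names.

```r
# Question 1 data, rebuilt so the block is self-contained
n <- 2
y <- c(17,20,16,21,24,22,28,27,12,9,18,13,17,12,27,31,
       16,12,18,21,25,23,30,23,21,17,23,21,23,22,29,31)
A <- factor(rep(rep(c("40","60","80","100"), each = n), 4),
            levels = c("40","60","80","100"))          # copper content
B <- factor(rep(c("50","75","100","125"), each = 4 * n),
            levels = c("50","75","100","125"))         # temperature

# Mean warping in each copper x temperature cell (the plotted points)
cell_means <- tapply(y, list(Copper = A, Temperature = B), mean)
cell_means

# Mean warping at each copper content, averaged over temperature
copper_means <- tapply(y, A, mean)
copper_means
names(which.min(copper_means))   # copper level with the lowest average warping
```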

Question 2:

This question looks at the effects of pressure and temperature on a chemical process. A two-way ANOVA model is used to assess the main effects and the interaction effect on the response variable. A block is added to control for the variation introduced by days.

# Treatment A: Pressure (250,260,270).
# Treatment B: Temperature(low,medium,high) 
#a=3
#b=3
n=1
y<-c(86.3 , 84.0, 85.8, 86.1, 85.2, 87.3,
88.5, 87.3, 89.0, 89.4, 89.9, 90.3,89.1,
90.2, 91.3, 91.7, 93.2, 93.7)
A <- factor(rep(c(rep("250",n),rep("260",n),rep("270",n)),6))
B <- factor(c(rep("Low",6*n),rep("Medium",6*n),rep("High",6*n)))
block <- factor(rep(c(rep("1",3),rep("2",3)),3))
X <- data.frame(y,A,B,block)
## two-factor model with interaction and a block (day) term
model <- aov(y~A*B + block)
anova(model)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## A          2  5.508   2.754  5.1838  0.035988 *  
## B          2 99.854  49.927 93.9807 2.778e-06 ***
## block      1 13.005  13.005 24.4800  0.001124 ** 
## A:B        4  4.452   1.113  2.0952  0.173314    
## Residuals  8  4.250   0.531                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Looking at the ANOVA table at the 0.05 significance level, both main effects (pressure and temperature) are significant since their p-values are less than 0.05, and the block (day) term is also significant (p = 0.0011). The interaction term is not significant since its p-value (0.173) is greater than 0.05.

#Analyze the residuals from this experiment.

#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)

shapiro.test(model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model$residuals
## W = 0.97846, p-value = 0.9327
# Most of the residual points lie on or near the line, suggesting the normality assumption is met. The Shapiro-Wilk test is consistent with this: its p-value (0.9327) is greater than 0.05, so normality is not rejected.

#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)

#The residuals show no distinct pattern (such as a cone shape) and are spread roughly evenly above and below zero, so we conclude that the constant variance assumption is met.

Question 3a:

Brief summary of the problem: This question looks at the effects of two factors, each at two levels, on potential fatigue. A two-way ANOVA model is used to assess the main effects and the interaction effect on the response variable.

# Treatment A: Bottle Type (Glass,Plastic).
# Treatment B: Worker(1,2) 
y <- c(39,58,45,35,44,42,35,21,20,16,13,11,13,16,10,15)

A <- rep(c(rep("Glass",4),rep("Plastic",4)),2)
B <- c(rep("1",8),rep("2",8))

df <- data.frame(y,A,B)


model <- aov(y~A*B)
anova(model)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## A          1  105.06  105.06  1.8147    0.2028    
## B          1 2626.56 2626.56 45.3670 2.087e-05 ***
## A:B        1   52.56   52.56  0.9079    0.3595    
## Residuals 12  694.75   57.90                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Looking at the ANOVA table at the 0.05 significance level, the worker main effect (B) is the only significant term, since its p-value is less than 0.05. The bottle type main effect (A) and the interaction term are not significant since their p-values are greater than 0.05.

#Analyze the residuals from this experiment.

#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)

shapiro.test(model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model$residuals
## W = 0.9854, p-value = 0.9922
# Most of the residual points lie on or near the line, suggesting the normality assumption is met. The Shapiro-Wilk test is consistent with this: its p-value (0.9922) is greater than 0.05, so normality is not rejected.

#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)

#The residuals show no distinct pattern (such as a cone shape) and are spread roughly evenly above and below zero, so we conclude that the constant variance assumption is met.

Question 3b

#Calculate approximate 95 percent confidence limits for the factor effects in Data Analysis 3a. 
#Do the results of this analysis agree with the analysis of variance results you may have performed in that question?

# In a 2^2 design with n = 4 replicates, |effect| = sqrt(SS/(n*2^(k-2))); the sign is taken from the cell totals.

#Effect of Factor A
effect_A = sqrt((1/((4)*2^(2-2)))* 105.06) * - 1 
effect_A
## [1] -5.124939
#Effect of Factor B
effect_B = sqrt((1/((4)*2^(2-2)))* 2626.56) * - 1
effect_B
## [1] -25.62499
#Effect of Factor AB
effect_AB = sqrt((1/((4)*2^(2-2)))* 52.56)
effect_AB
## [1] 3.624914
#Standard error of an effect, sqrt(MSE/(n*2^(k-2)))
SE = sqrt((1/((4)*2^(2-2)))* 57.90)
SE
## [1] 3.804602
#CI Factor effect for Factor A
upper_limit_A = effect_A + (SE * 1.96)
lower_limit_A = effect_A - (SE * 1.96)
CI_factorA=c(lower_limit_A,upper_limit_A)
CI_factorA
## [1] -12.581960   2.332082
#CI Factor effect for Factor B
upper_limit_B = effect_B + (SE * 1.96)
lower_limit_B = effect_B - (SE * 1.96)
CI_factorB=c(lower_limit_B,upper_limit_B)
CI_factorB
## [1] -33.08201 -18.16797
#CI Factor effect for Factor AB
upper_limit_AB = effect_AB + (SE * 1.96)
lower_limit_AB = effect_AB - (SE * 1.96)
CI_factorAB=c(lower_limit_AB,upper_limit_AB)
CI_factorAB
## [1] -3.832107 11.081935
 
#The 95% confidence interval for factor B does not contain zero, so B is significant. The intervals for A and AB both contain zero, so those effects are not significant. This agrees with the ANOVA results from 3a.
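The same effects can be obtained directly from the data through coded -1/+1 contrasts, which avoids working out the signs by hand. This is a sketch under two assumptions that are not in the original: Glass and Worker 1 are coded as -1, and the critical value is the t quantile with the 12 error degrees of freedom rather than the 1.96 used above, so the intervals come out slightly wider. The names `X`, `eff`, `se_effect` and `ci` are illustrative.

```r
# Question 3a data, rebuilt so the block is self-contained
y <- c(39,58,45,35,44,42,35,21,20,16,13,11,13,16,10,15)
A <- rep(rep(c(-1, 1), each = 4), 2)   # assumed coding: -1 = Glass, +1 = Plastic
B <- rep(c(-1, 1), each = 8)           # assumed coding: -1 = Worker 1, +1 = Worker 2
n <- 4                                 # replicates per cell
k <- 2                                 # number of factors

# Effect = contrast / (n * 2^(k-1))
X   <- cbind(A = A, B = B, AB = A * B)
eff <- drop(t(X) %*% y) / (n * 2^(k - 1))

# Standard error of an effect from the error mean square
fit       <- lm(y ~ A * B)
mse       <- sum(residuals(fit)^2) / df.residual(fit)
se_effect <- sqrt(mse / (n * 2^(k - 2)))

# t-based 95% limits
t_crit <- qt(0.975, df.residual(fit))
ci <- cbind(lower    = eff - t_crit * se_effect,
            estimate = eff,
            upper    = eff + t_crit * se_effect)
ci
```

Only the interval for B stays clear of zero, which matches the conclusion drawn from the 1.96 limits.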

Question 4

This question looks at the effect of four factors on cracks.

#Exercise 4a


y <- c( 7.037, 6.376,
14.707, 15.219,
11.635, 12.089,
17.273, 17.815,
10.403, 10.151,
 4.368, 4.098,
9.360, 9.253,
13.440, 12.923,
8.561, 8.951,
16.867, 17.052,
13.876, 13.658,
19.824, 19.639,
11.846, 12.337,
 6.125, 5.904,
11.190, 10.935,
15.653, 15.053)

A <- rep(c(rep(-1,2),rep(1,2)),8)      
B <- rep(c(rep(-1,4),rep(1,4)),4)
C <- rep(c(rep(-1,8),rep(1,8)),2)
D <- c(rep(-1,16),rep(1,16))

#Part a

## factor effect = contrast / (n * 2^(k-1)), with n = 2 replicates and k = 4 factors
fe <- y%*%cbind(A,B,C,D)/(2*2^(4-1))
fe
##             A        B        C       D
## [1,] 3.018875 3.975875 -3.59625 1.95775
#Part b

# number of variables
p <- 4
# binary -1/+1 matrix
bin <- matrix(0,2^p,p)
#
# fill in the matrix
for(k in 0:(2^p-1)) bin[k+1,] <- 2*as.numeric(intToBits(k)[1:p])-1
# name the columns
colnames(bin) <- LETTERS[1:p]
# name the rows
temp <- NULL
for(k in 1:(2^p)) temp <- c(temp,paste(letters[which(bin[k,]==1)],collapse=""))
rownames(bin) <- temp
## create full factorial table
# set up the extended matrix
binext <- matrix(0,2^p,2^p-1-p)
## fill it in
#
# storage for column names
temp <- NULL
# column index
h <- 1
for(k in 2:p){
combos <- t(combn(p,k))
for(j in 1:nrow(combos)){
binext[,h] <- apply(bin[,combos[j,]],1,prod)
temp <- c(temp,paste(LETTERS[combos[j,]],collapse=""))
h <- h+1
}
}
# cbind it to the original bin
bin <- cbind(bin,binext)
colnames(bin) <- c(LETTERS[1:p],temp)

df <- data.frame(y,bin)
## Warning in data.frame(y, bin): row names were found from a short variable
## and have been discarded
model <- aov(y~., data =df)

anova(model)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## A          1   0.016   0.016  0.0080  0.929702    
## B          1  72.909  72.909 36.9633 1.594e-05 ***
## C          1 126.461 126.461 64.1130 5.482e-07 ***
## D          1 103.464 103.464 52.4542 1.966e-06 ***
## AB         1   0.005   0.005  0.0025  0.960863    
## AC         1   0.035   0.035  0.0178  0.895524    
## AD         1   0.236   0.236  0.1198  0.733747    
## BC         1  29.927  29.927 15.1722  0.001287 ** 
## BD         1 128.496 128.496 65.1451 4.942e-07 ***
## CD         1   0.074   0.074  0.0374  0.849129    
## ABC        1   0.024   0.024  0.0120  0.913966    
## ABD        1   0.207   0.207  0.1050  0.750144    
## ACD        1   0.061   0.061  0.0308  0.862918    
## BCD        1  78.751  78.751 39.9253 1.021e-05 ***
## ABCD       1   0.021   0.021  0.0107  0.918880    
## Residuals 16  31.559   1.972                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
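#As seen in the table, the terms labelled B, C, D, BC, BD and BCD are significant at the 0.05 level, so both main effects and interactions are related to cracking.
#Caution: the warning above arises because bin has 16 rows while y has 32 values, so data.frame() recycles the design rows. Since y stores the two replicates of each run next to each other, the recycled columns are shifted relative to the vectors A, B, C, D defined in part a: the column labelled B matches vector A, C matches vector B, and D matches vector C. The labels in this table should be read with that shift in mind.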
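The warning printed when df was built indicates that the 16-row design matrix was recycled against the 32 responses. The following is a minimal sketch of one way to line the design up with y, assuming (as the factor vectors in part a imply) that the two replicates of each run are stored next to each other. The names `runs`, `design` and `fit_aligned` are illustrative and not part of the original analysis.

```r
# Question 4 responses: 16 runs in standard order, two replicates side by side
y <- c( 7.037,  6.376, 14.707, 15.219, 11.635, 12.089, 17.273, 17.815,
       10.403, 10.151,  4.368,  4.098,  9.360,  9.253, 13.440, 12.923,
        8.561,  8.951, 16.867, 17.052, 13.876, 13.658, 19.824, 19.639,
       11.846, 12.337,  6.125,  5.904, 11.190, 10.935, 15.653, 15.053)

# 2^4 design in standard order (A changes fastest)
runs <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))

# Repeat every run twice so that row i of the design belongs to y[i]
design   <- runs[rep(seq_len(nrow(runs)), each = 2), ]
design$y <- y

# Full factorial model; with -1/+1 coding, effect = 2 * coefficient
fit_aligned     <- lm(y ~ A * B * C * D, data = design)
effects_aligned <- 2 * coef(fit_aligned)[-1]
round(effects_aligned[c("A", "B", "C", "D")], 4)

anova(fit_aligned)
```

With the rows aligned, the main effects from the regression coefficients agree with the contrast-based factor effects from part a, and the 16 residual degrees of freedom come purely from the replicate pairs.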

#c

summary(lm(y~.,data=df))
## 
## Call:
## lm(formula = y ~ ., data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2875 -0.9154  0.0000  0.9154  1.2875 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.98806    0.24827  48.286  < 2e-16 ***
## A           -0.02225    0.24827  -0.090  0.92970    
## B            1.50944    0.24827   6.080 1.59e-05 ***
## C            1.98794    0.24827   8.007 5.48e-07 ***
## D           -1.79812    0.24827  -7.243 1.97e-06 ***
## AB          -0.01238    0.24827  -0.050  0.96086    
## AC          -0.03313    0.24827  -0.133  0.89552    
## AD          -0.08594    0.24827  -0.346  0.73375    
## BC           0.96706    0.24827   3.895  0.00129 ** 
## BD          -2.00388    0.24827  -8.071 4.94e-07 ***
## CD           0.04800    0.24827   0.193  0.84913    
## ABC         -0.02725    0.24827  -0.110  0.91397    
## ABD         -0.08044    0.24827  -0.324  0.75014    
## ACD         -0.04356    0.24827  -0.175  0.86292    
## BCD          1.56875    0.24827   6.319 1.02e-05 ***
## ABCD         0.02569    0.24827   0.103  0.91888    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.404 on 16 degrees of freedom
## Multiple R-squared:  0.9448, Adjusted R-squared:  0.8931 
## F-statistic: 18.27 on 15 and 16 DF,  p-value: 2.741e-07
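# Regression model keeping the terms that are significant at 0.05 in the summary above (coded -1/+1 columns of df):
# y = 11.988 + 1.509*B + 1.988*C - 1.798*D + 0.967*BC - 2.004*BD + 1.569*BCD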

# d
#For Normality:
qqnorm(model$residuals)
qqline(model$residuals)

shapiro.test(model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model$residuals
## W = 0.78308, p-value = 1.991e-05
# Many of the residual points do not lie on the line, suggesting the normality assumption is not met. The Shapiro-Wilk test confirms this since its p-value (1.991e-05) is less than 0.05. Further work is required.

#For Constant Variance
plot(model$fitted.values,model$residuals)
abline(h=0)

#The residuals show no distinct pattern (such as a cone shape) and are spread roughly evenly above and below zero, so we conclude that the constant variance assumption is met.


#e and f
#ran out of time.


#Exercise 4D
block <- factor (ifelse(bin[,ncol(bin)]>0,1,2))
new_data= data.frame(y,bin,block)
## Warning in data.frame(y, bin, block): row names were found from a short
## variable and have been discarded
fit2<-aov (y~.-ABCD,data=new_data)
anova(fit2)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## A          1   0.016   0.016  0.0080  0.929702    
## B          1  72.909  72.909 36.9633 1.594e-05 ***
## C          1 126.461 126.461 64.1130 5.482e-07 ***
## D          1 103.464 103.464 52.4542 1.966e-06 ***
## AB         1   0.005   0.005  0.0025  0.960863    
## AC         1   0.035   0.035  0.0178  0.895524    
## AD         1   0.236   0.236  0.1198  0.733747    
## BC         1  29.927  29.927 15.1722  0.001287 ** 
## BD         1 128.496 128.496 65.1451 4.942e-07 ***
## CD         1   0.074   0.074  0.0374  0.849129    
## ABC        1   0.024   0.024  0.0120  0.913966    
## ABD        1   0.207   0.207  0.1050  0.750144    
## ACD        1   0.061   0.061  0.0308  0.862918    
## BCD        1  78.751  78.751 39.9253 1.021e-05 ***
## block      1   0.021   0.021  0.0107  0.918880    
## Residuals 16  31.559   1.972                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
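# Adding the block did not change the model: the block is confounded with ABCD, so it takes over that term's sum of squares (0.021) and is not significant (p = 0.919).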


#I ran out of time, so I wasn't able to complete parts b and c.