Library Used
library(car)
library(tidyr)
library(dplyr)
library(agricolae)
library(MASS)
library(GAD)
The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment as follows
fluid1<-c(17.6,18.9, 16.3, 17.4, 20.1, 21.6)
fluid2<-c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3<-c( 21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4<-c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid<-c(fluid1,fluid2,fluid3,fluid4)
type<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
type<-as.factor(type)
library(tidyr)
df_323<-data.frame(fluid,type)
str(df_323)
\(H_{0} : \mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}=\mu\)
\(H_{a}\) : At least one \(\mu_{i}\) is different
model_anova323<-aov(df_323$fluid~df_323$type,data=df_323)
summary(model_anova323)
## Df Sum Sq Mean Sq F value Pr(>F)
## df_323$type 3 30.16 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_anova323)
LSD323<- LSD.test(model_anova323,'df_323$type',console = TRUE)
From the ANOVA table, the p-value (0.0525) is slightly greater than \(\alpha = 0.05\).
Hence we fail to reject \(H_{0}\): we cannot conclude at the 5% level that the fluid type means differ.
The LSD comparison shows that Fluid 3 has the largest mean of the four fluids, so fluid type 3 appears to have the longest life, although the ANOVA does not find the difference significant at this level.
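The LSD comparison can also be reproduced by hand in base R. The following is a minimal sketch, assuming equal group sizes of six and a 5% significance level; the names `group_means`, `mse`, `lsd`, and `diffs` are introduced here for illustration and are not part of the original analysis.

```r
# Fluid life data from Problem 3.23 (same values as above)
fluid <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
           16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
           21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
           19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
type <- factor(rep(1:4, each = 6))

fit <- aov(fluid ~ type)
group_means <- tapply(fluid, type, mean)      # sample mean per fluid type

# Fisher's LSD for equal group sizes n:
#   LSD = t_{alpha/2, df_error} * sqrt(2 * MSE / n)
n_per_group <- 6
mse <- deviance(fit) / df.residual(fit)       # residual mean square
lsd <- qt(0.975, df.residual(fit)) * sqrt(2 * mse / n_per_group)

# Absolute pairwise differences between group means, compared with the LSD
diffs <- abs(outer(group_means, group_means, "-"))
print(round(group_means, 2))
print(lsd)
print(diffs > lsd)                            # TRUE marks pairs that differ by more than the LSD
```

Because the overall F test is not significant at the 5% level, any pair flagged here should be read with caution rather than as firm evidence of a difference.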
Variance assumption check
mean_fluid <-c(rep(mean(fluid1),6),rep(mean(fluid2),6),rep(mean(fluid3),6),rep(mean(fluid4),6))
res_323<-fluid-mean_fluid
plot(res_323~mean_fluid,main='Plot for Constant Variance',xlab= 'Means of Fluid Type', ylab='Residuals')
The residuals show a similar spread for all four fluid types, so the constant variance assumption holds.
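The visual check can be complemented with a formal test of equal variances. The sketch below uses Bartlett's test from base R (`bartlett.test`), which is a swapped-in technique not used in the original analysis and which assumes approximately normal data within each group.

```r
# Fluid life data from Problem 3.23 (same values as above)
fluid <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
           16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
           21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
           19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
type <- factor(rep(1:4, each = 6))

# Sample variance within each fluid type
print(tapply(fluid, type, var))

# Bartlett's test: H_0 is that all four group variances are equal
bt <- bartlett.test(fluid ~ type)
print(bt)
```

A large p-value here is consistent with the constant variance reading of the residual plot.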
An experiment was performed to investigate the effectiveness of five insulating materials. Four samples of each material were tested at an elevated voltage level to accelerate the time to failure.
ft_1<-c(100,157,194,178)
ft_2<-c(1,2,4,18)
ft_3<-c(880,1256,5276,4355)
ft_4<-c(495,7040,5307,10050)
ft_5<-c(7,5,29,2)
failure_time<-c(ft_1,ft_2,ft_3,ft_4,ft_5)
material_type<-c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
df_328<-data.frame(failure_time,material_type)
df_328$material_type<-as.factor(df_328$material_type)
boxplot(df_328$failure_time~df_328$material_type, main='Boxplot of Failure Time',
xlab='Material Type',ylab='Failure Time (min)')
From the data we can see that the failure times of Materials 2 and 5 are drastically different from those of the other materials, which suggests that the means differ.
The box plot shows the same thing: the centers of the five groups are nowhere close to each other and their spreads differ widely.
So we conclude that the materials do not have the same effect on mean failure time.
mean_ft<-c(rep(mean(ft_1),4),rep(mean(ft_2),4),rep(mean(ft_3),4),
rep(mean(ft_4),4),rep(mean(ft_5),4))
res_ft<-failure_time-mean_ft
plot(res_ft~mean_ft,main='Residuals vs Predicted Response Plot',
xlab='Predicted Response',ylab='Residuals')
qqnorm(res_ft)
qqline(res_ft, col='red')
From the residuals vs predicted response plot we can clearly see that the variance is not constant: the spread of the residuals grows with the predicted response.
The normal probability plot of the residuals does not support the normality assumption.
Based on the analysis in part (b), we apply a Box-Cox transformation.
boxcox(df_328$failure_time~df_328$material_type,data=df_328)
From the Box-Cox plot we can see that \(\lambda\) lies between -0.1 and 0.1.
Let us take \(\lambda = 0\).
For \(\lambda = 0\), a log transformation is required.
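Instead of reading \(\lambda\) off the plot, the profile log-likelihood can be inspected numerically. This is a minimal sketch, assuming the default \(\lambda\) grid of `MASS::boxcox`; the names `bc`, `lambda_hat`, and `ci` are introduced here for illustration.

```r
library(MASS)

# Failure time data from Problem 3.28 (same values as above)
df_bc <- data.frame(
  failure_time  = c(100, 157, 194, 178,
                    1, 2, 4, 18,
                    880, 1256, 5276, 4355,
                    495, 7040, 5307, 10050,
                    7, 5, 29, 2),
  material_type = factor(rep(1:5, each = 4))
)

# plotit = FALSE returns the lambda grid (x) and profile log-likelihood (y)
bc <- boxcox(failure_time ~ material_type, data = df_bc, plotit = FALSE)

# Grid value with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: grid values whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])

print(lambda_hat)
print(ci)
```

If the interval contains 0, the log transformation is a reasonable and easily interpreted choice.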
log_failure_time<-log(df_328$failure_time)
model_anova328_log<-aov(log_failure_time~df_328$material_type,data=df_328)
summary(model_anova328_log)
## Df Sum Sq Mean Sq F value Pr(>F)
## df_328$material_type 4 165.02 41.25 37.48 1.21e-07 ***
## Residuals 15 16.51 1.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_anova328_log)
From the ANOVA diagnostic plots we can see that, after the log transformation, the residuals are approximately normal with constant variance. Hence the ANOVA assumptions of normality and constant variance hold.
The p-value is 1.21e-7 < 0.05, so we can confidently say that the mean failure time of at least one material is different.
A semiconductor manufacturer has developed three different methods for reducing particle counts on wafers. All three methods are tested on five different wafers and the after treatment particle count obtained. The data are shown below:
c_1 <- c(31,10,21,4,1)
c_2 <- c(62,40,24,30,35)
c_3<- c(53,27,120,97,68)
methods<-as.factor(c(rep(1,5),rep(2,5),rep(3,5)))
counts<-c(c_1,c_2,c_3)
df_329<-data.frame(counts,methods)
\(H_{0} : \mu_{1}=\mu_{2}=\mu_{3}=\mu\)
\(H_{a}\) : At least one \(\mu_{i}\) is different
model_anova329<-aov(df_329$counts~df_329$methods,data=df_329)
summary(model_anova329)
## Df Sum Sq Mean Sq F value Pr(>F)
## df_329$methods 2 8964 4482 7.914 0.00643 **
## Residuals 12 6796 566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mean_329<-c(rep(mean(c_1),5),rep(mean(c_2),5),rep(mean(c_3),5))
res_329<-counts-mean_329
qqnorm(res_329, main='NPP for Residuals')
qqline(res_329,col='darkblue')
plot(res_329~mean_329,main='Residuals vs Predicted Response',
xlab='Methods', ylab='Residuals')
The residuals vs predicted response plot shows that the variance is not constant; additionally, the residuals do not follow a normal distribution.
boxcox(model_anova329)
From the Box-Cox plot we can take \(\lambda\) to be between 0.2 and 0.8, so we use \(\lambda = 0.5\) (a square-root transformation).
lamda=0.5
y_329=df_329$counts^lamda
Boxplot(y_329~df_329$methods,data= df_329, xlab='Methods', ylab='Transformed Counts')
## [1] "6"
model_anova329_t<-aov(y_329~df_329$methods,data=df_329)
summary(model_anova329_t)
## Df Sum Sq Mean Sq F value Pr(>F)
## df_329$methods 2 63.90 31.95 9.84 0.00295 **
## Residuals 12 38.96 3.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_anova329_t)
After the Box-Cox transformation the residuals are approximately normal with constant variance.
From the ANOVA model the p-value is 0.00295 < 0.05, so we reject \(H_0\): the methods do not all have the same effect on mean particle count.
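The effect of the transformation on the assumptions can also be checked numerically. The sketch below compares the raw and square-root scale fits using the Shapiro-Wilk and Bartlett tests from base R; these tests are an addition for illustration and were not part of the original analysis.

```r
# Particle count data from Problem 3.29 (same values as above)
counts  <- c(31, 10, 21, 4, 1,
             62, 40, 24, 30, 35,
             53, 27, 120, 97, 68)
methods <- factor(rep(1:3, each = 5))

fit_raw  <- aov(counts ~ methods)
fit_sqrt <- aov(sqrt(counts) ~ methods)   # lambda = 0.5

# Normality of residuals (H_0: residuals are normal)
sw_raw  <- shapiro.test(residuals(fit_raw))
sw_sqrt <- shapiro.test(residuals(fit_sqrt))

# Equality of group variances (H_0: variances are equal)
bt_raw  <- bartlett.test(counts ~ methods)
bt_sqrt <- bartlett.test(sqrt(counts) ~ methods)

print(c(raw = sw_raw$p.value, sqrt = sw_sqrt$p.value))
print(c(raw = bt_raw$p.value, sqrt = bt_sqrt$p.value))

# p-value of the method effect on the transformed scale
p_methods <- summary(fit_sqrt)[[1]][["Pr(>F)"]][1]
print(p_methods)
```

With only five observations per method these tests have little power, so they supplement the diagnostic plots rather than replace them.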
Use the Kruskal–Wallis test for the experiment in Problem 3.23. Compare the conclusions obtained with those from the usual analysis of variance.
kruskal.test(df_323$fluid~df_323$type,data=df_323)
##
## Kruskal-Wallis rank sum test
##
## data: df_323$fluid by df_323$type
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
In Problem 3.23 we failed to reject \(H_0\), that is, we found no evidence that the means of the four fluid types differ.
From the Kruskal-Wallis test the p-value is 0.1015 > 0.05, so we again fail to reject the null hypothesis.
Use the Kruskal–Wallis test for the experiment in Problem 3.23. Are the results comparable to those found by the usual analysis of variance?
kruskal.test(df_323$fluid~df_323$type,data=df_323)
##
## Kruskal-Wallis rank sum test
##
## data: df_323$fluid by df_323$type
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
Comparing the Kruskal-Wallis test with the usual analysis of variance (ANOVA), we find:
Both fail to reject the null hypothesis.
The Kruskal-Wallis p-value (0.1015) is almost double the ANOVA p-value (0.0525).
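To show what the Kruskal-Wallis test computes, the statistic can be rebuilt from the ranks. This is a minimal sketch using the standard formula with a correction for ties (the value 16.9 occurs twice in the data); the names `rank_sums`, `h_raw`, and `h` are introduced here for illustration.

```r
# Fluid life data from Problem 3.23 (same values as above)
fluid <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
           16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
           21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
           19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
type <- factor(rep(1:4, each = 6))

n <- length(fluid)
r <- rank(fluid)                          # tied values receive midranks
rank_sums <- tapply(r, type, sum)         # R_i: rank sum per fluid type
n_i <- tapply(r, type, length)            # n_i: observations per fluid type

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / n_i) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
ties <- as.numeric(table(fluid))
h <- h_raw / (1 - sum(ties^3 - ties) / (n^3 - n))

# Compare with the chi-squared distribution on (number of groups - 1) df
p_value <- pchisq(h, df = nlevels(type) - 1, lower.tail = FALSE)
print(c(H = h, p = p_value))
```

The test works on ranks only, which is why it is less sensitive to the exact magnitudes of the observations than the F test.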
A chemist wishes to test the effect of four chemical agents on the strength of a particular type of cloth. Because there might be variability from one bolt to another, the chemist decides to use a randomized block design, with the bolts of cloth considered as blocks. She selects five bolts and applies all four chemicals in random order to each bolt. The resulting tensile strengths follow. Analyze the data from this experiment (use \(\alpha = 0.05\)) and draw appropriate conclusions.
ts <- c(73,68,74,71,67,73,67,75,72,70,75,68,78,73,68,73,71,75,75,69)
chemical <- c(rep(1,5),rep(2,5),rep(3,5),rep(4,5))
bolt <- c(rep(seq(1:5),4))
str(bolt)
## int [1:20] 1 2 3 4 5 1 2 3 4 5 ...
chemical <- as.fixed (chemical)
bolt <- as.fixed (bolt)
\(H_{0} : \tau_{i}=0\)
\(H_a : \tau_i \neq 0\)
where \(\tau_i = \mu-\mu_i\) is the effect of chemical \(i\), written here as the grand mean minus the treatment mean.
model_43 <- lm(ts~chemical+bolt)
gad(model_43)
## $anova
## Analysis of Variance Table
##
## Response: ts
## Df Sum Sq Mean Sq F value Pr(>F)
## chemical 3 12.95 4.317 2.3761 0.1211
## bolt 4 157.00 39.250 21.6055 2.059e-05 ***
## Residuals 12 21.80 1.817
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As the ANOVA table from gad() shows, the p-value for chemical is 0.1211, which is greater than \(\alpha = 0.05\).
Hence we fail to reject the null hypothesis: there is no evidence of a difference among the chemicals in cloth strength, even after accounting for the nuisance variability between bolts.
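For readers without the GAD package, the same fixed-effects table can be obtained with base R. This is a sketch under the assumption that both chemical and bolt are fixed factors, so each F ratio uses the residual mean square; `ts_strength` is a hypothetical name chosen to avoid masking the base function `ts`.

```r
# Tensile strength data from Problem 4.3 (same values as above)
ts_strength <- c(73, 68, 74, 71, 67,
                 73, 67, 75, 72, 70,
                 75, 68, 78, 73, 68,
                 73, 71, 75, 75, 69)
chemical <- factor(rep(1:4, each = 5))    # treatment
bolt     <- factor(rep(1:5, times = 4))   # block

# Randomized complete block design: treatment + block, no interaction term
fit_rcbd <- lm(ts_strength ~ chemical + bolt)
tab <- anova(fit_rcbd)
print(tab)

p_chemical <- tab["chemical", "Pr(>F)"]
print(p_chemical)
```

Because the design is balanced, the order of `chemical` and `bolt` in the formula does not change the sums of squares.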
Assuming that chemical types and bolts are fixed, estimate the model parameters \(\tau_i\) and \(\beta_j\) in Problem 4.3.
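Here the effects are computed as \(\tau_i=\mu-\mu_i\), the grand mean minus the treatment mean. The usual textbook estimate, \(\hat{\tau}_i=\bar{y}_{i.}-\bar{y}_{..}\), has the opposite sign; the same applies to the bolt effects \(\beta_j\).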
grandmean_416 = mean(ts)
mean_t1 = mean(ts[1:5])
mean_t2 = mean(ts[6:10])
mean_t3 = mean(ts[11:15])
mean_t4 = mean(ts[16:20])
t_1 = grandmean_416-mean_t1
t_2 = grandmean_416-mean_t2
t_3 = grandmean_416-mean_t3
t_4 = grandmean_416-mean_t4
t_1
## [1] 1.15
t_2
## [1] 0.35
t_3
## [1] -0.65
t_4
## [1] -0.85
B_1 <- grandmean_416-mean(c(73,73,75,73))
B_2 <- grandmean_416-mean(c(68,67,68,71))
B_3 <- grandmean_416-mean(c(74,75,78,75))
B_4 <- grandmean_416-mean(c(71,72,73,75))
B_5 <- grandmean_416-mean(c(67,70,68,69))
B_1
## [1] -1.75
B_2
## [1] 3.25
B_3
## [1] -3.75
B_4
## [1] -1
B_5
## [1] 3.25
We get the chemical effects \(\tau_1=1.15, \tau_2=0.35, \tau_3=-0.65, \tau_4=-0.85\)
and the bolt effects \(\beta_1=-1.75, \beta_2=3.25, \beta_3=-3.75, \beta_4=-1, \beta_5=3.25\)
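The same estimates can be computed without hard-coding index ranges. The sketch below uses `tapply` and the textbook sign convention (group mean minus grand mean), so its values are the negatives of the ones listed above; `ts_strength`, `tau_hat`, and `beta_hat` are names introduced here for illustration.

```r
# Tensile strength data from Problem 4.3 (same values as above)
ts_strength <- c(73, 68, 74, 71, 67,
                 73, 67, 75, 72, 70,
                 75, 68, 78, 73, 68,
                 73, 71, 75, 75, 69)
chemical <- factor(rep(1:4, each = 5))
bolt     <- factor(rep(1:5, times = 4))

grand_mean <- mean(ts_strength)

# Textbook convention: effect = group mean - grand mean
tau_hat  <- tapply(ts_strength, chemical, mean) - grand_mean
beta_hat <- tapply(ts_strength, bolt, mean) - grand_mean

print(tau_hat)
print(beta_hat)

# Under either sign convention the estimated effects sum to zero
print(c(sum(tau_hat), sum(beta_hat)))
```

Grouping by factor rather than by position keeps the calculation correct even if the rows of the data are reordered.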
The effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process is being studied. Each batch of new material is only large enough to permit five runs to be made. Furthermore, each run requires approximately hours, so only five runs can be made in one day. The experimenter decides to run the experiment as a Latin square so that day and batch effects may be systematically controlled. She obtains the data that follow. Analyze the data from this experiment (use \(\alpha= 0.05\)) and draw conclusions.
day <- c(rep(1,5), rep(2,5), rep(3,5), rep(4,5), rep(5,5))
batch <- c(rep(seq(1,5),5))
ingred <- c("A","C","B","D","E","B","E","A","C","D","D","A","C",
"E","B", "C","D","E","B","A","E","B","D","A","C")
obs <- c(8,11,4,6,4,7,2,9,8,2,1,7,10,6,3,7,3,1,6,8,3,8,5,10,8)
day <- as.fixed(day)
batch <- as.fixed(batch)
ingred <- as.fixed(ingred)
\(H_0 : \tau_i=0\); for all i
\(H_a : \tau_i \neq 0\); for some i.
Linear model: \(y_{ijk} = \mu + \tau_i+\beta_j+\gamma_k+\varepsilon_{ijk}\)
where \(\mu\) = grand mean, \(\tau_i\) = fixed effect of treatment (ingredient) \(i\), \(\beta_j\) = effect of the first blocking factor (batch) \(j\), \(\gamma_k\) = effect of the second blocking factor (day) \(k\),
and \(\varepsilon_{ijk}\) = random error.
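The additive model above relies on the layout being a valid Latin square: every ingredient must occur exactly once in each day and once in each batch. A quick sketch of that check, using cross-tabulations in base R (the name `is_latin_square` is introduced here for illustration):

```r
# Design layout from Problem 4.22 (same values as above)
day    <- factor(rep(1:5, each = 5))
batch  <- factor(rep(1:5, times = 5))
ingred <- factor(c("A", "C", "B", "D", "E",
                   "B", "E", "A", "C", "D",
                   "D", "A", "C", "E", "B",
                   "C", "D", "E", "B", "A",
                   "E", "B", "D", "A", "C"))

# Count how often each ingredient appears within each day and each batch
by_day   <- table(day, ingred)
by_batch <- table(batch, ingred)
print(by_day)
print(by_batch)

# A Latin square has every count equal to 1
is_latin_square <- all(by_day == 1) && all(by_batch == 1)
print(is_latin_square)
```

Running this before fitting the model guards against data entry errors in the treatment labels.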
model_anova422 <- aov(obs~batch+day+ingred)
anova(model_anova422)
## Analysis of Variance Table
##
## Response: obs
## Df Sum Sq Mean Sq F value Pr(>F)
## batch 4 15.44 3.860 1.2345 0.3476182
## day 4 12.24 3.060 0.9787 0.4550143
## ingred 4 141.44 35.360 11.3092 0.0004877 ***
## Residuals 12 37.52 3.127
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA of the Latin square model, the p-value for ingredients is 0.0004877 < 0.05,
hence we reject \(H_0\): at least one ingredient has a different effect on the reaction time.
Batch (p-value = 0.348) and day (p-value = 0.455) are both greater than 0.05, so there is no significant batch-to-batch or day-to-day difference in this experiment.
Although the blocking factors turned out to be non-significant, blocking on them was still reasonable because both represent known sources of nuisance variability.
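The ANOVA table above lists a p-value of 0.0004877 for `ingred`, so a natural follow-up is to see which ingredients differ. The sketch below computes the ingredient means and Tukey's HSD comparisons with base R; Tukey's procedure is an addition for illustration and was not part of the original analysis.

```r
# Latin square data from Problem 4.22 (same values as above)
day    <- factor(rep(1:5, each = 5))
batch  <- factor(rep(1:5, times = 5))
ingred <- factor(c("A", "C", "B", "D", "E",
                   "B", "E", "A", "C", "D",
                   "D", "A", "C", "E", "B",
                   "C", "D", "E", "B", "A",
                   "E", "B", "D", "A", "C"))
obs <- c(8, 11, 4, 6, 4,
         7, 2, 9, 8, 2,
         1, 7, 10, 6, 3,
         7, 3, 1, 6, 8,
         3, 8, 5, 10, 8)

fit_ls <- aov(obs ~ batch + day + ingred)

# Mean reaction time per ingredient
ingred_means <- tapply(obs, ingred, mean)
print(ingred_means)

# Tukey's HSD for the ingredient factor only, at the 95% family-wise level
tukey <- TukeyHSD(fit_ls, which = "ingred", conf.level = 0.95)
print(tukey$ingred)
```

Pairs whose adjusted p-value is below 0.05 in the `p adj` column are the ones for which the data support a difference in mean reaction time.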
library(car)
library(tidyr)
library(dplyr)
library(agricolae)
library(MASS)
library(GAD)
#3.23
##Reading Data
fluid1<-c(17.6,18.9, 16.3, 17.4, 20.1, 21.6)
fluid2<-c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3<-c( 21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4<-c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid<-c(fluid1,fluid2,fluid3,fluid4)
type<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
type<-as.factor(type)
library(tidyr)
df_323<-data.frame(fluid,type)
str(df_323)
##Stating the hypothesis
##H_0 : mu_1=mu_2=mu_3=mu_4=mu
##H_a : at least one mean is different.
model_anova323<-aov(df_323$fluid~df_323$type,data=df_323)
summary(model_anova323)
plot(model_anova323)
LSD323<- LSD.test(model_anova323,'df_323$type',console = TRUE)
## 3.23(a) From the ANOVA table, the p-value (0.0525) is slightly greater than alpha = 0.05.
## Hence we fail to reject H_0: we cannot conclude at the 5% level that the fluid type means differ.
## 3.23(b) The LSD comparison shows that Fluid 3 has the largest mean of the four fluids, so fluid type 3 appears to have the longest life,
## although the ANOVA does not find the difference significant at this level.
## 3.23 (c) Variance assumption check
mean_fluid <-c(rep(mean(fluid1),6),rep(mean(fluid2),6),rep(mean(fluid3),6),rep(mean(fluid4),6))
res_323<-fluid-mean_fluid
plot(res_323~mean_fluid,main='Plot for Constant Variance',xlab= 'Means of Fluid Type', ylab='Residuals')
## The residuals show a similar spread for all four fluid types, so the constant variance assumption holds.
#3.28
## Reading the Data
ft_1<-c(100,157,194,178)
ft_2<-c(1,2,4,18)
ft_3<-c(880,1256,5276,4355)
ft_4<-c(495,7040,5307,10050)
ft_5<-c(7,5,29,2)
failure_time<-c(ft_1,ft_2,ft_3,ft_4,ft_5)
material_type<-c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
df_328<-data.frame(failure_time,material_type)
str(df_328)
df_328$material_type<-as.factor(df_328$material_type)
boxplot(df_328$failure_time~df_328$material_type, main='Boxplot of Failure Time',
xlab='Material Type',ylab='Failure Time (min)')
## (a) From the data we can see that the failure times of Materials 2 and 5 are drastically
## different from those of the other materials, which suggests that the means differ.
## The box plot shows the same thing: the centers of the five groups are nowhere close to each other and their spreads differ widely.
## So we conclude that the materials do not have the same effect on mean failure time.
mean_ft<-c(rep(mean(ft_1),4),rep(mean(ft_2),4),rep(mean(ft_3),4),
rep(mean(ft_4),4),rep(mean(ft_5),4))
res_ft<-failure_time-mean_ft
plot(res_ft~mean_ft,main='Residuals vs Predicted Response Plot',
xlab='Predicted Response',ylab='Residuals')
qqnorm(res_ft)
qqline(res_ft, col='red')
## (b) From the residuals vs predicted response plot we can clearly see that the variance is not constant.
## The normal probability plot of the residuals does not support the normality assumption.
## (c) Based on the analysis in part (b), we apply a Box-Cox transformation.
boxcox(df_328$failure_time~df_328$material_type,data=df_328)
## From the Box-Cox plot we can see that lambda lies between -0.1 and 0.1. Let us take lambda=0.
## For lambda=0, a log transformation is required.
log_failure_time<-log(df_328$failure_time)
model_anova328_log<-aov(log_failure_time~df_328$material_type,data=df_328)
summary(model_anova328_log)
plot(model_anova328_log)
## From the ANOVA diagnostic plots we can see that, after the log transformation, the residuals are
## approximately normal with constant variance. Hence the ANOVA assumptions of normality and constant variance hold.
## The p-value is 1.21e-7 < 0.05, so we can confidently say that the mean failure time of at least one
## material is different.
# 3.29
## Reading the Data
c_1 <- c(31,10,21,4,1)
c_2 <- c(62,40,24,30,35)
c_3<- c(53,27,120,97,68)
methods<-as.factor(c(rep(1,5),rep(2,5),rep(3,5)))
counts<-c(c_1,c_2,c_3)
df_329<-data.frame(counts,methods)
##Hypotheses
##H_0 : u1=u2=u3
##H_a : at least one mean is different.
model_anova329<-aov(df_329$counts~df_329$methods,data=df_329)
summary(model_anova329)
## (a) From the summary of the ANOVA model the p-value (0.00643) is less than 0.05.
## So we reject H_0: the methods do not all have the same effect on the mean particle count.
mean_329<-c(rep(mean(c_1),5),rep(mean(c_2),5),rep(mean(c_3),5))
res_329<-counts-mean_329
qqnorm(res_329, main='NPP for Residuals')
qqline(res_329,col='darkblue')
plot(res_329~mean_329,main='Residuals vs Predicted Response',
xlab='Methods', ylab='Residuals')
## (b) The residuals vs predicted response plot shows that the variance is not constant;
## additionally, the residuals do not follow a normal distribution.
##(c)
boxcox(model_anova329)
## From the Box-Cox plot we can take lambda to be between 0.2 and 0.8, so we use lambda=0.5 (square root)
lamda=0.5
y_329=df_329$counts^lamda
Boxplot(y_329~df_329$methods,data= df_329, xlab='Methods', ylab='Transformed Counts')
model_anova329_t<-aov(y_329~df_329$methods,data=df_329)
summary(model_anova329_t)
plot(model_anova329_t)
## After the Box-Cox transformation the residuals are approximately normal with constant variance.
## From the ANOVA model the p-value is 0.00295 < 0.05, so we reject H_0.
# 3.51
kruskal.test(df_323$fluid~df_323$type,data=df_323)
## In Problem 3.23 we failed to reject H_0, that is, we found no evidence that the means of the four fluid types differ.
## From the Kruskal-Wallis test the p-value is 0.1015 > 0.05, so we again fail to reject
## the null hypothesis.
# 3.52
## Comparing the Kruskal-Wallis test with the usual analysis of variance (ANOVA), we find:
## Both fail to reject the null hypothesis.
## The Kruskal-Wallis p-value (0.1015) is almost double the ANOVA p-value (0.0525).
# 4.3
## Reading the Data
ts <- c(73,68,74,71,67,73,67,75,72,70,75,68,78,73,68,73,71,75,75,69)
chemical <- c(rep(1,5),rep(2,5),rep(3,5),rep(4,5))
bolt <- c(rep(seq(1:5),4))
str(bolt)
chemical <- as.fixed (chemical)
bolt <- as.fixed (bolt)
## Hypotheses
## H_0 : t_i=0
## H_a : t_i !=0
## Where t_i=u-u_i
## Linear Model
model_43 <- lm(ts~chemical+bolt)
gad(model_43)
## As the ANOVA table from gad() shows, the p-value for chemical is 0.1211, which is greater than alpha (0.05).
## Hence we fail to reject the null hypothesis: there is no evidence of a difference among the chemicals
## in cloth strength, even after accounting for the nuisance variability between bolts.
# 4.16
## Effect of the Chemical, t_i
## Here t_i=u-u_i (grand mean minus treatment mean); the usual textbook estimate has the opposite sign.
grandmean_416 = mean(ts)
mean_t1 = mean(ts[1:5])
mean_t2 = mean(ts[6:10])
mean_t3 = mean(ts[11:15])
mean_t4 = mean(ts[16:20])
t_1 = grandmean_416-mean_t1
t_2 = grandmean_416-mean_t2
t_3 = grandmean_416-mean_t3
t_4 = grandmean_416-mean_t4
## Effect of the Bolts, B_i
B_1 <- grandmean_416-mean(c(73,73,75,73))
B_2 <- grandmean_416-mean(c(68,67,68,71))
B_3 <- grandmean_416-mean(c(74,75,78,75))
B_4 <- grandmean_416-mean(c(71,72,73,75))
B_5 <- grandmean_416-mean(c(67,70,68,69))
## We get the chemical effects t_1=1.15, t_2=0.35, t_3=-0.65, t_4=-0.85
## and the bolt effects B_1=-1.75, B_2=3.25, B_3=-3.75, B_4=-1, B_5=3.25
# 4.22
## Reading The Data
day <- c(rep(1,5), rep(2,5), rep(3,5), rep(4,5), rep(5,5))
batch <- c(rep(seq(1,5),5))
ingred <- c("A","C","B","D","E","B","E","A","C","D","D","A","C",
"E","B", "C","D","E","B","A","E","B","D","A","C")
obs <- c(8,11,4,6,4,7,2,9,8,2,1,7,10,6,3,7,3,1,6,8,3,8,5,10,8)
day <- as.fixed(day)
batch <- as.fixed(batch)
ingred <- as.fixed(ingred)
## Hypotheses for Fixed Effect Model
## H_0 : t_i=0; for all i
## H_a : t_i !=0; for some i.
## Linear model: y_ijk=u+t_i+B_j+Y_k+E_ijk
## where u=grand mean, t_i=fixed effect of treatment (ingredient) i,
## B_j=effect of the first blocking factor (batch) j, Y_k=effect of the second blocking factor (day) k and E_ijk=random error.
model_anova422 <- aov(obs~batch+day+ingred)
anova(model_anova422)
## From the ANOVA of the Latin square model, the p-value for ingredients is 0.0004877 < 0.05,
## hence we reject H_0: at least one ingredient has a different effect on the reaction time.
## Batch (p-value=0.348) and day (p-value=0.455) are both greater than 0.05, so there is no significant
## batch-to-batch or day-to-day difference in this experiment. Although the blocking factors turned out to be
## non-significant, blocking on them was still reasonable because both represent known sources of nuisance variability.