Libraries

library(DoE.base)
## Warning: package 'DoE.base' was built under R version 4.2.2
## Loading required package: grid
## Loading required package: conf.design
## Registered S3 method overwritten by 'DoE.base':
##   method           from       
##   factorize.factor conf.design
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths

Question 6.8

Preparing the data

medium <- rep(c(rep(1,2),rep(2,2)),6)
time <- c(rep(rep(12,4),3),rep(rep(18,4),3))

obs <- c(21,22,25,26,
         23,28,24,25,
         20,26,29,27,
         37,39,31,34,
         38,38,29,33,
         35,36,30,35)

data <- data.frame(medium, time, obs)

Statistical Analysis

The linear statistical model that describes the analysis of these data is:

\[ y_{ijk}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\epsilon_{ijk} \]

That is because we first need to determine whether the two factors interact. Therefore, the first hypothesis to test is:

\[ H_0: (\alpha\beta)_{ij} = 0 \\ H_a: (\alpha\beta)_{ij} \neq 0 \]

For this, let’s perform an ANOVA test considering \(\alpha=0.05\):

model.aov.1 <- aov(obs~medium+time+medium*time, data=data)
summary(model.aov.1)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## medium       1    9.4     9.4   1.835 0.190617    
## time         1  590.0   590.0 115.506 9.29e-10 ***
## medium:time  1   92.0    92.0  18.018 0.000397 ***
## Residuals   20  102.2     5.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value of the interaction is less than \(\alpha\), we reject the null hypothesis and conclude that there is a significant interaction between the two factors.
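
A quick way to visualize this interaction (not part of the original question, just a complementary sketch using base R) is an interaction plot of the cell means:

# Interaction plot of the medium x time cell means
with(data, interaction.plot(x.factor = factor(time), trace.factor = factor(medium),
                            response = obs, xlab = "time", trace.label = "medium",
                            ylab = "mean of obs"))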

Now, let’s plot the residuals to check whether the chosen model is appropriate:

plot(model.aov.1,1)

From the plot above, the spread of the residuals is not perfectly constant across the fitted values, but it is fairly constant.
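
If a numeric check is desired (a sketch, not required by the question), Bartlett’s test can compare the variances across the four medium–time combinations; a large p-value is consistent with the constant-variance assumption:

# Bartlett's test of equal variances across the 2 x 2 treatment combinations
bartlett.test(obs ~ interaction(factor(medium), factor(time)), data = data)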

plot(model.aov.1,2)

And since the residuals appear approximately normal, we can conclude that the model is appropriate for this analysis.
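
As an additional check (a sketch), the Shapiro–Wilk test can be applied to the residuals; failing to reject supports the normality assumption:

# Shapiro-Wilk normality test on the residuals of the fitted model
shapiro.test(residuals(model.aov.1))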

Question 6.12

Item a

Item b

Preparing Data

A <- c("-","-","-","-","+","+","+","+","-","-","-","-","+","+","+","+")
B <- c("-","-","-","-","-","-","-","-","+","+","+","+","+","+","+","+")
obs <- c(14.037, 16.165, 13.972, 13.907,
         13.880, 13.860, 14.032, 13.914,
         14.821, 14.757, 14.843, 14.878,
         14.888, 14.921, 14.415, 14.932)

data <- data.frame(A, B, obs)

Testing the data

To check the significance of the factors, let’s first test the hypothesis for the interaction between the main effects:

\[ H_0: (\alpha\beta)_{ij}=0 \\ H_a: (\alpha\beta)_{ij}\neq0 \]

model.aov.2 <- aov(obs~A+B+A*B,data=data)

summary(model.aov.2)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## A            1  0.403  0.4026   1.262 0.2833  
## B            1  1.374  1.3736   4.305 0.0602 .
## A:B          1  0.317  0.3170   0.994 0.3386  
## Residuals   12  3.828  0.3190                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value of the interaction is higher than \(\alpha=0.05\), we fail to reject the null hypothesis. Therefore, the interaction between the main effects is not significant, and we should test the main effects without the interaction term.

The new hypotheses for the tests are the following:

\[ H_0: \alpha_i = 0 \\ H_a: \alpha_i \neq 0 \]

\[ H_0: \beta_j = 0 \\ H_a: \beta_j \neq 0 \]

model.aov.2 <- aov(obs~A+B,data=data)

summary(model.aov.2)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## A            1  0.403  0.4026   1.263 0.2815  
## B            1  1.374  1.3736   4.308 0.0584 .
## Residuals   13  4.145  0.3189                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the results of the analysis of variance, we can see that both factors’ p-values are higher than \(\alpha=0.05\). Therefore, we fail to reject the null hypotheses and conclude that neither factor is significant.

Item c

?

Item d

First, let’s check the constant variance assumption:

plot(model.aov.2,1)

From the plot above, we can see that the variance can be considered fairly constant, but it is also possible to see that there is an outlier in the data.
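
To locate the apparent outlier (a sketch, not part of the original answer), we can look at the observation with the largest standardized residual:

# Standardized residuals of the additive model; the largest one flags the suspect run
rs <- rstandard(model.aov.2)
data[which.max(abs(rs)), ]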

plot(model.aov.2,2)

From the plot above, we can consider the residuals as fairly normal (even with the outlier).

Item e

The first thing to do with the outlier is to investigate how that specific observation was collected.

If any nonconformity were found, I would replace it with the mean of the data.
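
A minimal sketch of that idea (hypothetical follow-up, assuming the flagged run really was faulty): replace the suspect observation with the mean of the remaining data and refit the model.

# Hypothetical handling of the outlier: impute with the mean of the other runs and refit
i <- which.max(abs(rstandard(model.aov.2)))  # index of the suspect observation
data.fix <- data
data.fix$obs[i] <- mean(data$obs[-i])
model.aov.2.fix <- aov(obs ~ A + B, data = data.fix)
summary(model.aov.2.fix)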

Question 6.21

Item a

Preparing the data

length <- rep(c(rep("-",7),rep("+",7)),8)
type <- rep(c(rep("-",14),rep("+",14)),4)
brk <- rep(c(rep("-",28),rep("+",28)),2)
slope <- c(rep("-",56),rep("+",56))

obs <- c(10.0, 18.0, 14.0, 12.5, 19.0, 16.0, 18.5,
         0.0, 16.5, 4.5, 17.5, 20.5, 17.5, 33.0,
         4.0, 6.0, 1.0, 14.5, 12.0, 14.0, 5.0,
         0.0, 10.0, 34.0, 11.0, 25.5, 21.5, 0.0,
         0.0, 0.0, 18.5, 19.5, 16.0, 15.0, 11.0,
         5.0, 20.5, 18.0, 20.0, 29.5, 19.0, 10.0,
         6.5, 18.5, 7.5, 6.0, 0.0, 10.0, 0.0,
         16.5, 4.5, 0.0, 23.5, 8.0, 8.0, 8.0,
         4.5, 18.0, 14.5, 10.0, 0.0, 17.5, 6.0,
         19.5, 18.0, 16.0, 5.5, 10.0, 7.0, 36.0,
         15.0, 16.0, 8.5, 0.0, 0.5, 9.0, 3.0,
         41.5, 39.0, 6.5, 3.5, 7.0, 8.5, 36.0,
         8.0, 4.5, 6.5, 10.0, 13.0, 41.0, 14.0,
         21.5, 10.5, 6.5, 0.0, 15.5, 24.0, 16.0,
         0.0, 0.0, 0.0, 4.5, 1.0, 4.0, 6.5,
         18.0, 5.0, 7.0, 10.0, 32.5, 18.5, 8.0)

data <- data.frame(length,type,brk,slope,obs)

Testing the data

model.aov.3 <- aov(obs~length*type*brk*slope,data=data)
summary(model.aov.3)
##                       Df Sum Sq Mean Sq F value  Pr(>F)   
## length                 1    917   917.1  10.588 0.00157 **
## type                   1    388   388.1   4.481 0.03686 * 
## brk                    1    145   145.1   1.676 0.19862   
## slope                  1      1     1.4   0.016 0.89928   
## length:type            1    219   218.7   2.525 0.11538   
## length:brk             1     12    11.9   0.137 0.71178   
## type:brk               1    115   115.0   1.328 0.25205   
## length:slope           1     94    93.8   1.083 0.30066   
## type:slope             1     56    56.4   0.651 0.42159   
## brk:slope              1      2     1.6   0.019 0.89127   
## length:type:brk        1      7     7.3   0.084 0.77294   
## length:type:slope      1    113   113.0   1.305 0.25623   
## length:brk:slope       1     39    39.5   0.456 0.50121   
## type:brk:slope         1     34    33.8   0.390 0.53386   
## length:type:brk:slope  1     96    95.6   1.104 0.29599   
## Residuals             96   8316    86.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the statistical test, we can conclude that length and type are significant factors, since their p-values are below \(\alpha=0.05\); none of the other factors or interactions are significant.
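
A natural follow-up (a sketch, not required by the item) is to refit the model keeping only the significant terms, length and type:

# Reduced model with only the significant main effects
model.aov.3.red <- aov(obs ~ length + type, data = data)
summary(model.aov.3.red)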

Item b

plot(model.aov.3,1)

From the plot above, it is possible to see that the model is not the most adequate, because the variance is not constant.
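
One way to address the non-constant variance (a sketch of one possible choice, mirroring the transformation approach used in Question 6.36) is a variance-stabilizing transformation; since this response contains zeros, a square-root transform is usable where a log transform would not be:

# Sketch: refit on the square-root scale and re-check the residuals-vs-fitted plot
model.aov.3.sqrt <- aov(sqrt(obs) ~ length * type * brk * slope, data = data)
plot(model.aov.3.sqrt, 1)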

Question 6.36

Getting the data

A <- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B <- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
C <- c(-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)

obs <- c(1.92,11.28,1.09,5.75,2.13,9.53,1.03,5.35,1.60,11.73,1.16,4.68,2.16,9.11,1.07,5.30)

data <- data.frame(A,B,C,D,obs)

Item a

mod <- lm(obs~A*B*C*D,data=data)

coef(mod)
## (Intercept)           A           B           C           D         A:B 
##    4.680625    3.160625   -1.501875   -0.220625   -0.079375   -1.069375 
##         A:C         B:C         A:D         B:D         C:D       A:B:C 
##   -0.298125    0.229375   -0.056875   -0.046875    0.029375    0.344375 
##       A:B:D       A:C:D       B:C:D     A:B:C:D 
##   -0.096875   -0.010625    0.094375    0.141875
halfnormal(mod)
## 
## Significant effects (alpha=0.05, Lenth method):
## [1] A     B     A:B   A:B:C

From the results above, the most appropriate model for this analysis involves A, B, AB, and ABC, together with the lower-order terms (C, AC, and BC) needed to keep the model hierarchical.

Item b

model.aov.4 <- aov(obs~A+B+C+A*B+A*B*C,data=data)
summary(model.aov.4)
##             Df Sum Sq Mean Sq  F value   Pr(>F)    
## A            1 159.83  159.83 1563.061 1.84e-10 ***
## B            1  36.09   36.09  352.937 6.66e-08 ***
## C            1   0.78    0.78    7.616  0.02468 *  
## A:B          1  18.30   18.30  178.933 9.33e-07 ***
## A:C          1   1.42    1.42   13.907  0.00579 ** 
## B:C          1   0.84    0.84    8.232  0.02085 *  
## A:B:C        1   1.90    1.90   18.556  0.00259 ** 
## Residuals    8   0.82    0.10                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model.aov.4,1)

From the plot above, it is possible to conclude that the model is not appropriate for this analysis, since the variance is not even close to constant.

Items c and d

obs1 <- log(obs)
data1 <- data.frame(A,B,C,D,obs1)

mod1 <- lm(obs1~A*B*C*D,data=data1)

coef(mod1)
##  (Intercept)            A            B            C            D          A:B 
##  1.185417116  0.812870345 -0.314277554 -0.006408558 -0.018077390 -0.024684570 
##          A:C          B:C          A:D          B:D          C:D        A:B:C 
## -0.039723700 -0.004225796 -0.009578245  0.003708723  0.017780432  0.063434408 
##        A:B:D        A:C:D        B:C:D      A:B:C:D 
## -0.029875960 -0.003740235  0.003765760  0.031322043
halfnormal(mod1)
## 
## Significant effects (alpha=0.05, Lenth method):
## [1] A     B     A:B:C

model.aov.4.1 <- aov(obs1~A+B+C+A*B*C,data = data1)
plot(model.aov.4.1,1)

The transformation was very useful for this data; the residual variance is now much closer to constant.

The fitted model is coded in the chunk above and has the terms A, B, C, and ABC.
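
In terms of the coded (\(\pm1\)) factors, writing \(y\) for the response, the fitted model is approximately (using the estimates printed by coef(mod1) above, which are unchanged when non-significant terms are dropped because the design is orthogonal):

\[ \widehat{\ln y} = 1.185 + 0.813A - 0.314B - 0.006C + 0.063ABC \]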

Question 6.39

Getting the data

A <- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B <- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
C <- c(-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
E <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)

y <- c(8.11,5.56,5.77,5.82,9.17,7.8,3.23,5.69,8.82,14.23,9.2,8.94,8.68,11.49,6.25,9.12,7.93,5,7.47,12,9.86,3.65,6.4,11.61,12.43,17.55,8.87,25.83,13.06,18.85,11.78,26.05)

data <- data.frame(A,B,C,D,E,y)

Item a

mod <- lm(y~A*B*C*D*E,data=data)
halfnormal(mod)
## 
## Significant effects (alpha=0.05, Lenth method):
##  [1] D     E     A:D   A     D:E   B:E   A:B   A:B:E A:E   A:D:E

Item b

plot(mod,1)

We cannot conclude much from this plot: the full model is saturated (32 parameters for 32 runs), so there are no residual degrees of freedom and no residual variability to analyze.
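
To make this concrete (a quick sketch): the saturated model leaves residuals that are numerically zero, so the plot carries no information about model adequacy.

# All residuals of the saturated model are essentially zero
range(residuals(mod))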

Item c

From the half-normal plot in item a, we can conclude that factor C is not significant. The significant effects are A, D, E, AD, DE, BE, AB, AE, ABE, and ADE (the main effect B is kept in the model only to preserve hierarchy, since AB and BE are significant).

Therefore we should consider the following linear model:

mod <- lm(y~A+B+D+E+A*D+D*E+B*E+A*B+A*B*E+A*E+A*D*E,data=data)
plot(mod,1)

From the plot above, the model seems appropriate, since the variance is fairly constant.
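
For completeness (a sketch mirroring the checks in the earlier questions), the normality assumption can also be inspected for this reduced model:

# Normal Q-Q plot of the reduced model's residuals
plot(mod, 2)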

Item d

??