ANOVA - Blocking

What assumptions must we test before including a variable as a blocking factor? Normality, independence of observations, equal variance, and additivity (no interaction between the block and the treatment). The block should explain some of the variance that would otherwise sit in the error Sum of Squares (SS), so that we first explain more variance and second leave less variance unexplained.
Recognize the IV, DV, and block, and create a table for the following research statement. “A company is planning to investigate the motor skills of the elderly population. The company separates the target population into three age categories: 60 – 69, 70 – 79, and above 80, then randomly assigns the participants in the study to one of the three task conditions. After individuals have completed the task, their performance will be compared.”

IV - task condition 1, 2, 3 (predictor)
DV - performance score (outcome)
Block - age categories (1: 60-69, 2: 70-79, 3: 80 and above)
Table -
Block    Task Condition 1    Task Condition 2    Task Condition 3
Age 1
Age 2
Age 3
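The same layout can be sketched in R as a cross-tabulation of block by task condition. This is only an illustration: the `toy` data frame is hypothetical (one made-up participant per cell), and only the column names `Age` and `Condition` come from the lab file.

```r
# Hypothetical toy data: one participant per age-by-condition cell.
toy <- expand.grid(Age = 1:3, Condition = 1:3)

# Label the block and the treatment so the table is readable.
toy$Age <- factor(toy$Age, levels = 1:3,
                  labels = c("60-69", "70-79", "80 and above"))
toy$Condition <- factor(toy$Condition, levels = 1:3,
                        labels = paste("Task Condition", 1:3))

# Rows are blocks, columns are task conditions, cells are participant counts.
design_table <- table(Block = toy$Age, Task = toy$Condition)
print(design_table)
```

With the real data, the cell counts would show how many participants of each age group were assigned to each task condition.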
Use the data “Lab 3” with the research question above to produce a full report. Age is coded “1”: 60-69, “2”: 70-79, and “3”: above 80.
The statement of the research/study purpose. H0: Motor skills (performance scores) of the elderly population are equal across task conditions (Condition1 = Condition2 = Condition3). H1: At least one pair of conditions differs in performance score.
The type of analysis conducted, i.e. D’Agostino test, scatterplot of residuals, Bartlett test, etc.
Descriptive statistics: basic information of the data, i.e. age and gender of the participants.
The ANOVA test
Post-hoc analysis
Effect size
Conclusions

setwd("E:\\mikhilesh\\HU Sem VI ANLY 510 and 506\\ANLY 510 Kao Principals and Applications\\assignment and project")
library(readxl)

## Warning: package 'readxl' was built under R version 3.6.3

data3 <- read_excel("Lecture 3 Lab3.xlsx")
names(data3)

## [1] "Age"               "Performance_score" "Condition"

str(data3)

## Classes 'tbl_df', 'tbl' and 'data.frame':    89 obs. of  3 variables:
##  $ Age              : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Performance_score: num  36 39 35 36 37 35 36 37 35 35 ...
##  $ Condition        : num  1 1 1 1 1 1 1 1 1 2 ...

summary(data3)

##       Age    Performance_score   Condition    
##  Min.   :1   Min.   :15.00     Min.   :1.000  
##  1st Qu.:1   1st Qu.:24.00     1st Qu.:1.000  
##  Median :2   Median :27.00     Median :2.000  
##  Mean   :2   Mean   :27.52     Mean   :2.034  
##  3rd Qu.:3   3rd Qu.:32.00     3rd Qu.:3.000  
##  Max.   :3   Max.   :39.00     Max.   :3.000

Assumptions

library(moments)
plot(density(data3$Performance_score), main = "Density Plot")

qqnorm(data3$Performance_score)

agostino.test(data3$Performance_score)

## 
##  D'Agostino skewness test
## 
## data:  data3$Performance_score
## skew = -0.11171, z = -0.45976, p-value = 0.6457
## alternative hypothesis: data have a skewness

shapiro.test(data3$Performance_score)

## 
##  Shapiro-Wilk normality test
## 
## data:  data3$Performance_score
## W = 0.9755, p-value = 0.09018

anscombe.test(data3$Performance_score)

## 
##  Anscombe-Glynn kurtosis test
## 
## data:  data3$Performance_score
## kurt = 2.2365, z = -2.0554, p-value = 0.03984
## alternative hypothesis: kurtosis is not equal to 3

#Residual plot (lm - linear model)
performance.lm <- lm(Performance_score ~ Condition, data = data3)
performance.lm

## 
## Call:
## lm(formula = Performance_score ~ Condition, data = data3)
## 
## Coefficients:
## (Intercept)    Condition  
##      36.686       -4.509
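The single `Condition` slope in the output above shows that `lm()` treated the numeric codes 1, 2, 3 as a continuous predictor. The following is a minimal sketch, on made-up data, of how wrapping the codes in `factor()` changes the fit to one mean per group. The `toy` data frame and its values are hypothetical and are not taken from the lab file.

```r
# Hypothetical toy data: 3 conditions coded 1, 2, 3, four scores each.
toy <- data.frame(
  score = c(36, 35, 37, 36, 30, 31, 29, 30, 28, 27, 29, 28),
  cond  = rep(1:3, each = 4)
)

fit_numeric <- lm(score ~ cond, data = toy)          # one slope (1 df)
fit_factor  <- lm(score ~ factor(cond), data = toy)  # two contrasts (2 df)

# Number of estimated coefficients, intercept included.
length(coef(fit_numeric))
length(coef(fit_factor))

# With a factor, the fitted values are exactly the group means.
group_means     <- tapply(toy$score, toy$cond, mean)
fitted_by_group <- tapply(fitted(fit_factor), toy$cond, mean)
```

Residuals from the factor version are deviations from each group's own mean, which is what the ANOVA assumptions refer to.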

performance.res <- resid(performance.lm) #OR performance.res <- residuals(performance.lm)
performance.res

##          1          2          3          4          5          6          7 
##  3.8225868  6.8225868  2.8225868  3.8225868  4.8225868  2.8225868  3.8225868 
##          8          9         10         11         12         13         14 
##  4.8225868  2.8225868  7.3311713  6.3311713  3.3311713  4.3311713  5.3311713 
##         15         16         17         18         19         20         21 
##  5.3311713  6.3311713  7.3311713  6.3311713  4.3311713  6.8397558  6.8397558 
##         22         23         24         25         26         27         28 
##  4.8397558  4.8397558  5.8397558  5.8397558  2.8397558  3.8397558  3.8397558 
##         29         30         31         32         33         34         35 
##  4.8397558  1.8225868  0.8225868  2.8225868 -0.1774132  4.8225868  0.8225868 
##         36         37         38         39         40         41         42 
##  1.8225868 -0.1774132 -1.1774132 -0.1774132  2.3311713  0.3311713  0.3311713 
##         43         44         45         46         47         48         49 
## -0.6688287 -0.6688287 -1.6688287 -1.6688287  0.3311713  1.3311713 -1.6688287 
##         50         51         52         53         54         55         56 
##  1.8397558 -0.1602442 -0.1602442  0.8397558  1.8397558  0.8397558 -0.1602442 
##         57         58         59         60         61         62         63 
## -2.1602442 -1.1602442  0.8397558  0.8397558 -5.1774132 -4.1774132 -5.1774132 
##         64         65         66         67         68         69         70 
## -6.1774132 -7.1774132 -6.1774132 -7.1774132 -5.1774132 -4.1774132 -4.6688287 
##         71         72         73         74         75         76         77 
## -5.6688287 -3.6688287 -5.6688287 -3.6688287 -2.6688287 -3.6688287 -4.6688287 
##         78         79         80         81         82         83         84 
## -6.6688287 -7.6688287 -4.1602442 -6.1602442 -6.1602442 -5.1602442 -4.1602442 
##         85         86         87         88         89 
## -7.1602442 -6.1602442 -5.1602442 -4.1602442 -8.1602442

plot(data3$Performance_score, performance.res, ylab = "Residual", xlab = "Performance score", main = "Independence of Observation")
abline(0, 0)
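Two other common views of the residuals are against the fitted values and against the order of collection. A rough sketch of both follows, assuming a model with the condition entered as a factor; the `toy` data are simulated and hypothetical, not the lab data.

```r
# Hypothetical toy data, same layout as the lab file but with simulated values.
set.seed(1)
toy <- data.frame(cond = factor(rep(1:3, each = 10)))
toy$score <- c(32, 28, 23)[toy$cond] + rnorm(30, sd = 2)

fit <- lm(score ~ cond, data = toy)
res <- resid(fit)

# Residuals against fitted values: look for funnels or curves.
plot(fitted(fit), res, xlab = "Fitted value", ylab = "Residual")
abline(h = 0)

# Residuals in collection order: long runs of same-signed residuals suggest dependence.
plot(seq_along(res), res, xlab = "Observation order", ylab = "Residual")
abline(h = 0)
```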

bartlett.test(data3$Performance_score, data3$Condition)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  data3$Performance_score and data3$Condition
## Bartlett's K-squared = 0.14381, df = 2, p-value = 0.9306

# Variances are equal: with p-value = 0.9306 we fail to reject the null hypothesis of homogeneous variances

#Checking variance among groups
tapply(data3$Performance_score, data3$Condition, var)

##        1        2        3 
## 18.36508 20.94713 20.72903

#Ratio of largest to smallest variance: 20.95/18.37 = 1.14, well below 3, which agrees with the Bartlett test's failure to reject the null hypothesis of equal variances
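The ratio can be checked directly from the group variances printed by `tapply()` above. The numbers below are copied from that output; the variable names are only illustrative.

```r
# Group variances copied from the tapply() output above.
group_var <- c(`1` = 18.36508, `2` = 20.94713, `3` = 20.72903)

# Rule of thumb used here: largest / smallest variance should stay below about 3.
var_ratio <- max(group_var) / min(group_var)
round(var_ratio, 2)
```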

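#After checking the assumptions (the skewness, Shapiro-Wilk, and Bartlett tests are non-significant; only the Anscombe-Glynn kurtosis test is significant, p = 0.040), we now perform ANOVA with a blocking design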

#IV - 3 conditions (predictor) categorical
#DV - performance score (outcome) continuous

#Testing additivity: check that there is no interaction between the predictor/IV (Condition) and the block (Age) by fitting the interaction term factor(Condition)*factor(Age)
#Fit the model with the interaction:
model1 <- aov(Performance_score ~ factor(Condition)*factor(Age), data = data3) 
model1

## Call:
##    aov(formula = Performance_score ~ factor(Condition) * factor(Age), 
##     data = data3)
## 
## Terms:
##                 factor(Condition) factor(Age) factor(Condition):factor(Age)
## Sum of Squares          1199.0299   1549.6499                       22.6398
## Deg. of Freedom                 2           2                             4
##                 Residuals
## Sum of Squares   152.9051
## Deg. of Freedom        80
## 
## Residual standard error: 1.382502
## Estimated effects may be unbalanced

summary(model1) #OR summary(aov(Performance_score ~ factor(Condition)*factor(Age), data = data3)) #(Don't forget to factor()ize your predictors)

##                               Df Sum Sq Mean Sq F value Pr(>F)    
## factor(Condition)              2 1199.0   599.5 313.667 <2e-16 ***
## factor(Age)                    2 1549.6   774.8 405.389 <2e-16 ***
## factor(Condition):factor(Age)  4   22.6     5.7   2.961 0.0246 *  
## Residuals                     80  152.9     1.9                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Interaction effect - factor(Condition):factor(Age) is significant (p-value = 0.0246 < 0.05), so the additivity assumption is violated
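The interaction row of the table above can be reproduced by hand from the sums of squares printed for model1. The sketch below copies those values from the output; the variable names are only illustrative.

```r
# Values copied from the model1 output above.
ss_interaction <- 22.6398
df_interaction <- 4
ss_residual    <- 152.9051
df_residual    <- 80

# Mean square = sum of squares / degrees of freedom.
ms_interaction <- ss_interaction / df_interaction
ms_residual    <- ss_residual / df_residual
f_interaction  <- ms_interaction / ms_residual

# Upper-tail probability of the F distribution gives the p-value.
p_interaction <- pf(f_interaction, df_interaction, df_residual, lower.tail = FALSE)
c(F = f_interaction, p = p_interaction)
```

The resulting p-value falls below 0.05, which is the basis for the decision about using Age as a block.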

model2 <- aov(Performance_score ~ factor(Condition), data = data3)
model2

## Call:
##    aov(formula = Performance_score ~ factor(Condition), data = data3)
## 
## Terms:
##                 factor(Condition) Residuals
## Sum of Squares           1199.030  1725.195
## Deg. of Freedom                 2        86
## 
## Residual standard error: 4.478884
## Estimated effects may be unbalanced

summary(model2)

##                   Df Sum Sq Mean Sq F value  Pr(>F)    
## factor(Condition)  2   1199   599.5   29.89 1.4e-10 ***
## Residuals         86   1725    20.1                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Effect of Condition is significant

#Other models for comparison
model3 <- aov(Performance_score ~ factor(Age), data = data3)
model3

## Call:
##    aov(formula = Performance_score ~ factor(Age), data = data3)
## 
## Terms:
##                 factor(Age) Residuals
## Sum of Squares     1549.733  1374.492
## Deg. of Freedom           2        86
## 
## Residual standard error: 3.997807
## Estimated effects may be unbalanced

summary(model3)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(Age)  2   1550   774.9   48.48 7.97e-15 ***
## Residuals   86   1374    16.0                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model4 <- aov(Performance_score ~ factor(Condition) + factor(Age), data = data3)
model4

## Call:
##    aov(formula = Performance_score ~ factor(Condition) + factor(Age), 
##     data = data3)
## 
## Terms:
##                 factor(Condition) factor(Age) Residuals
## Sum of Squares          1199.0299   1549.6499  175.5449
## Deg. of Freedom                 2           2        84
## 
## Residual standard error: 1.445621
## Estimated effects may be unbalanced

summary(model4)

##                   Df Sum Sq Mean Sq F value Pr(>F)    
## factor(Condition)  2 1199.0   599.5   286.9 <2e-16 ***
## factor(Age)        2 1549.6   774.8   370.8 <2e-16 ***
## Residuals         84  175.5     2.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Age cannot be considered a block because the Condition-by-Age interaction in model1 is significant.
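Setting the interaction aside for a moment, the model2 and model4 tables show what a blocking variable is meant to do to the error term. The sketch below reuses the sums of squares printed above; the variable names are only illustrative.

```r
# Sums of squares copied from the model2 and model4 output above.
ss_condition      <- 1199.0299
ss_error_no_block <- 1725.195   # model2: Condition only
ss_age            <- 1549.6499  # model4: variance accounted for by Age
ss_error_with_age <- 175.5449   # model4: Condition + Age

# Age takes its share out of the error term.
ss_error_no_block - ss_age

# The same Condition mean square is tested against a much smaller error.
ms_condition <- ss_condition / 2
f_no_block   <- ms_condition / (ss_error_no_block / 86)
f_with_age   <- ms_condition / (ss_error_with_age / 84)
round(c(f_no_block, f_with_age), 2)
```

The error sum of squares drops from about 1725 to about 176 once Age is in the model, and the F value for Condition rises from about 29.9 to about 286.9.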

#Analysis of Variance Table
anova(model1, model2)

## Analysis of Variance Table
## 
## Model 1: Performance_score ~ factor(Condition) * factor(Age)
## Model 2: Performance_score ~ factor(Condition)
##   Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
## 1     80  152.91                                 
## 2     86 1725.19 -6   -1572.3 137.1 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
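The F value in this comparison follows from the two residual sums of squares. The sketch below reproduces it by hand from the values in the table; the variable names are only illustrative. Note that this comparison tests Age and the interaction together (6 df); a comparison such as anova(model4, model1) would presumably isolate the interaction term.

```r
# Residual sums of squares and degrees of freedom from the table above.
rss_full    <- 152.91   # Condition * Age
df_full     <- 80
rss_reduced <- 1725.19  # Condition only
df_reduced  <- 86

# F = (drop in RSS per extra parameter) / (residual mean square of the full model)
df_extra <- df_reduced - df_full
f_nested <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_full)
p_nested <- pf(f_nested, df_extra, df_full, lower.tail = FALSE)
round(f_nested, 1)
```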

#To find out what group differs from the others
library(pgirmess)

## Warning: package 'pgirmess' was built under R version 3.6.3

#Pairwise comparisons using t tests - Bonferroni Method
pairwise.t.test(data3$Performance_score, data3$Condition, paired = FALSE, p.adjust.method = "bonferroni") #method can be "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or "BY"

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  data3$Performance_score and data3$Condition 
## 
##   1      2     
## 2 0.0017 -     
## 3 6e-11  0.0002
## 
## P value adjustment method: bonferroni

#All three pairwise differences between conditions 1, 2, and 3 are significant.
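The Bonferroni adjustment itself is simple: each raw p-value is multiplied by the number of comparisons and capped at 1. A small sketch with hypothetical raw p-values (made up for illustration, not taken from the lab data):

```r
# Hypothetical raw p-values for three pairwise comparisons.
raw_p <- c(0.01, 0.02, 0.50)

# Bonferroni by hand: multiply by the number of comparisons and cap at 1.
manual <- pmin(raw_p * length(raw_p), 1)

# The same adjustment through base R.
adjusted <- p.adjust(raw_p, method = "bonferroni")
adjusted
```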

#Kruskal-Wallis multiple comparisons:
kruskalmc(Performance_score ~ factor(Condition), data = data3)

## Multiple comparison test after Kruskal-Wallis 
## p.value: 0.05 
## Comparisons
##      obs.dif critical.dif difference
## 1-2 19.60357     16.25251       TRUE
## 1-3 39.31970     16.12547       TRUE
## 2-3 19.71613     15.84052       TRUE
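`kruskalmc()` is a follow-up procedure; the omnibus Kruskal-Wallis test that it follows is available in base R as `kruskal.test()`. A minimal sketch on hypothetical toy data (three made-up groups, not the lab data):

```r
# Hypothetical toy data: three well-separated groups of five scores.
toy <- data.frame(
  score = c(1:5, 11:15, 21:25),
  group = factor(rep(c("a", "b", "c"), each = 5))
)

# Omnibus rank-based test: do the groups share one distribution?
kw <- kruskal.test(score ~ group, data = toy)
kw$statistic
kw$p.value
```

On the lab data the equivalent call would use `Performance_score ~ factor(Condition)` with `data = data3`.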

#Tukey’s Test:
TukeyHSD(model1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Performance_score ~ factor(Condition) * factor(Age), data = data3)
## 
## $`factor(Condition)`
##          diff       lwr       upr p adj
## 2-1 -4.204762 -5.072310 -3.337214     0
## 3-1 -9.006912 -9.867679 -8.146146     0
## 3-2 -4.802151 -5.647707 -3.956594     0
## 
## $`factor(Age)`
##           diff        lwr       upr p adj
## 2-1  -4.516166  -5.369099 -3.663233     0
## 3-1 -10.310345 -11.177377 -9.443313     0
## 3-2  -5.794179  -6.647112 -4.941246     0
## 
## $`factor(Condition):factor(Age)`
##                  diff        lwr         upr     p adj
## 2:1-1:1 -2.922222e+00  -4.947474  -0.8969700 0.0005097
## 3:1-1:1 -8.022222e+00 -10.047474  -5.9969700 0.0000000
## 1:2-1:1 -2.922222e+00  -4.947474  -0.8969700 0.0005097
## 2:2-1:1 -8.722222e+00 -10.747474  -6.6969700 0.0000000
## 3:2-1:1 -1.276768e+01 -14.748843 -10.7865103 0.0000000
## 1:3-1:1 -9.666667e+00 -11.744532  -7.5888017 0.0000000
## 2:3-1:1 -1.342222e+01 -15.447474 -11.3969700 0.0000000
## 3:3-1:1 -1.872222e+01 -20.747474 -16.6969700 0.0000000
## 3:1-2:1 -5.100000e+00  -7.071236  -3.1287642 0.0000000
## 1:2-2:1  1.421085e-14  -1.971236   1.9712358 1.0000000
## 2:2-2:1 -5.800000e+00  -7.771236  -3.8287642 0.0000000
## 3:2-2:1 -9.845455e+00 -11.771369  -7.9195406 0.0000000
## 1:3-2:1 -6.744444e+00  -8.769697  -4.7191922 0.0000000
## 2:3-2:1 -1.050000e+01 -12.471236  -8.5287642 0.0000000
## 3:3-2:1 -1.580000e+01 -17.771236 -13.8287642 0.0000000
## 1:2-3:1  5.100000e+00   3.128764   7.0712358 0.0000000
## 2:2-3:1 -7.000000e-01  -2.671236   1.2712358 0.9674422
## 3:2-3:1 -4.745455e+00  -6.671369  -2.8195406 0.0000000
## 1:3-3:1 -1.644444e+00  -3.669697   0.3808078 0.2078478
## 2:3-3:1 -5.400000e+00  -7.371236  -3.4287642 0.0000000
## 3:3-3:1 -1.070000e+01 -12.671236  -8.7287642 0.0000000
## 2:2-1:2 -5.800000e+00  -7.771236  -3.8287642 0.0000000
## 3:2-1:2 -9.845455e+00 -11.771369  -7.9195406 0.0000000
## 1:3-1:2 -6.744444e+00  -8.769697  -4.7191922 0.0000000
## 2:3-1:2 -1.050000e+01 -12.471236  -8.5287642 0.0000000
## 3:3-1:2 -1.580000e+01 -17.771236 -13.8287642 0.0000000
## 3:2-2:2 -4.045455e+00  -5.971369  -2.1195406 0.0000001
## 1:3-2:2 -9.444444e-01  -2.969697   1.0808078 0.8583631
## 2:3-2:2 -4.700000e+00  -6.671236  -2.7287642 0.0000000
## 3:3-2:2 -1.000000e+01 -11.971236  -8.0287642 0.0000000
## 1:3-3:2  3.101010e+00   1.119844   5.0821766 0.0001161
## 2:3-3:2 -6.545455e-01  -2.580459   1.2713685 0.9750171
## 3:3-3:2 -5.954545e+00  -7.880459  -4.0286315 0.0000000
## 2:3-1:3 -3.755556e+00  -5.780808  -1.7303033 0.0000028
## 3:3-1:3 -9.055556e+00 -11.080808  -7.0303033 0.0000000
## 3:3-2:3 -5.300000e+00  -7.271236  -3.3287642 0.0000000

library(pastecs)

## Warning: package 'pastecs' was built under R version 3.6.3

library(compute.es)

## Warning: package 'compute.es' was built under R version 3.6.3

#First get the relevant stats for each factor level (only relevant output pasted below):
by(data3$Performance_score, data3$Condition, stat.desc)

## data3$Condition: 1
##      nbr.val     nbr.null       nbr.na          min          max        range 
##   28.0000000    0.0000000    0.0000000   25.0000000   39.0000000   14.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##  898.0000000   33.0000000   32.0714286    0.8098739    1.6617239   18.3650794 
##      std.dev     coef.var 
##    4.2854497    0.1336220 
## ------------------------------------------------------------ 
## data3$Condition: 2
##      nbr.val     nbr.null       nbr.na          min          max        range 
##   30.0000000    0.0000000    0.0000000   20.0000000   35.0000000   15.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##  836.0000000   27.5000000   27.8666667    0.8356061    1.7090064   20.9471264 
##      std.dev     coef.var 
##    4.5768031    0.1642393 
## ------------------------------------------------------------ 
## data3$Condition: 3
##      nbr.val     nbr.null       nbr.na          min          max        range 
##   31.0000000    0.0000000    0.0000000   15.0000000   30.0000000   15.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##  715.0000000   24.0000000   23.0645161    0.8177276    1.6700226   20.7290323 
##      std.dev     coef.var 
##    4.5529147    0.1973991

#n - sample size, M - mean, SD - standard deviation 
#Condition 1 (n = 28, M = 32.07 , SD = 4.3) 
#Condition 2 (n = 30, M = 27.87, SD = 4.6)
#Condition 3 (n = 31, M = 23.06, SD = 4.6)

#Use these values to compute a few standard Measures of Effect Size (MES or mes) for any pair of interest 

#Option 1 compares Condition 1 and 2 ~ Performance_score vs Condition
mes(27.87, 32.07, 4.6, 4.3, 30, 28)

## Mean Differences ES: 
##  
##  d [ 95 %CI] = -0.94 [ -1.48 , -0.4 ] 
##   var(d) = 0.08 
##   p-value(d) = 0 
##   U3(d) = 17.31 % 
##   CLES(d) = 25.26 % 
##   Cliff's Delta = -0.49 
##  
##  g [ 95 %CI] = -0.93 [ -1.46 , -0.39 ] 
##   var(g) = 0.07 
##   p-value(g) = 0 
##   U3(g) = 17.63 % 
##   CLES(g) = 25.55 % 
##  
##  Correlation ES: 
##  
##  r [ 95 %CI] = -0.43 [ -0.62 , -0.2 ] 
##   var(r) = 0.01 
##   p-value(r) = 0 
##  
##  z [ 95 %CI] = -0.46 [ -0.73 , -0.2 ] 
##   var(z) = 0.02 
##   p-value(z) = 0 
##  
##  Odds Ratio ES: 
##  
##  OR [ 95 %CI] = 0.18 [ 0.07 , 0.48 ] 
##   p-value(OR) = 0 
##  
##  Log OR [ 95 %CI] = -1.71 [ -2.69 , -0.72 ] 
##   var(lOR) = 0.25 
##   p-value(Log OR) = 0 
##  
##  Other: 
##  
##  NNT = -6.14 
##  Total N = 58

#Option 2 compares Condition 2 and 3 ~ Performance_score vs Condition
mes(23.06, 27.87, 4.6, 4.6, 31, 30)

## Mean Differences ES: 
##  
##  d [ 95 %CI] = -1.05 [ -1.58 , -0.51 ] 
##   var(d) = 0.07 
##   p-value(d) = 0 
##   U3(d) = 14.79 % 
##   CLES(d) = 22.98 % 
##   Cliff's Delta = -0.54 
##  
##  g [ 95 %CI] = -1.03 [ -1.56 , -0.5 ] 
##   var(g) = 0.07 
##   p-value(g) = 0 
##   U3(g) = 15.1 % 
##   CLES(g) = 23.27 % 
##  
##  Correlation ES: 
##  
##  r [ 95 %CI] = -0.47 [ -0.64 , -0.25 ] 
##   var(r) = 0.01 
##   p-value(r) = 0 
##  
##  z [ 95 %CI] = -0.51 [ -0.77 , -0.25 ] 
##   var(z) = 0.02 
##   p-value(z) = 0 
##  
##  Odds Ratio ES: 
##  
##  OR [ 95 %CI] = 0.15 [ 0.06 , 0.4 ] 
##   p-value(OR) = 0 
##  
##  Log OR [ 95 %CI] = -1.9 [ -2.87 , -0.93 ] 
##   var(lOR) = 0.25 
##   p-value(Log OR) = 0 
##  
##  Other: 
##  
##  NNT = -5.87 
##  Total N = 61

#Option 3 compares Condition 1 and 3 ~ Performance_score vs Condition
mes(23.06, 32.07, 4.6, 4.3, 31, 28)

## Mean Differences ES: 
##  
##  d [ 95 %CI] = -2.02 [ -2.65 , -1.39 ] 
##   var(d) = 0.1 
##   p-value(d) = 0 
##   U3(d) = 2.17 % 
##   CLES(d) = 7.66 % 
##   Cliff's Delta = -0.85 
##  
##  g [ 95 %CI] = -1.99 [ -2.61 , -1.37 ] 
##   var(g) = 0.1 
##   p-value(g) = 0 
##   U3(g) = 2.31 % 
##   CLES(g) = 7.93 % 
##  
##  Correlation ES: 
##  
##  r [ 95 %CI] = -0.72 [ -0.82 , -0.56 ] 
##   var(r) = 0 
##   p-value(r) = 0 
##  
##  z [ 95 %CI] = -0.9 [ -1.16 , -0.64 ] 
##   var(z) = 0.02 
##   p-value(z) = 0 
##  
##  Odds Ratio ES: 
##  
##  OR [ 95 %CI] = 0.03 [ 0.01 , 0.08 ] 
##   p-value(OR) = 0 
##  
##  Log OR [ 95 %CI] = -3.66 [ -4.8 , -2.53 ] 
##   var(lOR) = 0.34 
##   p-value(Log OR) = 0 
##  
##  Other: 
##  
##  NNT = -5.05 
##  Total N = 59
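The d values reported by `mes()` can be cross-checked by hand. The sketch below assumes the standard pooled-standard-deviation form of Cohen's d; `cohens_d` is a hypothetical helper written for this check, not a function from compute.es.

```r
# Pooled-SD Cohen's d from summary statistics (group a minus group b).
cohens_d <- function(m_a, m_b, sd_a, sd_b, n_a, n_b) {
  pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))
  (m_a - m_b) / pooled_sd
}

# Same argument order and rounded inputs as the mes() calls above.
d_2_vs_1 <- cohens_d(27.87, 32.07, 4.6, 4.3, 30, 28)
d_3_vs_2 <- cohens_d(23.06, 27.87, 4.6, 4.6, 31, 30)
d_3_vs_1 <- cohens_d(23.06, 32.07, 4.6, 4.3, 31, 28)
round(c(d_2_vs_1, d_3_vs_2, d_3_vs_1), 2)
```

The negative signs only reflect the order of subtraction: the lower-scoring condition is entered first in each call.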

#Effect size benchmarks (Cohen, 1988): for Cohen's d, 0.2 is small, 0.5 is medium, and 0.8 is large; the values 0.1, 0.25, and 0.4 are the corresponding benchmarks for Cohen's f
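An effect size for the one-way model as a whole can also be computed from the sums of squares in the model2 table. The sketch below gives eta squared and Cohen's f, the scale on which 0.1, 0.25, and 0.4 are conventionally read as small, medium, and large. The variable names are only illustrative.

```r
# Sums of squares copied from the model2 output.
ss_between <- 1199.030
ss_within  <- 1725.195

# Eta squared: share of total variance explained by Condition.
eta_sq <- ss_between / (ss_between + ss_within)

# Cohen's f is a transformation of eta squared.
cohens_f <- sqrt(eta_sq / (1 - eta_sq))
round(c(eta_sq = eta_sq, cohens_f = cohens_f), 2)
```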

#Summary of ANOVA:
Observations from the study were analyzed by conducting a one-way analysis of variance using R version 3.6.1. The skewness, Shapiro-Wilk, and Bartlett tests gave no evidence against normality or equal variances, although the Anscombe-Glynn test flagged kurtosis (p = .040); no adjustment was made. Results suggest that task condition (predictor) has a significant effect on performance score (outcome), F(2, 86) = 29.89, p < .001. [Age group also has a significant effect on the outcome, F(2, 86) = 48.48, p < .001, but we cannot treat it as a block because its interaction with task condition is significant, F(4, 80) = 2.96, p = .025, which violates the additivity assumption.] To determine which task conditions differed significantly in performance score, Tukey’s post hoc test was conducted. The results suggest a significant difference between task conditions 1 and 2, conditions 2 and 3, and conditions 1 and 3 (all adjusted p < .001). The effects were large: Cohen’s d = 0.94, 1.05, and 2.02 in absolute value.

Mikhilesh Dehane

5/17/2020