1. What assumption must we test to include a variable as a blocking factor?

We must test the additivity assumption. That is, we need to make sure that there is no interaction between the blocking variable and our predictor(s) of interest.
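One common way to check additivity is to compare the additive model with a model that also contains the block-by-treatment interaction. The following is a minimal sketch on simulated data; the data frame `sim` and its columns `block`, `treatment`, and `y` are hypothetical and are not the Lab 3 data.

```r
# Hypothetical simulated data: 3 blocks x 3 treatments, 10 observations per cell.
set.seed(1)
sim <- expand.grid(block = factor(1:3), treatment = factor(1:3), rep = 1:10)
# Purely additive effects plus noise (no true interaction is built in).
sim$y <- 20 - 2 * as.integer(sim$block) - 3 * as.integer(sim$treatment) +
  rnorm(nrow(sim))

additive_fit <- lm(y ~ block + treatment, data = sim)
full_fit     <- lm(y ~ block * treatment, data = sim)

# Extra-sum-of-squares F test: a small p-value would suggest that the
# block x treatment interaction is not negligible.
additivity_test <- anova(additive_fit, full_fit)
print(additivity_test)
```

The second row of the printed table carries the F test for the four interaction degrees of freedom.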

  1. Recognize the IV, DV, block and create a table for the following research statement.

    “A company is planning to investigate the motor skills of the elderly population. The company separates the target population into three age categories: 60 – 69, 70 – 79, and above 80, then randomly assigns the participants in the study to one of the three task conditions. After individuals have completed the task, their performance will be compared.”

IV : Task condition : 1, 2, 3 (randomly assigned)
DV : Performance
Block : Age categories : 60 – 69, 70 – 79, and above 80
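The design in the research statement (group participants by age category, then randomize the task condition within each age group) can be sketched as follows. The roster of 90 participants with 30 per age group is an assumption for illustration only.

```r
set.seed(42)
# Hypothetical roster: 30 participants per age group (not the Lab 3 data).
roster <- data.frame(
  id  = 1:90,
  age = factor(rep(c("60-69", "70-79", "80+"), each = 30))
)
# Within each age group, shuffle a balanced set of condition labels.
roster$condition <- ave(
  seq_len(nrow(roster)), roster$age,
  FUN = function(idx) sample(rep(1:3, length.out = length(idx)))
)
# Cross-tabulate to confirm every age group gets each condition equally often.
print(table(roster$age, roster$condition))
```

Randomizing inside each age group keeps the conditions balanced across ages, so age differences cannot be mistaken for condition differences.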

  1. Use the data “Lab 3” with the research question to perform a fine report.

*age “1”:60-69, “2”: 70-79 and “3”: above 80.

Observations from the study were analyzed with a two-factor analysis of variance (Age and Condition, additive model) using R version 4.0.2. The normality and homogeneity-of-variance checks raised no serious concerns, and no adjustment was made. The Condition × Age interaction was small but significant (F(4, 80) = 2.96, p = .025), so the additive model should be read with some caution. Age had a significant effect on performance (F(2, 84) = 370.8, p < .001), and Condition also had a significant effect on performance score (F(2, 84) = 286.9, p < .001). Holm-adjusted pairwise t tests showed significant differences between all three pairs of age groups (all p < .001). To identify specifically which groups differed, Tukey's HSD post hoc test was run. It indicated significant differences between every pair of age groups and every pair of conditions, with no 95% confidence interval including zero. The effects were large: Cohen's d = 1.13 for the difference in score between 60 – 69 and 70 – 79, and Cohen's d = 2.72 for the difference between 60 – 69 and above 80.

library(readxl)
library(dplyr)
library("moments")
library("pastecs")
df = read_excel('/Users/jingx/Downloads/HU/510/Lab3.xlsx')
df = df %>% mutate(Age = Age %>% as.factor(), Condition = Condition %>% as.factor())
library("compute.es")
# First look at the overall distribution:
plot(density(df$Performance_score))

qqnorm(df$Performance_score)

# D'Agostino skewness test:
agostino.test(df$Performance_score)
## 
##  D'Agostino skewness test
## 
## data:  df$Performance_score
## skew = -0.11171, z = -0.45976, p-value = 0.6457
## alternative hypothesis: data have a skewness

The D'Agostino skewness test gives p = 0.646 > 0.05, suggesting that the skewness of the performance distribution is not significantly different from that of a normal distribution.
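For intuition about the `skew` value reported by the test, the usual moment-based sample skewness can be computed by hand. This is only a sketch of the statistic, not of the full D'Agostino test, and the two score vectors are made up for illustration.

```r
# Moment-based sample skewness: third central moment over variance^(3/2).
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Hypothetical scores, not the Lab 3 data.
symmetric_scores <- c(10, 12, 14, 16, 18)
left_skewed      <- c(2, 14, 15, 16, 16)

sample_skewness(symmetric_scores)  # 0 for perfectly symmetric data
sample_skewness(left_skewed)       # negative: long tail toward low scores
```

A value near zero, such as the -0.11 reported for the performance scores, points to a roughly symmetric distribution.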

# Fit the one-way model; named `fit` so the lm() function is not masked
fit = lm(Performance_score ~ Age, data=df)
res = resid(fit)
plot(df$Performance_score, res, ylab="Residuals", xlab="Performance")
abline(0, 0)

From the residual plot, the residuals do not seem to be strictly normally distributed, but they do seem independent.

bartlett.test(df$Performance_score, df$Age)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  df$Performance_score and df$Age
## Bartlett's K-squared = 1.0587, df = 2, p-value = 0.589
tapply(df$Performance_score, df$Age, var)
##        1        2        3 
## 12.89901 18.99570 15.83744

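The variances are quite similar across age groups (Bartlett's K-squared = 1.06, df = 2, p = .589). The ratio of the largest to the smallest variance is 19.0/12.9 ≈ 1.5 < 3, which is an acceptable difference in variance.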
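The ratio check can be reproduced directly from the printed group variances. This sketch copies the three values from the `tapply()` output rather than reading the data file, and the cutoff of 3 is the rule of thumb used in this report.

```r
# Group variances copied from the tapply() output above.
group_var <- c(`1` = 12.89901, `2` = 18.99570, `3` = 15.83744)

# Ratio of the largest to the smallest group variance.
var_ratio <- max(group_var) / min(group_var)
round(var_ratio, 2)  # about 1.47
var_ratio < 3        # TRUE: within the rule-of-thumb limit
```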

m <- aov(Performance_score ~ Condition*Age  , data = df)
summary(m)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## Condition      2 1199.0   599.5 313.667 <2e-16 ***
## Age            2 1549.6   774.8 405.389 <2e-16 ***
## Condition:Age  4   22.6     5.7   2.961 0.0246 *  
## Residuals     80  152.9     1.9                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2 <- aov(Performance_score ~ Age+Condition  , data = df)
summary(m2)
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Age          2 1549.7   774.9   370.8 <2e-16 ***
## Condition    2 1198.9   599.5   286.9 <2e-16 ***
## Residuals   84  175.5     2.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

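The interaction term is significant at the .05 level (F(4, 80) = 2.96, p = .025), so strictly there is evidence of some interaction between the treatment and the block, and the additivity assumption is not fully met. The interaction is small relative to the main effects, though (sum of squares 22.6 versus 1199.0 and 1549.6). Both Condition and Age remain highly significant in the additive model, which supports the overall setup of the model.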
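The F statistic for the `Condition:Age` term can be recovered from the two tables above by comparing residual sums of squares. This is a sketch using the rounded values as printed, so the result agrees with the table only up to rounding.

```r
# Values copied from the two summary() tables above.
rss_additive    <- 175.5; df_additive    <- 84
rss_interaction <- 152.9; df_interaction <- 80

# Extra-sum-of-squares F statistic for the Condition:Age term.
f_stat <- ((rss_additive - rss_interaction) / (df_additive - df_interaction)) /
  (rss_interaction / df_interaction)
p_value <- pf(f_stat, df_additive - df_interaction, df_interaction,
              lower.tail = FALSE)

round(f_stat, 2)  # about 2.96, matching the table up to rounding
p_value < 0.05    # TRUE, in line with Pr(>F) = 0.0246 in the output
```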

pairwise.t.test(df$Performance_score, df$Age, paired = FALSE)
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  df$Performance_score and df$Age 
## 
##   1       2      
## 2 3.5e-05 -      
## 3 3.2e-15 4.8e-07
## 
## P value adjustment method: holm
TukeyHSD(m2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Performance_score ~ Age + Condition, data = df)
## 
## $Age
##           diff        lwr       upr p adj
## 2-1  -4.512792  -5.403864 -3.621720     0
## 3-1 -10.310345 -11.216147 -9.404543     0
## 3-2  -5.797553  -6.688625 -4.906480     0
## 
## $Condition
##          diff       lwr       upr p adj
## 2-1 -4.189467 -5.095808 -3.283126     0
## 3-1 -9.005432 -9.904688 -8.106176     0
## 3-2 -4.815965 -5.699331 -3.932599     0
# Per-age-group descriptive statistics; named `desc` so base t() is not masked
desc = by(df$Performance_score, df$Age, stat.desc)
mes(desc$`1`['mean'], desc$`2`['mean'], desc$`1`['std.dev'], desc$`2`['std.dev'], desc$`1`['nbr.val'], desc$`2`['nbr.val'], verbose = FALSE)$d
## [1] 1.13
mes(desc$`1`['mean'], desc$`3`['mean'], desc$`1`['std.dev'], desc$`3`['std.dev'], desc$`1`['nbr.val'], desc$`3`['nbr.val'], verbose = FALSE)$d
## [1] 2.72
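To show what goes into these effect sizes, here is a minimal sketch of Cohen's d from two group means, standard deviations, and sample sizes, assuming the standard pooled-SD definition. The helper `cohens_d` and the numbers passed to it are hypothetical and are not taken from the Lab 3 data.

```r
# Cohen's d with a pooled standard deviation (standard textbook definition).
cohens_d <- function(m1, m2, sd1, sd2, n1, n2) {
  pooled_sd <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

# Hypothetical summary statistics: a 2-point mean difference with SD 2.
cohens_d(m1 = 10, m2 = 8, sd1 = 2, sd2 = 2, n1 = 30, n2 = 30)  # 1
```

By the usual convention, values of d around 0.8 or above are read as large, which is why 1.13 and 2.72 are described as large effects.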