The data file Weeklylab9data.xlsx contains workers performance (a percentage based on several factors such as output, quality of output, follow safety procedures, team work, etc.) in a production facility. Production was measured under two conditions for each worker (before and after lunch) and each worker had one kind of production task out of two (weld or bolt). Analyze the data using a repeated measures ANOVA and then as a mixed effects regression model to see if there is any difference in performance before or after lunch, or for different kinds of tasks.
library(moments)
library(kableExtra)
library(readxl)
library(car)
library(lmerTest)
library(lsmeans)cData <- read_excel("WeeklyLab9Data.xlsx")
kable(cData, "html") %>%
  kable_styling(bootstrap_options = "striped")| Participant | BeforeAfter | Task | Performance | 
|---|---|---|---|
| 1 | 0 | 0 | 0.60 | 
| 2 | 0 | 0 | 0.50 | 
| 3 | 0 | 0 | 0.70 | 
| 4 | 0 | 0 | 0.60 | 
| 5 | 0 | 0 | 0.70 | 
| 6 | 0 | 0 | 0.70 | 
| 7 | 0 | 0 | 0.30 | 
| 8 | 0 | 0 | 0.50 | 
| 9 | 0 | 0 | 0.30 | 
| 10 | 0 | 0 | 0.40 | 
| 11 | 0 | 0 | 0.60 | 
| 12 | 0 | 0 | 0.70 | 
| 13 | 0 | 0 | 0.40 | 
| 14 | 0 | 0 | 0.70 | 
| 15 | 0 | 0 | 0.60 | 
| 16 | 0 | 0 | 0.60 | 
| 17 | 0 | 0 | 0.70 | 
| 18 | 0 | 0 | 0.30 | 
| 19 | 0 | 0 | 0.50 | 
| 20 | 0 | 0 | 0.50 | 
| 21 | 0 | 0 | 0.70 | 
| 22 | 0 | 0 | 0.40 | 
| 23 | 0 | 0 | 0.40 | 
| 24 | 0 | 0 | 0.40 | 
| 25 | 0 | 0 | 0.50 | 
| 26 | 0 | 1 | 0.31 | 
| 27 | 0 | 1 | 0.50 | 
| 28 | 0 | 1 | 0.70 | 
| 29 | 0 | 1 | 0.61 | 
| 30 | 0 | 1 | 0.40 | 
| 31 | 0 | 1 | 0.30 | 
| 32 | 0 | 1 | 0.50 | 
| 33 | 0 | 1 | 0.31 | 
| 34 | 0 | 1 | 0.70 | 
| 35 | 0 | 1 | 0.60 | 
| 36 | 0 | 1 | 0.41 | 
| 37 | 0 | 1 | 0.60 | 
| 38 | 0 | 1 | 0.71 | 
| 39 | 0 | 1 | 0.30 | 
| 40 | 0 | 1 | 0.41 | 
| 41 | 0 | 1 | 0.31 | 
| 42 | 0 | 1 | 0.50 | 
| 43 | 0 | 1 | 0.40 | 
| 44 | 0 | 1 | 0.71 | 
| 45 | 0 | 1 | 0.60 | 
| 46 | 0 | 1 | 0.70 | 
| 47 | 0 | 1 | 0.61 | 
| 48 | 0 | 1 | 0.30 | 
| 49 | 0 | 1 | 0.51 | 
| 50 | 0 | 1 | 0.70 | 
| 1 | 1 | 0 | 0.42 | 
| 2 | 1 | 0 | 0.61 | 
| 3 | 1 | 0 | 0.71 | 
| 4 | 1 | 0 | 0.51 | 
| 5 | 1 | 0 | 0.61 | 
| 6 | 1 | 0 | 0.32 | 
| 7 | 1 | 0 | 0.71 | 
| 8 | 1 | 0 | 0.32 | 
| 9 | 1 | 0 | 0.31 | 
| 10 | 1 | 0 | 0.52 | 
| 11 | 1 | 0 | 0.62 | 
| 12 | 1 | 0 | 0.31 | 
| 13 | 1 | 0 | 0.52 | 
| 14 | 1 | 0 | 0.31 | 
| 15 | 1 | 0 | 0.51 | 
| 16 | 1 | 0 | 0.72 | 
| 17 | 1 | 0 | 0.42 | 
| 18 | 1 | 0 | 0.32 | 
| 19 | 1 | 0 | 0.42 | 
| 20 | 1 | 0 | 0.62 | 
| 21 | 1 | 0 | 0.52 | 
| 22 | 1 | 0 | 0.71 | 
| 23 | 1 | 0 | 0.61 | 
| 24 | 1 | 0 | 0.62 | 
| 25 | 1 | 0 | 0.72 | 
| 26 | 1 | 1 | 0.71 | 
| 27 | 1 | 1 | 0.51 | 
| 28 | 1 | 1 | 0.52 | 
| 29 | 1 | 1 | 0.72 | 
| 30 | 1 | 1 | 0.52 | 
| 31 | 1 | 1 | 0.42 | 
| 32 | 1 | 1 | 0.42 | 
| 33 | 1 | 1 | 0.41 | 
| 34 | 1 | 1 | 0.73 | 
| 35 | 1 | 1 | 0.63 | 
| 36 | 1 | 1 | 0.33 | 
| 37 | 1 | 1 | 0.62 | 
| 38 | 1 | 1 | 0.32 | 
| 39 | 1 | 1 | 0.31 | 
| 40 | 1 | 1 | 0.71 | 
| 41 | 1 | 1 | 0.32 | 
| 42 | 1 | 1 | 0.52 | 
| 43 | 1 | 1 | 0.42 | 
| 44 | 1 | 1 | 0.51 | 
| 45 | 1 | 1 | 0.71 | 
| 46 | 1 | 1 | 0.52 | 
| 47 | 1 | 1 | 0.42 | 
| 48 | 1 | 1 | 0.43 | 
| 49 | 1 | 1 | 0.71 | 
| 50 | 1 | 1 | 0.43 | 
summary(cData)##   Participant    BeforeAfter       Task      Performance    
##  Min.   : 1.0   Min.   :0.0   Min.   :0.0   Min.   :0.3000  
##  1st Qu.:13.0   1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.4000  
##  Median :25.5   Median :0.5   Median :0.5   Median :0.5100  
##  Mean   :25.5   Mean   :0.5   Mean   :0.5   Mean   :0.5186  
##  3rd Qu.:38.0   3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:0.6225  
##  Max.   :50.0   Max.   :1.0   Max.   :1.0   Max.   :0.7300The focus groups are nested in cities and there are 3 potential co-variates and 1 key predictor of interest:
cData$Participant <- factor(cData$Participant)
cData$BeforeAfter  <- factor(cData$BeforeAfter)
cData$Task  <- factor(cData$Task)plot(density(cData$Performance))The skewness value is close to zero so we can assume that the Rating values are normally distributed. The level of skewness can be identified by using D’Agostino skewness test
agostino.test(cData$Performance)## 
##  D'Agostino skewness test
## 
## data:  cData$Performance
## skew = -0.065896, z = -0.286090, p-value = 0.7748
## alternative hypothesis: data have a skewnesscheck if we have equal variances for the predictors to find any issues:
bartlett.test(cData$Performance, cData$BeforeAfter)## 
##  Bartlett test of homogeneity of variances
## 
## data:  cData$Performance and cData$BeforeAfter
## Bartlett's K-squared = 0.021317, df = 1, p-value = 0.8839bartlett.test(cData$Performance, cData$Task)## 
##  Bartlett test of homogeneity of variances
## 
## data:  cData$Performance and cData$Task
## Bartlett's K-squared = 0.021613, df = 1, p-value = 0.8831Checking residuals to see if the model fits good:
modelranova <- aov(Performance ~ BeforeAfter*Task + Error(Participant/(BeforeAfter)), data = cData)
summary(modelranova)## 
## Error: Participant
##           Df Sum Sq  Mean Sq F value Pr(>F)
## Task       1 0.0052 0.005184   0.225  0.637
## Residuals 48 1.1047 0.023015               
## 
## Error: Participant:BeforeAfter
##                  Df Sum Sq  Mean Sq F value Pr(>F)
## BeforeAfter       1 0.0002 0.000196   0.010  0.921
## BeforeAfter:Task  1 0.0023 0.002304   0.118  0.733
## Residuals        48 0.9388 0.019558Looks like we have no significant effects at all. Check the residuals to see if we just have a big problem with our model.
ex.aov.proj <- proj(modelranova)
qqnorm(ex.aov.proj[[2]][, "Residuals"]) #residuals for taskqqnorm(ex.aov.proj[[3]][, "Residuals"]) #residuals for BeforeAfter and BeforeAfter*Taskshapiro.test(ex.aov.proj[[2]][, "Residuals"])## 
##  Shapiro-Wilk normality test
## 
## data:  ex.aov.proj[[2]][, "Residuals"]
## W = 0.96554, p-value = 0.01019shapiro.test(ex.aov.proj[[3]][, "Residuals"])## 
##  Shapiro-Wilk normality test
## 
## data:  ex.aov.proj[[3]][, "Residuals"]
## W = 0.98713, p-value = 0.4464Not the best fits. Now, lets do the same analysis using linear regression. We already tested the most relevant assumption (i.e., no real skew) and we likely don’t need to worry about linear relationship because we have categorical predictors and the same goes for multicollinearity since the study was designed in such as way these should not correlate. Will still need to check the residuals at the end so we just fit the model now.
modelmixed <- lmer(Performance ~ BeforeAfter*Task + (1|Participant), data = cData)
summary(modelmixed)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Performance ~ BeforeAfter * Task + (1 | Participant)
##    Data: cData
## 
## REML criterion at convergence: -84.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6366 -0.7499 -0.0622  0.7278  1.4405 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Participant (Intercept) 0.001728 0.04157 
##  Residual                0.019558 0.13985 
## Number of obs: 100, groups:  Participant, 50
## 
## Fixed effects:
##                    Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)         0.53200    0.02918 95.37128  18.232   <2e-16 ***
## BeforeAfter1       -0.01240    0.03956 48.00000  -0.313    0.755    
## Task1              -0.02400    0.04127 95.37128  -0.582    0.562    
## BeforeAfter1:Task1  0.01920    0.05594 48.00000   0.343    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) BfrAf1 Task1 
## BeforeAftr1 -0.678              
## Task1       -0.707  0.479       
## BfrAftr1:T1  0.479 -0.707 -0.678Nothing is significant as above. Lets look at this model’s residuals and will see if it suggests something is off with the fit just as we found with the RANOVA
shapiro.test(resid(modelmixed))## 
##  Shapiro-Wilk normality test
## 
## data:  resid(modelmixed)
## W = 0.93021, p-value = 5.115e-05Evaluating the difference in versions. We can see from the regression version 2 and 3 are better than 1 but we don’t know if 2 and 3 differ.
qqnorm(resid(modelmixed))
qqline(resid(modelmixed))shapiro.test(resid(modelmixed))## 
##  Shapiro-Wilk normality test
## 
## data:  resid(modelmixed)
## W = 0.93021, p-value = 5.115e-05We analyzed data from a study looking at workers performance before and after lunch (within subjects) in one of two tasks (welding or bolting, between subjects). We used both a repeated measures ANOVA and a mixed effects regression model. In neither analysis was performance found to be significantly affected before (M = .52) or after lunch (M = .52; F(1, 48) = .01, p = .92; b = -.01, t(48) = .31, p = .76), or between tasks (M = .53 vs .51; F(1, 48) = .23, p = .764; b = -.02, t(95.37) = .58, p = .56), nor was their interaction found to be significant (F(1, 48) = .12, p = .73; b = -.02, t(48) = .34, p = .73). In short, we find no reliable difference between task type or time of day on worker performance.