ANLY 510 - Weekly Lab 9

Saurav Chopra

2018-07-25


Objectives

The data file Weeklylab9data.xlsx contains workers performance (a percentage based on several factors such as output, quality of output, follow safety procedures, team work, etc.) in a production facility. Production was measured under two conditions for each worker (before and after lunch) and each worker had one kind of production task out of two (weld or bolt). Analyze the data using a repeated measures ANOVA and then as a mixed effects regression model to see if there is any difference in performance before or after lunch, or for different kinds of tasks.

Loading libraries

library(moments)
library(kableExtra)
library(readxl)
library(car)
library(lmerTest)
library(lsmeans)

Loading the dataset

cData <- read_excel("WeeklyLab9Data.xlsx")
kable(cData, "html") %>%
  kable_styling(bootstrap_options = "striped")
Participant BeforeAfter Task Performance
1 0 0 0.60
2 0 0 0.50
3 0 0 0.70
4 0 0 0.60
5 0 0 0.70
6 0 0 0.70
7 0 0 0.30
8 0 0 0.50
9 0 0 0.30
10 0 0 0.40
11 0 0 0.60
12 0 0 0.70
13 0 0 0.40
14 0 0 0.70
15 0 0 0.60
16 0 0 0.60
17 0 0 0.70
18 0 0 0.30
19 0 0 0.50
20 0 0 0.50
21 0 0 0.70
22 0 0 0.40
23 0 0 0.40
24 0 0 0.40
25 0 0 0.50
26 0 1 0.31
27 0 1 0.50
28 0 1 0.70
29 0 1 0.61
30 0 1 0.40
31 0 1 0.30
32 0 1 0.50
33 0 1 0.31
34 0 1 0.70
35 0 1 0.60
36 0 1 0.41
37 0 1 0.60
38 0 1 0.71
39 0 1 0.30
40 0 1 0.41
41 0 1 0.31
42 0 1 0.50
43 0 1 0.40
44 0 1 0.71
45 0 1 0.60
46 0 1 0.70
47 0 1 0.61
48 0 1 0.30
49 0 1 0.51
50 0 1 0.70
1 1 0 0.42
2 1 0 0.61
3 1 0 0.71
4 1 0 0.51
5 1 0 0.61
6 1 0 0.32
7 1 0 0.71
8 1 0 0.32
9 1 0 0.31
10 1 0 0.52
11 1 0 0.62
12 1 0 0.31
13 1 0 0.52
14 1 0 0.31
15 1 0 0.51
16 1 0 0.72
17 1 0 0.42
18 1 0 0.32
19 1 0 0.42
20 1 0 0.62
21 1 0 0.52
22 1 0 0.71
23 1 0 0.61
24 1 0 0.62
25 1 0 0.72
26 1 1 0.71
27 1 1 0.51
28 1 1 0.52
29 1 1 0.72
30 1 1 0.52
31 1 1 0.42
32 1 1 0.42
33 1 1 0.41
34 1 1 0.73
35 1 1 0.63
36 1 1 0.33
37 1 1 0.62
38 1 1 0.32
39 1 1 0.31
40 1 1 0.71
41 1 1 0.32
42 1 1 0.52
43 1 1 0.42
44 1 1 0.51
45 1 1 0.71
46 1 1 0.52
47 1 1 0.42
48 1 1 0.43
49 1 1 0.71
50 1 1 0.43
summary(cData)
##   Participant    BeforeAfter       Task      Performance    
##  Min.   : 1.0   Min.   :0.0   Min.   :0.0   Min.   :0.3000  
##  1st Qu.:13.0   1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.4000  
##  Median :25.5   Median :0.5   Median :0.5   Median :0.5100  
##  Mean   :25.5   Mean   :0.5   Mean   :0.5   Mean   :0.5186  
##  3rd Qu.:38.0   3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:0.6225  
##  Max.   :50.0   Max.   :1.0   Max.   :1.0   Max.   :0.7300

Factor all our categorical predictors.

The focus groups are nested in cities and there are 3 potential co-variates and 1 key predictor of interest:

cData$Participant <- factor(cData$Participant)
cData$BeforeAfter  <- factor(cData$BeforeAfter)
cData$Task  <- factor(cData$Task)

Plotting the distribution to see if there is any skewness in the data or not

plot(density(cData$Performance))

The skewness value is close to zero so we can assume that the Rating values are normally distributed. The level of skewness can be identified by using D’Agostino skewness test

agostino.test(cData$Performance)
## 
##  D'Agostino skewness test
## 
## data:  cData$Performance
## skew = -0.065896, z = -0.286090, p-value = 0.7748
## alternative hypothesis: data have a skewness

check if we have equal variances for the predictors to find any issues:

bartlett.test(cData$Performance, cData$BeforeAfter)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  cData$Performance and cData$BeforeAfter
## Bartlett's K-squared = 0.021317, df = 1, p-value = 0.8839
bartlett.test(cData$Performance, cData$Task)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  cData$Performance and cData$Task
## Bartlett's K-squared = 0.021613, df = 1, p-value = 0.8831

Checking residuals to see if the model fits good:

modelranova <- aov(Performance ~ BeforeAfter*Task + Error(Participant/(BeforeAfter)), data = cData)
summary(modelranova)
## 
## Error: Participant
##           Df Sum Sq  Mean Sq F value Pr(>F)
## Task       1 0.0052 0.005184   0.225  0.637
## Residuals 48 1.1047 0.023015               
## 
## Error: Participant:BeforeAfter
##                  Df Sum Sq  Mean Sq F value Pr(>F)
## BeforeAfter       1 0.0002 0.000196   0.010  0.921
## BeforeAfter:Task  1 0.0023 0.002304   0.118  0.733
## Residuals        48 0.9388 0.019558

Looks like we have no significant effects at all. Check the residuals to see if we just have a big problem with our model.

ex.aov.proj <- proj(modelranova)
qqnorm(ex.aov.proj[[2]][, "Residuals"]) #residuals for task

qqnorm(ex.aov.proj[[3]][, "Residuals"]) #residuals for BeforeAfter and BeforeAfter*Task

shapiro.test(ex.aov.proj[[2]][, "Residuals"])
## 
##  Shapiro-Wilk normality test
## 
## data:  ex.aov.proj[[2]][, "Residuals"]
## W = 0.96554, p-value = 0.01019
shapiro.test(ex.aov.proj[[3]][, "Residuals"])
## 
##  Shapiro-Wilk normality test
## 
## data:  ex.aov.proj[[3]][, "Residuals"]
## W = 0.98713, p-value = 0.4464

Not the best fits. Now, lets do the same analysis using linear regression. We already tested the most relevant assumption (i.e., no real skew) and we likely don’t need to worry about linear relationship because we have categorical predictors and the same goes for multicollinearity since the study was designed in such as way these should not correlate. Will still need to check the residuals at the end so we just fit the model now.

modelmixed <- lmer(Performance ~ BeforeAfter*Task + (1|Participant), data = cData)
summary(modelmixed)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Performance ~ BeforeAfter * Task + (1 | Participant)
##    Data: cData
## 
## REML criterion at convergence: -84.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6366 -0.7499 -0.0622  0.7278  1.4405 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Participant (Intercept) 0.001728 0.04157 
##  Residual                0.019558 0.13985 
## Number of obs: 100, groups:  Participant, 50
## 
## Fixed effects:
##                    Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)         0.53200    0.02918 95.37128  18.232   <2e-16 ***
## BeforeAfter1       -0.01240    0.03956 48.00000  -0.313    0.755    
## Task1              -0.02400    0.04127 95.37128  -0.582    0.562    
## BeforeAfter1:Task1  0.01920    0.05594 48.00000   0.343    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) BfrAf1 Task1 
## BeforeAftr1 -0.678              
## Task1       -0.707  0.479       
## BfrAftr1:T1  0.479 -0.707 -0.678

Nothing is significant as above. Lets look at this model’s residuals and will see if it suggests something is off with the fit just as we found with the RANOVA

shapiro.test(resid(modelmixed))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(modelmixed)
## W = 0.93021, p-value = 5.115e-05

Evaluating the difference in versions. We can see from the regression version 2 and 3 are better than 1 but we don’t know if 2 and 3 differ.

qqnorm(resid(modelmixed))
qqline(resid(modelmixed))

shapiro.test(resid(modelmixed))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(modelmixed)
## W = 0.93021, p-value = 5.115e-05

Conclusion

We analyzed data from a study looking at workers performance before and after lunch (within subjects) in one of two tasks (welding or bolting, between subjects). We used both a repeated measures ANOVA and a mixed effects regression model. In neither analysis was performance found to be significantly affected before (M = .52) or after lunch (M = .52; F(1, 48) = .01, p = .92; b = -.01, t(48) = .31, p = .76), or between tasks (M = .53 vs .51; F(1, 48) = .23, p = .764; b = -.02, t(95.37) = .58, p = .56), nor was their interaction found to be significant (F(1, 48) = .12, p = .73; b = -.02, t(48) = .34, p = .73). In short, we find no reliable difference between task type or time of day on worker performance.