Weekly Lab 9

Workers Performance

The data file Weeklylab9data.xlsx contains workers' performance (a percentage based on several factors such as output, quality of output, adherence to safety procedures, teamwork, etc.) in a production facility. Performance was measured under two conditions for each worker (before and after lunch), and each worker had one of two production tasks (weld or bolt). Analyze the data using a repeated measures ANOVA and then a mixed effects regression model to see whether performance differs before and after lunch, or between tasks.

library(readxl)
WorkPerf <- read_excel("C:/Users/Enrique/OneDrive/Documents/HU/ANLY510_Principles7Applicaitons02/Data/Workers Performance.xlsx")
head(WorkPerf)
## # A tibble: 6 x 4
##   Participant BeforeAfter  Task Performance
##         <dbl>       <dbl> <dbl>       <dbl>
## 1        1.00           0     0       0.600
## 2        2.00           0     0       0.500
## 3        3.00           0     0       0.700
## 4        4.00           0     0       0.600
## 5        5.00           0     0       0.700
## 6        6.00           0     0       0.700
str(WorkPerf)
## Classes 'tbl_df', 'tbl' and 'data.frame':    100 obs. of  4 variables:
##  $ Participant: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ BeforeAfter: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Task       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Performance: num  0.6 0.5 0.7 0.6 0.7 0.7 0.3 0.5 0.3 0.4 ...

1. Prepare data: Convert BeforeAfter and Task to factors

WorkPerf$BeforeAfter= factor(WorkPerf$BeforeAfter)
WorkPerf$Task= factor(WorkPerf$Task)
levels(WorkPerf$BeforeAfter)
## [1] "0" "1"
levels(WorkPerf$Task)
## [1] "0" "1"
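
The bare "0"/"1" levels are easy to misread in later tables and plots. As a minimal sketch, assuming the coding implied by the axis labels used later in this lab (0 = Before, 1 = After; 0 = Weld, 1 = Bolt), the factors could carry descriptive labels instead. The `toy` data frame is hypothetical and only stands in for WorkPerf.

```r
# Hypothetical toy data using the same 0/1 coding as WorkPerf
toy <- data.frame(
  BeforeAfter = c(0, 0, 1, 1),
  Task        = c(0, 1, 0, 1)
)

# Convert to factors and attach readable labels in one step.
# Assumed mapping: 0 = Before, 1 = After; 0 = Weld, 1 = Bolt.
toy$BeforeAfter <- factor(toy$BeforeAfter, levels = c(0, 1),
                          labels = c("Before", "After"))
toy$Task        <- factor(toy$Task, levels = c(0, 1),
                          labels = c("Weld", "Bolt"))

levels(toy$BeforeAfter)
levels(toy$Task)

# Cross-tabulate to confirm every combination is present
table(toy$BeforeAfter, toy$Task)
```

With labelled factors, boxplot axes and ANOVA tables show the condition names directly, so the "(0)" and "(1)" annotations are no longer needed.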

2. Let’s take a first glance at the data:

summary(WorkPerf$Performance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3000  0.4000  0.5100  0.5186  0.6225  0.7300

3. Plot the performance distribution and test for skewness and normality

plot(density(WorkPerf$Performance), col="red", lwd=2, lty="dashed")

There seems to be some skewness towards the right; let's test for skewness and normality:

library(moments)
agostino.test(WorkPerf$Performance)
## 
##  D'Agostino skewness test
## 
## data:  WorkPerf$Performance
## skew = -0.065896, z = -0.286090, p-value = 0.7748
## alternative hypothesis: data have a skewness
shapiro.test(WorkPerf$Performance)
## 
##  Shapiro-Wilk normality test
## 
## data:  WorkPerf$Performance
## W = 0.90798, p-value = 3.423e-06

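The D'Agostino test does not detect significant skewness (skew = -0.066, p = 0.77), but the Shapiro-Wilk test rejects normality (W = 0.908, p < 0.001). Try a log transform to see whether it brings the distribution closer to normal.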

WorkPerf$Performance= log(WorkPerf$Performance)
agostino.test(WorkPerf$Performance)
## 
##  D'Agostino skewness test
## 
## data:  WorkPerf$Performance
## skew = -0.37291, z = -1.57790, p-value = 0.1146
## alternative hypothesis: data have a skewness
shapiro.test(WorkPerf$Performance)
## 
##  Shapiro-Wilk normality test
## 
## data:  WorkPerf$Performance
## W = 0.89786, p-value = 1.123e-06

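After the log transform the skewness is negative and larger in magnitude (-0.37), though still not significant (p = 0.11), and the Shapiro-Wilk test still rejects normality (W = 0.898, p < 0.001). The transform did not improve normality, so the results below, which use the log-transformed data, should be read with this departure in mind.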
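
To see why a log transform pushes skewness in the negative direction, it can help to compute the skewness coefficient by hand. The sketch below uses the moment-based definition (third central moment divided by the variance raised to the power 1.5), which is assumed here to be the same quantity the skewness test reports. The helper `skew()` and the vector `x` are hypothetical and not taken from the lab data.

```r
# Moment-based sample skewness (hypothetical helper, base R only)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical, perfectly symmetric values in the same range as Performance
x <- c(0.3, 0.4, 0.5, 0.6, 0.7)

skew(x)       # essentially zero: the values are symmetric around 0.5
skew(log(x))  # negative: log stretches the low end and compresses the high end
```

A log transform is mainly useful against right skew. When the raw data are already roughly symmetric, as the skewness test above suggests, it tends to introduce left skew instead.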

4. Let's visualize the association between the predictors and workers' performance:

boxplot(WorkPerf$Performance~WorkPerf$BeforeAfter, 
        main="Performance before and after lunch", 
        xlab="Before (0) and After (1)", ylab="Workers Performance", 
        col=c("light blue","light pink"), lwd=2)

plot(WorkPerf$Participant, WorkPerf$Performance, main= "Performance per Participant")
abline(lm(WorkPerf$Performance~WorkPerf$Participant), lwd=1, col=2, lty="dashed")

boxplot(WorkPerf$Performance~WorkPerf$Task, 
        main="Performance per Task", 
        xlab="Weld (0) Bolt (1)", ylab="Workers Performance", 
        col=c("light blue","light pink"), lwd=2)

The plots show no obvious difference in performance before and after lunch or between tasks. We need to test these variables with ANOVA to confirm that this is the case.
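
The visual impression from the boxplots can be backed up with a table of cell means and standard deviations. This is a minimal sketch on simulated data. The `toy` data frame and its values are hypothetical and only mimic the structure of WorkPerf.

```r
set.seed(1)

# Hypothetical data with the same columns as WorkPerf
toy <- data.frame(
  BeforeAfter = factor(rep(c(0, 1), each = 10)),
  Task        = factor(rep(c(0, 1), times = 10)),
  Performance = round(runif(20, min = 0.3, max = 0.73), 2)
)

# Mean and standard deviation of Performance in each BeforeAfter x Task cell
cell_means <- aggregate(Performance ~ BeforeAfter + Task, data = toy, FUN = mean)
cell_sds   <- aggregate(Performance ~ BeforeAfter + Task, data = toy, FUN = sd)

cell_means
cell_sds
```

If the cell means differ by much less than the within-cell standard deviations, the ANOVA is unlikely to find a significant effect.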

5. Fit ANOVA and linear regression models to explore relationships.

Neither variable appears to be statistically significant at the population level.

Perf_anova = aov(Performance~BeforeAfter*Task, data= WorkPerf)
summary(Perf_anova)
##                  Df Sum Sq Mean Sq F value Pr(>F)
## BeforeAfter       1  0.000 0.00017   0.002  0.965
## Task              1  0.022 0.02239   0.249  0.619
## BeforeAfter:Task  1  0.018 0.01788   0.199  0.657
## Residuals        96  8.625 0.08984

We fail to reject H0 of equal means across groups: neither main effect nor the interaction is significant (all p > 0.6). Performance is very similar before and after lunch as well as across tasks.
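
The model above treats all 100 observations as independent, while the assignment asks for a repeated measures ANOVA. A minimal sketch of that design follows, assuming each Participant ID appears exactly once per BeforeAfter level and that Task is fixed within a worker. The simulated `toy` data and its effect sizes are hypothetical.

```r
set.seed(42)
n <- 20  # hypothetical number of workers

# Each worker is measured twice (before = 0, after = 1) and has one task
toy <- data.frame(
  Participant = factor(rep(1:n, times = 2)),
  BeforeAfter = factor(rep(c(0, 1), each = n)),
  Task        = factor(rep(rep(c(0, 1), each = n / 2), times = 2))
)

# Simulated performance: a worker-specific offset plus measurement noise
worker_effect   <- rnorm(n, sd = 0.08)
toy$Performance <- 0.5 + worker_effect[as.integer(toy$Participant)] +
  rnorm(2 * n, sd = 0.05)

# The Error() term tells aov() that BeforeAfter varies within Participant,
# so Task is tested between workers and BeforeAfter within workers
rm_anova <- aov(Performance ~ BeforeAfter * Task +
                  Error(Participant / BeforeAfter), data = toy)
summary(rm_anova)
```

In the summary, Task appears in the Participant stratum, while BeforeAfter and the interaction appear in the Participant:BeforeAfter stratum, each tested against its own error term.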

Plot the model residuals on a normal Q-Q plot to validate our results.

qqnorm(Perf_anova$residuals)
qqline(Perf_anova$residuals)
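
A Q-Q plot checks normality of the residuals only. As a hedged sketch, a residuals-versus-fitted plot and a Shapiro-Wilk test on the residuals can complement it. The `toy` data and the model `fit` below are hypothetical stand-ins for WorkPerf and Perf_anova.

```r
set.seed(7)

# Hypothetical data with the same design as the two-factor ANOVA above
toy <- data.frame(
  BeforeAfter = factor(rep(c(0, 1), each = 20)),
  Task        = factor(rep(c(0, 1), times = 20))
)
toy$Performance <- 0.5 + rnorm(40, sd = 0.1)

fit <- aov(Performance ~ BeforeAfter * Task, data = toy)
res <- residuals(fit)

# Residuals against fitted values: look for funnels or unequal spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = "dashed")

# Formal normality check on the residuals rather than on the raw response
shapiro.test(res)
```

Testing the residuals is usually more informative than testing the raw response, because the ANOVA assumption concerns the errors around the group means.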

Conclusion:

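The Q-Q plot of the residuals shows a clear pattern, with many points far from the reference line. This indicates that the residuals are not normally distributed, and it may also point to lack of fit: in the ANOVA table the model terms account for only about 0.04 of a total sum of squares of about 8.67, so a large portion of the variation in the data is not explained by the model.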
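
The assignment also asks for a mixed effects regression model. One possible sketch uses lme() from the nlme package with a random intercept per worker. It assumes each Participant ID identifies one worker measured before and after lunch. The simulated `toy` data is hypothetical.

```r
library(nlme)  # recommended package distributed with R

set.seed(42)
n <- 20  # hypothetical number of workers

toy <- data.frame(
  Participant = factor(rep(1:n, times = 2)),
  BeforeAfter = factor(rep(c(0, 1), each = n)),
  Task        = factor(rep(rep(c(0, 1), each = n / 2), times = 2))
)
worker_effect   <- rnorm(n, sd = 0.08)
toy$Performance <- 0.5 + worker_effect[as.integer(toy$Participant)] +
  rnorm(2 * n, sd = 0.05)

# Fixed effects: BeforeAfter, Task and their interaction.
# Random effect: a separate intercept for each worker.
mixed_fit <- lme(Performance ~ BeforeAfter * Task,
                 random = ~ 1 | Participant, data = toy)

summary(mixed_fit)$tTable  # fixed-effect estimates with t-tests
anova(mixed_fit)           # F-tests for each fixed-effect term
```

The random intercept absorbs stable differences between workers, which an ANOVA without an error term for Participant leaves in the residuals.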