Sick Days Taken by Day of Week and by Month

Intro

People miss work when they are sick. Sometimes people miss work and claim to be sick when they are not. If we assume that people are equally likely to get sick on any given day of the week, we can plot the data and run a statistical test comparing the observed number of sick days with the expected number. Meaningful differences between the observed and expected counts may suggest that some people are faking illness and missing work for another reason.

Data

Sick leave data were collected between 2011 and Q1 2015:

## Set Working Directory
setwd("~/Documents/HRMeasured/Sick Leave")

## Read data into R session. stringsAsFactors = FALSE keeps the dates as character strings instead of converting them to factors
sickData <- read.csv("sickLeave.csv", header = TRUE, stringsAsFactors = FALSE)

## Select the single variable of interest for the study: sickLeaveTaken (column 5)
sickData <- sickData[5]
dim(sickData)
## [1] 21264     1
head(sickData)
##   sickLeaveTaken
## 1         3/7/14
## 2        3/10/14
## 3        6/25/13
## 4        6/23/14
## 5         9/8/14
## 6        7/12/13

The dates on which employees used sick days were recoded into two new variables, day of the week and month:

## Convert dates from character to POSIXlt date-times
wDate <- strptime(as.character(sickData$sickLeaveTaken), "%m/%d/%y")

## Save reformatted Date variables to dataframe
sickData <- data.frame(sickData, wDate)

## Create new date variables - 'Day of the Week', and 'month'
sickDotw <- weekdays(wDate)
sickMonth <- months(wDate)

## Save new date variables to dataframe
sickData <- data.frame(sickData, sickDotw, sickMonth)
head(sickDotw)
## [1] "Friday"  "Monday"  "Tuesday" "Monday"  "Monday"  "Friday"
head(sickMonth)
## [1] "March"     "March"     "June"      "June"      "September" "July"
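
As a quick check of the date handling, the sketch below parses the six sample dates shown by head(sickData) with the same "%m/%d/%y" format. It uses as.Date() rather than strptime() (a substitution, since only the calendar date matters here) and format(..., "%u"), which returns the ISO weekday number (1 = Monday) and so does not depend on the session locale the way weekdays() and months() do. The variable names are illustrative and not part of the original script.

```r
# The six sample dates shown by head(sickData) above
sampleDates <- c("3/7/14", "3/10/14", "6/25/13", "6/23/14", "9/8/14", "7/12/13")

# Parse with the same format string used for strptime(); as.Date() drops the time part
parsed <- as.Date(sampleDates, format = "%m/%d/%y")

# ISO weekday number: 1 = Monday ... 7 = Sunday (locale-independent)
isoDay <- as.integer(format(parsed, "%u"))

# Month number: 1 = January ... 12 = December (locale-independent)
monthNum <- as.integer(format(parsed, "%m"))

data.frame(sampleDates, parsed, isoDay, monthNum)
```

Numeric codes like these can be handy when the script may run under a non-English locale, where the day and month names returned by weekdays() and months() would not match hard-coded English labels.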

Some reformatting and sorting were needed to get the plots to display Monday through Friday and January through December:

## Re-order days of the week to be Monday through Friday and months to be January through December (weekend days are not listed as levels, so they become NA)
sickData$sickDotw <- factor(sickData$sickDotw, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))
sickData$sickMonth <- factor(sickData$sickMonth, levels = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"))

## Subset to keep only observations that fall on Monday through Friday
sickData <- subset(sickData, sickDotw %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))
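
The role of the levels argument can be seen in a small sketch with made-up values (the vector days below is hypothetical, not taken from the data set): values that are not listed as levels become NA, and table() reports counts in level order rather than alphabetically.

```r
# Hypothetical day names, including one weekend day
days <- c("Friday", "Monday", "Saturday", "Wednesday", "Monday")

weekdayLevels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
daysFactor <- factor(days, levels = weekdayLevels)

# "Saturday" is not a listed level, so it becomes NA
is.na(daysFactor)

# Counts follow the level order (Monday first), and NA values are left out
table(daysFactor)

# Keeping only non-missing values drops the weekend observation
daysFactor[!is.na(daysFactor)]
```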

Analysis

The number of sick days by Day of the Week (DOTW) was plotted:

## Create barplots:  Count of Sick Days by Day of the Week
##install.packages("ggplot2")
library(ggplot2)
qplot(factor(sickDotw), data=sickData, xlab = "", ylab = "", main = "Sick Days taken by DOTW",geom="bar")

[Figure: bar plot, "Sick Days taken by DOTW"]

table(sickData$sickDotw)
## 
##    Monday   Tuesday Wednesday  Thursday    Friday 
##      5118      4221      3908      3697      4240

The number of sick days by Month was plotted:

## Create barplots:  Count of Sick Days by Month
library(ggplot2)
qplot(factor(sickMonth), data=sickData, xlab = "", ylab = "", main = "Sick Days taken by Month",geom="bar")

[Figure: bar plot, "Sick Days taken by Month"]

table(sickData$sickMonth)
## 
##   January  February     March     April       May      June      July 
##      2541      2090      2132      1505      1297      1426      1440 
##    August September   October  November  December 
##      1532      1632      1919      1683      1987

A chi-squared goodness-of-fit test was run on sick days by day of the week to test whether the differences between observed and expected counts were statistically significant:

## Chi-squared test: is there a statistically significant difference between days of the week?
chiSquareWeekDay <- table(sickData$sickDotw)
chiSquareWeekDay
## 
##    Monday   Tuesday Wednesday  Thursday    Friday 
##      5118      4221      3908      3697      4240
chisq.test(table(sickData$sickDotw))
## 
##  Chi-squared test for given probabilities
## 
## data:  table(sickData$sickDotw)
## X-squared = 277.63, df = 4, p-value < 2.2e-16
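
To show what chisq.test() computes here, the statistic can be reproduced by hand from the counts printed above. With no p argument, chisq.test() on a one-dimensional table tests against equal probabilities, so each weekday's expected count is the total divided by five. The variable names below are illustrative and not part of the original script.

```r
# Observed counts copied from the table above
observed <- c(Monday = 5118, Tuesday = 4221, Wednesday = 3908,
              Thursday = 3697, Friday = 4240)

# Under the equal-likelihood assumption, every weekday has the same expected count
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chiSq <- sum((observed - expected)^2 / expected)
degFree <- length(observed) - 1

# Upper-tail p-value from the chi-squared distribution
pValue <- pchisq(chiSq, df = degFree, lower.tail = FALSE)

round(chiSq, 2)   # matches the X-squared value reported by chisq.test()
degFree
pValue
```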

Results

Sick Days by Day of the Week (DOTW)

Monday is the day on which people in the sample most frequently called in sick. Mondays account for roughly 20% more sick days (5118) than either of the next two most frequent days, Friday (4240) and Tuesday (4221). A chi-squared test with 4 degrees of freedom gave X-squared = 277.63, p < .001, suggesting that the differences in sick days across weekdays are unlikely to be due to chance.
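
The size of the Monday effect can be checked with a little arithmetic on the counts reported above. The sketch also looks at the Pearson residuals, (observed - expected) / sqrt(expected), which indicate which days contribute most to the chi-squared statistic; this residual check is an addition here, not part of the original analysis, and the variable names are illustrative.

```r
# Observed counts copied from the weekday table
observed <- c(Monday = 5118, Tuesday = 4221, Wednesday = 3908,
              Thursday = 3697, Friday = 4240)

# Percent by which Monday exceeds each of the other weekdays
pctMoreThanOthers <- 100 * (observed["Monday"] / observed[-1] - 1)
round(pctMoreThanOthers, 1)

# Share of all weekday sick days on each day (20% expected under equal likelihood)
round(100 * prop.table(observed), 1)

# Pearson residuals: large positive values mean more sick days than expected
expected <- sum(observed) / length(observed)
pearsonResid <- (observed - expected) / sqrt(expected)
round(pearsonResid, 1)
```

The squared Pearson residuals sum to the chi-squared statistic, so the day with the largest absolute residual is the one driving most of the test result.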

Sick Days by Month

No statistical tests were planned for sick days by month. Only the bar plot was used to examine the frequency of sick days reported in each month and to see whether any patterns emerged.

Discussion

Monday accounted for roughly 20% more sick days in the sample than either of the next two most frequent days, Friday and Tuesday. People were less likely to call in sick on Wednesday and Thursday than on the other weekdays. These results suggest that some people do fake illness to miss work, most often on a Monday.

When looking at the number of sick days by month, a different pattern emerges, one that suggests people miss work when they really are sick. The months with the most sick days cluster in winter (January, February, and March), while the fewest sick days occur from late spring through summer, with May, June, and July the lowest.
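
One caveat when reading the monthly totals: because the data run from 2011 through Q1 2015, January through March may be counted in one more calendar year than the other months. The sketch below divides each monthly total by the number of years that month could appear in the data. It assumes, hypothetically, complete coverage from January 2011 through March 2015; the exact start and end dates are not stated above, so treat the result as illustrative. The variable names are also illustrative.

```r
# Monthly totals copied from the table in the Analysis section
monthCounts <- c(January = 2541, February = 2090, March = 2132, April = 1505,
                 May = 1297, June = 1426, July = 1440, August = 1532,
                 September = 1632, October = 1919, November = 1683, December = 1987)

# ASSUMPTION: data cover Jan 2011 through Mar 2015,
# so January-March occur in 5 calendar years and the other months in 4
yearsCovered <- ifelse(names(monthCounts) %in% c("January", "February", "March"), 5, 4)

# Average sick days per month per year under that assumption
avgPerYear <- monthCounts / yearsCovered
round(avgPerYear, 1)

# Months ranked from most to fewest sick days per year
names(sort(avgPerYear, decreasing = TRUE))
```

Under this assumption January and December rank highest and May, June, and July lowest, so the seasonal pattern remains, although the gap between the first quarter and the rest of the year narrows.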