The opportunity to tidy this data set came from the Typical workday questions for week one and week two. By using the gather function, it permits the data table to go from 50 columns down to 38. Further filtering was done to remove responses of “2” or no to those questions, since the response were binary - we don’t need both values. This allows us to remove two additional columns, because we don’t need columns full of the same values on each row - in this case “1”. The net result is much longer file, because there’s a row for each respondent in reply to these questions.
doturl<-"https://raw.githubusercontent.com/Misterresearch/CUNY-Projects/master/Signalmen%20background%20data.csv"
dottable <- read.table(file = doturl, header = TRUE, sep = ",",
na.strings = "NA")
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.2.3
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Reduce column count from 50 to 38, now tidy
dottable1<-gather(dottable, "two week cycle", "count", 17:30)
#Remove extraneous "no" responses from binary typical_workday variables
dottidy<-(dplyr::filter(dottable1, dottable1$count!=2))
#sort table by respondent ID
dotfilter<-dplyr::arrange(dottidy, `ID`)
#The following function gives us a rank order of the days most worked.
dotdayorder<-(dotfilter %>% group_by(`two week cycle`) %>% summarise(total = sum(count, na.rm = TRUE)))
dotdaysort<-arrange(dotdayorder, desc(total))
dotdaysort
## Source: local data frame [14 x 2]
##
## two week cycle total
## (chr) (int)
## 1 Tues1 378
## 2 Thurs1 377
## 3 Wed1 377
## 4 Tues2 373
## 5 Mon2 369
## 6 Wed2 345
## 7 Mon1 344
## 8 Thurs2 343
## 9 Fri1 319
## 10 Fri2 287
## 11 Sat1 69
## 12 Sun2 59
## 13 Sat2 31
## 14 Sun1 30
#change path below to local folder
#write.csv(dotdaysort, "/Users/digitalmarketer1977/Desktop/dotdaysort.csv")