The Data

The kitchen areas at FreeAgent are usually very clean but in this scenario, let’s imagine that we have been very messy recently! We haven’t been putting our mugs in the dishwasher or generally keeping the kitchen clean so we’ve designed a daily rota to make sure that one person is responsible for giving the kitchen a five-minute blitz after lunchtime.

The data science team are interested in analyzing the data to find out why people became so messy in the first place, so we asked everyone to keep details of their cleaning duty in a shared spreadsheet. We asked them:

## [1] 9 6
date_cleaned name sides_and_sink dishwasher_full no_of_mugs notes
16-07-18 Charlotte sides and sink NO 16 new cleaning schedule begins today!
17-7-18 David E NA NA 2 - 6 NA
18/07/18 Hannah sides YES 15(ish) all fine
19 07 18 Dave sink, side YES 18 so many mugs
20-7-18 Charlie didn’t need doing YES 12-14 (i think!) can’t remember very well
23-07-2018 Davida NA NA 0 All of us were at conference but I still had to put 0s in form!
24-07-18 David A both NO 4-14 NA
25 7 2018 Hanson just the side NO 3 not v messy
260718 Davie sink only YES 6 NA

This looks like a great data set for cleaning!

Please note, while the article this is taken from (which includes instructions on how to clean it, but I am not going to look at them) mentions that there is a duplicated row this is not evident in the data provided on the site to download. It also makes some note about how Davida should have put an NA in instead of a 0 because everyone was out of the office–that makes no sense at all to me so I am not doing it.


Date cleaned

It looks like all of the dates follow the European format (day/month/year), so that makes things easier than if they were mixed formats, especially with lubridate, my favorite function ever.

x
2018-07-16
2018-07-17
2018-07-18
2018-07-19
2018-07-20
2018-07-23
2018-07-24
2018-07-25
2018-07-26


Knowing what day of the week each of these days corresponds to will also be useful

## [1] "date_cleaned"    "dow"             "name"            "sides_and_sink" 
## [5] "dishwasher_full" "no_of_mugs"      "notes"


Sides and Sink

Creating columns for sides and sink that check whether that column has the words “side”, “sink”, or “both”. These are logical vectors and depending on what was going to be done with the data I might transform these to binary (also might do the same with dishwasher full).

sides sink
TRUE TRUE
NA NA
TRUE FALSE
TRUE TRUE
FALSE FALSE
NA NA
TRUE TRUE
TRUE FALSE
FALSE TRUE


Number of Mugs

The easiest thing to do here is to assume that either the largest or smallest number written is accurate enough. A little harder would be to take the average of the high and low (if present). Why not have some fun?

no_of_mugs notes sides sink mugslow mugshigh
16 new cleaning schedule begins today! TRUE TRUE 16 16
4 NA NA NA 2 6
15 all fine TRUE FALSE 15 15
18 so many mugs TRUE TRUE 18 18
13 can’t remember very well FALSE FALSE 12 14
0 All of us were at conference but I still had to put 0s in form! NA NA 0 0
9 NA TRUE TRUE 4 14
3 not v messy TRUE FALSE 3 3
6 NA FALSE TRUE 6 6


Final cleaning

Reduce and reorder columns

date_cleaned dow name no_of_mugs dishwasher_full sides sink notes
2018-07-16 Monday Charlotte 16 NO TRUE TRUE new cleaning schedule begins today!
2018-07-17 Tuesday David E 4 NA NA NA NA
2018-07-18 Wednesday Hannah 15 YES TRUE FALSE all fine
2018-07-19 Thursday Dave 18 YES TRUE TRUE so many mugs
2018-07-20 Friday Charlie 13 YES FALSE FALSE can’t remember very well
2018-07-23 Monday Davida 0 NA NA NA All of us were at conference but I still had to put 0s in form!
2018-07-24 Tuesday David A 9 NO TRUE TRUE NA
2018-07-25 Wednesday Hanson 3 NO TRUE FALSE not v messy
2018-07-26 Thursday Davie 6 YES FALSE TRUE NA

Analysis

What day are there the most mugs?

## # A tibble: 5 x 2
##   dow        mugs
##   <fct>     <dbl>
## 1 Monday       16
## 2 Tuesday      13
## 3 Wednesday    18
## 4 Thursday     24
## 5 Friday       13

Thursdays are clearly the biggest days for mug usage with Wednesdays close behind.

Lets graph it

I guess on Monday folks are rested from the weekend, then they need some help getting through the rest of the week until Friday. As these data were collected in the summer, maybe the company had summer Fridays.


Are men or women more likely to wash either the sides or sink?

Gender wasn’t collected with this information but can be determined. There are resources available to determine likely gender if it were needed for a larger data set.

## Joining by: gender
##   gender count washed       pct
## 1      F     3      2 0.6666667
## 2      M     6      4 0.6666667

Are men or women more likely to fill out the form completely?

##   mugcount na_count complete gender
## 1        0        0        0      F
## 2        1        4        1      M
## 3        0        0        0      F
## 4        0        0        0      M
## 5        1        0        1      M
## 6        0        3        1      F
## 7        1        1        1      M
## 8        0        0        0      M
## 9        0        1        1      M
## # A tibble: 2 x 2
##   gender complete
##   <fct>     <int>
## 1 F             1
## 2 M             4
## Joining by: gender
##   gender count complete       pct
## 1      F     3        1 0.3333333
## 2      M     6        4 0.6666667


Conclusion

Employees clearly need more direction in how to fill out these forms. The most successful method would probably to provide some sort of incentive for correctly filling it out.

Thursdays have the most dirty mugs–should make sure to keep on top of it these days.

Men and woman are equally likely (about 2/3) to clean either the sides or the sink.

Men are more likely to completely fill the form.