Edited version of my previous RPubs post.

Data 607 - project 2

Heather Geiger; March 12, 2018

About this data

This data was uploaded by Nicholas Schettini in my Data 607 class in the CUNY Master’s in Data Science program.

I have also put the raw data on my GitHub, available here:

https://raw.githubusercontent.com/heathergeiger/Data607_project2/master/TimeUse.csv

Nicholas gave the following description for the data:

“I found this dataset on time use by gender and by country. Some of the variables include eating, sleeping, employment, travel, school, study, walking the dog, etc. It seems you could analyze how males vs. females spend their time, and how each countries males and females compare to each other. Maybe certain countries spend more time doing something more than another country; same goes for gender.”

Loading libraries and reading in data

Load libraries.

library(tidyr)
library(stringr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

Read in data.

timeuse <- read.csv("TimeUse.csv",header=TRUE,skipNul = TRUE,check.names=FALSE,stringsAsFactors=FALSE)

Look at the data and run transformations.

Take a look at the file.

There are a lot of columns, so we’ll display only the first 10 columns of data, plus the full list of column names.

dim(timeuse)
head(timeuse[,1:10])
colnames(timeuse)

## [1] 28 58

##     SEX                                 GEO/ACL00 Total Personal care Sleep
## 1 Males                                   Belgium 24:00         10:45  8:15
## 2 Males                                  Bulgaria 24:00         11:54  9:08
## 3 Males Germany (including  former GDR from 1991) 24:00         10:40  8:08
## 4 Males                                   Estonia 24:00         10:35  8:24
## 5 Males                                     Spain 24:00         11:11  8:36
## 6 Males                                    France 24:00         11:44  8:45

##   Eating Other and/or unspecified personal care
## 1   1:49                                   0:42
## 2   2:07                                   0:39
## 3   1:43                                   0:49
## 4   1:19                                   0:52
## 5   1:47                                   0:48
## 6   2:18                                   0:41

##   Employment, related activities and travel as part of/during main and second job
## 1                                                                            3:07
## 2                                                                            3:32
## 3                                                                            3:27
## 4                                                                            4:27
## 5                                                                            4:21
## 6                                                                            3:48

##   Main and second job and related travel Activities related to employment and unspecified employment
## 1                                   3:05                                                        0:02
## 2                                   3:27                                                        0:04
## 3                                   3:21                                                        0:06
## 4                                   4:20                                                        0:07
## 5                                   4:17                                                        0:03
## 6                                   3:46                                                        0:02

##  [1] "SEX"                                                                            
##  [2] "GEO/ACL00"                                                                      
##  [3] "Total"                                                                          
##  [4] "Personal care"                                                                  
##  [5] "Sleep"                                                                          
##  [6] "Eating"                                                                         
##  [7] "Other and/or unspecified personal care"                                         
##  [8] "Employment, related activities and travel as part of/during main and second job"
##  [9] "Main and second job and related travel"                                         
## [10] "Activities related to employment and unspecified employment"                    
## [11] "Study"                                                                          
## [12] "School and university except homework"                                          
## [13] "Homework"                                                                       
## [14] "Free time study"                                                                
## [15] "Household and family care"                                                      
## [16] "Food management except dish washing"                                            
## [17] "Dish washing"                                                                   
## [18] "Cleaning dwelling"                                                              
## [19] "Household upkeep except cleaning dwelling"                                      
## [20] "Laundry"                                                                        
## [21] "Ironing"                                                                        
## [22] "Handicraft and producing textiles and other care for textiles"                  
## [23] "Gardening; other pet care"                                                      
## [24] "Tending domestic animals"                                                       
## [25] "Caring for pets"                                                                
## [26] "Walking the dog"                                                                
## [27] "Construction and repairs"                                                       
## [28] "Shopping and services"                                                          
## [29] "Childcare, except teaching, reading and talking"                                
## [30] "Teaching, reading and talking with child"                                       
## [31] "Household management and help family member"                                    
## [32] "Leisure, social and associative life"                                           
## [33] "Organisational work"                                                            
## [34] "Informal help to other households"                                              
## [35] "Participatory activities"                                                       
## [36] "Visiting and feasts"                                                            
## [37] "Other social life"                                                              
## [38] "Entertainment and culture"                                                      
## [39] "Resting"                                                                        
## [40] "Walking and hiking"                                                             
## [41] "Sports and outdoor activities except walking and hiking"                        
## [42] "Computer games"                                                                 
## [43] "Computing"                                                                      
## [44] "Hobbies and games except computing and computer games"                          
## [45] "Reading books"                                                                  
## [46] "Reading, except books"                                                          
## [47] "TV and video"                                                                   
## [48] "Radio and music"                                                                
## [49] "Unspecified leisure"                                                            
## [50] "Travel except travel related to jobs"                                           
## [51] "Travel to/from work"                                                            
## [52] "Travel related to study"                                                        
## [53] "Travel related to shopping and services"                                        
## [54] "Transporting a child"                                                           
## [55] "Travel related to other household purposes"                                     
## [56] "Travel related to leisure, social and associative life"                         
## [57] "Unspecified travel"                                                             
## [58] "Unspecified time use"

In v1 of this script, I did not realize at first which columns were the umbrella categories.

I realize now that “Personal care” is a superset of “Sleep”, “Eating”, and “Other and/or unspecified personal care”.

“Employment, related activities and travel as part of/during main and second job” is a superset of “Main and second job and related travel” and “Activities related to employment and unspecified employment”.

“Study” is a superset of “School and university except homework”, “Homework”, and “Free time study”.

“Household and family care” is a superset of activities from “Food management except dish washing” to “Household management and help family member”.

“Leisure, social and associative life” is a superset of activities from “Organisational work” to “Unspecified leisure”.

“Travel except travel related to jobs” is a superset of activities from “Travel to/from work” to “Unspecified travel”.

Finally, there is an “Other” type category called “Unspecified time use”.
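Because the umbrella columns are supersets, each one should equal the sum of its sub-categories, up to rounding. Here is a minimal sketch of that check for the Belgian males, using values copied from the head() output above. The helper `to_min` is hypothetical, and the 2-minute tolerance is my own assumption.

```r
# Hypothetical helper: convert "HH:MM" strings to minutes (vectorized)
to_min <- function(x) {
  as.numeric(sub(":.*$", "", x)) * 60 + as.numeric(sub("^.*:", "", x))
}

# Umbrella columns for Belgian males, from the head() output above
umbrella <- c(personal_care = to_min("10:45"),
              employment    = to_min("3:07"))

# Sums of the matching sub-category columns for the same row
parts <- c(personal_care = sum(to_min(c("8:15", "1:49", "0:42"))),  # Sleep + Eating + Other
           employment    = sum(to_min(c("3:05", "0:02"))))          # Main job + related activities

# Umbrella minus sum of parts: personal_care -1, employment 0
diffs <- umbrella - parts
print(diffs)

# Allow a couple of minutes of rounding slack (assumed tolerance)
stopifnot(all(abs(diffs) <= 2))
```

The one-minute gap for personal care is consistent with each column being rounded to whole minutes separately.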

Let’s make a table of which umbrella category each sub-category fits under.

# Map each sub-category column to its umbrella category (column positions from the colnames() output above)
umbrella_per_sub_category <- data.frame(
    Individual.activity = colnames(timeuse)[c(5:7,9:10,12:14,16:31,33:49,51:57,58)],
    Umbrella = rep(c("Personal care",
        "Employment, related activities and travel as part of/during main and second job",
        "Study",
        "Household and family care",
        "Leisure, social and associative life",
        "Travel except travel related to jobs",
        "Unspecified time use"),
        times=c(3,2,3,length(16:31),length(33:49),length(51:57),1)),
    stringsAsFactors=FALSE)
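Since this lookup relies on hand-typed column positions and counts, a small consistency check can catch off-by-one mistakes. The sketch below restates the positions used above; `umbrella_pos` is my own reading of the colnames() listing (columns 4, 8, 11, 15, 32, 50, and 58), not something the code above defines.

```r
# Sub-category column positions, copied from the data.frame() call above
idx <- c(5:7, 9:10, 12:14, 16:31, 33:49, 51:57, 58)

# Number of sub-categories per umbrella, in the same order as the rep() call
times <- c(3, 2, 3, length(16:31), length(33:49), length(51:57), 1)

# Umbrella column positions, read off the colnames() listing (assumption)
umbrella_pos <- c(4, 8, 11, 15, 32, 50, 58)

# 1. rep() must produce exactly one umbrella label per sub-category
stopifnot(length(idx) == sum(times))

# 2. Umbrellas plus sub-categories should cover columns 4..58 with no gaps
stopifnot(setequal(c(idx, umbrella_pos), 4:58))

# 3. Each umbrella should sit immediately before its first sub-category
#    ("Unspecified time use" is its own sub-category, so it is skipped)
first_sub <- idx[cumsum(c(1, head(times, -1)))]
stopifnot(all(head(umbrella_pos, -1) + 1 == head(first_sub, -1)))
```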

For the remainder of this analysis, we’ll focus only on umbrella categories.

However, we save the data for the non-umbrella (individual) activities in a separate object, which we could transform and analyze the same way if we wanted to.

For this purpose, “Unspecified time use” is included in both the umbrella and the non-umbrella sets.

umbrella_categories <- colnames(timeuse)[colnames(timeuse) %in% umbrella_per_sub_category$Umbrella]
non_umbrella_categories <- colnames(timeuse)[colnames(timeuse) %in% umbrella_per_sub_category$Individual.activity]

timeuse_umbrella <- timeuse %>% select(c("SEX","GEO/ACL00","Total",umbrella_categories))

timeuse_non_umbrella <- timeuse %>% select(c("SEX","GEO/ACL00","Total",non_umbrella_categories))

timeuse <- timeuse_umbrella

What countries are included in this data set?

unique(timeuse[,"GEO/ACL00"])

##  [1] "Belgium"                                   "Bulgaria"                                  "Germany (including  former GDR from 1991)"
##  [4] "Estonia"                                   "Spain"                                     "France"                                   
##  [7] "Italy"                                     "Latvia"                                    "Lithuania"                                
## [10] "Poland"                                    "Slovenia"                                  "Finland"                                  
## [13] "United Kingdom"                            "Norway"

I expect the “Total” column to be 24:00 for every country and sex, but let’s check.

If so, remove this column.

Also rename “GEO/ACL00” to “Country” and “SEX” to “Sex”.

table(timeuse$Total)

## 
## 24:00 
##    28

timeuse <- timeuse[,setdiff(colnames(timeuse),"Total")]

colnames(timeuse)[1:2] <- c("Sex","Country")

Convert from wide to long format.

dim(timeuse)

## [1] 28  9

timeuse <- gather(timeuse,Activity,Time,-Sex,-Country)
dim(timeuse)

## [1] 196   4

head(timeuse)

##     Sex                                   Country      Activity  Time
## 1 Males                                   Belgium Personal care 10:45
## 2 Males                                  Bulgaria Personal care 11:54
## 3 Males Germany (including  former GDR from 1991) Personal care 10:40
## 4 Males                                   Estonia Personal care 10:35
## 5 Males                                     Spain Personal care 11:11
## 6 Males                                    France Personal care 11:44
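gather() still works, but since tidyr 1.0.0 it has been superseded by pivot_longer(). Here is a minimal sketch of the same reshape on a hypothetical two-row toy frame (values taken from the Slovenian output further down).

```r
library(tidyr)

# Toy wide data: two rows, two activity columns
wide <- data.frame(
  Sex = c("Males", "Females"),
  Country = c("Slovenia", "Slovenia"),
  `Personal care` = c("10:31", "10:32"),
  Study = c("0:15", "0:19"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Equivalent of gather(wide, Activity, Time, -Sex, -Country)
long <- pivot_longer(wide, cols = -c(Sex, Country),
                     names_to = "Activity", values_to = "Time")
print(long)
```

One difference worth knowing: pivot_longer() emits all activities for the first row before moving to the next row, while gather() stacks one column at a time. As a result, head() output looks different even though the contents are the same.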

Write a function to convert the HH:MM notation to number of minutes.

hours_and_minutes_to_minutes <- function(time){
    time_split <- strsplit(time,":")[[1]]
    hours <- as.numeric(time_split[1])
    minutes <- as.numeric(time_split[2])
    return((hours * 60) + minutes)
}

Run this function on the Time column.

timeuse <- data.frame(timeuse,
    Time.in.minutes = unlist(lapply(timeuse$Time,FUN=hours_and_minutes_to_minutes)),
    stringsAsFactors=FALSE)

head(timeuse)

##     Sex                                   Country      Activity  Time Time.in.minutes
## 1 Males                                   Belgium Personal care 10:45             645
## 2 Males                                  Bulgaria Personal care 11:54             714
## 3 Males Germany (including  former GDR from 1991) Personal care 10:40             640
## 4 Males                                   Estonia Personal care 10:35             635
## 5 Males                                     Spain Personal care 11:11             671
## 6 Males                                    France Personal care 11:44             704

tail(timeuse)

##         Sex        Country             Activity Time Time.in.minutes
## 191 Females      Lithuania Unspecified time use 0:04               4
## 192 Females         Poland Unspecified time use 0:05               5
## 193 Females       Slovenia Unspecified time use 0:02               2
## 194 Females        Finland Unspecified time use 0:12              12
## 195 Females United Kingdom Unspecified time use 0:10              10
## 196 Females         Norway Unspecified time use 0:03               3
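The lapply() call above converts one string at a time. Because sub() is vectorized, a one-pass version is also possible. The function name `hhmm_to_minutes` is hypothetical, and the sketch assumes every value contains exactly one colon, as in this file.

```r
# Vectorized "HH:MM" -> minutes: hours before the colon, minutes after it
hhmm_to_minutes <- function(time) {
  hours <- as.numeric(sub(":.*$", "", time))
  minutes <- as.numeric(sub("^.*:", "", time))
  hours * 60 + minutes
}

hhmm_to_minutes(c("10:45", "0:04", "24:00"))  # 645 4 1440
```

Under that assumption, `timeuse$Time.in.minutes <- hhmm_to_minutes(timeuse$Time)` should give the same column as the lapply() approach.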

Let’s make sure time adds up to 24 hours for all countries and genders.

24*60

## [1] 1440

aggregate(Time.in.minutes ~ Country + Sex,FUN=sum,data=timeuse)

##                                      Country     Sex Time.in.minutes
## 1                                    Belgium Females            1440
## 2                                   Bulgaria Females            1440
## 3                                    Estonia Females            1440
## 4                                    Finland Females            1439
## 5                                     France Females            1440
## 6  Germany (including  former GDR from 1991) Females            1440
## 7                                      Italy Females            1441
## 8                                     Latvia Females            1441
## 9                                  Lithuania Females            1440
## 10                                    Norway Females            1441
## 11                                    Poland Females            1440
## 12                                  Slovenia Females            1440
## 13                                     Spain Females            1439
## 14                            United Kingdom Females            1441
## 15                                   Belgium   Males            1440
## 16                                  Bulgaria   Males            1441
## 17                                   Estonia   Males            1439
## 18                                   Finland   Males            1440
## 19                                    France   Males            1440
## 20 Germany (including  former GDR from 1991)   Males            1440
## 21                                     Italy   Males            1440
## 22                                    Latvia   Males            1440
## 23                                 Lithuania   Males            1439
## 24                                    Norway   Males            1439
## 25                                    Poland   Males            1439
## 26                                  Slovenia   Males            1440
## 27                                     Spain   Males            1441
## 28                            United Kingdom   Males            1438

Yes, they do, give or take 2 minutes at most (totals range from 1438 to 1441). The small differences are probably due to rounding of the individual activity times.
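Rather than eyeballing all 28 totals, the check can be made explicit. In this sketch, `check_daily_totals` is a hypothetical helper, and the default tolerance of 2 minutes is an assumption chosen to match the range seen above. The test input uses the Slovenian minutes from the filtered output further down.

```r
# Hypothetical helper: return the Country/Sex groups whose activity minutes
# differ from 24 hours by more than `tol` minutes
check_daily_totals <- function(df, tol = 2) {
  totals <- aggregate(Time.in.minutes ~ Country + Sex, FUN = sum, data = df)
  totals$Deviation <- totals$Time.in.minutes - 24 * 60
  totals[abs(totals$Deviation) > tol, ]
}

# Slovenian minutes per activity (females first, then males)
slovenia <- data.frame(
  Country = "Slovenia",
  Sex = rep(c("Females", "Males"), each = 7),
  Time.in.minutes = c(632, 162, 19, 296, 267, 62, 2,
                      631, 233, 15, 158, 331, 70, 2),
  stringsAsFactors = FALSE
)

check_daily_totals(slovenia)   # 0 rows: both sexes sum to 1440

# Deliberately corrupt one value to see a group being flagged
bad <- slovenia
bad$Time.in.minutes[1] <- 600
check_daily_totals(bad)        # Females row, Deviation -32
```

An empty result means every group is within tolerance. On the full data, this would be `check_daily_totals(timeuse)`.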

Let’s change some of the category names to something shorter.

timeuse$Activity <- plyr::mapvalues(timeuse$Activity,
    from = c("Employment, related activities and travel as part of/during main and second job",
        "Leisure, social and associative life",
        "Travel except travel related to jobs"),
    to = c("Employment",
        "Leisure and social",
        "Travel, non-job-related"))
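plyr is used here only for mapvalues(). If you would rather not depend on it, the same renaming can be done in base R with match(). The helper `replace_values` is hypothetical. Unlike mapvalues(), it does not warn when a `from` value is absent from the data.

```r
# Hypothetical helper mimicking plyr::mapvalues() for character vectors:
# values found in `from` are replaced by the matching entry of `to`,
# everything else is left unchanged
replace_values <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]
  x
}

activities <- c("Personal care", "Leisure, social and associative life", "Study")
replace_values(activities,
               from = "Leisure, social and associative life",
               to = "Leisure and social")
# "Personal care" "Leisure and social" "Study"
```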

Take a look at the data. Let’s pick a random country and look at its rows for females and for males.

set.seed(1392)

test_country <- sample(unique(timeuse$Country),1)

timeuse %>% filter(Country == test_country & Sex == "Females")

##       Sex  Country                  Activity  Time Time.in.minutes
## 1 Females Slovenia             Personal care 10:32             632
## 2 Females Slovenia                Employment  2:42             162
## 3 Females Slovenia                     Study  0:19              19
## 4 Females Slovenia Household and family care  4:56             296
## 5 Females Slovenia        Leisure and social  4:27             267
## 6 Females Slovenia   Travel, non-job-related  1:02              62
## 7 Females Slovenia      Unspecified time use  0:02               2

timeuse %>% filter(Country == test_country & Sex == "Males")

##     Sex  Country                  Activity  Time Time.in.minutes
## 1 Males Slovenia             Personal care 10:31             631
## 2 Males Slovenia                Employment  3:53             233
## 3 Males Slovenia                     Study  0:15              15
## 4 Males Slovenia Household and family care  2:38             158
## 5 Males Slovenia        Leisure and social  5:31             331
## 6 Males Slovenia   Travel, non-job-related  1:10              70
## 7 Males Slovenia      Unspecified time use  0:02               2
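For a single country, it is easier to read the gender gaps with females and males side by side than in two separate tables. Here is a sketch using pivot_wider() (tidyr 1.0.0 or later) on the Slovenian numbers above. The column name `Female.minus.male` is my own choice.

```r
library(tidyr)

# Slovenian rows from the two filtered outputs above (minutes per day)
slovenia <- data.frame(
  Sex = rep(c("Females", "Males"), each = 7),
  Activity = rep(c("Personal care", "Employment", "Study",
                   "Household and family care", "Leisure and social",
                   "Travel, non-job-related", "Unspecified time use"), times = 2),
  Time.in.minutes = c(632, 162, 19, 296, 267, 62, 2,
                      631, 233, 15, 158, 331, 70, 2),
  stringsAsFactors = FALSE
)

# One row per activity, one column per sex, plus the female-minus-male gap
gap <- pivot_wider(slovenia, names_from = Sex, values_from = Time.in.minutes)
gap$Female.minus.male <- gap$Females - gap$Males
print(gap[order(gap$Female.minus.male), ])
```

For Slovenia, this puts the largest gaps in household and family care (138 minutes more for females) and employment (71 minutes more for males).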

Analysis

Run some minor clean-up of the country names, shortening them where needed.

Then, make a panel plot with time use by country and gender.

timeuse$Country <- plyr::mapvalues(timeuse$Country,
    from = c("Germany (including  former GDR from 1991)","United Kingdom"),
    to = c("Germany","UK"))

ggplot(timeuse,
aes(Country,Time.in.minutes,fill=Sex)) +
geom_bar(stat="identity",position = "dodge") +
facet_wrap(~Activity,scales="free_y") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

Time spent on unspecified activities is small (roughly 10-15 minutes or less). Let’s plot again without it.

ggplot(timeuse[!(timeuse$Activity %in% "Unspecified time use"),],
aes(Country,Time.in.minutes,fill=Sex)) +
geom_bar(stat="identity",position = "dodge") +
facet_wrap(~Activity,scales="free_y") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

We see dramatic gender differences in time spent on employment (much higher for men across all countries) and household and family care (much higher for women across all countries).

Are there any major differences across countries?

Let’s compare countries now, separated by gender.

Let’s also remove “Study” this time because, like unspecified time use, it takes up very little time in every country.

# Colorblind-friendly palette (15 colors for the 14 countries)
mycol <- c("#004949","#009292","#FF6DB6","#FFB677","#490092","#006DDB","#B66DFF","#6DB6FF","#B6DBFF","#920000","#924900","#DBD100","#24FF24","#FFFF6D","#000000")

# One plot per activity, with countries as colored bars and a panel per sex
for(activity in setdiff(unique(timeuse$Activity),c("Unspecified time use","Study")))
{
    print(ggplot(timeuse %>% filter(Activity == activity),
        aes(Country,Time.in.minutes,fill=Country)) +
        geom_bar(stat="identity") +
        facet_wrap(~Sex,scales="free_y") +
        theme(axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank()) +
        scale_fill_manual(values = mycol) +
        ggtitle(activity))
}

We definitely see some country-related differences in time spent on employment. Both males and females in Latvia and Lithuania (and, to a lesser extent, Estonia, especially for females) spend more time on this activity.

People in Belgium, Finland, Germany, and Norway seem to spend more time on leisure, and the differences are especially dramatic for females.

We also start to see an interaction between country and gender in these plots. For example, Italian females spend more time on household and family care than females in any other country, while Italian males spend the least time on it of all males.
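Rankings like these are read off bar heights. To back them with numbers, one could sort countries directly from the long-format data. Here is a minimal sketch with a hypothetical helper `rank_countries`. The toy input is the males' “Personal care” rows from the head() output earlier, with the German name shortened as in the clean-up step.

```r
# Hypothetical helper: rank countries by minutes spent on one activity,
# for one sex, from the long-format data (largest first)
rank_countries <- function(df, activity, sex) {
  sub_df <- df[df$Activity == activity & df$Sex == sex,
               c("Country", "Time.in.minutes")]
  sub_df <- sub_df[order(-sub_df$Time.in.minutes), ]
  sub_df$Rank <- seq_len(nrow(sub_df))
  rownames(sub_df) <- NULL
  sub_df
}

# Toy input: six countries, males, personal care
toy <- data.frame(
  Sex = "Males",
  Country = c("Belgium", "Bulgaria", "Germany", "Estonia", "Spain", "France"),
  Activity = "Personal care",
  Time.in.minutes = c(645, 714, 640, 635, 671, 704),
  stringsAsFactors = FALSE
)

rank_countries(toy, "Personal care", "Males")
# Bulgaria, France, Spain, Belgium, Germany, Estonia
```

On the full data, `rank_countries(timeuse, "Household and family care", "Females")` and the same call with `"Males"` would show whether Italy sits at the top for females and at the bottom for males.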

Time Use by Country Survey - Tidying and Analyzing the Data