#Introduction

#Employees’ attainment of a satisfactory work-life balance (WLB) has become an increasing area of focus for professionals, policy makers and academics (Dilmaghani et al., 2019). Over the past few years, there has been a societal shift towards greater gender equality in labour market attainment (ibid.). In most developed countries, the impact of gender on people’s WLB has been analysed as the traditional model of a male breadwinner and a female homemaker becomes less common. Balancing work and family demands is a struggle that most employees deal with daily (Karkoulian et al., 2016). However, while men have been encouraged to engage more in family life, childcare and homemaking, the greater share of chores is still reported to fall on women (Dilmaghani et al., 2019). Consequently, women generally bear the double burden of unpaid household labour and paid employment. Some research indicates that the conflict between work and family life is a greater problem for women than for men (Pace et al., 2021). The aim of this research is to explore whether there is a difference in satisfaction with WLB between women and men in Scotland.

#Research Question: Is there a difference in satisfaction with work-life balance between women and men in Scotland?

#Dependent Variable: Satisfaction with work-life balance. Participants were asked the question: ‘How satisfied with balance between time on paid work and time on other aspects of life?’ and answered on a numeric 11-point scale from 0 (extremely dissatisfied) to 10 (extremely satisfied), which is treated as continuous here.

#Independent Variable: Gender. Binary scale where participants could tick either ‘male’ or ‘female’.

#Cross-Sectional Data set: The Scottish Health Survey of 2019 will be used to answer the research question. #Specific File: shes19i_eul

#In addition to gender, stress at work, part-time or full-time employment, and class (Karkoulian et al., 2016; Pace et al., 2021; Dilmaghani et al., 2019) have been identified as factors that could influence someone’s satisfaction with WLB and have therefore been chosen as control variables. Class is measured here by total household income.

#Control Variable 1: Stress at work. Participants were asked the question: ‘In general, how do you find your job?’ and were provided with a 5-point ordinal Likert scale.

#Control Variable 2: Total household income. An ordinal scale of 31 income bands from 1 (<£520) to 31 (£150,000+).

#Control Variable 3: Working full-time or part-time. Binary scale where participants could tick either ‘Full-time’ or ‘Part-time’.

#Data Cleaning

#Step 1: Read in Data

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
read_tsv("ScotHS19.tab")
## Rows: 6881 Columns: 2294
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (2294): CPSerialA, chhserialA, Person, Stype12, Main, Boost, Sample, Ver...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 6,881 × 2,294
##     CPSerialA chhserialA Person Stype12  Main Boost Sample  Vera SYear   Bio
##         <dbl>      <dbl>  <dbl>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1 1200168601   12001686      1       1     1     0      1     1    12     0
##  2 1200168602   12001686      2       1     1     0      1     1    12     0
##  3 1200417301   12004173      1       1     1     0      1     1    12     0
##  4 1200417302   12004173      2       1     1     0      1     1    12     0
##  5 1200539801   12005398      1       1     1     0      1     1    12     0
##  6 1200063401   12000634      1       1     1     0      1     1    12     0
##  7 1200171301   12001713      1       1     1     0      1     1    12     0
##  8 1200171302   12001713      2       1     1     0      1     1    12     0
##  9 1200050501   12000505      1       1     1     0      1     1    12     0
## 10 1200050502   12000505      2       1     1     0      1     1    12     0
## # ℹ 6,871 more rows
## # ℹ 2,284 more variables: LegPar <dbl>, Par1 <dbl>, Par2 <dbl>, SelCh <dbl>,
## #   LiveWith <dbl>, Hholder <dbl>, HRPID <dbl>, HHldr1 <dbl>, HHldr2 <dbl>,
## #   HHldr3 <dbl>, HHldr4 <dbl>, HHldr5 <dbl>, HHldr6 <dbl>, HHldr7 <dbl>,
## #   HHldr8 <dbl>, HHldr9 <dbl>, HHldr10 <dbl>, HHldr97 <dbl>, HHResp <dbl>,
## #   HQResp <dbl>, HiHNum <dbl>, JntEldA <dbl>, JntEldB <dbl>, DVHRPNum <dbl>,
## #   OwnRnt08 <dbl>, PasSm <dbl>, SmokHm <dbl>, EatTog <dbl>, LiveArea <dbl>, …
ScotHS19 <- read_tsv("ScotHS19.tab")
## Rows: 6881 Columns: 2294
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (2294): CPSerialA, chhserialA, Person, Stype12, Main, Boost, Sample, Ver...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(ScotHS19)

#Step 2: Select Variables

SelectedVariables <- ScotHS19 %>% select(WorkBal, totinc, Sex, StrWork, HFtPtime)
view(SelectedVariables)

#Step 2.1: Clarify Variables

table_labels <- list(
  WorkBal = "Satisfaction with Work-Life Balance",
  Sex = "Sex",
  totinc = "Total Income",
  StrWork = "Level of Stress at Work", 
  HFtPtime = "Full-time or Part-time Employment")

#Step 3: Filter

SelectedVariables <- filter(SelectedVariables,
                            WorkBal >= 0, WorkBal <= 10,
                            totinc >= 1, totinc <= 31,
                            StrWork >= 1, StrWork <= 5,
                            Sex >= 1, Sex <= 2,
                            HFtPtime >= 1, HFtPtime <= 2)
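
#The range filter above can also be written with dplyr's between(), which keeps values inside a closed interval. A minimal sketch on made-up rows: the toy tibble and its values are illustrative only, and it assumes (as the filter above implies) that codes outside the valid range, such as negative values, mark missing or not-applicable answers.

```r
library(dplyr)

# Toy data, NOT the survey: negative codes stand in for missing answers (assumption)
toy <- tibble(
  WorkBal = c(7, -1, 10, 3),
  Sex     = c(2, 1, -9, 1)
)

# between(x, left, right) is TRUE when left <= x <= right
toy_clean <- toy %>%
  filter(between(WorkBal, 0, 10),
         between(Sex, 1, 2))

toy_clean
```

#Rows with any out-of-range code are dropped, which is the same behaviour as the paired >= and <= conditions.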

#Step 4: Coerce and Label

SelectedVariables %>%
  mutate(WorkBal = factor(WorkBal, levels = 0:10, labels = c("0 - Extremely dissatisfied", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10 - Extremely satisfied" )))
## # A tibble: 1,431 × 5
##    WorkBal                  totinc   Sex StrWork HFtPtime
##    <fct>                     <dbl> <dbl>   <dbl>    <dbl>
##  1 7                            18     2       2        2
##  2 10 - Extremely satisfied     15     1       2        2
##  3 3                            22     1       2        1
##  4 7                            13     2       3        2
##  5 4                            16     1       4        1
##  6 7                            16     2       2        1
##  7 10 - Extremely satisfied     22     2       1        1
##  8 2                            23     1       4        1
##  9 7                            23     2       1        1
## 10 10 - Extremely satisfied     19     1       3        1
## # ℹ 1,421 more rows
SelectedVariables %>%
  mutate(Sex = factor(Sex, levels = 1:2, labels = c("1 - male", "2 - female")))
## # A tibble: 1,431 × 5
##    WorkBal totinc Sex        StrWork HFtPtime
##      <dbl>  <dbl> <fct>        <dbl>    <dbl>
##  1       7     18 2 - female       2        2
##  2      10     15 1 - male         2        2
##  3       3     22 1 - male         2        1
##  4       7     13 2 - female       3        2
##  5       4     16 1 - male         4        1
##  6       7     16 2 - female       2        1
##  7      10     22 2 - female       1        1
##  8       2     23 1 - male         4        1
##  9       7     23 2 - female       1        1
## 10      10     19 1 - male         3        1
## # ℹ 1,421 more rows
SelectedVariables %>%
  mutate(totinc = factor(totinc, levels = 1:31, labels = c("1 - <£520", "2 - £520<£1,600", "3 - £1,600<£2,600", "4 - £2,600<£3,600", "5 - £3,600<£5,200", "6 - £5,200<£7,800",
                                                           "7 - £7,800<£10,400", "8 - £10,400<£13,000", "9 - £13,000<£15,600", "10 - £15,600<£18,200", "11 - £18,200<£20,800", 
                                                           "12 - £20,800<£23,400", "13 - £23,400<£26,000", "14 - £26,000<£28,600", "15 - £28,600<£31,200", "16 - £31,200<£33,800", "17 - £33,800<£36,400", "18 - £36,400<£41,600", 
                                                           "19 - £41,600<£46,800", "20 - £46,800<£52,000", "21 - £52,000<£60,000", "22 - £60,000<£70,000", "23 - £70,000<£78,000", "24 - £78,000<£90,000", "25 - £90,000<£100,000",
                                                           "26 - £100,000<£110,000", "27 - £110,000<£120,000", "28 - £120,000<£130,000", "29 - £130,000<£140,000", "30 - £140,000<£150,000", "31 - £150,000+")))
## # A tibble: 1,431 × 5
##    WorkBal totinc                 Sex StrWork HFtPtime
##      <dbl> <fct>                <dbl>   <dbl>    <dbl>
##  1       7 18 - £36,400<£41,600     2       2        2
##  2      10 15 - £28,600<£31,200     1       2        2
##  3       3 22 - £60,000<£70,000     1       2        1
##  4       7 13 - £23,400<£26,000     2       3        2
##  5       4 16 - £31,200<£33,800     1       4        1
##  6       7 16 - £31,200<£33,800     2       2        1
##  7      10 22 - £60,000<£70,000     2       1        1
##  8       2 23 - £70,000<£78,000     1       4        1
##  9       7 23 - £70,000<£78,000     2       1        1
## 10      10 19 - £41,600<£46,800     1       3        1
## # ℹ 1,421 more rows
SelectedVariables %>%
  mutate(StrWork = factor(StrWork, levels = 1:5, labels = c("1 - Not at all stressful", "2 - Mildly stressful", "3 - Moderately stressful", "4 - Very stressful", "5 - Extremely stressful")))
## # A tibble: 1,431 × 5
##    WorkBal totinc   Sex StrWork                  HFtPtime
##      <dbl>  <dbl> <dbl> <fct>                       <dbl>
##  1       7     18     2 2 - Mildly stressful            2
##  2      10     15     1 2 - Mildly stressful            2
##  3       3     22     1 2 - Mildly stressful            1
##  4       7     13     2 3 - Moderately stressful        2
##  5       4     16     1 4 - Very stressful              1
##  6       7     16     2 2 - Mildly stressful            1
##  7      10     22     2 1 - Not at all stressful        1
##  8       2     23     1 4 - Very stressful              1
##  9       7     23     2 1 - Not at all stressful        1
## 10      10     19     1 3 - Moderately stressful        1
## # ℹ 1,421 more rows
SelectedVariables %>%
  mutate(HFtPtime = factor(HFtPtime, levels = 1:2, labels = c("1 - Full-time", "2 - Part-time")))
## # A tibble: 1,431 × 5
##    WorkBal totinc   Sex StrWork HFtPtime     
##      <dbl>  <dbl> <dbl>   <dbl> <fct>        
##  1       7     18     2       2 2 - Part-time
##  2      10     15     1       2 2 - Part-time
##  3       3     22     1       2 1 - Full-time
##  4       7     13     2       3 2 - Part-time
##  5       4     16     1       4 1 - Full-time
##  6       7     16     2       2 1 - Full-time
##  7      10     22     2       1 1 - Full-time
##  8       2     23     1       4 1 - Full-time
##  9       7     23     2       1 1 - Full-time
## 10      10     19     1       3 1 - Full-time
## # ℹ 1,421 more rows
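
#Each mutate() call above prints a converted copy but does not store it, so SelectedVariables itself still holds the numeric codes. A minimal sketch of storing the conversion, shown on made-up rows rather than the survey data (toy is a hypothetical stand-in; WorkBal is left numeric here so that mean() and median() still work on it):

```r
library(dplyr)

# Toy data, NOT the survey
toy <- tibble(
  WorkBal  = c(7, 10, 3, 4),
  Sex      = c(2, 1, 1, 2),
  HFtPtime = c(2, 2, 1, 1)
)

# Assign the result back with <- so the factor columns are kept
toy <- toy %>%
  mutate(
    Sex      = factor(Sex, levels = 1:2, labels = c("Male", "Female")),
    HFtPtime = factor(HFtPtime, levels = 1:2, labels = c("Full-time", "Part-time"))
  )

is.factor(toy$Sex)   # checks that the stored column is now a factor
levels(toy$Sex)
```

#Several columns can be converted in one mutate() call, so a single assignment is enough.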

#Step 5: Descriptive Statistics

#DV

library(arsenal)
## 
## Attaching package: 'arsenal'
## The following object is masked from 'package:lubridate':
## 
##     is.Date
summarise(SelectedVariables,
          mean_satisfaction= mean(WorkBal),
          median_satisfaction = median(WorkBal))
## # A tibble: 1 × 2
##   mean_satisfaction median_satisfaction
##               <dbl>               <dbl>
## 1              6.68                   7
summarise(SelectedVariables,
          quartiles = quantile(WorkBal, probs = c(0.25, 0.5, 0.75)), 
          q = c("First Quartile", "Second Quartile", "Third Quartile"))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## # A tibble: 3 × 2
##   quartiles q              
##       <dbl> <chr>          
## 1         5 First Quartile 
## 2         7 Second Quartile
## 3         8 Third Quartile
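
#The deprecation warning above appears because summarise() is expected to return one row per group. A sketch of the same quartile table with reframe(), which the warning itself recommends (the toy values are made up; reframe() needs dplyr 1.1.0 or later):

```r
library(dplyr)

# Toy data, NOT the survey
toy <- tibble(WorkBal = c(0, 3, 5, 5, 7, 7, 8, 8, 10))

# reframe() may return any number of rows, so three quartiles are fine
quartile_table <- toy %>%
  reframe(
    q         = c("First Quartile", "Second Quartile", "Third Quartile"),
    quartiles = quantile(WorkBal, probs = c(0.25, 0.5, 0.75))
  )

quartile_table
```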

#IV

SelectedVariables %>% count(Sex) %>% mutate(percent = n / sum(n)*100)
## # A tibble: 2 × 3
##     Sex     n percent
##   <dbl> <int>   <dbl>
## 1     1   667    46.6
## 2     2   764    53.4

#More female than male participants. IV described with the mode (a frequency table)

#Summary of all Variables

library(arsenal)
summary(SelectedVariables, text=TRUE)
##     WorkBal           totinc           Sex           StrWork     
##  Min.   : 0.000   Min.   : 1.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 5.000   1st Qu.:13.00   1st Qu.:1.000   1st Qu.:2.000  
##  Median : 7.000   Median :19.00   Median :2.000   Median :2.000  
##  Mean   : 6.681   Mean   :18.02   Mean   :1.534   Mean   :2.454  
##  3rd Qu.: 8.000   3rd Qu.:22.00   3rd Qu.:2.000   3rd Qu.:3.000  
##  Max.   :10.000   Max.   :31.00   Max.   :2.000   Max.   :5.000  
##     HFtPtime    
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :1.000  
##  Mean   :1.186  
##  3rd Qu.:1.000  
##  Max.   :2.000
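
#The summary() call above is base R's data-frame summary, so the text argument and the table_labels list from Step 2.1 are not used. If an arsenal table was intended, one possible sketch uses tableby() with labelTranslations. The toy rows are made up, and the choice of Sex as the grouping variable is an assumption about the intended table.

```r
library(arsenal)

# Toy data, NOT the survey
toy <- data.frame(
  WorkBal = c(7, 10, 3, 4, 7, 2),
  StrWork = c(2, 2, 2, 3, 1, 4),
  Sex     = factor(c(2, 1, 1, 2, 2, 1), levels = 1:2, labels = c("Male", "Female"))
)

# Hypothetical label list in the same shape as table_labels
toy_labels <- list(
  WorkBal = "Satisfaction with Work-Life Balance",
  StrWork = "Level of Stress at Work"
)

# test = FALSE keeps the table purely descriptive (no p-values)
tab <- tableby(Sex ~ WorkBal + StrWork, data = toy, test = FALSE)
summary(tab, labelTranslations = toy_labels, text = TRUE)
```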

The mean for WLB is 6.68 and the median is 7, which suggests that respondents are generally fairly satisfied with their WLB. The middle half of participants answered between 5 and 8 (the first and third quartiles).

SelectedVariables %>% group_by(Sex) %>% summarise(mean_satisfaction = mean(WorkBal), 
                                                  median_satisfaction = median(WorkBal))
## # A tibble: 2 × 3
##     Sex mean_satisfaction median_satisfaction
##   <dbl>             <dbl>               <dbl>
## 1     1              6.66                   7
## 2     2              6.70                   7

However, when the mean and median are compared by gender, women (2) rated their satisfaction marginally higher than men (1) on average (6.70 compared with 6.66), while the median is the same (7). This does not match the previous research identified in the literature review, which reported that women are less satisfied than men with their WLB, although a difference this small should be interpreted with caution.

#Step 6: Visualisation

library(ggplot2)

#DV

SelectedVariables %>%
  ggplot(aes(WorkBal)) +
  geom_histogram() +
  labs(title = "Distribution of Satisfaction with Work-Life Balance in Scotland",
       subtitle = "Scale 0 - 10",
       x = "Level of Satisfaction",
       y = "Count",
       caption = "Source: Scottish Health Survey 2019")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#The majority of participants have answered 5 or above on the 0-10 scale of WLB.
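
#Because the scale only takes whole numbers from 0 to 10, the default of 30 bins (see the stat_bin() message above) leaves empty gaps between bars. A sketch with one bar per answer, on made-up values:

```r
library(ggplot2)

# Toy data, NOT the survey
toy <- data.frame(WorkBal = c(0, 3, 5, 5, 7, 7, 7, 8, 8, 10))

p_hist <- ggplot(toy, aes(WorkBal)) +
  geom_histogram(binwidth = 1) +        # one bin per scale point
  scale_x_continuous(breaks = 0:10) +   # label every answer option
  labs(x = "Level of Satisfaction", y = "Count")

p_hist
```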

SelectedVariables %>%
  ggplot(aes(y = WorkBal)) +
  geom_boxplot() +
  labs(title = "Distribution of Satisfaction with Work-Life Balance in Scotland",
       x = "",
       y = "Level of Satisfaction",
       caption = "Source: Scottish Health Survey 2019")

#Here the quartiles are visualised for all participants. 25% answered 5 or below. 25% answered 8 or above.

#IV

SelectedVariables %>%
  ggplot(aes(Sex)) +
  geom_bar() +
  labs(title = "Gender", 
       x = "", 
       y = "Count") +
  theme_minimal()

#More female than male participants. Logarithmic plot is relevant for DV-IV. I labelled 1 and 2 as male and female but they still show up as 1 and 2.

#DV-IV Relationship

library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
SelectedVariables %>% 
  ggplot(aes(x = Sex, y = WorkBal)) +
  geom_boxplot() +
  theme_minimal() +
  scale_y_log10() +
  labs(title = "Distribution of Satisfaction by Sex", 
       subtitle = "Logarithmic scale",
       x = "Sex",
       y = "Level of Satisfaction", 
       caption = "Source: Scottish Health Survey 2019")
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?
## Warning: Removed 19 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

I do not understand why only one boxplot comes up - appreciate any tips
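
#One likely cause, going by the warnings above: Sex is stored as a number, so ggplot treats it as a continuous x axis and draws a single box ("Continuous x aesthetic"). Wrapping it in factor() gives one box per group. The log scale is also dropped in this sketch, because log10(0) is not finite, which is why the 19 answers of 0 were removed. The toy rows are made up:

```r
library(ggplot2)

# Toy data, NOT the survey
toy <- data.frame(
  WorkBal = c(7, 10, 3, 0, 4, 7, 10, 2, 7, 8),
  Sex     = c(2, 1, 1, 2, 1, 2, 2, 1, 2, 1)
)

# factor() turns the numeric codes into two discrete groups on the x axis
p_box <- ggplot(toy, aes(x = factor(Sex, levels = 1:2, labels = c("Male", "Female")),
                         y = WorkBal)) +
  geom_boxplot() +
  theme_minimal() +
  labs(x = "Sex", y = "Level of Satisfaction")

p_box
```

#Converting Sex once in the data and assigning the result back would have the same effect for every later plot.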

SelectedVariables %>%
  ggplot(aes(WorkBal)) + 
  geom_density(aes(fill = Sex), alpha = 0.2) +
  labs( title = "Distribution of satisfaction with work-life balance by sex",
        subtitle = "Logarithmic scale",
        x = "Level of Satisfaction",
        y = "Density",
        caption = "Source: Scottish Health Survey 2019") +
  theme_minimal() +
  scale_x_log10()
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 19 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning: The following aesthetics were dropped during statistical transformation: fill.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

#Similarly to the earlier boxplot, it seems that only the variable WorkBal shows up, not both WorkBal and Sex.
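
#The "aesthetics were dropped" warning above points the same way: fill = Sex is numeric, so the density is not split by group. A sketch with fill mapped to a factor and without the log scale, on made-up rows:

```r
library(ggplot2)

# Toy data, NOT the survey
toy <- data.frame(
  WorkBal = c(7, 10, 3, 0, 4, 7, 10, 2, 7, 8),
  Sex     = factor(c(2, 1, 1, 2, 1, 2, 2, 1, 2, 1),
                   levels = 1:2, labels = c("Male", "Female"))
)

# A factor in fill makes ggplot compute one density curve per group
p_dens <- ggplot(toy, aes(WorkBal, fill = Sex)) +
  geom_density(alpha = 0.2) +
  theme_minimal() +
  labs(x = "Level of Satisfaction", y = "Density", fill = "Sex")

p_dens
```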

SelectedVariables %>%
  ggplot(aes(WorkBal)) + 
  geom_histogram(aes(fill=Sex)) +
  labs( title = "Distribution of satisfaction with work-life balance by sex",
        subtitle = "Logarithmic scale",
        x = "Level of Satisfaction",
        y = "Count",
        caption = "Source: Scottish Health Survey 2019",
        fill = "Sex") + 
  theme_minimal() +
  scale_x_log10()
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 19 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: The following aesthetics were dropped during statistical transformation: fill.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

#The same happens with the histogram comparing the IV and DV.
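
#For the histogram, one option is to convert Sex to a factor and draw one panel per group with facet_wrap(), which avoids stacked bars that are hard to compare. A sketch on made-up rows (the faceting is a suggestion, not part of the original analysis):

```r
library(ggplot2)

# Toy data, NOT the survey
toy <- data.frame(
  WorkBal = c(7, 10, 3, 0, 4, 7, 10, 2, 7, 8),
  Sex     = factor(c(2, 1, 1, 2, 1, 2, 2, 1, 2, 1),
                   levels = 1:2, labels = c("Male", "Female"))
)

p_facet <- ggplot(toy, aes(WorkBal, fill = Sex)) +
  geom_histogram(binwidth = 1, show.legend = FALSE) +  # one bin per scale point
  facet_wrap(~ Sex) +                                   # one panel per group
  scale_x_continuous(breaks = 0:10) +
  theme_minimal() +
  labs(x = "Level of Satisfaction", y = "Count")

p_facet
```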