#Introduction
#Employees’ attainment of a satisfactory work-life balance (WLB) has become an increasing focus for professionals, policy makers and academics (Dilmaghani et al., 2019). Over the past few years, there has been a societal shift towards greater gender equality in labour market attainment (ibid.). In most developed countries, the impact of gender on people’s WLB has been analysed because the traditional model of a male breadwinner and a female homemaker is becoming less common. Balancing work and family demands is a struggle that most employees deal with daily (Karkoulian et al., 2016). However, while men have been encouraged to engage more in family life, childcare and homemaking, the greater share of chores is still reported to fall on women (Dilmaghani et al., 2019). Consequently, women generally bear the double burden of unpaid domestic labour and paid employment. Some research indicates that the conflict between work and family life is a greater problem for women than for men (Pace et al., 2021). The aim of this research is to explore whether satisfaction with WLB differs between women and men in Scotland.
#Research Question: Is there a difference in satisfaction with work-life balance between women and men in Scotland?
#Dependent Variable: Satisfaction with work-life balance. Participants were asked the question: ‘How satisfied with balance between time on paid work and time on other aspects of life?’ and answered on an 11-point numeric scale from 0 (extremely dissatisfied) to 10 (extremely satisfied), which is treated as continuous here.
#Independent Variable: Gender. Binary scale where participants could tick either ‘male’ or ‘female’.
#Cross-Sectional Data Set: The Scottish Health Survey 2019 will be used to answer the research question.
#Specific File: shes19i_eul
#In addition to gender, stress at work, part-time or full-time employment, and class have been identified as factors that could influence satisfaction with WLB (Karkoulian et al., 2016; Pace et al., 2021; Dilmaghani et al., 2019) and have therefore been chosen as control variables. Class is represented here by total household income.
#Control Variable 1: Stress at work. Participants were asked the question: ‘In general, how do you find your job?’ and were provided with a 5-point ordinal Likert scale.
#Control Variable 2: Total household income. A categorical scale from 1 (<£520) to 31 (£150,000+).
#Control Variable 3: Working full-time or part-time. Binary scale where participants could tick either ‘Full-time’ or ‘Part-time’.
#Data Cleaning
#Step 1: Read in Data
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
read_tsv("ScotHS19.tab")
## Rows: 6881 Columns: 2294
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (2294): CPSerialA, chhserialA, Person, Stype12, Main, Boost, Sample, Ver...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 6,881 × 2,294
## CPSerialA chhserialA Person Stype12 Main Boost Sample Vera SYear Bio
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1200168601 12001686 1 1 1 0 1 1 12 0
## 2 1200168602 12001686 2 1 1 0 1 1 12 0
## 3 1200417301 12004173 1 1 1 0 1 1 12 0
## 4 1200417302 12004173 2 1 1 0 1 1 12 0
## 5 1200539801 12005398 1 1 1 0 1 1 12 0
## 6 1200063401 12000634 1 1 1 0 1 1 12 0
## 7 1200171301 12001713 1 1 1 0 1 1 12 0
## 8 1200171302 12001713 2 1 1 0 1 1 12 0
## 9 1200050501 12000505 1 1 1 0 1 1 12 0
## 10 1200050502 12000505 2 1 1 0 1 1 12 0
## # ℹ 6,871 more rows
## # ℹ 2,284 more variables: LegPar <dbl>, Par1 <dbl>, Par2 <dbl>, SelCh <dbl>,
## # LiveWith <dbl>, Hholder <dbl>, HRPID <dbl>, HHldr1 <dbl>, HHldr2 <dbl>,
## # HHldr3 <dbl>, HHldr4 <dbl>, HHldr5 <dbl>, HHldr6 <dbl>, HHldr7 <dbl>,
## # HHldr8 <dbl>, HHldr9 <dbl>, HHldr10 <dbl>, HHldr97 <dbl>, HHResp <dbl>,
## # HQResp <dbl>, HiHNum <dbl>, JntEldA <dbl>, JntEldB <dbl>, DVHRPNum <dbl>,
## # OwnRnt08 <dbl>, PasSm <dbl>, SmokHm <dbl>, EatTog <dbl>, LiveArea <dbl>, …
ScotHS19 <- read_tsv("ScotHS19.tab")
## Rows: 6881 Columns: 2294
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (2294): CPSerialA, chhserialA, Person, Stype12, Main, Boost, Sample, Ver...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(ScotHS19)
#Step 2: Select Variables
SelectedVariables <- ScotHS19 %>% select(WorkBal, totinc, Sex, StrWork, HFtPtime)
view(SelectedVariables)
#Step 2.1: Clarify Variables
table_labels <- list(
WorkBal = "Satisfaction with Work-Life Balance",
Sex = "Sex",
totinc = "Total Income",
StrWork = "Level of Stress at Work",
HFtPtime = "Full-time or Part-time Employment")
#Step 3: Filter
SelectedVariables <- filter(SelectedVariables,
  WorkBal >= 0, WorkBal <= 10,
  totinc >= 1, totinc <= 31,
  StrWork >= 1, StrWork <= 5,
  Sex >= 1, Sex <= 2,
  HFtPtime >= 1, HFtPtime <= 2)
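#A minimal sketch of what the Step 3 filter does, shown on a small made-up data frame rather than the survey file. It assumes that missing or not-applicable answers are stored as negative codes (the values -1 and -9 below are hypothetical), so any row with a value outside the valid range on any of the five variables is dropped.

```r
library(dplyr)

# Hypothetical toy data: -1 and -9 stand in for assumed missing-value codes
toy <- data.frame(
  WorkBal  = c(7, -1, 10, 3),
  totinc   = c(18, 15, -9, 22),
  Sex      = c(2, 1, 2, 1),
  StrWork  = c(2, 2, 1, 4),
  HFtPtime = c(2, 1, 1, 1)
)

# between(x, a, b) is shorthand for x >= a & x <= b
toy_clean <- toy %>%
  filter(
    between(WorkBal, 0, 10),
    between(totinc, 1, 31),
    between(StrWork, 1, 5),
    between(Sex, 1, 2),
    between(HFtPtime, 1, 2)
  )

# Number of rows removed by the filter
rows_removed <- nrow(toy) - nrow(toy_clean)
rows_removed
```

#Comparing the row count before and after the filter in this way shows how many participants are lost to invalid or missing answers.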
#Step 4: Coerce and Label
SelectedVariables %>%
mutate(WorkBal = factor(WorkBal, levels = 0:10, labels = c("0 - Extremely dissatisfied", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10 - Extremely satisfied" )))
## # A tibble: 1,431 × 5
## WorkBal totinc Sex StrWork HFtPtime
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 7 18 2 2 2
## 2 10 - Extremely satisfied 15 1 2 2
## 3 3 22 1 2 1
## 4 7 13 2 3 2
## 5 4 16 1 4 1
## 6 7 16 2 2 1
## 7 10 - Extremely satisfied 22 2 1 1
## 8 2 23 1 4 1
## 9 7 23 2 1 1
## 10 10 - Extremely satisfied 19 1 3 1
## # ℹ 1,421 more rows
SelectedVariables %>%
mutate(Sex = factor(Sex, levels = 1:2, labels = c("1 - male", "2 - female")))
## # A tibble: 1,431 × 5
## WorkBal totinc Sex StrWork HFtPtime
## <dbl> <dbl> <fct> <dbl> <dbl>
## 1 7 18 2 - female 2 2
## 2 10 15 1 - male 2 2
## 3 3 22 1 - male 2 1
## 4 7 13 2 - female 3 2
## 5 4 16 1 - male 4 1
## 6 7 16 2 - female 2 1
## 7 10 22 2 - female 1 1
## 8 2 23 1 - male 4 1
## 9 7 23 2 - female 1 1
## 10 10 19 1 - male 3 1
## # ℹ 1,421 more rows
SelectedVariables %>%
mutate(totinc = factor(totinc, levels = 1:31, labels = c("1 - <£520", "2 - £520<£1,600", "3 - £1,600<£2,600", "4 - £2,600<£3,600", "5 - £3,600<£5,200", "6 - £5,200<£7,800",
"7 - £7,800<£10,400", "8 - £10,400<£13,000", "9 - £13,000<£15,600", "10 - £15,600<£18,200", "11 - £18,200<£20,800",
"12 - £20,800<£23,400", "13 - £23,400<£26,000", "14 - £26,000<£28,600", "15 - £28,600<£31,200", "16 - £31,200<£33,800", "17 - £33,800<£36,400", "18 - £36,400<£41,600",
"19 - £41,600<£46,800", "20 - £46,800<£52,000", "21 - £52,000<£60,000", "22 - £60,000<£70,000", "23 - £70,000<£78,000", "24 - £78,000<£90,000", "25 - £90,000<£100,000",
"26 - £100,000<£110,000", "27 - £110,000<£120,000", "28 - £120,000<£130,000", "29 - £130,000<£140,000", "30 - £140,000<£150,000", "31 - £150,000+")))
## # A tibble: 1,431 × 5
## WorkBal totinc Sex StrWork HFtPtime
## <dbl> <fct> <dbl> <dbl> <dbl>
## 1 7 18 - £36,400<£41,600 2 2 2
## 2 10 15 - £28,600<£31,200 1 2 2
## 3 3 22 - £60,000<£70,000 1 2 1
## 4 7 13 - £23,400<£26,000 2 3 2
## 5 4 16 - £31,200<£33,800 1 4 1
## 6 7 16 - £31,200<£33,800 2 2 1
## 7 10 22 - £60,000<£70,000 2 1 1
## 8 2 23 - £70,000<£78,000 1 4 1
## 9 7 23 - £70,000<£78,000 2 1 1
## 10 10 19 - £41,600<£46,800 1 3 1
## # ℹ 1,421 more rows
SelectedVariables %>%
mutate(StrWork = factor(StrWork, levels = 1:5, labels = c("1 - Not at all stressful", "2 - Mildly stressful", "3 - Moderately stressful", "4 - Very stressful", "5 - Extremely stressful")))
## # A tibble: 1,431 × 5
## WorkBal totinc Sex StrWork HFtPtime
## <dbl> <dbl> <dbl> <fct> <dbl>
## 1 7 18 2 2 - Mildly stressful 2
## 2 10 15 1 2 - Mildly stressful 2
## 3 3 22 1 2 - Mildly stressful 1
## 4 7 13 2 3 - Moderately stressful 2
## 5 4 16 1 4 - Very stressful 1
## 6 7 16 2 2 - Mildly stressful 1
## 7 10 22 2 1 - Not at all stressful 1
## 8 2 23 1 4 - Very stressful 1
## 9 7 23 2 1 - Not at all stressful 1
## 10 10 19 1 3 - Moderately stressful 1
## # ℹ 1,421 more rows
SelectedVariables %>%
mutate(HFtPtime = factor(HFtPtime, levels = 1:2, labels = c("1 - Full-time", "2 - Part-time")))
## # A tibble: 1,431 × 5
## WorkBal totinc Sex StrWork HFtPtime
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 7 18 2 2 2 - Part-time
## 2 10 15 1 2 2 - Part-time
## 3 3 22 1 2 1 - Full-time
## 4 7 13 2 3 2 - Part-time
## 5 4 16 1 4 1 - Full-time
## 6 7 16 2 2 1 - Full-time
## 7 10 22 2 1 1 - Full-time
## 8 2 23 1 4 1 - Full-time
## 9 7 23 2 1 1 - Full-time
## 10 10 19 1 3 1 - Full-time
## # ℹ 1,421 more rows
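#The calls in Step 4 print a labelled copy of the data but do not store it, so SelectedVariables itself keeps its numeric codes. A minimal sketch of one way to keep the labels, shown on made-up data: the object name LabelledVariables is hypothetical, and WorkBal is deliberately left numeric so that mean() and median() still work on it.

```r
library(dplyr)

# Hypothetical toy data using the same codes as the survey variables
toy <- data.frame(
  WorkBal  = c(7, 10, 3, 7),
  Sex      = c(2, 1, 1, 2),
  StrWork  = c(2, 2, 2, 3),
  HFtPtime = c(2, 2, 1, 2)
)

# Assign the result to a new object so the labels are kept
LabelledVariables <- toy %>%
  mutate(
    Sex = factor(Sex, levels = 1:2, labels = c("1 - male", "2 - female")),
    StrWork = factor(StrWork, levels = 1:5,
                     labels = c("1 - Not at all stressful", "2 - Mildly stressful",
                                "3 - Moderately stressful", "4 - Very stressful",
                                "5 - Extremely stressful")),
    HFtPtime = factor(HFtPtime, levels = 1:2,
                      labels = c("1 - Full-time", "2 - Part-time"))
  )

# The frequency table now shows labels instead of 1 and 2
count(LabelledVariables, Sex)
```

#Storing the labelled data under a separate name leaves the numeric version available for the descriptive statistics.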
#Step 5: Descriptive Statistics
#DV
library(arsenal)
##
## Attaching package: 'arsenal'
## The following object is masked from 'package:lubridate':
##
## is.Date
summarise(SelectedVariables,
mean_satisfaction= mean(WorkBal),
median_satisfaction = median(WorkBal))
## # A tibble: 1 × 2
## mean_satisfaction median_satisfaction
## <dbl> <dbl>
## 1 6.68 7
#reframe() is used instead of summarise() because the result has more than one row
reframe(SelectedVariables,
  quartiles = quantile(WorkBal, probs = c(0.25, 0.5, 0.75)),
  q = c("First Quartile", "Second Quartile", "Third Quartile"))
## # A tibble: 3 × 2
## quartiles q
## <dbl> <chr>
## 1 5 First Quartile
## 2 7 Second Quartile
## 3 8 Third Quartile
#IV
SelectedVariables %>% count(Sex) %>% mutate(percent = n / sum(n)*100)
## # A tibble: 2 × 3
## Sex n percent
## <dbl> <int> <dbl>
## 1 1 667 46.6
## 2 2 764 53.4
#More female than male participants. IV described with the mode (a frequency table)
#Summary of all Variables
library(arsenal)
summary(SelectedVariables, text=TRUE)
## WorkBal totinc Sex StrWork
## Min. : 0.000 Min. : 1.00 Min. :1.000 Min. :1.000
## 1st Qu.: 5.000 1st Qu.:13.00 1st Qu.:1.000 1st Qu.:2.000
## Median : 7.000 Median :19.00 Median :2.000 Median :2.000
## Mean : 6.681 Mean :18.02 Mean :1.534 Mean :2.454
## 3rd Qu.: 8.000 3rd Qu.:22.00 3rd Qu.:2.000 3rd Qu.:3.000
## Max. :10.000 Max. :31.00 Max. :2.000 Max. :5.000
## HFtPtime
## Min. :1.000
## 1st Qu.:1.000
## Median :1.000
## Mean :1.186
## 3rd Qu.:1.000
## Max. :2.000
SelectedVariables %>% group_by(Sex) %>% summarise(mean_satisfaction = mean(WorkBal),
median_satisfaction = median(WorkBal))
## # A tibble: 2 × 3
## Sex mean_satisfaction median_satisfaction
## <dbl> <dbl> <dbl>
## 1 1 6.66 7
## 2 2 6.70 7
#Step 6: Visualisation
library(ggplot2)
#DV
SelectedVariables %>%
ggplot(aes(WorkBal)) +
geom_histogram() +
labs(title = "Distribution of Satisfaction with Work-Life Balance in Scotland",
subtitle = "Scale 0 - 10",
x = "Level of Satisfaction",
y = "Count",
caption = "Source: Scottish Health Survey 2019")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#The majority of participants answered 5 or above on the 0-10 WLB scale.
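#The histogram above uses the default of 30 bins although the scale has only 11 possible answers. A minimal sketch on made-up data of a histogram with one bin per scale point, using binwidth = 1; the toy values and the object name p_hist are hypothetical.

```r
library(ggplot2)

# Hypothetical toy answers on the 0-10 scale
toy <- data.frame(WorkBal = c(0, 3, 5, 5, 7, 7, 7, 8, 8, 10))

# binwidth = 1 gives one bin per scale point
p_hist <- ggplot(toy, aes(WorkBal)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(breaks = 0:10) +
  labs(x = "Level of Satisfaction",
       y = "Count")

p_hist
```

#Setting the bin width explicitly also removes the `stat_bin()` message about picking a better value.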
SelectedVariables %>%
ggplot(aes(y = WorkBal)) +
geom_boxplot() +
labs(title = "Distribution of Satisfaction with Work-Life Balance in Scotland",
x = "",
y = "Level of Satisfaction",
caption = "Source: Scottish Health Survey 2019")
#Here the quartiles are visualised for all participants. The first quartile is 5 and the third quartile is 8, so at least 25% answered 5 or below and at least 25% answered 8 or above.
#IV
SelectedVariables %>%
ggplot(aes(Sex)) +
geom_bar() +
labs(title = "Gender",
x = "",
y = "Count") +
theme_minimal()
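#More female than male participants. A logarithmic plot is relevant for the DV-IV relationship. I labelled 1 and 2 as male and female, but they still show up as 1 and 2 because the mutate() calls in Step 4 only printed the labelled data and did not save it back to SelectedVariables.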
#DV-IV Relationship
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
SelectedVariables %>%
ggplot(aes(x = Sex, y = WorkBal)) +
geom_boxplot() +
theme_minimal() +
scale_y_log10() +
labs(title = "Distribution of Satisfaction by Sex",
subtitle = "Logarithmic scale",
x = "Sex",
y = "Level of Satisfaction",
caption = "Source: Scottish Health Survey 2019")
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?
## Warning: Removed 19 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
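#Both warnings above have identifiable causes: Sex is stored as a number, so ggplot treats x as continuous and draws a single box, and log10(0) is -Inf, so the answers of 0 are removed by the log scale. A minimal sketch on made-up data that converts Sex to a factor and keeps the linear 0-10 scale; the toy values and the object name p_box are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data with Sex coded 1 and 2 as in the survey
toy <- data.frame(
  WorkBal = c(0, 5, 7, 8, 6, 7, 9, 10),
  Sex     = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# A factor makes x discrete, so one box is drawn per group
toy$Sex <- factor(toy$Sex, levels = 1:2, labels = c("Male", "Female"))

p_box <- ggplot(toy, aes(x = Sex, y = WorkBal)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Distribution of Satisfaction by Sex",
       x = "Sex",
       y = "Level of Satisfaction")

p_box
```

#On the linear scale the answer of 0 stays in the plot, so no rows are removed.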
SelectedVariables %>%
ggplot(aes(WorkBal)) +
geom_density(aes(fill = Sex), alpha = 0.2) +
labs( title = "Distribution of satisfaction with work-life balance by sex",
subtitle = "Logarithmic scale",
x = "Level of Satisfaction",
y = "Density",
caption = "Source: Scottish Health Survey 2019") +
theme_minimal() +
scale_x_log10()
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 19 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning: The following aesthetics were dropped during statistical transformation: fill.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
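#Similarly to the earlier boxplot, only the variable WorkBal shows up and not both WorkBal and Sex. As the warning indicates, Sex is still numeric, so ggplot drops the fill aesthetic instead of splitting the data by sex.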
SelectedVariables %>%
ggplot(aes(WorkBal)) +
geom_histogram(aes(fill=Sex)) +
labs( title = "Distribution of satisfaction with work-life balance by sex",
subtitle = "Logarithmic scale",
x = "Level of Satisfaction",
y = "Count",
caption = "Source: Scottish Health Survey 2019",
fill = "Sex") +
theme_minimal() +
scale_x_log10()
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 19 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: The following aesthetics were dropped during statistical transformation: fill.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
#The same happens with the histogram used to compare IV and DV: the fill aesthetic is dropped because Sex is numeric.
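#A minimal sketch on made-up data of how the fill aesthetic can separate the two groups once Sex is a factor. The toy values and the object names p_density and p_hist_sex are hypothetical, and the linear scale is kept so that answers of 0 would not be removed.

```r
library(ggplot2)

# Hypothetical toy data with Sex coded 1 and 2 as in the survey
toy <- data.frame(
  WorkBal = c(2, 5, 7, 8, 8, 4, 6, 7, 9, 10),
  Sex     = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)
toy$Sex <- factor(toy$Sex, levels = 1:2, labels = c("Male", "Female"))

# Density plot: one semi-transparent curve per sex
p_density <- ggplot(toy, aes(WorkBal, fill = Sex)) +
  geom_density(alpha = 0.2) +
  theme_minimal() +
  labs(x = "Level of Satisfaction", y = "Density", fill = "Sex")

# Histogram: bars for each sex side by side, one bin per scale point
p_hist_sex <- ggplot(toy, aes(WorkBal, fill = Sex)) +
  geom_histogram(binwidth = 1, position = "dodge") +
  theme_minimal() +
  labs(x = "Level of Satisfaction", y = "Count", fill = "Sex")

p_density
p_hist_sex
```

#With a factor, ggplot can infer the grouping, so the warning about dropped aesthetics should no longer appear.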