2024-10-14

Loading in the Data

First, we load the data using the read.csv() function.

knitr::opts_chunk$set(echo = TRUE)
df = read.csv("/Users/amyfishencord/Downloads/Impact_of_Remote_Work_on_Mental_Health.csv",
              header=TRUE)

Importing Libraries

Next, we import the libraries used for data visualization and data manipulation.

library(ggplot2)
library(dplyr)
library(plotrix)
library(plotly)
library(knitr)
library(naniar)
library(RColorBrewer)

About the Data

The “Remote Work and Mental Health” dataset explores the effects of remote work on employees’ mental well-being. It includes 5,000 records collected from employees worldwide, capturing factors such as stress levels, job satisfaction, and feelings of social isolation across different industries and job roles.

Quick Overview of the Data

I will use the str() function to show each column name, its data type, and its first few values, which gives a quick overview of the data we will be working with.

## 'data.frame':    5000 obs. of  20 variables:
##  $ Employee_ID                      : chr  "EMP0001" "EMP0002" "EMP0003" "EMP0004" ...
##  $ Age                              : int  32 40 59 27 49 59 31 42 56 30 ...
##  $ Gender                           : chr  "Non-binary" "Female" "Non-binary" "Male" ...
##  $ Job_Role                         : chr  "HR" "Data Scientist" "Software Engineer" "Software Engineer" ...
##  $ Industry                         : chr  "Healthcare" "IT" "Education" "Finance" ...
##  $ Years_of_Experience              : int  13 3 22 20 32 31 24 6 9 28 ...
##  $ Work_Location                    : chr  "Hybrid" "Remote" "Hybrid" "Onsite" ...
##  $ Hours_Worked_Per_Week            : int  47 52 46 32 35 39 51 54 24 57 ...
##  $ Number_of_Virtual_Meetings       : int  7 4 11 8 12 3 7 7 4 6 ...
##  $ Work_Life_Balance_Rating         : int  2 1 5 4 2 4 3 3 2 1 ...
##  $ Stress_Level                     : chr  "Medium" "Medium" "Medium" "High" ...
##  $ Mental_Health_Condition          : chr  "Depression" "Anxiety" "Anxiety" "Depression" ...
##  $ Access_to_Mental_Health_Resources: chr  "No" "No" "No" "Yes" ...
##  $ Productivity_Change              : chr  "Decrease" "Increase" "No Change" "Increase" ...
##  $ Social_Isolation_Rating          : int  1 3 4 3 3 5 5 5 2 2 ...
##  $ Satisfaction_with_Remote_Work    : chr  "Unsatisfied" "Satisfied" "Unsatisfied" "Unsatisfied" ...
##  $ Company_Support_for_Remote_Work  : int  1 2 5 3 3 1 3 4 4 1 ...
##  $ Physical_Activity                : chr  "Weekly" "Weekly" "None" "None" ...
##  $ Sleep_Quality                    : chr  "Good" "Good" "Poor" "Poor" ...
##  $ Region                           : chr  "Europe" "Asia" "North America" "Europe" ...

Missing Values

There are no missing values within the variables of our dataset, making our dataset complete.
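
One way to check a claim like this is to count the missing values in each column. The sketch below uses base R on a small hypothetical data frame (`toy`, with one deliberately missing value) rather than the real dataset, so its names and values are assumptions for illustration only.

```r
# Hypothetical toy data; the real analysis would use `df` loaded earlier.
toy <- data.frame(
  Age = c(32, 40, NA, 27),
  Gender = c("Non-binary", "Female", "Non-binary", "Male"),
  stringsAsFactors = FALSE
)

# Count the missing values in each column.
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# TRUE only when no column contains a missing value.
is_complete <- !anyNA(toy)
print(is_complete)
```

On the real data, `colSums(is.na(df))` returning zero for every column would confirm that the dataset is complete.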

Overview of variables

I want to focus on looking at a few specific variables to get a better understanding of what they mean and what their values are. First, I want to look closer into the Age variable, specifically at the summary.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      22      31      41      41      51      60

Distribution of Age

The age distribution is multimodal, showing multiple peaks and valleys rather than a single central peak. This matters because it means the dataset covers employees across many age groups, each of which may experience remote work differently.
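
A binned count is a quick way to inspect the shape of a distribution. The sketch below uses only the first ten Age values shown in the str() output above, so it illustrates the technique rather than reproducing the full histogram. The 5-year bin width is an assumption.

```r
# First ten Age values from the str() output; not the full 5,000 rows.
age <- c(32, 40, 59, 27, 49, 59, 31, 42, 56, 30)

# Bin ages into 5-year intervals without drawing a plot.
age_hist <- hist(age, breaks = seq(20, 60, by = 5), plot = FALSE)

# Label each count with the upper edge of its bin.
bin_counts <- setNames(age_hist$counts, age_hist$breaks[-1])
print(bin_counts)
```

On the full data, several separated local maxima in these counts would support describing the distribution as having multiple peaks.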

Gender Variable

Next, I want to focus on the gender variable, showing the number of respondents of each gender in the dataset. This helps us understand the background of the participants.

       Female              Male        Non-binary Prefer not to say 
         1274              1270              1214              1242 
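
The table reports counts; converting them to percentages makes each group's share explicit. The sketch below copies the four counts from the table above, and the variable names are illustrative.

```r
# Counts copied from the table above.
gender_counts <- c(
  "Female" = 1274,
  "Male" = 1270,
  "Non-binary" = 1214,
  "Prefer not to say" = 1242
)

# Convert counts to percentages of all respondents, rounded to one decimal.
gender_pct <- round(100 * gender_counts / sum(gender_counts), 1)
print(gender_pct)
```

On the loaded data frame, the equivalent would be `round(100 * prop.table(table(df$Gender)), 1)`.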

Job Role

The boxplot of job roles shows how employees are distributed across the various positions.

Industry

The boxplot of industries shows how employees are distributed across the different industries.

Mean Age, by Job Roles and Industry

We can observe that the mean age for almost every combination of job role and industry falls between 39 and 43. Software Engineers and Designers have a mean age above 40 in every industry, while HR has the most industries with a mean age under 40.
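
Group means like these can be computed with base R's aggregate(). The sketch below runs on a small hypothetical data frame (`toy`) whose values are invented for illustration. Only the column names match the dataset.

```r
# Hypothetical toy data using the dataset's column names.
toy <- data.frame(
  Job_Role = c("HR", "HR", "Designer", "Designer", "HR", "Designer"),
  Industry = c("IT", "IT", "IT", "Finance", "Finance", "Finance"),
  Age = c(30, 40, 45, 41, 38, 43),
  stringsAsFactors = FALSE
)

# Mean age for every Job_Role x Industry combination present in the data.
mean_age <- aggregate(Age ~ Job_Role + Industry, data = toy, FUN = mean)
print(mean_age)
```

The same call with `data = df` would produce one row per job role and industry pair in the real dataset.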

Work Location, Region

Work Location
Work_Location    Freq
Hybrid           1649
Onsite           1637
Remote           1714

Region
Region           Freq
Africa            860
Asia              829
Europe            840
North America     777
Oceania           867
South America     827

Stress Level, Mental Condition, Sleep Quality

Stress Level
Stress_Level               Freq
High                       1686
Low                        1645
Medium                     1669

Mental Health Condition
Mental_Health_Condition    Freq
Anxiety                    1278
Burnout                    1280
Depression                 1246
None                       1196

Sleep Quality
Sleep_Quality              Freq
Average                    1628
Good                       1687
Poor                       1685

Balanced Representation in the Dataset

Looking more closely at these variables, the graphs and tables show that the categories within many of them are close to equal in size. With this diverse but balanced representation, our analysis is less likely to be skewed toward any one group, and comparisons between groups rest on similar sample sizes. Now, let’s dive into visualizing relationships within the data.

Objective and Problem Definition

This project explores key trends within the “Impact of Remote Work on Mental Health” dataset to identify factors influencing employee mental health. Specifically, I will investigate relationships between variables such as job role, stress level, and work-life balance to determine which factors most significantly impact mental well-being among on-site, hybrid and remote employees.

Years of Exp vs Hours Worked Per Week

## [1] -0.01853681

Our calculated correlation coefficient is -0.0185, meaning there is little to no linear correlation between years of experience and hours worked. Whether an employee has a few years of experience or many, it does not meaningfully affect the number of hours they work each week.
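
A coefficient like this comes from a Pearson correlation between two numeric columns. The sketch below uses only the first ten values of each column shown in the str() output, so it demonstrates the call rather than reproducing the full-data result.

```r
# First ten values of each column from the str() output.
years <- c(13, 3, 22, 20, 32, 31, 24, 6, 9, 28)
hours <- c(47, 52, 46, 32, 35, 39, 51, 54, 24, 57)

# cor() computes the Pearson correlation by default.
r <- cor(years, hours)
print(r)
```

With only ten rows the value will differ from the one obtained on all 5,000 records. On the loaded data the call would be `cor(df$Years_of_Experience, df$Hours_Worked_Per_Week)`.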

Stress Level vs Hours Worked

## [1] 0.9995765

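Our calculated correlation coefficient is approximately 0.9996, which suggests that stress level and hours worked are closely related: employees who report higher stress levels tend to work more hours on average. Because Stress_Level is stored as text, it has to be encoded numerically before a correlation can be computed, and a coefficient this close to 1 should be interpreted cautiously, since it depends on that encoding and on whether the calculation used individual records or group averages.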
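
Stress_Level is a character column, so it must be given a numeric encoding before any correlation can be computed. The sketch below, on hypothetical toy data, assumes the ordering Low < Medium < High and contrasts a row-level Spearman correlation with a Pearson correlation computed on per-level mean hours. It is not the calculation used in this report, which is not shown. It only illustrates that a correlation over a few group averages can be far larger than the row-level one.

```r
# Hypothetical toy data; values are invented for illustration.
toy <- data.frame(
  Stress_Level = c("Low", "Low", "Low", "Medium", "Medium", "Medium",
                   "High", "High", "High"),
  Hours_Worked_Per_Week = c(30, 45, 38, 52, 33, 41, 36, 55, 44),
  stringsAsFactors = FALSE
)

# Assumed ordinal encoding: Low = 1, Medium = 2, High = 3.
toy$Stress_Code <- as.integer(
  factor(toy$Stress_Level, levels = c("Low", "Medium", "High"))
)

# Row-level rank correlation across individual employees.
individual <- cor(toy$Stress_Code, toy$Hours_Worked_Per_Week,
                  method = "spearman")

# Correlation computed on just three points: the mean hours per stress level.
group_means <- aggregate(Hours_Worked_Per_Week ~ Stress_Code,
                         data = toy, FUN = mean)
grouped <- cor(group_means$Stress_Code, group_means$Hours_Worked_Per_Week)

print(c(individual = individual, grouped = grouped))
```

Reporting both figures makes clear whether a strong association holds for individual employees or only for the averages.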

Work Location vs Stress Level

We can observe employees working in a hybrid or remote model experience higher stress levels.
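
A comparison like this can be backed by a cross-tabulation with row percentages, so that each work location is compared on the same scale. The sketch below uses a small hypothetical data frame (`toy`). Its values are invented and do not reflect the real distribution.

```r
# Hypothetical toy data using the dataset's column names.
toy <- data.frame(
  Work_Location = c("Hybrid", "Hybrid", "Onsite", "Onsite",
                    "Remote", "Remote", "Remote", "Remote"),
  Stress_Level = c("High", "Medium", "Low", "Low",
                   "High", "High", "Medium", "Low"),
  stringsAsFactors = FALSE
)

# Counts of each stress level within each work location.
counts <- table(toy$Work_Location, toy$Stress_Level)

# Row percentages: each work location sums to 100.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
print(row_pct)
```

Using `margin = 1` normalizes within each row, which matters when the groups differ in size.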

Job Role vs Stress Level

High is the most common stress level among data scientists and sales employees. Medium is the most common stress level among HR, marketing, and software engineering employees, while low is the most common among designers and project managers.

Rate of Productivity vs Work Location

Employees working in a remote or hybrid work model tend to experience a decrease in their productivity.

Access to Mental Help vs Productivity

First, we filter out the employees who have no mental health condition.

Among employees who have access to mental health resources, the majority report a decrease in productivity. In contrast, most employees without access to mental health resources exhibit no change in productivity. This suggests that while mental health resources are crucial for support, they may not directly correlate with productivity gains in the short term. Alternatively, it could indicate that those already struggling with productivity may be more likely to seek out these resources.
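
The filter-then-compare step described above can be sketched as follows. The data frame `toy` is hypothetical and its values are invented. The sketch assumes that employees without a condition are recorded as "None", as in the Mental Health Condition table earlier.

```r
# Hypothetical toy data using the dataset's column names.
toy <- data.frame(
  Mental_Health_Condition = c("Anxiety", "None", "Burnout",
                              "Depression", "None", "Anxiety"),
  Access_to_Mental_Health_Resources = c("Yes", "Yes", "No", "Yes", "No", "No"),
  Productivity_Change = c("Decrease", "Increase", "No Change",
                          "Decrease", "Increase", "No Change"),
  stringsAsFactors = FALSE
)

# Keep only employees who report a mental health condition.
with_condition <- subset(toy, Mental_Health_Condition != "None")

# Cross-tabulate resource access against productivity change.
access_by_productivity <- table(
  Access = with_condition$Access_to_Mental_Health_Resources,
  Productivity = with_condition$Productivity_Change
)
print(access_by_productivity)
```

A cross-tabulation like this shows association only. It cannot distinguish between the two explanations offered above.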

Work Life Balance vs Job Role

On a scale from 1-5, we will calculate and show the mean work life balance rating for each job role.

Remote software engineers have the highest average work-life balance rating at 3.2, while remote marketing employees have the lowest at 2.7.
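
Averages of this kind can be computed with tapply(). The sketch below assumes the means are taken among remote employees only, as the remote-specific results suggest. The data frame `toy` is hypothetical and its ratings are invented.

```r
# Hypothetical toy data using the dataset's column names.
toy <- data.frame(
  Work_Location = c("Remote", "Remote", "Remote", "Remote", "Onsite", "Hybrid"),
  Job_Role = c("Software Engineer", "Software Engineer", "Marketing",
               "Marketing", "Marketing", "HR"),
  Work_Life_Balance_Rating = c(4, 3, 2, 3, 5, 1),
  stringsAsFactors = FALSE
)

# Restrict to remote employees before averaging.
remote <- toy[toy$Work_Location == "Remote", ]

# Mean rating per job role, sorted from highest to lowest.
mean_balance <- tapply(remote$Work_Life_Balance_Rating, remote$Job_Role, mean)
print(sort(mean_balance, decreasing = TRUE))
```

Because the ratings are on a 1-5 scale, small differences between group means should be read alongside the group sizes.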

Physical Activity and Sleep Quality

This table shows the most common values for physical activity and sleep quality for employees in each work location.

## # A tibble: 3 × 3
##   Work_Location Physical_Activity_Mode Sleep_Quality_Mode
##   <chr>         <chr>                  <chr>             
## 1 Hybrid        Weekly                 Good              
## 2 Onsite        Weekly                 Poor              
## 3 Remote        Daily                  Average

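Hybrid employees most commonly report weekly physical activity and good sleep quality. Onsite employees most commonly report weekly physical activity but poor sleep quality. Remote employees most commonly report daily physical activity and average sleep quality. These are the most frequent values in each group, not descriptions of every employee in it.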
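
Base R has no built-in function for the statistical mode, so a per-group table like the one above needs a small helper. The sketch below defines one (`most_common`, a hypothetical name) and applies it per work location to invented toy data.

```r
# Hypothetical helper: returns the most frequent value of x.
# On ties, the value that sorts first alphabetically is returned.
most_common <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Hypothetical toy data using the dataset's column names.
toy <- data.frame(
  Work_Location = c("Hybrid", "Hybrid", "Hybrid", "Remote", "Remote", "Remote"),
  Physical_Activity = c("Weekly", "Weekly", "None", "Daily", "Daily", "Weekly"),
  Sleep_Quality = c("Good", "Good", "Poor", "Average", "Poor", "Average"),
  stringsAsFactors = FALSE
)

# Apply the helper to both columns within each work location.
modes <- aggregate(
  toy[c("Physical_Activity", "Sleep_Quality")],
  by = list(Work_Location = toy$Work_Location),
  FUN = most_common
)
print(modes)
```

A mode hides how close the runner-up categories are, so pairing it with the share of each category gives a fuller picture.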

Productivity vs Company Support

We can observe there is no correlation between productivity change and the given company support rating.

Age, Hours Worked & Satisfaction Level

This 3D scatter plot illustrates the relationship between employee age, hours worked per week, and satisfaction level regarding remote work, with points color-coded by job role. The distribution of points highlights potential trends, such as how satisfaction levels vary across different age groups and workloads.

Conclusion

The analysis of the “Impact of Remote Work on Mental Health” dataset highlights several trends in employee demographics and well-being. Mean age falls between 39 and 43 for almost every job role and industry, with Software Engineers and Designers averaging over 40 in every industry. The dataset reflects a balanced representation across many of its categorical variables, which supports the credibility of our insights.

A strong correlation (approximately 0.9996) indicates that higher stress levels are associated with longer working hours, while years of experience show minimal correlation (-0.0185) with hours worked. Data Scientists and Sales employees exhibit higher stress levels, and remote and hybrid employees tend to report decreased productivity.

Interestingly, access to mental health resources does not guarantee productivity gains for those struggling, and remote Software Engineers report the highest work-life balance. Overall, these findings reveal the complex relationships between job roles, stress, productivity, and mental health resources, paving the way for further exploration of these dynamics.