First, we load our data using the read.csv() function.
knitr::opts_chunk$set(echo = TRUE)
df = read.csv("/Users/amyfishencord/Downloads/Impact_of_Remote_Work_on_Mental_Health.csv",
header=TRUE)
2024-10-14
Next, we import the libraries used for data visualization and data manipulation.
library(ggplot2)
library(dplyr)
library(plotrix)
library(plotly)
library(knitr)
library(naniar)
library(RColorBrewer)
The “Remote Work and Mental Health” dataset explores the effects of remote work on employees’ mental well-being. It includes 5,000 records collected from employees world-wide that capture various factors such as stress levels, job satisfaction, and feelings of social isolation among workers across different industries and job roles.
I will be using the str() function to show each column name and the first few values in the dataset to get a quick overview of the data and datatypes we will be using.
## 'data.frame': 5000 obs. of 20 variables:
##  $ Employee_ID                      : chr "EMP0001" "EMP0002" "EMP0003" "EMP0004" ...
##  $ Age                              : int 32 40 59 27 49 59 31 42 56 30 ...
##  $ Gender                           : chr "Non-binary" "Female" "Non-binary" "Male" ...
##  $ Job_Role                         : chr "HR" "Data Scientist" "Software Engineer" "Software Engineer" ...
##  $ Industry                         : chr "Healthcare" "IT" "Education" "Finance" ...
##  $ Years_of_Experience              : int 13 3 22 20 32 31 24 6 9 28 ...
##  $ Work_Location                    : chr "Hybrid" "Remote" "Hybrid" "Onsite" ...
##  $ Hours_Worked_Per_Week            : int 47 52 46 32 35 39 51 54 24 57 ...
##  $ Number_of_Virtual_Meetings       : int 7 4 11 8 12 3 7 7 4 6 ...
##  $ Work_Life_Balance_Rating         : int 2 1 5 4 2 4 3 3 2 1 ...
##  $ Stress_Level                     : chr "Medium" "Medium" "Medium" "High" ...
##  $ Mental_Health_Condition          : chr "Depression" "Anxiety" "Anxiety" "Depression" ...
##  $ Access_to_Mental_Health_Resources: chr "No" "No" "No" "Yes" ...
##  $ Productivity_Change              : chr "Decrease" "Increase" "No Change" "Increase" ...
##  $ Social_Isolation_Rating          : int 1 3 4 3 3 5 5 5 2 2 ...
##  $ Satisfaction_with_Remote_Work    : chr "Unsatisfied" "Satisfied" "Unsatisfied" "Unsatisfied" ...
##  $ Company_Support_for_Remote_Work  : int 1 2 5 3 3 1 3 4 4 1 ...
##  $ Physical_Activity                : chr "Weekly" "Weekly" "None" "None" ...
##  $ Sleep_Quality                    : chr "Good" "Good" "Poor" "Poor" ...
##  $ Region                           : chr "Europe" "Asia" "North America" "Europe" ...
There are no missing values within the variables of our dataset, making our dataset complete.
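The completeness check can be sketched in base R as follows. This is a minimal illustration on a toy data frame (the name `toy` and its values are assumptions, not rows from the dataset); with the real data, the same calls would be applied to `df`.

```r
# Toy stand-in for `df`; the values are illustrative only.
toy <- data.frame(
  Age = c(32L, 40L, 59L),
  Gender = c("Non-binary", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Count NA values in each column; all zeros means nothing is missing.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# read.csv() keeps blank text fields as "" rather than NA,
# so empty strings are counted separately.
blank_per_column <- sapply(toy, function(col) sum(col == "", na.rm = TRUE))
print(blank_per_column)
```

Checking empty strings as well as `NA` is a cautious choice, since a blank text field would otherwise go unnoticed.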
I want to focus on looking at a few specific variables to get a better understanding of what they mean and what their values are. First, I want to look closer into the Age variable, specifically at the summary.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##      22      31      41      41      51      60
The age distribution is multimodal, showing multiple peaks and valleys, which is important for understanding the diverse experiences and perspectives of employees across different age groups in the context of remote work.
Next, I want to focus on the Gender variable, showing the count of each gender in the dataset. This helps us understand the background of the participants.
| Gender | Freq |
|---|---|
| Female | 1274 |
| Male | 1270 |
| Non-binary | 1214 |
| Prefer not to say | 1242 |
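To turn these counts into percentages, a short sketch is shown below. The counts are copied from the output above; the variable names `gender_counts` and `gender_pct` are illustrative. On the full data frame, `prop.table(table(df$Gender))` would give the same shares directly.

```r
# Gender counts as reported above.
gender_counts <- c(
  Female = 1274,
  Male = 1270,
  `Non-binary` = 1214,
  `Prefer not to say` = 1242
)

# Convert each count to a percentage of all records.
gender_pct <- round(100 * gender_counts / sum(gender_counts), 2)
print(gender_pct)
```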
The boxplot of job roles reveals the distribution of employees across various positions.
The boxplot of different industries reveals the distribution of industries with remote work.
Across almost all job roles and industries, the mean employee age falls between 39 and 43. For Software Engineers and Designers, the mean age is above 40 in every industry. HR, meanwhile, has the most industries with a mean age under 40.
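A comparison of mean age by job role and industry could be computed along these lines. This is a sketch on a toy data frame (the values are invented for illustration), and it assumes the summary is a group mean; the column names follow the dataset.

```r
# Toy stand-in for `df`; ages are illustrative only.
toy <- data.frame(
  Age = c(30, 50, 42, 38, 45, 41),
  Job_Role = c("HR", "HR", "Designer", "Designer", "HR", "Designer"),
  Industry = c("IT", "IT", "IT", "IT", "Finance", "Finance")
)

# Mean age for every Job_Role x Industry combination.
mean_age <- aggregate(Age ~ Job_Role + Industry, data = toy, FUN = mean)
print(mean_age)
```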
| Work_Location | Freq |
|---|---|
| Hybrid | 1649 |
| Onsite | 1637 |
| Remote | 1714 |
| Region | Freq |
|---|---|
| Africa | 860 |
| Asia | 829 |
| Europe | 840 |
| North America | 777 |
| Oceania | 867 |
| South America | 827 |
| Stress_Level | Freq |
|---|---|
| High | 1686 |
| Low | 1645 |
| Medium | 1669 |
| Mental_Health_Condition | Freq |
|---|---|
| Anxiety | 1278 |
| Burnout | 1280 |
| Depression | 1246 |
| None | 1196 |
| Sleep_Quality | Freq |
|---|---|
| Average | 1628 |
| Good | 1687 |
| Poor | 1685 |
After a closer look at several variables, the graphs and tables show that the categories within many of them are close to equally represented. With this diverse but balanced representation, our analysis can more accurately reflect the experiences of each group. This balance lets us explore deeper insights with less risk of bias toward any one group, so that all employee experiences are represented. Now, let’s dive into visualizing relationships within the data and explore predictive models for further insights.
This project explores key trends within the “Impact of Remote Work on Mental Health” dataset to identify factors influencing employee mental health. Specifically, I will investigate relationships between variables such as job role, stress level, and work-life balance to determine which factors most significantly impact mental well-being among on-site, hybrid and remote employees.
## [1] -0.01853681
Our calculated correlation coefficient is -0.0185, meaning there is little to no linear correlation between years of experience and hours worked per week. Whether an employee has just a few years of experience or many, it does not meaningfully affect the number of hours they work each week.
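A coefficient like this comes from `cor()`, which computes the Pearson correlation by default. The sketch below uses two short made-up vectors (`years` and `hours` are illustrative, not dataset values); on the real data the inputs would be `df$Years_of_Experience` and `df$Hours_Worked_Per_Week`.

```r
# Illustrative values only; not taken from the dataset.
years <- c(1, 5, 10, 20, 30)
hours <- c(40, 35, 45, 38, 42)

# Pearson correlation; always lies between -1 and 1.
r <- cor(years, hours)
print(r)
```

Values near 0 indicate little linear association, while values near -1 or 1 indicate a strong one.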
## [1] 0.9995765
Our calculated correlation coefficient is approximately 0.9996. This high correlation suggests that the two variables are closely related: employees who report higher stress levels tend to work more hours on average.
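Stress_Level is stored as text, so it has to be mapped to numbers before it can be correlated with hours worked. The source does not show how this was done; the sketch below is one possible approach, assuming the ordering Low < Medium < High and using a rank-based (Spearman) correlation on toy values.

```r
# Toy stand-in for `df`; values are illustrative only.
toy <- data.frame(
  Stress_Level = c("Low", "Medium", "High", "Low", "High", "Medium"),
  Hours_Worked_Per_Week = c(30, 40, 50, 35, 55, 45)
)

# Assumed ordering: Low = 1, Medium = 2, High = 3.
stress_rank <- as.integer(
  factor(toy$Stress_Level, levels = c("Low", "Medium", "High"))
)

# Spearman correlation suits an ordinal variable.
rho <- cor(stress_rank, toy$Hours_Worked_Per_Week, method = "spearman")
print(rho)

# Mean hours per stress level, for comparison.
mean_hours <- tapply(toy$Hours_Worked_Per_Week, toy$Stress_Level, mean)
print(mean_hours)
```

Note that a coefficient computed on a handful of group means will usually sit much closer to 1 or -1 than one computed on individual rows, so it helps to state which of the two is being reported.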
We can observe that employees working in a hybrid or remote model experience higher stress levels.
Data scientists and sales employees most often report a high stress level. HR, marketing, and software engineering employees most often report a medium stress level, while designers and project managers most often report a low stress level.
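The most common stress level per job role can be read off a cross-tabulation. The following is a small sketch on toy data (values invented for illustration); `most_common` is an illustrative name.

```r
# Toy stand-in for `df`; values are illustrative only.
toy <- data.frame(
  Job_Role = c("Sales", "Sales", "Sales", "Designer", "Designer", "Designer"),
  Stress_Level = c("High", "High", "Low", "Low", "Low", "Medium")
)

# Rows are job roles, columns are stress levels.
counts <- table(toy$Job_Role, toy$Stress_Level)
print(counts)

# For each role, pick the stress level with the largest count.
# which.max() returns the first maximum, so ties go to the earlier column.
most_common <- colnames(counts)[apply(counts, 1, which.max)]
names(most_common) <- rownames(counts)
print(most_common)
```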
Employees working in a remote or hybrid work model tend to experience a decrease in their productivity.
First, we filter out the employees who have no mental health conditions.
Among employees who have access to mental health resources, the majority report a decrease in productivity. In contrast, most employees without access to mental health resources exhibit no change in productivity. This suggests that while mental health resources are crucial for support, they may not directly correlate with productivity gains in the short term. Alternatively, it could indicate that those already struggling with productivity may be more likely to seek out these resources.
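The filtering step and the comparison that follows might look like the sketch below. It assumes that employees without a condition are coded as the string "None", as in the frequency table earlier, and it uses a toy data frame with invented values.

```r
# Toy stand-in for `df`; values are illustrative only.
toy <- data.frame(
  Mental_Health_Condition = c("Anxiety", "None", "Burnout", "Depression", "None"),
  Access_to_Mental_Health_Resources = c("Yes", "No", "No", "Yes", "Yes"),
  Productivity_Change = c("Decrease", "Increase", "No Change", "Decrease", "No Change")
)

# Keep only employees who report a mental health condition.
with_condition <- subset(toy, Mental_Health_Condition != "None")

# Cross-tabulate resource access against productivity change.
access_by_productivity <- table(
  with_condition$Access_to_Mental_Health_Resources,
  with_condition$Productivity_Change
)
print(access_by_productivity)

# Share of each productivity outcome within each access group.
row_share <- prop.table(access_by_productivity, margin = 1)
print(row_share)
```

Row shares make the two access groups comparable even when they differ in size.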
Work-life balance is rated on a scale from 1 to 5. We will calculate and show the mean work-life balance rating for each job role.
Remote software engineers have the highest average work-life balance rating at 3.2, while remote employees working in marketing have the lowest at 2.7.
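A per-role mean for remote employees can be computed roughly as follows. This is a sketch on toy data with invented ratings; restricting to Work_Location == "Remote" is an assumption based on the result described above.

```r
# Toy stand-in for `df`; ratings are illustrative only.
toy <- data.frame(
  Work_Location = c("Remote", "Remote", "Onsite", "Remote", "Remote"),
  Job_Role = c("Marketing", "Software Engineer", "Marketing",
               "Software Engineer", "Marketing"),
  Work_Life_Balance_Rating = c(2, 4, 5, 3, 3)
)

# Keep remote employees only.
remote <- subset(toy, Work_Location == "Remote")

# Mean rating per job role, sorted from highest to lowest.
mean_wlb <- aggregate(Work_Life_Balance_Rating ~ Job_Role, data = remote, FUN = mean)
mean_wlb <- mean_wlb[order(-mean_wlb$Work_Life_Balance_Rating), ]
print(mean_wlb)
```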
This table shows the most common values for physical activity and sleep quality for employees in each work location.
## # A tibble: 3 × 3
##   Work_Location Physical_Activity_Mode Sleep_Quality_Mode
##   <chr>         <chr>                  <chr>
## 1 Hybrid        Weekly                 Good
## 2 Onsite        Weekly                 Poor
## 3 Remote        Daily                  Average
Hybrid employees most commonly report weekly physical activity and good sleep quality. Onsite employees most commonly report weekly physical activity but poor sleep quality. Remote employees most commonly report daily physical activity and average sleep quality.
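Base R has no built-in function for the statistical mode, so a table like this needs a small helper. The sketch below is one way to do it; `mode_of` is a hypothetical helper name, and the toy values are invented for illustration.

```r
# Hypothetical helper: the most frequent value in a vector.
# Ties resolve to the first value in table()'s sorted order.
mode_of <- function(x) names(which.max(table(x)))

# Toy stand-in for `df`; values are illustrative only.
toy <- data.frame(
  Work_Location = c("Hybrid", "Hybrid", "Hybrid", "Remote", "Remote", "Remote"),
  Physical_Activity = c("Weekly", "Weekly", "Daily", "Daily", "Daily", "None"),
  Sleep_Quality = c("Good", "Good", "Poor", "Average", "Average", "Good")
)

# Apply the helper to both columns within each work location.
modes <- aggregate(
  toy[c("Physical_Activity", "Sleep_Quality")],
  by = list(Work_Location = toy$Work_Location),
  FUN = mode_of
)
print(modes)
```

Because the mode ignores how close the runner-up is, it is worth checking the underlying counts before reading much into differences between groups.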
We can observe that there is no correlation between productivity change and the company support rating.
This 3D scatter plot illustrates the relationship between employee age, hours worked per week, and satisfaction level regarding remote work, with points color-coded by job role. The distribution of points highlights potential trends, such as how satisfaction levels vary across different age groups and workloads.
The analysis of the “Impact of Remote Work on Mental Health” dataset highlights several trends in employee demographics and well-being. The mean employee age falls between 39 and 43 across almost all job roles, and it is above 40 in every industry for Software Engineers and Designers. The dataset reflects a balanced representation across job roles, enhancing the credibility of our insights.
A strong correlation (approximately 0.9996) indicates that higher stress levels are associated with longer working hours, while years of experience show minimal correlation (-0.0185) with hours worked. Data Scientists and Sales roles exhibit higher stress levels, and remote and hybrid employees tend to report decreased productivity.
Interestingly, access to mental health resources does not guarantee productivity gains for those struggling, and remote Software Engineers report the highest work-life balance. Overall, these findings reveal the complex relationships between job roles, stress, productivity, and mental health resources, paving the way for further exploration of these dynamics.