Introduction

This document presents several graphs showing how stress levels and other factors can affect a workplace environment.

About this dataset

This dataset comes from Kaggle. It explores multiple stress factors in the Indian workplace. There are 50,000 entries, with fields ranging from demographics to workplace dynamics.

url: (https://www.kaggle.com/datasets/ashaychoudhary/corporate-stress-dataset-insights-into-workplace/code)

Findings and Conclusion

Here are some findings from the analysis and where each one can be found.

In Tab 1 (Stress Levels Among the Age Groups and Genders), we can see that the highest stress level is 10, recorded 4,482 times, and the lowest is 0, recorded 4,442 times. The stacked bar chart also shows that certain age groups are more stressed than others. The most stressed are the female and male groups aged (30-40) and (50-60), and the non-binary groups aged (20-30) and (60-70).

Tab 2 (Average Stress Levels throughout the years) plots years of experience against the average stress level. I originally assumed that the first five years would be the most stressful, but that turned out to be wrong: year 8 has the highest average stress level at 5.19, while the lowest, 4.79, is at the 16th year of experience. Year 2 has the second-lowest average stress level.

Tab 3 (Intensity levels for stress in a workplace size) can help companies understand whether they need to provide more or fewer resources, especially if the company is a large chain whose sites vary in size. The darkest purple represents the highest count, which is at small companies at stress level 5, with a count of 1,635. From this we can assume that employees at a small company will be more stressed than those at a larger company.

Tab 4 (Amount of leave taken by each health issue) lets us explore which health issues employees report. Understanding this matters for a company that wants to retain employees and decide whether it can allocate more hours of leave. The category is broken into four groups: physical, mental, none, and both. HR has the highest percentage of leave taken by employees reporting both health issues.

Lastly, Tab 5 (Burnout Symptoms within genders) shows the distribution of burnout symptoms for each gender. Oddly enough, the distribution is even across the board. While this is surprising, it would be nice to know whether an effective point system was used to place each employee in a specific category.

Ultimately, we can see that there are a variety of possible stressors in a workplace. In the future, it would be interesting to see how this changes over time.

#set directory
setwd("C:/Users/jenni/OneDrive/Documents/ds736data")

# Load all our libraries
library(data.table)
library(ggplot2)
library(dplyr)
library(scales)
library(lubridate)
library(plotly)
library(RColorBrewer)
library(ggthemes)
library(tidyr)
library(ggrepel)
library(htmlwidgets)
library(cowplot)
library(ggpubr)
library(flexdashboard)
library(bslib)
library(rmarkdown)
# read the data from a CSV file
filename<-read.csv("corporate_stress_dataset.csv")
# use ID as the row index: it looks like a simple counter, not an employee badge or other unique identifier
# overwrite the ID column with a 1..n counter
filename$ID<-seq_len(nrow(filename))
rownames(filename)<-filename$ID
# remove every row that contains an NA,
# using drop_na() from the tidyr library
dfclean<- filename %>% drop_na()
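
Before dropping rows, it can help to see how many would be lost. The sketch below is only an illustration: `toy` and its values are made up and are not taken from the Kaggle file. It counts the missing values per column and compares the row count before and after `drop_na()`.

```r
library(tidyr)

# Hypothetical toy data standing in for the real CSV
toy <- data.frame(
  Age = c(25, NA, 41, 58),
  Stress_Level = c(3, 7, NA, 10)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Rows before and after removing any row that contains an NA
rows_before <- nrow(toy)
toy_clean <- drop_na(toy)
rows_dropped <- rows_before - nrow(toy_clean)

print(na_per_column)
print(rows_dropped)
```

Running the same two checks on `filename` and `dfclean` would show how much of the 50,000 entries survives the cleaning step.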

Tab 1

# count how many times each stress level occurs
dfstress<-count(dfclean,Stress_Level)
ggplot(dfclean, aes(x=Age, y=Stress_Level, fill=Gender)) +
  geom_col() +  # stacked bars; each bar sums Stress_Level for that age
  coord_flip() + # flip the axes so Age runs vertically
  scale_y_continuous(labels=comma)+ # add thousands separators
  theme_few()+ 
  labs(title="Stress levels among the Age groups within Gender", x="Age", y="Stress level")+
  theme(plot.title=element_text(hjust=0.5))+ # center the title 
  scale_fill_brewer(palette="Pastel1") # pastel color palette 
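
The findings talk about ten-year age groups such as (30-40), while the plot above uses the raw Age column. One possible way to build such groups is `cut()`. This is a minimal sketch on made-up ages; the bin edges (20 to 70 in steps of 10) and the `Age_Group` and `employees` names are assumptions, not something the dataset defines.

```r
library(dplyr)

# Hypothetical ages and stress levels, for illustration only
toy <- data.frame(Age = c(22, 30, 39, 40, 55, 64),
                  Stress_Level = c(4, 9, 6, 2, 8, 5))

binned <- toy %>%
  mutate(Age_Group = cut(Age,
                         breaks = seq(20, 70, by = 10), # assumed bin edges
                         right = FALSE,                 # bins are [20,30), [30,40), ...
                         include.lowest = TRUE)) %>%    # last bin also includes 70
  count(Age_Group, name = "employees")                  # employees per age group

print(binned)
```

With `right = FALSE`, an age of exactly 30 falls in the 30-40 group rather than the 20-30 group, which is worth deciding on before comparing groups.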

Tab 2

# build a new data frame of average stress level per year of experience
experiencedf<- dfclean %>% # original data 
  select(Experience_Years, Stress_Level) %>% # columns we need from the original data 
  group_by(Experience_Years)%>% # one group per year of experience 
  summarise(avgstresslvl= mean(Stress_Level)) %>% # average stress level per year of experience
  data.frame() # convert the result to a plain data frame

# find the rows with the minimum and maximum averages; these are highlighted on the plot 
hilo<-experiencedf %>%
  filter(avgstresslvl==min(avgstresslvl)| avgstresslvl ==max(avgstresslvl)) %>%
  data.frame()
# x-axis labels: one tick per year of experience 
xaxis<-min(experiencedf$Experience_Years):max(experiencedf$Experience_Years)



ggplot(experiencedf, aes(x= Experience_Years, y= avgstresslvl))+
  geom_line(color="purple",size=0.5)+ 
  geom_point(shape=21, size=3, color="black",fill='white')+
  theme_light()+ 
  labs(title="Average Stress Levels throughout the years", x="Experience year", y="Avg stress level")+
  scale_x_continuous(labels=xaxis, breaks= xaxis, minor_breaks = NULL)+
  theme(plot.title=element_text(hjust=0.5))+ 
  geom_point(data=hilo, aes(x=Experience_Years, y=avgstresslvl), shape=21, size=4, fill= "black", color="black")+
  geom_label_repel(aes(label = ifelse(avgstresslvl == max(avgstresslvl) | avgstresslvl == min(avgstresslvl), round(avgstresslvl,2), "")),
                   box.padding = 1, point.padding = 1, size = 4, color = 'Grey50', segment.color = "purple")
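
The highest and lowest points can also be pulled out with dplyr's `slice_min()` and `slice_max()`, which some may find easier to read than the `filter()` above. This is a sketch under stated assumptions: the averages below are invented, and `toy`, `lowest`, `highest` and `hilo_toy` are illustrative names.

```r
library(dplyr)

# Hypothetical averages per year of experience (made-up values)
toy <- data.frame(Experience_Years = 0:4,
                  avgstresslvl = c(5.1, 4.9, 5.3, 4.7, 5.0))

# Row with the lowest average and row with the highest average
lowest  <- slice_min(toy, avgstresslvl, n = 1)
highest <- slice_max(toy, avgstresslvl, n = 1)

# Stack them into one small data frame, lowest first
hilo_toy <- bind_rows(lowest, highest)
print(hilo_toy)
```

Both functions keep ties by default, so more than one row can come back if two years share the same average.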

Tab 3

# build the data frame of counts per company size and stress level

size<- dfclean %>% 
  select(Stress_Level,Company_Size) %>%
  group_by(Company_Size, Stress_Level) %>%
  summarise(counts=n(), .groups='keep') %>%
  data.frame()
# display order for the company sizes
sizeorder<-c('Small','Medium','Large')

# apply that order as factor levels
size$Company_Size <-factor(size$Company_Size, levels=sizeorder)
 

g<-ggplot(size, aes(x=Company_Size, y=Stress_Level, fill=counts)) +
  geom_tile(color='black')+
  geom_text(aes(label=comma(counts)))+
  labs(title="Intensity levels for stress in a workplace size",
       x="company size",
       y="stress levels",
       fill="counts of stress")+
  theme_grey()+
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_discrete(limits=levels(size$Company_Size))+
  scale_fill_continuous(low='white',high='purple')

ggplotly(g, height=500, width=500, tooltip=c("counts","Company_Size","Stress_Level")) %>%
  style(hoverlabel=list(bgcolor='white'))
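
The heatmap compares raw counts, so a company size with more employees in the data will tend to show darker tiles at every stress level. One hedged way to check a comparison between sizes is to look at shares within each size instead. The sketch below uses invented records, and `toy`, `counts` and `shares` are illustrative names.

```r
# Hypothetical records; sizes and stress levels are made up
toy <- data.frame(
  Company_Size = c("Small", "Small", "Small", "Small", "Large", "Large"),
  Stress_Level = c(5, 5, 2, 8, 5, 1)
)

# Raw counts: rows are company sizes, columns are stress levels
counts <- table(toy$Company_Size, toy$Stress_Level)

# Share of each stress level within a company size (each row sums to 1)
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```

In this toy data the small companies have twice as many records at stress level 5, yet the share at level 5 is the same for both sizes, which is the kind of difference raw counts can hide.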

Tab 4

#dataframe to get the counts of health issues per department
healthdf<- dfclean %>%
  select(Health_Issues,Annual_Leaves_Taken, Department) %>% #columns i want
  group_by(Department, Health_Issues) %>% # what to group by 
  summarise(counts=n(), totleave=sum(Annual_Leaves_Taken), .groups='keep') %>% # i want ct of issues reported and sum of totleave taken
  data.frame() 
healthpie<-plot_ly(textposition="inside", labels= ~Health_Issues, values= ~totleave) %>%
#Sales dept
  add_pie(data=healthdf[healthdf$Department== "Sales",],
          name="Sales", title= "Sales", domain=list(row=0,column=0)) %>%
#HR dept
  add_pie(data=healthdf[healthdf$Department== "HR",],
          name="HR", title= "HR", domain=list(row=0,column=1)) %>%
#Marketing dept  
  add_pie(data=healthdf[healthdf$Department== "Marketing",],
          name="Marketing", title= "Marketing", domain=list(row=0,column=2)) %>%
#finance dept
  add_pie(data=healthdf[healthdf$Department== "Finance",],
          name="Finance", title= "Finance", domain=list(row=1,column=0)) %>%
#admin dept
  add_pie(data=healthdf[healthdf$Department== "Admin",],
          name="Admin", title= "Admin", domain=list(row=1,column=1)) %>%
#IT dept
  add_pie(data=healthdf[healthdf$Department== "IT",],
          name="IT", title= "IT", domain=list(row=1,column=2)) %>%
  layout(title="Trellis Chart: Amount of leave taken (by hours) by each health issue",
         showlegend=TRUE, grid=list(rows=3, columns=3))
healthpie 

Tab 5

burnout<-dfclean %>%
  select(Gender,Burnout_Symptoms) %>%
  mutate(symptoms=factor(Burnout_Symptoms, levels=c("Yes", "No", "Occasional"))) %>%
  group_by(Gender,symptoms) %>%
  summarise(counts=n(), .groups='keep') %>%
  group_by(symptoms) %>%
  # grouped by symptom, so percent_total is each gender's share within a symptom category 
  mutate(percent_total=round(100*counts/sum(counts),1))%>% 
  data.frame()
plot_ly(hole=0.7)%>% 
  layout(title="Burnout symptoms within genders")%>% 
  # outer ring: Female; every ring uses raw counts so plotly computes comparable percentages
  add_trace(data=burnout[burnout$Gender=="Female",],
            labels=~symptoms,
            values=~counts,
            type="pie",
            textposition="inside",
            hovertemplate= "Gender:Female <br> Burnout_symptom:%{label}<br>Percent:%{percent}%<br> Count:%{value}<extra></extra>") %>%
  # middle ring: Male
  add_trace(data=burnout[burnout$Gender=="Male",],
            labels=~symptoms,
            values=~counts,
            type="pie",
            textposition="inside",
            hovertemplate= "Gender:Male <br> Burnout_symptom:%{label}<br>Percent:%{percent}%<br> Count:%{value}<extra></extra>",
            domain=list(
              x=c(0.16,0.84),
              y=c(0.16,0.84)))%>%
  # inner ring: Non-Binary
  add_trace(data=burnout[burnout$Gender=="Non-Binary",],
            labels=~symptoms,
            values=~counts,
            type="pie",
            textposition="inside",
            hovertemplate= "Gender:NonBinary <br> Burnout_symptom:%{label}<br>Percent:%{percent}% <br> Count:%{value}<extra></extra>",
            domain=list(
              x=c(0.27,0.73),
              y=c(0.27,0.73)))
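
The direction of the `group_by()` changes what a percentage means, so it is worth checking which one a chart needs. The sketch below is a small illustration on invented rows; `toy`, `tally`, `within_symptom`, `within_gender` and `pct` are hypothetical names. It computes the share both ways.

```r
library(dplyr)

# Hypothetical rows, for illustration only
toy <- data.frame(
  Gender = c("Female", "Female", "Female", "Male"),
  Burnout_Symptoms = c("Yes", "Yes", "No", "Yes")
)

# One row per gender and symptom combination, with its count in column n
tally <- count(toy, Gender, Burnout_Symptoms)

# Share of each gender within a symptom category
within_symptom <- tally %>%
  group_by(Burnout_Symptoms) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

# Share of each symptom category within a gender
within_gender <- tally %>%
  group_by(Gender) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

print(within_symptom)
print(within_gender)
```

A pie drawn per gender answers the second question, which is why passing raw counts to plotly and letting it compute the percentages is the simpler route there.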