This document presents several graphs showing how stress levels and other factors can affect a workplace environment.
The dataset comes from Kaggle and explores multiple stress factors in Indian workplaces. It contains 50,000 entries, with variables ranging from demographics to workplace dynamics.
URL: https://www.kaggle.com/datasets/ashaychoudhary/corporate-stress-dataset-insights-into-workplace/code
Below are some findings from the analysis and where each one can be found.

In “Stress Levels Among the Age Groups and Genders” (Tab 1), the highest stress level (10) was recorded 4,482 times and the lowest (0) was recorded 4,442 times. The stacked bar chart also shows that some age groups are more stressed than others. The most stressed groups are females and males aged 30-40 and 50-60, and non-binary employees aged 20-30 and 60-70.

Tab 2 (Average Stress Levels throughout the years) compares years of experience with the average stress level. I originally assumed that the first five years would be the most stressful, but that turned out to be wrong: year 8 has the highest average stress level at 5.19, while the lowest is 4.79 at 16 years of experience. Year 2 has the second-lowest average stress level.

Tab 3 (Intensity levels for stress in a workplace size) can help companies decide whether they need to allocate more or fewer resources, especially a large chain whose branches vary in size. The darkest purple marks the highest count, which occurs at small companies at stress level 5 with a count of 1,635. This suggests that employees at small companies may be more stressed than those at larger companies, although raw counts also depend on how many employees of each company size are in the data.

Tab 4 (Amount of leave taken by each health issue) shows how much leave is taken for each type of health issue in each department. Understanding this is important for retaining employees and for deciding whether a company can allocate more hours of leave. Health issues are broken into four groups: physical, mental, none, and both. Compared with the other departments, HR has the highest percentage of leave taken by employees reporting both physical and mental health issues.

Lastly, Tab 5 (Burnout Symptoms within genders) shows, for each gender, whether employees have experienced burnout symptoms (Yes, No, or Occasional). Surprisingly, the distribution is even across the board. Given how even it is, it would be useful to know whether there was a clear scoring system that placed each employee into a specific category.
Ultimately, there are a variety of stressors in the workplace. In the future, it would be interesting to see how they change over time.
# set the working directory (change this path to wherever the CSV is saved)
setwd("C:/Users/jenni/OneDrive/Documents/ds736data")
# Load all our libraries
library(data.table)
library(ggplot2)
library(dplyr)
library(scales)
library(lubridate)
library(plotly)
library(RColorBrewer)
library(ggthemes)
library(tidyr)
library(ggrepel)
library(htmlwidgets)
library(cowplot)
library(ggpubr)
library(flexdashboard)
library(bslib)
library(rmarkdown)
# read the CSV file into a data frame
filename<-read.csv("corporate_stress_dataset.csv")
# ID looks like a simple row counter rather than an employee badge or other unique identifier,
# so renumber it from 1 to n and use it as the row names (the new index)
filename$ID<-seq_len(nrow(filename))
rownames(filename)<-filename$ID
# the data contain NA values, so drop every row that has an NA in any column
# (drop_na() comes from the tidyr package)
dfclean<- filename %>% drop_na()
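A caveat worth checking, sketched on a made-up three-row CSV (`csv_text` and the other object names are hypothetical): `drop_na()` only removes true `NA` values, and by default `read.csv()` turns a blank numeric cell into `NA` but reads a blank text cell as an empty string. Whether the Kaggle file has blank text cells is not known here; if it does, passing `na.strings = c("NA", "")` to `read.csv()` would let the cleaning step catch them. Base R's `complete.cases()` stands in for `drop_na()` so the sketch runs without tidyr.

```r
# Hypothetical toy CSV (not the Kaggle file): row 2 has a blank text cell,
# row 3 has a blank numeric cell
csv_text <- "ID,Gender,Stress_Level
1,Female,7
2,,5
3,Male,"

# Default: blank numeric cell -> NA, blank text cell -> "" (not NA)
raw_default <- read.csv(text = csv_text)

# Treat empty strings as missing too
raw_blank_na <- read.csv(text = csv_text, na.strings = c("NA", ""))

# complete.cases() keeps rows with no NA in any column,
# like tidyr::drop_na() called without column arguments
clean_default <- raw_default[complete.cases(raw_default), ]
clean_blank_na <- raw_blank_na[complete.cases(raw_blank_na), ]

nrow(clean_default)   # 2: only the blank Stress_Level row is dropped
nrow(clean_blank_na)  # 1: the blank Gender row is dropped as well
```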
# count how many times each stress level occurs
dfstress<-count(dfclean,Stress_Level)
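The counts quoted for Tab 1 (4,482 and 4,442) appear to come from this frequency table rather than from the bar chart. A minimal base-R sketch of the same idea on made-up stress values, assuming levels run from 0 to 10 as in the dataset; `table()` plays the role of `dplyr::count()`, and `stress`, `stress_counts`, and `extremes` are illustrative names:

```r
# Hypothetical toy stress levels on a 0-10 scale (not the Kaggle data)
stress <- c(0, 3, 5, 5, 7, 10, 10, 10, 0, 10)

# Frequency of each observed level, with the count in column n
stress_counts <- as.data.frame(table(Stress_Level = stress), responseName = "n")
# table() stores levels as factor labels; convert them back to numbers
stress_counts$Stress_Level <- as.numeric(as.character(stress_counts$Stress_Level))

# Rows for the lowest and highest observed stress levels
extremes <- stress_counts[stress_counts$Stress_Level %in% range(stress_counts$Stress_Level), ]
extremes  # level 0: n = 2, level 10: n = 4
```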
ggplot(dfclean, aes(x=Age, y=Stress_Level, fill=Gender)) +
geom_col() + # bar length = sum of Stress_Level at each age, stacked by gender
coord_flip() + # horizontal bars
scale_y_continuous(labels=comma)+ # add thousands separators
theme_few()+
labs(title="Stress levels among the Age groups within Gender", x="Age", y="Total stress level")+
theme(plot.title=element_text(hjust=0.5))+ # center the title
scale_fill_brewer(palette="Pastel1") # pastel fill colors
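With `stat="identity"` (or `geom_col()`), each bar's length is the sum of `Stress_Level` over every employee of that age, so ages with more employees get longer bars even at the same average stress. The sketch below uses a made-up five-row data frame (`toy` is hypothetical) and base R's `aggregate()` to show what the bars add up, next to the per-age mean as a size-independent alternative:

```r
# Hypothetical toy data (not the Kaggle data)
toy <- data.frame(
  Age = c(25, 25, 25, 40, 40),
  Gender = c("Female", "Male", "Male", "Female", "Non-Binary"),
  Stress_Level = c(4, 6, 8, 3, 9)
)

# Each stacked segment: sum of Stress_Level for one Age x Gender pair
segments <- aggregate(Stress_Level ~ Age + Gender, data = toy, FUN = sum)

# Whole bar: sum over all genders at that age (25 -> 18, 40 -> 12)
bars <- aggregate(Stress_Level ~ Age, data = toy, FUN = sum)

# Mean per age does not grow with group size (25 -> 6, 40 -> 6)
means <- aggregate(Stress_Level ~ Age, data = toy, FUN = mean)
```

Here age 25 gets the longer bar only because it has three employees instead of two; both ages have the same mean.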
# new data frame with the average stress level for each year of experience
experiencedf<- dfclean %>% # original data
select(Experience_Years, Stress_Level) %>% # keep only the columns we need
group_by(Experience_Years)%>% # one group per experience year
summarise(avgstresslvl= mean(Stress_Level)) %>% # average stress level per experience year
data.frame() # convert the result back to a plain data frame
# rows with the minimum and maximum average stress; these are highlighted on the plot
hilo<-experiencedf %>%
filter(avgstresslvl==min(avgstresslvl)| avgstresslvl ==max(avgstresslvl)) %>%
data.frame()
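The `hilo` filter keeps every row equal to the minimum or maximum, so ties at either extreme would all be highlighted. The prose also cites the second-lowest year, which an ascending sort gives directly. A small sketch on made-up averages (`experience_avg`, `hilo_toy`, and `ranked` are hypothetical names, not results from the dataset):

```r
# Hypothetical averages by experience year (not the Kaggle results)
experience_avg <- data.frame(
  Experience_Years = 0:5,
  avgstresslvl = c(5.0, 4.8, 5.1, 5.2, 4.9, 5.0)
)

# Same rule as the hilo filter: keep every row at the min or the max
hilo_toy <- experience_avg[experience_avg$avgstresslvl %in% range(experience_avg$avgstresslvl), ]

# Ascending sort: first row is the lowest year, second the second-lowest
ranked <- experience_avg[order(experience_avg$avgstresslvl), ]
ranked$Experience_Years[1:2]  # years 1 and 4
```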
# x-axis breaks and labels: one per experience year
xaxis<-min(experiencedf$Experience_Years) :max(experiencedf$Experience_Years)
ggplot(experiencedf, aes(x= Experience_Years, y= avgstresslvl))+
geom_line(color="purple",linewidth=0.5)+ # linewidth replaces size for lines in ggplot2 >= 3.4.0
geom_point(shape=21, size=3, color="black",fill='white')+
theme_light()+
labs(title="Average Stress Levels throughout the years", x="Years of experience", y="Average stress level")+
scale_x_continuous(labels=xaxis, breaks= xaxis, minor_breaks = NULL)+
theme(plot.title=element_text(hjust=0.5))+
# fill the minimum and maximum points in black
geom_point(data=hilo, shape=21, size=4, fill= "black", color="black")+
# label only the minimum and maximum points, rounded to 2 decimals
geom_label_repel(data=hilo, aes(label = round(avgstresslvl,2)),
box.padding = 1, point.padding = 1, size = 4, color = 'grey50', segment.color = "purple")
# count employees at each combination of company size and stress level
size<- dfclean %>%
select(Stress_Level,Company_Size) %>%
group_by(Company_Size, Stress_Level) %>%
summarise(counts=n(), .groups='keep') %>%
data.frame()
# order the company sizes from smallest to largest
sizeorder<-c('Small','Medium','Large')
# apply that order by turning Company_Size into a factor
size$Company_Size <-factor(size$Company_Size, levels=sizeorder)
g<-ggplot(size, aes(x=Company_Size, y=Stress_Level, fill=counts)) +
geom_tile(color='black')+
geom_text(aes(label=comma(counts)))+
labs(title="Intensity levels for stress in a workplace size",
x="Company size",
y="Stress level",
fill="Count")+
theme_grey()+
theme(plot.title = element_text(hjust = 0.5))+
scale_x_discrete(limits=levels(size$Company_Size))+
scale_fill_gradient(low='white',high='purple') # white = few employees, purple = many
ggplotly(g, height=500, width=500, tooltip=c("counts","Company_Size","Stress_Level")) %>%
style(hoverlabel=list(bgcolor='white'))
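The darkest tile shows the largest raw count, but company sizes may contribute different numbers of employees, so raw counts alone do not show which size is more stressed. One way to check, sketched on made-up counts (`size_toy` and `mean_stress` are hypothetical), is to convert counts to shares within each company size and compute a count-weighted mean stress level. In this toy data the largest single tile is at a small company, yet large companies have the higher mean:

```r
# Hypothetical tile counts for two company sizes (not the Kaggle data)
size_toy <- data.frame(
  Company_Size = rep(c("Small", "Large"), each = 3),
  Stress_Level = rep(c(2, 5, 8), times = 2),
  counts = c(30, 60, 10, 10, 15, 25)
)

# Share of each stress level within its own company size
size_toy$share <- size_toy$counts /
  ave(size_toy$counts, size_toy$Company_Size, FUN = sum)

# Count-weighted mean stress level per company size
mean_stress <- sapply(
  split(size_toy, size_toy$Company_Size),
  function(d) weighted.mean(d$Stress_Level, d$counts)
)
mean_stress  # Large 5.9, Small 4.4
```

Filling the tiles with `share` instead of `counts` would make the colours comparable across company sizes.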
# number of employees and total leave taken for each health issue in each department
healthdf<- dfclean %>%
select(Health_Issues,Annual_Leaves_Taken, Department) %>% # columns we need
group_by(Department, Health_Issues) %>% # one group per department and health issue
summarise(counts=n(), totleave=sum(Annual_Leaves_Taken), .groups='keep') %>% # employee count and total leave per group
data.frame()
healthpie<-plot_ly(textposition="inside", labels= ~Health_Issues, values= ~totleave) %>%
#Sales dept
add_pie(data=healthdf[healthdf$Department== "Sales",],
name="Sales", title= "Sales", domain=list(row=0,column=0)) %>%
#HR dept
add_pie(data=healthdf[healthdf$Department== "HR",],
name="HR", title= "HR", domain=list(row=0,column=1)) %>%
#Marketing dept
add_pie(data=healthdf[healthdf$Department== "Marketing",],
name="Marketing", title= "Marketing", domain=list(row=0,column=2)) %>%
#finance dept
add_pie(data=healthdf[healthdf$Department== "Finance",],
name="Finance", title= "Finance", domain=list(row=1,column=0)) %>%
#admin dept
add_pie(data=healthdf[healthdf$Department== "Admin",],
name="Admin", title= "Admin", domain=list(row=1,column=1)) %>%
#IT dept
add_pie(data=healthdf[healthdf$Department== "IT",],
name="IT", title= "IT", domain=list(row=1,column=2)) %>%
layout(title="Trellis Chart: Amount of leave taken (by hours) by each health issue",
showlegend=TRUE, grid=list(rows=2, columns=3)) # 6 departments fill a 2 x 3 grid
healthpie
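Each pie shows the share of a department's total leave in each health-issue group, so comparing departments means comparing those shares. A minimal sketch with made-up totals (`health_toy`, `both`, and `top_dept` are hypothetical names) computes the percentages that the slices display and picks the department where "Both" has the largest share:

```r
# Hypothetical leave totals for two departments (not the Kaggle data)
health_toy <- data.frame(
  Department = c("HR", "HR", "IT", "IT"),
  Health_Issues = c("Both", "None", "Both", "None"),
  totleave = c(40, 60, 20, 80)
)

# Percentage of each department's leave in each health-issue group
# (the slice sizes of that department's pie)
health_toy$pct_leave <- 100 * health_toy$totleave /
  ave(health_toy$totleave, health_toy$Department, FUN = sum)

# Department where "Both" takes the largest share of leave
both <- health_toy[health_toy$Health_Issues == "Both", ]
top_dept <- as.character(both$Department[which.max(both$pct_leave)])
top_dept  # "HR" (40% of its leave vs 20% in IT)
```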
burnout<-dfclean %>%
select(Gender,Burnout_Symptoms) %>%
mutate(symptoms=factor(Burnout_Symptoms, levels=c("Yes", "No", "Occasional"))) %>%
group_by(Gender,symptoms) %>%
summarise(counts=n(), .groups='keep') %>%
group_by(symptoms) %>%
# within each symptom category, each gender's share of the responses (%)
mutate(percent_total=round(100*counts/sum(counts),1))%>%
data.frame()
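`percent_total` is computed after `group_by(symptoms)`, so it is each gender's share of one symptom category, not each symptom's share of one gender. A separate donut ring per gender shows the latter, because plotly normalizes each ring's values by that ring's own total. A base-R sketch on made-up counts (`burn_toy`, `tab`, `within_symptom`, and `within_gender` are hypothetical) contrasts the two with `xtabs()` and `prop.table()`:

```r
# Hypothetical burnout counts (not the Kaggle data)
burn_toy <- data.frame(
  Gender = c("Female", "Female", "Male", "Male"),
  symptoms = c("Yes", "No", "Yes", "No"),
  counts = c(30, 70, 60, 40)
)

# Cross-tabulate counts: rows = Gender, columns = symptoms
tab <- xtabs(counts ~ Gender + symptoms, data = burn_toy)

# margin = 2: each gender's share within a symptom column (like percent_total)
within_symptom <- prop.table(tab, margin = 2)

# margin = 1: each symptom's share within a gender row (what one ring shows)
within_gender <- prop.table(tab, margin = 1)

round(100 * within_symptom["Female", "Yes"], 1)  # 33.3
round(100 * within_gender["Female", "Yes"], 1)   # 30
```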
plot_ly(hole=0.7)%>%
layout(title="Burnout symptoms within genders")%>%
# outer ring: female employees (each ring plots counts, normalized per gender by plotly)
add_trace(data=burnout[burnout$Gender=="Female",],
labels=~symptoms,
values=~counts,
type="pie",
textposition="inside",
hovertemplate= "Gender:Female <br> Burnout symptom:%{label}<br>Percent:%{percent}<br> Count:%{value}<extra></extra>") %>%
# middle ring: male employees
add_trace(data=burnout[burnout$Gender=="Male",],
labels=~symptoms,
values=~counts,
type="pie",
textposition="inside",
hovertemplate= "Gender:Male <br> Burnout symptom:%{label}<br>Percent:%{percent}<br> Count:%{value}<extra></extra>",
domain=list(
x=c(0.16,0.84),
y=c(0.16,0.84)))%>%
# inner ring: non-binary employees
add_trace(data=burnout[burnout$Gender=="Non-Binary",],
labels=~symptoms,
values=~counts,
type="pie",
textposition="inside",
hovertemplate= "Gender:Non-Binary <br> Burnout symptom:%{label}<br>Percent:%{percent}<br> Count:%{value}<extra></extra>",
domain=list(
x=c(0.27,0.73),
y=c(0.27,0.73)))