This section of our code sets the chunk options for the markdown file and loads the packages used in the analysis.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
This reads in the data and keeps only the rows for our specific firm, firm 2.
data<- read.csv("F2019_Section_4.csv")
group_2<- data %>%
filter(firmid==2)
In this section of our code, we add derived categories to our firm 2 data. First we compute each worker's age in 2019 from their birth year, then we flag workers aged 45 or above as old and everyone else as young. After that, we bin the ages into five-year age groups.
group_2<-group_2 %>%
mutate(age=2019- birthyear) %>%
mutate(old= ifelse(age>=45, 1, 0))
group_2$old<- factor(group_2$old, labels = c("Young", "Old"))
group_2$age[1:10]
## [1] 49 41 53 49 65 65 69 57 60 33
group_2$old[1:10]
## [1] Old Young Old Old Old Old Old Old Old Young
## Levels: Young Old
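The labeling step above relies on factor() sorting the values 0 and 1 so that the first label goes to 0 and the second to 1. Below is a minimal sketch of the same idea on made-up birth years (the values are hypothetical, not taken from F2019_Section_4.csv). It names the levels explicitly, which is one way to keep the mapping fixed even when a subset contains only young or only old workers.

```r
# Toy birth years (made up for illustration; not from the firm's data)
birthyear <- c(1970, 1978, 1966, 1986, 1974)

age <- 2019 - birthyear
old_flag <- ifelse(age >= 45, 1, 0)

# Naming the levels explicitly pins 0 -> "Young" and 1 -> "Old",
# even if one of the two values happens to be absent from the data.
old <- factor(old_flag, levels = c(0, 1), labels = c("Young", "Old"))

print(data.frame(birthyear, age, old))
```

A worker who is exactly 45 (born in 1974 here) lands in the "Old" group because the comparison is age >= 45.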
group_2<- group_2 %>%
mutate(age_group= cut(age, breaks = c(19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74), labels=c("20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74")))
group_2$age[1:5]
## [1] 49 41 53 49 65
group_2$age_group[1:5]
## [1] 45-49 40-44 50-54 45-49 65-69
## 11 Levels: 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64 ... 70-74
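To make the binning easier to check, here is a small base R sketch of how cut() treats values at the bin edges. The ages are made up for illustration. Building the labels from the break points, as done here, is just one option for keeping the two in sync; it is not how the code above does it.

```r
# Toy ages chosen to sit on and around bin edges (illustrative only)
ages <- c(19, 20, 24, 25, 60, 64, 74, 75)

# Same bin edges as in the analysis: 19, 24, 29, ..., 74
breaks <- seq(19, 74, by = 5)

# Labels derived from the edges: "20-24", "25-29", ..., "70-74"
labels <- paste(breaks[-length(breaks)] + 1, breaks[-1], sep = "-")

age_group <- cut(ages, breaks = breaks, labels = labels)

# Intervals are right-closed by default: (19,24], (24,29], ...
# so 19 and 75 fall outside every bin and become NA.
print(data.frame(ages, age_group))
```

Any worker younger than 20 or older than 74 would get NA for age_group, and table() drops NA values by default, so such workers would not be counted in a later tabulation.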
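In this section, we split the firm 2 data by birth year. Older workers are those born in 1964 or earlier (age 55 or above in 2019), and younger workers are those born in 1965 or later. This cutoff is stricter than the age 45 threshold used for the old variable above. We also created a filter for workers born from 1964 through 1969 (ages 50 to 55), who are at or within five years of that cutoff, and a filter for workers aged 60 to 65, who are close to retiring.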
older_data<-group_2 %>%
filter(birthyear<=1964)
younger_data<- group_2 %>%
filter(birthyear>=1965)
Becoming_OldWorkers<-group_2 %>%
filter(birthyear>=1964, birthyear<=1969)
Retiring_Workers<-group_2 %>%
filter(age>=60, age<=65)
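As a quick sanity check on these cutoffs, the sketch below applies the same four conditions to a handful of hypothetical birth years using plain logical vectors. It is only an illustration on made-up values. It shows that the older and younger filters split the data cleanly, while birth year 1964 satisfies both the older and the becoming-old condition.

```r
# Hypothetical birth years (illustrative only)
birthyear <- c(1950, 1959, 1964, 1965, 1969, 1970, 1990)
age <- 2019 - birthyear

older        <- birthyear <= 1964                      # age 55 and over
younger      <- birthyear >= 1965                      # age 54 and under
becoming_old <- birthyear >= 1964 & birthyear <= 1969  # ages 50 to 55
retiring     <- age >= 60 & age <= 65

# Every worker is in exactly one of older / younger
stopifnot(all(older != younger))

# Birth years that satisfy both the "older" and the "becoming old" condition
print(birthyear[older & becoming_old])

# Birth years that fall in the retiring window
print(birthyear[retiring])
```

If the becoming-old group is meant to exclude workers who are already in the older group, the lower bound would need to be 1965 rather than 1964; whether that is the intent is a judgment for the analysis.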
The following code contains the chi-squared goodness-of-fit test that we talked about on slide 3. The information is within our notes. It compares the age distribution of our specific firm to the age-group proportions from the CIA World Factbook. The very small p-value tells us that the firm's age distribution differs significantly from those reference proportions.
chisq.test(table(group_2$age_group), p=c(.0729, .0806, .0452, .0964, .0962, .0896, .0918, .1183, .1205, .1031, .0854))
##
## Chi-squared test for given probabilities
##
## data: table(group_2$age_group)
## X-squared = 634.52, df = 10, p-value < 2.2e-16
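For readers who want to see what the test computes, here is a minimal sketch on made-up counts and made-up reference shares (three age bands instead of eleven, none of it from the firm's data or the Factbook). It checks that the reference shares sum to 1, which chisq.test() requires unless rescale.p = TRUE is set, and reproduces the statistic by hand.

```r
# Hypothetical counts for three age bands (not the firm's data)
observed <- c(young = 30, middle = 50, old = 20)

# Hypothetical reference shares; they must sum to 1
reference <- c(0.40, 0.40, 0.20)
stopifnot(isTRUE(all.equal(sum(reference), 1)))

result <- chisq.test(observed, p = reference)

# Expected counts under the reference distribution: n * p
print(result$expected)

# The statistic is sum((observed - expected)^2 / expected)
manual <- sum((observed - result$expected)^2 / result$expected)
print(c(statistic = unname(result$statistic), manual = manual))

# A small p-value means the observed shares differ from the reference shares
print(result$p.value)
```

The degrees of freedom are the number of categories minus one, which is why the eleven age groups above give df = 10.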
The next three blocks of code produce the graphs that we have in our presentation. The first graph is on slide 7. It shows the distribution of young and old workers by occupation. We filtered out the other occupations to show only the four most relevant ones.
important_occupations<-group_2 %>%
filter(occupation %in% c("Sales", "Service", "Production", "Office and Administrative"))
ggplot(important_occupations, aes(x=occupation, fill=old)) +
geom_bar() +
coord_flip() +
scale_fill_manual(values=c("red", "blue")) +
labs(x = "Occupation",
y = "Count",
title = "Number of Workers by Occupation") +
theme_classic(base_size = 14) +
theme(legend.title = element_blank(), legend.position = "top")
This code is on slide 6. This graph displays the distribution of young and old workers within their different levels of education.
bar2 <- ggplot(group_2, aes(x=education, fill = old)) +
geom_bar() +
coord_flip() +
scale_fill_manual(values=c("red", "blue")) +
labs(x = "",
y = "Number of Employees",
title = "Number of Employees by Education Level") +
theme_classic(base_size = 14) +
theme(legend.title = element_blank(), legend.position = "top")
bar2
This code is on slide 4. This graph compares the mean daily wage of young and old workers at each education level, with error bars showing the confidence interval around each mean.
ggplot(group_2, aes(x=education, y=dailywage, fill = old)) +
stat_summary(fun.data = mean_sdl, geom = "bar") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.3) +
facet_wrap(~ old, nrow = 2) +
coord_flip() +
labs(x = "Education",
y = "Daily Wage",
title = "Comparison of Young/Old Pay") +
scale_fill_manual(values=c("red", "blue")) +
theme_classic(base_size = 14) +
theme(legend.position = "none")
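The bars and error bars in this graph are group means with confidence intervals. The sketch below computes the same kind of summary as a table in base R, using made-up wages. The helper mean_ci is hypothetical, and the t-based interval it uses is our understanding of what mean_cl_normal reports, which is worth confirming against the Hmisc documentation.

```r
# Hypothetical daily wages for two education levels (not the firm's data)
wages <- data.frame(
  education = rep(c("High school", "College"), each = 4),
  dailywage = c(90, 100, 110, 100, 140, 150, 160, 150)
)

# Mean with a t-based 95% confidence interval (assumed to match
# what mean_cl_normal reports; mean_ci is a helper made up for this sketch)
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  half_width <- qt((1 + level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# One row per education level, columns mean / lower / upper
summary_table <- t(sapply(split(wages$dailywage, wages$education), mean_ci))
print(summary_table)
```

A table like this can sit alongside the graph in the notes so that the exact means and interval widths are available, not just their visual size.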
Overall, we have found that a large number of older workers are leaving the sales department, with few younger workers replacing them. At the same time, a large number of younger workers are coming into service. Because of this, we find it essential to fill these gaps in the company. We can try to fill these positions through more training or by looking into possible immigration of workers.
All graphs can be viewed on slide 11.