Use the following data to produce 1 table of summary information and 2-3 graphs.

Notes on the data: the total, professional, and computer_all job groups are available for all four years of data. Focus on these if you want to produce graphs of jobs over time. The individual occupations change from one year to the next, so you will not be able to graph them over time (with the exception of computer programmers). You may, however, filter one year of data and make a bar or point plot for each occupation in that year.

The All variable is measured as total jobs in that category, while Women, Black, Asian, and Latino are the percent of workers who identify with each group. You cannot put All on the same plot as one of these variables, since they are measured in different units.

The goal of the assignment is not only to practice making plots: your task is to present the data in a clear and meaningful way. Points will be deducted, for example, from plots that are hard to read or understand. You are also encouraged to think about how to use colors, labels, and themes effectively. Graphs with multiple groups must have a legend.

Although you will need to filter the data, it does not need to be cleaned up too much in order to be graphed. Don’t overthink it!

#setwd("~/Binghamton/harp325")
data <- read.csv("occupation_gender_race.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

library(dplyr)
library(ggplot2)
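#Summary table: all workers, all computer and mathematical occupations, and two individual professions (actuaries and statisticians). It is NOT used in any of the plots.
#Named summary_table rather than summary so it does not mask base R's summary() function.
summary_table <- data %>%
  filter(description %in% c("Total, 16 years and over",
                            "Computer and mathematical occupations",
                            "Actuaries",
                            "Statisticians"))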
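
The filter above builds the summary table but never displays it. A minimal sketch of one way to trim and print such a table follows. It runs on a small toy data frame instead of the real CSV: every value in `toy` is a made-up placeholder, and the choice of columns to keep is an assumption for illustration.

```r
library(dplyr)

# Toy stand-in for the CSV. All numbers are invented placeholders, not real data.
toy <- data.frame(
  description = c("Actuaries", "Statisticians", "Actuaries", "Statisticians"),
  year = c(2010, 2010, 2020, 2020),
  Women = c(30, 45, 35, 50),
  Black = c(5, 6, 6, 7),
  stringsAsFactors = FALSE
)

# Keep the occupations of interest, select the columns to report,
# and sort so each occupation's years sit together.
summary_table <- toy %>%
  filter(description %in% c("Actuaries", "Statisticians")) %>%
  select(description, year, Women, Black) %>%
  arrange(description, year)

# Print the table so it appears in the knitted output.
print(summary_table)
```

Sorting by occupation and then by year keeps each occupation's rows adjacent, which makes changes over time easier to read in a plain printed table.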

#Data frame for professional job_type
summary.professional <- data %>% filter(job_type == "professional")
#Data frame for total job_type
summary.total <- data %>% filter(job_type == "total")
#Data frame for computer_all job_type
summary.computer.all <- data %>% filter(job_type == "computer_all")
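
The assignment notes also allow filtering a single year and plotting each occupation in that year. The sketch below shows the idea on a toy data frame. The values and the year 2020 are placeholders chosen for illustration, and the real file may list different occupations in any given year.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the CSV. All numbers are invented placeholders, not real data.
toy <- data.frame(
  description = c("Actuaries", "Statisticians", "Computer programmers",
                  "Actuaries", "Statisticians"),
  year = c(2020, 2020, 2020, 2010, 2010),
  Women = c(35, 50, 20, 30, 45),
  stringsAsFactors = FALSE
)

# Keep one year only, since the occupation list changes between years.
one_year <- toy %>% filter(year == 2020)

# Horizontal bars, ordered by value, keep long occupation names readable.
p <- ggplot(one_year, aes(x = reorder(description, Women), y = Women)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  theme_minimal() +
  labs(x = NULL, y = "Percent Women",
       title = "Percent Women by Occupation, 2020 (toy data)")
print(p)
```

Only a percent variable is plotted here. Following the assignment notes, the All counts would need their own plot because they are measured in different units.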

#ggplot for computer jobs by percent race/ethnicity over time
#Colors are given as a named vector so each color is tied to its group. Unnamed labels are matched to the groups in alphabetical order (Asian, Black, Hispanic/Latino), which would swap the Black and Asian legend entries.
ggplot(summary.computer.all, aes(x = year)) +
  geom_line(aes(y = Black, color = "Black")) +
  geom_line(aes(y = Asian, color = "Asian")) +
  geom_line(aes(y = Hispanic.Latino, color = "Hispanic/Latino")) +
  scale_color_manual(values = c("Black" = "red", "Asian" = "blue", "Hispanic/Latino" = "green")) +
  theme_minimal() +
  labs(y = "Percent of Workers", x = "Year", title = "Percent of Workers by Race/Ethnicity Over Time: Computer Jobs", color = "Race/Ethnicity")

#ggplot for professional jobs by percent race/ethnicity over time
#Colors are given as a named vector so each color is tied to its group. Unnamed labels are matched to the groups in alphabetical order (Asian, Black, Hispanic/Latino), which would swap the Black and Asian legend entries.
ggplot(summary.professional, aes(x = year)) +
  geom_line(aes(y = Black, color = "Black")) +
  geom_line(aes(y = Asian, color = "Asian")) +
  geom_line(aes(y = Hispanic.Latino, color = "Hispanic/Latino")) +
  scale_color_manual(values = c("Black" = "red", "Asian" = "blue", "Hispanic/Latino" = "green")) +
  theme_minimal() +
  labs(y = "Percent of Workers", x = "Year", title = "Percent of Workers by Race/Ethnicity Over Time: Professional Jobs", color = "Race/Ethnicity")

#ggplot for total jobs by percent race/ethnicity over time
#Colors are given as a named vector so each color is tied to its group. Unnamed labels are matched to the groups in alphabetical order (Asian, Black, Hispanic/Latino), which would swap the Black and Asian legend entries.
ggplot(summary.total, aes(x = year)) +
  geom_line(aes(y = Black, color = "Black")) +
  geom_line(aes(y = Asian, color = "Asian")) +
  geom_line(aes(y = Hispanic.Latino, color = "Hispanic/Latino")) +
  scale_color_manual(values = c("Black" = "red", "Asian" = "blue", "Hispanic/Latino" = "green")) +
  theme_minimal() +
  labs(y = "Percent of Workers", x = "Year", title = "Percent of Workers by Race/Ethnicity Over Time: Total Jobs", color = "Race/Ethnicity")
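
The three plots above repeat the same code with a different data frame each time. As an alternative, the sketch below reshapes the data to long format and draws one panel per job group. It assumes the `tidyr` package is installed (it is not loaded in the original script), and the toy values are invented placeholders, not figures from the real file.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the CSV. All numbers are invented placeholders, not real data.
toy <- data.frame(
  job_type = rep(c("total", "professional"), each = 2),
  year = rep(c(2010, 2020), times = 2),
  Black = c(11, 12, 9, 10),
  Asian = c(5, 6, 7, 9),
  Hispanic.Latino = c(14, 18, 7, 10),
  stringsAsFactors = FALSE
)

# Long format: one row per job group, year, and demographic group.
long <- toy %>%
  pivot_longer(cols = c(Black, Asian, Hispanic.Latino),
               names_to = "group", values_to = "percent") %>%
  mutate(group = ifelse(group == "Hispanic.Latino", "Hispanic/Latino", group))

# One geom_line call draws every group; facet_wrap gives one panel per job group.
# The named values vector ties each color to its group explicitly.
p <- ggplot(long, aes(x = year, y = percent, color = group)) +
  geom_line() +
  facet_wrap(~ job_type) +
  scale_color_manual(values = c("Black" = "red", "Asian" = "blue",
                                "Hispanic/Latino" = "green")) +
  theme_minimal() +
  labs(x = "Year", y = "Percent of Workers", color = "Race/Ethnicity",
       title = "Percent of Workers by Race/Ethnicity (toy data)")
print(p)
```

With the data in long format, the legend is generated from the `group` column itself, so the labels cannot drift out of step with the lines they describe.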

#Key Findings:
#The analysis shows that the percentage of Black employees in computer jobs has increased significantly over the past two decades, but it is still below 25%.
#For professional jobs, diversity has increased slightly, but Black, Asian, and Hispanic/Latino workers still each make up only around 10% of the workforce.
#Across all jobs, the percentages have increased only modestly. Hispanic/Latino workers make up about 17.5% of total workers, while Black workers make up only about 6%.
#In conclusion, while diversity has steadily increased in all fields over the past two decades, diversity levels remain low in some fields, such as professional jobs. This may indicate a lack of diversity in professions that require higher education, and thus a lack of diversity among those who are able to attend higher education.