Harp 325 Midterm Revised

Use the following data to produce 1 table of summary information and 2-3 graphs.

Notes on the data: the total, professional, and computer_all job groups are available for all four years of data. Focus on these if you want to produce graphs of jobs over time. The individual occupations change from one year to the next, so you will not be able to graph them over time (with the exception of computer programmers). You may, however, filter one year of data and make a bar or point plot for each occupation in that year.

The All variable is measured as total jobs in that category, while Women, Black, Asian, and Latino are the percent of workers who identify with each group. You cannot put All on the same plot as one of these variables, since they are measured in different units.
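The units point is easy to check with arithmetic. A percentage column turns into an approximate head count as All × percent / 100, and the result shows why the two kinds of columns cannot share an axis. The data frame `toy` below is hypothetical and its numbers are invented; it only assumes that All is a job count in whatever unit the file uses.

```r
# Hypothetical toy data: NOT taken from occupation_gender_race.csv.
toy <- data.frame(
  job_type = c("computer_all", "professional"),
  All      = c(5000, 40000),  # total jobs (assumed to be in the same unit as the All column)
  Asian    = c(20, 10)        # percent of workers who identify as Asian
)

# Percent -> approximate count of workers: All * percent / 100
toy$Asian_count <- toy$All * toy$Asian / 100
print(toy)
```

In this toy example the computer group has the higher percentage (20 vs 10) but the smaller head count (1000 vs 4000). Plotting All next to a percentage on one axis would therefore mix two different questions.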

The goal of the assignment is not only to practice making plots: your task is to present the data in a clear and meaningful way. Points will be deducted, for example, from plots that are hard to read or understand. You are also encouraged to think about how to use colors, labels, and themes effectively. Graphs with multiple groups must have a legend.
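One reliable way to get a legend when several groups share a plot is to reshape the data to long format, with one row per job group and demographic, and then map the demographic column to color. The sketch below runs base R's `reshape()` on a hypothetical wide table shaped like a summary table. Its column names are assumed and its values are invented. tidyr's `pivot_longer()` does the same job.

```r
# Hypothetical wide summary table (values are made up for illustration)
wide <- data.frame(
  job_type        = c("computer_all", "professional", "total"),
  Black           = c(8, 10, 12),
  Asian           = c(22, 10, 7),
  Hispanic.Latino = c(8, 10, 18)
)

# Wide -> long: one row per (job_type, group) pair, with the value in `percent`
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Black", "Asian", "Hispanic.Latino"),
  v.names   = "percent",
  timevar   = "group",
  times     = c("Black", "Asian", "Hispanic.Latino"),
  idvar     = "job_type"
)
rownames(long) <- NULL
print(long)
```

With `long`, a single layer such as `geom_point(aes(x = job_type, y = percent, color = group))` draws every group and produces the legend automatically, so one geom per group is no longer needed.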

Although you will need to filter the data, it does not need to be cleaned up too much in order to be graphed. Don’t overthink it!

#setwd("~/Binghamton/harp325")
data <- read.csv("occupation_gender_race.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)




table1 <- data %>%
  filter(job_type %in% c("computer_all", "total", "professional")) %>%
  group_by(job_type, year) %>%
  summarise(Average.Black = mean(Black), Average.Women = mean(Women),
            Average.Asian = mean(Asian), Average.Hispanic.Latino = mean(Hispanic.Latino),
            .groups = "drop")
#Creates a table of the average (mean) percent of Black, women, Asian, and Hispanic/Latino workers in computer, professional, and total jobs for each year.

table1

table2 <- data %>%
  filter(job_type %in% c("computer_all", "total", "professional")) %>%
  group_by(job_type, year) %>%
  summarise(Total.Black = sum(Black), Total.Women = sum(Women),
            Total.Asian = sum(Asian), Total.Hispanic.Latino = sum(Hispanic.Latino),
            .groups = "drop")
#Creates a table of the summed percentages for computer, professional, and total jobs for each year. A sum of percentages is not a head count, and it equals the group's percentage only when each job_type has a single row per year.

table2
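Because table2 adds up percentage columns, it helps to see what such a sum means when a group has more than one row. The sketch below compares a sum, a plain mean, and an All-weighted mean of a percentage in base R. The two occupations and their numbers are hypothetical.

```r
# Hypothetical rows for two occupations in one job group and year (made-up numbers)
occ <- data.frame(
  occupation = c("Occupation A", "Occupation B"),
  All        = c(1000, 3000),  # jobs in each occupation
  Asian      = c(30, 10)       # percent Asian within each occupation
)

sum_pct      <- sum(occ$Asian)                         # 40: not a percentage of any group
mean_pct     <- mean(occ$Asian)                        # 20: treats both occupations as equal in size
weighted_pct <- weighted.mean(occ$Asian, w = occ$All)  # 15: (300 + 300) Asian workers out of 4000 jobs
c(sum = sum_pct, mean = mean_pct, weighted = weighted_pct)
```

Only the weighted mean gives the combined share of Asian workers across both occupations. When each group has exactly one row, all three numbers coincide with that row's percentage.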

ggplot(table1, aes(x = year, y = Average.Asian, color = job_type)) +
  geom_line() +
  geom_point() +
  labs(y = "Asian workers (% of employed)", x = "Year",
       title = "Asians in the workplace over time",
       color = "Job Type") +
  scale_color_manual(breaks = c("computer_all", "professional", "total"),
                     labels = c("Computer", "Professional", "Total"),
                     values = c(computer_all = "green", professional = "blue", total = "red"))
#Creates a line graph of the average percent of Asian workers in computer, professional, and total jobs over time, using the Average.Asian column of table1.

table1_2020 <- table1 %>% filter(year == 2020)
table1_2020

ggplot(data = table1_2020, aes(x = job_type, y = Average.Asian, fill = job_type)) +
  geom_col() +
  labs(title = "Average percent of Asian workers by occupation group in 2020",
       x = "Occupation",
       y = "Asian workers (% of employed)",
       fill = "Job Type") +
  scale_fill_manual(values = c(computer_all = "green", professional = "blue", total = "red")) #Creates a bar graph of the percent of Asian workers in computer and professional occupations in 2020. The third bar, total, covers all employed workers aged 16 and over.

table2_2020 <- table2 %>% filter(year == 2020)
table2_2020

ggplot(table2_2020) +
  geom_point(aes(x = job_type, y = Total.Black, color = "Black"), size = 3) +
  geom_point(aes(x = job_type, y = Total.Asian, color = "Asian"), size = 3) +
  geom_point(aes(x = job_type, y = Total.Hispanic.Latino, color = "Hispanic"), size = 3) +
  labs(title = "Race and Ethnicity by Occupation Group in 2020",
       x = "Job Type",
       y = "Share of workers (%)",
       color = "Race/Ethnicity") +
  scale_color_manual(values = c(Black = "green", Asian = "blue", Hispanic = "red")) #Creates a point graph of the percent of Black, Asian, and Hispanic workers in computer, professional, and total jobs (all employed workers aged 16 and over) in 2020. The Total.* columns equal these percentages only when each job_type has a single row for the year.
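A manual color scale matches its `values` and `labels` to the scale's breaks. For a character variable, ggplot2 takes those breaks as the sorted unique values, so unnamed vectors are paired in alphabetical order rather than in the order they are typed. The base R sketch below does not use ggplot2 itself. It reproduces that ordering rule to show why named values keep each color attached to the intended group.

```r
# Order in which the groups were typed in the labels argument
groups <- c("Black", "Asian", "Hispanic")
# Discrete scales sort character values, so the breaks become "Asian" "Black" "Hispanic"
breaks <- levels(factor(groups))

# Unnamed values are paired with the sorted breaks by position
positional <- setNames(c("green", "blue", "red"), breaks)
# Named values are paired with the breaks by name, regardless of order
named <- c(Black = "green", Asian = "blue", Hispanic = "red")

rbind(positional = positional[breaks], named = named[breaks])
```

In the positional row Asian receives "green", which was meant for Black. A legend built with `labels = c("Black", "Asian", "Hispanic")` would likewise print "Black" beside the Asian key.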

Analysis: My first graph is a line graph showing that, across the years of data from 2010 to 2020, the percentage of Asian employees in computer jobs is higher than their share of professional jobs and of the workforce overall. The computer line sits well above the lines for professional occupations and the total workforce. This reveals a disproportionate concentration of Asian workers in tech jobs compared with their share of overall employment.

My second graph shows the average percentage of Asian employees in three occupational categories: computer jobs, professional occupations, and the total workforce (all employed workers aged 16 and over). The data come from 2020, the most recent year, and provide a snapshot of the current state of Asian representation in these groups. In 2020, Asians accounted for more than 20% of computer workers, but only 10% of professional workers and less than 10% of the total workforce. Focusing on this single year, rather than the full 2010-2020 trend, allows a closer look at the most recent figures. This was important to visualize because, as we learned in class, different groups are concentrated in different professions, and careers in technology and the sciences come up repeatedly in conversations about diversity.

In my third and final graph, I created a point plot comparing the prevalence of Black, Asian, and Hispanic workers across the same categories as graph 2. I was inspired to create this graph after learning about the difficulties a coworker faced when applying for a computer science job in the United States because of his non-citizen status. This made me want to examine differences in occupation by race and ethnicity, since I expected to see disparities between groups across specific professions. As I suspected, the computer industry shows a major gap: Asian workers make up a much larger share of that field than Black or Hispanic workers. Meanwhile, Asian workers make up a much smaller share of the professional workforce and of the total workforce of employed workers aged 16 and over. In the total workforce, Hispanic workers make up the largest share of the three groups. Black and Hispanic workers have very similar shares of professional jobs, both around 10%. Ultimately, the graph displays the 2020 racial and ethnic differences across these major career categories, providing a concrete baseline for continued diversity efforts.