Use the following data to produce 1 table of summary information and 2-3 graphs.

Notes on the data: the total, professional, and computer_all job groups are available for all four years of data. Focus on these if you want to produce graphs of jobs over time. The individual occupations change from one year to the next, so you will not be able to graph them over time (with the exception of computer programmers). You may, however, filter one year of data and make a bar or point plot for each occupation in that year.

The All variable is measured as total jobs in that category, while Women, Black, Asian, and Latino are the percent of workers who identify with each group. You cannot put All on the same plot as one of these variables, since they are measured in different units.

The goal of the assignment is not only to practice making plots: your task is to present the data in a clear and meaningful way. Points will be deducted, for example, from plots that are hard to read or understand. You are also encouraged to think about how to use colors, labels, and themes effectively. Graphs with multiple groups must have a legend.

Although you will need to filter the data, it does not need to be cleaned up too much in order to be graphed. Don’t overthink it!

#setwd("~/Binghamton/harp325")
data <- read.csv("occupation_gender_race.csv", stringsAsFactors = F, fileEncoding="UTF-8-BOM")

library(dplyr)
library(ggplot2)
library(tidyr) #installing tidyr so I can use the pivot_longer() function
library(ggthemes) #installing ggthemes so that I can use the minimal theme for my line graph
str(data) #checking type of data
'data.frame':   70 obs. of  8 variables:
 $ job_type   : chr  "total" "professional" "computer_all" "computer" ...
 $ description: chr  "Total, 16 years and over" "Professional and related occupations" "Computer and mathematical occupations" "Computer and information research scientists" ...
 $ year       : int  2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
 $ All        : num  147795 36502 5603 42 594 ...
 $ Women      : num  46.8 57 25.2 NA 35.6 11.4 21.1 19.4 25.1 27.8 ...
 $ Black      : num  12.1 10.5 9.1 NA 9.7 11.9 6.3 6.2 12 3.7 ...
 $ Asian      : num  6.4 10.1 23 NA 18.7 6.9 28.3 34.1 29.6 16.2 ...
 $ Latino     : num  17.6 10.1 8.4 NA 8.1 15.8 6.6 5.9 9.2 5.9 ...
computer_data <- data %>% filter(job_type=="computer_all") 
ggplot(computer_data)+
  geom_line(aes(x = year, y = Women, color = "Women"))+
  geom_line(aes(x = year, y = Black, color = "Black"))+
  geom_line(aes(x = year, y = Asian, color = "Asian"))+
  geom_line(aes(x = year, y = Latino, color = "Latino"))+
  labs(y = "Percent of the workers", x = "Year", title = "Various Demographics in Computer Science and Mathematics related careers from 2005 to 2020", color = "Demographic group")+
  theme_minimal()

NA

As this graph shows, there has been various changes in diversity in computer science and mathematics related careers. Out of the three racial groups, the asian demographic group have seen the most growth in representation. The asian group is also significantly more represented than the black group and Latino group. In 2005, there was a 7% difference between the asian group and the black and Latino group. This gap has only widened over time. The percentage of Latino and black people has increased from 2005. However this growth pales in comparison to the growth that the Asian demographic group has. The percentage of women in the field is the only observed group that has decreased over time. This is alarming because the gender diversity was already poor and it is only decreasing over the years.

summary_data<-data %>% 
  group_by(year) %>% 
  filter(job_type=="computer") %>% 
  mutate("POC"=Black+Asian+Latino) %>% 
  summarise(avg_POC_percent=mean(POC,na.rm=T),
            min_POC_percent=min(POC,na.rm=T),
            max_POC_percent=max(POC,na.rm=T),
            avg_women_percent=mean(Women,na.rm=T),
            min_women_percent=min(Women,na.rm=T),
            max_women_percent=max(Women,na.rm=T),
            avg_Black_percent=mean(Black,na.rm=T),
            min_Black_percent=min(Black,na.rm=T),
            max_Black_percent=max(Black,na.rm=T),
            avg_Asian_percent=mean(Asian,na.rm=T),
            min_Asian_percent=min(Asian,na.rm=T),
            max_Asian_percent=max(Asian,na.rm=T),
            avg_Latino_percent=mean(Latino,na.rm=T),
            min_Latino_percent=min(Latino,na.rm=T),
            max_Latino_percent=max(Latino,na.rm=T))

This summary table captures the average

bar_data<-data %>% filter(job_type=="computer_all") %>% select(-c(job_type,description))
bar_data$Women<-bar_data$Women*bar_data$All/100
bar_data$Black<-bar_data$Black*bar_data$All/100
bar_data$Latino<-bar_data$Latino*bar_data$All/100
bar_data$Asian<-bar_data$Asian*bar_data$All/100

long_data <- bar_data %>% 
  #The column titles become the categories in a new column after the reshaping
  #I'm naming this new column Gender
  #The number of students are put in a new values column, which I'm calling Majors
  pivot_longer(-year, names_to = "Demographic", values_to = "Careers")

ggplot(long_data, aes(x = Demographic, y = Careers, group = Demographic, fill = Demographic)) +
  geom_bar(stat="Identity") +
  labs(y = "Number of people employed", x = "Groups", 
       title = "Demographic Groups in Computer Science and Mathemathics Occupations, 2020",
       color = "Gender")+
  scale_fill_manual(values = c("red","green","blue","yellow","black"))+
  theme(legend.position="none")

LS0tCnRpdGxlOiAiSGFycCAzMjUgTWlkdGVybSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKVXNlIHRoZSBmb2xsb3dpbmcgZGF0YSB0byBwcm9kdWNlIDEgdGFibGUgb2Ygc3VtbWFyeSBpbmZvcm1hdGlvbiBhbmQgMi0zIGdyYXBocy4gCgpOb3RlcyBvbiB0aGUgZGF0YTogdGhlIHRvdGFsLCBwcm9mZXNzaW9uYWwsIGFuZCBjb21wdXRlcl9hbGwgam9iIGdyb3VwcyBhcmUgYXZhaWxhYmxlIGZvciBhbGwgZm91ciB5ZWFycyBvZiBkYXRhLiBGb2N1cyBvbiB0aGVzZSBpZiB5b3Ugd2FudCB0byBwcm9kdWNlIGdyYXBocyBvZiBqb2JzIG92ZXIgdGltZS4gVGhlIGluZGl2aWR1YWwgb2NjdXBhdGlvbnMgY2hhbmdlIGZyb20gb25lIHllYXIgdG8gdGhlIG5leHQsIHNvIHlvdSB3aWxsIG5vdCBiZSBhYmxlIHRvIGdyYXBoIHRoZW0gb3ZlciB0aW1lICh3aXRoIHRoZSBleGNlcHRpb24gb2YgY29tcHV0ZXIgcHJvZ3JhbW1lcnMpLiBZb3UgbWF5LCBob3dldmVyLCBmaWx0ZXIgb25lIHllYXIgb2YgZGF0YSBhbmQgbWFrZSBhIGJhciBvciBwb2ludCBwbG90IGZvciBlYWNoIG9jY3VwYXRpb24gaW4gdGhhdCB5ZWFyLiAKClRoZSBBbGwgdmFyaWFibGUgaXMgbWVhc3VyZWQgYXMgdG90YWwgam9icyBpbiB0aGF0IGNhdGVnb3J5LCB3aGlsZSBXb21lbiwgQmxhY2ssIEFzaWFuLCBhbmQgTGF0aW5vIGFyZSB0aGUgcGVyY2VudCBvZiB3b3JrZXJzIHdobyBpZGVudGlmeSB3aXRoIGVhY2ggZ3JvdXAuIFlvdSBjYW5ub3QgcHV0IEFsbCBvbiB0aGUgc2FtZSBwbG90IGFzIG9uZSBvZiB0aGVzZSB2YXJpYWJsZXMsIHNpbmNlIHRoZXkgYXJlIG1lYXN1cmVkIGluIGRpZmZlcmVudCB1bml0cy4gCgpUaGUgZ29hbCBvZiB0aGUgYXNzaWdubWVudCBpcyBub3Qgb25seSB0byBwcmFjdGljZSBtYWtpbmcgcGxvdHM6IHlvdXIgdGFzayBpcyB0byBwcmVzZW50IHRoZSBkYXRhIGluIGEgY2xlYXIgYW5kIG1lYW5pbmdmdWwgd2F5LiBQb2ludHMgd2lsbCBiZSBkZWR1Y3RlZCwgZm9yIGV4YW1wbGUsIGZyb20gcGxvdHMgdGhhdCBhcmUgaGFyZCB0byByZWFkIG9yIHVuZGVyc3RhbmQuIFlvdSBhcmUgYWxzbyBlbmNvdXJhZ2VkIHRvIHRoaW5rIGFib3V0IGhvdyB0byB1c2UgY29sb3JzLCBsYWJlbHMsIGFuZCB0aGVtZXMgZWZmZWN0aXZlbHkuIEdyYXBocyB3aXRoIG11bHRpcGxlIGdyb3VwcyBtdXN0IGhhdmUgYSBsZWdlbmQuIAoKQWx0aG91Z2ggeW91IHdpbGwgbmVlZCB0byBmaWx0ZXIgdGhlIGRhdGEsIGl0IGRvZXMgbm90IG5lZWQgdG8gYmUgY2xlYW5lZCB1cCB0b28gbXVjaCBpbiBvcmRlciB0byBiZSBncmFwaGVkLiBEb24ndCBvdmVydGhpbmsgaXQhCgpgYGB7cn0KI3NldHdkKCJ+L0JpbmdoYW10b24vaGFycDMyNSIpCmRhdGEgPC0gcmVhZC5jc3YoIm9jY3VwYXRpb25fZ2VuZGVyX3JhY2UuY3N2Iiwgc3RyaW5nc0FzRmFjdG9ycyA9IEYsIGZpbGVFbmNvZGluZz0iVVRGLTgtQk9NIikKCmxpYnJhcnkoZHBseXIpCmxpYnJhcnkoZ2dwbG90MikKbGlicmFyeSh0aWR5cikgI2luc3RhbGxpbmcgdGlkeXIgc28gSSBjYW4gdXNlIHRoZSBwaXZvdF9sb25nZXIoKSBmdW5jdGlvbgpsaWJyYXJ5KGdndGhlbWVzKSAjaW5zdGFsbGluZyBnZ3RoZW1lcyBzbyB0aGF0IEkgY2FuIHVzZSB0aGUgbWluaW1hbCB0aGVtZSBmb3IgbXkgbGluZSBncmFwaApgYGAKYGBge3J9CnN0cihkYXRhKSAjY2hlY2tpbmcgdHlwZSBvZiBkYXRhCmBgYApgYGB7cn0KY29tcHV0ZXJfZGF0YSA8LSBkYXRhICU+JSBmaWx0ZXIoam9iX3R5cGU9PSJjb21wdXRlcl9hbGwiKSAKZ2dwbG90KGNvbXB1dGVyX2RhdGEpKwogIGdlb21fbGluZShhZXMoeCA9IHllYXIsIHkgPSBXb21lbiwgY29sb3IgPSAiV29tZW4iKSkrCiAgZ2VvbV9saW5lKGFlcyh4ID0geWVhciwgeSA9IEJsYWNrLCBjb2xvciA9ICJCbGFjayIpKSsKICBnZW9tX2xpbmUoYWVzKHggPSB5ZWFyLCB5ID0gQXNpYW4sIGNvbG9yID0gIkFzaWFuIikpKwogIGdlb21fbGluZShhZXMoeCA9IHllYXIsIHkgPSBMYXRpbm8sIGNvbG9yID0gIkxhdGlubyIpKSsKICBsYWJzKHkgPSAiUGVyY2VudCBvZiB0aGUgd29ya2VycyIsIHggPSAiWWVhciIsIHRpdGxlID0gIlZhcmlvdXMgRGVtb2dyYXBoaWNzIGluIENvbXB1dGVyIFNjaWVuY2UgYW5kIE1hdGhlbWF0aWNzIHJlbGF0ZWQgY2FyZWVycyBmcm9tIDIwMDUgdG8gMjAyMCIsIGNvbG9yID0gIkRlbW9ncmFwaGljIGdyb3VwIikrCiAgdGhlbWVfbWluaW1hbCgpCiAgCmBgYApBcyB0aGlzIGdyYXBoIHNob3dzLCB0aGVyZSBoYXMgYmVlbiB2YXJpb3VzIGNoYW5nZXMgaW4gZGl2ZXJzaXR5IGluIGNvbXB1dGVyIHNjaWVuY2UgYW5kIG1hdGhlbWF0aWNzIHJlbGF0ZWQgY2FyZWVycy4gT3V0IG9mIHRoZSB0aHJlZSByYWNpYWwgZ3JvdXBzLCB0aGUgYXNpYW4gZGVtb2dyYXBoaWMgZ3JvdXAgaGF2ZSBzZWVuIHRoZSBtb3N0IGdyb3d0aCBpbiByZXByZXNlbnRhdGlvbi4gVGhlIGFzaWFuIGdyb3VwIGlzIGFsc28gc2lnbmlmaWNhbnRseSBtb3JlIHJlcHJlc2VudGVkIHRoYW4gdGhlIGJsYWNrIGdyb3VwIGFuZCBMYXRpbm8gZ3JvdXAuIEluIDIwMDUsIHRoZXJlIHdhcyBhIDclIGRpZmZlcmVuY2UgYmV0d2VlbiB0aGUgYXNpYW4gZ3JvdXAgYW5kIHRoZSBibGFjayBhbmQgTGF0aW5vIGdyb3VwLiBUaGlzIGdhcCBoYXMgb25seSB3aWRlbmVkIG92ZXIgdGltZS4gVGhlIHBlcmNlbnRhZ2Ugb2YgTGF0aW5vIGFuZCBibGFjayBwZW9wbGUgaGFzIGluY3JlYXNlZCBmcm9tIDIwMDUuIEhvd2V2ZXIgdGhpcyBncm93dGggcGFsZXMgaW4gY29tcGFyaXNvbiB0byB0aGUgZ3Jvd3RoIHRoYXQgdGhlIEFzaWFuIGRlbW9ncmFwaGljIGdyb3VwIGhhcy4KVGhlIHBlcmNlbnRhZ2Ugb2Ygd29tZW4gaW4gdGhlIGZpZWxkIGlzIHRoZSBvbmx5IG9ic2VydmVkIGdyb3VwIHRoYXQgaGFzIGRlY3JlYXNlZCBvdmVyIHRpbWUuIFRoaXMgaXMgYWxhcm1pbmcgYmVjYXVzZSB0aGUgZ2VuZGVyIGRpdmVyc2l0eSB3YXMgYWxyZWFkeSBwb29yIGFuZCBpdCBpcyBvbmx5IGRlY3JlYXNpbmcgb3ZlciB0aGUgeWVhcnMuIApgYGB7cn0Kc3VtbWFyeV9kYXRhPC1kYXRhICU+JSAKICBncm91cF9ieSh5ZWFyKSAlPiUgCiAgZmlsdGVyKGpvYl90eXBlPT0iY29tcHV0ZXIiKSAlPiUgCiAgbXV0YXRlKCJQT0MiPUJsYWNrK0FzaWFuK0xhdGlubykgJT4lIAogIHN1bW1hcmlzZShhdmdfUE9DX3BlcmNlbnQ9bWVhbihQT0MsbmEucm09VCksCiAgICAgICAgICAgIG1pbl9QT0NfcGVyY2VudD1taW4oUE9DLG5hLnJtPVQpLAogICAgICAgICAgICBtYXhfUE9DX3BlcmNlbnQ9bWF4KFBPQyxuYS5ybT1UKSwKICAgICAgICAgICAgYXZnX3dvbWVuX3BlcmNlbnQ9bWVhbihXb21lbixuYS5ybT1UKSwKICAgICAgICAgICAgbWluX3dvbWVuX3BlcmNlbnQ9bWluKFdvbWVuLG5hLnJtPVQpLAogICAgICAgICAgICBtYXhfd29tZW5fcGVyY2VudD1tYXgoV29tZW4sbmEucm09VCksCiAgICAgICAgICAgIGF2Z19CbGFja19wZXJjZW50PW1lYW4oQmxhY2ssbmEucm09VCksCiAgICAgICAgICAgIG1pbl9CbGFja19wZXJjZW50PW1pbihCbGFjayxuYS5ybT1UKSwKICAgICAgICAgICAgbWF4X0JsYWNrX3BlcmNlbnQ9bWF4KEJsYWNrLG5hLnJtPVQpLAogICAgICAgICAgICBhdmdfQXNpYW5fcGVyY2VudD1tZWFuKEFzaWFuLG5hLnJtPVQpLAogICAgICAgICAgICBtaW5fQXNpYW5fcGVyY2VudD1taW4oQXNpYW4sbmEucm09VCksCiAgICAgICAgICAgIG1heF9Bc2lhbl9wZXJjZW50PW1heChBc2lhbixuYS5ybT1UKSwKICAgICAgICAgICAgYXZnX0xhdGlub19wZXJjZW50PW1lYW4oTGF0aW5vLG5hLnJtPVQpLAogICAgICAgICAgICBtaW5fTGF0aW5vX3BlcmNlbnQ9bWluKExhdGlubyxuYS5ybT1UKSwKICAgICAgICAgICAgbWF4X0xhdGlub19wZXJjZW50PW1heChMYXRpbm8sbmEucm09VCkpCgpgYGAKVGhpcyBzdW1tYXJ5IHRhYmxlIGNhcHR1cmVzIHRoZSBhdmVyYWdlIApgYGB7cn0KYmFyX2RhdGE8LWRhdGEgJT4lIGZpbHRlcihqb2JfdHlwZT09ImNvbXB1dGVyX2FsbCIpICU+JSBzZWxlY3QoLWMoam9iX3R5cGUsZGVzY3JpcHRpb24pKQpiYXJfZGF0YSRXb21lbjwtYmFyX2RhdGEkV29tZW4qYmFyX2RhdGEkQWxsLzEwMApiYXJfZGF0YSRCbGFjazwtYmFyX2RhdGEkQmxhY2sqYmFyX2RhdGEkQWxsLzEwMApiYXJfZGF0YSRMYXRpbm88LWJhcl9kYXRhJExhdGlubypiYXJfZGF0YSRBbGwvMTAwCmJhcl9kYXRhJEFzaWFuPC1iYXJfZGF0YSRBc2lhbipiYXJfZGF0YSRBbGwvMTAwCgpsb25nX2RhdGEgPC0gYmFyX2RhdGEgJT4lIAogICNUaGUgY29sdW1uIHRpdGxlcyBiZWNvbWUgdGhlIGNhdGVnb3JpZXMgaW4gYSBuZXcgY29sdW1uIGFmdGVyIHRoZSByZXNoYXBpbmcKICAjSSdtIG5hbWluZyB0aGlzIG5ldyBjb2x1bW4gR2VuZGVyCiAgI1RoZSBudW1iZXIgb2Ygc3R1ZGVudHMgYXJlIHB1dCBpbiBhIG5ldyB2YWx1ZXMgY29sdW1uLCB3aGljaCBJJ20gY2FsbGluZyBNYWpvcnMKICBwaXZvdF9sb25nZXIoLXllYXIsIG5hbWVzX3RvID0gIkRlbW9ncmFwaGljIiwgdmFsdWVzX3RvID0gIkNhcmVlcnMiKQoKZ2dwbG90KGxvbmdfZGF0YSwgYWVzKHggPSBEZW1vZ3JhcGhpYywgeSA9IENhcmVlcnMsIGdyb3VwID0gRGVtb2dyYXBoaWMsIGZpbGwgPSBEZW1vZ3JhcGhpYykpICsKICBnZW9tX2JhcihzdGF0PSJJZGVudGl0eSIpICsKICBsYWJzKHkgPSAiTnVtYmVyIG9mIHBlb3BsZSBlbXBsb3llZCIsIHggPSAiR3JvdXBzIiwgCiAgICAgICB0aXRsZSA9ICJEZW1vZ3JhcGhpYyBHcm91cHMgaW4gQ29tcHV0ZXIgU2NpZW5jZSBhbmQgTWF0aGVtYXRoaWNzIE9jY3VwYXRpb25zLCAyMDIwIiwKICAgICAgIGNvbG9yID0gIkdlbmRlciIpKwogIHNjYWxlX2ZpbGxfbWFudWFsKHZhbHVlcyA9IGMoInJlZCIsImdyZWVuIiwiYmx1ZSIsInllbGxvdyIsImJsYWNrIikpKwogIHRoZW1lKGxlZ2VuZC5wb3NpdGlvbj0ibm9uZSIpCmBgYAoK