Use the following data to produce 1 table of summary information and
2-3 graphs.
Notes on the data: the total, professional, and computer_all job
groups are available for all four years of data. Focus on these if you
want to produce graphs of jobs over time. The individual occupations
change from one year to the next, so you will not be able to graph them
over time (with the exception of computer programmers). You may,
however, filter one year of data and make a bar or point plot for each
occupation in that year.
The All variable is measured as total jobs in that category, while
Women, Black, Asian, and Latino are the percent of workers who identify
with each group. You cannot put All on the same plot as one of these
variables, since they are measured in different units.
The goal of the assignment is not only to practice making plots: your
task is to present the data in a clear and meaningful way. Points will
be deducted, for example, from plots that are hard to read or
understand. You are also encouraged to think about how to use colors,
labels, and themes effectively. Graphs with multiple groups must have a
legend.
Although you will need to filter the data, it does not need to be
cleaned up too much in order to be graphed. Don’t overthink it!
#setwd("~/Binghamton/harp325")
data <- read.csv("occupation_gender_race.csv", stringsAsFactors = F, fileEncoding="UTF-8-BOM")
library(dplyr)
library(ggplot2)
library(tidyr) #installing tidyr so I can use the pivot_longer() function
library(ggthemes) #installing ggthemes so that I can use the minimal theme for my line graph
str(data) #checking type of data
'data.frame': 70 obs. of 8 variables:
$ job_type : chr "total" "professional" "computer_all" "computer" ...
$ description: chr "Total, 16 years and over" "Professional and related occupations" "Computer and mathematical occupations" "Computer and information research scientists" ...
$ year : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
$ All : num 147795 36502 5603 42 594 ...
$ Women : num 46.8 57 25.2 NA 35.6 11.4 21.1 19.4 25.1 27.8 ...
$ Black : num 12.1 10.5 9.1 NA 9.7 11.9 6.3 6.2 12 3.7 ...
$ Asian : num 6.4 10.1 23 NA 18.7 6.9 28.3 34.1 29.6 16.2 ...
$ Latino : num 17.6 10.1 8.4 NA 8.1 15.8 6.6 5.9 9.2 5.9 ...
computer_data <- data %>% filter(job_type=="computer_all")
ggplot(computer_data)+
geom_line(aes(x = year, y = Women, color = "Women"))+
geom_line(aes(x = year, y = Black, color = "Black"))+
geom_line(aes(x = year, y = Asian, color = "Asian"))+
geom_line(aes(x = year, y = Latino, color = "Latino"))+
labs(y = "Percent of the workers", x = "Year", title = "Various Demographics in Computer Science and Mathematics related careers from 2005 to 2020", color = "Demographic group")+
theme_minimal()

NA
As this graph shows, there has been various changes in diversity in
computer science and mathematics related careers. Out of the three
racial groups, the asian demographic group have seen the most growth in
representation. The asian group is also significantly more represented
than the black group and Latino group. In 2005, there was a 7%
difference between the asian group and the black and Latino group. This
gap has only widened over time. The percentage of Latino and black
people has increased from 2005. However this growth pales in comparison
to the growth that the Asian demographic group has. The percentage of
women in the field is the only observed group that has decreased over
time. This is alarming because the gender diversity was already poor and
it is only decreasing over the years.
summary_data<-data %>%
group_by(year) %>%
filter(job_type=="computer") %>%
mutate("POC"=Black+Asian+Latino) %>%
summarise(avg_POC_percent=mean(POC,na.rm=T),
min_POC_percent=min(POC,na.rm=T),
max_POC_percent=max(POC,na.rm=T),
avg_women_percent=mean(Women,na.rm=T),
min_women_percent=min(Women,na.rm=T),
max_women_percent=max(Women,na.rm=T),
avg_Black_percent=mean(Black,na.rm=T),
min_Black_percent=min(Black,na.rm=T),
max_Black_percent=max(Black,na.rm=T),
avg_Asian_percent=mean(Asian,na.rm=T),
min_Asian_percent=min(Asian,na.rm=T),
max_Asian_percent=max(Asian,na.rm=T),
avg_Latino_percent=mean(Latino,na.rm=T),
min_Latino_percent=min(Latino,na.rm=T),
max_Latino_percent=max(Latino,na.rm=T))
This summary table captures the average
bar_data<-data %>% filter(job_type=="computer_all") %>% select(-c(job_type,description))
bar_data$Women<-bar_data$Women*bar_data$All/100
bar_data$Black<-bar_data$Black*bar_data$All/100
bar_data$Latino<-bar_data$Latino*bar_data$All/100
bar_data$Asian<-bar_data$Asian*bar_data$All/100
long_data <- bar_data %>%
#The column titles become the categories in a new column after the reshaping
#I'm naming this new column Gender
#The number of students are put in a new values column, which I'm calling Majors
pivot_longer(-year, names_to = "Demographic", values_to = "Careers")
ggplot(long_data, aes(x = Demographic, y = Careers, group = Demographic, fill = Demographic)) +
geom_bar(stat="Identity") +
labs(y = "Number of people employed", x = "Groups",
title = "Demographic Groups in Computer Science and Mathemathics Occupations, 2020",
color = "Gender")+
scale_fill_manual(values = c("red","green","blue","yellow","black"))+
theme(legend.position="none")

LS0tCnRpdGxlOiAiSGFycCAzMjUgTWlkdGVybSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKVXNlIHRoZSBmb2xsb3dpbmcgZGF0YSB0byBwcm9kdWNlIDEgdGFibGUgb2Ygc3VtbWFyeSBpbmZvcm1hdGlvbiBhbmQgMi0zIGdyYXBocy4gCgpOb3RlcyBvbiB0aGUgZGF0YTogdGhlIHRvdGFsLCBwcm9mZXNzaW9uYWwsIGFuZCBjb21wdXRlcl9hbGwgam9iIGdyb3VwcyBhcmUgYXZhaWxhYmxlIGZvciBhbGwgZm91ciB5ZWFycyBvZiBkYXRhLiBGb2N1cyBvbiB0aGVzZSBpZiB5b3Ugd2FudCB0byBwcm9kdWNlIGdyYXBocyBvZiBqb2JzIG92ZXIgdGltZS4gVGhlIGluZGl2aWR1YWwgb2NjdXBhdGlvbnMgY2hhbmdlIGZyb20gb25lIHllYXIgdG8gdGhlIG5leHQsIHNvIHlvdSB3aWxsIG5vdCBiZSBhYmxlIHRvIGdyYXBoIHRoZW0gb3ZlciB0aW1lICh3aXRoIHRoZSBleGNlcHRpb24gb2YgY29tcHV0ZXIgcHJvZ3JhbW1lcnMpLiBZb3UgbWF5LCBob3dldmVyLCBmaWx0ZXIgb25lIHllYXIgb2YgZGF0YSBhbmQgbWFrZSBhIGJhciBvciBwb2ludCBwbG90IGZvciBlYWNoIG9jY3VwYXRpb24gaW4gdGhhdCB5ZWFyLiAKClRoZSBBbGwgdmFyaWFibGUgaXMgbWVhc3VyZWQgYXMgdG90YWwgam9icyBpbiB0aGF0IGNhdGVnb3J5LCB3aGlsZSBXb21lbiwgQmxhY2ssIEFzaWFuLCBhbmQgTGF0aW5vIGFyZSB0aGUgcGVyY2VudCBvZiB3b3JrZXJzIHdobyBpZGVudGlmeSB3aXRoIGVhY2ggZ3JvdXAuIFlvdSBjYW5ub3QgcHV0IEFsbCBvbiB0aGUgc2FtZSBwbG90IGFzIG9uZSBvZiB0aGVzZSB2YXJpYWJsZXMsIHNpbmNlIHRoZXkgYXJlIG1lYXN1cmVkIGluIGRpZmZlcmVudCB1bml0cy4gCgpUaGUgZ29hbCBvZiB0aGUgYXNzaWdubWVudCBpcyBub3Qgb25seSB0byBwcmFjdGljZSBtYWtpbmcgcGxvdHM6IHlvdXIgdGFzayBpcyB0byBwcmVzZW50IHRoZSBkYXRhIGluIGEgY2xlYXIgYW5kIG1lYW5pbmdmdWwgd2F5LiBQb2ludHMgd2lsbCBiZSBkZWR1Y3RlZCwgZm9yIGV4YW1wbGUsIGZyb20gcGxvdHMgdGhhdCBhcmUgaGFyZCB0byByZWFkIG9yIHVuZGVyc3RhbmQuIFlvdSBhcmUgYWxzbyBlbmNvdXJhZ2VkIHRvIHRoaW5rIGFib3V0IGhvdyB0byB1c2UgY29sb3JzLCBsYWJlbHMsIGFuZCB0aGVtZXMgZWZmZWN0aXZlbHkuIEdyYXBocyB3aXRoIG11bHRpcGxlIGdyb3VwcyBtdXN0IGhhdmUgYSBsZWdlbmQuIAoKQWx0aG91Z2ggeW91IHdpbGwgbmVlZCB0byBmaWx0ZXIgdGhlIGRhdGEsIGl0IGRvZXMgbm90IG5lZWQgdG8gYmUgY2xlYW5lZCB1cCB0b28gbXVjaCBpbiBvcmRlciB0byBiZSBncmFwaGVkLiBEb24ndCBvdmVydGhpbmsgaXQhCgpgYGB7cn0KI3NldHdkKCJ+L0JpbmdoYW10b24vaGFycDMyNSIpCmRhdGEgPC0gcmVhZC5jc3YoIm9jY3VwYXRpb25fZ2VuZGVyX3JhY2UuY3N2Iiwgc3RyaW5nc0FzRmFjdG9ycyA9IEYsIGZpbGVFbmNvZGluZz0iVVRGLTgtQk9NIikKCmxpYnJhcnkoZHBseXIpCmxpYnJhcnkoZ2dwbG90MikKbGlicmFyeSh0aWR5cikgI2luc3RhbGxpbmcgdGlkeXIgc28gSSBjYW4gdXNlIHRoZSBwaXZvdF9sb25nZXIoKSBmdW5jdGlvbgpsaWJyYXJ5KGdndGhlbWVzKSAjaW5zdGFsbGluZyBnZ3RoZW1lcyBzbyB0aGF0IEkgY2FuIHVzZSB0aGUgbWluaW1hbCB0aGVtZSBmb3IgbXkgbGluZSBncmFwaApgYGAKYGBge3J9CnN0cihkYXRhKSAjY2hlY2tpbmcgdHlwZSBvZiBkYXRhCmBgYApgYGB7cn0KY29tcHV0ZXJfZGF0YSA8LSBkYXRhICU+JSBmaWx0ZXIoam9iX3R5cGU9PSJjb21wdXRlcl9hbGwiKSAKZ2dwbG90KGNvbXB1dGVyX2RhdGEpKwogIGdlb21fbGluZShhZXMoeCA9IHllYXIsIHkgPSBXb21lbiwgY29sb3IgPSAiV29tZW4iKSkrCiAgZ2VvbV9saW5lKGFlcyh4ID0geWVhciwgeSA9IEJsYWNrLCBjb2xvciA9ICJCbGFjayIpKSsKICBnZW9tX2xpbmUoYWVzKHggPSB5ZWFyLCB5ID0gQXNpYW4sIGNvbG9yID0gIkFzaWFuIikpKwogIGdlb21fbGluZShhZXMoeCA9IHllYXIsIHkgPSBMYXRpbm8sIGNvbG9yID0gIkxhdGlubyIpKSsKICBsYWJzKHkgPSAiUGVyY2VudCBvZiB0aGUgd29ya2VycyIsIHggPSAiWWVhciIsIHRpdGxlID0gIlZhcmlvdXMgRGVtb2dyYXBoaWNzIGluIENvbXB1dGVyIFNjaWVuY2UgYW5kIE1hdGhlbWF0aWNzIHJlbGF0ZWQgY2FyZWVycyBmcm9tIDIwMDUgdG8gMjAyMCIsIGNvbG9yID0gIkRlbW9ncmFwaGljIGdyb3VwIikrCiAgdGhlbWVfbWluaW1hbCgpCiAgCmBgYApBcyB0aGlzIGdyYXBoIHNob3dzLCB0aGVyZSBoYXMgYmVlbiB2YXJpb3VzIGNoYW5nZXMgaW4gZGl2ZXJzaXR5IGluIGNvbXB1dGVyIHNjaWVuY2UgYW5kIG1hdGhlbWF0aWNzIHJlbGF0ZWQgY2FyZWVycy4gT3V0IG9mIHRoZSB0aHJlZSByYWNpYWwgZ3JvdXBzLCB0aGUgYXNpYW4gZGVtb2dyYXBoaWMgZ3JvdXAgaGF2ZSBzZWVuIHRoZSBtb3N0IGdyb3d0aCBpbiByZXByZXNlbnRhdGlvbi4gVGhlIGFzaWFuIGdyb3VwIGlzIGFsc28gc2lnbmlmaWNhbnRseSBtb3JlIHJlcHJlc2VudGVkIHRoYW4gdGhlIGJsYWNrIGdyb3VwIGFuZCBMYXRpbm8gZ3JvdXAuIEluIDIwMDUsIHRoZXJlIHdhcyBhIDclIGRpZmZlcmVuY2UgYmV0d2VlbiB0aGUgYXNpYW4gZ3JvdXAgYW5kIHRoZSBibGFjayBhbmQgTGF0aW5vIGdyb3VwLiBUaGlzIGdhcCBoYXMgb25seSB3aWRlbmVkIG92ZXIgdGltZS4gVGhlIHBlcmNlbnRhZ2Ugb2YgTGF0aW5vIGFuZCBibGFjayBwZW9wbGUgaGFzIGluY3JlYXNlZCBmcm9tIDIwMDUuIEhvd2V2ZXIgdGhpcyBncm93dGggcGFsZXMgaW4gY29tcGFyaXNvbiB0byB0aGUgZ3Jvd3RoIHRoYXQgdGhlIEFzaWFuIGRlbW9ncmFwaGljIGdyb3VwIGhhcy4KVGhlIHBlcmNlbnRhZ2Ugb2Ygd29tZW4gaW4gdGhlIGZpZWxkIGlzIHRoZSBvbmx5IG9ic2VydmVkIGdyb3VwIHRoYXQgaGFzIGRlY3JlYXNlZCBvdmVyIHRpbWUuIFRoaXMgaXMgYWxhcm1pbmcgYmVjYXVzZSB0aGUgZ2VuZGVyIGRpdmVyc2l0eSB3YXMgYWxyZWFkeSBwb29yIGFuZCBpdCBpcyBvbmx5IGRlY3JlYXNpbmcgb3ZlciB0aGUgeWVhcnMuIApgYGB7cn0Kc3VtbWFyeV9kYXRhPC1kYXRhICU+JSAKICBncm91cF9ieSh5ZWFyKSAlPiUgCiAgZmlsdGVyKGpvYl90eXBlPT0iY29tcHV0ZXIiKSAlPiUgCiAgbXV0YXRlKCJQT0MiPUJsYWNrK0FzaWFuK0xhdGlubykgJT4lIAogIHN1bW1hcmlzZShhdmdfUE9DX3BlcmNlbnQ9bWVhbihQT0MsbmEucm09VCksCiAgICAgICAgICAgIG1pbl9QT0NfcGVyY2VudD1taW4oUE9DLG5hLnJtPVQpLAogICAgICAgICAgICBtYXhfUE9DX3BlcmNlbnQ9bWF4KFBPQyxuYS5ybT1UKSwKICAgICAgICAgICAgYXZnX3dvbWVuX3BlcmNlbnQ9bWVhbihXb21lbixuYS5ybT1UKSwKICAgICAgICAgICAgbWluX3dvbWVuX3BlcmNlbnQ9bWluKFdvbWVuLG5hLnJtPVQpLAogICAgICAgICAgICBtYXhfd29tZW5fcGVyY2VudD1tYXgoV29tZW4sbmEucm09VCksCiAgICAgICAgICAgIGF2Z19CbGFja19wZXJjZW50PW1lYW4oQmxhY2ssbmEucm09VCksCiAgICAgICAgICAgIG1pbl9CbGFja19wZXJjZW50PW1pbihCbGFjayxuYS5ybT1UKSwKICAgICAgICAgICAgbWF4X0JsYWNrX3BlcmNlbnQ9bWF4KEJsYWNrLG5hLnJtPVQpLAogICAgICAgICAgICBhdmdfQXNpYW5fcGVyY2VudD1tZWFuKEFzaWFuLG5hLnJtPVQpLAogICAgICAgICAgICBtaW5fQXNpYW5fcGVyY2VudD1taW4oQXNpYW4sbmEucm09VCksCiAgICAgICAgICAgIG1heF9Bc2lhbl9wZXJjZW50PW1heChBc2lhbixuYS5ybT1UKSwKICAgICAgICAgICAgYXZnX0xhdGlub19wZXJjZW50PW1lYW4oTGF0aW5vLG5hLnJtPVQpLAogICAgICAgICAgICBtaW5fTGF0aW5vX3BlcmNlbnQ9bWluKExhdGlubyxuYS5ybT1UKSwKICAgICAgICAgICAgbWF4X0xhdGlub19wZXJjZW50PW1heChMYXRpbm8sbmEucm09VCkpCgpgYGAKVGhpcyBzdW1tYXJ5IHRhYmxlIGNhcHR1cmVzIHRoZSBhdmVyYWdlIApgYGB7cn0KYmFyX2RhdGE8LWRhdGEgJT4lIGZpbHRlcihqb2JfdHlwZT09ImNvbXB1dGVyX2FsbCIpICU+JSBzZWxlY3QoLWMoam9iX3R5cGUsZGVzY3JpcHRpb24pKQpiYXJfZGF0YSRXb21lbjwtYmFyX2RhdGEkV29tZW4qYmFyX2RhdGEkQWxsLzEwMApiYXJfZGF0YSRCbGFjazwtYmFyX2RhdGEkQmxhY2sqYmFyX2RhdGEkQWxsLzEwMApiYXJfZGF0YSRMYXRpbm88LWJhcl9kYXRhJExhdGlubypiYXJfZGF0YSRBbGwvMTAwCmJhcl9kYXRhJEFzaWFuPC1iYXJfZGF0YSRBc2lhbipiYXJfZGF0YSRBbGwvMTAwCgpsb25nX2RhdGEgPC0gYmFyX2RhdGEgJT4lIAogICNUaGUgY29sdW1uIHRpdGxlcyBiZWNvbWUgdGhlIGNhdGVnb3JpZXMgaW4gYSBuZXcgY29sdW1uIGFmdGVyIHRoZSByZXNoYXBpbmcKICAjSSdtIG5hbWluZyB0aGlzIG5ldyBjb2x1bW4gR2VuZGVyCiAgI1RoZSBudW1iZXIgb2Ygc3R1ZGVudHMgYXJlIHB1dCBpbiBhIG5ldyB2YWx1ZXMgY29sdW1uLCB3aGljaCBJJ20gY2FsbGluZyBNYWpvcnMKICBwaXZvdF9sb25nZXIoLXllYXIsIG5hbWVzX3RvID0gIkRlbW9ncmFwaGljIiwgdmFsdWVzX3RvID0gIkNhcmVlcnMiKQoKZ2dwbG90KGxvbmdfZGF0YSwgYWVzKHggPSBEZW1vZ3JhcGhpYywgeSA9IENhcmVlcnMsIGdyb3VwID0gRGVtb2dyYXBoaWMsIGZpbGwgPSBEZW1vZ3JhcGhpYykpICsKICBnZW9tX2JhcihzdGF0PSJJZGVudGl0eSIpICsKICBsYWJzKHkgPSAiTnVtYmVyIG9mIHBlb3BsZSBlbXBsb3llZCIsIHggPSAiR3JvdXBzIiwgCiAgICAgICB0aXRsZSA9ICJEZW1vZ3JhcGhpYyBHcm91cHMgaW4gQ29tcHV0ZXIgU2NpZW5jZSBhbmQgTWF0aGVtYXRoaWNzIE9jY3VwYXRpb25zLCAyMDIwIiwKICAgICAgIGNvbG9yID0gIkdlbmRlciIpKwogIHNjYWxlX2ZpbGxfbWFudWFsKHZhbHVlcyA9IGMoInJlZCIsImdyZWVuIiwiYmx1ZSIsInllbGxvdyIsImJsYWNrIikpKwogIHRoZW1lKGxlZ2VuZC5wb3NpdGlvbj0ibm9uZSIpCmBgYAoK