Harp 325 Midterm

Use the following data to produce 1 table of summary information and 2-3 graphs.

Notes on the data: the total, professional, and computer_all job groups are available for all four years of data. Focus on these if you want to produce graphs of jobs over time. The individual occupations change from one year to the next, so you will not be able to graph them over time (with the exception of computer programmers). You may, however, filter one year of data and make a bar or point plot for each occupation in that year.

The All variable is measured as total jobs in that category, while Women, Black, Asian, and Latino are the percent of workers who identify with each group. You cannot put All on the same plot as one of these variables, since they are measured in different units.

The goal of the assignment is not only to practice making plots: your task is to present the data in a clear and meaningful way. Points will be deducted, for example, from plots that are hard to read or understand. You are also encouraged to think about how to use colors, labels, and themes effectively. Graphs with multiple groups must have a legend.

Although you will need to filter the data, it does not need to be cleaned up too much in order to be graphed. Don’t overthink it!

#setwd("~/Users/christinacorrado/Documents/Dida 325")
data <- read.csv("/Users/christinacorrado/Documents/Dida 325/occupation_gender_race.csv")
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

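# Keep the three job groups that appear in all four years, then average each
# percentage column by job type and year. Note: Total.Black is also a mean.
table1 <- data %>%
  filter(job_type %in% c("computer_all", "total", "professional")) %>%
  group_by(job_type, year) %>%
  summarise(Total.Black = mean(Black),
            Average.Women = mean(Women),
            Average.Asian = mean(Asian),
            Average.Hispanic.Latino = mean(Hispanic.Latino))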

## `summarise()` has grouped output by 'job_type'. You can override using the
## `.groups` argument.

print(table1) # I wanted to compare the average percentage of each demographic group across job types and years, so I created a column of means for each group with the summarise() function

## # A tibble: 12 × 6
## # Groups:   job_type [3]
##    job_type  year Total.Black Average.Women Average.Asian Average.Hispanic.Lat…¹
##    <chr>    <int>       <dbl>         <dbl>         <dbl>                  <dbl>
##  1 compute…  2005         6.9          27            14.7                    5.3
##  2 compute…  2010         6.7          25.8          16.1                    5.5
##  3 compute…  2015         8.6          24.7          19.9                    6.8
##  4 compute…  2020         9.1          25.2          23                      8.4
##  5 profess…  2005         8.8          56.3           6.6                    6.4
##  6 profess…  2010         9.2          57.4           7                      7.1
##  7 profess…  2015         9.8          57.2           8.7                    8.8
##  8 profess…  2020        10.5          57            10.1                   10.1
##  9 total     2005        10.8          46.4           4.4                   13.1
## 10 total     2010        10.8          47.2           4.8                   14.3
## 11 total     2015        11.7          46.8           5.8                   16.4
## 12 total     2020        12.1          46.8           6.4                   17.6
## # ℹ abbreviated name: ¹Average.Hispanic.Latino
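
As a quick check on how fast a share moved between survey years, the values in the table can be differenced directly. This is only a sketch: the numbers are typed in by hand from the computer_all rows above rather than pulled from the data frame, and the names asian_computer and change are made up for the example.

```r
# Average.Asian for computer_all, copied from the printed table
# (assumption: typed in by hand, not read from the CSV)
asian_computer <- c("2005" = 14.7, "2010" = 16.1, "2015" = 19.9, "2020" = 23.0)

# Change in percentage points from one survey year to the next;
# each result is named after the later year of the pair
change <- round(diff(asian_computer), 1)
print(change)
```

The largest entry marks the five-year window with the steepest rise. The same lines could be repeated for the professional and total rows.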

table2 <- table1 %>% filter(year == 2020) # only analyze the most recent year of the data
table2

# Bar graph of the percentage of women in each of the three job groups in 2020
ggplot(table2) +
  geom_col(aes(x = job_type, y = Average.Women, fill = job_type)) +
  labs(title = "Average % of Women by Occupation in 2020",
       x = "Occupation",
       y = "Average % of Women",
       fill = "Occupation") +
  scale_fill_manual(values = c("blue", "red", "green"))

# Same three job groups, this time summing each percentage column by job type and year
table3 <- data %>%
  filter(job_type %in% c("computer_all", "total", "professional")) %>%
  group_by(job_type, year) %>%
  summarise(Total.Black = sum(Black),
            Total.Women = sum(Women),
            Total.Asian = sum(Asian),
            Total.Hispanic.Latino = sum(Hispanic.Latino))

## `summarise()` has grouped output by 'job_type'. You can override using the
## `.groups` argument.

table3

table4 <- table3 %>% filter(year == 2020)
table4

# Point plot comparing the three groups within each job type in 2020
ggplot(table4) +
  geom_point(aes(x = job_type, y = Total.Black, color = "Black")) +
  geom_point(aes(x = job_type, y = Total.Asian, color = "Asian")) +
  geom_point(aes(x = job_type, y = Total.Hispanic.Latino, color = "Hispanic")) +
  labs(title = "Ethnicity Prevalence by Occupation in 2020",
       x = "Job Type",
       y = "% of Workers",
       color = "Ethnicity") +
  # Named values tie each color to its group regardless of legend order
  scale_color_manual(values = c("Black" = "green", "Asian" = "blue", "Hispanic" = "orange"))

# Line graph of the Asian share of workers in each job group over time
ggplot(table1) +
  geom_line(aes(x = year, y = Average.Asian, color = job_type)) +
  labs(y = "Average % Asian", x = "Year",
       title = "Asians in the workplace over time",
       color = "Job Type") +
  scale_color_manual(values = c("computer_all" = "green", "professional" = "blue", "total" = "orange"))

Analysis: In the first graph, I wanted to analyze the presence of women in the computer, professional, and total workforce in 2020. Women are least prevalent in computer professions, at around 25% of workers. They are the majority in professional careers, at around 57%, and make up slightly under half of the total workforce (46.8%). We spoke about this in class; women are the minority in many male-dominated STEM careers.

In the second graph, I focused on the ethnic spread across the job types in the most recent year (2020). Asians are the largest of the three groups in computer professions, at 23% of workers, compared with about 9% for Black workers and about 8% for Hispanic workers. In professional careers there is very little spread: each group makes up roughly 10% of workers. In the total labor force, Hispanics are the largest of the three groups at 17.6%, followed by Black workers at 12.1% and Asians at 6.4%.

For my last graph, I analyzed the percentage of Asians in the workplace over time, comparing the professional, computer, and total job groups. It is interesting to see that after 2010 the Asian share increased more steeply in each group. Computer professions display the highest representation of Asians: that line sits above the other two in every year. However, Asians make up a small share of total employment throughout, shown by the lowest line, which rises only from 4.4% to 6.4%. They are still a minority.
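
The second graph layers one geom_point() call per group. A possible alternative, sketched here and not part of the submitted code, is to store the 2020 percentages in long format (one row per job type and group) so that a single geom draws every group and the legend comes from the data. The values are typed in from the 2020 rows of the summary table rather than read from the CSV, and the names pct_2020, group, percent, and p are made up for this illustration.

```r
library(ggplot2)

# 2020 percentages copied by hand from the summary table (assumption:
# not read from the CSV). One row per job type and group.
pct_2020 <- data.frame(
  job_type = rep(c("computer_all", "professional", "total"), times = 3),
  group    = rep(c("Black", "Asian", "Hispanic/Latino"), each = 3),
  percent  = c(9.1, 10.5, 12.1,   # Black
               23.0, 10.1, 6.4,   # Asian
               8.4, 10.1, 17.6)   # Hispanic/Latino
)

# Dodged bars keep groups with equal values (10.1 and 10.1 in professional)
# from being drawn on top of each other
p <- ggplot(pct_2020, aes(x = job_type, y = percent, fill = group)) +
  geom_col(position = "dodge") +
  labs(title = "Share of Workers by Group and Job Type, 2020",
       x = "Job Type",
       y = "% of Workers",
       fill = "Group")
print(p)
```

Because the y-axis shows each group's percent of all workers in the job type, the bars can be read directly against the table instead of being compared only to each other.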