DIDA 325 Midterm - Ethan Weiss

Use the following data to produce 1 table of summary information and 2-3 graphs.

Notes on the data: the total, professional, and computer_all job groups are available for all four years of data. Focus on these if you want to produce graphs of jobs over time. The individual occupations change from one year to the next, so you will not be able to graph them over time (with the exception of computer programmers). You may, however, filter one year of data and make a bar or point plot for each occupation in that year.

The All variable is measured as total jobs in that category, while Women, Black, Asian, and Latino are the percent of workers who identify with each group. You cannot put All on the same plot as one of these variables, since they are measured in different units.

The goal of the assignment is not only to practice making plots: your task is to present the data in a clear and meaningful way. Points will be deducted, for example, from plots that are hard to read or understand. You are also encouraged to think about how to use colors, labels, and themes effectively. Graphs with multiple groups must have a legend.

Although you will need to filter the data, it does not need to be cleaned up too much in order to be graphed. Don’t overthink it!

data <- read.csv("http://tinyurl.com/dida325midtermdata", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

library(dplyr)

library(ggplot2)

data <- na.omit(data)

View(data)

support <- data %>% filter(description %in% c("Computer programmers", "Operations research analysts"))

View(support)

support <- support %>% select(description, All, Women, year) %>% filter(year != 2005)
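The assignment notes say that individual occupations change from year to year, so it can help to confirm that an occupation appears in every year before graphing it over time. A minimal sketch of one way to check this, using a small stand-in data frame (the rows below, including "Web developers", are hypothetical and are not taken from the real file):

```r
library(dplyr)

# Hypothetical stand-in rows; the real file has more occupations and columns.
toy <- data.frame(
  description = c("Computer programmers", "Computer programmers",
                  "Computer programmers", "Web developers"),
  year = c(2010, 2015, 2020, 2020)
)

# Number of distinct years in the data
n_years <- n_distinct(toy$year)

# Keep only the occupations that appear in every year
in_every_year <- toy %>%
  distinct(description, year) %>%
  dplyr::count(description, name = "years_present") %>%
  filter(years_present == n_years)

in_every_year
```

The call is written as `dplyr::count()` because `mosaic`, loaded later in this script, masks `count`.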

color <- c("red", "green", "blue")

(percent <- ggplot(support, aes(x = year, y = Women, group = description, color = description))+
  geom_line()+
  theme_classic()+
  labs(x = "Year", y = "Women (% of workers)", color = "Occupation", title = "Percentage of Women in Computer Roles, 2010-2020"))

# Convert the percentage of women into a count of women, in the same units as All
support <- support %>% mutate(Total_Women = (Women / 100) * All)
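As a quick check of the conversion above, here is a minimal sketch on a two-row stand-in data frame. The values are the 2010 figures as reconstructed from the summary output and the write-up in this document, so treat them as illustrative rather than as a copy of the raw file.

```r
library(dplyr)

# Stand-in rows (assumed 2010 values reconstructed from this report)
toy <- data.frame(
  description = c("Computer programmers", "Operations research analysts"),
  All = c(470, 107),
  Women = c(22.0, 46.2)
)

# Percent of women times total jobs gives women in the same units as All
toy <- toy %>% mutate(Total_Women = (Women / 100) * All)

toy
```

Both results (103.4 and 49.434) fall within the range of `Total_Women` values reported in the summary output for each occupation.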

View(support)

(total_women <- ggplot(support, aes(x = year, y = Total_Women, group = description, color = description))+
  geom_line()+
  theme_classic()+
  labs(x = "Year", y = "Total Women", color = "Occupation", title = "Total Women in Computer Roles, 2010-2020"))

(total <- ggplot(support, aes(x = year, y = All, group = description, color = description))+
  geom_line()+
  theme_classic()+
  labs(x = "Year", y = "Total Employees", color = "Occupation", title = "Total Employment in Computer Roles, 2010-2020"))
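The assignment notes point out that All and the percentage variables cannot share one plot because they are measured in different units. One possible way to still view the measures together is to reshape the data to long format and give each measure its own panel with its own y-axis. This is only a sketch: the data frame is a stand-in with values reconstructed from the summary output and the write-up in this document, and the names `toy`, `long`, and `facet_plot` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in data (assumed values reconstructed from this report)
toy <- data.frame(
  description = rep(c("Computer programmers", "Operations research analysts"), each = 3),
  year = rep(c(2010, 2015, 2020), times = 2),
  All = c(470, 480, 417, 107, 123, 156),
  Women = c(22.0, 21.0, 21.1, 46.2, 50.7, 42.9)
)

# One row per occupation, year, and measure
long <- toy %>%
  mutate(Total_Women = (Women / 100) * All) %>%
  pivot_longer(cols = c(All, Women, Total_Women),
               names_to = "measure", values_to = "value")

# One panel per measure, each with its own y-axis scale
facet_plot <- ggplot(long, aes(x = year, y = value, color = description)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ measure, scales = "free_y", ncol = 1) +
  scale_x_continuous(breaks = c(2010, 2015, 2020)) +
  theme_classic() +
  labs(x = "Year", y = NULL, color = "Occupation")

facet_plot
```

Because `scales = "free_y"` lets every panel set its own axis, the counts and the percentages are never drawn against the same scale.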

View(support)

library(mosaic)

program <- support %>% filter(description == "Computer programmers")
research <- support %>% filter(description == "Operations research analysts")

summary(program)

##  description             All            Women            year     
##  Length:3           Min.   :417.0   Min.   :21.00   Min.   :2010  
##  Class :character   1st Qu.:443.5   1st Qu.:21.05   1st Qu.:2012  
##  Mode  :character   Median :470.0   Median :21.10   Median :2015  
##                     Mean   :455.7   Mean   :21.37   Mean   :2015  
##                     3rd Qu.:475.0   3rd Qu.:21.55   3rd Qu.:2018  
##                     Max.   :480.0   Max.   :22.00   Max.   :2020  
##   Total_Women    
##  Min.   : 87.99  
##  1st Qu.: 94.39  
##  Median :100.80  
##  Mean   : 97.40  
##  3rd Qu.:102.10  
##  Max.   :103.40

summary(research)

##  description             All            Women            year     
##  Length:3           Min.   :107.0   Min.   :42.90   Min.   :2010  
##  Class :character   1st Qu.:115.0   1st Qu.:44.55   1st Qu.:2012  
##  Mode  :character   Median :123.0   Median :46.20   Median :2015  
##                     Mean   :128.7   Mean   :46.60   Mean   :2015  
##                     3rd Qu.:139.5   3rd Qu.:48.45   3rd Qu.:2018  
##                     Max.   :156.0   Max.   :50.70   Max.   :2020  
##   Total_Women   
##  Min.   :49.43  
##  1st Qu.:55.90  
##  Median :62.36  
##  Mean   :59.57  
##  3rd Qu.:64.64  
##  Max.   :66.92

Computer_Programmers <- c(455.70, 21.37, 97.40)
Operations_Research_Analysts <- c(128.70, 46.60, 59.57)

table <- rbind(Computer_Programmers, Operations_Research_Analysts)

table <- t(table)

# After transposing, the table has three rows, so three row names are needed
rownames(table) <- c("Mean Total Employment", "Mean Percentage Women", "Mean Total Women")

View(table)
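The means in the table above were typed in by hand from the `summary()` output. An alternative that avoids copying numbers is to compute them directly with `group_by()` and `summarise()`. The sketch below uses a stand-in data frame whose values are reconstructed from the summary output and the write-up in this document; the column names in `summary_table` are illustrative.

```r
library(dplyr)

# Stand-in data (assumed values reconstructed from this report)
toy <- data.frame(
  description = rep(c("Computer programmers", "Operations research analysts"), each = 3),
  year = rep(c(2010, 2015, 2020), times = 2),
  All = c(470, 480, 417, 107, 123, 156),
  Women = c(22.0, 21.0, 21.1, 46.2, 50.7, 42.9)
) %>%
  mutate(Total_Women = (Women / 100) * All)

# One row per occupation with the mean of each measure
summary_table <- toy %>%
  group_by(description) %>%
  summarise(
    Mean_Employment = mean(All),
    Mean_Percent_Women = mean(Women),
    Mean_Total_Women = mean(Total_Women),
    .groups = "drop"
  )

summary_table
```

Rounded to the same precision, these means match the hand-entered values (455.70, 21.37, 97.40 and 128.70, 46.60, 59.57).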

#I chose the employment figures for women in Computer Programmer and Operations Research Analyst roles from 2010 to 2020 to show that the percentage of women in these positions is not increasing over time to better reflect global demographics. Although the total number of women working as Operations Research Analysts increased over the decade, the percentage of women decreased from 46.2 to 42.9 percent. Over the same period, the percentage of women in Computer Programmer roles decreased from 22.0 to 21.1 percent. This data covers only a small sample of roles where demographics are not changing, but it can be paired with other sources, such as a Ms. Magazine column explaining that the lack of women in data science roles is a result of discrimination, which leads fewer women to enter these roles because there are few people to identify with within the career structure.
#Comparing my findings with the Ms. Magazine column, there is a clear problem within the field: women either enter and then leave because of discrimination, or are discouraged from joining altogether because of the lack of representation. This self-reinforcing cycle can only hinder the sector, because the problem will keep getting worse until a structural change creates a better environment for women.
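The percentage changes quoted in the write-up can be reproduced with a short calculation. This is a minimal sketch on a stand-in data frame holding the 2010 and 2020 percentages stated above, with 2015 values reconstructed from the summary output; the names `toy` and `change` are illustrative.

```r
library(dplyr)

# Stand-in data (2010 and 2020 values from the write-up, 2015 reconstructed)
toy <- data.frame(
  description = rep(c("Computer programmers", "Operations research analysts"), each = 3),
  year = rep(c(2010, 2015, 2020), times = 2),
  Women = c(22.0, 21.0, 21.1, 46.2, 50.7, 42.9)
)

# Percentage-point change in the share of women from 2010 to 2020
change <- toy %>%
  filter(year %in% c(2010, 2020)) %>%
  arrange(description, year) %>%
  group_by(description) %>%
  summarise(
    pct_2010 = first(Women),
    pct_2020 = last(Women),
    point_change = pct_2020 - pct_2010,
    .groups = "drop"
  )

change
```

Both changes are negative: about -0.9 percentage points for Computer Programmers and about -3.3 for Operations Research Analysts.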