---
title: "Harp 325 Midterm"
output: html_notebook
---

Use the following data to produce 1 table of summary information and 2-3 graphs. 

Notes on the data: the total, professional, and computer_all job groups are available for all four years of data. Focus on these if you want to produce graphs of jobs over time. The individual occupations change from one year to the next, so you will not be able to graph them over time (with the exception of computer programmers). You may, however, filter one year of data and make a bar or point plot for each occupation in that year. 

The All variable is measured as total jobs in that category, while Women, Black, Asian, and Latino are the percent of workers who identify with each group. You cannot put All on the same plot as one of these variables, since they are measured in different units. 

The goal of the assignment is not only to practice making plots: your task is to present the data in a clear and meaningful way. Points will be deducted, for example, from plots that are hard to read or understand. You are also encouraged to think about how to use colors, labels, and themes effectively. Graphs with multiple groups must have a legend. 

Although you will need to filter the data, it does not need to be cleaned up too much in order to be graphed. Don't overthink it!
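
The note about which job groups appear in all four years can be checked directly before plotting anything over time. The sketch below is one way to do that, assuming the columns `year`, `job_type`, and `description` used later in this notebook; the `toy` rows are hypothetical stand-ins, not the real data.

```r
library(dplyr)

# Hypothetical toy rows that mimic the columns used in this notebook
toy <- data.frame(
  year        = c(2005, 2010, 2015, 2020, 2005, 2010, 2015, 2020, 2015, 2020),
  job_type    = c(rep("total", 4), rep("computer_all", 4), rep("computer", 2)),
  description = c(rep("Total", 4), rep("Computer occupations", 4),
                  rep("Web developers", 2))
)

# Number of distinct years in the data
n_years <- n_distinct(toy$year)

# Count how many distinct years each group appears in and flag full coverage
coverage <- toy %>%
  group_by(job_type, description) %>%
  summarise(years_present = n_distinct(year), .groups = "drop") %>%
  mutate(all_years = years_present == n_years)

print(coverage)
```

Rows with `all_years == TRUE` are safe to graph over time; the rest are better suited to a single-year bar or point plot.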

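```{r}
# Set the working directory and read the data
setwd("~/Documents/Documents - Jess's MacBook Air/DIDA 325")
data <- read.csv("/Users/jess/Documents/Documents - Jess's MacBook Air/DIDA 325/occupation_gender_race.csv")

library(dplyr)
library(ggplot2)

# Install grafify only if it is missing, instead of reinstalling on every run
if (!requireNamespace("grafify", quietly = TRUE)) {
  install.packages("grafify")
}
library(grafify)

```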


```{r}
# Create 1 table of summary information for the total and computer_all categories

# You should include a paragraph or two with your analysis that summarizes key findings from your work.

# Keep only the two aggregate job groups
job_category <- data %>%
  filter(job_type %in% c("total", "computer_all"))

# One row per year and job group
summary_job <- job_category %>%
  group_by(year, job_type) %>%
  summarise(
    Total_Workers = sum(All, na.rm = TRUE),
    Women_Average = mean(Women, na.rm = TRUE),
    Black_Average = mean(Black, na.rm = TRUE),
    Asian_Average = mean(Asian, na.rm = TRUE),
    Hispanic_Latino_Average = mean(Hispanic.Latino, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped table and silence the grouping message
  )

print(summary_job)

```

Based on the summary table for the total and computer_all categories, the total number of workers in computer occupations grew only modestly over the past 15 years. Although computer occupations grew in every five-year period, the total increase over the full 15 years was only around 50%. The share of women in the field decreased, from an average of 27% to 25.2%. Over the same period, people of color made up a larger share of the field: the Asian share rose by more than half since 2005, from 14.7% to 23%. Black and Hispanic/Latino workers did not see the same jump, though their shares increased by an average of about 3% over the 15 years. Overall, the table shows some growth in minority representation in computer occupations, but over 15 years that growth is fairly small. The share of women across all jobs stayed roughly the same while the share of women in computer occupations decreased, which shows that the increases did not apply to every group.
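
Because the demographic columns are percentages, a change can be read either as a difference in percentage points or as a relative change from the starting value. The sketch below separates the two, using only figures quoted in the paragraph above; the helper names `point_change` and `relative_change` are illustrative and not part of the assignment.

```r
# Percentage-point change: simple difference between two percentages
point_change <- function(start, end) end - start

# Relative (percent) change: the difference as a share of the starting value
relative_change <- function(start, end) (end - start) / start * 100

# Figures quoted in the paragraph above for computer occupations
asian_start <- 14.7
asian_end   <- 23
women_start <- 27
women_end   <- 25.2

asian_points   <- point_change(asian_start, asian_end)
asian_relative <- relative_change(asian_start, asian_end)
women_points   <- point_change(women_start, women_end)
women_relative <- relative_change(women_start, women_end)

# Rounded to one decimal place: 8.3 56.5 -1.8 -6.7
round(c(asian_points, asian_relative, women_points, women_relative), 1)
```

On these figures, the Asian share rose by 8.3 points (a relative increase of about 56%), while the share of women fell by 1.8 points (a relative decrease of about 7%).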


```{r}
# Summary table for 1-2 occupations

# Keep only the two occupations of interest
specific_occupation <- data %>%
  filter(description %in% c("Web developers", "Information security analysts"))

# One row per year, job group, and occupation
summary_jobs <- specific_occupation %>%
  group_by(year, job_type, description) %>%
  summarise(
    Total_Workers = sum(All, na.rm = TRUE),
    Women_Average = mean(Women, na.rm = TRUE),
    Black_Average = mean(Black, na.rm = TRUE),
    Asian_Average = mean(Asian, na.rm = TRUE),
    Hispanic_Latino_Average = mean(Hispanic.Latino, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped table and silence the grouping message
  )

print(summary_jobs)

```
The second table covers Information security analysts and Web developers, two of the computer occupations, and shows how each changed in the five years from 2015 to 2020. Information security analysts saw almost 50% growth in total workers, while web developers saw a 50% loss. Part of the web developer decline shows up in the percentages of women, Black, and Hispanic/Latino workers. Women saw the largest drop, from 34.3% to 27.% in just five years, and Black workers came a close second, from 9.1% in 2015 down to 3.7% in 2020. Hispanic/Latino workers also saw losses, though smaller than those for women and Black workers. In contrast, the percentage of Asian workers grew by roughly two-thirds, from 9.6% to 16.2%, even as the other groups and the total number of workers decreased.

While information security analysts saw total workers almost double, the share of women fell from 19.7% to 11.4%. Minority groups, in comparison, saw increases. The largest was for Black workers, from 3% to 11.9%, almost quadrupling. Hispanic/Latino workers grew at a slightly lower rate, roughly tripling from 5.2% to 15.8%. Asian workers saw less growth than either group, but their percentage in the information security analyst field still doubled. Taken together, these shifts help show why total workers doubled for information security analysts and why web developers lost half of their total workers.
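
The 2015 to 2020 comparisons above can be computed rather than read off the table by eye. The sketch below is one possible approach: it takes the first and last year for each occupation and reports both the percentage-point change and the ratio. The `shares` data frame is a toy table built only from the Black percentages quoted above, and the output column names are illustrative.

```r
library(dplyr)

# Toy data built only from the Black percentages quoted in the text above
shares <- data.frame(
  description = rep(c("Web developers", "Information security analysts"), each = 2),
  year        = rep(c(2015, 2020), times = 2),
  Black       = c(9.1, 3.7, 3.0, 11.9)
)

# Compare the first and last year within each occupation
changes <- shares %>%
  arrange(description, year) %>%
  group_by(description) %>%
  summarise(
    start_pct    = first(Black),
    end_pct      = last(Black),
    point_change = end_pct - start_pct,  # difference in percentage points
    ratio        = end_pct / start_pct,  # e.g. 2 means the share doubled
    .groups = "drop"
  )

print(changes)
```

A ratio near 4 supports a phrase like "almost quadrupling", while a ratio below 1 indicates a falling share.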

```{r}
# 2-3 plots using the ggplot2 package in R

# Keep the three job groups that are available in all four years
job_trend <- c("total", "professional", "computer_all")
over_time <- data %>%
  filter(job_type %in% job_trend)

# Line plot of total jobs over time, one line per job group
ggplot(over_time, aes(x = year, y = All, color = job_type, group = job_type)) +
  geom_line() +
  geom_point() +  # mark the four measured years on each line
  labs(title = "Total, Professional, and All Computer Job Trends Over Time",
       x = "Year", y = "Total Jobs", color = "Job Type") +
  scale_color_manual(values = c("total" = "plum1", "professional" = "purple", "computer_all" = "steelblue1")) +
  theme_minimal()

```

```{r}
# Bar plot of total jobs for each individual occupation in a single year
years <- c(2020)

occupation_data <- data %>%
  filter(year %in% years, job_type %in% c("computer"))

ggplot(occupation_data, aes(x = description, y = All, fill = factor(year))) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  labs(title = "Comparison of Total Jobs by Occupation in 2020",
       x = "Occupation", y = "Total Jobs") +
  scale_fill_manual(values = c("2020" = "powderblue"), name = "Year") +
  theme_minimal() +
  # The x-axis labels overlapped the plot, so they are rotated here. Source:
  # https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

```
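
Rotating the labels is one fix for a crowded x-axis. An alternative, sketched below, is to sort the bars by size and flip the axes so that the occupation names read horizontally. The `toy_jobs` values and occupation names are hypothetical placeholders, not figures from the data set.

```r
library(ggplot2)

# Hypothetical counts for illustration only; not values from the data set
toy_jobs <- data.frame(
  description = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  All         = c(120, 1500, 40, 310)
)

# reorder() sorts the occupations by job count;
# coord_flip() puts the occupation names on the vertical axis
p <- ggplot(toy_jobs, aes(x = reorder(description, All), y = All)) +
  geom_col(fill = "powderblue") +
  coord_flip() +
  labs(title = "Total Jobs by Occupation (toy data)",
       x = "Occupation", y = "Total Jobs") +
  theme_minimal()

print(p)
```

Sorting makes the largest and smallest occupations easy to spot, which matters when one bar is far taller than the rest.
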
The two graphs show job types over time and individual job totals for a single year. The first graph shows the trends over the past 15 years. As expected, the line for all computer occupations stays relatively flat, while the professional job type shows steep growth. Total jobs grew non-linearly: they decreased from 2005 to 2010, grew moderately over the next five years, and then grew more steeply after that. Comparing only 2005 with 2020 and ignoring the movement in between, however, the growth is very small. The second graph shows total jobs for each individual occupation in 2020, which reveals where workers are concentrated. Software developers dominate with over 1500 on the Total Jobs axis, while mathematicians are close to 0. Most other occupations have no more than about 250 total workers.
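
One possible reason a line looks flat is that a much larger group sets the scale of the shared y-axis. A common way to compare growth across groups of very different sizes is to index each series to its first year (first year = 100). The sketch below shows the idea with hypothetical numbers; `toy_trend` does not contain values from the data set.

```r
library(dplyr)

# Hypothetical job totals for illustration only
toy_trend <- data.frame(
  year     = rep(c(2005, 2010, 2015, 2020), times = 2),
  job_type = rep(c("total", "computer_all"), each = 4),
  All      = c(1000, 980, 1050, 1100, 30, 35, 40, 45)
)

# Express each year as a percentage of that group's first-year value
indexed <- toy_trend %>%
  arrange(job_type, year) %>%
  group_by(job_type) %>%
  mutate(index = All / first(All) * 100) %>%
  ungroup()

print(indexed)
```

Plotting `index` instead of `All` would put every job group on the same scale, so that a small group with fast growth is no longer flattened by a large one.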
