This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Create 1 table of summary information for the total and computer_all categories, as well as 1-2 professions of your choice, and 2-3 plots using the ggplot2 package in R. Please refer to the .Rmd file provided for notes and tips on how to use the data. Analysis will be graded on a) whether you are able to produce the required graphs and table and b) the quality and design of your plots. As you work, be sure to think about:
● Am I telling a clear story with my plot?
● Can a viewer easily understand and interpret my visuals?
● Do my design and color choices make sense for this plot?
You should include a paragraph or two with your analysis that summarizes key findings from your work.

setwd("C:/Users/SUNYLoaner/OneDrive - Binghamton University/Desktop/DIDA 325")
# Path found under the More section of the Files pane
csv <- read.csv("occupation_gender_race (1).csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
  1. Create 1 table of summary information for the total and computer_all categories, as well as 1-2 professions of your choice.
table(csv$job_type)
# job_type is the variable that identifies each occupation category

# Rows for the categories of interest; no head(), which would keep only the first 6 rows
select_occupation_csv <- csv %>%
  filter(job_type %in% c("total", "computer_all", "computer"))


table(csv$job_type)

    computer computer_all professional        total 
          58            4            4            4 
table(csv$description)

                                             Actuaries 
                                                     3 
          Computer and information research scientists 
                                                     2 
                 Computer and mathematical occupations 
                                                     4 
                           Computer hardware engineers 
                                                     2 
                         Computer hardware engineers   
                                                     1 
                           Computer network architects 
                                                     2 
                       Computer occupations, all other 
                                                     2 
                                  Computer programmers 
                                                     4 
              Computer scientists and systems analysts 
                                                     2 
                           Computer software engineers 
                                                     2 
                          Computer support specialists 
                                                     3 
                         Computer support specialists  
                                                     1 
                             Computer systems analysts 
                                                     2 
                               Database administrators 
                                                     3 
                Database administrators and architects 
                                                     1 
                         Information security analysts 
                                                     2 
                                        Mathematicians 
                                                     3 
        Miscellaneous mathematical science occupations 
                                                     2 
           Network and computer systems administrators 
                                                     4 
      Network systems and data communications analysts 
                                                     2 
                          Operations research analysts 
                                                     3 
                         Operations research analysts  
                                                     1 
                Other mathematical science occupations 
                                                     2 
                  Professional and related occupations 
                                                     3 
                 Professional and related occupations  
                                                     1 
                                   Software developers 
                                                     1 
Software developers, applications and systems software 
                                                     1 
       Software quality assurance analysts and testers 
                                                     1 
                                         Statisticians 
                                                     3 
                              Total, 16 years and over 
                                                     4 
                   Web and digital interface designers 
                                                     1 
                                        Web developers 
                                                     2 
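
The counts above list several occupations twice (for example "Computer hardware engineers" and "Operations research analysts"), which suggests that some values in `description` carry trailing spaces. A minimal sketch of one way to merge them, using a small hypothetical vector in place of `csv$description`:

```r
# Hypothetical stand-in for csv$description; the real values come from the CSV
description <- c("Computer hardware engineers",
                 "Computer hardware engineers   ",
                 "Operations research analysts",
                 "Operations research analysts  ")

# Before cleaning, labels that differ only by trailing spaces count separately
raw_counts <- table(description)

# trimws() removes leading and trailing whitespace, so the labels collapse
clean_description <- trimws(description)
description_counts <- table(clean_description)
description_counts
```

In the notebook itself, the same idea would be `csv$description <- trimws(csv$description)` right after reading the file, assuming the duplicates really are caused by whitespace only.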
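# Demographic columns to summarise
# Note: this uses every row of csv; select_occupation_csv is not used here
variables <- csv %>%
  select(All, Women, Black, Asian, Hispanic.Latino)
variables

# One row per statistic, one column per demographic variable
mean1 <- variables %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
sd1 <- variables %>% summarise(across(everything(), ~ sd(.x, na.rm = TRUE)))
min1 <- variables %>% summarise(across(everything(), ~ min(.x, na.rm = TRUE)))
max1 <- variables %>% summarise(across(everything(), ~ max(.x, na.rm = TRUE)))

# Named summary_table so the base table() function is not shadowed
summary_table <- rbind(mean1, sd1, min1, max1)
rownames(summary_table) <- c("Mean", "Standard Deviation", "Minimum", "Maximum")
options(scipen = 999)  # print plain numbers instead of scientific notation

# Transpose so each demographic variable is a row, then round to 2 decimals
summary_table <- summary_table %>%
  t() %>%
  round(digits = 2) %>%
  as.data.frame()
summary_table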

  2. Create 2-3 plots using the ggplot2 package in R.

Plot 1:

totalplot <- ggplot(total_csv) +
  geom_line(aes(x = year, y = Asian, color = "Asian")) +
  geom_line(aes(x = year, y = Black, color = "Black")) +
  geom_line(aes(x = year, y = Hispanic.Latino, color = "H.Latino")) +
  geom_line(aes(x = year, y = Women, color = "Women")) +
  labs(y = "Percentage of workers", x = "Year",
       color = "Demographic",
       title = "Demographic Breakdown of All Occupations in 2005, 2010, 2015, and 2020") +
  theme_minimal() +
  # Named values tie each color to its legend key regardless of ordering
  scale_color_manual(values = c("Asian" = "red", "Black" = "orange",
                                "H.Latino" = "darkgreen", "Women" = "blue"))

totalplot

Plot 2:

computerallplot <- ggplot(computer_all_csv) +
  geom_line(aes(x = year, y = Asian, color = "Asian")) +
  geom_line(aes(x = year, y = Black, color = "Black")) +
  geom_line(aes(x = year, y = Hispanic.Latino, color = "H.Latino")) +
  geom_line(aes(x = year, y = Women, color = "Women")) +
  labs(y = "Percentage of workers", x = "Year",
       color = "Demographic",
       title = "Demographic Breakdown of All Computer Occupations in 2005, 2010, 2015, and 2020") +
  theme_minimal() +
  # Named values tie each color to its legend key regardless of ordering
  scale_color_manual(values = c("Asian" = "red", "Black" = "orange",
                                "H.Latino" = "darkgreen", "Women" = "blue"))

computerallplot

Plot 3:

professionalplot <- ggplot(professional_csv) +
  geom_line(aes(x = year, y = Asian, color = "Asian")) +
  geom_line(aes(x = year, y = Black, color = "Black")) +
  geom_line(aes(x = year, y = Hispanic.Latino, color = "H.Latino")) +
  geom_line(aes(x = year, y = Women, color = "Women")) +
  labs(y = "Percentage of workers", x = "Year",
       color = "Demographic",
       title = "Demographic Breakdown of Professional Occupations in 2005, 2010, 2015, and 2020") +
  theme_minimal() +
  # Named values tie each color to its legend key regardless of ordering
  scale_color_manual(values = c("Asian" = "red", "Black" = "orange",
                                "H.Latino" = "darkgreen", "Women" = "blue"))

professionalplot

Key Findings:


---
title: "Midterm DIDA 325 : Sharon Lin"
output: html_notebook
---

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code. 

Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.

--------------------------------------------------------------------------------------------

Create 1 table of summary information for the total and computer_all categories, as well as 1-2 professions of your choice, and 2-3 plots using the ggplot2 package in R. Please refer to the .Rmd file provided for notes and tips on how to use the data. Analysis will be graded on a) whether you are able to produce the required graphs and table and b) the quality and design of your plots. As you work, be sure to think about:

- Am I telling a clear story with my plot?
- Can a viewer easily understand and interpret my visuals?
- Do my design and color choices make sense for this plot?

You should include a paragraph or two with your analysis that summarizes key findings from your work.

```{r}
setwd("C:/Users/SUNYLoaner/OneDrive - Binghamton University/Desktop/DIDA 325")
# Path found under the More section of the Files pane
# fileEncoding = "UTF-8-BOM" keeps a byte-order mark out of the first column name
csv <- read.csv("occupation_gender_race (1).csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggthemes)
```

1. Create 1 table of summary information for the total and computer_all categories, as well as 1-2 professions of your choice.
```{r}
table(csv$job_type)
```

```{r}
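# job_type is the variable that identifies each occupation category

# Rows for the categories of interest; no head(), which would keep only the first 6 rows
select_occupation_csv <- csv %>%
  filter(job_type %in% c("total", "computer_all", "computer"))

table(csv$job_type)
table(csv$description)

# Demographic columns to summarise
# Note: this uses every row of csv; select_occupation_csv is not used here
variables <- csv %>%
  select(All, Women, Black, Asian, Hispanic.Latino)
variables

# One row per statistic, one column per demographic variable
mean1 <- variables %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
sd1 <- variables %>% summarise(across(everything(), ~ sd(.x, na.rm = TRUE)))
min1 <- variables %>% summarise(across(everything(), ~ min(.x, na.rm = TRUE)))
max1 <- variables %>% summarise(across(everything(), ~ max(.x, na.rm = TRUE)))

# Named summary_table so the base table() function is not shadowed
summary_table <- rbind(mean1, sd1, min1, max1)
rownames(summary_table) <- c("Mean", "Standard Deviation", "Minimum", "Maximum")
options(scipen = 999)  # print plain numbers instead of scientific notation

# Transpose so each demographic variable is a row, then round to 2 decimals
summary_table <- summary_table %>%
  t() %>%
  round(digits = 2) %>%
  as.data.frame()
summary_table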
```
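
The summary chunk pools every row of `csv`, while the prompt asks about the total and computer_all categories specifically. One possible way to get a separate set of statistics per category is `group_by()` before `summarise()`. The sketch below is only an illustration: `toy` and all of its values are invented stand-ins that reuse the notebook's column names, not the real data.

```r
library(dplyr)

# Invented example data; column names mirror the notebook, values do not
toy <- data.frame(
  job_type = c("total", "total", "computer_all", "computer_all"),
  Women = c(46, 47, 27, 25),
  Black = c(11, 12, 7, 9)
)

# One row per job_type, one column per variable/statistic pair
summary_by_type <- toy %>%
  group_by(job_type) %>%
  summarise(
    across(c(Women, Black),
           list(mean = ~ mean(.x, na.rm = TRUE),
                min = ~ min(.x, na.rm = TRUE),
                max = ~ max(.x, na.rm = TRUE))),
    .groups = "drop"
  )

summary_by_type
```

The result has columns such as `Women_mean` and `Black_max`, which makes it easy to compare categories side by side. With the real data, `toy` would be replaced by `csv` filtered to the job types of interest.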

```{r}
# Quick exploratory look at the Women column across all rows
plot(csv$year, csv$Women)

# One data frame per job category (4 rows each, per table(csv$job_type))
total_csv <- csv %>% filter(job_type == "total")
computer_all_csv <- csv %>% filter(job_type == "computer_all")
professional_csv <- csv %>% filter(job_type == "professional")

# Quick base-R checks before building the ggplot2 versions
plot(total_csv$year, total_csv$Women)
plot(computer_all_csv$year, computer_all_csv$Black)
plot(professional_csv$year, professional_csv$Women)


```
2. Create 2-3 plots using the ggplot2 package in R.


Plot 1:
```{r}
totalplot <- ggplot(total_csv) +
  geom_line(aes(x = year, y = Asian, color = "Asian")) +
  geom_line(aes(x = year, y = Black, color = "Black")) +
  geom_line(aes(x = year, y = Hispanic.Latino, color = "H.Latino")) +
  geom_line(aes(x = year, y = Women, color = "Women")) +
  labs(y = "Percentage of workers", x = "Year",
       color = "Demographic",
       title = "Demographic Breakdown of All Occupations in 2005, 2010, 2015, and 2020") +
  theme_minimal() +
  # Named values tie each color to its legend key regardless of ordering
  scale_color_manual(values = c("Asian" = "red", "Black" = "orange",
                                "H.Latino" = "darkgreen", "Women" = "blue"))

totalplot
```

Plot 2:
```{r}
computerallplot <- ggplot(computer_all_csv) +
  geom_line(aes(x = year, y = Asian, color = "Asian")) +
  geom_line(aes(x = year, y = Black, color = "Black")) +
  geom_line(aes(x = year, y = Hispanic.Latino, color = "H.Latino")) +
  geom_line(aes(x = year, y = Women, color = "Women")) +
  labs(y = "Percentage of workers", x = "Year",
       color = "Demographic",
       title = "Demographic Breakdown of All Computer Occupations in 2005, 2010, 2015, and 2020") +
  theme_minimal() +
  # Named values tie each color to its legend key regardless of ordering
  scale_color_manual(values = c("Asian" = "red", "Black" = "orange",
                                "H.Latino" = "darkgreen", "Women" = "blue"))

computerallplot
```

Plot 3:
```{r}
professionalplot <- ggplot(professional_csv) +
  geom_line(aes(x = year, y = Asian, color = "Asian")) +
  geom_line(aes(x = year, y = Black, color = "Black")) +
  geom_line(aes(x = year, y = Hispanic.Latino, color = "H.Latino")) +
  geom_line(aes(x = year, y = Women, color = "Women")) +
  labs(y = "Percentage of workers", x = "Year",
       color = "Demographic",
       title = "Demographic Breakdown of Professional Occupations in 2005, 2010, 2015, and 2020") +
  theme_minimal() +
  # Named values tie each color to its legend key regardless of ordering
  scale_color_manual(values = c("Asian" = "red", "Black" = "orange",
                                "H.Latino" = "darkgreen", "Women" = "blue"))

professionalplot
```
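
Each plot repeats one `geom_line()` call per demographic column. Since tidyr is already loaded, a possible alternative is to reshape the data to long format with `pivot_longer()` and map the demographic name to color, so a single `geom_line()` draws every series. This is a sketch under stated assumptions: `toy_total` and its numbers are invented placeholders shaped like `total_csv`, and `toy_long` and `toy_plot` are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Invented placeholder values; only the column names follow the notebook
toy_total <- data.frame(
  year = c(2005, 2010, 2015, 2020),
  Women = c(46, 47, 47, 47),
  Black = c(11, 11, 12, 12),
  Asian = c(4, 5, 6, 6),
  Hispanic.Latino = c(13, 14, 16, 18)
)

# Long format: one row per year and demographic group
toy_long <- toy_total %>%
  pivot_longer(cols = c(Women, Black, Asian, Hispanic.Latino),
               names_to = "demographic",
               values_to = "percent")

# A single geom_line() now covers all groups; color comes from the data
toy_plot <- ggplot(toy_long, aes(x = year, y = percent, color = demographic)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = toy_total$year) +
  labs(x = "Year", y = "Percentage of workers", color = "Demographic") +
  theme_minimal()

toy_plot
```

With this layout the legend is built from the `demographic` column directly, and adding `facet_wrap()` over a job-type column would be one way to compare categories in a single figure.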


## Key Findings: 



