Resets Environment
rm(list=ls())
Loaded the data, then dplyr, ggplot2, and kableExtra
data <- read.csv("http://tinyurl.com/dida325midtermdata", stringsAsFactors = F, fileEncoding="UTF-8-BOM")
library(dplyr)
library(ggplot2)
library(kableExtra)
#Add the other libraries that you would like to load
The data was summarized in order to see the number of NAs in each column
summary(data)
job_type description year All Women Black
Length:70 Length:70 Min. :2005 Min. : 3.0 Min. : 9.30 Min. : 3.000
Class :character Class :character 1st Qu.:2010 1st Qu.: 83.0 1st Qu.:21.05 1st Qu.: 6.450
Mode :character Mode :character Median :2015 Median : 223.5 Median :27.60 Median : 9.100
Mean :2014 Mean : 10589.8 Mean :31.18 Mean : 8.997
3rd Qu.:2020 3rd Qu.: 774.2 3rd Qu.:43.85 3rd Qu.:11.600
Max. :2020 Max. :148834.0 Max. :57.40 Max. :15.300
NA's :11 NA's :11
Asian Hispanic.Latino
Min. : 3.40 Min. : 2.000
1st Qu.: 8.65 1st Qu.: 5.550
Median :11.80 Median : 6.900
Mean :14.69 Mean : 7.853
3rd Qu.:19.75 3rd Qu.: 9.350
Max. :34.10 Max. :17.600
NA's :11 NA's :11
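The NA counts in the summary output above can also be read off directly. A minimal sketch, assuming a small made-up data frame (`toy`, with illustrative values that are not from the dataset) in place of data:

```r
# Toy data frame standing in for `data`; the values are made up for illustration.
toy <- data.frame(
  year  = c(2005, 2010, 2015, 2020),
  Women = c(10, NA, 30, 40),
  Black = c(1, 2, NA, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows that na.omit() would drop (rows with at least one NA)
rows_with_na <- sum(!complete.cases(toy))
print(rows_with_na)
```

Running the same two calls on data would give the per-column NA counts and the number of rows that na.omit would drop.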
Removed the rows containing NAs and stored the result as data.
data <- na.omit(data)
Check to make sure the NAs were removed
summary(data)
job_type description year All Women Black
Length:59 Length:59 Min. :2005 Min. : 61 Min. : 9.30 Min. : 3.000
Class :character Class :character 1st Qu.:2010 1st Qu.: 107 1st Qu.:21.05 1st Qu.: 6.450
Mode :character Mode :character Median :2015 Median : 366 Median :27.60 Median : 9.100
Mean :2013 Mean : 12561 Mean :31.18 Mean : 8.997
3rd Qu.:2020 3rd Qu.: 929 3rd Qu.:43.85 3rd Qu.:11.600
Max. :2020 Max. :148834 Max. :57.40 Max. :15.300
Asian Hispanic.Latino
Min. : 3.40 Min. : 2.000
1st Qu.: 8.65 1st Qu.: 5.550
Median :11.80 Median : 6.900
Mean :14.69 Mean : 7.853
3rd Qu.:19.75 3rd Qu.: 9.350
Max. :34.10 Max. :17.600
View the data in table format
View(data)
Created a data set that filters for job_type equal to computer_all and selects year, Women, Black, Asian, and Hispanic.Latino
variables_comp<- data %>% filter(job_type == "computer_all") %>% select(c(year, Women, Black, Asian, Hispanic.Latino))
Created data frames that store the mean, standard deviation, minimum, and maximum of each column
mean1 <- variables_comp %>% summarise(across(everything(), mean))
sd1 <- variables_comp %>% summarise(across(everything(), sd))
min1 <- variables_comp %>% summarise(across(everything(), min))
max1 <- variables_comp %>% summarise(across(everything(), max))
Merged the data frames into a table
table_comp <- rbind(mean1, sd1, min1, max1)
Renamed the rows to Mean, Standard Deviation, Minimum, and Maximum, and rounded the values to two decimal places
rownames(table_comp) <- c("Mean", "Standard Deviation", "Minimum", "Maximum")
table_comp <- table_comp %>%
as.data.frame %>%
mutate_if(is.numeric, round, digits=2)
Renamed year column
colnames(table_comp)[colnames(table_comp) == "year"] = "Year"
#“How to Relabel Rows and Columns in an R Table.” Displayr Help, https://help.displayr.com/hc/en-us/articles/360002876876-How-to-Relabel-Rows-and-Columns-in-an-R-Table. Accessed 22 Feb. 2024.
Made a table and formatted it in html
table_comp %>%
kbl(caption = "<center><strong>Table 1: Computer and Mathematical Occupations Data</strong></center>",
format = "html") %>%
kable_classic_2("striped", full_width = F)
| | Year | Women | Black | Asian | Hispanic.Latino |
|---|---|---|---|---|---|
| Mean | 2012.50 | 25.68 | 7.82 | 18.42 | 6.50 |
| Standard Deviation | 6.45 | 0.99 | 1.20 | 3.76 | 1.43 |
| Minimum | 2005.00 | 24.70 | 6.70 | 14.70 | 5.30 |
| Maximum | 2020.00 | 27.00 | 9.10 | 23.00 | 8.40 |
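The four summarise calls, the rbind, the row names, and the rounding could also be wrapped in one function so the same steps can be reused for other subsets. A minimal sketch in base R; `summary_table` is a hypothetical helper name and `toy` is made-up data, neither is part of the original analysis:

```r
# Hypothetical helper: one row per statistic, one column per variable.
summary_table <- function(df, digits = 2) {
  stats <- sapply(df, function(x) {
    c(Mean = mean(x),
      "Standard Deviation" = sd(x),
      Minimum = min(x),
      Maximum = max(x))
  })
  # sapply() returns a matrix; round it and convert to a data frame
  as.data.frame(round(stats, digits))
}

# Toy input with made-up Women values (not from the dataset)
toy <- data.frame(Year = c(2005, 2010, 2015, 2020), Women = c(10, 20, 30, 40))
table_toy <- summary_table(toy)
print(table_toy)
```

Under these assumptions, a call such as summary_table(variables_comp) should return the same statistics as the steps above, though the year column would still need to be renamed.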
Repeated the process to create the summary statistics table for total CS jobs (job_type equal to total)
tot<- data %>% filter(job_type == "total") %>% select(c(year, Women, Black, Asian, Hispanic.Latino))
meantot <- tot %>% summarise(across(everything(), mean))
sdtot <- tot %>% summarise(across(everything(), sd))
mintot <- tot %>% summarise(across(everything(), min))
maxtot <- tot %>% summarise(across(everything(), max))
table_tot <- rbind(meantot, sdtot, mintot, maxtot)
rownames(table_tot) <- c("Mean", "Standard Deviation", "Minimum", "Maximum")
table_tot <- table_tot %>%
as.data.frame %>%
mutate_if(is.numeric, round, digits=2)
colnames(table_tot)[colnames(table_tot) == "year"] = "Year"
table_tot %>%
kbl(caption = "<center><strong>Table 2: Total CS Data</strong></center>",
format = "html") %>%
kable_classic_2("striped", full_width = F)
| | Year | Women | Black | Asian | Hispanic.Latino |
|---|---|---|---|---|---|
| Mean | 2012.50 | 46.80 | 11.35 | 5.35 | 15.35 |
| Standard Deviation | 6.45 | 0.33 | 0.66 | 0.91 | 2.03 |
| Minimum | 2005.00 | 46.40 | 10.80 | 4.40 | 13.10 |
| Maximum | 2020.00 | 47.20 | 12.10 | 6.40 | 17.60 |
Repeated the process to create the summary statistics table for Web Developers
web<- data %>% filter(description == "Web developers") %>% select(c(year, Women, Black, Asian, Hispanic.Latino))
mean_web <- web %>% summarise(across(everything(), mean))
sd_web <- web %>% summarise(across(everything(), sd))
min_web <- web %>% summarise(across(everything(), min))
max_web <- web %>% summarise(across(everything(), max))
table_web <- rbind(mean_web, sd_web, min_web, max_web)
rownames(table_web) <- c("Mean", "Standard Deviation", "Minimum", "Maximum")
table_web <- table_web %>%
as.data.frame %>%
mutate_if(is.numeric, round, digits=2)
colnames(table_web)[colnames(table_web) == "year"] = "Year"
table_web %>%
kbl(caption = "<center><strong>Table 3: Web Developer Data</strong></center>",
format = "html") %>%
kable_classic_2("striped", full_width = F)
| | Year | Women | Black | Asian | Hispanic.Latino |
|---|---|---|---|---|---|
| Mean | 2017.50 | 31.05 | 6.40 | 12.90 | 6.05 |
| Standard Deviation | 3.54 | 4.60 | 3.82 | 4.67 | 0.21 |
| Minimum | 2015.00 | 27.80 | 3.70 | 9.60 | 5.90 |
| Maximum | 2020.00 | 34.30 | 9.10 | 16.20 | 6.20 |
(1) Created a copy of data and stored it as comp_data. (2) Filtered it to include only rows whose job type was equal to computer_all, which covers computer and mathematical occupations. (3) Created a graph that shows the percentage of minority groups in Computer and Mathematical Occupations over time. The minority groups were Black (labeled African in the plot), Asian, and Hispanic/Latino (labeled LatinX).
This was done to investigate the representation of minority groups in STEM over time. It shows that all three percentages increased over time. Black and Hispanic/Latino workers are still underrepresented in the workplace.
comp_data <- data %>% filter(job_type == "computer_all")
ggplot(comp_data)+
geom_line(aes(y= Black, x= year, color="African"))+
geom_line(aes(y= Asian, x= year, color= "Asian"))+
geom_line(aes(y= Hispanic.Latino, x= year, color= "LatinX"))+
labs(y="Percentage of Workers", x="Year", title= "Percentage of Minority Groups within Computer and Mathematical Occupations", color = "Minority Groups")+
scale_color_manual(labels = c("African","Asian","LatinX"),values = c("red","orange","blue")) #creates colored index
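An alternative to one geom_line call per group is to reshape the data to long format and map the group to color. A minimal sketch, assuming a made-up data frame (`toy`, illustrative values only) in place of comp_data; the reshape uses base R, and the plot is built only if ggplot2 is installed:

```r
# Toy stand-in for comp_data; values are made up, not from the dataset.
toy <- data.frame(
  year = c(2005, 2010),
  Black = c(1, 2),
  Asian = c(3, 4),
  Hispanic.Latino = c(5, 6)
)

# Stack the three percentage columns into a single column (long format)
groups <- c("Black", "Asian", "Hispanic.Latino")
long <- data.frame(
  year = rep(toy$year, times = length(groups)),
  group = rep(groups, each = nrow(toy)),
  percentage = unlist(toy[groups], use.names = FALSE)
)
print(long)

# Build the line plot with a single geom_line(), only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = year, y = percentage, color = group)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Year", y = "Percentage of Workers", color = "Minority Groups")
}
```

With long data the legend is generated from the group column, so the labels and colors cannot drift out of order.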
Created a new data frame that stores the percentage of the workplace per race in the year 2020
comp_2020 <- comp_data %>% filter(year == 2020)
comp_2020_new <- data.frame(
Race = c("Black","Asian","Hispanic"),
Percentage = c(comp_2020$Black, comp_2020$Asian, comp_2020$Hispanic.Latino)
)
Views the new df
View(comp_2020_new)
Created a bar graph that shows the percentage of minority groups in the workplace in the year 2020. Asian workers have the highest share of the workplace at 23%. Black/African workers make up 9.1% of the workplace in computer and mathematical occupations. Hispanic/LatinX workers make up 8.4% of the workplace in computer and mathematical occupations.
ggplot(comp_2020_new, aes(x = Race, y = Percentage, fill=Race)) +
geom_bar(stat="Identity") +
labs(y = "Percentage of Workplace", x = "Minority Groups",
title = "Percentage of Minority Groups within \nComputer and Mathematical Occupations in the US, 2020")
#\n inserts a line break in R strings, as it does in Python, so it was used to wrap the title onto two lines
# Alboukadel. “GGPlot Title, Subtitle and Caption: The Ultimate Guide.” Datanovia, 11 Nov. 2018, https://www.datanovia.com/en/blog/ggplot-title-subtitle-and-caption/. Accessed 22 Feb. 2024.
Created a bar chart that shows the percentage of women in the workplace for Computer and Mathematical Occupations in the years 2005, 2010, 2015, and 2020. This was done to investigate the percentage of women in STEM over time, as was done for the minority groups. The fill color follows a gradient, with lighter colors marking higher percentages of women. It shows that the percentage of women within computer and mathematical occupations has gone down since 2005.
ggplot(comp_data, aes(x = as.character(year), y = Women, fill = Women)) +
geom_bar(stat="Identity") +
labs(y = "Percentage of Workplace", x = "Year",
title = "Percentage of Women within \nComputer and Mathematical Occupations in the US ")
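The direction of change read off the charts can also be checked numerically by subtracting the first year's values from the last year's. A minimal sketch, assuming a made-up data frame (`toy`, illustrative values only) in place of comp_data:

```r
# Toy stand-in for comp_data; values are made up, not from the dataset.
toy <- data.frame(
  year = c(2005, 2010, 2015, 2020),
  Women = c(40, 38, 36, 35),
  Black = c(5, 6, 7, 8)
)

cols <- c("Women", "Black")
first <- toy[toy$year == min(toy$year), cols]  # earliest year
last  <- toy[toy$year == max(toy$year), cols]  # latest year

# Change in percentage points: latest year minus earliest year
change <- unlist(last) - unlist(first)
print(change)
```

A negative entry means the group's share fell between the first and last year, and a positive entry means it rose.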
The key findings from the analysis are:
(1) Black/African workers are underrepresented in Computer and Mathematical Occupations: in 2020, only 9.1% of the workplace was Black.
(2) Hispanic and Latino workers are also underrepresented in Computer and Mathematical Occupations: in 2020, 8.4% of the workplace was Hispanic.
(3) The percentage of workers who are Asian, Black/African, and Hispanic/Latino has increased since 2005.
(4) Women are underrepresented in Computer and Mathematical Occupations: they made up only 26.5% of the workplace in 2020.
(5) The percentage of women in the workplace has decreased since 2005. This is surprising, since one would expect these occupations to become more diverse over time.
Both women and minority groups are severely underrepresented within Computer and Mathematical Occupations. Based on the results of this data analysis, women make up 26.5 percent of the workplace in Computer and Mathematical based fields. Besides being underrepresented, the percentage of women working in those fields has decreased since 2005. Hispanic/Latino workers make up approximately 8.4% of the workplace in Computer and Mathematical based fields, making them underrepresented compared to the demographics of the United States. Black/African workers are also underrepresented, making up 9.1% of the workforce in Computer and Mathematical based occupations. However, the share of minority workers in these occupations has increased since 2005.