This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
library(tidyverse)
library(scales)
theme_set(theme_light())
jobs_gender <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/jobs_gender.csv")
earnings_female <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/earnings_female.csv")
employed_gender <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/employed_gender.csv")
# Worker-weighted average earnings and headcounts for each group in tbl
summrize_jobs_gender <- function(tbl) {
  tbl %>%
    summarize(total_earnings = sum(total_earnings * total_workers) / sum(total_workers),
              # Weight male and female earnings by workers, skipping rows with missing earnings
              total_earnings_male = sum(total_earnings_male * workers_male, na.rm = TRUE) /
                sum(workers_male[!is.na(total_earnings_male)]),
              total_earnings_female = sum(total_earnings_female * workers_female, na.rm = TRUE) /
                sum(workers_female[!is.na(total_earnings_female)]),
              total_workers = sum(total_workers),
              workers_male = sum(workers_male),
              workers_female = sum(workers_female)) %>%
    # Fixed typo: the column is total_earnings_female, not total_earnings_femal
    mutate(wage_percent_of_male = total_earnings_female / total_earnings_male)
}
The data and variables are the categories David Robinson used to describe men and women in the workplace. The variables give the total earnings for men and for women and the total number of male and female workers in specific jobs. The last category shows whether men or women make up more of the workforce.
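To make the variables above concrete, here is a minimal sketch in base R of how the female share of the workforce, the female-to-male earnings ratio, and a worker-weighted average of earnings can be derived. The rows in `toy` are made-up numbers for illustration, not values from `jobs_gender`, and the names `share_female`, `wage_ratio`, and `weighted_earnings` are hypothetical.

```r
# Hypothetical toy rows, NOT taken from jobs_gender
toy <- data.frame(
  occupation            = c("Job A", "Job B"),
  total_workers         = c(1000, 3000),
  workers_male          = c(400, 2400),
  workers_female        = c(600, 600),
  total_earnings        = c(50000, 80000),
  total_earnings_male   = c(55000, 82000),
  total_earnings_female = c(47000, 72000)
)

# Share of each occupation's workforce reported as female
toy$share_female <- toy$workers_female / toy$total_workers

# Female earnings as a fraction of male earnings
toy$wage_ratio <- toy$total_earnings_female / toy$total_earnings_male

# Average earnings across occupations, weighted by number of workers
weighted_earnings <- sum(toy$total_earnings * toy$total_workers) /
  sum(toy$total_workers)

print(toy[, c("occupation", "share_female", "wage_ratio")])
print(weighted_earnings)
```

Weighting by the number of workers keeps a small occupation from counting as much as a large one when earnings are averaged.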
Hint: One graph of your choice.
library(plotly)
p <- jobs_gender %>%
  filter(year == 2016) %>%
  filter(major_category == "Healthcare Practitioners and Technical") %>%
  arrange(desc(wage_percent_of_male)) %>%
  ggplot(aes(workers_female / total_workers,
             total_earnings,
             size = total_workers,
             label = occupation)) +
  geom_point() +
  scale_size_continuous(range = c(1, 10)) +
  labs(size = "Total # of workers",
       x = "% of workforce reported as female",
       y = "Median salary in the occupation") +
  scale_x_continuous(labels = percent_format()) +
  scale_y_continuous(labels = dollar_format()) +
  expand_limits(y = 0)
ggplotly(p)
p <- jobs_gender %>%
  filter(year == 2016,
         total_workers >= 20000) %>%
  filter(major_category == "Computer, Engineering, and Science") %>%
  arrange(desc(wage_percent_of_male)) %>%
  ggplot(aes(workers_female / total_workers,
             total_earnings_female / total_earnings_male,
             color = minor_category,
             size = total_workers,
             label = occupation)) +
  geom_point() +
  scale_size_continuous(range = c(1, 10)) +
  labs(size = "Total # of workers",
       x = "% of workforce reported as female",
       y = "% of median female salary / median male") +
  scale_x_continuous(labels = percent_format()) +
  scale_y_continuous(labels = percent_format())
ggplotly(p)
The story behind the graph is about the different workplaces, used as categories, for both men and women in the workforce. Looking at this graph, you can tell how much men and women make in their fields and see who makes more in specific areas of the categories. You find that women come out ahead of men in one specific job, but men still dominate the overall workplace.
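One way to check a conclusion like this, reading it as a statement about headcount, is to count the occupations where women are the majority and compare that with the overall female share of workers. The sketch below uses base R. The helper `female_majority_summary()` is hypothetical, and the rows in `toy` are made up for illustration; it only assumes the table has `workers_female` and `total_workers` columns, as `jobs_gender` does.

```r
# Hypothetical helper: count majority-female occupations and compute the
# overall female share of workers in a jobs_gender-style table
female_majority_summary <- function(tbl) {
  share <- tbl$workers_female / tbl$total_workers
  list(
    n_occupations        = nrow(tbl),
    n_majority_female    = sum(share > 0.5, na.rm = TRUE),
    overall_share_female = sum(tbl$workers_female) / sum(tbl$total_workers)
  )
}

# Made-up rows for illustration only, NOT values from jobs_gender
toy <- data.frame(
  occupation     = c("Job A", "Job B", "Job C"),
  total_workers  = c(1000, 3000, 2000),
  workers_female = c(600, 600, 500)
)

res <- female_majority_summary(toy)
print(res)
```

In the toy data, one of the three occupations is majority female while women are well under half of all workers, which is the pattern described above. The same helper could be applied to the filtered rows of `jobs_gender` that feed the plot.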