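library(readxl)   # provides read_excel()
library(ggplot2)  # provides ggplot(), used for the plot
# Read the Excel file into a tibble and print it
data <- read_excel("data/mydatasal.xlsx")
data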
## # A tibble: 32,562 × 13
## age workclass degree marital_status occupation relationship race gender
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 39 State-gov Bache… Never-married Adm-cleri… Not-in-fami… White Male
## 2 50 Self-emp-no… Bache… Married-civ-s… Exec-mana… Husband White Male
## 3 38 Private HS-gr… Divorced Handlers-… Not-in-fami… White Male
## 4 53 Private 11th Married-civ-s… Handlers-… Husband Black Male
## 5 28 Private Bache… Married-civ-s… Prof-spec… Wife Black Female
## 6 37 Private Maste… Married-civ-s… Exec-mana… Wife White Female
## 7 49 Private 9th Married-spous… Other-ser… Not-in-fami… Black Female
## 8 52 Self-emp-no… HS-gr… Married-civ-s… Exec-mana… Husband White Male
## 9 31 Private Maste… Never-married Prof-spec… Not-in-fami… White Female
## 10 42 Private Bache… Married-civ-s… Exec-mana… Husband White Male
## # ℹ 32,552 more rows
## # ℹ 5 more variables: Column11 <dbl>, Column12 <dbl>, hoursperweek <dbl>,
## # country <chr>, salary <chr>
Does gender influence whether someone makes over 50K a year?
ggplot(data, aes(x = gender, fill = salary)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("<=50K" = "#F8766D", ">50K" = "#00BFC4")) +
  labs(
    title = "Proportion of Individuals Earning >50K vs <=50K by Gender",
    x = "Gender",
    y = "Proportion",
    fill = "Salary"
  ) +
  theme_minimal(base_size = 14)
I used a proportional (filled) bar chart because gender and salary are both categorical, so a scatter plot would be very messy. Yes, there appears to be an association between gender and whether an individual earns more than 50K a year.
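
One way to back up the visual impression with numbers is to cross-tabulate gender against salary and run a chi-squared test of independence. The sketch below is only an illustration, not part of the original analysis: `gender_salary_summary` is a hypothetical helper name, and `toy` is a small made-up data frame used just to show the call. It assumes the data frame has `gender` and `salary` columns, as in the tibble printed above.

```r
# Hypothetical helper: cross-tabulate gender against salary and
# test whether the two variables are independent.
gender_salary_summary <- function(df) {
  counts <- table(df$gender, df$salary)          # raw counts per cell
  list(
    counts    = counts,
    row_props = prop.table(counts, margin = 1),  # share of each salary band within a gender
    test      = chisq.test(counts)               # chi-squared test of independence
  )
}

# Made-up toy data, NOT the real dataset; it only demonstrates the call.
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  salary = c(rep(c(">50K", "<=50K"), times = c(15, 35)),
             rep(c(">50K", "<=50K"), times = c(5, 45)))
)

res <- gender_salary_summary(toy)
res$row_props      # proportions behind a position = "fill" bar chart
res$test$p.value   # a small p-value suggests gender and salary are not independent
```

On the real data the call would be `gender_salary_summary(data)`. The row proportions are the same quantities the filled bar chart displays, so the table and the plot should tell the same story.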