Import data

# Load the packages used below, then read the Excel file
library(readxl)
library(ggplot2)
data <- read_excel("data/mydatasal.xlsx")
data
## # A tibble: 32,562 × 13
##      age workclass    degree marital_status occupation relationship race  gender
##    <dbl> <chr>        <chr>  <chr>          <chr>      <chr>        <chr> <chr> 
##  1    39 State-gov    Bache… Never-married  Adm-cleri… Not-in-fami… White Male  
##  2    50 Self-emp-no… Bache… Married-civ-s… Exec-mana… Husband      White Male  
##  3    38 Private      HS-gr… Divorced       Handlers-… Not-in-fami… White Male  
##  4    53 Private      11th   Married-civ-s… Handlers-… Husband      Black Male  
##  5    28 Private      Bache… Married-civ-s… Prof-spec… Wife         Black Female
##  6    37 Private      Maste… Married-civ-s… Exec-mana… Wife         White Female
##  7    49 Private      9th    Married-spous… Other-ser… Not-in-fami… Black Female
##  8    52 Self-emp-no… HS-gr… Married-civ-s… Exec-mana… Husband      White Male  
##  9    31 Private      Maste… Never-married  Prof-spec… Not-in-fami… White Female
## 10    42 Private      Bache… Married-civ-s… Exec-mana… Husband      White Male  
## # ℹ 32,552 more rows
## # ℹ 5 more variables: Column11 <dbl>, Column12 <dbl>, hoursperweek <dbl>,
## #   country <chr>, salary <chr>

State one question

Does gender influence whether someone earns more than 50K a year?

Plot data

ggplot(data, aes(x = gender, fill = salary)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("<=50K" = "#F8766D", ">50K" = "#00BFC4")) +
  labs(
    title = "Proportion of Individuals Earning >50K vs <=50K by Gender",
    x = "Gender",
    y = "Proportion",
    fill = "Salary"
  ) +
  theme_minimal(base_size = 14)

Interpret

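I used a proportional (filled) bar chart because gender and salary are both categorical, so a scatter plot would be very messy. And yes, there seems to be an association: gender does appear to be related to whether an individual earns more than 50K a year. The plot alone shows an association, not a cause.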
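
To put numbers behind the visual impression, one option is to tabulate the proportions the bars display and run a chi-squared test of independence. The sketch below is only an illustration: `toy` is a made-up stand-in for `data` (its values are invented, not taken from the real file), and only the column names `gender` and `salary` come from the output above. Replacing `toy` with `data` should give the real figures.

```r
# Toy stand-in for `data`: invented values, NOT the real dataset.
# Only the column names (gender, salary) match the imported tibble.
toy <- data.frame(
  gender = c("Male", "Male", "Male", "Male",
             "Female", "Female", "Female", "Female"),
  salary = c(">50K", ">50K", "<=50K", "<=50K",
             ">50K", "<=50K", "<=50K", "<=50K")
)

# Counts of each salary class within each gender
counts <- table(toy$gender, toy$salary)

# Row-wise proportions: the shares that the filled bars display
props <- prop.table(counts, margin = 1)
print(props)

# Chi-squared test of independence between gender and salary.
# On a table this small the approximation is unreliable, so the
# warning is suppressed; with the full dataset it would not be needed.
test <- suppressWarnings(chisq.test(counts))
print(test$p.value)
```

A small p-value on the real data would support the association seen in the plot, although it would still not show that gender causes the salary difference.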