Assignment 7 - Data 110

Author

Kalina P

Load the libraries

# Loading the necessary libraries
library(dslabs)
library(ggplot2)

The Data - Admissions

Gender bias among graduate school admissions to UC Berkeley: admission data for six majors for the fall of 1973. The dataset is often used as an example of Simpson's paradox.

# Examining the dataset
str(admissions)
'data.frame':   12 obs. of  4 variables:
 $ major     : chr  "A" "B" "C" "D" ...
 $ gender    : chr  "men" "men" "men" "men" ...
 $ admitted  : num  62 63 37 33 28 6 82 68 34 35 ...
 $ applicants: num  825 560 325 417 191 373 108 25 593 375 ...

This is a very small dataset: it has only 12 observations, one for each combination of major and gender. It has 2 numerical variables, the number of applicants and the percent of applicants admitted, and 2 categorical variables, major and gender. For my scatterplot, I will use only the gender, admitted, and applicants variables.
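The dataset description mentions Simpson's paradox. The sketch below (base R plus dslabs) is one way to see what that means here: it converts each row's `admitted` percentage back into an approximate count of admitted students, pools the six majors into one overall admission rate per gender, and then compares the genders major by major. The counts are only approximate because `admitted` is a rounded percentage, and the names `adm`, `n_admitted`, `overall`, and `by_major` are illustrative, not part of the dataset.

```r
library(dslabs)

adm <- admissions
# admitted is a (rounded) percentage, so this is an approximate count of admitted students per row
adm$n_admitted <- adm$admitted / 100 * adm$applicants

# Pooled across all six majors: total admitted / total applicants for each gender
overall <- aggregate(cbind(n_admitted, applicants) ~ gender, data = adm, FUN = sum)
overall$rate <- 100 * overall$n_admitted / overall$applicants
print(overall)

# Major by major: one row per major, one admission-rate column per gender
by_major <- reshape(adm[, c("major", "gender", "admitted")],
                    idvar = "major", timevar = "gender", direction = "wide")
# TRUE where women had the higher admission rate within that major
by_major$women_higher <- by_major$admitted.women > by_major$admitted.men
print(by_major)
```

If the paradox is present, the pooled rate in `overall` comes out higher for men even though `by_major` shows women with the higher rate in several individual majors; the difference comes from men and women applying to different majors in different numbers.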

Visualization

# Scatterplot of percent admitted vs. number of applicants, differentiated by gender
scatter <- admissions |>
  ggplot(aes(x = applicants, y = admitted, color = gender)) +
  geom_point(size = 4, alpha = .8) +
  scale_color_manual(values = c(men = "#10ade6", women = "#ff91ad")) +
  # Dashed least-squares line, fitted separately for each gender
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, linetype = "dashed", linewidth = .75) +
  labs(
    title = "Relationship Between Graduate Admission Rate\nand Number of Applicants at UC Berkeley, 1973",
    x = "Number of Applicants (per major)",
    y = "Percent of Applicants Admitted",
    caption = "Data from dslabs"
  ) +
  theme_grey(base_family = "sans", header_family = "serif")
scatter

For my visualization I used the dslabs admissions dataset, which has 4 variables (major, gender, admitted, and applicants) and 12 observations. To create my graph I used ggplot and mapped the aesthetics, then used geom_point to create the scatterplot and color the points by gender. I used geom_smooth to add a linear regression line for the male and for the female observations, and then labeled the graph and altered the theme. I found it very interesting that the two trends go in opposite directions: as the number of applicants increases, the percent of male applicants admitted increases as well, but the percent of female applicants admitted decreases. At the lower end of the applicant range, women were admitted at a higher rate than men. However, as I noted earlier, this dataset has only 12 observations (6 per gender, so each regression line is fitted to just 6 points), and more studies would need to be conducted before drawing any conclusions about this trend.
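The trend lines described above come from `geom_smooth(method = lm)`, which fits a separate least-squares line for each gender because `color = gender` splits the data into groups. A small sketch, assuming only base R and dslabs, that fits the same two models directly with `lm()` so the slopes can be read as numbers and the number of points behind each line is explicit. The names `groups`, `fits`, `slopes`, and `n_points` are illustrative.

```r
library(dslabs)

# One data frame per gender, mirroring the color = gender grouping in the plot
groups <- split(admissions, admissions$gender)

# Fit admitted ~ applicants within each group (the same y ~ x model geom_smooth uses)
fits <- lapply(groups, function(d) lm(admitted ~ applicants, data = d))

# Slope: change in percent admitted per additional applicant
slopes <- sapply(fits, function(f) coef(f)[["applicants"]])
# Number of observations behind each regression line
n_points <- sapply(groups, nrow)

print(slopes)
print(n_points)

# Coefficient table (estimate, std. error, t value, p-value) for each fit
lapply(fits, function(f) summary(f)$coefficients)
```

The sign of each slope should match the direction of the corresponding dashed line in the plot, and the standard errors in the coefficient tables give a rough sense of how much uncertainty comes with fitting a line to six points.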