Load in the correct libraries

I load the libraries this analysis needs. I chose to use the admissions dataset from dslabs, and I then display the first few rows of the dataset.

library(dslabs)
## Warning: package 'dslabs' was built under R version 4.3.1
library(plotly)
## Warning: package 'plotly' was built under R version 4.3.1
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks plotly::filter(), stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data("admissions")

head(admissions)
##   major gender admitted applicants
## 1     A    men       62        825
## 2     B    men       63        560
## 3     C    men       37        325
## 4     D    men       33        417
## 5     E    men       28        191
## 6     F    men        6        373
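
The dslabs help page for this dataset (`?admissions`) describes `admitted` as the percent of applicants admitted, not a head count. As a rough sketch of what that means for the rows printed above, the block below retypes those six rows by hand so it runs without dslabs and adds an approximate count of admitted students. The `admitted_count` column is a hypothetical helper, not part of the dataset.

```r
# The six rows printed by head(admissions), retyped so the sketch is self-contained
men <- data.frame(
  major      = c("A", "B", "C", "D", "E", "F"),
  gender     = "men",
  admitted   = c(62, 63, 37, 33, 28, 6),          # percent admitted, per ?admissions
  applicants = c(825, 560, 325, 417, 191, 373)
)

# Hypothetical helper column: approximate number of admitted students,
# recovered from the percentage and rounded to a whole person
men$admitted_count <- round(men$applicants * men$admitted / 100)
men
```

The same line works on the full dataset by replacing `men` with `admissions`.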

First Plot

I chose to use the six majors for the x axis and the `admitted` column for the y axis. The dslabs help page describes `admitted` as the percent of applicants admitted, not a head count. I chose to make a scatter plot to better show any relationship.

ggplot(admissions, aes(x = major, y = admitted)) +
  geom_point() 

Second Plot

I then add the total number of applicants, shown by the size of each point. I feel this makes the information easier to understand.

ggplot(admissions, aes(x = major, y = admitted)) +
  geom_point(aes(size = applicants))
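
One optional variation on the size mapping, offered as a sketch rather than as part of the original analysis: ggplot2's `scale_size_area()` maps a value of 0 to a point of area 0, so the area of each point is proportional to the number of applicants. The data frame below retypes the six rows shown by `head(admissions)` so the block runs without dslabs, and `max_size = 10` is an arbitrary choice.

```r
library(ggplot2)

# The six rows printed by head(admissions), retyped so the sketch is self-contained
men <- data.frame(
  major      = c("A", "B", "C", "D", "E", "F"),
  gender     = "men",
  admitted   = c(62, 63, 37, 33, 28, 6),
  applicants = c(825, 560, 325, 417, 191, 373)
)

# scale_size_area() makes point area proportional to applicants;
# max_size sets the size of the largest point (arbitrary value here)
p_area <- ggplot(men, aes(x = major, y = admitted)) +
  geom_point(aes(size = applicants)) +
  scale_size_area(max_size = 10)
p_area
```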

Third Plot

Now I add color to the points, based on gender. All four variables in the dataset are now included in the graph.

ggplot(admissions, aes(x = major, y = admitted)) +
  geom_point(aes(size = applicants, color = gender))

Final Plot

I now clean up a couple of things stylistically. First, I change the colors for the genders to be more visible and distinct. I then change the title of the gender legend so that the ‘g’ is capitalized, and I do the same for the applicants legend. Next, I add a title and labels for both the x and y axes. Finally, I adjust the theme to move the title to the center of the graph.

p1 <- ggplot(admissions, aes(x = major, y = admitted)) +
  geom_point(aes(size = applicants, color = gender)) +
  scale_color_manual(values = c("men" = "blue", "women" = "red"), name = "Gender") +
  scale_size(name = "Applicants") +
  labs(title = "Admission Data for Six Majors\nFall of 1973", x = "Major", y = "Percent Admitted") +
  theme(plot.title = element_text(hjust = 0.5))
p1

Make the plot interactive

I then use plotly to make the graph interactive. I enjoy using these tools because the viewer can hover over a point and see all the data I want to include. Some observations from this graph are as follows. First, major B stands out: only 25 women applied, yet their `admitted` value is 68. This is possible because, per the dslabs help page, `admitted` is the percent of applicants admitted rather than a count, so about 17 of those 25 women got in. I tried to investigate the source, but I had to pay for the full article. Next, for major A, 82% of the 108 women who applied were admitted, compared with 62% of the 825 men. The women had the higher acceptance rate even though far fewer of them applied. It would be interesting to know what this major is. For majors C, D, E, and F, the admission rates for men and women were about the same regardless of the number of applicants.

p1 <- ggplotly(p1)
p1
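
If the default hover text is hard to read, one common pattern is to build the tooltip by hand: add a `text` aesthetic in ggplot2 and pass `tooltip = "text"` to `ggplotly()`. The block below is a minimal sketch of that pattern, not the code used for the plot above. It retypes the six rows shown by `head(admissions)` so it runs without dslabs, the tooltip wording is my own, and the "%" assumes `admitted` is a percentage as described in `?admissions`.

```r
library(ggplot2)
library(plotly)

# The six rows printed by head(admissions), retyped so the sketch is self-contained
men <- data.frame(
  major      = c("A", "B", "C", "D", "E", "F"),
  gender     = "men",
  admitted   = c(62, 63, 37, 33, 28, 6),
  applicants = c(825, 560, 325, 417, 191, 373)
)

# `text` is not a real geom_point aesthetic, so ggplot2 warns that it is
# ignored; plotly still picks it up and uses it for the hover label
p_text <- ggplot(men, aes(x = major, y = admitted)) +
  geom_point(aes(
    size  = applicants,
    color = gender,
    text  = paste0("Major ", major, " (", gender, ")\n",
                   admitted, "% of ", applicants, " applicants admitted")
  ))

# Show only the custom text when hovering over a point
p_int <- ggplotly(p_text, tooltip = "text")
```

Printing `p_int` displays the interactive widget, just as printing `p1` does above.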