Admissions Dataset

Install Packages:

install.packages("dslabs", repos = "https://cloud.r-project.org")
## Installing package into 'C:/Users/Maggie/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)
library("dslabs")
## Warning: package 'dslabs' was built under R version 3.6.3
data(package = "dslabs")
list.files(system.file("script", package = "dslabs"))
##  [1] "make-admissions.R"                   
##  [2] "make-brca.R"                         
##  [3] "make-brexit_polls.R"                 
##  [4] "make-death_prob.R"                   
##  [5] "make-divorce_margarine.R"            
##  [6] "make-gapminder-rdas.R"               
##  [7] "make-greenhouse_gases.R"             
##  [8] "make-historic_co2.R"                 
##  [9] "make-mnist_27.R"                     
## [10] "make-movielens.R"                    
## [11] "make-murders-rda.R"                  
## [12] "make-na_example-rda.R"               
## [13] "make-nyc_regents_scores.R"           
## [14] "make-olive.R"                        
## [15] "make-outlier_example.R"              
## [16] "make-polls_2008.R"                   
## [17] "make-polls_us_election_2016.R"       
## [18] "make-reported_heights-rda.R"         
## [19] "make-research_funding_rates.R"       
## [20] "make-stars.R"                        
## [21] "make-temp_carbon.R"                  
## [22] "make-tissue-gene-expression.R"       
## [23] "make-trump_tweets.R"                 
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"

View the dataset of admissions:

data("admissions")
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
view(admissions)

Checking the structure of the data:

str(admissions)
## 'data.frame':    12 obs. of  4 variables:
##  $ major     : chr  "A" "B" "C" "D" ...
##  $ gender    : chr  "men" "men" "men" "men" ...
##  $ admitted  : num  62 63 37 33 28 6 82 68 34 35 ...
##  $ applicants: num  825 560 325 417 191 373 108 25 593 375 ...

Getting more information on the admissions dataset:

?admissions
## starting httpd help server ... done

Checking to see if there are any NAs:

is.na(admissions)
##       major gender admitted applicants
##  [1,] FALSE  FALSE    FALSE      FALSE
##  [2,] FALSE  FALSE    FALSE      FALSE
##  [3,] FALSE  FALSE    FALSE      FALSE
##  [4,] FALSE  FALSE    FALSE      FALSE
##  [5,] FALSE  FALSE    FALSE      FALSE
##  [6,] FALSE  FALSE    FALSE      FALSE
##  [7,] FALSE  FALSE    FALSE      FALSE
##  [8,] FALSE  FALSE    FALSE      FALSE
##  [9,] FALSE  FALSE    FALSE      FALSE
## [10,] FALSE  FALSE    FALSE      FALSE
## [11,] FALSE  FALSE    FALSE      FALSE
## [12,] FALSE  FALSE    FALSE      FALSE
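
The full logical matrix is easy to read for 12 rows but hard to scan for larger tables. A more compact check, sketched here in base R, reports whether any value is missing and counts the NAs per column. The `adm` data frame is a stand-in built from the first ten values shown in the `str()` output above, assuming the rows are ordered men A to F and then women A to D; with dslabs loaded, `admissions` can be used in its place.

```r
# Stand-in for the admissions data: first ten rows as shown by str() above.
# Assumption: rows are ordered men A-F, then women A-D.
adm <- data.frame(
  major      = c("A", "B", "C", "D", "E", "F", "A", "B", "C", "D"),
  gender     = c(rep("men", 6), rep("women", 4)),
  admitted   = c(62, 63, 37, 33, 28, 6, 82, 68, 34, 35),
  applicants = c(825, 560, 325, 417, 191, 373, 108, 25, 593, 375),
  stringsAsFactors = FALSE
)

# Single TRUE/FALSE answer: is any cell in the data frame NA?
any_missing <- anyNA(adm)

# Number of NAs in each column (named vector, one entry per column)
na_per_column <- colSums(is.na(adm))

print(any_missing)
print(na_per_column)
```

Both results should agree with the all-FALSE matrix above: no column contains missing values.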

Create a scatterplot of the total number of applicants vs the percent of students admitted, colored by major and faceted by gender:

admissions %>%
  ggplot(aes(x = applicants, y = admitted, color = major)) +
  geom_point(size = 3.5) +
  xlab("Total Number of Applicants") +
  ylab("Percent of Students Admitted") +
  facet_grid(~ gender) +
  scale_color_brewer(name = "MAJOR", palette = "Set2") +
  theme_linedraw() +
  theme(legend.position = "top",
        plot.title = element_text(size = 9, face = "bold")) +
  ggtitle("Scatterplot of Total Number of Applicants vs Percent of Students Admitted For Each Major by Gender")

SUMMARY:

This faceted scatterplot was created using the admissions dataset from the dslabs package. The dataset has four variables: major, gender, admitted (the percent of students admitted), and applicants (the total number of applicants). I wanted to compare the total number of applicants with the percent of students admitted in each major by gender, so I put applicants on the x-axis and admitted on the y-axis, mapped major to color, and used a facet grid to show men and women side by side. The legend is placed at the top of the graph instead of the side.

The plot shows that majors A and B had far fewer women applicants than men (108 vs 825 for A, 25 vs 560 for B), but a higher percent of the women were admitted (82% vs 62% for A, 68% vs 63% for B). For major F, the number of applicants was about the same for women and men, and the percent admitted was very similar as well.
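
The comparison described in the summary can also be checked numerically. The following is a minimal base R sketch, not part of the original analysis, that puts the men's and women's figures for each major side by side and computes the differences. The `adm` data frame is a stand-in built from the first ten values shown in the `str()` output, assuming the rows are ordered men A to F and then women A to D, so only majors A to D appear in the result; with dslabs loaded, `admissions` can be substituted to cover all six majors.

```r
# Stand-in rows copied from the str() output above.
# Assumption: rows are ordered men A-F, then women A-D.
adm <- data.frame(
  major      = c("A", "B", "C", "D", "E", "F", "A", "B", "C", "D"),
  gender     = c(rep("men", 6), rep("women", 4)),
  admitted   = c(62, 63, 37, 33, 28, 6, 82, 68, 34, 35),
  applicants = c(825, 560, 325, 417, 191, 373, 108, 25, 593, 375),
  stringsAsFactors = FALSE
)

# Split by gender, keeping only the columns to compare
men   <- adm[adm$gender == "men",   c("major", "applicants", "admitted")]
women <- adm[adm$gender == "women", c("major", "applicants", "admitted")]

# Inner join on major; suffixes mark which gender each column belongs to
by_major <- merge(men, women, by = "major", suffixes = c("_men", "_women"))

# Differences, women minus men
by_major$applicants_diff <- by_major$applicants_women - by_major$applicants_men
by_major$admitted_diff   <- by_major$admitted_women - by_major$admitted_men

print(by_major)
```

A negative `applicants_diff` together with a positive `admitted_diff` marks a major where fewer women applied but a higher percent of them were admitted, which is the pattern noted for majors A and B.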