In this project, we were assigned to create a multivariable graph using any of the datasets included in “dslabs”. I chose the “admissions” dataset because I was curious about the factors that influence college admission. In particular, I want to know whether college/university admission is affected by the variables gender and time.
With 12 observations, this data frame contains 4 variables:
1- major
2- gender
3- admitted
4- applicants
library(readr)
## Warning: package 'readr' was built under R version 3.6.1
library(ggplot2)
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.1
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plotly)
## Warning: package 'plotly' was built under R version 3.6.1
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(RColorBrewer)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.1
## -- Attaching packages ------------------------------------------- tidyverse 1.2.1 --
## v tibble 2.1.3 v purrr 0.3.2
## v tidyr 0.8.3 v stringr 1.4.0
## v tibble 2.1.3 v forcats 0.4.0
## Warning: package 'tibble' was built under R version 3.6.1
## Warning: package 'tidyr' was built under R version 3.6.1
## Warning: package 'purrr' was built under R version 3.6.1
## Warning: package 'stringr' was built under R version 3.6.1
## Warning: package 'forcats' was built under R version 3.6.1
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x plotly::filter() masks dplyr::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(highcharter)
## Warning: package 'highcharter' was built under R version 3.6.1
## Registered S3 method overwritten by 'xts':
## method from
## as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
library(dslabs)
## Warning: package 'dslabs' was built under R version 3.6.1
##
## Attaching package: 'dslabs'
## The following object is masked from 'package:highcharter':
##
## stars
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
## [1] "make-admissions.R"
## [2] "make-brca.R"
## [3] "make-brexit_polls.R"
## [4] "make-death_prob.R"
## [5] "make-divorce_margarine.R"
## [6] "make-gapminder-rdas.R"
## [7] "make-greenhouse_gases.R"
## [8] "make-historic_co2.R"
## [9] "make-mnist_27.R"
## [10] "make-movielens.R"
## [11] "make-murders-rda.R"
## [12] "make-na_example-rda.R"
## [13] "make-nyc_regents_scores.R"
## [14] "make-olive.R"
## [15] "make-outlier_example.R"
## [16] "make-polls_2008.R"
## [17] "make-polls_us_election_2016.R"
## [18] "make-reported_heights-rda.R"
## [19] "make-research_funding_rates.R"
## [20] "make-stars.R"
## [21] "make-temp_carbon.R"
## [22] "make-tissue-gene-expression.R"
## [23] "make-trump_tweets.R"
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"
data("admissions")
dim(admissions)
## [1] 12 4
str(admissions)
## 'data.frame': 12 obs. of 4 variables:
## $ major : chr "A" "B" "C" "D" ...
## $ gender : chr "men" "men" "men" "men" ...
## $ admitted : num 62 63 37 33 28 6 82 68 34 35 ...
## $ applicants: num 825 560 325 417 191 373 108 25 593 375 ...
At this step, we remove any rows with missing values using the complete.cases() function. We call the new data frame “admissions1”.
admissions1 <- admissions[complete.cases(admissions),]
dim(admissions1)
## [1] 12 4
There are no missing values: admissions1 has the same 12 rows and 4 columns as admissions.
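As a cross-check on the step above, the missing values can also be counted directly instead of comparing dimensions. This is a minimal sketch; the name `na_counts` is only illustrative and does not appear in the original analysis.

```r
library(dslabs)
data("admissions")

# Count the missing values in each column; all zeros means nothing to drop
na_counts <- colSums(is.na(admissions))
na_counts

# TRUE when every row is complete
all(complete.cases(admissions))
```

If every count is zero, filtering with complete.cases() leaves the data frame unchanged, which matches the identical dim() output.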
g <- ggplot(data = admissions, mapping = aes(x = applicants, y = admitted, color = gender))
class(g)
## [1] "gg" "ggplot"
g + geom_point() +
labs(title = "Admitted vs Applicants by Gender",
x = "Applicants", y = "Admitted", color = "Gender")
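Each point in this scatter plot is one major/gender combination, so it can help to show the major as a third variable. The sketch below is one possible way to do that with geom_text(); the object name `p_major`, the point size, and the label offset are illustrative choices, not part of the original plot.

```r
library(ggplot2)
library(dslabs)
data("admissions")

# Same axes and color mapping as above, with each point labeled by its major
p_major <- ggplot(admissions, aes(x = applicants, y = admitted, color = gender)) +
  geom_point(size = 3) +
  # Place the major letter just above each point; keep it out of the legend
  geom_text(aes(label = major), vjust = -1, show.legend = FALSE) +
  labs(title = "Admitted vs Applicants by Gender and Major",
       x = "Applicants", y = "Admitted", color = "Gender")

p_major
```

With the labels in place, it is easier to compare men and women within the same major rather than only across the whole plot.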
You can change the entire appearance of a plot by using a custom theme. The ggthemes library contains many custom themes and scales for ggplot.
g <- ggplot(data = admissions, mapping = aes(x = applicants, y = admitted, color = gender)) +
  geom_point()
# Use economist color scales
g + theme_economist() +
scale_color_economist()+
ggtitle("Admitted vs Applicants by Gender")
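The same pattern works with the other themes in ggthemes. As an illustration only, the sketch below swaps in theme_few() and scale_color_few(); the object name `g_few` is hypothetical, and the choice of theme is a matter of taste rather than something the analysis depends on.

```r
library(ggplot2)
library(ggthemes)
library(dslabs)
data("admissions")

# Rebuild the base scatter plot so this example runs on its own
g_few <- ggplot(admissions, aes(x = applicants, y = admitted, color = gender)) +
  geom_point() +
  # Swap the economist theme and color scale for the "few" versions
  theme_few() +
  scale_color_few() +
  ggtitle("Admitted vs Applicants by Gender")

g_few
```

Pairing a theme with its matching color scale, as done here and in the economist example, keeps the overall look of the plot consistent.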
The results show that, within the first 200 applications, more women were admitted than men. Around 400 applications, women and men have almost the same chance of being admitted. Most of the applicants were also selected at that stage. After 600 applications, the admitted values appear low. The time factor could therefore affect the chance of being admitted to an institution. Note that the dataset has no time variable, so I am reading the number of applicants as a rough proxy for when applications were submitted. In my opinion, the admissions committee may have given women a better chance when applications were submitted early. On the other hand, women may have applied earlier than men. The committee may also have used a rule such as “first come, first served”.
However, this dataset is quite limited. I could not find the year of the study, and I was also unable to tell whether these admissions were for an undergraduate or a graduate school. Therefore, we cannot really rely on these outcomes unless we get a more complete dataset.
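The ranges discussed above can also be checked numerically rather than read off the plot. The sketch below groups the rows into three applicant ranges and averages `admitted` by gender within each. The cut points (200 and 600) and the names `adm`, `range`, and `by_range` are assumptions chosen to mirror the text, and the dataset has only 12 rows, so some groups may contain a single row or none.

```r
library(dslabs)
data("admissions")

# Work on a copy so the original data frame stays untouched
adm <- admissions

# Hypothetical cut points (200 and 600) taken from the ranges discussed in the text
adm$range <- cut(adm$applicants,
                 breaks = c(0, 200, 600, Inf),
                 labels = c("0-200", "201-600", "over 600"))

# Mean of `admitted` for each gender within each applicant range
by_range <- aggregate(admitted ~ range + gender, data = adm, FUN = mean)
by_range
```

Comparing the rows of `by_range` for men and women within the same range gives a quick check on whether the visual impression from the scatter plot holds.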
In my opinion, both men and women can probably increase their chances of admission if they apply early.