Does the choice of STEM major in graduation have an effect on the graduate salary?
The recent grad list compiled by census.gov has provided us with a list of compiled data of grads during 2010-2012. There are observations made against 173 majors.Each observation represents a major with data mentioning the number of grads who passed, their employement rate, salary ranges etc
The data has been collected and compiled by American Community Survey (ACS) and made available as Public Use Micro Service (PUMS) data.
This data has been gathered as an observation study
This data has been gathered and compiled by ACS, and made available at below locations. https://github.com/fivethirtyeight/data/tree/master/college-majors https://www.census.gov/programs-surveys/acs/data/pums.html
Median salary of a student is the dependent or response variable and is numerical (quantitative)
The course major is an independent variable (qualitative) and number of grad students is another independent variable (quantitative)
Provide summary statistics for each the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
grads <- read.csv("https://raw.githubusercontent.com/san123i/CUNY/master/Semester1/606/Final%20Project/grad-students.csv")
summary(grads$Grad_median)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 47000 65000 75000 76756 90000 135000
summary(grads$Grad_total)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1542 15284 37872 127672 148255 1184158
ggplot(grads, aes(x=grads$Major_category, y = Grad_median)) + geom_bar(stat="identity") + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
list <- aggregate(Grad_total ~ Major_category, grads, sum)
ggplot(list, aes(x=Major_category, y=Grad_total, label=Grad_total)) + geom_col() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+ labs(title = '# of Graduates by major category type') +
geom_text(size = 3, position = position_stack(vjust = 0.5), color='white')
isSTEM <- function(majorCategory){
result <- grepl("science|math|engineering|technology|computer", majorCategory, ignore.case = T)
result <- ifelse(result, !grepl("Social", majorCategory, ignore.case = T), result)
return(ifelse(result, "STEM","NON-STEM"));
}
grads <- grads %>% mutate("Major_Type_STEM"=isSTEM(Major_category))
list <- aggregate(Grad_total ~ Major_Type_STEM, grads, sum)
ggplot(list, aes(x=Major_Type_STEM, y=Grad_total, label=Grad_total, fill=Major_Type_STEM)) + geom_col() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+ labs(title = '# of Graduates by STEM type') +
geom_text(size = 3, position = position_stack(vjust = 0.5), color='white')