This project uses the Grad Student data for analysis.

Research question

Does choosing a STEM major affect a graduate's salary?

Cases

The recent-grads list compiled by census.gov provides data on graduates from 2010-2012. There are observations for 173 majors. Each observation represents a major, with data on the number of graduates, their employment rate, salary ranges, etc.

Data collection

The data were collected and compiled by the American Community Survey (ACS) and made available as Public Use Microdata Sample (PUMS) data.

Type of study

This is an observational study.

Data Source

This data was gathered and compiled by ACS and is available at the locations below:
https://github.com/fivethirtyeight/data/tree/master/college-majors
https://www.census.gov/programs-surveys/acs/data/pums.html

Dependent Variable

Median graduate salary is the dependent (response) variable and is numerical (quantitative).

Independent Variable

The major is an independent variable (qualitative), and the number of grad students is another independent variable (quantitative).

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

library(ggplot2)  # plotting
library(dplyr)    # provides %>% and mutate()

grads <- read.csv("https://raw.githubusercontent.com/san123i/CUNY/master/Semester1/606/Final%20Project/grad-students.csv")

summary(grads$Grad_median)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   47000   65000   75000   76756   90000  135000
summary(grads$Grad_total)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1542   15284   37872  127672  148255 1184158
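# Note: stat = "identity" stacks the per-major medians within each category,
# so each bar's height is the sum of those medians, not a category-level median.
ggplot(grads, aes(x = Major_category, y = Grad_median)) + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))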
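
Because geom_bar(stat = "identity") stacks one segment per major, the bar heights above reflect how many majors a category contains as much as what those majors pay. A boxplot may show the spread of median salaries within each category more directly. The sketch below is one possible alternative, not part of the original analysis; the helper name plot_median_by_category is hypothetical, and it assumes the Major_category and Grad_median columns used above.

```r
library(ggplot2)

# Hypothetical helper (not in the original analysis): draws one box per
# major category from the per-major median salaries.
plot_median_by_category <- function(df) {
  ggplot(df, aes(x = Major_category, y = Grad_median)) +
    geom_boxplot() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    labs(x = "Major category", y = "Median graduate salary")
}

# Usage, assuming `grads` has been loaded as above:
# plot_median_by_category(grads)
```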

Students in each category

# Total graduates per major category (named to avoid masking base::list)
category_totals <- aggregate(Grad_total ~ Major_category, grads, sum)
ggplot(category_totals, aes(x = Major_category, y = Grad_total, label = Grad_total)) + geom_col() + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + labs(title = '# of Graduates by major category type') +
  geom_text(size = 3, position = position_stack(vjust = 0.5), color = 'white')

Categorize the students into STEM and non-STEM.

# Label a major category as "STEM" or "NON-STEM" based on keywords in its name.
isSTEM <- function(majorCategory) {
    # Match STEM keywords, but exclude categories containing "Social"
    matchesKeyword <- grepl("science|math|engineering|technology|computer", majorCategory, ignore.case = TRUE)
    isSocial <- grepl("Social", majorCategory, ignore.case = TRUE)
    ifelse(matchesKeyword & !isSocial, "STEM", "NON-STEM")
}
grads <- grads %>% mutate(Major_Type_STEM = isSTEM(Major_category))
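
Since the STEM label comes from keyword matching on category names, it may be worth listing which categories ended up in which group before relying on the split. The sketch below assumes the Major_category and Major_Type_STEM columns created above; the helper name stem_mapping is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: list each distinct major category next to the
# STEM / NON-STEM label it received, sorted by label and then category.
stem_mapping <- function(df) {
  pairs <- unique(df[, c("Major_category", "Major_Type_STEM")])
  pairs <- pairs[order(pairs$Major_Type_STEM, pairs$Major_category), ]
  rownames(pairs) <- NULL
  pairs
}

# Usage, assuming `grads` already has the Major_Type_STEM column:
# stem_mapping(grads)
```

Any category that looks misplaced in this listing would point to a keyword that needs adding to, or excluding from, the pattern in isSTEM.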

# Total graduates per STEM type (named to avoid masking base::list)
stem_totals <- aggregate(Grad_total ~ Major_Type_STEM, grads, sum)
ggplot(stem_totals, aes(x = Major_Type_STEM, y = Grad_total, label = Grad_total, fill = Major_Type_STEM)) + geom_col() + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + labs(title = '# of Graduates by STEM type') +
  geom_text(size = 3, position = position_stack(vjust = 0.5), color = 'white')
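
The research question concerns salary rather than headcount, so a per-group summary of Grad_median is a natural companion to the counts above. The sketch below is a minimal base R version, assuming the Grad_median and Major_Type_STEM columns from earlier; the helper name salary_by_stem is hypothetical. It treats each major as one observation and does not weight by Grad_total, which is a simplifying assumption.

```r
# Hypothetical helper: summarise the per-major median salary for each
# STEM type. Each major counts once (unweighted by Grad_total).
salary_by_stem <- function(df) {
  groups <- split(df$Grad_median, df$Major_Type_STEM)
  data.frame(
    Major_Type_STEM = names(groups),
    n_majors        = vapply(groups, length, integer(1)),
    median_salary   = vapply(groups, function(x) as.numeric(median(x)), numeric(1)),
    mean_salary     = vapply(groups, function(x) as.numeric(mean(x)), numeric(1)),
    row.names       = NULL
  )
}

# Usage, assuming `grads` already has the Major_Type_STEM column:
# salary_by_stem(grads)
```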