library(readxl)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(reshape2)  # assumed source of melt() used further down
# load data
myWorkingDir <- getwd()
mySourceFile <- paste0(myWorkingDir, "/Where it Pays to Attend College.xlsx")
excel_sheets(path = mySourceFile)
## [1] "degrees-that-pay-back" "salaries-by-college-type"
## [3] "salaries-by-region"
df_degrees_that_pay_back <- read_excel(path = mySourceFile, sheet = "degrees-that-pay-back")
df_salaries_by_college_type <- read_excel(path = mySourceFile, sheet = "salaries-by-college-type")
df_salaries_by_region <- read_excel(path = mySourceFile, sheet = "salaries-by-region")

Where it Pays to Attend College
With college debt in the US mounting to roughly 1.5 trillion USD, it makes sense for millennials and Gen Z to do a preliminary SWOT analysis of colleges and degrees and to align their goals, aspirations and passions accordingly. Since Ivy League colleges are out of reach for most students, knowing which courses, subjects and electives to choose during the college years will go a long way toward building a market-ready career and a stable future.
What are the cases, and how many are there?
Salary Increase By Type of College
Party school? Liberal arts college? State school? Starting salaries differ depending on the type of school one attends, but growth in earning power shows less disparity. Ten years out, graduates of Ivy League schools earned 99% more than they did at graduation. Party school graduates saw an 85% increase. Engineering school graduates fared worst, earning 76% more 10 years out of school. See where one's coveted school ranks.
Salaries By Region
Attending college in the Midwest leads to the lowest salary both at graduation and at mid-career, according to the PayScale Inc. survey. Graduates of schools in the Northeast and California fared best.
Salary Increase By Major
Parents might be worried when one chooses Philosophy or International Relations as a major. But a year-long survey by PayScale Inc. of 1.2 million people with only a bachelor's degree shows that graduates in these subjects earned 103.5% and 97.8% more, respectively, about 10 years post-commencement. Majors that did not show as much salary growth include Nursing and Information Technology.
Describe the method of data collection.
The data was collected through surveys conducted by PayScale Inc. and published by the Wall Street Journal.
What type of study is this (observational/experiment)?
This is an observational study based on survey data and statistics.
All data was obtained from the Wall Street Journal, based on data from PayScale Inc.
What is the response variable? Is it quantitative or qualitative?
You should have two independent variables, one quantitative and one qualitative.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
# Interactive table, five rows per page (the DataTables option is spelled pageLength)
DT::datatable(df_degrees_that_pay_back, options = list(pageLength = 5))
kable(head(df_degrees_that_pay_back)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")

| Undergraduate Major | Starting Median Salary | Mid-Career Median Salary | Percent change from Starting to Mid-Career Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary |
|---|---|---|---|---|---|---|---|
| Accounting | 46000 | 77100 | 67.6 | 42200 | 56100 | 108000 | 152000 |
| Aerospace Engineering | 57700 | 101000 | 75.0 | 64300 | 82100 | 127000 | 161000 |
| Agriculture | 42600 | 71900 | 68.8 | 36300 | 52100 | 96300 | 150000 |
| Anthropology | 36800 | 61500 | 67.1 | 33800 | 45500 | 89300 | 138000 |
| Architecture | 41600 | 76800 | 84.6 | 50600 | 62200 | 97000 | 136000 |
| Art History | 35800 | 64900 | 81.3 | 28800 | 42200 | 87400 | 125000 |
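
The percent-change column can be checked against the two median columns. The sketch below assumes the column is the relative increase, (mid - start) / start * 100, rounded to one decimal; that formula is not stated in the source, but it reproduces all six preview values. The data frame `majors` and the helper `pct_change()` are hypothetical names introduced for illustration.

```r
# Starting and mid-career median salaries copied from the six preview rows above
majors <- data.frame(
  major = c("Accounting", "Aerospace Engineering", "Agriculture",
            "Anthropology", "Architecture", "Art History"),
  start = c(46000, 57700, 42600, 36800, 41600, 35800),
  mid   = c(77100, 101000, 71900, 61500, 76800, 64900)
)

# Hypothetical helper: relative increase from start to mid, in percent,
# rounded to one decimal place
pct_change <- function(start, mid) {
  round((mid - start) / start * 100, 1)
}

majors$pct_change <- pct_change(majors$start, majors$mid)
print(majors)
```

For Accounting, for example, this gives (77100 - 46000) / 46000 * 100, which rounds to 67.6 and matches the table.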

summary(df_degrees_that_pay_back)
## Undergraduate Major Starting Median Salary Mid-Career Median Salary
## Length:50 Min. :34000 Min. : 52000
## Class :character 1st Qu.:37050 1st Qu.: 60825
## Mode :character Median :40850 Median : 72000
## Mean :44310 Mean : 74786
## 3rd Qu.:49875 3rd Qu.: 88750
## Max. :74300 Max. :107000
## Percent change from Starting to Mid-Career Salary
## Min. : 23.40
## 1st Qu.: 59.12
## Median : 67.80
## Mean : 69.27
## 3rd Qu.: 82.42
## Max. :103.50
## Mid-Career 10th Percentile Salary Mid-Career 25th Percentile Salary
## Min. :26700 Min. :36500
## 1st Qu.:34825 1st Qu.:44975
## Median :39400 Median :52450
## Mean :43408 Mean :55988
## 3rd Qu.:49850 3rd Qu.:63700
## Max. :71900 Max. :87300
## Mid-Career 75th Percentile Salary Mid-Career 90th Percentile Salary
## Min. : 70500 Min. : 96400
## 1st Qu.: 83275 1st Qu.:124250
## Median : 99400 Median :145500
## Mean :102138 Mean :142766
## 3rd Qu.:118750 3rd Qu.:161750
## Max. :145000 Max. :210000
# Interactive table, five rows per page (the DataTables option is spelled pageLength)
DT::datatable(df_salaries_by_college_type, options = list(pageLength = 5))
kable(head(df_salaries_by_college_type)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")

| School Name | School Type | Starting Median Salary | Mid-Career Median Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary |
|---|---|---|---|---|---|---|---|
| Massachusetts Institute of Technology (MIT) | Engineering | 72200 | 126000 | 76800 | 99200 | 168000 | 220000 |
| California Institute of Technology (CIT) | Engineering | 75500 | 123000 | N/A | 104000 | 161000 | N/A |
| Harvey Mudd College | Engineering | 71800 | 122000 | N/A | 96000 | 180000 | N/A |
| Polytechnic University of New York, Brooklyn | Engineering | 62400 | 114000 | 66800 | 94300 | 143000 | 190000 |
| Cooper Union | Engineering | 62200 | 114000 | N/A | 80200 | 142000 | N/A |
| Worcester Polytechnic Institute (WPI) | Engineering | 61000 | 114000 | 80000 | 91200 | 137000 | 180000 |

summary(df_salaries_by_college_type)
## School Name School Type Starting Median Salary
## Length:269 Length:269 Min. :34800
## Class :character Class :character 1st Qu.:42000
## Mode :character Mode :character Median :44700
## Mean :46068
## 3rd Qu.:48300
## Max. :75500
## Mid-Career Median Salary Mid-Career 10th Percentile Salary
## Min. : 43900 Length:269
## 1st Qu.: 74000 Class :character
## Median : 81600 Mode :character
## Mean : 83932
## 3rd Qu.: 92200
## Max. :134000
## Mid-Career 25th Percentile Salary Mid-Career 75th Percentile Salary
## Min. : 31800 Min. : 60900
## 1st Qu.: 53200 1st Qu.:100000
## Median : 58400 Median :113000
## Mean : 60373 Mean :116275
## 3rd Qu.: 65100 3rd Qu.:126000
## Max. :104000 Max. :234000
## Mid-Career 90th Percentile Salary
## Length:269
## Class :character
## Mode :character
##
##
##
# Interactive table, five rows per page (the DataTables option is spelled pageLength)
DT::datatable(df_salaries_by_region, options = list(pageLength = 5))
kable(head(df_salaries_by_region)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")

| School Name | Region | Starting Median Salary | Mid-Career Median Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary |
|---|---|---|---|---|---|---|---|
| Stanford University | California | 70400 | 129000 | 68400 | 93100 | 184000 | 257000 |
| California Institute of Technology (CIT) | California | 75500 | 123000 | N/A | 104000 | 161000 | N/A |
| Harvey Mudd College | California | 71800 | 122000 | N/A | 96000 | 180000 | N/A |
| University of California, Berkeley | California | 59900 | 112000 | 59500 | 81000 | 149000 | 201000 |
| Occidental College | California | 51900 | 105000 | N/A | 54800 | 157000 | N/A |
| Cal Poly San Luis Obispo | California | 57200 | 101000 | 55000 | 74700 | 133000 | 178000 |

summary(df_salaries_by_region)
## School Name Region Starting Median Salary
## Length:320 Length:320 Min. :34500
## Class :character Class :character 1st Qu.:42000
## Mode :character Mode :character Median :45100
## Mean :46253
## 3rd Qu.:48900
## Max. :75500
## Mid-Career Median Salary Mid-Career 10th Percentile Salary
## Min. : 43900 Length:320
## 1st Qu.: 73725 Class :character
## Median : 82700 Mode :character
## Mean : 83934
## 3rd Qu.: 93250
## Max. :134000
## Mid-Career 25th Percentile Salary Mid-Career 75th Percentile Salary
## Min. : 31800 Min. : 60900
## 1st Qu.: 53100 1st Qu.: 99825
## Median : 59400 Median :113000
## Mean : 60614 Mean :116497
## 3rd Qu.: 66025 3rd Qu.:129000
## Max. :104000 Max. :234000
## Mid-Career 90th Percentile Salary
## Length:320
## Class :character
## Mode :character
##
##
##
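
In the two school-level summaries above, the 10th and 90th percentile columns are reported as `character` rather than numeric, most likely because missing values are stored as the text "N/A". A minimal sketch of one way to handle this is shown below, using values copied from the college-type preview rows; the helper `to_numeric_salary()` is a hypothetical name, not part of the original analysis.

```r
# Values copied from the "Mid-Career 10th Percentile Salary" preview column
mid_10_raw <- c("76800", "N/A", "N/A", "66800", "N/A", "80000")

# Hypothetical helper: replace the text "N/A" with a real NA,
# then convert the remaining strings to numbers
to_numeric_salary <- function(x) {
  x[x %in% "N/A"] <- NA
  as.numeric(x)
}

mid_10 <- to_numeric_salary(mid_10_raw)

# summary() now reports quartiles and the number of NA values
summary(mid_10)
```

An alternative, assuming the workbook stores the other cells as numbers, is to pass `na = "N/A"` to `read_excel()` so the conversion happens at load time.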
# Starting median salary for every undergraduate major (horizontal bars)
df_degrees_that_pay_back %>%
  ggplot(aes(x = `Undergraduate Major`, y = `Starting Median Salary`)) +
  geom_bar(stat = "identity", position = position_dodge(), colour = "black", width = 0.5) +
  ggtitle("Starting Median Salary for Undergraduate Major") +
  xlab("Undergraduate Major") + ylab("Starting Median Salary") +
  #geom_text(aes(label=`Starting Median Salary`), vjust=0.5, hjust=1.1,color="black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  coord_flip()
# Starting median salary for the first six schools in the college-type sheet
head(df_salaries_by_college_type) %>%
  ggplot(aes(x = `School Name`, y = `Starting Median Salary`)) +
  geom_bar(stat = "identity", position = position_dodge(), colour = "black", width = 0.5) +
  ggtitle("Starting Median Salary for School Name") +
  xlab("School Name") + ylab("Starting Median Salary") +
  #geom_text(aes(label=`Starting Median Salary`), vjust=0.5, hjust=1.1,color="black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
# Starting median salary for the first six schools in the region sheet
head(df_salaries_by_region) %>%
  ggplot(aes(x = `School Name`, y = `Starting Median Salary`)) +
  geom_bar(stat = "identity", position = position_dodge(), colour = "black", width = 0.5) +
  ggtitle("Starting Median Salary for School Name") +
  xlab("School Name") + ylab("Starting Median Salary") +
  #geom_text(aes(label=`Starting Median Salary`), vjust=0.5, hjust=1.1,color="black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
# Scatter plots of starting vs. mid-career median salary, one per sheet
qplot(`Starting Median Salary`, `Mid-Career Median Salary`, data = df_degrees_that_pay_back)
plot(df_salaries_by_college_type$`Starting Median Salary`, df_salaries_by_college_type$`Mid-Career Median Salary`,
     main = "df_salaries_by_college_type",
     xlab = "Starting Median Salary", ylab = "Mid-Career Median Salary", pch = 19)
ggplot(df_salaries_by_region,
       aes(x = `Starting Median Salary`, y = `Mid-Career Median Salary`)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", color = "red", se = TRUE)
#Plot the graph for starting median salary
ggplot(df_degrees_that_pay_back, aes(x = `Undergraduate Major`, y = `Starting Median Salary`)) +
  geom_col(fill = "red") +  # constant colour belongs outside aes()
  geom_text(aes(label = `Starting Median Salary`), angle = 90, hjust = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This chart identifies the three degrees with the highest starting median salaries.
#Plot bar graph for mid career median salary
ggplot(df_degrees_that_pay_back, aes(x = `Undergraduate Major`, y = `Mid-Career Median Salary`)) +
  geom_col(fill = "red") +  # constant colour belongs outside aes()
  geom_text(aes(label = `Mid-Career Median Salary`), angle = 90, hjust = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This chart identifies the three degrees with the highest mid-career median salaries.
#Plot bar graph for percentage increase in median salary when compared to starting salary
ggplot(df_degrees_that_pay_back, aes(x = `Undergraduate Major`, y = `Percent change from Starting to Mid-Career Salary`)) +
  geom_col(fill = "red") +  # constant colour belongs outside aes()
  geom_text(aes(label = `Percent change from Starting to Mid-Career Salary`), angle = 90, hjust = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This chart identifies the three degrees with the highest percentage increase from starting to mid-career median salary.
#Plot the starting, mid-career median, and 10th, 25th, 75th and 90th percentile salaries together
# Keep the major plus the six salary columns (column 4, the percent change, is dropped)
degrees_pay_back_melt <- df_degrees_that_pay_back %>% select(c(1:3, 5:8))
colnames(degrees_pay_back_melt) <- c("Subject_major", "start", "mid_median", "mid_10", "mid_25", "mid_75", "mid_90")
# Reshape from wide to long: one row per (major, salary measure)
degrees_pay_back_melt <- melt(degrees_pay_back_melt, id.vars = "Subject_major", variable.name = "Quartiles", value.name = "Salary_pack")
ggplot(degrees_pay_back_melt) +
  geom_point(aes(x = reorder(Subject_major, Salary_pack), y = Salary_pack, colour = Quartiles)) +
  coord_flip() +
  # axis titles are set with labs(); geom_point() does not accept xlab/ylab
  labs(x = "Undergraduate Major", y = "Salary") +
  scale_colour_discrete(breaks = c("start", "mid_median", "mid_10", "mid_25", "mid_75", "mid_90"),
                        labels = c("Starting salary", "Mid-Career Median salary", "Mid-Career 10th percentile salary", "Mid-Career 25th percentile salary", "Mid-Career 75th percentile salary", "Mid-Career 90th percentile salary"))
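
The bar charts above are read by eye to find the top three degrees for each measure. The same ranking can be computed directly; the sketch below does so on the six preview rows only, so its results are illustrative and are not the top three of the full 50-major table. The data frame `preview` and the helper `top_n_by()` are hypothetical names introduced here.

```r
# Six preview rows copied from the degrees-that-pay-back table
preview <- data.frame(
  major = c("Accounting", "Aerospace Engineering", "Agriculture",
            "Anthropology", "Architecture", "Art History"),
  start = c(46000, 57700, 42600, 36800, 41600, 35800),
  mid   = c(77100, 101000, 71900, 61500, 76800, 64900),
  pct   = c(67.6, 75.0, 68.8, 67.1, 84.6, 81.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n rows with the largest values in `column`
top_n_by <- function(df, column, n = 3) {
  ordered <- df[order(df[[column]], decreasing = TRUE), ]
  head(ordered, n)
}

top_start <- top_n_by(preview, "start")$major
top_mid   <- top_n_by(preview, "mid")$major
top_pct   <- top_n_by(preview, "pct")$major

print(top_start)
print(top_mid)
print(top_pct)
```

Applied to the full data, the same helper could be called with the original column names, for example `top_n_by(df_degrees_that_pay_back, "Starting Median Salary")`.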