1 Prerequisites: Available Libraries

  • readxl
  • dplyr
  • DT
  • kableExtra
  • ggplot2
  • reshape2

2 Data Preparation

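# load the required libraries (listed in section 1)
library(readxl)
library(dplyr)
library(DT)
library(kableExtra)
library(ggplot2)
library(reshape2)
# load data
myWorkingDir <- getwd()
mySourceFile <- file.path(myWorkingDir, "Where it Pays to Attend College.xlsx")
excel_sheets(path = mySourceFile)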
excel_sheets(path = mySourceFile)
## [1] "degrees-that-pay-back"    "salaries-by-college-type"
## [3] "salaries-by-region"
df_degrees_that_pay_back <- read_excel(path = mySourceFile, sheet = "degrees-that-pay-back")
df_salaries_by_college_type <- read_excel(path = mySourceFile, sheet = "salaries-by-college-type")
df_salaries_by_region <- read_excel(path = mySourceFile, sheet = "salaries-by-region")

3 Research question

Where it Pays to Attend College.

With college debt in the US reaching astronomical proportions of about 1.5 trillion USD, it makes sense for millennials and Gen Z to do a preliminary SWOT analysis of colleges and degrees and to align their goals, aspirations, and passions accordingly. Since Ivy League colleges are out of reach for most students, knowing which courses, subjects, and electives to choose during college can go a long way toward a market-ready career and a stable future.

4 Cases

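What are the cases, and how many are there?

The cases are undergraduate majors in degrees-that-pay-back (50 rows) and schools in salaries-by-college-type (269 rows) and salaries-by-region (320 rows), as reported by summary() in section 10.1.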

Salary Increase By Type of College: Party school? Liberal arts college? State school? Starting salaries differ depending on the type of school one attends, but the growth in earning power shows less disparity. Ten years out, graduates of Ivy League schools earned 99% more than they did at graduation, party school graduates saw an 85% increase, and engineering school graduates fared worst, earning 76% more 10 years out of school. See where your preferred school ranks.

Salaries By Region: According to the PayScale, Inc. survey, attending college in the Midwest is associated with the lowest salaries, both at graduation and at mid-career. Graduates of schools in the Northeast and California fared best.

Salary Increase By Major: Parents might worry when a student chooses Philosophy or International Relations as a major. But a year-long PayScale, Inc. survey of 1.2 million people holding only a bachelor's degree shows that graduates in these subjects earned 103.5% and 97.8% more, respectively, about 10 years after graduation. Majors with less salary growth include Nursing and Information Technology.
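The percentages above are relative changes: a 99% increase means the mid-career median is 1.99 times the starting median. As a consistency check, the minimal base-R sketch below recomputes the Percent change from Starting to Mid-Career Salary column for the six majors previewed in section 10.1.1. It assumes the column is defined as (mid-career - starting) / starting × 100. The helper name pct_change is hypothetical.

```r
# Six rows copied from the degrees-that-pay-back preview (section 10.1.1)
degrees_head <- data.frame(
  major = c("Accounting", "Aerospace Engineering", "Agriculture",
            "Anthropology", "Architecture", "Art History"),
  start = c(46000, 57700, 42600, 36800, 41600, 35800),
  mid   = c(77100, 101000, 71900, 61500, 76800, 64900),
  reported_pct = c(67.6, 75.0, 68.8, 67.1, 84.6, 81.3),
  stringsAsFactors = FALSE
)

# Assumed definition: relative change from starting to mid-career median, in percent
pct_change <- function(start, mid) {
  (mid - start) / start * 100
}

# Round to one decimal place, as in the sheet
degrees_head$computed_pct <- round(pct_change(degrees_head$start, degrees_head$mid), 1)
print(degrees_head[, c("major", "reported_pct", "computed_pct")])
```

For these six rows, the recomputed values match the reported column to one decimal place.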

5 Data collection

Describe the method of data collection.

The data were collected through salary surveys conducted by PayScale, Inc. and published by the Wall Street Journal.

6 Type of study

What type of study is this (observational/experiment)?

This is an observational study based on survey data. No treatment is assigned, so the results show associations rather than causal effects.

7 Data Source

All data were obtained from the Wall Street Journal, based on data from PayScale, Inc.

8 Dependent Variable

What is the response variable? Is it quantitative or qualitative?

  • Starting Median Salary (Quantitative) is the response variable in all three sheets
  • In salaries-by-college-type it is compared across School Type (Qualitative)
  • In salaries-by-region it is compared across Region (Qualitative)
  • In degrees-that-pay-back it is compared across Undergraduate Major (Qualitative)

9 Independent Variable

You should have two independent variables, one quantitative and one qualitative.

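  • degrees-that-pay-back is a separate dataset and is analyzed independently of salaries-by-college-type and salaries-by-region
  • Percent change from Starting to Mid-Career Salary in degrees-that-pay-back is used as an explanatory variable; because it is derived from Starting Median Salary, the two are not statistically independent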
  • Starting Median Salary and Percent change from Starting to Mid-Career Salary are quantitative
  • School Type and Region are qualitative
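Whether a column is quantitative or qualitative can be checked from how R stores it. The sketch below assumes that numeric columns are quantitative and everything else is qualitative. The helper variable_kind is hypothetical, and the data are three columns of the six-row degrees-that-pay-back preview. In the school-level sheets, the 10th and 90th percentile columns are read as character (see the summaries in section 10.1), so this rule would misclassify them until their "N/A" entries are converted.

```r
# Three columns from the degrees-that-pay-back preview (section 10.1.1)
degrees_head <- data.frame(
  `Undergraduate Major` = c("Accounting", "Aerospace Engineering", "Agriculture",
                            "Anthropology", "Architecture", "Art History"),
  `Starting Median Salary` = c(46000, 57700, 42600, 36800, 41600, 35800),
  `Percent change from Starting to Mid-Career Salary` = c(67.6, 75.0, 68.8, 67.1, 84.6, 81.3),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: numeric columns are labelled quantitative, all others qualitative.
# Character columns such as School Type or Region could be turned into categorical
# variables with factor() before modelling.
variable_kind <- function(df) {
  ifelse(vapply(df, is.numeric, logical(1)), "quantitative", "qualitative")
}

kinds <- variable_kind(degrees_head)
print(kinds)
```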

10 Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

10.1 Show data

10.1.1 degrees-that-pay-back

DT::datatable(df_degrees_that_pay_back, options = list(pageLength = 5))
kable(head(df_degrees_that_pay_back)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = FALSE, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")
| Undergraduate Major | Starting Median Salary | Mid-Career Median Salary | Percent change from Starting to Mid-Career Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary |
|---|---|---|---|---|---|---|---|
| Accounting | 46000 | 77100 | 67.6 | 42200 | 56100 | 108000 | 152000 |
| Aerospace Engineering | 57700 | 101000 | 75.0 | 64300 | 82100 | 127000 | 161000 |
| Agriculture | 42600 | 71900 | 68.8 | 36300 | 52100 | 96300 | 150000 |
| Anthropology | 36800 | 61500 | 67.1 | 33800 | 45500 | 89300 | 138000 |
| Architecture | 41600 | 76800 | 84.6 | 50600 | 62200 | 97000 | 136000 |
| Art History | 35800 | 64900 | 81.3 | 28800 | 42200 | 87400 | 125000 |
summary(df_degrees_that_pay_back)
##  Undergraduate Major Starting Median Salary Mid-Career Median Salary
##  Length:50           Min.   :34000          Min.   : 52000          
##  Class :character    1st Qu.:37050          1st Qu.: 60825          
##  Mode  :character    Median :40850          Median : 72000          
##                      Mean   :44310          Mean   : 74786          
##                      3rd Qu.:49875          3rd Qu.: 88750          
##                      Max.   :74300          Max.   :107000          
##  Percent change from Starting to Mid-Career Salary
##  Min.   : 23.40                                   
##  1st Qu.: 59.12                                   
##  Median : 67.80                                   
##  Mean   : 69.27                                   
##  3rd Qu.: 82.42                                   
##  Max.   :103.50                                   
##  Mid-Career 10th Percentile Salary Mid-Career 25th Percentile Salary
##  Min.   :26700                     Min.   :36500                    
##  1st Qu.:34825                     1st Qu.:44975                    
##  Median :39400                     Median :52450                    
##  Mean   :43408                     Mean   :55988                    
##  3rd Qu.:49850                     3rd Qu.:63700                    
##  Max.   :71900                     Max.   :87300                    
##  Mid-Career 75th Percentile Salary Mid-Career 90th Percentile Salary
##  Min.   : 70500                    Min.   : 96400                   
##  1st Qu.: 83275                    1st Qu.:124250                   
##  Median : 99400                    Median :145500                   
##  Mean   :102138                    Mean   :142766                   
##  3rd Qu.:118750                    3rd Qu.:161750                   
##  Max.   :145000                    Max.   :210000
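summary() reports location and quartiles, but it does not report spread. The small base-R sketch below adds the standard deviation and interquartile range, shown on the starting and mid-career medians of the six previewed majors. The helper spread_stats is hypothetical.

```r
# Starting and mid-career medians of the six majors previewed in section 10.1.1
degrees_head <- data.frame(
  start = c(46000, 57700, 42600, 36800, 41600, 35800),
  mid   = c(77100, 101000, 71900, 61500, 76800, 64900)
)

# Hypothetical helper: location plus spread for one numeric column
spread_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
}

# One column of statistics per salary column (a 4 x 2 matrix here)
stats <- sapply(degrees_head, spread_stats)
print(round(stats))
```

On the full data, sapply(df_degrees_that_pay_back[-1], spread_stats) should give the same statistics for every numeric column, since column 1 (Undergraduate Major) is the only non-numeric one.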

10.1.2 salaries-by-college-type

DT::datatable(df_salaries_by_college_type, options = list(pageLength = 5))
kable(head(df_salaries_by_college_type)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = FALSE, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")
| School Name | School Type | Starting Median Salary | Mid-Career Median Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary |
|---|---|---|---|---|---|---|---|
| Massachusetts Institute of Technology (MIT) | Engineering | 72200 | 126000 | 76800 | 99200 | 168000 | 220000 |
| California Institute of Technology (CIT) | Engineering | 75500 | 123000 | N/A | 104000 | 161000 | N/A |
| Harvey Mudd College | Engineering | 71800 | 122000 | N/A | 96000 | 180000 | N/A |
| Polytechnic University of New York, Brooklyn | Engineering | 62400 | 114000 | 66800 | 94300 | 143000 | 190000 |
| Cooper Union | Engineering | 62200 | 114000 | N/A | 80200 | 142000 | N/A |
| Worcester Polytechnic Institute (WPI) | Engineering | 61000 | 114000 | 80000 | 91200 | 137000 | 180000 |
summary(df_salaries_by_college_type)
##  School Name        School Type        Starting Median Salary
##  Length:269         Length:269         Min.   :34800         
##  Class :character   Class :character   1st Qu.:42000         
##  Mode  :character   Mode  :character   Median :44700         
##                                        Mean   :46068         
##                                        3rd Qu.:48300         
##                                        Max.   :75500         
##  Mid-Career Median Salary Mid-Career 10th Percentile Salary
##  Min.   : 43900           Length:269                       
##  1st Qu.: 74000           Class :character                 
##  Median : 81600           Mode  :character                 
##  Mean   : 83932                                            
##  3rd Qu.: 92200                                            
##  Max.   :134000                                            
##  Mid-Career 25th Percentile Salary Mid-Career 75th Percentile Salary
##  Min.   : 31800                    Min.   : 60900                   
##  1st Qu.: 53200                    1st Qu.:100000                   
##  Median : 58400                    Median :113000                   
##  Mean   : 60373                    Mean   :116275                   
##  3rd Qu.: 65100                    3rd Qu.:126000                   
##  Max.   :104000                    Max.   :234000                   
##  Mid-Career 90th Percentile Salary
##  Length:269                       
##  Class :character                 
##  Mode  :character                 
##                                   
##                                   
## 
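The summary shows Mid-Career 10th Percentile Salary and Mid-Career 90th Percentile Salary as character columns. The sheet records missing values as the text "N/A", so readxl cannot treat these columns as numbers. The minimal sketch below converts such a column, using the 10th-percentile values of the six schools previewed above. The helper na_text_to_numeric is hypothetical. As an alternative, read_excel() accepts an na argument, so passing na = c("", "N/A") when reading the sheet should have the same effect.

```r
# School names and 10th-percentile values of the six schools in section 10.1.2,
# stored as text exactly as they arrive from the sheet
college_head <- data.frame(
  `School Name` = c("Massachusetts Institute of Technology (MIT)",
                    "California Institute of Technology (CIT)",
                    "Harvey Mudd College",
                    "Polytechnic University of New York, Brooklyn",
                    "Cooper Union",
                    "Worcester Polytechnic Institute (WPI)"),
  `Mid-Career 10th Percentile Salary` = c("76800", "N/A", "N/A", "66800", "N/A", "80000"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: turn the "N/A" marker into a real NA, then parse the numbers
na_text_to_numeric <- function(x, na_marker = "N/A") {
  x[x %in% na_marker] <- NA
  as.numeric(x)
}

college_head$`Mid-Career 10th Percentile Salary` <-
  na_text_to_numeric(college_head$`Mid-Career 10th Percentile Salary`)

# summary() now reports Min/Median/Max plus an NA count instead of Length/Class/Mode
print(summary(college_head$`Mid-Career 10th Percentile Salary`))
```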

10.1.3 salaries-by-region

DT::datatable(df_salaries_by_region, options = list(pageLength = 5))
kable(head(df_salaries_by_region)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = FALSE, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")
| School Name | Region | Starting Median Salary | Mid-Career Median Salary | Mid-Career 10th Percentile Salary | Mid-Career 25th Percentile Salary | Mid-Career 75th Percentile Salary | Mid-Career 90th Percentile Salary |
|---|---|---|---|---|---|---|---|
| Stanford University | California | 70400 | 129000 | 68400 | 93100 | 184000 | 257000 |
| California Institute of Technology (CIT) | California | 75500 | 123000 | N/A | 104000 | 161000 | N/A |
| Harvey Mudd College | California | 71800 | 122000 | N/A | 96000 | 180000 | N/A |
| University of California, Berkeley | California | 59900 | 112000 | 59500 | 81000 | 149000 | 201000 |
| Occidental College | California | 51900 | 105000 | N/A | 54800 | 157000 | N/A |
| Cal Poly San Luis Obispo | California | 57200 | 101000 | 55000 | 74700 | 133000 | 178000 |
summary(df_salaries_by_region)
##  School Name           Region          Starting Median Salary
##  Length:320         Length:320         Min.   :34500         
##  Class :character   Class :character   1st Qu.:42000         
##  Mode  :character   Mode  :character   Median :45100         
##                                        Mean   :46253         
##                                        3rd Qu.:48900         
##                                        Max.   :75500         
##  Mid-Career Median Salary Mid-Career 10th Percentile Salary
##  Min.   : 43900           Length:320                       
##  1st Qu.: 73725           Class :character                 
##  Median : 82700           Mode  :character                 
##  Mean   : 83934                                            
##  3rd Qu.: 93250                                            
##  Max.   :134000                                            
##  Mid-Career 25th Percentile Salary Mid-Career 75th Percentile Salary
##  Min.   : 31800                    Min.   : 60900                   
##  1st Qu.: 53100                    1st Qu.: 99825                   
##  Median : 59400                    Median :113000                   
##  Mean   : 60614                    Mean   :116497                   
##  3rd Qu.: 66025                    3rd Qu.:129000                   
##  Max.   :104000                    Max.   :234000                   
##  Mid-Career 90th Percentile Salary
##  Length:320                       
##  Class :character                 
##  Mode  :character                 
##                                   
##                                   
## 
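The regional claim in the Cases section (Midwest lowest, Northeast and California highest) can be checked by summarizing salaries per Region. The base-R sketch below uses aggregate(). It runs on the six previewed rows, which are all from California, so only one group appears.

```r
# Six rows copied from the salaries-by-region preview (section 10.1.3)
region_head <- data.frame(
  `School Name` = c("Stanford University", "California Institute of Technology (CIT)",
                    "Harvey Mudd College", "University of California, Berkeley",
                    "Occidental College", "Cal Poly San Luis Obispo"),
  Region = rep("California", 6),
  `Starting Median Salary` = c(70400, 75500, 71800, 59900, 51900, 57200),
  `Mid-Career Median Salary` = c(129000, 123000, 122000, 112000, 105000, 101000),
  check.names = FALSE, stringsAsFactors = FALSE
)

salary_cols <- c("Starting Median Salary", "Mid-Career Median Salary")

# One row per Region, holding the median of each salary column within that region
region_medians <- aggregate(region_head[salary_cols],
                            by = list(Region = region_head$Region),
                            FUN = median)
print(region_medians)
```

If df_salaries_by_region replaces region_head, the same call should return one row per region.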

10.2 Show Plots

10.2.1 degrees-that-pay-back

df_degrees_that_pay_back %>%
  ggplot(aes(x = `Undergraduate Major`, y = `Starting Median Salary`)) +
    geom_col(colour = "black", width = 0.5) +
    ggtitle("Starting Median Salary for Undergraduate Major") +
    xlab("Undergraduate Major") + ylab("Starting Median Salary") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_flip()

10.2.2 salaries-by-college-type

head(df_salaries_by_college_type) %>%
  ggplot(aes(x = `School Name`, y = `Starting Median Salary`)) +
    geom_col(colour = "black", width = 0.5) +
    ggtitle("Starting Median Salary for the First Six Schools (salaries-by-college-type)") +
    xlab("School Name") + ylab("Starting Median Salary") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

10.2.3 salaries-by-region

head(df_salaries_by_region) %>%
  ggplot(aes(x = `School Name`, y = `Starting Median Salary`)) +
    geom_col(colour = "black", width = 0.5) +
    ggtitle("Starting Median Salary for the First Six Schools (salaries-by-region)") +
    xlab("School Name") + ylab("Starting Median Salary") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

10.3 Show Scatter Plots

10.3.1 degrees-that-pay-back

ggplot(df_degrees_that_pay_back, aes(x = `Starting Median Salary`, y = `Mid-Career Median Salary`)) +
  geom_point()

10.3.2 salaries-by-college-type

plot(df_salaries_by_college_type$`Starting Median Salary`, df_salaries_by_college_type$`Mid-Career Median Salary`,
     main = "Salaries by College Type",
     xlab = "Starting Median Salary", ylab = "Mid-Career Median Salary", pch = 19)

10.3.3 salaries-by-region

ggplot(df_salaries_by_region,
       aes(x = `Starting Median Salary`, y = `Mid-Career Median Salary`)) +
  geom_point(shape = 1) +  # hollow circles
  # least-squares line with a confidence band
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = TRUE)
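geom_smooth(method = lm) draws an ordinary least-squares line. The same fit, along with the Pearson correlation, can be obtained numerically with lm() and cor(). The sketch below uses the six previewed rows of salaries-by-region, so its numbers describe only those schools.

```r
# Starting and mid-career medians of the six schools previewed in section 10.1.3
region_head <- data.frame(
  start = c(70400, 75500, 71800, 59900, 51900, 57200),
  mid   = c(129000, 123000, 122000, 112000, 105000, 101000)
)

# Pearson correlation between starting and mid-career median salary
r <- cor(region_head$start, region_head$mid)

# Ordinary least-squares fit: the line that geom_smooth(method = "lm") draws
fit <- lm(mid ~ start, data = region_head)

print(round(r, 2))
print(coef(fit))
```

On the full sheet, the equivalent fit would be lm(`Mid-Career Median Salary` ~ `Starting Median Salary`, data = df_salaries_by_region).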

10.4 Top Paying Degrees

#Plot the bar graph for starting median salary
ggplot(df_degrees_that_pay_back, aes(x = `Undergraduate Major`, y = `Starting Median Salary`)) +
  geom_col(fill = "red") +
  geom_text(aes(label = `Starting Median Salary`), angle = 90, hjust = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The three majors with the highest starting median salary are:

    1. Physician Assistant ($74,300)
    2. Chemical Engineering ($63,200)
    3. Computer Engineering ($61,400)
#Plot bar graph for mid-career median salary
ggplot(df_degrees_that_pay_back, aes(x = `Undergraduate Major`, y = `Mid-Career Median Salary`)) +
  geom_col(fill = "red") +
  geom_text(aes(label = `Mid-Career Median Salary`), angle = 90, hjust = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The three majors with the highest mid-career median salary are:

    1. Chemical Engineering ($107,000)
    2. Computer Engineering ($105,000)
    3. Electrical Engineering ($103,000)
#Plot bar graph for percentage increase in median salary compared to starting salary
ggplot(df_degrees_that_pay_back, aes(x = `Undergraduate Major`, y = `Percent change from Starting to Mid-Career Salary`)) +
  geom_col(fill = "red") +
  geom_text(aes(label = `Percent change from Starting to Mid-Career Salary`), angle = 90, hjust = 1) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The three majors with the largest percentage increase from starting to mid-career median salary are (Math and Philosophy are tied):

    1. Math (103.5%)
    2. Philosophy (103.5%)
    3. International Relations (97.8%)
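The top-three lists above can also be computed directly instead of being read off the charts. The sketch below uses a hypothetical helper, top_n_rows(), which sorts by a column in decreasing order and breaks ties alphabetically. This matters for Math and Philosophy, which are both at 103.5%. It is shown on the six previewed majors.

```r
# Six rows copied from the degrees-that-pay-back preview (section 10.1.1)
degrees_head <- data.frame(
  major = c("Accounting", "Aerospace Engineering", "Agriculture",
            "Anthropology", "Architecture", "Art History"),
  start = c(46000, 57700, 42600, 36800, 41600, 35800),
  pct   = c(67.6, 75.0, 68.8, 67.1, 84.6, 81.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n rows with the largest values of `col`;
# rows with equal values are ordered alphabetically by `tie_col`
top_n_rows <- function(df, col, n = 3, tie_col = "major") {
  head(df[order(-df[[col]], df[[tie_col]]), ], n)
}

top_start <- top_n_rows(degrees_head, "start")
top_pct   <- top_n_rows(degrees_head, "pct")
print(top_start)
print(top_pct)
```

On the full sheet, top_n_rows(df_degrees_that_pay_back, "Starting Median Salary", tie_col = "Undergraduate Major") should return the majors in the first list above.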
#Plot the graph combining the starting, mid-career median, and 10th, 25th, 75th and 90th percentile salaries
degrees_pay_back_melt <- df_degrees_that_pay_back %>% select(c(1:3, 5:8))  # drop column 4 (percent change)
colnames(degrees_pay_back_melt) <- c("Subject_major", "start", "mid_median", "mid_10", "mid_25", "mid_75", "mid_90")
# wide to long with reshape2::melt(): one row per (major, salary measure)
degrees_pay_back_melt <- melt(degrees_pay_back_melt, id.vars = "Subject_major",
                              variable.name = "Salary_type", value.name = "Salary")

ggplot(degrees_pay_back_melt) +
  geom_point(aes(x = reorder(Subject_major, Salary), y = Salary, colour = Salary_type)) +
  labs(x = "Undergraduate Major", y = "Salary") +
  coord_flip() +
  scale_colour_discrete(name = "Salary measure",
    breaks = c("start", "mid_median", "mid_10", "mid_25", "mid_75", "mid_90"),
    labels = c("Starting salary", "Mid-Career Median salary", "Mid-Career 10th percentile salary",
               "Mid-Career 25th percentile salary", "Mid-Career 75th percentile salary", "Mid-Career 90th percentile salary"))
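melt() turns the wide table (one column per salary measure) into a long one (one row per major and measure), which is what lets ggplot map the measure to colour. The base-R sketch below performs the same reshaping on the first two previewed majors. The column names Salary_type and Salary are chosen for illustration.

```r
# First two majors of the degrees-that-pay-back preview, in the renamed wide layout
wide <- data.frame(
  Subject_major = c("Accounting", "Aerospace Engineering"),
  start      = c(46000, 57700),
  mid_median = c(77100, 101000),
  mid_10     = c(42200, 64300),
  mid_25     = c(56100, 82100),
  mid_75     = c(108000, 127000),
  mid_90     = c(152000, 161000),
  stringsAsFactors = FALSE
)

measure_cols <- setdiff(names(wide), "Subject_major")

# Stack the measure columns one after another, repeating the id column alongside;
# unlist() walks the columns in order, matching rep(..., each = nrow(wide))
long <- data.frame(
  Subject_major = rep(wide$Subject_major, times = length(measure_cols)),
  Salary_type   = factor(rep(measure_cols, each = nrow(wide)), levels = measure_cols),
  Salary        = unlist(wide[measure_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```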