Salary Increase By Type of College

We all want to know how our careers will grow. Some majors may not seem very interesting, and some pay meagerly at the start, but things can change by mid-career. A year-long survey by PayScale Inc. of 1.2 million people holding only a bachelor’s degree shows that graduates in Philosophy and International Relations earned 103.5% and 97.8% more, respectively, about 10 years after graduation than they did at the start. (http://online.wsj.com/public/resources/documents/info-Degrees_that_Pay_you_Back-sort.html)

These datasets are from
https://www.kaggle.com/wsj/college-salaries?select=degrees-that-pay-back.csv


Variables in this dataset:

Undergraduate Major: Categorical
Starting Median Salary: Quantitative
Mid-Career Median Salary: Quantitative
Percent change from Starting to Mid-Career Salary: Quantitative
Mid-Career 10th Percentile Salary: Quantitative
Mid-Career 25th Percentile Salary: Quantitative
Mid-Career Median: Quantitative
Mid-Career 75th Percentile Salary: Quantitative
Mid-Career 90th Percentile Salary: Quantitative
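
The percent-change variable can presumably be derived from the two median columns. A minimal sketch in R, using made-up salaries rather than values from the dataset (the function name `percent_change` is hypothetical):

```r
# Percent change from starting to mid-career median salary.
# The inputs below are illustrative, not taken from the dataset.
percent_change <- function(start, mid) {
  round((mid - start) / start * 100, 1)
}

start_salary <- c(40000, 50000)
mid_salary   <- c(80000, 65000)
change <- percent_change(start_salary, mid_salary)
print(change)
```

A value above 100 means the median salary more than doubled between the start and the middle of a career.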

Even though I am majoring in Computer Science, I would love to know the earnings in other fields, in case someone I know needs advice when choosing a career.
#install.packages("magrittr") # package installations are only needed the first time you use it
#install.packages("dplyr")    # alternative installation of the %>%
library(tidyverse) #Loading libraries and packages into session
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(tmap)
## Registered S3 methods overwritten by 'stars':
##   method             from
##   st_bbox.SpatRaster sf  
##   st_crs.SpatRaster  sf
library(tmaptools)
library(leaflet)
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(leaflet.extras)
library(dplyr)
library(rio)
library(sp)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:rio':
## 
##     export
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
library(ggplot2)
library(ggthemes)
### Set path
setwd("C:/Users/gru_e/OneDrive/Desktop/DATA110/R Projects/Project2")

#Load in Dataset1: Degree that pays back
degree_df <- read.csv("degrees-that-pay-back.csv")

#degree_df

Cleaning data and renaming variables

#CLEANING: Rename all variables
degree_df1 <- degree_df %>%
  rename(major = Undergraduate.Major,
         mdstart = Starting.Median.Salary,
         mdmid = Mid.Career.Median.Salary,
         start_to_mid_percent = Percent.change.from.Starting.to.Mid.Career.Salary,
         mid_career_10th = Mid.Career.10th.Percentile.Salary,
         mid_career_25th = Mid.Career.25th.Percentile.Salary,
         mid_career_75th = Mid.Career.75th.Percentile.Salary,
         mid_career_90th = Mid.Career.90th.Percentile.Salary)
#degree_df1

Creating a chart that shows “Starting Median Salary”.

#degree_df1 %>% arrange(desc(mdstart))
#degree_df1 

#Salaries are read as text, so convert them to numbers before plotting
gg <- degree_df1 %>%
  mutate(mdstart = parse_number(as.character(mdstart))) %>%
  ggplot(mapping = aes(x = major, y = mdstart)) +
  geom_col(fill = "#FF9999", colour = "black") +
  scale_y_continuous(labels = scales::label_dollar()) +
  coord_flip() +
  ggtitle("Median Newly Graduate Earnings") +
  xlab("Undergraduate Degree") +
  ylab("Salary $USD") +
  theme(axis.text.y = element_text(angle = 25, size = 5))

gg

Median starting salary: The median of what the students were earning right after graduation.

The two highest starting median salaries are for Physician Assistant ($74,300.00) and Chemical Engineering ($63,200.00).
The two lowest are for Religion ($34,100.00) and Spanish ($34,000.00).
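
Rankings like these depend on comparing salaries as numbers. If the salary columns arrive as text such as "$74,300.00" (an assumption about how the CSV is formatted), text sorting can misplace values with a different number of digits. A small sketch in base R, using the four starting salaries quoted here plus one hypothetical six-figure row; the helper `to_number` is also hypothetical:

```r
# Strip "$" and "," so currency text can be converted to numeric.
to_number <- function(x) as.numeric(gsub("[$,]", "", x))

pay <- data.frame(
  major = c("Physician Assistant", "Chemical Engineering", "Religion",
            "Spanish", "Example Major"),
  # The last value is hypothetical, added to show the six-figure case.
  mdstart = c("$74,300.00", "$63,200.00", "$34,100.00",
              "$34,000.00", "$100,000.00"),
  stringsAsFactors = FALSE
)
pay$mdstart_num <- to_number(pay$mdstart)

# Top major when salaries are compared as text (character by character)
top_by_text <- pay$major[order(pay$mdstart, decreasing = TRUE, method = "radix")][1]

# Top major when salaries are compared as numbers
top_by_number <- pay$major[which.max(pay$mdstart_num)]

print(c(text = top_by_text, number = top_by_number))
```

Compared as text, "$100,000.00" falls below "$74,300.00" because "1" sorts before "7", so converting to numeric first is the safer habit.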


Creating a chart that shows “Mid-Career Median Earnings”.

degree_df2 <- degree_df %>%
  rename(major = Undergraduate.Major,
         mdstart = Starting.Median.Salary,
         mdmid = Mid.Career.Median.Salary,
         start_to_mid_percent = Percent.change.from.Starting.to.Mid.Career.Salary,
         mid_career_10th = Mid.Career.10th.Percentile.Salary,
         mid_career_25th = Mid.Career.25th.Percentile.Salary,
         mid_career_75th = Mid.Career.75th.Percentile.Salary,
         mid_career_90th = Mid.Career.90th.Percentile.Salary)

#degree_df2 %>% arrange(desc(mdmid))


#Salaries are read as text, so convert them to numbers before plotting
gg <- degree_df2 %>%
  mutate(mdmid = parse_number(as.character(mdmid))) %>%
  ggplot(mapping = aes(x = major, y = mdmid)) +
  geom_col(fill = "lightblue", colour = "black") +
  scale_y_continuous(labels = scales::label_dollar()) +
  coord_flip() + ggtitle("Median Mid-Career Earnings") +
  xlab("Undergraduate Degree") + ylab("Salary $USD") +
  theme(axis.text.y = element_text(angle = 25, size = 5))
gg

#ggplotly(gg)


The midcareer pay data is defined as the median salary for bachelor’s degree-holders with 10-plus years’ experience in the field. In general, the numbers aren’t too surprising — though some figures may raise eyebrows. (https://www.bizjournals.com/washington/news/2021/09/29/midcareer-salary-greater-washington-colleges.html#:~:text=The%20midcareer%20pay%20data%20is,some%20figures%20may%20raise%20eyebrows.)

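At mid-career, Physician Assistant no longer has the highest median pay. Economics reaches $98,600.00, though Chemical Engineering is higher still at $107,000.00; six-figure values are easy to overlook because the salary column is read as text, which sorts them below $98,600.00.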


The lowest mid-career median salary is shared by Education and Religion, at $52,000.00.

Creating a chart that shows “Mid-Career Earnings Percentile”.

theme_set(theme_minimal())

#Deselect start_to_mid_percent variable because we do not need it right now
newdd <- select(degree_df1, -start_to_mid_percent)
#newdd


#Make Wide data to Long data for stacked bar graph showing percentile
data_long1 <- gather(newdd, percentile, salary, mdmid:mid_career_90th,na.rm = TRUE)
#data_long1

#reorder factor legend
data_long1$percentile <- factor(data_long1$percentile,
                                levels = c("mid_career_90th",
                                           "mid_career_75th",
                                           "mdmid",
                                           "mid_career_25th",
                                           "mid_career_10th"))


#Plot mid-career earnings by percentile (salaries are read as text, so convert them first)
p <- data_long1 %>%
  mutate(salary = parse_number(as.character(salary))) %>%
  ggplot(aes(x = major, y = salary)) +
  geom_col(aes(fill = percentile), width = 0.7) +
  scale_y_continuous(labels = NULL, breaks = NULL) +
  coord_flip() + ggtitle("Mid-Career Earnings Percentile") +
  xlab("Undergraduate Degree") + ylab("Salary $USD") +
  theme(axis.text.y = element_text(angle = 25, size = 6)) +
  scale_fill_brewer(palette = "Dark2")

#p + theme(axis.text.x = element_text(angle = 45,size = , hjust = 1))

ggplotly(p)
#Salaries are stored as text, so convert to numeric before averaging
xm <- mean(parse_number(as.character(newdd$mdmid)))


What is a percentile salary?

A percentile salary is the salary below which a given percentage of workers fall. For example, 90% of workers earn less than the 90th percentile salary.

Mid-Career Percentiles: The 10th, 25th, 50th (median), 75th, and 90th percentiles of salaries 10 years into a career.

https://medium.com/swlh/what-major-pays-an-analysis-of-the-salaries-of-undergraduate-majors-in-na-9c88d08c5b30
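
To make the definition concrete, here is a small sketch in R that computes the same five percentiles with `quantile()`. The salaries are invented for illustration and do not come from the dataset, which only reports the percentiles themselves:

```r
# Hypothetical mid-career salaries for one major (illustrative values only).
salaries <- c(42000, 48000, 55000, 61000, 67000,
              72000, 80000, 95000, 110000, 150000)

# 10th, 25th, 50th (median), 75th, and 90th percentiles
pct <- quantile(salaries, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
print(pct)

# Share of workers earning less than the 90th percentile salary
share_below_90th <- mean(salaries < pct[["90%"]])
print(share_below_90th)
```

`quantile()` interpolates between observations by default, so a percentile does not have to match any single salary in the sample.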


My finding is that a major such as Spanish can pay less than others to begin with. However, after more than ten years of experience, its earnings seem to be among the highest at the 90th percentile.

Science and engineering majors such as IT, Industrial Engineering, Mechanical Engineering, and Physics seem to have high median mid-career salaries. Therefore, we are in the right market.

On the other hand, majors such as Film, Business Management, and History seem to have low median mid-career salaries.


Bonus Track: Salary by College Type

There are 5 types of college in this dataset:

  1. Engineering
  2. Party
  3. Liberal Arts: Liberal arts colleges are colleges that teach students courses in a variety of liberal arts areas, including history, language, literature, math and more. Typically, liberal arts colleges are private, four-year institutions, and they tend to have relatively small class sizes. (https://www.indeed.com/career-advice/career-development/types-of-colleges)
  4. State: Public universities are four-year institutions that are funded by governments. Often, public universities offer affordable tuition rates for state residents. Typically, public universities have large student populations and many student organizations and activities. Additionally, public universities usually have many smaller colleges within them, giving students a large range of degree programs to choose from. Public universities often also have distinguished faculty, meaning that students can learn from and network with industry experts. (https://www.indeed.com/career-advice/career-development/types-of-colleges)
  5. Ivy League
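
One way to compare these school types is to take the median starting salary within each type. The sketch below uses base R's `aggregate()` on an invented data frame; the school names and salaries are hypothetical, and the column names simply mirror the lowercased names used in this report:

```r
# Hypothetical schools and starting median salaries (illustrative only).
schools <- data.frame(
  school_name = c("School A", "School B", "School C",
                  "School D", "School E", "School F"),
  school_type = c("Engineering", "Engineering", "Liberal Arts",
                  "Liberal Arts", "State", "State"),
  starting_median_salary = c(60000, 50000, 45000, 41000, 44000, 40000)
)

# Median starting salary per school type, sorted from highest to lowest
by_type <- aggregate(starting_median_salary ~ school_type,
                     data = schools, FUN = median)
by_type <- by_type[order(by_type$starting_median_salary, decreasing = TRUE), ]
print(by_type)
```

With the real file, the salary column would first need to be converted from currency text to numbers before any median is taken.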
pay_by_college <- read.csv("salaries-by-college-type.csv")
#print(pay_by_college)

#names(pay_by_college)<-tolower(names(pay_by_college))##lowercased all variables
pay_by_college <- rename_with(pay_by_college, ~ tolower(gsub(".", "_", .x, fixed = TRUE)))
#pay_by_college


I want to know which Engineering school's graduates are paid the most, so I filtered for Engineering schools only and sorted them from highest to lowest pay.

#Keep the first four columns and convert the starting salary from text to a number
df1 <- pay_by_college[, c(1, 2, 3, 4)] %>%
  mutate(starting_median_salary = parse_number(as.character(starting_median_salary))) %>%
  arrange(desc(starting_median_salary))
only_en <- filter(df1, school_type == "Engineering")
p <- only_en %>%
  mutate(school_name = fct_reorder(school_name, starting_median_salary)) %>%
  ggplot(aes(x = school_name, y = starting_median_salary)) +
  geom_bar(stat = "identity", fill = "#2ECC71", alpha = .6, width = .4) +
  coord_flip() +
  xlab("") +
  theme_bw() +
  ggtitle("Engineering School's Newly Graduate Earnings") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
p


Among Engineering schools, the highest starting median salary belongs to California Institute of Technology (CIT), at $75,500.00. The lowest belongs to Tennessee Technological University, at $46,200.00.