We all want to know how our careers will grow. Some majors may not seem very interesting, and some start with meager pay, but things can change by mid-career. A year-long PayScale Inc. survey of 1.2 million people with only a bachelor’s degree shows that graduates in Philosophy and International Relations earned 103.5% and 97.8% more, respectively, about 10 years after graduation than they did when they started. (http://online.wsj.com/public/resources/documents/info-Degrees_that_Pay_you_Back-sort.html)
The datasets come from Kaggle:
https://www.kaggle.com/wsj/college-salaries?select=degrees-that-pay-back.csv
The first dataset, degrees-that-pay-back.csv, contains these variables:
Undergraduate Major: Categorical
Starting Median Salary: Quantitative
Mid-Career Median Salary: Quantitative
Percent change from Starting to Mid-Career Salary: Quantitative
Mid-Career 10th Percentile Salary: Quantitative
Mid-Career 25th Percentile Salary: Quantitative
Mid-Career Median: Quantitative
Mid-Career 75th Percentile Salary: Quantitative
Mid-Career 90th Percentile Salary: Quantitative
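The percent-change variable can be reproduced from the two medians. This is a minimal sketch, assuming the column is computed as (mid - start) / start * 100; `pct_change` is a hypothetical helper that is not part of the dataset or of any package, and the Religion figures are the ones quoted later in this report.

```r
# Hypothetical helper: percent change from starting to mid-career salary.
# Assumes the dataset's column uses (mid - start) / start * 100.
pct_change <- function(start, mid) {
  (mid - start) / start * 100
}

# Religion, using the medians quoted in this report
religion_change <- pct_change(start = 34100, mid = 52000)
round(religion_change, 1)  # 52.5
```

A value above 100 means the median salary more than doubled between graduation and mid-career.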
#install.packages("magrittr") # package installations are only needed the first time you use it
#install.packages("dplyr") # alternative installation of the %>%
library(tidyverse) #Loading libraries and packages into session
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(tmap)
## Registered S3 methods overwritten by 'stars':
## method from
## st_bbox.SpatRaster sf
## st_crs.SpatRaster sf
library(tmaptools)
library(leaflet)
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(leaflet.extras)
library(dplyr)
library(rio)
library(sp)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:rio':
##
## export
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
library(ggplot2)
library(ggthemes)
### Set path
setwd("C:/Users/gru_e/OneDrive/Desktop/DATA110/R Projects/Project2")
#Load in Dataset1: Degree that pays back
degree_df <- read.csv("degrees-that-pay-back.csv")
#degree_df
#CLEANING: Rename all variables
degree_df1 <- degree_df %>%
  rename(major = Undergraduate.Major,
         mdstart = Starting.Median.Salary,
         mdmid = Mid.Career.Median.Salary,
         start_to_mid_percent = Percent.change.from.Starting.to.Mid.Career.Salary,
         mid_career_10th = Mid.Career.10th.Percentile.Salary,
         mid_career_25th = Mid.Career.25th.Percentile.Salary,
         mid_career_75th = Mid.Career.75th.Percentile.Salary,
         mid_career_90th = Mid.Career.90th.Percentile.Salary)
#degree_df1
#degree_df1 %>% arrange(desc(mdstart))
#degree_df1 %>% arrange(desc(mdstart))
#degree_df1
gg <- ggplot(data = degree_df1, mapping = aes(x = major, y = mdstart)) +
  geom_bar(stat = "identity", fill = "#FF9999", colour = "black") +
  # y is treated as discrete; its tick labels and breaks are hidden
  scale_y_discrete(labels = NULL, breaks = NULL, guide = guide_axis(angle = 90)) +
  coord_flip() +
  ggtitle("Median New Graduate Earnings") +
  xlab("Undergraduate Degree") +
  ylab("Salary $USD") +
  theme(axis.text.y = element_text(angle = 25, size = 5))
gg
Median starting salary: the median of what graduates were earning right after graduation.
The two highest starting salaries are Physician Assistants ($74,300.00) and Chemical Engineering ($63,200.00).
The two lowest starting salaries are Religion ($34,100.00) and Spanish ($34,000.00).
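Salaries in this report are written like "$74,300.00". If the CSV stores them as text in that form (an assumption; the toy vector below is typed by hand, not read from the file), they need to be converted to numbers before they can be ranked or averaged reliably. A minimal sketch using `readr::parse_number()`; the six-figure value in the last two lines is invented purely to show the sorting problem.

```r
library(readr)  # provides parse_number()

# Toy values in the format quoted in this report; not read from the CSV
salary_text <- c("$74,300.00", "$63,200.00", "$34,100.00", "$34,000.00")

# Strip the currency symbol and thousands separators
salary_num <- parse_number(salary_text)
salary_num
max(salary_num)

# Why it matters: text is compared character by character, not by magnitude.
# "$107,000.00" is a made-up value used only for illustration.
sort(c("$98,600.00", "$107,000.00"))                # "$107,000.00" sorts first
sort(parse_number(c("$98,600.00", "$107,000.00")))  # 98600 sorts first
```

Once the columns are numeric, functions such as `mean()`, `max()`, and `arrange()` order the salaries by magnitude rather than by their characters.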
#degree_df2 needs the same renamed columns as degree_df1, so reuse it instead of renaming again
degree_df2 <- degree_df1
#degree_df2 %>% arrange(desc(mdmid))
gg <- ggplot(data = degree_df2, mapping = aes(x = major, y = mdmid)) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black") +
  # y is treated as discrete; its tick labels and breaks are hidden
  scale_y_discrete(labels = NULL, breaks = NULL, guide = guide_axis(angle = 90)) +
  coord_flip() +
  ggtitle("Median Mid-Career Earnings") +
  xlab("Undergraduate Degree") +
  ylab("Salary $USD") +
  theme(axis.text.y = element_text(angle = 25, size = 5))
gg
#ggplotly(gg)
The midcareer pay data is defined as the median salary for bachelor’s degree-holders with 10-plus years’ experience in the field. In general, the numbers aren’t too surprising — though some figures may raise eyebrows. (https://www.bizjournals.com/washington/news/2021/09/29/midcareer-salary-greater-washington-colleges.html#:~:text=The%20midcareer%20pay%20data%20is,some%20figures%20may%20raise%20eyebrows.)
The highest mid-career median is no longer Physician Assistants but Economics, at $98,600.00.
The lowest mid-career median is a tie between Education and Religion at $52,000.00.
theme_set(theme_minimal())
#Deselect start_to_mid_percent variable because we do not need it right now
newdd <- select(degree_df1, -start_to_mid_percent)
#newdd
#Make Wide data to Long data for stacked bar graph showing percentile
data_long1 <- gather(newdd, percentile, salary, mdmid:mid_career_90th,na.rm = TRUE)
#data_long1
#Reorder the factor levels so the legend runs from the 90th to the 10th percentile
data_long1$percentile <- factor(data_long1$percentile,
                                levels = c("mid_career_90th",
                                           "mid_career_75th",
                                           "mdmid",
                                           "mid_career_25th",
                                           "mid_career_10th"))
#Perform the graph showing Mid Career earnings
p <- ggplot(data_long1, aes(x = major, y = salary)) +
  geom_col(aes(fill = percentile), width = 0.7) +
  scale_y_discrete(labels = NULL, breaks = NULL, guide = guide_axis(angle = 90)) +
  coord_flip() +
  ggtitle("Mid-Career Earnings Percentile") +
  xlab("Undergraduate Degree") +
  ylab("Salary $USD") +
  theme(axis.text.y = element_text(angle = 25, size = 6)) +
  scale_fill_brewer(palette = "Dark2")
#p + theme(axis.text.x = element_text(angle = 45,size = , hjust = 1))
ggplotly(p)
#mdmid is not stored as a number, so parse it before averaging
xm <- mean(readr::parse_number(as.character(newdd$mdmid)))
The percentile salary estimate is the salary below which a certain percent of workers fall.
Mid-Career Percentiles: the 10th, 25th, 50th (median), 75th, and 90th percentiles of salaries 10 years into a career.
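To make the definition concrete, here is a small sketch that computes the same five percentiles with base R's `quantile()`. The salaries are made up for one hypothetical major and do not come from the dataset, which already ships the percentiles precomputed.

```r
# Made-up mid-career salaries for one hypothetical major (not from the dataset)
mid_salaries <- c(38000, 45000, 52000, 60000, 71000, 83000, 95000, 120000, 150000)

# 10th, 25th, 50th (median), 75th and 90th percentiles
pcts <- quantile(mid_salaries, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
pcts
```

Reading the result: the value labelled `90%` is the salary that roughly 90 percent of these workers earn less than, so it describes the top earners in the major rather than the typical one.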
My finding is that a major such as Spanish can start with lower pay than others. However, after more than ten years of experience, it appears to be among the highest-paid majors at the 90th percentile.
Science majors such as IT, Industrial Engineering, Mechanical Engineering, and Physics seem to have high median mid-career salaries. Therefore, we are in the right market.
On the other hand, people in Film, Business Management, and History seem to have low median mid-career salaries.
pay_by_college <- read.csv("salaries-by-college-type.csv")
#print(pay_by_college)
#names(pay_by_college)<-tolower(names(pay_by_college))##lowercased all variables
pay_by_college <- rename_with(pay_by_college, ~ tolower(gsub(".", "_", .x, fixed = TRUE)))
#pay_by_college
I want to know which Engineering school's graduates are paid the most, so I filter for Engineering schools only and sort them from highest to lowest pay.
#
df1 <- pay_by_college %>% arrange(desc(starting_median_salary))
df1 <- df1[, c(1, 2, 3, 4)]
only_en <- filter(df1, school_type == "Engineering" )
p <- df1 %>%
  filter(school_type == "Engineering") %>%
  # order the bars by starting salary
  mutate(school_name = fct_reorder(school_name, starting_median_salary)) %>%
  ggplot(aes(x = school_name, y = starting_median_salary)) +
  geom_bar(stat = "identity", fill = "#2ECC71", alpha = .6, width = .4) +
  coord_flip() +
  xlab("") +
  theme_bw() +
  ggtitle("Engineering Schools' New Graduate Earnings") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
p
Therefore, the highest-paying Engineering school is the California Institute of Technology (CIT), with a starting median salary of $75,500.00. The lowest-paying Engineering school is Tennessee Technological University at $46,200.00.
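The filter-and-sort step behind this result can be sketched on a small stand-in table. The `schools` data frame below is a toy example: only the two Engineering salaries come from this report, the third row is invented, and the salaries are assumed to be numeric already.

```r
library(dplyr)

# Toy stand-in for pay_by_college; the third row is invented for illustration
schools <- tibble(
  school_name = c("California Institute of Technology (CIT)",
                  "Tennessee Technological University",
                  "Hypothetical Liberal Arts College"),
  school_type = c("Engineering", "Engineering", "Liberal Arts"),
  starting_median_salary = c(75500, 46200, 50000)
)

# Keep Engineering schools and sort from highest to lowest starting salary
engineering <- schools %>%
  filter(school_type == "Engineering") %>%
  arrange(desc(starting_median_salary))

slice_head(engineering, n = 1)  # highest-paying Engineering school
slice_tail(engineering, n = 1)  # lowest-paying Engineering school
```

Sorting once and taking the first and last rows gives both extremes from the same table, which keeps the two figures consistent with each other.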