Project 2

Author

Efren Martinez

Where to Find the Best Salaries: States and Universities

Money for Degree Image

Source: https://www.timeshighereducation.com

Introduction

Salaries and Universities

My data sets focus on salaries and universities, enabling comparisons of starting and mid-career salaries by school type and region. I downloaded them from Kaggle; they were originally sourced from Wall Street Journal HTML pages, which cite PayScale Inc. They suggest that the type and region of college selected can significantly affect salary outcomes. The data covers salaries by college and region, salaries by college type, and salaries by degree type. The college data sets list specific colleges and universities by region and type, providing the median starting salary, the median mid-career salary, and mid-career percentile salaries for each institution.

I cleaned the data with simple steps, such as converting column titles to lowercase and using gsub to make them uniform, and with more involved steps, such as using mutate to convert salary columns from text to numeric values so they plot correctly.

This topic is important to me as a student because the choice of school can significantly affect one’s financial future, including the ability to repay student loans, so it matters for making the right investment in education. This information could also help government agencies or private firms understand the economic impact of student loans and the job market for graduates. I plan to create visuals to help future students make informed decisions when choosing a college or university. I suspect Ivy League schools and schools in more populated cities, particularly in the Northeastern region and California, will have higher starting and mid-career salaries. This data could also help a student from a small Midwestern town make a decision without feeling compelled to move to the coast for better opportunities.

Load the libraries and set the working directory to read CSV files

library(tidyverse)
library(tidyr)
library(leaflet)
library(viridis)
library(RColorBrewer)
library(ggthemes)
setwd("C:/Users/Marti/OneDrive/Desktop/MC-DV")

degrees_that_pay_back <- read_csv("degrees_that_pay_back.csv")
salaries_by_region <- read_csv("salaries_by_region.csv")
salaries_by_college_type <- read_csv("salaries_by_college_type.csv")

Preview the Data

head(degrees_that_pay_back)
# A tibble: 6 × 8
  `Undergraduate Major` `Starting Median Salary` `Mid-Career Median Salary`
  <chr>                 <chr>                    <chr>                     
1 Accounting            $46,000.00               $77,100.00                
2 Aerospace Engineering $57,700.00               $101,000.00               
3 Agriculture           $42,600.00               $71,900.00                
4 Anthropology          $36,800.00               $61,500.00                
5 Architecture          $41,600.00               $76,800.00                
6 Art History           $35,800.00               $64,900.00                
# ℹ 5 more variables:
#   `Percent change from Starting to Mid-Career Salary` <dbl>,
#   `Mid-Career 10th Percentile Salary` <chr>,
#   `Mid-Career 25th Percentile Salary` <chr>,
#   `Mid-Career 75th Percentile Salary` <chr>,
#   `Mid-Career 90th Percentile Salary` <chr>
head(salaries_by_college_type)
# A tibble: 6 × 8
  `School Name`      `School Type` Starting Median Sala…¹ Mid-Career Median Sa…²
  <chr>              <chr>         <chr>                  <chr>                 
1 Massachusetts Ins… Engineering   $72,200.00             $126,000.00           
2 California Instit… Engineering   $75,500.00             $123,000.00           
3 Harvey Mudd Colle… Engineering   $71,800.00             $122,000.00           
4 Polytechnic Unive… Engineering   $62,400.00             $114,000.00           
5 Cooper Union       Engineering   $62,200.00             $114,000.00           
6 Worcester Polytec… Engineering   $61,000.00             $114,000.00           
# ℹ abbreviated names: ¹​`Starting Median Salary`, ²​`Mid-Career Median Salary`
# ℹ 4 more variables: `Mid-Career 10th Percentile Salary` <chr>,
#   `Mid-Career 25th Percentile Salary` <chr>,
#   `Mid-Career 75th Percentile Salary` <chr>,
#   `Mid-Career 90th Percentile Salary` <chr>
head(salaries_by_region)
# A tibble: 6 × 8
  `School Name`             Region Starting Median Sala…¹ Mid-Career Median Sa…²
  <chr>                     <chr>  <chr>                  <chr>                 
1 Stanford University       Calif… $70,400.00             $129,000.00           
2 California Institute of … Calif… $75,500.00             $123,000.00           
3 Harvey Mudd College       Calif… $71,800.00             $122,000.00           
4 University of California… Calif… $59,900.00             $112,000.00           
5 Occidental College        Calif… $51,900.00             $105,000.00           
6 Cal Poly San Luis Obispo  Calif… $57,200.00             $101,000.00           
# ℹ abbreviated names: ¹​`Starting Median Salary`, ²​`Mid-Career Median Salary`
# ℹ 4 more variables: `Mid-Career 10th Percentile Salary` <chr>,
#   `Mid-Career 25th Percentile Salary` <chr>,
#   `Mid-Career 75th Percentile Salary` <chr>,
#   `Mid-Career 90th Percentile Salary` <chr>

Clean up the data!

Converting the headers to lowercase and joining separated words with underscores makes the column names uniform and easier to find later.

names(salaries_by_region) <- tolower(names(salaries_by_region))
names(salaries_by_region) <- gsub(" ","_",names(salaries_by_region))
# gsub will remove spaces in between words in the headers and replace them with underscore
head(salaries_by_region)
# A tibble: 6 × 8
  school_name               region starting_median_salary mid-career_median_sa…¹
  <chr>                     <chr>  <chr>                  <chr>                 
1 Stanford University       Calif… $70,400.00             $129,000.00           
2 California Institute of … Calif… $75,500.00             $123,000.00           
3 Harvey Mudd College       Calif… $71,800.00             $122,000.00           
4 University of California… Calif… $59,900.00             $112,000.00           
5 Occidental College        Calif… $51,900.00             $105,000.00           
6 Cal Poly San Luis Obispo  Calif… $57,200.00             $101,000.00           
# ℹ abbreviated name: ¹​`mid-career_median_salary`
# ℹ 4 more variables: `mid-career_10th_percentile_salary` <chr>,
#   `mid-career_25th_percentile_salary` <chr>,
#   `mid-career_75th_percentile_salary` <chr>,
#   `mid-career_90th_percentile_salary` <chr>
names(salaries_by_college_type) <- tolower(names(salaries_by_college_type))
names(salaries_by_college_type) <- gsub(" ","_",names(salaries_by_college_type))
# gsub will remove spaces in between words in the headers and replace them with underscore
head(salaries_by_college_type)
# A tibble: 6 × 8
  school_name          school_type starting_median_salary mid-career_median_sa…¹
  <chr>                <chr>       <chr>                  <chr>                 
1 Massachusetts Insti… Engineering $72,200.00             $126,000.00           
2 California Institut… Engineering $75,500.00             $123,000.00           
3 Harvey Mudd College  Engineering $71,800.00             $122,000.00           
4 Polytechnic Univers… Engineering $62,400.00             $114,000.00           
5 Cooper Union         Engineering $62,200.00             $114,000.00           
6 Worcester Polytechn… Engineering $61,000.00             $114,000.00           
# ℹ abbreviated name: ¹​`mid-career_median_salary`
# ℹ 4 more variables: `mid-career_10th_percentile_salary` <chr>,
#   `mid-career_25th_percentile_salary` <chr>,
#   `mid-career_75th_percentile_salary` <chr>,
#   `mid-career_90th_percentile_salary` <chr>
names(degrees_that_pay_back) <- tolower(names(degrees_that_pay_back))
names(degrees_that_pay_back) <- gsub(" ","_",names(degrees_that_pay_back))
# gsub will remove spaces in between words in the headers and replace them with underscore
head(degrees_that_pay_back)
# A tibble: 6 × 8
  undergraduate_major   starting_median_salary `mid-career_median_salary`
  <chr>                 <chr>                  <chr>                     
1 Accounting            $46,000.00             $77,100.00                
2 Aerospace Engineering $57,700.00             $101,000.00               
3 Agriculture           $42,600.00             $71,900.00                
4 Anthropology          $36,800.00             $61,500.00                
5 Architecture          $41,600.00             $76,800.00                
6 Art History           $35,800.00             $64,900.00                
# ℹ 5 more variables:
#   `percent_change_from_starting_to_mid-career_salary` <dbl>,
#   `mid-career_10th_percentile_salary` <chr>,
#   `mid-career_25th_percentile_salary` <chr>,
#   `mid-career_75th_percentile_salary` <chr>,
#   `mid-career_90th_percentile_salary` <chr>
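The same two renaming steps are repeated for all three data sets, and the hyphen in `mid-career` still forces backticks around those column names. A minimal sketch of a reusable helper follows. The names `clean_header_names` and `demo` are hypothetical, and replacing hyphens is an extra step that was not applied in the cleaning above.

```r
# Hypothetical helper: lowercase the headers, then turn spaces and hyphens into underscores
clean_header_names <- function(df) {
  names(df) <- gsub("[ -]+", "_", tolower(names(df)))
  df
}

# Tiny illustrative data frame using two of the original header styles
demo <- data.frame(
  `School Name` = "Harvey Mudd College",
  `Mid-Career Median Salary` = "$122,000.00",
  check.names = FALSE
)

demo_clean <- clean_header_names(demo)
names(demo_clean)
# "school_name" "mid_career_median_salary"
```

If this helper were used, later code would refer to `mid_career_median_salary` without backticks, so the column names in the rest of the report would need to change to match.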

Now we will join the two data sets that share school names, Salaries by Region and Salaries by College Type. An inner join keeps only the schools that appear in both.

combined_college_type_region <- salaries_by_region |>
  inner_join(salaries_by_college_type, by = "school_name")

head(combined_college_type_region )
# A tibble: 6 × 15
  school_name               region starting_median_sala…¹ mid-career_median_sa…²
  <chr>                     <chr>  <chr>                  <chr>                 
1 California Institute of … Calif… $75,500.00             $123,000.00           
2 Harvey Mudd College       Calif… $71,800.00             $122,000.00           
3 University of California… Calif… $59,900.00             $112,000.00           
4 Occidental College        Calif… $51,900.00             $105,000.00           
5 Cal Poly San Luis Obispo  Calif… $57,200.00             $101,000.00           
6 University of California… Calif… $52,600.00             $101,000.00           
# ℹ abbreviated names: ¹​starting_median_salary.x, ²​`mid-career_median_salary.x`
# ℹ 11 more variables: `mid-career_10th_percentile_salary.x` <chr>,
#   `mid-career_25th_percentile_salary.x` <chr>,
#   `mid-career_75th_percentile_salary.x` <chr>,
#   `mid-career_90th_percentile_salary.x` <chr>, school_type <chr>,
#   starting_median_salary.y <chr>, `mid-career_median_salary.y` <chr>,
#   `mid-career_10th_percentile_salary.y` <chr>, …
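An inner join silently drops schools that are missing from either table (Stanford University heads the region table but is absent from the joined preview), and it repeats a school that is listed under more than one school type. The sketch below shows one way to check for both cases. The two small tables are illustrative stand-ins, and "Example State University" is a made-up name used only to show a repeated school.

```r
library(dplyr)

# Illustrative stand-ins for the two cleaned tables (only the columns needed here)
region_demo <- tibble(
  school_name = c("Stanford University", "Harvey Mudd College", "Example State University"),
  region      = c("California", "California", "Midwestern")
)
type_demo <- tibble(
  school_name = c("Harvey Mudd College", "Example State University", "Example State University"),
  school_type = c("Engineering", "Party", "State")
)

# Schools in the region table that the inner join would drop
dropped_schools <- anti_join(region_demo, type_demo, by = "school_name")

# Schools that appear more than once after the join (listed under several types)
repeated_schools <- region_demo |>
  inner_join(type_demo, by = "school_name") |>
  count(school_name) |>
  filter(n > 1)

dropped_schools
repeated_schools
```

Running the same two checks on the real tables would show how many schools are lost in the join and which ones are counted twice.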

As the Professor reminded us, we should remove the $ signs (and commas) so the salary values can be treated as numbers.

I’ll do this to the .x column only. When I joined the data, columns that exist in both tables were given .x and .y suffixes and hold the same values, so this leaves me with one numeric copy (.x) and one character copy (.y).

combined_college_type_region2 <- combined_college_type_region |>
  mutate(starting_median_salary.x = as.numeric(gsub("[\\$,]", "", starting_median_salary.x))) 
head(combined_college_type_region2)
# A tibble: 6 × 15
  school_name               region starting_median_sala…¹ mid-career_median_sa…²
  <chr>                     <chr>                   <dbl> <chr>                 
1 California Institute of … Calif…                  75500 $123,000.00           
2 Harvey Mudd College       Calif…                  71800 $122,000.00           
3 University of California… Calif…                  59900 $112,000.00           
4 Occidental College        Calif…                  51900 $105,000.00           
5 Cal Poly San Luis Obispo  Calif…                  57200 $101,000.00           
6 University of California… Calif…                  52600 $101,000.00           
# ℹ abbreviated names: ¹​starting_median_salary.x, ²​`mid-career_median_salary.x`
# ℹ 11 more variables: `mid-career_10th_percentile_salary.x` <chr>,
#   `mid-career_25th_percentile_salary.x` <chr>,
#   `mid-career_75th_percentile_salary.x` <chr>,
#   `mid-career_90th_percentile_salary.x` <chr>, school_type <chr>,
#   starting_median_salary.y <chr>, `mid-career_median_salary.y` <chr>,
#   `mid-career_10th_percentile_salary.y` <chr>, …
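Only the starting salary column is converted above, so the mid-career and percentile columns remain text. As a sketch of how every salary column could be converted in one step, the example below uses `across()` from dplyr. The helper name `parse_dollars` and the small `salary_demo` table are assumptions for illustration, with values copied from the previews above.

```r
library(dplyr)

# Hypothetical helper: strip "$" and "," and convert the result to numeric
parse_dollars <- function(x) as.numeric(gsub("[$,]", "", x))

salary_demo <- tibble(
  school_name = c("Harvey Mudd College", "Occidental College"),
  starting_median_salary = c("$71,800.00", "$51,900.00"),
  `mid-career_median_salary` = c("$122,000.00", "$105,000.00")
)

# Apply the helper to every column whose name contains "salary"
salary_demo_numeric <- salary_demo |>
  mutate(across(contains("salary"), parse_dollars))

salary_demo_numeric$starting_median_salary
# 71800 51900
```

Applying this before the join, rather than after, would avoid carrying both a character and a numeric copy of the same column.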

Filter for top ten paying schools

top_10_paying_schools <- combined_college_type_region2 |>
  arrange(desc(`starting_median_salary.x`)) |>
  slice(1:10)
head(top_10_paying_schools)
# A tibble: 6 × 15
  school_name               region starting_median_sala…¹ mid-career_median_sa…²
  <chr>                     <chr>                   <dbl> <chr>                 
1 California Institute of … Calif…                  75500 $123,000.00           
2 Massachusetts Institute … North…                  72200 $126,000.00           
3 Harvey Mudd College       Calif…                  71800 $122,000.00           
4 Princeton University      North…                  66500 $131,000.00           
5 Harvard University        North…                  63400 $124,000.00           
6 Polytechnic University o… North…                  62400 $114,000.00           
# ℹ abbreviated names: ¹​starting_median_salary.x, ²​`mid-career_median_salary.x`
# ℹ 11 more variables: `mid-career_10th_percentile_salary.x` <chr>,
#   `mid-career_25th_percentile_salary.x` <chr>,
#   `mid-career_75th_percentile_salary.x` <chr>,
#   `mid-career_90th_percentile_salary.x` <chr>, school_type <chr>,
#   starting_median_salary.y <chr>, `mid-career_median_salary.y` <chr>,
#   `mid-career_10th_percentile_salary.y` <chr>, …
summary(top_10_paying_schools$starting_median_salary.x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  61000   61900   62900   65790   70475   75500 
# Note: starting_median_salary.y is still a character column, so ggplot treats the bar heights as discrete labels
ggplot(top_10_paying_schools, aes(x = school_name, y = starting_median_salary.y, fill= school_type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), linewidth = 1.5) + 
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "School Name",
       y = "Starting Median Salary ($)",
       fill = "School Type") +
  scale_fill_brewer(palette = "Set1") +
   theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6),
        plot.title = element_text(hjust = 0.5))  

This is a good visual, but to clean it up using the numeric column, let’s switch to geom_point.

ggplot(top_10_paying_schools, aes(x = school_name, y = starting_median_salary.x, color = school_type)) +
  geom_point(size = 6) +
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "School Name",
       y = "Starting Median Salary ($)",
       color = "School Type") +
  scale_color_brewer(palette = "Set1") +
  scale_y_continuous(limits = c(60000, 75500), breaks = seq(60000, 75000, by = 5000)) + 
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6),
        plot.title = element_text(hjust = 0.5))

The school names are very small and hard to read, and the region is not shown, so let’s try to improve both plots, one at a time.

ggplot(top_10_paying_schools, aes(x = school_name, y = starting_median_salary.y, fill = school_type, color = region)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), linewidth = 1.5) + 
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "School Name",
       y = "Starting Median Salary ($)",
       fill = "School Type",
       color = "Region") +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Accent") +  
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1, 
                                   vjust = 0.5, 
                                   size = 10),  
        axis.text.y = element_text(size = 8),
        plot.title = element_text(hjust = 0.5)) +
  coord_flip()

Also flipping for our geom_point.

ggplot(top_10_paying_schools, aes(x = school_name, y = starting_median_salary.x, fill = school_type, color = region)) +
 geom_point(size = 6, shape = 21, stroke = 2) +
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "School Name",
       y = "Starting Median Salary ($)",
       fill = "School Type",
       color = "Region") +
  scale_fill_brewer(palette = "Set3" ) +
  scale_color_brewer(palette = "Dark2") +  
  scale_y_continuous(limits = c(60000, 75500), breaks = seq(60000, 75500, by = 5000)) + 
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1, 
                                   vjust = 0.5, 
                                   size = 10),  
        axis.text.y = element_text(size = 8),
        plot.title = element_text(hjust = 0.5)) +
  coord_flip()
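By default ggplot orders a text axis alphabetically, so the schools in these charts are not sorted by salary. One possible refinement, sketched below on a three-row illustrative table with values from the previews above, is to convert `school_name` into a factor ordered by salary with base R's `reorder()`. The table names `top_schools_demo` and `top_schools_ordered` are hypothetical.

```r
library(dplyr)

top_schools_demo <- tibble(
  school_name = c("Harvey Mudd College", "Princeton University", "Harvard University"),
  starting_median_salary.x = c(71800, 66500, 63400)
)

# reorder() turns school_name into a factor whose levels follow salary, lowest first
top_schools_ordered <- top_schools_demo |>
  mutate(school_name = reorder(school_name, starting_median_salary.x))

levels(top_schools_ordered$school_name)
# "Harvard University" "Princeton University" "Harvey Mudd College"
```

Passing a table prepared this way to the same plotting code with coord_flip() should place the highest-paying school at the top, since the last factor level is drawn highest on a flipped axis.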

These two plots were very interesting because they ran counter to my initial assumption that Ivy League schools would top all others. As prestigious and selective as those schools may be, that does not seem to translate into higher median starting salaries after graduation compared with engineering schools. I was correct in my other assumption that the Northeastern and California regions would have the best salaries, which I had based on the cost of living and job markets in larger cities. California Institute of Technology, an engineering school, tops the chart; however, only two of the top ten schools are in California, with the rest in the Northeastern region. Similarly, only two of the top ten are Ivy League schools, which are beaten out by the engineering school type.

Midwest Roots

Being from the Midwest, I’d also like to make a similar visual just for Midwestern schools.

Filter for Midwestern schools

midwestern_schools <- combined_college_type_region2 |>
 filter(region == "Midwestern", school_type != "Party")
head(midwestern_schools)
# A tibble: 6 × 15
  school_name               region starting_median_sala…¹ mid-career_median_sa…²
  <chr>                     <chr>                   <dbl> <chr>                 
1 Carleton College          Midwe…                  47500 $103,000.00           
2 Illinois Institute of Te… Midwe…                  56000 $97,800.00            
3 University of Illinois a… Midwe…                  52900 $96,100.00            
4 University of Missouri -… Midwe…                  57100 $95,800.00            
5 South Dakota School of M… Midwe…                  55800 $93,400.00            
6 University of Michigan    Midwe…                  52700 $93,000.00            
# ℹ abbreviated names: ¹​starting_median_salary.x, ²​`mid-career_median_salary.x`
# ℹ 11 more variables: `mid-career_10th_percentile_salary.x` <chr>,
#   `mid-career_25th_percentile_salary.x` <chr>,
#   `mid-career_75th_percentile_salary.x` <chr>,
#   `mid-career_90th_percentile_salary.x` <chr>, school_type <chr>,
#   starting_median_salary.y <chr>, `mid-career_median_salary.y` <chr>,
#   `mid-career_10th_percentile_salary.y` <chr>, …

Now I’ll find the top ten paying schools out of the Midwestern data.

Filter for top ten paying schools

top_10_paying_midwestern_schools <- midwestern_schools|>
  arrange(desc(`starting_median_salary.x`)) |>
  slice(1:10)
head(top_10_paying_midwestern_schools )
# A tibble: 6 × 15
  school_name               region starting_median_sala…¹ mid-career_median_sa…²
  <chr>                     <chr>                   <dbl> <chr>                 
1 University of Missouri -… Midwe…                  57100 $95,800.00            
2 Illinois Institute of Te… Midwe…                  56000 $97,800.00            
3 South Dakota School of M… Midwe…                  55800 $93,400.00            
4 University of Illinois a… Midwe…                  52900 $96,100.00            
5 University of Michigan    Midwe…                  52700 $93,000.00            
6 Purdue University         Midwe…                  51400 $90,500.00            
# ℹ abbreviated names: ¹​starting_median_salary.x, ²​`mid-career_median_salary.x`
# ℹ 11 more variables: `mid-career_10th_percentile_salary.x` <chr>,
#   `mid-career_25th_percentile_salary.x` <chr>,
#   `mid-career_75th_percentile_salary.x` <chr>,
#   `mid-career_90th_percentile_salary.x` <chr>, school_type <chr>,
#   starting_median_salary.y <chr>, `mid-career_median_salary.y` <chr>,
#   `mid-career_10th_percentile_salary.y` <chr>, …
ggplot(top_10_paying_midwestern_schools , aes(x = school_name, y = starting_median_salary.y, fill= school_type)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "College",
       y = "Starting Median Salary ($)",
       fill = "School Type") +
  scale_fill_brewer(palette = "Set3") +
   theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
        plot.title = element_text(hjust = 0.5))  # Center the plot title

(Because one university was categorized as both a party school and a state school, it appeared twice, so I went back and filtered out party schools.)

ggplot(top_10_paying_midwestern_schools , aes(x = school_name, y = starting_median_salary.x, fill= school_type)) +
  geom_point(size = 6, shape = 21, stroke = 1.5) +
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "College",
       y = "Starting Median Salary ($)",
       fill = "School Type") +
  scale_fill_brewer(palette = "Set3") +
  scale_y_continuous(limits = c(45000, 60000), breaks = seq(45000, 60000, by = 5000)) +    
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
        plot.title = element_text(hjust = 0.5))  # Center the plot title

ggplot(top_10_paying_midwestern_schools, aes(x = school_name, y = starting_median_salary.y, fill = school_type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), linewidth = 1.5) + 
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "School Name",
       y = "Starting Median Salary ($)",
       fill = "School Type") +
  scale_fill_brewer(palette = "Set3") + 
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1, 
                                   vjust = 0.5, 
                                   size = 10),  
        axis.text.y = element_text(size = 8),
        plot.title = element_text(hjust = 0.5)) +
  coord_flip()

The same change for the numeric-column plot of starting median salary:

ggplot(top_10_paying_midwestern_schools, aes(x = school_name, y = starting_median_salary.x, fill = school_type)) +
   geom_point(size = 6, shape = 21, stroke = 1.5) +
  labs(title = "Top 10 Paying Schools by Starting Median Salary",
       x = "School Name",
       y = "Starting Median Salary ($)",
       fill = "School Type") +
  scale_fill_brewer(palette = "Set3") + 
  scale_y_continuous(limits = c(45000, 60000), breaks = seq(45000, 60000, by = 5000)) +  
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1, 
                                   vjust = 0.5, 
                                   size = 10),  
        axis.text.y = element_text(size = 8),
        plot.title = element_text(hjust = 0.5)) +
  coord_flip()

Breakdowns like these could be produced in many ways, letting different students view the regions or school types they are interested in before deciding where to apply.
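Since the filter, sort, and slice steps are the same for any region, they could be wrapped in a small function. This is a minimal sketch that assumes the column names used above; `top_paying_schools` and `schools_demo` are hypothetical names, and the four rows are copied from the previews earlier in the report.

```r
library(dplyr)

# Hypothetical helper: top n schools by starting salary within one region
top_paying_schools <- function(df, region_name, n = 10) {
  df |>
    filter(region == region_name) |>
    slice_max(starting_median_salary.x, n = n, with_ties = FALSE)
}

schools_demo <- tibble(
  school_name = c("Carleton College", "University of Michigan",
                  "Purdue University", "Harvey Mudd College"),
  region = c("Midwestern", "Midwestern", "Midwestern", "California"),
  starting_median_salary.x = c(47500, 52700, 51400, 71800)
)

midwest_top_2 <- top_paying_schools(schools_demo, "Midwestern", n = 2)
midwest_top_2$school_name
# "University of Michigan" "Purdue University"
```

With `with_ties = FALSE` the result never exceeds n rows, which keeps a "top ten" chart at exactly ten schools even when salaries are tied.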

School Type Median Starting Salaries

To see whether there are any trends by school type, let’s find the average of the starting median salaries for each school type.

Let’s group by school type

avg_starting_salary_school_type <- combined_college_type_region2 |>
  group_by(school_type) |>
  summarize(avg_starting_salary = mean(starting_median_salary.x, na.rm =TRUE))
head(avg_starting_salary_school_type)
# A tibble: 5 × 2
  school_type  avg_starting_salary
  <chr>                      <dbl>
1 Engineering               59411.
2 Ivy League                60475 
3 Liberal Arts              45747.
4 Party                     45715 
5 State                     44126.
summary(avg_starting_salary_school_type)
 school_type        avg_starting_salary
 Length:5           Min.   :44126      
 Class :character   1st Qu.:45715      
 Mode  :character   Median :45747      
                    Mean   :51095      
                    3rd Qu.:59411      
                    Max.   :60475      
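The summary above averages each school's median starting salary, so a few very high-paying schools can pull a group's figure upward, and groups differ in size. A sketch that reports the number of schools, the mean, and the median side by side is shown below. It runs on a five-row illustrative table built from schools in the previews above (three engineering schools and two Ivy League schools), so its numbers are not the full-data results.

```r
library(dplyr)

school_type_demo <- tibble(
  school_type = c("Engineering", "Engineering", "Engineering", "Ivy League", "Ivy League"),
  starting_median_salary.x = c(72200, 75500, 71800, 66500, 63400)
)

# Count, mean, and median of the starting median salary for each school type
school_type_summary <- school_type_demo |>
  group_by(school_type) |>
  summarize(
    n_schools = n(),
    mean_starting_salary = mean(starting_median_salary.x, na.rm = TRUE),
    median_starting_salary = median(starting_median_salary.x, na.rm = TRUE)
  )

school_type_summary
```

Showing `n_schools` alongside the averages makes it easier to judge how much weight a group figure deserves.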

Now to plot:

ggplot(avg_starting_salary_school_type, aes(x = school_type, y = avg_starting_salary, fill = school_type)) +
  geom_point(size = 6, shape = 21, stroke = 1.5) +
  labs(title = "Average Starting Median Salary by School Type",
       x = "",
       y = "Average  Starting Median Salary",
       fill = "School Type") +
    scale_fill_brewer(palette = "Set3") + 
    scale_y_continuous(limits = c(42500, 65000), breaks = seq(42500, 65000, by = 5000)) +  
  theme_stata()

In this graph we see the average across all school types: Ivy League does top the other school types, with Engineering a close second. While this is somewhat useful, I think it would be better to have an interactive chart where a student can view, filter, and highlight these different aspects to see the top-paying schools.

Now I will explore this data set in Tableau.

I will attempt to make a visual that shows each university with its school type and region, sorted by median starting salary. Because the sheet is interactive, it can display all universities rather than being cut down to 10 for space. I will also color-code it by school type. While developing it, I realized it would be useful to plot each university on a map. To achieve this, I created an Excel spreadsheet with the latitude and longitude of every university I had data for and saved it as a CSV.

university_long_lat <- read_csv("Universities_Coordinates.csv")
Rows: 321 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): University
dbl (2): Latitude, Longitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(university_long_lat)
# A tibble: 6 × 3
  University                           Latitude Longitude
  <chr>                                   <dbl>     <dbl>
1 Adelphi University                       40.7     -73.7
2 American University, Washington D.C.     38.9     -77.1
3 Amherst College                          42.4     -72.5
4 Appalachian State University             36.2     -81.7
5 Arizona State University (ASU)           33.4    -112. 
6 Arkansas State University (ASU)          35.8     -90.7
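Because the coordinates were entered by hand, school names in this file may not match the salary tables exactly. The sketch below shows one way to list schools that would be left without coordinates before mapping. The coordinates are the rounded values printed above, and the salary-side names in `salary_names_demo` are assumed for illustration only.

```r
library(dplyr)

# Rounded coordinates as printed above; the real file holds more decimal places
coords_demo <- tibble(
  University = c("Adelphi University", "Amherst College"),
  Latitude   = c(40.7, 42.4),
  Longitude  = c(-73.7, -72.5)
)

# Illustrative salary-side names (assumed, not taken from the salary tables)
salary_names_demo <- tibble(
  school_name = c("Amherst College", "Harvey Mudd College")
)

# Schools with no matching row in the coordinates file
missing_coords <- anti_join(salary_names_demo, coords_demo,
                            by = c("school_name" = "University"))

# Schools that can be placed on a map
with_coords <- inner_join(salary_names_demo, coords_demo,
                          by = c("school_name" = "University"))

missing_coords$school_name
# "Harvey Mudd College"
```

An empty `missing_coords` on the real data would confirm that every school in the salary tables can be plotted.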

I’ll continue in Tableau and make a map of the universities from the same data, using color for region, shape for school type, and size for starting salary. I used tooltips to include each school’s mid-career salary amounts.

https://public.tableau.com/app/profile/efren.martinez5743/viz/Project2-CollegeSalaries/Dashboard1?publish=yes

Tableau Public - Efren Martinez Project 2 College Salaries

I downloaded these data sets from a Kaggle page that lacks a README file but cites Wall Street Journal articles as the source of the data, reformatted into CSV files. The articles do not show a publication date, but they name PayScale Inc. as the original data source. Although PayScale’s website has newer and more robust data, it is not available for CSV download. PayScale’s College Salary Report Methodology details how the data is collected, primarily through an ongoing online compensation survey covering the U.S. It notes that data inclusion or exclusion is not based on school quality, typical graduate earnings, selectivity, or location within the U.S.

Using this data, I created a dashboard that could be valuable for prospective students. It allows them to filter, review, and select different data categories to explore potential future career outcomes. For instance, Engineering and Ivy League schools have the highest starting salaries by school type, while the California and Northeastern regions show the highest median starting salaries. This insight could influence high-performing students to consider these locations regardless of their origins. The dashboard also allows users to filter for more selective or local options, making it useful for a wide range of audiences comparing their choices.

One limitation is that I could not include the most recent 2023 survey data, which would have allowed for interesting comparisons and trend analysis across different categories and regions.

Bibliography

  1. Wall Street Journal. “Degrees That Pay You Back.” Wall Street Journal, n.d., https://www.wsj.com/public/resources/documents/info-Degrees_that_Pay_you_Back-sort.html.

  2. Wall Street Journal. “Salaries for Colleges by Region.” Wall Street Journal, n.d., https://www.wsj.com/public/resources/documents/info-Salaries_for_Colleges_by_Region-sort.html.

  3. Wall Street Journal. “Salaries for Colleges by Type.” Wall Street Journal, n.d., https://www.wsj.com/public/resources/documents/info-Salaries_for_Colleges_by_Type-sort.html#top.

  4. Payscale. “College Salary Report Methodology.” Payscale, n.d., https://www.payscale.com/college-salary-report/methodology.