Education and net worth graph, compares how much money you will make on average per education level

Matthew M

Introduction

This presentation explains how education level impacts income, savings, and net worth over time using a simulated dataset in R.

People in the training population have several variables:

  • A randomly assigned education, with factors such as:
    • Student loans
    • Costs of living
    • Age they start working
  • Age
    • Determines work experience
    • Determines if person is still in school
  • Net worth (Calculated based on these factors over time)

Required libraries

These are commented out to avoid slide errors if the package is already installed.

# if (!require(ggplot2)) install.packages("ggplot2")
# if (!require(dplyr)) install.packages("dplyr")

library(ggplot2)
library(dplyr)

Data setup

set.seed(42)

N <- 50000
EducationLevels <- c("High School", "Associate", "Bachelor", "Master", "PhD", "Professional")

Df <- data.frame(
  ID = 1:N,
  Age = sample(18:70, N, TRUE),
  Education = sample(EducationLevels, N, TRUE, prob = c(0.15, 0.25, 0.25, 0.15, 0.10, 0.10))
)

This code creates 50,000 instances as a training population. I tested with a testing population in a separate notebook and got 96% accuracy. When I asked ChatGPT to cross-reference this with studies and surveys, I got around 85% accuracy.

School end age and employment

EducationAges <- c("High School" = 18, "Associate" = 20, "Bachelor" = 22, 
                   "Master" = 24, "PhD" = 28, "Professional" = 30)

Df <- Df %>%
  mutate(
    SchoolEndAge = EducationAges[Education],
    EmploymentStatus = dplyr::case_when(
      Age < SchoolEndAge ~ "Student",
      Age <= 21 ~ "Part-time",
      Age < 55 ~ "Full-time",
      TRUE ~ "Retired"
    ),
    WorkExperience = pmax(0, Age - SchoolEndAge)
  )

This code determines when a person starts and stops working and how long they’ve been working (used later for raises).

Income generation function

GenerateIncome <- function(Edu, Exp, Status) {
  if (Status != "Full-time") return(0)
  BaseSalaries <- c("High School" = 770 * 52, "Associate" = 900 * 52, "Bachelor" = 1300 * 52,
                    "Master" = 1500 * 52, "PhD" = 1800 * 52, "Professional" = 2100 * 52)
  ExpBumps <- c("High School" = 700, "Associate" = 800, "Bachelor" = 900,
                "Master" = 1000, "PhD" = 1200, "Professional" = 1500)
  Sds <- c("High School" = 9000, "Associate" = 11000, "Bachelor" = 13000,
           "Master" = 17000, "PhD" = 22000, "Professional" = 10000)
  
  rnorm(1, BaseSalaries[Edu] + Exp * ExpBumps[Edu], Sds[Edu])
}

Base income straight after their education with no experience (higher with better educations) then raises based off of work experience (randomized) and then because of the random element in it I added a sds function to remove outliers.

Apply income and trim outliers

Df$Income <- mapply(GenerateIncome, Df$Education, Df$WorkExperience, Df$EmploymentStatus)
Df <- Df[abs(scale(Df$Income)) < 3, ]

This applies the income model and removes outliers using the Z-score method:

  • z = (x - μ) / σ
  • Keep only values where |z| < 3

Annual savings

CostOfLiving <- c("High School" = 40000, "Associate" = 45000, 
                  "Bachelor" = 50000, "Master" = 55000, 
                  "PhD" = 55000, "Professional" = 55000)

Df <- Df %>%
  mutate(
    AnnualSavings = pmax(Income - CostOfLiving[Education], 0)
  )

The code subtracts the cost of living and then uses it to calculate annual savings.

Student loans

StudentLoans <- data.frame(
  Education = EducationLevels,
  TotalDebt = c(0, 0, 30000, 45000, 55000, 90000),
  MonthlyPayment = c(0, 0, 300, 450, 450, 700),
  LoanYears = c(0, 0, 10, 10, 12, 15)
)

Df <- Df %>%
  left_join(StudentLoans, by = "Education") %>%
  mutate(
    YearsSinceSchool = pmax(0, Age - SchoolEndAge),
    LoanPayment = ifelse(YearsSinceSchool < LoanYears, MonthlyPayment * 12, 0),
    AdjustedSavings = pmax(AnnualSavings - LoanPayment, 0)
  )

Net worth calculation

InterestRate <- 0.025

Df <- Df %>%
  mutate(
    NetWorth = ifelse(
      EmploymentStatus == "Full-time" & WorkExperience > 0,
      AdjustedSavings * ((1 + InterestRate)^WorkExperience - 1) / InterestRate,
      0
    )
  )

Summarizing by age and education

SummaryDf <- Df %>%
  filter(EmploymentStatus == "Full-time", Age >= 20, Age <= 54) %>%
  group_by(Age, Education) %>%
  summarise(AvgNetWorth = mean(NetWorth, na.rm = TRUE), .groups = 'drop')

Visualization

EducationColors <- c("High School" = "green",
                     "Associate" = "red",
                     "Bachelor" = "goldenrod",
                     "Master" = "deepskyblue",
                     "PhD" = "black",
                     "Professional" = "hotpink")

P <- ggplot(SummaryDf, aes(x = Age, y = AvgNetWorth, color = Education)) +
  geom_smooth(se = FALSE, method = "loess", size = 1.2) +
  labs(title = "Estimated net worth vs age by education",
       x = "Age", y = "Estimated net wortth ($)", color = "Education") +
  scale_color_manual(values = EducationColors) +
  scale_y_continuous(limits = c(0, 2500000), labels = scales::comma) +
  scale_x_continuous(limits = c(20, 54)) +
  theme_minimal() +
  theme(legend.position = "bottom")

Print(P)

conclusion

This graph explains how much extra money someone will have on average based on their education. People with more education, like PhDs and Professionals, may start with more debt but eventually earn more and save more later in life