This presentation explains how education level impacts income, savings, and net worth over time using a simulated dataset in R.
People in the training population have several variables:
These are commented out to avoid slide errors if the package is already installed.
set.seed(42)
N <- 50000
EducationLevels <- c("High School", "Associate", "Bachelor", "Master", "PhD", "Professional")
Df <- data.frame(
ID = 1:N,
Age = sample(18:70, N, TRUE),
Education = sample(EducationLevels, N, TRUE, prob = c(0.15, 0.25, 0.25, 0.15, 0.10, 0.10))
)This code creates 50,000 instances as a training population. I tested with a testing population in a separate notebook and got 96% accuracy. When I asked ChatGPT to cross-reference this with studies and surveys, I got around 85% accuracy.
EducationAges <- c("High School" = 18, "Associate" = 20, "Bachelor" = 22,
"Master" = 24, "PhD" = 28, "Professional" = 30)
Df <- Df %>%
mutate(
SchoolEndAge = EducationAges[Education],
EmploymentStatus = dplyr::case_when(
Age < SchoolEndAge ~ "Student",
Age <= 21 ~ "Part-time",
Age < 55 ~ "Full-time",
TRUE ~ "Retired"
),
WorkExperience = pmax(0, Age - SchoolEndAge)
)This code determines when a person starts and stops working and how long they’ve been working (used later for raises).
GenerateIncome <- function(Edu, Exp, Status) {
if (Status != "Full-time") return(0)
BaseSalaries <- c("High School" = 770 * 52, "Associate" = 900 * 52, "Bachelor" = 1300 * 52,
"Master" = 1500 * 52, "PhD" = 1800 * 52, "Professional" = 2100 * 52)
ExpBumps <- c("High School" = 700, "Associate" = 800, "Bachelor" = 900,
"Master" = 1000, "PhD" = 1200, "Professional" = 1500)
Sds <- c("High School" = 9000, "Associate" = 11000, "Bachelor" = 13000,
"Master" = 17000, "PhD" = 22000, "Professional" = 10000)
rnorm(1, BaseSalaries[Edu] + Exp * ExpBumps[Edu], Sds[Edu])
}Base income straight after their education with no experience (higher with better educations) then raises based off of work experience (randomized) and then because of the random element in it I added a sds function to remove outliers.
Df$Income <- mapply(GenerateIncome, Df$Education, Df$WorkExperience, Df$EmploymentStatus)
Df <- Df[abs(scale(Df$Income)) < 3, ]This applies the income model and removes outliers using the Z-score method:
z = (x - μ) / σ|z| < 3CostOfLiving <- c("High School" = 40000, "Associate" = 45000,
"Bachelor" = 50000, "Master" = 55000,
"PhD" = 55000, "Professional" = 55000)
Df <- Df %>%
mutate(
AnnualSavings = pmax(Income - CostOfLiving[Education], 0)
)The code subtracts the cost of living and then uses it to calculate annual savings.
StudentLoans <- data.frame(
Education = EducationLevels,
TotalDebt = c(0, 0, 30000, 45000, 55000, 90000),
MonthlyPayment = c(0, 0, 300, 450, 450, 700),
LoanYears = c(0, 0, 10, 10, 12, 15)
)
Df <- Df %>%
left_join(StudentLoans, by = "Education") %>%
mutate(
YearsSinceSchool = pmax(0, Age - SchoolEndAge),
LoanPayment = ifelse(YearsSinceSchool < LoanYears, MonthlyPayment * 12, 0),
AdjustedSavings = pmax(AnnualSavings - LoanPayment, 0)
)EducationColors <- c("High School" = "green",
"Associate" = "red",
"Bachelor" = "goldenrod",
"Master" = "deepskyblue",
"PhD" = "black",
"Professional" = "hotpink")
P <- ggplot(SummaryDf, aes(x = Age, y = AvgNetWorth, color = Education)) +
geom_smooth(se = FALSE, method = "loess", size = 1.2) +
labs(title = "Estimated net worth vs age by education",
x = "Age", y = "Estimated net wortth ($)", color = "Education") +
scale_color_manual(values = EducationColors) +
scale_y_continuous(limits = c(0, 2500000), labels = scales::comma) +
scale_x_continuous(limits = c(20, 54)) +
theme_minimal() +
theme(legend.position = "bottom")
Print(P)This graph explains how much extra money someone will have on average based on their education. People with more education, like PhDs and Professionals, may start with more debt but eventually earn more and save more later in life