Rpub Link:
http://rpubs.com/ssufian/536313
GitHub Link:
https://github.com/ssufian/Data_607
editorial piece by Veronica Walsingham in Ranker.com
https://www.ranker.com/list/millennials-most-screwed-generation/veronica-walsingham
Part 1:
Let’s explore and start by looking at their debt obligations relative to other generational groups, see below
Baby Boomers: Born 1946-1964 (54-72 years old)
Generation X: Born 1965-1980 (38-53 years old)
This exercise will focus on two forms of debt outstanding: Student loans and Credit Card loans
Part 2:
The 2nd phase is to look at their Job prospects thru the lens of economic growth indicators since the
great recession of 2008; GDP of economy and unemployement rate and tuition rates over time
in, do we agree or disagree with the statement above. Lets find out….?
url1 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/No_Student_Loan_by_Age.csv'
url2 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/Student_Loan_by_Age.csv'
# Reading & Loading data
df1 <- read.csv(file = url1 ,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=3, stringsAsFactors = F,header=T)
df2 <- read.csv(file = url2 ,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=3, stringsAsFactors = F,header=T)
df1 <- df1 %>% select(-c(X.1:X.11)) # Number of Student loan borrowers
df2 <- df2 %>% select(-c(X.1:X.142)) # Dollar amount in $millions of Student loans
oldnames = c("X30.39","X40.49","X50.59","X60.")
newnames = c("under30_39","under40_49","under50_59","over60")
dfclean1 <- df1 %>%
rename(Year=X) %>%
rename_at(vars(oldnames), ~ newnames)
dfclean2 <- df2 %>%
rename(Year=X) %>%
rename_at(vars(oldnames), ~ newnames)
head(dfclean1)
## Year under.30 under30_39 under40_49 under50_59 over60
## 1 2004 11.3 5.7 3.2 2.1 0.6
## 2 2005 12.1 6.1 3.5 2.4 0.7
## 3 2006 12.9 6.7 3.9 2.8 0.9
## 4 2007 13.4 7.2 4.1 3.1 1.1
## 5 2008 14.2 8.0 4.4 3.5 1.3
## 6 2009 15.3 8.8 4.8 3.8 1.5
head(dfclean2)
## Year under.30 under30_39 under40_49 under50_59 over60
## 1 2004 147.8 112.3 48.7 29.5 6.3
## 2 2005 162.4 127.6 56.4 36.4 8.2
## 3 2006 196.3 154.8 69.8 48.2 12.2
## 4 2007 219.8 174.5 80.0 56.4 15.9
## 5 2008 250.9 205.4 94.4 67.6 20.4
## 6 2009 275.9 232.2 109.0 78.5 25.3
dfnew1 <- gather(dfclean1,"Age_group","n_borrower",2:6)
dfnew2 <- gather(dfclean2,"Age_group","amount_borrowed",2:6)
head(dfnew1)
## Year Age_group n_borrower
## 1 2004 under.30 11.3
## 2 2005 under.30 12.1
## 3 2006 under.30 12.9
## 4 2007 under.30 13.4
## 5 2008 under.30 14.2
## 6 2009 under.30 15.3
head(dfnew2)
## Year Age_group amount_borrowed
## 1 2004 under.30 147.8
## 2 2005 under.30 162.4
## 3 2006 under.30 196.3
## 4 2007 under.30 219.8
## 5 2008 under.30 250.9
## 6 2009 under.30 275.9
pn <- ggplot(data = dfnew1, aes(x = Year, y = n_borrower, fill = Age_group)) +
geom_bar(stat = "identity", position = "dodge") +
ggtitle("Total Number (millions) of Student loan borrower by age group")+
theme_excel()
pn
pd <- ggplot(data = dfnew2, aes(x = Year, y = amount_borrowed, fill = Age_group)) +
geom_bar(stat = "identity", position = "dodge") +
ggtitle("Total Amount ($Billions) of Student loan borrower by age group")+
theme_excel()
pd
## Warning: Removed 90 rows containing missing values (geom_bar).
Since the recession, both Millenials and Gen Z had been the highest number of borrowers and the highest by dollar values
as well. While Gen Z’s borrowing rate started to plateu out by mid 2012, the Millenials’ student debt rate kept growing.
url3 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/tot_debt_by_age.csv'
url4 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/Debt%20Share%20by%20Product%20Type%20and%20Age.csv'
# Reading & Loading data
df3 <- read.csv(file = url3 ,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=3, stringsAsFactors = F,header=T)
df4 <- read.csv(file = url4 ,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=2, stringsAsFactors = F,header=T)
df3 <- df3 %>% select(-c(X:X.13)) # Number of Student loan borrowers
df4 <- df4 %>% select(-c(X.1:X.11)) # Dollar amount in $millions of Student loans
df3new <-df3
oldnames = c("X18.29","X30.39","X40.49","X50.59","X60.69","X70.")
newnames = c("under18_29","under30_39","under40_49","under50_59","under60_69","over70")
dfclean3 <- df3new %>%
slice(1:(n()-2)) %>%
rename_at(vars(oldnames), ~ newnames)
dfclean4 <- df4 %>%
rename_at(vars(oldnames), ~ newnames) %>%
slice(1:6) %>%
rename(credit_type=X)
dfclean3 <- dfclean3 #%<>%
# mutate(year= as.Date(year, format= "%d.%m.%Y"))
head(dfclean3)
## quarter under18_29 under30_39 under40_49 under50_59 under60_69 over70
## 1 1999.1 0.35 1.22 1.46 0.98 0.38 0.18
## 2 1999.2 0.37 1.26 1.49 1.00 0.38 0.18
## 3 1999.3 0.42 1.33 1.54 1.02 0.38 0.18
## 4 1999.4 0.43 1.33 1.53 1.00 0.37 0.18
## 5 2000.1 0.39 1.35 1.64 1.13 0.43 0.21
## 6 2000.2 0.42 1.41 1.69 1.16 0.44 0.21
(dfclean4)
## credit_type under18_29 under30_39 under40_49 under50_59 under60_69
## 1 Auto Loans 0.16 0.28 0.30 0.28 0.18
## 2 Credit Card 0.05 0.15 0.19 0.21 0.16
## 3 Mortgage 0.32 1.86 2.51 2.35 1.54
## 4 HELOC 0.00 0.02 0.07 0.11 0.11
## 5 Student Loans 0.35 0.49 0.32 0.21 0.09
## 6 Other 0.03 0.07 0.09 0.10 0.08
## over70
## 1 0.09
## 2 0.11
## 3 0.82
## 4 0.08
## 5 0.02
## 6 0.04
dfnew3 <- gather(dfclean3,"Age_group","tot_debt",2:7)
dfnew4 <- gather(dfclean4,"Age_group","amount_borrowed",2:7)
head(dfnew3)
## quarter Age_group tot_debt
## 1 1999.1 under18_29 0.35
## 2 1999.2 under18_29 0.37
## 3 1999.3 under18_29 0.42
## 4 1999.4 under18_29 0.43
## 5 2000.1 under18_29 0.39
## 6 2000.2 under18_29 0.42
head(dfnew4)
## credit_type Age_group amount_borrowed
## 1 Auto Loans under18_29 0.16
## 2 Credit Card under18_29 0.05
## 3 Mortgage under18_29 0.32
## 4 HELOC under18_29 0.00
## 5 Student Loans under18_29 0.35
## 6 Other under18_29 0.03
library(basicTrendline)
p3 <- ggplot(dfnew3, aes(x=quarter,y=tot_debt,fill=Age_group)) + geom_bar(stat = "identity")+
xlab("Year") + ylab("Amount Owed in $ Trillions") +
ggtitle("Total Debt Balance by Age") +
theme_excel()
p3 <- p3+theme(axis.text = element_text(size = 10,angle =90, hjust = 1))+theme(legend.position="top")
p3
# Multiple line plot
dfnewline <- dfnew3
p3b <- ggplot(dfnewline , aes(x = quarter, y = tot_debt)) +
xlab("Year") + ylab("Amount Owed in $ Trillions")+
ggtitle("Total Debt trendlines by Age") +
geom_line(aes(color = Age_group), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800","red","azure","yellow","black")) +
theme_excel()
p3b <- p3b+theme(axis.text = element_text(size = 10,angle =90, hjust = 1))+theme(legend.position="top")
p3b
dfnew4 <-dfnew4 %>%
group_by(Age_group)
p4 <- ggplot(dfnew4, aes(x=Age_group,y=amount_borrowed,fill=credit_type)) + geom_bar(stat = "identity")+
xlab("Age Groups") + ylab("Amount Owed in $Trillions") +
ggtitle("Debt Share by Product Type and Age (2019 Q2)") +
theme_excel()
p4 <-p4+theme(axis.text = element_text(size = 10,angle =0, hjust = 0.5))+theme(legend.position="top")
p4
spike in debt load at the peak of the debt bubble in 2006, debt load rate of Millenials kept increasing. Also worrying
was the increasing rates of all age groups relative to Millenials. The Millenials is the youngest age group to
allow its debt load to increased while surprisingly enough, Gen Z which was the youngest to have kept its debt load
relatively flat post the great recession of 2006.
and Credit Card debts
url5 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/Tuition_Roomboard.csv'
# Reading & Loading data
df5 <- read.csv(file = url5 ,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=3, stringsAsFactors = F,header=F)
oldlist <- paste("V.", 1:32,sep="", collapse = NULL)
newname <- c("institution","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999", "2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016", "2017","2018")
df5short <- df5 %>% select(-c(V35:V37)) %>%
slice(14:15)
head(df5short)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 Public Two-Year $640 $720 $800 $850 $950 $1,130 $1,100 $1,200
## 2 Public Four-Year $1,260 $1,420 $1,520 $1,640 $1,800 $2,070 $2,260 $2,430
## V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
## 1 $1,270 $1,400 $1,480 $1,530 $1,550 $1,600 $1,700 $1,750 $1,880 $2,070
## 2 $2,590 $2,740 $2,880 $3,040 $3,160 $3,280 $3,380 $3,610 $3,950 $4,430
## V20 V21 V22 V23 V24 V25 V26 V27 V28 V29
## 1 $2,240 $2,370 $2,460 $2,460 $2,550 $2,720 $2,890 $3,110 $3,290 $3,400
## 2 $4,850 $5,170 $5,450 $5,880 $6,270 $6,700 $7,170 $7,690 $8,030 $8,220
## V30 V31 V32 V33 V34
## 1 $3,520 $3,610 $3,700 $3,790 $3,900
## 2 $8,480 $8,710 $8,980 $9,250 $9,510
#Rename Columns
df5clean <- df5short %>%
rename_at(vars(starts_with("V")), funs(gsub(.,"V",newname)))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
## Warning in gsub(., "V", newname): argument 'pattern' has length > 1 and
## only the first element will be used
head(df5clean)
## institution 1986 1987 1988 1989 1990 1991 1992 1993
## 1 Public Two-Year $640 $720 $800 $850 $950 $1,130 $1,100 $1,200
## 2 Public Four-Year $1,260 $1,420 $1,520 $1,640 $1,800 $2,070 $2,260 $2,430
## 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
## 1 $1,270 $1,400 $1,480 $1,530 $1,550 $1,600 $1,700 $1,750 $1,880 $2,070
## 2 $2,590 $2,740 $2,880 $3,040 $3,160 $3,280 $3,380 $3,610 $3,950 $4,430
## 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
## 1 $2,240 $2,370 $2,460 $2,460 $2,550 $2,720 $2,890 $3,110 $3,290 $3,400
## 2 $4,850 $5,170 $5,450 $5,880 $6,270 $6,700 $7,170 $7,690 $8,030 $8,220
## 2014 2015 2016 2017 2018
## 1 $3,520 $3,610 $3,700 $3,790 $3,900
## 2 $8,480 $8,710 $8,980 $9,250 $9,510
# making dataset long format
df5cleaner<- gather(df5clean,"years","tuition_fees",2:34)
df5cleaner <- df5cleaner %>%
mutate(tuition_fees=str_replace_all(tuition_fees,",",""))%<>%
mutate_at(2, as.numeric)
# replace $ with blank "" in the df$payment column. and coerce that result to numeric
df5cleaner$tuition_fees = as.numeric(gsub("\\$", "", df5cleaner$tuition_fees ))
head(df5cleaner)
## institution years tuition_fees
## 1 Public Two-Year 1986 640
## 2 Public Four-Year 1986 1260
## 3 Public Two-Year 1987 720
## 4 Public Four-Year 1987 1420
## 5 Public Two-Year 1988 800
## 6 Public Four-Year 1988 1520
# Multiple line plot: 2yr and 4 yr public schools
#dfnewline <- dfnew3
p5 <- ggplot(df5cleaner , aes(x = years, y = tuition_fees)) +
xlab("Year") + ylab("Tuition fees + Room & Board")+
ggtitle("Avg. Tuition and Fees plus Room & Board (Unweighted) in Current Dollars") +
geom_line(aes(color = institution), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800")) +
theme_excel()
p5 <- p5+theme(axis.text = element_text(size = 11,angle =0, hjust = 1))+theme(legend.position="top")
p5
Observation 3:
The average 2 and 4 year public instituitions (excluding private entities) in current dollars have been growing with
52% increase just in the last decade! This shows you the new younger generations of Gen Z and Millennials are NOT
getting any breaks if they wanted to or are currently pursuing any forms of post-secondary education
Let’s turn our focus to the revenue side of the equation: GDP of US economy and Unemployment rate of these generations
url6 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/BLS_Unemployment%20SeriesReport-20191006171232_c15ff5.csv'
# Reading & Loading data
df6 <- read.csv(file = url6,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=12, stringsAsFactors = F,header=F)
df6short <- df6 %>% select(-c(V2:V5,V7:V11,V13:V17,V19:V23))
#labeling columns "
new_name <- c("year","under18_21","under25_34","under35_44","under45_54")
df6shortnew <- df6short %>%
rename(year="V1","under18_21"='V6', "under25_34"="V12","under35_44"="V18","under45_54"="V24")
head(df6shortnew )
## year under18_21 under25_34 under35_44 under45_54
## 1 2009 23.43 9.88 8.00 7.18
## 2 2010 24.20 10.08 8.10 7.68
## 3 2011 22.83 9.45 7.33 7.05
## 4 2012 22.35 8.25 6.60 6.23
## 5 2013 21.05 7.40 5.88 5.65
## 6 2014 18.50 6.50 4.75 4.40
# making dataset long format
df6cleaner<- gather(df6shortnew ,"Age_group","unemployment_rate",2:5)
head(df6cleaner)
## year Age_group unemployment_rate
## 1 2009 under18_21 23.43
## 2 2010 under18_21 24.20
## 3 2011 under18_21 22.83
## 4 2012 under18_21 22.35
## 5 2013 under18_21 21.05
## 6 2014 under18_21 18.50
# Multiple line plot: Unemployment rate of various age groups
df6cleaner <- df6cleaner %>%
group_by(year)
p6 <- ggplot(df6cleaner , aes(x = year, y = unemployment_rate)) +
xlab("Year") + ylab("% Unemployment Rate ")+
ggtitle("Umemployment rate; various age groups") +
geom_line(aes(color = Age_group), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800","black","red")) +
theme_excel()
p6 <- p6+theme(axis.text = element_text(size = 11,angle =0, hjust = 1))+theme(legend.position="top")
p6
Observation 4:
The good news is unemployment rate of the Millennials has been trending downward since the great recession, its currently
less than 5%. By the same token, Gen X and beyond are also seeing very good unemployment numbers; all under 5%. Of note,
the older you are, the better off you are as measured by unemployment rate. However, the worrying sign is Gen Z is
still seeing some elevated numbers, hovering around 12%. But over all, all groups are seeing a down shift which is a
of today’s general economic strength, which is a topic we will delve into next to finish off this discussion.
url7 <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/GDP.csv'
# Reading & Loading data
df7 <- read.csv(file = url7,sep = ",", na.strings = c("NA", " ", ""), strip.white = TRUE,skip=3, stringsAsFactors = F,header=T)
df7short <- df7 %>% select(-c(Country.Code,Indicator.Name,Indicator.Code,X1960))
#labeling columns "
df7shortnew <- df7short %>%
rename_at(vars(starts_with("X")), funs(str_replace(., "X", "")))
# making dataset long format
df7cleaner<- gather(df7shortnew,"year","N_GDP",2:59) %>%
filter(Country.Name == 'United States')%<>%
mutate_at(2, as.numeric)
head(df7cleaner)
## Country.Name year N_GDP
## 1 United States 1961 2.3
## 2 United States 1962 6.1
## 3 United States 1963 4.4
## 4 United States 1964 5.8
## 5 United States 1965 6.4
## 6 United States 1966 6.5
# Trend line plot: US GDP since 1961
df7cleaner <- df7cleaner %>%
group_by(year) %>%
filter(year >=2000)
p7 <- ggplot(df7cleaner , aes(x = year, y = N_GDP) )+
xlab("Year") + ylab("% GDP ")+
ggtitle("GDP growth rate since Yr 2000") +
geom_line(aes(color = Country.Name), size = 1.3) +
scale_color_manual(values = c("yellow")) +
theme_excel()
p7 <- p7+theme(axis.text = element_text(size = 11,angle =0, hjust = 1))+theme(legend.position="top")
p7
Observation 5:
Since the Great Recession of 2008, the US economy has been growing and it wasn’t till 2013 there after, that the economy
clawed its way back to pre-recession state. It has been averaging around 2% growth the last 6 years or so and in 2017,
the slope turned upwards. This upward trend is one of the biggest reason, unemployment rate in observation 4 has gone
down. Good economic growth has reduced unemployment rate across all age groups and Millenials’ downward unemployement
rate is part of that “feel good” story. In fact, this is the classic proverbial phrase; “high tide lifts all boats”.
Summary:
The discussion started out with a blaring editorial headline:
But after looking at the “check book” of the Millennial generation from both sides; Liabilities vs. Revenue (GDP &
Unemployment rate), we are now able to answer this question without biases and anecdotes. These are the facts:
Yes, Millennials are much in debt with the highest dollar student loan amount across all age groups
Yes, Millennials, along with Gen Z are the highest number of borrowers of student loan across all age groups
Yes, Millennials, along with all the other age groups are still incurring more total debt post recession
Yes, Public 2 & 4 year college tuition plus room & board fees are increasing with a whopping 52% increase in just
the last decade
Yes, Unemployment rate across all age groups is decreasing along with a positive GDP growth in the last 6 years
With all these statistics in the backdrop, Its an open secret that we are a nation of debtors. Millennials is just one
of the many. Millennials are certainly up against a lot of challenges with a huge debt overhang. But Millennials are NOT
the only generation facing this dilemma. In fact, all the generations across ages are facing their own challenges with
debts on their backs. The question then becomes how much debt becomes deterimental to personal growth. That is, if one
was to take on more debt, will there be enough earnings derived from a better economic future to offset this debt burden?
Since all age groups are “screwed” at a certain level because of their debt load, the real issue is then will there be
enough of a career path to pay off all these debts. Maybe the better headline should read something like this: