Rabin Gora
4/29/2022
Describe the general area or issue you want to investigate in your data analysis project. = The general area I want to investigate is the housing price index and mortgage rates. I want to show the relation between housing price index and the mortgage rate. My analysis will also consider the recessions. I will show the ways combine these data.
Definition: Mortgage Rate: A mortgage rate is the rate of interest charged on a mortgage.
Housing Price Index: - The House Price Index (HPI) is a broad measure of the movement of single-family property prices in the United States. Aside from serving as an indicator of house price trends, it also functions as an analytical tool for estimating changes in the rates of mortgage defaults, prepayments, and housing affordability - It is published by the Federal Housing Finance Agency (FHFA), using monthly and quarterly data supplied by Fannie Mae and Freddie Mac. - The HPI is one of many economic indicators that investors use to keep a pulse on broader economic trends and potential shifts in the stock market.(source: Investopedia)
Define at least two specific questions you would like to attempt to answer. = The questions are as follows: 1) What are the impacts of mortgage interest rate on house prices Index?
Which state and county in the US has highest growth in housing price Index year over year?
What is the impact of recession on housing price Index and mortgage rates?
Identify the data source(s) you used for your analysis. Provide a URL if possibl. = The data sources i’ll be using is as follows:
Provide a link to the documentation for the data or the documentation itself. Is there a data dictionary? =The technical documentation can be found here :
Freddie Mac publishes the monthly index values of the Freddie Mac House Price Index (FMHPISM) each quarter. Index values are available for the nation, the 50 states and the District of Columbia, and the more than 380 metropolitan statistical areas (MSAs) in the U.S.
Use the tools in R such as str() and summary() to describe the original dataset you imported.
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## --- Compiling #TidyTuesday Information for 2019-02-05 ----
## --- There are 3 files available ---
## --- Starting Download ---
##
## Downloading file 1 of 3: `mortgage.csv`
## Downloading file 2 of 3: `recessions.csv`
## Downloading file 3 of 3: `state_hpi.csv`
## --- Download complete ---
#separate individual tables of the housing index, recession and mortgage rate.
#mortgage.csv data
mortgage_rate <- HPandMR$mortgage
#recessions.csv data
recessions <- HPandMR$recessions
#state_hpi.csv data
state_hpi <- HPandMR$state_hpi## date fixed_rate_30_yr fees_and_pts_30_yr fixed_rate_15_yr
## Min. :1971-04-02 Min. : 3.310 Min. :0.300 Min. :2.560
## 1st Qu.:1983-03-09 1st Qu.: 5.850 1st Qu.:0.700 1st Qu.:3.765
## Median :1995-02-13 Median : 7.645 Median :1.100 Median :5.680
## Mean :1995-02-13 Mean : 8.084 Mean :1.271 Mean :5.479
## 3rd Qu.:2007-01-19 3rd Qu.: 9.780 3rd Qu.:1.800 3rd Qu.:6.840
## Max. :2018-12-27 Max. :18.630 Max. :2.700 Max. :8.890
## NA's :32 NA's :1065
## fees_and_pts_15_yr adjustable_rate_5_1_hybrid fees_and_pts_5_1_hybrid
## Min. :0.3000 Min. :2.560 Min. :0.20
## 1st Qu.:0.6000 1st Qu.:2.970 1st Qu.:0.50
## Median :0.7000 Median :3.540 Median :0.50
## Mean :0.9025 Mean :4.045 Mean :0.53
## 3rd Qu.:1.0000 3rd Qu.:5.260 3rd Qu.:0.60
## Max. :2.0000 Max. :6.390 Max. :0.90
## NA's :1065 NA's :1762 NA's :1762
## adjustable_margin_5_1_hybrid spread_30_yr_fixed_and_5_1_adjustable
## Min. :2.70 Min. :0.5000
## 1st Qu.:2.74 1st Qu.:0.6800
## Median :2.74 Median :0.7600
## Mean :2.75 Mean :0.7633
## 3rd Qu.:2.76 3rd Qu.:0.8225
## Max. :2.80 Max. :1.0200
## NA's :1762 NA's :2336
## spec_tbl_df [2,492 x 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ date : Date[1:2492], format: "1971-04-02" "1971-04-09" ...
## $ fixed_rate_30_yr : num [1:2492] 7.33 7.31 7.31 7.31 7.29 7.38 7.42 7.44 7.46 7.52 ...
## $ fees_and_pts_30_yr : num [1:2492] NA NA NA NA NA NA NA NA 1 NA ...
## $ fixed_rate_15_yr : num [1:2492] NA NA NA NA NA NA NA NA NA NA ...
## $ fees_and_pts_15_yr : num [1:2492] NA NA NA NA NA NA NA NA NA NA ...
## $ adjustable_rate_5_1_hybrid : num [1:2492] NA NA NA NA NA NA NA NA NA NA ...
## $ fees_and_pts_5_1_hybrid : num [1:2492] NA NA NA NA NA NA NA NA NA NA ...
## $ adjustable_margin_5_1_hybrid : num [1:2492] NA NA NA NA NA NA NA NA NA NA ...
## $ spread_30_yr_fixed_and_5_1_adjustable: num [1:2492] NA NA NA NA NA NA NA NA NA NA ...
## - attr(*, "spec")=
## .. cols(
## .. date = col_date(format = ""),
## .. fixed_rate_30_yr = col_double(),
## .. fees_and_pts_30_yr = col_double(),
## .. fixed_rate_15_yr = col_double(),
## .. fees_and_pts_15_yr = col_double(),
## .. adjustable_rate_5_1_hybrid = col_double(),
## .. fees_and_pts_5_1_hybrid = col_double(),
## .. adjustable_margin_5_1_hybrid = col_double(),
## .. spread_30_yr_fixed_and_5_1_adjustable = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
## name period_range duration_months
## Length:14 Length:14 Length:14
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## time_since_previous_recession_months peak_unemploy_ment
## Length:14 Length:14
## Class :character Class :character
## Mode :character Mode :character
## gdp_decline_peak_to_trough characteristics
## Length:14 Length:14
## Class :character Class :character
## Mode :character Mode :character
## year month state price_index
## Min. :1975 Min. : 1.00 Length:26877 Min. : 15.36
## 1st Qu.:1985 1st Qu.: 3.00 Class :character 1st Qu.: 57.52
## Median :1996 Median : 6.00 Mode :character Median : 84.49
## Mean :1996 Mean : 6.49 Mean : 94.09
## 3rd Qu.:2007 3rd Qu.: 9.00 3rd Qu.:126.17
## Max. :2018 Max. :12.00 Max. :381.41
## us_avg
## Min. : 23.46
## 1st Qu.: 52.94
## Median : 77.51
## Mean : 93.13
## 3rd Qu.:131.91
## Max. :189.54
## spec_tbl_df [26,877 x 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ year : num [1:26877] 1975 1975 1975 1975 1975 ...
## $ month : num [1:26877] 1 2 3 4 5 6 7 8 9 10 ...
## $ state : chr [1:26877] "AK" "AK" "AK" "AK" ...
## $ price_index: num [1:26877] 34.6 35.1 35.5 35.9 36.3 ...
## $ us_avg : num [1:26877] 23.5 23.6 23.8 24.1 24.2 ...
## - attr(*, "spec")=
## .. cols(
## .. year = col_double(),
## .. month = col_double(),
## .. state = col_character(),
## .. price_index = col_double(),
## .. us_avg = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
Describe the steps you took to get from your original dataset to the final dataset you used for your analysis. Include the R code in chunks.
#check if any missing data in any of the dataframe's required cols by boolean table.
table(is.na(mortgage_rate$date)) ##
## FALSE
## 2492
## < table of extent 0 >
#do the same for fixed_rate_30_yr colum; double check for missing data
table(is.na(mortgage_rate$fixed_rate_30_yr))##
## FALSE
## 2492
##
## FALSE
## 2492
##
## FALSE TRUE
## 2460 32
##
## FALSE
## 2460
#we see 32 rows missing with missing values. we will fix this by replacing with mean of this column.
mean_fees_and_pts_30_yr = mean(mortgage_rate$fees_and_pts_30_yr, na.rm = TRUE)
mean_fees_and_pts_30_yr## [1] 1.270776
#replace the missing value for fees_and_pts_30_yr column with the mean
mortgage_rate[is.na(mortgage_rate$fees_and_pts_30_yr), "fees_and_pts_30_yr"] = mean_fees_and_pts_30_yr
#chekc again to see if any missing values in this colume. use either summary or table
table(is.na(mortgage_rate$fees_and_pts_30_yr))##
## FALSE
## 2492
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.300 0.700 1.100 1.271 1.800 2.700
no missing values for other tables as we can see in the summary above.
#cleaning the recession datafram to readable output for R.
#converting the recession date from character vector to start and end date.
library(stringr)
recession_dates <- recessions %>%
mutate(duration_months = substring(duration_months, 3),
period_range = substring(period_range, 5),
time_since_previous_recession_months = substring(time_since_previous_recession_months, 4),
peak_unemploy_ment = substring(peak_unemploy_ment, 5),
gdp_decline_peak_to_trough = substring(gdp_decline_peak_to_trough, 5),
period_range = case_when(name == "Great Depression" ~ "Aug 1929-Mar 1933",
name == "Great Recession" ~ "Dec 2007-June 2009",
TRUE ~ period_range))
#extract(period_range, c("started_from","lasted_to"), ("([[^-]]+)-([[^-]]+)"))
#sperate the period_range into start and end month and year using lubridate and stringr packages
recession_dates$from <- my(stringr::str_extract(recession_dates$period_range,"^.*?(?=-)"))
recession_dates$to <- my(stringr::str_extract(recession_dates$period_range,"(?=-).*?$"))Show how you approached the questions you posed at the beginning. Describe how much you were able to accomplish. There should be both graphical and numerical results produced by R code included in chunks. Explain what you did and what it means.
#ggplot for the state_hpi for quick overview
ggplot_state_hpi <- state_hpi %>%
ggplot(aes(year, price_index, color = state, group = state)) +
geom_point(size = 0.3, alpha = 0.7) +
ggtitle("Year and House Price Index chart by state")
ggplotly(ggplot_state_hpi)This chart shows that something happened in 2000 with the HPI. so i did a little research on what happened that year for HPI i found that the HPI is scaled so that each state has a value of 100 in Dec 2000. So each data point is the percent of the Dec 2000 price in that state.
#reindexing the price_index and us_average so that the HPI starts from Jan 1975 instead of 2000.
state_hpi %>%
mutate(date = ymd(year*1e4 + month*1e2 + 1)) %>%
group_by(state) %>%
mutate(price_reindex = price_index /price_index[1] * 100,
us_reAvg_hpi = us_avg /us_avg[1] *100) %>%
ungroup() -> state_hpi#replotting the above chart with the reindex and reaverage
#ggplot for the state_hpifor quick overview
ggplot_state_hpi2 <- state_hpi %>%
ggplot(aes(year, price_reindex, color = state, group = state))+
geom_point(size = 0.3, alpha = 0.7) +
ylab("HPI_reindexed to Jan 1975 = 100") +
ggtitle("date vs HPI reindexed to jan 1975 by state")
ggplotly(ggplot_state_hpi2)recession_dates %>%
extract(period_range, c("from_c","to_c"), "(^.*?(?=-))-((?=-).*?$)") %>%
mutate(from = as.Date(parse_date_time(from_c, "%b %Y")),
to = as.Date(parse_date_time(to_c, "%b %Y")))## # A tibble: 14 x 10
## name from_c to_c duration_months time_since_prev~ peak_unemploy_m~
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Great Depress~ <NA> <NA> 3 years7 months "1 year9 months" "21.3%(1932)[45~
## 2 Recession of ~ <NA> <NA> 1 year1 month "4 years2 month~ "17.8%[45]-\n19~
## 3 Recession of ~ <NA> <NA> 8 months "6 years8 month~ "5.2%[52](1946)"
## 4 Recession of ~ <NA> <NA> 11 months "3 years1 month" "7.9%(Oct 1949)"
## 5 Recession of ~ <NA> <NA> 10 months "3 years9 month~ "6.1%(Sep 1954)"
## 6 Recession of ~ <NA> <NA> 8 months "3 years3 month~ "7.5%(July 1958~
## 7 Recession of ~ <NA> <NA> 10 months "2 years" "7.1%(May 1961)"
## 8 Recession of ~ <NA> <NA> 11 months "8 years10 mont~ " 6.1%(Dec 1970~
## 9 1973-75 reces~ <NA> <NA> 1 year4 months "3 years" " 9.0%(May 1975~
## 10 1980 recession <NA> <NA> 6 months "4 years10 mont~ " 7.8%(July 198~
## 11 1981-1982 rec~ <NA> <NA> 1 year4 months "1 year" " 10.8%(Nov 198~
## 12 Early 1990s r~ <NA> <NA> 8 months "7 years8 month~ " 7.8%(June 199~
## 13 Early 2000s r~ <NA> <NA> 8 months "10 years" " 6.3%(June 200~
## 14 Great Recessi~ <NA> <NA> 1 year6 months "6 years\n1 mon~ "10.0%(October ~
## # ... with 4 more variables: gdp_decline_peak_to_trough <chr>,
## # characteristics <chr>, from <date>, to <date>
##This chart answers the question of: 3) What is the impact of recession on housing price Index and mortgage rates?
#putting all the charts together: house price index, recession, and us_avg index. we get
state_hpi %>%
ggplot() +
geom_line(aes(date,price_index, group = state, color = state)) +
geom_rect(aes(xmin = from, xmax = to, ymin = -Inf, ymax = Inf),
alpha = 0.25,
fill = "black",
data = recession_dates) +
xlim(ymd(19750101), NA) +
ylab("HPI / US avg HPI\nreindexed to Jan 1975 = 100")## Warning: Removed 9 rows containing missing values (geom_rect).
= We see that in every recession, the housing price index has significantlty come down before before it continues its uptrend.
##Chart of us_average house price index.
state_hpi %>%
ggplot()+
geom_line(aes(x = date, y = us_avg))+
geom_rect(aes(xmin = from, xmax = to, ymin = -Inf, ymax = Inf),
alpha = 0.25,
fill = "black",
data = recession_dates) +
xlim(ymd(19750101), NA) +
ylab("US avg HPI Jan 1975 = 100") +
ggtitle("US Average house price Index vs date, including recession periods")## Warning: Removed 9 rows containing missing values (geom_rect).
This chart answers the questio of:
state_hpi %>%
ggplot()+
geom_line(aes(x = date, y = us_avg))+
geom_line(aes(x = date, y = fixed_rate_30_yr),
alpha = 0.25,
color = "blue",
data = mortgage_rate) +
xlim(ymd(19750101), NA) +
ylab("house price index + mortgage rate") +
ggtitle("US Average house price Index with relation to 30 years fixed mortgage rate")## Warning: Removed 196 row(s) containing missing values (geom_path).
= The housing price index and mortgage seems to have negetive corelation. We see the mortgage rate is in the down trend whereas the housing price index is uptrending with time
group_by_year <- state_hpi %>%
group_by(year) %>%
top_n(1, price_index) %>%
ungroup() %>%
arrange(year)
group_by_year## # A tibble: 44 x 8
## year month state price_index us_avg date price_reindex us_reAvg_hpi
## <dbl> <dbl> <chr> <dbl> <dbl> <date> <dbl> <dbl>
## 1 1975 11 WV 48.6 24.8 1975-11-01 120. 106.
## 2 1976 12 ND 49.1 27.2 1976-12-01 122. 116.
## 3 1977 12 WV 53.0 31.2 1977-12-01 130. 133.
## 4 1978 12 WV 59.3 35.4 1978-12-01 146. 151.
## 5 1979 12 OK 67.0 39.9 1979-12-01 179. 170.
## 6 1980 11 OK 71.2 42.8 1980-11-01 190. 182.
## 7 1981 12 OK 77.4 44.7 1981-12-01 206. 190.
## 8 1982 10 OK 82.6 45.0 1982-10-01 220. 192.
## 9 1983 12 AK 85.8 47.8 1983-12-01 248. 204.
## 10 1984 1 AK 85.8 48.0 1984-01-01 248. 204.
## # ... with 34 more rows
##plot the graph of the group by year for HPI
This chart answers the question of: 2) Which state and county in the US has highest growth in housing price Index year over year?
p1 <- group_by_year %>%
ggplot(aes(x= year, y = price_index, color = state))+
geom_point(size = 0.5) +
ggtitle("top state with Highest Housing Price Index by each year since 1975")
ggplotly(p1)= WE can see the top state with highest housing price index each year.
Hence, These answers all of the questions i was trying to answer with this project about houseing price index, mortgage rate and recession.