R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(tidyverse) 
## Warning: package 'ggplot2' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate) 
library(ggplot2)
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.3.2
## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.2
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tsibbledata 0.4.1     ✔ fable       0.3.3
## ✔ feasts      0.3.1     ✔ fabletools  0.3.4
## Warning: package 'tsibbledata' was built under R version 4.3.2
## Warning: package 'feasts' was built under R version 4.3.2
## Warning: package 'fabletools' was built under R version 4.3.2
## Warning: package 'fable' was built under R version 4.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
homes <- read.csv("D:/DataSet/Homes.csv")

Converting Year to Date

The year_built column contains the year each home was built. We convert it to a Date column, using January 1 of that year as a placeholder date, to enable time series analysis.

homes <- homes %>% 
  mutate(year_built = paste0(year_built, "-01-01")) %>%
  mutate(year_built = as.Date(year_built, format = "%Y-%m-%d")) %>%
  mutate(id = row_number())
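
As a minimal sketch of what this conversion produces, the base R snippet below applies the same steps to a few made-up build years (the `years` vector is hypothetical, not taken from Homes.csv). It also shows that a Date is stored as a count of days since 1970-01-01, which matters later when a Date is used as a regression predictor.

```r
# Hypothetical build years standing in for the year_built column.
years <- c(1970L, 1995L, 2017L)

# Same conversion as above: append "-01-01" and parse as a Date.
built <- as.Date(paste0(years, "-01-01"), format = "%Y-%m-%d")

# Under the hood, a Date is the number of days since 1970-01-01.
days_since_epoch <- as.numeric(built)

# The year is recoverable; the month and day are only placeholders.
recovered_year <- as.integer(format(built, "%Y"))

print(data.frame(years, built, days_since_epoch, recovered_year))
```

Only the year carries information here, so any analysis that depends on month or day would be reading the placeholder rather than the data.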

Price over Time

Next, we analyze the price of homes over time.

First, we create a tsibble containing only id, year_built, and price, with year_built as the index and id as the key:

homes_ts <- homes %>%
  select(id, year_built, price) %>%
  as_tsibble(index = year_built, key = id)
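
The key is needed because a tsibble requires each combination of key and index to identify exactly one row, and several homes can share a build year. The sketch below illustrates the idea with base R only, on hypothetical toy data rather than Homes.csv; it checks for duplicate rows instead of calling as_tsibble() itself.

```r
# Hypothetical toy data: two homes share the same build year.
toy <- data.frame(
  year_built = as.Date(c("1990-01-01", "1990-01-01", "2005-01-01")),
  price      = c(250000, 310000, 420000)
)

# With the index alone, rows are not uniquely identified.
dup_without_key <- anyDuplicated(toy["year_built"]) > 0

# Adding a row id as the key makes every (id, year_built) pair unique.
toy$id <- seq_len(nrow(toy))
dup_with_key <- anyDuplicated(toy[c("id", "year_built")]) > 0

print(c(without_key = dup_without_key, with_key = dup_with_key))
```

One consequence of this design is that each key then holds a single observation, so the tsibble is a collection of one-point series rather than one continuous price series.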

Plotting the full time series shows an overall increasing trend in price over time:

ggplot(data = homes_ts, mapping = aes(x = year_built, y = price)) + 
  geom_line()

Looking at a shorter, more recent window (homes built after 2000), we see prices spiked around 2015-2017 before dropping again:

ggplot(data = filter(homes_ts, year_built > "2000-01-01"), mapping = aes(x = year_built, y = price)) +
  geom_line()
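
One detail of this filter is worth a small illustration: since every build year is coded as January 1, a strict `>` comparison against 2000-01-01 drops homes built in 2000 itself. The sketch below uses base R subsetting on hypothetical toy data (the `toy` data frame and `cutoff` name are assumptions for illustration) to show the difference between `>` and `>=`.

```r
# Hypothetical toy data: every build year is coded as January 1.
toy <- data.frame(
  year_built = as.Date(c("1999-01-01", "2000-01-01", "2001-01-01", "2017-01-01")),
  price      = c(300000, 320000, 350000, 900000)
)

# Comparing against an explicit Date avoids relying on string coercion.
cutoff <- as.Date("2000-01-01")

# Strict ">" excludes homes built in 2000, matching the comparison used above.
after_2000 <- toy[toy$year_built > cutoff, ]

# ">=" keeps them.
from_2000 <- toy[toy$year_built >= cutoff, ]

print(nrow(after_2000))
print(nrow(from_2000))
```

Whether homes built in 2000 belong in the recent window is a judgment call; the point is only that the choice of operator decides it.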

Linear Trends

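We will use linear regression to quantify the upward trend in prices over time.

Fitting a model to the full data shows a statistically significant upward trend. Because year_built is a Date, the slope is measured per day: 36.01 per day, or roughly $13,000 per year (36.01 times 365.25). The fit is weak, however, with an R-squared of about 0.036: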

lm_model <- lm(price ~ year_built, data = homes_ts)
summary(lm_model)
## 
## Call:
## lm(formula = price ~ year_built, data = homes_ts)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2371065 -1227880  -730533    44727 25862096 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.164e+06  1.296e+05   16.70  < 2e-16 ***
## year_built  3.601e+01  8.452e+00    4.26 2.45e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2776000 on 490 degrees of freedom
## Multiple R-squared:  0.03572,    Adjusted R-squared:  0.03375 
## F-statistic: 18.15 on 1 and 490 DF,  p-value: 2.448e-05
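
When lm() is given a Date predictor, the slope is expressed per day, because Dates are stored as day counts. The sketch below shows this on hypothetical toy data (not Homes.csv) in which price rises by exactly 1,000 per year, then applies the same per-day to per-year conversion to the slope reported in the summary above. The names `toy`, `slope_per_day`, `slope_per_year`, and `annual_full` are illustrative assumptions.

```r
# Hypothetical toy data: price rises by exactly 1000 per build year.
toy <- data.frame(year = 2000:2010)
toy$price <- 50000 + 1000 * (toy$year - 2000)
toy$built <- as.Date(paste0(toy$year, "-01-01"))

# Regressing on the Date gives a slope in price units per day.
slope_per_day <- unname(coef(lm(price ~ built, data = toy))["built"])

# Regressing on the numeric year gives a slope per year directly.
slope_per_year <- unname(coef(lm(price ~ year, data = toy))["year"])

# Multiplying the daily slope by 365.25 approximately recovers the yearly slope
# (approximately, because leap years make calendar years unequal in length).
print(c(per_day_annualized = slope_per_day * 365.25, per_year = slope_per_year))

# The same conversion applied to the year_built estimate in the summary above.
annual_full <- 36.01 * 365.25
print(annual_full)
```

An alternative design is to regress price on the numeric year instead of the Date, which makes the coefficient readable without any conversion.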

The recent upward trend is much steeper. For homes built after 2000 (through 2017), the slope of 965.1 per day corresponds to roughly $350,000 per year:

lm_model_recent <- lm(price ~ year_built, data = filter(homes_ts, year_built > "2000-01-01"))
summary(lm_model_recent)
## 
## Call:
## lm(formula = price ~ year_built, data = filter(homes_ts, year_built > 
##     "2000-01-01"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4198657 -1825921  -554914   542336 13710081 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.076e+07  2.537e+06  -4.243 4.26e-05 ***
## year_built   9.651e+02  1.737e+02   5.555 1.59e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3361000 on 125 degrees of freedom
## Multiple R-squared:  0.198,  Adjusted R-squared:  0.1916 
## F-statistic: 30.86 on 1 and 125 DF,  p-value: 1.593e-07

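In summary, home prices in this data increase with year built, with a much steeper upward trend for homes built from 2000-2017. Both fits explain only a small share of the variation in price (R-squared of about 0.04 and 0.20), so the trend estimates should be read with caution. Because every date is set to January 1 of the build year, the data contain no within-year information, so seasonality cannot be assessed from this analysis.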