library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.3.2
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.2
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tsibbledata 0.4.1     ✔ fable       0.3.3
## ✔ feasts      0.3.1     ✔ fabletools  0.3.4
## Warning: package 'tsibbledata' was built under R version 4.3.2
## Warning: package 'feasts' was built under R version 4.3.2
## Warning: package 'fabletools' was built under R version 4.3.2
## Warning: package 'fable' was built under R version 4.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
homes <- read.csv("D:/DataSet/Homes.csv")
Converting Year to Date
The year_built column contains the year each home was built. We convert it to a Date column (1 January of that year) to enable time series analysis.
homes <- homes %>%
  mutate(year_built = paste0(year_built, "-01-01")) %>%
  mutate(year_built = as.Date(year_built, format = "%Y-%m-%d")) %>%
  mutate(id = row_number())
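The conversion above can also be sketched in base R. This is a minimal illustration on a made-up data frame; `toy_homes`, `built_date`, and all values are hypothetical and not taken from Homes.csv. It keeps the numeric year in its own column next to the new Date column.

```r
# Hypothetical stand-in for the homes data; values are illustrative only.
toy_homes <- data.frame(
  year_built = c(1995, 2003, 2016),
  price      = c(450000, 820000, 1250000)
)

# Build a Date for 1 January of each build year. ISO "YYYY-MM-DD" strings
# parse without an explicit format argument.
toy_homes$built_date <- as.Date(paste0(toy_homes$year_built, "-01-01"))

# Add a row id, as the pipeline above does with row_number().
toy_homes$id <- seq_len(nrow(toy_homes))

str(toy_homes)
```

Keeping the numeric year available is a design choice: it can be handy later when a per-year quantity is wanted directly.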
Price over Time
Next, we analyze the price of homes over time.
First, we create a tsibble containing only the id, year_built, and price columns, indexed by year_built and keyed by id:
homes_ts <- homes %>%
  select(id, year_built, price) %>%
  as_tsibble(index = year_built, key = id)
Plotting the full time series shows an overall increasing trend in price over time:
ggplot(data = homes_ts, mapping = aes(x = year_built, y = price)) +
  geom_line()
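Each home is a single observation, so a line drawn through the raw points can zigzag when several homes share a build year. One option, which is not part of the original analysis, is to summarize the price per build year before plotting. A sketch on hypothetical data (`toy_homes` and `yearly` are illustrative names):

```r
# Hypothetical data: several homes can share the same build year.
toy_homes <- data.frame(
  year_built = c(2001, 2001, 2005, 2005, 2005, 2010),
  price      = c(300000, 500000, 400000, 600000, 800000, 900000)
)

# Median price per build year; aggregate() returns one row per year,
# sorted by year_built.
yearly <- aggregate(price ~ year_built, data = toy_homes, FUN = median)

print(yearly)
```

The resulting `yearly` data frame has one price per year and could be passed to the same `ggplot()` call in place of the raw data.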
Restricting the plot to homes built after 2000, we see prices spiked around 2015-2017 before dropping again:
ggplot(data = filter(homes_ts, year_built > "2000-01-01"),
       mapping = aes(x = year_built, y = price)) +
  geom_line()
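Every home is dated 1 January of its build year, so a strict `>` comparison against "2000-01-01" drops homes built in 2000 itself. The sketch below uses made-up dates to show the difference between `>` and `>=`, with the cutoff written as an explicit `as.Date()` value (the names `built`, `cutoff`, `after_2000`, and `from_2000` are hypothetical):

```r
# Hypothetical build dates, all on 1 January as in the conversion step.
built  <- as.Date(c("1999-01-01", "2000-01-01", "2001-01-01", "2017-01-01"))
cutoff <- as.Date("2000-01-01")

after_2000 <- built[built > cutoff]   # excludes homes built in 2000
from_2000  <- built[built >= cutoff]  # includes homes built in 2000

format(after_2000)
format(from_2000)
```

Which boundary is appropriate depends on whether the year 2000 should count as part of the recent window.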
Linear Trends
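We use linear regression to quantify the upward trend in prices over time.
Fitting a model to the full data shows a statistically significant upward trend. Because year_built is a Date, the slope is in dollars per day: about $36 per day, or roughly $13,000 per year. The R-squared of 0.036 indicates that build year explains only a small share of the variation in price: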
lm_model <- lm(price ~ year_built, data = homes_ts)
summary(lm_model)
##
## Call:
## lm(formula = price ~ year_built, data = homes_ts)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2371065 -1227880  -730533    44727 25862096
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.164e+06  1.296e+05   16.70  < 2e-16 ***
## year_built  3.601e+01  8.452e+00    4.26 2.45e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2776000 on 490 degrees of freedom
## Multiple R-squared: 0.03572, Adjusted R-squared: 0.03375
## F-statistic: 18.15 on 1 and 490 DF, p-value: 2.448e-05
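R stores a Date as the number of days since 1970-01-01, so a slope fitted against a Date column is expressed per day. The sketch below uses hypothetical data with a built-in trend of 5000 per year (`toy` and the variable names are illustrative). It shows the per-day slope and two ways to get a per-year figure: rescaling by 365.25, or fitting against the numeric year.

```r
# Hypothetical data: price rises by exactly 5000 per build year.
years <- 2000:2010
toy <- data.frame(
  year  = years,
  built = as.Date(paste0(years, "-01-01")),
  price = 100000 + 5000 * (years - 2000)
)

# Slope against a Date column is per day.
per_day <- unname(coef(lm(price ~ built, data = toy))["built"])

# Approximate per-year slope by rescaling with the mean year length.
per_year_scaled <- per_day * 365.25

# Slope against the numeric year is per year directly.
per_year_direct <- unname(coef(lm(price ~ year, data = toy))["year"])

c(per_day = per_day, scaled = per_year_scaled, direct = per_year_direct)

# The same rescaling applied to the year_built estimate reported above:
3.601e+01 * 365.25
```

The last line rescales the reported estimate of 36.01 per day to roughly 13,150 per year.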
The recent upward trend is even steeper. For homes built after 2000 (through 2017), the slope of about $965 per day corresponds to roughly $352,000 per year:
lm_model_recent <- lm(price ~ year_built, data = filter(homes_ts, year_built > "2000-01-01"))
summary(lm_model_recent)
##
## Call:
## lm(formula = price ~ year_built, data = filter(homes_ts, year_built >
## "2000-01-01"))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -4198657 -1825921  -554914   542336 13710081
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.076e+07  2.537e+06  -4.243 4.26e-05 ***
## year_built   9.651e+02  1.737e+02   5.555 1.59e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3361000 on 125 degrees of freedom
## Multiple R-squared: 0.198, Adjusted R-squared: 0.1916
## F-statistic: 30.86 on 1 and 125 DF, p-value: 1.593e-07
In summary, home prices in this data have increased over time, with a steep upward trend from 2000-2017. The data record only the year each home was built, so seasonality within a year cannot be assessed here.