library(ggplot2)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(forecast)
library(aTSA)
##
## Attaching package: 'aTSA'
## The following object is masked from 'package:forecast':
##
## forecast
## The following objects are masked from 'package:tseries':
##
## adf.test, kpss.test, pp.test
## The following object is masked from 'package:graphics':
##
## identify
Many time series exhibit some form of periodic behavior. Detecting these periods visually can be difficult. We instead can perform a spectral analysis on our time series. In other words, we will approximate the spectral density as a function of frequency/period, and the peaks in our spectral density graph (otherwise known as the periodogram) correspond to the most dominant periods of our time series.
We are now ready to perform a spectral analysis on our data.
By default R’s spectrum() function will return the periodogram graph as a function of frequency.
Let \(f\) by frequency and \(T\) be period.
Then we have the following relations:
\[ f = \frac{1}{T}\\ T = \frac{1}{f} \] Let us carefully analyze and what this all means. Each data point in our Home Depot data set is collected every 3 months. Suppose we were looking for annual periodicity. This would mean we would need a period of 4 (because 4 quarters make a 1 year annual). A period \(T=4\) would mean frequency \(f=0.25\).
Let’s look at our peak at frequency \(f=1\). This means we have a period of \(T=1\). This means, since our data set is collected 1 data point every 3 months, that we have a quarterly (or every 3 months) component in our data set. This suggests that the Home Depot earnings per share follows a quarterly component.
spec.hd = spectrum(detrend.hd.ts,span=2, log='no')
Alternatively we can pull some information out of our spectrum object and reconfigure our graph so that the spectral density is a function of period (in terms of our time variable months).
delta = 1/3 #one observation every 3 months
specx = spec.hd$freq/delta
specy = 2*spec.hd$spec
plot(specx, specy, xlab="Period (months)", ylab="Spectral Density", type='l')
Purchase-Only Index (Estimated Using Sales Price Data) for th 100 Largest Metropolitan Areas. Our analysis will focus on the San Diego-Chula Vista-Carlsbad area, otherwise known as west San Diego County (west of I-15). The data set includes quarterly data from Q1 1991 to most recent Q1 2021.
Web page: https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index-Datasets.aspx#mpo
As before we will attempt to
1.) Import Data
2.) Apply Variance Stabilization (if necessary)
3.) Apply differencing to de-trend (if necessary)
4.) Spectral Analysis to determine the most significant period lengths (if any)
hpi.dat = read.csv('HPImetro.csv', header=TRUE)
hpi.sd = hpi.dat[hpi.dat$metro_name=='San Diego-Chula Vista-Carlsbad, CA',]
sd.ts <- ts(hpi.sd$index_nsa, start=c(1991, 1),frequency=4)
ggtsdisplay(sd.ts)
Visually there appears to be an increasing global trend. Thus, differencing will most likely be necessary to remove it. There does not appear to be a need for variance-stabilization.
We will attempt to remove the linear trend by 1st order differencing.
sd.diff.ts = diff(sd.ts, differences=1)
ggtsdisplay(sd.diff.ts)
adf.test(sd.diff.ts)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -3.93 0.0100
## [2,] 1 -3.32 0.0100
## [3,] 2 -1.67 0.0918
## [4,] 3 -1.43 0.1651
## [5,] 4 -1.93 0.0524
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -4.38 0.010
## [2,] 1 -3.78 0.010
## [3,] 2 -2.06 0.305
## [4,] 3 -1.83 0.397
## [5,] 4 -2.38 0.182
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -4.56 0.0100
## [2,] 1 -3.97 0.0135
## [3,] 2 -2.24 0.4735
## [4,] 3 -2.00 0.5738
## [5,] 4 -2.53 0.3510
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
kpss.test(sd.diff.ts)
## KPSS Unit Root Test
## alternative: nonstationary
##
## Type 1: no drift no trend
## lag stat p.value
## 2 0.668 0.1
## -----
## Type 2: with drift no trend
## lag stat p.value
## 2 0.264 0.1
## -----
## Type 1: with drift and trend
## lag stat p.value
## 2 0.141 0.059
## -----------
## Note: p.value = 0.01 means p.value <= 0.01
## : p.value = 0.10 means p.value >= 0.10
We have conflicting test conclusions about the stationarity of the series. Regardless, let us move onto the spectral analysis.
spec.sd = spectrum(sd.diff.ts, span=2, log='no')
delta = 1/3 #one observation every 3 months
specx = spec.sd$freq/delta
specy = 2*spec.sd$spec
plot(specx, specy, xlab="Period (months)", ylab="Spectral Density", main='SD Price Index', type='l')
max.freq = specx[which.max(specy)]
max.freq
## [1] 0.2
max.period = 1/max.freq
max.period #every 3.75 months
## [1] 5
There appears to be two significant peaks.
Let us again take the time to analyze what this graph means. We have a maximum peak at frequency \(f=0.26666\). This implies that \(T=3.75\). This means that we have an almost annual component (11.25 months) in the SD home price index data set.
A second significant peak occurs at \(f=1.0\). This implies that \(T=1.0\), and so a period of 1 implies that we have a quarterly (or every 3 months) component to the SD home price index data set.
decomp.sd = decompose(sd.ts)
plot(decomp.sd)
In conclusion, what was once difficult to see visually in the original time series plot, we can now see that there is an annual component to our SD home price index.