UMath 531T Project Edits

Libraries Used

library(ggplot2)
library(tseries)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(forecast)
library(aTSA)

## 
## Attaching package: 'aTSA'

## The following object is masked from 'package:forecast':
## 
##     forecast

## The following objects are masked from 'package:tseries':
## 
##     adf.test, kpss.test, pp.test

## The following object is masked from 'package:graphics':
## 
##     identify

Intro:

Many time series exhibit some form of periodic behavior. Detecting these periods visually can be difficult. We instead can perform a spectral analysis on our time series. In other words, we will approximate the spectral density as a function of frequency/period, and the peaks in our spectral density graph (otherwise known as the periodogram) correspond to the most dominant periods of our time series.

Data Analysis: Home Deport Quarterly Earnings per Share

Our data set includes Home Depot’s (NYSE: HD) quarterly earnings per share (EPS) from Q1 1999 to Q2 2021.

NOTE: Spectrum Analysis requires a stationary time series. Thus, the following steps will be completed before any spectral analysis.

1.) Import data
2.) Apply variance-stabilization (if necessary)
3.) Apply differencing to remove trend (if necessary)

Step 1.) Import Data

HD.dat = read.csv('HD.txt', header=FALSE)
hd.eps = as.numeric(substr(HD.dat$V2,7, 11))
hd.eps.rev = rev(hd.eps)
hd.eps.mod = hd.eps.rev + -1*min(hd.eps.rev) + 0.001 #so that we can use variance stabilization BoxCox()
hd.ts <- ts(hd.eps.mod, start=c(1999, 1),frequency=4)
ggtsdisplay(hd.ts)

Per above time series plot, we will need to apply a variance-stabilization transformation. Furthermore, by the ACF we could hypothesize that we have periodic data (lag 4)

Step 2.) Apply a variance-stabilization transformation

We apply a variance-stabilization transformation below.

lambda.hd = BoxCox.lambda(hd.ts)
hd.ts.BC = BoxCox(hd.ts, lambda=lambda.hd)
ggtsdisplay(hd.ts.BC)

Step 3.) Remove trend and Check Stationarity

We will attempt to remove a possibly linear trend by take the 1st differences.

hd.ts.BC.diff = diff(hd.ts.BC, differences=1)
ggtsdisplay(hd.ts.BC.diff)

Our BoxCox transformed + detrended data now looks like above time series. It appears to be just white noise. We will test if the data is stationary.

detrend.hd.ts = hd.ts.BC.diff
adf.test(detrend.hd.ts)

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -10.97    0.01
## [2,]   1 -16.42    0.01
## [3,]   2 -12.07    0.01
## [4,]   3  -4.81    0.01
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -11.0    0.01
## [2,]   1 -17.0    0.01
## [3,]   2 -13.5    0.01
## [4,]   3  -5.8    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -11.00    0.01
## [2,]   1 -17.19    0.01
## [3,]   2 -14.32    0.01
## [4,]   3  -6.46    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

kpss.test(detrend.hd.ts)

## KPSS Unit Root Test 
## alternative: nonstationary 
##  
## Type 1: no drift no trend 
##  lag  stat p.value
##    2 0.466     0.1
## ----- 
##  Type 2: with drift no trend 
##  lag  stat p.value
##    2 0.173     0.1
## ----- 
##  Type 1: with drift and trend 
##  lag   stat p.value
##    2 0.0425     0.1
## ----------- 
## Note: p.value = 0.01 means p.value <= 0.01 
##     : p.value = 0.10 means p.value >= 0.10

Our data is stationary (just barely).

Spectral Analysis via spectrum() function

We are now ready to perform a spectral analysis on our data.

By default R’s spectrum() function will return the periodogram graph as a function of frequency.

Let \(f\) by frequency and \(T\) be period.

Then we have the following relations:

\[ f = \frac{1}{T}\\ T = \frac{1}{f} \] Let us carefully analyze and what this all means. Each data point in our Home Depot data set is collected every 3 months. Suppose we were looking for annual periodicity. This would mean we would need a period of 4 (because 4 quarters make a 1 year annual). A period \(T=4\) would mean frequency \(f=0.25\).

Let’s look at our peak at frequency \(f=1\). This means we have a period of \(T=1\). This means, since our data set is collected 1 data point every 3 months, that we have a quarterly (or every 3 months) component in our data set. This suggests that the Home Depot earnings per share follows a quarterly component.

spec.hd = spectrum(detrend.hd.ts,span=2, log='no')

Alternatively we can pull some information out of our spectrum object and reconfigure our graph so that the spectral density is a function of period (in terms of our time variable months).

delta = 1/3 #one observation every 3 months
specx = spec.hd$freq/delta
specy = 2*spec.hd$spec
plot(specx, specy, xlab="Period (months)", ylab="Spectral Density", type='l')

Data Analysis: House Price Index

Purchase-Only Index (Estimated Using Sales Price Data) for th 100 Largest Metropolitan Areas. Our analysis will focus on the San Diego-Chula Vista-Carlsbad area, otherwise known as west San Diego County (west of I-15). The data set includes quarterly data from Q1 1991 to most recent Q1 2021.

Web page: https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index-Datasets.aspx#mpo

As before we will attempt to

1.) Import Data
2.) Apply Variance Stabilization (if necessary)
3.) Apply differencing to de-trend (if necessary)
4.) Spectral Analysis to determine the most significant period lengths (if any)

Step 1.) Import Data and Visualize

hpi.dat = read.csv('HPImetro.csv', header=TRUE)
hpi.sd = hpi.dat[hpi.dat$metro_name=='San Diego-Chula Vista-Carlsbad, CA',]
sd.ts <- ts(hpi.sd$index_nsa, start=c(1991, 1),frequency=4)
ggtsdisplay(sd.ts)

Visually there appears to be an increasing global trend. Thus, differencing will most likely be necessary to remove it. There does not appear to be a need for variance-stabilization.

Step 2.) Variance Stabilization (not necessary)

Step 3.) Differencing and check for stationarity

We will attempt to remove the linear trend by 1st order differencing.

sd.diff.ts = diff(sd.ts, differences=1)
ggtsdisplay(sd.diff.ts)

adf.test(sd.diff.ts)

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag   ADF p.value
## [1,]   0 -3.93  0.0100
## [2,]   1 -3.32  0.0100
## [3,]   2 -1.67  0.0918
## [4,]   3 -1.43  0.1651
## [5,]   4 -1.93  0.0524
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -4.38   0.010
## [2,]   1 -3.78   0.010
## [3,]   2 -2.06   0.305
## [4,]   3 -1.83   0.397
## [5,]   4 -2.38   0.182
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -4.56  0.0100
## [2,]   1 -3.97  0.0135
## [3,]   2 -2.24  0.4735
## [4,]   3 -2.00  0.5738
## [5,]   4 -2.53  0.3510
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

kpss.test(sd.diff.ts)

## KPSS Unit Root Test 
## alternative: nonstationary 
##  
## Type 1: no drift no trend 
##  lag  stat p.value
##    2 0.668     0.1
## ----- 
##  Type 2: with drift no trend 
##  lag  stat p.value
##    2 0.264     0.1
## ----- 
##  Type 1: with drift and trend 
##  lag  stat p.value
##    2 0.141   0.059
## ----------- 
## Note: p.value = 0.01 means p.value <= 0.01 
##     : p.value = 0.10 means p.value >= 0.10

We have conflicting test conclusions about the stationarity of the series. Regardless, let us move onto the spectral analysis.

Step 4.) Spectral Analysis

spec.sd = spectrum(sd.diff.ts, span=2, log='no')

delta = 1/3 #one observation every 3 months
specx = spec.sd$freq/delta
specy = 2*spec.sd$spec
plot(specx, specy, xlab="Period (months)", ylab="Spectral Density", main='SD Price Index', type='l')

max.freq = specx[which.max(specy)]
max.freq

## [1] 0.2

max.period = 1/max.freq
max.period #every 3.75 months

## [1] 5

There appears to be two significant peaks.

Let us again take the time to analyze what this graph means. We have a maximum peak at frequency \(f=0.26666\). This implies that \(T=3.75\). This means that we have an almost annual component (11.25 months) in the SD home price index data set.

A second significant peak occurs at \(f=1.0\). This implies that \(T=1.0\), and so a period of 1 implies that we have a quarterly (or every 3 months) component to the SD home price index data set.

decomp.sd = decompose(sd.ts)
plot(decomp.sd)

In conclusion, what was once difficult to see visually in the original time series plot, we can now see that there is an annual component to our SD home price index.