DATA 624 Homework 1
# removed echo = FALSE above to enable collapsible code chunks
library(knitr)
library(rmdformats)
## Global options
options(max.print="31")
# opts_chunk$set(echo=FALSE,
# cache=TRUE,
# prompt=FALSE,
# tidy=TRUE,
# comment=NA,
# message=FALSE,
# warning=FALSE)
opts_knit$set(width=31)
library(forecast)
Exercise 2.10.1
Use the help function to explore what the series gold, woolyrnq and gas represent.
gold is the daily morning gold prices in US dollars from 1 January 1985 to 31 March 1989.
woolyrnq is the quarterly production of woolen yarn in Australia in tons from March 1965 to September 1994.
gas is monthly gas production in Australia from 1956-1995.
a) Use autoplot() to plot each of these in separate plots.
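The plotting code is not shown in this write-up. A minimal sketch, assuming the fpp2 package is installed (it attaches forecast and ggplot2, which provide the three series and autoplot()), might look like this; the titles and axis labels are illustrative choices only:

```r
# Assumes the fpp2 package is installed; it attaches forecast and ggplot2.
library(fpp2)

# One plot object per series. Titles and axis labels are illustrative.
p_gold <- autoplot(gold) + ggtitle("Daily morning gold prices") + ylab("US dollars")
p_wool <- autoplot(woolyrnq) + ggtitle("Quarterly woollen yarn production") + ylab("Tons")
p_gas  <- autoplot(gas) + ggtitle("Monthly gas production") + ylab("Production")

# Draw each plot separately.
print(p_gold)
print(p_wool)
print(p_gas)
```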
b) What is the frequency of each series? Hint: apply the frequency() function.
## [1] 1
## [1] 4
## [1] 12
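gold has a frequency of 1, which means no seasonal period is recorded for it (the observations are daily, not annual); woolyrnq has a frequency of 4 (quarterly) and gas has a frequency of 12 (monthly).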
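The three values printed above can be reproduced with a sketch like the following, again assuming the fpp2 package is installed:

```r
# Assumes the fpp2 package is installed.
library(fpp2)

# frequency() returns the number of observations per seasonal period
# stored in each ts object.
freqs <- c(
  gold     = frequency(gold),
  woolyrnq = frequency(woolyrnq),
  gas      = frequency(gas)
)
print(freqs)
```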
Exercise 2.10.2
a) You can read the data into R with the following script:
b) Convert the data to time series
c) Construct time series plots of each of the three series
When facets is not set to TRUE, all of the series are drawn in a single panel on a shared y-axis instead of being split into separate panels, each with a y-axis range suited to its own series.
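To illustrate the effect of facets, here is a hedged sketch that uses made-up quarterly data rather than the exercise file. The data frame `demo`, its column names, and the start date are all hypothetical; only the ts() and autoplot() calls mirror the steps described above.

```r
# Assumes the fpp2 package is installed.
library(fpp2)

# Hypothetical stand-in for the exercise data: three quarterly columns
# on very different scales.
set.seed(1)
demo <- data.frame(
  series_a = 1000 + cumsum(rnorm(40, mean = 5, sd = 10)),
  series_b = 500 + cumsum(rnorm(40, mean = 2, sd = 5)),
  series_c = 280 + cumsum(rnorm(40, mean = 0.2, sd = 1))
)

# Convert to a quarterly multivariate time series (start date is assumed).
mytimeseries <- ts(demo, start = c(2000, 1), frequency = 4)

# Default: every series shares one panel and one y-axis.
p_single <- autoplot(mytimeseries)

# facets = TRUE: one panel per series, each with its own y-axis range.
p_facets <- autoplot(mytimeseries, facets = TRUE)

print(p_single)
print(p_facets)
```

On the shared axis the small-scale series looks almost flat; the faceted version makes its movement visible.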
Exercise 2.10.3
a) You can read the data into R with the following script:
b) Select one of the time series as follows (but replace the column name with your own chosen column):
c) Explore your chosen retail time series using the following functions:
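The code for these steps is not included here. The sketch below shows the general shape under stated assumptions: `retail_demo` and its column `store_a` are hypothetical synthetic data standing in for the retail spreadsheet, and the start date is invented. The plotting functions are the ones the exercise names.

```r
# Assumes the fpp2 package is installed.
library(fpp2)

# Hypothetical monthly data: upward trend, a November/December lift, and noise.
set.seed(42)
n_months <- 120
month_index <- rep(1:12, length.out = n_months)
retail_demo <- data.frame(
  store_a = 100 + 0.8 * seq_len(n_months) +
    ifelse(month_index >= 11, 30, 0) +
    rnorm(n_months, sd = 5)
)

# Select one column and convert it to a monthly series (start date is assumed).
myts <- ts(retail_demo[, "store_a"], start = c(2005, 1), frequency = 12)

# The five exploratory plots named in the exercise.
autoplot(myts)
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts)
ggAcf(myts)
```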
Results & Learnings
Seasonal Plot
The seasonal plot confirms the seasonality and upwards trend found in the time plot.
Seasonal Subseries Plot
The subseries plot confirms the upward trend and shows that average retail sales are noticeably higher in November and December.
Time Series Lag Plot
The lag plot shows the strongest linear relationship at lag 12, confirming that the data has annual seasonality. Most of the other lags also show a positive linear relationship.
ACF Plot
The autocorrelation plot has peaks at \(r_{12}\) and \(r_{24}\), which tells us that the peaks are 12 months apart and that the series follows an annual seasonal pattern. Far more than 5% of the spikes lie outside the dashed blue lines, so the series is not white noise.
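The white-noise check can also be done numerically. The sketch below uses only base R and synthetic data; the helper `share_outside()` is a hypothetical name, and the bound used is the usual 95% limit of approximately \(\pm 1.96/\sqrt{T}\), where \(T\) is the series length.

```r
# Base R only; both series are synthetic examples.
set.seed(123)
n <- 240
white_noise <- ts(rnorm(n), frequency = 12)
seasonal <- ts(10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n), frequency = 12)

# Hypothetical helper: share of autocorrelations (lags 1..lag_max)
# that fall outside the approximate 95% white-noise bounds.
share_outside <- function(x, lag_max = 24) {
  r <- acf(x, lag.max = lag_max, plot = FALSE)$acf[-1]  # drop lag 0
  bound <- qnorm(0.975) / sqrt(length(x))
  mean(abs(r) > bound)
}

print(share_outside(white_noise))
print(share_outside(seasonal))
```

For white noise roughly 5% of spikes are expected to fall outside the bounds; a share well above that is evidence against white noise.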
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
- There is clear seasonality. Sales rise in November and December, the holiday shopping season around Thanksgiving, Black Friday, Cyber Monday, Christmas and New Year.
- No cyclicity is detected.
- Over the long term there is an upward trend in retail sales, although one could argue that the trend was not apparent between 1998 and 2011, when sales stayed relatively flat.
Exercise 2.10.6
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
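Since the same five functions are applied to every series, a small wrapper can reduce repetition. This is only a sketch: `explore_series()` is a hypothetical helper, not part of forecast or fpp2. It skips the seasonal plots when the series has no seasonal period, and skips the subseries plot when the frequency is not a whole number.

```r
# Assumes the fpp2 package is installed.
library(fpp2)

# Hypothetical helper: build the exploratory plots that apply to a series.
explore_series <- function(x) {
  plots <- list(
    time = autoplot(x),
    lag  = gglagplot(x),
    acf  = ggAcf(x)
  )
  f <- frequency(x)
  if (f > 1) {
    plots$season <- ggseasonplot(x)
    # Only attempt the subseries plot for whole-number frequencies.
    if (f == round(f)) {
      plots$subseries <- ggsubseriesplot(x)
    }
  }
  plots
}

hsales_plots  <- explore_series(hsales)       # monthly: all five plots
sunspot_plots <- explore_series(sunspotarea)  # annual: no seasonal plots

print(names(hsales_plots))
print(names(sunspot_plots))
```

Individual plots can then be drawn with, for example, `print(hsales_plots$season)`.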
hsales
Description
Monthly sales of new one-family houses sold in the USA since 1973.
Results & Learnings
Seasonal Plot
The seasonal plot confirms the seasonality. It also reveals a peak in March and trough in December.
Seasonal Subseries Plot
The monthly averages (blue horizontal bars) rise from December to the peak in March and then decline from March back to December.
Time Series Lag Plot
The lag plot shows the strongest linear relationship at lag 1. The relationship weakens as the lag increases, then becomes somewhat linear again at lag 12. This suggests that the series has some annual seasonality but depends mainly on the previous month's value.
ACF Plot
The autocorrelation plot shows \(r_{12}\) higher than its neighbouring lags because of the annual seasonal pattern in the data. \(r_{1}\) is higher still, which confirms that the previous month's value is indicative of the next. Far more than 5% of the spikes lie outside the dashed blue lines, so the series is not white noise.
Can you spot any seasonality, cyclicity and trend?
- Annual seasonality with a peak in March and a trough in December
- Cyclic pattern with a period of roughly 8 to 10 years
- No evidence of a trend
usdeaths
Description
Monthly accidental deaths in USA.
Results
Seasonal Subseries Plot
The subseries plot shows the monthly averages rising from February to July and falling from July back to February, with a roughly flat stretch between September and December. The peak is in July and the trough is in February.
Time Series Lag Plot
The lagplot has the strongest linear relationship at lag 12, confirming the annual seasonality present in the time series.
ACF Plot
The autocorrelation plot shows \(r_{12}\) higher than the other lags because of the annual seasonal pattern in the data. \(r_1\) is also high, which shows that the previous month's value is indicative of the next. \(r_6\) and \(r_{18}\) show strong negative correlations, which further confirm the annual seasonality, as does the peak at \(r_{24}\). The series is not white noise, because more than 5% of the spikes lie outside the dashed blue lines.
Can you spot any seasonality, cyclicity and trend?
- Annual seasonality with a peak in July and a trough in February
- No evidence of a trend.
- No evidence of cyclic pattern
bricksq
Description
Australian quarterly clay brick production: 1956–1994.
Results
Seasonal Subseries Plot
The subseries plot confirms the peak at Q3 and trough at Q1.
Time Series Lag Plot
The lag plot shows the strongest linear relationship at lag 1. At higher lags the points spread out more, particularly in the higher production range (400 to 600+).
ACF Plot
The autocorrelation plot has peaks at \(r_4\), \(r_8\) and \(r_{12}\) because of the annual seasonal pattern in the data. The highest value is \(r_1\), which confirms that the previous quarter's value is indicative of the next. All of the spikes lie above the dashed blue lines, so the series is not white noise.
Can you spot any seasonality, cyclicity and trend?
- Annual seasonality with a peak in Q3 and a trough in Q1
- Upward trend from 1956 to 1974, then roughly flat over the following 20 years
- Cyclic pattern with a period of roughly 8 years
sunspotarea
Description
Annual average sunspot area (1875-2015)
Results
Seasonal Subseries Plot
## Data are not seasonal
Time Series Lag Plot
The lag plot shows the strongest linear relationship at lag 10, followed by lag 11. This indicates that the data is cyclic, with a cycle of roughly 10 to 12 years.
ACF Plot
The autocorrelation plot has positive correlation peaks at \(r_1\), \(r_{10}\), \(r_{11}\), \(r_{21}\), \(r_{22}\) and \(r_{32}\). Negative correlation peaks occur at \(r_5\), \(r_{16}\) and \(r_{27}\). This confirms that each cycle lasts 10 to 11 years.
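Reading a cycle length off the ACF can be sketched in base R. The example below uses a synthetic annual series with an 11-year cycle, not the sunspot data itself; the series, the search window of lags 6 to 16, and the variable names are all assumptions made for illustration. The same steps could be tried on sunspotarea.

```r
# Base R only; synthetic annual series with an assumed 11-year cycle.
set.seed(7)
n_years <- 140
cycle <- ts(
  100 + 50 * sin(2 * pi * seq_len(n_years) / 11) + rnorm(n_years, sd = 5),
  start = 1875
)

# Autocorrelations at lags 1..40 (lag 0 dropped).
r <- acf(cycle, lag.max = 40, plot = FALSE)$acf[-1]

# Look for the first positive peak after the initial decay.
search_lags <- 6:16
peak_lag <- search_lags[which.max(r[search_lags])]
print(peak_lag)
```

The lag of the first positive peak estimates the cycle length, and the most negative lag before it sits at about half a cycle.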
Can you spot any seasonality, cyclicity and trend?
- No evidence of seasonality
- No evidence of a trend
- Cyclic pattern every 10-12 years
gasoline
Description
Weekly data beginning 2 February 1991, ending 20 January 2017. Units are “million barrels per day”.
Results
Time Series Plot
Increasing trend from 1994 to 2007. Decreasing trend from 2007 to 2012. Increasing trend from 2012 to 2016.
Seasonal Plot
The seasonal plot confirms the annual seasonality in the weekly data. It also reveals a peak in weeks 29 - 39 and a trough during weeks 5 - 11.
Seasonal Subseries Plot
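# gasoline has a non-integer weekly frequency, so rebuild it with frequency = 52 before calling ggsubseriesplot()
gasoline %>%
  as.vector() %>%
  ts(., frequency = 52) %>%
  ggsubseriesplot() + theme(axis.text = element_text(size = 4))
The plot shows the weekly averages rising from week 2 to week 30 and declining from week 30 back to week 2.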
Time Series Lag Plot
The lag plot shows the strongest linear relationship at lag 1, and the relationship weakens as the lag increases. This suggests that the series depends mainly on the previous week's value.
ACF Plot
The autocorrelation plot shows \(r_{52}\) and \(r_{104}\) higher than the neighbouring lags, and troughs at \(r_{26}\) and \(r_{78}\). Because there are approximately 52 weeks in a year, this pattern indicates annual seasonality.
Can you spot any seasonality, cyclicity and trend?
- Annual seasonality
- Generally upward trend from 1994 to 2007 and from 2012 to 2016, with a decreasing trend from 2007 to 2012
- No evidence of cyclic pattern