First, we load the requisite packages.
library(forecast)
library(ggplot2)
The gold, woolyrnq, and gas series are each datasets contained within the forecast package. The gold dataset contains daily morning gold prices in US dollars from 1/1/1985 to 3/31/1989. The woolyrnq dataset contains quarterly woollen yarn production in Australia from 2Q 1965 to 3Q 1994. Finally, the gas dataset contains Australian monthly gas production from 1956 to 1995.
autoplot(gold) +
ggtitle("Daily morning gold prices") +
xlab("Days since 1/1/1985") +
ylab("US dollars")
autoplot(woolyrnq) +
ggtitle("Quarterly woollen yarn production in Australia") +
xlab("Year") +
ylab("Tonnes")
autoplot(gas) +
ggtitle("Australian monthly gas production") +
xlab("Year") +
ylab("Unknown units")
frequency(gold)
## [1] 1
Gold - daily (a frequency of 1 means no seasonal period is set for the series)
frequency(woolyrnq)
## [1] 4
Woolyrnq - quarterly
frequency(gas)
## [1] 12
Gas - monthly
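The value returned by frequency() is the number of observations per seasonal period stored with a ts object. A minimal sketch with made-up numbers (these toy vectors are illustrative and are not the forecast datasets):

```r
# Toy data: the values are arbitrary; only the ts attributes matter here.
quarterly <- ts(1:8, start = c(1965, 1), frequency = 4)
monthly   <- ts(1:24, start = c(1956, 1), frequency = 12)
plain     <- ts(1:10)  # no frequency supplied, so it defaults to 1

frequency(quarterly)  # 4
frequency(monthly)    # 12
frequency(plain)      # 1
```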
which.max(gold)
## [1] 770
which.max() returns 770, which corresponds to the single large upward spike in prices. The value is the position of the observation within the series; if each observation is taken as one calendar day starting 1/1/1985, that puts the spike in Feb. 1987.
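The index-to-date step can be sketched with base R date arithmetic. This assumes observation 1 falls on 1 January 1985 and that every calendar day is present; if the series skips weekends or holidays, the actual date would be later, so the result is only a rough guide.

```r
# Assumption: observation 1 is 1985-01-01 and no calendar days are skipped.
start_date <- as.Date("1985-01-01")
peak_index <- 770

# Observation 1 is the start date itself, so add (index - 1) days.
peak_date <- start_date + (peak_index - 1)
format(peak_date, "%Y-%m-%d")  # "1987-02-09"
```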
gold[770]
## [1] 593.7
A cursory Google search revealed no details about such a spike. We may want to check for the possibility of a data-entry error.
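One quick way to judge whether a spike looks like a data-entry error is to compare it with its immediate neighbours. The helper below, peak_window(), is a hypothetical name introduced for illustration, and the toy prices are made up; only the spike value 593.7 is borrowed from the output above.

```r
# Hypothetical helper: return the values within +/- k observations of the
# maximum, named by their position in the series.
peak_window <- function(x, k = 3) {
  x <- as.numeric(x)
  i <- which.max(x)
  idx <- max(1, i - k):min(length(x), i + k)
  setNames(x[idx], idx)
}

# Made-up prices with one suspicious spike at position 5.
toy <- c(400, 402, 401, 403, 593.7, 404, 402, 405)
peak_window(toy, k = 2)
```

With the real data, peak_window(gold) would show whether prices on either side of observation 770 sit far below the peak, which would support the data-entry explanation.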
[Manually downloaded to R working directory]
You can read the data into R with the following script:
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
The second argument (skip=1) is required because the Excel sheet has two header rows.
Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349398A"],
frequency=12, start=c(1982,4))
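To see what frequency=12 and start=c(1982,4) do, here is a small sketch on a toy series. The 24 values are arbitrary stand-ins, not the retail data; the point is how the time index is built.

```r
# Toy series: 24 arbitrary values standing in for a retail column.
toy <- ts(seq_len(24), start = c(1982, 4), frequency = 12)

start(toy)        # 1982 4, i.e. April 1982
end(toy)          # 1984 3, i.e. March 1984
cycle(toy)[1:3]   # 4 5 6: month of year for the first three observations

# window() subsets by time rather than by position.
first_year <- window(toy, end = c(1982, 12))
length(first_year)  # 9 observations, April through December 1982
```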
Explore the chosen series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
autoplot(myts)
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts)
ggAcf(myts)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
Starting with the autoplot, we see an obvious upward trend along with a clear seasonal pattern whose swings grow in absolute size over time. The seasonal plot shows that the highest monthly sales generally occur in December and the biggest dip arrives in February; the seasonal subseries plot confirms this.
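The visual reading of the seasonal plots can be cross-checked numerically by averaging detrended values by calendar month. The sketch below uses a made-up series with a December bump and a February dip built in, so it only demonstrates the technique; the straight-line detrending is a crude assumption chosen for brevity.

```r
# Toy monthly series (made-up values) starting in April 1982.
n <- 120
month <- as.integer(cycle(ts(seq_len(n), start = c(1982, 4), frequency = 12)))
values <- 100 + seq_len(n) + 50 * (month == 12) - 30 * (month == 2)
toy <- ts(values, start = c(1982, 4), frequency = 12)

# Remove the upward drift with a straight-line fit, then average the
# residuals by calendar month.
t_idx <- seq_len(n)
detrended <- residuals(lm(values ~ t_idx))
monthly_mean <- tapply(detrended, month, mean)

names(which.max(monthly_mean))  # "12" (December)
names(which.min(monthly_mean))  # "2"  (February)
```

Applied to the real series, the same tapply() call over cycle(myts) would give the months with the highest and lowest average sales.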
Neither the lag plot nor the autocorrelation plot adds much that the earlier visualizations did not already convey. Both are consistent with sales that increase over time, with seasonal variation that’s likely tied to holiday shopping.
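For reference, seasonality shows up in an autocorrelation function as a peak at the seasonal lag. A minimal sketch with base R's acf() on a synthetic monthly series (a linear trend plus a 12-month sine wave, with no noise so the result is reproducible); the series and its coefficients are assumptions chosen for illustration:

```r
# Synthetic monthly series: mild linear trend plus a 12-month sine wave.
n <- 120
t_idx <- seq_len(n)
toy <- ts(0.2 * t_idx + 10 * sin(2 * pi * t_idx / 12), frequency = 12)

# For a frequency-12 series acf() reports lags in years, so convert them
# back to months; plot = FALSE returns the values instead of drawing.
r <- acf(toy, lag.max = 24, plot = FALSE)
lag_in_months <- round(r$lag[, 1, 1] * 12)
r_vals <- r$acf[, 1, 1]

r12 <- r_vals[lag_in_months == 12]  # autocorrelation at the seasonal lag
r6  <- r_vals[lag_in_months == 6]   # half-season lag, for comparison
r12 > r6
```

The autocorrelation at lag 12 exceeds the one at lag 6 here, which is the same signature ggAcf() displays as spikes at multiples of 12.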