library(fpp2)
library(kableExtra)

Exercise 2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

#help('gold') # Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.
#help("woolyrnq") # Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
#help('gas') # Australian monthly gas production: 1956–1995.
  1. Use autoplot() to plot each of these in separate plots.
autoplot(gold) + ggtitle("Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989") 

autoplot(woolyrnq) + ggtitle("Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994")

autoplot(gas) + ggtitle("Australian monthly gas production: 1956–1995")

  1. What is the frequency of each series? Hint: apply the frequency() function.
frequency(gold)
## [1] 1

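A frequency of 1 means no seasonal period is defined for the series. The observations themselves are daily, not annual.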

frequency(woolyrnq)
## [1] 4

Quarterly

frequency(gas)
## [1] 12

Monthly
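The value returned by frequency() is the number of observations per seasonal period declared when the ts object was created, so it reflects how the series was stored rather than how often it was sampled. A minimal sketch with made-up numbers (the vector `x` and the object names are assumptions for illustration, not part of the exercise data):

```r
# Made-up values; only the declared frequency differs between the objects
x <- c(5, 7, 6, 8, 9, 7, 10, 12, 11, 13, 12, 14)

no_season <- ts(x)                                    # default: frequency = 1
quarterly <- ts(x, start = c(2000, 1), frequency = 4)
monthly   <- ts(x, start = c(2000, 1), frequency = 12)

frequency(no_season)  # 1
frequency(quarterly)  # 4
frequency(monthly)    # 12
```

The same twelve numbers give three different answers, which is why a daily series stored without a seasonal period reports 1.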

  1. Use which.max() to spot the outlier in the gold series.
which.max(gold)
## [1] 770

Which observation was it?

gold[which.max(gold)]
## [1] 593.7
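which.max() returns a position, not a value or a date. Indexing the series with that position gives the value, and indexing time() with it gives the corresponding time stamp. A small sketch on a made-up annual series (the values and the name `x` are assumptions for illustration):

```r
# Made-up annual series with one missing value and one obvious spike
x <- ts(c(310, NA, 480, 330, 318), start = 2001)

i <- which.max(x)   # position of the largest value; missing values are skipped
i                   # 3
x[i]                # 480
time(x)[i]          # 2003
```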

Exercise 2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. Read in data.
tute1 <- read.csv('tute1.csv')
  1. Convert to Time Series
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
  1. Construct time series plots of each of the three series
autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facets=TRUE

autoplot(mytimeseries, facets=FALSE)

Without facets, all three series are drawn on a single set of axes with a shared scale. This makes it hard to see the detail in each individual series.
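The conversion step can be checked without the CSV file. The sketch below uses a hypothetical stand-in data frame (`df`, with made-up values) that has the same shape as the exercise data: a label column followed by numeric series.

```r
# Hypothetical stand-in for the CSV: a label column plus two numeric series
df <- data.frame(
  Quarter  = c("Q1", "Q2", "Q3", "Q4", "Q1", "Q2"),
  Sales    = c(10, 12, 9, 14, 11, 13),
  AdBudget = c(4, 5, 3, 6, 4, 5)
)

# Drop the first column, then declare quarterly data starting in 1981 Q1
y <- ts(df[, -1], start = 1981, frequency = 4)

start(y)      # 1981 1
end(y)        # 1982 2
colnames(y)   # "Sales" "AdBudget"
```

Checking start() and end() is a quick way to confirm that the start date and frequency line up with the number of rows read in.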

Exercise 2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

  1. Read in data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
  1. Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349399C"], frequency=12, start=c(1982,4)) #clothing
  1. Explore your chosen retail time series using the following functions:

autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

autoplot(myts) + xlab('Year') + ylab('Turnover: Clothing')

The plot shows an increasing trend with strong seasonality.

ggseasonplot(myts, year.labels = TRUE, year.labels.left = TRUE)

Sales fall in January. They pick up from around March (the Australian autumn), dip a little over the winter months, then rise again from September. Sales are highest in November and December, when people prepare for the holidays and do their last-minute Christmas shopping.

ggsubseriesplot(myts)

The horizontal lines represent the mean sales for each month, and the lines within each panel show how that month has changed over time. December is the month with the highest sales.
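The monthly means drawn by the subseries plot can also be computed directly by grouping the observations by their position in the seasonal cycle. This is a sketch on a simulated monthly series (the data and the names `x` and `monthly_means` are assumptions for illustration, not the retail data):

```r
set.seed(7)

# Simulated monthly series: five years with a built-in December peak
pattern <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 9)
x <- ts(rep(pattern, times = 5) + rnorm(60, sd = 0.1),
        start = c(2000, 1), frequency = 12)

# cycle() gives the month number (1 to 12) of each observation
monthly_means <- tapply(as.numeric(x), as.integer(cycle(x)), mean)
names(monthly_means) <- month.abb

round(monthly_means, 2)
names(which.max(monthly_means))   # "Dec"
```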

gglagplot(myts)

Overall, the data show moderate autocorrelation. At lag 12, however, the relationship is strongly positive, which reveals strong seasonality.
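Each panel of a lag plot is a scatterplot of the series against itself shifted by k observations, so the strength of the relationship in a panel can be summarised by an ordinary correlation. A rough sketch on a simulated seasonal series (the data and the helper `lag_cor` are hypothetical, introduced only for illustration):

```r
set.seed(1)

# Simulated series with a 12-observation seasonal pattern plus noise
n <- 120
x <- 10 * sin(2 * pi * (1:n) / 12) + rnorm(n)

# Correlation between the series and itself k steps earlier
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)])
}

lag_cor(x, 12)   # strongly positive: one full season apart
lag_cor(x, 6)    # strongly negative: half a season apart
```

Note that this plain correlation is close to, but not identical to, the autocorrelation coefficient reported by an ACF, which uses the overall mean and variance of the series.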

ggAcf(myts)

This is clearly not a white noise series, as all the autocorrelations are positive and well away from zero. The scalloped shape is due to the seasonality.

Exercise 2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

hsales

autoplot(hsales)

This plot displays cyclicity and seasonality. If you look closely, the peaks and troughs happen at the same time each year: there are two peaks followed by a big dip in sales. Roughly every 8 years, house sales reach a low point.

ggseasonplot(hsales, year.labels = TRUE, year.labels.left = TRUE)

This plot confirms what I mentioned earlier. Sales increase towards March, decrease from May to July, increase a little from August to October, then decrease again through the end of the year. This explains the double peaks in the time series above.

ggsubseriesplot(hsales)

This confirms what I mentioned earlier with the seasonal plot.

gglagplot(hsales)

Lag 1 shows strong positive autocorrelation (\(r_1 \approx 0.86\) in the table below), while at lags 15 and 16 the points are widely scattered, showing little correlation.

The autocorrelation plot is given below, along with the coefficients.

ggAcf(hsales, lag.max = 48)

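# Compute the autocorrelations without drawing the plot
rk <- ggAcf(hsales, lag.max = 48, plot = FALSE)
# Lags and coefficients are stored as 3-d arrays; keep the first slice as a plain vector
lag <- rk[["lag"]][,,1]
corrs <- rk[["acf"]][,,1]

autocorr <- data.frame(lag, corrs)

# Drop lag 0 (always 1) and show the remaining lags in a scrollable table
kable(autocorr[-1,]) %>% kable_styling(full_width = FALSE) %>% scroll_box(height = "400px", width = "300px")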
lag corrs
2 1 0.8550347
3 2 0.6668299
4 3 0.4688587
5 4 0.3367411
6 5 0.2846623
7 6 0.2385458
8 7 0.2230456
9 8 0.2190420
10 9 0.2957427
11 10 0.4345650
12 11 0.5519956
13 12 0.6103797
14 13 0.5150662
15 14 0.3655066
16 15 0.1954704
17 16 0.0761805
18 17 0.0071649
19 18 -0.0668550
20 19 -0.1003340
21 20 -0.1205970
22 21 -0.0450848
23 22 0.0752725
24 23 0.1798855
25 24 0.2388440
26 25 0.1661145
27 26 0.0329383
28 27 -0.1201804
29 28 -0.2291587
30 29 -0.2946261
31 30 -0.3548079
32 31 -0.3700597
33 32 -0.3733058
34 33 -0.2955296
35 34 -0.1812626
36 35 -0.0934901
37 36 -0.0499018
38 37 -0.1274890
39 38 -0.2418412
40 39 -0.3691897
41 40 -0.4424482
42 41 -0.4747679
43 42 -0.5150104
44 43 -0.5105460
45 44 -0.4989925
46 45 -0.4244350
47 46 -0.2974672
48 47 -0.1810163
49 48 -0.1207699

In this graph, \(r_1\) is higher than for the other lags, while \(r_{42}\) is more negative than for the other lags. The local peaks at lags 12, 24 and 36 are due to the seasonal pattern in the data, and the slow drift from positive to negative values reflects the longer cycle. The peaks tend to be 12 months apart and the troughs 10 to 12 months apart.
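The lags with the largest and most negative coefficients can be picked out programmatically rather than by scanning a table. A sketch using base R's acf() on a simulated monthly series (the data and variable names are assumptions for illustration; this is not the hsales series):

```r
set.seed(42)

# Simulated monthly series: a 12-month seasonal wave plus noise
n <- 240
x <- ts(5 * sin(2 * pi * (1:n) / 12) + rnorm(n), frequency = 12)

r <- acf(x, lag.max = 48, plot = FALSE)
corrs <- r$acf[-1, 1, 1]    # drop lag 0, which is always 1
lags  <- seq_along(corrs)   # lag measured in observations (months)

lags[which.max(corrs)]      # lag with the largest autocorrelation
lags[which.min(corrs)]      # lag with the most negative autocorrelation
```

For a purely seasonal series like this one, the largest value falls at one full season (lag 12) and the most negative at half a season (lag 6). A series that also has a trend or cycle, as in the table above, can have its extremes elsewhere.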

usdeaths

autoplot(usdeaths)

Seasonality - If you look at the plot closely, you can see the peaks happen in the middle of each year and the troughs occur at the start of each year.

Cyclicity - None is apparent. The pattern repeats every year with a fixed period, which is seasonality rather than a cycle.

ggseasonplot(usdeaths, year.labels = TRUE, year.labels.left = TRUE)

This gives a clear view of what happens throughout the year. July is the peak, when the most deaths occurred.

ggsubseriesplot(usdeaths)

Closer look: February has the lowest average number of deaths.

gglagplot(usdeaths, do.lines = FALSE)

Lags 1, 12 and 13 show strong positive correlations, while lag 6 shows a negative correlation.

ggAcf(usdeaths)

Here we see a seasonal pattern in US deaths that repeats every year. Peaks and troughs alternate every 6 months: the highest peaks are at lags 1, 12 and 24, while the troughs are at lags 6 and 18. The plot also backs the point I made about the correlations at lags 1, 12 and 6.

bricksq

autoplot(bricksq)

This plot has a pattern to it, but the peaks and troughs are not evenly spaced, so there is no predictability as to when they will occur. The series would be considered cyclic, and it also has a positive trend followed by a slow decrease.

ggseasonplot(bricksq, year.labels = TRUE, year.labels.left = TRUE)

Here, we see that brick production has generally increased over the years. Production tends to be lowest in Q1, typically peaks in Q2, levels off in Q3, then decreases slightly in Q4.

ggsubseriesplot(bricksq)

Confirms what was stated in the seasonality plot above.

gglagplot(bricksq)

ggAcf(bricksq)

This ACF plot shows that the greatest autocorrelation values occur at lags 4, 8, 12, 16 and 20. In the lag plot above, the relationship also appears strongest at these lags, which supports this reading of the graph.

sunspotarea

autoplot(sunspotarea)

This plot shows cyclicity and no seasonality or trend. The ‘double’ peaks seem to happen every other decade.

#ggseasonplot(sunspotarea)
#ggsubseriesplot(sunspotarea)
gglagplot(sunspotarea)

ggAcf(sunspotarea)

The rises and falls in the ACF are due to cyclicity. The autocorrelations are close to zero at some lags, especially 9, 13, 18 and 19. The peaks and troughs tend to recur about every 10 years.

gasoline

autoplot(gasoline)

Finally, this plot displays cyclicity with an increasing trend. There is no obvious or regular pattern to indicate seasonality.

ggseasonplot(gasoline, year.labels = TRUE, year.labels.left = TRUE)

This plot confirms what I mentioned above: there is no obvious within-year pattern, but the level of the series increases from year to year.

#ggsubseriesplot(gasoline)
gglagplot(gasoline, do.lines = FALSE)

All lags appear to be strongly positively correlated.

ggAcf(gasoline)

This is not a white noise series: the spikes are all outside the bounds on the graph, so the data are clearly autocorrelated.
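The white noise check can be made explicit by counting how many autocorrelations fall outside the usual approximate 95% bounds of \(\pm 1.96/\sqrt{T}\), where \(T\) is the length of the series. This is a sketch on simulated data (the two series and the helper `share_outside` are hypothetical, used only for illustration):

```r
set.seed(123)
n <- 200

white_noise <- rnorm(n)                 # no structure at all
trended     <- 0.1 * (1:n) + rnorm(n)   # linear trend plus noise

# Share of autocorrelations (lags 1 to lag.max) outside +/- 1.96 / sqrt(T)
share_outside <- function(x, lag.max = 20) {
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1, 1, 1]
  mean(abs(r) > 1.96 / sqrt(length(x)))
}

share_outside(white_noise)   # small: about 5% is expected by chance
share_outside(trended)       # every lag is outside the bounds
```

A few spikes just outside the bounds are expected even for white noise, so the conclusion rests on how many spikes are outside and by how much, not on a single one.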