Use the help function to explore what the series gold, woolyrnq and gas represent.
#help('gold') # Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.
#help("woolyrnq") # Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
#help('gas') # Australian monthly gas production: 1956–1995.
autoplot(gold) + ggtitle("Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989")
autoplot(woolyrnq) + ggtitle("Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994")
Apply the frequency() function to each series.
## [1] 1
Annually by the ts convention, although gold holds daily prices and simply has no seasonal period set.
## [1] 4
Quarterly
## [1] 12
Monthly
## [1] 770
Which observation was it?
## [1] 593.7
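The frequency and outlier checks above can be sketched with base R alone. This is only a minimal illustration on made-up stand-in series: daily_demo, quarterly_demo and monthly_demo are hypothetical names and values, not the real gold, woolyrnq and gas data, which need the fpp2 package.

```r
# Hypothetical stand-in series (values invented for illustration only)
daily_demo     <- ts(c(310, 312, 309, 590, 311, 308))  # no seasonal period set
quarterly_demo <- ts(c(6.2, 7.3, 7.9, 6.1, 6.5, 7.6, 8.1, 6.4),
                     start = c(1965, 1), frequency = 4)
monthly_demo   <- ts(seq_len(24), start = c(1956, 1), frequency = 12)

# frequency() reports the number of observations per seasonal period
freqs <- c(daily     = frequency(daily_demo),
           quarterly = frequency(quarterly_demo),
           monthly   = frequency(monthly_demo))
print(freqs)

# which.max() gives the position of the largest value (the outlier here);
# indexing the series with that position returns the value itself
outlier_pos <- which.max(daily_demo)
outlier_val <- daily_demo[outlier_pos]
print(c(position = outlier_pos, value = outlier_val))
```

The same two calls, which.max(gold) and gold[which.max(gold)], would give the position and value reported above.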
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
Check what happens when you don’t include facets=TRUE.
Without facets=TRUE, all of the series are drawn on a single set of axes. Because they then share one y-scale, it is harder to compare the individual series visually.
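A rough sketch of the comparison, assuming the forecast and ggplot2 packages are installed. The data frame tute1_demo is a hypothetical stand-in with invented values, not the real tute1.csv. The plotting part is skipped if the packages are missing.

```r
set.seed(1)

# Hypothetical stand-in for tute1.csv: three quarterly columns on different scales
tute1_demo <- data.frame(
  Sales    = 1000 + rnorm(100, sd = 50),
  AdBudget = 500 + rnorm(100, sd = 30),
  GDP      = 280 + rnorm(100, sd = 5)
)

# 100 quarterly observations starting in 1981 Q1 run to 2005 Q4
mytimeseries <- ts(tute1_demo, start = 1981, frequency = 4)

if (requireNamespace("forecast", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(forecast)
  library(ggplot2)
  p_single <- autoplot(mytimeseries)                 # all series on one shared axis
  p_facets <- autoplot(mytimeseries, facets = TRUE)  # one panel per series
}
```

Printing p_single and p_facets one after the other shows the difference: the faceted version gives each series its own panel, so a series on a smaller scale is not flattened by a larger one.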
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
Explore the retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf().
The plot shows an increasing trend with strong seasonality.
Sales fall in January. They rise as spring approaches, dip a little over the summer, then climb again in the fall from September (around the time schools reopen). This is also when people prepare for the holidays (Nov - Dec) and do their last-minute shopping (Thanksgiving, Black Friday and Christmas).
The horizontal lines represent the mean sales for each month, and each mini-series shows how that month has changed over time. December is when the most sales occur.
Overall, the data show moderate autocorrelation. At lag 12, however, the relationship is strongly positive, which reveals strong seasonality.
This is clearly not a white noise series: the autocorrelations are all well away from zero, whereas white noise would keep nearly all of them inside the significance bounds. The scalloped shape is due to the seasonality.
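The white noise check can be sketched numerically rather than read off the plot. The example below is a minimal illustration in base R on two made-up series (seasonal_demo and noise_demo are hypothetical, not the retail data). It counts how many autocorrelations fall outside the approximate 95% bounds of plus or minus 1.96 / sqrt(T), which is the limit that acf plots draw by default.

```r
set.seed(42)
n  <- 120
tt <- seq_len(n)

# Hypothetical monthly series: upward trend plus a 12-month seasonal wave
seasonal_demo <- ts(0.5 * tt + 10 * sin(2 * pi * tt / 12), frequency = 12)
# Pure white noise for comparison
noise_demo    <- ts(rnorm(n), frequency = 12)

# Approximate 95% limits for the autocorrelations of white noise
bound <- qnorm(0.975) / sqrt(n)

# Count the lags (1 to lag.max) whose autocorrelation lies outside the bounds
count_outside <- function(x, lag.max = 24) {
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  sum(abs(r) > bound)
}

n_seasonal <- count_outside(seasonal_demo)
n_noise    <- count_outside(noise_demo)
print(c(seasonal = n_seasonal, noise = n_noise))
```

For white noise, about 5% of the spikes are expected to fall outside the bounds by chance, so a handful of exceedances is not evidence of structure, while a series where every lag is outside clearly is.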
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
This plot displays cyclicity and seasonality. If you look closely, the peaks and troughs happen at the same time in each year. In each year, there are two peaks followed by a big dip in sales. Roughly every 8 years, house sales reach a low point.
This plot confirms what I mentioned earlier. Sales increase towards March, decrease from May to July, increase a little from August to October, and then decrease again from then on. This explains the double peaks in the time series above.
This confirms what I mentioned earlier about the seasonal plot.
Lag 1 shows the strongest positive autocorrelation, while at lags 15 and 16 the points are widely scattered with little visible relationship.
The autocorrelation plot is given below, along with the coefficients.
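library(fpp2)        # assumed: provides ggAcf() and the hsales data
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box() and the %>% pipe
# Compute the autocorrelations of hsales up to lag 48 without drawing the plot
rk <- ggAcf(hsales, lag.max = 48, plot = FALSE)
lag <- rk[["lag"]][, , 1]    # lags, starting at 0
corrs <- rk[["acf"]][, , 1]  # autocorrelation at each lag
autocorr <- data.frame(lag, corrs)
# Drop the lag-0 row (always 1) and show the rest in a scrollable table
kable(autocorr[-1, ]) %>% kable_styling(full_width = FALSE) %>% scroll_box(height = "400px", width = "300px")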
row | lag | corrs |
---|---|---|
2 | 1 | 0.8550347 |
3 | 2 | 0.6668299 |
4 | 3 | 0.4688587 |
5 | 4 | 0.3367411 |
6 | 5 | 0.2846623 |
7 | 6 | 0.2385458 |
8 | 7 | 0.2230456 |
9 | 8 | 0.2190420 |
10 | 9 | 0.2957427 |
11 | 10 | 0.4345650 |
12 | 11 | 0.5519956 |
13 | 12 | 0.6103797 |
14 | 13 | 0.5150662 |
15 | 14 | 0.3655066 |
16 | 15 | 0.1954704 |
17 | 16 | 0.0761805 |
18 | 17 | 0.0071649 |
19 | 18 | -0.0668550 |
20 | 19 | -0.1003340 |
21 | 20 | -0.1205970 |
22 | 21 | -0.0450848 |
23 | 22 | 0.0752725 |
24 | 23 | 0.1798855 |
25 | 24 | 0.2388440 |
26 | 25 | 0.1661145 |
27 | 26 | 0.0329383 |
28 | 27 | -0.1201804 |
29 | 28 | -0.2291587 |
30 | 29 | -0.2946261 |
31 | 30 | -0.3548079 |
32 | 31 | -0.3700597 |
33 | 32 | -0.3733058 |
34 | 33 | -0.2955296 |
35 | 34 | -0.1812626 |
36 | 35 | -0.0934901 |
37 | 36 | -0.0499018 |
38 | 37 | -0.1274890 |
39 | 38 | -0.2418412 |
40 | 39 | -0.3691897 |
41 | 40 | -0.4424482 |
42 | 41 | -0.4747679 |
43 | 42 | -0.5150104 |
44 | 43 | -0.5105460 |
45 | 44 | -0.4989925 |
46 | 45 | -0.4244350 |
47 | 46 | -0.2974672 |
48 | 47 | -0.1810163 |
49 | 48 | -0.1207699 |
In this graph, \(r_1\) is higher than for the other lags (0.855), while \(r_{42}\) is more negative than for the other lags (-0.515). The local peaks at lags 12, 24 and 36 are due to the seasonal pattern in the data: the highest peaks tend to be 12 months apart, and the deepest troughs tend to be 10 to 12 months apart.
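Finding the largest and most negative autocorrelations does not require scanning the table by eye. The sketch below uses base R's acf() as a stand-in for ggAcf(), and a made-up 12-period sine wave (wave_demo, a hypothetical series) instead of hsales, so the lags it reports are for the toy series only.

```r
# Hypothetical series: a pure 12-period wave over 10 full periods
n <- 120
wave_demo <- sin(2 * pi * seq_len(n) / 12)

# Autocorrelations for lags 1 to 24 (lag 0 is dropped, so index k is lag k)
r <- acf(wave_demo, lag.max = 24, plot = FALSE)$acf[-1]

lag_of_max <- which.max(r)  # lag with the largest autocorrelation
lag_of_min <- which.min(r)  # lag with the most negative autocorrelation

print(c(lag_of_max = lag_of_max, lag_of_min = lag_of_min))
print(round(r[c(lag_of_max, lag_of_min)], 3))
```

Applied to the corrs column above (with the lag-0 row removed), the same which.max() and which.min() calls would pick out lags 1 and 42.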
Seasonality - If you look at the plot closely, you can see that the peaks happen in the middle of each year and the troughs occur at the start of each year.
Cyclicity - The pattern repeats every year. Because that period is fixed, it is better described as seasonality than as a cycle.
This gives a clear view of what happens throughout the year. July is the peak, when the most deaths occurred.
Closer look: February has the lowest average number of deaths.
Lags 1, 12 and 13 show strong positive correlations, while lag 6 shows a negative correlation.
Here we see a seasonal pattern in US deaths that repeats every year. Peaks and troughs alternate every 6 months: the highest peaks are at lags 1, 12 and 24, while the troughs are at lags 6 and 18. The plot also backs up the point I made about the correlations at lags 1, 12 and 6.
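The monthly averages and seasonal lags can be checked numerically as well. This sketch uses base R's USAccDeaths (monthly accidental deaths in the US, 1973 to 1978), on the assumption that it holds the same figures as usdeaths. If the two differ, treat it as an illustration of the method only.

```r
# Base R dataset, assumed here to match usdeaths
deaths <- USAccDeaths

# Average number of deaths for each calendar month across all years
month_means <- as.vector(tapply(deaths, cycle(deaths), mean))
names(month_means) <- month.abb

peak_month   <- names(which.max(month_means))
trough_month <- names(which.min(month_means))
print(c(peak = peak_month, trough = trough_month))

# Autocorrelations for lags 1 to 24 (lag 0 dropped, so index k is lag k)
r <- acf(deaths, lag.max = 24, plot = FALSE)$acf[-1]
print(round(r[c(1, 6, 12, 18, 24)], 2))
```

The sign pattern in the last line (positive at lags 12 and 24, negative at lags 6 and 18) is what a 12-month seasonal pattern looks like in the ACF.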
This plot has a pattern to it, but the peaks and troughs are not evenly spaced, so there is no way to predict when they will occur. This series would be considered cyclic, and it also has a positive trend followed by a slow decrease.
Here, we see that brick production has generally increased over the years. Production tends to be lowest in Q1, typically peaks in Q2, levels off in Q3 and then decreases slightly in Q4.
This confirms what was stated about the seasonal plot above.
This ACF plot shows that the greatest autocorrelation values occur at lags 4, 8, 12, 16 and 20. If you look at the lag plot above, you can see that the relationship appears strongest at these lags, which supports this reading of the ACF.
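To show why quarterly seasonality puts the ACF peaks at multiples of 4, here is a small sketch on a made-up quarterly pattern (quarterly_demo is hypothetical and has no trend, unlike bricksq). It locates the local peaks in the autocorrelations using base R only.

```r
# Hypothetical quarterly series: the same four-quarter pattern repeated 30 times
quarterly_demo <- ts(rep(c(-1, 1, 0.5, -0.5), times = 30), frequency = 4)

# Autocorrelations for lags 1 to 22 (lag 0 dropped, so index k is lag k)
r <- acf(quarterly_demo, lag.max = 22, plot = FALSE)$acf[-1]

# A local peak is a lag where the ACF rises into it and falls after it:
# the sign of the first difference flips from +1 to -1
peak_lags <- which(diff(sign(diff(r))) == -2) + 1
print(peak_lags)
```

With a trend added, as in bricksq, the same peaks sit on top of a slowly decaying curve, which is why they are still visible at lags 4, 8, 12, 16 and 20.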
This plot shows cyclicity and no seasonality or trend. The ‘double’ peaks seem to happen every other decade.
The rises and falls in the ACF are due to cyclicity. Some autocorrelations are close to zero, especially at lags 9, 13, 18 and 19. The peaks and troughs tend to be about 10 years apart.
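The cycle length can be estimated from the ACF by looking for the first peak after the initial decay. The sketch below uses base R's sunspot.year (yearly sunspot numbers, 1700 to 1988) as a stand-in. It is a different measurement from sunspotarea, so this only illustrates the approach, and the search window of lags 5 to 15 is my own assumption.

```r
# Base R stand-in: yearly sunspot numbers (not the sunspotarea series)
spots <- sunspot.year

# Autocorrelations for lags 1 to 30 (lag 0 dropped, so index k is lag k)
r <- acf(spots, lag.max = 30, plot = FALSE)$acf[-1]

# Assumed search window for the first cycle peak, in years
candidates <- 5:15
cycle_len  <- candidates[which.max(r[candidates])]
print(cycle_len)
```

Because the cycle length varies from one cycle to the next, the ACF peaks get lower and broader at longer lags, unlike the sharp fixed-period peaks of a seasonal series.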
Finally, this plot displays cyclicity with an increasing trend. There is no obvious or regular pattern to indicate seasonality.
This plot confirms what I mentioned above: there is no obvious pattern, but gasoline production increases over the weeks.
All lags appear to be highly positively correlated.
This is not a white noise series: the spikes all lie outside the bounds on the graph, so there is definitely structure in the data.
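A trend on its own is enough to produce an ACF in which every lag is positive and well outside the bounds. The sketch below shows this on a made-up trending series (trend_demo is hypothetical, not the gasoline data), using base R only.

```r
set.seed(123)
n <- 200

# Hypothetical series: a steady upward trend plus random noise
trend_demo <- ts(0.05 * seq_len(n) + rnorm(n))

# Autocorrelations for lags 1 to 20 (lag 0 dropped, so index k is lag k)
r <- acf(trend_demo, lag.max = 20, plot = FALSE)$acf[-1]

# Approximate 95% limits for the autocorrelations of white noise
bound <- qnorm(0.975) / sqrt(n)

all_positive <- all(r > 0)      # every lag positively correlated
all_outside  <- all(r > bound)  # every spike above the upper bound
print(c(all_positive = all_positive, all_outside = all_outside))
```

Observations close together in time are close in value when a series trends, which is why the autocorrelations start high and decay slowly as the lag grows.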