1. Use the help function to explore what the series gold, woolyrnq and gas represent.

a. Use autoplot() to plot each of these in separate plots.
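library(fpp2)    # loads forecast, ggplot2 and the datasets used below
library(scales)  # unit_format() and comma axis labels
autoplot(gold) +  # gold has frequency 1: one observation per day
  ggtitle("Gold Prices") + ylab("US Dollars") +
  xlab("Day")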

autoplot(woolyrnq) +
  ggtitle("Woollen Yarn Production", subtitle="Australia") +
  scale_y_continuous(labels = unit_format(accuracy = 1, unit = "K", scale = 1e-3)) +
  ylab("Tonnes") +
  xlab("Date")

autoplot(gas) +
  ggtitle("Gas Production", subtitle="Australia") +
  scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3)) +
  ylab("Production") +
  xlab("Date")

b. What is the frequency of each series? Hint: apply the frequency() function.
  • The gold series has a frequency of 1, the woolyrnq series a frequency of 4 (quarterly data) and the gas series a frequency of 12 (monthly data).
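As a quick illustration of what frequency() returns, here is a base-R sketch on three made-up series built with the same frequencies as gold, woolyrnq and gas. Only the frequency attribute mirrors the fpp2 data; the values are placeholders.

```r
# Placeholder values; only the frequency attribute matters here.
daily_like <- ts(1:10)                  # frequency defaults to 1, as for gold
quarterly  <- ts(1:8, frequency = 4)    # quarterly, as for woolyrnq
monthly    <- ts(1:24, frequency = 12)  # monthly, as for gas

# frequency() reads the number of observations per time unit from each ts object.
freqs <- sapply(list(daily_like, quarterly, monthly), frequency)
print(freqs)  # 1 4 12
```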
c. Use which.max() to spot the outlier in the gold series. Which observation was it?
  • The largest value in the gold series is the 770th observation, with a value of $593.70.
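which.max() returns the position of the first maximum and, per its R documentation, discards missing values, so it can be applied directly to a series with gaps (daily price data often has them). A small base-R sketch on made-up numbers, not the gold data:

```r
# Made-up daily prices with two missing days and one obvious spike (position 6).
prices <- c(301.2, 302.5, NA, 303.1, 302.8, 450.0, 304.0, NA, 303.6)

idx <- which.max(prices)  # missing values are discarded, not propagated
print(idx)                # 6
print(prices[idx])        # 450

# max(), by contrast, returns NA unless na.rm = TRUE is given.
print(max(prices))                # NA
print(max(prices, na.rm = TRUE))  # 450
```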

2. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. Read the data into R.
tute1 <- readr::read_csv("tute1.csv")
b. Convert the data to time series.
tutets <- ts(tute1[, -1], start = 1981, frequency = 4)  # drop column 1 (the quarter labels)
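The ts() call above relies on two details: dropping the first column removes the quarter labels, and passing a data frame with several numeric columns produces a multivariate ("mts") series with one column per variable. A base-R sketch using a hypothetical three-column stand-in for tute1 (placeholder values, not the real file):

```r
# Hypothetical stand-in for tute1.csv: a label column plus three series.
tute_demo <- data.frame(
  Quarter  = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(10, 12, 11, 14, 13, 15),
  AdBudget = c(5, 6, 5, 7, 6, 7),
  GDP      = c(20, 21, 21, 22, 22, 23)
)

# Drop the label column; ts() converts the rest to a numeric matrix series.
tute_ts <- ts(tute_demo[, -1], start = 1981, frequency = 4)

print(class(tute_ts))  # includes "mts"
print(start(tute_ts))  # 1981 1
print(end(tute_ts))    # 1982 2
```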
c. Construct time series plots of each of these three series.
autoplot(tutets,facets = TRUE)

Check what happens when you don’t include facets=TRUE.

autoplot(tutets)

  • Without facets = TRUE, all three series are drawn on the same graph and share a single y-axis instead of each getting its own panel. When the series differ greatly in magnitude, the smaller ones can be squashed into a nearly flat line.

3. Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

a. You can read the data into R.
retail <- readxl::read_xlsx("retail.xlsx",skip=1)
b. Select one of the time series as follows (but replace the column name with your own chosen column):
# A3349414R series ("Turnover ;  Victoria ;  Liquor retailing ;")
myts <- ts(retail[,"A3349414R"], frequency = 12, start = c(1982,4))
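In start = c(1982, 4), the second element is the period within the year, so with frequency = 12 the series begins in April 1982. A toy check with placeholder values:

```r
# Toy monthly series starting in the 4th month of 1982 (April 1982).
x <- ts(1:24, start = c(1982, 4), frequency = 12)

print(start(x))       # 1982 4
print(cycle(x)[1:3])  # 4 5 6, i.e. April, May, June
print(end(x))         # 1984 3 (24 months from April 1982 ends in March 1984)
print(time(x)[1])     # 1982.25, i.e. 1982 + (4 - 1) / 12
```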
c. Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
# Initial plot of the data
autoplot(myts) +
  labs(title="Liquor Retailing Turnover",subtitle="Victoria") +
  xlab("Date") + ylab("Turnover")

  • This basic plot suggests both an upward trend and seasonality.
ggseasonplot(myts) +
  labs(title="Seasonal Plot", subtitle="Liquor Retail Turnover in Victoria") +
  ylab("Turnover")

  • The seasonal plot, while busy, reveals more about the seasonality: turnover clearly rises towards the end of the year.

  • There may also be a smaller seasonal rise around March that becomes more pronounced in later years.

ggsubseriesplot(myts) +
  labs(title="Subseries Plot", subtitle="Liquor Retail Turnover in Victoria") +
  ylab("Turnover")

  • The subseries plot confirms the overall upward trend: turnover increases over the years within each month.
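The subseries reading can also be checked numerically: split the series by calendar month, as a subseries plot does, and fit a straight line to each month's values across years; positive slopes in every month point to an upward trend. A minimal sketch on a synthetic trend-plus-seasonality series (not the retail data; the seasonal shape is invented):

```r
# Synthetic monthly series: linear upward trend plus a fixed seasonal shape
# with a year-end bump. Purely illustrative values.
years    <- 10
seasonal <- c(0, -2, 3, 0, 1, 0, 1, 0, 0, 1, 4, 12)
y <- ts(50 + 0.5 * seq_len(12 * years) + rep(seasonal, years),
        start = c(1982, 1), frequency = 12)

# One subseries per calendar month.
subseries <- split(as.numeric(y), as.integer(cycle(y)))

# Slope of each month's values across years (time step = one year).
slopes <- sapply(subseries, function(v) unname(coef(lm(v ~ seq_along(v)))[2]))
print(round(slopes, 2))  # every month rises by 6 per year (0.5 * 12 months)

# The per-month means correspond to the horizontal lines in a subseries plot.
month_means <- sapply(subseries, mean)
print(which.max(month_means))  # 12, i.e. December has the highest average
```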
gglagplot(myts, lags=12) +
  labs(title="Lag Plots", subtitle="Liquor Retail Turnover in Victoria")

  • This graph shows that, for most months, turnover is positively correlated with its values in previous months, to varying degrees.

  • Let’s drill down and look at some specifics:

gglagplot(myts, lags=1) +
  labs(title="Lag Plots", subtitle="Liquor Retail Turnover in Victoria")

  • First we look at the correlation with the prior month only. Most months correlate well with the previous month. November and December, however, do not correlate as well. This makes sense given the spikes we saw in those months.
gglagplot(myts, lags=12, set.lags = 12) +
  labs(title="Lag Plots", subtitle="Liquor Retail Turnover in Victoria")

  • At a lag of one year (12 months), there is a fairly strong linear relationship. December, however, deviates somewhat in the more recent observations, which suggests there may be some outliers.
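Each panel of a lag plot graphs the series against itself shifted back by k periods, so how tightly the points hug a line corresponds to the correlation between y_t and y_(t-k). A base-R sketch on a noise-free synthetic monthly pattern with a December spike (invented values, not the retail data):

```r
# Noise-free synthetic monthly pattern with a December spike, repeated 8 years.
pattern <- c(5, 4, 6, 5, 5, 6, 5, 5, 5, 6, 8, 12)
x <- rep(pattern, 8)

# Correlation between x[t] and x[t - k]: the pairs plotted in a lag-k panel.
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)])
}

print(round(lag_cor(x, 1), 2))   # weak: the Nov -> Dec -> Jan jumps break the pattern
print(round(lag_cor(x, 12), 2))  # 1: each month matches the same month a year earlier
```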
ggAcf(myts) +
  labs(title = "Autocorrelation Plot",
       subtitle="Liquor Retail Turnover in Victoria")

  • The autocorrelation plot confirms what we saw above: there is an upward trend in the data, indicated by large autocorrelations at small lags that decrease slowly as the lag increases.

  • The plot also confirms the slight seasonality we suspected, shown by slightly larger values at lags 12 and 24.
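For reference, the sample autocorrelation at lag k is r_k = sum over t of (y_t - ybar)(y_(t-k) - ybar), divided by the sum of (y_t - ybar)^2. The sketch below writes that out by hand and compares it with base R's acf(); our understanding is that ggAcf() is built on the same computation, but treat that as an assumption. The toy series (a linear trend plus a small year-end bump) is invented:

```r
# Sample autocorrelation at lag k, written out by hand.
manual_acf <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  d <- y - mean(y)                             # deviations from the mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

# Invented monthly series: linear trend plus a small year-end bump.
bump <- c(rep(0, 10), 2, 6)
y <- ts(1:100 + rep(bump, length.out = 100), frequency = 12)

r_base   <- as.numeric(acf(y, lag.max = 24, plot = FALSE)$acf)[-1]  # drop lag 0
r_manual <- sapply(1:24, function(k) manual_acf(y, k))

print(all.equal(r_base, r_manual))  # TRUE
# With a trend, r_k is large at small lags and decays slowly as k grows.
print(round(r_manual[c(1, 6, 12, 24)], 2))
```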


6. Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Dataset: hsales
autoplot(hsales) +
  labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")

ggseasonplot(hsales, polar = TRUE) +
  labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")

ggsubseriesplot(hsales) +
  labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")

gglagplot(hsales) +
  labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")

ggAcf(hsales) +
  labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")

  • With the hsales dataset, we see evidence of a cycle (roughly 7 years long) and of seasonality, but little evidence of a trend.
Dataset: usdeaths
autoplot(usdeaths) +
  labs(title="Monthly Accidental Deaths", subtitle="United States") +
  scale_y_continuous(labels = comma)

ggseasonplot(usdeaths) +
  labs(title="Monthly Accidental Deaths", subtitle="United States") +
  scale_y_continuous(labels = comma)

ggsubseriesplot(usdeaths) +
  labs(title="Monthly Accidental Deaths", subtitle="United States") +
  scale_y_continuous(labels = comma)

gglagplot(usdeaths, lags = 12) +
  labs(title="Monthly Accidental Deaths", subtitle="United States") +
  scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3))

ggAcf(usdeaths) +
  labs(title="Monthly Accidental Deaths", subtitle="United States")

  • The usdeaths dataset shows clear seasonality: accidental deaths rise in the summer months and fall in the winter months. There does not appear to be much of a trend. The values for 1973 are higher, though that might be due to a change in record-keeping methods, in the definition of an “accident”, or to the federal 55 mph highway speed limit (signed into law in 1974).
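These observations can be sanity-checked with base R's USAccDeaths (datasets package), documented as monthly accidental deaths in the US for 1973 to 1978. We are assuming it holds the same figures as fpp2's usdeaths. A short sketch that averages by calendar month and by year:

```r
# USAccDeaths ships with base R (datasets package): monthly, 1973 to 1978.
deaths <- as.numeric(USAccDeaths)
month  <- as.integer(cycle(USAccDeaths))

# Average deaths per calendar month across the six years.
month_avg <- tapply(deaths, month, mean)
print(round(month_avg))
print(month.name[which.max(month_avg)])  # month with the highest average
print(month.name[which.min(month_avg)])  # month with the lowest average

# Average deaths per year (aggregate() collapses the monthly ts to annual).
year_avg <- aggregate(USAccDeaths, FUN = mean)
print(round(year_avg))
```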
Dataset: bricksq
autoplot(bricksq) +
  labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
  scale_y_continuous(labels = comma)

ggseasonplot(bricksq) +
  labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
  scale_y_continuous(labels = comma)

ggsubseriesplot(bricksq) +
  labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
  scale_y_continuous(labels = comma)

gglagplot(bricksq, lags = 12) +
  labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
  scale_y_continuous(labels = comma)

ggAcf(bricksq) +
  labs(title="Quarterly Clay Brick Production", subtitle="Australia")

  • The bricksq dataset shows an upward trend until about 1980, after which the trend seems to flatten. The data also become rather volatile around 1975, which may be the start of a cyclical pattern (increase, big drop, increase, big drop). There appears to be some slight seasonality, with Q1 generally lower than the following quarters.
Dataset: sunspotarea
autoplot(sunspotarea) +
  labs(title="Annual Average Sunspot Area") +
  scale_y_continuous(labels = comma)

gglagplot(sunspotarea, lags = 12) +
  labs(title="Annual Average Sunspot Area") +
  scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3))

ggAcf(sunspotarea) +
  labs(title="Annual Average Sunspot Area")

  • The plots for the sunspotarea dataset show a clear cycle of roughly 10 to 11 years. There is no seasonality, as the data are collected annually. There does not appear to be a trend (though perhaps a longer cycle is at work here).
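For an annual series with no seasonal period, one rough way to put a number on a cycle's length is to look for the first local peak of the ACF after lag 0. A sketch on a synthetic sine wave whose period (11 observations) is chosen for the example, not estimated from sunspotarea:

```r
# Synthetic annual series: a pure cycle with a period of 11 observations,
# 20 full cycles long. No trend and no seasonality (frequency = 1).
period <- 11
y <- ts(sin(2 * pi * seq_len(20 * period) / period))

r <- as.numeric(acf(y, lag.max = 20, plot = FALSE)$acf)[-1]  # r[k] = lag-k ACF

# Positions where the ACF switches from rising to falling are local peaks.
peaks <- which(diff(sign(diff(r))) == -2) + 1
cycle_length <- peaks[1]
print(cycle_length)  # 11
```

On real data the ACF is noisier, so the first peak gives only an approximate cycle length.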
Dataset: gasoline
autoplot(gasoline) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggseasonplot(gasoline) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggsubseriesplot(ts(gasoline, frequency = 52)) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))
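Weekly series are usually stored with the non-integer frequency 365.25/7 (about 52.18), because a year is not a whole number of weeks. The subseries plot above instead re-declares the data with frequency = 52, presumably because ggsubseriesplot() expects a whole number of seasons. A base-R sketch of what that re-declaration changes, on a toy weekly series with placeholder values:

```r
# Toy weekly series (placeholder values) using the usual weekly frequency:
# 365.25 / 7 is about 52.18 weeks per year.
wk <- ts(seq_len(157), start = 1991, frequency = 365.25 / 7)
print(frequency(wk))  # 52.17857

# Re-declaring the same values with frequency = 52 gives every "year"
# exactly 52 weeks, which only approximates the calendar.
wk52 <- ts(as.numeric(wk), frequency = 52)
print(frequency(wk52))  # 52

# After 52 steps the original series has advanced slightly less than one year
# (364 of 365.25 days), while the re-declared one has advanced exactly one.
print(time(wk)[53] - time(wk)[1])
print(time(wk52)[53] - time(wk52)[1])  # 1
```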

gglagplot(gasoline, lags = 10) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggAcf(gasoline) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States")

  • The gasoline series contains a large number of weekly observations. The first graph shows what appears to be an upward trend from the start of the series until about 2005. The series then declines slightly until about 2012, when it begins to rise again.

  • There also appears to be some seasonality, with values rising in the middle of the year, but it is difficult to see with all the years on one plot.

ggseasonplot(window(gasoline, end=1996)) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggseasonplot(window(gasoline, start=1996, end=2001)) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggseasonplot(window(gasoline, start=2001, end=2006)) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggseasonplot(window(gasoline, start=2006, end=2011)) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

ggseasonplot(window(gasoline, start=2011, end=2017)) +
  labs(title="Weekly Gasoline Supplied", subtitle="United States") +
  ylab("Barrels Per Day") +
  scale_y_continuous(labels = unit_format(unit = "M"))

  • With the years broken out, we can see that the same seasonal pattern repeats year after year.
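When splitting a series into year blocks with window(), note that it keeps observations whose time lies between start and end inclusive, and times are fractional years: end = 2000 means the very start of 2000, so a block ending there omits the whole of 2000. A sketch on a toy weekly series (placeholder values, same weekly frequency as above):

```r
# Toy weekly series starting in early 1991 (placeholder values),
# using the usual weekly frequency of 365.25 / 7.
wk <- ts(seq_len(600), start = 1991 + 31 / 365.25, frequency = 365.25 / 7)

# window() keeps observations timed between start and end, inclusive.
w_full <- window(wk, start = 1996, end = 2001)  # covers all of 1996 to 2000
w_gap  <- window(wk, start = 1996, end = 2000)  # stops just before 2000 begins

print(range(floor(time(w_full))))  # 1996 2000
print(range(floor(time(w_gap))))   # 1996 1999
```

Using the next block's start year as the current block's end therefore gives contiguous, non-overlapping year ranges.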