Time Series

library(fpp2)
library(zoo)
library(plotly)

Question 2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

  • gold: Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.
  • woolyrnq: Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
  • gas: Australian monthly gas production: 1956–1995.

Part A

Use autoplot() to plot each of these in separate plots.

data(gold)
ggplotly(autoplot(gold))
data(woolyrnq)
ggplotly(autoplot(woolyrnq))
data(gas)
ggplotly(autoplot(gas))

Part B

What is the frequency of each series? Hint: apply the frequency() function.

frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
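As a rough illustration of what frequency() reports, the sketch below builds three small synthetic series (the object names and values are made up for illustration and are not part of fpp2) and reads back the frequency stored with each.

```r
# Synthetic series; names and values are illustrative only
annual    <- ts(1:10, start = 2000)                       # frequency defaults to 1
quarterly <- ts(1:12, start = c(2000, 1), frequency = 4)
monthly   <- ts(1:24, start = c(2000, 1), frequency = 12)

# frequency() returns the number of observations per seasonal period
freqs <- c(annual = frequency(annual),
           quarterly = frequency(quarterly),
           monthly = frequency(monthly))
print(freqs)
```

A frequency of 1, as reported for gold, suggests that no seasonal period is stored with the series, even though the observations are daily.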

Part C

Use which.max() to spot the outlier in the gold series. Which observation was it?

#position of the outlier
which.max(gold)
## [1] 770
#value of the outlier
gold[which.max(gold)]
## [1] 593.7
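The same pattern can also recover the time index of the extreme value, not just its position. This is a minimal sketch on a made-up quarterly series (the values are assumptions chosen so that one observation clearly stands out); time() is a base R function.

```r
# Synthetic quarterly series with one obvious spike (illustrative values)
y <- ts(c(10, 12, 11, 13, 50, 12, 14, 13), start = c(2001, 1), frequency = 4)

idx        <- which.max(y)   # position of the largest observation
peak_time  <- time(y)[idx]   # time index of that observation
peak_value <- y[idx]         # its value

cat("position:", idx, "time:", peak_time, "value:", peak_value, "\n")
```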

Question 2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

Part A

You can read the data into R with the following script:

tute1 <- read.csv("http://otexts.com/fpp2/extrafiles/tute1.csv", header=TRUE)
head(tute1)
##        X  Sales AdBudget   GDP
## 1 Mar-81 1020.2    659.2 251.8
## 2 Jun-81  889.2    589.0 290.9
## 3 Sep-81  795.0    512.5 290.8
## 4 Dec-81 1003.9    614.1 292.4
## 5 Mar-82 1057.7    647.2 279.1
## 6 Jun-82  944.4    602.0 254.0

Part B

Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
#(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
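To see what the conversion does without downloading the file, the sketch below rebuilds the first six rows shown above in a small data frame (df and qts are hypothetical names) and checks the resulting time index.

```r
# Stand-in for tute1: a label column plus two numeric columns,
# using the first six rows printed by head(tute1)
df <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0)
)

# Drop the label column, then attach a quarterly index starting in 1981 Q1
qts <- ts(df[, -1], start = 1981, frequency = 4)

print(colnames(qts))    # column names carried over from the data frame
print(time(qts)[1:4])   # quarters are stored as fractions of a year
```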

Part C

Construct time series plots of each of the three series

ggplotly(autoplot(mytimeseries, facets=TRUE))

Question 2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

Part A

You can read the data into R with the following script:

temp <- tempfile(fileext = ".xlsx")
dataURL <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
download.file(dataURL, destfile=temp, mode='wb')

retaildata <- readxl::read_excel(temp, skip=1)
#The second argument (skip=1) is required because the Excel sheet has two header rows.
head(retaildata[,"A3349396W"])
## # A tibble: 6 x 1
##   A3349396W
##       <dbl>
## 1     3396.
## 2     3498.
## 3     3358.
## 4     3487.
## 5     3356.
## 6     3454.

Part B

Select one of the time series as follows (but replace the column name with your own chosen column):

myts <- ts(retaildata[,"A3349396W"], frequency=12, start=c(1982,4))
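The start = c(1982, 4) argument reads as "period 4 of 1982", which for monthly data means April 1982. A small sketch with placeholder values (m is a hypothetical name, and the numbers stand in for real sales figures) shows how to confirm this with base R's start(), end() and cycle().

```r
# Placeholder values; only the time index matters here
m <- ts(seq_len(15), frequency = 12, start = c(1982, 4))

print(start(m))        # year and period of the first observation
print(end(m))          # year and period of the last observation
print(cycle(m)[1:3])   # month number of the first three observations
```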

Part C

Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Time Plot

The time plot shows:

  • Annual seasonality, with a peak in December and a low in February
  • A consistent upwards trend
  • No evidence of cyclic behavior

theme_set(theme_light(base_size = 12))
p <- autoplot(myts) + ggtitle("Retail Data A3349396W time plot")
ggplotly(p)

Season Plot

The seasonal plot confirms the seasonality and upwards trend found in the time plot. It also shows that the upwards trend has been accelerating.

ggseasonplot(myts, polar = TRUE) + ggtitle("Retail Data A3349396W season plot")

Season Subseries Plot

This subseries plot confirms the upwards trend, the peak in December, and the trough in February. Beyond that, the subseries plot for this time series does not tell us much.

ggsubseriesplot(myts) + ggtitle("Retail Data A3349396W season subseries plot")

Lag Plot

The lag plot shows the strongest linear relationship at lag 12, confirming that the data has annual seasonality. It also shows a positive linear relationship at most other lags, because the time series is nearly always increasing.

gglagplot(myts) + ggtitle("Retail Data A3349396W lag plot")

Correlogram

The autocorrelation plot has r12 slightly higher than the other lags, due to the annual seasonal pattern in the data. All autocorrelations lie above the dashed blue lines, which indicates they are significantly different from 0.

ggAcf(myts) + ggtitle("Retail Data A3349396W autocorrelation plot")
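The dashed blue lines can be approximated by hand. The sketch below assumes they sit at roughly ±1.96/√T, the usual 95% bounds for white noise (this is an assumption about the plot's defaults, not something stated above), and uses base R's acf() on the built-in AirPassengers series as a stand-in for myts.

```r
# AirPassengers ships with base R's datasets package; used here as a stand-in
y <- AirPassengers
n <- length(y)

# Sample autocorrelations for lags 1..24 (drop lag 0, which is always 1)
r <- acf(y, lag.max = 24, plot = FALSE)$acf[-1]

# Approximate 95% bounds under the white-noise hypothesis
bound <- qnorm(0.975) / sqrt(n)

# Lags whose autocorrelation falls outside the bounds
significant <- which(abs(r) > bound)
cat("bound: +/-", round(bound, 3), "\n")
cat("significant lags:", significant, "\n")
```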

Each of these plots has confirmed that the series has an annual seasonality and an upwards trend.

Question 2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

hsales

Monthly sales of new one-family houses sold in the USA since 1973.

These plots have indicated that the series has an annual seasonality and is strongly dependent on the previous month of data. There is also some evidence of cyclical behavior every 6-9 years.

Time Plot

The time plot shows:

  • Annual seasonality, with a peak in March and a trough in December
  • No evidence of a trend
  • Cyclic behaviour every 6-9 years

theme_set(theme_light(base_size = 12))
p <- autoplot(hsales) + ggtitle("hsales time plot")
ggplotly(p)

Season Plot

The seasonal plot confirms the seasonality. It also reveals a peak in March and trough in December.

ggseasonplot(hsales, polar = FALSE) + ggtitle("hsales season plot")

Season Subseries Plot

This subseries plot shows the seasonal behavior of the time series (increasing from January - March, decreasing from March - December).

ggsubseriesplot(hsales) + ggtitle("hsales season subseries plot")

Lag Plot

The lag plot shows the strongest linear relationship at lag 1. The relationship weakens as the lag increases, then becomes somewhat linear again at lag 12. This suggests the series has some annual seasonality but depends mainly on the previous month of data.

gglagplot(hsales) + ggtitle("hsales lag plot")

Correlogram

The autocorrelation plot has r12 higher than the other lags, due to the annual seasonal pattern in the data. Even higher is r1, which confirms that the previous month in the time series is indicative of the next. All autocorrelations except those at lags 16 - 22 lie above the dashed blue lines, which indicates they are significantly different from 0.

ggAcf(hsales) + ggtitle("hsales autocorrelation plot")

usdeaths

Monthly accidental deaths in USA.

These plots have indicated that the series has an annual seasonality with a peak in July and a trough in February. There is no significant evidence of trend or cyclic behavior.

Time Plot

The time plot shows:

  • Annual seasonality, with a peak in July and a trough in February
  • No evidence of a trend (there are only six years of data; a trend might emerge if more data were available)
  • No evidence of cyclic behavior

theme_set(theme_light(base_size = 12))
p <- autoplot(usdeaths) + ggtitle("usdeaths time plot")
ggplotly(p)

Season Plot

The seasonal plot confirms the annual seasonality. It also hints at a downward trend, but there is not enough data to confirm it.

ggseasonplot(usdeaths, polar = TRUE) + ggtitle("usdeaths season plot")

Season Subseries Plot

This subseries plot shows the seasonal behavior of the time series (increasing from February - July, decreasing from July - February).

ggsubseriesplot(usdeaths) + ggtitle("usdeaths season subseries plot")

Lag Plot

The lag plot shows the strongest linear relationship at lag 12, confirming the annual seasonality present in the time series.

gglagplot(usdeaths) + ggtitle("usdeaths lag plot")

Correlogram

The autocorrelation plot has r12 higher than the other lags, due to the annual seasonal pattern in the data. Even higher is r1, which shows that the previous month in the time series is indicative of the next. r6 and r18 show strong negative correlations, which further confirm the annual seasonality.

ggAcf(usdeaths) + ggtitle("usdeaths autocorrelation plot")

bricksq

Australian quarterly clay brick production: 1956–1994.

These plots have indicated that the series has an annual seasonality and an upwards trend. There is also evidence of cyclic behavior every 8 years.

Time Plot

The time plot shows:

  • Annual seasonality, with a peak in Q3 and a trough in Q1
  • Generally upwards trend
  • Cyclic behaviour every 8 years

theme_set(theme_light(base_size = 12))
p <- autoplot(bricksq) + ggtitle("bricksq time plot")
ggplotly(p)

Season Plot

The seasonal plot confirms the seasonality. It also confirms the trough in Q1 and peak in Q3, as well as the upwards trend.

ggseasonplot(bricksq, polar = TRUE) + ggtitle("bricksq season plot")

Season Subseries Plot

This subseries plot doesn’t tell us too much about seasonality, but it does show that the rises and falls in the time series are consistent across quarters.

ggsubseriesplot(bricksq) + ggtitle("bricksq season subseries plot")

Lag Plot

The lag plot shows the strongest linear relationship at lag 1, but all the lag plots are somewhat linear (with increased variability at higher values).

gglagplot(bricksq) + ggtitle("bricksq lag plot")

Correlogram

The autocorrelation plot has peaks at r4, r8, r12 due to the annual seasonal pattern in the data. Even higher is r1, which confirms that the previous quarter in the time series is indicative of the next. All autocorrelations lie above the dashed blue lines, which indicates they are positive and significantly different from 0.

ggAcf(bricksq) + ggtitle("bricksq autocorrelation plot")

sunspotarea

Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.

These plots indicate that the series shows only cyclic behavior, with a cycle of roughly 10-12 years.

Time Plot

The time plot shows:

  • No evidence of seasonality
  • No evidence of a trend
  • Cyclic behaviour every 10-12 years

theme_set(theme_light(base_size = 12))
p <- autoplot(sunspotarea) + ggtitle("sunspotarea time plot")
ggplotly(p)

Season Plot

Data are not seasonal

Season Subseries Plot

Data are not seasonal

Lag Plot

The lag plot shows the strongest linear relationship at lag 10, which is further confirmation that the data is cyclic (with a period of 10 - 12 years).

gglagplot(sunspotarea, lag = 12) + ggtitle("sunspotarea lag plot")

Correlogram

The autocorrelation plot has positive correlation peaks at r1, r10, r21, and r33. Negative correlation peaks occur at r5, r16, and r17. This confirms that each cycle is 10 - 12 years.

ggAcf(sunspotarea, lag.max = 36) + ggtitle("sunspotarea autocorrelation plot")

gasoline

These plots have indicated that the series has an annual seasonality and has a mostly upwards trend. No evidence of cyclic behavior was seen.

Time Plot

The time plot shows:

  • Annual seasonality
  • Generally upwards trend
  • No evidence of cyclic behavior

theme_set(theme_light(base_size = 12))
p <- autoplot(gasoline) + ggtitle("gasoline time plot")
ggplotly(p)

Season Plot

The seasonal plot confirms the annual seasonality (the pattern repeats across the weeks of each year) and the upwards trend. It also reveals a peak in weeks 30 - 39 and a trough during weeks 5 - 11.

ggseasonplot(gasoline, polar = TRUE) + ggtitle("gasoline season plot")

Season Subseries Plot

Since there are not exactly 52 weeks in a year, the frequency is non-integer and the plot produces an error.

#ggsubseriesplot(gasoline, frequency = 52) + ggtitle("gasoline season subseries plot")
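The non-integer frequency is easy to check directly. The sketch below uses a synthetic weekly series (random values, not the gasoline data) with frequency 365.25 / 7, and then shows one possible workaround, which is an assumption rather than something tested here: re-indexing the values with an integer frequency of 52 before plotting.

```r
# Synthetic weekly series (random values, not the gasoline data)
set.seed(1)
weekly <- ts(rnorm(520), start = 1991, frequency = 365.25 / 7)

f <- frequency(weekly)
cat("frequency:", f, "\n")                      # roughly 52.18
cat("integer frequency?", f %% 1 == 0, "\n")

# Possible workaround (an approximation): re-index with frequency 52.
# Each "year" is then 364 days, so the seasonal alignment drifts by
# about 1.25 days per year.
weekly52 <- ts(as.numeric(weekly), start = 1991, frequency = 52)
cat("re-indexed frequency:", frequency(weekly52), "\n")
```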

Lag Plot

The lag plot shows the strongest linear relationship at lag 1, and the relationship weakens as the lag increases. This suggests the series depends mainly on the previous week of data.

gglagplot(gasoline) + ggtitle("gasoline lag plot")

Correlogram

The autocorrelation plot has ~r52 and ~r104 higher than the other lags, indicating an annual seasonal pattern in the data. There are troughs at ~r26 and ~r78, which further confirms annual seasonality.

ggAcf(gasoline) + ggtitle("gasoline autocorrelation plot")

Notes

What can be forecast?

The predictability of an event or quantity depends on several factors:

  1. how well we understand the factors that contribute to it
  2. how much data is available
  3. whether the forecasts can affect the thing we are trying to forecast

Forecasting methods depend on what data are available and the predictability of the quantity to be forecast. Methods include:

  1. Naive method - Using the most recent observation as a forecast.
  2. Judgemental forecasting - For when historical data is not available. More accurate when the forecaster has important domain knowledge and more up-to-date data.
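The naive method mentioned above can be sketched in a few lines of base R. The helper naive_forecast() and the history values are hypothetical, written only for illustration; the forecast package (loaded by fpp2) provides its own naive() function for real use.

```r
# Hypothetical helper: repeat the last observed value h steps ahead
naive_forecast <- function(y, h = 1) {
  rep(y[length(y)], h)
}

y  <- c(102, 98, 105, 110, 107)   # made-up history
fc <- naive_forecast(y, h = 3)    # every forecast equals the last observation
print(fc)
```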

Forecasting, planning, and goals

Forecasting - Predicting the future as accurately as possible, using historical data and knowledge of future events. May be short-term (e.g. scheduling, demand), medium-term (e.g. resource requirements), or long-term (e.g. strategic planning).

Goals - Events that we would like to happen.

Planning - A response to forecasts and goals.

Determining what to forecast

  • What is the forecasting horizon?
  • What forecasting frequency is needed?
  • What data is needed for the forecast?

Forecasting data and methods

  • Qualitative forecasting - Methods needed when data is not available or is not relevant.
  • Quantitative forecasting - Can be applied when numerical information about the past is available and it is reasonable to assume that past patterns will continue.
  • Time series model - Uses only information on the variable itself, not factors that affect its behavior (e.g. decomposition, exponential smoothing, ARIMA)
  • Explanatory model - A model with predictor variables
  • Mixed models - Combine the features of both models

The basic steps in a forecasting task

  1. Problem definition (how will forecasts be used and by whom?)
  2. Gathering information (statistical data and expertise)
  3. Exploratory analysis (are there patterns/trend/seasonality/business cycle? are there relationships among variables?)
  4. Choosing and fitting models (compare multiple potential models, depends on data availability and strength of relationships between forecast variable and explanatory variables)
  5. Using and evaluating a forecasting model

The statistical forecasting perspective

  • Consider a forecast to be a random variable
  • The probability distribution of the random variable is the forecast distribution
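One way to make the forecast distribution concrete is to simulate it. The sketch below rests on assumptions the notes do not specify: a naive (random-walk) model for a synthetic series, and resampling of past one-step changes to generate possible next values.

```r
set.seed(42)
y     <- cumsum(rnorm(200))   # synthetic random-walk history
steps <- diff(y)              # observed one-step changes

# Simulate 2000 possible values of the next observation by adding a
# resampled historical change to the last observed value
sims <- y[length(y)] + sample(steps, 2000, replace = TRUE)

point_forecast <- mean(sims)                    # centre of the simulated distribution
interval_80    <- quantile(sims, c(0.10, 0.90)) # 80% prediction interval
print(point_forecast)
print(interval_80)
```

The point forecast is a summary of the simulated distribution, and the interval expresses the uncertainty around it.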

ts objects

Time series can be stored as a ts object in R.

  • Frequency of a time series - number of observations before the seasonal pattern repeats (e.g. 1 for annual, 4 for quarterly, 12 for monthly, 52 for weekly)

Time plots

  • Use autoplot in R to visualize

Time series patterns

  • Trend - long-term increase or decrease in the data
  • Seasonal - time series affected by seasonal factors such as time of year or day of week
  • Cyclic - fluctuations in data that are not of a fixed frequency

Seasonal plots

Seasonal plots show data plotted against individual seasons to help better identify underlying seasonal patterns and see where the pattern changes. Option - set polar = TRUE to see plot on polar coordinates.

Seasonal subseries plots

Plot where the data for each season are collected in separate mini time plots.
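The per-season averages that a subseries plot draws attention to can be computed directly with base R. This sketch uses a synthetic monthly series whose seasonal pattern is made up (trough in February, peak in December) so the result is easy to check.

```r
# Synthetic monthly series: mild trend plus a fixed seasonal pattern
# (pattern values are made up: trough in February, peak in December)
pattern <- c(0, -10, -2, 0, 1, 2, 1, 0, 1, 2, 5, 20)
tt <- 1:120
y  <- ts(0.1 * tt + rep(pattern, 10), start = c(2000, 1), frequency = 12)

# Average of each month across all years: one value per mini time plot
month_means <- tapply(y, cycle(y), mean)
print(round(month_means, 1))

cat("peak month:", which.max(month_means),
    " trough month:", which.min(month_means), "\n")
```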

Scatterplots

Used to see relationships between predictor variables.

  • Facet by a feature
  • Find correlation between two time series
  • Correlation matrix
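A correlation matrix can be computed with base R's cor(). The sketch below uses three synthetic series whose names (sales, gdp, ads) are illustrative only; sales is constructed to move with gdp, while ads is unrelated noise.

```r
set.seed(123)
n     <- 100
gdp   <- cumsum(rnorm(n, mean = 0.5))   # synthetic upward-drifting series
sales <- 2 * gdp + rnorm(n, sd = 3)     # constructed to move with gdp
ads   <- rnorm(n)                       # unrelated noise

series <- cbind(sales = sales, gdp = gdp, ads = ads)
cm <- cor(series)                       # pairwise Pearson correlations
print(round(cm, 2))
```

Correlation only measures linear association, so a scatterplot is still worth inspecting alongside the matrix.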

Lag plots

Note - A lag plot graphs each observation against the observation k time periods earlier (its kth lag). Helpful in identifying seasonality.
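The pairing behind a single panel of a lag plot can be built by hand. This minimal sketch uses base R's built-in AirPassengers series as a stand-in and a lag of k = 12; the variable names are illustrative.

```r
y <- as.numeric(AirPassengers)   # base R dataset, used as a stand-in
k <- 12
n <- length(y)

# Pair each observation with the one k periods earlier
current <- y[(k + 1):n]
lagged  <- y[1:(n - k)]

# Strength of the linear relationship that the lag-k panel would display
print(cor(current, lagged))
```

Calling plot(lagged, current) on these vectors would give a rough base R counterpart of one lag plot panel.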

Autocorrelation

Autocorrelation measures the linear relationship between lagged values of a time series.

  • When data have a trend, autocorrelations for small lags tend to be large and positive
  • When data are seasonal, autocorrelations will be larger at multiples of the seasonal frequency
  • When data are both trended and seasonal, autocorrelations show a combination of these effects
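For reference, the sample autocorrelation at lag k is commonly defined as r_k = Σ_{t=k+1..T} (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1..T} (y_t − ȳ)². The sketch below implements that definition in a hypothetical helper, autocorr(), and compares it with base R's acf() on the built-in AirPassengers series.

```r
# Hypothetical helper implementing r_k directly from its definition
autocorr <- function(y, k) {
  y    <- as.numeric(y)
  n    <- length(y)
  ybar <- mean(y)
  num  <- sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar))
  den  <- sum((y - ybar)^2)
  num / den
}

y <- AirPassengers
manual  <- sapply(1:3, function(k) autocorr(y, k))
builtin <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]   # skip lag 0
print(rbind(manual, builtin))
```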

White noise

White noise - time series that show no autocorrelation.
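White noise is easy to simulate, which gives a baseline for reading a correlogram. The sketch below assumes Gaussian noise and the approximate 95% bounds ±1.96/√T; for white noise, roughly 95% of the sample autocorrelations would be expected to fall inside those bounds.

```r
set.seed(2024)
n  <- 500
wn <- rnorm(n)                                     # simulated white noise

r     <- acf(wn, lag.max = 25, plot = FALSE)$acf[-1]   # lags 1..25
bound <- qnorm(0.975) / sqrt(n)                    # approximate 95% bounds

inside <- mean(abs(r) <= bound)                    # share of lags within the bounds
cat("share of lags inside the bounds:", inside, "\n")
```

If many more than 5% of the spikes fall outside the bounds, the series is probably not white noise.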