Author

J Sigma

1. Introduction to Time Series Analysis

1.1. Time Series vs Cross-Sectional Data

ImportantDefinition (Cross-Sectional Data)

Cross-sectional data is data that is observed or measured at one point in time.

So far, the statistical methods and techniques we have used are examples of cross sectional data.

As an example, suppose we are trying to model the relationship between how well students perform in Applied Statistics based on their lecture attendance. We can measure this by the final mark of the student. Since this is a point estimate, it is an example of cross-sectional data

ImportantDefinition (Time Series)

A time series is a sequence of observations collected at regular, equally-spaced intervals over a period of time.

Time series are extremely common in everyday life. Here are a few examples of fields which use time series:

  • Business

    Sales figures, production numbers, and customer frequencies are measurements over time, and so are examples of time series

  • Economics

    Stock prices, exchange rates, and interest rates are examples of time series

  • Official Statistics

    Census data, personal expenditures and road casualties can be measured over time, and so are examples of time series

1.2. Describing Time Series Data

To describe time series, or get a feel for the data we are looking at, we consider four things:

  1. Trend

    This describes the long-term movement of the data, and we consider two things:

    1. whether the trend is increasing or decreasing (direction) over time; and
    2. whether the trend is linear or non-linear
  2. Seasonality

    Seasonality refers to patterns that repeat at regular intervals throughout the series. These patterns usually repeat themselves during the same calendar periods and last for a fixed amount of time through these periods

    WarningExample

    We may notice that the sale of alcoholic beverage spikes up in December of every year. We may infer that this is in keeping with the fact that it is a festive season.

    Furthermore, we may notice that this spike dies down around February, and this would be in-keeping with the idea that many people return to their daily working routines during this time.

    The pattern may repeat

    • every hour mark for hourly data
    • every new day for daily data
    • every month for monthly data
    • every quarter for quarterly data

    or any variation of these frequencies, e.g., every third quarter each year.

  3. Cyclical Movement

    These are longer term rises and falls. They are irregular, and are often (quite obviously) bigger deviations from the general trend than seasonal tendencies, and random variation.

    WarningExample

    An example of a cyclical movement is the housing crisis of 2008, which caused a stock-market crash. What makes this a “cycle” is the fact that markets have recovered since then – there was a fall followed by a rise.

    The duration of cyclical movements in a time series is not fixed. So, some cyclical movements are longer than others. Usually, a full cycle is 2 years long, but can last up to 10 years.

  4. Random Variation

    These are just unexpected noise or shocks in the time series

    Note

    There is always random variation in a time series. In addition, a time series may include one, two or all of the other three components.

WarningExample One

The following is a time series data set capturing the number of monthly airline passengers over a span of over 10 years, from 1949 to 1961. We can observe the data to see whether there is a trend, seasonality, cyclicality and random variation.

Code
######################################################################
# MONTHLY AIR PASSANGERS DATA
######################################################################

# Built-in dataset
data("AirPassengers")

# Convert to time series object
ts_data <- AirPassengers


# Plot raw observations
plot(ts_data,
     main = "Monthly Airline Passengers",
     ylab = "Passengers",
     xlab = "Year")

(1) We can see that we have an increasing trend. There is no clear evidence that the trend is non-linear, and so we can – generally – describe this a a linear trend.

(2) We can see that close to the end of every year into the beginning of the new year, we have a spike in the monthly airline passengers, and then we have a dip in the middle of the year. So, we clearly have seasonality in this data set.

(3) There are no big deviations from the true trend. Yes, we may observe big spikes, especially closer to the end, but these don’t break the overall pattern that we see in the data. So, there is no cyclicality.

(4) Random variation is present, as is always the case.

WarningExample Two

The following data set contains daily closing values of major European stock market indices over a fixed period. It includes the:

  • DAX (German Stock Index – Top 40)

  • SMI (Switzerland)

  • CAC (France)

  • FTSE (UK – Top 100)

We will look at the time series for the German Index

Code
############################################################
# EU STOCK MARKET DATA — SINGLE SERIES (DAX ONLY)
############################################################

data("EuStockMarkets")

# Extract one series
dax <- EuStockMarkets[, "DAX"]

# Convert to time series object (already ts, but explicit for clarity)
dax_ts <- ts(dax)

############################################################
# 1. RAW OBSERVATIONS
############################################################

plot(dax_ts,
     main = "DAX Index (Daily Closing Prices)",
     ylab = "Index Value",
     xlab = "Days")

(1) Here, we can see an increasing trend over time. We can clearly see that we have a non-linear (exponential) trend.

(2) There are no regularly repeating patterns. So, we do not have seasonality.

(3) There are also no big deviations from the overall upward and exponential trend. So, there is no cyclical movement in the time series.

(4) There is random variation.

WarningExample Three

This data set shows the number of driver deaths in the UK over a fixed historical period (1970s-1980s range).

Code
############################################################
# UK DRIVER DEATHS
############################################################

data("UKDriverDeaths")

ts_data <- UKDriverDeaths

# Raw series

plot(ts_data,
     main = "UK Driver Deaths (Monthly)",
     ylab = "Deaths",
     xlab = "Year")

(1) We can see that, although it is small, there is a gradual decline in the number of deaths over time. and that this is a linear decline.

(2) There is a strong annual pattern of higher deaths compared to other times in the year. So, we have seasonality in this time series.

(3) There are no big deviations from the declining trend and seasonality. So, there is no cyclicality.

(4) There is random variation in the time series.

WarningExample Four

The following data set contains annual records of the number of lynx trapped in Canada from 1821 to 1934.

Code
############################################################
# LYNX DATASET
############################################################

data("lynx")

ts_data <- lynx

# Basic plot
plot(ts_data,
     main = "Canadian Lynx Trappings (1821–1934)",
     ylab = "Number of lynx",
     xlab = "Year")

(1) There is no strong long-term trend. This time series oscillates around a relatively stable mean.

(2) The data set does not exhibit seasonality since the fluctuations are not tied to a fixed calendar period. The patterns observed are irregular in timing.

(3) The time series has a dominant cyclical behaviour as the number of lynx rise and fall around every decade.

(4) There is random variation.

1.3. Why Time Series Analysis?

Time series analysis is a collection of statistical techniques which attempt to isolate and quantify the influence of events and changes in conditions in order to build a model that utilises this information in order to:

  • To understand how things change over time

  • To forecast future changes

  • To model dependence over time

    Standard inferential techniques which assume independence do not work well with data that is collected at regular or equally-spaced time intervals since the observations are likely dependent.

    This is where time series shine.

    ImportantDefinition (Autocorrelation)

    Autocorrelation at lag \(g\) occurs when there is dependence between observations which are \(g\) time periods apart

    Note

    A lag refers to the number of time periods between observations

  • To detect unusual behaviour; and

  • To support decision-making

1.4. Basic Assumption Underlying Time Series Forecasting

TipAssumption of Time Series Forecasting

The factors that influenced patterns of activity in the past or present will continue to do so in – more or less – the same manner in the future.

So, the overall purpose of time series analysis is to identify and isolate these factors and use them to understand the time series behaviour, and be able to predict future behaviour.

2. Two Fundamental Time Series Concepts: Autocorrelation and Stationarity

2.1. Autocorrelation

Suppose we are given data \(X_{1}, X_{2}, \dots, X_{T}\) with \(T\) being the number of observations in the time series.

Up until now, we have assumed that these observations are independent. That is to say that the sample is random and there is no relationship between any two observations. This is a nice property and we can use it to derive a lot of useful results.

However, this does not hold for all time series data. For time series, observations are taken from the same variable, and recorded at equally-spaced time intervals. This means that, generally, observations between two time periods may influence one another. This is autocorrelation (or serial correlation).

WarningExample

We may consider the amount of time a student studies Applied Statistics per week. We may observe that a student studies the subject a lot in one week, and that they even go far so as to study some of the following week’s content. As a result, they may prioritise this subject a little bit less since they will still be up to date if they happen to “fall behind” and may prioritise other subjects.

In this way, the amount of hours they spend on this subject on a weekly basis may be influenced by how much work the have done in the previous week.

The inferential techniques we have used so far usually break time when observations are taken over time intervals because of the dependence in the time series data. They fail to capture any trends in the data and so are often not appropriate for modelling the time series.

Most time series are dependent. The autocorrelation represents useful information about patterns in the data that a time series model can use to make forecasts.

2.2. Stationarity

ImportantDefinition (Stationarity)

A time series is said to be stationary if

  1. The time series has a constant mean
  2. The variability of the time series is constant over time

We can qualitatively measure whether a time series is stationary by observing the following

  1. If there is a trend or not

    A trend in a time series suggests that the average values will increase or decrease over time from lag to lag. If there is no trend, then the mean is constant.

  2. If there is divergence in the time series or not

    If a time series diverges over time, then the variability betwen the observations gets larger over time, and so is not constant.

Visually:

Code
#########################################
# CONSTANT VS NON- CONSTANT MEAN
#########################################

set.seed(1)

par(mfrow = c(2,2))  # 2x2 layout

n <- 200
t <- 1:n

# 1. Constant mean (stationary)
y1 <- rnorm(n, mean = 0, sd = 1)
plot(ts(y1), main = "Constant Mean", ylab = "")

# 2. Changing mean (trend)
y2 <- 0.05*t + rnorm(n)
plot(ts(y2), main = "Changing Mean (Trend)", ylab = "")

#######################################
# CONSTANT VS NON-CONSTANT VARIATION
#######################################

# 3. Constant variance
y3 <- rnorm(n, mean = 0, sd = 1)
plot(ts(y3), main = "Constant Variance", ylab = "")

# 4. Changing variance (heteroskedastic)
y4 <- rnorm(n, mean = 0, sd = t/50)
plot(ts(y4), main = "Changing Variance", ylab = "")

Note

Most time series we encounter are non-stationary and we often need to transform them so that they exhibit stationarity. These transformations help us get a clear read into the data.

3. Time Series Graphics

3.1. Time Series Plot

ImportantDefinition (Time Series Plot)

A time series plot is a line graphof the observed data \(y_{t}\) against time \(t\).

The first step in time series analysis is to observe an patterns any patterns that have occurred over time using a time series plot. It enables us to detect and describe patterns , factors or components of the past behaviour of the series.

The identified components help in finding a suitable statistical model to describe the data, which the allows us to forecast future values of the time series.

3.2. Components of an Non-Stationary Time Series

As we have discussed, there are four components of a non-stationary time series. These are:

  • Trend

  • Cyclical Movement

  • Seasonality; and

  • Random Variation

From a time series plot, we need to be able to identify and describe these components.

4. General Time Series Model and Moving Average Smoothing

4.1. Time Series Model

There are many models that are available for forecasting a time series. Before getting into the models, it is important to note the following:

Note
  • The explanatory variable, \(t\), is often recorded from its real world values to make the equations easier to interpret

    For example, we may assign \(t=1\) for the first time period (year, quarter, week, or day), and ll successive periods will be assigned in increasing integer increments

  • With a time series, we are modelling the dependent variable, \(y_{t}\), based on past values of itself, without using any other explanatory variables to explain any change in \(y_{t}\). So, we understand past behaviour to forecast future values.

There are three main reasons why we prefer time series to other statistical techniques, particularly those which use explanatory variables to explain changes in a given factor:

  1. The process generating the data might not be understood, and so it may be difficult to measure the relationship giving rise to the behaviour. So, we look at the data for what it is, and not what it is caused by.
  2. Explanatory models require knowledge of various predictors in order to forecast future values.
  3. Time series may give more accurate forecasts than explanatory models.

4.1.1. Additive Time Series Model

The simplest assumption about the relationship between the components of a time series is that they are additive and independent of each other

We have the following:

ImportantAdditive Model

\[ Y_{t}=T_{t}+C_{t}+S_{t}+R_{t} \]

where

  • \(t\) is the time period we are interested in

  • \(Y_{t}\) is the observed value of the time series at \(t\)

  • \(T_{t}\) is the trend component at time \(t\)

  • \(C_{t}\) is the cyclical component at time \(t\)

  • \(S_{t}\) is the seasonal component at time \(t\); and

  • \(R_{t}\) is the random component at time \(t\)

4.1.2. Multiplicative Time Series Model

Alternatively, we assume that the four components of a time series are not necessarily independent, and can affect one another. This is the multiplicative time series model. We have that

Important

\[ Y_{t}=T_{t}\times C_{t}\times S_{t}\times R_{t} \]

Here, we can see that the observed time series value, \(Y_{t}\) at time \(t\) is a product of the sour time series components.

The multiplicative model is the most preferred forecasting model, because,

  1. It can be made additive. We can take the logarithms of the left and right hand side of the time series and observe that

    \[ \ln{(Y_{t})}=\ln{(T_{t})}+\ln{(C_{t})}+\ln{(S_{t})}+\ln{(R_{t})}\]

  2. If we consider any component, the the other components can be interpreted as indexes relative to that component. We can make statements like

    “At time \(t\), the seasonal effect is \(1.2\), meaning that the series is \(20\%\) above the baseline trend at that period”

The additive model is most appropriate when the size of the seasonal fluctuations remain roughly constant over time. Otherwise, a multiplicative model is preferred.

Question: How can be better understand current patterns in our time series?

Answer: It is often difficult to identify components of a time series by simply graphing the series over time, because of too much random variation.

In the event where there is no obvious pattern, moving average smoothing can be used to smooth out the noise from the time series so that we may get a clearer idea of the nature of the time series components. This helps us build better models for more accurate forecasts.

4.2. Moving Average Smoothing

Suppose we have data like \(\{100, 130, 95, 140, 105\}\). It is hard to tell whether the series is actually increasing or decreasing, or just fluctuating randomly. A moving average replaces each value an avergae of nearby values (since values which are close to one another will have roughly the same value). So, we may end up with something like \(\{108, 113, 121, 115, 120, 135\}\) which gives us a much clearer idea of what the time series is doing over time: increasing.

Moving average smoothing reduces short-term randomness to give us a clear idea of the behaviour of the time series.

4.2.1. Computing Moving Averages

We use the notation \(MA(k)\) to denote the moving average, where \(k\) gives the number of time periods over which average values are taken. We have the following process

  1. The first moving average of a sequence \(y_{1}, y_{2}, \dots, y_{T}\) is obtained by calculating the average of the first \(k\) consecutive values \(y_{1}, y_{2}, \dots, y_{k}\)

  2. The second is obtained by taking the avergage of the next batch of \(k\) values \(y_{2}, y_{3}, \dots, y_{k}, y_{k+1}\)

  3. This process continues until the last batch of \(k\) consecutive values is computed. We will end up with

    \[ T-k+1 \]

    smoothed values

WarningWorked Example (Finding Moving Averages)

Suppose we want to calculate MA(3) values for the following data set

Year t Revenue MA(3)
2003 1 34
2004 2 12 \[ \frac{34+12+67}{3}=37.67 \]
2005 3 67 \[ \frac{12+67+87}{3}=55.33 \]
2006 4 87 \[ \frac{67+87+22}{3}=58.67 \]
2007 5 22 \[ 58.33 \]
2008 6 66 \[ 55 \]
2009 7 77 \[ 77.67 \]
2010 8 90 \[ 67 \]
2011 9 34 \[ 48.67 \]
2012 10 22

Notice that the moving average is assigned to the middle period for each window. What if, instead, we wanted to find MA(4) values? Now, we don’t have a “middle” period in the same sense as when we had an MA(3). This doesn’t mean that we assign the values to periods that are not in our already defined time periods. We have the following:

Year t Revenue MA(4) CMA(4)
2003 1 34
2004 2 12 \[ \frac{34+12+67+87}{4}=50 \]
2005 3 67 \[ \frac{12+67+87+22}{4}=47 \] \[ \frac{50+47}{2}=48.5 \]
2006 4 87 \[ 60.5 \] \[ \frac{47+60.5}{2}=53.75 \]
2007 5 22 \[ 63 \] \[ 61.75 \]
2008 6 66 \[ 63.75 \] \[ 63.375 \]
2009 7 77 \[ 66.75 \] \[ 65.250 \]
2010 8 90 \[ 55.75 \] \[ 61.250 \]
2011 9 34
2012 10 22

So, in the case where we have an even number of periods to consider per window, we do two things:

  1. We calculate a moving average, and assign it to the period before where the middle of the period occurs
  2. Then we calculate a centered moving average between consecutive middle values and assign it to the period occurring after the middle period

4.2.2. Choosing \(k\)

\(k\) is subjectively chosen. However so, it much be chosen in such a way that it minimises the fluctuations as best as possible. The guideline for choosing \(k\) is as follows:

  • \(k=4\) for quarterly data

  • \(k=7\) for daily data

  • \(k=12\) for monthly data

and so on. As \(k\) gets bigger, the series becomes smoother. However, if \(k\) is too large, the smoothed series tends towards a straight line which defeats the purpose of a time series model where we have four components. Unless you have reason to believe that trend is the only component present in the time series, this is not really appropriate.

Code
################################################
# MOVING AVERAGES IN R -- AIR PASSANGERS DATA
################################################

library(ggplot2)
library(forecast)

# Note that R does the CMA automatically

autoplot(AirPassengers) +
  autolayer(ma(AirPassengers, 12), series="MA(12)") +
  ggtitle("Air Passangers Data with MA(12) Smoothed Series")

ImportantMoving Averages Advantages and Disadvantages
  • Moving averages are very quick and easy to apply

  • The problem with moving averages is that we lose \(k-1\) observations if \(k\) is odd, and \(k\) observations if \(k\) is even

  • Some values are never considered, particlarly when the fall out of the window for which the moving averages are calculated

  • a larger k \(\implies\) fewer moving averages can be computed which makes it hard to obtain an overall impression of the entire series

5. Time Series Decomposition

We decompose a time series primarily to get a better understanding of how the underlying components are affecting the variable of interest. Before moving forward, we need to understand the notion of seasonal variation

ImportantDefinition (Seasonal Variation)

Seasonal variation refers to the regular and reapeating patterns that occur over fixed time intervals. For example, we can find that ice cream sales increase every December year-on year. Since we have identified a month, we can say that – for monthly data – period 12 always has an increase in the number of sales of ice cream. Similarly, the data may behave in many other ways during different months.

We can say that the length of the seasonal variation is one year, and that it is made of 12 months or 12 “SEASONS”

It follows, then, that an appropriate \(k\) value for MA smoothing would be \(k=12\)

5.1. Classical Decomposition

ImportantThe process of Classical Decomposition
  1. Develop an estimate for the trend component using centred moving averages
  2. De-trend the original time series to isolate the seasonal component
  3. Calculate the seasonal indices and the adjust them if necessary. This gives precise estimates of the seasonal component
  4. Calculate the random component
  5. Deseasonalise the original time series
  6. If the trend is linear, develop a precise estimate of the trend component on the deseasonalised time series
Note

We assume that cylicality, \(C_{t}\) is negligible. So, our time series multiplicative model is given by

\[ Y_{t}=T_{t}\times S_{t} \times R_{t} \]

WarningWorked Example

The quarterly earning (in R millions) of a large drink manufacturer have been recorded for the years 1997 to 2000. We have the following:

and the data are listed below

Year Quarter t Quarterly Earnings
1997 1 1 52
1997 2 2 67
1997 3 3 85
1997 4 4 54
1998 1 5 57
1998 2 6 75
1998 3 7 90
1998 4 8 61
1999 1 9 60
1999 2 10 77
1999 3 11 94
1999 4 12 63
2000 1 13 66
2000 2 14 82
2000 3 15 98
2000 4 16 67

Decompose the time series into its components, estimate the components and then build a model that could be used to forecast future quarterly earnings values.

  1. We first develop an estimate of the trend component in the series by calculating (centred) moving averages (CMA’s) which allow us to smooth the seasonal and some of the random component.

    Year Quarter t Quarterly Earnings MA(4) CMA(4)
    1997 1 1 52
    1997 2 2 67 64.5
    1997 3 3 85 65.75 65.125
    1997 4 4 54 67.75 66.75
    1998 1 5 57 69 68.375
    1998 2 6 75 70.75 69.875
    1998 3 7 90 71.5 71.125
    1998 4 8 61 72 71.75
    1999 1 9 60 73 72.5
    1999 2 10 77 73.5 73.25
    1999 3 11 94 75 74.25
    1999 4 12 63 76.25 75.625
    2000 1 13 66 77.25 76.75
    2000 2 14 82 78.25 77.75
    2000 3 15 98
    2000 4 16 67
  2. We then de-trend the original time series by computing the ratio

    \[ \frac{Y_{t}}{CMA(k)_{t}} \]

    for each time period

    Year Quarter t Quarterly Earnings MA(4) CMA(4) De-trended Data
    1997 1 1 52
    1997 2 2 67 64.5
    1997 3 3 85 65.75 65.125 1.305
    1997 4 4 54 67.75 66.75 0.809
    1998 1 5 57 69 68.375 0.834
    1998 2 6 75 70.75 69.875 1.073
    1998 3 7 90 71.5 71.125 1.265
    1998 4 8 61 72 71.75 0.85
    1999 1 9 60 73 72.5 0.828
    1999 2 10 77 73.5 73.25 1.051
    1999 3 11 94 75 74.25 1.266
    1999 4 12 63 76.25 75.625 0.833
    2000 1 13 66 77.25 76.75 0.86
    2000 2 14 82 78.25 77.75 1.055
    2000 3 15 98
    2000 4 16 67
  3. We calculate the seasonal tendencies, then adjust them if necessary

    To do this, we group the de-trended data by the corresponding “season” – i.e., matching quarters in this case – and take the average of the data values in each season to get the seasonal indices

    Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
    1997 1.305 0.809
    1998 0.834 1.073 1.265 0.85
    1999 0.828 1.051 1.266 0.833
    2000 0.86 1.055
    Seasonal Index (SI) 0.841 1.060 1.279 0.831

    We then sum the seasonal indices. The sum of the seasonal indices should match the value of \(m=\text{number of seasons}\). If they don’t, we multiply each index by a correction factor, and

    \[\frac{m}{\sum(\text{seasonal indices})}\]

    Year Quarter 1 Quarter 2 Quarter 3 Quarter 4 Total
    1997 1.305 0.809
    1998 0.834 1.073 1.265 0.85
    1999 0.828 1.051 1.266 0.833
    2000 0.86 1.055
    Seasonal Index (SI) 0.841 1.060 1.279 0.831 4.011
    Adjusted SI 0.839 1.057 1.275 0.829 4

    We can interpret the seasonal indices as follows:

    \(S_{t}>1\) implies that, on average, values in the season are \(100\times|S_{t}-1|\%\) above the seasonal average, and \(S_{t}<1\) implies that, on average, values in that season are \(100\times |S_{t}-1|\%\) below the seasonal average. In our example:

    \(S_{1}=0.839\) implies that, on average, earnings in the first quarter are \(100\times|0.839-1|=16.1\%\) below the annual average earnings. \(S_{3}=1.275\) implies that, on average, earnings in the third quarter are \(100\times|1.275-1|=27.5\%\) above the annual average earnings.

  4. We compute the random component by dividing the original time series by the product of the estimates of the trend and seasonal components

    \[ R_{t}=\frac{Y_{t}}{\hat{S_{t}\times CMA(k)_{t}}} \]

    Year Quarter t Quarterly Earnings MA(4) CMA(4) De-trended Data Adjusted Seasonal Index Random Component
    1997 1 1 52
    1997 2 2 67 64.5
    1997 3 3 85 65.75 65.125 1.305 1.275 1.024
    1997 4 4 54 67.75 66.75 0.809 0.829 0.976
    1998 1 5 57 69 68.375 0.834 0.839 0.994
    1998 2 6 75 70.75 69.875 1.073 1.057 1.015
    1998 3 7 90 71.5 71.125 1.265 1.275 0.992
    1998 4 8 61 72 71.75 0.85 0.829 1.026
    1999 1 9 60 73 72.5 0.828 0.839 0.986
    1999 2 10 77 73.5 73.25 1.051 1.057 0.995
    1999 3 11 94 75 74.25 1.266 1.275 0.993
    1999 4 12 63 76.25 75.625 0.833 0.829 1.005
    2000 1 13 66 77.25 76.75 0.86 0.839 1.025
    2000 2 14 82 78.25 77.75 1.055 1.057 0.998
    2000 3 15 98
    2000 4 16 67
  5. We then deseasonalise the original time series which results in a smoother, seasonally-adjusted time series which makes it easier to estimate the trend component. We deseasonalise the time series as follows:

    \[ \frac{Y_{t}}{\hat{S}_{t}}=\frac{T_{t}\times S_{t}\times R_{t}}{\hat{S}_{t}}\approx T_{t}\times R_{t}=(DS)_{t}=T_{t}+\epsilon_{t} \]

    Note

    \(\hat{S}{t}\approx S_{t}\), and so we can just cancel these out to get an estimate of the deseasonalised time series.

    Year Quarter t Quarterly Earnings MA(4) CMA(4) De-trended Data Adjusted Seasonal Index Random Component Seasonally-Adjusted Time Series
    1997 1 1 52 0.839 61.979
    1997 2 2 67 64.5 1.057 63.387
    1997 3 3 85 65.75 65.125 1.305 1.275 1.024 66.667
    1997 4 4 54 67.75 66.75 0.809 0.829 0.976 65.139
    1998 1 5 57 69 68.375 0.834 0.839 0.994 67.938
    1998 2 6 75 70.75 69.875 1.073 1.057 1.015 70.956
    1998 3 7 90 71.5 71.125 1.265 1.275 0.992 70.588
    1998 4 8 61 72 71.75 0.85 0.829 1.026 73.583
    1999 1 9 60 73 72.5 0.828 0.839 0.986 71.514
    1999 2 10 77 73.5 73.25 1.051 1.057 0.995 72.848
    1999 3 11 94 75 74.25 1.266 1.275 0.993 73.725
    1999 4 12 63 76.25 75.625 0.833 0.829 1.005 75.995
    2000 1 13 66 77.25 76.75 0.86 0.839 1.025 78.665
    2000 2 14 82 78.25 77.75 1.055 1.057 0.998 77.578
    2000 3 15 98 1.275 76.863
    2000 4 16 67 0.829 80.820

    We obtain the following

If the deseasonalised time series looks linear, we can regress the deseasonalised data on time \(t\):

\[ \hat{T}_{t}=\hat{\beta_{0}}+\hat{\beta}_{1}t \]

This will provide us with a good estimate for the trend component. So, we conduct a simple linear regression analysis on the deseasonalised seasonaly-adjusted time series. Our multiplicative model for forecasting future values then becomes

\[ \hat{Y}_{t}=\hat{T}_{t}\times \hat{S}_{t}=(\hat{\beta}_{0}+\hat{\beta}_{1}t) \times S_{t} \]

5.1.1. Drawbacks of Classical Decomposition

ImportantDrawbacks of Classical Decomposition
  • The estimate of the trend cannot be found for the first and last few observations due to moving average smoothing used to estimate the trend

  • The trend estimate tends to over-smooth rapid rises and falls in the data

  • The method assumes that seasonal components repeat from year to year. While this is a reasonable assumption for some time series, it is not good for longer tie series. So, it fails to capture seasonal changes over time

  • Classical decomposition is not robust to unusual values (like short term big rises and drops which do not conform to the pattern of the time series as a whole)

5.1.2. Classical Decomposition in R

Code
###################################################
# CLASSICAL DECOMPOSITION OF AIR PASSANGERS DATA
###################################################

# reading in data 
data("AirPassengers")

# decomposition 
air_decomp <- decompose(AirPassengers, type="multiplicative")

# Note, there's also an "additive" option, but we use the multiplicative
# model

# plot decompositon
plot(air_decomp)

Code
# summary
raw <- print(air_decomp) #raw data for each component
$x
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

$seasonal
           Jan       Feb       Mar       Apr       May       Jun       Jul
1949 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1950 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1951 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1952 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1953 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1954 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1955 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1956 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1957 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1958 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1959 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
1960 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
           Aug       Sep       Oct       Nov       Dec
1949 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1950 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1951 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1952 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1953 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1954 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1955 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1956 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1957 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1958 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1959 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244
1960 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244

$trend
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
1949       NA       NA       NA       NA       NA       NA 126.7917 127.2500
1950 131.2500 133.0833 134.9167 136.4167 137.4167 138.7500 140.9167 143.1667
1951 157.1250 159.5417 161.8333 164.1250 166.6667 169.0833 171.2500 173.5833
1952 183.1250 186.2083 189.0417 191.2917 193.5833 195.8333 198.0417 199.7500
1953 215.8333 218.5000 220.9167 222.9167 224.0833 224.7083 225.3333 225.3333
1954 228.0000 230.4583 232.2500 233.9167 235.6250 237.7500 240.5000 243.9583
1955 261.8333 266.6667 271.1250 275.2083 278.5000 281.9583 285.7500 289.3333
1956 309.9583 314.4167 318.6250 321.7500 324.5000 327.0833 329.5417 331.8333
1957 348.2500 353.0000 357.6250 361.3750 364.5000 367.1667 369.4583 371.2083
1958 375.2500 377.9167 379.5000 380.0000 380.7083 380.9583 381.8333 383.6667
1959 402.5417 407.1667 411.8750 416.3333 420.5000 425.5000 430.7083 435.1250
1960 456.3333 461.3750 465.2083 469.3333 472.7500 475.0417       NA       NA
          Sep      Oct      Nov      Dec
1949 127.9583 128.5833 129.0000 129.7500
1950 145.7083 148.4167 151.5417 154.7083
1951 175.4583 176.8333 178.0417 180.1667
1952 202.2083 206.2500 210.4167 213.3750
1953 224.9583 224.5833 224.4583 225.5417
1954 247.1667 250.2500 253.5000 257.1250
1955 293.2500 297.1667 301.0000 305.4583
1956 334.4583 337.5417 340.5417 344.0833
1957 372.1667 372.4167 372.7500 373.6250
1958 386.5000 390.3333 394.7083 398.6250
1959 437.7083 440.9583 445.8333 450.6250
1960       NA       NA       NA       NA

$random
           Jan       Feb       Mar       Apr       May       Jun       Jul
1949        NA        NA        NA        NA        NA        NA 0.9516643
1950 0.9626030 1.0714668 1.0374474 1.0140476 0.9269030 0.9650406 0.9835566
1951 1.0138446 1.0640180 1.0918541 1.0176651 1.0515825 0.9460444 0.9474041
1952 1.0258814 1.0939696 1.0134734 0.9695596 0.9632673 1.0003735 0.9468562
1953 0.9976684 1.0151646 1.0604644 1.0802327 1.0413329 0.9718056 0.9551933
1954 0.9829785 0.9232032 1.0044417 0.9943899 1.0119479 0.9978740 1.0237753
1955 1.0154046 0.9888241 0.9775844 1.0015732 0.9878755 1.0039635 1.0385512
1956 1.0066157 0.9970250 0.9876248 0.9968224 0.9985644 1.0275560 1.0217685
1957 0.9937293 0.9649918 0.9881769 0.9867637 0.9924177 1.0328601 1.0261250
1958 0.9954212 0.9522762 0.9469115 0.9383993 0.9715785 1.0261340 1.0483841
1959 0.9825176 0.9505736 0.9785278 0.9746440 1.0177637 0.9968613 1.0373136
1960 1.0039279 0.9590794 0.8940857 1.0064948 1.0173588 1.0120790        NA
           Aug       Sep       Oct       Nov       Dec
1949 0.9534014 1.0022198 1.0040278 1.0062701 1.0118119
1950 0.9733720 1.0225047 0.9721928 0.9389527 1.0067914
1951 0.9397599 0.9888637 0.9938809 1.0235337 1.0250824
1952 0.9931171 0.9746302 1.0046687 1.0202797 1.0115407
1953 0.9894989 0.9934337 1.0192680 1.0009392 0.9915039
1954 0.9845184 0.9881036 0.9927613 0.9995143 0.9908692
1955 0.9831117 1.0032501 1.0003084 0.9827720 1.0125535
1956 1.0004765 1.0008730 0.9835071 0.9932761 0.9894251
1957 1.0312668 1.0236147 1.0108432 1.0212995 1.0005263
1958 1.0789695 0.9856540 0.9977971 0.9802940 0.9405687
1959 1.0531001 0.9974447 1.0013371 1.0134608 0.9999192
1960        NA        NA        NA        NA        NA

$figure
 [1] 0.9102304 0.8836253 1.0073663 0.9759060 0.9813780 1.1127758 1.2265555
 [8] 1.2199110 1.0604919 0.9217572 0.8011781 0.8988244

$type
[1] "multiplicative"

attr(,"class")
[1] "decomposed.ts"

As we can see, R automatically isolates each of the components of the multiplicativve model so that we may understand the behaviour of the data better

6. The Forecaster’s Toolbox

There are some forecasting methods that are simple, yet remarkably effecctve for some time series. We always fit simple forecasting methods to use as ‘benchmarks’ against which we compare the other more complex methods. If it happens that the more complex methods do not yield better results, then there is no need to use it for the time series being analysed

We consider four methods:

  1. the average method
  2. the naive method
  3. the seasonal naive method; and
  4. the drift method

6.2. Simple Forecasting Methods

6.2.1. The Average Method

The average method forecasts the future values as the mean of the historical data. The notation \(\hat{y}_{T+h}|T\) is a short-hand for the estimate of \(y_{T+h}\) based on the data \(y_{1}, \dots, y_{T}\)

Code
############################
# THE AVERAGE METHOD IN R
############################

library(forecast)

avg_forecast <- meanf(AirPassengers, h=12)
plot(avg_forecast, 
     main="Average Method Forecast",
     xlab="Year",
     ylab="Passengers")

Code
# h is the forecast horizon (time steps to forecast ahead)

6.2.2. Naive Method

The naive method forecasts all future values as the value of the last observation

Code
######################
# NAIVE METHOD IN R
######################

nforecast <- naive(AirPassengers, 12) 
# or nforecast <- rwf(AirPassengers, 12)

plot(nforecast)

6.2.3. Seasonal Naive Method

The seasonal naive method is used when the time series has a strong seasonal component. the forecast value for a particular season is the corresponding forecasted value from the previous season. So,

\[ \hat{y}_{T+h|T}=y_{T+h-m(k+1)} \]

Code
##############################
# SEASONAL NAIVE METHOD IN R
##############################

snforecast <- snaive(AirPassengers, 24)
plot(snforecast)

6.2.4. Drift Method

The drift method is an extension to the naive method which allows for the presence of a linear trend in the data. The amount of change over time (called the drift) is equal to the average change in the historical data.

\[ \hat{y}_{T+h|T}=y_{T}=\frac{h}{T-1}\sum_{t=2}^{T}(y_{t}-y_{t-1})=y_{T}+h\left(\frac{y_{T}-y_{1}}{T-1}\right) \]

Code
######################
# DRIFT METHOD IN R
######################

drift_forecast <- rwf(AirPassengers, 24, drift=TRUE)
plot(drift_forecast)

6.3. Forecasting and Forecasting Accuracy

6.3.1. Forecasting with Simple Forecasting Methods

WarningWorked Example One
Code
#########################################
# FORECASTING AUSTRALIAN BEER PRODUCTION
#########################################

library(fpp2)
library(ggplot2)
library(urca)
library(forecast)

data("ausbeer")

austrain <- window(ausbeer, start=c(1994, 1), end=c(2010, 4)) # partitions data

autoplot(austrain) +
  
  autolayer(meanf(austrain, h = 12),
            series = "Average Method",
            PI = FALSE) +
  
  autolayer(naive(austrain, h = 12),
            series = "Naive Method",
            PI = FALSE) +
  
  autolayer(snaive(austrain, h = 12),
            series = "Seasonal Naive",
            PI = FALSE) +
  
  autolayer(rwf(austrain, h = 12, drift = TRUE),
            series = "Drift Method",
            PI = FALSE) +
  
  ggtitle("Forecasts for Quarterly Beer Production") +
  xlab("Year") +
  ylab("Megalitres") +
  guides(colour = guide_legend(title = "Forecast")) 

We can see from these forecasts that, in the short-term, the seasonal naive method would be the best forecasting method to use for the time series. However, in the long-term, the drift method is probably better because of the downtrend present in the data.

The seasonal naive method does not preserve the trend in the data

WarningWorked Example Two
Code
###############################
# FORECASTING GOOGLE PRICE
###############################

library(fpp2)
library(fpp3)
library(forecast)
library(ggplot2)
library(urca)

data("goog200")

autoplot(goog200) +
  autolayer(meanf(goog200, h =90),
            series = "Average Method",
            PI = FALSE) +
  
  autolayer(snaive(goog200, h = 90),
            series = "Seasonal Naive Method",
            PI = FALSE) +
  
  autolayer(naive(goog200, h = 90),
            series = "Naive Method",
            PI = FALSE) +
  
  autolayer(rwf(goog200, h = 90, drift = TRUE),
            series = "Drift Method",
            PI = FALSE) +
  ggtitle("Google Stock (Daily, ending 6 December 2013")+
  xlab("Closing Price (US$)") + ylab("Day") +
  guides(colour=guide_legend(title="Forecasts"))

There is a note to make here about the seasonal naive method. Looking at the time series, we can see that there is very little seasonality (if there is, at all, any). As a result, when we try to forecast using the seasonal naive method, its behaviour is the same as the naive method. In our example, the seasonal naive method is printed, but since it is the same as the naive method, and the naive method is layered on top of it, you cannot see it visually.

The method that will provide the most accurate long-term forecast here is drift. This is due to the trend in the data. Short-term, we may argue that the naive method would produce the most accurate forecast.

6.3.2. Forecasting Accuracy

Before forecasting into the future, we try to see if our model well approximates the current data that we have. To do this, we split our data into a training and a test data set. The training data set is the data that we train our model on. This is the part of the data that the model sees. The test data set is then used to confirm whether or not the model we have built is accurate enough. The test set is only used once.

Note

In the real world, it is often wise to split the data into three sets: a training set, a validation set, and a test set.

The validation set acts as the test data set as we have defined above. However, a validation set allows us to tweak our model as much as we can to that we have a sufficiently accurate model (in the case where it is not accurate enough). Then, we use the test set to see if our data is truly accurate.

The reason we do this is because we do not want to go straight to the test data set, and then try to tweak our data so that it fits the test data set. This would result in a biased model that runs a risk of not forecasting accurately. At the same time, we do not want to lose data. So, the validation set is there to serve as the point where we can tweak our model as much as we want, and see if this will accurately forecast the test data.

When more than one forecasting method is available (as is usually the case), we can determine which method will result in the best forecasting accuracy using the mean absolute error (MAE) and the mean squared error (MSE). We have that,

\[MAE=\sum_{t=1}^{T}\frac{|Y_{t}-\hat{Y}_{t}|}{T} \quad \text{and} \quad MSE=\sum_{t=1}^{T}\frac{(Y_{t}-\hat{Y}_{t})^{2}}{T}\]

where

  • \(\hat{Y}_{t}\) is the forecasted value in period \(t\)

  • \(Y_{t}\) is the actual observation in period \(t\)

  • \(T\) is the number of time periods

  • \({e}_{t}=Y_{t}-\hat{Y}_{t}\) is the forecast error

ImportantMean Absolute Error

MAE is a measure of the average of the abolute differences between the actual and fitted values for a give set of data.

If the model predicts the data perfectly, \(MAE=0\), and it grows (relative to the data range) with poor data prediction.

ImportantMean Squared Error

MSE is a measure of the average squared differences between the actual and fitted values for a give set of data.

If the model predicts the data perfectly, \(MSE=0\). Otherwise, the MSE will be large (relative to the data range).

  • If it is important to avoid (even a few) large errors, the MSE is ideal because it punishes large deviations more heavily (by squaring) than the MAE. Otherwise, use MAE.

  • Note that MAE and MSE are both scale-dependent accuracy measures, i.e. they can only be used to compare models fitted on data with the same scale.

    For example, you cannot use these to comapare data where the the observed values a measured in 10000’s vs data where the observed values are measured in 0.0001’s

  • When comparing models using forecasting accuracy, we prefer models with smaller MAE and MSE

WarningForecasting Accuracy
Code
########################
# FORECASTING ACCURACY
########################

library(fpp2)
library(ggplot2)
library(urca)
library(forecast)

data("ausbeer")

# splitting the data

austrain <- window(ausbeer, start=1992, end=c(2007, 4)) # training set
austest <- window(ausbeer, start=2008) # testing set (rest of data)

# train forecasting models

ausfit1 <- meanf(austrain, h=8)
ausfit2 <- naive(austrain, h=8)
ausfit3 <- snaive(austrain, h=8)
ausfit4 <- rwf(austrain, h=8, drift=TRUE)

# testing the accuracy against the testing set 

accuracy(ausfit1, austest)
                 ME     RMSE      MAE        MPE     MAPE     MASE        ACF1
Training set  0.000 43.62858 35.23438 -0.9365102 7.886776 2.463942 -0.10915105
Test set     -6.875 36.32858 33.18750 -2.2656485 7.657554 2.320804 -0.07153733
             Theil's U
Training set        NA
Test set     0.7809768
Code
accuracy(ausfit2, austest)
                      ME     RMSE      MAE         MPE     MAPE     MASE
Training set   0.4761905 65.31511 54.73016  -0.9162496 12.16415 3.827284
Test set     -44.5000000 57.03289 52.00000 -11.1034206 12.64031 3.636364
                    ACF1 Theil's U
Training set -0.24098292        NA
Test set     -0.07153733  1.165502
Code
accuracy(ausfit3, austest)
                    ME     RMSE  MAE        MPE     MAPE     MASE         ACF1
Training set -2.133333 16.78193 14.3 -0.5537713 3.313685 1.000000 -0.287633300
Test set      9.250000 14.99166 14.0  2.1277563 3.267315 0.979021 -0.002750337
             Theil's U
Training set        NA
Test set     0.3265497
Code
accuracy(ausfit4, austest)
                        ME     RMSE      MAE        MPE     MAPE     MASE
Training set -5.414981e-15 65.31337 54.76795  -1.026695 12.17879 3.829927
Test set     -4.664286e+01 58.47025 52.71429 -11.598935 12.84308 3.686314
                    ACF1 Theil's U
Training set -0.24098292        NA
Test set     -0.08248707  1.196957

Notice that the MSE is not readily given by R. Instead, it gives RMSE. Recall that

\[ RMSE=\sqrt{MSE}\iff RMSE^{2}=MSE \]

But, we could still compare the RMSEs if \(RMSE>1\) for all the forecasting methods. Otherwise, if at least one of them has a value less that \(1\), then you need to square first. From our example, we can see that the seasonal naive method gives the highest forecasting accuracy since it has the lowest RMSE and MAE values.

6.3.3. Prediction Intervals

ImportantDefinition (Prediction Intervals)

A prediction interval provides an interval within which we expect the forecasted values to lier with a specified probability.

For example, assuming that the forecast errors are normally ditributred, a \(95\%\) prediction interval for the \(h\) step is given by

\[ \hat{y}_{t+h|T}\pm 1.96\hat{\sigma_{h}} \]

where \(\hat{\sigma}_{h}\) is the estimated standard deviation of the \(h\)-step forecast distribution. Usually, \(\hat{\sigma}_{h}\) is set equal to the standard deviation of the residuals for one-step ahead forecasts.

Generally, we write that

\[ \hat{y}_{t+h|T} \pm c\hat{\sigma}_{h} \]

where \(c\) depends on the coverage probability.

WarningWorked Example
Code
#################################################
# FORECASTING GOOGLE PRICE: PREDICTION INTERVALS
#################################################

library(fpp2)
library(fpp3)
library(forecast)
library(ggplot2)
library(urca)

data("goog200")

fit <- naive(goog200, h=10, level=c(80, 95))
fit
    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
201       531.4783 523.5222 539.4343 519.3105 543.6460
202       531.4783 520.2267 542.7298 514.2705 548.6861
203       531.4783 517.6980 545.2586 510.4031 552.5534
204       531.4783 515.5661 547.3904 507.1428 555.8138
205       531.4783 513.6880 549.2686 504.2704 558.6862
206       531.4783 511.9900 550.9666 501.6735 561.2830
207       531.4783 510.4285 552.5280 499.2854 563.6711
208       531.4783 508.9751 553.9814 497.0627 565.8939
209       531.4783 507.6101 555.3465 494.9750 567.9815
210       531.4783 506.3190 556.6375 493.0005 569.9561

This table gives the lower and upper bounds of the prediction intervals for the forecast values.

Suppose we are given that the standard deviation of the residuals from the naive method is \(6.21\). Then, we could calculate the \(95\%\) prediction interval as

\[ \hat{y}_{t+h|T}=531.48\pm1.96(6.21)=[519.3;543.6] \]

as appears in the table.

7. Transformations

Most time series method rely on the idea that the underlying probabilistic behavior stays stable over time. So, stationarity is a strong condition for time series. It means:

  • the mean does not change over time

  • the variance does not change over time

  • the autocorrelation does not change over time

We want the time series to exhibit this so that we can make accurate decisions about the future. This can only happen if, statistically, the past behaves like the future. We have already seen that not all time series are stationary.

However, we can produce stationary time series by performing transformations. We consider ARIMA models. We usually start transforming time series by applying variance stabilising transformations if they are necessary, before removing trend and/or seasonality.

7.1. Stabilising Transformations

7.1.1. Stabilising the Variance

If we observe that the variance of a time series increases or decreases with time, then we need a transformation which will stabilise the variance. Variance-stabilising transformations which are useful include:

  • logarithmic transformations

  • square root transformations

  • cube root transformations

  • fourth root transformations

Note

The graphical representation of a time series is a good way of getting an idea of what the time series is not. However, it doesn’t tell you what it is.

For example, we may know that a time series is not stationary by looking at it, and noticing that there is some kind of trend in the data. However, we cannot really look at a time series that seems stationary, and make a claim that it, indeed, is stationary. The reason for this is that, often, we have no idea what the process generating the data is.

Consider this: if someone is drunk, you definitely can tell that they are not sober. However, if another person looks sober, you cannot completely tell by just looking at them. In a similar way, a time series plot that has a clear uptrend or downtrend, and/or changing variance is not stationary. However, a time series that looks stationary isn’t always is.

We will asses changes in variation for a time series graphically, and consider time series where it is clear that the variance is changing or not. So, although it is useful to know the notion above, we ask – for now – that you do away with the idea.

As we said before, there are many different transformations that can be used to stabilise the variance. How do we decide which one to use?

The logarithm, square root and cube root transformations form part of a broader family called the Box-Cox Transformations that depend on the parameter, \(\lambda\):

\[ w_{t}=\begin{cases} \ln{(y_{t})} & \text{if } \lambda=0\\ (\lambda w_{t}+1)^{\frac{1}{\lambda}} & \text{otherwise} \end{cases} \]

To which transformation will be used, we need to find the value of \(\lambda\) first.

  • \(\lambda=1\) means that no variance stabilising is needed

  • \(\lambda=0\) as is given in the equation, mean that a logarithmic transformation is needed

  • \(\lambda=0.25\) results in a transformation that is close to a 4-th root approximation

  • \(\lambda=0.33\) results in a cube root transformation

  • \(\lambda=0.5\) results in a square root approximation

WarningWorked Example
Code
##############################################
# BOX-COX TRANSFORMATION (AUSSIE ELECTRICITY)
##############################################

library(fpp2)
library(fpp3)
library(ggplot2)
library(urca)
library(forecast)

data("elec")

autoplot(elec) +
  ggtitle("Australian Monthly Electricity Demand") +
  ylab("GWh")

Code
# lambda 

(lambda <- BoxCox.lambda(elec))
[1] 0.2654076

Before having even calculated the value for \(\lambda\) here, we could have seen that some transformation would be required because the variation in the data grows with time. We then get that \(\lambda=0.2654...\) which is close to \(\lambda=0.25\). This means that the Box-Cox transformation that will be applied to the data here is a 4-th root transformation.

We can, then, plot the transformed time series using the value for \(\lambda\) we found.

Code
autoplot(BoxCox(elec, lambda))

The data looks to have a more constant variance after having applied the transformation.

WarningWorked Example Two
Code
##############################################
# BOX-COX TRANSFORMATION - AIR PASSENGER DATA
##############################################

library(ggplot2)
library(forecast)
library(urca)
library(fpp2)
library(fpp3)

# raw time series

data("AirPassengers")
autoplot(AirPassengers) +
  ggtitle("Passenger Bookings")

Code
# lambda value

(lmda <- BoxCox.lambda(AirPassengers))
[1] -0.2947156
Code
# transformation 

autoplot(BoxCox(AirPassengers, lmda))

We can see in this example that the time series has an increasing variation with time. The \(\lambda\) value obtained was \(-0.2947...\), and we used this to transform the data.

7.1.2. Stabilising the Trend (Via Differencing)

If a time series has a trend, it is non-stationary because the mean will be changing with time as a result of the presence of the trend.

ImportantKey Idea

Differencing is used when there is only a l;inear trend and random variation (i.e. no seasonality), and the variance appears to be constant

First-order differencing at lag 1 is performed as:

\[ w_{t}=y_{t}-y_{t-1} \quad t=2,3,\dots, T \]

As a result of this, we lose one data point as the difference for the first observation cannot be computed. Second-order differencing can be performed to remove a quadratic trend. Rarely is more than secon-order differencing performed in practice.

Note

Since we do not deal with non-linear trends in this course, we do not consider more than one order of differencing

WarningWorked Example One
Code
###############################
# DIFFERENCING TO REMOVE TREND
###############################

library(ggplot2)
library(forecast)
library(urca)
library(fpp2)
library(fpp3)

# raw time series
data("goog200")

goog <- goog200
autoplot(goog) +
  ggtitle("Google Stock Price")

Code
# first-order differencing at lag 1

diff_g <- diff(goog, lag=1, differences=1)

autoplot(diff_g)+
  ggtitle("Google Stock Price Differencing at lag 1")

We can see here that, except for the spike around Day 160, the data has been de-trended. Also, we can see that the variance of the differenced series is relatively constant. So, this series is now likely stationary.

7.1.3. Stabilising the Seasonality (Via Differencing)

For many time series, seasonal effects are very common. What this implies is that the time series will be non-stationary since the mean will be changing due to seasonality. So, we need to stabilise seasonality.

We can use an appropriate order for differencing to remove the seasonal effects. The first-order difference with lag \(g\) is obtained by:

\[ w_{t}=y_{t}-y_{t-g} \quad t=g+1, g+2, \dots, T \]

Where \(g\) is the number of seasons within the seasonal variation of the data. The conclusion regarding which lag to use can also be guided by the raw data. If we can see that the data is recorded in months for a series of years, then the \(g=12\). If we have quarterly data, \(g=4\) instead.

WarningWorked Example
Code
########################################
# REMOVING SEASONALITY: AIRPASSENGERS 
########################################

library(forecast)
library(fpp2)
library(fpp3)
library(ggplot2)
library(urca)

data("AirPassengers")

ap <- AirPassengers

# raw time series 

autoplot(ap) +
  ggtitle("Passnger Bookings") +
  ylab("Number of Bookings")

Code
# differenced time series

seas_diff <- diff(ap, lag=12, differences=1)

autoplot(seas_diff)+
  ggtitle("First-Order Differenced Passanger Booking (Lag = 12)")

From this. we can see that there is not apparent seasonality. But this didn’t make our data any easier to read. We note that there is still some trend in our data, and the is also some variance. So, our time series is not stationery, even after the differencing we performed. So, a better approached may have been to first remove the trend and the variance before taking seasonal differencing.

Code
######################
# REMOVING VARIANCE
######################

#lambda 

(lambda <- BoxCox.lambda(ap))
[1] -0.2947156
Code
# transformation

apv <- BoxCox(ap, lambda)

autoplot(apv) +
  ggtitle("Passenger Bookings (Without Variance)") +
  ylab("Adjusted Number of Bookings")

The next step would be to remove the trend.

Code
########################
# REMOVING THE TREND
########################

#lambda 

lambda <- BoxCox.lambda(ap)

# transformation

apv <- BoxCox(ap, lambda)

# first-order differencing to remove trend

apvt <- diff(apv, lag=1, differences=1)

autoplot(apvt) +
  ggtitle("De-Trended Passenger Bookings Without Variance") +
  ylab("Adjusted Number of Bookings")

Since we’ve dealt with both the variance and the trend, it is time to deal with the seasonality.

Code
########################
# REMOVING SEASONALITY
########################

#lambda 

lambda <- BoxCox.lambda(ap)

# transformation

apv <- BoxCox(ap, lambda)

# first-order differencing to remove trend

apvt <- diff(apv, lag=1, differences=1)

# seasonally differenced time series

seas_diff <- diff(apvt, lag=12, differences=1)

autoplot(seas_diff)+
  ggtitle("First-Order Differenced Passanger Booking (Lag = 12)")