---
title: "Time Series"
author: "J Sigma"
editor: source
format:
html:
css: styles.css
toc: true
toc-depth: 3
number-sections: false
theme: cosmo
code-fold: true
code-tools: true
smooth-scroll: true
embed-resources: true
page-navigation: true
pdf:
documentclass: article
toc: true
number-sections: false
execute:
engine: knitr
echo: true
warning: false
message: false
---
# 1. Introduction to Time Series Analysis
## 1.1. Time Series vs Cross-Sectional Data
::: {.callout-important title="Definition (Cross-Sectional Data)"}
**Cross-sectional data** is data that is observed or measured at one point in time.
:::
So far, the statistical methods and techniques we have used are examples of cross sectional data.
As an example, suppose we are trying to model the relationship between how well students perform in Applied Statistics based on their lecture attendance. We can measure this by the final mark of the student. Since this is a point estimate, it is an example of cross-sectional data
::: {.callout-important title="Definition (Time Series)"}
A **time series** is a sequence of observations collected at regular, equally-spaced intervals over a period of time.
:::
Time series are extremely common in everyday life. Here are a few examples of fields which use time series:
- **Business**
Sales figures, production numbers, and customer frequencies are measurements over time, and so are examples of time series
- **Economics**
Stock prices, exchange rates, and interest rates are examples of time series
- **Official Statistics**
Census data, personal expenditures and road casualties can be measured over time, and so are examples of time series
## 1.2. Describing Time Series Data
To describe time series, or get a feel for the data we are looking at, we consider four things:
(1) **Trend**
This describes the long-term movement of the data, and we consider two things:
(a) whether the trend is **increasing or decreasing** **(direction)** over time; and
(b) whether the trend is **linear or non-linear**
(2) **Seasonality**
Seasonality refers to patterns that repeat at regular intervals **throughout the series**. These patterns usually repeat themselves during the same calendar periods and last for a fixed amount of time through these periods
::: {.callout-warning title="Example" icon="false"}
We may notice that the sale of alcoholic beverage spikes up in December of every year. We may infer that this is in keeping with the fact that it is a festive season.
Furthermore, we may notice that this spike dies down around February, and this would be in-keeping with the idea that many people return to their daily working routines during this time.
:::
The pattern may repeat
- every hour mark for hourly data
- every new day for daily data
- every month for monthly data
- every quarter for quarterly data
or any variation of these frequencies, e.g., every third quarter each year.
(3) **Cyclical Movement**
These are longer term **rises and falls**. They are irregular, and are often (quite obviously) bigger deviations from the general trend than seasonal tendencies, and random variation.
::: {.callout-warning title="Example" icon="false"}
An example of a cyclical movement is the housing crisis of 2008, which caused a stock-market crash. What makes this a "cycle" is the fact that markets have recovered since then – there was a fall followed by a rise.
:::
The duration of cyclical movements in a time series is not fixed. So, some cyclical movements are longer than others. Usually, a full cycle is 2 years long, but can last up to 10 years.
(4) **Random Variation**
These are just unexpected noise or shocks in the time series
::: callout-note
There is **always** random variation in a time series. In addition, a time series may include one, two or all of the other three components.
:::
::: {.callout-warning title="Example One" icon="false"}
The following is a time series data set capturing the number of monthly airline passengers over a span of over 10 years, from 1949 to 1961. We can observe the data to see whether there is a trend, seasonality, cyclicality and random variation.
```{r}
######################################################################
# MONTHLY AIR PASSANGERS DATA
######################################################################
# Built-in dataset
data("AirPassengers")
# Convert to time series object
ts_data <- AirPassengers
# Plot raw observations
plot(ts_data,
main = "Monthly Airline Passengers",
ylab = "Passengers",
xlab = "Year")
```
**(1)** We can see that we have an increasing trend. There is no clear evidence that the trend is non-linear, and so we can – generally – describe this a a linear trend.
**(2)** We can see that close to the end of every year into the beginning of the new year, we have a spike in the monthly airline passengers, and then we have a dip in the middle of the year. So, we clearly have seasonality in this data set.
**(3)** There are no big deviations from the true trend. Yes, we may observe big spikes, especially closer to the end, but these don't break the overall pattern that we see in the data. So, there is no cyclicality.
**(4)** Random variation is present, as is always the case.
:::
::: {.callout-warning title="Example Two" icon="false"}
The following data set contains daily closing values of major European stock market indices over a fixed period. It includes the:
- **DAX (German Stock Index – Top 40)**
- **SMI (Switzerland)**
- **CAC (France)**
- **FTSE (UK – Top 100)**
We will look at the time series for the German Index
```{r}
############################################################
# EU STOCK MARKET DATA — SINGLE SERIES (DAX ONLY)
############################################################
data("EuStockMarkets")
# Extract one series
dax <- EuStockMarkets[, "DAX"]
# Convert to time series object (already ts, but explicit for clarity)
dax_ts <- ts(dax)
############################################################
# 1. RAW OBSERVATIONS
############################################################
plot(dax_ts,
main = "DAX Index (Daily Closing Prices)",
ylab = "Index Value",
xlab = "Days")
```
**(1)** Here, we can see an increasing trend over time. We can clearly see that we have a non-linear (exponential) trend.
**(2)** There are no regularly repeating patterns. So, we do not have seasonality.
**(3)** There are also no big deviations from the overall upward and exponential trend. So, there is no cyclical movement in the time series.
**(4)** There is random variation.
:::
::: {.callout-warning title="Example Three" icon="false"}
This data set shows the number of driver deaths in the UK over a fixed historical period (1970s-1980s range).
```{r}
############################################################
# UK DRIVER DEATHS
############################################################
data("UKDriverDeaths")
ts_data <- UKDriverDeaths
# Raw series
plot(ts_data,
main = "UK Driver Deaths (Monthly)",
ylab = "Deaths",
xlab = "Year")
```
**(1)** We can see that, although it is small, there is a gradual decline in the number of deaths over time. and that this is a linear decline.
**(2)** There is a strong annual pattern of higher deaths compared to other times in the year. So, we have seasonality in this time series.
**(3)** There are no big deviations from the declining trend and seasonality. So, there is no cyclicality.
**(4)** There is random variation in the time series.
:::
::: {.callout-warning title="Example Four" icon="false"}
The following data set contains annual records of the number of lynx trapped in Canada from 1821 to 1934.
```{r}
############################################################
# LYNX DATASET
############################################################
data("lynx")
ts_data <- lynx
# Basic plot
plot(ts_data,
main = "Canadian Lynx Trappings (1821–1934)",
ylab = "Number of lynx",
xlab = "Year")
```
**(1)** There is no strong long-term trend. This time series oscillates around a relatively stable mean.
**(2)** The data set does not exhibit seasonality since the fluctuations are not tied to a fixed calendar period. The patterns observed are irregular in timing.
**(3)** The time series has a dominant cyclical behaviour as the number of lynx rise and fall around every decade.
**(4)** There is random variation.
:::
## 1.3. Why Time Series Analysis?
Time series analysis is a collection of statistical techniques which attempt to isolate and quantify the influence of events and changes in conditions in order to build a model that utilises this information in order to:
- To understand how things change over time
- To forecast future changes
- To model dependence over time
Standard inferential techniques **which assume independence** do not work well with data that is collected at regular or equally-spaced time intervals since the observations are likely dependent.
This is where time series shine.
::: {.callout-important title="Definition (Autocorrelation)"}
**Autocorrelation at lag** $g$ occurs when there is dependence between observations which are $g$ time periods apart
:::
::: callout-note
A **lag** refers to the number of time periods between observations
:::
- To detect unusual behaviour; and
- To support decision-making
## 1.4. Basic Assumption Underlying Time Series Forecasting
::: {.callout-tip title="Assumption of Time Series Forecasting" icon="false"}
The factors that influenced patterns of activity in the past or present will continue to do so in – more or less – the same manner in the future.
:::
So, the overall purpose of time series analysis is to identify and isolate these factors and use them to understand the time series behaviour, and be able to predict future behaviour.
# 2. Two Fundamental Time Series Concepts: Autocorrelation and Stationarity
## 2.1. Autocorrelation
Suppose we are given data $X_{1}, X_{2}, \dots, X_{T}$ with $T$ being the number of observations in the time series.
Up until now, we have assumed that these observations are independent. That is to say that the sample is **random** and there is no relationship between any two observations. This is a nice property and we can use it to derive a lot of useful results.
However, this does not hold for all time series data. For time series, observations are taken from the same variable, and recorded at equally-spaced time intervals. This means that, generally, observations between two time periods may influence one another. This is autocorrelation (or **serial correlation**).
::: {.callout-warning title="Example" icon="false"}
We may consider the amount of time a student studies *Applied Statistics* per week. We may observe that a student studies the subject a lot in one week, and that they even go far so as to study some of the following week's content. As a result, they may prioritise this subject a little bit less since they will still be up to date if they happen to "fall behind" and may prioritise other subjects.
In this way, the amount of hours they spend on this subject on a weekly basis may be influenced by how much work the have done in the previous week.
:::
The inferential techniques we have used so far usually break time when observations are taken over time intervals because of the dependence in the time series data. They fail to capture any trends in the data and so are often not appropriate for modelling the time series.
Most time series are dependent. The autocorrelation represents useful information about patterns in the data that a time series model can use to make forecasts.
## 2.2. Stationarity
::: {.callout-important title="Definition (Stationarity)"}
A time series is said to be **stationary** if
(1) The time series has a **constant mean**
(2) The **variability** of the time series **is constant over time**
:::
We can qualitatively measure whether a time series is stationary by observing the following
(a) **If there is a trend or not**
A trend in a time series suggests that the average values will increase or decrease over time from lag to lag. If there is no trend, then the mean is constant.
(b) **If there is divergence in the time series or not**
If a time series diverges over time, then the variability betwen the observations gets larger over time, and so is not constant.
Visually:
```{r}
#########################################
# CONSTANT VS NON- CONSTANT MEAN
#########################################
set.seed(1)
par(mfrow = c(2,2)) # 2x2 layout
n <- 200
t <- 1:n
# 1. Constant mean (stationary)
y1 <- rnorm(n, mean = 0, sd = 1)
plot(ts(y1), main = "Constant Mean", ylab = "")
# 2. Changing mean (trend)
y2 <- 0.05*t + rnorm(n)
plot(ts(y2), main = "Changing Mean (Trend)", ylab = "")
#######################################
# CONSTANT VS NON-CONSTANT VARIATION
#######################################
# 3. Constant variance
y3 <- rnorm(n, mean = 0, sd = 1)
plot(ts(y3), main = "Constant Variance", ylab = "")
# 4. Changing variance (heteroskedastic)
y4 <- rnorm(n, mean = 0, sd = t/50)
plot(ts(y4), main = "Changing Variance", ylab = "")
```
::: callout-note
Most time series we encounter are non-stationary and we often need to transform them so that they exhibit stationarity. These transformations help us get a clear read into the data.
:::
# 3. Time Series Graphics
## 3.1. Time Series Plot
::: {.callout-important title="Definition (Time Series Plot)"}
A **time series plot** is a line graphof the observed data $y_{t}$ against time $t$.
:::
The first step in time series analysis is to observe an patterns any patterns that have occurred over time using a time series plot. It enables us to detect and describe patterns , factors or **components** of the past behaviour of the series.
The identified **components** help in finding a suitable statistical model to describe the data, which the allows us to forecast future values of the time series.
## 3.2. Components of an Non-Stationary Time Series
As we have discussed, there are four components of a non-stationary time series. These are:
- **Trend**
- **Cyclical Movement**
- **Seasonality**; and
- **Random Variation**
From a time series plot, we need to be able to identify and describe these components.
# 4. General Time Series Model and Moving Average Smoothing
## 4.1. Time Series Model
There are many models that are available for forecasting a time series. Before getting into the models, it is important to note the following:
::: callout-note
- The explanatory variable, $t$, is often recorded from its real world values to make the equations easier to interpret
For example, we may assign $t=1$ for the first time period (year, quarter, week, or day), and ll successive periods will be assigned in increasing integer increments
- With a time series, we are modelling the dependent variable, $y_{t}$, based on past values of itself, without using any other explanatory variables to explain any change in $y_{t}$. So, we understand past behaviour to forecast future values.
There are three main reasons why we prefer time series to other statistical techniques, particularly those which use explanatory variables to explain changes in a given factor:
1. The process generating the data might not be understood, and so it may be difficult to measure the relationship giving rise to the behaviour. So, we look at the data for what it is, and not what it is caused by.
2. Explanatory models require knowledge of various predictors in order to forecast future values.
3. Time series may give more accurate forecasts than explanatory models.
:::
### 4.1.1. Additive Time Series Model
The simplest assumption about the relationship between the components of a time series is that they are **additive** and **independent** of each other
We have the following:
::: {.callout-important title="Additive Model" icon="false"}
$$
Y_{t}=T_{t}+C_{t}+S_{t}+R_{t}
$$
where
- $t$ is the time period we are interested in
- $Y_{t}$ is the observed value of the time series at $t$
- $T_{t}$ is the trend component at time $t$
- $C_{t}$ is the cyclical component at time $t$
- $S_{t}$ is the seasonal component at time $t$; and
- $R_{t}$ is the random component at time $t$
:::
### 4.1.2. Multiplicative Time Series Model
Alternatively, we assume that the four components of a time series are not necessarily independent, and can affect one another. This is the **multiplicative time series model**. We have that
::: callout-important
$$
Y_{t}=T_{t}\times C_{t}\times S_{t}\times R_{t}
$$
Here, we can see that the observed time series value, $Y_{t}$ at time $t$ is a product of the sour time series components.
:::
The multiplicative model is the most preferred forecasting model, because,
1. It can be made additive. We can take the logarithms of the left and right hand side of the time series and observe that
$$
\ln{(Y_{t})}=\ln{(T_{t})}+\ln{(C_{t})}+\ln{(S_{t})}+\ln{(R_{t})}$$
2. If we consider any component, the the other components can be interpreted as indexes relative to that component. We can make statements like
"At time $t$, the seasonal effect is $1.2$, meaning that the series is $20\%$ above the baseline trend at that period"
The additive model is most appropriate when the size of the seasonal fluctuations remain roughly constant over time. Otherwise, a multiplicative model is preferred.
***Question: How can be better understand current patterns in our time series?***
***Answer: It is often difficult to identify components of a time series by simply graphing the series over time, because of too much random variation.***
***In the event where there is no obvious pattern, moving average smoothing can be used to smooth out the noise from the time series so that we may get a clearer idea of the nature of the time series components. This helps us build better models for more accurate forecasts.***
## 4.2. Moving Average Smoothing
Suppose we have data like $\{100, 130, 95, 140, 105\}$. It is hard to tell whether the series is actually increasing or decreasing, or just fluctuating randomly. A moving average replaces each value an avergae of nearby values (since values which are close to one another will have roughly the same value). So, we may end up with something like $\{108, 113, 121, 115, 120, 135\}$ which gives us a much clearer idea of what the time series is doing over time: increasing.
<div>
Moving average smoothing reduces short-term randomness to give us a clear idea of the behaviour of the time series.
</div>
### 4.2.1. Computing Moving Averages
We use the notation $MA(k)$ to denote the moving average, where $k$ gives the number of time periods over which average values are taken. We have the following process
1. The first moving average of a sequence $y_{1}, y_{2}, \dots, y_{T}$ is obtained by calculating the average of the first $k$ consecutive values $y_{1}, y_{2}, \dots, y_{k}$
2. The second is obtained by taking the avergage of the next batch of $k$ values $y_{2}, y_{3}, \dots, y_{k}, y_{k+1}$
3. This process continues until the last batch of $k$ consecutive values is computed. We will end up with
$$
T-k+1
$$
smoothed values
::: {.callout-warning title="Worked Example (Finding Moving Averages)"}
Suppose we want to calculate **MA(3)** values for the following data set
| Year | t | Revenue | MA(3) |
|:--------------:|:--------------:|:--------------:|:------------------------:|
| 2003 | 1 | 34 | |
| 2004 | 2 | 12 | $$
\frac{34+12+67}{3}=37.67
$$ |
| 2005 | 3 | 67 | $$
\frac{12+67+87}{3}=55.33
$$ |
| 2006 | 4 | 87 | $$
\frac{67+87+22}{3}=58.67
$$ |
| 2007 | 5 | 22 | $$
58.33
$$ |
| 2008 | 6 | 66 | $$
55
$$ |
| 2009 | 7 | 77 | $$
77.67
$$ |
| 2010 | 8 | 90 | $$
67
$$ |
| 2011 | 9 | 34 | $$
48.67
$$ |
| 2012 | 10 | 22 | |
Notice that the moving average is assigned to the middle period for each window. What if, instead, we wanted to find **MA(4)** values? Now, we don't have a "middle" period in the same sense as when we had an **MA(3)**. This doesn't mean that we assign the values to periods that are not in our already defined time periods. We have the following:
| Year | t | Revenue | MA(4) | CMA(4) |
|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
| 2003 | 1 | 34 | | |
| 2004 | 2 | 12 | $$
\frac{34+12+67+87}{4}=50
$$ | |
| 2005 | 3 | 67 | $$
\frac{12+67+87+22}{4}=47
$$ | $$
\frac{50+47}{2}=48.5
$$ |
| 2006 | 4 | 87 | $$
60.5
$$ | $$
\frac{47+60.5}{2}=53.75
$$ |
| 2007 | 5 | 22 | $$
63
$$ | $$
61.75
$$ |
| 2008 | 6 | 66 | $$
63.75
$$ | $$
63.375
$$ |
| 2009 | 7 | 77 | $$
66.75
$$ | $$
65.250
$$ |
| 2010 | 8 | 90 | $$
55.75
$$ | $$
61.250
$$ |
| 2011 | 9 | 34 | | |
| 2012 | 10 | 22 | | |
So, in the case where we have an even number of periods to consider per window, we do two things:
1. We calculate a **moving average**, and assign it to the period before where the middle of the period occurs
2. Then we calculate a **centered moving average** between consecutive middle values and assign it to the period occurring after the middle period
:::
### 4.2.2. Choosing $k$
$k$ is subjectively chosen. However so, it much be chosen in such a way that it minimises the fluctuations as best as possible. The guideline for choosing $k$ is as follows:
- $k=4$ for quarterly data
- $k=7$ for daily data
- $k=12$ for monthly data
and so on. As $k$ gets bigger, the series becomes smoother. However, if $k$ is too large, the smoothed series tends towards a **straight line** which defeats the purpose of a time series model where we have four components. Unless you have reason to believe that trend is the only component present in the time series, this is not really appropriate.
```{r}
################################################
# MOVING AVERAGES IN R -- AIR PASSANGERS DATA
################################################
library(ggplot2)
library(forecast)
# Note that R does the CMA automatically
autoplot(AirPassengers) +
autolayer(ma(AirPassengers, 12), series="MA(12)") +
ggtitle("Air Passangers Data with MA(12) Smoothed Series")
```
::: {.callout-important title="Moving Averages Advantages and Disadvantages"}
- Moving averages are very quick and easy to apply
- The problem with moving averages is that we lose $k-1$ observations if $k$ is odd, and $k$ observations if $k$ is even
- Some values are never considered, particlarly when the fall out of the window for which the moving averages are calculated
- a larger k $\implies$ fewer moving averages can be computed which makes it hard to obtain an overall impression of the entire series
:::
# 5. Time Series Decomposition
We decompose a time series primarily to get a better understanding of how the underlying components are affecting the variable of interest. Before moving forward, we need to understand the notion of **seasonal variation**
::: {.callout-important title="Definition (Seasonal Variation)"}
**Seasonal variation** refers to the regular and reapeating patterns that occur over fixed time intervals. For example, we can find that ice cream sales increase every December year-on year. Since we have identified a month, we can say that – for monthly data – period 12 always has an increase in the number of sales of ice cream. Similarly, the data may behave in many other ways during different months.
We can say that the **length of the** **seasonal variation is one year, and that it is made of 12 months or 12 "SEASONS"**
It follows, then, that an appropriate $k$ value for MA smoothing would be $k=12$
:::
## 5.1. Classical Decomposition
::: {.callout-important title="The process of Classical Decomposition"}
1. Develop an estimate for the trend component using centred moving averages
2. De-trend the original time series to isolate the seasonal component
3. Calculate the **seasonal indices** and the **adjust them if necessary**. This gives precise estimates of the seasonal component
4. Calculate the random component
5. Deseasonalise the original time series
6. If the trend is linear, develop a precise estimate of the trend component on the deseasonalised time series
:::
::: callout-note
We assume that cylicality, $C_{t}$ is negligible. So, our time series multiplicative model is given by
$$
Y_{t}=T_{t}\times S_{t} \times R_{t}
$$
:::
::: {.callout-warning title="Worked Example"}
The quarterly earning (in R millions) of a large drink manufacturer have been recorded for the years 1997 to 2000. We have the following:
{fig-align="center" width="474"}
and the data are listed below
| Year | Quarter | t | Quarterly Earnings |
|:----:|:-------:|:---:|:------------------:|
| 1997 | 1 | 1 | 52 |
| 1997 | 2 | 2 | 67 |
| 1997 | 3 | 3 | 85 |
| 1997 | 4 | 4 | 54 |
| 1998 | 1 | 5 | 57 |
| 1998 | 2 | 6 | 75 |
| 1998 | 3 | 7 | 90 |
| 1998 | 4 | 8 | 61 |
| 1999 | 1 | 9 | 60 |
| 1999 | 2 | 10 | 77 |
| 1999 | 3 | 11 | 94 |
| 1999 | 4 | 12 | 63 |
| 2000 | 1 | 13 | 66 |
| 2000 | 2 | 14 | 82 |
| 2000 | 3 | 15 | 98 |
| 2000 | 4 | 16 | 67 |
Decompose the time series into its components, estimate the components and then build a model that could be used to forecast future quarterly earnings values.
1. **We first develop an estimate of the trend component** in the series by calculating (centred) moving averages (CMA's) which allow us to smooth the seasonal and some of the random component.
| Year | Quarter | t | Quarterly Earnings | MA(4) | CMA(4) |
|:----:|:-------:|:---:|:------------------:|:-----:|:------:|
| 1997 | 1 | 1 | 52 | | |
| 1997 | 2 | 2 | 67 | 64.5 | |
| 1997 | 3 | 3 | 85 | 65.75 | 65.125 |
| 1997 | 4 | 4 | 54 | 67.75 | 66.75 |
| 1998 | 1 | 5 | 57 | 69 | 68.375 |
| 1998 | 2 | 6 | 75 | 70.75 | 69.875 |
| 1998 | 3 | 7 | 90 | 71.5 | 71.125 |
| 1998 | 4 | 8 | 61 | 72 | 71.75 |
| 1999 | 1 | 9 | 60 | 73 | 72.5 |
| 1999 | 2 | 10 | 77 | 73.5 | 73.25 |
| 1999 | 3 | 11 | 94 | 75 | 74.25 |
| 1999 | 4 | 12 | 63 | 76.25 | 75.625 |
| 2000 | 1 | 13 | 66 | 77.25 | 76.75 |
| 2000 | 2 | 14 | 82 | 78.25 | 77.75 |
| 2000 | 3 | 15 | 98 | | |
| 2000 | 4 | 16 | 67 | | |
2. **We then de-trend the original time series by computing the ratio**
$$
\frac{Y_{t}}{CMA(k)_{t}}
$$
for each time period
| Year | Quarter | t | Quarterly Earnings | MA(4) | CMA(4) | De-trended Data |
|:----:|:-------:|:---:|:------------------:|:-----:|:------:|:---------------:|
| 1997 | 1 | 1 | 52 | | | |
| 1997 | 2 | 2 | 67 | 64.5 | | |
| 1997 | 3 | 3 | 85 | 65.75 | 65.125 | 1.305 |
| 1997 | 4 | 4 | 54 | 67.75 | 66.75 | 0.809 |
| 1998 | 1 | 5 | 57 | 69 | 68.375 | 0.834 |
| 1998 | 2 | 6 | 75 | 70.75 | 69.875 | 1.073 |
| 1998 | 3 | 7 | 90 | 71.5 | 71.125 | 1.265 |
| 1998 | 4 | 8 | 61 | 72 | 71.75 | 0.85 |
| 1999 | 1 | 9 | 60 | 73 | 72.5 | 0.828 |
| 1999 | 2 | 10 | 77 | 73.5 | 73.25 | 1.051 |
| 1999 | 3 | 11 | 94 | 75 | 74.25 | 1.266 |
| 1999 | 4 | 12 | 63 | 76.25 | 75.625 | 0.833 |
| 2000 | 1 | 13 | 66 | 77.25 | 76.75 | 0.86 |
| 2000 | 2 | 14 | 82 | 78.25 | 77.75 | 1.055 |
| 2000 | 3 | 15 | 98 | | | |
| 2000 | 4 | 16 | 67 | | | |
3. **We calculate the seasonal tendencies, then adjust them if necessary**
To do this, we group the de-trended data by the corresponding "season" – i.e., matching quarters in this case – and take the average of the data values in each season to get the **seasonal indices**
| **Year** | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 |
|:-----------------------:|:---------:|:---------:|:---------:|:---------:|
| 1997 | | | 1.305 | 0.809 |
| 1998 | 0.834 | 1.073 | 1.265 | 0.85 |
| 1999 | 0.828 | 1.051 | 1.266 | 0.833 |
| 2000 | 0.86 | 1.055 | | |
| **Seasonal Index (SI)** | **0.841** | **1.060** | **1.279** | **0.831** |
We then sum the seasonal indices. The sum of the seasonal indices should match the value of $m=\text{number of seasons}$. If they don't, we multiply each index by a correction factor, and
$$\frac{m}{\sum(\text{seasonal indices})}$$
| **Year** | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 | Total |
|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| 1997 | | | 1.305 | 0.809 | |
| 1998 | 0.834 | 1.073 | 1.265 | 0.85 | |
| 1999 | 0.828 | 1.051 | 1.266 | 0.833 | |
| 2000 | 0.86 | 1.055 | | | |
| **Seasonal Index (SI)** | **0.841** | **1.060** | **1.279** | **0.831** | **4.011** |
| **Adjusted SI** | **0.839** | **1.057** | **1.275** | **0.829** | **4** |
We can interpret the seasonal indices as follows:
$S_{t}>1$ implies that, on average, values in the season are $100\times|S_{t}-1|\%$ above the seasonal average, and $S_{t}<1$ implies that, on average, values in that season are $100\times |S_{t}-1|\%$ below the seasonal average. In our example:
$S_{1}=0.839$ implies that, on average, earnings in the first quarter are $100\times|0.839-1|=16.1\%$ **below** the annual average earnings. $S_{3}=1.275$ implies that, on average, earnings in the third quarter are $100\times|1.275-1|=27.5\%$ **above** the annual average earnings.
4. **We compute the random component by dividing the original time series by the product of the estimates of the trend and seasonal components**
$$
R_{t}=\frac{Y_{t}}{\hat{S_{t}\times CMA(k)_{t}}}
$$
| Year | Quarter | t | Quarterly Earnings | MA(4) | CMA(4) | De-trended Data | Adjusted Seasonal Index | Random Component |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 1997 | 1 | 1 | 52 | | | | | |
| 1997 | 2 | 2 | 67 | 64.5 | | | | |
| 1997 | 3 | 3 | 85 | 65.75 | 65.125 | 1.305 | 1.275 | 1.024 |
| 1997 | 4 | 4 | 54 | 67.75 | 66.75 | 0.809 | 0.829 | 0.976 |
| 1998 | 1 | 5 | 57 | 69 | 68.375 | 0.834 | 0.839 | 0.994 |
| 1998 | 2 | 6 | 75 | 70.75 | 69.875 | 1.073 | 1.057 | 1.015 |
| 1998 | 3 | 7 | 90 | 71.5 | 71.125 | 1.265 | 1.275 | 0.992 |
| 1998 | 4 | 8 | 61 | 72 | 71.75 | 0.85 | 0.829 | 1.026 |
| 1999 | 1 | 9 | 60 | 73 | 72.5 | 0.828 | 0.839 | 0.986 |
| 1999 | 2 | 10 | 77 | 73.5 | 73.25 | 1.051 | 1.057 | 0.995 |
| 1999 | 3 | 11 | 94 | 75 | 74.25 | 1.266 | 1.275 | 0.993 |
| 1999 | 4 | 12 | 63 | 76.25 | 75.625 | 0.833 | 0.829 | 1.005 |
| 2000 | 1 | 13 | 66 | 77.25 | 76.75 | 0.86 | 0.839 | 1.025 |
| 2000 | 2 | 14 | 82 | 78.25 | 77.75 | 1.055 | 1.057 | 0.998 |
| 2000 | 3 | 15 | 98 | | | | | |
| 2000 | 4 | 16 | 67 | | | | | |
5. **We then deseasonalise the original time series** which results in a smoother, seasonally-adjusted time series which makes it easier to estimate the trend component. We deseasonalise the time series as follows:
$$
\frac{Y_{t}}{\hat{S}_{t}}=\frac{T_{t}\times S_{t}\times R_{t}}{\hat{S}_{t}}\approx T_{t}\times R_{t}=(DS)_{t}=T_{t}+\epsilon_{t}
$$
::: callout-note
$\hat{S}{t}\approx S_{t}$, and so we can just cancel these out to get an estimate of the deseasonalised time series.
:::
| Year | Quarter | t | Quarterly Earnings | MA(4) | CMA(4) | De-trended Data | Adjusted Seasonal Index | Random Component | Seasonally-Adjusted Time Series |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 1997 | 1 | 1 | 52 | | | | 0.839 | | 61.979 |
| 1997 | 2 | 2 | 67 | 64.5 | | | 1.057 | | 63.387 |
| 1997 | 3 | 3 | 85 | 65.75 | 65.125 | 1.305 | 1.275 | 1.024 | 66.667 |
| 1997 | 4 | 4 | 54 | 67.75 | 66.75 | 0.809 | 0.829 | 0.976 | 65.139 |
| 1998 | 1 | 5 | 57 | 69 | 68.375 | 0.834 | 0.839 | 0.994 | 67.938 |
| 1998 | 2 | 6 | 75 | 70.75 | 69.875 | 1.073 | 1.057 | 1.015 | 70.956 |
| 1998 | 3 | 7 | 90 | 71.5 | 71.125 | 1.265 | 1.275 | 0.992 | 70.588 |
| 1998 | 4 | 8 | 61 | 72 | 71.75 | 0.85 | 0.829 | 1.026 | 73.583 |
| 1999 | 1 | 9 | 60 | 73 | 72.5 | 0.828 | 0.839 | 0.986 | 71.514 |
| 1999 | 2 | 10 | 77 | 73.5 | 73.25 | 1.051 | 1.057 | 0.995 | 72.848 |
| 1999 | 3 | 11 | 94 | 75 | 74.25 | 1.266 | 1.275 | 0.993 | 73.725 |
| 1999 | 4 | 12 | 63 | 76.25 | 75.625 | 0.833 | 0.829 | 1.005 | 75.995 |
| 2000 | 1 | 13 | 66 | 77.25 | 76.75 | 0.86 | 0.839 | 1.025 | 78.665 |
| 2000 | 2 | 14 | 82 | 78.25 | 77.75 | 1.055 | 1.057 | 0.998 | 77.578 |
| 2000 | 3 | 15 | 98 | | | | 1.275 | | 76.863 |
| 2000 | 4 | 16 | 67 | | | | 0.829 | | 80.820 |
We obtain the following

If the deseasonalised time series looks linear, we can regress the deseasonalised data on time $t$:
$$
\hat{T}_{t}=\hat{\beta_{0}}+\hat{\beta}_{1}t
$$
This will provide us with a good estimate for the trend component. So, we conduct a simple linear regression analysis on the deseasonalised seasonaly-adjusted time series. Our multiplicative model for forecasting future values then becomes
$$
\hat{Y}_{t}=\hat{T}_{t}\times \hat{S}_{t}=(\hat{\beta}_{0}+\hat{\beta}_{1}t) \times S_{t}
$$
:::
### 5.1.1. Drawbacks of Classical Decomposition
::: {.callout-important title="Drawbacks of Classical Decomposition"}
- The estimate of the trend cannot be found for the first and last few observations due to moving average smoothing used to estimate the trend
- The trend estimate tends to over-smooth rapid rises and falls in the data
- The method assumes that seasonal components repeat from year to year. While this is a reasonable assumption for some time series, it is not good for longer tie series. So, it fails to capture seasonal changes over time
- Classical decomposition is not robust to unusual values (like short term big rises and drops which do not conform to the pattern of the time series as a whole)
:::
### 5.1.2. Classical Decomposition in R
```{r}
###################################################
# CLASSICAL DECOMPOSITION OF AIR PASSANGERS DATA
###################################################
# reading in data
data("AirPassengers")
# decomposition
air_decomp <- decompose(AirPassengers, type="multiplicative")
# Note, there's also an "additive" option, but we use the multiplicative
# model
# plot decompositon
plot(air_decomp)
# summary
raw <- print(air_decomp) #raw data for each component
```
As we can see, R automatically isolates each of the components of the multiplicativve model so that we may understand the behaviour of the data better
# 6. The Forecaster's Toolbox
There are some forecasting methods that are simple, yet remarkably effecctve for some time series. We always fit simple forecasting methods to use as 'benchmarks' against which we compare the other more complex methods. If it happens that the more complex methods do not yield better results, then there is no need to use it for the time series being analysed
We consider four methods:
1. the **average method**
2. the **naive method**
3. the **seasonal naive method**; and
4. the **drift method**
## 6.2. Simple Forecasting Methods
### 6.2.1. The Average Method
The **average method** forecasts the future values as the mean of the historical data. The notation $\hat{y}_{T+h}|T$ is a short-hand for the estimate of $y_{T+h}$ based on the data $y_{1}, \dots, y_{T}$
```{r}
############################
# THE AVERAGE METHOD IN R
############################
library(forecast)
avg_forecast <- meanf(AirPassengers, h=12)
plot(avg_forecast,
main="Average Method Forecast",
xlab="Year",
ylab="Passengers")
# h is the forecast horizon (time steps to forecast ahead)
```
### 6.2.2. Naive Method
The **naive method** forecasts all future values as the value of the last observation
```{r}
######################
# NAIVE METHOD IN R
######################
nforecast <- naive(AirPassengers, 12)
# or nforecast <- rwf(AirPassengers, 12)
plot(nforecast)
```
### 6.2.3. Seasonal Naive Method
The **seasonal naive method** is used when the time series has a strong seasonal component. the forecast value for a particular season is the corresponding forecasted value from the previous season. So,
$$
\hat{y}_{T+h|T}=y_{T+h-m(k+1)}
$$
```{r}
##############################
# SEASONAL NAIVE METHOD IN R
##############################
snforecast <- snaive(AirPassengers, 24)
plot(snforecast)
```
### 6.2.4. Drift Method
The **drift method** is an extension to the naive method which allows for the presence of a linear trend in the data. The amount of change over time (called the **drift**) is equal to the average change in the historical data.
$$
\hat{y}_{T+h|T}=y_{T}=\frac{h}{T-1}\sum_{t=2}^{T}(y_{t}-y_{t-1})=y_{T}+h\left(\frac{y_{T}-y_{1}}{T-1}\right)
$$
```{r}
######################
# DRIFT METHOD IN R
######################
drift_forecast <- rwf(AirPassengers, 24, drift=TRUE)
plot(drift_forecast)
```
## 6.3. Forecasting and Forecasting Accuracy
### 6.3.1. Forecasting with Simple Forecasting Methods
::: {.callout-warning title="Worked Example One" icon="false"}
```{r}
#########################################
# FORECASTING AUSTRALIAN BEER PRODUCTION
#########################################
library(fpp2)
library(ggplot2)
library(urca)
library(forecast)
data("ausbeer")
austrain <- window(ausbeer, start=c(1994, 1), end=c(2010, 4)) # partitions data
autoplot(austrain) +
autolayer(meanf(austrain, h = 12),
series = "Average Method",
PI = FALSE) +
autolayer(naive(austrain, h = 12),
series = "Naive Method",
PI = FALSE) +
autolayer(snaive(austrain, h = 12),
series = "Seasonal Naive",
PI = FALSE) +
autolayer(rwf(austrain, h = 12, drift = TRUE),
series = "Drift Method",
PI = FALSE) +
ggtitle("Forecasts for Quarterly Beer Production") +
xlab("Year") +
ylab("Megalitres") +
guides(colour = guide_legend(title = "Forecast"))
```
We can see from these forecasts that, in the short-term, the seasonal naive method would be the best forecasting method to use for the time series. However, in the long-term, the drift method is probably better because of the downtrend present in the data.
The seasonal naive method does not preserve the trend in the data
:::
::: {.callout-warning title="Worked Example Two" icon="false"}
```{r}
###############################
# FORECASTING GOOGLE PRICE
###############################
library(fpp2)
library(fpp3)
library(forecast)
library(ggplot2)
library(urca)
data("goog200")
autoplot(goog200) +
autolayer(meanf(goog200, h =90),
series = "Average Method",
PI = FALSE) +
autolayer(snaive(goog200, h = 90),
series = "Seasonal Naive Method",
PI = FALSE) +
autolayer(naive(goog200, h = 90),
series = "Naive Method",
PI = FALSE) +
autolayer(rwf(goog200, h = 90, drift = TRUE),
series = "Drift Method",
PI = FALSE) +
ggtitle("Google Stock (Daily, ending 6 December 2013")+
xlab("Closing Price (US$)") + ylab("Day") +
guides(colour=guide_legend(title="Forecasts"))
```
There is a note to make here about the seasonal naive method. Looking at the time series, we can see that there is very little seasonality (if there is, at all, any). As a result, when we try to forecast using the seasonal naive method, its behaviour is the same as the naive method. In our example, the seasonal naive method is printed, but since it is the same as the naive method, and the naive method is layered on top of it, you cannot see it visually.
The method that will provide the most accurate long-term forecast here is drift. This is due to the trend in the data. Short-term, we may argue that the naive method would produce the most accurate forecast.
:::
### 6.3.2. Forecasting Accuracy
Before forecasting into the future, we try to see if our model well approximates the current data that we have. To do this, we split our data into a **training** and a **test data set**. The training data set is the data that we train our model on. This is the part of the data that the model sees. The test data set is then used to confirm whether or not the model we have built is accurate enough. The test set is only used once.
:::: callout-note
<div>
In the real world, it is often wise to split the data into three sets: a **training set**, a **validation set**, and a **test set**.
The validation set acts as the test data set as we have defined above. However, a validation set allows us to tweak our model as much as we can to that we have a sufficiently accurate model (in the case where it is not accurate enough). Then, we use the test set to see if our data is truly accurate.
The reason we do this is because we do not want to go straight to the test data set, and then try to tweak our data so that it fits the test data set. This would result in a biased model that runs a risk of not forecasting accurately. At the same time, we do not want to lose data. So, the validation set is there to serve as the point where we can tweak our model as much as we want, and see if this will accurately forecast the test data.
</div>
::::
When more than one forecasting method is available (as is usually the case), we can determine which method will result in the best forecasting accuracy using the **mean absolute error** **(MAE)** and the **mean squared error (MSE)**. We have that,
$$MAE=\sum_{t=1}^{T}\frac{|Y_{t}-\hat{Y}_{t}|}{T} \quad \text{and} \quad MSE=\sum_{t=1}^{T}\frac{(Y_{t}-\hat{Y}_{t})^{2}}{T}$$
where
- $\hat{Y}_{t}$ is the forecasted value in period $t$
- $Y_{t}$ is the actual observation in period $t$
- $T$ is the number of time periods
- ${e}_{t}=Y_{t}-\hat{Y}_{t}$ is the forecast error
::: {.callout-important title="Mean Absolute Error"}
**MAE** is a measure of the average of the abolute differences between the actual and fitted values for a give set of data.
If the model predicts the data perfectly, $MAE=0$, and it grows (relative to the data range) with poor data prediction.
:::
::: {.callout-important title="Mean Squared Error" icon="false"}
**MSE** is a measure of the average squared differences between the actual and fitted values for a give set of data.
If the model predicts the data perfectly, $MSE=0$. Otherwise, the MSE will be large (relative to the data range).
:::
::: {title="Key Idea: Choosing the Measure of Accuracy"}
- If it is important to avoid (even a few) large errors, the MSE is ideal because it punishes large deviations more heavily (by squaring) than the MAE. Otherwise, use MAE.
- Note that MAE and MSE are both scale-dependent accuracy measures, i.e. they can only be used to compare models fitted on data with the same scale.
For example, you cannot use these to comapare data where the the observed values a measured in 10000's vs data where the observed values are measured in 0.0001's
- When comparing models using forecasting accuracy, we prefer models with smaller MAE and MSE
:::
::: {.callout-warning title="Forecasting Accuracy" icon="false"}
```{r}
########################
# FORECASTING ACCURACY
########################
library(fpp2)
library(ggplot2)
library(urca)
library(forecast)
data("ausbeer")
# splitting the data
austrain <- window(ausbeer, start=1992, end=c(2007, 4)) # training set
austest <- window(ausbeer, start=2008) # testing set (rest of data)
# train forecasting models
ausfit1 <- meanf(austrain, h=8)
ausfit2 <- naive(austrain, h=8)
ausfit3 <- snaive(austrain, h=8)
ausfit4 <- rwf(austrain, h=8, drift=TRUE)
# testing the accuracy against the testing set
accuracy(ausfit1, austest)
accuracy(ausfit2, austest)
accuracy(ausfit3, austest)
accuracy(ausfit4, austest)
```
Notice that the MSE is not readily given by R. Instead, it gives RMSE. Recall that
$$
RMSE=\sqrt{MSE}\iff RMSE^{2}=MSE
$$
But, we could still compare the RMSEs if $RMSE>1$ for all the forecasting methods. Otherwise, if at least one of them has a value less that $1$, then you need to square first. From our example, we can see that the seasonal naive method gives the highest forecasting accuracy since it has the lowest RMSE and MAE values.
:::
### 6.3.3. Prediction Intervals
::: {.callout-important title="Definition (Prediction Intervals)"}
A **prediction interval** provides an interval within which we expect the forecasted values to lier with a specified probability.
:::
For example, assuming that the forecast errors are normally ditributred, a $95\%$ prediction interval for the $h$ step is given by
$$
\hat{y}_{t+h|T}\pm 1.96\hat{\sigma_{h}}
$$
where $\hat{\sigma}_{h}$ is the estimated standard deviation of the $h$-step forecast distribution. Usually, $\hat{\sigma}_{h}$ is set equal to the standard deviation of the residuals for one-step ahead forecasts.
Generally, we write that
$$
\hat{y}_{t+h|T} \pm c\hat{\sigma}_{h}
$$
where $c$ depends on the **coverage probability**.
::: {.callout-warning title="Worked Example" icon="false"}
```{r}
#################################################
# FORECASTING GOOGLE PRICE: PREDICTION INTERVALS
#################################################
library(fpp2)
library(fpp3)
library(forecast)
library(ggplot2)
library(urca)
data("goog200")
fit <- naive(goog200, h=10, level=c(80, 95))
fit
```
This table gives the lower and upper bounds of the prediction intervals for the forecast values.
Suppose we are given that the standard deviation of the residuals from the naive method is $6.21$. Then, we could calculate the $95\%$ prediction interval as
$$
\hat{y}_{t+h|T}=531.48\pm1.96(6.21)=[519.3;543.6]
$$
as appears in the table.
:::
# 7. Transformations
Most time series method rely on the idea that the underlying probabilistic behavior stays stable over time. So, stationarity is a strong condition for time series. It means:
- the mean does not change over time
- the variance does not change over time
- the autocorrelation does not change over time
We want the time series to exhibit this so that we can make accurate decisions about the future. This can only happen if, statistically, the past behaves like the future. We have already seen that not all time series are stationary.
However, we can produce stationary time series by performing **transformations**. We consider **ARIMA** **models**. We usually start transforming time series by applying variance stabilising transformations if they are necessary, before removing trend and/or seasonality.
## 7.1. Stabilising Transformations
### 7.1.1. Stabilising the Variance
If we observe that the variance of a time series increases or decreases with time, then we need a transformation which will stabilise the variance. Variance-stabilising transformations which are useful include:
- **logarithmic transformations**
- **square root transformations**
- **cube root transformations**
- **fourth root transformations**
::: callout-note
The graphical representation of a time series is a good way of getting an idea of what the time series is not. However, it doesn't tell you what it is.
For example, we may know that a time series is not stationary by looking at it, and noticing that there is some kind of trend in the data. However, we cannot really look at a time series that seems stationary, and make a claim that it, indeed, is stationary. The reason for this is that, often, we have no idea what the process generating the data is.\
\
Consider this: if someone is drunk, you definitely can tell that they are **not sober**. However, if another person looks sober, you cannot completely tell by just looking at them. In a similar way, a time series plot that has a clear uptrend or downtrend, and/or changing variance is not stationary. However, a time series that looks stationary isn't always is.
:::
We will asses changes in variation for a time series graphically, and consider time series where it is clear that the variance is changing or not. So, although it is useful to know the notion above, we ask – for now – that you do away with the idea.
As we said before, there are many different transformations that can be used to stabilise the variance. How do we decide which one to use?
The logarithm, square root and cube root transformations form part of a broader family called the **Box-Cox Transformations** that depend on the parameter, $\lambda$:
$$
w_{t}=\begin{cases} \ln{(y_{t})} & \text{if } \lambda=0\\
(\lambda w_{t}+1)^{\frac{1}{\lambda}} & \text{otherwise}
\end{cases}
$$
To which transformation will be used, we need to find the value of $\lambda$ first.
- $\lambda=1$ means that no variance stabilising is needed
- $\lambda=0$ as is given in the equation, mean that a logarithmic transformation is needed
- $\lambda=0.25$ results in a transformation that is close to a 4-th root approximation
- $\lambda=0.33$ results in a cube root transformation
- $\lambda=0.5$ results in a square root approximation
::: {.callout-warning title="Worked Example" icon="false"}
```{r}
##############################################
# BOX-COX TRANSFORMATION (AUSSIE ELECTRICITY)
##############################################
library(fpp2)
library(fpp3)
library(ggplot2)
library(urca)
library(forecast)
data("elec")
autoplot(elec) +
ggtitle("Australian Monthly Electricity Demand") +
ylab("GWh")
# lambda
(lambda <- BoxCox.lambda(elec))
```
Before having even calculated the value for $\lambda$ here, we could have seen that some transformation would be required because the variation in the data grows with time. We then get that $\lambda=0.2654...$ which is close to $\lambda=0.25$. This means that the Box-Cox transformation that will be applied to the data here is a 4-th root transformation.
We can, then, plot the transformed time series using the value for $\lambda$ we found.
```{r}
autoplot(BoxCox(elec, lambda))
```
The data looks to have a more constant variance after having applied the transformation.
:::
::: {.callout-warning title="Worked Example Two" icon="false"}
```{r}
##############################################
# BOX-COX TRANSFORMATION - AIR PASSENGER DATA
##############################################
library(ggplot2)
library(forecast)
library(urca)
library(fpp2)
library(fpp3)
# raw time series
data("AirPassengers")
autoplot(AirPassengers) +
ggtitle("Passenger Bookings")
# lambda value
(lmda <- BoxCox.lambda(AirPassengers))
# transformation
autoplot(BoxCox(AirPassengers, lmda))
```
We can see in this example that the time series has an increasing variation with time. The $\lambda$ value obtained was $-0.2947...$, and we used this to transform the data.
:::
### 7.1.2. Stabilising the Trend (Via Differencing)
If a time series has a trend, it is non-stationary because the mean will be changing with time as a result of the presence of the trend.
::: {.callout-important title="Key Idea"}
**Differencing** is used when there is only a l;inear trend and random variation (i.e. no seasonality), and the variance appears to be constant
:::
**First-order differencing at lag 1** is performed as:
$$
w_{t}=y_{t}-y_{t-1} \quad t=2,3,\dots, T
$$
As a result of this, we lose one data point as the difference for the first observation cannot be computed. **Second-order differencing** can be performed to remove a quadratic trend. Rarely is more than secon-order differencing performed in practice.
::: callout-note
Since we do not deal with non-linear trends in this course, we do not consider more than one order of differencing
:::
::: {.callout-warning title="Worked Example One" icon="false"}
```{r}
###############################
# DIFFERENCING TO REMOVE TREND
###############################
library(ggplot2)
library(forecast)
library(urca)
library(fpp2)
library(fpp3)
# raw time series
data("goog200")
goog <- goog200
autoplot(goog) +
ggtitle("Google Stock Price")
# first-order differencing at lag 1
diff_g <- diff(goog, lag=1, differences=1)
autoplot(diff_g)+
ggtitle("Google Stock Price Differencing at lag 1")
```
We can see here that, except for the spike around Day 160, the data has been de-trended. Also, we can see that the variance of the differenced series is relatively constant. So, this series is now likely stationary.
:::
### 7.1.3. Stabilising the Seasonality (Via Differencing)
For many time series, seasonal effects are very common. What this implies is that the time series will be non-stationary since the mean will be changing due to seasonality. So, we need to stabilise seasonality.
We can use an appropriate order for differencing to remove the seasonal effects. The first-order difference with lag $g$ is obtained by:
$$
w_{t}=y_{t}-y_{t-g} \quad t=g+1, g+2, \dots, T
$$
Where $g$ is the number of seasons within the seasonal variation of the data. The conclusion regarding which lag to use can also be guided by the raw data. If we can see that the data is recorded in months for a series of years, then the $g=12$. If we have quarterly data, $g=4$ instead.
::: {.callout-warning title="Worked Example" icon="false"}
```{r}
########################################
# REMOVING SEASONALITY: AIRPASSENGERS
########################################
library(forecast)
library(fpp2)
library(fpp3)
library(ggplot2)
library(urca)
data("AirPassengers")
ap <- AirPassengers
# raw time series
autoplot(ap) +
ggtitle("Passnger Bookings") +
ylab("Number of Bookings")
# differenced time series
seas_diff <- diff(ap, lag=12, differences=1)
autoplot(seas_diff)+
ggtitle("First-Order Differenced Passanger Booking (Lag = 12)")
```
From this. we can see that there is not apparent seasonality. But this didn't make our data any easier to read. We note that there is still some trend in our data, and the is also some variance. So, our time series is not stationery, even after the differencing we performed. So, a better approached may have been to first remove the trend and the variance before taking seasonal differencing.
```{r}
######################
# REMOVING VARIANCE
######################
#lambda
(lambda <- BoxCox.lambda(ap))
# transformation
apv <- BoxCox(ap, lambda)
autoplot(apv) +
ggtitle("Passenger Bookings (Without Variance)") +
ylab("Adjusted Number of Bookings")
```
The next step would be to remove the trend.
```{r}
########################
# REMOVING THE TREND
########################
#lambda
lambda <- BoxCox.lambda(ap)
# transformation
apv <- BoxCox(ap, lambda)
# first-order differencing to remove trend
apvt <- diff(apv, lag=1, differences=1)
autoplot(apvt) +
ggtitle("De-Trended Passenger Bookings Without Variance") +
ylab("Adjusted Number of Bookings")
```
Since we've dealt with both the variance and the trend, it is time to deal with the seasonality.
```{r}
########################
# REMOVING SEASONALITY
########################
#lambda
lambda <- BoxCox.lambda(ap)
# transformation
apv <- BoxCox(ap, lambda)
# first-order differencing to remove trend
apvt <- diff(apv, lag=1, differences=1)
# seasonally differenced time series
seas_diff <- diff(apvt, lag=12, differences=1)
autoplot(seas_diff)+
ggtitle("First-Order Differenced Passanger Booking (Lag = 12)")
```
:::