2022-08-12

tsibble objects

  • A time series can be thought of as a list of numbers (measurement) along with the time those number were recorded (index).

  • In the tidyverse, we use tsibble object to save a time series in R.

head(global_economy)
## # A tsibble: 6 x 6 [1Y]
## # Key:       Country [1]
##    Year Country             GDP Imports Exports Population
##   <dbl> <fct>             <dbl>   <dbl>   <dbl>      <dbl>
## 1  1960 Afghanistan  537777811.    7.02    4.13    8996351
## 2  1961 Afghanistan  548888896.    8.10    4.45    9166764
## 3  1962 Afghanistan  546666678.    9.35    4.88    9345868
## 4  1963 Afghanistan  751111191.   16.9     9.17    9533954
## 5  1964 Afghanistan  800000044.   18.1     8.89    9731361
## 6  1965 Afghanistan 1006666638.   21.4    11.3     9938414
  • Year - index
  • Country - Key
  • GDP, imports, Exports, Pop - Measurament

tsibble objects

head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State Purpose  Trips
##     <qtr> <chr>    <chr> <chr>    <dbl>
## 1 1998 Q1 Adelaide SA    Business  135.
## 2 1998 Q2 Adelaide SA    Business  110.
## 3 1998 Q3 Adelaide SA    Business  166.
## 4 1998 Q4 Adelaide SA    Business  127.
## 5 1999 Q1 Adelaide SA    Business  137.
## 6 1999 Q2 Adelaide SA    Business  200.
  • Quarter - index
  • Region, State, Purpose - Key
  • Trips - Measurament

tsibble objects

  • A tsibble allows storage and manipulation of multiple time series in R.

  • It contains:

    • An index: time information about the observation
    • Measured variable(s): numbers of interest
    • Key variable(s): optional unique identifiers for each series
  • It works with tidyverse functions.

tsibble index

mydata <- tsibble(
    year = 2012:2016,
    y = c(123, 39, 78, 52, 110),
    index = year
) 
mydata
## # A tsibble: 5 x 2 [1Y]
##    year     y
##   <int> <dbl>
## 1  2012   123
## 2  2013    39
## 3  2014    78
## 4  2015    52
## 5  2016   110

tsibble index (or)

mydata <- tibble(
    year = 2012:2016,
    y = c(123, 39, 78, 52, 110)
) %>%
  as_tsibble(index = year)
mydata
## # A tsibble: 5 x 2 [1Y]
##    year     y
##   <int> <dbl>
## 1  2012   123
## 2  2013    39
## 3  2014    78
## 4  2015    52
## 5  2016   110

tsibble index

  • For observations more frequent than once per year, we need to use a time class function on the index
z
## # A tibble: 5 × 2
##   Month    Observation
##   <chr>          <dbl>
## 1 2019 Jan          50
## 2 2019 Feb          23
## 3 2019 Mar          34
## 4 2019 Apr          30
## 5 2019 May          25

tsibble index

z %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)
## # A tsibble: 5 x 2 [1M]
##      Month Observation
##      <mth>       <dbl>
## 1 2019 Jan          50
## 2 2019 Feb          23
## 3 2019 Mar          34
## 4 2019 Apr          30
## 5 2019 May          25
  • First, the Month column is being converted from text to a monthly time object with yearmonth().
  • We then convert the data frame to a tsibble by identifying the index variable using as_tsibble().
  • Note the addition of “[1M]” on the first line indicating this is monthly data.

The tsibble index

Common time index variables can be created with these functions:

Frequency Function
Annual start:end
Quarterly yearquarter()
Monthly yearmonth()
Weekly yearweek()
Daily as_date(), ymd()
Sub-daily as_datetime()

Read a csv file and convert to a tsibble

prison <- readr::read_csv("~/Dropbox/Education/Teaching/GCSU/Forecasting/Forecasting_GITHub/ETC3550Slides-fable2022/data/prison_population.csv")
## # A tibble: 3,072 × 6
##    date       state gender legal     indigenous count
##    <date>     <chr> <chr>  <chr>     <chr>      <dbl>
##  1 2005-03-01 ACT   Female Remanded  ATSI           0
##  2 2005-03-01 ACT   Female Remanded  Other          2
##  3 2005-03-01 ACT   Female Sentenced ATSI           0
##  4 2005-03-01 ACT   Female Sentenced Other          0
##  5 2005-03-01 ACT   Male   Remanded  ATSI           7
##  6 2005-03-01 ACT   Male   Remanded  Other         58
##  7 2005-03-01 ACT   Male   Sentenced ATSI           0
##  8 2005-03-01 ACT   Male   Sentenced Other          0
##  9 2005-03-01 NSW   Female Remanded  ATSI          51
## 10 2005-03-01 NSW   Female Remanded  Other        131
## # … with 3,062 more rows

Read a csv file and convert to a tsibble

prison <- readr::read_csv("~/Dropbox/Education/Teaching/GCSU/Forecasting/Forecasting_GITHub/ETC3550Slides-fable2022/data/prison_population.csv") %>%
  mutate(Quarter = yearquarter(date))
## # A tibble: 3,072 × 7
##    date       state gender legal     indigenous count Quarter
##    <date>     <chr> <chr>  <chr>     <chr>      <dbl>   <qtr>
##  1 2005-03-01 ACT   Female Remanded  ATSI           0 2005 Q1
##  2 2005-03-01 ACT   Female Remanded  Other          2 2005 Q1
##  3 2005-03-01 ACT   Female Sentenced ATSI           0 2005 Q1
##  4 2005-03-01 ACT   Female Sentenced Other          0 2005 Q1
##  5 2005-03-01 ACT   Male   Remanded  ATSI           7 2005 Q1
##  6 2005-03-01 ACT   Male   Remanded  Other         58 2005 Q1
##  7 2005-03-01 ACT   Male   Sentenced ATSI           0 2005 Q1
##  8 2005-03-01 ACT   Male   Sentenced Other          0 2005 Q1
##  9 2005-03-01 NSW   Female Remanded  ATSI          51 2005 Q1
## 10 2005-03-01 NSW   Female Remanded  Other        131 2005 Q1
## # … with 3,062 more rows

Read a csv file and convert to a tsibble

prison <- readr::read_csv("~/Dropbox/Education/Teaching/GCSU/Forecasting/Forecasting_GITHub/ETC3550Slides-fable2022/data/prison_population.csv") %>%
  mutate(Quarter = yearquarter(date)) %>%
  select(-date)
## # A tibble: 3,072 × 6
##    state gender legal     indigenous count Quarter
##    <chr> <chr>  <chr>     <chr>      <dbl>   <qtr>
##  1 ACT   Female Remanded  ATSI           0 2005 Q1
##  2 ACT   Female Remanded  Other          2 2005 Q1
##  3 ACT   Female Sentenced ATSI           0 2005 Q1
##  4 ACT   Female Sentenced Other          0 2005 Q1
##  5 ACT   Male   Remanded  ATSI           7 2005 Q1
##  6 ACT   Male   Remanded  Other         58 2005 Q1
##  7 ACT   Male   Sentenced ATSI           0 2005 Q1
##  8 ACT   Male   Sentenced Other          0 2005 Q1
##  9 NSW   Female Remanded  ATSI          51 2005 Q1
## 10 NSW   Female Remanded  Other        131 2005 Q1
## # … with 3,062 more rows

Read a csv file and convert to a tsibble

prison <- readr::read_csv("~/Dropbox/Education/Teaching/GCSU/Forecasting/Forecasting_GITHub/ETC3550Slides-fable2022/data/prison_population.csv") %>%
  mutate(Quarter = yearquarter(date)) %>%
  select(-date) %>%
  as_tsibble(
    index = Quarter,
    key = c(state, gender, legal, indigenous)
  )
## # A tsibble: 3,072 x 6 [1Q]
## # Key:       state, gender, legal, indigenous [64]
##    state gender legal    indigenous count Quarter
##    <chr> <chr>  <chr>    <chr>      <dbl>   <qtr>
##  1 ACT   Female Remanded ATSI           0 2005 Q1
##  2 ACT   Female Remanded ATSI           1 2005 Q2
##  3 ACT   Female Remanded ATSI           0 2005 Q3
##  4 ACT   Female Remanded ATSI           0 2005 Q4
##  5 ACT   Female Remanded ATSI           1 2006 Q1
##  6 ACT   Female Remanded ATSI           1 2006 Q2
##  7 ACT   Female Remanded ATSI           1 2006 Q3
##  8 ACT   Female Remanded ATSI           0 2006 Q4
##  9 ACT   Female Remanded ATSI           0 2007 Q1
## 10 ACT   Female Remanded ATSI           1 2007 Q2
## # … with 3,062 more rows

Read a csv file and convert to a tsibble - Distinct

The distinct() function can be used to show the key variables or even combinations of it:

prison%>% distinct(gender, legal,indigenous)
## # A tibble: 8 × 3
##   gender legal     indigenous
##   <chr>  <chr>     <chr>     
## 1 Female Remanded  ATSI      
## 2 Female Remanded  Other     
## 3 Female Sentenced ATSI      
## 4 Female Sentenced Other     
## 5 Male   Remanded  ATSI      
## 6 Male   Remanded  Other     
## 7 Male   Sentenced ATSI      
## 8 Male   Sentenced Other

Working with tsibble objects

Example: Australian pharmaceutical sales

-The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.

  • Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs.
  • The cost to government is determined by the number and types of drugs purchased. Currently nearly 1% of GDP.
  • The total cost is budgeted based on forecasts of drug usage.
  • Costs are disaggregated by drug type (ATC1 x15 / ATC2 84), concession category (x2) and patient type (x2), giving \(84\times2\times2=336\) time series.

Example: Australian pharmaceutical sales

PBS
## # A tsibble: 67,596 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [336]
##       Month Concession  Type  ATC1  ATC1_desc ATC2  ATC2_desc Scripts  Cost
##       <mth> <chr>       <chr> <chr> <chr>     <chr> <chr>       <dbl> <dbl>
##  1 1991 Jul Concession… Co-p… A     Alimenta… A01   STOMATOL…   18228 67877
##  2 1991 Aug Concession… Co-p… A     Alimenta… A01   STOMATOL…   15327 57011
##  3 1991 Sep Concession… Co-p… A     Alimenta… A01   STOMATOL…   14775 55020
##  4 1991 Oct Concession… Co-p… A     Alimenta… A01   STOMATOL…   15380 57222
##  5 1991 Nov Concession… Co-p… A     Alimenta… A01   STOMATOL…   14371 52120
##  6 1991 Dec Concession… Co-p… A     Alimenta… A01   STOMATOL…   15028 54299
##  7 1992 Jan Concession… Co-p… A     Alimenta… A01   STOMATOL…   11040 39753
##  8 1992 Feb Concession… Co-p… A     Alimenta… A01   STOMATOL…   15165 54405
##  9 1992 Mar Concession… Co-p… A     Alimenta… A01   STOMATOL…   16898 61108
## 10 1992 Apr Concession… Co-p… A     Alimenta… A01   STOMATOL…   18141 65356
## # … with 67,586 more rows

Working with tsibble objects

We can use the filter() function to select rows.

PBS %>%
  filter(ATC2 == "A10")
## # A tsibble: 816 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [4]
##       Month Concession Type  ATC1  ATC1_desc ATC2  ATC2_desc Scripts   Cost
##       <mth> <chr>      <chr> <chr> <chr>     <chr> <chr>       <dbl>  <dbl>
##  1 1991 Jul Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   89733 2.09e6
##  2 1991 Aug Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   77101 1.80e6
##  3 1991 Sep Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   76255 1.78e6
##  4 1991 Oct Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   78681 1.85e6
##  5 1991 Nov Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   70554 1.69e6
##  6 1991 Dec Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   75814 1.84e6
##  7 1992 Jan Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   64186 1.56e6
##  8 1992 Feb Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   75899 1.73e6
##  9 1992 Mar Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   89445 2.05e6
## 10 1992 Apr Concessio… Co-p… A     Alimenta… A10   ANTIDIAB…   97315 2.23e6
## # … with 806 more rows

Working with tsibble objects

We can use the select() function to select columns.

PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost)
## # A tsibble: 816 x 4 [1M]
## # Key:       Concession, Type [4]
##       Month Concession   Type           Cost
##       <mth> <chr>        <chr>         <dbl>
##  1 1991 Jul Concessional Co-payments 2092878
##  2 1991 Aug Concessional Co-payments 1795733
##  3 1991 Sep Concessional Co-payments 1777231
##  4 1991 Oct Concessional Co-payments 1848507
##  5 1991 Nov Concessional Co-payments 1686458
##  6 1991 Dec Concessional Co-payments 1843079
##  7 1992 Jan Concessional Co-payments 1564702
##  8 1992 Feb Concessional Co-payments 1732508
##  9 1992 Mar Concessional Co-payments 2046102
## 10 1992 Apr Concessional Co-payments 2225977
## # … with 806 more rows

Working with tsibble objects

We can use the summarise() function to summarise over keys (regardless of Concession or Type)

PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(total_cost = sum(Cost))
## # A tsibble: 204 x 2 [1M]
##       Month total_cost
##       <mth>      <dbl>
##  1 1991 Jul    3526591
##  2 1991 Aug    3180891
##  3 1991 Sep    3252221
##  4 1991 Oct    3611003
##  5 1991 Nov    3565869
##  6 1991 Dec    4306371
##  7 1992 Jan    5088335
##  8 1992 Feb    2814520
##  9 1992 Mar    2985811
## 10 1992 Apr    3204780
## # … with 194 more rows

Working with tsibble objects

We can use the mutate() function to create new variables.

PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(total_cost = sum(Cost)) %>%
  mutate(total_cost = total_cost / 1e6)
## # A tsibble: 204 x 2 [1M]
##       Month total_cost
##       <mth>      <dbl>
##  1 1991 Jul       3.53
##  2 1991 Aug       3.18
##  3 1991 Sep       3.25
##  4 1991 Oct       3.61
##  5 1991 Nov       3.57
##  6 1991 Dec       4.31
##  7 1992 Jan       5.09
##  8 1992 Feb       2.81
##  9 1992 Mar       2.99
## 10 1992 Apr       3.20
## # … with 194 more rows

Working with tsibble objects

We can use the mutate() function to create new variables.

PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(total_cost = sum(Cost)) %>%
  mutate(total_cost = total_cost / 1e6) -> a10
## # A tsibble: 204 x 2 [1M]
##       Month total_cost
##       <mth>      <dbl>
##  1 1991 Jul       3.53
##  2 1991 Aug       3.18
##  3 1991 Sep       3.25
##  4 1991 Oct       3.61
##  5 1991 Nov       3.57
##  6 1991 Dec       4.31
##  7 1992 Jan       5.09
##  8 1992 Feb       2.81
##  9 1992 Mar       2.99
## 10 1992 Apr       3.20
## # … with 194 more rows

Time Plots

  • For time series data, the obvious graph to start with is a time plot.

  • We will be using autoplot() - a function based on ggplot2 for time series.

  • It automatically produces an appropriate plot of whatever you pass to it in the first argument.

Time Plots

a10 %>%autoplot(total_cost)

Time Plots

a10 %>%
  autoplot(total_cost)+
  labs(y = "$ (millions)",
       title = "Australian antidiabetic drug sales")

Times series patterns

Trend - pattern exists when there is a long-term increase or decrease in the data (does not have to be linear)

Seasonal - pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). It is always fixed and known period.

Cyclic - pattern exists when data exhibit rises and falls that are (duration usually of at least 2 years). Usually reflect economic conditions or “business cycle”

Time series components

Differences between seasonal and cyclic patterns:

  • seasonal pattern constant length; cyclic pattern variable length
  • average length of cycle longer than length of seasonal pattern
  • magnitude of cycle more variable than magnitude of seasonal pattern

The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.

Time series patterns

aus_production %>%
  filter(year(Quarter) >= 1980) %>%
  autoplot(Electricity) +
  labs(y = "GWh", title = "Australian electricity production")

Time series patterns

aus_production %>%
  autoplot(Bricks) +
  labs(y = "million units", title = "Australian clay brick production")

Time series patterns

us_employment %>%
  filter(Title == "Retail Trade", year(Month) >= 1980) %>%
  autoplot(Employed / 1e3) +
  labs(y = "Million people", title = "Retail employment, USA")

Time series patterns

gafa_stock %>%
  filter(Symbol == "AMZN", year(Date) >= 2018) %>%
  autoplot(Close) +
  labs(y = "$US", title = "Amazon closing stock price")

Time series patterns

pelt %>%
  autoplot(Lynx) +
  labs(y="Number trapped", title = "Annual Canadian Lynx Trappings")

Seasonal Plots

a10 %>% gg_season(total_cost, labels = "both") +
  labs(y = "$ million", title = "Seasonal plot: antidiabetic drug sales")

Seasonal plots

  • Data plotted against the individual seasons in which the data were observed. (In this case a season is a month.)
  • Something like a time plot except that the data from each season are overlapped.
  • Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.
  • In R: gg_season()

Seasonal plots

tsibbledata::aus_retail %>%
  filter(State == "Victoria", Industry == "Cafes, restaurants and catering services") %>%
  gg_season(Turnover)

Multiple seasonal periods

  • Where the data has more than one seasonal pattern, the period argument can be used to select which seasonal plot is required.

  • The vic_elec data contains half-hourly electricity demand for the state of Victoria, Australia. We can plot the daily pattern, weekly pattern or yearly pattern by specifying the period.

Multiple seasonal periods - Yearly

vic_elec %>% gg_season(Demand)

Multiple seasonal periods - Weekly

vic_elec %>% gg_season(Demand, period = "week")

Multiple seasonal periods - Daily

vic_elec %>% gg_season(Demand, period = "day")

Seasonal subseries plots

  • An alternative plot is to focus on seasonal separated by mini time plots
a10 %>%gg_subseries(total_cost) +
  labs(y = "$ million", title = "Subseries plot: antidiabetic drug sales")

Seasonal subseries plots

  • The blue horizontal lines indicate the means for each month.

  • This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time.

  • It is useful in identifying changes within particular seasons.

Example - Australia holidays

  • We will get the total visitor nights spent on Holiday by State for each quarter
holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

holidays
## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    State Quarter Trips
##    <chr>   <qtr> <dbl>
##  1 ACT   1998 Q1  196.
##  2 ACT   1998 Q2  127.
##  3 ACT   1998 Q3  111.
##  4 ACT   1998 Q4  170.
##  5 ACT   1999 Q1  108.
##  6 ACT   1999 Q2  125.
##  7 ACT   1999 Q3  178.
##  8 ACT   1999 Q4  218.
##  9 ACT   2000 Q1  158.
## 10 ACT   2000 Q2  155.
## # … with 630 more rows

Example - Australia holidays

holidays %>% autoplot(Trips) +
  labs(y = "thousands of trips", title = "Australian domestic holiday nights")

Example - Australia holidays

  • To see the timing of the seasonal peaks in each state, we can use a season plot
holidays %>% gg_season(Trips) +
  labs(y = "thousands of trips", title = "Australian domestic holiday nights")

Example - Australia holidays

holidays %>%
  gg_subseries(Trips) +
  labs(y = "thousands of trips", title = "Australian domestic holiday nights")

  • It is clear that (WA) Western Australian tourism has jumped in recent years, while (VIC) Victorian tourism has increased in Q1 and Q4 but not in the middle of the year.

Scatterplots

  • The graphs discussed so far are useful for visualizing individual time series.

  • However, sometimes it is also useful to explore relationships between time series.

  • Let’s look again at the half-hourly electricity demand (GW) and temperature (Celsius), for 2014 in Victoria, Australia - vic_elec

Scatterplots - Demand

vic_elec %>%
  filter(year(Time) == 2014) %>%
  autoplot(Demand) +
  labs(y = "GW", title = "Half-hourly electricity demand: Victoria")

Scatterplots - Temperature

vic_elec %>%
  filter(year(Time) == 2014) %>%
  autoplot(Temperature) +
  labs(y = "Degrees Celsius",title = "Half-hourly temperatures: Melbourne, Australia")

Scatterplots

vic_elec %>% filter(year(Time) == 2014) %>%
  ggplot(aes(x = Temperature, y = Demand)) +
  geom_point() + labs(x = "Temperature (degrees Celsius)", y = "Electricity demand (GW)")

Scatterplots

  • The previous scatterplot helps us to visualize the relationship between electricity demand and temperature.

  • It is clear that high demand occurs when temperatures are high due to the effect of air-conditioning.

  • But there is also a heating effect, where demand increases for very low temperatures.

Correlation

  • It is common to compute correlation coefficients to measure the strength of the linear relationship between two variables.

  • The correlation between variables x and y is given by: \[ r = \frac{\sum(x_{t}-\bar{x})(y_{t}-\bar{y})}{\sqrt{\sum(x_{t}-\bar{x})^{2}}\sqrt{\sum(y_{t}-\bar{y})^{2}}} \]

  • The value of r always lies between \(-1\) and \(1\)

Correlation

  • The correlation coefficient only measures the strength of the linear relationship between two variables, and can sometimes be misleading.

  • The correlation between Temperature and Electricity Demand is only:

## [1] 0.28
  • The two variables have a very strong non-linear relationship !!!

  • Always look at the plots of the data and not simply rely on correlation values !!!

Lag plots and autocorrelation

new_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
new_production
## # A tsibble: 74 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1992 Q1   443    5777    383   1289       38332   117
##  2 1992 Q2   410    5853    404   1501       39774   151
##  3 1992 Q3   420    6416    446   1539       42246   175
##  4 1992 Q4   532    5825    420   1568       38498   129
##  5 1993 Q1   433    5724    394   1450       39460   116
##  6 1993 Q2   421    6036    462   1668       41356   149
##  7 1993 Q3   410    6570    475   1648       42949   163
##  8 1993 Q4   512    5675    443   1863       40974   138
##  9 1994 Q1   449    5311    421   1468       40162   127
## 10 1994 Q2   381    5717    475   1755       41199   159
## # … with 64 more rows

Lag plots and autocorrelation

  • We also can have a scatterplot of a given variable by looking the relationship between different time horizon (lags)
new_production %>% gg_lag(Beer)

Lag plots and autocorrelation

new_production %>% gg_lag(Beer, geom='point')

Lag plots and autocorrelation

  • Each graph shows \(y_{t}\) plotted against \(y_{t-k}\) for different values of \(k\)
  • The autocorrelations are the correlations associated with these scatterplots.
  • ACF (autocorrelation function):
    • \(r_1=\text{Correlation}(y_{t}, y_{t-1})\)
    • \(r_2=\text{Correlation}(y_{t}, y_{t-2})\)
    • \(r_3=\text{Correlation}(y_{t}, y_{t-3})\)
    • etc.

Autocorrelation

Correlation: measure extent of linear relationship between two variables (\(y\) and \(X\)).

Autocorrelation \(r_k\): measure linear relationship between lagged values of a time series \(y\). \[ r_k= \frac{\sum_{t=k+1}^T (y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^T(y_t-\bar{y})^2} \] - where \(T\) is the length of the time series - The autocorrelation coefficients make up the autocorrelation function or ACF.

Autocorrelation

  • \(r_1\) indicates how successive values of \(y\) relate to each other
  • \(r_2\) indicates how \(y\) values two periods apart relate to each other
  • \(r_k\) is the same as the sample correlation between \(y_t\) and \(y_{t-k}\).

ACF

Results for first 9 lags for beer data:

new_production %>% ACF(Beer, lag_max = 9)
## # A tsibble: 9 x 2 [1Q]
##     lag     acf
##   <lag>   <dbl>
## 1    1Q -0.102 
## 2    2Q -0.657 
## 3    3Q -0.0603
## 4    4Q  0.869 
## 5    5Q -0.0892
## 6    6Q -0.635 
## 7    7Q -0.0542
## 8    8Q  0.832 
## 9    9Q -0.108

ACF

new_production %>% ACF(Beer, lag_max = 9) %>% autoplot() + labs(title="Australian beer production")

  • Together, the autocorrelations at lags 1, 2\(, \dots\), 9, make up the autocorrelation or ACF.
  • The plot is known as a correlogram

ACF

new_production %>% ACF(Beer, lag_max = 9) %>% autoplot() + labs(title="Australian beer production")

  • \(r_{4}\) higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart.

  • \(r_2\) is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.

Trend and seasonality in ACF plots

  • When data have a trend, the autocorrelations for small lags tend to be large and positive that slowly decrease as the lags increase.

  • When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency).

  • When data are trended and seasonal, you see a combination of these effects.

US retail trade employment

retail <- us_employment %>%
  filter(Title == "Retail Trade", year(Month) >= 1980)
retail %>% autoplot(Employed)

US retail trade employment

retail %>%
  ACF(Employed, lag_max = 48) %>%
  autoplot()

Which is which?

  • #1-B; 2-A, 3-D, 4-C

White noise

  • White noise data is uncorrelated across time with zero mean and constant variance. (Technically, we require independence as well.)
set.seed(30)
wn <- tsibble(t = 1:50, y = rnorm(50), index = t); wn %>% autoplot(y)

White noise

wn %>% ACF(y)
\(r_{1}\) \(r_{2}\) \(r_{3}\) \(r_{4}\) \(r_{5}\) \(r_{6}\) \(r_{7}\) \(r_{8}\) \(r_{9}\) \(r_{10}\)
0.014 -0.163 0.163 -0.259 -0.198 0.064 -0.139 -0.032 0.199 -0.024

  • Sample autocorrelations for white noise series.
  • Expect each autocorrelation to be close to zero.
  • Blue lines show 95% critical values.

Sampling distribution of autocorrelations

Sampling distribution of \(r_k\) for white noise data is asymptotically N(0,\(1/T\)).

  • 95% of all \(r_k\) for white noise must lie within \(\pm 1.96/\sqrt{T}\).
  • If this is not the case, the series is probably not WN.
  • Common to plot lines at \(\pm 1.96/\sqrt{T}\) when plotting ACF.

These are the critical values from the confidence interval.

  • If one or more large spikes are outside these bounds() more than 5% of spikes are outside these bounds), then the series is probably not white noise.

Example: Pigs slaughtered

pigs <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs", year(Month) >= 2014)
pigs %>% autoplot(Count/1e3) +
  labs(y = "Thousands", title = "Number of pigs slaughtered in Victoria")

Example: Pigs slaughtered

pigs %>% ACF(Count) %>% autoplot()

Example: Pigs slaughtered

Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)

  • Difficult to detect pattern in time plot.
  • ACF shows significant autocorrelation for lag 2 and 12.
  • Indicate some slight seasonality.

These show the series is not a white noise series.