DATA 624_Homework2

WEEK3 HOMEWORK

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code

PROBLEM 3.1

Consider the GDP information in global_economy.

Plot the GDP per capita for each country over time. Which country has the highest GDP per capita?
How has this changed over time?.

Highest GDP seems to be Luxemborg for 2017.

Per Wiki and investopedia,we can see some changes on the plot reflect real events.

In 1973, oil production and revenues increased and Qatar had high per capita income. Qatar’s economy was in a downturn from 1982 to 1989. OPEC (Organization of Petroleum Exporting Countries) quotas on crude oil production. We can see that on the plot.

From 2010 to 2015, its China GDP growth rate slowed to 7%. Chinese manufacturing declined to its lowest level in three years, with the nation’s purchasing manager’s index falling to 49.7 in August 2015.

library(tidyverse)
library(dplyr)
# install.packages("ggplot2")
# install.packages("ggfortify")
library(ggplot2)
library(ggfortify)
library(tsibble)
library(tsibbledata)
#install.packages("feasts")
library(feasts)
library ("fpp3")
global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE)  +
  labs(title= "GDP per capita", y = "GDP per capita")

#?global_economy
# str(global_economy)
unique(global_economy$Year)

##  [1] 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974
## [16] 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
## [31] 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004
## [46] 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017

gp2011 <- global_economy %>%
  mutate(GDPpercp = GDP/Population) %>% group_by(Country) %>% arrange(desc(GDPpercp)) %>%  filter(Year >= 1990, Code %in% c("MAC","USA", "LUX","SGP","QAT","DEU")) 

gp2017 <- global_economy %>%
  mutate(GDPpercp = GDP/Population) %>% group_by(Country) %>% arrange(desc(GDPpercp)) %>%  filter(Year == 2017) 


gp1960 <- global_economy %>%
  mutate(GDPpercp = GDP/Population) %>% group_by(Country) %>% arrange(desc(GDPpercp)) %>%  filter(between(Year,1960, 1990), Code %in% c("MAC","USA", "LUX","SGP","QAT","DEU")) 


head(gp2011,25)

## # A tsibble: 25 x 10 [1Y]
## # Key:       Country [3]
## # Groups:    Country [3]
##    Country  Code   Year     GDP Growth   CPI Imports Exports Population GDPpercp
##    <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>    <dbl>
##  1 Luxembo… LUX    2014 6.63e10  5.77  109.     174.    208.     556319  119225.
##  2 Luxembo… LUX    2011 6.00e10  2.54  103.     145.    178.     518347  115762.
##  3 Luxembo… LUX    2008 5.58e10 -1.28   97.4    156.    187.     488650  114294.
##  4 Luxembo… LUX    2013 6.17e10  3.65  108.     159.    191.     543360  113625.
##  5 Luxembo… LUX    2012 5.67e10 -0.353 106.     155.    186.     530946  106749.
##  6 Luxembo… LUX    2007 5.09e10  8.35   94.2    150.    183.     479993  106018.
##  7 Luxembo… LUX    2010 5.32e10  4.86  100      142.    175.     506953  104965.
##  8 Luxembo… LUX    2017 6.24e10  2.30  111.     194.    230.     599449  104103.
##  9 Luxembo… LUX    2009 5.14e10 -4.36   97.8    132.    164.     497783  103199.
## 10 Luxembo… LUX    2015 5.78e10  2.86  109.     187.    223.     569604  101447.
## # ℹ 15 more rows

head(gp2017,25)

## # A tsibble: 25 x 10 [1Y]
## # Key:       Country [25]
## # Groups:    Country [25]
##    Country  Code   Year     GDP Growth   CPI Imports Exports Population GDPpercp
##    <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>    <dbl>
##  1 Luxembo… LUX    2017 6.24e10   2.30 111.    194.    230.      599449  104103.
##  2 Macao S… MAC    2017 5.04e10   9.10 136.     32.0    79.4     622567   80893.
##  3 Switzer… CHE    2017 6.79e11   1.09  98.3    53.9    65.0    8466017   80190.
##  4 Norway   NOR    2017 3.99e11   1.92 115.     33.1    35.5    5282223   75505.
##  5 Iceland  ISL    2017 2.39e10   3.64 122.     42.8    47.0     341284   70057.
##  6 Ireland  IRL    2017 3.34e11   7.80 105.     87.9   120.     4813608   69331.
##  7 Qatar    QAT    2017 1.67e11   1.58 116.     37.3    51.0    2639211   63249.
##  8 United … USA    2017 1.94e13   2.27 112.     NA      NA    325719178   59532.
##  9 North A… NAC    2017 2.10e13   2.35  NA      NA      NA    362492702   58070.
## 10 Singapo… SGP    2017 3.24e11   3.62 113.    149.    173.     5612253   57714.
## # ℹ 15 more rows

gp2011 |>
  autoplot(GDPpercp)

gp1960 |>
  autoplot(GDPpercp)

PROBLEM 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
The best transformation would be to calculate GDP per capita for the US, therefore we have shown the plot for GDP per capita as a population adjustment.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

A good value of lamda is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler.

Victorian Electricity Demand from vic_elec.

For Electricity demand we can convert the tsibble to look at demand daily and monthly

Gas production from aus_production.

For the Gas dataset, the -1 lambda has shown some variation across the trend on the plot.

# install.packages("MASS")
# install.packages("Guerrero")

usgdp <- global_economy %>%
  mutate(GDPpercp = GDP/Population) %>% group_by(Country) %>% arrange(desc(GDPpercp)) %>%  filter(Code %in% c("USA")) 

usgdp |>
  autoplot(GDPpercp)

usgdp1 <- global_economy %>% filter(Code %in% c("USA")) 

usgdp1 |>
  autoplot(GDP)

library("MASS")
library("feasts")
library(dplyr)


aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot() + ggtitle("Australian livestock timeseries")

lambda <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  features(Count,features = guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] -0.04461887

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot() + ggtitle("Australian livestock timeseries")

aus_livestock %>% filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>% autoplot(box_cox(Count,lambda))+
   labs(y = "",
       title = "Transformed livestock dataset",
         round(lambda,2))

v1 <- vic_elec %>%
  group_by(Date) %>%
  mutate(Demand = sum(Demand)) %>%
  distinct(Date, Demand)

v1 %>% 
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Daily Electricity Demand", y = "MWH ")

v2 <- v1 %>%
  mutate(Date = yearmonth(Date)) %>%
  group_by(Date) %>%
  summarise(Demand = sum(Demand)) %>%
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Monthly Electricity Demand", y = "MWH")
head(v1)

## # A tibble: 6 × 2
## # Groups:   Date [6]
##   Date        Demand
##   <date>       <dbl>
## 1 2012-01-01 222438.
## 2 2012-01-02 257965.
## 3 2012-01-03 267099.
## 4 2012-01-04 222742.
## 5 2012-01-05 210585.
## 6 2012-01-06 210247.

head(v2)

## $data
## # A tsibble: 36 x 2 [1M]
##        Date   Demand
##       <mth>    <dbl>
##  1 2012 Jan 7241049.
##  2 2012 Feb 6874543.
##  3 2012 Mar 6746560.
##  4 2012 Apr 6401078.
##  5 2012 May 7375177.
##  6 2012 Jun 7388456.
##  7 2012 Jul 7568114.
##  8 2012 Aug 7492261.
##  9 2012 Sep 6567196.
## 10 2012 Oct 6680705.
## # ℹ 26 more rows
## 
## $layers
## $layers[[1]]
## geom_line: na.rm = FALSE, orientation = NA
## stat_identity: na.rm = FALSE
## position_identity 
## 
## 
## $scales
## <ggproto object: Class ScalesList, gg>
##     add: function
##     add_defaults: function
##     add_missing: function
##     backtransform_df: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     map_df: function
##     n: function
##     non_position_scales: function
##     scales: list
##     train_df: function
##     transform_df: function
##     super:  <ggproto object: Class ScalesList, gg>
## 
## $guides
## <Guides[0] ggproto object>
## 
## <empty>
## 
## $mapping
## Aesthetic mapping: 
## * `x` -> `Date`
## * `y` -> `Demand`
## 
## $theme
## list()

aus_production %>%
  autoplot(Gas) +
  labs(title = "Gas Production")

lambda1 <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

lambda1

## [1] 0.1095171

aus_production %>%
  autoplot(box_cox(Gas, lambda1)) +
  labs(y = "", title = "Transformed Gas Production ",
         round(lambda1,2))

PROBLEM 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Since lambda is very close to 0, there is not much change on the plot.

All series have been adjusted for inflation

library(fpp3)
canadian_gas

## # A tsibble: 542 x 2 [1M]
##       Month Volume
##       <mth>  <dbl>
##  1 1960 Jan  1.43 
##  2 1960 Feb  1.31 
##  3 1960 Mar  1.40 
##  4 1960 Apr  1.17 
##  5 1960 May  1.12 
##  6 1960 Jun  1.01 
##  7 1960 Jul  0.966
##  8 1960 Aug  0.977
##  9 1960 Sep  1.03 
## 10 1960 Oct  1.25 
## # ℹ 532 more rows

lambdaa <- canadian_gas %>%
  features(Volume,features = guerrero) %>%
  pull(lambda_guerrero)
  
lambdaa

## [1] 0.5767648

canadian_gas %>%
  autoplot() + labs(title = "Monthly Canadian gas production")

canadian_gas %>%
  autoplot(box_cox(Volume,lambdaa))

PROBLEM 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

We have used box-cox and guerrero method, again lamda is closer to 0, therefore the plot is relatively same.

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

autoplot(myseries, Turnover)+
  labs(title = "Retail Turnover", y = "$AUD (in millions)")

lambda3a <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda3a)) +
  labs(y = "", title = "Transformed Retail Turnover ",
         round(lambda3a,2))

PROBLEM 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

Tobacco from aus_production,
Economy class passengers between Melbourne and Sydney from ansett, and
Pedestrian counts at Southern Cross Station from pedestrian.

When Lambda is close to 1, the shape of the data doesn’t vary by much in the case of the Tobacco dataset.This is series is not steadily increasing with level therefore, box cox value is not going to change the shape by much.

autoplot(aus_production, Tobacco)+
  labs(title = "Tobacco and Cigarette Production in Tonnes")

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.9264636

lambda1<--1

aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "", title = "Transformed Tobacco Production ",
         round(lambda,2))

aus_production %>%
  autoplot(box_cox(Tobacco, lambda1)) +
  labs(y = "", title = "Transformed Tobacco Production ",
         round(lambda1,2))

lambda3 <- ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  features(Passengers,features = guerrero) %>%
  pull(lambda_guerrero)

lambda3

## [1] 1.999927

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(Passengers) + labs(title = "Passengers on Ansett Flights")

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(box_cox(Passengers,lambda3)) + labs(title = "Transformed data")

head(ansett)

## # A tsibble: 6 x 4 [1W]
## # Key:       Airports, Class [1]
##       Week Airports Class    Passengers
##     <week> <chr>    <chr>         <dbl>
## 1 1989 W28 ADL-PER  Business        193
## 2 1989 W29 ADL-PER  Business        254
## 3 1989 W30 ADL-PER  Business        185
## 4 1989 W31 ADL-PER  Business        254
## 5 1989 W32 ADL-PER  Business        191
## 6 1989 W33 ADL-PER  Business        136

#hourly pedastrian counts
southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 

autoplot(southern_cross, Count)+
  labs(title = "Hourly Pedestrian Counts ")

head(southern_cross)

## # A tsibble: 6 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [1]
##   Sensor                 Date_Time           Date        Time Count
##   <chr>                  <dttm>              <date>     <int> <int>
## 1 Southern Cross Station 2015-01-01 00:00:00 2015-01-01     0   746
## 2 Southern Cross Station 2015-01-01 01:00:00 2015-01-01     1   312
## 3 Southern Cross Station 2015-01-01 02:00:00 2015-01-01     2   180
## 4 Southern Cross Station 2015-01-01 03:00:00 2015-01-01     3   133
## 5 Southern Cross Station 2015-01-01 04:00:00 2015-01-01     4    44
## 6 Southern Cross Station 2015-01-01 05:00:00 2015-01-01     5    16

lambda7 <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

lambda7

## [1] -0.2501616

southern_cross %>%
  autoplot(box_cox(Count, lambda7)) +
  labs(y = "", title = "Transformed Daily Pedestrian Counts with ",
         round(lambda7,2))

# daily series
southern_cross1 <- southern_cross %>%
  index_by(Date) %>%
  summarise(Count = sum(Count))
head(southern_cross1)

## # A tsibble: 6 x 2 [1D]
##   Date       Count
##   <date>     <int>
## 1 2015-01-01  2813
## 2 2015-01-02  4648
## 3 2015-01-03  1428
## 4 2015-01-04  1347
## 5 2015-01-05 11483
## 6 2015-01-06 11513

autoplot(southern_cross1, Count)+
  labs(title = "Daily Pedestrian data")

lambda8 <- southern_cross1 %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

lambda8

## [1] 0.2726316

southern_cross1 %>%
  autoplot(box_cox(Count, lambda8)) +
  labs(y = "", title = "Transformed Daily Pedestrian data",
         round(lambda8,2))

#weekly
southern_cross2 <- southern_cross %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

head(southern_cross2)

## # A tsibble: 6 x 2 [1W]
##       Week Count
##     <week> <int>
## 1 2015 W01 10236
## 2 2015 W02 60134
## 3 2015 W03 71440
## 4 2015 W04 73393
## 5 2015 W05 62535
## 6 2015 W06 79154

autoplot(southern_cross2, Count)+
  labs(title = "Weekly Pedestrian ")

lambda9 <- southern_cross2 %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
lambda9

## [1] -0.1108714

southern_cross2 %>%
  autoplot(box_cox(Count, lambda9)) +
  labs(y = "", title = "Transformed Weekly Counts ",
         round(lambda9,2))

PROBLEM 3.6

Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

I tried to understand how to show the calculation , however, unsure if I got the calculation right below.

library ("fpp3")

# 7 term weights - Explanation
# 
# k=7 periods
# if k is 7 periods then m = (7*2)+ 1 = 15 (ie 15 MA)
# if m is 15,  then  series becomes 1/15{k periods before + yt + k periods after}
# since k = 7, we have to use the average of 7 periods before yt therefore we have 
# 
# 3 * 5-MA
# 5 MA means , m = 5, therefore k = m-1/2 = 4/2 so therefore we have 2 before and 2 after the y period. No we have another 3
# which means, m= 3, therefore k = 3-1/2 = 1 so therefore 1 before and 1 after the 5-MA

beer <- aus_production |>
  filter(year(Quarter) >= 1992) 

beer_ma1 <- beer |>
  mutate(
    `5-MA` = slider::slide_dbl(Beer, mean,
                .before = 2, .after = 2, .complete = TRUE),
    `3x5-MA` = slider::slide_dbl(`5-MA`, mean,
                .before = 1, .after = 1, .complete = TRUE))
head(beer_ma1, n=20)

## # A tsibble: 20 x 9 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas `5-MA` `3x5-MA`
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>  <dbl>    <dbl>
##  1 1992 Q1   443    5777    383   1289       38332   117    NA       NA 
##  2 1992 Q2   410    5853    404   1501       39774   151    NA       NA 
##  3 1992 Q3   420    6416    446   1539       42246   175   448.      NA 
##  4 1992 Q4   532    5825    420   1568       38498   129   443.     445.
##  5 1993 Q1   433    5724    394   1450       39460   116   443.     449.
##  6 1993 Q2   421    6036    462   1668       41356   149   462.     450.
##  7 1993 Q3   410    6570    475   1648       42949   163   445      447.
##  8 1993 Q4   512    5675    443   1863       40974   138   435.     438.
##  9 1994 Q1   449    5311    421   1468       40162   127   435      443.
## 10 1994 Q2   381    5717    475   1755       41199   159   459.     445.
## 11 1994 Q3   423    7000    497   1962       44095   184   442      445 
## 12 1994 Q4   531    6085    476   1833       41745   147   434.     439.
## 13 1995 Q1   426    4714    430   1626       41768   131   441.     445.
## 14 1995 Q2   408    3939    457   1703       43686   167   460.     446.
## 15 1995 Q3   416    6137    417   1733       46022   181   436.     442.
## 16 1995 Q4   520    4739    370   1545       42800   145   430.     431.
## 17 1996 Q1   409    4275    310   1526       43661   133   428.     435.
## 18 1996 Q2   398    5239    358   1593       44707   162   446.     434.
## 19 1996 Q3   398    6293    379   1706       46326   184   429.     434.
## 20 1996 Q4   507    5575    369   1699       43346   146   427.     428.

beer <- aus_production |>
  filter(year(Quarter) >= 1992) 

beer_ma2 <- beer |>
  mutate(
    `15-MA` = slider::slide_dbl(Beer, mean,
                .before = 7, .after = 7, .complete = TRUE))
    
head(beer_ma1, n=20)

## # A tsibble: 20 x 9 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas `5-MA` `3x5-MA`
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>  <dbl>    <dbl>
##  1 1992 Q1   443    5777    383   1289       38332   117    NA       NA 
##  2 1992 Q2   410    5853    404   1501       39774   151    NA       NA 
##  3 1992 Q3   420    6416    446   1539       42246   175   448.      NA 
##  4 1992 Q4   532    5825    420   1568       38498   129   443.     445.
##  5 1993 Q1   433    5724    394   1450       39460   116   443.     449.
##  6 1993 Q2   421    6036    462   1668       41356   149   462.     450.
##  7 1993 Q3   410    6570    475   1648       42949   163   445      447.
##  8 1993 Q4   512    5675    443   1863       40974   138   435.     438.
##  9 1994 Q1   449    5311    421   1468       40162   127   435      443.
## 10 1994 Q2   381    5717    475   1755       41199   159   459.     445.
## 11 1994 Q3   423    7000    497   1962       44095   184   442      445 
## 12 1994 Q4   531    6085    476   1833       41745   147   434.     439.
## 13 1995 Q1   426    4714    430   1626       41768   131   441.     445.
## 14 1995 Q2   408    3939    457   1703       43686   167   460.     446.
## 15 1995 Q3   416    6137    417   1733       46022   181   436.     442.
## 16 1995 Q4   520    4739    370   1545       42800   145   430.     431.
## 17 1996 Q1   409    4275    310   1526       43661   133   428.     435.
## 18 1996 Q2   398    5239    358   1593       44707   162   446.     434.
## 19 1996 Q3   398    6293    379   1706       46326   184   429.     434.
## 20 1996 Q4   507    5575    369   1699       43346   146   427.     428.

head(beer_ma2, n=20)

## # A tsibble: 20 x 8 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas `15-MA`
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>   <dbl>
##  1 1992 Q1   443    5777    383   1289       38332   117     NA 
##  2 1992 Q2   410    5853    404   1501       39774   151     NA 
##  3 1992 Q3   420    6416    446   1539       42246   175     NA 
##  4 1992 Q4   532    5825    420   1568       38498   129     NA 
##  5 1993 Q1   433    5724    394   1450       39460   116     NA 
##  6 1993 Q2   421    6036    462   1668       41356   149     NA 
##  7 1993 Q3   410    6570    475   1648       42949   163     NA 
##  8 1993 Q4   512    5675    443   1863       40974   138    441 
##  9 1994 Q1   449    5311    421   1468       40162   127    446.
## 10 1994 Q2   381    5717    475   1755       41199   159    446.
## 11 1994 Q3   423    7000    497   1962       44095   184    445.
## 12 1994 Q4   531    6085    476   1833       41745   147    436.
## 13 1995 Q1   426    4714    430   1626       41768   131    441.
## 14 1995 Q2   408    3939    457   1703       43686   167    441.
## 15 1995 Q3   416    6137    417   1733       46022   181    441.
## 16 1995 Q4   520    4739    370   1545       42800   145    433.
## 17 1996 Q1   409    4275    310   1526       43661   133    439.
## 18 1996 Q2   398    5239    358   1593       44707   162    442.
## 19 1996 Q3   398    6293    379   1706       46326   184    440 
## 20 1996 Q4   507    5575    369   1699       43346   146    431.

PROBLEM 3.7

Consider the last five years of the Gas data from aus_production

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

From the textbook “we can think of a time series as comprising three components: a trend-cycle component, a seasonal component, and a remainder component (containing anything else in the time series).”. On the gg_season below, we can see patterns exist between the quarters therefore there is a seasonal fluctuations between quarters within the years. On trend-cycle, there is a mild upwards trend a slight cycle but not very obvious on the plot.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?

Yes, here we are able to show the seasonal indices below and there is seasonal pattern aligning with the plot above.

Compute and plot the seasonally adjusted data.

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

The outlier when added to an observation in the middle, does seem to show change in the plot. If its prior to the start of the series, the graph shape doesn’t change as much.

#install.packages("seasonal")
library(seasonal)

gas <- tail(aus_production, 5*4) #|> select(Gas)

#Weekly data beginning Week 6, 1991, and ending Week 3, 2017. Units are million barrels per day.
gas %>% autoplot() + geom_smooth()

gg_season(gas, period = "year", labels="both")

#Classical
gasd <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 
components(gasd) %>%
  autoplot()

  labs(title = "Classical multiplicative Decomposition of Gas dataset")

## $title
## [1] "Classical multiplicative Decomposition of Gas dataset"
## 
## attr(,"class")
## [1] "labels"

  components(gasd) %>% 
  head()

## # A dable: 6 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##   .model                       Quarter   Gas trend seasonal random season_adjust
##   <chr>                          <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
## 1 "classical_decomposition(Ga… 2005 Q3   221   NA     1.13  NA              196.
## 2 "classical_decomposition(Ga… 2005 Q4   180   NA     0.925 NA              195.
## 3 "classical_decomposition(Ga… 2006 Q1   171  200.    0.875  0.974          195.
## 4 "classical_decomposition(Ga… 2006 Q2   224  204.    1.07   1.02           209.
## 5 "classical_decomposition(Ga… 2006 Q3   233  207     1.13   1.00           207.
## 6 "classical_decomposition(Ga… 2006 Q4   192  210.    0.925  0.987          208.

#x11 method to display the seasonal adjusted data
  x11_dcmp <- gas %>%
  model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) |>
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of total Gas using X-11.")

x11_dcmp |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Gas Production") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

x11_dcmp |>
  gg_subseries(seasonal)

#Seats decomposition
gasg <- gas |>
  model(seats = X_13ARIMA_SEATS(Gas ~ seats())) |>
  components()
autoplot(gasg) +
  labs(title =
    "Decomposition of total gas using SEATS")

#STL decomposition
gas |>
  model(
    STL(Gas ~ trend(window = 15) +
                   season(window = "periodic"),
    robust = TRUE)) |>
  components() |>
  autoplot()

#Outlier 
gas0<- gas %>%
  mutate(Gas = ifelse(Gas == 192, Gas + 300, Gas)) 

 gas %>%
  mutate(Gas = ifelse(Gas == 192, Gas + 300, Gas))  %>% model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") + labs(title = "Seasonally Adjusted Gas Production with an Outlier")

 gas %>%
  mutate(Gas = ifelse(Gas == 180, Gas + 300, Gas))  %>% model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") + labs(title = "Seasonally Adjusted Gas Production with an Outlier")

#   library(dplyr)
# # Identify rows with the value
# row_indices <- apply(gas0, 1, function(row) any(row == "492"))
# # Subset the data
# filtered_data <- gas0[row_indices, ]
# 
# filtered_data

PROBLEM 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11.

Does it reveal any outliers, or unusual features that you had not noticed previously?

There does seem to be outlier with 1990 to 2000.

library(seasonal)

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

head(myseries)

## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State              Industry                      `Series ID`    Month Turnover
##   <chr>              <chr>                         <chr>          <mth>    <dbl>
## 1 Northern Territory Clothing, footwear and perso… A3349767W   1988 Apr      2.3
## 2 Northern Territory Clothing, footwear and perso… A3349767W   1988 May      2.9
## 3 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jun      2.6
## 4 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jul      2.8
## 5 Northern Territory Clothing, footwear and perso… A3349767W   1988 Aug      2.9
## 6 Northern Territory Clothing, footwear and perso… A3349767W   1988 Sep      3

#x11 method to display the seasonal adjusted data
  x11_dcmp1 <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()
autoplot(x11_dcmp1) +
  labs(title =
    "Decomposition of total Turnover using X-11.")

x11_dcmp1 |>
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Turnover, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Turnover",
       title = "Turnover") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

x11_dcmp1 |>
  gg_subseries(seasonal)

PROBLEM 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?

The irregular graph shows a big downward spike in the data after 1990 January. Prior to that it is trending upward and in line with non-recession. Yes it is visible on the 2nd image in Jan/Feb subseries.

knitr::include_graphics("labour-1.png")

knitr::include_graphics("labour2-1.png")