Homework 2

Questions: 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8, and 3.9

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per capita by Country over Time", y = "GDP per capita ($US)")

last_year_gdp <- global_economy %>%
  group_by(Country) %>%
  filter(Year == max(Year)) %>%
  mutate(GDP_Per_Capita = GDP / Population)

last_highest_gdp_country <- last_year_gdp %>%
  arrange(desc(GDP_Per_Capita)) %>%
  select(Country, GDP_Per_Capita)

print(last_highest_gdp_country)

## # A tsibble: 263 x 3 [1Y]
## # Key:       Country [263]
## # Groups:    Country [263]
##    Country          GDP_Per_Capita  Year
##    <fct>                     <dbl> <dbl>
##  1 Luxembourg              104103.  2017
##  2 Macao SAR, China         80893.  2017
##  3 Switzerland              80190.  2017
##  4 Norway                   75505.  2017
##  5 Iceland                  70057.  2017
##  6 Ireland                  69331.  2017
##  7 Qatar                    63249.  2017
##  8 United States            59532.  2017
##  9 North America            58070.  2017
## 10 Singapore                57714.  2017
## # ℹ 253 more rows

highest_gdp_country <- global_economy %>%
  group_by(Country) %>%
  summarise(Max_GDP_Per_Capita = max(GDP / Population, na.rm = TRUE)) %>%
  arrange(desc(Max_GDP_Per_Capita)) %>%
  slice(1) %>%
  select(Country, Max_GDP_Per_Capita)

print(highest_gdp_country)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country Max_GDP_Per_Capita  Year
##   <fct>                <dbl> <dbl>
## 1 Monaco             185153.  2014

# Plot GDP per capita explicitly; without a variable, autoplot() falls back to GDP
global_economy %>%
  filter(Country == "Luxembourg") %>%
  autoplot(GDP / Population) +
  labs(title = "Change in GDP per Capita for Luxembourg", y = "GDP per capita ($US)")

To summarize, Luxembourg has the highest GDP per capita in the most recent year (about $104,103 in 2017), while Monaco has the highest GDP per capita anywhere in the data (about $185,153 in 2014).

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

a. United States GDP from global_economy.

Transforming data adjusts its scale to emphasize or stabilize certain characteristics. A log transformation reduces skewness and stabilizes fluctuations that grow with the level of the series. It turns relative (percentage) differences into absolute ones, which can make patterns more distinct.

global_economy %>%
  filter(Country == "United States") %>%
  ggplot(aes(x = Year, y = GDP/Population)) +
  geom_line(size = 0.08) +
  geom_point(shape = 21, size = 0.5, fill = 'white', stroke = 0.5) +
  labs(title = "United States GDP per Capita over Time", y = "GDP per capita ($US)") +
  theme_minimal()
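
As a rough illustration of why a log transformation is often considered for a steadily growing series such as GDP, the sketch below uses a hypothetical series growing at 3% per year (the growth rate, starting level, and variable names are assumptions, not values from global_economy). On the original scale the yearly increments keep widening; on the log scale they are constant, so the series becomes a straight line.

```r
# Hypothetical series: 3% growth per year from a starting level of 100
years <- 1960:2017
level <- 100 * 1.03^(years - 1960)

# Year-on-year increments grow with the level on the original scale
raw_steps <- diff(level)

# On the log scale every increment equals log(1.03)
log_steps <- diff(log(level))

range(raw_steps)
range(log_steps)
```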

b. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(Count)+
  labs(title = "Slaughter of Victoria Bulls, Bullocks, and Steers")

c. Victorian Electricity Demand from vic_elec.

The as_tsibble() function converts the aggregated data to tsibble format so that time series tools can be applied to it. A tsibble stores and manipulates time series data efficiently and provides a variety of functions for handling the characteristics of time series data.

Aggregating the half-hourly data by month makes it easier to analyze and visualize, and the smaller data size reduces memory and computation costs.

str(vic_elec)

## tbl_ts [52,608 × 5] (S3: tbl_ts/tbl_df/tbl/data.frame)
##  $ Time       : POSIXct[1:52608], format: "2012-01-01 00:00:00" "2012-01-01 00:30:00" ...
##  $ Demand     : num [1:52608] 4383 4263 4049 3878 4036 ...
##  $ Temperature: num [1:52608] 21.4 21.1 20.7 20.6 20.4 ...
##  $ Date       : Date[1:52608], format: "2012-01-01" "2012-01-01" ...
##  $ Holiday    : logi [1:52608] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  - attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
##   ..$ .rows: list<int> [1:1] 
##   .. ..$ : int [1:52608] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..@ ptype: int(0) 
##  - attr(*, "index")= chr "Time"
##   ..- attr(*, "ordered")= logi TRUE
##  - attr(*, "index2")= chr "Time"
##  - attr(*, "interval")= interval [1:1] 30m
##   ..@ .regular: logi TRUE

vic_elec

## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
##    Time                Demand Temperature Date       Holiday
##    <dttm>               <dbl>       <dbl> <date>     <lgl>  
##  1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
##  2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
##  3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
##  4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
##  5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
##  6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE   
##  7 2012-01-01 03:00:00  3694.        20.1 2012-01-01 TRUE   
##  8 2012-01-01 03:30:00  3562.        19.6 2012-01-01 TRUE   
##  9 2012-01-01 04:00:00  3433.        19.1 2012-01-01 TRUE   
## 10 2012-01-01 04:30:00  3359.        19.0 2012-01-01 TRUE   
## # ℹ 52,598 more rows

# Daily totals: sum the half-hourly demand within each date
# (stored under a new name so the original vic_elec is not overwritten)
vic_elec_daily <- vic_elec %>%
  as_tibble() %>%
  group_by(Date) %>%
  summarise(Demand = sum(Demand)) %>%
  as_tsibble(index = Date)

# Monthly totals: label each day with its month, then sum
vic_elec_monthly <- vic_elec_daily %>%
  as_tibble() %>%
  mutate(Month = yearmonth(Date)) %>%
  group_by(Month) %>%
  summarise(Demand = sum(Demand)) %>%
  as_tsibble(index = Month)

plot1 <- autoplot(vic_elec_daily, Demand) +
  labs(title = "Daily Victorian Electricity Demand", x = "Date", y = "Total Demand")

plot2 <- autoplot(vic_elec_monthly, Demand) +
  labs(title = "Monthly Victorian Electricity Demand", x = "Month", y = "Total Demand")

plot_grid(plot1 + theme(plot.title = element_text(size = 8)),
          plot2 + theme(plot.title = element_text(size = 8)),
          ncol = 2)
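
The aggregation step described above can be sketched in base R on a small hypothetical data set (three days of made-up half-hourly readings; the values and names are assumptions, not taken from vic_elec). It only illustrates how summing within a coarser time unit shrinks the data.

```r
# Hypothetical half-hourly demand for three days (48 readings per day)
time <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
            by = "30 min", length.out = 48 * 3)
demand <- rep(c(4000, 4500, 5000), each = 48)

# Sum the half-hourly readings within each calendar day
daily <- tapply(demand, as.Date(time, tz = "UTC"), sum)

daily
length(demand)  # number of half-hourly values
length(daily)   # number of daily totals
```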

d. Gas production from aus_production.

The first graph visualizes gas production over time. In this graph, the original data is used, with the y-axis representing gas production. This graph allows you to see trends in gas production over time.

The second graph visualizes gas production over time using a log transformation. Log-transformed data were obtained by applying a logarithmic function to the gas production. In this graph, the y-axis represents log-transformed gas production. Log transformation is used to adjust the distribution of data to reduce skewness and stabilize fluctuations.

The main difference between the two graphs is the scale. The first graph uses the original data, showing the absolute size and trend of gas production, with seasonal swings that widen as the level rises. The second graph uses log-transformed data, where the swings are closer to the same size throughout, so the variance is more stable and relative changes are easier to compare.

aus_production

## # A tsibble: 218 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5
##  2 1956 Q2   213    5178    204    532        4436     6
##  3 1956 Q3   227    5297    208    561        4806     7
##  4 1956 Q4   308    5681    197    570        4418     6
##  5 1957 Q1   262    5577    187    529        4339     5
##  6 1957 Q2   228    5651    214    604        4811     7
##  7 1957 Q3   236    5317    227    603        5259     7
##  8 1957 Q4   320    6152    222    582        4735     6
##  9 1958 Q1   272    5758    199    554        4608     5
## 10 1958 Q2   233    5641    229    620        5196     7
## # ℹ 208 more rows

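# autoplot() for a tsibble has no xlab/ylab/main arguments, so labels go in labs()
plot1 <- autoplot(aus_production, Gas) +
  labs(title = "Gas Production over Time", x = "Quarter", y = "Gas Production")

# log(Gas + 1) is used so that a zero value would not produce -Inf
aus_production$log_Gas <- log(aus_production$Gas + 1)

plot2 <- autoplot(aus_production, log_Gas) +
  labs(title = "Log Gas Production over Time", x = "Quarter", y = "Log Gas Production")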

plot_grid(plot1 + theme(plot.title = element_text(size = 8)),
          plot2 + theme(plot.title = element_text(size = 8)),
          ncol = 2)
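
The variance-stabilizing effect described above can be checked numerically. The sketch below builds a hypothetical quarterly series (not the aus_production data) whose seasonal swing is a fixed percentage of its level, then compares the within-year spread before and after taking logs. All names and constants are assumptions chosen for illustration.

```r
# Hypothetical quarterly series: the level doubles every 5 years and the
# seasonal swing is a fixed proportion of the level
n_years <- 20
quarter_effect <- c(0.8, 1.0, 1.2, 1.0)
level <- 10 * 2^(rep(0:(n_years - 1), each = 4) / 5)
gas_like <- level * rep(quarter_effect, times = n_years)

year_id <- rep(seq_len(n_years), each = 4)

# Within-year spread on the original scale grows with the level
sd_raw <- tapply(gas_like, year_id, sd)

# Within-year spread on the log scale is the same in every year
sd_log <- tapply(log(gas_like), year_id, sd)

range(sd_raw)
range(sd_log)
```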

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

The Box-Cox transformation is helpful when the variation in a series changes with its level, because it can stabilize that variation. For the canadian_gas data, however, the seasonal variation is not proportional to the level: it is small in the early years, larger in the middle of the series, and smaller again toward the end, while the level keeps rising. A Box-Cox transformation is monotonic in the level, so no single value of λ can even out this pattern, and the transformation has little useful effect.

canadian_gas

## # A tsibble: 542 x 2 [1M]
##       Month Volume
##       <mth>  <dbl>
##  1 1960 Jan  1.43 
##  2 1960 Feb  1.31 
##  3 1960 Mar  1.40 
##  4 1960 Apr  1.17 
##  5 1960 May  1.12 
##  6 1960 Jun  1.01 
##  7 1960 Jul  0.966
##  8 1960 Aug  0.977
##  9 1960 Sep  1.03 
## 10 1960 Oct  1.25 
## # ℹ 532 more rows

plot1 <- autoplot(canadian_gas, Volume) +
  labs(title = "Canadian Gas Volume")

# BoxCox.lambda() and BoxCox() come from the forecast package
lambda <- BoxCox.lambda(canadian_gas$Volume)
canadian_gas_transformed <- canadian_gas %>%
  mutate(Volume_BoxCox = BoxCox(Volume, lambda))

# Name the transformed column explicitly; otherwise autoplot() plots Volume again
plot2 <- autoplot(canadian_gas_transformed, Volume_BoxCox) +
  labs(title = "Canadian Gas Volume (Box-Cox Transformed)")

plot_grid(plot1 + theme(plot.title = element_text(size = 8)),
          plot2 + theme(plot.title = element_text(size = 8)),
          ncol = 2)
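
For reference, the Box-Cox family for positive data can be written as a short function. This is a minimal sketch, not the forecast package's implementation; the name box_cox_sketch and the example vector are assumptions. It shows that the transformation depends only on the level of each observation, which is why it cannot fix variation that rises and then falls while the level keeps increasing.

```r
# Box-Cox transformation for positive data:
#   lambda == 0 : log(y)
#   otherwise   : (y^lambda - 1) / lambda
box_cox_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 2, 5, 10, 50, 100)

box_cox_sketch(y, 1)    # lambda = 1 only shifts the data by 1
box_cox_sketch(y, 0)    # lambda = 0 is the natural log
box_cox_sketch(y, 0.5)  # values in between compress large observations less than the log
```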

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

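BoxCox.lambda() from the forecast package gives a λ value of -0.3645609, while the guerrero feature gives a λ value of 0.08303631. The two estimates differ, likely because BoxCox.lambda() is given a plain numeric vector that carries no monthly seasonal period. The Guerrero value is a reasonable choice here: it is close to zero, so the transformation behaves much like a log transformation.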

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

lambda <- BoxCox.lambda(myseries$Turnover)
myseries_transformed <- mutate(myseries, Turnover_BoxCox = BoxCox(myseries$Turnover, lambda))

print(lambda)

## [1] -0.3645609

# Name the transformed column explicitly; otherwise autoplot() plots Turnover
autoplot(myseries_transformed, Turnover_BoxCox) +
  labs(title = "Box-Cox Transformed Time Series")

lambda_bc <- BoxCox.lambda(myseries$Turnover)
myseries_bc <- mutate(myseries, Turnover_BoxCox = BoxCox(Turnover, lambda_bc))

lambda_guerrero <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
# guerrero() only estimates lambda; box_cox() applies the transformation
myseries_guerrero <- mutate(myseries, Turnover_Guerrero = box_cox(Turnover, lambda_guerrero))

print(lambda_guerrero)

## [1] 0.08303631

autoplot(myseries_guerrero, Turnover_Guerrero) +
  labs(y = "Turnover (Guerrero's)", title = "Guerrero's Transformed Time Series")
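
To see why two Guerrero-style estimates can disagree, it may help to look at the criterion itself. The sketch below is a simplified version written for illustration, not the code used by forecast or feasts; the function names, the search interval, and the simulated series are all assumptions. It splits the series into blocks of `period` observations and picks the λ that makes sd / mean^(1 - λ) as constant as possible across blocks, so the answer depends on the block length.

```r
# Coefficient of variation of sd / mean^(1 - lambda) across blocks
guerrero_cv <- function(lambda, y, period) {
  n_blocks <- length(y) %/% period
  # Drop leading observations so the series splits into whole blocks
  y <- tail(y, n_blocks * period)
  blocks <- matrix(y, nrow = period)
  block_mean <- colMeans(blocks)
  block_sd <- apply(blocks, 2, sd)
  ratio <- block_sd / block_mean^(1 - lambda)
  sd(ratio) / mean(ratio)
}

# Search an assumed interval for the lambda with the smallest criterion
guerrero_sketch <- function(y, period, lower = -1, upper = 2) {
  optimize(guerrero_cv, interval = c(lower, upper),
           y = y, period = period)$minimum
}

# Hypothetical monthly series whose seasonal swing is proportional to its level
set.seed(1)
months <- 1:240
y <- exp(0.01 * months) *
  (1 + 0.2 * sin(2 * pi * months / 12)) *
  exp(rnorm(240, sd = 0.02))

lambda_12 <- guerrero_sketch(y, period = 12)  # blocks of one year
lambda_2 <- guerrero_sketch(y, period = 2)    # blocks of two observations
c(lambda_12, lambda_2)
```

With yearly blocks the estimate for this simulated series should land near zero, since the swings were built to be proportional to the level. The estimate from blocks of two observations need not agree with it.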

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Since the data is relatively stable, there is no significant difference even if a Box-Cox transformation is used. However, the pedestrian counts at Southern Cross Station are recorded hourly, and at that granularity the graph is difficult to interpret. It is therefore advisable to aggregate the data to weekly totals to improve the readability of the graph.

aus_production

## # A tsibble: 218 x 8 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas log_Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>   <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5    1.79
##  2 1956 Q2   213    5178    204    532        4436     6    1.95
##  3 1956 Q3   227    5297    208    561        4806     7    2.08
##  4 1956 Q4   308    5681    197    570        4418     6    1.95
##  5 1957 Q1   262    5577    187    529        4339     5    1.79
##  6 1957 Q2   228    5651    214    604        4811     7    2.08
##  7 1957 Q3   236    5317    227    603        5259     7    2.08
##  8 1957 Q4   320    6152    222    582        4735     6    1.95
##  9 1958 Q1   272    5758    199    554        4608     5    1.79
## 10 1958 Q2   233    5641    229    620        5196     7    2.08
## # ℹ 208 more rows

ansett

## # A tsibble: 7,407 x 4 [1W]
## # Key:       Airports, Class [30]
##        Week Airports Class    Passengers
##      <week> <chr>    <chr>         <dbl>
##  1 1989 W28 ADL-PER  Business        193
##  2 1989 W29 ADL-PER  Business        254
##  3 1989 W30 ADL-PER  Business        185
##  4 1989 W31 ADL-PER  Business        254
##  5 1989 W32 ADL-PER  Business        191
##  6 1989 W33 ADL-PER  Business        136
##  7 1989 W34 ADL-PER  Business          0
##  8 1989 W35 ADL-PER  Business          0
##  9 1989 W36 ADL-PER  Business          0
## 10 1989 W37 ADL-PER  Business          0
## # ℹ 7,397 more rows

pedestrian

## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [4]
##    Sensor         Date_Time           Date        Time Count
##    <chr>          <dttm>              <date>     <int> <int>
##  1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
##  2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
##  3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
##  4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
##  5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
##  6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
##  7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
##  8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
##  9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
## # ℹ 66,027 more rows

a. Tobacco from aus_production

autoplot(aus_production, Tobacco) +
  labs(title = "Tobacco Production in Australia")

lambda <- BoxCox.lambda(aus_production$Tobacco)

aus_production_transformed <- aus_production %>%
  mutate(Tobacco_BoxCox = BoxCox(Tobacco, lambda))

autoplot(aus_production_transformed, Tobacco_BoxCox) +
  labs(title = "Tobacco Production in Australia (Box-Cox Transformed)")

b. Economy class passengers between Melbourne and Sydney from ansett

economy_passengers <- ansett %>%
  filter(Airports == "MEL-SYD", Class == "Economy")

autoplot(economy_passengers, Passengers) +
  labs(title = "Economy Class Passengers between Melbourne and Sydney")

lambda <- BoxCox.lambda(economy_passengers$Passengers)

economy_passengers_transformed <- economy_passengers %>%
  mutate(Passengers_BoxCox = BoxCox(Passengers, lambda))

autoplot(economy_passengers_transformed, Passengers_BoxCox) +
  labs(title = "Economy Class Passengers between Melbourne and Sydney (Box-Cox Transformed)")

c. Pedestrian counts at Southern Cross Station from pedestrian

southern_cross_pedestrian <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")

autoplot(southern_cross_pedestrian, Count) +
  labs(title = "Pedestrian Counts at Southern Cross Station")

lambda <- BoxCox.lambda(southern_cross_pedestrian$Count)

southern_cross_pedestrian_transformed <- southern_cross_pedestrian %>%
  mutate(Count_BoxCox = BoxCox(Count, lambda))

autoplot(southern_cross_pedestrian_transformed, Count_BoxCox)+
  labs(title = "Pedestrian Counts at Southern Cross Station (Box-Cox Transformed)")

print(lambda)

## [1] 0.07481161

southern_cross_pedestrian <- southern_cross_pedestrian %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross_pedestrian, Count)+
  labs(title = "Weekly Pedestrian Counts at Southern Cross Station")

lambda <- BoxCox.lambda(southern_cross_pedestrian$Count)

southern_cross_pedestrian_transformed <- southern_cross_pedestrian %>%
  mutate(Count_BoxCox = BoxCox(Count, lambda))

autoplot(southern_cross_pedestrian_transformed, Count_BoxCox) +
  labs(title = "Weekly Pedestrian Counts at Southern Cross Station (Box-Cox Transformed)")

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(gas, Gas)

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_decomposed <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 

components(gas_decomposed) %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Gas Production")

c. Do the results support the graphical interpretation from part a?

Yes, the results corroborate the visual interpretation from part a. The trend-cycle component rises steadily over the five years, and the seasonal component repeats every four quarters, with the highest values in Q3 and the lowest in Q1.
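
The steps behind the multiplicative classical decomposition in part b can be reproduced by hand in base R. This is a sketch for illustration: the 20 values are copied from the gas table printed in part d, the variable names are assumptions, and stats::decompose() is used only as a cross-check rather than as the method from part b.

```r
# Gas values for 2005 Q3 to 2010 Q2, copied from the printed gas table
gas_values <- c(221, 180, 171, 224, 233, 192, 187, 234, 245, 205,
                194, 229, 249, 203, 196, 238, 252, 210, 205, 236)
gas_ts <- ts(gas_values, frequency = 4, start = c(2005, 3))

# Step 1: trend-cycle from a centred 2x4 moving average
trend <- stats::filter(gas_ts, c(0.5, 1, 1, 1, 0.5) / 4, sides = 2)

# Step 2: detrend by division (multiplicative model)
detrended <- as.numeric(gas_ts) / as.numeric(trend)

# Step 3: average the detrended values for each calendar quarter
quarter <- as.integer(cycle(gas_ts))
raw_index <- tapply(detrended, quarter, mean, na.rm = TRUE)

# Step 4: rescale so the four indices average to 1
seasonal_index <- raw_index / mean(raw_index)
round(seasonal_index, 3)

# Cross-check against stats::decompose(), whose figure starts at 2005 Q3
reference <- decompose(gas_ts, type = "multiplicative")
all.equal(as.numeric(seasonal_index[c("3", "4", "1", "2")]),
          as.numeric(reference$figure))
```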

d. Compute and plot the seasonally adjusted data.

gas

## # A tsibble: 20 x 2 [1Q]
##      Gas Quarter
##    <dbl>   <qtr>
##  1   221 2005 Q3
##  2   180 2005 Q4
##  3   171 2006 Q1
##  4   224 2006 Q2
##  5   233 2006 Q3
##  6   192 2006 Q4
##  7   187 2007 Q1
##  8   234 2007 Q2
##  9   245 2007 Q3
## 10   205 2007 Q4
## 11   194 2008 Q1
## 12   229 2008 Q2
## 13   249 2008 Q3
## 14   203 2008 Q4
## 15   196 2009 Q1
## 16   238 2009 Q2
## 17   252 2009 Q3
## 18   210 2009 Q4
## 19   205 2010 Q1
## 20   236 2010 Q2

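gas_ts <- ts(gas$Gas, frequency = 4, start = c(2005, 3))

# decompose() defaults to an additive decomposition, so the seasonal
# component is subtracted (part b used a multiplicative decomposition)
gas_decomposed <- decompose(gas_ts)

seasonally_adjusted_data <- gas_ts - gas_decomposed$seasonal

autoplot(seasonally_adjusted_data) +
  labs(title = "Seasonally Adjusted Gas Production")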

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

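Adding 300 to one observation produces a large spike in the seasonally adjusted series at that point, and the range of the y-axis stretches to accommodate it. Because the seasonal component is estimated by averaging across years, the outlier also distorts the seasonal estimates, so the seasonally adjusted values in other years shift as well.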

gas_with_outlier <- gas
gas_with_outlier$Gas[5] <- gas_with_outlier$Gas[5] + 300

gas_with_outlier_ts <- ts(gas_with_outlier$Gas, frequency = 4, start = c(2005, 3))

gas_with_outlier_decomposed <- decompose(gas_with_outlier_ts)

seasonally_adjusted_with_outlier <- gas_with_outlier_ts - gas_with_outlier_decomposed$seasonal

autoplot(seasonally_adjusted_data, series = "Without Outlier") +
  autolayer(seasonally_adjusted_with_outlier, series = "With Outlier") +
  labs(title = "Effect of Outlier on Seasonally Adjusted Gas Production")

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Yes, it does make a difference. In a classical decomposition the trend-cycle is a centred moving average, which is undefined for the first and last two quarters of this series. An outlier near the end therefore enters fewer of the averages used to estimate the trend and seasonal components than an outlier in the middle, so it distorts the seasonal estimates less.

In both cases the spike itself remains in the seasonally adjusted series. An outlier close to the end may also affect a prediction model differently, since the most recent observations often carry the most weight, ultimately leading to different predictions or estimates.

gas_with_outlier <- gas
gas_with_outlier$Gas[19] <- gas_with_outlier$Gas[19] + 300

gas_with_outlier_ts <- ts(gas_with_outlier$Gas, frequency = 4, start = c(2005, 3))

gas_with_outlier_decomposed <- decompose(gas_with_outlier_ts)

seasonally_adjusted_with_outlier <- gas_with_outlier_ts - gas_with_outlier_decomposed$seasonal

autoplot(seasonally_adjusted_data, series = "Without Outlier") +
  autolayer(seasonally_adjusted_with_outlier, series = "With Outlier") +
  labs(title = "Effect of Outlier Near the End on Seasonally Adjusted Gas Production")
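
The comparison in parts e and f can also be made numerically by looking at how much each outlier changes the four seasonal estimates. The sketch below uses stats::decompose() with its additive default, as in the code above. The gas values are copied from the printed gas table, and the helper names seasonal_figure and add_outlier are assumptions introduced for illustration.

```r
# Gas values for 2005 Q3 to 2010 Q2, copied from the printed gas table
gas_values <- c(221, 180, 171, 224, 233, 192, 187, 234, 245, 205,
                194, 229, 249, 203, 196, 238, 252, 210, 205, 236)

# Four seasonal estimates from an additive classical decomposition,
# ordered from the first observation (2005 Q3)
seasonal_figure <- function(values) {
  decompose(ts(values, frequency = 4, start = c(2005, 3)))$figure
}

# Add an outlier of the given size at one position
add_outlier <- function(values, position, size = 300) {
  values[position] <- values[position] + size
  values
}

baseline <- seasonal_figure(gas_values)
shift_middle <- seasonal_figure(add_outlier(gas_values, 5)) - baseline
shift_end <- seasonal_figure(add_outlier(gas_values, 19)) - baseline

# Change in each seasonal estimate caused by the outlier
round(shift_middle, 2)
round(shift_end, 2)

# Is the largest distortion bigger for the outlier in the middle?
max(abs(shift_middle)) > max(abs(shift_end))
```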

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

In general, when analyzing seasonal data, it’s evident that there tends to be more variability at the beginning of the time series compared to the later periods. This observation suggests that outliers occurring at the start of the time series may have a more significant impact on the overall pattern. As the time series progresses, the influence of outliers typically diminishes, given that data points accumulate gradually over time.

aus_retail

## # A tsibble: 64,532 x 5 [1M]
## # Key:       State, Industry [152]
##    State                        Industry           `Series ID`    Month Turnover
##    <chr>                        <chr>              <chr>          <mth>    <dbl>
##  1 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Apr      4.4
##  2 Australian Capital Territory Cafes, restaurant… A3349849A   1982 May      3.4
##  3 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Jun      3.6
##  4 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Jul      4  
##  5 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Aug      3.6
##  6 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Sep      4.2
##  7 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Oct      4.8
##  8 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Nov      5.4
##  9 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Dec      6.9
## 10 Australian Capital Territory Cafes, restaurant… A3349849A   1983 Jan      3.8
## # ℹ 64,522 more rows

set.seed(12345678)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# x11() in the formula requests the X-11 method within X-13ARIMA-SEATS
x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11()))

x11_components <- components(x11_dcmp)

autoplot(x11_components) +
  labs(title = "Decomposition of Retail Turnover using X-11")

3.9

The data suggests that Australia’s civilian labor force maintains steady growth, seemingly unaffected by seasonal fluctuations. However, notable declines coincide with economic recessions, indicating sensitivity to broader economic trends.

In the trend chart, it’s challenging to clearly identify decreases due to recessions. However, in the value and remainder graphs, recession-induced declines are notably pronounced, providing clearer evidence of their impact.