library(fpp3)
library(latex2exp)
library(seasonal)

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Monaco has the highest GDP per capita followed by Liechtenstein. Over the years, Monaco and Liechtenstein has maintained their position in have the highest GDP per capita. All countries’ GDP per capita following the overall same trend (upwards) and the similar dips corresponding to economic crisis (like the financial crisis in 2008)

global_economy %>%
  autoplot(GDP/Population, show.legend = FALSE) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

# Must include `show.legend = FALSE`
gdp_per_capita <- global_economy %>%
  select(Country, Year, GDP, Population) %>%
  mutate(gdp_per_capita = GDP/Population) %>%
  slice_max(order_by = gdp_per_capita, n = 20)

gdp_per_capita 
## # A tsibble: 20 x 5 [1Y]
## # Key:       Country [2]
##    Country        Year         GDP Population gdp_per_capita
##    <fct>         <dbl>       <dbl>      <dbl>          <dbl>
##  1 Monaco         2014 7060236168.      38132        185153.
##  2 Monaco         2008 6476490406.      35853        180640.
##  3 Liechtenstein  2014 6657170923.      37127        179308.
##  4 Liechtenstein  2013 6391735894.      36834        173528.
##  5 Monaco         2013 6553372278.      37971        172589.
##  6 Monaco         2016 6468252212.      38499        168011.
##  7 Liechtenstein  2015 6268391521.      37403        167591.
##  8 Monaco         2007 5867916781.      35111        167125.
##  9 Liechtenstein  2016 6214633651.      37666        164993.
## 10 Monaco         2015 6258178995.      38307        163369.
## 11 Monaco         2011 6080344732.      37497        162155.
## 12 Liechtenstein  2011 5739977477.      36264        158283.
## 13 Monaco         2012 5743029680.      37783        152000.
## 14 Liechtenstein  2012 5456009385.      36545        149296.
## 15 Monaco         2009 5451653237.      36534        149221.
## 16 Monaco         2010 5362649007.      37094        144569.
## 17 Liechtenstein  2008 5081432924.      35541        142974.
## 18 Liechtenstein  2010 5082366478.      36003        141165.
## 19 Monaco         2006 4582988333.      34408        133195.
## 20 Liechtenstein  2007 4601299567.      35322        130267.
global_economy %>%
  filter(Country %in% c("Monaco", "Liechtenstein"))%>%
  autoplot(GDP/Population) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.

United States GDP

There is exponential growth. Performing log transformation flatten it.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP/Population, show.legend = FALSE) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(log(GDP/Population), show.legend = FALSE) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

Slaughter of Victorian “Bulls, bullocks and steers”

Variation seems even for the most part with the exception of some anamoly in late 1970s. transformation may not be neccesarily.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")%>%
  autoplot(Count) +
  labs(title = "Number of Victorian Bulls, Bullocks and Steers Slaughtered ")

Victorian Electricity Demand

The original data shows Half-hourly electricity demand for Victoria, Australia. The variation seems even for the most part with an exception to some unusual peaks. However, it was difficult to see with all the noise. Transformation may not be necessary. We can explore trends of electricity demand by yearly, monthly, and daily.

vic_elec %>% 
  autoplot(Demand) + 
  labs(title = "Victorian Electricity Demand", y = "MWh")

Plotting the electricity demand yearly, we see the demand has decrease over time.

demand <- vic_elec%>% 
  mutate(Month_Year = yearmonth(Time),
         Year = year(Time))

annual_demand <- demand %>% 
  group_by(Year) %>% 
  index_by(Year) %>% 
  summarise(total = sum(Demand), .groups = "drop") 

annual_demand %>% 
  autoplot(total) + 
  labs(title = "Yearly Victorian Electricity Demand", y = "MWh")

Plotting the electricity demand monthly, we see the seasonal patterns. Peaks in demands in the summer and winter. Dips in demand when temperatures are comfortable.

monthly_demand <- demand %>% 
  group_by(Month_Year) %>% 
  index_by(Month_Year) %>% 
  summarise(total = sum(Demand), .groups = "drop") 

monthly_demand %>% 
  autoplot(total) + 
  labs(title = "Monthly Victorian Electricity Demand", y = "MWh")

The variation seems even for the most part with an exception to some unusual peaks like early 2014.

daily_demand <- demand %>% 
  group_by(Date) %>% 
  index_by(Date) %>% 
  summarise(total = sum(Demand), .groups = "drop") 

daily_demand %>% 
  autoplot(total) + 
  labs(title = "Daily Victorian Electricity Demand", y = "MWh")

Gas production

In the plot of the original data, the variations is not homogeneous. There are more variation in the demand in 2000s than in 1960s. We should use transformation to make the variation more homogeneous.

aus_production %>%
  autoplot(Gas) + 
  labs(title = "Orginal Victorian Electricity Demand Data", y = "MWh")

After a square root transformation, the variations look more even but there are still more variations in the 2000s data.

aus_production %>%
  autoplot(sqrt(Gas)) + 
  labs(title = "Square Root Victorian Electricity Demand Data", y = "MWh")

After a cube root transformation, the variations look more even than the last transformation but there are still more variations in the 2000s data.

aus_production %>%
  autoplot((Gas)^(1/3)) + 
  labs(title = "Cube Root Victorian Electricity Demand Data", y = "MWh")

After a cube root transformation, the variations look more even but there are still more variations in the late 1900s data instead.

aus_production %>%
  autoplot(log(Gas)) + 
  labs(title = "Log Victorian Electricity Demand Data", y = "MWh")

The box cox transformation is the best by far in evening out the seasonal variation.

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.1095171
aus_production %>%
  autoplot(box_cox(Gas, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Gas with $\\lambda = ",
         round(lambda,2)
       )))

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Box Cox transformation is not helpful for data that has different periods of changing variance. The transformation will overcorrect or undercorrect the periods. In the canadian_gas data, it shows an increasing trend from 1960 to 1970 with season patterns, a constant trend with seasonal patterns from 1970 to 1990, a slight increasing trend from 1990 to 2000, and finally constant.

canadian_gas %>%
  autoplot() + 
  labs(title = "Orginal Canadian Gas Data " )
## Plot variable not specified, automatically selected `.vars = Volume`

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.5767648
canadian_gas %>%
  autoplot(box_cox(Volume, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Gas with $\\lambda = ",
         round(lambda,2)
       )))

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)

The optimal lambda value is 0.08303631, which is close to 0. This indicates that the appropriate transformation for this data is natural logarithm.

\(\lambda\) Transformation
2 Square
1 None
0.5 square root
-0.5 inverse square root
-1 reciprocal
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot()+
  labs(title = "Monthly Australian Retail Trade Turnover",
       x = "Monthly",
       y = "Turnover ($Million AUD)")
## Plot variable not specified, automatically selected `.vars = Turnover`

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.08303631
myseries %>%
  autoplot(box_cox(Turnover, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Australian Retail Trade Turnover with $\\lambda = ",
         round(lambda,2)
       )))

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

\(\lambda\) is 0.9264636, which is close to 1. This indicates that there is almost none transformation is needed for this dataset.

aus_production %>%
  autoplot(Tobacco) + 
  labs(title = "Original Tobacco data  ")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.9264636
aus_production %>%
  autoplot(box_cox(Tobacco, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco with $\\lambda = ",
         round(lambda,2)
       )))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Economy class passengers between Melbourne and Sydney

The data shows the total number of air passengers traveling with Ansett weekly. The optimal lambda value is 1.999927, which is close to 2. This indicates that the appropriate transformation for this data is square transformation.

economy_class <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

economy_class %>%
  autoplot(Passengers) +
  labs(title = "Weekly Economy class passengers between Melbourne and Sydney")

lambda <- economy_class %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 1.999927
economy_class %>%
  autoplot(box_cox(Passengers, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Economy class passengers Numbers with $\\lambda = ",
         round(lambda,2)
       )))

Pedestrian counts at Southern Cross Station

This dataset contains the hourly pedestrian counts from 2015 to 2016 at 4 sensors (“Birrarung Marr”, “Bourke Street Mall (North)”, “QV Market-Elizabeth St (West)” “Southern Cross Station”) in the city of Melbourne.

The optimal lambda value is -0.2501616, which is close to -0.25. This indicates that the appropriate transformation for this data is inverse square root transformation.

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")
  
southern_cross %>%
  autoplot(Count) +
  labs(title = "Hourly Pedestrian counts in the city of Melbourne")

pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [4]
##    Sensor         Date_Time           Date        Time Count
##    <chr>          <dttm>              <date>     <int> <int>
##  1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
##  2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
##  3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
##  4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
##  5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
##  6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
##  7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
##  8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
##  9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
## # ℹ 66,027 more rows
lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] -0.2501616
southern_cross %>%
  autoplot(box_cox(Count, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrian Ccounts in Melbourne with $\\lambda = ",
         round(lambda,2)
       )))

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Part a

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

Seasonal Fluctuations: There is a repeating pattern each year. Each year gas production rises to a peak in Q3 and then declines to a trough in Q1. This make senses as the winter months in Austrailia is June to August.

Trend Cycle: Overall, it is a upward trend with each year’s peak higher than the previous year’s.

gas %>%
  autoplot() + 
  labs(title = "Last Five Years of the Gas Production in Australia" )
## Plot variable not specified, automatically selected `.vars = Gas`

Part b

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

The first panel shows the plot of the original data.

The second panel shows the trend-cyle of the data to be overally increasing. Gas production increases from 2006 to mid 2007, constant (no trend) from mid 2007 to mid 2008, and increases after.

The third panel shows the seasonal trend. There is a annual pattern of rise and fall in gas production. Gas production peaks in Q3 and troughs in Q1.

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Last Five Years of the Gas Production in Australia ") 

Part c

Do the results support the graphical interpretation from part a?

Yes, the results support the graphical interpretation from part a. There was a yearly seasonal trend with a overall upward trend in gas production.

Part d

Compute and plot the seasonally adjusted data.

gas_mult_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()


gas_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Part e

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

# There are only 20 observations. 
outlier <- gas 
outlier$Gas[5] <- outlier$Gas[5] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "(Introduced Outlier)") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Part f

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Yes, classical decompoisition is not robust to outliers as it uses moving averages. Depending on where the outlier is, anomaly will show on the adjusted seasonal plot. The seasonal plot remain similar and almost unchanged as we see the similar peaks and trough annually. Trend cycle and remainder affected.

Outlier in the Beginning

outlier <- gas 
outlier$Gas[1] <- outlier$Gas[1] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "(Outlier in the Beginning)") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ",
       subtitle = "(Outlier in the Beginning)") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Outlier in the Middle

outlier <- gas 
outlier$Gas[10] <- outlier$Gas[10] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "(Outlier in the Middle)") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ",
       subtitle = "(Outlier in the Middle)") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Outlier in the End

outlier <- gas 
outlier$Gas[20] <- outlier$Gas[20] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "Outlier in the End") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ",
       subtitle = "Outlier in the End")  +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Exercise 7 in Section 2.10

The dataset contains monthly retail sales in Australia. Sales peak in December, and July. The lag plot reveals strongly positive in lag 1 and lag 12. There is strong evidence of annual sesonality.

myseries %>%
  autoplot() +
  labs(title = "Monthly Australian Retail Trade Turnover") 
## Plot variable not specified, automatically selected `.vars = Turnover`

myseries %>%
  gg_season() +
  labs(title = "GG Season Australian Retail Trade Turnover") 
## Plot variable not specified, automatically selected `y = Turnover`

myseries %>%
  gg_subseries() +
  labs(title = "GG Subseries Australian Retail Trade Turnover") 
## Plot variable not specified, automatically selected `y = Turnover`

myseries %>%
  gg_lag(Turnover, lags = 1:12) +
  labs(title = "GG Season Australian Retail Trade Turnover") 

X11

X11 is most robust to outliers. It was able to capture outliers. We can easily see that the overall trend is upwards. We are able to see outliers (sharp peaks and dips) in the irregular plot (last panel) that we couldn’t see in the autoplot.

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_dcmp) +
  labs(title = "Decomposition of Monthly Australian Retail Trade Turnover Using X11") 

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

part a

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The trend graph reveals an overall upward trend of labour force in Australia. There is evidence of seasonal fluctuations in the season_year graph as there are regular oscillations present. Employment peaks in Dec, and is lowest in January and August. The remainder plot reveals anomaly (extreme dip in labour force) in early 1990s.

part b

Is the recession of 1991/1992 visible in the estimated components?

Yes, it the trend plot we can see the trend is almost flat in the early 1990s and an increasing trend in other years. In the remainder plot, there is sharp drop into negative values. The negative remainder indicates that labour force drop lower than expected.