library(fpp3)
library(latex2exp)

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Monaco has the highest GDP per capita followed by Liechtenstein. Over the years, Monaco and Liechtenstein has maintained their position in have the highest GDP per capita. All countries’ GDP per capita following the overall same trend (upwards) and the similar dips corresponding to economic crisis (like the financial crisis in 2008)

global_economy %>%
  autoplot(GDP/Population, show.legend = FALSE) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

# Must include `show.legend = FALSE`
gdp_per_capita <- global_economy %>%
  select(Country, Year, GDP, Population) %>%
  mutate(gdp_per_capita = GDP/Population) %>%
  slice_max(order_by = gdp_per_capita, n = 20)

gdp_per_capita 
## # A tsibble: 20 x 5 [1Y]
## # Key:       Country [2]
##    Country        Year         GDP Population gdp_per_capita
##    <fct>         <dbl>       <dbl>      <dbl>          <dbl>
##  1 Monaco         2014 7060236168.      38132        185153.
##  2 Monaco         2008 6476490406.      35853        180640.
##  3 Liechtenstein  2014 6657170923.      37127        179308.
##  4 Liechtenstein  2013 6391735894.      36834        173528.
##  5 Monaco         2013 6553372278.      37971        172589.
##  6 Monaco         2016 6468252212.      38499        168011.
##  7 Liechtenstein  2015 6268391521.      37403        167591.
##  8 Monaco         2007 5867916781.      35111        167125.
##  9 Liechtenstein  2016 6214633651.      37666        164993.
## 10 Monaco         2015 6258178995.      38307        163369.
## 11 Monaco         2011 6080344732.      37497        162155.
## 12 Liechtenstein  2011 5739977477.      36264        158283.
## 13 Monaco         2012 5743029680.      37783        152000.
## 14 Liechtenstein  2012 5456009385.      36545        149296.
## 15 Monaco         2009 5451653237.      36534        149221.
## 16 Monaco         2010 5362649007.      37094        144569.
## 17 Liechtenstein  2008 5081432924.      35541        142974.
## 18 Liechtenstein  2010 5082366478.      36003        141165.
## 19 Monaco         2006 4582988333.      34408        133195.
## 20 Liechtenstein  2007 4601299567.      35322        130267.
global_economy %>%
  filter(Country %in% c("Monaco", "Liechtenstein"))%>%
  autoplot(GDP/Population) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.

United States GDP

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP/Population, show.legend = FALSE) + 
  labs(title = "GDP per capita", x = "Year", y = "$US")

Slaughter of Victorian “Bulls, bullocks and steers”

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")%>%
  autoplot(Count) +
  labs(title = "Number of Victorian Bulls, Bullocks and Steers Slaughtered ")

Victorian Electricity Demand

The original data shows Half-hourly electricity demand for Victoria, Australia. The variation seems even for the most part with an exception to some unusual peaks. Transformation may not be necessary. We can explore trends of electricity demand by yearly, monthly, and daily.

vic_elec %>% 
  autoplot(Demand) + 
  labs(title = "Victorian Electricity Demand", y = "MWh")

Plotting the electricity demand yearly, we see the demand has decrease over time.

demand <- vic_elec%>% 
  mutate(Month_Year = yearmonth(Time),
         Year = year(Time))

annual_demand <- demand %>% 
  group_by(Year) %>% 
  index_by(Year) %>% 
  summarise(total = sum(Demand), .groups = "drop") 

annual_demand %>% 
  autoplot(total) + 
  labs(title = "Yearly Victorian Electricity Demand", y = "MWh")

Plotting the electricity demand monthly, we see the seasonal patterns. Peaks in demands in the summer and winter. Dips in demand when temperatures are comfortable.

monthly_demand <- demand %>% 
  group_by(Month_Year) %>% 
  index_by(Month_Year) %>% 
  summarise(total = sum(Demand), .groups = "drop") 

monthly_demand %>% 
  autoplot(total) + 
  labs(title = "Monthly Victorian Electricity Demand", y = "MWh")

The variation seems even for the most part with an exception to some unusual peaks like early 2014.

daily_demand <- demand %>% 
  group_by(Date) %>% 
  index_by(Date) %>% 
  summarise(total = sum(Demand), .groups = "drop") 

daily_demand %>% 
  autoplot(total) + 
  labs(title = "Daily Victorian Electricity Demand", y = "MWh")

Gas production

In the plot of the original data, the variations is not homogeneous. There are more variation in the demand in 2000s than in 1960s. We should use transformation to make the variation more homogeneous.

aus_production %>%
  autoplot(Gas) + 
  labs(title = "Orginal Victorian Electricity Demand Data", y = "MWh")

After a square root transformation, the variations look more even but there are still more variations in the 2000s data.

aus_production %>%
  autoplot(sqrt(Gas)) + 
  labs(title = "Square Root Victorian Electricity Demand Data", y = "MWh")

After a cube root transformation, the variations look more even than the last transformation but there are still more variations in the 2000s data.

aus_production %>%
  autoplot((Gas)^(1/3)) + 
  labs(title = "Cube Root Victorian Electricity Demand Data", y = "MWh")

After a cube root transformation, the variations look more even but there are still more variations in the late 1900s data instead.

aus_production %>%
  autoplot(log(Gas)) + 
  labs(title = "Log Victorian Electricity Demand Data", y = "MWh")

The box cox transformation is the best by far in evening out the seasonal variation.

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.1095171
aus_production %>%
  autoplot(box_cox(Gas, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Gas with $\\lambda = ",
         round(lambda,2)
       )))

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Box Cox transformation is not helpful for data that has different periods of changing variance. The transformation will overcorrect or undercorrect the periods. In the canadian_gas data, it shows an increasing trend from 1960 to 1970 with season patterns, a constant trend with seasonal patterns from 1970 to 1990, a slight increasing trend from 1990 to 2000, and finally constant.

canadian_gas %>%
  autoplot() + 
  labs(title = "Orginal Canadian Gas Data " )
## Plot variable not specified, automatically selected `.vars = Volume`

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.5767648
canadian_gas %>%
  autoplot(box_cox(Volume, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Gas with $\\lambda = ",
         round(lambda,2)
       )))

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)

The optimal lambda value is 0.08303631, which is close to 0. This indicates that the appropriate transformation for this data is natural logarithm.

\(\lambda\) Transformation
2 Square
1 None
0.5 square root
-0.5 inverse square root
-1 reciprocal
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot()+
  labs(title = "Monthly Australian Retail Trade Turnover",
       x = "Monthly",
       y = "Turnover ($Million AUD)")
## Plot variable not specified, automatically selected `.vars = Turnover`

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.08303631
myseries %>%
  autoplot(box_cox(Turnover, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Australian Retail Trade Turnover with $\\lambda = ",
         round(lambda,2)
       )))

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

\(\lambda\) is 0.9264636, which is close to 1. This indicates that there is almost none transformation is needed for this dataset.

aus_production %>%
  autoplot(Tobacco) + 
  labs(title = "Original Tobacco data  ")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 0.9264636
aus_production %>%
  autoplot(box_cox(Tobacco, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco with $\\lambda = ",
         round(lambda,2)
       )))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Economy class passengers between Melbourne and Sydney

The data shows the total number of air passengers traveling with Ansett weekly. The optimal lambda value is 1.999927, which is close to 2. This indicates that the appropriate transformation for this data is square transformation.

economy_class <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

economy_class %>%
  autoplot(Passengers) +
  labs(title = "Weekly Economy class passengers between Melbourne and Sydney")

lambda <- economy_class %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] 1.999927
economy_class %>%
  autoplot(box_cox(Passengers, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Economy class passengers Numbers with $\\lambda = ",
         round(lambda,2)
       )))

Pedestrian counts at Southern Cross Station

This dataset contains the hourly pedestrian counts from 2015 to 2016 at 4 sensors (“Birrarung Marr”, “Bourke Street Mall (North)”, “QV Market-Elizabeth St (West)” “Southern Cross Station”) in the city of Melbourne.

The optimal lambda value is -0.2501616, which is close to -0.25. This indicates that the appropriate transformation for this data is inverse square root transformation.

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")
  
southern_cross %>%
  autoplot(Count) +
  labs(title = "Hourly Pedestrian counts in the city of Melbourne")

pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [4]
##    Sensor         Date_Time           Date        Time Count
##    <chr>          <dttm>              <date>     <int> <int>
##  1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
##  2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
##  3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
##  4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
##  5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
##  6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
##  7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
##  8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
##  9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
## # ℹ 66,027 more rows
lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
## [1] -0.2501616
southern_cross %>%
  autoplot(box_cox(Count, lambda))+
  labs(y= "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrian Ccounts in Melbourne with $\\lambda = ",
         round(lambda,2)
       )))

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Part a

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

Seasonal Fluctuations: There is a repeating pattern each year. Each year gas production rises to a peak in Q3 and then declines to a trough in Q1. This make senses as the winter months in Austrailia is June to August.

Trend Cycle: Overall, it is a upward trend with each year’s peak higher than the previous year’s.

gas %>%
  autoplot() + 
  labs(title = "Last Five Years of the Gas Production in Australia" )
## Plot variable not specified, automatically selected `.vars = Gas`

Part b

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

The first panel shows the plot of the original data.

The second panel shows the trend-cyle of the data to be overally increasing. Gas production increases from 2006 to mid 2007, constant (no trend) from mid 2007 to mid 2008, and increases after.

The third panel shows the seasonal trend. There is a annual pattern of rise and fall in gas production. Gas production peaks in Q3 and troughs in Q1.

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Last Five Years of the Gas Production in Australia ") 

Part c

Do the results support the graphical interpretation from part a?

Yes, the results support the graphical interpretation from part a. There was a yearly seasonal trend with a overall upward trend in gas production.

Part d

Compute and plot the seasonally adjusted data.

gas_mult_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()


gas_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Part e

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

# There are only 20 observations. 
outlier <- gas 
outlier$Gas[5] <- outlier$Gas[5] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "(Introduced Outlier)") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Part f

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Introducing the outlier in the beginning, middle, or end has little affect on the seasonal trend.

Outlier in the Beginning

outlier <- gas 
outlier$Gas[1] <- outlier$Gas[1] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "(Outlier in the Beginning)") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ",
       subtitle = "(Outlier in the Beginning)") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Outlier in the Middle

outlier <- gas 
outlier$Gas[10] <- outlier$Gas[10] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "(Outlier in the Middle)") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ",
       subtitle = "(Outlier in the Middle)") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Outlier in the End

outlier <- gas 
outlier$Gas[20] <- outlier$Gas[20] + 300

outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
       subtitle = "Outlier in the End") 

outlier_mult_decomp <- outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_mult_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Last Five Years of the Gas Production in Australia ",
       subtitle = "Outlier in the End")  +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.