Homework 2 Data 624

library(lubridate)
library(tsibble)
library(dplyr)
library(tidyverse)
library(fpp3)
library(forecast)
library(seasonal)

Question 1:

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

data("global_economy")
#global_economy
global_economy$GDPperCapita <- global_economy$GDP/global_economy$Population
global_economy %>% autoplot(GDPperCapita, show.legend =  FALSE) +
  labs(title= "GDP Per Capita", y = "USD")

# Creating a dataframe by GDPCAP
gdpCap<- global_economy %>%
  index_by(Year)%>%
  filter(GDPperCapita == max(GDPperCapita, na.rm = TRUE)) %>%
  select(Country, GDPperCapita) %>%
  arrange(Year)
## Adding missing grouping variables: `Year`
# Top 3 Countries GDP per Capita
head(gdpCap  %>%
  arrange(desc(GDPperCapita)), n = 3)
## # A tsibble: 3 x 3 [1Y]
## # Key:       Country [2]
## # Groups:    @ Year [3]
##    Year Country       GDPperCapita
##   <dbl> <fct>                <dbl>
## 1  2014 Monaco             185153.
## 2  2008 Monaco             180640.
## 3  2013 Liechtenstein      173528.
# Changing data type
gdpCap$Country <- factor(gdpCap$Country)

# Graph
ggplot(gdpCap, aes(x = as.factor(Year), y = GDPperCapita, fill = Country)) +
  geom_bar(stat = "identity") +
  labs(title = "GDP Per Capita over the Period of 1960 to 2017", x = "Year from 1960 to 2017", y = "GDP per Capita", fill = "Country") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  scale_x_discrete(limits = gdpCap$Year, breaks = 5)

It seems that Monaco has the highest GDP per Capita in 2014 at 185153. If we look at the yearly leaders for GDP per capita, we can see that initially it was Kuwait and United States in the early 1960/1970. Then Monaco seems to be the constant leader for majority of the time period with the more recent years having Luxembourg and Liechtenstein taking over for a couple years at the tail end.

Question 2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

data("aus_livestock")
data("vic_elec")
data("aus_production")

global_economy.

United States GDP from global_economy.

global_economy %>% filter(Country =='United States') %>%
  autoplot(GDP) +
  labs(title = "USA GDP", y = "USD")

A transformation doesn’t seem appropriate.

aus_livestock.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>% filter(Animal == 'Bulls, bullocks and steers') %>%
  filter(State == 'Victoria') %>%
  autoplot(Count) +
  labs(title = "Victorian Bulls, bullocks and Steers Slaughtered", y = 'Count')

A transformation doesn’t seem appropriate.

vic_elec

Victorian Electricity Demand from vic_elec.

vic_elec %>% autoplot(Demand) + 
  labs(title = 'Victory Electricity Demands', y = 'MWh')

electric <- vic_elec %>%
  mutate(Year = year(Date)) %>%
  index_by(Year) %>%
  summarise(Demand = sum(Demand))

electric %>% autoplot(Demand) + 
  labs(title = 'Yearly Victory Electricity Demands', y = 'MWh')

Since there was a lot of noise within the data set, I thought that transforming the electrical data to be yearly would help eliminate the noise as well as showing the constant trend of the decreasing demands of electricity.

aus_production

Gas production from aus_production.

aus_production %>% autoplot(Gas) +
  labs(title = 'Gas Production', y = 'petajoules')

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(title = paste("Transformed Gas Production with \u03BB =", round(lambda, 2)), y = 'petajoules')

I decided to do a Box-CoX transformation to help showcase the seasonal variation throughout the entire time frame. As we can also see there is uniform seasonality.

Question 3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>% autoplot(Volume) +
  labs(title = "Canadian Gas Data", y ="Volumne in Billions of Cubic Metres")

print(range(canadian_gas$Volume))
## [1]  0.9660 19.5284

The Box-Cox transformation is unhelpful since when we are looking at the range of data for the Volume of gas it is very small. We can see the range is between 0.966 and 19.5284.

Question 4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(17)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 
myseries %>% autoplot(Turnover) +
  labs(title = "Retail Data Turnover",
       y = "$AUD (Millions)")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>% autoplot(box_cox(Turnover, lambda))+
  labs(title = paste("Transformed Retail Turnover with \u03BB =", round(lambda, 2)))

With the Box-Cox transformation with \(\lambda\) = 0.14, we can see a more uniform seasonal variation. Box-Cox transformation was used since it uses a natural logarithm for exponential growth as well as using Guerrero found a good value of \(\lambda\) to help making the forecasting simpler.

Question 5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

data("pedestrian")
data("ansett")

aus_production

Tobacco from aus_production

aus_production %>% autoplot(Tobacco) +
  labs(title = "Original Tobacco Production", y = "Production in tonnes")
## Warning: Removed 24 rows containing missing values (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero)%>%
  pull(lambda_guerrero)

aus_production %>% autoplot(box_cox(Tobacco,lambda)) +
  labs(title = paste("Transformed Tobacco Production with \u03BB =", round(lambda, 2)))
## Warning: Removed 24 rows containing missing values (`geom_line()`).

For the Tobacco data we see that Box-Cox transformation we have a \(\lambda\) value of 0.93 which signifies that there was barely a transformation in data.

Ansett

Economy class passengers between Melbourne and Sydney from ansett

econ <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

autoplot(econ, Passengers)+
  labs(title = "Economy class Passengers Between Melbourne and Sydney")

lambda <- econ %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

econ %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(title = paste("Transformed Economy Class Passengers Count with \u03BB =", round(lambda, 2)))

In the Ansett data set, we have a Box-Cox transformation with a \(\lambda\) value of 2 indicating that the data is Squared in order to show variations better.

Pedestrian

Pedestrian counts at Southern Cross Station from pedestrian

pedestrian %>% filter(Sensor =='Southern Cross Station') %>% autoplot(Count)+
  labs(title = "Original Pedestrian Count")

weekly <- pedestrian %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

weekly %>% autoplot(Count)+
  labs(title = "Weekly Pedestrian Count")

lambda <- weekly %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

weekly %>% autoplot(box_cox(Count,lambda)) +
  labs(title = paste("Transformed Weekly Pedestrian Count with \u03BB =", round(lambda, 2)))

It seems that the Box-Cox Transformation does provide a power transformation since it has a \(\lambda\) value of 2. However the major transformation was grouping the data from an hourly pedestrian count to a weekly count since the total time frame is from Jan 2015 to Dec 2016. There was way too many data points within the row that had implied variance and a lot of noise.

Question 7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

A.

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>% autoplot(Gas) +
  labs(title = "Australia Gas Production", y = "Petajoules")

There seems to be seasonal fluctuation as when it approaches end Q4 of the year manufacturing production decreases to its trough and peaks during the mid year around end of Q2.

B.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>% model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical additive decomposition of total US retail employment")
## Warning: Removed 2 rows containing missing values (`geom_line()`).

C.

Do the results support the graphical interpretation from part a?

The results support part A. As seen on the seasonal line, we have its trough at the beginning of the year and peaks around the mid part of the year to then slowly descend as the year finishes.

D.

Compute and plot the seasonally adjusted data.

gas_season <- gas %>% model(classical_decomposition(Gas, type = "multiplicative"))  
components(gas_season) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "darkgray") +
  geom_line(aes(y=season_adjust), colour = "#3230B2") +
  labs(title = "Seasonally Adjusted Gas Production")

#### E. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas$Gas[gas$Gas == 171] <- gas$Gas[gas$Gas == 171] + 300

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "darkgray") +
  geom_line(aes(y=season_adjust), colour = "#3230B2") +
  labs(title = "Seasonally Adjusted Data with an Outlier")

The effects of the outlier was that there was a significant increase in both of the data as well as seasonally adjusted data, while impacting the trend of the overall data.

F.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas$Gas[gas$Gas == 471] <- gas$Gas[gas$Gas == 471] - 300
gas$Gas[gas$Gas == 245] <- gas$Gas[gas$Gas == 245] + 300

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "darkgray") +
  geom_line(aes(y=season_adjust), colour = "#3230B2") +
  labs(title = "Seasonally Adjusted Data with a Middle Outlier")

There doesn’t seem to be a difference where the outlier is the impact on the trend and data seems to be significant where it is placed.

Question 8:

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title = "Decomposition of total US retail employment using X-11.")

Looking at the irregular graph of the decomposition we notice a high noise around the beginning of the 1990 period as well as a bit of noise during the tail end of the 2000s. This can be explained as there was recessions during both of these time period. Besides that we can see general standard trend line as well as seasonality through the turnover.

Question 9

A. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

From figure 3.19 we can see that the trend is positive and increasing over time. From the seasonality trend we can see that there is 3 peaks per a year in hiring the civilian labor force, which seems to be around March, September and December. throughout the data we can see that there isn’t much noise until the end of the 1980’s through to around 1993. With the trough being around the end of 1990 and into 1991.

B. Is the recession of 1991/1992 visible in the estimated components?

Looking at figure 3.20 the seasonal component from the decomposition we can see that during the March to August period of the early 1990’s there is a sharp decrease. This can match up to figure 3.19 the overview of the STL decomposition where during the period of 1990 to 1991 we see a huge decrease in the remainder or noise column to help further the case of there being a recession.