DATA 624: Homework 2

Exercise 3.1

First, let’s take a look at our global_economy dataset:

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

First, we’ll need to calculate the GDP per capita by time period

global_economy <- global_economy %>% mutate(gdp_per_capita = GDP / Population)

# Plot GDP per capita per country over time
autoplot(global_economy, gdp_per_capita)

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

Now let’s find the countrieswith the highest GDP per capita. Looking at our sorted result, it looks like micro-nations like Monaco, Liechenstein, and Luxembourg top the list.

# Find GDP per Capita by Country, then sort on result
highest_gdps <- global_economy %>%
              group_by(Country) %>%
              arrange(desc(gdp_per_capita))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

highest_gdps

## # A tsibble: 15,150 x 10 [1Y]
## # Key:       Country [263]
## # Groups:    Country [263]
##    Country       Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Monaco        MCO    2014 7060236168.  7.18     NA      NA      NA      38132
##  2 Monaco        MCO    2008 6476490406.  0.732    NA      NA      NA      35853
##  3 Liechtenstein LIE    2014 6657170923. NA        NA      NA      NA      37127
##  4 Liechtenstein LIE    2013 6391735894. NA        NA      NA      NA      36834
##  5 Monaco        MCO    2013 6553372278.  9.57     NA      NA      NA      37971
##  6 Monaco        MCO    2016 6468252212.  3.21     NA      NA      NA      38499
##  7 Liechtenstein LIE    2015 6268391521. NA        NA      NA      NA      37403
##  8 Monaco        MCO    2007 5867916781. 14.4      NA      NA      NA      35111
##  9 Liechtenstein LIE    2016 6214633651. NA        NA      NA      NA      37666
## 10 Monaco        MCO    2015 6258178995.  4.94     NA      NA      NA      38307
## # ℹ 15,140 more rows
## # ℹ 1 more variable: gdp_per_capita <dbl>

Not we can plot our results, let’s look at Monaco’s GDP per capita over time. We see a strong upward trend since the start of our dataset (1970)

monaco <- global_economy %>%
                filter(Country == "Monaco")

# Plot results for Monaco
autoplot(monaco) + labs(x="Year", y="GDP per Capita (USD)", title="Monaco GDP per Capita over Time")

## Plot variable not specified, automatically selected `.vars = GDP`

## Warning: Removed 11 rows containing missing values (`geom_line()`).

Exercise 3.2

First, plotting the USA GDP over time. There’s not much seasonal variation in this dataset, so a transformation isn’t likely necessary.

usa <- global_economy %>% filter(Country == "United States")

autoplot(usa)

## Plot variable not specified, automatically selected `.vars = GDP`

Next, we can plot from aus_livestock

bulls <- aus_livestock %>% filter (Animal == "Bulls, bullocks and steers" & State=="Victoria")

autoplot(bulls, Count)

We see much more seasonal variation from this time series unlike before. We can us e a classic decomposition method from our text to break out the seasonal, trend, and cyclic components of this time series.

bulls %>% 
  model(
    classical_decomposition(Count, type = "additive")
  ) %>% 
  components() %>%
  autoplot() +
  labs(title = "Bulls, bullock and steer slaughters in Australia")

## Warning: Removed 6 rows containing missing values (`geom_line()`).

We can now

autoplot(vic_elec)

## Plot variable not specified, automatically selected `.vars = Demand`

Let’s transform the australian gas data using the X ARIMA method used in the text

autoplot(aus_production, Gas)

aus_x11_dcmp <- aus_production %>%
  model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>%
  components()
autoplot(aus_x11_dcmp) +
  labs(title = "X-11 Decomposition of total Australian Gas Production")

We can see the different time series components as a result of the X-11 decomposition.

We can use the SEATS method to decompose this dataset

# Use SEATS decomposition on Australian gas data
aus_gas_seats <- aus_production %>%
  model(seats = X_13ARIMA_SEATS(Gas ~ seats())) %>%
  components()
autoplot(aus_gas_seats) +
  labs(title = "Decomposition of total Australian gas production")

Exercise 3.3

First, let’s look at the canadian_gas time series data with a basic autoplot

autoplot(canadian_gas)

## Plot variable not specified, automatically selected `.vars = Volume`

Let’s run and plot a Box-Cox transform to see if this transformation is appropriate for this data.

lambda <- canadian_gas %>%
  features(Volume, features=guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda))

In this case, the Box-cox transformation didn’t change much with a returned value of \(\lambda=0.5767...\). Additionally, the seasonal variation within this isn’t constant over our timeframe, indicating our selected value of \(\lambda\) may not be optimal.

Exercise 3.4

# Let's pick a single series value
retail <- aus_retail %>%
  filter(`Series ID` == "A3349849A")

autoplot(retail)

## Plot variable not specified, automatically selected `.vars = Turnover`

Now we can apply Box-Cox transform to this dataset, we can try this with varying values of lambda to see which

lambdas = seq(0,5, 0.5)

# Try BC transform for varying values of lambda, then plot
for (l in lambdas){
  title <- glue("Box-Cox Transformation with lambda = {l}")
  p <- retail %>% autoplot(box_cox(Turnover, l)) + labs(x="Month", y="Monthly Turnover", title=title)
  print(p)
}

From our plots, we see a lower value of \(\lambda\) is a better transform for our data, as towards the higher end we see more variance in our seasonal variations (a “wider” cycle). We can verify this using the Guerrero feature listed in the book.

# Find ideal lambda for retail data
lambda <-retail %>%
  features(Turnover, features=guerrero) %>%
  pull(lambda_guerrero)
print(lambda)

## [1] 0.5054374

Exercise 3.5

# Running Box-Cox on Tobacco series from aus production
tobacco_lambda <- aus_production %>% features(Tobacco, features=guerrero) %>%
  pull(lambda_guerrero)

aus_production %>% autoplot(box_cox(Tobacco, tobacco_lambda)) + labs(x="Week", y="Tobacco Production (tonnes)")

## Warning: Removed 24 rows containing missing values (`geom_line()`).

Now let’s plot the Economy passengers series transformed. We’ll use the same Box-Cox transformation method used above. For this data, we see some large dips in 1989 and 1993

# Filter and select down to economy passnegers between Melbourne and Sydney
economy <- ansett %>% filter(Class == "Economy" & Airports == "MEL-SYD") %>% select(Passengers)

economy_lambda <- economy %>% features(Passengers, features=guerrero) %>%
  pull(lambda_guerrero)
print(economy_lambda)

## [1] 1.999927

# Plot the box cox transform!
economy %>% autoplot(box_cox(Passengers, economy_lambda))

Now we can run a Box-Cox transformation against the pedestrian dtaa at Southern Cross Station

scs <- pedestrian %>% filter(Sensor == "Southern Cross Station")

scs_lambda <- scs %>% features(Count, features=guerrero) %>%
  pull(lambda_guerrero)

print(scs_lambda)

## [1] -0.2501616

# Plot the box cox transform!
scs %>% autoplot(box_cox(Count, scs_lambda))

Exercise 3.7

# time series def from the book
gas <- tail(tsibbledata::aus_production, 5*4) %>% select(Gas)

autoplot(gas, Gas)

There are quarterly seasonal fluctuations in this data, as well as a general upward trend throughout the timespan of this data (5 years). We can run a classical multiplicative decomposition on this

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components() |>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Australian gas production (2006 - 2010)")

## Warning: Removed 2 rows containing missing values (`geom_line()`).

From our classical decomposition we see an upward trend in gas production over the past 5 years and quarterly cycles. This is in line with our observations above. We can take a 5-MA moving average to adjust this data for seasonal cycles

# Run X-11 to make seasonal adjustment
gas_smoothed <- gas %>%
  model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>% components()

# plot seasonally adjusted data
gas_smoothed |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = season_adjust))

In our seasonally-adjusted plot we see a clearer picture of the overall trend in our data for quarterly gas production

Now we can add an outlier point to our data and recompute the seasonal adjustment:

# Add 300 to a quarter gas value
gas_outlier <- gas %>% mutate(Gas = ifelse(`Quarter` == yearquarter("2005 Q3"), 471, Gas))

# Recompute saesonal adjustment (X-11 again) and plot
gas_smoothed <- gas_outlier %>%
  model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>% components()

# plot seasonally adjusted data
gas_smoothed |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted (with Outlier)"))

We see a huge spike on our outlier value for Q3 2005. This is close to the end of our series, let’s try one in the middle of our series (Q1 2008)

# Add 300 to a quarter gas value
gas_outlier <- gas %>% mutate(Gas = ifelse(`Quarter` == yearquarter("2008 Q1"), 471, Gas))

# Recompute seasonal adjustment (X-11 again) and plot
gas_smoothed <- gas_outlier %>%
  model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) %>% components()

# plot seasonally adjusted data
gas_smoothed |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted (with Outlier)"))

Here we see a huge spike in the middle, which makes it difficult to get a sense of scale with our trend cycle.

Exercise 3.8

First, let’s visualize the raw data present in aus_retail

autoplot(aus_retail)

## Plot variable not specified, automatically selected `.vars = Turnover`

Let’s first select a single series, using the method from Chapter 2 of the text

# Select single series ID
myseries <- aus_retail %>%
  filter(`Series ID` == "A3349849A")

Now let’s decompose the series using the X-11 method, again, we’ll mirror the methods used in the text using the feasts library and it’s X-13 ARIMA SEATS method for seasonal adjustment:

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) + labs(title ="Decomposition of total Australian retail turnover using X-11.")

Exercise 3.9

The decomposition in these charts shows a steady upwards trend in the data, with a seasonality of a month. There’s some outlier periods, which is emphasized by the remainder plot around 1991 - 1992. The seasonal component plot shows a wide variance in the level of the time series by month, with peaks in springtime (March) and December, and some strong dips. There’s also less of a lag pattern we can see from the

The recession is visible from the remainder plot below. This makes sense as a large recession would generally not be considered a seasonal event and would break the seasonal and trend cycles within this time series.