3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.2.3

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.3     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.4.0

## Warning: package 'tibble' was built under R version 4.2.3

## Warning: package 'dplyr' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## Warning: package 'ggplot2' was built under R version 4.2.3

## Warning: package 'tsibble' was built under R version 4.2.3

## Warning: package 'tsibbledata' was built under R version 4.2.3

## Warning: package 'feasts' was built under R version 4.2.3

## Warning: package 'fabletools' was built under R version 4.2.3

## Warning: package 'fable' was built under R version 4.2.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(dplyr)
global_economy

## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows

global_economy %>%
  autoplot(GDP/Population, show.legend=F)

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

global_economy2 <- global_economy
global_economy2 %>%
  group_by(Country) %>%
  mutate(GDP_Capita = GDP/Population) %>%
  arrange(desc(GDP_Capita))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tsibble: 15,150 x 10 [1Y]
## # Key:       Country [263]
## # Groups:    Country [263]
##    Country Code   Year    GDP Growth   CPI Imports Exports Population GDP_Capita
##    <fct>   <fct> <dbl>  <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>      <dbl>
##  1 Monaco  MCO    2014 7.06e9  7.18     NA      NA      NA      38132    185153.
##  2 Monaco  MCO    2008 6.48e9  0.732    NA      NA      NA      35853    180640.
##  3 Liecht… LIE    2014 6.66e9 NA        NA      NA      NA      37127    179308.
##  4 Liecht… LIE    2013 6.39e9 NA        NA      NA      NA      36834    173528.
##  5 Monaco  MCO    2013 6.55e9  9.57     NA      NA      NA      37971    172589.
##  6 Monaco  MCO    2016 6.47e9  3.21     NA      NA      NA      38499    168011.
##  7 Liecht… LIE    2015 6.27e9 NA        NA      NA      NA      37403    167591.
##  8 Monaco  MCO    2007 5.87e9 14.4      NA      NA      NA      35111    167125.
##  9 Liecht… LIE    2016 6.21e9 NA        NA      NA      NA      37666    164993.
## 10 Monaco  MCO    2015 6.26e9  4.94     NA      NA      NA      38307    163369.
## # ℹ 15,140 more rows

Monaco has the highest GDP per capita which occurs in 2014. In general, the overall trend for GDP per capita is increasing over time. From the graph, due to the amount of countries, it’s difficult to determine which line corresponds to which without the legend.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP)

No transformations seem necessary. Appears to follow a normal trend, with little variation. However, 3.1 of the text states “Any data that are affected by population changes can be adjusted to give per-capita data.” Therefore, we can apply that population transformation, which appears almost the same.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP/Population)

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter(State == "Victoria"& Animal == "Bulls, bullocks and steers") %>%
  autoplot(Count)

A lot of seasonal variation. Will apply box-cox transformations for normality.

lambda <- aus_livestock %>%
  filter(State == "Victoria"& Animal == "Bulls, bullocks and steers") %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
aus_livestock %>%
  filter(State == "Victoria"& Animal == "Bulls, bullocks and steers") %>%
  autoplot(box_cox(Count, lambda))

Looking closely you can see changes in variations of the plot. With a lambda value close to 0, it appears the power transformation performed was a log transform.

Victorian Electricity Demand from vic_elec.

vic_elec %>% 
  autoplot(Demand)

Appears to have trends, cycles and seasonality, however difficult to read with the data being captured every 30 minutes. Let’s try to group by month based on the Date field.

#library(lubridate)
#vic_elec %>% 
#  mutate(M_Y= format_ISO8601(Date,precision="ym")) %>%
#  index_by(M_Y) %>%
#  summarise(Demand=sum(Demand)) %>%
#  autoplot(Demand)
##Getting errors

Attempted to group by month and trend, did not work, as intended.

Gas production from aus_production.

aus_production %>%
  autoplot(Gas)

Applying another box-cox transformation:

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda))

This aims to make the size of the seasonal variation the same across the entire series, which would be important for modeling.

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume)

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda))

The plots look almost identical and the seasonality does not appear normalized by the peaks and dips in the chart.

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

#code from 2.10 exercise 7
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

This returns a lambda of 0.271, which based on section 3.1 is closer to 1/2 than 0, therefore probably a square root plus linear transformation (lambda = 0.5)

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

#tobacco from aus_production
lambda_a <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

#Economy class passengers between Melourne and Sydney from ansett
lambda_b <- ansett %>%
  filter(Airports == 'MEL-SYD' & Class == 'Economy') %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

#Pedestrian counts at Southern Cross Station from pedestrian
lambda_c <- pedestrian %>%
  filter(Sensor == 'Southern Cross Station') %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

-Tobacco from aus_production = 0.92 -#Economy class passengers between Melbourne and Sydney from ansett = 2 -Pedestrian counts at Southern Cross Station from pedestrian = -0.25

3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot(Gas)

Upward trend year over year with seasonal highs in the summers and lows in the winters.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical decomposition with type = multiplicative")

## Warning: Removed 2 rows containing missing values (`geom_line()`).

### Do the results support the graphical interpretation from part a? Yes, when decomposed, you can clearly see the upward overall trend and seasonal component as in part a.

Compute and plot the seasonally adjusted data.

gas_decomp <- gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components()
  

#section 3.5
gas_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Gas in Australia") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values (`geom_line()`).

Not 100% sure how to compute. The code above was leveraged from section 3.5

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_mod <- gas
gas_mod$Gas[3] <- gas_mod$Gas[3] + 300
gas_mod_dc <- gas_mod %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components()

gas_mod_dc %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas",
       title = "Gas in Australia") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values (`geom_line()`).

You can see the outlier in Q1 2006. This skews the seasonality and takes on a downward trend.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Adding the outlier at the end should keep the trend since it is upward, however, it should still impact the seasonality as it would not follow the normal patterns shown. Any outlier would have to be addressed and normalized for modeling.

3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

#3.5
library(seasonal)

## Warning: package 'seasonal' was built under R version 4.2.3

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

x11_dcmp <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australia retail using X-11.")

The overall trend of turnover is increasing while overall seasonality is decreasing over time.

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

There is an overall positive trend in the labor force. There is more of a pickup between 1987 and 1992. However, the seasonal pattern is pretty consistent, but I do notice slightly larger peaks and low points between 1990-95. However, due to the overall increase in the workforce, the total trend is not significantly impacted due to this.

Is the recession of 1991/1992 visible in the estimated components?

The recession is visible in the remainder component but not in the trend or seasonal components. Per 3.6 STL is “robust to outliers, so that occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components. They will, however, affect the remainder component.

BrianSingh_Data624_HW2

2024-02-14