Homework2_Steve

Import packages

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(fma)
library(fpp2)

## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ expsmooth 2.3

library(tsibble)

## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(tsibbledata)
library(dplyr)
library(fpp3)

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ feasts     0.3.1     ✔ fabletools 0.4.0
## ✔ fable      0.3.3     
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
## 
## Attaching package: 'fpp3'
## 
## The following object is masked from 'package:fpp2':
## 
##     insurance

library(seasonal)

## 
## Attaching package: 'seasonal'
## 
## The following object is masked from 'package:tibble':
## 
##     view

Question 1

global_economy %>%
  autoplot(GDP/Population, show.legend = FALSE) +
  labs(title = "GDP per capita")

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

global_economy_capita <- global_economy %>% mutate(gdp_capita = GDP/Population)

filtered_economy <- global_economy %>% 
  mutate(gdp_per_capita = GDP/Population) %>% filter(gdp_per_capita > 100000)
filtered_economy %>% arrange(desc(gdp_per_capita))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tsibble: 42 x 10 [1Y]
## # Key:       Country [4]
##    Country       Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Monaco        MCO    2014 7060236168.  7.18     NA      NA      NA      38132
##  2 Monaco        MCO    2008 6476490406.  0.732    NA      NA      NA      35853
##  3 Liechtenstein LIE    2014 6657170923. NA        NA      NA      NA      37127
##  4 Liechtenstein LIE    2013 6391735894. NA        NA      NA      NA      36834
##  5 Monaco        MCO    2013 6553372278.  9.57     NA      NA      NA      37971
##  6 Monaco        MCO    2016 6468252212.  3.21     NA      NA      NA      38499
##  7 Liechtenstein LIE    2015 6268391521. NA        NA      NA      NA      37403
##  8 Monaco        MCO    2007 5867916781. 14.4      NA      NA      NA      35111
##  9 Liechtenstein LIE    2016 6214633651. NA        NA      NA      NA      37666
## 10 Monaco        MCO    2015 6258178995.  4.94     NA      NA      NA      38307
## # ℹ 32 more rows
## # ℹ 1 more variable: gdp_per_capita <dbl>

We can see that Monaco has the highest gdp per capita followed by liechtenstein. These two have had the highest GDP per capita since about 1990.

Question 2

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title= "GDP", y = "$US", x= "Year")

No transformations are needed for US Data

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers") %>%
  filter(State == "Victoria") %>%
  autoplot(Count) +
  labs(title = "Bull, Bullocks and Steers Counts in Victoria")

No transformation was needed for this plot

vic_elec %>%
  autoplot(Demand) + 
  labs(title = "Victorian Electricity Demand")

vic_elec_transformed <- vic_elec %>%
  mutate(week = yearweek(Time))%>%
  index_by(week) %>%
  summarize(Demand = sum(Demand))
vic_elec_transformed[3:(nrow(vic_elec_transformed)-1),] %>%
  autoplot(Demand) +
  labs(title =  "Weekly Victorian Electricity Demand")

Because the starting data is very noisy, the data was transformed to be weekly. This data is much easier to read than initially.

aus_production %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs("Transformed Australian Gas Production")

lambda

## [1] 0.1095171

This plot was modified so that the seasonal variance was equalized, using a box-cox transformation. It is now easier to see the upward trend occuring. The lambda value is about 0.11.

Question 3

canadian_gas %>% 
  autoplot(Volume) + 
  labs(title = "Gas Production without Transformation")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) + 
  labs(title = "Gas Production with Box-Cox Transformation")

lambda

## [1] 0.5767648

The Box-Cox transformation doesn’t help, as the seasonal variation isn’t equalized accross the plot. This is because the seasonal variation from 1995 to 2005 is too different from the variation displayed from 1975 to 1990.

Question 4

set.seed(12384)
retail_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
retail_series %>%
  autoplot(Turnover) +
  labs(title = "Retail Data without Transformation")

lambda_retail <- retail_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

retail_series %>%
  autoplot(box_cox(Turnover, lambda_retail)) +
  labs(y = "Turnover", title= "Retail Data with Transformation")

print(lambda_retail)

## [1] 0.1946443

The value of lambda would be 0.19 to make the seasonal variation as uniform as possible.

Question 5

aus_production %>%
  autoplot(Tobacco) + 
  labs(title = "Australian Tobacco Production without Transformations")

## Warning: Removed 24 rows containing missing values (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(title = "Australian Tobacco Production with Transformations")

## Warning: Removed 24 rows containing missing values (`geom_line()`).

print(lambda)

## [1] 0.9264636

To make seasonal variance more uniform, lambda was set to 0.93. Because lambda is close to 1, we can tell that seasonal variance was already relatively uniform

ansett %>%
  filter(Airports == "MEL-SYD") %>%
  filter(Class == "Economy") %>%
  autoplot(Passengers) +
  labs(title = "Untransformed Passenger Data")

lambda <- ansett %>%
  filter(Airports == "MEL-SYD") %>%
  filter(Class == "Economy") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
ansett %>%
  filter(Airports == "MEL-SYD") %>%
  filter(Class == "Economy") %>%
  autoplot(box_cox(Passengers, lambda)) + 
  labs(title = "Transformed Passenger Data")

print(lambda)

## [1] 1.999927

The lambda value that stabilized the variance the most was found to be about 2

pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(Count) +
  labs(title = "Untransformed Pedestrian Counts")

lambda <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count, lambda)) + 
  labs(title = "Transformed Pedestrian Counts")

print(lambda)

## [1] -0.2501616

To create uniform seasonal variance, the optimal lambda was found to be -0.25

Question 7

gas <- tail(aus_production, 5*4) %>%
  select(Gas)
autoplot(gas) +
  labs(title = "Australian Gas Production")

## Plot variable not specified, automatically selected `.vars = Gas`

We can see that there is certainly a seasonal componen to the data, as well as an upward trend in the data.

gas_model <- gas %>%
  model(classical_decomposition( type = "multiplicative")) %>%
  components()

## Model not specified, defaulting to automatic modelling of the `Gas` variable.
## Override this using the model formula.

gas %>%
  model(classical_decomposition( type = "multiplicative")) %>%
  components() %>%
  autoplot()

## Model not specified, defaulting to automatic modelling of the `Gas` variable.
## Override this using the model formula.

## Warning: Removed 2 rows containing missing values (`geom_line()`).

The results confirm that there is a seasonal element to the data as well as an upward ongoing trend.

gas_model %>%
  as_tsibble() %>%
  autoplot(Gas) +
  geom_line(aes(y = season_adjust), color = "Blue") +
  labs(title = "Adjusted Gas Production")

gas_outlier <- gas
gas_outlier[4,1] <- 600
gas_model_outlier <- gas_outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
gas_model_outlier %>%
  as_tsibble() %>%
  autoplot(Gas) + 
  geom_line(aes(y = season_adjust), color = "Blue") +
  labs(title = "Trend Data with Early Outlier")

Putting an outlier in the beginning of the data has a significant impact on the seasonally adjusted data. The adjusted data still contains the seasonal component to it.

gas_outlier[4,1] <- 224
gas_outlier[20,1] <- 600
gas_model_outlier <- gas_outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
gas_model_outlier %>%
  as_tsibble() %>%
  autoplot(Gas) +
  geom_line(aes(y = season_adjust), color = "Blue") +
  labs(title = "Trend Data with late outlier")

Putting the outlier at the end of the data doesn’t impact the seasonally adjusted data until the outlier is reached.

Question 8

retail_series %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

retail_x11 <- retail_series %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(retail_x11) +
  labs(title = "Retail Data using X-11")

One feature that is revealed is that the seasonal variance is less from 1995 to 2010 and higher in the years before and after that period. One thing that was made evident on the trend is that It was fairly linear, even though there are many peaks and valleys on the way upwards.

Question 9

A.) Overall, the trend in the data is linearly upwards until 1991 where the trend remains linear but the slope decreases. The scale of trend and value are about the same, as you’d expect. The scale of seasonality is much wider than the remainder, which implies that seasonality is important to the data. The scale of the remainder would be much smaller if not for the data from 1991 to 1992

B.) Yes, the recession is visible in the remainder data. We can also determine that the recession happens at this time because of the decrease in the slope of the trend plot at this point.

Homework2_Steve_Phillips

Steve Phillips

2024-02-11

Import packages

Question 1

We can see that Monaco has the highest gdp per capita followed by liechtenstein. These two have had the highest GDP per capita since about 1990.

Question 2

No transformations are needed for US Data

No transformation was needed for this plot

Because the starting data is very noisy, the data was transformed to be weekly. This data is much easier to read than initially.

This plot was modified so that the seasonal variance was equalized, using a box-cox transformation. It is now easier to see the upward trend occuring. The lambda value is about 0.11.

Question 3

The Box-Cox transformation doesn’t help, as the seasonal variation isn’t equalized accross the plot. This is because the seasonal variation from 1995 to 2005 is too different from the variation displayed from 1975 to 1990.

Question 4

The value of lambda would be 0.19 to make the seasonal variation as uniform as possible.

Question 5

To make seasonal variance more uniform, lambda was set to 0.93. Because lambda is close to 1, we can tell that seasonal variance was already relatively uniform

The lambda value that stabilized the variance the most was found to be about 2

Question 7

We can see that there is certainly a seasonal componen to the data, as well as an upward trend in the data.

The results confirm that there is a seasonal element to the data as well as an upward ongoing trend.

Putting an outlier in the beginning of the data has a significant impact on the seasonally adjusted data. The adjusted data still contains the seasonal component to it.

Putting the outlier at the end of the data doesn’t impact the seasonally adjusted data until the outlier is reached.

Question 8

One feature that is revealed is that the seasonal variance is less from 1995 to 2010 and higher in the years before and after that period. One thing that was made evident on the trend is that It was fairly linear, even though there are many peaks and valleys on the way upwards.

Question 9

B.) Yes, the recession is visible in the remainder data. We can also determine that the recession happens at this time because of the decrease in the slope of the trend plot at this point.