library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.2
## Registered S3 method overwritten by 'tsibble':
##   method               from
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tsibble 1.1.6 ✔ feasts 0.4.1
## ✔ tsibbledata 0.4.1 ✔ fable 0.4.1
## Warning: package 'tsibble' was built under R version 4.4.2
## Warning: package 'tsibbledata' was built under R version 4.4.2
## Warning: package 'feasts' was built under R version 4.4.2
## Warning: package 'fabletools' was built under R version 4.4.2
## Warning: package 'fable' was built under R version 4.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tsibble)
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.4.2
##
## Attaching package: 'seasonal'
##
## The following object is masked from 'package:tibble':
##
## view
library(slider)
## Warning: package 'slider' was built under R version 4.4.2
# Introduction
In this homework assignment I will be submitting exercises 3.1 through 3.9 from Hyndman and Athanasopoulos's online Forecasting: Principles and Practice textbook.
# Exercise 3.1
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
# Viewing the global_economy documentation
# help("global_economy")
# Computing GDP per capita and selecting that column (the tsibble index and key are kept automatically)
GDP <- global_economy %>%
  mutate(gdp_per_capita = GDP / Population) %>%
  select(gdp_per_capita)
# Viewing the data
head(GDP)
## # A tsibble: 6 x 3 [1Y]
## # Key: Country [1]
## gdp_per_capita Year Country
## <dbl> <dbl> <fct>
## 1 59.8 1960 Afghanistan
## 2 59.9 1961 Afghanistan
## 3 58.5 1962 Afghanistan
## 4 78.8 1963 Afghanistan
## 5 82.2 1964 Afghanistan
## 6 101. 1965 Afghanistan
# Removing the legend to see the plot of GDP per capita for each country over time
autoplot(GDP, gdp_per_capita, show.legend = FALSE)
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Finding the country with the highest GDP per capita in the most recent years of the data
# The three most recent years in the dataset (2015, 2016 and 2017)
recent_years <- unique(GDP$Year) %>% tail(3)
# Country with the highest GDP per capita in 2017
GDP %>%
  filter(Year == 2017) %>%
  arrange(desc(gdp_per_capita))
## # A tsibble: 262 x 3 [1Y]
## # Key: Country [262]
## gdp_per_capita Year Country
## <dbl> <dbl> <fct>
## 1 104103. 2017 Luxembourg
## 2 80893. 2017 Macao SAR, China
## 3 80190. 2017 Switzerland
## 4 75505. 2017 Norway
## 5 70057. 2017 Iceland
## 6 69331. 2017 Ireland
## 7 63249. 2017 Qatar
## 8 59532. 2017 United States
## 9 58070. 2017 North America
## 10 57714. 2017 Singapore
## # ℹ 252 more rows
# Country with the highest GDP per capita in 2016
GDP %>%
  filter(Year == 2016) %>%
  arrange(desc(gdp_per_capita))
## # A tsibble: 262 x 3 [1Y]
## # Key: Country [262]
## gdp_per_capita Year Country
## <dbl> <dbl> <fct>
## 1 168011. 2016 Monaco
## 2 164993. 2016 Liechtenstein
## 3 100739. 2016 Luxembourg
## 4 79866. 2016 Switzerland
## 5 78730. 2016 Isle of Man
## 6 74017. 2016 Macao SAR, China
## 7 70890. 2016 Norway
## 8 64100. 2016 Ireland
## 9 60530. 2016 Iceland
## 10 59044. 2016 Qatar
## # ℹ 252 more rows
# Country with the highest GDP per capita in 2015
GDP %>%
  filter(Year == 2015) %>%
  arrange(desc(gdp_per_capita))
## # A tsibble: 262 x 3 [1Y]
## # Key: Country [262]
## gdp_per_capita Year Country
## <dbl> <dbl> <fct>
## 1 167591. 2015 Liechtenstein
## 2 163369. 2015 Monaco
## 3 101447. 2015 Luxembourg
## 4 82016. 2015 Switzerland
## 5 81672. 2015 Isle of Man
## 6 75484. 2015 Macao SAR, China
## 7 74498. 2015 Norway
## 8 65177. 2015 Qatar
## 9 61808. 2015 Ireland
## 10 56561. 2015 Australia
## # ℹ 252 more rows
In the most recent year available (2017), Luxembourg had the highest GDP per capita, at roughly $104,103 USD. It ranked third in both 2016 and 2015, so its GDP per capita has been consistently high, at least in recent history. Macao SAR, China climbed from sixth place in 2015 and 2016 to second in 2017, whereas Monaco, previously among the strongest countries by GDP per capita, no longer appears in the top ten as of 2017. The three per-year queries above can also be collapsed into a single grouped query, as sketched below.
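A minimal sketch of that collapsed query, using the recent_years vector defined earlier; converting to a tibble first keeps dplyr's grouped slice_max() from being constrained by the tsibble index:
# Top GDP-per-capita entry for each of the three most recent years at once
GDP %>%
  as_tibble() %>%
  filter(Year %in% recent_years) %>%
  group_by(Year) %>%
  slice_max(gdp_per_capita, n = 1) %>%
  ungroup()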
# Exercise 3.2
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
- United States GDP from global_economy.
- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
- Victorian Electricity Demand from vic_elec.
- Gas production from aus_production.
# Loading the US GDP data and adding GDP per capita as a population adjustment
us.gdp <- global_economy %>%
  mutate(gdp_per_capita = GDP / Population) %>%
  select(GDP, gdp_per_capita) %>%
  filter(Country == "United States")
head(us.gdp)
## # A tsibble: 6 x 4 [1Y]
## # Key: Country [1]
## GDP gdp_per_capita Year Country
## <dbl> <dbl> <dbl> <fct>
## 1 543300000000 3007. 1960 United States
## 2 563300000000 3067. 1961 United States
## 3 605100000000 3244. 1962 United States
## 4 638600000000 3375. 1963 United States
## 5 685800000000 3574. 1964 United States
## 6 743700000000 3828. 1965 United States
us.gdp %>% autoplot(GDP)
us.gdp %>% autoplot(gdp_per_capita)
Adjusting the US GDP data for population did not have a major impact on the visualization: both series show a similar strong upward trend. The comparison of the adjusted to the unadjusted data is still important, for this reason: if there were a material difference between the two plots, we could hypothesize different explanations and research them further. For example, if GDP per capita had a flat or decreasing trend while total GDP had a strong positive trend, we could explore the possibility of an increasing, younger population, or of greater returns for the same production due to political interplay, etc.
#help("aus_livestock")
# Loading in tsibble of bulls bullocks and steers slaughtered in Victoria
aus_livestock <- aus_livestock
livestock <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers",
State == "Victoria")
head(livestock)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Victoria 109200
## 2 1976 Aug Bulls, bullocks and steers Victoria 94700
## 3 1976 Sep Bulls, bullocks and steers Victoria 95500
## 4 1976 Oct Bulls, bullocks and steers Victoria 94800
## 5 1976 Nov Bulls, bullocks and steers Victoria 94100
## 6 1976 Dec Bulls, bullocks and steers Victoria 98300
livestock %>%
  autoplot(Count)
No adjustment was made to the plot of the slaughter of bulls, bullocks and steers. However, a calendar adjustment could still be useful: dividing each monthly count by the number of days in that month gives an average slaughter count per day, removing the variation caused purely by months having different lengths. A sketch follows.
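A minimal sketch of that calendar adjustment, assuming lubridate's days_in_month() applied to the monthly index after conversion with as.Date():
# Dividing each monthly count by the length of that month gives an
# average daily slaughter count, removing month-length effects
livestock %>%
  mutate(avg_per_day = Count / days_in_month(as.Date(Month))) %>%
  autoplot(avg_per_day)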
#help("vic_elec")
vic_elec %>%
autoplot(Demand)
No transformation was applied to the demand plot from vic_elec. An aggregation of the half-hourly demand into daily demand may be appropriate depending on the direction of the analysis, but that aggregation would hide any within-day seasonality, which is only visible at sub-daily frequency. A sketch of the aggregation follows.
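A minimal sketch of that aggregation, using tsibble's index_by() with lubridate's as_date() to collapse the half-hourly index to days:
# Summing half-hourly demand into daily totals
vic_elec %>%
  index_by(Day = as_date(Time)) %>%
  summarise(Demand = sum(Demand)) %>%
  autoplot(Demand)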
# help("aus_production")
aus_production %>%
  autoplot(Gas)
# Applying a Box-Cox transformation because the variation of the series changes with its level
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda, 2))))
This transformation stabilizes the variance over time, with the Guerrero feature selecting lambda = 0.11.
# Exercise 3.3
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
# Viewing the canadian_gas data over time
autoplot(canadian_gas, Volume)
The gas production data shows a variance that increases and then decreases over time, rather than growing steadily with the level of the series. A Box-Cox transformation is a good choice when the size of the variation changes proportionally with the level; here, standardizing the variance would essentially hide its rise and fall, which may itself be a feature of the data worth exploring. A quick check follows.
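A quick sketch to illustrate: even after transforming with the Guerrero-chosen lambda, the variance still rises and falls through the middle of the series, so the transformation buys little.
# Box-Cox with the Guerrero-selected lambda still leaves uneven variance
lambda_cg <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda_cg))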
# Exercise 3.4
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(123678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
# help("aus_retail")
myseries %>% autoplot(Turnover)
The plot of the retail data above shows variance increasing with the level of the series, so I will apply a Box-Cox transformation with lambda chosen by the Guerrero method to stabilize the variance.
lambda_retail <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>%
  autoplot(box_cox(Turnover, lambda_retail)) +
  labs(title = latex2exp::TeX(paste0(
    "Box-Cox Transformation of Australian Retail Turnover with $\\lambda$ = ",
    round(lambda_retail, 2))))
The positive lambda reduces the right skewness of the data; a value near 0.5 corresponds roughly to a square-root transformation.
# Exercise 3.5
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
- Tobacco from aus_production.
- Economy class passengers between Melbourne and Sydney from ansett.
- Pedestrian counts at Southern Cross Station from pedestrian.
aus_production %>%
  autoplot(Tobacco) +
  labs(title = "Tobacco")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda_tobacco <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda_tobacco)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco with $\\lambda$ = ",
         round(lambda_tobacco, 2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
The Box-Cox transformation here uses lambda = 0.93; a lambda this close to 1 changes the shape of the series very little.
ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  mutate(Passengers = Passengers / 1000) %>%
  autoplot(Passengers)
lambda_class <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD") %>%
  mutate(Passengers = Passengers / 1000) %>%
  autoplot(box_cox(Passengers, lambda = lambda_class)) +
  labs(y = "Passengers ('000)",
       title = latex2exp::TeX(paste0(
         "Transformed Ansett Airlines Economy Class with $\\lambda$ = ",
         round(lambda_class, 2))))
The Box-Cox transformation here uses lambda = 2.0, the upper limit of the Guerrero search range, corresponding to a squaring of the data.
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(Count) +
  labs(title = "Pedestrian Counts at Southern Cross Station")
lambda_count <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count, lambda_count)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrian Counts at Southern Cross Station with $\\lambda$ = ",
         round(lambda_count, 2))))
The Box-Cox transformation here uses lambda = -0.23, which compresses the large peaks in the hourly counts.
# Exercise 3.6
Show that a 3 × 5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
# Showing the results of a 3x5-MA with the beer data from aus_production
beer <- aus_production |>
  filter(year(Quarter) >= 1992) |>
  select(Quarter, Beer)
beer_ma <- beer |>
  mutate(
    `5-MA` = slider::slide_dbl(Beer, mean,
                               .before = 2, .after = 2, .complete = TRUE),
    `3x5-MA` = slider::slide_dbl(`5-MA`, mean,
                                 .before = 1, .after = 1, .complete = TRUE)
  )
head(beer_ma,7)
## # A tsibble: 7 x 4 [1Q]
## Quarter Beer `5-MA` `3x5-MA`
## <qtr> <dbl> <dbl> <dbl>
## 1 1992 Q1 443 NA NA
## 2 1992 Q2 410 NA NA
## 3 1992 Q3 420 448. NA
## 4 1992 Q4 532 443. 445.
## 5 1993 Q1 433 443. 449.
## 6 1993 Q2 421 462. 450.
## 7 1993 Q3 410 445 447.
# Showing the results of a 7-term weighted moving average, with weights of
# 0.067, 0.133, 0.200, 0.200, 0.200, 0.133 and 0.067, on the same data used
# for the 3x5-MA demonstration above
weights <- c(0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067)
# Defining a function for the weighted moving average; windows shorter than
# the weight vector (at the series ends) return NA
weighted_ma <- function(x, weights) {
  if (length(x) < length(weights)) return(NA_real_)
  sum(x * weights)
}
# Applying the weighted moving average to the data
beer_weightedMA <- beer %>%
  mutate(ma_7 = slide_dbl(
    .x = Beer,
    .f = ~ weighted_ma(.x, weights),
    .before = 3,
    .after = 3
  ))
head(beer_weightedMA, 7)
## # A tsibble: 7 x 3 [1Q]
## Quarter Beer ma_7
## <qtr> <dbl> <dbl>
## 1 1992 Q1 443 NA
## 2 1992 Q2 410 NA
## 3 1992 Q3 420 NA
## 4 1992 Q4 532 445.
## 5 1993 Q1 433 449.
## 6 1993 Q2 421 450.
## 7 1993 Q3 410 447.
The results of beer_ma and beer_weightedMA shown above are essentially identical; the tiny differences come from rounding the weights to three decimal places, as derived below.
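Why the weights take these values: a 5-MA gives each of five neighbours weight 1/5, and the subsequent 3-MA averages three overlapping 5-MA values, so each observation's total weight is (number of the three 5-MA windows containing it) × (1/3) × (1/5). The window counts across the seven positions are 1, 2, 3, 3, 3, 2, 1, giving exact weights of 1/15, 2/15, 3/15, 3/15, 3/15, 2/15, 1/15. A one-line check:
# Exact 3x5-MA weights, rounded to three decimals to match the exercise
round(c(1, 2, 3, 3, 3, 2, 1) / 15, 3)
## Expected output: 0.067 0.133 0.200 0.200 0.200 0.133 0.067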
# Exercise 3.7
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
A. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
# Plotting the time series
autoplot(gas, Gas)
The trend-cycle appears to be a weak positive trend, with no cyclicity visible in the five years of data available. There is very strong seasonality, with peaks in Q2 and Q3 and troughs in Q1 and Q4; a seasonal plot (sketched below) makes this easy to confirm.
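A minimal sketch using feasts' gg_season(), which overlays each year's quarters:
# Each line is one year; the Q2/Q3 (Australian winter) peaks line up across years
gas %>% gg_season(Gas)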
B. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
# Computing the multiplicative classical decomposition components
gas.multiplicative <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()
# Viewing the component plots
gas.multiplicative %>% autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
mean(gas.multiplicative$seasonal)
## [1] 1
The seasonal indices from the multiplicative decomposition average to 1 by construction; each quarter's index measures how far that quarter typically sits above or below the trend-cycle. The four indices themselves are sketched below.
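A small sketch to see the four quarterly indices; unique() recovers them because classical decomposition repeats the same index every year:
# The four quarterly seasonal indices (repeated across years)
unique(round(gas.multiplicative$seasonal, 3))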
mean(gas.multiplicative$trend, na.rm=TRUE)
## [1] 216.3203
The mean of the trend-cycle estimated by the multiplicative decomposition is about 216.32 petajoules.
C. Do the results support the graphical interpretation from part A?
Yes. The seasonal indices repeat with a one-year period, matching the strong quarterly seasonality seen in part A, and the estimated trend-cycle rises over the period, matching the weak upward trend in part A's plot.
D. Compute and plot the seasonally adjusted data.
# Plotting the seasonally adjusted data
gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = season_adjust))
E. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
# Adding 300 to the first observation and recomputing the seasonally adjusted data
gas300 <- gas
gas300$Gas[1] <- gas300$Gas[1] + 300
gas300 %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = season_adjust))
In the seasonally adjusted data the added outlier remains clearly visible as a spike. This is expected: seasonal adjustment removes only the seasonal component, so the outlier flows through to the remainder and therefore stays in the seasonally adjusted series.
F. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
# Adding 300 to observation 10 of 20 instead, i.e. the middle of the series
gas300 <- gas
gas300$Gas[10] <- gas300$Gas[10] + 300
gas300 %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = season_adjust))
Yes, the position of the outlier matters. In part E the outlier sat at the very start of the series; here it sits in the middle (observation 10 of 20). With the outlier in the middle, the seasonally adjusted series is flattened and distorted across all years rather than showing a single clean spike. A plausible mechanism: classical decomposition estimates each quarter's seasonal index from the detrended values, and the moving-average trend is NA near the series ends, so an outlier near an end barely affects the indices, whereas one in the middle inflates them and so contaminates the adjustment in every year. A small sketch comparing positions follows.
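A minimal sketch comparing start, middle and end placements directly; adjust_with_outlier() is a hypothetical helper written for this check, not part of any package:
# Hypothetical helper: add 300 at position i and return the seasonally
# adjusted series from a multiplicative classical decomposition
adjust_with_outlier <- function(i) {
  g <- gas
  g$Gas[i] <- g$Gas[i] + 300
  g %>%
    model(classical_decomposition(Gas, type = "multiplicative")) %>%
    components() %>%
    as_tibble() %>%
    transmute(Quarter, position = paste0("obs ", i), season_adjust)
}
# Overlaying the three cases on one plot
bind_rows(lapply(c(1, 10, 20), adjust_with_outlier)) %>%
  ggplot(aes(x = Quarter, y = season_adjust, colour = position)) +
  geom_line() +
  labs(colour = "Outlier position")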
# Exercise 3.8
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() %>%
  autoplot()
The X-11 decomposition reveals a spike in the irregular (remainder) component shortly before 1990, and a seasonal component whose shape changes over time, whereas I had previously assumed the seasonality was steadily increasing.
# Exercise 3.9
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
At first glance, based on the panel scaling, I might have assumed that seasonality drives much of the movement in the value plot of the STL decomposition. Looking more closely at the scales, however, the seasonal fluctuations only span roughly -100 to 100, whereas the value ranges from about 6000 to 9000. The remainder is on roughly the same scale as the seasonal component, apart from one outlier. This means most of the change in the value is due to the trend-cycle.
The recession of 1991/1992 is visible as a large negative outlier in the remainder component of the plots. This makes sense because the recession was not driven by seasonality or the usual trend-cycle behaviour, but by other factors.