This assignment involves solving problems 3.1, 3.2, 3.3, 3.4, 3.5,3.7,3.8, and 3.9 from the Hyndman online Forecasting book.(https://otexts.com/fpp3/decomposition-exercises.html)
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(latex2exp)
library(seasonal)
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
#print(global_economy)
global_economy |>
autoplot(GDP/Population) +
labs(title= "GDP per capita", y = "$US")+ theme(legend.position = "none")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Calculate the GDP per capita for each country
gdp_per_capita <- global_economy |>
group_by(Country) |>
#filter(Year == min(Year)) |>
mutate(GDP_per_capita = GDP / Population) |> select(Country,Year,GDP_per_capita)|>
arrange(desc(GDP_per_capita))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
print(gdp_per_capita)
## # A tsibble: 15,150 x 3 [1Y]
## # Key: Country [263]
## # Groups: Country [263]
## Country Year GDP_per_capita
## <fct> <dbl> <dbl>
## 1 Monaco 2014 185153.
## 2 Monaco 2008 180640.
## 3 Liechtenstein 2014 179308.
## 4 Liechtenstein 2013 173528.
## 5 Monaco 2013 172589.
## 6 Monaco 2016 168011.
## 7 Liechtenstein 2015 167591.
## 8 Monaco 2007 167125.
## 9 Liechtenstein 2016 164993.
## 10 Monaco 2015 163369.
## # ℹ 15,140 more rows
# Display the country with the highest GDP per capita
#highest_gdp_country <- gdp_per_capita[1, ]
#highest_gdp_country
global_economy |>
filter(Country == "Monaco") |>
autoplot(GDP/Population) +
labs(title= "GDP per capita", y = "$US")
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
Monaco had the highest GDP per capita in 2014. The GDP per capita plot for Monaco shows an uneven upward trend with some fluctuations over the years. No repeated patterns or seasonality are observed in the GDP per capita plot. Overall, the GDP per capita has an increasing trend for majority of the countries.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.a.United States GDP from global_economy. b. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. c. Victorian Electricity Demand from vic_elec. d. Gas production from aus_production.
a.United States GDP from global_economy.
global_economy |>
filter(Country == "United States") |>
autoplot(GDP) +
labs(title= "GDP of the United States", y = "$US")
The data can be influenced by changes in population.Therefore,it should be adjusted on a per capita basis as because the GDP per capita plot is usually more appropriate for representing GDP growth while accounting for population changes.
global_economy |>
filter(Country == "United States") |>
autoplot(GDP/Population) +
labs(title= "GDP Per Capita in the United States", y = "$US")
In this case, both the GDP plot and the GDP per capita plot show a similar increasing trend, with no differences. Therefore, the transformation does not have any effect here.
aus_livestock |>
filter(State == "Victoria" &
Animal == "Bulls, bullocks and steers")|>
autoplot(Count)+labs(title= "Monthlhy Slaughter of Victorian Bulls, Bullocks and Steers")
No transformation was applied here. Overall, a decreasing trend with fluctuations has been observed.
autoplot(vic_elec, Demand)+labs(title= "Half-hourly Victorian Electricity Demand")
#head(vic_elec)
# Calculate average daily demand
daily_demand <- vic_elec |>
index_by(Date) |>
summarise(Daily_Avg_Demand = mean(Demand, na.rm = TRUE))
#|>
#as_tsibble(index = Date)
# Plot average daily electricity demand
daily_demand |>
autoplot(Daily_Avg_Demand) +
labs(title = "Average Daily Electricity Demand in Victoria")+
#y = "Average Demand (MWh)", x = "Date") +
theme_minimal()
monthly_demand <- vic_elec %>%
index_by(Month = ~ yearmonth(.data$Date)) %>% # Group by year-month
summarise(Monthly_Avg_Demand = mean(Demand, na.rm = TRUE))
# Plot the monthly average electricity demand
monthly_demand |>
autoplot(Monthly_Avg_Demand) +
labs(title = "Monthly Average Electricity Demand in Victoria",
y = "Average Demand (MWh)", x = "Month") +
theme_minimal()
The data was transformed to account for calendar adjustments when calculating the average daily and monthly electricity demand. These transformations reveal clear seasonal patterns, with a noticeable increase in electricity demand during the summer and in the middle of the year in winter. The monthly demand clearly shows these seasonal changes.
#head(aus_production)
autoplot(aus_production, Gas)+labs(title= "Quarterly Gas Production (Petajoules) in Australia")
The size of the seasonal variation is not same in the above plot. To make the size of seasonal variation about the same across the whole series a Box-Cox transformation is a good choice. This transformation is dependent on the parameter “lambda (λ)” and a good value of the lambda is needed to make the size of the seasonal variation abut the same across the whole series. The guerrero feature can be used to choose a good value of lambda.
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.1095171
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
Now, the above series has the same size of the seasonal variation, thus it will make the forecasting model simpler.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
#head(canadian_gas)
autoplot(canadian_gas, Volume)+labs(title="Monthly Canadian Gas Production")
lambda <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
#lambda
canadian_gas |>
autoplot(box_cox(Volume, lambda))+labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Monthly Canadian Gas Production with $\\lambda$ = ",
round(lambda,2))))
The Box-Cox transformation is not suitable for the Canadian gas data because it was unable to make the seasonal variations consistent in size throughout the time series.This lambda or any other will not be able to achieve that.Therefore, this transformation will not help simplify the identification of patterns or the building of forecasting models.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(13579)
#head(aus_retail)
my_series <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(my_series,Turnover)
lambda <- my_series |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
my_series |>
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Australian Retail Trade Turnover with $\\lambda$ = ",
round(lambda,2))))
The Box-Cox transformation is suitable for the Australian trade turnover data because it has made the seasonal variations consistent in size throughout the time series.Hence, this transformation will help simplify the identification of patterns or the building of forecasting models.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
a.Tobacco from aus_production
#head(aus_production)
autoplot(aus_production, Tobacco)+labs(title= "Quarterly Tobacco Production in Australia")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco production in Australia with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
b.Economy class passengers between Melbourne and Sydney from ansett
#head(ansett)
ansett |>
filter(Airports == "MEL-SYD",
Class == "Economy") |>
autoplot(Passengers)
lambda <- ansett |>
filter(Airports == "MEL-SYD",
Class == "Economy") |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
ansett |>
filter(Airports == "MEL-SYD",
Class == "Economy") |>
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Economy Class Passenger with $\\lambda$ = ",
round(lambda,2))))
c.Pedestrian counts at Southern Cross Station from pedestrian
#head(pedestrian)
pedestrian|>filter(Sensor=="Southern Cross Station")|>
autoplot(Count)+labs(title="Pedestrian Count at Southern Cross Station")
lambda <- pedestrian |>
filter(Sensor == "Southern Cross Station")|>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
pedestrian |>
filter(Sensor == "Southern Cross Station") |>
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Pedestrian Count at Southern Cross Station with $\\lambda$ = ",
round(lambda,2))))
All three Box-Cox transformations above have made the seasonal variations consistent in size throughout the time series.
Consider the last five years of the Gas data from aus_production. a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? b.Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. c. Do the results support the graphical interpretation from part a? d. Compute and plot the seasonally adjusted data. e.Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?
last five years of the Gas data from aus_production:
gas <- tail(aus_production, 5*4) |> select(Gas)
gas|>autoplot(Gas)
A yearly increasing trend-cycle is observed in the time series for the last five years of the Gas data from aus_production.
b.Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() |>
autoplot() +
labs(title = "Classical multiplicative decomposition of the last five years gas data")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
c.Do the results support the graphical interpretation from part a?
The results do support the graphical interpretation, showing an increasing trend over time with a recurring annual cycle. Additionally, one noticeable point here is that the estimates for the trend-cycle and random components are missing in the first and last few observations of this classical decomposition.
decomposition <- gas|>
model(classical_decomposition(Gas, type = "multiplicative"))|>
components()
decomposition|>as_tsibble()|> autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas",
title = "Seasonally Adjusted Gas Data")
The seasonally adjusted plot above reflect a plot of the gas data with the seasonal component removed, showing only the trend and random fluctuations for the full data observations.
e.Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_data<- tail(aus_production, 5*4)|>
select(Gas)
set.seed(12345)
random_row <- sample(nrow(gas_data), 1)
add_outlier <- gas_data |>
mutate(Gas = ifelse(row_number() == random_row, Gas + 300, Gas))
decomposition_with_outlier <- add_outlier |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
decomposition_with_outlier |>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y = season_adjust), colour = "#0072B2") +
labs(y = "Gas",
title = "Seasonally Adjusted Gas Data with Random Outlier")
The outlier has created a spike in the plot. This spike indicates that the outlier has disrupted the normal pattern in the seasonally adjusted data, which makes the trend and any forecasting based on it less reliable.
f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas_data_last<- tail(aus_production, 5*4)|>
select(Gas)
last_index <- nrow(gas_data )
gas_data[last_index, "Gas"] <- gas_data [last_index, "Gas"] + 300
decomposition_with_outlier <- gas_data |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
decomposition_with_outlier |>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y = season_adjust), colour = "#0072B2") +
labs(y = "Gas",
title = "Seasonally Adjusted Gas Data with Outlier")
It doesn’t matter if the outlier is in the middle or at the end of the data. It still makes the data difficult to understand.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
set.seed(13579)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Australian Retail Trade Turnover Using X-11")
Significant spikes have been found the in the irregular plot. This indicates the presence of outliers in the data.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labor force in Australia each month from February 1978 to August 1995.
a.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The trend component shows a steady annual increase in the number of persons in the civilian labor force in Australia from February 1978 to August 1995. This increase is generally consistent, with a slight dip between 1991 and 1992. The seasonal component reveals significant declines at the beginning of each year, with fluctuations throughout the rest of the year. The labor force reaches its highest median in December each year. The remainder component shows noticeable spikes. This suggests the presence of outliers in the data. These outliers may potentially impact the accuracy of any analysis.
b.Is the recession of 1991/1992 visible in the estimated components?
Yes, the recession of 1991/1992 is evident from the downward spike in the remainder plot.