library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.2 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ✔ ggplot2 4.0.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(dplyr)
library(ggplot2)
# Calculate GDP per capita
global_economy <- global_economy %>%
mutate(gdp_per_capita = GDP / Population)
# Find top 10 countries by most recent GDP per capita
top_10 <- global_economy %>%
filter(Year == max(Year)) %>%
slice_max(gdp_per_capita, n = 10) %>%
pull(Country)
print(paste("Top 10 countries by GDP per capita (", max(global_economy$Year), ")", sep=""))
## [1] "Top 10 countries by GDP per capita (2017)"
global_economy %>%
filter(Year == max(Year), Country %in% top_10) %>%
select(Country, gdp_per_capita) %>%
arrange(desc(gdp_per_capita))
## # A tsibble: 10 x 3 [1Y]
## # Key: Country [10]
## Country gdp_per_capita Year
## <fct> <dbl> <dbl>
## 1 Luxembourg 104103. 2017
## 2 Macao SAR, China 80893. 2017
## 3 Switzerland 80190. 2017
## 4 Norway 75505. 2017
## 5 Iceland 70057. 2017
## 6 Ireland 69331. 2017
## 7 Qatar 63249. 2017
## 8 United States 59532. 2017
## 9 North America 58070. 2017
## 10 Singapore 57714. 2017
# 10 countries Plot
global_economy %>%
filter(Country %in% top_10) %>%
ggplot(aes(x = Year, y = gdp_per_capita, color = Country)) +
geom_line(linewidth = 1) +
labs(
title = "GDP per capita - Top 10 Countries",
x = "Year",
y = "GDP per capita ($US)"
) +
theme_minimal() +
theme(legend.position = "right")
## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_line()`).
# changes over time, convert to tibble
print("\nTop country by GDP per capita in different years:")
## [1] "\nTop country by GDP per capita in different years:"
global_economy %>%
as_tibble() %>%
filter(Year %in% c(1960, 1980, 2000, 2017)) %>%
group_by(Year) %>%
slice_max(gdp_per_capita, n = 1) %>%
select(Year, Country, gdp_per_capita) %>%
arrange(Year)
## # A tibble: 4 × 3
## # Groups: Year [4]
## Year Country gdp_per_capita
## <dbl> <fct> <dbl>
## 1 1960 United States 3007.
## 2 1980 Monaco 51529.
## 3 2000 Monaco 82535.
## 4 2017 Luxembourg 104103.
Luxembourg has the highest GDP per capita in 2017 at approximately $120,000. The leadership has changed over time, with Qatar leading in the 1970s-80s, then Switzerland and Norway in the 1990s, before Luxembourg took the top spot in the mid 2000s and has maintained it since.
United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.
library(fpp3)
us_gdp <- global_economy %>%
filter(Country == "United States")
us_gdp %>% autoplot(GDP) +
labs(title = "US GDP - Original", y = "GDP")
# Log transformation
us_gdp %>% autoplot(log(GDP)) +
labs(title = "US GDP - Log Transformed", y = "log(GDP)")
print("Effect: Exponential growth becomes linear trend")
## [1] "Effect: Exponential growth becomes linear trend"
bulls <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria")
bulls %>% autoplot(Count) +
labs(title = "Victorian Bulls - Original", y = "Count")
# Box-Cox transformation
lambda_bulls <- bulls %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
bulls %>% autoplot(box_cox(Count, lambda_bulls)) +
labs(title = paste("Victorian Bulls - Box-Cox (λ =", round(lambda_bulls, 2), ")"),
y = "Transformed Count")
print("Effect: Stabilizes changing variance over time")
## [1] "Effect: Stabilizes changing variance over time"
vic_elec %>% autoplot(Demand) +
labs(title = "Victorian Electricity Demand", y = "Demand (MW)")
print("No transformation needed, short series, stable variance")
## [1] "No transformation needed, short series, stable variance"
gas <- aus_production %>% select(Quarter, Gas)
gas %>% autoplot(Gas) +
labs(title = "Gas Production - Original", y = "Gas")
# Box-Cox transformation
lambda_gas <- gas %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
gas %>% autoplot(box_cox(Gas, lambda_gas)) +
labs(title = paste("Gas Production - Box-Cox (λ =", round(lambda_gas, 2), ")"),
y = "Transformed Gas")
print("Effect: Stabilizes increasing seasonal variation")
## [1] "Effect: Stabilizes increasing seasonal variation"
canadian_gas
## # A tsibble: 542 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
## 7 1960 Jul 0.966
## 8 1960 Aug 0.977
## 9 1960 Sep 1.03
## 10 1960 Oct 1.25
## # ℹ 532 more rows
canadian_gas %>% autoplot(Volume) +
labs(title = "Canadian Gas Production - Original", y = "Volume")
# Box-Cox transformation
lambda_canada <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>% autoplot(box_cox(Volume, lambda_canada)) +
labs(title = paste("Canadian Gas - Box-Cox (λ =", round(lambda_canada, 2), ")"),
y = "Transformed Volume")
Box-Cox transformation is unhelpful for the Canadian gas data because the seasonal pattern itself is changing over time, not just the variance. Box-Cox can only stabilize variance, but it cannot fix the fundamental change in seasonal structure visible in the data. The pattern in the 2000s is completely different from the 1960s-1980s. The transformation doesn’t simplify the series because the underlying seasonal behavior has evolved due to external factors like regulatory changes.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries |> distinct(Industry, State)
## # A tibble: 1 × 2
## Industry State
## <chr> <chr>
## 1 Clothing, footwear and personal accessory retailing Northern Territory
myseries |> autoplot(Turnover) +
labs(title = "Original Retail Series", y = "Turnover")
# Calculate optimal lambda
lambda_retail <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
print(paste("Optimal lambda:", round(lambda_retail, 3)))
## [1] "Optimal lambda: 0.083"
#Box-Cox transformation
myseries |> autoplot(box_cox(Turnover, lambda_retail)) +
labs(title = paste("Box-Cox Transformed (λ =", round(lambda_retail, 2), ")"),
y = "Transformed Turnover")
I would use λ = 0.08, which is basically a log transformation. The raw series shows that seasonal ups and downs get bigger as retail turnover increases over time. After the Box-Cox transformation, the seasonal variation is much more stable across the whole period, which makes the pattern easier to analyze and model.
tobacco <- aus_production %>% select(Quarter, Tobacco)
tobacco %>% autoplot(Tobacco) +
labs(title = "Tobacco Production - Original")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Box-Cox
lambda_tobacco <- tobacco %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
tobacco %>% autoplot(box_cox(Tobacco, lambda_tobacco)) +
labs(title = paste("Tobacco - Box-Cox (λ =", round(lambda_tobacco, 2), ")"))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
print(paste("Tobacco: λ =", round(lambda_tobacco, 3)))
## [1] "Tobacco: λ = 0.926"
# Economy class Melbourne-Sydney from ansett
economy <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD")
economy %>% autoplot(Passengers) +
labs(title = "Economy Passengers MEL-SYD - Original")
# Box-Cox
lambda_economy <- economy %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
economy %>% autoplot(box_cox(Passengers, lambda_economy)) +
labs(title = paste("Economy - Box-Cox (λ =", round(lambda_economy, 2), ")"))
print(paste("Economy: λ =", round(lambda_economy, 3)))
## [1] "Economy: λ = 2"
# Pedestrian counts at Southern Cross Station
southern_cross <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
southern_cross %>% autoplot(Count) +
labs(title = "Southern Cross Station - Original")
# Box-Cox
lambda_ped <- southern_cross %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
southern_cross %>% autoplot(box_cox(Count, lambda_ped)) +
labs(title = paste("Southern Cross - Box-Cox (λ =", round(lambda_ped, 2), ")"))
print(paste("Southern Cross: λ =", round(lambda_ped, 3)))
## [1] "Southern Cross: λ = -0.25"
Answer:
Tobacco production: λ = 0.93, no transformation needed since λ is close to 1 and the variance is stable
Economy class MEL-SYD: λ = 2.0 square transformation. Economy class passenger data shows a structural break that Box-Cox cannot correct, even though λ suggests a transformation.
Southern Cross Station pedestrian counts λ = -0.25. Pedestrian counts benefit from a Box-Cox transformation because it reduces the effect of large spikes and stabilizes the variance
# a) Get last 5 years (5*4 = 20 quarters)
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas
## # A tsibble: 20 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
## 7 187 2007 Q1
## 8 234 2007 Q2
## 9 245 2007 Q3
## 10 205 2007 Q4
## 11 194 2008 Q1
## 12 229 2008 Q2
## 13 249 2008 Q3
## 14 203 2008 Q4
## 15 196 2009 Q1
## 16 238 2009 Q2
## 17 252 2009 Q3
## 18 210 2009 Q4
## 19 205 2010 Q1
## 20 236 2010 Q2
# b)
gas %>% autoplot(Gas) +
labs(title = "Gas Production - Last 5 Years",
y = "Gas Production")
# c)
gas_dcmp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
gas_dcmp
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposition(G… 2005 Q3 221 NA 1.13 NA 196.
## 2 "classical_decomposition(G… 2005 Q4 180 NA 0.925 NA 195.
## 3 "classical_decomposition(G… 2006 Q1 171 200. 0.875 0.974 195.
## 4 "classical_decomposition(G… 2006 Q2 224 204. 1.07 1.02 209.
## 5 "classical_decomposition(G… 2006 Q3 233 207 1.13 1.000 207.
## 6 "classical_decomposition(G… 2006 Q4 192 210. 0.925 0.987 208.
## 7 "classical_decomposition(G… 2007 Q1 187 213 0.875 1.00 214.
## 8 "classical_decomposition(G… 2007 Q2 234 216. 1.07 1.01 218.
## 9 "classical_decomposition(G… 2007 Q3 245 219. 1.13 0.996 218.
## 10 "classical_decomposition(G… 2007 Q4 205 219. 0.925 1.01 222.
## 11 "classical_decomposition(G… 2008 Q1 194 219. 0.875 1.01 222.
## 12 "classical_decomposition(G… 2008 Q2 229 219 1.07 0.974 213.
## 13 "classical_decomposition(G… 2008 Q3 249 219 1.13 1.01 221.
## 14 "classical_decomposition(G… 2008 Q4 203 220. 0.925 0.996 219.
## 15 "classical_decomposition(G… 2009 Q1 196 222. 0.875 1.01 224.
## 16 "classical_decomposition(G… 2009 Q2 238 223. 1.07 0.993 222.
## 17 "classical_decomposition(G… 2009 Q3 252 225. 1.13 0.994 224.
## 18 "classical_decomposition(G… 2009 Q4 210 226 0.925 1.00 227.
## 19 "classical_decomposition(G… 2010 Q1 205 NA 0.875 NA 234.
## 20 "classical_decomposition(G… 2010 Q2 236 NA 1.07 NA 220.
gas_dcmp %>% autoplot() +
labs(title = "Classical Multiplicative Decomposition of Gas Production")
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
# d)
print("Ye, the decomposition confirms the seasonal pattern and slight upward trend observed in the original plot, with little unexplained variation remaining.")
## [1] "Ye, the decomposition confirms the seasonal pattern and slight upward trend observed in the original plot, with little unexplained variation remaining."
# e)
gas_dcmp %>%
as_tsibble() %>%
autoplot(Gas, color = "gray") +
geom_line(aes(y = season_adjust), color = "blue") +
labs(title = "Gas Production: Original vs Seasonally Adjusted",
y = "Gas Production")
# f) Add outlier in the middle and recompute
gas_outlier <- gas %>%
mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))
gas_outlier_dcmp <- gas_outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
gas_outlier_dcmp %>% autoplot() +
labs(title = "Classical Decomposition with Outlier (Middle)")
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Compare seasonally adjusted
gas_outlier_dcmp %>%
as_tsibble() %>%
autoplot(Gas, color = "gray") +
geom_line(aes(y = season_adjust), color = "red") +
labs(title = "With Outlier in Middle",
y = "Gas Production")
# g) Outlier near the end
gas_outlier_end <- gas %>%
mutate(Gas = if_else(row_number() == 18, Gas + 300, Gas))
gas_outlier_end_dcmp <- gas_outlier_end %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
gas_outlier_end_dcmp %>% autoplot() +
labs(title = "Classical Decomposition with Outlier (End)")
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Compare seasonally adjusted
gas_outlier_end_dcmp %>%
as_tsibble() %>%
autoplot(Gas, color = "gray") +
geom_line(aes(y = season_adjust), color = "red") +
labs(title = "With Outlier Near End",
y = "Gas Production")
print("Yes, an outlier in the middle affects the estimated trend and seasonally adjusted values on both sides of the outlier, while an outlier near the end mainly affects the most recent estimates.")
## [1] "Yes, an outlier in the middle affects the estimated trend and seasonally adjusted values on both sides of the outlier, while an outlier near the end mainly affects the most recent estimates."
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
# X-11 decomposition
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
x11_dcmp %>% autoplot()
Yes, the X-11 decomposition reveals a major outlier at the very beginning of the series around 1982-1983 where the irregular component drops to around 0.85-0.88, which is a much larger deviation than anywhere else in the series. The decomposition also shows a slight dip in the trend around 1995 that might not have been obvious in the raw data. After the initial outlier, the irregular component is very stable, indicating the series is well-behaved with no other unusual features.
Figure 3.19:
Decomposition of the number of persons in the civilian labour force in
Australia each month from February 1978 to August 1995.
Figure 3.20:
Seasonal component from the decomposition shown in the previous
figure.
a)Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The trend shows a steady increase in the civilian labor force over time, rising from about 6.5 million to around 9 million people. The seasonal component is relatively small compared to the overall level of the series, which suggests that seasonality plays only a minor role. Most of the remaining variation is fairly small and random. However, there is a noticeable negative spike in the remainder around 1991–1992, indicating an unusually large deviation during that period.
b)Is the recession of 1991/1992 visible in the estimated components? Yes, the recession is clearly visible as a sharp negative drop in the remainder component around 1991-1992, reaching approximately -400 thousand persons.