library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
data("global_economy")
# GDP per capita
global_economy <- global_economy %>%
mutate(GDP_per_capita = GDP / Population)
highest_gdp_per_capita <- global_economy %>%
index_by(Year) %>%
filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
ungroup() %>%
arrange(desc(GDP_per_capita))
print(highest_gdp_per_capita %>% select(Year, Country, GDP_per_capita))
## # A tsibble: 58 x 3 [1Y]
## # Key: Country [263]
## Year Country GDP_per_capita
## <dbl> <fct> <dbl>
## 1 2014 Monaco 185153.
## 2 2008 Monaco 180640.
## 3 2013 Liechtenstein 173528.
## 4 2016 Monaco 168011.
## 5 2015 Liechtenstein 167591.
## 6 2007 Monaco 167125.
## 7 2011 Monaco 162155.
## 8 2012 Monaco 152000.
## 9 2009 Monaco 149221.
## 10 2010 Monaco 144569.
## # ℹ 48 more rows
# plot
global_economy %>%
ggplot(aes(x = Year, y = GDP_per_capita, group = Country, color = Country)) +
geom_line(alpha = 0.7, size = 0.8) +
labs(title = "GDP per Capita Over Time for Each Country",
x = "Year",
y = "GDP per Capita (USD)",
color = "Country") +
theme_minimal() +
theme(legend.position = "none")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
# plot for change of the top country over time
highest_gdp_per_capita %>%
ggplot(aes(x = Year, y = GDP_per_capita, label = Country)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2, color = "red") +
geom_text(nudge_y = 5000, size = 3, check_overlap = TRUE) +
labs(title = "Highest GDP per Capita Over Time",
x = "Year",
y = "GDP per Capita (USD)") +
theme_minimal()
The latest data available seems to show that Luxembourg has the highest GDP. Over time, though, it seems Monaco had the highest GDP since data collection began for this dataset, while Liechtenstein took that spot once in the last ten years or so.
Starting with:
log transformation
library(fpp3)
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title = "United States GDP Over Time",
x = "Year", y = "GDP (USD)") +
theme_minimal()
global_economy %>%
filter(Country == "United States") %>%
autoplot(log(GDP)) +
labs(title = "Log-Transformed United States GDP Over Time",
x = "Year", y = "Log(GDP)") +
theme_minimal()
The log transformation made the trend more linear and stabilized varianc somewhat.
Box-Cox transformation
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
autoplot(Count) +
labs(title = "Slaughter of Victorian Bulls, Bullocks, and Steers",
x = "Year", y = "Count") +
theme_minimal()
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
mutate(Transformed_Count = box_cox(Count, lambda = 0)) %>%
autoplot(Transformed_Count) +
labs(title = "Log-Transformed Slaughter of Victorian Bulls, Bullocks, and Steers",
x = "Year", y = "Log(Count)") +
theme_minimal()
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
mutate(Diff_Count = difference(Count)) %>%
autoplot(Diff_Count) +
labs(title = "First-Order Differenced Slaughter of Victorian Bulls, Bullocks, and Steers",
x = "Year", y = "Differenced Count") +
theme_minimal()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
Box-cox transformation stabilized the variance slightly, but the first-order differencing removed the long-term trend but may have also caused some noise and large fluctuations.
log transformation
vic_elec %>%
autoplot(Demand) +
labs(title = "Victorian Electricity Demand Over Time",
x = "Time", y = "Demand (MW)") +
theme_minimal()
vic_elec %>%
mutate(Transformed_Demand = log(Demand)) %>%
autoplot(Transformed_Demand) +
labs(title = "Log-Transformed Victorian Electricity Demand",
x = "Time", y = "Log(Demand)") +
theme_minimal()
The log transformation here made the variance less pronounced and more stabilized (look at the Y axis).
differencing
aus_production %>%
autoplot(Gas) +
labs(title = "Australian Gas Production Over Time",
x = "Year", y = "Gas Production") +
theme_minimal()
aus_production %>%
mutate(Diff_Gas = difference(Gas)) %>%
autoplot(Diff_Gas) +
labs(title = "Differenced Australian Gas Production",
x = "Year", y = "Differenced Gas Production") +
theme_minimal()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
The first-order differencing removed the strong upward trend, though the plot still hows increasing variance and strong seasonality.
canadian_gas %>%
autoplot(Volume) +
labs(title = "Canadian Gas Production Over Time",
x = "Year", y = "Gas Production (Volume)") +
theme_minimal()
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
mutate(BoxCox_Volume = box_cox(Volume, lambda)) %>%
autoplot(BoxCox_Volume) +
labs(title = "Box-Cox Transformed Canadian Gas Production",
x = "Year", y = "Transformed Gas Production") +
theme_minimal()
print(paste("Optimal lambda:", lambda))
## [1] "Optimal lambda: 0.576764791258997"
Box-cox transformation has not removed the strong upward trend or the seasonal patterns in Canadian gas production. It does not make the series stationary.
## What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
aus_retail %>%
filter(Industry == "Department stores") %>%
autoplot(Turnover) +
labs(title = "Australian Retail Turnover (Department Stores)",
x = "Year", y = "Turnover (Million AUD)") +
theme_minimal()
# Specifying only one:
set.seed(123)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
autoplot(Turnover) +
labs(title = "Randomly Selected Retail Time Series",
x = "Year", y = "Turnover (Million AUD)") +
theme_minimal()
There is an upward trend as well as some seasonable variation. let’s try Box-cox transformation to stabilize the increasing fluctuations over time:
lambda <- myseries |> features(Turnover, features = guerrero) |> pull(lambda_guerrero)
myseries |>
mutate(BoxCox_Turnover = box_cox(Turnover, lambda)) |>
autoplot(BoxCox_Turnover) +
labs(title = "Box-Cox Transformed Retail Turnover",
x = "Year", y = "Transformed Turnover") +
theme_minimal()
I think it worked and it is somewhat better for the increasing fluctuations, but there is still an upward trend!
So to try to remove trend, we can add first order differencing:
myseries |>
mutate(BoxCox_Turnover = box_cox(Turnover, lambda),
Diff_BoxCox = difference(BoxCox_Turnover),
Seasonal_Diff_BoxCox = difference(Diff_BoxCox, lag = 12)) |>
autoplot(Seasonal_Diff_BoxCox) +
labs(title = "Fully Transformed Retail Turnover (Box-Cox + Differencing)",
x = "Year", y = "Transformed Turnover") +
theme_minimal()
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
Using Box-Cox transformation with optimal lambda
lambda_tobacco <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
mutate(BoxCox_Tobacco = box_cox(Tobacco, lambda_tobacco)) %>%
autoplot(BoxCox_Tobacco) +
labs(title = "Box-Cox Transformed Tobacco Production",
x = "Year", y = "Transformed Tobacco Production") +
theme_minimal()
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
print(paste("Optimal lambda for Tobacco:", lambda_tobacco))
## [1] "Optimal lambda for Tobacco: 0.926463585274373"
Using Box-Cox transformation with optimal lambda
# Compute optimal lambda for economy class passengers (Melbourne-Sydney)
lambda_passengers <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
# Apply Box-Cox transformation
ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
mutate(BoxCox_Passengers = box_cox(Passengers, lambda_passengers)) %>%
autoplot(BoxCox_Passengers) +
labs(title = "Box-Cox Transformed Economy Class Passengers (MEL-SYD)",
x = "Year", y = "Transformed Passenger Count") +
theme_minimal()
# Print the optimal lambda
print(paste("Optimal lambda for Economy Class Passengers:", lambda_passengers))
## [1] "Optimal lambda for Economy Class Passengers: 1.9999267732242"
Using Box-Cox transformation with optimal lambda (before and after)
pedestrian_filtered <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
pedestrian_filtered %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(Count) +
labs(title = "Southern Cross Pedestrians")
lambda_pedestrian <- pedestrian_filtered %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero) %>%
first()
pedestrian_filtered %>%
mutate(BoxCox_Count = box_cox(Count, lambda_pedestrian)) %>%
autoplot(BoxCox_Count) +
labs(title = "Box-Cox Transformed Pedestrian Counts at Southern Cross Station",
x = "Year", y = "Transformed Pedestrian Count") +
theme_minimal()
gas <- tail(aus_production, 5 * 4) |> select(Gas)
# Plot
autoplot(gas) +
labs(title = "Australian Gas Production (Last 5 Years)",
x = "Year", y = "Gas Production") +
theme_minimal()
## Plot variable not specified, automatically selected `.vars = Gas`
So there does seem to be a clear seasonal component that is repeating every year, with an upward trend element too. The change upward every year seems relatively the same in magnitude as well.
# classical decomposition
gas_decomp <- gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
# Plot
autoplot(gas_decomp) +
labs(title = "Multiplicative Decomposition of Australian Gas Production",
x = "Year") +
theme_minimal()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
After decomposition, the trend component confirms a graduate increase in gas production over time. And the seasonal element shows the quarterly pattern seen in the first plot.
Now, to compute the seasonally adjusted data:
gas_sa <- gas_decomp |>
mutate(Seasonally_Adjusted = Gas / seasonal)
# Plot
autoplot(gas_sa, Seasonally_Adjusted) +
labs(title = "Seasonally Adjusted Australian Gas Production",
x = "Year", y = "Seasonally Adjusted Gas Production") +
theme_minimal()
The seasonally adjusted time series removes seasonal fluctuations, making it easier to observe underlying trends. The trend component suggests a gradual increase in gas production over time, with some short-term variations.
Now introducing the outlier component and recomputing:
gas_outlier <- gas |>
mutate(Gas = ifelse(row_number() == 10, Gas + 300, Gas))
gas_decomp_outlier <- gas_outlier |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
gas_sa_outlier <- gas_decomp_outlier |>
mutate(Seasonally_Adjusted = Gas / seasonal)
# Plot
autoplot(gas_sa_outlier, Seasonally_Adjusted) +
labs(title = "Seasonally Adjusted Gas Production (With Outlier in Middle)",
x = "Year", y = "Seasonally Adjusted Gas Production") +
theme_minimal()
Sharp spike in 2008 clearly reflects the artificially introduced outlier. Seasonal adjustment is significantly distorted. The outlier overestimates the true trend.
To test how the effect changes when the outlier is at the end of the time series:
gas_outlier_end <- gas |>
mutate(Gas = ifelse(row_number() == n(), Gas + 300, Gas))
gas_decomp_outlier_end <- gas_outlier_end |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components()
gas_sa_outlier_end <- gas_decomp_outlier_end |>
mutate(Seasonally_Adjusted = Gas / seasonal)
# Plot
autoplot(gas_sa_outlier_end, Seasonally_Adjusted) +
labs(title = "Seasonally Adjusted Gas Production (With Outlier at End)",
x = "Year", y = "Seasonally Adjusted Gas Production") +
theme_minimal()
Sharp spike at the last observation confirms that the outlier significantly impacts the trend estimation. This overestimates the true production levels. Previous values remain stable.
x11_decomp <- myseries |>
model(X_11 = X_13ARIMA_SEATS(Turnover)) |>
components()
# Plot
autoplot(x11_decomp) +
labs(title = "X-11 Decomposition of Retail Turnover",
x = "Year") +
theme_minimal()
Shows a gradual increase over time, indicating growth in retail turnover. There are no sharp discontinuities. Displays a consistent repeating pattern, confirming strong seasonality in retail sales. Peaks likely correspond to holiday shopping periods. A significant spike around 2000 stands out (anomaly? outlier? error? policy or industry change?)
The decomposition of the Australian civilian labor force from February 1978 to August 1995 reveals several key patterns. The trend component shows a steady increase in labor force participation over time, with a noticeable dip around 1991–1992, suggesting the impact of the early 1990s recession. The seasonal component highlights consistent fluctuations, indicating recurring labor force patterns across months, with peaks and troughs that remain relatively stable over time. The remainder component, which captures irregular fluctuations, shows a significant negative deviation around 1991, further supporting the presence of economic disruption during the recession.
The second figure provides a detailed view of the seasonal component across different months, showing that March and December tend to have the highest seasonal increases, while September and January exhibit significant declines. The blue lines represent stable seasonal averages, while the black lines show how seasonal effects evolved over time.
This decomposition confirms that the 1991–1992 recession had a measurable effect on labor force participation, particularly evident in the trend and remainder components.