library(tidyverse)
library(forecast)
library(fpp3)
library(tsibbledata)
library(seasonal)
global_economy <- global_economy %>%
mutate(GDP_per_capita = GDP / Population)
ggplot(global_economy, aes(x = Year, y = GDP_per_capita, color = Country, group = Country)) +
geom_line() +
labs(
title = "GDP per Capita Over Time by Country",
x = "Year",
y = "GDP per Capita"
) +
theme_minimal() +
theme(legend.position = "none") # Hide legend if too many countries
highest_gdp_per_capita <- global_economy %>%
index_by(Year) %>%
slice_max(order_by = GDP_per_capita, n = 1)
ggplot(highest_gdp_per_capita, aes(x = Year, y = GDP_per_capita, label = Country)) +
geom_line() +
geom_point() +
geom_text(nudge_y = 5000, check_overlap = TRUE) +
labs(
title = "Country with the Highest GDP per Capita Over Time",
x = "Year",
y = "GDP per Capita"
) +
theme_minimal()
Monaco has consistently been recognized as the country with the highest GDP per capita over time. However, in the most recent ranking, Luxembourg has emerged as the nation with the highest GDP per capita.
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title = "United States GDP Over Time",
x = "Year", y = "GDP (US dollars)")
Apply log transformation to stabilize the variance and show proportional changes more clearly.
global_economy %>%
filter(Country == "United States") %>%
autoplot(log(GDP)) +
labs(title = "Log-Transformed United States GDP Over Time",
x = "Year", y = "Log of GDP")
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
autoplot(Count) +
labs(title = "Slaughter of Bulls, Bullocks, and Steers in Victoria",
x = "Year", y = "Number Slaughtered")
vic_elec %>%
autoplot(Demand) +
labs(title = "Victorian Electricity Demand Over Time",
x = "Time", y = "Electricity Demand (MW)")
aus_production %>%
autoplot(Gas) +
labs(title = "Australian Gas Production Over Time",
x = "Year", y = "Gas Production")
Apply log transformation to stabilize the variance and show proportional changes more clearly.
aus_production %>%
autoplot(log(Gas)) +
labs(title = "Log-Transformed Australian Gas Production",
x = "Year", y = "Log of Gas Production")
A Box-Cox transformation is typically used to stabilize variance and make a time series more normally distributed.
canadian_gas %>%
gg_season(Volume) +
labs(title = "Seasonal Plot of Canadian Gas Production")
Each year’s pattern aligns closely, it confirms strong seasonal effects.
canadian_gas %>%
model(STL(Volume ~ season(window = "periodic"))) %>%
components() %>%
autoplot()
There is strong seasonality in the data, the seasonal plot show clear and consistent peaks and valleys that repeat over time.
set.seed(1225)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot(Turnover)
lambda <- BoxCox.lambda(myseries$Turnover)
lambda
## [1] -0.231131
myseries %>%
autoplot(log(Turnover))
aus_production %>%
autoplot(Tobacco) +
labs(title = "Tobacco from aus_production")
tobacco_series <- aus_production$Tobacco
lambda_tobacco <- BoxCox.lambda(tobacco_series)
lambda_tobacco
## [1] 0.7099451
When Lambda is greater than 0 and less than 1, it is recommended to use a power transformation; in this case, I will utilize the square root transformation.
tobacco_series <- na.omit(tobacco_series)
tobacco_sqrt <- tobacco_series^(1/2)
tobacco_sqrt <- na.omit(tobacco_sqrt)
par(mfrow = c(1, 2))
plot(tobacco_series, main = "Original Tobacco Production", type = "o")
plot(tobacco_sqrt, main = "Square Root Transformed Tobacco", type = "o")
var(tobacco_series)
## [1] 1140807
var(tobacco_sqrt)
## [1] 45.71294
sd(tobacco_series)
## [1] 1068.086
sd(tobacco_sqrt)
## [1] 6.761135
The square root transformation proved to be very effective, as demonstrated by the results.
passengers <- ansett %>% filter(Airports == "MEL-SYD" & Class == "Economy")
pedestrian_series <- pedestrian$Count + 1
passengers %>%
autoplot(Passengers) +
labs(title = "Economy class passengers between Melbourne and Sydney on ansett")
passengers_series <- passengers$Passengers
summary(passengers_series)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 19694 21524 21509 24049 32468
passengers_series_shifted <- passengers_series + 1
lambda_passengers <- BoxCox.lambda(passengers_series_shifted)
lambda_passengers
## [1] 1.999953
When Lambda is almost equal to 2, it is recommended to use a power transformation. However, since this dataset has extreme values, I use log transformation to achieve better effects.
passengers_series_shifted <- na.omit(passengers_series_shifted)
passengers_log <- log(passengers_series_shifted)
par(mfrow = c(1, 2))
plot(passengers_series_shifted, main = "Original Passengers Count", type = "o")
plot(passengers_log, main = "Log Transformed Passengers", type = "o")
var(passengers_series_shifted)
## [1] 26982184
var(passengers_log)
## [1] 2.458385
sd(passengers_series_shifted)
## [1] 5194.438
sd(passengers_log)
## [1] 1.567924
The Log transformation proved to be very effective, as demonstrated by the results.
pedestrian_selected <- pedestrian %>% filter(Sensor == "Southern Cross Station")
summary(pedestrian_selected)
## Sensor Date_Time Date
## Length:17539 Min. :2015-01-01 00:00:00.00 Min. :2015-01-01
## Class :character 1st Qu.:2015-07-02 16:30:00.00 1st Qu.:2015-07-02
## Mode :character Median :2016-01-01 10:00:00.00 Median :2016-01-01
## Mean :2016-01-01 11:24:48.93 Mean :2015-12-31
## 3rd Qu.:2016-07-02 05:30:00.00 3rd Qu.:2016-07-02
## Max. :2016-12-31 23:00:00.00 Max. :2016-12-31
## Time Count
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 6.0 1st Qu.: 36.0
## Median :12.0 Median : 124.0
## Mean :11.5 Mean : 495.9
## 3rd Qu.:18.0 3rd Qu.: 696.0
## Max. :23.0 Max. :3743.0
pedestrian_series <- pedestrian_selected$Count + 1
pedestrian_selected %>%
autoplot(Count) +
labs(title = "Pedestrian counts at Southern Cross Station")
lambda_pedestrian <- BoxCox.lambda(pedestrian_series)
lambda_pedestrian
## [1] 0.03624172
When Lambda is approximately, it is recommended to use log transformation.
pedestrian_log <- log(pedestrian_series)
par(mfrow = c(1, 2))
plot(pedestrian_series, main = "Original Pedestrian Count", type = "o")
plot(pedestrian_log, main = "Log Transformed Pedestrian", type = "o")
var(pedestrian_series)
## [1] 512490.2
var(pedestrian_log)
## [1] 3.867562
sd(pedestrian_series)
## [1] 715.8842
sd(pedestrian_log)
## [1] 1.966612
The Log transformation proved to be very effective, as demonstrated by the results.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas %>% autoplot(Gas)
gas %>% gg_season(Gas)
I have identified seasonal fluctuations where gas production increases from Quarter 1 to Quarter 3 and then decreases. Additionally, gas production also rises when comparing the same quarters of previous years.
gas_ts <- ts(gas$Gas, frequency = 4)
gas_decomposed <- decompose(gas_ts, type = "multiplicative")
trend_cycle <- gas_decomposed$trend
trend_cycle
## Qtr1 Qtr2 Qtr3 Qtr4
## 1 NA NA 200.500 203.500
## 2 207.000 210.250 213.000 216.125
## 3 218.625 218.875 218.750 219.000
## 4 219.000 220.375 221.875 223.125
## 5 225.125 226.000 NA NA
seasonal_indices <- gas_decomposed$seasonal
seasonal_indices
## Qtr1 Qtr2 Qtr3 Qtr4
## 1 1.1256812 0.9250656 0.8752824 1.0739708
## 2 1.1256812 0.9250656 0.8752824 1.0739708
## 3 1.1256812 0.9250656 0.8752824 1.0739708
## 4 1.1256812 0.9250656 0.8752824 1.0739708
## 5 1.1256812 0.9250656 0.8752824 1.0739708
The results confirm a seasonal trend in gas production, with seasonal indices consistently repeating each year, indicating a quarterly pattern.
seasonally_adjusted <- gas_ts / gas_decomposed$seasonal
gas_tibble <- as_tsibble(gas_ts) |> mutate(Seasonally_Adjusted = seasonally_adjusted)
autoplot(gas_ts, series = "Original Data") +
autolayer(seasonally_adjusted, series = "Seasonally Adjusted", color = "darkblue") +
labs(title = "Seasonally Adjusted Gas Data", y = "Gas Production") +
theme_minimal()
gas_ts_outlier <- gas_ts
gas_ts_outlier[10] <- gas_ts_outlier[10] + 300
gas_decomposed_outlier <- decompose(gas_ts_outlier, type = "multiplicative")
seasonally_adjusted_outlier <- gas_ts_outlier / gas_decomposed_outlier$seasonal
p1 <- autoplot(gas_ts, series = "Original Data") +
autolayer(seasonally_adjusted, series = "Seasonally Adjusted", color = "darkblue") +
labs(title = "Before Outlier", y = "Gas Production") +
theme_minimal() +
theme(legend.position = "bottom")
p2 <- autoplot(gas_ts_outlier, series = "Original Data with Outlier") +
autolayer(seasonally_adjusted_outlier, series = "Seasonally Adjusted", color = "darkgreen") +
labs(title = "After Outlier", y = "Gas Production") +
theme_minimal() +
theme(legend.position = "bottom")
gridExtra::grid.arrange(p1, p2, ncol = 2)
Introducing an outlier can distort the trend by pulling it upward, resulting in inaccurate long-term trend estimates. Additionally, the multiplicative decomposition method assumes stable seasonality; the presence of an outlier can inflate the seasonal component, rendering it unreliable. Classical decomposition is sensitive to outliers, and a single extreme value can skew the entire analysis.
gas_ts_outlier2 <- gas_ts
gas_ts_outlier2[20] <- gas_ts_outlier2[20] + 300
gas_decomposed_outlier2 <- decompose(gas_ts_outlier2, type = "multiplicative")
seasonally_adjusted_outlier2 <- gas_ts_outlier2 / gas_decomposed_outlier2$seasonal
p3 <- autoplot(gas_ts, series = "Original Data") +
autolayer(seasonally_adjusted, series = "Seasonally Adjusted", color = "darkblue") +
labs(title = "Before Outlier", y = "Gas Production") +
theme_minimal() +
theme(legend.position = "bottom")
p4 <- autoplot(gas_ts_outlier2, series = "Original Data with Outlier") +
autolayer(seasonally_adjusted_outlier2, series = "Seasonally Adjusted", color = "darkgreen") +
labs(title = "After Outlier", y = "Gas Production") +
theme_minimal() +
theme(legend.position = "bottom")
gridExtra::grid.arrange(p3, p4, ncol = 2)
The trend-cycle component at the end is significantly distorted due to the limited number of data points available for smoothing. Classical decomposition relies on historical data to establish the trend, so an outlier at the end of the series can heavily skew future trend projections, rendering them unreliable. If the last data point is an outlier, the trend-cycle may overreact, leading to unrealistic predictions of a sudden increase.
myseries_ts <- ts(myseries$Turnover, start = c(year(min(myseries$Month)), month(min(myseries$Month))), frequency = 12)
x11_decomp <- seas(myseries_ts, x11 = "")
trend <- trendcycle(x11_decomp)
seasonal <- seasonal(x11_decomp)
irregular <- remainder(x11_decomp)
autoplot(myseries_ts, series = "Original Data") +
autolayer(trend, series = "Trend") +
autolayer(seasonal, series = "Seasonal") +
autolayer(irregular, series = "Irregular") +
labs(title = "X-11 Decomposition of Retail Series", y = "Turnover") +
theme_minimal() +
theme(legend.position = "bottom")
autoplot(irregular) +
labs(title = "Irregular Component (Checking for Outliers)", y = "Irregular Component") +
theme_minimal()
The plot displays several positive and negative spikes, which indicate sudden deviations from the overall pattern. These sharp spikes represent unexpected increases or decreases in retail sales. The early years show more extreme fluctuations, suggesting the presence of either significant events or outliers in the data. In contrast, the recent years appear more stable, indicating that retail trends have become more predictable.
The decomposition results show clear trends, seasonal patterns, and irregular elements in the data. The trend indicates a significant drop around 1991/1992, reflecting the economic downturn from the recession. The seasonal element remains steady, suggesting consistent periodic changes that were not impacted by the recession. Meanwhile, the irregular element shows more variability during the recession, indicating greater economic uncertainty. The graphs highlight the noticeable trend drop and irregular changes during the 1991/1992 recession.