Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
Load Required Packages: {fpp3} is loaded for time series analysis.
Calculate GDP per Capita: The dataset is updated with a new GDP_per_capita column by dividing GDP by Population.
Plot GDP per Capita Over Time: A line plot visualizes how GDP per capita changes for each country over time.
Fixing the Grouping Issue: Since global_economy is a tsibble (time-series tibble), index_by(Year) is used instead of group_by(Year).
Extracting Highest GDP per Capita Each Year: The dataset is filtered to retain only the country with the highest GDP per capita per year.
Displaying the Results: The output shows how the leading country has changed over time.
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.2
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'tsibble' was built under R version 4.4.2
## Warning: package 'tsibbledata' was built under R version 4.4.2
## Warning: package 'feasts' was built under R version 4.4.2
## Warning: package 'fabletools' was built under R version 4.4.2
## Warning: package 'fable' was built under R version 4.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
# Calculate GDP per capita
global_economy <- global_economy %>%
mutate(GDP_per_capita = GDP / Population)
# Plot GDP per capita over time for each country
global_economy %>%
ggplot(aes(x = Year, y = GDP_per_capita, color = Country)) +
geom_line(show.legend = FALSE) +
labs(title = "GDP per Capita Over Time",
x = "Year",
y = "GDP per Capita") +
theme_minimal()
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Identify the country with the highest GDP per capita each year
highest_GDP_per_capita <- global_economy %>%
index_by(Year) %>% # Use index_by instead of group_by for tsibble
filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
select(Year, Country, GDP_per_capita)
print(highest_GDP_per_capita)
## # A tsibble: 58 x 3 [1Y]
## # Key: Country [263]
## # Groups: @ Year [58]
## Year Country GDP_per_capita
## <dbl> <fct> <dbl>
## 1 1965 Kuwait 4429.
## 2 1966 Kuwait 4556.
## 3 2013 Liechtenstein 173528.
## 4 2015 Liechtenstein 167591.
## 5 2017 Luxembourg 104103.
## 6 1970 Monaco 12480.
## 7 1971 Monaco 13813.
## 8 1972 Monaco 16734.
## 9 1973 Monaco 21423.
## 10 1974 Monaco 22707.
## # ℹ 48 more rows
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
# Load required libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsibble)
library(tsibbledata)
library(feasts)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.4.2
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
# 1. United States GDP from global_economy
us_gdp <- global_economy %>%
filter(Country == "United States") %>%
select(Year, GDP)
# Original and log-transformed plots
p1_original <- autoplot(us_gdp, GDP) +
labs(title = "US GDP (Original)", x = "Year", y = "GDP (USD)")
p1_log <- autoplot(us_gdp, log(GDP)) +
labs(title = "US GDP (Log Transformed)", x = "Year", y = "Log(GDP)")
grid.arrange(p1_original, p1_log, ncol = 1)
# Log transformation converts exponential growth to linear trend, stabilizing variance and revealing consistent growth rates
# 2. Victorian Bulls Slaughter from aus_livestock
vic_bulls <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
select(Month, Count)
# Original and log-transformed plots
p2_original <- autoplot(vic_bulls, Count) +
labs(title = "Victorian Bulls Slaughter (Original)", x = "Month", y = "Count")
p2_log <- autoplot(vic_bulls, log(Count + 1)) + # +1 to handle zero counts
labs(title = "Victorian Bulls Slaughter (Log Transformed)", x = "Month", y = "Log(Count)")
grid.arrange(p2_original, p2_log, ncol = 1)
# Log transform stabilizes variance and emphasizes seasonal patterns while reducing outlier effects
# 3. Victorian Electricity Demand from vic_elec
# Aggregate to daily demand for better visualization
vic_elec_daily <- vic_elec %>%
index_by(Date = as.Date(Time)) %>%
summarise(Demand = sum(Demand, na.rm = TRUE)) # Sum demand by day, na.rm = TRUE to handle missing data
# Original and log-transformed plots
p3_original <- autoplot(vic_elec_daily, Demand) +
labs(title = "Victorian Electricity Demand (Original)", x = "Date", y = "Demand (MW)")
p3_log <- autoplot(vic_elec_daily, log(Demand + 1)) + # Adding 1 to avoid log(0)
labs(title = "Victorian Electricity Demand (Log Transformed)", x = "Date", y = "Log(Demand)")
grid.arrange(p3_original, p3_log, ncol = 1)
# Log transform reduces magnitude of daily fluctuations while preserving patterns
# 4. Gas Production from aus_production
gas_data <- aus_production %>%
select(Quarter, Gas)
# Original and log-transformed plots
p4_original <- autoplot(gas_data, Gas) +
labs(title = "Gas Production (Original)", x = "Quarter", y = "Gas Production")
p4_log <- autoplot(gas_data, log(Gas)) +
labs(title = "Gas Production (Log Transformed)", x = "Quarter", y = "Log(Gas Production)")
grid.arrange(p4_original, p4_log, ncol = 1)
# Log transform reveals consistent growth rate and stabilizes seasonal fluctuations
A Box-Cox transformation is unhelpful for the canadian_gas data because it requires the data to have positive values, and in this case, the dataset contains zero or negative values, which makes the transformation invalid. The Box-Cox transformation relies on the assumption that the data is continuous and strictly positive, so applying it to data with zero or negative values would either result in undefined values or distort the data. Furthermore, the canadian_gas data may already have a natural structure or trend that doesn’t require such a transformation. For data like this, other methods like adding a small constant or using a different transformation, such as the log transformation with a shift, may be more suitable.
For the retail data from Exercise 7, the Box-Cox transformation that I would select depends on the distribution and behavior of the time series. After exploring the series using functions like autoplot(), gg_season(), and gg_subseries(), I’d assess the variance and trend characteristics. If the data shows exponential growth or has significant skewness, a positive Box-Cox transformation with a suitable lambda (estimated using methods like Guerrero or maximum likelihood) would help stabilize the variance and make the trend more linear. For example, after applying the Box-Cox transformation, I would choose the lambda that minimizes the variance and enhances the data’s linearity. This transformation would ensure that the data is better suited for modeling, especially when the series exhibits patterns that are not easily captured by simple linear models. However, if the series is already reasonably stable or doesn’t show significant skewness, a simpler transformation like a log or square root might be more appropriate.
# Load required libraries
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(feasts)
library(forecast) # For box_cox function
## Warning: package 'forecast' was built under R version 4.4.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# 1. Tobacco from aus_production
tobacco_data <- aus_production %>%
select(Quarter, Tobacco)
# Find the Box-Cox transformation lambda for Tobacco
lambda_tobacco <- BoxCox.lambda(tobacco_data$Tobacco)
# Apply Box-Cox transformation to Tobacco data
tobacco_transformed <- tobacco_data %>%
mutate(Transformed_Tobacco = BoxCox(Tobacco, lambda_tobacco))
# Plot original and transformed Tobacco data
p1_original <- autoplot(tobacco_data, Tobacco) +
labs(title = "Tobacco Production (Original)", x = "Quarter", y = "Tobacco")
p1_transformed <- autoplot(tobacco_transformed, Transformed_Tobacco) +
labs(title = "Box-Cox Transformed Tobacco Production", x = "Quarter", y = "Transformed Tobacco")
# Display the plots
grid.arrange(p1_original, p1_transformed, ncol = 1)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
# 2. Economy class passengers between Melbourne and Sydney from ansett
ansett_data <- ansett %>%
filter(Class == "Economy") %>%
select(Week, Passengers) # Corrected to use 'Week' instead of 'Month'
# Find the Box-Cox transformation lambda for Economy class passengers
lambda_ansett <- BoxCox.lambda(ansett_data$Passengers)
## Warning in guerrero(x, lower, upper): Guerrero's method for selecting a Box-Cox
## parameter (lambda) is given for strictly positive data.
# Apply Box-Cox transformation to Economy class passengers data
ansett_transformed <- ansett_data %>%
mutate(Transformed_Passengers = BoxCox(Passengers, lambda_ansett))
# Plot original and transformed Economy passengers data
p2_original <- autoplot(ansett_data, Passengers) +
labs(title = "Economy Class Passengers (Original)", x = "Week", y = "Passengers")
p2_transformed <- autoplot(ansett_transformed, Transformed_Passengers) +
labs(title = "Box-Cox Transformed Economy Class Passengers", x = "Week", y = "Transformed Passengers")
# Display the plots
grid.arrange(p2_original, p2_transformed, ncol = 1)
# 3. Pedestrian counts at Southern Cross Station from pedestrian
pedestrian_data <- pedestrian %>%
select(Date, Count)
# Find the Box-Cox transformation lambda for Pedestrian counts
lambda_pedestrian <- BoxCox.lambda(pedestrian_data$Count)
## Warning in guerrero(x, lower, upper): Guerrero's method for selecting a Box-Cox
## parameter (lambda) is given for strictly positive data.
# Apply Box-Cox transformation to Pedestrian counts data
pedestrian_transformed <- pedestrian_data %>%
mutate(Transformed_Count = BoxCox(Count, lambda_pedestrian))
# Plot original and transformed Pedestrian counts data
p3_original <- autoplot(pedestrian_data, Count) +
labs(title = "Pedestrian Counts (Original)", x = "Date", y = "Count")
p3_transformed <- autoplot(pedestrian_transformed, Transformed_Count) +
labs(title = "Box-Cox Transformed Pedestrian Counts", x = "Date", y = "Transformed Count")
# Display the plots
grid.arrange(p3_original, p3_transformed, ncol = 1)
For the Tobacco series from aus_production, the optimal Box-Cox
transformation was found by using the Guerrero method. This
transformation stabilizes the variance and makes the data more linear,
which is important for further analysis and forecasting. The lambda
value for the Tobacco series was calculated and applied to transform the
data. Similarly, for the Economy class passengers between Melbourne and
Sydney from the ansett dataset, the Box-Cox transformation was applied
to stabilize the variance and deal with any skewness in the passenger
counts. The same approach was applied to the Pedestrian counts at
Southern Cross Station from the pedestrian dataset, where the
transformation helped adjust for any irregularities in the data. Each
transformation stabilizes variance and makes the data more suitable for
time series modeling by reducing the impact of non-linear trends and
volatility. After performing these transformations, the data is now
better prepared for modeling, where the trends are clearer and the
variance more consistent.
# Load required libraries
library(tidyverse)
library(tsibble)
library(feasts)
library(fable)
# a. Plot the time series for the last 5 years of the Gas data from aus_production
gas <- tail(aus_production, 5*4) |> select(Gas)
# Plot the time series
p1 <- autoplot(gas, Gas) +
labs(title = "Gas Production (Last 5 Years)", x = "Time", y = "Gas Production")
print(p1)
# b. Classical decomposition using multiplicative model to calculate trend-cycle and seasonal indices
gas_decomp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative"))
# Extract components from the decomposition
gas_decomp_components <- components(gas_decomp)
# Plot the decomposition results (individual components)
p2 <- autoplot(gas_decomp_components) +
labs(title = "Classical Decomposition of Gas Production")
print(p2)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
# c. Does the decomposition support the graphical interpretation from part a?
# We will visually compare the trend-cycle and seasonal components from the decomposition plot to the time series plot.
# The decomposition should highlight if the data shows seasonal fluctuations (seen in the seasonal component)
# and a trend-cycle (seen in the trend component).
# d. Compute and plot the seasonally adjusted data
seasonally_adjusted <- gas_decomp_components %>%
mutate(seasonally_adjusted = Gas / seasonal) # Seasonally adjusted = data / seasonal index
# Plot the seasonally adjusted data
p3 <- autoplot(seasonally_adjusted, seasonally_adjusted) +
labs(title = "Seasonally Adjusted Gas Production", x = "Time", y = "Seasonally Adjusted Gas Production")
print(p3)
# e. Introduce an outlier (add 300 to the first observation) and recompute the seasonally adjusted data
gas_outlier <- gas
gas_outlier$Gas[1] <- gas_outlier$Gas[1] + 300 # Adding outlier
# Re-decompose the series with the outlier
gas_outlier_decomp <- gas_outlier %>%
model(classical_decomposition(Gas, type = "multiplicative"))
# Extract components from the outlier-decomposed series
gas_outlier_decomp_components <- components(gas_outlier_decomp)
# Seasonally adjust the series with the outlier
seasonally_adjusted_outlier <- gas_outlier_decomp_components %>%
mutate(seasonally_adjusted = Gas / seasonal)
# Plot the seasonally adjusted data with the outlier
p4 <- autoplot(seasonally_adjusted_outlier, seasonally_adjusted) +
labs(title = "Seasonally Adjusted Gas Production with Outlier", x = "Time", y = "Seasonally Adjusted Gas Production")
print(p4)
# f. Compare the effect of the outlier if it is near the end of the series
gas_outlier_end <- gas
gas_outlier_end$Gas[20] <- gas_outlier_end$Gas[20] + 300 # Adding outlier at the end
# Re-decompose the series with the outlier at the end
gas_outlier_end_decomp <- gas_outlier_end %>%
model(classical_decomposition(Gas, type = "multiplicative"))
# Extract components from the outlier-at-the-end decomposed series
gas_outlier_end_decomp_components <- components(gas_outlier_end_decomp)
# Seasonally adjust the series with the outlier at the end
seasonally_adjusted_outlier_end <- gas_outlier_end_decomp_components %>%
mutate(seasonally_adjusted = Gas / seasonal)
# Plot the seasonally adjusted data with the outlier at the end
p5 <- autoplot(seasonally_adjusted_outlier_end, seasonally_adjusted) +
labs(title = "Seasonally Adjusted Gas Production with Outlier at End", x = "Time", y = "Seasonally Adjusted Gas Production")
print(p5)
a. Plotting the Time Series: The time series of gas production for the
last 5 years displays fluctuations over time, with certain peaks and
valleys. You can see that there is some level of seasonality, likely
related to natural cycles or production patterns. Additionally, there
might be an underlying upward or downward trend, although it’s not
immediately obvious without further analysis.
Classical Decomposition with Multiplicative Model: Using classical_decomposition() with a multiplicative model, the gas production data is decomposed into three components: trend-cycle, seasonal, and remainder. The multiplicative model is chosen because gas production data tends to have both trend and seasonality that are proportional to the level of the data. The decomposition plots show how the seasonal variations and the trend change over time, providing insights into how production patterns evolve.
Does the Decomposition Support the Graphical Interpretation?: Yes, the decomposition supports the graphical interpretation. The seasonal component reveals periodic fluctuations in production, while the trend-cycle component reflects any long-term growth or decline in gas production. The original plot also shows these fluctuations and trends, confirming the seasonality and potential upward trend over the period.
Seasonally Adjusted Data: By dividing the gas production data by its seasonal indices, we remove the seasonal fluctuations and obtain the seasonally adjusted data. The seasonally adjusted plot helps to clarify the underlying trend in gas production by removing the periodic noise caused by seasonality. This allows for a clearer view of whether there is any long-term growth or other patterns in the data.
Effect of an Outlier (Outlier in the Middle): Adding an outlier to the first observation (by increasing it by 300) disturbs the seasonal adjustment. The outlier causes a distortion in the seasonally adjusted series, as the seasonal index gets skewed by the sudden spike in the data. The seasonal adjustment with the outlier shows a deviation from the expected production levels, highlighting how sensitive seasonal adjustments are to extreme values.
Effect of the Outlier at the End: When the outlier is added near the end of the series, the effect on the seasonally adjusted data is somewhat less pronounced compared to when the outlier is placed in the middle. The impact on the seasonal adjustment is reduced because the model places more emphasis on earlier data points in the series, and the outlier at the end has less influence on the overall trend and seasonal components. This suggests that outliers near the end of the time series may have a smaller impact on the overall decomposition compared to those at the beginning.
# Step 1: Prepare Data and Apply X-1
# Load required packages
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(seasonal) # For X-11 decomposition
## Warning: package 'seasonal' was built under R version 4.4.3
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
# Set seed and select random series (replicating Exercise 7)
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) |>
select(Month, Turnover) # Ensure `Turnover` is the target variable
# Convert to `ts` object for X-11
start_year <- year(myseries$Month[1])
start_month <- month(myseries$Month[1])
ts_data <- ts(myseries$Turnover,
start = c(start_year, start_month),
frequency = 12)
# Apply X-11 decomposition
x11_decomp <- seas(ts_data, x11 = "")
#Step 2: Identify Outliers and Features
# Extract outliers detected by X-11
outliers <- out(x11_decomp)
cat("Detected outliers:\n", outliers)
## Detected outliers:
## C:\Users\taham\AppData\Local\Temp\RtmpQL5hef\x1360ac1c0126e4\iofile.html
# Plot decomposition components
plot(x11_decomp)
1.Outliers: X-11 automatically flags extreme values in the irregular
component (out(x11_decomp)). For example: * A spike in December 2019
(holiday season) may reflect uncharacteristically high/low turnover not
fully explained by seasonal patterns. * A dip in April 2020 could
correlate with COVID-19 disruptions, which earlier plots (e.g.,
autoplot()) might have shown as a trend break but not isolated as an
outlier.
The scale hierarchy (trend ≫ seasonality ≫ remainder) confirms trend growth as the primary driver, while seasonality and noise are secondary.