3 Plotting Our Data
Since this is monthly data, frequency = 12 will be used to define the
time series object.
price.ts <- ts(price_recent[, 2], frequency = 12, start = c(2013, 5))
par(mar = c(2, 4, 2, 1)) # left margin of 4 lines leaves room for the y-axis label
plot(
price.ts,
main = "US Regular Conventional Gas Prices: May 2013 to October 2025",
ylab = "Monthly Price (USD per gallon)",
xlab = "")

The time series plot of U.S. gasoline prices from May 2013 to October
2025 shows several clear shifts. Prices decline from about $3.50 in 2013
to a low near $2 in early 2016, then gradually rise through 2019. A
sharp increase begins in 2021, peaking above $4.50 in 2022. After the
peak, prices decline but remain more volatile, settling around the
mid-$3 range through 2023–2025. Overall, the graph highlights a major
upward surge after 2020 followed by partial stabilization.
4 Forecasting with Decomposition
To analyze the underlying structure of the gasoline price time
series, this project applies both classical decomposition and STL
(Seasonal-Trend Decomposition using LOESS). Using both methods provides
a more complete understanding of how the data behaves over time.
Classical decomposition separates the series into trend, seasonal, and
remainder components using fixed seasonal patterns, which helps identify
broad structural features in a straightforward way. STL decomposition,
on the other hand, offers a more flexible and robust approach that can
capture smoother trends and adapt to changes in seasonality. By
performing both methods, the analysis can compare how each technique
represents the data, highlight differences in trend and seasonal
behavior, and ensure that the decomposition used for forecasting is
based on the method that best reflects the characteristics of the
gasoline price series.
4.1 Classical
Here, I apply the classical decomposition to break the series into
its trend, seasonal, and irregular components, providing a baseline
understanding of how the data behaves over time.
cls.decomp = decompose(price.ts)
par(mar=c(2,2,2,2))
plot(cls.decomp, xlab="")

The classical decomposition separates the gasoline price series into
its observed, trend, seasonal, and remainder components. The trend
component shows a gradual decline from 2013 through 2016, followed by a
steady rise leading into the sharp peak around 2022, before leveling off
toward 2025. The seasonal component repeats exactly the same pattern
every year. This is built into classical decomposition rather than a
finding about gasoline prices, but the small size of the component
indicates modest seasonal fluctuations. The remainder component captures
short-term volatility not explained by trend or seasonality, with larger
irregular movements appearing during periods of rapid price change,
particularly around 2020–2022. Overall, classical decomposition shows a
trend dominated by the 2016 low and the 2022 peak rather than a single
long-run direction, together with a fixed seasonal pattern.
4.2 STL
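Next, I apply STL decomposition with s.window = 12, which allows the
seasonal pattern to change slowly over time instead of repeating
identically every year.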
stl.decomp=stl(price.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(stl.decomp)

The STL decomposition provides a smooth and flexible breakdown of the
gasoline price series into trend, seasonal, and remainder components.
The trend component clearly shows the long downward movement through
2016, followed by a steady rise and a pronounced peak in 2022, before
gradually declining toward 2025. The seasonal component displays a
stable, repeating yearly pattern, indicating that gasoline prices follow
consistent seasonal fluctuations over time. The remainder component
captures short-term variability that is not explained by the trend or
seasonality, with noticeable spikes during periods of rapid price
change, especially around 2020–2022. Compared with classical
decomposition, the STL trend is smoother and better highlights the
underlying long-term movements in the data.
5 Training Data
To evaluate forecasting performance, the final six months of gasoline
price data (May to October 2025) are reserved as a test set. This
six-month horizon provides a practical short-term evaluation window
while limiting the uncertainty associated with longer-range forecasts.
From the remaining observations, four training windows of 144, 120, 96,
and 72 months are constructed, all ending in April 2025, to represent
long, medium, and shorter amounts of history. Each training window is
used to forecast the same six held-out months, so comparing the
resulting forecast errors shows which amount of historical data yields
the most reliable short-term predictions of gasoline prices.
ini.data <- as.numeric(price.ts)
n0 <- length(ini.data) # should be 150
# -------------------------------
# Training data sets
# -------------------------------
train.data01 <- ini.data[1:(n0 - 6)] # 144 months
train.data02 <- ini.data[(n0 - 6 - 119):(n0 - 6)] # 120 months
train.data03 <- ini.data[(n0 - 6 - 95):(n0 - 6)] # 96 months
train.data04 <- ini.data[(n0 - 6 - 71):(n0 - 6)] # 72 months
# -------------------------------
# Test data (last 6 observations)
# -------------------------------
test.data <- ini.data[(n0 - 5):n0]
# -------------------------------
# Convert each training set to ts()
# Start = May 2013 (2013,5)
# -------------------------------
# Training 1: full 144 months starting at May 2013
train01.ts <- ts(train.data01, frequency = 12, start = c(2013, 5))
# Training 2: 120 months starting 24 months later (2015-05)
train02.ts <- ts(train.data02, frequency = 12, start = c(2015, 5))
# Training 3: 96 months starting 48 months after start (2017-05)
train03.ts <- ts(train.data03, frequency = 12, start = c(2017, 5))
# Training 4: 72 months starting 72 months after start (2019-05)
train04.ts <- ts(train.data04, frequency = 12, start = c(2019, 5))
# -------------------------------
# STL decomposition
# -------------------------------
stl01 <- stl(train01.ts, s.window = 12)
stl02 <- stl(train02.ts, s.window = 12)
stl03 <- stl(train03.ts, s.window = 12)
stl04 <- stl(train04.ts, s.window = 12)
# -------------------------------
# Forecast 6 months ahead
# -------------------------------
fcst01 <- forecast(stl01, h = 6, method = "naive")
fcst02 <- forecast(stl02, h = 6, method = "naive")
fcst03 <- forecast(stl03, h = 6, method = "naive")
fcst04 <- forecast(stl04, h = 6, method = "naive")
Next, the forecast errors are computed for each training window.
# -------------------------------
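# Relative prediction errors for MAPE (a proportion, not %)
# -------------------------------
# Note: these errors divide by the forecast mean; the conventional
# MAPE divides by the observed values (test.data) instead.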
PE01 <- (test.data - fcst01$mean) / fcst01$mean
PE02 <- (test.data - fcst02$mean) / fcst02$mean
PE03 <- (test.data - fcst03$mean) / fcst03$mean
PE04 <- (test.data - fcst04$mean) / fcst04$mean
MAPE1 <- mean(abs(PE01))
MAPE2 <- mean(abs(PE02))
MAPE3 <- mean(abs(PE03))
MAPE4 <- mean(abs(PE04))
# -------------------------------
# Squared errors (MSE)
# -------------------------------
E1 <- test.data - fcst01$mean
E2 <- test.data - fcst02$mean
E3 <- test.data - fcst03$mean
E4 <- test.data - fcst04$mean
MSE1 <- mean(E1^2)
MSE2 <- mean(E2^2)
MSE3 <- mean(E3^2)
MSE4 <- mean(E4^2)
# -------------------------------
# Accuracy table
# -------------------------------
MSE <- c(MSE1, MSE2, MSE3, MSE4)
MAPE <- c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy <- cbind(MSE = MSE, MAPE = MAPE)
row.names(accuracy) <- c("n=144", "n=120", "n=96", "n=72")
accuracy
MSE MAPE
n=144 0.009722946 0.02726487
n=120 0.009754861 0.02722784
n=96 0.010678115 0.02817212
n=72 0.010577313 0.02718487
# Format the accuracy table for the report
knitr::kable(
accuracy,
caption = "Error comparison between forecast results with different training sample sizes"
)
Table: Error comparison between forecast results with different training sample sizes

|      |       MSE|      MAPE|
|:-----|---------:|---------:|
|n=144 | 0.0097229| 0.0272649|
|n=120 | 0.0097549| 0.0272278|
|n=96  | 0.0106781| 0.0281721|
|n=72  | 0.0105773| 0.0271849|
The error comparison table shows how forecast accuracy changes across
training window sizes. The models trained on 144 and 120 months of
history produce the lowest MSE values (0.00972 and 0.00975), while the
96- and 72-month windows have higher MSE (0.01068 and 0.01058). MAPE,
by contrast, is nearly identical across all four windows, ranging only
from 0.0272 to 0.0282: the 72-month window has the smallest MAPE
(0.02718) and the 96-month window the largest (0.02817). Overall, the
longer windows (144 or 120 months) give a slight advantage in squared
error, but the differences are small and rest on only six test months,
so the results suggest at most a modest benefit from additional history
rather than a clear ranking of training sizes.
After computing MSE and MAPE for each training window, I visualize
the errors to compare forecasting performance across sample sizes. The
plots below show how prediction accuracy changes as the training period
becomes shorter.
par(mfrow = c(1, 2))
labs <- c("n=144", "n=120", "n=96", "n=72")
# -----------------------
# MSE Plot
# -----------------------
# Pad the y-axis range so the value labels stay inside the plot region
plot(1:4, MSE, type = "b", col = "darkred",
ylab = "Error", xlab = "",
main = "MSE", axes = FALSE,
ylim = range(MSE) + c(-0.25, 0.25) * diff(range(MSE)))
axis(1, at = 1:4, labels = labs)
axis(2)
# Label each point with its value, placed just above the marker
text(1:4, MSE, as.character(round(MSE, 4)), pos = 3,
col = "darkred", cex = 0.7)
# -----------------------
# MAPE Plot
# -----------------------
plot(1:4, MAPE, type = "b", col = "blue",
ylab = "Error", xlab = "",
main = "MAPE", axes = FALSE,
ylim = range(MAPE) + c(-0.25, 0.25) * diff(range(MAPE)))
axis(1, at = 1:4, labels = labs)
axis(2)
# Label each point with its value, placed just above the marker
text(1:4, MAPE, as.character(round(MAPE, 4)), pos = 3,
col = "blue", cex = 0.7)
par(mfrow = c(1, 1))

The error plots show how forecast accuracy changes with training window
size. MSE is lowest for the two largest windows (n = 144 and n = 120)
and rises for the shorter windows, although not monotonically: the
72-month window has a slightly lower MSE than the 96-month window. The
MAPE panel shows no consistent trend: all four values lie between about
0.0272 and 0.0282, the 72-month window is marginally the lowest, and the
96-month window is the highest. Overall, the plots support a small
advantage for longer training periods in terms of squared error, while
relative error is essentially unaffected by the amount of history over
this six-month test period.
6 Conclusion
This analysis examined monthly U.S. gasoline prices from May 2013 to
October 2025 to evaluate how recent price movements compare with
long-term historical trends. Both classical and STL decomposition
revealed clear structural patterns, including a gradual decline through
2016, a steady rise heading into 2020, and a sharp peak in 2022 followed
by partial stabilization. STL produced a smoother trend estimate and,
unlike classical decomposition, lets the seasonal pattern evolve, which
made it the basis for forecasting. Using the last six months (May to
October 2025) as a test set and training windows of 144, 120, 96, and
72 months, the two longest windows produced the lowest MSE values, while
MAPE was nearly identical across all four windows, with the 72-month
window marginally lowest. These results suggest that additional history
offers at most a modest gain in short-term accuracy. While these
findings provide useful insight into how gasoline prices in early 2025
align with long-term trends, the analysis is subject to several
limitations. Only the most recent 150 monthly observations were used,
which restricts how many meaningful training windows can be compared,
and the accuracy comparison rests on just six test months. Gasoline
prices are also influenced by external economic and geopolitical
factors that a univariate time series does not capture. Potential user
errors, such as indexing mistakes or incorrect start dates, could also
affect the results, and the additive structure shared by both
decompositions may oversimplify a series with periods of heightened
volatility. Future analyses could be strengthened by using a longer
history, experimenting with additional forecasting models such as ARIMA
or exponential smoothing, incorporating external predictors like crude
oil prices, or applying rolling-origin validation to better assess
forecasting performance. Despite these limitations, the analysis
provides a structured, data-driven assessment of short-term gasoline
price behavior relative to long-term historical patterns.
---
title: "Forecasting U.S. Gasoline Prices Using Time Series Decomposition"
author: "Luke Volm"
date: "2025-11-20"
output:
  html_document:           # output document format
    toc: yes               # add table contents
    toc_float: yes         # toc_property: floating
    toc_depth: 4           # depth of TOC headings
    fig_width: 6           # global figure width
    fig_height: 4          # global figure height
    fig_caption: yes       # add figure caption
    number_sections: no   # numbering section headings
    toc_collapsed: yes     # TOC subheading collapsing
    code_folding: hide     # folding/showing code 
    code_download: yes     # allow to download complete RMarkdown source code
    smooth_scroll: yes     # scrolling text of the document
    theme: lumen           # visual theme for HTML document only
    highlight: tango       # code syntax highlighting styles
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: no    # headings already include manual section numbers
  word_document:
    toc: yes
    toc_depth: '4'
---

```{css, echo = FALSE}
div#TOC li {     /* table of contents list items */
    list-style:upper-roman;
    background-image:none;
    background-repeat:no-repeat;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* author line below the title */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* date line below the title */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* level 1 section headers */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* level 2 section headers */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* level 3 section headers */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* level 4 section headers */
    font-size: 14px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
library(readxl)
library(boot)
library(dplyr)
library(knitr)
library(psych)
library(MASS)
library(tidyr)
library(ggplot2)
library(car)
library(pander)
library(forecast)

# Set seed for reproducibility
set.seed(123)

# Read in the FRED data (column 1 = observation date, column 2 = GASREGCOVM price)
# Machine-specific path: point this at the folder containing GASREGCOVM.csv
setwd("C:/Users/volm1/OneDrive/Desktop/STA321 new")
price <- read.csv("GASREGCOVM.csv")
# Global chunk options
knitr::opts_chunk$set(
  echo = TRUE,      # show code
  warning = FALSE,  # suppress warnings
  message = FALSE,  # suppress messages
  results = TRUE,   # show results
  comment = NA      # cleaner output (no "##" prefix)
)
```

# 1 Introduction

This project analyzes monthly U.S. gasoline prices using publicly available data from the Federal Reserve Bank of St. Louis (FRED). The goal of the analysis is to evaluate how recent price levels compare with long-term patterns by addressing the research question: How do U.S. gasoline prices in early 2025 compare to their long-term historical trend? To investigate this, the most recent 150 months of data are extracted and converted into a time series object for structured analysis. Classical and STL decomposition methods are applied to separate the series into trend, seasonal, and remainder components, allowing for an examination of underlying movements over time. Several training windows are then constructed, with the final six months held out as a test set, and STL-based naive forecasts are generated. Forecast accuracy is assessed using MSE and MAPE across the different training sizes, providing insight into how well recent gasoline prices align with expectations based on long-term historical behavior.

# 2 Description of Data Set

The time series data used in this study come from the Federal Reserve Bank of St. Louis (FRED). The series contains one observation per month, dated to the first of the month, beginning September 1, 1990. We will observe only the most recent 150 observations, which range from May 2013 to October 2025. The observation date serves as the time index, and the response variable is the price in dollars per gallon (GASREGCOVM).

```{r}

price_recent <- tail(price, 150)

```

In the above code chunk, we take the 150 most recent dates, or observations, in the data set.

# 3 Plotting Our Data

Since this is monthly data, frequency = 12 will be used to define the time series object. 
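
As a quick check on the `start` and `frequency` arguments, the sketch below builds a placeholder monthly series of the same length and asks R where it begins and ends. The values `seq_len(150)` are hypothetical stand-ins for the prices; only the time index matters here.

```r
# Placeholder values; only the monthly time index is being checked
x <- ts(seq_len(150), frequency = 12, start = c(2013, 5))

start(x)  # c(2013, 5): May 2013
end(x)    # c(2025, 10): October 2025
```

If `end()` did not return October 2025, either the `start` argument or the number of rows kept by `tail()` would need revisiting.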

```{r}

price.ts <- ts(price_recent[, 2], frequency = 12, start = c(2013, 5))

par(mar = c(2, 4, 2, 1))  # left margin of 4 lines leaves room for the y-axis label
plot(
  price.ts,
  main = "US Regular Conventional Gas Prices: May 2013 to October 2025",
  ylab = "Monthly Price (USD per gallon)",
  xlab = "")
```

The time series plot of U.S. gasoline prices from May 2013 to October 2025 shows several clear shifts. Prices decline from about $3.50 in 2013 to a low near $2 in early 2016, then gradually rise through 2019. A sharp increase begins in 2021, peaking above $4.50 in 2022. After the peak, prices decline but remain more volatile, settling around the mid-$3 range through 2023–2025. Overall, the graph highlights a major upward surge after 2020 followed by partial stabilization.

# 4 Forecasting with Decomposition

To analyze the underlying structure of the gasoline price time series, this project applies both classical decomposition and STL (Seasonal-Trend Decomposition using LOESS). Using both methods provides a more complete understanding of how the data behaves over time. Classical decomposition separates the series into trend, seasonal, and remainder components using fixed seasonal patterns, which helps identify broad structural features in a straightforward way. STL decomposition, on the other hand, offers a more flexible and robust approach that can capture smoother trends and adapt to changes in seasonality. By performing both methods, the analysis can compare how each technique represents the data, highlight differences in trend and seasonal behavior, and ensure that the decomposition used for forecasting is based on the method that best reflects the characteristics of the gasoline price series.
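
Both methods, as applied here, assume an additive structure: `decompose()` defaults to `type = "additive"`, and STL is additive by construction. The display below sketches that model together with the centered 2×12 moving average that classical decomposition uses as its trend estimate for monthly data. The symbols $y_t$ (observed price), $T_t$ (trend), $S_t$ (seasonal), and $R_t$ (remainder) are notation introduced here for illustration.

$$
y_t = T_t + S_t + R_t,
\qquad
\hat{T}_t = \frac{1}{12}\left(\tfrac{1}{2}\,y_{t-6} + \sum_{j=-5}^{5} y_{t+j} + \tfrac{1}{2}\,y_{t+6}\right)
$$

Classical decomposition then averages $y_t - \hat{T}_t$ over each calendar month, centers those twelve values so they sum to zero to obtain $\hat{S}_t$, and sets $\hat{R}_t = y_t - \hat{T}_t - \hat{S}_t$. STL replaces both the moving average and the monthly averaging with iterated loess smoothers.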

## 4.1 Classical

Here, I apply the classical decomposition to break the series into its trend, seasonal, and irregular components, providing a baseline understanding of how the data behaves over time.

```{r}

cls.decomp = decompose(price.ts)
par(mar=c(2,2,2,2))
plot(cls.decomp, xlab="")

```

The classical decomposition separates the gasoline price series into its observed, trend, seasonal, and remainder components. The trend component shows a gradual decline from 2013 through 2016, followed by a steady rise leading into the sharp peak around 2022, before leveling off toward 2025. The seasonal component repeats exactly the same pattern every year. This is built into classical decomposition rather than a finding about gasoline prices, but the small size of the component indicates modest seasonal fluctuations. The remainder component captures short-term volatility not explained by trend or seasonality, with larger irregular movements appearing during periods of rapid price change, particularly around 2020–2022. Overall, classical decomposition shows a trend dominated by the 2016 low and the 2022 peak rather than a single long-run direction, together with a fixed seasonal pattern.
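
Two properties of `decompose()` are easy to verify on a synthetic monthly series. In the sketch below, the trend slope, seasonal amplitude, noise level, and seed are arbitrary assumptions, not estimates from the gasoline data. Because the centered 2×12 moving average needs six months on each side, the trend is missing for the first and last six months, so for the 150-month price series the trend line runs only from November 2013 to April 2025. The seasonal component, meanwhile, repeats the same twelve values every year.

```r
# Synthetic monthly series (hypothetical): linear trend + fixed seasonality + noise
set.seed(321)
tt <- 1:150
y  <- ts(3 + 0.002 * tt + 0.1 * sin(2 * pi * tt / 12) + rnorm(150, sd = 0.05),
         frequency = 12, start = c(2013, 5))

d <- decompose(y)  # additive by default

# The centered 2x12 moving average is undefined for the first and last six months
sum(is.na(d$trend))  # 12

# The seasonal component repeats the same 12 values every year
all.equal(as.numeric(d$seasonal[1:12]), as.numeric(d$seasonal[13:24]))

# The 12 seasonal effects are centered so that they sum to (numerically) zero
sum(d$figure)
```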

## 4.2 STL

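Next, I apply STL decomposition with `s.window = 12`, which allows the seasonal pattern to change slowly over time instead of repeating identically every year.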

```{r}

stl.decomp=stl(price.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(stl.decomp)

```

The STL decomposition provides a smooth and flexible breakdown of the gasoline price series into trend, seasonal, and remainder components. The trend component clearly shows the long downward movement through 2016, followed by a steady rise and a pronounced peak in 2022, before gradually declining toward 2025. The seasonal component displays a stable, repeating yearly pattern, indicating that gasoline prices follow consistent seasonal fluctuations over time. The remainder component captures short-term variability that is not explained by the trend or seasonality, with noticeable spikes during periods of rapid price change, especially around 2020–2022. Compared with classical decomposition, the STL trend is smoother and better highlights the underlying long-term movements in the data.
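
Whether the seasonal pattern is fixed or allowed to evolve is controlled by `s.window`. The sketch below uses a synthetic series whose seasonal swing is assumed to grow over time (all parameters are illustrative assumptions). It contrasts `s.window = "periodic"`, which forces every year to share one seasonal pattern, with the `s.window = 12` used in this report. Note that `s.window` is counted in consecutive values of each calendar month's subseries, that is, in years. Because this sample has only about 12.5 values per month, a window of 12 lets the seasonal pattern change only slowly.

```r
# Synthetic monthly series (hypothetical) whose seasonal amplitude grows over time
set.seed(321)
tt <- 1:150
y  <- ts(3 + 0.002 * tt + (0.05 + 0.001 * tt) * sin(2 * pi * tt / 12) +
           rnorm(150, sd = 0.02),
         frequency = 12, start = c(2013, 5))

fit_periodic <- stl(y, s.window = "periodic")  # seasonal pattern forced to repeat
fit_window   <- stl(y, s.window = 12)          # seasonal pattern may evolve

seas_p <- fit_periodic$time.series[, "seasonal"]
seas_w <- fit_window$time.series[, "seasonal"]

# Periodic: the first and second years have identical seasonal values
all.equal(as.numeric(seas_p[1:12]), as.numeric(seas_p[13:24]))

# Windowed: ratio of the seasonal range in the last year to that in the first year
diff(range(seas_w[139:150])) / diff(range(seas_w[1:12]))

# The seasonal, trend, and remainder columns add back to the data
all.equal(as.numeric(rowSums(fit_window$time.series)), as.numeric(y))
```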

# 5 Training Data

To evaluate forecasting performance, the final six months of gasoline price data (May to October 2025) are reserved as a test set. This six-month horizon provides a practical short-term evaluation window while limiting the uncertainty associated with longer-range forecasts. From the remaining observations, four training windows of 144, 120, 96, and 72 months are constructed, all ending in April 2025, to represent long, medium, and shorter amounts of history. Each training window is used to forecast the same six held-out months, so comparing the resulting forecast errors shows which amount of historical data yields the most reliable short-term predictions of gasoline prices.
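
The index arithmetic in the chunk that follows is correct, but each start date has to be retyped when the vectors are turned back into `ts` objects. One alternative is base R's `window()`, which slices a series by date and keeps its time attributes. The sketch below assumes the same 150-month layout, and the placeholder values `seq_len(150)` stand in for the prices.

```r
# Placeholder series with the same length and start as price.ts (values are hypothetical)
x  <- ts(seq_len(150), frequency = 12, start = c(2013, 5))
n0 <- length(x)

# Every training window ends in April 2025; the test set is May-October 2025
train01 <- window(x, start = c(2013, 5), end = c(2025, 4))  # 144 months
train02 <- window(x, start = c(2015, 5), end = c(2025, 4))  # 120 months
train03 <- window(x, start = c(2017, 5), end = c(2025, 4))  #  96 months
train04 <- window(x, start = c(2019, 5), end = c(2025, 4))  #  72 months
test    <- window(x, start = c(2025, 5))                    #   6 months

sapply(list(train01, train02, train03, train04, test), length)  # 144 120 96 72 6

# Same observations as the index-based split used in the report
identical(as.numeric(train02), as.numeric(x)[(n0 - 6 - 119):(n0 - 6)])
```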

```{r}

ini.data <- as.numeric(price.ts)
n0 <- length(ini.data)   # should be 150

# -------------------------------
# Training data sets
# -------------------------------
train.data01 <- ini.data[1:(n0 - 6)]                 # 144 months
train.data02 <- ini.data[(n0 - 6 - 119):(n0 - 6)]    # 120 months
train.data03 <- ini.data[(n0 - 6 - 95):(n0 - 6)]     # 96 months
train.data04 <- ini.data[(n0 - 6 - 71):(n0 - 6)]     # 72 months

# -------------------------------
# Test data (last 6 observations)
# -------------------------------
test.data <- ini.data[(n0 - 5):n0]

# -------------------------------
# Convert each training set to ts()
# Start = May 2013 (2013,5)
# -------------------------------

# Training 1: full 144 months starting at May 2013
train01.ts <- ts(train.data01, frequency = 12, start = c(2013, 5))

# Training 2: 120 months starting 24 months later (2015-05)
train02.ts <- ts(train.data02, frequency = 12, start = c(2015, 5))

# Training 3: 96 months starting 48 months after start (2017-05)
train03.ts <- ts(train.data03, frequency = 12, start = c(2017, 5))

# Training 4: 72 months starting 72 months after start (2019-05)
train04.ts <- ts(train.data04, frequency = 12, start = c(2019, 5))

# -------------------------------
# STL decomposition
# -------------------------------
stl01 <- stl(train01.ts, s.window = 12)
stl02 <- stl(train02.ts, s.window = 12)
stl03 <- stl(train03.ts, s.window = 12)
stl04 <- stl(train04.ts, s.window = 12)

# -------------------------------
# Forecast 6 months ahead
# -------------------------------

fcst01 <- forecast(stl01, h = 6, method = "naive")
fcst02 <- forecast(stl02, h = 6, method = "naive")
fcst03 <- forecast(stl03, h = 6, method = "naive")
fcst04 <- forecast(stl04, h = 6, method = "naive")

```
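
For readers who want to see what `forecast(stl_fit, method = "naive")` does with the point forecasts, the sketch below reproduces the idea in base R. It follows my reading of the forecast package documentation: the seasonally adjusted series (trend plus remainder) is forecast naively, so its last value is carried forward, and the seasonal component from the last year of data is added back. The function name `stl_naive_forecast` and the synthetic training series are hypothetical, and prediction intervals are not reproduced.

```r
# Hypothetical helper: STL + naive point forecasts, written in base R
stl_naive_forecast <- function(y, h = 6, s.window = 12) {
  fit      <- stl(y, s.window = s.window)
  seasonal <- as.numeric(fit$time.series[, "seasonal"])
  seasadj  <- as.numeric(y) - seasonal   # trend + remainder
  n <- length(y)
  f <- frequency(y)
  # Naive forecast of the seasonally adjusted series: repeat its last value
  level <- seasadj[n]
  # Seasonal naive: reuse the seasonal value from the same month one year earlier
  idx <- n - f + ((seq_len(h) - 1) %% f) + 1
  level + seasonal[idx]
}

# Example on a synthetic 144-month training series (parameters are assumptions)
set.seed(321)
tt    <- 1:144
train <- ts(3 + 0.002 * tt + 0.1 * sin(2 * pi * tt / 12) + rnorm(144, sd = 0.05),
            frequency = 12, start = c(2013, 5))
fc <- stl_naive_forecast(train, h = 6)
fc
```

Because the seasonally adjusted part is flat, any month-to-month movement in these six forecasts comes entirely from last year's seasonal pattern.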

Next, the forecast errors are computed for each training window.
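
In formula form, let $y_t$ be the observed price and $\hat{y}_t$ the point forecast for the six test months $t = 145, \dots, 150$ (May to October 2025). The chunk below then computes the MSE and the relative error shown in the middle. That relative error divides by the forecast, whereas the conventional MAPE, shown on the right for comparison, divides by the observed value.

$$
\text{MSE} = \frac{1}{6}\sum_{t=145}^{150}\left(y_t - \hat{y}_t\right)^2,
\qquad
\text{MAPE}_{\text{code}} = \frac{1}{6}\sum_{t=145}^{150}\left|\frac{y_t - \hat{y}_t}{\hat{y}_t}\right|,
\qquad
\text{MAPE}_{\text{conventional}} = \frac{1}{6}\sum_{t=145}^{150}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$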

```{r}

# -------------------------------
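# Relative prediction errors for MAPE (a proportion, not %)
# -------------------------------
# Note: these errors divide by the forecast mean; the conventional
# MAPE divides by the observed values (test.data) instead.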
PE01 <- (test.data - fcst01$mean) / fcst01$mean
PE02 <- (test.data - fcst02$mean) / fcst02$mean
PE03 <- (test.data - fcst03$mean) / fcst03$mean
PE04 <- (test.data - fcst04$mean) / fcst04$mean

MAPE1 <- mean(abs(PE01))
MAPE2 <- mean(abs(PE02))
MAPE3 <- mean(abs(PE03))
MAPE4 <- mean(abs(PE04))

# -------------------------------
# Squared errors (MSE)
# -------------------------------
E1 <- test.data - fcst01$mean
E2 <- test.data - fcst02$mean
E3 <- test.data - fcst03$mean
E4 <- test.data - fcst04$mean

MSE1 <- mean(E1^2)
MSE2 <- mean(E2^2)
MSE3 <- mean(E3^2)
MSE4 <- mean(E4^2)

# -------------------------------
# Accuracy table
# -------------------------------
MSE  <- c(MSE1, MSE2, MSE3, MSE4)
MAPE <- c(MAPE1, MAPE2, MAPE3, MAPE4)

accuracy <- cbind(MSE = MSE, MAPE = MAPE)
row.names(accuracy) <- c("n=144", "n=120", "n=96", "n=72")

accuracy

# Format the accuracy table for the report
knitr::kable(
  accuracy, 
  caption = "Error comparison between forecast results with different training sample sizes"
)
```

The error comparison table shows how forecast accuracy changes across training window sizes. The models trained on 144 and 120 months of history produce the lowest MSE values (0.00972 and 0.00975), while the 96- and 72-month windows have higher MSE (0.01068 and 0.01058). MAPE, by contrast, is nearly identical across all four windows, ranging only from 0.0272 to 0.0282: the 72-month window has the smallest MAPE (0.02718) and the 96-month window the largest (0.02817). Overall, the longer windows (144 or 120 months) give a slight advantage in squared error, but the differences are small and rest on only six test months, so the results suggest at most a modest benefit from additional history rather than a clear ranking of training sizes.
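
The four near-identical error blocks above could be collapsed into a single helper applied to each forecast. In the sketch below, the function name `forecast_errors` and the numbers are hypothetical. The helper returns the MSE, the forecast-scaled MAPE used in this report, and the conventional observation-scaled MAPE, so the two MAPE definitions can be compared side by side.

```r
# Hypothetical helper: error measures for one set of point forecasts
forecast_errors <- function(actual, predicted) {
  e <- actual - predicted
  c(MSE           = mean(e^2),
    MAPE_forecast = mean(abs(e / predicted)),  # definition used in this report
    MAPE_observed = mean(abs(e / actual)))     # conventional definition
}

# Illustrative numbers only (not the gasoline results)
actual  <- c(3.00, 3.10)
fc_list <- list(long = c(2.90, 3.00), short = c(3.05, 3.20))

# One row per forecast, one column per error measure
errors <- t(sapply(fc_list, forecast_errors, actual = actual))
round(errors, 5)
```

With the objects from this report, a call along the lines of `t(sapply(list(fcst01, fcst02, fcst03, fcst04), function(f) forecast_errors(test.data, as.numeric(f$mean))))` would produce one row per training window. The forecast-scaled version is slightly larger when forecasts sit below the observed values and slightly smaller when they sit above.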

After computing MSE and MAPE for each training window, I visualize the errors to compare forecasting performance across sample sizes. The plots below show how prediction accuracy changes as the training period becomes shorter.

```{r}

par(mfrow = c(1, 2))

# -----------------------
# MSE Plot
# -----------------------
# Pad the y-axis range so the value labels stay inside the plot region
plot(1:4, MSE, type = "b", col = "darkred",
     ylab = "Error", xlab = "",
     main = "MSE", axes = FALSE,
     ylim = range(MSE) + c(-0.25, 0.25) * diff(range(MSE)))

labs <- c("n=144", "n=120", "n=96", "n=72")
axis(1, at = 1:4, labels = labs)
axis(2)

# Label each point with its value, placed just above the marker
text(1:4, MSE, as.character(round(MSE, 4)), pos = 3,
     col = "darkred", cex = 0.7)


# -----------------------
# MAPE Plot
# -----------------------
plot(1:4, MAPE, type = "b", col = "blue",
     ylab = "Error", xlab = "",
     main = "MAPE", axes = FALSE,
     ylim = range(MAPE) + c(-0.25, 0.25) * diff(range(MAPE)))

axis(1, at = 1:4, labels = labs)
axis(2)

# Label each point with its value, placed just above the marker
text(1:4, MAPE, as.character(round(MAPE, 4)), pos = 3,
     col = "blue", cex = 0.7)

par(mfrow = c(1,1))
```

The error plots show how forecast accuracy changes with training window size. MSE is lowest for the two largest windows (n = 144 and n = 120) and rises for the shorter windows, although not monotonically: the 72-month window has a slightly lower MSE than the 96-month window. The MAPE panel shows no consistent trend: all four values lie between about 0.0272 and 0.0282, the 72-month window is marginally the lowest, and the 96-month window is the highest. Overall, the plots support a small advantage for longer training periods in terms of squared error, while relative error is essentially unaffected by the amount of history over this six-month test period.

# 6 Conclusion

This analysis examined monthly U.S. gasoline prices from May 2013 to October 2025 to evaluate how recent price movements compare with long-term historical trends. Both classical and STL decomposition revealed clear structural patterns, including a gradual decline through 2016, a steady rise heading into 2020, and a sharp peak in 2022 followed by partial stabilization. STL produced a smoother trend estimate and, unlike classical decomposition, lets the seasonal pattern evolve, which made it the basis for forecasting. Using the last six months (May to October 2025) as a test set and training windows of 144, 120, 96, and 72 months, the two longest windows produced the lowest MSE values, while MAPE was nearly identical across all four windows, with the 72-month window marginally lowest. These results suggest that additional history offers at most a modest gain in short-term accuracy. While these findings provide useful insight into how gasoline prices in early 2025 align with long-term trends, the analysis is subject to several limitations. Only the most recent 150 monthly observations were used, which restricts how many meaningful training windows can be compared, and the accuracy comparison rests on just six test months. Gasoline prices are also influenced by external economic and geopolitical factors that a univariate time series does not capture. Potential user errors, such as indexing mistakes or incorrect start dates, could also affect the results, and the additive structure shared by both decompositions may oversimplify a series with periods of heightened volatility. Future analyses could be strengthened by using a longer history, experimenting with additional forecasting models such as ARIMA or exponential smoothing, incorporating external predictors like crude oil prices, or applying rolling-origin validation to better assess forecasting performance. Despite these limitations, the analysis provides a structured, data-driven assessment of short-term gasoline price behavior relative to long-term historical patterns.
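
As a possible starting point for the rolling-origin validation suggested above, the sketch below moves the forecast origin forward one month at a time. At each origin it refits on all data up to that point and records the six-month MSE, then averages these values across origins. This is a minimal base-R sketch under stated assumptions: `rolling_origin_mse` and `naive_fc` are hypothetical names, the naive forecaster is only a placeholder for the STL-based forecast used in this report, and `min_train = 72` mirrors the shortest training window above.

```r
# Hypothetical helper: average h-step MSE over successive forecast origins
rolling_origin_mse <- function(y, h = 6, min_train = 72, forecaster) {
  n <- length(y)
  origins <- seq(min_train, n - h)  # last training index for each fold
  fold_mse <- sapply(origins, function(T0) {
    train  <- ts(as.numeric(y)[1:T0], start = start(y), frequency = frequency(y))
    actual <- as.numeric(y)[(T0 + 1):(T0 + h)]
    mean((actual - forecaster(train, h))^2)
  })
  mean(fold_mse)
}

# Placeholder forecaster: repeat the last observed value (naive)
naive_fc <- function(train, h) rep(as.numeric(train)[length(train)], h)

# Check on a series that rises by exactly 1 each month:
# naive errors at horizons 1..6 are 1..6, so every fold's MSE is 91/6
y <- ts(as.numeric(1:150), frequency = 12, start = c(2013, 5))
rolling_origin_mse(y, h = 6, min_train = 72, forecaster = naive_fc)  # 91/6, about 15.17
```

To evaluate the method used in this report, `forecaster` could instead wrap the STL + naive forecast. One example is `function(train, h) as.numeric(forecast(stl(train, s.window = 12), h = h, method = "naive")$mean)`, which relies on the forecast package loaded in the setup chunk.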