Introduction
According to the US Energy Information Administration, regular gas costs 58.7% more this month than it did a year ago - $3.491 a gallon last month vs $2.20 in November 2020. (EIA). I was very much interested to know the facts behind the rise in the fuel prices. The data was pulled from the EIA website.
Despite the fact that customers buy gasoline locally, the price of the fuel is mostly decided by the worldwide market for crude oil. When the price of crude oil fluctuates on the global market, whether due to geopolitical events, crude supply variations, or gasoline demand fluctuations, the prices that consumers pay at the pump fluctuate as well.
It is always difficult to forecast fuel prices as there are factors such as Covid 19 in the past few years where the prices have gone up over the time period.
Importing the data and analyzing various columns in it.
library(readxl)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(ggplot2)
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
## Loading required package: TTR
library(mltools)
##
## Attaching package: 'mltools'
## The following object is masked from 'package:tidyr':
##
## replace_na
library(tidyr)
library(forecast)
## Warning: package 'forecast' was built under R version 4.1.2
library(tseries)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(zoo)
library(sjPlot)
## #refugeeswelcome
library(sjmisc)
##
## Attaching package: 'sjmisc'
## The following object is masked from 'package:mltools':
##
## replace_na
## The following object is masked from 'package:purrr':
##
## is_empty
## The following object is masked from 'package:tidyr':
##
## replace_na
## The following object is masked from 'package:tibble':
##
## add_case
library(sjlabelled)
##
## Attaching package: 'sjlabelled'
## The following object is masked from 'package:forcats':
##
## as_factor
## The following object is masked from 'package:dplyr':
##
## as_label
## The following object is masked from 'package:ggplot2':
##
## as_label
library(prophet)
## Loading required package: Rcpp
## Loading required package: rlang
##
## Attaching package: 'rlang'
## The following objects are masked from 'package:sjlabelled':
##
## as_character, as_label
## The following object is masked from 'package:sjmisc':
##
## is_empty
## The following objects are masked from 'package:purrr':
##
## %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
## flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
## splice
Fuel_prices_data <- read_excel("/Users/nishanthreddy/Downloads/Fuel_prices_data.xlsx")
head(Fuel_prices_data)
## # A tibble: 6 × 2
## Week Price
## <dttm> <dbl>
## 1 2022-01-17 00:00:00 3.40
## 2 2022-01-10 00:00:00 3.39
## 3 2022-01-03 00:00:00 3.38
## 4 2021-12-27 00:00:00 3.38
## 5 2021-12-20 00:00:00 3.40
## 6 2021-12-13 00:00:00 3.41
summary(Fuel_prices_data)
## Week Price
## Min. :1993-04-05 00:00:00 Min. :0.949
## 1st Qu.:2000-06-15 12:00:00 1st Qu.:1.329
## Median :2007-08-27 00:00:00 Median :2.284
## Mean :2007-08-27 00:00:00 Mean :2.242
## 3rd Qu.:2014-11-06 12:00:00 3rd Qu.:2.913
## Max. :2022-01-17 00:00:00 Max. :4.165
str(Fuel_prices_data)
## tibble [1,503 × 2] (S3: tbl_df/tbl/data.frame)
## $ Week : POSIXct[1:1503], format: "2022-01-17" "2022-01-10" ...
## $ Price: num [1:1503] 3.4 3.39 3.38 3.38 3.4 ...
We can see here the summary of the data which is pulled from past years. Mean, Median values are 2.42 and 2.28 respectively. The data has 1,503 rows and 2 columns.
Finding missing values if any…
colSums(is.na(Fuel_prices_data))
## Week Price
## 0 0
Fuel_data <- Fuel_prices_data[which(Fuel_prices_data$Week >= "2011-01-01" & Fuel_prices_data$Week <= "2021-12-31" ),]
Fuel_data
## # A tibble: 574 × 2
## Week Price
## <dttm> <dbl>
## 1 2021-12-27 00:00:00 3.38
## 2 2021-12-20 00:00:00 3.40
## 3 2021-12-13 00:00:00 3.41
## 4 2021-12-06 00:00:00 3.44
## 5 2021-11-29 00:00:00 3.48
## 6 2021-11-22 00:00:00 3.49
## 7 2021-11-15 00:00:00 3.50
## 8 2021-11-08 00:00:00 3.50
## 9 2021-11-01 00:00:00 3.48
## 10 2021-10-25 00:00:00 3.48
## # … with 564 more rows
head(Fuel_data)
## # A tibble: 6 × 2
## Week Price
## <dttm> <dbl>
## 1 2021-12-27 00:00:00 3.38
## 2 2021-12-20 00:00:00 3.40
## 3 2021-12-13 00:00:00 3.41
## 4 2021-12-06 00:00:00 3.44
## 5 2021-11-29 00:00:00 3.48
## 6 2021-11-22 00:00:00 3.49
We took the data from past 10 years as its more significant and also found that there are no missing values in the data.
par(mfrow=c(1,2))
plot(Fuel_data$Week,Fuel_data$Price, type = "h", main = "Fuel Prices over the Years", ylab = "Fuel Prices", xlab = "Years")
plot(Fuel_data$Week,Fuel_data$Price, type = "o", main = "Fuel Prices over the Years", ylab = "Fuel Prices", xlab = "Years")

plot(Fuel_data$Week,Fuel_data$Price, type = "l", main = "Fuel Prices over the Years", ylab = "Fuel Prices", xlab = "Years")
boxplot(Fuel_data$Price, main = "Fuel Price")

From the Histogram and graphs, We can see various trends that the fuel prices dipped in 2016. As per the statistics the United States became a monthly net exporter of natural gas for the first time since 1957 in November 2016. Infrastructure advancements, such as natural gas pipelines and facilities for liquefying natural gas for export, aided this by allowing supply to fulfill rising demand from international markets.Throughout 2016, pipeline shipments from the United States to Mexico increased, accounting for 87 percent of total natural gas exports from the United States.
In 2019 the prices again went down as there was sufficient amount of production in the United States for that period and in 2020, COVID 19 pandemic played a major role in the decline of the Fuel prices as the demand was less when compared to the previous years.
There are no outliers in the data as per the boxplot plotted above.
Linear Modeling
library(tidyverse)
sample_data <- sort(sample(nrow(Fuel_data), nrow(Fuel_data)*0.8))
train<-Fuel_data[sample_data,]
test<-Fuel_data[-sample_data,]
lm_train = lm(Price ~ Week,data = train)
summary(lm_train)
##
## Call:
## lm(formula = Price ~ Week, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.12605 -0.33764 0.05372 0.31161 1.12459
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.996e+00 3.109e-01 25.71 <2e-16 ***
## Week -3.436e-09 2.114e-10 -16.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4565 on 457 degrees of freedom
## Multiple R-squared: 0.3664, Adjusted R-squared: 0.365
## F-statistic: 264.3 on 1 and 457 DF, p-value: < 2.2e-16
Looking at the summary of the 80% data, we can see that the pvalue is <0.05 by which we can reject the null hypothesis testing and can conclude that there is a clear relationship between the Fuel prices and Years.
predict_price <- predict(lm_train,newdata = test[c('Week')] )
predicted_test_data <- test
predicted_test_data['predict_price'] <- predict_price
install.packages("mltools", repos = "http://cran.us.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/36/rj11vkws2rq06q3731tl873r0000gn/T//RtmpfDJwyf/downloaded_packages
#install.packages("Metrics")
#rmse(predicted_test_data$Price, predicted_test_data$predict_price)
The Root Mean Square Error value which we got for the predicted model is 0.45.
Plotting the Predict and Actual Prices using ggplot
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/36/rj11vkws2rq06q3731tl873r0000gn/T//RtmpfDJwyf/downloaded_packages
install.packages("ggplot2", repos = "http://cran.us.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/36/rj11vkws2rq06q3731tl873r0000gn/T//RtmpfDJwyf/downloaded_packages
ggplot(train, aes(x = Week , y = Price )) +
geom_line(colour = "Black") +
geom_line(data = test, aes(x = Week, y = Price),colour = "green") +
geom_line(data = predicted_test_data, aes(x = Week, y = predict_price),colour = "red") +
labs(title = "Fuel Prices over the Years(Predicted)",
subtitle = "2011-2021",
x = "Years", y = "Fuel Price per Gallon ($)")

The actual and predicted fuel prices over the past ten years are labelled in the above graph. The actual prices are shown in black color, the predicted prices are shown in the red color and followed by the test data prices which are labelled in the green color.The factors such as abundant production of fuel in the 2016,2019 years and followed by the COVID 19 pandemic in the 2020 are clearly responsible for the changes in the graphin those time periods. We can conclude by saying that Linear regression isnt the best technique for the time series data as it isnt predicting it accurately. Better techniques can be used such as ARIMA for better analysis.
#line chart
ggplot(data = Fuel_data, aes(x = Fuel_data$Week, y = Fuel_data$Price )) + theme_bw()+
geom_line() + labs(title = "Line Chart")
## Warning: Use of `Fuel_data$Week` is discouraged. Use `Week` instead.
## Warning: Use of `Fuel_data$Price` is discouraged. Use `Price` instead.

ADF Test
print(adf.test(Fuel_data$Price))
##
## Augmented Dickey-Fuller Test
##
## data: Fuel_data$Price
## Dickey-Fuller = -3.4053, Lag order = 8, p-value = 0.05253
## alternative hypothesis: stationary
From the above ADF Test output, we observe the p-value is greater than 0.05. So, we fail to reject the Null Hypothesis and infer that the time series is non-stationary.
kpss.test(Fuel_data$Price, null="Trend")
## Warning in kpss.test(Fuel_data$Price, null = "Trend"): p-value smaller than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: Fuel_data$Price
## KPSS Trend = 0.95139, Truncation lag parameter = 6, p-value = 0.01
From the above KPSS Test output, we observe the p-value is less than 0.05. So, we reject the Null Hypothesis and infer that the time series is non-stationary.
The time series after being log-transformed and Box-Cox transformed are extremely identical.
Handling Mean Stationarity
Fuel_diff <- Fuel_data_log %>%
mutate(Price_diff = Fuel_data$Price - lag(Fuel_data$Price))
Fuel_diff %>% select(Week, Price, Price_diff) %>%
head()
## # A tibble: 6 × 3
## Week Price Price_diff
## <dttm> <dbl> <dbl>
## 1 2021-12-27 00:00:00 3.38 NA
## 2 2021-12-20 00:00:00 3.40 0.0200
## 3 2021-12-13 00:00:00 3.41 0.0190
## 4 2021-12-06 00:00:00 3.44 0.0260
## 5 2021-11-29 00:00:00 3.48 0.0380
## 6 2021-11-22 00:00:00 3.49 0.0150
ggplot() + geom_line(aes(Fuel_diff$Week,Fuel_diff$Price_diff)) + theme_bw() + ggtitle("Fuel Price (First Difference)") + ylab("Price(Difference))") + xlab("Year")
## Warning: Removed 1 row(s) containing missing values (geom_path).

ACF
acf(Fuel_diff$Price)

We can see that the time series has no seasonality in the ACF plot above.
In the ACF plot, there is a Dampening effect. As a result, the time series is entirely an AR process.
PACF
pacf(Fuel_diff$Price)

We can see from the PACF that there are two statistically significant autocorrelations. As a result, the order of the AR process is 2. ARIMA is given as (2,1,0).
Re-verifying using the ADF Test.
adf.test(Fuel_data_log_diff$Price)
##
## Augmented Dickey-Fuller Test
##
## data: Fuel_data_log_diff$Price
## Dickey-Fuller = -2.2758, Lag order = 8, p-value = 0.4615
## alternative hypothesis: stationary
adf.test(Fuel_data_log_diff$Price_diff)
##
## Augmented Dickey-Fuller Test
##
## data: Fuel_data_log_diff$Price_diff
## Dickey-Fuller = -3.175, Lag order = 8, p-value = 0.09224
## alternative hypothesis: stationary
adf.test(Fuel_data_log_diff$Price_log)
##
## Augmented Dickey-Fuller Test
##
## data: Fuel_data_log_diff$Price_log
## Dickey-Fuller = -3.409, Lag order = 8, p-value = 0.0519
## alternative hypothesis: stationary
adf.test(Fuel_data_log_diff$Price_log_diff)
## Warning in adf.test(Fuel_data_log_diff$Price_log_diff): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: Fuel_data_log_diff$Price_log_diff
## Dickey-Fuller = -6.9354, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(Fuel_data_log_diff$Price)
## Warning in kpss.test(Fuel_data_log_diff$Price): p-value smaller than printed p-
## value
##
## KPSS Test for Level Stationarity
##
## data: Fuel_data_log_diff$Price
## KPSS Level = 3.9025, Truncation lag parameter = 6, p-value = 0.01
kpss.test(Fuel_data_log_diff$Price_diff)
## Warning in kpss.test(Fuel_data_log_diff$Price_diff): p-value smaller than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: Fuel_data_log_diff$Price_diff
## KPSS Level = 6.0787, Truncation lag parameter = 6, p-value = 0.01
kpss.test(Fuel_data_log_diff$Price_log)
## Warning in kpss.test(Fuel_data_log_diff$Price_log): p-value smaller than printed
## p-value
##
## KPSS Test for Level Stationarity
##
## data: Fuel_data_log_diff$Price_log
## KPSS Level = 3.7162, Truncation lag parameter = 6, p-value = 0.01
kpss.test(Fuel_data_log_diff$Price_log_diff)
## Warning in kpss.test(Fuel_data_log_diff$Price_log_diff): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: Fuel_data_log_diff$Price_log_diff
## KPSS Level = 0.10982, Truncation lag parameter = 6, p-value = 0.1
The results of the ADF and KPSS tests show that time series is stationary.
ARIMA
par(mfrow = c(1,2))
acf(Fuel_data_log_diff$Price_log_diff,lag.max=20)
pacf(Fuel_data_log_diff$Price_log_diff,lag.max=20)

AIC of ARIMA Models.
AIC(
arima(Fuel_data_log_diff$Price_log,order=c(0,1,1)),
arima(Fuel_data_log_diff$Price_log,order=c(0,1,2)),
arima(Fuel_data_log_diff$Price_log,order=c(0,1,3)),
arima(Fuel_data_log_diff$Price_log,order=c(1,1,0)),
arima(Fuel_data_log_diff$Price_log,order=c(1,1,1)),
arima(Fuel_data_log_diff$Price_log,order=c(2,1,0)),
arima(Fuel_data_log_diff$Price_log,order=c(3,1,0))
)
## df AIC
## arima(Fuel_data_log_diff$Price_log, order = c(0, 1, 1)) 2 -3568.583
## arima(Fuel_data_log_diff$Price_log, order = c(0, 1, 2)) 3 -3612.266
## arima(Fuel_data_log_diff$Price_log, order = c(0, 1, 3)) 4 -3622.255
## arima(Fuel_data_log_diff$Price_log, order = c(1, 1, 0)) 2 -3636.575
## arima(Fuel_data_log_diff$Price_log, order = c(1, 1, 1)) 3 -3634.718
## arima(Fuel_data_log_diff$Price_log, order = c(2, 1, 0)) 3 -3634.702
## arima(Fuel_data_log_diff$Price_log, order = c(3, 1, 0)) 4 -3633.285
According to the above AIC values Models, ARIMA (1,1,0) is the best model with an AIC = -3636.578.
BIC of ARIMA Models
BIC(
arima(Fuel_data_log_diff$Price_log,order=c(0,1,1)),
arima(Fuel_data_log_diff$Price_log,order=c(0,1,2)),
arima(Fuel_data_log_diff$Price_log,order=c(0,1,3)),
arima(Fuel_data_log_diff$Price_log,order=c(1,1,0)),
arima(Fuel_data_log_diff$Price_log,order=c(1,1,1)),
arima(Fuel_data_log_diff$Price_log,order=c(2,1,0)),
arima(Fuel_data_log_diff$Price_log,order=c(3,1,0))
)
## df BIC
## arima(Fuel_data_log_diff$Price_log, order = c(0, 1, 1)) 2 -3559.884
## arima(Fuel_data_log_diff$Price_log, order = c(0, 1, 2)) 3 -3599.219
## arima(Fuel_data_log_diff$Price_log, order = c(0, 1, 3)) 4 -3604.859
## arima(Fuel_data_log_diff$Price_log, order = c(1, 1, 0)) 2 -3627.877
## arima(Fuel_data_log_diff$Price_log, order = c(1, 1, 1)) 3 -3621.670
## arima(Fuel_data_log_diff$Price_log, order = c(2, 1, 0)) 3 -3621.655
## arima(Fuel_data_log_diff$Price_log, order = c(3, 1, 0)) 4 -3615.888
According to the above BIC values Models, ARIMA (1,1,1) is the best model with an BIC = -3621.676.
Auto ARIMA
auto.arima(Fuel_data_log_diff$Price_log,stationary=FALSE,allowdrift=FALSE,
seasonal=FALSE,stepwise=FALSE,approximation=FALSE)
## Series: Fuel_data_log_diff$Price_log
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## 0.6031
## s.e. 0.0332
##
## sigma^2 = 0.0001009: log likelihood = 1820.29
## AIC=-3636.58 AICc=-3636.55 BIC=-3627.88
As per the output of the model, we can conclude that ARIMA(1,1,0) is the best model which can be used for further analysis.
Ljung - Box Test
Model_1 = arima(Fuel_data_log_diff$Price_log,order=c(1,1,0))
forecast::checkresiduals(Model_1)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)
## Q* = 7.5908, df = 9, p-value = 0.5758
##
## Model df: 1. Total lags used: 10
From the viz, we can say that the residuals seem to be white noise.
par(mfrow=c(1,2))
res = Model_1$resid
acf(res,lag.max=20)
pacf(res,lag.max=20)

The RMSE of the corresponding model is 0.039
The Forecast shows us that there wont be any major difference in Fuel price for the upcoming days.
## Time series analysis using Prophet Model
Fuel_data <- Fuel_prices_data %>%
filter(Week >= ymd("2000=01-01")) %>% rename(Year = "Week",Avg_price ='Price') %>%
mutate(Month_yr = format(Year, "%Y-%m")) %>% distinct(Month_yr,.keep_all=TRUE)
as_tibble(Fuel_data)
## # A tibble: 265 × 3
## Year Avg_price Month_yr
## <dttm> <dbl> <chr>
## 1 2022-01-17 00:00:00 3.40 2022-01
## 2 2021-12-27 00:00:00 3.38 2021-12
## 3 2021-11-29 00:00:00 3.48 2021-11
## 4 2021-10-25 00:00:00 3.48 2021-10
## 5 2021-09-27 00:00:00 3.27 2021-09
## 6 2021-08-30 00:00:00 3.24 2021-08
## 7 2021-07-26 00:00:00 3.23 2021-07
## 8 2021-06-28 00:00:00 3.18 2021-06
## 9 2021-05-31 00:00:00 3.12 2021-05
## 10 2021-04-26 00:00:00 2.96 2021-04
## # … with 255 more rows
Prophet Model
The time series data is divided into two parts: train data (from January 2000 to December 2014) and test data (from January 2015 to January 2021). The prophet model is then built based on the train data and forecast. The future forecast period units have been adjusted to reflect the fact that this series is collected at the month level.
prophet_fueldata = Fuel_data %>%
rename(ds = Year,
y = Avg_price)
train = prophet_fueldata %>%
filter(ds<ymd("2015-01-01"))
test = prophet_fueldata %>%
filter(ds>=ymd("2015-01-01"))
model = prophet(train)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future = make_future_dataframe(model,periods = 85, freq = 'month')
forecast = predict(model,future)
plot(model,forecast)+
ylab("Average Fuel Price")+xlab("Date")+labs(title = 'Forecast: Average Fuel Price',subtitle = 'January 2000 - December 2021')+theme_bw()

Time series Components plots
prophet_plot_components(model,forecast)

We can see taht there’s an upward trend in the Jan-Mid of March period and then there’s a fall in the April followed by few fluctuations till July then during Fall it dips and rises afterwards.
Finding Saturation points
After finding the summaries of both the test and train data, we need to find the saturation points which means the threshold of the price which is reasonable as seen in the last 20 years of the data.
summary(train)
## ds y Month_yr
## Min. :2000-01-31 00:00:00 Min. :1.137 Length:180
## 1st Qu.:2003-10-20 00:00:00 1st Qu.:1.686 Class :character
## Median :2007-07-12 12:00:00 Median :2.654 Mode :character
## Mean :2007-07-13 07:36:00 Mean :2.560
## 3rd Qu.:2011-04-04 00:00:00 3rd Qu.:3.373
## Max. :2014-12-29 00:00:00 Max. :4.146
summary(test)
## ds y Month_yr
## Min. :2015-01-26 00:00:00 Min. :1.870 Length:85
## 1st Qu.:2016-10-31 00:00:00 1st Qu.:2.341 Class :character
## Median :2018-07-30 00:00:00 Median :2.589 Mode :character
## Mean :2018-07-28 02:32:28 Mean :2.604
## 3rd Qu.:2020-04-27 00:00:00 3rd Qu.:2.857
## Max. :2022-01-17 00:00:00 Max. :3.478
#setting floor for train data
train$floor = 1.13
train$cap = 4.14
#setting floor for forecast data
future$floor = 1.8
future$cap = 3.4
sat_model = prophet(train,growth='logistic')
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
sat_forecast = predict(sat_model,future)
plot(sat_model,sat_forecast)+ylim(0,4.5)+xlab("Date")+ylab("Avg Price")+labs(title = 'Forecast: Average Price of Fuel ',subtitle = 'Saturation Points')+theme_bw()

We set these saturation thresholds for the prophet model, and the resulting figure shows that the predicted values are substantially different from the previous forecast, and that the increasing trend is not as sharp as it was before the limitations were established.
Examination for Change points
model = prophet(train, n.changepoints = 10)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future = make_future_dataframe(model,periods = 85, freq = 'month')
plot(model,forecast)+add_changepoints_to_plot(model)+xlab("Date")+ylab("Avg Price")+labs(title = 'Forecast: Average Price of Prices',subtitle = 'January 2000 - December 2021')+theme_bw()

These are the two change points observed in the Time series.
Chekcing the Type of Seasonilty
additive = prophet(train)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
add_fcst = predict(additive,future)
plot(additive,add_fcst)+
ylim(0,5)+xlab("Date")+ylab("Avg Price")+labs(title = 'Forecast: Average Price of Fuel',subtitle = 'Additive Seasonality Model')+theme_bw()

prophet_plot_components(additive,add_fcst)

Both the models seem to have the same trends.
Model Assessment
forecast_metric_data = forecast %>%
as_tibble() %>%
filter(ds>=ymd("2015-01-01"))
## Warning in base::check_tzones(e1, e2): 'tzone' attributes are inconsistent
RMSE = sqrt(mean((test$y - forecast_metric_data$yhat)^2))
MAE = mean(abs(test$y - forecast_metric_data$yhat))
MAPE = mean(abs((test$y - forecast_metric_data$yhat)/test$y))
print(paste("RMSE:",round(RMSE,2)))
## [1] "RMSE: 1.85"
print(paste("MAE:",round(MAE,2)))
## [1] "MAE: 1.76"
print(paste("MAPE:",round(MAPE,2)))
## [1] "MAPE: 0.71"
RMSE for the above model is ‘1.85’, whereas MAE for the same model is slightly lower which is ‘1.76’. The Mean Absolute percentage Error is 71%
Cross Validation
The rolling window predictions are compared to actuals in the cross validation result plot. For a one-year forecast horizon, the RMSE is between 0.25 and 0.75. The average absolute percent inaccuracy varies from 10 to 25%.
According to the results, the ARIMA model appears to have fewer errors for predictions in the foreseeable term, whereas larger errors in later periods are similar to the Prophet model. For near-future predictions, Prophet has higher errors, but the variability in the error is lower. We can observe that the ARIMA outperforms the Prophet for this time series based on the RMSE and the data above.