The aim of this project is to analyse and model a time series of monthly energy expenditure in the United States using several techniques. The objective is to identify its components: trend, seasonality, and the residual fluctuations that remain. The project also forecasts future energy expenditure, examining the behaviour expected on the basis of historical data.
This data set was obtained from the Federal Reserve Bank of St. Louis and records monthly energy expenditure in the United States. The graph shows how spending on energy evolves over time. Before any formal analysis, the graph already shows an increase in energy spending, which could be partly due to population growth and economic growth. The increase is not linear; it resembles exponential growth in expenditure.
US_energy_expenditure <- data.frame(
  ds = as.Date(raw_data$observation_date, format = "%d/%m/%Y"),
  y = raw_data$Amount
)
head(US_energy_expenditure, 5)
##           ds    y
## 1 1959-01-01 22.6
## 2 1959-02-01 22.7
## 3 1959-03-01 22.7
## 4 1959-04-01 22.5
## 5 1959-05-01 22.8
The 12-month MAT (Moving Average Trend) smooths out short-term fluctuations to show the long-term trend: a steady increase in energy expenditure. Because its window spans a full year, it also averages out the seasonal variation, leaving only the year-on-year trend. The 6-month MAT captures the general trend together with some medium-term variation, and it reacts more quickly to changes in the data than the 12-month MAT. Finally, the 3-month MAT is the most sensitive to movements in the data. It follows the original time series closely, showing the short-term fluctuations, and is most useful for examining recent observations and volatility in energy expenditure.
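The code behind the moving averages is not shown in this report. As a minimal sketch, assuming trailing (not centred) windows, they could be computed with base R's `stats::filter()`. The helper `trailing_ma` and the synthetic series are illustrative assumptions and are not part of the original analysis.

```r
# Trailing k-month moving average: mean of the current and previous k - 1 values.
# `trailing_ma` is a hypothetical helper, not taken from the original analysis.
trailing_ma <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Synthetic monthly series: linear trend plus a 12-month seasonal cycle.
month_index <- 1:48
y <- 20 + 0.5 * month_index + 3 * sin(2 * pi * month_index / 12)

ma3  <- trailing_ma(y, 3)
ma6  <- trailing_ma(y, 6)
ma12 <- trailing_ma(y, 12)

# The first k - 1 values are NA because the window is incomplete.
sum(is.na(ma12))

# Over a full 12-month window the seasonal term cancels, so the 12-month
# average changes by the trend slope only; the 3-month average still
# carries seasonal movement.
range(diff(ma12), na.rm = TRUE)
range(diff(ma3), na.rm = TRUE)
```

On the real data, the same helper would be applied to `US_energy_expenditure$y` and the three results overlaid on the original series with `lines()`.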
# Plotting the time series with a parametric (exponential) trend
US_energy_expenditure$t <- seq_len(nrow(US_energy_expenditure))
# Fit log(y) = log(a) + b * t by least squares
model <- lm(log(y) ~ t, data = US_energy_expenditure)
a <- exp(coef(model)[1])
b <- coef(model)[2]
# Back-transform to the original scale: y = a * exp(b * t)
US_energy_expenditure_exp <- a * exp(b * US_energy_expenditure$t)
plot(US_energy_expenditure$ds, US_energy_expenditure$y, type = "l",
     col = "black", lwd = 2,
     xlab = "Date",
     ylab = "Energy Expenditure (USD billions)",
     main = "Time Series of Amount spent on Energy Monthly in US",
     cex.main = 0.9)
lines(US_energy_expenditure$ds, US_energy_expenditure_exp,
      col = "blue", lwd = 1)
legend("topleft",
       legend = c("Actual Data", "Exponential Fit"),
       col = c("black", "blue"),
       lty = 1,
       lwd = c(2, 1))
As noted earlier, the data closely resemble exponential growth in energy expenditure. This graph compares the actual data with an exponential trend fitted to the series. However, the exponential fit does not capture the short-term fluctuations, indicating that additional components are needed for better modelling.
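The coefficients of the log-linear fit can be read as growth figures. The sketch below uses a synthetic series (its growth rate and noise level are invented for illustration, not taken from the data set) to show how the slope `b` translates into a monthly growth rate and a doubling time.

```r
# Synthetic series growing at about 0.5% per month with multiplicative noise.
# These values are illustrative assumptions, not estimates from the real data.
set.seed(42)
t <- 1:240
y <- 20 * exp(0.005 * t) * exp(rnorm(length(t), sd = 0.02))

# Fit log(y) = log(a) + b * t by ordinary least squares.
fit <- lm(log(y) ~ t)
a <- unname(exp(coef(fit)[1]))   # fitted level at t = 0
b <- unname(coef(fit)[2])        # fitted growth rate on the log scale

monthly_growth  <- exp(b) - 1    # proportional growth per month
doubling_months <- log(2) / b    # months for the fitted curve to double

round(c(a = a, b = b, monthly_growth = monthly_growth,
        doubling_months = doubling_months), 4)
```

Applied to the `model` object fitted above, the same two expressions would summarise how quickly the fitted expenditure curve grows.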
# Modelling the data and making predictions
library(prophet)
data_model <- prophet(US_energy_expenditure)
future_data <- make_future_dataframe(data_model, periods = 12,
                                     freq = "month", include_history = TRUE)
predictions <- predict(data_model, future_data)
# plot() on a prophet model returns a ggplot object and takes xlabel/ylabel;
# base-graphics arguments such as main and cex.main are ignored,
# so the title is added with ggtitle()
plot(data_model, predictions,
     xlabel = "Date",
     ylabel = "Energy Expenditure (USD billions)") +
  ggplot2::ggtitle("Actual Expenditure with forecast")
This graph shows the actual data with forecasts generated using the Prophet model. The model captures the trend and extends it into the future, providing predicted values of energy expenditure. The blue region represents the uncertainty: the range within which future values are expected to lie. Prophet calls this the uncertainty interval, and its default width is 80%.
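One way to judge whether the shaded band is realistic is to check how often the observed values fall inside it. The sketch below does this on a synthetic table that only borrows Prophet's column names (`yhat`, `yhat_lower`, `yhat_upper`); the numbers are invented, and this check was not part of the original analysis.

```r
# Synthetic stand-in for a forecast table with Prophet-style column names.
set.seed(123)
n <- 1000
yhat <- seq(20, 700, length.out = n)
noise_sd <- 10
fc <- data.frame(
  y          = yhat + rnorm(n, sd = noise_sd),
  yhat       = yhat,
  yhat_lower = yhat - qnorm(0.9) * noise_sd,   # central 80% band
  yhat_upper = yhat + qnorm(0.9) * noise_sd
)

# Share of observations that fall inside the band.
coverage <- mean(fc$y >= fc$yhat_lower & fc$y <= fc$yhat_upper)
coverage
```

A coverage far from the nominal interval width on the historical rows would suggest that the band is too wide or too narrow for this series.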
## ds trend additive_terms additive_terms_lower additive_terms_upper
## 1 1959-01-01 7.093920 -1.1755909 -1.1755909 -1.1755909
## 2 1959-02-01 7.392028 -1.2692577 -1.2692577 -1.2692577
## 3 1959-03-01 7.661287 3.4294110 3.4294110 3.4294110
## 4 1959-04-01 7.959395 3.7208291 3.7208291 3.7208291
## 5 1959-05-01 8.247887 0.7949149 0.7949149 0.7949149
## 6 1959-06-01 8.545995 0.2042079 0.2042079 0.2042079
## yearly yearly_lower yearly_upper multiplicative_terms
## 1 -1.1755909 -1.1755909 -1.1755909 0
## 2 -1.2692577 -1.2692577 -1.2692577 0
## 3 3.4294110 3.4294110 3.4294110 0
## 4 3.7208291 3.7208291 3.7208291 0
## 5 0.7949149 0.7949149 0.7949149 0
## 6 0.2042079 0.2042079 0.2042079 0
## multiplicative_terms_lower multiplicative_terms_upper yhat_lower yhat_upper
## 1 0 0 -55.83541 67.27036
## 2 0 0 -49.03243 64.44223
## 3 0 0 -51.64190 71.87900
## 4 0 0 -50.34378 67.76892
## 5 0 0 -53.45057 66.08361
## 6 0 0 -53.90003 68.95874
## trend_lower trend_upper yhat
## 1 7.093920 7.093920 5.918329
## 2 7.392028 7.392028 6.122770
## 3 7.661287 7.661287 11.090698
## 4 7.959395 7.959395 11.680224
## 5 8.247887 8.247887 9.042802
## 6 8.545995 8.545995 8.750203
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.918 93.034 237.452 305.481 545.002 742.203
In this R output we can see the column headings and the first 6 rows of the predictions made by the Prophet model, along with a summary of the predicted values (yhat). The minimum predicted value is $5.918 billion and the maximum is $742.203 billion.
The graphs above show zoomed-in views of the forecast covering the most recent 25 years and the most recent 5 years, making it easier to compare predicted values with actual observations. They show that the forecast follows the general trend but can deviate slightly from the actual values.
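The code for the zoomed-in views is not included here. A minimal sketch of one way to restrict a data frame to its most recent years is given below; `last_n_years` is a hypothetical helper, and the data frame is a synthetic stand-in with the same date range as the series.

```r
# Keep only rows whose date falls within the last `years` years of the data.
# `last_n_years` is a hypothetical helper for illustration.
last_n_years <- function(df, years, date_col = "ds") {
  end_date <- max(df[[date_col]])
  # Step back `years` years from the last date in the data.
  start_date <- seq(end_date, by = paste0("-", years, " years"),
                    length.out = 2)[2]
  df[df[[date_col]] > start_date, , drop = FALSE]
}

# Synthetic monthly table from 1959-01-01 to 2025-12-01.
demo <- data.frame(
  ds = seq(as.Date("1959-01-01"), by = "month", length.out = 804)
)
demo$yhat <- seq_len(nrow(demo))

recent_25 <- last_n_years(demo, 25)
recent_5  <- last_n_years(demo, 5)
c(nrow(recent_25), nrow(recent_5))
```

The subsets can then be drawn with the same `plot()` and `lines()` calls used for the full series, with actual and predicted values overlaid.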
# Trend analysis
plot(predictions$ds, predictions$trend,
     type = "l",
     col = "black",
     lwd = 1,
     main = "Estimated Trend of Energy Expenditure",
     xlab = "Date",
     ylab = "Energy Expenditure (USD billions)",
     cex.main = 0.9)
This graph isolates the trend estimated by the model, showing a smooth and steadily increasing pattern. Although seasonal and random variation are present, the primary driver of the time series is long-term growth.
# Seasonality analysis
# The x-axis holds the full date range, so each year repeats the same cycle
plot(predictions$ds, predictions$yearly,
     type = "l",
     col = "black",
     lwd = 1,
     main = "Seasonality Plot",
     xlab = "Date",
     ylab = "Seasonal Variation",
     cex.main = 0.9)
The seasonality plot shows a repeating pattern within each year. In some months energy expenditure is well above the yearly average, and in others it is much lower. This could be because more electricity and gas are used for heating in winter or for cooling in summer.
The prophet_plot_components function decomposes the series into trend and seasonal effects. From our second seasonality graph we can see that some months lie above 0, marking periods when expenditure is above average, and other months lie below 0. As noted earlier, this is likely driven by seasonal energy demand.
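To see which months lie above or below zero without reading them off a plot, the seasonal effect can be averaged by calendar month. The sketch below uses a synthetic `yearly` column with an invented winter peak, so the resulting months are illustrative only and say nothing about the real data.

```r
# Synthetic stand-in for Prophet output: a date column and an additive
# `yearly` effect. The winter-peak shape is an assumption for illustration.
pred <- data.frame(
  ds = seq(as.Date("2000-01-01"), by = "month", length.out = 120)
)
month_num <- as.integer(format(pred$ds, "%m"))
pred$yearly <- 3 * cos(2 * pi * (month_num - 1) / 12)

# Average seasonal effect for each calendar month.
profile <- tapply(pred$yearly, month_num, mean)
names(profile) <- month.abb[as.integer(names(profile))]

round(profile, 2)
names(which.max(profile))   # month with the largest positive effect
names(which.min(profile))   # month with the largest negative effect
```

With the real output, `pred` would be replaced by `predictions`, which already contains the `ds` and `yearly` columns shown in the table above.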
# Residual analysis
# predictions also contains the 12 forecast months, so keep only the historical rows
US_energy_expenditure$residual <- US_energy_expenditure$y -
  predictions$yhat[seq_len(nrow(US_energy_expenditure))]
plot(US_energy_expenditure$ds, US_energy_expenditure$residual,
     type = "l",
     col = "black",
     lwd = 2,
     main = "Residual Plot",
     xlab = "Date",
     ylab = "Residuals",
     cex.main = 0.9)
abline(h = 0, col = "red", lwd = 2, lty = 2)
The residual plot shows the difference between the actual data values and the model predictions. The residuals fluctuate around zero without a clear systematic pattern, but their magnitude grows over the years, especially after 2000. These large deviations suggest potential heteroscedasticity, indicating that the model may not have fully captured the variability in the later years.
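A quick way to put a number on the growing spread is to compare the residual standard deviation before and after a cutoff date, both in absolute terms and relative to the fitted level. The sketch below uses synthetic data whose noise is proportional to the level (an assumption, not a finding about the real series) and a cutoff of 2000 chosen to match the discussion above.

```r
# Synthetic series: exponentially growing level with noise proportional to it.
set.seed(7)
ds <- seq(as.Date("1980-01-01"), by = "month", length.out = 480)
level <- 50 * exp(0.005 * seq_along(ds))
y <- level * (1 + rnorm(length(ds), sd = 0.03))
yhat <- level                      # stand-in for the model's fitted values

resid_abs <- y - yhat              # residual in original units
resid_rel <- resid_abs / yhat      # residual as a share of the fitted value

# Standard deviation before (FALSE) and after (TRUE) the cutoff.
after_cutoff <- ds >= as.Date("2000-01-01")
sd_abs <- tapply(resid_abs, after_cutoff, sd)
sd_rel <- tapply(resid_rel, after_cutoff, sd)

ratio_abs <- unname(sd_abs["TRUE"] / sd_abs["FALSE"])
ratio_rel <- unname(sd_rel["TRUE"] / sd_rel["FALSE"])
c(ratio_abs = ratio_abs, ratio_rel = ratio_rel)
```

If the absolute ratio is well above 1 while the relative ratio stays near 1, the spread is growing with the level of the series, which is the pattern a log transform or a multiplicative model is usually meant to address.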
The Prophet model captures the trend and seasonality of the US energy expenditure data set well. However, deviations in the residuals and forecast inaccuracies in recent years suggest that the model may not have fully accounted for external factors or sudden changes in the economy, such as the pandemic. Overall, the model provides reasonable forecasts, but the pandemic era is predicted poorly. This is to be expected, as the pandemic was a major external shock that distorted the trend and led to large residuals. It can be seen as a limitation of the model, since external factors can heavily influence energy expenditure. As of March 2026, energy prices might increase rapidly because of increased tensions between Iran and the US. The Prophet model may not capture this easily and may underestimate energy expenditure during such periods.
Additionally, our analysis is based solely on historical energy expenditure and includes no external explanatory variables. If we compared this data set with, say, oil prices over the same years, we might see similarities. One such similarity could be that energy expenditure in the US rose as oil prices increased. Having information on external variables, such as energy prices during those periods, could therefore substantially improve forecasting performance.
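Prophet can take such external variables as extra regressors through `add_regressor()`. The sketch below is illustrative only: the `oil_price` column and all values are synthetic assumptions, and the model is fitted only if the prophet package is installed. Forecasting beyond the data would also require future values of the regressor, which this sketch does not attempt.

```r
# Synthetic monthly data; `oil_price` is a hypothetical explanatory variable.
set.seed(1)
n <- 120
df <- data.frame(
  ds = seq(as.Date("2010-01-01"), by = "month", length.out = n),
  oil_price = 60 + cumsum(rnorm(n))
)
df$y <- 100 + 0.5 * seq_len(n) + 2 * df$oil_price + rnorm(n)

if (requireNamespace("prophet", quietly = TRUE)) {
  library(prophet)
  m <- prophet()                       # unfitted model object
  m <- add_regressor(m, "oil_price")   # register the extra column
  m <- fit.prophet(m, df)              # fit using ds, y and oil_price
  fitted_values <- predict(m, df)      # in-sample predictions only
  head(fitted_values[, c("ds", "yhat")])
}
```

Comparing the residuals of this model with those of a model fitted without the regressor would show whether the external variable actually helps.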