Homework #6 - Spring 2025
Instructions: Please ensure that your graphs and visuals have proper titles and axis labels, where necessary. Refer to the output, whenever appropriate, when discussing the results. Lastly, remember that creativity (coupled with relevance) will be rewarded.
The FRED database has some valuable data on the U.S. and global macroeconomy. One such series is the “Delinquency Rate on Loans to Finance Agricultural Production, All Commercial Banks (DRFAPGACBN)”. More details of the series are available here: https://fred.stlouisfed.org/series/DRFAPGACBN
UI TRUST, a local bank, has approached you to forecast the delinquency rate of future loans. However, you have been provided with no additional guidance beyond this request. I have sent you an outline that I believe will be helpful in assisting you with this forecasting task.
quantmod
package, pull the
DRFAPGACBN
series from the FRED
website. Be
sure to declare this series as a ts()
object with the
appropriate start
and frequency
arguments.
Save the final ts
variable and delrate
.## [1] "DRFAPGACBN"
## DRFAPGACBN
## 1987-01-01 10.76
## 1987-04-01 8.15
## 1987-07-01 6.96
## 1987-10-01 6.87
## 1988-01-01 7.51
## 1988-04-01 5.67
start_year<-1987
start_quarter<-1
delrate <- ts(DRFAPGACBN, start = c(1987,1), frequency = 4)
plot(delrate,
main = "Delinquency Rate on Loans for Agricultural Production",ylab = "Delinquency Rate (%)",xlab = "Year",col = "darkgreen",lwd = 2)
Using the seasonal plot, there appears to be seasonality when you remove the years before 1990. As for the Trend,using the trend graph on the third graph before 1990 there appears to be a trend. After 1990 the data appears flat with a slight downward slope. But very slight.
Using the initial line graph and the seasonal graph it is additive because there is no sign of the sesonality amplifying over time.
Note:
(i) Be sure to explain how you used your graphs to arrive at your conclusion. (ii) I expect you to use/present at least three (3) graphs to supplement your answers.
autoplot(delrate) +ggtitle("Quarterly Delinquency Rate on Agricultural Loans (1987–Present)") +ylab("Delinquency Rate (%)") +xlab("Year") +theme_minimal()
ggseasonplot(delrate, year.labels = TRUE, year.labels.left = TRUE) +ggtitle("Seasonal Plot: Delinquency Rate by Quarter") +ylab("Delinquency Rate (%)") +xlab("Quarter") + theme_minimal()
delrate.train
, and
a test set, delrate.test
. Assign the last five (5) years of
data to the test set.Note: Instead of using mental math or the window
command, I would like you to use the subset()
function to
split the data.
A quick Google search should help you with the syntax. My aim is for you to be comfortable exploring functions that you might not be familiar with from our class notes.
delrate.train <- subset(delrate, time(delrate) >= 1987 & time(delrate) < 2020)
delrate.test <- subset(delrate, time(delrate) >= 2020)
plot(delrate, main = "Delinquency Rate: Training vs. Test Sets", ylab = "Delinquency Rate (%)", xlab = "Year", col = "black", lwd = 2)
lines(delrate.train, col = "blue", lwd = 2)
lines(delrate.test, col = "red", lwd = 2)
legend("topright", legend = c("Full Series", "Training Set", "Test Set"), col = c("black", "blue", "red"), lwd = 2)
del.snaive
.It does not do well because it is based on the last data point. So it shows the seasonality, but fails to catch that dip in delinquencies.
n<-length(delrate)
delrate.train<-subset(delrate,start=1, end = (n-20))
delrate.test<-subset(delrate,start = (n-19), end = n)
del.snaive <- snaive(delrate.train, h = length(delrate.test))
autoplot(delrate, color = "black") +
autolayer(delrate.train, series = "Training Data") +
autolayer(delrate.test, series = "Actual Test Data") +
autolayer(del.snaive$mean,series="Seasonal Naive Forecast") +
ggtitle("Seasonal Naïve Forecast: Delinquency Rate on Agricultural Loans") +
xlab("Year") +
ylab("Delinquency Rate (%)") +
scale_colour_manual(values = c("black", "blue", "red"))+ theme_minimal()+
guides(colour = guide_legend(title = "Series"))
initial = "optimal"
, produce the forecasts over the
appropriate horizon.del.ses
.This method just takes the last data point and uses it for the forecast. It does not capture seasonality or anything like the seasonal naive forecast. And since there is a dip in delinquencies, it misses the mark in capturing that shock like seasonal naive forecast.
alpha
value that the model chose. In particular, how does
the model weigh the most recent observations in the data relative to the
older ones?With a .6238 the model is slightly weighing the most recent observations stronger than the older values.
You can modify the following syntax to display the model results:
xxx$model
.
Holt's
model, forecast the delinquency rate over
the appropriate forecast horizon.del.holt
.del.holt <- holt(delrate.train, h = length(delrate.test), initial = "optimal")
autoplot(delrate, color = "black") +
autolayer(delrate.train, series = "Training Data") +
autolayer(delrate.test, series = "Actual Test Data") +
autolayer(del.holt$mean, series = "Holt Forecast") +
ggtitle("Holt’s Linear Forecast: Delinquency Rate on Agricultural Loans") +
xlab("Year") +
ylab("Delinquency Rate (%)") +
scale_colour_manual(values = c("black", "blue", "red")) +
theme_minimal() +
guides(colour = guide_legend(title = "Series"))
Holt is a smooth forecast, that does not take seasonality into account much. It seems to have the past trend in there, but indefinitely.
Was this model able to capture potential seasonality in the delinquency rates?
NO.
Do you believe the point at which you subset the data actually mattered for your model predictions above?
Yes, because it picked up the trend in the data. Maybe including the more recent data would have helped this forecast capture the shock in delinquency rates.
Holt-Winters'
model with multiplicative seasonality. Store your model
estimation results as del.hw
then reproduce the graph above
for this model’s results.del.hw <- hw(delrate.train, seasonal = "multiplicative", h = length(delrate.test), initial = "optimal")
autoplot(delrate, color = "black") +
autolayer(delrate.train, series = "Training Data") +
autolayer(delrate.test, series = "Actual Test Data") +
autolayer(del.hw$mean, series = "Holt-Winters Forecast") +
ggtitle("Holt-Winters Forecast: Multiplicative Seasonality") +
xlab("Year") +
ylab("Delinquency Rate (%)") +
scale_colour_manual(values = c("black", "blue", "red")) +
theme_minimal() +
guides(colour = guide_legend(title = "Series"))
In a single graph, plot the full data and the point estimate of each of your models above. autoplot(delrate, color = “black”) + autolayer(del.snaive\(mean, series = "Seasonal Naïve") + autolayer(del.ses\)mean, series = “SES”) + autolayer(del.holt\(mean, series = "Holt") + autolayer(del.hw\)mean, series = “Holt-Winters”) + ggtitle(“Forecast Comparison: Seasonal Naïve, SES, Holt, and Holt-Winters”) + xlab(“Year”) + ylab(“Delinquency Rate (%)”) + scale_colour_manual(values = c( “black”, # Actual “red”, # Seasonal Naïve “blue”, # SES “green”, # Holt “orange” # Holt-Winters )) + theme_minimal() + guides(colour = guide_legend(title = “Model”))
Which of the four (4) forecasting models
(del.snaive
, del.ses
, del.holt
or
del.hw
) provides the better forecast?
SES is the better forecast because it has the lowest RMSE, MAPE, and MSE.
Use the RMSE
, MAPE
, and MSE
to
make your justify your final determination. You are
required to create a table using the kable
command here.
Please ensure that your table is correctly formatted to 2 decimal places
and the models are appropriately labeled. acc_snaive <-
accuracy(del.snaive, delrate.test) acc_ses <- accuracy(del.ses,
delrate.test) acc_holt <- accuracy(del.holt, delrate.test) acc_hw
<- accuracy(del.hw, delrate.test) model_comparison <- data.frame(
Model = c(“Seasonal Naïve”, “SES”, “Holt”, “Holt-Winters”), RMSE =
c(acc_snaive[“Test set”, “RMSE”], acc_ses[“Test set”, “RMSE”],
acc_holt[“Test set”, “RMSE”], acc_hw[“Test set”, “RMSE”]), MAPE =
c(acc_snaive[“Test set”, “MAPE”], acc_ses[“Test set”, “MAPE”],
acc_holt[“Test set”, “MAPE”], acc_hw[“Test set”, “MAPE”]), MSE =
c(acc_snaive[“Test set”, “RMSE”]^2, acc_ses[“Test set”, “RMSE”]^2,
acc_holt[“Test set”, “RMSE”]^2, acc_hw[“Test set”, “RMSE”]^2))
model_comparison[ , 2:4] <- round(model_comparison[ , 2:4], 2)
kable(model_comparison, caption = “Forecast Accuracy Comparison:
Seasonal Naïve, SES, Holt, Holt-Winters”)
checkresiduals()
command of your preferred
model, comment on whether the residuals appear to be white noise.
For your answers, be sure to discuss the null of the Ljung-Box
statistics and your conclusion at the 1% and 5% levels of significance.
The P value is far below the 1% and 5% level of significance. So we reject the null. The residuals contain patterns meaning the model maybe missing trend, seasonality, or structure.
checkresiduals(del.ses)
Your report should include an explanation of how you arrived at the preferred model but try to avoid boring them with all the technical details involved.
Since you have a preferred model, use it on the full data to provide forecasts for the next three (3) years (h = 12) and potentially frame your discussion around that. What do the 90% Prediction intervals around your estimate look like?
What does your forecast imply about the delinquency rate on loans to the Ag sector in the next three (3) years? Do you expect a(n) decline/increase?
In the main, this question is free-form, and my attempt to have you explain your model results to a non-scientific audience.
Well, I call this, explaining it to my grandma. What I would say is that we chose model 4, I will tell you that is SES but I wouldn’t tell her to keep things simple. I would just call it model 4 or maybe not mention that to keep down confusion. I would tell my grandmother that over the next three years that farmers will pay their loans the same way they always have, with cash. No, that is a joke to loosen up the executives. I would tell the executives that I would tell my grandma over the next three years the same amount of farmers that paid off their loans the last three years will pay their loans the next three years. Nothing will change. Things will stay the same. I hope this helps.