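The data import does not appear in the extracted text. A minimal sketch, assuming the prices come from an Excel workbook read with readxl (the file name is hypothetical; readxl names an unnamed first column `...1`, which matches the `robusta$...1` reference below):

library(readxl)
library(dplyr)

# Hypothetical import: monthly commodity prices, dates in the unnamed first column
mydata <- read_excel("commodity_prices.xlsx")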
robusta <- mydata %>% select(1, `Coffee, Robusta`)
Constructing time series
dates <- robusta$...1
values <- as.numeric(robusta$`Coffee, Robusta`)
start_year <- as.numeric(substr(dates[1], 1, 4))
start_month <- as.numeric(substr(dates[1], 6, 7))
robusta_ts <- ts(values, start = c(start_year, start_month), frequency = 12)
plot.ts(robusta_ts, main = "Prices over time", xlab = "Time", ylab = "Price ($/unit)")
Checking if the data needs transforming
# ADF test
library(tseries)
library(forecast)
adf.test(robusta_ts) # Test for stationarity in the time series
Augmented Dickey-Fuller Test
data: robusta_ts
Dickey-Fuller = -2.8812, Lag order = 9, p-value = 0.2053
alternative hypothesis: stationary
plot(robusta_ts, main = "Raw Robusta Prices", ylab = "Price", xlab = "Time")
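The step that produces robusta_transformed is missing from the extracted text. A minimal sketch, assuming a log transform (a common variance-stabilising choice, consistent with the log returns used later in this report):

robusta_transformed <- log(robusta_ts)  # assumed: log transform to stabilise variance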
robusta_transformed_stl <- stl(robusta_transformed, s.window = "periodic")
plot(robusta_transformed_stl, main = "STL Decomposition of Transformed Data")
The p-value (0.2053) is greater than the common significance level of 0.05, so we fail to reject the null hypothesis of a unit root. This suggests that the time series is not stationary, so we apply differencing.
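The differencing call itself does not appear in the extracted text; a minimal sketch consistent with the robusta_diff output that follows:

robusta_diff <- diff(robusta_ts)  # first difference to remove the stochastic trend
adf.test(robusta_diff)            # re-test the differenced series for stationarity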
Warning in adf.test(robusta_diff): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: robusta_diff
Dickey-Fuller = -7.1375, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary
plot(robusta_diff, main = "Differenced Time Series", ylab = "Differenced Prices")
Since the p-value (0.01) is less than the standard threshold (0.05), we reject the null hypothesis. This indicates that the differenced series is stationary.
Checking ARIMA models
library(forecast)
model1 <- Arima(robusta_ts, order = c(1, 1, 1))
summary(model1)
Series: robusta_ts
ARIMA(1,1,1)
Coefficients:
ar1 ma1
0.3129 0.0185
s.e. 0.0984 0.1029
sigma^2 = 0.02291: log likelihood = 366.36
AIC=-726.73 AICc=-726.7 BIC=-712.75
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.004009412 0.1510789 0.08679787 0.07988763 4.261407 0.1948081
ACF1
Training set -0.0001176458
model2 <- Arima(robusta_ts, order = c(1, 1, 0))
summary(model2)
Series: robusta_ts
ARIMA(1,1,0)
Coefficients:
ar1
0.3294
s.e. 0.0338
sigma^2 = 0.02288: log likelihood = 366.35
AIC=-728.69 AICc=-728.68 BIC=-719.38
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.003988237 0.1510821 0.08681053 0.08035099 4.261834 0.1948366
ACF1
Training set 0.002028933
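The summary block that follows repeats the ARIMA(1,1,0) output and appears to come from an automatic order search, since auto_model is referenced in the AIC comparison below. A minimal sketch of the presumably omitted call, assuming default settings:

auto_model <- auto.arima(robusta_ts)  # assumed: automatic order selection
summary(auto_model)                   # selects ARIMA(1,1,0), matching the output below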
Series: robusta_ts
ARIMA(1,1,0)
Coefficients:
ar1
0.3294
s.e. 0.0338
sigma^2 = 0.02288: log likelihood = 366.35
AIC=-728.69 AICc=-728.68 BIC=-719.38
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.003988237 0.1510821 0.08681053 0.08035099 4.261834 0.1948366
ACF1
Training set 0.002028933
cat("AIC for ARIMA(1,1,1):", AIC(model1), "\n")
AIC for ARIMA(1,1,1): -726.7262
cat("AIC for ARIMA(1,1,0):", AIC(model2), "\n")
AIC for ARIMA(1,1,0): -728.6937
cat("AIC for Auto ARIMA:", AIC(auto_model), "\n")
AIC for Auto ARIMA: -728.6937
ARIMA(1,1,0) is the best model based on the lowest AIC value (−728.6937); the automatic search (auto_model) selects the same specification.
Residual Diagnostics for ARIMA(1,1,0)
checkresiduals(model2)
Ljung-Box test
data: Residuals from ARIMA(1,1,0)
Q* = 63.636, df = 23, p-value = 1.12e-05
Model df: 1. Total lags used: 24
The residuals fluctuate around zero with no visible trend, and most of the autocorrelations lie within the confidence bounds. However, the very small Ljung-Box p-value (1.12e-05) rejects the null hypothesis that the residuals are white noise, so some autocorrelation remains that the model does not capture.
Forecasting with ARIMA(1,1,0)
forecast_values <- forecast(model2, h = 12)
plot(forecast_values, main = "Forecast for Robusta Coffee Prices")
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 2025 5.296119 5.102251 5.489987 4.999623 5.592615
Feb 2025 5.321523 4.999019 5.644027 4.828296 5.814750
Mar 2025 5.329891 4.903608 5.756174 4.677947 5.981835
Apr 2025 5.332647 4.819483 5.845812 4.547830 6.117464
May 2025 5.333555 4.745111 5.922000 4.433607 6.233504
Jun 2025 5.333854 4.678393 5.989316 4.331412 6.336296
Jul 2025 5.333953 4.617618 6.050288 4.238413 6.429493
Aug 2025 5.333985 4.561529 6.106442 4.152615 6.515356
Sep 2025 5.333996 4.509219 6.158774 4.072608 6.595384
Oct 2025 5.334000 4.460025 6.207974 3.997371 6.670628
Nov 2025 5.334001 4.413454 6.254548 3.926145 6.741856
Dec 2025 5.334001 4.369126 6.298876 3.858353 6.809650
The ARIMA(1,1,0) model provides a stable forecast: the point forecasts level off near 5.33 because the model includes no drift term, while the prediction intervals widen with the horizon. The predictions are reasonable as long as no unforeseen external events (e.g., market shocks, policy changes) occur.
MP4
Logarithmic returns
log_returns <- 100 * diff(log(robusta_ts))
plot.ts(log_returns, main = "Logarithmic Returns of Robusta Coffee Prices", xlab = "Time", ylab = "Log Returns")
adf.test(log_returns)
Warning in adf.test(log_returns): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: log_returns
Dickey-Fuller = -6.8838, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary
With this small p-value (0.01), we reject the null hypothesis of a unit root: the logarithmic returns are stationary and suitable for further modelling.
Experiment with different ARMA models
arma_models <- list()
for (p in 0:3) {
  for (q in 0:3) {
    if (p == 0 & q == 0) next
    model <- Arima(robusta_ts, order = c(p, 1, q))
    arma_models[[paste0("ARMA(", p, ",", q, ")")]] <- model
  }
}
model_comparisons <- data.frame(Model = character(), AIC = numeric(), BIC = numeric(),
                                RMSE = numeric(), stringsAsFactors = FALSE)
for (name in names(arma_models)) {
  model <- arma_models[[name]]
  model_comparisons <- rbind(model_comparisons,
                             data.frame(Model = name,
                                        AIC = AIC(model),
                                        BIC = BIC(model),
                                        RMSE = sqrt(mean(residuals(model)^2))))
}
model_comparisons <- model_comparisons[order(model_comparisons$AIC), ]
print(model_comparisons)
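The definitions of best_model and best_model2 used below are missing from the extracted text. A hedged reconstruction, assuming the top two rows of the sorted comparison table are taken (the Ljung-Box output that follows is for the second-ranked ARIMA(1,1,0)):

best_model_name  <- model_comparisons$Model[1]   # ARMA(2,3) per the discussion below
best_model       <- arma_models[[best_model_name]]
best_model_name2 <- model_comparisons$Model[2]   # ARMA(1,0)
best_model2      <- arma_models[[best_model_name2]]
checkresiduals(best_model2)                      # produces the Ljung-Box test below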
Ljung-Box test
data: Residuals from ARIMA(1,1,0)
Q* = 63.636, df = 23, p-value = 1.12e-05
Model df: 1. Total lags used: 24
forecast_values_best <- forecast(best_model, h = 12)
plot(forecast_values_best, main = paste("Forecast with", best_model_name))
forecast_values_best2 <- forecast(best_model2, h = 12)
plot(forecast_values_best2, main = paste("Forecast with", best_model_name2))
The top two models, ARMA(2,3) and ARMA(1,0), are the best based on AIC, BIC, and RMSE. ARMA(2,3) has the lowest AIC and BIC, meaning it offers a good balance between fit and complexity, and its RMSE is also the lowest. ARMA(1,0) is simpler but performs almost as well, with close AIC and BIC values and a slightly higher yet still competitive RMSE. These models provide the best fit without overfitting. Since we previously identified ARMA(1,0) as the best model, we continue working with it.
library(FinTS)
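The ARCH test and GARCH fitting code are not present in the extracted text. A minimal sketch of the presumably omitted steps, assuming the rugarch package and the residuals of the second-ranked ARIMA model (the object names follow the forecasting code below):

library(rugarch)

residuals_best2 <- residuals(best_model2)  # assumed input for the volatility model
ArchTest(residuals_best2, lags = 12)       # FinTS: LM test for ARCH effects

garch_spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                         mean.model = list(armaOrder = c(0, 0), include.mean = FALSE))
garch_fit <- ugarchfit(spec = garch_spec, data = residuals_best2)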
While all models have similar performance, the GARCH(1,1) model stands out with the highest log-likelihood and the lowest AIC. Based on these criteria, the GARCH(1,1) model is the best fit for the data.
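The model comparison this conclusion summarizes is not shown in the extracted text; a hedged sketch of how such a comparison could be run, with the candidate orders being assumptions:

# Hypothetical comparison of candidate GARCH orders by log-likelihood and AIC
for (o in list(c(1L, 1L), c(1L, 2L), c(2L, 1L), c(2L, 2L))) {
  spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = o),
                     mean.model = list(armaOrder = c(0, 0), include.mean = FALSE))
  fit <- ugarchfit(spec = spec, data = residuals_best2)
  cat(sprintf("GARCH(%d,%d): logLik = %.2f, AIC = %.4f\n",
              o[1], o[2], likelihood(fit), infocriteria(fit)["Akaike", ]))
}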
Forecasting
garch_forecast_11 <- ugarchforecast(garch_fit, n.ahead = length(residuals_best2))
forecast_volatility <- sigma(garch_forecast_11)
par(mfrow = c(1, 2))
plot(residuals_best2^2, type = "l", col = "blue", main = "Squared Residuals",
     ylab = "Squared Residuals", xlab = "Time")
plot(forecast_volatility, type = "l", col = "red", main = "Forecasted Volatility (GARCH(1,1))",
     ylab = "Volatility", xlab = "Time")
legend("topright", legend = c("Squared Residuals", "Forecasted Volatility"),
       col = c("blue", "red"), lty = 1)
The left plot shows squared residuals, which highlight volatility clustering. Spikes in volatility, especially around the late 1970s and early 2000s, indicate significant financial events. Calm periods, such as the 1960s and early 1990s, show low squared residuals, suggesting stable volatility.
The right plot shows forecasted volatility using the GARCH(1,1) model. This forecast suggests that volatility will gradually decrease and stabilize, following an exponential decay. The high starting volatility reflects past high residuals, but the GARCH model predicts a return to lower levels unless new shocks occur.