library(dplyr)

# Keep the first (date) column and the robusta price column
robusta <- mydata %>% select(1, `Coffee, Robusta`)
Constructing time series
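# The first column (auto-named ...1 on import because it had no header) holds the dates
dates <- robusta$...1
values <- as.numeric(robusta$`Coffee, Robusta`)

# Read the start year and month from the first date string (characters 1-4 and 6-7)
start_year <- as.numeric(substr(dates[1], 1, 4))
start_month <- as.numeric(substr(dates[1], 6, 7))

# Monthly series: frequency = 12 observations per year
robusta_ts <- ts(values, start = c(start_year, start_month), frequency = 12)
plot.ts(robusta_ts, main = "Prices over time", xlab = "Time", ylab = "Price ($/unit)")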
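ts() stores only a start period and a frequency, so it silently assumes exactly one observation per month with no gaps or duplicates. A minimal sketch of a pre-check, using hypothetical toy dates and prices in place of robusta and assuming the dates can be parsed by as.Date (ISO YYYY-MM-DD format):

```r
# Toy dates and prices standing in for robusta$...1 and the robusta column
# (hypothetical values, for illustration only)
dates <- as.Date(c("2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"))
values <- c(1.52, 1.48, 1.55, 1.61)

# Convert each date to a running month count; consecutive months differ by 1
year <- as.numeric(format(dates, "%Y"))
month <- as.numeric(format(dates, "%m"))
month_index <- year * 12 + month
no_gaps <- all(diff(month_index) == 1)
stopifnot(no_gaps)  # stop early if a month is missing or duplicated

toy_ts <- ts(values, start = c(year[1], month[1]), frequency = 12)
print(start(toy_ts))  # 2000 1
print(end(toy_ts))    # 2000 4
```

If the check fails, the series should be regularised (missing months filled or duplicates resolved) before calling ts(), otherwise every later date in the series is shifted.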
library(dplyr)
library(ggplot2)
library(gridExtra)
# Build a data frame with three representations of the series
price_data <- data.frame(
  Time = as.numeric(time(robusta_ts)),  # decimal year, e.g. 2000.083 for Feb 2000
  Levels = as.numeric(robusta_ts),
  Logs = log(as.numeric(robusta_ts)),
  Differences = c(NA, diff(as.numeric(robusta_ts)))  # first month has no previous value
)

# Drop the first row, whose difference is undefined
price_data <- price_data %>% filter(!is.na(Differences))

plot_levels <- ggplot(price_data, aes(x = Time, y = Levels)) + geom_line(color = "blue")
plot_logs <- ggplot(price_data, aes(x = Time, y = Logs)) + geom_line(color = "green")
plot_diff <- ggplot(price_data, aes(x = Time, y = Differences)) + geom_line(color = "red")

# Stack the three panels vertically
grid.arrange(plot_levels, plot_logs, plot_diff, ncol = 1)
Smoothing
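library(TTR)

# Simple moving average over the last 50 months (about 4 years)
plot(SMA(robusta_ts, n = 50))

# Longer 150-month (12.5-year) window to expose the long-run trend
plot(SMA(robusta_ts, n = 150))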
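SMA(x, n) is a trailing simple moving average: each point is the mean of the current observation and the n - 1 before it, so the first n - 1 values are NA and larger n gives a smoother curve. As far as we know, the same calculation can be written in base R with a one-sided convolution filter; the toy series below is hypothetical and stands in for robusta_ts:

```r
# Hypothetical monthly series standing in for robusta_ts
x <- ts(c(1, 2, 3, 4, 5, 6), start = c(2000, 1), frequency = 12)
n <- 3

# Trailing simple moving average: equal weights 1/n on the current value and
# the n - 1 previous ones. stats:: is written explicitly because
# dplyr::filter masks stats::filter once dplyr is attached.
sma <- stats::filter(x, rep(1 / n, n), method = "convolution", sides = 1)

print(as.numeric(sma))  # NA NA 2 3 4 5
```

This should match TTR::SMA(x, n = 3) up to floating-point rounding; with monthly data, n = 50 and n = 150 therefore average over roughly 4 and 12.5 years respectively.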
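Both moving averages show a pronounced upward trend over roughly the last 15 years. Plausible drivers include climate-related pressure on supply and increased demand, although this analysis does not test these explanations.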
Autocorrelation (ACF) and partial autocorrelation (PACF) plots indicated that the price series is non-stationary, with dependence mainly on the previous value. Differencing was therefore applied to achieve stationarity.
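The code behind this step is not shown. A minimal sketch of the diagnostic on a simulated random walk (hypothetical data, not the robusta series), assuming robusta_diff was obtained by first differencing: a non-stationary series has an ACF that stays close to 1, while its first differences have autocorrelations near zero.

```r
set.seed(123)
# Hypothetical random walk standing in for a non-stationary price series
level <- ts(cumsum(rnorm(500)), frequency = 12)
change <- diff(level)

# Sample autocorrelation at lag 1 (element 1 of $acf is lag 0)
acf_level <- acf(level, lag.max = 24, plot = FALSE)$acf[2]
acf_change <- acf(change, lag.max = 24, plot = FALSE)$acf[2]

# A slowly decaying ACF close to 1 signals non-stationarity;
# after differencing, the autocorrelations should be near zero.
print(round(c(level = acf_level, differenced = acf_change), 3))

# The corresponding plots: acf(level); pacf(change)
```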
Series: robusta_diff
ARIMA(1,0,0) with zero mean

Coefficients:
         ar1
      0.3294
s.e.  0.0338

sigma^2 = 0.02288:  log likelihood = 366.35
AIC=-728.69   AICc=-728.68   BIC=-719.38

Training set error measures:
                      ME     RMSE        MAE  MPE MAPE      MASE        ACF1
Training set 0.003992463 0.151179 0.08692108  NaN  Inf 0.6543212 0.002026816
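Written out, the fitted model is an AR(1) without intercept on the differenced series. A worked restatement, taking $y_t$ as the series that was differenced to obtain robusta_diff (an assumption, since the differencing code is not shown) and $k = 2$ estimated parameters ($\phi$ and $\sigma^2$):

```latex
\begin{aligned}
\Delta y_t &= \phi\,\Delta y_{t-1} + \varepsilon_t,
  \qquad \Delta y_t = y_t - y_{t-1}, \qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2) \\
\hat\phi &= 0.3294 \ (\text{s.e. } 0.0338),
  \qquad \hat\phi / \text{s.e.} \approx 9.7 \\
\hat\sigma^2 &= 0.02288 \\
\mathrm{AIC} &= -2\log L + 2k = -2(366.35) + 2 \cdot 2 = -728.70
\end{aligned}
```

The hand-computed AIC agrees with the reported −728.69 up to the rounding of log L. Because the model is fitted to differences, it corresponds to an ARIMA(1,1,0) model without drift for $y_t$ itself.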
# checkresiduals() is provided by the forecast package
library(forecast)
checkresiduals(model)
Ljung-Box test
data: Residuals from ARIMA(1,0,0) with zero mean
Q* = 63.556, df = 23, p-value = 1.151e-05
Model df: 1. Total lags used: 24
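checkresiduals() reports a Ljung-Box test with 24 lags and 23 degrees of freedom (24 lags minus 1 estimated AR parameter). To our understanding, the same test can be run directly with stats::Box.test() by passing fitdf; for the model here that would be Box.test(residuals(model), lag = 24, type = "Ljung-Box", fitdf = 1). The sketch below runs it on simulated AR(1) data (hypothetical, not robusta_diff) so that it is self-contained:

```r
set.seed(2024)
# Hypothetical AR(1) data standing in for robusta_diff
x <- arima.sim(model = list(ar = 0.33), n = 600)
fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE)

# Ljung-Box test on the residuals: 24 lags, minus 1 estimated AR parameter
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)
print(lb)
```

A small p-value, such as the 1.151e-05 reported above, rejects the hypothesis that the residuals are white noise.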
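# Forecast 120 months (10 years) ahead. The model was fitted to robusta_diff,
# so the forecasts are for the differenced series (monthly changes), not price levels.
plot(forecast(model, h = 120))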
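Because the model describes changes, the 120-month forecast shows predicted changes, which for a zero-mean AR(1) decay geometrically to zero (0.3294^3 ≈ 0.036, so within a few months). A sketch on a hypothetical simulated price series showing this recursion and one way to turn forecast changes back into price levels by cumulating them onto the last observed price; fitting forecast::Arima(robusta_ts, order = c(1, 1, 0)) would be an alternative route to level forecasts:

```r
set.seed(42)
# Hypothetical monthly price series standing in for robusta_ts
price <- ts(2 + cumsum(rnorm(240, sd = 0.15)), start = c(2000, 1), frequency = 12)
price_diff <- diff(price)

fit <- arima(price_diff, order = c(1, 0, 0), include.mean = FALSE)
phi <- unname(coef(fit)["ar1"])
h <- 12

# Model-based forecasts of the next h monthly price changes
fc_diff <- as.numeric(predict(fit, n.ahead = h)$pred)

# For a zero-mean AR(1), the h-step forecast is phi^h times the last change,
# so the forecasts decay geometrically towards zero
last_change <- as.numeric(tail(price_diff, 1))
manual <- phi^(1:h) * last_change

# Cumulating forecast changes onto the last observed price gives price-level forecasts
fc_level <- as.numeric(tail(price, 1)) + cumsum(fc_diff)
print(round(fc_level, 3))
```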
The model suggests that monthly changes in robusta prices depend mainly on the previous month's change (ar1 = 0.3294, s.e. 0.0338), but it may need refinement to account for long-term trends and recent anomalies. Although the AR(1) coefficient is clearly significant, the Ljung-Box test (Q* = 63.556, df = 23, p-value = 1.151e-05) rejects the hypothesis that the residuals are white noise.
This means the model has not fully captured the structure in the data, and there is room for improvement. In particular, the sharp movements of recent years may distort the fit, so future work should consider how to handle them.
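One possible way to follow up, offered only as a sketch: compare a few candidate orders for the differenced series by AIC and by the same Ljung-Box check. The data below are a hypothetical simulated AR(2) series, and the candidate list is an arbitrary illustration rather than a recommendation for robusta_diff.

```r
set.seed(7)
# Hypothetical differenced series with more structure than a single AR term
x <- arima.sim(model = list(ar = c(0.3, 0.2)), n = 600)

# Candidate (p, d, q) orders; d = 0 because the input is already differenced
orders <- list(c(1, 0, 0), c(2, 0, 0), c(3, 0, 0), c(1, 0, 1))

results <- do.call(rbind, lapply(orders, function(o) {
  fit <- arima(x, order = o, include.mean = FALSE)
  # fitdf = number of estimated ARMA coefficients (p + q)
  lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = o[1] + o[3])
  data.frame(
    model = sprintf("ARIMA(%d,%d,%d)", o[1], o[2], o[3]),
    aic = fit$aic,
    ljung_box_p = lb$p.value
  )
}))

# Lower AIC is better; a large Ljung-Box p-value means no evidence of leftover autocorrelation
print(results[order(results$aic), ], row.names = FALSE)
```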