Econometrics 2.1 (1)

Author

Vladyslava Bondarenko, Viktoriia Prokopoie, Hlib Usovych, BFE25

Group presentation 1 (week2)

Selecting the data

library(readxl)
mydata <- read_excel("C:/Users/User/Desktop/CMO-Historical-Data-Monthly.xlsx", sheet = "Monthly Prices",  skip = 4, col_names = TRUE)

New names:
• `` -> `...1`

mydata <- mydata[-1, ]

head(mydata)

# A tibble: 6 × 72
  ...1    `Crude oil, average` `Crude oil, Brent` `Crude oil, Dubai`
  <chr>   <chr>                <chr>              <chr>             
1 1960M01 1.63000011444        1.63000011444      1.63000011444     
2 1960M02 1.63000011444        1.63000011444      1.63000011444     
3 1960M03 1.63000011444        1.63000011444      1.63000011444     
4 1960M04 1.63000011444        1.63000011444      1.63000011444     
5 1960M05 1.63000011444        1.63000011444      1.63000011444     
6 1960M06 1.63000011444        1.63000011444      1.63000011444     
# ℹ 68 more variables: `Crude oil, WTI` <chr>, `Coal, Australian` <chr>,
#   `Coal, South African **` <chr>, `Natural gas, US` <chr>,
#   `Natural gas, Europe` <chr>, `Liquefied natural gas, Japan` <chr>,
#   `Natural gas index` <chr>, Cocoa <chr>, `Coffee, Arabica` <chr>,
#   `Coffee, Robusta` <chr>, `Tea, avg 3 auctions` <chr>, `Tea, Colombo` <chr>,
#   `Tea, Kolkata` <chr>, `Tea, Mombasa` <chr>, `Coconut oil` <chr>,
#   Groundnuts <chr>, `Fish meal` <chr>, `Groundnut oil **` <chr>, …

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

robusta <- mydata %>% select(1, `Coffee, Robusta`)

Constructing time series

dates <- robusta$...1  
values <- as.numeric(robusta$`Coffee, Robusta`)  

start_year <- as.numeric(substr(dates[1], 1, 4))  
start_month <- as.numeric(substr(dates[1], 6, 7)) 


robusta_ts <- ts(values, start = c(start_year, start_month), frequency = 12)

plot.ts(robusta_ts, main = "Prices over time", xlab = "Time", ylab = "Price ($/unit)")

library(dplyr)
library(ggplot2)
library(gridExtra)


Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

price_data <- data.frame(
  Time = time(robusta_ts),
  Levels = as.numeric(robusta_ts),  
  Logs = log(as.numeric(robusta_ts)), 
  Differences = c(NA, diff(as.numeric(robusta_ts))) 
)


price_data <- price_data %>% filter(!is.na(Differences))


plot_levels <- ggplot(price_data, aes(x = Time, y = Levels)) +
  geom_line(color = "blue") 


plot_logs <- ggplot(price_data, aes(x = Time, y = Logs)) +
  geom_line(color = "green") 

plot_diff <- ggplot(price_data, aes(x = Time, y = Differences)) +
  geom_line(color = "red") 


grid.arrange(plot_levels, plot_logs, plot_diff, ncol = 1)

Smoothing

library(TTR)
plot(SMA(robusta_ts, n = 50))

plot(SMA(robusta_ts, n = 150))

We see a significant upward trend over the last 15 years, likely driven by climate change and increased demand.

Decomposition

decomp <- stl(robusta_ts, s.window = "periodic")

plot(decomp)

Seasonal

library(forecast)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

ggseasonplot(robusta_ts)

robusta_ts_2000 <- window(robusta_ts, start = c(2000, 1))
ggseasonplot(robusta_ts_2000)

The time series decomposition highlighted:

A strong upward trend.
Seasonal patterns and cycles.
Irregularities, including a significant spike in the 1980s, with 2023 and 2024 with unusually high prices as well.

ACF, PACF

library(forecast)
acf(robusta_ts)

pacf(robusta_ts)

library(tseries)
adf.test(robusta_ts)


    Augmented Dickey-Fuller Test

data:  robusta_ts
Dickey-Fuller = -2.8812, Lag order = 9, p-value = 0.2053
alternative hypothesis: stationary

Making data stationary

robusta_diff <- diff(robusta_ts)
plot.ts(robusta_diff)

acf(robusta_diff)

pacf(robusta_diff)

Autocorrelation and partial autocorrelation plots indicated non-stationarity, with dependency primarily on the previous value. Differencing was applied to achieve stationarity.

Data distribution

robusta$`Coffee, Robusta` <- as.numeric(gsub(",", "", robusta$`Coffee, Robusta`))

robusta <- robusta %>% filter(!is.na(`Coffee, Robusta`))

robusta_prices <- robusta$`Coffee, Robusta`

hist(robusta_prices, col = "skyblue")

hist(robusta_diff, col="burlywood")

Constructing model

library(forecast)
model <- auto.arima(robusta_diff)
summary(model)

Series: robusta_diff 
ARIMA(1,0,0) with zero mean 

Coefficients:
         ar1
      0.3294
s.e.  0.0338

sigma^2 = 0.02288:  log likelihood = 366.35
AIC=-728.69   AICc=-728.68   BIC=-719.38

Training set error measures:
                      ME     RMSE        MAE MPE MAPE      MASE        ACF1
Training set 0.003992463 0.151179 0.08692108 NaN  Inf 0.6543212 0.002026816

checkresiduals(model)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,0) with zero mean
Q* = 63.556, df = 23, p-value = 1.151e-05

Model df: 1.   Total lags used: 24

plot(forecast(model, h = 120))

The model suggests robusta prices largely depend on immediate past values but may need refinement to address long-term trends and recent anomalies. The model fits well, but there is significant autocorrelation in the residuals.

This means the model hasn’t fully captured the structure in the data, and there may be room for improvement. Also some trends like in recent years can affect the model so we need to think how to approach it in the future.