Here I am discussing modeling of imported rice prices in Mogadishu using ARIMA model.

#Setting the directory.
setwd("E:/EDWIN/EDWIN/STATISTICS/MASTER/PROJECT MASTER/ARIMA-R/FSNAU Data")
#Packages which we are using here include:
library(shiny)
library(rmarkdown)
library(knitr)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#Reading in imported rice prices data.
Rice_mog = read.csv("Rice_mog.csv",sep = ",",header = TRUE)
View(Rice_mog)

There is need to transform our data so that we have variables “Year”, “Month” and “Price”

rice_long = gather(Rice_mog,month,price,Jan:Dec,factor_key = TRUE)
rice_long_sort = arrange(rice_long,Year)
View(rice_long_sort)

We need to only pick the column/variable for time series “price” and change it to a time series data.

#Saving the imported data variable as time series data.
data = ts(rice_long_sort[,3],start = c(1996,1),end = c(2021,9),frequency = 12)

#Plotting the read data.
plot(data,xlab = "Years",ylab = "Imported Rice Prices (in SOS)",main = "Mogadishu Imported Rice Prices")

Step 2: Difference data to make data stationary on mean (remove trend) Consider the 1st difference:

data_diff = diff(data)

#Plotting the difference data
plot(data_diff,xlab = "Years",ylab = "Differenced rice prices",main = "Plot of differenced rice prices")

From the plot, the differenced data is not stationary on variance, i.e. variation in the plot seems to be high at the beginning and highest between 2007 and 2009; while it decreases towards the end (as we move to 2021). We need to make the series stationary on variance to produce reliable forecasts through ARIMA models.

Step 3: log transform data to make data stationary on variance.

Data_log = log10(data)
plot(Data_log,xlab = "Years", ylab = "Log (rice prices)")

The series now looks stationary on variance as can be seen from the plot above

Step 4: Difference log transform data to make data stationary on both mean and variance.

data_log_diff = diff(Data_log)
#Plot the transformed diffrenced data.
plot(data_log_diff,xlab = "Years",ylab = "Diffrenced Log (rice prices)")

Step 5: Plot ACF and PACF to identify potential AR and MA model.

par(mfrow = c(1,2))
acf(ts(data_log_diff),main = "ACF rice prices")
pacf(ts(data_log_diff),main = "PACF rice prices")

Since, there are enough spikes in the plots outside the insignificant zone (dotted horizontal lines) we can conclude that the residuals are not random. This implies that there is juice or information available in residuals to be extracted by AR and MA models.

Step 6: Identification of best fit ARIMA model.

Auto arima function in forecast package in R helps us identify the best fit ARIMA model on the fly. The following is the code for the same. The required ‘forecast’ package is already installed.

library(forecast)

ARIMAfit = auto.arima(log10(data), approximation=FALSE,trace=FALSE)
summary(ARIMAfit)
## Series: log10(data) 
## ARIMA(0,1,1)(0,0,1)[12] 
## 
## Coefficients:
##           ma1    sma1
##       -0.4519  0.1560
## s.e.   0.0474  0.0779
## 
## sigma^2 estimated as 0.004133:  log likelihood=408.99
## AIC=-811.99   AICc=-811.91   BIC=-800.8
## 
## Training set error measures:
##                       ME       RMSE        MAE       MPE      MAPE      MASE
## Training set 0.002941354 0.06397213 0.03223562 0.0596825 0.8167661 0.4054556
##                     ACF1
## Training set -0.02360343

The best fit model is selected based on Akaike Information Criterion (AIC) , and Bayesian Information Criterion (BIC) values. The idea is to choose a model with minimum AIC and BIC values. Best fit model has MA value of order 1. Also, there is seasonal MA with lag 12 of order 1.

Step 6: Forecast sales using the best fit ARIMA model

The next step is to predict imported rice prices for 15 months i.e. for the remaining months of 2021 to 2022 through the above model.

par(mfrow = c(1,1))
pred = predict(ARIMAfit, n.ahead = 15)
pred
## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2021                                                                        
## 2022 4.106298 4.110049 4.108679 4.112142 4.109891 4.111058 4.115035 4.111410
##           Sep      Oct      Nov      Dec
## 2021          4.106503 4.108130 4.108579
## 2022 4.111175 4.111619 4.111619 4.111619
## 
## $se
##             Jan        Feb        Mar        Apr        May        Jun
## 2021                                                                  
## 2022 0.08864043 0.09538684 0.10168663 0.10761828 0.11323965 0.11859486
##             Jul        Aug        Sep        Oct        Nov        Dec
## 2021                                  0.06428496 0.07330811 0.08133635
## 2022 0.12371849 0.12863820 0.13337657 0.14084697 0.14661800 0.15217033
plot(data,type='l',xlim=c(2000,2022),ylim=c(1,45000),xlab = 'Year',ylab = 'Imported rice prices')
lines(10^(pred$pred),col='blue')
lines(10^(pred$pred+2*pred$se),col='orange')
lines(10^(pred$pred-2*pred$se),col='orange')