Time Series data refer to the historical representation of data points collected at periodic intervals of time (Austin, 2015). These historical data are used within statistical tools such as R
or Rstudio
to make statistical models that can predict future values at reasonable error rate. In this post Rstudio
in combination with rmarkdown
was used to forecast the exchange rate of Kenyan Shilling against US dollar using daily closing rate of the shilling collected over a period of 891 days (1/01/2017 to 7/23/2020).
The data for this post was obtained from Central Bank of Kenya. The data is dynamic; thus, this original dataset was saved online for future retrieval here.
The first 10 observations of the data are presented in table 1.
data <- read.csv("Key CBK Indicative Exchange Rates.csv")
library(knitr)
kable(head(data, 10), caption = "Table 1: First 10 Exchange Rates")
ï..Date | Currency | Mean | Buy | Sell |
---|---|---|---|---|
23/07/2020 | US DOLLAR | 108.1288 | 108.0288 | 108.2288 |
22/07/2020 | US DOLLAR | 107.8797 | 107.7800 | 107.9794 |
21/07/2020 | US DOLLAR | 107.6209 | 107.5212 | 107.7206 |
20/07/2020 | US DOLLAR | 107.4329 | 107.3329 | 107.5329 |
17/07/2020 | US DOLLAR | 107.4582 | 107.3582 | 107.5582 |
16/07/2020 | US DOLLAR | 107.3494 | 107.2494 | 107.4494 |
15/07/2020 | US DOLLAR | 107.2850 | 107.1853 | 107.3847 |
14/07/2020 | US DOLLAR | 107.1471 | 107.0476 | 107.2465 |
13/07/2020 | US DOLLAR | 107.0512 | 106.9512 | 107.1512 |
10/07/2020 | US DOLLAR | 106.9003 | 106.8006 | 107.0000 |
From table 1, it is clear that we only need the Mean
column and convert the series into time series and plot the series. Figure 1 shows the time plot of the exchange rate.
rate <- data[3]
## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2017-01-01"), as.Date("2020-07-23"), by = "day")
## Create a time series object
rate.ts <- ts(rate, start = c(2017, as.numeric(format(inds[1], "%j"))),
frequency = 365)
plot.ts(rate.ts, main = "Figure 1: Time Plot of Kenyan Shillings - US Dollar Exchange Rate", col = "blue")
The components of the time series can be seen in figure 1.
A seasonal time series consists of a trend, seasonal and an error component. The process of decomposing the time series means separating the time series into these three components (Coghlan, 2018).
The decompose()
function can be used to estimate the trend and seasonal component of a time series described using an additive model. The function “decompose()
” returns a list object as its result, where the estimates of the seasonal, trend and error component are stored in named elements of that list objects, called “seasonal”, “trend”, and “random” respectively (Coghlan, 2018). The figure 2 shows a plot of decomposed series.
decomposed.series <- decompose(rate.ts)
# Define a plot function to allows us change the title
decomp.plot <- function(x, main = NULL, ...)
{
if(is.null(main))
main <- paste("Decomposition of", x$type, "time series")
plot(cbind(observed = x$random + if (x$type == "additive")
x$trend + x$seasonal
else x$trend * x$seasonal, trend = x$trend, seasonal = x$seasonal,
random = x$random), main = main, ...)
}
decomp.plot(decomposed.series, col = "blue", main = "Figure 2: Decomposed Series Plot")
The decomposed series shows that seasonality and trend. However, we need to perform a stationarity test to check if the data is stationary. The Augmented Dickey-Fuller unit root test was used as follows:
Null hypothesis \(H_{0}:\) Unit root exists
Alternative hypothesis \(H_{a}:\) stationary
library(tseries)
adf.test(rate.ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: rate.ts
## Dickey-Fuller = -2.5421, Lag order = 9, p-value = 0.3489
## alternative hypothesis: stationary
It was expected that the series is stationary at 95% confidence level, but the ADF test statistic = -2.54
, p-value = 0.35
implies that the p-vale is greater than 0.05, thus, fail to reject the null hypothesis of existence of unit root. Therefore, the series is not stationary. One way of making the series stationary is through differencing.
differenced.series <- diff(rate.ts, differences = 1)
decomp.plot(decompose(differenced.series), col = "blue", main = "Figure 3: Decomposed Differences Series Plot")
Figure 3 shows the series to have a constant variance with no trend, an indication that the first differencing have made the series stationary. However, ADF test was performed to confirm stationarity.
adf.test(differenced.series, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: differenced.series
## Dickey-Fuller = -8.2375, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
Similarly, the expectation was that the series is stationary at 95% confidence level, the ADF test statistic = -8.24
, p-value = 0.01
implies that the p-vale is less than 0.0; thus, reject the null hypothesis of existence of unit root. Therefore, the series is stationary.
The historical data has been plotted as time series and differenced to make it stationary. Now we move on to forecasting. The fluctuations in Kenyan Shillings time series plot is fairly consistent over time. Therefore, simple exponential smoothing can be used to make predictions. In R the HoltWinters()
function is used for simple exponential smoothing.
model <- HoltWinters(differenced.series ,beta=FALSE,gamma=FALSE)
plot(model, main ="Figure 3: Holts-Winters Filtering", col = "blue")
The exponential smoothing is defined by setting the parameter beta=FALSE
. Since we are dealing with non-seasonal model the parameter gamma=FALSE
. The resulting filtered model is presented in figure 3.
The dataset is now ready for forecasting after estimation of the exponential smoothing model. The forecast.HoltWinters()
function in forecast package was used for the forecasting.
library(forecast)
forecasts1 <- forecast:::forecast.HoltWinters(model,h=60)
The parameter h=30 indicates forecasting for the next 60 intervals which is next 60 days in this instance. The output of forecast.HoltWinters is shown in figure 4.
plot(forecasts1, main = "Figure 4: Forecasts from HoltWinters")
The dark grey shaded area in figure 4 represents the range of Kshs exchange rate for next 60 days with 80% prediction. The lighter shade of grey represents the Kshs exchange rate with 90% prediction.
The forecast model is subject to prediction errors. The validation of the model involves studying the distribution of forecast errors. From the The Holt Winters forecast model creates the residuals can be extracted into a dataset. A plot of errors should follow a normal distribution with mean of zero. If the errors does not shows bell shape then the errors are high and the model is not sufficient.
hist(forecasts1$residuals, col="blue", main = "Figure 5: Histogram of Forecast Residuals", xlab = "Residuals")
Figure 5 shows the histogram of forecast errors which are approximately bell-shaped. The mean is approximately zero. Therefore, the model fits the data well.
Austin, B. (2015, August 2). Forecasting Exchange Rates Using R Time Series. Retrieved July 24, 2020, from Benny Austin website: https://bennyaustin.com/2015/08/03/r-time-series/
Coghlan, A. (2018). Using R for Time Series Analysis — Time Series 0.2 documentation. Retrieved July 24, 2020, from a-little-book-of-r-for-time-series.readthedocs.io website: https://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html