A time series is a set of data points ordered in time. The data can be recorded daily, weekly, monthly or even at irregular intervals (though evenly spaced observations are preferable). The most important feature is that the observations are in chronological order. Using R and RStudio, time series data can be analysed to discover trends and to forecast future values.
Several packages are required for this analysis. Install any that are missing using ‘install.packages()’ before loading them.
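A minimal sketch for checking which of the packages are not yet installed, assuming the package list matches the library() calls in this post. The install line is left commented out so that nothing is downloaded by accident.

```r
# Packages used in this walkthrough (taken from the library() calls)
required <- c("quantmod", "dplyr", "tidymodels", "modeltime",
              "tidyverse", "timetk", "lubridate")

# requireNamespace() returns FALSE when a package cannot be found or loaded
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```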
library(quantmod)
library(dplyr)
library(tidymodels)
library(modeltime)
library(tidyverse)
library(timetk)
library(lubridate)
This demonstration will attempt to forecast a stock’s price over time. To do so, several years of historical prices are required. These can be retrieved fairly easily with the ‘quantmod’ package in RStudio, which downloads daily price information (trading days only, Monday to Friday) from Yahoo Finance. Feel free to use any stock you’d like, but make sure to check Yahoo Finance for the correct ticker of the listing of your choice. For this demonstration, four years of stock prices for the American wrestling company WWE will be pulled using ‘quantmod’.
price_wwe <- getSymbols("WWE", auto.assign=FALSE, from = "2017-10-01", to = "2021-09-30")
The date for each row is stored as the row name. Moving it into its own column, and converting it to a proper date, makes the later steps simpler.
price_wwe <- as.data.frame(price_wwe)
price_wwe <- tibble::rownames_to_column(price_wwe, "date")
price_wwe$date <- ymd(price_wwe$date)
head(price_wwe)
##         date WWE.Open WWE.High WWE.Low WWE.Close WWE.Volume WWE.Adjusted
## 1 2017-10-02    23.74    23.87   23.59     23.72     739300     22.87632
## 2 2017-10-03    23.74    23.77   23.49     23.63     522500     22.78953
## 3 2017-10-04    23.54    23.88   23.53     23.64     366500     22.79917
## 4 2017-10-05    23.65    23.68   23.33     23.42     427600     22.58699
## 5 2017-10-06    23.43    23.63   23.39     23.51     311400     22.67379
## 6 2017-10-09    23.46    23.64   23.35     23.46     341100     22.62557
This data set includes the opening, closing, highest and lowest prices for each date, along with the daily volume and the adjusted closing price. For this exercise, the closing price is the variable we’d like to analyse.
An ARIMA model is one way to analyse and forecast time series data. ARIMA stands for autoregressive integrated moving average. It is best suited to “non-seasonal” time series data (data which doesn’t seem to have a season-influenced pattern).
ARIMA is essentially a form of linear regression model in which the predictors are lags of the series itself (and of its past forecast errors). A lag is a fixed number of time steps before the current time point, so a lag of 7 refers to the observation 7 steps before the current one.
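The idea of regressing a series on its own lags can be sketched in base R. This is an illustration on simulated data (not the stock prices), and it covers only the autoregressive part of ARIMA; the variable names are made up for the example.

```r
set.seed(1)

# Simulate a series in which each value depends on the previous one
# (true lag-1 coefficient of 0.7; simulated data, not stock prices)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# embed() lines each value up with its lag:
# column 1 is y[t], column 2 is y[t-1]
lagged <- embed(y, 2)
current  <- lagged[, 1]
previous <- lagged[, 2]

# Ordinary linear regression of the series on its own lag
ar_fit <- lm(current ~ previous)
ar_coef <- unname(coef(ar_fit)["previous"])
print(ar_coef)  # an estimate of the simulated lag-1 coefficient

# A lag of 7 is simply the value 7 positions earlier
t <- 100
lag7_value <- y[t - 7]
```

The estimated coefficient will not match the simulated value exactly, but it should be close with a few hundred observations.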
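For this walkthrough an “auto ARIMA” will be used to keep things simple. This allows the three terms of the model to be chosen automatically for best fit. The three terms are p (the number of autoregressive lags), d (the number of times the series is differenced) and q (the number of moving-average terms).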
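To give a feel for what automatic selection involves, here is a simplified sketch in base R that tries a small grid of ARIMA orders, conventionally written (p, d, q), and keeps the one with the lowest AIC. It is not the procedure used by the "auto_arima" engine, which relies on forecast::auto.arima and its own stepwise search; the simulated data and the fixed d = 1 are assumptions made for the example.

```r
set.seed(42)

# Simulated random walk (illustrative data only)
y <- cumsum(rnorm(200))

d <- 1  # assumption: one round of differencing, fixed in advance
candidates <- expand.grid(p = 0:2, q = 0:2)
candidates$aic <- NA_real_

for (i in seq_len(nrow(candidates))) {
  fit <- tryCatch(
    arima(y, order = c(candidates$p[i], d, candidates$q[i])),
    error = function(e) NULL  # skip orders that fail to fit
  )
  if (!is.null(fit)) candidates$aic[i] <- AIC(fit)
}

# The candidate with the lowest AIC is kept
best <- candidates[which.min(candidates$aic), ]
print(best)
```

Holding d fixed keeps the AIC values comparable, since models fitted to differently differenced data are not scored on the same observations.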
With the daily closing stock price (‘WWE.Close’) being the target variable to forecast, we will need to create a data frame of future dates to hold the new predictions. As the stock price data runs until the end of September 2021, let’s try to forecast the stock price for the rest of the year.
future <- data.frame(date = seq(from = as.Date("2021/10/1"), to = as.Date("2021/12/31"), by = "day"),
                     price = NA)
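Note that a daily sequence includes Saturdays and Sundays, whereas the downloaded prices cover trading days only. If you would rather forecast weekdays only, the following base R sketch is one way to do it; the name ‘future_weekdays’ is made up for the example, and market holidays are not removed.

```r
# All calendar days in the forecast window (same range as the 'future' data frame)
all_days <- seq(from = as.Date("2021-10-01"), to = as.Date("2021-12-31"), by = "day")

# "%u" gives the ISO weekday number: 1 = Monday ... 7 = Sunday
weekday_num <- as.integer(format(all_days, "%u"))

# Keep Monday to Friday only; market holidays are not removed here
future_weekdays <- data.frame(date = all_days[weekday_num <= 5], price = NA)

# Compare the number of rows before and after filtering
length(all_days)
nrow(future_weekdays)
```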
For this auto ARIMA, Fourier terms will be added for 7-, 14- and 30-day periods. The ‘fourier_vec’ function calculates a Fourier series for the period specified, which lets the model pick up patterns that repeat over that length of time. A month indicator has also been included.
model_fit_auto_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(
    WWE.Close ~ date
      + fourier_vec(date, period = 7)
      + fourier_vec(date, period = 14)
      + fourier_vec(date, period = 30)
      + month(date, label = TRUE),
    data = price_wwe
  )
## frequency = 5 observations per 1 week
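The Fourier terms in the model above are, in essence, sine and cosine waves that repeat once per period. The base R sketch below shows the general idea for a 7-step period; ‘fourier_term’ is a hypothetical helper written for this example and is not how timetk computes ‘fourier_vec’ internally.

```r
# Hypothetical helper (not part of timetk): one sine or cosine term of a given period
fourier_term <- function(t, period, K = 1, type = c("sin", "cos")) {
  type <- match.arg(type)
  angle <- 2 * pi * K * t / period
  if (type == "sin") sin(angle) else cos(angle)
}

t <- 0:13  # two weeks of daily time steps

weekly_sin <- fourier_term(t, period = 7, type = "sin")
weekly_cos <- fourier_term(t, period = 7, type = "cos")

# The second week repeats the first, which is what lets a regression
# coefficient on these columns capture a weekly pattern
print(round(cbind(t, weekly_sin, weekly_cos), 3))
```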
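The ARIMA model now needs to be calibrated. Calibration computes the residuals that modeltime uses to measure accuracy and to build confidence intervals around the forecast. Here the model is calibrated on the same data it was fitted to; a held-out test set would give a more realistic estimate of the forecast error.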
calibrate <- modeltime_table(
  model_fit_auto_arima
) %>%
  modeltime_calibrate(price_wwe)
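In the usual modeltime workflow, calibration is run against a held-out test set rather than the training data. The base R sketch below illustrates the underlying idea (fit on the earlier data, measure the errors on the most recent data) using simulated prices and stats::arima. It is an analogy, not the modeltime implementation, and the (1, 1, 0) order and 60-day hold-out are arbitrary choices.

```r
set.seed(7)

# Simulated daily prices (illustrative only, not WWE data)
prices <- 25 + cumsum(rnorm(260, sd = 0.3))

# Hold out the most recent 60 observations as a test set
n_test <- 60
train <- head(prices, length(prices) - n_test)
test  <- tail(prices, n_test)

# Fit on the training portion only; the (1, 1, 0) order is an arbitrary choice
fit <- arima(train, order = c(1, 1, 0))

# Forecast the held-out period and compare with the values that were held back
pred <- as.numeric(predict(fit, n.ahead = n_test)$pred)
oos_resid <- test - pred

# Out-of-sample root mean squared error
rmse <- sqrt(mean(oos_resid^2))
print(rmse)
```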
The next steps refit the model on the full data set, forecast the future dates and visualise the forecasted prices.
refit <- calibrate %>%
  modeltime_refit(price_wwe)
## frequency = 5 observations per 1 week
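refit %>%
  modeltime_forecast(
    new_data = future,
    actual_data = price_wwe
  ) %>%
  # .conf_interval_alpha sets the opacity of the interval ribbon, not the confidence level;
  # the level itself is set by conf_interval in modeltime_forecast() (0.95 by default)
  plot_modeltime_forecast(.conf_interval_alpha = 0.05)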
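The shaded band in a forecast plot represents the uncertainty around each predicted value. As a rough illustration of where such a band comes from, the base R sketch below builds a 95% interval from the standard errors reported by stats::arima on simulated prices. This is a generic normal-approximation interval under assumed settings, not the method modeltime uses, which derives its intervals from the calibration residuals.

```r
set.seed(3)

# Simulated daily prices (illustrative only, not WWE data)
prices <- 25 + cumsum(rnorm(200, sd = 0.3))

# Arbitrary (1, 1, 0) order, chosen only for the example
fit <- arima(prices, order = c(1, 1, 0))
fc <- predict(fit, n.ahead = 10)

level <- 0.95
z <- qnorm(1 - (1 - level) / 2)  # two-sided normal quantile

point <- as.numeric(fc$pred)
se <- as.numeric(fc$se)

intervals <- data.frame(
  step  = 1:10,
  lower = point - z * se,
  point = point,
  upper = point + z * se
)
print(intervals)
```

The interval widens with each step ahead, which is why forecast bands typically fan out over time.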
As the plot shows, WWE’s stock price is forecast to increase slightly over the last three months of 2021. Time series analysis such as this may help inform stockholders on whether they should buy, hold or sell any of their own shares.
In addition, time series analysis has many more applications. Almost anything with a time element can be analysed and forecast in such a manner.