Predicting Microsoft’s Stock Price Using ARIMA, SES, SVR and ANN

Several research studies on stock predictions have been conducted with various solution techniques proposed over the years. The prominent techniques fall into two broad categories, namely, statistical and soft computing techniques. The purpose of this study was to check if econometric models preformed better than machine learning models in time series prediction. I have covered a few of the Statistical techniques covered i.e. simple exponential smoothing (SES), autoregressive integrated moving average (ARIMA). The ARIMA model is commonly used in analysis and forecasting, were as, the SES is also used for the same but where the forecasting data has no trend or seasonal pattern in them. The soft computing/machine learning techniques covered are Artificial Neural Network (ANN) and Support Vector Regression (SVR). ANNs and SVRs have been found to be very efficient in solving nonlinear problems.

We compare the above four methods to predict the close price of Microsoft Inc (MSFT).

Input Data

The data used for all the above methods (ARIMA, SES, ANN, and SVR) were historical daily prices of the MSFT (Microsoft Inc). The adjusted closing price was chosen to be modeled and predicted. This is because the adjusted closing price reflects not only reflects the closing price as a starting point, but it takes into account factors such as dividends, stock splits and new stock offerings to determine a value. The adjusted closing price represents a more accurate reflection of a stock’s value since distributions and new offerings can alter the closing price. This study used the Microsoft stock data used that covered the period from January 01, 1992, to June 20, 2018, having a total number of 6669 observations.

The ANN and SVM method were fed with one day lagged values, so that they can be used to predict the next day’s close price (AdjCloseL1).

A train and test set was created which was common for all the four methods. The range was as follows:

Training Set Range: 01 Jan 1992 - 20 June 2017
Test Set Range: 21 June 2017 - 20 June 2018

ARIMA Forecasting for MSFT Close Price

We shall a follow five-step process as shown below:

Step 1: Visualize the Time Series

The training set is shown in blue and the testing set is shown in green.

suppressMessages(library(forecast))
suppressMessages(library(tseries))
suppressMessages(library(quantmod))
suppressMessages(library(caret))
suppressMessages(library(nnet))

# Data Reading
data <- read.csv("MSFT.csv",header=T)

# Daily Seasonality (frequency set to 1 for daily data)
msft <- ts(data[,2],frequency=1)

# Delimit training range
msft.train <- window(msft,end=c(6417,1))

# Delimit testing range
msft.test <- window(msft,start=c(6418,1))


# Training and testing ranges chart
plot(msft,main="MSFT 1992-2018",ylab="Price",xlab="Days")
lines(msft.train,col="blue")
lines(msft.test,col="green")
legend("bottomright",col=c("blue","green"),lty=1,legend=c("Training","Testing"))

We shall perform the PP Unit Root and ADFC Test to check for stationarity in the above time series

# Unit Root Tests
# Augmented Dickey-Fuller Test ADF
adf.test(msft.train,alternative="stationary")

## 
##  Augmented Dickey-Fuller Test
## 
## data:  msft.train
## Dickey-Fuller = -0.71476, Lag order = 18, p-value = 0.9693
## alternative hypothesis: stationary

The p-value of the above ADF test shows that the time series of MSFT is nonstationary. Similarly, we can confirm this by using the PP test as shown below:

# Phillips-Perron Test 
pp.test(msft.train,alternative="stationary")

## 
##  Phillips-Perron Unit Root Test
## 
## data:  msft.train
## Dickey-Fuller Z(alpha) = -3.1503, Truncation lag parameter = 11,
## p-value = 0.9275
## alternative hypothesis: stationary

Since the above time series is not stationary, we proceed to First Order Differencing in Step 2.

Step 2: Stationarize the Series

There are three commonly used techniques to make a time series stationary:
1.Detrending: Here, we simply remove the trend component from the time series.
2.Differencing: This is the commonly used technique to remove non-stationarity. Here we try to model the differences of the terms and not the actual term. This differencing is called as the Integration part in AR(I)MA.
3.Seasonality: Seasonality can easily be incorporated in the ARIMA model directly.

We can check by differencing the time series once, as follows:

# Time Series Differencing
plot(diff(msft), type="l",main="MSFT 1992-2018",ylab="Price Differences",xlab="Days")

Noe applying the PP and the ADF test on the differenced time series shows below that the time series is stationary:

# Apply Unit Root Tests on Differenced data
# Augmented Dickey-Fuller Test ADF
adf.test(diff(msft.train),alternative="stationary")

## Warning in adf.test(diff(msft.train), alternative = "stationary"): p-value
## smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(msft.train)
## Dickey-Fuller = -18.651, Lag order = 18, p-value = 0.01
## alternative hypothesis: stationary

# Apply Unit Root Tests on Differenced data
# Phillips-Perron Test PP 
pp.test(diff(msft.train),alternative="stationary")

## Warning in pp.test(diff(msft.train), alternative = "stationary"): p-value
## smaller than printed p-value

## 
##  Phillips-Perron Unit Root Test
## 
## data:  diff(msft.train)
## Dickey-Fuller Z(alpha) = -6281.5, Truncation lag parameter = 11,
## p-value = 0.01
## alternative hypothesis: stationary

The above differenced time series shows that it is stationary and we are now left with identifying the p and q parameters for the ARIMA model.

Step 3: Find Optimal Parameters

The parameters p, and q can be found using ACF and PACF plots.

# ARIMA Models Specification
# Normal and Partial Autocorrelation Functions ACF & PACF
acf(diff(msft.train))

pacf(diff(msft.train))

Step 4: Build ARIMA Model

The above plots show that we have an ARIMA (0,1,0) model.

With the parameters in hand, we can now try to build ARIMA model. The value found in the previous section might be an approximate estimate and we need to explore more (p,d,q) combinations which can also be done using the auto.arima function which is explored later.The one with the lowest BIC and AIC would be our choice.

#Implement ARIMA (0,1,0) model
model1 <- Arima(msft.train,order=c(0,1,0),include.constant=T)
model1

## Series: msft.train 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##        drift
##       0.0104
## s.e.  0.0057
## 
## sigma^2 estimated as 0.2099:  log likelihood=-4094.75
## AIC=8193.5   AICc=8193.51   BIC=8207.04

tsdisplay(residuals(model1), lag.max=45, main='(0,1,0) Model Residuals')

The above ARIMA (0,1,0) model has an AIC of 8193.5 and BIC of 8207.04.Let us also check if the residuals are from a white noise series by performing the following:

#Check Residuals forARIMA (0,1,0) model
checkresiduals(model1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,0) with drift
## Q* = 23.683, df = 9, p-value = 0.004832
## 
## Model df: 1.   Total lags used: 10

Thus we can confirm that the residuals are not distinguishable from a white noise series as the results are not significant.

We can also reconfirm whether the ARIMA (0,1,0) was indeed a good model by using the auto.arima test in R, as follows:

#Checking using auto.arima
modelauto<-auto.arima((msft.train), seasonal=FALSE)
modelauto

## Series: (msft.train) 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -0.0203  0.0104
## s.e.   0.0124  0.0056
## 
## sigma^2 estimated as 0.2098:  log likelihood=-4093.42
## AIC=8192.84   AICc=8192.84   BIC=8213.14

auto.arima suggests us that we need to use the ARIMA (0,1,1) model which has an AIC of 8192.84 and BIC of 8213.14. Though the AIC value of the ARIMA (0,1,1) model is slightly lower by a few basis points than ARIMA (0,1,0) model, the BIC is still higher for the ARIMA (0,1,1) model.

Thus we shall proceed with the ARIMA (0,1,0) model.

Step 5: Make Predictions using Multi-Steps Forecast for forward-looking 252 trading days (1 year) and using One-Step Forecast without Re-Estimation on the Test Data set

We can make predictions using two methods: a) Multi-Step Forecast and b) One-Step Forecast without Re-Estimation

a) Multi-Step Forecast

The Multi-Step Forecast does the job of forecasting the next 252 trading days (1 year) without using the test data set.

# Multi-Steps Forecast
plot(forecast(model1,h=252),main="ARIMA(0,1,0) using Multi-Steps Forecast",ylab="Price",xlab="Date")
lines(msft.test,lty=3)

and the accuracy of the Multi-Step Forecast is as follows:

# Multi-Steps Forecast Accuracy
accuracy(forecast(model1,h=252),msft.test)[2,1:6]

##       ME     RMSE      MAE      MPE     MAPE     MASE 
## 14.32167 17.10449 14.38729 15.88804 15.98537 51.47557

Thus this method of forecasting gives an RMSE of 17.10449.

b) One-Step Forecast without Re-Estimation

The One-Step Forecast does forecasting considering test data set.

# One-Step Forecast without Re-Estimation
model2 <- Arima(msft.test,model=model1)$fitted
plot(msft.train,main="ARIMA(0,1,0) using One-Step Forecast without Re-Estimation",ylab="Price",xlab="Date",ylim=c(min(msft),max(msft)))
lines(model2,col="green")
lines(msft.test,lty=3)
legend("bottomright",col="green",lty=1,legend="Forecasted Price")

and the accuracy of the One-Step Forecast without Re-Estimation is as follows:

# One-Step Forecast without Re-Estimation Accuracy
accuracy(model2,msft.test)[1,1:5]

##        ME      RMSE       MAE       MPE      MAPE 
## 0.1203627 1.2033914 0.8048090 0.1330234 0.9367135

The One-Step Forecasting without Re-Estimation does a better job by giving an RMSE of 1.2033914.

Simple Exponential Smoothing (SES) Forecasting for MSFT Close Price

The Simple Exponential Smoothing method is used for forecasting a time series when there is no trend or seasonal pattern, but the mean (or level) of the time series y keeps slowly changing over time. The SES method is effective when the parameters describing the time series are changing SLOWLY over time.

In practice, alpha which is used in the SSE function is kept equal to 0.1-0.2 tends to perform quite well but we’ll demonstrate shortly how to tune this parameter. When alpha is closer to 0. This is called slow learning because the algorithm gives historical data more weight. When alpha is closer to 1 we consider this fast learning because the algorithm gives more weight to the most recent observation; therefore, recent changes in the data will have a bigger impact on forecasted values.

The SES prediction for an alpha (Value of smoothing parameter) of 0.05 for the next 252 days (1 Year of Test period) is as follows:

# SES's predction over the 80th and 90th confidence interval for alpha=0.05 and h=252
ses.msft.train.dif <- ses(diff(msft.train), alpha = .05, h = 252)
autoplot(ses.msft.train.dif)

the accuracy for the model is :

# Accuracy on the test and train set
accuracy(ses.msft.train.dif, diff(msft.test))

##                        ME      RMSE       MAE  MPE MAPE      MASE
## Training set 7.753846e-05 0.4650732 0.2843675  NaN  Inf 0.6927139
## Test set     9.866647e-02 1.2037669 0.8059040 -Inf  Inf 1.9631669
##                     ACF1 Theil's U
## Training set -0.04077789        NA
## Test set     -0.21008351         0

Thus we can see that the model gives us an RME of 1.2037669 on the test set.

We can perform the above step using an alpha of 0.90 which is close to 1, to see if the model does any better if we weigh most recent observations than historical one’s.

# SES's predction over the 80th and 90th confidence interval for alpha=0.90 and h=252
ses.msft.train.dif <- ses(diff(msft.train), alpha = .90, h = 252)

the accuracy for the model is :

# Accuracy on the test and train set
accuracy(ses.msft.train.dif, diff(msft.test))

##                         ME      RMSE       MAE MPE MAPE      MASE
## Training set -0.0001392492 0.6232055 0.3912116 NaN  Inf 0.9529841
## Test set      0.9013851620 1.5006048 1.1859692 Inf  Inf 2.8889985
##                    ACF1 Theil's U
## Training set -0.4614367        NA
## Test set     -0.2100835       NaN

and we an RME of 1.5006048 on test set in above model using an alpha closer to 1.

Thus we can conclude that an alpha closer 1 gives a better model in predicting the test set.

Artificial Neural Network(ANN) Method for predicting MSFT Close Price

This study employed a three-layer (one hidden layer) multilayer perceptron model trained with back-propagation algorithm.Sermpinis et al., (2012); Dai et al, (2012) and Atsalakis-Valavanis (2009) pointed out, that feed-forward neural networks (FFNN) can be the most effective to forecast financial time series.

# Preparing input data for ANN
colnames(data) <- c("Date","AdjClose")
data$AdjCloseL1 <- Lag(data$AdjClose,1)
data <- na.omit(data)
# Delimit training range
data.train <- data[1:6417,]
# Delimit testing range
data.test <- data[6418:6668,]

The training parameters were set as follows: decay rate = 0.00001, number of units in the hidden layer = 10, and epoch size = 10000. Finally, the network was tested with the data set to estimate its generalization ability.

#N Net Function for Close Price
set.seed(1)
neural.price.model <- nnet(AdjClose ~ AdjCloseL1, data = data.train,
                           maxit=10000,  size = 10, decay = 0.00001, linout = 1, skip=TRUE, MaxNWts=10000, trace=FALSE)
#Predict Close Price on test data
data.test$NN_Prediction <- predict(neural.price.model, newdata = data.test)

#calculate RMS error
rmse <- sqrt(mean((data.test$NN_Prediction-data.test$AdjClose)^2))
rmse

## [1] 1.202236

We get an RMSE of 1.202236 when gauged against the test data.

Support Vector Regression(SVR) Method for predicting MSFT Close Price

SVR uses the same basic idea as Support Vector Machine (SVM), a classification algorithm, but applies it to predict real values rather than a class. SVR acknowledges the presence of non-linearity in the data and provides a proficient prediction model. A major benefit of using SVR is that it is a non-parametric technique. The SVR does not depend on distributions of the underlying dependent and independent variables. Instead the SVR technique depends on kernel functions. The close prices of Microsoft were predicted as shown below:

# SVR Model
trainControl = trainControl(method="repeatedcv", number=10, repeats=10)
svm.price.model = train(AdjClose ~ AdjCloseL1, data=data.train, method="svmLinear", 
                     preProc=c("range"),trControl=trainControl)

#Predict Close Price on test data
data.test$SVM_Prediction <- predict(svm.price.model, newdata = data.test)

#calculate RMS error
rmse <- sqrt(mean((data.test$SVM_Prediction-data.test$AdjClose)^2))
rmse

## [1] 1.215873

The SVR model gives an RMSE of 1.215873 when gauged on the test data set.

Conclusion

From the RMSE values of all the models performed above, we can note that the RMSE values have stayed between 1.5 to 1.21. SES followed by ANN and ARIMA was good at predicting MSFT’s close price. It was observed that the forecasting accuracy level of the ANN model compared with that of the ARIMA model is not quite significant. It can be argued that all models achieved good forecast performance judging from the forecast error of all the models, which were quite low. It was also observed that the pattern of ARIMA forecasting models is directional. Also, the actual and predicted values of the close price were quite close in all these models.

Reflection

In Assessment 2, we compared which machine learning model performed better and addressed each models drawback by merging and weighing their outputs to reach a common consensus, whether to buy or sell a stock. But my idea of Assessment 3 was to compare how good the time series models were in forecasting, when compared to traditional machine learning models. There is so much hype around machine learning models these days that one forgets the fact that, even these old time series models pretty much works as good as the machine learning models.The entire exercise of Assessment 2 and 3 has been very beneficial, as I have learned many new algorithms, its implementation, the pros and cons of using it on a linear and nonlinear data sets.