Time series project III. - Stock price prediction

In this project, I used ARIMA and ETS models to forecast Novartis’s stock price, based on its daily prices in 2015.

#Set the language used in the output 
Sys.setlocale("LC_TIME","C")
## [1] "C"
#Load in the library needed
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Get the historical stock prices from Yahoo Finance

novartis = getSymbols("NVS", auto.assign = FALSE,
                      from = "2015-01-01", to = "2016-01-01")
plot(as.ts(novartis$NVS.Open))

Overall, the line plot shows a clear trend in the stock price, but no obvious seasonal pattern.

#Accommodate the irregular spacing of financial data
chartSeries(novartis,type = 'line')

This chart shows the price with the days on which the stock market was closed skipped; the lower panel shows the trading volume.

library(forecast)
ggtsdisplay(novartis$NVS.Open)

The ACF plot shows clear autocorrelation that decays only slowly, so it is hard to find a cut-off point for choosing the MA order q of an ARIMA model. The slow decay also suggests that the series needs differencing. The PACF plot, on the other hand, shows a clear cut-off after lag 1, which suggests an AR order of p = 1.
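
To illustrate this reading of the plots, here is a minimal sketch on simulated data (not the Novartis series). It assumes a random walk as a stand-in for a trending price: its ACF decays slowly, its PACF has a single large spike at lag 1, and first differencing removes most of the autocorrelation.

```r
# Simulated random walk as a stand-in for a trending price series
# (illustrative data only, not the Novartis prices).
set.seed(42)
walk = cumsum(rnorm(1000))

# Sample ACF and PACF of the level series (plot = FALSE returns the values)
acf_level  = acf(walk, lag.max = 10, plot = FALSE)$acf[-1]  # drop lag 0
pacf_level = pacf(walk, lag.max = 10, plot = FALSE)$acf

# The ACF again after first differencing
acf_diff = acf(diff(walk), lag.max = 10, plot = FALSE)$acf[-1]

round(acf_level[1:3], 2)   # level series: slow decay
round(pacf_level[1:3], 2)  # level series: spike at lag 1 only
round(acf_diff[1:3], 2)    # differenced series: little left
```

If the real series behaves like this, the lag-1 PACF spike may reflect non-stationarity as much as a genuine AR(1) term, which is one reason to compare models with and without an AR part.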

Model construction

#ARIMA model
novartsarima = auto.arima(novartis$NVS.Open,
                          stepwise = TRUE,
                          approximation = FALSE,
                          trace = TRUE)
## 
##  ARIMA(2,1,2) with drift         : 789.8915
##  ARIMA(0,1,0) with drift         : 789.3865
##  ARIMA(1,1,0) with drift         : 787.3966
##  ARIMA(0,1,1) with drift         : 786.4334
##  ARIMA(0,1,0)                    : 787.4521
##  ARIMA(1,1,1) with drift         : 786.4954
##  ARIMA(0,1,2) with drift         : 786.4976
##  ARIMA(1,1,2) with drift         : 788.4009
##  ARIMA(0,1,1)                    : 784.5149
##  ARIMA(1,1,1)                    : 784.6786
##  ARIMA(0,1,2)                    : 784.5871
##  ARIMA(1,1,0)                    : 785.468
##  ARIMA(1,1,2)                    : 786.4657
## 
##  Best model: ARIMA(0,1,1)
novartsarima
## Series: novartis$NVS.Open 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.1553
## s.e.   0.0682
## 
## sigma^2 = 1.317:  log likelihood = -390.23
## AIC=784.47   AICc=784.51   BIC=791.52

The auto.arima() function selected ARIMA(0,1,1) as the best model by AICc. This model has no autoregressive term, however, and since the PACF plot showed a spike at lag 1, it is worthwhile to also look at a model that includes one.

#ARIMA model 2: with autoregressive part
novartsarima2 = Arima(novartis$NVS.Open, order = c(1,1,1))
novartsarima2
## Series: novartis$NVS.Open 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.5904  -0.7247
## s.e.  0.3136   0.2723
## 
## sigma^2 = 1.312:  log likelihood = -389.29
## AIC=784.58   AICc=784.68   BIC=795.16

This is the summary of the ARIMA(1,1,1) model. Its AIC is slightly higher (784.58 vs. 784.47) but still very close to that of the model auto.arima() picked for us.
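
As a sanity check on these numbers, the information criteria can be recomputed by hand from the printed log-likelihoods. This is a sketch under one assumption that the output does not state: that there are n = 251 effective observations (252 trading days in 2015, minus one lost to differencing). The parameter count k includes the innovation variance.

```r
# Log-likelihoods copied from the two model summaries above
loglik = c(arima011 = -390.23, arima111 = -389.29)
# Estimated parameters: ARMA coefficients plus sigma^2
k = c(arima011 = 2, arima111 = 3)
# Assumption: 252 trading days in 2015, one lost to differencing
n = 251

aic  = -2 * loglik + 2 * k
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
bic  = -2 * loglik + k * log(n)

round(rbind(aic, aicc, bic), 2)
```

Under that assumption the results agree with the printed AIC, AICc and BIC up to the rounding of the log-likelihood. BIC penalises the extra AR parameter more heavily, which is why the gap between the two models is larger there.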

#Forecast using the 2 arima
plot(forecast(novartsarima, h = 20))

plot(forecast(novartsarima2, h = 20))

The point forecasts of the two models differ very little. The main visible difference is the prediction interval, which is wider for ARIMA(0,1,1). The forecasts themselves are nearly flat because both models difference the series once and include no drift, so the last observation is taken as indicative of the next one and earlier observations matter little.
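
How little earlier observations matter can be made concrete for the ARIMA(0,1,1) model. Without drift, this model is equivalent to simple exponential smoothing with smoothing parameter alpha = 1 + theta, where theta is the MA coefficient. The sketch below plugs in the fitted values from the summary above; the interval half-widths are a textbook approximation that ignores parameter uncertainty, not output taken from forecast().

```r
# ma1 and sigma^2 copied from the ARIMA(0,1,1) summary above
theta  = -0.1553
sigma2 = 1.317

# Equivalent simple-exponential-smoothing parameter
alpha = 1 + theta

# Weight given to the observation j days before the forecast origin
j = 0:4
weights = alpha * (1 - alpha)^j
round(weights, 4)

# h-step forecast variance of ARIMA(0,1,1): sigma^2 * (1 + (h - 1) * alpha^2)
h = c(1, 5, 20)
half_width_95 = qnorm(0.975) * sqrt(sigma2 * (1 + (h - 1) * alpha^2))
round(half_width_95, 2)
```

The first weight is alpha itself, so most of the forecast comes from the latest price, and the weights shrink geometrically after that. The half-widths grow with the horizon h, which matches the fan shape of the plotted intervals.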

#ETS
novartisets = ets(novartis$NVS.Open)
plot(forecast(novartisets, h = 20))

The forecast of the ETS model is also similar to those of the two ARIMA models.

Convert to a regular time series

Since the stock market closes on weekends and holidays, the observations are not regularly spaced. Fixing this issue could give us more robust results.

  1. First, transform the data into a data frame format
  2. Create a df column of complete dates in 2015
  3. Merge the two, and find out which days are missing
  4. Remove weekends, since there should not be any transactions on weekends.
  5. Impute the remaining missing values

#1. Conversion to data frame
novartis = as.data.frame(novartis)

# Adding the rownames as date
novartis$Date = rownames(novartis)
novartis$Date = as.Date(novartis$Date)
head(novartis)
##            NVS.Open NVS.High  NVS.Low NVS.Close NVS.Volume NVS.Adjusted
## 2015-01-02 83.18101 83.37814 82.44624  82.66129     807872     62.11983
## 2015-01-05 83.71864 83.73656 82.59856  82.84050    1537848     62.25450
## 2015-01-06 82.77778 83.09140 81.52330  82.14158    1316992     61.72925
## 2015-01-07 81.85484 82.70609 81.76524  82.50000    1598447     61.99861
## 2015-01-08 84.38172 85.72581 84.22939  85.36739    2156782     64.15344
## 2015-01-09 86.43369 86.74731 85.77957  86.12903    2206890     64.72582
##                  Date
## 2015-01-02 2015-01-02
## 2015-01-05 2015-01-05
## 2015-01-06 2015-01-06
## 2015-01-07 2015-01-07
## 2015-01-08 2015-01-08
## 2015-01-09 2015-01-09
#2. Creating the date column of complete dates in 2015
mydates = seq.Date(from = as.Date("2015-01-01"), 
                to = as.Date("2016-01-01"), 
                by = 1)

# Converting to a df (required for the merge)
mydates = data.frame(Date = mydates)
#3. Merge novartis with 'mydates'
mydata = merge(novartis, mydates, by = "Date", all.y = TRUE) # all.y = TRUE keeps every calendar date; merge() defaults to an inner join
# 3.9 Removing initial days to start on Monday, since 1/1/2015 is a Thursday
mydata = mydata[5:366,]
#4. Removing weekends (by row position, which relies on the data starting on a Monday)
## Sundays
mydata = mydata[-(seq(from = 7, to = nrow(mydata), by = 7)),]
## Saturdays
mydata = mydata[-(seq(from = 6, to = nrow(mydata), by = 6)),]
#5. Impute with last observation
mydata = na.locf(mydata)
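
The positional weekend removal above only works if the first row is a Monday and no calendar day is missing. As a hedged alternative, the sketch below runs the same steps on a small made-up data frame and drops weekends by weekday number instead. The `prices` data and the `locf()` helper are hypothetical, written in base R for illustration; the code above uses `na.locf()` from zoo for the same purpose.

```r
# Toy stand-in for the downloaded prices: trading days only
# (values are made up; 2015-01-08 plays the role of a holiday).
prices = data.frame(
  Date  = as.Date(c("2015-01-05", "2015-01-06", "2015-01-07",
                    "2015-01-09", "2015-01-12")),
  Close = c(10, 11, 12, 13, 14)
)

# Complete calendar over the same span
calendar = data.frame(Date = seq(min(prices$Date), max(prices$Date), by = "day"))

# Right join: keep every calendar date, NA where there was no trading
full = merge(prices, calendar, by = "Date", all.y = TRUE)

# Drop weekends by weekday number (%u: 1 = Monday, ..., 7 = Sunday)
dow = as.integer(format(full$Date, "%u"))
weekdays_only = full[dow <= 5, ]

# Hypothetical helper: last observation carried forward.
# Assumes the first value is not missing.
locf = function(x) {
  last_seen = cumsum(!is.na(x))
  x[!is.na(x)][last_seen]
}
weekdays_only$Close = locf(weekdays_only$Close)
weekdays_only
```

Filtering on `%u` does not depend on the locale or on where the series starts, so the same lines keep working if the date range changes.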

Which days are the ones best to buy or sell?

With the data preprocessed, it’s time to look at the price pattern and find the best days to buy or sell Novartis stock.

# Putting the daily high price into a time series with a weekly cycle (5 trading days)
highestprice = ts(as.numeric(mydata$NVS.High), 
                  frequency = 5)
# Various plots
seasonplot(highestprice, season.labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))

monthplot(highestprice)

monthplot(highestprice, base = median, col.base = "red")

plot(stl(highestprice, s.window = "periodic"))

From the season plot, the stock price does not show a clear seasonal pattern, so there is probably no specific day of the week on which to buy or sell the stock. We did, however, see a slightly lower price on Wednesdays.

# Comparison with the low prices
lowestprice = ts(as.numeric(mydata$NVS.Low), 
                 frequency = 5)
monthplot(lowestprice, base = median, col.base = "red")

monthplot(highestprice, base = median, col.base = "red")

The low prices show the same picture. The seasonal pattern is not clear, so buying or selling the stock on a specific weekday is probably not a good strategy. If anything, Wednesday could be the day to buy, since on average the price is slightly lower than on other days.
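
The weekday comparison read off the plots can also be expressed as numbers. The sketch below uses made-up prices for three 5-day weeks rather than the Novartis data, and averages them by position in the week with `cycle()` and `tapply()`.

```r
# Made-up prices for three 5-day weeks (illustrative only)
toy = ts(c(10, 11,  9, 12, 13,
           11, 12, 10, 13, 14,
           12, 13, 11, 14, 15),
         frequency = 5)

# cycle() gives each observation's position within the week (1 = first day)
day = factor(as.integer(cycle(toy)), levels = 1:5,
             labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))

# Average price per weekday, and the weekday with the lowest average
day_means = tapply(as.numeric(toy), day, mean)
day_means
names(which.min(day_means))
```

Applied to `lowestprice` or `highestprice`, the same two calls would give the per-weekday averages behind the plots. Raw averages like these mix in the trend, so a small gap between weekdays should be read with caution.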