In this project, I used ARIMA and ETS models to forecast Novartis’s stock price, fitting them to the daily prices from 2015.
#Set the language used in the output
Sys.setlocale("LC_TIME","C")
## [1] "C"
#Load in the library needed
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# Download the 2015 daily prices for Novartis (ticker NVS) as an xts object
novartis = getSymbols("NVS", auto.assign = FALSE,
from = "2015-01-01", to = "2016-01-01")
plot(as.ts(novartis$NVS.Open))
Overall, the line plot shows a clear trend in the stock price, but no obvious seasonal pattern.
# Accommodate the irregular spacing of financial data
chartSeries(novartis,type = 'line')
This plot shows the price with the days on which the stock market is closed skipped; the lower panel shows the trading volume.
library(forecast)
ggtsdisplay(novartis$NVS.Open)
The ACF plot shows clear autocorrelation, but it is hard to find a cut-off point from which to set an order for the ARIMA model. The PACF plot, on the other hand, has a clear cut-off at lag 1, so p should equal 1.
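As a rough illustration of reading an AR order off the PACF, the sketch below simulates a hypothetical AR(1) series (not the Novartis data) and lists the lags whose partial autocorrelation exceeds the approximate 95% bound drawn in ACF/PACF plots. The coefficient 0.8, the seed, and the sample size are assumptions chosen only for the example.

```r
# Hypothetical example: simulated AR(1) series, not the Novartis prices
set.seed(1)
n <- 500
x <- arima.sim(model = list(ar = 0.8), n = n)

# Partial autocorrelations for lags 1..10, without drawing the plot
p <- pacf(x, lag.max = 10, plot = FALSE)
pacf_values <- as.numeric(p$acf)

# Approximate 95% significance bound used in ACF/PACF plots
bound <- 1.96 / sqrt(n)

# Lags whose partial autocorrelation falls outside the bound
significant_lags <- which(abs(pacf_values) > bound)
print(significant_lags)
```

For an AR(1) process only lag 1 is expected to stand out, although an occasional higher lag can cross the bound by chance.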
#ARIMA model
novartsarima = auto.arima(novartis$NVS.Open,
stepwise = TRUE,
approximation = FALSE,
trace = TRUE)
##
## ARIMA(2,1,2) with drift : 789.8915
## ARIMA(0,1,0) with drift : 789.3865
## ARIMA(1,1,0) with drift : 787.3966
## ARIMA(0,1,1) with drift : 786.4334
## ARIMA(0,1,0) : 787.4521
## ARIMA(1,1,1) with drift : 786.4954
## ARIMA(0,1,2) with drift : 786.4976
## ARIMA(1,1,2) with drift : 788.4009
## ARIMA(0,1,1) : 784.5149
## ARIMA(1,1,1) : 784.6786
## ARIMA(0,1,2) : 784.5871
## ARIMA(1,1,0) : 785.468
## ARIMA(1,1,2) : 786.4657
##
## Best model: ARIMA(0,1,1)
novartsarima
## Series: novartis$NVS.Open
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.1553
## s.e. 0.0682
##
## sigma^2 = 1.317: log likelihood = -390.23
## AIC=784.47 AICc=784.51 BIC=791.52
The auto.arima function picked ARIMA(0,1,1) as the most suitable model. However, this model has no autoregressive term, and since the PACF plot did show an autoregressive pattern, it is worth looking at a model that includes one.
#ARIMA model 2: with autoregressive part
novartsarima2 = Arima(novartis$NVS.Open, order = c(1,1,1))
novartsarima2
## Series: novartis$NVS.Open
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.5904 -0.7247
## s.e. 0.3136 0.2723
##
## sigma^2 = 1.312: log likelihood = -389.29
## AIC=784.58 AICc=784.68 BIC=795.16
Here’s the summary of the ARIMA(1,1,1) model. Its AIC (784.58) is slightly higher than, but still close to, that of the model auto.arima picked (784.47).
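The two AIC values can be checked by hand from the log likelihoods printed in the summaries. The sketch below assumes the usual definition AIC = 2k - 2 log L, with k counting the estimated coefficients plus the innovation variance; the helper name `aic_from_loglik` is made up for this example.

```r
# Hypothetical helper: AIC from a log likelihood and a parameter count
aic_from_loglik <- function(loglik, k) 2 * k - 2 * loglik

# ARIMA(0,1,1): one coefficient (ma1) plus sigma^2, so k = 2
aic_011 <- aic_from_loglik(-390.23, k = 2)

# ARIMA(1,1,1): two coefficients (ar1, ma1) plus sigma^2, so k = 3
aic_111 <- aic_from_loglik(-389.29, k = 3)

print(c(arima011 = aic_011, arima111 = aic_111))
```

The small gap to the printed 784.47 comes from the log likelihood being rounded to two decimals. The extra AR term improves the likelihood a little, but not enough to pay for the additional parameter.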
#Forecast 20 steps ahead with the two ARIMA models
plot(forecast(novartsarima, h = 20))
plot(forecast(novartsarima2, h = 20))
The point forecasts of the two models differ very little. The main difference is the prediction interval, which is wider for ARIMA(0,1,1). That is because this model treats the last observation as indicative of the next one; earlier observations don’t matter much.
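To see where the difference in interval width comes from, the forecast variance at horizon h can be approximated from the fitted coefficients alone. The sketch below is a back-of-the-envelope check, assuming normal errors and ignoring parameter uncertainty; it uses `ARMAtoMA` from base R's stats package and the coefficients and sigma^2 values printed in the two summaries.

```r
# Forecast horizon used in the plots
h <- 20

# psi-weights of the ARMA part (psi_0 = 1 is added by hand)
# ARIMA(0,1,1): ma1 = -0.1553, sigma^2 = 1.317
psi_011 <- c(1, ARMAtoMA(ma = -0.1553, lag.max = h - 1))
# ARIMA(1,1,1): ar1 = 0.5904, ma1 = -0.7247, sigma^2 = 1.312
psi_111 <- c(1, ARMAtoMA(ar = 0.5904, ma = -0.7247, lag.max = h - 1))

# With d = 1, the weights of the integrated model are cumulative sums,
# and the h-step forecast variance is sigma^2 times their sum of squares
var_011 <- 1.317 * sum(cumsum(psi_011)^2)
var_111 <- 1.312 * sum(cumsum(psi_111)^2)

# Approximate half-width of the 95% prediction interval at horizon h
half_width <- 1.96 * sqrt(c(arima011 = var_011, arima111 = var_111))
print(half_width)
```

In ARIMA(0,1,1) every shock keeps about 84% of its size forever (1 + ma1), whereas in the fitted ARIMA(1,1,1) the long-run effect of a shock settles at a smaller value, so its forecast variance grows more slowly.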
#ETS
novartisets = ets(novartis$NVS.Open)
plot(forecast(novartisets, h = 20))
The result of the ETS model was also similar to the previous two models.
Since the stock market closes on holidays and weekends, the price series is not regularly spaced. Fixing this issue could give us a more robust result.
#1. Conversion to dataframe
novartis = as.data.frame(novartis)
# Adding the rownames as date
novartis$Date = rownames(novartis)
novartis$Date = as.Date(novartis$Date)
head(novartis)
## NVS.Open NVS.High NVS.Low NVS.Close NVS.Volume NVS.Adjusted
## 2015-01-02 83.18101 83.37814 82.44624 82.66129 807872 62.11983
## 2015-01-05 83.71864 83.73656 82.59856 82.84050 1537848 62.25450
## 2015-01-06 82.77778 83.09140 81.52330 82.14158 1316992 61.72925
## 2015-01-07 81.85484 82.70609 81.76524 82.50000 1598447 61.99861
## 2015-01-08 84.38172 85.72581 84.22939 85.36739 2156782 64.15344
## 2015-01-09 86.43369 86.74731 85.77957 86.12903 2206890 64.72582
## Date
## 2015-01-02 2015-01-02
## 2015-01-05 2015-01-05
## 2015-01-06 2015-01-06
## 2015-01-07 2015-01-07
## 2015-01-08 2015-01-08
## 2015-01-09 2015-01-09
#2. Creating the date column of complete dates in 2015
mydates = seq.Date(from = as.Date("2015-01-01"),
to = as.Date("2016-01-01"),
by = 1)
# Converting to a df (required for the merge)
mydates = data.frame(Date = mydates)
#3. Merge novartis with 'mydates'
mydata = merge(novartis, mydates, by = "Date", all.y = TRUE) # all.y = TRUE keeps every calendar date (right join); merge() defaults to an inner join
# 3.9 Dropping the first four days so the series starts on a Monday, since 1/1/2015 is a Thursday
mydata = mydata[5:366,]
#4. Removing weekends
## Sundays: every 7th row, since row 1 is a Monday
mydata = mydata[-(seq(from = 7, to = nrow(mydata), by = 7)),]
## Saturdays: every 6th row once the Sundays are gone
mydata = mydata[-(seq(from = 6, to = nrow(mydata), by = 6)),]
#5. Impute with last observation
mydata = na.locf(mydata)
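The positional weekend removal above relies on the first row being a Monday. As an alternative sketch (not the approach used in this post), weekends can be dropped by looking at the date itself, which keeps working if the start date changes. The toy data frame, its column names, and the base-R fill are assumptions made for this example only.

```r
# Hypothetical toy data: three trading days in the first full week of 2015
prices <- data.frame(
  Date  = as.Date(c("2015-01-05", "2015-01-06", "2015-01-08")),
  Close = c(10, 11, 12)
)

# Complete calendar for that week, Monday to Sunday
calendar <- data.frame(
  Date = seq.Date(as.Date("2015-01-05"), as.Date("2015-01-11"), by = "day")
)

# Right join: keep every calendar date, missing prices become NA
full <- merge(prices, calendar, by = "Date", all.y = TRUE)

# "%u" is the ISO weekday number (1 = Monday, ..., 7 = Sunday),
# so this keeps Monday to Friday regardless of locale
weekdays_only <- full[as.integer(format(full$Date, "%u")) <= 5, ]

# Last observation carried forward in base R:
# index of the most recent non-missing value at each position
idx <- cummax(seq_along(weekdays_only$Close) * !is.na(weekdays_only$Close))
idx[idx == 0] <- NA  # leading gaps have nothing to carry forward
weekdays_only$Close <- weekdays_only$Close[idx]

print(weekdays_only)
```

In the toy week, Wednesday and Friday have no quote, so they inherit the Tuesday and Thursday prices respectively.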
With the data preprocessed, it’s time to look at the weekly price pattern and see whether there is a best day of the week to buy or sell Novartis stock.
# Putting the daily high price into a time series with a 5-day (weekly) cycle
highestprice = ts(as.numeric(mydata$NVS.High),
frequency = 5)
# Various plots
seasonplot(highestprice, season.labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
monthplot(highestprice)
monthplot(highestprice, base = median, col.base = "red")
plot(stl(highestprice, s.window = "periodic"))
From the season plot, the stock price does not show a clear seasonal pattern, so there is probably no specific day of the week that is best for selling or buying the stock. We did, however, see a lower price on Wednesdays.
# Comparison with the low prices
lowestprice = ts(as.numeric(mydata$NVS.Low),
frequency = 5)
monthplot(lowestprice, base = median, col.base = "red")
monthplot(highestprice, base = median, col.base = "red")
The low prices show the same picture. The seasonal pattern is not clear, so buying or selling on a specific day is probably not a good strategy. Still, Wednesday could be the day to buy, since on average the price is slightly lower than on the other days.
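The day-of-week comparison that monthplot draws can also be read as numbers. The sketch below uses a hypothetical toy series (not the Novartis data) with a frequency of 5 and averages it by position in the week; the values and day labels are assumptions for the example.

```r
# Hypothetical toy series: 4 weeks of 5 trading days, lower on the third day
toy <- ts(rep(c(10, 10, 9, 10, 10), times = 4), frequency = 5)

# cycle() gives the position within each week (1 = Mon, ..., 5 = Fri here)
day_index <- as.integer(cycle(toy))

# Average price per weekday
day_means <- tapply(as.numeric(toy), day_index, mean)
names(day_means) <- c("Mon", "Tue", "Wed", "Thu", "Fri")
print(day_means)

# Weekday with the lowest average price
cheapest_day <- names(which.min(day_means))
print(cheapest_day)
```

Passing `median` instead of `mean` to tapply would give the per-day medians that the red base lines in the monthplot calls represent.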