For this assignment, I picked monthly stock price data for the company Tesla (TSLA). Source: https://finance.yahoo.com/quote/TSLA/history?period1=1436745600&period2=1594598400&interval=1mo&filter=history&frequency=1mo
#Reading our dataset
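The analysis assumes the monthly CSV exported from the link above has been saved locally and that the forecast package is installed; the loading sketch below is minimal, and the file name is an assumption rather than part of the original code.
library(forecast) #provides ets(), tslm(), auto.arima(), nnetar(), forecast() and accuracy()
tsla = read.csv("TSLA.csv") #assumed file name for the Yahoo Finance monthly export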
#Summary of the dataset
str(tsla)
## 'data.frame': 60 obs. of 7 variables:
## $ Date : Factor w/ 60 levels "2015-08-01","2015-09-01",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Open : num 266 240 248 209 231 ...
## $ High : num 271 272 250 235 244 ...
## $ Low : num 195 237 202 206 215 ...
## $ Close : num 249 248 207 230 240 ...
## $ Adj.Close: num 249 248 207 230 240 ...
## $ Volume : int 115264100 80288100 100812000 78488400 59845500 79247200 133705800 102922000 135507300 103307500 ...
head(tsla)
## Date Open High Low Close Adj.Close Volume
## 1 2015-08-01 266.29 271.00 195.00 249.06 249.06 115264100
## 2 2015-09-01 240.34 271.57 236.97 248.40 248.40 80288100
## 3 2015-10-01 247.51 249.84 202.00 206.93 206.93 100812000
## 4 2015-11-01 208.92 234.58 205.80 230.26 230.26 78488400
## 5 2015-12-01 231.06 243.63 214.87 240.01 240.01 59845500
## 6 2016-01-01 230.72 231.38 182.41 191.20 191.20 79247200
tail(tsla)
## Date Open High Low Close Adj.Close Volume
## 55 2020-02-01 673.69 968.99 611.52 667.99 667.99 472968700
## 56 2020-03-01 711.26 806.98 350.51 524.00 524.00 420935000
## 57 2020-04-01 504.00 869.82 446.40 781.88 781.88 381490600
## 58 2020-05-01 755.00 843.29 683.04 835.00 835.00 272779000
## 59 2020-06-01 858.00 1087.69 854.10 1079.81 1079.81 255718100
## 60 2020-07-01 1083.00 1548.92 1080.50 1544.65 1544.65 123946500
#Create our time series
tsla.ts = ts(tsla$Adj.Close, frequency=12, start=c(2015,8))
plot(tsla.ts)
As we can see, starting in 2020 the stock price rises exponentially.
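A quick way to confirm this visually (not part of the original analysis) is to redraw the series on a logarithmic y-axis, where exponential growth shows up as a roughly straight upward segment.
plot(tsla.ts, log="y") #log-scaled y-axis; the 2020 run-up looks close to linear on this scale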
#Splitting our time series into training and testing sets.
train.ts = window(tsla.ts, end=c(2020,1))
test.ts = window(tsla.ts, start=c(2020,2)) #test window starts right after the training window ends
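As a quick sanity check on the split (assuming the test window starts in February 2020, as above), the two windows should contain 54 and 6 monthly observations respectively.
length(train.ts) #54 observations: August 2015 through January 2020
length(test.ts) #6 observations: February 2020 through July 2020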
#Fitting Models
fit.ETS = ets(train.ts, model="ZZZ")
fit.LR = tslm(train.ts ~ trend)
fit.Arima = auto.arima(train.ts)
fit.nnet = nnetar(train.ts)
#ETS Model Fit
fit.ETS
## ETS(M,N,N)
##
## Call:
## ets(y = train.ts, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 244.0315
##
## sigma: 0.1481
##
## AIC AICc BIC
## 617.2689 617.7489 623.2358
The AIC, AICc, and BIC are useful criteria for judging which model fits the dataset better; the smaller their values, the better the fit.
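These criteria can also be pulled directly from the fitted object rather than read off the printout; the component names below are the ones used by the forecast package's ets objects (an assumption worth checking if the package version differs).
c(AIC = fit.ETS$aic, AICc = fit.ETS$aicc, BIC = fit.ETS$bic) #same values as in the summary above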
#Linear Regression Fit
fit.LR
##
## Call:
## tslm(formula = train.ts ~ trend)
##
## Coefficients:
## (Intercept) trend
## 216.924 2.368
Even though we know beforehand that a linear regression is not the best machine learning model for this kind of data, it still gives us a little useful information: the fitted trend line rises by about 2.37 per month from a baseline of about 217.
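A quick worked reading of those coefficients, assuming the trend index runs over the 54 monthly training observations (August 2015 through January 2020):
216.924 + 2.368 * 54 #about 344.8, the trend line's level at the last training month (t = 54)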
#Arima Model Fit
fit.Arima
## Series: train.ts
## ARIMA(0,1,0)
##
## sigma^2 estimated as 2161: log likelihood=-278.68
## AIC=559.36 AICc=559.44 BIC=561.33
As our results show, the AIC, AICc, and BIC of the Arima model are lower than those of the ETS model, which suggests that, so far, the Arima model fits the training data best (keeping in mind that information criteria are only strictly comparable between models of the same class).
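It is also worth noting what the selected ARIMA(0,1,0) model actually is: with no drift term it is a random walk, so its point forecasts are simply flat at the last observed training value.
tail(train.ts, 1) #last training observation; the ARIMA(0,1,0) point forecasts stay at this level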
#Neural Network Model Fit
fit.nnet
## Series: train.ts
## Model: NNAR(1,1,2)[12]
## Call: nnetar(y = train.ts)
##
## Average of 20 networks, each of which is
## a 2-2-1 network with 9 weights
## options were - linear output units
##
## sigma^2 estimated as 884.1
The selected Neural Network model is NNAR(1,1,2)[12], which means that the model uses 1 lagged value, 1 seasonal lag, and 2 hidden nodes (the 2-2-1 network with 9 weights reported above).
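Because nnetar() averages 20 networks that start from random weights, refitting can give slightly different numbers on each run; setting a seed before fitting makes the result reproducible (the seed value below is arbitrary and not part of the original analysis).
set.seed(123) #arbitrary seed, purely for reproducibility
fit.nnet = nnetar(train.ts)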
##Forecasting results based on models
#Forecast from ETS Model
fcast.ETS = forecast(fit.ETS,h=24)
plot(fcast.ETS)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
#Forecast from Linear Regression Model
fcast.LR = forecast(fit.LR,h=24)
plot(fcast.LR)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
#Forecast from Arima Model
fcast.Arima = forecast(fit.Arima,h=24)
plot(fcast.Arima)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
#Forecast from Neural Network Model
fcast.nnet = forecast(fit.nnet,h=24)
plot(fcast.nnet)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
Visually speaking, the neural network model seems to do the best job of predicting our stock's values over the forecast horizon: its forecast shows more variation than those of the other models.
To verify whether this impression is correct, we can test the accuracy of each model.
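For a side-by-side visual check, the point forecasts of the four models can also be overlaid on the full series; this is a plotting sketch not in the original analysis, and the colors are arbitrary.
plot(tsla.ts, col="gray", main="Point forecasts of all four models") #actual series in gray
lines(fcast.ETS$mean, col="blue")
lines(fcast.LR$mean, col="green")
lines(fcast.Arima$mean, col="purple")
lines(fcast.nnet$mean, col="orange")
legend("topleft", lty=1, col=c("gray","blue","green","purple","orange"), c("actual values","ETS","LR","Arima","NNAR"))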
##Determining the best model
acc.ETS = accuracy(fcast.ETS, test.ts)
acc.LR = accuracy(fcast.LR, test.ts)
acc.Arima = accuracy(fcast.Arima, test.ts)
acc.nnet = accuracy(fcast.nnet, test.ts)
#Accuracy of ETS Model
acc.ETS
## ME RMSE MAE MPE MAPE MASE
## Training set 7.528815 46.06245 30.68279 0.9234048 10.32518 0.4442856
## Test set 255.008274 418.58441 297.19052 19.1643262 27.21437 4.3033076
## ACF1 Theil's U
## Training set 0.1499998 NA
## Test set 0.3715459 1.398568
#Accuracy of Linear Regression Model
acc.LR
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0000 65.86697 45.27646 -4.291222 16.19798 0.6556014
## Test set 552.4564 642.64987 552.45637 56.268253 56.26825 7.9995476
## ACF1 Theil's U
## Training set 0.5575392 NA
## Test set 0.3687543 2.333898
#Accuracy of Arima Model
acc.Arima
## ME RMSE MAE MPE MAPE MASE
## Training set 7.439983 46.05665 30.59443 0.8878493 10.28974 0.4430061
## Test set 254.985006 418.57024 297.17501 19.1614349 27.21296 4.3030830
## ACF1 Theil's U
## Training set 0.1500592 NA
## Test set 0.3715459 1.398513
#Accuracy of Neural Network Model
acc.nnet
## ME RMSE MAE MPE MAPE
## Training set 2.445616e-02 29.73388 23.93492 -1.273613 8.269153
## Test set -1.773466e+03 1803.46812 1773.46625 -230.636948 230.636948
## MASE ACF1 Theil's U
## Training set 0.3465767 0.09116791 NA
## Test set 25.6797248 0.10969703 8.108521
To determine which model is the most accurate, we look at the MAPE (Mean Absolute Percentage Error) on the “Test set” line; the best model has the lowest MAPE.
Arima model: 27.21296 % <===
ETS model: 27.21437 %
LR model: 56.26825 %
NN model: 230.636948 %
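These test-set MAPEs can also be collected programmatically from the accuracy matrices printed above (the row and column names below are the ones accuracy() uses).
c(ETS = acc.ETS["Test set", "MAPE"],
  LR = acc.LR["Test set", "MAPE"],
  Arima = acc.Arima["Test set", "MAPE"],
  NN = acc.nnet["Test set", "MAPE"])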
As our results show, the Arima model is the best model for forecasting our stock's values, since its MAPE is the lowest (only slightly lower than that of the ETS model).
In hindsight, we could not have been further from the truth when we said that the Neural Network model was the best model for forecasting the stock's values. Indeed, its MAPE is the highest among our models, which clearly shows that we should never rely on the visual appearance of a forecast alone to judge a model's performance.