For this assignment, I picked monthly stock price data for the company Tesla (TSLA). Source: https://finance.yahoo.com/quote/TSLA/history?period1=1436745600&period2=1594598400&interval=1mo&filter=history&frequency=1mo
#Reading our dataset
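The analysis assumes the monthly CSV exported from the link above has been saved locally and that the forecast package is installed; the loading sketch below is minimal, and the file name is an assumption rather than part of the original code.
library(forecast) #provides ets(), tslm(), auto.arima(), nnetar(), forecast() and accuracy()
tsla = read.csv("TSLA.csv") #assumed file name for the Yahoo Finance monthly export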
#Summary of the dataset
str(tsla)
## 'data.frame': 60 obs. of 7 variables:
## $ Date : Factor w/ 60 levels "2015-08-01","2015-09-01",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Open : num 266 240 248 209 231 ...
## $ High : num 271 272 250 235 244 ...
## $ Low : num 195 237 202 206 215 ...
## $ Close : num 249 248 207 230 240 ...
## $ Adj.Close: num 249 248 207 230 240 ...
## $ Volume : int 115264100 80288100 100812000 78488400 59845500 79247200 133705800 102922000 135507300 103307500 ...
head(tsla)
## Date Open High Low Close Adj.Close Volume
## 1 2015-08-01 266.29 271.00 195.00 249.06 249.06 115264100
## 2 2015-09-01 240.34 271.57 236.97 248.40 248.40 80288100
## 3 2015-10-01 247.51 249.84 202.00 206.93 206.93 100812000
## 4 2015-11-01 208.92 234.58 205.80 230.26 230.26 78488400
## 5 2015-12-01 231.06 243.63 214.87 240.01 240.01 59845500
## 6 2016-01-01 230.72 231.38 182.41 191.20 191.20 79247200
tail(tsla)
## Date Open High Low Close Adj.Close Volume
## 55 2020-02-01 673.69 968.99 611.52 667.99 667.99 472968700
## 56 2020-03-01 711.26 806.98 350.51 524.00 524.00 420935000
## 57 2020-04-01 504.00 869.82 446.40 781.88 781.88 381490600
## 58 2020-05-01 755.00 843.29 683.04 835.00 835.00 272779000
## 59 2020-06-01 858.00 1087.69 854.10 1079.81 1079.81 255718100
## 60 2020-07-01 1083.00 1548.92 1080.50 1544.65 1544.65 123946500
#Create our time series
tsla.ts = ts(tsla$Adj.Close, frequency=12, start=c(2015,8))
plot(tsla.ts)
As we can see, starting in 2020 the stock price rises exponentially.
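A quick way to confirm this visually (not part of the original analysis) is to redraw the series on a logarithmic y-axis, where exponential growth shows up as a roughly straight upward segment.
plot(tsla.ts, log="y") #log-scaled y-axis; the 2020 run-up looks close to linear on this scale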
#Splitting our time series into training and testing sets.
train.ts = window(tsla.ts, end=c(2020,1))
test.ts = window(tsla.ts, start=c(2020,2)) #test window starts right after the training window ends
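As a quick sanity check on the split (assuming the test window starts in February 2020, as above), the two windows should contain 54 and 6 monthly observations respectively.
length(train.ts) #54 observations: August 2015 through January 2020
length(test.ts) #6 observations: February 2020 through July 2020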
#Fitting Models
fit.ETS = ets(train.ts, model="ZZZ")
fit.LR = tslm(train.ts ~ trend)
fit.Arima = auto.arima(train.ts)
fit.nnet = nnetar(train.ts)
#ETS Model Fit
fit.ETS
## ETS(M,N,N)
##
## Call:
## ets(y = train.ts, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 244.0315
##
## sigma: 0.1481
##
## AIC AICc BIC
## 617.2689 617.7489 623.2358
The AIC, AICc, and BIC are useful criteria for judging which model fits the dataset better; the smaller their values, the better the fit.
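These criteria can also be pulled directly from the fitted object rather than read off the printout; the component names below are the ones used by the forecast package's ets objects (an assumption worth checking if the package version differs).
c(AIC = fit.ETS$aic, AICc = fit.ETS$aicc, BIC = fit.ETS$bic) #same values as in the summary above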
#Linear Regression Fit
fit.LR
##
## Call:
## tslm(formula = train.ts ~ trend)
##
## Coefficients:
## (Intercept) trend
## 216.924 2.368
Even though we know beforehand that a linear regression is not the best machine learning model for this kind of data, it still gives us a little useful information: the fitted trend line rises by about 2.37 per month from a baseline of about 217.
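A quick worked reading of those coefficients, assuming the trend index runs over the 54 monthly training observations (August 2015 through January 2020):
216.924 + 2.368 * 54 #about 344.8, the trend line's level at the last training month (t = 54)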
#Arima Model Fit
fit.Arima
## Series: train.ts
## ARIMA(0,1,0)
##
## sigma^2 estimated as 2161: log likelihood=-278.68
## AIC=559.36 AICc=559.44 BIC=561.33
As our results show, the AIC, AICc, and BIC of the Arima model are lower than those of the ETS model, which suggests that, so far, the Arima model fits the training data best (keeping in mind that information criteria are only strictly comparable between models of the same class).
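It is also worth noting what the selected ARIMA(0,1,0) model actually is: with no drift term it is a random walk, so its point forecasts are simply flat at the last observed training value.
tail(train.ts, 1) #last training observation; the ARIMA(0,1,0) point forecasts stay at this level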
#Neural Network Model Fit
fit.nnet
## Series: train.ts
## Model: NNAR(1,1,2)[12]
## Call: nnetar(y = train.ts)
##
## Average of 20 networks, each of which is
## a 2-2-1 network with 9 weights
## options were - linear output units
##
## sigma^2 estimated as 884.1
The selected Neural Network model is NNAR(1,1,2)[12], which means that the model uses 1 lagged value, 1 seasonal lag, and 2 hidden nodes (the 2-2-1 network with 9 weights reported above).
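Because nnetar() averages 20 networks that start from random weights, refitting can give slightly different numbers on each run; setting a seed before fitting makes the result reproducible (the seed value below is arbitrary and not part of the original analysis).
set.seed(123) #arbitrary seed, purely for reproducibility
fit.nnet = nnetar(train.ts)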
##Forecasting results based on models
#Forecast from ETS Model
fcast.ETS = forecast(fit.ETS,h=24)
plot(fcast.ETS)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
#Forecast from Linear Regression Model
fcast.LR = forecast(fit.LR,h=24)
plot(fcast.LR)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
#Forecast from Arima Model
fcast.Arima = forecast(fit.Arima,h=24)
plot(fcast.Arima)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
#Forecast from Neural Network Model
fcast.nnet = forecast(fit.nnet,h=24)
plot(fcast.nnet)
lines(test.ts, col="red")
legend("topleft",lty=1,col=c("red","blue"),c("actual values","forecast"))
Visually speaking, the neural network model seems to do the best job of predicting our stock's values over the forecast horizon: its forecast shows more variation than those of the other models.
To verify whether this impression is correct, we can test the accuracy of each model.
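For a side-by-side visual check, the point forecasts of the four models can also be overlaid on the full series; this is a plotting sketch not in the original analysis, and the colors are arbitrary.
plot(tsla.ts, col="gray", main="Point forecasts of all four models") #actual series in gray
lines(fcast.ETS$mean, col="blue")
lines(fcast.LR$mean, col="green")
lines(fcast.Arima$mean, col="purple")
lines(fcast.nnet$mean, col="orange")
legend("topleft", lty=1, col=c("gray","blue","green","purple","orange"), c("actual values","ETS","LR","Arima","NNAR"))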
##Determining the best model
acc.ETS = accuracy(fcast.ETS, test.ts)
acc.LR = accuracy(fcast.LR, test.ts)
acc.Arima = accuracy(fcast.Arima, test.ts)
acc.nnet = accuracy(fcast.nnet, test.ts)
#Accuracy of ETS Model
acc.ETS
## ME RMSE MAE MPE MAPE MASE
## Training set 7.528815 46.06245 30.68279 0.9234048 10.32518 0.4442856
## Test set 255.008274 418.58441 297.19052 19.1643262 27.21437 4.3033076
## ACF1 Theil's U
## Training set 0.1499998 NA
## Test set 0.3715459 1.398568
#Accuracy of Linear Regression Model
acc.LR
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0000 65.86697 45.27646 -4.291222 16.19798 0.6556014
## Test set 552.4564 642.64987 552.45637 56.268253 56.26825 7.9995476
## ACF1 Theil's U
## Training set 0.5575392 NA
## Test set 0.3687543 2.333898
#Accuracy of Arima Model
acc.Arima
## ME RMSE MAE MPE MAPE MASE
## Training set 7.439983 46.05665 30.59443 0.8878493 10.28974 0.4430061
## Test set 254.985006 418.57024 297.17501 19.1614349 27.21296 4.3030830
## ACF1 Theil's U
## Training set 0.1500592 NA
## Test set 0.3715459 1.398513
#Accuracy of Neural Network Model
acc.nnet
## ME RMSE MAE MPE MAPE
## Training set 2.445616e-02 29.73388 23.93492 -1.273613 8.269153
## Test set -1.773466e+03 1803.46812 1773.46625 -230.636948 230.636948
## MASE ACF1 Theil's U
## Training set 0.3465767 0.09116791 NA
## Test set 25.6797248 0.10969703 8.108521
To determine which model is the most accurate, we look at the MAPE (Mean Absolute Percentage Error) on the “Test set” line; the best model has the lowest MAPE.
Arima model: 27.21296 % <===
ETS model: 27.21437 %
LR model: 56.26825 %
NN model: 230.636948 %
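These test-set MAPEs can also be collected programmatically from the accuracy matrices printed above (the row and column names below are the ones accuracy() uses).
c(ETS = acc.ETS["Test set", "MAPE"],
  LR = acc.LR["Test set", "MAPE"],
  Arima = acc.Arima["Test set", "MAPE"],
  NN = acc.nnet["Test set", "MAPE"])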
As our results show, the Arima model is the best model for forecasting our stock's values, since its MAPE is the lowest (only slightly lower than that of the ETS model).
In hindsight, we could not have been further from the truth when we said that the Neural Network model was the best model for forecasting the stock's values. Indeed, its MAPE is the highest among our models, which clearly shows that we should never rely on the visual appearance of a forecast alone to judge a model's performance.