1 Introduction

In the following report we attempt to forecast the closing stock price of the publicly traded semiconductor company NVIDIA (ticker: NVDA). The report includes a preliminary analysis of the stock, a split of the data into training and testing sets, and the construction of four forecasting models, whose accuracy is then assessed against the testing data.

2 Analysis

This section contains four subsections:

  1. a preliminary analysis where the closing stock price will be plotted and discussed
  2. a split of the data into training and testing data and the construction of four different forecasting models
  3. a graphical representation of the forecasted outputs
  4. a determination of the best model using various accuracy measures

2.1 Preliminary Analysis

In this section, we first import NVIDIA’s daily closing stock prices from Yahoo Finance and then plot them to show the stock’s historical closing price over roughly the past two years (June 21, 2021 to April 14, 2023).

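library(quantmod) #getSymbols() for downloading price data
library(forecast) #meanf(), naive(), snaive(), rwf()
library(pander)   #pander() for formatted tables

getSymbols("NVDA", from="2021-06-21", to="2023-04-14") #retrieve data from Yahoo Finance
## [1] "NVDA"
NVDA_Close_Prices = NVDA[,4] #column 4 holds the daily closing price (NVDA.Close)

plot(NVDA_Close_Prices, main = "NVIDIA's Closing Stock Price") #generate plot of stock data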

The graph begins on June 21, 2021, the day the analyst bought the stock, and ends on April 14, 2023, the last trading day before this report was written. Over this period the closing price reached a maximum of 333.76 on 2021-11-29 and its minimum on 2022-10-14, and it closed the final trading day at 264.63.
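The maximum, minimum, and final closing values quoted above can be located programmatically rather than read off the plot. A minimal base-R sketch, assuming a hypothetical six-day series (the values and dates below are illustrative, not NVDA data):

```r
# Hypothetical daily closing prices (illustrative only, not NVDA data)
dates <- seq(as.Date("2023-01-02"), by = "day", length.out = 6)
close <- c(100, 140, 125, 90, 110, 120)

# Positions of the maximum, the minimum, and the final observation
pos <- c(maximum = which.max(close),
         minimum = which.min(close),
         last    = length(close))

# One row per statistic, with the date on which it occurred
summary_stats <- data.frame(statistic = names(pos),
                            date      = dates[pos],
                            close     = close[pos])
print(summary_stats)
```

For the xts object built above, the same idea should carry over as `index(NVDA_Close_Prices)[which.max(NVDA_Close_Prices)]`, where `index()` comes from the zoo package that quantmod loads.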

2.2 Training, Testing, and Model Building

In this section we first split the data into training and testing sets, with the testing set consisting of the last ten observations of the closing price. We then build four forecasting models:

  • a mean (historical average) model, labelled "moving average" in this report, in which every forecast equals the average of the historical data
  • a naive model in which every forecast equals the last observation
  • a seasonal naive model, a variant of the naive model in which each forecast equals the last observed value from the same season (here, the value one seasonal period of 365 observations earlier)
  • a drift model, another variant of the naive model, which lets the forecasts increase or decrease over time at the average change per period observed in the historical data

The forecast horizon for each model is ten trading days, the same length as the testing data.
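The four methods reduce to simple formulas for their point forecasts. A minimal base-R sketch, assuming a plain numeric vector `y`, a horizon `h`, and a seasonal period `m`; the helper name `benchmark_forecasts` and the toy series are hypothetical. It mirrors the point forecasts of forecast's `meanf()`, `naive()`, `snaive()` and `rwf(drift = TRUE)` but omits their prediction intervals.

```r
# Point forecasts of the four benchmark methods (hypothetical helper; the report
# itself calls forecast::meanf(), naive(), snaive() and rwf(drift = TRUE))
benchmark_forecasts <- function(y, h, m) {
  n <- length(y)
  stopifnot(n >= m)                               # seasonal naive needs one full season
  steps <- seq_len(h)
  mean_fc   <- rep(mean(y), h)                    # average of all historical values
  naive_fc  <- rep(y[n], h)                       # last observation, repeated
  snaive_fc <- y[n - m + ((steps - 1) %% m) + 1]  # same position in the last season
  slope     <- (y[n] - y[1]) / (n - 1)            # average change per step
  drift_fc  <- y[n] + slope * steps               # last value plus accumulated drift
  data.frame(mean = mean_fc, naive = naive_fc,
             snaive = snaive_fc, drift = drift_fc)
}

# Toy series (hypothetical values) with a seasonal period of 3
y <- c(10, 12, 14, 11, 13, 15, 12, 14)
h <- 2
train <- head(y, -h)   # all but the last h values
test  <- tail(y, h)    # last h values held out, as in the report
fc <- benchmark_forecasts(train, h = h, m = 3)
print(cbind(test, fc))
```

With `frequency = 365` on trading-day data, as in the code below, the seasonal period is 365 trading days, which spans roughly a year and a half of calendar time rather than one year.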

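training = NVDA_Close_Prices[1:447] #first 447 observations are used for training
testing = NVDA_Close_Prices[448:457] #last 10 observations are held out for testing

#create a time-series object with frequency 365; because both start and end are supplied,
#ts() sizes the series as (447 - 1)*365 + 1 = 162,791 points and recycles the 447 training values to fill it
NVDA.ts = ts(training, frequency = 365, start = 1, end = 447)
pred.mv = meanf(NVDA.ts, h=10)$mean #mean (historical average) forecast
pred.naive = naive(NVDA.ts, h=10)$mean #naive forecast
pred.snaive = snaive(NVDA.ts, h=10)$mean #seasonal naive forecast
pred.rwf = rwf(NVDA.ts, h=10, drift = TRUE)$mean #random walk with drift forecast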

pred.table = cbind( pred.mv = pred.mv,
                    pred.naive = pred.naive,
                    pred.snaive = pred.snaive,
                    pred.rwf = pred.rwf)
pander(pred.table, caption = "Forecasting Table")
Forecasting Table
pred.mv pred.naive pred.snaive pred.rwf
205.6 218.6 242.7 218.6
205.6 218.6 265 218.6
205.6 218.6 265.1 218.6
205.6 218.6 245.1 218.6
205.6 218.6 236.4 218.6
205.6 218.6 233.9 218.6
205.6 218.6 223.9 218.6
205.6 218.6 237.5 218.6
205.6 218.6 241.6 218.6
205.6 218.6 243.9 218.6

The table above shows the point forecasts of each model. Only the seasonal naive forecasts vary over the horizon; the other three are constant, and the naive and drift forecasts are identical to one decimal place.
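The constant naive and drift forecasts depend on the last value of `NVDA.ts`, so it helps to know how base R's `ts()` sets the length of a series. A minimal sketch with a hypothetical five-value vector `x`: when only `start` is given, the length comes from the data; when both `start` and `end` are given, `ts()` allocates (end - start) * frequency + 1 slots and recycles the data to fill them.

```r
# Hypothetical five-value series, for illustration only
x <- c(1, 2, 3, 4, 5)

# Only start supplied: ts() derives the end from length(x), so nothing is recycled
ok <- ts(x, frequency = 2, start = 1)
print(length(ok))              # 5

# start and end both supplied: ts() allocates (end - start) * frequency + 1
# = (6 - 1) * 2 + 1 = 11 slots and recycles x to fill them
recycled <- ts(x, frequency = 2, start = 1, end = 6)
print(length(recycled))        # 11
print(as.numeric(recycled))    # 1 2 3 4 5 1 2 3 4 5 1
print(tail(as.numeric(recycled), 1))  # 1, not the last value of x
```

Applied to the call in Section 2.2, start = 1, end = 447 and frequency = 365 give (447 - 1) * 365 + 1 = 162,791 slots, so the series ends on the 83rd training observation (162,791 = 364 * 447 + 83). Supplying only the frequency, for example `ts(as.numeric(training), frequency = 365)`, keeps the series at its 447 observations.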

2.3 Graphical Representation of the Forecasting Models

The following plot shows the forecasts of the four models together with the held-out testing data, drawn as the black line. Visually, the seasonal naive forecasts are the closest to the testing data. The next section checks this conclusion with three accuracy measures.

plot(448:457, NVDA_Close_Prices[448:457], type="l", xlim=c(448,457), ylim=c(200, 350),
     xlab = "observation sequence",
     ylab = "Stock Price",
     main = "NVDA Stock Price Forecast")
points(448:457, NVDA_Close_Prices[448:457],pch=20)
##
points(448:457, pred.mv, pch=15, col = "red")
points(448:457, pred.naive, pch=16, col = "blue")
points(448:457, pred.rwf, pch=18, col = "navy")
points(448:457, pred.snaive, pch=17, col = "purple")
##
lines(448:457, pred.mv, lty=2, col = "red")
lines(448:457, pred.snaive, lty=2, col = "purple")
lines(448:457, pred.naive, lty=2, col = "blue")
lines(448:457, pred.rwf, lty=2, col = "navy")
## 
legend("topright", c("moving average", "naive", "drift", "seasonal naive"),
       col=c("red", "blue", "navy", "purple"), pch=c(15, 16, 18, 17), lty=rep(2,4),
       bty="n", cex = 0.8) #pch values match the points() calls above (drift = 18, seasonal naive = 17)

2.4 Accuracy Measures

In this section we evaluate the accuracy of each model using three measures:

  • Mean Absolute Percentage Error (MAPE): the average of the absolute percentage errors, 100 * |actual - forecast| / actual, over the ten test days; it expresses how far the forecasts were from the actual values in relative terms
  • Mean Absolute Deviation (MAD): the average absolute forecast error, |actual - forecast|; note that the code below sums the absolute errors over the ten test days rather than averaging them, so the MAD column reports ten times the mean absolute error
  • Mean Squared Error (MSE): the average squared difference between actual and forecast values
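Written out, with actual closing prices y_t, forecasts ŷ_t and errors e_t over the n = 10 test days, the standard definitions of the three measures are:

```latex
e_t = y_t - \hat{y}_t, \qquad t = 1, \dots, n \quad (n = 10)

\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right|

\mathrm{MAD} = \frac{1}{n} \sum_{t=1}^{n} \lvert e_t \rvert

\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} e_t^{2}
```

The MAPE and MSE lines in the code below follow these formulas directly.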
true.value = NVDA_Close_Prices[448:457]
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## forecast errors (actual minus forecast)
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD (computed as the sum of absolute errors over the 10 test days, not their mean)
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
pander(accuracy.table, caption ="Overall performance of the four forecasting methods")
Overall performance of the four forecasting methods
                MAPE (%)    MAD     MSE
Moving Average     24.43  665.7    4454
Naive              19.66  535.8    2894
Seasonal Naive     10.56  287.1   943.8
Drift              19.66  535.8    2894

The table above confirms our graphical conclusion that the seasonal naive model is the most accurate of the four. In terms of MAPE, the seasonal naive forecasts are off by about 10.6 percent on average, compared with about 19.7 percent for the naive and drift models and about 24.4 percent for the moving average model.
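The measures can be computed for every model with one helper instead of repeating the same lines per model. A minimal sketch; the function name `accuracy_measures` and the toy values are hypothetical. It reports both the mean absolute error (the usual MAD) and the sum of absolute errors, which is what the report's MAD column contains.

```r
# Hypothetical helper: accuracy measures for one vector of point forecasts
accuracy_measures <- function(actual, forecast) {
  e <- actual - forecast                  # forecast errors
  c(MAPE = mean(abs(100 * e / actual)),   # mean absolute percentage error (%)
    MAD  = mean(abs(e)),                  # mean absolute deviation of the errors
    SAE  = sum(abs(e)),                   # sum of absolute errors (the report's MAD column)
    MSE  = mean(e^2))                     # mean squared error
}

# Toy example (hypothetical values): actual prices and two competing forecasts
actual    <- c(100, 110, 120, 130)
forecasts <- list(flat  = c(100, 100, 100, 100),
                  trend = c(100, 105, 115, 125))

# One column per forecasting method, one row per measure
results <- sapply(forecasts, accuracy_measures, actual = actual)
print(round(results, 2))
```

In this toy case every measure favours the same forecast; in general they can disagree, since MSE penalizes large errors more heavily than MAD or MAPE.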

3 Conclusion

In this report, we forecast NVIDIA's closing stock price using four methods: moving average, naive, seasonal naive, and drift. Both the graphical comparison and the accuracy measures showed that the seasonal naive model best forecast the testing values held out from the training data. In the next report, we will build on this work with moving averages and LOESS smoothing.