In this report we attempt to forecast the closing stock price of the publicly traded microchip manufacturer NVIDIA (ticker: NVDA). The report includes a preliminary analysis of the stock, a split of the data into training and testing sets, and the construction of four forecasting models, which are then tested for accuracy against the testing data.
This report contains four sections.
In this section, we first import NVIDIA’s closing stock price data from Yahoo Finance and then plot it to visualize the stock’s historical closing price over roughly the past two years.
library(quantmod) #provides getSymbols
getSymbols("NVDA", from="2021-06-21", to="2023-04-14") #retrieve data from Yahoo Finance
## [1] "NVDA"
NVDA_Close_Prices = NVDA[,4] #just look at closing price
plot(NVDA_Close_Prices, main = "NVIDIA's Closing Stock Price") #generate plot of stock data
The series begins on June 21, 2021, the day the analyst bought the stock, and ends on April 14, 2023, the last trading day before this report was written. From the graph we can see that the closing price reached a maximum of 333.76 on 2021-11-29 and a minimum on 2022-10-14, and that it closed the final trading day at 264.63.
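The extremes read off the graph can also be located programmatically. The following is a minimal base-R sketch on a small hypothetical series (the `dates` and `prices` vectors are made-up stand-ins, not NVDA data); it finds the positions of the maximum and minimum and reports the matching dates.

```r
# Hypothetical toy data standing in for a closing-price series
dates  <- as.Date("2023-01-02") + 0:4
prices <- c(101.5, 99.2, 104.8, 103.1, 98.7)

# Positions of the highest and lowest closing price
i_max <- which.max(prices)
i_min <- which.min(prices)
i_end <- length(prices)

# Collect the maximum, minimum, and final observation in one table
summary_tbl <- data.frame(
  stat  = c("max", "min", "last"),
  date  = dates[c(i_max, i_min, i_end)],
  price = prices[c(i_max, i_min, i_end)]
)
print(summary_tbl)
```

The same idea should carry over to a price series with a date index, where the date lookup would use the series' own index instead of a separate `dates` vector.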
In this section we will first split the data into training and testing data, in which the testing data will be the last ten observations of the closing stock price. Next, we will develop four forecasting models: moving average, naive, seasonal naive, and drift.
The forecast horizon for each model will be ten days, matching the ten observations in the testing data.
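As a small illustration of the hold-out split, the sketch below uses a hypothetical 12-value vector `x` and a horizon `h` of 3 (both made up for the example). It also checks the length of the resulting `ts` object, because `ts()` recycles the data whenever `start`, `end`, and `frequency` together imply more observations than were supplied.

```r
# Hypothetical toy series and horizon (not NVDA data)
x <- c(10, 12, 11, 13, 14, 15, 13, 16, 17, 18, 19, 20)
h <- 3

# Hold out the last h observations for testing
train <- head(x, length(x) - h)
test  <- tail(x, h)

# Letting ts() infer the end keeps one ts value per training observation
ts_ok <- ts(train, frequency = 4)
length(ts_ok)    # 9, same as length(train)

# start and end are measured in periods, so start = 1, end = 9 with
# frequency = 4 spans (9 - 1) * 4 + 1 = 33 points and the data are recycled
ts_long <- ts(train, start = 1, end = 9, frequency = 4)
length(ts_long)  # 33
```

Comparing `length()` of the time series object with the number of training observations is a cheap sanity check before fitting any model.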
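library(forecast) #provides meanf, naive, snaive, rwf
library(pander) #provides pander for table output
training = NVDA_Close_Prices[1:447] #split training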
testing = NVDA_Close_Prices[448:457] #testing
NVDA.ts = ts(training, frequency = 365, start = 1, end = 447) #create timeseries object
pred.mv = meanf(NVDA.ts, h=10)$mean #mean-method forecast (the report's "moving average")
pred.naive = naive(NVDA.ts, h=10)$mean #naive forecast
pred.snaive = snaive(NVDA.ts, h=10)$mean #seasonal naive
pred.rwf = rwf(NVDA.ts, h=10, drift = TRUE)$mean #drift
pred.table = cbind( pred.mv = pred.mv,
pred.naive = pred.naive,
pred.snaive = pred.snaive,
pred.rwf = pred.rwf)
pander(pred.table, caption = "Forecasting Table")
| pred.mv | pred.naive | pred.snaive | pred.rwf |
|---|---|---|---|
| 205.6 | 218.6 | 242.7 | 218.6 |
| 205.6 | 218.6 | 265 | 218.6 |
| 205.6 | 218.6 | 265.1 | 218.6 |
| 205.6 | 218.6 | 245.1 | 218.6 |
| 205.6 | 218.6 | 236.4 | 218.6 |
| 205.6 | 218.6 | 233.9 | 218.6 |
| 205.6 | 218.6 | 223.9 | 218.6 |
| 205.6 | 218.6 | 237.5 | 218.6 |
| 205.6 | 218.6 | 241.6 | 218.6 |
| 205.6 | 218.6 | 243.9 | 218.6 |
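The forecasting table above shows the point forecasts from the four models over the ten-day horizon. The moving average and naive forecasts are constant across the horizon, and the drift forecast matches the naive forecast to the displayed precision. Only the seasonal naive forecast varies from day to day.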
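To make the behaviour of the four columns easier to interpret, the sketch below computes the same kinds of point forecasts by hand in base R. The series `y`, the seasonal period `m`, and the horizon `h` are hypothetical values chosen for the example; this is an illustration of the standard benchmark definitions, not a reproduction of the report's results.

```r
# Hypothetical toy series with an assumed seasonal period of 4
y <- c(10, 14, 12, 16, 11, 15, 13, 17)
m <- 4              # seasonal period (assumption for this example)
h <- 4              # forecast horizon
n <- length(y)

# Mean method: every forecast is the average of the training data
f_mean <- rep(mean(y), h)

# Naive method: every forecast is the last observed value
f_naive <- rep(y[n], h)

# Seasonal naive: each forecast repeats the value one season earlier
idx <- n - m + ((seq_len(h) - 1) %% m) + 1
f_snaive <- y[idx]

# Drift: last value plus k times the average historical change
slope <- (y[n] - y[1]) / (n - 1)
f_drift <- y[n] + seq_len(h) * slope

print(cbind(f_mean, f_naive, f_snaive, f_drift))
```

In this sketch the drift forecast differs from the naive forecast only through `slope`, so when the first and last training values are close relative to the series length, the two sets of forecasts become nearly identical.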
The following graph shows the four forecasts together with the held-out testing data, which is represented by the black line. Based on the graph, the seasonal naive model gives the best approximation of the testing data. The next section checks this conclusion using three accuracy measures.
plot(448:457, NVDA_Close_Prices[448:457], type="l", xlim=c(448,457), ylim=c(200, 350),
xlab = "observation sequence",
ylab = "Stock Price",
main = "NVDA Stock Price Forecast")
points(448:457, NVDA_Close_Prices[448:457],pch=20)
##
points(448:457, pred.mv, pch=15, col = "red")
points(448:457, pred.naive, pch=16, col = "blue")
points(448:457, pred.rwf, pch=18, col = "navy")
points(448:457, pred.snaive, pch=17, col = "purple")
##
lines(448:457, pred.mv, lty=2, col = "red")
lines(448:457, pred.snaive, lty=2, col = "purple")
lines(448:457, pred.naive, lty=2, col = "blue")
lines(448:457, pred.rwf, lty=2, col = "navy")
##
legend("topright", c("moving average", "naive", "drift", "seasonal naive"),
col=c("red", "blue", "navy", "purple"), pch=c(15, 16, 18, 17), lty=rep(2,4), #pch order matches the points() calls above
bty="n", cex = 0.8)
In the following section we will test the accuracy of each model using three accuracy measures: mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE).
true.value = NVDA_Close_Prices[448:457]
PE.mv = 100*(true.value - pred.mv)/true.value
PE.naive = 100*(true.value - pred.naive)/true.value
PE.snaive = 100*(true.value - pred.snaive)/true.value
PE.rwf = 100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD (computed here as the sum, not the mean, of absolute errors over the ten test days)
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
pander(accuracy.table, caption ="Overall performance of the four forecasting methods")
| | MAPE | MAD | MSE |
|---|---|---|---|
| Moving Average | 24.43 | 665.7 | 4454 |
| Naive | 19.66 | 535.8 | 2894 |
| Seasonal Naive | 10.56 | 287.1 | 943.8 |
| Drift | 19.66 | 535.8 | 2894 |
Based on the table above, the accuracy measures confirm our conclusion that the seasonal naive model is the most accurate at forecasting. Looking in particular at the MAPE measure, the seasonal naive model is off by around 10 to 11 percent, whereas the naive and drift models are off by about 20 percent and the moving average is off by 24 to 25 percent.
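The three measures can also be wrapped in a single helper. The sketch below is a minimal base-R version: the function name `accuracy_measures` and the toy vectors are hypothetical, and MAD is taken here as the mean of the absolute errors (a summed version divided by the number of test observations gives the same quantity).

```r
# Hypothetical helper: MAPE, MAD, and MSE for one set of forecasts
accuracy_measures <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  e <- actual - predicted                 # forecast errors
  c(MAPE = mean(abs(100 * e / actual)),   # mean absolute percentage error
    MAD  = mean(abs(e)),                  # mean absolute deviation
    MSE  = mean(e^2))                     # mean squared error
}

# Toy values (not NVDA data)
actual    <- c(100, 110, 120, 130)
predicted <- c(90, 115, 120, 143)

res <- accuracy_measures(actual, predicted)
print(round(res, 2))
```

Calling the helper once per model and binding the results by row would give a table with one row per forecasting method and one column per measure.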
In this report, we attempted to forecast the closing stock price of NVIDIA using four different methods: moving average, naive, seasonal naive, and drift. We found through both graphical inspection and accuracy measures that the seasonal naive model was the best at forecasting the testing values we held out from the training data. In the next report, we will build further on moving averages and LOESS smoothing.