Stock Forecasting Utilizing ARIMA

This document presents a stock forecasting analysis using ARIMA models in R. The stock analyzed is NVIDIA (“NVDA”). Each step of the code is explained below, along with its financial implications.

Packages needed to carry out analysis:

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: timeDate

## 
## Attaching package: 'timeSeries'

## The following object is masked from 'package:zoo':
## 
##     time<-

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
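The library() calls that produced the messages above are not shown in the rendered output. A minimal sketch of what they might look like follows, assuming the forecast package (for Acf, auto.arima, tsdisplay, forecast and accuracy) and the tseries package (for adf.test) are loaded alongside the packages named in the messages. The check for missing packages is an addition for illustration.

```r
# Packages inferred from the startup messages and from the functions called later.
# forecast and tseries are assumptions: their library() calls are not shown.
pkgs <- c("quantmod", "timeSeries", "lubridate", "forecast", "tseries")

# Report anything missing up front instead of failing halfway through the analysis.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```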

Data Import: The code uses the getSymbols function from the quantmod package to import historical NVDA stock data from Yahoo Finance for the specified date range.

# Pull daily NVDA prices from Yahoo Finance
getSymbols('NVDA', from = '2022-06-02', to = '2023-06-21')

## [1] "NVDA"

View(NVDA)
# class(NVDA)

Visualization: The code uses the chartSeries function to plot the stock’s price over the past 12 months, then overlays Bollinger Bands.

# Chart the trailing 12 months of the stock (chartSeries expects an xts-style subset string)
chartSeries(NVDA, subset = 'last 12 months', type = 'auto')

addBBands()
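By default, addBBands() draws a 20-period moving average with bands two standard deviations above and below it. The sketch below computes the same quantities numerically with TTR's BBands function. It uses a simulated price path, not NVDA data, and the variable names are hypothetical.

```r
library(TTR)

set.seed(1)
# Hypothetical closing prices: a random walk starting at 100.
close_px <- 100 + cumsum(rnorm(120))

# 20-period simple moving average with bands 2 standard deviations away.
bands <- BBands(close_px, n = 20, sd = 2)

# Columns: dn (lower band), mavg (moving average), up (upper band), pctB.
# The first 19 rows are NA because the window is not yet full.
tail(bands, 3)
```

A close near the upper band means the price is high relative to its recent average and volatility. The bands describe past behavior and are not a forecast.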

Data Preparation: The code assigns each column of the imported dataset to its own variable (Open_prices, High_prices, Low_prices, Close_prices, Volume_prices, Adjusted_prices) for further analysis.

# Assigning columns of dataset
Open_prices = NVDA[,1]
High_prices = NVDA[,2]
Low_prices = NVDA[,3]
Close_prices = NVDA[,4]
Volume_prices = NVDA[,5]
Adjusted_prices = NVDA[,6]
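Selecting columns by position works as long as the column order never changes. quantmod also provides accessor functions (Op, Hi, Lo, Cl, Vo, Ad) that select by column name instead. The sketch below shows the idea on a small hypothetical table named demo, built to mimic the column naming that getSymbols produces.

```r
library(xts)
library(quantmod)

# Hypothetical three-day OHLCV table using "<SYMBOL>.<Field>" column names.
dates <- as.Date("2023-06-01") + 0:2
demo <- xts(
  cbind(
    DEMO.Open     = c(10, 11, 12),
    DEMO.High     = c(11, 12, 13),
    DEMO.Low      = c(9, 10, 11),
    DEMO.Close    = c(10.5, 11.5, 12.5),
    DEMO.Volume   = c(1000, 1500, 1200),
    DEMO.Adjusted = c(10.4, 11.4, 12.4)
  ),
  order.by = dates
)

close_by_position <- demo[, 4]  # depends on the column order
close_by_name     <- Cl(demo)   # picks the column whose name contains "Close"
volume_by_name    <- Vo(demo)   # picks the column whose name contains "Volume"

close_by_name
```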

par(mfrow = c(1, 1), mar = c(4, 4, 2, 1))  # Set margins to accommodate titles

plot(Open_prices, main = 'Opening price of stock (Over a given period)', xlab = '', xaxt = 'n')  # Remove x-axis labels

plot(High_prices, main = 'Highest price of stock (Over a given period)', xlab = '', xaxt = 'n')

plot(Low_prices, main = 'Lowest price of stock (Over a given period)', xlab = '', xaxt = 'n')

plot(Close_prices, main = 'Closing price of stock (Over a given period)', xlab = '', xaxt = 'n')

plot(Volume_prices, main = 'Trading volume of stock (Over a given period)', xlab = '', xaxt = 'n')

plot(Adjusted_prices, main = 'Adjusted price of stock (Over a given period)', xlab = '', xaxt = 'n')

ACF and PACF Analysis: The code calculates and plots the autocorrelation function (ACF) and partial autocorrelation function (PACF) for the Close_prices series. These plots show whether significant autocorrelation is present and suggest candidate orders for the ARIMA model.

# Look for linear dependence between observations at different lags.
# Close_prices holds the actual historical closing prices of the stock; the ADF test is applied afterwards to check stationarity.

par(mfrow = c(1, 2))  # par() sets graphical parameters; here, two plots side by side
Acf(Close_prices, main = 'ACF of closing prices')
Pacf(Close_prices, main = 'PACF of closing prices', col = '#cc0000')
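The two plots are drawn on the price levels themselves. For a trending price series, the ACF of the levels usually decays very slowly, which says little about AR or MA orders. A common next step is to look at the differenced series instead. The sketch below illustrates the contrast using base R and a simulated random walk (hypothetical data, not NVDA).

```r
set.seed(42)
# Hypothetical price path: a random walk, which is non-stationary by construction.
price <- 100 + cumsum(rnorm(250))

# Lag-1 autocorrelation of the levels versus the first differences.
# acf() returns lag 0 as the first element, so lag 1 is element 2.
acf_level <- acf(price, plot = FALSE)$acf[2]
acf_diff  <- acf(diff(price), plot = FALSE)$acf[2]

round(c(level = acf_level, differenced = acf_diff), 3)
```

The level series shows lag-1 autocorrelation close to 1, while the differenced series shows almost none. In the analysis itself, Acf(diff(Close_prices)) would give the corresponding plot for the differenced closing prices.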

ADF Test: The code performs the Augmented Dickey-Fuller (ADF) test on the Close_prices series to assess its stationarity. The null hypothesis is that the series has a unit root, meaning it is non-stationary; a p-value below a threshold (usually 0.05) rejects that null in favor of stationarity. Here the p-value is 0.99, so the null cannot be rejected and the series is treated as non-stationary.

print(adf.test(Close_prices)) # p-value = 0.99, well above the usual 0.05 threshold; the warning means the true p-value is larger than 0.99, the largest value adf.test reports

## Warning in adf.test(Close_prices): p-value greater than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Close_prices
## Dickey-Fuller = 0.2488, Lag order = 6, p-value = 0.99
## alternative hypothesis: stationary
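Since the price levels fail the test, a natural follow-up is to run the same test on the differenced series, which is the series the "I" part of ARIMA actually models. The sketch below does this on a simulated random walk (hypothetical data) using adf.test from the tseries package.

```r
library(tseries)

set.seed(42)
# Hypothetical price path (random walk); its first difference is white noise.
price <- 100 + cumsum(rnorm(250))

# adf.test() warns when the p-value falls outside its tabulated range
# (0.01 to 0.99), so the warning is silenced here.
adf_diff <- suppressWarnings(adf.test(diff(price)))

# A p-value at or below 0.05 rejects the unit-root null for the differenced series.
adf_diff$p.value
```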

auto.arima(Close_prices, seasonal = FALSE)

## Series: Close_prices 
## ARIMA(1,2,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.0282  -0.9776
## s.e.  0.0630   0.0131
## 
## sigma^2 = 64.9:  log likelihood = -915.43
## AIC=1836.86   AICc=1836.95   BIC=1847.55

ARIMA Modeling: The code fits various ARIMA models to the Close_prices series using the auto.arima and arima functions. It tries different combinations of AR, I, and MA components to find the best-fitting model. The models’ residuals are also displayed for visual analysis.

fitA = auto.arima(Close_prices, seasonal = FALSE)  # auto.arima selects ARIMA(1,2,1) for Close_prices: one autoregressive term, two differences, one moving-average term
tsdisplay(residuals(fitA), lag.max = 30, main='(1,2,1) Model Residuals')

auto.arima(Close_prices, seasonal = FALSE) # AIC/BIC = 1836.86/1847.55

## Series: Close_prices 
## ARIMA(1,2,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.0282  -0.9776
## s.e.  0.0630   0.0131
## 
## sigma^2 = 64.9:  log likelihood = -915.43
## AIC=1836.86   AICc=1836.95   BIC=1847.55

fitB = arima(Close_prices, order=c(1,2,4))  # custom ARIMA: autoregressive order 1, two differences, moving-average order 4
tsdisplay(residuals(fitB), lag.max = 30, main='(1,2,4) Model Residuals')

fitC = arima(Close_prices, order = c(6,1,4))
tsdisplay(residuals(fitC), lag.max=30, main='(6,1,4) Model Residuals') # an AR order of 5 instead of 6 was tried, but the fit was reported as non-stationary

fitD = arima(Close_prices, order = c(1,1,1))
tsdisplay(residuals(fitD), lag.max=30, main='(1,1,1) Model Residuals')
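Besides inspecting residual plots, candidate models can be compared numerically. The sketch below tabulates AIC and a Ljung-Box p-value for several orders, using base R on a simulated series (hypothetical data and candidate orders). One caveat worth keeping in mind: AIC values are only comparable between models with the same differencing order, so the d = 2 fits (fitA, fitB) and the d = 1 fits (fitC, fitD) above should not be ranked against each other this way.

```r
set.seed(7)
# Hypothetical series simulated from an ARIMA(1,1,1) process.
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = -0.3), n = 300)

# Candidate orders that share d = 1, so their AIC values are comparable.
orders <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1), c(2, 1, 1))
fits <- lapply(orders, function(o) arima(y, order = o, method = "ML"))

comparison <- data.frame(
  order = vapply(orders, paste, character(1), collapse = ","),
  AIC   = vapply(fits, AIC, numeric(1)),
  # Ljung-Box test: a large p-value gives no evidence of leftover autocorrelation.
  # fitdf removes one degree of freedom per estimated AR/MA coefficient.
  lb_p  = vapply(seq_along(fits), function(i) {
    Box.test(residuals(fits[[i]]), lag = 10, type = "Ljung-Box",
             fitdf = orders[[i]][1] + orders[[i]][3])$p.value
  }, numeric(1))
)

# Lowest AIC first.
comparison[order(comparison$AIC), ]
```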

Forecasting: The code generates forecasts using the fitted ARIMA models and plots the forecasted values for a specified number of future periods.

par(mfrow=c(2,2))

n_days <- 150  # number of trading days to project
# auto.arima model (1,2,1)
fcast1 <- forecast(fitA, h=n_days)
plot(fcast1)
# custom ARIMA (1,2,4)
fcast2 <- forecast(fitB, h=n_days)
plot(fcast2)
# custom ARIMA (6,1,4)
fcast3 <- forecast(fitC, h=n_days)
plot(fcast3)
# custom ARIMA (1,1,1)
fcast4 <- forecast(fitD, h=n_days)
plot(fcast4)
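The plots show each forecast as a line with shaded prediction intervals. The same numbers can be read directly from the forecast object, which makes it easier to see how quickly uncertainty grows with the horizon. The sketch below uses a simulated price path and an arbitrarily chosen ARIMA(0,1,1) with drift (both hypothetical, for illustration only).

```r
library(forecast)

set.seed(3)
# Hypothetical price path used only for illustration.
price <- ts(100 + cumsum(rnorm(200, mean = 0.1)))

fit <- Arima(price, order = c(0, 1, 1), include.drift = TRUE)
fc  <- forecast(fit, h = 10, level = c(80, 95))

# Point forecasts with the lower and upper 95% bounds, one row per step ahead.
# fc$lower and fc$upper have one column per level, here 80% then 95%.
intervals <- data.frame(
  step  = 1:10,
  point = as.numeric(fc$mean),
  lo95  = as.numeric(fc$lower[, 2]),
  hi95  = as.numeric(fc$upper[, 2])
)

# Interval width grows with the horizon because forecast errors accumulate.
intervals$width <- intervals$hi95 - intervals$lo95
intervals
```

With a 150-day horizon and roughly one year of history, the intervals at the far end of the forecast are likely to be very wide, so the point forecast there should be read with caution.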

Accuracy Testing: The code uses the accuracy function to evaluate each ARIMA model. Because no held-out data is supplied, the measures below are in-sample: they compare the model's fitted values with the actual values in the training set.

accuracy(fcast1)

##                     ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 0.8422137 7.994628 5.34886 0.3604873 2.707821 0.9943247
##                      ACF1
## Training set -0.009424954

accuracy(fcast2)

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.7977069 7.878375 5.373374 0.3433341 2.750183 0.9988816
##                     ACF1
## Training set -0.01015069

accuracy(fcast3)

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.6571545 7.768089 5.231591 0.2264413 2.664372 0.9725249
##                      ACF1
## Training set -0.007478808

accuracy(fcast4)

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.8692714 8.063988 5.308486 0.2267034 2.670923 0.9868193
##                     ACF1
## Training set -0.01136858

Prediction Return: The code calculates the returns of the predicted prices and separates the returns into training and testing sets. It fits an ARIMA model to the training set and uses it to generate predictions for the testing set.

Forecast Prediction Result: The code generates a forecast for the future returns using the ARIMA model and compares the forecasted returns to the actual returns using the accuracy function.
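The code for these two steps does not appear in the rendered output. A minimal sketch of how they might look is given below, under several assumptions: log returns, an 80/20 chronological split, and auto.arima for the fit. It runs on simulated prices, and all variable names are hypothetical; the real analysis would start from Close_prices.

```r
library(forecast)

set.seed(11)
# Hypothetical daily closing prices; the real analysis would use Close_prices.
price <- 100 * exp(cumsum(rnorm(260, mean = 0.0005, sd = 0.02)))

# Daily log returns, then an 80/20 chronological split (the ratio is an assumption).
returns <- diff(log(price))
n_train <- floor(0.8 * length(returns))
train   <- ts(returns[1:n_train])
test    <- returns[(n_train + 1):length(returns)]

# Fit on the training window only, then forecast across the test window.
fit_ret <- auto.arima(train, seasonal = FALSE)
fc_ret  <- forecast(fit_ret, h = length(test))

# Supplying the held-out values adds a "Test set" row of out-of-sample errors.
acc <- accuracy(fc_ret, test)
acc
```

The split is chronological rather than random so that the model never sees data from after the period it is asked to predict. Percentage measures such as MAPE tend to be unstable on returns, because the actual values are often close to zero.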

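## ME = Mean Error
## RMSE = Root Mean Square Error
## MAE = Mean Absolute Error
## MPE = Mean Percentage Error
## MAPE = Mean Absolute Percentage Error
## MASE = Mean Absolute Scaled Error
## ACF1 = Lag-1 autocorrelation of the errors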
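To make these definitions concrete, the sketch below computes four of the measures by hand in base R on a small set of hypothetical actual and fitted values. It follows the convention that the error is the actual value minus the fitted value.

```r
# Hypothetical actual and fitted values, chosen so the arithmetic is easy to follow.
actual      <- c(100, 110, 120, 130)
fitted_vals <- c(98, 112, 117, 135)

err <- actual - fitted_vals             # 2, -2, 3, -5

me   <- mean(err)                       # signed errors can cancel out
rmse <- sqrt(mean(err^2))               # penalizes large errors more heavily
mae  <- mean(abs(err))                  # average size of the error
mape <- mean(abs(err / actual)) * 100   # error relative to the actual value, in percent

round(c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape), 3)
```

Here ME is -0.5 while MAE is 3: positive and negative errors partly cancel in ME, which is why it measures bias rather than accuracy.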