FINAL

This first section fetches the data. It loads the required R libraries, downloads NVIDIA's historical stock prices from Yahoo Finance for a given date range, and checks that the download succeeded. It then selects the adjusted closing price, which accounts for splits and dividends, and converts it into a time series object. Finally, it prints the first few rows of the data and plots the series to show the trend of the stock price.

# Load the necessary libraries. I always make sure these are at the top!
library(forecast) # For the forecasting functions (ETS, ARIMA, forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(tseries) # For time series analysis, specifically the ADF test
library(quantmod) # I find this handy for fetching financial data directly

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

library(dplyr)    # For data manipulation

## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:xts':
## 
##     first, last

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# --- 1. Data Acquisition ---
# I'm going to fetch the NVIDIA stock data directly from Yahoo Finance.
# The reviewer can adjust the date range as needed.
start_date <- "2024-01-01" # Corrected start date.
end_date <- "2024-12-01"
getSymbols("NVDA", from = start_date, to = end_date)

## [1] "NVDA"

# Check that the NVDA data was successfully fetched.
# exists() guards against the object never having been created at all.
if (!exists("NVDA") || is.null(NVDA) || nrow(NVDA) == 0) {
  stop("Failed to retrieve NVDA stock data from Yahoo Finance. Please check the date range and internet connection.")
}

# Use Adjusted Close Price
if ("NVDA.Adjusted" %in% colnames(NVDA)) {
  nvda_stock <- NVDA$NVDA.Adjusted
} else if ("Adjusted" %in% colnames(NVDA)) {
  nvda_stock <- NVDA$Adjusted
} else {
  stop("Error: Neither 'NVDA.Adjusted' nor 'Adjusted' column found.")
}
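The fetch, the success check, and the column lookup above can also be wrapped in one function. The sketch below is only an illustration: `fetch_adjusted` and its `fetch` argument are hypothetical names, not part of quantmod. It relies on the real `auto.assign = FALSE` argument of `getSymbols()`, which returns the data instead of creating a global `NVDA` object, and it matches the adjusted column by its "Adjusted" suffix as an assumption about Yahoo's column naming.

```r
# Hypothetical helper: download one symbol and return its adjusted-close column.
# The `fetch` argument defaults to quantmod::getSymbols and exists only so the
# download step can be swapped out (for example in an offline check).
fetch_adjusted <- function(symbol, from, to, fetch = quantmod::getSymbols) {
  prices <- tryCatch(
    fetch(symbol, from = from, to = to, auto.assign = FALSE),
    error = function(e) {
      stop("Failed to retrieve ", symbol, " from Yahoo Finance: ",
           conditionMessage(e), call. = FALSE)
    }
  )
  if (is.null(prices) || NROW(prices) == 0) {
    stop("No rows returned for ", symbol, ".", call. = FALSE)
  }
  # Assumption: the adjusted-close column name ends in "Adjusted".
  adj_col <- grep("Adjusted$", colnames(prices), value = TRUE)
  if (length(adj_col) != 1) {
    stop("Expected exactly one adjusted-close column for ", symbol, ".", call. = FALSE)
  }
  prices[, adj_col]
}

# Usage (needs quantmod and an internet connection):
# nvda_stock <- fetch_adjusted("NVDA", from = "2024-01-01", to = "2024-12-01")
```

Returning the data rather than assigning it globally makes a failed download surface as an ordinary error at the call site.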


time_series <- ts(nvda_stock, frequency = 252) # Daily data, with roughly 252 trading days per year. ts() keeps the xts attributes (index, src, updated), which is why head() prints them below.

# I'm taking a peek at the first few rows of the data to make sure it looks right.
head(time_series)

## Time Series:
## Start = c(1, 1) 
## End = c(1, 6) 
## Frequency = 252 
##      NVDA.Adjusted
## [1,]      48.14992
## [2,]      47.55114
## [3,]      47.97998
## [4,]      49.07857
## [5,]      52.23338
## [6,]      53.12005
## attr(,"index")
##   [1] 1704153600 1704240000 1704326400 1704412800 1704672000 1704758400
## [... 225 further index values omitted; the last is 1732838400 ...]
## attr(,"index")attr(,"tzone")
## [1] UTC
## attr(,"index")attr(,"tclass")
## [1] Date
## attr(,"src")
## [1] yahoo
## attr(,"updated")
## [1] 2025-05-10 17:23:25 CST

plot(time_series, main = "NVIDIA Stock Price Over Time", ylab = "Adjusted Closing Price", xlab = "Date")
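The `attr(,"index")` block in the `head()` output appears because `ts()` only sets the `tsp` and `class` attributes and leaves any others in place. A minimal base-R sketch of the effect follows. It uses a one-column matrix with hand-attached attributes as a stand-in for the xts column (an assumption made so the example runs without quantmod), and it shows that passing the values through `as.numeric()` first gives a plain `ts` object.

```r
# Stand-in for the xts column: the first six adjusted closes from the output
# above, with an "index" and "src" attribute attached by hand.
prices <- matrix(
  c(48.14992, 47.55114, 47.97998, 49.07857, 52.23338, 53.12005),
  ncol = 1, dimnames = list(NULL, "NVDA.Adjusted")
)
attr(prices, "index") <- c(1704153600, 1704240000, 1704326400,
                           1704412800, 1704672000, 1704758400)
attr(prices, "src") <- "yahoo"

raw_ts   <- ts(prices, frequency = 252)              # extra attributes survive
clean_ts <- ts(as.numeric(prices), frequency = 252)  # only tsp and class remain

raw_attrs   <- names(attributes(raw_ts))
clean_attrs <- names(attributes(clean_ts))
print(raw_attrs)
print(clean_attrs)
```

Applied to the report, the equivalent change would be `ts(as.numeric(nvda_stock), frequency = 252)`. The printed output of `head()` would then differ from what is shown above.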

Here, the code prepares the time series data for modeling. A key step is checking for stationarity, a property that many time series models assume. The Augmented Dickey-Fuller (ADF) test is used to assess it: on the raw prices the p-value is 0.3436, so non-stationarity cannot be rejected. The series is then differenced once and tested again, and this time the p-value is at most 0.01. Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots are examined before and after differencing to give a visual check on the data’s properties and to aid in model selection.

# --- 2. Data Preparation and Stationarity Check ---
# Now, I want to see if our time series is stationary. Stationarity is important for ARIMA models.
# I'll use the Augmented Dickey-Fuller (ADF) test.
print("ADF Test for Stationarity:")

## [1] "ADF Test for Stationarity:"

adf.test(time_series)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  time_series
## Dickey-Fuller = -2.553, Lag order = 6, p-value = 0.3436
## alternative hypothesis: stationary

# The p-value from the ADF test tells us if we can reject the null hypothesis of non-stationarity.
# If the p-value is high (typically > 0.05), it suggests the series is non-stationary, and I'll need to difference it.
# Let's check the ACF and PACF plots as well, they can give me visual clues about stationarity and potential model orders.
acf(time_series, main = "Autocorrelation Function of NVIDIA Stock Price")

pacf(time_series, main = "Partial Autocorrelation Function of NVIDIA Stock Price")

# The ADF p-value above (0.3436) is well over 0.05, so I can't reject non-stationarity. I'll difference the series once.
diff_series <- diff(time_series)
plot(diff_series, main = "Differenced NVIDIA Stock Price", ylab = "Differenced Closing Price", xlab = "Date")

# Let's run the ADF test again on the differenced series.
print("ADF Test for Stationarity (after first difference):")

## [1] "ADF Test for Stationarity (after first difference):"

adf.test(diff_series)

## Warning in adf.test(diff_series): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff_series
## Dickey-Fuller = -5.6764, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary

# And let's look at the ACF and PACF of the differenced series to help with ARIMA order selection.
acf(diff_series, main = "ACF of Differenced NVIDIA Stock Price")

pacf(diff_series, main = "PACF of Differenced NVIDIA Stock Price")
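To see why one difference is usually enough for a price-like series, here is a small simulation in base R. It is an illustration on synthetic data, not on the NVDA series: the random walk, the seed, and the `lag1` helper are all assumptions made for the sketch. A random walk has lag-1 autocorrelation close to 1, while its first difference is white noise with lag-1 autocorrelation close to 0.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Synthetic random walk: cumulative sum of independent normal shocks.
n <- 500
random_walk <- ts(cumsum(rnorm(n)))
differenced <- diff(random_walk)

# Hypothetical helper: sample autocorrelation at lag 1.
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

result <- c(levels = lag1(random_walk), differenced = lag1(differenced))
print(round(result, 3))
```

The same pattern shows up in the ACF plots above: slow decay for the price levels, and no persistent structure after differencing.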

In this section, an Autoregressive Integrated Moving Average (ARIMA) model is fitted with the arima() function. The code and its output labels call it "SARIMA", but no seasonal order is passed, so the fitted model is a non-seasonal ARIMA(1, 1, 1). The order (p, d, q) is specified by hand, and the model’s summary is printed. Following this, the model’s residuals are analyzed to check that they resemble white noise, which would indicate an adequate fit. The Ljung-Box test is used to formally check for autocorrelation in the residuals; its p-value of 0.3284 gives no evidence of remaining autocorrelation.

# --- 3. SARIMA Model Implementation ---
# Based on the ACF and PACF of the differenced series, I'll make an initial guess for the (p, d, q) orders.
# 'd' is 1 because I differenced the series once, and the second ADF test (p-value at most 0.01) supports stationarity after that.
# The 'p' order is the number of lagged values in the AR part, and 'q' is the number of lagged errors in the MA part.
# For now, I'm going to try an ARIMA(1, 1, 1) model as a starting point. No seasonal order is passed, so despite the
# variable name this is a plain ARIMA. You might need to adjust the orders based on your analysis.
sarima_model <- arima(time_series, order = c(1, 1, 1))
print("Summary of the SARIMA Model:")

## [1] "Summary of the SARIMA Model:"

summary(sarima_model)

## 
## Call:
## arima(x = time_series, order = c(1, 1, 1))
## 
## Coefficients:
##           ar1     ma1
##       -0.7127  0.6184
## s.e.   0.1965  0.2173
## 
## sigma^2 estimated as 12.53:  log likelihood = -617.15,  aic = 1240.3
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.4117106 3.532769 2.649429 0.4234427 2.527096 0.9922401
##                       ACF1
## Training set -0.0005472834

# Now, I'll examine the residuals of the model to see if they look like white noise (no significant patterns).
residuals_sarima <- residuals(sarima_model)
plot(residuals_sarima, main = "Residuals of SARIMA Model", ylab = "Residuals", xlab = "Time")

acf(residuals_sarima, main = "ACF of SARIMA Residuals")

# I'm also going to perform a Ljung-Box test to formally check for autocorrelation in the residuals.
print("Ljung-Box Test for SARIMA Residuals:")

## [1] "Ljung-Box Test for SARIMA Residuals:"

Box.test(residuals_sarima, lag = 20, type = "Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  residuals_sarima
## X-squared = 22.226, df = 20, p-value = 0.3284
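When the Ljung-Box test is applied to residuals of a fitted ARMA model, a common refinement is to reduce the degrees of freedom by the number of estimated AR and MA coefficients. `Box.test()` supports this through its `fitdf` argument. The sketch below does not rerun the test. It only recomputes the p-value from the printed statistic (22.226), assuming `fitdf = 2` for the one AR and one MA coefficient.

```r
# Values taken from the Box-Ljung output above.
x_squared <- 22.226
lag <- 20
fitdf <- 2  # assumption: p + q = 1 + 1 for the ARIMA(1, 1, 1) fit

# Upper-tail chi-squared probabilities with and without the adjustment.
p_unadjusted <- pchisq(x_squared, df = lag, lower.tail = FALSE)
p_adjusted   <- pchisq(x_squared, df = lag - fitdf, lower.tail = FALSE)

print(round(c(unadjusted = p_unadjusted, adjusted = p_adjusted), 4))

# Equivalent call on the fitted model's residuals:
# Box.test(residuals_sarima, lag = 20, type = "Ljung-Box", fitdf = 2)
```

The unadjusted value reproduces the printed 0.3284. The adjusted p-value is smaller but stays above 0.05, so the conclusion about the residuals does not change here.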

This section builds an exponential smoothing model in the ETS (Error, Trend, Seasonal) framework. The ets() function automatically selects and fits an ETS model to the time series data; here it chooses ETS(M,A,N), that is, multiplicative errors, an additive trend, and no seasonal component. As with the previous model, the code prints a summary of the fit and then analyzes its residuals. The residuals are plotted, and the Ljung-Box test is used to assess whether they exhibit any significant autocorrelation (p-value 0.425, so none is detected).

# --- 4. Exponential Smoothing Model Implementation ---
# For Exponential Smoothing, the 'ets()' function in the 'forecast' package is really powerful.
# It can automatically select the best ETS model based on the data.
ets_model <- ets(time_series)
print("Summary of the Exponential Smoothing Model:")

## [1] "Summary of the Exponential Smoothing Model:"

summary(ets_model)

## ETS(M,A,N) 
## 
## Call:
## ets(y = time_series)
## 
##   Smoothing parameters:
##     alpha = 0.8757 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 48.6754 
##     b = 0.6109 
## 
##   sigma:  0.0334
## 
##      AIC     AICc      BIC 
## 1828.245 1828.512 1845.457 
## 
## Training set error measures:
##                      ME     RMSE      MAE        MPE     MAPE MASE       ACF1
## Training set -0.2542799 3.537428 2.616535 -0.2640428 2.490375  NaN 0.02629872

# Let's also look at the residuals of the ETS model.
residuals_ets <- residuals(ets_model)
plot(residuals_ets, main = "Residuals of ETS Model", ylab = "Residuals", xlab = "Time")

acf(residuals_ets, main = "ACF of ETS Residuals")

# And perform a Ljung-Box test on these residuals as well.
print("Ljung-Box Test for ETS Residuals:")

## [1] "Ljung-Box Test for ETS Residuals:"

Box.test(residuals_ets, lag = 20, type = "Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  residuals_ets
## X-squared = 20.534, df = 20, p-value = 0.425
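The smoothing parameters and initial states in the ETS summary above can be read as a recursion. The sketch below works through the first update by hand. It assumes the usual state-space form for a model with additive trend and no seasonality, in which the level moves by alpha times the one-step error and the trend by beta times the same error; for multiplicative errors this gives the same point updates. All inputs are the rounded values printed above, so the results are approximate.

```r
# Rounded values from the ETS(M,A,N) summary and the first observation.
alpha <- 0.8757
beta  <- 1e-04
level <- 48.6754   # initial level l
trend <- 0.6109    # initial trend b
y1    <- 48.14992  # first adjusted close

# One-step-ahead forecast made before seeing y1, and its error.
one_step <- level + trend
error    <- y1 - one_step

# Assumed state updates: level and trend each absorb a share of the error.
new_level <- one_step + alpha * error
new_trend <- trend + beta * error

print(round(c(one_step = one_step, error = error,
              new_level = new_level, new_trend = new_trend), 5))
```

With alpha near 0.88 the level follows each new price closely, while beta near zero leaves the trend almost fixed at its initial value of about 0.61 per trading day.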

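This part of the code uses the fitted "SARIMA" (in fact ARIMA(1, 1, 1)) and ETS models to forecast the NVIDIA stock price for the next 30 trading days. The forecast() function produces the predictions for both models, together with 80% and 95% prediction intervals. The forecasts are then printed and plotted. The row labels in the printed tables (1.916667, 1.920635, ...) are positions on the ts time axis, measured in 252-day trading years from the start of the series, not calendar dates.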

# --- 5. Forecasting ---
# Now for the exciting part - making forecasts! I'll forecast the next 30 trading days.
forecast_sarima <- forecast(sarima_model, h = 30)
print("SARIMA Forecast:")

## [1] "SARIMA Forecast:"

print(forecast_sarima)

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1.916667       137.7920 133.2547 142.3292 130.8528 144.7311
## 1.920635       138.1026 131.9810 144.2242 128.7404 147.4647
## 1.924603       137.8812 130.3340 145.4284 126.3387 149.4237
## 1.928571       138.0390 129.4033 146.6747 124.8318 151.2461
## 1.932540       137.9265 128.2562 147.5968 123.1371 152.7160
## 1.936508       138.0067 127.4471 148.5663 121.8572 154.1562
## 1.940476       137.9496 126.5404 149.3587 120.5007 155.3984
## 1.944444       137.9903 125.8103 150.1702 119.3626 156.6179
## 1.948413       137.9612 125.0432 150.8793 118.2047 157.7178
## 1.952381       137.9819 124.3747 151.5892 117.1714 158.7924
## 1.956349       137.9672 123.6979 152.2365 116.1442 159.7902
## 1.960317       137.9777 123.0799 152.8754 115.1935 160.7618
## 1.964286       137.9702 122.4666 153.4738 114.2595 161.6809
## 1.968254       137.9755 121.8908 154.0603 113.3761 162.5750
## 1.972222       137.9717 121.3248 154.6186 112.5125 163.4310
## 1.976190       137.9744 120.7847 155.1642 111.6850 164.2639
## 1.980159       137.9725 120.2559 155.6892 110.8772 165.0678
## 1.984127       137.9739 119.7460 156.2018 110.0968 165.8510
## 1.988095       137.9729 119.2475 156.6984 109.3348 166.6110
## 1.992063       137.9736 118.7637 157.1835 108.5946 167.3527
## 1.996032       137.9731 118.2905 157.6557 107.8711 168.0751
## 2.000000       137.9735 117.8293 158.1176 107.1657 168.7813
## 2.003968       137.9732 117.3778 158.5686 106.4753 169.4711
## 2.007937       137.9734 116.9365 159.0103 105.8002 170.1466
## 2.011905       137.9733 116.5039 159.4426 105.1387 170.8079
## 2.015873       137.9734 116.0801 159.8666 104.4905 171.4563
## 2.019841       137.9733 115.6641 160.2825 103.8544 172.0922
## 2.023810       137.9733 115.2559 160.6907 103.2301 172.7166
## 2.027778       137.9733 114.8548 161.0918 102.6167 173.3299
## 2.031746       137.9733 114.4607 161.4860 102.0138 173.9328

plot(forecast_sarima, main = "SARIMA Forecast of NVIDIA Stock Price (Next 30 Days)", ylab = "Closing Price", xlab = "Time")
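The ARIMA point forecasts in the table above oscillate and then settle near 137.97. For an ARIMA(1, 1, 1) model this follows from the AR coefficient: from the third step on, each change in the forecast is the previous change multiplied by ar1. The sketch below rebuilds the path from the first two printed forecasts and the rounded coefficient (-0.7127), so it matches the table only up to rounding.

```r
phi <- -0.7127  # ar1 from the model summary (rounded)
h <- 30

# Seed the recursion with the first two printed point forecasts.
f <- numeric(h)
f[1:2] <- c(137.7920, 138.1026)

# From step 3 on, each forecast change is phi times the previous change.
for (i in 3:h) {
  f[i] <- f[i - 1] + phi * (f[i - 1] - f[i - 2])
}

print(round(f[1:6], 4))
print(round(f[h], 4))
```

Because the magnitude of ar1 is below 1, the changes shrink geometrically and alternate in sign, which is why the forecast flattens while the prediction intervals keep widening.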

forecast_ets <- forecast(ets_model, h = 30)
print("Exponential Smoothing Forecast:")

## [1] "Exponential Smoothing Forecast:"

print(forecast_ets)

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1.916667       138.5811 132.6417 144.5205 129.4976 147.6646
## 1.920635       139.1861 131.2696 147.1025 127.0789 151.2933
## 1.924603       139.7910 130.2867 149.2954 125.2554 154.3267
## 1.928571       140.3960 129.5211 151.2709 123.7643 157.0277
## 1.932540       141.0010 128.8987 153.1032 122.4922 159.5098
## 1.936508       141.6060 128.3796 154.8324 121.3779 161.8340
## 1.940476       142.2109 127.9391 156.4827 120.3841 164.0377
## 1.944444       142.8159 127.5613 158.0706 119.4859 166.1459
## 1.948413       143.4209 127.2345 159.6073 118.6659 168.1759
## 1.952381       144.0259 126.9503 161.1014 117.9111 170.1406
## 1.956349       144.6308 126.7025 162.5592 117.2118 172.0499
## 1.960317       145.2358 126.4860 163.9857 116.5604 173.9112
## 1.964286       145.8408 126.2967 165.3848 115.9508 175.7308
## 1.968254       146.4458 126.1316 166.7599 115.3780 177.5135
## 1.972222       147.0507 125.9879 168.1136 114.8379 179.2635
## 1.976190       147.6557 125.8634 169.4480 114.3272 180.9842
## 1.980159       148.2607 125.7561 170.7653 113.8429 182.6785
## 1.984127       148.8657 125.6645 172.0668 113.3825 184.3488
## 1.988095       149.4706 125.5871 173.3541 112.9440 185.9973
## 1.992063       150.0756 125.5228 174.6284 112.5254 187.6258
## 1.996032       150.6806 125.4705 175.8906 112.1251 189.2360
## 2.000000       151.2856 125.4292 177.1419 111.7417 190.8294
## 2.003968       151.8905 125.3982 178.3829 111.3740 192.4071
## 2.007937       152.4955 125.3765 179.6145 111.0206 193.9704
## 2.011905       153.1005 125.3637 180.8372 110.6808 195.5202
## 2.015873       153.7055 125.3591 182.0518 110.3535 197.0574
## 2.019841       154.3104 125.3621 183.2587 110.0378 198.5830
## 2.023810       154.9154 125.3723 184.4585 109.7332 200.0976
## 2.027778       155.5204 125.3893 185.6515 109.4388 201.6019
## 2.031746       156.1253 125.4125 186.8382 109.1541 203.0966

plot(forecast_ets, main = "Exponential Smoothing Forecast of NVIDIA Stock Price (Next 30 Days)", ylab = "Closing Price", xlab = "Time")
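In contrast to the flat ARIMA path, the ETS point forecasts form a straight line, as expected for a model with an additive trend and no damping. The sketch below takes only the first and last printed forecasts, derives the implied slope, and checks an intermediate row against the table. It is a consistency check on the printed numbers, not a refit.

```r
# First and last point forecasts from the ETS table above.
first_forecast <- 138.5811
last_forecast  <- 156.1253
h <- 30

# Implied constant step between consecutive forecasts.
slope <- (last_forecast - first_forecast) / (h - 1)

# Rebuild the whole path as a straight line.
path <- first_forecast + (0:(h - 1)) * slope

print(round(slope, 4))
print(round(path[15], 4))  # compare with the 15th row of the table
```

The slope of roughly 0.60 per trading day is close to the fitted trend state, so the ETS forecast simply extends the average drift seen over the sample.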

To evaluate the models and get a sense of how well they might generalize to unseen data, the code splits the data into a training set and a test set made up of the last 60 observations. Both models are refitted on the training set and then forecast across the whole test period. The Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) of those forecasts quantify each model's accuracy, and comparing them shows which model performed better on the unseen data.

# --- 6. Model Comparison ---
# To get a better sense of how well these models might perform, I'll split the data into training and testing sets.
test_length <- 60  # I'm using the last 60 days as my test set.
train_length <- length(time_series) - test_length
if (train_length <= 0) {
  stop("Insufficient data for training and testing split.  The test length is too large.")
}
train_data <- window(time_series, start = start(time_series), end = time(time_series)[train_length])
test_data <- window(time_series, start = time(time_series)[train_length + 1], end = end(time_series))


# Fit SARIMA on the training data
sarima_model_train <- arima(train_data, order = c(1, 1, 1))
forecast_sarima_test <- forecast(sarima_model_train, h = length(test_data))

# Fit ETS on the training data
ets_model_train <- ets(train_data)
forecast_ets_test <- forecast(ets_model_train, h = length(test_data))

# Calculate evaluation metrics (Mean Squared Error and Root Mean Squared Error).
mse_sarima <- mean((forecast_sarima_test$mean - test_data)^2)
rmse_sarima <- sqrt(mse_sarima)
print(paste("SARIMA Test MSE:", mse_sarima))

## [1] "SARIMA Test MSE: 748.26697634021"

print(paste("SARIMA Test RMSE:", rmse_sarima))

## [1] "SARIMA Test RMSE: 27.3544690378046"

mse_ets <- mean((forecast_ets_test$mean - test_data)^2)
rmse_ets <- sqrt(mse_ets)
print(paste("ETS Test MSE:", mse_ets))

## [1] "ETS Test MSE: 83.48234896752"

print(paste("ETS Test RMSE:", rmse_ets))

## [1] "ETS Test RMSE: 9.13686756867582"

# Lower is better. On this 60-day test set, ETS (RMSE about 9.14) clearly beats the ARIMA(1, 1, 1) fit (RMSE about 27.35).
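An RMSE is easier to judge against a simple benchmark. The sketch below shows one possible way to set that up in base R: a time-ordered holdout split and a naive forecast that repeats the last training value. The helper names `rmse` and `holdout_split` are hypothetical, and the toy series stands in for the real prices.

```r
# Hypothetical helper: root mean squared error between two numeric vectors.
rmse <- function(actual, predicted) {
  sqrt(mean((as.numeric(actual) - as.numeric(predicted))^2))
}

# Hypothetical helper: keep the last `test_length` points for testing.
holdout_split <- function(x, test_length) {
  n <- length(x)
  stopifnot(test_length > 0, test_length < n)
  list(
    train = x[seq_len(n - test_length)],
    test  = x[(n - test_length + 1):n]
  )
}

# Toy series standing in for the price data.
toy <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
parts <- holdout_split(toy, test_length = 3)

# Naive benchmark: repeat the last training value over the test period.
naive_forecast <- rep(tail(parts$train, 1), length(parts$test))
naive_rmse <- rmse(parts$test, naive_forecast)
print(naive_rmse)
```

On the real data, the same benchmark could be computed from `train_data` and `test_data` and set beside the two RMSE values above.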

Maruquez

2025-05-10