Apple_Stock_Forecasting with ARIMA modelling in R

# Install packages if not already installed
if (!requireNamespace("quantmod", quietly = TRUE)) install.packages("quantmod")

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

if (!requireNamespace("forecast", quietly = TRUE)) install.packages("forecast")
if (!requireNamespace("tseries", quietly = TRUE)) install.packages("tseries")
if (!requireNamespace("tseries", quietly = TRUE)) install.packages("rugarch")

# Load packages
library(quantmod)

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

library(forecast)
library(tseries)
library(rugarch)

## Loading required package: parallel

## 
## Attaching package: 'rugarch'

## The following object is masked from 'package:stats':
## 
##     sigma

# Specify the stock symbol and date range
stock_symbol <- "AAPL"
start_date <- as.Date("2018-01-01") # start date as needed
end_date <- as.Date("2023-01-01") #  an end date as needed

# Fetch stock data
getSymbols(stock_symbol, src = "yahoo", from = start_date, to = end_date)

## [1] "AAPL"

# Extract the Adjusted Close prices
stock_prices <- Ad(get(stock_symbol))

# Split data into training and test sets
train_end_date <- as.Date("2022-01-03")
test_start_date <- as.Date("2022-01-04")

training_set <- stock_prices[index(stock_prices) <= train_end_date]
test_set <- stock_prices[index(stock_prices) >= test_start_date]

# Print lengths of the training and test sets to verify
cat("Training set length: ", length(training_set), "\n")

## Training set length:  1009

cat("Test set length: ", length(test_set), "\n")

## Test set length:  250

# Plot the time series for training and test sets
plot(training_set, main = "Training Set of Stock Prices", xlab = "Date", ylab = "Adjusted Close Price", col = "blue")

plot(test_set, main = "Test Set of Stock Prices", xlab = "Date", ylab = "Adjusted Close Price", col = "red")

# Descriptive stats of the training,testing and overall data
training_stats <- summary(training_set)
test_stats <- summary(test_set)
overall_stats <- summary(stock_prices)

# print overall stats
cat("\nDescriptive Statistics for Overall Data:\n")

## 
## Descriptive Statistics for Overall Data:

print(overall_stats)

##      Index            AAPL.Adjusted   
##  Min.   :2018-01-02   Min.   : 34.03  
##  1st Qu.:2019-04-03   1st Qu.: 49.10  
##  Median :2020-07-02   Median : 89.49  
##  Mean   :2020-07-02   Mean   : 96.02  
##  3rd Qu.:2021-09-30   3rd Qu.:141.01  
##  Max.   :2022-12-30   Max.   :179.48

# Plot the time series ranging over the whole timeframe
plot(stock_prices, main = "Time Series of Stock Prices over whole timeframe", xlab = "Date", ylab = "Adjusted Close Price", col = "green")

# Check for stationarity using Augmented Dickey-Fuller Test
adf.test(stock_prices, alternative = "stationary")

## 
##  Augmented Dickey-Fuller Test
## 
## data:  stock_prices
## Dickey-Fuller = -1.5744, Lag order = 10, p-value = 0.7585
## alternative hypothesis: stationary

# p-value is (0.7585) >.05  Therefore the time series is not stationary.


acf(stock_prices, main = "ACF of Stock Prices")

pacf(stock_prices, main = "PACF of Stock Prices")

# The significant autocorrelation at multiple lags in the ACF plot suggests the presence of a trend or a seasonal pattern in the stock prices, indicating that the data is not stationary.

# The PACF plot indicates that an AR(1) model might be appropriate, as the partial autocorrelation drops off after lag 1.

# To properly model the series using ARIMA, we would likely need to difference the series to remove the trend and achieve stationarity before fitting the model.

apple_returns_diff<- na.omit(diff(stock_prices))

acf(apple_returns_diff, main="ACF of differenced Stock Prices")

pacf(apple_returns_diff, main ="PACF of differenced Stock Prices")

# The significant autocorrelation at lag 1 in both ACF and PACF plots suggests that an AR(1) model could be suitable.

# The periodicity indicated by the spike at lag 13 in the PACF plot suggests that further investigation into seasonal components might be needed.

# Applying a log transformation can help in normalizing the data, making the series more stationary, and improving the accuracy of the model.

#For most practical purposes in time series analysis, it is recommended to log transform the series first and then apply differencing. This approach stabilizes the variance before addressing trends and stationarity issues.

apple_log <- log(stock_prices)
apple_diff_log <- na.omit(diff(apple_log))


acf(apple_diff_log, main="ACF of differenced log Stock Prices")

pacf(apple_diff_log, main ="PACF of differenced log Stock Prices")

# Normalization: The log transformation followed by differencing has effectively stabilized the variance and removed trends, making the series more stationary. This is evident from the ACF and PACF plots where most autocorrelations are insignificant.

# Pattern Detection: By normalizing the data, these transformations make it easier to detect underlying patterns and trends. The significant spike at lag 1 in both plots suggests that short-term autocorrelation is present and can be captured with a simple AR(1) model.

# Stationarity: The transformed series is now more stationary, facilitating more accurate modeling and forecasting.

# Fit ARIMA model to get residuals
fit_arima <- auto.arima(apple_diff_log, seasonal = FALSE)
residuals_arima <- residuals(fit_arima)


# Fit GARCH model to the residuals
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(0, 0), include.mean = FALSE),
                   distribution.model = "norm")
fit_garch <- ugarchfit(spec = spec, data = residuals_arima)

# Summary of the GARCH model
summary(fit_garch)

##    Length     Class      Mode 
##         1 uGARCHfit        S4

# Diagnostic checks: ACF and PACF of squared residuals
acf(residuals(fit_garch)^2, main = "ACF of Squared GARCH Residuals")

pacf(residuals(fit_garch)^2, main = "PACF of Squared GARCH Residuals")

# Ljung-Box test for squared residuals
Box.test(residuals(fit_garch)^2, lag = 10, type = "Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  residuals(fit_garch)^2
## X-squared = 479.48, df = 10, p-value < 2.2e-16

# The extremely low p-value (much less than 0.05) indicates that we reject the null hypothesis that there is no autocorrelation in the squared residuals. This result confirms the presence of significant autocorrelation in the squared residuals, suggesting heteroscedasticity (changing variance over time).

# Function to perform one-step ahead forecast
one_step_forecast <- function(model, training_set, test_set) {
  forecasts <- numeric(length(test_set))
  for (i in seq_along(test_set)) {
    if ((length(training_set) + i - 1) > length(training_set)) {
      fit <- Arima(c(training_set, test_set[1:(i - 1)]), model = model)
    } else {
      fit <- Arima(training_set, model = model)
    }
    forecasts[i] <- forecast(fit, h = 1)$mean
  }
  return(forecasts)
}

# Fitting the original data( stock_prices for forecasting)
final_fit_arima <- auto.arima(stock_prices, seasonal = FALSE)

# Perform one-step ahead forecast
one_step_forecasts <- one_step_forecast(final_fit_arima, training_set, test_set)


# Print the one-step ahead forecasted prices
print(one_step_forecasts)

##   [1] 179.2173 177.3257 172.7838 169.8032 169.7794 169.7965 172.4740 173.0536
##   [9] 169.9957 170.6246 167.6345 164.1463 162.3412 160.2865 159.4299 157.6737
##  [17] 157.4835 157.0367 167.2987 172.0442 172.1754 173.3225 170.6695 170.2392
##  [25] 169.5362 172.4329 173.9537 170.1886 166.7326 166.7449 170.3714 170.3695
##  [33] 166.9662 165.2940 162.4289 158.3136 160.5392 162.6312 163.0088 161.2513
##  [41] 164.2602 164.1372 161.2922 157.5304 155.5768 160.5674 156.7631 153.0172
##  [49] 148.9781 152.8774 157.2875 158.5109 161.7010 163.1933 166.4740 167.9625
##  [57] 171.6324 172.4584 173.3247 176.4931 175.5824 172.5967 172.1369 175.9387
##  [65] 173.0356 169.8639 169.9584 168.0632 163.9234 165.4427 168.0760 163.4976
##  [73] 163.0151 165.1464 165.1184 164.3656 160.0264 160.7732 155.1806 154.6272
##  [81] 161.1495 155.9917 155.9654 157.3733 163.5235 155.3201 155.5083 150.6721
##  [89] 152.6531 145.3351 141.2310 145.2067 143.9906 147.3527 139.7379 136.0522
##  [97] 136.0508 141.1770 138.9350 138.9472 141.9754 147.6015 147.2009 147.0558
## [105] 149.3686 144.0952 144.4802 146.8903 146.3404 141.3664 135.9470 130.7418
## [113] 131.2424 133.7523 128.9181 130.0170 134.0864 133.8511 136.5484 139.8607
## [121] 140.0622 136.1555 137.5801 135.3359 137.2517 139.8106 141.2292 144.5009
## [129] 145.3407 143.3766 144.1764 143.8815 146.6310 148.3769 145.6043 149.0853
## [137] 151.1915 153.4648 152.4330 151.3115 149.9896 154.7290 155.5374 160.3784
## [145] 159.7442 158.3127 163.9090 163.9522 163.7410 163.2818 163.3004 167.3188
## [153] 166.8658 170.1954 171.4116 171.3368 172.7458 172.4595 169.9963 166.1732
## [161] 165.6244 165.8700 168.2098 162.3912 159.9541 157.5089 155.7885 156.3721
## [169] 154.4081 153.0984 154.3485 153.0296 155.6555 161.4508 152.8851 153.7310
## [177] 151.0485 149.3322 152.7431 155.2010 152.3933 151.3112 149.0971 149.2786
## [185] 150.2104 148.4815 141.5319 137.1290 140.8134 144.4318 144.9309 144.0593
## [193] 139.0403 139.0419 137.7028 137.0262 141.3093 137.2828 140.7856 142.2448
## [201] 142.4358 142.0104 145.5927 147.8379 150.6621 148.0533 143.6634 153.5700
## [209] 151.9407 149.3400 143.9555 137.9088 137.2894 137.7547 138.3233 134.0451
## [217] 144.9628 148.2630 147.1411 148.7123 147.6434 149.3762 150.0126 146.9960
## [225] 148.8338 149.7747 147.0747 143.2869 140.2154 146.4186 147.0568 146.6299
## [233] 145.5047 141.9712 139.9212 141.3893 141.0222 143.1702 144.2138 142.1721
## [241] 135.7975 133.5561 131.4264 131.2316 134.1539 131.3323 130.8160 129.0800
## [249] 125.2577 128.3506

# Plot the actual vs forecasted prices
plot(index(test_set), coredata(test_set), col = 'red', type = 'l', xlab = 'Date', ylab = 'Adjusted Close Price', main = 'One-Step Ahead Forecast vs Actual')
lines(index(test_set), one_step_forecasts, col = 'blue', type = 'l')
legend('topleft', legend = c('Actual', 'Forecasted'), col = c('red', 'blue'), lty = 1)

# Calculate accuracy metrics
accuracy_metrics <- accuracy(one_step_forecasts, test_set)

# Display the RMSE, MAPE, MAE, ME, and MPE values
cat("\nAccuracy Metrics:\n")

## 
## Accuracy Metrics:

cat("RMSE: ", accuracy_metrics["Test set", "RMSE"], "\n")

## RMSE:  3.373886

cat("MAPE: ", accuracy_metrics["Test set", "MAPE"], "\n")

## MAPE:  1.73871

cat("MAE: ", accuracy_metrics["Test set", "MAE"], "\n")

## MAE:  2.627816

cat("ME: ", accuracy_metrics["Test set", "ME"], "\n")

## ME:  -0.2145752

cat("MPE: ", accuracy_metrics["Test set", "MPE"], "\n")

## MPE:  -0.1672674

#Overall, the error metrics suggest that the ARIMA model used for forecasting has performed well, with low RMSE, MAPE, and MAE values. The slight negative bias indicated by ME and MPE is minimal and should not significantly impact the overall forecasting performance

Apple_Stock_Forecasting with ARIMA modelling in R

2024-02-07