Introduction

This project explores the relationship between Apple Inc.’s stock performance and the S&P 500, a widely recognized benchmark for the overall market. By examining the dynamics between these two entities, the analysis aims to understand the extent to which Apple’s stock movements are influenced by broader market trends versus other specific factors.

Primary Questions

  1. What is the relationship between the returns of Apple stock and the S&P 500?
  2. Specifically, how well does the overall market (as represented by the S&P 500) predict Apple stock performance over time?

Data

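The data used in this analysis comes from a publicly available Kaggle dataset. It includes daily prices (open, high, low, close, and adjusted close) and trading volume for Apple (AAPL) and the S&P 500 index (SPX). The returns used in the analysis are computed from the closing prices.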

AAPL_df1 <- read.csv("C:/Users/ndhlo/Downloads/AAPL.csv")

SP_df2 <- read.csv("C:/Users/ndhlo/Downloads/SPX.csv")

rmarkdown::paged_table(AAPL_df1)
rmarkdown::paged_table(SP_df2)

Although the data is relatively clean, a daily return variable had to be derived for each series before the analysis questions could be addressed. The first step was to sort the stock and index data by date so that consecutive rows correspond to consecutive trading days.

AAPL_df1 <- AAPL_df1[order(as.Date(AAPL_df1$Date)), ]
SP_df2 <- SP_df2[order(as.Date(SP_df2$Date)), ]

Next, daily returns were calculated for Apple stock and the S&P 500 index. These returns are based on the difference between the current day’s closing price and the previous day’s closing price, expressed as a percentage of the previous day’s closing price.

NB: For purposes of this analysis, returns were strictly limited to capital gains based on price movement.

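# Simple daily return: (Close_t - Close_{t-1}) / Close_{t-1}.
# The first row has no previous close, so its return is NA.
AAPL_df1$Return <- c(NA, diff(AAPL_df1$Close) / head(AAPL_df1$Close, -1))
SP_df2$Return <- c(NA, diff(SP_df2$Close) / head(SP_df2$Close, -1))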
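
As a quick check of the return formula, the sketch below applies the same expression to the first two AAPL closing prices shown in the merged-data preview later in this report. The variable names close_px and ret are chosen here for illustration, and the prices are the rounded values from that preview, so the result should agree with the reported Return_AAPL for 2009-01-05 only up to rounding.

```r
# First two AAPL closing prices (2009-01-02 and 2009-01-05),
# copied from the head(merged_df) output; rounded as printed there
close_px <- c(12.96429, 13.51143)

# Same expression as in the report: price change divided by the previous close.
# The first element is NA because there is no previous close in this vector.
ret <- c(NA, diff(close_px) / head(close_px, -1))

print(ret)
```

Because the return is computed before the data is filtered to 2009 onward, the first row of the filtered data (2009-01-02) already has a valid return based on the last close of 2008.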

For the analysis, the data was filtered to the 11 calendar years from 2009 through 2019, using 2009-01-01 and 2020-01-01 as the cut-off dates.

AAPL_filtered <- subset(AAPL_df1, Date >= as.Date("2009-01-01") & Date <= as.Date("2020-01-01"))

SP_filtered <- subset(SP_df2, Date >= as.Date("2009-01-01") & Date <= as.Date("2020-01-01"))

The filtered datasets were then merged on the date column, and the merged data was split chronologically into a training set (2009-01-01 to 2019-01-01, covering 2009 through 2018) and a test set (2019-01-02 to 2020-01-01, covering 2019).

merged_df <- merge(AAPL_filtered, SP_filtered, by = "Date", suffixes = c("_AAPL", "_SPX"))

merged_df1 <- subset(merged_df, Date >= as.Date("2009-01-01") & Date <= as.Date("2019-01-01"))

merged_df2 <- subset(merged_df, Date >= as.Date("2019-01-02") & Date <= as.Date("2020-01-01"))

head(merged_df)
##         Date Open_AAPL High_AAPL Low_AAPL Close_AAPL Adj.Close_AAPL Volume_AAPL
## 1 2009-01-02  12.26857  13.00571 12.16571   12.96429       11.25353   186503800
## 2 2009-01-05  13.31000  13.74000 13.24429   13.51143       11.72847   295402100
## 3 2009-01-06  13.70714  13.88143 13.19857   13.28857       11.53502   322327600
## 4 2009-01-07  13.11571  13.21429 12.89429   13.00143       11.28577   188262200
## 5 2009-01-08  12.91857  13.30714 12.86286   13.24286       11.49534   168375200
## 6 2009-01-09  13.31571  13.34000 12.87714   12.94000       11.23245   136711400
##   Return_AAPL Open_SPX High_SPX Low_SPX Close_SPX Adj.Close_SPX Volume_SPX
## 1  0.06326893   902.99   934.73  899.35    931.80        931.80 4048270000
## 2  0.04220387   929.17   936.63  919.53    927.45        927.45 5413910000
## 3 -0.01649400   931.17   943.85  927.28    934.70        934.70 5392620000
## 4 -0.02160825   927.45   927.45  902.37    906.65        906.65 4704940000
## 5  0.01856937   905.73   910.00  896.81    909.73        909.73 4991550000
## 6 -0.02286949   909.91   911.93  888.31    890.35        890.35 4716500000
##     Return_SPX
## 1  0.031608069
## 2 -0.004668358
## 3  0.007817133
## 4 -0.030009616
## 5  0.003397073
## 6 -0.021303029
head(merged_df1)
##         Date Open_AAPL High_AAPL Low_AAPL Close_AAPL Adj.Close_AAPL Volume_AAPL
## 1 2009-01-02  12.26857  13.00571 12.16571   12.96429       11.25353   186503800
## 2 2009-01-05  13.31000  13.74000 13.24429   13.51143       11.72847   295402100
## 3 2009-01-06  13.70714  13.88143 13.19857   13.28857       11.53502   322327600
## 4 2009-01-07  13.11571  13.21429 12.89429   13.00143       11.28577   188262200
## 5 2009-01-08  12.91857  13.30714 12.86286   13.24286       11.49534   168375200
## 6 2009-01-09  13.31571  13.34000 12.87714   12.94000       11.23245   136711400
##   Return_AAPL Open_SPX High_SPX Low_SPX Close_SPX Adj.Close_SPX Volume_SPX
## 1  0.06326893   902.99   934.73  899.35    931.80        931.80 4048270000
## 2  0.04220387   929.17   936.63  919.53    927.45        927.45 5413910000
## 3 -0.01649400   931.17   943.85  927.28    934.70        934.70 5392620000
## 4 -0.02160825   927.45   927.45  902.37    906.65        906.65 4704940000
## 5  0.01856937   905.73   910.00  896.81    909.73        909.73 4991550000
## 6 -0.02286949   909.91   911.93  888.31    890.35        890.35 4716500000
##     Return_SPX
## 1  0.031608069
## 2 -0.004668358
## 3  0.007817133
## 4 -0.030009616
## 5  0.003397073
## 6 -0.021303029
head(merged_df2)
##            Date Open_AAPL High_AAPL Low_AAPL Close_AAPL Adj.Close_AAPL
## 2517 2019-01-02    154.89    158.85   154.23     157.92       155.2140
## 2518 2019-01-03    143.98    145.72   142.00     142.19       139.7535
## 2519 2019-01-04    144.53    148.55   143.80     148.26       145.7195
## 2520 2019-01-07    148.70    148.83   145.90     147.93       145.3952
## 2521 2019-01-08    149.56    151.82   148.52     150.75       148.1669
## 2522 2019-01-09    151.29    154.53   149.63     153.31       150.6830
##      Volume_AAPL  Return_AAPL Open_SPX High_SPX Low_SPX Close_SPX Adj.Close_SPX
## 2517    37039700  0.001141072  2476.96  2519.49 2467.47   2510.03       2510.03
## 2518    91312200 -0.099607370  2491.92  2493.14 2443.96   2447.89       2447.89
## 2519    58607100  0.042689303  2474.33  2538.07 2474.33   2531.94       2531.94
## 2520    54777800 -0.002225832  2535.61  2566.16 2524.56   2549.69       2549.69
## 2521    41025300  0.019063121  2568.11  2579.82 2547.56   2574.41       2574.41
## 2522    45099100  0.016981742  2580.00  2595.32 2568.89   2584.96       2584.96
##      Volume_SPX   Return_SPX
## 2517 3733160000  0.001268497
## 2518 3822860000 -0.024756730
## 2519 4213410000  0.034335714
## 2520 4104710000  0.007010435
## 2521 4083030000  0.009695285
## 2522 4052480000  0.004098046
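
The chronological split can be sanity-checked on a small example. The sketch below uses a hypothetical toy data frame (the dates and returns are made up for illustration) and converts the Date column to the Date class once, up front, so that every later comparison is explicitly Date against Date. It then confirms that the training and test pieces do not overlap and together cover every row.

```r
# Toy data (hypothetical values) with dates stored as text,
# which is how read.csv() returns them by default in current R versions
toy <- data.frame(
  Date   = c("2018-12-28", "2018-12-31", "2019-01-02", "2019-12-31"),
  Return = c(0.001, 0.010, -0.020, 0.005)
)

# Convert once so later comparisons are Date against Date
toy$Date <- as.Date(toy$Date)

# Same cut-offs as in the report
train <- subset(toy, Date >= as.Date("2009-01-01") & Date <= as.Date("2019-01-01"))
test  <- subset(toy, Date >= as.Date("2019-01-02") & Date <= as.Date("2020-01-01"))

# The training period must end before the test period starts,
# and no row may be lost or duplicated by the split
stopifnot(max(train$Date) < min(test$Date))
stopifnot(nrow(train) + nrow(test) == nrow(toy))
```

The same two checks could be run on merged_df1 and merged_df2 against merged_df.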

Exploratory Data Analysis

  1. Visualizing Stock Trends

To assess the general trends of Apple and the S&P 500, time-series plots of the closing prices over the training period were created and placed side by side.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.4.2
merged_df1$Date <- as.Date(merged_df1$Date)

apple_plot <- ggplot(data = merged_df1, aes(x = Date, y = Close_AAPL, group = 1)) +
  geom_line(color = "blue", linewidth = 1) +
  labs(title = "Apple Stock Closing Prices (2009-2019)",
       x = "Date", 
       y = "Closing Price (AAPL)") +
  theme_minimal()

sp500_plot <- ggplot(data = merged_df1, aes(x = Date, y = Close_SPX, group = 1)) +
  geom_line(color = "green", linewidth = 1) +
  labs(title = "S&P 500 Closing Prices (2009-2019)",
       x = "Date", 
       y = "Closing Price (S&P 500)") +
  theme_minimal()




combined_plot <- apple_plot + sp500_plot

combined_plot

  2. Exploring Return Distributions

Histograms were generated to visualize the distributions of returns for both Apple and the S&P 500.

# Histogram for AAPL returns
aapl_hist <- ggplot(data = merged_df1, aes(x = Return_AAPL)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Apple Stock Returns (AAPL)",
       x = "Apple Stock Returns",
       y = "Frequency") +
  theme_minimal()

# Histogram for SPX returns
spx_hist <- ggplot(data = merged_df1, aes(x = Return_SPX)) +
  geom_histogram(binwidth = 0.01, fill = "green", color = "black", alpha = 0.7) +
  labs(title = "Distribution of S&P 500 Returns (SPX)",
       x = "S&P 500 Returns",
       y = "Frequency") +
  theme_minimal()


aapl_hist

spx_hist

  3. Scatter Plot Analysis

A scatter plot with a fitted least-squares line was created to explore the relationship between daily Apple and S&P 500 returns.

ggplot(data = merged_df1, aes(x = Return_SPX, y = Return_AAPL)) +
  geom_point(color = "orange") +
  geom_smooth(method = "lm", color = "black", se = FALSE) +
  labs(title = "Linear Relationship: Apple Returns vs. S&P 500 Returns",
       x = "S&P 500 Returns",
       y = "Apple Returns") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Next, summary statistics, including the mean, variance, and correlation, were calculated to gain deeper insights into the relationship between Apple stock and the S&P 500. These metrics were then organized into a data frame for clarity and analysis.

mean_aapl <- mean(merged_df1$Return_AAPL, na.rm = TRUE)
var_aapl <- var(merged_df1$Return_AAPL, na.rm = TRUE)


mean_spx <- mean(merged_df1$Return_SPX, na.rm = TRUE)
var_spx <- var(merged_df1$Return_SPX, na.rm = TRUE)

correlation <- cor(merged_df1$Return_AAPL, merged_df1$Return_SPX, use = "complete.obs")


summary_stats <- data.frame(
  Statistic = c("Mean (AAPL)", "Variance (AAPL)", "Mean (S&P 500)", "Variance (S&P 500)", "Correlation"),
  Value = c(mean_aapl, var_aapl, mean_spx, var_spx, correlation)
)

print(summary_stats)
##            Statistic        Value
## 1        Mean (AAPL) 0.0011591431
## 2    Variance (AAPL) 0.0002824423
## 3     Mean (S&P 500) 0.0004608028
## 4 Variance (S&P 500) 0.0001098946
## 5        Correlation 0.6028078075

Methods

Regression Model:

A simple linear regression model was built to quantify the relationship between Apple and S&P 500 returns.
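
Written out, the model and its least-squares estimates take the form below. The symbols (alpha for the intercept, beta for the slope, epsilon for the error term, and bars for sample means) are notation introduced here for illustration and do not appear in the original analysis.

```latex
\[
R^{\mathrm{AAPL}}_t = \alpha + \beta\, R^{\mathrm{SPX}}_t + \varepsilon_t ,
\qquad
\hat{\beta} = \frac{\mathrm{Cov}\left(R^{\mathrm{SPX}}, R^{\mathrm{AAPL}}\right)}{\mathrm{Var}\left(R^{\mathrm{SPX}}\right)} ,
\qquad
\hat{\alpha} = \bar{R}^{\mathrm{AAPL}} - \hat{\beta}\, \bar{R}^{\mathrm{SPX}}
\]
```

Here each R with subscript t is the daily return on day t, so the slope measures how much Apple's return changes, on average, for a one-unit change in the S&P 500 return.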

lm_SP_AAPL <- lm(Return_AAPL~Return_SPX, data = merged_df1)

summary(lm_SP_AAPL)
## 
## Call:
## lm(formula = Return_AAPL ~ Return_SPX, data = merged_df1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.124278 -0.006808 -0.000416  0.006907  0.080702 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0007138  0.0002676   2.667   0.0077 ** 
## Return_SPX  0.9663970  0.0255115  37.881   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01341 on 2514 degrees of freedom
## Multiple R-squared:  0.3634, Adjusted R-squared:  0.3631 
## F-statistic:  1435 on 1 and 2514 DF,  p-value: < 2.2e-16
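
For a regression with a single predictor, the fitted coefficients are tied directly to the summary statistics reported earlier: the slope equals the correlation times the ratio of the standard deviations, the intercept follows from the two means, and R-squared is the squared correlation. The sketch below re-derives them from the values printed in the summary table. The variable names are chosen here for illustration, and because the inputs are rounded as printed, agreement with the regression output is expected only up to rounding.

```r
# Summary statistics copied from the printed summary_stats table
mean_aapl <- 0.0011591431
var_aapl  <- 0.0002824423
mean_spx  <- 0.0004608028
var_spx   <- 0.0001098946
r         <- 0.6028078075

# Slope: correlation scaled by the ratio of standard deviations (y over x)
slope <- r * sqrt(var_aapl / var_spx)

# Intercept: the fitted line passes through the point of means
intercept <- mean_aapl - slope * mean_spx

# In simple linear regression, R-squared is the squared correlation
r_squared <- r^2

print(c(slope = slope, intercept = intercept, r_squared = r_squared))
```

The three values can be compared against the Return_SPX estimate, the intercept, and the Multiple R-squared in the output above.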

Model Validation

  1. Predictions

The model was validated using test data from 2019 to 2020.

predicted_returns_2019 <- predict(lm_SP_AAPL, newdata = merged_df2)


length(predicted_returns_2019)
## [1] 252
merged_df2$Pred_Returns_AAPL <- predicted_returns_2019

A new data frame was created to quantitatively and visually compare the model’s predicted results with the actual results, providing insights into the model’s performance.

Actual_Forecast_Comparison <- data.frame(
  Date = merged_df2$Date,
  Actual_Return_AAPL = merged_df2$Return_AAPL,
  Predicted_Return_AAPL = merged_df2$Pred_Returns_AAPL
)

head(Actual_Forecast_Comparison)
##         Date Actual_Return_AAPL Predicted_Return_AAPL
## 1 2019-01-02        0.001141072           0.001939696
## 2 2019-01-03       -0.099607370          -0.023211006
## 3 2019-01-04        0.042689303           0.033895756
## 4 2019-01-07       -0.002225832           0.007488688
## 5 2019-01-08        0.019063121           0.010083319
## 6 2019-01-09        0.016981742           0.004674164
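
The predicted values in this table follow directly from the fitted line: each prediction is the intercept plus the slope times that day's S&P 500 return. The sketch below reproduces the first three predictions by hand, using the coefficients as printed in the regression summary and the Return_SPX values from the test-data preview. The names b0, b1, spx_ret, and manual_pred are chosen here for illustration, and the inputs are rounded, so the match is approximate.

```r
# Coefficients as printed by summary(lm_SP_AAPL)
b0 <- 0.0007138   # intercept
b1 <- 0.9663970   # slope on Return_SPX

# S&P 500 returns for 2019-01-02, 2019-01-03 and 2019-01-04,
# copied from the head(merged_df2) output
spx_ret <- c(0.001268497, -0.024756730, 0.034335714)

# Prediction from a simple linear model: intercept + slope * predictor
manual_pred <- b0 + b1 * spx_ret

# Predictions reported in the comparison table for the same three dates
reported_pred <- c(0.001939696, -0.023211006, 0.033895756)

print(data.frame(manual_pred, reported_pred))
```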

  2. Error Analysis

Mean Absolute Error (MAE) was calculated to evaluate the model’s accuracy.

mae_2019 <- mean(abs(Actual_Forecast_Comparison$Actual_Return_AAPL - Actual_Forecast_Comparison$Predicted_Return_AAPL), na.rm = TRUE)


print(paste("Mean Absolute Error for 2019-2020:", mae_2019))
## [1] "Mean Absolute Error for 2019-2020: 0.00823079619243065"
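
To make the MAE calculation concrete, the sketch below applies the same formula to the six rows shown in the comparison table. The vector names are chosen here for illustration. This six-row figure is not expected to match the full-year MAE, because it is dominated by the large miss on 2019-01-03.

```r
# Actual and predicted AAPL returns for the first six test days,
# copied from the head(Actual_Forecast_Comparison) output
actual <- c(0.001141072, -0.099607370, 0.042689303,
            -0.002225832, 0.019063121, 0.016981742)
predicted <- c(0.001939696, -0.023211006, 0.033895756,
               0.007488688, 0.010083319, 0.004674164)

# Absolute error per day, then the average across days
abs_err <- abs(actual - predicted)
mae_six <- mean(abs_err)

print(round(abs_err, 6))
print(mae_six)
```

Printing the per-day errors alongside the average shows how much a single large miss can move the MAE in a short sample.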

  3. Visualization

A line graph was used to compare predicted and actual returns.

Actual_Forecast_Comparison$Date <- as.Date(Actual_Forecast_Comparison$Date)

ggplot(Actual_Forecast_Comparison, aes(x = Date)) +
  geom_line(aes(y = Actual_Return_AAPL, color = "Actual Returns"), group = 1) + 
  geom_line(aes(y = Predicted_Return_AAPL, color = "Predicted Returns"), group = 1, linetype = "dashed") +
  labs(title = "Actual vs Predicted AAPL Returns (2019-2020)",
       x = "Date", 
       y = "Returns") +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  scale_color_manual(values = c("Actual Returns" = "blue", "Predicted Returns" = "red")) +
  theme_minimal()

Conclusion

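The analysis shows that about 36.3% of the variance in Apple’s daily returns over the training period is explained by S&P 500 returns, as indicated by the R-squared value of 0.3634. The estimated slope of about 0.97 is highly significant (p < 2e-16), so the relationship is statistically strong. However, the moderate R-squared value and the MAE of roughly 0.0082 on the 2019 test data (about 0.8 percentage points per day) suggest that other factors, such as company-specific events, likely play a significant role in Apple’s stock performance.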