Primary Questions
- What is the relationship between the returns of Apple stock and the S&P 500?
- Specifically, how well does the overall market (as represented by the S&P 500) predict Apple stock performance over time?
This project explores the relationship between Apple Inc.’s stock performance and the S&P 500, a widely recognized benchmark for the overall market. By examining how these two series move together, the analysis aims to understand the extent to which Apple’s stock movements are driven by broader market trends rather than by company-specific factors.
The data used in this analysis comes from a publicly available Kaggle dataset. It includes daily price data (open, high, low, close, adjusted close, and volume) for Apple (AAPL) and the S&P 500 index (SPX); daily returns are calculated from these prices below.
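# Load the Kaggle price files (adjust the paths to your local copies)
AAPL_df1 <- read.csv("C:/Users/ndhlo/Downloads/AAPL.csv")
SP_df2 <- read.csv("C:/Users/ndhlo/Downloads/SPX.csv")
# Interactive, paged previews of the raw data
rmarkdown::paged_table(AAPL_df1)
rmarkdown::paged_table(SP_df2)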
Although the data was relatively clean, a few variables had to be created to address the analysis objectives. First, the stock and index data were each sorted by date for consistency.
AAPL_df1$Date <- as.Date(AAPL_df1$Date)  # character -> Date
SP_df2$Date <- as.Date(SP_df2$Date)      # character -> Date
AAPL_df1 <- AAPL_df1[order(AAPL_df1$Date), ]
SP_df2 <- SP_df2[order(SP_df2$Date), ]
Next, simple daily returns were calculated for Apple stock and the S&P 500 index. Each return is the difference between the current day’s closing price and the previous day’s closing price, divided by the previous day’s closing price, and is stored as a decimal fraction (for example, 0.01 means a 1% gain).
NB: For the purposes of this analysis, returns reflect price changes only (capital gains). Because they are computed from the Close column rather than Adj.Close, dividends are excluded.
# Simple daily return (P_t - P_{t-1}) / P_{t-1}; the first row has no prior close, so it is NA
AAPL_df1$Return <- c(NA, diff(AAPL_df1$Close) / head(AAPL_df1$Close, -1))
SP_df2$Return <- c(NA, diff(SP_df2$Close) / head(SP_df2$Close, -1))
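As a quick sanity check on the return formula, the sketch below applies the same diff()/head() expression to a short, made-up price vector and compares it with the equivalent form P_t / P_{t-1} - 1. The prices are hypothetical and chosen so the returns come out to round numbers.

```r
# Hypothetical closing prices, for illustration only
close_prices <- c(100, 102, 99.96, 104.958)

# Same expression as the analysis: (P_t - P_{t-1}) / P_{t-1}, with NA for the first day
simple_returns <- c(NA, diff(close_prices) / head(close_prices, -1))

# Equivalent formulation: P_t / P_{t-1} - 1
alt_returns <- c(NA, close_prices[-1] / close_prices[-length(close_prices)] - 1)

# Approximately NA, 0.02, -0.02, 0.05
print(round(simple_returns, 4))
```

Both forms give the same values; the diff()/head() version simply makes the "change divided by previous price" definition explicit.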
For the analysis, the data were filtered to the 11-year window from January 1, 2009, to January 1, 2020.
AAPL_filtered <- subset(AAPL_df1, Date >= as.Date("2009-01-01") & Date <= as.Date("2020-01-01"))
SP_filtered <- subset(SP_df2, Date >= as.Date("2009-01-01") & Date <= as.Date("2020-01-01"))
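The filtered datasets were then merged on the date column and split chronologically into a training set (January 1, 2009, to January 1, 2019, which in practice covers the 2009–2018 trading days) and a test set (January 2, 2019, to January 1, 2020, which covers the 2019 trading days).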
merged_df <- merge(AAPL_filtered, SP_filtered, by = "Date", suffixes = c("_AAPL", "_SPX"))
merged_df1 <- subset(merged_df, Date >= as.Date("2009-01-01") & Date <= as.Date("2019-01-01"))
merged_df2 <- subset(merged_df, Date >= as.Date("2019-01-02") & Date <= as.Date("2020-01-01"))
head(merged_df)
## Date Open_AAPL High_AAPL Low_AAPL Close_AAPL Adj.Close_AAPL Volume_AAPL
## 1 2009-01-02 12.26857 13.00571 12.16571 12.96429 11.25353 186503800
## 2 2009-01-05 13.31000 13.74000 13.24429 13.51143 11.72847 295402100
## 3 2009-01-06 13.70714 13.88143 13.19857 13.28857 11.53502 322327600
## 4 2009-01-07 13.11571 13.21429 12.89429 13.00143 11.28577 188262200
## 5 2009-01-08 12.91857 13.30714 12.86286 13.24286 11.49534 168375200
## 6 2009-01-09 13.31571 13.34000 12.87714 12.94000 11.23245 136711400
## Return_AAPL Open_SPX High_SPX Low_SPX Close_SPX Adj.Close_SPX Volume_SPX
## 1 0.06326893 902.99 934.73 899.35 931.80 931.80 4048270000
## 2 0.04220387 929.17 936.63 919.53 927.45 927.45 5413910000
## 3 -0.01649400 931.17 943.85 927.28 934.70 934.70 5392620000
## 4 -0.02160825 927.45 927.45 902.37 906.65 906.65 4704940000
## 5 0.01856937 905.73 910.00 896.81 909.73 909.73 4991550000
## 6 -0.02286949 909.91 911.93 888.31 890.35 890.35 4716500000
## Return_SPX
## 1 0.031608069
## 2 -0.004668358
## 3 0.007817133
## 4 -0.030009616
## 5 0.003397073
## 6 -0.021303029
head(merged_df1)
## Date Open_AAPL High_AAPL Low_AAPL Close_AAPL Adj.Close_AAPL Volume_AAPL
## 1 2009-01-02 12.26857 13.00571 12.16571 12.96429 11.25353 186503800
## 2 2009-01-05 13.31000 13.74000 13.24429 13.51143 11.72847 295402100
## 3 2009-01-06 13.70714 13.88143 13.19857 13.28857 11.53502 322327600
## 4 2009-01-07 13.11571 13.21429 12.89429 13.00143 11.28577 188262200
## 5 2009-01-08 12.91857 13.30714 12.86286 13.24286 11.49534 168375200
## 6 2009-01-09 13.31571 13.34000 12.87714 12.94000 11.23245 136711400
## Return_AAPL Open_SPX High_SPX Low_SPX Close_SPX Adj.Close_SPX Volume_SPX
## 1 0.06326893 902.99 934.73 899.35 931.80 931.80 4048270000
## 2 0.04220387 929.17 936.63 919.53 927.45 927.45 5413910000
## 3 -0.01649400 931.17 943.85 927.28 934.70 934.70 5392620000
## 4 -0.02160825 927.45 927.45 902.37 906.65 906.65 4704940000
## 5 0.01856937 905.73 910.00 896.81 909.73 909.73 4991550000
## 6 -0.02286949 909.91 911.93 888.31 890.35 890.35 4716500000
## Return_SPX
## 1 0.031608069
## 2 -0.004668358
## 3 0.007817133
## 4 -0.030009616
## 5 0.003397073
## 6 -0.021303029
head(merged_df2)
## Date Open_AAPL High_AAPL Low_AAPL Close_AAPL Adj.Close_AAPL
## 2517 2019-01-02 154.89 158.85 154.23 157.92 155.2140
## 2518 2019-01-03 143.98 145.72 142.00 142.19 139.7535
## 2519 2019-01-04 144.53 148.55 143.80 148.26 145.7195
## 2520 2019-01-07 148.70 148.83 145.90 147.93 145.3952
## 2521 2019-01-08 149.56 151.82 148.52 150.75 148.1669
## 2522 2019-01-09 151.29 154.53 149.63 153.31 150.6830
## Volume_AAPL Return_AAPL Open_SPX High_SPX Low_SPX Close_SPX Adj.Close_SPX
## 2517 37039700 0.001141072 2476.96 2519.49 2467.47 2510.03 2510.03
## 2518 91312200 -0.099607370 2491.92 2493.14 2443.96 2447.89 2447.89
## 2519 58607100 0.042689303 2474.33 2538.07 2474.33 2531.94 2531.94
## 2520 54777800 -0.002225832 2535.61 2566.16 2524.56 2549.69 2549.69
## 2521 41025300 0.019063121 2568.11 2579.82 2547.56 2574.41 2574.41
## 2522 45099100 0.016981742 2580.00 2595.32 2568.89 2584.96 2584.96
## Volume_SPX Return_SPX
## 2517 3733160000 0.001268497
## 2518 3822860000 -0.024756730
## 2519 4213410000 0.034335714
## 2520 4104710000 0.007010435
## 2521 4083030000 0.009695285
## 2522 4052480000 0.004098046
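The split above uses two separate subset() calls with adjacent boundary dates. A minimal alternative, sketched below with a hypothetical helper split_by_date(), uses a single cutoff so that the two sets cannot overlap or leave a gap. Because January 1, 2019, was a market holiday, a cutoff of 2019-01-01 (training: dates before it; test: dates on or after it) should reproduce the same split on this data. The dates and returns in the toy example are made up.

```r
# Hypothetical helper: split a data frame chronologically at a cutoff date.
# Rows dated before `cutoff` go to training; rows on or after it go to testing.
split_by_date <- function(df, cutoff, date_col = "Date") {
  dates <- as.Date(df[[date_col]])
  cutoff <- as.Date(cutoff)
  list(
    train = df[dates < cutoff, , drop = FALSE],
    test  = df[dates >= cutoff, , drop = FALSE]
  )
}

# Toy example with made-up dates and returns
toy <- data.frame(
  Date = c("2018-12-27", "2018-12-28", "2018-12-31", "2019-01-02", "2019-01-03"),
  Return = c(0.01, -0.02, 0.005, 0.001, -0.03),
  stringsAsFactors = FALSE
)
parts <- split_by_date(toy, "2019-01-01")

range(as.Date(parts$train$Date))  # training ends on 2018-12-31
range(as.Date(parts$test$Date))   # testing starts on 2019-01-02
```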
To assess the general price trends of Apple and the S&P 500 over the training period, time-series plots of daily closing prices were created and displayed side by side.
library(ggplot2)
library(patchwork)
# Ensure Date is a Date object so ggplot2 uses a date axis
merged_df1$Date <- as.Date(merged_df1$Date)
apple_plot <- ggplot(data = merged_df1, aes(x = Date, y = Close_AAPL)) +
  geom_line(color = "blue", linewidth = 1) +
  labs(title = "Apple Stock Closing Prices (2009-2019)",
       x = "Date",
       y = "Closing Price (AAPL)") +
  theme_minimal()
sp500_plot <- ggplot(data = merged_df1, aes(x = Date, y = Close_SPX)) +
  geom_line(color = "green", linewidth = 1) +
  labs(title = "S&P 500 Closing Prices (2009-2019)",
       x = "Date",
       y = "Closing Price (S&P 500)") +
  theme_minimal()
# patchwork's `+` places the two plots side by side
combined_plot <- apple_plot + sp500_plot
combined_plot
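Because Apple's closing price (about $13 at the start of 2009) and the S&P 500 (about 930 points) sit on very different scales, the side-by-side panels compare the shapes of the two trends rather than their relative growth. One common alternative, sketched below with a hypothetical helper rebase_to_100() and made-up prices, is to rescale each series so that its first value equals 100, which lets both lines share a single axis.

```r
# Hypothetical helper: rescale a price series so its first observation equals 100.
# A value of 150 then means "50% above the starting level".
rebase_to_100 <- function(prices) {
  100 * prices / prices[1]
}

# Toy example with made-up prices on very different scales
stock <- c(10, 12, 15)
index <- c(1000, 1100, 1200)
rebased <- data.frame(
  day = 1:3,
  stock = rebase_to_100(stock),  # 100, 120, 150
  index = rebase_to_100(index)   # 100, 110, 120
)
print(rebased)
```

On the real data, rebase_to_100(merged_df1$Close_AAPL) and rebase_to_100(merged_df1$Close_SPX) could be drawn as two geom_line() layers in one ggplot2 plot; this assumes the first closing price in each series is not missing.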
Histograms were generated to visualize the distributions of returns for both Apple and the S&P 500.
# Histogram for AAPL returns
aapl_hist <- ggplot(data = merged_df1, aes(x = Return_AAPL)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Apple Stock Returns (AAPL)",
       x = "Apple Stock Returns",
       y = "Frequency") +
  theme_minimal()
# Histogram for SPX returns
spx_hist <- ggplot(data = merged_df1, aes(x = Return_SPX)) +
  geom_histogram(binwidth = 0.01, fill = "green", color = "black", alpha = 0.7) +
  labs(title = "Distribution of S&P 500 Returns (SPX)",
       x = "S&P 500 Returns",
       y = "Frequency") +
  theme_minimal()
aapl_hist
spx_hist
A scatter plot with a fitted least-squares line was created to explore the relationship between daily Apple and S&P 500 returns.
ggplot(data = merged_df1, aes(x = Return_SPX, y = Return_AAPL)) +
  geom_point(color = "orange") +
  # Stating the formula explicitly suppresses the "using formula" message
  geom_smooth(method = "lm", formula = y ~ x, color = "black", se = FALSE) +
  labs(title = "Linear Relationship: Apple Returns vs. S&P 500 Returns",
       x = "S&P 500 Returns",
       y = "Apple Returns") +
  theme_minimal()
Next, summary statistics for the training period (the mean and variance of daily returns for each series, plus the correlation between them) were calculated to gain deeper insight into the relationship between Apple stock and the S&P 500. These metrics were then collected into a data frame for easy comparison.
mean_aapl <- mean(merged_df1$Return_AAPL, na.rm = TRUE)
var_aapl <- var(merged_df1$Return_AAPL, na.rm = TRUE)
mean_spx <- mean(merged_df1$Return_SPX, na.rm = TRUE)
var_spx <- var(merged_df1$Return_SPX, na.rm = TRUE)
correlation <- cor(merged_df1$Return_AAPL, merged_df1$Return_SPX, use = "complete.obs")
summary_stats <- data.frame(
  Statistic = c("Mean (AAPL)", "Variance (AAPL)", "Mean (S&P 500)", "Variance (S&P 500)", "Correlation"),
  Value = c(mean_aapl, var_aapl, mean_spx, var_spx, correlation)
)
print(summary_stats)
## Statistic Value
## 1 Mean (AAPL) 0.0011591431
## 2 Variance (AAPL) 0.0002824423
## 3 Mean (S&P 500) 0.0004608028
## 4 Variance (S&P 500) 0.0001098946
## 5 Correlation 0.6028078075
Regression Model:
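A simple linear regression model was fitted to the training data to quantify the relationship between the two series, with Apple's daily return as the response and the S&P 500's daily return as the predictor.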
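In equation form, this is the familiar market-model regression: R_AAPL,t and R_SPX,t are the daily returns on day t, and ε_t captures the part of Apple's return that the index does not explain. With a single predictor, the ordinary least squares estimates can be written in terms of the summary statistics computed above (sample means, standard deviations σ, and correlation ρ), and R-squared equals the squared correlation:

```latex
\begin{aligned}
R_{\mathrm{AAPL},t} &= \alpha + \beta\, R_{\mathrm{SPX},t} + \varepsilon_t \\
\hat{\beta} &= \frac{\widehat{\operatorname{Cov}}(R_{\mathrm{AAPL}}, R_{\mathrm{SPX}})}{\widehat{\operatorname{Var}}(R_{\mathrm{SPX}})}
             = \hat{\rho}\,\frac{\hat{\sigma}_{\mathrm{AAPL}}}{\hat{\sigma}_{\mathrm{SPX}}} \\
\hat{\alpha} &= \bar{R}_{\mathrm{AAPL}} - \hat{\beta}\,\bar{R}_{\mathrm{SPX}} \\
R^2 &= \hat{\rho}^{\,2}
\end{aligned}
```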
lm_SP_AAPL <- lm(Return_AAPL~Return_SPX, data = merged_df1)
summary(lm_SP_AAPL)
##
## Call:
## lm(formula = Return_AAPL ~ Return_SPX, data = merged_df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124278 -0.006808 -0.000416 0.006907 0.080702
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0007138 0.0002676 2.667 0.0077 **
## Return_SPX 0.9663970 0.0255115 37.881 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01341 on 2514 degrees of freedom
## Multiple R-squared: 0.3634, Adjusted R-squared: 0.3631
## F-statistic: 1435 on 1 and 2514 DF, p-value: < 2.2e-16
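As a consistency check, the key numbers in the regression output can be reproduced from the summary statistics table using the single-predictor formulas for slope, intercept, and R-squared. The sketch below hard-codes the values printed earlier; it assumes the summary statistics and lm() were computed on the same rows, and the close agreement with the lm() output suggests they were.

```r
# Summary statistics as printed earlier for the training period
stats <- c(
  mean_aapl   = 0.0011591431,
  var_aapl    = 0.0002824423,
  mean_spx    = 0.0004608028,
  var_spx     = 0.0001098946,
  correlation = 0.6028078075
)

# OLS slope: correlation times the ratio of standard deviations
beta_hat <- unname(stats["correlation"] * sqrt(stats["var_aapl"] / stats["var_spx"]))
# OLS intercept: the fitted line passes through the two means
alpha_hat <- unname(stats["mean_aapl"] - beta_hat * stats["mean_spx"])
# With one predictor, R-squared is the squared correlation
r_squared <- unname(stats["correlation"]^2)

round(c(alpha = alpha_hat, beta = beta_hat, r_squared = r_squared), 4)
```

The slope of about 0.966, the intercept of about 0.00071, and the R-squared of about 0.363 line up with the Estimate column and Multiple R-squared reported by summary(lm_SP_AAPL).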
predicted_returns_2019 <- predict(lm_SP_AAPL, newdata = merged_df2)
length(predicted_returns_2019)
## [1] 252
merged_df2$Pred_Returns_AAPL <- predicted_returns_2019
A new data frame was created to quantitatively and visually compare the model’s predicted results with the actual results, providing insights into the model’s performance.
Actual_Forecast_Comparison <- data.frame(
  Date = merged_df2$Date,
  Actual_Return_AAPL = merged_df2$Return_AAPL,
  Predicted_Return_AAPL = merged_df2$Pred_Returns_AAPL
)
head(Actual_Forecast_Comparison)
## Date Actual_Return_AAPL Predicted_Return_AAPL
## 1 2019-01-02 0.001141072 0.001939696
## 2 2019-01-03 -0.099607370 -0.023211006
## 3 2019-01-04 0.042689303 0.033895756
## 4 2019-01-07 -0.002225832 0.007488688
## 5 2019-01-08 0.019063121 0.010083319
## 6 2019-01-09 0.016981742 0.004674164
mae_2019 <- mean(abs(Actual_Forecast_Comparison$Actual_Return_AAPL - Actual_Forecast_Comparison$Predicted_Return_AAPL), na.rm = TRUE)
print(paste("Mean Absolute Error for 2019-2020:", mae_2019))
## [1] "Mean Absolute Error for 2019-2020: 0.00823079619243065"
Actual_Forecast_Comparison$Date <- as.Date(Actual_Forecast_Comparison$Date)
ggplot(Actual_Forecast_Comparison, aes(x = Date)) +
  geom_line(aes(y = Actual_Return_AAPL, color = "Actual Returns")) +
  geom_line(aes(y = Predicted_Return_AAPL, color = "Predicted Returns"), linetype = "dashed") +
  labs(title = "Actual vs Predicted AAPL Returns (2019-2020)",
       x = "Date",
       y = "Returns") +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  scale_color_manual(values = c("Actual Returns" = "blue", "Predicted Returns" = "red")) +
  theme_minimal()
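An MAE of roughly 0.0082 is easier to judge next to a simple benchmark. The sketch below uses a hypothetical helper, evaluate_forecast(), that reports MAE and root mean squared error (RMSE) alongside the MAE of a naive forecast of zero return every day. The baseline is an illustrative addition, not part of the original analysis, and the toy returns are made up.

```r
# Hypothetical helper: error metrics for a return forecast, plus a naive
# zero-return baseline for context (illustrative, not part of the original analysis)
evaluate_forecast <- function(actual, predicted) {
  ok <- !is.na(actual) & !is.na(predicted)  # drop incomplete pairs
  err <- actual[ok] - predicted[ok]
  c(
    MAE = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    MAE_zero_baseline = mean(abs(actual[ok]))  # error if every forecast were 0
  )
}

# Toy example with made-up returns (the NA pair is ignored)
actual <- c(0.01, -0.02, 0.03, NA)
predicted <- c(0.005, -0.01, 0.02, 0.01)
evaluate_forecast(actual, predicted)
```

Applied to the test set, it would be called as evaluate_forecast(Actual_Forecast_Comparison$Actual_Return_AAPL, Actual_Forecast_Comparison$Predicted_Return_AAPL); its MAE entry should match mae_2019.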
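The analysis shows that the S&P 500 explains about 36.3% of the variance in Apple’s daily returns over the training period, as indicated by the R-squared value of 0.3634. The estimated slope of about 0.97 implies that a 1% daily move in the index was associated, on average, with a move of roughly 0.97% in Apple. While this relationship is statistically significant (p < 2.2e-16), the moderate R-squared and a test-period MAE of about 0.82 percentage points per day suggest that other factors, such as company-specific news and events, play a substantial role in Apple’s stock performance. Note also that the model takes the same day’s S&P 500 return as its input, so it describes how Apple moves with the market rather than forecasting Apple’s return ahead of time.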