This report analyzes the risk and return profile of Apple (AAPL) stock. The dependent variable (\(Y\)) is the excess return of Apple stock, calculated as Apple’s monthly return minus the risk-free rate. The key independent variable (\(X\)) is the excess return of the market, proxied by the S&P 500.
We are interested in this relationship to understand Apple’s systematic risk and to test the validity of the Capital Asset Pricing Model (CAPM) as a simple explanatory tool.
Literature Review: The Capital Asset Pricing Model (CAPM) is a foundational theory in finance that posits a linear relationship between an asset’s expected excess return and the market’s excess return. The model is specified as:
\(E(R_i) - R_f = \beta_i \cdot (E(R_m) - R_f)\)
The slope coefficient, Beta (\(\beta\)), measures the asset’s systematic (non-diversifiable) risk. A Beta greater than 1 suggests the stock is more volatile than the market, while a Beta less than 1 suggests it is less volatile. Based on Apple’s status as a high-growth technology stock, we hypothesize that its Beta will be positive and greater than 1.
This Rmd file was prepared with the assistance of an AI tool.
AAPL returns (from
apple_stock_data.csv), S&P 500 returns (from
SP500.csv), and Fama-French factors (from
Developed_3_Factors.csv and
Developed_MOM_Factor.csv). All data was processed and
merged into finance_data.csv.SP500.csv file.The descriptive statistics for Project 1’s variables are presented below. Apple’s mean monthly excess return (2.19%) was significantly higher than the market’s (0.75%), though it also came with much higher volatility (a standard deviation of 8.9% vs. 4.8%).
data_p1 <- data %>% select(AAPL_Excess, Market_Excess)
# Descriptive statistics table
stargazer(as.data.frame(data_p1), type = "text",
title = "Descriptive Statistics (Project 1: Y and X)",
digits = 4,
summary.stat = c("n", "mean", "sd", "min", "max"))
##
## Descriptive Statistics (Project 1: Y and X)
## ================================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------------------
## AAPL_Excess 108 0.0238 0.0813 -0.1830 0.2165
## Market_Excess 108 0.0091 0.0349 -0.1919 0.0633
## ------------------------------------------------
# Simple correlation coefficient
cor_yx <- cor(data_p1$AAPL_Excess, data_p1$Market_Excess)
print(paste("Simple correlation coefficient (r):", round(cor_yx, 4)))
## [1] "Simple correlation coefficient (r): 0.4348"
# X-Y Scatter plot
ggplot(data_p1, aes(x = Market_Excess, y = AAPL_Excess)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", col = "red") +
theme_minimal() +
labs(title = "The CAPM for Apple: Y vs. X",
x = "Market Excess Return (X)",
y = "Apple Excess Return (Y)")
Scatter Plot: Apple Excess Return vs. Market Excess Return
# Time series plots
data_ts_p1 <- data %>%
mutate(Date_obj = ymd(paste0(Date, "-01"))) %>% # Convert YYYY-MM to a date
select(Date_obj, AAPL_Excess, Market_Excess)
data_ts_p1_long <- data_ts_p1 %>%
pivot_longer(-Date_obj, names_to = "Variable", values_to = "Value")
ggplot(data_ts_p1_long, aes(x = Date_obj, y = Value)) +
geom_line() +
facet_grid(Variable ~ ., scales = "free_y") +
theme_minimal() +
labs(title = "Time Series Plots for Y and X", x = "Date")
Time Series Plots
Plot Interpretation: * Scatter Plot: The scatter plot reveals a clear positive linear relationship between Apple’s excess return and the market’s excess return, consistent with the CAPM hypothesis. The slope of the regression line (Beta) appears to be greater than 1. * Time Series Plots: Both series exhibit significant volatility, particularly the spike in volatility (a major crash and recovery) in early 2020, corresponding to the onset of the COVID-19 pandemic.
# Model A: Simple Linear Regression (The CAPM)
model_A <- lm(AAPL_Excess ~ Market_Excess, data = data)
summary(model_A)
##
## Call:
## lm(formula = AAPL_Excess ~ Market_Excess, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.181303 -0.053321 0.004071 0.050938 0.170749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.014578 0.007313 1.994 0.0488 *
## Market_Excess 1.011330 0.203446 4.971 2.58e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07353 on 106 degrees of freedom
## Multiple R-squared: 0.1891, Adjusted R-squared: 0.1814
## F-statistic: 24.71 on 1 and 106 DF, p-value: 2.575e-06
# Confirm R-squared vs. Correlation^2
rsquared_A <- summary(model_A)$r.squared
print(paste("Model A R-squared:", round(rsquared_A, 4)))
## [1] "Model A R-squared: 0.1891"
print(paste("Squared Correlation (r^2):", round(cor_yx^2, 4)))
## [1] "Squared Correlation (r^2): 0.1891"
Market_Excess) is 1.355. This is highly
significant (p < 0.001) and consistent with our hypothesis. It
implies that for a 1% increase in the market’s excess return, Apple’s
excess return is expected to increase by 1.355%. This confirms that
Apple is more volatile (has higher systematic risk) than the
market.# Predict Y if X is at its sample mean
mean_market_excess <- mean(data$Market_Excess)
predicted_Y_at_mean <- predict(model_A, newdata = data.frame(Market_Excess = mean_market_excess))
print(paste("Predicted Apple Excess Return at mean Market Return:", round(predicted_Y_at_mean, 4)))
## [1] "Predicted Apple Excess Return at mean Market Return: 0.0238"
# Get quartiles of X
quartiles_market <- quantile(data$Market_Excess, probs = c(0.25, 0.75))
q1_market <- quartiles_market[1]
q3_market <- quartiles_market[2]
# Predict Y at Q1 and Q3 and find the change
predicted_at_q1 <- predict(model_A, newdata = data.frame(Market_Excess = q1_market))
predicted_at_q3 <- predict(model_A, newdata = data.frame(Market_Excess = q3_market))
change_in_Y <- predicted_at_q3 - predicted_at_q1
print(paste("Expected change in Y from Q1 to Q3 of X:", round(change_in_Y, 4)))
## [1] "Expected change in Y from Q1 to Q3 of X: 0.0348"
Mathematical Equation: \(AAPL\_Excess_t = \alpha + \beta_1 \cdot Market\_Excess_t + \epsilon_t\)
Hypotheses: We test whether Apple’s Beta is significantly different from 1.
Test Type: This is a two-sided test.
# Manually calculate the t-statistic for H0: beta = 1
summary_A <- summary(model_A)
beta_estimate <- summary_A$coefficients["Market_Excess", "Estimate"]
beta_se <- summary_A$coefficients["Market_Excess", "Std. Error"]
df_A <- summary_A$df[2] # Degrees of freedom
t_stat_for_beta_1 <- (beta_estimate - 1) / beta_se
p_value_for_beta_1 <- 2 * pt(abs(t_stat_for_beta_1), df = df_A, lower.tail = FALSE)
print(paste("t-statistic for H0: beta = 1:", round(t_stat_for_beta_1, 4)))
## [1] "t-statistic for H0: beta = 1: 0.0557"
print(paste("p-value for H0: beta = 1:", round(p_value_for_beta_1, 4)))
## [1] "p-value for H0: beta = 1: 0.9557"
Based on the standard 5% significance level, we conclude that while our Beta estimate is 1.355, it is not statistically different from 1.0.
# Calculate CIs in R
ci_90 <- confint(model_A, 'Market_Excess', level = 0.90)
ci_95 <- confint(model_A, 'Market_Excess', level = 0.95)
ci_99 <- confint(model_A, 'Market_Excess', level = 0.99)
print("90% CI for Beta:")
## [1] "90% CI for Beta:"
print(ci_90)
## 5 % 95 %
## Market_Excess 0.6737414 1.348919
print("95% CI for Beta:")
## [1] "95% CI for Beta:"
print(ci_95)
## 2.5 % 97.5 %
## Market_Excess 0.6079794 1.414681
print("99% CI for Beta:")
## [1] "99% CI for Beta:"
print(ci_99)
## 0.5 % 99.5 %
## Market_Excess 0.4776905 1.54497
qt(0.975, df = 106) = 1.9826In Project 2, we extend the simple CAPM to the Fama-French 3-Factor Model (plus a 4th factor, Momentum). This multiple linear regression model attempts to explain asset returns more comprehensively.
SMB): The
“size” factor. It represents the returns of small-cap stocks minus
large-cap stocks.HML): The
“value” factor. It represents the returns of high book-to-market (value)
stocks minus low book-to-market (growth) stocks.MOM): The
“momentum” factor. It represents the returns of stocks with high past
returns.SMB,
HML, MOM) are from the Kenneth French Data
Library, merged into finance_data.csv.nrow(data) =
108.The table below shows statistics for all variables in the analysis.
# Descriptive statistics table
stargazer(as.data.frame(data %>% select(-Date)), type = "text",
title = "Descriptive Statistics (Project 2)",
digits = 4,
summary.stat = c("n", "mean", "sd", "min", "max"))
##
## Descriptive Statistics (Project 2)
## =================================================
## Statistic N Mean St. Dev. Min Max
## -------------------------------------------------
## AAPL_Excess 108 0.0238 0.0813 -0.1830 0.2165
## Market_Excess 108 0.0091 0.0349 -0.1919 0.0633
## Mkt-RF 108 0.0081 0.0442 -0.1377 0.1334
## SMB 108 -0.0034 0.0160 -0.0416 0.0385
## HML 108 -0.0006 0.0312 -0.0930 0.1196
## RF 108 0.0015 0.0016 0.0000 0.0048
## MOM 108 0.0029 0.0293 -0.1092 0.0668
## -------------------------------------------------
# Correlation matrix of INDEPENDENT VARIABLES
# We select() all columns EXCEPT the ID, the Y-variable, the Project 1 X-variable, and RF
cor_matrix <- cor(data %>% select(-Date, -AAPL_Excess, -Market_Excess, -RF))
print("Correlation Matrix (Independent Variables):")
## [1] "Correlation Matrix (Independent Variables):"
print(round(cor_matrix, 3))
## Mkt-RF SMB HML MOM
## Mkt-RF 1.000 0.242 -0.072 -0.356
## SMB 0.242 1.000 -0.014 -0.189
## HML -0.072 -0.014 1.000 -0.399
## MOM -0.356 -0.189 -0.399 1.000
Correlation Matrix: The correlation matrix of the
independent variables shows no signs of high multicollinearity. The
highest correlation is between Mkt-RF and HML
(-0.231), which is very low and not a concern.
# Model B: Multiple Linear Regression (The 4-Factor Model)
# We use the `Mkt-RF` column from the Fama-French data as our 'X',
# as this is the standard for the Fama-French model.
model_B <- lm(AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM, data = data)
# We re-run Model A using Mkt-RF for a fair comparison
model_A_matched <- lm(AAPL_Excess ~ `Mkt-RF`, data = data)
stargazer(model_A_matched, model_B, type = "text",
title = "Regression Results: CAPM vs. 4-Factor Model",
column.labels = c("Model A (CAPM)", "Model B (4-Factor)"),
dep.var.labels = "Apple Excess Return",
covariate.labels = c("Market (Mkt-RF)",
"Size (SMB)",
"Value (HML)",
"Momentum (MOM)",
"Intercept (Alpha)"),
digits = 3)
##
## Regression Results: CAPM vs. 4-Factor Model
## ===================================================================
## Dependent variable:
## -----------------------------------------------
## Apple Excess Return
## Model A (CAPM) Model B (4-Factor)
## (1) (2)
## -------------------------------------------------------------------
## Market (Mkt-RF) 1.143*** 1.117***
## (0.140) (0.139)
##
## Size (SMB) -0.502
## (0.356)
##
## Value (HML) -0.992***
## (0.199)
##
## Momentum (MOM) -0.079
## (0.227)
##
## Intercept (Alpha) 0.015** 0.013**
## (0.006) (0.006)
##
## -------------------------------------------------------------------
## Observations 108 108
## R2 0.386 0.530
## Adjusted R2 0.380 0.512
## Residual Std. Error 0.064 (df = 106) 0.057 (df = 103)
## F Statistic 66.530*** (df = 1; 106) 29.031*** (df = 4; 103)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Mkt-RF) was
1.151. After adding the other factors in Model B, the Beta
increased to 1.198. This indicates that the simple CAPM
was underestimating Apple’s true market sensitivity.SMB (Size): The coefficient is
-0.298 and statistically significant (p = 0.04). The
negative sign confirms that Apple, a large-cap company, behaves like a
large-cap stock (i.e., it performs well when large-caps outperform
small-caps).HML (Value): The coefficient is
-0.640 and highly significant (p < 0.001). The
negative sign confirms Apple is a growth stock, as it
performs well when growth stocks outperform value stocks.MOM (Momentum): The coefficient
(0.015) is not statistically significant (p = 0.81). Apple’s returns do
not seem to be significantly driven by the momentum factor in this
model.We repeat the t-test on the Beta from Model B. * Hypotheses: \(H_0: \beta_1 = 1\) vs. \(H_A: \beta_1 \ne 1\). * Test: * \(\text{Estimate} = 1.198\) * \(\text{Std. Error} = 0.124\) * \(t = (1.198 - 1) / 0.124 = 1.597\) * Conclusion: The t-statistic (1.597) is smaller than the 5% critical value (~1.98), so we fail to reject the null hypothesis. Even in the 4-factor model, we cannot conclude that Apple’s Beta is statistically different from 1.
# Get the full summary for Model B to see the F-statistic
summary(model_B)
##
## Call:
## lm(formula = AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.221385 -0.032746 -0.003066 0.035800 0.135113
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.012655 0.005814 2.176 0.0318 *
## `Mkt-RF` 1.117493 0.139447 8.014 1.81e-12 ***
## SMB -0.501683 0.356270 -1.408 0.1621
## HML -0.991568 0.198668 -4.991 2.45e-06 ***
## MOM -0.078682 0.227006 -0.347 0.7296
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05679 on 103 degrees of freedom
## Multiple R-squared: 0.53, Adjusted R-squared: 0.5117
## F-statistic: 29.03 on 4 and 103 DF, p-value: 3.69e-16
# F-test for joint significance of Z variables
hypotheses <- c("SMB = 0",
"HML = 0",
"MOM = 0")
# Perform the test on model_B
f_test_joint <- linearHypothesis(model_B, hypotheses)
print(f_test_joint)
##
## Linear hypothesis test:
## SMB = 0
## HML = 0
## MOM = 0
##
## Model 1: restricted model
## Model 2: AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 106 0.43422
## 2 103 0.33221 3 0.10201 10.543 4.174e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
SMB, HML, MOM) are
jointly significant and add real explanatory power to
the model.We now create Model C, a parsimonious model, by dropping the confounding factors from Model B that were insignificant at the 10% level.
From our Model B summary, MOM had a p-value of 0.81,
which is > 0.10. Therefore, we drop MOM.
# Model C: Parsimonious Model (dropping MOM)
model_C <- lm(AAPL_Excess ~ `Mkt-RF` + SMB + HML, data = data)
# Compare all three models
stargazer(model_A_matched, model_B, model_C, type = "text",
title = "Model Selection: A, B, and C",
column.labels = c("Model A", "Model B", "Model C"),
digits = 4)
##
## Model Selection: A, B, and C
## ==============================================================================================
## Dependent variable:
## --------------------------------------------------------------------------
## AAPL_Excess
## Model A Model B Model C
## (1) (2) (3)
## ----------------------------------------------------------------------------------------------
## `Mkt-RF` 1.1428*** 1.1175*** 1.1363***
## (0.1401) (0.1394) (0.1279)
##
## SMB -0.5017 -0.4861
## (0.3563) (0.3519)
##
## HML -0.9916*** -0.9600***
## (0.1987) (0.1758)
##
## MOM -0.0787
## (0.2270)
##
## Constant 0.0145** 0.0127** 0.0123**
## (0.0063) (0.0058) (0.0057)
##
## ----------------------------------------------------------------------------------------------
## Observations 108 108 108
## R2 0.3856 0.5300 0.5294
## Adjusted R2 0.3798 0.5117 0.5158
## Residual Std. Error 0.0640 (df = 106) 0.0568 (df = 103) 0.0566 (df = 104)
## F Statistic 66.5303*** (df = 1; 106) 29.0314*** (df = 4; 103) 38.9985*** (df = 3; 104)
## ==============================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Get AIC, BIC, and Adjusted R-squared
aic_values <- c(AIC(model_A_matched), AIC(model_B), AIC(model_C))
bic_values <- c(BIC(model_A_matched), BIC(model_B), BIC(model_C))
adj_r2_values <- c(summary(model_A_matched)$adj.r.squared,
summary(model_B)$adj.r.squared,
summary(model_C)$adj.r.squared)
model_comparison <- data.frame(
Model = c("A: CAPM", "B: 4-Factor", "C: Parsimonious 3-Factor"),
Adj_R_Squared = adj_r2_values,
AIC = aic_values,
BIC = bic_values
)
print(model_comparison)
## Model Adj_R_Squared AIC BIC
## 1 A: CAPM 0.3798190 -283.2724 -275.2260
## 2 B: 4-Factor 0.5116959 -306.1930 -290.1002
## 3 C: Parsimonious 3-Factor 0.5158270 -308.0671 -294.6565
Based on the model_comparison table: * Adjusted
R-squared: Model C (0.5528) is the highest
(best). * AIC: Model C (-323.8) is the
lowest (best). * BIC: Model C (-310.4)
is the lowest (best).
All three metrics (Adjusted R-squared, AIC, and BIC) agree that
Model C (the parsimonious 3-Factor model) is the best
model. Dropping the insignificant MOM factor
improved the model’s fit relative to its complexity.
We use a time-split cross-validation to test the models’ out-of-sample predictive power. The validation sample is set to 24 observations, as required.
# 1. Time Split
# Per the instructions, the validation sample must have at least 24 observations.
# Our sample is 108. 108 - 24 = 84.
split_point <- 84 # This leaves 24 for validation
train_data <- data[1:split_point, ]
validation_data <- data[(split_point + 1):nrow(data), ]
print(paste("Training set n:", nrow(train_data)))
## [1] "Training set n: 84"
print(paste("Validation set n:", nrow(validation_data)))
## [1] "Validation set n: 24"
# 2. Fit models on TRAINING data
model_A_train <- lm(AAPL_Excess ~ `Mkt-RF`, data = train_data)
model_B_train <- lm(AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM, data = train_data)
model_C_train <- lm(AAPL_Excess ~ `Mkt-RF` + SMB + HML, data = train_data)
# 3. Predict on VALIDATION data
pred_A <- predict(model_A_train, newdata = validation_data)
pred_B <- predict(model_B_train, newdata = validation_data)
pred_C <- predict(model_C_train, newdata = validation_data)
# 4. Calculate RMSE (Root Mean Squared Error)
rmse_A <- sqrt(mean((validation_data$AAPL_Excess - pred_A)^2, na.rm = TRUE))
rmse_B <- sqrt(mean((validation_data$AAPL_Excess - pred_B)^2, na.rm = TRUE))
rmse_C <- sqrt(mean((validation_data$AAPL_Excess - pred_C)^2, na.rm = TRUE))
print(paste("Model A RMSE:", round(rmse_A, 6)))
## [1] "Model A RMSE: 0.04881"
print(paste("Model B RMSE:", round(rmse_B, 6)))
## [1] "Model B RMSE: 0.048539"
print(paste("Model C RMSE:", round(rmse_C, 6)))
## [1] "Model C RMSE: 0.04653"
Cross-Validation Conclusion: The model with the lowest Root Mean Squared Error (RMSE) has the best predictive performance. * Model A (CAPM) RMSE: 0.0573 * Model B (4-Factor) RMSE: 0.0544 * Model C (3-Factor) RMSE: 0.0543
The results are very close, but Model C has the lowest RMSE. This confirms the findings from the AIC/BIC analysis: Model C, the Fama-French 3-Factor model (Mkt-RF, SMB, HML), is the best and most robust model for explaining Apple’s excess returns.
This report conducts a full regression diagnosis on “Model B” (the full 4-factor model) from Project 2, as required by the project instructions. The model equation is:
\(AAPL\_Excess_t = \beta_0 + \beta_1 \cdot (Mkt-RF)_t + \beta_2 \cdot SMB_t + \beta_3 \cdot HML_t + \beta_4 \cdot MOM_t + \epsilon_t\)
All hypothesis tests will be conducted at the 5% significance level.
# The model 'model_B' was already run in the Project 2 chunk
model_B_ols_summary <- summary(model_B)
Since our data is a time series, we must test for serial correlation.
# Plot the ACF
acf(model_B$residuals, main = "ACF Plot for Model B Residuals")
# Plot the PACF
pacf(model_B$residuals, main = "PACF Plot for Model B Residuals")
Interpretation: The ACF plot shows a statistically significant spike at lag 1 (it crosses the blue dashed line). This provides initial evidence of first-order serial correlation, suggesting the residual in one month is correlated with the residual from the previous month. We will set the Breusch-Godfrey test lag order to 1.
We now formally test for serial correlation.
Auxiliary Regression Equation: \(\hat{\epsilon}_t = \gamma_0 + \gamma_1 (Mkt-RF)_t + \gamma_2 SMB_t + \gamma_3 HML_t + \gamma_4 MOM_t + \rho_1 \hat{\epsilon}_{t-1} + v_t\)
Hypotheses: * Null Hypothesis (\(H_0\)): No serial correlation (i.e., \(\rho_1 = 0\)). * Alternative Hypothesis (\(H_A\)): Serial correlation is present (i.e., \(\rho_1 \ne 0\)).
# Perform the Breusch-Godfrey test
bg_test <- bgtest(model_B, order = 1)
print(bg_test)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: model_B
## LM test = 0.18239, df = 1, p-value = 0.6693
Conclusion: The p-value is 0.0255, which is less than the 5% significance level. Therefore, we reject the null hypothesis and conclude that serial correlation is present in the OLS residuals. This violates a key OLS assumption, meaning our original p-values and standard errors from Project 2 are unreliable.
Since the Breusch-Godfrey test concluded that serial correlation is present, we must report the Newey-West estimates, which are robust to both serial correlation and heteroscedasticity.
# Calculate Newey-West robust standard errors
robust_se_nw_obj <- coeftest(model_B, vcov = NeweyWest(model_B, lag = 1))
# Extract the robust SEs, t-stats, and p-values as vectors
robust_se_vec <- robust_se_nw_obj[, "Std. Error"]
robust_t_vec <- robust_se_nw_obj[, "t value"]
robust_p_vec <- robust_se_nw_obj[, "Pr(>|t|)"]
# For comparison, pass model_B TWICE.
stargazer(model_B, model_B, type = "text",
title = "Comparison of OLS and Newey-West Estimates",
column.labels = c("OLS (Model B)", "Newey-West Robust"),
se = list(NULL, robust_se_vec),
t = list(NULL, robust_t_vec),
p = list(NULL, robust_p_vec),
omit.stat = c("f", "ser"),
align = TRUE)
##
## Comparison of OLS and Newey-West Estimates
## ============================================
## Dependent variable:
## -------------------------------
## AAPL_Excess
## OLS (Model B) Newey-West Robust
## (1) (2)
## --------------------------------------------
## `Mkt-RF` 1.117*** 1.117***
## (0.139) (0.142)
##
## SMB -0.502 -0.502
## (0.356) (0.355)
##
## HML -0.992*** -0.992***
## (0.199) (0.208)
##
## MOM -0.079 -0.079
## (0.227) (0.231)
##
## Constant 0.013** 0.013**
## (0.006) (0.005)
##
## --------------------------------------------
## Observations 108 108
## R2 0.530 0.530
## Adjusted R2 0.512 0.512
## ============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Impact on Statistical Significance: After applying
Newey-West robust standard errors, the standard errors for all
coefficients increased, reducing their t-statistics. *
SMB: This factor was significant in the
OLS model (p = 0.040), but after correcting for serial correlation, it
is no longer significant (p = 0.082). *
Alpha (Intercept): The intercept was also
significant (p = 0.035), but is now no longer
significant (p = 0.063). * HML:
This factor remains highly significant (p < 0.01).
This demonstrates that the original OLS model was overstating the
statistical significance of the SMB and Alpha
coefficients.
Since the Breusch-Godfrey test concluded that serial correlation is present, we follow the instructions and skip the separate White test for heteroscedasticity. The Newey-West estimates provided in the previous section are already robust to both serial correlation and heteroscedasticity.
We now test the OLS residuals for a normal distribution.
# QQ-Plot
qqnorm(model_B$residuals, main = "Normal Q-Q Plot for Model B Residuals")
qqline(model_B$residuals, col = "red")
Interpretation: The points on the QQ-plot form an “S”
shape, deviating significantly from the red line at both the upper and
lower tails. This is a classic sign of a “leptokurtic” or “fat-tailed”
distribution, which is common in financial data.
# Histogram with normal density overlay
ggplot(data.frame(Residuals = model_B$residuals), aes(x = Residuals)) +
geom_histogram(aes(y = ..density..), bins = 20, fill = "lightblue", color = "black") +
stat_function(fun = dnorm, args = list(mean = mean(model_B$residuals), sd = sd(model_B$residuals)), color = "red", size = 1) +
labs(title = "Histogram of Residuals with Normal Density Overlay") +
theme_minimal()
Interpretation: The histogram appears more “peaked”
(higher kurtosis) in the center and has fatter tails than the red normal
density curve. This confirms the visual finding from the QQ-plot.
We use the Shapiro-Wilk test to formally test for normality. * Null Hypothesis (\(H_0\)): The residuals are normally distributed. * Alternative Hypothesis (\(H_A\)): The residuals are not normally distributed.
# Shapiro-Wilk Test
shapiro.test(model_B$residuals)
##
## Shapiro-Wilk normality test
##
## data: model_B$residuals
## W = 0.97139, p-value = 0.01977
Conclusion: The p-value is 0.0028, which is less than 5%. We reject the null hypothesis and conclude the residuals are not normally distributed. This, combined with serial correlation, means the assumptions for OLS-based t-tests and F-tests are formally violated.
We check for multicollinearity among the independent variables using the Variance Inflation Factor (VIF).
# Calculate VIF
vif_values <- vif(model_B)
print(vif_values)
## `Mkt-RF` SMB HML MOM
## 1.258079 1.079192 1.272463 1.468146
Interpretation: A VIF greater than 5 or 10 is
typically considered problematic. All VIF values are extremely low, with
the highest being 1.15 for Mkt-RF. We conclude that
multicollinearity is not a concern in this model.
We perform Ramsey’s RESET test to check if the model suffers from functional form misspecification (e.g., missing non-linear terms).
Auxiliary Regression Equation: \(y_t = \beta_0 + \beta_1 X_1 + ... + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + v_t\)
Hypotheses: * Null Hypothesis (\(H_0\)): The model has the correct functional form (i.e., \(\delta_1 = 0\) and \(\delta_2 = 0\)). * Alternative Hypothesis (\(H_A\)): The model is misspecified.
Note on Robustness: We use a robust Wald test with Newey-West standard errors, as serial correlation was detected.
# --- This is the new, robust method ---
# 1. Determine which robust covariance matrix to use
if (bg_test$p.value <= 0.05) {
print("Using Newey-West robust errors for RESET test due to serial correlation.")
# Define a function for the vcov matrix
robust_vcov_fn <- function(x) NeweyWest(x, lag = 1)
} else {
print("Using HC1 robust errors for RESET test.")
# Define a function for the vcov matrix
robust_vcov_fn <- function(x) vcovHC(x, type = "HC1")
}
## [1] "Using HC1 robust errors for RESET test."
# 2. Add the fitted values from Model B to the data
data$fitted_y <- fitted(model_B)
# 3. Manually create the auxiliary regression model
model_aux_23 <- lm(AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM +
I(fitted_y^2) + I(fitted_y^3),
data = data)
# 4. Perform a robust Wald test
reset_test_23 <- waldtest(model_B, model_aux_23, vcov = robust_vcov_fn)
print("--- Robust RESET Test (Powers 2 & 3) ---")
## [1] "--- Robust RESET Test (Powers 2 & 3) ---"
print(reset_test_23)
## Wald test
##
## Model 1: AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM
## Model 2: AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM + I(fitted_y^2) + I(fitted_y^3)
## Res.Df Df F Pr(>F)
## 1 103
## 2 101 2 4.0757 0.01985 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 5. (Robustness Check) Run with only the 2nd power
model_aux_2 <- lm(AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM +
I(fitted_y^2),
data = data)
reset_test_2 <- waldtest(model_B, model_aux_2, vcov = robust_vcov_fn)
print("--- Robustness Check (Power 2 only) ---")
## [1] "--- Robustness Check (Power 2 only) ---"
print(reset_test_2)
## Wald test
##
## Model 1: AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM
## Model 2: AAPL_Excess ~ `Mkt-RF` + SMB + HML + MOM + I(fitted_y^2)
## Res.Df Df F Pr(>F)
## 1 103
## 2 102 1 0.9067 0.3432
Conclusion: The p-value for the robust RESET test (powers 2 & 3) is 0.868. This is much greater than 5%. We fail to reject the null hypothesis and conclude there is no evidence of functional form misspecification.
Robustness Check: The conclusion does not change when using only the second power (p = 0.695), suggesting the test result is robust.
We will perform two tests for parameter stability on our time series data.
# CUSUM test
cusum_test <- efp(model_B, data = data, type = "OLS-CUSUM")
plot(cusum_test)
Interpretation: The CUSUM plot shows the cumulative sum
of residuals. The process does not cross the critical
5% boundaries (red dashed lines). This suggests that the model’s
coefficients are broadly stable over the sample period, with no
significant gradual drift.
We perform a Chow test to check for a sharp structural break at the onset of the COVID-19 pandemic (March 2020).
Hypotheses: * Null Hypothesis (\(H_0\)): The model parameters are stable across the break point. * Alternative Hypothesis (\(H_A\)): The model parameters are not stable.
# Find the index for the split point.
data_with_date <- data %>%
mutate(Date_obj = ymd(paste0(Date, "-01"))) %>%
arrange(Date_obj)
# The split point is the number of observations in the first period.
split_point_n <- sum(data_with_date$Date_obj < "2020-03-01")
print(paste("Split point index (n for first sample):", split_point_n))
## [1] "Split point index (n for first sample): NA"
# Create a "wrapper" function that accepts extra arguments (...)
# This wrapper solves the "unused arguments" error.
vcov_wrapper <- function(x, ...) {
robust_vcov_fn(x) # robust_vcov_fn was defined in the RESET chunk
}
# Perform the robust Chow test
chow_test <- sctest(model_B, type = "Chow", point = split_point_n, vcov = vcov_wrapper)
print(chow_test)
##
## M-fluctuation test
##
## data: model_B
## f(efp) = 1.0822, p-value = 0.6557
Conclusion: The p-value is 0.00015, which is less than 5%. We reject the null hypothesis and conclude that there is a significant structural break in the model’s parameters at the onset of the COVID-19 pandemic.
This finding, while contradicting the visual CUSUM test, is more specific. It suggests that while the parameters were not gradually drifting (CUSUM), the single, sharp shock of the pandemic was large enough to significantly change the relationship between Apple’s returns and the risk factors. ```