library(tidyverse)
library(tidyquant)
AE 06
Import
stocks <- tq_get(c("AAPL", "MSFT"), from = "2020-01-01", to = "2021-01-31") %>%
  select(date, symbol, open) %>%
  pivot_wider(names_from = symbol, values_from = open)
stocks
Exercise 1
Minimizing the sum of residuals, \(\sum_i e_i\), is not a workable criterion because residuals can be both positive and negative. The sum has no lower bound: shifting the line upward makes every residual more negative, so the sum can be driven as low as we like. Aiming for a sum of zero instead does not single out one line either. Any line that passes through the point of means \((\bar{x}, \bar{y})\) has residuals that sum to zero, and the least squares fit of a model with an intercept is only one such line.
More generally, positive and negative residuals cancel each other out. A large positive error in one observation can offset a large negative error in another, so a small sum of residuals says nothing about how close the line is to the points. The sum captures the net balance of the errors, not their magnitude.
An alternative is to minimize the sum of the absolute residuals, \(\sum_i |e_i|\), known as least absolute deviations (LAD) regression. This method prioritizes reducing the size of individual errors, avoiding the issue of cancellation between positive and negative deviations.
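The cancellation problem can be checked numerically. The sketch below uses a small made-up data set (the values of `x` and `y` are illustrative assumptions, not the stock prices) and compares the least squares line with a deliberately poor line that is flat at the mean of `y`.

```r
# Toy data, invented for illustration (not the AAPL/MSFT prices)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.0)

# Residuals from the least squares line
ols_fit <- lm(y ~ x)
e_ols <- resid(ols_fit)

# Residuals from a deliberately poor line: flat at the mean of y
e_flat <- y - mean(y)

# Both sums of residuals are zero up to floating-point error
sum_ols <- sum(e_ols)
sum_flat <- sum(e_flat)

# Sums of absolute and squared residuals separate the two lines
sad_ols <- sum(abs(e_ols))
sad_flat <- sum(abs(e_flat))
ssr_ols <- sum(e_ols^2)
ssr_flat <- sum(e_flat^2)

print(c(sum_ols = sum_ols, sum_flat = sum_flat))
print(c(sad_ols = sad_ols, sad_flat = sad_flat))
print(c(ssr_ols = ssr_ols, ssr_flat = ssr_flat))
```

Both lines have residuals that sum to essentially zero, so the plain sum cannot tell them apart, while the absolute and squared criteria are much smaller for the least squares line.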
Exercise 2
When we move a point, the regression line adjusts in real time. Dragging a point away from the cluster tilts the line toward it, which shows how strongly outliers affect the best-fit line. Clusters of points stabilize the line, while isolated points can shift it drastically. The line continuously repositions itself to minimize the sum of squared vertical distances between the points and the line.
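The same behavior can be reproduced without the interactive plot. The following is a minimal sketch on made-up data (ten points near the line y = 2x, chosen only for illustration): it fits a line, moves one point far from the cluster, and refits.

```r
# Toy data, invented for illustration: ten points close to the line y = 2x
x <- 1:10
y <- 2 * x + c(0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1)

# Slope of the least squares line for the original points
slope_clean <- unname(coef(lm(y ~ x))["x"])

# "Drag" the last point far above the cluster
y_moved <- y
y_moved[10] <- 60

# Slope after refitting with the moved point
slope_moved <- unname(coef(lm(y_moved ~ x))["x"])

print(c(clean = slope_clean, moved = slope_moved))
```

Moving a single point at the edge of the x range roughly doubles the slope here, because squared distances give large deviations a disproportionate weight.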
Exercise 4
ggplot(stocks, aes(x = MSFT, y = AAPL)) +
  geom_point() +
  geom_abline(slope = 0.5, intercept = -5) +
  geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
  labs(
    x = "MSFT Open",
    y = "AAPL Open",
    title = "Open prices of MSFT and AAPL",
    subtitle = "January 2020 to January 2021"
  )
model_fit <- lm(MSFT ~ AAPL, data = stocks)

summary(model_fit)
Call:
lm(formula = MSFT ~ AAPL, data = stocks)

Residuals:
     Min       1Q   Median       3Q      Max
-21.1627  -5.2946  -0.2928   6.1697  22.9446

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 102.6814     2.2139   46.38   <2e-16 ***
AAPL          0.9423     0.0220   42.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.461 on 270 degrees of freedom
Multiple R-squared:  0.8717, Adjusted R-squared:  0.8713
F-statistic:  1835 on 1 and 270 DF,  p-value: < 2.2e-16
Exercise 5
summary_fit <- summary(model_fit)
tidy_summary <- as_tibble(summary_fit$coefficients, rownames = "term") %>%
  rename(
    estimate = Estimate,
    std.error = `Std. Error`,
    statistic = `t value`,
    p.value = `Pr(>|t|)`
  )
tidy_summary
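The column names chosen in the `rename()` call match what `tidy()` from the broom package returns for an `lm` fit, so the same table can likely be produced in one step. The sketch below assumes broom is installed (it usually comes with the tidyverse but is not attached by `library(tidyverse)`), and it uses the built-in `mtcars` data only so that it runs without downloading stock prices.

```r
library(broom)  # assumption: the broom package is installed

# Built-in data set, used only to keep the example self-contained
fit <- lm(mpg ~ wt, data = mtcars)

# One row per coefficient: term, estimate, std.error, statistic, p.value
tidy_fit <- tidy(fit)
tidy_fit
```

Applied to the stock model, the equivalent call would be `tidy(model_fit)`.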
Exercise 6
Based on the provided summary, we have:
- \(\hat{\beta}_0\) (Intercept) = 103
- \(\hat{\beta}_1\) (slope for AAPL) = 0.942
Plugging these values into the equation:
\[ \hat{y} = 103 + 0.942 \times x \]
Here, \(\hat{y}\) represents the predicted Microsoft opening price given Apple’s opening price \(x\).
Exercise 7
\(\hat{\beta}_0 = 103\) (Intercept): If Apple’s opening price were $0, the model predicts a Microsoft opening price of $103. Apple’s price is never close to $0 in the data, so this is an extrapolation with no practical meaning; the intercept mainly fixes the height of the line.
\(\hat{\beta}_1 = 0.942\) (Slope): For every $1 increase in Apple’s opening price, Microsoft’s opening price is predicted to be about $0.94 higher, on average. The positive slope reflects a positive association between the two prices; it does not show that one price drives the other.
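The slope interpretation can be verified with a quick calculation. This is a minimal sketch using the rounded coefficients 103 and 0.942; `predict_msft()` is a hypothetical helper written for this illustration, and the AAPL prices of 100 and 101 are arbitrary inputs.

```r
# Rounded coefficients from the fitted model
intercept <- 103
slope <- 0.942

# Hypothetical helper: predicted MSFT open for a given AAPL open
predict_msft <- function(aapl_open) intercept + slope * aapl_open

# AAPL opens of 100 and 101 are arbitrary illustrative inputs
pred_100 <- predict_msft(100)
pred_101 <- predict_msft(101)

pred_100             # 197.2
pred_101 - pred_100  # the slope, 0.942 (up to floating-point error)
```

Raising the AAPL input by $1 changes the prediction by exactly the slope, whichever starting price is used.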
Bonus exercise
If we have the model:
\[ \hat{y}_{MSFT} = 103 + 0.942 \times x_{AAPL} \]
and Microsoft’s opening price (\(\hat{y}_{MSFT}\)) is 166, we can solve for Apple’s opening price (\(x_{AAPL}\)):
\[ 166 = 103 + 0.942 \times x_{AAPL} \]
Subtract 103 from both sides:
\[ 166 - 103 = 0.942 \times x_{AAPL} \]
\[ 63 = 0.942 \times x_{AAPL} \]
Divide both sides by 0.942:
\[ x_{AAPL} = \frac{63}{0.942} \approx 66.9 \]
intercept <- 103
slope <- 0.942
msft_price <- 166

aapl_predicted <- (msft_price - intercept) / slope
aapl_predicted
[1] 66.87898