AE 06

Author: Moisieiev Vasyl
Affiliation: Kyiv School of Economics

Import

library(tidyverse)
library(tidyquant)
stocks <- tq_get(c("AAPL", "MSFT"), from = "2020-01-01", to = "2021-01-31") %>%
  select(date, symbol, open) %>%
  pivot_wider(names_from = symbol, values_from = open)

stocks

Exercise 1

Minimizing the sum of residuals, \(\sum_i e_i\), is a poor fitting criterion because residuals can be both positive and negative. The sum \(\sum_i (y_i - \beta_0 - \beta_1 x_i)\) is linear in the parameters, so it has no minimum: increasing the intercept drives it toward \(-\infty\). Asking for a sum of exactly zero does not help either, because every line that passes through the point of means \((\bar{x}, \bar{y})\) has residuals summing to zero. The OLS fit with an intercept does have residuals that sum to zero, but that is a consequence of minimizing squared residuals, not a criterion that singles out one line.

Positive and negative residuals also cancel each other out. A large positive error on one observation can offset a large negative error on another, so a near-zero sum says nothing about how well the line fits. The criterion measures the net balance of the errors rather than their magnitude.
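A minimal R sketch on simulated data makes both problems concrete. The vectors `x`, `y` and the helper `sum_resid()` are hypothetical and not part of the exercise. Lines with very different slopes all give a residual sum of zero as long as they pass through \((\bar{x}, \bar{y})\), and a huge intercept pushes the sum far below zero.

```r
# Simulated data (hypothetical): a noisy line with intercept 2 and slope 3
set.seed(1)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)

# Sum of residuals for a candidate line with intercept b0 and slope b1
sum_resid <- function(b0, b1) sum(y - (b0 + b1 * x))

# Lines through the point of means (mean(x), mean(y)) with very different slopes
b1_values <- c(-5, 0, 3, 10)
b0_values <- mean(y) - b1_values * mean(x)
zero_sums <- mapply(sum_resid, b0_values, b1_values)
round(zero_sums, 8)  # all zero up to rounding

# Raising the intercept makes the sum as negative as we like: there is no minimum
sum_resid(1e6, 3)
```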

An alternative is to minimize the sum of the absolute residuals, \(\sum_i |e_i|\), known as least absolute deviations (LAD) regression. Every residual contributes a non-negative amount, so errors cannot cancel and the method targets the size of individual errors. Ordinary least squares (OLS) avoids cancellation in a similar way by minimizing the sum of squared residuals, \(\sum_i e_i^2\).
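The sketch below compares OLS and LAD on simulated data with one injected outlier; all data and names are hypothetical. It finds the LAD line by trying the line through every pair of points, which is exact for a straight line because some LAD solution always passes through two observations. In practice a package routine such as `quantreg::rq()` would usually be used instead.

```r
# Hypothetical data: a linear trend (intercept 1, slope 2) with one large outlier
set.seed(42)
x <- 1:30
y <- 1 + 2 * x + rnorm(30)
y[30] <- y[30] + 100

# OLS: minimises the sum of squared residuals
ols_coef <- setNames(coef(lm(y ~ x)), c("intercept", "slope"))

# LAD objective: sum of absolute residuals for b = c(intercept, slope)
lad_loss <- function(b) sum(abs(y - b[1] - b[2] * x))

# For a straight line, some LAD solution passes through two data points,
# so trying the line through every pair of points finds the exact minimum
idx_pairs <- combn(length(x), 2)
candidates <- apply(idx_pairs, 2, function(p) {
  slope <- (y[p[2]] - y[p[1]]) / (x[p[2]] - x[p[1]])
  c(intercept = y[p[1]] - slope * x[p[1]], slope = slope)
})
lad_coef <- candidates[, which.min(apply(candidates, 2, lad_loss))]

rbind(OLS = ols_coef, LAD = lad_coef)
```

The single outlier should pull the OLS slope well above the true value of 2, while the LAD slope stays close to it.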

Exercise 2

When a point is moved, the regression line updates immediately. Dragging a point away from the cluster tilts the line toward it, which shows how strongly outliers affect the best-fit line; the pull is largest for points far from the centre of the x values. Points inside a dense cluster barely move the line, while an isolated point can shift it drastically. At every position, the line is the one that minimizes the sum of squared vertical distances (residuals) between the points and the line.
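The effect of dragging a single point can be checked directly in R. In this sketch the points and the helper `slope_of()` are hypothetical: the points start exactly on a line, and the same upward move is applied once at the edge of the x range and once near its centre.

```r
# Hypothetical points lying exactly on the line y = 2 + 0.5 * x
x <- 1:10
y <- 2 + 0.5 * x

# Helper: OLS slope of y on x
slope_of <- function(x, y) unname(coef(lm(y ~ x))[2])

slope_of(x, y)         # 0.5

# Drag the point at the edge of the x range (x = 10) up by 10 units
y_edge <- y
y_edge[10] <- y_edge[10] + 10
slope_of(x, y_edge)    # about 1.045

# Drag a point near the centre (x = 6) up by the same 10 units
y_centre <- y
y_centre[6] <- y_centre[6] + 10
slope_of(x, y_centre)  # about 0.561
```

In general, moving point \(i\) up by \(\delta\) changes the OLS slope by \(\delta (x_i - \bar{x}) / \sum_j (x_j - \bar{x})^2\), so here the same move matters nine times more at \(x = 10\) than at \(x = 6\).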

Exercise 4

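ggplot(stocks, aes(x = MSFT, y = AAPL)) +
  geom_point() +
  # Manually chosen reference line, for comparison with the fitted line
  geom_abline(slope = 0.5, intercept = -5) +
  # OLS line of AAPL (y) on MSFT (x); the reverse of model_fit below
  geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
  labs(
    x = "MSFT Open",
    y = "AAPL Open",
    title = "Open prices of MSFT and AAPL",
    subtitle = "January 2020 to January 2021"
  )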

model_fit <- lm(MSFT ~ AAPL, data = stocks)

summary(model_fit)

Call:
lm(formula = MSFT ~ AAPL, data = stocks)

Residuals:
     Min       1Q   Median       3Q      Max 
-21.1627  -5.2946  -0.2928   6.1697  22.9446 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 102.6814     2.2139   46.38   <2e-16 ***
AAPL          0.9423     0.0220   42.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.461 on 270 degrees of freedom
Multiple R-squared:  0.8717,    Adjusted R-squared:  0.8713 
F-statistic:  1835 on 1 and 270 DF,  p-value: < 2.2e-16

Exercise 5

summary_fit <- summary(model_fit)
# Turn the coefficient matrix into a tibble, keeping its row names as a `term` column
tidy_summary <- as_tibble(summary_fit$coefficients, rownames = "term") %>%
  rename(
    estimate = Estimate,
    std.error = `Std. Error`,
    statistic = `t value`,
    p.value = `Pr(>|t|)`
  )

tidy_summary
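The same extraction can be written in base R. The sketch below uses the built-in `cars` data so that it runs without downloading prices (`fit`, `coefs` and `tidy_cars` are illustrative names). It also checks that the `statistic` column is the estimate divided by its standard error, which agrees with the table above: 0.9423 / 0.0220 ≈ 42.8. The `broom` package's `tidy()` function returns a tibble with the same five columns directly.

```r
# Illustration on R's built-in cars data (not the stock data above)
fit <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients

# Rename the coefficient-matrix columns to tidy names
tidy_cars <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value   = coefs[, "Pr(>|t|)"],
  row.names = NULL
)
tidy_cars

# The t statistic is the estimate divided by its standard error
all.equal(tidy_cars$statistic, tidy_cars$estimate / tidy_cars$std.error)
```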

Exercise 6

Based on the coefficient estimates above, rounded to three significant figures, we have:

  • \(\hat{\beta}_0\) (intercept) = 102.68 ≈ 103
  • \(\hat{\beta}_1\) (slope on AAPL) = 0.9423 ≈ 0.942

Plugging these values into the fitted line \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\) gives:

\[ \hat{y} = 103 + 0.942 \times x \]

Here, \(\hat{y}\) represents the predicted Microsoft opening price given Apple’s opening price \(x\).
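As a quick check of the equation, the sketch below evaluates it at a few Apple prices using the rounded coefficients; the helper `predict_msft()` is hypothetical.

```r
# Rounded coefficients from the fitted equation above
b0 <- 103
b1 <- 0.942

# Hypothetical helper: predicted MSFT opening price for a given AAPL opening price
predict_msft <- function(aapl_open) b0 + b1 * aapl_open

predict_msft(c(80, 100, 120))          # 178.36 197.20 216.04

# Raising AAPL by $1 raises the prediction by exactly the slope
predict_msft(101) - predict_msft(100)  # 0.942
```

With the full-precision fit, `predict(model_fit, newdata = data.frame(AAPL = 100))` gives the same kind of prediction without rounding.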

Exercise 7

  • \(\hat{\beta}_0 = 103\) (intercept): If Apple’s opening price were $0, Microsoft’s predicted opening price would be $103. An Apple price of zero lies far below any price in the sample, so this value is an extrapolation with no practical interpretation; it only sets the level of the fitted line.

  • \(\hat{\beta}_1 = 0.942\) (slope): For every $1 increase in Apple’s opening price, Microsoft’s predicted opening price rises by about $0.94. The positive slope reflects that the two prices tend to move together; it describes an association, not a causal effect of Apple’s price on Microsoft’s.

Bonus exercise

If we have the model:

\[ \hat{y}_{MSFT} = 103 + 0.942 \times x_{AAPL} \]

and Microsoft’s predicted opening price \(\hat{y}_{MSFT}\) is 166, we can solve for Apple’s opening price \(x_{AAPL}\):

\[ 166 = 103 + 0.942 \times x_{AAPL} \]

Subtract 103 from both sides:

\[ 166 - 103 = 0.942 \times x_{AAPL} \]

\[ 63 = 0.942 \times x_{AAPL} \]

Divide both sides by 0.942:

\[ x_{AAPL} = \frac{63}{0.942} \approx 66.9 \]

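# Rounded coefficients and the target MSFT price
intercept <- 103
slope <- 0.942
msft_price <- 166

# Invert the fitted line: x = (y - intercept) / slope
aapl_predicted <- (msft_price - intercept) / slope
aapl_predicted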
[1] 66.87898
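Inverting the fitted line answers the question as posed, but it is not the same as regressing Apple’s price on Microsoft’s. In simple regression the two slopes multiply to \(R^2\), so with the summary above the reverse slope would be about \(0.8717 / 0.9423 \approx 0.925\) rather than \(1 / 0.9423 \approx 1.061\). The sketch below checks that identity on simulated, hypothetical data.

```r
# Hypothetical correlated data (simulated, not the stock prices)
set.seed(7)
x <- rnorm(200, mean = 100, sd = 20)
y <- 100 + 0.9 * x + rnorm(200, sd = 8)

slope_yx <- unname(coef(lm(y ~ x))[2])  # regression of y on x
slope_xy <- unname(coef(lm(x ~ y))[2])  # regression of x on y
r2 <- summary(lm(y ~ x))$r.squared

# The two slopes multiply to R-squared, so the reverse slope equals 1 / slope_yx
# only when R-squared = 1
c(product = slope_yx * slope_xy, r_squared = r2,
  inverted_line = 1 / slope_yx, reverse_regression = slope_xy)
```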