2026-03-09

library(ggplot2)
library(plotly)
set.seed(123)

Introduction

  • Simple linear regression models the relationship between two variables
  • Used for prediction and explanation
  • Example: Predict salary from years of experience

The Linear Model

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\epsilon_i\) = random error

Least Squares Estimation

\[ \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})} {\sum (X_i - \bar{X})^2} \]

\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]
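
The two estimators above can be checked numerically. The sketch below uses a small hypothetical data set (`x_demo` and `y_demo` are made-up values, not taken from the slides), computes the slope and intercept directly from the formulas, and compares them with `lm()`.

```r
# Hypothetical toy data, for illustration only
x_demo <- c(1, 2, 3, 4, 5)
y_demo <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1_hat <- sum((x_demo - mean(x_demo)) * (y_demo - mean(y_demo))) /
  sum((x_demo - mean(x_demo))^2)

# Intercept: makes the fitted line pass through (mean(x), mean(y))
b0_hat <- mean(y_demo) - b1_hat * mean(x_demo)

# lm() should give the same estimates up to floating-point error
demo_fit <- lm(y_demo ~ x_demo)
all.equal(unname(coef(demo_fit)), c(b0_hat, b1_hat))
```

The toy values are nearly linear, so the hand-computed slope comes out close to 2.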

Simulated Example Data

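# Simulate 100 observations with true intercept 5, true slope 2, and noise sd 3
x <- rnorm(100, mean = 10, sd = 2)
y <- 5 + 2*x + rnorm(100, mean = 0, sd = 3)
df <- data.frame(x, y)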

Scatterplot (ggplot #1)

ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  theme_minimal()

Regression Line (ggplot #2)

ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE, color="red") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

3D Interactive Plot (plotly)

# Fit the model and store the fitted values, used as the z-axis of the plot
model <- lm(y ~ x, data=df)
df$pred <- predict(model)

plot_ly(df,
        x = ~x,
        y = ~y,
        z = ~pred,
        type = "scatter3d",
        mode = "markers")

Model Fitting Code

# Regress y on x, then print coefficient estimates, standard errors, and R^2
model <- lm(y ~ x, data=df)
summary(model)
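
Individual quantities can also be pulled out of a fitted model rather than read off the printed summary. The following is a minimal sketch. It rebuilds the simulated data under new names (`sim_df` and `sim_fit` are illustrative names, not from the slides) so it runs on its own, and the new observation `x = 12` is an arbitrary example value.

```r
# Rebuild the simulated data so this snippet is self-contained
set.seed(123)
sim_x <- rnorm(100, mean = 10, sd = 2)
sim_y <- 5 + 2 * sim_x + rnorm(100, mean = 0, sd = 3)
sim_df <- data.frame(x = sim_x, y = sim_y)

sim_fit <- lm(y ~ x, data = sim_df)

coef(sim_fit)                    # estimated intercept and slope
confint(sim_fit, level = 0.95)   # 95% confidence intervals for both
summary(sim_fit)$r.squared       # R^2 as a single number

# Predict y for a hypothetical new observation, with a prediction interval
new_obs <- data.frame(x = 12)
predict(sim_fit, newdata = new_obs, interval = "prediction")
```

Because the data were simulated with intercept 5 and slope 2, the estimates should land near those values, though not exactly on them.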

Interpretation

  • The slope \(\hat{\beta}_1\) is the estimated change in the mean of Y for a one-unit increase in X
  • \(R^2\) is the proportion of the variance in Y that the model explains
  • Linear regression is used in economics, biology, data science, and engineering
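
Both interpretations can be verified by hand. The sketch below uses hypothetical toy data (`toy` and `toy_fit` are illustrative names) to show that the slope equals the difference between predictions one unit apart, and that \(R^2\) equals one minus the ratio of residual to total sum of squares.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))
toy_fit <- lm(y ~ x, data = toy)

# Slope as the change in predicted y when x goes from 3 to 4
preds <- predict(toy_fit, newdata = data.frame(x = c(3, 4)))
slope_from_preds <- unname(diff(preds))

# R^2 = 1 - SS_residual / SS_total
ss_res <- sum(residuals(toy_fit)^2)
ss_tot <- sum((toy$y - mean(toy$y))^2)
r2_manual <- 1 - ss_res / ss_tot

# Both checks should return TRUE
all.equal(slope_from_preds, unname(coef(toy_fit)["x"]))
all.equal(r2_manual, summary(toy_fit)$r.squared)
```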