What is Simple Linear Regression?

Simple Linear Regression is a method for modeling the linear relationship between a dependent variable and a single independent variable.

The Regression Formula

The equation is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

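Where:

  • \(y\) is the observed outcome (the dependent variable)
  • \(x\) is the input or predictor (the independent variable)
  • \(\beta_0\) is the intercept, the expected value of \(y\) when \(x = 0\)
  • \(\beta_1\) is the slope, the expected change in \(y\) for a one-unit increase in \(x\)
  • \(\epsilon\) is the error term, the part of \(y\) the line does not explain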
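To make the roles of these terms concrete, here is a minimal sketch that simulates data from the equation. The parameter values (`beta0_true`, `beta1_true`) and the noise standard deviation are arbitrary assumptions chosen for illustration, not estimates from real data.

```r
# Simulate y = beta_0 + beta_1 * x + epsilon with assumed (made-up) parameters
set.seed(42)
beta0_true <- 5   # assumed intercept
beta1_true <- 2   # assumed slope

x_sim   <- seq(1, 10, by = 1)                       # predictor values
eps_sim <- rnorm(length(x_sim), mean = 0, sd = 1)   # random error term
line_part <- beta0_true + beta1_true * x_sim        # systematic (straight-line) part
y_sim   <- line_part + eps_sim                      # observed response

# Subtracting the line from the observations leaves exactly the error term
all.equal(y_sim - line_part, eps_sim)
```

In real data the true parameters and errors are unknown, which is why they have to be estimated from the observations.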

Real-Life Example

Suppose we want to predict a person’s weight based on their height.

We’ll use a small dataset of made-up values to illustrate.

ggplot: Visualizing the Data

library(ggplot2)

# Made-up height (cm) and weight (kg) values for seven people
data <- data.frame(
  height = c(150, 155, 160, 165, 170, 175, 180),
  weight = c(50, 53, 58, 61, 65, 69, 72)
)

# Scatter plot: one point per person
ggplot(data, aes(x = height, y = weight)) +
  geom_point(color = "blue", size = 3) +
  labs(title = "Height vs Weight", x = "Height (cm)", y = "Weight (kg)") +
  theme_minimal()

ggplot: Adding Regression Line

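# Fit weight as a linear function of height (reused in the summary later)
model <- lm(weight ~ height, data = data)

# geom_smooth(method = "lm") fits and draws the same least-squares line;
# se = FALSE hides the confidence band
ggplot(data, aes(x = height, y = weight)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Regression Line", x = "Height (cm)", y = "Weight (kg)") +
  theme_minimal()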
## `geom_smooth()` using formula = 'y ~ x'

Math Behind the Model

The least-squares estimate of the slope:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

The least-squares estimate of the intercept:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
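The two formulas can be checked by hand. The sketch below applies them to the same made-up height and weight values used in this example and compares the result with `lm()`. The variable names (`x_bar`, `y_bar`, `slope_hat`, `intercept_hat`) are introduced here for illustration only.

```r
# Same made-up height/weight values used in this example
height <- c(150, 155, 160, 165, 170, 175, 180)
weight <- c(50, 53, 58, 61, 65, 69, 72)

x_bar <- mean(height)
y_bar <- mean(weight)

# Slope: sum of cross-deviations divided by sum of squared deviations in x
slope_hat <- sum((height - x_bar) * (weight - y_bar)) / sum((height - x_bar)^2)

# Intercept: chosen so the line passes through the point (x_bar, y_bar)
intercept_hat <- y_bar - slope_hat * x_bar

c(intercept = intercept_hat, slope = slope_hat)

# Compare with R's built-in least-squares fit
coef(lm(weight ~ height))
```

Both approaches should give a slope of 0.75 and an intercept of about -62.607 for this dataset.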

R Code for Linear Model

summary(model)
## 
## Call:
## lm(formula = weight ~ height, data = data)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  0.1071 -0.6429  0.6071 -0.1429  0.1071  0.3571 -0.3929 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -62.60714    2.94000  -21.30 4.23e-06 ***
## height        0.75000    0.01779   42.17 1.41e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4706 on 5 degrees of freedom
## Multiple R-squared:  0.9972, Adjusted R-squared:  0.9966 
## F-statistic:  1778 on 1 and 5 DF,  p-value: 1.415e-07

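The output reports the coefficient estimates with their standard errors and p-values, plus R-squared, which together indicate how well the model fits. Here the slope of 0.75 means each additional centimeter of height is associated with about 0.75 kg more weight, and an R-squared of 0.9972 means the line explains nearly all of the variation in this small, made-up dataset.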
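As a rough sketch of how these numbers can be used, the snippet below rebuilds the example model, recomputes R-squared from its definition, and predicts the weight for a new height. The 172 cm input and the names `rss`, `tss`, `r_squared`, `new_person`, and `predicted_weight` are hypothetical, chosen only for illustration.

```r
# Rebuild the example data and model so this snippet runs on its own
data <- data.frame(
  height = c(150, 155, 160, 165, 170, 175, 180),
  weight = c(50, 53, 58, 61, 65, 69, 72)
)
model <- lm(weight ~ height, data = data)

# Estimated intercept and slope
coef(model)

# R-squared by hand: 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(model)^2)
tss <- sum((data$weight - mean(data$weight))^2)
r_squared <- 1 - rss / tss

# Predicted weight for a hypothetical person who is 172 cm tall
new_person <- data.frame(height = 172)
predicted_weight <- predict(model, newdata = new_person)
predicted_weight
```

The prediction is simply the intercept plus the slope times 172, which comes to roughly 66.4 kg. Predictions are most trustworthy inside the observed height range (150 to 180 cm).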

Plotly: Simulated 3D Regression

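library(plotly)

# Simulate a response z that depends linearly on two predictors, x and y.
# With two predictors this goes one step beyond simple linear regression.
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
z <- 4 + 2*x + 3*y + rnorm(100)

# 3D scatter plot of the simulated points, colored by the value of z
plot_ly(x = ~x, y = ~y, z = ~z, type = "scatter3d", mode = "markers",
        marker = list(size = 3, color = z, colorscale = 'Viridis')) %>%
  layout(title = "3D Linear Regression (Simulated)")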
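The plot shows only the simulated points. As a minimal sketch, the same `lm()` function can fit a model to them. Because there are two predictors, this is multiple rather than simple linear regression; the object name `fit_3d` is an assumption introduced for this example.

```r
# Recreate the simulated data (same seed and formula as the plot)
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
z <- 4 + 2*x + 3*y + rnorm(100)

# Fit z on both predictors; the formula z ~ x + y adds a second slope
fit_3d <- lm(z ~ x + y)

# Estimates should land near the true values used in the simulation (4, 2, 3)
coef(fit_3d)
```

The estimates will not match 4, 2, and 3 exactly because of the random noise added to `z`.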

Summary

  • Simple Linear Regression helps us predict an outcome from a single predictor variable.
  • It’s useful in many real-life situations like salary prediction, crop yield, house prices, and more.
  • Tools like R make building and visualizing models super easy!