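library(ggplot2)  # static plots
library(plotly)   # interactive 3D plot
library(dplyr)    # data manipulation

set.seed(1)       # make the simulated noise reproducible

# 50 equally spaced predictor values between 1 and 10
x <- seq(1, 10, length.out = 50)
# Response generated from y = 3 + 2x plus normal noise with sd = 2
y <- 3 + 2*x + rnorm(50, mean = 0, sd = 2)

data <- data.frame(x, y)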

What is Simple Linear Regression?

Simple linear regression models a straight-line relationship between one predictor variable and one response variable.

Examples include:

  • Study time vs exam score
  • Advertising vs sales
  • Temperature vs electricity usage

Regression lets us predict the response for new values of the predictor and measure how strongly the response changes with the predictor.
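A minimal sketch of prediction in R, assuming the simulated data from the setup chunk (regenerated here so the block runs on its own): `predict()` with `interval = "prediction"` returns a point prediction together with a 95% prediction interval. The new predictor values are arbitrary illustrative choices; \(x = 12\) lies outside the observed range of 1 to 10, so that prediction is an extrapolation and should be treated with caution.

```r
# Regenerate the simulated data used in this presentation
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 3 + 2 * x + rnorm(50, mean = 0, sd = 2)
data <- data.frame(x, y)

# Fit the simple linear regression y ~ x by least squares
fit <- lm(y ~ x, data = data)

# Predict the response at new (illustrative) predictor values
new_x <- data.frame(x = c(2.5, 7, 12))
pred <- predict(fit, newdata = new_x, interval = "prediction")
pred  # columns: fit (point prediction), lwr and upr (95% prediction interval)
```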

Linear Regression Model

The model is

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

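where

  • \(Y\) = response variable
  • \(X\) = predictor variable
  • \(\beta_0\) = intercept (expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (change in the expected value of \(Y\) for a one-unit increase in \(X\))
  • \(\epsilon\) = random error, usually assumed to have mean zero and constant variance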
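To make each term of the model concrete, here is a minimal sketch that draws data from it under an assumed normal error distribution. The helper `simulate_slr()` is hypothetical, written only for this illustration; with `set.seed(1)`, \(\beta_0 = 3\), \(\beta_1 = 2\) and an error standard deviation of 2, it reproduces the dataset used in this presentation.

```r
# Hypothetical helper: draw observations from Y = beta0 + beta1 * X + epsilon,
# assuming normal errors with mean 0 and standard deviation sigma
simulate_slr <- function(x, beta0, beta1, sigma) {
  epsilon <- rnorm(length(x), mean = 0, sd = sigma)  # random error term
  mean_y  <- beta0 + beta1 * x                        # systematic (straight-line) part
  data.frame(x = x, y = mean_y + epsilon)
}

set.seed(1)
sim <- simulate_slr(seq(1, 10, length.out = 50), beta0 = 3, beta1 = 2, sigma = 2)
head(sim)
```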

Least Squares Method

The regression line is estimated using the least squares method, which chooses the estimates \(b_0\) and \(b_1\) that minimize the sum of squared residuals

\[ \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

where

\[ \hat{y}_i = b_0 + b_1 x_i \]

is the fitted value for the \(i\)-th observation.
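Setting the partial derivatives of this sum with respect to \(b_0\) and \(b_1\) to zero gives the closed-form estimates \(b_1 = \sum_{i}(x_i - \bar{x})(y_i - \bar{y}) / \sum_{i}(x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The sketch below computes them by hand on the simulated data and checks that they match `lm()`, and that moving the slope away from \(b_1\) increases the sum of squared residuals.

```r
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 3 + 2 * x + rnorm(50, mean = 0, sd = 2)

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() solves the same least squares problem
fit <- lm(y ~ x)
c(b0 = b0, b1 = b1)
coef(fit)

# Sum of squared residuals at the least squares solution
sse <- sum((y - (b0 + b1 * x))^2)

# Any other slope (here b1 + 0.1) gives a larger sum of squared residuals
sse_other <- sum((y - (b0 + (b1 + 0.1) * x))^2)
c(sse = sse, sse_other = sse_other)
```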

Example Dataset

head(data)
##          x        y
## 1 1.000000 3.747092
## 2 1.183673 5.734634
## 3 1.367347 4.063437
## 4 1.551020 9.292602
## 5 1.734694 7.128403
## 6 1.918367 5.195798

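The data were simulated from

\[ y = 3 + 2x + \epsilon, \quad \epsilon \sim N(0, 2^2) \]

so the fitted line should have an intercept close to 3 and a slope close to 2.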
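This can be checked directly. A minimal sketch that refits the simulated data with `lm()`: `coef()` returns \(b_0\) and \(b_1\), `confint()` returns 95% confidence intervals for \(\beta_0\) and \(\beta_1\), and `summary(fit)$sigma` is the residual standard error, an estimate of the error standard deviation (2 in the simulation).

```r
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 3 + 2 * x + rnorm(50, mean = 0, sd = 2)
data <- data.frame(x, y)

fit <- lm(y ~ x, data = data)
coef(fit)            # estimated intercept b0 and slope b1
confint(fit)         # 95% confidence intervals for beta0 and beta1
summary(fit)$sigma   # residual standard error, estimates sd(epsilon)
```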

Scatter Plot (ggplot)

ggplot(data, aes(x=x, y=y)) +
  geom_point(size=3) +
  theme_minimal() +
  labs(title="Scatter Plot of Data",
       x="Predictor (X)",
       y="Response (Y)")

Regression Line (ggplot)

ggplot(data, aes(x=x, y=y)) +
  geom_point(size=3) +
  # Least squares line; formula given explicitly so ggplot2 does not print its default-formula message
  geom_smooth(method="lm", formula=y ~ x, se=FALSE) +
  theme_minimal() +
  labs(title="Linear Regression Fit")
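`geom_smooth(method = "lm")` fits the regression internally and only draws it. As a sketch, the same least squares line can be drawn from an explicit `lm()` fit with `geom_abline()`, which also keeps the estimated intercept and slope available for reporting. The red colour and the title are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 3 + 2 * x + rnorm(50, mean = 0, sd = 2)
data <- data.frame(x, y)

# Fit once and extract the estimated intercept and slope
fit <- lm(y ~ x, data = data)
b <- coef(fit)

# Draw the fitted line directly from the estimated coefficients
p <- ggplot(data, aes(x = x, y = y)) +
  geom_point(size = 3) +
  geom_abline(intercept = b[[1]], slope = b[[2]], colour = "red") +
  theme_minimal() +
  labs(title = "Least squares line from coef(lm())")
p
```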

3D Visualization (Plotly)

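# A second simulated response that also depends linearly on x
z <- 3 + 2*x + rnorm(50)

# Interactive 3D scatter of x, y and z
plot_ly(
  data.frame(x, y, z),
  x = ~x,
  y = ~y,
  z = ~z,
  type = "scatter3d",
  mode = "markers"
)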

Why Regression Matters

Linear regression is widely used in:

  • economics
  • engineering
  • biology
  • finance
  • machine learning

It helps us understand relationships and make predictions.

Conclusion

Simple linear regression models the relationship between a single predictor and a single response with a straight line.

The model

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

is one of the most widely used tools in statistics.