2024-10-20

What is Simple Linear Regression?

Simple Linear Regression is a statistical method that models the relationship between:

- One independent variable (predictor), \(X\)
- One dependent variable (response), \(Y\)

The model takes the form: \[Y = \beta_0 + \beta_1X + \epsilon\]

Where:

- \(\beta_0\) is the y-intercept
- \(\beta_1\) is the slope
- \(\epsilon\) is the error term

Mathematical Foundation

The method of least squares finds the best-fitting line by minimizing: \[\sum_{i=1}^n (y_i - \hat{y}_i)^2\]

Where:

- \(y_i\) are the observed values
- \(\hat{y}_i\) are the predicted values
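To make the objective concrete, the sum of squared residuals can be written as an R function and minimized numerically. This is only a sketch under assumptions: the data are simulated with the same recipe used later in this deck, `sse` is a hypothetical helper name, and `optim()` with BFGS stands in for the closed-form solution purely for illustration.

```r
# Simulated data (same recipe as the example data in this deck)
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, 0, 2)

# Sum of squared residuals for a candidate intercept b[1] and slope b[2]
sse <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Minimize numerically, starting from intercept 0 and slope 0
opt <- optim(c(0, 0), sse, method = "BFGS")

# The numerical minimizer should be close to the lm() coefficients
rbind(optim = opt$par, lm = unname(coef(lm(y ~ x))))
```

In practice `lm()` solves the problem directly; the numerical search is shown only to emphasize that least squares is an optimization over the intercept and slope.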

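The least-squares estimators are:

\[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means of the predictor and the response.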
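As a quick check, the closed-form estimators can be computed by hand and compared with `lm()`. This is a minimal sketch on simulated data generated with the same recipe used later in this deck; the names `b0_hat`, `b1_hat`, and `manual` are illustrative.

```r
# Simulated data (same recipe as the example data in this deck)
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, 0, 2)

# Closed-form least-squares estimates of slope and intercept
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)
manual <- c(b0_hat, b1_hat)

# Compare with lm(); the two should agree up to floating-point error
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), manual)
```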

Creating Example Data

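# Simulate data from a known linear model: y = 2 + 3x + noise
set.seed(123)                      # make the random draws reproducible
n <- 100                           # number of observations
x <- runif(n, 0, 10)               # predictor, uniform on [0, 10]
y <- 2 + 3 * x + rnorm(n, 0, 2)    # response with N(0, sd = 2) errors
data <- data.frame(x = x, y = y)

# Fit the model
model <- lm(y ~ x, data = data)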

Basic Scatter Plot with ggplot2

# Scatter plot with the fitted line and its confidence band
library(ggplot2)

ggplot(data, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE) +
  theme_minimal() +
  labs(title = "Simple Linear Regression",
       x = "Independent Variable (X)",
       y = "Dependent Variable (Y)")

Residuals Plot with ggplot2
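The original plot is not reproduced here. A residuals-versus-fitted plot along the following lines is one plausible version, assuming the same simulated data and model as above; `diag_data` is an illustrative name.

```r
library(ggplot2)

# Recreate the example data and model so this block runs on its own
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, 0, 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

# Collect fitted values and residuals from the fitted model
diag_data <- data.frame(fitted = fitted(model), resid = residuals(model))

# Residuals vs. fitted values, with a dashed reference line at zero
ggplot(diag_data, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_minimal() +
  labs(title = "Residuals vs. Fitted Values",
       x = "Fitted Values",
       y = "Residuals")
```

If the linear model is adequate, the points should scatter evenly around the zero line with no visible curve or funnel shape.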

3D Visualization with Plotly

Model Summary and Interpretation

The fitted model equation is: \[\hat{y} = 1.98 + 2.98x\]

Key Statistics:

- R-squared: 0.951
- p-value: < 2.22e-16
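These quantities can be read off the fitted model object rather than copied from printed output. The sketch below assumes the same simulated data as above and that the reported p-value refers to the slope; `s`, `r2`, and `p_slope` are illustrative names.

```r
# Recreate the example data and model so this block runs on its own
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, 0, 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

s <- summary(model)

# Estimated intercept and slope
coef(model)

# Proportion of variance in y explained by x
r2 <- s$r.squared

# p-value for the test that the slope equals zero
p_slope <- s$coefficients["x", "Pr(>|t|)"]

r2
p_slope
```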

Code Example

Here’s how to fit a simple linear regression model in R:

# Fit linear regression model
model <- lm(y ~ x, data = data)

# View model summary
summary(model)

# Make predictions
new_data <- data.frame(x = c(5, 6, 7))
predictions <- predict(model, newdata = new_data)
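Point predictions say nothing about their uncertainty. As a hedged extension, `predict()` can also return prediction intervals through its `interval` argument; the sketch below assumes the same simulated data and model as above, and `pred_int` is an illustrative name.

```r
# Recreate the example data and model so this block runs on its own
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, 0, 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

# New predictor values to predict at
new_data <- data.frame(x = c(5, 6, 7))

# 95% prediction intervals: columns are fit, lwr, upr
pred_int <- predict(model, newdata = new_data,
                    interval = "prediction", level = 0.95)
pred_int
```

Using `interval = "confidence"` instead gives the narrower interval for the mean response at each `x`.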