2025-03-26

Introduction

  • Simple Linear Regression is a statistical method used to model the relationship between two continuous variables.
  • It assumes a linear relationship: \(Y = \beta_0 + \beta_1 X + \epsilon\), where:
    • \(Y\) is the dependent variable (response variable).
    • \(X\) is the independent variable (predictor variable).
    • \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the random error term, the part of \(Y\) that the line does not explain.
  • It is commonly used for predictive modeling, trend analysis, and understanding relationships between variables.
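
To make the roles of \(\beta_0\), \(\beta_1\), and \(\epsilon\) concrete, the following is a minimal sketch that simulates data from the model and then recovers the parameters by least squares. The sample size, parameter values, and error standard deviation are illustrative assumptions, not values from any real dataset.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon.
# All numeric values below are illustrative assumptions.
set.seed(42)

n     <- 200
beta0 <- 2.0   # intercept: expected Y when X = 0
beta1 <- 0.5   # slope: expected change in Y per unit increase in X
sigma <- 1.0   # standard deviation of the error term

x       <- runif(n, min = 0, max = 10)      # predictor
epsilon <- rnorm(n, mean = 0, sd = sigma)   # random error
y       <- beta0 + beta1 * x + epsilon      # response

# Least squares should give estimates close to beta0 and beta1.
fit <- lm(y ~ x)
print(coef(fit))
```

The estimates will not equal the true values exactly, because each simulated sample carries its own random error.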

Why Use Simple Linear Regression?

  • To identify and quantify the relationship between two variables.
  • To make predictions based on historical data.
  • To estimate how much the response changes, on average, when the predictor changes by one unit.
  • It is widely used in economics, engineering, finance, and healthcare for decision-making.

How is Simple Linear Regression Used?

  1. Collect Data: Obtain paired observations of the dependent and independent variables.
  2. Fit the Model: Estimate the parameters \(\beta_0\) and \(\beta_1\) using the least squares method.
  3. Evaluate the Model: Check goodness of fit with R-squared and inspect the residuals for patterns that violate the model assumptions.
  4. Make Predictions: Use the fitted regression equation to predict the response for new values of the predictor.
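
The four steps above can be sketched in a few lines of R. For a single predictor, the least squares estimates have the closed form \(\hat\beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\). The data here are simulated, and the variable names are chosen only for illustration.

```r
# Step 1: collect paired observations (simulated here; values are assumptions).
set.seed(1)
x <- seq(1, 50)
y <- 3 + 2 * x + rnorm(length(x), sd = 4)

# Step 2: fit by least squares using the closed-form estimates.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# Step 3: evaluate the fit with R-squared computed from the residuals.
fitted_y  <- b0 + b1 * x
res       <- y - fitted_y
r_squared <- 1 - sum(res^2) / sum((y - mean(y))^2)

# Cross-check the hand-computed estimates against R's built-in lm().
fit <- lm(y ~ x)
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(b0, b1))))

# Step 4: predict the response at a new predictor value.
x_new <- 60
y_new <- b0 + b1 * x_new

cat("intercept:", b0, "slope:", b1, "R-squared:", r_squared, "\n")
cat("prediction at x = 60:", y_new, "\n")
```

In practice `lm()` is the usual route; the manual calculation is shown only to make the least squares step explicit.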

Example: Predicting House Prices

  • Suppose we want to predict house prices based on square footage.
  • We use a dataset with houses of different sizes and their respective prices.
  • The goal is to determine if house size significantly impacts price and to develop a model for prediction.
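
A minimal sketch of this example follows. The slides do not show how the dataset was built, so the data frame below is simulated: the sample size, size range, coefficients, and noise level are all assumptions, and only the column names `size` and `price` are taken from the code later in this deck.

```r
# Hypothetical dataset: sizes, prices, and coefficients are simulated,
# not taken from real housing data.
set.seed(123)
n     <- 100
size  <- round(runif(n, min = 800, max = 3500))       # square feet
price <- 50000 + 150 * size + rnorm(n, sd = 30000)    # dollars
data  <- data.frame(size = size, price = price)

# Fit price as a linear function of size.
model <- lm(price ~ size, data = data)

# The coefficient table holds the estimate, standard error,
# t statistic, and p-value for the intercept and the slope.
coefs <- summary(model)$coefficients
print(coefs)

# A small p-value for the slope suggests that size is associated with price.
slope_p_value <- coefs["size", "Pr(>|t|)"]
cat("p-value for size:", slope_p_value, "\n")

# Predict the price of a 2,000 sq ft house with a 95% prediction interval.
new_house <- data.frame(size = 2000)
print(predict(model, newdata = new_house, interval = "prediction"))
```

The p-value addresses whether size matters at all, while the prediction interval indicates how precise an individual price prediction is likely to be.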

Scatter Plot with Regression Line
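A plot of this kind can be produced roughly as follows. This is a sketch that assumes the ggplot2 package and uses a simulated stand-in for the housing data; the original plotting code is not shown in the slides.

```r
library(ggplot2)

# Simulated stand-in for the housing data (values are assumptions).
set.seed(123)
data <- data.frame(size = round(runif(100, min = 800, max = 3500)))
data$price <- 50000 + 150 * data$size + rnorm(100, sd = 30000)

# Scatter plot of the observations with the fitted least squares line.
# Passing formula = y ~ x explicitly avoids the informational message
# that geom_smooth() prints when the formula is left to its default.
p <- ggplot(data, aes(x = size, y = price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Size (sq ft)", y = "Price ($)",
       title = "House price versus size")
print(p)
```

The shaded band drawn by `se = TRUE` is a confidence band for the fitted line, not a prediction band for individual houses.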

Residuals Plot
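A residuals-versus-fitted plot can be sketched with base R graphics as shown here. The data are again a simulated stand-in, and the original slide may have used a different plotting approach.

```r
# Simulated stand-in for the housing data (values are assumptions).
set.seed(123)
data <- data.frame(size = round(runif(100, min = 800, max = 3500)))
data$price <- 50000 + 150 * data$size + rnorm(100, sd = 30000)

model    <- lm(price ~ size, data = data)
res      <- resid(model)    # observed price minus fitted price
fit_vals <- fitted(model)   # prices predicted by the model

# Residuals should scatter evenly around zero with no curve or funnel shape.
plot(fit_vals, res,
     xlab = "Fitted price ($)", ylab = "Residual ($)",
     main = "Residuals versus fitted values")
abline(h = 0, lty = 2)      # reference line at zero
```

A curved pattern would point to a nonlinear relationship, and a funnel shape would point to non-constant error variance.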

3D Visualization

3D Visualization Code

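library(plotly)

# Assumes `data` is a data frame with numeric columns `size` (sq ft) and `price` ($).
# Derived variable: price per square foot.
data$price_per_sqft <- data$price / data$size

# Interactive 3D scatter plot of size, price, and price per square foot.
p <- plot_ly(data,
             x = ~size,
             y = ~price,
             z = ~price_per_sqft,
             type = 'scatter3d',
             mode = 'markers',
             marker = list(size = 5)) %>%
  layout(scene = list(
    xaxis = list(title = 'Size (sq ft)'),
    yaxis = list(title = 'Price ($)'),
    zaxis = list(title = 'Price/SqFt')
  ))

p  # display the plot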

Conclusion

  • Simple Linear Regression helps model relationships between two variables.
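  • The slope \(\beta_1\) estimates the average change in \(Y\) for a one-unit increase in \(X\); the intercept \(\beta_0\) is the expected value of \(Y\) when \(X = 0\).
  • Model assumptions (linearity, independent errors, constant error variance, and approximately normal residuals) should be checked before relying on predictions or inference.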
  • It is a powerful tool for prediction and decision-making in various fields.