03/13/2025

Slide 1: Introduction

Welcome to this presentation on Linear Regression.

In this talk, we will:

  • Introduce the fundamentals of linear regression.
  • Explain its mathematical foundation using key formulas.
  • Demonstrate practical examples using the built-in mtcars dataset.
  • Showcase visualizations with both ggplot2 and an interactive 3D plotly plot.
  • Discuss model diagnostics and conclude with insights for future analysis.

This project is part of the DAT 301 Homework 3 assignment. By the end of this presentation, you will have a clear understanding of how linear regression is implemented in R and how its results can be visualized and interpreted.

Slide 2: What is Linear Regression?

Linear regression is a statistical method used to model the relationship between a dependent variable (response) and one or more independent variables (predictors). In simple linear regression, the relationship is modeled using a straight line.

The model is represented by the equation: \[ y = \beta_0 + \beta_1 x + \epsilon \]

where:

  • \(y\) is the dependent variable,
  • \(x\) is the independent variable,
  • \(\beta_0\) is the intercept,
  • \(\beta_1\) is the slope, and
  • \(\epsilon\) is the error term that accounts for variability not explained by the model.

This technique is widely used across various fields to predict outcomes and analyze trends.
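
To make the model equation concrete, here is a minimal sketch that simulates data from \(y = \beta_0 + \beta_1 x + \epsilon\) and then recovers the coefficients with lm(). The true values (2 and 0.5), the sample size, the noise level, and the seed are all arbitrary choices made for illustration; they do not come from the assignment.

```r
# Simulate data from y = beta0 + beta1 * x + eps (all values are illustrative)
set.seed(301)                      # arbitrary seed, only for reproducibility
n     <- 200
beta0 <- 2                         # assumed "true" intercept
beta1 <- 0.5                       # assumed "true" slope

x   <- runif(n, min = 0, max = 10) # predictor
eps <- rnorm(n, mean = 0, sd = 1)  # error term
y   <- beta0 + beta1 * x + eps     # response generated by the model

# Fit a simple linear regression and inspect the estimated coefficients
sim_fit <- lm(y ~ x)
print(coef(sim_fit))
```

The estimates should land close to, but not exactly on, the assumed true values, because the error term adds noise to every observation.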

Slide 3: Mathematical Foundation

The goal of linear regression is to find the line that best fits the data. Under the ordinary least squares criterion, the best-fitting line is the one that minimizes the sum of squared residuals, that is, the squared differences between the observed and predicted values.

The objective is: \[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \]
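
As a sketch of the intermediate step, write \(S(\beta_0, \beta_1)\) for the sum of squares in the objective (the symbol \(S\) is introduced here only as shorthand). Setting both partial derivatives to zero gives the first-order conditions:

\[ \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right) = 0 \]

\[ \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left(y_i - \beta_0 - \beta_1 x_i\right) = 0 \]

Dividing the first condition by \(-2n\) gives \(\beta_0 = \bar{y} - \beta_1 \bar{x}\). Substituting this into the second condition and solving for \(\beta_1\) yields the closed-form solution.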

The formulas to calculate the coefficients are:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

These equations form the foundation of simple linear regression.
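
As a quick check, the closed-form formulas can be evaluated by hand and compared with R's lm(). The sketch below uses the wt and mpg columns of the built-in mtcars dataset used throughout this talk; the variable names (beta1_hat, beta0_hat, manual, fit) are illustrative.

```r
# Closed-form least-squares estimates for mpg ~ wt, computed from the formulas
x <- mtcars$wt    # predictor: weight (1000 lbs)
y <- mtcars$mpg   # response: miles per gallon

beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Reference fit from R's built-in least-squares routine
fit <- lm(mpg ~ wt, data = mtcars)

manual <- c(intercept = beta0_hat, slope = beta1_hat)
print(manual)
print(coef(fit))
```

Both approaches should agree up to floating-point error, with a slope of about -5.34 mpg per 1000 lbs and an intercept of about 37.29.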

Slide 4: Data Overview

We will use the built-in mtcars dataset to illustrate linear regression.
Below is a summary of the dataset:

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
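
Because the analysis in the following slides uses only mpg, wt, and hp, it can help to look at those columns on their own. The sketch below is one way to do that with base R; the names cars_subset and wt_mpg_cor are illustrative.

```r
# Size of the dataset: rows are cars, columns are variables
print(dim(mtcars))

# Keep only the variables used in the later slides
cars_subset <- mtcars[, c("mpg", "wt", "hp")]
print(head(cars_subset))

# Pearson correlation between weight and fuel economy
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(wt_mpg_cor)
```

The correlation is strongly negative (about -0.87), which suggests that a straight line is a reasonable first model for mpg as a function of weight.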

Slide 5: Regression Analysis and Diagnostics

Let’s visualize the relationship between car weight (wt) and miles per gallon (mpg), and then examine the model’s diagnostics.

Scatterplot with Regression Line (ggplot2):

library(ggplot2)
# Scatterplot of mpg against weight with the fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  labs(title = "MPG vs Weight with Regression Line",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon")

Residuals vs Fitted Plot (ggplot2):

# Fit the linear model
model <- lm(mpg ~ wt, data = mtcars)
# Create a data frame with fitted values and residuals
model_data <- data.frame(Fitted = fitted(model), Residuals = residuals(model))
ggplot(model_data, aes(x = Fitted, y = Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted (ggplot2)", x = "Fitted Values", y = "Residuals")
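
Alongside the plots, a few numeric diagnostics can be read from the fitted model. The sketch below refits the same mpg ~ wt model so that it runs on its own; model_summary and r_squared are illustrative names.

```r
# Refit the simple linear model so this block is self-contained
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)

# Coefficient table: estimates, standard errors, t values, p values
print(coef(model_summary))

# Proportion of the variance in mpg explained by weight
r_squared <- model_summary$r.squared
print(r_squared)

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero, which is why the dashed line at 0 is the reference
print(sum(residuals(model)))
```

For this model, R-squared is about 0.75, so weight alone accounts for roughly three quarters of the variation in mpg. A residual plot that curves or fans out would suggest the straight-line model is missing structure.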

Slide 6: Interactive 3D Plot (plotly)

For an interactive exploration, this 3D scatter plot shows how weight (wt), miles per gallon (mpg), and horsepower (hp) relate to one another. Drag the plot to rotate it and scroll to zoom.

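library(plotly)
# 3D scatter of weight, fuel economy, and horsepower with labeled axes
p <- plot_ly(data = mtcars, x = ~wt, y = ~mpg, z = ~hp,
             type = "scatter3d", mode = "markers") %>%
  layout(title = "3D Scatter Plot: Weight, MPG, and Horsepower",
         scene = list(xaxis = list(title = "Weight (1000 lbs)"),
                      yaxis = list(title = "Miles per Gallon"),
                      zaxis = list(title = "Horsepower")))
p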

Slide 7: Conclusion

In summary, we have reviewed the fundamentals of linear regression, including its mathematical basis and practical application.

Recall the regression model: \[ y = \beta_0 + \beta_1 x + \epsilon \]

Thank you for your attention!

Slide 8: Questions and Future Directions

Feel free to ask any questions you might have.

Future Directions:

  • Extending the analysis to multiple linear regression.
  • Investigating non-linear relationships.
  • Using other datasets to validate the model in different contexts.
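
As a starting point for the first two directions, the sketch below adds horsepower as a second predictor and, separately, a squared weight term. These specific model formulas are illustrative assumptions, not results from this presentation.

```r
# Baseline: simple linear regression from the earlier slides
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: add horsepower as a second predictor
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

# One possible non-linear extension: a quadratic term in weight
quad_fit <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# R-squared for each model (it never decreases when predictors are added)
r2 <- c(simple    = summary(simple_fit)$r.squared,
        multiple  = summary(multi_fit)$r.squared,
        quadratic = summary(quad_fit)$r.squared)
print(round(r2, 3))

# F-test comparing the nested simple and multiple models
print(anova(simple_fit, multi_fit))
```

Because R-squared cannot go down when a term is added, the F-test (or adjusted R-squared) is a fairer way to judge whether the extra predictor is worth including.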

Your feedback and questions are welcome!