2025-03-16

What is Simple Linear Regression?

Simple linear regression models the relationship between two variables by fitting a straight line to the data. The model can be written as:

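\[ y = \beta_0 + \beta_1 x + \epsilon \]

Here \(y\) is the response, \(x\) is the predictor, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is a random error term.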

Other Formulas Used

The least-squares estimates of the slope and intercept are computed as follows.

The formula to estimate the slope: \[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

The formula to estimate the intercept: \[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
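
To make the two estimation formulas concrete, here is a minimal sketch that evaluates them by hand on the built-in mtcars data (introduced in the next section) and compares the result with R's `lm()`. The variable names `beta0_hat`, `beta1_hat`, and `manual` are illustrative choices, not part of the original presentation.

```r
# Compute the least-squares estimates directly from the formulas.
x <- mtcars$wt   # predictor: weight
y <- mtcars$mpg  # response: miles per gallon

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

# Cross-check against R's built-in least-squares fit
fit <- lm(mpg ~ wt, data = mtcars)
manual <- c(beta0_hat, beta1_hat)
print(manual)
print(unname(coef(fit)))
```

Both printed vectors should agree with the intercept (37.2851) and slope (-5.3445) reported in the model summary later in this presentation.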

The mtcars Dataset

The mtcars dataset is built into R and contains data extracted from the 1974 Motor Trend US magazine.

Key variables for our analysis include:

  • mpg: Miles per gallon (response variable)
  • wt: Weight in 1000 lbs (predictor variable)

The first six rows of the dataset:
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
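
The code that produced this preview is not shown in the text. A likely equivalent, assuming the default six-row `head()` call, is sketched here; the name `preview` is only for illustration.

```r
# mtcars ships with base R (datasets package), so no download is needed.
data(mtcars)

# Show the first six rows of the data frame
preview <- head(mtcars)
print(preview)

# Dimensions: rows are car models, columns are measured variables
print(dim(mtcars))
```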

Exploratory Data Analysis with ggplot2

Below is a scatter plot of mpg versus weight (wt). We add a regression line using ggplot2 to visualize the trend.
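
The plotting code is not included in the text, so the following is a sketch of one way to draw such a plot. The title, axis labels, and the choice to hide the confidence band (`se = FALSE`) are assumptions; the original slide may have been styled differently.

```r
library(ggplot2)

# Scatter plot of mpg against weight with a fitted least-squares line
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" overlays the simple linear regression line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs. Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon")

scatter_plot
```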

Fitting the Simple Linear Regression Model

We now fit a simple linear regression model in which mpg is predicted by wt. The output below is the model summary printed by R.
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
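
The `Call:` line in the output shows the model was fitted with `lm(formula = mpg ~ wt, data = mtcars)`. A minimal sketch that reproduces the summary follows; the object name `model` and the prediction at `wt = 3` are illustrative additions.

```r
# Fit mpg as a linear function of wt by ordinary least squares
model <- lm(mpg ~ wt, data = mtcars)

# Residual summary, coefficient table, R-squared, and F-test
print(summary(model))

# Illustrative use: predicted mpg for a hypothetical car with wt = 3 (3,000 lbs)
predicted_mpg <- predict(model, newdata = data.frame(wt = 3))
print(predicted_mpg)
```

Reading the coefficient table, each additional 1,000 lbs of weight is associated with a decrease of about 5.34 mpg, and weight alone accounts for roughly 75% of the variation in mpg (R-squared of 0.7528).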

Residual Analysis with ggplot2

A residual plot helps us check the assumptions of our regression model. Below is a ggplot2 residual plot.
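
The residual plot's code is not shown in the text. One common form, plotting residuals against fitted values, can be sketched as follows; the data frame `resid_df`, its column names, and the labels are assumptions for illustration.

```r
library(ggplot2)

# Refit the model so this block runs on its own
model <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in a data frame for plotting
resid_df <- data.frame(
  fitted   = fitted(model),
  residual = resid(model)
)

# Residuals vs. fitted values, with a dashed reference line at zero
residual_plot <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs. Fitted Values",
       x = "Fitted mpg",
       y = "Residual")

residual_plot
```

Points scattered evenly around the zero line, with no curve or funnel shape, are consistent with the linearity and constant-variance assumptions.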

Interactive Visualization with Plotly

We create an interactive 2D plot with Plotly by converting our ggplot2 scatter plot. This plot lets you hover over points and explore details interactively.
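
The conversion code is not shown in the text. A sketch using plotly's `ggplotly()` function follows; the scatter plot is rebuilt here so the block runs on its own, and its exact styling is an assumption.

```r
library(ggplot2)
library(plotly)

# Rebuild a static scatter plot with a regression line
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Convert the ggplot object into an interactive plotly widget
interactive_plot <- ggplotly(p)

interactive_plot
```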

3D Visualization

Here’s an additional 3D scatter plot using three variables from mtcars: mpg, wt, and hp. This provides an extra perspective on the data.

Code Used for Plots (3D Visualization)

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# 3D scatter: weight on x, mpg on y, horsepower on z
plot_ly(data = mtcars, x = ~wt, y = ~mpg, z = ~hp,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5, color = "orange")) %>%
  layout(title = "3D Scatter Plot: mpg vs wt vs hp",
         scene = list(xaxis = list(title = "Weight (wt)"),
                      yaxis = list(title = "MPG"),
                      zaxis = list(title = "Horsepower (hp)")))

Conclusion

In this presentation, we covered the fundamentals of simple linear regression using the mtcars dataset. We reviewed:

  • The regression equation and estimation formulas
  • Exploratory analysis with ggplot2
  • Model fitting and residual diagnostics
  • Interactive visualizations using Plotly, including a bonus 3D plot

This approach not only demonstrates the statistical technique but also shows how to create engaging and interactive presentations in R.