Simple linear regression models the relationship between two variables by fitting a straight line to the data. The model can be written as:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the random error term.
2025-03-16
The least-squares estimates of the slope and intercept are given by the following formulas:
Slope estimate:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Intercept estimate:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
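The two estimation formulas can be evaluated directly in base R. The sketch below applies them to mpg and wt from mtcars; the variable names (`x`, `y`, `beta1_hat`, `beta0_hat`) are illustrative choices, not taken from the original slides.

```r
# Hand computation of the least-squares estimates for mpg regressed on wt.
# Variable names here are illustrative assumptions.
x <- mtcars$wt
y <- mtcars$mpg

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by the sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

round(c(intercept = beta0_hat, slope = beta1_hat), 4)
```

These values should agree with the coefficients that `lm()` reports for the same model.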
The first six rows of the mtcars dataset:
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Below is a scatter plot of mpg versus weight (wt). We add a regression line using ggplot2 to visualize the trend.
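A plot like the one described could be produced along these lines. This is a sketch, not necessarily the code behind the original slide: the object name `scatter_plot`, the title, the axis labels, and the line color are assumptions.

```r
library(ggplot2)

# Scatter plot of mpg against weight with a fitted least-squares line.
# The name scatter_plot and all labels are illustrative assumptions.
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" overlays the simple linear regression fit;
  # se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  labs(title = "MPG vs. Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon")

scatter_plot
```

Setting `se = TRUE` instead would add a shaded confidence band around the line.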
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
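The `Call:` line in this output shows the model was fitted with `lm(formula = mpg ~ wt, data = mtcars)`. The sketch below refits it and pulls out the individual pieces; the object name `fit` and the example weight of 3 (that is, 3,000 lbs) are assumptions made for illustration.

```r
# Fit the simple linear regression of mpg on wt (same call as in the output)
fit <- lm(mpg ~ wt, data = mtcars)

# Full summary table: coefficients, standard errors, R-squared, F-statistic
summary(fit)

# Estimated intercept and slope
coef(fit)

# Proportion of variance in mpg explained by wt
r_squared <- summary(fit)$r.squared
r_squared

# Predicted mpg for a hypothetical car with wt = 3 (illustrative value)
predicted_mpg <- predict(fit, newdata = data.frame(wt = 3))
predicted_mpg
```

Read this way, the slope says each additional 1,000 lbs of weight is associated with about 5.34 fewer miles per gallon, and the R-squared of 0.7528 means weight accounts for roughly 75% of the variation in mpg.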
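A residual plot helps us check the assumptions of our regression model: if the linear model is appropriate, the residuals should scatter randomly around zero with roughly constant spread across the fitted values. Below is a ggplot2 residual plot.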
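One way to build such a residual plot is sketched here. It is a minimal version under stated assumptions: the names `fit`, `diagnostics`, and `residual_plot`, along with the styling of the reference line, are not from the original slides.

```r
library(ggplot2)

# Refit the model so this block is self-contained
fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in a data frame for plotting
# (the name diagnostics is an illustrative assumption)
diagnostics <- data.frame(fitted = fitted(fit), residual = resid(fit))

# Residuals versus fitted values, with a dashed reference line at zero
residual_plot <- ggplot(diagnostics, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted Values",
       x = "Fitted mpg",
       y = "Residual")

residual_plot
```

A curved pattern in the points would suggest the relationship is not linear, and a funnel shape would suggest non-constant variance.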
Below is an interactive 2D plot, created by converting our ggplot2 scatter plot with Plotly. You can hover over points to explore the details of each observation.
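The conversion can be done with plotly's `ggplotly()` function, which takes a ggplot object and returns an interactive widget. The sketch below rebuilds a basic scatter plot first so that it runs on its own; the names `scatter_plot` and `interactive_plot` are assumptions.

```r
library(ggplot2)
library(plotly)

# Static ggplot2 scatter plot with a regression line
# (object names are illustrative assumptions)
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Convert the static plot into an interactive plotly widget
interactive_plot <- ggplotly(scatter_plot)

interactive_plot
```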
Here’s an additional 3D scatter plot using three variables from mtcars: mpg, wt, and hp. This provides an extra perspective on the data.
# Load plotly for interactive graphics (also provides the %>% pipe)
library(plotly)

# 3D scatter plot: weight on x, mpg on y, horsepower on z
plot_ly(data = mtcars, x = ~wt, y = ~mpg, z = ~hp,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5, color = "orange")) %>%
  layout(title = "3D Scatter Plot: mpg vs wt vs hp",
         scene = list(xaxis = list(title = "Weight (wt)"),
                      yaxis = list(title = "MPG"),
                      zaxis = list(title = "Horsepower (hp)")))
In this presentation, we covered the fundamentals of simple linear regression using the mtcars dataset. We reviewed:
- The regression equation and estimation formulas
- Exploratory analysis with ggplot2
- Model fitting and residual diagnostics
- Interactive visualizations using Plotly, including a bonus 3D plot

This approach not only demonstrates the statistical technique but also shows how to create engaging and interactive presentations in R.