Kamalesh Reddi Arugunta
2025-11-10
In this presentation, we explore simple linear regression, one of the most widely used tools in statistics and machine learning. It helps us understand the relationship between two variables and make predictions.
Simple linear regression models the relationship between a dependent variable \(y\) and an independent variable \(x\) using a straight line.
The model assumes: \[ y = \beta_0 + \beta_1 x + \epsilon \]
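Where:

- \(\beta_0\): intercept (the expected value of \(y\) when \(x = 0\))
- \(\beta_1\): slope (the change in the expected value of \(y\) for a one-unit increase in \(x\))
- \(\epsilon\): random error term, assumed to have mean zero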
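To make the model concrete, here is a minimal R sketch that simulates data from \(y = \beta_0 + \beta_1 x + \epsilon\) and checks that lm() recovers the parameters. The true values \(\beta_0 = 2\), \(\beta_1 = 3\), the sample size, and the normal errors with standard deviation 1 are all assumptions chosen purely for illustration.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon (illustrative values only)
set.seed(42)                         # make the simulation reproducible
n     <- 500
beta0 <- 2                           # assumed true intercept
beta1 <- 3                           # assumed true slope
x     <- runif(n, min = 0, max = 10)
eps   <- rnorm(n, mean = 0, sd = 1)  # random error term with mean zero
y     <- beta0 + beta1 * x + eps

# Fit the simple linear regression and compare the estimates with the true values
sim_fit <- lm(y ~ x)
est     <- coef(sim_fit)
print(round(est, 3))
```

The estimates will not match 2 and 3 exactly because \(\epsilon\) adds noise, but they get closer as n grows.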
We’ll use R’s built-in mtcars dataset, which contains performance data for 32 cars, including miles per gallon (mpg), horsepower (hp), and weight (wt, in 1000 lbs).
We’ll predict mpg (miles per gallon) using weight (wt).
The scatter plot of mpg against wt shows a negative relationship: heavier cars tend to have lower fuel efficiency.
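A minimal base-R sketch for inspecting the data and drawing this kind of scatter plot is shown here; the original figure may have been produced with a different plotting package. The correlation coefficient puts a number on the negative trend.

```r
# Inspect the three variables used in this presentation
data(mtcars)                        # built-in dataset: 32 cars, 11 variables
print(head(mtcars[, c("mpg", "hp", "wt")]))

# Scatter plot of fuel efficiency against weight (wt is in 1000 lbs)
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "MPG vs. weight")

# Pearson correlation: its sign gives the direction, its size the strength
r <- cor(mtcars$wt, mtcars$mpg)
print(round(r, 3))
```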
We can fit a linear model in R using the lm() function.
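The Call: line of the output confirms the model was fitted as lm(formula = mpg ~ wt, data = mtcars). A minimal sketch that reproduces this summary, assuming the fitted object is stored in a variable named model (a name chosen here for illustration):

```r
# Fit mpg as a linear function of weight; the formula reads "mpg modeled by wt"
model <- lm(mpg ~ wt, data = mtcars)

# summary() prints the residuals, the coefficient table, R-squared and the F-test
print(summary(model))
```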
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
The Estimate column gives the least squares estimates of \(\beta_0\) and \(\beta_1\); the table also reports their standard errors, t statistics, and p-values.
From the output, the estimated regression equation is: \[ \hat{y} = 37.29 - 5.34x \] where \(x\) is weight in 1000 lbs and \(\hat{y}\) is the predicted mpg.
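One way to pull these numbers out of the fitted model programmatically, rather than reading them off the printed table, is coef(). A short sketch (the variable names model and b are illustrative):

```r
model <- lm(mpg ~ wt, data = mtcars)

# coef() returns a named vector with entries "(Intercept)" and "wt"
b <- coef(model)

# Print the fitted equation with coefficients rounded to two decimals
cat(sprintf("y_hat = %.2f %+.2f x\n", b[["(Intercept)"]], b[["wt"]]))
```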
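Interpretation:

- The intercept (37.29) is the predicted mpg for a car with zero weight. This is a theoretical extrapolation, since no real car weighs 0 lbs, so it mainly serves to anchor the line.
- The slope (-5.34) means that for every 1000 lbs increase in weight, predicted mpg decreases by about 5.34.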
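Both interpretations can be checked with predict(). In this sketch the 3000 lbs car (wt = 3) is a hypothetical example, not a row from mtcars; the last line confirms that a 1000 lbs increase in weight changes the prediction by exactly the slope.

```r
model <- lm(mpg ~ wt, data = mtcars)
b <- coef(model)

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3)
pred <- predict(model, newdata = data.frame(wt = 3))
print(round(pred, 2))

# The same prediction by hand: intercept + slope * wt
manual <- b[["(Intercept)"]] + b[["wt"]] * 3

# Going from 3000 to 4000 lbs changes the predicted mpg by the slope
step <- unname(diff(predict(model, newdata = data.frame(wt = c(3, 4)))))
```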
This plot shows the fitted regression line together with a shaded confidence band, which reflects the uncertainty in the estimated mean mpg at each weight.
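A base-R sketch of a similar figure is shown here, using predict() with interval = "confidence" to get the band limits. The 95% level and the grid of 100 weights are assumptions for illustration; in ggplot2, geom_smooth(method = "lm") draws a comparable band.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Evenly spaced weights covering the observed range
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))

# Columns fit, lwr, upr: fitted mean mpg and its 95% confidence limits
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)

# Set up empty axes wide enough for both the data and the band
plot(mtcars$wt, mtcars$mpg, type = "n",
     ylim = range(mtcars$mpg, band),
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
# Shade the band first so the points and line are drawn on top of it
polygon(c(grid$wt, rev(grid$wt)), c(band[, "lwr"], rev(band[, "upr"])),
        col = "grey85", border = NA)
points(mtcars$wt, mtcars$mpg, pch = 19)
lines(grid$wt, band[, "fit"], lwd = 2)
```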
Next, let’s visualize mpg, weight, and horsepower together in 3D.
This interactive 3D plot shows how weight and horsepower together relate to fuel efficiency, going beyond the single predictor used in the simple model.
We estimate the parameters \(\beta_0\) and \(\beta_1\) using the least squares method, which chooses the values that minimize the sum of squared residuals: \[ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \hat{y}_i = \beta_0 + \beta_1 x_i \]
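To see that least squares really is "minimize this sum", the sketch here hands the sum of squared residuals to R's general-purpose optimizer optim() and compares the result with lm(). This is only a numerical check under the stated setup (BFGS method, starting point at a flat line through the mean of y); lm() itself solves the problem directly via a QR decomposition.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Sum of squared residuals for a candidate pair b = c(beta0, beta1)
sse <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Minimize numerically, starting from a flat line at the mean of y
opt <- optim(par = c(mean(y), 0), fn = sse, method = "BFGS",
             control = list(reltol = 1e-12))
print(round(opt$par, 4))

# Coefficients from lm() for comparison
ls_coef <- unname(coef(lm(mpg ~ wt, data = mtcars)))
```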
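Setting the partial derivatives of this sum with respect to \(\beta_0\) and \(\beta_1\) to zero gives the estimates: \[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \] where \(\bar{x}\) and \(\bar{y}\) are the sample means.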
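These closed-form formulas translate directly into R. A minimal sketch that applies them to the mtcars data and checks the result against lm():

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the fitted line passes through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Should match the Estimate column of the summary() output
print(round(c(intercept = b0, slope = b1), 4))

fit_coef <- coef(lm(mpg ~ wt, data = mtcars))
```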
Let’s check how well our model fits.
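The value printed below is the R-squared stored in the model summary. A minimal sketch of how to extract it (the variable names are illustrative):

```r
model <- lm(mpg ~ wt, data = mtcars)

# Extract R-squared from the model summary
r2 <- summary(model)$r.squared
print(r2)
```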
## [1] 0.7528328
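\(R^2\) is the proportion of the variation in \(y\) explained by the model. Here \(R^2 \approx 0.753\), so weight accounts for about 75% of the variation in mpg; values close to 1 indicate a strong linear relationship.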
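The "proportion of variation explained" reading follows from the definition \(R^2 = 1 - \text{SSE}/\text{SST}\). This sketch computes it by hand and also checks a property specific to simple linear regression: \(R^2\) equals the squared correlation between \(x\) and \(y\).

```r
model <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg

sse <- sum(residuals(model)^2)   # variation left unexplained by the line
sst <- sum((y - mean(y))^2)      # total variation of mpg around its mean
r2_manual <- 1 - sse / sst

# With a single predictor, R^2 is the squared correlation of x and y
r2_cor <- cor(mtcars$wt, y)^2
print(round(c(manual = r2_manual, squared_cor = r2_cor), 4))
```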
For more details, see R’s built-in help pages ?lm and ?mtcars.