Modeling relationships through linear regression

2025-11-16

Linear Regression (LaTeX Math)

The underlying population relationship is represented by the equation: \[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

Where:

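\(Y_i\): The value of the measured (response) variable for the \(i^{th}\) observation.
\(X_i\): The value of the predictor variable for the \(i^{th}\) observation.
\(\beta_0\): The Y-intercept.
\(\beta_1\): The slope.
\(\epsilon_i\): The random error for the \(i^{th}\) observation.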
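
To make the roles of these terms concrete, here is a minimal sketch that simulates data from the population equation and then fits a line to it. The intercept, slope, sample size, and error spread are all hypothetical values chosen for illustration; they do not come from any real dataset.

```r
# Simulate data from Y_i = beta_0 + beta_1 * X_i + epsilon_i.
# Every numeric value below is a made-up, illustrative assumption.
set.seed(42)
n     <- 200
beta0 <- 2      # assumed true intercept
beta1 <- 0.5    # assumed true slope

x       <- runif(n, min = 0, max = 10)   # predictor values
epsilon <- rnorm(n, mean = 0, sd = 1)    # random error for each observation
y       <- beta0 + beta1 * x + epsilon   # response built from the model

# Fit a line to the simulated data; the estimates should land near beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

Because the errors are random, the estimated coefficients will be close to, but not exactly equal to, the assumed values.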

Error (LaTeX Math 2)

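The sum of squared errors (SSE) adds up the squared differences between each observed value \(y_i\) and its fitted value \(\hat{y}_i\): \[SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2\]

The least-squares line is the one that minimizes the SSE, which gives this formula for the estimated slope (\(\hat{\beta}_1\)): \[\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}\] This is equivalent to the ratio of the sample covariance of \(x\) and \(y\) to the sample variance of \(x\).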
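
The covariance-to-variance equivalence can be written out by dividing the numerator and denominator of the slope formula by \(n - 1\). The symbols \(s_{xy}\) (sample covariance) and \(s_x^2\) (sample variance) are notation introduced here for convenience. The matching intercept estimate, which is the standard least-squares result, follows from the fitted line passing through the point \((\bar{x}, \bar{y})\):

\[\hat{\beta}_1 = \frac{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]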

Fitting the Model (R Code)

With the key pieces of a linear regression defined, it’s possible to model cars’ mileage (mpg) as a function of their weight (wt) using R’s built-in mtcars dataset.

# Create a linear model and return the Y-intercept and the slope

lr <- lm(mpg ~ wt, data = mtcars)
coefficients(lr)

## (Intercept)          wt 
##   37.285126   -5.344472

According to the model fitted to the mtcars dataset, each additional 1,000 lb of weight (the unit in which wt is recorded) is associated with a decrease in predicted mileage of about 5.34 mpg.
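
As a check, the closed-form slope formula can be evaluated by hand and compared with what lm() returns. The sketch below also computes the SSE of the fitted line. The intercept formula used here, the mean of y minus the slope times the mean of x, is the standard least-squares result; the variable names are illustrative.

```r
# Check the closed-form estimates against lm() using the built-in mtcars data.
x <- mtcars$wt    # weight, recorded in units of 1000 lb
y <- mtcars$mpg   # miles per gallon

# Slope: sum of cross-deviations over sum of squared deviations of x
slope_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Same slope written as sample covariance over sample variance
slope_cov <- cov(x, y) / var(x)

# Intercept: the fitted line passes through (mean(x), mean(y))
intercept_manual <- mean(y) - slope_manual * mean(x)

# SSE: sum of squared residuals of the fitted model
fit <- lm(mpg ~ wt, data = mtcars)
sse <- sum(residuals(fit)^2)

c(intercept = intercept_manual, slope = slope_manual, sse = sse)
```

The manual intercept and slope should agree with the coefficients(lr) output above up to floating-point rounding.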

Regression Line (ggplot 1)

This is a scatter plot displaying the data and regression line.

## `geom_smooth()` using formula = 'y ~ x'
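
A plot like this could be produced along the following lines. This is a sketch rather than the exact code behind the figure: it assumes the ggplot2 package is installed, and the labels, title, and the choice to hide the confidence band (se = FALSE) are illustrative assumptions.

```r
library(ggplot2)

# Scatter plot of mileage against weight with the least-squares line overlaid.
p_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" draws the fitted regression line; leaving out `formula`
  # is what makes ggplot2 print the "using formula = 'y ~ x'" message
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lb)",
       y = "Miles per gallon",
       title = "Mileage vs. weight with fitted regression line")

p_fit
```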

Residuals (ggplot 2)
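
A residual plot for this model could be drawn as sketched below, with residuals (observed minus fitted mpg) on the vertical axis and fitted values on the horizontal axis. This assumes ggplot2 is installed; the object names and axis labels are illustrative and not necessarily those used for the original figure.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Pair each fitted value with its residual (observed minus fitted)
resid_df <- data.frame(fitted = fitted(fit), residual = residuals(fit))

p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero error
  labs(x = "Fitted mpg", y = "Residual")

p_resid
```

Points scattered evenly around the dashed line support a straight-line fit, while a clear U shape would suggest curvature.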

Interactive 3D Scatter plot (plotly)
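
An interactive 3D scatter plot can be built with the plotly package, roughly as follows. The choice of weight, horsepower, and mileage as the three axes is an assumption based on the interpretation at the end of this document, and the marker size and axis titles are illustrative.

```r
library(plotly)

# Interactive 3D scatter of weight, horsepower, and mileage from mtcars.
# The axis assignment is an assumption, not taken from the original figure.
p_3d <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
                type = "scatter3d", mode = "markers",
                marker = list(size = 4))

# Label the three axes of the 3D scene
p_3d <- layout(p_3d, scene = list(
  xaxis = list(title = "Weight (1000 lb)"),
  yaxis = list(title = "Horsepower"),
  zaxis = list(title = "Miles per gallon")
))

p_3d
```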

Final Interpretations

The residual plot suggests that a straight line is a reasonable fit: the residuals show some variability, but there are no solid grounds for concluding that the relationship is parabolic.
Additionally, the 3D scatter plot shows an inverse relationship between horsepower and mileage. Horsepower is likely correlated with vehicle weight, possibly because heavier vehicles need more powerful engines, but that would need to be explored further.
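
One way to begin that exploration is to look at the pairwise correlations and then fit a model with both predictors. This is only a starting sketch; the two-predictor model and the object names are hypothetical additions, not part of the analysis above.

```r
# Pairwise correlations among mileage, weight, and horsepower
cor_mat <- cor(mtcars[, c("mpg", "wt", "hp")])
round(cor_mat, 2)

# A hypothetical follow-up model that adds horsepower as a second predictor,
# so the weight effect is estimated while holding horsepower fixed
fit_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit_wt_hp)
```

A positive correlation between wt and hp would be consistent with heavier cars tending to have more horsepower, though correlation alone does not explain why.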