2025-10-26

Simple Linear Regression

Simple Linear Regression is a linear regression method with a single independent variable. It finds the linear function that best predicts the values of a dependent variable from the values of that independent variable.

The most common form is: \[ y = ax + b \]

Written as a statistical model for each observation \(i\), it becomes: \[ y_i = \alpha + \beta x_i + \varepsilon_i \]

With:

\(\alpha\) as the y-intercept of the function (the value of \(y\) when \(x = 0\))

\(\beta\) as the slope of the function

and \(\varepsilon_i\) as the error term

Estimation

To obtain the best linear fit, we must estimate the two parameters \(\alpha\) and \(\beta\) from the data.

The parameters are estimated with Ordinary Least Squares (OLS), which chooses the values that minimize the sum of squared residuals.

The objective function \(Q\) to minimize is:

\[ Q(\alpha,\beta) = \sum_{i=1}^{n} (y_i-\alpha-\beta x_i)^2 \]
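A short sketch of how the minimum is found: \(Q\) is a quadratic in \(\alpha\) and \(\beta\), so (provided the \(x_i\) are not all equal) its minimum lies where both partial derivatives vanish. These two conditions are known as the normal equations:

\[ \frac{\partial Q}{\partial \alpha} = -2\sum_{i=1}^{n}(y_i-\alpha-\beta x_i) = 0 \]

\[ \frac{\partial Q}{\partial \beta} = -2\sum_{i=1}^{n}x_i(y_i-\alpha-\beta x_i) = 0 \]

Dividing the first equation by \(-2n\) gives \(\bar{y} - \alpha - \beta\bar{x} = 0\). Substituting \(\alpha = \bar{y} - \beta\bar{x}\) into the second leaves \(\sum_{i=1}^{n} x_i\big[(y_i-\bar{y}) - \beta(x_i-\bar{x})\big] = 0\). Because \(\sum_{i=1}^{n}\bar{x}(y_i-\bar{y}) = 0\) and \(\sum_{i=1}^{n}\bar{x}(x_i-\bar{x}) = 0\), each \(x_i\) in that sum can be replaced by \(x_i-\bar{x}\), which isolates \(\beta\).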

Minimizing \(Q\) gives the estimates \(\widehat{\alpha}\) and \(\widehat{\beta}\): \[ \widehat{\alpha} = \bar{y} - \widehat{\beta} \bar{x} \]

\[ \widehat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^{n}(x_i - \bar{x})^2} \]

With \(\bar{x}\) and \(\bar{y}\) as the averages (sample means) of the \(x_i\) and \(y_i\).
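The two OLS formulas translate almost line for line into R. The sketch below uses a hypothetical helper, `ols_fit()`, applied to the built-in `mtcars` data, and then cross-checks the result against R's own `lm()`; it is an illustration of the formulas, not how `lm()` computes its fit internally.

```r
# Hypothetical helper: closed-form OLS estimates for simple linear regression
ols_fit <- function(x, y) {
  x_bar <- mean(x)
  y_bar <- mean(y)
  # beta-hat: sum of cross-deviations over sum of squared x-deviations
  beta_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
  # alpha-hat: the fitted line passes through (x_bar, y_bar)
  alpha_hat <- y_bar - beta_hat * x_bar
  c(alpha = alpha_hat, beta = beta_hat)
}

# Miles per gallon as a function of weight, from the built-in mtcars data
est <- ols_fit(mtcars$wt, mtcars$mpg)
print(est)

# Cross-check against R's built-in least-squares fit
fit <- lm(mpg ~ wt, data = mtcars)
print(all.equal(unname(est), unname(coef(fit))))  # TRUE
```

Both routes should agree up to floating-point rounding, which is why the comparison uses `all.equal()` rather than `==`.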

Examples of linear fits

For ggplot2: using the built-in data frame mtcars, determine the relationship between mpg (miles per gallon) and wt (weight, in 1000 lbs) across its 32 car models.

Same plot, different tool

The same example, plotted with plotly:

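The two plots are very similar. plotly produces an interactive chart (hover, zoom, pan), while ggplot2 produces a static figure that is better suited to reports and slides; ggplot2's geom_smooth() also shades a 95% confidence band around the line by default.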

Code:

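library(ggplot2)
library(plotly)

data(mtcars)

# for ggplot2 plotting: scatter plot plus a least-squares line from geom_smooth()
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "darkcyan") +
  geom_smooth(method = "lm", formula = y ~ x, color = "orange", linewidth = 1.4) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# for plotly plotting: fit the model with lm(), then overlay its fitted values as a line
linfit <- lm(mpg ~ wt, data = mtcars)
plot_ly(mtcars, x = ~wt, y = ~mpg,
        type = "scatter", mode = "markers", name = "Observed") |>
  add_trace(x = ~wt, y = fitted(linfit), mode = "lines", name = "Linear fit",
            line = list(color = "orange", width = 2.4))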
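Beyond drawing the line, the fitted `lm` object can be queried directly. This is a small sketch using standard `stats` functions (`coef()`, `predict()`, `summary()`); the 3000 lb car is a hypothetical input chosen only for illustration.

```r
# Refit the same model so this snippet runs on its own
linfit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept (alpha-hat) and slope (beta-hat)
print(coef(linfit))

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3)
pred_3t <- predict(linfit, newdata = data.frame(wt = 3))
print(pred_3t)  # about 21.25

# R-squared: share of the variance in mpg explained by weight
r2 <- summary(linfit)$r.squared
print(r2)  # about 0.753
```

The slope reads as roughly 5.3 fewer miles per gallon for each additional 1000 lbs of weight, within the range of weights in the data.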

Another example?

Now determine the relationship between wt (weight) and disp (engine displacement, in cubic inches) to see whether displacement increases with weight.

Code

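# Scatter plot of displacement against weight, with a least-squares line
ggplot(data = mtcars, aes(x = wt, y = disp)) +
  geom_point(color = "black") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", linewidth = 1.6) +
  labs(x = "Weight (1000 lbs)", y = "Engine displacement (cu. in.)")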

The fitted lines show that the heavier the car:

  • The lower the miles per gallon

  • The higher the engine displacement
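Both trends can also be checked numerically rather than by eye. This sketch fits the two simple regressions and reads off the sign of each slope, alongside the Pearson correlation from base R's `cor()`.

```r
# Fit both simple regressions on the built-in mtcars data
fit_mpg  <- lm(mpg ~ wt, data = mtcars)
fit_disp <- lm(disp ~ wt, data = mtcars)

# The sign of beta-hat gives the direction of each trend
slopes <- c(mpg  = unname(coef(fit_mpg)["wt"]),
            disp = unname(coef(fit_disp)["wt"]))
print(slopes)

# Pearson correlations: values near -1 or +1 indicate a strong linear relationship
correlations <- c(mpg  = cor(mtcars$wt, mtcars$mpg),
                  disp = cor(mtcars$wt, mtcars$disp))
print(correlations)
```

A negative slope for mpg and a positive slope for disp match the two observations above; the correlations indicate how tightly the points cluster around each line.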

In both examples, the linear fit condenses the relationship between two mtcars variables into a single slope, which makes the trends easy to compare.

Thank you for reading/listening