2026-03-09

What is Simple Linear Regression?

Simple linear regression is used to establish whether a linear relationship exists between two variables and to use that relationship to estimate unknown values.

There are two variables:

x: independent variable

y: dependent variable

x is used to predict the y value.

How is it Calculated?

Simple Linear Regression: \[ y = \beta_0 + \beta_1 x \]

Simple Linear Regression with random error: \[ y = \beta_0 + \beta_1 x + \epsilon \]

\(y\) = dependent variable

\(x\) = independent variable

\(\beta_0\) = intercept

\(\beta_1\) = slope

\(\epsilon\) = random error, which accounts for the difference between the predicted and the actual values, since they are not always equal.
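As a rough illustration of the error term, the sketch below fits a line and splits each observed value into a predicted part and a leftover part. It assumes R's built-in mtcars data set, with hp as x and mpg as y. The names `mod`, `predicted`, and `residual` are chosen only for this example. The residuals are the sample counterpart of \(\epsilon\), not the true errors themselves.

```r
# Fit mpg as a linear function of hp (mtcars is an assumed example data set).
mod <- lm(mpg ~ hp, data = mtcars)

predicted <- fitted(mod)  # estimated beta_0 + beta_1 * x for each car
residual  <- resid(mod)   # actual mpg minus predicted mpg

# For every observation: actual = predicted + residual.
head(data.frame(hp = mtcars$hp, actual = mtcars$mpg, predicted, residual))
```

A positive residual means the car's actual mpg is above the line, and a negative one means it is below.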

How Intercept and Slope are Calculated

\[ \beta_1 = \frac{\sum_{i=1}^n(x_i - \bar{x}) (y_i - \bar{y})} {\sum_{i=1}^n(x_i - \bar{x})^2} \]

\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]

\(\beta_0\) = intercept

\(\beta_1\) = slope
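The two formulas can be checked by hand in R. This is a minimal sketch, assuming the built-in mtcars data set with hp as x and mpg as y. The variable names `x`, `y`, `beta_1`, and `beta_0` are chosen to mirror the symbols above.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by the sum of squared x-deviations.
beta_1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the line passes through (mean(x), mean(y)).
beta_0 <- mean(y) - beta_1 * mean(x)

c(intercept = beta_0, slope = beta_1)

# Cross-check against R's built-in least-squares fit.
coef(lm(mpg ~ hp, data = mtcars))
```

The hand-computed values should agree with the coefficients reported by lm, since lm fits a simple linear model by least squares.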

Scatter Plot (ggplot)

Scatter plot of mtcars data set comparing HP vs MPG:
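The code for this figure is not given, so the following is only a minimal sketch of how such a plot could be produced. It assumes the ggplot2 package. The object name `p_scatter`, the title, and the axis labels are assumptions, not the original settings.

```r
library(ggplot2)

# Scatter plot of horsepower (x) against miles per gallon (y).
p_scatter <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "HP vs MPG Scatter Plot", x = "HP", y = "MPG")

p_scatter
```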

Scatter Plot (ggplot) with Linear Regression

Scatter plot of HP vs MPG with linear regression line:

The shaded region represents the confidence interval for the fitted line. It shows how uncertain the estimated line is at each value of HP: the wider the band, the less confident we are in the line at that point.
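A plot like this one could be built along the following lines. This is a sketch that assumes ggplot2's geom_smooth with method = "lm", since the original code for the figure is not shown. The object name `p_fit` and the labels are assumptions. By default geom_smooth draws a 95% confidence band (se = TRUE, level = 0.95).

```r
library(ggplot2)

# Scatter plot with a least-squares line and its confidence band.
p_fit <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # shaded band = confidence interval
  labs(title = "HP vs MPG with Linear Regression", x = "HP", y = "MPG")

p_fit
```

Passing se = FALSE to geom_smooth would draw the line without the shaded band.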

Scatter Plot (plotly) with Linear Regression

Scatter plot of HP vs MPG with regression line in plotly:

Drag to zoom. Hover to see exact data point values.

Code Used for the plotly Scatter Plot

library(plotly)
# Fit mpg as a linear function of hp
mod <- lm(mpg ~ hp, data = mtcars)
# Draw the scatter points, then overlay the fitted line
plot_ly(data = mtcars, x = ~hp, y = ~mpg, type = 'scatter',
        mode = 'markers', name = 'Points') %>%
  add_lines(x = ~hp, y = fitted(mod), name = 'Linear Regression Line') %>%
  layout(title = "HP vs MPG Scatter Plot")

I set mod to the best linear fit for mpg based on hp using the lm function. I then call plot_ly on the same data, using hp for the x values and mpg for the y values, and set type to 'scatter' with mode 'markers' to specify that I want a scatter plot. The name set in plot_ly defines the label for the scatter points in the legend. Next, I use add_lines to add the linear regression line, using hp for x and fitted(mod) to calculate the expected values for y (mpg). I also give this line a name so the legend is easier to interpret. Finally, I add a title in layout that explains what the plot shows.