Simple regression is used to establish whether a relationship exists between two variables and to use that relationship to estimate unknown values.
There are two variables:
x: independent variable
y: dependent variable
x is used to predict the y value.
2026-03-09
Simple Linear Regression:
\[ y = \beta_0 + \beta_1 x \]
Simple Linear Regression with random error:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
\(y\) = dependent variable
\(x\) = independent variable
\(\beta_0\) = intercept
\(\beta_1\) = slope
\(\epsilon\) = random error, which accounts for the difference between the predicted and the actual values, since they are not always equal.
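Least-squares estimates of the slope and intercept, where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\):
\[ \beta_1 = \frac{\sum_{i=1}^n(x_i - \bar{x}) (y_i - \bar{y})} {\sum_{i=1}^n(x_i - \bar{x})^2} \]
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]
\(\beta_0\) = intercept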
\(\beta_1\) = slope
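As a quick check, the two formulas can be computed by hand in R and compared with what lm() returns. This is a minimal sketch using the built-in mtcars data with hp as x and mpg as y; the variable names (x, y, beta0, beta1, fit, manual) are chosen here for illustration only.

```r
# Manual least-squares estimates for mpg ~ hp on the built-in mtcars data
x <- mtcars$hp   # independent variable
y <- mtcars$mpg  # dependent variable

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the line through the point (mean(x), mean(y))
beta0 <- mean(y) - beta1 * mean(x)

# Compare with the coefficients estimated by lm()
fit <- lm(mpg ~ hp, data = mtcars)
manual <- c(beta0, beta1)
all.equal(manual, unname(coef(fit)))
```

The slope comes out negative, which matches the downward trend in the HP vs MPG plots.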
Scatter plot of mtcars data set comparing HP vs MPG:
Scatter plot of HP vs MPG with linear regression line:
The shaded region represents the confidence interval around the regression line. It shows how confident we are in the fitted line at each HP value: the wider the band, the less confident we are.
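The width of the band can also be inspected numerically. The sketch below assumes a 95% confidence level (the level used for the plot is not stated here) and uses two illustrative hp values, one near the sample mean and one at the upper extreme; the names new_hp, ci, and band_width are hypothetical.

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Illustrative hp values: one near the sample mean, one at the largest observed hp
new_hp <- data.frame(hp = c(round(mean(mtcars$hp)), max(mtcars$hp)))

# Confidence interval for the mean mpg at each hp value (95% level is an assumption)
ci <- predict(fit, newdata = new_hp, interval = "confidence", level = 0.95)

# Width of the band at each point: upper bound minus lower bound
band_width <- ci[, "upr"] - ci[, "lwr"]
cbind(new_hp, ci, band_width)
```

The interval is narrowest near the mean of hp and widens toward the extremes, where there are fewer observations to pin down the line.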
Scatter plot of HP vs MPG with regression line in plotly:
Drag to zoom. Hover to see exact data point values.
library(plotly)

# Fit mpg as a linear function of hp
mod <- lm(mpg ~ hp, data = mtcars)

# Scatter plot of the raw points, with the fitted line drawn on top
plot_ly(data = mtcars, x = ~hp, y = ~mpg, type = 'scatter',
        mode = 'markers', name = 'Points') %>%
  add_lines(x = ~hp, y = fitted(mod), name = 'Linear Regression Line') %>%
  layout(title = "HP vs MPG Scatter Plot")
I set mod to the best linear fit of mpg on hp using lm(). I then call plot_ly() on the same data, with hp as the x values and mpg as the y values, and set type and mode to specify that I want a scatter plot of markers. The name set in plot_ly() defines the label of the scatter points in the legend. Then I use add_lines() to add the linear regression line, using hp for x and fitted(mod) to calculate the expected values for y (mpg). I also give this line a name so the legend is easier to interpret. Finally, I add a title in layout() that explains what the plot shows.
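A fitted model like this can also be used to estimate mpg for horsepower values that are not in the data, which is the "estimate unknown values" use mentioned at the start. The following is a minimal sketch; the hp values in new_cars are made up for illustration.

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Hypothetical cars with known horsepower but unknown mpg
new_cars <- data.frame(hp = c(100, 150, 200))

# Estimated mpg for each hp value: beta0 + beta1 * hp
pred <- predict(fit, newdata = new_cars)
cbind(new_cars, predicted_mpg = pred)
```

Writing the formula as mpg ~ hp with data = mtcars matters for this step: if the model is fitted with mtcars$mpg ~ mtcars$hp, predict() cannot match the hp column in newdata and falls back to the original observations.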