2024-10-24

Introduction

  • What is linear regression?

  • Estimates a linear relationship between variables

  • Typically used in 2-D plots

  • Represented as a straight line with some formula \(y = mx + b\)

  • Minimizes the sum of the squared vertical distances between the points and the fitted line

  • Used to predict results from changing variables

Example of Linear Regression

Here is an example of what linear regression looks like on a 2-D plot. The orange line represents the estimated relationship between the variables. Observe that the value on the y-axis seems to increase as the value on the x-axis increases. This is demonstrated by the positive slope of our line.

Creating Our Own Plot 1

Now that you’ve seen the example, let’s make a data set of our own and run a linear regression on it. We will be using the programming language R for this example.

This time, we’ll try to create two variables that will result in a negative linear relationship, meaning that the slope of our estimate line will be negative. Let’s start by assigning some values and making a data frame.

# Two short vectors where y tends to fall as x rises
x <- c(2, 3, 4, 6, 7, 8, 11)
y <- c(10, 6, 8, 2, 5, 1, 2)
df <- data.frame(x = x, y = y)
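As a quick check that these values really do trend downward, here is a minimal sketch that fits the line with base R's `lm()` and reads off the slope and intercept. The names `fit`, `m`, and `b` are chosen here for illustration and do not come from the slides.

```r
# Data from the example above
x <- c(2, 3, 4, 6, 7, 8, 11)
y <- c(10, 6, 8, 2, 5, 1, 2)
df <- data.frame(x = x, y = y)

# Fit y = m*x + b by ordinary least squares
fit <- lm(y ~ x, data = df)

# Pull the two coefficients out of the fitted model
b <- coef(fit)[["(Intercept)"]]  # intercept
m <- coef(fit)[["x"]]            # slope

print(round(c(slope = m, intercept = b), 3))
```

The slope should come out at roughly -0.89 and the intercept near 10.05, which confirms the negative relationship we were aiming for.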

Creating Our Own Plot 2

As an extra step, we will turn this plot into an interactive one using plotly. Here, we will build the graph with ggplot so that we can convert it into a plotly plot.

You can also just create a plotly plot directly, but it is nice to be aware of this method as well.

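library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(plotly)   # ggplotly()

# Scatter plot of the data with a least-squares line drawn on top
example <- ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 1) +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE, color = "orange")

# To make our plotly plot, we'll use the following code next:
# ggplotly(example)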
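For comparison, here is a rough sketch of the direct plotly route mentioned above, without going through ggplot. It assumes the plotly package is installed; the `fitted` column and the trace names are made up for this illustration.

```r
library(plotly)

# Same data as before
x <- c(2, 3, 4, 6, 7, 8, 11)
y <- c(10, 6, 8, 2, 5, 1, 2)
df <- data.frame(x = x, y = y)

# plotly does not fit the line for us, so compute the fitted values first
fit <- lm(y ~ x, data = df)
df$fitted <- fitted(fit)

# Points as markers, regression line as a separate trace
p <- plot_ly(df, x = ~x, y = ~y)
p <- add_markers(p, name = "data")
p <- add_lines(p, y = ~fitted, name = "linear fit",
               line = list(color = "orange"))

# Typing p at the console would display the interactive plot
```

The main difference from the ggplot route is that the regression has to be fitted by hand, since there is no direct equivalent of `geom_smooth()` here.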

Creating Our Own Plot 3

Finally, we’ll make our plot. It is also interactive, so try hovering over the line and the points to see their exact values.

Examining Our Plot

Notice how the slope of the regression line in this graph is negative, since the x and y variables have a negative linear relationship.

You can also observe that although the line estimates the relationship between the variables, it does not pass through any of the points. It is simply the line that comes closest to all of the given points overall. This is the essence of linear regression.
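To make "closest to all of the points" a little more concrete, the sketch below looks at the residuals (the vertical gaps between each point and the line) and compares the fitted line with a hypothetical alternative drawn through the first and last points. The alternative line and the names `sse_fit` and `sse_alt` are illustrative assumptions, not part of the original example.

```r
x <- c(2, 3, 4, 6, 7, 8, 11)
y <- c(10, 6, 8, 2, 5, 1, 2)

fit <- lm(y ~ x)

# Residual = observed y minus the y predicted by the line
res <- residuals(fit)
print(round(res, 2))

# Sum of squared residuals for the least-squares line
sse_fit <- sum(res^2)

# Hypothetical alternative: a line through the first and last points
m_alt <- (y[7] - y[1]) / (x[7] - x[1])
b_alt <- y[1] - m_alt * x[1]
sse_alt <- sum((y - (m_alt * x + b_alt))^2)

# The least-squares line has the smaller total squared error
print(c(least_squares = sse_fit, through_endpoints = sse_alt))
```

None of the residuals is zero, which matches what the plot shows: the line misses every point, yet its total squared error is lower than that of a line that hits two points exactly.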

Should We Always Use It?

Here is an example of a graph that doesn’t quite fit with linear regression. It looks close, but we may have a better method to analyze our graph here.

An Alternative?

This is just a short snippet of the topic, but we can use something called exponential regression to create a better predictor for our graph.

Why?

In the second graph, the fitted curve is far closer to the points than the straight line in the first graph. This suggests that the curve in the second graph is a better predictor of the relationship between our variables.

The second graph was using exponential regression, which basically assumes that the relationship between the variables is exponential. This means that as one of our variables increases, the other starts to increase slowly, then accelerates over time to increase faster and faster.

The equation for these exponential curves typically looks something like this, where y and x represent our variables and a and k are constants:

\(y = a e^{kx}\)
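One common way to fit an exponential curve is to assume the form y = a·exp(k·x) and take logarithms, which gives log(y) = log(a) + k·x, a straight line in x. The sketch below uses made-up data generated from a = 2 and k = 0.4 (these values are not from the slides) and recovers both constants with an ordinary `lm()` fit on log(y).

```r
# Synthetic, noise-free data from y = 2 * exp(0.4 * x) (illustrative only)
x <- 1:10
y <- 2 * exp(0.4 * x)

# Linear regression on the log scale: log(y) = log(a) + k * x
log_fit <- lm(log(y) ~ x)

# Transform the coefficients back to the original scale
a_hat <- exp(coef(log_fit)[["(Intercept)"]])
k_hat <- coef(log_fit)[["x"]]

print(c(a = a_hat, k = k_hat))
```

This approach only works when every y value is positive, since the logarithm is undefined otherwise.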

How Do We Know Which to Use?

With a large set of data, we usually can’t tell at a glance which method of regression to use. In most scenarios, the relationship between the variables is not obvious.

To determine which method to use, we can plot the points and look for general patterns in the variables. We can also fit several regression models and see which one fits our data best.
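One simple way to compare candidate models is to measure how far each model's predictions fall from the data. The sketch below does this for a linear and an exponential fit using a small hypothetical data set that grows roughly exponentially; the data and the names `sse_lin` and `sse_exp` are assumptions made for this illustration.

```r
# Hypothetical data with roughly exponential growth (not from the slides)
x <- 1:8
y <- c(3, 4, 7, 10, 15, 23, 34, 51)

# Candidate 1: straight line
lin_fit <- lm(y ~ x)

# Candidate 2: exponential curve, fitted as a line on the log scale
exp_fit <- lm(log(y) ~ x)

# Compare both on the original scale with the sum of squared errors
sse_lin <- sum((y - fitted(lin_fit))^2)
sse_exp <- sum((y - exp(fitted(exp_fit)))^2)

print(c(linear = sse_lin, exponential = sse_exp))
```

The model with the smaller error fits these points better. With real data it is still worth plotting both fits, since a single number can hide patterns that are obvious on a graph.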

Hopefully, you’ve learned how to use linear regression and gained some insight on what it does and when we should use it.