March 17, 2024

Introduction to Simple Linear Regression

Objectives of Simple Linear Regression:

  1. Determining if a statistically significant relationship between two variables exists.

  2. Forecasting unobserved values of one variable based on its relationship with the other variable.

Role of Variables:

  1. Independent Variable: the variable that is changed to see the effect that it has on the dependent variable.

  2. Dependent Variable: the variable whose value we want to forecast; its value depends on the independent variable.

Simple Linear Regression Equation

With perfect data, every point falls exactly on a line, so a linear equation is enough to describe the relationship.

\(\text{Linear Equation} : y = \beta_0 + \beta_1\cdot x\)

But real-world data rarely falls exactly on a line, so a simple linear regression model adds an error term and estimates the line that minimizes the errors (in practice, the sum of squared errors) between the actual and estimated values.

\(\text{Simple Linear Regression Model} : y = \beta_0 + \beta_1\cdot x + \varepsilon\)

\(x = \text{Independent Variable}\)

\(y = \text{Dependent Variable}\)

\(\beta_0 = \text{constant or intercept}\)

\(\beta_1 = \text{slope coefficient for } x\)

\(\varepsilon = \text{error term}\)
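The error-minimizing fit described above can be computed by hand. The following is a minimal sketch, assuming the model is fit by ordinary least squares (the method used by the lm function in R): the slope estimate is the sample covariance of x and y divided by the sample variance of x, and the intercept estimate makes the line pass through the point of means. The names b0_hat and b1_hat are illustrative, and the built-in mtcars data is used only because it ships with R.

```r
# Least-squares estimates computed by hand (names are illustrative).
x <- mtcars$wt   # independent variable
y <- mtcars$mpg  # dependent variable

b1_hat <- cov(x, y) / var(x)          # slope estimate
b0_hat <- mean(y) - b1_hat * mean(x)  # intercept estimate

# lm() should return the same two numbers.
fit <- lm(y ~ x)
print(c(intercept = b0_hat, slope = b1_hat))
print(coef(fit))
```

The two printed results should agree, which shows that lm is not doing anything beyond this calculation in the one-variable case.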

Basics of Simple Linear Regression Model

Steps of Graphing a Linear Regression Model

Step 1: Compute the Simple Linear Regression Model Equation.

\(y = \beta_0 + \beta_1\cdot x + \varepsilon\)

  • The lm function in R fits the model, and the summary function reports the estimates of \(\beta_0\) and \(\beta_1\).

Step 2: Plot all of the real data points in a scatter plot.

Step 3: Plot the regression line (line of best fit) to show the regression relationship between the two variables.

Example of Simple Linear Regression

The mtcars data set can be used to model fuel economy (mpg) as a function of the weight of the car (wt, in thousands of pounds).

The summary function shows how well the linear model fits the data set:

cars_example <- lm(mpg~wt, data = mtcars)
summary(cars_example)
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

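The Estimate column gives the values needed to write the estimated regression line:

\(\hat{y} = 37.2851 + (-5.3445)\cdot x\)

Here \(\hat{y}\) is the estimated mpg for a car of weight \(x\); the error term \(\varepsilon\) does not appear because the line gives the estimate rather than the observed value. Each additional 1,000 lbs of weight is associated with about 5.34 fewer miles per gallon.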
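The same numbers can be pulled out of the fitted model object instead of being read off the printed summary. This is a small sketch rather than part of the original example: the names estimates, slope_p_value, new_car, and predicted_mpg are illustrative, and the car with wt = 3 is hypothetical.

```r
cars_example <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope, as shown in the Estimate column.
estimates <- coef(cars_example)

# p-value for the slope: a small value indicates a statistically
# significant relationship between wt and mpg.
slope_p_value <- coef(summary(cars_example))["wt", "Pr(>|t|)"]

# Forecast mpg for a hypothetical car with wt = 3 (3,000 lbs).
new_car <- data.frame(wt = 3)
predicted_mpg <- predict(cars_example, newdata = new_car)

print(estimates)
print(slope_p_value)
print(predicted_mpg)  # about 21.25 with the estimates above
```

This covers both objectives of simple linear regression: the p-value addresses whether the relationship is statistically significant, and predict produces a forecast for an unobserved value.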

Plot of Simple Linear Regression

library(ggplot2)

# Weight is the independent variable (x) and mpg the dependent variable (y),
# matching the lm(mpg ~ wt) model above.
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE) + theme_bw() +
  labs(x = "Weight (1000 lbs)", y = "MPG",
       title = "Linear Regression of MPG and Weight") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = 'bold'))

This simple linear regression plot shows a negative relationship between MPG and the weight of the car: heavier cars tend to have lower MPG. The blue line through the data points is the fitted regression line, so the slope and intercept from the summary can be seen directly in the graph.

Interpreting the Error Term

Here is another example of a simple linear regression, using the cars data set, which relates the speed of a car to its stopping distance:

Placing your mouse over the blue dots shows the speed and distance of each actual data point. The orange line is the line of best fit, which gives the estimated value. The vertical distance between an actual data point and the regression line is the residual, which is the observed counterpart of the error term.
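The gap between each actual point and the line can also be computed directly. The sketch below assumes that stopping distance (dist) is the dependent variable and speed is the independent variable; the names cars_fit, estimated, and residual are illustrative.

```r
# Fit stopping distance (dist) on speed using the built-in cars data.
cars_fit <- lm(dist ~ speed, data = cars)

estimated <- fitted(cars_fit)        # points on the regression line
residual  <- cars$dist - estimated   # actual minus estimated

# The same values are returned by residuals().
print(all.equal(unname(residual), unname(residuals(cars_fit))))

# With an intercept in the model, least-squares residuals sum to about zero.
print(sum(residual))

# First few rows: actual value, estimated value, and the gap between them.
print(head(data.frame(speed = cars$speed, actual = cars$dist,
                      estimated, residual)))
```

A positive value means the actual point lies above the line, and a negative value means it lies below.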

Interpreting Results

The direction of the regression line indicates whether the relationship between the two variables is positive or negative.

For example, in this final linear regression model, built on the iris data set, there is a positive relationship between sepal length and petal length. This means that as the independent variable increases, the dependent variable tends to increase as well.

The opposite occurs if the regression line slopes downward, which indicates a negative relationship between the variables.
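The direction can be checked from the sign of the estimated slope without looking at a plot. This is a sketch under one assumption: the text does not say which iris variable is independent, so Sepal.Length is treated as the independent variable here. The helper describe_direction is hypothetical and is not part of any package.

```r
# Assumption: Sepal.Length is the independent variable.
iris_fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
cars_fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical helper: classify a simple regression by the sign of its slope.
describe_direction <- function(fit) {
  slope <- unname(coef(fit)[2])  # second coefficient is the slope
  if (slope > 0) "positive" else if (slope < 0) "negative" else "flat"
}

print(describe_direction(iris_fit))  # "positive"
print(describe_direction(cars_fit))  # "negative"
```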

Conclusion

Overall Takeaways

  1. Simple Linear Regression: Assumes a linear relationship between two variables
  2. Predictions: Allows predictions based on data and analysis
  3. Limitations: The results rely on the model's assumptions, and outliers can distort the fitted line


In conclusion, simple linear regression provides a framework for understanding and predicting the relationship between two variables.

These models can help companies analyze trends, forecast outcomes, and make decisions based on a set of data.