2024-11-11

Introduction to Linear Regression

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable’s value is called the independent variable.

This form of analysis estimates the coefficients of a linear equation involving one or more independent variables, choosing the coefficient values that best predict the dependent variable.

What is Linear Regression?

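Linear regression fits a straight line (or, with more than one independent variable, a plane or hyperplane) that minimizes the discrepancies between predicted and actual output values. In its standard least-squares form, the quantity minimized is the sum of the squared differences.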
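The idea of minimizing discrepancies can be checked numerically. The sketch below assumes the usual least-squares criterion and uses made-up toy data (the variables `x` and `y`, the seed, and the helper `rss` are illustrative, not from the original slides). It compares the residual sum of squares of the fitted line with that of two slightly altered lines.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical toy data: a linear trend plus random noise.
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

# Residual sum of squares for a candidate line y = b0 + b1 * x.
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Least-squares fit with base R.
fit <- lm(y ~ x)
b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope

rss_fitted  <- rss(b[1], b[2])          # the fitted line
rss_steeper <- rss(b[1], b[2] + 0.05)   # same intercept, steeper slope
rss_shifted <- rss(b[1] + 0.5, b[2])    # same slope, shifted upward

# The fitted line should have the smallest value of the three.
c(fitted = rss_fitted, steeper = rss_steeper, shifted = rss_shifted)
```

Any other line drawn through the same points gives a larger sum of squared residuals than the one returned by `lm()`.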

Correlation Formula

The formula for the correlation coefficient between the quarter mile time (qsec) and horsepower (hp) is:

\[ r = \frac{\sum ( \text{qsec}_i - \overline{\text{qsec}})( \text{hp}_i - \overline{\text{hp}})}{\sqrt{\sum ( \text{qsec}_i - \overline{\text{qsec}})^2 \sum ( \text{hp}_i - \overline{\text{hp}})^2}} \]
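As a rough check of the formula, the following sketch computes the correlation term by term and compares it with R's built-in `cor()`. It assumes the data come from the `mtcars` dataset that ships with base R, which matches the column names used here; the variable names `num`, `den`, `r_manual`, and `r_builtin` are illustrative.

```r
# mtcars is part of base R's datasets package, so no extra packages are needed.
qsec <- mtcars$qsec
hp   <- mtcars$hp

# Numerator: sum of the cross-products of deviations from the means.
num <- sum((qsec - mean(qsec)) * (hp - mean(hp)))

# Denominator: square root of the product of the sums of squared deviations.
den <- sqrt(sum((qsec - mean(qsec))^2) * sum((hp - mean(hp))^2))

r_manual  <- num / den
r_builtin <- cor(qsec, hp)  # Pearson correlation is the default method

# TRUE if the hand-computed value matches the built-in one.
all.equal(r_manual, r_builtin)
```

The resulting coefficient is negative: cars with more horsepower tend to have shorter quarter mile times.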

Plot of Horsepower vs. Quarter Mile Time

The plot below shows the relationship between horsepower and the quarter mile time in seconds. A linear regression line is fitted to the data.

Linear Regression Formula Between hp and mpg

The relationship between horsepower (hp) and miles per gallon (mpg) can be modeled with a linear regression:

\[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{hp} \]
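Here \(\beta_0\) is the intercept and \(\beta_1\) is the slope. A minimal sketch of estimating both with `lm()`, again assuming the `mtcars` dataset, is shown below. It also recomputes the slope from the correlation coefficient to show how the two quantities are linked in the one-predictor case; the names `fit`, `r`, `b0`, and `b1` are illustrative.

```r
# Fit mpg = beta_0 + beta_1 * hp by ordinary least squares.
fit <- lm(mpg ~ hp, data = mtcars)
coef(fit)  # named vector: (Intercept) and hp

# With a single predictor, the slope equals r * sd(y) / sd(x),
# and the intercept makes the line pass through the point of means.
r  <- cor(mtcars$hp, mtcars$mpg)
b1 <- r * sd(mtcars$mpg) / sd(mtcars$hp)
b0 <- mean(mtcars$mpg) - b1 * mean(mtcars$hp)

# TRUE if the hand-computed coefficients match those from lm().
all.equal(unname(coef(fit)), c(b0, b1))

# Predicted mpg for a hypothetical car with 150 horsepower.
predict(fit, newdata = data.frame(hp = 150))
```

The slope estimate is negative, so predicted fuel economy falls as horsepower rises.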

Plot of Horsepower vs. Miles Per Gallon

Here we show the linear regression between horsepower and miles per gallon (mpg). A fitted line illustrates the relationship.
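One way such a figure could be drawn is sketched below with base R graphics. This is an assumption made for illustration: the slides do not show how the original figure was produced, and the axis labels, title, and colors here are arbitrary choices.

```r
# Fit the simple regression of mpg on hp (mtcars ships with base R).
fit <- lm(mpg ~ hp, data = mtcars)

# Scatter plot of the raw data.
plot(mpg ~ hp, data = mtcars,
     pch = 19,
     xlab = "Horsepower (hp)",
     ylab = "Miles per gallon (mpg)",
     main = "Horsepower vs. Miles Per Gallon")

# Overlay the fitted regression line.
abline(fit, col = "blue", lwd = 2)
```

`abline()` accepts the fitted model directly and reads the intercept and slope from it, so the line drawn always matches the estimated coefficients.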

3D Plot Overview

In this 3D plot, we visualize the relationship between three variables: Miles per Gallon, Horsepower, and Weight. The axes represent each variable, and the plot provides insight into how they interact together in the dataset.

3D Plot Code

Below is the R code used to generate the 3D plot of Miles per Gallon (mpg), Horsepower (hp), and Weight (wt):

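# plotly provides plot_ly(), layout(), and the %>% pipe
library(plotly)

# 3D scatter plot: one marker per car in mtcars
plot_ly(mtcars, x = ~mpg, y = ~hp, z = ~wt,
        type = 'scatter3d', mode = 'markers',
        marker = list(size = 5, color = 'blue')) %>%
  # Label the three axes
  layout(scene = list(
    xaxis = list(title = 'Miles Per Gallon'),
    yaxis = list(title = 'Horsepower'),
    zaxis = list(title = 'Weight')))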

Conclusion

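Linear regression helps us predict a dependent variable based on one or more independent variables. The correlation coefficient measures the strength and direction of the linear relationship between two variables. It is also tied directly to the regression coefficients: in simple linear regression, the slope equals the correlation coefficient multiplied by the ratio of the standard deviation of the dependent variable to that of the independent variable.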