2024-04-06

Introduction To Simple Linear Regression

  • Simple linear regression is a statistical method for modeling the relationship between two variables.
  • It describes a dependent variable (Y) as a linear function of a single independent variable (X).
  • It assumes that there is a linear relationship between the two variables.

Theoretical Background

  • The simple linear regression model can be represented as: \[ Y = \beta_0 + \beta_1 X + \varepsilon \]

    Where:

    • \(Y\) is the dependent variable
    • \(X\) is the independent variable
    • \(\beta_0\) is the intercept term
    • \(\beta_1\) is the slope coefficient
    • \(\varepsilon\) is the error term
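
  • As a minimal sketch of how \(\beta_0\) and \(\beta_1\) can be estimated, assuming ordinary least squares (the method R's lm() uses): the slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the line pass through the point of means. The names x, y, b0, b1, and fit are illustrative, and the built-in mtcars columns serve only as sample data.

```r
# Closed-form least squares estimates for Y = b0 + b1 * X.
# mtcars ships with base R; wt and mpg are used here only as sample data.
x <- mtcars$wt
y <- mtcars$mpg

b1 <- cov(x, y) / var(x)        # slope: sample covariance over sample variance of X
b0 <- mean(y) - b1 * mean(x)    # intercept: line passes through (mean(x), mean(y))

# Compare the hand-computed values with the estimates from lm()
fit <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

  • Both print statements should report the same pair of numbers, since lm() solves the same least squares problem.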

Example Dataset

  • We’ll use mtcars, a dataset built into R that records fuel consumption and 10 other attributes for 32 cars. The first six rows are shown below.
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
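
  • The listing above has the shape of head() output, which prints the first six rows by default. A minimal sketch, assuming that is how it was produced (first_rows is an illustrative name):

```r
# mtcars is bundled with base R (datasets package), so no download is needed
data(mtcars)

first_rows <- head(mtcars)   # first six rows by default
print(first_rows)

print(dim(mtcars))           # number of rows and columns in the full dataset
```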

Scatter Plot Visualization

  • Let’s create a scatter plot to visualize the relationship between miles per gallon (mpg) and car weight.
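
  • The figure itself is not reproduced here. A plot of this kind could be drawn roughly as follows, assuming ggplot2 (the package used for the regression plot later in this deck); the object name scatter_plot and the title are illustrative.

```r
library(ggplot2)

# Weight on the x-axis, fuel efficiency on the y-axis, one point per car
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon") +
  ggtitle("MPG vs. Weight")

print(scatter_plot)
```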

Simple Linear Regression Analysis

  • Now, let’s fit a simple linear regression model to predict miles per gallon (mpg) based on car weight (wt).

How the Previous Graph was Generated (R)

# Load ggplot2 for plotting
library(ggplot2)

# Fit a simple linear regression model
model <- lm(mpg ~ wt, data = mtcars)

# Get regression line data
regression_line <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), 
                                       length.out = 100))
regression_line$mpg <- predict(model, newdata = regression_line)

# Plot with regression line
regression_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line(data = regression_line, aes(x = wt, y = mpg), 
            color = "red") +  
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon") +
  ggtitle("Regression Line of MPG vs. Weight")

# Print the plot
print(regression_plot)

Evaluating the Model

# Coefficient estimates
coefficients(model)
## (Intercept)          wt 
##   37.285126   -5.344472
# R-squared value
summary(model)$r.squared
## [1] 0.7528328
  • The slope estimate for car weight, \(\hat{\beta}_1\), is -5.344472: each additional 1000 lbs of weight is associated with a decrease of about 5.34 mpg, indicating a negative relationship between weight and fuel efficiency.
  • The \(R^2\) value of the model is 0.7528328, meaning that weight alone explains about 75% of the variance in mpg in this sample.
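
  • To see how the estimates turn into a prediction, here is a small sketch for a hypothetical car weighing 3000 lbs (wt = 3). By hand this is 37.285126 - 5.344472 × 3, or roughly 21.25 mpg. The names new_car, by_hand, and by_predict are illustrative. The last lines also check that, in simple linear regression, \(R^2\) equals the squared correlation between the two variables.

```r
# Refit the model so this example is self-contained
model <- lm(mpg ~ wt, data = mtcars)

new_car <- data.frame(wt = 3)   # hypothetical car weighing 3000 lbs

# Prediction from the fitted equation: intercept + slope * weight
by_hand <- coef(model)[["(Intercept)"]] + coef(model)[["wt"]] * new_car$wt

# The same prediction using predict()
by_predict <- unname(predict(model, newdata = new_car))

print(by_hand)
print(by_predict)

# With a single predictor, R-squared is the squared Pearson correlation
r_squared <- summary(model)$r.squared
print(c(r_squared = r_squared, cor_squared = cor(mtcars$mpg, mtcars$wt)^2))
```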

3D Scatterplot of MPG vs. Horsepower and Weight

Conclusion

  • Simple linear regression provides a useful tool for understanding the relationship between two variables.
  • It allows us to make predictions and assess the strength and direction of the relationship.
  • However, it’s important to check the assumptions of the model: linearity of the relationship, and independence, normality, and constant variance of the errors.
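
  • One common way to check these assumptions is to inspect the residuals. The following is a rough sketch using base R only, not a complete diagnostic procedure; the names res, fit_vals, and sw are illustrative, and the Shapiro-Wilk test is just one of several possible normality checks.

```r
# Refit the model so this example is self-contained
model <- lm(mpg ~ wt, data = mtcars)

res <- residuals(model)
fit_vals <- fitted(model)

# Residuals vs. fitted values: curvature hints at non-linearity,
# a funnel shape hints at non-constant variance
plot(fit_vals, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line hint at non-normal errors
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(res)
print(sw)
```

  • Calling plot(model) produces a similar set of diagnostic plots in one step.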