2024-11-17

Simple Linear Regression

Simple linear regression is a statistical method for modeling and visualizing the linear relationship between two quantitative variables.

Elements of Linear Regression

  • \(Y = \beta_{0} + \beta_{1}X + \epsilon\)
  • X is the independent or predictor variable
  • Y is the dependent or response variable
  • \(\epsilon\) is the random error or deviation
  • \(\beta_{0}\) is the intercept: the expected value of Y when X is 0.
  • \(\beta_{1}\) is the slope of the line: the expected change in Y for a one-unit increase in X.
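
To make the roles of these elements concrete, the sketch below simulates data from the model and then recovers the coefficients with lm(). The values chosen for beta0, beta1, the sample size, and the error standard deviation are hypothetical and serve only as an illustration.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon (all values are illustrative assumptions)
set.seed(42)
n     <- 200
beta0 <- 2      # assumed true intercept
beta1 <- 0.5    # assumed true slope

x   <- runif(n, min = 0, max = 10)   # predictor
eps <- rnorm(n, mean = 0, sd = 1)    # random error
y   <- beta0 + beta1 * x + eps       # response

# Least-squares fit; the estimates should land close to beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

Because the error term is random, the estimates will not match the assumed values exactly, but they should be close for a sample of this size.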

Application of Simple Linear Regression

The analysis uses the mtcars dataset, which is built into R (it ships with the datasets package, so it is available in RStudio without any extra setup).

The relationship under study is:

  • \(mpg = \beta_{0} + \beta_{1} \cdot hp + \epsilon\)
  • Y: mpg (miles per gallon)
  • X: hp (horsepower)
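
Before fitting anything, it can help to look at the two variables directly. The following is a small sketch of one way to do that; the name cars_sub is a hypothetical helper introduced here, not part of the original analysis.

```r
# Keep only the two variables of interest (cars_sub is an illustrative name)
cars_sub <- mtcars[, c("mpg", "hp")]

# Structure and five-number summaries of response and predictor
str(cars_sub)
summary(cars_sub)

# Pearson correlation: the sign shows the direction of the linear association
r <- cor(cars_sub$mpg, cars_sub$hp)
r
```

A negative correlation here indicates that cars with more horsepower tend to have lower fuel economy, which is the direction the regression slope should also take.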

Fitting the Linear Regression

Fitting the model to the data with the commands below gives the estimated regression line:

  • \(\widehat{mpg} = 30.0989 - 0.0682 \cdot hp\)
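# Fit mpg as a linear function of hp on the built-in mtcars data
df <- mtcars
mod <- lm(mpg ~ hp, data = df)
# Print coefficient estimates, standard errors, and fit statistics
summary(mod)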
## 
## Call:
## lm(formula = mpg ~ hp, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
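
As a check on where the estimates in the coefficient table come from, the sketch below recomputes the slope and intercept from the closed-form least-squares formulas and then uses the fitted model for a prediction. The names b0 and b1 and the choice of hp = 150 are illustrative assumptions.

```r
df  <- mtcars
mod <- lm(mpg ~ hp, data = df)

# Closed-form least-squares estimates:
#   slope     = cov(X, Y) / var(X)
#   intercept = mean(Y) - slope * mean(X)
b1 <- cov(df$hp, df$mpg) / var(df$hp)
b0 <- mean(df$mpg) - b1 * mean(df$hp)
c(intercept = b0, slope = b1)

# These should agree with the estimates reported by lm()
coef(mod)

# Predicted mpg for a hypothetical car with 150 horsepower
pred_150 <- predict(mod, newdata = data.frame(hp = 150))
pred_150
```

With the coefficients above, the prediction at 150 hp works out to roughly 19.9 mpg.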

Dataset of mtcars: hp vs mpg

\(\widehat{mpg} = 30.0989 - 0.0682 \cdot hp\)
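
A scatter plot with the fitted line overlaid can be drawn with base R graphics. This is a minimal sketch; the axis labels, title, and colors are assumptions and may differ from how the original figure was produced.

```r
df  <- mtcars
mod <- lm(mpg ~ hp, data = df)

# Scatter plot of the raw data
plot(df$hp, df$mpg,
     xlab = "Horsepower (hp)",
     ylab = "Miles per gallon (mpg)",
     main = "mtcars: hp vs mpg",
     pch  = 19)

# Overlay the fitted regression line from the model
abline(mod, col = "red", lwd = 2)
```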

Dataset of mtcars: fitted vs residual

The plot shows some signs of heteroscedasticity, although the plot alone does not confirm it. Ideally the residuals would be homoscedastic, with a roughly constant spread across the range of fitted values.
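
One way to reproduce this kind of plot, and to follow the visual impression up with a number, is sketched below. The test shown is the studentized (Koenker) form of the Breusch-Pagan test computed by hand in base R; it is offered as one possible check, not as the method used in the original analysis, and the names aux_data, aux_fit, bp_stat, and bp_pval are illustrative.

```r
df  <- mtcars
mod <- lm(mpg ~ hp, data = df)

# Residuals against fitted values; a funnel shape would suggest non-constant variance
plot(fitted(mod), resid(mod),
     xlab = "Fitted values",
     ylab = "Residuals",
     main = "mtcars: fitted vs residual")
abline(h = 0, lty = 2)

# Studentized Breusch-Pagan test by hand:
# regress the squared residuals on the predictor, then LM = n * R^2,
# compared against a chi-squared distribution with 1 degree of freedom
aux_data <- data.frame(sq_resid = resid(mod)^2, hp = df$hp)
aux_fit  <- lm(sq_resid ~ hp, data = aux_data)
bp_stat  <- nrow(aux_data) * summary(aux_fit)$r.squared
bp_pval  <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_pval)
```

A small p-value would count as evidence against constant variance; a large one would mean this particular test does not detect heteroscedasticity, which is not the same as proving its absence.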

Dataset of mtcars: Q-Q plot

The residuals depart from the normal reference line in the upper tail (top right of the plot), so they do not appear to be normally distributed there.
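
A normal Q-Q plot of the residuals can be drawn with qqnorm() and qqline(). The sketch below uses the raw residuals, which is an assumption about how the original figure was made, and adds a Shapiro-Wilk test as one possible numerical companion to the plot.

```r
mod <- lm(mpg ~ hp, data = mtcars)
res <- resid(mod)

# Normal Q-Q plot: points far from the reference line indicate non-normal tails
qqnorm(res, main = "mtcars: Q-Q plot of residuals")
qqline(res, col = "red")

# Which observation has the largest positive residual (the top-right point)?
names(which.max(res))

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(res)
sw
```

The largest residual corresponds to the Max value reported in the model summary, so the car named by which.max() is the one driving the upper-tail departure.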

Conclusion

Based on the fitted vs residual plot and the Q-Q plot, the assumptions underlying simple linear regression cannot be taken as satisfied for the relationship between hp and mpg. Further analysis is needed to determine whether a simple linear regression model is appropriate for these data.