2023-10-15

Simple Linear Regression

Simple linear regression is a statistical method that fits a straight line of best fit to two quantitative variables. It estimates the linear relationship between them: how the response changes, on average, as the predictor changes.

It is popular because it is simple to apply and interpret, but it is sensitive to outliers and cannot capture nonlinear relationships.

Simple Linear Regression Equation

\(y = {\beta}_0 + {\beta}_1x + {\epsilon}\)

  • y - the response (dependent) variable
  • \({\beta}_0\) - the y-intercept, the value of y when x = 0
  • \({\beta}_1\) - the slope, the change in y for a one-unit increase in x
  • x - the predictor (independent) variable
  • \({\epsilon}\) - the error term (often omitted when the fitted equation is written out)
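
The terms above can be illustrated on simulated data. This is a minimal sketch and not part of the original analysis: the true values 2 and 3, the sample size, and the names `beta_0_hat` and `beta_1_hat` are assumptions chosen for illustration. It assumes the line is estimated by ordinary least squares, which is what `lm()` does, and compares the closed-form estimates with the `lm()` fit.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon
# (beta_0 = 2 and beta_1 = 3 are illustrative assumptions)
set.seed(1)
x <- runif(100, min = 0, max = 5)
epsilon <- rnorm(100, mean = 0, sd = 1)
y <- 2 + 3 * x + epsilon

# Closed-form least-squares estimates of the slope and intercept
beta_1_hat <- cov(x, y) / var(x)
beta_0_hat <- mean(y) - beta_1_hat * mean(x)

# lm() fits the same model; its coefficients match the closed-form values
fit <- lm(y ~ x)
coef(fit)
c(beta_0_hat, beta_1_hat)
```

Both approaches should return the same two numbers, close to but not exactly 2 and 3, because the error term adds noise.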

The Diamonds Dataset

library(ggplot2)  # provides the diamonds dataset and ggplot()
data(diamonds)
head(diamonds)
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

Ggplot: Carat Versus Price in Diamond Dataset

# Scatter plot of price against carat, with points shaded by carat
g <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = carat))

g

Ggplot with Linear Regression

## 
## Call:
## lm(formula = price ~ carat, data = diamonds)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18585.3   -804.8    -18.9    537.4  12731.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2256.36      13.06  -172.8   <2e-16 ***
## carat        7756.43      14.07   551.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1549 on 53938 degrees of freedom
## Multiple R-squared:  0.8493, Adjusted R-squared:  0.8493 
## F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
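
The summary above reports its own call, `lm(formula = price ~ carat, data = diamonds)`, but the code that produced it is not shown. The following is a sketch of how the fit and a plot with the regression line might be reproduced, assuming the ggplot2 package is installed. The object names `fit` and `g_lm` are hypothetical, and `geom_smooth(method = "lm")` is one assumed way to overlay the line, not necessarily what was used originally.

```r
library(ggplot2)  # provides diamonds, ggplot(), and geom_smooth()

# Fit the model named in the Call line of the summary
fit <- lm(price ~ carat, data = diamonds)
summary(fit)

# Scatter plot with the fitted least-squares line overlaid
# (se = FALSE hides the confidence band around the line)
g_lm <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = carat)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red")

g_lm
```

The `Estimate` column of the summary holds the fitted intercept and slope; `coef(fit)` returns the same two values as a named vector.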

Plotly: Carat Versus Price

Linear Regression Equation Estimation and Evaluation

  • price = y
  • \({\beta}_0\) = -2256.36
  • \({\beta}_1\) = 7756.43
  • carat = x = 0.31

Simple Linear Regression Equation of Carat Versus Price

\(y = 7756.43x - 2256.36\)

For a 0.31-carat diamond: \(7756.43(0.31) - 2256.36 = 148.13\), a predicted price of about $148.13.
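
The same calculation can be checked in R. This is a small sketch using the rounded coefficients from the regression summary; the helper name `predict_price` is hypothetical. It also compares the prediction with the 0.31-carat diamond in row 5 of the `head(diamonds)` output, which is priced at 335.

```r
# Coefficients as reported in the regression summary
beta_0 <- -2256.36
beta_1 <- 7756.43

# Hypothetical helper: predicted price for a given carat weight
predict_price <- function(carat) beta_0 + beta_1 * carat

price_hat <- predict_price(0.31)
round(price_hat, 2)  # 148.13

# Residual for the observed 0.31-carat diamond (observed - predicted)
residual <- 335 - price_hat
round(residual, 2)  # 186.87
```

The gap between 335 and 148.13 shows that the fitted line gives an average trend, not an exact price for any single diamond.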

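Carat and price are positively related in the diamonds dataset. The graph shows price rising with carat, and the fitted slope is positive: each additional carat is associated with an estimated increase of about $7,756 in price. The model explains about 85% of the variance in price (R-squared = 0.8493).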

Thank You