What is Linear Regression?

  • A method for modeling the relationship between a response variable and one or more predictor variables
  • Predicts Y (the response) from X (the predictor)

Regression Equation (Math Slide)

\[ y = \beta_0 + \beta_1 x + \epsilon \]

  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\epsilon\) = random error term

Example Dataset

library(ggplot2)

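# Simulate 10 points around the line y = 2x with standard normal noise.
# No seed is set, so the values change from run to run.
x <- 1:10
y <- 2 * x + rnorm(10)

# Combine into a data frame for plotting and modeling
data <- data.frame(x, y)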

ggplot Example 1

ggplot(data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

ggplot Example 2

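# geom_line() connects the observed points in order of x;
# it does not draw the fitted regression line.
ggplot(data, aes(x, y)) +
  geom_point(color = "red") +
  geom_line()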

Plotly Interactive Plot

library(plotly)
plot_ly(data, x = ~x, y = ~y, type = 'scatter', mode = 'markers')

Fitted Regression Line (Math Slide)

\[ \hat{y} = b_0 + b_1 x \]
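
The slide does not say how \(b_0\) and \(b_1\) are obtained. Assuming the line is fit by ordinary least squares (which is what `lm()` does), the estimates can be written as follows, where \(\bar{x}\) and \(\bar{y}\) denote the sample means and \(n\) the number of observations (symbols introduced here for illustration):

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]

Here \(b_0\) and \(b_1\) are the sample estimates of \(\beta_0\) and \(\beta_1\), and \(\hat{y}\) is the predicted value of \(y\) at a given \(x\).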

R Code Example

model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4332 -0.3638  0.0044  0.3590  0.4907 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.77249    0.27314  -2.828   0.0222 *  
## x            2.15276    0.04402  48.903 3.38e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3998 on 8 degrees of freedom
## Multiple R-squared:  0.9967, Adjusted R-squared:  0.9962 
## F-statistic:  2391 on 1 and 8 DF,  p-value: 3.383e-11
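
As a rough sketch of how the fitted model might be used, the block below recomputes the least-squares estimates by hand, checks them against `coef(model)`, and predicts the response at two new x values. The seed and the names `b0`, `b1`, `new_x`, and `pred` are assumptions made for illustration, so the numbers will not match the summary output above.

```r
# Re-create data in the same way as the example dataset.
# The seed is an assumption; the original slides do not set one.
set.seed(42)
x <- 1:10
y <- 2 * x + rnorm(10)
data <- data.frame(x, y)

model <- lm(y ~ x, data = data)

# Least-squares estimates computed by hand:
#   b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b0 = mean(y) - b1 * mean(x)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Should agree with lm() up to floating-point error
all.equal(unname(coef(model)), c(b0, b1))

# Predict the response at new x values (new_x is an illustrative name)
new_x <- data.frame(x = c(11, 12))
pred <- predict(model, newdata = new_x)
pred
```

`predict()` evaluates the fitted line \(b_0 + b_1 x\) at each row of `newdata`, so the column name in `new_x` has to match the predictor name used in the model formula.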

Conclusion

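  • Linear regression models a response as a linear function of one or more predictors and uses the fitted line to predict new outcomes
  • In the example, the estimated slope (2.15) is close to the true slope of 2 used to simulate the data
  • Widely used in data science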