Linear Regression

Introduction

I will use R's built-in iris dataset to demonstrate simple linear regression.

Iris data

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
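
The preview above looks like the output of head(). As a minimal sketch, assuming that is how it was produced, the data can be inspected like this (iris ships with base R, so nothing needs to be installed):

```r
# iris is part of the datasets package that is attached by default in R
head(iris)   # first six rows, as in the preview above
str(iris)    # four numeric measurements (cm) plus the Species factor
dim(iris)    # number of rows and columns
```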

Simple scatterplot (Code)

The upcoming plot shows a positive, roughly linear relationship between petal length and petal width.

library(ggplot2)

# Jittered scatterplot of petal width against petal length
scatter <- ggplot(
   iris,
   aes(Petal.Length, Petal.Width)) +
   geom_jitter() +
   labs(x = "Petal length (cm)",
        y = "Petal width (cm)")

Simple scatterplot

Least Squares

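The simple linear regression model is \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\). Least squares chooses the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the sum of squared residuals, giving the fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).

In our model:

- \(x_i\) is the petal length of flower \(i\)
- \(y_i\) is its observed petal width, and \(\hat{y}_i\) is the fitted petal width
- \(\epsilon_i\) is the random error
- \(\beta_0\) and \(\beta_1\) are the regression parameters (intercept and slope), estimated by \(\hat{\beta}_0\) and \(\hat{\beta}_1\)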
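
With a single predictor, the least squares estimates have a standard closed form: the slope is the sample covariance of the two variables divided by the sample variance of the predictor, and the intercept is the mean response minus the slope times the mean predictor. The sketch below computes both by hand; the variable names x, y, slope and intercept are illustrative and do not come from the original slides.

```r
# Closed-form least squares estimates for one predictor
x <- iris$Petal.Length   # predictor: petal length (cm)
y <- iris$Petal.Width    # response: petal width (cm)

# Slope: sample covariance of x and y over the sample variance of x
slope <- cov(x, y) / var(x)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
intercept <- mean(y) - slope * mean(x)

c(intercept = intercept, slope = slope)
```

These values should agree with what lsfit() reports for the same data.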

Least Squares Regression Model

We use R’s lsfit() function to estimate the coefficients \(\beta_0\) (intercept) and \(\beta_1\) (slope).

lsfit(iris$Petal.Length, iris$Petal.Width)$coefficients
##  Intercept          X 
## -0.3630755  0.4157554
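
As an alternative sketch, the same line can be fitted with lm(), which takes a formula and returns a model object that works with predict(). The 4 cm petal length used below is a made-up value for illustration only.

```r
# Fit the same simple linear regression with the formula interface
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(fit)   # intercept and slope; should match the lsfit() coefficients

# Predicted petal width (cm) for a hypothetical flower with a 4 cm petal
new_flower <- data.frame(Petal.Length = 4)
predicted_width <- predict(fit, newdata = new_flower)
predicted_width
```

lsfit() is convenient when only the coefficients are needed; lm() is the more common choice when summaries, residuals or predictions are wanted as well.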

Least Squares Regression Model (cont)
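
The code for this slide's plot is not shown. A plausible sketch, assuming the fitted line was drawn with geom_smooth(method = "lm"), overlays the least squares line on the jittered scatterplot. The name fitted_plot and the se = FALSE setting are choices made here, not taken from the original.

```r
library(ggplot2)

# Scatterplot of petal width against petal length with the least squares line.
# Assumption: method = "lm" stands in for whatever the original slide used.
fitted_plot <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE) +   # se = FALSE hides the confidence band
  labs(x = "Petal length (cm)",
       y = "Petal width (cm)")

fitted_plot
```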

## `geom_smooth()` using formula = 'y ~ x'

Interactive Plot

For a closer look at the data, we can make the plot interactive with the plotly package.
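
The code behind the interactive plot is not shown. One common approach, sketched here on the assumption that the existing ggplot2 scatterplot was converted with plotly's ggplotly(), looks like this. The name interactive_plot is illustrative.

```r
library(ggplot2)
library(plotly)

# Rebuild the static scatterplot so this block runs on its own
scatter <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_jitter() +
  labs(x = "Petal length (cm)",
       y = "Petal width (cm)")

# Convert the ggplot object into an interactive htmlwidget
# (hover tooltips, zoom and pan in a browser or the RStudio viewer)
interactive_plot <- ggplotly(scatter)
interactive_plot
```

Loading plotly with suppressPackageStartupMessages(library(plotly)) keeps the attach messages out of rendered output.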
