I will be using the built-in iris dataset to demonstrate simple linear regression in R.
```r
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
```
The plot below shows an approximately linear relationship between petal length and petal width.
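```r
library(ggplot2)
# Scatter plot of petal width against petal length;
# geom_jitter() adds a little random noise so overlapping points stay visible
scatter <- ggplot(
  iris,
  aes(Petal.Length, Petal.Width)) +
  geom_jitter() +
  labs(x = "Petal length (cm)",
       y = "Petal width (cm)")
scatter
```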
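The strength of the linear relationship visible in the plot can be quantified with Pearson's correlation coefficient, using base R's `cor()`. A value close to 1 indicates a strong positive linear association. The variable name `r` is only illustrative.

```r
# Pearson correlation between petal length and petal width
r <- cor(iris$Petal.Length, iris$Petal.Width)
r

# In simple linear regression, R^2 equals the squared correlation
r^2
```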
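The simple linear regression model is \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\), and the least squares fit is \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).

In our model:

- \(x_i\) is the petal length of flower \(i\)
- \(y_i\) is its observed petal width, and \(\hat{y}_i\) is the fitted (predicted) petal width
- \(\epsilon_i\) is the random error
- \(\beta_0\) and \(\beta_1\) are the regression parameters (intercept and slope), and \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are their least squares estimates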
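Least squares chooses the estimates that minimize the sum of squared residuals. For a single predictor this minimization has the standard closed-form solution below, where \(y_i\) is the observed petal width of flower \(i\), \(\bar{x}\) and \(\bar{y}\) are the sample means of petal length and petal width, and \(n\) is the number of flowers (150 in iris).

```latex
\[
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
\]
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
```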
We use R’s lsfit() function to estimate the coefficients \(\beta_0\) (intercept) and \(\beta_1\) (slope).
```r
lsfit(iris$Petal.Length, iris$Petal.Width)$coefficients
##  Intercept          X
## -0.3630755  0.4157554
```
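As a cross-check, the same estimates can be computed by hand and with `lm()`, R's standard model-fitting function. For one predictor, the least squares slope equals the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\), and the intercept equals \(\bar{y} - \hat{\beta}_1 \bar{x}\). This is a sketch; the names `x`, `y`, `b0`, `b1` and `fit` are illustrative.

```r
# x = petal length (predictor), y = petal width (response)
x <- iris$Petal.Length
y <- iris$Petal.Width

# Closed-form least squares estimates for a single predictor
b1 <- cov(x, y) / var(x)      # slope: sample covariance / sample variance
b0 <- mean(y) - b1 * mean(x)  # intercept: line passes through (x-bar, y-bar)
c(Intercept = b0, X = b1)

# The same model fitted with lm(); coef() returns the intercept and slope
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(fit)

# Both approaches should agree up to floating-point error
all.equal(unname(coef(fit)), c(b0, b1))
```

Beyond the coefficients, `summary(fit)` also reports standard errors, t statistics and \(R^2\), which `lsfit()` does not print directly.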
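One way to draw the fitted line on the scatter plot is ggplot2's `geom_smooth()` with `method = "lm"`, which fits the same least squares model to the plotted variables. Passing `formula = y ~ x` explicitly also suppresses the informational message ggplot2 otherwise prints about the formula it chose. This is a sketch; the object name `fitted_plot` is illustrative.

```r
library(ggplot2)

# Scatter plot with the least squares line overlaid.
# geom_smooth() works on the raw (unjittered) values, so the line matches the
# lsfit() estimates; by default it also draws a 95% confidence band (se = TRUE).
fitted_plot <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_jitter() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Petal length (cm)",
       y = "Petal width (cm)")
fitted_plot
```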
For a closer look at individual points, we can make the plot interactive with the plotly package, which adds hover tooltips and zooming.
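A minimal way to do this, assuming the plotly package is installed, is `plotly::ggplotly()`, which converts an existing ggplot object into an interactive htmlwidget. Attaching plotly with `library()` prints messages that it masks functions such as ggplot2's `last_plot()` and stats' `filter()`; calling `ggplotly()` with the `plotly::` prefix uses the package without attaching it, so those messages do not appear. The name `interactive_plot` is illustrative.

```r
library(ggplot2)

# Rebuild the static scatter plot so this block runs on its own
scatter <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_jitter() +
  labs(x = "Petal length (cm)",
       y = "Petal width (cm)")

# Convert it to an interactive htmlwidget; the plotly:: prefix loads the
# package namespace without attaching it, so no masking messages are printed
interactive_plot <- plotly::ggplotly(scatter)
interactive_plot
```

If you prefer to attach the package, `suppressPackageStartupMessages(library(plotly))` hides the same messages.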