The dataset used in this project is Edgar Anderson's iris dataset, which contains measurements, in centimeters, of flowers from three iris species.
library(tidyverse)
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression.
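For reference, here is the standard textbook form of the simple linear regression model and its ordinary least squares (OLS) estimates. Here $x_i$ is the explanatory variable, $y_i$ is the response, $\varepsilon_i$ is an error term, and $\bar x$, $\bar y$ are sample means. This is the general definition, not something specific to this project.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n

\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
```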
In this project, we will look at how to visualize relationships between variables and how to perform simple linear regression.
Let's view some of the data:
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
We will examine whether sepal length has a statistically significant relationship with each of the following quantitative variables in the iris dataset, plotted separately for each species:
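# Refer to columns by name inside aes() and facet_wrap(); ggplot2 looks them up in `iris`
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  # method = "lm" draws a straight least-squares line (the default for small data is a loess curve)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Sepal Width", x = "Sepal Length", y = "Sepal Width", col = "Species") +
  theme_bw()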
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, col = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Petal Length", x = "Sepal Length", y = "Petal Length", col = "Species") +
  theme_bw()
ggplot(iris, aes(x = Sepal.Length, y = Petal.Width, col = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Petal Width", x = "Sepal Length", y = "Petal Width", col = "Species") +
  theme_bw()
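The plots suggest trends, but they do not say whether a trend is statistically significant. A minimal sketch of one way to check this is to fit a separate simple linear regression for each species and pair and then report the slope and its p-value. The helper name `slope_pvalues` is hypothetical. Sepal length is used as the response to match the model fitted below. For a simple regression, the slope's p-value is the same whichever variable is treated as the response.

```r
# Hypothetical helper: for each species, regress `response` on each predictor
# and collect the estimated slope and its p-value from summary.lm().
slope_pvalues <- function(data, response, predictors) {
  rows <- list()
  for (sp in levels(data$Species)) {
    sub <- data[data$Species == sp, ]
    for (pred in predictors) {
      # reformulate() builds e.g. Sepal.Length ~ Sepal.Width from strings
      fit <- lm(reformulate(pred, response = response), data = sub)
      coefs <- summary(fit)$coefficients
      rows[[length(rows) + 1]] <- data.frame(
        species = sp,
        predictor = pred,
        slope = coefs[pred, "Estimate"],
        p_value = coefs[pred, "Pr(>|t|)"]
      )
    }
  }
  do.call(rbind, rows)
}

results <- slope_pvalues(iris, "Sepal.Length",
                         c("Sepal.Width", "Petal.Length", "Petal.Width"))
print(results)
```

The virginica / Sepal.Width row should match the coefficient reported for `vir_lm` in the summary below.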
To keep the analysis and code short, we will focus on one species: virginica.
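# Keep only the virginica observations (dplyr::filter is loaded with the tidyverse)
vir <- iris %>%
  filter(Species == "virginica")
# Regress sepal length (response) on sepal width (explanatory variable)
vir_lm <- lm(Sepal.Length ~ Sepal.Width, data = vir)
summary(vir_lm)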
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width, data = vir)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.26067 -0.36921 -0.03606 0.19841 1.44917
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.9068 0.7571 5.161 4.66e-06 ***
## Sepal.Width 0.9015 0.2531 3.562 0.000843 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5714 on 48 degrees of freedom
## Multiple R-squared: 0.2091, Adjusted R-squared: 0.1926
## F-statistic: 12.69 on 1 and 48 DF, p-value: 0.0008435
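The summary reports an estimated slope of 0.9015 (p ≈ 0.00084). For virginica flowers, each additional centimeter of sepal width is associated with roughly 0.9 cm of additional sepal length. The R-squared of 0.2091 means sepal width accounts for only about 21% of the variation in sepal length. As a sanity check, the sketch below recomputes the intercept, slope, and R-squared from the closed-form OLS formulas and compares them with `lm()`. It uses base R only.

```r
# Subset to virginica and pull out the explanatory (x) and response (y) variables
vir <- subset(iris, Species == "virginica")
x <- vir$Sepal.Width
y <- vir$Sepal.Length

# Closed-form OLS estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# R-squared = 1 - (residual sum of squares / total sum of squares)
fitted_y  <- b0 + b1 * x
r_squared <- 1 - sum((y - fitted_y)^2) / sum((y - mean(y))^2)

# Compare with lm(): the coefficients should agree up to floating-point error
vir_lm <- lm(Sepal.Length ~ Sepal.Width, data = vir)
stopifnot(isTRUE(all.equal(unname(coef(vir_lm)), c(b0, b1))))
print(c(intercept = b0, slope = b1, r_squared = r_squared))
```

The printed values should round to the Estimate and Multiple R-squared entries in the summary above.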
plot(density(vir$Sepal.Length))
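The density plot shows how sepal length itself is distributed. The normality assumption of linear regression, however, applies to the model's residuals, not to the raw response. A minimal sketch of one common follow-up is below: a normal Q-Q plot of the residuals from `vir_lm` and a Shapiro-Wilk test using base R's `shapiro.test()`. This diagnostic is an addition for illustration and is not part of the original analysis.

```r
# Refit the virginica model so this block runs on its own
vir <- subset(iris, Species == "virginica")
vir_lm <- lm(Sepal.Length ~ Sepal.Width, data = vir)

# Residuals: observed minus fitted sepal length
res <- residuals(vir_lm)

# Points close to the reference line suggest approximately normal residuals
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# Shapiro-Wilk test; a small p-value is evidence against normal residuals
shapiro_result <- shapiro.test(res)
print(shapiro_result)
```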