The dataset used in this product is Anderson's iris dataset: sepal and petal measurements, in centimeters, for flowers from three Iris species.

library(tidyverse)

Linear Regression

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression.
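
For the simple case, the model can be written out explicitly. The notation below is an assumption of this sketch and follows the usual textbook convention: $y_i$ is the response, $x_i$ the single explanatory variable, $\beta_0$ and $\beta_1$ the intercept and slope, and $\varepsilon_i$ the error term.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n

% Ordinary least squares estimates of the slope and intercept
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```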

Data

For this product, we will examine how to visualize and perform simple linear regression.

Let's view some data:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
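
Since head() shows only the first six rows, all of them setosa, a quick structural check can help confirm what the rest of the data looks like. The following is a minimal sketch using base R only; the name species_counts is introduced here for illustration. The built-in iris data frame has 150 rows, 50 for each of the three species.

```r
# Column types and dimensions: four numeric measurements plus a Species factor.
str(iris)

# Number of flowers recorded for each species.
species_counts <- table(iris$Species)
species_counts
```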

Relationships

We will examine whether there is a statistically significant relationship between the following pairs of quantitative variables in the iris dataset:

Sepal Length vs. Sepal Width

# Sepal length vs. sepal width, one panel per species
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Width", x = "Length", y = "Width", col = "Species") +
  theme_bw()

Sepal Length vs. Petal Length

# Sepal length vs. petal length, one panel per species
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, col = Species)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Petal Length", x = "Sepal Length", y = "Petal Length", col = "Species") +
  theme_bw()

Sepal Length vs. Petal Width

# Sepal length vs. petal width, one panel per species
ggplot(iris, aes(x = Sepal.Length, y = Petal.Width, col = Species)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Petal Width", x = "Sepal Length", y = "Petal Width", col = "Species") +
  theme_bw()
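
The plots above call geom_smooth() without a method, so for groups this small ggplot2 falls back to a loess smoother rather than a straight line. To see the line that simple linear regression would fit, one option is to request method = "lm" explicitly. The sketch below redraws the first plot that way; the object name p_lm and the amended title are introduced here for illustration only.

```r
library(ggplot2)

# Same scatterplot as the first plot, but with a straight least-squares
# line fitted separately within each species panel.
p_lm <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ Species, nrow = 1) +
  labs(title = "Sepal Length vs. Width (linear fit)", x = "Length", y = "Width", col = "Species") +
  theme_bw()

print(p_lm)
```

Setting se = TRUE instead would add a shaded confidence band around each fitted line.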


Focus Species

For the sake of time and brevity, we will focus on one species in particular: virginica.

vir <- iris %>%
  filter(Species == "virginica")

vir_lm <- lm(Sepal.Length ~ Sepal.Width, data = vir)
summary(vir_lm)
## 
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width, data = vir)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26067 -0.36921 -0.03606  0.19841  1.44917 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.9068     0.7571   5.161 4.66e-06 ***
## Sepal.Width   0.9015     0.2531   3.562 0.000843 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5714 on 48 degrees of freedom
## Multiple R-squared:  0.2091, Adjusted R-squared:  0.1926 
## F-statistic: 12.69 on 1 and 48 DF,  p-value: 0.0008435
plot(density(vir$Sepal.Length))
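
The summary output can also be read programmatically. The sketch below assumes the same vir subset and vir_lm model as above (recreated with base R so the block runs on its own) and uses only functions from base R and stats. The names slope_by_hand and r_squared_by_hand, and the sepal width of 3.0 cm used for prediction, are hypothetical choices made for illustration.

```r
# Recreate the subset and model so this block is self-contained.
vir <- subset(iris, Species == "virginica")
vir_lm <- lm(Sepal.Length ~ Sepal.Width, data = vir)

# Point estimates for the intercept and slope.
coef(vir_lm)

# 95% confidence intervals for both coefficients.
confint(vir_lm, level = 0.95)

# In simple linear regression the slope equals cov(x, y) / var(x),
# and R-squared equals the squared correlation between x and y.
slope_by_hand <- cov(vir$Sepal.Width, vir$Sepal.Length) / var(vir$Sepal.Width)
r_squared_by_hand <- cor(vir$Sepal.Width, vir$Sepal.Length)^2

# Predicted mean sepal length for a flower with a sepal width of 3.0 cm.
predict(vir_lm, newdata = data.frame(Sepal.Width = 3.0))
```

The hand-computed slope and R-squared should agree with the 0.9015 and 0.2091 reported in the summary. Using the printed coefficients, the prediction works out to roughly 3.9068 + 0.9015 × 3.0 ≈ 6.61 cm. With an R-squared of about 0.21, sepal width accounts for only around a fifth of the variation in sepal length within virginica, even though the slope is statistically significant.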