2024-04-06

Plotly Linear Regression of Iris

GGPlot Linear Regression of Iris

R Code for GGPlot

Here is the R code for the previous slide, which plots the relationship between petal length and sepal length in the iris data set.

library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +  
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + 
  labs(
    x = "Sepal Length",
    y = "Petal Length",
    title = "Relationship between Sepal Length and Petal Length"
  ) +
  theme_minimal()

GGPlot Linear Regression of Iris by Species

Each species has its own linear regression line, which can be used to predict petal length from sepal length for that species.
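As a rough sketch of how per-species lines could be used for prediction, the code below fits a separate `lm()` model to each species and predicts petal length for one flower. The 6.0 cm sepal length and the names `fits`, `new_flower` and `predictions` are illustrative assumptions, not part of the original slides.

```r
# Fit one simple linear regression per species (base R only).
fits <- lapply(split(iris, iris$Species), function(d) {
  lm(Petal.Length ~ Sepal.Length, data = d)
})

# Hypothetical new observation: a flower with a 6.0 cm sepal.
new_flower <- data.frame(Sepal.Length = 6.0)

# Predicted petal length under each species' own regression line.
predictions <- vapply(
  fits,
  function(m) unname(predict(m, newdata = new_flower)),
  numeric(1)
)
print(predictions)
```

The three predictions differ because each species has its own intercept and slope, which is what the separate coloured lines in the plot show.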

R Code for Species GGPlot

Here is the R code for the previous slide, which plots petal length against sepal length for each iris species.

ggplot(data = iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) +
  geom_point() +
  geom_smooth(method = "lm",
              formula = y ~ x, se = FALSE) +
  labs(
    x = "Sepal Length",
    y = "Petal Length",
    title = "Relationship between Sepal Length and Petal Length"
  ) +
  theme_minimal()  

Regression Equation

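The regression line drawn for each species has the same form. For setosa, for example, the fitted equation is:

\[ \widehat{PetalLength}_{\text{Setosa}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot SepalLength \]

where \(\widehat{PetalLength}_{\text{Setosa}}\) is the predicted petal length for setosa, and \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the estimated intercept and slope. The error term \(\epsilon\) belongs to the model for the observed values, \(PetalLength = \beta_0 + \beta_1 \cdot SepalLength + \epsilon\), and does not appear in the prediction.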
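To connect the equation to the data, the sketch below estimates the intercept and slope for setosa with `lm()` and checks that a prediction built by hand from the coefficients matches `predict()`. The sepal length of 5.0 cm and the variable names are assumptions chosen for illustration.

```r
# Keep only the setosa flowers and fit the simple linear regression.
setosa <- subset(iris, Species == "setosa")
fit_setosa <- lm(Petal.Length ~ Sepal.Length, data = setosa)

# Estimated intercept (beta_0) and slope (beta_1).
b <- coef(fit_setosa)
b0 <- b[["(Intercept)"]]
b1 <- b[["Sepal.Length"]]

# Hypothetical sepal length at which to predict (cm).
x <- 5.0

# Prediction from the equation, and the same prediction from predict().
manual <- b0 + b1 * x
auto <- predict(fit_setosa, newdata = data.frame(Sepal.Length = x))

print(c(intercept = b0, slope = b1, predicted = manual))
```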

Sample Variance Equation

\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \]

The sepal length variance is:

## [1] 0.6856935
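The value above can be reproduced directly from the sample variance formula. The sketch below computes it term by term and compares it with R's built-in `var()`; the variable names are illustrative.

```r
# Sepal lengths (cm) from the built-in iris data set.
x <- iris$Sepal.Length
n <- length(x)

# Sample variance from the formula: squared deviations from the mean,
# summed and divided by n - 1.
s2_manual <- sum((x - mean(x))^2) / (n - 1)

# Built-in equivalent.
s2_builtin <- var(x)

print(s2_manual)
print(s2_builtin)
```

Both calls should agree with the 0.6856935 reported on the slide.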

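The variance is relatively small: the standard deviation is about 0.83 cm around a mean sepal length of about 5.84 cm, so most points lie close to the mean. The regression line appears fairly reliable for predictions within the observed range of sepal lengths, although that is better judged from the fit's residuals and R² than from the variance of the predictor alone.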
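One way to check how well the regression line fits, offered here as a sketch rather than as the analysis behind the slides, is to look at the R² and residual standard error of the pooled model. The variable names are assumptions.

```r
# Pooled model across all species, matching the first ggplot.
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
fit_summary <- summary(fit)

# Share of the variance in petal length explained by sepal length.
r_squared <- fit_summary$r.squared

# Typical size of a prediction error, in cm.
resid_se <- fit_summary$sigma

print(c(r_squared = r_squared, residual_se = resid_se))
```

An R² close to 1 and a small residual standard error would support using the line for prediction. The same check can be repeated on each species separately.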

Thank you!