2026-04-12
Iris
data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Linear regression is a statistical method used to estimate the value of one variable based on the value of another.
The variable being predicted is the dependent variable, and the variable used for prediction is the independent variable.
Mathematically:
\[ \hat{y} = \beta_0 + \beta_1 x \]
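Breakdown of the equation:
\(\hat{y}\): the predicted value of the dependent variable
\(\beta_0\): the intercept, the predicted value when \(x = 0\)
\(\beta_1\): the slope, the change in \(\hat{y}\) for a one-unit increase in \(x\)
\(x\): the value of the independent variable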
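As a minimal sketch of how the two coefficients in this equation are estimated, the snippet below computes the least-squares slope and intercept by hand and compares them with `lm()`. Using Petal.Length as \(x\) and Sepal.Length as \(y\) is an assumption made here for illustration, as are the variable names.

```r
# Illustrative choice of variables (an assumption for this sketch)
x <- iris$Petal.Length
y <- iris$Sepal.Length

# Least-squares estimates:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
manual <- c(intercept = b0, slope = b1)
print(manual)

# The same estimates from lm(), for comparison
fit <- lm(y ~ x)
print(coef(fit))
```

The two printed results should agree, since `lm()` solves the same least-squares problem.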
Iris: Sepal Length vs. Petal Length
Model
\[
\text{Sepal.Length} = \beta_0 + \beta_1 \cdot \text{Petal.Length} + \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, \sigma^2)
\]
Fitted Model
\[
\widehat{\text{Sepal.Length}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Petal.Length}
\]
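The fit and the scatterplot for this slide could be produced along the following lines. This is a sketch, not the original plotting code: the object names, axis labels, and styling are assumptions, and the plot is only drawn if `ggplot2` is installed. The same pattern applies to the other variable pairs by swapping the column names.

```r
# Fit the simple linear regression Sepal.Length ~ Petal.Length
fit_petal <- lm(Sepal.Length ~ Petal.Length, data = iris)
print(coef(fit_petal))  # estimated intercept and slope

# Scatterplot with the fitted line (assumed styling; requires ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
    geom_point(aes(color = Species)) +                       # color points by species
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE,  # one line for all points
                color = "black") +
    labs(x = "Petal Length", y = "Sepal Length")
  print(p)
}
```

Passing `formula = y ~ x` explicitly avoids the informational message that `geom_smooth()` prints when the formula is left at its default.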
Iris: Sepal Length vs. Sepal Width
Model
\[
\text{Sepal.Length} = \beta_0 + \beta_1 \cdot \text{Sepal.Width} + \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, \sigma^2)
\]
Fitted Model
\[
\widehat{\text{Sepal.Length}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Sepal.Width}
\]
Iris: Petal Length vs. Petal Width
Model
\[
\text{Petal.Length} = \beta_0 + \beta_1 \cdot \text{Petal.Width} + \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, \sigma^2)
\]
Fitted Model
\[
\widehat{\text{Petal.Length}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Petal.Width}
\]
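library(plotly)  # provides plot_ly(), layout(), hide_colorbar(), and %>%

# Three measurements for the axes, plus species for coloring
x_vals <- iris$Sepal.Length
y_vals <- iris$Sepal.Width
z_vals <- iris$Petal.Length
species_vals <- iris$Species

# One color per species, in factor-level order: setosa, versicolor, virginica
my_colors <- c("#6a659e", "#659e6a", "#9e6599")

# Interactive 3D scatterplot
plot_ly(
  x = x_vals,
  y = y_vals,
  z = z_vals,
  type = "scatter3d",
  mode = "markers",
  color = species_vals,
  colors = my_colors,
  marker = list(size = 4)
) %>%
  hide_colorbar() %>%
  layout(
    scene = list(
      xaxis = list(title = "Sepal Length"),
      yaxis = list(title = "Sepal Width"),
      zaxis = list(title = "Petal Length")
    )
  )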
Petal Length shows a strong positive linear relationship with Sepal Length.
Points trend upward and stay close to the regression line, so the linear model fits well.
Sepal Width shows a weak relationship with Sepal Length.
Points are scattered with no clear pattern, and the regression line does not match the data.
This means the linear model:
\[ \text{Sepal.Length} = \beta_0 + \beta_1 \cdot \text{Sepal.Width} + \varepsilon \]
is a poor fit for this relationship.
Petal Length and Petal Width have a very strong positive relationship.
Points are tightly grouped along the regression line, showing an excellent model fit.
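One way to put numbers on "strong", "weak", and "very strong" is to compare \(R^2\) across the three fits. The sketch below refits each model with `lm()`; the list and variable names are illustrative assumptions.

```r
# The three simple regressions discussed above
models <- list(
  "Sepal.Length ~ Petal.Length" = lm(Sepal.Length ~ Petal.Length, data = iris),
  "Sepal.Length ~ Sepal.Width"  = lm(Sepal.Length ~ Sepal.Width,  data = iris),
  "Petal.Length ~ Petal.Width"  = lm(Petal.Length ~ Petal.Width,  data = iris)
)

# R-squared: share of the variance in the response explained by each model
r_squared <- sapply(models, function(m) summary(m)$r.squared)
print(round(r_squared, 3))
```

An \(R^2\) near 1 corresponds to points hugging the regression line, and an \(R^2\) near 0 to the scattered pattern seen in the Sepal Width plot.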
The 3D plot shows that increases in Petal Length are associated with increases in Sepal Length, while Sepal Width shows less consistent variation. Petal Length is more strongly related to Sepal Length than Sepal Width is, which makes it the better predictor of Sepal Length.
Species form clear clusters in all scatterplots, even though species were not part of the regression models.
Setosa is clearly different from the other two species.
It has much smaller petal measurements and is overall a smaller iris.
This shows up as clusters of purple Setosa points on the lower end of the graphs.
Versicolor and Virginica overlap more, but still show separation.
Virginica tends to have the largest measurements.
Versicolor sits in the middle.
Petal Length separates species the most.
When Petal Length is compared with other variables, the clusters become more distinct.
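The species differences described above can be checked with a few summary statistics. This is a small sketch using base R only; the object names are assumptions.

```r
# Mean of each measurement within each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Minimum and maximum Petal.Length per species (row 1 = min, row 2 = max),
# to see how much the species overlap on this variable
petal_ranges <- sapply(split(iris$Petal.Length, iris$Species), range)
print(petal_ranges)
```

If the maximum for one species is below the minimum for another, those two species do not overlap at all on Petal Length.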
Overall, species grouping explains some of the variation in the data that the regression models do not account for, because species was not included as a predictor.
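To see how much variation species can account for, one could compare a model with and without Species as a predictor. This is a hypothetical extension for illustration only; the regression models in these slides do not include Species.

```r
# Model without species, as in the Sepal Width slide
fit_width <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Hypothetical extension: add Species as a predictor (not part of the original models)
fit_width_species <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Compare the share of variance explained by each model
r2 <- c(
  without_species = summary(fit_width)$r.squared,
  with_species    = summary(fit_width_species)$r.squared
)
print(round(r2, 3))
```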