This R Markdown document computes summary statistics (minimum, median, mean, maximum) for every trait of each species in the iris dataset. To assess whether sepal width is a good predictor of sepal length across all three species, the code fits a linear model and plots the regression of sepal length on sepal width (Figure 1). The code also builds two tables: Table 1 lists the summary statistics for each trait by species, and Table 2 reports the R-squared, p-value, and slope of the linear model.
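shhh <- suppressPackageStartupMessages # silence package startup messages
shhh(library(tidyverse)) # attaches ggplot2, dplyr, and tidyr, among others
shhh(library(knitr))
shhh(library(ggplot2)) # already attached by tidyverse; kept for explicitness
shhh(library(dplyr)) # already attached by tidyverse; kept for explicitness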
data(iris)
# view(iris)
iris_traits <- iris %>%
  group_by(Species) %>%
  summarise(across(.cols = everything(), list(
    min = min,
    median = median,
    mean = mean,
    max = max))) %>%
  pivot_longer(-Species, names_to = "Trait", values_to = "Observation") %>%
  separate(Trait, c("Trait", "Statistic"), sep = "_")
knitr::kable(iris_traits, caption = "Table 1. Iris Traits")
| Species | Trait | Statistic | Observation |
|---|---|---|---|
| setosa | Sepal.Length | min | 4.300 |
| setosa | Sepal.Length | median | 5.000 |
| setosa | Sepal.Length | mean | 5.006 |
| setosa | Sepal.Length | max | 5.800 |
| setosa | Sepal.Width | min | 2.300 |
| setosa | Sepal.Width | median | 3.400 |
| setosa | Sepal.Width | mean | 3.428 |
| setosa | Sepal.Width | max | 4.400 |
| setosa | Petal.Length | min | 1.000 |
| setosa | Petal.Length | median | 1.500 |
| setosa | Petal.Length | mean | 1.462 |
| setosa | Petal.Length | max | 1.900 |
| setosa | Petal.Width | min | 0.100 |
| setosa | Petal.Width | median | 0.200 |
| setosa | Petal.Width | mean | 0.246 |
| setosa | Petal.Width | max | 0.600 |
| versicolor | Sepal.Length | min | 4.900 |
| versicolor | Sepal.Length | median | 5.900 |
| versicolor | Sepal.Length | mean | 5.936 |
| versicolor | Sepal.Length | max | 7.000 |
| versicolor | Sepal.Width | min | 2.000 |
| versicolor | Sepal.Width | median | 2.800 |
| versicolor | Sepal.Width | mean | 2.770 |
| versicolor | Sepal.Width | max | 3.400 |
| versicolor | Petal.Length | min | 3.000 |
| versicolor | Petal.Length | median | 4.350 |
| versicolor | Petal.Length | mean | 4.260 |
| versicolor | Petal.Length | max | 5.100 |
| versicolor | Petal.Width | min | 1.000 |
| versicolor | Petal.Width | median | 1.300 |
| versicolor | Petal.Width | mean | 1.326 |
| versicolor | Petal.Width | max | 1.800 |
| virginica | Sepal.Length | min | 4.900 |
| virginica | Sepal.Length | median | 6.500 |
| virginica | Sepal.Length | mean | 6.588 |
| virginica | Sepal.Length | max | 7.900 |
| virginica | Sepal.Width | min | 2.200 |
| virginica | Sepal.Width | median | 3.000 |
| virginica | Sepal.Width | mean | 2.974 |
| virginica | Sepal.Width | max | 3.800 |
| virginica | Petal.Length | min | 4.500 |
| virginica | Petal.Length | median | 5.550 |
| virginica | Petal.Length | mean | 5.552 |
| virginica | Petal.Length | max | 6.900 |
| virginica | Petal.Width | min | 1.400 |
| virginica | Petal.Width | median | 2.000 |
| virginica | Petal.Width | mean | 2.026 |
| virginica | Petal.Width | max | 2.500 |
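
The reshaping step behind Table 1 can also be written without a separate `separate()` call, because `pivot_longer()` can split column names itself through its `names_sep` argument. The following is a minimal sketch of that alternative, not the method used above; the object name `iris_traits_alt` is a hypothetical name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Same per-species summary as for Table 1. The summarised columns are named
# "<trait>_<statistic>" (for example "Sepal.Length_min"), so pivot_longer()
# can split them into two columns at the underscore via names_sep.
# iris_traits_alt is a hypothetical name used only in this sketch.
iris_traits_alt <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(),
                   list(min = min, median = median, mean = mean, max = max)),
            .groups = "drop") %>%
  pivot_longer(-Species,
               names_to = c("Trait", "Statistic"),
               names_sep = "_",
               values_to = "Observation")

dim(iris_traits_alt) # 3 species x 4 traits x 4 statistics = 48 rows, 4 columns
```

This relies on the trait names containing no underscore of their own, which holds for the iris columns.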
####################################################
##Graph data to visualize
####################################################
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  theme_classic()
ggplot(iris, aes(x = Sepal.Width)) + # sepal width looks approximately normally distributed
  geom_histogram() +
  theme_classic()
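
The histogram gives only a visual impression of normality. A formal check could complement it; the sketch below uses the Shapiro-Wilk test from base R's `stats` package on the pooled sepal widths. It is an optional addition and not part of the original analysis, and the object name `sw` is a hypothetical name chosen for illustration.

```r
# Shapiro-Wilk test of normality on sepal width (all species pooled).
# Null hypothesis: the sample comes from a normal distribution.
# sw is a hypothetical name used only in this sketch.
sw <- shapiro.test(iris$Sepal.Width)
sw$statistic # W statistic; values close to 1 are consistent with normality
sw$p.value   # small values are evidence against normality
```

Note that the usual linear-model assumption concerns the residuals rather than the predictor, so the same call could also be applied to the residuals of a fitted model.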
####################################################
##Model data
####################################################
# Sepal length as a function of sepal width (both calls are equivalent).
# The response goes on the left of ~ and the predictor on the right, so
# sepal width predicts sepal length, as in the text and Figure 1.
sepal <- lm(Sepal.Length ~ Sepal.Width, data = iris)
# sepal <- lm(Sepal.Length ~ 1 + Sepal.Width, data = iris)
# Summarize model
summary(sepal)
# Pull out stats table
coef(summary(sepal))
# Collect R-squared, slope p-value, and slope for Table 2
sepal_summary <- summary(sepal)
iris_table <- tibble("R-squared" = sepal_summary$r.squared,
                     "P-value" = sepal_summary$coefficients[2, 4],
                     "Slope" = sepal_summary$coefficients[2, 1])
iris_table
knitr::kable(iris_table, align = "ccc",
             caption = "Table 2. Iris Statistics",
             col.names = c("R-squared", "P-value", "Slope"))
| R-squared | P-value | Slope |
|---|---|---|
| 0.0138227 | 0.1518983 | -0.2233611 |
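
Table 2 reports a slope alongside R-squared, and the two behave differently when predictor and response are swapped. In simple linear regression, R-squared equals the squared Pearson correlation and is the same in either direction, while the slope equals the correlation scaled by the ratio of standard deviations and therefore depends on which variable is the response. The sketch below illustrates this on the iris data; the object names `fit_len_on_wid`, `fit_wid_on_len`, and `r` are hypothetical names chosen for illustration.

```r
# Fit the regression in both directions on the built-in iris data.
# All object names here are hypothetical and used only in this sketch.
fit_len_on_wid <- lm(Sepal.Length ~ Sepal.Width, data = iris) # width predicts length
fit_wid_on_len <- lm(Sepal.Width ~ Sepal.Length, data = iris) # length predicts width

r <- cor(iris$Sepal.Width, iris$Sepal.Length)

# R-squared is the squared correlation in both cases.
c(r^2, summary(fit_len_on_wid)$r.squared, summary(fit_wid_on_len)$r.squared)

# The slopes differ: slope = r * sd(response) / sd(predictor).
c(coef(fit_len_on_wid)[["Sepal.Width"]],
  r * sd(iris$Sepal.Length) / sd(iris$Sepal.Width))
c(coef(fit_wid_on_len)[["Sepal.Length"]],
  r * sd(iris$Sepal.Width) / sd(iris$Sepal.Length))
```

Each `c()` call prints values that should agree within floating-point tolerance, which makes it easy to confirm which direction a reported slope belongs to.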
# Figure 1: scatter plot of the data with the fitted regression line
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point(color = "lightgrey") +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Sepal width",
       y = "Sepal length",
       caption = "Figure 1. Scatter plot with linear regression of the independent variable, sepal width, versus the dependent variable, sepal \n length, from the iris data set for all species. Sepal length showed a negative but not statistically significant relationship with\n sepal width (R-squared = 0.01382, p-value = 0.1519).") +
  theme(plot.caption.position = "plot",
        plot.caption = element_text(hjust = 0))
## `geom_smooth()` using formula 'y ~ x'
A linear regression of sepal length on sepal width, pooled across all three iris species, showed a negative relationship between the two traits: as sepal width increases, sepal length tends to decrease. However, the relationship is not statistically significant (p-value = 0.1519). The R-squared value indicates that sepal width explains only ~1.4% of the variation in sepal length.
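
The "variation explained" reading of R-squared can be checked by hand: R-squared is one minus the ratio of the residual sum of squares to the total sum of squares of the response. The sketch below recomputes it for a model of sepal length on sepal width; the object names `fit`, `ss_res`, `ss_tot`, and `r_squared` are hypothetical names chosen for illustration.

```r
# Model of sepal length on sepal width, pooled across species.
# All object names here are hypothetical and used only in this sketch.
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

ss_res <- sum(residuals(fit)^2)                                # unexplained variation
ss_tot <- sum((iris$Sepal.Length - mean(iris$Sepal.Length))^2) # total variation

r_squared <- 1 - ss_res / ss_tot
round(100 * r_squared, 1) # percent of variation explained: 1.4
```

The hand-computed value should match `summary(fit)$r.squared`.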