CHOOSING STATISTICAL TESTS
LOADING PACKAGES
library(tibble)
library(tidyverse)
library(palmerpenguins)
library(vtable)
library(ggplot2)
LOADING DATASET IRIS
iris
BOX PLOT REPRESENTING THE RELATIONSHIP BETWEEN SPECIES AND THEIR SEPAL LENGTH
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(color = Species), fill = "white") +
  scale_color_manual(values = c("setosa" = "red", "versicolor" = "green", "virginica" = "blue")) +
  labs(x = "Species", y = "Sepal Length") +
  theme_minimal()
One-way ANOVA (analysis of variance) compares mean sepal length across the three species: setosa, versicolor, and virginica. It tests the null hypothesis that all three species share the same mean sepal length; a small p-value indicates that at least one species mean differs from the others.
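The test described above can be run with base R's `aov()`. This is a minimal sketch rather than a full analysis: the object name `sepal_fit` is illustrative, and the Tukey follow-up is an optional extra that shows which pairs of species differ.

```r
# One-way ANOVA: does mean Sepal.Length differ across the three species?
sepal_fit <- aov(Sepal.Length ~ Species, data = iris)

# ANOVA table with the F statistic and its p-value
summary(sepal_fit)

# Optional follow-up: pairwise comparisons between species
TukeyHSD(sepal_fit)
```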
DENSITY PLOT REPRESENTATION OF PETAL LENGTH AMONG SPECIES
ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Petal Length", y = "Density") +
  theme_minimal() +
  scale_fill_manual(values = c("red", "green", "blue"))
One-way ANOVA (analysis of variance) is the appropriate test here as well: it determines whether mean petal length differs significantly across the three species.
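A possible way to run this test is sketched below. Classical ANOVA assumes similar spread in every group, so the sketch also includes Bartlett's test and a Welch-type ANOVA (`oneway.test()` with `var.equal = FALSE`) as cross-checks. These two checks are additions made for illustration, not part of the original analysis, and the object names are illustrative.

```r
# Classical one-way ANOVA on Petal.Length
petal_fit <- aov(Petal.Length ~ Species, data = iris)
summary(petal_fit)

# Assumption check (added for illustration): are group variances similar?
petal_var_check <- bartlett.test(Petal.Length ~ Species, data = iris)
petal_var_check

# Welch-type ANOVA, which does not assume equal variances
petal_welch <- oneway.test(Petal.Length ~ Species, data = iris, var.equal = FALSE)
petal_welch
```

If the two ANOVA variants lead to the same conclusion, unequal spread between the species is unlikely to be driving the result.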
SCATTER DIAGRAM REPRESENTATION OF RELATIONSHIP BETWEEN PETAL LENGTH AND WIDTH AMONG SPECIES
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(x = "Petal Length", y = "Petal Width") +
  theme_minimal() +
  scale_color_manual(values = c("red", "green", "blue"))
ANOVA for regression (the overall F-test of a fitted linear model) tests whether the linear relationship between petal length and petal width is statistically significant. A small p-value suggests that petal length is a significant predictor of petal width.
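The regression ANOVA can be sketched as follows, assuming a single pooled model across all species (which matches the one regression line drawn in the plot). The object name `petal_lm` is illustrative.

```r
# Simple linear regression: Petal.Width as a function of Petal.Length
petal_lm <- lm(Petal.Width ~ Petal.Length, data = iris)

# ANOVA table for the regression (F-test for the Petal.Length term)
petal_anova <- anova(petal_lm)
petal_anova

# Slope estimate and proportion of variance explained
coef(petal_lm)
summary(petal_lm)$r.squared
```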
BAR PLOT REPRESENTATION OF SPECIES ACCORDING TO SIZE
iris <- iris %>%
  mutate(size = ifelse(Sepal.Length < median(Sepal.Length), "small", "big"))
# Create the bar plot using the new 'size' variable
ggplot(iris, aes(x = Species, fill = size)) +
  geom_bar(position = "dodge") +
  labs(x = "Species", y = "Count") +
  theme_minimal()
The Chi-Square Test of Independence determines whether there is an association between two categorical variables, here species and size (big or small). A small p-value indicates that the proportion of big and small flowers differs between species.
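A minimal sketch of this test with base R's `chisq.test()` follows. To keep the example self-contained, it recomputes the size labels in a local vector using the same median split on sepal length instead of relying on a `size` column in `iris`. The names `size`, `size_by_species`, and `size_test` are illustrative.

```r
# Same median split as in the bar plot, kept in a local vector
size <- ifelse(iris$Sepal.Length < median(iris$Sepal.Length), "small", "big")

# Contingency table: species (rows) by size (columns)
size_by_species <- table(Species = iris$Species, Size = size)
size_by_species

# Chi-square test of independence between species and size
size_test <- chisq.test(size_by_species)
size_test

# Expected counts under independence; very small values would call for caution
size_test$expected
```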