CHOOSING STATISTICAL TESTS

Author

INUSA YAWUZA MUSA (N1349229)

LOADING PACKAGES

library(tidyverse)       # core tidyverse; also attaches ggplot2, dplyr and tibble
library(palmerpenguins)  # penguins dataset
library(vtable)          # summary and variable tables

LOADING THE IRIS DATASET

# iris ships with base R (datasets package); printing it shows all 150 rows
iris

BOX PLOT OF SEPAL LENGTH BY SPECIES

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(color = Species), fill = "white") +
  scale_color_manual(values = c("setosa" = "red", "versicolor" = "green", "virginica" = "blue")) +
  labs(x = "Species", y = "Sepal Length") +
  theme_minimal()

A one-way ANOVA (analysis of variance) compares the mean sepal length across the three species: setosa, versicolor and virginica. Its null hypothesis is that all three species share the same mean sepal length. A small p-value indicates that at least one species mean differs from the others, but not which one.
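The following is a minimal sketch of running this test in base R with `aov()` from the built-in `stats` package. The object names (`fit_sepal`, `sepal_table`, `sepal_p`) are illustrative. The Bartlett and Shapiro-Wilk checks are an added suggestion for inspecting the equal-variance and normality assumptions; they are not part of the original analysis.

```r
# One-way ANOVA: does mean Sepal.Length differ across the three species?
fit_sepal <- aov(Sepal.Length ~ Species, data = iris)
sepal_table <- summary(fit_sepal)
print(sepal_table)

# p-value for the Species term (first row of the ANOVA table)
sepal_p <- sepal_table[[1]][["Pr(>F)"]][1]
print(sepal_p)

# Added suggestion, not in the original analysis: check equal variances
# across species and approximate normality of the residuals
print(bartlett.test(Sepal.Length ~ Species, data = iris))
print(shapiro.test(residuals(fit_sepal)))
```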

DENSITY PLOT OF PETAL LENGTH BY SPECIES

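ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Petal Length", y = "Density") +
  theme_minimal() +
  # Named values tie each colour to a species explicitly, as in the box plot
  scale_fill_manual(values = c("setosa" = "red", "versicolor" = "green", "virginica" = "blue"))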

A one-way ANOVA can also test whether mean petal length differs significantly across the three species. Standard ANOVA assumes roughly equal variances within groups. The much narrower setosa curve in the density plot suggests that this assumption should be checked before relying on the result.
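This sketch makes the same assumptions as the previous example. `aov()` fits the ANOVA. `TukeyHSD()`, also in base R's `stats`, follows up with pairwise comparisons that show which species differ. `oneway.test()` with `var.equal = FALSE` runs Welch's ANOVA, which does not assume equal variances. The Tukey and Welch steps are additions to the original analysis, and the object names are illustrative.

```r
# One-way ANOVA for Petal.Length across species
fit_petal <- aov(Petal.Length ~ Species, data = iris)
print(summary(fit_petal))

# Added step: Tukey's honest significant difference gives pairwise
# species comparisons with family-wise adjusted p-values
petal_tukey <- TukeyHSD(fit_petal)
print(petal_tukey)

# Added step: Welch's ANOVA, which does not assume equal within-species variances
petal_welch <- oneway.test(Petal.Length ~ Species, data = iris, var.equal = FALSE)
print(petal_welch)
```

If the standard ANOVA and Welch's ANOVA reach the same conclusion, the unequal spread seen in the density plot is unlikely to change the interpretation.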

SCATTER PLOT OF PETAL WIDTH AGAINST PETAL LENGTH BY SPECIES

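ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  # A fixed (unmapped) colour gives one regression line pooled over all species;
  # black keeps it distinct from the blue virginica points, and an explicit
  # formula silences the "using formula = 'y ~ x'" message
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black") +
  labs(x = "Petal Length", y = "Petal Width") +
  theme_minimal() +
  scale_color_manual(values = c("setosa" = "red", "versicolor" = "green", "virginica" = "blue"))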

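For a simple linear regression of petal width on petal length, the regression ANOVA F-test checks whether the slope differs from zero, that is, whether the linear relationship is statistically significant. A small p-value suggests that petal length is a significant linear predictor of petal width. The fitted line in the plot pools all three species.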
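The following is a minimal sketch in base R. `lm()` fits the regression, and `anova()` produces its ANOVA table. With a single predictor, the F statistic equals the square of the slope's t statistic, so the F-test and the t-test on the slope give the same p-value. The object names are illustrative. Like the line in the plot, this model pools all three species.

```r
# Simple linear regression of petal width on petal length (all species pooled)
fit_lm <- lm(Petal.Width ~ Petal.Length, data = iris)

# Regression ANOVA table: F-test of whether the slope differs from zero
lm_anova <- anova(fit_lm)
print(lm_anova)

# With one predictor, F equals the squared t statistic of the slope
slope_t <- summary(fit_lm)$coefficients["Petal.Length", "t value"]
print(c(F = lm_anova[["F value"]][1], t_squared = slope_t^2))

# Proportion of variance in Petal.Width explained by Petal.Length
print(summary(fit_lm)$r.squared)
```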

BAR PLOT OF SIZE CLASS COUNTS BY SPECIES

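# Classify each flower as "small" (sepal length below the median) or "big"
# (at or above the median); this overwrites iris with the extra column
iris <- iris %>%
  mutate(size = ifelse(Sepal.Length < median(Sepal.Length), "small", "big"))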

# Create the bar plot using the new 'size' variable
ggplot(iris, aes(x = Species, fill = size)) +
  geom_bar(position = "dodge") +
  labs(x = "Species", y = "Count") +
  theme_minimal()

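The chi-square test of independence checks whether two categorical variables are associated. Here those variables are species and size class (big or small). Its null hypothesis is that size class is independent of species. The test compares the observed counts in the species-by-size table with the counts expected under independence.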
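The following is a minimal sketch using base R's `chisq.test()`. To keep the block self-contained, it rebuilds the size classes with the same median split used for the bar plot rather than relying on the modified `iris`. The names `size_class`, `size_tab` and `size_chisq` are illustrative. The check that all expected counts are at least 5 is a common rule of thumb for the chi-square approximation, not part of the original analysis.

```r
# Recreate the size classes: below the median sepal length is "small",
# otherwise "big" (same rule as the bar-plot code)
size_class <- ifelse(iris$Sepal.Length < median(iris$Sepal.Length), "small", "big")

# Contingency table of species by size class
size_tab <- table(Species = iris$Species, Size = size_class)
print(size_tab)

# Chi-square test of independence between species and size class
size_chisq <- chisq.test(size_tab)
print(size_chisq)

# Expected counts under independence (rule of thumb: all should be >= 5)
print(size_chisq$expected)
```

The size classes are derived from sepal length, so this test is effectively another view of the species differences in sepal length shown in the box plot.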