Week 5

Author

Yasiru Dilshan

library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal Length") +
  theme_minimal()

Explanation

A one-way ANOVA is the best test for this plot, because:

  • ANOVA is designed to compare means across multiple groups.

  • It is a parametric test, which is appropriate when the data within each group are approximately normally distributed.

  • The boxplot compares “Sepal Length” across the three species, so a one-way ANOVA is an appropriate test of whether the species means differ. Before running the analysis, it is crucial to check the assumptions of normality and equal variances.
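The analysis described above can be sketched as follows. This is a minimal sketch using only base R; the choice of shapiro.test() on the residuals and bartlett.test() for the variance check is an assumption, not something stated in the original notes.

```r
# One-way ANOVA of sepal length by species (built-in iris data, base R only)
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)

# Assumption checks (these particular tests are an assumed choice)
shapiro_res  <- shapiro.test(residuals(fit))                        # normality of residuals
bartlett_res <- bartlett.test(Sepal.Length ~ Species, data = iris)  # equal variances
shapiro_res$p.value
bartlett_res$p.value

# If equal variances look doubtful, Welch's one-way test is one alternative
oneway.test(Sepal.Length ~ Species, data = iris)
```

A small p-value from either check suggests that the corresponding assumption may not hold, in which case the Welch version or a non-parametric test is worth considering.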

library(ggplot2)
data("iris") 
ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Petal Length by Species",
       x = "Petal Length",
       y = "Density")

Explanation

The main purpose of a density plot is to show how a continuous variable is distributed. Density plots do not perform statistical tests themselves, but they offer information that helps in choosing the right test. These are some typical situations and the tests that go along with them.

Tests

  • Normality Assessment - Shapiro-Wilk test

    Reason - Parametric tests such as ANOVA or t-tests can be applied if the density plot looks like a bell curve, which is the shape of a normal distribution. If it is highly skewed or contains several peaks, non-parametric tests may be more suitable: the Mann-Whitney U test for two groups, or the Kruskal-Wallis test for three or more.

  • Correlation analysis - We can plot the density plots of two variables together to compare their distributions. Similar or complementary shapes may hint at a relationship, but a density plot alone cannot confirm a correlation; a scatter plot or a correlation test is needed for that.
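As a rough illustration of the normality assessment, the sketch below runs the Shapiro-Wilk test on Petal.Length within each species and then a Kruskal-Wallis test as the non-parametric comparison. Testing within each species rather than on the pooled data is an assumption made here, because the density plot is drawn per species.

```r
# Shapiro-Wilk p-value for Petal.Length within each species
shapiro_p <- sapply(split(iris$Petal.Length, iris$Species),
                    function(x) shapiro.test(x)$p.value)
shapiro_p

# Non-parametric comparison of Petal.Length across the three species
kw <- kruskal.test(Petal.Length ~ Species, data = iris)
kw
```

A small Shapiro-Wilk p-value for a species suggests its distribution departs from normal, which would favour the Kruskal-Wallis result over a parametric test.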

library(ggplot2) 
data("iris") 
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species, shape = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatter Plot of Petal Length vs. Petal Width by Species",
       x = "Petal Length",
       y = "Petal Width")
`geom_smooth()` using formula = 'y ~ x'

Explanation

A linear regression analysis is suitable for modelling the relationship between Petal.Length and Petal.Width within each species.

Linear regression models a linear relationship between a response variable and one or more predictors.

In the scatter plot, each species has its own fitted line, which corresponds to fitting a separate regression for each species. Carrying out a separate analysis for every species lets you investigate how the relationship between the two variables differs among the groups.
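The per-species analysis could look like the following minimal sketch. It assumes Petal.Width is the response and Petal.Length the predictor, matching the axes of the scatter plot; the object names fits and coefs are illustrative.

```r
# Fit Petal.Width ~ Petal.Length separately within each species
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Petal.Width ~ Petal.Length, data = d))

# Intercept and slope for each species (one column per species)
coefs <- sapply(fits, coef)
coefs

# R-squared for each species
sapply(fits, function(m) summary(m)$r.squared)
```

Comparing the slopes and R-squared values across the columns shows how strongly, and how steeply, the two measurements are related in each species.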

library(ggplot2) 
data("iris") 
# Create a new column: "big" if Petal.Length > 5, otherwise "small"
iris$size <- ifelse(iris$Petal.Length > 5, "big", "small")
ggplot(iris, aes(x = Species, fill = size)) +
  geom_bar(stat = "count") +
  labs(title = "Number of Individuals by Species and Size",
       x = "Species",
       y = "Count")

Explanation

This plot shows frequency (count) data for two categorical variables, species and size.

A Chi-squared test of independence would be suitable.

Reason for Chi-squared

  • Categorical data is analysed using chi-squared tests.

  • It tests the independence between two categorical variables.

  • To compare the proportions of “big” and “small” individuals among the three species in the bar plot, a chi-squared test of independence is an appropriate statistical test. Carrying out the test and examining the result shows whether the distribution of “size” differs significantly among the species.
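A minimal sketch of the test on the same data is shown below. It rebuilds the size classification in a local vector using the same Petal.Length > 5 rule as the plot; the names size, tab and chi are illustrative.

```r
# Classify each flower with the same rule used for the bar plot
size <- ifelse(iris$Petal.Length > 5, "big", "small")

# Contingency table of species by size
tab <- table(Species = iris$Species, Size = size)
tab

# Chi-squared test of independence
chi <- chisq.test(tab)
chi$expected  # expected counts; very small values make the approximation unreliable
chi
```

If any expected count is very small, Fisher's exact test (fisher.test() in base R) is a commonly used alternative.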

#End of week 5