Penguin Data Task

Penguin Workbook

I have absolutely no clue what I am doing but I am giving it my best shot. I can’t quite wrap my head around the exercises in chapter 6; however, I managed to produce a histogram, scatter plot, bar chart and box plot. Things are starting to make more sense but I still need loads of practice! :)

Meet the penguins

Illustration of three species of Palmer Archipelago penguins: Chinstrap, Gentoo, and Adelie. Artwork by @allison_horst.

The penguins data from the palmerpenguins package contains size measurements for 344 penguins from three species observed on three islands in the Palmer Archipelago, Antarctica.

The plot below shows the relationship between flipper and bill lengths of these penguins.
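
A minimal sketch of how such a plot could be drawn, assuming the palmerpenguins package is installed and using its `flipper_length_mm`, `bill_length_mm` and `species` columns. The title, axis labels and colouring by species are illustrative choices, not taken from the original figure.

```r
library(ggplot2)
library(palmerpenguins)  # assumed installed; provides the `penguins` data frame

# Scatter plot of flipper length against bill length, coloured by species.
# na.rm = TRUE drops penguins with missing measurements without a warning.
penguin_plot <- ggplot(penguins,
                       aes(x = flipper_length_mm, y = bill_length_mm,
                           colour = species)) +
  geom_point(na.rm = TRUE) +
  labs(title = "Flipper and bill length",
       x = "Flipper length (mm)",
       y = "Bill length (mm)",
       colour = "Species") +
  theme_minimal()

penguin_plot
```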

What is a good Research Hypothesis?

A Research Hypothesis is a suggested explanation or a reasoned proposal about the expected result of an experiment or project. It can be formulated by exploring the correlation between two variables, by exposing a connection between two variables, or as a direct statement. There are two types of hypotheses: null and alternative. The null hypothesis is the statement we are trying to reject, and the alternative hypothesis is the statement we accept if we have sufficient evidence to reject the null.

There are several features which make an effective Research Hypothesis:

- Testability
- Clarity and relevance
- Brevity and objectivity

Following these features when creating a hypothesis ensures that it states, in one brief sentence, what we know and what we expect to find out, which allows us to work towards observable and testable results.
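
To make the null and alternative pair concrete, here is a small sketch using the built-in iris data rather than the penguins. The choice of species, the Welch t-test and the 0.05 significance level are all assumptions made for this example.

```r
# H0 (null): setosa and versicolor have the same mean sepal length.
# H1 (alternative): their mean sepal lengths differ.
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Welch two-sample t-test (the default behaviour of t.test for two groups)
result <- t.test(Sepal.Length ~ Species, data = two_species)
print(result)

alpha <- 0.05  # assumed significance level
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

The hypothesis is testable because it names one measurable variable and two groups, and the decision rule is fixed before looking at the p-value.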

Graphs and their associated tests - Formative Exercise 5

Box plot

Boxplots show the distribution of data across different categories, making it easy to compare medians and ranges. Most likely an ANOVA test was carried out, as it is suitable when comparing three or more groups, while t-tests are appropriate for two groups.

library(ggplot2)
data("iris")

ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) +
  geom_boxplot() +
  labs(title="Boxplot of Sepal Length by Species",
       x="Species",
       y="Sepal Length (cm)") +
  theme_minimal()
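
The ANOVA that would go with this box plot can be sketched as follows. This is one possible way to run it with base R's `aov()`, assuming a one-way design with Species as the only factor. The Tukey follow-up is an optional extra, not something the exercise asks for.

```r
# One-way ANOVA: does mean sepal length differ between the three species?
anova_fit <- aov(Sepal.Length ~ Species, data = iris)
summary(anova_fit)

# Pull the p-value for the Species term out of the summary table
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]
anova_p

# Optional follow-up: which pairs of species differ?
TukeyHSD(anova_fit)
```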

Scatter plot

Scatter plots depict the relationship between two continuous variables. If you see a linear pattern, correlation and regression analyses are appropriate to quantify the strength and nature of the relationship. Most likely a Pearson correlation test was carried out, as this measures the strength and direction of the linear association between two continuous variables.

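library(tidyverse)  # attaches ggplot2, so a separate library(ggplot2) call is not needed
data("iris")
view(iris)  # opens the data viewer; only useful in an interactive session

# Petal length against petal width, coloured by species
ggplot(iris, aes(x=Petal.Length, y=Petal.Width, color=Species)) +
  geom_point() +
  labs(title="Scatterplot of Petal Length vs. Petal Width",
       x="Petal Length (cm)",
       y="Petal Width (cm)") +
  theme_minimal()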
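
A sketch of the Pearson correlation test for the two variables in this scatter plot, using base R's `cor.test()`. Pooling all three species into one test is an assumption made here for simplicity. The colours in the plot suggest the relationship could also be checked within each species.

```r
# Pearson correlation between petal length and petal width (all species pooled)
cor_result <- cor.test(iris$Petal.Length, iris$Petal.Width, method = "pearson")

cor_result$estimate  # correlation coefficient r
cor_result$p.value   # p-value for H0: true correlation is zero
cor_result$conf.int  # 95% confidence interval for r
```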

Histogram

Histograms display the frequency distribution of a continuous variable. They help assess whether the data follows a normal distribution, which is crucial for many parametric tests.

library(ggplot2)
data("iris")

# Overlaid density-scaled histograms of petal length, one per species.
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.4.0).
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
  geom_histogram(aes(y=after_stat(density)), binwidth=0.2, position="identity", alpha=0.5) +
  labs(title="Histogram of Petal Length by Species",
       x="Petal Length (cm)",
       y="Density") +
  theme_minimal() +
  scale_fill_brewer(palette="Set1")
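
A histogram only gives a visual impression of normality. One way to back it up with a formal check is the Shapiro-Wilk test, sketched here with base R's `shapiro.test()`. Running it separately for each species is an assumption made for this example. A small p-value suggests the data depart from a normal distribution.

```r
# Shapiro-Wilk normality test on petal length, run separately for each species
shapiro_by_species <- tapply(
  iris$Petal.Length,
  iris$Species,
  function(x) shapiro.test(x)$p.value
)

shapiro_by_species  # one p-value per species
```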

Bar Chart

Bar charts visualize categorical data and help assess relationships or differences between groups. Chi-squared tests determine if there is a significant association between categorical variables.

library(ggplot2)
library(dplyr)

data("iris")

# Label each flower "Big" or "Small" using a 5.5 cm sepal-length cut-off
iris <- iris %>%
  mutate(Size = ifelse(Sepal.Length > 5.5, "Big", "Small"))

ggplot(iris, aes(x=Species, fill=Size)) +
  geom_bar(position="dodge") +
  labs(title="Count of Iris Species by Size Category",
       x="Species",
       y="Count") +
  theme_minimal() +
  scale_fill_manual(values=c("Big"="blue", "Small"="orange"))
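
The chi-squared test that goes with this bar chart can be sketched as below, using base R's `table()` and `chisq.test()`. The block rebuilds the Size column so it runs on its own. The name `iris_sized` is made up for this example, and the 5.5 cm cut-off is the same one used for the chart.

```r
# Rebuild the Big/Small label so this block is self-contained
iris_sized <- transform(iris, Size = ifelse(Sepal.Length > 5.5, "Big", "Small"))

# Contingency table of species by size category (the counts behind the bars)
size_table <- table(iris_sized$Species, iris_sized$Size)
size_table

# Chi-squared test of independence: is size category associated with species?
chi_result <- chisq.test(size_table)
chi_result

# Expected counts under independence; the test is less reliable if these are small
chi_result$expected
```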