Homework

You will be using this dataset for the homework.

install.packages("ggplot2") #install ggplot
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
install.packages("dplyr") #install dplyr
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggplot2) #load ggplot2
library(dplyr) #load dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
install.packages("palmerpenguins") #install penguin data
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(palmerpenguins) #load penguin data
## 
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
## 
##     penguins, penguins_raw
penguins_clean <- penguins %>% #creating 'cleaned' data to use
  filter(!is.na(species), !is.na(sex), !is.na(body_mass_g))

Part 1

install.packages("ggforce") #install ggforce
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggforce) #load ggforce, which provides geom_sina()
library(dplyr) #load dplyr (already loaded above)

ggplot(penguins_clean, aes(x = species, y = body_mass_g, color = sex)) + #plot species, mass, sex
  geom_sina(alpha = 0.7, size = 3) + #change opacity + size
  labs(
    title = "Sina Plot of Penguin Mass by Sex & Species", #add title
    x = "Species", #name x axis
    y = "Mass (g)" #name y axis
  ) +
  theme_minimal() + #change theme
  theme(legend.position = "bottom") #keep the legend so the sex colors can be read

  1. What plot type did you choose? I chose a sina plot.

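  2. Why is this plot appropriate for this data? It is appropriate because many points overlap: a sina plot spreads the points sideways according to their density, so individual observations stay visible while the shape of each distribution is still shown.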

  3. What patterns do you observe? Within each species, males typically have a larger body mass than females.

Part 2

install.packages("ggdist")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggdist)

ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = species)) + #plot mass by species
  stat_halfeye(
    adjust = 0.5,
    width = 0.6,
    justification = -0.2,
    .width = 0,
    point_colour = NA
  ) + 
  geom_boxplot( #add boxplot
    width = 0.12,
    outlier.shape = NA,
    alpha = 0.5
  ) +
  geom_jitter( #jitter for overlapping data
    width = 0.08,
    alpha = 0.5,
    size = 1.5
  ) +
  labs(
    title = "Raincloud Plot of Penguin Body Mass", #add title
    x = "Species", #add x axis title
    y = "Body Mass (g)" #add y axis title
  ) +
  theme_minimal() +
  theme(legend.position = "none") #remove legend

  1. Which species has the highest body mass? Gentoo have the highest body mass (g).

  2. Which species shows the greatest variability? Adelie have the greatest variability.

  3. What does this plot show that a boxplot alone would not? It shows the density of the distribution.
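
The variability answer can also be checked numerically rather than by eye. Below is a minimal base R sketch, assuming a hypothetical helper named spread_by_group() and made-up toy values (not the penguin measurements):

```r
# Hypothetical helper: standard deviation and IQR of `values`
# within each level of `groups`, ignoring incomplete pairs.
spread_by_group <- function(values, groups) {
  keep <- !is.na(values) & !is.na(groups)
  values <- values[keep]
  groups <- groups[keep]
  sds  <- tapply(values, groups, sd)   # one sd per group
  iqrs <- tapply(values, groups, IQR)  # one IQR per group
  data.frame(
    group = names(sds),
    sd    = as.numeric(sds),
    iqr   = as.numeric(iqrs),
    row.names = NULL
  )
}

# Toy values for illustration only (not real measurements)
toy_mass  <- c(3000, 3500, 4000, 4900, 5000, 5100, NA)
toy_group <- c("A", "A", "A", "B", "B", "B", "B")

toy_spread <- spread_by_group(toy_mass, toy_group)
print(toy_spread)
```

With the homework data, the call would be spread_by_group(penguins_clean$body_mass_g, penguins_clean$species); the species with the largest sd or IQR is the most variable by that measure.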

Part 3. Create a forest plot. Now summarize the data and visualize uncertainty.

sum_data <- penguins %>% #create summary data
  group_by(species, sex) %>% #group by species/sex
  summarise(
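    mean_mass = mean(body_mass_g, na.rm = TRUE), #mean mass per group
    sd = sd(body_mass_g, na.rm = TRUE), #sd of mass per group
    n = sum(!is.na(body_mass_g)), #count only non-missing masses, matching na.rm = TRUE
    se = sd / sqrt(n), #standard error of the mean
    lower = mean_mass - 1.96 * se, #lower bound of approximate 95% CI
    upper = mean_mass + 1.96 * se, #upper bound of approximate 95% CI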
    .groups = "drop"
  )

ggplot(sum_data, aes(x = mean_mass, y = species, color = sex)) + #plot summary data of mass, species, sex
  geom_point(position = position_dodge(width = 0.5), size = 3) + #plot points
  geom_errorbar( #horizontal error bars; replaces the deprecated geom_errorbarh()
    aes(xmin = lower, xmax = upper),
    orientation = "y", #draw the bars along the x axis
    position = position_dodge(width = 0.5), #position bar
    width = 0.2 #size of the end caps
  ) +
  labs(
    title = "Forest Plot: Body Mass by Sex & Species", #add title
    x = "Mean Body Mass (g) with 95% CI", #title x axis
    y = "Species", #title y axis
    color = "Sex" #legend title
  ) +
  theme_minimal() #minimal theme

  1. Which group has the highest mean body mass? Male Gentoo have the highest mean mass (g).

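  2. Which group has the widest confidence interval? Why? Adelie have the widest CI, because their body mass values are highly varied and span a wide range. The interval width is 2 × 1.96 × sd / sqrt(n), so it grows with the spread of the data and shrinks as the group size increases.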

  3. Do any groups appear clearly different (based on overlap)? Gentoo are clearly heavier than the other two species: their confidence intervals do not overlap those of Adelie or Chinstrap.
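
The confidence-interval reasoning in the answers above can be sketched in base R. This is a minimal illustration, assuming two hypothetical helpers, ci95() and intervals_overlap(), and made-up toy values rather than the penguin data. It uses the same normal approximation (mean ± 1.96 × se) as the summary code.

```r
# Hypothetical helper: mean and approximate 95% CI,
# counting only non-missing values in n.
ci95 <- function(x) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  se <- sd(x) / sqrt(n)  # standard error of the mean
  m  <- mean(x)
  c(mean = m, lower = m - 1.96 * se, upper = m + 1.96 * se, n = n)
}

# Hypothetical helper: TRUE when two intervals share at least one point.
intervals_overlap <- function(a, b) {
  unname(a["lower"] <= b["upper"] && b["lower"] <= a["upper"])
}

# Toy values for illustration only (not real measurements)
small_group <- c(3700, 4000, 4300)           # n = 3
large_group <- rep(c(3700, 4000, 4300), 10)  # same values, n = 30
heavy_group <- large_group + 1000            # shifted 1000 g higher

ci_small <- ci95(small_group)
ci_large <- ci95(large_group)
ci_heavy <- ci95(heavy_group)

# Same mean, but the smaller group has the wider interval
width_small <- ci_small[["upper"]] - ci_small[["lower"]]
width_large <- ci_large[["upper"]] - ci_large[["lower"]]
print(c(small = width_small, large = width_large))

# Overlap check between pairs of groups
print(intervals_overlap(ci_small, ci_large))
print(intervals_overlap(ci_large, ci_heavy))
```

The two groups with the same mean differ only in size, which shows that a wide interval can come from a small n as well as from a large spread. Non-overlapping intervals suggest a clear difference, but overlap is a visual guide, not a formal test.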