You will be using the palmerpenguins dataset for the homework.
install.packages("ggplot2") #install ggplot
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
install.packages("dplyr") #install dplyr
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggplot2) #load ggplot2
library(dplyr) #load dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
install.packages("palmerpenguins") #install penguin data
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(palmerpenguins) #load penguin data
##
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
##
## penguins, penguins_raw
penguins_clean <- penguins %>% #create 'cleaned' data to use
  filter(!is.na(species), !is.na(sex), !is.na(body_mass_g)) #drop rows missing species, sex, or mass
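As a quick check on the cleaning step, the sketch below counts how many rows remain once missing values are dropped. It uses base R instead of dplyr and assumes palmerpenguins is installed; `raw` and `penguins_check` are names made up for this illustration.

```r
# Sketch only: base-R check of the filtering step (assumes palmerpenguins is installed)
raw <- palmerpenguins::penguins

# TRUE for rows where species, sex, and body mass are all present
keep <- complete.cases(raw[, c("species", "sex", "body_mass_g")])

# Hypothetical name; this should mirror what penguins_clean contains
penguins_check <- raw[keep, ]

cat("rows before:", nrow(raw), "| rows after:", nrow(penguins_check), "\n")
```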
install.packages("ggforce") #install ggforce
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggforce) #load ggforce (dplyr is already loaded above)
ggplot(penguins_clean, aes(x = species, y = body_mass_g, color = sex)) + #plot mass by species, colored by sex
  geom_sina(alpha = 0.7, size = 3) + #change opacity + size
  labs(
    title = "Sina Plot of Penguin Mass by Sex & Species", #add title
    x = "Species", #name x axis
    y = "Mass (g)", #name y axis
    color = "Sex" #name legend
  ) +
  theme_minimal() #change theme; legend kept so the sex colors can be read
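What plot type did you choose? I chose a sina plot.
Why is this plot appropriate for this data? It is appropriate because many penguins have similar body masses, so the points overlap. A sina plot spreads the points sideways according to their density, which keeps every observation visible and shows the shape of each distribution.
What patterns do you observe? Within each species, males typically have a larger body mass than females.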
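One way to back up the pattern read from the plot is to compare the median body mass of each species and sex directly. This is only a sketch in base R, assuming palmerpenguins is installed; `median_mass` is a name chosen for this example.

```r
# Sketch only: median body mass (g) for each species/sex combination
raw <- palmerpenguins::penguins

# The formula interface drops rows with missing values by default
median_mass <- aggregate(body_mass_g ~ species + sex, data = raw, FUN = median)

print(median_mass)
```

If the male rows show larger medians than the female rows for the same species, the numbers agree with the plot.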
install.packages("ggdist")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggdist)
ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = species)) + #plot mass by species
  stat_halfeye( #add half-violin (density)
    adjust = 0.5, #bandwidth adjustment
    width = 0.6,
    justification = -0.2, #shift density to the side of the boxplot
    .width = 0, #hide interval
    point_colour = NA #hide point estimate
  ) +
  geom_boxplot( #add boxplot
    width = 0.12,
    outlier.shape = NA, #hide outliers (raw points are drawn by the jitter)
    alpha = 0.5
  ) +
  geom_jitter( #jitter for overlapping data
    width = 0.08,
    alpha = 0.5,
    size = 1.5
  ) +
  labs(
    title = "Raincloud Plot of Penguin Body Mass", #add title
    x = "Species", #add x axis title
    y = "Body Mass (g)" #add y axis title
  ) +
  theme_minimal() +
  theme(legend.position = "none") #remove legend
Which species has the highest body mass? Gentoo penguins have the highest body mass.
Which species shows the greatest variability? Adelie have the greatest variability.
What does this plot show that a boxplot alone would not? It shows the shape (density) of each distribution and the individual data points, which a boxplot reduces to a few summary statistics.
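The variability answer can be checked numerically as well as by eye. The sketch below computes the standard deviation and interquartile range of body mass for each species; it assumes palmerpenguins is installed, and the variable names are made up for this example.

```r
# Sketch only: spread of body mass (g) within each species
raw <- palmerpenguins::penguins

# na.rm = TRUE is passed through tapply() to sd() and IQR()
sd_by_species  <- tapply(raw$body_mass_g, raw$species, sd,  na.rm = TRUE)
iqr_by_species <- tapply(raw$body_mass_g, raw$species, IQR, na.rm = TRUE)

print(round(cbind(sd = sd_by_species, iqr = iqr_by_species), 1))
```

The species with the largest values in this table is the one with the greatest variability.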
sum_data <- penguins_clean %>% #create summary data from the cleaned data (no missing sex or mass)
  group_by(species, sex) %>% #group by species/sex
  summarise(
    mean_mass = mean(body_mass_g, na.rm = TRUE), #create mean mass variable
    sd = sd(body_mass_g, na.rm = TRUE), #create sd variable
    n = n(), #group size
    se = sd / sqrt(n), #standard error of the mean
    lower = mean_mass - 1.96 * se, #lower 95% CI bound
    upper = mean_mass + 1.96 * se, #upper 95% CI bound
    .groups = "drop"
  )
ggplot(sum_data, aes(x = mean_mass, y = species, color = sex)) + #plot summary data of mass, species, sex
  geom_point(position = position_dodge(width = 0.5), size = 3) + #plot points
  geom_errorbarh( #set error bar
    aes(xmin = lower, xmax = upper),
    position = position_dodge(width = 0.5), #position bar
    height = 0.2 #bar height
  ) +
  labs(
    title = "Forest Plot: Body Mass by Sex & Species", #add title
    x = "Mean Body Mass (g) with 95% CI", #title x axis
    y = "Species", #title y axis
    color = "Sex" #title legend
  ) +
  theme_minimal() #minimal theme
## Warning: `geom_errorbarh()` was deprecated in ggplot2 4.0.0.
## ℹ Please use the `orientation` argument of `geom_errorbar()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `height` was translated to `width`.
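The warning above points to `geom_errorbar()` with its `orientation` argument as the replacement for `geom_errorbarh()`. A minimal sketch of that form is below. The data frame `demo` holds made-up values for illustration only, and `width` takes the place of `height`, as the warning message describes.

```r
# Sketch only: horizontal error bars without the deprecated geom_errorbarh()
library(ggplot2)

# Hypothetical values, not taken from the penguin data
demo <- data.frame(
  group = c("a", "b"),
  mean  = c(10, 12),
  lower = c(9, 10.5),
  upper = c(11, 13.5)
)

p <- ggplot(demo, aes(x = mean, y = group)) +
  geom_point(size = 3) +
  geom_errorbar(
    aes(xmin = lower, xmax = upper),
    orientation = "y", # bars run along the x axis
    width = 0.2        # cap size, replaces `height`
  )

print(p)
```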
Which group has the highest mean body mass? Male Gentoo have the highest mean body mass.
Which group has the widest confidence interval? Why? Adelie have the widest CI. The interval is the mean ± 1.96 × sd / sqrt(n), so a wide interval reflects a large spread of values, a small group size, or both.
Do any groups appear clearly different (based on overlap)? Yes. The Gentoo intervals sit well above those of the other two species, so Gentoo are clearly heavier than Adelie and Chinstrap.
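Since the summary code builds each interval as mean ± 1.96 × sd / sqrt(n), the width depends on both the spread and the group size. The sketch below makes that explicit. `ci_width()` is a helper written for this example, and the inputs are made-up numbers, not values from the penguin data.

```r
# Sketch only: full width of a normal-approximation 95% CI
ci_width <- function(sd, n, z = 1.96) {
  2 * z * sd / sqrt(n)
}

# Hypothetical inputs: same spread, different group sizes
small_group <- ci_width(sd = 300, n = 30)
large_group <- ci_width(sd = 300, n = 120)

cat("n = 30: ", round(small_group, 1), "\n")
cat("n = 120:", round(large_group, 1), "\n")
```

With the spread held fixed, a group four times larger has an interval half as wide, so a wide CI is not only a sign of varied data.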