Today, we’ll discuss why graphing data matters and work with ggplot2 to visualize data in R. We will then work in groups to get as close as possible to finalizing analysis plans.
Which of the following graphs does the poorer job of visualizing its data, and why? (The first shows the frequencies of respondents in various categories, each associated with a different color; the second shows mean response levels, scaled from -1 to 1, with error bars showing confidence intervals.)
Draw a rough sketch of what might be produced by the following R code:
ggplot(penguins_updated, aes(x = bill_to_flipper_ratio)) + geom_histogram()
Let’s take a look at the data!
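The code in this handout uses two data frames, penguins_updated and penguins_forgraphs, whose construction is not shown here. The following is only a minimal sketch of how they might be built. It assumes that the raw data come from the palmerpenguins package, that bill_to_flipper_ratio is bill length divided by flipper length, and that penguins_forgraphs simply drops rows with missing values. All three are assumptions, not something this handout confirms.

```r
# Assumption: the raw data are palmerpenguins::penguins.
library(palmerpenguins)

# Work with a plain data frame so base R tools behave predictably.
penguins_updated <- as.data.frame(palmerpenguins::penguins)

# Assumption: the ratio is bill length over flipper length (both in mm).
penguins_updated$bill_to_flipper_ratio <-
  penguins_updated$bill_length_mm / penguins_updated$flipper_length_mm

# Assumption: the plotting data set is the same table without incomplete rows.
penguins_forgraphs <- na.omit(penguins_updated)

# Inspect the column names and types before plotting.
str(penguins_forgraphs)
```

If your own versions of these objects already exist from an earlier session, skip this step and use those instead.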
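# Scatterplot of bill length against flipper length, colored by species,
# with a single overall linear fit and marginal density plots.
library(ggplot2)
library(viridis)   # scale_color_viridis()
library(ggthemes)  # theme_tufte()
library(ggExtra)   # ggMarginal()

penguin_corr <- ggplot(penguins_forgraphs,
                       aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_jitter(aes(color = species), alpha = .9, shape = 16, size = 1,
              stroke = 1) +
  geom_smooth(method = "lm", se = TRUE, linetype = 1) +
  scale_color_viridis(discrete = TRUE, begin = .2, end = .8) +
  theme_tufte() +
  theme(legend.position = "bottom")
penguin_corr_improved <- ggMarginal(penguin_corr, type = "density")
penguin_corr_improved  # display the combined plot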
# Raincloud plot of bill-to-flipper ratio by species, faceted by sex:
# jittered raw points, group means, half boxplots, and half violins.
library(gghalves)  # geom_half_boxplot(), geom_half_violin()

penguin_raincloud <- ggplot(penguins_forgraphs,
                            aes(x = species, y = bill_to_flipper_ratio,
                                color = species)) +
  geom_jitter(alpha = .5, width = .1, size = .5) +
  stat_summary(fun = "mean", geom = "point", shape = 15, col = "black") +
  geom_half_boxplot(side = "r", outlier.shape = NA, center = TRUE,
                    position = position_nudge(x = .15),
                    errorbar.draw = FALSE, width = .2) +
  geom_half_violin(aes(fill = species), side = "r", center = TRUE,
                   position = position_nudge(x = .3)) +
  facet_wrap( ~ sex) +
  scale_color_viridis(discrete = TRUE, begin = .2, end = .8) +
  scale_fill_viridis(discrete = TRUE, begin = .2, end = .8) +
  theme_tufte() +
  theme(axis.text = element_text(size = 14),
        strip.text = element_text(size = 14),
        legend.position = "none")  # "none" hides the legend; "hide" is not a valid value
penguin_raincloud  # display the plot