PSY460: Advanced Quantitative Methods

Week #6: Visualizing Data

Today, we’ll discuss the importance of graphing data, and we’ll use ggplot2 to visualize data in R. We will then work in groups to get as close as possible to finalized analysis plans.

Quiz: Part 1

Which of the following graphs does not visualize the data in the best way, and why? (The first shows frequencies of respondents in various categories associated with the different colors; the second shows mean response levels, scaled from -1 to 1, with error bars showing confidence intervals.)

Quiz: Part 2

Draw a rough sketch of what might be produced by the following R code:

ggplot(penguins_updated, aes(x = bill_to_flipper_ratio)) + geom_histogram()

The Importance of Univariate Visualizations

The Importance of Multivariate Visualizations

Graphs in R

Whereas many statistical software packages produce graphs that greatly oversimplify data, R provides tremendous flexibility in creating visualizations.
- This flexibility allows for more effective communication of information.
- Although the many possibilities R offers can be overwhelming at first, there are many guides to help (including the very useful cheat sheet on Canvas), and ggplot2 uses one standardized approach for building all kinds of graphs.

The grammar of graphics

Each graph in ggplot2 is built from a series of layers, and each layer draws a geometric element (or “geom”), such as points, bars, or lines. Stacking layers makes it possible to plot several aspects of your data at once.
Variables in your data are mapped to aesthetics (visual properties such as x position, y position, color, fill, or size) inside aes(), and each geom draws those aesthetics.
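As a minimal sketch of these ideas (assuming only ggplot2 and palmerpenguins are installed; the object names `penguins_complete` and `layered_plot` are illustrative), the plot below stacks two layers on a single set of x/y mappings. Color is *mapped* to `species` inside `aes()` for the point layer, but *set* to a constant outside `aes()` for the smoothing layer, which is the key distinction between mapping and setting an aesthetic.

```r
library(ggplot2)
library(palmerpenguins)

# Drop rows with any missing value (illustrative name, not from the slides)
penguins_complete <- na.omit(palmerpenguins::penguins)

# x and y are mapped once in ggplot(); every layer inherits them
layered_plot <- ggplot(penguins_complete,
                       aes(x = bill_length_mm, y = flipper_length_mm)) +
  # Layer 1: points, with color mapped to a variable (inside aes())
  geom_point(aes(color = species), size = 1) +
  # Layer 2: one linear fit, with color set to a constant (outside aes())
  geom_smooth(method = "lm", formula = y ~ x, color = "black", se = FALSE)

layered_plot
```

Because the smoothing layer has no discrete mapping of its own, it fits a single line through all penguins.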

Does the Ratio of Bill Length to Flipper Length Differ Across Penguin Species and Sexes?

Let’s take a look at the data!

Preparing for Graphing

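# Load packages: tidyverse for data wrangling and ggplot2; the others add
# half-geoms (gghalves), themes (ggthemes), marginal plots (ggExtra),
# color palettes (viridis), and the penguin data (palmerpenguins)
library(tidyverse) 
library(gghalves)
library(ggthemes)
library(ggExtra)
library(viridis)
library(palmerpenguins) 

# Use the palmerpenguins version explicitly (R 4.5.0 and later also ship a
# `penguins` data set in base R, with different column names)
myownpenguins <- palmerpenguins::penguins 

# Keep only complete rows, then compute the ratio of bill length to flipper length
penguins_forgraphs <- myownpenguins %>% 
  filter(rowSums(is.na(myownpenguins)) == 0) %>% 
  mutate(bill_to_flipper_ratio = bill_length_mm / flipper_length_mm)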
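The filter above keeps rows where `rowSums(is.na(...))` equals zero, that is, rows with no missing value in any column. A quick sketch like the following (assuming dplyr, tidyr, and palmerpenguins; `kept_rowsums`, `kept_dropna`, `n_removed`, and `missing_by_column` are illustrative names) checks that `tidyr::drop_na()` keeps the same rows and reports how many rows were dropped and which columns the missing values came from, which is worth knowing before interpreting any plot of the filtered data.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

raw <- palmerpenguins::penguins

# Approach used in the slides: keep rows with zero missing values
kept_rowsums <- raw %>% filter(rowSums(is.na(raw)) == 0)

# Equivalent tidyverse shortcut
kept_dropna <- raw %>% drop_na()

# How many rows were removed, and which columns had missing values?
n_removed <- nrow(raw) - nrow(kept_dropna)
missing_by_column <- colSums(is.na(raw))

n_removed
missing_by_column
```

Dropping every incomplete row is simple, but it also removes penguins whose only missing value is `sex`. If a plot does not use `sex`, `drop_na()` can be restricted to the relevant columns, for example `drop_na(bill_length_mm, flipper_length_mm)`.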

Looking at the spread of responses

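# Density plot of the bill-to-flipper ratio (all penguins pooled)
penguin_densityplot <- ggplot(penguins_forgraphs, 
                              aes(x = bill_to_flipper_ratio)) +
  geom_density(fill = "green4") + 
  theme_tufte() +
  theme(legend.position = "none")   # "none" (not "hide") suppresses the legend

penguin_densityplot   # print the plot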
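The density plot above pools all three species. Because the guiding question concerns differences across species, one possible extension (not part of the original slides) is to map `fill` to `species`, so that ggplot2 draws one semi-transparent density per species on shared axes. The sketch assumes ggplot2, dplyr, tidyr, and palmerpenguins; `penguin_density_by_species` is an illustrative name, and `theme_minimal()` stands in for `theme_tufte()` only to avoid the ggthemes dependency.

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(palmerpenguins)

penguins_forgraphs <- palmerpenguins::penguins %>%
  drop_na() %>%
  mutate(bill_to_flipper_ratio = bill_length_mm / flipper_length_mm)

# Mapping fill to species splits the data into one density per species
penguin_density_by_species <- ggplot(penguins_forgraphs,
                                     aes(x = bill_to_flipper_ratio,
                                         fill = species)) +
  geom_density(alpha = .5) +   # transparency keeps overlapping curves visible
  labs(x = "Bill length / flipper length", y = "Density", fill = "Species") +
  theme_minimal() +
  theme(legend.position = "bottom")

penguin_density_by_species
```

Here the legend carries information (which color is which species), so it is placed at the bottom rather than suppressed.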


Producing a scatterplot

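# Scatterplot of bill length vs. flipper length: slightly jittered points
# colored by species, plus a single linear fit (with 95% CI band) across all penguins
penguin_corr <- ggplot(penguins_forgraphs, 
                       aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_jitter(aes(color = species), alpha = .9, shape = 16, size = 1, 
              stroke = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, linetype = 1) +
  scale_color_viridis(discrete = TRUE, begin = .2, end = .8) +
  theme_tufte() +
  theme(legend.position = "bottom") 

# Add marginal density plots along both axes
penguin_corr_improved <- ggMarginal(penguin_corr, type = "density")

penguin_corr_improved   # print the plot with its marginal densities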
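In the scatterplot code above, `color = species` is mapped only inside `geom_jitter()`, so `geom_smooth()` fits a single regression line pooled across species. A sketch of the alternative (assuming ggplot2 and palmerpenguins; `penguin_corr_by_species` is an illustrative name) moves the mapping into `ggplot()`, so every layer inherits it and `geom_smooth()` fits one line per species. Comparing the two versions shows whether the pooled association also holds within each species.

```r
library(ggplot2)
library(palmerpenguins)

penguins_complete <- na.omit(palmerpenguins::penguins)

# color is mapped in ggplot(), so both layers inherit it:
# geom_smooth() now fits a separate linear model for each species
penguin_corr_by_species <- ggplot(penguins_complete,
                                  aes(x = bill_length_mm, y = flipper_length_mm,
                                      color = species)) +
  geom_point(alpha = .6, size = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  theme_minimal() +
  theme(legend.position = "bottom")

penguin_corr_by_species
```

Where an aesthetic is mapped (globally in `ggplot()` or locally in one geom) therefore changes not just the look of the plot but which statistics each layer computes.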


A Raincloud Plot

# Raincloud plot: raw points ("rain"), a half boxplot, and a half violin
# ("cloud") for each species, in separate panels for each sex
penguin_raincloud <- ggplot(penguins_forgraphs, 
                       aes(x = species, y = bill_to_flipper_ratio, 
                           color = species)) +
  # jitter horizontally only, so the y values are not distorted
  geom_jitter(alpha = .5, width = .1, height = 0, size = .5) +
  # black square marks each group's mean
  stat_summary(fun = "mean", geom = "point", shape = 15, col = "black") +
  geom_half_boxplot(side = "r", outlier.shape = NA, center = TRUE,
                    position = position_nudge(x = .15), 
                    errorbar.draw = FALSE, width = .2) +
  geom_half_violin(aes(fill = species), side = "r", center = TRUE, 
                   position = position_nudge(x = .3)) +
  facet_wrap( ~ sex) + 
  scale_color_viridis(discrete = TRUE, begin = .2, end = .8) +
  scale_fill_viridis(discrete = TRUE, begin = .2, end = .8) +
  theme_tufte() +
  theme(axis.text = element_text(size = 14), 
        strip.text = element_text(size = 14), legend.position = "none")

penguin_raincloud   # print the plot
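Each raincloud combines summary statistics with the raw data: the black square is the group mean, and the half boxplot shows the median and quartiles. A sketch like the following (assuming dplyr, tidyr, and palmerpenguins; `ratio_summary` and its column names are illustrative) computes the same quantities numerically for each species-by-sex group, which can be handy for reporting alongside the figure. `quantile()` is left at its default definition, which, by default, is also how ggplot2 computes boxplot hinges.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

ratio_summary <- palmerpenguins::penguins %>%
  drop_na() %>%
  mutate(bill_to_flipper_ratio = bill_length_mm / flipper_length_mm) %>%
  group_by(species, sex) %>%
  summarise(
    n            = n(),
    mean_ratio   = mean(bill_to_flipper_ratio),             # black square
    q1_ratio     = quantile(bill_to_flipper_ratio, .25),    # lower hinge
    median_ratio = median(bill_to_flipper_ratio),           # line in the box
    q3_ratio     = quantile(bill_to_flipper_ratio, .75),    # upper hinge
    .groups = "drop"
  )

ratio_summary
```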


Plan for Next Week: Individual Meetings

1:00-1:15: Tian
1:15-1:30: Katie
1:30-1:45: Katherine
1:45-2:00: Gonzalo

2:00-2:15: Hannah
2:15-2:30: Yosuke
2:30-2:45: Brianna
2:45-3:00: Annaliese

3:15-3:30: Kyle
3:30-3:45: Ayako
3:45-4:00: Josh
4:00-4:15: Jake