R Show and Tell

Here’s some made-up data: imagine people are divided into 4 groups, and each group takes 2 tests. Groups 1 and 2 drink coffee before Test 1, and Groups 3 and 4 drink coffee before Test 2.
The goal is to create a graph that is easy to interpret.
First, I created the made-up dataset:

## # A tibble: 8 x 5
##   group   test   coffee     score    se
##   <chr>   <chr>  <chr>      <dbl> <dbl>
## 1 Group 1 Test 1 Yes Coffee    75  4.5 
## 2 Group 2 Test 1 Yes Coffee    82  5.09
## 3 Group 3 Test 1 No Coffee     65  5.14
## 4 Group 4 Test 1 No Coffee     88  4.35
## 5 Group 3 Test 2 Yes Coffee    91  5.23
## 6 Group 4 Test 2 Yes Coffee    63  4.78
## 7 Group 1 Test 2 No Coffee     77  3   
## 8 Group 2 Test 2 No Coffee     93  5.07
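The code that built `mydat` is not shown above, so here is a minimal sketch that recreates the printed values. It assumes a plain base-R `data.frame` is an acceptable stand-in for the tibble, and that the `se` values are exactly the rounded numbers in the printout (the real ones may carry more decimals).

```r
# Sketch only: rebuilds the printed table as a base-R data frame.
# The se values are copied from the rounded printout (an assumption).
mydat <- data.frame(
  group  = c("Group 1", "Group 2", "Group 3", "Group 4",
             "Group 3", "Group 4", "Group 1", "Group 2"),
  test   = rep(c("Test 1", "Test 2"), each = 4),
  coffee = rep(c("Yes Coffee", "No Coffee", "Yes Coffee", "No Coffee"), each = 2),
  score  = c(75, 82, 65, 88, 91, 63, 77, 93),
  se     = c(4.5, 5.09, 5.14, 4.35, 5.23, 4.78, 3, 5.07),
  stringsAsFactors = FALSE
)

mydat
```

Each group appears exactly twice, once per test and once per coffee condition, which is what causes the empty slots in the first plot.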

In a first attempt to plot the data, I’ve simply used facet_wrap to divide the data by test and coffee intake. However, you’ll notice a lot of white space, because each group took each test only once (either with or without coffee), so half of the x-axis positions in every panel are empty.

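library(ggplot2) # needed for ggplot() and all the layers below

ggplot(mydat, aes(x = group, y=score, color=group)) +
  geom_point(size=2.5) + 
  # plots the error bars using the made-up standard error (se);
  # in ggplot2 3.4.0 and later, size for lines is deprecated in favor of linewidth
  geom_errorbar(aes(ymin = score-se, ymax = score+se), size=1.5, width=.2) +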
  facet_wrap(test~coffee) + # divide the graph by test and coffee intake
  labs(x = " ", y = "Score") +
  theme_bw() + # this is my favorite theme to use
  theme(legend.position = "none", # removes the legend 
        axis.title = element_text(size=20),
        axis.text=element_text(size=15, color = "black"), # makes axis text larger
        strip.text = element_text(size=20)) # makes header text larger

The only change I made here is in the facet_wrap function, where I’ve added two new arguments: nrow and scales.
The resulting graph looks much better, but I’d really like to rearrange the panels into this order:
1) Yes Coffee: Group 1/Group 2
2) No Coffee: Group 1/Group 2
3) Yes Coffee: Group 3/Group 4
4) No Coffee: Group 3/Group 4

ggplot(mydat, aes(x = group, y=score, color=group)) +
  geom_point(size=2.5) +  
  geom_errorbar(aes(ymin = score-se, ymax = score+se), size=1.5, width=.2) + 
  # nrow = 1 puts all panels on one row; scales = "free_x" drops the empty
  # x positions ("missing data") from each panel. Remove it and re-run to see the difference
  facet_wrap(test~coffee, nrow = 1, scales = "free_x") +
  labs(x = " ", y = "Score") +
  theme_bw() + 
  theme(legend.position = "none", 
        axis.title = element_text(size=20),
        axis.text=element_text(size=15, color = "black"), 
        strip.text = element_text(size=20))

Now, there’s probably a cleaner and easier way to do this, but I created 2 new variables to control where the data goes within the panels.
The result is much better because the groups are now easier to compare (red to red, green to green, etc.).
However, we still need to fix the labels.

# divide = panel position: Groups 1 and 2 Yes Coffee go first, Groups 1 and 2 No Coffee go second,
# Groups 3 and 4 Yes Coffee go third, Groups 3 and 4 No Coffee go fourth
mydat$divide = c(1,1,4,4,3,3,2,2)
# order = x-axis position (1 to 8) of each individual point
mydat$order = c(1,2,7,8,5,6,3,4)

ggplot(mydat, aes(x = order, y=score, color=group)) + # x axis is now = order
  geom_point(size=2.5) + 
  geom_errorbar(aes(ymin = score-se, ymax = score+se), size=1.5, width=.2) + 
  facet_wrap(~divide, nrow = 1, scales = "free_x") + # facet_wrap by our new divide variable
  labs(x = " ", y = "Score") +
  theme_bw() + 
  theme(legend.position = "none", 
        axis.title = element_text(size=20),
        axis.text=element_text(size=15, color = "black"), 
        strip.text = element_text(size=20))
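On the "cleaner way" point: one possible alternative is to compute the two helper columns from `group` and `coffee` instead of typing the positions by hand. This is only a sketch, not the method used in this post. The intermediate names (`group_num`, `pair`, `coffee_rank`, `within_pair`) are made up for illustration, and it assumes the group labels always look like "Group 1" to "Group 4".

```r
# Rebuild just the two columns needed here so the snippet runs on its own
mydat <- data.frame(
  group  = c("Group 1", "Group 2", "Group 3", "Group 4",
             "Group 3", "Group 4", "Group 1", "Group 2"),
  coffee = rep(c("Yes Coffee", "No Coffee", "Yes Coffee", "No Coffee"), each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper vectors (names are illustrative only)
group_num   <- as.integer(sub("Group ", "", mydat$group))  # group number 1 to 4
pair        <- ifelse(group_num <= 2, 1, 2)                # Groups 1/2 vs Groups 3/4
coffee_rank <- ifelse(mydat$coffee == "Yes Coffee", 1, 2)  # Yes Coffee panel comes first
within_pair <- ifelse(group_num %% 2 == 1, 1, 2)           # left or right slot in a panel

mydat$divide <- (pair - 1) * 2 + coffee_rank          # panel position, 1 to 4
mydat$order  <- (mydat$divide - 1) * 2 + within_pair  # x-axis position, 1 to 8

mydat
```

With this data the computed columns match the hand-typed vectors, so the plotting code can stay unchanged. The trade-off is that the logic is tied to this specific design of two group pairs and two coffee conditions.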

Here I’m just fixing the labels and spacing.
Now we can see that:
- Group 1 stays about the same with and without coffee
- Group 2 does better without coffee
- Group 3 does better with coffee
- Group 4 does better without coffee
(When adding labels manually, it’s really important to check that they match the data, because it’s easy to make a mistake.)

label_graph = c("Test 1\nYes Coffee", "Test 2\nNo Coffee", "Test 2\nYes Coffee", "Test 1\nNo Coffee") # creates the headers for each of the four panels
names(label_graph) = c(1,2,3,4)

ggplot(mydat, aes(x = order, y=score, color=group)) +
  geom_point(size=2.5) +  
  geom_errorbar(aes(ymin = score-se, ymax = score+se), size=1.5, width=.2) + 
  facet_wrap(~divide, nrow = 1, scales = "free_x", labeller = labeller(divide=label_graph)) + # labeller = the labels I set outside ggplot
  # these are the labels for the x axis: 8 breaks (one per point), with the labels set manually
  scale_x_continuous(breaks = 1:8, 
                       labels = c("Group \n1", "Group \n2", "Group \n1", "Group \n2", "Group \n3", "Group \n4", "Group \n3", "Group \n4")) +
  labs(x = " ", y = "Score") +
  theme_bw() + 
  theme(legend.position = "none", 
        panel.spacing.x = unit(c(1,3,1), "lines"), # spacing between the panels: the wider middle gap separates the Group 1/2 panels from the Group 3/4 panels
        axis.title = element_text(size=20),
        axis.text=element_text(size=15, color = "black"), 
        strip.text = element_text(size=20))
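Since the panel headers and x-axis labels are typed by hand, a quick programmatic cross-check against the data can catch a mismatch. The sketch below rebuilds the relevant columns so it runs on its own. The names `expected_header`, `assigned_header`, `x_labels`, and `assigned_group` are hypothetical helpers, not part of the original code.

```r
# Rebuild the columns needed for the check (values copied from the post)
mydat <- data.frame(
  group  = c("Group 1", "Group 2", "Group 3", "Group 4",
             "Group 3", "Group 4", "Group 1", "Group 2"),
  test   = rep(c("Test 1", "Test 2"), each = 4),
  coffee = rep(c("Yes Coffee", "No Coffee", "Yes Coffee", "No Coffee"), each = 2),
  divide = c(1, 1, 4, 4, 3, 3, 2, 2),
  order  = c(1, 2, 7, 8, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)

label_graph = c("Test 1\nYes Coffee", "Test 2\nNo Coffee", "Test 2\nYes Coffee", "Test 1\nNo Coffee")
names(label_graph) = c(1,2,3,4)

x_labels <- c("Group \n1", "Group \n2", "Group \n1", "Group \n2",
              "Group \n3", "Group \n4", "Group \n3", "Group \n4")

# Panel headers: what each row should say vs. what the manual labels assign
expected_header <- paste(mydat$test, mydat$coffee, sep = "\n")
assigned_header <- unname(label_graph[as.character(mydat$divide)])

# x-axis labels: strip the line break, then compare with the group column
assigned_group <- sub(" \n", " ", x_labels[mydat$order])

# Both checks stop with an error if any manual label disagrees with the data
stopifnot(identical(expected_header, assigned_header))
stopifnot(identical(assigned_group, mydat$group))
```

If either `stopifnot` call fails, one of the hand-typed vectors (`divide`, `order`, `label_graph`, or the x-axis labels) is out of sync with the data.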

Jenny Sloane