Here’s some made-up data: imagine people are divided into 4 groups, and each group takes 2 tests. Groups 1 and 2 drink coffee before Test 1, and Groups 3 and 4 drink coffee before Test 2.
The goal is to create a graph that is easy to interpret.
So first, I’ve created the made-up dataset:
## # A tibble: 8 x 5
##   group   test   coffee     score    se
##   <chr>   <chr>  <chr>      <dbl> <dbl>
## 1 Group 1 Test 1 Yes Coffee    75  4.5
## 2 Group 2 Test 1 Yes Coffee    82  5.09
## 3 Group 3 Test 1 No Coffee     65  5.14
## 4 Group 4 Test 1 No Coffee     88  4.35
## 5 Group 3 Test 2 Yes Coffee    91  5.23
## 6 Group 4 Test 2 Yes Coffee    63  4.78
## 7 Group 1 Test 2 No Coffee     77  3
## 8 Group 2 Test 2 No Coffee     93  5.07
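The post doesn't show the code that built `mydat`. Here is a minimal sketch that reproduces the printed tibble, assuming the tibble package is installed; the original dataset may well have been created differently.

```r
library(tibble)

# Reconstructed row by row from the printed output above (an assumption,
# not the original code)
mydat <- tribble(
  ~group,    ~test,    ~coffee,      ~score, ~se,
  "Group 1", "Test 1", "Yes Coffee", 75,     4.5,
  "Group 2", "Test 1", "Yes Coffee", 82,     5.09,
  "Group 3", "Test 1", "No Coffee",  65,     5.14,
  "Group 4", "Test 1", "No Coffee",  88,     4.35,
  "Group 3", "Test 2", "Yes Coffee", 91,     5.23,
  "Group 4", "Test 2", "Yes Coffee", 63,     4.78,
  "Group 1", "Test 2", "No Coffee",  77,     3,
  "Group 2", "Test 2", "No Coffee",  93,     5.07
)

mydat  # prints the tibble
```

The row order matters later on, because the helper variables used to arrange the panels are assigned by position.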
In a first attempt to plot the data, I’ve simply used facet_wrap to divide the data by test and coffee intake. However, you’ll notice a lot of white space, because each group took each test only once (either with or without coffee), so every panel has empty slots for two of the four groups.
library(ggplot2) # needed for all the plotting functions below

ggplot(mydat, aes(x = group, y = score, color = group)) +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = score - se, ymax = score + se), size = 1.5, width = .2) + # plots the error bars using the made-up standard error (se)
  facet_wrap(test ~ coffee) + # divides the graph by test and coffee intake
  labs(x = " ", y = "Score") +
  theme_bw() + # this is my favorite theme to use
  theme(legend.position = "none", # removes the legend
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 15, color = "black"), # makes axis text larger
        strip.text = element_text(size = 20)) # makes header text larger
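To see where the white space comes from, it can help to count which group appears in which panel. This is a small base R sketch; it rebuilds only the three columns it needs from the printed dataset, and the names `slots` and `empty_slots` are made up for illustration.

```r
# Same rows as the printed dataset; only the columns needed for the count
mydat <- data.frame(
  group  = c("Group 1", "Group 2", "Group 3", "Group 4",
             "Group 3", "Group 4", "Group 1", "Group 2"),
  test   = rep(c("Test 1", "Test 2"), each = 4),
  coffee = rep(c("Yes Coffee", "No Coffee", "Yes Coffee", "No Coffee"), each = 2)
)

# Rows = groups, columns = facet panels; a 0 marks an empty slot in the plot
slots <- table(mydat$group, paste(mydat$test, mydat$coffee, sep = ", "))
print(slots)

# Number of group-by-panel slots that contain no data point
empty_slots <- sum(slots == 0)
```

Each panel holds only two of the four groups, so a shared x axis leaves half of the slots blank.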
The only change in the next version is in the facet_wrap function, where I’ve added two new arguments, nrow and scales.
The resulting graph looks much better, but I’d really like to rearrange the order of the panels to have:
1) Yes Coffee: Group 1/Group 2
2) No Coffee: Group 1/Group 2
3) Yes Coffee: Group 3/Group 4
4) No Coffee: Group 3/Group 4
ggplot(mydat, aes(x = group, y = score, color = group)) +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = score - se, ymax = score + se), size = 1.5, width = .2) +
  # nrow = 1 puts all panels on one row; scales = "free_x" lets each panel drop the groups it has no data for
  # (remove either argument and re-run to see the difference)
  facet_wrap(test ~ coffee, nrow = 1, scales = "free_x") +
  labs(x = " ", y = "Score") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 15, color = "black"),
        strip.text = element_text(size = 20))
Now, there’s probably a cleaner and easier way to do this, but I created 2 new variables to control where the data goes within the panels.
The result is much better, because now I can compare the groups more easily (red to red, green to green, and so on).
However, we still need to fix the labels.
# divide sets the panel each row goes to, in row order:
# Groups 1 and 2 Yes Coffee go first, Groups 3 and 4 No Coffee go fourth,
# Groups 3 and 4 Yes Coffee go third, Groups 1 and 2 No Coffee go second
mydat$divide = c(1, 1, 4, 4, 3, 3, 2, 2)
mydat$order = c(1, 2, 7, 8, 5, 6, 3, 4) # and this tells each point exactly where to go on the x axis

ggplot(mydat, aes(x = order, y = score, color = group)) + # x axis is now order
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = score - se, ymax = score + se), size = 1.5, width = .2) +
  facet_wrap(~divide, nrow = 1, scales = "free_x") + # facet_wrap by our new divide variable
  labs(x = " ", y = "Score") +
  theme_bw() +
  theme(legend.position = "none",
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 15, color = "black"),
        strip.text = element_text(size = 20))
In the final version, I’m just fixing the labels and spacing.
The finished graph shows that:
- Group 1 stays about the same with and without coffee
- Group 2 does better without coffee
- Group 3 does better with coffee
- Group 4 does better without coffee
(When manually adding labels, it’s really important to check that the labels match up with the data, because it’s easy to make a mistake.)
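One way to do that check in code rather than by eye is sketched below in base R. It repeats the `divide` and `order` values and the manual labels used in this post so that it runs by itself; `assigned`, `expected`, and `x_labels` are names invented for this example.

```r
# Columns copied from the dataset and the helper variables used in this post
mydat <- data.frame(
  group  = c("Group 1", "Group 2", "Group 3", "Group 4",
             "Group 3", "Group 4", "Group 1", "Group 2"),
  test   = rep(c("Test 1", "Test 2"), each = 4),
  coffee = rep(c("Yes Coffee", "No Coffee", "Yes Coffee", "No Coffee"), each = 2),
  divide = c(1, 1, 4, 4, 3, 3, 2, 2),
  order  = c(1, 2, 7, 8, 5, 6, 3, 4)
)

# Manual panel headers, keyed by the divide value
label_graph <- c("1" = "Test 1\nYes Coffee", "2" = "Test 2\nNo Coffee",
                 "3" = "Test 2\nYes Coffee", "4" = "Test 1\nNo Coffee")

# Manual x-axis labels for breaks 1 to 8
x_labels <- c("Group \n1", "Group \n2", "Group \n1", "Group \n2",
              "Group \n3", "Group \n4", "Group \n3", "Group \n4")

# Panel header each row will get, versus the header implied by its own data
assigned <- unname(label_graph[as.character(mydat$divide)])
expected <- paste(mydat$test, mydat$coffee, sep = "\n")
headers_ok <- all(assigned == expected)

# x-axis label each row will sit above (line break removed), versus its real group
axis_ok <- all(sub(" \n", " ", x_labels[mydat$order], fixed = TRUE) == mydat$group)

stopifnot(headers_ok, axis_ok)  # errors out if any label is mismatched
```

If either check fails, `stopifnot` raises an error before the mislabeled graph is ever drawn.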
label_graph = c("Test 1\nYes Coffee", "Test 2\nNo Coffee", "Test 2\nYes Coffee", "Test 1\nNo Coffee") # creates the headers for each of the four panels
names(label_graph) = c(1, 2, 3, 4) # the names must match the values of divide

ggplot(mydat, aes(x = order, y = score, color = group)) +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = score - se, ymax = score + se), size = 1.5, width = .2) +
  facet_wrap(~divide, nrow = 1, scales = "free_x", labeller = labeller(divide = label_graph)) + # labeller = the labels I set outside ggplot
  # these are the labels for the x axis: I made 8 breaks and manually set the labels
  scale_x_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7, 8),
                     labels = c("Group \n1", "Group \n2", "Group \n1", "Group \n2", "Group \n3", "Group \n4", "Group \n3", "Group \n4")) +
  labs(x = " ", y = "Score") +
  theme_bw() +
  theme(legend.position = "none",
        panel.spacing.x = unit(c(1, 3, 1), "lines"), # spacing between the panels: the wider middle gap separates the Group 1/2 panels from the Group 3/4 panels
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 15, color = "black"),
        strip.text = element_text(size = 20))
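As for the "cleaner and easier way" mentioned earlier, one possible alternative (not the approach used in this post) is to facet by a single factor whose levels are listed in the desired panel order. The sketch below assumes ggplot2 is installed, rebuilds the data from the printed output, and uses a hypothetical column name, `panel`. It omits the extra panel spacing and larger text sizes to stay short.

```r
library(ggplot2)

# Rebuilt from the printed dataset; a plain data.frame is assumed here
mydat <- data.frame(
  group  = c("Group 1", "Group 2", "Group 3", "Group 4",
             "Group 3", "Group 4", "Group 1", "Group 2"),
  test   = rep(c("Test 1", "Test 2"), each = 4),
  coffee = rep(c("Yes Coffee", "No Coffee", "Yes Coffee", "No Coffee"), each = 2),
  score  = c(75, 82, 65, 88, 91, 63, 77, 93),
  se     = c(4.5, 5.09, 5.14, 4.35, 5.23, 4.78, 3, 5.07)
)

# One header per panel, built from the data itself; the order of the
# factor levels decides the left-to-right order of the panels
mydat$panel <- factor(
  paste(mydat$test, mydat$coffee, sep = "\n"),
  levels = c("Test 1\nYes Coffee", "Test 2\nNo Coffee",
             "Test 2\nYes Coffee", "Test 1\nNo Coffee")
)

p <- ggplot(mydat, aes(x = group, y = score, color = group)) +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = score - se, ymax = score + se), width = .2) +
  facet_wrap(~panel, nrow = 1, scales = "free_x") + # panels follow the factor levels
  labs(x = " ", y = "Score") +
  theme_bw() +
  theme(legend.position = "none")

p  # draws the plot
```

Because the headers and x-axis labels come straight from the data, there is less room for a manual labeling mistake. The trade-off is less direct control over where each point sits on the x axis.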