I started learning R because I needed to do a particular kind of statistics that wasn’t easy in SPSS. I had no idea that it would completely change the way I clean and visualise data, and I definitely had no idea it would make many of the admin jobs that I have to do repeatedly SO MUCH EASIER.
This week I have been pulling myExperience data together for an APAC report and the ggplot2 and patchwork packages have been in high rotation!
First, here is my new setup chunk. I’ve changed it to include = TRUE so that it shows in my knitted document; most of the time you should set it to include = FALSE.
knitr::opts_chunk$set(fig.width = 8, fig.height = 6, fig.path = 'Figs/',
echo = TRUE, warning = FALSE, message = FALSE)
I must work out how to make this my default so I don’t have to copy it in every time. That is probably possible via my .Rprofile.
library(tidyverse)
library(ggeasy)
library(janitor)
library(patchwork)
library(gt)
library(papaja) # for theme_apa()
myExp <- read_csv(here::here("data", "2021-03-30_core_myExp.csv"))
APAC is looking for reassurance that student satisfaction wasn’t adversely affected by the move to online in 2020. Here I am filtering the myExp data to include just the question about quality, grouping by term and year, and getting mean agreement scores.
quality <- myExp %>%
filter(question == "quality")
quality %>%
group_by(term, year) %>%
summarise(mean = mean(agreed, na.rm = TRUE)) %>%
gt()
| term | year | mean |
|---|---|---|
| 1 | 2019 | 87.250 |
| 1 | 2020 | 89.275 |
| 2 | 2019 | 90.850 |
| 2 | 2020 | 88.250 |
| 3 | 2019 | 82.275 |
| 3 | 2020 | 83.600 |
Mean agreement overall seems a little up and a little down… all within the margin of error I would say.
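To put a number on the ups and downs, the same summary can be spread out so that each term has its 2019 and 2020 means side by side. This is only a sketch: `toy` is made-up data standing in for `quality`, and the `y2019`, `y2020` and `change` column names are my own invention for the example.

```r
library(dplyr)
library(tidyr)

# Made-up numbers standing in for the real `quality` data
toy <- tibble(
  term   = c(1, 1, 1, 1, 2, 2),
  year   = c(2019, 2019, 2020, 2020, 2019, 2020),
  agreed = c(80, 90, 85, 95, 70, 75)
)

change <- toy %>%
  group_by(term, year) %>%
  summarise(mean = mean(agreed, na.rm = TRUE), .groups = "drop") %>%
  # one column per year, prefixed so the names are valid R names
  pivot_wider(names_from = year, values_from = mean, names_prefix = "y") %>%
  # positive values mean agreement went up in 2020
  mutate(change = y2020 - y2019)

change
```

Swapping `toy` for `quality` should give one row per term, assuming every term has data for both years.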
Set up year, term, course as factors
# check the data types with glimpse
glimpse(quality)
## Rows: 20
## Columns: 5
## $ course <chr> "PSYC1A", "PSYC2001", "PSYC2061", "PSYC3001", "PSYC3001", "PS…
## $ year <dbl> 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2…
## $ term <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3
## $ question <chr> "quality", "quality", "quality", "quality", "quality", "quali…
## $ agreed <dbl> 93.2, 77.9, 93.0, 84.9, 90.3, 91.4, 85.0, 90.4, 92.2, 84.3, 9…
# make course a factor, check levels
quality$course <- as_factor(quality$course)
levels(quality$course)
## [1] "PSYC1A" "PSYC2001" "PSYC2061" "PSYC3001" "PSYC1B" "PSYC2081"
## [7] "PSYC1111" "PSYC2071" "PSYC2101" "PSYC3011"
# make year a factor, check levels
quality$year <- as_factor(quality$year)
levels(quality$year)
## [1] "2019" "2020"
# make term a factor, check levels
quality$term <- as_factor(quality$term)
levels(quality$term)
## [1] "1" "2" "3"
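The three conversions above could probably be collapsed into a single step with `mutate()` and `across()`. A minimal sketch, using a made-up `toy` tibble rather than the real data:

```r
library(dplyr)
library(forcats)

# Made-up rows standing in for `quality`
toy <- tibble(
  course = c("PSYC2001", "PSYC1A", "PSYC2001"),
  year   = c(2020, 2019, 2019),
  term   = c(1, 1, 2)
)

# Convert all three columns to factors in one go
toy_fct <- toy %>%
  mutate(across(c(course, year, term), as_factor))

levels(toy_fct$course)  # character columns keep the order of first appearance
levels(toy_fct$year)    # numeric columns are ordered numerically
```

It is still worth checking the levels afterwards, because `as_factor()` orders character and numeric columns differently.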
Plot % agree by year, fill by year to compare 2019-2020
quality %>%
ggplot(aes(x = year, y = agreed, fill = year)) +
geom_col(colour = "black") +
facet_wrap(~course, scales = 'free') +
scale_fill_manual(values = c("#A6CEE3", "#1F78B4")) +
scale_y_continuous(limits = c(0,100), expand = c(0,0)) +
theme_apa() +
easy_remove_legend() +
labs(title = "Percent of students agreeing with the statement",
subtitle = "Overall I was satisfied with the quality of this course",
y = "Percent student agreement",
x = "Year")
It would be good to differentiate T1, T2, and T3 courses. Fill bars by term?
quality %>%
ggplot(aes(x = year, y = agreed, fill = term)) +
geom_col(colour = "black") +
facet_wrap(~course, scales = 'free') +
scale_y_continuous(limits = c(0,100), expand = c(0,0)) +
theme_apa() +
labs(title = "Percent of students agreeing with the statement",
subtitle = "Overall I was satisfied with the quality of this course",
y = "Percent student agreement",
x = "Year")
Hmmmm I like that better… but I wonder whether I can make the orange, green, and blue (T1, T2, T3) plots each be on different lines of the facet. Perhaps that is what facet_grid() is for.
If I make a grid by term maybe each plot would end up on the right line?
quality %>%
ggplot(aes(x = year, y = agreed, fill = term)) +
geom_col(colour = "black") +
facet_grid(term~course, scales = 'free') +
scale_y_continuous(limits = c(0,100), expand = c(0,0)) +
theme_apa() +
labs(title = "Percent of students agreeing with the statement",
subtitle = "Overall I was satisfied with the quality of this course",
y = "Percent student agreement",
x = "Year")
Yes… but no… After much googling I came to the conclusion that faceting was not designed to do this.
Maybe I need to make them separately and patch them together with the patchwork package. Here I have left the title and subtitle on the t1 plot and removed all x and y axis labels.
I will need to work out how to add shared x and y axis labels to the patched plot.
t1 <- quality %>%
filter(term == 1) %>%
ggplot(aes(x = year, y = agreed, fill = term)) +
geom_col(colour = "black") +
scale_fill_manual(values=c("#F8766D")) +
facet_wrap(~course, ncol = 4) +
scale_y_continuous(limits = c(0,100), expand = c(0,0)) +
theme_apa() +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
labs(title = "Percent of students agreeing with the statement",
subtitle = "Overall I was satisfied with the quality of this course")
t2 <- quality %>%
filter(term == 2) %>%
ggplot(aes(x = year, y = agreed, fill = term)) +
geom_col(colour = "black") +
scale_fill_manual(values=c("#7CAE00")) +
facet_wrap(~course, ncol = 4) +
scale_y_continuous(limits = c(0,100), expand = c(0,0)) +
theme_apa() +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
easy_remove_legend_title()
t3 <- quality %>%
filter(term == 3) %>%
ggplot(aes(x = year, y = agreed, fill = term)) +
geom_col(colour = "black") +
scale_fill_manual(values=c("#C77CFF")) +
facet_wrap(~course, ncol = 4) +
scale_y_continuous(limits = c(0,100), expand = c(0,0)) +
theme_apa() +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
easy_remove_legend_title()
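The three chunks above differ only in the term and the fill colour, so they could be wrapped in a small function. This is a sketch rather than the code I actually ran: `plot_term()` and its argument names are hypothetical, `toy` is made-up data, and I have left out theme_apa() and the titles so that the example only needs dplyr and ggplot2.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: one faceted bar plot for a single term
plot_term <- function(data, term_value, fill_colour) {
  data %>%
    filter(term == term_value) %>%
    ggplot(aes(x = year, y = agreed, fill = term)) +
    geom_col(colour = "black") +
    scale_fill_manual(values = fill_colour) +
    facet_wrap(~course, ncol = 4) +
    scale_y_continuous(limits = c(0, 100), expand = c(0, 0)) +
    theme(axis.title.x = element_blank(),
          axis.title.y = element_blank())
}

# Made-up data with the same columns as `quality`
toy <- tibble(
  course = c("A", "A", "B", "B"),
  year   = factor(c(2019, 2020, 2019, 2020)),
  term   = factor(c(1, 1, 2, 2)),
  agreed = c(80, 85, 70, 75)
)

toy_t1 <- plot_term(toy, 1, "#F8766D")
toy_t2 <- plot_term(toy, 2, "#7CAE00")
```

Extras such as the title on the first plot can still be added afterwards with `+ labs(...)`.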
The patchwork package makes it so easy to combine ggplots. Here I am using / between each one because I want them one under the other. If you want them side by side you use +
plot <- t1 / t2 / t3
plot
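The operators can also be mixed, with brackets controlling the grouping. A quick sketch with throwaway plots from the built-in mtcars data (nothing to do with the myExperience data):

```r
library(ggplot2)
library(patchwork)

# Three throwaway plots
p1 <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p2 <- ggplot(mtcars, aes(mpg, hp)) + geom_point()
p3 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

stacked  <- p1 / p2 / p3    # one under the other
two_rows <- (p1 | p2) / p3  # p1 and p2 side by side, p3 underneath

two_rows
```

Here `|` is the patchwork operator for placing plots beside each other.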
It would be nice to have shared x and y axis labels… apparently that functionality isn’t in the patchwork package yet, but you can add text via the wrap_elements() function and the grid package.
https://patchwork.data-imaginist.com/articles/guides/assembly.html
For patchwork, the order of the plots tells you their left/right alignment. So here I am putting text on the left of the plot (text + plot). You can position plots underneath each other using /. R evaluates / before +, so text + plot / text is read as text + (plot / text), which gets text to the left and underneath. rot = 90 turns the text 90 degrees.
wrap_elements(grid::textGrob('Percent student agreement', rot = 90)) +
plot / wrap_elements(grid::textGrob('Year'))
OK, that is a start but the positioning isn’t great. How do I move the left text?
In the textGrob() function, the x and y arguments control position. Plain numbers are read in "npc" units, which run from 0 to 1 across the box the text sits in: (0, 0) is the bottom left and (1, 1) is the top right.
For the y axis, x = 0.96 puts the text on the right side of the box, y = 0.5 puts it centered vertically.
For the x axis, x = 0.5 makes it centered and y = 0.90 puts it near the top of its box, which is very close to the bottom of the plot.
wrap_elements(grid::textGrob('Percent student agreement', x = 0.96, y = 0.5, rot = 90)) +
plot / wrap_elements(grid::textGrob('Year', x = 0.5, y = 0.90))
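To see what x and y are doing without the rest of the plot getting in the way, a text grob can be drawn on its own. A small sketch using only the grid package; the object names are made up for the example:

```r
library(grid)

# Plain numbers for x and y are interpreted in "npc" units,
# which run from 0 to 1 across the drawing area
x_label <- textGrob("Year", x = 0.5, y = 0.9)

# The same position written out explicitly with unit()
x_label_explicit <- textGrob("Year", x = unit(0.5, "npc"), y = unit(0.9, "npc"))

# Draw on a blank page to see where the text lands
grid.newpage()
grid.draw(x_label)
```

Changing y from 0.9 to 0.1 and redrawing should move the text from near the top of the page to near the bottom.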
YAY, but now I have this plot that is squished to the right with massive white patches. Maybe the plot_layout() function will help? Here I am specifying that there are 2 columns in the grid and that the left one (containing only the y axis) should be very skinny.
wrap_elements(grid::textGrob('Percent student agreement', x = 0.8, y = 0.6, rot = 90)) +
plot / wrap_elements(grid::textGrob('Year', x = 0.5, y = 0.90)) +
plot_layout(ncol = 2, widths = c(0.2,2))
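Another way to get at the same layout might be to build it in two explicit steps with wrap_plots(), giving the label row and the label column their own small heights and widths. I have not tried this on the real plots; it is a sketch with throwaway mtcars plots, and the object names and size ratios are guesses that would need tweaking.

```r
library(ggplot2)
library(patchwork)

# Two throwaway plots with their own axis titles removed
p1 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot() +
  theme(axis.title = element_blank())
p2 <- ggplot(mtcars, aes(factor(cyl), hp)) + geom_boxplot() +
  theme(axis.title = element_blank())

# Shared labels as wrapped text grobs
y_lab <- wrap_elements(grid::textGrob("Shared y label", rot = 90))
x_lab <- wrap_elements(grid::textGrob("Shared x label"))

# Right-hand column: the plots, with a thin row underneath for the x label
right <- wrap_plots(p1, p2, x_lab, ncol = 1, heights = c(1, 1, 0.1))

# Whole figure: a thin column for the y label, then the right-hand column
labelled <- wrap_plots(y_lab, right, ncol = 2, widths = c(0.1, 1))

labelled
```

Keeping the label row and column thin this way may reduce how much the textGrob() positions need adjusting.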
This took a lot of fiddling. But it works! I ended up needing to adjust the x and y position of the textGrob, but that looks pretty good!
The patchwork package is really cool. It would be nice if it had the functionality to make x and y axis labels that were at the level of the whole plot once it was patched together. I learned a lot about grid graphics from trying to manually add axis labels.
Next step… do the same for elective courses. The nice thing about working this out is that now that I have code that works, I can just substitute different data in and create the same style plots. It is time consuming to work it out the first time, but writing code to automate a manual process that you have to do repeatedly is like putting money in the bank.