The proportion of Australian students completing Year 12 varies considerably with a range of demographic characteristics (e.g., gender, State/Territory, geographic location, socio-economic status, language background, and Indigenous status).
This week, I challenged myself to use R to replicate Figure 1.1 below, which shows Year 12 completion rates for 19-year-olds based on data from the Australian Census of Population and Housing 2016, as reported in Lamb et al. (2020).
Figure 1.1: Percentage of 19-year-olds who have completed a Year 12 or equivalent qualification, by selected background characteristics (2016). Data from Australian Census of Population and Housing 2016, as reported by Lamb et al. (2020).
Figure 2.1 was my solution, depicting Year 12 completion rates by grouped demographic characteristics, with each group incorporated within the graph as a separate facet.
Figure 2.1: Percentage of 19-year-olds who have completed a Year 12 or equivalent qualification, by selected background characteristics (2016). Data from Australian Census of Population and Housing 2016, as reported by Lamb et al. (2020). Replication in R.
First, load packages into R.
# Step 1: Load packages ------
library(tidyverse)
The data used in this example are reproduced in full later in this document. Here are the first few rows:
head(Census2016)
##         Group Characteristic Proportion      Color Order
## 1                  Australia       81.6  Reference     1
## 2      Gender          Males       78.4    Default     3
## 3      Gender        Females       85.0    Default     2
## 4 Indigeneity Non-Indigenous       82.7  Reference     4
## 5 Indigeneity     Indigenous       57.8 Comparator     5
## 6 Indigeneity     Aboriginal       56.8    Default     8
Here is the annotated code I used to generate the above graph.
# First, set the facet order via the factor levels of Group
# (the empty string "" is the unlabelled facet holding the Australia-wide bar)
Census2016$Group <- factor(Census2016$Group,
  levels = c("", "Gender", "State/Territory", "Location",
             "SES deciles (Low to High)", "Indigeneity", "Language background"))

# Second, set the bar order within each facet: levels run in descending value of Order,
# so that after coord_flip() the lowest Order value sits at the top of its facet
Census2016$Characteristic <- factor(Census2016$Characteristic,
  levels = unique(Census2016$Characteristic[order(desc(Census2016$Order))]))

# Next, plot proportions by characteristic, with Group as the facet and Color determining the fill
Census2016 %>%
  ggplot(aes(y = Proportion, x = Characteristic,
             fill = factor(ifelse(Color == "Reference", "Highlight1",
                           ifelse(Color == "Comparator", "Highlight2",
                                  "Normal"))))) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  # naming the values ties each colour to its category rather than to level order
  scale_fill_manual(name = "Color",
                    values = c(Highlight1 = "red", Highlight2 = "black", Normal = "grey50")) +
  facet_grid(rows = vars(Group), scales = "free_y", space = "free_y") +
  coord_flip() +
  theme(panel.spacing = unit(1, "lines")) +
  labs(y = "Percent", x = "") +
  # dashed reference line at the Australia-wide completion rate
  geom_hline(yintercept = 81.6, color = "red", linetype = "longdash") +
  # rounded value printed in white inside the end of each bar
  geom_text(aes(label = round(Proportion, digits = 0)), hjust = 1.6, color = "white", size = 3.5)
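The second factor() call above is what fixes the order of bars within each facet. As a minimal sketch of that step in isolation, the snippet below applies the same idea to the five Indigeneity rows only. It uses base R's order(..., decreasing = TRUE) in place of desc() so that it runs without loading any packages; the object names indigeneity, by_desc_order and bars_top_to_bottom are illustrative and not part of the original code.

```r
# Subset of the data: the Indigeneity group, in its original row order
indigeneity <- data.frame(
  Characteristic = c("Non-Indigenous", "Indigenous", "Aboriginal",
                     "Torres Strait Islander", "Both"),
  Order          = c(4L, 5L, 8L, 6L, 7L),
  stringsAsFactors = FALSE
)

# Base-R stand-in for order(desc(Order)): row indices from highest to lowest Order
by_desc_order <- order(indigeneity$Order, decreasing = TRUE)

indigeneity$Characteristic <- factor(
  indigeneity$Characteristic,
  levels = unique(indigeneity$Characteristic[by_desc_order])
)

# The first level is drawn nearest the origin, i.e. at the bottom after coord_flip(),
# so the reversed levels give the top-to-bottom order of the bars
bars_top_to_bottom <- rev(levels(indigeneity$Characteristic))
print(bars_top_to_bottom)
```

Reading the result from first to last gives the order in which the bars would appear from the top of the facet downwards, which is why a lower Order value places a characteristic higher in the graph.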
Copying the following into R creates a data frame with all the data required to replicate the graph above. Run it before the plotting code, which expects Census2016 to exist already.
Census2016 <- structure(list(
Characteristic = c("Australia", "Males", "Females", "NSW", "Victoria",
"Queensland", "South Australia", "Western Australia", "Tasmania",
"Northern Territory", "Australian Capital Territory", "Major Cities",
"Inner Regional", "Outer Regional", "Remote", "Very Remote",
"Low", "2", "3", "4", "5", "6", "7", "8", "9", "High", "English",
"LBOTE", "Northern European", "Southern European", "Eastern European",
"Southwest and Central Asian", "Southern Asian", "Southeast Asian",
"Eastern Asian", "Australian Indigenous", "Other", "Non-Indigenous",
"Indigenous", "Aboriginal", "Torres Strait Islander", "Both"),
Proportion = c(81.6,
78.4, 85, 80.2, 82.9, 84.4, 79.6, 80.7, 71.2, 58.8, 90.6,
85.1, 72, 70.9, 65, 48.4, 66.8, 74, 76.9, 78.9, 80, 82.4,
84.2, 86.3, 88.7, 91.8, 79.7, 88.3, 79.8, 89, 90, 79.1, 94.9,
88.6, 91.8, 33.3, 79.4, 82.7, 57.8, 56.8, 68.7, 65.3),
Group = c("", "Gender", "Gender", "State/Territory",
"State/Territory", "State/Territory", "State/Territory", "State/Territory",
"State/Territory", "State/Territory", "State/Territory", "Location",
"Location", "Location", "Location", "Location", "SES deciles (Low to High)",
"SES deciles (Low to High)", "SES deciles (Low to High)", "SES deciles (Low to High)",
"SES deciles (Low to High)", "SES deciles (Low to High)", "SES deciles (Low to High)",
"SES deciles (Low to High)", "SES deciles (Low to High)", "SES deciles (Low to High)",
"Language background", "Language background", "Language background",
"Language background", "Language background", "Language background",
"Language background", "Language background", "Language background",
"Language background", "Language background", "Indigeneity",
"Indigeneity", "Indigeneity", "Indigeneity", "Indigeneity"),
Order = c(1L, 3L, 2L, 38L,
36L, 35L, 39L, 37L, 40L, 41L, 34L, 19L, 20L, 21L, 22L, 23L, 33L,
32L, 31L, 30L, 29L, 28L, 27L, 26L, 25L, 24L, 9L, 10L, 16L, 15L,
13L, 17L, 11L, 14L, 12L, 19L, 18L, 4L, 5L, 8L, 6L, 7L),
Color = c("Reference", "Default", "Default",
"Default", "Default", "Default", "Default", "Default", "Default",
"Default", "Default", "Default", "Default", "Default", "Default",
"Default", "Default", "Default", "Default", "Default", "Default",
"Default", "Default", "Default", "Default", "Default", "Reference",
"Comparator", "Default", "Default", "Default", "Default", "Default",
"Default", "Default", "Default", "Default", "Reference", "Comparator",
"Default", "Default", "Default")),
row.names = c(NA, -42L),
class = "data.frame")
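The dashed reference line in the plotting code is drawn at a hard-coded 81.6, the Australia-wide rate. As a minimal sketch, that value could instead be looked up from the data frame once it exists. The helper name reference_rate is hypothetical, and the three-row census_subset is only a stand-in for the full Census2016 so that the snippet runs on its own.

```r
# Hypothetical helper (not part of the original code): return the proportion
# recorded for one characteristic, stopping if it is missing or duplicated
reference_rate <- function(df, characteristic = "Australia") {
  rate <- df$Proportion[df$Characteristic == characteristic]
  stopifnot(length(rate) == 1)
  rate
}

# Stand-in for Census2016: three rows copied from the data above
census_subset <- data.frame(
  Characteristic = c("Australia", "Males", "Females"),
  Proportion     = c(81.6, 78.4, 85),
  Group          = c("", "Gender", "Gender"),
  stringsAsFactors = FALSE
)

national_rate <- reference_rate(census_subset)
print(national_rate)
```

With the full data frame, geom_hline(yintercept = reference_rate(Census2016), ...) could then take the place of the literal 81.6, so the line would follow the data if the figures were ever updated.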
Thanks to the following contributors on Stack Overflow for tips and tricks used in this document:
RoB for pointers on how to group bars using facet_grid within ggplot
Russ Thomas for tips on using ifelse within ggplot to set the fill of bars based on an additional variable in the data frame
Joris Meys for tips on using dput to generate a list of data for use in reproducible examples.
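For readers who have not used dput before, here is a minimal sketch of how it can turn a data frame into pasteable code like the structure(...) call shown earlier. The object names example_rows, dput_text and rebuilt are illustrative; capture.output() and eval(parse()) are used here only to show the round trip within one script, whereas in practice the printed text would simply be copied from the console.

```r
# Small illustrative data frame (values copied from the data above)
example_rows <- data.frame(
  Characteristic = c("Australia", "Males", "Females"),
  Proportion     = c(81.6, 78.4, 85),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object; capture that code as text
dput_text <- paste(capture.output(dput(example_rows)), collapse = "\n")
cat(dput_text, "\n")

# Evaluating the captured text recreates the data frame
rebuilt <- eval(parse(text = dput_text))
print(isTRUE(all.equal(example_rows, rebuilt)))
```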
Lamb, Stephen, Shuyan Huo, Anne Walstab, Andrew Wade, Quentin Maire, Esther Doecke, Jen Jackson, and Zoran Endekov. 2020. “Educational Opportunity in Australia 2020: Who Succeeds and Who Misses Out?” Melbourne: Centre for International Research on Education Systems, Victoria University, for the Mitchell Institute. https://www.vu.edu.au/sites/default/files/educational-opportunity-in-australia-2020.pdf.