1 Background and Context

The proportion of Australian students completing Year 12 varies considerably with a range of demographic characteristics (e.g., gender, State/Territory, geographic location, socio-economic status, language background, and Indigenous status).

This week, I challenged myself to use R to replicate Figure 1.1 below. The figure shows Year 12 completion rates for 19-year-olds, based on data from the Australian Census of Population and Housing 2016 as reported in Lamb et al. (2020).

Figure 1.1: Percentage of 19-year-olds who have completed a Year 12 or equivalent qualification, by selected background characteristics (2016). Data from Australian Census of Population and Housing 2016, as reported by Lamb et al. (2020).

2 Solution

Figure 2.1 shows my solution: Year 12 completion rates by demographic characteristic, with each group of characteristics (e.g., Gender, State/Territory) displayed as a separate facet.

Figure 2.1: Percentage of 19-year-olds who have completed a Year 12 or equivalent qualification, by selected background characteristics (2016). Data from Australian Census of Population and Housing 2016, as reported by Lamb et al. (2020). Replication in R.

3 Step-by-step code for replicating in R

First, load packages into R.

# Step 1: Load packages ------

library(tidyverse)

The data used in this example are reproduced in full in Section 4. Here are the first few rows:

head(Census2016)

##         Group Characteristic Proportion      Color Order
## 1                  Australia       81.6  Reference     1
## 2      Gender          Males       78.4    Default     3
## 3      Gender        Females       85.0    Default     2
## 4 Indigeneity Non-Indigenous       82.7  Reference     4
## 5 Indigeneity     Indigenous       57.8 Comparator     5
## 6 Indigeneity     Aboriginal       56.8    Default     8
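Two of these columns control the appearance of the graph rather than holding data: Order sets the vertical position of each bar, and Color sets its fill. The sketch below applies the same two steps used in the plotting code below to the six rows printed above (re-entered by hand, so `head_rows` is a hypothetical stand-in for Census2016). It shows which factor levels the ordering produces and which colour each bar receives.

```r
library(dplyr)

# The six rows printed above, re-entered by hand (hypothetical stand-in for Census2016)
head_rows <- data.frame(
  Group          = c("", "Gender", "Gender", "Indigeneity", "Indigeneity", "Indigeneity"),
  Characteristic = c("Australia", "Males", "Females", "Non-Indigenous", "Indigenous", "Aboriginal"),
  Proportion     = c(81.6, 78.4, 85.0, 82.7, 57.8, 56.8),
  Color          = c("Reference", "Default", "Default", "Reference", "Comparator", "Default"),
  Order          = c(1, 3, 2, 4, 5, 8)
)

# Order: levels sorted by descending Order, so Order = 1 becomes the LAST level.
# A horizontal bar chart draws the first level at the bottom, so "Australia" ends up on top.
char_levels <- unique(head_rows$Characteristic[order(desc(head_rows$Order))])
print(char_levels)  # Aboriginal, Indigenous, Non-Indigenous, Males, Females, Australia

# Color: the nested ifelse() used inside aes() collapses Color into three fill groups
fill_group <- factor(ifelse(head_rows$Color == "Reference", "Highlight1",
                     ifelse(head_rows$Color == "Comparator", "Highlight2", "Normal")))
print(levels(fill_group))  # alphabetical: Highlight1, Highlight2, Normal

# Unnamed values in scale_fill_manual() are matched to these levels in order
fill_colours <- setNames(c("red", "black", "grey50"), levels(fill_group))
bar_colours <- fill_colours[as.character(fill_group)]
print(bar_colours)
```

Because the unnamed colour vector relies on the alphabetical order of the fill groups, renaming a group (for example, "Normal" to "Default") would silently change which bars are red, black or grey.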

Here is the annotated code I used to generate the above graph.

# first, establish desired sort order for facets

Census2016$Group <- factor(Census2016$Group, 
              levels = c("", "Gender", "State/Territory", "Location", 
                         "SES deciles (Low to High)", "Indigeneity", "Language background"))

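# second, establish desired sort order for demographic characteristics (here: descending value of the Order variable;
# coord_flip() draws the first factor level at the bottom, so Order = 1 appears at the top of each facet)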

Census2016$Characteristic <- factor(Census2016$Characteristic, 
        levels = unique(Census2016$Characteristic[order(desc(Census2016$Order))]))

# next, generate graph of proportions by characteristic, with group as facet, and color determining fill

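Census2016 %>%
  ggplot(aes(y = Proportion, x = Characteristic,
             # recode Color into fill groups: Reference -> Highlight1, Comparator -> Highlight2, else Normal
             fill = factor(ifelse(Color == "Reference", "Highlight1",
                           ifelse(Color == "Comparator", "Highlight2",
                                  "Normal"))))) +
  # geom_col() is equivalent to geom_bar(stat = "identity")
  geom_col(show.legend = FALSE) +
  # named values make the colour assigned to each fill group explicit
  scale_fill_manual(name = "Color",
                    values = c(Highlight1 = "red", Highlight2 = "black", Normal = "grey50")) +
  # one panel per group, each sized to its own number of bars
  facet_grid(rows = vars(Group), scales = "free_y", space = "free_y") +
  # draw the bars horizontally
  coord_flip() +
  theme(panel.spacing = unit(1, "lines")) +
  labs(y = "Percent", x = "") +
  # dashed line at the national completion rate (81.6%)
  geom_hline(yintercept = 81.6, color = "red", linetype = "longdash") +
  # rounded values printed in white just inside the end of each bar
  geom_text(aes(label = round(Proportion, digits = 0)), hjust = 1.6, color = "white", size = 3.5)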
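Since ggplot2 3.3.0, horizontal bars can also be drawn by mapping the categorical variable to y directly, which removes the need for coord_flip(). The sketch below is an alternative to the code above, not the code used to produce Figure 2.1. It runs on a hypothetical five-row subset of the data (`toy`), maps fill straight from Color with a named palette instead of the nested ifelse(), and reads the dashed reference line from the "Australia" row instead of hard-coding 81.6.

```r
library(ggplot2)

# Hypothetical five-row subset of Census2016, re-entered by hand for illustration
toy <- data.frame(
  Group          = c("", "Gender", "Gender", "Indigeneity", "Indigeneity"),
  Characteristic = c("Australia", "Males", "Females", "Non-Indigenous", "Indigenous"),
  Proportion     = c(81.6, 78.4, 85.0, 82.7, 57.8),
  Color          = c("Reference", "Default", "Default", "Reference", "Comparator"),
  Order          = c(1, 3, 2, 4, 5)
)

# Same ordering logic as above: facet order, then descending Order within facets
toy$Group <- factor(toy$Group, levels = c("", "Gender", "Indigeneity"))
toy$Characteristic <- factor(toy$Characteristic,
                             levels = unique(toy$Characteristic[order(-toy$Order)]))

# National rate read from the "Australia" row instead of typed in
national_rate <- toy$Proportion[toy$Characteristic == "Australia"]

p <- ggplot(toy, aes(x = Proportion, y = Characteristic, fill = Color)) +
  # categorical variable on y gives horizontal bars without coord_flip()
  geom_col(show.legend = FALSE) +
  # named palette keyed directly on Color (replaces the nested ifelse())
  scale_fill_manual(values = c(Reference = "red", Comparator = "black", Default = "grey50")) +
  facet_grid(rows = vars(Group), scales = "free_y", space = "free_y") +
  # the value axis is now x, so the reference line is vertical
  geom_vline(xintercept = national_rate, color = "red", linetype = "longdash") +
  geom_text(aes(label = round(Proportion)), hjust = 1.6, color = "white", size = 3.5) +
  labs(x = "Percent", y = NULL) +
  theme(panel.spacing = unit(1, "lines"))

print(p)
```

With this layout, scales = "free_y" and space = "free_y" refer to the axis that actually holds the characteristics, which can make the facet settings easier to reason about.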

4 Data

Running the following code in R creates a data frame containing all the data needed to replicate the figure above.

Census2016 <- structure(list(
  Characteristic = c("Australia", "Males", "Females", "NSW", "Victoria", 
    "Queensland", "South Australia", "Western Australia", "Tasmania", 
    "Northern Territory", "Australian Capital Territory", "Major Cities", 
    "Inner Regional", "Outer Regional", "Remote", "Very Remote", 
    "Low", "2", "3", "4", "5", "6", "7", "8", "9", "High", "English", 
    "LBOTE", "Northern European", "Southern European", "Eastern European", 
    "Southwest and Central Asian", "Southern Asian", "Southeast Asian", 
    "Eastern Asian", "Australian Indigenous", "Other", "Non-Indigenous", 
    "Indigenous", "Aboriginal", "Torres Strait Islander", "Both"), 
  Proportion = c(81.6, 
    78.4, 85, 80.2, 82.9, 84.4, 79.6, 80.7, 71.2, 58.8, 90.6, 
    85.1, 72, 70.9, 65, 48.4, 66.8, 74, 76.9, 78.9, 80, 82.4, 
    84.2, 86.3, 88.7, 91.8, 79.7, 88.3, 79.8, 89, 90, 79.1, 94.9, 
    88.6, 91.8, 33.3, 79.4, 82.7, 57.8, 56.8, 68.7, 65.3), 
  Group = c("", "Gender", "Gender", "State/Territory", 
    "State/Territory", "State/Territory", "State/Territory", "State/Territory", 
    "State/Territory", "State/Territory", "State/Territory", "Location", 
    "Location", "Location", "Location", "Location", "SES deciles (Low to High)", 
    "SES deciles (Low to High)", "SES deciles (Low to High)", "SES deciles (Low to High)", 
    "SES deciles (Low to High)", "SES deciles (Low to High)", "SES deciles (Low to High)", 
    "SES deciles (Low to High)", "SES deciles (Low to High)", "SES deciles (Low to High)", 
    "Language background", "Language background", "Language background", 
    "Language background", "Language background", "Language background", 
    "Language background", "Language background", "Language background", 
    "Language background", "Language background", "Indigeneity", 
    "Indigeneity", "Indigeneity", "Indigeneity", "Indigeneity"), 
  Order = c(1L, 3L, 2L, 38L, 
    36L, 35L, 39L, 37L, 40L, 41L, 34L, 19L, 20L, 21L, 22L, 23L, 33L, 
    32L, 31L, 30L, 29L, 28L, 27L, 26L, 25L, 24L, 9L, 10L, 16L, 15L, 
    13L, 17L, 11L, 14L, 12L, 19L, 18L, 4L, 5L, 8L, 6L, 7L), 
  Color = c("Reference", "Default", "Default", 
    "Default", "Default", "Default", "Default", "Default", "Default", 
    "Default", "Default", "Default", "Default", "Default", "Default", 
    "Default", "Default", "Default", "Default", "Default", "Default", 
    "Default", "Default", "Default", "Default", "Default", "Reference", 
    "Comparator", "Default", "Default", "Default", "Default", "Default", 
    "Default", "Default", "Default", "Default", "Reference", "Comparator", 
    "Default", "Default", "Default")), 
  row.names = c(NA, -42L), 
  class = "data.frame")
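A structure(list(...)) call like the one above is the kind of text that dput() prints (see the Resources below). A minimal sketch of the round trip, using a hypothetical two-row data frame `small` rather than the full dataset: dput() prints code that recreates the object, and evaluating that code rebuilds an equal data frame.

```r
# Hypothetical two-row data frame standing in for Census2016
small <- data.frame(
  Characteristic = c("Australia", "Males"),
  Proportion     = c(81.6, 78.4),
  Group          = c("", "Gender")
)

# Print a structure(...) call that recreates the object; this text can be pasted into a post
dput(small)

# Capture that printed code and evaluate it to rebuild the data frame
rebuilt <- eval(parse(text = capture.output(dput(small))))

# The rebuilt copy should match the original
print(all.equal(small, rebuilt))
```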

5 Resources

Thanks to the following contributors on Stack Overflow for tips and tricks used in this document:

RoB for pointers on how to group bars using facet_grid() within ggplot
Russ Thomas for tips on using ifelse() within ggplot to set the fill of bars based on an additional variable in the data frame
Joris Meys for tips on using dput() to generate a text copy of the data for use in reproducible examples

References

Lamb, Stephen, Shuyan Huo, Anne Walstab, Andrew Wade, Quentin Maire, Esther Doecke, Jen Jackson, and Zoran Endekov. 2020. “Educational Opportunity in Australia 2020: Who Succeeds and Who Misses Out?” Melbourne: Centre for International Research on Education Systems, Victoria University, for the Mitchell Institute. https://www.vu.edu.au/sites/default/files/educational-opportunity-in-australia-2020.pdf.

Show and tell: Grouped bar graphs in ggplot

Lynley Aldridge