1 Introduction

The 2020 RStudio Table Contest was announced on September 15th, and submissions are due on October 31st.

Our data science team (JITeam) at the Jonglei Institute of Technology (the first South Sudanese online educational platform, which aspires to train the next generation of South Sudanese data scientists and data analysts for free) is thrilled to participate in this exciting contest. JITeam comprises three members: Alier Reng, Head of the Data Science Program & President of the Jonglei Institute of Technology; Luka Chol Awan, TA & student; and Nazrul Islam, student.

Our team elected to showcase the gt package’s elegant features using South Sudan’s 2008 Census Dataset, obtained from the South Sudan Data Portal. Instead of creating just a table, we opted to create tutorials to help other data science enthusiasts, aspiring data scientists, and data analysts learn to use the gt package in their own data science projects.

South Sudan, the world’s youngest country, gained its independence from Sudan in 2011. According to Wikipedia, South Sudan has a population of 10.98 million; however, the 2008 census dataset we’re using for this contest records a population of 8.26 million.

Below is the map of South Sudan.

South Sudan Map (Credits: Wikipedia)

2 Details About the Contest

We obtained the information below about the 2020 RStudio Table Contest here.

2.0.1 Contest Judging Criteria

Tables will be judged based on technical merit, artistic design, and quality of documentation. We recognize that some tables may excel in only one category and others in more than one or all categories. Honorable mentions will be awarded with this in mind.

We are working with maintainers of many of the R community’s most popular R packages for building tables, including Yihui Xie of DT, Rich Iannone of gt, Greg Lin of reactable, David Gohel of flextable, David Hugh-Jones of huxtable, and Hao Zhu of kableExtra. Many of these maintainers will help review submissions built with their packages.

2.0.2 Requirements

A submission must include all code and data used to replicate your entry. This may be a fully knitted R Markdown document with code (for example published to RPubs or shinyapps.io), a repository, or rstudio.cloud project.

A submission can use any table-making package available in R, not just the ones mentioned above.

Submission Types - We are looking for three types of table submissions:

Single Table Example: This may highlight interesting structuring of content, useful and tricky features – for example, enabling interaction – or serve as an example of a common table popular in a specific field. Be sure to document your code for clarity.
Tutorials: It’s all about teaching us how to craft an excellent table or understand a package’s features. This may include several tables and narrative.
Other: For submissions that do not easily fit into one of the types above.

Category - Given that tables have different features and purposes, we’d also like you to further categorize the submission table. There are four categories: static-HTML, interactive-HTML, static-print, and interactive-Shiny. Simply choose the one that best fits your table.

You can submit your entry for the contest by filling the form at rstd.io/table-contest-2020. The form will generate a post on RStudio Community, which you can then edit further at a later date. You may make multiple entries.

The deadline for submissions is October 31st, 2020, at midnight Pacific Time.

3 Tabulating 2008 South Sudan Census Dataset With the `gt` Package

3.1 Loading the packages

Here we load only two packages: tidyverse and gt. The vroom package, which we call below as vroom::vroom(), must also be installed.

library(tidyverse)
library(gt)

3.2 Importing the data

In this section, we use the vroom package; however, we could also have used the readr package.

# Import the data
ss_2008_census_data_raw <- vroom::vroom("00_Data/ss_2008_census_data_raw.csv")

# View the first 5 rows
slice_head(ss_2008_census_data_raw, n = 5)

## # A tibble: 5 x 10
##   Region `Region Name` `Region - Regio… Variable `Variable Name` Age  
##   <chr>  <chr>         <chr>            <chr>    <chr>           <chr>
## 1 KN.A2  Upper Nile    SS-NU            KN.B2    Population, To… KN.C1
## 2 KN.A2  Upper Nile    SS-NU            KN.B2    Population, To… KN.C2
## 3 KN.A2  Upper Nile    SS-NU            KN.B2    Population, To… KN.C3
## 4 KN.A2  Upper Nile    SS-NU            KN.B2    Population, To… KN.C4
## 5 KN.A2  Upper Nile    SS-NU            KN.B2    Population, To… KN.C5
## # … with 4 more variables: `Age Name` <chr>, Scale <chr>, Units <chr>,
## #   `2008` <dbl>
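As a rough illustration of the readr alternative mentioned above, here is a minimal sketch. It does not read the real census file; instead it writes a tiny made-up CSV to a temporary file (the two rows and their values are invented for illustration) and reads it back with readr::read_csv(). The column names mimic three of the raw columns, including the non-syntactic `2008` name.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file (values are hypothetical)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Region Name,Age Name,2008",
    "Upper Nile,0 to 4,100",
    "Upper Nile,5 to 9,200"),
  csv_path
)

# Read it back; col_types = cols() lets readr guess types without printing a message
toy_raw <- read_csv(csv_path, col_types = cols())

# Non-syntactic column names such as `2008` must be wrapped in backticks
sum(toy_raw$`2008`)
```

For the real dataset, the same call would presumably just take the path used in the vroom example above.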

Below, we see three rows with NAs; however, these rows do not add any value to our analyses, so we’ll delete them in the following section.

# View the last 10 rows
slice_tail(ss_2008_census_data_raw, n = 10)

## # A tibble: 10 x 10
##    Region `Region Name` `Region - Regio… Variable `Variable Name` Age  
##    <chr>  <chr>         <chr>            <chr>    <chr>           <chr>
##  1 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C9
##  2 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C…
##  3 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C…
##  4 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C…
##  5 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C…
##  6 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C…
##  7 KN.A11 Eastern Equa… SS-EE            KN.B8    Population, Fe… KN.C…
##  8 <NA>   <NA>          <NA>             <NA>     <NA>            <NA> 
##  9 Sourc… National Bur… <NA>             <NA>     <NA>            <NA> 
## 10 Downl… http://south… <NA>             <NA>     <NA>            <NA> 
## # … with 4 more variables: `Age Name` <chr>, Scale <chr>, Units <chr>,
## #   `2008` <dbl>
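The trailing rows above are file metadata (a blank line, a source line, and a download link) rather than census records, which is why several columns are NA for them. A minimal sketch of dropping such rows, using a made-up three-row data frame rather than the real data, might look like this:

```r
library(dplyr)

# Toy stand-in for the raw data: one real record plus two metadata rows
# (all values here are hypothetical)
toy_raw <- data.frame(
  `Region Name`   = c("Upper Nile", NA, "Source"),
  `Variable Name` = c("some variable", NA, NA),
  `2008`          = c(100, NA, NA),
  check.names     = FALSE
)

# Keep only rows that carry an actual variable name
toy_clean <- toy_raw %>%
  filter(!is.na(`Variable Name`))

nrow(toy_clean)
```

In the tutorial itself, the same effect is achieved in the next section by filtering on the Gender column after splitting the variable name.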

3.3 Wrangling the data

Now that we’ve imported our dataset and have inspected the first and the last few rows, we will wrangle the data to make it tidy.

# Subset the data
ss_2008_census_data_tbl <- ss_2008_census_data_raw %>% 
  
  # Select only the desired columns
  select(State          = `Region Name`, 
         Category       = `Variable Name`,
         `Age Category` = `Age Name`, 
         population     = `2008`) %>% 
  
  # Split the Category column
  separate(Category,
           into = c("Pop.", "Gender", "Other"),
           sep  = " ") %>% 
  
  # Delete Pop. and Other columns
  select(-Pop., -Other) %>% 
  
  # Delete NAs using the Gender column
  filter(!is.na(Gender),
         Gender         != "Total",
         `Age Category` != "Total") %>% 
  
  # Manually collapsing factor levels with fct_collapse()
  mutate(
    `Age Category` = fct_collapse(`Age Category`,
      `0-19`       = c("0 to 4", "5 to 9", "10 to 14", "15 to 19"),
      `20-34`      = c( "20 to 24", "25 to 29", "30 to 34"),
      `35-49`      = c("35 to 39", "40 to 44", "45 to 49"),
      `50-64`      = c( "50 to 54", "55 to 59", "60 to 64"),
      # The printed output below shows this level as "65+", so we match that spelling
      `65+`        = "65+")) %>% 
  
  # Group by state, gender, and age category, and summarize
  # (.groups = "drop" already returns an ungrouped tibble)
  group_by(State, Gender, `Age Category`) %>% 
  summarize(Population = sum(population),
            .groups    = "drop") %>% 
  
  # Add the region column
  mutate(Region = case_when(
    State %in% c("Central Equatoria", "Eastern Equatoria", "Western Equatoria")          ~ "Equatoria",
    State %in% c("Warrap", "Western Bahr el Ghazal", "Northern Bahr el Ghazal", "Lakes") ~ "Bahr el Ghazal",
    TRUE ~ "Upper Nile"),
    
    # Place this column before the State column
         .before = "State")
  
# View the first 15 rows
ss_2008_census_data_tbl %>% slice_head(n = 15)

## # A tibble: 15 x 5
##    Region    State             Gender `Age Category` Population
##    <chr>     <chr>             <chr>  <fct>               <dbl>
##  1 Equatoria Central Equatoria Female 0-19               283092
##  2 Equatoria Central Equatoria Female 20-34              139942
##  3 Equatoria Central Equatoria Female 35-49               66745
##  4 Equatoria Central Equatoria Female 50-64               23460
##  5 Equatoria Central Equatoria Female 65+                  8596
##  6 Equatoria Central Equatoria Male   0-19               308935
##  7 Equatoria Central Equatoria Male   20-34              153332
##  8 Equatoria Central Equatoria Male   35-49               79238
##  9 Equatoria Central Equatoria Male   50-64               28808
## 10 Equatoria Central Equatoria Male   65+                 11409
## 11 Equatoria Eastern Equatoria Female 0-19               243642
## 12 Equatoria Eastern Equatoria Female 20-34              111079
## 13 Equatoria Eastern Equatoria Female 35-49               57120
## 14 Equatoria Eastern Equatoria Female 50-64               20496
## 15 Equatoria Eastern Equatoria Female 65+                  8637
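To make the age-band step above more concrete, here is a small sketch of forcats::fct_collapse() on a made-up factor. The level spellings are assumptions for illustration; the real raw levels should be checked with levels() or unique() before collapsing, because fct_collapse() only warns (and leaves the level unchanged) when a named level does not exist.

```r
library(forcats)

# Toy age factor (spellings are illustrative, not taken from the raw file)
age_raw <- factor(c("0 to 4", "5 to 9", "20 to 24", "25 to 29", "65+"))

# Collapse fine-grained age groups into wider bands;
# levels that are not mentioned (here "65+") are kept as they are
age_band <- fct_collapse(
  age_raw,
  `0-19`  = c("0 to 4", "5 to 9"),
  `20-34` = c("20 to 24", "25 to 29")
)

# Count observations per collapsed band
table(age_band)
```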

3.4 Tabulating the Data With the `gt` Package

In this section, we’ll tabulate the dataset and place the results in two separate tabs using .tabset - this saves space by arranging outputs horizontally on the page.

3.4.1 Population by State

In this section, we’ll tabulate only the states’ total populations (in persons).

# Subset the dataset to extract the state totals
state_pop_gt <- ss_2008_census_data_tbl %>% 
  
  # Group by region and state columns; summarize
  group_by(Region, State) %>% 
  summarize(Population = sum(Population),
            .groups = "drop") %>% 
  
  # Arrange the data in descending order by population
  arrange(desc(Population)) %>% 
  
  # Exclude the region
  select(-Region) %>% 
  
  # Initialize a gt table
  gt() %>% 
  
  # Add a spanner to group the columns
  tab_spanner(
    label   = "State Population in Descending Order",
    columns = 2) %>% 
  
  # Add a title and a subtitle
  tab_header(
    title    = "South Sudan 2008 Population by State",
    subtitle = "Jonglei State has the largest population") %>% 
  
  # Add a grand summary row with the column total
  grand_summary_rows(
    columns   = 2,
    fns       = list(Total = ~sum(.)),
    formatter = fmt_number
  ) %>% 
  
  # Add the background styling - highlight the greatest state population with a green color
  tab_style(
    style             = list(
      cell_fill(color = "#4caf50"),
      cell_text(color = "white")
    ),
    locations         = cells_body(
      columns         = vars(Population),
      rows            = Population == max(Population))) %>% 
  
    # Add the background styling - highlight the two middle (median) state populations with an orange color
  tab_style(
    style             = list(
      cell_fill(color = "#ff8c00"),
      cell_text(color = "white")
    ),
    locations         = cells_body(
      columns         = vars(Population),
      # Compare against numbers, not strings, so the match does not depend on type coercion
      rows            = Population %in% c(720898, 906161))) %>% 
  
    # Add the background styling - highlight the minimum state population with a red color
  tab_style(
    style             = list(
      cell_fill(color = "#DC6140"),
      cell_text(color = "white")
    ),
    locations         = cells_body(
      columns         = vars(Population),
      rows            = Population == min(Population))) %>% 
  
  # Apply a gray background to the header
  tab_options(
    heading.background.color = "gray"
  ) %>% 
  
   # Add a footnote and a source note
   tab_footnote(
     footnote  = "gt Tutorials by JITeam, Jonglei Institute of Technology (www.jongleiinstitute.com)",
     locations = cells_column_labels(
       columns = 2)
  ) %>% 
  
  tab_source_note(
    source_note = "Data source: South Sudan Data Portal"
  )
  
# Display the table
state_pop_gt

South Sudan 2008 Population by State
Jonglei State has the largest population
	State	State Population in Descending Order
	State	Population¹
	Jonglei	1358602
	Central Equatoria	1103557
	Warrap	972928
	Upper Nile	964353
	Eastern Equatoria	906161
	Northern Bahr el Ghazal	720898
	Lakes	695730
	Western Equatoria	619029
	Unity	585801
	Western Bahr el Ghazal	333431
Total	—	8,260,490.00
Data source: South Sudan Data Portal
¹ gt Tutorials by JITeam, Jonglei Institute of Technology (www.jongleiinstitute.com)
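With ten states there is no single median row, so the table highlights the two middle values. The sketch below shows one way those two values could be derived rather than typed in by hand. It uses only base R and the state totals copied from the table above; the variable names are ours.

```r
# State totals copied from the table above
state_pop <- c(
  "Jonglei"                 = 1358602,
  "Central Equatoria"       = 1103557,
  "Warrap"                  = 972928,
  "Upper Nile"              = 964353,
  "Eastern Equatoria"       = 906161,
  "Northern Bahr el Ghazal" = 720898,
  "Lakes"                   = 695730,
  "Western Equatoria"       = 619029,
  "Unity"                   = 585801,
  "Western Bahr el Ghazal"  = 333431
)

# Sort ascending and pick the two middle elements (n is even here)
sorted_pop <- sort(state_pop)
n          <- length(sorted_pop)
middle_two <- sorted_pop[c(n / 2, n / 2 + 1)]

middle_two          # the two states highlighted in orange
median(state_pop)   # the average of those two values
sum(state_pop)      # should agree with the grand total row
```

The `middle_two` values could then replace the hard-coded numbers in the `rows` condition, which would keep the highlighting correct if the data changed.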

3.4.2 Population by State and Gender

In this section, we’ll tabulate South Sudan’s 2008 population by state and gender. Further, we’ll use the colors of the flag of South Sudan (red, black, green, blue, and orange in place of yellow) to illustrate how to apply different background colors to a gt table.

# Subset the dataset
ss_2008_census_gt_1 <- ss_2008_census_data_tbl %>% 
  
  # Pivot the data
  pivot_wider(
      names_from  = `Age Category`,
      values_from = Population) %>% 
  
  # Arrange the data by region, then in descending order of the 0-19 population
  arrange(Region, desc(`0-19`)) %>% 
  
  # Exclude the region from the table
  select(-Region) %>%
  
  # Initialize a gt table
  gt() %>% 
  
  # Add a title and a subtitle
  tab_header(
    title    = "South Sudan 2008 Population by Gender and State",
    subtitle = "Population by Gender and Age Groups") %>% 
  
  # Create row groups, one per former region
  # Bahr el Ghazal Region
  tab_row_group(
    group = "Bahr el Ghazal",
    rows  = 1:8) %>% 
  
  # Equatoria Region
  tab_row_group(
    group = "Equatoria",
    rows  = 9:14) %>% 
  
  # Upper Nile Region
  tab_row_group(
    group = "Upper Nile",
    rows  = 15:20) %>% 
  
   # Add the spanners
  tab_spanner(
     label   = "Population by Gender & Age Category",
     columns = 2:7) %>% 
  
  tab_spanner(
     label   = "States by Former Regions",
     columns = 1) %>% 
  
  # Add the row grand summaries
  grand_summary_rows(
     columns   = 3:7,
     fns       = list(Totals = ~ sum(.)),
     formatter = fmt_number) %>% 
  
  # Style the table
  tab_options(heading.background.color       = "#ff8c00",
              column_labels.background.color = "gray") %>% 
  
  # Upper Nile Region
  tab_style(
    style             = list(
      cell_fill(color = "black"),
      cell_text(color = "white")),
    locations         = cells_body(
      columns         = 3:7,  # the table has only 7 columns once Region is dropped
      rows            = 15:20)) %>% 
  
  # Equatoria Region
    tab_style(
    style             = list(
      cell_fill(color = "#DC6140"),
      cell_text(color = "white")),
    locations         = cells_body(
      columns         = 3:7,  # the table has only 7 columns once Region is dropped
      rows            = 9:14)) %>% 
  
  # Bahr el Ghazal Region
    tab_style(
    style             = list(
      cell_fill(color = "#4caf50"),
      cell_text(color = "white")),
    locations         = cells_body(
      columns         = 3:7,  # the table has only 7 columns once Region is dropped
      rows            = 1:8)) %>% 
  
  # Gender column (blue background for all rows)
    tab_style(
    style             = list(
      cell_fill(color = "#5077E0"),
      cell_text(color = "white")),
    locations         = cells_body(
      columns         = 2,
      rows            = 1:20)) %>% 
  
   # Add the footnote and source note
   tab_footnote(
     footnote  = "`gt` Tutorials by JITeam, The Jonglei Institute of Technology (www.jongleiinstitute.com)",
     locations = cells_column_labels(
       columns = 2:7)) %>% 
  
  tab_source_note(
    source_note = "Data source: South Sudan Data Portal"
  )
  
# Display the table
ss_2008_census_gt_1

South Sudan 2008 Population by Gender and State
Population by Gender and Age Groups
	States by Former Regions	Population by Gender & Age Category
	State	Gender¹	0-19¹	20-34¹	35-49¹	50-64¹	65+¹
Upper Nile
	Jonglei	Male	419182	157319	90925	44243	22658
	Jonglei	Female	329048	164193	87198	31452	12384
	Upper Nile	Male	294848	113552	70681	30603	15746
	Upper Nile	Female	237435	108924	60058	22362	10144
	Unity	Male	179616	62313	34091	15228	8999
	Unity	Female	163798	66837	33267	13851	7801
Equatoria
	Central Equatoria	Male	308935	153332	79238	28808	11409
	Central Equatoria	Female	283092	139942	66745	23460	8596
	Eastern Equatoria	Male	274404	99862	55139	23254	12528
	Eastern Equatoria	Female	243642	111079	57120	20496	8637
	Western Equatoria	Male	162324	77197	47857	19524	11541
	Western Equatoria	Female	148059	83592	45314	16252	7369
Bahr el Ghazal
	Warrap	Male	275805	94888	63010	24686	12345
	Warrap	Female	273397	127170	66936	24066	10625
	Northern Bahr el Ghazal	Male	204291	63709	45635	21132	13523
	Northern Bahr el Ghazal	Female	200375	89179	48861	21608	12585
	Lakes	Male	198581	87219	49536	20444	10100
	Lakes	Female	176918	86832	42932	16772	6396
	Western Bahr el Ghazal	Male	92265	45326	26307	8971	4171
	Western Bahr el Ghazal	Female	83151	41467	20767	7479	3527
Totals	—	—	4,549,166.00	1,973,932.00	1,091,617.00	434,691.00	211,084.00
Data source: South Sudan Data Portal
¹ `gt` Tutorials by JITeam, The Jonglei Institute of Technology (www.jongleiinstitute.com)
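The row ranges used above (1:8, 9:14, and 15:20) are typed in by hand, which works only as long as the row order stays the same. As a hedged sketch of an alternative, the row positions for each region could be computed from the Region column before it is dropped. The vector below simply reproduces the region order produced by arrange(Region, ...): eight Bahr el Ghazal rows, six Equatoria rows, and six Upper Nile rows.

```r
# Region of each row after arrange(Region, ...), two rows (Male/Female) per state
regions <- c(
  rep("Bahr el Ghazal", 8),
  rep("Equatoria", 6),
  rep("Upper Nile", 6)
)

# Split the row positions by region name
row_ids <- split(seq_along(regions), regions)

row_ids[["Bahr el Ghazal"]]
row_ids[["Equatoria"]]
row_ids[["Upper Nile"]]
```

Each element of `row_ids` could then be passed to the `rows` argument in place of the literal ranges.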

4 Closing Remarks

In this article, we’ve demonstrated how to wrangle data with dplyr and shown how to tabulate the data with the gt package. The 2020 RStudio Table Contest asks participants to choose any R table package and highlight its prominent features. Our team therefore decided to write a tutorial with two tables so that others may benefit from our project.

5 Acknowledgements

We thank RStudio for allowing us to showcase our R skills in the form of the gt tutorials. We hope that our work will benefit other aspiring data scientists, data analysts, data enthusiasts, and everyone else who wants to learn R programming, particularly the gt package.

By the same token, we thank both DataCamp and Business Science University, for without their amazing courses and tutorials, we would not have been able to complete this project.

Lastly, I would like to thank both Luka Awan and Nazrul Islam for teaming up with me on this project to represent the Jonglei Institute of Technology - this is our first competition, and we hope to participate in more in the future.

Thank you once again, RStudio, for the opportunity. We hope that your users and learners will find this work beneficial.

Kind regards,

Alier Reng, Head of the Data Science Program & President

Luka Chol D’Awan, TA & Student

Nazrul Islam, Student

2020 RStudio Table Contest

JITeam

2020-10-26