This project is our entry for the 2020 RStudio Table Contest, which was announced on September 15th; submissions are due on October 31st.
Our data science team (JITeam) at the Jonglei Institute of Technology (the first South Sudanese online educational platform aspiring to train the next generation of South Sudanese data scientists and data analysts for free) is thrilled to participate in this exciting contest. JITeam comprises three members: Alier Reng, the Head of the Data Science Program & President of Jonglei Institute of Technology; Luka Chol Awan, TA & student; and Nazrul Islam, student.
Our team elected to showcase the gt package’s elegant features using South Sudan’s 2008 Census Dataset obtained here. Instead of creating just a table, we opted to create tutorials to help other data science enthusiasts, aspiring data scientists, and data analysts learn to implement the gt package in their data science projects.
South Sudan, the world’s youngest country, gained its independence from Sudan in 2011. According to Wikipedia, South Sudan has a population of 10.98 million; however, the 2008 census dataset we’re using for this contest records a population of 8.26 million.
Below is the map of South Sudan.
South Sudan Map (Credits: Wikipedia)
We obtained the information below about the 2020 RStudio Table Contest here.
Tables will be judged based on technical merit, artistic design, and quality of documentation. We recognize that some tables may excel in only one category and others in more than one or all categories. Honorable mentions will be awarded with this in mind.
We are working with maintainers of many of the R community’s most popular R packages for building tables, including Yihui Xie of DT, Rich Iannone of gt, Greg Lin of reactable, David Gohel of flextable, David Hugh-Jones of huxtable, and Hao Zhu of kableExtra. Many of these maintainers will help review submissions built with their packages.
A submission must include all code and data used to replicate your entry. This may be a fully knitted R Markdown document with code (for example published to RPubs or shinyapps.io), a repository, or rstudio.cloud project.
A submission can use any table-making package available in R, not just the ones mentioned above.
Submission Types - We are looking for three types of table submissions:
Single Table Example: This may highlight interesting structuring of content, useful and tricky features – for example, enabling interaction – or serve as an example of a common table popular in a specific field. Be sure to document your code for clarity.
Tutorials: It’s all about teaching us how to craft an excellent table or understand a package’s features. This may include several tables and narrative.
Other: For submissions that do not easily fit into one of the types above.
Category - Given that tables have different features and purposes, we’d also like you to further categorize the submission table. There are four categories, static-HTML, interactive-HTML, static-print, and interactive-Shiny. Simply choose the one that best fits your table.
You can submit your entry for the contest by filling the form at rstd.io/table-contest-2020. The form will generate a post on RStudio Community, which you can then edit further at a later date. You may make multiple entries.
The deadline for submissions is October 31st, 2020, at midnight Pacific Time.
gt Package
Here we load only two packages: tidyverse and gt.
library(tidyverse)
library(gt)
In this section, we use the vroom package to import the data; we could also have used the readr package.
# Import the data
ss_2008_census_data_raw <- vroom::vroom("00_Data/ss_2008_census_data_raw.csv")
# View the first 5 rows
slice_head(ss_2008_census_data_raw, n = 5)
## # A tibble: 5 x 10
## Region `Region Name` `Region - Regio… Variable `Variable Name` Age
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 KN.A2 Upper Nile SS-NU KN.B2 Population, To… KN.C1
## 2 KN.A2 Upper Nile SS-NU KN.B2 Population, To… KN.C2
## 3 KN.A2 Upper Nile SS-NU KN.B2 Population, To… KN.C3
## 4 KN.A2 Upper Nile SS-NU KN.B2 Population, To… KN.C4
## 5 KN.A2 Upper Nile SS-NU KN.B2 Population, To… KN.C5
## # … with 4 more variables: `Age Name` <chr>, Scale <chr>, Units <chr>,
## # `2008` <dbl>
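As a rough sketch of the readr alternative mentioned above, the snippet below writes a tiny stand-in CSV to a temporary file and reads it back with `readr::read_csv()`. The file contents and the name `toy_raw` are made up for illustration; they are not the census file.

```r
library(readr)

# Write a tiny stand-in CSV to a temporary file (illustrative values only)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(
  c("Region Name,Age Name,2008",
    "Upper Nile,0 to 4,100",
    "Upper Nile,5 to 9,60"),
  tmp_csv
)

# read_csv() keeps non-syntactic column names such as `Region Name` and `2008`
toy_raw <- read_csv(tmp_csv, show_col_types = FALSE)
toy_raw
```

Pointing `read_csv()` at "00_Data/ss_2008_census_data_raw.csv" should give a tibble comparable to the one vroom returns, assuming the file parses the same way.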
Below, we see three rows with NAs; however, these rows do not add any value to our analyses, so we’ll delete them in the following section.
# View the last 10 rows
slice_tail(ss_2008_census_data_raw, n = 10)
## # A tibble: 10 x 10
## Region `Region Name` `Region - Regio… Variable `Variable Name` Age
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C9
## 2 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C…
## 3 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C…
## 4 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C…
## 5 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C…
## 6 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C…
## 7 KN.A11 Eastern Equa… SS-EE KN.B8 Population, Fe… KN.C…
## 8 <NA> <NA> <NA> <NA> <NA> <NA>
## 9 Sourc… National Bur… <NA> <NA> <NA> <NA>
## 10 Downl… http://south… <NA> <NA> <NA> <NA>
## # … with 4 more variables: `Age Name` <chr>, Scale <chr>, Units <chr>,
## # `2008` <dbl>
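A minimal sketch of how such trailing rows can be dropped, using a made-up four-row tibble (`toy_tail`) in place of the real data. The footer labels are placeholders, not the actual text in the file.

```r
library(dplyr)

# Hypothetical stand-in: one data row followed by an empty row and two footer rows
toy_tail <- tibble(
  Region   = c("KN.A11", NA, "footer note 1", "footer note 2"),
  Variable = c("KN.B8", NA, NA, NA),
  `2008`   = c(100, NA, NA, NA)
)

# Keep only rows where a column that every real record fills in is not NA
toy_clean <- toy_tail %>%
  filter(!is.na(Variable))

nrow(toy_clean)
```

Filtering on a column that is always populated for real records removes the empty row and both footer rows in one step.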
Now that we’ve imported our dataset and have inspected the first and the last few rows, we will wrangle the data to make it tidy.
# Subset the data
ss_2008_census_data_tbl <- ss_2008_census_data_raw %>%
  # Select only the desired columns
  select(State = `Region Name`,
         Category = `Variable Name`,
         `Age Category` = `Age Name`,
         population = `2008`) %>%
  # Split the Category column on spaces
  separate(Category,
           into = c("Pop.", "Gender", "Other"),
           sep = " ") %>%
  # Delete Pop. and Other columns
  select(-Pop., -Other) %>%
  # Drop the NA footer rows (via the Gender column) and the precomputed totals
  filter(!is.na(Gender),
         Gender != "Total",
         `Age Category` != "Total") %>%
  # Manually collapse factor levels with fct_collapse()
  mutate(
    `Age Category` = fct_collapse(`Age Category`,
      `0-19` = c("0 to 4", "5 to 9", "10 to 14", "15 to 19"),
      `20-34` = c("20 to 24", "25 to 29", "30 to 34"),
      `35-49` = c("35 to 39", "40 to 44", "45 to 49"),
      `50-64` = c("50 to 54", "55 to 59", "60 to 64"),
      # The output below prints this level as 65+, so the raw label may differ from "65 +"
      `>= 65` = "65 +")) %>%
  # Group by state, gender, and age category, and summarize
  # (.groups = "drop" already returns an ungrouped tibble)
  group_by(State, Gender, `Age Category`) %>%
  summarize(Population = sum(population),
            .groups = "drop") %>%
  # Add the region column
  mutate(Region = case_when(
    State %in% c("Central Equatoria", "Eastern Equatoria", "Western Equatoria") ~ "Equatoria",
    State %in% c("Warrap", "Western Bahr el Ghazal", "Northern Bahr el Ghazal", "Lakes") ~ "Bahr el Ghazal",
    TRUE ~ "Upper Nile"),
    # Place this column before the State column
    .before = "State")
# View the first 15 rows
ss_2008_census_data_tbl %>% slice_head(n = 15)
## # A tibble: 15 x 5
## Region State Gender `Age Category` Population
## <chr> <chr> <chr> <fct> <dbl>
## 1 Equatoria Central Equatoria Female 0-19 283092
## 2 Equatoria Central Equatoria Female 20-34 139942
## 3 Equatoria Central Equatoria Female 35-49 66745
## 4 Equatoria Central Equatoria Female 50-64 23460
## 5 Equatoria Central Equatoria Female 65+ 8596
## 6 Equatoria Central Equatoria Male 0-19 308935
## 7 Equatoria Central Equatoria Male 20-34 153332
## 8 Equatoria Central Equatoria Male 35-49 79238
## 9 Equatoria Central Equatoria Male 50-64 28808
## 10 Equatoria Central Equatoria Male 65+ 11409
## 11 Equatoria Eastern Equatoria Female 0-19 243642
## 12 Equatoria Eastern Equatoria Female 20-34 111079
## 13 Equatoria Eastern Equatoria Female 35-49 57120
## 14 Equatoria Eastern Equatoria Female 50-64 20496
## 15 Equatoria Eastern Equatoria Female 65+ 8637
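To make the split, collapse, and summarize steps easier to follow, here is a small sketch on a four-row toy tibble. The category labels (including the "(toy)" token), the `0-9` level, and the numbers are invented for illustration and are not taken from the census file.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical input: three space-separated tokens per category label
toy <- tibble(
  Category = c("Population, Male (toy)", "Population, Male (toy)",
               "Population, Female (toy)", "Population, Total (toy)"),
  `Age Category` = c("0 to 4", "5 to 9", "0 to 4", "0 to 4"),
  population = c(10, 20, 5, 15)
)

toy_tidy <- toy %>%
  # Split on spaces; the middle token is the gender
  separate(Category, into = c("Pop.", "Gender", "Other"), sep = " ") %>%
  select(-Pop., -Other) %>%
  # Drop precomputed totals so they are not double-counted
  filter(Gender != "Total") %>%
  # Merge two narrow age bands into one wider band
  mutate(`Age Category` = fct_collapse(`Age Category`,
                                       `0-9` = c("0 to 4", "5 to 9"))) %>%
  group_by(Gender, `Age Category`) %>%
  summarize(Population = sum(population), .groups = "drop")

toy_tidy
```

The two Male rows collapse into a single `0-9` row, which is the same mechanism that turns the census's five-year bands into the wider age groups used in this tutorial.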
gt Package
In this section, we’ll tabulate the dataset and place the results in two separate tabs using .tabset, which saves space by arranging the outputs in horizontal tabs on the page.
First, we’ll tabulate only the states’ total populations (in persons).
# Subset the dataset to extract the state totals
state_pop_gt <- ss_2008_census_data_tbl %>%
  # Group by region and state columns; summarize
  group_by(Region, State) %>%
  summarize(Population = sum(Population),
            .groups = "drop") %>%
  # Arrange the data in descending order by population
  arrange(desc(Population)) %>%
  # Exclude the region
  select(-Region) %>%
  # Initialize a gt table
  gt() %>%
  # Add a spanner above the Population column
  tab_spanner(
    label = "State Population in Descending Order",
    columns = 2) %>%
  # Add a title and a subtitle
  tab_header(
    title = "South Sudan 2008 Population by State",
    subtitle = "Jonglei State has the largest population") %>%
  # Add a grand summary row with the total population
  grand_summary_rows(
    columns = 2,
    fns = list(Total = ~sum(.)),
    formatter = fmt_number
  ) %>%
  # Highlight the largest state population in green
  tab_style(
    style = list(
      cell_fill(color = "#4caf50"),
      cell_text(color = "white")
    ),
    locations = cells_body(
      columns = vars(Population),
      rows = Population == max(Population))) %>%
  # Highlight the two middle values, which bracket the median, in orange
  tab_style(
    style = list(
      cell_fill(color = "#ff8c00"),
      cell_text(color = "white")
    ),
    locations = cells_body(
      columns = vars(Population),
      rows = Population %in% c(720898, 906161))) %>%
  # Highlight the smallest state population in red
  tab_style(
    style = list(
      cell_fill(color = "#DC6140"),
      cell_text(color = "white")
    ),
    locations = cells_body(
      columns = vars(Population),
      rows = Population == min(Population))) %>%
  # Apply a gray background to the header
  tab_options(
    heading.background.color = "gray"
  ) %>%
  # Add a footnote and source information
  tab_footnote(
    footnote = "gt Tutorials by JITeam, Jonglei Institute of Technology (www.jongleiinstitute.com)",
    locations = cells_column_labels(
      columns = 2)
  ) %>%
  tab_source_note(
    source_note = "Data source: South Sudan Data Portal"
  )
# Display the table
state_pop_gt

South Sudan 2008 Population by State
Jonglei State has the largest population

| | State Population in Descending Order |
|---|---|
| State | Population¹ |
| Jonglei | 1358602 |
| Central Equatoria | 1103557 |
| Warrap | 972928 |
| Upper Nile | 964353 |
| Eastern Equatoria | 906161 |
| Northern Bahr el Ghazal | 720898 |
| Lakes | 695730 |
| Western Equatoria | 619029 |
| Unity | 585801 |
| Western Bahr el Ghazal | 333431 |
| Total | 8,260,490.00 |

Data source: South Sudan Data Portal
¹ gt Tutorials by JITeam, Jonglei Institute of Technology (www.jongleiinstitute.com)
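The two orange rows are the middle values of the ten state totals, which together bracket the median. As a hedged sketch, they can be computed rather than typed in by hand. The tibble `state_pop` below simply re-enters the totals from the table above, and `middle_values` is a name introduced here for illustration.

```r
library(dplyr)

# State totals re-entered from the table above
state_pop <- tibble(
  State = c("Jonglei", "Central Equatoria", "Warrap", "Upper Nile",
            "Eastern Equatoria", "Northern Bahr el Ghazal", "Lakes",
            "Western Equatoria", "Unity", "Western Bahr el Ghazal"),
  Population = c(1358602, 1103557, 972928, 964353, 906161,
                 720898, 695730, 619029, 585801, 333431)
)

# With an even number of rows, the median lies between the two middle values;
# with an odd number, it is the single middle value
sorted_pop <- sort(state_pop$Population)
n <- length(sorted_pop)
middle_values <- if (n %% 2 == 0) {
  sorted_pop[c(n / 2, n / 2 + 1)]
} else {
  sorted_pop[(n + 1) / 2]
}

middle_values
median(state_pop$Population)
```

A condition such as `Population %in% middle_values` could then stand in for the hard-coded numbers in the styling step, assuming the helper vector is defined before the table is built.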
Next, we’ll tabulate South Sudan’s 2008 population by state and gender. Further, we’ll use the colors of the flag of South Sudan (red, black, green, blue, and orange in place of yellow) to illustrate how to apply different background colors to the gt table.
# Subset the dataset
ss_2008_census_gt_1 <- ss_2008_census_data_tbl %>%
  # Pivot the data so each age category becomes a column
  pivot_wider(
    names_from = `Age Category`,
    values_from = Population) %>%
  # Arrange the data by region, then in descending order by age 0-19
  arrange(Region, desc(`0-19`)) %>%
  # Exclude the region from the table
  select(-Region) %>%
  # Initialize a gt table
  gt() %>%
  # Add a title and a subtitle
  tab_header(
    title = "South Sudan 2008 Population by Gender and State",
    subtitle = "Population by Gender and Age Groups") %>%
  # Create row groups, one per former region
  # Bahr el Ghazal Region
  tab_row_group(
    group = "Bahr el Ghazal",
    rows = 1:8) %>%
  # Equatoria Region
  tab_row_group(
    group = "Equatoria",
    rows = 9:14) %>%
  # Upper Nile Region
  tab_row_group(
    group = "Upper Nile",
    rows = 15:20) %>%
  # Add the spanners
  tab_spanner(
    label = "Population by Gender & Age Category",
    columns = 2:7) %>%
  tab_spanner(
    label = "States by Former Regions",
    columns = 1) %>%
  # Add the grand summary row
  grand_summary_rows(
    columns = 3:7,
    fns = list(Totals = ~ sum(.)),
    formatter = fmt_number) %>%
  # Style the table
  tab_options(heading.background.color = "#ff8c00",
              column_labels.background.color = "gray") %>%
  # Upper Nile Region (the table has seven columns, so the age columns are 3:7)
  tab_style(
    style = list(
      cell_fill(color = "black"),
      cell_text(color = "white")),
    locations = cells_body(
      columns = 3:7,
      rows = 15:20)) %>%
  # Equatoria Region
  tab_style(
    style = list(
      cell_fill(color = "#DC6140"),
      cell_text(color = "white")),
    locations = cells_body(
      columns = 3:7,
      rows = 9:14)) %>%
  # Bahr el Ghazal Region
  tab_style(
    style = list(
      cell_fill(color = "#4caf50"),
      cell_text(color = "white")),
    locations = cells_body(
      columns = 3:7,
      rows = 1:8)) %>%
  # Gender column
  tab_style(
    style = list(
      cell_fill(color = "#5077E0"),
      cell_text(color = "white")),
    locations = cells_body(
      columns = 2,
      rows = 1:20)) %>%
  # Add the footnote and source information
  tab_footnote(
    footnote = "`gt` Tutorials by JITeam, The Jonglei Institute of Technology (www.jongleiinstitute.com)",
    locations = cells_column_labels(
      columns = 2:7)) %>%
  tab_source_note(
    source_note = "Data source: South Sudan Data Portal"
  )
# Display the table
ss_2008_census_gt_1

South Sudan 2008 Population by Gender and State
Population by Gender and Age Groups

| States by Former Regions | Population by Gender & Age Category | | | | | |
|---|---|---|---|---|---|---|
| State | Gender¹ | 0-19¹ | 20-34¹ | 35-49¹ | 50-64¹ | 65+¹ |
| Upper Nile | | | | | | |
| Jonglei | Male | 419182 | 157319 | 90925 | 44243 | 22658 |
| Jonglei | Female | 329048 | 164193 | 87198 | 31452 | 12384 |
| Upper Nile | Male | 294848 | 113552 | 70681 | 30603 | 15746 |
| Upper Nile | Female | 237435 | 108924 | 60058 | 22362 | 10144 |
| Unity | Male | 179616 | 62313 | 34091 | 15228 | 8999 |
| Unity | Female | 163798 | 66837 | 33267 | 13851 | 7801 |
| Equatoria | | | | | | |
| Central Equatoria | Male | 308935 | 153332 | 79238 | 28808 | 11409 |
| Central Equatoria | Female | 283092 | 139942 | 66745 | 23460 | 8596 |
| Eastern Equatoria | Male | 274404 | 99862 | 55139 | 23254 | 12528 |
| Eastern Equatoria | Female | 243642 | 111079 | 57120 | 20496 | 8637 |
| Western Equatoria | Male | 162324 | 77197 | 47857 | 19524 | 11541 |
| Western Equatoria | Female | 148059 | 83592 | 45314 | 16252 | 7369 |
| Bahr el Ghazal | | | | | | |
| Warrap | Male | 275805 | 94888 | 63010 | 24686 | 12345 |
| Warrap | Female | 273397 | 127170 | 66936 | 24066 | 10625 |
| Northern Bahr el Ghazal | Male | 204291 | 63709 | 45635 | 21132 | 13523 |
| Northern Bahr el Ghazal | Female | 200375 | 89179 | 48861 | 21608 | 12585 |
| Lakes | Male | 198581 | 87219 | 49536 | 20444 | 10100 |
| Lakes | Female | 176918 | 86832 | 42932 | 16772 | 6396 |
| Western Bahr el Ghazal | Male | 92265 | 45326 | 26307 | 8971 | 4171 |
| Western Bahr el Ghazal | Female | 83151 | 41467 | 20767 | 7479 | 3527 |
| Totals | — | 4,549,166.00 | 1,973,932.00 | 1,091,617.00 | 434,691.00 | 211,084.00 |

Data source: South Sudan Data Portal
¹ `gt` Tutorials by JITeam, The Jonglei Institute of Technology (www.jongleiinstitute.com)
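The row groups above are assigned by hard-coded row numbers, which depend on the sort order. One possible alternative, sketched here under the assumption that the Region column is kept in the data, is to let gt build the groups through the `groupname_col` argument of `gt()`. The tibble `toy_regions` is a three-row subset re-entered from the state totals for illustration.

```r
library(dplyr)
library(gt)

# Small subset of the state totals, with the region kept as a column
toy_regions <- tibble(
  Region = c("Equatoria", "Equatoria", "Upper Nile"),
  State = c("Central Equatoria", "Eastern Equatoria", "Jonglei"),
  Population = c(1103557, 906161, 1358602)
)

# groupname_col turns each distinct Region value into a row group,
# so no row indices need to be maintained by hand
toy_gt <- toy_regions %>%
  gt(groupname_col = "Region") %>%
  tab_header(title = "State totals grouped by region (toy subset)")

toy_gt
```

This sketch covers only the grouping; the per-region colors in the tutorial would still need a styling step that targets each group's rows.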
In this article, we’ve demonstrated how to wrangle data with dplyr and how to tabulate it with the gt package. The 2020 RStudio Table Contest asks participants to pick any R table package and highlight its prominent features, so our team decided to write a tutorial with two tables so that others may benefit from our project.
We thank RStudio for allowing us to showcase our R skills in the form of the gt tutorials. We hope that our work will benefit other aspiring data scientists, data analysts, data enthusiasts, and everyone else who wants to learn R programming, particularly the gt package.
By the same token, we thank both DataCamp and Business Science University; without their amazing courses and tutorials, we would not have been able to complete this project.
Lastly, I would like to thank both Luka Awan and Nazrul Islam for teaming up with me on this project to represent the Jonglei Institute of Technology - this is our first competition, and we hope to participate more in the future.
Thank you once again, RStudio, for the opportunity. We hope that your users and learners will find this work beneficial.
Kind regards,
Alier Reng, Head of the Data Science Program & President
Luka Chol D’Awan, TA & Student
Nazrul Islam, Student