Note that forcats is part of the core tidyverse, so it is loaded automatically when you run library(tidyverse).
Handy forcats functions for ggplot2
Comparing followers of world religions
Source: https://en.wikipedia.org/wiki/List_of_religious_populations
I was looking for a simple dataset with count data for many items to demonstrate some basic forcats functions that are useful when creating plots.
library(tidyverse)

# The CSV has no header row, so read_csv() names the columns X1 and X2
religions <- read_csv("https://raw.githubusercontent.com/acatlin/data/master/religions.csv",
                      show_col_types = FALSE, col_names = FALSE) %>%
  rename(religion = X1, followers = X2) %>%
  mutate(millions_of_followers = followers / 1e6) %>%
  select(religion, millions_of_followers)
religions
# A tibble: 21 × 2
religion millions_of_followers
<chr> <dbl>
1 Christianity 2400
2 Islam 1900
3 Hinduism 1200
4 Secular 1200
5 Buddhism 506
6 Chinese Traditional 394
7 Various Ethnic Religions 300
8 African traditional religions 100
9 Sikhism 26
10 Spiritism 15
# ℹ 11 more rows
1A: basic ggplot
Q: What are the most followed religions? A: Use ggplot to compare religious populations
religions %>%
  ggplot(aes(x = religion, y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations")
1B: How do I flip coordinates?
religions %>%
  ggplot(aes(x = religion, y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  coord_flip()
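As an alternative to coord_flip(), ggplot2 3.3.0 and later can draw horizontal bars directly when the categorical variable is mapped to y. The following is a minimal sketch under that version assumption; sample_religions is a hypothetical four-row subset whose values are copied from the tibble printed above, not the full dataset.

```r
library(ggplot2)
library(tibble)

# Hypothetical four-row sample; values copied from the printed tibble above.
sample_religions <- tribble(
  ~religion,      ~millions_of_followers,
  "Christianity", 2400,
  "Islam",        1900,
  "Buddhism",      506,
  "Sikhism",        26
)

# Mapping the category to y gives horizontal bars without coord_flip()
# (assumes ggplot2 >= 3.3.0).
p <- ggplot(sample_religions, aes(x = millions_of_followers, y = religion)) +
  geom_col(fill = "lightblue") +
  labs(x = "millions of followers", y = "religion")

p
```

Either approach produces the same picture here; coord_flip() remains useful when a geom does not support the swapped orientation.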
2A: How do I change sort order?
Revised by: Andy Catlin
Q: How do we change the chart to show the most followed religions first? A: Use forcats::fct_reorder()
library(forcats)
ggplot(religions, aes(x = fct_reorder(religion, millions_of_followers),
                      y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  coord_flip()
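To see why this changes the chart, it can help to inspect the factor levels that fct_reorder() produces. The sketch below uses two small hypothetical vectors (values copied from the tibble above) rather than the full dataset. Levels are sorted by ascending value, and because the first level is drawn nearest the origin, the largest category ends up at the top after coord_flip().

```r
library(forcats)

# Hypothetical sample vectors; values copied from the printed tibble above.
religion  <- c("Christianity", "Islam", "Buddhism", "Sikhism")
followers <- c(2400, 1900, 506, 26)

# Levels are ordered by ascending followers.
by_followers <- fct_reorder(religion, followers)
levels(by_followers)

# .desc = TRUE reverses the order, which puts the largest category first.
by_followers_desc <- fct_reorder(religion, followers, .desc = TRUE)
levels(by_followers_desc)
```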
2B: How do I combine less frequently used categories?
Q: How do we combine the less-followed religions into a single group? A: Use forcats::fct_other()
# Names of the five most-followed religions
top5 <- religions %>%
  arrange(desc(millions_of_followers)) %>%
  head(5) %>%
  pull(religion)
religions %>%
  mutate(religion = fct_other(religion, keep = top5, other_level = "Other religions")) %>%
  ggplot(aes(x = fct_reorder(religion, millions_of_followers), y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  coord_flip()
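One thing to watch for: fct_reorder() summarises each level with the median by default, so when many rows share the "Other religions" level, the combined bar may be positioned by the median of its small values rather than by its stacked total. A possible variation, sketched here and not the method used above, is to lump with fct_lump_n() weighted by followers, sum each group, and then reorder. The religions_rounded tibble is a hypothetical stand-in built from the rounded values shown in this document.

```r
library(dplyr)
library(forcats)
library(tibble)

# Hypothetical stand-in for `religions`, using the rounded values shown in this document.
religions_rounded <- tribble(
  ~religion,                       ~millions_of_followers,
  "Christianity",                  2400,
  "Islam",                         1900,
  "Hinduism",                      1200,
  "Secular",                       1200,
  "Buddhism",                       506,
  "Chinese Traditional",            394,
  "Various Ethnic Religions",       300,
  "African traditional religions",  100,
  "Sikhism",                         26,
  "Spiritism",                       15,
  "Judaism",                         15,
  "Baháʼí",                           7,
  "Jainism",                          4,
  "Shinto",                           4,
  "Cao Dai",                          4,
  "Zoroastrianism",                   3,
  "Tenrikyo",                         2,
  "Animism",                          2,
  "Neo-Paganism",                     1,
  "Unitarian Universalism",           1,
  "Rastafari",                        1
)

lumped <- religions_rounded %>%
  # Keep the five levels with the largest weight; lump the rest together.
  mutate(religion = fct_lump_n(religion, n = 5, w = millions_of_followers,
                               other_level = "Other religions")) %>%
  # One row per level, so the reordering below uses each bar's total.
  group_by(religion) %>%
  summarise(millions_of_followers = sum(millions_of_followers)) %>%
  mutate(religion = fct_reorder(religion, millions_of_followers))

lumped
```

The lumped tibble can be passed to ggplot() with aes(x = religion, y = millions_of_followers) exactly as in the charts above. Passing .fun = sum to fct_reorder() on the unsummarised data is another way to get the same ordering.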
2C: Adding a title
Reference: https://www.geeksforgeeks.org/ggplot2-title-and-subtitle-with-different-size-and-color-in-r/
religions %>%
  mutate(religion = fct_other(religion, keep = top5, other_level = "Other religions")) %>%
  ggplot(aes(x = fct_reorder(religion, millions_of_followers), y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       title = "Most Popular Religions",
       subtitle = "[2021]",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  theme(plot.title = element_text(size = 18, color = "blue"),
        plot.subtitle = element_text(size = 14, color = "gold")) +
  coord_flip()
Tabular Data
library(gt)

religions |>
  gt(rowname_col = "religion") |>
  tab_header(
    title = "Most popular religions",
    subtitle = md("**2021**")) |>
  tab_source_note(
    source_note = md("https://en.wikipedia.org/wiki/List_of_religious_populations")) |>
  opt_table_font(font = google_font("Montserrat"), weight = 500)
Most popular religions (2021)

religion                         millions_of_followers
Christianity                     2400
Islam                            1900
Hinduism                         1200
Secular                          1200
Buddhism                          506
Chinese Traditional               394
Various Ethnic Religions          300
African traditional religions     100
Sikhism                            26
Spiritism                          15
Judaism                            15
Baháʼí                              7
Jainism                             4
Shinto                              4
Cao Dai                             4
Zoroastrianism                      3
Tenrikyo                            2
Animism                             2
Neo-Paganism                        1
Unitarian Universalism              1
Rastafari                           1

Source: https://en.wikipedia.org/wiki/List_of_religious_populations
Findings and Recommendations
To use the terminology of descriptive analytics (vs. predictive analytics), there is a single measure (millions of followers) across a single level of a single dimension (religion). Suppose we were able to find counts of religion data every 10 years for the past 200 years, by continent.
Two useful patterns of analysis in descriptive analytics are relative contribution and changes over time.
Relative contribution: What percent of the total does each religion represent (overall? by continent?)
Changes over time: How did the counts (and percentages) of different religions change over time (overall? by continent?)
What other measures might be interesting (e.g. by age group)?
How would you represent the information in a table or a chart?
Would you be able to forecast religion counts (by continent) into the future?
etc.
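As a starting point for the relative contribution question, here is a minimal sketch of a percent-of-total calculation with dplyr. The grouped tibble is hypothetical: it holds the five largest groups from the table above plus an "Other religions" row, whose value (879) is the sum of the remaining rounded values in that table. No time or continent data is assumed.

```r
library(dplyr)
library(tibble)

# Hypothetical grouped data; 879 is the sum of the 16 smaller rounded values in the table.
grouped <- tribble(
  ~religion,         ~millions_of_followers,
  "Christianity",    2400,
  "Islam",           1900,
  "Hinduism",        1200,
  "Secular",         1200,
  "Buddhism",         506,
  "Other religions",  879
)

# Relative contribution: each group's share of the overall total, in percent.
shares <- grouped %>%
  mutate(pct_of_total = 100 * millions_of_followers / sum(millions_of_followers)) %>%
  arrange(desc(pct_of_total))

shares
```

With a continent column available, adding group_by(continent) before the mutate() step would give shares within each continent instead of overall.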