Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of this data visualisation was for Ryne Rohla to favourably demonstrate his data analysis, research and map-making skills. He published this religion-focused project on his personal website. It shows his professional skills and helps make up a portfolio section alongside his CV.
The target audience was likely potential employers, academics and peers with related fields of interest.
The visualisation’s three main issues:
* Issue 1 - The answer to the practical question is misleading. The main question the visualisation raises is: what are the biggest religions in Australia? The map suggests the answer is Protestantism. It is not; No Religion and Catholicism are. The map confuses geographic area with population, a consequence of Australia having tiny populations spread over very large areas.
* Issue 2 - The choice of religions in the key is misleading. Many religions listed in the key are not big enough to appear anywhere on the map.
* Issue 3 - The key’s colours are confusing. It is unclear whether pink represents a small share of Catholicism or a moderate share of Taoism. Representing each percentage with its own shade of colour is not effective for this visualisation.
Reference
The following code was used to fix the issues identified in the original.
# import packages
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gridExtra)
# import data
df <- read_xls("data/2016_australia_religion_data.xls", sheet = "Table 4", range = "A7:J187")
names(df)[1] <- "Religion"
# clean data by creating a subset for each religion that has over a million followers.
no_religion <- df[175, ]
no_religion$Religion <- "No Religious Affiliation"
# a single function that collapses a range of rows into a one-line total for each state.
# you have to know which rows the data sits on.
# religion_name has to be a string in quote marks.
import_religion_and_sum <- function(df, religion_name, row_start, row_end){
  relevant_religions <- df[row_start:row_end, ]
  relevant_religions_with_total <- bind_rows(summarise(relevant_religions,
    across(where(is.character), ~ religion_name),
    across(where(is.numeric), sum)))
  return(relevant_religions_with_total)
}
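To see what the helper returns, here is a minimal sketch on a toy table. The data frame `toy` and its figures are made up for illustration and are not taken from the census file; the helper is repeated so the snippet runs on its own.

```r
library(dplyr)

# Hypothetical toy data, not census figures.
toy <- data.frame(
  Religion = c("Group A", "Group B", "Group C"),
  NSW = c(10, 20, 30),
  Vic = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Same logic as the helper above, repeated so this sketch is self-contained.
import_religion_and_sum <- function(df, religion_name, row_start, row_end) {
  rows <- df[row_start:row_end, ]
  summarise(rows,
            across(where(is.character), ~ religion_name),
            across(where(is.numeric), sum))
}

# Collapse rows 1 to 2 into a single labelled total row.
combined <- import_religion_and_sum(toy, "Groups A and B", 1, 2)
print(combined)
```

Because the row ranges are hard-coded, the totals are only right if the spreadsheet layout matches the row numbers passed in; checking one total against the source table is a cheap safeguard.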
# rest of the totals using functions
not_stated <- import_religion_and_sum(df, "Not Stated", 177, 178)
catholic <- import_religion_and_sum(df, "Catholic", 12, 19)
anglican <- import_religion_and_sum(df, "Anglican", 6, 8)
other_christian_df <- df[c(4, 9:10, 21:25, 27:30, 32:37, 39:42, 44:53, 55:61, 63:82, 84:96, 98:111), ]
other_christian <- import_religion_and_sum(other_christian_df, "Other Christian", 1, 86)
other_religions_df <- df[c(1, 114:116, 119:120, 122:127, 129:133, 135:142, 144:154), ]
other_religions <- import_religion_and_sum(other_religions_df, "Other Religion", 1, 31)
# put all these observations into a single data frame
religion <- rbind(no_religion, catholic, anglican, other_christian, other_religions, not_stated)
# combine all the states and territories with fewer than 1 million residents.
religion$`Other States` <- religion$Tasmania + religion$`Northern Territory` + religion$`Australian Capital Territory`
# grab the states we want
religion <- religion[,c(1:6, 11)]
# rename states to their abbreviations
names(religion) <- c("Religion", "NSW", "Vic", "QL", "SA", "WA", "Other")
# pivot the table long.
state_religion_total <- pivot_longer(religion, cols = NSW:Other, names_to = "State", values_to = "People")
state_religion_total <- state_religion_total[c("State", "Religion", "People")]
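The reshaping step can be sketched on a small wide table. The data frame `wide` and its numbers are hypothetical; only the `pivot_longer()` call mirrors the step above.

```r
library(tidyr)

# Hypothetical wide table: one row per religion, one column per state.
wide <- data.frame(
  Religion = c("Group A", "Group B"),
  NSW = c(10, 20),
  Vic = c(1, 2),
  stringsAsFactors = FALSE
)

# Stack the state columns into State/People pairs, one row per combination.
long <- pivot_longer(wide, cols = NSW:Vic, names_to = "State", values_to = "People")
print(long)
```

The long layout gives ggplot2 one row per bar segment, which is what the `fill`, `x` and `y` aesthetics expect.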
# turning this variable into a factor to control the order of the bar chart.
state_religion_total$State <- factor(state_religion_total$State, levels = c("Other", "SA", "WA", "QL", "Vic", "NSW"))
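A minimal sketch of why the explicit `levels` argument matters, using hypothetical state labels: without it, `factor()` sorts the labels alphabetically, which is rarely the order wanted on a chart.

```r
# Hypothetical state labels in arbitrary order.
states <- c("NSW", "Other", "Vic", "NSW")

# Without explicit levels, factor() sorts the labels alphabetically.
default_order <- levels(factor(states))

# With explicit levels, the order is exactly what is supplied.
custom_order <- levels(factor(states, levels = c("Other", "Vic", "NSW")))

print(default_order)
print(custom_order)
```

On a discrete axis ggplot2 places the first level nearest the origin, so listing NSW last should put it at the top of the y axis.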
# getting the data ready to graph.
state_religion_total_graph_one <- state_religion_total
state_religion_total_graph_one$Graph <- "Percentage"
state_religion_total_graph_one <- state_religion_total_graph_one %>%
  group_by(State) %>%
  mutate(countT = sum(People)) %>%
  group_by(Graph, .add = TRUE) %>%
  mutate(per = round(People / countT, 2)) %>%
  ungroup()
# replace the people with the percentages column.
state_religion_total_graph_one$People <- state_religion_total_graph_one$per
# remove the two calculation columns off the end.
state_religion_total_graph_one <- state_religion_total_graph_one[ ,1:4]
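The within-state share calculation can be sketched more compactly on toy data. The table `long` and its values are hypothetical; this is a simplified variant of the steps above, not the original code.

```r
library(dplyr)

# Hypothetical long table, not census figures.
long <- data.frame(
  State = c("X", "X", "Y", "Y"),
  Religion = c("Group A", "Group B", "Group A", "Group B"),
  People = c(30, 70, 1, 3),
  stringsAsFactors = FALSE
)

# Share of each religion within its state, rounded to two decimals.
shares <- long %>%
  group_by(State) %>%
  mutate(per = round(People / sum(People), 2)) %>%
  ungroup()

print(shares)
```

Since each share is rounded separately, the shares within a state will not always sum to exactly 1.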
# the second graph
state_religion_total_graph_two <- state_religion_total
state_religion_total_graph_two$Graph <- "Persons"
# Now let's combine the two dataframes and draw this graph.
state_religion_total_two_graphs <- rbind(state_religion_total_graph_one, state_religion_total_graph_two)
state_religion_total_two_graphs$Graph <- factor(state_religion_total_two_graphs$Graph, levels = c("Persons", "Percentage"))
p1 <- ggplot(data = state_religion_total_two_graphs, aes(fill = Religion, y = State, x = People)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  facet_grid(. ~ Graph, scales = "free_x") +
  labs(
    title = "Biggest Religions for Largest States in Australia",
    subtitle = "No Religion and Catholicism lead the way"
  )
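The two-panel layout depends on `scales = "free_x"`, which lets the counts and the shares each get their own x axis. A minimal sketch with hypothetical values (the data frame `toy` is made up for illustration):

```r
library(ggplot2)

# Hypothetical values: counts and shares sit on very different scales.
toy <- data.frame(
  State = rep(c("X", "Y"), times = 4),
  Religion = rep(rep(c("Group A", "Group B"), each = 2), times = 2),
  People = c(300, 10, 700, 30, 0.30, 0.25, 0.70, 0.75),
  Graph = rep(c("Persons", "Percentage"), each = 4)
)
toy$Graph <- factor(toy$Graph, levels = c("Persons", "Percentage"))

# geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").
p <- ggplot(toy, aes(x = People, y = State, fill = Religion)) +
  geom_col(alpha = 0.7) +
  facet_grid(. ~ Graph, scales = "free_x")

print(p)
```

With a shared x axis the percentage bars would be squashed to near zero next to the counts. In the code above the plot is stored in `p1`, so calling `print(p1)` would display it.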
Data Reference
I sourced appropriate data from here:
The following plot fixes the main issues in the original.