Objective
Social media platforms are used by one in three people in the world and by more than two-thirds of all internet users. The “Our World in Data” organisation published an article, “The rise of social media”, which includes a visualisation showing the ‘Number of people using social media, from 2004 to 2018’. In this visualisation, the publisher expressed their objectives as follows:
Audience:
The article “The rise of social media” was published for people interested in the number of people using social media from 2004 to 2018. The publisher therefore targeted a very “big” and “diverse” audience, one that varies in age and education and mixes technical and general readers.
Three main issues with the original visualisation:
The chosen visualisation had the following three main issues:
Issue 1: The visualisation does not clearly answer the question “Number of people using social media platforms, 2004-2018”.
Issue 2: The line graph shows the data individually for each social media platform, while the question asks for cumulative data.
Issue 3: A line graph is not the most suitable type of visualisation for cumulative data like this.
Solutions to the issues and reconstruction of the visualisation:
Because a line graph is not the most suitable type of visualisation for showing the cumulative ‘Number of people using social media from 2004 to 2018’, I re-arranged the original data in order from the first year (2004) to the last year (2018). Data were missing for some of the social media platforms, so I added those values manually to the CSV file. The original data were also given as full numbers, so I converted them to billions to make them easier to read in the visualisation, and renamed the column accordingly.
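The wrangling steps described above can be sketched roughly as follows. This is a minimal illustration on a hypothetical toy data frame (`toy`, with made-up values rather than the real figures); it assumes the same column names as the original CSV and uses dplyr for the conversion and sorting. The `yearly_total` summary is only there to show the cumulative figure that a stacked chart displays for each year.

```r
library(dplyr)

# Hypothetical toy data: the values are illustrative, not the real figures
toy <- data.frame(
  Entity = c("Facebook", "Facebook", "Twitter", "Twitter"),
  Year = c(2018, 2017, 2018, 2017),
  Monthly_active_users = c(2.0e9, 1.5e9, 4.0e8, 3.0e8)
)

# Convert full numbers to billions, keep the renamed column, sort by year
toy_billions <- toy %>%
  mutate(Active_users_in_billions = Monthly_active_users / 1e9) %>%
  select(Entity, Year, Active_users_in_billions) %>%
  arrange(Year, Entity)

# Total across platforms per year: the height of the stack in each year
yearly_total <- toy_billions %>%
  group_by(Year) %>%
  summarise(Total_in_billions = sum(Active_users_in_billions))
```

Printing `toy_billions` and `yearly_total` shows the per-platform values and the per-year totals side by side.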
To wrangle and plot the data, I installed and loaded the necessary packages. I used both ggplot2 and plotly's ggplotly() to present the data cumulatively, which addresses the issues above.
To do that, I assigned each social media platform a colour based on its logo or theme colour, so that readers can draw on their familiarity with these colours to quickly associate the data with each platform. I then scaled the x and y axes to fit the data and added a title, axis names and a legend.
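When colours are assigned by name, a single stray character is enough to make a name invalid, so it can help to check the mapping before plotting. The sketch below is one possible check, not part of the original workflow: `platform_cols` is a hypothetical three-entry subset, and the leading space in one entry is deliberate, to show how such a typo is caught and repaired with base R's `colors()` and `trimws()`.

```r
# Illustrative subset of a platform-to-colour mapping
platform_cols <- c(
  "Facebook" = "royalblue2",
  "Twitter"  = "deepskyblue3",
  "Whatsapp" = " forestgreen"  # deliberate stray leading space
)

# colors() lists every colour name that base R recognises
invalid <- platform_cols[!(platform_cols %in% colors())]

# Trim whitespace from the values; assigning with [] keeps the platform names
fixed_cols <- platform_cols
fixed_cols[] <- trimws(platform_cols)
```

Here `names(invalid)` identifies the platform whose colour needs attention, and `fixed_cols` can be passed to `scale_fill_manual(values = ...)`.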
Finally, I used the tooltip argument of ggplotly() to make the visualisation interactive, so that it shows the individual data (year, number of users and platform name) for each platform when hovering over any point of the visualisation.
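As a small, self-contained illustration of the interactive step, the sketch below builds a stacked area chart from a hypothetical toy data frame (made-up values) and converts it with `ggplotly()`. The `text` aesthetic carries the platform name, and the `tooltip` argument selects which aesthetics appear on hover; here the aesthetic names `"x"` and `"y"` are used, which is one of several ways to refer to them.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data: the values are illustrative, not the real figures
toy <- data.frame(
  Entity = rep(c("Facebook", "Twitter"), each = 3),
  Year = rep(2016:2018, times = 2),
  Active_users_in_billions = c(1.8, 2.1, 2.3, 0.31, 0.33, 0.33)
)

# geom_area() stacks the platforms by default, giving a cumulative view
static_p <- ggplot(toy, aes(x = Year, y = Active_users_in_billions,
                            fill = Entity, text = Entity)) +
  geom_area()

# Convert to an interactive plot and choose what the tooltip shows
interactive_p <- ggplotly(static_p, tooltip = c("text", "x", "y"))
```

Printing `interactive_p` in an R Markdown document or the RStudio viewer renders the interactive widget.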
After restructuring the data and creating the new interactive visualisation, all of the issues with the original visualisation have been fixed. The reconstructed visualisation clearly answers the question of how many people used social media platforms from 2004 to 2018, and it also shows each platform's data for each year when hovering over the visualisation. It does so while meeting the objectives of the original visualisation for the same audience.
Reference
Our World in Data article ‘The rise of social media’. (2019). Number of people using social media platforms, 2004 to 2018. Retrieved July 31, 2021, from Our World in Data website: https://ourworldindata.org/grapher/users-by-social-media-platform?time=2004..2018&country=FacebookInstagramMySpacePinterestRedditSnapchatTikTokTumblrTwitterWeChatWhatsapp~YouTube
The original data has been sourced from ‘Statista and TNW (2019)’ by ‘Our World in Data’.
The following code was used to fix the issues identified in the original.
# Set this directory as "working directory" before you go for the run.
library(ggplot2)
library(tidyverse)
library(dplyr)
library(plotly)
# Load csv data file
org_data <- read.csv("social_media_data.csv")
# Convert the user counts to billions and sort the rows by year
org_data$Monthly_active_users <- org_data$Monthly_active_users / 1e9
names(org_data)[names(org_data) == "Monthly_active_users"] <- "Active_users_in_billions" # rename the data column
arr_data <- org_data %>% filter(Year < 2019) %>% arrange(Year)
# Assign a colour to each platform based on its logo colour.
# Colour names must not contain stray spaces, or R will reject them.
En_cols <- c("MySpace" = "darkblue", "Snapchat" = "yellow1", "Pinterest" = "red4",
             "Twitter" = "deepskyblue3", "Reddit" = "orangered", "TikTok" = "cyan2",
             "Tumblr" = "magenta", "Instagram" = "deeppink3", "WeChat" = "chartreuse",
             "Whatsapp" = "forestgreen", "YouTube" = "firebrick", "Facebook" = "royalblue2",
             "Flickr" = "maroon1", "Friendster" = "green3", "Google Buzz" = "springgreen4",
             "Google+" = "darkred", "Hi5" = "sienna3", "Orkut" = "violetred2", "Weibo" = "red")
# Plot visualisation with customised scales as per data limit.
p <- arr_data %>%
  ggplot(aes(x = Year, y = Active_users_in_billions, fill = Entity, text = Entity)) +
  geom_area() +
  expand_limits(x = 2018, y = 11) +
  scale_x_continuous(expand = c(0, 0), breaks = 2004:2018) +
  scale_y_continuous(expand = c(0, 0), breaks = c(2, 4, 6, 8, 10, 12)) +
  # assign colours manually
  scale_fill_manual(values = En_cols) +
  # rotate the x-axis labels
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # set the title and axis names
  labs(title = "Number of people using Social Media Platforms \n Data from 2004 to 2018",
       x = "Years",
       y = "No of Active Users (in Billions)") +
  theme(legend.position = "right")
# Turn the static plot into an interactive plot
p <- ggplotly(p, tooltip = c("text", "Year", "Active_users_in_billions"))
# Print the object so that the interactive plot is displayed
p
Data Reference
Our World in Data article ‘The rise of social media’. (2019). Number of people using social media platforms, 2004 to 2018. Retrieved July 31, 2021, from Our World in Data website: https://ourworldindata.org/grapher/users-by-social-media-platform?time=2004..2018&country=FacebookInstagramMySpacePinterestRedditSnapchatTikTokTumblrTwitterWeChatWhatsapp~YouTube
Business of Apps statistics on Snapchat Revenue and Usage Statistics (2021). Retrieved July 31, 2021, from Business of Apps website: https://www.businessofapps.com/data/snapchat-statistics/
Business of Apps statistics on WhatsApp Revenue and Usage Statistics (2021). Retrieved July 31, 2021, from Business of Apps website: https://www.businessofapps.com/data/whatsapp-statistics/
The following plot fixes the main issues in the original.