Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

“Original Data Visualisation Image”
Source: Reddit: Data is beautiful - Movie Genre Cooccurrence (2020).


Objective

The objective of the original data visualisation is to show how often movie genres co-occur and how strongly different genres are related, using 5,000 movies from The Movie Database (TMDb). The target audience includes, but is not limited to, movie producers, writers, executives, critics and bloggers.

The visualisation chosen had the following three main issues:

  • Failure to achieve the dataset objective – The visualisation doesn’t accurately show which genres have the most co-occurrences. The number of times two genres co-occur is unclear unless you hover over the visualisation, which makes it hard to extract information just by looking at it.
  • Deceptive methods – Like a pie chart, a chord diagram relies on area and connection thickness to show the number of occurrences for each genre: the thicker the connection, the more often the two genres occur together. A chord diagram also makes the labels and co-occurrences of small proportions hard to see. Overall, the visualisation lacks visual accuracy.
  • Perceptual or colour issues – A chord diagram relies on colour to differentiate segments. Because this visualisation has many overlapping categories, it isn’t clear how many times two genres occur together without hovering over the segment and/or relationship. The visualisation can’t be read at a glance; it requires manual interaction (hovering).

Reference

Code

The following code was used to fix the issues identified in the original.

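library(readr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)

# Load the genre co-occurrence data
Movie_Genres <- read_csv("Movie Genres.csv")

# Set1 has only 9 colours, so interpolate it to cover every co-occurring genre
colourCount <- length(unique(Movie_Genres$genre_sub_1))
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))

# Stacked bar chart: one bar per main genre, segmented by co-occurring genre.
# The title is set in labs(); "main" is not a ggplot2 aesthetic, so it is dropped from aes().
rs1 <- ggplot(Movie_Genres, aes(fill = genre_sub_1, y = genre_main, x = ration)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_fill_manual(values = getPalette(colourCount)) +
  labs(x = "Co-occurrence percentage", y = "Genre Co-occurrences", title = "Movie Genre Co-occurrence")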
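The plot reads a ready-made `ration` column, which the x-axis label suggests is a co-occurrence percentage. How that column was produced is not shown here. As a minimal sketch only, assuming raw co-occurrence counts were available in a hypothetical column `n` (the `genre_counts` data frame and its values below are made up for illustration), a within-genre percentage could be derived with dplyr like this:

```r
library(dplyr)

# Hypothetical raw counts: these rows and the column `n` are illustrative
# assumptions, not taken from "Movie Genres.csv".
genre_counts <- data.frame(
  genre_main  = c("Action", "Action", "Drama", "Drama"),
  genre_sub_1 = c("Adventure", "Thriller", "Romance", "Thriller"),
  n           = c(30, 70, 45, 55)
)

# Convert each count to its percentage share within its main genre,
# so that the segments of every stacked bar sum to 100.
genre_share <- genre_counts %>%
  group_by(genre_main) %>%
  mutate(ration = 100 * n / sum(n)) %>%
  ungroup()
```

Normalising within each main genre puts every bar on the same 0 to 100 scale, so segment lengths can be compared across genres without hovering.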

Data Reference

Reconstruction

The following plot fixes the main issues in the original.