Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of the data visualisation “2018 Identity Theft Fraud Reports” is to give the end consumer data-driven insights into identity theft in the USA, especially credit card identity theft. According to the report, identity theft made up 14.8 percent of all consumer fraud reported in the USA in CY 2018. Credit card fraud accounted for 29 percent of all identity theft reported in the USA in 2018, making it the most reported type.
The target audience is small, mid-sized and large retail businesses, chain stores, franchises, restaurants and dealerships in the USA that use merchant payment services such as credit and debit card processing, loyalty/gift card programs, and on-the-go smartphone and tablet payments. These businesses have a legal and ethical obligation to keep customers’ financial and personal details safe and not to share them unless the customer advises otherwise.
The visualisation chosen had the following three main issues:
Issues with data integrity: the data source citation is missing. In a data visualisation, it is pertinent to cite the source of the underlying data, because this helps the reader assess the authenticity and quality of the data. The visualisation “2018 Identity Theft Fraud Reports” gives no data source, and it took me some time to find it. The data is released every calendar year by the Federal Trade Commission (FTC), USA.
Perceptual or colour issues: the colour model and colour associations do not align well with the interpretation. The pie chart uses a different colour for each category (visual variable = colour hue). This draws attention to the types of identity theft rather than to their ranking, for example that credit card identity theft was the highest and bank fraud the fifth highest. A sequential colour scale would have served the purpose better.
Ethical issues: the title of the visualisation omits information such as the country and whether the year is a fiscal or calendar year. A subtitle highlighting the key finding, “Credit Card identity theft was highest”, would also have made the interpretation easier.
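To illustrate the second issue, a sequential palette can be generated rather than typed by hand. The sketch below is only one possible approach and uses base R's `colorRampPalette`; the helper name `make_sequential` is hypothetical, and the two endpoint colours are assumed to be the darkest and lightest shades of the red palette used in the reconstruction code.

```r
# Hypothetical helper: interpolate n colours between a dark and a light endpoint.
# The default endpoints are assumptions taken from the reconstruction's red palette.
make_sequential <- function(n, dark = "#de5747", light = "#f1f4a8") {
  grDevices::colorRampPalette(c(dark, light))(n)
}

# One colour per theft type, ordered from the highest share (darkest)
# to the lowest share (lightest).
ramp <- make_sequential(7)
print(ramp)
```

Because the colours only vary in lightness and saturation along one ramp, the reader can rank the slices by colour alone, which a set of unrelated hues does not allow.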
Reference
The following code was used to fix the issues identified in the original.
library(dplyr)
library(plotly)
library(scales)
library(readr)
# Set the working directory to the folder that holds the CSV (machine-specific path)
setwd("C:/Users/Vijeta/Desktop/RMIT/Data Visualisation/Assignment 2")
# Read the FTC data, skipping the first line of the file
identity_theft <- read_csv("2018_CSN_Report_Type.csv", skip = 1)
# Order theft types from the highest to the lowest share of reports
theft_types <- identity_theft[order(-identity_theft$`%_of_Total`), ]
# Format the report counts with thousands separators for the slice labels
theft_types$Reports_Count <- comma(theft_types$Reports_Count)
# Alternative green palette (not used in the final plot)
color <- c("#718200", "#94A813", "#B3C732", "#CCDE57", "#E0ED87", "#F5FCC2", "#FFFFFF")
# Sequential palette: the darkest colour goes to the largest share
color_red <- c("#de5747", "#ee734a", "#ee894e", "#eeaf49", "#f8c645", "#f4ec6f", "#f1f4a8")
# Plot margins
m <- list(l = 40, r = 55, b = 100, t = 100, pad = 3)
# Donut chart; sort = FALSE keeps the descending order so the colours follow the ranking
p <- plot_ly(theft_types, labels = ~Theft_Type, values = ~`%_of_Total`,
             text = ~Reports_Count, textinfo = "label+percent+text",
             insidetextorientation = "radial", textposition = "outside", sort = FALSE,
             marker = list(colors = color_red, line = list(color = "black", width = 1))) %>%
  add_pie(hole = 0.6) %>%
  layout(title = list(text = "Identity Theft Types in USA for CY 2018", x = 0.8, y = 0.95,
                      font = list(size = 18)),
         showlegend = FALSE, legend = list(x = 1.5, y = 0.5),
         # Subtitle stating the key finding
         annotations = list(text = "Credit Card identity theft is highest", x = 0.82, y = 1.2,
                            showarrow = FALSE, font = list(size = 15, color = "blue")),
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE),
         autosize = TRUE, margin = m)
# Add a second annotation with the overall total inside the donut hole
p <- layout(p, annotations = list(
  text = "Total \n Identity Theft \n @ 444,602 \n (14.8% of total \n consumer frauds)",
  showarrow = FALSE, font = list(size = 13, color = "black")))
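The data preparation in the code above (sort in descending order, then format the counts) can be tried without the CSV or any packages. The following is a minimal base R sketch under stated assumptions: the data frame `theft` and its column `Pct_of_Total` are hypothetical stand-ins, and every value except the 29 percent credit card share is a placeholder, not FTC data.

```r
# Hypothetical stand-in for the CSV. Only the 29% credit card share comes from
# the text; all other names and numbers are placeholders, not FTC figures.
theft <- data.frame(
  Theft_Type    = c("Placeholder type B", "Credit Card Fraud", "Placeholder type C"),
  Pct_of_Total  = c(20, 29, 10),
  Reports_Count = c(100000, 145000, 50000)
)

# Sort from the highest to the lowest share, as the reconstruction does.
ranked <- theft[order(-theft$Pct_of_Total), ]

# Format the counts with thousands separators (base R alternative to scales::comma).
ranked$Reports_Label <- format(ranked$Reports_Count, big.mark = ",",
                               scientific = FALSE, trim = TRUE)
print(ranked)
```

Sorting before plotting matters because the chart is drawn with `sort = FALSE`: the slices then appear in the row order of the data, so the first (darkest) palette colour lands on the largest share.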
Data Reference
The following plot fixes the main issues in the original.