Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: https://howmuch.net/articles/insurance-companies-ranking-by-direct-premiums-written-in-the-US.


Objective

This visualization covers the U.S. insurance market. Its objective is to compare insurance sectors through their top 10 companies and to show how those companies rank by direct premiums written.

The chosen visualization had the following three main issues:

  • Issue 1 - Data integrity. According to the title, 7 insurance sectors should be visualized, but only 6 are.
  • Issue 2 - The X and Y axes are not clearly defined.
  • Issue 3 - There is too much information to scan before the graph can be understood. The reader has to consult 2 legends across 7 sectors and their top 10 companies, which causes a lot of eye movement between the data and the legends and interrupts interpretation.

Reference

Code

The following code was used to fix the issues identified in the original.

#install.packages("tidytext") #facet reordering

library(dplyr)    # For filtering and transforming data
library(readr)    # For reading CSV files
library(magrittr) # For pipes
library(here)     # For sensible file paths in the project folder
library(ggplot2)  # For plotting
library(scales)   # For formatting axis labels
library("RColorBrewer")  # Load RColorBrewer
library(tidytext) # For reordering bars within facets

# Import the data
insurancedata <- read_csv(here("insurancedata.csv"))

# Define market share categories (Marketshare is a proportion, e.g. 0.15 = 15%).
# Conditions are checked in order, so a value such as 0.1495 cannot fall between
# two bands; missing values stay NA.
insurancedata <- mutate(insurancedata, marketshare_level = case_when(
  Marketshare >= 0.15 ~ "15% or more",
  Marketshare >= 0.10 ~ "10% to 14.9%",
  Marketshare >= 0.05 ~ "5% to 9.9%",
  Marketshare <  0.05 ~ "Less than 5%"))

# Order the market share levels from highest to lowest
insurancedata %<>%
  mutate(marketshare_level = factor(marketshare_level,
                          levels = c("15% or more", "10% to 14.9%", "5% to 9.9%", "Less than 5%") 
                          # , ordered = TRUE
                          ))  
str(insurancedata$marketshare_level)
##  Factor w/ 4 levels "15% or more",..: 3 3 3 3 3 4 4 4 4 4 ...
# Faceted bar chart for the 7 insurance sectors, by company
data_to_plot <- insurancedata %>%
  mutate(Company = reorder_within(Company, Directpremiumswritten, InsuranceSector)) 

p1 <- ggplot(data_to_plot, aes(y = Company, x = Directpremiumswritten, fill = marketshare_level)) +
    geom_col() +  # equivalent to geom_bar(stat = "identity")
    scale_x_continuous(labels = label_number(suffix = " M", scale = 1e-6)) + # millions
    facet_grid(rows = vars(InsuranceSector), scales = "free") +
    labs(title = "U.S. Insurance Companies Ranking by Direct Premiums Written",
         subtitle = "Top 10 Companies in 7 Different Insurance Sectors",
         x = "Direct Premiums Written (in Millions) >>",
         y = "Insurance Company >>",
         fill = "Market Share") +
    scale_y_reordered() +
    # Named values keep each colour tied to its level even if a level is absent from the data
    scale_fill_manual(values = c("15% or more" = "darkblue", "10% to 14.9%" = "steelblue4",
                                 "5% to 9.9%" = "steelblue3", "Less than 5%" = "steelblue1"))
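
The market share banding used in the code above can also be written with base R's cut(), which places every value in exactly one band by construction. The following is only a minimal sketch, assuming Marketshare is stored as a proportion between 0 and 1. The vector `share` and its values are made up for illustration and are not taken from insurancedata.csv.

```r
# Hypothetical sample proportions (illustration only, not from insurancedata.csv)
share <- c(0.20, 0.15, 0.1495, 0.10, 0.07, 0.05, 0.049, 0.01, NA)

# Left-closed intervals: [0, 0.05), [0.05, 0.10), [0.10, 0.15), [0.15, Inf)
band <- cut(
  share,
  breaks = c(0, 0.05, 0.10, 0.15, Inf),
  labels = c("Less than 5%", "5% to 9.9%", "10% to 14.9%", "15% or more"),
  right = FALSE
)

# Reverse the level order so the largest band is listed first in a legend
band <- factor(band, levels = rev(levels(band)))

# Count how many values fall in each band, including missing values
table(band, useNA = "ifany")
```

Because the intervals share their boundaries, a value such as 0.1495 lands in "10% to 14.9%" rather than between two bands, and a missing share stays NA.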

Data Reference

Reconstruction

The following plot fixes the main issues in the original.