The Original, Code and Reconstruction sections describe the issues and how they were fixed.

Original


Source: Tableau Public, user ryansoares (2018).


Objective

Objective: The original data visualization compares team base salaries (payroll) with team performance in Major League Soccer (MLS).

Target Audience: Soccer fans who play or watch the game, and corporate sponsors who want to compare team performance against payroll to understand sponsorship requirements.

The visualisation chosen has the following three main issues:

  • Dual axis
  • Data integrity: Completeness of visualization
  • Confusing visualization
  1. Dual axis: The chart has two different y-axes, total base salary ranking and season ranking. The key problem with dual-axis charts is that they can mislead viewers, deliberately or not, about the relationship between the two data series. Here, because the two axes must be read separately, it is very difficult to tell which team has the highest payroll or whether the highest-ranked team also has the highest payroll.

  2. Incomplete Visualization: This is a data integrity issue. The visualization is about ranking and salary, both numerical features, yet it shows no numbers, so it is impossible to work out the salary or ranking difference between teams. A visualization from which the viewer cannot draw any conclusion fails its purpose: it should tell a story that lets the viewer reach a conclusion. This visualization therefore needs to show the underlying numeric values so that viewers can understand what is happening.

  3. Confusing visualization: Besides omitting elements crucial to the story, the visualization is confusing. At first glance it is unclear whether each axis runs from the highest ranking to the lowest or the reverse. The numerous red and blue lines add to the confusion, since nothing explains what the colour and thickness of the lines represent. Many lines overlap, and some run from the topmost club to the bottommost one, implying a relationship that the viewer cannot interpret.
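The chart never explains its line encoding, but the exported data hints at it. In all 23 rows printed in the Code section, Line.size equals the absolute difference between Total.Base.Salary (payroll rank) and Season.Position, and Colour is "Blue" when a club finished above its payroll rank and "red" otherwise (including ties). This is inferred from the data, not documented by the original author. The sketch below reproduces that apparent rule on four rows copied from the export; the helper name `line_encoding` is hypothetical.

```r
# Hypothetical helper: reproduces the line encoding that the exported data
# appears to use (ranks run from 1 = highest payroll / top of the table)
line_encoding <- function(salary_rank, season_position) {
  data.frame(
    # Thickness: size of the gap between payroll rank and league position
    Line.size = abs(salary_rank - season_position),
    # Colour: "Blue" if the club finished above its payroll rank, else "red"
    Colour = ifelse(season_position < salary_rank, "Blue", "red"),
    stringsAsFactors = FALSE
  )
}

# Four rows copied from the exported data
clubs <- data.frame(
  Club = c("Toronto FC", "New York Red Bulls", "Philadelphia Union", "Atlanta United FC"),
  Total.Base.Salary = c(1, 19, 11, 7),
  Season.Position = c(19, 1, 11, 2),
  stringsAsFactors = FALSE
)
enc <- cbind(clubs, line_encoding(clubs$Total.Base.Salary, clubs$Season.Position))
print(enc)
```

Row by row this gives 18/red, 18/Blue, 0/red and 5/Blue, matching the Line.size and Colour values printed for those clubs. A reader of the original chart has no way to recover this rule from the picture alone.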

Solutions: We will solve the above three issues by:

* Dual Axis: Split the axis and make two graphs.
* Incomplete Visualization: Add the salary and ranking values in the graphs.
* Confusing Visualization: Sort the clubs by salary and by ranking, each from lowest to highest.

With these solutions we can show the key story: which team has the highest base salary and which team has the highest ranking. We can also see whether the highest-ranked team is the one with the highest base salary.
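Whether payroll and league position move together can also be checked numerically. A minimal sketch, assuming the two rank columns from the exported data (copied below in the same club order as the Code section), computes Spearman's rank correlation with base R's `cor()`. A value near 1 would mean the best-paid clubs also finish highest; a value near 0 means little relationship. On these 23 clubs it comes out at about 0.11, a weak relationship.

```r
# Payroll rank (Total.Base.Salary) and league finish (Season.Position)
# for the 23 clubs, copied from the exported data in the same row order
salary_rank <- c(16, 1, 10, 9, 18, 14, 6, 11, 13, 19, 3, 22,
                 17, 15, 5, 2, 23, 12, 21, 20, 8, 4, 7)
season_pos  <- c(14, 19, 3, 4, 23, 12, 8, 11, 22, 1, 7, 16,
                 15, 18, 5, 13, 17, 6, 9, 10, 21, 20, 2)

# Spearman's rho: the Pearson correlation of the two rankings
rho <- cor(salary_rank, season_pos, method = "spearman")
print(round(rho, 3))
```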

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(stringr)
library(readr)
library(tidyr)
library(dplyr)
library(RColorBrewer)
library(forcats)
library(scales)

# Loading the data
mls <- read.csv("C:/data/Slope_data.csv")

# Inspect the column names
colnames(mls)
##  [1] "ï..Measure.Names"  "Club..group."      "Colour"           
##  [4] "Line.size"         "Total.Base.Salary" "Season"           
##  [7] "Table.Name"        "Measure.Values"    "Base.Salary"      
## [10] "Pts"               "Season.Position"
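The `ï..` prefix on the first column name is most likely a UTF-8 byte-order mark (BOM) at the start of the CSV that R read as ordinary characters; this is an inference from the column name, not something the original states. Base R's `read.csv()` accepts `fileEncoding = "UTF-8-BOM"`, which drops the mark so the column is simply `Measure.Names`. The sketch below writes a tiny hypothetical CSV with a BOM to a temporary file to show the effect. With the real file read this way, the filter that follows would refer to `Measure.Names` instead of `ï..Measure.Names`.

```r
# Write a small CSV that starts with a UTF-8 byte-order mark (bytes EF BB BF)
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("Measure Names,Measure Values\nTotal Base Salary Ranking,16\n")),
         con)
close(con)

# Declaring the encoding as UTF-8-BOM strips the mark before the header is parsed
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(colnames(clean))
```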
# Keep only the rows for the "Total Base Salary Ranking" measure (one row per club)
df_plot <- filter(mls, ï..Measure.Names == "Total Base Salary Ranking")
df_plot
##             ï..Measure.Names           Club..group. Colour Line.size
## 1  Total Base Salary Ranking Vancouver Whitecaps FC   Blue         2
## 2  Total Base Salary Ranking             Toronto FC    red        18
## 3  Total Base Salary Ranking   Sporting Kansas City   Blue         7
## 4  Total Base Salary Ranking    Seattle Sounders FC   Blue         5
## 5  Total Base Salary Ranking   San Jose Earthquakes    red         5
## 6  Total Base Salary Ranking         Real Salt Lake   Blue         2
## 7  Total Base Salary Ranking       Portland Timbers    red         2
## 8  Total Base Salary Ranking     Philadelphia Union    red         0
## 9  Total Base Salary Ranking        Orlando City SC    red         9
## 10 Total Base Salary Ranking     New York Red Bulls   Blue        18
## 11 Total Base Salary Ranking       New York City FC    red         4
## 12 Total Base Salary Ranking New England Revolution   Blue         6
## 13 Total Base Salary Ranking        Montreal Impact   Blue         2
## 14 Total Base Salary Ranking    Minnesota United FC    red         3
## 15 Total Base Salary Ranking         Los Angeles FC    red         0
## 16 Total Base Salary Ranking              LA Galaxy    red        11
## 17 Total Base Salary Ranking         Houston Dynamo   Blue         6
## 18 Total Base Salary Ranking              FC Dallas   Blue         6
## 19 Total Base Salary Ranking            D.C. United   Blue        12
## 20 Total Base Salary Ranking          Columbus Crew   Blue        10
## 21 Total Base Salary Ranking        Colorado Rapids    red        13
## 22 Total Base Salary Ranking           Chicago Fire    red        16
## 23 Total Base Salary Ranking      Atlanta United FC   Blue         5
##    Total.Base.Salary Season   Table.Name Measure.Values Base.Salary Pts
## 1                 16   2018 May 10, 2018             16     7531016  47
## 2                  1   2018 May 10, 2018              1    23480305  36
## 3                 10   2018 May 10, 2018             10     8825490  62
## 4                  9   2018 May 10, 2018              9     9767458  59
## 5                 18   2018 May 10, 2018             18     7116235  21
## 6                 14   2018 May 10, 2018             14     8228528  49
## 7                  6   2018 May 10, 2018              6    11209418  54
## 8                 11   2018 May 10, 2018             11     8492604  50
## 9                 13   2018 May 10, 2018             13     8230668  28
## 10                19   2018 May 10, 2018             19     7079490  71
## 11                 3   2018 May 10, 2018              3    13249558  56
## 12                22   2018 May 10, 2018             22     6139674  41
## 13                17   2018 May 10, 2018             17     7230911  46
## 14                15   2018 May 10, 2018             15     7561894  36
## 15                 5   2018 May 10, 2018              5    11254869  57
## 16                 2   2018 May 10, 2018              2    14799180  48
## 17                23   2018 May 10, 2018             23     5267338  38
## 18                12   2018 May 10, 2018             12     8239754  57
## 19                21   2018 May 10, 2018             21     6325797  51
## 20                20   2018 May 10, 2018             20     6632083  51
## 21                 8   2018 May 10, 2018              8     9981477  31
## 22                 4   2018 May 10, 2018              4    13165346  32
## 23                 7   2018 May 10, 2018              7    10369120  69
##    Season.Position
## 1               14
## 2               19
## 3                3
## 4                4
## 5               23
## 6               12
## 7                8
## 8               11
## 9               22
## 10               1
## 11               7
## 12              16
## 13              15
## 14              18
## 15               5
## 16              13
## 17              17
## 18               6
## 19               9
## 20              10
## 21              21
## 22              20
## 23               2
# Select the club name and base salary columns for the first graph
df_sl <- df_plot[c("Club..group.", "Base.Salary")]
df_sl
##              Club..group. Base.Salary
## 1  Vancouver Whitecaps FC     7531016
## 2              Toronto FC    23480305
## 3    Sporting Kansas City     8825490
## 4     Seattle Sounders FC     9767458
## 5    San Jose Earthquakes     7116235
## 6          Real Salt Lake     8228528
## 7        Portland Timbers    11209418
## 8      Philadelphia Union     8492604
## 9         Orlando City SC     8230668
## 10     New York Red Bulls     7079490
## 11       New York City FC    13249558
## 12 New England Revolution     6139674
## 13        Montreal Impact     7230911
## 14    Minnesota United FC     7561894
## 15         Los Angeles FC    11254869
## 16              LA Galaxy    14799180
## 17         Houston Dynamo     5267338
## 18              FC Dallas     8239754
## 19            D.C. United     6325797
## 20          Columbus Crew     6632083
## 21        Colorado Rapids     9981477
## 22           Chicago Fire    13165346
## 23      Atlanta United FC    10369120
# Plot the first graph: clubs sorted by base salary, drawn as horizontal bars
Salary <- ggplot(df_sl, aes(x = reorder(Club..group., Base.Salary), y = Base.Salary, fill = Club..group.)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6), limits = c(0, 25000000)) +
  coord_flip() + theme(legend.position = "none") +
  labs(x = "Club name", y = "Base Salary (in Millions)", title = "Club Names vs Base Salary")
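The solution also calls for showing the values themselves. One way to do that, sketched here as an assumption rather than the code behind the reconstruction, is to print each club's salary at the end of its bar with `geom_text()`. The three rows are copied from `df_sl`; the label format (millions to one decimal place via `sprintf()`) and the `label` column name are choices made for this sketch.

```r
library(ggplot2)

# Three rows copied from df_sl
df_demo <- data.frame(
  Club..group. = c("Toronto FC", "LA Galaxy", "Houston Dynamo"),
  Base.Salary  = c(23480305, 14799180, 5267338),
  stringsAsFactors = FALSE
)
# Salary in millions with one decimal place, e.g. "23.5M"
df_demo$label <- sprintf("%.1fM", df_demo$Base.Salary / 1e6)

p <- ggplot(df_demo, aes(x = reorder(Club..group., Base.Salary), y = Base.Salary)) +
  geom_col(fill = "steelblue") +
  # hjust = -0.1 starts each label just past the end of its (flipped) bar
  geom_text(aes(label = label), hjust = -0.1) +
  scale_y_continuous(labels = function(x) paste0(x / 1e6, "M")) +
  coord_flip() +
  labs(x = "Club name", y = "Base Salary (in Millions)")
print(df_demo$label)
```

Printing the value on each bar lets viewers read exact differences between clubs instead of estimating them from bar length.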

# Select the club name and season position columns for the second graph
df_sl2 <- df_plot[c("Club..group.", "Season.Position")]
df_sl2
##              Club..group. Season.Position
## 1  Vancouver Whitecaps FC              14
## 2              Toronto FC              19
## 3    Sporting Kansas City               3
## 4     Seattle Sounders FC               4
## 5    San Jose Earthquakes              23
## 6          Real Salt Lake              12
## 7        Portland Timbers               8
## 8      Philadelphia Union              11
## 9         Orlando City SC              22
## 10     New York Red Bulls               1
## 11       New York City FC               7
## 12 New England Revolution              16
## 13        Montreal Impact              15
## 14    Minnesota United FC              18
## 15         Los Angeles FC               5
## 16              LA Galaxy              13
## 17         Houston Dynamo              17
## 18              FC Dallas               6
## 19            D.C. United               9
## 20          Columbus Crew              10
## 21        Colorado Rapids              21
## 22           Chicago Fire              20
## 23      Atlanta United FC               2
# Plot the second graph: clubs sorted by season position, drawn as horizontal bars
Ranking <- ggplot(df_sl2, aes(x = reorder(Club..group., Season.Position), y = Season.Position, fill = Club..group.)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(limits = c(0, 25)) +
  coord_flip() + theme(legend.position = "none") +
  labs(x = "Club name", y = "Ranking", title = "Club Names vs Season Ranking")
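In the ranking chart, `reorder(Club..group., Season.Position)` makes the first-placed club the first factor level, and `coord_flip()` draws the first level at the bottom, so the league leader appears at the bottom of the chart. If the intent is to read the table from the top, one option (a suggestion, not what the code above does) is to reorder on the negated position. The base R sketch below, using four clubs copied from `df_sl2`, shows the resulting level order; the last level is the one drawn at the top.

```r
# Four rows copied from df_sl2
club <- c("San Jose Earthquakes", "New York Red Bulls",
          "Sporting Kansas City", "Atlanta United FC")
pos  <- c(23, 1, 3, 2)

# As in the code above: levels ascend by position, so 1st place is drawn at the bottom
print(levels(reorder(club, pos)))
# Negated position: 1st place becomes the last level, drawn at the top after coord_flip()
print(levels(reorder(club, -pos)))
```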

Data Reference


Reconstruction

The following plot fixes the main issues in the original.