Objective
Objective: The original data visualization aims to compare club salaries with team performance in Major League Soccer (MLS).
Target Audience: The target audience is soccer fans who play or watch the game, and corporate sponsors who want to compare team performance against payroll to understand sponsorship requirements.
The visualisation chosen has the following three main issues:
Dual axis: The chart uses two different y-axes, one for total base salary ranking and one for season ranking. The key problem with dual-axis charts is that they can mislead viewers, deliberately or not, about the relationship between the two data series, because the way each axis is laid out controls which points appear to line up. Here, the two axes make it very difficult to tell which team gets the best performance for its payroll, or whether the highest-ranked team also has the highest payroll.
Incomplete Visualization: This is an integrity issue. The visualization is about ranking and salary, both of which are numerical, yet it shows no numbers at all, so it is impossible to work out the salary difference or the ranking difference between teams. If viewers cannot draw anything from a visualization, it is of little use: a data visualization should tell a story from which the viewer can reach a conclusion. This visualization therefore needs to include numeric values so that viewers can understand what is happening.
Confusing visualization: Besides omitting elements needed to tell the story, the visualization is confusing. At first glance it is unclear whether each axis runs from the lowest to the highest ranking or the reverse. The numerous red and blue lines add to the confusion, and it is hard to tell what their thickness represents. Relationships are also hard to read: many lines overlap, and some run from the very top of one axis to the very bottom of the other, implying a relationship that cannot actually be interpreted.
Solutions: We will solve the above three issues by:
* Dual Axis: Split the axis and make two graphs.
* Incomplete Visualization: Add the salary and ranking values in the graphs.
* Confusing Visualization: Sort the clubs by salary from lowest to highest and by season ranking from lowest to highest.
With these solutions we can see which team has the highest base salary and which has the highest ranking, and whether the highest-ranked team also has the highest base salary.
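The short R sketch below puts numbers behind the visual comparison. It re-enters the 2018 values printed in the Code section (Base.Salary, Total.Base.Salary and Season.Position for all 23 clubs) and does three things. It checks that salary rank 1 corresponds to the highest payroll. It names the highest-paid club and the best-placed club. It then computes a Spearman rank correlation between payroll and season position. The correlation is an addition of ours and is not part of the original analysis. The data frame and column names are illustrative.

```r
# 2018 values copied from the df_plot output in the Code section.
clubs <- data.frame(
  club = c("Vancouver Whitecaps FC", "Toronto FC", "Sporting Kansas City",
           "Seattle Sounders FC", "San Jose Earthquakes", "Real Salt Lake",
           "Portland Timbers", "Philadelphia Union", "Orlando City SC",
           "New York Red Bulls", "New York City FC", "New England Revolution",
           "Montreal Impact", "Minnesota United FC", "Los Angeles FC",
           "LA Galaxy", "Houston Dynamo", "FC Dallas", "D.C. United",
           "Columbus Crew", "Colorado Rapids", "Chicago Fire", "Atlanta United FC"),
  base_salary = c(7531016, 23480305, 8825490, 9767458, 7116235, 8228528,
                  11209418, 8492604, 8230668, 7079490, 13249558, 6139674,
                  7230911, 7561894, 11254869, 14799180, 5267338, 8239754,
                  6325797, 6632083, 9981477, 13165346, 10369120),
  salary_rank = c(16, 1, 10, 9, 18, 14, 6, 11, 13, 19, 3, 22,
                  17, 15, 5, 2, 23, 12, 21, 20, 8, 4, 7),
  season_position = c(14, 19, 3, 4, 23, 12, 8, 11, 22, 1, 7, 16,
                      15, 18, 5, 13, 17, 6, 9, 10, 21, 20, 2),
  stringsAsFactors = FALSE
)

# Ranking salaries in descending order reproduces the salary-rank column,
# so salary rank 1 = highest payroll.
all(rank(-clubs$base_salary) == clubs$salary_rank)      # TRUE

# Highest-paid club vs. best-placed club (season position 1 = best)
clubs$club[which.max(clubs$base_salary)]      # "Toronto FC"
clubs$club[which.min(clubs$season_position)]  # "New York Red Bulls"

# Spearman rank correlation between payroll and season position.
# A value near -1 would mean higher payroll goes with a better (smaller)
# position; a value near 0 means little monotonic relationship.
cor(clubs$base_salary, clubs$season_position, method = "spearman")
```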
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(stringr)
library(readr)
library(tidyr)
library(dplyr)
library(RColorBrewer)
library(forcats)
library(scales)
# Loading the data
mls <- read.csv("C:/data/Slope_data.csv")
# Checking for column names
colnames(mls)
## [1] "ï..Measure.Names" "Club..group." "Colour"
## [4] "Line.size" "Total.Base.Salary" "Season"
## [7] "Table.Name" "Measure.Values" "Base.Salary"
## [10] "Pts" "Season.Position"
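The odd first column name `ï..Measure.Names` is typical of a CSV saved with a UTF-8 byte-order mark (BOM) that R then reads as ordinary text. The sketch below assumes the original file starts with such a BOM. Declaring `fileEncoding = "UTF-8-BOM"` in `read.csv()` strips the mark, so the column would come out as `Measure.Names`. The sketch writes a tiny throwaway file instead of using `Slope_data.csv`. If you apply the fix to the real file, the later `ï..Measure.Names` references would need to be renamed to match.

```r
# Build a two-line CSV that begins with a UTF-8 byte-order mark (bytes EF BB BF).
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_text <- "Measure Names,Club (group)\nTotal Base Salary Ranking,Toronto FC\n"
writeBin(c(bom, charToRaw(csv_text)), tmp)

# Default read: the BOM stays attached to the first header, so the first
# column name is mangled (the exact result depends on platform and locale).
names(read.csv(tmp))

# Declaring the encoding removes the BOM before the header is parsed.
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
# [1] "Measure.Names" "Club..group."
```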
# Keep only the "Total Base Salary Ranking" rows (one row per club)
df_plot <- filter(mls, ï..Measure.Names == "Total Base Salary Ranking")
df_plot
## ï..Measure.Names Club..group. Colour Line.size
## 1 Total Base Salary Ranking Vancouver Whitecaps FC Blue 2
## 2 Total Base Salary Ranking Toronto FC red 18
## 3 Total Base Salary Ranking Sporting Kansas City Blue 7
## 4 Total Base Salary Ranking Seattle Sounders FC Blue 5
## 5 Total Base Salary Ranking San Jose Earthquakes red 5
## 6 Total Base Salary Ranking Real Salt Lake Blue 2
## 7 Total Base Salary Ranking Portland Timbers red 2
## 8 Total Base Salary Ranking Philadelphia Union red 0
## 9 Total Base Salary Ranking Orlando City SC red 9
## 10 Total Base Salary Ranking New York Red Bulls Blue 18
## 11 Total Base Salary Ranking New York City FC red 4
## 12 Total Base Salary Ranking New England Revolution Blue 6
## 13 Total Base Salary Ranking Montreal Impact Blue 2
## 14 Total Base Salary Ranking Minnesota United FC red 3
## 15 Total Base Salary Ranking Los Angeles FC red 0
## 16 Total Base Salary Ranking LA Galaxy red 11
## 17 Total Base Salary Ranking Houston Dynamo Blue 6
## 18 Total Base Salary Ranking FC Dallas Blue 6
## 19 Total Base Salary Ranking D.C. United Blue 12
## 20 Total Base Salary Ranking Columbus Crew Blue 10
## 21 Total Base Salary Ranking Colorado Rapids red 13
## 22 Total Base Salary Ranking Chicago Fire red 16
## 23 Total Base Salary Ranking Atlanta United FC Blue 5
## Total.Base.Salary Season Table.Name Measure.Values Base.Salary Pts
## 1 16 2018 May 10, 2018 16 7531016 47
## 2 1 2018 May 10, 2018 1 23480305 36
## 3 10 2018 May 10, 2018 10 8825490 62
## 4 9 2018 May 10, 2018 9 9767458 59
## 5 18 2018 May 10, 2018 18 7116235 21
## 6 14 2018 May 10, 2018 14 8228528 49
## 7 6 2018 May 10, 2018 6 11209418 54
## 8 11 2018 May 10, 2018 11 8492604 50
## 9 13 2018 May 10, 2018 13 8230668 28
## 10 19 2018 May 10, 2018 19 7079490 71
## 11 3 2018 May 10, 2018 3 13249558 56
## 12 22 2018 May 10, 2018 22 6139674 41
## 13 17 2018 May 10, 2018 17 7230911 46
## 14 15 2018 May 10, 2018 15 7561894 36
## 15 5 2018 May 10, 2018 5 11254869 57
## 16 2 2018 May 10, 2018 2 14799180 48
## 17 23 2018 May 10, 2018 23 5267338 38
## 18 12 2018 May 10, 2018 12 8239754 57
## 19 21 2018 May 10, 2018 21 6325797 51
## 20 20 2018 May 10, 2018 20 6632083 51
## 21 8 2018 May 10, 2018 8 9981477 31
## 22 4 2018 May 10, 2018 4 13165346 32
## 23 7 2018 May 10, 2018 7 10369120 69
## Season.Position
## 1 14
## 2 19
## 3 3
## 4 4
## 5 23
## 6 12
## 7 8
## 8 11
## 9 22
## 10 1
## 11 7
## 12 16
## 13 15
## 14 18
## 15 5
## 16 13
## 17 17
## 18 6
## 19 9
## 20 10
## 21 21
## 22 20
## 23 2
# Select club name and base salary for the first graph
df_sl <- df_plot[c("Club..group.", "Base.Salary")]
df_sl
## Club..group. Base.Salary
## 1 Vancouver Whitecaps FC 7531016
## 2 Toronto FC 23480305
## 3 Sporting Kansas City 8825490
## 4 Seattle Sounders FC 9767458
## 5 San Jose Earthquakes 7116235
## 6 Real Salt Lake 8228528
## 7 Portland Timbers 11209418
## 8 Philadelphia Union 8492604
## 9 Orlando City SC 8230668
## 10 New York Red Bulls 7079490
## 11 New York City FC 13249558
## 12 New England Revolution 6139674
## 13 Montreal Impact 7230911
## 14 Minnesota United FC 7561894
## 15 Los Angeles FC 11254869
## 16 LA Galaxy 14799180
## 17 Houston Dynamo 5267338
## 18 FC Dallas 8239754
## 19 D.C. United 6325797
## 20 Columbus Crew 6632083
## 21 Colorado Rapids 9981477
## 22 Chicago Fire 13165346
## 23 Atlanta United FC 10369120
# Plotting the first graph: clubs ordered by base salary (highest at the top after coord_flip)
Salary <- ggplot(df_sl, aes(x = reorder(Club..group., Base.Salary), y = Base.Salary, fill = Club..group.)) +
  geom_col() +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6), limits = c(0, 25000000)) +
  coord_flip() +
  theme(legend.position = "none") +
  labs(x = "Club name", y = "Base Salary (in Millions)", title = "Club Names vs Base Salary")
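The Solutions list promises to add the salary values to the graphs, but the salary chart code shows them only as axis ticks. A `geom_text()` layer is one way to label each bar, assuming a data label at the end of each bar is acceptable. To keep the example self-contained, it uses five clubs from the `df_sl` output instead of the full data. `df_salary_demo` and `salary_labelled` are illustrative names.

```r
library(ggplot2)

# Five clubs from the df_sl output above (2018 base salary, USD)
df_salary_demo <- data.frame(
  club = c("Toronto FC", "LA Galaxy", "Atlanta United FC",
           "New York Red Bulls", "Houston Dynamo"),
  base_salary = c(23480305, 14799180, 10369120, 7079490, 5267338),
  stringsAsFactors = FALSE
)

salary_labelled <- ggplot(df_salary_demo,
                          aes(x = reorder(club, base_salary), y = base_salary)) +
  geom_col(fill = "steelblue") +
  # Write each salary (in millions, one decimal) just past the end of its bar
  geom_text(aes(label = sprintf("%.1fM", base_salary / 1e6)),
            hjust = -0.1, size = 3) +
  # Extra headroom above the largest salary so the longest label is not clipped
  scale_y_continuous(labels = function(x) paste0(x / 1e6, "M"),
                     limits = c(0, 27e6)) +
  coord_flip() +
  labs(x = "Club name", y = "Base salary (millions)",
       title = "Club names vs base salary")

print(salary_labelled)
```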
# Select club name and season position for the second graph
df_sl2 <- df_plot[c("Club..group.", "Season.Position")]
df_sl2
## Club..group. Season.Position
## 1 Vancouver Whitecaps FC 14
## 2 Toronto FC 19
## 3 Sporting Kansas City 3
## 4 Seattle Sounders FC 4
## 5 San Jose Earthquakes 23
## 6 Real Salt Lake 12
## 7 Portland Timbers 8
## 8 Philadelphia Union 11
## 9 Orlando City SC 22
## 10 New York Red Bulls 1
## 11 New York City FC 7
## 12 New England Revolution 16
## 13 Montreal Impact 15
## 14 Minnesota United FC 18
## 15 Los Angeles FC 5
## 16 LA Galaxy 13
## 17 Houston Dynamo 17
## 18 FC Dallas 6
## 19 D.C. United 9
## 20 Columbus Crew 10
## 21 Colorado Rapids 21
## 22 Chicago Fire 20
## 23 Atlanta United FC 2
# Plotting the second graph: clubs ordered by season position (position 1 = best, at the bottom after coord_flip)
Ranking <- ggplot(df_sl2, aes(x = reorder(Club..group., Season.Position), y = Season.Position, fill = Club..group.)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 25)) +
  coord_flip() +
  theme(legend.position = "none") +
  labs(x = "Club name", y = "Season position (1 = best)", title = "Club Names vs Season Ranking")
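The two charts are built as separate objects, and each sorts the clubs differently, so a reader has to match club names across them. The sketch below shows one possible alternative layout. It reshapes the data with `tidyr::pivot_longer()` (tidyr is already loaded above) and uses `facet_wrap()` to draw both measures as side-by-side panels. Both panels share one club order (by salary), and each keeps its own x scale, so there is still no dual axis. The sketch assumes ggplot2 3.3.0 or later, which draws horizontal bars without `coord_flip()`. It uses a five-club subset of the printed data, and the object names are illustrative.

```r
library(ggplot2)
library(tidyr)

# Five clubs from the df_plot output above (salary in millions USD; position 1 = best)
clubs <- data.frame(
  club = c("Toronto FC", "LA Galaxy", "Atlanta United FC",
           "New York Red Bulls", "Houston Dynamo"),
  base_salary_m = c(23.480305, 14.799180, 10.369120, 7.079490, 5.267338),
  season_position = c(19, 13, 2, 1, 17),
  stringsAsFactors = FALSE
)

# One shared club order for both panels: lowest salary at the bottom, highest at the top
clubs$club <- factor(clubs$club, levels = clubs$club[order(clubs$base_salary_m)])

# Long format: one row per club and measure
clubs_long <- pivot_longer(clubs, cols = c(base_salary_m, season_position),
                           names_to = "measure", values_to = "value")

two_panels <- ggplot(clubs_long, aes(x = value, y = club)) +
  geom_col(fill = "grey40") +
  geom_text(aes(label = round(value, 1)), hjust = -0.1, size = 3) +
  # Separate panels, each with its own x scale, instead of two y-axes on one chart
  facet_wrap(~ measure, scales = "free_x") +
  labs(x = NULL, y = "Club name")

print(two_panels)
```

Drawing season position as a bar is a judgment call. Because 1 is best, better teams get shorter bars, which some readers may find counter-intuitive, and a point or lollipop chart is another option.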
Data Reference
The following plot fixes the main issues in the original.