Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Michael Sandberg’s Data Visualization Blog.


Objective

A common claim in National Football League (NFL) circles is that a winning team needs an affordable but successful quarterback. This visualisation allows NFL fans, both casual and passionate, to see how strong the correlation between success and quarterback affordability is. As the quarterback is arguably the most important player on an NFL team, having a quarterback who is affordable, successful or both can be the difference between Super Bowl success and failure.

The visualisation chosen had the following three main issues:

  • Issue 1: The labels of the bars in the graph are not easily readable. The label text is white so that each bar can take the team colour of its quarterback. Combined with the vertical alignment of the text, this forces readers to turn their heads to read the labels clearly. The combination of white text and a bright yellow bar, for one of the data points, makes for especially difficult reading.
  • Issue 2: The visualisation uses a categorical scale on the x-axis, even though the variable on this axis (average salary) is a continuous numeric variable. This can lead to misleading conclusions. For example, salaries of $5.5 million and $5.53 million sit next to each other, which makes sense because they are similar. However, salaries of $6.75 million and $13 million are also immediately next to each other, despite one being almost double the other.
  • Issue 3: Showing only the 25 highest-paid quarterbacks is a strange choice. There are 32 teams in the NFL and therefore 32 starting quarterbacks (the starting quarterback is the team’s main quarterback, who generally plays from the outset of the game), so with only 25 data points some teams are not represented in the analysis. In addition, some of the data points are for quarterbacks who were not the starting quarterback for their team, so it seems inappropriate to assign 0 wins to a player who did not play when a different quarterback played instead.
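To make Issue 2 concrete, the short R sketch below compares the two pairs of neighbouring salaries quoted above. It is only an illustration: the data frame `pairs` and its column names are made up for this example and are not part of the original chart's data.

```r
# Two pairs of neighbouring bars quoted in Issue 2 (millions of USD).
# The data frame and its column names are illustrative only.
pairs <- data.frame(
  left  = c(5.50, 6.75),
  right = c(5.53, 13.00)
)

# On a categorical axis, neighbouring bars are always one slot apart
pairs$categorical_gap <- 1

# On a numeric axis, the distance is the actual salary difference
pairs$numeric_gap <- pairs$right - pairs$left

# Ratio of the higher salary to the lower one in each pair
pairs$ratio <- pairs$right / pairs$left

print(pairs)
```

Both pairs are drawn the same distance apart on the categorical axis, even though the second gap is 6.25 million against 0.03 million for the first. A numeric axis would preserve that difference.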

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(tidyverse)
library(plotly)

# One row per starting quarterback: name, average salary (millions of USD) and wins
qb_salary <- data.frame(
  qb = c('Aaron Rodgers', 'Matt Ryan', 'Joe Flacco', 'Drew Brees', 'Peyton Manning',
         'Colin Kaepernick', 'Jay Cutler', 'Tony Romo', 'Matthew Stafford', 'Alex Smith',
         'Drew Stanton', 'Eli Manning', 'Andy Dalton', 'Philip Rivers', 'Ben Roethlisberger',
         'Tom Brady', 'Shaun Hill', 'Derek Carr', 'Andrew Luck', 'Cam Newton',
         'Kyle Orton', 'Robert Griffin III', 'Teddy Bridgewater', 'Blake Bortles', 'Josh McCown',
         'Ryan Tannehill', 'Brian Hoyer', 'Russell Wilson', 'Geno Smith', 'Ryan Fitzpatrick',
         'Mark Sanchez', 'Charlie Whitehurst'),
  salary = c(22, 20.75, 20.1, 20, 19.2, 19, 18.1, 18, 17.6, 17, 2.73, 16.25, 16, 15.3, 14.6, 14.1,
             1.75, 1.34, 5.53, 5.5, 5.45, 5.3, 1.71, 5.16, 5, 3.17, 0.98, 0.75, 1.25, 3.62, 2.25, 2.13),
  wins = c(12, 6, 10, 7, 12, 8, 5, 12, 11, 8, 5, 6, 10, 9, 11, 12,
           3, 3, 11, 5, 7, 2, 6, 3, 1, 8, 7, 12, 3, 6, 4, 1))


# Scatter plot of average salary against wins, one marker per quarterback.
# A single trace is used so that each point is drawn only once.
p1 <- plot_ly(qb_salary, x = ~salary, y = ~wins,
              type = 'scatter', mode = 'markers',
              text = ~qb,
              marker = list(symbol = "x",
                            color = "rgba(1, 51, 105, 1)"),
              # Hover label: name, salary and wins; <extra></extra> hides the trace name
              hovertemplate = paste('%{text}<br>',
                                    'Average salary (in USD): %{x:$.2f} million<br>',
                                    'Wins: %{y}<br><extra></extra>'),
              showlegend = FALSE) %>%
  layout(
    # The title colour is set through the font attribute
    title = list(text = "Average Annual Salary vs Number of Wins <br> for Starting NFL Quarterbacks in 2014",
                 font = list(color = "rgb(1, 51, 105)"),
                 y = 0.96, x = 0.5, xanchor = 'center', yanchor = 'top'),
    xaxis = list(title = 'Average Salary (in millions of USD)'),
    yaxis = list(title = 'Number of Games Won'))

# Display the plot
p1
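
The objective asks how strong the correlation between salary and wins is. As a rough complement to the plot, the sketch below computes two correlation coefficients with base R's `cor()`. It was not part of the original analysis: the data frame name `qb` and the choice of Pearson and Spearman coefficients are assumptions made for illustration, and the salary and wins vectors are copied from the code above so that the block runs on its own.

```r
# Salary (millions of USD) and wins, copied from the qb_salary data above.
# The name `qb` is used only for this illustration.
qb <- data.frame(
  salary = c(22, 20.75, 20.1, 20, 19.2, 19, 18.1, 18, 17.6, 17, 2.73, 16.25, 16, 15.3, 14.6, 14.1,
             1.75, 1.34, 5.53, 5.5, 5.45, 5.3, 1.71, 5.16, 5, 3.17, 0.98, 0.75, 1.25, 3.62, 2.25, 2.13),
  wins = c(12, 6, 10, 7, 12, 8, 5, 12, 11, 8, 5, 6, 10, 9, 11, 12,
           3, 3, 11, 5, 7, 2, 6, 3, 1, 8, 7, 12, 3, 6, 4, 1))

# Pearson correlation measures the strength of a linear relationship
pearson_r <- cor(qb$salary, qb$wins, method = "pearson")

# Spearman correlation uses ranks, so it is less sensitive to extreme salaries
spearman_rho <- cor(qb$salary, qb$wins, method = "spearman")

print(c(pearson = pearson_r, spearman = spearman_rho))
```

Values near 1 or -1 would indicate a strong relationship, and values near 0 a weak one. Either coefficient should be read alongside the scatter plot rather than on its own.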

Data Reference

Reconstruction

The following plot fixes the main issues in the original.