This post shows a poorly made visualization and then walks through ways to improve it.

The data viz offender

Source: Reddit

The figure attempts to visualize a poll in which people voted on where they would want a new NHL team to be. The purpose of this visualization is to show which locations the most people want for a new NHL team.

What’s wrong with this visualization?

There are a lot of reasons I don’t like this figure:

  1. This visualization does not use a consistent scale! You can see that 47 is very far away from 54 but is very close to 24, even though 47 is only 7 votes behind 54 and 23 votes ahead of 24. This keeps the viewer from correctly judging “how big the differences are” between the cities. The figure should encode the votes as the length of the bars.
  2. There is way too much scattered text in this. Text above and below the title. Why?
  3. The caption with the extra teams… bugs me. Just plot all of your data!
  4. You should never have color differences in your figure unless you explain what the color means. This can be done with a legend or a subtitle (as you’ll see below).
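
To put numbers on the first point, here is a small base R sketch (my addition, not part of the original figure or poll write-up) that computes how long the top three bars should be relative to the leader. The names `top_votes`, `relative_length`, and `vote_gaps` are made up for illustration.

```r
# Votes for the top three places, taken from the poll
top_votes <- c(Houston = 54, "Quebec City" = 47, Arizona = 24)

# Bar length relative to the leader when length is proportional to votes
relative_length <- top_votes / max(top_votes)
round(relative_length, 2)

# Gaps in votes between neighbouring places:
# Houston to Quebec City, then Quebec City to Arizona
vote_gaps <- -diff(top_votes)
vote_gaps
```

On a proportional scale, Quebec City’s bar should be about 87% as long as Houston’s, while Arizona’s should be about 44%.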

A corrected visualization

Let’s start by making a data frame to hold the data.

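# load the packages used below:
# dplyr (mutate), forcats (fct_reorder), ggplot2, ggtext (element_markdown)
library(dplyr)
library(forcats)
library(ggplot2)
library(ggtext)

# make data.frame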
df <- data.frame(
  place = as.factor(c("Houston", "Quebec City", "Arizona", "Atlanta", "Toronto", "Austin", "Saskatoon", "San Diego")),
  votes = c(54,47,24,17,8,4,3,3),
  color = as.factor(c(1,0,0,0,0,0,0,0)) # I've included a color factor for the figure
  )

#view dataframe!
head(df)
##         place votes color
## 1     Houston    54     1
## 2 Quebec City    47     0
## 3     Arizona    24     0
## 4     Atlanta    17     0
## 5     Toronto     8     0
## 6      Austin     4     0
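
The `color` column above is typed by hand. As an alternative sketch (my own variation, not the approach used in this post), the flag could be derived from the votes so it stays correct if the numbers change. `df_sketch` is a hypothetical name, and only base R is assumed.

```r
# Same eight places and votes as above, without the hand-typed color column
df_sketch <- data.frame(
  place = c("Houston", "Quebec City", "Arizona", "Atlanta",
            "Toronto", "Austin", "Saskatoon", "San Diego"),
  votes = c(54, 47, 24, 17, 8, 4, 3, 3)
)

# 1 for the place with the most votes, 0 otherwise;
# levels are fixed so that "0" always comes before "1"
df_sketch$color <- factor(
  as.integer(df_sketch$votes == max(df_sketch$votes)),
  levels = c(0, 1)
)

# head() shows only the first six rows by default, so print the whole frame
print(df_sketch)
```

Keeping the level order as "0" then "1" means a two-color manual fill scale maps its first color to the ordinary bars and its second to the highlighted one.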

Now that we have the data, let’s visualize it!

df |>
  mutate(place = fct_reorder(place, votes))|>
  ggplot(aes(x = place, y = votes, fill = color)) +
  geom_col()+
  theme_classic()+
  geom_text(aes(label = votes), position = position_dodge(width = 0.9),
            hjust = -1, color = "#f7f8fa")+
  labs(
    title = '"Where do you want to see a new NHL team play?"',
    subtitle = "<span style='color:#f7f8fa'> Houston was voted as</span> <span style='color:#2951e3'>the most desirable new location for an NHL team.</span> ",
    caption = 
      "Teams receiving two votes: Kansas City, Helsinki, Oklahoma City, and Miami \nTeams receiving a single vote: Boise, Dubai, Green Bay, Halifax, Jackson Hole, Milwaukee, \nand Orlando \n \nSource: The Athletic NHL Staff  (Anonymous NHL Player Poll: 175 votes) from Sept. 27 - Nov. 10"
  )+
  ylim(c(0,60))+
  scale_fill_manual(values = c("#f7f8fa","#2951e3")) +
  coord_flip()+
  theme(
        plot.subtitle = element_markdown(),
        plot.title = element_text(color = "white", size = 16),
        plot.caption = element_text(color = "white", hjust = 0),
        axis.text.y = element_text(color = "white"),
        axis.ticks.y = element_line(color = "white"),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.line.x = element_blank(),
        axis.line.y = element_blank(),
        plot.background = element_rect(fill = 'black', color = "black"),
        panel.background = element_rect(fill = 'black'),
        legend.position = "none"
        )

Okay, now we have fixed the scale! You can see that Quebec City is really not all that far behind Houston in the poll. This is why the scale you use makes such a big difference! I have another gripe with the original figure: why did they not include all of the other teams? They arbitrarily decided to leave out the places that have fewer than 3 votes.

Let’s see what it looks like if we include all of the teams!

#make data.frame 
df2 <- data.frame(
  place = as.factor(c("Houston", "Quebec City", "Arizona", "Atlanta", "Toronto", "Austin", "Saskatoon", "San Diego","Kansas City", "Helsinki", "Oklahoma City", "Miami", "Boise", "Dubai", "Green Bay", "Halifax", "Jackson Hole", "Milwaukee", "Orlando")),
  votes = c(54,47,24,17,8,4,3,3,2,2,2,2,1,1,1,1,1,1,1),
  color = as.factor(c(1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)) 
  )
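
As a quick sanity check (my addition, using base R only), the votes in the full data set can be totaled and compared against the 175 votes stated in the source line of the caption. The variable names here are illustrative.

```r
# All vote counts from the full data frame, in the same order
all_votes <- c(54, 47, 24, 17, 8, 4, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)

# Total to compare against the 175 votes reported for the poll
total_votes <- sum(all_votes)
total_votes

# Votes that the original figure pushed into its caption
omitted <- all_votes[all_votes < 3]
sum(omitted)     # votes left out of the bars
length(omitted)  # places left out of the bars
```

The total comes to 175, and the 11 places with fewer than 3 votes account for 15 of those votes.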

Now that we have a new data frame, let’s visualize the figure with all of the votes!

df2 |>
  mutate(place = fct_reorder(place, votes))|>
  ggplot(aes(x = place, y = votes, fill = color)) +
  geom_col()+
  theme_classic()+
  geom_text(aes(label = votes), position = position_dodge(width = 0.9),
            hjust = -1, color = "#f7f8fa")+
  labs(
    title = '"Where do you want to see a new NHL team play?"',
    subtitle = "<span style='color:#f7f8fa'> Houston was voted as</span> <span style='color:#2951e3'>the most desirable new location for an NHL team.</span> ",
    caption = 
      "Source: The Athletic NHL Staff (Anonymous NHL Player Poll: 175 votes) from Sept. 27 - Nov. 10"
  )+
  ylim(c(0,60))+
  scale_fill_manual(values = c("#f7f8fa","#2951e3")) +
  coord_flip()+
  theme(
        plot.subtitle = element_markdown(),
        plot.title = element_text(color = "white", size = 16),
        plot.caption = element_text(color = "white", hjust = 1),
        axis.text.y = element_text(color = "white"),
        axis.ticks.y = element_line(color = "white"),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.line.x = element_blank(),
        axis.line.y = element_blank(),
        plot.background = element_rect(fill = 'black', color = "black"),
        panel.background = element_rect(fill = 'black'),
        legend.position = "none"
        )

There you have it! We have our finalized figure USING A CORRECT SCALE.

When you visualize data, the scale and the lengths of your bars should always match the actual values you are plotting!