This is my first attempt at using R Markdown to tackle the Makeover Monday challenges set by @VizWizBi and @acotgreave at http://vizwiz.blogspot.co.uk/p/makeover-monday-challenges.html.

The challenges have emerged out of the Tableau community and I’m a huge Tableau fan, but they also seem like a great way to see what I can manage in R and to learn some new techniques at the same time.

This week’s challenge is to make over this vis.

Original Vis

Rebuild the raw data

I could load the data from csv, but it’s very small so let’s just create it here.

#Recreate raw data
source_of_traffic <- c("Netflix","YouTube","HTTP","Amazon Video","iTunes","BitTorrent","Hulu","Facebook","Other")
percent_of_traffic <- c(37.1,17.9,6.1,3.1,2.8,2.7,2.6,2.5,25.4)
df <- data.frame(source_of_traffic, percent_of_traffic)

print(df)
##   source_of_traffic percent_of_traffic
## 1           Netflix               37.1
## 2           YouTube               17.9
## 3              HTTP                6.1
## 4      Amazon Video                3.1
## 5            iTunes                2.8
## 6        BitTorrent                2.7
## 7              Hulu                2.6
## 8          Facebook                2.5
## 9             Other               25.4
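
If the data did live in a file, loading it would be a one-liner. Here is a minimal sketch of that route, assuming a CSV with the same two column names. The text is inlined so the example runs on its own, and the file name mentioned in the comment is hypothetical.

```r
# Inline stand-in for a CSV file; with a real file this would be
# read.csv("traffic.csv"), where the file name is hypothetical
csv_text <- "source_of_traffic,percent_of_traffic
Netflix,37.1
YouTube,17.9
HTTP,6.1
Amazon Video,3.1
iTunes,2.8
BitTorrent,2.7
Hulu,2.6
Facebook,2.5
Other,25.4"

# stringsAsFactors = FALSE keeps the source names as plain character strings
df_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(df_csv)
```

For nine rows, typing the vectors out by hand is just as quick, which is why I went that way.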

Recreate the original

I wonder what the original would look like, rendered in ggplot? I’ve never drawn a donut chart in ggplot before and it turns out that it’s trickier than I expected. Comments in the code below note the awkward bits.

library(ggplot2)
library(dplyr)
# Adding value labels to each segment means working out where the centre of the segment will be drawn
df <- df %>%
  mutate(pos = cumsum(percent_of_traffic) - (0.5 * percent_of_traffic))

original_colours <- c("#7D0808", "#D91A00", "#FFB200", "#F78200", "#0F283E", "#919191", "#87BD24", "#005CB0", "#D6D6D6")

# The colour palette only lines up if the data keeps its original order; ggplot sorts character values alphabetically by default
df$source_of_traffic <- factor(df$source_of_traffic, levels = as.character(df$source_of_traffic))

ggplot(df, aes(x="", y=percent_of_traffic, fill=source_of_traffic)) +
  geom_bar(width=0.25, stat= "identity") +
  geom_text(aes(label=paste(percent_of_traffic,"%",sep=""),y=pos), size=3)+
  coord_polar(theta = "y") +
  scale_fill_manual(values=original_colours, name="Source of Traffic") +
  ggtitle("Percentage of peak period downstream internet traffic in North America")+

  #Turn off all of ggplot's default formatting
  theme(panel.grid=element_blank(),
        panel.background=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(hjust = 0, size=12))

Well, that was quite a lot of effort just to produce a very average-looking donut chart, and I’m still not happy with the chart’s rendered size, which seems to be very difficult to change for a donut. It’s the downside of using R: you can do almost anything, so you often have to specify in minute detail what you actually want.

In ggplot’s defence, the help files do note that coord_polar() exists so that you can draw pie charts, and then warn against their use. Reproducing the original vis isn’t what ggplot is good at.
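
The donut code above works out the label positions by hand with cumsum(). Those positions only line up if the segments are stacked in the same order as the running total, and as far as I know ggplot2 changed its stacking order in version 2.2.0, so the result may depend on the installed version. A sketch of an alternative, assuming ggplot2 2.2.0 or later, is to let position_stack(vjust = 0.5) place each label in the middle of its own segment. The names traffic and donut are mine, used only for illustration.

```r
library(ggplot2)

# Same values as the data frame above, rebuilt so the sketch runs on its own
traffic <- data.frame(
  source_of_traffic = c("Netflix", "YouTube", "HTTP", "Amazon Video", "iTunes",
                        "BitTorrent", "Hulu", "Facebook", "Other"),
  percent_of_traffic = c(37.1, 17.9, 6.1, 3.1, 2.8, 2.7, 2.6, 2.5, 25.4),
  stringsAsFactors = FALSE
)

# Keep the original order rather than alphabetical order
traffic$source_of_traffic <- factor(traffic$source_of_traffic,
                                    levels = traffic$source_of_traffic)

donut <- ggplot(traffic, aes(x = "", y = percent_of_traffic, fill = source_of_traffic)) +
  geom_col(width = 0.25) +
  # position_stack(vjust = 0.5) centres each label within its stacked segment,
  # so no hand-computed pos column is needed
  geom_text(aes(label = paste0(percent_of_traffic, "%")),
            position = position_stack(vjust = 0.5), size = 3) +
  coord_polar(theta = "y")

print(donut)
```

The colours and theme settings are left out here to keep the focus on the labels; they can be added back exactly as in the code above.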

What would I draw?

The original in this week’s challenge strikes me as a classic of the type you often get in marketing. The dataset is very simple, but if you draw a simple chart then you fear that nobody will think it’s clever.

Bar chart? Too basic.

Pie chart? Too common.

Donut chart!

Simplicity is good. A bar chart was a good answer, and a donut doesn’t improve on it.

#Sort the sources by their share of traffic so the bars are drawn in order
df <- transform(df, source_of_traffic = reorder(source_of_traffic, percent_of_traffic))

#Drop "Other". It's needed for a pie or donut, but not interesting in a bar.
df <- df[df$source_of_traffic!="Other",]

ggplot(df, aes(x=source_of_traffic, y=percent_of_traffic)) +
  geom_bar(stat="identity") +
  coord_flip() +
  geom_text(aes(label=paste(percent_of_traffic,"%",sep=""), y=percent_of_traffic), hjust=1, size=3, color="white")+
  ggtitle("Percentage of peak period downstream internet traffic in North America")+
  #Turn off all of ggplot's default formatting
  theme(text = element_text(size=15),
        panel.grid=element_blank(),
        panel.background=element_blank(),
        axis.ticks=element_blank(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(hjust = 0, size=12))

That’ll do. Seeing as I’ve already been a killjoy about the original vis overcomplicating a simple dataset, I’m not even going to colour it in. Corporate, analytical grey. Lovely!

(seriously, for publication, I promise I’d colour it in)

You might have spotted that I filtered out the “Other” category. Once you move away from a donut or pie, I don’t think it really serves much purpose.
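
The sorting and filtering steps in the bar chart code can be checked on their own, without drawing anything. This is a small sketch using base R only. The names traffic and bars are mine, and the droplevels() call is an optional extra rather than something the chart needs, since ggplot’s discrete scales leave out unused levels by default as far as I know.

```r
# Same values as the data frame above, rebuilt so the sketch runs on its own
traffic <- data.frame(
  source_of_traffic = c("Netflix", "YouTube", "HTTP", "Amazon Video", "iTunes",
                        "BitTorrent", "Hulu", "Facebook", "Other"),
  percent_of_traffic = c(37.1, 17.9, 6.1, 3.1, 2.8, 2.7, 2.6, 2.5, 25.4),
  stringsAsFactors = FALSE
)

# reorder() sorts the factor levels by the second argument, smallest first
traffic$source_of_traffic <- reorder(traffic$source_of_traffic,
                                     traffic$percent_of_traffic)
levels(traffic$source_of_traffic)

# Filtering the rows removes the "Other" row but keeps "Other" as an unused level
bars <- traffic[traffic$source_of_traffic != "Other", ]
"Other" %in% levels(bars$source_of_traffic)

# droplevels() removes the unused level explicitly and keeps the sorted order
bars <- droplevels(bars)
levels(bars$source_of_traffic)
```

Because the levels run from smallest to largest and coord_flip() draws the first level at the bottom, Netflix ends up as the top bar.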

More of this?

Why not? This has been a fun way to learn some new things and brush up my R Markdown at the same time. Complicated or really fun builds are still likely to happen in Tableau for the time being, though, because I can make a lot more happen there, much faster.