This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more of my R tutorials visit http://mikewk.com/statistics.

I saw a tweet today that led me to a thread on reddit and this cool plot. It was made by zonination, who was nice enough to share the code, but left out a few important lines that really make the plot look nicer. So, for this tutorial, I decided to reproduce zonination’s cool plot.

Packages and Data

Install and load the following packages:

# install.packages(c('ggplot2','reshape2','scales','RColorBrewer'))
library(ggplot2)
library(reshape2)
library(scales)
library(RColorBrewer)

Download the probability and numbers data as .csv files, and then read them in. The code below assumes they are saved as prob.csv and numbers.csv in the working directory.

probly <- read.csv("prob.csv", stringsAsFactors=FALSE)
numberly <- read.csv("numbers.csv", stringsAsFactors=FALSE)
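One detail worth knowing before the next step: by default, read.csv() rewrites column headers into syntactically valid R names, so spaces become dots. The sketch below shows this on a tiny, made-up file. The headers and values are hypothetical, and it is an assumption (inferred from the gsub() calls further down) that the real files have phrase headers containing spaces.

```r
# Write a tiny, made-up CSV to a temporary file (not the real survey data).
tmp <- tempfile(fileext = ".csv")
writeLines(c('"Almost Certainly","About Even"', "95,50", "90,45"), tmp)

# Default behaviour (check.names = TRUE): spaces in headers become dots.
dotted <- names(read.csv(tmp, stringsAsFactors = FALSE))
print(dotted)

# check.names = FALSE keeps the headers exactly as written in the file.
kept <- names(read.csv(tmp, stringsAsFactors = FALSE, check.names = FALSE))
print(kept)

# Clean up the temporary file.
unlink(tmp)
```

Either approach works; this tutorial keeps the default and swaps the dots back to spaces after melting.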

Convert data to long format

# Melt the data from wide to long format (one row per response).
numberly <- melt(numberly)
numberly$variable <- gsub("[.]"," ",numberly$variable)
probly <- melt(probly)
probly$variable <- gsub("[.]"," ",probly$variable)
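To see what the reshaping step does, here is a minimal sketch on a made-up data frame (the `toy` object and its values are hypothetical, not the survey data). With no id variables given, melt() treats every column as a measured variable and stacks them into a `variable`/`value` pair.

```r
library(reshape2)

# Hypothetical wide data: one column per phrase, one row per respondent.
toy <- data.frame(Almost.Certainly = c(95, 90, 99),
                  About.Even       = c(50, 50, 45))

# No id.vars supplied, so melt() uses all columns as measure variables
# (it prints a message saying so) and returns `variable` and `value`.
toy_long <- melt(toy)

# Turn the dotted column names back into readable labels.
# gsub() returns a character vector, so `variable` is no longer a factor here.
toy_long$variable <- gsub("[.]", " ", toy_long$variable)

str(toy_long)
```

The result has one row per original cell, which is the shape ggplot2 expects when mapping `variable` and `value` to the axes.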

Order the factors

probly$variable <- factor(probly$variable,
                          c("Chances Are Slight",
                            "Highly Unlikely",
                            "Almost No Chance",
                            "Little Chance",
                            "Probably Not",
                            "Unlikely",
                            "Improbable",
                            "We Doubt",
                            "About Even",
                            "Better Than Even",
                            "Probably",
                            "We Believe",
                            "Likely",
                            "Probable",
                            "Very Good Chance",
                            "Highly Likely",
                            "Almost Certainly"))
numberly$variable <- factor(numberly$variable,
                            c("Hundreds of",
                              "Scores of",
                              "Dozens",
                              "Many",
                              "A lot",
                              "Several",
                              "Some",
                              "A few",
                              "A couple",
                              "Fractions of"))
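Why spell out the levels by hand? ggplot2 lays out a discrete axis in the order of the factor's levels, and factor() sorts levels alphabetically unless told otherwise. A small sketch with made-up values (the `phrases` vector is hypothetical):

```r
phrases <- c("Likely", "About Even", "Unlikely", "Likely")

# Default: levels are sorted alphabetically, not by meaning.
default_levels <- levels(factor(phrases))
print(default_levels)

# Explicit levels: the order given here is the order used on the axis.
# With coord_flip(), the first level ends up at the bottom of the plot.
ordered_phrases <- factor(phrases,
                          levels = c("Unlikely", "About Even", "Likely"))
print(levels(ordered_phrases))

# Any value not found among the supplied levels silently becomes NA,
# so it is worth checking for NAs after this step.
checked <- factor(c("Likely", "Probly"),
                  levels = c("Unlikely", "About Even", "Likely"))
print(anyNA(checked))
```

Running anyNA() on probly$variable and numberly$variable after the factor() calls is a cheap way to catch a misspelled phrase.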

Original Plot

The code that was posted incorporated a customized z_theme(), but the code used to specify that theme function was not included. So, if we ignore the custom theme, we still get a plot, but it doesn’t look nearly as cool as the original.

New/Reproduced Plot

So, my goal was to recreate the original plot. Here’s the code I added:

  ggplot(probly,aes(variable,value))+
  geom_boxplot(aes(fill=variable),alpha=.5)+
  geom_jitter(aes(color=variable),size=4,alpha=.2)+
  coord_flip()+
  guides(fill="none",color="none")+
  # removed xlab("Phrase"), which titled the x-axis (drawn as the y-axis on the figure because of coord_flip())
  ylab("Assigned Probability (%)")+
  scale_y_continuous(breaks=seq(0,100,10))+
  ggtitle("Perceptions of Probability") +

#----- New code
  theme(panel.background=element_rect(fill="#F0F0F0")) +
  theme(plot.background=element_rect(fill="#F0F0F0")) +
  theme(panel.grid.major=element_line(colour="#999999",size=.25)) +
  theme(axis.ticks=element_blank()) +
  theme(axis.text=element_text(size=10,colour="#535353")) +
  theme(axis.title.x=element_text(size=12,colour="#535353", vjust=-.1)) +
  theme(axis.title.y=element_blank()) +
  theme(plot.title=element_text(face="bold",hjust=-.7,vjust=1.5,colour="#3C3C3C",size=20))
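Since the same styling is useful for more than one plot, one option is to bundle the theme calls into a small function. This is only a sketch of how that could be organised, not zonination's z_theme(): the name `theme_grey_panel()` is hypothetical and the toy data is made up. The grid line width and the title offsets (hjust, vjust) are left out on purpose. The width argument is named `size` in older ggplot2 and `linewidth` from version 3.4.0 on, and the title offsets depend on the individual plot.

```r
library(ggplot2)

# Hypothetical helper: collects the theme settings into a single call.
theme_grey_panel <- function() {
  theme(panel.background = element_rect(fill = "#F0F0F0"),
        plot.background  = element_rect(fill = "#F0F0F0"),
        panel.grid.major = element_line(colour = "#999999"),
        axis.ticks       = element_blank(),
        axis.text        = element_text(size = 10, colour = "#535353"),
        axis.title.x     = element_text(size = 12, colour = "#535353"),
        axis.title.y     = element_blank(),
        plot.title       = element_text(face = "bold", colour = "#3C3C3C",
                                        size = 20))
}

# Made-up data in the same long format as probly (variable, value).
toy <- data.frame(
  variable = factor(rep(c("Unlikely", "Likely"), each = 5),
                    levels = c("Unlikely", "Likely")),
  value    = c(10, 15, 20, 25, 30, 60, 70, 75, 80, 90)
)

p <- ggplot(toy, aes(variable, value)) +
  geom_boxplot(aes(fill = variable), alpha = .5) +
  coord_flip() +
  guides(fill = "none") +
  ylab("Assigned Probability (%)") +
  ggtitle("Toy example") +
  theme_grey_panel()

print(p)
```

Any further theme() call added after theme_grey_panel() overrides the matching settings, so per-plot tweaks such as the title position can still be layered on top.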

And the result is a much prettier plot that I’m happy with:

And, for good measure, here’s my version of the second [numbers] plot:

And that’s it!