Tutorial: Turning a Table into a Horizontal Bar Graph using ggplot2

The table in this example comes from: Green, K., Pinder-Grover, T., & Millunchick, J. (2012). Impact of Screencast Technology: Connecting the Perception of Usefulness and the Reality of Performance. Journal of Engineering Education, 101(4), 717-737.

Table 3 in the article summarizes responses to a survey of strategies for using two types of screencasts: Homework solution screencasts and Mini-lecture screencasts.

original table

Getting the data into R

To work with the data in R, first create the table in Excel and save it as a .csv file.

The table I created includes only the percentages and not response counts. The table's primary purpose was to make comparisons between strategies for each type of screencast, so the percentages rather than the counts are most important. The total number of respondents for each type of screencast will still be included on the final graph, allowing the response counts to be calculated while not cluttering the graph.

This is the Excel table I created:

first table

Set your working directory in RStudio through the menus: Session > Set Working Directory > Choose Directory, then choose the folder where you have saved your data.
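The same step can also be done from the console with setwd(). The lines below are only a sketch: tempdir() is a stand-in for your own folder, and the path in the comment is hypothetical.

```r
# Remember the current working directory so it can be restored afterwards
oldDir <- getwd()

# Stand-in for the folder that holds your .csv files; replace tempdir()
# with your own path, e.g. "C:/Users/me/Documents/hw1" (hypothetical path)
dataDir <- tempdir()
setwd(dataDir)

# Confirm the change, then list any .csv files R can see in that folder
getwd()
list.files(pattern = "\\.csv$")

# Restore the original working directory
setwd(oldDir)
```

If list.files() does not show your data file, read.csv() will not find it either, so this is a quick check before reading the data.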

Read the .csv file into R using

screencastOriginal <- read.csv("hw1 data.csv")

Take a look at the table to make sure it was read properly.

screencastOriginal
##                                                       Reason
## 1                  Watched entire video from start to finish
## 2 Re-watched certain segments based on my homework responses
## 3                          Went to specific points to review
## 4               Watched large chunks looking for information
## 5                                             Browsed around
##   Homework.solution Mini.lecture
## 1                33           66
## 2                26            5
## 3                19           12
## 4                14            9
## 5                 9            8

This is a data frame with 5 rows and 3 columns, labeled Reason, Homework.solution, and Mini.lecture. Consider whether this table is formatted correctly for a bar graph. Reason will be on one axis, but we want both the Homework.solution and Mini.lecture percentages on the other axis, and ggplot2 expects all of the values for one axis to sit in a single column. Though the original table is presented one way, we need to change the format slightly for our purposes.
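A quick way to check the shape and column types of the table is with is.data.frame(), dim(), and str(). In this sketch the table is retyped from the printed output above, rather than read from the .csv file, so that it runs on its own.

```r
# Rebuild the table from the printed output above so this runs on its own
screencastOriginal <- data.frame(
  Reason = c(
    "Watched entire video from start to finish",
    "Re-watched certain segments based on my homework responses",
    "Went to specific points to review",
    "Watched large chunks looking for information",
    "Browsed around"
  ),
  Homework.solution = c(33, 26, 19, 14, 9),
  Mini.lecture = c(66, 5, 12, 9, 8)
)

is.data.frame(screencastOriginal)  # read.csv() also returns a data frame
dim(screencastOriginal)            # number of rows, then number of columns
str(screencastOriginal)            # type of each column
```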

Using Excel, I created a second representation of the table with 10 rows and 3 columns (Reason, Percentage, and Type) and saved it as a .csv file. Type takes the values 0 and 1, representing Homework.solution and Mini.lecture respectively. Now when we make the graph, Reason will be on one axis, Percentage will be on the other axis, and each Type will have its own bar.

This is the second Excel table I created:

second table
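The same long-format table can also be built in R rather than Excel. This is one possible sketch using only base R. The name screencastLong is my own, and the wide table is retyped here so the example is self-contained.

```r
# Wide table, retyped from the first table so this runs on its own
screencastOriginal <- data.frame(
  Reason = c(
    "Watched entire video from start to finish",
    "Re-watched certain segments based on my homework responses",
    "Went to specific points to review",
    "Watched large chunks looking for information",
    "Browsed around"
  ),
  Homework.solution = c(33, 26, 19, 14, 9),
  Mini.lecture = c(66, 5, 12, 9, 8)
)

# Stack the two percentage columns into one and record which column each
# value came from: 0 = Homework.solution, 1 = Mini.lecture
screencastLong <- data.frame(
  Reason = rep(screencastOriginal$Reason, times = 2),
  Percentage = c(screencastOriginal$Homework.solution,
                 screencastOriginal$Mini.lecture),
  Type = rep(c(0, 1), each = nrow(screencastOriginal))
)

screencastLong
```

Doing the reshaping in code means the second .csv file is not needed and the two tables cannot drift apart.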

Read in and view the new table using

screencast <- read.csv("hw1 data2.csv")
screencast
##                                                        Reason Percentage
## 1                   Watched entire video from start to finish         33
## 2  Re-watched certain segments based on my homework responses         26
## 3                           Went to specific points to review         19
## 4                Watched large chunks looking for information         14
## 5                                              Browsed around          9
## 6                   Watched entire video from start to finish         66
## 7  Re-watched certain segments based on my homework responses          5
## 8                           Went to specific points to review         12
## 9                Watched large chunks looking for information          9
## 10                                             Browsed around          8
##    Type
## 1     0
## 2     0
## 3     0
## 4     0
## 5     0
## 6     1
## 7     1
## 8     1
## 9     1
## 10    1

Creating the basic plot

We will use the packages ggplot2 and scales to create the graph. Load these using

library(ggplot2)
library(scales)

We will use ggplot() as a base and add on layers to customize our graph. We will set up the plot as laid out above: Reason will be on the x-axis, Percentage will be on the y-axis, and the data will be split by Type. Note that Type must be used as factor(Type) because R reads the 0 and 1 entries as integers, which ggplot2 would treat as a continuous variable rather than as two categories.

# The first step is specifying the basic form of the graph
ggplot(data=screencast, aes(x=Reason, y=Percentage, fill=factor(Type)))
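To see what factor() does to the Type column, here is a small sketch. The vector is typed in by hand to mirror the column in the second table.

```r
# The Type column as R reads it from the .csv file: plain numbers
Type <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
is.numeric(Type)

# Converting to a factor turns the numbers into two named categories
typeFactor <- factor(Type)
is.factor(typeFactor)
levels(typeFactor)
```

The levels are stored as the text values "0" and "1", which is why the legend initially shows those two labels.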

Next, add the bars using the layer geom_bar() with the arguments stat="identity", so that each bar's length is taken directly from Percentage instead of from a count of rows, and position="dodge", so that the two bars for each reason sit side by side instead of being stacked. The layer coord_flip() swaps the x- and y-axes, creating a horizontal bar graph instead of a vertical one. The layer ggtitle() adds a title to the graph.

Now the code and graph look like:

ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
  geom_bar(position="dodge",stat="identity") + 
  coord_flip() +
  ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts")

plot: basic horizontal bar graph
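As an aside, ggplot2 (version 2.2.0 and later) also provides geom_col(), which uses the data values as bar lengths without needing stat="identity". The sketch below assumes ggplot2 is installed and retypes the second table so it runs on its own. The name basicPlot is my own.

```r
library(ggplot2)

# Long-format table, retyped from the second table above
screencast <- data.frame(
  Reason = rep(c(
    "Watched entire video from start to finish",
    "Re-watched certain segments based on my homework responses",
    "Went to specific points to review",
    "Watched large chunks looking for information",
    "Browsed around"
  ), times = 2),
  Percentage = c(33, 26, 19, 14, 9, 66, 5, 12, 9, 8),
  Type = rep(c(0, 1), each = 5)
)

# geom_col() is shorthand for geom_bar(stat = "identity")
basicPlot <- ggplot(data = screencast,
                    aes(x = Reason, y = Percentage, fill = factor(Type))) +
  geom_col(position = "dodge") +
  coord_flip() +
  ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts")

basicPlot
```

The rest of this tutorial keeps geom_bar(), but either form can be used with the layers that follow.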

Refining the plot

The basic structure of the graph has been created, but there is still a lot to do to clean it up.

Color

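Adding the layers scale_fill_grey() and theme_bw() changes the color scheme to grayscale so the graph prints well: scale_fill_grey() fills the bars with shades of gray, and theme_bw() replaces the default gray background with white.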

ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
  geom_bar(position="dodge",stat="identity") +
  coord_flip() +
  ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") +
  scale_fill_grey() +
  theme_bw()

plot: grayscale bar graph on a white background

Order and Labels of Categories

You may have noticed that the order of the categories has changed from the original table: ggplot2 has arranged them alphabetically. The categories can be reordered using the argument limits in the layer scale_x_discrete(). We refer to the x-axis here because Reason was originally assigned to x. It is only because of coord_flip() that Reason now appears on the vertical axis.

scale_x_discrete(
  limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish")
  )

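The elements are listed in the reverse of the table's order because they are assigned to the axis starting from the origin. After coord_flip(), the first element sits at the bottom of the graph and the last at the top.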
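Rather than typing the list backwards by hand, the base R function rev() can reverse a vector written in table order. A minimal sketch, where the names reasons and axisOrder are my own:

```r
# Reasons in the same order as the original table (top to bottom)
reasons <- c(
  "Watched entire video from start to finish",
  "Re-watched certain segments based on my homework responses",
  "Went to specific points to review",
  "Watched large chunks looking for information",
  "Browsed around"
)

# Reverse the order so the first table row ends up at the top of the
# flipped axis
axisOrder <- rev(reasons)
axisOrder
```

The result could then be passed as scale_x_discrete(limits=axisOrder).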

Some of the reasons are long, and because there are no line breaks written into the text, each one is drawn on a single line. We can use the argument labels in scale_x_discrete() to supply labels with line breaks so that the graph does not take up more room than necessary. Write your labels in the same order as limits and use \n wherever you want to add a line break.

scale_x_discrete(
  limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"), 
  labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish")
    ) 
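If you would rather not place each \n by hand, the breaks can be generated with the base R function strwrap(). This is only a sketch: the helper wrapLabel and the width of 35 characters are my own choices, and the breaks it produces will not necessarily fall where the hand-written ones do.

```r
# Hypothetical helper: wrap each label at roughly `width` characters and
# join the pieces with "\n" so ggplot2 draws them on separate lines
wrapLabel <- function(x, width = 35) {
  vapply(
    x,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

# Reasons in the same order as the limits argument above
reasons <- c(
  "Browsed around",
  "Watched large chunks looking for information",
  "Went to specific points to review",
  "Re-watched certain segments based on my homework responses",
  "Watched entire video from start to finish"
)

wrapped <- wrapLabel(reasons)

# Print the wrapped labels, separated by dashes, to inspect the breaks
cat(wrapped, sep = "\n---\n")
```

The wrapped vector could then be used as scale_x_discrete(limits=reasons, labels=wrapped).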

After making all these changes, our graph is looking better. Here is the complete code so far and the graph:

ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
  geom_bar(position="dodge",stat="identity") + 
  coord_flip() +  
  ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") +
  scale_fill_grey() +
  theme_bw() + 
  scale_x_discrete(
    limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"),
    labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish") 
    ) 

plot: bar graph with reordered categories and wrapped labels

Legend

The legend needs some fine-tuning: the order of its elements does not match that of the bars, and the labels 0 and 1 are meaningless to a reader. We can fix both problems by adding the arguments breaks and labels within scale_fill_grey().

scale_fill_grey(
  breaks=c(1,0),      #changes the order of elements to that listed
  labels=c("Mini-lecture Screencast, n=196","Homework Screencast, n=209")  #changes labels of elements in their respective order
  )

Since the original table gave total sample size, I include this data in the legend so no information is lost.
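With the totals in the legend, approximate response counts can be recovered from the percentages. The sketch below shows the arithmetic. Because the published percentages are rounded, the counts are estimates and may not sum exactly to n. The variable names are my own.

```r
# Total respondents for each screencast type, as given in the legend
nHomework <- 209
nMiniLecture <- 196

# Percentages in the same row order as the original table
homeworkPct <- c(33, 26, 19, 14, 9)
miniLecturePct <- c(66, 5, 12, 9, 8)

# Estimated count = percentage / 100 * total, rounded to a whole response
homeworkCounts <- round(homeworkPct / 100 * nHomework)
miniLectureCounts <- round(miniLecturePct / 100 * nMiniLecture)

data.frame(homeworkPct, homeworkCounts, miniLecturePct, miniLectureCounts)
```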

Arguments in the layer theme() also affect the legend. You can remove the legend title and move the legend using:

theme(
  legend.title=element_blank(),   #removes legend title
  legend.position=c(.73,.7)        #changes legend position
  )          

When positioning the legend on the graph, imagine the first quadrant of a coordinate system lying on top of it. The bottom left corner is (0,0) and the top right is (1,1). Choose your legend's coordinates within that range. I chose the coordinates (.73,.7).
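One caveat for readers on a recent ggplot2: from version 3.5.0 onward, passing coordinates directly to legend.position is deprecated in favor of legend.position="inside" together with legend.position.inside. The sketch below assumes ggplot2 is installed and picks whichever form fits the installed version. The name legendTheme is my own.

```r
library(ggplot2)

# Choose the legend-position syntax that matches the installed ggplot2
legendTheme <- if (utils::packageVersion("ggplot2") >= "3.5.0") {
  # Newer syntax: place the legend inside the plot at the given coordinates
  theme(
    legend.title = element_blank(),
    legend.position = "inside",
    legend.position.inside = c(.73, .7)
  )
} else {
  # Older syntax, as used in this tutorial
  theme(
    legend.title = element_blank(),
    legend.position = c(.73, .7)
  )
}
```

legendTheme can be added to the plot with + in place of the theme() call shown above.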

Our code and graph now look like:

ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
  geom_bar(position="dodge",stat="identity") +
  coord_flip() + 
  ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") + 
  scale_fill_grey(  
    breaks=c(1,0), 
    labels=c("Mini-lecture Screencast, n=196","Homework Screencast, n=209")
    ) +                               
  theme_bw() +    
  scale_x_discrete(
    limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"), 
    labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish")   
    )  +                             
  theme(
    legend.title=element_blank(),
    legend.position=c(.73,.7)
    )

plot: bar graph with relabeled and repositioned legend

Text

The final topic to consider is the text on the graph. This includes axis labels, axis text, title, font, size, etc. All of these aesthetic changes are made using arguments in the theme() layer.

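The y-axis does not need to be labeled Reason since the categories are written out. The label is easily removed using the argument axis.title.y. Unlike scale_x_discrete(), the arguments of theme() refer to the axes as they appear on the finished graph, so after coord_flip() the Reason axis is y.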

theme(
  axis.title.y=element_blank()         #removes y-axis label
  )

We can change the font on the entire graph using text.

theme(
  axis.title.y=element_blank(),                #removes y-axis label
  text=element_text(family="serif",size=20)    #changes font family and text size on entire graph
  )

Finally, we can make changes to the title using plot.title.

theme(
  axis.title.y=element_blank(),                            #removes y-axis label
  text=element_text(family="serif",size=20),               #changes font family and text size on entire graph
  plot.title=element_text(face="bold",hjust=0)             #changes font face and left-aligns the graph title
  )

So our final code and graph end up as:

ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
  geom_bar(position="dodge",stat="identity") +
  coord_flip() + 
  ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") + 
  scale_fill_grey( 
    breaks=c(1,0),  
    labels=c("Mini-lecture Screencast, n=196","Homework Screencast, n=209")
    ) +                               
  theme_bw() +   
  scale_x_discrete(
    limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"), 
    labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish") 
    )  +                             
  theme(
    legend.title=element_blank(),  
    legend.position=c(.73,.7),
    axis.title.y=element_blank(), 
    text=element_text(family="serif",size=20),
    plot.title=element_text(face="bold",hjust=0)
    )

plot: final bar graph