The table in this example comes from: Green, K., Pinder-Grover, T., & Millunchick, J. (2012). Impact of Screencast Technology: Connecting the Perception of Usefulness and the Reality of Performance. Journal of Engineering Education, 101(4), 717-737.
Table 3 in the article summarizes responses to a survey of strategies for using two types of screencasts: Homework solution screencasts and Mini-lecture screencasts.
To work with the data in R, first create the table in Excel and save as a .csv file.
The table I created includes only the percentages and not response counts. The table's primary purpose was to make comparisons between strategies for each type of screencast, so the percentages rather than the counts are most important. The total number of respondents for each type of screencast will still be included on the final graph, allowing the response counts to be calculated while not cluttering the graph.
This is the Excel table I created:
Set your working directory in RStudio by navigating the menus: Session > Set Working Directory > Choose Directory, then choose the folder where you have saved your data.
Read the .csv file into R using
screencastOriginal <- read.csv("hw1 data.csv")
Take a look at the table to make sure it was read properly.
screencastOriginal
##                                                       Reason
## 1                  Watched entire video from start to finish
## 2 Re-watched certain segments based on my homework responses
## 3                          Went to specific points to review
## 4               Watched large chunks looking for information
## 5                                             Browsed around
##   Homework.solution Mini.lecture
## 1                33           66
## 2                26            5
## 3                19           12
## 4                14            9
## 5                 9            8
This is a data frame with 5 rows and 3 columns, labeled Reason, Homework.solution, and Mini.lecture. Consider whether this table is formatted correctly for a bar graph. Reason will be on one axis, but we want both the Homework.solution and Mini.lecture percentages on the other axis. Though the original table is presented one way, we need to change the format slightly for our purposes.
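The dots in Homework.solution and Mini.lecture are most likely a side effect of read.csv(), which by default rewrites column headers into syntactically valid R names. The sketch below illustrates this with a small temporary file. The header text "Homework solution" and "Mini-lecture" is an assumption, since the exact headers typed into Excel are not shown here.

```r
# Hypothetical headers: the exact text used in the spreadsheet is an assumption.
csv_lines <- c(
  "Reason,Homework solution,Mini-lecture",
  "Browsed around,9,8"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Default: check.names = TRUE passes the headers through make.names(),
# which replaces spaces and hyphens with dots.
converted <- read.csv(tmp)
names(converted)   # "Reason" "Homework.solution" "Mini.lecture"

# check.names = FALSE keeps the headers exactly as typed.
as_typed <- read.csv(tmp, check.names = FALSE)
names(as_typed)    # "Reason" "Homework solution" "Mini-lecture"
```

The converted names are easier to type in code, which is probably why the default is left on in this example.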
Using Excel, I created a second representation of the table, a data frame with 10 rows and 3 columns (Reason, Percentage, and Type), and saved it as a .csv file. Type takes the values 0 and 1, representing Homework.solution and Mini.lecture respectively. Now when we make the graph, Reason will be on one axis, Percentage will be on the other axis, and each Type will have its own bar.
This is the second Excel table I created:
Read in and view the new table using
screencast <- read.csv("hw1 data2.csv")
screencast
##                                                        Reason Percentage
## 1                   Watched entire video from start to finish         33
## 2  Re-watched certain segments based on my homework responses         26
## 3                           Went to specific points to review         19
## 4                Watched large chunks looking for information         14
## 5                                              Browsed around          9
## 6                   Watched entire video from start to finish         66
## 7  Re-watched certain segments based on my homework responses          5
## 8                           Went to specific points to review         12
## 9                Watched large chunks looking for information          9
## 10                                             Browsed around          8
##    Type
## 1     0
## 2     0
## 3     0
## 4     0
## 5     0
## 6     1
## 7     1
## 8     1
## 9     1
## 10    1
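The long table printed above can also be built in R itself, without a second Excel file. The following is a minimal sketch in base R. It retypes the values from the original table so that it runs on its own, and the name screencastLong is hypothetical, chosen only for this illustration.

```r
# Values copied from the original 5-row table.
reasons <- c(
  "Watched entire video from start to finish",
  "Re-watched certain segments based on my homework responses",
  "Went to specific points to review",
  "Watched large chunks looking for information",
  "Browsed around"
)
screencastOriginal <- data.frame(
  Reason = reasons,
  Homework.solution = c(33, 26, 19, 14, 9),
  Mini.lecture = c(66, 5, 12, 9, 8)
)

# Stack the two percentage columns and record which column each row came from:
# 0 = Homework.solution, 1 = Mini.lecture.
screencastLong <- data.frame(
  Reason = rep(screencastOriginal$Reason, times = 2),
  Percentage = c(screencastOriginal$Homework.solution,
                 screencastOriginal$Mini.lecture),
  Type = rep(c(0, 1), each = nrow(screencastOriginal))
)
screencastLong
```

Doing the reshaping in code keeps a single source file, so a correction to the original table only has to be made once.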
We will use the packages ggplot2 and scales to create the graph. Load these using
library(ggplot2)
library(scales)
We will use ggplot() as a base and add layers to customize our graph. We will set up the plot as laid out above: Reason will be on the x-axis, Percentage will be on the y-axis, and the data will be split by Type. Note that Type must be used as factor(Type) because R reads the 0 and 1 entries as integers rather than as categories.
# The first step is specifying the basic form of the graph
ggplot(data=screencast, aes(x=Reason, y=Percentage, fill=factor(Type)))
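To see what factor(Type) changes, here is a small sketch in base R using a short stand-in vector rather than the full data. A numeric fill would be treated by ggplot2 as a continuous variable, while a factor has discrete levels that can each get their own bar and legend entry.

```r
# A stand-in for the Type column as read.csv() would import it.
Type <- c(0L, 0L, 1L, 1L)
class(Type)          # "integer"

# factor() turns the two distinct values into category labels.
TypeFactor <- factor(Type)
class(TypeFactor)    # "factor"
levels(TypeFactor)   # "0" "1"
```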
Next, add the bars using the layer geom_bar() with the arguments stat="identity", which uses the values in the data as bar heights, and position="dodge", which places the two bars side by side so that they don't overlap. The layer coord_flip() flips the x- and y-axes, creating a horizontal bar graph instead of a vertical one. The layer ggtitle() adds a title to the graph.
Now the code and graph look like:
ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
geom_bar(position="dodge",stat="identity") +
coord_flip() +
ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts")
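As a side note, ggplot2 versions 2.2.0 and later also provide geom_col(), which draws bars at the heights given in the data without needing stat="identity". The sketch below uses a reduced, two-category copy of the data so that it runs on its own. It is an alternative spelling of the same layer, not a change to the approach used here.

```r
library(ggplot2)

# A cut-down copy of the long table (two reasons only) for illustration.
screencastSmall <- data.frame(
  Reason = rep(c("Watched entire video from start to finish",
                 "Browsed around"), times = 2),
  Percentage = c(33, 9, 66, 8),
  Type = rep(c(0, 1), each = 2)
)

# geom_col() behaves like geom_bar(stat = "identity").
p <- ggplot(screencastSmall,
            aes(x = Reason, y = Percentage, fill = factor(Type))) +
  geom_col(position = "dodge") +
  coord_flip()
# print(p) draws the chart.
```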
The basic structure of the graph has been created, but there is still a lot to do to clean it up.
Adding the layers scale_fill_grey() and theme_bw() will change the color scheme to black and white so that the graph prints well.
ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
geom_bar(position="dodge",stat="identity") +
coord_flip() +
ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") +
scale_fill_grey() +
theme_bw()
You may have noticed that the order of the categories has changed from the original table: R has automatically alphabetized them. The categories can be reordered using the argument limits in the layer scale_x_discrete(). We refer to the x-axis here because Reason was originally assigned to x. It is only because of coord_flip() that Reason now appears on the vertical axis.
scale_x_discrete(
limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish")
)
The elements are listed in reverse order from the table because they are placed along the axis in increasing order, so the first element ends up at the bottom of the flipped graph and the last at the top.
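Another way to control the order, shown here only as a sketch, is to store Reason as a factor whose levels are already in the desired order, since ggplot2 lays out a discrete axis in factor-level order. The variable names below are hypothetical.

```r
# Reasons in the order they appear in the original table.
reasons <- c(
  "Watched entire video from start to finish",
  "Re-watched certain segments based on my homework responses",
  "Went to specific points to review",
  "Watched large chunks looking for information",
  "Browsed around"
)

# By default the levels are sorted alphabetically.
levels(factor(reasons))

# Supplying the levels explicitly fixes the order; rev() reverses the
# table order so that the first table row ends up at the top after coord_flip().
orderedReasons <- factor(reasons, levels = rev(reasons))
levels(orderedReasons)
```

With this approach the long vector of category names does not need to be retyped inside scale_x_discrete().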
Some of the reasons are long, and because there are no line breaks written into the text, each one takes up a single long line. So that the graph does not take up more room than necessary, we can rewrite the labels using the argument labels in scale_x_discrete(). Write your labels and use \n wherever you want to add a line break.
scale_x_discrete(
limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"),
labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish")
)
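Instead of typing each \n by hand, the line breaks could also be generated. The sketch below uses base R's strwrap() inside a hypothetical helper called wrapLabel(). The width of 35 characters is an arbitrary choice for illustration, and the breaks it produces may not match the hand-placed ones above.

```r
reasons <- c(
  "Browsed around",
  "Watched large chunks looking for information",
  "Went to specific points to review",
  "Re-watched certain segments based on my homework responses",
  "Watched entire video from start to finish"
)

# Hypothetical helper: wrap each label at word boundaries and rejoin
# the pieces with "\n" so the result can be passed to the labels argument.
wrapLabel <- function(x, width = 35) {
  vapply(
    x,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

wrapped <- wrapLabel(reasons)
cat(wrapped, sep = "\n---\n")
```

The result could then be supplied as labels=wrapped in scale_x_discrete(), provided limits lists the reasons in the same order.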
After making all these changes, our graph is looking better. Here is the complete code so far and the graph:
ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
geom_bar(position="dodge",stat="identity") +
coord_flip() +
ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") +
scale_fill_grey() +
theme_bw() +
scale_x_discrete(
limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"),
labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish")
)
The legend needs some fine-tuning. The order of elements in the legend does not match that of the bars, and the labels are meaningless. We can fix both problems by adding the arguments breaks and labels within scale_fill_grey().
scale_fill_grey(
breaks=c(1,0), #changes the order of elements to that listed
labels=c("Mini-lecture Screencast, n=196","Homework Screencast, n=209") #changes labels of elements in their respective order
)
Since the original table gave the total sample size for each type of screencast, I include it in the legend so no information is lost.
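With the sample sizes in the legend, a reader can recover approximate response counts from the percentages. The sketch below shows the arithmetic. Because the published percentages are rounded, the results are estimates and need not add up exactly to n.

```r
# Percentages in table order, and the sample sizes shown in the legend.
pctHomework <- c(33, 26, 19, 14, 9)
pctMini     <- c(66, 5, 12, 9, 8)
nHomework   <- 209
nMini       <- 196

# count is approximately percentage / 100 * n, rounded to a whole respondent.
approxHomework <- round(pctHomework / 100 * nHomework)
approxMini     <- round(pctMini / 100 * nMini)

approxHomework
approxMini
```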
Arguments in the layer theme() also affect the legend. You can remove the legend title and move the legend using:
theme(
legend.title=element_blank(), #removes legend title
legend.position=c(.73,.7) #changes legend position
)
When positioning the legend on the graph, imagine the first quadrant of a coordinate system lying on top of it. The bottom left corner is (0,0) and the top right is (1,1). Choose your legend's coordinates within that range. I chose the coordinates (.73,.7).
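One caveat for newer installations: starting with ggplot2 3.5.0, passing coordinates directly to legend.position is deprecated in favor of legend.position="inside" together with legend.position.inside. The sketch below picks the matching form based on the installed version. The names legendXY and legendTheme are hypothetical, and the version check is an assumption about how you might want to support both.

```r
library(ggplot2)

# Coordinates in the (0,0) to (1,1) system described above.
legendXY <- c(0.73, 0.70)

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer form: say the legend goes inside the panel, then give coordinates.
  legendTheme <- theme(
    legend.title = element_blank(),
    legend.position = "inside",
    legend.position.inside = legendXY
  )
} else {
  # Older form, as used in this example.
  legendTheme <- theme(
    legend.title = element_blank(),
    legend.position = legendXY
  )
}
# legendTheme can be added to a plot with +, like any other theme() layer.
```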
Our code and graph now look like:
ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
geom_bar(position="dodge",stat="identity") +
coord_flip() +
ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") +
scale_fill_grey(
breaks=c(1,0),
labels=c("Mini-lecture Screencast, n=196","Homework Screencast, n=209")
) +
theme_bw() +
scale_x_discrete(
limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"),
labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish")
) +
theme(
legend.title=element_blank(),
legend.position=c(.73,.7)
)
The final topic to consider is the text on the graph. This includes axis labels, axis text, title, font, size, etc. All of these aesthetic changes are made using arguments in the theme() layer.
The y-axis does not need to be labeled Reason since the categories are written out. The label is easily removed using the argument axis.title.y.
theme(
axis.title.y=element_blank() #removes y-axis label
)
We can change the font and text size on the entire graph using text.
theme(
axis.title.y=element_blank(), #removes y-axis label
text=element_text(family="serif",size=20) #changes font family and text size on entire graph
)
Finally, we can make changes to the title using plot.title.
theme(
axis.title.y=element_blank(), #removes y-axis label
text=element_text(family="serif",size=20), #changes font family and text size on entire graph
plot.title=element_text(face="bold",hjust=0) #makes the title bold and left-aligns it (hjust takes a single number)
)
So our final code and graph end up as:
ggplot(data=screencast, aes(x=Reason,y=Percentage,fill=factor(Type))) +
geom_bar(position="dodge",stat="identity") +
coord_flip() +
ggtitle("Strategies for Using Homework Solution and Mini-Lecture Screencasts") +
scale_fill_grey(
breaks=c(1,0),
labels=c("Mini-lecture Screencast, n=196","Homework Screencast, n=209")
) +
theme_bw() +
scale_x_discrete(
limits=c("Browsed around","Watched large chunks looking for information","Went to specific points to review","Re-watched certain segments based on my homework responses","Watched entire video from start to finish"),
labels=c("Browsed around","Watched large chunks looking for\n information","Went to specific points to review","Re-watched certain segments based on\n my homework responses","Watched entire video from start\n to finish")
) +
theme(
legend.title=element_blank(),
legend.position=c(.73,.7),
axis.title.y=element_blank(),
text=element_text(family="serif",size=20),
plot.title=element_text(face="bold",hjust=0)
)