EPsy 8252 Lab1: Making a line graph with ggplot2

Jaehyun Shin (Department of Educational Psychology, University of Minnesota)

This tutorial shows how to make a line graph in R that displays data from a published article. A line graph makes it easier to compare groups and to see patterns in the data. The short steps below show how to create the plot with ggplot2.

The data set for this tutorial was selected from Speece, D., Ritchey, K., Silverman, R., Schatschneider, C., Walker, C., & Andrusik, K. (2010). Identifying Children in Middle Childhood who are at risk for Reading Problems. School Psychology Review, 39(2), 258-276.

From the descriptive statistics table on page 267, the mean value of each criterion measure for the two groups (at-risk readers and not-at-risk readers) will be used for the comparison.

Step 1 - Creating the data set (in RStudio, you can also import an existing data set from your computer)

Your data set in Excel should look like the one below.

Criterion.Measures                  at.risk.type    Mean
CBM Maze                            not at risk     9.84
CBM Passage Reading Fluency         not at risk   154.57
CBM Word Identification Fluency     not at risk    83.62
Colorade Decoding                   not at risk    30.12
GMRT Reading Comprehension          not at risk   107.41
TOWRE Phonemic Decoding Efficiency  not at risk   114.70
TOWRE Sight Word Efficiency         not at risk   110.33
WJ-III Word Attack                  not at risk   108.85
WJ-III Word Identification          not at risk   107.34
CBM Maze                            at risk         6.60
CBM Passage Reading Fluency         at risk        98.07
CBM Word Identification Fluency     at risk        61.25
Colorade Decoding                   at risk        23.05
GMRT Reading Comprehension          at risk        87.92
TOWRE Phonemic Decoding Efficiency  at risk        96.67
TOWRE Sight Word Efficiency         at risk        97.19
WJ-III Word Attack                  at risk        98.57
WJ-III Word Identification          at risk        94.52

Now save this data set as a .csv file.
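If you would rather skip Excel, the same table can be built and saved directly in R. The following is a minimal sketch, not the method used for this tutorial: the file name CBM.csv and the choice to write it to the current working directory are assumptions, and the values are copied from the table above.

```r
# Criterion measures, in the order used in the table above
measures <- c(
  "CBM Maze",
  "CBM Passage Reading Fluency",
  "CBM Word Identification Fluency",
  "Colorade Decoding",
  "GMRT Reading Comprehension",
  "TOWRE Phonemic Decoding Efficiency",
  "TOWRE Sight Word Efficiency",
  "WJ-III Word Attack",
  "WJ-III Word Identification"
)

# One row per measure and group: 9 "not at risk" rows, then 9 "at risk" rows
CBM <- data.frame(
  Criterion.Measures = rep(measures, times = 2),
  at.risk.type = rep(c("not at risk", "at risk"), each = 9),
  Mean = c(
    9.84, 154.57, 83.62, 30.12, 107.41, 114.70, 110.33, 108.85, 107.34,
    6.60, 98.07, 61.25, 23.05, 87.92, 96.67, 97.19, 98.57, 94.52
  )
)

# Write to the working directory (assumed location); row.names = FALSE
# keeps R's row numbers out of the file
write.csv(CBM, "CBM.csv", row.names = FALSE)
```

Leaving out row.names = FALSE would add an extra column of row numbers to the file, which would then show up as an extra column when the file is read back in.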

Step 2 - Reading the data set

To import the data set into R, use read.csv(). For example, if the file is named CBM.csv, as in this tutorial, and sits in your home directory, you can write CBM <- read.csv("~/CBM.csv"). The call used for this tutorial was:

CBM <- read.csv("C:/Users/USER/Desktop/Spring 2013/EPsy 8252/HW_lab_1/CBM.csv")

Or, as noted above, you can load the .csv file by clicking "Import Dataset" at the top right of RStudio.

You can then check whether the data set was loaded correctly and inspect its structure with functions such as head(), tail(), and summary(), each called with the name of your data set.

For example,

summary(CBM)
##                           Criterion.Measures      at.risk.type
##  CBM Maze                          :2        at risk    :9    
##  CBM Passage Reading Fluency       :2        not at risk:9    
##  CBM Word Identification Fluency   :2                         
##  Colorade Decoding                 :2                         
##  GMRT Reading Comprehension        :2                         
##  TOWRE Phonemic Decoding Efficiency:2                         
##  (Other)                           :6                         
##       Mean      
##  Min.   :  6.6  
##  1st Qu.: 66.8  
##  Median : 96.9  
##  Mean   : 82.8  
##  3rd Qu.:107.4  
##  Max.   :154.6  
## 
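The level counts in the output above appear because the two text columns were read as factors. Since R 4.0.0, read.csv() keeps text columns as character vectors by default, so summary() reports only their length and class unless you ask for factors. The sketch below uses a small stand-in file (four rows taken from the table, written to a temporary path that is an assumption of this example) to show the other checks alongside summary().

```r
# Small stand-in file with two rows per group (values from the table above)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Criterion.Measures,at.risk.type,Mean",
  "CBM Maze,not at risk,9.84",
  "WJ-III Word Attack,not at risk,108.85",
  "CBM Maze,at risk,6.60",
  "WJ-III Word Attack,at risk,98.57"
), csv_path)

# stringsAsFactors = TRUE reproduces the pre-4.0 default, so summary()
# counts the levels of each text column
CBM_small <- read.csv(csv_path, stringsAsFactors = TRUE)

str(CBM_small)      # column types and first values
head(CBM_small, 2)  # first two rows
summary(CBM_small)  # level counts and numeric summary
```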

Step 3 - Making the line graph!

Now you can create the line graph using ggplot2! A line graph was chosen over other kinds of graphs because it makes it easy to see and compare the mean values of each criterion measure for the two groups (at risk and not at risk).

First, load the ggplot2 library:

library(ggplot2)

*Note: If the ggplot2 package is not installed, install it first.
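One way to handle the note above is to install the package only when it is missing. This is a sketch that assumes an internet connection and a configured CRAN mirror (RStudio normally sets one for you).

```r
# Install ggplot2 only when it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package for the current session
library(ggplot2)
```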

To make the line graph, we need to decide which variable goes on the x-axis, which goes on the y-axis, and which is used for grouping.

In this example, the x-axis shows the criterion measures, the y-axis shows the group means for each criterion measure, and the two groups are plotted as two separate lines.

The code below produces the line graph, which shows the difference in means between the at risk and not at risk groups for each criterion measure.

ggplot(CBM, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) + geom_line(aes(colour = at.risk.type))

[Figure: line graph of Mean by Criterion.Measures, with one coloured line per at.risk.type group]

In this line graph, every data point of the at risk group lies below the corresponding point of the not at risk group. The graph makes the differences in means between the two groups easy to see.
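The visual impression can also be checked numerically. The sketch below is an addition to the tutorial: it takes the means from the table in Step 1 and subtracts the at risk mean from the not at risk mean for each measure. The vector names are chosen for this example only.

```r
# Short labels for the nine criterion measures, in table order
measures <- c("CBM Maze", "CBM PRF", "CBM WIF", "Colorade Decoding",
              "GMRT RC", "TOWRE PDE", "TOWRE SWE",
              "WJ-III Word Attack", "WJ-III Word Identification")

# Group means copied from the table in Step 1
not_at_risk <- c(9.84, 154.57, 83.62, 30.12, 107.41,
                 114.70, 110.33, 108.85, 107.34)
at_risk     <- c(6.60, 98.07, 61.25, 23.05, 87.92,
                 96.67, 97.19, 98.57, 94.52)

# Difference in means for each measure (not at risk minus at risk)
gap <- setNames(not_at_risk - at_risk, measures)
print(round(gap, 2))

# TRUE when the not at risk mean is higher on every measure
all(gap > 0)
```

Because the measures are on different scales, these differences describe each measure separately and should not be compared across measures.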

One more step makes the graph clearer: marking each data point. You can do this with geom_point(), which also lets you change the size and shape of the points. For example, adding geom_point(pch = 18, size = 4) draws a black diamond at each data point for every criterion measure. The result is the complete graph below!

ggplot(CBM, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) + geom_line(aes(colour = at.risk.type)) + 
    geom_point(pch = 18, size = 4)

[Figure: the same line graph with a black point drawn at each data value]

In this short tutorial, you have learned how to make a line graph that reveals patterns or compares values (e.g., scores) between groups. Many more ggplot2 options can be applied to a graph of this kind.
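As one illustration of such options, the sketch below colours the points by group, adds readable axis and legend titles, and tilts the x-axis labels so that long measure names do not overlap. It is a suggestion rather than part of the original lab: the label text is made up for this example, and only two measures are included to keep the code short.

```r
library(ggplot2)

# Two measures only, to keep the sketch short (values from the table in Step 1)
CBM_small <- data.frame(
  Criterion.Measures = rep(c("CBM Maze", "WJ-III Word Attack"), times = 2),
  at.risk.type = rep(c("not at risk", "at risk"), each = 2),
  Mean = c(9.84, 108.85, 6.60, 98.57)
)

# Mapping colour in the top-level aes() applies it to lines and points alike
p <- ggplot(CBM_small,
            aes(x = Criterion.Measures, y = Mean,
                group = at.risk.type, colour = at.risk.type)) +
  geom_line() +
  geom_point(shape = 18, size = 4) +
  labs(x = "Criterion measure", y = "Mean score", colour = "Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # tilt long labels

print(p)
```

To apply the same options to the full data set, replace CBM_small with CBM.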

Thank you!