EPsy 8252 Lab1: Making a line graph with ggplot2

Jaehyun Shin (Department of Educational Psychology, University of Minnesota)

This tutorial shows how to make a line graph in R that represents a data set taken from a published article. A line graph makes it easier to compare groups and to see patterns in the data. The short steps below show how to create the plot with ggplot2.

The data set for this tutorial was selected from Speece, D., Ritchey, K., Silverman, R., Schatschneider, C., Walker, C., & Andrusik, K. (2010). Identifying Children in Middle Childhood who are at risk for Reading Problems. School Psychology Review, 39(2), 258-276.

The mean values of each criterion measure for the two groups (at risk readers and not at risk readers), taken from the descriptive statistics table on page 267, are used for the comparison.

Step 1 - Creating the data set (in RStudio, you can also import an existing data set from your computer)

First, the data set must be in .csv format so that it can be read into RStudio. (In Excel, you can save the data file as .csv.)
To compare the means of the at risk group with those of the not at risk group, the data were arranged in three columns in Excel. The first column, “Criterion measures”, lists the nine criterion measures once for each group, giving 18 rows in total. The second column gives the group type (at risk or not at risk), and the third column gives the mean of each criterion measure for that group.

Your data set in Excel should then look like this:

Criterion.Measures	at.risk.type	Mean
CBM Maze	not at risk	9.84
CBM Passage Reading Fluency	not at risk	154.57
CBM Word Identification Fluency	not at risk	83.62
Colorade Decoding	not at risk	30.12
GMRT Reading Comprehension	not at risk	107.41
TOWRE Phonemic Decoding Efficiency	not at risk	114.70
TOWRE Sight Word Efficiency	not at risk	110.33
WJ-III Word Attack	not at risk	108.85
WJ-III Word Identification	not at risk	107.34
CBM Maze	at risk	6.60
CBM Passage Reading Fluency	at risk	98.07
CBM Word Identification Fluency	at risk	61.25
Colorade Decoding	at risk	23.05
GMRT Reading Comprehension	at risk	87.92
TOWRE Phonemic Decoding Efficiency	at risk	96.67
TOWRE Sight Word Efficiency	at risk	97.19
WJ-III Word Attack	at risk	98.57
WJ-III Word Identification	at risk	94.52

Now, save this data set as a .csv file.
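
If you would rather skip Excel, the same data set can be built and saved from within R. The sketch below is one possible approach and not part of the original lab. It types in the values from the table above and writes them to a file named CBM.csv in the current working directory (the file location is an assumption; change it to suit your setup).

```r
# Nine criterion measures, in the same order as the table above
measures <- c("CBM Maze", "CBM Passage Reading Fluency",
              "CBM Word Identification Fluency", "Colorade Decoding",
              "GMRT Reading Comprehension",
              "TOWRE Phonemic Decoding Efficiency",
              "TOWRE Sight Word Efficiency", "WJ-III Word Attack",
              "WJ-III Word Identification")

# One row per measure per group: 9 measures x 2 groups = 18 rows
CBM <- data.frame(
  Criterion.Measures = rep(measures, times = 2),
  at.risk.type = rep(c("not at risk", "at risk"), each = length(measures)),
  Mean = c(9.84, 154.57, 83.62, 30.12, 107.41, 114.70, 110.33, 108.85, 107.34,
           6.60, 98.07, 61.25, 23.05, 87.92, 96.67, 97.19, 98.57, 94.52)
)

# Write to the working directory (assumed location); row.names = FALSE
# keeps R from adding an extra column of row numbers
write.csv(CBM, "CBM.csv", row.names = FALSE)
```

The resulting file has the same three columns as the Excel version and can be read back with read.csv().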

Step 2 - Reading the data set

To import the data set into RStudio, use the command read.csv(). For example, when the file is named “CBM.csv”, as in this tutorial, you can write the command as > CBM <- read.csv("~/CBM.csv")

CBM <- read.csv("C:/Users/USER/Desktop/Spring 2013/EPsy 8252/HW_lab_1/CBM.csv")

Or, as noted above, you can load the .csv data file by clicking “Import Dataset” at the top right of RStudio.

You can then check whether the data set was loaded correctly and inspect its structure with commands such as > head(data name), tail(data name), and summary(data name).

For example,

summary(CBM)

##                           Criterion.Measures      at.risk.type
##  CBM Maze                          :2        at risk    :9    
##  CBM Passage Reading Fluency       :2        not at risk:9    
##  CBM Word Identification Fluency   :2                         
##  Colorade Decoding                 :2                         
##  GMRT Reading Comprehension        :2                         
##  TOWRE Phonemic Decoding Efficiency:2                         
##  (Other)                           :6                         
##       Mean      
##  Min.   :  6.6  
##  1st Qu.: 66.8  
##  Median : 96.9  
##  Mean   : 82.8  
##  3rd Qu.:107.4  
##  Max.   :154.6  
##
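
The output above lists a count for each category because the two text columns were read as factors. Since R 4.0.0, read.csv() keeps text columns as character vectors by default, so summary() reports only their length and class. Here is a minimal sketch of how to get factor columns back. It is an illustration rather than part of the original lab, and it passes a two-row excerpt of the data through the text argument so that it runs without a file:

```r
# Two-row excerpt of the tutorial data, supplied as text instead of a file
csv_text <- "Criterion.Measures,at.risk.type,Mean
CBM Maze,not at risk,9.84
CBM Maze,at risk,6.60"

# stringsAsFactors = TRUE converts the text columns to factors on import
CBM_demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(CBM_demo)      # shows the type of each column
summary(CBM_demo)  # factor columns are summarised as counts per level
```

The same stringsAsFactors = TRUE argument can be added to the read.csv() call that reads the full CBM.csv file.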

Step 3 - Making the line graph!

Now you can create the line graph using ggplot2! A line graph was chosen over other kinds of graph because it makes it easy to see and compare the mean values of each criterion measure for the two groups (at risk and not at risk).

First, load the ggplot2 library as shown below:

library(ggplot2)

*Note: If the ggplot2 package is not installed, install it first.
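
One way to follow this note is to install ggplot2 only when it is missing. This is a small sketch, not the only way to do it. install.packages() downloads from CRAN, so it assumes an internet connection.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package for the current session
library(ggplot2)
```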

To make the line graph, we need to decide which variable goes on the x-axis, which goes on the y-axis, and which is used for grouping.

In this tutorial example, the x-axis shows the criterion measures, the y-axis shows the mean of each criterion measure, and the two groups are plotted as two separate lines.

The following command produces the line graph, which shows the relative differences in means between the at risk and not at risk groups for each criterion measure.

ggplot(CBM, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) + geom_line(aes(colour = at.risk.type))

[Figure: line graph of Mean by Criterion.Measures, with one coloured line per at.risk.type group]

In this line graph, you can see that every data point of the at risk group lies below the corresponding point of the not at risk group. The graph makes the differences in means between the two groups easy to see at a glance.

One more step makes the graph clearer: adding a marker at each data point. You can do this with geom_point(), which also lets you set the size and shape of the points. For example, adding > geom_point(pch = 18, size = 4) draws a black diamond at each data point for every criterion measure. The result below shows the complete graph!

ggplot(CBM, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) + geom_line(aes(colour = at.risk.type)) + 
    geom_point(pch = 18, size = 4)

[Figure: the same line graph with a black diamond marker at each data point]

In this short tutorial, you have learned how to make a line graph that lets you see patterns or compare values (e.g., scores) between groups. Many more options can be applied to this kind of graph.
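
As an illustration of such options, the sketch below adds axis and legend titles, colours the points by group, and rotates the x-axis labels so that long measure names do not overlap. It goes beyond the original lab. The label text and the three-measure excerpt named CBM_small are assumptions chosen to keep the example short and runnable on its own.

```r
library(ggplot2)

# Three-measure excerpt of the tutorial data, so the example runs on its own
CBM_small <- data.frame(
  Criterion.Measures = rep(c("CBM Maze", "GMRT Reading Comprehension",
                             "WJ-III Word Attack"), times = 2),
  at.risk.type = rep(c("not at risk", "at risk"), each = 3),
  Mean = c(9.84, 107.41, 108.85, 6.60, 87.92, 98.57)
)

# Mapping colour in the top-level aes() applies it to both lines and points
p <- ggplot(CBM_small, aes(x = Criterion.Measures, y = Mean,
                           group = at.risk.type, colour = at.risk.type)) +
  geom_line() +
  geom_point(shape = 18, size = 4) +
  # Hypothetical label text; replace with wording that suits your report
  labs(x = "Criterion measure", y = "Mean score", colour = "Group") +
  # Rotate the x-axis labels to avoid overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

To apply the same options to the full data set, replace CBM_small with CBM.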

Thank you!