Jaehyun Shin (Department of Educational Psychology, University of Minnesota)
This is a tutorial on how to make a line graph in R that depicts a data set from a published article. A line graph lets us compare groups and see patterns more easily. The following short steps show how to create the plot with ggplot2 in R.
The data set for this tutorial was taken from Speece, D., Ritchey, K., Silverman, R., Schatschneider, C., Walker, C., & Andrusik, K. (2010). Identifying children in middle childhood who are at risk for reading problems. School Psychology Review, 39(2), 258-276.
From the descriptive statistics table on page 267, the mean of each criterion measure for the two groups (at-risk readers and not-at-risk readers) will be used for comparison.
Step 1 - Creating the data set (in RStudio, you can also import an existing data set from your computer)
First, the data set should be saved in .csv format so that it can be read into RStudio. (In Excel, you can save a data file as .csv.)
To make a line graph with ggplot2 that compares the means of the at-risk group with those of the not-at-risk group, the data were arranged in Excel in three columns. The first column, "Criterion.Measures", lists the 9 criterion measures, each appearing once per group (18 rows in total). The second column, "at.risk.type", gives the group (at risk or not at risk), and the third column, "Mean", gives the mean of each criterion measure for that group.
Your data set in Excel should then look like this:
| Criterion.Measures | at.risk.type | Mean |
|---|---|---|
| CBM Maze | not at risk | 9.84 |
| CBM Passage Reading Fluency | not at risk | 154.57 |
| CBM Word Identification Fluency | not at risk | 83.62 |
| Colorade Decoding | not at risk | 30.12 |
| GMRT Reading Comprehension | not at risk | 107.41 |
| TOWRE Phonemic Decoding Efficiency | not at risk | 114.70 |
| TOWRE Sight Word Efficiency | not at risk | 110.33 |
| WJ-III Word Attack | not at risk | 108.85 |
| WJ-III Word Identification | not at risk | 107.34 |
| CBM Maze | at risk | 6.60 |
| CBM Passage Reading Fluency | at risk | 98.07 |
| CBM Word Identification Fluency | at risk | 61.25 |
| Colorade Decoding | at risk | 23.05 |
| GMRT Reading Comprehension | at risk | 87.92 |
| TOWRE Phonemic Decoding Efficiency | at risk | 96.67 |
| TOWRE Sight Word Efficiency | at risk | 97.19 |
| WJ-III Word Attack | at risk | 98.57 |
| WJ-III Word Identification | at risk | 94.52 |
Now, save this data set as a .csv file (this tutorial uses the name CBM.csv).
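If you prefer not to use Excel, the same table can be typed directly into R. The sketch below is one possible alternative, not part of the original workflow: it builds a data frame with the three columns from Step 1 and writes it to CBM.csv in the current working directory (the file name is an assumption chosen to match Step 2). The measure names and means are copied exactly from the table above.

```r
# Alternative to Excel (assumption: not the original workflow).
# The nine criterion measures, in the same order as the table above.
measures <- c("CBM Maze", "CBM Passage Reading Fluency",
              "CBM Word Identification Fluency", "Colorade Decoding",
              "GMRT Reading Comprehension", "TOWRE Phonemic Decoding Efficiency",
              "TOWRE Sight Word Efficiency", "WJ-III Word Attack",
              "WJ-III Word Identification")

CBM <- data.frame(
  # Each measure appears once per group: 9 x 2 = 18 rows
  Criterion.Measures = rep(measures, times = 2),
  at.risk.type = rep(c("not at risk", "at risk"), each = 9),
  Mean = c(9.84, 154.57, 83.62, 30.12, 107.41, 114.70, 110.33, 108.85, 107.34,
           6.60, 98.07, 61.25, 23.05, 87.92, 96.67, 97.19, 98.57, 94.52)
)

# row.names = FALSE keeps the file to exactly these three columns
write.csv(CBM, "CBM.csv", row.names = FALSE)
```

The file written this way can be read back with read.csv() exactly as described in Step 2.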
Step 2 - Reading the data set
To import the data set into RStudio, use the read.csv() function. For example, if the file is named CBM.csv, as in this tutorial, and is saved in your home directory, you can write > CBM <- read.csv("~/CBM.csv")
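# Use your own path to CBM.csv. stringsAsFactors = TRUE keeps text columns as factors, as in the summary() output below (the default changed to FALSE in R 4.0.0).
CBM <- read.csv("C:/Users/USER/Desktop/Spring 2013/EPsy 8252/HW_lab_1/CBM.csv", stringsAsFactors = TRUE)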
Alternatively, as noted above, you can load the .csv file by clicking "Import Dataset" in the Environment pane at the top right of RStudio.
Next, check whether the data set was loaded correctly and look at its structure, using functions such as head(), tail(), and summary() (e.g., > head(CBM)).
For example,
summary(CBM)
## Criterion.Measures at.risk.type
## CBM Maze :2 at risk :9
## CBM Passage Reading Fluency :2 not at risk:9
## CBM Word Identification Fluency :2
## Colorade Decoding :2
## GMRT Reading Comprehension :2
## TOWRE Phonemic Decoding Efficiency:2
## (Other) :6
## Mean
## Min. : 6.6
## 1st Qu.: 66.8
## Median : 96.9
## Mean : 82.8
## 3rd Qu.:107.4
## Max. :154.6
##
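To check the import more strictly than by eye, you could turn the expectations from Step 1 into a small function. check_cbm() below is a hypothetical helper, not a base R or ggplot2 function; it stops with an error unless the data have the three expected column names, 18 rows, a numeric Mean column, and 9 rows per risk group.

```r
# Hypothetical helper: stop with an error if the imported data do not match
# the layout from Step 1 (9 measures x 2 risk groups = 18 rows).
check_cbm <- function(df) {
  stopifnot(
    identical(names(df), c("Criterion.Measures", "at.risk.type", "Mean")),
    nrow(df) == 18,
    is.numeric(df$Mean),
    all(table(df$at.risk.type) == 9)  # 9 rows in each group
  )
  invisible(TRUE)
}
```

With the data loaded as in Step 2, check_cbm(CBM) returns TRUE invisibly when everything matches; str(CBM) is another quick way to see each column's type.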
Step 3 - Making the line graph!
Now you can create the line graph using ggplot2! A line graph was chosen over other types of graphs because it makes it easy to compare the mean of each criterion measure across the two groups (at risk and not at risk).
First, load the ggplot2 package:
library(ggplot2)
*Note: If ggplot2 is not yet installed, install it first with install.packages("ggplot2").
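The sketch below installs ggplot2 only when it is missing and then loads it, so the same script can be rerun without reinstalling the package each time. requireNamespace() checks whether a package is installed without attaching it.

```r
# Install ggplot2 only if it is not already installed, then attach it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```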
To make the line graph, we need to decide which variables go on the x-axis and y-axis, and which variable defines the groups.
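In this example, the x-axis shows the criterion measures, the y-axis shows the group means for each measure, and the two groups are drawn as two separate lines. Because the x variable is categorical, the group aesthetic is needed: without group = at.risk.type, ggplot2 treats each measure as its own group, and geom_line() has no points to connect.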
The following command produces a line graph showing the differences in means for each criterion measure between the at-risk and not-at-risk groups.
ggplot(CBM, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) + geom_line(aes(colour = at.risk.type))
In this graph, every point for the at-risk group lies below the corresponding point for the not-at-risk group, so the differences in means between the two groups are easy to see at a glance.
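The visual impression can also be confirmed numerically. mean_gaps() below is a hypothetical helper written for illustration; it assumes the three-column layout from Step 1, pairs the two groups by measure name with merge(), and subtracts the at-risk mean from the not-at-risk mean.

```r
# Hypothetical helper: per-measure difference (not at risk minus at risk).
mean_gaps <- function(df) {
  not_at_risk <- df[df$at.risk.type == "not at risk", c("Criterion.Measures", "Mean")]
  at_risk     <- df[df$at.risk.type == "at risk",     c("Criterion.Measures", "Mean")]
  # Match rows by measure name rather than by row position
  both <- merge(not_at_risk, at_risk, by = "Criterion.Measures",
                suffixes = c(".not.at.risk", ".at.risk"))
  both$Difference <- both$Mean.not.at.risk - both$Mean.at.risk
  both
}
```

For the data in this tutorial, all(mean_gaps(CBM)$Difference > 0) returns TRUE, matching what the graph shows.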
One more step makes the graph clearer: adding a marker at each data point. You can use geom_point() and adjust both the size and the shape of the points. For example, adding > geom_point(pch = 18, size = 4) places a black diamond (pch = 18 is a filled diamond) at each data point for every criterion measure. Applying this gives the complete graph:
# Same graph as above, plus a filled diamond (pch = 18) of size 4 at each mean
ggplot(CBM, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) +
  geom_line(aes(colour = at.risk.type)) +
  geom_point(pch = 18, size = 4)
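As a possible next step, the sketch below adds axis labels and a legend title, colours the points to match their lines, and tilts the x-axis labels so the nine long measure names do not overlap. The function name plot_cbm_means() and the label text are illustrative choices, not part of the original tutorial; shape = 18 is the same filled diamond as pch = 18.

```r
library(ggplot2)

# Hypothetical helper: the complete graph above with a few readability tweaks.
plot_cbm_means <- function(df) {
  ggplot(df, aes(x = Criterion.Measures, y = Mean, group = at.risk.type)) +
    geom_line(aes(colour = at.risk.type)) +
    # Points coloured by group (the original uses black points)
    geom_point(aes(colour = at.risk.type), shape = 18, size = 4) +
    labs(x = "Criterion measure", y = "Mean score", colour = "Group") +
    # Rotate measure names so they do not overlap on the x-axis
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
```

With CBM loaded, plot_cbm_means(CBM) draws the graph, and ggplot2's ggsave() can then write it to an image file.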
In this short tutorial, you have learned how to make a line graph that lets you see patterns and compare values (e.g., scores) between groups. Many more options are available for customizing this kind of graph.
Thank you!