This tutorial is about making run charts in trellis layout. If you are new to run charts, please read my run charts tutorial introducing the qicharts (Quality Improvement Charts) package first.
You will need R, and preferably RStudio, with the add-on packages tidyr and qicharts.
library(tidyr)
library(qicharts)
I have these data on the number of new cases of Clostridium difficile infections per 10,000 patient days at six hospitals.
| month | AH | BH | GE | HE | HH | NO |
|---|---|---|---|---|---|---|
| 2012-08-01 | 16.1 | 8.6 | 12.6 | 5.9 | 18.0 | 9.0 |
| 2012-09-01 | 28.4 | 27.6 | 7.7 | 7.0 | 12.3 | 7.2 |
| 2012-10-01 | 11.4 | 13.5 | 7.2 | 6.5 | 8.9 | 11.1 |
| 2012-11-01 | 21.9 | 10.8 | 6.9 | 12.0 | 11.4 | 10.6 |
| 2012-12-01 | 22.0 | 9.2 | 15.1 | 11.4 | 13.9 | 11.0 |
| 2013-01-01 | 22.3 | 31.6 | 13.0 | 14.0 | 16.5 | 20.9 |
| 2013-02-01 | 24.6 | 21.1 | 17.5 | 11.3 | 14.7 | 9.6 |
| 2013-03-01 | 38.4 | 18.5 | 19.9 | 11.8 | 13.1 | 9.0 |
| 2013-04-01 | 21.4 | 25.9 | 27.6 | 9.3 | 14.3 | 9.2 |
| 2013-05-01 | 24.1 | 20.7 | 12.8 | 10.3 | 13.8 | 7.6 |
| 2013-06-01 | 12.0 | 13.8 | 6.0 | 10.6 | 16.3 | 6.3 |
| 2013-07-01 | 32.9 | 18.9 | 6.1 | 13.4 | 16.5 | 4.6 |
| 2013-08-01 | 23.3 | 27.4 | 8.0 | 10.3 | 14.2 | 6.8 |
| 2013-09-01 | 30.0 | 14.5 | 7.7 | 9.0 | 12.9 | 5.9 |
| 2013-10-01 | 29.3 | 16.6 | 3.7 | 9.7 | 14.0 | 10.7 |
| 2013-11-01 | 28.3 | 32.1 | 3.5 | 11.7 | 15.2 | 5.5 |
| 2013-12-01 | 26.9 | 12.5 | 6.7 | 9.7 | 15.3 | 7.2 |
| 2014-01-01 | 18.9 | 16.6 | 8.5 | 11.2 | 14.4 | 12.3 |
| 2014-02-01 | 20.7 | 14.1 | 9.5 | 6.9 | 19.1 | 11.6 |
| 2014-03-01 | 25.2 | 14.4 | 17.4 | 10.7 | 20.6 | 14.2 |
| 2014-04-01 | 21.4 | 15.7 | 17.2 | 7.9 | 17.9 | 6.4 |
| 2014-05-01 | 17.4 | 17.4 | 5.1 | 8.8 | 13.9 | 7.7 |
| 2014-06-01 | 10.3 | 13.7 | 1.8 | 1.5 | 9.2 | 4.9 |
| 2014-07-01 | 18.0 | 14.9 | 0.0 | 1.8 | 17.2 | 1.1 |
Note how the months are presented in the only unambiguous date format in the world, year-month-day (ISO 8601). Always use this format for dates. It makes life so much easier because R and most other programming languages understand this format right out of the box. It also sorts correctly when used in file names, for example.
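To illustrate the sorting claim, here is a small sketch in base R. The date strings are made up for the example; it compares plain text sorting with true chronological sorting for year-month-day strings and for day-month-year strings.

```r
# Hypothetical example dates, written in two different formats
iso <- c("2013-01-01", "2012-08-01", "2012-12-01")   # year-month-day
dmy <- c("01-01-2013", "01-08-2012", "01-12-2012")   # day-month-year

# Chronological order, obtained by converting to Date before sorting
iso_chrono <- format(sort(as.Date(iso)), "%Y-%m-%d")
dmy_chrono <- format(sort(as.Date(dmy, format = "%d-%m-%Y")), "%d-%m-%Y")

# Does sorting the raw text give the chronological order?
iso_text_ok <- identical(sort(iso), iso_chrono)
dmy_text_ok <- identical(sort(dmy), dmy_chrono)

print(iso_text_ok)
print(dmy_text_ok)
```

Only the year-month-day strings keep their chronological order when sorted as text, which is why the format works well in file names.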
Now, select and copy the table and read it into the variable d like this (reading from 'clipboard' in this way works on Windows): d <- read.delim('clipboard')
Alternatively, save the table to a tab-delimited file and read it from R as shown here. Then, don’t forget to tell R that month is a date variable.
d <- read.delim('cdi_case.tab')
d$month <- as.Date(d$month)
head(d)
## month AH BH GE HE HH NO
## 1 2012-08-01 16.1 8.6 12.6 5.9 18.0 9.0
## 2 2012-09-01 28.4 27.6 7.7 7.0 12.3 7.2
## 3 2012-10-01 11.4 13.5 7.2 6.5 8.9 11.1
## 4 2012-11-01 21.9 10.8 6.9 12.0 11.4 10.6
## 5 2012-12-01 22.0 9.2 15.1 11.4 13.9 11.0
## 6 2013-01-01 22.3 31.6 13.0 14.0 16.5 20.9
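If neither the clipboard nor a file is convenient, one platform-independent alternative is to paste the rows into the script itself. The sketch below uses only the first three rows of the table and a hypothetical variable name, d_demo, so that it does not overwrite d.

```r
# First three rows of the table, pasted as whitespace-separated text
txt <- "month AH BH GE HE HH NO
2012-08-01 16.1 8.6 12.6 5.9 18.0 9.0
2012-09-01 28.4 27.6 7.7 7.0 12.3 7.2
2012-10-01 11.4 13.5 7.2 6.5 8.9 11.1"

# read.table() can read from a character string via the text argument
d_demo <- read.table(text = txt, header = TRUE)

# Tell R that month is a date variable, as in the tutorial
d_demo$month <- as.Date(d_demo$month)

str(d_demo)
```

This keeps the data and the code in one place, which is handy for small examples but gets unwieldy for larger tables.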
In order for R to perform its magic, we need to “melt” the data from wide to tall format. The melting is managed by the gather function from the tidyr package.
d <- gather(d, hosp, cdi, -month)
head(d)
## month hosp cdi
## 1 2012-08-01 AH 16.1
## 2 2012-09-01 AH 28.4
## 3 2012-10-01 AH 11.4
## 4 2012-11-01 AH 21.9
## 5 2012-12-01 AH 22.0
## 6 2013-01-01 AH 22.3
Data like these are all too often presented as grouped bar charts.
What is your interpretation of the CDI rate at hospitals NO and GE compared to the other hospitals? Impossible!
Now, let’s create our first trellis run chart using the trc() function.
trc(cdi ~ month | hosp,
data = d,
main = 'New CDI cases at six hospitals',
ylab = 'CDI per 10,000 patient days',
xlab = 'Month')
The trc() function takes as its first argument a so-called formula object. The formula is of the form y~x|g, indicating that plots of y versus x should be produced conditional on the variable g, that is, one panel for each value of g.
The dashed, red center lines on the graphs from hospitals NO and GE suggest that non-random variation is present. The presence of non-random variation combined with a (subjective) visual analysis of these charts tells us that the CDI rates are declining in these hospitals.
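To make the idea of non-random variation concrete, the following base R sketch runs two common run chart tests by hand on the GE series: an unusually long run of points on one side of the median, and unusually few crossings of the median. The thresholds used here, round(log2(n)) + 3 for the longest run and the lower 5% quantile of a binomial distribution for the crossings, are my assumption about the rules qicharts applies; check the package documentation before relying on them.

```r
# CDI rates for hospital GE, copied from the table above
ge <- c(12.6, 7.7, 7.2, 6.9, 15.1, 13.0, 17.5, 19.9, 27.6, 12.8, 6.0, 6.1,
        8.0, 7.7, 3.7, 3.5, 6.7, 8.5, 9.5, 17.4, 17.2, 5.1, 1.8, 0.0)

# Which side of the median is each point on? Points on the median are dropped.
side <- sign(ge - median(ge))
side <- side[side != 0]
n_useful <- length(side)

# Runs are stretches of consecutive points on the same side of the median
runs <- rle(side)
longest_run <- max(runs$lengths)
n_crossings <- length(runs$lengths) - 1

# Assumed thresholds (not taken from the tutorial itself)
longest_run_max <- round(log2(n_useful)) + 3
n_crossings_min <- qbinom(0.05, n_useful - 1, 0.5)

# Either test failing counts as a signal of non-random variation
signal <- longest_run > longest_run_max || n_crossings < n_crossings_min

print(c(longest_run = longest_run, n_crossings = n_crossings))
print(signal)
```

A signal only says that the variation is unlikely to be random. Whether it reflects a decline still has to be judged by looking at the chart, as described above.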
trc() has a number of useful options for tailoring the plot. Please study the help file, ?trc.
So, which is the better presentation of the CDI data – grouped bar charts or trellis run charts? Or put another way, which presentation helps you make good decisions?