Summary Table with dplyr

The R MASS package includes a data frame, anorexia, that holds “weight change data for young female anorexia patients.” I will use the dplyr package to manipulate this data frame in a few ways, concluding with a table summarizing the changes in weight for each treatment group.

Start by loading the packages and data:

library(MASS)
library(dplyr)
data(anorexia)
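Before manipulating anything, it can help to look at what was loaded. The following is a minimal sketch using only base R inspection functions. Loading MASS before dplyr, as above, means dplyr's select() masks the MASS function of the same name rather than the other way around.

```r
library(MASS)
library(dplyr)
data(anorexia)

# Dimensions: one row per patient, one column per variable
dim(anorexia)

# Column types: Treat is a factor, the two weight columns are numeric
str(anorexia)

# Number of patients in each treatment group
table(anorexia$Treat)
```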

We can use dplyr’s “data manipulation verbs” one at a time to make individual alterations to the data. For example, we could create new objects to hold subsets for each treatment group. The groups are labeled “Cont,” for the control group, “CBT,” for the group receiving cognitive behavioral treatment, and “FT,” for the group receiving family treatment.

anor1 <- filter(anorexia, Treat=="Cont")
anor2 <- filter(anorexia, Treat=="CBT")
anor3 <- filter(anorexia, Treat=="FT")
head(anor1, 3)

##   Treat Prewt Postwt
## 1  Cont  80.7   80.2
## 2  Cont  89.4   80.1
## 3  Cont  91.8   86.4

head(anor2, 3)

##   Treat Prewt Postwt
## 1   CBT  80.5   82.2
## 2   CBT  84.9   85.6
## 3   CBT  81.5   81.4

head(anor3, 3)

##   Treat Prewt Postwt
## 1    FT  83.8   95.2
## 2    FT  83.3   94.3
## 3    FT  86.0   91.5
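filter() also accepts more than one condition, and %in% matches several group labels at once. The sketch below is illustrative only: the object names treated and cbtUnder80 and the 80-unit cutoff are assumptions made for the example, not part of the original analysis.

```r
library(MASS)
library(dplyr)
data(anorexia)

# Keep both active-treatment groups in a single step
treated <- filter(anorexia, Treat %in% c("CBT", "FT"))

# Conditions separated by commas are combined with logical AND
cbtUnder80 <- filter(anorexia, Treat == "CBT", Prewt < 80)

nrow(treated)
head(cbtUnder80, 3)
```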

We can use the mutate() verb to create a new column displaying the changes in weight:

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)
head(ARX, 3)

##   Treat Prewt Postwt wtDelta
## 1  Cont  80.7   80.2    -0.5
## 2  Cont  89.4   80.1    -9.3
## 3  Cont  91.8   86.4    -5.4
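A single mutate() call can create several columns, and later columns may refer to earlier ones. As a hedged sketch, the example below adds a percentage change alongside the raw change; pctDelta and ARXpct are hypothetical names chosen for illustration.

```r
library(MASS)
library(dplyr)
data(anorexia)

# wtDelta is created first, then reused to compute pctDelta
ARXpct <- mutate(anorexia,
                 wtDelta  = Postwt - Prewt,
                 pctDelta = 100 * wtDelta / Prewt)

head(ARXpct, 3)
```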

We could also split the data into the rows showing an increase in weight and the rows showing a decrease (or no change), and store each subset in a new object:

ARXpos <- filter(ARX, wtDelta > 0)
ARXneg <- filter(ARX, wtDelta <= 0)
head(ARXpos, 3)

##   Treat Prewt Postwt wtDelta
## 1  Cont  74.0   86.3    12.3
## 2  Cont  75.1   86.7    11.6
## 3  Cont  78.4   84.6     6.2

head(ARXneg, 3)

##   Treat Prewt Postwt wtDelta
## 1  Cont  80.7   80.2    -0.5
## 2  Cont  89.4   80.1    -9.3
## 3  Cont  91.8   86.4    -5.4
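If the goal is only to see how many patients gained weight in each group, separate objects are not strictly needed. One possible sketch uses dplyr's count() on a logical flag; the names gained and gainCounts are assumptions introduced for this example.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)

# Flag each row as a gain (TRUE) or not (FALSE), then tally by group
gainCounts <- ARX %>%
    mutate(gained = wtDelta > 0) %>%
    count(Treat, gained)

gainCounts
```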

Here I’ll create a grouped data frame using the group_by() function. It won’t look any different, but the data is ready for some other summary operations:

ARXgroups <- group_by(ARX, Treat)
head(ARXgroups, 3)

## Source: local data frame [3 x 4]
## Groups: Treat [1]
## 
##    Treat Prewt Postwt wtDelta
##   (fctr) (dbl)  (dbl)   (dbl)
## 1   Cont  80.7   80.2    -0.5
## 2   Cont  89.4   80.1    -9.3
## 3   Cont  91.8   86.4    -5.4
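Although the grouped data frame prints almost the same as before, the grouping can be inspected directly. This sketch assumes a reasonably recent dplyr (group_vars() is not available in very old releases) and uses ARXflat as a hypothetical name for the ungrouped copy.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)
ARXgroups <- group_by(ARX, Treat)

group_vars(ARXgroups)   # name(s) of the grouping column(s)
n_groups(ARXgroups)     # how many groups there are
group_size(ARXgroups)   # number of rows in each group

# ungroup() removes the grouping again without touching the data
ARXflat <- ungroup(ARXgroups)
is_grouped_df(ARXflat)
```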

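For example, now I can sort within the subsets without having to break up the data frame (as I might do using the split() function in base R). Recent versions of dplyr ignore the grouping when sorting unless arrange() is given .by_group = TRUE; older versions sorted by the grouping variable automatically:

ARXsorted <- arrange(ARXgroups, wtDelta, .by_group = TRUE)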
head(ARXsorted, 3)

## Source: local data frame [3 x 4]
## Groups: Treat [1]
## 
##    Treat Prewt Postwt wtDelta
##   (fctr) (dbl)  (dbl)   (dbl)
## 1    CBT  80.4   71.3    -9.1
## 2    CBT  81.0   73.4    -7.6
## 3    CBT  76.5   72.5    -4.0
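How arrange() treats a grouped data frame depends on the dplyr version, so the following sketch compares the two behaviours side by side. It assumes a dplyr release that supports the .by_group argument; globalSort and withinSort are hypothetical names used only here.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARXgroups <- anorexia %>%
    mutate(wtDelta = Postwt - Prewt) %>%
    group_by(Treat)

# Without .by_group, rows are sorted by wtDelta across the whole table
globalSort <- arrange(ARXgroups, wtDelta)

# With .by_group = TRUE, rows are sorted by Treat first, then by wtDelta
withinSort <- arrange(ARXgroups, wtDelta, .by_group = TRUE)

head(globalSort, 3)
head(withinSort, 3)
```

The second form reproduces the group-by-group ordering shown in the output above.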

The “piping” operator, which dplyr borrows from the magrittr package, lets us perform the last two operations in one command. Think of the pipe (“%>%”) as being equivalent to saying “then,” as in command1, THEN command2, THEN command3, etc.:

ARX %>%
    group_by(Treat) %>%
    arrange(wtDelta, .by_group = TRUE)

## Source: local data frame [72 x 4]
## Groups: Treat [3]
## 
##     Treat Prewt Postwt wtDelta
##    (fctr) (dbl)  (dbl)   (dbl)
## 1     CBT  80.4   71.3    -9.1
## 2     CBT  81.0   73.4    -7.6
## 3     CBT  76.5   72.5    -4.0
## 4     CBT  86.4   82.7    -3.7
## 5     CBT  79.9   76.4    -3.5
## 6     CBT  83.0   81.6    -1.4
## 7     CBT  76.5   75.7    -0.8
## 8     CBT  87.4   86.7    -0.7
## 9     CBT  82.6   81.9    -0.7
## 10    CBT  84.2   83.9    -0.3
## ..    ...   ...    ...     ...
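To make the “then” reading concrete, here is a small sketch showing that a piped chain is just another way of writing nested function calls. The names nested and piped are illustrative.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)

# Nested form: read from the inside out
nested <- arrange(group_by(ARX, Treat), wtDelta)

# Piped form: read from top to bottom
piped <- ARX %>%
    group_by(Treat) %>%
    arrange(wtDelta)

# Both forms should produce the same result
identical(nested, piped)
```

The pipe passes the left-hand side as the first argument of the function on the right, which is why each verb can omit the data argument.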

I’ll store that last set of manipulations in an object:

ARXwtgrp <- ARX %>%
    group_by(Treat) %>%
    arrange(wtDelta, .by_group = TRUE)

Then I will apply the summarize() verb to that object. Note that you can use the “summarise” spelling as well; dplyr accepts either.

summarize(ARXwtgrp,
          count = n(),
          avgWtChange = mean(wtDelta),
          WtChangeStDev = sd(wtDelta))

## Source: local data frame [3 x 4]
## 
##    Treat count avgWtChange WtChangeStDev
##   (fctr) (int)       (dbl)         (dbl)
## 1    CBT    29    3.006897      7.308504
## 2   Cont    26   -0.450000      7.988705
## 3     FT    17    7.264706      7.157421
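summarize() is not limited to means and standard deviations; any function that reduces a vector to a single value can be used. The sketch below adds the median and range per group. The column names and the object extraStats are assumptions made for the example.

```r
library(MASS)
library(dplyr)
data(anorexia)

extraStats <- anorexia %>%
    mutate(wtDelta = Postwt - Prewt) %>%
    group_by(Treat) %>%
    summarize(count = n(),
              medianWtChange = median(wtDelta),  # middle value per group
              minWtChange = min(wtDelta),        # largest loss per group
              maxWtChange = max(wtDelta))        # largest gain per group

extraStats
```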

I will go ahead and store that last set of commands in an object:

ARXwtdeltaTable <- ARXwtgrp %>%
    summarize(count = n(),
              avgWtChange = mean(wtDelta),
              WtChangeStDev = sd(wtDelta))

Here is the full path to the final result we saw above, moving from the original data frame to the summary table in one statement (the summary columns get slightly different names here, but the values are the same). The nice thing about dplyr is that it offers a fairly seamless and intuitive way of defining that path without making you create a bunch of intermediate objects. I began by storing a lot of data in new objects one “verb” at a time, but that was only for illustration. It’s also fine to forgo creating any new objects at all if the only thing you want to do is take an exploratory look at some different subsets in the console.

ARXWtTab <- anorexia %>%
    group_by(Treat) %>%
    mutate(wtDelta = Postwt - Prewt) %>%
    summarize(count = n(),
              avgWtChng = mean(wtDelta, na.rm=TRUE),
              StDevWtChng = sd(wtDelta, na.rm=TRUE))
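As a variation, the mutate() step can be folded into summarize() by computing the difference inline. This is only a sketch of an alternative, not a replacement for the statement above; directTab is a hypothetical name. The na.rm arguments are left out here because the earlier summary, computed without them, returned no missing values.

```r
library(MASS)
library(dplyr)
data(anorexia)

# Compute the weight change inside summarize() instead of a separate mutate()
directTab <- anorexia %>%
    group_by(Treat) %>%
    summarize(count = n(),
              avgWtChng = mean(Postwt - Prewt),
              StDevWtChng = sd(Postwt - Prewt))

directTab
```

Keeping the mutate() step is still useful whenever the per-row change is needed later, for example for plotting.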

Now for a rudimentary plot of the data. This is easiest if I use the “ARX” object created earlier. Because Treat is a factor, plot() draws a box plot of the weight changes for each group.

plot(ARX$Treat, ARX$wtDelta)
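For a slightly more readable version of the same figure, one option is to call base R's boxplot() with a formula and explicit labels. This is a sketch under the assumption that base graphics are sufficient; the titles and the dashed zero line are illustrative choices, and bp is a hypothetical name for the returned statistics.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)

# One box per treatment group; boxplot() also returns the plotted statistics
bp <- boxplot(wtDelta ~ Treat, data = ARX,
              xlab = "Treatment group",
              ylab = "Weight change (Postwt - Prewt)",
              main = "Weight change by treatment")

# Dashed reference line at zero separates gains from losses
abline(h = 0, lty = 2)

# Third row of bp$stats holds the median of each group
bp$stats[3, ]
```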

This quick exploratory glance suggests that the “family treatment” group tended to gain noticeably more weight than the control group, whereas the “cognitive behavioral treatment” group shows no such clear difference. Needless to say, this is only a glance, not a rigorous analysis; “noticeably” here is a visual impression, not the result of a significance test. The dplyr functions have allowed us to arrive quickly at this snapshot view so that we can go on to ask more detailed questions of the data if we want to.