The R MASS package includes a data frame, anorexia, containing “weight change data for young female anorexia patients.” I will use the dplyr package to manipulate this data frame in a few ways, concluding with a table summarizing the changes in weight for each treatment group.
Start by loading the packages and data:
library(MASS)
library(dplyr)
data(anorexia)
We can use dplyr’s “data manipulation verbs” one at a time to make individual alterations to the data. For example, we could create new objects to hold subsets for each treatment group. The groups are labeled “Cont,” for the control group, “CBT,” for the group receiving cognitive behavioral treatment, and “FT,” for the group receiving family treatment.
anor1 <- filter(anorexia, Treat=="Cont")
anor2 <- filter(anorexia, Treat=="CBT")
anor3 <- filter(anorexia, Treat=="FT")
head(anor1, 3)
## Treat Prewt Postwt
## 1 Cont 80.7 80.2
## 2 Cont 89.4 80.1
## 3 Cont 91.8 86.4
head(anor2, 3)
## Treat Prewt Postwt
## 1 CBT 80.5 82.2
## 2 CBT 84.9 85.6
## 3 CBT 81.5 81.4
head(anor3, 3)
## Treat Prewt Postwt
## 1 FT 83.8 95.2
## 2 FT 83.3 94.3
## 3 FT 86.0 91.5
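filter() also accepts several conditions at once. The following is a minimal sketch; the object names ft_gainers and treated are my own and are not used elsewhere in this post.

```r
library(MASS)
library(dplyr)
data(anorexia)

# Comma-separated conditions are combined with AND:
# family-treatment patients whose weight went up.
ft_gainers <- filter(anorexia, Treat == "FT", Postwt > Prewt)

# %in% keeps rows matching any of several labels:
# everyone except the control group.
treated <- filter(anorexia, Treat %in% c("CBT", "FT"))

nrow(ft_gainers)
nrow(treated)
```

This avoids creating one object per group when all you need is a single combined subset.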
We can use the mutate() verb to create a new column displaying the changes in weight:
ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)
head(ARX, 3)
## Treat Prewt Postwt wtDelta
## 1 Cont 80.7 80.2 -0.5
## 2 Cont 89.4 80.1 -9.3
## 3 Cont 91.8 86.4 -5.4
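mutate() can create more than one column in a single call, and a later column may refer to one defined earlier in the same call. As a small sketch (the pctDelta column and the ARXpct object are hypothetical additions, not part of the analysis below):

```r
library(MASS)
library(dplyr)
data(anorexia)

ARXpct <- mutate(anorexia,
                 wtDelta  = Postwt - Prewt,         # absolute change
                 pctDelta = 100 * wtDelta / Prewt)  # change as a percentage of starting weight
head(ARXpct, 3)
```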
We could create subsets for the data showing either an increase in weight or a decrease (or no change), and store those in new objects:
ARXpos <- filter(ARX, wtDelta > 0)
ARXneg <- filter(ARX, wtDelta <= 0)
head(ARXpos, 3)
## Treat Prewt Postwt wtDelta
## 1 Cont 74.0 86.3 12.3
## 2 Cont 75.1 86.7 11.6
## 3 Cont 78.4 84.6 6.2
head(ARXneg, 3)
## Treat Prewt Postwt wtDelta
## 1 Cont 80.7 80.2 -0.5
## 2 Cont 89.4 80.1 -9.3
## 3 Cont 91.8 86.4 -5.4
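If the goal is only to see how many patients fall into each subset, dplyr's count() can tabulate them without storing the subsets. This is a sketch under my own naming; gained and gainTable are hypothetical names.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)

# One row per combination of treatment group and gained/not gained;
# the count is returned in a column named n.
gainTable <- ARX %>% count(Treat, gained = wtDelta > 0)
gainTable
```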
Here I’ll create a grouped data frame using the group_by() function. It won’t look any different, but the data is ready for some other summary operations:
ARXgroups <- group_by(ARX, Treat)
head(ARXgroups, 3)
## Source: local data frame [3 x 4]
## Groups: Treat [1]
##
## Treat Prewt Postwt wtDelta
## (fctr) (dbl) (dbl) (dbl)
## 1 Cont 80.7 80.2 -0.5
## 2 Cont 89.4 80.1 -9.3
## 3 Cont 91.8 86.4 -5.4
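Since the grouped data frame prints almost identically to the original, it can help to inspect the grouping directly. A brief sketch using helper functions exported by recent dplyr versions:

```r
library(MASS)
library(dplyr)
data(anorexia)

ARXgroups <- anorexia %>%
  mutate(wtDelta = Postwt - Prewt) %>%
  group_by(Treat)

is_grouped_df(ARXgroups)  # is there any grouping at all?
group_vars(ARXgroups)     # which column(s) define the groups
n_groups(ARXgroups)       # how many groups there are

# ungroup() removes the grouping when it is no longer wanted.
ARXflat <- ungroup(ARXgroups)
```

The rows themselves are untouched; the grouping is metadata that later verbs can pick up.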
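For example, now I can sort within the subsets without having to break up the data frame (as I might do with the split() function in base R). Note that in current versions of dplyr, arrange() ignores grouping by default, so the .by_group = TRUE argument is needed to sort within each group:
ARXsorted <- arrange(ARXgroups, wtDelta, .by_group = TRUE)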
head(ARXsorted, 3)
## Source: local data frame [3 x 4]
## Groups: Treat [1]
##
## Treat Prewt Postwt wtDelta
## (fctr) (dbl) (dbl) (dbl)
## 1 CBT 80.4 71.3 -9.1
## 2 CBT 81.0 73.4 -7.6
## 3 CBT 76.5 72.5 -4.0
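To list the largest gains first within each group, wrap the sort column in desc(). This sketch assumes a dplyr version recent enough to support the .by_group argument of arrange(); the name ARXtop is my own.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARXtop <- anorexia %>%
  mutate(wtDelta = Postwt - Prewt) %>%
  group_by(Treat) %>%
  # .by_group = TRUE sorts by Treat first, then by wtDelta within each group
  arrange(desc(wtDelta), .by_group = TRUE)

head(ARXtop, 3)
```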
The “piping” operator, which dplyr borrows from the magrittr package, allows us to do the last two operations in one command. Think of the pipe (“%>%”) as saying “then,” as in command1, THEN command2, THEN command3, etc.:
ARX %>%
  group_by(Treat) %>%
  arrange(wtDelta, .by_group = TRUE)  # .by_group sorts within each Treat group in current dplyr
## Source: local data frame [72 x 4]
## Groups: Treat [3]
##
## Treat Prewt Postwt wtDelta
## (fctr) (dbl) (dbl) (dbl)
## 1 CBT 80.4 71.3 -9.1
## 2 CBT 81.0 73.4 -7.6
## 3 CBT 76.5 72.5 -4.0
## 4 CBT 86.4 82.7 -3.7
## 5 CBT 79.9 76.4 -3.5
## 6 CBT 83.0 81.6 -1.4
## 7 CBT 76.5 75.7 -0.8
## 8 CBT 87.4 86.7 -0.7
## 9 CBT 82.6 81.9 -0.7
## 10 CBT 84.2 83.9 -0.3
## .. ... ... ... ...
I’ll store that last set of manipulations in an object:
ARXwtgrp <- ARX %>%
  group_by(Treat) %>%
  arrange(wtDelta, .by_group = TRUE)  # .by_group sorts within each Treat group in current dplyr
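To make the “then” reading concrete, here is the same kind of pipeline written both ways. This is only a sketch: the names piped and nested are hypothetical, and .by_group = TRUE assumes a recent dplyr version.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)

# Piped form: read top to bottom.
piped <- ARX %>%
  group_by(Treat) %>%
  arrange(wtDelta, .by_group = TRUE)

# Nested form: the same calls, read from the inside out.
nested <- arrange(group_by(ARX, Treat), wtDelta, .by_group = TRUE)

# Compare the two results as plain data frames.
all.equal(as.data.frame(piped), as.data.frame(nested))
```

The pipe simply passes the left-hand result in as the first argument of the next call, which is why the two forms line up.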
Then I will use the summarize() verb upon that object. Note that you can use the “summarise” spelling as well; dplyr will tolerate either spelling.
summarize(ARXwtgrp,
          count = n(),
          avgWtChange = mean(wtDelta),
          WtChangeStDev = sd(wtDelta))
## Source: local data frame [3 x 4]
##
## Treat count avgWtChange WtChangeStDev
## (fctr) (int) (dbl) (dbl)
## 1 CBT 29 3.006897 7.308504
## 2 Cont 26 -0.450000 7.988705
## 3 FT 17 7.264706 7.157421
I will go ahead and store that last set of commands in an object:
ARXwtdeltaTable <- ARXwtgrp %>%
  summarize(count = n(),
            avgWtChange = mean(wtDelta),
            WtChangeStDev = sd(wtDelta))
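summarize() is not limited to means and standard deviations; any function that reduces a vector to a single value will work. A sketch with a few more statistics (the object name ARXextra and the column names are my own):

```r
library(MASS)
library(dplyr)
data(anorexia)

ARXextra <- anorexia %>%
  mutate(wtDelta = Postwt - Prewt) %>%
  group_by(Treat) %>%
  summarize(count       = n(),
            medWtChange = median(wtDelta),  # less sensitive to outliers than the mean
            minWtChange = min(wtDelta),
            maxWtChange = max(wtDelta))
ARXextra
```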
Here is the full path to the final result we saw above, moving from the original data frame to the summary table in one statement. The nice thing about dplyr is that it lets you define that path fairly seamlessly and intuitively, without making you create a bunch of intermediate objects. I began by storing a lot of data in new objects one “verb” at a time, but that was only for illustration. It’s also fine to forgo creating any new objects at all if all you want is an exploratory look at some different subsets in the console.
ARXWtTab <- anorexia %>%
  group_by(Treat) %>%
  mutate(wtDelta = Postwt - Prewt) %>%
  summarize(count = n(),
            avgWtChng = mean(wtDelta, na.rm = TRUE),
            StDevWtChng = sd(wtDelta, na.rm = TRUE))
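As a sanity check, the group means can be recomputed in base R and compared with the dplyr table. This is just a sketch; baseMeans and baseSDs are hypothetical names.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARXWtTab <- anorexia %>%
  group_by(Treat) %>%
  mutate(wtDelta = Postwt - Prewt) %>%
  summarize(count = n(),
            avgWtChng = mean(wtDelta, na.rm = TRUE),
            StDevWtChng = sd(wtDelta, na.rm = TRUE))

# Base R equivalents: apply a function to the weight change within each level of Treat.
delta     <- anorexia$Postwt - anorexia$Prewt
baseMeans <- tapply(delta, anorexia$Treat, mean)
baseSDs   <- tapply(delta, anorexia$Treat, sd)

# Both approaches order the groups by the factor levels of Treat.
all.equal(ARXWtTab$avgWtChng, as.numeric(baseMeans))
all.equal(ARXWtTab$StDevWtChng, as.numeric(baseSDs))
```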
Now for a rudimentary plot of the data. This is easiest if I use the “ARX” object created earlier.
plot(ARX$Treat, ARX$wtDelta)
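Because Treat is a factor, plot() draws one box plot per group. A slightly more labeled version is sketched below. The axis labels and title are my own wording, and the lbs unit is as I read it on the MASS help page for anorexia.

```r
library(MASS)
library(dplyr)
data(anorexia)

ARX <- mutate(anorexia, wtDelta = Postwt - Prewt)

# The formula interface draws one box per treatment group.
bp <- boxplot(wtDelta ~ Treat, data = ARX,
              xlab = "Treatment group",
              ylab = "Weight change (Postwt - Prewt, lbs)",
              main = "Weight change by treatment group")

# Dashed reference line at zero: boxes above it indicate weight gain.
abline(h = 0, lty = 2)
```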
This quick exploratory glance suggests that the “family treatment” group gained noticeably more weight than the control group, whereas the difference between the “cognitive behavioral treatment” group and the control group is much less clear. Needless to say, this is only a glance, not a rigorous analysis, and no significance test has been run. The dplyr functions have allowed us to quickly arrive at this snapshot view so that we can proceed to ask more detailed questions of the data if we want to.