The reshape2 package is a bit confusing to learn at first, but once you get it, it can greatly facilitate extracting values such as means from many columns quickly and easily. So far, my example is pretty basic. I may add more to this later as I learn more.

Here is some sample data.

library(reshape2)
mydata <- data.frame(id=c(1,1,2,2),time=c(1,2,1,2),x1=c(5,3,6,2),x2=c(6,5,1,4))

Suppose that we want the means of each of the “x” columns (x1, x2) over just the id. Without reshape, it would look like this:

aggregate(cbind(x1, x2) ~ id, mydata, mean)
##   id x1  x2
## 1  1  4 5.5
## 2  2  4 2.5

This works, but we must specifiy the name of every “x” column. Suppose there are hundreds, and we want to calculate all of them automatically? This is where reshape2 helps. First, we “melt” the data:

melt_mydata <- melt(mydata, id=c("id", "time"))
melt_mydata
##   id time variable value
## 1  1    1       x1     5
## 2  1    2       x1     3
## 3  2    1       x1     6
## 4  2    2       x1     2
## 5  1    1       x2     6
## 6  1    2       x2     5
## 7  2    1       x2     1
## 8  2    2       x2     4

Here, we specify “id” fields of id and time. All other columns are “melted” into two new columns, variable, which contains the name of a melted column, and value, which contains the value for that column from the original data. So you can see that, in row 1 of the original data, id=1, time=1, x1=5 and x2=6. There are two rows in the melted data that correspond to this row, rows 1 and 5. There is one row for each melted column. If we had four “x” columns instead of two, there would be four rows in the melted data corresponding to each row in the original data.

Now that the data is melted, we can easily get the means for each melted column, regardless of how many there are.

means_by_id <- dcast(melt_mydata, id ~ variable, mean)
means_by_id
##   id x1  x2
## 1  1  4 5.5
## 2  2  4 2.5

The old reshape package had a cast method. Note that reshape2 has an acast method, which returns a vector/matrix/array, and dcast, which returns a data.matrix.

Note that we do not need to specify the names of the melted columns. If we wanted to restrict them, we could. Maybe we want means for all of the “x” columns except x2.

means_by_id <- dcast(melt_mydata[melt_mydata$variable!=c("x2"),], id ~ variable, mean)
means_by_id
##   id x1
## 1  1  4
## 2  2  4

But what if we wanted means by the time column, instead of id? No problem.

means_by_time <- dcast(melt_mydata, time ~ variable, mean)
means_by_time
##   time  x1  x2
## 1    1 5.5 3.5
## 2    2 2.5 4.5

Now just to demonstrate a bit more of the flexibility, let’s expand the data set.

mydata <- data.frame(id=c(1,1,2,2),time=c(1,2,1,2),x1=c(3,4,7,8),x2=c(4,1,0,3),x3=c(7,1,9,11),x4=c(3,2,7,9))
mydata
##   id time x1 x2 x3 x4
## 1  1    1  3  4  7  3
## 2  1    2  4  1  1  2
## 3  2    1  7  0  9  7
## 4  2    2  8  3 11  9

Now we are up to four “x” columns. Let’s try the same approach as before.

melt_mydata <- melt(mydata, id=c("id", "time"))
means_by_id <- dcast(melt_mydata, id ~ variable, mean)
means_by_id
##   id  x1  x2 x3  x4
## 1  1 3.5 2.5  4 2.5
## 2  2 7.5 1.5 10 8.0

If you consider further that you might load a data set with a hundred or more columns, and you need means for all but a couple of “id” columns, you can see the power of reshape2.

One more small point. Suppose we want the means of each combination of the id fields?

means_by_id_time <- dcast(melt_mydata, id + time ~ variable, mean)
means_by_id_time
##   id time x1 x2 x3 x4
## 1  1    1  3  4  7  3
## 2  1    2  4  1  1  2
## 3  2    1  7  0  9  7
## 4  2    2  8  3 11  9

Alternatively, if you want means for id and time separately, but at the same time, you can use acast to output to a multidimensional array (data.frame is of course limited to two dimensions).

means_by_id_time <- acast(melt_mydata, id ~ time ~ variable, mean)
means_by_id_time
## , , x1
## 
##   1 2
## 1 3 4
## 2 7 8
## 
## , , x2
## 
##   1 2
## 1 4 1
## 2 0 3
## 
## , , x3
## 
##   1  2
## 1 7  1
## 2 9 11
## 
## , , x4
## 
##   1 2
## 1 3 2
## 2 7 9