Reshape base R function tutorial

Goal

The goal of this note is to explain how to use the function reshape to transform (reshape) a data frame between the long and wide formats.

There are lots of alternatives to this function that are probably faster and, for sure, easy to apply. I have devoted time to this function because I love the R base functions.

As just said, reshape function changes the way a data frame is shown. Let us assume the data frame has N rows and M columns, so M * N = L cells. In the long version of the data frame, the number of columns is the minimum possible when each row represents one and only one observation and one attribute. I understand as the attribute, the value given to observation. So, in the long format, there is only one value for each observation. All other formats are wide formats. This definition of the long format is quite criptic. Hopefully, it can be better understood through examples.

Datasets

Let us shown some examples of datasets and agree if they are in the long or the wide format

Iris dataset

As a first example, let us take the iris dataset

data("iris")

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Each row in this dataset gives the measure of sepal length and width and petal length and width for 50 flowers from each of the 3 species of iris (type help(iris)). Each row represents an observation (a flower) of 4 variables (Attributes). So, this representation of the data cannot be considered in the long format because it has more than one attribute. Therefore, it is in wide format.

HairEyeColor dataset

Let us use another R dataset: HirEyeColor

data("HairEyeColor")

head(HairEyeColor)

## [1] 32 53 10  3 11 50

str(HairEyeColor)

##  'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
##   ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
##   ..$ Sex : chr [1:2] "Male" "Female"

print(HairEyeColor)

## , , Sex = Male
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
## 
## , , Sex = Female
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8

As we can see this a 3-d matrix with Hair color, eye color and Sex where each dimension is a factor and the value of each cell represents the total number of students out of the total sample size 592 with a given combination of Hair color, eye color and sex. If we transform the matrix into a data frame, each row represents one type of observation and only one with one attibute: the count of students with a given type of Hair, Eye and sex. Therefore, this dataset is in long format.

hec <- as.data.frame(HairEyeColor)
head(hec)

##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50

Jonhson & Johnson dataset

The third example is the Jonhson & Johnson quarterly earnings dataset

data("JohnsonJohnson")

head(JohnsonJohnson)

## [1] 0.71 0.63 0.85 0.44 0.61 0.69

str(JohnsonJohnson)

##  Time-Series [1:84] from 1960 to 1981: 0.71 0.63 0.85 0.44 0.61 0.69 0.92 0.55 0.72 0.77 ...

In this example, each row represents the quarterly earnings in dollars of Johnson & Johnson. So, first row is Q1 1960, second row Q2 1960 and in this way sequently. As each represents only one value, the format of this time series once represented as table will be in long format. Let us build this table.

JJ <- data.frame(year = rep(1960:1980, each = 4), quarter = paste0("Qtr", rep(1:4, 21)), earnings = as.vector(JohnsonJohnson))

head(JJ)

##   year quarter earnings
## 1 1960    Qtr1     0.71
## 2 1960    Qtr2     0.63
## 3 1960    Qtr3     0.85
## 4 1960    Qtr4     0.44
## 5 1961    Qtr1     0.61
## 6 1961    Qtr2     0.69

Indometh Dataset

A fourth example is the Indometh dataset, used in the examples of the reshape help function

head(Indometh)

##   Subject time conc
## 1       1 0.25 1.50
## 2       1 0.50 0.94
## 3       1 0.75 0.78
## 4       1 1.00 0.48
## 5       1 1.25 0.37
## 6       1 2.00 0.19

It is in the Long format as for every Subject and time, only one value (attribute) of the concentration is given.

Reshape function

The reshape function applies to data frames. It has the following parameters:

data
varying
v.names
timevar
idvar
ids
times
drop
direction
new.row.names
sep
split

Before we explain the parameters, it is important to identify 3 types of columns in a long format data frame:

Idvar column/s: One or more columns that identify in an unique way each row once the data frame is in wide format. This column (or columns) are passed to the idvar parameter
Timevar column: the column (only one) that will be used to create the new columns in the wide format, one column per each value in the column. The name timevar comes from the specific application when this column represents different moments in time. For many applications, this name is quite confusing. The name of this column is passed to timevar parameter.
Other column(s): the column that contains the values that will be now distributed along the timevar column. This column (or columns as we will see later) is listed in the v.names parameter. this column corresponds to the attribute (value) column we dicussed in the first section

Note that the timevar column depends on the variable we want to move to the wide format. I feel that one example will help us to understand this point.

Example to identify: idvar, timevar and other columns

Let as take the Johnson & Johnson dataset, a data frame in the long format:

str(JJ)

## 'data.frame':    84 obs. of  3 variables:
##  $ year    : int  1960 1960 1960 1960 1961 1961 1961 1961 1962 1962 ...
##  $ quarter : Factor w/ 4 levels "Qtr1","Qtr2",..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ earnings: num  0.71 0.63 0.85 0.44 0.61 0.69 0.92 0.55 0.72 0.77 ...

There are 3 columns: year, quarter and earnings

Clearly, earnings column is the ‘other’ column

We can decide that the timevar column is the year. Then, the reshape function wil return a data frame where there will be one column for the earnings of each year. In this case, the idvar column will be the quarter.

The only parameters we need to pass to the function are

data = JJ (name of the data frame)
direction = “wide” (direction in which the data frame is transformed: “wide” when long to wide, “long” in other cases)
idvar = “quarter”
timevar = “year”
v.names = “earnings” (although it is not required because, by default, all columns that are not idvar or timevar are taken as Other type)

reshape(data = JJ, direction = "wide",
  timevar = "year",
  idvar = "quarter",
  v.names = "earnings")

##   quarter earnings.1960 earnings.1961 earnings.1962 earnings.1963
## 1    Qtr1          0.71          0.61          0.72          0.83
## 2    Qtr2          0.63          0.69          0.77          0.80
## 3    Qtr3          0.85          0.92          0.92          1.00
## 4    Qtr4          0.44          0.55          0.60          0.77
##   earnings.1964 earnings.1965 earnings.1966 earnings.1967 earnings.1968
## 1          0.92          1.16          1.26          1.53          1.53
## 2          1.00          1.30          1.38          1.59          2.07
## 3          1.24          1.45          1.86          1.83          2.34
## 4          1.00          1.25          1.56          1.86          2.25
##   earnings.1969 earnings.1970 earnings.1971 earnings.1972 earnings.1973
## 1          2.16          2.79          3.60          4.86          5.58
## 2          2.43          3.42          4.32          5.04          5.85
## 3          2.70          3.69          4.32          5.04          6.57
## 4          2.25          3.60          4.05          4.41          5.31
##   earnings.1974 earnings.1975 earnings.1976 earnings.1977 earnings.1978
## 1          6.03          6.93          7.74          9.54         11.88
## 2          6.39          7.74          8.91         10.26         12.06
## 3          6.93          7.83          8.28          9.54         12.15
## 4          5.85          6.12          6.84          8.73          8.91
##   earnings.1979 earnings.1980
## 1         14.04         16.20
## 2         12.96         14.67
## 3         14.85         16.02
## 4          9.99         11.61

Note that the columns names in the “wide” format have been built by concatenating the Other column name (“earnings”) with a point (“.”) and the timevar values. The separator “.” is the default value for the sep parameter. Any other value can be used as separator.

On the other hand, if we take the quarter as timevar, the reshape function will return a data frame where there will be one column for the earnings for each quarter. The idvar will be in this case the year.

wide_1 <- reshape(data = JJ, direction = "wide", 
  timevar = "quarter",
  idvar = "year", 
  sep = "_")
head(wide_1)

##    year earnings_Qtr1 earnings_Qtr2 earnings_Qtr3 earnings_Qtr4
## 1  1960          0.71          0.63          0.85          0.44
## 5  1961          0.61          0.69          0.92          0.55
## 9  1962          0.72          0.77          0.92          0.60
## 13 1963          0.83          0.80          1.00          0.77
## 17 1964          0.92          1.00          1.24          1.00
## 21 1965          1.16          1.30          1.45          1.25

We have omitted in this second example the v.names parameter as it can be guessed by the function and changed the separator from “.” to “_“.

Long to Wide format

It is possible to continue moving a column to wide format.

Let us take our last example wide_1. Column year will be our timevar column and the rest of columns (names(wide_1)[-1]) as other columns. Note that in this case we do not have an idvar column. As it is compulsory to identify one of the columns as idvar, we create first an id column, having in mind that wide_1 has 21 columns

wide_1$id <- rep(1, 21)

Now we can continue to move “year” timevar column to wide format

wide_2 <- reshape(data = wide_1, direction = "wide",
  timevar = "year",
  idvar = "id", 
  sep = ":")
head(wide_2[, 1:6])

##   id earnings_Qtr1:1960 earnings_Qtr2:1960 earnings_Qtr3:1960
## 1  1               0.71               0.63               0.85
##   earnings_Qtr4:1960 earnings_Qtr1:1961
## 1               0.44               0.61

We have just transposed the initial long data frame JJ.

str(t(JJ))

##  chr [1:3, 1:84] "1960" "Qtr1" " 0.71" "1960" "Qtr2" " 0.63" "1960" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:3] "year" "quarter" "earnings"
##   ..$ : NULL

In the transpose of JJ numeric value has been coherced to character when the original data frame has been converted into a matrix before transposing it.

Through this example, we have explained the parameters idvar, timevar, v.names, data, sep and direction. We can try it also with the hec data frame.

str(hec)

## 'data.frame':    32 obs. of  4 variables:
##  $ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Freq: num  32 53 10 3 11 50 10 30 10 25 ...

We can chose three different columns as timevar, the values of which will become columns: Hair, Eye and Sex. Let us chose Sex. Then we build the wide representation

hec_1 <- reshape(data = hec, direction = "wide", 
  timevar = "Sex",
  idvar = c("Hair", "Eye"),
  v.names = "Freq",
  sep = "_")
hec_1

##     Hair   Eye Freq_Male Freq_Female
## 1  Black Brown        32          36
## 2  Brown Brown        53          66
## 3    Red Brown        10          16
## 4  Blond Brown         3           4
## 5  Black  Blue        11           9
## 6  Brown  Blue        50          34
## 7    Red  Blue        10           7
## 8  Blond  Blue        30          64
## 9  Black Hazel        10           5
## 10 Brown Hazel        25          29
## 11   Red Hazel         7           7
## 12 Blond Hazel         5           5
## 13 Black Green         3           2
## 14 Brown Green        15          14
## 15   Red Green         7           7
## 16 Blond Green         8           8

We can move one more column to wide format, Eye, for instance:

hec_2 <-  reshape(data = hec_1, direction = "wide",
  timevar = "Eye",
  idvar = "Hair",
  sep = ":")
hec_2

##    Hair Freq_Male:Brown Freq_Female:Brown Freq_Male:Blue Freq_Female:Blue
## 1 Black              32                36             11                9
## 2 Brown              53                66             50               34
## 3   Red              10                16             10                7
## 4 Blond               3                 4             30               64
##   Freq_Male:Hazel Freq_Female:Hazel Freq_Male:Green Freq_Female:Green
## 1              10                 5               3                 2
## 2              25                29              15                14
## 3               7                 7               7                 7
## 4               5                 5               8                 8

Nothing more is required to transform a data frame in long format to a wide format.

Long to Wide format recap

You write the following template for the reshape function:

reshape(data = df, direction = "wide",
  timevar = , 
  idvar = ,
  v.names =)

where df is the varibale that contains the data frame you want to transform to wide

identify the column that you to want to use to expand the column number in the wide format (it can only be 1 column). This is the timevar parameter.
identify the column (or columns) that will identify in an unique way each observation in the wide format. These are the ìdvarcolumns.
Finally, the column(s) left is the other column, parameter v.names

You put the parameters that you identified in this 3 steps in the 3 parameters of the template.

Wide to Long format

The template for the reshape function when tranforming a data frame from wide to long is even simpler. Sometimes, it may become complicated.

reshape(data = df, direction = "long", 
  #timevar = , times =,
  #idvar = , ids,
  #v.names =, 
  varying =,
  #sep = "."
  )

Only one step is required:

To identify the columns we want to move from the wide to long format in the parameter varying. We can pass the names of the columns or the indeces to obtain those names from names(df).

If this method works, the other required parameter (v.names) will be guessed from the column names using a split with a “·” as a separator. An error will be raised when v.names cannot be determined.

Nevertheless, the obtained result can be far from expectations. Playing with the other parameters will help us to adjust the final result to our needs.

Let us create a simple example to show this general first approach

w_df <- data.frame(
  city = c("London", "Barcelona", "Paris"),
  rivers = c(1, 2, 1),
  letters = c(6, 9, 5),
  stringsAsFactors = F
)
w_df

##        city rivers letters
## 1    London      1       6
## 2 Barcelona      2       9
## 3     Paris      1       5

reshape(w_df, direction = "long",
  varying = 2:3)

As the v.names cannot be determined by spliting varying names with the default sep (“.”), we have to provide a value to v.names

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3)

##          city time value id
## 1.1    London    1     1  1
## 2.1 Barcelona    1     2  2
## 3.1     Paris    1     1  3
## 1.2    London    2     6  1
## 2.2 Barcelona    2     9  2
## 3.2     Paris    2     5  3

Parameter times was not given and takes the default value (seq_along(varying[[1]])), that in this example is 1:2.

We can give the values to times

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3,
  times = names(w_df)[2:3])

##                city    time value id
## 1.rivers     London  rivers     1  1
## 2.rivers  Barcelona  rivers     2  2
## 3.rivers      Paris  rivers     1  3
## 1.letters    London letters     6  1
## 2.letters Barcelona letters     9  2
## 3.letters     Paris letters     5  3

By default, timevar column name is named time by default. We can change it with timevar parameter

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3,
  times = names(w_df)[2:3],
  timevar = "measure")

##                city measure value id
## 1.rivers     London  rivers     1  1
## 2.rivers  Barcelona  rivers     2  2
## 3.rivers      Paris  rivers     1  3
## 1.letters    London letters     6  1
## 2.letters Barcelona letters     9  2
## 3.letters     Paris letters     5  3

Note that a colun called id has been created to play the role of idvar column in the long format. We call it differently or using an existing column to become idvar column. We will use city column.

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3,
  times = names(w_df)[2:3],
  timevar = "measure",
  idvar = "city")

##                        city measure value
## London.rivers        London  rivers     1
## Barcelona.rivers  Barcelona  rivers     2
## Paris.rivers          Paris  rivers     1
## London.letters       London letters     6
## Barcelona.letters Barcelona letters     9
## Paris.letters         Paris letters     5

In case there is or do ot want to use any existing column as idvar, we can define its values with the ids parameter

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3,
  times = names(w_df)[2:3],
  timevar = "measure",
  idvar = "city_id",
  ids = paste0("#", 1:3))

##                 city measure value city_id
## #1.rivers     London  rivers     1      #1
## #2.rivers  Barcelona  rivers     2      #2
## #3.rivers      Paris  rivers     1      #3
## #1.letters    London letters     6      #1
## #2.letters Barcelona letters     9      #2
## #3.letters     Paris letters     5      #3

Coming back to the previous example, we can get the final result (at least, the one I expected) by changing the row identifiers. Parameter new.row.names fills this purpose

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3,
  times = names(w_df)[2:3],
  timevar = "measure",
  idvar = "city",
  new.row.names = 1:6)

##        city measure value
## 1    London  rivers     1
## 2 Barcelona  rivers     2
## 3     Paris  rivers     1
## 4    London letters     6
## 5 Barcelona letters     9
## 6     Paris letters     5

When v.names can be deduced from the varying names, then the final result can be achieve faster. Let us modify sñightly the data frame of the previous example

w_df2 <- data.frame(
  city = c("London", "Barcelona", "Paris"),
  value.rivers = c(1, 2, 1),
  value.letters = c(6, 9, 5),
  stringsAsFactors = F
)
w_df2

##        city value.rivers value.letters
## 1    London            1             6
## 2 Barcelona            2             9
## 3     Paris            1             5

Now, just with varying parameter we obtain a result

reshape(w_df2, direction = "long",
  varying = 2:3)

##                city    time value id
## 1.rivers     London  rivers     1  1
## 2.rivers  Barcelona  rivers     2  2
## 3.rivers      Paris  rivers     1  3
## 1.letters    London letters     6  1
## 2.letters Barcelona letters     9  2
## 3.letters     Paris letters     5  3

and the final result by

reshape(w_df2, direction = "long",
  varying = 2:3,
  timevar = "measure",
  idvar = "city",
  new.row.names = 1:6)

##        city measure value
## 1    London  rivers     1
## 2 Barcelona  rivers     2
## 3     Paris  rivers     1
## 4    London letters     6
## 5 Barcelona letters     9
## 6     Paris letters     5

Iris dataset

We will now apply the same rules other wide data frames.

We start with the iris well known dataframe R-Bloggers

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

long_iris <- reshape(data = iris, direction = "long"
  , timevar = "dimension"
  , times = names(iris)[1:4] 
  , idvar = "flower_ID"
  , ids = paste0("f", 1:150)
  , v.names = "Measurements"
  , varying = 1:4
  , new.row.names = 1:600
  )
head(long_iris)

##   Species    dimension Measurements flower_ID
## 1  setosa Sepal.Length          5.1        f1
## 2  setosa Sepal.Length          4.9        f2
## 3  setosa Sepal.Length          4.7        f3
## 4  setosa Sepal.Length          4.6        f4
## 5  setosa Sepal.Length          5.0        f5
## 6  setosa Sepal.Length          5.4        f6

str(long_iris)

## 'data.frame':    600 obs. of  4 variables:
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ dimension   : chr  "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Length" ...
##  $ Measurements: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ flower_ID   : chr  "f1" "f2" "f3" "f4" ...
##  - attr(*, "reshapeLong")=List of 4
##   ..$ varying:List of 1
##   .. ..$ Measurements: chr  "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
##   .. ..- attr(*, "v.names")= chr "Measurements"
##   .. ..- attr(*, "times")= chr  "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
##   ..$ v.names: chr "Measurements"
##   ..$ idvar  : chr "flower_ID"
##   ..$ timevar: chr "dimension"

long_iris_2 <- reshape(data = iris, direction = "long",
  timevar = "measure", 
  times = c("Length", "Width"), 
  idvar = "flower_ID",
  ids = paste0("f", 1:150),
  v.names = c("Sepal", "Petal"),
  varying = 1:4,
  new.row.names = 1:300)
head(long_iris_2)

##   Species measure Sepal Petal flower_ID
## 1  setosa  Length   3.5   5.1        f1
## 2  setosa  Length   3.0   4.9        f2
## 3  setosa  Length   3.2   4.7        f3
## 4  setosa  Length   3.1   4.6        f4
## 5  setosa  Length   3.6   5.0        f5
## 6  setosa  Length   3.9   5.4        f6

long_iris_3 <- reshape(data = long_iris_2, direction = "long",
  timevar = "part", 
  times = c("Sepal", "Petal"),
  idvar = "flower_ID",
  v.names = "values",
  varying = 3:4,
  new.row.names = 1:600)
head(long_iris_3)

##   Species measure flower_ID  part values
## 1  setosa  Length        f1 Sepal    3.5
## 2  setosa  Length        f2 Sepal    3.0
## 3  setosa  Length        f3 Sepal    3.2
## 4  setosa  Length        f4 Sepal    3.1
## 5  setosa  Length        f5 Sepal    3.6
## 6  setosa  Length        f6 Sepal    3.9

finally, we can keep the size measures in the wide format

long_iris_4 <- reshape(data = iris, direction = "long",
  timevar = "part", 
  times = c("Sepal", "Petal"), 
  idvar = "flower_ID",
  ids = paste0("f", 1:150),
  v.names = c("Length", "Width"),
  #varying = 1:4,
  varying = list(c(1, 3), c(2, 4)),
  new.row.names = 1:300)
head(long_iris_4)

##   Species  part Length Width flower_ID
## 1  setosa Sepal    5.1   3.5        f1
## 2  setosa Sepal    4.9   3.0        f2
## 3  setosa Sepal    4.7   3.2        f3
## 4  setosa Sepal    4.6   3.1        f4
## 5  setosa Sepal    5.0   3.6        f5
## 6  setosa Sepal    5.4   3.9        f6

reshape help examples

The three following examples can be found in the reshape function help link

Let us start with the Indometh dataset.

head(Indometh)

##   Subject time conc
## 1       1 0.25 1.50
## 2       1 0.50 0.94
## 3       1 0.75 0.78
## 4       1 1.00 0.48
## 5       1 1.25 0.37
## 6       1 2.00 0.19

If column time is considered as timevar and Subject as idvar, then

indo_wide <- reshape(data = Indometh, direction = "wide",
  timevar = "time",
  idvar = "Subject")
indo_wide

##    Subject conc.0.25 conc.0.5 conc.0.75 conc.1 conc.1.25 conc.2 conc.3
## 1        1      1.50     0.94      0.78   0.48      0.37   0.19   0.12
## 12       2      2.03     1.63      0.71   0.70      0.64   0.36   0.32
## 23       3      2.72     1.49      1.16   0.80      0.80   0.39   0.22
## 34       4      1.85     1.39      1.02   0.89      0.59   0.40   0.16
## 45       5      2.05     1.04      0.81   0.39      0.30   0.23   0.13
## 56       6      2.31     1.44      1.03   0.84      0.64   0.42   0.24
##    conc.4 conc.5 conc.6 conc.8
## 1    0.11   0.08   0.07   0.05
## 12   0.20   0.25   0.12   0.08
## 23   0.12   0.11   0.08   0.08
## 34   0.11   0.10   0.07   0.07
## 45   0.11   0.08   0.10   0.06
## 56   0.17   0.13   0.10   0.09

If we do the other way around

indo_wide_2 <- reshape(data = Indometh, direction = "wide",
  timevar = "Subject",
  idvar = "time"
  )
indo_wide_2

##    time conc.1 conc.2 conc.3 conc.4 conc.5 conc.6
## 1  0.25   1.50   2.03   2.72   1.85   2.05   2.31
## 2  0.50   0.94   1.63   1.49   1.39   1.04   1.44
## 3  0.75   0.78   0.71   1.16   1.02   0.81   1.03
## 4  1.00   0.48   0.70   0.80   0.89   0.39   0.84
## 5  1.25   0.37   0.64   0.80   0.59   0.30   0.64
## 6  2.00   0.19   0.36   0.39   0.40   0.23   0.42
## 7  3.00   0.12   0.32   0.22   0.16   0.13   0.24
## 8  4.00   0.11   0.20   0.12   0.11   0.11   0.17
## 9  5.00   0.08   0.25   0.11   0.10   0.08   0.13
## 10 6.00   0.07   0.12   0.08   0.07   0.10   0.10
## 11 8.00   0.05   0.08   0.08   0.07   0.06   0.09

One more example

df <- data.frame(id = rep(1:4, rep(2,4)),
                 visit = I(rep(c("Before","After"), 4)),
                 x = rnorm(4), y = runif(4))
df

##   id  visit          x         y
## 1  1 Before -0.6405795 0.7364623
## 2  1  After  0.2943682 0.5466475
## 3  2 Before  0.3397886 0.8687797
## 4  2  After -0.5491833 0.1694951
## 5  3 Before -0.6405795 0.7364623
## 6  3  After  0.2943682 0.5466475
## 7  4 Before  0.3397886 0.8687797
## 8  4  After -0.5491833 0.1694951

df_1 <- reshape(data = df
  , direction = "wide"
  , idvar = "visit"
  , timevar = "id")
df_1

##    visit        x.1       y.1        x.2       y.2        x.3       y.3
## 1 Before -0.6405795 0.7364623  0.3397886 0.8687797 -0.6405795 0.7364623
## 2  After  0.2943682 0.5466475 -0.5491833 0.1694951  0.2943682 0.5466475
##          x.4       y.4
## 1  0.3397886 0.8687797
## 2 -0.5491833 0.1694951

We could also expand the visit in columns

df_2 <- reshape(data = df
  , direction = "wide"
  , idvar = "id"
  , timevar = "visit")
df_2

##   id   x.Before  y.Before    x.After   y.After
## 1  1 -0.6405795 0.7364623  0.2943682 0.5466475
## 3  2  0.3397886 0.8687797 -0.5491833 0.1694951
## 5  3 -0.6405795 0.7364623  0.2943682 0.5466475
## 7  4  0.3397886 0.8687797 -0.5491833 0.1694951

Example using state.x77, matrix with 8 columns with info about the 50 states of USA

str(state.x77)

##  num [1:50, 1:8] 3615 365 2212 2110 21198 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##   ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...

df.x77 <- as.data.frame(state.x77)

str(df.x77)

## 'data.frame':    50 obs. of  8 variables:
##  $ Population: num  3615 365 2212 2110 21198 ...
##  $ Income    : num  3624 6315 4530 3378 5114 ...
##  $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
##  $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
##  $ Murder    : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
##  $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
##  $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
##  $ Area      : num  50708 566432 113417 51945 156361 ...

To transform it into a Long format data frame

long.x77 <- reshape(data = df.x77 , direction = "long"
  , idvar ="state", ids = row.names(state.x77)
  , timevar = "variable", times = names(df.x77) 
  , v.names = "value", varying = 1:8 #varying = list(names(df.x77))
  , new.row.names = 1:400)
head(long.x77)

##     variable value      state
## 1 Population  3615    Alabama
## 2 Population   365     Alaska
## 3 Population  2212    Arizona
## 4 Population  2110   Arkansas
## 5 Population 21198 California
## 6 Population  2541   Colorado

Turn into a new wide format. First we do it automatically

head(reshape(long.x77, direction = "wide"))

##        state Population Income Illiteracy Life Exp Murder HS Grad Frost
## 1    Alabama       3615   3624        2.1    69.05   15.1    41.3    20
## 2     Alaska        365   6315        1.5    69.31   11.3    66.7   152
## 3    Arizona       2212   4530        1.8    70.55    7.8    58.1    15
## 4   Arkansas       2110   3378        1.9    70.66   10.1    39.9    65
## 5 California      21198   5114        1.1    71.71   10.3    62.6    20
## 6   Colorado       2541   4884        0.7    72.06    6.8    63.9   166
##     Area
## 1  50708
## 2 566432
## 3 113417
## 4  51945
## 5 156361
## 6 103766

Secondly, we can force the wide format as teh long one was the original shape

head(
  reshape(data = long.x77, direction = "wide"
  , idvar = "state"
  , timevar = "variable"
  , sep = "_")
)

##        state value_Population value_Income value_Illiteracy value_Life Exp
## 1    Alabama             3615         3624              2.1          69.05
## 2     Alaska              365         6315              1.5          69.31
## 3    Arizona             2212         4530              1.8          70.55
## 4   Arkansas             2110         3378              1.9          70.66
## 5 California            21198         5114              1.1          71.71
## 6   Colorado             2541         4884              0.7          72.06
##   value_Murder value_HS Grad value_Frost value_Area
## 1         15.1          41.3          20      50708
## 2         11.3          66.7         152     566432
## 3          7.8          58.1          15     113417
## 4         10.1          39.9          65      51945
## 5         10.3          62.6          20     156361
## 6          6.8          63.9         166     103766

We can also do the equivalent of a transposition by goin first to long format (long.x77) and later to wide (wide.x77)

wide.x77 <- reshape(data = long.x77
  , direction = "wide"
  , idvar = "variable"
  , timevar = "state"
  , sep = "_")
wide.x77[, 1:5]

##       variable value_Alabama value_Alaska value_Arizona value_Arkansas
## 1   Population       3615.00       365.00       2212.00        2110.00
## 51      Income       3624.00      6315.00       4530.00        3378.00
## 101 Illiteracy          2.10         1.50          1.80           1.90
## 151   Life Exp         69.05        69.31         70.55          70.66
## 201     Murder         15.10        11.30          7.80          10.10
## 251    HS Grad         41.30        66.70         58.10          39.90
## 301      Frost         20.00       152.00         15.00          65.00
## 351       Area      50708.00    566432.00     113417.00       51945.00

Example 1

Another example, from link

df3 <- data.frame(id = 1:4, age = c(40,50,60,50), dose1 = c(1,2,1,2),
                  dose2 = c(2,1,2,1), dose4 = c(3,3,3,3))
df3

##   id age dose1 dose2 dose4
## 1  1  40     1     2     3
## 2  2  50     2     1     3
## 3  3  60     1     2     3
## 4  4  50     2     1     3

reshape(data = df3, direction = "long"
  , idvar = "id"
  , timevar = "dose_type"
  , times = c(1, 2, 4)
  , v.names = "dose"
  , varying = 3:5
  , new.row.names = 1:12
  , sep = "")

##    id age dose_type dose
## 1   1  40         1    1
## 2   2  50         1    2
## 3   3  60         1    1
## 4   4  50         1    2
## 5   1  40         2    2
## 6   2  50         2    1
## 7   3  60         2    2
## 8   4  50         2    1
## 9   1  40         4    3
## 10  2  50         4    3
## 11  3  60         4    3
## 12  4  50         4    3

reshape(data = df3, direction = "long"
  , varying = 3:5
  , sep = "")

##     id age time dose
## 1.1  1  40    1    1
## 2.1  2  50    1    2
## 3.1  3  60    1    1
## 4.1  4  50    1    2
## 1.2  1  40    2    2
## 2.2  2  50    2    1
## 3.2  3  60    2    2
## 4.2  4  50    2    1
## 1.4  1  40    4    3
## 2.4  2  50    4    3
## 3.4  3  60    4    3
## 4.4  4  50    4    3

Example 2

One more example, from TRinker’s R Blog

set.seed(10)
dat <- data.frame(id=paste0("ID",1:5), 
    sex=sample(c("male", "female"), 5, replace=TRUE), 
    matrix(rpois(30, 10), 5, 6))
colnames(dat)[-c(1:2)] <- paste0(rep(1:2, times=3), 
    rep(c("work", "home", "church"), 2))
dat

##    id    sex 1work 2home 1church 2work 1home 2church
## 1 ID1 female     7     8       7    10     6      10
## 2 ID2   male    10    13      10     7    13      15
## 3 ID3   male    11    10       6    10    10       7
## 4 ID4 female     6     8      12     9    15       7
## 5 ID5   male     9    11      15    10    10      12

reshape(dat, direction="long"
  , idvar = "id"
  , timevar = "part_area"
  , times = names(dat)[3:8]
  , v.names = "# pieces"
  , varying = 3:8
  , new.row.names = 1:30)

##     id    sex part_area # pieces
## 1  ID1 female     1work        7
## 2  ID2   male     1work       10
## 3  ID3   male     1work       11
## 4  ID4 female     1work        6
## 5  ID5   male     1work        9
## 6  ID1 female     2home        8
## 7  ID2   male     2home       13
## 8  ID3   male     2home       10
## 9  ID4 female     2home        8
## 10 ID5   male     2home       11
## 11 ID1 female   1church        7
## 12 ID2   male   1church       10
## 13 ID3   male   1church        6
## 14 ID4 female   1church       12
## 15 ID5   male   1church       15
## 16 ID1 female     2work       10
## 17 ID2   male     2work        7
## 18 ID3   male     2work       10
## 19 ID4 female     2work        9
## 20 ID5   male     2work       10
## 21 ID1 female     1home        6
## 22 ID2   male     1home       13
## 23 ID3   male     1home       10
## 24 ID4 female     1home       15
## 25 ID5   male     1home       10
## 26 ID1 female   2church       10
## 27 ID2   male   2church       15
## 28 ID3   male   2church        7
## 29 ID4 female   2church        7
## 30 ID5   male   2church       12

reshape(data = dat, direction = "long",
  varying = list(c(3, 7, 5), c(6, 4, 8)),
  idvar = "id",
  v.names = c("TIME_1", "TIME_2"),
  timevar = "PLACE",
  times = c("wrk", "hom", "chr"),
  new.row.names = 1:15)

##     id    sex PLACE TIME_1 TIME_2
## 1  ID1 female   wrk      7     10
## 2  ID2   male   wrk     10      7
## 3  ID3   male   wrk     11     10
## 4  ID4 female   wrk      6      9
## 5  ID5   male   wrk      9     10
## 6  ID1 female   hom      6      8
## 7  ID2   male   hom     13     13
## 8  ID3   male   hom     10     10
## 9  ID4 female   hom     15      8
## 10 ID5   male   hom     10     11
## 11 ID1 female   chr      7     10
## 12 ID2   male   chr     10     15
## 13 ID3   male   chr      6      7
## 14 ID4 female   chr     12      7
## 15 ID5   male   chr     15     12

reshape(data = dat, direction = "long",
  timevar = "TIME",
  times = 1:2,
  varying = list(c(3, 6), c(7, 4), c(5,8)),
  v.names = c("WORK", "HOME", "CHURCH"),
  new.row.names = 1:10)

##     id    sex TIME WORK HOME CHURCH
## 1  ID1 female    1    7    6      7
## 2  ID2   male    1   10   13     10
## 3  ID3   male    1   11   10      6
## 4  ID4 female    1    6   15     12
## 5  ID5   male    1    9   10     15
## 6  ID1 female    2   10    8     10
## 7  ID2   male    2    7   13     15
## 8  ID3   male    2   10   10      7
## 9  ID4 female    2    9    8      7
## 10 ID5   male    2   10   11     12

wide_1 <- reshape(data = dat, direction = "long",
  timevar = "TIME",
  times = 1:2,
  varying = list(c(3, 6), c(7, 4), c(5,8)),
  v.names = c("WORK", "HOME", "CHURCH"),
  new.row.names = 1:10)

reshape(data = wide_1, direction = "long",
  timevar = "PLACE",
  times = names(wide_1)[4:6], #c("work", "home", "church")
  varying = 4:6, #c("WORK", "HOME", "CHURCH"),
  v.names = "# people",
  new.row.names = 1:30)

##     id    sex TIME  PLACE # people
## 1  ID1 female    1   WORK        7
## 2  ID2   male    1   WORK       10
## 3  ID3   male    1   WORK       11
## 4  ID4 female    1   WORK        6
## 5  ID5   male    1   WORK        9
## 6  ID1 female    2   WORK       10
## 7  ID2   male    2   WORK        7
## 8  ID3   male    2   WORK       10
## 9  ID4 female    2   WORK        9
## 10 ID5   male    2   WORK       10
## 11 ID1 female    1   HOME        6
## 12 ID2   male    1   HOME       13
## 13 ID3   male    1   HOME       10
## 14 ID4 female    1   HOME       15
## 15 ID5   male    1   HOME       10
## 16 ID1 female    2   HOME        8
## 17 ID2   male    2   HOME       13
## 18 ID3   male    2   HOME       10
## 19 ID4 female    2   HOME        8
## 20 ID5   male    2   HOME       11
## 21 ID1 female    1 CHURCH        7
## 22 ID2   male    1 CHURCH       10
## 23 ID3   male    1 CHURCH        6
## 24 ID4 female    1 CHURCH       12
## 25 ID5   male    1 CHURCH       15
## 26 ID1 female    2 CHURCH       10
## 27 ID2   male    2 CHURCH       15
## 28 ID3   male    2 CHURCH        7
## 29 ID4 female    2 CHURCH        7
## 30 ID5   male    2 CHURCH       12

Reshape base R function tutorial

Pep Porrà

December 16, 2018

Goal

Datasets

Iris dataset

HairEyeColor dataset

Jonhson & Johnson dataset

Indometh Dataset

Reshape function

Example to identify: idvar, timevar and other columns

Long to Wide format

Long to Wide format recap

Wide to Long format

Iris dataset

reshape help examples

Example 1

Example 2