The goal of this note is to explain how to use the function reshape to transform (reshape) a data frame between the long and wide formats.
There are lots of alternatives to this function that are probably faster and, for sure, easy to apply. I have devoted time to this function because I love the R base functions.
As just said, reshape function changes the way a data frame is shown. Let us assume the data frame has N rows and M columns, so M * N = L cells. In the long version of the data frame, the number of columns is the minimum possible when each row represents one and only one observation and one attribute. I understand as the attribute, the value given to observation. So, in the long format, there is only one value for each observation. All other formats are wide formats. This definition of the long format is quite criptic. Hopefully, it can be better understood through examples.
Let us shown some examples of datasets and agree if they are in the long or the wide format
As a first example, let us take the iris dataset
data("iris")
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
Each row in this dataset gives the measure of sepal length and width and petal length and width for 50 flowers from each of the 3 species of iris (type help(iris)). Each row represents an observation (a flower) of 4 variables (Attributes). So, this representation of the data cannot be considered in the long format because it has more than one attribute. Therefore, it is in wide format.
Let us use another R dataset: HirEyeColor
data("HairEyeColor")
head(HairEyeColor)
## [1] 32 53 10 3 11 50
str(HairEyeColor)
## 'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
## - attr(*, "dimnames")=List of 3
## ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
## ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
## ..$ Sex : chr [1:2] "Male" "Female"
print(HairEyeColor)
## , , Sex = Male
##
## Eye
## Hair Brown Blue Hazel Green
## Black 32 11 10 3
## Brown 53 50 25 15
## Red 10 10 7 7
## Blond 3 30 5 8
##
## , , Sex = Female
##
## Eye
## Hair Brown Blue Hazel Green
## Black 36 9 5 2
## Brown 66 34 29 14
## Red 16 7 7 7
## Blond 4 64 5 8
As we can see this a 3-d matrix with Hair color, eye color and Sex where each dimension is a factor and the value of each cell represents the total number of students out of the total sample size 592 with a given combination of Hair color, eye color and sex. If we transform the matrix into a data frame, each row represents one type of observation and only one with one attibute: the count of students with a given type of Hair, Eye and sex. Therefore, this dataset is in long format.
hec <- as.data.frame(HairEyeColor)
head(hec)
## Hair Eye Sex Freq
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
The third example is the Jonhson & Johnson quarterly earnings dataset
data("JohnsonJohnson")
head(JohnsonJohnson)
## [1] 0.71 0.63 0.85 0.44 0.61 0.69
str(JohnsonJohnson)
## Time-Series [1:84] from 1960 to 1981: 0.71 0.63 0.85 0.44 0.61 0.69 0.92 0.55 0.72 0.77 ...
In this example, each row represents the quarterly earnings in dollars of Johnson & Johnson. So, first row is Q1 1960, second row Q2 1960 and in this way sequently. As each represents only one value, the format of this time series once represented as table will be in long format. Let us build this table.
JJ <- data.frame(year = rep(1960:1980, each = 4), quarter = paste0("Qtr", rep(1:4, 21)), earnings = as.vector(JohnsonJohnson))
head(JJ)
## year quarter earnings
## 1 1960 Qtr1 0.71
## 2 1960 Qtr2 0.63
## 3 1960 Qtr3 0.85
## 4 1960 Qtr4 0.44
## 5 1961 Qtr1 0.61
## 6 1961 Qtr2 0.69
A fourth example is the Indometh dataset, used in the examples of the reshape help function
head(Indometh)
## Subject time conc
## 1 1 0.25 1.50
## 2 1 0.50 0.94
## 3 1 0.75 0.78
## 4 1 1.00 0.48
## 5 1 1.25 0.37
## 6 1 2.00 0.19
It is in the Long format as for every Subject and time, only one value (attribute) of the concentration is given.
The reshape function applies to data frames. It has the following parameters:
Before we explain the parameters, it is important to identify 3 types of columns in a long format data frame:
idvar parametertimevar parameter.v.names parameter. this column corresponds to the attribute (value) column we dicussed in the first sectionNote that the timevar column depends on the variable we want to move to the wide format. I feel that one example will help us to understand this point.
Let as take the Johnson & Johnson dataset, a data frame in the long format:
str(JJ)
## 'data.frame': 84 obs. of 3 variables:
## $ year : int 1960 1960 1960 1960 1961 1961 1961 1961 1962 1962 ...
## $ quarter : Factor w/ 4 levels "Qtr1","Qtr2",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ earnings: num 0.71 0.63 0.85 0.44 0.61 0.69 0.92 0.55 0.72 0.77 ...
There are 3 columns: year, quarter and earnings
Clearly, earnings column is the ‘other’ column
We can decide that the timevar column is the year. Then, the reshape function wil return a data frame where there will be one column for the earnings of each year. In this case, the idvar column will be the quarter.
The only parameters we need to pass to the function are
reshape(data = JJ, direction = "wide",
timevar = "year",
idvar = "quarter",
v.names = "earnings")
## quarter earnings.1960 earnings.1961 earnings.1962 earnings.1963
## 1 Qtr1 0.71 0.61 0.72 0.83
## 2 Qtr2 0.63 0.69 0.77 0.80
## 3 Qtr3 0.85 0.92 0.92 1.00
## 4 Qtr4 0.44 0.55 0.60 0.77
## earnings.1964 earnings.1965 earnings.1966 earnings.1967 earnings.1968
## 1 0.92 1.16 1.26 1.53 1.53
## 2 1.00 1.30 1.38 1.59 2.07
## 3 1.24 1.45 1.86 1.83 2.34
## 4 1.00 1.25 1.56 1.86 2.25
## earnings.1969 earnings.1970 earnings.1971 earnings.1972 earnings.1973
## 1 2.16 2.79 3.60 4.86 5.58
## 2 2.43 3.42 4.32 5.04 5.85
## 3 2.70 3.69 4.32 5.04 6.57
## 4 2.25 3.60 4.05 4.41 5.31
## earnings.1974 earnings.1975 earnings.1976 earnings.1977 earnings.1978
## 1 6.03 6.93 7.74 9.54 11.88
## 2 6.39 7.74 8.91 10.26 12.06
## 3 6.93 7.83 8.28 9.54 12.15
## 4 5.85 6.12 6.84 8.73 8.91
## earnings.1979 earnings.1980
## 1 14.04 16.20
## 2 12.96 14.67
## 3 14.85 16.02
## 4 9.99 11.61
Note that the columns names in the “wide” format have been built by concatenating the Other column name (“earnings”) with a point (“.”) and the timevar values. The separator “.” is the default value for the sep parameter. Any other value can be used as separator.
On the other hand, if we take the quarter as timevar, the reshape function will return a data frame where there will be one column for the earnings for each quarter. The idvar will be in this case the year.
wide_1 <- reshape(data = JJ, direction = "wide",
timevar = "quarter",
idvar = "year",
sep = "_")
head(wide_1)
## year earnings_Qtr1 earnings_Qtr2 earnings_Qtr3 earnings_Qtr4
## 1 1960 0.71 0.63 0.85 0.44
## 5 1961 0.61 0.69 0.92 0.55
## 9 1962 0.72 0.77 0.92 0.60
## 13 1963 0.83 0.80 1.00 0.77
## 17 1964 0.92 1.00 1.24 1.00
## 21 1965 1.16 1.30 1.45 1.25
We have omitted in this second example the v.names parameter as it can be guessed by the function and changed the separator from “.” to “_“.
It is possible to continue moving a column to wide format.
Let us take our last example wide_1. Column year will be our timevar column and the rest of columns (names(wide_1)[-1]) as other columns. Note that in this case we do not have an idvar column. As it is compulsory to identify one of the columns as idvar, we create first an id column, having in mind that wide_1 has 21 columns
wide_1$id <- rep(1, 21)
Now we can continue to move “year” timevar column to wide format
wide_2 <- reshape(data = wide_1, direction = "wide",
timevar = "year",
idvar = "id",
sep = ":")
head(wide_2[, 1:6])
## id earnings_Qtr1:1960 earnings_Qtr2:1960 earnings_Qtr3:1960
## 1 1 0.71 0.63 0.85
## earnings_Qtr4:1960 earnings_Qtr1:1961
## 1 0.44 0.61
We have just transposed the initial long data frame JJ.
str(t(JJ))
## chr [1:3, 1:84] "1960" "Qtr1" " 0.71" "1960" "Qtr2" " 0.63" "1960" ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:3] "year" "quarter" "earnings"
## ..$ : NULL
In the transpose of JJ numeric value has been coherced to character when the original data frame has been converted into a matrix before transposing it.
Through this example, we have explained the parameters idvar, timevar, v.names, data, sep and direction. We can try it also with the hec data frame.
str(hec)
## 'data.frame': 32 obs. of 4 variables:
## $ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: num 32 53 10 3 11 50 10 30 10 25 ...
We can chose three different columns as timevar, the values of which will become columns: Hair, Eye and Sex. Let us chose Sex. Then we build the wide representation
hec_1 <- reshape(data = hec, direction = "wide",
timevar = "Sex",
idvar = c("Hair", "Eye"),
v.names = "Freq",
sep = "_")
hec_1
## Hair Eye Freq_Male Freq_Female
## 1 Black Brown 32 36
## 2 Brown Brown 53 66
## 3 Red Brown 10 16
## 4 Blond Brown 3 4
## 5 Black Blue 11 9
## 6 Brown Blue 50 34
## 7 Red Blue 10 7
## 8 Blond Blue 30 64
## 9 Black Hazel 10 5
## 10 Brown Hazel 25 29
## 11 Red Hazel 7 7
## 12 Blond Hazel 5 5
## 13 Black Green 3 2
## 14 Brown Green 15 14
## 15 Red Green 7 7
## 16 Blond Green 8 8
We can move one more column to wide format, Eye, for instance:
hec_2 <- reshape(data = hec_1, direction = "wide",
timevar = "Eye",
idvar = "Hair",
sep = ":")
hec_2
## Hair Freq_Male:Brown Freq_Female:Brown Freq_Male:Blue Freq_Female:Blue
## 1 Black 32 36 11 9
## 2 Brown 53 66 50 34
## 3 Red 10 16 10 7
## 4 Blond 3 4 30 64
## Freq_Male:Hazel Freq_Female:Hazel Freq_Male:Green Freq_Female:Green
## 1 10 5 3 2
## 2 25 29 15 14
## 3 7 7 7 7
## 4 5 5 8 8
Nothing more is required to transform a data frame in long format to a wide format.
You write the following template for the reshape function:
reshape(data = df, direction = "wide",
timevar = ,
idvar = ,
v.names =)
where df is the varibale that contains the data frame you want to transform to wide
timevar parameter.ìdvarcolumns.v.namesYou put the parameters that you identified in this 3 steps in the 3 parameters of the template.
The template for the reshape function when tranforming a data frame from wide to long is even simpler. Sometimes, it may become complicated.
reshape(data = df, direction = "long",
#timevar = , times =,
#idvar = , ids,
#v.names =,
varying =,
#sep = "."
)
Only one step is required:
varying. We can pass the names of the columns or the indeces to obtain those names from names(df).If this method works, the other required parameter (v.names) will be guessed from the column names using a split with a “·” as a separator. An error will be raised when v.names cannot be determined.
Nevertheless, the obtained result can be far from expectations. Playing with the other parameters will help us to adjust the final result to our needs.
Let us create a simple example to show this general first approach
w_df <- data.frame(
city = c("London", "Barcelona", "Paris"),
rivers = c(1, 2, 1),
letters = c(6, 9, 5),
stringsAsFactors = F
)
w_df
## city rivers letters
## 1 London 1 6
## 2 Barcelona 2 9
## 3 Paris 1 5
reshape(w_df, direction = "long",
varying = 2:3)
As the v.names cannot be determined by spliting varying names with the default sep (“.”), we have to provide a value to v.names
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3)
## city time value id
## 1.1 London 1 1 1
## 2.1 Barcelona 1 2 2
## 3.1 Paris 1 1 3
## 1.2 London 2 6 1
## 2.2 Barcelona 2 9 2
## 3.2 Paris 2 5 3
Parameter times was not given and takes the default value (seq_along(varying[[1]])), that in this example is 1:2.
We can give the values to times
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3,
times = names(w_df)[2:3])
## city time value id
## 1.rivers London rivers 1 1
## 2.rivers Barcelona rivers 2 2
## 3.rivers Paris rivers 1 3
## 1.letters London letters 6 1
## 2.letters Barcelona letters 9 2
## 3.letters Paris letters 5 3
By default, timevar column name is named time by default. We can change it with timevar parameter
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3,
times = names(w_df)[2:3],
timevar = "measure")
## city measure value id
## 1.rivers London rivers 1 1
## 2.rivers Barcelona rivers 2 2
## 3.rivers Paris rivers 1 3
## 1.letters London letters 6 1
## 2.letters Barcelona letters 9 2
## 3.letters Paris letters 5 3
Note that a colun called id has been created to play the role of idvar column in the long format. We call it differently or using an existing column to become idvar column. We will use city column.
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3,
times = names(w_df)[2:3],
timevar = "measure",
idvar = "city")
## city measure value
## London.rivers London rivers 1
## Barcelona.rivers Barcelona rivers 2
## Paris.rivers Paris rivers 1
## London.letters London letters 6
## Barcelona.letters Barcelona letters 9
## Paris.letters Paris letters 5
In case there is or do ot want to use any existing column as idvar, we can define its values with the ids parameter
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3,
times = names(w_df)[2:3],
timevar = "measure",
idvar = "city_id",
ids = paste0("#", 1:3))
## city measure value city_id
## #1.rivers London rivers 1 #1
## #2.rivers Barcelona rivers 2 #2
## #3.rivers Paris rivers 1 #3
## #1.letters London letters 6 #1
## #2.letters Barcelona letters 9 #2
## #3.letters Paris letters 5 #3
Coming back to the previous example, we can get the final result (at least, the one I expected) by changing the row identifiers. Parameter new.row.names fills this purpose
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3,
times = names(w_df)[2:3],
timevar = "measure",
idvar = "city",
new.row.names = 1:6)
## city measure value
## 1 London rivers 1
## 2 Barcelona rivers 2
## 3 Paris rivers 1
## 4 London letters 6
## 5 Barcelona letters 9
## 6 Paris letters 5
When v.names can be deduced from the varying names, then the final result can be achieve faster. Let us modify sñightly the data frame of the previous example
w_df2 <- data.frame(
city = c("London", "Barcelona", "Paris"),
value.rivers = c(1, 2, 1),
value.letters = c(6, 9, 5),
stringsAsFactors = F
)
w_df2
## city value.rivers value.letters
## 1 London 1 6
## 2 Barcelona 2 9
## 3 Paris 1 5
Now, just with varying parameter we obtain a result
reshape(w_df2, direction = "long",
varying = 2:3)
## city time value id
## 1.rivers London rivers 1 1
## 2.rivers Barcelona rivers 2 2
## 3.rivers Paris rivers 1 3
## 1.letters London letters 6 1
## 2.letters Barcelona letters 9 2
## 3.letters Paris letters 5 3
and the final result by
reshape(w_df2, direction = "long",
varying = 2:3,
timevar = "measure",
idvar = "city",
new.row.names = 1:6)
## city measure value
## 1 London rivers 1
## 2 Barcelona rivers 2
## 3 Paris rivers 1
## 4 London letters 6
## 5 Barcelona letters 9
## 6 Paris letters 5
We will now apply the same rules other wide data frames.
We start with the iris well known dataframe R-Bloggers
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
long_iris <- reshape(data = iris, direction = "long"
, timevar = "dimension"
, times = names(iris)[1:4]
, idvar = "flower_ID"
, ids = paste0("f", 1:150)
, v.names = "Measurements"
, varying = 1:4
, new.row.names = 1:600
)
head(long_iris)
## Species dimension Measurements flower_ID
## 1 setosa Sepal.Length 5.1 f1
## 2 setosa Sepal.Length 4.9 f2
## 3 setosa Sepal.Length 4.7 f3
## 4 setosa Sepal.Length 4.6 f4
## 5 setosa Sepal.Length 5.0 f5
## 6 setosa Sepal.Length 5.4 f6
str(long_iris)
## 'data.frame': 600 obs. of 4 variables:
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ dimension : chr "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Length" ...
## $ Measurements: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ flower_ID : chr "f1" "f2" "f3" "f4" ...
## - attr(*, "reshapeLong")=List of 4
## ..$ varying:List of 1
## .. ..$ Measurements: chr "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## .. ..- attr(*, "v.names")= chr "Measurements"
## .. ..- attr(*, "times")= chr "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## ..$ v.names: chr "Measurements"
## ..$ idvar : chr "flower_ID"
## ..$ timevar: chr "dimension"
long_iris_2 <- reshape(data = iris, direction = "long",
timevar = "measure",
times = c("Length", "Width"),
idvar = "flower_ID",
ids = paste0("f", 1:150),
v.names = c("Sepal", "Petal"),
varying = 1:4,
new.row.names = 1:300)
head(long_iris_2)
## Species measure Sepal Petal flower_ID
## 1 setosa Length 3.5 5.1 f1
## 2 setosa Length 3.0 4.9 f2
## 3 setosa Length 3.2 4.7 f3
## 4 setosa Length 3.1 4.6 f4
## 5 setosa Length 3.6 5.0 f5
## 6 setosa Length 3.9 5.4 f6
long_iris_3 <- reshape(data = long_iris_2, direction = "long",
timevar = "part",
times = c("Sepal", "Petal"),
idvar = "flower_ID",
v.names = "values",
varying = 3:4,
new.row.names = 1:600)
head(long_iris_3)
## Species measure flower_ID part values
## 1 setosa Length f1 Sepal 3.5
## 2 setosa Length f2 Sepal 3.0
## 3 setosa Length f3 Sepal 3.2
## 4 setosa Length f4 Sepal 3.1
## 5 setosa Length f5 Sepal 3.6
## 6 setosa Length f6 Sepal 3.9
finally, we can keep the size measures in the wide format
long_iris_4 <- reshape(data = iris, direction = "long",
timevar = "part",
times = c("Sepal", "Petal"),
idvar = "flower_ID",
ids = paste0("f", 1:150),
v.names = c("Length", "Width"),
#varying = 1:4,
varying = list(c(1, 3), c(2, 4)),
new.row.names = 1:300)
head(long_iris_4)
## Species part Length Width flower_ID
## 1 setosa Sepal 5.1 3.5 f1
## 2 setosa Sepal 4.9 3.0 f2
## 3 setosa Sepal 4.7 3.2 f3
## 4 setosa Sepal 4.6 3.1 f4
## 5 setosa Sepal 5.0 3.6 f5
## 6 setosa Sepal 5.4 3.9 f6
The three following examples can be found in the reshape function help link
Let us start with the Indometh dataset.
head(Indometh)
## Subject time conc
## 1 1 0.25 1.50
## 2 1 0.50 0.94
## 3 1 0.75 0.78
## 4 1 1.00 0.48
## 5 1 1.25 0.37
## 6 1 2.00 0.19
If column time is considered as timevar and Subject as idvar, then
indo_wide <- reshape(data = Indometh, direction = "wide",
timevar = "time",
idvar = "Subject")
indo_wide
## Subject conc.0.25 conc.0.5 conc.0.75 conc.1 conc.1.25 conc.2 conc.3
## 1 1 1.50 0.94 0.78 0.48 0.37 0.19 0.12
## 12 2 2.03 1.63 0.71 0.70 0.64 0.36 0.32
## 23 3 2.72 1.49 1.16 0.80 0.80 0.39 0.22
## 34 4 1.85 1.39 1.02 0.89 0.59 0.40 0.16
## 45 5 2.05 1.04 0.81 0.39 0.30 0.23 0.13
## 56 6 2.31 1.44 1.03 0.84 0.64 0.42 0.24
## conc.4 conc.5 conc.6 conc.8
## 1 0.11 0.08 0.07 0.05
## 12 0.20 0.25 0.12 0.08
## 23 0.12 0.11 0.08 0.08
## 34 0.11 0.10 0.07 0.07
## 45 0.11 0.08 0.10 0.06
## 56 0.17 0.13 0.10 0.09
If we do the other way around
indo_wide_2 <- reshape(data = Indometh, direction = "wide",
timevar = "Subject",
idvar = "time"
)
indo_wide_2
## time conc.1 conc.2 conc.3 conc.4 conc.5 conc.6
## 1 0.25 1.50 2.03 2.72 1.85 2.05 2.31
## 2 0.50 0.94 1.63 1.49 1.39 1.04 1.44
## 3 0.75 0.78 0.71 1.16 1.02 0.81 1.03
## 4 1.00 0.48 0.70 0.80 0.89 0.39 0.84
## 5 1.25 0.37 0.64 0.80 0.59 0.30 0.64
## 6 2.00 0.19 0.36 0.39 0.40 0.23 0.42
## 7 3.00 0.12 0.32 0.22 0.16 0.13 0.24
## 8 4.00 0.11 0.20 0.12 0.11 0.11 0.17
## 9 5.00 0.08 0.25 0.11 0.10 0.08 0.13
## 10 6.00 0.07 0.12 0.08 0.07 0.10 0.10
## 11 8.00 0.05 0.08 0.08 0.07 0.06 0.09
One more example
df <- data.frame(id = rep(1:4, rep(2,4)),
visit = I(rep(c("Before","After"), 4)),
x = rnorm(4), y = runif(4))
df
## id visit x y
## 1 1 Before -0.6405795 0.7364623
## 2 1 After 0.2943682 0.5466475
## 3 2 Before 0.3397886 0.8687797
## 4 2 After -0.5491833 0.1694951
## 5 3 Before -0.6405795 0.7364623
## 6 3 After 0.2943682 0.5466475
## 7 4 Before 0.3397886 0.8687797
## 8 4 After -0.5491833 0.1694951
df_1 <- reshape(data = df
, direction = "wide"
, idvar = "visit"
, timevar = "id")
df_1
## visit x.1 y.1 x.2 y.2 x.3 y.3
## 1 Before -0.6405795 0.7364623 0.3397886 0.8687797 -0.6405795 0.7364623
## 2 After 0.2943682 0.5466475 -0.5491833 0.1694951 0.2943682 0.5466475
## x.4 y.4
## 1 0.3397886 0.8687797
## 2 -0.5491833 0.1694951
We could also expand the visit in columns
df_2 <- reshape(data = df
, direction = "wide"
, idvar = "id"
, timevar = "visit")
df_2
## id x.Before y.Before x.After y.After
## 1 1 -0.6405795 0.7364623 0.2943682 0.5466475
## 3 2 0.3397886 0.8687797 -0.5491833 0.1694951
## 5 3 -0.6405795 0.7364623 0.2943682 0.5466475
## 7 4 0.3397886 0.8687797 -0.5491833 0.1694951
Example using state.x77, matrix with 8 columns with info about the 50 states of USA
str(state.x77)
## num [1:50, 1:8] 3615 365 2212 2110 21198 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
## ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...
df.x77 <- as.data.frame(state.x77)
str(df.x77)
## 'data.frame': 50 obs. of 8 variables:
## $ Population: num 3615 365 2212 2110 21198 ...
## $ Income : num 3624 6315 4530 3378 5114 ...
## $ Illiteracy: num 2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
## $ Life Exp : num 69 69.3 70.5 70.7 71.7 ...
## $ Murder : num 15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
## $ HS Grad : num 41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
## $ Frost : num 20 152 15 65 20 166 139 103 11 60 ...
## $ Area : num 50708 566432 113417 51945 156361 ...
To transform it into a Long format data frame
long.x77 <- reshape(data = df.x77 , direction = "long"
, idvar ="state", ids = row.names(state.x77)
, timevar = "variable", times = names(df.x77)
, v.names = "value", varying = 1:8 #varying = list(names(df.x77))
, new.row.names = 1:400)
head(long.x77)
## variable value state
## 1 Population 3615 Alabama
## 2 Population 365 Alaska
## 3 Population 2212 Arizona
## 4 Population 2110 Arkansas
## 5 Population 21198 California
## 6 Population 2541 Colorado
Turn into a new wide format. First we do it automatically
head(reshape(long.x77, direction = "wide"))
## state Population Income Illiteracy Life Exp Murder HS Grad Frost
## 1 Alabama 3615 3624 2.1 69.05 15.1 41.3 20
## 2 Alaska 365 6315 1.5 69.31 11.3 66.7 152
## 3 Arizona 2212 4530 1.8 70.55 7.8 58.1 15
## 4 Arkansas 2110 3378 1.9 70.66 10.1 39.9 65
## 5 California 21198 5114 1.1 71.71 10.3 62.6 20
## 6 Colorado 2541 4884 0.7 72.06 6.8 63.9 166
## Area
## 1 50708
## 2 566432
## 3 113417
## 4 51945
## 5 156361
## 6 103766
Secondly, we can force the wide format as teh long one was the original shape
head(
reshape(data = long.x77, direction = "wide"
, idvar = "state"
, timevar = "variable"
, sep = "_")
)
## state value_Population value_Income value_Illiteracy value_Life Exp
## 1 Alabama 3615 3624 2.1 69.05
## 2 Alaska 365 6315 1.5 69.31
## 3 Arizona 2212 4530 1.8 70.55
## 4 Arkansas 2110 3378 1.9 70.66
## 5 California 21198 5114 1.1 71.71
## 6 Colorado 2541 4884 0.7 72.06
## value_Murder value_HS Grad value_Frost value_Area
## 1 15.1 41.3 20 50708
## 2 11.3 66.7 152 566432
## 3 7.8 58.1 15 113417
## 4 10.1 39.9 65 51945
## 5 10.3 62.6 20 156361
## 6 6.8 63.9 166 103766
We can also do the equivalent of a transposition by goin first to long format (long.x77) and later to wide (wide.x77)
wide.x77 <- reshape(data = long.x77
, direction = "wide"
, idvar = "variable"
, timevar = "state"
, sep = "_")
wide.x77[, 1:5]
## variable value_Alabama value_Alaska value_Arizona value_Arkansas
## 1 Population 3615.00 365.00 2212.00 2110.00
## 51 Income 3624.00 6315.00 4530.00 3378.00
## 101 Illiteracy 2.10 1.50 1.80 1.90
## 151 Life Exp 69.05 69.31 70.55 70.66
## 201 Murder 15.10 11.30 7.80 10.10
## 251 HS Grad 41.30 66.70 58.10 39.90
## 301 Frost 20.00 152.00 15.00 65.00
## 351 Area 50708.00 566432.00 113417.00 51945.00
Another example, from link
df3 <- data.frame(id = 1:4, age = c(40,50,60,50), dose1 = c(1,2,1,2),
dose2 = c(2,1,2,1), dose4 = c(3,3,3,3))
df3
## id age dose1 dose2 dose4
## 1 1 40 1 2 3
## 2 2 50 2 1 3
## 3 3 60 1 2 3
## 4 4 50 2 1 3
reshape(data = df3, direction = "long"
, idvar = "id"
, timevar = "dose_type"
, times = c(1, 2, 4)
, v.names = "dose"
, varying = 3:5
, new.row.names = 1:12
, sep = "")
## id age dose_type dose
## 1 1 40 1 1
## 2 2 50 1 2
## 3 3 60 1 1
## 4 4 50 1 2
## 5 1 40 2 2
## 6 2 50 2 1
## 7 3 60 2 2
## 8 4 50 2 1
## 9 1 40 4 3
## 10 2 50 4 3
## 11 3 60 4 3
## 12 4 50 4 3
reshape(data = df3, direction = "long"
, varying = 3:5
, sep = "")
## id age time dose
## 1.1 1 40 1 1
## 2.1 2 50 1 2
## 3.1 3 60 1 1
## 4.1 4 50 1 2
## 1.2 1 40 2 2
## 2.2 2 50 2 1
## 3.2 3 60 2 2
## 4.2 4 50 2 1
## 1.4 1 40 4 3
## 2.4 2 50 4 3
## 3.4 3 60 4 3
## 4.4 4 50 4 3
One more example, from TRinker’s R Blog
set.seed(10)
dat <- data.frame(id=paste0("ID",1:5),
sex=sample(c("male", "female"), 5, replace=TRUE),
matrix(rpois(30, 10), 5, 6))
colnames(dat)[-c(1:2)] <- paste0(rep(1:2, times=3),
rep(c("work", "home", "church"), 2))
dat
## id sex 1work 2home 1church 2work 1home 2church
## 1 ID1 female 7 8 7 10 6 10
## 2 ID2 male 10 13 10 7 13 15
## 3 ID3 male 11 10 6 10 10 7
## 4 ID4 female 6 8 12 9 15 7
## 5 ID5 male 9 11 15 10 10 12
reshape(dat, direction="long"
, idvar = "id"
, timevar = "part_area"
, times = names(dat)[3:8]
, v.names = "# pieces"
, varying = 3:8
, new.row.names = 1:30)
## id sex part_area # pieces
## 1 ID1 female 1work 7
## 2 ID2 male 1work 10
## 3 ID3 male 1work 11
## 4 ID4 female 1work 6
## 5 ID5 male 1work 9
## 6 ID1 female 2home 8
## 7 ID2 male 2home 13
## 8 ID3 male 2home 10
## 9 ID4 female 2home 8
## 10 ID5 male 2home 11
## 11 ID1 female 1church 7
## 12 ID2 male 1church 10
## 13 ID3 male 1church 6
## 14 ID4 female 1church 12
## 15 ID5 male 1church 15
## 16 ID1 female 2work 10
## 17 ID2 male 2work 7
## 18 ID3 male 2work 10
## 19 ID4 female 2work 9
## 20 ID5 male 2work 10
## 21 ID1 female 1home 6
## 22 ID2 male 1home 13
## 23 ID3 male 1home 10
## 24 ID4 female 1home 15
## 25 ID5 male 1home 10
## 26 ID1 female 2church 10
## 27 ID2 male 2church 15
## 28 ID3 male 2church 7
## 29 ID4 female 2church 7
## 30 ID5 male 2church 12
reshape(data = dat, direction = "long",
varying = list(c(3, 7, 5), c(6, 4, 8)),
idvar = "id",
v.names = c("TIME_1", "TIME_2"),
timevar = "PLACE",
times = c("wrk", "hom", "chr"),
new.row.names = 1:15)
## id sex PLACE TIME_1 TIME_2
## 1 ID1 female wrk 7 10
## 2 ID2 male wrk 10 7
## 3 ID3 male wrk 11 10
## 4 ID4 female wrk 6 9
## 5 ID5 male wrk 9 10
## 6 ID1 female hom 6 8
## 7 ID2 male hom 13 13
## 8 ID3 male hom 10 10
## 9 ID4 female hom 15 8
## 10 ID5 male hom 10 11
## 11 ID1 female chr 7 10
## 12 ID2 male chr 10 15
## 13 ID3 male chr 6 7
## 14 ID4 female chr 12 7
## 15 ID5 male chr 15 12
reshape(data = dat, direction = "long",
timevar = "TIME",
times = 1:2,
varying = list(c(3, 6), c(7, 4), c(5,8)),
v.names = c("WORK", "HOME", "CHURCH"),
new.row.names = 1:10)
## id sex TIME WORK HOME CHURCH
## 1 ID1 female 1 7 6 7
## 2 ID2 male 1 10 13 10
## 3 ID3 male 1 11 10 6
## 4 ID4 female 1 6 15 12
## 5 ID5 male 1 9 10 15
## 6 ID1 female 2 10 8 10
## 7 ID2 male 2 7 13 15
## 8 ID3 male 2 10 10 7
## 9 ID4 female 2 9 8 7
## 10 ID5 male 2 10 11 12
wide_1 <- reshape(data = dat, direction = "long",
timevar = "TIME",
times = 1:2,
varying = list(c(3, 6), c(7, 4), c(5,8)),
v.names = c("WORK", "HOME", "CHURCH"),
new.row.names = 1:10)
reshape(data = wide_1, direction = "long",
timevar = "PLACE",
times = names(wide_1)[4:6], #c("work", "home", "church")
varying = 4:6, #c("WORK", "HOME", "CHURCH"),
v.names = "# people",
new.row.names = 1:30)
## id sex TIME PLACE # people
## 1 ID1 female 1 WORK 7
## 2 ID2 male 1 WORK 10
## 3 ID3 male 1 WORK 11
## 4 ID4 female 1 WORK 6
## 5 ID5 male 1 WORK 9
## 6 ID1 female 2 WORK 10
## 7 ID2 male 2 WORK 7
## 8 ID3 male 2 WORK 10
## 9 ID4 female 2 WORK 9
## 10 ID5 male 2 WORK 10
## 11 ID1 female 1 HOME 6
## 12 ID2 male 1 HOME 13
## 13 ID3 male 1 HOME 10
## 14 ID4 female 1 HOME 15
## 15 ID5 male 1 HOME 10
## 16 ID1 female 2 HOME 8
## 17 ID2 male 2 HOME 13
## 18 ID3 male 2 HOME 10
## 19 ID4 female 2 HOME 8
## 20 ID5 male 2 HOME 11
## 21 ID1 female 1 CHURCH 7
## 22 ID2 male 1 CHURCH 10
## 23 ID3 male 1 CHURCH 6
## 24 ID4 female 1 CHURCH 12
## 25 ID5 male 1 CHURCH 15
## 26 ID1 female 2 CHURCH 10
## 27 ID2 male 2 CHURCH 15
## 28 ID3 male 2 CHURCH 7
## 29 ID4 female 2 CHURCH 7
## 30 ID5 male 2 CHURCH 12