pivot_wider
and pivot_longer
as alternative to reshape
base R functionlibrary(tidyr)
library(dplyr)
library(stringr)
The goal of this note is to rewrite all examples in Rpubs.com/pep1024 to transform (reshape) a data frame between the long and wide formats using pivot_*
functions in tidyr
package instead of the reshape
base R
function .
Functions pivot_wide
and pivot_long
are the latest functions created by Hadley Wickham to replace the original reshape
function in base R. An essay of cronology of the alternatives is the following:
reshape
package first published in 2005 with functions melt
and cast
CRANreshape2
package as a reboot of the reshape
package. First published 2010 CRANtidyr
package published in 2014 with functions gather
and spread
CRANtidyr
new functions pivot_longer
and pivot_wider
. First published in 2019 newsExamples in ref of reshape
function
JohnsonJohnson
datasetLet as take the Johnson & Johnson dataset, a data frame in the long format:
data("JohnsonJohnson")
JJ <- data.frame(year = rep(1960:1980, each = 4),
quarter = paste0("Qtr", rep(1:4, 21)),
earnings = as.vector(JohnsonJohnson))
tibble(JJ)
## # A tibble: 84 x 3
## year quarter earnings
## <int> <fct> <dbl>
## 1 1960 Qtr1 0.71
## 2 1960 Qtr2 0.63
## 3 1960 Qtr3 0.85
## 4 1960 Qtr4 0.44
## 5 1961 Qtr1 0.61
## 6 1961 Qtr2 0.69
## 7 1961 Qtr3 0.92
## 8 1961 Qtr4 0.55
## 9 1962 Qtr1 0.72
## 10 1962 Qtr2 0.77
## # ... with 74 more rows
reshape(data = JJ, direction = "wide",
timevar = "year",
idvar = "quarter",
v.names = "earnings")
## quarter earnings.1960 earnings.1961 earnings.1962 earnings.1963
## 1 Qtr1 0.71 0.61 0.72 0.83
## 2 Qtr2 0.63 0.69 0.77 0.80
## 3 Qtr3 0.85 0.92 0.92 1.00
## 4 Qtr4 0.44 0.55 0.60 0.77
## earnings.1964 earnings.1965 earnings.1966 earnings.1967 earnings.1968
## 1 0.92 1.16 1.26 1.53 1.53
## 2 1.00 1.30 1.38 1.59 2.07
## 3 1.24 1.45 1.86 1.83 2.34
## 4 1.00 1.25 1.56 1.86 2.25
## earnings.1969 earnings.1970 earnings.1971 earnings.1972 earnings.1973
## 1 2.16 2.79 3.60 4.86 5.58
## 2 2.43 3.42 4.32 5.04 5.85
## 3 2.70 3.69 4.32 5.04 6.57
## 4 2.25 3.60 4.05 4.41 5.31
## earnings.1974 earnings.1975 earnings.1976 earnings.1977 earnings.1978
## 1 6.03 6.93 7.74 9.54 11.88
## 2 6.39 7.74 8.91 10.26 12.06
## 3 6.93 7.83 8.28 9.54 12.15
## 4 5.85 6.12 6.84 8.73 8.91
## earnings.1979 earnings.1980
## 1 14.04 16.20
## 2 12.96 14.67
## 3 14.85 16.02
## 4 9.99 11.61
How can we get the same result using pivot_wide
?
JJ %>% pivot_wider(names_from = year, values_from = earnings, names_prefix = "earnings.")
## # A tibble: 4 x 22
## quarter earnings.1960 earnings.1961 earnings.1962 earnings.1963
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Qtr1 0.71 0.61 0.72 0.83
## 2 Qtr2 0.63 0.69 0.77 0.8
## 3 Qtr3 0.85 0.92 0.92 1
## 4 Qtr4 0.44 0.55 0.6 0.77
## # ... with 17 more variables: earnings.1964 <dbl>, earnings.1965 <dbl>,
## # earnings.1966 <dbl>, earnings.1967 <dbl>, earnings.1968 <dbl>,
## # earnings.1969 <dbl>, earnings.1970 <dbl>, earnings.1971 <dbl>,
## # earnings.1972 <dbl>, earnings.1973 <dbl>, earnings.1974 <dbl>,
## # earnings.1975 <dbl>, earnings.1976 <dbl>, earnings.1977 <dbl>,
## # earnings.1978 <dbl>, earnings.1979 <dbl>, earnings.1980 <dbl>
A different wider table generated from JJ
wide_1 <- reshape(data = JJ, direction = "wide",
timevar = "quarter",
idvar = "year",
sep = "_")
tibble(wide_1)
## # A tibble: 21 x 5
## year earnings_Qtr1 earnings_Qtr2 earnings_Qtr3 earnings_Qtr4
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1960 0.71 0.63 0.85 0.44
## 2 1961 0.61 0.69 0.92 0.55
## 3 1962 0.72 0.77 0.92 0.6
## 4 1963 0.83 0.8 1 0.77
## 5 1964 0.92 1 1.24 1
## 6 1965 1.16 1.3 1.45 1.25
## 7 1966 1.26 1.38 1.86 1.56
## 8 1967 1.53 1.59 1.83 1.86
## 9 1968 1.53 2.07 2.34 2.25
## 10 1969 2.16 2.43 2.7 2.25
## # ... with 11 more rows
JJ %>% pivot_wider(names_from = quarter, values_from = earnings, names_prefix = "earnings.")
## # A tibble: 21 x 5
## year earnings.Qtr1 earnings.Qtr2 earnings.Qtr3 earnings.Qtr4
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1960 0.71 0.63 0.85 0.44
## 2 1961 0.61 0.69 0.92 0.55
## 3 1962 0.72 0.77 0.92 0.6
## 4 1963 0.83 0.8 1 0.77
## 5 1964 0.92 1 1.24 1
## 6 1965 1.16 1.3 1.45 1.25
## 7 1966 1.26 1.38 1.86 1.56
## 8 1967 1.53 1.59 1.83 1.86
## 9 1968 1.53 2.07 2.34 2.25
## 10 1969 2.16 2.43 2.7 2.25
## # ... with 11 more rows
wide_1$id <- rep(1, 21)
Now we can continue to move “year” timevar column to wide format
wide_2 <- reshape(data = wide_1, direction = "wide",
timevar = "year",
idvar = "id",
sep = ":")
tibble(wide_2[, 1:6])
## # A tibble: 1 x 6
## id `earnings_Qtr1:~ `earnings_Qtr2:~ `earnings_Qtr3:~ `earnings_Qtr4:~
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.71 0.63 0.85 0.44
## # ... with 1 more variable: `earnings_Qtr1:1961` <dbl>
We have just transposed the initial long data frame JJ
. the same result can be achieved in one step using pivot_wider
:
JJ %>% pivot_wider(names_from = c(quarter, year), values_from = earnings, names_prefix = "earnings_",
names_sep = ":")
## # A tibble: 1 x 84
## `earnings_Qtr1:~ `earnings_Qtr2:~ `earnings_Qtr3:~ `earnings_Qtr4:~
## <dbl> <dbl> <dbl> <dbl>
## 1 0.71 0.63 0.85 0.44
## # ... with 80 more variables: `earnings_Qtr1:1961` <dbl>,
## # `earnings_Qtr2:1961` <dbl>, `earnings_Qtr3:1961` <dbl>,
## # `earnings_Qtr4:1961` <dbl>, `earnings_Qtr1:1962` <dbl>,
## # `earnings_Qtr2:1962` <dbl>, `earnings_Qtr3:1962` <dbl>,
## # `earnings_Qtr4:1962` <dbl>, `earnings_Qtr1:1963` <dbl>,
## # `earnings_Qtr2:1963` <dbl>, `earnings_Qtr3:1963` <dbl>,
## # `earnings_Qtr4:1963` <dbl>, `earnings_Qtr1:1964` <dbl>,
## # `earnings_Qtr2:1964` <dbl>, `earnings_Qtr3:1964` <dbl>,
## # `earnings_Qtr4:1964` <dbl>, `earnings_Qtr1:1965` <dbl>,
## # `earnings_Qtr2:1965` <dbl>, `earnings_Qtr3:1965` <dbl>,
## # `earnings_Qtr4:1965` <dbl>, `earnings_Qtr1:1966` <dbl>,
## # `earnings_Qtr2:1966` <dbl>, `earnings_Qtr3:1966` <dbl>,
## # `earnings_Qtr4:1966` <dbl>, `earnings_Qtr1:1967` <dbl>,
## # `earnings_Qtr2:1967` <dbl>, `earnings_Qtr3:1967` <dbl>,
## # `earnings_Qtr4:1967` <dbl>, `earnings_Qtr1:1968` <dbl>,
## # `earnings_Qtr2:1968` <dbl>, `earnings_Qtr3:1968` <dbl>,
## # `earnings_Qtr4:1968` <dbl>, `earnings_Qtr1:1969` <dbl>,
## # `earnings_Qtr2:1969` <dbl>, `earnings_Qtr3:1969` <dbl>,
## # `earnings_Qtr4:1969` <dbl>, `earnings_Qtr1:1970` <dbl>,
## # `earnings_Qtr2:1970` <dbl>, `earnings_Qtr3:1970` <dbl>,
## # `earnings_Qtr4:1970` <dbl>, `earnings_Qtr1:1971` <dbl>,
## # `earnings_Qtr2:1971` <dbl>, `earnings_Qtr3:1971` <dbl>,
## # `earnings_Qtr4:1971` <dbl>, `earnings_Qtr1:1972` <dbl>,
## # `earnings_Qtr2:1972` <dbl>, `earnings_Qtr3:1972` <dbl>,
## # `earnings_Qtr4:1972` <dbl>, `earnings_Qtr1:1973` <dbl>,
## # `earnings_Qtr2:1973` <dbl>, `earnings_Qtr3:1973` <dbl>,
## # `earnings_Qtr4:1973` <dbl>, `earnings_Qtr1:1974` <dbl>,
## # `earnings_Qtr2:1974` <dbl>, `earnings_Qtr3:1974` <dbl>,
## # `earnings_Qtr4:1974` <dbl>, `earnings_Qtr1:1975` <dbl>,
## # `earnings_Qtr2:1975` <dbl>, `earnings_Qtr3:1975` <dbl>,
## # `earnings_Qtr4:1975` <dbl>, `earnings_Qtr1:1976` <dbl>,
## # `earnings_Qtr2:1976` <dbl>, `earnings_Qtr3:1976` <dbl>,
## # `earnings_Qtr4:1976` <dbl>, `earnings_Qtr1:1977` <dbl>,
## # `earnings_Qtr2:1977` <dbl>, `earnings_Qtr3:1977` <dbl>,
## # `earnings_Qtr4:1977` <dbl>, `earnings_Qtr1:1978` <dbl>,
## # `earnings_Qtr2:1978` <dbl>, `earnings_Qtr3:1978` <dbl>,
## # `earnings_Qtr4:1978` <dbl>, `earnings_Qtr1:1979` <dbl>,
## # `earnings_Qtr2:1979` <dbl>, `earnings_Qtr3:1979` <dbl>,
## # `earnings_Qtr4:1979` <dbl>, `earnings_Qtr1:1980` <dbl>,
## # `earnings_Qtr2:1980` <dbl>, `earnings_Qtr3:1980` <dbl>,
## # `earnings_Qtr4:1980` <dbl>
HairEyeColor
datasetdata("HairEyeColor")
hec <- as.data.frame(HairEyeColor)
tibble(hec)
## # A tibble: 32 x 4
## Hair Eye Sex Freq
## <fct> <fct> <fct> <dbl>
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
## 7 Red Blue Male 10
## 8 Blond Blue Male 30
## 9 Black Hazel Male 10
## 10 Brown Hazel Male 25
## # ... with 22 more rows
hec_1 <- reshape(data = hec, direction = "wide",
timevar = "Sex",
idvar = c("Hair", "Eye"),
v.names = "Freq",
sep = "_")
tibble(hec_1)
## # A tibble: 16 x 4
## Hair Eye Freq_Male Freq_Female
## <fct> <fct> <dbl> <dbl>
## 1 Black Brown 32 36
## 2 Brown Brown 53 66
## 3 Red Brown 10 16
## 4 Blond Brown 3 4
## 5 Black Blue 11 9
## 6 Brown Blue 50 34
## 7 Red Blue 10 7
## 8 Blond Blue 30 64
## 9 Black Hazel 10 5
## 10 Brown Hazel 25 29
## 11 Red Hazel 7 7
## 12 Blond Hazel 5 5
## 13 Black Green 3 2
## 14 Brown Green 15 14
## 15 Red Green 7 7
## 16 Blond Green 8 8
The equivalent form with pivot_wider
will be
hec %>% pivot_wider(names_from = Sex, values_from = Freq, names_prefix = "Freq_")
## # A tibble: 16 x 4
## Hair Eye Freq_Male Freq_Female
## <fct> <fct> <dbl> <dbl>
## 1 Black Brown 32 36
## 2 Brown Brown 53 66
## 3 Red Brown 10 16
## 4 Blond Brown 3 4
## 5 Black Blue 11 9
## 6 Brown Blue 50 34
## 7 Red Blue 10 7
## 8 Blond Blue 30 64
## 9 Black Hazel 10 5
## 10 Brown Hazel 25 29
## 11 Red Hazel 7 7
## 12 Blond Hazel 5 5
## 13 Black Green 3 2
## 14 Brown Green 15 14
## 15 Red Green 7 7
## 16 Blond Green 8 8
hec_2 <- reshape(data = hec_1, direction = "wide",
timevar = "Eye",
idvar = "Hair",
sep = ":")
tibble(hec_2)
## # A tibble: 4 x 9
## Hair `Freq_Male:Brow~ `Freq_Female:Br~ `Freq_Male:Blue` `Freq_Female:Bl~
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Black 32 36 11 9
## 2 Brown 53 66 50 34
## 3 Red 10 16 10 7
## 4 Blond 3 4 30 64
## # ... with 4 more variables: `Freq_Male:Hazel` <dbl>,
## # `Freq_Female:Hazel` <dbl>, `Freq_Male:Green` <dbl>,
## # `Freq_Female:Green` <dbl>
Again, the same can be obtained with one step
hec %>% pivot_wider(names_from = c(Sex, Eye), values_from = Freq, names_prefix = "Freq_",
names_sep = ":")
## # A tibble: 4 x 9
## Hair `Freq_Male:Brow~ `Freq_Male:Blue` `Freq_Male:Haze~ `Freq_Male:Gree~
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Black 32 11 10 3
## 2 Brown 53 50 25 15
## 3 Red 10 10 7 7
## 4 Blond 3 30 5 8
## # ... with 4 more variables: `Freq_Female:Brown` <dbl>,
## # `Freq_Female:Blue` <dbl>, `Freq_Female:Hazel` <dbl>,
## # `Freq_Female:Green` <dbl>
w_df <- data.frame(
city = c("London", "Barcelona", "Paris"),
rivers = c(1, 2, 1),
letters = c(6, 9, 5),
stringsAsFactors = F
)
tibble(w_df)
## # A tibble: 3 x 3
## city rivers letters
## <chr> <dbl> <dbl>
## 1 London 1 6
## 2 Barcelona 2 9
## 3 Paris 1 5
reshape(w_df, direction = "long",
v.names = "value",
varying = 2:3,
times = names(w_df)[2:3],
timevar = "measure",
idvar = "city",
new.row.names = 1:6)
## city measure value
## 1 London rivers 1
## 2 Barcelona rivers 2
## 3 Paris rivers 1
## 4 London letters 6
## 5 Barcelona letters 9
## 6 Paris letters 5
Probably w_df
could be better defined as a tibble
w_tibble <- tribble(
~city, ~rivers, ~letters,
"London", 1, 6,
"Barcelona", 2, 9,
"Paris", 1, 5
)
w_tibble %>% pivot_longer(-city, names_to = "measure", values_to = "value") %>%
arrange(desc(measure))
## # A tibble: 6 x 3
## city measure value
## <chr> <chr> <dbl>
## 1 London rivers 1
## 2 Barcelona rivers 2
## 3 Paris rivers 1
## 4 London letters 6
## 5 Barcelona letters 9
## 6 Paris letters 5
Iris
datasetWe start with the iris
well known dataframe as used in the post R-Bloggers
data("iris")
as_tibble(iris)
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ... with 140 more rows
long_iris <- reshape(data = iris, direction = "long"
, timevar = "dimension"
, times = names(iris)[1:4]
, idvar = "flower_ID"
, ids = paste0("f", 1:150)
, v.names = "Measurements"
, varying = 1:4
, new.row.names = 1:600
)
tibble(long_iris)
## # A tibble: 600 x 4
## Species dimension Measurements flower_ID
## <fct> <chr> <dbl> <chr>
## 1 setosa Sepal.Length 5.1 f1
## 2 setosa Sepal.Length 4.9 f2
## 3 setosa Sepal.Length 4.7 f3
## 4 setosa Sepal.Length 4.6 f4
## 5 setosa Sepal.Length 5 f5
## 6 setosa Sepal.Length 5.4 f6
## 7 setosa Sepal.Length 4.6 f7
## 8 setosa Sepal.Length 5 f8
## 9 setosa Sepal.Length 4.4 f9
## 10 setosa Sepal.Length 4.9 f10
## # ... with 590 more rows
The equivalent with pivot_longer
requires first to create flower_ID
column
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements")
## # A tibble: 600 x 4
## Species flower_ID dimension Measurements
## <fct> <chr> <chr> <dbl>
## 1 setosa f1 Sepal.Length 5.1
## 2 setosa f1 Sepal.Width 3.5
## 3 setosa f1 Petal.Length 1.4
## 4 setosa f1 Petal.Width 0.2
## 5 setosa f2 Sepal.Length 4.9
## 6 setosa f2 Sepal.Width 3
## 7 setosa f2 Petal.Length 1.4
## 8 setosa f2 Petal.Width 0.2
## 9 setosa f3 Sepal.Length 4.7
## 10 setosa f3 Sepal.Width 3.2
## # ... with 590 more rows
To obtain precisely the same order as with reshape, we need to reorder columns and arrage rows
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements") %>%
select(1, 3, 4, 2) %>% arrange(factor(dimension, levels = names(iris)[1:4]))
## # A tibble: 600 x 4
## Species dimension Measurements flower_ID
## <fct> <chr> <dbl> <chr>
## 1 setosa Sepal.Length 5.1 f1
## 2 setosa Sepal.Length 4.9 f2
## 3 setosa Sepal.Length 4.7 f3
## 4 setosa Sepal.Length 4.6 f4
## 5 setosa Sepal.Length 5 f5
## 6 setosa Sepal.Length 5.4 f6
## 7 setosa Sepal.Length 4.6 f7
## 8 setosa Sepal.Length 5 f8
## 9 setosa Sepal.Length 4.4 f9
## 10 setosa Sepal.Length 4.9 f10
## # ... with 590 more rows
str(long_iris)
## 'data.frame': 600 obs. of 4 variables:
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ dimension : chr "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Length" ...
## $ Measurements: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ flower_ID : chr "f1" "f2" "f3" "f4" ...
## - attr(*, "reshapeLong")=List of 4
## ..$ varying:List of 1
## .. ..$ Measurements: chr "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## .. ..- attr(*, "v.names")= chr "Measurements"
## .. ..- attr(*, "times")= chr "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## ..$ v.names: chr "Measurements"
## ..$ idvar : chr "flower_ID"
## ..$ timevar: chr "dimension"
long_iris_2 <- reshape(data = iris, direction = "long",
timevar = "measure",
times = c("Length", "Width"),
idvar = "flower_ID",
ids = paste0("f", 1:150),
v.names = c("Sepal", "Petal"),
varying = 1:4,
new.row.names = 1:300)
tibble(long_iris_2)
## # A tibble: 300 x 5
## Species measure Sepal Petal flower_ID
## <fct> <chr> <dbl> <dbl> <chr>
## 1 setosa Length 3.5 5.1 f1
## 2 setosa Length 3 4.9 f2
## 3 setosa Length 3.2 4.7 f3
## 4 setosa Length 3.1 4.6 f4
## 5 setosa Length 3.6 5 f5
## 6 setosa Length 3.9 5.4 f6
## 7 setosa Length 3.4 4.6 f7
## 8 setosa Length 3.4 5 f8
## 9 setosa Length 2.9 4.4 f9
## 10 setosa Length 3.1 4.9 f10
## # ... with 290 more rows
In this second example, we are puting in longer format columns 1:2 for Sepal and columns 3:4 for Petal. There are several ways to acheive it. May be the clearest is to use the longest expression found before, split column dimension into measure (Length, Width) and part (Sepal, Petal) and then recreate the wider format we may want
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements") %>%
mutate(part = str_split_fixed(dimension, "\\.", 2)[, 1],
measure = str_split_fixed(dimension, "\\.", 2)[, 2]) %>% select(-dimension)
## # A tibble: 600 x 5
## Species flower_ID Measurements part measure
## <fct> <chr> <dbl> <chr> <chr>
## 1 setosa f1 5.1 Sepal Length
## 2 setosa f1 3.5 Sepal Width
## 3 setosa f1 1.4 Petal Length
## 4 setosa f1 0.2 Petal Width
## 5 setosa f2 4.9 Sepal Length
## 6 setosa f2 3 Sepal Width
## 7 setosa f2 1.4 Petal Length
## 8 setosa f2 0.2 Petal Width
## 9 setosa f3 4.7 Sepal Length
## 10 setosa f3 3.2 Sepal Width
## # ... with 590 more rows
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements") %>%
mutate(part = str_split_fixed(dimension, "\\.", 2)[, 1],
measure = str_split_fixed(dimension, "\\.", 2)[, 2]) %>% select(-dimension) %>%
pivot_wider(names_from = part, values_from = Measurements)
## # A tibble: 300 x 5
## Species flower_ID measure Sepal Petal
## <fct> <chr> <chr> <dbl> <dbl>
## 1 setosa f1 Length 5.1 1.4
## 2 setosa f1 Width 3.5 0.2
## 3 setosa f2 Length 4.9 1.4
## 4 setosa f2 Width 3 0.2
## 5 setosa f3 Length 4.7 1.3
## 6 setosa f3 Width 3.2 0.2
## 7 setosa f4 Length 4.6 1.5
## 8 setosa f4 Width 3.1 0.2
## 9 setosa f5 Length 5 1.4
## 10 setosa f5 Width 3.6 0.2
## # ... with 290 more rows
Can we obtain long_iris_2 using only pivot_long
?
names_to = c(“.value”, “child”), names_sep = “_“, values_drop_na = TRUE
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = c(".value","measure"), names_sep = "\\.")
## # A tibble: 300 x 5
## Species flower_ID measure Sepal Petal
## <fct> <chr> <chr> <dbl> <dbl>
## 1 setosa f1 Length 5.1 1.4
## 2 setosa f1 Width 3.5 0.2
## 3 setosa f2 Length 4.9 1.4
## 4 setosa f2 Width 3 0.2
## 5 setosa f3 Length 4.7 1.3
## 6 setosa f3 Width 3.2 0.2
## 7 setosa f4 Length 4.6 1.5
## 8 setosa f4 Width 3.1 0.2
## 9 setosa f5 Length 5 1.4
## 10 setosa f5 Width 3.6 0.2
## # ... with 290 more rows
long_iris_3 <- reshape(data = long_iris_2, direction = "long",
timevar = "part",
times = c("Sepal", "Petal"),
idvar = "flower_ID",
v.names = "values",
varying = 3:4,
new.row.names = 1:600)
tibble(long_iris_3)
## # A tibble: 600 x 5
## Species measure flower_ID part values
## <fct> <chr> <chr> <chr> <dbl>
## 1 setosa Length f1 Sepal 3.5
## 2 setosa Length f2 Sepal 3
## 3 setosa Length f3 Sepal 3.2
## 4 setosa Length f4 Sepal 3.1
## 5 setosa Length f5 Sepal 3.6
## 6 setosa Length f6 Sepal 3.9
## 7 setosa Length f7 Sepal 3.4
## 8 setosa Length f8 Sepal 3.4
## 9 setosa Length f9 Sepal 2.9
## 10 setosa Length f10 Sepal 3.1
## # ... with 590 more rows
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = c("part","measure"), names_sep = "\\.")
## # A tibble: 600 x 5
## Species flower_ID part measure value
## <fct> <chr> <chr> <chr> <dbl>
## 1 setosa f1 Sepal Length 5.1
## 2 setosa f1 Sepal Width 3.5
## 3 setosa f1 Petal Length 1.4
## 4 setosa f1 Petal Width 0.2
## 5 setosa f2 Sepal Length 4.9
## 6 setosa f2 Sepal Width 3
## 7 setosa f2 Petal Length 1.4
## 8 setosa f2 Petal Width 0.2
## 9 setosa f3 Sepal Length 4.7
## 10 setosa f3 Sepal Width 3.2
## # ... with 590 more rows
finally, we can keep the size measures in the wide format
long_iris_4 <- reshape(data = iris, direction = "long",
timevar = "part",
times = c("Sepal", "Petal"),
idvar = "flower_ID",
ids = paste0("f", 1:150),
v.names = c("Length", "Width"),
#varying = 1:4,
varying = list(c(1, 3), c(2, 4)),
new.row.names = 1:300)
tibble(long_iris_4)
## # A tibble: 300 x 5
## Species part Length Width flower_ID
## <fct> <chr> <dbl> <dbl> <chr>
## 1 setosa Sepal 5.1 3.5 f1
## 2 setosa Sepal 4.9 3 f2
## 3 setosa Sepal 4.7 3.2 f3
## 4 setosa Sepal 4.6 3.1 f4
## 5 setosa Sepal 5 3.6 f5
## 6 setosa Sepal 5.4 3.9 f6
## 7 setosa Sepal 4.6 3.4 f7
## 8 setosa Sepal 5 3.4 f8
## 9 setosa Sepal 4.4 2.9 f9
## 10 setosa Sepal 4.9 3.1 f10
## # ... with 290 more rows
iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
pivot_longer(-c(Species, flower_ID), names_to = c("part", ".value"), names_sep = "\\.") %>%
select(1, 3, 4, 5, 2) %>% arrange(desc(part))
## # A tibble: 300 x 5
## Species part Length Width flower_ID
## <fct> <chr> <dbl> <dbl> <chr>
## 1 setosa Sepal 5.1 3.5 f1
## 2 setosa Sepal 4.9 3 f2
## 3 setosa Sepal 4.7 3.2 f3
## 4 setosa Sepal 4.6 3.1 f4
## 5 setosa Sepal 5 3.6 f5
## 6 setosa Sepal 5.4 3.9 f6
## 7 setosa Sepal 4.6 3.4 f7
## 8 setosa Sepal 5 3.4 f8
## 9 setosa Sepal 4.4 2.9 f9
## 10 setosa Sepal 4.9 3.1 f10
## # ... with 290 more rows
reshape
help examplesThe three following examples can be found in the reshape
function help link
Indometh
datasetLet us start with the Indometh dataset.
head(Indometh)
## Subject time conc
## 1 1 0.25 1.50
## 2 1 0.50 0.94
## 3 1 0.75 0.78
## 4 1 1.00 0.48
## 5 1 1.25 0.37
## 6 1 2.00 0.19
If column time
is considered as timevar
and Subject
as idvar
, then
indo_wide <- reshape(data = Indometh, direction = "wide",
timevar = "time",
idvar = "Subject")
tibble(indo_wide)
## # A tibble: 6 x 12
## Subject conc.0.25 conc.0.5 conc.0.75 conc.1 conc.1.25 conc.2 conc.3
## <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1.5 0.94 0.78 0.48 0.37 0.19 0.12
## 2 2 2.03 1.63 0.71 0.7 0.64 0.36 0.32
## 3 3 2.72 1.49 1.16 0.8 0.8 0.39 0.22
## 4 4 1.85 1.39 1.02 0.89 0.59 0.4 0.16
## 5 5 2.05 1.04 0.81 0.39 0.3 0.23 0.13
## 6 6 2.31 1.44 1.03 0.84 0.64 0.42 0.24
## # ... with 4 more variables: conc.4 <dbl>, conc.5 <dbl>, conc.6 <dbl>,
## # conc.8 <dbl>
Indometh %>%
pivot_wider(names_from = "time", values_from = "conc", names_prefix = "conc.")
## # A tibble: 6 x 12
## Subject conc.0.25 conc.0.5 conc.0.75 conc.1 conc.1.25 conc.2 conc.3
## <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1.5 0.94 0.78 0.48 0.37 0.19 0.12
## 2 2 2.03 1.63 0.71 0.7 0.64 0.36 0.32
## 3 3 2.72 1.49 1.16 0.8 0.8 0.39 0.22
## 4 4 1.85 1.39 1.02 0.89 0.59 0.4 0.16
## 5 5 2.05 1.04 0.81 0.39 0.3 0.23 0.13
## 6 6 2.31 1.44 1.03 0.84 0.64 0.42 0.24
## # ... with 4 more variables: conc.4 <dbl>, conc.5 <dbl>, conc.6 <dbl>,
## # conc.8 <dbl>
If we do the other way around
indo_wide_2 <- reshape(data = Indometh, direction = "wide",
timevar = "Subject",
idvar = "time"
)
tibble(indo_wide_2)
## # A tibble: 11 x 7
## time conc.1 conc.2 conc.3 conc.4 conc.5 conc.6
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.25 1.5 2.03 2.72 1.85 2.05 2.31
## 2 0.5 0.94 1.63 1.49 1.39 1.04 1.44
## 3 0.75 0.78 0.71 1.16 1.02 0.81 1.03
## 4 1 0.48 0.7 0.8 0.89 0.39 0.84
## 5 1.25 0.37 0.64 0.8 0.59 0.3 0.64
## 6 2 0.19 0.36 0.39 0.4 0.23 0.42
## 7 3 0.12 0.32 0.22 0.16 0.13 0.24
## 8 4 0.11 0.2 0.12 0.11 0.11 0.17
## 9 5 0.08 0.25 0.11 0.1 0.08 0.13
## 10 6 0.07 0.12 0.08 0.07 0.1 0.1
## 11 8 0.05 0.08 0.08 0.07 0.06 0.09
Indometh %>%
pivot_wider(names_from = "Subject", values_from = "conc", names_prefix = "conc.")
## # A tibble: 11 x 7
## time conc.1 conc.2 conc.3 conc.4 conc.5 conc.6
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.25 1.5 2.03 2.72 1.85 2.05 2.31
## 2 0.5 0.94 1.63 1.49 1.39 1.04 1.44
## 3 0.75 0.78 0.71 1.16 1.02 0.81 1.03
## 4 1 0.48 0.7 0.8 0.89 0.39 0.84
## 5 1.25 0.37 0.64 0.8 0.59 0.3 0.64
## 6 2 0.19 0.36 0.39 0.4 0.23 0.42
## 7 3 0.12 0.32 0.22 0.16 0.13 0.24
## 8 4 0.11 0.2 0.12 0.11 0.11 0.17
## 9 5 0.08 0.25 0.11 0.1 0.08 0.13
## 10 6 0.07 0.12 0.08 0.07 0.1 0.1
## 11 8 0.05 0.08 0.08 0.07 0.06 0.09
df <- data.frame(id = rep(1:4, rep(2,4)),
visit = I(rep(c("Before","After"), 4)),
x = rnorm(4), y = runif(4))
tibble(df)
## # A tibble: 8 x 4
## id visit x y
## <int> <I<chr>> <dbl> <dbl>
## 1 1 Before 1.30 0.0452
## 2 1 After -0.843 0.953
## 3 2 Before -0.0905 0.722
## 4 2 After -1.70 0.0532
## 5 3 Before 1.30 0.0452
## 6 3 After -0.843 0.953
## 7 4 Before -0.0905 0.722
## 8 4 After -1.70 0.0532
df_1 <- reshape(data = df
, direction = "wide"
, idvar = "visit"
, timevar = "id")
tibble(df_1)
## # A tibble: 2 x 9
## visit x.1 y.1 x.2 y.2 x.3 y.3 x.4 y.4
## <I<chr>> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Before 1.30 0.0452 -0.0905 0.722 1.30 0.0452 -0.0905 0.722
## 2 After -0.843 0.953 -1.70 0.0532 -0.843 0.953 -1.70 0.0532
df %>% pivot_wider(names_from = "id", values_from = c("x", "y"), names_sep = ".")
## # A tibble: 2 x 9
## visit x.1 x.2 x.3 x.4 y.1 y.2 y.3 y.4
## <I<chr>> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Before 1.30 -0.0905 1.30 -0.0905 0.0452 0.722 0.0452 0.722
## 2 After -0.843 -1.70 -0.843 -1.70 0.953 0.0532 0.953 0.0532
We could also expand the visit in columns
df_2 <- reshape(data = df
, direction = "wide"
, idvar = "id"
, timevar = "visit")
tibble(df_2)
## # A tibble: 4 x 5
## id x.Before y.Before x.After y.After
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1.30 0.0452 -0.843 0.953
## 2 2 -0.0905 0.722 -1.70 0.0532
## 3 3 1.30 0.0452 -0.843 0.953
## 4 4 -0.0905 0.722 -1.70 0.0532
df %>% pivot_wider(names_from = "visit", values_from = c("x", "y"), names_sep = ".")
## # A tibble: 4 x 5
## id x.Before x.After y.Before y.After
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1.30 -0.843 0.0452 0.953
## 2 2 -0.0905 -1.70 0.722 0.0532
## 3 3 1.30 -0.843 0.0452 0.953
## 4 4 -0.0905 -1.70 0.722 0.0532
Example using state.x77, matrix with 8 columns with info about the 50 states of USA
str(state.x77)
## num [1:50, 1:8] 3615 365 2212 2110 21198 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
## ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...
df.x77 <- as.data.frame(state.x77)
str(df.x77)
## 'data.frame': 50 obs. of 8 variables:
## $ Population: num 3615 365 2212 2110 21198 ...
## $ Income : num 3624 6315 4530 3378 5114 ...
## $ Illiteracy: num 2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
## $ Life Exp : num 69 69.3 70.5 70.7 71.7 ...
## $ Murder : num 15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
## $ HS Grad : num 41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
## $ Frost : num 20 152 15 65 20 166 139 103 11 60 ...
## $ Area : num 50708 566432 113417 51945 156361 ...
To transform it into a Long format data frame
long.x77 <- reshape(data = df.x77 , direction = "long"
, idvar ="state", ids = row.names(state.x77)
, timevar = "variable", times = names(df.x77)
, v.names = "value", varying = 1:8 #varying = list(names(df.x77))
, new.row.names = 1:400)
as_tibble(long.x77)
## # A tibble: 400 x 3
## variable value state
## <chr> <dbl> <chr>
## 1 Population 3615 Alabama
## 2 Population 365 Alaska
## 3 Population 2212 Arizona
## 4 Population 2110 Arkansas
## 5 Population 21198 California
## 6 Population 2541 Colorado
## 7 Population 3100 Connecticut
## 8 Population 579 Delaware
## 9 Population 8277 Florida
## 10 Population 4931 Georgia
## # ... with 390 more rows
df.x77 %>% mutate(state = row.names(df.x77)) %>%
pivot_longer(-state, names_to = "variable", values_to = "value")
## # A tibble: 400 x 3
## state variable value
## <chr> <chr> <dbl>
## 1 Alabama Population 3615
## 2 Alabama Income 3624
## 3 Alabama Illiteracy 2.1
## 4 Alabama Life Exp 69.0
## 5 Alabama Murder 15.1
## 6 Alabama HS Grad 41.3
## 7 Alabama Frost 20
## 8 Alabama Area 50708
## 9 Alaska Population 365
## 10 Alaska Income 6315
## # ... with 390 more rows
df.x77 %>% mutate(state = row.names(df.x77)) %>%
pivot_longer(-state, names_to = "variable", values_to = "value") %>%
# in case we want exactly the same order
select(2, 3, 1) %>%
arrange(factor(variable, levels = names(df.x77)), state)
## # A tibble: 400 x 3
## variable value state
## <chr> <dbl> <chr>
## 1 Population 3615 Alabama
## 2 Population 365 Alaska
## 3 Population 2212 Arizona
## 4 Population 2110 Arkansas
## 5 Population 21198 California
## 6 Population 2541 Colorado
## 7 Population 3100 Connecticut
## 8 Population 579 Delaware
## 9 Population 8277 Florida
## 10 Population 4931 Georgia
## # ... with 390 more rows
Turn into a new wide format. First we do it automatically
tibble(reshape(long.x77, direction = "wide"))
## # A tibble: 50 x 9
## state Population Income Illiteracy `Life Exp` Murder `HS Grad` Frost
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Alab~ 3615 3624 2.1 69.0 15.1 41.3 20
## 2 Alas~ 365 6315 1.5 69.3 11.3 66.7 152
## 3 Ariz~ 2212 4530 1.8 70.6 7.8 58.1 15
## 4 Arka~ 2110 3378 1.9 70.7 10.1 39.9 65
## 5 Cali~ 21198 5114 1.1 71.7 10.3 62.6 20
## 6 Colo~ 2541 4884 0.7 72.1 6.8 63.9 166
## 7 Conn~ 3100 5348 1.1 72.5 3.1 56 139
## 8 Dela~ 579 4809 0.9 70.1 6.2 54.6 103
## 9 Flor~ 8277 4815 1.3 70.7 10.7 52.6 11
## 10 Geor~ 4931 4091 2 68.5 13.9 40.6 60
## # ... with 40 more rows, and 1 more variable: Area <dbl>
Secondly, we can force the wide format as teh long one was the original shape
tibble(
reshape(data = long.x77, direction = "wide"
, idvar = "state"
, timevar = "variable"
, sep = "_")
)
## # A tibble: 50 x 9
## state value_Population value_Income value_Illiteracy `value_Life Exp`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alab~ 3615 3624 2.1 69.0
## 2 Alas~ 365 6315 1.5 69.3
## 3 Ariz~ 2212 4530 1.8 70.6
## 4 Arka~ 2110 3378 1.9 70.7
## 5 Cali~ 21198 5114 1.1 71.7
## 6 Colo~ 2541 4884 0.7 72.1
## 7 Conn~ 3100 5348 1.1 72.5
## 8 Dela~ 579 4809 0.9 70.1
## 9 Flor~ 8277 4815 1.3 70.7
## 10 Geor~ 4931 4091 2 68.5
## # ... with 40 more rows, and 4 more variables: value_Murder <dbl>,
## # `value_HS Grad` <dbl>, value_Frost <dbl>, value_Area <dbl>
We can also do the equivalent of a transposition by goin first to long format (long.x77) and later to wide (wide.x77)
wide.x77 <- reshape(data = long.x77
, direction = "wide"
, idvar = "variable"
, timevar = "state"
, sep = "_")
tibble(wide.x77)
## # A tibble: 8 x 51
## variable value_Alabama value_Alaska value_Arizona value_Arkansas
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Populat~ 3615 365 2212 2110
## 2 Income 3624 6315 4530 3378
## 3 Illiter~ 2.1 1.5 1.8 1.9
## 4 Life Exp 69.0 69.3 70.6 70.7
## 5 Murder 15.1 11.3 7.8 10.1
## 6 HS Grad 41.3 66.7 58.1 39.9
## 7 Frost 20 152 15 65
## 8 Area 50708 566432 113417 51945
## # ... with 46 more variables: value_California <dbl>,
## # value_Colorado <dbl>, value_Connecticut <dbl>, value_Delaware <dbl>,
## # value_Florida <dbl>, value_Georgia <dbl>, value_Hawaii <dbl>,
## # value_Idaho <dbl>, value_Illinois <dbl>, value_Indiana <dbl>,
## # value_Iowa <dbl>, value_Kansas <dbl>, value_Kentucky <dbl>,
## # value_Louisiana <dbl>, value_Maine <dbl>, value_Maryland <dbl>,
## # value_Massachusetts <dbl>, value_Michigan <dbl>,
## # value_Minnesota <dbl>, value_Mississippi <dbl>, value_Missouri <dbl>,
## # value_Montana <dbl>, value_Nebraska <dbl>, value_Nevada <dbl>,
## # `value_New Hampshire` <dbl>, `value_New Jersey` <dbl>, `value_New
## # Mexico` <dbl>, `value_New York` <dbl>, `value_North Carolina` <dbl>,
## # `value_North Dakota` <dbl>, value_Ohio <dbl>, value_Oklahoma <dbl>,
## # value_Oregon <dbl>, value_Pennsylvania <dbl>, `value_Rhode
## # Island` <dbl>, `value_South Carolina` <dbl>, `value_South
## # Dakota` <dbl>, value_Tennessee <dbl>, value_Texas <dbl>,
## # value_Utah <dbl>, value_Vermont <dbl>, value_Virginia <dbl>,
## # value_Washington <dbl>, `value_West Virginia` <dbl>,
## # value_Wisconsin <dbl>, value_Wyoming <dbl>
Another example, from link
df3 <- data.frame(id = 1:4, age = c(40,50,60,50), dose1 = c(1,2,1,2),
dose2 = c(2,1,2,1), dose4 = c(3,3,3,3))
tibble(df3)
## # A tibble: 4 x 5
## id age dose1 dose2 dose4
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 40 1 2 3
## 2 2 50 2 1 3
## 3 3 60 1 2 3
## 4 4 50 2 1 3
reshape(data = df3, direction = "long"
, idvar = "id"
, timevar = "dose_type"
, times = c(1, 2, 4)
, v.names = "dose"
, varying = 3:5
, new.row.names = 1:12
, sep = "")
## id age dose_type dose
## 1 1 40 1 1
## 2 2 50 1 2
## 3 3 60 1 1
## 4 4 50 1 2
## 5 1 40 2 2
## 6 2 50 2 1
## 7 3 60 2 2
## 8 4 50 2 1
## 9 1 40 4 3
## 10 2 50 4 3
## 11 3 60 4 3
## 12 4 50 4 3
df3 %>% pivot_longer(-c(id, age), names_to = "dose_type", values_to = "dose",
names_pattern = "dose(.)", names_ptypes = list(dose_type = numeric()))
## # A tibble: 12 x 4
## id age dose_type dose
## <int> <dbl> <dbl> <dbl>
## 1 1 40 1 1
## 2 1 40 2 2
## 3 1 40 4 3
## 4 2 50 1 2
## 5 2 50 2 1
## 6 2 50 4 3
## 7 3 60 1 1
## 8 3 60 2 2
## 9 3 60 4 3
## 10 4 50 1 2
## 11 4 50 2 1
## 12 4 50 4 3
reshape(data = df3, direction = "long"
, varying = 3:5
, sep = "")
## id age time dose
## 1.1 1 40 1 1
## 2.1 2 50 1 2
## 3.1 3 60 1 1
## 4.1 4 50 1 2
## 1.2 1 40 2 2
## 2.2 2 50 2 1
## 3.2 3 60 2 2
## 4.2 4 50 2 1
## 1.4 1 40 4 3
## 2.4 2 50 4 3
## 3.4 3 60 4 3
## 4.4 4 50 4 3
One more example, from TRinker’s R Blog
set.seed(10)
dat <- data.frame(id=paste0("ID",1:5),
sex=sample(c("male", "female"), 5, replace=TRUE),
matrix(rpois(30, 10), 5, 6))
colnames(dat)[-c(1:2)] <- paste0(rep(1:2, times=3),
rep(c("work", "home", "church"), 2))
tibble(dat)
## # A tibble: 5 x 8
## id sex `1work` `2home` `1church` `2work` `1home` `2church`
## <fct> <fct> <int> <int> <int> <int> <int> <int>
## 1 ID1 female 7 8 7 10 6 10
## 2 ID2 male 10 13 10 7 13 15
## 3 ID3 male 11 10 6 10 10 7
## 4 ID4 female 6 8 12 9 15 7
## 5 ID5 male 9 11 15 10 10 12
reshape(dat, direction="long"
, idvar = "id"
, timevar = "part_area"
, times = names(dat)[3:8]
, v.names = "# pieces"
, varying = 3:8
, new.row.names = 1:30)
## id sex part_area # pieces
## 1 ID1 female 1work 7
## 2 ID2 male 1work 10
## 3 ID3 male 1work 11
## 4 ID4 female 1work 6
## 5 ID5 male 1work 9
## 6 ID1 female 2home 8
## 7 ID2 male 2home 13
## 8 ID3 male 2home 10
## 9 ID4 female 2home 8
## 10 ID5 male 2home 11
## 11 ID1 female 1church 7
## 12 ID2 male 1church 10
## 13 ID3 male 1church 6
## 14 ID4 female 1church 12
## 15 ID5 male 1church 15
## 16 ID1 female 2work 10
## 17 ID2 male 2work 7
## 18 ID3 male 2work 10
## 19 ID4 female 2work 9
## 20 ID5 male 2work 10
## 21 ID1 female 1home 6
## 22 ID2 male 1home 13
## 23 ID3 male 1home 10
## 24 ID4 female 1home 15
## 25 ID5 male 1home 10
## 26 ID1 female 2church 10
## 27 ID2 male 2church 15
## 28 ID3 male 2church 7
## 29 ID4 female 2church 7
## 30 ID5 male 2church 12
dat %>% pivot_longer(-(1:2), names_to = "part_area", values_to = "# pieces")
## # A tibble: 30 x 4
## id sex part_area `# pieces`
## <fct> <fct> <chr> <int>
## 1 ID1 female 1work 7
## 2 ID1 female 2home 8
## 3 ID1 female 1church 7
## 4 ID1 female 2work 10
## 5 ID1 female 1home 6
## 6 ID1 female 2church 10
## 7 ID2 male 1work 10
## 8 ID2 male 2home 13
## 9 ID2 male 1church 10
## 10 ID2 male 2work 7
## # ... with 20 more rows
reshape(data = dat, direction = "long",
varying = list(c(3, 7, 5), c(6, 4, 8)),
idvar = "id",
v.names = c("TIME_1", "TIME_2"),
timevar = "PLACE",
times = c("wrk", "hom", "chr"),
new.row.names = 1:15)
## id sex PLACE TIME_1 TIME_2
## 1 ID1 female wrk 7 10
## 2 ID2 male wrk 10 7
## 3 ID3 male wrk 11 10
## 4 ID4 female wrk 6 9
## 5 ID5 male wrk 9 10
## 6 ID1 female hom 6 8
## 7 ID2 male hom 13 13
## 8 ID3 male hom 10 10
## 9 ID4 female hom 15 8
## 10 ID5 male hom 10 11
## 11 ID1 female chr 7 10
## 12 ID2 male chr 10 15
## 13 ID3 male chr 6 7
## 14 ID4 female chr 12 7
## 15 ID5 male chr 15 12
dat %>% pivot_longer(-(1:2), names_to = "part_area", values_to = "# pieces") %>%
mutate(part = as.numeric(str_sub(part_area, 1, 1)), area = str_extract(part_area, "\\D+")) %>%
mutate(PLACE = factor(area, levels = c("work", "home", "church"), labels = c("wrk", "hom", "chr"))) %>%
select(-c(part_area, area)) %>%
pivot_wider(names_from = part, values_from = "# pieces", names_prefix = "TIME_")
## # A tibble: 15 x 5
## id sex PLACE TIME_1 TIME_2
## <fct> <fct> <fct> <int> <int>
## 1 ID1 female wrk 7 10
## 2 ID1 female hom 6 8
## 3 ID1 female chr 7 10
## 4 ID2 male wrk 10 7
## 5 ID2 male hom 13 13
## 6 ID2 male chr 10 15
## 7 ID3 male wrk 11 10
## 8 ID3 male hom 10 10
## 9 ID3 male chr 6 7
## 10 ID4 female wrk 6 9
## 11 ID4 female hom 15 8
## 12 ID4 female chr 12 7
## 13 ID5 male wrk 9 10
## 14 ID5 male hom 10 11
## 15 ID5 male chr 15 12
reshape(data = dat, direction = "long",
timevar = "TIME",
times = 1:2,
varying = list(c(3, 6), c(7, 4), c(5,8)),
v.names = c("WORK", "HOME", "CHURCH"),
new.row.names = 1:10)
## id sex TIME WORK HOME CHURCH
## 1 ID1 female 1 7 6 7
## 2 ID2 male 1 10 13 10
## 3 ID3 male 1 11 10 6
## 4 ID4 female 1 6 15 12
## 5 ID5 male 1 9 10 15
## 6 ID1 female 2 10 8 10
## 7 ID2 male 2 7 13 15
## 8 ID3 male 2 10 10 7
## 9 ID4 female 2 9 8 7
## 10 ID5 male 2 10 11 12
dat %>% pivot_longer(-(1:2), names_to = c("TIME", ".value"), names_pattern = "(\\d)(\\D+)")
## # A tibble: 10 x 6
## id sex TIME work home church
## <fct> <fct> <chr> <int> <int> <int>
## 1 ID1 female 1 7 6 7
## 2 ID1 female 2 10 8 10
## 3 ID2 male 1 10 13 10
## 4 ID2 male 2 7 13 15
## 5 ID3 male 1 11 10 6
## 6 ID3 male 2 10 10 7
## 7 ID4 female 1 6 15 12
## 8 ID4 female 2 9 8 7
## 9 ID5 male 1 9 10 15
## 10 ID5 male 2 10 11 12
wide_1 <- reshape(data = dat, direction = "long",
timevar = "TIME",
times = 1:2,
varying = list(c(3, 6), c(7, 4), c(5,8)),
v.names = c("WORK", "HOME", "CHURCH"),
new.row.names = 1:10)
reshape(data = wide_1, direction = "long",
timevar = "PLACE",
times = names(wide_1)[4:6], #c("work", "home", "church")
varying = 4:6, #c("WORK", "HOME", "CHURCH"),
v.names = "# people",
new.row.names = 1:30)
## id sex TIME PLACE # people
## 1 ID1 female 1 WORK 7
## 2 ID2 male 1 WORK 10
## 3 ID3 male 1 WORK 11
## 4 ID4 female 1 WORK 6
## 5 ID5 male 1 WORK 9
## 6 ID1 female 2 WORK 10
## 7 ID2 male 2 WORK 7
## 8 ID3 male 2 WORK 10
## 9 ID4 female 2 WORK 9
## 10 ID5 male 2 WORK 10
## 11 ID1 female 1 HOME 6
## 12 ID2 male 1 HOME 13
## 13 ID3 male 1 HOME 10
## 14 ID4 female 1 HOME 15
## 15 ID5 male 1 HOME 10
## 16 ID1 female 2 HOME 8
## 17 ID2 male 2 HOME 13
## 18 ID3 male 2 HOME 10
## 19 ID4 female 2 HOME 8
## 20 ID5 male 2 HOME 11
## 21 ID1 female 1 CHURCH 7
## 22 ID2 male 1 CHURCH 10
## 23 ID3 male 1 CHURCH 6
## 24 ID4 female 1 CHURCH 12
## 25 ID5 male 1 CHURCH 15
## 26 ID1 female 2 CHURCH 10
## 27 ID2 male 2 CHURCH 15
## 28 ID3 male 2 CHURCH 7
## 29 ID4 female 2 CHURCH 7
## 30 ID5 male 2 CHURCH 12
dat %>% pivot_longer(-(1:2), names_to = "part_area", values_to = "# people") %>%
mutate(TIME = as.numeric(str_sub(part_area, 1, 1)), PLACE = str_extract(part_area, "\\D+")) %>%
mutate(PLACE = str_to_upper(PLACE)) %>%
select(-part_area) %>% select(1:2, 4:5, 3) %>%
arrange(desc(PLACE), TIME)
## # A tibble: 30 x 5
## id sex TIME PLACE `# people`
## <fct> <fct> <dbl> <chr> <int>
## 1 ID1 female 1 WORK 7
## 2 ID2 male 1 WORK 10
## 3 ID3 male 1 WORK 11
## 4 ID4 female 1 WORK 6
## 5 ID5 male 1 WORK 9
## 6 ID1 female 2 WORK 10
## 7 ID2 male 2 WORK 7
## 8 ID3 male 2 WORK 10
## 9 ID4 female 2 WORK 9
## 10 ID5 male 2 WORK 10
## # ... with 20 more rows