pivot_wider and pivot_longer as alternative to reshape base R function

library(tidyr)
library(dplyr)
library(stringr)

Goal

The goal of this note is to rewrite all examples in Rpubs.com/pep1024 to transform (reshape) a data frame between the long and wide formats using pivot_* functions in tidyr package instead of the reshape base R function .

Functions pivot_wideand pivot_long are the latest functions created by Hadley Wickham to replace the original reshape function in base R. An essay of cronology of the alternatives is the following:

reshape package first published in 2005 with functions melt and cast CRAN
reshape2package as a reboot of the reshape package. First published 2010 CRAN
tidyr package published in 2014 with functions gather and spread CRAN
tidyr new functions pivot_longer and pivot_wider. First published in 2019 news

Wider formats

Examples in ref of reshape function

`JohnsonJohnson` dataset

Let as take the Johnson & Johnson dataset, a data frame in the long format:

data("JohnsonJohnson")
JJ <- data.frame(year = rep(1960:1980, each = 4),
  quarter = paste0("Qtr", rep(1:4, 21)), 
  earnings = as.vector(JohnsonJohnson))
tibble(JJ)

## # A tibble: 84 x 3
##     year quarter earnings
##    <int> <fct>      <dbl>
##  1  1960 Qtr1        0.71
##  2  1960 Qtr2        0.63
##  3  1960 Qtr3        0.85
##  4  1960 Qtr4        0.44
##  5  1961 Qtr1        0.61
##  6  1961 Qtr2        0.69
##  7  1961 Qtr3        0.92
##  8  1961 Qtr4        0.55
##  9  1962 Qtr1        0.72
## 10  1962 Qtr2        0.77
## # ... with 74 more rows

reshape(data = JJ, direction = "wide",
  timevar = "year",
  idvar = "quarter",
  v.names = "earnings")

##   quarter earnings.1960 earnings.1961 earnings.1962 earnings.1963
## 1    Qtr1          0.71          0.61          0.72          0.83
## 2    Qtr2          0.63          0.69          0.77          0.80
## 3    Qtr3          0.85          0.92          0.92          1.00
## 4    Qtr4          0.44          0.55          0.60          0.77
##   earnings.1964 earnings.1965 earnings.1966 earnings.1967 earnings.1968
## 1          0.92          1.16          1.26          1.53          1.53
## 2          1.00          1.30          1.38          1.59          2.07
## 3          1.24          1.45          1.86          1.83          2.34
## 4          1.00          1.25          1.56          1.86          2.25
##   earnings.1969 earnings.1970 earnings.1971 earnings.1972 earnings.1973
## 1          2.16          2.79          3.60          4.86          5.58
## 2          2.43          3.42          4.32          5.04          5.85
## 3          2.70          3.69          4.32          5.04          6.57
## 4          2.25          3.60          4.05          4.41          5.31
##   earnings.1974 earnings.1975 earnings.1976 earnings.1977 earnings.1978
## 1          6.03          6.93          7.74          9.54         11.88
## 2          6.39          7.74          8.91         10.26         12.06
## 3          6.93          7.83          8.28          9.54         12.15
## 4          5.85          6.12          6.84          8.73          8.91
##   earnings.1979 earnings.1980
## 1         14.04         16.20
## 2         12.96         14.67
## 3         14.85         16.02
## 4          9.99         11.61

How can we get the same result using pivot_wide?

JJ %>% pivot_wider(names_from = year, values_from = earnings, names_prefix = "earnings.")

## # A tibble: 4 x 22
##   quarter earnings.1960 earnings.1961 earnings.1962 earnings.1963
##   <fct>           <dbl>         <dbl>         <dbl>         <dbl>
## 1 Qtr1             0.71          0.61          0.72          0.83
## 2 Qtr2             0.63          0.69          0.77          0.8 
## 3 Qtr3             0.85          0.92          0.92          1   
## 4 Qtr4             0.44          0.55          0.6           0.77
## # ... with 17 more variables: earnings.1964 <dbl>, earnings.1965 <dbl>,
## #   earnings.1966 <dbl>, earnings.1967 <dbl>, earnings.1968 <dbl>,
## #   earnings.1969 <dbl>, earnings.1970 <dbl>, earnings.1971 <dbl>,
## #   earnings.1972 <dbl>, earnings.1973 <dbl>, earnings.1974 <dbl>,
## #   earnings.1975 <dbl>, earnings.1976 <dbl>, earnings.1977 <dbl>,
## #   earnings.1978 <dbl>, earnings.1979 <dbl>, earnings.1980 <dbl>

A different wider table generated from JJ

wide_1 <- reshape(data = JJ, direction = "wide", 
  timevar = "quarter",
  idvar = "year", 
  sep = "_")
tibble(wide_1)

## # A tibble: 21 x 5
##     year earnings_Qtr1 earnings_Qtr2 earnings_Qtr3 earnings_Qtr4
##    <int>         <dbl>         <dbl>         <dbl>         <dbl>
##  1  1960          0.71          0.63          0.85          0.44
##  2  1961          0.61          0.69          0.92          0.55
##  3  1962          0.72          0.77          0.92          0.6 
##  4  1963          0.83          0.8           1             0.77
##  5  1964          0.92          1             1.24          1   
##  6  1965          1.16          1.3           1.45          1.25
##  7  1966          1.26          1.38          1.86          1.56
##  8  1967          1.53          1.59          1.83          1.86
##  9  1968          1.53          2.07          2.34          2.25
## 10  1969          2.16          2.43          2.7           2.25
## # ... with 11 more rows

JJ %>% pivot_wider(names_from = quarter, values_from = earnings, names_prefix = "earnings.")

## # A tibble: 21 x 5
##     year earnings.Qtr1 earnings.Qtr2 earnings.Qtr3 earnings.Qtr4
##    <int>         <dbl>         <dbl>         <dbl>         <dbl>
##  1  1960          0.71          0.63          0.85          0.44
##  2  1961          0.61          0.69          0.92          0.55
##  3  1962          0.72          0.77          0.92          0.6 
##  4  1963          0.83          0.8           1             0.77
##  5  1964          0.92          1             1.24          1   
##  6  1965          1.16          1.3           1.45          1.25
##  7  1966          1.26          1.38          1.86          1.56
##  8  1967          1.53          1.59          1.83          1.86
##  9  1968          1.53          2.07          2.34          2.25
## 10  1969          2.16          2.43          2.7           2.25
## # ... with 11 more rows

wide_1$id <- rep(1, 21)

Now we can continue to move “year” timevar column to wide format

wide_2 <- reshape(data = wide_1, direction = "wide",
  timevar = "year",
  idvar = "id", 
  sep = ":")
tibble(wide_2[, 1:6])

## # A tibble: 1 x 6
##      id `earnings_Qtr1:~ `earnings_Qtr2:~ `earnings_Qtr3:~ `earnings_Qtr4:~
##   <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
## 1     1             0.71             0.63             0.85             0.44
## # ... with 1 more variable: `earnings_Qtr1:1961` <dbl>

We have just transposed the initial long data frame JJ. the same result can be achieved in one step using pivot_wider:

JJ %>% pivot_wider(names_from = c(quarter, year), values_from = earnings, names_prefix = "earnings_",
  names_sep = ":")

## # A tibble: 1 x 84
##   `earnings_Qtr1:~ `earnings_Qtr2:~ `earnings_Qtr3:~ `earnings_Qtr4:~
##              <dbl>            <dbl>            <dbl>            <dbl>
## 1             0.71             0.63             0.85             0.44
## # ... with 80 more variables: `earnings_Qtr1:1961` <dbl>,
## #   `earnings_Qtr2:1961` <dbl>, `earnings_Qtr3:1961` <dbl>,
## #   `earnings_Qtr4:1961` <dbl>, `earnings_Qtr1:1962` <dbl>,
## #   `earnings_Qtr2:1962` <dbl>, `earnings_Qtr3:1962` <dbl>,
## #   `earnings_Qtr4:1962` <dbl>, `earnings_Qtr1:1963` <dbl>,
## #   `earnings_Qtr2:1963` <dbl>, `earnings_Qtr3:1963` <dbl>,
## #   `earnings_Qtr4:1963` <dbl>, `earnings_Qtr1:1964` <dbl>,
## #   `earnings_Qtr2:1964` <dbl>, `earnings_Qtr3:1964` <dbl>,
## #   `earnings_Qtr4:1964` <dbl>, `earnings_Qtr1:1965` <dbl>,
## #   `earnings_Qtr2:1965` <dbl>, `earnings_Qtr3:1965` <dbl>,
## #   `earnings_Qtr4:1965` <dbl>, `earnings_Qtr1:1966` <dbl>,
## #   `earnings_Qtr2:1966` <dbl>, `earnings_Qtr3:1966` <dbl>,
## #   `earnings_Qtr4:1966` <dbl>, `earnings_Qtr1:1967` <dbl>,
## #   `earnings_Qtr2:1967` <dbl>, `earnings_Qtr3:1967` <dbl>,
## #   `earnings_Qtr4:1967` <dbl>, `earnings_Qtr1:1968` <dbl>,
## #   `earnings_Qtr2:1968` <dbl>, `earnings_Qtr3:1968` <dbl>,
## #   `earnings_Qtr4:1968` <dbl>, `earnings_Qtr1:1969` <dbl>,
## #   `earnings_Qtr2:1969` <dbl>, `earnings_Qtr3:1969` <dbl>,
## #   `earnings_Qtr4:1969` <dbl>, `earnings_Qtr1:1970` <dbl>,
## #   `earnings_Qtr2:1970` <dbl>, `earnings_Qtr3:1970` <dbl>,
## #   `earnings_Qtr4:1970` <dbl>, `earnings_Qtr1:1971` <dbl>,
## #   `earnings_Qtr2:1971` <dbl>, `earnings_Qtr3:1971` <dbl>,
## #   `earnings_Qtr4:1971` <dbl>, `earnings_Qtr1:1972` <dbl>,
## #   `earnings_Qtr2:1972` <dbl>, `earnings_Qtr3:1972` <dbl>,
## #   `earnings_Qtr4:1972` <dbl>, `earnings_Qtr1:1973` <dbl>,
## #   `earnings_Qtr2:1973` <dbl>, `earnings_Qtr3:1973` <dbl>,
## #   `earnings_Qtr4:1973` <dbl>, `earnings_Qtr1:1974` <dbl>,
## #   `earnings_Qtr2:1974` <dbl>, `earnings_Qtr3:1974` <dbl>,
## #   `earnings_Qtr4:1974` <dbl>, `earnings_Qtr1:1975` <dbl>,
## #   `earnings_Qtr2:1975` <dbl>, `earnings_Qtr3:1975` <dbl>,
## #   `earnings_Qtr4:1975` <dbl>, `earnings_Qtr1:1976` <dbl>,
## #   `earnings_Qtr2:1976` <dbl>, `earnings_Qtr3:1976` <dbl>,
## #   `earnings_Qtr4:1976` <dbl>, `earnings_Qtr1:1977` <dbl>,
## #   `earnings_Qtr2:1977` <dbl>, `earnings_Qtr3:1977` <dbl>,
## #   `earnings_Qtr4:1977` <dbl>, `earnings_Qtr1:1978` <dbl>,
## #   `earnings_Qtr2:1978` <dbl>, `earnings_Qtr3:1978` <dbl>,
## #   `earnings_Qtr4:1978` <dbl>, `earnings_Qtr1:1979` <dbl>,
## #   `earnings_Qtr2:1979` <dbl>, `earnings_Qtr3:1979` <dbl>,
## #   `earnings_Qtr4:1979` <dbl>, `earnings_Qtr1:1980` <dbl>,
## #   `earnings_Qtr2:1980` <dbl>, `earnings_Qtr3:1980` <dbl>,
## #   `earnings_Qtr4:1980` <dbl>

`HairEyeColor` dataset

data("HairEyeColor")
hec <- as.data.frame(HairEyeColor)
tibble(hec)

## # A tibble: 32 x 4
##    Hair  Eye   Sex    Freq
##    <fct> <fct> <fct> <dbl>
##  1 Black Brown Male     32
##  2 Brown Brown Male     53
##  3 Red   Brown Male     10
##  4 Blond Brown Male      3
##  5 Black Blue  Male     11
##  6 Brown Blue  Male     50
##  7 Red   Blue  Male     10
##  8 Blond Blue  Male     30
##  9 Black Hazel Male     10
## 10 Brown Hazel Male     25
## # ... with 22 more rows

hec_1 <- reshape(data = hec, direction = "wide", 
  timevar = "Sex",
  idvar = c("Hair", "Eye"),
  v.names = "Freq",
  sep = "_")
tibble(hec_1)

## # A tibble: 16 x 4
##    Hair  Eye   Freq_Male Freq_Female
##    <fct> <fct>     <dbl>       <dbl>
##  1 Black Brown        32          36
##  2 Brown Brown        53          66
##  3 Red   Brown        10          16
##  4 Blond Brown         3           4
##  5 Black Blue         11           9
##  6 Brown Blue         50          34
##  7 Red   Blue         10           7
##  8 Blond Blue         30          64
##  9 Black Hazel        10           5
## 10 Brown Hazel        25          29
## 11 Red   Hazel         7           7
## 12 Blond Hazel         5           5
## 13 Black Green         3           2
## 14 Brown Green        15          14
## 15 Red   Green         7           7
## 16 Blond Green         8           8

The equivalent form with pivot_wider will be

hec %>% pivot_wider(names_from = Sex, values_from = Freq, names_prefix = "Freq_")

## # A tibble: 16 x 4
##    Hair  Eye   Freq_Male Freq_Female
##    <fct> <fct>     <dbl>       <dbl>
##  1 Black Brown        32          36
##  2 Brown Brown        53          66
##  3 Red   Brown        10          16
##  4 Blond Brown         3           4
##  5 Black Blue         11           9
##  6 Brown Blue         50          34
##  7 Red   Blue         10           7
##  8 Blond Blue         30          64
##  9 Black Hazel        10           5
## 10 Brown Hazel        25          29
## 11 Red   Hazel         7           7
## 12 Blond Hazel         5           5
## 13 Black Green         3           2
## 14 Brown Green        15          14
## 15 Red   Green         7           7
## 16 Blond Green         8           8

hec_2 <-  reshape(data = hec_1, direction = "wide",
  timevar = "Eye",
  idvar = "Hair",
  sep = ":")
tibble(hec_2)

## # A tibble: 4 x 9
##   Hair  `Freq_Male:Brow~ `Freq_Female:Br~ `Freq_Male:Blue` `Freq_Female:Bl~
##   <fct>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 Black               32               36               11                9
## 2 Brown               53               66               50               34
## 3 Red                 10               16               10                7
## 4 Blond                3                4               30               64
## # ... with 4 more variables: `Freq_Male:Hazel` <dbl>,
## #   `Freq_Female:Hazel` <dbl>, `Freq_Male:Green` <dbl>,
## #   `Freq_Female:Green` <dbl>

Again, the same can be obtained with one step

hec %>% pivot_wider(names_from = c(Sex, Eye), values_from = Freq, names_prefix = "Freq_",
  names_sep = ":")

## # A tibble: 4 x 9
##   Hair  `Freq_Male:Brow~ `Freq_Male:Blue` `Freq_Male:Haze~ `Freq_Male:Gree~
##   <fct>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 Black               32               11               10                3
## 2 Brown               53               50               25               15
## 3 Red                 10               10                7                7
## 4 Blond                3               30                5                8
## # ... with 4 more variables: `Freq_Female:Brown` <dbl>,
## #   `Freq_Female:Blue` <dbl>, `Freq_Female:Hazel` <dbl>,
## #   `Freq_Female:Green` <dbl>

Longer formats

first example

w_df <- data.frame(
  city = c("London", "Barcelona", "Paris"),
  rivers = c(1, 2, 1),
  letters = c(6, 9, 5),
  stringsAsFactors = F
)
tibble(w_df)

## # A tibble: 3 x 3
##   city      rivers letters
##   <chr>      <dbl>   <dbl>
## 1 London         1       6
## 2 Barcelona      2       9
## 3 Paris          1       5

reshape(w_df, direction = "long",
  v.names = "value",
  varying = 2:3,
  times = names(w_df)[2:3],
  timevar = "measure",
  idvar = "city",
  new.row.names = 1:6)

##        city measure value
## 1    London  rivers     1
## 2 Barcelona  rivers     2
## 3     Paris  rivers     1
## 4    London letters     6
## 5 Barcelona letters     9
## 6     Paris letters     5

Probably w_df could be better defined as a tibble

w_tibble <- tribble(
  ~city, ~rivers, ~letters,
  "London", 1, 6,
  "Barcelona", 2,   9,
  "Paris", 1,   5  
)

w_tibble %>% pivot_longer(-city, names_to = "measure", values_to = "value") %>% 
  arrange(desc(measure))

## # A tibble: 6 x 3
##   city      measure value
##   <chr>     <chr>   <dbl>
## 1 London    rivers      1
## 2 Barcelona rivers      2
## 3 Paris     rivers      1
## 4 London    letters     6
## 5 Barcelona letters     9
## 6 Paris     letters     5

`Iris` dataset

We start with the iris well known dataframe as used in the post R-Bloggers

data("iris")
as_tibble(iris)

## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows

long_iris <- reshape(data = iris, direction = "long"
  , timevar = "dimension"
  , times = names(iris)[1:4] 
  , idvar = "flower_ID"
  , ids = paste0("f", 1:150)
  , v.names = "Measurements"
  , varying = 1:4
  , new.row.names = 1:600
  )
tibble(long_iris)

## # A tibble: 600 x 4
##    Species dimension    Measurements flower_ID
##    <fct>   <chr>               <dbl> <chr>    
##  1 setosa  Sepal.Length          5.1 f1       
##  2 setosa  Sepal.Length          4.9 f2       
##  3 setosa  Sepal.Length          4.7 f3       
##  4 setosa  Sepal.Length          4.6 f4       
##  5 setosa  Sepal.Length          5   f5       
##  6 setosa  Sepal.Length          5.4 f6       
##  7 setosa  Sepal.Length          4.6 f7       
##  8 setosa  Sepal.Length          5   f8       
##  9 setosa  Sepal.Length          4.4 f9       
## 10 setosa  Sepal.Length          4.9 f10      
## # ... with 590 more rows

The equivalent with pivot_longer requires first to create flower_ID column

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements")

## # A tibble: 600 x 4
##    Species flower_ID dimension    Measurements
##    <fct>   <chr>     <chr>               <dbl>
##  1 setosa  f1        Sepal.Length          5.1
##  2 setosa  f1        Sepal.Width           3.5
##  3 setosa  f1        Petal.Length          1.4
##  4 setosa  f1        Petal.Width           0.2
##  5 setosa  f2        Sepal.Length          4.9
##  6 setosa  f2        Sepal.Width           3  
##  7 setosa  f2        Petal.Length          1.4
##  8 setosa  f2        Petal.Width           0.2
##  9 setosa  f3        Sepal.Length          4.7
## 10 setosa  f3        Sepal.Width           3.2
## # ... with 590 more rows

To obtain precisely the same order as with reshape, we need to reorder columns and arrage rows

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements") %>%
  select(1, 3, 4, 2) %>% arrange(factor(dimension, levels = names(iris)[1:4]))

## # A tibble: 600 x 4
##    Species dimension    Measurements flower_ID
##    <fct>   <chr>               <dbl> <chr>    
##  1 setosa  Sepal.Length          5.1 f1       
##  2 setosa  Sepal.Length          4.9 f2       
##  3 setosa  Sepal.Length          4.7 f3       
##  4 setosa  Sepal.Length          4.6 f4       
##  5 setosa  Sepal.Length          5   f5       
##  6 setosa  Sepal.Length          5.4 f6       
##  7 setosa  Sepal.Length          4.6 f7       
##  8 setosa  Sepal.Length          5   f8       
##  9 setosa  Sepal.Length          4.4 f9       
## 10 setosa  Sepal.Length          4.9 f10      
## # ... with 590 more rows

str(long_iris)

## 'data.frame':    600 obs. of  4 variables:
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ dimension   : chr  "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Length" ...
##  $ Measurements: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ flower_ID   : chr  "f1" "f2" "f3" "f4" ...
##  - attr(*, "reshapeLong")=List of 4
##   ..$ varying:List of 1
##   .. ..$ Measurements: chr  "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
##   .. ..- attr(*, "v.names")= chr "Measurements"
##   .. ..- attr(*, "times")= chr  "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
##   ..$ v.names: chr "Measurements"
##   ..$ idvar  : chr "flower_ID"
##   ..$ timevar: chr "dimension"

long_iris_2 <- reshape(data = iris, direction = "long",
  timevar = "measure", 
  times = c("Length", "Width"), 
  idvar = "flower_ID",
  ids = paste0("f", 1:150),
  v.names = c("Sepal", "Petal"),
  varying = 1:4,
  new.row.names = 1:300)
tibble(long_iris_2)

## # A tibble: 300 x 5
##    Species measure Sepal Petal flower_ID
##    <fct>   <chr>   <dbl> <dbl> <chr>    
##  1 setosa  Length    3.5   5.1 f1       
##  2 setosa  Length    3     4.9 f2       
##  3 setosa  Length    3.2   4.7 f3       
##  4 setosa  Length    3.1   4.6 f4       
##  5 setosa  Length    3.6   5   f5       
##  6 setosa  Length    3.9   5.4 f6       
##  7 setosa  Length    3.4   4.6 f7       
##  8 setosa  Length    3.4   5   f8       
##  9 setosa  Length    2.9   4.4 f9       
## 10 setosa  Length    3.1   4.9 f10      
## # ... with 290 more rows

In this second example, we are puting in longer format columns 1:2 for Sepal and columns 3:4 for Petal. There are several ways to acheive it. May be the clearest is to use the longest expression found before, split column dimension into measure (Length, Width) and part (Sepal, Petal) and then recreate the wider format we may want

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements") %>%
  mutate(part = str_split_fixed(dimension, "\\.", 2)[, 1], 
    measure = str_split_fixed(dimension, "\\.", 2)[, 2]) %>% select(-dimension)

## # A tibble: 600 x 5
##    Species flower_ID Measurements part  measure
##    <fct>   <chr>            <dbl> <chr> <chr>  
##  1 setosa  f1                 5.1 Sepal Length 
##  2 setosa  f1                 3.5 Sepal Width  
##  3 setosa  f1                 1.4 Petal Length 
##  4 setosa  f1                 0.2 Petal Width  
##  5 setosa  f2                 4.9 Sepal Length 
##  6 setosa  f2                 3   Sepal Width  
##  7 setosa  f2                 1.4 Petal Length 
##  8 setosa  f2                 0.2 Petal Width  
##  9 setosa  f3                 4.7 Sepal Length 
## 10 setosa  f3                 3.2 Sepal Width  
## # ... with 590 more rows

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = "dimension", values_to = "Measurements") %>%
  mutate(part = str_split_fixed(dimension, "\\.", 2)[, 1], 
    measure = str_split_fixed(dimension, "\\.", 2)[, 2]) %>% select(-dimension) %>%
  pivot_wider(names_from = part, values_from = Measurements)

## # A tibble: 300 x 5
##    Species flower_ID measure Sepal Petal
##    <fct>   <chr>     <chr>   <dbl> <dbl>
##  1 setosa  f1        Length    5.1   1.4
##  2 setosa  f1        Width     3.5   0.2
##  3 setosa  f2        Length    4.9   1.4
##  4 setosa  f2        Width     3     0.2
##  5 setosa  f3        Length    4.7   1.3
##  6 setosa  f3        Width     3.2   0.2
##  7 setosa  f4        Length    4.6   1.5
##  8 setosa  f4        Width     3.1   0.2
##  9 setosa  f5        Length    5     1.4
## 10 setosa  f5        Width     3.6   0.2
## # ... with 290 more rows

Can we obtain long_iris_2 using only pivot_long?

names_to = c(“.value”, “child”), names_sep = “_“, values_drop_na = TRUE

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = c(".value","measure"), names_sep = "\\.")

## # A tibble: 300 x 5
##    Species flower_ID measure Sepal Petal
##    <fct>   <chr>     <chr>   <dbl> <dbl>
##  1 setosa  f1        Length    5.1   1.4
##  2 setosa  f1        Width     3.5   0.2
##  3 setosa  f2        Length    4.9   1.4
##  4 setosa  f2        Width     3     0.2
##  5 setosa  f3        Length    4.7   1.3
##  6 setosa  f3        Width     3.2   0.2
##  7 setosa  f4        Length    4.6   1.5
##  8 setosa  f4        Width     3.1   0.2
##  9 setosa  f5        Length    5     1.4
## 10 setosa  f5        Width     3.6   0.2
## # ... with 290 more rows

long_iris_3 <- reshape(data = long_iris_2, direction = "long",
  timevar = "part", 
  times = c("Sepal", "Petal"),
  idvar = "flower_ID",
  v.names = "values",
  varying = 3:4,
  new.row.names = 1:600)
tibble(long_iris_3)

## # A tibble: 600 x 5
##    Species measure flower_ID part  values
##    <fct>   <chr>   <chr>     <chr>  <dbl>
##  1 setosa  Length  f1        Sepal    3.5
##  2 setosa  Length  f2        Sepal    3  
##  3 setosa  Length  f3        Sepal    3.2
##  4 setosa  Length  f4        Sepal    3.1
##  5 setosa  Length  f5        Sepal    3.6
##  6 setosa  Length  f6        Sepal    3.9
##  7 setosa  Length  f7        Sepal    3.4
##  8 setosa  Length  f8        Sepal    3.4
##  9 setosa  Length  f9        Sepal    2.9
## 10 setosa  Length  f10       Sepal    3.1
## # ... with 590 more rows

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = c("part","measure"), names_sep = "\\.")

## # A tibble: 600 x 5
##    Species flower_ID part  measure value
##    <fct>   <chr>     <chr> <chr>   <dbl>
##  1 setosa  f1        Sepal Length    5.1
##  2 setosa  f1        Sepal Width     3.5
##  3 setosa  f1        Petal Length    1.4
##  4 setosa  f1        Petal Width     0.2
##  5 setosa  f2        Sepal Length    4.9
##  6 setosa  f2        Sepal Width     3  
##  7 setosa  f2        Petal Length    1.4
##  8 setosa  f2        Petal Width     0.2
##  9 setosa  f3        Sepal Length    4.7
## 10 setosa  f3        Sepal Width     3.2
## # ... with 590 more rows

finally, we can keep the size measures in the wide format

long_iris_4 <- reshape(data = iris, direction = "long",
  timevar = "part", 
  times = c("Sepal", "Petal"), 
  idvar = "flower_ID",
  ids = paste0("f", 1:150),
  v.names = c("Length", "Width"),
  #varying = 1:4,
  varying = list(c(1, 3), c(2, 4)),
  new.row.names = 1:300)
tibble(long_iris_4)

## # A tibble: 300 x 5
##    Species part  Length Width flower_ID
##    <fct>   <chr>  <dbl> <dbl> <chr>    
##  1 setosa  Sepal    5.1   3.5 f1       
##  2 setosa  Sepal    4.9   3   f2       
##  3 setosa  Sepal    4.7   3.2 f3       
##  4 setosa  Sepal    4.6   3.1 f4       
##  5 setosa  Sepal    5     3.6 f5       
##  6 setosa  Sepal    5.4   3.9 f6       
##  7 setosa  Sepal    4.6   3.4 f7       
##  8 setosa  Sepal    5     3.4 f8       
##  9 setosa  Sepal    4.4   2.9 f9       
## 10 setosa  Sepal    4.9   3.1 f10      
## # ... with 290 more rows

iris %>% mutate(flower_ID = str_c("f", 1:150)) %>%
  pivot_longer(-c(Species, flower_ID), names_to = c("part", ".value"), names_sep = "\\.") %>%
  select(1, 3, 4, 5, 2) %>% arrange(desc(part))

## # A tibble: 300 x 5
##    Species part  Length Width flower_ID
##    <fct>   <chr>  <dbl> <dbl> <chr>    
##  1 setosa  Sepal    5.1   3.5 f1       
##  2 setosa  Sepal    4.9   3   f2       
##  3 setosa  Sepal    4.7   3.2 f3       
##  4 setosa  Sepal    4.6   3.1 f4       
##  5 setosa  Sepal    5     3.6 f5       
##  6 setosa  Sepal    5.4   3.9 f6       
##  7 setosa  Sepal    4.6   3.4 f7       
##  8 setosa  Sepal    5     3.4 f8       
##  9 setosa  Sepal    4.4   2.9 f9       
## 10 setosa  Sepal    4.9   3.1 f10      
## # ... with 290 more rows

`reshape` help examples

The three following examples can be found in the reshape function help link

`Indometh` dataset

Let us start with the Indometh dataset.

head(Indometh)

##   Subject time conc
## 1       1 0.25 1.50
## 2       1 0.50 0.94
## 3       1 0.75 0.78
## 4       1 1.00 0.48
## 5       1 1.25 0.37
## 6       1 2.00 0.19

If column time is considered as timevar and Subject as idvar, then

indo_wide <- reshape(data = Indometh, direction = "wide",
  timevar = "time",
  idvar = "Subject")
tibble(indo_wide)

## # A tibble: 6 x 12
##   Subject conc.0.25 conc.0.5 conc.0.75 conc.1 conc.1.25 conc.2 conc.3
##   <ord>       <dbl>    <dbl>     <dbl>  <dbl>     <dbl>  <dbl>  <dbl>
## 1 1            1.5      0.94      0.78   0.48      0.37   0.19   0.12
## 2 2            2.03     1.63      0.71   0.7       0.64   0.36   0.32
## 3 3            2.72     1.49      1.16   0.8       0.8    0.39   0.22
## 4 4            1.85     1.39      1.02   0.89      0.59   0.4    0.16
## 5 5            2.05     1.04      0.81   0.39      0.3    0.23   0.13
## 6 6            2.31     1.44      1.03   0.84      0.64   0.42   0.24
## # ... with 4 more variables: conc.4 <dbl>, conc.5 <dbl>, conc.6 <dbl>,
## #   conc.8 <dbl>

Indometh %>% 
  pivot_wider(names_from = "time", values_from = "conc", names_prefix = "conc.")

## # A tibble: 6 x 12
##   Subject conc.0.25 conc.0.5 conc.0.75 conc.1 conc.1.25 conc.2 conc.3
##   <ord>       <dbl>    <dbl>     <dbl>  <dbl>     <dbl>  <dbl>  <dbl>
## 1 1            1.5      0.94      0.78   0.48      0.37   0.19   0.12
## 2 2            2.03     1.63      0.71   0.7       0.64   0.36   0.32
## 3 3            2.72     1.49      1.16   0.8       0.8    0.39   0.22
## 4 4            1.85     1.39      1.02   0.89      0.59   0.4    0.16
## 5 5            2.05     1.04      0.81   0.39      0.3    0.23   0.13
## 6 6            2.31     1.44      1.03   0.84      0.64   0.42   0.24
## # ... with 4 more variables: conc.4 <dbl>, conc.5 <dbl>, conc.6 <dbl>,
## #   conc.8 <dbl>

If we do the other way around

indo_wide_2 <- reshape(data = Indometh, direction = "wide",
  timevar = "Subject",
  idvar = "time"
  )
tibble(indo_wide_2)

## # A tibble: 11 x 7
##     time conc.1 conc.2 conc.3 conc.4 conc.5 conc.6
##    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1  0.25   1.5    2.03   2.72   1.85   2.05   2.31
##  2  0.5    0.94   1.63   1.49   1.39   1.04   1.44
##  3  0.75   0.78   0.71   1.16   1.02   0.81   1.03
##  4  1      0.48   0.7    0.8    0.89   0.39   0.84
##  5  1.25   0.37   0.64   0.8    0.59   0.3    0.64
##  6  2      0.19   0.36   0.39   0.4    0.23   0.42
##  7  3      0.12   0.32   0.22   0.16   0.13   0.24
##  8  4      0.11   0.2    0.12   0.11   0.11   0.17
##  9  5      0.08   0.25   0.11   0.1    0.08   0.13
## 10  6      0.07   0.12   0.08   0.07   0.1    0.1 
## 11  8      0.05   0.08   0.08   0.07   0.06   0.09

Indometh %>% 
  pivot_wider(names_from = "Subject", values_from = "conc", names_prefix = "conc.")

## # A tibble: 11 x 7
##     time conc.1 conc.2 conc.3 conc.4 conc.5 conc.6
##    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1  0.25   1.5    2.03   2.72   1.85   2.05   2.31
##  2  0.5    0.94   1.63   1.49   1.39   1.04   1.44
##  3  0.75   0.78   0.71   1.16   1.02   0.81   1.03
##  4  1      0.48   0.7    0.8    0.89   0.39   0.84
##  5  1.25   0.37   0.64   0.8    0.59   0.3    0.64
##  6  2      0.19   0.36   0.39   0.4    0.23   0.42
##  7  3      0.12   0.32   0.22   0.16   0.13   0.24
##  8  4      0.11   0.2    0.12   0.11   0.11   0.17
##  9  5      0.08   0.25   0.11   0.1    0.08   0.13
## 10  6      0.07   0.12   0.08   0.07   0.1    0.1 
## 11  8      0.05   0.08   0.08   0.07   0.06   0.09

visit dataframe

df <- data.frame(id = rep(1:4, rep(2,4)),
                 visit = I(rep(c("Before","After"), 4)),
                 x = rnorm(4), y = runif(4))
tibble(df)

## # A tibble: 8 x 4
##      id visit          x      y
##   <int> <I<chr>>   <dbl>  <dbl>
## 1     1 Before    1.30   0.0452
## 2     1 After    -0.843  0.953 
## 3     2 Before   -0.0905 0.722 
## 4     2 After    -1.70   0.0532
## 5     3 Before    1.30   0.0452
## 6     3 After    -0.843  0.953 
## 7     4 Before   -0.0905 0.722 
## 8     4 After    -1.70   0.0532

df_1 <- reshape(data = df
  , direction = "wide"
  , idvar = "visit"
  , timevar = "id")
tibble(df_1)

## # A tibble: 2 x 9
##   visit       x.1    y.1     x.2    y.2    x.3    y.3     x.4    y.4
##   <I<chr>>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
## 1 Before    1.30  0.0452 -0.0905 0.722   1.30  0.0452 -0.0905 0.722 
## 2 After    -0.843 0.953  -1.70   0.0532 -0.843 0.953  -1.70   0.0532

df %>% pivot_wider(names_from = "id", values_from = c("x", "y"), names_sep = ".")

## # A tibble: 2 x 9
##   visit       x.1     x.2    x.3     x.4    y.1    y.2    y.3    y.4
##   <I<chr>>  <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Before    1.30  -0.0905  1.30  -0.0905 0.0452 0.722  0.0452 0.722 
## 2 After    -0.843 -1.70   -0.843 -1.70   0.953  0.0532 0.953  0.0532

We could also expand the visit in columns

df_2 <- reshape(data = df
  , direction = "wide"
  , idvar = "id"
  , timevar = "visit")
tibble(df_2)

## # A tibble: 4 x 5
##      id x.Before y.Before x.After y.After
##   <int>    <dbl>    <dbl>   <dbl>   <dbl>
## 1     1   1.30     0.0452  -0.843  0.953 
## 2     2  -0.0905   0.722   -1.70   0.0532
## 3     3   1.30     0.0452  -0.843  0.953 
## 4     4  -0.0905   0.722   -1.70   0.0532

df %>% pivot_wider(names_from = "visit", values_from = c("x", "y"), names_sep = ".")

## # A tibble: 4 x 5
##      id x.Before x.After y.Before y.After
##   <int>    <dbl>   <dbl>    <dbl>   <dbl>
## 1     1   1.30    -0.843   0.0452  0.953 
## 2     2  -0.0905  -1.70    0.722   0.0532
## 3     3   1.30    -0.843   0.0452  0.953 
## 4     4  -0.0905  -1.70    0.722   0.0532

state.x77

Example using state.x77, matrix with 8 columns with info about the 50 states of USA

str(state.x77)

##  num [1:50, 1:8] 3615 365 2212 2110 21198 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##   ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...

df.x77 <- as.data.frame(state.x77)

str(df.x77)

## 'data.frame':    50 obs. of  8 variables:
##  $ Population: num  3615 365 2212 2110 21198 ...
##  $ Income    : num  3624 6315 4530 3378 5114 ...
##  $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
##  $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
##  $ Murder    : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
##  $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
##  $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
##  $ Area      : num  50708 566432 113417 51945 156361 ...

To transform it into a Long format data frame

long.x77 <- reshape(data = df.x77 , direction = "long"
  , idvar ="state", ids = row.names(state.x77)
  , timevar = "variable", times = names(df.x77) 
  , v.names = "value", varying = 1:8 #varying = list(names(df.x77))
  , new.row.names = 1:400)
as_tibble(long.x77)

## # A tibble: 400 x 3
##    variable   value state      
##    <chr>      <dbl> <chr>      
##  1 Population  3615 Alabama    
##  2 Population   365 Alaska     
##  3 Population  2212 Arizona    
##  4 Population  2110 Arkansas   
##  5 Population 21198 California 
##  6 Population  2541 Colorado   
##  7 Population  3100 Connecticut
##  8 Population   579 Delaware   
##  9 Population  8277 Florida    
## 10 Population  4931 Georgia    
## # ... with 390 more rows

df.x77 %>% mutate(state = row.names(df.x77)) %>%
  pivot_longer(-state, names_to = "variable", values_to = "value")

## # A tibble: 400 x 3
##    state   variable     value
##    <chr>   <chr>        <dbl>
##  1 Alabama Population  3615  
##  2 Alabama Income      3624  
##  3 Alabama Illiteracy     2.1
##  4 Alabama Life Exp      69.0
##  5 Alabama Murder        15.1
##  6 Alabama HS Grad       41.3
##  7 Alabama Frost         20  
##  8 Alabama Area       50708  
##  9 Alaska  Population   365  
## 10 Alaska  Income      6315  
## # ... with 390 more rows

df.x77 %>% mutate(state = row.names(df.x77)) %>%
  pivot_longer(-state, names_to = "variable", values_to = "value") %>%
  # in case we want exactly the same order
  select(2, 3, 1) %>%
  arrange(factor(variable, levels = names(df.x77)), state)

## # A tibble: 400 x 3
##    variable   value state      
##    <chr>      <dbl> <chr>      
##  1 Population  3615 Alabama    
##  2 Population   365 Alaska     
##  3 Population  2212 Arizona    
##  4 Population  2110 Arkansas   
##  5 Population 21198 California 
##  6 Population  2541 Colorado   
##  7 Population  3100 Connecticut
##  8 Population   579 Delaware   
##  9 Population  8277 Florida    
## 10 Population  4931 Georgia    
## # ... with 390 more rows

Turn into a new wide format. First we do it automatically

tibble(reshape(long.x77, direction = "wide"))

## # A tibble: 50 x 9
##    state Population Income Illiteracy `Life Exp` Murder `HS Grad` Frost
##    <chr>      <dbl>  <dbl>      <dbl>      <dbl>  <dbl>     <dbl> <dbl>
##  1 Alab~       3615   3624        2.1       69.0   15.1      41.3    20
##  2 Alas~        365   6315        1.5       69.3   11.3      66.7   152
##  3 Ariz~       2212   4530        1.8       70.6    7.8      58.1    15
##  4 Arka~       2110   3378        1.9       70.7   10.1      39.9    65
##  5 Cali~      21198   5114        1.1       71.7   10.3      62.6    20
##  6 Colo~       2541   4884        0.7       72.1    6.8      63.9   166
##  7 Conn~       3100   5348        1.1       72.5    3.1      56     139
##  8 Dela~        579   4809        0.9       70.1    6.2      54.6   103
##  9 Flor~       8277   4815        1.3       70.7   10.7      52.6    11
## 10 Geor~       4931   4091        2         68.5   13.9      40.6    60
## # ... with 40 more rows, and 1 more variable: Area <dbl>

Secondly, we can force the wide format as teh long one was the original shape

tibble(
  reshape(data = long.x77, direction = "wide"
  , idvar = "state"
  , timevar = "variable"
  , sep = "_")
)

## # A tibble: 50 x 9
##    state value_Population value_Income value_Illiteracy `value_Life Exp`
##    <chr>            <dbl>        <dbl>            <dbl>            <dbl>
##  1 Alab~             3615         3624              2.1             69.0
##  2 Alas~              365         6315              1.5             69.3
##  3 Ariz~             2212         4530              1.8             70.6
##  4 Arka~             2110         3378              1.9             70.7
##  5 Cali~            21198         5114              1.1             71.7
##  6 Colo~             2541         4884              0.7             72.1
##  7 Conn~             3100         5348              1.1             72.5
##  8 Dela~              579         4809              0.9             70.1
##  9 Flor~             8277         4815              1.3             70.7
## 10 Geor~             4931         4091              2               68.5
## # ... with 40 more rows, and 4 more variables: value_Murder <dbl>,
## #   `value_HS Grad` <dbl>, value_Frost <dbl>, value_Area <dbl>

We can also do the equivalent of a transposition by goin first to long format (long.x77) and later to wide (wide.x77)

wide.x77 <- reshape(data = long.x77
  , direction = "wide"
  , idvar = "variable"
  , timevar = "state"
  , sep = "_")
tibble(wide.x77)

## # A tibble: 8 x 51
##   variable value_Alabama value_Alaska value_Arizona value_Arkansas
##   <chr>            <dbl>        <dbl>         <dbl>          <dbl>
## 1 Populat~        3615          365          2212           2110  
## 2 Income          3624         6315          4530           3378  
## 3 Illiter~           2.1          1.5           1.8            1.9
## 4 Life Exp          69.0         69.3          70.6           70.7
## 5 Murder            15.1         11.3           7.8           10.1
## 6 HS Grad           41.3         66.7          58.1           39.9
## 7 Frost             20          152            15             65  
## 8 Area           50708       566432        113417          51945  
## # ... with 46 more variables: value_California <dbl>,
## #   value_Colorado <dbl>, value_Connecticut <dbl>, value_Delaware <dbl>,
## #   value_Florida <dbl>, value_Georgia <dbl>, value_Hawaii <dbl>,
## #   value_Idaho <dbl>, value_Illinois <dbl>, value_Indiana <dbl>,
## #   value_Iowa <dbl>, value_Kansas <dbl>, value_Kentucky <dbl>,
## #   value_Louisiana <dbl>, value_Maine <dbl>, value_Maryland <dbl>,
## #   value_Massachusetts <dbl>, value_Michigan <dbl>,
## #   value_Minnesota <dbl>, value_Mississippi <dbl>, value_Missouri <dbl>,
## #   value_Montana <dbl>, value_Nebraska <dbl>, value_Nevada <dbl>,
## #   `value_New Hampshire` <dbl>, `value_New Jersey` <dbl>, `value_New
## #   Mexico` <dbl>, `value_New York` <dbl>, `value_North Carolina` <dbl>,
## #   `value_North Dakota` <dbl>, value_Ohio <dbl>, value_Oklahoma <dbl>,
## #   value_Oregon <dbl>, value_Pennsylvania <dbl>, `value_Rhode
## #   Island` <dbl>, `value_South Carolina` <dbl>, `value_South
## #   Dakota` <dbl>, value_Tennessee <dbl>, value_Texas <dbl>,
## #   value_Utah <dbl>, value_Vermont <dbl>, value_Virginia <dbl>,
## #   value_Washington <dbl>, `value_West Virginia` <dbl>,
## #   value_Wisconsin <dbl>, value_Wyoming <dbl>

Example 1

Another example, from link

df3 <- data.frame(id = 1:4, age = c(40,50,60,50), dose1 = c(1,2,1,2),
                  dose2 = c(2,1,2,1), dose4 = c(3,3,3,3))
tibble(df3)

## # A tibble: 4 x 5
##      id   age dose1 dose2 dose4
##   <int> <dbl> <dbl> <dbl> <dbl>
## 1     1    40     1     2     3
## 2     2    50     2     1     3
## 3     3    60     1     2     3
## 4     4    50     2     1     3

reshape(data = df3, direction = "long"
  , idvar = "id"
  , timevar = "dose_type"
  , times = c(1, 2, 4)
  , v.names = "dose"
  , varying = 3:5
  , new.row.names = 1:12
  , sep = "")

##    id age dose_type dose
## 1   1  40         1    1
## 2   2  50         1    2
## 3   3  60         1    1
## 4   4  50         1    2
## 5   1  40         2    2
## 6   2  50         2    1
## 7   3  60         2    2
## 8   4  50         2    1
## 9   1  40         4    3
## 10  2  50         4    3
## 11  3  60         4    3
## 12  4  50         4    3

df3 %>% pivot_longer(-c(id, age), names_to = "dose_type", values_to = "dose",
  names_pattern = "dose(.)", names_ptypes = list(dose_type = numeric()))

## # A tibble: 12 x 4
##       id   age dose_type  dose
##    <int> <dbl>     <dbl> <dbl>
##  1     1    40         1     1
##  2     1    40         2     2
##  3     1    40         4     3
##  4     2    50         1     2
##  5     2    50         2     1
##  6     2    50         4     3
##  7     3    60         1     1
##  8     3    60         2     2
##  9     3    60         4     3
## 10     4    50         1     2
## 11     4    50         2     1
## 12     4    50         4     3

reshape(data = df3, direction = "long"
  , varying = 3:5
  , sep = "")

##     id age time dose
## 1.1  1  40    1    1
## 2.1  2  50    1    2
## 3.1  3  60    1    1
## 4.1  4  50    1    2
## 1.2  1  40    2    2
## 2.2  2  50    2    1
## 3.2  3  60    2    2
## 4.2  4  50    2    1
## 1.4  1  40    4    3
## 2.4  2  50    4    3
## 3.4  3  60    4    3
## 4.4  4  50    4    3

Example 2

One more example, from TRinker’s R Blog

set.seed(10)
dat <- data.frame(id=paste0("ID",1:5), 
    sex=sample(c("male", "female"), 5, replace=TRUE), 
    matrix(rpois(30, 10), 5, 6))
colnames(dat)[-c(1:2)] <- paste0(rep(1:2, times=3), 
    rep(c("work", "home", "church"), 2))
tibble(dat)

## # A tibble: 5 x 8
##   id    sex    `1work` `2home` `1church` `2work` `1home` `2church`
##   <fct> <fct>    <int>   <int>     <int>   <int>   <int>     <int>
## 1 ID1   female       7       8         7      10       6        10
## 2 ID2   male        10      13        10       7      13        15
## 3 ID3   male        11      10         6      10      10         7
## 4 ID4   female       6       8        12       9      15         7
## 5 ID5   male         9      11        15      10      10        12

reshape(dat, direction="long"
  , idvar = "id"
  , timevar = "part_area"
  , times = names(dat)[3:8]
  , v.names = "# pieces"
  , varying = 3:8
  , new.row.names = 1:30)

##     id    sex part_area # pieces
## 1  ID1 female     1work        7
## 2  ID2   male     1work       10
## 3  ID3   male     1work       11
## 4  ID4 female     1work        6
## 5  ID5   male     1work        9
## 6  ID1 female     2home        8
## 7  ID2   male     2home       13
## 8  ID3   male     2home       10
## 9  ID4 female     2home        8
## 10 ID5   male     2home       11
## 11 ID1 female   1church        7
## 12 ID2   male   1church       10
## 13 ID3   male   1church        6
## 14 ID4 female   1church       12
## 15 ID5   male   1church       15
## 16 ID1 female     2work       10
## 17 ID2   male     2work        7
## 18 ID3   male     2work       10
## 19 ID4 female     2work        9
## 20 ID5   male     2work       10
## 21 ID1 female     1home        6
## 22 ID2   male     1home       13
## 23 ID3   male     1home       10
## 24 ID4 female     1home       15
## 25 ID5   male     1home       10
## 26 ID1 female   2church       10
## 27 ID2   male   2church       15
## 28 ID3   male   2church        7
## 29 ID4 female   2church        7
## 30 ID5   male   2church       12

dat %>% pivot_longer(-(1:2), names_to = "part_area", values_to = "# pieces")

## # A tibble: 30 x 4
##    id    sex    part_area `# pieces`
##    <fct> <fct>  <chr>          <int>
##  1 ID1   female 1work              7
##  2 ID1   female 2home              8
##  3 ID1   female 1church            7
##  4 ID1   female 2work             10
##  5 ID1   female 1home              6
##  6 ID1   female 2church           10
##  7 ID2   male   1work             10
##  8 ID2   male   2home             13
##  9 ID2   male   1church           10
## 10 ID2   male   2work              7
## # ... with 20 more rows

reshape(data = dat, direction = "long",
  varying = list(c(3, 7, 5), c(6, 4, 8)),
  idvar = "id",
  v.names = c("TIME_1", "TIME_2"),
  timevar = "PLACE",
  times = c("wrk", "hom", "chr"),
  new.row.names = 1:15)

##     id    sex PLACE TIME_1 TIME_2
## 1  ID1 female   wrk      7     10
## 2  ID2   male   wrk     10      7
## 3  ID3   male   wrk     11     10
## 4  ID4 female   wrk      6      9
## 5  ID5   male   wrk      9     10
## 6  ID1 female   hom      6      8
## 7  ID2   male   hom     13     13
## 8  ID3   male   hom     10     10
## 9  ID4 female   hom     15      8
## 10 ID5   male   hom     10     11
## 11 ID1 female   chr      7     10
## 12 ID2   male   chr     10     15
## 13 ID3   male   chr      6      7
## 14 ID4 female   chr     12      7
## 15 ID5   male   chr     15     12

dat %>% pivot_longer(-(1:2), names_to = "part_area", values_to = "# pieces") %>%
  mutate(part = as.numeric(str_sub(part_area, 1, 1)), area = str_extract(part_area, "\\D+")) %>%
  mutate(PLACE = factor(area, levels = c("work", "home", "church"), labels = c("wrk", "hom", "chr"))) %>%
  select(-c(part_area, area)) %>%
  pivot_wider(names_from = part, values_from = "# pieces", names_prefix = "TIME_")

## # A tibble: 15 x 5
##    id    sex    PLACE TIME_1 TIME_2
##    <fct> <fct>  <fct>  <int>  <int>
##  1 ID1   female wrk        7     10
##  2 ID1   female hom        6      8
##  3 ID1   female chr        7     10
##  4 ID2   male   wrk       10      7
##  5 ID2   male   hom       13     13
##  6 ID2   male   chr       10     15
##  7 ID3   male   wrk       11     10
##  8 ID3   male   hom       10     10
##  9 ID3   male   chr        6      7
## 10 ID4   female wrk        6      9
## 11 ID4   female hom       15      8
## 12 ID4   female chr       12      7
## 13 ID5   male   wrk        9     10
## 14 ID5   male   hom       10     11
## 15 ID5   male   chr       15     12

reshape(data = dat, direction = "long",
  timevar = "TIME",
  times = 1:2,
  varying = list(c(3, 6), c(7, 4), c(5,8)),
  v.names = c("WORK", "HOME", "CHURCH"),
  new.row.names = 1:10)

##     id    sex TIME WORK HOME CHURCH
## 1  ID1 female    1    7    6      7
## 2  ID2   male    1   10   13     10
## 3  ID3   male    1   11   10      6
## 4  ID4 female    1    6   15     12
## 5  ID5   male    1    9   10     15
## 6  ID1 female    2   10    8     10
## 7  ID2   male    2    7   13     15
## 8  ID3   male    2   10   10      7
## 9  ID4 female    2    9    8      7
## 10 ID5   male    2   10   11     12

dat %>% pivot_longer(-(1:2), names_to = c("TIME", ".value"), names_pattern = "(\\d)(\\D+)")

## # A tibble: 10 x 6
##    id    sex    TIME   work  home church
##    <fct> <fct>  <chr> <int> <int>  <int>
##  1 ID1   female 1         7     6      7
##  2 ID1   female 2        10     8     10
##  3 ID2   male   1        10    13     10
##  4 ID2   male   2         7    13     15
##  5 ID3   male   1        11    10      6
##  6 ID3   male   2        10    10      7
##  7 ID4   female 1         6    15     12
##  8 ID4   female 2         9     8      7
##  9 ID5   male   1         9    10     15
## 10 ID5   male   2        10    11     12

wide_1 <- reshape(data = dat, direction = "long",
  timevar = "TIME",
  times = 1:2,
  varying = list(c(3, 6), c(7, 4), c(5,8)),
  v.names = c("WORK", "HOME", "CHURCH"),
  new.row.names = 1:10)

reshape(data = wide_1, direction = "long",
  timevar = "PLACE",
  times = names(wide_1)[4:6], #c("work", "home", "church")
  varying = 4:6, #c("WORK", "HOME", "CHURCH"),
  v.names = "# people",
  new.row.names = 1:30)

##     id    sex TIME  PLACE # people
## 1  ID1 female    1   WORK        7
## 2  ID2   male    1   WORK       10
## 3  ID3   male    1   WORK       11
## 4  ID4 female    1   WORK        6
## 5  ID5   male    1   WORK        9
## 6  ID1 female    2   WORK       10
## 7  ID2   male    2   WORK        7
## 8  ID3   male    2   WORK       10
## 9  ID4 female    2   WORK        9
## 10 ID5   male    2   WORK       10
## 11 ID1 female    1   HOME        6
## 12 ID2   male    1   HOME       13
## 13 ID3   male    1   HOME       10
## 14 ID4 female    1   HOME       15
## 15 ID5   male    1   HOME       10
## 16 ID1 female    2   HOME        8
## 17 ID2   male    2   HOME       13
## 18 ID3   male    2   HOME       10
## 19 ID4 female    2   HOME        8
## 20 ID5   male    2   HOME       11
## 21 ID1 female    1 CHURCH        7
## 22 ID2   male    1 CHURCH       10
## 23 ID3   male    1 CHURCH        6
## 24 ID4 female    1 CHURCH       12
## 25 ID5   male    1 CHURCH       15
## 26 ID1 female    2 CHURCH       10
## 27 ID2   male    2 CHURCH       15
## 28 ID3   male    2 CHURCH        7
## 29 ID4 female    2 CHURCH        7
## 30 ID5   male    2 CHURCH       12

dat %>% pivot_longer(-(1:2), names_to = "part_area", values_to = "# people") %>%
  mutate(TIME = as.numeric(str_sub(part_area, 1, 1)), PLACE = str_extract(part_area, "\\D+")) %>%
  mutate(PLACE = str_to_upper(PLACE)) %>%
  select(-part_area) %>% select(1:2, 4:5, 3) %>%
  arrange(desc(PLACE), TIME)

## # A tibble: 30 x 5
##    id    sex     TIME PLACE `# people`
##    <fct> <fct>  <dbl> <chr>      <int>
##  1 ID1   female     1 WORK           7
##  2 ID2   male       1 WORK          10
##  3 ID3   male       1 WORK          11
##  4 ID4   female     1 WORK           6
##  5 ID5   male       1 WORK           9
##  6 ID1   female     2 WORK          10
##  7 ID2   male       2 WORK           7
##  8 ID3   male       2 WORK          10
##  9 ID4   female     2 WORK           9
## 10 ID5   male       2 WORK          10
## # ... with 20 more rows

`pivot_wider` and `pivot_longer` as alternative to `reshape` base R function

Pep Porrà

May 4, 2020

Goal

Wider formats

`JohnsonJohnson` dataset

`HairEyeColor` dataset

Longer formats

first example

`Iris` dataset

`reshape` help examples

`Indometh` dataset

visit dataframe

state.x77

Example 1

Example 2

pivot_wider and pivot_longer as alternative to reshape base R function

Pep Porrà

May 4, 2020

Goal

Wider formats

JohnsonJohnson dataset

HairEyeColor dataset

Longer formats

first example

Iris dataset

reshape help examples

Indometh dataset

visit dataframe

state.x77

Example 1

Example 2

`pivot_wider` and `pivot_longer` as alternative to `reshape` base R function

`JohnsonJohnson` dataset

`HairEyeColor` dataset

`Iris` dataset

`reshape` help examples

`Indometh` dataset