Reshaping Data (pivot)

From Ewen Harrison - Riinu Pius - The University of Edinburgh

[https://media.ed.ac.uk/media/HealthyRA+Reshaping+Data+%28pivot%29/1_wmix0222]

Day 05 of HealthyR demo

Reshaping Data (pivot)

Apis mellifera

Honey bee (Apis mellifera) colony loss is a widespread phenomenon with important economic and biological implications, whose drivers are still an open matter of investigation.

Data showing percentage of bee colonies impacted by various stressors over time.


La pérdida de colonias de abejas (Apis mellifera) es un fenómeno generalizado con importantes implicaciones económicas y biológicas, cuyas causas siguen siendo un tema abierto de investigación.

Datos que muestran el porcentaje de colonias de abejas afectadas por diversos factores estresantes a lo largo del tiempo.

[https://www.nature.com/articles/s41598-022-24946-4]

[Apis mellifera.Crédito: John Quine (con licencia CC BY-NC-SA 2.0)]

library(tidyverse)

# Load the datasets - Buscamos el datasets 
# Datasets from Tidy Tuesday - desde Tidy Tuesday
colony   <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/colony.csv')
colony
## # A tibble: 1,222 × 10
##     year months        state     colony_n colony_max colony_lost colony_lost_pct
##    <dbl> <chr>         <chr>        <dbl>      <dbl>       <dbl>           <dbl>
##  1  2015 January-March Alabama       7000       7000        1800              26
##  2  2015 January-March Arizona      35000      35000        4600              13
##  3  2015 January-March Arkansas     13000      14000        1500              11
##  4  2015 January-March Californ…  1440000    1690000      255000              15
##  5  2015 January-March Colorado      3500      12500        1500              12
##  6  2015 January-March Connecti…     3900       3900         870              22
##  7  2015 January-March Florida     305000     315000       42000              13
##  8  2015 January-March Georgia     104000     105000       14500              14
##  9  2015 January-March Hawaii       10500      10500         380               4
## 10  2015 January-March Idaho        81000      88000        3700               4
## # ℹ 1,212 more rows
## # ℹ 3 more variables: colony_added <dbl>, colony_reno <dbl>,
## #   colony_reno_pct <dbl>
stressor <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/stressor.csv')

stressor
## # A tibble: 7,332 × 5
##     year months        state   stressor              stress_pct
##    <dbl> <chr>         <chr>   <chr>                      <dbl>
##  1  2015 January-March Alabama Varroa mites                10  
##  2  2015 January-March Alabama Other pests/parasites        5.4
##  3  2015 January-March Alabama Disesases                   NA  
##  4  2015 January-March Alabama Pesticides                   2.2
##  5  2015 January-March Alabama Other                        9.1
##  6  2015 January-March Alabama Unknown                      9.4
##  7  2015 January-March Arizona Varroa mites                26.9
##  8  2015 January-March Arizona Other pests/parasites       20.5
##  9  2015 January-March Arizona Disesases                    0.1
## 10  2015 January-March Arizona Pesticides                  NA  
## # ℹ 7,322 more rows

pivot_wider()

pivot_wider() is the opposite of pivot_longer(): it widens a data set by increasing the number of columns and decreasing the number of rows.

pivot_wider() es lo opuesto a pivot_longer(): amplía un conjunto de datos al aumentar el número de columnas y disminuir el número de filas.

# Start with wide dataset for this demo
# Data showing percentage of bee colonies impacted by various stressors over time - Datos que muestran el porcentaje de colonias de abejas afectadas por diversos factores estresantes a lo largo del tiempo.

wide_stressor = stressor %>% 
                pivot_wider(
                            names_from  = months, 
                            values_from = stress_pct)
wide_stressor 
## # A tibble: 1,974 × 7
##     year state   stressor          `January-March` `April-June` `July-September`
##    <dbl> <chr>   <chr>                       <dbl>        <dbl>            <dbl>
##  1  2015 Alabama Varroa mites                 10           16.7             63.1
##  2  2015 Alabama Other pests/para…             5.4         42.5             70.6
##  3  2015 Alabama Disesases                    NA           NA               NA  
##  4  2015 Alabama Pesticides                    2.2          2.3              2.6
##  5  2015 Alabama Other                         9.1          3.2              2.2
##  6  2015 Alabama Unknown                       9.4          4.1             17.7
##  7  2015 Arizona Varroa mites                 26.9          8.4             53.5
##  8  2015 Arizona Other pests/para…            20.5         32.1             24.8
##  9  2015 Arizona Disesases                     0.1          0.5              5.4
## 10  2015 Arizona Pesticides                   NA           20.1             17.3
## # ℹ 1,964 more rows
## # ℹ 1 more variable: `October-December` <dbl>

Make data long

Pivot_longer

Pivot_longer() is commonly needed to sort captured raw data sets.

  • The first argument is the data set to reshape.
  • The second argument describes which columns need to be reformed.
  • The names_to gives the name of the variable to be created from the data stored in the column names.
  • The values_to gives the name of the variable that will be created from the data stored in the cell value.

Pivot_longer() se necesita comúnmente para ordenar los conjuntos de datos capturados en crudo.

  • El primer argumento es el conjunto de datos para remodelar.
  • El segundo argumento describe qué columnas necesitan ser reformadas.
  • El names_to da el nombre de la variable que se creará a partir de los datos almacenados en los nombres de columna.
  • Los values_to dan el nombre de la variable que se creará a partir de los datos almacenados en el valor de la celda.
long_stressor = wide_stressor %>% 
                pivot_longer(4:7,
                             names_to = "months", 
                             values_to = "stress_pct")
long_stressor
## # A tibble: 7,896 × 5
##     year state   stressor              months           stress_pct
##    <dbl> <chr>   <chr>                 <chr>                 <dbl>
##  1  2015 Alabama Varroa mites          January-March          10  
##  2  2015 Alabama Varroa mites          April-June             16.7
##  3  2015 Alabama Varroa mites          July-September         63.1
##  4  2015 Alabama Varroa mites          October-December        3.1
##  5  2015 Alabama Other pests/parasites January-March           5.4
##  6  2015 Alabama Other pests/parasites April-June             42.5
##  7  2015 Alabama Other pests/parasites July-September         70.6
##  8  2015 Alabama Other pests/parasites October-December        6.4
##  9  2015 Alabama Disesases             January-March          NA  
## 10  2015 Alabama Disesases             April-June             NA  
## # ℹ 7,886 more rows
long_stressor = wide_stressor %>% 
                pivot_longer(c("January-March", "April-June","July-September", "October-December"), 
                             names_to  = "months", 
                             values_to = "stress_pct")
long_stressor
## # A tibble: 7,896 × 5
##     year state   stressor              months           stress_pct
##    <dbl> <chr>   <chr>                 <chr>                 <dbl>
##  1  2015 Alabama Varroa mites          January-March          10  
##  2  2015 Alabama Varroa mites          April-June             16.7
##  3  2015 Alabama Varroa mites          July-September         63.1
##  4  2015 Alabama Varroa mites          October-December        3.1
##  5  2015 Alabama Other pests/parasites January-March           5.4
##  6  2015 Alabama Other pests/parasites April-June             42.5
##  7  2015 Alabama Other pests/parasites July-September         70.6
##  8  2015 Alabama Other pests/parasites October-December        6.4
##  9  2015 Alabama Disesases             January-March          NA  
## 10  2015 Alabama Disesases             April-June             NA  
## # ℹ 7,886 more rows
long_stressor = wide_stressor %>% 
                 pivot_longer(all_of(c("January-March", "April-June","July-September","October-December")), 
                              names_to  = "months", 
                              values_to = "stress_pct")

long_stressor
## # A tibble: 7,896 × 5
##     year state   stressor              months           stress_pct
##    <dbl> <chr>   <chr>                 <chr>                 <dbl>
##  1  2015 Alabama Varroa mites          January-March          10  
##  2  2015 Alabama Varroa mites          April-June             16.7
##  3  2015 Alabama Varroa mites          July-September         63.1
##  4  2015 Alabama Varroa mites          October-December        3.1
##  5  2015 Alabama Other pests/parasites January-March           5.4
##  6  2015 Alabama Other pests/parasites April-June             42.5
##  7  2015 Alabama Other pests/parasites July-September         70.6
##  8  2015 Alabama Other pests/parasites October-December        6.4
##  9  2015 Alabama Disesases             January-March          NA  
## 10  2015 Alabama Disesases             April-June             NA  
## # ℹ 7,886 more rows
long_stressor = wide_stressor %>% 
                pivot_longer(contains("-"),
                             names_to  = "months", 
                             values_to = "stress_pct")
long_stressor 
## # A tibble: 7,896 × 5
##     year state   stressor              months           stress_pct
##    <dbl> <chr>   <chr>                 <chr>                 <dbl>
##  1  2015 Alabama Varroa mites          January-March          10  
##  2  2015 Alabama Varroa mites          April-June             16.7
##  3  2015 Alabama Varroa mites          July-September         63.1
##  4  2015 Alabama Varroa mites          October-December        3.1
##  5  2015 Alabama Other pests/parasites January-March           5.4
##  6  2015 Alabama Other pests/parasites April-June             42.5
##  7  2015 Alabama Other pests/parasites July-September         70.6
##  8  2015 Alabama Other pests/parasites October-December        6.4
##  9  2015 Alabama Disesases             January-March          NA  
## 10  2015 Alabama Disesases             April-June             NA  
## # ℹ 7,886 more rows

Explore / summarise data - Explorar / resumir datos

# For each stressor average percentage of columns affected per year
# # Para cada factor estresante, porcentaje promedio de columnas afectadas por año
year_stressor_pct = long_stressor %>% 
         group_by(year, stressor) %>% 
  summarise(pct_mean = mean(stress_pct, na.rm = TRUE))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
year_stressor_pct
## # A tibble: 42 × 3
## # Groups:   year [7]
##     year stressor              pct_mean
##    <dbl> <chr>                    <dbl>
##  1  2015 Disesases                 3.67
##  2  2015 Other                     6.37
##  3  2015 Other pests/parasites    11.3 
##  4  2015 Pesticides                7.69
##  5  2015 Unknown                   4.25
##  6  2015 Varroa mites             27.2 
##  7  2016 Disesases                 4.24
##  8  2016 Other                     5.81
##  9  2016 Other pests/parasites    10.4 
## 10  2016 Pesticides                7.26
## # ℹ 32 more rows

Pivot wider

year_stressor_wide = year_stressor_pct %>% 
                     pivot_wider(
                                 names_from  = "stressor", 
                                 values_from = "pct_mean")

year_stressor_wide
## # A tibble: 7 × 7
## # Groups:   year [7]
##    year Disesases Other Other pests/parasite…¹ Pesticides Unknown `Varroa mites`
##   <dbl>     <dbl> <dbl>                  <dbl>      <dbl>   <dbl>          <dbl>
## 1  2015      3.67  6.37                   11.3       7.69    4.25           27.2
## 2  2016      4.24  5.81                   10.4       7.26    4.47           29.4
## 3  2017      5.76  6.63                   11.5       7.96    4.44           33.0
## 4  2018      4.28  7.78                   14.7       8.77    4.78           34.9
## 5  2019      4.38  6.48                   12.7       9.22    4.19           31.7
## 6  2020      3.72  5.13                   10.4       6.29    4.63           28.9
## 7  2021      3.34  5.49                   11.2       5.62    4.08           28.2
## # ℹ abbreviated name: ¹​`Other pests/parasites`