1 引言

数据集有长数据宽数据之分。原生数据一般为宽数据,以其适合录入、查看、比较,这类数据的基本特点是:每一个观测(observation)占一行,每一个测量值(measurement)占一列。这时并不是每一列都是变量(variable),也就是说一个变量可能分解占位在不同的列。而长数据则是各类分析软件需要的格式,这类数据的特点是:每一个独特的变量集中在某列,观测位于一行。以下为各种类型数据示例:

tidyr::table4a
country 1999 2000
Afghanistan 745 2666
Brazil 37737 80488
China 212258 213766
tidyr::table4a %>% gather(2:3, key = "year", value = cases)
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766
tidyr::table5 %>% mutate(year = str_c(century, year)) %>% select(-2)
country year rate
Afghanistan 1999 745/19987071
Afghanistan 2000 2666/20595360
Brazil 1999 37737/172006362
Brazil 2000 80488/174504898
China 1999 212258/1272915272
China 2000 213766/1280428583

需要说明的是,判断数据是长还是宽数据,这取决于您所分析数据集中的变量。在矩形数据集(Rectangle Data)不是所有的列名称都是变量。是否是变量取决于您分析和计算的目的。也就是说在你本次分析和计算过程中,同一变量在同一列,观测(observation)在各自的行,才是真正的tidy data。

The ambiguity comes from the definition of tidy data. Tidiness depends on the variables in your data set. But what is a variable depends on what you are trying to do. To identify the variables that you need to work with, describe what you want to do with an equation. Each variable in the equation should correspond to a variable in your data. Primer

1.1 基本概念

1.1.1 变量

变量是可以测量的一种数量、属性或特征。 A variable is a quantity, quality, or property that you can measure.

1.1.2 变量值

变量值是测量变量时的状态。 A value is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.

1.1.3 观测或案例

观测是为了同一个目的同时测量的一组测量结果。 An observation or case is a set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation will contain several values, each associated with a different variable. I’ll sometimes refer to an observation as a case or data point.

1.1.4 宽数据(Wide Data)

病例一览表(Cases list)是公共卫生领域最常见的宽数据。

suppressPackageStartupMessages(library(tidyverse)) 
data(Oswego, package = "epiDisplay")
wide_data <- Oswego %>% 
  as_tibble() %>% 
  slice_sample(n = 5)
wide_data 
age sex timesupper ill onsetdate onsettime bakedham spinach mashedpota cabbagesal jello rolls brownbread milk coffee water cakes vanilla chocolate fruitsalad
62 F NA TRUE 04/19 30 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
12 F 2200 TRUE 04/19 100 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
13 F NA FALSE NA NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
25 M NA TRUE 04/18 2300 TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
48 F NA TRUE 04/18 2400 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
# %>% 
  # flextable::flextable() 

1.1.5 杂乱数据(Untidy Data)

“Tidy data sets are all alike; but every messy data set is messy in its own way.” — Hadley Wickham

  • 窄数据是指用每列存储多个变量或变量值的数据
    • Narrow data uses a literal key column and a literal value column to store multiple variables.
who %>% 
  slice_sample(n = 10)
country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544 new_sp_m4554 new_sp_m5564 new_sp_m65 new_sp_f014 new_sp_f1524 new_sp_f2534 new_sp_f3544 new_sp_f4554 new_sp_f5564 new_sp_f65 new_sn_m014 new_sn_m1524 new_sn_m2534 new_sn_m3544 new_sn_m4554 new_sn_m5564 new_sn_m65 new_sn_f014 new_sn_f1524 new_sn_f2534 new_sn_f3544 new_sn_f4554 new_sn_f5564 new_sn_f65 new_ep_m014 new_ep_m1524 new_ep_m2534 new_ep_m3544 new_ep_m4554 new_ep_m5564 new_ep_m65 new_ep_f014 new_ep_f1524 new_ep_f2534 new_ep_f3544 new_ep_f4554 new_ep_f5564 new_ep_f65 newrel_m014 newrel_m1524 newrel_m2534 newrel_m3544 newrel_m4554 newrel_m5564 newrel_m65 newrel_f014 newrel_f1524 newrel_f2534 newrel_f3544 newrel_f4554 newrel_f5564 newrel_f65
South Africa ZA ZAF 1981 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Sudan SD SDN 2004 537 1377 1791 1465 1035 697 467 426 978 1187 897 601 400 237 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Malaysia MY MYS 1987 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Aruba AW ABW 2004 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
United Arab Emirates AE ARE 2012 0 2 4 4 5 5 2 0 5 2 2 3 4 4 0 0 3 1 1 1 2 1 4 0 0 1 0 1 2 1 3 3 0 0 2 0 4 2 1 1 0 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Serbia RS SRB 2011 2 60 73 74 122 112 101 5 46 59 43 30 20 129 6 11 25 23 53 49 62 4 16 13 18 21 25 70 1 4 10 6 15 18 19 1 9 9 13 9 9 32 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Sweden SE SWE 2000 0 9 10 12 11 4 25 1 9 8 10 2 2 15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Aruba AW ABW 2010 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Montserrat MS MSR 1986 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
United Arab Emirates AE ARE 2002 1 2 0 6 6 10 0 3 3 8 3 4 10 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

1.1.6 整洁数据(Tidy Data)

整洁数据,需要满足以下条件:

  1. 每个变量分布在唯一列;
  2. 每个观测分布在行;
  3. 每一个值分布在单独的单元格。
  • Each variable is in its own column
  • Each observation is in its own row
  • Each value is in its own cell (this follows from #1 and #2)

2 宽数据变长数据

2.1 列名含单变量(One variable in column names)

原始数据

table4a
country 1999 2000
Afghanistan 745 2666
Brazil 37737 80488
China 212258 213766

2.1.1 stats::reshape

table4a %>% 
  as.data.frame() %>% 
stats::reshape( 
        idvar = "country", 
        timevar = "year",
        varying = list(2:3),
        v.names = "cases", 
        direction = "long")
country year cases
Afghanistan.1 Afghanistan 1 745
Brazil.1 Brazil 1 37737
China.1 China 1 212258
Afghanistan.2 Afghanistan 2 2666
Brazil.2 Brazil 2 80488
China.2 China 2 213766

2.1.2 reshape2::melt

table4a %>% 
  reshape2::melt(
    # id.vars = "country",
    id = 1,
    variable.name = "year",
    value.name = "cases"
  )
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766

2.1.3 data.table::melt

table4a %>% 
  data.table::melt(
    id = 1,
    variable.name = "year",
    value.name = "cases"
  )
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766

2.1.4 tidyr::gather

table4a %>% 
  gather(-country, key = "year", value = "cases")
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766

2.1.5 tidyr::pivot_longer

table4a %>% 
  pivot_longer(-country,
               names_to = "year",
               values_to = "cases")
country year cases
Afghanistan 1999 745
Afghanistan 2000 2666
Brazil 1999 37737
Brazil 2000 80488
China 1999 212258
China 2000 213766

2.2 列名含多变量(Many variables in column namese)

原始数据

who_subset <- who %>% 
  slice_sample(n = 10)
who_subset
country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544 new_sp_m4554 new_sp_m5564 new_sp_m65 new_sp_f014 new_sp_f1524 new_sp_f2534 new_sp_f3544 new_sp_f4554 new_sp_f5564 new_sp_f65 new_sn_m014 new_sn_m1524 new_sn_m2534 new_sn_m3544 new_sn_m4554 new_sn_m5564 new_sn_m65 new_sn_f014 new_sn_f1524 new_sn_f2534 new_sn_f3544 new_sn_f4554 new_sn_f5564 new_sn_f65 new_ep_m014 new_ep_m1524 new_ep_m2534 new_ep_m3544 new_ep_m4554 new_ep_m5564 new_ep_m65 new_ep_f014 new_ep_f1524 new_ep_f2534 new_ep_f3544 new_ep_f4554 new_ep_f5564 new_ep_f65 newrel_m014 newrel_m1524 newrel_m2534 newrel_m3544 newrel_m4554 newrel_m5564 newrel_m65 newrel_f014 newrel_f1524 newrel_f2534 newrel_f3544 newrel_f4554 newrel_f5564 newrel_f65
Japan JP JPN 2011 0 96 215 367 465 812 3256 5 94 213 203 148 223 1840 18 240 436 501 479 748 2651 35 227 435 353 270 307 1531 17 54 112 171 169 268 1734 5 43 121 162 132 248 1590 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Algeria DZ DZA 2010 52 1203 1669 825 513 392 397 79 1086 826 417 251 222 367 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Argentina AR ARG 2006 67 519 484 360 351 346 321 74 438 437 235 197 173 213 394 196 235 143 153 176 187 345 187 185 119 75 90 127 32 111 122 94 92 76 91 36 87 91 69 59 46 75 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Japan JP JPN 2002 2 191 549 579 1192 1334 3747 3 192 395 259 248 308 1808 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Kazakhstan KZ KAZ 1983 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Northern Mariana Islands MP MNP 2012 0 0 0 3 1 1 0 0 0 3 1 0 0 0 0 1 1 6 3 2 2 0 0 4 3 1 0 1 0 0 0 0 1 2 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Tunisia TN TUN 2000 16 139 208 156 109 65 101 7 68 59 43 21 21 58 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Malaysia MY MYS 1989 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Romania RO ROU 1987 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Turks and Caicos Islands TC TCA 1996 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

2.2.1 tidyr::gather

who_subset %>% 
  gather(
    new_sp_m014:newrel_f65,
    key = "variable",
    value = "count"
  )%>% 
  dplyr::slice_head(n = 10)
country iso2 iso3 year variable count
Japan JP JPN 2011 new_sp_m014 0
Algeria DZ DZA 2010 new_sp_m014 52
Argentina AR ARG 2006 new_sp_m014 67
Japan JP JPN 2002 new_sp_m014 2
Kazakhstan KZ KAZ 1983 new_sp_m014 NA
Northern Mariana Islands MP MNP 2012 new_sp_m014 0
Tunisia TN TUN 2000 new_sp_m014 16
Malaysia MY MYS 1989 new_sp_m014 NA
Romania RO ROU 1987 new_sp_m014 NA
Turks and Caicos Islands TC TCA 1996 new_sp_m014 NA

2.2.2 tidyr::pivot_longer

who_subset %>% 
  pivot_longer(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"), 
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  ) %>% 
  dplyr::slice_head(n = 10)
country iso2 iso3 year diagnosis gender age count
Japan JP JPN 2011 sp m 014 0
Japan JP JPN 2011 sp m 1524 96
Japan JP JPN 2011 sp m 2534 215
Japan JP JPN 2011 sp m 3544 367
Japan JP JPN 2011 sp m 4554 465
Japan JP JPN 2011 sp m 5564 812
Japan JP JPN 2011 sp m 65 3256
Japan JP JPN 2011 sp f 014 5
Japan JP JPN 2011 sp f 1524 94
Japan JP JPN 2011 sp f 2534 213
who_subset %>% pivot_longer(
  cols = new_sp_m014:newrel_f65,
  names_to = c("diagnosis", "gender", "age"), 
  names_pattern = "new_?(.*)_(.)(.*)",
  names_transform = list(
    gender = ~ readr::parse_factor(.x, levels = c("f", "m")),
    age = ~ readr::parse_factor(
      .x,
      levels = c("014", "1524", "2534", "3544", "4554", "5564", "65"), 
      ordered = TRUE
    )
  ),
  values_to = "count",
)%>% 
  dplyr::slice_head(n = 10)
country iso2 iso3 year diagnosis gender age count
Japan JP JPN 2011 sp m 014 0
Japan JP JPN 2011 sp m 1524 96
Japan JP JPN 2011 sp m 2534 215
Japan JP JPN 2011 sp m 3544 367
Japan JP JPN 2011 sp m 4554 465
Japan JP JPN 2011 sp m 5564 812
Japan JP JPN 2011 sp m 65 3256
Japan JP JPN 2011 sp f 014 5
Japan JP JPN 2011 sp f 1524 94
Japan JP JPN 2011 sp f 2534 213

2.3 每行含多个观测(Multiple observations per row)

原始数据

s2 <- "family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1         30 1998-11-26 2000-01-29         NA             1             2            NA
2         27 1996-06-22         NA         NA             2            NA            NA
3         26 2002-07-11 2004-04-05 2007-09-02             2             2             1
4         32 2004-10-10 2009-08-27 2012-07-21             1             1             1
5         29 2000-12-05 2005-02-28         NA             2             1            NA"
DT <- data.table::fread(s2)
DT %>% knitr::kable()
family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1 30 1998-11-26 2000-01-29 NA 1 2 NA
2 27 1996-06-22 NA NA 2 NA NA
3 26 2002-07-11 2004-04-05 2007-09-02 2 2 1
4 32 2004-10-10 2009-08-27 2012-07-21 1 1 1
5 29 2000-12-05 2005-02-28 NA 2 1 NA

2.3.1 data.table::melt

library(data.table)
DT %>% 
  data.table::melt(
    measure = patterns("^dob", "^gender"),
    variable.name = "child",
    value.name = c("dob", "gender")
    )
family_id age_mother child dob gender
1 30 1 1998-11-26 1
2 27 1 1996-06-22 2
3 26 1 2002-07-11 2
4 32 1 2004-10-10 1
5 29 1 2000-12-05 2
1 30 2 2000-01-29 2
2 27 2 NA NA
3 26 2 2004-04-05 2
4 32 2 2009-08-27 1
5 29 2 2005-02-28 1
1 30 3 NA NA
2 27 3 NA NA
3 26 3 2007-09-02 1
4 32 3 2012-07-21 1
5 29 3 NA NA

2.3.2 tidyr::pivot_longer

DT %>% 
  pivot_longer(
    cols = -1:-2,
    names_to = c(".value", "child"),
    names_sep = "_"
  )
family_id age_mother child dob gender
1 30 child1 1998-11-26 1
1 30 child2 2000-01-29 2
1 30 child3 NA NA
2 27 child1 1996-06-22 2
2 27 child2 NA NA
2 27 child3 NA NA
3 26 child1 2002-07-11 2
3 26 child2 2004-04-05 2
3 26 child3 2007-09-02 1
4 32 child1 2004-10-10 1
4 32 child2 2009-08-27 1
4 32 child3 2012-07-21 1
5 29 child1 2000-12-05 2
5 29 child2 2005-02-28 1
5 29 child3 NA NA
DT %>% 
  pivot_longer(
    cols = -1:-2,
    names_to = c(".value", "child"),
    names_pattern = '(.*)_.*(.)'
  )
family_id age_mother child dob gender
1 30 1 1998-11-26 1
1 30 2 2000-01-29 2
1 30 3 NA NA
2 27 1 1996-06-22 2
2 27 2 NA NA
2 27 3 NA NA
3 26 1 2002-07-11 2
3 26 2 2004-04-05 2
3 26 3 2007-09-02 1
4 32 1 2004-10-10 1
4 32 2 2009-08-27 1
4 32 3 2012-07-21 1
5 29 1 2000-12-05 2
5 29 2 2005-02-28 1
5 29 3 NA NA
df <- tibble(id = 1:3, x1 = 4:6, x2 = 5:7, y1 = 7:9, y2 = 10:12)
df
id x1 x2 y1 y2
1 4 5 7 10
2 5 6 8 11
3 6 7 9 12
df %>% pivot_longer(!id, names_to = ".value", names_pattern = "(.).")
id x y
1 4 7
1 5 10
2 5 8
2 6 11
3 6 9
3 7 12

3 长数据变宽数据

原始数据

fish_encounters
fish station seen
4842 Release 1
4842 I80_1 1
4842 Lisbon 1
4842 Rstr 1
4842 Base_TD 1
4842 BCE 1
4842 BCW 1
4842 BCE2 1
4842 BCW2 1
4842 MAE 1
4842 MAW 1
4843 Release 1
4843 I80_1 1
4843 Lisbon 1
4843 Rstr 1
4843 Base_TD 1
4843 BCE 1
4843 BCW 1
4843 BCE2 1
4843 BCW2 1
4843 MAE 1
4843 MAW 1
4844 Release 1
4844 I80_1 1
4844 Lisbon 1
4844 Rstr 1
4844 Base_TD 1
4844 BCE 1
4844 BCW 1
4844 BCE2 1
4844 BCW2 1
4844 MAE 1
4844 MAW 1
4845 Release 1
4845 I80_1 1
4845 Lisbon 1
4845 Rstr 1
4845 Base_TD 1
4847 Release 1
4847 I80_1 1
4847 Lisbon 1
4848 Release 1
4848 I80_1 1
4848 Lisbon 1
4848 Rstr 1
4849 Release 1
4849 I80_1 1
4850 Release 1
4850 I80_1 1
4850 Rstr 1
4850 Base_TD 1
4850 BCE 1
4850 BCW 1
4851 Release 1
4851 I80_1 1
4854 Release 1
4854 I80_1 1
4855 Release 1
4855 I80_1 1
4855 Lisbon 1
4855 Rstr 1
4855 Base_TD 1
4857 Release 1
4857 I80_1 1
4857 Lisbon 1
4857 Rstr 1
4857 Base_TD 1
4857 BCE 1
4857 BCW 1
4857 BCE2 1
4857 BCW2 1
4858 Release 1
4858 I80_1 1
4858 Lisbon 1
4858 Rstr 1
4858 Base_TD 1
4858 BCE 1
4858 BCW 1
4858 BCE2 1
4858 BCW2 1
4858 MAE 1
4858 MAW 1
4859 Release 1
4859 I80_1 1
4859 Lisbon 1
4859 Rstr 1
4859 Base_TD 1
4861 Release 1
4861 I80_1 1
4861 Lisbon 1
4861 Rstr 1
4861 Base_TD 1
4861 BCE 1
4861 BCW 1
4861 BCE2 1
4861 BCW2 1
4861 MAE 1
4861 MAW 1
4862 Release 1
4862 I80_1 1
4862 Lisbon 1
4862 Rstr 1
4862 Base_TD 1
4862 BCE 1
4862 BCW 1
4862 BCE2 1
4862 BCW2 1
4863 Release 1
4863 I80_1 1
4864 Release 1
4864 I80_1 1
4865 Release 1
4865 I80_1 1
4865 Lisbon 1

3.1 tidyr::spread

fish_encounters %>% 
  spread(station, seen, fill = 0)
fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
4842 1 1 1 1 1 1 1 1 1 1 1
4843 1 1 1 1 1 1 1 1 1 1 1
4844 1 1 1 1 1 1 1 1 1 1 1
4845 1 1 1 1 1 0 0 0 0 0 0
4847 1 1 1 0 0 0 0 0 0 0 0
4848 1 1 1 1 0 0 0 0 0 0 0
4849 1 1 0 0 0 0 0 0 0 0 0
4850 1 1 0 1 1 1 1 0 0 0 0
4851 1 1 0 0 0 0 0 0 0 0 0
4854 1 1 0 0 0 0 0 0 0 0 0
4855 1 1 1 1 1 0 0 0 0 0 0
4857 1 1 1 1 1 1 1 1 1 0 0
4858 1 1 1 1 1 1 1 1 1 1 1
4859 1 1 1 1 1 0 0 0 0 0 0
4861 1 1 1 1 1 1 1 1 1 1 1
4862 1 1 1 1 1 1 1 1 1 0 0
4863 1 1 0 0 0 0 0 0 0 0 0
4864 1 1 0 0 0 0 0 0 0 0 0
4865 1 1 1 0 0 0 0 0 0 0 0

3.2 tidyr::pivot_wider

fish_encounters %>% 
  pivot_wider(names_from = station, 
              values_from = seen, 
              values_fill = 0)
fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
4842 1 1 1 1 1 1 1 1 1 1 1
4843 1 1 1 1 1 1 1 1 1 1 1
4844 1 1 1 1 1 1 1 1 1 1 1
4845 1 1 1 1 1 0 0 0 0 0 0
4847 1 1 1 0 0 0 0 0 0 0 0
4848 1 1 1 1 0 0 0 0 0 0 0
4849 1 1 0 0 0 0 0 0 0 0 0
4850 1 1 0 1 1 1 1 0 0 0 0
4851 1 1 0 0 0 0 0 0 0 0 0
4854 1 1 0 0 0 0 0 0 0 0 0
4855 1 1 1 1 1 0 0 0 0 0 0
4857 1 1 1 1 1 1 1 1 1 0 0
4858 1 1 1 1 1 1 1 1 1 1 1
4859 1 1 1 1 1 0 0 0 0 0 0
4861 1 1 1 1 1 1 1 1 1 1 1
4862 1 1 1 1 1 1 1 1 1 0 0
4863 1 1 0 0 0 0 0 0 0 0 0
4864 1 1 0 0 0 0 0 0 0 0 0
4865 1 1 1 0 0 0 0 0 0 0 0