1 引言

数据集有长数据宽数据之分。原生数据一般为宽数据,以其适合录入、查看、比较,这类数据的基本特点是:每一个观测(observation)占一行,每一个测量值(measurement)占一列。这时并不是每一列都是变量(variable),也就是说一个变量可能分解占位在不同的列。而长数据则是各类分析软件需要的格式,这类数据的特点是:每一个独特的变量集中在某列,观测位于一行。以下为各种类型数据示例:

  • 宽数据
tidyr::table4a
country 1999 2000
Afghanistan 745 2666
Brazil 37737 80488
China 212258 213766

比较隐蔽的例子

data.table::data.table(
                Date = c("2009-01-01", "2009-01-02"),
  Boeing.stock.price = c("$173.55", "$172.61"),
  Amazon.stock.price = c("$174.90", "$171.42"),
  Google.stock.price = c("$174.34", "$170.04")
)
Date Boeing.stock.price Amazon.stock.price Google.stock.price
2009-01-01 $173.55 $174.90 $174.34
2009-01-02 $172.61 $171.42 $170.04
  • 长数据
tidyr::table4a %>% gather(2:3, key = "year", value = cases)
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766
  • 窄数据
tidyr::table5 %>% mutate(year = str_c(century, year)) %>% select(-2)
country year rate
Afghanistan 1999 745/19987071
Afghanistan 2000 2666/20595360
Brazil 1999 37737/172006362
Brazil 2000 80488/174504898
China 1999 212258/1272915272
China 2000 213766/1280428583

需要说明的是,判断数据是长还是宽数据,这取决于您所分析数据集中的变量。在矩形数据集(Rectangle Data)不是所有的列名称都是变量。是否是变量取决于您分析和计算的目的。也就是说在你本次分析和计算过程中,同一变量在同一列,观测(observation)在各自的行,才是真正的tidy data。

The ambiguity comes from the definition of tidy data. Tidiness depends on the variables in your data set. But what is a variable depends on what you are trying to do. To identify the variables that you need to work with, describe what you want to do with an equation. Each variable in the equation should correspond to a variable in your data. Primer

1.1 基本概念

1.1.1 变量

变量是可以测量的一种数量、属性或特征。 A variable is a quantity, quality, or property that you can measure.

A variable contains all values that measure the same underlying attribute (like height, temperature, duration) across units. -Tidy data

1.1.2 变量值

变量值是测量变量时的状态。 A value is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.

1.1.3 观测

观测是为了同一个目的同时测量的一组测量结果。 An observation or case is a set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation will contain several values, each associated with a different variable. I’ll sometimes refer to an observation as a case or data point.

An observation contains all values measured on the same unit (like a person, or a day, or a race) across attributes.-Tidy data

1.1.4 宽数据(Wide Data)

病例一览表(Cases list)是公共卫生领域最常见的宽数据。

suppressPackageStartupMessages(library(tidyverse)) 
data(Oswego, package = "epiDisplay")
wide_data <- Oswego %>% 
  as_tibble() %>% 
  slice_sample(n = 5)
wide_data 
age sex timesupper ill onsetdate onsettime bakedham spinach mashedpota cabbagesal jello rolls brownbread milk coffee water cakes vanilla chocolate fruitsalad
24 F NA FALSE NA NA TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
36 M NA TRUE 04/18 2215 TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
48 F NA TRUE 04/18 2400 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
70 M 1930 TRUE 04/18 2230 TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
68 M NA TRUE 04/18 2130 TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
# %>% 
  # flextable::flextable() 

1.1.5 杂乱数据(Untidy Data)

“Tidy data sets are all alike; but every messy data set is messy in its own way.” — Hadley Wickham

The five most common problems with messy datasets, along with their remedies:

* Column headers are values, not variable names.
* Multiple variables are stored in one column.
* Variables are stored in both rows and columns.
* Multiple types of observational units are stored in the same table.
* A single observational unit is stored in multiple tables.
  • 窄数据是指用每列存储多个变量或变量值的数据
    • Narrow data(Molten dataTidy data) uses a literal key column and a literal value column to store multiple variables.
who %>% 
  slice_sample(n = 10)
country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544 new_sp_m4554 new_sp_m5564 new_sp_m65 new_sp_f014 new_sp_f1524 new_sp_f2534 new_sp_f3544 new_sp_f4554 new_sp_f5564 new_sp_f65 new_sn_m014 new_sn_m1524 new_sn_m2534 new_sn_m3544 new_sn_m4554 new_sn_m5564 new_sn_m65 new_sn_f014 new_sn_f1524 new_sn_f2534 new_sn_f3544 new_sn_f4554 new_sn_f5564 new_sn_f65 new_ep_m014 new_ep_m1524 new_ep_m2534 new_ep_m3544 new_ep_m4554 new_ep_m5564 new_ep_m65 new_ep_f014 new_ep_f1524 new_ep_f2534 new_ep_f3544 new_ep_f4554 new_ep_f5564 new_ep_f65 newrel_m014 newrel_m1524 newrel_m2534 newrel_m3544 newrel_m4554 newrel_m5564 newrel_m65 newrel_f014 newrel_f1524 newrel_f2534 newrel_f3544 newrel_f4554 newrel_f5564 newrel_f65
Palau PW PLW 1989 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Seychelles SC SYC 1984 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Oman OM OMN 1992 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Syrian Arab Republic SY SYR 1998 5 335 293 111 93 48 50 20 197 99 43 49 18 21 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Namibia NA NAM 2011 48 337 844 660 361 152 138 78 427 653 410 185 100 110 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Nepal NP NPL 2008 81 150 1409 1558 1706 1515 792 107 832 820 704 630 523 226 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Montserrat MS MSR 2005 NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Cameroon CM CMR 1993 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Zimbabwe ZW ZWE 1988 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Namibia NA NAM 2006 86 347 1052 799 386 174 146 74 485 875 521 239 92 80 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

1.1.6 整洁数据(Tidy Data)

整洁数据,需要满足以下条件:

  1. 每个变量分布在唯一列;
  2. 每个观测分布在行;
  3. 每一个值分布在单独的单元格。
  • Each variable is in its own column
  • Each observation is in its own row
  • Each value is in its own cell (this follows from #1 and #2)

1.2 Data manipulation

includes variable-by-variable transformation (e.g., log or sqrt), as well as aggregation, fltering and reordering.

Filter: subsetting or removing observations based on some condition.

Transform: adding or modifying variables. These modications can involve either a single variable (e.g., log-transformation), or multiple variables (e.g., computing density from weight and volume).

Aggregate: collapsing multiple values into a single value (e.g., by summing or taking means).

Sort: changing the order of observations.

2 宽数据变长数据

2.1 列名含单变量(One variable in column names)

原始数据

table4a
country 1999 2000
Afghanistan 745 2666
Brazil 37737 80488
China 212258 213766

2.1.1 stats::reshape

table4a %>% 
  as.data.frame() %>% 
stats::reshape( 
        idvar = "country", 
        timevar = "year",
        varying = list(2:3),
        v.names = "cases", 
        direction = "long")
country year cases
Afghanistan.1 Afghanistan 1 745
Brazil.1 Brazil 1 37737
China.1 China 1 212258
Afghanistan.2 Afghanistan 2 2666
Brazil.2 Brazil 2 80488
China.2 China 2 213766

2.1.2 reshape2::melt

table4a %>% 
  reshape2::melt(
    # id.vars = "country",
    id = 1,
    variable.name = "year",
    value.name = "cases"
  )
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766

2.1.3 data.table::melt

table4a %>% 
  data.table::melt(
    id = 1,
    variable.name = "year",
    value.name = "cases"
  )
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766

2.1.4 tidyr::gather

table4a %>% 
  gather(-country, key = "year", value = "cases")
country year cases
Afghanistan 1999 745
Brazil 1999 37737
China 1999 212258
Afghanistan 2000 2666
Brazil 2000 80488
China 2000 213766

2.1.5 tidyr::pivot_longer

table4a %>% 
  pivot_longer(-country,
               names_to = "year",
               values_to = "cases")
country year cases
Afghanistan 1999 745
Afghanistan 2000 2666
Brazil 1999 37737
Brazil 2000 80488
China 1999 212258
China 2000 213766

2.2 列名含多变量(Many variables in column namese)

原始数据

who_subset <- who %>% 
  slice_sample(n = 10)
who_subset
country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544 new_sp_m4554 new_sp_m5564 new_sp_m65 new_sp_f014 new_sp_f1524 new_sp_f2534 new_sp_f3544 new_sp_f4554 new_sp_f5564 new_sp_f65 new_sn_m014 new_sn_m1524 new_sn_m2534 new_sn_m3544 new_sn_m4554 new_sn_m5564 new_sn_m65 new_sn_f014 new_sn_f1524 new_sn_f2534 new_sn_f3544 new_sn_f4554 new_sn_f5564 new_sn_f65 new_ep_m014 new_ep_m1524 new_ep_m2534 new_ep_m3544 new_ep_m4554 new_ep_m5564 new_ep_m65 new_ep_f014 new_ep_f1524 new_ep_f2534 new_ep_f3544 new_ep_f4554 new_ep_f5564 new_ep_f65 newrel_m014 newrel_m1524 newrel_m2534 newrel_m3544 newrel_m4554 newrel_m5564 newrel_m65 newrel_f014 newrel_f1524 newrel_f2534 newrel_f3544 newrel_f4554 newrel_f5564 newrel_f65
Lithuania LT LTU 1981 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Iraq IQ IRQ 1985 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Haiti HT HTI 1998 188 804 971 656 331 177 142 208 827 958 620 300 141 119 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Greenland GL GRL 2000 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Nicaragua NI NIC 2006 15 162 151 129 98 90 72 25 168 144 90 65 38 38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Norway NO NOR 1988 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Czech Republic CZ CZE 2005 0 8 24 57 55 45 46 0 3 14 16 7 5 28 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
British Virgin Islands VG VGB 1997 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Belarus BY BLR 1980 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
China CN CHN 2000 1131 19111 29399 25206 25593 21429 21771 1420 14536 18496 12377 9899 7102 6296 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

2.2.1 tidyr::gather

who_subset %>% 
  gather(
    new_sp_m014:newrel_f65,
    key = "variable",
    value = "count"
  )%>% 
  dplyr::slice_head(n = 10)
country iso2 iso3 year variable count
Lithuania LT LTU 1981 new_sp_m014 NA
Iraq IQ IRQ 1985 new_sp_m014 NA
Haiti HT HTI 1998 new_sp_m014 188
Greenland GL GRL 2000 new_sp_m014 NA
Nicaragua NI NIC 2006 new_sp_m014 15
Norway NO NOR 1988 new_sp_m014 NA
Czech Republic CZ CZE 2005 new_sp_m014 0
British Virgin Islands VG VGB 1997 new_sp_m014 NA
Belarus BY BLR 1980 new_sp_m014 NA
China CN CHN 2000 new_sp_m014 1131

2.2.2 tidyr::pivot_longer

who_subset %>% 
  pivot_longer(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"), 
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  ) %>% 
  dplyr::slice_head(n = 10)
country iso2 iso3 year diagnosis gender age count
Lithuania LT LTU 1981 sp m 014 NA
Lithuania LT LTU 1981 sp m 1524 NA
Lithuania LT LTU 1981 sp m 2534 NA
Lithuania LT LTU 1981 sp m 3544 NA
Lithuania LT LTU 1981 sp m 4554 NA
Lithuania LT LTU 1981 sp m 5564 NA
Lithuania LT LTU 1981 sp m 65 NA
Lithuania LT LTU 1981 sp f 014 NA
Lithuania LT LTU 1981 sp f 1524 NA
Lithuania LT LTU 1981 sp f 2534 NA
who_subset %>% pivot_longer(
  cols = new_sp_m014:newrel_f65,
  names_to = c("diagnosis", "gender", "age"), 
  names_pattern = "new_?(.*)_(.)(.*)",
  names_transform = list(
    gender = ~ readr::parse_factor(.x, levels = c("f", "m")),
    age = ~ readr::parse_factor(
      .x,
      levels = c("014", "1524", "2534", "3544", "4554", "5564", "65"), 
      ordered = TRUE
    )
  ),
  values_to = "count",
)%>% 
  dplyr::slice_head(n = 10)
country iso2 iso3 year diagnosis gender age count
Lithuania LT LTU 1981 sp m 014 NA
Lithuania LT LTU 1981 sp m 1524 NA
Lithuania LT LTU 1981 sp m 2534 NA
Lithuania LT LTU 1981 sp m 3544 NA
Lithuania LT LTU 1981 sp m 4554 NA
Lithuania LT LTU 1981 sp m 5564 NA
Lithuania LT LTU 1981 sp m 65 NA
Lithuania LT LTU 1981 sp f 014 NA
Lithuania LT LTU 1981 sp f 1524 NA
Lithuania LT LTU 1981 sp f 2534 NA

2.3 每行含多个观测(Multiple observations per row)

原始数据

s2 <- "family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1         30 1998-11-26 2000-01-29         NA             1             2            NA
2         27 1996-06-22         NA         NA             2            NA            NA
3         26 2002-07-11 2004-04-05 2007-09-02             2             2             1
4         32 2004-10-10 2009-08-27 2012-07-21             1             1             1
5         29 2000-12-05 2005-02-28         NA             2             1            NA"
DT <- data.table::fread(s2)
DT %>% knitr::kable()
family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1 30 1998-11-26 2000-01-29 NA 1 2 NA
2 27 1996-06-22 NA NA 2 NA NA
3 26 2002-07-11 2004-04-05 2007-09-02 2 2 1
4 32 2004-10-10 2009-08-27 2012-07-21 1 1 1
5 29 2000-12-05 2005-02-28 NA 2 1 NA

2.3.1 data.table::melt

library(data.table)
DT %>% 
  data.table::melt(
    measure = patterns("^dob", "^gender"),
    variable.name = "child",
    value.name = c("dob", "gender")
    )
family_id age_mother child dob gender
1 30 1 1998-11-26 1
2 27 1 1996-06-22 2
3 26 1 2002-07-11 2
4 32 1 2004-10-10 1
5 29 1 2000-12-05 2
1 30 2 2000-01-29 2
2 27 2 NA NA
3 26 2 2004-04-05 2
4 32 2 2009-08-27 1
5 29 2 2005-02-28 1
1 30 3 NA NA
2 27 3 NA NA
3 26 3 2007-09-02 1
4 32 3 2012-07-21 1
5 29 3 NA NA

2.3.2 tidyr::pivot_longer

# method1
DT %>% 
  pivot_longer(
    cols = -1:-2,
    names_to = c(".value", "child"),
    names_sep = "_"
  )
family_id age_mother child dob gender
1 30 child1 1998-11-26 1
1 30 child2 2000-01-29 2
1 30 child3 NA NA
2 27 child1 1996-06-22 2
2 27 child2 NA NA
2 27 child3 NA NA
3 26 child1 2002-07-11 2
3 26 child2 2004-04-05 2
3 26 child3 2007-09-02 1
4 32 child1 2004-10-10 1
4 32 child2 2009-08-27 1
4 32 child3 2012-07-21 1
5 29 child1 2000-12-05 2
5 29 child2 2005-02-28 1
5 29 child3 NA NA
# method2
DT %>% 
  pivot_longer(
    cols = -1:-2,
    names_to = c(".value", "child"),
    names_pattern = '(.*)_.*(.)'
  )
family_id age_mother child dob gender
1 30 1 1998-11-26 1
1 30 2 2000-01-29 2
1 30 3 NA NA
2 27 1 1996-06-22 2
2 27 2 NA NA
2 27 3 NA NA
3 26 1 2002-07-11 2
3 26 2 2004-04-05 2
3 26 3 2007-09-02 1
4 32 1 2004-10-10 1
4 32 2 2009-08-27 1
4 32 3 2012-07-21 1
5 29 1 2000-12-05 2
5 29 2 2005-02-28 1
5 29 3 NA NA

2.3.3 一个简单数据

df <- tibble(id = 1:3, x1 = 4:6, x2 = 5:7, y1 = 7:9, y2 = 10:12)
df
id x1 x2 y1 y2
1 4 5 7 10
2 5 6 8 11
3 6 7 9 12
df %>% pivot_longer(!id, names_to = c(".value", "time"), names_pattern = "(.)(.)")
id time x y
1 1 4 7
1 2 5 10
2 1 5 8
2 2 6 11
3 1 6 9
3 2 7 12
# 注意与以下的区别
df %>% pivot_longer(!id, names_to = c("value", "time"), names_pattern = "(.)(.)", values_to = "results")
id value time results
1 x 1 4
1 x 2 5
1 y 1 7
1 y 2 10
2 x 1 5
2 x 2 6
2 y 1 8
2 y 2 11
3 x 1 6
3 x 2 7
3 y 1 9
3 y 2 12

3 长数据变宽数据

原始数据

fish_encounters
fish station seen
4842 Release 1
4842 I80_1 1
4842 Lisbon 1
4842 Rstr 1
4842 Base_TD 1
4842 BCE 1
4842 BCW 1
4842 BCE2 1
4842 BCW2 1
4842 MAE 1
4842 MAW 1
4843 Release 1
4843 I80_1 1
4843 Lisbon 1
4843 Rstr 1
4843 Base_TD 1
4843 BCE 1
4843 BCW 1
4843 BCE2 1
4843 BCW2 1
4843 MAE 1
4843 MAW 1
4844 Release 1
4844 I80_1 1
4844 Lisbon 1
4844 Rstr 1
4844 Base_TD 1
4844 BCE 1
4844 BCW 1
4844 BCE2 1
4844 BCW2 1
4844 MAE 1
4844 MAW 1
4845 Release 1
4845 I80_1 1
4845 Lisbon 1
4845 Rstr 1
4845 Base_TD 1
4847 Release 1
4847 I80_1 1
4847 Lisbon 1
4848 Release 1
4848 I80_1 1
4848 Lisbon 1
4848 Rstr 1
4849 Release 1
4849 I80_1 1
4850 Release 1
4850 I80_1 1
4850 Rstr 1
4850 Base_TD 1
4850 BCE 1
4850 BCW 1
4851 Release 1
4851 I80_1 1
4854 Release 1
4854 I80_1 1
4855 Release 1
4855 I80_1 1
4855 Lisbon 1
4855 Rstr 1
4855 Base_TD 1
4857 Release 1
4857 I80_1 1
4857 Lisbon 1
4857 Rstr 1
4857 Base_TD 1
4857 BCE 1
4857 BCW 1
4857 BCE2 1
4857 BCW2 1
4858 Release 1
4858 I80_1 1
4858 Lisbon 1
4858 Rstr 1
4858 Base_TD 1
4858 BCE 1
4858 BCW 1
4858 BCE2 1
4858 BCW2 1
4858 MAE 1
4858 MAW 1
4859 Release 1
4859 I80_1 1
4859 Lisbon 1
4859 Rstr 1
4859 Base_TD 1
4861 Release 1
4861 I80_1 1
4861 Lisbon 1
4861 Rstr 1
4861 Base_TD 1
4861 BCE 1
4861 BCW 1
4861 BCE2 1
4861 BCW2 1
4861 MAE 1
4861 MAW 1
4862 Release 1
4862 I80_1 1
4862 Lisbon 1
4862 Rstr 1
4862 Base_TD 1
4862 BCE 1
4862 BCW 1
4862 BCE2 1
4862 BCW2 1
4863 Release 1
4863 I80_1 1
4864 Release 1
4864 I80_1 1
4865 Release 1
4865 I80_1 1
4865 Lisbon 1

3.1 tidyr::spread

fish_encounters %>% 
  spread(station, seen, fill = 0)
fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
4842 1 1 1 1 1 1 1 1 1 1 1
4843 1 1 1 1 1 1 1 1 1 1 1
4844 1 1 1 1 1 1 1 1 1 1 1
4845 1 1 1 1 1 0 0 0 0 0 0
4847 1 1 1 0 0 0 0 0 0 0 0
4848 1 1 1 1 0 0 0 0 0 0 0
4849 1 1 0 0 0 0 0 0 0 0 0
4850 1 1 0 1 1 1 1 0 0 0 0
4851 1 1 0 0 0 0 0 0 0 0 0
4854 1 1 0 0 0 0 0 0 0 0 0
4855 1 1 1 1 1 0 0 0 0 0 0
4857 1 1 1 1 1 1 1 1 1 0 0
4858 1 1 1 1 1 1 1 1 1 1 1
4859 1 1 1 1 1 0 0 0 0 0 0
4861 1 1 1 1 1 1 1 1 1 1 1
4862 1 1 1 1 1 1 1 1 1 0 0
4863 1 1 0 0 0 0 0 0 0 0 0
4864 1 1 0 0 0 0 0 0 0 0 0
4865 1 1 1 0 0 0 0 0 0 0 0

3.2 tidyr::pivot_wider

fish_encounters %>% 
  pivot_wider(names_from = station, 
              values_from = seen, 
              values_fill = 0)
fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
4842 1 1 1 1 1 1 1 1 1 1 1
4843 1 1 1 1 1 1 1 1 1 1 1
4844 1 1 1 1 1 1 1 1 1 1 1
4845 1 1 1 1 1 0 0 0 0 0 0
4847 1 1 1 0 0 0 0 0 0 0 0
4848 1 1 1 1 0 0 0 0 0 0 0
4849 1 1 0 0 0 0 0 0 0 0 0
4850 1 1 0 1 1 1 1 0 0 0 0
4851 1 1 0 0 0 0 0 0 0 0 0
4854 1 1 0 0 0 0 0 0 0 0 0
4855 1 1 1 1 1 0 0 0 0 0 0
4857 1 1 1 1 1 1 1 1 1 0 0
4858 1 1 1 1 1 1 1 1 1 1 1
4859 1 1 1 1 1 0 0 0 0 0 0
4861 1 1 1 1 1 1 1 1 1 1 1
4862 1 1 1 1 1 1 1 1 1 0 0
4863 1 1 0 0 0 0 0 0 0 0 0
4864 1 1 0 0 0 0 0 0 0 0 0
4865 1 1 1 0 0 0 0 0 0 0 0