数据集有长数据
、宽数据
之分。原生数据一般为宽数据,以其适合录入、查看、比较,这类数据的基本特点是:每一个观测(observation)占一行,每一个测量值(measurement)占一列。这时并不是每一列都是变量(variable),也就是说一个变量可能分解占位在不同的列。而长数据则是各类分析软件需要的格式,这类数据的特点是:每一个独特的变量集中在某列,观测位于一行。以下为各种类型数据示例:
tidyr::table4a
country | 1999 | 2000 |
---|---|---|
Afghanistan | 745 | 2666 |
Brazil | 37737 | 80488 |
China | 212258 | 213766 |
tidyr::table4a %>% gather(2:3, key = "year", value = cases)
country | year | cases |
---|---|---|
Afghanistan | 1999 | 745 |
Brazil | 1999 | 37737 |
China | 1999 | 212258 |
Afghanistan | 2000 | 2666 |
Brazil | 2000 | 80488 |
China | 2000 | 213766 |
tidyr::table5 %>% mutate(year = str_c(century, year)) %>% select(-2)
country | year | rate |
---|---|---|
Afghanistan | 1999 | 745/19987071 |
Afghanistan | 2000 | 2666/20595360 |
Brazil | 1999 | 37737/172006362 |
Brazil | 2000 | 80488/174504898 |
China | 1999 | 212258/1272915272 |
China | 2000 | 213766/1280428583 |
需要说明的是,判断数据是长还是宽数据,这取决于您所分析数据集中的变量。在矩形数据集(Rectangle Data)不是所有的列名称都是变量。是否是变量取决于您分析和计算的目的。也就是说在你本次分析和计算过程中,同一变量在同一列,观测(observation)在各自的行,才是真正的tidy data。
The ambiguity comes from the definition of tidy data. Tidiness depends on the variables in your data set. But what is a variable depends on what you are trying to do. To identify the variables that you need to work with, describe what you want to do with an equation. Each variable in the equation should correspond to a variable in your data. Primer
变量是可以测量的一种数量、属性或特征。 A variable is a quantity, quality, or property that you can measure.
变量值是测量变量时的状态。 A value is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.
观测是为了同一个目的同时测量的一组测量结果。 An observation or case is a set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation will contain several values, each associated with a different variable. I’ll sometimes refer to an observation as a case or data point.
病例一览表(Cases list)是公共卫生领域最常见的宽数据。
suppressPackageStartupMessages(library(tidyverse))
data(Oswego, package = "epiDisplay")
wide_data <- Oswego %>%
as_tibble() %>%
slice_sample(n = 5)
wide_data
age | sex | timesupper | ill | onsetdate | onsettime | bakedham | spinach | mashedpota | cabbagesal | jello | rolls | brownbread | milk | coffee | water | cakes | vanilla | chocolate | fruitsalad |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
62 | F | NA | TRUE | 04/19 | 30 | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE |
12 | F | 2200 | TRUE | 04/19 | 100 | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE | TRUE | FALSE |
13 | F | NA | FALSE | NA | NA | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
25 | M | NA | TRUE | 04/18 | 2300 | TRUE | FALSE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | TRUE | TRUE | FALSE |
48 | F | NA | TRUE | 04/18 | 2400 | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE |
# %>%
# flextable::flextable()
“Tidy data sets are all alike; but every messy data set is messy in its own way.” — Hadley Wickham
who %>%
slice_sample(n = 10)
country | iso2 | iso3 | year | new_sp_m014 | new_sp_m1524 | new_sp_m2534 | new_sp_m3544 | new_sp_m4554 | new_sp_m5564 | new_sp_m65 | new_sp_f014 | new_sp_f1524 | new_sp_f2534 | new_sp_f3544 | new_sp_f4554 | new_sp_f5564 | new_sp_f65 | new_sn_m014 | new_sn_m1524 | new_sn_m2534 | new_sn_m3544 | new_sn_m4554 | new_sn_m5564 | new_sn_m65 | new_sn_f014 | new_sn_f1524 | new_sn_f2534 | new_sn_f3544 | new_sn_f4554 | new_sn_f5564 | new_sn_f65 | new_ep_m014 | new_ep_m1524 | new_ep_m2534 | new_ep_m3544 | new_ep_m4554 | new_ep_m5564 | new_ep_m65 | new_ep_f014 | new_ep_f1524 | new_ep_f2534 | new_ep_f3544 | new_ep_f4554 | new_ep_f5564 | new_ep_f65 | newrel_m014 | newrel_m1524 | newrel_m2534 | newrel_m3544 | newrel_m4554 | newrel_m5564 | newrel_m65 | newrel_f014 | newrel_f1524 | newrel_f2534 | newrel_f3544 | newrel_f4554 | newrel_f5564 | newrel_f65 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
South Africa | ZA | ZAF | 1981 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Sudan | SD | SDN | 2004 | 537 | 1377 | 1791 | 1465 | 1035 | 697 | 467 | 426 | 978 | 1187 | 897 | 601 | 400 | 237 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Malaysia | MY | MYS | 1987 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Aruba | AW | ABW | 2004 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
United Arab Emirates | AE | ARE | 2012 | 0 | 2 | 4 | 4 | 5 | 5 | 2 | 0 | 5 | 2 | 2 | 3 | 4 | 4 | 0 | 0 | 3 | 1 | 1 | 1 | 2 | 1 | 4 | 0 | 0 | 1 | 0 | 1 | 2 | 1 | 3 | 3 | 0 | 0 | 2 | 0 | 4 | 2 | 1 | 1 | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Serbia | RS | SRB | 2011 | 2 | 60 | 73 | 74 | 122 | 112 | 101 | 5 | 46 | 59 | 43 | 30 | 20 | 129 | 6 | 11 | 25 | 23 | 53 | 49 | 62 | 4 | 16 | 13 | 18 | 21 | 25 | 70 | 1 | 4 | 10 | 6 | 15 | 18 | 19 | 1 | 9 | 9 | 13 | 9 | 9 | 32 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Sweden | SE | SWE | 2000 | 0 | 9 | 10 | 12 | 11 | 4 | 25 | 1 | 9 | 8 | 10 | 2 | 2 | 15 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Aruba | AW | ABW | 2010 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Montserrat | MS | MSR | 1986 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
United Arab Emirates | AE | ARE | 2002 | 1 | 2 | 0 | 6 | 6 | 10 | 0 | 3 | 3 | 8 | 3 | 4 | 10 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
整洁数据,需要满足以下条件:
原始数据
table4a
country | 1999 | 2000 |
---|---|---|
Afghanistan | 745 | 2666 |
Brazil | 37737 | 80488 |
China | 212258 | 213766 |
table4a %>%
as.data.frame() %>%
stats::reshape(
idvar = "country",
timevar = "year",
varying = list(2:3),
v.names = "cases",
direction = "long")
country | year | cases | |
---|---|---|---|
Afghanistan.1 | Afghanistan | 1 | 745 |
Brazil.1 | Brazil | 1 | 37737 |
China.1 | China | 1 | 212258 |
Afghanistan.2 | Afghanistan | 2 | 2666 |
Brazil.2 | Brazil | 2 | 80488 |
China.2 | China | 2 | 213766 |
table4a %>%
reshape2::melt(
# id.vars = "country",
id = 1,
variable.name = "year",
value.name = "cases"
)
country | year | cases |
---|---|---|
Afghanistan | 1999 | 745 |
Brazil | 1999 | 37737 |
China | 1999 | 212258 |
Afghanistan | 2000 | 2666 |
Brazil | 2000 | 80488 |
China | 2000 | 213766 |
table4a %>%
data.table::melt(
id = 1,
variable.name = "year",
value.name = "cases"
)
country | year | cases |
---|---|---|
Afghanistan | 1999 | 745 |
Brazil | 1999 | 37737 |
China | 1999 | 212258 |
Afghanistan | 2000 | 2666 |
Brazil | 2000 | 80488 |
China | 2000 | 213766 |
table4a %>%
gather(-country, key = "year", value = "cases")
country | year | cases |
---|---|---|
Afghanistan | 1999 | 745 |
Brazil | 1999 | 37737 |
China | 1999 | 212258 |
Afghanistan | 2000 | 2666 |
Brazil | 2000 | 80488 |
China | 2000 | 213766 |
table4a %>%
pivot_longer(-country,
names_to = "year",
values_to = "cases")
country | year | cases |
---|---|---|
Afghanistan | 1999 | 745 |
Afghanistan | 2000 | 2666 |
Brazil | 1999 | 37737 |
Brazil | 2000 | 80488 |
China | 1999 | 212258 |
China | 2000 | 213766 |
原始数据
who_subset <- who %>%
slice_sample(n = 10)
who_subset
country | iso2 | iso3 | year | new_sp_m014 | new_sp_m1524 | new_sp_m2534 | new_sp_m3544 | new_sp_m4554 | new_sp_m5564 | new_sp_m65 | new_sp_f014 | new_sp_f1524 | new_sp_f2534 | new_sp_f3544 | new_sp_f4554 | new_sp_f5564 | new_sp_f65 | new_sn_m014 | new_sn_m1524 | new_sn_m2534 | new_sn_m3544 | new_sn_m4554 | new_sn_m5564 | new_sn_m65 | new_sn_f014 | new_sn_f1524 | new_sn_f2534 | new_sn_f3544 | new_sn_f4554 | new_sn_f5564 | new_sn_f65 | new_ep_m014 | new_ep_m1524 | new_ep_m2534 | new_ep_m3544 | new_ep_m4554 | new_ep_m5564 | new_ep_m65 | new_ep_f014 | new_ep_f1524 | new_ep_f2534 | new_ep_f3544 | new_ep_f4554 | new_ep_f5564 | new_ep_f65 | newrel_m014 | newrel_m1524 | newrel_m2534 | newrel_m3544 | newrel_m4554 | newrel_m5564 | newrel_m65 | newrel_f014 | newrel_f1524 | newrel_f2534 | newrel_f3544 | newrel_f4554 | newrel_f5564 | newrel_f65 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Japan | JP | JPN | 2011 | 0 | 96 | 215 | 367 | 465 | 812 | 3256 | 5 | 94 | 213 | 203 | 148 | 223 | 1840 | 18 | 240 | 436 | 501 | 479 | 748 | 2651 | 35 | 227 | 435 | 353 | 270 | 307 | 1531 | 17 | 54 | 112 | 171 | 169 | 268 | 1734 | 5 | 43 | 121 | 162 | 132 | 248 | 1590 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Algeria | DZ | DZA | 2010 | 52 | 1203 | 1669 | 825 | 513 | 392 | 397 | 79 | 1086 | 826 | 417 | 251 | 222 | 367 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Argentina | AR | ARG | 2006 | 67 | 519 | 484 | 360 | 351 | 346 | 321 | 74 | 438 | 437 | 235 | 197 | 173 | 213 | 394 | 196 | 235 | 143 | 153 | 176 | 187 | 345 | 187 | 185 | 119 | 75 | 90 | 127 | 32 | 111 | 122 | 94 | 92 | 76 | 91 | 36 | 87 | 91 | 69 | 59 | 46 | 75 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Japan | JP | JPN | 2002 | 2 | 191 | 549 | 579 | 1192 | 1334 | 3747 | 3 | 192 | 395 | 259 | 248 | 308 | 1808 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Kazakhstan | KZ | KAZ | 1983 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Northern Mariana Islands | MP | MNP | 2012 | 0 | 0 | 0 | 3 | 1 | 1 | 0 | 0 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 3 | 2 | 2 | 0 | 0 | 4 | 3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Tunisia | TN | TUN | 2000 | 16 | 139 | 208 | 156 | 109 | 65 | 101 | 7 | 68 | 59 | 43 | 21 | 21 | 58 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Malaysia | MY | MYS | 1989 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Romania | RO | ROU | 1987 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Turks and Caicos Islands | TC | TCA | 1996 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
who_subset %>%
gather(
new_sp_m014:newrel_f65,
key = "variable",
value = "count"
)%>%
dplyr::slice_head(n = 10)
country | iso2 | iso3 | year | variable | count |
---|---|---|---|---|---|
Japan | JP | JPN | 2011 | new_sp_m014 | 0 |
Algeria | DZ | DZA | 2010 | new_sp_m014 | 52 |
Argentina | AR | ARG | 2006 | new_sp_m014 | 67 |
Japan | JP | JPN | 2002 | new_sp_m014 | 2 |
Kazakhstan | KZ | KAZ | 1983 | new_sp_m014 | NA |
Northern Mariana Islands | MP | MNP | 2012 | new_sp_m014 | 0 |
Tunisia | TN | TUN | 2000 | new_sp_m014 | 16 |
Malaysia | MY | MYS | 1989 | new_sp_m014 | NA |
Romania | RO | ROU | 1987 | new_sp_m014 | NA |
Turks and Caicos Islands | TC | TCA | 1996 | new_sp_m014 | NA |
who_subset %>%
pivot_longer(
cols = new_sp_m014:newrel_f65,
names_to = c("diagnosis", "gender", "age"),
names_pattern = "new_?(.*)_(.)(.*)",
values_to = "count"
) %>%
dplyr::slice_head(n = 10)
country | iso2 | iso3 | year | diagnosis | gender | age | count |
---|---|---|---|---|---|---|---|
Japan | JP | JPN | 2011 | sp | m | 014 | 0 |
Japan | JP | JPN | 2011 | sp | m | 1524 | 96 |
Japan | JP | JPN | 2011 | sp | m | 2534 | 215 |
Japan | JP | JPN | 2011 | sp | m | 3544 | 367 |
Japan | JP | JPN | 2011 | sp | m | 4554 | 465 |
Japan | JP | JPN | 2011 | sp | m | 5564 | 812 |
Japan | JP | JPN | 2011 | sp | m | 65 | 3256 |
Japan | JP | JPN | 2011 | sp | f | 014 | 5 |
Japan | JP | JPN | 2011 | sp | f | 1524 | 94 |
Japan | JP | JPN | 2011 | sp | f | 2534 | 213 |
who_subset %>% pivot_longer(
cols = new_sp_m014:newrel_f65,
names_to = c("diagnosis", "gender", "age"),
names_pattern = "new_?(.*)_(.)(.*)",
names_transform = list(
gender = ~ readr::parse_factor(.x, levels = c("f", "m")),
age = ~ readr::parse_factor(
.x,
levels = c("014", "1524", "2534", "3544", "4554", "5564", "65"),
ordered = TRUE
)
),
values_to = "count",
)%>%
dplyr::slice_head(n = 10)
country | iso2 | iso3 | year | diagnosis | gender | age | count |
---|---|---|---|---|---|---|---|
Japan | JP | JPN | 2011 | sp | m | 014 | 0 |
Japan | JP | JPN | 2011 | sp | m | 1524 | 96 |
Japan | JP | JPN | 2011 | sp | m | 2534 | 215 |
Japan | JP | JPN | 2011 | sp | m | 3544 | 367 |
Japan | JP | JPN | 2011 | sp | m | 4554 | 465 |
Japan | JP | JPN | 2011 | sp | m | 5564 | 812 |
Japan | JP | JPN | 2011 | sp | m | 65 | 3256 |
Japan | JP | JPN | 2011 | sp | f | 014 | 5 |
Japan | JP | JPN | 2011 | sp | f | 1524 | 94 |
Japan | JP | JPN | 2011 | sp | f | 2534 | 213 |
原始数据
s2 <- "family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1 30 1998-11-26 2000-01-29 NA 1 2 NA
2 27 1996-06-22 NA NA 2 NA NA
3 26 2002-07-11 2004-04-05 2007-09-02 2 2 1
4 32 2004-10-10 2009-08-27 2012-07-21 1 1 1
5 29 2000-12-05 2005-02-28 NA 2 1 NA"
DT <- data.table::fread(s2)
DT %>% knitr::kable()
family_id | age_mother | dob_child1 | dob_child2 | dob_child3 | gender_child1 | gender_child2 | gender_child3 |
---|---|---|---|---|---|---|---|
1 | 30 | 1998-11-26 | 2000-01-29 | NA | 1 | 2 | NA |
2 | 27 | 1996-06-22 | NA | NA | 2 | NA | NA |
3 | 26 | 2002-07-11 | 2004-04-05 | 2007-09-02 | 2 | 2 | 1 |
4 | 32 | 2004-10-10 | 2009-08-27 | 2012-07-21 | 1 | 1 | 1 |
5 | 29 | 2000-12-05 | 2005-02-28 | NA | 2 | 1 | NA |
library(data.table)
DT %>%
data.table::melt(
measure = patterns("^dob", "^gender"),
variable.name = "child",
value.name = c("dob", "gender")
)
family_id | age_mother | child | dob | gender |
---|---|---|---|---|
1 | 30 | 1 | 1998-11-26 | 1 |
2 | 27 | 1 | 1996-06-22 | 2 |
3 | 26 | 1 | 2002-07-11 | 2 |
4 | 32 | 1 | 2004-10-10 | 1 |
5 | 29 | 1 | 2000-12-05 | 2 |
1 | 30 | 2 | 2000-01-29 | 2 |
2 | 27 | 2 | NA | NA |
3 | 26 | 2 | 2004-04-05 | 2 |
4 | 32 | 2 | 2009-08-27 | 1 |
5 | 29 | 2 | 2005-02-28 | 1 |
1 | 30 | 3 | NA | NA |
2 | 27 | 3 | NA | NA |
3 | 26 | 3 | 2007-09-02 | 1 |
4 | 32 | 3 | 2012-07-21 | 1 |
5 | 29 | 3 | NA | NA |
DT %>%
pivot_longer(
cols = -1:-2,
names_to = c(".value", "child"),
names_sep = "_"
)
family_id | age_mother | child | dob | gender |
---|---|---|---|---|
1 | 30 | child1 | 1998-11-26 | 1 |
1 | 30 | child2 | 2000-01-29 | 2 |
1 | 30 | child3 | NA | NA |
2 | 27 | child1 | 1996-06-22 | 2 |
2 | 27 | child2 | NA | NA |
2 | 27 | child3 | NA | NA |
3 | 26 | child1 | 2002-07-11 | 2 |
3 | 26 | child2 | 2004-04-05 | 2 |
3 | 26 | child3 | 2007-09-02 | 1 |
4 | 32 | child1 | 2004-10-10 | 1 |
4 | 32 | child2 | 2009-08-27 | 1 |
4 | 32 | child3 | 2012-07-21 | 1 |
5 | 29 | child1 | 2000-12-05 | 2 |
5 | 29 | child2 | 2005-02-28 | 1 |
5 | 29 | child3 | NA | NA |
DT %>%
pivot_longer(
cols = -1:-2,
names_to = c(".value", "child"),
names_pattern = '(.*)_.*(.)'
)
family_id | age_mother | child | dob | gender |
---|---|---|---|---|
1 | 30 | 1 | 1998-11-26 | 1 |
1 | 30 | 2 | 2000-01-29 | 2 |
1 | 30 | 3 | NA | NA |
2 | 27 | 1 | 1996-06-22 | 2 |
2 | 27 | 2 | NA | NA |
2 | 27 | 3 | NA | NA |
3 | 26 | 1 | 2002-07-11 | 2 |
3 | 26 | 2 | 2004-04-05 | 2 |
3 | 26 | 3 | 2007-09-02 | 1 |
4 | 32 | 1 | 2004-10-10 | 1 |
4 | 32 | 2 | 2009-08-27 | 1 |
4 | 32 | 3 | 2012-07-21 | 1 |
5 | 29 | 1 | 2000-12-05 | 2 |
5 | 29 | 2 | 2005-02-28 | 1 |
5 | 29 | 3 | NA | NA |
df <- tibble(id = 1:3, x1 = 4:6, x2 = 5:7, y1 = 7:9, y2 = 10:12)
df
id | x1 | x2 | y1 | y2 |
---|---|---|---|---|
1 | 4 | 5 | 7 | 10 |
2 | 5 | 6 | 8 | 11 |
3 | 6 | 7 | 9 | 12 |
df %>% pivot_longer(!id, names_to = ".value", names_pattern = "(.).")
id | x | y |
---|---|---|
1 | 4 | 7 |
1 | 5 | 10 |
2 | 5 | 8 |
2 | 6 | 11 |
3 | 6 | 9 |
3 | 7 | 12 |
原始数据
fish_encounters
fish | station | seen |
---|---|---|
4842 | Release | 1 |
4842 | I80_1 | 1 |
4842 | Lisbon | 1 |
4842 | Rstr | 1 |
4842 | Base_TD | 1 |
4842 | BCE | 1 |
4842 | BCW | 1 |
4842 | BCE2 | 1 |
4842 | BCW2 | 1 |
4842 | MAE | 1 |
4842 | MAW | 1 |
4843 | Release | 1 |
4843 | I80_1 | 1 |
4843 | Lisbon | 1 |
4843 | Rstr | 1 |
4843 | Base_TD | 1 |
4843 | BCE | 1 |
4843 | BCW | 1 |
4843 | BCE2 | 1 |
4843 | BCW2 | 1 |
4843 | MAE | 1 |
4843 | MAW | 1 |
4844 | Release | 1 |
4844 | I80_1 | 1 |
4844 | Lisbon | 1 |
4844 | Rstr | 1 |
4844 | Base_TD | 1 |
4844 | BCE | 1 |
4844 | BCW | 1 |
4844 | BCE2 | 1 |
4844 | BCW2 | 1 |
4844 | MAE | 1 |
4844 | MAW | 1 |
4845 | Release | 1 |
4845 | I80_1 | 1 |
4845 | Lisbon | 1 |
4845 | Rstr | 1 |
4845 | Base_TD | 1 |
4847 | Release | 1 |
4847 | I80_1 | 1 |
4847 | Lisbon | 1 |
4848 | Release | 1 |
4848 | I80_1 | 1 |
4848 | Lisbon | 1 |
4848 | Rstr | 1 |
4849 | Release | 1 |
4849 | I80_1 | 1 |
4850 | Release | 1 |
4850 | I80_1 | 1 |
4850 | Rstr | 1 |
4850 | Base_TD | 1 |
4850 | BCE | 1 |
4850 | BCW | 1 |
4851 | Release | 1 |
4851 | I80_1 | 1 |
4854 | Release | 1 |
4854 | I80_1 | 1 |
4855 | Release | 1 |
4855 | I80_1 | 1 |
4855 | Lisbon | 1 |
4855 | Rstr | 1 |
4855 | Base_TD | 1 |
4857 | Release | 1 |
4857 | I80_1 | 1 |
4857 | Lisbon | 1 |
4857 | Rstr | 1 |
4857 | Base_TD | 1 |
4857 | BCE | 1 |
4857 | BCW | 1 |
4857 | BCE2 | 1 |
4857 | BCW2 | 1 |
4858 | Release | 1 |
4858 | I80_1 | 1 |
4858 | Lisbon | 1 |
4858 | Rstr | 1 |
4858 | Base_TD | 1 |
4858 | BCE | 1 |
4858 | BCW | 1 |
4858 | BCE2 | 1 |
4858 | BCW2 | 1 |
4858 | MAE | 1 |
4858 | MAW | 1 |
4859 | Release | 1 |
4859 | I80_1 | 1 |
4859 | Lisbon | 1 |
4859 | Rstr | 1 |
4859 | Base_TD | 1 |
4861 | Release | 1 |
4861 | I80_1 | 1 |
4861 | Lisbon | 1 |
4861 | Rstr | 1 |
4861 | Base_TD | 1 |
4861 | BCE | 1 |
4861 | BCW | 1 |
4861 | BCE2 | 1 |
4861 | BCW2 | 1 |
4861 | MAE | 1 |
4861 | MAW | 1 |
4862 | Release | 1 |
4862 | I80_1 | 1 |
4862 | Lisbon | 1 |
4862 | Rstr | 1 |
4862 | Base_TD | 1 |
4862 | BCE | 1 |
4862 | BCW | 1 |
4862 | BCE2 | 1 |
4862 | BCW2 | 1 |
4863 | Release | 1 |
4863 | I80_1 | 1 |
4864 | Release | 1 |
4864 | I80_1 | 1 |
4865 | Release | 1 |
4865 | I80_1 | 1 |
4865 | Lisbon | 1 |
fish_encounters %>%
spread(station, seen, fill = 0)
fish | Release | I80_1 | Lisbon | Rstr | Base_TD | BCE | BCW | BCE2 | BCW2 | MAE | MAW |
---|---|---|---|---|---|---|---|---|---|---|---|
4842 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4843 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4844 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4845 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4847 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4848 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4849 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4850 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
4851 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4854 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4855 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4857 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
4858 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4859 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4861 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4862 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
4863 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4864 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4865 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
fish_encounters %>%
pivot_wider(names_from = station,
values_from = seen,
values_fill = 0)
fish | Release | I80_1 | Lisbon | Rstr | Base_TD | BCE | BCW | BCE2 | BCW2 | MAE | MAW |
---|---|---|---|---|---|---|---|---|---|---|---|
4842 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4843 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4844 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4845 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4847 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4848 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4849 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4850 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
4851 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4854 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4855 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4857 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
4858 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4859 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4861 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4862 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
4863 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4864 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4865 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |