library(tidyverse)  # loads dplyr, stringr, tibble, and the %>% pipe used below
data <- read.csv("../00_data/myData.csv") %>% as_tibble()
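Key variables in my data are minplayers, maxplayers, playingtime, yearpublished, owned, and id. A primary key must identify each row uniquely, so only a variable whose values never repeat can serve as one. minplayers and yearpublished clearly repeat (several games have minplayers = 2, and two of the first ten games were published in 2008), which makes `id` the natural candidate.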
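Whether a variable can act as a primary key is easy to check: count each value and keep any that occur more than once. The sketch below uses a hypothetical toy tibble built from the first four printed rows (`games` is an illustrative name, not part of the original data). On the real data the same check would be `data %>% count(id) %>% filter(n > 1)`.

```r
library(dplyr)

# Toy stand-in for myData.csv: id and minplayers from the first four printed rows
games <- tibble(
  id = c(30549L, 822L, 13L, 68448L),
  minplayers = c(2L, 2L, 3L, 2L)
)

# A primary key has no repeated values: count() each value, keep counts above 1
dup_id <- games %>% count(id) %>% filter(n > 1)
dup_minplayers <- games %>% count(minplayers) %>% filter(n > 1)

nrow(dup_id)          # 0: id is unique here
nrow(dup_minplayers)  # 1: minplayers = 2 repeats, so it is not a key
```

An empty result means no value repeats in the rows checked; it does not by itself guarantee uniqueness for data added later.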
Split the data into two parts with dplyr::select so that both parts share a common variable, which can then be used to join them back together.
data_1half <- data %>% select(yearpublished:boardgamecategory) %>% head(10)
data_half2 <- data %>% select(boardgamecategory:wishing) %>% head(10)
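The split above shares `boardgamecategory`, the column where the two `select()` ranges overlap. An alternative, sketched below under the assumption that `id` is unique, is to list `id` explicitly in both `select()` calls so the shared column is the key. The toy tibble and the names `half_1` and `half_2` are hypothetical; the values come from the first three printed rows.

```r
library(dplyr)

# Toy stand-in for myData.csv with a few of its columns
games <- tibble(
  id = c(30549L, 822L, 13L),
  yearpublished = c(2008L, 2000L, 1995L),
  minplayers = c(2L, 2L, 3L),
  maxplayers = c(4L, 5L, 4L),
  playingtime = c(45L, 45L, 120L)
)

# Put `id` in both halves so they can later be joined on it
half_1 <- games %>% select(id, yearpublished:minplayers)
half_2 <- games %>% select(id, maxplayers:playingtime)

# The only column the two halves have in common
intersect(names(half_1), names(half_2))  # "id"
```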
Join the two parts with dplyr::left_join (or another dplyr joining function).
data_join <- left_join(data_1half, data_half2)
## Joining, by = "boardgamecategory"
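The message shows that `left_join()` chose `boardgamecategory` because it is the only column the two parts share. A category is not unique per game, so if a category value appears more than once in either part, each matching row on the left pairs with every matching row on the right and the result gains rows. The sketch below uses hypothetical toy halves (`x_half`, `y_half`) to show this and names the key with `by`, which makes the join explicit and suppresses the "Joining, by" message. Recent dplyr versions may also warn about a many-to-many relationship here.

```r
library(dplyr)

# Hypothetical halves whose shared key value is not unique
x_half <- tibble(category = c("Strategy", "Strategy"), id = c(1L, 2L))
y_half <- tibble(category = c("Strategy", "Strategy"), owned = c(10L, 20L))

# Name the key explicitly instead of relying on the guessed common column
joined <- left_join(x_half, y_half, by = "category")

nrow(joined)  # 4: each of the 2 left rows matches both right rows
```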
data %>%
  summarise(sum(str_detect(minplayers, "^2")))
## # A tibble: 1 × 1
## `sum(str_detect(minplayers, "^2"))`
## <int>
## 1 14834
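`str_detect()` converts the integer `minplayers` to text before matching, so `"^2"` matches any value whose first digit is 2, not only the value 2. The toy vector below is hypothetical (a value of 20 is included only to show the difference); comparing with `== 2` counts exact matches instead.

```r
library(stringr)

# Hypothetical minplayers values
minplayers <- c(1L, 2L, 20L, 3L)

# "^2" matches any number that starts with the digit 2, including 20
starts_with_2 <- sum(str_detect(minplayers, "^2"))  # 2

# An exact numeric comparison counts only the value 2
exactly_2 <- sum(minplayers == 2)                   # 1
```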
str_detect(data_1half$yearpublished, "2008$")
## [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
sum(str_detect(data_1half$yearpublished, "2008$"))
## [1] 2
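`str_detect()` returns one logical value per element, and R treats `TRUE` as 1 and `FALSE` as 0, so `sum()` counts matches and `mean()` gives the share of matches. A small sketch with hypothetical years taken from the printed rows:

```r
library(stringr)

years <- c(2008L, 2000L, 1995L, 2008L)

# One TRUE/FALSE per element: does the year end in 2008?
matches <- str_detect(years, "2008$")  # TRUE FALSE FALSE TRUE

sum(matches)   # 2: number of matching years
mean(matches)  # 0.5: proportion of matching years
```

For four-digit years, `"2008$"` gives the same result as `years == 2008`.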
data %>%
  mutate(new_var_withN = str_extract(description, "[0-9]+")) %>%
  select(description, new_var_withN)
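`str_extract()` returns only the first run of digits in each string, as character, and `NA` when there is none; `str_extract_all()` returns every run as a list. The descriptions below are hypothetical, not taken from the data.

```r
library(stringr)

# Hypothetical description strings
desc <- c("For 2 to 4 players", "No digits here", "First printed in 1995")

# First run of one or more digits per string; NA if there is none
first_num <- str_extract(desc, "[0-9]+")      # "2" NA "1995"

# Every run of digits, one character vector per string
all_nums <- str_extract_all(desc, "[0-9]+")   # "2" "4" / character(0) / "1995"

# Convert to numbers if arithmetic is needed
as.integer(first_num)                         # 2 NA 1995
```

The result of the first call is still character, so `new_var_withN` will be a `<chr>` column unless it is converted.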
data %>% mutate(yearpublished_rep = yearpublished %>% str_replace("[0-5]$", "-"))
## # A tibble: 21,631 × 25
## X num id primary descr…¹ yearp…² minpl…³ maxpl…⁴ playi…⁵ minpl…⁶
## <int> <int> <int> <chr> <chr> <int> <int> <int> <int> <int>
## 1 1 0 30549 Pandemic In Pan… 2008 2 4 45 45
## 2 2 1 822 Carcasson… Carcas… 2000 2 5 45 30
## 3 3 2 13 Catan In CAT… 1995 3 4 120 60
## 4 4 3 68448 7 Wonders You ar… 2010 2 7 30 30
## 5 5 4 36218 Dominion "… 2008 2 4 30 30
## 6 6 5 9209 Ticket to… With e… 2004 2 5 60 30
## 7 7 6 178900 Codenames Codena… 2015 2 8 15 15
## 8 8 7 167791 Terraform… In the… 2016 1 5 120 120
## 9 9 8 173346 7 Wonders… In man… 2015 2 2 30 30
## 10 10 9 31260 Agricola Descri… 2007 1 5 150 30
## # … with 21,621 more rows, 15 more variables: maxplaytime <int>, minage <int>,
## # boardgamecategory <chr>, boardgamemechanic <chr>, boardgamefamily <chr>,
## # boardgameexpansion <chr>, boardgameimplementation <chr>,
## # boardgamedesigner <chr>, boardgameartist <chr>, boardgamepublisher <chr>,
## # owned <int>, trading <int>, wanting <int>, wishing <int>,
## # yearpublished_rep <chr>, and abbreviated variable names ¹description,
## # ²yearpublished, ³minplayers, ⁴maxplayers, ⁵playingtime, ⁶minplaytime
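The pattern `"[0-5]$"` matches a final digit between 0 and 5, and `str_replace()` swaps that digit for `"-"`; years ending in 6 to 9 are left unchanged. Because the integer years are converted to text, `yearpublished_rep` is a `<chr>` column. A sketch with years from the printed rows:

```r
library(stringr)

years <- c(2008L, 2000L, 1995L, 2010L)

# Replace a last digit of 0-5 with "-"; 2008 ends in 8, so it is unchanged
years_rep <- str_replace(years, "[0-5]$", "-")  # "2008" "200-" "199-" "201-"
```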