Import your data

library(tidyverse)

data <- read.csv("../00_data/myData.csv") %>% as_tibble()

Chapter 13

What are primary keys in your data?

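The primary key in my data is id, which should identify each game uniquely. The other variables I considered (minplayers, maxplayers, playingtime, yearpublished, and owned) describe a game rather than identify it; for example, several games share yearpublished 2008 and minplayers 2.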
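
A candidate key can be checked by counting how often each value occurs and keeping counts above one; a true key leaves no rows. The sketch below is only an illustration: `games` is a hypothetical three-row tibble built from values in the first rows of the data, so it says nothing definitive about the full 21,631 rows.

```r
library(dplyr)

# Hypothetical mini-table using values from the first rows of the data
games <- tibble(
    id            = c(30549, 13, 36218),
    primary       = c("Pandemic", "Catan", "Dominion"),
    yearpublished = c(2008, 1995, 2008)
)

# A primary key has no value that appears more than once
dup_id   <- games %>% count(id) %>% filter(n > 1)
dup_year <- games %>% count(yearpublished) %>% filter(n > 1)

nrow(dup_id)    # 0: every id is unique in this sample
nrow(dup_year)  # 1: 2008 repeats, so yearpublished is not a key here
```

Running the same `count()` and `filter()` pair on the full `data` would show whether id stays unique across every game.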

Can you divide your data into two?

Divide it using dplyr::select so that the two halves share a common variable, which you can then use to join them.

data_1half <- data %>% select(yearpublished:boardgamecategory) %>% head(10)
data_half2 <- data %>% select(boardgamecategory:wishing) %>% head(10)

Can you join the two together?

Use dplyr::left_join or other joining functions.

data_join <- left_join(data_1half, data_half2)
## Joining, by = "boardgamecategory"
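
Naming the join column explicitly removes the "Joining, by" message and documents which variable links the tables. A minimal sketch, assuming two hypothetical tibbles `left_tbl` and `right_tbl` that share an `id` column (these names, and the choice of `id` as the shared column, are for illustration and are not the split used above):

```r
library(dplyr)

# Hypothetical halves of a table that share the id column
left_tbl <- tibble(
    id      = c(30549, 13, 36218),
    primary = c("Pandemic", "Catan", "Dominion")
)
right_tbl <- tibble(
    id            = c(30549, 13, 36218),
    yearpublished = c(2008, 1995, 2008)
)

# State the key explicitly so the join does not have to guess it
joined <- left_join(left_tbl, right_tbl, by = "id")
joined
```

Because `id` is unique in both hypothetical tables, the result has one row per game. Joining on a column with repeated values can return more rows than the left table had.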

Chapter 14

Tools

Detect matches

data %>% 
    summarise(sum(str_detect(minplayers, "^2")))
## # A tibble: 1 × 1
##   `sum(str_detect(minplayers, "^2"))`
##                                 <int>
## 1                               14834
str_detect(data_1half$yearpublished, "2008$")
##  [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
sum(str_detect(data_1half$yearpublished, "2008$"))
## [1] 2
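
The patterns above rely on anchors: `^` ties the match to the start of the string and `$` to the end. A small sketch on a hypothetical character vector `years` (the integer columns used above appear to be treated as text in the same way):

```r
library(stringr)

# Hypothetical values, written as text
years <- c("2008", "2000", "1995", "2010")

starts_with_2 <- str_detect(years, "^2")  # TRUE TRUE FALSE TRUE
ends_with_8   <- str_detect(years, "8$")  # TRUE FALSE FALSE FALSE

# TRUE counts as 1, so sum() gives the number of matches
sum(starts_with_2)  # 3
```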

Extract matches

data %>%
    mutate(new_var_withN = str_extract(description, "[0-9]+")) %>%
    select(description, new_var_withN) 
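
`str_extract()` returns only the first match in each string and `NA` when nothing matches. A short sketch with hypothetical description strings (not taken from the data):

```r
library(stringr)

# Hypothetical descriptions for illustration
descriptions <- c("For 2 to 4 players", "No digits here", "Plays in 45 minutes")

# "[0-9]+" matches the first run of one or more digits
first_number <- str_extract(descriptions, "[0-9]+")
first_number  # "2" NA "45"
```

The result is a character vector, so it would need `as.integer()` before any arithmetic.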

Replacing matches

data %>% mutate(yearpublished_rep = yearpublished %>% str_replace("[0-5]$", "-"))
## # A tibble: 21,631 × 25
##        X   num     id primary    descr…¹ yearp…² minpl…³ maxpl…⁴ playi…⁵ minpl…⁶
##    <int> <int>  <int> <chr>      <chr>     <int>   <int>   <int>   <int>   <int>
##  1     1     0  30549 Pandemic   In Pan…    2008       2       4      45      45
##  2     2     1    822 Carcasson… Carcas…    2000       2       5      45      30
##  3     3     2     13 Catan      In CAT…    1995       3       4     120      60
##  4     4     3  68448 7 Wonders  You ar…    2010       2       7      30      30
##  5     5     4  36218 Dominion   &quot;…    2008       2       4      30      30
##  6     6     5   9209 Ticket to… With e…    2004       2       5      60      30
##  7     7     6 178900 Codenames  Codena…    2015       2       8      15      15
##  8     8     7 167791 Terraform… In the…    2016       1       5     120     120
##  9     9     8 173346 7 Wonders… In man…    2015       2       2      30      30
## 10    10     9  31260 Agricola   Descri…    2007       1       5     150      30
## # … with 21,621 more rows, 15 more variables: maxplaytime <int>, minage <int>,
## #   boardgamecategory <chr>, boardgamemechanic <chr>, boardgamefamily <chr>,
## #   boardgameexpansion <chr>, boardgameimplementation <chr>,
## #   boardgamedesigner <chr>, boardgameartist <chr>, boardgamepublisher <chr>,
## #   owned <int>, trading <int>, wanting <int>, wishing <int>,
## #   yearpublished_rep <chr>, and abbreviated variable names ¹​description,
## #   ²​yearpublished, ³​minplayers, ⁴​maxplayers, ⁵​playingtime, ⁶​minplaytime
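
The replacement above only touches a final digit between 0 and 5, so years ending in 6 to 9 pass through unchanged. A sketch on a hypothetical character vector `years`, using values that appear in the first rows:

```r
library(stringr)

# Hypothetical values, written as text
years <- c("2008", "2000", "1995", "2010")

# Replace the last character with "-" only when it is a digit from 0 to 5
replaced <- str_replace(years, "[0-5]$", "-")
replaced  # "2008" "200-" "199-" "201-"
```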