Week 9: Apply it to your data 8

Import your data

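library(tidyverse)  # provides read_csv(), %>%, select(), left_join(), and the str_*() functions used below
soccer <- read_csv("../00_data/myData.csv")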

## Rows: 900 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (11): country, city, stage, home_team, away_team, outcome, win_conditio...
## dbl   (3): year, home_score, away_score
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
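The message above suggests declaring the column types. Below is a minimal sketch of that with `readr::cols()`. It assumes a small made-up sample file written to a temporary path rather than the real `myData.csv`, and the three columns shown are only illustrative.

```r
library(readr)

# Hypothetical two-row sample standing in for myData.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,country,home_score",
             "1930,Uruguay,4",
             "1934,Italy,7"), tmp)

# Declaring every column type up front skips type guessing,
# so read_csv() prints no column-specification message
sample_df <- read_csv(
  tmp,
  col_types = cols(
    year       = col_integer(),
    country    = col_character(),
    home_score = col_integer()
  )
)
sample_df
```

If the guessed types are already acceptable, `read_csv(..., show_col_types = FALSE)` only silences the message without changing how the columns are parsed.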

Chapter 13

What are primary keys in your data?

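My data has no single-column primary key. `year` cannot identify a row on its own, because many matches are played in the same year (900 rows cover far fewer tournaments), and the column specification above lists no `X` row-index column. A combination of columns, such as `date`, `home_team`, and `away_team`, is a more plausible candidate key, but its uniqueness still needs to be checked.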
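A common way to test a candidate key is to count each value (or combination of values) and keep only those that appear more than once; a true primary key returns zero rows. The sketch below assumes a tiny hypothetical `games` tibble in place of `soccer`, with column names borrowed from the data.

```r
library(dplyr)

# Hypothetical toy matches; the real data has 900 rows and more columns
games <- tibble(
  year      = c(1930, 1930, 1930, 1934),
  date      = as.Date(c("1930-07-13", "1930-07-13", "1930-07-15", "1934-05-27")),
  home_team = c("France", "USA", "Argentina", "Italy"),
  away_team = c("Mexico", "Belgium", "France", "USA")
)

# year repeats (three 1930 matches), so it is not a primary key
dup_year <- games %>% count(year) %>% filter(n > 1)
dup_year

# No combination repeats here, so this trio is unique in the toy data
dup_match <- games %>% count(date, home_team, away_team) %>% filter(n > 1)
dup_match
```

Running the second check on `soccer` itself is what would actually confirm or rule out that combination as a key.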

Can you divide your data into two?

Divide it using dplyr::select so that the two halves share a common variable that can be used to join them back together.

soccer1half <- soccer %>% select(year:outcome)
soccer2half <- soccer %>% select(outcome:winning_team)

Can you join the two together?

Use dplyr::left_join or another joining function.

left_join(soccer1half, soccer2half)

## Joining with `by = join_by(outcome)`

## Warning in left_join(soccer1half, soccer2half): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.

## # A tibble: 303,806 × 11
##     year country city      stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵
##    <dbl> <chr>   <chr>     <chr> <chr>   <chr>     <dbl>   <dbl> <chr>   <chr>  
##  1  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  2  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  3  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  4  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  5  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  6  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  7  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  8  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  9  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
## 10  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
## # … with 303,796 more rows, 1 more variable: winning_team <chr>, and
## #   abbreviated variable names ¹home_team, ²away_team, ³home_score,
## #   ⁴away_score, ⁵win_conditions
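The join above turns 900 rows into 303,806 because `outcome` is not a key: it takes only a few values (such as `"H"`), so each row of `soccer1half` matches every row of `soccer2half` that shares its outcome. One fix, sketched below, is to add a unique row identifier before splitting and to keep it in both halves. The `match_id` column and the toy `games` tibble are hypothetical stand-ins for `soccer`.

```r
library(dplyr)

# Hypothetical stand-in for soccer: three of the four matches share outcome "H",
# so joining on outcome alone would produce 3 * 3 + 1 * 1 = 10 rows
games <- tibble(
  year         = c(1930, 1930, 1934, 1934),
  home_team    = c("Team A", "Team B", "Team C", "Team D"),
  outcome      = c("H", "H", "A", "H"),
  winning_team = c("Team A", "Team B", "Team X", "Team D")
)

# Give every row a unique id and keep it in both halves
games <- games %>% mutate(match_id = row_number())
first_half  <- games %>% select(match_id, year:outcome)
second_half <- games %>% select(match_id, outcome:winning_team)

# Joining on the unique id (plus the shared outcome column) restores one row per match
rejoined <- left_join(first_half, second_half, by = c("match_id", "outcome"))
nrow(rejoined)
```

Recent dplyr versions also accept `relationship = "one-to-one"` in `left_join()`, which turns a non-unique key into an error instead of a silently multiplied result.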

Chapter 14

Tools

Detect matches (str_detect)

x <- c("apple", "banana", "pear")
str_detect(x, "e")

## [1]  TRUE FALSE  TRUE
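Because `str_detect()` returns a logical vector, it combines directly with `sum()`, `mean()`, and `[` subsetting. A short sketch on the same `x`:

```r
library(stringr)

x <- c("apple", "banana", "pear")

# TRUE counts as 1 and FALSE as 0
sum(str_detect(x, "e"))   # 2: how many strings contain "e"
mean(str_detect(x, "e"))  # proportion of strings that contain "e"

# Logical subsetting keeps only the matching strings
x[str_detect(x, "^b")]    # "banana"
```

`str_subset(x, "^b")` is a shorthand for the last line.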

Extract matches (str_extract)

length(sentences)

## [1] 720

head(sentences)

## [1] "The birch canoe slid on the smooth planks." 
## [2] "Glue the sheet to the dark blue background."
## [3] "It's easy to tell the depth of a well."     
## [4] "These days a chicken leg is a rare dish."   
## [5] "Rice is often served in round bowls."       
## [6] "The juice of lemons makes fine punch."

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
colour_match <- str_c(colours, collapse = "|")
colour_match

## [1] "red|orange|yellow|green|blue|purple"

has_colour <- str_subset(sentences, colour_match)
matches <- str_extract(has_colour, colour_match)
head(matches)

## [1] "blue" "blue" "red"  "red"  "red"  "blue"
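`str_extract()` returns only the first match in each string, so a sentence that names two colours reports just one. A small sketch contrasting it with `str_extract_all()`; the second sentence is made up for illustration.

```r
library(stringr)

colour_match <- str_c(c("red", "orange", "yellow", "green", "blue", "purple"), collapse = "|")
s <- c("It is hard to erase blue or red ink.", "The sky is green.")

str_extract(s, colour_match)                       # first match only: "blue" "green"
str_extract_all(s, colour_match)                   # every match, returned as a list
str_extract_all(s, colour_match, simplify = TRUE)  # character matrix padded with ""
```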

more <- sentences[str_count(sentences, colour_match) > 1]
str_view(more, colour_match)

## [1] │ It is hard to erase <blue> or <red> ink.
## [2] │ The <green> light in the brown box flicke<red>.
## [3] │ The sky in the west is tinged with <orange> <red>.
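The second line above highlights `flicke<red>`: the pattern matches "red" inside "flickered". Wrapping the alternation in word boundaries (`\\b`) restricts it to whole words. A minimal sketch:

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")

# \b marks a word boundary, so "red" no longer matches inside "flickered"
colour_word <- str_c("\\b(", str_c(colours, collapse = "|"), ")\\b")

s <- "The green light in the brown box flickered."
str_extract_all(s, colour_word)  # only "green"
str_count(s, colour_word)        # 1
```

With this pattern, that sentence would no longer pass the `str_count(...) > 1` filter used to build `more`.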

Replacing matches (str_replace)

x <- c("apple", "pear", "banana")
str_replace(x, "[aeiou]", "-")

## [1] "-pple"  "p-ar"   "b-nana"

str_replace_all(x, "[aeiou]", "-")

## [1] "-ppl-"  "p--r"   "b-n-n-"
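`str_replace_all()` also accepts a named vector to perform several replacements at once, and both functions accept backreferences (`\\1`, `\\2`) to reuse captured groups. A short sketch with made-up strings:

```r
library(stringr)

# Names are patterns, values are their replacements
x <- c("1 house", "2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
# "one house" "two cars" "three people"

# Swap the first two words using captured groups
str_replace(c("apple pie", "pear tart"), "(\\w+) (\\w+)", "\\2 \\1")
# "pie apple" "tart pear"
```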