r4ds_14-2

14.4.1.1 Exercises
14.4.2.1 Exercises
14.4.3.1 Exercises
14.4.4.1 Exercises
14.4.5.1 Exercises
14.5.1 Exercises

14.4.1.1 Exercises

1. For each of the following challenges, try solving it by using both a single regular expression, and a combination of multiple str_detect() calls.

Find all words that start or end with x.

# single
str_view(words, "^x|x$", match = TRUE)
# combination
words[str_detect(words, "^x") | str_detect(words, "x$")]

Find all words that start with a vowel and end with a consonant.

# single
str_view(words, "^[aiueo].*[^aiueo]$", match = TRUE)
# combination
words[str_detect(words, "^[aiueo]") & str_detect(words, "[^aiueo]$")]

Are there any words that contain at least one of each different vowel?

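# single (one lookahead per vowel, so the order of the vowels does not matter)
words[str_detect(words, "^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)")]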
# combination
words[str_detect(words, "a") & str_detect(words, "i") & str_detect(words, "u") &
      str_detect(words, "e") & str_detect(words, "o")]

2. What word has the highest number of vowels? What word has the highest proportion of vowels? (Hint: what is the denominator?)

df <- tibble(
  word = words
)
df <- df %>% mutate(number = str_count(word, "[aiueo]"), prop = number / str_length(word))
# highest number
df %>% filter(number == max(df$number))

## # A tibble: 8 x 3
##   word        number  prop
##   <chr>        <int> <dbl>
## 1 appropriate      5 0.455
## 2 associate        5 0.556
## 3 available        5 0.556
## 4 colleague        5 0.556
## 5 encourage        5 0.556
## 6 experience       5 0.5  
## 7 individual       5 0.5  
## 8 television       5 0.5

# prop
df %>% filter(prop == max(df$prop))

## # A tibble: 1 x 3
##   word  number  prop
##   <chr>  <int> <dbl>
## 1 a          1     1

14.4.2.1 Exercises

1. In the previous example, you might have noticed that the regular expression matched “flickered”, which is not a colour. Modify the regex to fix the problem.

color <- c("red", "orange", "yellow", "green", "blue", "purple")
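# wrap the alternation in word boundaries so that "red" inside "flickered" no longer matches
color_match <- str_c("\\b(", str_c(color, collapse = "|"), ")\\b")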
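
A small check of how word boundaries change the matches, as a sketch only. The two sentences and the names `loose`, `strict`, `loose_hits` and `strict_hits` are made up for illustration and are not part of the exercise data.

```r
library(stringr)

color <- c("red", "orange", "yellow", "green", "blue", "purple")

# Pattern without boundaries: matches a colour name anywhere, even inside a word
loose <- str_c(color, collapse = "|")
# Same alternation wrapped in \b...\b: matches whole words only
strict <- str_c("\\b(", loose, ")\\b")

# Illustrative sentences (assumed, not from the Harvard sentences)
x <- c("The lamp flickered twice.", "He wore a red scarf.")

loose_hits  <- str_detect(x, loose)   # "flickered" contains "red"
strict_hits <- str_detect(x, strict)  # only the standalone "red" counts

print(loose_hits)
print(strict_hits)
```

With the boundaries in place, only the second sentence should be reported as containing a colour.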

2. From the Harvard sentences data, extract:

The first word from each sentence.

str_extract(sentences, "^[a-zA-Z]+")

All words ending in ing.

# word boundaries stop the pattern from matching "ing" in the middle of a longer word
str_extract_all(sentences, "\\b[A-Za-z]+ing\\b")

All plurals.

# heuristic: whole words of four or more letters ending in s (this also catches non-plurals)
str_extract_all(sentences, "\\b[A-Za-z]{3,}s\\b")

14.4.3.1 Exercises

1. Find all words that come after a “number” like “one”, “two”, “three” etc. Pull out both the number and the word.

# Spaces inside an alternation are literal, so a pattern written as
# "(one | two | ... | ten)([^ ]+)" also matches " tender", " tenth" and " tent".
# Anchor the number with a word boundary and capture the following word separately.
numword <- "\\b(one|two|three|four|five|six|seven|eight|nine|ten) +(\\w+)"
sentences %>%
  str_subset(numword) %>%
  str_match(numword)

2. Find all contractions. Separate out the pieces before and after the apostrophe.

sentences %>%
  str_subset("([A-Za-z]+)'([A-Za-z]+)") %>%
  str_extract("([A-Za-z]+)'([A-Za-z]+)")

##  [1] "It's"       "man's"      "don't"      "store's"    "workmen's" 
##  [6] "Let's"      "sun's"      "child's"    "king's"     "It's"      
## [11] "don't"      "queen's"    "don't"      "pirate's"   "neighbor's"
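
The call above extracts whole contractions. To separate the pieces before and after the apostrophe, one option is str_match(), which returns the capture groups as columns. The following is a minimal sketch on three made-up strings; `x` and `pieces` are illustrative names.

```r
library(stringr)

# Illustrative input (assumed); the third string has no contraction
x <- c("It's a fine day.", "The man's hat fell.", "No apostrophe here.")

# Column 1 is the full match, column 2 the part before the apostrophe,
# column 3 the part after it; rows without a match are filled with NA
pieces <- str_match(x, "([A-Za-z]+)'([A-Za-z]+)")

print(pieces)
```

The same pattern can be passed to str_match() in place of str_extract() in the pipeline over sentences.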

14.4.4.1 Exercises

1. Replace all forward slashes in a string with backslashes.

x <- c("a/b/c/d/e")
str_replace_all(x, "\\/", "\\\\") %>% writeLines()

## a\b\c\d\e

2. Implement a simple version of str_to_lower() using str_replace_all().

x <- c("AA", "BB", "CC")
str_replace_all(x, c("A" = "a", "B" = "b", "C" = "c")) %>% writeLines()

## aa
## bb
## cc
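
The mapping above covers only three letters. A sketch that extends it to the whole ASCII alphabet, assuming base R's `letters` and `LETTERS` constants, is shown below; `lower_map` and `simple_to_lower` are hypothetical names introduced here.

```r
library(stringr)

# Named vector: names are the patterns (A..Z), values the replacements (a..z)
lower_map <- setNames(letters, LETTERS)

# Hypothetical helper: lower-cases ASCII letters only, leaves everything else alone
simple_to_lower <- function(x) str_replace_all(x, lower_map)

result <- simple_to_lower(c("AA", "Bb", "R4DS"))
print(result)
```

Unlike str_to_lower(), this version ignores locale rules and non-ASCII letters.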

3. Switch the first and last letters in words. Which of those strings are still words?

swapped <- words %>% str_replace("^([A-Za-z])(.*)([A-Za-z])$", "\\3\\2\\1")
# swapped strings that still appear in the original word list
intersect(swapped, words)

14.4.5.1 Exercises

1. Split up a string like “apples, pears, and bananas” into individual components.

x <- "apples, pears, and bananas"
str_split(x, ", and |, ")

## [[1]]
## [1] "apples"  "pears"   "bananas"

2. Why is it better to split up by boundary("word") than " "?

Splitting on " " leaves the period attached to the neighbouring word, whereas boundary("word") ignores the period.
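
A short comparison of the two ways of splitting, as a sketch; `by_space` and `by_word` are illustrative names, and the test sentence is the one used in the next exercise.

```r
library(stringr)

x <- "This is a sentence. This is another sentence."

# Splitting on a literal space keeps punctuation glued to the words
by_space <- str_split(x, " ")[[1]]

# Splitting on word boundaries returns the words without punctuation
by_word <- str_split(x, boundary("word"))[[1]]

print(by_space)
print(by_word)
```

The first result contains "sentence." with its period, while the second contains only the bare words.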

3. What does splitting with an empty string ("") do? Experiment, and then read the documentation.

x <- "This is a sentence. This is another sentence."
str_split(x, "")[[1]]

##  [1] "T" "h" "i" "s" " " "i" "s" " " "a" " " "s" "e" "n" "t" "e" "n" "c"
## [18] "e" "." " " "T" "h" "i" "s" " " "i" "s" " " "a" "n" "o" "t" "h" "e"
## [35] "r" " " "s" "e" "n" "t" "e" "n" "c" "e" "."

Splitting on "" breaks the string into individual characters. The str_split() documentation notes that an empty pattern is equivalent to boundary("character").

14.5.1 Exercises

1. How would you find all strings containing \ with regex() vs. with fixed()?

x <- c("a\\b", "ab")
# regex()
str_subset(x, regex("\\\\"))

## [1] "a\\b"

# fixed()
str_subset(x, fixed("\\"))

## [1] "a\\b"

2. What are the five most common words in sentences?

# build the tibble directly instead of calling as_tibble() on a vector, which is discouraged
tibble(words = sentences %>% str_extract_all(boundary("word")) %>% unlist() %>% str_to_lower()) %>%
  count(words, sort = TRUE) %>%
  head(5)

## # A tibble: 5 x 2
##   words     n
##   <chr> <int>
## 1 the     751
## 2 a       202
## 3 of      132
## 4 to      123
## 5 and     118

Yuta NAGANO

7/17/2019
