We are already familiar with regular expressions (RegEx) from web scraping: they are powerful tools for pattern matching and text manipulation. Regular expressions can be used to extract specific information from HTML or any other text data. The exercises below demonstrate the application of RegEx in different scenarios.
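
As a small illustration of extracting information from HTML, the sketch below pulls the value of an href attribute out of a string. The HTML fragment and the variable names html and link are invented for this example; str_match() is a stringr function that returns the full match and each capture group as columns of a matrix.

```r
library(stringr)

# Hypothetical HTML fragment, invented for illustration only
html <- '<p>Contact: <a href="mailto:info@example.com">info@example.com</a></p>'

# [^"]* matches everything up to the closing quote of the attribute.
# Column 1 of the result is the full match, column 2 the first capture group.
link <- str_match(html, 'href="([^"]*)"')[, 2]
link
```

For real pages a proper HTML parser is usually the safer choice; a regular expression like this one is only reasonable for short, predictable fragments.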

1. Exercise

A vector of strings is given:

library(stringr)  # provides str_view() and the words corpus

vector <- c("emoticon", ":)", "symbol", "$^$")
writeLines(vector)

## emoticon
## :)
## symbol
## $^$

# Use the function str_view() and find in the vector:
# a) string of 3 characters with the letter o in the middle
str_view(vector, '.o.')

## [1] │ e<mot>i<con>
## [3] │ sym<bol>

# b) expression "emoticon"
str_view(vector, "^emoticon$")

## [1] │ <emoticon>

# c) expression ":)"
str_view(vector, "^\\:\\)$")

## [2] │ <:)>

# d) expression "$^$"
str_view(vector, "^\\$\\^\\$$")

## [4] │ <$^$>
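
Parts c) and d) rely on escaping every metacharacter by hand. As a possible alternative, stringr's fixed() modifier treats the pattern as literal text, so no backslashes are needed. The sketch below compares both approaches with str_detect(); the names escaped and literal are only illustrative.

```r
library(stringr)

vector <- c("emoticon", ":)", "symbol", "$^$")

# Escaped regex: each metacharacter is preceded by a backslash,
# and the backslash itself is doubled inside an R string.
escaped <- str_detect(vector, "^\\$\\^\\$$")

# fixed() compares the pattern as literal text, so nothing is escaped.
# Unlike the anchored regex, it finds the literal anywhere in the string.
literal <- str_detect(vector, fixed("$^$"))

escaped
literal
```

For this vector both calls flag only the fourth element. The two are not equivalent in general, because fixed() cannot express the ^ and $ anchors.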

2. Exercise

A corpus of 980 words is given: stringr::words

# Use the function str_view() and find in the corpus:
# a) all words containing the expression "yes" (add the parameter match = TRUE)
str_view(stringr::words, "yes", match = TRUE)

## [976] │ <yes>
## [977] │ <yes>terday

# b) all words starting with "w"
str_view(stringr::words, "^w")

## [922] │ <w>age
## [923] │ <w>ait
## [924] │ <w>alk
## [925] │ <w>all
## [926] │ <w>ant
## [927] │ <w>ar
## [928] │ <w>arm
## [929] │ <w>ash
## [930] │ <w>aste
## [931] │ <w>atch
## [932] │ <w>ater
## [933] │ <w>ay
## [934] │ <w>e
## [935] │ <w>ear
## [936] │ <w>ednesday
## [937] │ <w>ee
## [938] │ <w>eek
## [939] │ <w>eigh
## [940] │ <w>elcome
## [941] │ <w>ell
## ... and 33 more

# c) all words ending with "x"
str_view(stringr::words, "x$")

## [108] │ bo<x>
## [747] │ se<x>
## [772] │ si<x>
## [841] │ ta<x>
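
str_view() is meant for visual inspection. When the matches are needed as data, the same anchored patterns can be passed to str_subset() and str_detect(). This is a minimal sketch; the variable names are illustrative, and the results should agree with the listings above.

```r
library(stringr)

# str_subset() returns the matching elements themselves
ends_with_x <- str_subset(stringr::words, "x$")
ends_with_x

# str_detect() returns a logical vector; sum() counts the TRUE values
n_starts_with_w <- sum(str_detect(stringr::words, "^w"))
n_starts_with_w

# which() recovers the positions that str_view() prints in brackets
yes_positions <- which(str_detect(stringr::words, "yes"))
yes_positions
```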

3. Exercise

A corpus of 980 words is given: stringr::words

# Use the function str_view() and find in the corpus:
# a) all words starting with a vowel
str_view(stringr::words, '^[aeiouAEIOU]')

##  [1] │ <a>
##  [2] │ <a>ble
##  [3] │ <a>bout
##  [4] │ <a>bsolute
##  [5] │ <a>ccept
##  [6] │ <a>ccount
##  [7] │ <a>chieve
##  [8] │ <a>cross
##  [9] │ <a>ct
## [10] │ <a>ctive
## [11] │ <a>ctual
## [12] │ <a>dd
## [13] │ <a>ddress
## [14] │ <a>dmit
## [15] │ <a>dvertise
## [16] │ <a>ffect
## [17] │ <a>fford
## [18] │ <a>fter
## [19] │ <a>fternoon
## [20] │ <a>gain
## ... and 155 more

# b) all words starting with a consonant
str_view(stringr::words, '^[^aeiouAEIOU]')

## [66] │ <b>aby
## [67] │ <b>ack
## [68] │ <b>ad
## [69] │ <b>ag
## [70] │ <b>alance
## [71] │ <b>all
## [72] │ <b>ank
## [73] │ <b>ar
## [74] │ <b>ase
## [75] │ <b>asis
## [76] │ <b>e
## [77] │ <b>ear
## [78] │ <b>eat
## [79] │ <b>eauty
## [80] │ <b>ecause
## [81] │ <b>ecome
## [82] │ <b>ed
## [83] │ <b>efore
## [84] │ <b>egin
## [85] │ <b>ehind
## ... and 785 more
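
The character classes in a) and b) are complements of each other, which can be checked by counting. The sketch below also shows regex(ignore_case = TRUE) as one way to avoid listing upper-case and lower-case vowels separately. Note that a negated class such as [^aeiouAEIOU] matches any character that is not a vowel, including digits and punctuation; the three test strings at the end are made up to show this.

```r
library(stringr)

words <- stringr::words

# Same test written two ways: explicit upper/lower case vs. ignore_case
vowel_start    <- str_detect(words, "^[aeiouAEIOU]")
vowel_start_ci <- str_detect(words, regex("^[aeiou]", ignore_case = TRUE))
identical(vowel_start, vowel_start_ci)

# Every non-empty word starts with either a vowel or a non-vowel,
# so the two counts add up to the size of the corpus
consonant_start <- str_detect(words, "^[^aeiouAEIOU]")
sum(vowel_start) + sum(consonant_start) == length(words)

# Hypothetical strings: the negated class also accepts non-letters
non_letters <- str_detect(c("7up", "-dash", "apple"), "^[^aeiouAEIOU]")
non_letters
```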

# c) all words ending with "ing" or "ise"
str_view(stringr::words, '(ing|ise)$')

##  [15] │ advert<ise>
## [113] │ br<ing>
## [251] │ dur<ing>
## [280] │ even<ing>
## [288] │ exerc<ise>
## [448] │ k<ing>
## [512] │ mean<ing>
## [533] │ morn<ing>
## [588] │ otherw<ise>
## [637] │ pract<ise>
## [674] │ ra<ise>
## [681] │ real<ise>
## [709] │ r<ing>
## [710] │ r<ise>
## [765] │ s<ing>
## [834] │ surpr<ise>
## [860] │ th<ing>
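
The parentheses in '(ing|ise)$' matter: without them the $ anchor binds only to the second alternative. The following sketch uses a small made-up vector (samples is not part of the exercise) to show the difference.

```r
library(stringr)

# Hypothetical test strings, chosen to expose the difference
samples <- c("ringing", "singer", "rise", "risen")

# Grouped: either ending must sit at the end of the string
grouped <- str_detect(samples, "(ing|ise)$")

# Ungrouped: reads as "ing anywhere" OR "ise at the end"
ungrouped <- str_detect(samples, "ing|ise$")

grouped
ungrouped
```

Here "singer" is matched only by the ungrouped pattern, because it contains "ing" in the middle of the word.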

# d) all words ending with "ed" but not with "eed"
str_view(stringr::words, '[^e]ed$')

##  [82] │ <bed>
## [410] │ hund<red>
## [690] │ <red>
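
One subtlety of '[^e]ed$': the class [^e] must consume a character, so the bare string "ed" would not match. That makes no difference for the result shown above, but on other data it might. The sketch below compares the pattern with two possible alternatives on a made-up vector; test_words and the result names are illustrative only.

```r
library(stringr)

# Hypothetical test strings, including the bare "ed"
test_words <- c("bed", "need", "ed", "red", "feed")

# Original pattern: needs some non-"e" character before "ed"
original <- str_detect(test_words, "[^e]ed$")

# Alternative 1: allow the start of the string in place of that character
with_start <- str_detect(test_words, "(^|[^e])ed$")

# Alternative 2: negative lookbehind, which checks the previous
# character without making it part of the match
lookbehind <- str_detect(test_words, "(?<!e)ed$")

original
with_start
lookbehind
```

With the lookbehind version, str_view() would highlight only "ed" instead of the preceding letter plus "ed".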

# -------------------------------------------------#

Regular Expressions

Gizem Güleli

2023-10-20
