Data wrangling is the process of cleaning, structuring and enriching data into a desired format (Trifacta, 2018).
In this assignment I chose two topics to work with to improve my data wrangling skills. In the first one I completed several exercises related to strings, and in the second one I worked with a dataset to extract and analyze geographical information.
For this task I completed exercises from the Strings chapter of R for Data Science.
I mostly worked with stringr::words and stringr::sentences.
Use str_length() and str_sub() to extract the middle character from a string. What will you do if the string has an even number of characters?
library(tidyverse)
library(stringr)
string1 <- "abc"
string2 <- "abcd"
str_sub(string1, floor((str_length(string1)+1)/2), ceiling((str_length(string1)+1)/2))
## [1] "b"
str_sub(string2, floor((str_length(string2)+1)/2), ceiling((str_length(string2)+1)/2)) #returns the two middle characters as the string has an even number.
## [1] "bc"
I chose to extract both middle characters when the string has an even number of characters, but I could also arbitrarily extract just one:
str_sub(string2, ceiling(str_length(string2)/2), ceiling(str_length(string2)/2))
## [1] "b"
This way, I extract only one of the middle characters.
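To reuse this, both behaviours can be wrapped in a small helper (a sketch; str_middle and its both argument are my own names, not part of the exercise):
str_middle <- function(x, both = TRUE) {
n <- str_length(x)
if (both) {
str_sub(x, floor((n + 1) / 2), ceiling((n + 1) / 2)) # both middle characters when even
} else {
str_sub(x, ceiling(n / 2), ceiling(n / 2)) # a single, arbitrarily chosen middle character
}
}
str_middle("abcd") # "bc"
str_middle("abcd", both = FALSE) # "b"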
The next exercise asks for a function that turns a vector like c("a", "b", "c") into the string "a, b, and c", thinking carefully about what happens for vectors of length 0, 1, and 2.
st_comma <- function(x, delim = ",") {
num <- length(x)
if(num == 0) {
stop("vector length = 0") #error message when trying with a length 0 vector
} else if(num == 1) {
x
} else if(num == 2) {
str_c(x[[1]], "and", x[[2]], sep = " ")
} else {
str_1 <- str_c(x[seq_len(num - 1)], delim) #all but the last
str_2 <- str_c("and", x[[num]], sep = " ")
str_c(c(str_1, str_2), collapse = " ")
}
}
# st_comma(c()) throws the error "vector length = 0", since the vector has length 0
st_comma("a")
## [1] "a"
st_comma(c("a", "b"))
## [1] "a and b"
st_comma(c("a", "b", "c"))
## [1] "a, b, and c"
"'\
?str_view("\"'\\", "\"'\\\\")
\..\..\..
match? How would you represent it as a string?str_view("w.x.y.z", "\\..\\..\\..")
It matches patterns with a dot followed by a character that repeats three times.
"$^$"
?str_view("$^$", "^\\$\\^\\$")
Given the corpus of common words in stringr::words, create regular expressions that find all words that start with "y", end with "x", are exactly three letters long (without cheating by using str_length()!), or have seven letters or more. Since the list is long, I used the match = TRUE argument to show only the matching words. When the output is too long, I use str_subset() instead of str_view() for a more compact output.
str_view_match <- function(words, pattern) {
str_view(words, pattern, match=TRUE)
} #function to only show matches
# start with y
str_view_match(words, "^y")
# end with x
str_view_match(words, "x$")
# have exactly three letters
str_subset(words, "^...$")
## [1] "act" "add" "age" "ago" "air" "all" "and" "any" "arm" "art" "ask"
## [12] "bad" "bag" "bar" "bed" "bet" "big" "bit" "box" "boy" "bus" "but"
## [23] "buy" "can" "car" "cat" "cup" "cut" "dad" "day" "die" "dog" "dry"
## [34] "due" "eat" "egg" "end" "eye" "far" "few" "fit" "fly" "for" "fun"
## [45] "gas" "get" "god" "guy" "hit" "hot" "how" "job" "key" "kid" "lad"
## [56] "law" "lay" "leg" "let" "lie" "lot" "low" "man" "may" "mrs" "new"
## [67] "non" "not" "now" "odd" "off" "old" "one" "out" "own" "pay" "per"
## [78] "put" "red" "rid" "run" "say" "see" "set" "sex" "she" "sir" "sit"
## [89] "six" "son" "sun" "tax" "tea" "ten" "the" "tie" "too" "top" "try"
## [100] "two" "use" "war" "way" "wee" "who" "why" "win" "yes" "yet" "you"
# ≥ 7 letters
str_subset(words, ".......")
## [1] "absolute" "account" "achieve" "address" "advertise"
## [6] "afternoon" "against" "already" "alright" "although"
## [11] "america" "another" "apparent" "appoint" "approach"
## [16] "appropriate" "arrange" "associate" "authority" "available"
## [21] "balance" "because" "believe" "benefit" "between"
## [26] "brilliant" "britain" "brother" "business" "certain"
## [31] "chairman" "character" "Christmas" "colleague" "collect"
## [36] "college" "comment" "committee" "community" "company"
## [41] "compare" "complete" "compute" "concern" "condition"
## [46] "consider" "consult" "contact" "continue" "contract"
## [51] "control" "converse" "correct" "council" "country"
## [56] "current" "decision" "definite" "department" "describe"
## [61] "develop" "difference" "difficult" "discuss" "district"
## [66] "document" "economy" "educate" "electric" "encourage"
## [71] "english" "environment" "especial" "evening" "evidence"
## [76] "example" "exercise" "expense" "experience" "explain"
## [81] "express" "finance" "fortune" "forward" "function"
## [86] "further" "general" "germany" "goodbye" "history"
## [91] "holiday" "hospital" "however" "hundred" "husband"
## [96] "identify" "imagine" "important" "improve" "include"
## [101] "increase" "individual" "industry" "instead" "interest"
## [106] "introduce" "involve" "kitchen" "language" "machine"
## [111] "meaning" "measure" "mention" "million" "minister"
## [116] "morning" "necessary" "obvious" "occasion" "operate"
## [121] "opportunity" "organize" "original" "otherwise" "paragraph"
## [126] "particular" "pension" "percent" "perfect" "perhaps"
## [131] "photograph" "picture" "politic" "position" "positive"
## [136] "possible" "practise" "prepare" "present" "pressure"
## [141] "presume" "previous" "private" "probable" "problem"
## [146] "proceed" "process" "produce" "product" "programme"
## [151] "project" "propose" "protect" "provide" "purpose"
## [156] "quality" "quarter" "question" "realise" "receive"
## [161] "recognize" "recommend" "relation" "remember" "represent"
## [166] "require" "research" "resource" "respect" "responsible"
## [171] "saturday" "science" "scotland" "secretary" "section"
## [176] "separate" "serious" "service" "similar" "situate"
## [181] "society" "special" "specific" "standard" "station"
## [186] "straight" "strategy" "structure" "student" "subject"
## [191] "succeed" "suggest" "support" "suppose" "surprise"
## [196] "telephone" "television" "terrible" "therefore" "thirteen"
## [201] "thousand" "through" "thursday" "together" "tomorrow"
## [206] "tonight" "traffic" "transport" "trouble" "tuesday"
## [211] "understand" "university" "various" "village" "wednesday"
## [216] "welcome" "whether" "without" "yesterday"
# start with a vowel
str_subset(words, "^[aeiou]")
## [1] "a" "able" "about" "absolute" "accept"
## [6] "account" "achieve" "across" "act" "active"
## [11] "actual" "add" "address" "admit" "advertise"
## [16] "affect" "afford" "after" "afternoon" "again"
## [21] "against" "age" "agent" "ago" "agree"
## [26] "air" "all" "allow" "almost" "along"
## [31] "already" "alright" "also" "although" "always"
## [36] "america" "amount" "and" "another" "answer"
## [41] "any" "apart" "apparent" "appear" "apply"
## [46] "appoint" "approach" "appropriate" "area" "argue"
## [51] "arm" "around" "arrange" "art" "as"
## [56] "ask" "associate" "assume" "at" "attend"
## [61] "authority" "available" "aware" "away" "awful"
## [66] "each" "early" "east" "easy" "eat"
## [71] "economy" "educate" "effect" "egg" "eight"
## [76] "either" "elect" "electric" "eleven" "else"
## [81] "employ" "encourage" "end" "engine" "english"
## [86] "enjoy" "enough" "enter" "environment" "equal"
## [91] "especial" "europe" "even" "evening" "ever"
## [96] "every" "evidence" "exact" "example" "except"
## [101] "excuse" "exercise" "exist" "expect" "expense"
## [106] "experience" "explain" "express" "extra" "eye"
## [111] "idea" "identify" "if" "imagine" "important"
## [116] "improve" "in" "include" "income" "increase"
## [121] "indeed" "individual" "industry" "inform" "inside"
## [126] "instead" "insure" "interest" "into" "introduce"
## [131] "invest" "involve" "issue" "it" "item"
## [136] "obvious" "occasion" "odd" "of" "off"
## [141] "offer" "office" "often" "okay" "old"
## [146] "on" "once" "one" "only" "open"
## [151] "operate" "opportunity" "oppose" "or" "order"
## [156] "organize" "original" "other" "otherwise" "ought"
## [161] "out" "over" "own" "under" "understand"
## [166] "union" "unit" "unite" "university" "unless"
## [171] "until" "up" "upon" "use" "usual"
# only consonants
str_view_match(words, "^[^aeiou]+$")
# end with ed, but not with eed
str_view_match(words, "^ed$|[^e]ed$")
# end with ing or ise
str_view_match(words, "ing$|ise$")
str_view_match(words, "([^c]ie|cei)") #rule
str_view_match(words, "(cie)") #exceptions?
From the second output we can see there are some exceptions for this rule, such as the words science
and society
.
str_view_match(words, "q[^u]")
There were no words in the output, so “q” is always followed by a “u”.
str_view_match(words, "ise$|our") #ise instead of ize and our instead of or
phones <- c("(55)32498722", "(778)952-5873")
str_view(phones, "\\(\\d{2}\\)\\d{8}", match = TRUE)
The pattern matches the first number, which follows the format commonly used in Mexico: a two-digit area code in parentheses followed by eight digits.
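The second number in phones has a three-digit area code and a dash, so a slightly more permissive pattern (a sketch covering only these two formats) would match both:
str_view(phones, "\\(\\d{2,3}\\)\\d{3,8}(-\\d{4})?", match = TRUE)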
# start with 3 consonants
str_view_match(words, "^[^aeiou]{3}")
# ≥ 3 vowels in a row
str_view_match(words, "[aeiou]{3,}")
# ≥ 2 vowel-consonant pairs in a row
str_subset(words, "([aeiou][^aeiou]){2,}")
## [1] "absolute" "agent" "along" "america" "another"
## [6] "apart" "apparent" "authority" "available" "aware"
## [11] "away" "balance" "basis" "become" "before"
## [16] "begin" "behind" "benefit" "business" "character"
## [21] "closes" "community" "consider" "cover" "debate"
## [26] "decide" "decision" "definite" "department" "depend"
## [31] "design" "develop" "difference" "difficult" "direct"
## [36] "divide" "document" "during" "economy" "educate"
## [41] "elect" "electric" "eleven" "encourage" "environment"
## [46] "europe" "even" "evening" "ever" "every"
## [51] "evidence" "exact" "example" "exercise" "exist"
## [56] "family" "figure" "final" "finance" "finish"
## [61] "friday" "future" "general" "govern" "holiday"
## [66] "honest" "hospital" "however" "identify" "imagine"
## [71] "individual" "interest" "introduce" "item" "jesus"
## [76] "level" "likely" "limit" "local" "major"
## [81] "manage" "meaning" "measure" "minister" "minus"
## [86] "minute" "moment" "money" "music" "nature"
## [91] "necessary" "never" "notice" "okay" "open"
## [96] "operate" "opportunity" "organize" "original" "over"
## [101] "paper" "paragraph" "parent" "particular" "photograph"
## [106] "police" "policy" "politic" "position" "positive"
## [111] "power" "prepare" "present" "presume" "private"
## [116] "probable" "process" "produce" "product" "project"
## [121] "proper" "propose" "protect" "provide" "quality"
## [126] "realise" "reason" "recent" "recognize" "recommend"
## [131] "record" "reduce" "refer" "regard" "relation"
## [136] "remember" "report" "represent" "result" "return"
## [141] "saturday" "second" "secretary" "secure" "separate"
## [146] "seven" "similar" "specific" "strategy" "student"
## [151] "stupid" "telephone" "television" "therefore" "thousand"
## [156] "today" "together" "tomorrow" "tonight" "total"
## [161] "toward" "travel" "unit" "unite" "university"
## [166] "upon" "visit" "water" "woman"
# start and end with the same character
str_view_match(words, "^(.).*\\1$")
# contain a repeated pair of letters (e.g. "church" contains "ch" twice)
str_view_match(words, "(..).*\\1")
# contain one letter repeated in at least three places (e.g. "eleven" has three "e"s)
str_view_match(words, "(.).*\\1.*\\1")
For the next challenges, I solved each one using both a single regular expression and a combination of multiple str_detect() calls.
# start or end with x
str_view_match(words, "^x|x$") #single regex
# multiple str_detect() calls
start_x <- str_detect(words, "^x")
end_x <- str_detect(words, "x$")
words[start_x | end_x]
## [1] "box" "sex" "six" "tax"
# start with a vowel and end with a consonant
str_subset(words, "^[aeiou].*[^aeiou]$") #single regex
## [1] "about" "accept" "account" "across" "act"
## [6] "actual" "add" "address" "admit" "affect"
## [11] "afford" "after" "afternoon" "again" "against"
## [16] "agent" "air" "all" "allow" "almost"
## [21] "along" "already" "alright" "although" "always"
## [26] "amount" "and" "another" "answer" "any"
## [31] "apart" "apparent" "appear" "apply" "appoint"
## [36] "approach" "arm" "around" "art" "as"
## [41] "ask" "at" "attend" "authority" "away"
## [46] "awful" "each" "early" "east" "easy"
## [51] "eat" "economy" "effect" "egg" "eight"
## [56] "either" "elect" "electric" "eleven" "employ"
## [61] "end" "english" "enjoy" "enough" "enter"
## [66] "environment" "equal" "especial" "even" "evening"
## [71] "ever" "every" "exact" "except" "exist"
## [76] "expect" "explain" "express" "identify" "if"
## [81] "important" "in" "indeed" "individual" "industry"
## [86] "inform" "instead" "interest" "invest" "it"
## [91] "item" "obvious" "occasion" "odd" "of"
## [96] "off" "offer" "often" "okay" "old"
## [101] "on" "only" "open" "opportunity" "or"
## [106] "order" "original" "other" "ought" "out"
## [111] "over" "own" "under" "understand" "union"
## [116] "unit" "university" "unless" "until" "up"
## [121] "upon" "usual"
#multiple str_detect()
start_vowel <- str_detect(words, "^[aeiou]")
end_cons <- str_detect(words, "[^aeiou]$")
words[start_vowel & end_cons]
## [1] "about" "accept" "account" "across" "act"
## [6] "actual" "add" "address" "admit" "affect"
## [11] "afford" "after" "afternoon" "again" "against"
## [16] "agent" "air" "all" "allow" "almost"
## [21] "along" "already" "alright" "although" "always"
## [26] "amount" "and" "another" "answer" "any"
## [31] "apart" "apparent" "appear" "apply" "appoint"
## [36] "approach" "arm" "around" "art" "as"
## [41] "ask" "at" "attend" "authority" "away"
## [46] "awful" "each" "early" "east" "easy"
## [51] "eat" "economy" "effect" "egg" "eight"
## [56] "either" "elect" "electric" "eleven" "employ"
## [61] "end" "english" "enjoy" "enough" "enter"
## [66] "environment" "equal" "especial" "even" "evening"
## [71] "ever" "every" "exact" "except" "exist"
## [76] "expect" "explain" "express" "identify" "if"
## [81] "important" "in" "indeed" "individual" "industry"
## [86] "inform" "instead" "interest" "invest" "it"
## [91] "item" "obvious" "occasion" "odd" "of"
## [96] "off" "offer" "often" "okay" "old"
## [101] "on" "only" "open" "opportunity" "or"
## [106] "order" "original" "other" "ought" "out"
## [111] "over" "own" "under" "understand" "union"
## [116] "unit" "university" "unless" "until" "up"
## [121] "upon" "usual"
# contain one of each vowel
allv <- c("aeioux", "aei") # test strings to check the pattern works
str_subset(allv, "a.*e.*i.*o.*u")
## [1] "aeioux"
str_subset(words, "a.*e.*i.*o.*u")
## character(0)
# multiple str_detect()
words[str_detect(words, "a") & str_detect(words, "e") &
str_detect(words, "i") & str_detect(words, "o") &
str_detect(words, "u")]
## character(0)
There are no words in stringr::words that contain all five vowels. (The single regex only catches vowels in alphabetical order, but the str_detect() version checks every ordering and confirms the result.)
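The five str_detect() calls can also be collapsed with purrr (an equivalent sketch):
vowels <- c("a", "e", "i", "o", "u")
words[map(vowels, ~ str_detect(words, .x)) %>% reduce(`&`)]
## character(0)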
#highest number of vowels
num_v <- str_count(words, "[aeiou]")
max_v <- max(num_v)
words[num_v == max_v]
## [1] "appropriate" "associate" "available" "colleague" "encourage"
## [6] "experience" "individual" "television"
#highest proportion of vowels
prop_v <- str_count(words, "[aeiou]") / str_length(words)
max_p <- max(prop_v)
words[prop_v == max_p]
## [1] "a"
Eight words have five vowels, the maximum number of vowels among these words. The word "a" has the highest proportion of vowels, since its length is 1 and that one character is a vowel.
# extract the first word from each sentence
str_extract(sentences, "[^ ]+") %>% head()
## [1] "The" "Glue" "It's" "These" "Rice" "The"
sentences %>% head() #to check it worked
## [1] "The birch canoe slid on the smooth planks."
## [2] "Glue the sheet to the dark blue background."
## [3] "It's easy to tell the depth of a well."
## [4] "These days a chicken leg is a rare dish."
## [5] "Rice is often served in round bowls."
## [6] "The juice of lemons makes fine punch."
#words ending in ing
pat_ing <- "[A-Za-z]+ing" #define pattern
ing <- str_detect(sentences, pat_ing)
str_extract_all(sentences[ing], pat_ing) %>%
unlist() %>%
unique() #don't show repeated words
## [1] "stocking" "spring" "evening" "morning" "winding"
## [6] "living" "king" "Adding" "making" "raging"
## [11] "playing" "sleeping" "ring" "glaring" "sinking"
## [16] "thing" "dying" "Bring" "lodging" "filing"
## [21] "wearing" "wading" "swing" "nothing" "Whiting"
## [26] "sing" "bring" "painting" "walking" "ling"
## [31] "shipping" "hing" "puzzling" "landing" "waiting"
## [36] "whistling" "timing" "ting" "changing" "drenching"
## [41] "moving" "working"
#plurals (naive: matches any run of three or more letters ending in s)
str_extract_all(sentences, "[A-Za-z]{3,}s") %>%
unlist() %>%
unique() %>%
head()
## [1] "planks" "Thes" "days" "bowls" "lemons" "makes"
# find all words that come after a "number" word (one to ten) and pull out both the number and the word
pat_num <- "(one|two|three|four|five|six|seven|eight|nine|ten) ([^ ]+)"
sen_num <- sentences %>% str_subset(pat_num)
sen_num %>% str_match(pat_num)
## [,1] [,2] [,3]
## [1,] "ten served" "ten" "served"
## [2,] "one over" "one" "over"
## [3,] "seven books" "seven" "books"
## [4,] "two met" "two" "met"
## [5,] "two factors" "two" "factors"
## [6,] "one and" "one" "and"
## [7,] "three lists" "three" "lists"
## [8,] "seven is" "seven" "is"
## [9,] "two when" "two" "when"
## [10,] "one floor." "one" "floor."
## [11,] "ten inches." "ten" "inches."
## [12,] "one with" "one" "with"
## [13,] "one war" "one" "war"
## [14,] "one button" "one" "button"
## [15,] "six minutes." "six" "minutes."
## [16,] "ten years" "ten" "years"
## [17,] "one in" "one" "in"
## [18,] "ten chased" "ten" "chased"
## [19,] "one like" "one" "like"
## [20,] "two shares" "two" "shares"
## [21,] "two distinct" "two" "distinct"
## [22,] "one costs" "one" "costs"
## [23,] "ten two" "ten" "two"
## [24,] "five robins." "five" "robins."
## [25,] "four kinds" "four" "kinds"
## [26,] "one rang" "one" "rang"
## [27,] "ten him." "ten" "him."
## [28,] "three story" "three" "story"
## [29,] "ten by" "ten" "by"
## [30,] "one wall." "one" "wall."
## [31,] "three inches" "three" "inches"
## [32,] "ten your" "ten" "your"
## [33,] "six comes" "six" "comes"
## [34,] "one before" "one" "before"
## [35,] "three batches" "three" "batches"
## [36,] "two leaves." "two" "leaves."
# find all contractions and separate the pieces before and after the apostrophe
cont <- "([A-Za-z]+)'([A-Za-z]+)"
sen_cont <- sentences %>% str_subset(cont)
sen_cont %>% str_match(cont)
## [,1] [,2] [,3]
## [1,] "It's" "It" "s"
## [2,] "man's" "man" "s"
## [3,] "don't" "don" "t"
## [4,] "store's" "store" "s"
## [5,] "workmen's" "workmen" "s"
## [6,] "Let's" "Let" "s"
## [7,] "sun's" "sun" "s"
## [8,] "child's" "child" "s"
## [9,] "king's" "king" "s"
## [10,] "It's" "It" "s"
## [11,] "don't" "don" "t"
## [12,] "queen's" "queen" "s"
## [13,] "don't" "don" "t"
## [14,] "pirate's" "pirate" "s"
## [15,] "neighbor's" "neighbor" "s"
# replace all forward slashes in a string with backslashes
for_slash <- "one/two/three"
str_replace_all(for_slash, "/", "\\\\") %>% writeLines()
## one\two\three
Implement a simple version of str_to_lower() using replace_all().
caps <- "ABCDE"
str_replace_all(caps, "([A-Z])", tolower)
## [1] "abcde"
Split up a string like "apples, pears, and bananas" into individual components.
fruity <- "apples, pears, and bananas"
str_split(fruity, ", and |,")
## [[1]]
## [1] "apples" " pears" "bananas"
boundary("word")
than " "?fruity2 <- ("fruit: apples, pears, (bananas), and oranges")
str_split(fruity2, " ")
## [[1]]
## [1] "fruit:" "apples," "pears," "(bananas)," "and"
## [6] "oranges"
str_split(fruity2, boundary("word"))
## [[1]]
## [1] "fruit" "apples" "pears" "bananas" "and" "oranges"
Splitting with boundary("word") is better because I don't have to specify every punctuation character to leave out, such as ":", "," or "()".
How would you find all strings containing \ with regex() vs. with fixed()?
strings <- c("ab", "0\\1", "x\\y")
#regex()
str_subset(strings, regex("\\\\"))
## [1] "0\\1" "x\\y"
#fixed()
str_subset(strings, fixed("\\"))
## [1] "0\\1" "x\\y"
Finally, what are the five most common words in sentences?
(words_sen <- str_split(sentences, boundary("word")) %>%
unlist() %>%
str_to_lower() %>% # avoid counting capitalised and lower-case forms separately
as_tibble() %>%
set_names("word") %>%
group_by(word) %>%
count(sort = TRUE) %>% # order by frequency
head(5)) # only top 5
## # A tibble: 5 x 2
## # Groups: word [5]
## word n
## <chr> <int>
## 1 the 751
## 2 a 202
## 3 of 132
## 4 to 123
## 5 and 118
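The group_by() step is not strictly needed; count() on a one-column tibble gives the same top five a little more directly (an equivalent sketch):
tibble(word = sentences %>% str_split(boundary("word")) %>% unlist() %>% str_to_lower()) %>%
count(word, sort = TRUE) %>%
head(5)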
For the second task, I use purrr to map latitude and longitude into human-readable information on the bands' origin places. Notice that revgeocode(..., output = "more") outputs a data frame, while revgeocode(..., output = "address") returns a string, so there is the option of dealing with nested data frames. Two things need attention: not every track has latitude and longitude information, and revgeocode() does not always return a result for a given pair of coordinates. What can we do to stop those errors from biting us? (Look at possibly() in purrr…)
First, I need to load the necessary packages and register my Google API key to use ggmap.
library(tidyverse)
library(devtools)
install_github("dkahle/ggmap") # GitHub version, needed for register_google()
library(ggplot2)
library(ggmap)
register_google("YOUR_API_KEY") # actual API key redacted
The data set singer_locations
contains information about songs and associated artists in the Million Song Dataset.
Let’s look at this data frame:
library(singer)
str(singer_locations)
## Classes 'tbl_df', 'tbl' and 'data.frame': 10100 obs. of 14 variables:
## $ track_id : chr "TRWICRA128F42368DB" "TRXJANY128F42246FC" "TRIKPCA128F424A553" "TRYEATD128F92F87C9" ...
## $ title : chr "The Conversation (Cd)" "Lonely Island" "Here's That Rainy Day" "Rego Park Blues" ...
## $ song_id : chr "SOSURTI12A81C22FB8" "SODESQP12A6D4F98EF" "SOQUYQD12A8C131619" "SOEZGRC12AB017F1AC" ...
## $ release : chr "Even If It Kills Me" "The Duke Of Earl" "Imprompture" "Still River" ...
## $ artist_id : chr "ARACDPV1187FB58DF4" "ARYBUAO1187FB3F4EB" "AR4111G1187B9B58AB" "ARQDZP31187B98D623" ...
## $ artist_name : chr "Motion City Soundtrack" "Gene Chandler" "Paul Horn" "Ronnie Earl & the Broadcasters" ...
## $ year : int 2007 2004 1998 1995 1968 2006 2003 2007 1966 2006 ...
## $ duration : num 170 107 528 695 237 ...
## $ artist_hotttnesss : num 0.641 0.394 0.431 0.362 0.411 ...
## $ artist_familiarity: num 0.823 0.57 0.504 0.477 0.53 ...
## $ latitude : num NA 41.9 40.7 NA 42.3 ...
## $ longitude : num NA -87.6 -74 NA -83 ...
## $ name : chr NA "Gene Chandler" "Paul Horn" NA ...
## $ city : chr NA "Chicago, IL" "New York, NY" NA ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 14
## .. ..$ track_id : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ title : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ song_id : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ release : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ artist_id : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ artist_name : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ year : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ duration : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ artist_hotttnesss : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ artist_familiarity: list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ latitude : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ longitude : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ name : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ city : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
library(kableExtra)
singer_locations %>% head() %>%
kable() %>%
kable_styling(full_width = F, position = "center")
track_id | title | song_id | release | artist_id | artist_name | year | duration | artist_hotttnesss | artist_familiarity | latitude | longitude | name | city |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TRWICRA128F42368DB | The Conversation (Cd) | SOSURTI12A81C22FB8 | Even If It Kills Me | ARACDPV1187FB58DF4 | Motion City Soundtrack | 2007 | 170.4485 | 0.6410183 | 0.8230522 | NA | NA | NA | NA |
TRXJANY128F42246FC | Lonely Island | SODESQP12A6D4F98EF | The Duke Of Earl | ARYBUAO1187FB3F4EB | Gene Chandler | 2004 | 106.5530 | 0.3937627 | 0.5700167 | 41.88415 | -87.63241 | Gene Chandler | Chicago, IL |
TRIKPCA128F424A553 | Here’s That Rainy Day | SOQUYQD12A8C131619 | Imprompture | AR4111G1187B9B58AB | Paul Horn | 1998 | 527.5947 | 0.4306226 | 0.5039940 | 40.71455 | -74.00712 | Paul Horn | New York, NY |
TRYEATD128F92F87C9 | Rego Park Blues | SOEZGRC12AB017F1AC | Still River | ARQDZP31187B98D623 | Ronnie Earl & the Broadcasters | 1995 | 695.1179 | 0.3622792 | 0.4773099 | NA | NA | NA | NA |
TRBYYXH128F4264585 | Games | SOPIOCP12A8C13A322 | Afro-Harping | AR75GYU1187B9AE47A | Dorothy Ashby | 1968 | 237.3220 | 0.4107520 | 0.5303468 | 42.33168 | -83.04792 | Dorothy Ashby | Detroit, MI |
TRKFFKR128F9303AE3 | More Pipes | SOHQSPY12AB0181325 | Six Yanks | ARCENE01187B9AF929 | Barleyjuice | 2006 | 192.9400 | 0.3762635 | 0.5412950 | 40.99471 | -77.60454 | Barleyjuice | Pennsylvania |
The singer_locations data frame contains geographical information associated with the artist location, stored in two different formats: 1. as a (dirty!) variable named city; 2. as a latitude / longitude pair (stored in latitude and longitude, respectively).
From the first few rows we can see that some tracks lack this geographical information, so I will filter the data to keep only the tracks that have it.
singer_geo <- singer_locations %>%
filter(!is.na(city)) %>%
select(title, artist_name, year, latitude, longitude, city) #to make table smaller
singer_geo %>%
head() %>%
kable() %>%
kable_styling(full_width = F, position = "center")
title | artist_name | year | latitude | longitude | city |
---|---|---|---|---|---|
Lonely Island | Gene Chandler | 2004 | 41.88415 | -87.63241 | Chicago, IL |
Here’s That Rainy Day | Paul Horn | 1998 | 40.71455 | -74.00712 | New York, NY |
Games | Dorothy Ashby | 1968 | 42.33168 | -83.04792 | Detroit, MI |
More Pipes | Barleyjuice | 2006 | 40.99471 | -77.60454 | Pennsylvania |
Indian Deli | Madlib | 2007 | 34.20034 | -119.18044 | Oxnard, CA |
Miss Gorgeous | Seeed’s Pharaoh Riddim Feat. General Degree | 2003 | 50.73230 | 7.10169 | Bonn |
nrow(singer_locations)
## [1] 10100
nrow(singer_geo)
## [1] 4129
After filtering, the new data frame singer_geo has 4,129 observations, compared with the 10,100 in the original. Since that is still a lot of observations, I will work with only the first 25 songs; I also kept just a few of the variables above to make the tables easier to read.
singer_geo <- singer_geo[1:25,]
singer_geo %>%
kable() %>%
kable_styling(full_width = F, position = "center")
title | artist_name | year | latitude | longitude | city |
---|---|---|---|---|---|
Lonely Island | Gene Chandler | 2004 | 41.88415 | -87.63241 | Chicago, IL |
Here’s That Rainy Day | Paul Horn | 1998 | 40.71455 | -74.00712 | New York, NY |
Games | Dorothy Ashby | 1968 | 42.33168 | -83.04792 | Detroit, MI |
More Pipes | Barleyjuice | 2006 | 40.99471 | -77.60454 | Pennsylvania |
Indian Deli | Madlib | 2007 | 34.20034 | -119.18044 | Oxnard, CA |
Miss Gorgeous | Seeed’s Pharaoh Riddim Feat. General Degree | 2003 | 50.73230 | 7.10169 | Bonn |
Lahainaluna | Keali’i Reichel | 2003 | 19.59009 | -155.43414 | Hawaii |
The Ingenue (LP Version) | Little Feat | 1989 | 34.05349 | -118.24532 | Los Angeles, CA |
The Unquiet Grave (Child No. 78) | Joan Baez | 1964 | 40.57250 | -74.15400 | Staten Island, NY |
The Breaks | 31Knots | 2008 | 45.51179 | -122.67563 | Portland, OR |
The Operator | Bleep | 1989 | 51.50632 | -0.12714 | UK - England - London |
Con Il Nastro Rosa | Lucio Battisti | 1980 | 42.50172 | 12.88512 | Poggio Bustone, Rieti, Italy |
SOS | Ray Brown Trio / Ralph Moore | 1991 | 40.43831 | -79.99745 | Pittsburgh, PA |
At The End | iio | 2002 | 40.71455 | -74.00712 | New York, NY |
The Hunting Song | Tom Lehrer | 1953 | 37.77916 | -122.42005 | New York, NY |
Mob Job (LP Version) | John Zorn | 1989 | 40.71455 | -74.00712 | New York, NY |
Nothing’s the Same | The Meeting Places | 2006 | 34.05349 | -118.24532 | Los Angeles, CA |
Bohemian Ballet | Deep Forest | 1995 | 37.27188 | -119.27023 | California |
Do You Mean To Imply | Billy Cobham | 1999 | 8.41770 | -80.11278 | Panama |
Pollen And Salt | Daphne Loves Derby | 2005 | 47.38028 | -122.23742 | KENT, WASHINGTON |
Surrounded | SOiL | 2009 | 41.88415 | -87.63241 | Chicago |
Headless | Run Level Zero | 2003 | 62.19845 | 17.55142 | SWEDEN |
Na Laethe Bhí | Clannad | 1993 | 53.41961 | -8.24055 | Ireland |
Haiku (Album Version) | Tally Hall | 2005 | 42.32807 | -83.73360 | Ann Arbor, MI |
Bedlam Boys | Old Blind Dogs | 2007 | 57.15382 | -2.10679 | Aberdeen, Scotland |
singer_address <- mapply(FUN = function(longitude, latitude) {
revgeocode(c(longitude, latitude), output = "address")},
singer_geo$longitude, singer_geo$latitude)
## Information from URL : https://maps.googleapis.com/maps/api/geocode/json?latlng=41.88415,-87.63241&key=[redacted]
## (one such line is printed for each of the 25 coordinate pairs; API key redacted)
singer_address
## [1] "134 N LaSalle St suite 1720, Chicago, IL 60602, USA"
## [2] "80 Chambers St, New York, NY 10007, USA"
## [3] "1001 Woodward Ave, Detroit, MI 48226, USA"
## [4] "Z. H. Confair Memorial Hwy, Howard, PA 16841, USA"
## [5] "300 W 3rd St, Oxnard, CA 93030, USA"
## [6] "Regina-Pacis-Weg 1, 53113 Bonn, Germany"
## [7] "Unnamed Road, Hawaii, USA"
## [8] "1420 S Oakhurst Dr, Los Angeles, CA 90035, USA"
## [9] "215 Arthur Kill Rd, Staten Island, NY 10306, USA"
## [10] "1500 SW 1st Ave, Portland, OR 97201, USA"
## [11] "39 Whitehall, Westminster, London SW1A 2BY, UK"
## [12] "Localita' Pescatore, Poggio Bustone, RI 02018, Italy"
## [13] "410 Grant St, Pittsburgh, PA 15219, USA"
## [14] "80 Chambers St, New York, NY 10007, USA"
## [15] "1 Dr Carlton B Goodlett Pl, San Francisco, CA 94102, USA"
## [16] "80 Chambers St, New York, NY 10007, USA"
## [17] "1420 S Oakhurst Dr, Los Angeles, CA 90035, USA"
## [18] "Shaver Lake, CA 93634, USA"
## [19] "Calle Aviacion, Río Hato, Panama"
## [20] "220 4th Ave S, Kent, WA 98032, USA"
## [21] "134 N LaSalle St suite 1720, Chicago, IL 60602, USA"
## [22] "Unnamed Road, 862 96 Njurunda, Sweden"
## [23] "ICastle view, Borris in ossory, Laois, Borris in ossory, Co. Laois, Ireland"
## [24] "3788 Pontiac Trail, Ann Arbor, MI 48105, USA"
## [25] "91 Hutcheon St, Aberdeen AB25 1EW, UK"
Now singer_address contains the address corresponding to each pair of coordinates. Let's see whether these addresses match the city variable.
sing_add_city <- data.frame(address = singer_address, city = singer_geo$city)
sing_add_city %>%
kable() %>%
kable_styling(full_width = F)
address | city |
---|---|
134 N LaSalle St suite 1720, Chicago, IL 60602, USA | Chicago, IL |
80 Chambers St, New York, NY 10007, USA | New York, NY |
1001 Woodward Ave, Detroit, MI 48226, USA | Detroit, MI |
Z. H. Confair Memorial Hwy, Howard, PA 16841, USA | Pennsylvania |
300 W 3rd St, Oxnard, CA 93030, USA | Oxnard, CA |
Regina-Pacis-Weg 1, 53113 Bonn, Germany | Bonn |
Unnamed Road, Hawaii, USA | Hawaii |
1420 S Oakhurst Dr, Los Angeles, CA 90035, USA | Los Angeles, CA |
215 Arthur Kill Rd, Staten Island, NY 10306, USA | Staten Island, NY |
1500 SW 1st Ave, Portland, OR 97201, USA | Portland, OR |
39 Whitehall, Westminster, London SW1A 2BY, UK | UK - England - London |
Localita’ Pescatore, Poggio Bustone, RI 02018, Italy | Poggio Bustone, Rieti, Italy |
410 Grant St, Pittsburgh, PA 15219, USA | Pittsburgh, PA |
80 Chambers St, New York, NY 10007, USA | New York, NY |
1 Dr Carlton B Goodlett Pl, San Francisco, CA 94102, USA | New York, NY |
80 Chambers St, New York, NY 10007, USA | New York, NY |
1420 S Oakhurst Dr, Los Angeles, CA 90035, USA | Los Angeles, CA |
Shaver Lake, CA 93634, USA | California |
Calle Aviacion, Río Hato, Panama | Panama |
220 4th Ave S, Kent, WA 98032, USA | KENT, WASHINGTON |
134 N LaSalle St suite 1720, Chicago, IL 60602, USA | Chicago |
Unnamed Road, 862 96 Njurunda, Sweden | SWEDEN |
ICastle view, Borris in ossory, Laois, Borris in ossory, Co. Laois, Ireland | Ireland |
3788 Pontiac Trail, Ann Arbor, MI 48105, USA | Ann Arbor, MI |
91 Hutcheon St, Aberdeen AB25 1EW, UK | Aberdeen, Scotland |
From the table we can visually compare the cities, but let’s try with some code:
library(stringi)
stri_detect_fixed(sing_add_city$address, sing_add_city$city)
## [1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [12] FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE
## [23] TRUE TRUE FALSE
From the output, we can see that 8 of the 25 observations don't match. But some of these mismatches arise only because some cities/countries are written in capitals in the city column. I will try again with everything in lower case.
low_address <- str_to_lower(sing_add_city$address)
low_city <- str_to_lower(sing_add_city$city)
words_address <- str_split(low_address, boundary("word"))
words_city <- str_split(low_city, boundary("word"))
has_match <- function(match_length) {
match_length > 0 # TRUE if the address and the city share at least one word
}
mapply(intersect, words_address, words_city) %>%
lapply(length) %>%
map(has_match)
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] FALSE
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] TRUE
##
## [[7]]
## [1] TRUE
##
## [[8]]
## [1] TRUE
##
## [[9]]
## [1] TRUE
##
## [[10]]
## [1] TRUE
##
## [[11]]
## [1] TRUE
##
## [[12]]
## [1] TRUE
##
## [[13]]
## [1] TRUE
##
## [[14]]
## [1] TRUE
##
## [[15]]
## [1] FALSE
##
## [[16]]
## [1] TRUE
##
## [[17]]
## [1] TRUE
##
## [[18]]
## [1] FALSE
##
## [[19]]
## [1] TRUE
##
## [[20]]
## [1] TRUE
##
## [[21]]
## [1] TRUE
##
## [[22]]
## [1] TRUE
##
## [[23]]
## [1] TRUE
##
## [[24]]
## [1] TRUE
##
## [[25]]
## [1] TRUE
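The mapply()/lapply()/map() chain above can be collapsed into a single logical vector with purrr::map2_lgl(), which is easier to scan than a 25-element list (an equivalent sketch):
map2_lgl(words_address, words_city, ~ length(intersect(.x, .y)) > 0)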
After applying the previous code, the mismatches were reduced from 8 to 3. Observation number 15 is a true mismatch, since the address is in San Francisco while the city is New York. The other two cases arise because the city column contains only the state name, which is abbreviated in the address column, so the word sets never intersect.
These remaining mismatches could potentially be resolved with other methods (e.g. expanding state abbreviations in both columns), but that may compromise the accuracy of the output.
library(leaflet)
singer_geo %>%
leaflet() %>%
addTiles() %>%
addCircles(lng = singer_geo$longitude,
lat = singer_geo$latitude,
popup = singer_geo$artist_name,
color = "deeppink")
Clicking a pink circle on the map pops up the name of the artist associated with that location.
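With leaflet's formula interface the popup can combine several fields, for example artist and city (a small variation on the map above):
singer_geo %>%
leaflet() %>%
addTiles() %>%
addCircles(lng = ~longitude, lat = ~latitude,
popup = ~paste0(artist_name, " - ", city),
color = "deeppink")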