R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
raw.data
## [1] "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
# Load libraries
library(stringr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ forcats 0.4.0
## ✔ readr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Problem #3
Copy the introductory example. The vector name stores the extracted names.
R> name
[1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
[4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).
Construct a logical vector indicating whether a character has a second name.
data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
# Remove the phone numbers from the raw data (see page 206 of the textbook)
# Reordering "last name, first name" to "first name last name": https://stackoverflow.com/questions/33826650/last-name-first-name-to-first-name-last-name
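# Extract runs of two or more letters, periods, commas, and spaces (the names)
clnnames <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
#clnnames
# Split "Last, First" entries at the comma; names without a comma get an empty second column
splitname <- str_split(clnnames, ", ", simplify = TRUE)
#splitname
# Put the first name in front and trim the leading space left by an empty second column
firstlast <- str_trim(str_c(splitname[,2], " ", splitname[,1]))
firstlast
## [1] "Moe Szyslak" "C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Dr. Julius Hibbert"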
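The rearrangement above can also be sketched with a single regular-expression replacement. This is only an illustrative alternative, not the approach used in this report: the vector `name` is retyped by hand from the problem statement, and `swapped` and `no_title` are hypothetical names introduced for the example.

```r
library(stringr)

# Hypothetical input: the six names from the problem statement, typed by hand
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Swap "Last, First" into "First Last" with two capture groups;
# names without a comma do not match and are returned unchanged
swapped <- str_replace(name, "^(.+), (.+)$", "\\2 \\1")
swapped

# Optionally drop a leading title (two or more letters followed by a period)
no_title <- str_replace(swapped, "^[[:alpha:]]{2,}\\.\\s+", "")
no_title
```

Because the replacement only touches strings that contain a comma, no empty pieces are created and no trimming step is needed.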
# TRUE where the name contains a title: two or more letters followed by a period (Rev., Dr.)
frntname <- str_detect(firstlast, "[[:alpha:]]{2,}\\.")
frntname
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
# TRUE where the name contains an initial: one capital letter followed by a period (C.)
middle <- str_detect(firstlast, "[A-Z]\\.")
middle
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
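The two logical vectors can also be built from more explicit rules. The sketch below assumes that the only titles are Rev. and Dr. and that a "second name" means more than two name parts once the title is removed; `full_name`, `has_title`, `without_title`, and `has_second_name` are hypothetical names introduced for the example.

```r
library(stringr)

# Hypothetical input: the rearranged names, typed by hand
full_name <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Assumption: the only titles in the data are "Rev." and "Dr."
has_title <- str_detect(full_name, "^(Rev|Dr)\\.")
has_title
# FALSE FALSE TRUE FALSE FALSE TRUE

# Remove the title, then count whitespace-separated name parts;
# more than two parts is taken to mean the character has a second name
without_title <- str_remove(full_name, "^(Rev|Dr)\\.\\s+")
has_second_name <- str_count(without_title, "\\S+") > 2
has_second_name
# FALSE TRUE FALSE FALSE FALSE FALSE
```

Counting name parts would also flag a spelled-out second name, which a pattern looking only for a capital letter and a period would miss.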
exp1 <- c("goog1e$", "google1$")
ans4a <- str_detect(exp1, "[0-9]+\\$")
ans4a
## [1] FALSE TRUE
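The pattern `[0-9]+\\$` matches one or more digits followed by a literal dollar sign. The escape matters, because an unescaped `$` is an end-of-string anchor. A small sketch of the difference, where the vector `amounts` is a hypothetical example that adds one string to the two above:

```r
library(stringr)

# Hypothetical test strings
amounts <- c("goog1e$", "google1$", "cost: 250$ total")

# Escaped: digits immediately followed by a literal "$"
escaped <- str_extract(amounts, "[0-9]+\\$")
escaped
# NA "1$" "250$"

# Unescaped: "$" anchors the match to the end of the string,
# so this asks for strings that END in digits
unescaped <- str_detect(amounts, "[0-9]+$")
unescaped
# FALSE FALSE FALSE
```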
exp2 <- c(" tomz ", " Tomz ","John", "john","mik","drinkmilk", "Josi " )
ans4b <- str_detect(exp2, "\\b[a-z]{1,4}\\b")
ans4b
## [1] TRUE FALSE FALSE TRUE TRUE FALSE FALSE
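The pattern `\\b[a-z]{1,4}\\b` matches a whole word of one to four lowercase letters; the word boundaries on both sides stop it from matching part of a longer word or the tail of a capitalized one. A brief sketch on a hypothetical sentence (`sentence` and `short_words` are names made up for the example):

```r
library(stringr)

# Hypothetical sentence
sentence <- "Tom drinks milk and tea at noon"

# "Tom" fails (capital T, and "om" does not start at a word boundary);
# "drinks" fails (six letters)
short_words <- str_extract_all(sentence, "\\b[a-z]{1,4}\\b")[[1]]
short_words
# "milk" "and" "tea" "at" "noon"
```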
exp3 <- c("test .txt", "test.txt", "test.txta","test.xml", "test.json" )
ans4c <- str_detect(exp3, ".*?\\.txt$")
ans4c
## [1] TRUE TRUE FALSE FALSE FALSE
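The pattern `.*?\\.txt$` matches any string that ends in `.txt`. For detection alone the leading `.*?` is optional, since `str_detect()` already searches anywhere in the string. A quick check of that claim, using a hypothetical vector `files` that adds the bare string ".txt" to the examples above:

```r
library(stringr)

# Hypothetical file names
files <- c("test .txt", "test.txt", "test.txta", "test.xml", "test.json", ".txt")

with_prefix <- str_detect(files, ".*?\\.txt$")
without_prefix <- str_detect(files, "\\.txt$")

with_prefix
# TRUE TRUE FALSE FALSE FALSE TRUE
identical(with_prefix, without_prefix)
# TRUE
```

The prefix does matter for extraction: `str_extract()` with the longer pattern returns the whole file name, while the shorter pattern returns only ".txt".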
exp4 <- c("31/12/2019", "12/31/2019","1/01/2019" )
ans4d <- str_detect(exp4, "\\d{2}/\\d{2}/\\d{4}")
ans4d
## [1] TRUE TRUE FALSE
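The pattern `\\d{2}/\\d{2}/\\d{4}` describes two digits, a slash, two digits, a slash, and four digits. It checks only the shape, so both day-first and month-first dates pass, and because it is unanchored it also matches inside longer strings. A sketch of the anchored variant, where the vector `dates` is a hypothetical example that adds one over-long string:

```r
library(stringr)

# Hypothetical test strings; the last one has extra digits on both ends
dates <- c("31/12/2019", "12/31/2019", "1/01/2019", "131/12/20199")

# Unanchored: a match anywhere in the string is enough
unanchored <- str_detect(dates, "\\d{2}/\\d{2}/\\d{4}")
unanchored
# TRUE TRUE FALSE TRUE

# Anchored: the whole string must have the dd/dd/dddd shape
anchored <- str_detect(dates, "^\\d{2}/\\d{2}/\\d{4}$")
anchored
# TRUE TRUE FALSE FALSE
```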
exp5 <- c("<br>HTML is fun</br>", "<br> Happy Holidays!<br>" )
ans4e <- str_detect(exp5, "<(.+?)>.+?</\\1>")
ans4e
## [1] TRUE FALSE
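The pattern `<(.+?)>.+?</\\1>` matches an opening tag, some content, and a closing tag with the same name; the backreference `\\1` repeats whatever the first group captured. To see what each part captures, `str_match()` can be used with a second group around the content. This is only a sketch: the second group is an addition for illustration, and the third string in `html` is a hypothetical example.

```r
library(stringr)

# The two strings from above plus one hypothetical string with two tag pairs
html <- c("<br>HTML is fun</br>", "<br> Happy Holidays!<br>", "<b>bold</b><i>it</i>")

# Column 1: full match, column 2: tag name, column 3: tag content
parts <- str_match(html, "<(.+?)>(.+?)</\\1>")
parts
# Row 1: "<br>HTML is fun</br>" "br" "HTML is fun"
# Row 2: NA NA NA (there is no closing tag with a slash)
# Row 3: "<b>bold</b>" "b" "bold"
```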