```r
library("stringr")
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name
```
```
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
```
Assuming that this question refers to names that are in reversed order and separated by a comma, and that titles and other designations should remain in place:
```r
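# Split each name at ", "; a reversed name yields exactly two pieces.
parts <- str_split(name, ", ")
for (i in seq_along(name)) {
  if (length(parts[[i]]) == 2) {
    # Put the first name(s) in front of the surname.
    name[i] <- str_c(parts[[i]][2], parts[[i]][1], sep = " ")
  }
}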
```
Result:
```r
print(name)
```
```
## [1] "Moe Szyslak" "C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Dr. Julius Hibbert"
```
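The same reordering can also be done without a loop. The following is a minimal sketch, assuming that a reversed name contains exactly one `", "` separator; it uses `str_replace()` with two capture groups and swaps them in the replacement. The variable `reordered` is introduced here only for illustration.

```r
library(stringr)

# Names as extracted from raw.data, before reordering.
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Capture the text before and after ", " and swap the two groups.
# Names without ", " do not match and are returned unchanged.
reordered <- str_replace(name, "^(.+), (.+)$", "\\2 \\1")
print(reordered)
```

Because `str_replace()` is vectorised over `name`, no index bookkeeping is needed.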
My assumption here is that a title starts with a single uppercase letter, is followed by 1 to 3 alphabetic characters, and ends with a dot.
```r
str_detect(name,"[:upper:]{1}[:alpha:]{1,3}\\.")
```
```
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
```
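To see which part of each name triggered the match, the same pattern can be passed to `str_extract()`. This is a small sketch under the title assumption stated above; the character classes are written in the bracketed form `[[:upper:]]`, and `title_pattern`, `has_title`, and `titles` are names introduced for illustration.

```r
library(stringr)

# Names after reordering.
name <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# One uppercase letter, 1 to 3 further letters, then a literal dot.
title_pattern <- "[[:upper:]][[:alpha:]]{1,3}\\."

has_title <- str_detect(name, title_pattern)
titles <- str_extract(name, title_pattern)  # NA where no title is found

print(name[has_title])
print(titles)
```

Note that the pattern is not anchored to the start of the string, so a trailing designation such as "Jr." in a hypothetical name would be detected as well.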
My approach is to split the expression into two alternatives joined by a pipe: names that begin with an abbreviated first name, and names that begin with a title. An uppercase letter followed by a dot is treated as an abbreviated first name. A title must start with an uppercase letter, continue with further letters, and end with a dot (the expression below uses one or more lowercase letters rather than the 1 to 3 alphabetic characters assumed earlier). In both alternatives, each remaining part of the name is a space followed by a capitalized word. A name with an abbreviated first name must have two such parts, and a name with a title must have three, because the additional part is the second name.
```r
str_detect(name,"([:upper:]{1}\\.([:space:][:upper:][:alpha:]+){2})|([:upper:]{1}[:lower:]+\\.([:space:][:upper:][:alpha:]+){3})")
```
```
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
```
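The long expression is easier to check when its two alternatives are built separately. The sketch below assembles the same pattern from two pieces, using the bracketed class syntax; `initial_form`, `title_form`, and `second_name_pattern` are illustrative names, and "Dr. Julius Michael Hibbert" is a made-up string used only to exercise the second alternative.

```r
library(stringr)

name <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# A capitalized word preceded by a space, e.g. " Montgomery".
word <- "[[:space:]][[:upper:]][[:alpha:]]+"

# Alternative 1: abbreviated first name, then two capitalized words.
initial_form <- str_c("[[:upper:]]\\.(", word, "){2}")
# Alternative 2: title, then three capitalized words.
title_form <- str_c("[[:upper:]][[:lower:]]+\\.(", word, "){3}")

second_name_pattern <- str_c("(", initial_form, ")|(", title_form, ")")

detected <- str_detect(name, second_name_pattern)
print(detected)

# Hypothetical name with a title and a second name (not in the data).
with_title <- str_detect("Dr. Julius Michael Hibbert", second_name_pattern)
print(with_title)
```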
This expression matches a run of one or more digits followed by a literal dollar sign *$*.
Example: *0235$*
```r
str_extract("0235$","[0-9]+\\$")
```
```
## [1] "0235$"
```
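The backslashes in front of the dollar sign matter. As a brief sketch, the comparison below uses a few made-up test strings (`candidates` is a hypothetical name) to show what the escaped pattern accepts, and what happens when the escape is dropped and `$` acts as the end-of-string anchor.

```r
library(stringr)

# Hypothetical test strings, not taken from the exercise.
candidates <- c("0235$", "$12", "abc$", "7$ left")

# Escaped: "\\$" is a literal dollar sign that must follow the digits.
escaped <- str_detect(candidates, "[0-9]+\\$")
print(escaped)

# Unescaped: "$" anchors the digits at the end of the string instead,
# so "0235$" no longer matches while "0235" does.
unescaped <- str_detect(c("0235$", "0235"), "[0-9]+$")
print(unescaped)
```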
This expression matches lowercase words that are one to four letters long.
Example: *a word*
```r
str_extract_all("a word","\\b[a-z]{1,4}\\b")
```
```
## [[1]]
## [1] "a" "word"
```
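A slightly longer input makes the effect of the word boundaries and the `[a-z]` class visible. This is an illustrative sketch; the sentence is made up and `short_words` is a hypothetical name.

```r
library(stringr)

# Made-up sentence with short, long, and capitalized words.
sentence <- "a word with several Caps"

# \\b on both sides forces the whole word to be 1 to 4 lowercase letters.
short_words <- str_extract_all(sentence, "\\b[a-z]{1,4}\\b")[[1]]
print(short_words)
```

"several" is skipped because it is longer than four letters, and "Caps" is skipped because `[a-z]` does not match the uppercase "C".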
This expression matches strings that start with zero or more characters and end with *.txt*. It describes file names with the extension *.txt*; the part of the name before the extension is optional.
Example: *.txt*
```r
str_detect(".txt",".*?\\.txt$")
```
```
## [1] TRUE
```
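The sketch below tries the pattern on a few hypothetical file names (`files` is an illustrative name) and contrasts it with a variant in which the dot is left unescaped, to show why `\\.` is needed.

```r
library(stringr)

# Hypothetical file names, not taken from the exercise.
files <- c(".txt", "notes.txt", "notes.txt.bak", "notestxt")

# "\\." is a literal dot and "$" requires ".txt" at the very end.
is_txt <- str_detect(files, ".*?\\.txt$")
print(is_txt)

# With an unescaped dot, any character may precede "txt",
# so "notestxt" is accepted as well.
loose <- str_detect("notestxt", ".*?.txt$")
print(loose)
```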
This expression matches 8 digits with a forward slash after the second digit and another after the fourth digit.
Example: 24/65/9987
```r
str_extract("29/22/2333","\\d{2}/\\d{2}/\\d{4}")
```
```
## [1] "29/22/2333"
```
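If the three digit groups are needed individually, capture groups can be added and read with `str_match()`. This is a minimal sketch; the grouped pattern and the names `grouped` and `parts` are additions for illustration. It also shows that the pattern checks only the shape of the string, not whether the numbers form a valid date, and that it is not anchored.

```r
library(stringr)

# Same pattern with each digit run wrapped in a capture group.
grouped <- "(\\d{2})/(\\d{2})/(\\d{4})"

# Column 1 holds the full match, columns 2 to 4 hold the groups.
parts <- str_match("29/22/2333", grouped)
print(parts)

# Without anchors or boundaries, the match can sit inside longer digit runs.
inner <- str_extract("129/22/23335", "\\d{2}/\\d{2}/\\d{4}")
print(inner)
```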
This expression matches text of at least one character enclosed in an XML-style tag. The backreference requires the closing tag after the enclosed text to carry the same name as the opening tag.
Example: `<d>x</d>`
```r
str_extract("<d>x</d>","<(.+?)>.+?</\\1>")
```
```
## [1] "<d>x</d>"
```
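The role of the backreference `\\1` becomes clearer with inputs that should fail. The following sketch uses made-up strings (`snippets` is a hypothetical name): one well-formed element, one with mismatched tags, and one with empty content.

```r
library(stringr)

# Hypothetical inputs: matching tags, mismatched tags, empty content.
snippets <- c("<d>x</d>", "<d>x</e>", "<d></d>")

# "\\1" must repeat whatever the first group captured as the tag name,
# and ".+?" requires at least one character between the tags.
well_formed <- str_detect(snippets, "<(.+?)>.+?</\\1>")
print(well_formed)

# A second capture group exposes the tag name and the enclosed text.
pieces <- str_match("<d>x</d>", "<(.+?)>(.+?)</\\1>")
print(pieces)
```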
Extract the uppercase letters and punctuation marks from the text.
```r
txt <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
str_extract_all(txt,"[:upper:]|[:punct:]")
```
```
## [[1]]
## [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "." "Y"
## [18] "O" "U" "." "A" "R" "E" "." "A" "." "S" "U" "P" "E" "R" "N" "E" "R"
## [35] "D" "!"
```
Reading the periods as word separators, this reveals "CONGRATULATIONS YOU ARE A SUPERNERD!".
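The extracted characters can be joined into a single string rather than read off the vector by eye. This is a small sketch, assuming the periods are meant as word separators; `chars` and `decoded` are names introduced for illustration.

```r
library(stringr)

txt <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# Keep every uppercase letter and punctuation mark, in order.
chars <- str_extract_all(txt, "[[:upper:][:punct:]]")[[1]]

# Join the characters, then turn the literal periods into spaces.
decoded <- str_replace_all(str_c(chars, collapse = ""), fixed("."), " ")
print(decoded)
```

`fixed(".")` treats the period as a plain character, so no regex escaping is needed in the replacement step.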