Regular Expression in R

February 28, 2017

Recap of the previous session

x1$土地區段位置.建物區段門牌 == "文山區"

## [1] FALSE FALSE FALSE FALSE FALSE FALSE

grepl("文山區",x1$土地區段位置.建物區段門牌)

## [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE

regexpr("[0-9]",x1$土地區段位置.建物區段門牌)

## [1] 12 13 12 12 13 12
## attr(,"match.length")
## [1] 1 1 1 1 1 1
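
The three calls above answer different questions: `==` compares the whole string, `grepl()` tests whether the pattern occurs anywhere in it, and `regexpr()` reports where the first match starts. A minimal sketch on a made-up vector (`addr` and its values are hypothetical, not the real data in `x1`):

```r
# Hypothetical addresses, for illustration only
addr <- c("Wenshan Rd 12", "Daan Rd 3")

# Exact comparison: TRUE only if the whole string equals "Wenshan"
eq <- addr == "Wenshan"
print(eq)    # FALSE FALSE

# Pattern search: TRUE if "Wenshan" occurs anywhere in the string
has <- grepl("Wenshan", addr)
print(has)   # TRUE FALSE

# Position of the first digit in each string
pos <- as.integer(regexpr("[0-9]", addr))
print(pos)   # 12 9
```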

Does the "[0-9]" notation look odd to you?

print(x)

## [1] "106(86+20)" "114(95+19)" "99(87+12)"  "117(99+18)" "114(97+17)"

class(x)

## [1] "character"

Goal: remove the parenthesized content -> first find the position of "("

regexpr("(",x)

## Error in regexpr("(", x): invalid regular expression '(', reason 'Missing ')''

grepl(".",x)

## [1] TRUE TRUE TRUE TRUE TRUE

grepl("\",x)

## Error: <text>:1:7: unexpected INCOMPLETE_STRING
## 1: grepl("\",x)
##           ^

Other characters with a special meaning: "+", "$", "^", "?", "*", "{}"

~~It must be a bug in R~~

## [1] "106(86+20)" "114(95+19)" "99(87+12)"  "117(99+18)" "114(97+17)"

regexpr("\\(",x)

## [1] 4 4 3 4 4
## attr(,"match.length")
## [1] 1 1 1 1 1
## attr(,"useBytes")
## [1] TRUE
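
With the escaped pattern, the original goal (dropping the parenthesized part) can be finished in a few ways. The sketch below is one possibility, not necessarily the approach used in this session. `fixed = TRUE` is a standard argument of `regexpr()` that turns off regex interpretation altogether.

```r
x <- c("106(86+20)", "114(95+19)", "99(87+12)", "117(99+18)", "114(97+17)")

# Option 1: locate "(" with an escaped pattern and keep everything before it
pos <- as.integer(regexpr("\\(", x))
total1 <- substr(x, 1, pos - 1)

# Option 2: treat "(" as a plain string instead of a regex
pos_fixed <- as.integer(regexpr("(", x, fixed = TRUE))

# Option 3: delete everything from "(" through ")" in one step
total2 <- sub("\\(.*\\)", "", x)

print(total1)   # "106" "114" "99" "117" "114"
```

If the totals are needed as numbers, `as.numeric(total1)` converts them.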

x = c("apple","banana","orange","grape","melon","nut","zxcvb")

grepl("n",x)

## [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE

grepl("e",x)

## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE

"n" or "e"

grepl("n|e",x)

## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

## apple banana orange grape melon nut zxcvb

grepl("n|e",x)

## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

grepl("[ne]",x)

## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

[ ] is equivalent to several single characters joined by |
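
A quick check of that equivalence, plus one difference worth noting: `[ ]` chooses among single characters, while `|` can also separate longer alternatives. The pattern `"an|el"` is only an illustrative choice.

```r
x <- c("apple", "banana", "orange", "grape", "melon", "nut", "zxcvb")

# Same result for single-character alternatives
same <- identical(grepl("[ne]", x), grepl("n|e", x))
print(same)    # TRUE

# "|" also works between multi-character alternatives
multi <- x[grepl("an|el", x)]
print(multi)   # "banana" "orange" "melon"
```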

## apple banana orange grape melon nut zxcvb

Only the strings that start with "a"

grepl("^a",x)

## [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

Only the strings that end with "a"

grepl("a$",x)

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
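
The two anchors can also be combined in one pattern, which forces the whole string to match. For plain prefixes and suffixes, base R (3.3.0 and later) additionally offers `startsWith()` and `endsWith()`, which do not use regex at all. A small sketch; the pattern `"^nut$"` is an illustrative choice:

```r
x <- c("apple", "banana", "orange", "grape", "melon", "nut", "zxcvb")

# Both anchors together: the whole string must be exactly "nut"
exact <- x[grepl("^nut$", x)]
print(exact)      # "nut"

# Non-regex alternatives for a literal prefix or suffix
starts_a <- startsWith(x, "a")
ends_a <- endsWith(x, "a")
print(x[starts_a])   # "apple"
print(x[ends_a])     # "banana"
```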

z = c("workspace","論文","研究室lab","lab研究室","3F研究室","a1老b2c闆3")

## workspace 論文 研究室lab lab研究室 3F研究室 a1老b2c闆3

1. Find the strings that contain lowercase English letters

grepl("[a-z]",z) # lowercase only; uppercase is [A-Z]

## [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE

z[grepl("[a-z]",z)]

## [1] "workspace"  "研究室lab"  "lab研究室"  "a1老b2c闆3"

2. Find the strings that contain no digits

grepl("[0-9]",z)

## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE

z[!grepl("[0-9]",z)]

## [1] "workspace" "論文"      "研究室lab" "lab研究室"

3. Remove all lowercase English letters

gsub("[a-z]","",z)

## [1] ""         "論文"     "研究室"   "研究室"   "3F研究室" "1老2闆3"

4. Remove every character that is not a lowercase English letter

gsub("[^a-z]","",z)

## [1] "workspace" ""          "lab"       "lab"       ""          "abc"

5. Remove all lowercase English letters and digits

gsub("[a-z0-9]","",z)

## [1] ""        "論文"    "研究室"  "研究室"  "F研究室" "老闆"
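
Exercise 5 leaves the uppercase "F" behind, because `[a-z]` covers lowercase letters only. A possible follow-up, assuming uppercase letters should be removed as well, is to chain several ranges inside one bracket:

```r
z <- c("workspace", "論文", "研究室lab", "lab研究室", "3F研究室", "a1老b2c闆3")

# Ranges can be chained inside one [ ]: lowercase, uppercase, and digits
no_alnum <- gsub("[A-Za-z0-9]", "", z)
print(no_alnum)
# "" "論文" "研究室" "研究室" "研究室" "老闆"

# Negating a class keeps only what is inside it: here, the digits
digits_only <- gsub("[^0-9]", "", z)
print(digits_only)
# "" "" "" "" "3" "123"
```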

"\n" and "\t" are, respectively, the "Enter" and "Tab" of a word processor

paste0("ABCD\n","EFG")

## [1] "ABCD\nEFG"

cat(paste0("ABCD\n","EFG"))

## ABCD
## EFG

cat("ABCD\n","EFG",sep="")

## ABCD
## EFG
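
Inside an R string, `"\n"` and `"\t"` each count as a single character, and they can be matched or replaced like any other character. A brief sketch (the strings are made up for illustration):

```r
s <- "ABCD\nEFG"

# The newline is one character, so the length is 4 + 1 + 3
n_chars <- nchar(s)
print(n_chars)   # 8

# Split the text into lines at each newline
parts <- strsplit(s, "\n")[[1]]
print(parts)     # "ABCD" "EFG"

# Replace a tab with a comma
csv <- gsub("\t", ",", "name\tscore")
print(csv)       # "name,score"
```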

Advanced: Perl Regex

## apple banana orange grape melon nut zxcvb

Strings with any character before "n"

grepl(".n",x)

## [1] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE

Strings with any character after "n"

grepl("n.",x)

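## [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE

In regex, "." acts as a wildcard character

It matches Chinese characters, English letters, digits, punctuation, even "Martian" internet slang, as long as the match takes up exactly one character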
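
A small sketch of the "exactly one character" rule, using the `^` and `$` anchors from earlier so that nothing else may surround the wildcard. The test vector `s` is made up for illustration.

```r
# "\u5b57" is the single Chinese character 字, written as a Unicode escape
s <- c("a", "1", "?", "\u5b57", "ab", "")

# "^.$" means: exactly one character between the start and the end
one_char <- grepl("^.$", s)
print(one_char)   # TRUE TRUE TRUE TRUE FALSE FALSE
```

The last two elements fail because "ab" has two characters and the empty string has none.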

y = c("abc","adc","afc","kfc")

grepl("a.c",y)

## [1]  TRUE  TRUE  TRUE FALSE

grepl("a[bd]c",y)

## [1]  TRUE  TRUE FALSE FALSE

grepl("a[^bd]c",y)

## [1] FALSE FALSE  TRUE FALSE
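
To see which substring a pattern actually matched, `regmatches()` can be paired with `regexpr()`. A short sketch on the same `y`; the extra string `"ac"` is added only to show that "." still requires one character to be present.

```r
y <- c("abc", "adc", "afc", "kfc")

# Matched text for every element that matches "a.c"
hit_any <- regmatches(y, regexpr("a.c", y))
print(hit_any)    # "abc" "adc" "afc"

# The middle character must be neither "b" nor "d"
hit_neg <- regmatches(y, regexpr("a[^bd]c", y))
print(hit_neg)    # "afc"

# "." cannot match nothing: "ac" has no character between "a" and "c"
empty_mid <- grepl("a.c", "ac")
print(empty_mid)  # FALSE
```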