## -- Attaching packages ---------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2 √ purrr 0.3.4
## √ tibble 3.0.3 √ dplyr 0.8.5
## √ tidyr 1.0.2 √ stringr 1.4.0
## √ readr 1.3.1 √ forcats 0.5.0
## -- Conflicts ------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Join multiple strings into a single string.
Usage:
str_c(..., sep = "", collapse = NULL)
sep: String to insert between input vectors.
collapse: Optional string used to combine input vectors into a single string.
Value: If collapse = NULL (the default), a character vector with length equal to the longest input vector. If collapse is non-NULL, a character vector of length 1.
## [1] "a b c d e f g h i j k l m n o p q r s t u v w x y z"
## [1] "a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z"
## [1] "a&b&c&d&e&f&g&h&i&j&k&l&m&n&o&p&q&r&s&t&u&v&w&x&y&z"
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## [1] "a-d" NA "b-d"
## [1] "a" NA "b"
## [1] NA
## [1] "a-db"
## [1] "我-d" "和-d" "你-d"
## [1] "我-d和-d你"
## [1] "我-和-你"
## [1] "我" "和" "你"
## [1] "我和你-"
## [1] "我-d" "和-d" "你-d"
## [1] "我和你"
## [1] "我-和-你"
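collapse joins the elements inside a single vector. For example, to join c("我", "和", "你"): str_c(c("我", "和", "你"), collapse = "-") returns "我-和-你". sep joins separate string arguments. For example, to join "我", "和" and "你": str_c("我", "和", "你", sep = "-") returns "我-和-你". In the third case neither collapse nor sep is given (see Example 2): str_c(c("我", "和", "你"), "-d") returns "我-d" "和-d" "你-d".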
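The three cases described above can be compared side by side. This is a minimal sketch, assuming stringr is installed; the variable names are illustrative only and do not come from the original notes.

```r
library(stringr)

words <- c("我", "和", "你")

# collapse: joins the elements of ONE vector into a single string
joined_collapse <- str_c(words, collapse = "-")

# sep: inserted between SEPARATE arguments, element by element
joined_sep <- str_c("我", "和", "你", sep = "-")

# neither: the shorter argument is recycled, so the result keeps length 3
suffixed <- str_c(words, "-d")

# both: sep is applied first (element-wise), then collapse joins the pieces
both <- str_c(words, "d", sep = "-", collapse = "")

print(joined_collapse)
print(joined_sep)
print(suffixed)
print(both)
```

A useful rule of thumb: sep works across arguments, collapse works along the resulting vector.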
Specify the encoding of a string.
Usage:
str_conv(string, encoding)
## [1] "\xb1"
## [1] "<U+0105>"
## [1] "±"
Count the number of matches in a string.
Usage:
str_count(string, pattern = "")
fruit <- c("apple", "banana", "pear", "pineapple")
str_count(fruit, "a") # number of "a"s in the first word, in the second word, in the third, ...
## [1] 1 3 1 1
## [1] 2 0 1 3
## [1] 1 0 1 2
## [1] 1 1 1 0
## [1] 2 3 4
## [1] 1 3 2
## [1] 5
## [1] 2
## [1] 1
Detect the presence or absence of a pattern in a string.
Usage:
str_detect(string, pattern, negate = FALSE)
negate: If TRUE, return non-matching elements.
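A short sketch of str_detect() with and without negate, assuming stringr is attached. The fruit vector mirrors the one used for str_count() earlier; the other variable names are illustrative.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

has_a    <- str_detect(fruit, "a")    # pattern found anywhere in the string
starts_a <- str_detect(fruit, "^a")   # anchored at the start
ends_a   <- str_detect(fruit, "a$")   # anchored at the end

# negate = TRUE flips the result: TRUE where the pattern is NOT found
no_e <- str_detect(fruit, "e", negate = TRUE)

print(has_a)
print(starts_a)
print(ends_a)
print(no_e)
```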
## [1] TRUE TRUE TRUE TRUE
## [1] TRUE FALSE FALSE FALSE
## [1] FALSE TRUE FALSE FALSE
## [1] FALSE TRUE FALSE FALSE
## [1] TRUE FALSE TRUE TRUE
## [1] TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE
## [1] TRUE TRUE FALSE FALSE
Duplicate and concatenate strings within a character vector.
Usage:
str_dup(string, times)
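A minimal sketch of how times is vectorised, assuming stringr is attached. The inputs are illustrative and may not be the exact ones used to produce the output in these notes.

```r
library(stringr)

fruit <- c("apple", "pear", "banana")

# A single value of times is applied to every element
twice <- str_dup(fruit, 2)

# A vector of times is matched element by element
stepped <- str_dup(fruit, 1:3)

# times = 0 gives an empty string, which is handy when building sequences
ba_na <- str_c("ba", str_dup("na", 0:2))

print(twice)
print(stepped)
print(ba_na)
```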
## [1] "appleapple" "pearpear" "bananabanana"
## [1] "apple" "pearpear" "bananabananabanana"
## [1] "ba" "bana" "banana" "bananana" "banananana"
## [6] "bananananana"
## [1] "ba" "baba" "bababa" "babababa" "bababababa"
## [6] "babababababa"
## [1] "别叨叨" "别叨叨叨叨" "别叨叨叨叨叨叨"
Extract matching patterns from a string.
Usage:
str_extract(string, pattern)
str_extract_all(string, pattern, simplify = FALSE)
simplify: If FALSE, the default, returns a list of character vectors. If TRUE returns a character matrix.
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk 2")
str_extract(shopping_list, "\\d")
## [1] "4" NA NA "2"
## [[1]]
## [1] "4"
##
## [[2]]
## character(0)
##
## [[3]]
## character(0)
##
## [[4]]
## [1] "2"
## [,1]
## [1,] "4"
## [2,] ""
## [3,] ""
## [4,] "2"
## [1] "a" "b" "b" "m"
## [1] "apples" "bag" "bag" "milk"
## [[1]]
## [1] "a" "p" "p" "l" "e" "s" "x"
##
## [[2]]
## [1] "b" "a" "g" "o" "f" "f" "l" "o" "u" "r"
##
## [[3]]
## [1] "b" "a" "g" "o" "f" "s" "u" "g" "a" "r"
##
## [[4]]
## [1] "m" "i" "l" "k"
## [[1]]
## [1] "apples" "x"
##
## [[2]]
## [1] "bag" "of" "flour"
##
## [[3]]
## [1] "bag" "of" "sugar"
##
## [[4]]
## [1] "milk"
## [1] "ap" "ba" "ba" "mi"
## [1] "appl" "bag" "bag" "milk"
## [1] NA "bag" "bag" "milk"
## [1] NA "of" "of" NA
## [1] NA "bag" "bag" NA
## [1] "les" "bag" "bag" "ilk"
## [1] "appl" "bag" "bag" "milk"
## [[1]]
## [1] "apples"
##
## [[2]]
## [1] "bag" "of" "flour"
##
## [[3]]
## [1] "bag" "of" "sugar"
##
## [[4]]
## [1] "milk"
## [,1] [,2] [,3]
## [1,] "apples" "" ""
## [2,] "bag" "of" "flour"
## [3,] "bag" "of" "sugar"
## [4,] "milk" "" ""
## [[1]]
## [1] "This" "is" "suprisingly" "a" "sentence"
## Example 6
## [1] "4" NA "2"
## [1] "x" NA "i"
## [1] "x" NA "ipad"
## [1] "x" NA "ip"
## [1] NA NA "air"
## [1] NA NA "ipad"
## [1] NA NA "air"
## [1] NA NA NA
## [1] NA NA "pad"
## [1] "x" NA "ipa"
Flatten a string.
Usage:
str_flatten(string, collapse = "")
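A small sketch of str_flatten(), assuming stringr is attached. It always reduces one character vector to a single string, which makes it equivalent to str_c() with collapse set.

```r
library(stringr)

# Default collapse = "" simply concatenates the elements
plain <- str_flatten(letters)

# A non-empty collapse string is inserted between elements
dashed <- str_flatten(c("a", "b", "c"), collapse = "-")

# Same result as str_c() with collapse
same_as_str_c <- identical(plain, str_c(letters, collapse = ""))

print(plain)
print(dashed)
print(same_as_str_c)
```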
## [1] "abcdefghijklmnopqrstuvwxyz"
## [1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
## [1] "A*B*C*D*E*F*G*H*I*J*K*L*M*N*O*P*Q*R*S*T*U*V*W*X*Y*Z"
Format and interpolate a string with glue.
Usage:
str_glue(..., .sep = "", .envir = parent.frame())
str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA")
.sep: [character(1): ""] Separator used to separate elements.
.envir: [environment: parent.frame()] Environment to evaluate each expression in. Expressions are evaluated from left to right. If .x is an environment, the expressions are evaluated in that environment and .envir is ignored.
.x: [listish] An environment, list or data frame used to look up values.
.na: [character(1): "NA"] Value to replace NA values with. If NULL, missing values are propagated, that is, an NA result will cause NA output. Otherwise the value is replaced by the value of .na.
name <- "Fred"
age <- 50
anniversary <- as.Date("1991-10-12")
str_glue(
"My name is {name}, ",
"my age next year is {age + 1}, ",
"and my anniversary is {format(anniversary, '%A, %B %d, %Y')}."
)
## My name is Fred, my age next year is 51, and my anniversary is 星期六, 十月 12, 1991.
## My name is Fred, not {name}.
## My name is Joe, and my age next year is 41.
## 张三北风网北京海定
## 张三-北风网-北京海定
Note: str_glue_data() is useful in data pipelines:
mtcars %>% str_glue_data("{rownames(.)} has {hp} hp")
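Technically, this returns the number of "code points" in a string. One code point usually corresponds to one character, but not always. For example, a u with an umlaut might be represented as a single character or as the combination of a u and an umlaut.
Usage: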
str_length(string)
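The difference between code points and characters can be made concrete with the u-umlaut case. This is a sketch assuming stringr is attached and a UTF-8 capable session; the variable names are illustrative.

```r
library(stringr)

# One pre-composed code point: U+00FC (u with diaeresis)
u_single <- "\u00fc"
# Two code points: "u" followed by U+0308 (combining diaeresis)
u_combined <- "u\u0308"

# str_length() counts code points
len_single   <- str_length(u_single)
len_combined <- str_length(u_combined)

# Counting character boundaries instead treats both as one character
chars_single   <- str_count(u_single, boundary("character"))
chars_combined <- str_count(u_combined, boundary("character"))

# Missing values stay missing
len_na <- str_length(NA)

print(c(len_single, len_combined))
print(c(chars_single, chars_combined))
print(len_na)
```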
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1] NA
## [1] 3
## [1] 1 4 11 NA
## [1] "ü"
## [1] "u<U+0308>"
## [1] 1
## [1] 2
## [1] 1
## [1] 1
Locate the position of patterns in a string.
Usage:
str_locate(string, pattern)
str_locate_all(string, pattern)
Value: For str_locate, an integer matrix. The first column gives the start position of the match, and the second column gives the end position. For str_locate_all, a list of integer matrices.
See also: str_extract() for a convenient way of extracting matches, stringi::stri_locate() for the underlying implementation.
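A minimal sketch of the two return shapes, assuming stringr is attached. The fruit vector is the one used elsewhere in these notes; the other names are illustrative.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

# First match only: one row per input string, columns "start" and "end"
first_a <- str_locate(fruit, "a")

# Every match: a list with one matrix per input string
all_a <- str_locate_all(fruit, "a")

# A string without a match yields a row of NA
no_match <- str_locate("pear", "z")

print(first_a)
print(all_a[[2]])
print(no_match)
```

Because the matrix columns are named, positions can be picked out with first_a[, "start"] and passed on to str_sub().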
## start end
## [1,] 6 5
## [2,] 7 6
## [3,] 5 4
## [4,] 10 9
## start end
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 5 5
## start end
## [1,] 5 5
## [2,] NA NA
## [3,] 2 2
## [4,] 4 4
## start end
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [[1]]
## start end
## [1,] 1 1
##
## [[2]]
## start end
## [1,] 2 2
## [2,] 4 4
## [3,] 6 6
##
## [[3]]
## start end
## [1,] 3 3
##
## [[4]]
## start end
## [1,] 5 5
## [[1]]
## start end
## [1,] 5 5
##
## [[2]]
## start end
##
## [[3]]
## start end
## [1,] 2 2
##
## [[4]]
## start end
## [1,] 4 4
## [2,] 9 9
## [[1]]
## start end
## [1,] 1 1
##
## [[2]]
## start end
## [1,] 1 1
##
## [[3]]
## start end
## [1,] 1 1
##
## [[4]]
## start end
## [1,] 1 1
## [2,] 6 6
## [3,] 7 7
## [[1]]
## start end
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
##
## [[2]]
## start end
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
##
## [[3]]
## start end
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
##
## [[4]]
## start end
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
## [7,] 7 7
## [8,] 8 8
## [9,] 9 9
Extract matched groups from a string.
Usage:
str_match(string, pattern)
str_match_all(string, pattern)
Value: For str_match, a character matrix. The first column is the complete match, followed by one column for each capture group. For str_match_all, a list of character matrices.
See also: str_extract() to extract the complete match, stringi::stri_match() for the underlying implementation.
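The relationship between str_extract(), str_match() and str_match_all() can be sketched on a shortened input. This assumes stringr is attached; the three-element vector x is a hypothetical subset chosen for illustration, and the phone pattern is the one used in the example that follows.

```r
library(stringr)

# Three capture groups: area code, exchange, line number
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
x <- c("Work: 579-499-7527", "banana", "239 923 8115 and 842 566 4692")

whole  <- str_extract(x, phone)    # complete first match only
groups <- str_match(x, phone)      # complete first match plus one column per group
every  <- str_match_all(x, phone)  # one matrix per input, one row per match

print(whole)
print(groups)
print(every[[3]])
```

The matrix from str_match() always has one row per input string, so a non-matching string shows up as a row of NA rather than being dropped.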
strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
"387 287 6718", "apple", "233.398.9187 ", "482 952 3315",
"239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000",
"Home: 543.355.3679")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
## [1] "219 733 8965" "329-293-8753" NA "595 794 7569" "387 287 6718"
## [6] NA "233.398.9187" "482 952 3315" "239 923 8115" "579-499-7527"
## [11] NA "543.355.3679"
## [,1] [,2] [,3] [,4]
## [1,] "219 733 8965" "219" "733" "8965"
## [2,] "329-293-8753" "329" "293" "8753"
## [3,] NA NA NA NA
## [4,] "595 794 7569" "595" "794" "7569"
## [5,] "387 287 6718" "387" "287" "6718"
## [6,] NA NA NA NA
## [7,] "233.398.9187" "233" "398" "9187"
## [8,] "482 952 3315" "482" "952" "3315"
## [9,] "239 923 8115" "239" "923" "8115"
## [10,] "579-499-7527" "579" "499" "7527"
## [11,] NA NA NA NA
## [12,] "543.355.3679" "543" "355" "3679"
## [[1]]
## [1] "219 733 8965"
##
## [[2]]
## [1] "329-293-8753"
##
## [[3]]
## character(0)
##
## [[4]]
## [1] "595 794 7569"
##
## [[5]]
## [1] "387 287 6718"
##
## [[6]]
## character(0)
##
## [[7]]
## [1] "233.398.9187"
##
## [[8]]
## [1] "482 952 3315"
##
## [[9]]
## [1] "239 923 8115" "842 566 4692"
##
## [[10]]
## [1] "579-499-7527"
##
## [[11]]
## character(0)
##
## [[12]]
## [1] "543.355.3679"
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] "219 733 8965" "219" "733" "8965"
##
## [[2]]
## [,1] [,2] [,3] [,4]
## [1,] "329-293-8753" "329" "293" "8753"
##
## [[3]]
## [,1] [,2] [,3] [,4]
##
## [[4]]
## [,1] [,2] [,3] [,4]
## [1,] "595 794 7569" "595" "794" "7569"
##
## [[5]]
## [,1] [,2] [,3] [,4]
## [1,] "387 287 6718" "387" "287" "6718"
##
## [[6]]
## [,1] [,2] [,3] [,4]
##
## [[7]]
## [,1] [,2] [,3] [,4]
## [1,] "233.398.9187" "233" "398" "9187"
##
## [[8]]
## [,1] [,2] [,3] [,4]
## [1,] "482 952 3315" "482" "952" "3315"
##
## [[9]]
## [,1] [,2] [,3] [,4]
## [1,] "239 923 8115" "239" "923" "8115"
## [2,] "842 566 4692" "842" "566" "4692"
##
## [[10]]
## [,1] [,2] [,3] [,4]
## [1,] "579-499-7527" "579" "499" "7527"
##
## [[11]]
## [,1] [,2] [,3] [,4]
##
## [[12]]
## [,1] [,2] [,3] [,4]
## [1,] "543.355.3679" "543" "355" "3679"
## [,1] [,2] [,3]
## [1,] "<a> <b>" "a" "b"
## [2,] "<a> <>" "a" ""
## [3,] NA NA NA
## [4,] NA NA NA
## [5,] NA NA NA
## [[1]]
## [,1] [,2]
## [1,] "<a>" "a"
## [2,] "<b>" "b"
##
## [[2]]
## [,1] [,2]
## [1,] "<a>" "a"
## [2,] "<>" ""
##
## [[3]]
## [,1] [,2]
## [1,] "<a>" "a"
##
## [[4]]
## [,1] [,2]
##
## [[5]]
## [,1] [,2]
## [1,] NA NA
## [1] "<a>" "<a>" "<a>" NA NA
## [[1]]
## [1] "<a>" "<b>"
##
## [[2]]
## [1] "<a>" "<>"
##
## [[3]]
## [1] "<a>"
##
## [[4]]
## character(0)
##
## [[5]]
## [1] NA
## [,1]
## [1,] "a"
## [2,] NA
## [3,] NA
## [,1]
## [1,] NA
## [2,] NA
## [3,] NA
## [[1]]
## [,1]
## [1,] "a"
##
## [[2]]
## [,1]
##
## [[3]]
## [,1]
## [[1]]
## [,1]
##
## [[2]]
## [,1]
##
## [[3]]
## [,1]
## [1] "a" NA NA
## [1] NA NA NA
## [[1]]
## [1] "a"
##
## [[2]]
## character(0)
##
## [[3]]
## character(0)
## [[1]]
## character(0)
##
## [[2]]
## character(0)
##
## [[3]]
## character(0)
Order or sort a character vector.
Usage:
str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)
decreasing: A boolean. If FALSE, the default, sorts from lowest to highest; if TRUE sorts from highest to lowest.
na_last: Where should NA go? TRUE at the end, FALSE at the beginning, NA dropped.
locale: In which locale should the sorting occur? Defaults to English. This ensures that code behaves the same way across platforms.
numeric: If TRUE, will sort digits numerically, instead of as strings.
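The effect of numeric and na_last is easiest to see on strings that mix digits and letters. This is a sketch assuming stringr is attached; the input vector is an assumption chosen to be consistent with the output shown in these notes.

```r
library(stringr)

x <- c("100a10", "100a5", "2b", "2a")

# Default: digits are compared as characters, so "1..." sorts before "2..."
sorted_text <- str_sort(x)

# numeric = TRUE: runs of digits are compared as numbers
sorted_num <- str_sort(x, numeric = TRUE)

# str_order() returns the permutation instead of the sorted values
order_num <- str_order(x, numeric = TRUE)

# na_last = FALSE moves missing values to the front
na_first <- str_sort(c("b", NA, "a"), na_last = FALSE)

print(sorted_text)
print(sorted_num)
print(order_num)
print(na_first)
```

As with base R's order(), x[str_order(x)] gives the same result as str_sort(x).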
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## [1] 1 5 9 15 21 2 3 4 6 7 8 10 11 12 13 14 16 17 18 19 20 22 23 24 25
## [26] 26
## [1] "a" "e" "i" "o" "u" "b" "c" "d" "f" "g" "h" "j" "k" "l" "m" "n" "p" "q" "r"
## [20] "s" "t" "v" "w" "x" "y" "z"
## [1] "100a10" "100a5" "2a" "2b"
## [1] 1 2 4 3
## [1] "2a" "2b" "100a5" "100a10"
## [1] 4 3 2 1
## [1] NA "book" "cook" "ebook" "edog" "good"
## [1] 5 3 2 4 6 1
Pad a string.
Usage:
str_pad(string, width, side = c("left", "right", "both"), pad = " ")
See also: str_trim() to remove whitespace; str_trunc() to decrease the maximum width of a string.
## Example 1
rbind(
str_pad("hadley", 30, "left"),
str_pad("hadley", 30, "right"),
str_pad("hadley", 30, "both")
)
##      [,1]
## [1,] "                        hadley"
## [2,] "hadley                        "
## [3,] "            hadley            "
## [1] " a" " abc" " abcdef"
## [1] " a" " a" " a"
## [1] "---------a" "_________a" " a"
## [1] "hadley"
Remove matched patterns in a string.
Usage:
str_remove(string, pattern)
str_remove_all(string, pattern)
Alias for str_replace(string, pattern, "").
See also: str_replace() for the underlying implementation.
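A minimal sketch showing that str_remove() behaves like str_replace() with an empty replacement. It assumes stringr is attached; the variable names are illustrative.

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")

# Remove only the first vowel in each string
first_removed <- str_remove(fruits, "[aeiou]")

# Remove every vowel in each string
all_removed <- str_remove_all(fruits, "[aeiou]")

# The alias relationship: replacing with "" gives the same result
same_as_replace <- identical(first_removed, str_replace(fruits, "[aeiou]", ""))

print(first_removed)
print(all_removed)
print(same_as_replace)
```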
## [1] "ne apple" "tw pears" "thre bananas"
## [1] "n ppl" "tw prs" "thr bnns"
Replace matched patterns in a string.
Usage:
str_replace(string, pattern, replacement)
str_replace_all(string, pattern, replacement)
pattern: Pattern to look for. The default interpretation is a regular expression, as described in stringi::stringi-search-regex. Control options with regex(). Match a fixed string (i.e. by comparing only bytes) using fixed(). This is fast, but approximate. Generally, for matching human text, you'll want coll(), which respects character matching rules for the specified locale.
See also: str_replace_na() to turn missing values into "NA"; stri_replace() for the underlying implementation.
fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, "[aeiou]", "-") # replace the first letter in each string that matches [aeiou]
## [1] "-ne apple" "tw- pears" "thr-e bananas"
## [1] "-n- -ppl-" "tw- p--rs" "thr-- b-n-n-s"
## [1] "OnE ApplE" "twO pEArs" "thrEE bAnAnAs"
## [1] "one apple" "two pears" NA
## [1] "ne apple" "tw pears" "thre bananas"
## [1] "oone apple" "twoo pears" "threee bananas"
## [1] "1ne apple" "tw2 pears" "thr3e bananas"
## [1] "one -pple" "two p-ars" "three bananas"
# If you want to apply multiple patterns and replacements to the same
# string, pass a named vector to pattern.
fruits %>%
str_c(collapse = "---") %>%
str_replace_all(c("one" = "1", "two" = "2", "three" = "3"))
## [1] "1 apple---2 pears---3 bananas"
## [1] "one apple---" "two pears---" "three bananas---"
## [1] "1 apple---" "2 pears---" "3 bananas---"
## [1] "one apple---two pears---three bananas"
y <- fruits %>%
str_c(collapse = "---")
str_replace_all(y,c("one" = "1", "two" = "2", "three" = "3"))
## [1] "1 apple---2 pears---3 bananas"
Split up a string into pieces.
Usage:
str_split(string, pattern, n = Inf, simplify = FALSE)
str_split_fixed(string, pattern, n)
simplify: If FALSE, the default, returns a list of character vectors. If TRUE, returns a character matrix.
Value: For str_split_fixed, a character matrix with n columns. For str_split, a list of character vectors.
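The two return types and the effect of n can be sketched as follows, assuming stringr is attached. The input strings are an assumption chosen to be consistent with the output shown in these notes.

```r
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

# List of character vectors, one per input string
pieces <- str_split(fruits, " and ")

# n limits the number of pieces; the remainder stays in the last piece
limited <- str_split(fruits, " and ", n = 2)

# Fixed number of columns; shorter results are padded with ""
fixed_mat <- str_split_fixed(fruits, " and ", 4)

print(pieces)
print(limited[[1]])
print(fixed_mat)
```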
## [[1]]
## [1] "apples" "oranges" "pears" "bananas"
##
## [[2]]
## [1] "pineapples" "mangos" "guavas"
## [,1] [,2] [,3] [,4]
## [1,] "apples" "oranges" "pears" "bananas"
## [2,] "pineapples" "mangos" "guavas" ""
## [[1]]
## [1] "apples" "oranges" "pears and bananas"
##
## [[2]]
## [1] "pineapples" "mangos" "guavas"
## [[1]]
## [1] "apples" "oranges and pears and bananas"
##
## [[2]]
## [1] "pineapples" "mangos and guavas"
## [[1]]
## [1] "apples" "oranges" "pears" "bananas"
##
## [[2]]
## [1] "pineapples" "mangos" "guavas"
## [,1] [,2] [,3]
## [1,] "apples" "oranges" "pears and bananas"
## [2,] "pineapples" "mangos" "guavas"
## [,1] [,2] [,3] [,4]
## [1,] "apples" "oranges" "pears" "bananas"
## [2,] "pineapples" "mangos" "guavas" ""
## [[1]]
## [1] "你" "我他"
## [,1] [,2] [,3]
## [1,] "你" "我" "他"
## [[1]]
## [1] "你" "我" "他"
Detect the presence or absence of a pattern at the beginning or end of a string.
Usage:
str_starts(string, pattern, negate = FALSE)
str_ends(string, pattern, negate = FALSE)
negate: If TRUE, return non-matching elements.
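A brief sketch of str_starts() and str_ends(), assuming stringr 1.4.0 or later is attached. The fruit vector is illustrative.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

starts_p     <- str_starts(fruit, "p")                 # begins with "p"
not_starts_p <- str_starts(fruit, "p", negate = TRUE)  # does not begin with "p"
ends_e       <- str_ends(fruit, "e")                   # ends with "e"

print(starts_p)
print(not_starts_p)
print(ends_e)
```

No "^" or "$" anchor is needed in the pattern; the functions anchor the match themselves.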
## [1] FALSE FALSE TRUE TRUE
## [1] TRUE TRUE FALSE FALSE
## [1] TRUE FALSE FALSE TRUE
## [1] FALSE TRUE TRUE FALSE
Extract and replace substrings from a character vector.
Usage:
str_sub(string, start = 1L, end = -1L)
str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value
omit_na: Single logical value. If TRUE, missing values in any of the arguments provided will result in an unchanged input.
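A minimal sketch of both forms, extraction and replacement, assuming stringr is attached. Negative positions count backwards from the end of the string; the variable names are illustrative.

```r
library(stringr)

hw <- "Hadley Wickham"

first_name <- str_sub(hw, 1, 6)   # positions 1 to 6
last_name  <- str_sub(hw, -7)     # last seven characters
middle     <- str_sub(hw, 3, 5)   # positions 3 to 5

# Replacement form: assign into a substring
x <- "BBCDEF"
str_sub(x, 1, 1) <- "A"     # replace the first character
str_sub(x, -1, -1) <- "K"   # replace the last character

print(first_name)
print(last_name)
print(middle)
print(x)
```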
## [1] "dle"
## [1] "Hadley W"
## [1] "y Wickham"
## [1] "ickham"
## [1] "adley W" "y Wickham"
## [1] "am"
## [1] " Wickham"
## [1] "Hadley W"
## start end
## [1,] 2 2
## [2,] 5 5
## [3,] 9 9
## [4,] 13 13
## [1] "a" "e" "i" "a"
## [1] "a" "e" "i" "a"
## [1] "我来自"
## [1] "中国"
## [1] "我来自中"
## [1] "中国"
## [1] "中国"
## [1] "我来自中"
## [1] "我" "来" "自" "中" "国"
Keep strings matching a pattern, or find positions.
Usage:
str_subset(string, pattern, negate = FALSE)
str_which(string, pattern, negate = FALSE)
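Both functions are thin wrappers around str_detect(), which the sketch below makes explicit. It assumes stringr is attached; the fruit vector mirrors the spelling seen in the output that follows.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

kept      <- str_subset(fruit, "^a")                 # the matching strings
positions <- str_which(fruit, "^p")                  # the matching indices
not_p     <- str_subset(fruit, "^p", negate = TRUE)  # the non-matching strings

# Equivalent formulations using str_detect()
same_subset <- identical(kept, fruit[str_detect(fruit, "^a")])
same_which  <- identical(positions, which(str_detect(fruit, "^p")))

print(kept)
print(positions)
print(not_p)
```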
## [1] "apple" "banana" "pear" "pinapple"
## [1] 1 2 3 4
## [1] "apple"
## [1] "banana"
## [1] "banana"
## [1] "apple" "banana" "pear" "pinapple"
## [1] "apple" "banana"
## [1] "a" "b"
## [1] 1 3
## [1] "我来自中国" "我来自英国"
## [1] "我来自中国"
Trim whitespace from a string.
Usage:
str_trim(string, side = c("both", "left", "right"))
str_squish(string)
See also: str_pad() to add whitespace.
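The difference between str_trim() and str_squish() only shows up when there is repeated whitespace inside the string. A sketch assuming stringr is attached; the input strings are illustrative.

```r
library(stringr)

x <- "  String with trailing and leading white space\t"

both_sides <- str_trim(x)           # removes whitespace at both ends
left_only  <- str_trim(x, "left")   # the trailing tab is kept

# str_squish() also collapses runs of whitespace inside the string
y <- "  String with trailing,   middle, and leading white space\t"
squished <- str_squish(y)

print(both_sides)
print(left_only)
print(squished)
```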
## [1] "String with trailing and leading white space"
## [1] "String with trailing and leading white space"
## [1] "String with trailing and leading white space"
## [1] "String with trailing and leading white space"
## [1] "String with trailing, middle, and leading white space"
## [1] "String with excess, trailing and leading white space"
## [1] "我来 自中国"
## [1] "我来 自中国"
## [1] "我来 自中国"
Truncate a character string.
Usage:
str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...")
side, ellipsis: Location and content of ellipsis that indicates content has been removed.
str_pad() to increase the minimum width of a string.
x <- "This string is moderately long"
rbind(
str_trunc(x, 20, "right"),
str_trunc(x, 20, "left"),
str_trunc(x, 20, "center")
)
## [,1]
## [1,] "This string is mo..."
## [2,] "...s moderately long"
## [3,] "This stri...ely long"
## [1] "..."
## [1] "我..."
## [1] "我来自中国"
## [1] "..."
## [1] "I..."
## [1] "It i..."
View HTML rendering of regular expression match.
Usage:
str_view(string, pattern, match = NA)
str_view_all(string, pattern, match = NA)
match: If TRUE, shows only strings that match the pattern. If FALSE, shows only the strings that don’t match the pattern. Otherwise (the default, NA) displays both matches and non-matches.
Wrap strings into nicely formatted paragraphs.
Usage:
str_wrap(string, width = 80, indent = 0, exdent = 0)
width: positive integer giving target line width in characters. A width less than or equal to 1 will put each word on its own line.
indent: non-negative integer giving indentation of the first line in each paragraph.
exdent: non-negative integer giving indentation of the following lines in each paragraph.
thanks_path <- file.path(R.home("doc"), "THANKS")
thanks <- str_c(readLines(thanks_path), collapse = "\n")
thanks <- word(thanks, 1, 3, fixed("\n\n"))
cat(str_wrap(thanks), "\n")
## R would not be what it is today without the invaluable help of these people
## outside of the R core team, who contributed by donating code, bug fixes and
## documentation: Valerio Aimale, Suharto Anggono, Thomas Baier, Henrik Bengtsson,
## Roger Bivand, Ben Bolker, David Brahm, G"oran Brostr"om, Patrick Burns, Vince
## Carey, Saikat DebRoy, Matt Dowle, Brian D'Urso, Lyndon Drake, Dirk Eddelbuettel,
## Claus Ekstrom, Sebastian Fischmeister, John Fox, Paul Gilbert, Yu Gong, Gabor
## Grothendieck, Frank E Harrell Jr, Peter M. Haverty, Torsten Hothorn, Robert
## King, Kjetil Kjernsmo, Roger Koenker, Philippe Lambert, Jan de Leeuw, Jim
## Lindsey, Patrick Lindsey, Catherine Loader, Gordon Maclean, Arni Magnusson, John
## Maindonald, David Meyer, Ei-ji Nakama, Jens Oehlschaegel, Steve Oncley, Richard
## O'Keefe, Hubert Palme, Roger D. Peng, Jose' C. Pinheiro, Tony Plate, Anthony
## Rossini, Jonathan Rougier, Petr Savicky, Guenther Sawitzki, Marc Schwartz, Arun
## Srinivasan, Detlef Steuer, Bill Simpson, Gordon Smyth, Adrian Trapletti, Terry
## Therneau, Rolf Turner, Bill Venables, Gregory R. Warnes, Andreas Weingessel,
## Morten Welinder, James Wettenhall, Simon Wood, and Achim Zeileis. Others have
## written code that has been adopted by R and is acknowledged in the code files,
## including
## R would not be what it is today
## without the invaluable help of these
## people outside of the R core team, who
## contributed by donating code, bug fixes
## and documentation: Valerio Aimale,
## Suharto Anggono, Thomas Baier, Henrik
## Bengtsson, Roger Bivand, Ben Bolker,
## David Brahm, G"oran Brostr"om, Patrick
## Burns, Vince Carey, Saikat DebRoy,
## Matt Dowle, Brian D'Urso, Lyndon Drake,
## Dirk Eddelbuettel, Claus Ekstrom,
## Sebastian Fischmeister, John Fox, Paul
## Gilbert, Yu Gong, Gabor Grothendieck,
## Frank E Harrell Jr, Peter M. Haverty,
## Torsten Hothorn, Robert King, Kjetil
## Kjernsmo, Roger Koenker, Philippe
## Lambert, Jan de Leeuw, Jim Lindsey,
## Patrick Lindsey, Catherine Loader,
## Gordon Maclean, Arni Magnusson, John
## Maindonald, David Meyer, Ei-ji Nakama,
## Jens Oehlschaegel, Steve Oncley, Richard
## O'Keefe, Hubert Palme, Roger D. Peng,
## Jose' C. Pinheiro, Tony Plate, Anthony
## Rossini, Jonathan Rougier, Petr Savicky,
## Guenther Sawitzki, Marc Schwartz, Arun
## Srinivasan, Detlef Steuer, Bill Simpson,
## Gordon Smyth, Adrian Trapletti, Terry
## Therneau, Rolf Turner, Bill Venables,
## Gregory R. Warnes, Andreas Weingessel,
## Morten Welinder, James Wettenhall, Simon
## Wood, and Achim Zeileis. Others have
## written code that has been adopted by R
## and is acknowledged in the code files,
## including
## R would not be what it is today without the invaluable help
## of these people outside of the R core team, who contributed
## by donating code, bug fixes and documentation: Valerio
## Aimale, Suharto Anggono, Thomas Baier, Henrik Bengtsson,
## Roger Bivand, Ben Bolker, David Brahm, G"oran Brostr"om,
## Patrick Burns, Vince Carey, Saikat DebRoy, Matt Dowle,
## Brian D'Urso, Lyndon Drake, Dirk Eddelbuettel, Claus
## Ekstrom, Sebastian Fischmeister, John Fox, Paul Gilbert,
## Yu Gong, Gabor Grothendieck, Frank E Harrell Jr, Peter M.
## Haverty, Torsten Hothorn, Robert King, Kjetil Kjernsmo,
## Roger Koenker, Philippe Lambert, Jan de Leeuw, Jim Lindsey,
## Patrick Lindsey, Catherine Loader, Gordon Maclean, Arni
## Magnusson, John Maindonald, David Meyer, Ei-ji Nakama,
## Jens Oehlschaegel, Steve Oncley, Richard O'Keefe, Hubert
## Palme, Roger D. Peng, Jose' C. Pinheiro, Tony Plate, Anthony
## Rossini, Jonathan Rougier, Petr Savicky, Guenther Sawitzki,
## Marc Schwartz, Arun Srinivasan, Detlef Steuer, Bill Simpson,
## Gordon Smyth, Adrian Trapletti, Terry Therneau, Rolf Turner,
## Bill Venables, Gregory R. Warnes, Andreas Weingessel, Morten
## Welinder, James Wettenhall, Simon Wood, and Achim Zeileis.
## Others have written code that has been adopted by R and is
## acknowledged in the code files, including
## R would not be what it is today without the invaluable help
## of these people outside of the R core team, who contributed
## by donating code, bug fixes and documentation: Valerio
## Aimale, Suharto Anggono, Thomas Baier, Henrik Bengtsson,
## Roger Bivand, Ben Bolker, David Brahm, G"oran Brostr"om,
## Patrick Burns, Vince Carey, Saikat DebRoy, Matt Dowle,
## Brian D'Urso, Lyndon Drake, Dirk Eddelbuettel, Claus
## Ekstrom, Sebastian Fischmeister, John Fox, Paul Gilbert,
## Yu Gong, Gabor Grothendieck, Frank E Harrell Jr, Peter M.
## Haverty, Torsten Hothorn, Robert King, Kjetil Kjernsmo,
## Roger Koenker, Philippe Lambert, Jan de Leeuw, Jim Lindsey,
## Patrick Lindsey, Catherine Loader, Gordon Maclean, Arni
## Magnusson, John Maindonald, David Meyer, Ei-ji Nakama,
## Jens Oehlschaegel, Steve Oncley, Richard O'Keefe, Hubert
## Palme, Roger D. Peng, Jose' C. Pinheiro, Tony Plate, Anthony
## Rossini, Jonathan Rougier, Petr Savicky, Guenther Sawitzki,
## Marc Schwartz, Arun Srinivasan, Detlef Steuer, Bill Simpson,
## Gordon Smyth, Adrian Trapletti, Terry Therneau, Rolf Turner,
## Bill Venables, Gregory R. Warnes, Andreas Weingessel, Morten
## Welinder, James Wettenhall, Simon Wood, and Achim Zeileis.
## Others have written code that has been adopted by R and is
## acknowledged in the code files, including
## R
## would
## not
## be
## what
## it
## is
## today
## without
## the
## invaluable
## help
## of
## these
## people
## outside
## of
## the
## R
## core
## team,
## who
## contributed
## by
## donating
## code,
## bug
## fixes
## and
## documentation:
## Valerio
## Aimale,
## Suharto
## Anggono,
## Thomas
## Baier,
## Henrik
## Bengtsson,
## Roger
## Bivand,
## Ben
## Bolker,
## David
## Brahm,
## G"oran
## Brostr"om,
## Patrick
## Burns,
## Vince
## Carey,
## Saikat
## DebRoy,
## Matt
## Dowle,
## Brian
## D'Urso,
## Lyndon
## Drake,
## Dirk
## Eddelbuettel,
## Claus
## Ekstrom,
## Sebastian
## Fischmeister,
## John
## Fox,
## Paul
## Gilbert,
## Yu
## Gong,
## Gabor
## Grothendieck,
## Frank
## E
## Harrell
## Jr,
## Peter
## M.
## Haverty,
## Torsten
## Hothorn,
## Robert
## King,
## Kjetil
## Kjernsmo,
## Roger
## Koenker,
## Philippe
## Lambert,
## Jan
## de
## Leeuw,
## Jim
## Lindsey,
## Patrick
## Lindsey,
## Catherine
## Loader,
## Gordon
## Maclean,
## Arni
## Magnusson,
## John
## Maindonald,
## David
## Meyer,
## Ei-
## ji
## Nakama,
## Jens
## Oehlschaegel,
## Steve
## Oncley,
## Richard
## O'Keefe,
## Hubert
## Palme,
## Roger
## D.
## Peng,
## Jose'
## C.
## Pinheiro,
## Tony
## Plate,
## Anthony
## Rossini,
## Jonathan
## Rougier,
## Petr
## Savicky,
## Guenther
## Sawitzki,
## Marc
## Schwartz,
## Arun
## Srinivasan,
## Detlef
## Steuer,
## Bill
## Simpson,
## Gordon
## Smyth,
## Adrian
## Trapletti,
## Terry
## Therneau,
## Rolf
## Turner,
## Bill
## Venables,
## Gregory
## R.
## Warnes,
## Andreas
## Weingessel,
## Morten
## Welinder,
## James
## Wettenhall,
## Simon
## Wood,
## and
## Achim
## Zeileis.
## Others
## have
## written
## code
## that
## has
## been
## adopted
## by
## R
## and
## is
## acknowledged
## in
## the
## code
## files,
## including
Extract words from a sentence.
Usage:
word(string, start = 1L, end = start, sep = fixed(" "))
sep: separator between words. Defaults to a single space.
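A short sketch of word() with positive and negative positions and with a custom separator, assuming stringr is attached. The sentences are an assumption consistent with the output shown in these notes.

```r
library(stringr)

sentences <- c("Jane saw a cat", "Jane sat down")

first_word <- word(sentences, 1)       # first word of each sentence
last_word  <- word(sentences, -1)      # negative positions count from the end
rest       <- word(sentences, 2, -1)   # second word through the last word

# With sep = fixed(".."), the string is split only at a double dot,
# so single dots stay inside the "words"
str <- "abc.def..123.4568.999"
before_dots <- word(str, 1, sep = fixed(".."))
after_dots  <- word(str, 2, sep = fixed(".."))

print(first_word)
print(last_word)
print(rest)
print(before_dots)
print(after_dots)
```

The custom separator case is the one that tends to confuse: the "words" are simply whatever lies between occurrences of sep.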
## [1] "Jane" "Jane"
## [1] "saw" "sat"
## [1] "cat" "down"
## [1] "saw a cat" "sat down"
## [1] "Jane saw a cat" "saw a cat" "a cat"
## [1] "Jane" "Jane saw" "Jane saw a" "Jane saw a cat"
## Example 3 (not fully understood)
# Can define words by other separators
str <- 'abc.def..123.4568.999'
word(str, 1, sep = fixed('..'))
## [1] "abc.def"
## [1] "123.4568.999"
## [1] "abc"
Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.
Usage:
fixed(pattern, ignore_case = FALSE)
ignore_case: Should case differences be ignored in the match?
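The practical effect of fixed() is that regular expression metacharacters lose their special meaning. A minimal sketch, assuming stringr is attached; the inputs are illustrative.

```r
library(stringr)

x <- "a.b.c"

# As a regular expression, "." matches any character
n_regex <- str_count(x, ".")

# As a fixed string, "." matches only a literal dot
n_fixed <- str_count(x, fixed("."))

# ignore_case works for fixed patterns too
found <- str_detect("ABC", fixed("abc", ignore_case = TRUE))

print(n_regex)
print(n_fixed)
print(found)
```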
Compare strings respecting standard collation rules.
Usage:
coll(pattern, ignore_case = FALSE, locale = "en", ...)
locale: Locale to use for comparisons. See stringi::stri_locale_list() for all possible options. Defaults to "en" (English) to ensure that the default collation is consistent across platforms.
The default. Uses ICU regular expressions.
Usage:
regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)
multiline: If TRUE, $ and ^ match the beginning and end of each line. If FALSE, the default, only match the start and end of the input.
comments: If TRUE, white space and comments beginning with # are ignored. Escape literal spaces with a backslash.
dotall: If TRUE, . will also match line terminators.
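Each regex() option can be checked in isolation. This is a sketch assuming stringr is attached; the inputs are illustrative and kept deliberately small.

```r
library(stringr)

two_lines <- "a\nb"

# ignore_case: match regardless of letter case
ci <- str_detect(c("apple", "APPLE pie"), regex("apple", ignore_case = TRUE))

# multiline: ^ matches at the start of every line, not just the input
line_starts <- str_extract_all(two_lines, regex("^.", multiline = TRUE))[[1]]
input_start <- str_extract_all(two_lines, "^.")[[1]]

# dotall: "." also matches the line terminator
dot_default <- str_detect(two_lines, "a.b")
dot_all     <- str_detect(two_lines, regex("a.b", dotall = TRUE))

# comments: white space and "# ..." inside the pattern are ignored
commented <- str_detect("ab", regex("a b  # the space is not matched", comments = TRUE))

print(ci)
print(line_starts)
print(input_start)
print(c(dot_default, dot_all))
print(commented)
```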
Match boundaries between things.
Usage:
boundary(type = c("character", "line_break", "sentence", "word"), skip_word_none = NA, ...)
character: Every character is a boundary.
line_break: Boundaries are places where it is acceptable to have a line break in the current locale.
sentence: The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.
word: The beginnings and ends of words are boundaries.
skip_word_none: Ignore “words” that don’t contain any characters or numbers - i.e. punctuation. Default NA will skip such “words” only when splitting on word boundaries.
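Splitting on word boundaries instead of on spaces shows what boundary() adds. A sketch assuming stringr is attached; the sample string is illustrative.

```r
library(stringr)

words <- "These are   some words."

# Counting word boundaries ignores extra spaces and punctuation
n_words <- str_count(words, boundary("word"))

# Splitting on a single space leaves empty strings and keeps the full stop
by_space <- str_split(words, " ")[[1]]

# Splitting on word boundaries returns just the words
by_word <- str_split(words, boundary("word"))[[1]]

# Character boundaries count user-visible characters
n_chars <- str_count("abc", boundary("character"))

print(n_words)
print(by_space)
print(by_word)
print(n_chars)
```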