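Lately I've still been adjusting to life in the US, and since I haven't started a new research project yet, my evenings are fairly free. I've been reading R for Data Science. I didn't take notes on the earlier chapters, but I've come to feel that I should build a note-taking habit. So from today on, even if I read a little less each day, I'll write down my reading notes. Today I start with string handling in the stringr package, part one.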

First, load the packages:

library(tidyverse)
library(stringr)

1 String length

To find how many characters a string contains, use str_length(). A missing value stays NA:

str_length(c("a", "R for data science", NA))
[1]  1 18 NA
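str_length() counts characters, not bytes, which matters for non-ASCII text. The sketch below compares it with base R's nchar(type = "bytes") and then uses str_length() as a filter. The example words are made up for illustration.

```r
library(stringr)

# "caf\u00e9" is "café": 4 characters, but é takes 2 bytes in UTF-8
word <- "caf\u00e9"
str_length(word)             # 4 (characters)
nchar(word, type = "bytes")  # 5 (bytes)

# str_length() is vectorised, so it works directly as a filter
fruits <- c("fig", "apple", "banana")
fruits[str_length(fruits) > 3]  # "apple" "banana"
```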

2 Combining strings

str_c("x", "y")
[1] "xy"
str_c("x", "y", "z")
[1] "xyz"

You can also use the sep argument to control the separator placed between the pieces.

str_c("x", "y", sep = ",")
[1] "x,y"
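sep is inserted between every pair of arguments, however many there are. For comparison, base R's paste() does the same job but defaults to a space, while str_c() defaults to no separator (like paste0()). A small sketch:

```r
library(stringr)

# sep goes between each pair of arguments
str_c("a", "b", "c", sep = "-")  # "a-b-c"
# base R equivalent; paste() would use " " if sep were omitted
paste("a", "b", "c", sep = "-")  # "a-b-c"
```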

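As with many other functions, NA values cause trouble: in the example below, an NA in the input makes the corresponding result NA. To print missing values as the text "NA" instead, use str_replace_na():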

x <- c("abc", NA)
str_c("|-", x, "-|")
[1] "|-abc-|" NA       
str_c("|-", str_replace_na(x), "-|")
[1] "|-abc-|" "|-NA-|" 
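Two related points are worth noting. Base R's paste0() silently turns NA into the text "NA", which is exactly the behaviour str_c() avoids by default. Also, str_replace_na() takes a replacement argument, so a missing value can become something other than "NA", for example an empty string:

```r
library(stringr)

x <- c("abc", NA)
# base R paste0() converts NA to the literal text "NA"
paste0("|-", x, "-|")                                   # "|-abc-|" "|-NA-|"
# replace missing values with an empty string instead of "NA"
str_c("|-", str_replace_na(x, replacement = ""), "-|")  # "|-abc-|" "|--|"
```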

The example above also shows that str_c() is vectorised: it recycles shorter arguments (here the length-1 prefix and suffix) to the length of the longest one.

str_c("prefix-", c("a", "b", "c"), "-suffix")
[1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"
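A practical use of this recycling is building a batch of names in one call. Numbers are coerced to character along the way. The "sample_" file names below are hypothetical:

```r
library(stringr)

# the length-1 prefix and suffix are recycled across 1:3
files <- str_c("sample_", 1:3, ".csv")
files  # "sample_1.csv" "sample_2.csv" "sample_3.csv"
```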

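Next, a quick look at two very useful arguments of str_c(): sep and collapse. The name collapse describes it well: it collapses the elements of a vector into a single string. sep, by contrast, only goes between separate arguments, so with a single vector argument it has no effect. Compare the examples below: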

str_c(c("x", "y", "z"), collapse = ", ")
[1] "x, y, z"
str_c(c("x", "y", "z"), sep = ", ")
[1] "x" "y" "z"
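The two arguments can be combined. sep first joins the vectors element by element, and collapse then joins those results into one string. A sketch that builds a query-string-like value from two hypothetical vectors:

```r
library(stringr)

keys <- c("a", "b")
values <- c("1", "2")
# sep pairs up keys and values; collapse merges the pairs into one string
str_c(keys, values, sep = "=", collapse = "&")  # "a=1&b=2"
```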

3 Extracting substrings

Sometimes we only need part of a string. In that case, use str_sub():

x <- c("Apple", "Banana", "Pear")
str_sub(string = x, start = 1, end = 3)
[1] "App" "Ban" "Pea"
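Two more behaviours of str_sub() are handy to know. Positions past the end of a string do not raise an error; you simply get as much as exists. str_sub() can also appear on the left of an assignment to modify part of a string. The sketch below lower-cases the first letter:

```r
library(stringr)

# asking for more characters than exist is fine
str_sub("a", 1, 5)  # "a"

# assignment form: replace the first character with its lower-case version
x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x  # "apple" "banana" "pear"
```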

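The function takes three arguments: the string, plus start and end, which give the (inclusive) positions where extraction begins and ends. You can also count from the end of the string by using negative values: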

str_sub(x, -3, -1)
[1] "ple" "ana" "ear"
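Positive and negative positions can also be mixed, and end defaults to -1 (the last character). A short sketch:

```r
library(stringr)

x <- c("Apple", "Banana", "Pear")
# from the 2nd character to the 2nd-to-last: drop the first and last letter
str_sub(x, 2, -2)  # "ppl" "anan" "ea"
# only start given, so end = -1: the last character of each string
str_sub(x, -1)     # "e" "a" "r"
```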

4 Locales

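Some conventions vary between countries and regions. Sorting is one example: English sorts by the order of the English alphabet, but other languages may order letters differently, as in the example below: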

x <- c("apple", "eggplant", "banana")
str_sort(x, locale = "en")  # English
[1] "apple"    "banana"   "eggplant"
str_sort(x, locale = "haw") # Hawaiian
[1] "apple"    "eggplant" "banana"  
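The locale argument appears in other stringr functions as well. str_order() returns the sorting permutation rather than the sorted strings, which is useful for reordering other data. Case conversion is also locale-sensitive: Turkish has a dotted capital I. A sketch:

```r
library(stringr)

x <- c("apple", "eggplant", "banana")
# positions of the elements in sorted order
str_order(x, locale = "en")   # 1 3 2
str_order(x, locale = "haw")  # 1 2 3

# upper-casing "i" differs between English and Turkish
str_to_upper("i")                 # "I"
str_to_upper("i", locale = "tr")  # "\u0130", i.e. dotted capital I
```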

I don't expect to use this feature very often; it's enough to know it exists.
