String
> str1 <- 'It was a good day!'
> length(str1)
[1] 1
> nchar(str1) # this shows the number of characters in the string
[1] 18
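The difference between the two functions is easier to see on a vector with several elements. The sketch below uses a hypothetical vector `words`, which is not part of the original example: `length()` counts elements, while `nchar()` counts the characters in each element.

```r
# Hypothetical vector used only for illustration
words <- c("It", "was", "a", "good", "day!")

n_elements <- length(words)  # number of elements in the vector
n_chars <- nchar(words)      # number of characters in each element

print(n_elements)
print(n_chars)
```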
R provides built-in vectors of the uppercase and lowercase letters of the alphabet:
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
Logical operations on characters
For exact matching:
> 23 == "23"
[1] TRUE
> c('23','go') == c(23, 'go')
[1] TRUE TRUE
> c(23.0, 'NOW') == c('23','Now') # Case sensitive
[1] TRUE FALSE
> c('New', 'Shoe', 'New')=='New'
[1] TRUE FALSE TRUE
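The first comparisons above return TRUE because R converts the number to a character string before comparing. A minimal sketch of that coercion follows; the variable names are illustrative only.

```r
# The number is converted to character before the comparison
num_as_text <- as.character(23)      # the text that takes part in the comparison

same_text <- 23 == "23"              # "23" is compared with "23"
different_text <- 23 == "23.0"       # "23" is compared with "23.0"

# identical() also checks the type, so a number never matches a string
strict_check <- identical(23, "23")

print(c(same_text, different_text, strict_check))
```

If the type matters as well as the text, `identical()` may be the safer choice.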
grep()
grep() performs partial (pattern) matching and returns the indices of the elements that contain the pattern:
> strings <- c("file.excel","data.csv","random data.dat", "data2.csv")
> grep("csv", strings)
[1] 2 4
This shows that the 2nd and 4th elements contain the given string “csv”.
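When the matching elements themselves, or a TRUE/FALSE flag per element, are more useful than positions, the following variants may help. This is a small sketch that reuses the `strings` vector from above.

```r
strings <- c("file.excel", "data.csv", "random data.dat", "data2.csv")

# value = TRUE returns the matching elements instead of their positions
matched_values <- grep("csv", strings, value = TRUE)

# grepl() returns one logical value per element
matched_flags <- grepl("csv", strings)

print(matched_values)
print(matched_flags)
```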
duplicated() function
> duplicated(c("R","is","not","R"))
[1] FALSE FALSE FALSE TRUE
The fourth result is TRUE, which means the fourth element repeats an earlier element.
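A common follow-up is to drop the repeated elements or simply to test whether any exist. One possible sketch, with illustrative variable names:

```r
words <- c("R", "is", "not", "R")

unique_words <- unique(words)             # keeps the first occurrence of each value
no_repeats <- words[!duplicated(words)]   # same result, built from duplicated()
has_repeats <- anyDuplicated(words) > 0   # anyDuplicated() is 0 when nothing repeats

print(unique_words)
print(has_repeats)
```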
Alphabetic order comparison
R treats letters that come later in the alphabet as greater than earlier letters, so it can decide whether one string is greater than another with respect to alphabetical order.
> 'a' < 'b'
[1] TRUE
> 'a' > 'b'
[1] FALSE
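Furthermore, in many locales an uppercase letter is considered greater than the same letter in lowercase. String comparison depends on the collating sequence of the locale in use, so this result can differ between systems (in the C locale, for example, it is reversed).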
> 'A' > 'a'
[1] TRUE
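The same ordering applies to whole words, which are compared character by character. The sketch below sticks to lowercase words (chosen only for illustration) so that the result should not depend on how the locale treats case.

```r
# Words are compared character by character
earlier_word <- "apple" < "banana"

# When one word is a prefix of the other, the shorter word comes first
prefix_first <- "app" < "apple"

# sort() relies on the same ordering
sorted_words <- sort(c("pear", "apple", "fig"))

print(c(earlier_word, prefix_first))
print(sorted_words)
```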
cat vs paste
> obj1 <- c("awesome","R","is")
> cat(obj1[2], obj1[3], obj1[1], '!')
R is awesome !
cat() only prints its output and returns NULL, so the result cannot be assigned to a new variable and treated as a character string.
> obj1 <- c("awesome","R","is")
> paste(obj1[2], obj1[3], obj1[1], '!')
[1] "R is awesome !"
The [1] to the left of the output and the double quotes indicate that the returned item is a vector containing a character string, which can be assigned to an object and used in other functions.
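To make the difference concrete, the sketch below assigns the result of each function to a variable (the names `from_cat` and `from_paste` are illustrative) and then inspects what was stored.

```r
obj1 <- c("awesome", "R", "is")

# cat() prints the text as a side effect; the value it returns is NULL
from_cat <- cat(obj1[2], obj1[3], obj1[1], "!\n")

# paste() returns the text, so it can be stored and reused
from_paste <- paste(obj1[2], obj1[3], obj1[1], "!")

print(is.null(from_cat))
print(toupper(from_paste))
```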
sep in cat and paste
> obj1 <- c("awesome","R","is")
> paste(obj1[2],obj1[3],"totally",obj1[1],"!",sep="---")
[1] "R---is---totally---awesome---!"
Unlike the console output of paste(), cat() does not end its output with a line break. Adding ‘\n’ moves whatever is printed next to a new line:
> cat(obj1[2],obj1[3],"totally",obj1[1],"!\n",sep="---")
R---is---totally---awesome---!
> paste(obj1[2],obj1[3],"totally",obj1[1],"!",sep="")
[1] "Ristotallyawesome!"
> cat(obj1[2],obj1[3],"totally",obj1[1],"!\n",sep="")
Ristotallyawesome!
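Two related tools are worth knowing here: `paste0()`, which behaves like `paste()` with `sep = ""`, and the `collapse` argument, which joins the elements of a single vector into one string. A brief sketch using the same `obj1`:

```r
obj1 <- c("awesome", "R", "is")

# paste0() is shorthand for paste(..., sep = "")
no_separator <- paste0(obj1[2], obj1[3], obj1[1], "!")

# collapse joins the elements of one vector into a single string
collapsed <- paste(obj1[c(2, 3, 1)], collapse = " ")

print(no_separator)
print(collapsed)
```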
Subsetting
> str4 <- "This is a character string!"
> substr(str4, start = 9, stop = nchar(str4))
[1] "a character string!"
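Positions in `substr()` are 1-based and both ends are inclusive. As a small additional sketch, the first word can be extracted by position, and `substring()` can be used when only the start is known, since its end position defaults to a very large number.

```r
str4 <- "This is a character string!"

# Characters 1 to 4, both ends included
first_word <- substr(str4, start = 1, stop = 4)

# substring() needs only the start position to read to the end
tail_part <- substring(str4, 9)

print(first_word)
print(tail_part)
```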
Replacing
Replacing one or more characters:
> substr(str4, start = 9, stop = 9) <- 'A'
> str4
[1] "This is A character string!"
Replacing with a shorter string
If the replacement string is shorter than the number of characters you’re replacing, then replacement ends when the replacement string is fully inserted, leaving the remaining original characters up to stop untouched.
> str4 <- "This is a character string!"
> substr(str4, start = 11, stop = 19) <- 'new'
> str4
[1] "This is a newracter string!"
Replacing with a longer string
If the replacement string is longer than the number of characters indicated by start and stop, then replacement still takes place, beginning at start and ending at stop. Any characters of the replacement string that overrun that range are cut off.
> str4 <- "This is a character string!"
> substr(str4, start = 1, stop = 8) <- 'Hey there it is'
> str4
[1] "Hey thera character string!"
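Because `substr<-` never changes the length of the string, swapping a span for text of a different length needs another approach. One possible sketch, assuming the goal is to replace the whole word "character" (positions 11 to 19) with "new", rebuilds the string from its pieces:

```r
str4 <- "This is a character string!"

# Keep everything before position 11 and everything after position 19,
# then put the new text in between
before <- substr(str4, start = 1, stop = 10)
after <- substr(str4, start = 20, stop = nchar(str4))
rebuilt <- paste0(before, "new", after)

print(rebuilt)
```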
sub() function
sub() replaces only the first match of a pattern in a string:
> str5 <- "I love meaw of dogs!"
> sub("meaw","barking",str5)
[1] "I love barking of dogs!"
> str6 <- "Dogs not not yes dogs yes."
> sub("not","yes",str6)
[1] "Dogs yes not yes dogs yes."
Here the first “not” is replaced but the second one is not.
gsub() function
gsub() replaces every match of a pattern in a string:
> str6 <- "Dogs not not yes dogs yes."
> gsub("not","yes",str6)
[1] "Dogs yes yes yes dogs yes."
Case sensitivity can be turned off using “ignore.case = TRUE”:
> gsub("not","yes","Not not nOt NOT",ignore.case = TRUE)
[1] "yes yes yes yes"
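Note that the pattern in `sub()` and `gsub()` is treated as a regular expression by default, so characters such as `.` have a special meaning. The sketch below (with a made-up input string) shows how `fixed = TRUE` makes the pattern match literally.

```r
# Hypothetical input used only for illustration
dotted <- "a.b.c"

# As a regular expression, "." matches any single character
regex_result <- gsub(".", "-", dotted)

# With fixed = TRUE, "." matches only a literal dot
literal_result <- gsub(".", "-", dotted, fixed = TRUE)

print(regex_result)
print(literal_result)
```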
strsplit() function
> str7 <- "An example to look at example"
> strsplit(str7, " ")
[[1]]
[1] "An" "example" "to" "look" "at" "example"
> duplicated(strsplit(str7," ")[[1]]) # strsplit() returns a list; [[1]] extracts the vector of words
[1] FALSE FALSE FALSE FALSE FALSE TRUE
The last word repeats an earlier word.
To get the position of the repeated word:
> which(duplicated(strsplit(str7," ")[[1]]))
[1] 6
The 6th word of the string is a duplicate.
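To see which word is repeated rather than where, the logical vector can be used as an index. The sketch below also shows a case-insensitive variant on a hypothetical string `mixed_case`, which is not part of the original example.

```r
str7 <- "An example to look at example"
words <- strsplit(str7, " ")[[1]]

# Index with the logical vector to get the repeated word itself
repeated_words <- words[duplicated(words)]

# Hypothetical string: lowercasing first makes the check case-insensitive
mixed_case <- "An example to look at Example"
mixed_words <- strsplit(mixed_case, " ")[[1]]
repeated_ignoring_case <- mixed_words[duplicated(tolower(mixed_words))]

print(repeated_words)
print(repeated_ignoring_case)
```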
See the stringr package for more string manipulation tools.