This tutorial lists some of the most useful string (character) functions in R. It covers concatenating two strings, extracting a portion of text from a string, extracting a word from a string, converting text to uppercase or lowercase, replacing text with other text, and more.
In R, strings are stored in a character vector. You can create strings with either single quotes or double quotes.
For example, x = "I love R Programming"
The as.character function converts its argument to character type. In the example below, we are storing 25 as a character.
Y = as.character(25)
class(Y)
## [1] "character"
class(Y) returns "character" because 25 was stored as a character in the previous line of code.
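To see why the stored type matters, here is a minimal sketch (not part of the original example) showing that arithmetic on a character value fails until it is converted back with as.numeric. The variable names result, back and ids are only illustrative.

```r
Y <- as.character(25)

# Arithmetic on a character value raises an error, so we catch it here
result <- tryCatch(Y + 1, error = function(e) "error")
result                               # "error"

# Converting back to numeric makes arithmetic possible again
back <- as.numeric(Y) + 1
back                                 # 26

# as.character is vectorised: every element becomes a string
ids <- as.character(c(1.5, 2, 30))
ids                                  # "1.5" "2" "30"
```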
x = "I love R Programming"
is.character(x)
## [1] TRUE
Like the is.character function, there are other functions such as is.numeric, is.integer and is.array for checking numeric vectors, integers and arrays.
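The checks mentioned above can be tried on a few sample values. This is a small sketch with made-up inputs; note in particular that a plain 25 is stored as a double, so is.integer returns FALSE unless the L suffix is used.

```r
num_check   <- is.numeric(25)       # TRUE: 25 is a double, which counts as numeric
int_check   <- is.integer(25)       # FALSE: a plain 25 is a double, not an integer
int_check_L <- is.integer(25L)      # TRUE: the L suffix creates an integer
arr_check   <- is.array(matrix(1:4, nrow = 2))  # TRUE: a matrix is a two-dimensional array
chr_check   <- is.character("25")   # TRUE: quoted values are character
```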
The paste function is used to join two or more strings. It is one of the most important string manipulation tasks, and analysts perform it almost daily to structure data.
paste(objects, sep = " ", collapse = NULL)
The sep= keyword denotes a separator or delimiter placed between the objects being joined. The default separator is a single space. The collapse= keyword, when set, joins the elements of the result into a single string, separated by the given value.
Example 1 : Join two strings
x = "Emily"
y = "in Paris"
paste(x, y)
## [1] "Emily in Paris"
Example 2 : Join a string with a sequence of numbers
paste("x", seq(1,10), sep = "")
## [1] "x1" "x2" "x3" "x4" "x5" "x6" "x7" "x8" "x9" "x10"
Example 3 : Use of the ‘collapse’ keyword
paste("x", seq(1,10), sep="", collapse=",")
## [1] "x1,x2,x3,x4,x5,x6,x7,x8,x9,x10"
Compare the output of Example 2 and Example 3 to understand the usage of the collapse keyword in the paste function. With collapse, the result is a single string in which every element is separated by “,”.
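The difference between sep and collapse can be sketched in a few more lines. The example also uses paste0, which is base R shorthand for paste with sep = ""; the values and variable names are only illustrative.

```r
# paste0 is shorthand for paste(..., sep = "")
cols <- paste0("x", 1:3)
cols                                 # "x1" "x2" "x3"

# sep goes between the arguments; collapse goes between the resulting elements
label <- paste("Q", 1:3, sep = "-", collapse = " | ")
label                                # "Q-1 | Q-2 | Q-3"

# collapse alone turns a vector into one string
csv <- paste(c("a", "b", "c"), collapse = ",")
csv                                  # "a,b,c"
```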
Suppose a value is stored as a fraction and you need to convert it to a percentage. The sprintf function is used to perform C-style string formatting.
sprintf(fmt, ...)
The keyword ‘fmt’ denotes the format string. Each format specification starts with the symbol ‘%’, followed by optional numbers (width and precision) and a letter that gives the conversion type.
x = 0.25
sprintf("%.0f%%",x*100)
## [1] "25%"
Note : ‘%.0f’ indicates ‘fixed point’ decimal notation with 0 decimal places. The ‘%%’ after ‘f’ tells R to print a literal percent sign after the number.
If you change the code to sprintf("%.2f%%", x*100), it would return 25.00%.
a = seq(1, 5)
sprintf("x%03d", a)
## [1] "x001" "x002" "x003" "x004" "x005"
In the format above, ‘%03d’ pads an integer with leading zeros to a width of 3 digits. The letter ‘s’ (as in ‘%s’) is used for character strings.
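A short sketch of the ‘s’ conversion alongside ‘d’, using made-up values (name, score and the other variable names are only illustrative):

```r
name  <- "Emily"
score <- 42L

# %s inserts a character string, %d an integer
msg <- sprintf("%s scored %d points", name, score)
msg                                  # "Emily scored 42 points"

# a number before the letter sets a minimum width; '-' left-justifies
padded <- sprintf("[%-6s]", "ab")
padded                               # "[ab    ]"

# sprintf is vectorised over its arguments
labels <- sprintf("%s_%02d", c("a", "b"), 1:2)
labels                               # "a_01" "b_02"
```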
substr Syntax - substr(x, starting position, end position)
x = "abcdef"
substr(x, 1, 3)
## [1] "abc"
In the above example, we are telling R to extract the string from the 1st character through the 3rd character.
Replace Substring - substr(x, starting position, end position) = Value
substr(x, 1, 2) = "11"
x
## [1] "11cdef"
In the above example, we are telling R to replace the first 2 characters with 11.
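Two behaviours of substr are worth seeing in code: it works on every element of a vector, and the replacement form never changes the length of the string. The sketch below uses a made-up vector named fruits.

```r
fruits <- c("apple", "banana", "cherry")

# substr works on every element of a character vector
first3 <- substr(fruits, 1, 3)
first3                               # "app" "ban" "che"

# Replacement never changes the length of the string:
# a longer value is truncated to the span being replaced
x <- "abcdef"
substr(x, 1, 2) <- "12345"
x                                    # "12cdef"
```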
The nchar function is used to compute the length of a character value.
x = "I love R Programming"
nchar(x)
## [1] 20
It returns 20 because the string stored in ‘x’ contains 20 characters (including 3 spaces).
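Combining nchar with substr gives a simple way to pull characters from the end of a string. This is a minimal sketch with illustrative variable names:

```r
x <- "abcdef"

# last three characters: start two positions before the final one
last3 <- substr(x, nchar(x) - 2, nchar(x))
last3                                # "def"

# nchar is vectorised, so it returns one length per element
lens <- nchar(c("R", "love", ""))
lens                                 # 1 4 0
```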
sub Syntax - sub(sub-string, replacement, x, ignore.case = FALSE)
If ignore.case is FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching.
sub("okay", "fine", "She is okay.")
## [1] "She is fine."
In the above example, we are replacing the word ‘okay’ with ‘fine’.
Let’s replace text in every value of a vector
In the example below, we need to replace the prefix ‘x’ with ‘Year’ in each value of a vector.
cols = c("x1", "x2", "x3")
sub("x", "Year", cols)
## [1] "Year1" "Year2" "Year3"
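Note that sub replaces only the first match in each string, while the related base R function gsub replaces every match. The sketch below also shows ignore.case and the fixed argument, which treats the pattern as literal text instead of a regular expression; the input strings are made up.

```r
# sub replaces only the first match; gsub replaces every match
first_only <- sub("a", "A", "banana")
all_hits   <- gsub("a", "A", "banana")
first_only                           # "bAnana"
all_hits                             # "bAnAnA"

# ignore.case = TRUE lets "OKAY" match "okay"
relaxed <- sub("OKAY", "fine", "She is okay.", ignore.case = TRUE)
relaxed                              # "She is fine."

# The pattern is a regular expression, so "." matches any character;
# fixed = TRUE treats the pattern as literal text instead
regex_dot   <- sub(".", "!", "a.b")
literal_dot <- sub(".", "!", "a.b", fixed = TRUE)
regex_dot                            # "!.b"
literal_dot                          # "a!b"
```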
Suppose you need to pull the first or last word from a character string.
Word Function Syntax (Library : stringr)
word(string, position of word to extract, separator)
x = "I love R Programming"
library(stringr)
word(x, 1,sep = " ")
## [1] "I"
In the example above, ‘1’ denotes the first word to be extracted from the string. sep = " " denotes a single space as the delimiter (it’s the default delimiter in the word function).
Extract Last Word
x = "I love R Programming"
library(stringr)
word(x, -1,sep = " ")
## [1] "Programming"
In the example above, ‘-1’ denotes the first word counting from the right of the string, that is, the last word. sep = " " denotes a single space as the delimiter (it’s the default delimiter in the word function).
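If the stringr package is not available, roughly the same result can be obtained in base R with strsplit. This is an alternative sketch, not the word function itself, and it assumes the words are separated by single spaces.

```r
x <- "I love R Programming"

# split on a literal single space; strsplit returns a list, so take element 1
words <- strsplit(x, " ", fixed = TRUE)[[1]]

first_word <- words[1]
last_word  <- words[length(words)]
first_word                           # "I"
last_word                            # "Programming"
```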
We often need to change the case of a word, for example, to convert it to uppercase or lowercase.
x = "I love R Programming"
tolower(x)
## [1] "i love r programming"
The tolower() function converts letters in a string to lowercase.
The toupper() function converts letters in a string to uppercase.
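A matching sketch for toupper, plus one common use of case conversion: comparing two strings without regard to case. The comparison strings are made up for illustration.

```r
x <- "I love R Programming"
upper <- toupper(x)
upper                                # "I LOVE R PROGRAMMING"

# Normalising case is a common way to compare strings case-insensitively
same <- tolower("Paris") == tolower("PARIS")
same                                 # TRUE
```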
library(stringr)
str_to_title(x)
## [1] "I Love R Programming"
The str_to_title() function converts the first letter of each word in a string to uppercase and the remaining letters to lowercase.
The trimws() function is used to remove leading and/or trailing spaces.
trimws(x, which = c("both", "left", "right"))
Default Option : both : It removes both leading and trailing whitespace. If you want to remove only leading spaces, specify "left". For removing only trailing spaces, specify "right".
a = "Cool Story "
trimws(a)
## [1] "Cool Story"
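The which argument can be sketched with a string that has spaces on both sides (the input is made up; nchar is used only to make the removed spaces visible):

```r
a <- "  Cool Story  "

left_trimmed  <- trimws(a, which = "left")
right_trimmed <- trimws(a, which = "right")
left_trimmed                         # "Cool Story  "
right_trimmed                        # "  Cool Story"

# the lengths show how many spaces were removed
len_before <- nchar(a)               # 14
len_after  <- nchar(trimws(a))       # 10
```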
The str_trim() function from the stringr package also eliminates leading and trailing spaces.
It can be a challenging task to remove multiple spaces from a string and keep only a single space. In R, this can be done easily with the qdap package.
Note : The code below might not work.
x = "Cool    Story"
#install.packages("qdap")
library(qdap)
Trim(clean(x))
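If qdap cannot be installed, a base R alternative is to collapse runs of whitespace with gsub and then trim the ends with trimws. This is a sketch of a different technique, not what qdap does internally; the input string is made up.

```r
x <- "  Cool     Story  "

# "\\s+" matches one or more whitespace characters; replace each run with one space
single_spaced <- gsub("\\s+", " ", x)

# then drop the leading and trailing space that remain
cleaned <- trimws(single_spaced)
cleaned                              # "Cool Story"
```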