A Quick Demonstration of the Textclean Package

#install.packages("textclean")
library(textclean)

This is a quick demo of the R package textclean, a tool for cleaning text quickly. It works mostly on substrings, replacing or normalizing them when other tools are not optimal.

It also works well with the qdapRegex package.

Here is our test vector:

test <- c('the', 'quick', 'fox', '😊', 'jumps', '', 'over', 'the', '', 'dog', '')

print(test)
##  [1] "the"   "quick" "fox"   "😊"    "jumps" ""      "over"  "the"   ""     
## [10] "dog"   ""
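Before converting anything, the empty strings can be located directly on the vector. The following is a minimal base R sketch, not part of textclean; the name `empty_idx` is introduced here purely for illustration.

```r
test <- c('the', 'quick', 'fox', '😊', 'jumps', '', 'over', 'the', '', 'dog', '')

# Positions of the elements that are exactly the empty string
empty_idx <- which(test == "")
print(empty_idx)          # 6, 9 and 11

# Number of empty elements in the vector
print(length(empty_idx))  # 3
```

Note that the last empty string sits at position 11, so a preview limited to ten rows will not show it.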

Converting it to a data frame, we see empty rows and an emoji. The first ten rows show two empty entries; the full vector has three, the last at position 11.

library(data.table)  # optional here: data.frame() is base R

test <- data.frame(test)

head(test,10)
##     test
## 1    the
## 2  quick
## 3    fox
## 4     😊
## 5  jumps
## 6       
## 7   over
## 8    the
## 9       
## 10   dog

Checking the data type: the column comes back as character, which is R's string type.

summary(test)
##      test          
##  Length:11         
##  Class :character  
##  Mode  :character

Cleaning Our Data

test <- drop_empty_row(test)

print(test)
##     test
## 1    the
## 2  quick
## 3    fox
## 4     😊
## 5  jumps
## 7   over
## 8    the
## 10   dog
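The row labels in the output above skip 6 and 9 (and 11 is gone) because the surviving rows keep their original row names. As a rough base R approximation of this step for this particular data (an assumption for illustration, not the package's actual implementation), the same rows can be selected by hand; `kept` is a hypothetical name.

```r
test <- data.frame(
  test = c("the", "quick", "fox", "😊", "jumps", "", "over", "the", "", "dog", ""),
  stringsAsFactors = FALSE
)

# Keep rows whose text is non-empty after trimming whitespace;
# drop = FALSE keeps the result a data frame even with a single column
kept <- test[trimws(test$test) != "", , drop = FALSE]

# The original row names survive the subset, gaps included
print(rownames(kept))

# Renumber the rows 1..n if the gaps are not wanted
rownames(kept) <- NULL
print(kept)
```

Resetting the row names is optional; it only matters if later code relies on row labels matching row positions.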

Now we replace the emoji with its text description. We pass the column test$test, so the result is a character vector rather than a data frame.

test <- replace_emoji(test$test)

print(test)
## [1] "the"                              "quick"                           
## [3] "fox"                              " smiling face with smiling eyes "
## [5] "jumps"                            "over"                            
## [7] "the"                              "dog"
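The replacement text in the output above is padded with a space on each side. If that padding is unwanted, base R's `trimws()` removes it. This sketch starts from the printed result rather than calling textclean again; `cleaned` is a hypothetical name.

```r
# Values copied from the printed result of the replace_emoji() step
cleaned <- c("the", "quick", "fox", " smiling face with smiling eyes ",
             "jumps", "over", "the", "dog")

# Strip leading and trailing whitespace from every element
cleaned <- trimws(cleaned)
print(cleaned[4])  # "smiling face with smiling eyes"
```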

Converting back to a data frame:

test <- data.frame(test)

head(test,10)
##                               test
## 1                              the
## 2                            quick
## 3                              fox
## 4  smiling face with smiling eyes 
## 5                            jumps
## 6                             over
## 7                              the
## 8                              dog
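The round trip from data frame to vector and back can be avoided by assigning the cleaned text to the column in place. This is a sketch under the assumption that `drop_empty_row()` and `replace_emoji()` behave as in the steps above; it uses no other textclean functions.

```r
library(textclean)

test <- data.frame(
  test = c("the", "quick", "fox", "😊", "jumps", "", "over", "the", "", "dog", ""),
  stringsAsFactors = FALSE
)

# Remove the empty rows, as in the cleaning step above
test <- drop_empty_row(test)

# Overwrite the column in place so `test` stays a data frame
test$test <- replace_emoji(test$test)

# Optional: renumber the rows 1..n after the drop
rownames(test) <- NULL

print(test)
```

Working on the column keeps any other columns of a larger data frame aligned with the cleaned text.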

References

Rinker, T. (2022). “Text Cleaning Tools.” CRAN. https://cran.r-project.org/web/packages/textclean/textclean.pdf

Sproat, R., Black, A. W., Chen, S., Kumar, S., Ostendorf, M., & Richards, C. (2001). “Normalization of non-standard words.” Computer Speech & Language, 15(3), 287-333.

textclean README. CRAN. https://cran.r-project.org/web/packages/textclean/readme/README.html