Text mining, also known as text analytics, is the process of extracting insight and information from a collection of text data. Businesses use it to find relevant information in text-based content such as Word documents, social media posts, and emails. One common text mining technique is to highlight the most frequently used keywords in a body of text. A word cloud, also referred to as a text cloud, is a visual representation of text data in which more frequent words appear larger. Creating word clouds in R takes only a few steps.
The ability to work with text data is an important skill for a data scientist today. With the rise of review websites, social media, forums, and web pages, companies now have access to enormous amounts of text data from their customers.
This data is messy. However, it is a source of information and insights that can help companies grow their businesses. That is why text mining, which draws on Natural Language Processing (NLP) techniques, is growing rapidly and is widely used by data scientists. In R, the text mining package ‘tm’ and the word cloud package ‘wordcloud’ are available for text analysis and for quickly visualizing keywords as a word cloud.
#install.packages("NLP")
#install.packages("tm")
#install.packages("RColorBrewer")
#install.packages("wordcloud")
#install.packages("wordcloud2")
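Reinstalling packages that are already present is unnecessary, so it can help to check first. The following is a minimal sketch, assuming a hypothetical helper named missing_packages() that is not part of the original tutorial. It reports which of the required packages are not yet installed.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required   <- c("NLP", "tm", "RColorBrewer", "wordcloud", "wordcloud2")
to_install <- missing_packages(required)

# Uncomment to install only what is missing.
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out, in the same spirit as the commented install.packages() calls above, so that running the snippet does not change the local library by accident.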
“Text Mining is a technique that boosts the research process and helps to test the queries.”
filePath <- "https://acadgild.com/artificial-intelligence.txt"
text_file <- readLines(filePath)
incomplete final line found on 'https://acadgild.com/artificial-intelligence.txt'
head(text_file)
[1] "How ARTIFICAL INTELLIGENCE is making a huge impact across various sectors"
[2] " "
[3] ""
[4] "Author - Badal Kumar"
[5] ""
[6] ""
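The "incomplete final line" message from readLines() only means that the last line of the file does not end with a newline; the lines are still read. A small sketch of how the call could be made quieter, assuming a made-up temporary file in place of the article URL:

```r
# Write a small sample file whose last line has no trailing newline.
tmp <- tempfile(fileext = ".txt")
cat("How AI is making an impact\nAuthor - Example", file = tmp)

# warn = FALSE silences the "incomplete final line" warning;
# encoding = "UTF-8" marks the strings that are read in as UTF-8.
sample_lines <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
print(sample_lines)
```

The same two arguments could be passed to readLines(filePath), assuming the source file really is UTF-8 encoded.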
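# collapse all lines into a single string
text_file1 <- paste(text_file, collapse = " ")
# convert everything to lower case
clean_text <- tolower(text_file1)
# replace punctuation and other non-word characters with spaces
clean_text1 <- gsub(pattern = "\\W", replacement = " ", clean_text)
# replace digits with spaces
clean_text2 <- gsub(pattern = "\\d", replacement = " ", clean_text1)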
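To see what each cleaning step does, the same calls can be tried on a short string. This is only an illustration: the sample sentence is made up, and the final whitespace tidy-up is added here for readability.

```r
# Made-up sample sentence, not taken from the article.
sample_text <- "AI in 2024: It's HUGE!"

lowered   <- tolower(sample_text)                                # lower case
no_punct  <- gsub(pattern = "\\W", replacement = " ", lowered)   # non-word characters -> spaces
no_digits <- gsub(pattern = "\\d", replacement = " ", no_punct)  # digits -> spaces

# Collapse runs of spaces and trim the ends so the result is easy to read.
tidy <- trimws(gsub(pattern = "\\s+", replacement = " ", no_digits))
print(tidy)
```

Note that the apostrophe in "it's" is replaced by a space, which leaves a stray "s" behind. This is why single-letter words are removed in a later step.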
#load the required packages
library(NLP)
library(tm)
Stopwords carry almost no information because they are so common in a language. Removing them is helpful before further analysis.
stopwords()
[1] "i" "me" "my" "myself" "we" "our" "ours" "ourselves" "you"
[10] "your" "yours" "yourself" "yourselves" "he" "him" "his" "himself" "she"
[19] "her" "hers" "herself" "it" "its" "itself" "they" "them" "their"
[28] "theirs" "themselves" "what" "which" "who" "whom" "this" "that" "these"
[37] "those" "am" "is" "are" "was" "were" "be" "been" "being"
[46] "have" "has" "had" "having" "do" "does" "did" "doing" "would"
[55] "should" "could" "ought" "i'm" "you're" "he's" "she's" "it's" "we're"
[64] "they're" "i've" "you've" "we've" "they've" "i'd" "you'd" "he'd" "she'd"
[73] "we'd" "they'd" "i'll" "you'll" "he'll" "she'll" "we'll" "they'll" "isn't"
[82] "aren't" "wasn't" "weren't" "hasn't" "haven't" "hadn't" "doesn't" "don't" "didn't"
[91] "won't" "wouldn't" "shan't" "shouldn't" "can't" "cannot" "couldn't" "mustn't" "let's"
[100] "that's" "who's" "what's" "here's" "there's" "when's" "where's" "why's" "how's"
[109] "a" "an" "the" "and" "but" "if" "or" "because" "as"
[118] "until" "while" "of" "at" "by" "for" "with" "about" "against"
[127] "between" "into" "through" "during" "before" "after" "above" "below" "to"
[136] "from" "up" "down" "in" "out" "on" "off" "over" "under"
[145] "again" "further" "then" "once" "here" "there" "when" "where" "why"
[154] "how" "all" "any" "both" "each" "few" "more" "most" "other"
[163] "some" "such" "no" "nor" "not" "only" "own" "same" "so"
[172] "than" "too" "very"
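Conceptually, stopword removal is a filter: every word that appears in the stop list is dropped. The sketch below shows the idea on a vector of tokens in base R, using a tiny made-up stop list. It illustrates the concept only and is not how the tm package implements removeWords().

```r
# Made-up tokens and a tiny stop list, for illustration only.
tokens    <- c("the", "robot", "is", "learning", "and", "the", "data", "grows")
stop_list <- c("the", "is", "and")

# Keep only the tokens that are not in the stop list.
kept <- tokens[!tokens %in% stop_list]
print(kept)
```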
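# remove stopwords, plus "ai" and the stray "â" characters that appear in the text
clean_text3 <- removeWords(clean_text2, words = c(stopwords(), "ai", "â"))
# remove single-letter words
clean_text4 <- gsub(pattern = "\\b[a-z]\\b", replacement = " ", clean_text3)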
clean_text5 <- stripWhitespace(clean_text4)
clean_text6 <- strsplit(clean_text5, " ")
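The single-letter removal and the split into words can be checked on a short made-up string. The sketch also shows why an empty string "" can end up as the first token: a leading space in the input produces an empty first element when the string is split. Using perl = TRUE and dropping empty tokens with nzchar() are choices made for this example, not steps from the original code.

```r
# Made-up string with a leading space and two single-letter leftovers.
sample_text <- " self driving car s a future"

# perl = TRUE is used here so that \\b is handled by the PCRE engine.
no_singles <- gsub(pattern = "\\b[a-z]\\b", replacement = " ", sample_text, perl = TRUE)
squeezed   <- gsub(pattern = "\\s+", replacement = " ", no_singles)

tokens       <- strsplit(squeezed, " ")[[1]]  # first element is "" because of the leading space
tokens_clean <- tokens[nzchar(tokens)]        # drop empty strings
print(tokens_clean)
```

Dropping empty tokens before counting keeps the "" entry out of the frequency table.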
head(clean_text6, 10)
[[1]]
[1] "" "artifical" "intelligence" "making" "huge" "impact"
[7] "across" "various" "sectors" "author" "badal" "kumar"
[13] "might" "familiar" "word" "artificial" "intelligence" "might"
[19] "heard" "radio" "might" "seen" "news" "stories"
[25] "hollywood" "movie" "depicting" "sudden" "trend" "recent"
[31] "times" "starters" "thing" "popular" "almost" "top"
[37] "companies" "around" "world" "fortune" "partially" "directly"
[43] "involved" "evolving" "developing" "using" "form" "artificial"
[49] "intelligence" "tesla" "facebook" "google" "microsoft" "openai"
[55] "list" "goes" "let" "us" "look" "artificial"
[61] "intelligence" "impacting" "various" "sectors" "artificial" "intelligence"
[67] "artificial" "intelligence" "generally" "called" "machine" "intelligence"
[73] "intelligence" "determined" "machines" "comparison" "natural" "intelligence"
[79] "displayed" "humans" "trying" "bring" "artificial" "intelligence"
[85] "way" "system" "program" "can" "mimic" "human"
[91] "brain" "think" "like" "humans" "give" "faster"
[97] "results" "possible" "yes" "pre" "stored" "data"
[103] "machines" "can" "analyze" "letting" "figure" "surroundings"
[109] "like" "human" "brain" "grows" "years" "come"
[115] "cognizant" "everything" "around" "working" "human" "brain"
[121] "decision" "unlike" "human" "unbiased" "making" "decisions"
[127] "predictions" "seems" "science" "fiction" "isn" "reality"
[133] "many" "think" "human" "threat" "started" "hot"
[139] "debate" "among" "many" "well" "known" "personalities"
[145] "world" "elon" "musk" "really" "quite" "close"
[151] "close" "cutting" "edge" "scares" "hell" "capable"
[157] "vastly" "almost" "anyone" "knows" "rate" "improvement"
[163] "exponential" "mark" "zuckerberg" "going" "make" "lives"
[169] "better" "future" "doomsday" "scenarios" "pretty" "irresponsible"
[175] "think" "differently" "won" "know" "will" "happen"
[181] "truth" "already" "part" "lives" "whether" "know"
[187] "almost" "everywhere" "will" "implementing" "soon" "almost"
[193] "sectors" "world" "using" "form" "know" "artificial"
[199] "intelligence" "making" "impact" "across" "various" "sectors"
[205] "last" "years" "big" "data" "machine" "learning"
[211] "development" "deep" "learning" "brought" "revolution" "artificial"
[217] "intelligence" "today" "devices" "store" "everyday" "data"
[223] "generate" "huge" "data" "sets" "machine" "learning"
[229] "deep" "learningâ" "algorithms" "analyses" "find" "trends"
[235] "make" "predictions" "similarly" "almost" "every" "single"
[241] "industry" "finance" "fashion" "pharmaceutical" "automobile" "stormed"
[247] "ongoing" "technological" "revolution" "recent" "evolutions" "technologies"
[253] "deep" "learning" "machine" "learning" "others" "leading"
[259] "us" "towards" "th" "industrial" "age" "systems"
[265] "powerful" "enough" "cut" "human" "efforts" "different"
[271] "areas" "execute" "various" "activities" "industry" "many"
[277] "using" "artificial" "intelligence" "create" "machine" "algorithms"
[283] "perform" "various" "tasks" "daily" "important" "role"
[289] "artificial" "intelligence" "play" "today" "let" "see"
[295] "role" "one" "sectors" "one" "one" "artificial"
[301] "intelligence" "role" "sector" "probably" "biggest" "sector"
[307] "influencing" "right" "now" "every" "company" "either"
[313] "building" "kind" "companies" "big" "enough" "directly"
[319] "buy" "small" "startups" "google" "apple" "microsoft"
[325] "huawei" "big" "players" "already" "act" "may"
[331] "ask" "well" "answer" "everywhere" "around" "android"
[337] "phone" "using" "uses" "google" "assistant" "application"
[343] "everything" "ask" "like" "making" "call" "playing"
[349] "favorite" "songs" "alexa" "new" "trend" "days"
[355] "offers" "similar" "role" "google" "assistant" "microsoft"
[361] "now" "chatbot" "skype" "called" "ruuh" "chatbot"
[367] "can" "talk" "hours" "won" "bore" "just"
[373] "talks" "like" "actual" "human" "can" "also"
[379] "put" "chatbots" "hike" "messenger" "facebook" "messenger"
[385] "google" "allows" "messenger" "running" "amazing" "thing"
[391] "people" "don" "know" "something" "google" "already"
[397] "shown" "back" "yearly" "google" "meet" "thing"
[403] "step" "google" "assistant" "can" "talk" "someone"
[409] "absence" "without" "even" "knowing" "actually" "time"
[415] "passes" "keeps" "learning" "speak" "stop" "sentence"
[421] "put" "emphasis" "words" "knows" "showed" "person"
[427] "told" "assistant" "book" "dinner" "table" "hotel"
[433] "google" "assistant" "directly" "called" "hotel" "online"
[439] "search" "talked" "person" "almost" "unbelievable" "human"
[445] "tone" "booked" "table" "without" "person" "side"
[451] "phone" "without" "realizing" "just" "talked" "machine"
[457] "cool" "can" "believe" "big" "time" "fan"
[463] "tony" "stark" "jarvis" "now" "time" "need"
[469] "get" "eager" "enthusiastic" "sci" "fi" "anymore"
[475] "quite" "logical" "now" "already" "day" "day"
[481] "lives" "somehow" "making" "lives" "easier" "whenever"
[487] "hear" "term" "artificial" "intelligence" "sector" "might"
[493] "get" "image" "civilization" "hem" "robots" "unlike"
[499] "see" "sci" "fi" "movies" "machine" "intelligence"
[505] "practical" "sensible" "everything" "surroundings" "changing" "like"
[511] "systems" "software" "put" "together" "users" "can"
[517] "connect" "can" "programmed" "now" "part" "new"
[523] "generation" "machines" "started" "learns" "user" "behavior"
[529] "predict" "recognize" "next" "move" "taking" "rapid"
[535] "rate" "reasons" "everything" "connected" "now" "living"
[541] "time" "everything" "connected" "thanks" "internet" "cloud"
[547] "computing" "everything" "now" "just" "one" "click"
[553] "away" "us" "aiâ" "can" "play" "superior"
[559] "role" "making" "workload" "less" "forms" "communication"
[565] "already" "knows" "want" "instead" "typing" "something"
[571] "example" "google" "feed" "computer" "youtube" "recommendations"
[577] "bar" "know" "want" "search" "see" "learning"
[583] "history" "usage" "scenarios" "computing" "becoming" "pretty"
[589] "cheaper" "time" "computers" "costly" "now" "smartphones"
[595] "future" "computing" "amazing" "requirement" "computer" "started"
[601] "need" "calculate" "something" "now" "way" "bigger"
[607] "basically" "world" "hands" "form" "devices" "data"
[613] "super" "valuable" "every" "company" "wants" "data"
[619] "now" "might" "read" "somewhere" "regarding" "latest"
[625] "facebook" "data" "leak" "thing" "facebook" "stealing"
[631] "data" "might" "think" "company" "like" "facebook"
[637] "needs" "data" "person" "like" "well" "data"
[643] "will" "used" "train" "feed" "artificial" "intelligence"
[649] "companies" "basically" "trainer" "companies" "don" "even"
[655] "know" "data" "knows" "easier" "giant" "leader"
[661] "like" "facebook" "influence" "maybe" "ads" "political"
[667] "thought" "may" "used" "win" "elections" "latter"
[673] "case" "seen" "us" "general" "elections" "ever"
[679] "noticed" "searching" "online" "product" "site" "see"
[685] "product" "advertisement" "whole" "day" "websites" "visit"
[691] "happen" "well" "think" "now" "know" "just"
[697] "days" "back" "facebook" "announced" "will" "pay"
[703] "data" "must" "think" "oh" "yeahâ" "iâ"
[709] "getting" "paid" "personal" "information" "will" "help"
[715] "facebook" "beat" "race" "data" "future" "will"
[721] "become" "essential" "money" "data" "power" "can"
[727] "influence" "millions" "people" "just" "click" "sneaky"
[733] "isn" "use" "artificial" "intelligence" "sector" "can"
[739] "now" "extend" "enhance" "human" "capacity" "solve"
[745] "real" "problems" "affect" "education" "health" "politics"
[751] "poverty" "lens" "can" "take" "new" "look"
[757] "solving" "problem" "companies" "thinking" "future" "know"
[763] "somewhat" "key" "solve" "problems" "future" "manpower"
[769] "won" "important" "will" "kick" "full" "force"
[775] "like" "tesla" "uber" "giant" "companies" "developing"
[781] "self" "driven" "cars" "changes" "future" "may"
[787] "require" "cabbie" "drive" "car" "believe" "enhances"
[793] "human" "society" "adversely" "affect" "artificial" "intelligence"
[799] "role" "automobile" "industry" "fascinating" "might" "astonishing"
[805] "many" "automobile" "sector" "sectors" "right" "now"
[811] "using" "foremost" "modern" "cutting" "edge" "develop"
[817] "better" "cars" "automobiles" "might" "astonish" "many"
[823] "automotive" "industry" "working" "well" "self" "driving"
[829] "cars" "affirmative" "likely" "guessed" "already" "answer"
[835] "audi" "tesla" "toyota" "bmw" "renault" "volkswagen"
[841] "porsche" "mazda" "automobile" "company" "functioning" "self"
[847] "driving" "cars" "using" "implementing" "vehicles" "even"
[853] "now" "reading" "blog" "probably" "probability" "important"
[859] "player" "tesla" "corporation" "established" "one" "among"
[865] "brightest" "minds" "st" "century" "elon" "musk"
[871] "corporate" "committed" "clean" "energy" "cars" "required"
[877] "drivers" "created" "autopilot" "doable" "highways" "regular"
[883] "roads" "advance" "platform" "known" "tesla" "autopilot"
[889] "will" "believe" "often" "probability" "foremost" "advanced"
[895] "form" "known" "world" "without" "delay" "tesla"
[901] "automotive" "method" "put" "wherever" "sit" "car"
[907] "set" "destination" "will" "drive" "way" "without"
[913] "issues" "oh" "top" "sensors" "automotive" "alongside"
[919] "advanced" "knows" "accidents" "area" "unit" "aiming"
[925] "happen" "will" "believe" "see" "thing" "capable"
[931] "predicting" "accidents" "road" "secs" "takes" "place"
[937] "also" "someone" "coming" "crash" "will" "automatically"
[943] "change" "path" "save" "accident" "yes" "folks"
[949] "often" "hollywood" "sci" "fi" "want" "go"
[955] "additional" "detail" "system" "runs" "tesla" "shown"
[961] "recent" "demonstration" "show" "whole" "soc" "system"
[967] "chip" "runs" "cars" "powers" "one" "thought"
[973] "cars" "will" "able" "take" "decisions" "make"
[979] "travel" "safer" "fewer" "nerve" "wracking" "tesla"
[985] "launched" "cars" "now" "support" "tesla" "autopilot"
[991] "feature" "guess" "people" "already" "buying" "cars"
[997] "almost" "united" "state" "europe"
[ reached getOption("max.print") -- omitted 1106 entries ]
word_freq <- table(clean_text6)
head(word_freq)
clean_text6
          ability     able abnormal  absence accident
       1        1        2        2        1        1
cbind() combines its arguments by column. Here it pairs each word in word_freq with its count. Because the result is a character matrix, the counts are stored as strings.
word_freq1 <- cbind(names(word_freq), as.integer(word_freq))
head(word_freq1)
[,1] [,2]
[1,] "" "1"
[2,] "ability" "1"
[3,] "able" "2"
[4,] "abnormal" "2"
[5,] "absence" "1"
[6,] "accident" "1"
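If the counts are needed as numbers, for example for sorting or filtering, a data frame is one alternative to cbind(). A minimal sketch on made-up tokens, with column names (word, count) chosen only for this example:

```r
# Made-up tokens standing in for the cleaned article text.
tokens <- c("data", "ai", "data", "human", "ai", "data")
freq   <- table(tokens)

# cbind() on a character vector and an integer vector yields a character matrix.
as_matrix <- cbind(names(freq), as.integer(freq))

# A data frame keeps the words as character and the counts as integers.
freq_df <- data.frame(word = names(freq), count = as.integer(freq),
                      stringsAsFactors = FALSE)
print(freq_df)
```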
Generate the Word cloud
library(RColorBrewer)
library(wordcloud)
class(clean_text6)
[1] "list"
word_cloud <- unlist(clean_text6)
wordcloud(word_cloud)
transformation drops documents
wordcloud(word_cloud,min.freq = 5 , random.order = FALSE, scale=c(3, 0.5))
transformation drops documents
wordcloud(word_cloud, min.freq = 3, max.words = 1000, random.order = FALSE, rot.per = 0.2, colors = brewer.pal(5, "Dark2"), scale = c(4, 0.2))
transformation drops documents
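The "transformation drops documents" message appears to come from the tm-based preprocessing that wordcloud() runs when it is given raw text. Since the words have already been cleaned and counted, one option is to pass the words and their frequencies explicitly through the words and freq arguments, which should bypass that step. A minimal sketch on made-up tokens:

```r
# Made-up tokens standing in for unlist(clean_text6).
tokens <- c("data", "ai", "data", "human", "ai", "data", "future")
freq   <- sort(table(tokens), decreasing = TRUE)

words  <- names(freq)
counts <- as.integer(freq)

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = words, freq = counts, min.freq = 1,
                       random.order = FALSE, scale = c(3, 0.5))
}
```

With the real data, names(word_freq) and as.integer(word_freq) could be supplied in the same way.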
library(wordcloud2)
wordcloud2(word_freq)
wordcloud2(word_freq, color = "random-light", backgroundColor = "white")
wordcloud2(word_freq, color = "random-dark", backgroundColor = "white",size = 0.5, shape = "triangle")
The word cloud above shows that “will”, “artificial”, “data”, “human”, and “intelligence” are the five most frequent words in the “Artificial Intelligence” article.
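A reading taken from the picture can be cross-checked numerically by sorting the frequency table. The sketch below uses made-up tokens and shows the top two words only:

```r
# Made-up tokens standing in for unlist(clean_text6).
tokens <- c("will", "data", "human", "will", "data", "will", "future")

word_counts <- table(tokens)
top_words   <- head(sort(word_counts, decreasing = TRUE), 2)
print(top_words)
```

On the real data, the same call on word_freq with n = 5 would list the five most frequent words, after dropping the empty-string entry.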