This paper aims to model sentiment towards Sriwijaya Air, one of the oldest Indonesian airlines. Sriwijaya Air was founded in 2003 by Chandra Lie, Hendry Lie, Andi Halim and Fandy Lingga. It grew rapidly in its first year, and today it is categorized as a Medium Service Airline.
One reason for running a sentiment analysis on this airline is that on 9 January 2021, Sriwijaya Air flight SJ-182 (a Boeing 737 Classic) went missing about 4 minutes after takeoff. Tracking data showed the plane at an altitude of 250 ft when contact was lost, and the pilot had not declared any sort of emergency. I therefore want to know more about how the airline performed in the past.
Writer: Laura Florencia (4309985)
Data Science and Business Analytics
University of Warsaw
#STEP 1, load the libraries
library("tm")
## Loading required package: NLP
library("SnowballC")
library("wordcloud")
## Loading required package: RColorBrewer
library("RColorBrewer")
library("stringr")
#Set working directory
setwd("D:/0. DSBA - Warsaw Uni/8 Unsupervised Learning/Paper/MarketBasket/SA SriwijayaAir")
docs<-readLines("sriwijayaNet.csv")
After loading the libraries and setting the working directory, we load the data as a corpus. A corpus is a collection of documents containing natural-language text; the tm package provides the infrastructure for representing and computing on corpora.
# Load the data as a corpus
docs <- Corpus(VectorSource(docs))
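As a small illustration of what readLines hands to the corpus, the sketch below writes three made-up reviews to a temporary file and reads them back. The file contents and the names `tmp` and `lines_in` are assumptions for this example and are not part of the original data.

```r
# Toy stand-in for sriwijayaNet.csv (contents are invented for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Flight was late again",
             "Cheap tickets and friendly staff",
             "Waited three hours at the gate"), tmp)

# readLines returns one character element per line of the file
lines_in <- readLines(tmp)
length(lines_in)

# With tm loaded, the same vector would become a corpus of three documents:
# docs <- Corpus(VectorSource(lines_in))
unlink(tmp)
```

Each line of the file becomes one document. If the file has a header line, readLines reads it as a document too, so it may be worth dropping the first element in that case.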
In this part we clean the data. The steps are:
> replace unnecessary symbols with spaces
> convert the text to lower case so that the same word is always read as the same characters
> remove punctuation
> remove numbers
> load extra stopwords
> remove those stopwords from the corpus
> remove our own domain-specific stopwords
> strip extra white space
> remove URLs
> replace words that are misspelled in the dataset
Which of these cleaning steps to apply is our own choice; not every dataset needs all of them.
We leave inspect(docs) commented out below because its output is about 400 lines long. It prints the original dataset, which should be inspected before classification.
#Inspect the content of the document
#inspect(docs)
#Replacing "/", "@" and "|" with space:
toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "/"): transformation drops
## documents
docs <- tm_map(docs, toSpace, "@")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "@"): transformation drops
## documents
docs <- tm_map(docs, toSpace, "\\|")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "\\|"): transformation drops
## documents
#Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
#Remove punctuation
docs <- tm_map(docs, toSpace, "[[:punct:]]")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "[[:punct:]]"): transformation
## drops documents
#Remove numbers
docs <- tm_map(docs, toSpace, "[[:digit:]]")
## Warning in tm_map.SimpleCorpus(docs, toSpace, "[[:digit:]]"): transformation
## drops documents
#Load the extra stop words from a file
myStopwords = readLines("stopword_en.csv")
#Remove stopwords from corpus
docs <- tm_map(docs, removeWords, myStopwords)
## Warning in tm_map.SimpleCorpus(docs, removeWords, myStopwords): transformation
## drops documents
#Remove our own stop words, given as a character vector
docs <- tm_map(docs, removeWords, c("flight","you","air","sriwijaya","airline","reviewed"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, c("flight", "you", "air", :
## transformation drops documents
#Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
## Warning in tm_map.SimpleCorpus(docs, stripWhitespace): transformation drops
## documents
#Remove URL
removeURL <- function(x) gsub("http[[:alnum:]]*", " ", x)
docs <- tm_map(docs, removeURL)
## Warning in tm_map.SimpleCorpus(docs, removeURL): transformation drops documents
#Replace misspelled words (the text is already lower case, so the pattern must be lower case too)
docs <- tm_map(docs, gsub, pattern="howver", replacement="however")
## Warning in tm_map.SimpleCorpus(docs, gsub, pattern = "howver", replacement =
## "however"): transformation drops documents
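The cleaning steps above can also be sketched in base R on a plain character vector, which makes it easy to check each step on a couple of sentences. This is a minimal sketch and not the tm pipeline itself: `clean_text` is a hypothetical helper, and the two input sentences and the stopword list are invented for the example.

```r
# Hypothetical helper: applies the cleaning steps to a plain character vector
clean_text <- function(x, stopwords = character(0)) {
  x <- gsub("http[^[:space:]]*", " ", x)   # drop URLs while they are still intact
  x <- gsub("[/@|]", " ", x)               # replace "/", "@" and "|" with spaces
  x <- tolower(x)                          # lower-case everything
  x <- gsub("[[:punct:]]", " ", x)         # remove punctuation
  x <- gsub("[[:digit:]]", " ", x)         # remove numbers
  for (w in stopwords) {                   # remove whole-word stopwords
    x <- gsub(paste0("\\b", w, "\\b"), " ", x)
  }
  x <- gsub("[[:space:]]+", " ", x)        # collapse runs of white space
  trimws(x)
}

raw <- c("Howver, the FLIGHT was 2 hours late! See http://example.com/review",
         "Cheap/good service @ Jakarta")
cleaned <- clean_text(raw, stopwords = c("the", "was", "flight"))
print(cleaned)

# Typo fixes have to target the lower-cased spelling
fixed <- gsub("howver", "however", cleaned, fixed = TRUE)
print(fixed)
```

Two ordering choices in this sketch differ from the code above. URLs are removed first, because once "/" and punctuation have been replaced with spaces only the leading "http" is left for the URL pattern to match. The typo replacement is written in lower case, because it runs after the text has been lower-cased.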
After all the data cleaning, we build the document-term matrix (DTM). A DTM is a matrix that describes the frequency of the terms that occur in a collection of documents. The code below uses TermDocumentMatrix, which is the transposed form with terms as rows and documents as columns.
The goal is to represent each document by the frequency of the semantically significant terms in our original dataset.
#Build a term-document matrix (dtm)
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 15)
## word freq
## time time 83
## service service 36
## late late 34
## check check 30
## airlines airlines 30
## budget budget 29
## flights flights 28
## cheap cheap 28
## fly fly 28
## june june 28
## july july 27
## hours hours 26
## april april 26
## august august 26
## plane plane 25
We gather all terms from the DTM and list the most common words in the whole dataset. This shows how the words are distributed before we classify the texts as Negative, Positive or Neutral.
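To make the counting concrete, here is a minimal base-R sketch of the same idea on three invented documents. It does not use tm; the names `toy_docs`, `freq` and `toy_d` are assumptions for this example.

```r
# Three invented, already-cleaned documents
toy_docs <- c("late late service", "cheap service", "late plane")

# Split each document into words and count every word over the whole collection
words <- unlist(strsplit(toy_docs, " ", fixed = TRUE))
freq  <- sort(table(words), decreasing = TRUE)

# Same shape as the data frame d above: one row per word with its total count
toy_d <- data.frame(word = names(freq), freq = as.vector(freq))
print(toy_d)
```

One difference to keep in mind: by default TermDocumentMatrix in tm only keeps words of at least three characters, whereas this sketch counts every token.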
#Generate the Word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words=50, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
Next, we write the cleaned texts to a new CSV file and save the workspace.
dataframe<-data.frame(text=unlist(sapply(docs, `[`)), stringsAsFactors=F)
write.csv(dataframe, "D:/0. DSBA - Warsaw Uni/8 Unsupervised Learning/Paper/MarketBasket/SA SriwijayaAir/sriwijayaAir1.csv")
save.image()
With the cleaning done, we now classify the sentiment of each document in the new file sriwijayaAir1.csv.
The scoring is lexicon-based: each text receives a score that marks it as negative or positive. The lists of positive and negative words used by the model are read from the two txt files, and we extend each list with one extra entry.
setwd("D:/0. DSBA - Warsaw Uni/8 Unsupervised Learning/Paper/MarketBasket/SA SriwijayaAir")
allwords<-read.csv("sriwijayaAir1.csv", header=TRUE)
#scoring, whether positive or negative
pos <- scan("D:/0. DSBA - Warsaw Uni/8 Unsupervised Learning/Paper/MarketBasket/SA SriwijayaAir/positive-words.txt",what="character", comment.char=";")
neg <- scan("D:/0. DSBA - Warsaw Uni/8 Unsupervised Learning/Paper/MarketBasket/SA SriwijayaAir/negative-words.txt", what="character", comment.char=";")
poswords = c(pos, "is near to")
negwords = c(neg, "cant")
score.sentiment = function(allwords, poswords, negwords, .progress='none')
{
  require(plyr)
  require(stringr)
  # score each text: number of positive words minus number of negative words
  scores = laply(allwords, function(subjectWord, poswords, negwords) {
    # strip punctuation, control characters and digits, then lower-case
    subjectWord = gsub('[[:punct:]]', '', subjectWord)
    subjectWord = gsub('[[:cntrl:]]', '', subjectWord)
    subjectWord = gsub('\\d+', '', subjectWord)
    subjectWord = tolower(subjectWord)
    # split the text into single words
    wordListSA = str_split(subjectWord, '\\s+')
    letterSA = unlist(wordListSA)
    # position of each word in the positive / negative list (NA if absent)
    pos.matches = match(letterSA, poswords)
    neg.matches = match(letterSA, negwords)
    # TRUE where the word was found in the list
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    score = sum(pos.matches) - (sum(neg.matches))
    return(score)
  }, poswords, negwords, .progress=.progress )
  scores.df = data.frame(score=scores, text=allwords)
  return(scores.df)
}
result = score.sentiment(allwords$text, poswords, negwords)
## Loading required package: plyr
View(result)
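The scoring rule used by score.sentiment can be tried on a few sentences without plyr or the lexicon files. The sketch below is a simplified base-R version under stated assumptions: `score_one` is a hypothetical helper, and the word lists and texts are invented for the example.

```r
# Invented mini-lexicons; the real lists come from the two txt files
toy_pos <- c("good", "friendly", "cheap")
toy_neg <- c("late", "bad", "rude")

# Score one text: number of positive words minus number of negative words
score_one <- function(text, pos, neg) {
  words <- unlist(strsplit(tolower(text), "[[:space:]]+"))
  sum(words %in% pos) - sum(words %in% neg)
}

toy_texts  <- c("good cheap ticket", "late and rude staff", "plane to jakarta")
toy_scores <- vapply(toy_texts, score_one, numeric(1),
                     pos = toy_pos, neg = toy_neg, USE.NAMES = FALSE)
print(toy_scores)

# A multi-word lexicon entry is never matched, because texts are split into single words
phrase_score <- score_one("is near to", pos = "is near to", neg = character(0))
print(phrase_score)
```

Because matching is done word by word, a phrase such as "is near to" in the positive list does not contribute to any score; only single-word entries can match.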
We classify each result as Negative, Neutral or Positive according to its score. We then reorder the columns so that the classification comes first; this is the order that the later document-term matrix steps work from. Finally, we write the latest version of the dataset to a new CSV file.
#CONVERT SCORE TO SENTIMENT
result$classification<- ifelse(result$score<0, "Negative", ifelse(result$score==0,"Neutral","Positive"))
result$classification
## [1] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [8] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [15] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [22] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [29] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [36] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [43] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [50] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [57] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [64] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [71] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [78] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [85] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [92] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [99] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [106] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [113] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [120] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [127] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [134] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [141] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [148] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [155] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [162] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [169] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [176] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [183] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [190] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [197] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [204] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [211] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [218] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [225] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [232] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [239] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [246] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [253] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [260] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [267] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [274] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [281] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [288] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [295] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [302] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [309] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [316] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [323] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [330] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [337] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [344] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [351] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [358] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [365] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [372] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [379] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [386] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [393] "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral" "Neutral"
## [400] "Neutral" "Neutral" "Neutral" "Neutral"
View(result)
#REORDER COLUMNS: classification, score, text
data <- result[c(3,1,2)]
View(data)
write.csv(data, file = "sriwijayaAir2.csv")
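The classification rule is a simple threshold on the score. As a minimal sketch on invented scores (the names `toy_score`, `toy_class` and `class_counts` are assumptions for this example), it can be checked like this:

```r
# Invented scores standing in for result$score
toy_score <- c(-2, 0, 3, 0, -1)

# Negative below zero, Neutral at zero, Positive above zero
toy_class <- ifelse(toy_score < 0, "Negative",
                    ifelse(toy_score == 0, "Neutral", "Positive"))

# A count per category is easier to read than the full vector
class_counts <- table(toy_class)
print(class_counts)
```

On the real data, table(result$classification) would give the same kind of summary in one line. If every row falls into a single category, as in the printout above, the table makes that visible at a glance and suggests checking the inputs to the scoring step.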
We split the data in sriwijayaAir2 into separate Negative, Neutral and Positive files, with the help of the stopwords. We then repeat the steps from the second step, this time for each category separately.
For the Negative category, we apply a second level of cleaning to sriwijayaNegative.csv.
The most frequent negative words come from people criticizing delays (late) and the number of hours they had to wait. Some of them also use ‘bad’ or ‘worst’ in their comments.
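The code that produced the per-category files is not shown. One possible way to do the split, sketched here on invented rows, is to use split() on the classification column and write one file per category. The data frame, the `toy` file prefix and the temporary output folder are all assumptions for this example.

```r
# Invented rows in the same column layout as sriwijayaAir2.csv
toy_data <- data.frame(
  classification = c("Negative", "Positive", "Negative", "Neutral"),
  score = c(-1, 2, -3, 0),
  text = c("late again", "good cheap", "bad rude staff", "plane to bali"),
  stringsAsFactors = FALSE
)

# One data frame per category
by_class <- split(toy_data, toy_data$classification)

# Write each category to its own file in a temporary folder
out_dir <- tempfile("sentiment_")
dir.create(out_dir)
for (cls in names(by_class)) {
  write.csv(by_class[[cls]],
            file.path(out_dir, paste0("toy", cls, ".csv")),
            row.names = FALSE)
}
written <- list.files(out_dir)
print(written)
```

Writing with row.names = FALSE keeps the row index out of the file, so the text column is not preceded by an extra numeric column when the file is read back.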
docs<-readLines("sriwijayaNegative.csv")
# Load the data as a corpus
docs <- Corpus(VectorSource(docs))
# Remove your own stop word
# specify your stopwords as a character vector
docs <- tm_map(docs, removeWords, c("airlines"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, c("airlines")): transformation
## drops documents
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
## Warning in tm_map.SimpleCorpus(docs, stripWhitespace): transformation drops
## documents
#Replace words
docs <- tm_map(docs, gsub, pattern="delayed", replacement="delay")
## Warning in tm_map.SimpleCorpus(docs, gsub, pattern = "delayed", replacement =
## "delay"): transformation drops documents
Here we compute the DTM again and get a new result restricted to the negative sentiments. After that we build the word cloud and find the word associations within this subset. The same steps are then repeated for the Positive category.
We can see that the head() result differs from the previous one: it is concentrated on the words closest to negativity.
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 15)
## word freq
## delay delay 336
## time time 170
## hours hours 115
## cheap cheap 70
## check check 70
## bad bad 63
## worst worst 61
## flights flights 60
## service service 59
## jakarta jakarta 55
## late late 54
## hour hour 54
## staff staff 53
## minutes minutes 44
## delays delays 41
#Generate the Word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
max.words=50, random.order=FALSE, rot.per=0.35,
colors=brewer.pal(8, "Dark2"))
#Explore frequent terms and their associations
findFreqTerms(dtm, lowfreq = 4)
## [1] "delay" "excuse" "finally" "flew"
## [5] "half" "hours" "rude" "staff"
## [9] "working" "worst" "ago" "days"
## [13] "frequent" "booked" "called" "change"
## [17] "customer" "due" "earlier" "indonesia"
## [21] "leg" "number" "person" "recently"
## [25] "room" "seats" "service" "terrible"
## [29] "time" "uncomfortable" "dont" "money"
## [33] "waste" "delays" "flights" "gate"
## [37] "group" "month" "multiple" "travelling"
## [41] "wrong" "airport" "bandung" "schedule"
## [45] "trip" "wanted" "bali" "case"
## [49] "horrible" "put" "services" "supposed"
## [53] "yogyakarta" "week" "communication" "connecting"
## [57] "jakarta" "kind" "late" "missed"
## [61] "sat" "boarding" "june" "long"
## [65] "passenger" "tired" "wait" "waiting"
## [69] "wanna" "years" "weeks" "book"
## [73] "changed" "hour" "leave" "left"
## [77] "allowed" "arrived" "min" "reason"
## [81] "run" "told" "aircraft" "cgk"
## [85] "fly" "kul" "mins" "taking"
## [89] "avoid" "don" "times" "bad"
## [93] "experience" "bus" "cheap" "day"
## [97] "passengers" "people" "plane" "singapore"
## [101] "terminal" "wasn" "cancelled" "find"
## [105] "lombok" "medan" "minutes" "moved"
## [109] "original" "transit" "business" "ended"
## [113] "morning" "back" "coming" "destination"
## [117] "previous" "problems" "asked" "check"
## [121] "leaving" "attendant" "budget" "buy"
## [125] "english" "poor" "space" "water"
## [129] "announcement" "didn<U+0092>" "gave" "good"
## [133] "hrs" "info" "looked" "negative"
## [137] "review" "stated" "thing" "absolutely"
## [141] "flying" "frills" "legroom" "cramped"
## [145] "july" "announced" "departure" "night"
## [149] "traveled" "experienced" "return" "bags"
## [153] "make" "lost" "checked" "kinda"
## [157] "received" "sms" "end" "low"
## [161] "planes" "email" "phone" "arrival"
## [165] "baggage" "international" "made" "result"
## [169] "surabaya" "taxi" "small" "things"
## [173] "food" "date" "option" "performance"
## [177] "problem" "solo" "board" "cabin"
## [181] "inside" "management" "queue" "sin"
## [185] "care" "domestic" "don<U+0092>" "early"
## [189] "lie" "average" "beginning" "heard"
## [193] "inform" "minute" "notice" "ticket"
## [197] "booking" "didn" "give" "landing"
## [201] "website" "extremely" "checking" "found"
## [205] "normal" "route" "asia" "big"
## [209] "company" "explanation" "makassar" "routes"
## [213] "served" "home" "price" "today"
## [217] "issue" "worse" "attendants" "luckily"
## [221] "save" "short" "depart" "hotel"
## [225] "island" "year" "call" "card"
## [229] "happen" "news" "payment" "compensation"
## [233] "information" "arrive" "batam" "chaotic"
## [237] "flown" "seat" "indonesian" "scheduled"
## [241] "choice" "sriwijaya" "airplane" "choose"
## [245] "kuala" "march" "friend" "luggage"
## [249] "slow" "waited" "ground" "comfortable"
## [253] "travel" "completely" "unreliable" "pay"
## [257] "counter" "doesn" "forced" "job"
## [261] "past" "full" "landed" "chaos"
## [265] "recommend" "customers" "lack" "risk"
## [269] "great" "online" "april" "expect"
## [273] "happened" "screens" "drink" "awful"
## [277] "broken" "open" "complaints" "extra"
## [281] "row" "expensive" "manado" "cost"
## [285] "thai" "issues" "reviews" "fare"
## [289] "manage" "operational" "clean" "dirty"
## [293] "bit" "boeing" "front" "lot"
## [297] "scared" "february" "connection" "crew"
## [301] "safety" "thought" "miss" "january"
## [305] "bag" "worth" "surprise" "desk"
## [309] "december" "crews" "denpasar" "direct"
## [313] "felt" "place" "weather" "hope"
## [317] "attitude" "local" "offer" "nice"
## [321] "usual" "informed" "turned" "wings"
## [325] "november" "situation" "nightmare" "october"
## [329] "lucky" "line" "september" "august"
## [333] "ontime" "free" "hatta" "soekarno"
## [337] "world" "handling" "staffs" "process"
## [341] "boarded" "complain" "reputation" "allowance"
## [345] "maintenance" "bangkok" "drop"
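The word associations computed in the next step are correlations between the count vectors of two terms across all documents, as far as the tm documentation describes findAssocs. The sketch below reproduces that idea in base R on an invented term-document matrix; `toy_tdm`, `assoc_delay` and `kept` are names assumed for this example.

```r
# Toy term-document matrix: rows are terms, columns are documents (invented counts)
toy_tdm <- rbind(
  delay = c(2, 0, 1, 0),
  hours = c(1, 0, 1, 0),
  cheap = c(0, 1, 0, 2)
)
colnames(toy_tdm) <- paste0("doc", 1:4)

# Correlation of "delay" with every other term across the documents
assoc_delay <- apply(toy_tdm[rownames(toy_tdm) != "delay", , drop = FALSE], 1,
                     function(term_counts) cor(toy_tdm["delay", ], term_counts))

# Keep only associations at or above a lower limit, like corlimit
kept <- assoc_delay[assoc_delay >= 0.10]
print(round(kept, 2))
```

Terms that tend to appear in the same documents as "delay" get a high correlation, while terms that appear mostly in other documents get a negative one and fall below the limit.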
#Word associations (a single corlimit is recycled for all terms)
negAssoc<-as.list(findAssocs(dtm, terms =c("lost","delay","time","hours","cheap","check","uncomfortable","unfriendly","bad","worst","flights","service"),
corlimit = 0.10))
negAssoc
## $lost
## send absentee doomed gap popular responsible
## 0.33 0.29 0.29 0.29 0.29 0.29
## bern helpfull hire passanger plastic prevent
## 0.29 0.29 0.29 0.29 0.29 0.29
## stollen tagging thief wraping padlock shame
## 0.29 0.29 0.29 0.29 0.29 0.29
## career feel malindo mark typical assignment
## 0.29 0.29 0.29 0.29 0.29 0.29
## insurance perth prepaid shoulders shrugged squeezed
## 0.29 0.29 0.29 0.29 0.29 0.29
## believing info lot cost don<U+0092> owner
## 0.29 0.26 0.25 0.24 0.21 0.20
## professional prepare work mine ife show
## 0.20 0.20 0.20 0.20 0.20 0.20
## stuff idr switched answer rest bags
## 0.20 0.20 0.20 0.20 0.20 0.19
## found destination money missed understand lumpur
## 0.18 0.17 0.16 0.16 0.16 0.16
## thirty tickets happening crap meal staff
## 0.16 0.16 0.16 0.16 0.16 0.15
## november waste case connecting wanna didn<U+0092>
## 0.15 0.14 0.14 0.14 0.14 0.14
## asia served kuala screens drink connection
## 0.14 0.14 0.14 0.14 0.14 0.14
## ground rude announcement luggage pay great
## 0.13 0.12 0.12 0.12 0.12 0.12
## person attitude
## 0.11 0.11
##
## $delay
## hour important itinerary msia initially wit
## 0.27 0.23 0.23 0.23 0.23 0.23
## kul meeting screen jakarta additionally affected
## 0.21 0.21 0.19 0.18 0.18 0.18
## collect led announce assumed hoursi improve
## 0.18 0.18 0.18 0.18 0.18 0.18
## schedules sunday half minutes business didn<U+0092>
## 0.18 0.18 0.17 0.17 0.17 0.17
## traveled garuda week min flyer matter
## 0.17 0.16 0.16 0.15 0.14 0.14
## urgent heavy updated supposed terminal original
## 0.14 0.14 0.14 0.13 0.13 0.13
## announcement experienced recommended <U+0092>ve compensation delivery
## 0.13 0.13 0.13 0.13 0.13 0.13
## driver stands annountment guest informaion trang
## 0.13 0.13 0.13 0.13 0.13 0.13
## amsterdam pressure verbal fantastic guinness record
## 0.13 0.13 0.13 0.13 0.13 0.13
## records cuman dan kerjanya upgrade carrier
## 0.13 0.13 0.13 0.13 0.13 0.13
## precision stereotype uncommon due tired transit
## 0.13 0.13 0.13 0.12 0.12 0.12
## previous result surabaya routes boarding waiting
## 0.12 0.12 0.12 0.12 0.11 0.11
## arrive airplane choose traffic transfer suggest
## 0.11 0.11 0.11 0.10 0.10 0.10
## luck notorious
## 0.10 0.10
##
## $time
## afraid depart announce announced <U+0092>ve
## 0.26 0.23 0.21 0.20 0.20
## estimated bed departure rescheduled resulting
## 0.18 0.18 0.18 0.18 0.18
## unnecessary generally note ongoing travellers
## 0.18 0.18 0.18 0.18 0.18
## assumed hoursi improve schedules sunday
## 0.18 0.18 0.18 0.18 0.18
## give aircraft traveled landed break
## 0.17 0.16 0.16 0.16 0.16
## pass jogjakarta boarding mins good
## 0.16 0.16 0.15 0.15 0.15
## arrive minutes make management leave
## 0.15 0.14 0.14 0.14 0.13
## times price issue manage lucky
## 0.13 0.13 0.13 0.13 0.13
## earlier hour fly provide screen
## 0.12 0.12 0.12 0.12 0.12
## occasions lumpur balikpapan bit denpasar
## 0.12 0.12 0.12 0.12 0.12
## multiple jakarta book min kul
## 0.11 0.11 0.11 0.11 0.11
## taking middle untill experienced domestic
## 0.11 0.11 0.11 0.11 0.11
## prior versa vice height suitable
## 0.11 0.11 0.11 0.11 0.11
## flied gladly overbooked regularly distracted
## 0.11 0.11 0.11 0.11 0.11
## regret trained happened impressed charges
## 0.11 0.11 0.11 0.11 0.11
## comeback luckly rumours informed passed
## 0.11 0.11 0.11 0.11 0.11
## understood apparent finds holding irresponsible
## 0.11 0.11 0.11 0.11 0.11
## walking frustrating amd buisness timimg
## 0.11 0.11 0.11 0.11 0.11
## departing entertainment precisely traveler arriving
## 0.11 0.11 0.11 0.11 0.11
## fit timer gili taxes fault
## 0.11 0.11 0.11 0.11 0.11
## politely pure refer baby carried
## 0.11 0.11 0.11 0.11 0.11
## exceeded penalty cases concern smoothness
## 0.11 0.11 0.11 0.11 0.11
## door loath passage reinforced aircondition
## 0.11 0.11 0.11 0.11 0.11
## compartment condition leaking respect maximum
## 0.11 0.11 0.11 0.11 0.11
## close increased running someones weird
## 0.11 0.11 0.11 0.11 0.11
## speaker high hopes toodenpasar precision
## 0.11 0.11 0.11 0.11 0.11
## stereotype uncommon solo notice reputation
## 0.11 0.11 0.10 0.10 0.10
##
## $hours
## finally supposed connecting jakarta sat
## 0.28 0.23 0.22 0.22 0.20
## wait communication hour leave runway
## 0.19 0.18 0.18 0.18 0.17
## joke screen unprofessional bali border
## 0.17 0.17 0.17 0.16 0.16
## conditioned failed horribly love pandang
## 0.16 0.16 0.16 0.16 0.16
## ujung arrived kul apologize chronic
## 0.16 0.16 0.16 0.16 0.16
## single supposedly important itinerary msia
## 0.16 0.16 0.16 0.16 0.16
## fell account locals initially wit
## 0.16 0.16 0.16 0.16 0.16
## center ferry guess midnight alright
## 0.16 0.16 0.16 0.16 0.16
## marks watch write city economic
## 0.16 0.16 0.16 0.16 0.16
## minimum minjmum prepare snack error
## 0.16 0.16 0.16 0.16 0.16
## impressed manado granted loved repeat
## 0.16 0.16 0.16 0.16 0.16
## responsibility warn fitting gunungsitoli terrain
## 0.16 0.16 0.16 0.16 0.16
## entiteit het results stopover unexpected
## 0.16 0.16 0.16 0.16 0.16
## amsterdam pressure verbal idr arriving
## 0.16 0.16 0.16 0.16 0.16
## manager custoner desapointed life saturday
## 0.16 0.16 0.16 0.16 0.16
## totaly unacceptable palembang airport missed
## 0.16 0.16 0.16 0.15 0.14
## tired didn<U+0092> direct days person
## 0.14 0.14 0.14 0.13 0.13
## left till coming announced recommended
## 0.13 0.12 0.12 0.12 0.12
## explanation system arrive sitting dont
## 0.12 0.12 0.12 0.12 0.11
## late plane yogyakarta boarding waiting
## 0.11 0.11 0.10 0.10 0.10
## give
## 0.10
##
## $cheap
## price crime overload public
## 0.34 0.34 0.34 0.34
## related passenger hospitality complain
## 0.34 0.25 0.23 0.23
## padang reaaaaly square tend
## 0.22 0.22 0.22 0.22
## sweating tortured worth land
## 0.22 0.22 0.21 0.18
## officer crews attention case
## 0.18 0.18 0.18 0.15
## frills lines breath sauna
## 0.15 0.15 0.15 0.15
## boeing besar aircraft gave
## 0.15 0.15 0.13 0.13
## things recommend lot kind
## 0.13 0.13 0.13 0.12
## narrow fair famous offer
## 0.11 0.11 0.11 0.11
## aisle boarded sumbawa reasonable
## 0.11 0.11 0.11 0.11
## services cars jetstar pick
## 0.10 0.10 0.10 0.10
## pricier scoot airfare regency
## 0.10 0.10 0.10 0.10
## unresponsible travels evemt alright
## 0.10 0.10 0.10 0.10
## charges fitting gunungsitoli terrain
## 0.10 0.10 0.10 0.10
## dying heavily meat monitoring
## 0.10 0.10 0.10 0.10
## nickels succeeded turbulence operate
## 0.10 0.10 0.10 0.10
## volcanos oke termnal airlane
## 0.10 0.10 0.10 0.10
## strategic hot wemakepeoplelate width
## 0.10 0.10 0.10 0.10
## positive add capacity cleaning
## 0.10 0.10 0.10 0.10
## corner crumbs loaded mars
## 0.10 0.10 0.10 0.10
## reading shaky bux sir
## 0.10 0.10 0.10 0.10
## substantial assume efficent pan
## 0.10 0.10 0.10 0.10
## airports compagny destinations forgotten
## 0.10 0.10 0.10 0.10
## mataram organized quickly stop
## 0.10 0.10 0.10 0.10
## amazing knownn dirt jakata
## 0.10 0.10 0.10 0.10
## practices wonderful hehehe goog
## 0.10 0.10 0.10 0.10
## means firstly kgs offers
## 0.10 0.10 0.10 0.10
## purchase recognise upto serves
## 0.10 0.10 0.10 0.10
## prices cold depend profesional
## 0.10 0.10 0.10 0.10
## simply undeniable close increased
## 0.10 0.10 0.10 0.10
## running someones weird includes
## 0.10 0.10 0.10 0.10
## frustating learn series
## 0.10 0.10 0.10
##
## $check
## counter officer web baggage website fews
## 0.38 0.34 0.34 0.30 0.30 0.30
## handle kno knouse persist dollar longest
## 0.30 0.30 0.30 0.30 0.30 0.30
## separation unbelievably aisle slow counters open
## 0.30 0.30 0.28 0.26 0.26 0.25
## allowance drop queue cgk desk requested
## 0.24 0.24 0.23 0.22 0.22 0.22
## online long advance crowded phones front
## 0.21 0.20 0.20 0.20 0.20 0.20
## accomodating eventhough explained lock prefer career
## 0.20 0.20 0.20 0.20 0.20 0.20
## feel malindo mark typical brought noticed
## 0.20 0.20 0.20 0.20 0.20 0.20
## staffs assume efficent pan krabi willingness
## 0.20 0.20 0.20 0.20 0.20 0.20
## doubtful frankly meet queuing distressing paralyzed
## 0.20 0.20 0.20 0.20 0.20 0.20
## played upsetting suitcase crime overload public
## 0.20 0.20 0.20 0.20 0.20 0.20
## related luggage person result date beginning
## 0.20 0.19 0.18 0.18 0.18 0.18
## terminal made mins checked regular seat
## 0.17 0.17 0.16 0.16 0.16 0.16
## impossible thirty limited attention told attitude
## 0.16 0.16 0.16 0.16 0.15 0.15
## problems announcement performance carry hand prior
## 0.14 0.13 0.13 0.13 0.13 0.13
## served happily network forced gates bring
## 0.13 0.13 0.13 0.13 0.13 0.13
## priority ife hospitality hatta soekarno didnt
## 0.13 0.13 0.13 0.13 0.13 0.13
## encounter boarded problem ground bag usual
## 0.13 0.13 0.12 0.12 0.12 0.12
## staff budget bags arrival board found
## 0.11 0.11 0.11 0.11 0.11 0.11
## depart landed thought situation complain departure
## 0.11 0.11 0.11 0.11 0.11 0.10
## cabin price
## 0.10 0.10
##
## $uncomfortable
## border conditioned failed horribly bed
## 0.41 0.41 0.41 0.41 0.41
## rescheduled resulting unnecessary airborne based
## 0.41 0.41 0.41 0.41 0.41
## turn health moment peoples sis
## 0.41 0.41 0.41 0.41 0.41
## finally engine organization middle untill
## 0.29 0.28 0.28 0.28 0.28
## conditioner bring priority sister big
## 0.28 0.28 0.28 0.28 0.25
## number seat common tall managed
## 0.24 0.24 0.23 0.23 0.23
## runway called traveled recently sat
## 0.23 0.20 0.20 0.17 0.17
## website worse person multiple taking
## 0.17 0.17 0.16 0.16 0.16
## earlier announced night seats kind
## 0.14 0.14 0.14 0.13 0.13
## booked leg room communication original
## 0.12 0.12 0.12 0.11 0.11
##
## $unfriendly
## hot wemakepeoplelate width krabi
## 0.71 0.71 0.71 0.71
## willingness advance pitch staffs
## 0.71 0.50 0.50 0.50
## reschedule impossible explanation worth
## 0.41 0.41 0.37 0.35
## wanted crews average extra
## 0.31 0.31 0.22 0.21
## dirty aircraft pay told
## 0.21 0.18 0.16 0.15
## asked counter cabin arrived
## 0.15 0.15 0.13 0.12
## departure
## 0.11
##
## $bad
## fitting gunungsitoli terrain amd
## 0.23 0.23 0.23 0.23
## buisness timimg baby carried
## 0.23 0.23 0.23 0.23
## changi strike taxi home
## 0.23 0.23 0.20 0.20
## part famous experience hatta
## 0.19 0.19 0.18 0.16
## soekarno traffic competitors nias
## 0.16 0.15 0.15 0.15
## jam hear beautiful mood
## 0.15 0.15 0.15 0.15
## employees afraid transit experienced
## 0.15 0.15 0.14 0.13
## choice nightmare reputation day
## 0.13 0.13 0.13 0.12
## couldn continue honestly including
## 0.12 0.12 0.12 0.12
## timing experiences officer aircraft
## 0.12 0.12 0.12 0.11
## eid mubarak situations exploring
## 0.11 0.11 0.11 0.11
## rough news bottom extensive
## 0.11 0.11 0.11 0.11
## otp question rock point
## 0.11 0.11 0.11 0.11
## rinjani counter past reached
## 0.11 0.11 0.11 0.11
## smaller visit banda charge
## 0.11 0.11 0.11 0.11
## reliable satisfied unreasonable spooky
## 0.11 0.11 0.11 0.11
## comeback luckly rumours passed
## 0.11 0.11 0.11 0.11
## understood compared padlock shame
## 0.11 0.11 0.11 0.11
## began complicated promised raising
## 0.11 0.11 0.11 0.11
## aint class doha geneva
## 0.11 0.11 0.11 0.11
## qatar switzerland suck traveler
## 0.11 0.11 0.11 0.11
## hot wemakepeoplelate width positive
## 0.11 0.11 0.11 0.11
## fit timer cuman dan
## 0.11 0.11 0.11 0.11
## kerjanya upgrade chat unlike
## 0.11 0.11 0.11 0.11
## unproffesional attempts ranging stopped
## 0.11 0.11 0.11 0.11
## treatment airasia jakata practices
## 0.11 0.11 0.11 0.11
## wonderful doubtful frankly meet
## 0.11 0.11 0.11 0.11
## queuing costumer custom faraway
## 0.11 0.11 0.11 0.11
## immigration standing respect simply
## 0.11 0.11 0.11 0.11
## undeniable comparable filed fleets
## 0.11 0.11 0.11 0.11
## stewardesses young cry final
## 0.11 0.11 0.11 0.11
## mnts tagline leave inside
## 0.11 0.11 0.10 0.10
##
## $worst
## discriminated filipinos working totally anymore
## 0.37 0.37 0.26 0.25 0.25
## weeks ago plans staff whatsoever
## 0.22 0.21 0.20 0.18 0.16
## thing felt started customer medan
## 0.16 0.16 0.16 0.13 0.13
## indication directed operators love pandang
## 0.12 0.11 0.11 0.11 0.11
## ujung planet estimated chronically airlome
## 0.11 0.11 0.11 0.11 0.11
## absentee doomed arrive fears realised
## 0.11 0.11 0.11 0.11 0.11
## disgraceful sweating tortured pray typing
## 0.11 0.11 0.11 0.11 0.11
## closed learned transformed chat unlike
## 0.11 0.11 0.11 0.11 0.11
## unproffesional haven apathetic collectively equipment
## 0.11 0.11 0.11 0.11 0.11
## delivered egg fried namu penang
## 0.11 0.11 0.11 0.11 0.11
## portion rice taste complete costs
## 0.11 0.11 0.11 0.11 0.11
## meaning stick stranger weekly fan
## 0.11 0.11 0.11 0.11 0.11
## indonesians rarely sabang shouldn weh
## 0.11 0.11 0.11 0.11 0.11
## dollar longest separation unbelievably appropiate
## 0.11 0.11 0.11 0.11 0.11
## putting sept custom faraway immigration
## 0.11 0.11 0.11 0.11 0.11
## standing precision stereotype uncommon appalling
## 0.11 0.11 0.11 0.11 0.11
## compete developed interface widest
## 0.11 0.11 0.11 0.11
##
## $flights
## yogya occasions indonesia internal
## 0.25 0.25 0.23 0.23
## canceled announcements directed operators
## 0.23 0.23 0.22 0.22
## initially wit oke termnal
## 0.22 0.22 0.22 0.22
## appalling compete developed interface
## 0.22 0.22 0.22 0.22
## widest group <U+0092>ve hotel
## 0.22 0.18 0.18 0.18
## attitude wrong miss knowing
## 0.17 0.16 0.16 0.15
## organization informing hrs star
## 0.15 0.15 0.15 0.15
## affected transfer updated nias
## 0.15 0.15 0.15 0.15
## lousy hostess sleep months
## 0.15 0.15 0.15 0.15
## frequently accomodation frill double
## 0.15 0.15 0.15 0.15
## delaying hit issues bajo
## 0.15 0.15 0.14 0.14
## labuan due late arrival
## 0.14 0.13 0.13 0.13
## domestic customers nightmare connecting
## 0.13 0.13 0.13 0.12
## leaving room month multiple
## 0.12 0.11 0.11 0.11
## trip border conditioned failed
## 0.11 0.11 0.11 0.11
## horribly times people thing
## 0.11 0.11 0.11 0.11
## accumulated additionally collect kupang
## 0.11 0.11 0.11 0.11
## led account locals screen
## 0.11 0.11 0.11 0.11
## compensation delivery driver stands
## 0.11 0.11 0.11 0.11
## lounge arrange deal domestically
## 0.11 0.11 0.11 0.11
## loss office onward refuse
## 0.11 0.11 0.11 0.11
## resulted doesn<U+0092> perfectly rebrand
## 0.11 0.11 0.11 0.11
## <U+0091>delay<U+0092> satisfied unreasonable forget
## 0.11 0.11 0.11 0.11
## lay recall levels meetings
## 0.11 0.11 0.11 0.11
## skip stress apparent cancel
## 0.11 0.11 0.11 0.11
## finds holding irresponsible walking
## 0.11 0.11 0.11 0.11
## began complicated promised raising
## 0.11 0.11 0.11 0.11
## connect cuz incompetence paying
## 0.11 0.11 0.11 0.11
## generally note ongoing travellers
## 0.11 0.11 0.11 0.11
## confusing alll hasanuddin horrid
## 0.11 0.11 0.11 0.11
## upg extended hasn indifferent
## 0.11 0.11 0.11 0.11
## preceding luuggage rule weight
## 0.11 0.11 0.11 0.11
## wory chat unlike unproffesional
## 0.11 0.11 0.11 0.11
## cleaning corner crumbs loaded
## 0.11 0.11 0.11 0.11
## mars airplease delayno realize
## 0.11 0.11 0.11 0.11
## assume efficent pan expence
## 0.11 0.11 0.11 0.11
## bandar controls negatives balcklist
## 0.11 0.11 0.11 0.11
## changing imagine attention leading
## 0.11 0.11 0.11 0.11
## reservations appropiate putting sept
## 0.11 0.11 0.11 0.11
## acknowledge caused dinner eating
## 0.11 0.11 0.11 0.11
## unsafe span spent high
## 0.11 0.11 0.11 0.11
## hopes desks fiew muenag
## 0.11 0.11 0.11 0.11
## unexperienced centered eyes fixed
## 0.11 0.11 0.11 0.11
## sins unsatisfying leader
## 0.11 0.11 0.11
##
## $service
## customer term english totally
## 0.40 0.34 0.29 0.25
## negligible pathetic speak chances
## 0.24 0.24 0.24 0.24
## poorest quit responsiveness showing
## 0.24 0.24 0.24 0.24
## slower unpredictable doesn<U+0092> perfectly
## 0.24 0.24 0.24 0.24
## rebrand <U+0091>delay<U+0092> contacted departs
## 0.24 0.24 0.24 0.24
## names passport appalling compete
## 0.24 0.24 0.24 0.24
## developed interface widest cheaper
## 0.24 0.24 0.24 0.20
## attention terrible attitude person
## 0.20 0.19 0.18 0.17
## attendant explain competitors complained
## 0.16 0.16 0.16 0.16
## effort image lousy cheapest
## 0.16 0.16 0.16 0.16
## poor recently changed space
## 0.15 0.14 0.14 0.14
## care bottle thing average
## 0.14 0.12 0.12 0.12
## proper yogya family balikpapan
## 0.12 0.12 0.12 0.12
## needed refund confusing reasonable
## 0.12 0.12 0.12 0.12
## estimated mess risky slightly
## 0.11 0.11 0.11 0.11
## smiled evemt exceptionally reached
## 0.11 0.11 0.11 0.11
## smaller visit duration flexible
## 0.11 0.11 0.11 0.11
## housekeeping makes airways cloud
## 0.11 0.11 0.11 0.11
## flewn grey wheather satisfied
## 0.11 0.11 0.11 0.11
## unreasonable bahasa fitting gunungsitoli
## 0.11 0.11 0.11 0.11
## terrain lcc disappeared general
## 0.11 0.11 0.11 0.11
## considered cust hiding suggestion
## 0.11 0.11 0.11 0.11
## turtle yeah oke termnal
## 0.11 0.11 0.11 0.11
## reasons ways airplease delayno
## 0.11 0.11 0.11 0.11
## realize difference instructed making
## 0.11 0.11 0.11 0.11
## move talked displeasure forwarding
## 0.11 0.11 0.11 0.11
## lows mechanical discriminated filipinos
## 0.11 0.11 0.11 0.11
## goog costumer barrly helped
## 0.11 0.11 0.11 0.11
## spoke woman suitcase custom
## 0.11 0.11 0.11 0.11
## faraway immigration standing cons
## 0.11 0.11 0.11 0.11
## smells comfort disadvantage main
## 0.11 0.11 0.11 0.11
## crime overload public related
## 0.11 0.11 0.11 0.11
## high hopes
## 0.11 0.11
# Bar plot of the 20 most frequent words
top20 <- head(d, 20)
k <- barplot(top20$freq, las = 2, names.arg = top20$word, cex.axis = 1.2, cex.names = 1.2,
main = "Most frequent words",
ylab = "Word frequencies", col = topo.colors(20))
# Label each bar with its count, using the same 20 rows that were plotted
text(k, top20$freq - 1, labels = top20$freq, cex = 1)
# Read the neutral reviews; each line becomes one document
docs <- readLines("sriwijayaNet.csv")
# Load the data as a corpus
docs <- Corpus(VectorSource(docs))
# Remove your own stop word
# specify your stopwords as a character vector
docs <- tm_map(docs, removeWords, c("airlines"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, c("airlines")): transformation
## drops documents
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
## Warning in tm_map.SimpleCorpus(docs, stripWhitespace): transformation drops
## documents
#Build a term-document matrix
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 15)
## word freq
## time time 83
## service service 36
## late late 34
## check check 30
## budget budget 29
## flights flights 28
## cheap cheap 28
## fly fly 28
## june june 28
## july july 27
## hours hours 26
## april april 26
## august august 26
## plane plane 25
## delayed delayed 25
#Generate the Word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words=50, random.order=FALSE, rot.per=0.35,
colors=brewer.pal(8, "Dark2"))
#Explore frequent terms and their associations
findFreqTerms(dtm, lowfreq = 4)
## [1] "indonesia" "ago" "budget" "crew" "days"
## [6] "efficient" "ground" "late" "usual" "landing"
## [11] "plane" "small" "time" "week" "fine"
## [16] "check" "online" "baggage" "option" "book"
## [21] "booked" "flights" "flying" "good" "pay"
## [26] "arrived" "buy" "change" "cost" "hours"
## [31] "price" "ticket" "cheap" "free" "give"
## [36] "luggage" "weeks" "meal" "dps" "jakarta"
## [41] "july" "return" "scheduled" "snack" "staff"
## [46] "delay" "delayed" "food" "passengers" "service"
## [51] "thing" "bag" "board" "destination" "expect"
## [56] "long" "make" "money" "water" "sriwijaya"
## [61] "bad" "comfortable" "connecting" "didn" "due"
## [66] "min" "safety" "seats" "airplane" "bali"
## [71] "fly" "morning" "reason" "surabaya" "delays"
## [76] "times" "june" "schedule" "airport" "dont"
## [81] "flew" "space" "great" "company" "boarding"
## [86] "cabin" "gate" "counter" "experience" "nice"
## [91] "seat" "batik" "low" "malaysia" "standard"
## [96] "thai" "thailand" "wings" "day" "departure"
## [101] "boeing" "choice" "don" "group" "hour"
## [106] "lombok" "left" "offered" "office" "hot"
## [111] "made" "choose" "indonesian" "people" "lcc"
## [116] "carrier" "wait" "arrival" "denpasar" "terminal"
## [121] "april" "aircraft" "minutes" "leg" "room"
## [126] "batam" "short" "connection" "night" "friendly"
## [131] "average" "march" "wasn" "airasia" "asia"
## [136] "boarded" "yogyakarta" "february" "bit" "inflight"
## [141] "legroom" "bangkok" "huge" "december" "travel"
## [146] "january" "offer" "route" "credit" "november"
## [151] "booking" "october" "half" "medan" "lot"
## [156] "bags" "september" "extra" "august" "complaints"
## [161] "trip" "years" "customer" "part" "accept"
## [166] "snacks" "care" "quick" "domestic"
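To make the frequent-term listing above easier to read, here is a minimal base-R sketch of what such a filter amounts to, assuming it keeps the terms whose total count across all documents reaches the lower bound. The toy matrix and its counts are hypothetical and are not taken from the review data.

```r
# Toy term-document matrix (hypothetical counts): rows are terms, columns are reviews
tdm <- matrix(
  c(2, 1, 3,   # "time"
    0, 1, 0,   # "snack"
    1, 2, 1),  # "late"
  nrow = 3, byrow = TRUE,
  dimnames = list(c("time", "snack", "late"), paste0("doc", 1:3))
)

# Total occurrences of each term over all documents
totals <- rowSums(tdm)

# Keep only the terms that occur at least 4 times in total
frequent <- names(totals)[totals >= 4]
frequent
```

In this toy case "snack" is dropped because it appears only once, which mirrors how rare words are excluded from the list above.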
# Word association
neutAssoc <- as.list(findAssocs(dtm, terms = c("time","service","late","check","budget","cheap","flights","fly","june"),
corlimit = rep(0.15, 9)))
neutAssoc
## $time
## snacks customers factors striving accident alternative
## 0.32 0.29 0.29 0.29 0.29 0.29
## citilink notorious past promo small boarded
## 0.29 0.29 0.29 0.29 0.27 0.27
## recommend comfortable booked flying safety departure
## 0.26 0.25 0.22 0.22 0.21 0.21
## complaints cgk landed runway month requested
## 0.21 0.20 0.20 0.19 0.19 0.19
## mins terrible slightly subsidiary queue kno
## 0.19 0.19 0.19 0.19 0.19 0.19
## aisle price apprehensive bottles couple eventful
## 0.19 0.18 0.18 0.18 0.18 0.18
## mother stellar wee centre selection tidied
## 0.18 0.18 0.18 0.18 0.18 0.18
## brake confusing machine onboard pdg servicd
## 0.18 0.18 0.18 0.18 0.18 0.18
## boat gili imagine reach stood stopped
## 0.18 0.18 0.18 0.18 0.18 0.18
## taxes caught didnt factor myslef rear
## 0.18 0.18 0.18 0.18 0.18 0.18
## reline squeezed facilities instant noodles sold
## 0.18 0.18 0.18 0.18 0.18 0.18
## quick efficient arrived surabaya extra bit
## 0.17 0.16 0.16 0.16 0.16 0.15
##
## $service
## ambon awesome cut full lower makasar
## 0.44 0.44 0.44 0.44 0.44 0.44
## personally board offered smile terrible increased
## 0.44 0.38 0.35 0.30 0.30 0.29
## larger pitch possibly guest impression nationality
## 0.29 0.29 0.29 0.29 0.29 0.29
## professional companies employees entered experienced looked
## 0.29 0.29 0.29 0.29 0.29 0.29
## mad mood customer denpasar plane good
## 0.29 0.29 0.27 0.25 0.21 0.21
## staff excellent expect traffic slightly daily
## 0.21 0.19 0.19 0.19 0.19 0.19
## smaller unexpected landing flew route bad
## 0.19 0.19 0.17 0.17 0.17 0.16
## seat lcc
## 0.16 0.15
##
## $late
## pilots afternoon fuel interconnecting pool
## 0.31 0.30 0.30 0.30 0.30
## fried point recently yogya min
## 0.30 0.30 0.30 0.30 0.29
## passengers leaving evening box recommended
## 0.25 0.25 0.20 0.20 0.20
## chicken doesn credit hrs hours
## 0.20 0.20 0.18 0.16 0.16
## supposed lucky big save back
## 0.16 0.16 0.16 0.16 0.16
## gave choice
## 0.16 0.15
##
## $check
## agreement reality rejected business minute
## 0.39 0.39 0.39 0.39 0.39
## belt chatting employee german germany
## 0.39 0.39 0.39 0.39 0.39
## hannover living tag slow comfort
## 0.39 0.39 0.39 0.37 0.37
## closed passport bali counter staff
## 0.37 0.37 0.32 0.32 0.28
## departure missing traveled playing queue
## 0.28 0.27 0.27 0.27 0.27
## helpful jambi passenger european actual
## 0.27 0.27 0.27 0.27 0.26
## annoyance avoiding knew taking front
## 0.26 0.26 0.26 0.26 0.26
## guy inches journeys prevent reclined
## 0.26 0.26 0.26 0.26 0.26
## reclining year caught didnt factor
## 0.26 0.26 0.26 0.26 0.26
## myslef rear reline squeezed approximately
## 0.26 0.26 0.26 0.26 0.26
## clouds difficult ensure order penetrate
## 0.26 0.26 0.26 0.26 0.26
## rain thick vacation web asian
## 0.26 0.26 0.26 0.26 0.26
## attempting comparing double locally main
## 0.26 0.26 0.26 0.26 0.26
## picky ryanair size wizzair customer
## 0.26 0.26 0.26 0.26 0.25
## luggage online minutes hrs weather
## 0.24 0.23 0.23 0.22 0.22
## seat friendly put told snacks
## 0.22 0.22 0.22 0.22 0.22
## give efficient surabaya airport checked
## 0.21 0.20 0.20 0.20 0.18
## realised due worried office issue
## 0.18 0.18 0.18 0.18 0.18
## convenient legs window biscuit travelling
## 0.18 0.18 0.18 0.18 0.18
## cities unable ticket didn
## 0.18 0.18 0.16 0.16
##
## $budget
## bottom nokair ranks region southeast
## 0.51 0.51 0.51 0.51 0.51
## suits traveller care malindo put
## 0.51 0.51 0.41 0.28 0.28
## compare person money airasia emergency
## 0.28 0.23 0.21 0.19 0.18
## carrier cramp lttle airbus favorite
## 0.17 0.16 0.16 0.16 0.16
## shock conventional rate efficiently fuss
## 0.16 0.16 0.16 0.16 0.16
## local purposes run spacious fitri
## 0.16 0.16 0.16 0.16 0.16
## hari idul include kilos raya
## 0.16 0.16 0.16 0.16 0.16
## unlike worry basic expectations hostesses
## 0.16 0.16 0.16 0.16 0.16
## table trey additional assign beverages
## 0.16 0.16 0.16 0.16 0.16
## seldom solid differentiators expecting interior
## 0.16 0.16 0.16 0.16 0.16
## kgcheck offers smal touches
## 0.16 0.16 0.16 0.16
##
## $cheap
## bkk cheerful difference melindo partner
## 0.35 0.35 0.35 0.35 0.35
## replaced fight comfortable boarded decent
## 0.35 0.31 0.30 0.25 0.24
## cities aisle price leg snacks
## 0.24 0.24 0.23 0.22 0.22
## services asked landed wheelchair free
## 0.19 0.19 0.19 0.19 0.18
## booked add charges complain march
## 0.17 0.17 0.17 0.17 0.17
## affordable airthey badwas beforetravelled child
## 0.17 0.17 0.17 0.17 0.17
## uswould flexible recomended aircrafts east
## 0.17 0.17 0.17 0.17 0.17
## ideal including location providing assistance
## 0.17 0.17 0.17 0.17 0.17
## perfect solid facilities instant noodles
## 0.17 0.17 0.17 0.17 0.17
## sold easy dessert mini asian
## 0.17 0.17 0.17 0.17 0.17
## attempting comparing double locally main
## 0.17 0.17 0.17 0.17 0.17
## picky ryanair size wizzair airliner
## 0.17 0.17 0.17 0.17 0.17
## airplanes comfy good great room
## 0.17 0.17 0.16 0.16 0.16
## complaints
## 0.16
##
## $flights
## carriers baggage famous doesn answer
## 0.36 0.34 0.34 0.34 0.32
## heavier impossible offee overseas shows
## 0.32 0.32 0.32 0.32 0.32
## considered domestically endure etd explanation
## 0.32 0.32 0.32 0.32 0.32
## share single locals multiple private
## 0.32 0.32 0.32 0.32 0.32
## trusted fried point recently yogya
## 0.32 0.32 0.32 0.32 0.32
## experiences luck splitted book comments
## 0.32 0.32 0.32 0.29 0.28
## connecting found gonna box delayed
## 0.23 0.22 0.22 0.22 0.22
## sell pilots recommended caused guess
## 0.22 0.22 0.22 0.22 0.22
## read chicken received journey inconvenient
## 0.22 0.22 0.22 0.22 0.22
## reasonable charged process exit excess
## 0.22 0.22 0.22 0.22 0.22
## credit good fact charge international
## 0.20 0.18 0.17 0.17 0.17
## save back emergency don leaving
## 0.17 0.17 0.17 0.17 0.17
## gave asked bit biggest flying
## 0.17 0.17 0.17 0.17 0.16
## lines queuing safe simple afternoon
## 0.16 0.16 0.16 0.16 0.16
## fuel interconnecting pool unprofessional boarding
## 0.16 0.16 0.16 0.16 0.16
## day apologise draft failed jobs
## 0.16 0.16 0.16 0.16 0.16
## promised rewards testament opt apprehensive
## 0.16 0.16 0.16 0.16 0.16
## bottles couple eventful mother stellar
## 0.16 0.16 0.16 0.16 0.16
## wee plenty airportsoverall alright chain
## 0.16 0.16 0.16 0.16 0.16
## delivered reaction technical traffics fitri
## 0.16 0.16 0.16 0.16 0.16
## hari idul include kilos raya
## 0.16 0.16 0.16 0.16 0.16
## unlike worry accomplishing cheap<U+0094> policy
## 0.16 0.16 0.16 0.16 0.16
## slogan apparently complete figure joke
## 0.16 0.16 0.16 0.16 0.16
## mob payment reserve transfers website
## 0.16 0.16 0.16 0.16 0.16
## dream excellentonline finethe goodflight internal
## 0.16 0.16 0.16 0.16 0.16
## nervous timecomfort sound choices compertable
## 0.16 0.16 0.16 0.16 0.16
## feel makes megazine missed lots
## 0.16 0.16 0.16 0.16 0.16
## overbooked popular prices problems routes
## 0.16 0.16 0.16 0.16 0.16
## daylight duration leatherette nightmare nightvarrivals
## 0.16 0.16 0.16 0.16 0.16
## rarely scheduling booked
## 0.16 0.16 0.15
##
## $fly
## affordability buses connect facts inter
## 0.35 0.35 0.35 0.35 0.35
## island islands replacing ships capable
## 0.35 0.35 0.35 0.35 0.35
## job shown today announcement management
## 0.35 0.35 0.35 0.31 0.30
## top recommended serves punctuality archipelago
## 0.30 0.24 0.24 0.24 0.24
## destination big garuda remote scheduled
## 0.19 0.19 0.19 0.19 0.18
## part make afternoon fuel interconnecting
## 0.18 0.17 0.17 0.17 0.17
## pool everwhere unprofessional thailand bun
## 0.17 0.17 0.17 0.17 0.17
## england scuffed tip weren entertainment
## 0.17 0.17 0.17 0.17 0.17
## jogja widely airthey badwas beforetravelled
## 0.17 0.17 0.17 0.17 0.17
## child uswould ahead arrive mineral
## 0.17 0.17 0.17 0.17 0.17
## minimum prepared provide refreshment tough
## 0.17 0.17 0.17 0.17 0.17
## aircrafts east accomplishing cheap<U+0094> policy
## 0.17 0.17 0.17 0.17 0.17
## slogan seldom circumstances aggravations airbuses
## 0.17 0.17 0.17 0.17 0.17
## annoyances continual discount happily home
## 0.17 0.17 0.17 0.17 0.17
## jumpseat noticed operates rare besar
## 0.17 0.17 0.17 0.17 0.17
## friday party sumbawa summary surfer
## 0.17 0.17 0.17 0.17 0.17
## surfers travelled tuesday affiliation debut
## 0.17 0.17 0.17 0.17 0.17
## drinking honour kinda surprise airlane
## 0.17 0.17 0.17 0.17 0.17
## chance definately postpone recommanded trafic
## 0.17 0.17 0.17 0.17 0.17
## boeing connection morning
## 0.16 0.16 0.15
##
## $june
## hostess slept acoustic guitar motto king nature
## 0.26 0.26 0.18 0.18 0.18 0.18 0.18
## pgk plm timers charter guangzhou manado attached
## 0.18 0.18 0.18 0.18 0.18 0.18 0.18
## behavior chair concerns confidence express photo writing
## 0.18 0.18 0.18 0.18 0.18 0.18 0.18
## nerve wracking cnx dmk
## 0.18 0.18 0.18 0.18
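The association scores above can be read as correlations between term counts across documents: a value near 1 means two words tend to appear in the same reviews. The sketch below illustrates this with plain `cor()` on a tiny hypothetical matrix. It is an illustration of the idea under that assumption, not a reproduction of the output above.

```r
# Toy term-document matrix (hypothetical counts): rows are terms, columns are reviews
tdm <- matrix(
  c(1, 0, 1, 1, 0,   # "late"
    1, 0, 1, 0, 0,   # "delayed"
    0, 1, 0, 0, 1),  # "friendly"
  nrow = 3, byrow = TRUE,
  dimnames = list(c("late", "delayed", "friendly"), paste0("doc", 1:5))
)

# Correlate terms across documents (terms must be columns for cor())
term_cor <- cor(t(tdm))

# Association of "late" with the other two terms
assoc <- term_cor["late", c("delayed", "friendly")]
round(assoc, 2)

# Keep only associations at or above a lower limit, as a corlimit-style filter would
assoc[assoc >= 0.15]
```

Here "delayed" co-occurs with "late" and passes the 0.15 limit, while "friendly" never appears in the same toy review and is filtered out.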
# Bar plot of the 20 most frequent neutral words
top20 <- head(d, 20)
k <- barplot(top20$freq, las = 2, names.arg = top20$word, cex.axis = 1.2, cex.names = 1.2,
main = "Most frequent neutral words", ylab = "Neutral word frequencies", col = topo.colors(20))
# Label each bar with its count, using the same 20 rows that were plotted
text(k, top20$freq - 1, labels = top20$freq, cex = 1)
# Read the positive reviews; each line becomes one document
docs <- readLines("sriwijayaPositive.csv")
# Load the data as a corpus
docs <- Corpus(VectorSource(docs))
# Remove your own stop word
# specify your stopwords as a character vector
docs <- tm_map(docs, removeWords, c("airlines"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, c("airlines")): transformation
## drops documents
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
## Warning in tm_map.SimpleCorpus(docs, stripWhitespace): transformation drops
## documents
#Build a term-document matrix
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 15)
## word freq
## good good 111
## time time 111
## service service 62
## price price 55
## low low 37
## check check 36
## friendly friendly 33
## cost cost 32
## budget budget 30
## crew crew 30
## jakarta jakarta 28
## seat seat 28
## august august 28
## staff staff 27
## flights flights 27
#Generate the Word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words=50, random.order=FALSE, rot.per=0.35,
colors=brewer.pal(8, "Dark2"))
#Explore frequent terms and their associations
findFreqTerms(dtm, lowfreq = 4)
## [1] "budget" "check" "crew" "efficient"
## [5] "expect" "fast" "ground" "short"
## [9] "worth" "bit" "cost" "easy"
## [13] "fly" "good" "landing" "nice"
## [17] "price" "time" "trip" "clean"
## [21] "counter" "gate" "paid" "pay"
## [25] "planes" "properly" "service" "staff"
## [29] "surprised" "ticket" "wasn" "ago"
## [33] "allowance" "cheap" "entertainment" "fare"
## [37] "free" "luggage" "seating" "travel"
## [41] "bad" "comfortable" "july" "aircraft"
## [45] "amazing" "arrived" "bali" "booking"
## [49] "choose" "early" "helpful" "lot"
## [53] "minute" "minutes" "pleasant" "small"
## [57] "recommended" "made" "make" "smooth"
## [61] "thing" "delayed" "passengers" "plane"
## [65] "ready" "reputation" "room" "schedule"
## [69] "waiting" "june" "board" "drink"
## [73] "food" "jakarta" "sriwijaya" "late"
## [77] "experience" "carry" "cheapest" "find"
## [81] "indonesia" "legroom" "meal" "pretty"
## [85] "space" "cheaper" "denpasar" "flying"
## [89] "lombok" "prices" "provide" "services"
## [93] "surabaya" "long" "trips" "affordable"
## [97] "great" "company" "friendly" "times"
## [101] "didn" "choice" "extra" "hour"
## [105] "seats" "destination" "enjoy" "reason"
## [109] "domestic" "fair" "hours" "kuala"
## [113] "love" "lumpur" "airport" "bangkok"
## [117] "dmk" "english" "found" "front"
## [121] "seat" "suitable" "travelled" "delays"
## [125] "doesn" "flew" "lcc" "back"
## [129] "water" "delay" "problem" "smile"
## [133] "excellent" "local" "book" "web"
## [137] "dont" "flown" "guess" "multiple"
## [141] "plenty" "quality" "years" "boarding"
## [145] "departure" "didnt" "flights" "recommend"
## [149] "low" "online" "people" "queue"
## [153] "april" "line" "thai" "singapore"
## [157] "full" "leg" "return" "due"
## [161] "leave" "punctual" "march" "carrier"
## [165] "european" "afternoon" "class" "economy"
## [169] "cabin" "landed" "comfort" "day"
## [173] "morning" "fine" "baggage" "lucky"
## [177] "february" "system" "work" "airplane"
## [181] "reasonable" "satisfied" "transit" "january"
## [185] "special" "airways" "half" "don"
## [189] "buy" "route" "attendants" "average"
## [193] "batik" "boeing" "snack" "snacks"
## [197] "december" "reviews" "money" "travelling"
## [201] "booked" "checked" "website" "main"
## [205] "safety" "tight" "november" "indonesian"
## [209] "october" "offer" "simple" "night"
## [213] "pleasantly" "perfect" "scheduled" "september"
## [217] "timing" "exit" "standard" "person"
## [221] "august" "serve" "improvement" "condition"
## [225] "included" "frills" "quick" "left"
## [229] "asia" "hope" "hot" "emergency"
## [233] "attendant" "wings" "happy" "group"
#Word Association
posAssoc<-as.list(findAssocs(dtm, terms =c("ontime","good","service","price","low","check","friendly","cost","budget"),
corlimit = c(0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15)))
posAssoc
## $ontime
## everytime true assistance avoiding caring
## 0.58 0.58 0.58 0.58 0.58
## detail wont garuda based airasia
## 0.58 0.58 0.40 0.40 0.40
## flies traveloka wheelchair international taking
## 0.40 0.40 0.40 0.33 0.33
## business companies improve fair love
## 0.33 0.33 0.33 0.28 0.28
## multiple system timing delayed lot
## 0.28 0.28 0.28 0.26 0.23
## booked choice fast carrier choose
## 0.23 0.21 0.19 0.19 0.18
## hours kuala lumpur people return
## 0.18 0.18 0.18 0.16 0.16
##
## $good
## kindly lost management port responsible delay
## 0.43 0.43 0.43 0.43 0.43 0.27
## problem managed staff trip hour afternoon
## 0.27 0.21 0.19 0.18 0.17 0.17
## condition job forward concept fixed airplane
## 0.17 0.15 0.15 0.15 0.15 0.15
## aunt traveled money chech fire limited
## 0.15 0.15 0.15 0.15 0.15 0.15
## menu resonable cases rate answer approached
## 0.15 0.15 0.15 0.15 0.15 0.15
## broken contact grease men mistreated stained
## 0.15 0.15 0.15 0.15 0.15 0.15
## suitcases outstanding screens handsome
## 0.15 0.15 0.15 0.15
##
## $service
## standard famous jogjakarta concept fixed larger
## 0.27 0.24 0.24 0.23 0.23 0.23
## monthly availability enticing favour fool huge
## 0.23 0.23 0.23 0.23 0.23 0.23
## section levels man minimum observe occur
## 0.23 0.23 0.23 0.23 0.23 0.23
## fair doesn lot found care hidden
## 0.21 0.21 0.20 0.20 0.18 0.18
## review foot lcc offer fly delays
## 0.18 0.18 0.17 0.17 0.16 0.16
##
## $price
## taxes wanted reasonable flew year skip insurance
## 0.36 0.36 0.32 0.24 0.24 0.24 0.24
## add forward biscuit compliment pack position basic
## 0.24 0.23 0.23 0.23 0.23 0.23 0.23
## con older warm advance baht friday table
## 0.23 0.23 0.23 0.23 0.23 0.23 0.23
## suitable people affordable consistent lcc month mid
## 0.21 0.20 0.19 0.18 0.18 0.18 0.18
## takeoff option real phuket taller delayed perfect
## 0.18 0.18 0.18 0.18 0.18 0.16 0.16
## easy famous usual airasia row traveloka reserve
## 0.15 0.15 0.15 0.15 0.15 0.15 0.15
##
## $low
## world printing prone releatively virtually outstanding
## 0.39 0.28 0.28 0.28 0.28 0.28
## screens credible complains honored immediately inbound
## 0.28 0.28 0.28 0.28 0.28 0.28
## outbound request general national operate tracks
## 0.28 0.28 0.28 0.28 0.28 0.28
## airliners airplanes biggest question carrier departure
## 0.28 0.28 0.28 0.28 0.27 0.23
## feel european class economy open hurry
## 0.23 0.19 0.19 0.19 0.18 0.18
## printed desks experiences takes traveling mile
## 0.18 0.18 0.18 0.18 0.18 0.18
## problems bagage offered perfectly delays
## 0.18 0.18 0.18 0.18 0.16
##
## $check
## counter queue pass complimentary printed
## 0.42 0.40 0.35 0.35 0.32
## planned print printing criticism holidaylanded
## 0.32 0.32 0.31 0.31 0.31
## busy provided web surprised reviews
## 0.31 0.31 0.30 0.29 0.29
## online didnt pleasantly person emergency
## 0.28 0.21 0.21 0.21 0.21
## open slow hurry boarding world
## 0.20 0.20 0.20 0.20 0.20
## bagage gave sector easy exit
## 0.20 0.20 0.20 0.19 0.19
## fast bit lot drink business
## 0.18 0.16 0.16 0.16 0.16
## checking claim review foot thought
## 0.16 0.16 0.16 0.16 0.16
## inexpensive
## 0.16
##
## $friendly
## general national operate tracks spacious
## 0.34 0.34 0.34 0.34 0.28
## feel cabin safe crew times
## 0.28 0.28 0.28 0.25 0.24
## staff aircrafts felt safer attendants
## 0.23 0.23 0.23 0.23 0.23
## worried traveling problems gave bus
## 0.23 0.23 0.23 0.23 0.21
## delays found planes solid expected
## 0.20 0.18 0.17 0.17 0.17
## recent crews competitive aircraft postponed
## 0.17 0.17 0.17 0.16 0.16
## cold ceilings higher ports seat
## 0.16 0.16 0.16 0.16 0.16
## spoke usb anouncement buziness connections
## 0.16 0.16 0.16 0.16 0.16
## patience internal required dps including
## 0.16 0.16 0.16 0.16 0.16
## unit upg hassle smartly waste
## 0.16 0.16 0.16 0.16 0.16
## confess transport attentdant demonstration life
## 0.16 0.16 0.16 0.16 0.16
## talk uniform blablai surprise talked
## 0.16 0.16 0.16 0.16 0.16
## suggest gotta interpretations landings mini
## 0.16 0.16 0.16 0.16 0.16
## returned vacation abroad collection daily
## 0.16 0.16 0.16 0.16 0.16
## initiated kupang baltik chip delete
## 0.16 0.16 0.16 0.16 0.16
## hear partners medium numerous routes
## 0.16 0.16 0.16 0.16 0.16
## altitude crap cruising decend minimal
## 0.16 0.16 0.16 0.16 0.16
## weather complains honored immediately inbound
## 0.16 0.16 0.16 0.16 0.16
## outbound request enforce rules alseep
## 0.16 0.16 0.16 0.16 0.16
## contrary existent fall models sped
## 0.16 0.16 0.16 0.16 0.16
## trolley serving desk robot god
## 0.16 0.16 0.16 0.16 0.16
## krabi recomendded drinking enthusiastic promotions
## 0.16 0.16 0.16 0.16 0.16
## indonesia frills
## 0.15 0.15
##
## $cost
## world carrier printing american east lower
## 0.43 0.36 0.31 0.31 0.31 0.31
## sits complains honored immediately inbound outbound
## 0.31 0.31 0.31 0.31 0.31 0.31
## request general national operate tracks airliners
## 0.31 0.31 0.31 0.31 0.31 0.31
## airplanes biggest question european feel hurry
## 0.31 0.31 0.31 0.27 0.25 0.21
## printed desks takes traveling mile problems
## 0.21 0.21 0.21 0.21 0.21 0.21
## middle honest perfectly attentive nice frills
## 0.21 0.21 0.21 0.21 0.20 0.19
## departure counter spacious solid pass future
## 0.18 0.16 0.16 0.16 0.16 0.16
## safe legs cramped selection
## 0.16 0.16 0.16 0.16
##
## $budget
## included frills boarded meal checked slow
## 0.34 0.29 0.29 0.26 0.26 0.24
## generous desks based maintained allowance carrier
## 0.24 0.24 0.24 0.24 0.22 0.22
## plenty plane business speaking traveler hidden
## 0.21 0.19 0.18 0.18 0.18 0.18
## thought cebu comparing counters efficient lousy
## 0.18 0.17 0.17 0.17 0.17 0.17
## motion moved pace pacific workers displayed
## 0.17 0.17 0.17 0.17 0.17 0.17
## weren compartments fighting hey overhead loves
## 0.17 0.17 0.17 0.17 0.17 0.17
## travelers baggage route aorplane arrange woowwww
## 0.17 0.17 0.17 0.17 0.17 0.17
## accents smiled thick tired conscious airports
## 0.17 0.17 0.17 0.17 0.17 0.17
## coordinate staffthere announced competent considerable eat
## 0.17 0.17 0.17 0.17 0.17 0.17
## fas number public serviceable stroll strolled
## 0.17 0.17 0.17 0.17 0.17 0.17
## beware comforting basic con older warm
## 0.17 0.17 0.17 0.17 0.17 0.17
## assistance avoiding caring detail wont comment
## 0.17 0.17 0.17 0.17 0.17 0.17
## jetstar login paying reasons similar buiscuit
## 0.17 0.17 0.17 0.17 0.17 0.17
## extras direction thier inter modern young
## 0.17 0.17 0.17 0.17 0.17 0.17
## desk robot preferred alight expense navigation
## 0.17 0.17 0.17 0.17 0.17 0.17
## couple int sale offer reason english
## 0.17 0.17 0.17 0.16 0.15 0.15
## system
## 0.15
# Bar plot of the 20 most frequent words
top20 <- head(d, 20)
k <- barplot(top20$freq, las = 2, names.arg = top20$word, cex.axis = 1.2, cex.names = 1.2,
main = "Most frequent words", ylab = "Word frequencies", col = topo.colors(20))
# Label each bar with its count, using the same 20 rows that were plotted
text(k, top20$freq - 1, labels = top20$freq, cex = 1)
In the sentiment analysis we carried out, negative sentiment in the comments outweighs positive sentiment. This suggests that Sriwijaya Air should improve its services, especially with regard to delays. Flights should follow a fixed schedule and depart on time, except when major circumstances intervene.
We hope that no Sriwijaya Air plane crashes in the future and that Sriwijaya Air can maintain better service going forward.
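One simple way to put a number on the comparison between sentiment classes is to compute the share of reviews in each class. The sketch below assumes the reviews are already split by class. The helper name `sentiment_share` and the toy reviews are hypothetical and do not come from the dataset.

```r
# Hypothetical helper: percentage of reviews falling into each sentiment class
sentiment_share <- function(reviews_by_class) {
  counts <- vapply(reviews_by_class, length, integer(1))
  round(100 * counts / sum(counts), 1)
}

# Toy data standing in for the per-class review files
toy <- list(
  negative = c("late again", "rude staff", "delayed two hours"),
  neutral  = c("flight was okay"),
  positive = c("friendly crew", "good price")
)

shares <- sentiment_share(toy)
shares
```

With the real data, each list element could be filled from the corresponding file via `readLines()`, provided each line holds one review.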