Catie Peranzi 11/17/2016
Since my Honda Accord was just totaled, I am in the market for a new car. Based on what I have seen on the Edmunds website, which lets you compare different vehicles, I am deciding between a Honda CR-V, a Honda HR-V, and a Mazda CX-5. I want to see what Twitter is saying about Honda and Mazda and whether it will help me make a decision.
First, I will load the libraries and set up Twitter authentication.
library(rmarkdown)
library(twitteR)
library(dplyr)
library(knitr)
library(tidytext)
library(ggplot2)
library(stringr)
library(wordcloud)
library(RColorBrewer)
## [1] "Using direct authentication"
I pulled the last 1000 tweets that used #Honda.
num_tweets <- 1000
ht <- searchTwitter('#Honda', n = num_tweets)
head(ht)
## [[1]]
## [1] "ToyotaAlphard: RT @car_review_net: ホンダの新型シャトルをモータージャーナリストの河口まなぶさんが紹介しています。 実用車なので走りだけではなく、荷室や装備の使い https://t.co/BZnoijRGDY #shuttle #honda #wagon https://…"
##
## [[2]]
## [1] "GordonMercedes9: https://t.co/VhnriksROr\n#MacauGP #Honda #mcguinness #redtorpedo\nMacau GP - first practice. Top shot of John at Moor… https://t.co/XaqT5fRq7t"
##
## [[3]]
## [1] "h_kh41053030: RT @getoncar: 新型シビックタイプRがフルモデルチェンジして2017年に発売!\n新型シビックタイプRの最新情報まとめ\nhttps://t.co/PmUTKAlDMO\n#civictyper #honda https://t.co/ZvXoTUiy2d"
##
## [[4]]
## [1] "UstunbasAkin: #cbr650f #honda #geceler #uzun \xed\xa0\xbc\xed\xbf\x81\xed\xa0\xbc\xed\xbf\x8d\xed\xa0\xbd\xed\xb8\x8e https://t.co/urep3CFxcB"
##
## [[5]]
## [1] "JACKODIAMONDS1: Tandy Bowen #Honda https://t.co/SctcGo1Scq https://t.co/7QzMD625BW"
##
## [[6]]
## [1] "NoDalyLifestyle: engineering at its finest#az #mexican #redneck #me #boost #funny #f4f #nodailylifestyle #turbo #honda #mod… https://t.co/gcS6FUPuxa"
Then I converted the list into a data frame.
ht_df <- twListToDF(ht)
head(ht_df)
## text
## 1 RT @car_review_net: ホンダの新型シャトルをモータージャーナリストの河口まなぶさんが紹介しています。 実用車なので走りだけではなく、荷室や装備の使い https://t.co/BZnoijRGDY #shuttle #honda #wagon https://…
## 2 https://t.co/VhnriksROr\n#MacauGP #Honda #mcguinness #redtorpedo\nMacau GP - first practice. Top shot of John at Moor… https://t.co/XaqT5fRq7t
## 3 RT @getoncar: 新型シビックタイプRがフルモデルチェンジして2017年に発売!\n新型シビックタイプRの最新情報まとめ\nhttps://t.co/PmUTKAlDMO\n#civictyper #honda https://t.co/ZvXoTUiy2d
## 4 #cbr650f #honda #geceler #uzun \xed\xa0\xbc\xed\xbf\x81\xed\xa0\xbc\xed\xbf\x8d\xed\xa0\xbd\xed\xb8\x8e https://t.co/urep3CFxcB
## 5 Tandy Bowen #Honda https://t.co/SctcGo1Scq https://t.co/7QzMD625BW
## 6 engineering at its finest#az #mexican #redneck #me #boost #funny #f4f #nodailylifestyle #turbo #honda #mod… https://t.co/gcS6FUPuxa
## favorited favoriteCount replyToSN created truncated
## 1 FALSE 0 <NA> 2016-11-17 22:05:11 FALSE
## 2 FALSE 0 <NA> 2016-11-17 22:05:11 TRUE
## 3 FALSE 0 <NA> 2016-11-17 22:05:07 FALSE
## 4 FALSE 0 <NA> 2016-11-17 22:04:20 FALSE
## 5 FALSE 0 <NA> 2016-11-17 22:02:32 FALSE
## 6 FALSE 0 <NA> 2016-11-17 22:02:19 TRUE
## replyToSID id replyToUID
## 1 <NA> 799372840701075456 <NA>
## 2 <NA> 799372839057059840 <NA>
## 3 <NA> 799372824171323392 <NA>
## 4 <NA> 799372624694419457 <NA>
## 5 <NA> 799372172191932416 <NA>
## 6 <NA> 799372119704485893 <NA>
## statusSource
## 1 <a href="http://www.yahoo.co.jp" rel="nofollow">自動車ニュース</a>
## 2 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 3 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 4 <a href="http://instagram.com" rel="nofollow">Instagram</a>
## 5 <a href="http://motofuze.com" rel="nofollow">MotoFuze Post</a>
## 6 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## screenName retweetCount isRetweet retweeted longitude latitude
## 1 ToyotaAlphard 1 TRUE FALSE <NA> <NA>
## 2 GordonMercedes9 0 FALSE FALSE <NA> <NA>
## 3 h_kh41053030 139 TRUE FALSE <NA> <NA>
## 4 UstunbasAkin 0 FALSE FALSE <NA> <NA>
## 5 JACKODIAMONDS1 0 FALSE FALSE <NA> <NA>
## 6 NoDalyLifestyle 0 FALSE FALSE <NA> <NA>
In order to analyze the results, I needed to tidy the text. I wanted to look for the most common words being used in tweets containing #Honda, so I ran the frequencies for the top 20 words.
# Split on anything that is not a letter, digit, #, @, or an in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
# Drop quoted tweets, strip t.co links, tokenize, then remove stop words
# and tokens that contain no letters
Honda_words <- ht_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
kable(Honda_words %>%
group_by(word) %>%
summarize(n = n()) %>%
arrange(desc(n)) %>%
top_n(20))
| word | n |
|---|---|
| #honda | 986 |
| rt | 453 |
| honda | 212 |
| car | 134 |
| #civic | 124 |
| #kawasaki | 109 |
| #yamaha | 108 |
| #civictyper | 102 |
| #harley | 101 |
| @getoncar | 100 |
| @harley | 95 |
| xs03 | 95 |
| dvd | 80 |
| systems | 80 |
| #finance | 74 |
| civic | 69 |
| #deals | 61 |
| cvt | 54 |
| en | 46 |
| finance | 46 |
Based on this list, the Honda Civic seems quite popular.
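To make the tidying step more concrete, here is a minimal sketch of what the same regex does to a single tweet. The tweet text, the account name, and the two-word stop list are made up for illustration (the real pipeline uses `unnest_tokens()` and the full `stop_words` table), and `str_split()` only stands in for the tokenizer.

```r
library(stringr)

# Same patterns as in the pipeline above
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
url_or_amp <- "https://t.co/[A-Za-z\\d]+|&"

# Hypothetical tweet, not taken from the data
tweet <- "RT @exampleuser: Don't miss the 2017 #Civic! https://t.co/abc123XYZ"

# 1. Strip the shortened link, 2. split on the regex, 3. lowercase
clean  <- str_replace_all(tweet, url_or_amp, "")
tokens <- str_to_lower(str_split(clean, reg)[[1]])
tokens <- tokens[tokens != ""]   # splitting leaves empty strings behind

# Stand-in for stop_words$word (assumption: only two entries)
tiny_stop <- c("the", "don't")

# Drop stop words and tokens without a letter (here, "2017")
words <- tokens[!tokens %in% tiny_stop & str_detect(tokens, "[a-z]")]
print(words)
```

Hashtags and mentions survive as single tokens because `#` and `@` are excluded from the split pattern, which is why `#honda` and `honda` are counted as separate rows in the table.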
I performed a sentiment analysis to see how Twitter users feel about Honda.
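# NRC lexicon taken from the tidytext `sentiments` table as it existed at the time of writing;
# newer tidytext releases provide it through get_sentiments("nrc") instead
nrc <- sentiments %>% filter(lexicon == "nrc") %>% select(word, sentiment)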
kable(head(nrc))
| word | sentiment |
|---|---|
| abacus | trust |
| abandon | fear |
| abandon | negative |
| abandon | sadness |
| abandoned | anger |
| abandoned | fear |
I needed to join nrc to Honda_words and look at the sentiment counts.
Honda_words_sentiments <- Honda_words %>% inner_join(nrc, by = "word")
kable(Honda_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))
| sentiment | n |
|---|---|
| positive | 255 |
| anticipation | 150 |
| joy | 102 |
| trust | 101 |
| negative | 81 |
| surprise | 66 |
| fear | 51 |
| anger | 24 |
| sadness | 23 |
| disgust | 14 |
I was not surprised that the top sentiment is positive. These counts are of sentiment-coded words, not whole tweets. Looking at the bottom of the list, “anger” and “sadness” are likely due to users who are having issues with their car or whose car was damaged in an accident.
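One thing worth keeping in mind when reading these counts: the NRC lexicon can assign several sentiments to the same word, so `inner_join()` produces one row per word-sentiment pair. Below is a small sketch with a hand-built lexicon (the six rows shown by `head(nrc)` above) and three made-up tokens; the names `mini_nrc` and `toy_words` are illustrative only.

```r
library(dplyr)

# First six rows of the lexicon, copied from the head(nrc) output above
mini_nrc <- data.frame(
  word = c("abacus", "abandon", "abandon", "abandon", "abandoned", "abandoned"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger", "fear"),
  stringsAsFactors = FALSE
)

# Hypothetical tokens; "honda" is not in the lexicon
toy_words <- data.frame(word = c("abandon", "honda", "abacus"),
                        stringsAsFactors = FALSE)

joined <- toy_words %>% inner_join(mini_nrc, by = "word")
print(joined)

# "abandon" alone contributes three rows; "honda" drops out entirely
sentiment_counts <- joined %>% count(sentiment, sort = TRUE)
print(sentiment_counts)
```

So the sentiment totals count lexicon matches rather than tweets, and words missing from the lexicon (including most brand and model names) do not contribute at all.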
It is important to weigh the negative and positive aspects of a Honda if I choose to buy one, so I will start by making a wordcloud of the negative-coded words.
# Palette for the cloud; words are sized by how often they occur
cloud_negative <- brewer.pal(8, "Set1")
Honda_words_sentiments %>%
filter(sentiment == "negative") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
random.order = FALSE, rot.per = .15, colors = cloud_negative))
Since I am not super knowledgeable about cars, I was unfamiliar with the word “cyclone” in relation to Honda. I found that there is a Honda dealership called Cyclone Honda, and that there is a new engine called the Cyclone in some of the newer models. However, this seems to be a small engine used mostly in motorcycles. I would be looking at the CR-V, so I don’t think this should be an issue, but I will be sure to double-check.
Then I made a wordcloud of the positive-coded words.
# Palette for the cloud; words are sized by how often they occur
cloud_positive <- brewer.pal(8, "Set1")
Honda_words_sentiments %>%
filter(sentiment == "positive") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
random.order = FALSE, rot.per = .15, colors = cloud_positive))
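I am not surprised to see “Accord” as a leading positive-coded word, although the lexicon is scoring the ordinary English word rather than the car. I know I loved mine!
I wanted to see whether Honda’s own Twitter account used #Honda the most, or whether it is mostly “fans” of Honda using the hashtag.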
kable(ht_df %>%
group_by(screenName) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = n/sum(n)) %>%
arrange(desc(n)) %>%
top_n(10))
| screenName | n | percent_of_tweets |
|---|---|---|
| BuyHondaCivic | 29 | 0.029 |
| oldrugbygrump62 | 14 | 0.014 |
| anthony212 | 9 | 0.009 |
| BoskyB | 9 | 0.009 |
| harleypartsman7 | 9 | 0.009 |
| motorcycle_manu | 9 | 0.009 |
| UsedVehicleSale | 9 | 0.009 |
| bareno_motor | 8 | 0.008 |
| alyssonsccp10 | 7 | 0.007 |
| vamp_kris2102 | 6 | 0.006 |
I particularly like the second username. I was surprised that the list consists entirely of “fans” and that Honda’s own Twitter account is not on it.
I am going to do the same thing for #Mazda that I did for #Honda. I start by pulling the last 1000 tweets using #Mazda and creating a data frame.
num_tweets <- 1000
mt <- searchTwitter('#Mazda', n = num_tweets)
head(mt)
mt_df <- twListToDF(mt)
head(mt_df)
## text
## 1 Третий кроссовер Opel дебютирует осенью 2017 года #Mazda CX-9 #Mazda CX-9 #Mazda CX-9 #opel #opel-grandland #fr... https://t.co/haYoBwt6nN
## 2 ¡El #Mazda 6 con un exterior deportivo e interior renovado tiene un carácter elegante e incorformista➡… https://t.co/jefgBbzbJk
## 3 Mazda RX 7 S Model (1984) #Mazda https://t.co/AetuN9QpYX This car is a true collector s dream! We purchased it https://t.co/DooVfixpnk
## 4 Impresionante diseño del #MAZDA #rt24p que competirá en #IMSA 2017 haciendo su debut en las 24 horas de Daytona… https://t.co/kxOq7LV0ye
## 5 RT @blumer_miss: I have car finance with first response – can i upgrade? #mazda #finance https://t.co/VFwJXzV86I
## 6 Pretty much sums up how I spend every other Wednesday. #Mazda #Mx5 #mk1 #RustBucket #Drifting… https://t.co/RR58uCEDxv
## favorited favoriteCount replyToSN created truncated
## 1 FALSE 0 <NA> 2016-11-17 22:00:57 FALSE
## 2 FALSE 0 <NA> 2016-11-17 22:00:55 TRUE
## 3 FALSE 0 <NA> 2016-11-17 22:00:12 FALSE
## 4 FALSE 1 <NA> 2016-11-17 21:58:46 TRUE
## 5 FALSE 0 <NA> 2016-11-17 21:57:48 FALSE
## 6 FALSE 0 <NA> 2016-11-17 21:55:25 FALSE
## replyToSID id replyToUID
## 1 <NA> 799371774559481860 <NA>
## 2 <NA> 799371765596176384 <NA>
## 3 <NA> 799371588273635328 <NA>
## 4 <NA> 799371224463867904 <NA>
## 5 <NA> 799370983438106624 <NA>
## 6 <NA> 799370382591410176 <NA>
## statusSource
## 1 <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
## 2 <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
## 3 <a href="http://bay2car.com" rel="nofollow">capipaula182</a>
## 4 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 5 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 6 <a href="http://instagram.com" rel="nofollow">Instagram</a>
## screenName retweetCount isRetweet retweeted longitude latitude
## 1 livecars_ru 0 FALSE FALSE <NA> <NA>
## 2 Tempul_Mazda 0 FALSE FALSE <NA> <NA>
## 3 capipaula182 0 FALSE FALSE <NA> <NA>
## 4 guardianesdrive 0 FALSE FALSE <NA> <NA>
## 5 KalaVachon 5 TRUE FALSE <NA> <NA>
## 6 JCOStunts 0 FALSE FALSE -0.59730173 52.23786032
# Same tokenizing pattern and cleaning steps as for the Honda tweets
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Mazda_words <- mt_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
kable(Mazda_words %>%
group_by(word) %>%
summarize(n = n()) %>%
arrange(desc(n)) %>%
top_n(20))
| word | n |
|---|---|
| #mazda | 1013 |
| rt | 506 |
| #imsa | 227 |
| rt24 | 224 |
| imsa | 190 |
| @f1gate | 176 |
| cx | 169 |
| #rt24p | 157 |
| #laautoshow | 100 |
| mazda | 100 |
| #cx5 | 65 |
| car | 64 |
| finance | 64 |
| en | 63 |
| #finance | 61 |
| de | 55 |
| el | 48 |
| prototype | 46 |
| la | 45 |
| #cars | 36 |
I would likely buy the CX-5 if I went with a Mazda, so I was happy to see it on the list twice, as “cx” and as “#cx5”, although “cx” also covers mentions of other models such as the CX-9.
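Why the CX-5 lands in two separate rows comes down to how the regex tokenizer treats hyphens. Here is a short sketch on a made-up tweet (the text is hypothetical, and `str_split()` stands in for `unnest_tokens()`):

```r
library(stringr)

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Hypothetical tweet mentioning the model both ways
tweet <- "Test drove the #Mazda CX-5 today #CX5"

tokens <- str_to_lower(str_split(tweet, reg)[[1]])
tokens <- tokens[tokens != ""]

# The hyphen splits "CX-5" into "cx" and "5"; the letter filter then drops "5"
kept <- tokens[str_detect(tokens, "[a-z]")]
print(kept)
```

The hashtag form stays intact because it contains no hyphen, so it is counted under its own row.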
I performed a sentiment analysis to see how Twitter users feel about Mazda.
# Same lexicon as in the Honda section, repeated so this part runs on its own
nrc <- sentiments %>% filter(lexicon == "nrc") %>% select(word, sentiment)
kable(head(nrc))
I joined nrc to Mazda_words and looked at the sentiment counts.
Mazda_words_sentiments <- Mazda_words %>% inner_join(nrc, by = "word")
kable(Mazda_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))
| sentiment | n |
|---|---|
| positive | 282 |
| anticipation | 126 |
| trust | 117 |
| joy | 89 |
| negative | 80 |
| anger | 53 |
| sadness | 46 |
| surprise | 44 |
| fear | 33 |
| disgust | 24 |
Again, I was not surprised to see many more positive than negative words in the #Mazda tweets.
As with Honda, I will start by making a wordcloud of the negative-coded words.
# Palette for the cloud; words are sized by how often they occur
cloud_negative <- brewer.pal(8, "Set1")
Mazda_words_sentiments %>%
filter(sentiment == "negative") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
random.order = FALSE, rot.per = .15, colors = cloud_negative))
I am confused as to how “war” ended up in the wordcloud.
Now, I will make a wordcloud of the positive-coded words.
# Palette for the cloud; words are sized by how often they occur
cloud_positive <- brewer.pal(8, "Set1")
Mazda_words_sentiments %>%
filter(sentiment == "positive") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
random.order = FALSE, rot.per = .15, colors = cloud_positive))
Nothing here was a surprise to me.
kable(mt_df %>%
group_by(screenName) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = n/sum(n)) %>%
arrange(desc(n)) %>%
top_n(10))
| screenName | n | percent_of_tweets |
|---|---|---|
| livecars_ru | 15 | 0.015 |
| RReloadedGarage | 12 | 0.012 |
| MoeDrives | 10 | 0.010 |
| parkmazda | 10 | 0.010 |
| ChrisAnhill | 9 | 0.009 |
| Hua_Jiun_Tuan | 9 | 0.009 |
| mattdrivescom | 9 | 0.009 |
| LatestMazda | 7 | 0.007 |
| mazdaknoop | 7 | 0.007 |
| adriana_Boho | 5 | 0.005 |
| MAZDA_RX7_BOT | 5 | 0.005 |
| stroisila | 5 | 0.005 |
| topiclyev | 5 | 0.005 |
Similar to the #Honda tweets, these are all “fans” of Mazda.
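The table above has 13 rows even though the code asks for `top_n(10)`. That is expected: `top_n()` keeps ties, and four accounts are tied at 5 tweets for the last place. Below is a small sketch of the behavior on made-up counts (the account names are placeholders). `slice_max()` with `with_ties = FALSE` is one way to cap the output, assuming a dplyr version that provides it (1.0.0 or later).

```r
library(dplyr)

# Made-up counts with a three-way tie for third place
toy <- data.frame(
  screenName = c("acct_a", "acct_b", "acct_c", "acct_d", "acct_e"),
  n = c(5, 3, 2, 2, 2),
  stringsAsFactors = FALSE
)

# top_n() returns every row tied with the third-largest value
top3_ties <- toy %>% top_n(3, n)
print(nrow(top3_ties))

# slice_max() can be told to stop at exactly three rows
top3_capped <- toy %>% slice_max(n, n = 3, with_ties = FALSE)
print(nrow(top3_capped))
```

Which of the tied accounts survive the cap is arbitrary, so keeping the ties, as the table does, is arguably the more honest display.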
I want to finish by comparing the sentiment of the tweets for both makes of cars.
Honda_words_sentiments$make <- "Honda"
Mazda_words_sentiments$make <- "Mazda"
words_sentiments <- rbind(Honda_words_sentiments, Mazda_words_sentiments)
sent_df <- words_sentiments %>%
group_by(make, sentiment) %>%
summarize(n = n()) %>%
# summarize() drops the sentiment grouping, so sum(n) is taken within each make
mutate(frequency = n/sum(n))
ggplot(sent_df, aes(x = sentiment, y = frequency, fill = make)) +
geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Set1") +
xlab("Sentiment") +
ylab("Share of sentiment-coded words") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Generally, it seems that Twitter users feel about the same about Honda and Mazda. Recomputing the shares from the two sentiment tables above, no sentiment differs by more than about three percentage points. Anticipation, anger, and surprise show the “biggest” differences, followed by sadness and fear. Words in tweets containing #Honda show slightly higher shares of anticipation, surprise, fear, and joy. Words in tweets containing #Mazda show slightly higher shares of anger, sadness, positive, trust, and disgust.
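The shares behind this comparison can be recomputed directly from the counts in the two sentiment tables. This is a minimal base R sketch with the counts copied by hand; the object names are illustrative, and a positive `difference` means the Honda share is higher.

```r
# Counts copied by hand from the two sentiment tables above
sentiment <- c("positive", "anticipation", "joy", "trust", "negative",
               "surprise", "fear", "anger", "sadness", "disgust")
honda_n <- c(255, 150, 102, 101, 81, 66, 51, 24, 23, 14)
mazda_n <- c(282, 126,  89, 117, 80, 44, 33, 53, 46, 24)

shares <- data.frame(
  sentiment = sentiment,
  honda = honda_n / sum(honda_n),   # share within Honda's coded words
  mazda = mazda_n / sum(mazda_n),   # share within Mazda's coded words
  stringsAsFactors = FALSE
)
shares$difference <- shares$honda - shares$mazda

# Largest absolute gaps first, rounded for display
shares <- shares[order(-abs(shares$difference)), ]
shares[, -1] <- round(shares[, -1], 3)
print(shares, row.names = FALSE)
```

Each make is normalized by its own total (867 coded words for Honda, 894 for Mazda), which is what makes the two sets of bars comparable even though the totals differ.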
Based on this data, Mazda and Honda seem pretty similar. I did not find anything that will help me pick between the two makes of car. I am going to have to decide based on how the test drives go!