Should I Buy Another Honda or Make the Switch to Mazda?

Catie Peranzi 11/17/2016

Since my Honda Accord was just totaled, I am in the market for a new car. Based on what I have seen on the Edmunds website, which allows you to compare different vehicles, I am deciding between a Honda CR-V, a Honda HR-V, and a Mazda CX-5. I want to see what Twitter is saying about Honda and Mazda in case it helps me make a decision.

First I will load libraries and set up Twitter authentication.

library(rmarkdown)
library(twitteR)
library(dplyr)
library(knitr)
library(tidytext)
library(ggplot2)
library(stringr)
library(wordcloud)
library(RColorBrewer)
## [1] "Using direct authentication"

Honda Tweets

I pulled the last 1000 tweets that used #Honda.

num_tweets <- 1000
ht <- searchTwitter('#Honda', n = num_tweets)
head(ht)
## [[1]]
## [1] "ToyotaAlphard: RT @car_review_net: ホンダの新型シャトルをモータージャーナリストの河口まなぶさんが紹介しています。 実用車なので走りだけではなく、荷室や装備の使い https://t.co/BZnoijRGDY #shuttle #honda #wagon https://…"
## 
## [[2]]
## [1] "GordonMercedes9: https://t.co/VhnriksROr\n#MacauGP #Honda #mcguinness #redtorpedo\nMacau GP - first practice. Top shot of John at Moor… https://t.co/XaqT5fRq7t"
## 
## [[3]]
## [1] "h_kh41053030: RT @getoncar: 新型シビックタイプRがフルモデルチェンジして2017年に発売!\n新型シビックタイプRの最新情報まとめ\nhttps://t.co/PmUTKAlDMO\n#civictyper #honda https://t.co/ZvXoTUiy2d"
## 
## [[4]]
## [1] "UstunbasAkin: #cbr650f #honda #geceler #uzun \xed\xa0\xbc\xed\xbf\x81\xed\xa0\xbc\xed\xbf\x8d\xed\xa0\xbd\xed\xb8\x8e https://t.co/urep3CFxcB"
## 
## [[5]]
## [1] "JACKODIAMONDS1: Tandy Bowen #Honda https://t.co/SctcGo1Scq https://t.co/7QzMD625BW"
## 
## [[6]]
## [1] "NoDalyLifestyle: engineering at its finest#az #mexican #redneck #me #boost #funny #f4f #nodailylifestyle #turbo #honda #mod… https://t.co/gcS6FUPuxa"

Then I converted the list into a data frame.

ht_df <- twListToDF(ht)
head(ht_df)
##                                                                                                                                                                                                         text
## 1 RT @car_review_net: ホンダの新型シャトルをモータージャーナリストの河口まなぶさんが紹介しています。 実用車なので走りだけではなく、荷室や装備の使い https://t.co/BZnoijRGDY #shuttle #honda #wagon https://…
## 2                                                             https://t.co/VhnriksROr\n#MacauGP #Honda #mcguinness #redtorpedo\nMacau GP - first practice. Top shot of John at Moor… https://t.co/XaqT5fRq7t
## 3                          RT @getoncar: 新型シビックタイプRがフルモデルチェンジして2017年に発売!\n新型シビックタイプRの最新情報まとめ\nhttps://t.co/PmUTKAlDMO\n#civictyper #honda https://t.co/ZvXoTUiy2d
## 4                                                                            #cbr650f #honda #geceler #uzun \xed\xa0\xbc\xed\xbf\x81\xed\xa0\xbc\xed\xbf\x8d\xed\xa0\xbd\xed\xb8\x8e https://t.co/urep3CFxcB
## 5                                                                                                                                         Tandy Bowen #Honda https://t.co/SctcGo1Scq https://t.co/7QzMD625BW
## 6                                                                        engineering at its finest#az #mexican #redneck #me #boost #funny #f4f #nodailylifestyle #turbo #honda #mod… https://t.co/gcS6FUPuxa
##   favorited favoriteCount replyToSN             created truncated
## 1     FALSE             0      <NA> 2016-11-17 22:05:11     FALSE
## 2     FALSE             0      <NA> 2016-11-17 22:05:11      TRUE
## 3     FALSE             0      <NA> 2016-11-17 22:05:07     FALSE
## 4     FALSE             0      <NA> 2016-11-17 22:04:20     FALSE
## 5     FALSE             0      <NA> 2016-11-17 22:02:32     FALSE
## 6     FALSE             0      <NA> 2016-11-17 22:02:19      TRUE
##   replyToSID                 id replyToUID
## 1       <NA> 799372840701075456       <NA>
## 2       <NA> 799372839057059840       <NA>
## 3       <NA> 799372824171323392       <NA>
## 4       <NA> 799372624694419457       <NA>
## 5       <NA> 799372172191932416       <NA>
## 6       <NA> 799372119704485893       <NA>
##                                                                         statusSource
## 1                 <a href="http://www.yahoo.co.jp" rel="nofollow">自動車ニュース</a>
## 2                 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 3 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 4                        <a href="http://instagram.com" rel="nofollow">Instagram</a>
## 5                     <a href="http://motofuze.com" rel="nofollow">MotoFuze Post</a>
## 6 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
##        screenName retweetCount isRetweet retweeted longitude latitude
## 1   ToyotaAlphard            1      TRUE     FALSE      <NA>     <NA>
## 2 GordonMercedes9            0     FALSE     FALSE      <NA>     <NA>
## 3    h_kh41053030          139      TRUE     FALSE      <NA>     <NA>
## 4    UstunbasAkin            0     FALSE     FALSE      <NA>     <NA>
## 5  JACKODIAMONDS1            0     FALSE     FALSE      <NA>     <NA>
## 6 NoDalyLifestyle            0     FALSE     FALSE      <NA>     <NA>

To analyze the results, I needed to tidy the text. I wanted to find the most common words in tweets containing #Honda, so I counted word frequencies and kept the top 20.

# Split on anything that is not a letter, digit, #, @, or an in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Honda_words <- ht_df %>%
  filter(!str_detect(text, '^"')) %>%  # drop tweets that start with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%  # strip links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))

kable(Honda_words %>%
  group_by(word) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(20))
word n
#honda 986
rt 453
honda 212
car 134
#civic 124
#kawasaki 109
#yamaha 108
#civictyper 102
#harley 101
@getoncar 100
@harley 95
xs03 95
dvd 80
systems 80
#finance 74
civic 69
#deals 61
cvt 54
en 46
finance 46

Based on this list, the Honda Civic seems quite popular.
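
To make the tidying step above more concrete, here is a rough base R sketch of what it does to a single tweet. The tweet text is made up for illustration, and strsplit() only approximates unnest_tokens(); in particular, this sketch does not remove stop words, so the exact tokens from the real pipeline may differ.

```r
# Made-up tweet for illustration; not taken from the data above
tweet <- "RT @getoncar: New #Honda Civic Type-R! https://t.co/abc123 &amp; more"

# Same separator pattern as in the pipeline above
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Strip t.co links and the HTML-escaped ampersand, then lowercase
clean <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)
clean <- tolower(clean)

# Split on the separator pattern and drop the empty pieces
tokens <- strsplit(clean, reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]

# Keep only tokens that contain at least one letter
tokens <- tokens[grepl("[a-z]", tokens)]
print(tokens)
```

Hashtags and @mentions survive as single tokens because # and @ are not treated as separators. The hyphen is a separator, though, so Type-R is split into two tokens, and the same would happen to a model name like CX-5.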

Sentimental Honda Word Clouds

I performed a sentiment analysis to see how Twitter users feel about Honda.

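# Note: newer tidytext releases no longer ship the nrc lexicon inside `sentiments`;
# there, get_sentiments("nrc") is the replacement.
nrc <- sentiments %>% filter(lexicon == "nrc") %>% select(word, sentiment)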
kable(head(nrc))
word sentiment
abacus trust
abandon fear
abandon negative
abandon sadness
abandoned anger
abandoned fear

I needed to join nrc to Honda_words and look at the sentiment counts.

Honda_words_sentiments <- Honda_words %>% inner_join(nrc, by = "word")

kable(Honda_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))
sentiment n
positive 255
anticipation 150
joy 102
trust 101
negative 81
surprise 66
fear 51
anger 24
sadness 23
disgust 14

I was not surprised that the top sentiment is positive. Looking at the bottom of the list, “anger” and “sadness” are likely due to users who are having issues with their car or whose car was damaged in an accident.
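
One thing to keep in mind when reading these counts: the nrc lexicon can list the same word under several sentiments (“abandon” appears three times in the head(nrc) output above), so the join produces one row per word-sentiment pair rather than one row per word. The sketch below illustrates this with base R's merge() standing in for inner_join(); the token list is made up, and the lexicon is just the six rows shown earlier.

```r
# The six lexicon rows shown in head(nrc) above
nrc_toy <- data.frame(
  word = c("abacus", "abandon", "abandon", "abandon", "abandoned", "abandoned"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger", "fear")
)

# Made-up tokens: "abandon" occurs twice, "car" is not in the lexicon
words_toy <- data.frame(word = c("abandon", "abandoned", "abandon", "car"))

# merge() keeps only matching words by default, like an inner join
joined <- merge(words_toy, nrc_toy, by = "word")

# One row per (token, sentiment) pair, so 3 matched tokens become 8 rows
counts <- table(joined$sentiment)
print(counts)
```

So the n column in the sentiment table counts word-sentiment pairs, and words missing from the lexicon (like “car” here) drop out entirely.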

It is important to weigh the negative and positive aspects of a Honda if I choose to buy one, so I will start by making a word cloud of the negative-coded words.

cloud_negative <- brewer.pal(8, "Set1")
Honda_words_sentiments %>%
  filter(sentiment == "negative") %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
                 random.order = FALSE, rot.per = .15, colors = cloud_negative))

Since I am not super knowledgeable about cars, I was unfamiliar with the word “cyclone” in relation to Honda. I found that there is a Honda dealership called Cyclone Honda and there is a new engine called the Cyclone in some of the newer models. However, this seems to be a small engine and used mostly in motorcycles. I would be looking at the CR-V, so I don’t think this should be an issue, but I will be sure to double check that.

Then I made a word cloud of the positive-coded words.

cloud_positive <- brewer.pal(8, "Set1")
Honda_words_sentiments %>%
  filter(sentiment == "positive") %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
                 random.order = FALSE, rot.per = .15, colors = cloud_positive))

I am not surprised to see “Accord” as a leading positive coded word. I know I loved mine!

Most Active Honda Users

I wanted to see whether Honda’s own Twitter account used #Honda the most, or whether it was mostly “fans” of Honda using the hashtag.

kable(ht_df %>% 
  group_by(screenName) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n)) %>%
  top_n(10))
screenName n percent_of_tweets
BuyHondaCivic 29 0.029
oldrugbygrump62 14 0.014
anthony212 9 0.009
BoskyB 9 0.009
harleypartsman7 9 0.009
motorcycle_manu 9 0.009
UsedVehicleSale 9 0.009
bareno_motor 8 0.008
alyssonsccp10 7 0.007
vamp_kris2102 6 0.006

I particularly like the second username. I was surprised that the list is made up entirely of “fans” and that none of the accounts is Honda’s own.

Mazda Tweets

I am going to do the same thing for #Mazda that I did for #Honda. I started by pulling the last 1000 tweets that used #Mazda and converting them into a data frame.

num_tweets <- 1000
mt <- searchTwitter('#Mazda', n = num_tweets)
head(mt)
mt_df <- twListToDF(mt)
head(mt_df)
##                                                                                                                                         text
## 1 Третий кроссовер Opel дебютирует осенью 2017 года #Mazda CX-9 #Mazda CX-9 #Mazda CX-9 #opel #opel-grandland #fr... https://t.co/haYoBwt6nN
## 2            ¡El #Mazda 6 con un exterior deportivo e interior renovado tiene un carácter elegante e incorformista➡… https://t.co/jefgBbzbJk
## 3     Mazda RX 7 S Model (1984) #Mazda https://t.co/AetuN9QpYX This car is a true collector s dream! We purchased it https://t.co/DooVfixpnk
## 4   Impresionante diseño del #MAZDA #rt24p que competirá en #IMSA 2017 haciendo su debut en las 24 horas de Daytona… https://t.co/kxOq7LV0ye
## 5                           RT @blumer_miss: I have car finance with first response – can i upgrade? #mazda #finance https://t.co/VFwJXzV86I
## 6                     Pretty much sums up how I spend every other Wednesday. #Mazda #Mx5 #mk1 #RustBucket #Drifting… https://t.co/RR58uCEDxv
##   favorited favoriteCount replyToSN             created truncated
## 1     FALSE             0      <NA> 2016-11-17 22:00:57     FALSE
## 2     FALSE             0      <NA> 2016-11-17 22:00:55      TRUE
## 3     FALSE             0      <NA> 2016-11-17 22:00:12     FALSE
## 4     FALSE             1      <NA> 2016-11-17 21:58:46      TRUE
## 5     FALSE             0      <NA> 2016-11-17 21:57:48     FALSE
## 6     FALSE             0      <NA> 2016-11-17 21:55:25     FALSE
##   replyToSID                 id replyToUID
## 1       <NA> 799371774559481860       <NA>
## 2       <NA> 799371765596176384       <NA>
## 3       <NA> 799371588273635328       <NA>
## 4       <NA> 799371224463867904       <NA>
## 5       <NA> 799370983438106624       <NA>
## 6       <NA> 799370382591410176       <NA>
##                                                         statusSource
## 1    <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
## 2    <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
## 3       <a href="http://bay2car.com" rel="nofollow">capipaula182</a>
## 4 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 5 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 6        <a href="http://instagram.com" rel="nofollow">Instagram</a>
##        screenName retweetCount isRetweet retweeted   longitude    latitude
## 1     livecars_ru            0     FALSE     FALSE        <NA>        <NA>
## 2    Tempul_Mazda            0     FALSE     FALSE        <NA>        <NA>
## 3    capipaula182            0     FALSE     FALSE        <NA>        <NA>
## 4 guardianesdrive            0     FALSE     FALSE        <NA>        <NA>
## 5      KalaVachon            5      TRUE     FALSE        <NA>        <NA>
## 6       JCOStunts            0     FALSE     FALSE -0.59730173 52.23786032

# Same tidying steps as for the Honda tweets
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Mazda_words <- mt_df %>%
  filter(!str_detect(text, '^"')) %>%  # drop tweets that start with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%  # strip links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))

kable(Mazda_words %>%
  group_by(word) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(20))
word n
#mazda 1013
rt 506
#imsa 227
rt24 224
imsa 190
@f1gate 176
cx 169
#rt24p 157
#laautoshow 100
mazda 100
#cx5 65
car 64
finance 64
en 63
#finance 61
de 55
el 48
prototype 46
la 45
#cars 36

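I would likely buy the CX-5 if I went with a Mazda, so I was happy to see #cx5 on the list. The token cx also ranks high, but because the tokenizer splits on hyphens it lumps together every CX model (the CX-9 shows up in the sample tweets above, for example).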

Sentimental Mazda Word Clouds

I performed a sentiment analysis to see how Twitter users feel about Mazda.

# In newer tidytext releases, get_sentiments("nrc") replaces this filter
nrc <- sentiments %>% filter(lexicon == "nrc") %>% select(word, sentiment)
kable(head(nrc))

I joined nrc to Mazda_words and looked at the sentiment counts.

Mazda_words_sentiments <- Mazda_words %>% inner_join(nrc, by = "word")

kable(Mazda_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))
sentiment n
positive 282
anticipation 126
trust 117
joy 89
negative 80
anger 53
sadness 46
surprise 44
fear 33
disgust 24

Again, I was not surprised to see many more positive than negative words in the #Mazda tweets.

I will start by making a word cloud of the negative-coded words.

cloud_negative <- brewer.pal(8, "Set1")
Mazda_words_sentiments %>%
  filter(sentiment == "negative") %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
                 random.order = FALSE, rot.per = .15, colors = cloud_negative))

I am confused as to how “war” ended up in the word cloud.

Now, I will make a word cloud of the positive-coded words.

cloud_positive <- brewer.pal(8, "Set1")
Mazda_words_sentiments %>%
  filter(sentiment == "positive") %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 150, scale = c(5, .5), min.freq = 1,
                 random.order = FALSE, rot.per = .15, colors = cloud_positive))

Nothing here was a surprise to me.

Most Active Mazda Users

kable(mt_df %>% 
  group_by(screenName) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n)) %>%
  top_n(10))
screenName n percent_of_tweets
livecars_ru 15 0.015
RReloadedGarage 12 0.012
MoeDrives 10 0.010
parkmazda 10 0.010
ChrisAnhill 9 0.009
Hua_Jiun_Tuan 9 0.009
mattdrivescom 9 0.009
LatestMazda 7 0.007
mazdaknoop 7 0.007
adriana_Boho 5 0.005
MAZDA_RX7_BOT 5 0.005
stroisila 5 0.005
topiclyev 5 0.005

Similar to the #Honda tweets, these are all “fans” of Mazda.
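
The Mazda table has 13 rows even though the code asks for top_n(10). As far as I understand it, top_n() keeps every row that ties with the cutoff instead of cutting the list at exactly ten. The base R sketch below mimics that behavior with a minimum-rank rule; the counts are copied from the table above, and the ranking rule is my assumption about how top_n() handles ties.

```r
# Tweet counts per user, copied from the Mazda table above
n <- c(livecars_ru = 15, RReloadedGarage = 12, MoeDrives = 10, parkmazda = 10,
       ChrisAnhill = 9, Hua_Jiun_Tuan = 9, mattdrivescom = 9, LatestMazda = 7,
       mazdaknoop = 7, adriana_Boho = 5, MAZDA_RX7_BOT = 5, stroisila = 5,
       topiclyev = 5)

# Rank from largest to smallest; tied users share the lowest rank of their group
rnk <- rank(-n, ties.method = "min")

# Keeping every rank <= 10 pulls in all four users tied at 5 tweets
top <- n[rnk <= 10]
print(length(top))
```

The four users with 5 tweets each all share rank 10, which is why they all make the cut.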

Comparison of Mazda Tweets to Honda Tweets

I want to finish by comparing the sentiment of the tweets for both makes of cars.

Honda_words_sentiments$make <- "Honda"
Mazda_words_sentiments$make <- "Mazda"
words_sentiments <- rbind(Honda_words_sentiments, Mazda_words_sentiments)
sent_df <- words_sentiments %>% 
  group_by(make, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n))

ggplot(sent_df, aes(x = sentiment, y = frequency, fill = make)) + 
  geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Set1") +
  xlab("Sentiment") +
  ylab("Share of sentiment-coded words") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Overall, Twitter users seem to feel about the same about Honda and Mazda; the sentiment shares differ only slightly. Anger, anticipation, and surprise show the “biggest” differences, at roughly three percentage points each. Words from #Honda tweets have slightly higher shares of fear, surprise, joy, and anticipation, while words from #Mazda tweets have slightly higher shares of anger, disgust, sadness, and trust.
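
The comparison in the plot can be cross-checked by hand from the two sentiment tables above. This is a minimal sketch that retypes those counts as plain vectors (so it runs without the Twitter data) and computes each sentiment's share within its own make, the same idea as the n/sum(n) step in the code above.

```r
# Sentiment counts copied from the two tables above
honda <- c(positive = 255, anticipation = 150, joy = 102, trust = 101,
           negative = 81, surprise = 66, fear = 51, anger = 24,
           sadness = 23, disgust = 14)
mazda <- c(positive = 282, anticipation = 126, trust = 117, joy = 89,
           negative = 80, anger = 53, sadness = 46, surprise = 44,
           fear = 33, disgust = 24)

# Share of each sentiment within its own make (same idea as n / sum(n))
honda_share <- honda / sum(honda)
mazda_share <- mazda[names(honda)] / sum(mazda)  # align the order with honda

# Positive values mean a larger share for Honda, negative for Mazda
gap <- round(honda_share - mazda_share, 3)
print(sort(gap))
```

Every gap stays under four percentage points, which fits the impression that the two makes look very similar on Twitter.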

Based on this data, Mazda and Honda seem pretty similar. I did not find anything that will help me pick between the two makes of car. I am going to have to decide based on how the test drives go!