Introduction

This project examines tweets about seven popular consumer brands: Disney, McDonald’s, Microsoft, Nintendo, Samsung, Sony and Starbucks. It analyzes trends among the users who tweeted about these brands and the topics associated with each brand.

Methodology

A total of 1,133,796 tweets containing at least one of the keywords “Disney,” “McDonalds,” “Microsoft,” “Nintendo,” “Samsung,” “Sony” and “Starbucks” were collected through a Python API on August 6-8 and August 17-18, 2015. The raw tweets were converted from JSON to CSV format in Python. Only 10 selected fields were retained, such as the tweet text, Twitter username, language, location and number of followers.
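The JSON-to-CSV step can be sketched in Python as follows. This is a minimal illustration, not the project's script. It assumes the raw tweets were saved one JSON object per line, and it reads standard Twitter API v1.1 field names (`created_at`, `lang`, `user.followers_count` and so on). The output column names follow the R output shown later in this report. The `FIELDS` mapping is hypothetical: it covers nine of the columns visible in that output, and the project's exact list of ten fields is not given.

```python
import csv
import io
import json

# Hypothetical mapping: output column -> key path in a Twitter API v1.1 tweet.
FIELDS = {
    "text": ("text",),
    "time_created": ("created_at",),
    "retweet_counts": ("retweet_count",),
    "language": ("lang",),
    "screen_name": ("user", "screen_name"),
    "followers": ("user", "followers_count"),
    "friends": ("user", "friends_count"),
    "statuses": ("user", "statuses_count"),
    "locations": ("user", "location"),
}


def get_path(obj, path):
    """Follow a tuple of keys into nested dicts; return '' if missing or null."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return ""
        obj = obj[key]
    return "" if obj is None else obj


def json_lines_to_csv(lines, out):
    """Write one CSV row per tweet, keeping only the columns in FIELDS."""
    writer = csv.DictWriter(out, fieldnames=list(FIELDS))
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines from the stream
        tweet = json.loads(line)
        if "text" not in tweet:
            continue  # skip non-tweet messages such as delete notices
        writer.writerow({col: get_path(tweet, path) for col, path in FIELDS.items()})


# Usage with one in-memory sample line; in practice `lines` is an open file.
sample = ['{"text": "Loving my new Samsung TV", "created_at": "Thu Aug 06 22:27:30 +0000 2015", '
          '"retweet_count": 0, "lang": "en", "user": {"screen_name": "example", '
          '"followers_count": 10, "friends_count": 5, "statuses_count": 100, "location": null}}']
buf = io.StringIO()
json_lines_to_csv(sample, buf)
print(buf.getvalue())
```

An empty or null profile location becomes an empty CSV field. This matches the empty location fields discussed in the next step.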

Next, the CSV file was processed in R in several steps. Tweets whose text contained more than one keyword were removed from the data frame, leaving 1,126,272 tweets in the data set. The original data set contained 129,968 distinct locations, so most of the tweets were relabeled using regular expressions and assigned to a country based on their city or state. Many tweets in the original data set did not disclose the user’s location (i.e., the field was empty). Ambiguous locations such as “Everywhere,” “Your Phone” or “Space” were also removed from the location field. As a result, 491,886 tweets have an empty location field. Another 168,591 tweets still have mislabeled locations because of time constraints.
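The keyword filter and the location relabeling were done in R. The Python sketch below only illustrates the logic. The brand patterns and the few location rules are hypothetical examples, not the project's actual regular expressions, which covered far more cities and states.

```python
import re

# Hypothetical brand patterns (case-insensitive). The optional apostrophe covers
# "McDonalds", "McDonald's" and "McDonald’s".
BRANDS = {
    "Disney": r"disney",
    "McDonald's": r"mcdonald[’']?s",
    "Microsoft": r"microsoft",
    "Nintendo": r"nintendo",
    "Samsung": r"samsung",
    "Sony": r"sony",
    "Starbucks": r"starbucks",
}

# Hypothetical location rules: the first matching pattern decides the country.
LOCATION_RULES = [
    (r"\b(usa|united states|new york|california|texas|chicago)\b", "United States"),
    (r"\b(uk|united kingdom|england|london|scotland)\b", "United Kingdom"),
    (r"\b(brasil|brazil|são paulo|rio de janeiro)\b", "Brazil"),
]
# Free-text "locations" that are not places are blanked out.
NON_PLACES = re.compile(r"^(everywhere|your phone|space)$", re.IGNORECASE)


def assign_brand(text):
    """Return the single brand a tweet mentions, or None if it mentions zero or several."""
    hits = [b for b, pat in BRANDS.items() if re.search(pat, text, re.IGNORECASE)]
    return hits[0] if len(hits) == 1 else None


def map_country(location):
    """Map a free-text profile location to a country; '' if empty or not a place."""
    loc = (location or "").strip()
    if not loc or NON_PLACES.match(loc):
        return ""
    for pattern, country in LOCATION_RULES:
        if re.search(pattern, loc, re.IGNORECASE):
            return country
    return loc  # unmatched locations are left as-is (still unlabeled)
```

Tweets for which `assign_brand` returns `None` would be dropped. Ordering `LOCATION_RULES` by priority keeps the mapping deterministic when a location string matches more than one rule.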

The 841,475 English-language tweets were subset into a separate data frame and given two treatments. The first treatment was for sentiment analysis. The tweets were tagged as positive and/or negative using dictionaries adapted from the Harvard General Inquirer. Tweets that were not tagged with any sentiment were marked as neutral. Tweets that contained both positive and negative sentiments were removed from the data frame, leaving 327,740 tweets for brand sentiment analysis. The second treatment was for text mining. All 841,475 English tweets were split by brand and converted into term-document matrices after removing stop words, numbers and punctuation from their text. The term-document matrices for all seven brands were saved for later analysis.
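Both treatments can be sketched in a few lines of Python. This is a minimal sketch, not the project's R code. The small `POSITIVE`, `NEGATIVE` and `STOP_WORDS` sets are placeholders; the real General Inquirer categories and stop-word lists are much larger. Stripping URLs before tokenizing is an additional assumption. The term-document matrix is stored sparsely, mapping each term to {tweet index: count}.

```python
import re
from collections import Counter, defaultdict

# Placeholder dictionaries standing in for the Harvard General Inquirer lists.
POSITIVE = {"love", "great", "happy", "awesome", "wonderful"}
NEGATIVE = {"bad", "hate", "worst", "broken", "sad"}
STOP_WORDS = {"the", "a", "an", "and", "i", "my", "is", "to", "of", "at", "this"}


def tokenize(text):
    """Lowercase, strip URLs (assumption), keep runs of letters only.

    Keeping letters only also removes numbers and punctuation.
    """
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def tag_sentiment(text):
    """Return 'positive', 'negative', 'neutral', or None for mixed tweets (dropped)."""
    words = set(tokenize(text))
    pos, neg = bool(words & POSITIVE), bool(words & NEGATIVE)
    if pos and neg:
        return None  # contains both sentiments: removed from the analysis
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"


def term_document_matrix(docs):
    """Sparse term-document matrix: term -> {doc index: count}, stop words removed."""
    tdm = defaultdict(dict)
    for j, doc in enumerate(docs):
        counts = Counter(w for w in tokenize(doc) if w not in STOP_WORDS)
        for term, n in counts.items():
            tdm[term][j] = n
    return dict(tdm)
```

A word-level tagger like this one ignores context. For example, `tag_sentiment("Bad Boys 3 confirmed by Sony Pictures")` returns `"negative"` only because of the word “bad.”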

Trend Analysis

Disney is the most tweeted brand, followed by Samsung, Microsoft and Sony. Together these four brands make up 80.7% of the Twitter buzz. This is unsurprising: Disney, Samsung and Sony offer a wide range of products and media services, and Microsoft released its new operating system, Windows 10, just a week before the tweets were collected. On the other hand, McDonald’s is the least tweeted brand, with fewer than half as many tweets as its beverage rival Starbucks. This, too, is expected, because McDonald’s has been reported on multiple occasions for its failed Twitter campaigns. Consumers may also tweet about its products, such as the McNuggets or the Big Mac, without mentioning the brand directly, because the product names are familiar on their own.

Most of the tweets originated from the United States. There, Disney, Samsung and Sony received approximately equal shares of tweets. Microsoft received fewer mentions than Sony from American users, even though it is the third most tweeted brand overall. Among the top 10 non-US countries, Disney is the most tweeted brand in the United Kingdom, Japan, Brazil, Canada, Mexico and Spain. This indicates that Disney has a wide global presence and that its brand is broadly recognized by consumers worldwide. Unlike Disney, Nintendo receives significant mentions only in Japan (its home country), the United States and the United Kingdom. Sony, Nintendo’s game console rival, is more widely tweeted in the United Kingdom, Japan, Russia, Mexico, Spain, France and India. This suggests that Nintendo has a much narrower Twitter fanbase than Sony, especially in North America and Europe. Samsung is the dominant brand in Indonesia, where the company has recently invested in new manufacturing operations. McDonald’s has negligible mentions outside the United States and the United Kingdom despite being a global brand. Starbucks received notable mentions only in the United States, the United Kingdom, Canada and Mexico. Interestingly, Russia and India exhibit similar patterns: in both, most tweets are about Samsung, Microsoft and Sony.
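The country comparisons above reduce to a cross-tabulation of country against brand, normalized within each country. Here is a Python sketch. It assumes one (country, brand) pair per tweet after location relabeling. The toy rows, and the choice to break ties alphabetically, are illustrative and not part of the original analysis.

```python
from collections import Counter, defaultdict


def brand_mix_by_country(rows):
    """rows: list of (country, brand) pairs, one per tweet.

    Returns {country: {brand: share of that country's tweets}}.
    Tweets with an empty country (blank location field) are skipped.
    """
    counts = defaultdict(Counter)
    for country, brand in rows:
        if country:
            counts[country][brand] += 1
    return {
        country: {brand: n / sum(c.values()) for brand, n in c.items()}
        for country, c in counts.items()
    }


def top_countries(rows, k, exclude=("United States",)):
    """The k countries with the most located tweets, excluding the listed ones."""
    totals = Counter(country for country, _ in rows if country and country not in exclude)
    return [country for country, _ in totals.most_common(k)]


def top_brand(mix, country):
    """Most tweeted brand in a country (ties broken alphabetically)."""
    shares = mix[country]
    return min(shares, key=lambda b: (-shares[b], b))
```

Calling `top_countries(rows, 10)` gives the top 10 non-US countries. `top_brand` then picks the leading brand for each of them.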

Brand sentiment analysis reveals that Disney has a 1:1 ratio of positive to negative tweets. Microsoft, Starbucks and Nintendo have more positive tweets than negative tweets, even though Starbucks and Nintendo were not widely tweeted about in many countries. In fact, Nintendo has a 2:1 positive-to-negative ratio despite its narrow Twitter fanbase. By contrast, Samsung, Sony and McDonald’s have more negative tweets than positive tweets. In particular, Samsung has twice as many negative mentions as positive mentions on Twitter despite its popularity.
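The ratios above compare positive and negative counts, with neutral tweets left out. Below is a minimal Python sketch, assuming each tweet has already been tagged as in the methodology. The key may be a brand or, for the per-country comparison, a (brand, country) tuple; grouping this way is an illustrative choice.

```python
from collections import Counter


def pos_neg_ratio(tagged):
    """tagged: iterable of (key, sentiment) pairs, where sentiment is 'positive',
    'negative' or 'neutral' and key is a brand or a (brand, country) tuple.

    Returns {key: positive count / negative count}. Neutral tweets are ignored,
    and a key with no negative tweets maps to infinity.
    """
    pos, neg = Counter(), Counter()
    for key, sentiment in tagged:
        if sentiment == "positive":
            pos[key] += 1
        elif sentiment == "negative":
            neg[key] += 1
    return {k: pos[k] / neg[k] if neg[k] else float("inf") for k in set(pos) | set(neg)}
```

A 2:1 brand such as Nintendo maps to `2.0`, and a brand with twice as many negative as positive mentions maps to `0.5`.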

Examining how brand sentiment is distributed across countries shows that Disney has more negative than positive tweets in four of the five countries that tweeted most about the brand, particularly the United Kingdom. The tweets about Samsung and Sony in the United States are overwhelmingly negative. It is possible that many of these tweets compare their products to those of mobile device rival Apple. On the other hand, the tweets about Microsoft in the United States are mostly positive. In the United Kingdom, India, Canada and Australia, sentiment about Microsoft is evenly split between positive and negative. It is possible that Windows 10 received rave reviews on Twitter from American consumers. Likewise, the tweets about Starbucks and Nintendo are primarily positive in the United States and other countries. The tweets about McDonald’s, however, are mostly negative in the United States, the United Kingdom, Canada, Australia and the Philippines.

Among the seven brands, Disney shows the widest distribution of Twitter followers and friends (the accounts a user follows). Users between the 50th and 75th percentiles appear to be very well connected: the 75th percentile user has about 1,200 followers and follows about 1,000 other users. In other words, users who tweeted about Disney tend to have more followers and to follow more people than users who tweeted about the other six brands. It is worth noting that the median users who tweeted about Disney, Starbucks and McDonald’s all have approximately the same numbers of followers and friends.
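The percentile comparisons can be reproduced with Python's standard library. `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as R's default `quantile()` (type 7). The report does not say whether repeat users were deduplicated, so this sketch simply summarizes whatever rows it is given. The `brand` and `followers` keys mirror the columns in the R output below.

```python
from statistics import quantiles


def percentile_summary(values):
    """25th, 50th and 75th percentiles with linear interpolation (R type 7)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"p25": q1, "p50": median, "p75": q3}


def summarize_by_brand(users, field):
    """users: iterable of dicts with a 'brand' key and numeric fields such as
    'followers', 'friends' or 'statuses'. Brands with fewer than two rows are
    skipped, because quantiles() needs at least two data points."""
    groups = {}
    for u in users:
        groups.setdefault(u["brand"], []).append(u[field])
    return {b: percentile_summary(v) for b, v in groups.items() if len(v) >= 2}
```

Running `summarize_by_brand(users, "statuses")` gives the same kind of comparison for total tweet counts.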

However, users who tweeted about Microsoft, Samsung, Sony and Nintendo have significantly higher total tweet counts than those who tweeted about Disney, Starbucks and McDonald’s. In particular, for each of those four electronics brands, the 75th percentile user has posted more than 150,000 tweets in total. Many spambots on Twitter may be promoting or selling these companies’ electronic products, inflating the total tweet counts. Users who tweeted about Starbucks and McDonald’s have similar distributions and much lower total tweet counts, suggesting that most of those users are likely real people.

A comparison of individual users reveals that even the outliers have significantly more followers than friends. A large number of users have posted more than 500,000 tweets in total and tend to tweet about the electronics brands, yet they have few followers or friends. This lends credence to the hypothesis that many such users are spambots. By contrast, users with large numbers of followers (1.5 million or more) or friends (125,000 or more) tweeted about a diverse range of brands. These users generally have fewer than 500,000 total tweets.
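The spambot hypothesis can be turned into a simple screening rule: flag accounts with a very high lifetime tweet count but few followers and few friends. This is a hypothetical heuristic, not a method used in the report. The 500,000-tweet threshold comes from the text above, and the `few=100` cutoff is an arbitrary assumption. The follower condition matters. A news aggregator such as the one shown below has over five million tweets but many followers, so it is not flagged.

```python
from collections import Counter


def looks_like_spambot(user, min_statuses=500_000, few=100):
    """Hypothetical heuristic: very high tweet volume with few followers and friends.

    min_statuses mirrors the 500,000-tweet mark discussed above; `few` is an
    arbitrary cutoff chosen for illustration.
    """
    return (user["statuses"] > min_statuses
            and user["followers"] < few
            and user["friends"] < few)


def flagged_share_by_brand(users, **kwargs):
    """Fraction of each brand's users that the heuristic flags."""
    total, flagged = Counter(), Counter()
    for u in users:
        total[u["brand"]] += 1
        flagged[u["brand"]] += looks_like_spambot(u, **kwargs)
    return {b: flagged[b] / total[b] for b in total}
```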

Comedian Ellen DeGeneres (@TheEllenShow) has the highest number of Twitter followers among all users (46.4 million followers). Her tweet refers to Disney and to Emma Watson’s upcoming movie “Beauty and the Beast.”

##                                                                                                                         text
## 1: Can’t wait to bring another wonderful Disney character to life. All I’m asking for is an audition. http://t.co/TUQypFTw5U
##                      time_created retweet_counts language  screen_name
## 1: Mon Aug 17 21:19:26 +0000 2015              0       en TheEllenShow
##    followers friends statuses     locations  brand
## 1:  46433802   36789    10856 United States Disney

Brazilian Twitter user Antonio J Campos (@ajcampos01) has the highest number of Twitter friends among all users (1.2 million friends) and tends to tweet about tech products and services. His tweet below, in Portuguese, reads “Microsoft releases first Windows 10 update.”

##                                                                                text
## 1: Microsoft libera primeira atualização do Windows 10 http://t.co/iWd84K2dyB #tech
##                      time_created retweet_counts language screen_name
## 1: Thu Aug 06 22:27:30 +0000 2015              0       pt  ajcampos01
##    followers friends statuses locations     brand
## 1:   1214415 1206790    40470    Brazil Microsoft

Noticias Venezuela (@notiven), a Venezuelan news curation and aggregation service, has the highest total tweet count (5.28 million tweets). Its tweet below, in Spanish, shares a news article reporting Disney’s plan to build two Star Wars theme parks (“Disney announces the construction of two ‘Star Wars’ theme parks”).

##                                                                                                                text
## 1: RT: @elmundomovil :Disney anuncia la construcción de dos parques temáticos de "Star Wars" http://t.co/CflaKYoAft
##                      time_created retweet_counts language screen_name
## 1: Tue Aug 18 01:15:56 +0000 2015              0       es     notiven
##    followers friends statuses locations  brand
## 1:     27850     207  5280886 Venezuela Disney

Text Mining

The word network for each brand shows how frequent terms are associated with one another across the tweets. The clusters within each network reveal the main topics of discussion, most of which concern the brands’ products and services. For Disney, there are discussions of Star Wars, concerning either the Star Wars Episode VII film or the announced Star Wars theme parks. Another cluster centers on the tweet “Disney and Pixar will never be able to top this.” This tweet was posted by two users, Dory and Kardashian Reactions, and shared by thousands of other users. The word network indicates that two other tweets by Kardashian Reactions were also widely shared:
“It’s been 8 YEARS since High School Musical 2 premiered on Disney Channel on August 17, 2007”
“IM LITERALLY FREAKING OUT ABOUT ALL THE NEW UPCOMING THINGS DISNEY RELEASED”
The second tweet refers to Disney’s Toy Story Land as well as the animated films Zootopia, Toy Story 4, Cars 3 and The Incredibles 2. This suggests that Kardashian Reactions is a highly influential user in Disney-related conversations on Twitter, because her tweets are viewed and discussed by a large number of other users.
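The report does not describe how its word networks were built. One common approach, sketched here as an assumption, links two frequent terms whenever they appear in the same tweet and weights each link by the number of such tweets. A tweet that is retweeted thousands of times adds weight to every pair of its words at once. This is why widely shared tweets, such as the “Disney and Pixar” one, show up as tight clusters. `top_n` and `min_weight` are hypothetical parameters.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(token_lists, top_n=50, min_weight=2):
    """Weighted co-occurrence edges between frequent terms.

    token_lists: one list of cleaned terms per tweet.
    Only the top_n terms by document frequency become nodes. An edge's weight
    is the number of tweets containing both terms; weaker edges are dropped.
    """
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    nodes = {t for t, _ in doc_freq.most_common(top_n)}
    edges = Counter()
    for tokens in token_lists:
        present = sorted(set(tokens) & nodes)  # sorted so each pair has one key
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}
```

The resulting edge dictionary can be passed to any graph-drawing library. Clusters then appear as densely connected groups of terms.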

Samsung’s word network shows tweets about its LED smart TVs and its smartphones, notably the newly released Galaxy S6 Edge+ and Galaxy Note 5. There are also frequent mentions of its mobile rivals HTC and LG, of iPhones, and of Google’s Android mobile operating system. However, there are no significant direct mentions of Apple.

Microsoft’s word network displays a large cluster about the Xbox, NTSC (the North American video standard), the Kinect sensor and eBay. It is likely that many tweets about the brand were trying to sell bundles of Microsoft products on eBay. There are also word clusters about the use of Microsoft Surface Pro tablets on NFL sidelines. Another cluster concerns the new Windows Bridge tool, which helps software developers bring iOS apps to the Windows operating system.

Sony has one cluster of words about its music service, the English band One Direction and their newest hit song “Drag Me Down.” Another cluster concerns its PlayStation console system. There are also tweets about Sony Pictures’ recently announced sequels to the film “Bad Boys,” and about Sony’s new Alpha and Cyber-shot cameras. Frequent mentions of the term “Bad Boys” may skew Sony’s sentiment analysis toward negative. A dictionary-based tagger would likely count “bad” as a negative word regardless of context.

Starbucks’ word network reveals frequent mentions of the new Pumpkin Spice Lattes from Starbucks and Panera, which use real pumpkin and omit caramel coloring. The network also displays a widely shared tweet by social media celebrity Christian Collins, “Starbucks we need a mint drink.” Another widely shared tweet is by the aforementioned Kardashian Reactions: “better not take them to starbucks then or they’ll turn into a (profanity) latte.”

Interestingly, McDonald’s word network features little about its food or beverage products. Instead, it includes a widely shared tweet by Deveoh saying that McDonald’s burgers do not rot or mold, in reference to the company’s heavy use of preservatives. It also includes a tweet reading “the founding fathers, who all barely washed their (profanity), wanted me to have an assault rifle in this mcdonalds.” However, there are also positive mentions of the Ronald McDonald House Charities and the company’s new mPoints initiatives.

Hierarchical clustering of the frequent terms yields clusters similar to those in the word networks above. In particular, clusters containing ‘ebay,’ ‘full’ and ‘read’ appear in the dendrograms of Samsung, Microsoft, Sony and Nintendo. This suggests that many tweets associated with these four brands were about selling or trading their electronic products. Consumers who sell their devices on eBay typically include “Full read by eBay” in their tweets. As a result, it is difficult to extract meaningful, distinctive clusters from the four electronics brands’ dendrograms. For instance, Samsung’s dendrogram shows clusters related to its products, such as “galaxy” and “note.” It also has large clusters of terms such as “htc,” “iphone” and “unlocked,” which reveal nothing about consumers’ opinions of its products. Similarly, Microsoft’s dendrogram displays terms related to its tablet and software products but offers no insight into what consumers say about the brand. Sony’s dendrogram exhibits topic clusters about the new “Bad Boys” sequels and Sony’s music service. However, it also has a large generic cluster about its electronic devices, such as the Xperia and PSP. Likewise, Nintendo’s clusters are primarily about its Nintendo DS device and popular games such as Pokemon and Super Mario.
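The report does not state which distance measure or linkage method was used; R's `hclust` supports several. As an illustration only, the sketch below performs complete-linkage agglomerative clustering on term vectors, that is, on rows of a term-document matrix restricted to frequent terms. It then stops at `k` clusters, which is equivalent to cutting the dendrogram. In the toy data, ‘ebay,’ ‘full’ and ‘read’ occur in the same tweets and therefore merge into one cluster, which is the effect described above. The naive search is cubic in the number of terms, so it is only suitable for a few dozen frequent terms.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(vectors, a, b):
    """Cluster distance = largest pairwise distance between their terms."""
    return max(euclidean(vectors[x], vectors[y]) for x in a for y in b)


def agglomerate(vectors, k):
    """Repeatedly merge the two closest clusters until k remain.

    vectors: {term: counts across tweets}, i.e. rows of a term-document matrix.
    Returns the k clusters as sorted lists of terms.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[t] for t in sorted(vectors)]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: complete_linkage(vectors, clusters[ij[0]], clusters[ij[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return sorted(clusters)
```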

Brand Word Clouds

The following word clouds were created from the most frequently used terms in the tweets associated with each brand. The size of each word increases with its frequency of appearance.

Conclusion

This project applies visualization and text mining to discover trends among the users who tweeted about popular consumer brands and the topics associated with each brand. Sentiment analysis shows whether consumers are generally positive or negative about each brand, but the dictionaries need to be refined to avoid incorrect sentiment tagging. For instance, the frequent tweets about the movie “Bad Boys” have unintentionally skewed Sony’s consumer sentiment toward negative. It is also challenging to conduct geospatial analysis of how sentiment differs across countries, because most of the tweets originated from the United States. Likewise, tweets not written in English were excluded from both sentiment analysis and text mining. This prevents us from learning what non-English-speaking consumers tweeted about the brands.

Among the seven brands, it is easier to observe brand-related topics and influential users for Disney, Starbucks and McDonald’s. There are many tweets about the products of Samsung, Microsoft, Sony and Nintendo. However, it is difficult to extract what consumers say about these brands because of spambots and people trying to sell their goods on eBay. Further work is required to track the general messages about these electronics brands and to identify whether the tweets are primarily reviews, news or sales listings.