Introduction

In September 2020, Apple, Samsung, and Fitbit introduced their latest iterations of wearable technology to the marketplace. With all of the redesigns and added functionality, investors would like to know which of these offerings will become a smashing success or an abject failure for the ‘Big Three’ of smart watches. To answer this question, an analysis of Twitter data was conducted to gain insight into the popularity of the brands, users’ sentiment toward them, and overall positivity associations.

Summary of Findings

Based on the limited number of tweets collected, Fitbit could be considered the most popular brand, since it has the highest tweet total in the summary table. However, depending on the day of the week, Apple would seem to be the more popular brand. The research indicates that Apple receives significantly more tweets than the others on Thursdays, Fridays, and Saturdays.

Tweet data were subsequently analyzed to detect patterns related to the brands by hour of day. Fitbit is markedly more popular during the 1pm time slot than all other brands surveyed.

The average tweet lengths for Apple, Fitbit, and Samsung are 167, 117, and 130, respectively, in this data sample. Tukey’s Multiple Comparison of Means test suggests that the difference in mean tweet length for Apple is statistically significant at the 95% confidence level, and we can conclude that Apple receives longer tweets, on average, than the other brands.

Overall, the individual brand sentiment scores along the eight attributes tend to reflect those of the aggregated model. Generally speaking, all the brands scored high in positivity, and anger scores were low across the board. However, there are a few notable observations worth mentioning. With regard to trust, Apple and Samsung scored relatively high, unlike Fitbit, which appears to fare below average. The Anticipation category also produced significantly higher scores for Apple and Samsung relative to Fitbit. What is interesting about Fitbit is that the brand tends to receive less ‘negativity’ than the others. However, it is important to note that, given the limited number of tweets and sources analyzed, actual sentiment can vary significantly from what is observed here. F-Score testing will need to be conducted to gauge the effectiveness of the testing and the significance of the results.

With regard to users’ positivity toward the brands, Apple appears to receive the highest positivity rating with 0.42. Surprisingly, Fitbit received a higher score than Samsung, with 0.38 and 0.28, respectively.

Data Collection Method

Data for the study were gathered on 9/26/2020 using the rtweet package and the Twitter API for Apple, Samsung, and Fitbit. Duplicates and retweets were removed. Since Apple does not have an active display name account, the search_tweets2 function was used to extract data using the following search terms:

 * For Apple:   'apple watch', 'applewatch', '@apple', '@apple watch', '#apple', 
               '@applewatch'
 * For Fitbit:  'fitbit', '#fitbit', 'fitbit sense', '@fitbit'
 * For Samsung: 'samsung watch', 'galaxy watch', 'galaxywatch', 'galaxywatch3', 
               '#samsung', 'samsungwatch', 'samsungwatch3', 'galaxy3watch', 
               'samsung3watch','@samsung','galaxy 3 watch', '@galaxywatch', 
               '@samsungwatch3'
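
The removal of duplicates and retweets described above can be sketched as follows. This is a minimal Python illustration, not the code used for the study (which relied on rtweet in R). The sample records are made up, and the field names `status_id` and `is_retweet` are assumptions about how each tweet is represented.

```python
# Hypothetical tweet records; field names are assumptions for illustration.
sample_tweets = [
    {"status_id": "1", "text": "new watch day", "is_retweet": False},
    {"status_id": "1", "text": "new watch day", "is_retweet": False},  # duplicate
    {"status_id": "2", "text": "RT new watch day", "is_retweet": True},  # retweet
    {"status_id": "3", "text": "step goal reached", "is_retweet": False},
]


def drop_retweets_and_duplicates(tweets):
    """Keep the first copy of each original tweet, skipping retweets."""
    seen_ids = set()
    kept = []
    for tweet in tweets:
        if tweet["is_retweet"] or tweet["status_id"] in seen_ids:
            continue
        seen_ids.add(tweet["status_id"])
        kept.append(tweet)
    return kept


kept_tweets = drop_retweets_and_duplicates(sample_tweets)
```

Because several search terms per brand can match the same tweet, filtering on a unique identifier after combining the results keeps each tweet from being counted more than once.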

Let’s take a look at a brief summary table of the data collected.

Table: Data summary

Name Piped data
Number of rows 2706
Number of columns 94
_______________________
Column type frequency:
character 51
factor 1
logical 24
numeric 14
POSIXct 3
________________________
Group variables brand

As indicated in the table above, 2706 tweets were collected across the three brands.

Analysis

How does the popularity of the brands differ daily from each other?

Table: Tweet Total by Brand

Brand Tweets
Fitbit 960
Apple 902
Samsung 844

The table above shows total tweets per brand. Viewed by day, interest in Apple clearly exceeds the others on Fridays, while Fitbit exhibits a more consistent volume of tweets between Tuesday and Saturday. Interestingly, there does not appear to be any tweet activity related to Apple on Monday through Wednesday. This could be due to limits on the amount of data that can be extracted from Twitter, so the results may not reflect true popularity. We will need to take a closer look at the data segmented by day of the week.

Barchart: Tweet Frequency by Day of Week

Results

The visual above suggests that, given the cap on the number of tweets that can be pulled from Twitter, Apple exhausted the quota in only a few days. Apple shows markedly higher tweet volume on Fridays. Thursdays also appear to be a high-volume day for both Apple and Fitbit. Tweet volume for Fitbit seems more uniformly distributed from Tuesday to Saturday. Sundays and Mondays appear to be low-volume days for all the brands in the study.
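
The day-of-week tally behind a chart like this can be sketched in a few lines. The example below is a Python illustration under stated assumptions: the `(brand, created_at)` pairs are invented sample rows, not the study data, and the timestamp format is assumed.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical (brand, created_at) rows; not taken from the study data.
sample_rows = [
    ("Apple", "2020-09-25 13:05:00"),
    ("Apple", "2020-09-24 09:30:00"),
    ("Fitbit", "2020-09-22 13:10:00"),
    ("Fitbit", "2020-09-25 18:45:00"),
    ("Samsung", "2020-09-26 11:00:00"),
]


def tweets_by_weekday(rows):
    """Count tweets per (brand, weekday name)."""
    counts = Counter()
    for brand, created_at in rows:
        stamp = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        counts[(brand, DAY_NAMES[stamp.weekday()])] += 1
    return counts


weekday_counts = tweets_by_weekday(sample_rows)
```

Swapping `DAY_NAMES[stamp.weekday()]` for `stamp.hour` gives the analogous hour-of-day tally. A brand whose quota is exhausted within a few days will show zero counts on the remaining weekdays, which is a collection artifact rather than a popularity signal.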

Does Popularity Differ by Time of Day?

Since the previous chart was inconclusive because of possibly missing data on certain days of the week, a better approach might be to view the average number of tweets by time of day. But first, let’s take a look at the times when people are creating tweets overall.

Barchart: Aggregated Sum of Tweets by Hour of Day

There appears to be a sharp spike in tweet generation for the brands around 1pm, followed by a gradual decline for the remainder of the day. Let’s see if this pattern is consistent when viewed by individual brands.

Barchart: Aggregated Tweets by Hour Segmented by Brand

The overall pattern tends to hold true for each of the brands; however, we do notice a remarkable increase in tweets related to Fitbit around 1pm relative to the others. This could possibly be attributed to office workers doing fitness routines with the aid of their Fitbit fitness trackers during the lunch hour. Fitbit has the highest tweet average at 132.1, followed by Samsung and Apple with 129.9 and 124.3, respectively.

Given the sharp spike in the 1pm tweets, yet another examination is needed.

Boxplot: Number of Tweets Created per Hour, Segmented by Brand

The box plots by individual brands do not appear to reveal any significant differences. The median hour of the day for all brands is around 1pm. Therefore, to answer our initial question, based on the data provided, Fitbit could be considered more popular, given the slightly higher average. However, further work should be done to get the p-value to see if the small difference is statistically significant. Flaws in the data collection could also invalidate the findings.

Results

Based on the limited number of tweets collected, Fitbit could be considered the most popular brand in terms of total tweets. However, depending on the day of the week, Apple would seem to be the more popular brand on Thursdays, Fridays, and Saturdays. If we analyze the data by hour of the day, Fitbit is clearly more popular during the 1pm time slot than all other brands surveyed.

Which Brand receives significantly longer Tweets?

Hypotheses

  • H0: There is no significant difference in the length of tweets among the brands.
  • H1: There is a significant difference in the length of tweets among the brands.

There appears to be a large difference in tweet lengths among the three brands, with Tweets related to Apple being longer. But are the observed differences significant?

Assumptions

For this test, we will assume that the data are approximately normally distributed, given the large sample size. Otherwise, the Shapiro-Wilk test would be run to check for non-normal or skewed distributions.

Next, the assumption of homogeneity of variances should be tested.

Homogeneity of Variance Test

Brand     Mean Width   SD Width   Count
Apple     166.5942     67.47848   902
Fitbit    116.6823     74.43027   960
Samsung   130.4218     62.82960   844

Based on the findings in the table, it appears that there are differences in the mean and standard deviation across the brands.

The third assumption, equal variance, will be tested for validity using Levene’s test.

Levene’s Test for Assumption of Equal Variances

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2   21.64 4.748e-10 ***
##       2703                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value < 0.05, we reject the null hypothesis of equal variance and conclude that the variances are not homogeneous: there is a statistically significant difference between the variances of the brands in the population.

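The equal-variance assumption is therefore violated. We nevertheless proceed with a standard one-way ANOVA test, so its results should be read with this caveat in mind.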
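
When Levene’s test rejects equal variances, one commonly used alternative is Welch’s one-way ANOVA, which does not pool the group variances. The Python sketch below is not part of the original analysis. It only shows how Welch’s F statistic could be computed from the means, standard deviations, and counts reported in the table above, and the function name `welch_anova` is illustrative.

```python
# (mean, sd, n) per brand, copied from the summary table above.
groups = {
    "Apple": (166.5942, 67.47848, 902),
    "Fitbit": (116.6823, 74.43027, 960),
    "Samsung": (130.4218, 62.82960, 844),
}


def welch_anova(stats):
    """Return (F, df1, df2) for Welch's one-way ANOVA from (mean, sd, n) triples."""
    k = len(stats)
    # Each group is weighted by n / variance instead of pooling the variances.
    weights = [n / sd ** 2 for _, sd, n in stats]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, (m, _, _) in zip(weights, stats)) / total_weight
    between = sum(
        w * (m - weighted_mean) ** 2 for w, (m, _, _) in zip(weights, stats)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, (_, _, n) in zip(weights, stats)
    )
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


welch_f, welch_df1, welch_df2 = welch_anova(list(groups.values()))
```

If the Welch statistic is of the same order as the standard ANOVA F value, the conclusion about differing brand means does not hinge on the equal-variance assumption.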

ANOVA Test

## Analysis of Variance Table
## 
## Response: display_text_width
##             Df   Sum Sq Mean Sq F value    Pr(>F)    
## brand        2  1221817  610909  129.58 < 2.2e-16 ***
## Residuals 2703 12743085    4714                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: Given a p-value < 0.05 (here, < 2.2e-16), there is strong evidence to reject the null hypothesis in favor of the alternative and conclude that the brand means are statistically different.
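
As a check on how the ANOVA table is built, the sums of squares and the F value can be reconstructed from the per-brand means, standard deviations, and counts reported earlier. The Python sketch below is illustrative (the analysis itself was run in R), and the function name `anova_from_summary` is made up for this example.

```python
# (mean, sd, n) per brand, copied from the summary table earlier in this section.
groups = {
    "Apple": (166.5942, 67.47848, 902),
    "Fitbit": (116.6823, 74.43027, 960),
    "Samsung": (130.4218, 62.82960, 844),
}


def anova_from_summary(stats):
    """Rebuild one-way ANOVA quantities from (mean, sd, n) triples."""
    total_n = sum(n for _, _, n in stats)
    grand_mean = sum(m * n for m, _, n in stats) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in stats)
    # Within-group (residual) sum of squares: (n - 1) * variance per group.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in stats)
    df_between = len(stats) - 1
    df_within = total_n - len(stats)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = anova_from_summary(
    list(groups.values())
)
```

Up to rounding in the reported means and standard deviations, these values should land close to the table above: sums of squares near 1221817 and 12743085, degrees of freedom 2 and 2703, and an F value near 129.58.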

To determine which of the brands have means that are statistically significantly different, we next need to run Tukey’s test.

Tukey’s HSD Test for Statistical Significance in Mean Differences

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = display_text_width ~ brand, data = tweets)
## 
## $brand
##                     diff        lwr       upr    p adj
## Fitbit-Apple   -49.91194 -57.378273 -42.44561 0.00e+00
## Samsung-Apple  -36.17243 -43.883312 -28.46156 0.00e+00
## Samsung-Fitbit  13.73951   6.142063  21.33696 6.83e-05

Interpretation: There is a statistically significant difference between each pair of brands.
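
To make the reading of the Tukey output explicit: each `diff` is the mean of the first brand minus the mean of the second, and a pair differs significantly at the family-wise 95% level when its interval (`lwr`, `upr`) excludes zero. The Python sketch below only re-checks the reported numbers and does not recompute the critical values; the helper name `check_pair` is illustrative.

```python
# Group means from the summary table earlier in this section.
means = {"Apple": 166.5942, "Fitbit": 116.6823, "Samsung": 130.4218}

# (diff, lwr, upr) copied from the Tukey output above.
tukey_rows = {
    ("Fitbit", "Apple"): (-49.91194, -57.378273, -42.44561),
    ("Samsung", "Apple"): (-36.17243, -43.883312, -28.46156),
    ("Samsung", "Fitbit"): (13.73951, 6.142063, 21.33696),
}


def check_pair(pair, row):
    """Recompute the mean difference and test whether the interval excludes zero."""
    first, second = pair
    _, lwr, upr = row
    recomputed_diff = means[first] - means[second]
    excludes_zero = lwr > 0 or upr < 0
    return recomputed_diff, excludes_zero


results = {pair: check_pair(pair, row) for pair, row in tukey_rows.items()}
```

Both negative differences involve Apple as the second brand, which is why Apple is read as receiving the longest tweets on average.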

Conclusion

Apple receives significantly longer tweets as compared to the other brands at a confidence level of 95 percent.

How do Users Feel About the Brands?

While I will not be revealing individual or specific tweets, since rude or offensive language could crop up, we can aggregate information and gain insights on sentiment and emotions about the various brands.

Data Preparation and Text Cleaning

The text fields of the dataset will first be scrubbed to remove punctuation marks, HTML links, alphanumeric text, and non-ASCII characters. Additionally, all text will be converted to lower case and extra white space removed. Finally, a custom word list will be created to remove highly redundant, yet unintelligible text, along with the names of the brands being analyzed.
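
The cleaning steps above can be sketched as a single function. This is a Python illustration, not the R pipeline used in the study. The stop-word list is hypothetical, "alphanumeric text" is interpreted here as tokens that mix letters and digits (an assumption), and links are stripped before punctuation so that URLs are removed whole.

```python
import re
import string

# Hypothetical custom word list: brand names plus one noise token.
CUSTOM_STOPWORDS = {"apple", "fitbit", "samsung", "amp"}

# Translation table that turns every punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def is_mixed_alphanumeric(token):
    """True for tokens containing both letters and digits, e.g. 'galaxywatch3'."""
    return any(c.isdigit() for c in token) and any(c.isalpha() for c in token)


def clean_tweet(text):
    """Lowercase, strip links, non-ASCII, punctuation, mixed tokens, stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)             # remove links
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII
    text = text.translate(PUNCT_TO_SPACE)                   # remove punctuation
    tokens = text.split()                                   # collapses white space
    tokens = [t for t in tokens if not is_mixed_alphanumeric(t)]
    tokens = [t for t in tokens if t not in CUSTOM_STOPWORDS]
    return " ".join(tokens)


cleaned = clean_tweet("Loving my new Fitbit Sense!! https://t.co/abc123 ⌚ #fitbit")
```

Removing the brand names keeps them from dominating word frequencies and the word cloud, since every collected tweet mentions at least one of them by construction.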

Sentiment Analysis

As defined in a blog by Infegy, sentiment analysis is “the process of algorithmically identifying and categorizing opinions expressed in text to determine the user’s attitude toward the subject of the document or post.” The analysis scores and charts the text along eight emotions, and a positivity rating is derived for each brand. To run sentiment analysis on text, the corpus must first be cleaned and then stemmed. Emotional values are assigned by the sentimentr package.

The graph and table below display the sentiment scores for each of the eight emotions measured across all brands.

Score
anger 238
anticipation 659
disgust 116
fear 283
joy 458
sadness 274
surprise 202
trust 799
negative 560
positive 2076

Overall, the brands enjoy high sentiment scores for positivity and adequate scores for both anticipation and trust. Scores for disgust, surprise, sadness, and anger appear low on the whole.
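
Lexicon-based emotion scores of this kind are, in essence, counts of how many words in the text are tagged with each category. The Python sketch below illustrates the idea with a tiny made-up lexicon; it is not the lexicon or the package used in the study, and the words and their tags are assumptions for illustration only.

```python
from collections import Counter

# Toy lexicon mapping words to emotion/polarity tags (hypothetical entries).
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "reliable": {"trust", "positive"},
    "broken": {"anger", "sadness", "negative"},
    "excited": {"anticipation", "joy", "positive"},
}


def score_emotions(tokens):
    """Add one to each category a token is tagged with in the lexicon."""
    scores = Counter()
    for token in tokens:
        for category in TOY_LEXICON.get(token, ()):
            scores[category] += 1
    return scores


example_scores = score_emotions("love this reliable watch but strap broken".split())
```

Because a single word can carry several tags, category totals overlap, so the scores for different emotions should be compared with each other rather than summed.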

Comparative Sentiment Analysis

Results

Overall, the individual brands’ sentiment scores along the eight attributes tend to reflect those of the aggregated model. Generally speaking, all the brands scored high in positivity, with Fitbit far outpacing the others with a score of 1291, and anger scores were low across the board. However, there are a few notable observations worth mentioning. With regard to trust, Apple and Samsung scored relatively high at 390 and 254, unlike Fitbit, which appears to fare below average. The Anticipation category also produced significantly higher scores for Apple and Samsung relative to Fitbit. What is interesting about Fitbit is that the brand tends to receive less ‘negativity’ than the others. However, it is important to note that, given the limited number of tweets and sources analyzed, actual sentiment can vary significantly from what is observed here. F-Score testing will need to be conducted to gauge the effectiveness of the testing and the significance of the results.

Positivity Rating

Brand Rating
Apple 0.4235588
Fitbit 0.3755729
Samsung 0.2777251

The table presents the positivity rating scores for the three brands. Apple appears to receive the highest positivity score with 0.42. Fitbit received a higher score than Samsung with 0.38 and 0.28, respectively.

Word Cloud

While I will not show specific tweets, this word cloud is helpful in determining whether further steps are warranted in the text cleaning process. Size indicates frequency of term usage.

Interpretation

Aside from a few unintelligible terms, the text cleaning process appears adequate. As you could probably guess, misspelled terms are frequently found on Twitter.

Contributors

Professor J. Assay. Williams College of Business, Xavier University