Brief description of the Project

The purpose of this project is to compare the popularity of Huawei and Samsung, and to explore related patterns, using tweets about each company. I first collected tweets with the hashtags #Huawei and #Samsung through the free Twitter API on Oct 4th. I then analyzed the data through three questions covering different aspects: popularity (number of tweets), favorites and retweets, and date (created_at).

Loading data: tweets with hashtags #huawei and #samsung

Preprocess the data:

In the collected data, the date of each tweet is stored as a string in the column “created_at”. I therefore parse it into a year-month-day hour:minute:second date-time and add two derived columns, “Day_of_Week” and “Hour_of_Day”, to hold the day of the week and the hour of the day.

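library(lubridate)

# Parse the "created_at" strings into date-times
huawei$created_at <- ymd_hms(huawei$created_at)
# Day of the week as a labelled factor, with Monday as the first day
huawei$Day_of_Week <- wday(as.Date(huawei$created_at), label = TRUE, week_start = 1)
# Hour of the day (0-23)
huawei$Hour_of_Day <- hour(huawei$created_at)

# Same steps for the Samsung tweets
samsung$created_at <- ymd_hms(samsung$created_at)
samsung$Day_of_Week <- wday(as.Date(samsung$created_at), label = TRUE, week_start = 1)
samsung$Hour_of_Day <- hour(samsung$created_at)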
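
As a quick sanity check on the conversion, the same three steps can be applied to a single timestamp. This is only a sketch: the sample value below is made up for illustration, and the year in particular is an assumption, since the collection date is given only as Oct 4th.

```r
library(lubridate)

# Hypothetical timestamp in year-month-day hour:minute:second form
ts <- ymd_hms("2020-10-04 13:45:10")

# Numeric day of the week with Monday as day 1, so a Sunday is day 7
day_num <- wday(as.Date(ts), week_start = 1)

# Labelled version, as stored in Day_of_Week (label text depends on locale)
day_lab <- wday(as.Date(ts), label = TRUE, week_start = 1)

# Hour of the day on a 0-23 scale
hr <- hour(ts)

print(day_num)  # 7
print(hr)       # 13
```

Setting week_start = 1 only changes the ordering of the days (Monday first); it does not change which day a tweet falls on.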

Question 1

Question 2

Which brand do people like the most?

Overall, people like Samsung more than Huawei in terms of the number of favorites and retweets.

  1. First, the number of favorites each brand receives per day sheds light on the question.

  • Samsung has more favorites in total than Huawei, which indicates that people favor Samsung over Huawei.

  • Based on the figures, Huawei receives the most favorites on Wednesday and Samsung on Monday.

  • Most interestingly, Samsung receives more favorites in the first two days of the week, while Huawei receives more at the weekend.

  2. Second, I analyze the number of retweets for both brands per day.
  • The total number of retweets for Samsung is far higher than for Huawei.

  • Samsung has the most retweets on Monday and Friday, while Huawei has the most on Wednesday and at the weekend.

  • As with favorites, Huawei tweets are retweeted more at the weekend, while Samsung tweets are retweeted more on weekdays (Monday and Friday).

# The number of favorites per day, summed from "favorite_count"
library(dplyr)
library(ggplot2)

huawei %>%
  group_by(Day_of_Week) %>%
  summarize(Total = sum(favorite_count), .groups = "drop") %>%
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_col(fill = "blue") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Favorites of Huawei per Day") +
  ylim(0, 17000) +
  # Print each total just above its bar
  geom_text(aes(label = Total), vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

# Same plot for Samsung, on the same y-axis range for comparison
samsung %>%
  group_by(Day_of_Week) %>%
  summarize(Total = sum(favorite_count), .groups = "drop") %>%
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_col(fill = "blue") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Favorites of Samsung per Day") +
  ylim(0, 17000) +
  geom_text(aes(label = Total), vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

# The number of retweets per day, summed from "retweet_count"
huawei %>%
  group_by(Day_of_Week) %>%
  summarize(Total = sum(retweet_count), .groups = "drop") %>%
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_col(fill = "pink") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Retweets of Huawei per Day") +
  ylim(0, 5000) +
  geom_text(aes(label = Total), vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

# Same plot for Samsung, on the same y-axis range for comparison
samsung %>%
  group_by(Day_of_Week) %>%
  summarize(Total = sum(retweet_count), .groups = "drop") %>%
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_col(fill = "pink") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Retweets of Samsung per Day") +
  ylim(0, 5000) +
  geom_text(aes(label = Total), vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))
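
The overall totals referred to in this question could also be checked in a single table. The sketch below is not part of the original analysis: it uses two small made-up data frames in place of the real huawei and samsung data, and the "brand" column is a name I introduce only for illustration. Because the two samples differ in size, a per-tweet average is shown next to each total.

```r
library(dplyr)

# Toy stand-ins for the real data frames; all counts are made up
huawei_toy <- data.frame(favorite_count = c(3, 0, 5), retweet_count = c(1, 2, 0))
samsung_toy <- data.frame(favorite_count = c(10, 4), retweet_count = c(6, 1))

# Stack both brands, labelling rows through a hypothetical "brand" column,
# then compute totals and per-tweet averages for each brand
totals <- bind_rows(Huawei = huawei_toy, Samsung = samsung_toy, .id = "brand") %>%
  group_by(brand) %>%
  summarize(
    Tweets = n(),
    Favorites = sum(favorite_count),
    Retweets = sum(retweet_count),
    Favorites_per_Tweet = mean(favorite_count),
    Retweets_per_Tweet = mean(retweet_count),
    .groups = "drop"
  )

print(totals)
```

With the real data, huawei_toy and samsung_toy would be replaced by huawei and samsung.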

Question 3

What are the source devices of the tweets for both brands?

Plotting the number of tweets by source shows that “Twitter Web App” (tweets sent from a browser, on either a phone or a PC) is the most popular source for Huawei. The second most popular is “Android”, meaning tweets sent from an Android phone, which in this context could well be a Huawei phone. The least used of the sources plotted for Huawei is “TweetDeck” (an app for managing Twitter accounts).

For Samsung, the most popular source is “Android”, which dominates the other sources. This means that people who tweet about Samsung mainly use Android, quite possibly on a Samsung phone. The second most popular source is “Twitter Web App”, and the least used of the sources plotted is “Hootsuite Inc”, a social media management platform. Overall, the data I collected contain far more source records for Samsung than for Huawei. “Android” and “Twitter Web App” are the two most popular sources for both brands, and the third most popular for both is “Apple”, which suggests that people who use Apple products also care about these brands to some degree.

# Number of tweets per source, keeping the five most frequent sources
huawei %>%
  count(source, name = "Count") %>%
  slice_max(Count, n = 5) %>%
  ggplot(aes(x = source, y = Count)) +
  geom_col(fill = "darkgreen") +
  labs(x = "Source", y = "Count", title = "Equipment Source of Tweets with #Huawei") +
  geom_text(aes(label = Count), vjust = -0.5) +
  ylim(0, 3400) +
  theme(plot.title = element_text(hjust = 0.5))

# Same plot for Samsung, on the same y-axis range for comparison
samsung %>%
  count(source, name = "Count") %>%
  slice_max(Count, n = 5) %>%
  ggplot(aes(x = source, y = Count)) +
  geom_col(fill = "darkgreen") +
  labs(x = "Source", y = "Count", title = "Equipment Source of Tweets with #Samsung") +
  geom_text(aes(label = Count), vjust = -0.5) +
  ylim(0, 3400) +
  theme(plot.title = element_text(hjust = 0.5))
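
Since the two data sets differ in size, raw counts per source are hard to compare across brands. One possible way around this, sketched here on made-up rows rather than the real data, is to express each source as a share of the brand's tweets. The "Share" column is a name introduced for this illustration.

```r
library(dplyr)

# Toy stand-in for one brand's tweets; the rows are made up
tweets_toy <- data.frame(
  source = c("Android", "Android", "Android",
             "Twitter Web App", "Twitter Web App", "TweetDeck")
)

# Count tweets per source, then convert the counts to shares of all tweets
source_share <- tweets_toy %>%
  count(source, name = "Count") %>%
  mutate(Share = Count / sum(Count)) %>%
  arrange(desc(Count))

print(source_share)
```

Running the same steps on huawei and samsung separately would give shares that can be compared directly between the two brands.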