Brief description of the Project

The purpose of this project is to compare the popularity of Huawei and Samsung, and related patterns, using tweets about each company. I collected tweets with the hashtags #Huawei and #Samsung through the free Twitter API on Oct 4th. I then analyze the data through the following three questions, which cover different aspects: popularity (number of tweets), favorites and retweets, and date (created_at).

Note that all of the analysis below is based only on the data I collected; it may not be accurate and is intended for academic analysis only.

Loading the data: tweets with hashtags #huawei and #samsung

Preprocessing the data:

In the collected data, the timestamp of each tweet is stored as a string in the column “created_at”. I therefore parse it as a year-month-day hour:minute:second date-time and add two derived columns, “Day_of_Week” and “Hour_of_Day”, to hold the weekday and the hour of each tweet.

library(lubridate)  # provides ymd_hms(), wday() and hour()

# Parse the timestamp string, then derive the weekday (week starting on Monday) and the hour
huawei$created_at <- ymd_hms(huawei$created_at)
huawei$Day_of_Week <- wday(as.Date(huawei$created_at), label = TRUE, week_start = 1)
huawei$Hour_of_Day <- hour(huawei$created_at)

samsung$created_at <- ymd_hms(samsung$created_at)
samsung$Day_of_Week <- wday(as.Date(samsung$created_at), label = TRUE, week_start = 1)
samsung$Hour_of_Day <- hour(samsung$created_at)
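
To make the effect of this preprocessing concrete, here is a minimal sketch on three made-up timestamps (they are not taken from the collected data). It assumes the lubridate package and a “created_at” column holding strings in year-month-day hour:minute:second form. The helper name add_time_columns is hypothetical and only wraps the three steps shown above.

```r
library(lubridate)

# Hypothetical helper: applies the same three steps to any tweet data frame
add_time_columns <- function(tweets) {
  tweets$created_at  <- ymd_hms(tweets$created_at)  # string -> date-time (UTC)
  tweets$Day_of_Week <- wday(as.Date(tweets$created_at), label = TRUE, week_start = 1)
  tweets$Hour_of_Day <- hour(tweets$created_at)
  tweets
}

# Made-up example rows, for illustration only
toy <- data.frame(
  created_at = c("2020-10-03 12:15:00", "2020-10-04 23:59:59", "2020-09-28 00:05:10"),
  stringsAsFactors = FALSE
)
toy <- add_time_columns(toy)
print(toy)
```

The three rows fall on a Saturday, a Sunday and a Monday. The weekday labels are printed in the language of the current locale, so they may differ from the English abbreviations used in this report.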

Question 1

Which brand (Huawei or Samsung) is more popular?

I approach this question from two angles:

First, I look at the number of tweets for each brand, shown as a table and a bar chart by day of the week; this indicates the overall popularity of each brand to some degree. Overall, there are more tweets for Samsung than for Huawei (8,345 versus 4,406 in the tables below), which suggests that Samsung is more popular. Huawei is tweeted about most often at the weekend (Saturday and Sunday are its two busiest days), when people have a short break, while Samsung is tweeted about most on the first two days of the week (Monday and Tuesday).
Second, I plot the number of tweets per hour for each brand. Huawei tweets peak between 11am and 1pm, possibly because people take a break after a busy morning. The same pattern holds for Samsung and is even more pronounced. The highest hourly count is 601 at 12pm for Samsung and 301 at 1pm for Huawei. Samsung's peak is about twice Huawei's, which again suggests that it is the more popular brand.

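library(dplyr)   # group_by(), summarize(), arrange(), slice_max()
library(pander)  # pander() renders the result as a table

# Table of tweets by day of the week, busiest day first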
huawei %>%
  group_by(Day_of_Week)%>%
  summarize(Total = n()) %>% 
  arrange(-Total) %>% 
  slice_max(Total,n=7) %>% 
  pander()

## `summarise()` ungrouping output (override with `.groups` argument)

Day_of_Week	Total
Sat	815
Sun	686
Thu	641
Wed	587
Tue	580
Fri	570
Mon	527

samsung %>%
  group_by(Day_of_Week)%>%
  summarize(Total = n()) %>% 
  arrange(-Total) %>% 
  slice_max(Total,n=7) %>% 
  pander()

Day_of_Week	Total
Mon	2761
Tue	1885
Fri	860
Sat	770
Thu	751
Wed	714
Sun	604
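
The overall comparison can be checked directly from the two tables above. The following sketch re-enters the printed daily totals as vectors (the variable names are my own) and computes each brand's overall total, the share of its tweets that falls on each day, and the days on which Huawei has more tweets than Samsung.

```r
# Daily totals copied from the two tables above
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
huawei_total  <- c(527, 580, 587, 641, 570, 815, 686)
samsung_total <- c(2761, 1885, 714, 751, 860, 770, 604)
names(huawei_total) <- names(samsung_total) <- days

# Overall number of tweets per brand
overall <- c(Huawei = sum(huawei_total), Samsung = sum(samsung_total))
print(overall)  # Huawei 4406, Samsung 8345

# Share of each brand's tweets by day, in percent
share <- rbind(Huawei  = huawei_total / overall[["Huawei"]],
               Samsung = samsung_total / overall[["Samsung"]])
print(round(100 * share, 1))

# Days on which Huawei has more tweets than Samsung
print(days[huawei_total > samsung_total])  # "Sat" "Sun"
```

Samsung leads overall and on every weekday, while Huawei has slightly more tweets on Saturday and Sunday.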

library(ggplot2)  # ggplot(), geom_bar(), labs(), theme()

# Number of tweets by day of the week
huawei %>% 
  group_by(Day_of_Week) %>% 
  summarize(Total = n()) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 7) %>% 
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Tweets of Huawei per Day") +
  ylim(0,2800) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

samsung %>% 
  group_by(Day_of_Week) %>% 
  summarize(Total = n()) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 7) %>% 
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Tweets of Samsung per Day") +
  ylim(0,2800) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

# number of tweets by the hour of the day
huawei %>% 
  group_by(Hour_of_Day) %>% 
  summarize(Total = n()) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 24) %>% 
  ggplot(aes(x = Hour_of_Day, y = Total)) +
  geom_bar(stat = "identity", fill = "red") +
  labs(x = "Hour of the Day", y = "Count", title = "The Number of Tweets of Huawei per Hour") +
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  ylim(0,610) + 
  theme(plot.title = element_text(hjust = 0.5))

samsung %>% 
  group_by(Hour_of_Day) %>% 
  summarize(Total = n()) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 24) %>% 
  ggplot(aes(x = Hour_of_Day, y = Total)) +
  geom_bar(stat = "identity", fill = "red") +
  labs(x = "Hour of the Day", y = "Count", title = "The Number of Tweets of Samsung per Hour") +
  ylim(0,610) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))
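
The busiest hour can also be found programmatically rather than read off the chart. This is a minimal sketch, assuming dplyr and a data frame with an “Hour_of_Day” column like the one created during preprocessing. The toy values are invented, and peak_hour is a hypothetical helper, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: the hour (or hours) with the most tweets, and the count
peak_hour <- function(tweets) {
  tweets %>%
    count(Hour_of_Day, name = "Total") %>%  # number of tweets per hour
    slice_max(Total, n = 1)                 # keeps ties, if there are any
}

# Invented hours, for illustration only
toy <- data.frame(Hour_of_Day = c(9, 12, 12, 12, 13, 13, 20))
res <- peak_hour(toy)
print(res)
```

Applied to the real data, this would be called as peak_hour(huawei) or peak_hour(samsung).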

Question 2

Which brand do people like more?

Overall, people like Samsung more than Huawei in terms of the number of favorites and retweets.

First, the number of favorites per day for each brand sheds light on the question.

Samsung has more favorites in total than Huawei, which indicates that people favor Samsung over Huawei.
Based on the figures, Huawei receives the most favorites on Wednesday and Samsung on Monday.
Most interestingly, Samsung receives more favorites on the first two days of the week, while Huawei receives more at the weekend.

Second, I look at the number of retweets per day for both brands.

The total number of retweets for Samsung is far higher than for Huawei.
Samsung has the most retweets on Monday and Friday, while Huawei has the most on Wednesday and at the weekend.
As with favorites, Huawei is retweeted more at the weekend, while Samsung is retweeted more on weekdays (Monday and Friday).
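
Both daily totals can be computed in a single pass. The sketch below assumes dplyr and the columns “Day_of_Week”, “favorite_count” and “retweet_count” used in this report. The function name engagement_by_day and the toy rows are hypothetical and only illustrate the aggregation.

```r
library(dplyr)

# Hypothetical helper: total favorites and retweets per day of the week
engagement_by_day <- function(tweets) {
  tweets %>%
    group_by(Day_of_Week) %>%
    summarize(
      Favorites = sum(favorite_count),
      Retweets  = sum(retweet_count),
      .groups   = "drop"  # return an ungrouped result
    )
}

# Invented rows, for illustration only
toy <- data.frame(
  Day_of_Week    = c("Mon", "Mon", "Sat"),
  favorite_count = c(10, 5, 2),
  retweet_count  = c(3, 0, 1)
)
res <- engagement_by_day(toy)
print(res)
```

On the real data this would be called as engagement_by_day(huawei) and engagement_by_day(samsung). Setting .groups explicitly also silences the `summarise()` message that appears after the code chunks in this report.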

# The number of favorites per day, based on "favorite_count"
huawei %>% 
  group_by(Day_of_Week) %>% 
  summarize(Total = sum(favorite_count)) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 7) %>% 
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Favorites of Huawei per Day") +
  ylim(0,17000) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

samsung %>% 
  group_by(Day_of_Week) %>% 
  summarize(Total = sum(favorite_count)) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 7) %>% 
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Favorites of Samsung per Day") +
  ylim(0,17000) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

# The number of retweets per day based on "retweet_count"

huawei %>% 
  group_by(Day_of_Week) %>% 
  summarize(Total = sum(retweet_count)) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 7) %>% 
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_bar(stat = "identity", fill = "pink") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Retweets of Huawei per Day") +
  ylim(0,5000) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

samsung %>% 
  group_by(Day_of_Week) %>% 
  summarize(Total = sum(retweet_count)) %>% 
  arrange(-Total) %>% 
  slice_max(Total, n = 7) %>% 
  ggplot(aes(x = Day_of_Week, y = Total)) +
  geom_bar(stat = "identity", fill = "pink") +
  labs(x = "Day of the Week", y = "Count", title = "The Number of Retweets of Samsung per Day") +
  ylim(0,5000) + 
  geom_text(aes(label = Total), position = position_dodge(width=0.5),vjust = -0.5) +
  theme(plot.title = element_text(hjust = 0.5))

Question 3

Which sources do the tweets for each brand come from?

Plotting the number of tweets by source for the five most common sources of each brand, I observe that “Twitter Web App” (tweets sent from a browser on a phone or PC) is the most popular source for Huawei. The second most popular is “Android”, which means tweets sent from an Android phone, quite possibly a Huawei phone. The least common of the five for Huawei is “TweetDeck” (an app for managing Twitter accounts).

For Samsung, the most popular source is “Android”, which dominates the other sources. This means that people who tweet about Samsung mainly use Android, and there is a good chance they are using a Samsung phone. The second most popular source is “Twitter Web App”, and the least common of the five is “Hootsuite Inc”, a social media management platform. Overall, there are far more tweets with source data for Samsung than for Huawei in the data I collected. “Android” and “Twitter Web App” are the two most popular sources for both brands, and the third is “Apple”, which suggests that people who use Apple products also care about these brands to some degree.
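
The ranking of sources described above can be sketched as follows. This assumes dplyr and a character column named “source”, as in the plotting code of this section. The helper top_sources and the toy rows are hypothetical, and the counts are invented.

```r
library(dplyr)

# Hypothetical helper: the n most common sources, most common first
top_sources <- function(tweets, n = 5) {
  tweets %>%
    count(source, name = "Count", sort = TRUE) %>%  # tweets per source
    slice_max(Count, n = n)                         # keeps ties, if there are any
}

# Invented rows, for illustration only
toy <- data.frame(
  source = c("Android", "Android", "Android",
             "Twitter Web App", "Twitter Web App", "TweetDeck"),
  stringsAsFactors = FALSE
)
top <- top_sources(toy, n = 2)
print(top)
```

In a bar chart, mapping x to reorder(source, -Count) instead of source would draw the bars in this order rather than alphabetically, which makes the most and least common sources easier to spot.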

huawei %>% 
  group_by(source) %>% 
  summarize(Count = n()) %>% 
  arrange(-Count) %>% 
  slice_max(Count, n = 5) %>%  
  ggplot(aes(x=source, y=Count)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  labs(x = "Source", y = "Count", title = "Equipment Source of Tweets with #Huawei") +
  geom_text(aes(label = Count), position = position_dodge(width=0.5),vjust = -0.5) +
  ylim(0,3400)+
  theme(plot.title = element_text(hjust = 0.5))

samsung %>% 
  group_by(source) %>% 
  summarize(Count = n()) %>% 
  arrange(-Count) %>% 
  slice_max(Count, n = 5) %>%  
  ggplot(aes(x=source, y=Count)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  labs(x = "Source", y = "Count", title = "Equipment Source of Tweets with #Samsung") +
  geom_text(aes(label = Count), position = position_dodge(width=0.5),vjust = -0.5) +
  ylim(0,3400)+
  theme(plot.title = element_text(hjust = 0.5))

hw4

Wubin Chen

10/4/2020