For our project, we chose one of the most popular American airline brands, Delta Airlines. The project aims to help Delta Airlines understand what people think about its service by analyzing information from one of the most popular social media platforms, Twitter. Our analysis proceeds through the steps described below, and the accompanying Shiny app is available at:
https://gmishiny.shinyapps.io/DeltaInsight/
We collected Twitter data related to Delta Airlines from January 2021. On Twitter, Delta follows 38.8K accounts and has 1.5M followers.
To connect to the Twitter API, we used the OAuthFactory function and our Twitter application credentials. Getting these credentials requires applying for a Twitter Developer account and explaining why and how the data will be used.
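library(rtweet)  # create_token() comes from the rtweet package

# Fill in the credentials of your Twitter Developer app (left blank here)
my_token <- create_token(
  app = "",
  consumer_key = "",
  consumer_secret = "",
  access_token = "",
  access_secret = "",
  set_renv = FALSE  # don't save the token to .Renviron for later sessions
)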
We then saved the retrieved tweets as R objects so that we would not exhaust our API usage limit. The resulting 22 data files were merged.
Because we wanted more data than a single account can extract, we used six accounts.
This gave us a final dataset of 23,926 tweets from customers and 1,000 tweets from Delta to analyze.
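Saving each batch and merging the batches later can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's actual code: each batch is assumed to be a data frame with a `status_id` column (as in the rtweet table below), and the `tweets_batch_NN.rds` file names and the `save_batch()` / `merge_batches()` helpers are hypothetical. Deduplicating on `status_id` matters when several accounts retrieve overlapping tweets.

```r
# Minimal sketch (not the project's code): persist each batch of tweets,
# then merge all batches and drop tweets retrieved more than once.
# Assumes every batch is a data frame with a `status_id` column; the
# file-name pattern "tweets_batch_NN.rds" is hypothetical.

save_batch <- function(tweets, dir, i) {
  path <- file.path(dir, sprintf("tweets_batch_%02d.rds", i))
  saveRDS(tweets, path)  # a saved batch can be reloaded without new API calls
  invisible(path)
}

merge_batches <- function(dir) {
  files <- list.files(dir, pattern = "^tweets_batch_[0-9]+\\.rds$",
                      full.names = TRUE)
  batches <- lapply(files, readRDS)
  merged <- do.call(rbind, batches)
  # The same tweet can be returned to more than one account: keep the first copy
  merged[!duplicated(merged$status_id), , drop = FALSE]
}

# Usage with two small, made-up batches that overlap on one tweet
tmp <- tempfile("delta_")
dir.create(tmp)
save_batch(data.frame(status_id = c("1", "2"), text = c("a", "b")), tmp, 1)
save_batch(data.frame(status_id = c("2", "3"), text = c("b", "c")), tmp, 2)
all_tweets <- merge_batches(tmp)
nrow(all_tweets)
```

For real rtweet tibbles, which contain list columns, `dplyr::bind_rows()` is a common alternative to `rbind`.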
We removed misspelled words so that they would not distort the analysis. The cleaned table looks like this:
## # A tibble: 2 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 106627~ 13451575~ 2021-01-01 23:58:37 AliAkinK "If ~ Twitt~
## 2 270589~ 13451571~ 2021-01-01 23:57:11 TomMinerCMS " ~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## # lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## # quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## # quoted_name <chr>, quoted_followers_count <int>,
## # quoted_friends_count <int>, quoted_statuses_count <int>,
## # quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## # retweet_source <chr>, retweet_favorite_count <int>,
## # retweet_retweet_count <int>, retweet_user_id <chr>,
## # retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>,
## # bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## # description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## # friends_count <int>, listed_count <int>, statuses_count <int>,
## # favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
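A typical text-cleaning step can be sketched in base R as below. The rules (strip links, @mentions, the `&amp;` entity and non-letter characters) are an assumption chosen because the cleaned tweets in this report contain none of these; they are not the project's exact pipeline, and `clean_tweet()` is a hypothetical helper. Removing misspelled words would additionally require a dictionary lookup, which is not shown.

```r
# Sketch of tweet text cleaning in base R (assumed rules, not the exact
# pipeline used in this project).
clean_tweet <- function(text) {
  text <- gsub("https?://\\S+", " ", text)          # drop links
  text <- gsub("@\\w+", " ", text)                  # drop @mentions
  text <- gsub("&amp;", "and", text, fixed = TRUE)  # decode the common HTML entity
  text <- gsub("[^A-Za-z[:space:]]", "", text)      # keep letters and whitespace only
  text <- gsub("\\s+", " ", text)                   # collapse whitespace and newlines
  trimws(text)
}

clean_tweet("@Delta We're still waiting!! https://t.co/abc123 #refund")
```

Removing the apostrophe turns "We're" into "Were", which matches how the cleaned tweets in this report look.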
Next, we tokenized the text into words and removed stop words.
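Tokenizing and removing stop words comes down to splitting text into lowercase words, filtering out a stop list, and counting what remains. The base-R sketch below assumes already-cleaned text; `stop_words_small` is a tiny hypothetical stand-in for a full list such as tidytext's `stop_words`, and `top_words()` is a hypothetical helper.

```r
# Sketch: split tweets into lowercase words, drop stop words, count the rest.
# stop_words_small is a tiny hypothetical stop list used only for illustration.
stop_words_small <- c("a", "an", "the", "to", "and", "is", "was",
                      "i", "my", "for", "you")

top_words <- function(texts, stop_words, n = 10) {
  tokens <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens) & !tokens %in% stop_words]
  counts <- sort(table(tokens), decreasing = TRUE)
  head(counts, n)
}

tweets <- c("Thanks for the support, Delta",
            "My flight was delayed and support was slow",
            "Delta flight to Atlanta")
top_words(tweets, stop_words_small, n = 3)
```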
In our view, the top five words are unsurprising because they relate to flights and the company itself. In sixth place is the word “support”. A future analysis should look more closely at the tweets that mention it, since it can be either positive or negative: “great support Delta!” or “awful support, never returned money”.
The most popular words are “confirmation” and “dm” (direct message). As shown later, there is a lot of negative sentiment, which might explain why resolutions are proposed in direct messages.
Negative sentiment is roughly 2.4 times as frequent as positive sentiment (4,781 vs. 2,005 in the table below).
As the bar chart shows, the top two reasons for negative feedback are inconvenience and delays. Delays are an easily understood cause, while inconvenience deserves a deeper study to find out what exactly lies behind it: delays, badly timed flights, COVID measures, etc.
The words “concern” and “concerns” have the same meaning. Merging them would roughly double the “concern” bar and raise one more question for investigation: what are customers so concerned about?
As for positive sentiment, the top words suggest that people are happy because they feel safe, because the company issues refunds, and probably because staff members are patient and sincere. The remaining words express different degrees of happiness felt by customers.
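One way to merge variants like “concern” and “concerns” before counting is to fold simple plurals into their singular form. The sketch below uses a crude, hypothetical rule (strip a final "s" only when the singular also occurs in the data), so that words such as "bus" are left alone. A real stemmer, such as `SnowballC::wordStem()`, is the more general option.

```r
# Sketch: fold simple plural forms into their singular before counting, so
# "concern" and "concerns" share one bar. A crude hypothetical rule, not a stemmer.
fold_plurals <- function(words) {
  singular <- sub("s$", "", words)
  # only strip the "s" when the singular form is itself present in the data
  ifelse(grepl("s$", words) & singular %in% words, singular, words)
}

words <- c("concern", "concerns", "concerns", "delays", "delay", "bus")
table(fold_plurals(words))
```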
## # A tibble: 2 x 2
## sentiment n
## <chr> <int>
## 1 negative 4781
## 2 positive 2005
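Counts like those above come from joining words against a sentiment lexicon and tabulating the labels. The base-R sketch below shows that join. The lexicon is a tiny hypothetical sample in the style of the Bing lexicon (word to "positive"/"negative"), not real lexicon data. The last line lists which negative words contribute most, which is the idea behind the bar chart of reasons for negative feedback.

```r
# Sketch: label words with a sentiment lexicon and count by sentiment.
# The lexicon rows are a tiny hypothetical sample, not a real lexicon.
lexicon <- data.frame(
  word = c("delay", "inconvenience", "concern", "lost", "safe", "refund", "patient"),
  sentiment = c("negative", "negative", "negative", "negative",
                "positive", "positive", "positive")
)

words <- data.frame(word = c("delay", "flight", "delay", "safe",
                             "lost", "crew", "inconvenience"))

scored <- merge(words, lexicon, by = "word")  # inner join keeps lexicon words only
sentiment_counts <- table(scored$sentiment)   # counts per sentiment
sentiment_counts
# Top contributors to negative sentiment
sort(table(scored$word[scored$sentiment == "negative"]), decreasing = TRUE)
```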
These graphs show what competitors' customers tweet about. So far they do not reveal significant insights; mostly, people tweet about the same topics.
The most popular tweets, ranked by favorite_count (likes):
## # A tibble: 10 x 4
## created_at screen_name text favorite_count
## <dttm> <chr> <chr> <int>
## 1 2021-01-09 23:39:05 Lakers "Next stop: H-Town \n\n#Lake~ 7050
## 2 2021-01-12 05:25:49 Cleavon_MD "KICKED OFF FLIGHT: Melody B~ 4211
## 3 2021-01-31 17:59:22 AshaRangapp~ "Just a reminder that the GO~ 1819
## 4 2021-01-09 19:37:15 ConservaMom~ "\U0001f6a8Fascism Takes Fli~ 1421
## 5 2021-01-19 20:12:47 SilverNumbe~ "Finally @Delta changed up t~ 1001
## 6 2021-01-12 19:33:23 marcusdipao~ "A militia group called for ~ 834
## 7 2021-01-21 20:57:00 NYRangers "And we’re off. ✈️\n \nThank~ 602
## 8 2021-01-01 09:21:25 stewartcink "Seems to me airlines mostly~ 440
## 9 2021-01-29 02:58:20 maximum "Cannot begin to understate ~ 397
## 10 2021-01-29 02:58:20 maximum "Cannot begin to understate ~ 395
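Rows 9 and 10 above appear to be the same tweet collected twice with slightly different like counts. The base-R sketch below ranks tweets by `favorite_count` and keeps one row per `status_id`, so each tweet is listed once with its highest count. The column names follow the rtweet table shown earlier. `top_tweets()` and the sample data are hypothetical.

```r
# Sketch: top-n tweets by likes, with one row per tweet (highest count kept).
top_tweets <- function(tweets, n = 10) {
  ranked <- tweets[order(tweets$favorite_count, decreasing = TRUE), ]
  ranked <- ranked[!duplicated(ranked$status_id), ]  # drop repeat captures
  head(ranked[, c("created_at", "screen_name", "text", "favorite_count")], n)
}

# Made-up rows: status "10" was captured twice with different like counts
tweets <- data.frame(
  status_id = c("10", "11", "10", "12"),
  created_at = as.POSIXct(c("2021-01-29 02:58:20", "2021-01-15 10:00:00",
                            "2021-01-29 02:58:20", "2021-01-09 23:39:05"),
                          tz = "UTC"),
  screen_name = c("user_a", "user_b", "user_a", "user_c"),
  text = c("tweet A", "tweet B", "tweet A", "tweet C"),
  favorite_count = c(397L, 50L, 395L, 7050L)
)
top_tweets(tweets)
```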
It would be interesting to understand in what context customers mention competitors, and with what sentiment (negative or positive). Surprisingly, people mention Coca-Cola and various telecom companies.
There are no clear trends in users' tweets, but people post more on Sunday and Monday evenings.
Delta's replies usually come one to two days later, but they follow the same trend.
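The weekday and evening patterns come from counting tweets by day of week and hour. The base-R sketch below builds that table. It assumes `created_at` is a POSIXct value in UTC; converting to a local time zone first would shift the hours. It maps `wday` to English day names so that the result does not depend on the system locale. `activity_by_time()` is a hypothetical helper.

```r
# Sketch: count tweets by weekday and hour of day.
activity_by_time <- function(created_at, tz = "UTC") {
  lt <- as.POSIXlt(created_at, tz = tz)
  day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
  day  <- factor(day_names[lt$wday + 1], levels = day_names)  # wday: 0 = Sunday
  hour <- factor(lt$hour, levels = 0:23)
  table(day, hour)
}

# Made-up timestamps: 3 Jan 2021 was a Sunday, 4 Jan a Monday, 6 Jan a Wednesday
times <- as.POSIXct(c("2021-01-03 20:15:00", "2021-01-03 21:40:00",
                      "2021-01-04 20:05:00", "2021-01-06 09:30:00"), tz = "UTC")
counts <- activity_by_time(times)
rowSums(counts)  # tweets per weekday
```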
Some of the top accounts are expected, such as SecretFlying and GetYouRefund. Both are connected to flight tickets and airlines: the first finds great deals, and the second helps with refunds. Other accounts are less obvious and need further investigation to explain why they are in the top 10.
There is a large gap between iPhone and Android users: Twitter for iPhone is used by 50% of the customers who tweet about Delta. If users of Delta's website and mobile app show the same pattern, it would make sense to prioritize developing and maintaining the iPhone app while investing somewhat less in the Android version.
Surprisingly, the most common tweet language is not English. This might motivate adding other languages to social media marketing (SMM) campaigns.
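The top-accounts, platform and language breakdowns above all involve counting a categorical column and taking shares. The base-R sketch below does this for any such column. It would apply to `screen_name`, `source` or `lang` from the rtweet table. `top_share()` and the sample `sources` vector are hypothetical.

```r
# Sketch: top-n values of a categorical column with their percentage share.
top_share <- function(x, n = 10) {
  counts <- head(sort(table(x), decreasing = TRUE), n)
  data.frame(value = names(counts),
             n = as.integer(counts),
             pct = round(100 * as.integer(counts) / sum(!is.na(x)), 1),
             row.names = NULL)
}

sources <- c("Twitter for iPhone", "Twitter for iPhone",
             "Twitter for Android", "Twitter Web App")
top_share(sources)
```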
This analysis could be extended to look for correlations between angry tweets and airports. For example, Los Angeles airport accumulates the most negative feedback, which might lead to discussions with the airport about the causes of delays or similar problems.
The USA has the most tweets, though there is a large red spot in Africa that would be worth investigating. The other red dots on the map correspond to major airport hubs.
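A first step toward linking negative tweets to airports is to count airport-code mentions among negative and non-negative tweets. The base-R sketch below does that with a regular expression. The airport list, the sample tweets and the `is_negative` flag are hypothetical inputs; in practice, the flag could come from the sentiment scores. Matching is case-sensitive by design, so ordinary words like "lax" are not counted as LAX.

```r
# Sketch: cross-tabulate airport-code mentions against a negative-sentiment flag.
airport_codes <- c("LAX", "JFK", "LGA", "ATL")  # hypothetical list of hubs

negative_by_airport <- function(text, is_negative, codes = airport_codes) {
  pattern <- paste0("\\b(", paste(codes, collapse = "|"), ")\\b")
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  per_tweet <- lapply(hits, unique)               # count each airport once per tweet
  airport  <- as.character(unlist(per_tweet))
  negative <- rep(is_negative, lengths(per_tweet))
  table(airport, negative)
}

text <- c("Stuck at LAX again, 3 hour delay", "LAX to JFK was smooth",
          "Lost bag at LGA", "LAX LAX LAX")
is_negative <- c(TRUE, FALSE, TRUE, TRUE)
negative_by_airport(text, is_negative)
```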
Each tweet is scored on ten categories: Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, Trust, Negative and Positive.
As the graph shows, positive and negative emotions are quite close.
## Delta_clean$text
## 1 If were not getting 2000 I expect the governmentsubsided and to provide us with a loaded gas card frequent flyer miles and a waived Prime subscription
## 2 please help Luke with this Were still in the middle of a global pandemic and your flexibility would be much appreciated
## 3 Where would you go if enough people got vaccinated
## 4 I agree I feel very safe flying
## 5 can you cancel the refund and fix my ticket i was advice not to use the LaGuardia Airport and use the JFK airport because it closer you have my information you can step in am listening
## 6 I am force to have to cancel my flight and pick a different airport in New York all I ask for was to switch from LGA to JFK I am pissed off I am unhappy you treating basic flight this way
## anger anticipation disgust fear joy sadness surprise trust negative positive
## 1 0 1 0 0 0 0 1 2 0 3
## 2 0 0 0 1 0 1 0 0 1 1
## 3 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 1 1 0 0 1 0 3
## 5 0 1 0 0 0 1 0 1 1 1
## 6 2 1 1 1 0 2 0 0 3 1
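Per-tweet scores like those above can be produced by matching each word against a word-to-emotion lexicon and counting the matches per category. The base-R sketch below shows this lexicon-based scoring. Its lexicon rows are a tiny hypothetical sample, and it is not necessarily how the table above was produced. In this kind of lexicon, a single word can map to several categories, which is why "safe" adds to both joy and trust here.

```r
# Sketch: lexicon-based emotion scores per tweet, one column per category.
# The lexicon rows are a tiny hypothetical sample.
lexicon <- data.frame(
  word     = c("safe", "safe", "pissed", "pissed", "unhappy", "unhappy", "help"),
  category = c("joy", "trust", "anger", "negative", "sadness", "negative", "positive")
)
categories <- c("anger", "anticipation", "disgust", "fear", "joy",
                "sadness", "surprise", "trust", "negative", "positive")

emotion_scores <- function(texts, lexicon, categories) {
  rows <- lapply(texts, function(txt) {
    words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
    # every occurrence of a word is joined to all of its lexicon categories
    matched <- merge(data.frame(word = words), lexicon, by = "word")$category
    as.integer(table(factor(matched, levels = categories)))
  })
  scores <- do.call(rbind, rows)
  colnames(scores) <- categories
  as.data.frame(scores)
}

tweets <- c("I agree I feel very safe flying",
            "I am pissed off I am unhappy")
emotion_scores(tweets, lexicon, categories)
```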
Data for word clouds has to be pre-processed to remove noise and meaningless words such as ‘a’, ‘the’, ‘was’, etc. Here is the word cloud after pre-processing; an example built from unprocessed data follows later.
“Apologize for” is clearly one of the leading phrases, which is no surprise given that most of the tweets carry negative sentiment.
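Phrases such as “apologize for” are bigrams: pairs of adjacent words. The base-R sketch below counts them. It is a dependency-free illustration, not the project's actual tokenizer, and the sample replies are made up. In practice, stop words would be removed first, as discussed above, so that pairs like "for the" do not dominate.

```r
# Sketch: count bigrams (adjacent word pairs) across a set of texts.
count_bigrams <- function(texts) {
  bigrams <- unlist(lapply(texts, function(txt) {
    words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < 2) return(character(0))
    paste(head(words, -1), tail(words, -1))  # pair each word with the next one
  }))
  sort(table(bigrams), decreasing = TRUE)
}

replies <- c("We apologize for the delay",
             "We apologize for any inconvenience",
             "Thanks for flying with us")
head(count_bigrams(replies), 3)
```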
This is the word cloud before pre-processing: it is dominated by ‘the’, ‘a’, ‘was’, etc.
This word cloud is meaningful because the data was cleaned.
Word cloud before cleaning.
Cleaned word cloud.
With the tweet data we collected, we were able to summarize the overall trends in users' tweets about Delta Airlines, including:
User Trend – Through our analysis, we identified user-level data such as the most liked posts, the most popular hashtags among users and the most shared links.
Tweet Status Frequency – This showed the days on which users are most likely to post about Delta Airlines and, in turn, how Delta Airlines tends to reply to user tweets.
User Profile – This covered the users who tweet most about Delta Airlines, their preferred platforms and the top languages they use.
Sentiment Analysis – Based on the emotion scores of the tweets, along with engagement and activity by sentiment (positive / negative). We also observed that Delta Airlines replies more often to tweets with extremely positive or negative sentiment.
Topic Analysis – A detailed topic analysis using bigrams.
Follower Profile Summary – A world map showing the overall distribution of tweeting users across the globe and their top locations.