1 Introduction

The intent of this project was to analyze Twitter data to understand how logistics companies, such as UPS and FedEx, use the application to reach new and existing customers. Using network analysis tools and sentiment analysis of tweet text, the project asks who tweets about logistics and what they are saying. The answers should be helpful for companies looking to leverage the application.

Although everyone gets excited to see a delivery truck coming up the drive, transportation services share the status of other infrastructure commodities: no one talks about electricity or water service until it is not there. Earlier this year, as COVID-19 affected the ability of businesses to reach customers, the economy, health, and safety of the US depended on logistics companies to move goods and, with the rollout of the COVID-19 vaccine, to support essential services. Broadly, the project’s intent was to glean communication best practices and replicate them at the author’s employer. When used properly, leveraging network connections in social media has been shown to be advantageous for many commercial goods companies.

  • Disclosure: the author is an employee with UPS, in a department unrelated to public affairs or marketing.

2 Theory

Two underlying assumptions were made before pulling the data from Twitter for analysis. First, consumers leave reviews only when they are disappointed with a good or service. Second, positive posts about a company are good for that company’s bottom line.

Anyone can search for a product or restaurant on Amazon or Yelp and quickly conclude that only angry customers leave reviews. Self-appointed experts claim that it takes 40 positive reviews to undo one negative review. Reviews and comments make a difference to consumers, with 72% to 94% of them saying that they read reviews before deciding to purchase a good or service.

This would seem an uphill battle for many businesses. However, it has been found that unhelpful comments or reviews without substance (e.g., “I hate this company”) don’t hurt companies. What actually does more harm than good is too many positive comments, or negative reviews that businesses leave unaddressed. Negative comments on social media do not spell the end of a company, especially if the company responds. Engaging commenters, or using reviews as feedback to deliver a better product or service, has led other consumers to a higher purchase rate and better overall brand sentiment.

Consumers are becoming as savvy about internet trolls as they are suspicious of reviews on a business’s own website, yet they also tend to trust a stranger’s review as much as a friend’s. Network analysis of Twitter data and sentiment analysis of tweets should make several things clear: which users engage the most with transportation companies, what they are saying about those businesses, and how logistics companies are using their brands and Twitter to build their business even when circumstances are not ideal.

3 Methodology

3.1 Data

Using the Twitter API and the rtweet package in R, the full timelines of tweets for each of the big four logistics companies (DHL, FedEx, UPS, and the US Postal Service) were retrieved. Also retrieved were the timelines of their customer support handles and tweets containing phrases associated with logistics and shipping. “Pulls” are limited to 18,000 tweets per 15 minutes. Each pull includes information such as the date, time, screen name, tweet text, users who liked and reposted the tweet, the platform used to post the tweet, and location. In all, 90 fields of information (a combination of lists, character strings, dates, and logical entries) were pulled for each tweet retrieved.
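The 18,000-tweets-per-15-minutes cap adds up quickly for large pulls. The sketch below estimates how many rate-limit windows a request needs and the minimum waiting time between them. It assumes every window returns the full cap and ignores request latency. `windows_needed()` and `min_wait_minutes()` are hypothetical helpers, not rtweet functions.

```r
# Hypothetical helpers (not part of rtweet): estimate the rate-limit windows
# a search pull needs, assuming each window returns the full per-window cap.
windows_needed <- function(n_tweets, per_window = 18000) {
  ceiling(n_tweets / per_window)
}

# The first window runs immediately; each additional window waits out 15 minutes.
min_wait_minutes <- function(n_tweets, per_window = 18000, window_min = 15) {
  pmax(windows_needed(n_tweets, per_window) - 1, 0) * window_min
}

windows_needed(50000)    # 3 windows
min_wait_minutes(50000)  # 30 minutes of waiting
```

In practice, rtweet’s `search_tweets()` can wait out these windows itself through its `retryonratelimit` argument.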

A first examination of the data revealed biases in how the data were being pulled. Because the author works for a specific shipping vendor, the searches used (and retrieved) lexicon primarily used by one company. As a result, one business’s activity appeared to be as much as 10 times greater than that of the others. The author’s familiarity, as an employee, with company-specific Twitter handles also biased the data.

The dual use of some terms added a lot of noise unrelated to shipping or logistics. One example is “ups.” It was pulled to uncover mentions of UPS, but it also matched “cover-ups” and other politically slanted tweets (the original pulls were made just ahead of the 2020 US election). “Shipping” produced similarly high noise levels, as many companies were promoting “free shipping” in the weeks leading up to the 2020 holiday season.
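One way to cut this kind of noise is to match brand names case-sensitively and only as standalone tokens. The sketch below uses hypothetical example tweets. It flags “UPS” and “@UPS” mentions without matching “cover-ups” or “COVER-UPS,” and separately flags “free shipping” promotions. The regular expression is an illustrative assumption, not the filter used for this analysis. It will miss lowercase mentions such as “my ups driver” and longer handles that begin with “UPS.”

```r
# Hypothetical example tweets (not from the collected data)
tweets <- c("Thanks @UPS for getting my package here early!",
            "THE COVER-UPS CONTINUE",
            "UPS driver left the box in the rain",
            "Holiday sale: FREE SHIPPING on all orders",
            "Vaccine shipment leaving the hub tonight")

# Case-sensitive "UPS" or "@UPS" that is not glued to a letter, digit, or
# hyphen, so "cover-ups", "COVER-UPS", and "UPSTREAM" do not match.
mentions_ups <- grepl("(?<![A-Za-z0-9-])@?UPS(?![A-Za-z0-9])", tweets, perl = TRUE)

# Flag retail promotions that use "free shipping", ignoring case
is_promo <- grepl("free shipping", tweets, ignore.case = TRUE)

mentions_ups       # TRUE FALSE TRUE FALSE FALSE
is_promo           # FALSE FALSE FALSE TRUE FALSE
tweets[!is_promo]  # drop promotional noise before analysis
```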

Four datasets were ultimately used for the analysis: pulls for mentions of “shipments” and “deliver,” and the timelines for UPS and FedEx. Amazon was not included in this analysis, as its primary business is not solely logistics.

The data, collected from 10/30 to 11/19 and from 11/30 to 12/2 of 2020, were saved as .csv files and are loaded below.

library(rtweet)  # read_twitter_csv()

# UPS
get_ups <- read_twitter_csv("get_UPS.csv")
ups_alts <- read_twitter_csv("ups_alts.csv")

# DHL
get_dhl <- read_twitter_csv("get_dhl.csv")
dhl_alts <- read_twitter_csv("dhl_alts.csv")

# FedEx
get_fedex <- read_twitter_csv("get_fedex.csv")
fedex_alts <- read_twitter_csv("fedex_alts.csv")

# USPS
get_usps <- read_twitter_csv("get_usps.csv")
usps_alts <- read_twitter_csv("usps_alts.csv")

# Tweets containing common phrases, pulled in smaller chunks so they could be analyzed
deliver <- read_twitter_csv("twts_deliver.csv")
shipment <- read_twitter_csv("twts_shipment.csv")

3.2 Variables

While the data available from Twitter yields a plethora of variables (some 90 fields), the analysis focused on screen names, tweet text, and retweeters. The unique screen names of the original posters, and of those who retweeted their posts, allowed for an examination of network connectedness. Tweet text, a character string, allowed for a number of text analysis methods. Geolocation, provided as a nested list, was also an interesting variable. However, too few tweets included it to make it worth examining.
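The network step can be made concrete in base R. The sketch below turns retweets into a directed edge list (retweeter to original poster) and counts in-degree and out-degree. The toy data frame is hypothetical. Its column names (`screen_name`, `is_retweet`, `retweet_screen_name`) assume the layout of pre-1.0 rtweet output and may differ in other rtweet versions.

```r
# Toy stand-in for a timeline pull; column names assume pre-1.0 rtweet output
tw <- data.frame(
  screen_name         = c("alice", "bob", "carol", "bob", "UPS"),
  is_retweet          = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  retweet_screen_name = c("UPS", "UPS", NA, "carol", NA),
  stringsAsFactors = FALSE
)

# Keep retweets only: each row becomes a directed edge retweeter -> poster
rt <- tw[tw$is_retweet & !is.na(tw$retweet_screen_name), ]
edges <- data.frame(from = rt$screen_name, to = rt$retweet_screen_name,
                    stringsAsFactors = FALSE)

# In-degree: how many times each account was retweeted
in_degree <- sort(table(edges$to), decreasing = TRUE)
# Out-degree: how many retweets each account made
out_degree <- sort(table(edges$from), decreasing = TRUE)

in_degree   # UPS 2, carol 1
out_degree  # bob 2, alice 1
```

The same `edges` data frame could be passed to a graph library such as igraph (`graph_from_data_frame()`) for layouts and centrality measures beyond simple degree counts.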

Below is an example of @UPS’s tweets and the variables pulled.

head(get_ups)  # first rows of the @UPS timeline pull

3.3 Modeling Approach

After the limitations discussed below came to light, a more focused approach to the data was taken. Each of the four datasets ultimately selected was analyzed in two ways: text analysis of the tweets (high-frequency words and sentiment) and network analysis of the accounts doing the tweeting.

3.4 Limitations

R’s computing capacity was also exceeded when multiple data pulls were made and combined. Plans to compare how traffic on a specific subject changed over the course of a week were abandoned when it became apparent that R could not handle the combined dataset. Other limitations came from package updates that were incompatible with other packages.

4 Results

4.1 Topic multi-tweeters

This section looks at the top tweeters discussing “shipment” and “deliver.” “Shipment” includes “shipments.” Originally, the data focused on “ship,” intended to capture mentions of “shipping,” “ships,” and “shipment,” but it encompassed too many ads mentioning “free shipping.” “Deliver” includes mentions of “delivery” and “delivered,” words commonly associated with the final movement of parcels by a third party.
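Tallying top tweeters per topic amounts to filtering tweets by a search term and counting screen names. The sketch below uses hypothetical tweets and a hypothetical `top_tweeters()` helper. It assumes plain substring matching, so “deliver” also catches “delivery” and “delivered,” mirroring the term definitions above.

```r
# Hypothetical tweets standing in for the "shipment" and "deliver" pulls
tw <- data.frame(
  screen_name = c("FedEx", "shopper1", "FedEx", "newsdesk",
                  "FedEx", "shopper1", "UPS"),
  text = c("Your shipment is on its way",
           "Still waiting on my shipments",
           "Delivered ahead of schedule",
           "Officials deliver remarks today",
           "Track every delivery in the app",
           "Package delivered to the wrong house",
           "Vaccine shipments are moving"),
  stringsAsFactors = FALSE
)

# Substring matches: "shipment" also catches "shipments";
# "deliver" also catches "delivery" and "delivered"
about_shipment <- grepl("shipment", tw$text, ignore.case = TRUE)
about_deliver  <- grepl("deliver",  tw$text, ignore.case = TRUE)

# Count tweets per account on a topic and keep the n most active accounts
top_tweeters <- function(df, keep, n = 3) {
  counts <- table(df$screen_name[keep])
  head(sort(counts, decreasing = TRUE), n)
}

top_deliver <- top_tweeters(tw, about_deliver)
top_deliver  # FedEx has 2 tweets; newsdesk and shopper1 have 1 each

# Flag the major carriers, the accounts highlighted in the results
carriers <- c("DHL", "FedEx", "UPS", "USPS")
names(top_deliver) %in% carriers
```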

Note: the highlighted portions denote instances of major logistics companies mentioning “shipment” or “delivery/delivered” in their tweets.

4.2 Text analysis

4.2.1 Shipments: High frequency words

Words that show up alongside “shipment” in tweets

It is not surprising that many of the “shipment” tweets appear to be about the COVID-19 vaccine, which was moving through the approval process at the time.
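The high-frequency word counts behind this kind of chart can be sketched in base R. The steps are: lowercase the tweets, split them into words, drop stop words and the search term, and tabulate. The three tweets and the short stop-word list are hypothetical. The sentiment code in this report uses tidytext’s much larger `stop_words` table instead.

```r
# Hypothetical tweets that matched the "shipment" search
texts <- c("First vaccine shipment arrives at the hospital",
           "Vaccine shipment delayed by weather",
           "Tracking my shipment again today")

# Tiny illustrative stop-word list; tidytext's stop_words is far larger
stop_list <- c("a", "again", "at", "by", "my", "of", "the", "to")

# Lowercase, then split on anything that is not a letter, digit, or apostrophe
words <- unlist(strsplit(tolower(texts), "[^a-z0-9']+"))

# Drop empty tokens, stop words, and the search term itself
words <- words[nzchar(words) &
               !(words %in% c(stop_list, "shipment", "shipments"))]

freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)  # "vaccine" appears twice; every other word once
```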

4.2.2 Deliver: High frequency words

Words that show up alongside “delivery” or “delivered” in tweets

Many of the tweets surrounding “deliver” appear to be political rather than related to logistics.

4.3 Sentiment scores

4.3.1 Shipments

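The sentiment scores are not overly helpful, because so much of the tweets’ text is ignored: words that do not appear in the sentiment lexicon receive no score.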

The original text of the first entry: “The mad dash to make $ before 9pm so I can pay rent sale! Big hero six 5” coaster -20$ Animal crossing tamagotchi magnet -10$ Shipping not included. USA only. Items will be clear coated before shipment. #PayPal only”

This was a hobby entrepreneur looking to sell items through an Etsy account; in short, not a negative post. Take the following graph with caution.

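library(dplyr)
library(tidytext)

# Tokenize tweet text into one word per row
ship_words <- shipment %>% unnest_tokens(word, text)

# Drop stop words, then attach sentiment labels (NA = word not in lexicon)
ship_words <- ship_words %>%
  anti_join(stop_words, by = "word") %>%
  left_join(sentiments, by = "word")
head(ship_words)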
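The caution above can be made measurable. The sketch below scores a tweet by counting positive minus negative lexicon words. It also reports coverage, the share of the tweet’s words that the lexicon recognizes at all. The six-word lexicon and `score_tweet()` are hypothetical. The lexicon only mimics the two-column (`word`, `sentiment`) layout of tidytext’s `sentiments` data.

```r
# Hypothetical mini lexicon in a (word, sentiment) layout
lexicon <- data.frame(
  word      = c("late", "lost", "damaged", "thanks", "great", "fast"),
  sentiment = c("negative", "negative", "negative",
                "positive", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Net score = positive words minus negative words;
# coverage = fraction of the tweet's words found in the lexicon
score_tweet <- function(text, lexicon) {
  words <- unlist(strsplit(tolower(text), "[^a-z0-9']+"))
  words <- words[nzchar(words)]
  hits  <- lexicon$sentiment[match(words, lexicon$word)]  # NA = not in lexicon
  c(net      = sum(hits == "positive", na.rm = TRUE) -
               sum(hits == "negative", na.rm = TRUE),
    coverage = mean(!is.na(hits)))
}

score_tweet("Thanks for the fast shipment", lexicon)  # net 2, coverage 0.4
score_tweet("Shipment lost and now late", lexicon)    # net -2, coverage 0.4
```

Low coverage values flag tweets whose score rests on only a few words, as with the Etsy listing above.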

4.3.2 Delivery

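# Tokenize tweet text into one word per row
del_words <- deliver %>% unnest_tokens(word, text)

# Drop stop words, then attach sentiment labels (NA = word not in lexicon)
del_words <- del_words %>%
  anti_join(stop_words, by = "word") %>%
  left_join(sentiments, by = "word")
head(del_words)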

4.4 Networks

4.4.1 Shipping

4.4.2 Delivery

4.4.3 UPS Network

4.4.4 FedEx network

5 Conclusion

The depth of analysis used did not yield compelling results and did not answer the questions at hand. Further cleaning of the data to remove off-topic tweets might have been more conclusive. So might focusing solely on the Twitter handles of the four major logistics companies (DHL, FedEx, UPS, and the post office).

Had the limitations been less restrictive, I would have liked to dive deeper into an analysis of verbiage by business, e.g., how often “delivery” appears alongside Amazon. I would also have liked to determine whether lexicon associated with one company over another, such as “posted” and the US mail, could be leveraged.
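A first pass at that verbiage-by-business question could count how many tweets mention both a term and a company. The sketch below builds such a co-occurrence table from hypothetical tweets using substring matching. The terms, company names, and tweets are illustrative only. Substring matching is a simplification: a short name such as “ups” would also match inside “usps” or “cover-ups,” which is why UPS is left out of this toy example.

```r
# Hypothetical tweets; terms and company names are illustrative only
texts <- c("Amazon delivery came early",
           "USPS delivery was late again",
           "Amazon package delivered by USPS",
           "Letter posted via USPS")
lower <- tolower(texts)

terms     <- c("deliver", "posted")
companies <- c("amazon", "usps")

# Rows = terms, columns = companies; each cell counts tweets containing both
cooc <- sapply(companies, function(co)
  sapply(terms, function(te)
    sum(grepl(te, lower, fixed = TRUE) & grepl(co, lower, fixed = TRUE))))

cooc
#         amazon usps
# deliver      2    2
# posted       0    1
```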