I analysed the Twitter activity of the following top global pharma companies (by sales): Novartis, Pfizer, Roche, Sanofi and Merck.
Given below are the findings.
Data Collection: Collect tweets related to these major pharma companies using the Twitter API, searching for tweets that mention @merck, @pfizer, etc. By default, the Twitter search API returns tweets from the last 7 days.
Engagement Profile: Use tweet count as a measure of the relative engagement of these companies over the period under consideration.
Wordcloud: Create a wordcloud of the most frequent words for each of these pharma companies and investigate whether there are any interesting trends.
Dendrogram: Create a dendrogram of the words frequently tweeted together and see whether there are any interesting trends.
Sentiment Analysis: Perform sentiment analysis of the tweets using a simple word sentiment score methodology, where each word is scored based on whether it is a positive or a negative word, and the entire tweet is scored as the sum of the individual word scores in the tweet.
Tweets with a score of 0 are treated as neutral tweets. Tweets with negative scores are treated as negative tweets and tweets with positive scores are treated as positive tweets.
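The word-score approach described above can be sketched in a few lines of Python. This is only an illustration, not the code used for this analysis: the two word lists are tiny, hypothetical stand-ins for a real opinion lexicon, and the function names `score_tweet` and `label_tweet` are made up for this example.

```python
import re

# Hypothetical mini-lexicons for illustration only; a real analysis
# would load a much larger list of positive and negative words.
POSITIVE_WORDS = {"congrats", "bravo", "winner", "good", "great"}
NEGATIVE_WORDS = {"bad", "fail", "risk", "worse", "poor"}


def score_tweet(text):
    """Return the sum of word scores: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE_WORDS) - (w in NEGATIVE_WORDS) for w in words)


def label_tweet(text):
    """Map the tweet score to 'positive', 'negative' or 'neutral' (score of 0)."""
    score = score_tweet(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Note that a tweet containing one positive and one negative word cancels out to 0 and is treated as neutral, which is one reason this simple method produces many neutral tweets.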
Given below is the wordcloud created from tweets related to Novartis.
Observations
Cancer seems to be the most frequent word, followed by ‘malaria’. aacr (American Association for Cancer Research) is also prominent, along with uspto (US Patent and Trademark Office). research and learn are also observed very frequently, along with patents and human. This is in line with what one would expect of tweets about an oncology-focused pharma major, no surprises here.
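For readers who want to check observations like these, the word frequencies behind a wordcloud can be tabulated roughly as follows. This is a minimal Python sketch, not the code used for this analysis: the stopword list is a small illustrative assumption, and dropping URLs and @handles before counting is a choice made for this example.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "at", "rt"}


def word_frequencies(tweets, top_n=10):
    """Return the top_n (word, count) pairs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Drop URLs and @mentions so links and handles do not dominate the counts.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
        words = re.findall(r"[a-z]+", cleaned)
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up sample tweets, only to show the call.
sample_tweets = [
    "@Novartis cancer research update",
    "Learn about cancer at AACR http://t.co/x",
    "Malaria and cancer research",
]
print(word_frequencies(sample_tweets, top_n=2))
```

The resulting counts are what a wordcloud renders as font size, so the largest words in the image are simply the top entries of this list.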
Given below is the dendrogram created from tweets related to Novartis.
This dendrogram is very interesting to observe.
Cancer is frequently tweeted along with research and learn, whereas malaria is alone in its cluster - this is a good indicator of Novartis’s focus on cancer research. sanofi is tweeted along with winner in the Novartis Twitter stream - when I googled these strings together, I found that Novartis and Sanofi, along with SunPower, have won 2015 Patents for Humanity Awards.
Given below is the wordcloud created from tweets related to Pfizer.
Observations
The first thing I notice in this wordcloud, compared to the Novartis wordcloud, is that the focus on diseases is missing here. Two key themes emerge.
amp is mentioned in a lot of tweets.
Given below is the dendrogram created from tweets related to Pfizer.
Along with the observations above, the dendrogram also brings in an additional perspective, which involves CEO, Read, GSK and Merck - when I looked up why, it seems analysts expect that Pfizer’s CEO, Mr. Read, might make a bid for GSK, and he recently made a 2.9B deal with Merck on cancer therapies.
Given below is the wordcloud created from tweets related to Roche.
Observations
These are typical tweets you would expect about a pharma company, focused on disease and patients, with less engagement on other topics. This also explains why Roche is one of the least engaged pharma companies on Twitter.
Given below is the dendrogram created from tweets related to Roche.
The dendrogram confirms our finding from the wordcloud - tweets are entirely focused on diseases and trials. Roche seems to be minimally engaged in other activities, at least as viewed on Twitter.
Given below is the wordcloud created from tweets related to Sanofi.
As expected, the tweets are completely overwhelmed by the 2015 Patents for Humanity awards - sunpower, novartis, bravo, congrats, winner, etc. Terms related to Sanofi’s core business - amp, malaria, health, innovation, etc. - are also frequently tweeted.
Let’s take a look at the dendrogram.
Given below is the wordcloud created from tweets related to Merck.
I was totally surprised looking at Merck’s tweet wordcloud: spacex, NASA, payloads, spacestation, launch, iss, cargo, etc. It looked more like that of a space company than that of a pharma major - I had to google it. I found the explanation - on April 17, Merck had sent a protein crystal growth experiment to the International Space Station.
Oncology, cancer, etc. are also mentioned, on a relatively smaller scale.
Let’s take a look at the dendrogram.
Now that we have analysed the tweets for each of these pharma majors individually, let us take a look at the sentiment analysis plot for each of them.
It seems most of the tweets are neutral, so this plot doesn’t provide a good understanding of the tweet sentiment.
Let us calculate the tweet sentiment excluding neutral tweets, taking the ratio of positive tweets to all non-neutral tweets.
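As a rough sketch of this calculation in Python, reading the ratio as positive tweets over all tweets with a non-zero score (my interpretation of the step above), and using a made-up function name `positive_share`:

```python
def positive_share(scores):
    """Share of positive tweets among tweets with a non-zero sentiment score.

    `scores` is a list of per-tweet sentiment scores. Neutral tweets
    (score 0) are excluded from both numerator and denominator.
    Returns None when every tweet is neutral, since the ratio is undefined.
    """
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    rated = positive + negative
    if rated == 0:
        return None
    return positive / rated


# Made-up scores: three positive, one negative, two neutral.
print(positive_share([2, 0, -1, 1, 0, 3]))
```

Computing this share per company gives one number each, which makes the companies easier to compare than a plot dominated by neutral tweets.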
Using the simple word sentiment approach, we see Sanofi leading the positive sentiment score, followed by Pfizer and Merck. Sanofi and Novartis are helped by the congratulatory messages for their Patents for Humanity win. Merck seems to have the least positive sentiment among its top 5 pharma peers.
We observed a few interesting trends using a simple visual approach to the tweets and unearthed a few news items explaining curious observations in the wordclouds and dendrograms in the above analysis.
We should take the sentiment score results with a bit of caution, as word-based sentiment analysis can be a bit misleading, especially in an industry that deals with diseases, trials and therapies.
As a future improvement on this project, I am considering improving this analysis by using advanced natural language processing methodologies to calculate the sentiment scores.