I was socially engineered into this post by this tweet, which claimed that the news portal OpIndia had only five readers:

Editor of OpIndia has extended an apology to all their readers. 80% of their readers have accepted it. Could have been 100% but 1 person hasn’t read it yet
— Joy (@Joydas) March 3, 2018

This tweet got me thinking about doing a comparative analysis of the likes/retweets/favorites/shares of social media content from four new-age digital media portals (OpIndia, Wire, Scroll, Swarajya). I couldn’t find anything relevant on the web except for conflicting reports about their credibility, so I went ahead.

This blog post deals with the following issues:

Key objectives:

Compare activity, retweets and favorites for an equal number of tweets shared by all four portals
Compare activity, shares and likes for an equal number of Facebook posts shared by all four portals
Compare content via word clouds and do sentiment analysis of all these media outlets to verify their bias
Plot engagement and activity variability vis-a-vis weekday and hour

tmls=readRDS('tmls.RDS')
library(tidyverse)
library(tidytext)
library(wordcloud)
library(ggthemes)
library(ggridges)
library(knitr)
Q25 = function(x){
  quantile(x,0.25)}


Q75 = function(x){
  quantile(x,0.75)}

Data extraction

An equal number of tweets (about 3200) was extracted for each of the four news portals from the Twitter API. The archive can be found here. Similarly, an equal number of posts (2500) was extracted for each portal from the Facebook API. The archive can be found here and here.

The data was cleaned, and date, month and weekday variables were extracted for plotting. The text was cleaned, and sentiment analysis was done on unigrams with the Bing and AFINN sentiment lexicons. Word clouds were drawn for the whole corpus and for the tweets and posts in the upper quartile of each category, to compare which tweets/posts are more popular with the readership.
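The variable extraction described above can be sketched in base R as follows. This is only an illustration on a hypothetical two-row data frame called `tweets`, not the actual cleaning script. It also assumes that `engagement` is simply retweets plus favorites, which is consistent with the summary table further down, where the engagement mean equals the sum of the retweet and favorite means.

```r
# Hypothetical two-row stand-in for the tweet archive; the real `tmls` has many more columns.
tweets <- data.frame(
  created_at     = as.POSIXct(c("2018-02-25 03:10:00", "2018-03-03 21:45:00"), tz = "UTC"),
  retweet_count  = c(12, 3),
  favorite_count = c(30, 5)
)

# Calendar variables used later for grouping in the plots.
tweets$day     <- as.Date(tweets$created_at, tz = "UTC")
tweets$month   <- as.integer(format(tweets$created_at, "%m"))
tweets$hour    <- as.integer(format(tweets$created_at, "%H"))
tweets$weekday <- weekdays(tweets$created_at)  # weekday names depend on the locale

# Assumed definition: engagement = retweets + favorites.
tweets$engagement <- tweets$retweet_count + tweets$favorite_count
```

Note that the hour here is taken in UTC; whether the original analysis converted timestamps to local time is not stated.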

Twitter

We will first examine the activity, retweets and favorites of the tweets by these news portals.

Activity

Let’s see the number of tweets and the percentage of retweets

library(knitr)
kable(tmls %>% group_by(screen_name) %>% summarize(n=n(),percentage_retweet=round(100*sum(is_retweet=="TRUE")/n,2)))

screen_name	n	percentage_retweet
OpIndia_com	3247	6.16
scroll_in	3212	25.81
SwarajyaMag	3245	0.46
thewire_in	3233	7.64
Thus we see that roughly a quarter (25.81%) of tweets by Scroll are retweets.

From now on we will deal with original tweets unless specified otherwise.

Let’s plot tweeting frequency of these portals

Daywise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
ggplot(aes(x=day,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')

Hmm, so we see that for an equal number of tweets (3200), OpIndia tweets at a lower frequency per day than Wire, Scroll or Swarajya (which posted 3200 tweets in around 2 months, i.e. 1600/month). We also see that this year the Scroll handle tweets in a bot-like fashion at an extremely high frequency per day, and made almost 3200 tweets in the month of Feb alone.

We also see that since mid-January the activity of OpIndia has increased beyond its usual levels.

Let’s plot the frequency since 1st Jan 2018 on a common scale

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
ggplot(aes(x=day,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')

We see clearly that OpIndia, Wire and Swarajya have a natural variance in tweeting, as opposed to Scroll, which tweets in bot mode.

Week-day wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
  ggplot(aes(x=weekday,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')+theme_economist()+
  labs(title='Activity on weekdays',
       subtitle='Swarajya and OpIndia have lower activity on weekends',
       caption='source Twitter REST API')

We see that the activity of OpIndia and Swarajya is lower on weekends.

Hour-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
  #filter(!(screen_name=="scroll_in")) %>% 
  # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
  
  ggplot(aes(x=hour,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')+theme_economist()+
  labs(title='Activity hour-wise',
       subtitle='OpIndia tweets in mid night as well, Scroll and Wire \nstart tweeting from 9 AM',
       caption='source Twitter REST API')

We see that Scroll and Wire start tweeting from 9 AM, while OpIndia is active at midnight as well and the Swarajya handle has a varying schedule.

Source-Wise

knitr::kable(tmls  %>% group_by(screen_name,source) %>% summarise(n=n()) %>% mutate(Proportion = round(100*n/sum(n),2)) %>% rename(News_portal=screen_name) %>% arrange(News_portal,desc(Proportion)))

News_portal	source	n	Proportion
OpIndia_com	TweetDeck	2603	80.17
OpIndia_com	Twitter for Windows	280	8.62
OpIndia_com	Twitter for Android	132	4.07
OpIndia_com	Twitter for iPhone	125	3.85
OpIndia_com	Twitter Web Client	107	3.30
scroll_in	Buffer	2257	70.27
scroll_in	TweetDeck	499	15.54
scroll_in	Twitter Web Client	215	6.69
scroll_in	Twitter for Android	133	4.14
scroll_in	Twitter for iPhone	78	2.43
scroll_in	Media Studio	30	0.93
SwarajyaMag	Buffer	1765	54.39
SwarajyaMag	TweetDeck	1031	31.77
SwarajyaMag	Twitter Web Client	155	4.78
SwarajyaMag	Twitter for Android	80	2.47
SwarajyaMag	Bitly	55	1.69
SwarajyaMag	Twitter Ads Composer	50	1.54
SwarajyaMag	Plume for Android	28	0.86
SwarajyaMag	SocialOomph	22	0.68
SwarajyaMag	Twitter Ads	21	0.65
SwarajyaMag	Twitter Lite	11	0.34
SwarajyaMag	Nuzzel	8	0.25
SwarajyaMag	Dabr.eu - latest @Dabr build	7	0.22
SwarajyaMag	Twitter for iPhone	6	0.18
SwarajyaMag	Hootsuite	4	0.12
SwarajyaMag	dlvr.it	2	0.06
thewire_in	TweetDeck	2950	91.25
thewire_in	Twitter Web Client	203	6.28
thewire_in	Twitter for iPhone	72	2.23
thewire_in	Media Studio	8	0.25

We see that OpIndia and Wire mainly use TweetDeck to schedule their tweets, while Swarajya and Scroll rely mainly on Buffer.

We also see that OpIndia makes about 16.5% of its tweets by phone (Android, iPhone and, surprise, Windows!), while the other portals make fewer tweets by phone; among phone clients, the Wire handle uses only iPhone. The Swarajya handle has tried all kinds of clients and feeds to schedule its posts.

Let’s visualise

tmls %>% filter(screen_name=="OpIndia_com") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Activity',title='OpIndia')

tmls %>% filter(screen_name=="thewire_in") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line()+ labs(y='Activity',title='Wire')

tmls %>% filter(screen_name=="scroll_in") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line()+ labs(y='Activity',title='Scroll')

tmls %>% filter(screen_name=="SwarajyaMag") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line()+ labs(y='Activity',title='Swarajya')

Favorites

Summary stats of engagement

Let’s calculate summary stats (mean, median and spread of favorites, retweets and engagement per tweet) for each news portal

 knitr::kable(tmls %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% rename(News_portal=screen_name) %>%  t(),digits=2)

News_portal	OpIndia_com	scroll_in	SwarajyaMag	thewire_in
retweet_count_Mean	136.778799	3.554763	21.848916	28.004689
favorite_count_Mean	149.852314	6.331935	32.155418	49.439719
engagement_Mean	286.631113	9.886697	54.004334	77.444407
retweet_count_Median	72	1	9	11
favorite_count_Median	92	3	16	23
engagement_Median	166	5	24	35
retweet_count_Q25	37	0	4	6
favorite_count_Q25	50	1	8	13
engagement_Q25	90	2	12	19
retweet_count_Q75	158.5	3.0	20.0	25.0
favorite_count_Q75	173	6	33	46
engagement_Q75	331	9	52	72
retweet_count_SD	191.913683	9.927373	49.704859	62.102115
favorite_count_SD	188.98465	15.85491	63.75643	94.05176
engagement_SD	373.6480	24.6845	110.2582	154.6132
retweet_count_minimum	1	0	0	0
favorite_count_minimum	2	0	0	0
engagement_minimum	3	0	0	1
retweet_count_maximum	2720	248	827	1355
favorite_count_maximum	2562	361	1591	2041
engagement_maximum	5282	451	2264	3396
retweet_count_Total	3047	2383	3230	2986
favorite_count_Total	3047	2383	3230	2986
engagement_Total	3047	2383	3230	2986

Thus we see that OpIndia and Wire lead Swarajya in mean favorites and retweets, while Scroll’s performance on Twitter is poor. OpIndia seems focused on Twitter and performs very well here (roughly five times Swarajya’s mean engagement), contrary to the provocative tweet.

Let’s plot number of favorites of these portals

Daywise

Let’s plot mean favorites

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')

Let’s plot median favorites

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')

We see OpIndia has a massive advantage here.

Let’s plot mean favorites since 1st Jan 2018 on a common scale

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%  # restrict to tweets since 1st Jan 2018
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')

We see clearly that OpIndia is the leader in favorites by quite a margin, followed by Wire and Swarajya.

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')

Week-day wise

Let’s plot mean favorites weekday-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=weekday,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+
  labs(title='Mean Favorites on weekdays',
       subtitle=' OpIndia and Wire  have highest Favorites on sunday',
       caption='source Twitter REST API')

We see that even though OpIndia tweets less frequently on Sunday, its highest mean favorites are on Sunday. Wire maintains a consistent lead over Swarajya.

Let’s plot median favorites weekday-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,weekday) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=weekday,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')+
  labs(title='Median Favorites on weekdays',
       subtitle=' OpIndia and Wire  have highest Favorites on sunday',
       caption='source Twitter REST API')

The median is more representative than the mean in skewed distributions such as this one, as it is shielded from extremes.
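A tiny made-up example shows the effect. The numbers below are hypothetical favorite counts, not taken from the data: four ordinary tweets and one viral one.

```r
# Hypothetical favorite counts: four ordinary tweets and one viral tweet.
favs <- c(10, 12, 15, 20, 2041)

mean_favs   <- mean(favs)    # 419.6, pulled far upward by the single viral tweet
median_favs <- median(favs)  # 15, stays at a typical value

# Dropping the viral tweet changes the mean a lot and the median very little.
ordinary       <- favs[favs < 1000]
mean_without   <- mean(ordinary)    # 14.25
median_without <- median(ordinary)  # 13.5
```

This is why the mean and median plots are shown side by side throughout this post.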

Hour-wise

library(ggthemes)
tmls %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=hour,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+theme_economist()+
  labs(title='Favorites hour-wise',
       
       caption='source Twitter REST API')

We see that OpIndia and Wire have high favorites at 12 AM, possibly due to NRI readers. But what explains the bump in Wire’s mean favorites at 3 AM, given that its tweeting frequency at that hour is very low? It is possibly due to a single very popular tweet.

Let’s search for Wire tweets between hours 2 and 5

library(knitr)
kable(tmls %>% filter(between(hour,2,5)) %>% filter(screen_name=="thewire_in") %>% pull(text)
)

x
Veteran actress Sridevi passes away at 54 https://t.co/DavcQjqO8w https://t.co/2XX3dYOcAr
‘Jan Gan Man Ki Baat’ episode 193: Jammu and Kashmir and governance in Uttar Pradesh https://t.co/CmjnTOFzw7

So we see the mean is propped up by the very popular tweet on Sridevi’s death.

So let’s do the analysis without this tweet.

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=hour,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+theme_economist()+
  labs(title='Favorites hour-wise',
       
       caption='source Twitter REST API')

So now we see favorites hour-wise without the anomaly.

Let’s visualise the median now

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=hour,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')+theme_economist()+
  labs(title='Favorites hour-wise',
       
       caption='source Twitter REST API')

Source-Wise

knitr::kable(tmls  %>% group_by(screen_name,source) %>% summarise(mean_fav=mean(favorite_count),meadian_fav=median(favorite_count)) %>% 
               rename(News_portal=screen_name) %>% arrange(News_portal,mean_fav))

News_portal	source	mean_fav	meadian_fav
OpIndia_com	Twitter for Android	46.6212121	0.0
OpIndia_com	TweetDeck	138.3177103	86.0
OpIndia_com	Twitter for iPhone	168.6720000	105.0
OpIndia_com	Twitter for Windows	177.1035714	106.0
OpIndia_com	Twitter Web Client	184.4112150	119.0
scroll_in	TweetDeck	0.0040080	0.0
scroll_in	Twitter for Android	0.0676692	0.0
scroll_in	Twitter for iPhone	0.2307692	0.0
scroll_in	Twitter Web Client	2.5348837	0.0
scroll_in	Buffer	6.0381037	3.0
scroll_in	Media Studio	29.5666667	14.5
SwarajyaMag	Twitter for iPhone	1.0000000	0.0
SwarajyaMag	Hootsuite	6.0000000	5.5
SwarajyaMag	SocialOomph	12.2272727	6.5
SwarajyaMag	Twitter Lite	13.1818182	12.0
SwarajyaMag	dlvr.it	14.0000000	14.0
SwarajyaMag	Bitly	26.7818182	16.0
SwarajyaMag	Plume for Android	27.3214286	22.0
SwarajyaMag	Twitter Ads Composer	28.3400000	13.0
SwarajyaMag	Dabr.eu - latest @Dabr build	30.5714286	19.0
SwarajyaMag	TweetDeck	30.8806984	14.0
SwarajyaMag	Twitter Web Client	31.0903226	17.0
SwarajyaMag	Buffer	32.4679887	17.0
SwarajyaMag	Twitter for Android	38.1625000	11.5
SwarajyaMag	Nuzzel	75.2500000	30.0
SwarajyaMag	Twitter Ads	90.6190476	29.0
thewire_in	Twitter for iPhone	20.9583333	0.0
thewire_in	Twitter Web Client	27.9408867	6.0
thewire_in	TweetDeck	47.2691525	23.0
thewire_in	Media Studio	125.2500000	116.5

We see that for OpIndia, tweets from the iPhone and Windows clients have higher mean favorites than those from TweetDeck, which is interesting!

tmls %>% group_by(screen_name,source,hour) %>%  summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%  
  arrange(hour) %>%  ggplot(aes(x=hour,y=favorite_count_Mean,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Mean_Favorite by hour')+ facet_wrap(~screen_name)

tmls %>% group_by(screen_name,source,hour) %>%  summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%  
  arrange(hour) %>%  ggplot(aes(x=hour,y=favorite_count_Median,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Median_Favorite by hour')+ facet_wrap(~screen_name)

Retweets

Let’s plot number of Retweets of these portals

Daywise

Let’s plot mean retweets by day

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Retweets')

We see OpIndia has a massive advantage here.

Let’s plot mean retweets since 1st Jan 2018 on a common scale

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')

We see clearly that OpIndia is the leader in retweets by quite a margin, followed by Wire and Swarajya, while Scroll is at the bottom.

Let’s plot median numbers now

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')

We see that the daily median was above 200 on some earlier days and is lower now. When there were fewer tweets per day, a few very popular tweets could drive the daily numbers; now that tweeting frequency has increased, they no longer can. So this shouldn’t be confused with worse performance.

Since 1st Jan

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')

Week-day wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
  ggplot(aes(x=weekday,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+
  labs(title='Mean_Retweets on weekdays',
       subtitle='OpIndia and Wire have highest retweets on Sunday\nSwarajya marginally overtakes Wire on Saturday',
       caption='source Twitter REST API')

We see that even though OpIndia tweets less frequently on Sunday, its highest mean retweets are on Sunday. Wire maintains a consistent lead over Swarajya but dips on Saturday.

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
  ggplot(aes(x=weekday,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')+
  labs(title='Median_Retweets on weekdays',
       subtitle='OpIndia and Wire have highest retweets on Sunday\nSwarajya marginally overtakes Wire on Saturday',
       caption='source Twitter REST API')

Hour-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
  #filter(!(screen_name=="scroll_in")) %>% 
  # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,hour)  %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  ggplot(aes(x=hour,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+theme_economist()+
  labs(title='Mean Retweets hour-wise',
       
       caption='source Twitter REST API')

We see that OpIndia and Wire have high retweets at 12 AM, possibly due to NRI readers, similar to the favorites pattern. Let’s exclude the anomalous Wire tweet found earlier (posted between hours 2 and 5).

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
  group_by(screen_name,hour)  %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  ggplot(aes(x=hour,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+theme_economist()+
  labs(title='Mean Retweets hour-wise',
       
       caption='source Twitter REST API')

So now we see retweets hour-wise without the anomaly.

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
  group_by(screen_name,hour)  %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  ggplot(aes(x=hour,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')+theme_economist()+
  labs(title='Median Retweets hour-wise',
       
       caption='source Twitter REST API')

Source-Wise

knitr::kable(tmls  %>% group_by(screen_name,source) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
               rename(News_portal=screen_name) %>% arrange(News_portal,mean_rt))

News_portal	source	n	mean_rt	mean_fav	median_rt	median_fav
OpIndia_com	Twitter for Android	132	120.446970	46.6212121	50.5	0.0
OpIndia_com	TweetDeck	2603	125.940838	138.3177103	66.0	86.0
OpIndia_com	Twitter for iPhone	125	155.832000	168.6720000	96.0	105.0
OpIndia_com	Twitter Web Client	107	174.140187	184.4112150	100.0	119.0
OpIndia_com	Twitter for Windows	280	216.371429	177.1035714	119.5	106.0
scroll_in	Buffer	2257	3.337173	6.0381037	1.0	3.0
scroll_in	TweetDeck	499	4.641283	0.0040080	3.0	0.0
scroll_in	Twitter for Android	133	5.556391	0.0676692	2.0	0.0
scroll_in	Twitter Web Client	215	7.809302	2.5348837	2.0	0.0
scroll_in	Twitter for iPhone	78	11.064103	0.2307692	2.0	0.0
scroll_in	Media Studio	30	20.000000	29.5666667	10.0	14.5
SwarajyaMag	Hootsuite	4	1.250000	6.0000000	0.0	5.5
SwarajyaMag	dlvr.it	2	4.500000	14.0000000	4.5	14.0
SwarajyaMag	Twitter Lite	11	6.545454	13.1818182	6.0	12.0
SwarajyaMag	SocialOomph	22	8.454546	12.2272727	3.5	6.5
SwarajyaMag	Twitter Ads Composer	50	16.420000	28.3400000	6.5	13.0
SwarajyaMag	Bitly	55	18.509091	26.7818182	8.0	16.0
SwarajyaMag	Dabr.eu - latest @Dabr build	7	20.142857	30.5714286	6.0	19.0
SwarajyaMag	Buffer	1765	21.978470	32.4679887	9.0	17.0
SwarajyaMag	Twitter Web Client	155	23.193548	31.0903226	10.0	17.0
SwarajyaMag	Twitter for Android	80	23.387500	38.1625000	7.0	11.5
SwarajyaMag	TweetDeck	1031	23.462658	30.8806984	8.0	14.0
SwarajyaMag	Plume for Android	28	24.035714	27.3214286	11.0	22.0
SwarajyaMag	Nuzzel	8	45.750000	75.2500000	14.0	30.0
SwarajyaMag	Twitter Ads	21	68.428571	90.6190476	22.0	29.0
SwarajyaMag	Twitter for iPhone	6	77.666667	1.0000000	40.0	0.0
thewire_in	TweetDeck	2950	29.267797	47.2691525	12.0	23.0
thewire_in	Twitter Web Client	203	31.970443	27.9408867	8.0	6.0
thewire_in	Twitter for iPhone	72	50.486111	20.9583333	16.0	0.0
thewire_in	Media Studio	8	64.125000	125.2500000	47.0	116.5

We see that the Twitter for Windows client has the highest mean retweets for OpIndia!

tmls %>% group_by(screen_name,source,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=mean_rt,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Mean Retweet by hour')+ facet_wrap(~screen_name)

tmls %>% group_by(screen_name,source,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=median_rt,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Median Retweet by hour')+ facet_wrap(~screen_name)

Sentiment analysis

We will do sentiment analysis of the headline text of all these news portals using three established English-language resources: the Bing and AFINN lexicons and the sentimentr package. Bing and AFINN score individual words and sum them up, while sentimentr also accounts for valence shifters such as negation.

We shall do sentiment analysis with all these tools and see whether the results are consistent.

Let’s first calculate the sentiment of the text with the sentimentr package
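To make the word-level approach concrete, here is a minimal base-R sketch of lexicon scoring. The five-word `toy_lexicon` and the `score_headline()` helper are invented for illustration only; the real Bing and AFINN lexicons are far larger and are loaded via tidytext in the analysis that follows.

```r
# Toy lexicon, invented for illustration; not the actual Bing or AFINN word list.
toy_lexicon <- c(good = 1, win = 1, bad = -1, attack = -1, dead = -1)

score_headline <- function(text, lexicon) {
  # Lower-case the text, then split on anything that is not a letter.
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Words missing from the lexicon contribute nothing to the score.
  hits <- lexicon[words[words %in% names(lexicon)]]
  sum(hits)
}

score_headline("Bad attack leaves two dead", toy_lexicon)  # -3
score_headline("Not good", toy_lexicon)                    # 1, negation is ignored
```

The second call shows the main weakness of summing unigram scores: "not good" still counts as positive. Handling such negations is the kind of thing valence-aware tools are meant to address.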

library(sentimentr)
tmls %>% mutate(senti=sentiment_by(tmls$text,by=NULL)$ave_sentiment) %>% group_by(screen_name) %>% summarise(means=mean(senti))

## # A tibble: 4 x 2
##   screen_name    means
##   <chr>          <dbl>
## 1 OpIndia_com -0.0624 
## 2 scroll_in   -0.00528
## 3 SwarajyaMag  0.0206 
## 4 thewire_in  -0.0254

We see that OpIndia has the most negative headlines on average, followed by Wire and Scroll. Swarajya is the only portal whose headlines are positive on average.

Let’s run this analysis with the Bing lexicon.

tmlz=tmls %>% select(text,screen_name,retweet_count,favorite_count,status_id,created_at,month,day,hour,weekday) %>% 
  mutate(tweetnumber=row_number())


bing <- get_sentiments("bing")

afinn = get_sentiments('afinn')

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% group_by(screen_name) %>% summarize(mean_sentiment=mean(sentiment),
                                                                                  sd_sentiment=sd(sentiment),variability=
                                                                                    sd_sentiment/abs(mean_sentiment)) %>% 
  mutate(method="Bing")

## # A tibble: 4 x 5
##   screen_name mean_sentiment sd_sentiment variability method
##   <chr>                <dbl>        <dbl>       <dbl> <chr> 
## 1 OpIndia_com        -0.613          1.27        2.07 Bing  
## 2 scroll_in          -0.148          1.40        9.47 Bing  
## 3 SwarajyaMag         0.0889         1.43       16.0  Bing  
## 4 thewire_in         -0.349          1.33        3.81 Bing

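Bing shows a similar pattern. However, variability in headline sentiment (standard deviation divided by the absolute mean) is highest for Swarajya, followed by Scroll, and lowest for OpIndia. This may indicate that several people rather than a single person decide the headlines there, and that the stories are more varied. Note that the standard deviations themselves are close (1.27 to 1.43), so the ratio is driven largely by how near each mean is to zero.

Let's see if this trend holds for highly retweeted tweets.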
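The variability column is a coefficient of variation, and it can be recomputed from the printed table. The sketch below uses the rounded values shown above, so results match the table only approximately. The last line is a hypothetical portal, added purely to show how the ratio behaves when the mean sits close to zero.

```r
# Values copied from the Bing summary printed above (rounded)
bing_summary <- data.frame(
  screen_name    = c("OpIndia_com", "scroll_in", "SwarajyaMag", "thewire_in"),
  mean_sentiment = c(-0.613, -0.148, 0.0889, -0.349),
  sd_sentiment   = c(1.27, 1.40, 1.43, 1.33)
)

# Coefficient of variation: standard deviation divided by the absolute mean
bing_summary$variability <-
  bing_summary$sd_sentiment / abs(bing_summary$mean_sentiment)

# Hypothetical case: OpIndia's spread, but a mean ten times closer to zero
hypothetical_cv <- 1.27 / abs(-0.0613)

print(bing_summary)
print(hypothetical_cv)
```

With the same spread, a mean ten times closer to zero gives a ratio ten times larger, so a high value here need not mean that the headlines themselves vary more.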

tmlz %>% filter(retweet_count>50) %>% 
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% group_by(screen_name) %>% summarize(mean_sentiment=mean(sentiment),
                                                                                  sd_sentiment=sd(sentiment),variability=
                                                                                    sd_sentiment/abs(mean_sentiment)) %>% 
  mutate(method="Bing")

## # A tibble: 4 x 5
##   screen_name mean_sentiment sd_sentiment variability method
##   <chr>                <dbl>        <dbl>       <dbl> <chr> 
## 1 OpIndia_com         -0.674         1.24        1.85 Bing  
## 2 scroll_in            0.267         1.33        5.00 Bing  
## 3 SwarajyaMag         -0.208         1.54        7.38 Bing  
## 4 thewire_in          -0.467         1.38        2.96 Bing

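Among tweets with more than 50 retweets, mean sentiment is more negative for OpIndia, Swarajya and Wire, suggesting that negative news gets retweeted most. Scroll is the exception: its highly retweeted headlines are positive on average.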
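A quick way to check this is to subtract the two Bing tables from each other. The sketch below copies the mean sentiment values from the printed output above; the data frame and its column names are only illustrative.

```r
# Mean Bing sentiment copied from the two tables printed above
compare <- data.frame(
  screen_name = c("OpIndia_com", "scroll_in", "SwarajyaMag", "thewire_in"),
  all_tweets  = c(-0.613, -0.148, 0.0889, -0.349),
  rt_over_50  = c(-0.674, 0.267, -0.208, -0.467)
)

# Negative change: highly retweeted headlines are more negative than average
compare$change <- compare$rt_over_50 - compare$all_tweets

print(compare[order(compare$change), ])
```

Three of the four portals shift towards negative sentiment in their highly retweeted tweets, while Scroll moves in the opposite direction.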

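# Same comparison with the AFINN lexicon, which scores each word from -5 to +5.
# Newer releases of the lexicon name the score column `value` instead of `score`.
tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(afinn) %>%
  group_by(screen_name) %>% 
  summarise(n = n(), sentiment = sum(score), mean_sentiment = sentiment/n,
            sd_sentiment = sd(score),
            variability = sd_sentiment/abs(mean_sentiment),
            method = "Afinn") %>% 
  select(-sentiment)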

## # A tibble: 4 x 6
##   screen_name     n mean_sentiment sd_sentiment variability method
##   <chr>       <int>          <dbl>        <dbl>       <dbl> <chr> 
## 1 OpIndia_com  3296        -0.734          2.00        2.72 Afinn 
## 2 scroll_in    3059        -0.0778         2.22       28.5  Afinn 
## 3 SwarajyaMag  2922         0.129          2.00       15.5  Afinn 
## 4 thewire_in   3066        -0.405          2.00        4.93 Afinn

The AFINN scores again suggest a similar trend, though with wider variation in Scroll headlines.

Let's look at how sentiment varies by day.

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% 
  ggplot( aes(day, sentiment, fill = screen_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~screen_name, ncol = 2, scales = "free_x")

We see that OpIndia and Wire maintain a consistently critical tone, while Scroll and Swarajya have varying sentiments. In February, OpIndia articles carried more negative sentiment.

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,weekday, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% 
  ggplot( aes(weekday, sentiment, fill = screen_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~screen_name, ncol = 2, scales = "free_x")

We see OpIndia posts its most critical headlines on Wednesday.

Now we analyse by hour

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,hour, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% 
  ggplot( aes(hour, sentiment, fill = screen_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~screen_name, ncol = 2, scales = "free_x")

The cumulative sentiment by hour also depends on how often each news portal tweets at a particular hour, so this pattern resembles the activity pattern.
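The effect of tweeting frequency can be seen in a small sketch with made-up numbers (the `toy` data frame is hypothetical, not taken from the archive). Summing sentiment per hour rewards busy hours, while averaging per tweet removes that dependence.

```r
# Hypothetical data: four mildly negative tweets at 9h, one strongly negative at 21h
toy <- data.frame(
  hour      = c(9, 9, 9, 9, 21),
  sentiment = c(-1, -1, -1, -1, -2)
)

# Cumulative sentiment per hour, as plotted by the bar charts above
total_by_hour <- tapply(toy$sentiment, toy$hour, sum)

# Mean sentiment per tweet, which does not grow with the number of tweets
mean_by_hour <- tapply(toy$sentiment, toy$hour, mean)

print(rbind(total = total_by_hour, mean = mean_by_hour))
```

In this toy case the busy hour looks twice as negative in total although its individual tweets are milder, which is why the summed plots track activity.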

WordCloud

Let's see the word cloud for OpIndia.

cmw = c("media","omitted","http","https","html","www",".com","t.co","to","the","of","by","and","an","its","writes",
         "we","as","that","how","after","a","for","in","from","with","on","rt","now","him","about","his","this","are",
        "while","no","but","is","what","who","they","you","has","had","have","svaradarajan")
tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>% 
  filter(!(word %in% cmw)) %>% 
  filter(screen_name=="OpIndia_com") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

As expected, congress dominates, as OpIndia is critical of it.
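A word cloud is only a picture of a word frequency table, so it helps to look at that table directly. The following is a minimal base R sketch of the counting step; the three headlines and the short stop list are invented for illustration and are not from the archive.

```r
# Hypothetical headlines and a short custom stop list, for illustration only
headlines <- c("Congress leader attacks media",
               "Media omitted the congress rally",
               "Rally draws crowd")
custom_stop <- c("the", "media", "omitted")

# Tokenise into lowercase words and drop empty strings and stop words
words <- unlist(strsplit(tolower(headlines), "[^a-z]+"))
words <- words[words != "" & !(words %in% custom_stop)]

# Frequency table, most frequent first; these counts set the word sizes in a cloud
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

Printing the top rows of such a table alongside the cloud makes it easier to compare portals, since word size alone is hard to read precisely.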

Let's see the word cloud for tweets with more than 50 retweets.

tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>% 
  filter(retweet_count>50) %>% 
  filter(!(word %in% cmw)) %>% 
  filter(screen_name=="OpIndia_com") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

The words congress, gandhi and muslim appear prominently in the highly retweeted headlines, suggesting that OpIndia's readership favours these articles.

Let's see the corresponding word clouds for Wire.

tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>% 
  filter(!(word %in% cmw)) %>% 
  filter(screen_name=="thewire_in") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

In the Wire word cloud, modi and bjp are more prominent; it also talks about india.

Let's see the highly retweeted articles of Wire.

tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>%
  filter(retweet_count>50) %>% 
  filter(!(word %in% cmw)) %>% 
  filter(screen_name=="thewire_in") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

Words such as modi and govt are more prominent in Wire's highly retweeted articles, suggesting that its readership wants to read these articles.

Facebook

We now analyse the Facebook archive briefly, focusing on which news portals have higher prominence there and whether it follows the Twitter trend.

tmlfb=readRDS('tmlfb.RDS')

Let's examine the trend of likes and shares. Does it hold as on Twitter?

 knitr::kable(tmlfb %>% 
 
  group_by(screen_name) %>% summarise_at(.vars = c("likes_count","shares_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% rename(News_portal=screen_name) %>%  t(),digits=2)

News_portal	OpIndia.com	Scroll	Swarajya	TheWire.in
likes_count_Mean	255.6568	422.0168	324.7364	213.2176
shares_count_Mean	51.2300	608.9764	83.5996	147.8304
engagement_Mean	306.8868	1030.9932	408.3360	361.0480
likes_count_Median	59.0	58.5	115.0	46.0
shares_count_Median	11	16	12	9
engagement_Median	73.5	79.0	130.5	56.0
likes_count_Q25	21	23	45	15
shares_count_Q25	3	3	4	2
engagement_Q25	25	27	50	19
likes_count_Q75	140	182	289	143
shares_count_Q75	33.00	68.25	44.25	41.00
engagement_Q75	178.00	258.25	338.25	192.25
likes_count_SD	3375.9292	3222.6586	1327.2583	865.7914
shares_count_SD	195.2241	9584.8281	839.9081	1274.1077
engagement_SD	3424.264	12387.378	1722.826	2003.546
likes_count_minimum	0	0	0	0
shares_count_minimum	0	0	0	0
engagement_minimum	0	0	0	0
likes_count_maximum	152317	96129	57125	30296
shares_count_maximum	4966	421195	40082	36555
engagement_maximum	153262	517324	59940	66851
likes_count_Total	2500	2500	2500	2500
shares_count_Total	2500	2500	2500	2500
engagement_Total	2500	2500	2500	2500

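Here Scroll clobbers the other three on mean shares and mean engagement, and Wire comes second on mean shares. This seems too good to be true, since Scroll didn't do well at all on Twitter, which is the favourite site of news lovers. The medians tell a different story: Swarajya leads on median likes and engagement, so the means are being pulled up by a few very widely shared posts.

This probably comes down to the type of post being shared. Let's analyse it.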
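One way to see how skewed these distributions are is to divide the mean by the median. The sketch below copies the engagement mean and median from the table above; the data frame name and the ratio column are only illustrative.

```r
# Mean and median engagement copied from the summary table above
fb_engagement <- data.frame(
  News_portal = c("OpIndia.com", "Scroll", "Swarajya", "TheWire.in"),
  mean   = c(306.8868, 1030.9932, 408.3360, 361.0480),
  median = c(73.5, 79.0, 130.5, 56.0)
)

# A ratio far above 1 means a few very large posts pull the mean upwards
fb_engagement$mean_to_median <- fb_engagement$mean / fb_engagement$median

print(fb_engagement[order(-fb_engagement$mean_to_median), ])
```

Scroll's mean engagement is roughly thirteen times its median, the largest gap of the four, which points to a handful of viral posts rather than uniformly higher engagement.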

library(knitr)
kable(tmlfb %>% group_by(screen_name,type) %>% summarize(n=n(),mean_like = mean(likes_count),
                                                   mean_share = mean(shares_count),
                                                   mean_engagement = mean(engagement),
                                                   median_like = median(likes_count),
                                                   median_share = median(shares_count),
                                                   median_engagement = median(engagement)) %>% 
  mutate(proportion=round(100*n/2500,2)) %>% select(screen_name,type,n,proportion,everything()) %>% 
  arrange(type))

screen_name	type	n	proportion	mean_like	mean_share	mean_engagement	median_like	median_share	median_engagement
TheWire.in	event	1	0.04	52.00000	0.00000	52.00000	52.0	0.0	52.0
OpIndia.com	link	2339	93.56	260.81317	45.89312	306.70628	61.0	11.0	74.0
Scroll	link	1141	45.64	83.37248	17.06310	100.43558	30.0	3.0	35.0
Swarajya	link	1970	78.80	283.01929	47.21168	330.23096	120.0	13.0	135.5
TheWire.in	link	1959	78.36	132.00000	28.81879	160.81879	34.0	6.0	41.0
TheWire.in	note	1	0.04	18.00000	1.00000	19.00000	18.0	1.0	19.0
Swarajya	offer	1	0.04	880.00000	0.00000	880.00000	880.0	0.0	880.0
OpIndia.com	photo	123	4.92	114.43089	46.43902	160.86992	40.0	6.0	43.0
Scroll	photo	74	2.96	280.43243	32.62162	313.05405	48.5	4.0	49.5
Swarajya	photo	348	13.92	223.25575	34.47989	257.73563	49.0	3.0	53.0
TheWire.in	photo	59	2.36	132.22034	15.49153	147.71186	13.0	3.0	16.0
OpIndia.com	status	20	0.80	149.70000	29.55000	179.25000	17.0	4.0	23.0
Swarajya	status	2	0.08	15.00000	2.50000	17.50000	15.0	2.5	17.5
TheWire.in	status	23	0.92	29.13043	16.52174	45.65217	14.0	4.0	22.0
OpIndia.com	video	18	0.72	668.38889	801.55556	1469.94444	273.0	78.5	340.0
Scroll	video	1285	51.40	730.86537	1167.74942	1898.61479	107.0	52.0	164.0
Swarajya	video	179	7.16	981.50838	580.93855	1562.44693	228.0	63.0	310.0
TheWire.in	video	457	18.28	581.87090	682.33042	1264.20131	210.0	83.0	313.0

So OpIndia and Swarajya dominate in the news link category, and links form the major part of their public page posts. But they have a very low presence in video posts, which typically get more likes and shares; this is probably due to the nature of the crowd on Facebook. It is also a lesson for these right-wing digital outlets to step up their game in video.

We will focus on links for now.
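The per-type table also explains Scroll's high overall mean. As a worked check, the sketch below rebuilds Scroll's overall mean engagement as a weighted average of its three post types, using the counts and means from the table above. The data frame and variable names are only illustrative.

```r
# Scroll's rows copied from the per-type table above
scroll <- data.frame(
  type            = c("link", "photo", "video"),
  n               = c(1141, 74, 1285),
  mean_engagement = c(100.43558, 313.05405, 1898.61479)
)

# Total engagement per type is the post count times the mean engagement
scroll$total_engagement <- scroll$n * scroll$mean_engagement

# The weighted average over all 2500 posts reproduces the overall mean
overall_mean <- sum(scroll$total_engagement) / sum(scroll$n)

# Share of Scroll's total engagement that comes from video posts
video_share <- scroll$total_engagement[scroll$type == "video"] /
  sum(scroll$total_engagement)

print(overall_mean)
print(video_share)
```

The weighted average comes out at about 1031, matching the earlier summary table, and the large majority of Scroll's engagement comes from video. Comparing portals within a single post type is therefore fairer than comparing overall means.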

tmlfb %>% 
  filter(type=="link") %>% 
  group_by(screen_name,day) %>% 
  summarise(mean_like = mean(likes_count),
            mean_share = mean(shares_count),
            mean_engagement = mean(engagement)) %>% 
  ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()

OpIndia and Swarajya dominate the link space on Facebook, but of late Wire has been dominant. Let's zoom in on this graph from the end of December onwards.

tmlfb %>% 
  filter(type=="link") %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% 
  summarise(mean_like = mean(likes_count),
            mean_share = mean(shares_count),
            mean_engagement = mean(engagement)) %>% 
  ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()

From Jan 15 onwards, Wire is clearly doing great while OpIndia is slacking on Facebook.

Let's look at video.

tmlfb %>% 
  filter(type=="video") %>% 
  group_by(screen_name,day) %>% 
  summarise(mean_like = mean(likes_count),
            mean_share = mean(shares_count),
            mean_engagement = mean(engagement)) %>% 
  ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()

Scroll clearly dominates video, but of late Wire is on the rise here as well. Swarajya and OpIndia seem less invested in video.

Key takeaways:

1. OpIndia dominates Twitter by a big margin

2. Wire is in the ascendancy on Facebook, with strong shares and likes on its video and link posts

3. OpIndia and Wire are the more extreme outlets, with opposite points of view and negative sentiment

4. Swarajya and Scroll have more varied articles

5. News shared from phone clients has higher resonance

6. OpIndia and Swarajya post fewer original links on weekends, but the most retweets and likes happen on these days

7. Video posts on Facebook are shared and liked to a higher degree

8. Swarajya and OpIndia dominate the news link space on Facebook as well

Disclosure: I have right-of-centre views and have written an article for OpIndia. However, I have provided a downloadable archive at the beginning, and the R code for the analysis is self-contained on this page.

Analysis Of reach and engagement of Indian digital media outlets

Anupam kumar Singh, MD

6 March 2018
