I was socially engineered into writing this post by this tweet, which claimed that the news portal OpIndia had only five readers.

This tweet got me thinking about a comparative analysis of the likes, retweets, favorites and shares of social media content from four new-age digital media portals (OpIndia, Wire, Scroll, Swarajya). I couldn't find anything relevant on the web except for conflicting reports about their credibility, so I went ahead.

This blog post will deal with the following issues:

Key objectives:

  1. Compare activity, retweets and favorite status of an equal number of tweets shared by all four portals

  2. Compare activity, shares and like status of an equal number of Facebook posts shared by all four portals

  3. Compare Contents via Wordcloud and do sentiment analysis of all these media outlets to verify their bias

  4. Plot engagement and activity variability vis a vis weekday and hour

tmls=readRDS('tmls.RDS')
library(tidyverse)
library(tidytext)
library(wordcloud)
library(ggthemes)
library(ggridges)
library(knitr)
Q25 = function(x){
  quantile(x,0.25)}


Q75 = function(x){
  quantile(x,0.75)}

Data extraction

An equal number of tweets (about 3200 each) was extracted for all four news portals from the Twitter API. The archive can be found here. Similarly, an equal number of posts (2500 each) was extracted for all four news portals from the Facebook API. The archive can be found here and here. The data was cleaned, and date, month and weekday variables were extracted for plotting. The text was cleaned, and sentiment analysis was done on unigrams using the Bing and AFINN sentiment lexicons. Word clouds were drawn for the whole set and for the tweets and posts in the upper quartile of each category, to compare which tweets/posts are more popular with the readership.
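The date variables mentioned above can be derived along these lines. This is only a minimal sketch in base R on an invented two-row data frame: the column names `day`, `month`, `weekday` and `hour` match the ones used in the code later in this post, but the time zone (`Asia/Kolkata`) and the numeric weekday coding are assumptions made for illustration, not necessarily what was used for the archive.

```r
# Toy stand-in for the tweet archive; the timestamps are invented
toy <- data.frame(
  created_at = as.POSIXct(c("2018-02-25 03:10:00", "2018-02-26 18:45:00"),
                          tz = "UTC")
)

# Assumed local time zone for reading the timestamps (hypothetical choice)
tz_local <- "Asia/Kolkata"

toy$day     <- as.Date(toy$created_at, tz = tz_local)
toy$month   <- format(toy$created_at, "%m", tz = tz_local)
toy$weekday <- format(toy$created_at, "%u", tz = tz_local)  # 1 = Monday ... 7 = Sunday
toy$hour    <- as.integer(format(toy$created_at, "%H", tz = tz_local))

print(toy)
```

Converting to local time before extracting the hour matters here, because a tweet sent late in the evening UTC already falls on the next calendar day in India.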

Twitter

We will first examine the activity, retweets and favorites of the tweets by these news portals

Activity

Let's see the number and percentage of retweets

library(knitr)
kable(tmls %>% group_by(screen_name) %>% summarize(n=n(),percentage_retweet=round(100*sum(is_retweet=="TRUE")/n,2)))
| screen_name | n | percentage_retweet |
|:------------|-----:|-------------------:|
| OpIndia_com | 3247 | 6.16 |
| scroll_in | 3212 | 25.81 |
| SwarajyaMag | 3245 | 0.46 |
| thewire_in | 3233 | 7.64 |

Thus we see that just over 25% of tweets by Scroll are retweets

From now on we will deal with original tweets unless specified otherwise

Let’s plot tweeting frequency of these portals

Daywise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
ggplot(aes(x=day,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')

Hmm, so we see that for an equal number of tweets (about 3200), OpIndia tweets at a lower frequency per day than Wire, Scroll or Swarajya (which posted 3200 tweets in around 2 months, i.e. 1600/month). We also see that this year the Scroll handle tweets in a bot-like fashion at an extremely high frequency per day, and made almost 3200 tweets in the month of February alone.

We also see that since mid-January the activity of OpIndia has increased beyond its usual levels.

Let's plot the frequency since 1st Jan 2018 on a common scale

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
ggplot(aes(x=day,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')

We see clearly that OpIndia, Wire and Swarajya show natural variance in their tweeting, as opposed to Scroll, which tweets in bot mode.

Week-day wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
  ggplot(aes(x=weekday,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')+theme_economist()+
  labs(title='Activity on weekdays',
       subtitle='Swarajya and OpIndia have lower activity on weekends',
       caption='source Twitter REST API')

We see that the activity of OpIndia and Swarajya is lower on weekends.

Hour-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
  #filter(!(screen_name=="scroll_in")) %>% 
  # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>% 
  mutate(rt_fav_ratio = mean_rt/mean_fav) %>% 
  
  ggplot(aes(x=hour,y=n,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')+theme_economist()+
  labs(title='Activity hour-wise',
       subtitle='OpIndia tweets in mid night as well, Scroll and Wire \nstart tweeting from 9 AM',
       caption='source Twitter REST API')

We see that Scroll and Wire start tweeting from 9 AM, while OpIndia is active at midnight as well and the Swarajya handle has a varying schedule

Source-Wise

knitr::kable(tmls  %>% group_by(screen_name,source) %>% summarise(n=n()) %>% mutate(Proportion = round(100*n/sum(n),2)) %>% rename(News_portal=screen_name) %>% arrange(News_portal,desc(Proportion)))
| News_portal | source | n | Proportion |
|:------------|:-------|-----:|-----------:|
| OpIndia_com | TweetDeck | 2603 | 80.17 |
| OpIndia_com | Twitter for Windows | 280 | 8.62 |
| OpIndia_com | Twitter for Android | 132 | 4.07 |
| OpIndia_com | Twitter for iPhone | 125 | 3.85 |
| OpIndia_com | Twitter Web Client | 107 | 3.30 |
| scroll_in | Buffer | 2257 | 70.27 |
| scroll_in | TweetDeck | 499 | 15.54 |
| scroll_in | Twitter Web Client | 215 | 6.69 |
| scroll_in | Twitter for Android | 133 | 4.14 |
| scroll_in | Twitter for iPhone | 78 | 2.43 |
| scroll_in | Media Studio | 30 | 0.93 |
| SwarajyaMag | Buffer | 1765 | 54.39 |
| SwarajyaMag | TweetDeck | 1031 | 31.77 |
| SwarajyaMag | Twitter Web Client | 155 | 4.78 |
| SwarajyaMag | Twitter for Android | 80 | 2.47 |
| SwarajyaMag | Bitly | 55 | 1.69 |
| SwarajyaMag | Twitter Ads Composer | 50 | 1.54 |
| SwarajyaMag | Plume for Android | 28 | 0.86 |
| SwarajyaMag | SocialOomph | 22 | 0.68 |
| SwarajyaMag | Twitter Ads | 21 | 0.65 |
| SwarajyaMag | Twitter Lite | 11 | 0.34 |
| SwarajyaMag | Nuzzel | 8 | 0.25 |
| SwarajyaMag | Dabr.eu - latest @Dabr build | 7 | 0.22 |
| SwarajyaMag | Twitter for iPhone | 6 | 0.18 |
| SwarajyaMag | Hootsuite | 4 | 0.12 |
| SwarajyaMag | dlvr.it | 2 | 0.06 |
| thewire_in | TweetDeck | 2950 | 91.25 |
| thewire_in | Twitter Web Client | 203 | 6.28 |
| thewire_in | Twitter for iPhone | 72 | 2.23 |
| thewire_in | Media Studio | 8 | 0.25 |

We see that OpIndia and Wire use TweetDeck to schedule their tweets, while Swarajya and Scroll rely mainly on Buffer.

We also see that OpIndia makes about 16.5% of its tweets by phone (Android, iPhone and, surprise, Windows!), while the other portals make a smaller share of their tweets by phone; the Wire handle only uses iPhone. The Swarajya handle has tried all kinds of feeds to schedule its posts.

Let's visualise
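As a quick check of the phone share, the proportions in the source table can simply be added up. This is a small sketch; which clients count as "phone" is my reading of the paragraph above and is marked as an assumption in the code.

```r
# Proportions (in %) copied from the OpIndia_com rows of the source table
opindia_share <- c(
  "TweetDeck"           = 80.17,
  "Twitter for Windows" = 8.62,
  "Twitter for Android" = 4.07,
  "Twitter for iPhone"  = 3.85,
  "Twitter Web Client"  = 3.30
)

# Assumption: these three clients are the ones treated as "phone" here
phone_clients <- c("Twitter for Windows", "Twitter for Android", "Twitter for iPhone")

# Share of OpIndia tweets sent from the assumed phone clients
phone_share <- sum(opindia_share[phone_clients])
print(phone_share)
```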

tmls %>% filter(screen_name=="OpIndia_com") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Activity',title='OpIndia')

tmls %>% filter(screen_name=="thewire_in") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line()+ labs(y='Activity',title='Wire')

tmls %>% filter(screen_name=="scroll_in") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line()+ labs(y='Activity',title='Scroll')

tmls %>% filter(screen_name=="SwarajyaMag") %>% group_by(source,hour) %>% summarise(n=n()) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
  geom_line()+ labs(y='Activity',title='Swarajya')

Favorites

Let's see the summary stats:

Summary stats of engagement

Let's calculate summary stats (mean favorites and retweets per tweet for each news portal)

 knitr::kable(tmls %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% rename(News_portal=screen_name) %>%  t(),digits=2)
| News_portal | OpIndia_com | scroll_in | SwarajyaMag | thewire_in |
|:------------|------------:|----------:|------------:|-----------:|
| retweet_count_Mean | 136.778799 | 3.554763 | 21.848916 | 28.004689 |
| favorite_count_Mean | 149.852314 | 6.331935 | 32.155418 | 49.439719 |
| engagement_Mean | 286.631113 | 9.886697 | 54.004334 | 77.444407 |
| retweet_count_Median | 72 | 1 | 9 | 11 |
| favorite_count_Median | 92 | 3 | 16 | 23 |
| engagement_Median | 166 | 5 | 24 | 35 |
| retweet_count_Q25 | 37 | 0 | 4 | 6 |
| favorite_count_Q25 | 50 | 1 | 8 | 13 |
| engagement_Q25 | 90 | 2 | 12 | 19 |
| retweet_count_Q75 | 158.5 | 3.0 | 20.0 | 25.0 |
| favorite_count_Q75 | 173 | 6 | 33 | 46 |
| engagement_Q75 | 331 | 9 | 52 | 72 |
| retweet_count_SD | 191.913683 | 9.927373 | 49.704859 | 62.102115 |
| favorite_count_SD | 188.98465 | 15.85491 | 63.75643 | 94.05176 |
| engagement_SD | 373.6480 | 24.6845 | 110.2582 | 154.6132 |
| retweet_count_minimum | 1 | 0 | 0 | 0 |
| favorite_count_minimum | 2 | 0 | 0 | 0 |
| engagement_minimum | 3 | 0 | 0 | 1 |
| retweet_count_maximum | 2720 | 248 | 827 | 1355 |
| favorite_count_maximum | 2562 | 361 | 1591 | 2041 |
| engagement_maximum | 5282 | 451 | 2264 | 3396 |
| retweet_count_Total | 3047 | 2383 | 3230 | 2986 |
| favorite_count_Total | 3047 | 2383 | 3230 | 2986 |
| engagement_Total | 3047 | 2383 | 3230 | 2986 |

Thus we see that OpIndia and Wire lead considerably over Swarajya in mean favorites and retweets, while Scroll's performance on Twitter is poor. OpIndia seems focused on Twitter and performs very well here (its mean retweets are almost five times those of Wire, the next-best portal), contrary to what the provocative tweet claimed
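To put numbers on that lead, the means from the summary table can be compared directly. This is a small sketch using only values copied from the table; it also checks my assumption that the `engagement` column is simply retweets plus favorites, which the table is consistent with but which is not stated explicitly in this post.

```r
# Means copied from the summary table (original tweets only)
means <- data.frame(
  portal     = c("OpIndia_com", "scroll_in", "SwarajyaMag", "thewire_in"),
  retweet    = c(136.778799, 3.554763, 21.848916, 28.004689),
  favorite   = c(149.852314, 6.331935, 32.155418, 49.439719),
  engagement = c(286.631113, 9.886697, 54.004334, 77.444407)
)

# Gap between (retweets + favorites) and engagement; near zero if the
# assumed definition engagement = retweet_count + favorite_count holds
gap <- means$retweet + means$favorite - means$engagement
print(gap)

# How many times larger OpIndia's means are than those of Wire
op   <- unlist(means[means$portal == "OpIndia_com", -1])
wire <- unlist(means[means$portal == "thewire_in", -1])
fold <- round(op / wire, 2)
print(fold)
```

The ratio is largest for retweets and smaller for favorites, so the size of the lead depends on which metric is used.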

Let’s plot number of favorites of these portals

Daywise

Let's plot mean favorites

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')

Let's plot median favorites

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')

We see that OpIndia has a massive advantage here.

Let's plot mean favorites since 1st Jan 2018 on a common scale

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')

We see clearly that OpIndia is the leader in favorites by quite a margin, followed by Wire and Swarajya

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=day,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')

Week-day wise

Let's plot mean favorites weekday-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=weekday,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+
  labs(title='Mean Favorites on weekdays',
       subtitle=' OpIndia and Wire  have highest Favorites on sunday',
       caption='source Twitter REST API')

We see that even though OpIndia tweets less frequently on Sunday, its favorites peak on Sunday. Wire maintains a consistent lead over Swarajya.

Let's plot median favorites weekday-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,weekday) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=weekday,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')+
  labs(title='Median Favorites on weekdays',
       subtitle=' OpIndia and Wire  have highest Favorites on sunday',
       caption='source Twitter REST API')

The median is more representative than the mean in skewed distributions such as this one, as it shields against extremes
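A tiny illustration of why the median is preferred here, using invented favorite counts (four ordinary tweets and one viral one); the numbers are made up purely to show the effect and are not taken from the archive.

```r
# Invented favorite counts: four ordinary tweets and one viral one
favs <- c(10, 12, 15, 11, 2041)

mean_fav   <- mean(favs)    # pulled far up by the single viral tweet
median_fav <- median(favs)  # stays close to a typical tweet

print(c(mean = mean_fav, median = median_fav))
```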

Hour-wise

library(ggthemes)
tmls %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=hour,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+theme_economist()+
  labs(title='Favorites hour-wise',
       
       caption='source Twitter REST API')

We see that OpIndia and Wire have high favorites at 12 AM, possibly due to NRI readers. But what explains the bump in Wire's mean favorites at 3 AM, given that its tweeting frequency at that hour is very low? It is possibly due to a single very popular tweet

Let’s search for Wire tweets between 2 AM and 5 AM

library(knitr)
kable(tmls %>% filter(between(hour,2,5)) %>% filter(screen_name=="thewire_in") %>% pull(text)
)
x
Veteran actress Sridevi passes away at 54 https://t.co/DavcQjqO8w https://t.co/2XX3dYOcAr
‘Jan Gan Man Ki Baat’ episode 193: Jammu and Kashmir and governance in Uttar Pradesh https://t.co/CmjnTOFzw7

So we see that the mean is propped up by the very popular tweet about Sridevi's death.

So let's do the analysis without this tweet.

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=hour,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+theme_economist()+
  labs(title='Favorites hour-wise',
       
       caption='source Twitter REST API')

So we now see favorites hour-wise without the anomaly

Let's visualise the median now

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
 
  group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% 
ggplot(aes(x=hour,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')+theme_economist()+
  labs(title='Favorites hour-wise',
       
       caption='source Twitter REST API')

Source-Wise

knitr::kable(tmls  %>% group_by(screen_name,source) %>% summarise(mean_fav=mean(favorite_count),median_fav=median(favorite_count)) %>% 
               rename(News_portal=screen_name) %>% arrange(News_portal,mean_fav))

| News_portal | source | mean_fav | median_fav |
|:------------|:-------|------------:|-----------:|
| OpIndia_com | Twitter for Android | 46.6212121 | 0.0 |
| OpIndia_com | TweetDeck | 138.3177103 | 86.0 |
| OpIndia_com | Twitter for iPhone | 168.6720000 | 105.0 |
| OpIndia_com | Twitter for Windows | 177.1035714 | 106.0 |
| OpIndia_com | Twitter Web Client | 184.4112150 | 119.0 |
| scroll_in | TweetDeck | 0.0040080 | 0.0 |
| scroll_in | Twitter for Android | 0.0676692 | 0.0 |
| scroll_in | Twitter for iPhone | 0.2307692 | 0.0 |
| scroll_in | Twitter Web Client | 2.5348837 | 0.0 |
| scroll_in | Buffer | 6.0381037 | 3.0 |
| scroll_in | Media Studio | 29.5666667 | 14.5 |
| SwarajyaMag | Twitter for iPhone | 1.0000000 | 0.0 |
| SwarajyaMag | Hootsuite | 6.0000000 | 5.5 |
| SwarajyaMag | SocialOomph | 12.2272727 | 6.5 |
| SwarajyaMag | Twitter Lite | 13.1818182 | 12.0 |
| SwarajyaMag | dlvr.it | 14.0000000 | 14.0 |
| SwarajyaMag | Bitly | 26.7818182 | 16.0 |
| SwarajyaMag | Plume for Android | 27.3214286 | 22.0 |
| SwarajyaMag | Twitter Ads Composer | 28.3400000 | 13.0 |
| SwarajyaMag | Dabr.eu - latest @Dabr build | 30.5714286 | 19.0 |
| SwarajyaMag | TweetDeck | 30.8806984 | 14.0 |
| SwarajyaMag | Twitter Web Client | 31.0903226 | 17.0 |
| SwarajyaMag | Buffer | 32.4679887 | 17.0 |
| SwarajyaMag | Twitter for Android | 38.1625000 | 11.5 |
| SwarajyaMag | Nuzzel | 75.2500000 | 30.0 |
| SwarajyaMag | Twitter Ads | 90.6190476 | 29.0 |
| thewire_in | Twitter for iPhone | 20.9583333 | 0.0 |
| thewire_in | Twitter Web Client | 27.9408867 | 6.0 |
| thewire_in | TweetDeck | 47.2691525 | 23.0 |
| thewire_in | Media Studio | 125.2500000 | 116.5 |

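We see that for OpIndia the iPhone, Windows and web clients have higher mean favorites than TweetDeck (Android being the exception), which is interesting! For Scroll and Wire, the phone clients trail the main scheduling tool. Note that this table is not restricted to original tweets, which may explain the medians of 0.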

tmls %>% group_by(screen_name,source,hour) %>%  summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%  
  arrange(hour) %>%  ggplot(aes(x=hour,y=favorite_count_Mean,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Mean_Favorite by hour')+ facet_wrap(~screen_name)

tmls %>% group_by(screen_name,source,hour) %>%  summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%  
  arrange(hour) %>%  ggplot(aes(x=hour,y=favorite_count_Median,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Median_Favorite by hour')+ facet_wrap(~screen_name)

Retweets

Let’s plot number of Retweets of these portals

Daywise

Let's plot mean retweets by day

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Retweets')

We see that OpIndia has a massive advantage here.

Let’s plot mean retweets since 1st Jan 2018 on a common scale

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')

We see clearly that OpIndia is the leader in retweets by quite a margin, followed by Wire and Swarajya, while Scroll is at the bottom

Let's plot the median numbers now

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')

We see that the median reached highs of 200-plus earlier and is lower now, probably because the tweeting frequency has increased and hence the best outliers are no longer able to drive the numbers. So it shouldn't be confused with worse performance.

Since 1st Jan

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
ggplot(aes(x=day,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')

Week-day wise

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
  ggplot(aes(x=weekday,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+
  labs(title='Mean_Retweets on weekdays',
       subtitle='OpIndia and Wire have highest retweets on sunday\nSwarajya marginally overtakes wire on saturday',
       caption='source Twitter REST API')

We see that even though OpIndia tweets less frequently on Sunday, its retweets peak on Sunday. Wire maintains a consistent lead over Swarajya but dips on Saturday

tmls %>% filter(is_retweet=="FALSE") %>% 
 # filter(!(screen_name=="scroll_in")) %>% 
 # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,weekday) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  
  ggplot(aes(x=weekday,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')+
  labs(title='Median_Retweets on weekdays',
       subtitle='OpIndia and Wire have highest retweets on sunday\nSwarajya marginally overtakes wire on saturday',
       caption='source Twitter REST API')

Hour-wise

tmls %>% filter(is_retweet=="FALSE") %>% 
  #filter(!(screen_name=="scroll_in")) %>% 
  # filter(created_at >= "2018-1-1") %>%
  group_by(screen_name,hour)  %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  ggplot(aes(x=hour,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+theme_economist()+
  labs(title='Mean Retweets hour-wise',
       
       caption='source Twitter REST API')

We see that OpIndia and Wire have high retweets at 12 AM, possibly due to NRI readers, similar to the favorites pattern. Let’s exclude that anomalous early-morning Wire tweet

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
  group_by(screen_name,hour)  %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  ggplot(aes(x=hour,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+theme_economist()+
  labs(title='Mean Retweets hour-wise',
       
       caption='source Twitter REST API')

So we now see retweets hour-wise without the anomaly

tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>% 
  group_by(screen_name,hour)  %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  ggplot(aes(x=hour,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()+scale_colour_discrete(name  ="News Portal",
                                   labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')+theme_economist()+
  labs(title='Median Retweets hour-wise',
       
       caption='source Twitter REST API')

Source-Wise

knitr::kable(tmls  %>% group_by(screen_name,source) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
               rename(News_portal=screen_name) %>% arrange(News_portal,mean_rt))
| News_portal | source | n | mean_rt | mean_fav | median_rt | median_fav |
|:------------|:-------|-----:|-----------:|------------:|----------:|-----------:|
| OpIndia_com | Twitter for Android | 132 | 120.446970 | 46.6212121 | 50.5 | 0.0 |
| OpIndia_com | TweetDeck | 2603 | 125.940838 | 138.3177103 | 66.0 | 86.0 |
| OpIndia_com | Twitter for iPhone | 125 | 155.832000 | 168.6720000 | 96.0 | 105.0 |
| OpIndia_com | Twitter Web Client | 107 | 174.140187 | 184.4112150 | 100.0 | 119.0 |
| OpIndia_com | Twitter for Windows | 280 | 216.371429 | 177.1035714 | 119.5 | 106.0 |
| scroll_in | Buffer | 2257 | 3.337173 | 6.0381037 | 1.0 | 3.0 |
| scroll_in | TweetDeck | 499 | 4.641283 | 0.0040080 | 3.0 | 0.0 |
| scroll_in | Twitter for Android | 133 | 5.556391 | 0.0676692 | 2.0 | 0.0 |
| scroll_in | Twitter Web Client | 215 | 7.809302 | 2.5348837 | 2.0 | 0.0 |
| scroll_in | Twitter for iPhone | 78 | 11.064103 | 0.2307692 | 2.0 | 0.0 |
| scroll_in | Media Studio | 30 | 20.000000 | 29.5666667 | 10.0 | 14.5 |
| SwarajyaMag | Hootsuite | 4 | 1.250000 | 6.0000000 | 0.0 | 5.5 |
| SwarajyaMag | dlvr.it | 2 | 4.500000 | 14.0000000 | 4.5 | 14.0 |
| SwarajyaMag | Twitter Lite | 11 | 6.545454 | 13.1818182 | 6.0 | 12.0 |
| SwarajyaMag | SocialOomph | 22 | 8.454546 | 12.2272727 | 3.5 | 6.5 |
| SwarajyaMag | Twitter Ads Composer | 50 | 16.420000 | 28.3400000 | 6.5 | 13.0 |
| SwarajyaMag | Bitly | 55 | 18.509091 | 26.7818182 | 8.0 | 16.0 |
| SwarajyaMag | Dabr.eu - latest @Dabr build | 7 | 20.142857 | 30.5714286 | 6.0 | 19.0 |
| SwarajyaMag | Buffer | 1765 | 21.978470 | 32.4679887 | 9.0 | 17.0 |
| SwarajyaMag | Twitter Web Client | 155 | 23.193548 | 31.0903226 | 10.0 | 17.0 |
| SwarajyaMag | Twitter for Android | 80 | 23.387500 | 38.1625000 | 7.0 | 11.5 |
| SwarajyaMag | TweetDeck | 1031 | 23.462658 | 30.8806984 | 8.0 | 14.0 |
| SwarajyaMag | Plume for Android | 28 | 24.035714 | 27.3214286 | 11.0 | 22.0 |
| SwarajyaMag | Nuzzel | 8 | 45.750000 | 75.2500000 | 14.0 | 30.0 |
| SwarajyaMag | Twitter Ads | 21 | 68.428571 | 90.6190476 | 22.0 | 29.0 |
| SwarajyaMag | Twitter for iPhone | 6 | 77.666667 | 1.0000000 | 40.0 | 0.0 |
| thewire_in | TweetDeck | 2950 | 29.267797 | 47.2691525 | 12.0 | 23.0 |
| thewire_in | Twitter Web Client | 203 | 31.970443 | 27.9408867 | 8.0 | 6.0 |
| thewire_in | Twitter for iPhone | 72 | 50.486111 | 20.9583333 | 16.0 | 0.0 |
| thewire_in | Media Studio | 8 | 64.125000 | 125.2500000 | 47.0 | 116.5 |

We see that the Twitter for Windows client has the highest mean retweets for OpIndia!

tmls %>% group_by(screen_name,source,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=mean_rt,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Mean Retweet by hour')+ facet_wrap(~screen_name)

tmls %>% group_by(screen_name,source,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
                                          median_rt=median(retweet_count),median_fav=median(favorite_count)) %>% 
  arrange(hour) %>%  ggplot(aes(x=hour,y=median_rt,color=source,group=source))+ geom_point()+
  geom_line() + labs(y='Median Retweet by hour')+ facet_wrap(~screen_name)

Sentiment analysis

We will do sentiment analysis of the headline text of all these news portals using three established English-language resources: the Bing and AFINN lexicons and the sentimentr package. Bing and AFINN calculate the sentiment of individual words and sum them up, while sentimentr also deals with valence shifters.

We shall do sentiment analysis with all these tools and see if the results are consistent.

Let’s first calculate the sentiment of the text with the sentimentr package
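To make the word-level approach concrete, here is a minimal sketch of unigram scoring in base R. The mini-lexicon and the helper `score_headline` are hypothetical and invented for illustration; they are not copied from the real Bing lexicon and are not the code used for the results in this post. The score is the number of positive words minus the number of negative words in a headline.

```r
# Hypothetical mini-lexicon in the style of a word -> positive/negative list;
# the entries are made up for illustration
toy_lexicon <- c(win = "positive", praise = "positive",
                 attack = "negative", fake = "negative", death = "negative")

# Score a headline as (number of positive words) - (number of negative words)
score_headline <- function(text, lexicon = toy_lexicon) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))  # crude unigram tokenizer
  hits  <- lexicon[words[words %in% names(lexicon)]]   # keep only lexicon words
  sum(hits == "positive") - sum(hits == "negative")
}

# Invented example headlines
headlines <- c("Fake news attack draws praise",
               "Team celebrates big win")
scores <- vapply(headlines, score_headline, integer(1), USE.NAMES = FALSE)
print(scores)
```

Because each word is scored in isolation, a negation such as "not fake" would still count as negative under this scheme, which is the kind of case a valence-aware method tries to handle.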

library(sentimentr)
tmls %>% mutate(senti=sentiment_by(tmls$text,by=NULL)$ave_sentiment) %>% group_by(screen_name) %>% summarise(means=mean(senti))
## # A tibble: 4 x 2
##   screen_name    means
##   <chr>          <dbl>
## 1 OpIndia_com -0.0624 
## 2 scroll_in   -0.00528
## 3 SwarajyaMag  0.0206 
## 4 thewire_in  -0.0254

We see that OpIndia has the most negative headlines on average, followed by Wire and Scroll. Swarajya is the only portal whose headlines are, on average, slightly positive.

Let's run this analysis with the Bing lexicon.

tmlz=tmls %>% select(text,screen_name,retweet_count,favorite_count,status_id,created_at,month,day,hour,weekday) %>% 
  mutate(tweetnumber=row_number())


bing <- get_sentiments("bing")

afinn = get_sentiments('afinn')

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% group_by(screen_name) %>% summarize(mean_sentiment=mean(sentiment),
                                                                                  sd_sentiment=sd(sentiment),variability=
                                                                                    sd_sentiment/abs(mean_sentiment)) %>% 
  mutate(method="Bing")
## # A tibble: 4 x 5
##   screen_name mean_sentiment sd_sentiment variability method
##   <chr>                <dbl>        <dbl>       <dbl> <chr> 
## 1 OpIndia_com        -0.613          1.27        2.07 Bing  
## 2 scroll_in          -0.148          1.40        9.47 Bing  
## 3 SwarajyaMag         0.0889         1.43       16.0  Bing  
## 4 thewire_in         -0.349          1.33        3.81 Bing

Bing shows a similar pattern. The variability (standard deviation divided by absolute mean) is highest for Swarajya, followed by Scroll, and lowest for OpIndia. This may indicate more varied stories, and possibly several people rather than a single person deciding headlines, though the ratio is also inflated whenever the mean is close to zero.
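
The variability column is the standard deviation divided by the absolute mean. As a rough check, the sketch below recomputes it in base R from the rounded values printed in the Bing table above (so the last digit can differ slightly from the table) and compares the spread of the standard deviations with the spread of the ratio.

```r
# Rounded values copied from the Bing summary table above
bing_summary <- data.frame(
  screen_name    = c("OpIndia_com", "scroll_in", "SwarajyaMag", "thewire_in"),
  mean_sentiment = c(-0.613, -0.148, 0.0889, -0.349),
  sd_sentiment   = c(1.27, 1.40, 1.43, 1.33),
  stringsAsFactors = FALSE
)

# Same formula as in the pipeline: sd / |mean|
bing_summary$variability <- bing_summary$sd_sentiment / abs(bing_summary$mean_sentiment)
print(bing_summary)

range(bing_summary$sd_sentiment)  # the standard deviations barely differ
range(bing_summary$variability)   # the ratio differs several-fold
```

The standard deviations are all between 1.27 and 1.43, so most of the difference in variability comes from how close each portal's mean is to zero.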

Let's see if this trend holds for highly retweeted tweets.

tmlz %>% filter(retweet_count>50) %>% 
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% group_by(screen_name) %>% summarize(mean_sentiment=mean(sentiment),
                                                                                  sd_sentiment=sd(sentiment),variability=
                                                                                    sd_sentiment/abs(mean_sentiment)) %>% 
  mutate(method="Bing")
## # A tibble: 4 x 5
##   screen_name mean_sentiment sd_sentiment variability method
##   <chr>                <dbl>        <dbl>       <dbl> <chr> 
## 1 OpIndia_com         -0.674         1.24        1.85 Bing  
## 2 scroll_in            0.267         1.33        5.00 Bing  
## 3 SwarajyaMag         -0.208         1.54        7.38 Bing  
## 4 thewire_in          -0.467         1.38        2.96 Bing

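Among tweets with more than 50 retweets, mean sentiment is more negative for OpIndia, Swarajya and The Wire, suggesting that negative news gets retweeted more. Scroll is the exception: its highly retweeted tweets are more positive on average.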
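
The comparison can be made explicit by subtracting the two sets of Bing means. This is a small base R sketch using the rounded values from the two tables above; the column names `all_tweets`, `over_50_rt` and `shift` are made up for the illustration.

```r
# Mean Bing sentiment copied from the two tables above
bing_means <- data.frame(
  screen_name = c("OpIndia_com", "scroll_in", "SwarajyaMag", "thewire_in"),
  all_tweets  = c(-0.613, -0.148, 0.0889, -0.349),
  over_50_rt  = c(-0.674, 0.267, -0.208, -0.467),
  stringsAsFactors = FALSE
)

# Negative shift = highly retweeted tweets are more negative than the average tweet
bing_means$shift <- bing_means$over_50_rt - bing_means$all_tweets
bing_means[order(bing_means$shift), ]
```

Sorted this way, Swarajya shows the largest move towards negative sentiment, and Scroll is the only portal with a positive shift.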

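Now let's repeat the analysis with the AFINN lexicon.

# Note: recent tidytext/textdata releases name the AFINN column `value` instead of `score`
tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(afinn) %>%
  group_by(screen_name) %>% 
  summarise(n = n(), sentiment = sum(score), mean_sentiment = sentiment / n, sd_sentiment = sd(score),
            variability = sd_sentiment / abs(mean_sentiment),
            method = "Afinn") %>% 
  select(-sentiment)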
## # A tibble: 4 x 6
##   screen_name     n mean_sentiment sd_sentiment variability method
##   <chr>       <int>          <dbl>        <dbl>       <dbl> <chr> 
## 1 OpIndia_com  3296        -0.734          2.00        2.72 Afinn 
## 2 scroll_in    3059        -0.0778         2.22       28.5  Afinn 
## 3 SwarajyaMag  2922         0.129          2.00       15.5  Afinn 
## 4 thewire_in   3066        -0.405          2.00        4.93 Afinn

The AFINN scores suggest a similar trend, though with wider variability in Scroll headlines.

Let's look at how sentiment varies by day.

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% 
  ggplot( aes(day, sentiment, fill = screen_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~screen_name, ncol = 2, scales = "free_x")

We see that OpIndia and The Wire maintain a consistently critical tone, while Scroll and Swarajya have varying sentiments. In February, OpIndia's headlines were more negative in sentiment.

Next, the same plot by weekday.

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,weekday, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% 
  ggplot( aes(weekday, sentiment, fill = screen_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~screen_name, ncol = 2, scales = "free_x")

We see that OpIndia posts its most critical headlines on Wednesday.

Now we analyse by hour

tmlz %>%
  unnest_tokens(word, text) %>% 
  inner_join(bing) %>%
  count(created_at,screen_name,retweet_count,favorite_count,hour, index = tweetnumber , sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>% 
  ggplot( aes(hour, sentiment, fill = screen_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~screen_name, ncol = 2, scales = "free_x")

The cumulative sentiment by hour also depends on how often each portal tweets at that hour, so this pattern resembles the activity pattern.
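
To see why a cumulative bar tracks activity, here is a toy base R sketch with invented numbers (not taken from the archive). It compares the summed sentiment per hour with the mean sentiment per tweet for the same data.

```r
# Invented data: a busy hour with mildly negative tweets,
# and a quiet hour with strongly negative tweets
toy <- data.frame(
  hour      = c(rep(9, 10), rep(22, 2)),
  sentiment = c(rep(-1, 10), rep(-2, 2))
)

total_by_hour <- tapply(toy$sentiment, toy$hour, sum)   # what a stacked bar shows
mean_by_hour  <- tapply(toy$sentiment, toy$hour, mean)  # sentiment per tweet

total_by_hour
mean_by_hour
```

The sum makes hour 9 look most negative simply because more tweets went out then, while the per-tweet mean picks hour 22. Plotting the mean instead of the sum would be one way to separate tone from activity.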

WordCloud

Let's see the word cloud for OpIndia.

cmw = c("media","omitted","http","https","html","www",".com","t.co","to","the","of","by","and","an","its","writes",
         "we","as","that","how","after","a","for","in","from","with","on","rt","now","him","about","his","this","are",
        "while","no","but","is","what","who","they","you","has","had","have","svaradarajan")
tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>% 
  filter(!(word %in% cmw)) %>% 
 # filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>% 
  filter(screen_name=="OpIndia_com") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

As expected, Congress dominates, since OpIndia is critical of it.
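
A word cloud is just a picture of a word frequency table. As a sketch of what goes into it, the base R snippet below counts words in three invented headlines after dropping a small, hypothetical stop-word list (`drop_words`), which plays the same role as `cmw` above.

```r
# Invented headlines, for illustration only
toy_headlines <- c("Congress leader attacks govt over budget",
                   "Govt responds to Congress on budget",
                   "Congress rally draws crowd")

# Hypothetical stop-word list, much shorter than the real one
drop_words <- c("to", "on", "over", "the")

# Lowercase, split on non-letters, drop empty strings and stop words
words <- unlist(strsplit(tolower(toy_headlines), "[^a-z]+"))
words <- words[nzchar(words) & !(words %in% drop_words)]

# Frequency table, most common words first
word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts, 3)
```

Printing the top of the table alongside the cloud makes it easier to check that a word really dominates rather than just being drawn large.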

Let's see the word cloud for tweets with more than 50 retweets.

tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>% 
  filter(retweet_count>50) %>% 
  filter(!(word %in% cmw)) %>% 
 # filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>% 
  filter(screen_name=="OpIndia_com") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

We see congress, gandhi and muslim appear prominently in these headlines, suggesting that OpIndia's readership likes these articles.

Let's see the corresponding figures for The Wire.

tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>% 
  filter(!(word %in% cmw)) %>% 
  #filter(retweet_count>50) %>% 
  # filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>% 
  filter(screen_name=="thewire_in") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

In The Wire's word cloud, modi and bjp are more prominent; it also talks about india.

Let's see the highly retweeted articles of The Wire.

tmlz %>%
  unnest_tokens(word, text) %>% 
  anti_join(stop_words) %>%
  filter(retweet_count>50) %>% 
  filter(!(word %in% cmw)) %>% 
  #filter(retweet_count>50) %>% 
  # filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>% 
  filter(screen_name=="thewire_in") %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))

We can clearly see that articles mentioning modi and govt are more prominent among The Wire's highly retweeted articles, suggesting that its readership wants to read these articles.

Facebook

We now analyse the Facebook archive briefly, focusing on which news portals have higher prominence there and whether it follows the Twitter trend.

tmlfb=readRDS('tmlfb.RDS')

Let's examine likes and shares: does the Twitter pattern hold here?

 knitr::kable(tmlfb %>% 
 
  group_by(screen_name) %>% summarise_at(.vars = c("likes_count","shares_count","engagement"),
                                                 .funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% rename(News_portal=screen_name) %>%  t(),digits=2)
News_portal            OpIndia.com       Scroll     Swarajya   TheWire.in
likes_count_Mean          255.6568     422.0168     324.7364     213.2176
shares_count_Mean          51.2300     608.9764      83.5996     147.8304
engagement_Mean           306.8868    1030.9932     408.3360     361.0480
likes_count_Median            59.0         58.5        115.0         46.0
shares_count_Median             11           16           12            9
engagement_Median             73.5         79.0        130.5         56.0
likes_count_Q25                 21           23           45           15
shares_count_Q25                 3            3            4            2
engagement_Q25                  25           27           50           19
likes_count_Q75                140          182          289          143
shares_count_Q75             33.00        68.25        44.25        41.00
engagement_Q75              178.00       258.25       338.25       192.25
likes_count_SD           3375.9292    3222.6586    1327.2583     865.7914
shares_count_SD           195.2241    9584.8281     839.9081    1274.1077
engagement_SD             3424.264    12387.378     1722.826     2003.546
likes_count_minimum              0            0            0            0
shares_count_minimum             0            0            0            0
engagement_minimum               0            0            0            0
likes_count_maximum         152317        96129        57125        30296
shares_count_maximum          4966       421195        40082        36555
engagement_maximum          153262       517324        59940        66851
likes_count_Total             2500         2500         2500         2500
shares_count_Total            2500         2500         2500         2500
engagement_Total              2500         2500         2500         2500

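Wow, here Scroll clobbers everyone on mean shares and mean engagement, and The Wire also beats Swarajya and OpIndia on mean shares. That seems too good to be true, since Scroll didn't do well at all on Twitter, which is the favorite site of news lovers. The medians tell a different story (Swarajya leads on median likes and engagement), so the means are probably driven by a handful of viral posts.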
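
The gap between the means and the medians in the table (and the very large SD and maximum values) is what a few viral posts would produce. Here is a toy base R sketch with invented share counts showing how one outlier moves the mean but not the median.

```r
# Invented data: 99 ordinary posts with 10 shares each, plus one viral post
toy_shares <- c(rep(10, 99), 50000)

mean_shares   <- mean(toy_shares)
median_shares <- median(toy_shares)

mean_shares    # 509.9
median_shares  # 10
```

For skewed counts like these, the median and the quartiles usually describe a typical post better than the mean does.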

This probably has to do with the type of post being shared. Let's analyse it.

library(knitr)
kable(tmlfb %>% group_by(screen_name,type) %>% summarize(n=n(),mean_like = mean(likes_count),
                                                   mean_share = mean(shares_count),
                                                   mean_engagement = mean(engagement),
                                                   median_like = median(likes_count),
                                                   median_share = median(shares_count),
                                                   median_engagement = median(engagement)) %>% 
  mutate(proportion=round(100*n/2500,2)) %>% select(screen_name,type,n,proportion,everything()) %>% 
  arrange(type))
screen_name  type       n  proportion  mean_like  mean_share  mean_engagement  median_like  median_share  median_engagement
TheWire.in   event      1        0.04   52.00000     0.00000         52.00000         52.0           0.0               52.0
OpIndia.com  link    2339       93.56  260.81317    45.89312        306.70628         61.0          11.0               74.0
Scroll       link    1141       45.64   83.37248    17.06310        100.43558         30.0           3.0               35.0
Swarajya     link    1970       78.80  283.01929    47.21168        330.23096        120.0          13.0              135.5
TheWire.in   link    1959       78.36  132.00000    28.81879        160.81879         34.0           6.0               41.0
TheWire.in   note       1        0.04   18.00000     1.00000         19.00000         18.0           1.0               19.0
Swarajya     offer      1        0.04  880.00000     0.00000        880.00000        880.0           0.0              880.0
OpIndia.com  photo    123        4.92  114.43089    46.43902        160.86992         40.0           6.0               43.0
Scroll       photo     74        2.96  280.43243    32.62162        313.05405         48.5           4.0               49.5
Swarajya     photo    348       13.92  223.25575    34.47989        257.73563         49.0           3.0               53.0
TheWire.in   photo     59        2.36  132.22034    15.49153        147.71186         13.0           3.0               16.0
OpIndia.com  status    20        0.80  149.70000    29.55000        179.25000         17.0           4.0               23.0
Swarajya     status     2        0.08   15.00000     2.50000         17.50000         15.0           2.5               17.5
TheWire.in   status    23        0.92   29.13043    16.52174         45.65217         14.0           4.0               22.0
OpIndia.com  video     18        0.72  668.38889   801.55556       1469.94444        273.0          78.5              340.0
Scroll       video   1285       51.40  730.86537  1167.74942       1898.61479        107.0          52.0              164.0
Swarajya     video    179        7.16  981.50838   580.93855       1562.44693        228.0          63.0              310.0
TheWire.in   video    457       18.28  581.87090   682.33042       1264.20131        210.0          83.0              313.0

Aha, so OpIndia and Swarajya dominate even here in the news link category, and links indeed form the major part of their public page posts. But they have a much smaller presence in video posts (0.72% and 7.16% of their posts, against 51.4% for Scroll), which typically get far more likes and shares. This is probably due to the nature of the crowd on Facebook. It is also a lesson for these right-wing digital outlets to step up their game in video.

We will focus on links for now.

tmlfb %>% # filter(is_retweet=="FALSE") %>% 
  # filter(!(screen_name=="scroll_in")) %>% 
  filter(type=="link") %>% 
 # filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% 
  
  summarise(mean_like = mean(likes_count),
            mean_share = mean(shares_count),
            mean_engagement = mean(engagement)) %>% 
  
  ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()

We see OpIndia and Swarajya dominate the link space on Facebook, but of late The Wire has taken the lead. Let's zoom in on this graph from the end of December onwards.

tmlfb %>% # filter(is_retweet=="FALSE") %>% 
  # filter(!(screen_name=="scroll_in")) %>% 
  filter(type=="link") %>% 
  filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% 
  
  summarise(mean_like = mean(likes_count),
            mean_share = mean(shares_count),
            mean_engagement = mean(engagement)) %>% 
  
  ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()

We can clearly see that from Jan 15 onwards, The Wire is doing great while OpIndia is slacking on Facebook.
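
The zoomed plot filters `created_at` against the string "2017-12-31". A slightly more explicit variant is to build the cutoff as a date-time first. The base R sketch below uses invented posts and assumes `created_at` is a POSIXct column in UTC, which is an assumption about the archive rather than something stated in this post.

```r
# Invented posts; created_at assumed to be POSIXct in UTC
toy_posts <- data.frame(
  created_at = as.POSIXct(c("2017-12-15 10:00:00",
                            "2018-01-10 08:30:00",
                            "2018-01-20 21:15:00"), tz = "UTC"),
  shares_count = c(5, 12, 40)
)

# Explicit cutoff with a time zone, instead of a bare string
cutoff <- as.POSIXct("2017-12-31", tz = "UTC")

recent <- toy_posts[toy_posts$created_at >= cutoff, ]
nrow(recent)
# 2
```

Spelling out the time zone avoids surprises when the string would otherwise be parsed in the local time zone of the machine running the analysis.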

Let's look at video.

tmlfb %>% # filter(is_retweet=="FALSE") %>% 
  # filter(!(screen_name=="scroll_in")) %>% 
  filter(type=="video") %>% 
  #filter(created_at >= "2017-12-31") %>%
  group_by(screen_name,day) %>% 
  
  summarise(mean_like = mean(likes_count),
            mean_share = mean(shares_count),
            mean_engagement = mean(engagement)) %>% 
  
  ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
  geom_line()

We can clearly see that Scroll dominates video, but of late The Wire is on the rise here as well. Swarajya and OpIndia seem less invested in video.

Key takeaways:

  1. OpIndia dominates Twitter by a big margin

  2. The Wire is in the ascendancy on Facebook, with aggressive video and link posting and rising shares and likes

  3. OpIndia and The Wire are the more extreme outlets, with opposite points of view and negative sentiment

  4. Swarajya and Scroll have more varied articles

  5. News shared from phone clients has higher resonance

  6. OpIndia and Swarajya post fewer original links on weekends, but get the most retweets and likes on those days

  7. Video posts on Facebook are shared and liked to a higher degree

  8. Swarajya and OpIndia dominate the news link space on Facebook as well

Disclosure: I have right-of-centre views and have written an article for OpIndia. However, I have provided a downloadable archive at the beginning, and the R code for the analysis is self-contained on this page.