I was socially engineered into writing this post by this tweet, which claimed that the news portal OpIndia had only five readers:
Editor of OpIndia has extended an apology to all their readers. 80% of their readers have accepted it. Could have been 100% but 1 person hasn’t read it yet
— Joy (@Joydas) March 3, 2018
This tweet got me thinking about a comparative analysis of the likes, retweets, favorites and shares earned by the social media content of four new-age digital media portals (OpIndia, Wire, Scroll, Swarajya). I couldn't find anything relevant on the web except conflicting reports about their credibility, so I went ahead.
This blog post will deal with the following issues:
Compare activity, retweets and favorite status of an equal number of tweets shared by all four portals
Compare activity, shares and like status of an equal number of Facebook posts shared by all four portals
Compare contents via word clouds and do sentiment analysis of all these media outlets to verify their bias
Plot engagement and activity variability vis-a-vis weekday and hour
tmls=readRDS('tmls.RDS')
library(tidyverse)
library(tidytext)
library(wordcloud)
library(ggthemes)
library(ggridges)
library(knitr)
Q25 = function(x){
quantile(x,0.25)}
Q75 = function(x){
quantile(x,0.75)}
An equal number of tweets (3200) was extracted for each of the four news portals from the Twitter API. The archive can be found here. Similarly, an equal number of posts (2500) was extracted for each portal from the Facebook API. The archive can be found here and here. The data was cleaned, and date, month and weekday variables were extracted for plotting. The text was cleaned, and sentiment analysis was done on unigrams with the Bing and AFINN sentiment lexicons. Word clouds were drawn for the whole data set and for the tweets and posts in the upper quartile of each category, to compare which tweets/posts are more popular with the readership.
We will examine the activity, retweets and favorites of the tweets by these news portals first.
Let's see the number and percentage of retweets
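The date-part extraction described above can be sketched in base R roughly as follows. This is only an illustration, not the code used for the post: the tiny `tweets` data frame is made up, and converting to Indian Standard Time is an assumption (the post does not say which time zone the `hour` variable uses).

```r
# Hypothetical stand-in for the real timeline data: three timestamps in UTC.
tweets <- data.frame(
  created_at = as.POSIXct(
    c("2018-02-24 21:30:00", "2018-02-25 03:45:00", "2018-03-03 09:05:00"),
    tz = "UTC"
  )
)

# Assumption: hours are reported in Indian Standard Time.
local_time <- as.POSIXlt(tweets$created_at, tz = "Asia/Kolkata")

# Derive the plotting variables from the local timestamp.
tweets$day     <- as.Date(format(local_time, "%Y-%m-%d"))
tweets$month   <- as.integer(format(local_time, "%m"))
tweets$hour    <- as.integer(format(local_time, "%H"))
tweets$weekday <- weekdays(tweets$day)  # day names depend on the R locale

print(tweets)
```

Note that the first timestamp falls on the next calendar day once it is shifted to local time, which is why the time zone choice matters for any day-wise or hour-wise plot.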
library(knitr)
kable(tmls %>% group_by(screen_name) %>% summarize(n=n(),percentage_retweet=round(100*sum(is_retweet=="TRUE")/n,2)))
| screen_name | n | percentage_retweet |
|---|---|---|
| OpIndia_com | 3247 | 6.16 |
| scroll_in | 3212 | 25.81 |
| SwarajyaMag | 3245 | 0.46 |
| thewire_in | 3233 | 7.64 |
Thus we see that roughly a quarter (25.81%) of tweets by Scroll are retweets.
From now on we will deal with original tweets unless specified otherwise.
Let’s plot tweeting frequency of these portals
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
#filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>%
mutate(rt_fav_ratio = mean_rt/mean_fav) %>%
ggplot(aes(x=day,y=n,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')
Hmm, so we see that for an equal number of tweets (about 3200), OpIndia tweets less frequently per day than Wire, Scroll or Swarajya (which posted its 3200 tweets in around two months, i.e. about 1600 per month). We also see that this year the Scroll handle tweets in a bot-like fashion at an extremely high frequency per day, and made almost 3200 tweets in the month of February alone.
We also see that since mid-January the activity of OpIndia has increased beyond its usual levels.
Let’s plot the frequency since 1 Jan 2018 on a common scale
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>%
mutate(rt_fav_ratio = mean_rt/mean_fav) %>%
ggplot(aes(x=day,y=n,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')
We clearly see that OpIndia, Wire and Swarajya show a natural variance in tweeting, as opposed to Scroll, which tweets in bot mode.
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
# filter(created_at >= "2018-1-1") %>%
group_by(screen_name,weekday) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>%
mutate(rt_fav_ratio = mean_rt/mean_fav) %>%
ggplot(aes(x=weekday,y=n,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')+theme_economist()+
labs(title='Activity on weekdays',
subtitle='Swarajya and OpIndia have lower activity on weekends',
caption='source Twitter REST API')
We see that the activity of OpIndia and Swarajya is lower on weekends.
tmls %>% filter(is_retweet=="FALSE") %>%
#filter(!(screen_name=="scroll_in")) %>%
# filter(created_at >= "2018-1-1") %>%
group_by(screen_name,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count)) %>%
mutate(rt_fav_ratio = mean_rt/mean_fav) %>%
ggplot(aes(x=hour,y=n,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Activity')+theme_economist()+
labs(title='Activity hour-wise',
subtitle='OpIndia tweets in mid night as well, Scroll and Wire \nstart tweeting from 9 AM',
caption='source Twitter REST API')
We see that Scroll and Wire start tweeting from 9 AM, while OpIndia is active around midnight as well and the Swarajya handle has a varying schedule.
knitr::kable(tmls %>% group_by(screen_name,source) %>% summarise(n=n()) %>% mutate(Proportion = round(100*n/sum(n),2)) %>% rename(News_portal=screen_name) %>% arrange(News_portal,desc(Proportion)))
| News_portal | source | n | Proportion |
|---|---|---|---|
| OpIndia_com | TweetDeck | 2603 | 80.17 |
| OpIndia_com | Twitter for Windows | 280 | 8.62 |
| OpIndia_com | Twitter for Android | 132 | 4.07 |
| OpIndia_com | Twitter for iPhone | 125 | 3.85 |
| OpIndia_com | Twitter Web Client | 107 | 3.30 |
| scroll_in | Buffer | 2257 | 70.27 |
| scroll_in | TweetDeck | 499 | 15.54 |
| scroll_in | Twitter Web Client | 215 | 6.69 |
| scroll_in | Twitter for Android | 133 | 4.14 |
| scroll_in | Twitter for iPhone | 78 | 2.43 |
| scroll_in | Media Studio | 30 | 0.93 |
| SwarajyaMag | Buffer | 1765 | 54.39 |
| SwarajyaMag | TweetDeck | 1031 | 31.77 |
| SwarajyaMag | Twitter Web Client | 155 | 4.78 |
| SwarajyaMag | Twitter for Android | 80 | 2.47 |
| SwarajyaMag | Bitly | 55 | 1.69 |
| SwarajyaMag | Twitter Ads Composer | 50 | 1.54 |
| SwarajyaMag | Plume for Android | 28 | 0.86 |
| SwarajyaMag | SocialOomph | 22 | 0.68 |
| SwarajyaMag | Twitter Ads | 21 | 0.65 |
| SwarajyaMag | Twitter Lite | 11 | 0.34 |
| SwarajyaMag | Nuzzel | 8 | 0.25 |
| SwarajyaMag | Dabr.eu - latest @Dabr build | 7 | 0.22 |
| SwarajyaMag | Twitter for iPhone | 6 | 0.18 |
| SwarajyaMag | Hootsuite | 4 | 0.12 |
| SwarajyaMag | dlvr.it | 2 | 0.06 |
| thewire_in | TweetDeck | 2950 | 91.25 |
| thewire_in | Twitter Web Client | 203 | 6.28 |
| thewire_in | Twitter for iPhone | 72 | 2.23 |
| thewire_in | Media Studio | 8 | 0.25 |
We see that OpIndia and Wire use TweetDeck to schedule their tweets, while Swarajya and Scroll rely mainly on Buffer.
We also see that OpIndia makes about 16.5% of its tweets from the Android, iPhone and (surprise!) Windows clients, while the other portals make a smaller share of their tweets from phones. The Wire handle only uses the iPhone client. The Swarajya handle has tried all kinds of feeds to schedule its posts.
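As a quick arithmetic check on the OpIndia figures, the share of tweets coming from the phone-style clients can be recomputed from the counts in the table. This is a small sketch; treating Twitter for Windows as a phone client follows the grouping used in the text and is an assumption.

```r
# Counts copied from the OpIndia rows of the source table above.
opindia_sources <- c(
  "TweetDeck"           = 2603,
  "Twitter for Windows" = 280,
  "Twitter for Android" = 132,
  "Twitter for iPhone"  = 125,
  "Twitter Web Client"  = 107
)

# Assumption: these three clients are the ones counted as "phone" here.
phone_clients <- c("Twitter for Windows", "Twitter for Android", "Twitter for iPhone")

# Percentage of all OpIndia tweets sent from those clients.
phone_share <- 100 * sum(opindia_sources[phone_clients]) / sum(opindia_sources)
print(round(phone_share, 2))
```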
Lets visualise
tmls %>% filter(screen_name=="OpIndia_com") %>% group_by(source,hour) %>% summarise(n=n()) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
geom_line() + labs(y='Activity',title='OpIndia')
tmls %>% filter(screen_name=="thewire_in") %>% group_by(source,hour) %>% summarise(n=n()) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
geom_line()+ labs(y='Activity',title='Wire')
tmls %>% filter(screen_name=="scroll_in") %>% group_by(source,hour) %>% summarise(n=n()) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
geom_line()+ labs(y='Activity',title='Scroll')
tmls %>% filter(screen_name=="SwarajyaMag") %>% group_by(source,hour) %>% summarise(n=n()) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=n,color=source,group=source))+ geom_point()+
geom_line()+ labs(y='Activity',title='Swarajya')
Let's calculate summary stats (mean, median and spread of favorites and retweets per tweet for each news portal):
knitr::kable(tmls %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% rename(News_portal=screen_name) %>% t(),digits=2)
| News_portal | OpIndia_com | scroll_in | SwarajyaMag | thewire_in |
|---|---|---|---|---|
| retweet_count_Mean | 136.778799 | 3.554763 | 21.848916 | 28.004689 |
| favorite_count_Mean | 149.852314 | 6.331935 | 32.155418 | 49.439719 |
| engagement_Mean | 286.631113 | 9.886697 | 54.004334 | 77.444407 |
| retweet_count_Median | 72 | 1 | 9 | 11 |
| favorite_count_Median | 92 | 3 | 16 | 23 |
| engagement_Median | 166 | 5 | 24 | 35 |
| retweet_count_Q25 | 37 | 0 | 4 | 6 |
| favorite_count_Q25 | 50 | 1 | 8 | 13 |
| engagement_Q25 | 90 | 2 | 12 | 19 |
| retweet_count_Q75 | 158.5 | 3.0 | 20.0 | 25.0 |
| favorite_count_Q75 | 173 | 6 | 33 | 46 |
| engagement_Q75 | 331 | 9 | 52 | 72 |
| retweet_count_SD | 191.913683 | 9.927373 | 49.704859 | 62.102115 |
| favorite_count_SD | 188.98465 | 15.85491 | 63.75643 | 94.05176 |
| engagement_SD | 373.6480 | 24.6845 | 110.2582 | 154.6132 |
| retweet_count_minimum | 1 | 0 | 0 | 0 |
| favorite_count_minimum | 2 | 0 | 0 | 0 |
| engagement_minimum | 3 | 0 | 0 | 1 |
| retweet_count_maximum | 2720 | 248 | 827 | 1355 |
| favorite_count_maximum | 2562 | 361 | 1591 | 2041 |
| engagement_maximum | 5282 | 451 | 2264 | 3396 |
| retweet_count_Total | 3047 | 2383 | 3230 | 2986 |
| favorite_count_Total | 3047 | 2383 | 3230 | 2986 |
| engagement_Total | 3047 | 2383 | 3230 | 2986 |
Thus we see that OpIndia and Wire lead Swarajya considerably in mean favorites and retweets, while Scroll's performance on Twitter is poor. OpIndia seems focused on Twitter and performs very well there (nearly five times Wire's mean retweets and about three times its mean favorites), contrary to what the provocative tweet suggested.
Let’s plot the number of favorites of these portals, starting with mean favorites per day
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
#filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=day,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')
Lets plot median favorites
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
#filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=day,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')
We see OpIndia has a massive advantage here.
Let’s plot mean favorites since 1 Jan 2018 on a common scale
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=day,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')
We clearly see that OpIndia leads in favorites by quite a margin, followed by Wire and Swarajya.
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=day,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')
Lets plot mean Favorites weekday wise
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
# filter(created_at >= "2018-1-1") %>%
group_by(screen_name,weekday) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=weekday,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+
labs(title='Mean Favorites on weekdays',
subtitle=' OpIndia and Wire have highest Favorites on sunday',
caption='source Twitter REST API')
We see that even though OpIndia tweets less frequently on Sunday, its favorites peak on Sunday. Wire maintains a consistent lead over Swarajya.
Lets plot median Favorite weekday wise
tmls %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name,weekday) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=weekday,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')+
labs(title='Median Favorites on weekdays',
subtitle=' OpIndia and Wire have highest Favorites on sunday',
caption='source Twitter REST API')
The median is more representative than the mean in skewed distributions such as this one, because it is robust to extreme values.
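To make the point about skew concrete, here is a minimal sketch with made-up favorite counts (nine ordinary tweets and one viral one). The numbers are hypothetical and only illustrate how differently the two statistics react to a single outlier.

```r
# Hypothetical favorite counts: nine ordinary tweets and one viral outlier.
favorites <- c(10, 12, 15, 18, 20, 22, 25, 28, 30, 2000)

mean_fav   <- mean(favorites)    # pulled upward by the single outlier
median_fav <- median(favorites)  # midpoint of the sorted values

# Dropping the outlier moves the mean a lot and the median only slightly.
mean_without   <- mean(favorites[favorites < 2000])
median_without <- median(favorites[favorites < 2000])

print(c(mean = mean_fav, median = median_fav))
print(c(mean = mean_without, median = median_without))
```

This is why the plots in this post are shown for both statistics: a daily or hourly mean can be dominated by one very popular tweet, while the median stays close to what a typical tweet receives.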
library(ggthemes)
tmls %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=hour,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+theme_economist()+
labs(title='Favorites hour-wise',
caption='source Twitter REST API')
We see that OpIndia and Wire have high favorites at 12 AM, possibly due to NRI readers. But what explains the bump in Wire's mean favorites at 3 AM, given that its tweeting frequency at that hour is very low? It is possibly due to a single very popular tweet.
Let’s search for Wire tweets posted between 2 and 5 AM
library(knitr)
kable(tmls %>% filter(between(hour,2,5)) %>% filter(screen_name=="thewire_in") %>% pull(text)
)
| x |
|---|
| Veteran actress Sridevi passes away at 54 https://t.co/DavcQjqO8w https://t.co/2XX3dYOcAr |
| ‘Jan Gan Man Ki Baat’ episode 193: Jammu and Kashmir and governance in Uttar Pradesh https://t.co/CmjnTOFzw7 |
So we see that the mean is propped up by the very popular tweet about Sridevi's death.
So let's do the analysis without this tweet.
tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=hour,y=favorite_count_Mean,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean_Favorites')+theme_economist()+
labs(title='Favorites hour-wise',
caption='source Twitter REST API')
So now we see hour-wise favorites without the anomaly.
Let's visualise the median now
tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
ggplot(aes(x=hour,y=favorite_count_Median,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median_Favorites')+theme_economist()+
labs(title='Favorites hour-wise',
caption='source Twitter REST API')
knitr::kable(tmls %>% group_by(screen_name,source) %>% summarise(mean_fav=mean(favorite_count),meadian_fav=median(favorite_count)) %>%
rename(News_portal=screen_name) %>% arrange(News_portal,mean_fav))
| News_portal | source | mean_fav | meadian_fav |
|---|---|---|---|
| OpIndia_com | Twitter for Android | 46.6212121 | 0.0 |
| OpIndia_com | TweetDeck | 138.3177103 | 86.0 |
| OpIndia_com | Twitter for iPhone | 168.6720000 | 105.0 |
| OpIndia_com | Twitter for Windows | 177.1035714 | 106.0 |
| OpIndia_com | Twitter Web Client | 184.4112150 | 119.0 |
| scroll_in | TweetDeck | 0.0040080 | 0.0 |
| scroll_in | Twitter for Android | 0.0676692 | 0.0 |
| scroll_in | Twitter for iPhone | 0.2307692 | 0.0 |
| scroll_in | Twitter Web Client | 2.5348837 | 0.0 |
| scroll_in | Buffer | 6.0381037 | 3.0 |
| scroll_in | Media Studio | 29.5666667 | 14.5 |
| SwarajyaMag | Twitter for iPhone | 1.0000000 | 0.0 |
| SwarajyaMag | Hootsuite | 6.0000000 | 5.5 |
| SwarajyaMag | SocialOomph | 12.2272727 | 6.5 |
| SwarajyaMag | Twitter Lite | 13.1818182 | 12.0 |
| SwarajyaMag | dlvr.it | 14.0000000 | 14.0 |
| SwarajyaMag | Bitly | 26.7818182 | 16.0 |
| SwarajyaMag | Plume for Android | 27.3214286 | 22.0 |
| SwarajyaMag | Twitter Ads Composer | 28.3400000 | 13.0 |
| SwarajyaMag | Dabr.eu - latest @Dabr build | 30.5714286 | 19.0 |
| SwarajyaMag | TweetDeck | 30.8806984 | 14.0 |
| SwarajyaMag | Twitter Web Client | 31.0903226 | 17.0 |
| SwarajyaMag | Buffer | 32.4679887 | 17.0 |
| SwarajyaMag | Twitter for Android | 38.1625000 | 11.5 |
| SwarajyaMag | Nuzzel | 75.2500000 | 30.0 |
| SwarajyaMag | Twitter Ads | 90.6190476 | 29.0 |
| thewire_in | Twitter for iPhone | 20.9583333 | 0.0 |
| thewire_in | Twitter Web Client | 27.9408867 | 6.0 |
| thewire_in | TweetDeck | 47.2691525 | 23.0 |
| thewire_in | Media Studio | 125.2500000 | 116.5 |
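We see that for OpIndia, tweets sent from Twitter for iPhone and Twitter for Windows have higher mean favorites than those sent from TweetDeck, which is interesting! This does not hold for OpIndia's Android client or for the other portals.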
tmls %>% group_by(screen_name,source,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=favorite_count_Mean,color=source,group=source))+ geom_point()+
geom_line() + labs(y='Mean_Favorite by hour')+ facet_wrap(~screen_name)
tmls %>% group_by(screen_name,source,hour) %>% summarise_at(.vars = c("retweet_count","favorite_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=favorite_count_Median,color=source,group=source))+ geom_point()+
geom_line() + labs(y='Median_Favorite by hour')+ facet_wrap(~screen_name)
Let’s plot the number of retweets of these portals, starting with mean retweets by day
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
#filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=day,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Retweets')
We see OpIndia has a massive advantage here.
Let’s plot mean retweets since 1 Jan 2018 on a common scale
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=day,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')
We clearly see that OpIndia leads in retweets by quite a margin, followed by Wire and Swarajya, while Scroll is at the bottom.
Lets plot median numbers now
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
#filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=day,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')
We see that the daily median peaked at 200-plus earlier and is lower now. Because the tweeting frequency has increased, a handful of very popular tweets can no longer drive the daily figure, so this should not be confused with worse performance.
The same plot since 1 Jan 2018:
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=day,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
# filter(created_at >= "2018-1-1") %>%
group_by(screen_name,weekday) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=weekday,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+
labs(title='Mean_Retweets on weekdays',
subtitle='OpIndia and Wire have highest retweets on Sunday\nSwarajya marginally overtakes Wire on Saturday',
caption='source Twitter REST API')
We see that even though OpIndia tweets less frequently on Sunday, its retweets peak on Sunday. Wire maintains a consistent lead over Swarajya but dips on Saturday.
tmls %>% filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
# filter(created_at >= "2018-1-1") %>%
group_by(screen_name,weekday) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=weekday,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')+
labs(title='Median_Retweets on weekdays',
subtitle='OpIndia and Wire have highest retweets on Sunday\nSwarajya marginally overtakes Wire on Saturday',
caption='source Twitter REST API')
tmls %>% filter(is_retweet=="FALSE") %>%
#filter(!(screen_name=="scroll_in")) %>%
# filter(created_at >= "2018-1-1") %>%
group_by(screen_name,hour) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=hour,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+theme_economist()+
labs(title='Mean Retweets hour-wise',
caption='source Twitter REST API')
We see that OpIndia and Wire have high retweets at 12 AM, possibly due to NRI readers, similar to their favorites pattern. Let’s exclude the anomalous early-morning Wire tweet (the Sridevi tweet) and plot again
tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name,hour) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=hour,y=mean_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Mean Retweets')+theme_economist()+
labs(title='Mean Retweets hour-wise',
caption='source Twitter REST API')
So now we see hour-wise retweets without the anomaly.
tmls %>% filter(!status_id=="967539560753242113") %>% filter(is_retweet=="FALSE") %>%
group_by(screen_name,hour) %>%summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
ggplot(aes(x=hour,y=median_rt,color=screen_name,group=screen_name))+ geom_point()+
geom_line()+scale_colour_discrete(name ="News Portal",
labels=c("OpIndia", "Scroll","Swarajya","Wire"))+ylab('Median Retweets')+theme_economist()+
labs(title='Median Retweets hour-wise',
caption='source Twitter REST API')
knitr::kable(tmls %>% group_by(screen_name,source) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
rename(News_portal=screen_name) %>% arrange(News_portal,mean_rt))
| News_portal | source | n | mean_rt | mean_fav | median_rt | median_fav |
|---|---|---|---|---|---|---|
| OpIndia_com | Twitter for Android | 132 | 120.446970 | 46.6212121 | 50.5 | 0.0 |
| OpIndia_com | TweetDeck | 2603 | 125.940838 | 138.3177103 | 66.0 | 86.0 |
| OpIndia_com | Twitter for iPhone | 125 | 155.832000 | 168.6720000 | 96.0 | 105.0 |
| OpIndia_com | Twitter Web Client | 107 | 174.140187 | 184.4112150 | 100.0 | 119.0 |
| OpIndia_com | Twitter for Windows | 280 | 216.371429 | 177.1035714 | 119.5 | 106.0 |
| scroll_in | Buffer | 2257 | 3.337173 | 6.0381037 | 1.0 | 3.0 |
| scroll_in | TweetDeck | 499 | 4.641283 | 0.0040080 | 3.0 | 0.0 |
| scroll_in | Twitter for Android | 133 | 5.556391 | 0.0676692 | 2.0 | 0.0 |
| scroll_in | Twitter Web Client | 215 | 7.809302 | 2.5348837 | 2.0 | 0.0 |
| scroll_in | Twitter for iPhone | 78 | 11.064103 | 0.2307692 | 2.0 | 0.0 |
| scroll_in | Media Studio | 30 | 20.000000 | 29.5666667 | 10.0 | 14.5 |
| SwarajyaMag | Hootsuite | 4 | 1.250000 | 6.0000000 | 0.0 | 5.5 |
| SwarajyaMag | dlvr.it | 2 | 4.500000 | 14.0000000 | 4.5 | 14.0 |
| SwarajyaMag | Twitter Lite | 11 | 6.545454 | 13.1818182 | 6.0 | 12.0 |
| SwarajyaMag | SocialOomph | 22 | 8.454546 | 12.2272727 | 3.5 | 6.5 |
| SwarajyaMag | Twitter Ads Composer | 50 | 16.420000 | 28.3400000 | 6.5 | 13.0 |
| SwarajyaMag | Bitly | 55 | 18.509091 | 26.7818182 | 8.0 | 16.0 |
| SwarajyaMag | Dabr.eu - latest @Dabr build | 7 | 20.142857 | 30.5714286 | 6.0 | 19.0 |
| SwarajyaMag | Buffer | 1765 | 21.978470 | 32.4679887 | 9.0 | 17.0 |
| SwarajyaMag | Twitter Web Client | 155 | 23.193548 | 31.0903226 | 10.0 | 17.0 |
| SwarajyaMag | Twitter for Android | 80 | 23.387500 | 38.1625000 | 7.0 | 11.5 |
| SwarajyaMag | TweetDeck | 1031 | 23.462658 | 30.8806984 | 8.0 | 14.0 |
| SwarajyaMag | Plume for Android | 28 | 24.035714 | 27.3214286 | 11.0 | 22.0 |
| SwarajyaMag | Nuzzel | 8 | 45.750000 | 75.2500000 | 14.0 | 30.0 |
| SwarajyaMag | Twitter Ads | 21 | 68.428571 | 90.6190476 | 22.0 | 29.0 |
| SwarajyaMag | Twitter for iPhone | 6 | 77.666667 | 1.0000000 | 40.0 | 0.0 |
| thewire_in | TweetDeck | 2950 | 29.267797 | 47.2691525 | 12.0 | 23.0 |
| thewire_in | Twitter Web Client | 203 | 31.970443 | 27.9408867 | 8.0 | 6.0 |
| thewire_in | Twitter for iPhone | 72 | 50.486111 | 20.9583333 | 16.0 | 0.0 |
| thewire_in | Media Studio | 8 | 64.125000 | 125.2500000 | 47.0 | 116.5 |
We see that the Twitter for Windows client has the highest mean retweets for OpIndia!
tmls %>% group_by(screen_name,source,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=mean_rt,color=source,group=source))+ geom_point()+
geom_line() + labs(y='Mean Retweet by hour')+ facet_wrap(~screen_name)
tmls %>% group_by(screen_name,source,hour) %>% summarise(n=n(), mean_rt=mean(retweet_count),mean_fav=mean(favorite_count),
median_rt=median(retweet_count),median_fav=median(favorite_count)) %>%
arrange(hour) %>% ggplot(aes(x=hour,y=median_rt,color=source,group=source))+ geom_point()+
geom_line() + labs(y='Median Retweet by hour')+ facet_wrap(~screen_name)
We will do sentiment analysis of the headline text of all these news portals using three established English-language resources: the Bing and AFINN lexicons and the sentimentr package. Bing and AFINN score individual words and sum them up, while sentimentr also deals with valence shifters.
We shall do sentiment analysis with all these tools and see if the results are consistent.
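The word-sum idea behind the lexicon approach can be sketched with a toy example. Everything here is hypothetical: the four-word lexicon, its scores and the `score_headline` helper are made up for illustration and are not the real Bing or AFINN data.

```r
# Hypothetical four-word lexicon in the style of AFINN (integer score per word).
toy_lexicon <- c(good = 3, win = 4, bad = -3, attack = -1)

# Sum the scores of every lexicon word found in a headline (unigram approach).
score_headline <- function(headline, lexicon) {
  words <- strsplit(tolower(headline), "[^a-z]+")[[1]]
  sum(lexicon[words[words %in% names(lexicon)]])
}

headlines <- c(
  "Team celebrates a good win",
  "Bad weather and an attack on crops",
  "This is not good"
)

scores <- vapply(headlines, score_headline, numeric(1), lexicon = toy_lexicon)
print(unname(scores))
```

The third headline scores positive because a plain word sum ignores the negation. Handling such shifters is what sets a valence-aware tool apart, and it is one reason the three methods may not agree exactly.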
Let’s do first calculate sentiment of text by sentimentr package
library(sentimentr)
tmls %>% mutate(senti=sentiment_by(tmls$text,by=NULL)$ave_sentiment) %>% group_by(screen_name) %>% summarise(means=mean(senti))
## # A tibble: 4 x 2
## screen_name means
## <chr> <dbl>
## 1 OpIndia_com -0.0624
## 2 scroll_in -0.00528
## 3 SwarajyaMag 0.0206
## 4 thewire_in -0.0254
We see that OpIndia has the most negative headlines on average, followed by Wire and Scroll, while Swarajya's headlines are slightly positive on average.
Let's run this analysis with the Bing lexicon.
tmlz=tmls %>% select(text,screen_name,retweet_count,favorite_count,status_id,created_at,month,day,hour,weekday) %>%
mutate(tweetnumber=row_number())
bing <- get_sentiments("bing")
afinn = get_sentiments('afinn')
tmlz %>%
unnest_tokens(word, text) %>%
inner_join(bing) %>%
count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>% group_by(screen_name) %>% summarize(mean_sentiment=mean(sentiment),
sd_sentiment=sd(sentiment),variability=
sd_sentiment/abs(mean_sentiment)) %>%
mutate(method="Bing")
## # A tibble: 4 x 5
## screen_name mean_sentiment sd_sentiment variability method
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 OpIndia_com -0.613 1.27 2.07 Bing
## 2 scroll_in -0.148 1.40 9.47 Bing
## 3 SwarajyaMag 0.0889 1.43 16.0 Bing
## 4 thewire_in -0.349 1.33 3.81 Bing
Bing shows a similar pattern. However, the variability of sentiment is highest in Swarajya's headlines and lowest in OpIndia's, possibly indicating that several people rather than a single person decide Swarajya's headlines, and that its stories are more varied.
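One caveat on the `variability` column: it is the standard deviation divided by the absolute mean, so it grows quickly whenever the mean is close to zero, even if the spread is unremarkable. A minimal sketch with made-up per-tweet scores (both vectors are hypothetical):

```r
# Hypothetical per-tweet sentiment scores (positive minus negative word counts).
steady_negative <- c(-2, -1, -1, 0, -2, -1)  # mean clearly below zero
mixed_tone      <- c(-2, 2, -1, 1, 0, 1)     # mean close to zero

# Same ratio as the variability column: sd divided by |mean|.
variability <- function(x) sd(x) / abs(mean(x))

print(variability(steady_negative))
print(variability(mixed_tone))
```

In the Bing table, the standard deviations of the four portals are quite similar (1.27 to 1.43), so most of the difference in variability comes from how close each mean is to zero.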
Let's see if this trend holds for tweets with more retweets
tmlz %>% filter(retweet_count>50) %>%
unnest_tokens(word, text) %>%
inner_join(bing) %>%
count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>% group_by(screen_name) %>% summarize(mean_sentiment=mean(sentiment),
sd_sentiment=sd(sentiment),variability=
sd_sentiment/abs(mean_sentiment)) %>%
mutate(method="Bing")
## # A tibble: 4 x 5
## screen_name mean_sentiment sd_sentiment variability method
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 OpIndia_com -0.674 1.24 1.85 Bing
## 2 scroll_in 0.267 1.33 5.00 Bing
## 3 SwarajyaMag -0.208 1.54 7.38 Bing
## 4 thewire_in -0.467 1.38 2.96 Bing
Among tweets with more than 50 retweets, mean sentiment is more negative for OpIndia, Swarajya and The Wire, suggesting that negative news gets retweeted more. Scroll is the exception: its mean turns positive (0.267).
tmlz %>%
unnest_tokens(word, text) %>%
inner_join(afinn) %>%
group_by(screen_name) %>%
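# AFINN scores each word from -5 to 5; `score` is the column name in the tidytext
# version used here (newer releases call it `value`)
summarise(n=n(),sentiment = sum(score),mean_sentiment=sentiment/n,sd_sentiment=sd(score),variability=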
sd_sentiment/abs(mean_sentiment),
method="Afinn")%>% select(-sentiment)
## # A tibble: 4 x 6
## screen_name n mean_sentiment sd_sentiment variability method
## <chr> <int> <dbl> <dbl> <dbl> <chr>
## 1 OpIndia_com 3296 -0.734 2.00 2.72 Afinn
## 2 scroll_in 3059 -0.0778 2.22 28.5 Afinn
## 3 SwarajyaMag 2922 0.129 2.00 15.5 Afinn
## 4 thewire_in 3066 -0.405 2.00 4.93 Afinn
The AFINN scores again suggest a similar trend, though with much wider variability in Scroll headlines.
Let's look at how sentiment varies by day.
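The two lexicons can disagree because Bing only labels a word positive or negative, while AFINN gives it a signed integer weight. The sketch below uses toy lexicons whose words and scores are invented for illustration (they are not the real Bing or AFINN entries) and a hypothetical helper, `score_headline`, to show the two scoring rules side by side:

```r
# Toy lexicons: invented entries, NOT the real Bing or AFINN values.
toy_bing  <- c(scam = "negative", attack = "negative", win = "positive")
toy_afinn <- c(scam = -3, attack = -1, win = 4)

score_headline <- function(headline) {
  # Lowercase and split into unigrams on anything that is not a letter
  words <- strsplit(tolower(headline), "[^a-z]+")[[1]]
  bing_hits  <- toy_bing[words[words %in% names(toy_bing)]]
  afinn_hits <- toy_afinn[words[words %in% names(toy_afinn)]]
  c(bing  = sum(bing_hits == "positive") - sum(bing_hits == "negative"),
    afinn = sum(afinn_hits))
}

score_headline("Opposition attack after scam, but a big win in court")
```

With these made-up weights the headline is net negative under the Bing-style count (one positive word against two negative ones) but neutral under the AFINN-style sum, because the single positive word carries a larger weight.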
tmlz %>%
unnest_tokens(word, text) %>%
inner_join(bing) %>%
count(created_at,screen_name,retweet_count,favorite_count,day, index = tweetnumber , sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>%
ggplot( aes(day, sentiment, fill = screen_name)) +
geom_bar(stat = "identity", show.legend = FALSE) +
facet_wrap(~screen_name, ncol = 2, scales = "free_x")
**We see that OpIndia and The Wire maintain a consistently critical tone, while Scroll and Swarajya have varying sentiments. In February, OpIndia's articles have been of more negative sentiment.**
tmlz %>%
unnest_tokens(word, text) %>%
inner_join(bing) %>%
count(created_at,screen_name,retweet_count,favorite_count,weekday, index = tweetnumber , sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>%
ggplot( aes(weekday, sentiment, fill = screen_name)) +
geom_bar(stat = "identity", show.legend = FALSE) +
facet_wrap(~screen_name, ncol = 2, scales = "free_x")
We see that OpIndia posts its most critical headlines on Wednesday.
Now we analyse by hour.
tmlz %>%
unnest_tokens(word, text) %>%
inner_join(bing) %>%
count(created_at,screen_name,retweet_count,favorite_count,hour, index = tweetnumber , sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>%
ggplot( aes(hour, sentiment, fill = screen_name)) +
geom_bar(stat = "identity", show.legend = FALSE) +
facet_wrap(~screen_name, ncol = 2, scales = "free_x")
The cumulative sentiment by hour also depends on how often each portal tweets at a particular hour, so this pattern resembles the activity pattern.
Let's see the word cloud of OpIndia.
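One way to separate tone from tweeting frequency is to look at the mean sentiment per hour next to the sum. This is only a sketch on invented data (the `toy` tibble is hypothetical, not the tweet archive):

```r
library(dplyr)

# Invented tweets: hour 9 has many mildly negative tweets,
# hour 21 has a single strongly negative one.
toy <- tibble(
  hour      = c(9, 9, 9, 9, 21),
  sentiment = c(-1, -1, -1, -1, -3)
)

by_hour <- toy %>%
  group_by(hour) %>%
  summarise(n_tweets        = n(),
            total_sentiment = sum(sentiment),
            mean_sentiment  = mean(sentiment))
by_hour
```

Here the busy hour looks worst on the total, while the quiet hour is the more negative one per tweet. Plotting `mean_sentiment` instead of stacked bars would be one option for the hourly chart.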
cmw = c("media","omitted","http","https","html","www",".com","t.co","to","the","of","by","and","an","its","writes",
"we","as","that","how","after","a","for","in","from","with","on","rt","now","him","about","his","this","are",
"while","no","but","is","what","who","they","you","has","had","have","svaradarajan")
tmlz %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
filter(!(word %in% cmw)) %>%
# filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>%
filter(screen_name=="OpIndia_com") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 50))
As expected, Congress dominates, as OpIndia is critical of it.
Let's see the word cloud for tweets with more than 50 retweets.
tmlz %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
filter(retweet_count>50) %>%
filter(!(word %in% cmw)) %>%
# filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>%
filter(screen_name=="OpIndia_com") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 50))
We see that congress, gandhi and muslim appear prominently in these headlines, suggesting that OpIndia's readership likes these articles.
Let's see the corresponding figures for The Wire.
tmlz %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
filter(!(word %in% cmw)) %>%
#filter(retweet_count>50) %>%
# filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>%
filter(screen_name=="thewire_in") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 50))
In The Wire's word cloud, modi and bjp are more prominent; it also talks about india.
Let's see the highly retweeted articles of The Wire.
tmlz %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
filter(retweet_count>50) %>%
filter(!(word %in% cmw)) %>%
#filter(retweet_count>50) %>%
# filter(!(word=="media"|word=="omitted"|word=="http"|word=="https"|word=="html"|word=="www"|word==".com")) %>%
filter(screen_name=="thewire_in") %>%
count(word) %>%
with(wordcloud(word, n, max.words = 50))
We can clearly see that articles with modi and govt are more prominent among The Wire's highly retweeted articles, indicating that its readership wants to read these articles.
We now analyse the Facebook archive briefly, focusing on which news portals have higher prominence there and whether it follows the Twitter trend.
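The four word-cloud blocks above differ only in the portal and the retweet filter, so the word counting could be wrapped in one helper. The function name `top_words` and its arguments are hypothetical, and the sketch is run on invented tweets rather than the real archive:

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: word counts for one portal, optionally restricted
# to tweets with at least `min_retweets` retweets.
top_words <- function(data, portal, min_retweets = 0, extra_stop = character()) {
  data %>%
    filter(screen_name == portal, retweet_count >= min_retweets) %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    filter(!(word %in% extra_stop)) %>%
    count(word, sort = TRUE)
}

# Invented tweets standing in for tmlz.
toy_tweets <- tibble(
  screen_name   = c("A", "A", "B"),
  retweet_count = c(10, 80, 5),
  text          = c("Budget debate in parliament",
                    "Budget vote postponed",
                    "Cricket final tonight")
)

all_a  <- top_words(toy_tweets, "A")
high_a <- top_words(toy_tweets, "A", min_retweets = 51)
```

On the real data, something like `with(top_words(tmlz, "thewire_in", 51, cmw), wordcloud(word, n, max.words = 50))` should reproduce the cloud above, where 51 corresponds to the original `retweet_count>50` filter; this has not been checked against the archive.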
tmlfb=readRDS('tmlfb.RDS')
Let's examine the trend of likes and shares. Does it hold as on Twitter?
knitr::kable(tmlfb %>%
group_by(screen_name) %>% summarise_at(.vars = c("likes_count","shares_count","engagement"),
.funs = c(Mean="mean",Median="median",Q25="Q25",Q75="Q75",SD="sd",minimum="min",maximum="max",Total="length")) %>% rename(News_portal=screen_name) %>% t(),digits=2)
| News_portal | OpIndia.com | Scroll | Swarajya | TheWire.in |
|---|---|---|---|---|
| likes_count_Mean | 255.6568 | 422.0168 | 324.7364 | 213.2176 |
| shares_count_Mean | 51.2300 | 608.9764 | 83.5996 | 147.8304 |
| engagement_Mean | 306.8868 | 1030.9932 | 408.3360 | 361.0480 |
| likes_count_Median | 59.0 | 58.5 | 115.0 | 46.0 |
| shares_count_Median | 11 | 16 | 12 | 9 |
| engagement_Median | 73.5 | 79.0 | 130.5 | 56.0 |
| likes_count_Q25 | 21 | 23 | 45 | 15 |
| shares_count_Q25 | 3 | 3 | 4 | 2 |
| engagement_Q25 | 25 | 27 | 50 | 19 |
| likes_count_Q75 | 140 | 182 | 289 | 143 |
| shares_count_Q75 | 33.00 | 68.25 | 44.25 | 41.00 |
| engagement_Q75 | 178.00 | 258.25 | 338.25 | 192.25 |
| likes_count_SD | 3375.9292 | 3222.6586 | 1327.2583 | 865.7914 |
| shares_count_SD | 195.2241 | 9584.8281 | 839.9081 | 1274.1077 |
| engagement_SD | 3424.264 | 12387.378 | 1722.826 | 2003.546 |
| likes_count_minimum | 0 | 0 | 0 | 0 |
| shares_count_minimum | 0 | 0 | 0 | 0 |
| engagement_minimum | 0 | 0 | 0 | 0 |
| likes_count_maximum | 152317 | 96129 | 57125 | 30296 |
| shares_count_maximum | 4966 | 421195 | 40082 | 36555 |
| engagement_maximum | 153262 | 517324 | 59940 | 66851 |
| likes_count_Total | 2500 | 2500 | 2500 | 2500 |
| shares_count_Total | 2500 | 2500 | 2500 | 2500 |
| engagement_Total | 2500 | 2500 | 2500 | 2500 |
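Wow, on mean shares Scroll and The Wire absolutely clobber Swarajya and OpIndia, and Scroll leads on mean likes and engagement too. The medians tell a different story, with Swarajya ahead on likes and engagement. Scroll's means seem too good to be true, since Scroll didn't do well at all on Twitter, which is the favourite site of news-lovers.
The reason probably lies in the type of post being shared. Let's analyse it.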
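The means and medians in the table disagree sharply (Scroll's mean share count is 608.98 but its median is 16), which is what a handful of viral posts does to a mean. A toy sketch with invented share counts:

```r
# Invented share counts: 99 ordinary posts and one viral post.
shares <- c(rep(10, 99), 50000)

mean(shares)    # pulled far above a typical post by the single outlier
median(shares)  # unaffected by the outlier
```

This is why the medians and quartiles are probably the safer guide to how a typical post performs.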
library(knitr)
kable(tmlfb %>% group_by(screen_name,type) %>% summarize(n=n(),mean_like = mean(likes_count),
mean_share = mean(shares_count),
mean_engagement = mean(engagement),
median_like = median(likes_count),
median_share = median(shares_count),
median_engagement = median(engagement)) %>%
mutate(proportion=round(100*n/2500,2)) %>% select(screen_name,type,n,proportion,everything()) %>%
arrange(type))
| screen_name | type | n | proportion | mean_like | mean_share | mean_engagement | median_like | median_share | median_engagement |
|---|---|---|---|---|---|---|---|---|---|
| TheWire.in | event | 1 | 0.04 | 52.00000 | 0.00000 | 52.00000 | 52.0 | 0.0 | 52.0 |
| OpIndia.com | link | 2339 | 93.56 | 260.81317 | 45.89312 | 306.70628 | 61.0 | 11.0 | 74.0 |
| Scroll | link | 1141 | 45.64 | 83.37248 | 17.06310 | 100.43558 | 30.0 | 3.0 | 35.0 |
| Swarajya | link | 1970 | 78.80 | 283.01929 | 47.21168 | 330.23096 | 120.0 | 13.0 | 135.5 |
| TheWire.in | link | 1959 | 78.36 | 132.00000 | 28.81879 | 160.81879 | 34.0 | 6.0 | 41.0 |
| TheWire.in | note | 1 | 0.04 | 18.00000 | 1.00000 | 19.00000 | 18.0 | 1.0 | 19.0 |
| Swarajya | offer | 1 | 0.04 | 880.00000 | 0.00000 | 880.00000 | 880.0 | 0.0 | 880.0 |
| OpIndia.com | photo | 123 | 4.92 | 114.43089 | 46.43902 | 160.86992 | 40.0 | 6.0 | 43.0 |
| Scroll | photo | 74 | 2.96 | 280.43243 | 32.62162 | 313.05405 | 48.5 | 4.0 | 49.5 |
| Swarajya | photo | 348 | 13.92 | 223.25575 | 34.47989 | 257.73563 | 49.0 | 3.0 | 53.0 |
| TheWire.in | photo | 59 | 2.36 | 132.22034 | 15.49153 | 147.71186 | 13.0 | 3.0 | 16.0 |
| OpIndia.com | status | 20 | 0.80 | 149.70000 | 29.55000 | 179.25000 | 17.0 | 4.0 | 23.0 |
| Swarajya | status | 2 | 0.08 | 15.00000 | 2.50000 | 17.50000 | 15.0 | 2.5 | 17.5 |
| TheWire.in | status | 23 | 0.92 | 29.13043 | 16.52174 | 45.65217 | 14.0 | 4.0 | 22.0 |
| OpIndia.com | video | 18 | 0.72 | 668.38889 | 801.55556 | 1469.94444 | 273.0 | 78.5 | 340.0 |
| Scroll | video | 1285 | 51.40 | 730.86537 | 1167.74942 | 1898.61479 | 107.0 | 52.0 | 164.0 |
| Swarajya | video | 179 | 7.16 | 981.50838 | 580.93855 | 1562.44693 | 228.0 | 63.0 | 310.0 |
| TheWire.in | video | 457 | 18.28 | 581.87090 | 682.33042 | 1264.20131 | 210.0 | 83.0 | 313.0 |
Aha, so we can clearly see that OpIndia and Swarajya dominate even here in the news link category, and links indeed form the major part of their public page posts. But they have a very low presence in video posts (0.72% and 7.16% of their posts, against 51.4% for Scroll and 18.28% for The Wire), which typically get higher likes and shares. Probably it is due to the nature of the crowd on Facebook. It is also a lesson for these RW digital outlets to step up their game in video.
We will focus on links for now.
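The `proportion` column above divides by a hard-coded 2500, which works because every portal has exactly 2500 posts. A sketch of a variant that derives the denominator from the data instead, on an invented stand-in for `tmlfb` (the `toy_fb` tibble is hypothetical):

```r
library(dplyr)

# Invented posts with only the two columns needed here.
toy_fb <- tibble(
  screen_name = c("A", "A", "A", "A", "B", "B"),
  type        = c("link", "link", "link", "video", "link", "video")
)

type_share <- toy_fb %>%
  count(screen_name, type) %>%          # posts per portal and type
  group_by(screen_name) %>%
  mutate(proportion = round(100 * n / sum(n), 2)) %>%  # share within portal
  ungroup()
type_share
```

This keeps the percentages correct even if the archives end up with unequal numbers of posts per portal.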
tmlfb %>% # filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(type=="link") %>%
# filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>%
summarise(mean_like = mean(likes_count),
mean_share = mean(shares_count),
mean_engagement = mean(engagement)) %>%
ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
geom_line()
We see that OpIndia and Swarajya dominate the link space on Facebook, but of late we see the dominance of The Wire. Let's zoom in on this graph after December.
tmlfb %>% # filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(type=="link") %>%
filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>%
summarise(mean_like = mean(likes_count),
mean_share = mean(shares_count),
mean_engagement = mean(engagement)) %>%
ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
geom_line()
We can clearly see that from Jan 15 onwards The Wire is doing great while OpIndia is slacking on Facebook.
Let's look at video.
tmlfb %>% # filter(is_retweet=="FALSE") %>%
# filter(!(screen_name=="scroll_in")) %>%
filter(type=="video") %>%
#filter(created_at >= "2017-12-31") %>%
group_by(screen_name,day) %>%
summarise(mean_like = mean(likes_count),
mean_share = mean(shares_count),
mean_engagement = mean(engagement)) %>%
ggplot(aes(x=day,y=mean_share,color=screen_name,group=screen_name))+ geom_point()+
geom_line()
We can clearly see that Scroll dominates video, but of late The Wire is on the rise here as well. Swarajya and OpIndia seem less invested in video.
1. OpIndia dominates Twitter by a big margin.
2. The Wire is in the ascendancy on Facebook, with aggressive video and link shares and likes.
3. OpIndia and The Wire are the more extreme outlets, with opposite points of view and negative sentiments.
4. Swarajya and Scroll have more varied articles.
5. News shared from phone clients has higher resonance.
6. OpIndia and Swarajya have fewer original links on weekends, but maximum retweets and likes happen on these days.
7. Video articles on Facebook are shared and liked to a higher degree.
8. Swarajya and OpIndia dominate the news link space on Facebook as well.
Disclosure: I have right-of-centre views and have written an article for OpIndia. However, I have provided a downloadable archive at the beginning, and the R code for the analysis is self-contained on this page.