Introduction

In the run-up to the general elections this October, the Austrian Broadcasting Corporation (ORF) invited the leading candidates of Austria’s main political parties (in one case ‘only’ the party chair) to a one-hour televised conversation. This entry is a largely descriptive analysis of the tweets that were sent during these ORF-Sommergespräche (‘ORF summer talks’) and featured the widely used #orfsg17 hashtag.

The initial motivation for this analysis was largely technical (getting familiar with collecting social media data via R), and hence this entry also contains parts of the R script to detail the implemented programming steps. I strongly assume that there are more elegant and efficient ways to do some of the coding, so if you see a particularly glaring faux pas, feel free to let me know. The idea and code are largely inspired by David Robinson’s analysis of Donald Trump’s (notorious) Twitter use as well as Neil Saunders’ blog entry with a similar technical focus. Many thanks to both of them for making their code publicly available.

As a general caveat, the presented data is largely descriptive and (with a few exceptions) does not dive into the content of the tweets. One should therefore be particularly cautious when interpreting it.

Collecting the tweets

Tweets were collected with the rtweet package. The collection comprises only those tweets which 1) featured the hashtag #orfsg17 and 2) were sent within 24 hours of the start of the broadcast. While the overwhelming bulk of the relevant tweets was sent during or immediately after the show, the longer observation period was intended to pick up any traffic on the following day.
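The two selection criteria can be written down as a small filter. The following is a minimal base-R sketch, not the original collection code (which relied on rtweet); it assumes a data frame with the columns `text` and `created_at` (POSIXct), and the function name `select_show_tweets` as well as the toy rows are made up for illustration. The hashtag is matched case-insensitively, since at least one tweet in the data writes it as #ORFsg17.

```r
# Minimal sketch, not the original collection script: keep tweets that carry
# #orfsg17 (case-insensitive) and were sent within `hours` of the show start.
# Assumed columns: text (character) and created_at (POSIXct).
select_show_tweets <- function(tweets, show_start, hours = 24) {
  has_tag <- grepl("#orfsg17\\b", tweets$text, ignore.case = TRUE)
  in_window <- tweets$created_at >= show_start &
    tweets$created_at < show_start + hours * 3600
  tweets[has_tag & in_window, , drop = FALSE]
}

# Toy data, invented for illustration (time zone chosen arbitrarily)
show_start <- as.POSIXct("2017-08-28 21:05:00", tz = "UTC")
toy <- data.frame(
  text = c("Kurz on #orfsg17", "#ORFsg17 recap",
           "no hashtag", "#orfsg17 the next evening"),
  created_at = show_start + c(10 * 60, 3600, 60, 25 * 3600),
  stringsAsFactors = FALSE
)

kept <- select_show_tweets(toy, show_start)
print(kept$text)
```

Applied to the toy rows, the filter keeps the first two tweets and drops both the tweet without the hashtag and the one sent 25 hours after the start.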

Number of tweets per candidate

Aggregating all tweets (incl. retweets) for each candidate yields the first bar plot. The show with Sebastian Kurz (Liste Kurz/Austrian People’s Party) triggered the largest Twitter traffic: more than 6,000 tweets were sent during the show and the subsequent 23 hours. Chancellor Kern’s (Austrian Social Democratic Party) appearance prompted a bit more than 4,000 tweets, still considerably more than FPÖ’s Strache, but clearly fewer than Kurz.

Interestingly though (at least to me), the difference appears less stark when distinguishing between ‘original’ tweets and retweets.

Obviously, the number of re-/tweets is no indication of how well candidates ‘performed’, nor does it suggest in any way that the candidate with the largest traffic ‘won’. The number is also not a directly valid indicator of a candidate’s popularity. However, the different numbers of tweets might indicate different levels of interest in each of the candidates.

library(dplyr)

# Number of tweets per candidate, split into original tweets and retweets
n.tweets <- tweets_found %>%
  group_by(candidate, is_retweet)%>%
  summarise(n=n())
  
library(ggplot2)
library(ggthemes)

# Stacked bars: absolute number of tweets and retweets per candidate
n.tweets.plot <- n.tweets %>%
  ggplot(.,aes(candidate, n, group=candidate))+
  geom_bar(aes(fill=is_retweet), stat="identity")+
  labs(title="ORF Sommergespräche: Number of tweets with hashtag #orfsg17",
       subtitle="Sent between 21:05 hrs (start of broadcast) and 21:05 hrs the following day; incl. retweets",
       y="", x="",
       caption="Roland Schmidt, @zoowalk")+
  theme_minimal()+
  scale_fill_fivethirtyeight("is_retweet", labels=c("tweets","retweets"))+
  theme(legend.position="bottom",
        legend.title = element_blank(),
        panel.grid.major.x = element_blank())

plot(n.tweets.plot)

Tweets vs Retweets

The second bar chart shows the composition of the traffic for each candidate. The share of retweets was largest for the show with Kurz.

# Same counts, but each bar scaled to 100 % to show the tweet/retweet composition
n.tweets.plot.rel <- n.tweets %>%
  ggplot(.,aes(candidate, n, group=candidate))+
  geom_bar(aes(fill=is_retweet), stat="identity", position="fill")+
  labs(title="ORF Sommergespräche: Share of tweets and retweets with hashtag #orfsg17",
       subtitle="Sent between 21:05 hrs (start of broadcast) and 21:05 hrs the following day",
       y="% of all tweets", x="",
       caption="Roland Schmidt, @zoowalk")+
  theme_minimal()+
  scale_fill_fivethirtyeight("is_retweet", labels=c("tweets","retweets"))+
  scale_y_continuous(labels=scales::percent)+
  theme(legend.position="bottom",
        legend.title = element_blank())

plot(n.tweets.plot.rel)
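`position = "fill"` rescales every bar to 1, so the chart displays within-candidate shares. As a sketch of the same calculation (toy data invented for illustration, column names borrowed from the script), base R’s `prop.table()` with `margin = 1` divides each candidate’s counts by that candidate’s total:

```r
# Within-candidate shares of tweets vs. retweets, i.e. the numbers behind
# geom_bar(position = "fill"). Toy data, invented for illustration.
toy <- data.frame(
  candidate  = c("A", "A", "A", "A", "B", "B"),
  is_retweet = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

counts <- table(toy$candidate, toy$is_retweet)  # rows: candidates, columns: FALSE/TRUE
shares <- prop.table(counts, margin = 1)        # divide each row by its own total
print(round(100 * shares, 1))
```

For the toy data, candidate A’s traffic consists of 75 % retweets and candidate B’s of 50 %.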

Number of users tweeting

Distinct from the number of tweets is the number of users sending tweets and retweets. More than 1,500 users (= distinct Twitter handles) posted at least one tweet or retweet during or after the show with Kurz. Around 1,300 users tweeted about the show with Chancellor Kern. Judged by the number of people tweeting, the shows with the other candidates triggered clearly less interest.

library(dplyr)
library(ggplot2)
library(ggthemes)

# Number of distinct users (screen names) per candidate
n.tweeps <- tweets_found %>%
  group_by(candidate)%>%
  distinct(screen_name)%>%
  summarise(n=n())

n.tweeps.plot <- n.tweeps %>%
  ggplot(.,aes(candidate, n))+
  geom_bar(stat="identity", aes(fill=candidate))+
  labs(title="ORF Sommergespräche: Number of users sending tweets",
       subtitle="How many users sent tweets and retweets with the hashtag #orfsg17 between 21:05 hrs (start of broadcast) and 21:05 hrs the following day",
       y="number of users", x="",
       caption="Roland Schmidt, @zoowalk")+
  theme_minimal()+
  scale_fill_manual(values=c("green","red","cyan3","blue","pink"))+
  theme(legend.position="none",
        legend.title = element_blank(),
        panel.grid.major.x = element_blank())+
  scale_y_continuous(breaks=seq(0,1750, by=250))
  
plot(n.tweeps.plot)
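Counting users rather than tweets boils down to removing duplicate candidate/user pairs before counting. A minimal base-R sketch with invented toy rows (only the column names follow the script):

```r
# Count distinct users (screen names) per candidate, next to the number of tweets.
# Toy data, invented for illustration.
toy <- data.frame(
  candidate   = c("A", "A", "A", "B", "B"),
  screen_name = c("u1", "u1", "u2", "u1", "u3"),
  stringsAsFactors = FALSE
)

pairs <- unique(toy[, c("candidate", "screen_name")])  # one row per candidate/user pair
n_users <- table(pairs$candidate)
n_tweets <- table(toy$candidate)
print(rbind(tweets = n_tweets, users = n_users))
```

User u1 tweets twice about A and once about B, so it is counted once for each candidate. With dplyr, `summarise(n = n_distinct(screen_name))` after `group_by(candidate)` gives the same count.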

Relative Distribution of tweets among users

Putting the number of tweets and the number of users together, one might be interested in how tweets were distributed among users. Were tweets evenly distributed, or did only a few users generate most of them? For this purpose, the Gini coefficient can be a useful indicator. It ranges from 0, when every user sent the same number of tweets, towards 1, when a single user sent (almost) all of them: the higher its value, the more concentrated traffic was among the relevant users.

While the Gini coefficients for ‘original’ tweets lie within roughly the same range for all candidates, there is quite a difference when it comes to retweets. Retweets pertaining to the show with Kurz (0.57) are considerably more concentrated than those pertaining to e.g. Strache (0.39) or Felipe (0.41).

library(ineq)
library(forcats)
library(tidyr)

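# Turn the logical retweet flag into a labelled factor (FALSE -> "tweet", TRUE -> "retweet")
tweets_found <- tweets_found %>%
  mutate(is_retweet=factor(is_retweet, labels=c("tweet", "retweet")))

# Gini coefficient of the number of (re)tweets per user, per candidate and tweet type
gini.df <- tweets_found %>%
  group_by(candidate, screen_name, is_retweet)%>%
  summarise(freq=n())%>%                                # (re)tweets per user
  group_by(candidate, is_retweet)%>%
  summarise(gini=round(ineq(freq, type="Gini"),2))%>%
  spread(.,is_retweet, gini)                            # one column per tweet type

library(kableExtra)
knitr::kable(gini.df,format="html", caption="Gini coefficients") %>%
  kable_styling(bootstrap_options = c("condensed"),
                full_width=FALSE)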
Gini coefficients
candidate               tweet  retweet
Felipe (Die Grünen)     0.52   0.41
Kern (SPÖ)              0.48   0.50
Kurz (Liste Kurz/ÖVP)   0.54   0.57
Strache (FPÖ)           0.53   0.39
Strolz (Neos)           0.47   0.45
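For readers curious about what the Gini coefficient actually computes: with the per-user counts sorted in ascending order, x(1) <= ... <= x(n), the uncorrected coefficient is G = 2 * sum(i * x(i)) / (n * sum(x)) - (n + 1)/n. As far as I can tell, this is the variant that `ineq()` returns by default; the hand-written `gini()` below is only an illustrative sketch, and the example counts are invented.

```r
# Gini coefficient of per-user counts (uncorrected form), written out by hand.
gini <- function(x) {
  x <- sort(as.numeric(x))  # ascending order
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Invented per-user tweet counts
equal  <- gini(c(5, 5, 5, 5))       # everyone tweets equally often -> 0
skewed <- gini(c(1, 1, 1, 1, 96))   # one user sends almost everything -> 0.76
print(c(equal = equal, skewed = skewed))
```

With n users the uncorrected value cannot exceed (n - 1)/n, so for small groups the maximum lies noticeably below 1 (0.8 for the five users in the skewed example). If the ineq package is installed, `ineq(c(1, 1, 1, 1, 96), type = "Gini")` should return the same 0.76.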

Lorenz Curve

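# ‘Inverted’ Lorenz curve of retweets per user, computed separately for each show
lorenz <- tweets_found %>%
  filter(is_retweet=="retweet")%>%
  group_by(candidate, screen_name)%>%
  summarise(n.tweets=n())%>%                        # retweets per user
  group_by(candidate)%>%
  mutate(n.user=n()) %>%
  arrange(candidate,n.tweets)%>%                    # least active users first
  mutate(cumsum.tweets=cumsum(n.tweets),
         perc.tweets=cumsum.tweets/sum(n.tweets),   # cumulative share of retweets
         cumsum.user=row_number(),
         perc.user=cumsum.user/max(cumsum.user),    # cumulative share of users
         perc.tweets.inv=1-perc.tweets,  # share of retweets sent by the users ranked higher (top tweeters)
         perc.user.inv=1-perc.user)%>%   # share of users ranked higher
  select(candidate, perc.user, perc.tweets,
         perc.user.inv, perc.tweets.inv)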

lorenz.plot <- lorenz %>%
  ggplot(.,aes(perc.user.inv,perc.tweets.inv))+
  geom_line(aes(color=candidate))+
  theme_minimal()+
  labs(x = "% of all users sending retweets", 
       y = "% of all retweets", 
       title = "ORF Sommergespräche: User concentration of retweets",
       caption="Roland Schmidt, @zoowalk")+
  geom_vline(xintercept=0.10, linetype="dotted", color="black")+
  geom_vline(xintercept = 0.25, linetype="dashed",
             color="black")+
  theme(legend.position="bottom",
        legend.title = element_blank())+
  scale_y_continuous(labels=scales::percent, breaks=seq(0,1,0.1))+
  scale_x_continuous(labels=scales::percent, breaks=seq(0,1,0.1))+
  scale_color_manual(values=c("Felipe (Die Grünen)"="green","Kern (SPÖ)"="red","Kurz (Liste Kurz/ÖVP)"="cyan3","Strache (FPÖ)"="blue","Strolz (Neos)" ="pink"))

plot(lorenz.plot)

The (‘inverted’) Lorenz curve visualizes the different degrees of concentration of retweets. With users ranked by their number of retweets, it shows the share of all retweets contributed by the most active share of users. In the case of Kurz, the top 10 % of users sending retweets about his show contributed more than 50 % of all retweets of his show, and the top 25 % contributed 70 %. By contrast, for Strache’s appearance, which featured the least concentrated retweets, the top 10 % of ‘retweeters’ contributed only 35 % of all retweets and the top 25 % contributed 55 %.
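Each point on the inverted Lorenz curve answers the question ‘what share of all retweets did the top p % of users send?’. A minimal base-R sketch of that calculation for a single show, using invented counts for ten users; `top_share()` is a hypothetical helper, and rounding a fractional number of users up to a whole user is a simplification of this sketch (the plotted curve steps through users one at a time instead):

```r
# Share of all retweets sent by the top p share of users,
# i.e. one point on the inverted Lorenz curve.
top_share <- function(counts, p) {
  counts <- sort(counts, decreasing = TRUE)  # most active users first
  k <- ceiling(p * length(counts))           # number of users in the top p (rounded up)
  sum(counts[seq_len(k)]) / sum(counts)
}

# Invented retweet counts for ten users (100 retweets in total)
retweets_per_user <- c(40, 10, 8, 8, 7, 7, 6, 6, 4, 4)
print(top_share(retweets_per_user, 0.10))
print(top_share(retweets_per_user, 0.25))
```

For these toy counts, the top 10 % (one user) sent 40 % of the retweets and the top 25 % (three users, after rounding up) sent 58 %.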

Understanding why retweets pertaining to Kurz’s appearance are more concentrated would require a more detailed look at who actually sent them. Due to a lack of time, I’ll limit this aspect to a cursory analysis. The most active retweeter during Kurz’s show was the user @IreneW1812, with 136 of the 3,061 retweets in total. To put this number into context, the user with the second most retweets (@SchauerAndreas) sent 59. @IreneW1812’s Twitter profile features an image with ‘Kurz4Kanlzer’, strongly suggesting a partisan motivation for her retweets. A quick Google search shows that she is/was active in Vienna district-level politics for the Austrian People’s Party (= Kurz’s party). Similarly, a brief look at the timeline of @SchauerAndreas makes it clear that he strongly supports Kurz’s candidacy. With a total of 195 retweets, these two users alone contributed 6.4 % of all retweets during the show with Kurz. While a more in-depth analysis would be needed to reach a firm conclusion, these two cases suggest that particularly active partisan retweeters drive the high concentration of retweets pertaining to Kurz’s show.
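Identifying the most active retweeters is a matter of counting retweets per screen name and sorting. A base-R sketch with invented screen names, followed by the share arithmetic for the two accounts named above:

```r
# Rank users by number of retweets and compute the combined share of the
# two most active ones. Screen names are invented for illustration.
retweeters <- c("a", "b", "a", "c", "a", "b")        # one entry per retweet
per_user <- sort(table(retweeters), decreasing = TRUE)
top2_share <- sum(per_user[1:2]) / sum(per_user)     # (3 + 2) / 6

# The same arithmetic with the figures quoted above for Kurz's show:
# 136 + 59 = 195 retweets out of 3,061
kurz_share <- round(100 * (136 + 59) / 3061, 1)
print(per_user)
print(kurz_share)
```

The last line reproduces the 6.4 % quoted in the text (195 / 3,061 is roughly 0.0637).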

The most favorited tweet of all shows (881 likes) contained a link to a humorous video summary of all shows. Since this tweet was sent within the 24 hours of Kern’s talk, the last show in the series, it shows up among the tweets pertaining to Kern, but it says nothing specific about Kern’s appearance. The second most favorited tweet (531 likes) also pertains to Kern’s show and again has no direct link to the candidate, but to the show’s anchor. Prior to the conversation with Kern, ORF’s Tarek Leitner had been accused of a conflict of interest by members of Kurz’s party. The tweet commends Leitner for his journalistic professionalism in the face of these allegations, which many considered politically motivated. Only the third most favorited tweet refers directly to one of the candidates. In this tweet, liked 452 times, Rudi Fussi, a well-known political commentator of many talents, highlights Kurz’s cancellation of a direct debate with Chancellor Kern. According to Fussi, Kurz’s (allegedly weak) performance during the ORF summer talk explains why Kurz avoided a direct confrontation with Kern (Kurz explained the cancellation with his participation in an EU meeting abroad).
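Finding the most favorited tweets amounts to ranking original tweets by their like count within each show. A minimal base-R sketch; the column name `favorite_count` is assumed here to be how the like count is stored in the collected data, and the rows are invented. Retweets are dropped so that only original tweets compete.

```r
# Most favorited original tweet per show. `favorite_count` is an assumed
# column name for the like count; rows are invented for illustration.
toy <- data.frame(
  candidate      = c("A", "A", "A", "B", "B"),
  is_retweet     = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  favorite_count = c(12, 30, 0, 5, 7),
  text           = c("t1", "t2", "rt", "t3", "t4"),
  stringsAsFactors = FALSE
)

originals <- toy[!toy$is_retweet, ]                   # drop retweets
by_show <- split(originals, originals$candidate)      # one data frame per show
top_liked <- do.call(rbind, lapply(by_show, function(d) d[which.max(d$favorite_count), ]))
print(top_liked[, c("candidate", "favorite_count", "text")])
```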

Most retweeted tweets

# Count, per candidate, how often each original tweet was retweeted in the collected data
retweets.n <- tweets_found %>%
  filter(is_retweet=="retweet") %>%
  select(candidate, screen_name, text, retweet_count, retweet_status_id)%>%
  group_by(candidate, retweet_status_id)%>%
  summarise(n.retweets=n())

# Attach author, time and text of the original tweet; still grouped by candidate,
# so slice() keeps the top 5 per candidate
retweets <- left_join(retweets.n,tweets_found[,c("screen_name","created_at","text","status_id")],by=c("retweet_status_id"="status_id"))%>%
  arrange(candidate, desc(n.retweets)) %>%
  slice(1:5)

# Rename columns for the tables below
names(retweets)[names(retweets)=="screen_name"] <- "user"
names(retweets)[names(retweets)=="n.retweets"] <- "retweets"
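The counts in the tables below refer to retweets captured in the collected 24-hour window; as far as I understand the API, they are therefore not the same as the `retweet_count` field that Twitter reports for a tweet at the time of collection. The counting step can be sketched in base R as follows (invented toy rows; column names follow the script):

```r
# Count how often each original tweet was retweeted within the collected data,
# per candidate, and keep the top k. Toy rows invented; column names follow the script.
toy <- data.frame(
  candidate         = c("A", "A", "A", "A", "B", "B"),
  retweet_status_id = c("1", "1", "2", "1", "3", "3"),
  stringsAsFactors = FALSE
)
toy$n.retweets <- 1                                  # each row is one retweet
counts <- aggregate(n.retweets ~ candidate + retweet_status_id,
                    data = toy, FUN = sum)

top_k <- function(d, k) head(d[order(-d$n.retweets), ], k)
top <- do.call(rbind, lapply(split(counts, counts$candidate), top_k, k = 1))
print(top)
```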

The most ‘liked’ tweet pertaining to a candidate is also the most retweeted tweet: Rudi Fussi’s tweet on Kurz’s cancellation of a direct duel with Kern was retweeted 130 times within the 24 hours of Kurz’s show. The second most retweeted tweet is from the show with FPÖ’s Heinz-Christian Strache (98 retweets) and calls out Strache’s somewhat misleading claim that ordinary house owners would be hit particularly hard by the introduction of an inheritance tax.

Glancing through the top retweets for each candidate also reveals instances of ‘hashtag hijacking’: some users used the #orfsg17 hashtag to post content with hardly any substantive relation to the summer talks. In this regard, @tanjaplayner seems to be a particularly persistent case (see also here).

Felipe

candidate user retweets text
Felipe (Die Grünen) Kress_de 15 #airberlin #dbp17 #vflhsv #ronaldo #GameOfThrones #Charlottesville #jeremykyle #orfsg17 #FCEVfB #StreamWithSelena… https://t.co/xCKggOR4K6
Felipe (Die Grünen) tanjaplayner 12 Sebastian Kurz #orfsg17 Tanja Playner #zib2 #Kern #Pilz #puls4 #heute So lacht das Internet über Kerns Wahl-Slogan https://t.co/wm59RSZkwO
Felipe (Die Grünen) rudifussi 11 Hmmm. Ich find die Felipe sehr sympathisch, aber irgendwie sind ihr zu Schuhe fürs Wiener Parkett doch zu groß. #orfsg17
Felipe (Die Grünen) tanjaplayner 11 #AustrianGP Google #GoogleHome #thetimes #orfsg17 #srf #nytimes #zib2 #news #FAZ Happy Present for you… https://t.co/soYR7KJjVx
Felipe (Die Grünen) barbara_felkel 10 #orfsg17 zeigt den Kleingeist d. österreichischen Journalismus: lieber Vorurteile pflegen als Inhalte diskutieren

Kern

candidate user retweets text
Kern (SPÖ) KernChri 53 #orfsg17 https://t.co/qKFvplzuRz
Kern (SPÖ) lisafuchs 51 Ganz objekt. Das ist ein Kanzler. Letzte Woche saß da ein Kandidat von Österreich sucht den Super-Praktikant. #orfsg17 #kern
Kern (SPÖ) lukasriepler 47 Bereits nach fünf Minuten mehr Inhalte als bei Sebastian Kurz in einer ganzen Stunde. #orfsg17
Kern (SPÖ) andreasstrobl 30 Das Cover vom ÖVP-Wirtschaftsprgramm ist aufgetaucht. (Dank an @unbehandelt) #ORFsg17 https://t.co/x3CtVkdMhy
Kern (SPÖ) ChristinaAumayr 30 Da hat der @nowak_rainer schon wieder recht. Letzte Woche hatten wir ein Duell, heute ein Duett. #orfsg17

Kurz

candidate user retweets text
Kurz (Liste Kurz/ÖVP) rudifussi 130 Falls sich wer gefragt hat, warum Kurz das Duell mit Kern abgesagt hat. Deine Frage ist beantwortet. #orfsg17
Kurz (Liste Kurz/ÖVP) d_feierabend 61 Das will Sebastian Kurz verschweige:, die ÖVP ist schon lange im Wahlkampf. #orfsg17 https://t.co/xnNugGKZlP
Kurz (Liste Kurz/ÖVP) RablPeter 59 Die Gesprächsführung und Fragestellungen von Tarek Leitner erscheinen mir nicht sehr gelungen, sorry to say. #orfsg17
Kurz (Liste Kurz/ÖVP) MatthiasPunz 52 #orfsg17 Kurz sagt Programm nicht fertig, daher noch nicht veröffentlicht. Oberösterreichs Landesvize sagt es ist r… https://t.co/niP7ylaRNF
Kurz (Liste Kurz/ÖVP) rudifussi 41 Der pöhse Tarek Leitner. Rotfunk! #orfsg17 https://t.co/arW7cUhO7r

Strache

candidate user retweets text
Strache (FPÖ) MatthiasPunz 98 #orfsg17 Strache bringt den alten Häuslbauer-Schmäh gegen die Erbschaftssteuer. Nachhilfe: https://t.co/Y9erZBEsiz
Strache (FPÖ) michaelmingler 61 Das ist Strache, wie er “nur gratuliert”. #orfsg17 https://t.co/MBH3sApJwK
Strache (FPÖ) KDreisiebner 50 #Strache bei #orfsg17 zu Tarek Leitner: Heute überfällt man keine Bank, man gründet eine. Ja, Leute aus #fpoe haben Erfahrung: #Hypo Kärnten
Strache (FPÖ) DMGuertler 48 FPÖ hat die meisten Abgeordneten in höchster Zuverdienststufe. Warum sind die wohl gegen vermögensbezogene Steuern? #orfsg17
Strache (FPÖ) joseflentsch 48 FPÖ hat nichts mit Trump zu tun, sagt Strache. Google so: #orfsg17 https://t.co/sgqW2oxlV6

Strolz

candidate user retweets text
Strolz (Neos) dietmar_seiler 27 muss man dem @matstrolz lassen: er ist der begeistertste & begeisterndste bildungspolitiker des landes. #orfsg17
Strolz (Neos) tanjaplayner 25 #InternationalCatDay Google #GoogleHome #thetimes #orfsg17 #srf #nytimes #zib2 #news #FAZ Happy Present for you… https://t.co/mVzHgAjKce
Strolz (Neos) blauerelefant 23 Strolz hat so viel Energie, dass Kaffee IHN in der Früh trinken muss, um in die Gänge zu kommen. #orfsg17
Strolz (Neos) tanjaplayner 21 #puls4 #orfsg17 #srf #nytimes #kern #Pilz #zib2 #news #FAZ #oe24 Pop art von Tanja Playner ist immer eine gute Wahl… https://t.co/yLZf4sp2ov
Strolz (Neos) tanjaplayner 18 Pilz #zib2 Kern #puls4 #heute #orfsg17 Sebastian Kurz fordert härtere Strafen bei Verbrechen gegen Frauen und Kinder https://t.co/Sgs3Sz3Jng

Timeline of tweets

Tweets pertaining to the shows were overwhelmingly concentrated during the time of the broadcast. Traffic basically subsided completely during the night hours and returned only marginally in the morning.

library(padr)
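# Number of tweets and retweets per candidate in 5-minute intervals
flow <- tweets_found %>%
  group_by(candidate) %>%
  arrange(created_atGMT2)%>%
  thicken(interval="5 min", colname="interval", by="created_atGMT2")%>%  # add a 5-min interval column
  group_by(candidate, is_retweet, interval)%>%
  summarise(freq=n())%>%
  pad()%>%                   # insert intervals without any tweets...
  fill_by_value(value=0)     # ...and set their count to 0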

# data for geom_rect
candidate <- c("Strolz (Neos)","Felipe (Die Grünen)","Strache (FPÖ)","Kurz (Liste Kurz/ÖVP)","Kern (SPÖ)")
start <- c("2017-08-07 21:05:00",  #Strolz
           "2017-08-14 21:05:00",  #Felipe
           "2017-08-21 21:05:00",  #Strache
           "2017-08-28 21:05:00",  #Kurz
           "2017-09-04 21:05:00")  #Kern

shows <- data.frame(candidate,start)
shows$start <- as.POSIXct(as.character(shows$start))
shows$candidate <- as.character(shows$candidate)
shows$end <- shows$start + 55*60

# plot: stacked 5-minute counts per show; the orange band marks the broadcast
flow.plot <- flow %>%
  ggplot(.,aes(interval,freq))+
  geom_bar(aes(fill=is_retweet), position="stack",stat="identity")+
  labs(title="ORF Sommergespräche: Traffic of tweets with #orfsg17",
       subtitle="Number of tweets and retweets with hashtag #orfsg17 sent from 21:05 to 23:00 hrs",
       y="number of tweets", x="",
       caption="Roland Schmidt, @zoowalk")+
  theme_minimal()+
  scale_fill_fivethirtyeight("is_retweet", labels=c("tweets","retweets"))+
  theme(legend.position="bottom",
        legend.title = element_blank())+
  geom_rect(data=shows, aes(xmin=start, xmax=end, ymin=-Inf, ymax=+Inf),
            fill="orange",
            alpha=0.2,
            inherit.aes = FALSE)+
  facet_wrap(~candidate, scales="free_x", ncol=1)

print(flow.plot)
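The 5-minute binning that `thicken()` and `pad()` perform above can also be sketched with base R’s `cut()`, which keeps empty intervals as factor levels with a count of zero. The example below uses invented tweet times for one show and also computes the share of tweets sent during the 55-minute broadcast window used for the orange bands:

```r
# 5-minute bins of tweet times (similar in spirit to padr::thicken() + pad())
# and the share of tweets sent during the 55-minute broadcast window.
show_start <- as.POSIXct("2017-08-28 21:05:00", tz = "UTC")  # time zone arbitrary here
show_end   <- show_start + 55 * 60

# Invented tweet times, in minutes after the start of the broadcast
sent <- show_start + c(1, 3, 7, 12, 50, 54, 70, 600) * 60

bins <- cut(sent, breaks = "5 min")   # empty intervals remain as factor levels
per_bin <- table(bins)                # ... and therefore show up with a count of 0
during_show <- mean(sent >= show_start & sent < show_end)
print(head(per_bin))
print(during_show)
```

In this toy example, six of the eight tweets (75 %) fall within the broadcast window.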

Twitter clients/devices used

The data provided by Twitter’s API also include the device/app from which each tweet was sent. Twitter for Android, Twitter for iPhone and Twitter’s web client were dominant. Traffic pertaining to the show with Kern features a comparatively high share of tweets sent from ‘Twitter Lite’. I have no idea why this might be the case. Note that the number of devices/apps used slightly deviates from the number of users tweeting: some users sent tweets from more than one device/app and are hence counted once for each device/app.

library(forcats)
# Distinct user/client pairs per candidate; within each candidate, all but the
# five most common clients are lumped into "Other"
source.candidate <- tweets_found %>%
  group_by(candidate)%>%
  distinct(screen_name, source) %>%
  mutate(source.cand=fct_lump(source, n=5))%>%
  group_by(candidate, source.cand) %>%
  summarise(n=n())%>%
  mutate(n.rel=n/sum(n)*100)    # share within candidate, in %

source.candidate.plot <- source.candidate %>%
  ggplot(.,aes(candidate, n))+
  geom_bar(aes(fill=source.cand), stat="identity")+
  labs(title="ORF Sommergespräche: Sources of tweets with hashtag #orfsg17",
       subtitle="In absolute numbers; includes retweets",
       caption="Roland Schmidt, @zoowalk",
       y="number of devices/apps", x="")+
  theme_minimal()+
  theme(legend.title = element_blank(),
        panel.grid.major.x = element_blank())+
  scale_fill_brewer(palette="Set2")+
  scale_y_continuous(breaks=seq(0,1750, by=250))

plot(source.candidate.plot)  

source.candidate.plot.rel <- source.candidate %>%
  ggplot(.,aes(candidate, n))+
  geom_bar(aes(fill=source.cand), stat="identity",position="fill")+
  labs(title="ORF Sommergespräche: Sources of tweets with hashtag #orfsg17",
       subtitle="In relative numbers; includes retweets",
       caption="Roland Schmidt, @zoowalk",
       x="", y="")+
  theme_minimal()+
  theme(legend.title = element_blank(),
        panel.grid.major.x = element_blank())+
  scale_fill_brewer(palette="Set2")+
  scale_y_continuous(labels=scales::percent)
  
plot(source.candidate.plot.rel)
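The difference between the number of users and the number of devices/apps comes from counting distinct user/client pairs. A base-R sketch with invented rows; `lump_top_n()` is a hypothetical, simplified stand-in for `forcats::fct_lump(n = ...)` that ignores the tie handling forcats provides:

```r
# Distinct user/client pairs, with all but the n most frequent clients
# lumped into "Other". lump_top_n() is a simplified, hypothetical helper.
lump_top_n <- function(x, n) {
  top <- names(sort(table(x), decreasing = TRUE))[seq_len(n)]
  ifelse(x %in% top, x, "Other")
}

# Invented rows: u1 tweets twice from the same client, u3 uses two clients
toy <- data.frame(
  screen_name = c("u1", "u1", "u2", "u3", "u3", "u4", "u5"),
  source = c("Twitter for Android", "Twitter for Android", "Twitter for Android",
             "Twitter for iPhone", "Twitter for Android", "Twitter for iPhone",
             "TweetDeck"),
  stringsAsFactors = FALSE
)

pairs <- unique(toy[, c("screen_name", "source")])   # one row per user/client pair
pairs$source_lumped <- lump_top_n(pairs$source, n = 2)
print(table(pairs$source_lumped))
```

Here five users produce six user/client pairs because u3 tweeted from two different clients.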