---
title: "Analysis of Spain IO Accounts"
output: html_notebook
author: "Michael Baldassaro (@mbaldassaro)" 
---

*Since October 2018, Twitter has been releasing [archived datasets of tweets from state-sponsored information operations (IO)](https://blog.twitter.com/en_us/topics/company/2018/enabling-further-research-of-information-operations-on-twitter.html) identified on its platform to enable independent academic research and investigation. Independent researchers have been analyzing the Twitter IO data to better understand how coordinated campaigns seek to distort public opinion around the world.* 

On 15 February 2019, Spanish Prime Minister Pedro Sanchez [called for snap parliamentary elections](https://www.nytimes.com/2019/02/15/world/europe/spain-snap-election.html), setting the stage for a showdown between the center-left Partido Socialista Obrero Español (PSOE) and its main opposition rival, the center-right Partido Popular (PP). Facing a [challenge from the nationalist Vox party](https://elpais.com/elpais/2019/02/18/inenglish/1550506982_047374.html) on its right flank, PP staked out [tough positions](https://elpais.com/elpais/2019/02/27/inenglish/1551259773_252703.html) on crime, abortion, immigration and Catalonian separatism. The PP's campaign strategy led the party to suffer the [worst defeat in its electoral history](https://www.nytimes.com/2019/04/29/world/europe/spain-election-sanchez-vox.html) in the 28 April elections. 

On 20 September 2019, Twitter announced that PP had allegedly used [fake accounts to fraudulently amplify its messages and support levels](https://blog.twitter.com/en_us/topics/company/2019/info-ops-disclosure-data-september-2019.html) in the run-up to the April 2019 elections:

> "We have removed 259 accounts we identified as falsely boosting public sentiment online in Spain. Operated by Partido Popular, these accounts were active for a relatively short period, and consisted primarily of fake accounts engaging in spamming or retweet behaviour to increase engagement."     

PP [denied responsibility](https://www.ft.com/content/7f04e1b6-dbb8-11e9-8f9b-77216ebe1f17) for creating fake accounts but acknowledged the possibility that its supporters could have been behind such a campaign. Twitter says that it publishes datasets on information operations (IO) when it can [reliably associate them with state-affiliated actors](https://blog.twitter.com/en_us/topics/company/2019/information-ops-on-twitter.html).

Twitter also published a [full archive of the Spain IO accounts and activity](https://about.twitter.com/en_us/values/elections-integrity.html#data) it identified on its platform. We dove into the Twitter Spain IO data to see what behavioral patterns can be mined to better understand how the campaign operated. Here are the main trends we discovered in the data: 

* Between 27 February and 22 April 2019, 216 accounts posted 56,712 tweets.
    + Although there were 259 accounts in the dataset, 43 never posted a single tweet.
    + The 216 accounts that tweeted were most active on weekdays during mid-morning and early afternoon hours.
* While the accounts had relatively few followers, they used popular hashtags to reach a wider audience.
    + The average account was following 193 accounts and had 65 followers; given retweet patterns, it is likely that many followers of IO accounts were other IO accounts.
    + The accounts tweeted and retweeted messages using hashtags popular among PP supporters to extend their reach beyond their follower base.
* Of the 56,712 tweets posted by these accounts, 27,042 (47.7%) were retweets.
    + The most retweeted content was posts from the official PP account or PP leader accounts, suggesting a directed campaign to inflate popular support levels.
    + 1 of every 3 retweets was an IO account retweeting another IO account, suggesting a coordinated campaign to amplify party narratives.

The extent to which this coordinated campaign had any impact on voters and on the results of the April elections is unknown. PP did not win the elections, and these 216 accounts surely made up a very small share of total Twitter activity ahead of the polls. The apparent intent of the information operation was nonetheless clear: to artificially boost PP's popular support levels and advance the party's campaign agenda.

Below is a more detailed breakdown of the Spain IO data: 

```{r setup, echo=FALSE, eval=FALSE}
require("parallel")
require("DBI")
require("RSQLite")
require("foreach")
require("doParallel")
require("dplyr")
require("ggplot2")
require("rtweet")
cores <- detectCores()
workers <- makeCluster(4)
registerDoParallel(workers)
db_con <- dbConnect(RSQLite::SQLite(), "~/twittio/twittio.db")
spainbasic <- dbGetQuery(db_con, "select user_screen_name, tweet_time, is_retweet, follower_count, following_count, like_count, reply_count, retweet_count from spain")
spainsna <- dbGetQuery(db_con, "select userid, user_screen_name, tweet_text, user_mentions from spain")
spainurl <- dbGetQuery(db_con, "select user_screen_name, tweet_text, urls from spain")
spainrt <- dbGetQuery(db_con, "select userid, tweetid, tweet_text, is_retweet, retweet_userid, retweet_tweetid from spain")
spainrttime <- dbGetQuery(db_con, "select user_screen_name, tweet_time, is_retweet, retweet_tweetid from spain")
stopImplicitCluster()
stopCluster(workers)
```

***The Twitter Spain IO archive contained 56,712 tweets from 216 active Twitter accounts between 27 February and 22 April 2019 (note: while Twitter identified 259 IO accounts, 43 of them never tweeted). The number of accounts active on a given day was largely correlated with the number of tweets posted that day (r^2=0.63), with some fluctuation. On average, 43 accounts were actively tweeting on any given day and the median number of tweets was 736 per day. The largest number of accounts tweeting on a single day was 106 (11 April 2019) and the most tweets posted in a single day was 4,838 (22 March 2019). Of the 56,712 total tweets from Spain IO accounts, 27,042 (47.7%) were retweets.***

```{r}
#was 84 tweets per day -- but no interactions
#How many interactions = 20544 reactions
#106 active accounts (11 April 2019) 
#4,838 tweets (22 March 2019)
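# Per-account summary: tweet count, interactions received, audience size and activity span
spainbasic_table <- spainbasic %>%
  group_by(user_screen_name) %>%
  summarise(tweets=length(user_screen_name),
            # interactions = likes + replies + retweets (reply_count was previously summed twice)
            interactions=sum(as.integer(like_count)) + sum(as.integer(reply_count)) + sum(as.integer(retweet_count)),
            followers=mean(as.integer(follower_count)),
            following=mean(as.integer(following_count)),
            first=min(as.POSIXct(tweet_time)),
            last=max(as.POSIXct(tweet_time)),
            # days between first and last tweet; at least 1 to avoid dividing by zero
            active=pmax(1, ceiling(as.numeric(difftime(last, first, units="days")))),
            volume=ceiling(tweets/active)) %>%
  arrange(desc(tweets))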
spainbasic_table$interactions[is.na(spainbasic_table$interactions)] <- 0

require("lubridate")
head(spainbasic)
# floor_date() assigns each tweet to its own calendar day; round_date() would push afternoon tweets to the next day
spaingraph <- spainbasic %>% mutate(tweet_time=as.POSIXct(tweet_time), date=floor_date(tweet_time, "day")) %>% group_by(date) %>% mutate(tweets=length(date), accounts=length(unique(as.factor(user_screen_name))), retweet=ifelse(is_retweet=="true", 1, 0)) 
# interactions = likes + replies + retweets (reply_count was previously summed twice)
spaingraph_table <- spaingraph %>% group_by(date) %>% summarise(tweets=min(tweets), accounts=min(accounts), interactions=sum(as.integer(like_count)) + sum(as.integer(reply_count)) + sum(as.integer(retweet_count)), retweet=sum(retweet))
spaingraph_table$interactions[is.na(spaingraph_table$interactions)] <- 0
spaingraph_table %>% mutate(rt_pct=retweet/tweets*100) %>%  arrange(desc(rt_pct))#arrange(desc(accounts))  
#correlation check
test <- spainbasic %>% select(user_screen_name, tweet_time)
test$user_screen_name <- as.factor(test$user_screen_name)
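#parse the date part of the timestamp; "%M-%D-%Y" (minutes, then a full date) did not match it and produced NA
test$tweet_time <- as.Date(test$tweet_time, format="%Y-%m-%d")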
test <- test %>% group_by(tweet_time) %>% summarise(accounts=length(unique(user_screen_name)) ,tweets=length(tweet_time))
#test %>% ggplot(aes(x=accounts, y=tweets)) + geom_point() 
correlation <- cor(test$accounts, test$tweets)

#GRAPH
require("colormap")
g <- spaingraph_table %>% ggplot(aes(x=date, y=tweets)) + geom_bar(aes(fill=accounts), stat = "identity")
g + scale_fill_colormap(discrete = F,colormap = colormaps$viridis, reverse = T) +
    theme(axis.title.x = element_text(face = "bold", size=8),
          axis.text.x = element_text(size=8),
          axis.title.y = element_text(face = "bold", size=8),
          axis.text.y = element_text(size=8),
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_text(size=8),
          plot.title = element_text(face = "bold", size=8),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          panel.grid.minor=element_blank(),
          panel.grid.major=element_blank(),
          panel.background = element_blank(), 
          axis.line = element_blank()) +
    geom_line(data=spaingraph_table, aes(x=date, y=retweet), colour="orange") +
    ggplot2::labs(
    x = "Date", y = "Tweets", fill="Accounts",
    #title = "",
    #subtitle = "",
    caption = "Tweet activity levels by 216 Spain IO accounts per day\nBar Color = Number of Accounts Tweeting per Day (Darker = More Accounts)\nLine = Number of Total Tweets that were Retweets\nSource: Twitter transparency report")

```
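
The correlation figure quoted above comes from counting, for each day, the number of distinct accounts that tweeted and the number of tweets posted. The following is a minimal base-R sketch of that aggregation on a small, made-up tweet log (the `toy` data frame and its values are hypothetical, not taken from the archive). It also shows how `cor()`, which returns r, relates to the r^2 of a simple linear regression.

```r
# Hypothetical tweet log: one row per tweet (illustrative values only)
toy <- data.frame(
  user_screen_name = c("a", "b", "a", "c", "a", "b", "c", "d", "a"),
  tweet_time = c("2019-03-01 09:15", "2019-03-01 10:02", "2019-03-01 13:40",
                 "2019-03-02 11:05",
                 "2019-03-03 09:00", "2019-03-03 09:30", "2019-03-03 12:10",
                 "2019-03-03 12:45", "2019-03-03 18:20"),
  stringsAsFactors = FALSE
)

# Keep only the calendar date; trailing "HH:MM" characters are ignored
toy$date <- as.Date(toy$tweet_time, format = "%Y-%m-%d")

# One row per day: number of tweets and number of distinct active accounts
daily <- data.frame(
  date     = sort(unique(toy$date)),
  tweets   = as.vector(table(toy$date)),
  accounts = as.vector(tapply(toy$user_screen_name, toy$date,
                              function(u) length(unique(u))))
)

# cor() returns Pearson's r; squaring it gives the r^2 of a one-predictor regression
r         <- cor(daily$accounts, daily$tweets)
r_squared <- r^2
fit       <- lm(tweets ~ accounts, data = daily)
```

With a single predictor, `r_squared` and `summary(fit)$r.squared` agree, so either can be reported as long as the text states which one is meant.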

***The Spain IO accounts were much more active on weekdays than on weekends. Peak activity periods were mid-morning and early afternoon hours on weekdays. There were also small activity spikes during late evening hours, particularly on weekend days. Given the high number of retweets (detailed in further analysis), one hypothesis is that these patterns coincide with the activity patterns of the accounts of PP and its leaders. However, this hypothesis can't be tested with the data available.***

```{r, echo=FALSE}
require("chron")
# Hour of day (0-23) taken from the time part of the timestamp string
spainbasic$time <- sub(".*\\s+", "", spainbasic$tweet_time) 
spainbasic$time <- as.integer(substr(spainbasic$time, 1, 2))
spainbasic$tweet_time <- as.POSIXct(spainbasic$tweet_time)
spainbasic$day <- as.factor(weekdays(as.Date(spainbasic$tweet_time)))
# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday in any locale, unlike the names returned by weekdays()
spainbasic <- spainbasic %>% mutate(daytime=ifelse(as.POSIXlt(as.Date(tweet_time))$wday %in% c(0, 6), "Weekend", "Weekday"))
spainbasic$daytime <- as.factor(spainbasic$daytime)

# Tweets (x) and distinct active accounts (y) per hour, divided by the number of
# days of that type in a week (5 weekdays, 2 weekend days). One row per day type
# and hour, so each bar and line point is drawn once.
spaintime <- spainbasic %>%
  group_by(daytime, time) %>%
  summarise(days=ifelse(first(daytime)=="Weekday", 5, 2),
            x=n()/days,
            y=n_distinct(user_screen_name)/days) %>%
  ungroup()

require("gridExtra")
p1 <- ggplot(spaintime) + geom_bar(aes(x=time, y=x, group=daytime, fill=daytime), stat="identity", position="dodge") + theme(axis.title.x = element_text(face = "bold", size=8),
          axis.text.x = element_text(size=8),
          axis.title.y = element_text(face = "bold", size=8),
          axis.text.y = element_text(size=8),
          legend.position = "none",
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_text(size=6),
          plot.title = element_text(face = "bold", size=8),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          panel.grid.minor=element_blank(),
          panel.grid.major=element_blank(),
          panel.background = element_blank(), 
          axis.line = element_blank()) +
    ggplot2::labs(
    x = "Time", y = "Tweets", fill=NULL,
    title = "",
    subtitle = "",
    caption = "\n\n\n")
p2 <- ggplot(spaintime) + geom_line(aes(x=time, y=y, group=daytime, color=daytime), stat="identity") + theme(axis.title.x = element_text(face = "bold", size=8),
          axis.text.x = element_text(size=8),
          axis.title.y = element_text(face = "bold", size=8),
          axis.text.y = element_text(size=8),
          legend.position = "none",
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_text(size=6),
          plot.title = element_text(face = "bold", size=8),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          panel.grid.minor=element_blank(),
          panel.grid.major=element_blank(),
          panel.background = element_blank(), 
          axis.line = element_blank()) +
    ggplot2::labs(
    x = "Time", y = "Accounts", fill=NULL,
    title = "",
    subtitle = "",
    caption = "Tweet activity levels by 216 Spain IO accounts on Weekdays (red) vs. Weekends (blue)\nThe left graph displays average number of tweets posted by type of day and hour\nThe right graph displays average number of accounts active by type of day and by hour\nSource: Twitter transparency report")
grid.arrange(p1,p2, nrow=1)

```
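
An alternative way to express "average activity by type of day and hour" is to count tweets per calendar day and hour first, and then average those counts across days. The sketch below does this in base R on a made-up set of timestamps (the `toy` data and its values are hypothetical). It is meant as an illustration of the idea, not as the method used for the figure above.

```r
# Hypothetical timestamps (UTC); 4 and 5 March 2019 are weekdays, 9 and 10 March a weekend
toy <- data.frame(
  tweet_time = as.POSIXct(c("2019-03-04 10:05", "2019-03-04 10:40",
                            "2019-03-05 10:15",
                            "2019-03-09 22:30",
                            "2019-03-10 22:10", "2019-03-10 22:50"),
                          tz = "UTC")
)

toy$hour <- as.integer(format(toy$tweet_time, "%H"))
toy$date <- format(toy$tweet_time, "%Y-%m-%d")

# $wday is 0 (Sunday) to 6 (Saturday) and does not depend on the system locale
wday <- as.POSIXlt(toy$tweet_time)$wday
toy$daytype <- ifelse(wday %in% c(0, 6), "Weekend", "Weekday")

# Step 1: tweets per calendar day and hour
per_day <- aggregate(n ~ date + daytype + hour,
                     data = transform(toy, n = 1), FUN = sum)

# Step 2: average those daily counts within each day type and hour
hourly <- aggregate(n ~ daytype + hour, data = per_day, FUN = mean)
```

Note that hours with no tweets on a given day do not appear in `per_day`, so this average is taken over days on which at least one tweet was posted in that hour.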

***The Spain IO accounts did not have many followers, and nearly all 216 active accounts were following more accounts than they had followers. The average IO account followed 193 accounts; a simple linear regression of followers on following predicts only about 65 followers for an account at that level. The profile locations of the accounts were largely localities in Spain, with few exceptions. The profile descriptions were mostly generic, although several mentioned affiliation with or support for PP. A handful denoted support for other parties (e.g. Podemos).***     

```{r followers, echo=FALSE}
spain_following <- spainbasic %>% select(user_screen_name, follower_count, following_count) %>% mutate(followers=as.numeric(follower_count), following = as.numeric(following_count)) %>% distinct() %>% arrange(desc(followers))
spain_following$followers <- as.integer(spain_following$followers)
spain_following$following <- as.integer(spain_following$following)

#mean(spain_following$following)

#regression <- lm(followers ~ following, data=spain_following)
#new <- data.frame(following=193)
#predict(regression, new)

#GRAPH
spain_following_plot <- spain_following %>% ggplot(aes(x=following, y=followers, color=followers)) + geom_point(shape = 16, size = 5, show.legend = FALSE, alpha = .4) + geom_abline(col="red", lty=2, alpha=.4) + geom_smooth(method='lm', lty=2, color="gray")
spain_following_plot + xlim(0, 4200) + ylim(0,4200) +
    ggplot2::theme(axis.title.x = element_text(face = "bold", size=8),
          axis.text.x = element_text(size=8),
          axis.title.y = element_text(face = "bold", size=8),
          axis.text.y = element_text(size=8),
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_text(size=8),
          plot.title = element_text(face = "bold", size=8),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          panel.grid.minor=element_blank(),
          panel.grid.major=element_blank(),
          panel.background = element_blank(), 
          axis.line = element_blank()) +
    scale_color_gradient(low = "#0091ff", high = "#f0650e") +
    #ggplot2::theme_bw() +
    #geom_line(data=spaingraph_table, aes(x=date, y=retweet), colour="orange") +
    ggplot2::labs(
    x = "Following", y = "Followers",
    title = "",
    subtitle = "",
    caption = "Followers vs. Following Data of 216 Spain IO Accounts\nRed Line = Line of Equilibrium\nGray Line = Regression Line (R^2=0.66, p-value < 0.001)\nSource: Twitter transparency report")
```
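
The followers-versus-following comparison can be sketched as a one-predictor regression followed by a prediction at the mean "following" value. The example below uses a small made-up set of accounts (the `accts` values are hypothetical), so the numbers it produces are not those of the Spain IO data.

```r
# Hypothetical accounts: how many accounts each follows and how many follow it
accts <- data.frame(
  following = c(50, 120, 200, 310, 400),
  followers = c(20, 35, 70, 95, 140)
)

# Regress followers on following, then predict for an account at the mean following level
fit            <- lm(followers ~ following, data = accts)
mean_following <- mean(accts$following)
predicted      <- predict(fit, newdata = data.frame(following = mean_following))

# Share of accounts sitting below the parity line (fewer followers than following)
share_below_parity <- mean(accts$followers < accts$following)
```

For an ordinary least squares fit with an intercept, the prediction at the mean of the predictor equals the mean of the response, so `predicted` here is simply the mean follower count of the toy accounts.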

***Although the Spain IO accounts had few followers, they frequently tweeted using popular hashtags, enabling them to reach well beyond their limited follower bases. The hashtags most widely used by the IO accounts were among those most commonly used by PP and its supporters: "PedroSeLoFunde" (Pedro Melts It), "Decretazo Sanchez" (Sanchez Decree), "NiObreroNiEspañol" (Neither Worker Nor Spanish), "StoPSOE" (Stop PSOE), and "Nohablamoshacemos" (We don't talk, we do).***

```{r, echo=FALSE}
#regex <- "#([A-Za-z]+[A-Za-z0-9_]+)(?![A-Za-z0-9_]*\\.)"
require("tidytext")
require("stringr")
data_hash <- spainsna %>% select(user_screen_name, tweet_text) %>% unnest_tokens(tweets, token="regex", tweet_text) %>% filter(str_detect(tweets, "^#")) 
#head(data_hash)
data_hash$tweets <- gsub("\\:","",data_hash$tweets)

data_hash %>% mutate(tweets=as.factor(tweets)) %>% group_by(tweets) %>% count() %>% arrange(desc(n)) %>% head(20) %>% ggplot(aes(x=tweets, y=n, fill=n)) + geom_bar(stat="identity") +
    aes(x=reorder(tweets, n)) +
    theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.text.y=element_blank(),axis.ticks=element_blank()) +
    scale_fill_gradient(low='yellow', high='red', limits=c(0,2500)) +     
    theme(axis.title.x = element_text(face = "bold", size=8),
          #axis.text.x = element_text(face = "bold", size=8, angle=-20),
          axis.text.x = element_text(face = "bold", size=8),
          axis.title.y = element_blank(),
          axis.text.y = element_text(face = "bold", size=8),
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_text(size=8),
          plot.title = element_text(face = "bold", size=8),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          #panel.grid.minor=element_blank(),
          #panel.grid.major=element_blank(),
          panel.background = element_blank(),  
          #axis.line = element_blank(),
          axis.ticks.y=element_blank()) +
    #coord_polar(theta="x", clip = "on") + 
    coord_flip() + 
    #ggplot2::theme_bw() +
    ggplot2::labs(
    x ="Hashtag", y = "Direct Tweets", fill="Tweets",
    title = "",
    subtitle = "",
    caption = "Top 20 hashtags used by 216 Spain IO accounts\nBar Color = Number of Times A Hashtag Was Used (Darker = More Used)\nSource: Twitter transparency report")
```
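
Hashtag counting of this kind can also be sketched in base R with a regular expression, without any tokenizer. The tweets below are made-up strings built from hashtags named in the text, and the pattern `#[[:alnum:]_]+` is an assumption about what counts as a hashtag. Whether accented letters match `[[:alnum:]]` depends on the session's locale and encoding, so the toy data sticks to ASCII.

```r
# Hypothetical tweet texts (ASCII only, for portability)
tweets <- c("Basta ya #StoPSOE #PedroSeLoFunde",
            "RT @populares: #Nohablamoshacemos: hay que votar",
            "#stopsoe otra vez")

# Extract every run of letters, digits or underscores that follows a '#'
hits <- regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets))

# Lower-case so that "#StoPSOE" and "#stopsoe" are counted together
tags   <- tolower(unlist(hits))
counts <- sort(table(tags), decreasing = TRUE)
```

Because the pattern stops at the first character that is not a letter, digit or underscore, a trailing colon such as the one after `#Nohablamoshacemos` is left out of the match.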

***Almost half of the tweets were retweets of other accounts, including direct retweets of posts from the official accounts of PP and its leaders. While these retweets made up a small fraction of the total retweets that these posts received, they suggest that the IO accounts were seeking to further inflate the party's apparent popular support and amplify its narratives. The timing of direct retweets of the 20 statuses most retweeted by IO accounts indicates a high level of coordination and responsiveness.***


```{r}
spainrttime$user_screen_name <- as.factor(spainrttime$user_screen_name)
spainrttime$tweet_time <- as.POSIXct(spainrttime$tweet_time)
spainrttime$is_retweet <- as.factor(spainrttime$is_retweet)
spainrttime$retweet_tweetid <- as.factor(spainrttime$retweet_tweetid)
#spainrttime %>% select(tweet_time, is_retweet, retweet_tweetid) %>% filter(is_retweet=="true") %>% group_by(retweet_tweetid) %>% summarise(n=length((retweet_tweetid))) %>% arrange(desc(n))
spainrttime %>% select(tweet_time, retweet_tweetid) %>% filter(retweet_tweetid %in% c("1116311005343232000", "1106153994446082048", "1109026412239949824", "1109026412239949824", "1114107501459443712", "1106511750994817024", "1108382721842266112", "1108273607355056128", "1114631274457640960", "1105787785557012480", "1116603906094592000", "1114264488994381824", "1116307011505934336", "1118248399596212224", "1108745432656736256", "1119704274982690816", "1103233633039671296", "1115924156888113152", "1116600951236775936", "1117731638123925504", "1102503606379597824")) %>% group_by(retweet_tweetid) %>% arrange(tweet_time) %>% mutate(reach=seq_along(tweet_time)) %>% ggplot(aes(x=tweet_time, y=reach)) + geom_jitter(aes(color=retweet_tweetid)) + geom_path(aes(colour=retweet_tweetid), size=0.5, alpha=0.5) +
    theme(axis.title.x = element_text(face = "bold", size=8),
          #axis.text.x = element_text(face = "bold", size=8, angle=-20),
          axis.text.x = element_text(face = "bold", size=8),
          axis.title.y = element_text(face = "bold", size=8),
          axis.text.y = element_text(face = "bold", size=8),
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_blank(),
          legend.position = "none",
          plot.title = element_blank(),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          #panel.grid.minor=element_blank(),
          #panel.grid.major=element_blank(),
          panel.background = element_blank(),  
          #axis.line = element_blank(),
          axis.ticks.y=element_blank()) +
    #coord_polar(theta="x", clip = "on") + 
    #coord_flip() + 
    #ggplot2::theme_bw() +
    ggplot2::labs(
    x ="Date", y = "Retweets", fill="Tweets",
    title = "",
    subtitle = "",
    caption = "Top 20 directly retweeted statuses by 216 Spain IO accounts\nDifferent Color = Different Status\nOne Dot = One retweet of status\nThe straighter the line, the shorter the time between retweets\nSource: Twitter transparency report")

```
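
One way to put a number on the "coordination and responsiveness" visible in the plot is to measure the gap between consecutive retweets of the same status. The sketch below does this in base R on made-up data (the status IDs "A" and "B" and their timestamps are hypothetical); the `reach` column mirrors the running count used for the y-axis above.

```r
# Hypothetical retweet log: which status was retweeted, and when (UTC)
rt <- data.frame(
  retweet_tweetid = c("A", "A", "A", "B", "B"),
  tweet_time = as.POSIXct(c("2019-04-11 10:00:00", "2019-04-11 10:02:00",
                            "2019-04-11 10:03:00", "2019-04-05 09:00:00",
                            "2019-04-05 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order by status and time so that running counts and gaps are well defined
rt <- rt[order(rt$retweet_tweetid, rt$tweet_time), ]

# Running count of retweets per status (1, 2, 3, ...)
rt$reach <- ave(seq_along(rt$tweet_time), rt$retweet_tweetid, FUN = seq_along)

# Minutes since the previous retweet of the same status (NA for the first one)
rt$gap_min <- ave(as.numeric(rt$tweet_time), rt$retweet_tweetid,
                  FUN = function(t) c(NA, diff(t)) / 60)

# Median gap per status: small values mean the retweets arrived in a tight burst
median_gap <- tapply(rt$gap_min, rt$retweet_tweetid, median, na.rm = TRUE)
```

In this toy data, status "A" is retweeted within minutes while status "B" is retweeted hours apart, which is the kind of contrast a median gap makes easy to compare across statuses.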

***The most retweeted post (47 times) was a post from then-Catalonia PP Deputy Andrea Levy Soler emphasizing a hardline stance against Catalonian separatism efforts:***

<blockquote class="twitter-tweet"><p lang="es" dir="ltr">Todo mi apoyo. Una vez más, como a otros que hemos sufrido episodios así por defender ideas contrarias al totalitarismo nacionalista, alentado y patrocinado por el Gobierno independentista. ¡Basta ya de fascismo en Cataluña! Ni un paso atrás. ¡Seguiremos defendiendo la libertad! <a href="https://t.co/9vohtzHWAt">https://t.co/9vohtzHWAt</a></p>&mdash; Andrea Levy (@ALevySoler) <a href="https://twitter.com/ALevySoler/status/1116311005343232000?ref_src=twsrc%5Etfw">April 11, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

***A post from the official PP Twitter account underscoring its tough position on criminal justice was retweeted 32 times:***

<blockquote class="twitter-tweet"><p lang="es" dir="ltr">Defendemos la <a href="https://twitter.com/hashtag/Prisi%C3%B3nPermanenteRevisable?src=hash&amp;ref_src=twsrc%5Etfw">#PrisiónPermanenteRevisable</a> para que asesinos, violadores, pederastas y pirómanos que causan incendios con víctimas mortales no salgan de la cárcel. <br><br>El PSOE votó en contra. Hay que echar a Pedro Sánchez. <a href="https://t.co/UIVQKXpC1Q">pic.twitter.com/UIVQKXpC1Q</a></p>&mdash; Partido Popular 🇪🇸 (@populares) <a href="https://twitter.com/populares/status/1114107501459443712?ref_src=twsrc%5Etfw">April 5, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

***A post from then-PP Deputy Javier Maroto, highlighting that PP is tough on crime and illegal immigration while pushing back on a Vox party proposal to arm citizens, was retweeted 30 times:***  

<blockquote class="twitter-tweet"><p lang="es" dir="ltr">El PP defiende medidas estrictas contra la delincuencia, las mafias y la inmigración ilegal. Y defendemos a la policia y la guardia civil. Incluso defendemos la presión permanente revisable para delitos muy graves.<br>Pero proponer que los españoles llevemos pistola por la calle, NO <a href="https://t.co/4uXBS23mWJ">pic.twitter.com/4uXBS23mWJ</a></p>&mdash; Javier Maroto (@JavierMaroto) <a href="https://twitter.com/JavierMaroto/status/1108273607355056128?ref_src=twsrc%5Etfw">March 20, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

***A topical analysis of the most frequently mentioned terms provides insight into the content of the messages that IO accounts tweeted. The terms used most often, excluding hashtags and mentions, were almost exclusively election-related. PSOE leader Pedro Sanchez, Partido Popular and PP leader Pablo Casado were the entities most mentioned by IO accounts, while topics related to economic and social justice as well as executive decrees featured prominently.***

```{r, echo=FALSE}
require("tidytext")
require("tm")
require("wordcloud")
########

#Text Cleaning
set.seed(123)
spain_text <- spainsna %>% select(tweet_text)
removeHash <- function(x) gsub("#\\w+ *", "", x)
spain_text$tweet_text <- sapply(spain_text$tweet_text, removeHash)
removeURL <- function(x) gsub("http[[:alnum:][:punct:]]*", "", x)
spain_text$tweet_text <- sapply(spain_text$tweet_text, removeURL)
removeMention <- function(x) gsub("@\\w+ *", "", x)
spain_text$tweet_text <- sapply(spain_text$tweet_text, removeMention)
#head(spain_text)

##Bigrams
spain_bigrams <- spain_text %>% unnest_tokens(bigram, tweet_text, token="ngrams", n=2)
#spain_bigrams %>% count(bigram, sort = TRUE)
require("tidyr")
# Combine tidytext's English stop words with tm's Spanish list
custom_stop_words <- bind_rows(stop_words,
                               tibble::tibble(word = tm::stopwords("spanish"),
                                              lexicon = "custom"))
bigrams_separated <- spain_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")
# Drop any bigram in which either word is a stop word
bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% custom_stop_words$word) %>%
  filter(!word2 %in% custom_stop_words$word)

#head(myStopwords)
# new bigram counts:
bigram_counts <- bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)
#bigram_counts
bigram_counts$bigrams <- do.call(paste, c(bigram_counts[c("word1", "word2")], sep = " "))
bigram_counts %>% select(bigrams, n)
wordcloud(words = bigram_counts$bigrams, freq = bigram_counts$n, min.freq = 100,
          max.words=50, random.order=FALSE, scale=c(3,.5), rot.per=0.25, 
          colors=brewer.pal(4, "Dark2"))

```
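
As a rough illustration of the bigram counting behind the word cloud above, the sketch below does the same kind of filtering in base R on three made-up tweets. The texts and the tiny stop-word list `stop_es` are hypothetical stand-ins and are not drawn from the Twitter archive.

```r
# Hypothetical, already-cleaned tweet texts (not from the archive)
texts <- c("pedro sanchez no cumple",
           "vota a pablo casado",
           "pedro sanchez y pablo casado")
# Tiny hypothetical stop-word list standing in for the full Spanish list
stop_es <- c("a", "y", "no")

# Build adjacent word pairs per tweet, dropping pairs that contain a stop word
bigrams <- unlist(lapply(strsplit(tolower(texts), "\\s+"), function(w) {
  if (length(w) < 2) return(character(0))
  first  <- head(w, -1)
  second <- tail(w, -1)
  keep <- !(first %in% stop_es) & !(second %in% stop_es)
  paste(first, second)[keep]
}))

# Frequency of each remaining bigram, most frequent first
counts <- sort(table(bigrams), decreasing = TRUE)
print(counts)
```

Filtering on both words of the pair, as the analysis does, removes pairs such as "sanchez no" that would otherwise crowd out the named entities.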

```{r, echo=FALSE, eval=FALSE}
require("treemap")
# Treemap of the 50 most frequent bigrams (not evaluated)
bigram_counts %>%
  select(bigrams, n) %>%
  arrange(desc(n)) %>%
  top_n(50, n) %>%
  treemap(index = "bigrams", vSize = "n", palette = "RdYlGn",
          range = c(200, 3000), mapping = c(200, 1000, 3000),
          title = "Most frequent bigrams in Spain IO tweets",
          fontsize.labels = c(7, 5),
          align.labels = list(c("centre", "centre"), c("left", "top")))
```

***The Spain IO tweets very frequently mentioned PP and its leaders. Of the ten most mentioned Twitter accounts, eight were associated with PP and two with PSOE (the party's official account and that of its leader Pedro Sanchez). There were far fewer mentions of other PSOE leaders, other parties, or their leaders. Each line in the graph below represents a mention in a tweet by an IO account of a PP account (red) or a PSOE account (blue).***

```{r, echo=FALSE, eval=TRUE}
require("tidytext")
require("stringr")
require("igraph")
require("ggraph")

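# Split each tweet on whitespace and keep only the tokens that start with "@" (mentions)
data <- spainsna %>%
  select(user_screen_name, tweet_text) %>%
  unnest_tokens(tweets, tweet_text, token = "regex") %>%
  filter(str_detect(tweets, "^@"))
# Strip colons, e.g. the one left behind by the "RT @user:" prefix
data$tweets <- gsub("\\:", "", data$tweets)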
data %>% group_by(tweets) %>% count() %>% arrange(desc(n))
set.seed(3952)
labels <- read.csv("labels.csv")
labels <- labels[,1:2]
data2 <- data %>% left_join(labels)
data2 <- data2 %>% na.omit()

#data2 %>% head(2)
hairball <- graph_from_data_frame(data2)

V(hairball)$node_label <- unname(ifelse(degree(hairball, mode="in")[V(hairball)] > 100, names(V(hairball)), ""))
#V(hairball)$node_size <- unname(ifelse(degree(hairball)[V(hairball)] > 100, degree(hairball), 0)) 

hairball %>% ggraph(layout = 'linear', circular = FALSE) + 
  geom_edge_arc(edge_width=0.0125, aes(colour=party)) +
  #geom_edge_arc(color = "orange", width=0.05) +
  geom_node_point(size=0.2, color="gray50") +
  #geom_node_label(aes(label=node_label, size=0.0000000002), color="black") +
  geom_node_label(aes(label=node_label, size = 0.025), nudge_y=12, repel = TRUE, color="black") +
  #coord_fixed() +
  #scale_size_area(trans="sqrt") +
  ggplot2::theme(axis.title.x = element_blank(),
                 axis.text.x = element_blank(),
                 axis.title.y = element_blank(),
                 axis.text.y = element_blank(),
                 legend.title=element_text(face = "bold", size=2),
                 legend.text = element_blank(),
                 plot.title = element_text(face = "bold", size=2),
                 plot.subtitle = element_text(face = "bold", size=2),
                 plot.caption = element_text(face = "italic", size=8),
                 panel.grid.minor=element_blank(),
                 panel.grid.major=element_blank(),
                 panel.background = element_blank(), 
                 axis.line = element_blank(),
                 axis.ticks = element_blank()) +
  #theme_void()+
  labs(title="", 
       subtitle="", 
       caption="Top 10 most mentioned accounts by Spain IO accounts\nRed Lines = Mentions of PP accounts\nBlue Lines = Mentions of PSOE accounts\nSource: Twitter transparency report") +
  #theme_graph(base_family=font_an) +
  theme(legend.position="none")
```
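
For readers who want to see how mention counts like these can be derived, here is a minimal base R sketch on made-up tweets. The tweet texts are hypothetical, and the `@\w+` pattern is a simplifying assumption rather than the exact tokenization used in the analysis.

```r
# Hypothetical tweets (not from the archive)
tweets <- c("RT @populares: Hay que votar",
            "Gracias @pablocasado_ y @populares",
            "@sanchezcastejon no cumple")

# Pull out every "@handle" token; \w covers letters, digits and underscores
mentions <- unlist(regmatches(tweets, gregexpr("@\\w+", tweets)))

# Count mentions case-insensitively, most mentioned first
mention_counts <- sort(table(tolower(mentions)), decreasing = TRUE)
print(mention_counts)
```

Ranking this table and keeping the first ten rows yields a "top ten most mentioned accounts" list of the kind summarized above.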

***Of the 27,042 total retweets, roughly one in three (8,831) was an IO account retweeting another IO account. Given the content of the tweets, this pattern suggests a coordinated campaign to amplify party narratives. The graph below shows the volume of retweet activity among IO accounts; each line represents one IO account's retweet of another IO account. Given the high number of retweets of other IO accounts and the low follower counts of these accounts, it is likely that many of the followers of IO accounts were themselves IO accounts.***

```{r, echo=FALSE}
require("tidytext")
require("stringr")
require("igraph")
require("haven")
require("readr")
require("ggraph")
require("hrbrthemes")

spainrt2 <- spainrt
spainrt2 <- spainrt2[!(is.na(spainrt2$retweet_userid) | spainrt2$retweet_userid==""), ]
#spainrt2 %>% select(userid, retweet_userid, is_retweet) %>% filter(is_retweet=="true") #8,831 retweets
spainrt2 <- spainrt2 %>% select(userid, retweet_userid)

#length(unique(as.factor(spainrt2$userid)))
#length(unique(as.factor(spainrt2$retweet_userid)))

snart <- graph_from_data_frame(spainrt2, directed = TRUE)
V(snart)$color <- "orange"
#V(sna)[degree(sna, mode="in") > 1000]$color <- "red" #distinguishing high degree
#V(snart)$node_label <- unname(ifelse(degree(snart, mode="in")[V(snart)] > 10, names(V(snart)), ""))
#V(snart)$node_size <- unname(ifelse(degree(snart)[V(sna)] > 10, degree(snart), 0)) 
E(snart)$color <- "gray"
set.seed(3952)
#plot(snart,
#     vertex.size = degree(snart, mode='in')*0.005,
#     vertex.label=NA,
#     edge.arrow.size = 0.001,
#     edge.arrow.mode = "_",
#     layout = layout.kamada.kawai)
     #layout = layout.auto)
     #layout = layout_with_kk)
     #vertex.label.cex= 0.6,
     #vertex.label.degree=-pi/2)


#ARC LAYOUT - VERY PRETTY!
snart %>% ggraph(layout = 'linear') + 
    geom_edge_arc(color = "orange", width=0.05) +
    geom_node_point(size=0.2, color="gray50") +
    ggplot2::theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          legend.title=element_text(face = "bold", size=8),
          legend.text = element_blank(),
          plot.title = element_text(face = "bold", size=8),
          plot.subtitle = element_text(face = "bold", size=8),
          plot.caption = element_text(face = "italic", size=8),
          panel.grid.minor=element_blank(),
          panel.grid.major=element_blank(),
          panel.background = element_blank(), 
          axis.line = element_blank(),
          axis.ticks = element_blank()) +
    #theme_void()+
    labs(title="", 
         subtitle="", 
         caption="Spain IO retweet activity among IO accounts\n126 IO accounts retweeted 111 IO accounts 8,831 times\nDots = IO accounts. Lines = Retweets\nSource: Twitter transparency report") +
  #theme_graph(base_family=font_an) +
  theme(legend.position="none")
```
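
The share of retweets that stay inside the IO network can be checked with a few lines of base R. The sketch below uses a made-up edge list with the same two columns as `spainrt2` (`userid`, `retweet_userid`); the values are hypothetical, and it assumes that every account appearing in `userid` is an IO account.

```r
# Hypothetical retweet edge list: who retweeted whom (not from the archive)
retweets <- data.frame(
  userid         = c("a", "a", "b", "c", "c", "d"),
  retweet_userid = c("b", "x", "c", "y", "a", "z"),
  stringsAsFactors = FALSE
)

# Assumption: the posting accounts are exactly the IO accounts
io_accounts <- unique(retweets$userid)

# A retweet is "within the network" when the retweeted account is also an IO account
within_io <- retweets$retweet_userid %in% io_accounts
share <- mean(within_io)

# Distinct IO retweeters and distinct IO accounts being retweeted
n_retweeters <- length(unique(retweets$userid[within_io]))
n_retweeted  <- length(unique(retweets$retweet_userid[within_io]))
print(c(share = share, retweeters = n_retweeters, retweeted = n_retweeted))
```

Applied to the real data, the same three quantities would correspond to the share of IO-to-IO retweets and the retweeter and retweeted account totals reported in the figure caption.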


```{r, eval=FALSE, echo=FALSE}
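# Most-shared URLs and domains (exploratory; candidate for a bubble graph)
require("tidyr")
# Strip the scheme and "www." prefix, then split on "/" so the first piece is the domain
domain <- function(x) strsplit(gsub("http://|https://|www\\.", "", x), "/")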
spainurl$urls <- gsub("\\[|\\]","", spainurl$urls) #remove the brackets around
spainurl$urls <- gsub("\\'","", spainurl$urls)
spainurl$url_list <- sapply(spainurl$urls, domain)
spainurl$domain <- lapply(spainurl$url_list, function(x) x[1])
spainurl$domain2 <- unlist(spainurl$domain)

#MOST SHARED URL LINKS
spainurl %>% select(urls, domain2) %>% group_by(urls) %>% count() %>% arrange(desc(n))
#MOST SHARED DOMAINS
spainurl %>% select(user_screen_name, domain2, urls) %>% group_by(domain2) %>% count() %>% arrange(desc(n))
spainurl %>% select(user_screen_name, domain2, urls) %>% group_by(urls) %>% count() %>% arrange(desc(n))

```


```{r, echo=FALSE, eval=FALSE}
#Retweets vs. non Retweets
spainrt %>% group_by(as.factor(is_retweet)) %>% count() 
#Who were they retweeting?
spainrt %>% group_by(as.factor(retweet_userid)) %>% filter(is_retweet=="true") %>% count() %>% arrange(desc(n))
#what were they retweeting
spainrt %>% group_by(as.factor(retweet_tweetid)) %>% filter(is_retweet=="true") %>% count() %>% arrange(desc(n))

#1 https://twitter.com/ALevySoler/status/1116311005343232000
#2 https://twitter.com/populares/status/1114107501459443712
#3 https://twitter.com/populares/status/1106511750994817024

#let's get a clean version with userid <-> retweet_userid for social network analysis
spainrt2 <- spainrt
spainrt2 <- spainrt2[!(is.na(spainrt2$retweet_userid) | spainrt2$retweet_userid==""), ]
spainrt2 %>% select(userid, retweet_userid, is_retweet) %>% filter(is_retweet=="true") #8,831 retweets
spainrt2 <- spainrt2 %>% select(userid, retweet_userid)

length(unique(as.factor(spainrt2$userid)))
length(unique(as.factor(spainrt2$retweet_userid)))

#match the userid to the retweet id to see how they were sharing info with one another...

```

```{r echo=FALSE, eval=FALSE}
sna_hash <- graph_from_data_frame(data_hash, directed = TRUE)
#layout1 <- layout.fruchterman.reingold(sna_hash)
V(sna_hash)$color <- "yellow"
V(sna_hash)[degree(sna_hash, mode="in") > 1000]$color <- "red" #distinguishing high degree
E(sna_hash)$color <- "gray"
# Label only the vertices with more than 1,000 incoming edges
V(sna_hash)$node_label <- unname(ifelse(degree(sna_hash, mode="in")[V(sna_hash)] > 1000, names(V(sna_hash)), ""))
set.seed(3952)

plot(sna_hash,
     vertex.size = degree(sna_hash, mode='in')*0.01,
     vertex.label = ifelse(degree(sna_hash) > 1000, V(sna_hash)$node_label, NA),
     edge.arrow.size = 0.001,
     edge.arrow.mode = "_",
     vertex.label.cex= 0.6,
     layout = layout.graphopt)
     #layout = layout.kamada.kawai)
     #layout = layout.auto)
     #layout = layout.grid.3d)
     #layout = layout_with_kk)


require("ggraph")
require("hrbrthemes")

#CIRCULAR LAYOUT! -- NEEDS TWEAKIN
sna %>% ggraph(layout = 'linear', circular = TRUE) + 
  geom_edge_arc(edge_width=0.025, aes(alpha=..index..)) +
  geom_node_label(aes(label=node_label, size=node_size),
                  label.size=0, fill="#ffffff66", segment.colour="springgreen",
                  color="slateblue", repel=TRUE, family=font_an, fontface="bold") +
  coord_fixed() +
  scale_size_area(trans="sqrt") +
  labs(title="Mention Relationships", subtitle="Most mentioned screen names labeled. Darker edges == more mentions. Node size == larger degree") +
  theme_graph(base_family=font_an) +
  theme(legend.position="none")

#ARC LAYOUT - VERY PRETTY!
ggraph(sna_hash, layout = 'linear') + 
    geom_edge_arc(color = "orange", width=0.007) +
    geom_node_point(size=2, color="gray50") +
    theme_void()

```



```{r, echo=FALSE, eval=FALSE}
#user -> mention -> hashtag
#SANKEY DIAGRAM

```


```{r, echo=FALSE, eval=FALSE}
###
spain_text <- spainsna %>% select(tweet_text)
spain_bigrams <- spain_text %>% unnest_tokens(bigram, tweet_text, token="ngrams", n=2)
spain_bigrams %>% count(bigram, sort = TRUE)

require("tidyr")
bigrams_separated <- spain_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")
# Drop any bigram in which either word is a stop word
bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% custom_stop_words$word) %>%
  filter(!word2 %in% custom_stop_words$word)
# new bigram counts:
bigram_counts <- bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)
bigram_counts
```

```{r, echo=FALSE, eval=FALSE}
#SIMPLIFYING ALT
#removing self loops
sna_simple <- simplify(sna, remove.multiple=T, remove.loops=T)
#layout options
layout2 <- layout.fruchterman.reingold(sna_simple)
V(sna_simple)$color <- "yellow"
V(sna_simple)[degree(sna_simple, mode="in") > 100]$color <- "red"
V(sna_simple)$size=degree(sna_simple, mode="in")
E(sna_simple)$color <- "gray"
set.seed(3952)
plot(sna_simple,
     #vertex.size = V(sna)$degree,
     vertex.size = degree(sna_simple, mode='in')*0.1,
     vertex.label=ifelse(degree(sna_simple) > 100, V(sna_simple)$node_label, NA),
     edge.arrow.size = 0.001,
     edge.arrow.mode = "_",
     vertex.label.cex= 0.6,
     vertex.label.degree=-pi/2)

```


```{r, echo=FALSE, eval=FALSE}
#Messin' around...
sna1 <- graph.data.frame(data)
#degree(sna1, mode='in')
V(sna1) #4156 vertices
E(sna1) #68085 edges

plot(sna1,
     vertex.color=rainbow(35),
     vertex.size = degree(sna1), # no "degree" vertex attribute is set, so compute it
     edge.arrow.size=0.1,
     layout=layout.fruchterman.reingold)
plot(sna1,
     vertex.color = rainbow(35),
     vertex.size = degree(sna1, mode='in')*0.4,
     edge.arrow.size=0.1,
     layout=layout.graphopt)

hs1 <- hub_score(sna1)$vector
as1 <- authority_score(sna1)$vector
par(mfrow=c(1,2))
set.seed(123)
plot(sna1, vertex.size=hs1*30,
     main='Hubs',
     vertex.color=rainbow(5),
     edge.arrow.size=0.1,
     vertex.label=NA,
     layout=layout.kamada.kawai)
plot(sna1, vertex.size=as1*30,
     main='Authorities',
     vertex.color=rainbow(5),
     edge.arrow.size=0.1,
     vertex.label=NA,
     layout=layout.kamada.kawai)

spain_net <- graph.data.frame(data, directed=F)
spain_cnet <- cluster_edge_betweenness(spain_net)
plot(spain_cnet, 
     spain_net,
     vertex.size=10,
     vertex.label.cex=0.8)

```

```{r, echo=FALSE, eval=FALSE}
V(sna1)$node_label <- unname(ifelse(degree(sna1)[V(sna1)] > 20, names(V(sna1)), "")) 
V(sna1)$node_size <- unname(ifelse(degree(sna1)[V(sna1)] > 20, degree(sna1), 0)) 
require("ggraph")
require("hrbrthemes")
ggraph(sna1, layout = 'linear', circular = TRUE) + 
  geom_edge_arc(edge_width=0.125, aes(alpha=..index..)) +
  geom_node_label(aes(label=node_label, size=node_size),
                  label.size=0, fill="#ffffff66", segment.colour="springgreen",
                  color="slateblue", repel=TRUE, family=font_rc, fontface="bold") +
  coord_fixed() +
  scale_size_area(trans="sqrt") +
  labs(title="Retweet Relationships", subtitle="Most retweeted screen names labeled. Darker edges == more retweets. Node size == larger degree") +
  theme_graph(base_family=font_rc) +
  theme(legend.position="none")

```

```{r, echo=FALSE, eval=FALSE}
regex <- "@([A-Za-z]+[A-Za-z0-9_]+)(?![A-Za-z0-9_]*\\.)"
data <- spainsna %>% select(user_screen_name, tweet_text) %>% unnest_tokens(tweets, token="regex", tweet_text) %>% filter(str_detect(tweets, "^@")) 
data$tweets <- gsub("\\:","",data$tweets)

data %>% group_by(tweets) %>% count() %>% arrange(desc(n))

sna <- graph.data.frame(data, directed=T)
set.seed(3952)
layout1 <- layout.fruchterman.reingold(sna)
V(sna)$color <- "yellow"
V(sna)[degree(sna, mode="in") > 1000]$color <- "red" #distinguishing high degree
V(sna)$node_label <- unname(ifelse(degree(sna, mode="in")[V(sna)] > 10000, names(V(sna)), ""))
V(sna)$node_size <- unname(ifelse(degree(sna)[V(sna)] > 10000, degree(sna), 0)) 
E(sna)$color <- "gray"
set.seed(3952)
sna %>% ggraph('circlepack', weight='size') + geom_edge_link() + geom_node_point(aes(colour=depth)) + coord_fixed()
##REPLACE WITH GGRAPH 
plot(sna,
     #vertex.size = V(sna)$degree,
     vertex.size = degree(sna, mode='in')*0.005,
     vertex.label=ifelse(degree(sna) > 1000, V(sna)$node_label, NA),
     edge.arrow.size = 0.001,
     edge.arrow.mode = "_",
     vertex.label.cex= 0.6,
     vertex.label.degree=-pi/2,
     layout=layout.kamada.kawai)

#<blockquote class="twitter-tweet"><p lang="es" dir="ltr">💸 Pedro Sánchez sigue gastando el dinero de todos los españoles para su campaña. No son viernes sociales, son viernes electorales. <a href="https://twitter.com/hashtag/DecretazoS%C3%A1nchez?src=hash&amp;ref_src=twsrc%5Etfw">#DecretazoSánchez</a> <a href="https://t.co/NsdtaCKQnW">pic.twitter.com/NsdtaCKQnW</a></p>&mdash; Partido Popular 🇪🇸 (@populares) <a href="https://twitter.com/populares/status/1106511750994817024?ref_src=twsrc%5Etfw">March 15, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

#<blockquote class="twitter-tweet"><p lang="es" dir="ltr">Las pensiones se pagan con empleo. Quien pone en peligro las pensiones son los que crean paro con sus políticas socialistas fallidas. <br><br>La postura que he defendido siempre es clara. <a href="https://twitter.com/hashtag/L6Npablocasado?src=hash&amp;ref_src=twsrc%5Etfw">#L6Npablocasado</a> <a href="https://t.co/xUiy5NaIpJ">https://t.co/xUiy5NaIpJ</a></p>&mdash; Daniel Lacalle (@dlacalle) <a href="https://twitter.com/dlacalle/status/1114631274457640960?ref_src=twsrc%5Etfw">April 6, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
```

```{r, echo=FALSE, eval=FALSE}
accounts <- read.csv("spainaccounts.csv")
# Keep only the self-reported profile descriptions
profile <- accounts %>% select(description = user_profile_description)
profile

```

```{r, echo=FALSE, eval=FALSE}
# The accounts were primarily tweeting and retweeting pro-PP (and anti-PSOE) messages. A random sample of 100 unique tweets (excluding retweets) gives an indication of the content generated by these IO accounts:

require("sampler")
set.seed(123)
spainrt$is_retweet <- as.factor(spainrt$is_retweet)
text <- spainrt %>% select(tweet_text, is_retweet) %>% filter(is_retweet=="false") %>% rsamp(100)
names(text) <- c("Tweet", "Retweet")
text %>% select(Tweet)
```

