On June 15, 2018, Matthew Wright used an armored vehicle to block traffic on the Hoover Dam bypass. He brought an AR-15 and a handgun with him, and he had a single goal: to force the United States government to release a report from the Office of the Inspector General that would, he believed, contain evidence of blatant criminality by FBI agents while they were handling the investigation into Hillary Clinton’s use of a private email server.

The real OIG report had been released the previous day, which was the catalyst for Wright’s action: a vast, nebulous, and complicated conspiracy theory community had convinced itself that there was a “real” OIG report, one that the government was hiding from the American people. Motivated by cryptic posts on the 4chan message boards by a person claiming to have Q-level security clearance, the community had rallied around the belief that branches of the federal government were actively working to stage a coup against Donald Trump; that Special Counsel Robert Mueller was actually working with Donald Trump to arrest, prosecute, and imprison political opponents including Clinton, Barack Obama, and George Soros; and that high-profile liberal politicians, celebrities, and Jewish figures were members of a global child-trafficking ring.

The conspiracy theory spun out of the ashes of the #pizzagate conspiracy theory, and it has amassed a sizeable following since the election of Donald Trump. It has gained several noteworthy adherents, including Roseanne Barr, and believers have appeared at Trump rallies, in photographs of military personnel, and even at the White House.

From its roots on 4chan, the QAnon community branched out across social media platforms. One of its strongest bases of support was on Reddit, where the subreddit r/GreatAwakening organized, recruited, and disseminated propaganda related to the conspiracy. After Reddit banned all associated subreddits, the community largely moved onto Voat and Gab. However, a significant portion of QAnon discussion and organization still takes place on Twitter, which has not, as of yet, taken meaningful action against the conspiracy theory.

While liberal news sites, right-wing watchdogs, and mainstream media have all covered QAnon, little rigorous social network analysis of the conspiracy and its members has been undertaken. Because of its maximalist approach to freedom of speech and reluctance to ban communities, Twitter provides the best opportunity to glean information about how the network is currently structured: who are the message magnifiers? How do adherents arrange themselves around more mainstream figures? Do automated accounts or foreign influencers play major roles in the network?

To answer these questions, this paper analyzes a corpus of roughly 50,000 Tweets posted with the hashtag #QAnon on December 5, 2018, to visualize the conspiracy theory’s community and identify important actors and subgroups. The date was not chosen arbitrarily: #D5, as it was called, was promoted as a crucial day in QAnon circles, with various unsubstantiated claims asserting that Trump’s political enemies would finally be brought to justice. With the community already cracking apart due to growing discontent with the number of Q’s predictions that were definitively proven false, #D5 was seen as the day that the conspiracy theory would finally be validated.

In this paper, I employ an array of social network analysis tools. I first look at the overarching network to identify the main clusters of activity and whether these clusters are hierarchical or more egalitarian. Next, I find the accounts with the highest centralities in order to identify the most influential posters, those who may act as bridges from one Twitter sub-community to another, and those who are the message magnifiers. Finally, I attempt to break down the network even further into sub-groups in order to investigate specific characteristics of smaller groups.

Methodology

The D5-QAnon network was built using a corpus of 52,228 Tweets gathered via rtweet (Kearney 2018), an R wrapper for the Twitter API. Since I am interested in the broader network of interactions and message-sharing, I opted not to base ties on the follower-followed Twitter relationship. Rather, user mentions constitute the edges in the network. If one user mentions another in a Tweet, the first user will have an arc going from their node to the mentioned user’s node.
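The arc construction described above can be sketched in a few lines. This is a minimal Python illustration, not the code used in this study (which relied on R and rtweet). It assumes each Tweet is available as an (author, text) pair and pulls mentions out with a regular expression, whereas the API delivers mentions as a separate field. The toy Tweets and the names `mention_arcs` and `MENTION_RE` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a mention is "@" followed by 1-15 word characters.
MENTION_RE = re.compile(r"@(\w{1,15})")

def mention_arcs(tweets):
    """Return one (author, mentioned) arc per mention, keeping repeats."""
    arcs = []
    for author, text in tweets:
        for mentioned in MENTION_RE.findall(text):
            arcs.append((author, mentioned))
    return arcs

# Hypothetical toy Tweets, not taken from the corpus.
toy_tweets = [
    ("alice", "Big day! @bob @carol #QAnon"),
    ("bob", "@alice agreed #QAnon"),
    ("alice", "@bob see this #QAnon"),
]

arcs = mention_arcs(toy_tweets)
arc_counts = Counter(arcs)  # how many times each arc occurs
```

Here `arcs` keeps every individual mention, so alice's two mentions of bob appear as two arcs; `arc_counts` records that multiplicity.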

In this way, the network captures the flow of information more accurately than the follower/followed relationship. Since some of the actors are bots, which often have follow counts on the extremes–either no one or tens of thousands of users–the follow relationship is inherently limited in visualizing message flows.

With the initial network built from the edgelist, I performed several analyses: component identification to understand where the core network of users is located; centralization calculations to determine the level of hierarchy present; user centrality calculations to locate power users, message magnifiers, and users who span multiple communities; and subgroup visualization to understand how clusters within the main component operate.

Locating Components

An inherent consequence of mining Twitter by search term or hashtag, rather than by a specific set of users, is that the dataset will inevitably catch users who do not belong in the network, or multiple disconnected networks. With a hashtag like #QAnon, which has garnered significant attention outside of the core network of believers, any set of Tweets will likely include journalists and academics studying the community; laypeople who do not understand what the hashtag stands for; and users who specifically push back against those in the community.

From a first glance at the full network visualization, it is easy to see these separate groups of people. They appear as isolates or smaller networks orbiting the large, central web. I simplify this broad network so that multiple mentions of the same person are collapsed to one arc.
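Collapsing repeated mentions into a single arc can be illustrated as follows. This is a hedged Python sketch rather than the routine used in the analysis; in particular, dropping self-mentions is an assumption on my part, and the function name `simplify` and the toy arcs are hypothetical.

```python
def simplify(arcs):
    """Collapse repeated arcs into one and drop self-loops (an assumption),
    preserving the order in which arcs are first seen."""
    seen = set()
    simple = []
    for source, target in arcs:
        if source != target and (source, target) not in seen:
            seen.add((source, target))
            simple.append((source, target))
    return simple

# Hypothetical full network: alice mentions bob twice, carol mentions herself.
full = [("alice", "bob"), ("alice", "bob"), ("bob", "alice"), ("carol", "carol")]
simple = simplify(full)
```

Direction is preserved, so alice mentioning bob and bob mentioning alice remain two distinct arcs.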

Immediately notable, however, is that there are not many separate networks in this dataset. Normally, with other terms, we would expect several disjointed networks that are all talking about a similar topic. With #QAnon, however, the community appears to be particularly dense and dominant on that hashtag–an indication of a fairly insular social network.

Since the dense center is of the most interest in this study, I pick out the largest weakly connected component, that is, the largest set of users linked to one another by mentions in either direction. I base the decomposition on weak rather than strong connectivity to capture highly active or highly influential individuals–those who mention a lot of other users without being mentioned back, or vice versa.
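To make the idea of a weak component concrete, the following Python sketch finds the largest set of nodes that are connected once arc direction is ignored. It is an illustration under that definition, not the code used in this study, and the function name and toy arcs are hypothetical.

```python
from collections import defaultdict, deque

def largest_weak_component(arcs):
    """Return the node set of the largest component, ignoring arc direction."""
    neighbors = defaultdict(set)
    for source, target in arcs:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search over undirected neighbors.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical arcs: a and c both mention b but are never mentioned back.
toy_arcs = [("a", "b"), ("c", "b"), ("d", "e")]
core = largest_weak_component(toy_arcs)
```

In the toy example, a, b, and c form one weak component even though no arc points back to a or c; under strong connectivity each of them would be a component of its own.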

Further analysis will be based on this central, core network to avoid dealing with issues that arise in disconnected networks. This network has an order of 18,807 and an average degree of 14.2. Although in an absolute sense this indicates a relatively sparse network, the fact that, on average, users tweeting with #QAnon are mentioning or being mentioned by over 14 people is unusual. As the median degree is only 2, several users are clearly mentioning at much higher frequencies than most others.
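The gap between the mean and the median degree is what signals a few very active accounts. A small Python sketch on invented data shows the effect; here degree is taken as in-degree plus out-degree, which is my assumption about how the figures above were computed, and all names and arcs are hypothetical.

```python
from collections import Counter
from statistics import mean, median

def total_degrees(arcs):
    """Degree of each node, counting both incoming and outgoing arcs."""
    degree = Counter()
    for source, target in arcs:
        degree[source] += 1
        degree[target] += 1
    return degree

# Hypothetical network: one hub mentions eight users; two of them also interact.
toy_arcs = [("hub", f"user{i}") for i in range(8)] + [("user0", "user1")]
degree = total_degrees(toy_arcs)
values = list(degree.values())

avg_degree = mean(values)       # pulled upward by the hub
median_degree = median(values)  # reflects the typical user
```

A single hub is enough to push the mean to twice the median in this toy case, the same pattern the 14.2 versus 2 comparison shows at scale.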

Centralization and Centrality

Indeed, measures of centralization reveal a fairly hierarchical network.

Table 1: Centralization Measures
Measure        Centralization
In-degree      0.102
Out-degree     0.03
Betweenness    0.001
Eigenvector    0.996

Most notably, the eigenvector centralization is very close to one, meaning that eigenvector centrality is concentrated in a handful of users who are not only heavily connected themselves but also connected to other well-connected users. In-degree centralization is also somewhat high, meaning that some users garner significantly more mentions than the majority of the network.
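For readers unfamiliar with centralization scores, the degree-based version can be written out directly. The sketch below follows Freeman's general definition: sum the gaps between the most central node and every other node, then divide by the largest value that sum could take (reached by a star, where every arc points at one node). I assume a simplified network without repeated arcs or self-loops; the normalization used by the software in this study may differ, and the function name and toy arcs are hypothetical.

```python
from collections import Counter

def in_degree_centralization(arcs):
    """Freeman-style in-degree centralization for a simple directed network."""
    nodes = {node for arc in arcs for node in arc}
    n = len(nodes)
    if n < 2:
        return 0.0
    in_degree = Counter(target for _, target in arcs)
    top = max(in_degree[node] for node in nodes)
    total_gap = sum(top - in_degree[node] for node in nodes)
    # In a star, the hub has in-degree n-1 and the other n-1 nodes have 0,
    # so the largest possible gap sum is (n-1) * (n-1).
    return total_gap / ((n - 1) ** 2)

star = [("a", "hub"), ("b", "hub"), ("c", "hub")]   # everyone mentions one account
cycle = [("a", "b"), ("b", "c"), ("c", "a")]        # mentions spread evenly

star_score = in_degree_centralization(star)
cycle_score = in_degree_centralization(cycle)
```

A score of 1 corresponds to a perfectly hierarchical star and 0 to a perfectly even network, which gives a scale for reading the values in Table 1.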

Taking a look at local measures of centrality, it is possible to tease out who are the dominant tweeters in the network. Certain celebrities and brands like YouTube, Donald Trump, and General Flynn appear at the top of in-degree centrality because of their topicality in this right-wing community. Out-degree centrality, on the other hand, is dominated by users who have relatively low in-degree centrality. These are power users and message magnifiers, those whose existence on Twitter is geared toward spreading a message as widely as possible–often with assistance from automated software.

Table 2: Top 6 Out-Degree Users
name               out_deg
PITA444            572
twiceborn          334
erik_segelstrom    306
heytootssweet      242
PrimeCreator2      233
_Santa_Barbara     197

Table 3: Top 6 In-Degree Users
name               in_deg
realDonaldTrump    1925
POTUS              919
YouTube            838
StormIsUponUs      698
LisaMei62          479
GenFlynn           362

Table 4: Top 6 Betweenness Users
name               bet
heidi_weigand      530276.5
twiceborn          369510.1
PrimeCreator2      368588.4
QAnon_Wolf         361593.6
JustMy_NameHere    343946.7
threadreaderapp    293830.0

Table 5: Top 6 Eigenvector Users
name               eig
realDonaldTrump    1.000
GenFlynn           0.996
FederalistNo78     0.732
AltHutch           0.694
InmateTwitmo       0.685
TWITMO_INMATE      0.679

Sub-groups and K-cores

Centrality tells only one part of the story, however. On Twitter, users arrange themselves into heterogeneous structures based on a variety of factors. We can visualize these internal sub-group structures via k-core analysis, which locates “cores” of varying densities. For these purposes, we will use both the simplified network that collapses multiple mentions of the same person into one arc, and the full network where each individual mention is represented by its own arc. The simplified network reveals a maximally dense core with a variety of different types of actors:

This core includes several prolific #QAnon community members, as well as a few celebrities who unsurprisingly appear frequently in the Tweets of these generally far-right users. Donald Trump and Michael Flynn, for instance, are often the recipients of Tweets–and this specific sub-group seems particularly likely to call out the President and his ex-National Security Advisor in their Tweets.

This sub-group gives insight into the dominant topics spoken about on “#D5” by QAnon believers. Since Flynn shows up, it is reasonable to assume that he was discussed. We can see this clearly by checking the text of the Tweets in this sub-group:

The accounts in this sub-group exhibit a specific style of Tweet indicative of a message-magnifying strategy. They mention sometimes dozens of other users at once, in an attempt to draw attention to their Tweet content. Often this content is nothing more than a link to a chat room, a list of hashtags, or a single sentence. Flynn has, seemingly unwillingly, been drawn into these mention-heavy Tweets as a result of his plea agreement and cooperation with Robert Mueller, as well as actions that the community has adopted as proof of Flynn’s alignment with Q (Anonymous Information YouTube Channel, 2018).

I also utilize Latent Dirichlet Allocation (LDA), a statistical technique that groups documents (in this case, Tweets) into buckets based on the topic discussed. As this is an unsupervised algorithm, the topics assigned do not have qualitative descriptions attached to them. However, manual review demonstrates that the vast majority of Tweets sent in this cluster use either a hashtag-heavy or a mention-heavy style, both of which are presumably intended to maximize social engagement with specific content. Topic 9 (hashtag-heavy) and topic 5 (mention-heavy), as seen in the table below, compose the bulk of Tweets in this sub-group.

Table 6: Count of Topics in 25-core Center
topic    n
1        30
2        14
3        4
4        1
5        81
6        28
7        17
8        12
9        306
10       7
NA       87

While the simplified network allows us to get a sense of the abstracted mention-based structure of a specific group of users, it also has the limitation of ignoring frequency-based characteristics of density. In other words, if there are prolific bot-style accounts mentioning the same several users over and over, the cores in the simplified network will not detect these bot-groups. As a result, I also investigate the full network, where each and every mention is represented by an arc.
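The difference between cores in the simplified and full networks can be seen in a small sketch of the standard "peeling" procedure for k-cores: repeatedly remove the node with the smallest remaining degree and record the largest degree seen at removal time. This Python version is an illustration, not the routine used in this study. It assumes degree counts every arc regardless of direction, so repeated mentions raise a node's degree; the function name and toy arcs are hypothetical.

```python
from collections import Counter, defaultdict

def coreness(arcs):
    """Core number of each node, counting every arc (in or out) toward degree."""
    weight = defaultdict(Counter)
    for source, target in arcs:
        if source != target:          # ignore self-mentions
            weight[source][target] += 1
            weight[target][source] += 1

    degree = {node: sum(nbrs.values()) for node, nbrs in weight.items()}
    remaining = set(degree)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nb, w in weight[node].items():
            if nb in remaining:
                degree[nb] -= w
    return core

# Hypothetical triangle a-b-c with one extra account d mentioning a.
simple_arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
# Same structure, but d mentions a five times (full network).
full_arcs = [("a", "b"), ("b", "c"), ("c", "a")] + [("d", "a")] * 5

simple_core = coreness(simple_arcs)
full_core = coreness(full_arcs)
```

In the simplified toy network the triangle is the densest core and d sits on the periphery; once repeated mentions are counted, d and a form the densest core instead. This is the kind of frequency-driven structure the full network is meant to surface.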

In the highest-density core in the full network, which is a 925-core, the users do not interact with celebrities–they interact with each other. Several have out-degree measures so high that they are likely not “normal” users. @PrimeCreator2 is a good example of one of these anomalous users. In the Tweets collected in the dataset, @PrimeCreator2 made roughly 46,000 mentions of other users, which requires an extraordinary frequency of Tweeting. In fact, according to analysis run with the tweetbotornot package (Kearney 2018), the user is roughly 90% likely to use automated bot-like techniques. The flower-like appearance of the sub-group network and the connections between the “petals” show that this sub-group comprises @PrimeCreator2’s primary engagement community.

Manual review of @PrimeCreator2 shows that the account is probably not entirely fake, and there may be an actual person in charge of the account. However, his relatively high number of followers and followed accounts, combined with the sustained and high rate of tweeting, indicates that this account operates primarily as a message disseminator–a user whose purpose is to garner as much engagement as possible and spread his message as far as possible.

Local measures of centrality help quantify his role as a message magnifier. In the simplified network, his out-degree of 233 indicates he is engaging with hundreds of accounts per day; his in-degree of 12 indicates a moderate (although disproportionate) level of response from his community. However, his betweenness score is the most significant indicator of his role as an information broker and spreader. At 368,588, his betweenness score is the third-highest in the entire network. Betweenness sums, over every pair of other users, the fraction of shortest paths between them that pass through a given account, so a score this high demonstrates that he occupies a special position on a very large share of the shortest paths in the network.
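To show what a betweenness score counts, the sketch below implements Brandes' algorithm for an unweighted directed network in plain Python. It is an illustration on invented data, not the routine used to produce the scores reported here; the function name and the toy arcs are hypothetical, and scores are left unnormalized.

```python
from collections import defaultdict, deque

def betweenness(arcs):
    """Unnormalized betweenness for a directed, unweighted network (Brandes)."""
    successors = defaultdict(set)
    nodes = set()
    for source, target in arcs:
        nodes.update((source, target))
        if source != target:
            successors[source].add(target)

    score = dict.fromkeys(nodes, 0.0)
    for start in nodes:
        # Breadth-first search counting shortest paths (sigma) from start.
        order = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[start] = 1
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != start:
                score[w] += delta[w]
    return score

# Hypothetical broker: every path from {a, b} to {c, d} runs through m.
broker = betweenness([("a", "m"), ("b", "m"), ("m", "c"), ("m", "d")])
# Hypothetical diamond: two equally short routes from s to t.
diamond = betweenness([("s", "x"), ("s", "y"), ("x", "t"), ("y", "t")])
```

In the broker example, m scores 4 because it lies on the only shortest path for each of four pairs. In the diamond, x and y each score 0.5 because they share the single pair (s, t) between them.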

In fact, four of the 10 members in this sub-group exhibit high betweenness traits and high out-degree. Even though the other three only tweeted a few times, their participation in @PrimeCreator2’s enormous Twitter thread with dozens of mentions granted them a high betweenness score–which, in the context of Twitter, is important. It means that they participated as information brokers. The top four users also have extremely high three-step reach centrality, with @PrimeCreator2 able to reach 72 percent of the entire network within three steps.
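Three-step reach can be sketched as the share of other users that lie within three hops of a given account. The Python sketch below ignores arc direction, which is an assumption on my part (accounts with an out-degree of zero could not otherwise have a non-zero reach), and divides by the number of other nodes. The function name and the toy chain are hypothetical.

```python
from collections import defaultdict

def reach_within(arcs, start, steps=3):
    """Fraction of other nodes within `steps` hops of `start`, ignoring direction."""
    neighbors = defaultdict(set)
    for source, target in arcs:
        neighbors[source].add(target)
        neighbors[target].add(source)
    if start not in neighbors or len(neighbors) < 2:
        return 0.0

    reached = {start}
    frontier = {start}
    for _ in range(steps):
        # Expand one hop outward, keeping only newly discovered nodes.
        frontier = {nb for node in frontier for nb in neighbors[node]} - reached
        reached |= frontier
    return (len(reached) - 1) / (len(neighbors) - 1)

# Hypothetical chain of six accounts: a - b - c - d - e - f.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
reach_end = reach_within(chain, "a")     # an account at the end of the chain
reach_middle = reach_within(chain, "c")  # an account near the middle
```

An account at the end of the chain reaches three of the five others in three steps, while one near the middle reaches all five, which is why well-placed accounts can cover most of a network in very few hops.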

Table 7: Betweenness Scores of 925-Core Center
name              in_deg  out_deg  bet        eig  close_in  close_out
PrimeCreator2     12      233      368588.37  0    0         0
kerrijacobi       8       60       18193.52   0    0         0
JeremyRobards7    10      50       17453.79   0    0         0
mona_cajun        8       50       87.12      0    0         0
staggerlee422     11      0        0.00       0    0         0
2newearth777      11      0        0.00       0    0         0
dtrastikeville    11      0        0.00       0    0         0
FlaRhps           12      0        0.00       0    0         0
russ_thor         11      0        0.00       0    0         0
PCreator714       11      0        0.00       0    0         0

Table 8: 3-Step Reach Scores of 925-core Center
name              reach3
PrimeCreator2     0.72
kerrijacobi       0.64
JeremyRobards7    0.41
mona_cajun        0.39
FlaRhps           0.35
2newearth777      0.35
dtrastikeville    0.35
PCreator714       0.35
russ_thor         0.35
staggerlee422     0.35

Recommendations and Conclusions

The #QAnon network is composed of thousands of actors who often use a unique mix of rhetorical devices, in-group language, and Twitter-specific engagement strategies. The conspiracy community has developed such a large array of acronyms and language that the Tweets present in this dataset can be nearly incomprehensible to the broader public. Many of the most prolific users also include dozens of hashtags and mentions in their Tweets, further obfuscating the content itself.

But what this network demonstrates is that the unusual mentioning strategies and the countless hashtags are content, because they are strategically deployed to maximize engagement and exposure. Using k-core analysis coupled with LDA topic modeling, I find that the densest cores in both the simplified and full networks are dominated by users leveraging these engagement strategies to inflate the reach of their messages, as demonstrated most definitively by user @PrimeCreator2, a power user who has tweeted over 150,000 times over the course of the existence of his account, which was created in March 2018.

Global measures of centralization also confirm that this network is structured hierarchically around several high-frequency, strategic posters. Out-degree in the full network is extremely skewed, and a check of out-degree centrality shows that only a handful of accounts are responsible for most of the mentions posted. As shown in the table below, only the top quartile of users have meaningful out-degree measures, while the histogram below demonstrates that the vast majority of users are clustered near 0.

Table 9: Quantiles of Out-Degree Centrality
Quantile    Out-degree
0%          0
25%         0
50%         0
75%         2
100%        46040

The dominance of this network by a few message-magnifiers has significant implications for security considerations. As #QAnon has already proven dangerous enough to incite violence, it is important to understand how its online radicalization and communications network is structured and might be mitigated. From this analysis, it is likely that the removal of several of the most high-profile and prolific users would dramatically reduce the information flow throughout the network. There is a significant chance that it would even fracture the network.
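The fracturing claim could be checked with a simple removal experiment: delete the k highest-degree accounts and measure how large the biggest remaining component is. The Python sketch below illustrates the idea on invented data and is not an analysis performed in this paper. Ranking by total degree is my assumption (betweenness would be another reasonable choice), and all names and arcs are hypothetical.

```python
from collections import Counter, defaultdict

def largest_component_size(arcs, nodes):
    """Size of the largest component among `nodes`, ignoring arc direction."""
    neighbors = defaultdict(set)
    for source, target in arcs:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbors.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def remove_top_accounts(arcs, k):
    """Drop the k highest-degree accounts; return them and the new largest component size."""
    degree = Counter()
    for source, target in arcs:
        degree[source] += 1
        degree[target] += 1
    removed = {name for name, _ in degree.most_common(k)}
    kept_arcs = [(s, t) for s, t in arcs if s not in removed and t not in removed]
    kept_nodes = set(degree) - removed
    return removed, largest_component_size(kept_arcs, kept_nodes)

# Hypothetical network held together by one prolific account.
toy_arcs = [("hub", f"user{i}") for i in range(6)] + [("user0", "user1")]
before = largest_component_size(toy_arcs, {n for arc in toy_arcs for n in arc})
removed, after = remove_top_accounts(toy_arcs, k=1)
```

In the toy case, removing a single hub shrinks the largest component from seven accounts to two. Running the same experiment on the real network would show whether it is comparably fragile.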

In the future, more analysis should be done on some of the peripheral cores and sub-groups outside of the dense center. How do the less prolific users interact with the #QAnon network? Are they passive observers, or do they have more “human”, less strategic methods of engagement? Further, there is significant opportunity to tease apart how the content of Tweets relates to the structure of the network. Statistical models like ERGMs might be deployed alongside topic modelling or other text analysis to better understand who creates ties, who forms sub-communities, and whether language or subject matter governs the creation of those ties.