U.S. Senators’ Twitter Social Network

Introduction

The Twitter Interaction Network for the 117th United States Congress captures the social media interactions among members of both the House of Representatives and the Senate. It was assembled using the API of Twitter (now known as “X”), which made it possible to collect data on how often each member interacted with other members’ tweets. Interactions include retweets, quote tweets, replies, and mentions, giving a detailed view of communication dynamics within the U.S. Congress on Twitter. We obtained the dataset from the SNAP Datasets collection (https://snap.stanford.edu/data/congress-twitter.html).

This directed network consists of 475 nodes, each representing a member of Congress, and 13,289 edges, which represent the directed interactions between these members. The edges in the dataset are characterized by empirical transmission probabilities, quantifying the frequency of interactions between specific members. While the dataset does not include node features, it does offer detailed edge features that enhance the analysis of interaction patterns. This dataset serves as a valuable resource for examining the social media behavior of US Congress members, their communication strategies, and the potential influence of these interactions on political discourse or alliances.

The main objective of this project is to analyze the Congressional interaction network to understand the dynamics and alliances among its members. Analyzing retweets, replies, and mentions lets us identify the groups and individuals with the most influence in Congress. The project also sheds light on how information flows through Congress on social media. We use community detection and other graph methods to find patterns in who communicates frequently with whom, which can indicate political alliances and common interests.

Package Requirements

Required Packages

The following packages are required to run the code without errors.

# Libraries 
library(igraph) # creating and manipulating graph data structures.
library(dplyr) # for data manipulation and transformation
library(visNetwork) # used for interactive network visualization
library(DT) # used for displaying interactive tables

Setup Code

The provided zip file contained two key datasets: an edge list and a JSON file. The JSON file contained the weights of the connections and the username list. My goal was to focus solely on senators, so I retrieved a list of all senators’ Twitter handles, which included their party, state, and name.

Edgelist

# Prepare network data
edges <- read.table("congress.edgelist", sep = " ", header = FALSE)
colnames(edges) <- c("from", "to", "extra", "weight")
edges <- edges[, !(colnames(edges) %in% c("extra"))]
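
To illustrate why a throwaway column appears after reading the file, here is a minimal sketch. It assumes each line of congress.edgelist has the form `0 4 {'weight': 0.002105}` (source id, target id, then a small dictionary holding the weight); the two sample lines and the `toy_edges` name are made up for illustration.

```r
# Two made-up lines in the assumed edge-list format
sample_lines <- c("0 4 {'weight': 0.002105}",
                  "4 0 {'weight': 0.010526}")

# Splitting on single spaces yields four fields per line:
# from, to, the "{'weight':" label, and the value with a trailing brace.
# quote = "" keeps the single quotes as literal characters.
toy_edges <- read.table(text = sample_lines, sep = " ", header = FALSE,
                        quote = "", stringsAsFactors = FALSE)
colnames(toy_edges) <- c("from", "to", "extra", "weight")

# Drop the label column, strip the closing brace, and convert to numeric
toy_edges$extra <- NULL
toy_edges$weight <- as.numeric(sub("}", "", toy_edges$weight, fixed = TRUE))

str(toy_edges)
```

Converting the weight to numeric at this stage avoids carrying a character column into later graph functions.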

Merging Datasets

I merged this additional information with the original datasets to create a node-level CSV file that contains the following attributes: username, name, party affiliation, and state. Furthermore, I filtered the edge list only to include connections between senators, ensuring that the resulting data was focused on the senators.

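# Strip the trailing brace left over from parsing, then convert the weights to numeric
edges$weight <- as.numeric(gsub("}", "", edges$weight, fixed = TRUE))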
graph_edges <- graph_from_data_frame(d = edges, directed = TRUE)
nodes <- read.csv("Usernamelist.csv", sep = ",", header = TRUE)

file_with_affiliation <- "SenatorsData.csv"
file_without_affiliation <- "Usernamelist.csv"

# Read the CSV files
df_affiliation <- read.csv(file_with_affiliation)
df_no_affiliation <- read.csv(file_without_affiliation)

merged_df <- inner_join(df_no_affiliation, df_affiliation, by = "username")

## Filter my data and make sure nodes and edges match 
edge_vertices <- unique(c(edges$from, edges$to))
vertex_names <- merged_df$from
missing_vertices <- setdiff(edge_vertices, vertex_names)
filtered_edges <- edges %>%
  filter(from %in% vertex_names & to %in% vertex_names)

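# Degree per senator, computed on the senator-only edges so the values line up with merged_df
senator_graph <- graph_from_data_frame(d = filtered_edges, vertices = merged_df, directed = TRUE)
congress_indeg <- degree(senator_graph, mode = "in")
congress_outdeg <- degree(senator_graph, mode = "out")
dat <- data.frame(congress_indeg, congress_outdeg)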
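
The merge-and-filter step above can be sketched on toy data. The tables below are hypothetical stand-ins for Usernamelist.csv, SenatorsData.csv, and the edge list; the column names (`from`, `username`, `Name`, `Party`, `State`) are assumed from how they are used elsewhere in this report.

```r
library(dplyr)

# Hypothetical id-to-username table covering all members of Congress
toy_users <- data.frame(from = 0:3,
                        username = c("SenA", "RepB", "SenC", "RepD"))

# Hypothetical senator attribute table
toy_senators <- data.frame(username = c("SenA", "SenC"),
                           Name = c("Senator A", "Senator C"),
                           Party = c("D", "R"),
                           State = c("XX", "YY"))

# Hypothetical weighted edge list over all members
toy_edges <- data.frame(from = c(0, 0, 1, 2),
                        to = c(2, 1, 2, 0),
                        weight = c(0.10, 0.05, 0.20, 0.15))

# inner_join keeps only usernames present in both tables, i.e. the senators
toy_merged <- inner_join(toy_users, toy_senators, by = "username")

# Keep an edge only when both of its endpoints are senators
senator_ids <- toy_merged$from
toy_filtered <- toy_edges %>%
  filter(from %in% senator_ids & to %in% senator_ids)

toy_filtered
```

Only the two edges between the toy senators (ids 0 and 2) survive the filter; edges touching a non-senator are dropped.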

Clean Datasets

Edgelist

Node Attributes

Visualizations

Colleague_Graph <- graph_from_data_frame(d=filtered_edges, vertices=merged_df, directed=FALSE)
V(Colleague_Graph)$color <- ifelse(V(Colleague_Graph)$Party == "D", 
                                   "blue", 
                                   ifelse(V(Colleague_Graph)$Party == "R", 
                                          "red",
                                          "grey"))


# Convert igraph graph to a data frame suitable for visNetwork
nodes <- data.frame(id = V(Colleague_Graph)$name, 
                    color = V(Colleague_Graph)$color, 
                    label = V(Colleague_Graph)$Name, 
                    group = V(Colleague_Graph)$Party, 
                    title = V(Colleague_Graph)$name,
                    state =  V(Colleague_Graph)$State
 )


# Create the interactive plot using visNetwork (senator-only edges)
visNetwork(nodes, filtered_edges) %>%
  visIgraphLayout(type = "full") %>%
  visNodes(shadow = list(enabled = TRUE, size = 40), font = list(size = 20)) %>%
  visEdges(shadow = FALSE, arrows = list(to = list(enabled = TRUE, scaleFactor = 0.5))) %>%
  visOptions(selectedBy = "state", highlightNearest = list(enabled = TRUE, degree = .5), width = "140%", nodesIdSelection = TRUE) %>%
  visLayout(randomSeed = 42, improvedLayout = TRUE) %>%
  visLegend(useGroups = FALSE, addNodes = data.frame(label = c("D", "R", "I"), shape = "dot", color = c("blue", "red", "grey"), size = 10))

This network plot shows a total of 1,563 connections among 92 U.S. Senators: 43 Republicans, 47 Democrats, and 2 Independents. The graph visualizes how these senators connect with each other through Twitter and is color coded by their party affiliation. An interactive plot is used because it allows any senator to be singled out to observe their connections and assess their activity level. Clicking on a node (representing a senator) shows who they are connected to, making the plot a valuable tool for analyzing social interactions and engagement patterns among U.S. Senators on Twitter.

The senators with the greatest number of connections and the highest level of interaction are located in the middle of the network layout. These senators interact more frequently with members of the opposing party. In contrast, the senators situated at the edges of the network generally engage with members of their own party. This highlights the role that some senators play in fostering understanding between opposing political groups.
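
One way to put a number on cross-party engagement is to compute, for each member, the share of outgoing edges that reach the other party. The sketch below is not part of the original analysis; it uses a made-up four-node graph with a `Party` vertex attribute, and all names in it are hypothetical.

```r
library(igraph)

# Hypothetical members and directed interactions
toy_nodes <- data.frame(name = c("A", "B", "C", "D"),
                        Party = c("D", "D", "R", "R"))
toy_edges <- data.frame(from = c("A", "A", "B", "C", "D"),
                        to = c("B", "C", "A", "D", "C"))
g <- graph_from_data_frame(toy_edges, vertices = toy_nodes, directed = TRUE)

# ends() returns a two-column matrix of endpoint names, one row per edge
endpoints <- ends(g, E(g))

# Look up each endpoint's party and flag edges that cross party lines
party <- setNames(V(g)$Party, V(g)$name)
cross <- party[endpoints[, 1]] != party[endpoints[, 2]]

# Share of each member's outgoing edges that cross party lines
cross_share <- tapply(cross, endpoints[, 1], mean)
cross_share
```

In this toy graph only A reaches across the aisle (one of its two outgoing edges), so A has a share of 0.5 and everyone else has 0. Applied to the senator graph, high shares would be expected for the centrally placed senators described above.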

In a directed network such as the Twitter interaction network of the 117th United States Congress, in-degree and out-degree centrality are key to understanding communication dynamics and influence. In-degree centrality counts how many other members mentioned, replied to, or retweeted a given member of Congress, indicating that member’s prominence or influence within the network. High in-degree centrality points to key figures whose tweets gained significant attention. In contrast, out-degree centrality counts how many other members a given member initiated interactions with, highlighting active participants who frequently engage with others. These active members can drive conversations and spread information within the network.
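
As a small illustration of these two measures, the sketch below builds a made-up three-node directed graph and computes both degrees with igraph. The edge `a -> b` is taken to mean that `a` mentioned, replied to, or retweeted `b`; the node names and weights are hypothetical.

```r
library(igraph)

# Hypothetical directed, weighted interactions
toy <- data.frame(from = c("A", "B", "C", "A"),
                  to = c("B", "C", "B", "C"),
                  weight = c(0.2, 0.1, 0.4, 0.3))
g <- graph_from_data_frame(toy, directed = TRUE)

# Count incoming and outgoing edges per node
in_deg <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")

# strength() sums edge weights instead of counting edges
in_str <- strength(g, mode = "in", weights = E(g)$weight)

data.frame(in_deg, out_deg, in_str)
```

Here B and C each receive two edges while A receives none, and A is the most active sender with two outgoing edges. Weighted strength separates B from C even though their in-degrees are equal.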

plot(Colleague_Graph, 
     edge.arrow.size = 0.1, 
     vertex.label = V(Colleague_Graph)$Name, # Add names as labels on nodes
     vertex.color = V(Colleague_Graph)$color,
     vertex.label.cex = 0.8, # Increase label size
     vertex.size = congress_indeg / 2, # Adjust vertex size as needed
     rescale = TRUE, 
     asp = 0,
     vertex.label.font = 2,
     vertex.label.color = "black"
)
legend("topright", legend=c("Democrat", "Republican", "Independent"),
       col=c("blue", "red", "grey"), pch=19, pt.cex=2)

In-Degree Plot

knitr::include_graphics("colleague_graph.png")

plot(Colleague_Graph, 
     edge.arrow.size = 0.1, 
     vertex.label = V(Colleague_Graph)$Name, # Add names as labels on nodes
     vertex.color = V(Colleague_Graph)$color,
     vertex.label.cex = 0.8, # Increase label size
     vertex.size = congress_outdeg / 2, # Adjust vertex size as needed
     rescale = TRUE, 
     asp = 0,
     vertex.label.font = 2,
     vertex.label.color = "black"
)
legend("topright", legend=c("Democrat", "Republican", "Independent"),
       col=c("blue", "red", "grey"), pch=19, pt.cex=2)

Out-Degree Plot

The in-degree plot shows fewer nodes with significant sizes, indicating that only a few members receive a high volume of interactions and thus act as influential hubs. These influential nodes are less densely packed, suggesting their influence is distributed across the network. In the out-degree plot, the most prominent senators are Tom Cotton and Joni Ernst, both Republicans.

In contrast, the out-degree plot reveals more nodes with larger sizes and a more compact distribution, indicating a greater number of members actively engaging within close-knit clusters. The nodes representing Democrats are also generally larger, indicating they have more outgoing connections. This suggests that Democratic senators interact more frequently with others, possibly by engaging in more debates or broader discussions.

gv <- cluster_edge_betweenness(Colleague_Graph, modularity = TRUE)
set.seed(5)
plot(gv, Colleague_Graph, edge.arrow.size=.2, vertex.size = 10 ,vertex.label = V(Colleague_Graph)$Name , edge.color= "gray")

Edge Betweenness Community Detection

Edge betweenness community detection divides the network mostly along political affiliations, grouping Democrats with Democrats and Republicans with Republicans. The method identifies clusters of senators who are more closely connected to each other, which can reveal political alliances and frequent interactions. The analysis also surfaces a few outliers, senators who make little use of Twitter (X) to debate with other members. These include Rand Paul, John Kennedy, Josh Hawley, and Tommy Tuberville. The network plot built from the edge list, together with the in-degree and out-degree plots, shows that these outliers were less interactive on Twitter than other senators.
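
To show how edge betweenness separates groups, here is a minimal sketch on a made-up graph: two triangles joined by a single bridge. The bridge carries every shortest path between the two sides, so it has the highest edge betweenness and is removed first. The graph and names are hypothetical, not taken from the Senate data.

```r
library(igraph)

# Two triangles (A-B-C and D-E-F) connected by one bridge edge C-D
toy_graph <- graph_from_literal(A-B, B-C, A-C, D-E, E-F, D-F, C-D)

# Repeatedly remove the edge with the highest betweenness and keep
# the split with the best modularity
toy_comm <- cluster_edge_betweenness(toy_graph)

# Community label per node
toy_membership <- membership(toy_comm)
toy_membership

# Edges whose endpoints fall in different communities
n_bridges <- sum(crossing(toy_comm, toy_graph))
n_bridges
```

The two triangles come out as separate communities with exactly one crossing edge. In the Senate network, party blocs play the role of the triangles and cross-party interactions play the role of the bridge.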

Conclusion

This analysis of the U.S. Senators’ Twitter network revealed information about the political dynamics and communication styles within the Senate. Republicans and Democrats formed clearly defined groups, and edge betweenness community detection identified clusters based mostly on party affiliation. The most interactive senators in the network are those who interact with members both inside and outside their own party. These senators are important because they sustain communication and debate between the parties. We found that Democrats tend to have more outgoing connections, which suggests that Democratic senators interact more frequently with others and engage in more discussions. Overall, the Twitter network reflects real-world political alliances and a dominant two-party system, with slight subdivisions within each party.

This analysis has certain limitations. It ignores the context of the interactions, which could reveal more about the nature of these communications. For example, the data does not record the specific topics that were tweeted about or whether a given interaction was a retweet, reply, or mention. To go deeper, we could examine the impact of specific events or legislative sessions on the interaction patterns.