The main purpose of this study is to analyze the network of House members of the US Congress. Our dataset comes from the 2019 Report Cards, which summarize the legislative records of Members of Congress during the 2019 legislative year. Each Member is assigned a score according to their legislative behavior, based on how similar the pattern of bills and resolutions they cosponsor is to that of other Members of Congress. The main objective of this paper is to analyze and visualize the network structure and to build mathematical and statistical models of the members of Congress and their connections, or links, based on the score and on cosponsorship within current parties.
The full dataset contains 432 nodes, each representing a House member. We used betweenness to subset the data and then analyzed the network statistics. For this analysis we used the igraph package for network analysis, ggplot2 for the graphical representation, and the ggrepel package to avoid overlapping text and give clear labels. We also used the expss package to create HTML tables to understand the data.
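library(igraph)   # network analysis
library(ggplot2)  # plots
library(ggrepel)  # non-overlapping text labels
library(expss)    # cro() cross-tabulation and htmlTable()
# load data
setwd("C:/Users/dkkan/Desktop/Harrisburg/512 Data Viz/GIS/netwrkanls")
net <- read_graph("cnet", format = "GraphML")
# HTML table with the full dataset
htmlTable(cro(V(net)$chamber, V(net)$party))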
| V(net)$chamber | V(net)$party: Democratic | V(net)$party: Republican |
|---|---|---|
| House | 236 | 196 |
| #Total cases | 236 | 196 |
# get vertex and edge data frames from the graph
vertex <- as.data.frame(vertex_attr(net))
edge <- igraph::as_data_frame(net, what = "edges")
# Subset data: keep members whose betweenness exceeds 400
btwn <- betweenness(net) > 400
select <- names(btwn)[btwn]
# keep only edges whose two endpoints are both in the selection
edge1 <- edge[edge$from %in% select & edge$to %in% select, ]
vertex1 <- vertex[vertex$name %in% select, ]
net2 <- graph_from_data_frame(d = edge1, vertices = vertex1, directed = FALSE)
#Html table after subsetting the data
htmlTable(cro(V(net2)$chamber, V(net2)$party))
| V(net2)$chamber | V(net2)$party: Democratic | V(net2)$party: Republican |
|---|---|---|
| House | 44 | 9 |
| #Total cases | 44 | 9 |
After loading the data, we created an HTML table of the full dataset, which contains 236 Democratic and 196 Republican House members. We then created a smaller dataset (net2) by keeping only the members whose betweenness exceeds 400. An HTML table of the new data shows 44 Democratic and 9 Republican House members.
Before making more network graphs, we examined the network statistics using the closeness and degree of the nodes. We created two ggplot visualizations of the network: first, a bar plot of closeness among the House members, and second, a scatterplot of node degree against closeness centrality.
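To illustrate the betweenness-based subsetting, here is a minimal sketch on a small made-up graph. The toy graph, the threshold of 5, and the use of `induced_subgraph()` in place of rebuilding the graph from data frames are assumptions chosen for illustration; they are not part of the analysis above.

```r
library(igraph)

# Hypothetical toy graph: two triangles (A-B-C and E-F-G) joined through node D
toy <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-G, G-E)

# Betweenness: number of shortest paths between other node pairs passing through a node
btw <- betweenness(toy)

# Keep only the nodes above an illustrative threshold (5 is arbitrary here)
keep <- names(btw)[btw > 5]
sub <- induced_subgraph(toy, keep)

print(sort(btw, decreasing = TRUE))  # C, D and E sit on the paths between the triangles
print(V(sub)$name)                   # the retained nodes
print(ecount(sub))                   # edges whose two endpoints were both retained
```

As in the analysis, an edge survives only if both of its endpoints pass the threshold, so the subset is usually much sparser than the original graph.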
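# extract separate stats and variables from the network object
part1 <- closeness(net2)             # closeness centrality
part2 <- V(net2)$party               # party affiliation
part3 <- degree(net2, mode = "all")  # node degree
part4 <- V(net2)$name                # member name
eigen_centr <- centr_eigen(net2)     # eigenvector centralization (not used below)
# data.frame() keeps the numeric columns numeric, unlike cbind()
df <- data.frame(part1, part2, part3, part4, stringsAsFactors = FALSE)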
ggplot(df, aes(reorder(part4, part1), part1))+
geom_col(fill="cornflowerblue",alpha=.5)+ # one bar per member, no summary function needed
theme_minimal()+
theme(axis.text.x = element_text(angle=70))+
ggtitle("Closeness Statistics for Network Members")+
theme(plot.title = element_text(size = 20, face = "bold"))+
xlab("Members")+
ylab("Closeness")
ggplot(df, aes(part3, part1, label = part2)) +
geom_point(aes(color = part2), size = 4) + # color points by party
scale_color_manual(values = c(Democratic = "blue", Republican = "red"), name = "Party") +
geom_text_repel()+
theme_classic(base_size = 16)+
xlab("Node Degree")+
ggtitle("Degree vs Closeness Centrality")+
theme(plot.title = element_text(size = 20, face = "bold"))+
ylab("Closeness Centrality")
Node degree and closeness centrality appear to be positively correlated. The plot shows which nodes are more or less central in the network: the more links a node has, the more central it is, and the node with the shortest paths to the other nodes has the highest closeness centrality. We can also see that a few Democratic members are not closely connected to the rest of the network.
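The relationship between degree and closeness can be checked by hand on a small example. The sketch below uses a made-up five-node path (an assumption for illustration, not the House data) and igraph's default closeness, which is the inverse of the sum of shortest-path distances to all other nodes.

```r
library(igraph)

# Hypothetical toy graph: a simple path A - B - C - D - E
toy <- graph_from_literal(A-B, B-C, C-D, D-E)

deg <- degree(toy, mode = "all")  # number of links per node
cl <- closeness(toy)              # 1 / (sum of distances to all other nodes)

print(data.frame(degree = deg, closeness = round(cl, 3)))

# B, C and D all have degree 2, but C is closest to everyone else:
# its distances are 2 + 1 + 1 + 2 = 6, so its closeness is 1/6
print(names(which.max(cl)))

# Degree and closeness move together here, although they are not identical
print(cor(deg, cl) > 0)
```

The example shows why the two measures are correlated but not interchangeable: nodes with the same degree can still differ in how quickly they reach the rest of the network.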
set.seed(1234)
plot(net2,
vertex.color = "violet", # change color of nodes
vertex.label = NA, # hide node labels
vertex.label.color = "black", # label color (only applies when labels are shown)
vertex.label.cex = .75, # label size, 75% of original (only applies when labels are shown)
edge.curved=.25, # add a 25% curve to the edges
edge.color="grey20"
)
title("Basic Network",cex.main=2,col.main="Black")
We created a network graph after subsetting the dataset. We can see a dense connection among House members. Next, we look at the network within the different parties.
After creating the basic network plot, we created the same plot again, this time coloring the nodes by House party to classify the graph. We also colored the edges based on the House party.
# edge color based on the edge's current_party attribute
E(net2)$color <- ifelse(E(net2)$current_party == "Democratic", "grey50", "green")
dark_side <- c("Democratic")
light_side <- c("Republican")
# create a new color variable as a node property
V(net2)$color <- NA
V(net2)$color[V(net2)$party %in% dark_side] <- "blue"
V(net2)$color[V(net2)$party %in% light_side] <- "red"
plot(net2,
edge.curved=.25,
vertex.label = NA,
edge.width=E(net2)$percentile*0.04,
edge.color=E(net2)$color,
vertex.size=V(net2)$percentile,
layout=layout_with_kk) # Kamada-Kawai layout (current name for layout.kamada.kawai)
legend(x=1.2, y=1.2, c("Democratic","Republican"), pch=21,
pt.bg=c("blue", "red"), pt.cex=4, cex=2, bty="n", ncol=1)
title("Political Party Affiliations",cex.main=2,col.main="Black")
The graph above shows that the subset contains more Democratic than Republican House members of the US Congress. Grey edges show how Democratic members are connected with other members, and green edges show how Republican members are connected with other House members.
Now, we will explore how members are connected within their own party.
E(net2)$color <- "black" # draw every edge in black
net.D <- net2 - E(net2)[E(net2)$current_party=="Democratic"] # the minus operator deletes the edges labelled "Democratic"
net.R <- net2 - E(net2)[E(net2)$current_party=="Republican"] # and here it deletes the edges labelled "Republican"
# Plot the two links separately:
set.seed(1256)
par(mfrow=c(1,2))
plot(net.D, vertex.color=adjustcolor("blue", alpha.f = .5), main="Democratic", vertex.label = NA, edge.curved = .50)
plot(net.R, vertex.color=adjustcolor("red", alpha.f = .5), main="Republican", vertex.label = NA, edge.curved = .50)
From the graph above, almost all Republican members appear to have served on a committee with another member of their party, whereas roughly a quarter of Democratic members have not served on a committee with other members of their party.
Finally, we created a set of three cluster networks using the cluster_label_prop, cluster_edge_betweenness, and cluster_leading_eigen algorithms.
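Because the minus operator removes the selected edges rather than keeping them, it helps to see both directions side by side. The following sketch uses a made-up four-node graph with a hypothetical edge attribute `group`; none of these names come from the House dataset.

```r
library(igraph)

# Hypothetical edge list with an edge attribute called "group"
edges <- data.frame(
  from  = c("A", "B", "C", "D"),
  to    = c("B", "C", "D", "A"),
  group = c("x", "x", "y", "y")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Minus operator: DELETES the "x" edges, so only the "y" edges remain
without_x <- g - E(g)[E(g)$group == "x"]

# To KEEP only the "x" edges, delete everything that is not "x"
only_x <- delete_edges(g, E(g)[E(g)$group != "x"])

print(E(without_x)$group)  # groups of the edges left after removing "x"
print(E(only_x)$group)     # groups of the edges left after keeping "x"
print(vcount(only_x))      # vertices are untouched, isolated nodes stay in the plot
```

Both operations leave the vertex set unchanged, which is why nodes without any remaining edge show up as isolated points in the party plots.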
par(mfrow = c(1,3))
coords <- layout_with_fr(net2)
# Community detection based on propagating labels
clp <- cluster_label_prop(net2)
plot(clp, net2, vertex.label = NA)
title("Propagating labels Detection",cex.main=2,col.main="Black")
# Betweenness community detection
ceb <- cluster_edge_betweenness(net2)
plot(ceb, net2, vertex.label = NA)
title("Betweenness Detection",cex.main=2,col.main="Black")
# Spectral (leading eigenvector) community detection
c2 <- cluster_leading_eigen(net2)
plot(c2, net2, layout=coords, vertex.label = NA)
title("Leading Eigenvector Detection",cex.main=2,col.main="Black")
The first technique, label propagation, assigns each node its own label and then, visiting the vertices in random order, replaces each vertex’s label with the label that appears most frequently among its neighbors. These steps are repeated until each vertex has the most common label of its neighbors. It shows 3 clusters.
The second technique is a hierarchical decomposition process in which edges are removed in decreasing order of their edge betweenness scores. The idea behind betweenness community detection is that edges connecting different groups are more likely to lie on multiple shortest paths, because they are the only way to get from one group to another. It shows 4 clusters and one outside cluster.
The last technique, spectral community detection (cluster_leading_eigen), relies on the spectrum of a matrix derived from the network. The spectrum typically consists of a dense bulk of closely spaced eigenvalues plus some outlying eigenvalues separated from the bulk by a significant gap; the nodes can then be grouped into clusters using standard partitional clustering techniques such as k-means. It shows 5 clusters.
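The three algorithms can be compared on a graph whose community structure is known in advance. The sketch below is only an illustration under stated assumptions: the toy graph (two five-node cliques joined by one bridge edge), the seed, and the summary table are all made up and are not part of the House analysis.

```r
library(igraph)

set.seed(42)  # label propagation is randomized, so fix the seed

# Hypothetical toy graph: two 5-node cliques joined by a single bridge edge
toy <- disjoint_union(make_full_graph(5), make_full_graph(5))
toy <- add_edges(toy, c(1, 6))

clp <- cluster_label_prop(toy)        # label propagation
ceb <- cluster_edge_betweenness(toy)  # removes high-betweenness edges first
cle <- cluster_leading_eigen(toy)     # leading eigenvector of the modularity matrix

# Number of communities and modularity found by each method
res <- data.frame(
  method      = c("label_prop", "edge_betweenness", "leading_eigen"),
  communities = c(length(clp), length(ceb), length(cle)),
  modularity  = c(modularity(clp), modularity(ceb), modularity(cle))
)
print(res)

# Community assignment per vertex for the edge betweenness result
print(as.integer(membership(ceb)))
```

On the bridge example the edge betweenness method removes the bridge first, since every shortest path between the two cliques runs over it. Comparing modularity across methods is one simple way to judge which partition fits a network best.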
Bibliography:
Cohen, B. - 512 Assignment-5 Network Analysis
https://www.govtrack.us/congress/members/report-cards/2019/senate/ideology
https://www.joe.org/joe/2011december/rb7.php
https://kateto.net/networks-r-igraph
http://rstudio-pubs-static.s3.amazonaws.com/336369_e0da8292496445e4bfc56c3c26a3e09f.html
https://mran.microsoft.com/snapshot/2017-08-20/web/packages/ggrepel/vignettes/ggrepel.html
https://www.webpages.uidaho.edu/~stevel/517/Sunbelt%202016%20R%20Network%20Visualization%20Handout.pdf
https://arxiv.org/pdf/1608.00163.pdf
https://igraph.org/python/doc/tutorial/tutorial.html