The data I will be using for this project comes from the Network Data Repository:
@inproceedings{nr, title = {The Network Data Repository with Interactive Graph Analytics and Visualization}, author = {Ryan A. Rossi and Nesreen K. Ahmed}, booktitle = {AAAI}, url = {https://networkrepository.com}, year = {2015}}
Links to the downloads for each of the three datasets:
https://networkrepository.com/aves-songbird-social.php
https://networkrepository.com/aves-sparrow-social.php
https://networkrepository.com/aves-weaver-social.php
For this project, I have chosen three distinct social networks, each corresponding to a different bird species: songbirds, sparrows, and weavers. All of these networks are undirected. I intend to build a dashboard that visualizes link structure across these groups and to analyze the results to draw conclusions from them. For this project to be considered a success, I want to answer the following research questions:
How I plan to answer these questions:
- Compute and compare network metrics such as density, clustering coefficient, and modularity.
- Use centrality measures (betweenness, total degree, closeness) to examine structural properties.
- Apply a community detection algorithm (Louvain) to identify subgroups. I chose Louvain because it supports weighted graphs and multi-edges, which my data contains.
- Summarize the results in a dashboard containing individual statistics for each species and a comparative overview of the key metrics.

Note: all of this will be done using the "igraph" package in R.
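One detail of this plan is worth checking in code. With `header = FALSE`, `read.table()` names the columns `V1`, `V2`, `V3`, and `graph_from_data_frame()` keeps every column after the first two as an edge attribute named after that column, so the third column becomes `V3` rather than `weight`. igraph functions with a `weights` argument, such as `cluster_louvain()`, only pick up weights automatically from an attribute called `weight`, so unless it is renamed or passed explicitly the community detection runs unweighted. The sketch below assumes the third column holds interaction weights; the toy edge list is hypothetical.

```r
library(igraph)

# Hypothetical three-column edge list: bird A, bird B, interaction weight
# (two strongly weighted triangles joined by one weak edge, 3-4)
edges <- data.frame(V1 = c(1, 1, 2, 3, 4, 4, 5),
                    V2 = c(2, 3, 3, 4, 5, 6, 6),
                    V3 = c(5, 4, 6, 1, 5, 4, 6))

g <- graph_from_data_frame(edges, directed = FALSE)
edge_attr_names(g)   # "V3": stored, but not treated as a weight

# Copy the column into the attribute name igraph looks for by default
E(g)$weight <- E(g)$V3
is_weighted(g)       # TRUE

set.seed(1)
comm <- cluster_louvain(g)   # now uses E(g)$weight
membership(comm)
```

For the bird graphs, the same one-line assignment (for example `E(g_songbird)$weight <- E(g_songbird)$V3`) would be needed before calling `cluster_louvain()`, or the column can be passed directly with `cluster_louvain(g, weights = E(g)$V3)`.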
I believe that this work plan will enable me to answer the research questions outlined above and contribute to a deeper understanding of avian social organization. The analysis is provided below. Thank you for your time.
# install.packages("igraph")  # run once if igraph is not yet installed
library(igraph)
## Warning: package 'igraph' was built under R version 4.4.3
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
aves.songbird.social <- read.table("aves-songbird-social.edges", header = FALSE, sep = "", stringsAsFactors = FALSE)
aves.sparrow.social <- read.table("aves-sparrow-social.edges", header = FALSE, sep = "", stringsAsFactors = FALSE)
aves.weaver.social <- read.table("aves-weaver-social.edges", header = FALSE, sep = "", stringsAsFactors = FALSE)
#turning imported edgelists into networks
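# columns after the first two (e.g. V3) are stored as edge attributes
g_songbird <- graph_from_data_frame(aves.songbird.social, directed = FALSE)
# drops the last column of the sparrow edge list; note that this object is not used below
colremove <- aves.sparrow.social[, -ncol(aves.sparrow.social)]
g_sparrow <- graph_from_data_frame(aves.sparrow.social, directed = FALSE)
g_weaver <- graph_from_data_frame(aves.weaver.social, directed = FALSE)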
#static plots to introduce the audience to each network
par(mfrow = c(1, 3))
plot(g_songbird)
plot(g_sparrow)
plot(g_weaver)
## All three edge lists have x rows and 3 columns, where x is the number of recorded interactions (one row per interaction, not per bird). The vertices represent individual birds and the edges represent social interactions between birds. The data was made available by Network Repository with the purpose of providing insight into network structure such as avian social link structure. The full citation is given at the top; the link is https://networkrepository.com
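Since each row of an edge list is one interaction rather than one bird, the number of birds and the number of rows are different quantities. A minimal base-R sketch on a hypothetical toy edge list with the same three-column layout (values invented for illustration) counts both, plus the repeated pairs that `graph_from_data_frame()` keeps as multi-edges:

```r
# Hypothetical edge list: each row is one observed interaction
edges <- data.frame(V1 = c(1, 1, 2, 2, 3),
                    V2 = c(2, 3, 3, 3, 4),
                    V3 = c(1, 2, 1, 4, 1))

n_interactions <- nrow(edges)                      # rows = edges
n_birds <- length(unique(c(edges$V1, edges$V2)))   # distinct vertices

# Sort each pair so (a, b) and (b, a) match, then count repeated pairs
sorted_pairs <- t(apply(edges[, 1:2], 1, sort))
n_repeat_pairs <- sum(duplicated(sorted_pairs))

cat("interactions:", n_interactions, " birds:", n_birds,
    " repeated pairs:", n_repeat_pairs, "\n")
```

For the actual graphs, igraph's `vcount()` and `ecount()` return the same two counts, and `any_multiple()` reports whether repeated pairs are present.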
#First, I want to compute network metrics for each network.
#density
so_density <- edge_density(g_songbird)
sp_density <- edge_density(g_sparrow)
we_density <- edge_density(g_weaver)
#cluster coefficient
so_cluster <- transitivity(g_songbird)
sp_cluster <- transitivity(g_sparrow)
we_cluster <- transitivity(g_weaver)
#modularity (Louvain)
ml_songbird <- cluster_louvain(g_songbird)
so_modular <- modularity(ml_songbird)
ml_sparrow <- cluster_louvain(g_sparrow)
sp_modular <- modularity(ml_sparrow)
ml_weaver <- cluster_louvain(g_weaver)
we_modular <- modularity(ml_weaver)
#as a table
bird_metrics <- data.frame(
Network = c("Songbird", "Sparrow", "Weaver"),
Density = c(so_density, sp_density, we_density),
Cluster_Coefficient = c(so_cluster, sp_cluster, we_cluster),
Modularity_Louvain = c(so_modular, sp_modular, we_modular)
)
#round for clarity
bird_metrics[, 2:4] <- round(bird_metrics[, 2:4], 3)
print(bird_metrics)
## Network Density Cluster_Coefficient Modularity_Louvain
## 1 Songbird 0.171 0.558 0.395
## 2 Sparrow 0.389 0.596 0.262
## 3 Weaver 0.014 0.588 0.893
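Louvain is a greedy heuristic, and recent igraph versions process vertices in random order, so the modularity values in this table can shift slightly from run to run. A minimal sketch, with igraph's built-in Zachary karate-club graph standing in for a bird network, measures that spread over 20 seeds and then fixes the seed once for a reportable value:

```r
library(igraph)

g <- make_graph("Zachary")   # stand-in for one of the bird networks

# Run Louvain with 20 different seeds and record the modularity of each result
mods <- sapply(1:20, function(s) {
  set.seed(s)
  modularity(cluster_louvain(g))
})
summary(mods)

# For a reportable value, fix the seed right before the call
set.seed(42)
ml <- cluster_louvain(g)
modularity(ml)
```

Calling `set.seed()` immediately before each `cluster_louvain()` call in the modularity chunk would make the three reported values reproducible.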
#next, we will analyze measures of centrality
bird_centrality <- data.frame(
Network = c("Songbird", "Sparrow", "Weaver"),
#Average Betweenness
Avg_betweenness = c(mean(betweenness(g_songbird)),
mean(betweenness(g_sparrow)),
mean(betweenness(g_weaver))
),
#Average Total Degree
Avg_Total_Degree = c(mean(degree(g_songbird)),
mean(degree(g_sparrow)),
mean(degree(g_weaver))
),
#Average Closeness
Avg_Closeness = c(mean(closeness(g_songbird)),
mean(closeness(g_sparrow)),
mean(closeness(g_weaver))
)
)
#as a table
bird_centrality[, 2:4] <- round(bird_centrality[,2:4], 3)
print(bird_centrality)
## Network Avg_betweenness Avg_Total_Degree Avg_Closeness
## 1 Songbird 65.882 18.673 0.022
## 2 Sparrow 19.096 19.846 0.011
## 3 Weaver 73.240 6.409 0.044
After computing measures of cohesion and centrality, we can draw some interesting conclusions. For starters, all three species scored relatively low on edge density. Sparrows scored highest in this category, which suggests they are the most socially connected of the three, although density also depends on network size and the sparrow network is the smallest. Weavers scored extremely low (0.014), which suggests their network is highly fragmented and that they are not a very social species. The next metric, the clustering coefficient, had surprising results: despite large differences in density, all three species scored within 0.05 of each other (0.558 to 0.596). This perplexed me at first, but it helps explain weavers: they may be unlikely to form social connections, but the birds that do are tightly knit and locally concentrated. The clustering coefficient also shows no meaningful difference between groups in closed triangles (local cliques), which suggests that these birds have strong local cohesion regardless of species. Moving on to modularity, we see an extremely high score for weavers (0.893). This supports our previous conclusion that they are heavily concentrated locally but sparse across the network globally. Songbirds and sparrows appear more likely to socialize and form groups, but those groups are not strongly separated from one another. Both are more social species, but their communities are less sharply divided than those of the weavers.
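Because density is compared across networks of very different sizes, the formula is worth spelling out. For an undirected graph with n vertices and m edges, `edge_density()` returns m / (n(n - 1)/2) and the mean degree is 2m/n, so density = mean degree / (n - 1). Rearranged, n ≈ 1 + mean degree / density, which lets the two tables above be cross-checked. This sketch uses only the rounded values printed in those tables, so the recovered sizes are approximate:

```r
# Rounded values copied from the cohesion and centrality tables above
density    <- c(Songbird = 0.171, Sparrow = 0.389, Weaver = 0.014)
avg_degree <- c(Songbird = 18.673, Sparrow = 19.846, Weaver = 6.409)

# density = mean degree / (n - 1)  =>  n = 1 + mean degree / density
n_approx <- round(1 + avg_degree / density)

# mean degree = 2m / n  =>  m = n * mean degree / 2
m_approx <- round(n_approx * avg_degree / 2)

data.frame(n_approx, m_approx)
```

Under this arithmetic the sparrow network comes out far smaller (roughly 50 birds and about 500 interactions) than the songbird (about 110 birds) and weaver (about 460 birds) networks, so part of the sparrows' high density reflects size alone: at a fixed mean degree, density falls as n grows.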
Moving on to measures of centrality, we see that songbirds and
weavers score relatively high on average betweenness. This reiterates
the point that weavers form intentional connections and are very
selective about their social group: their flocks are tightly connected
and may contain key birds that link them to others. The large gap in
betweenness between songbirds and sparrows also suggests that songbirds
are much more connected across the network on a global scale than
their sparrow counterparts. The evidence suggests that sparrows gather
in small flocks and are unlikely to socialize outside of them. In all
honesty, average total degree doesn't tell us much about these
networks. Songbirds scored lower on degree than I expected, and
combined with their high average betweenness, this suggests they are
not as particular when it comes to socializing. Lastly, average
closeness lets us spot a trend shared by all three species. Since
closeness measures how efficiently information can spread from a bird
to the rest of the network, and all three networks scored very low,
this suggests that these bird networks are made up of cliques. Birds
are likely to find a flock and stay there, but flock size and level of
socialization differ greatly between species.
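Closeness is easier to compare across networks once it is normalized. igraph's raw `closeness()` is 1 divided by the sum of distances to the other reachable vertices (recent igraph versions only count reachable vertices), so its scale depends on how many birds can be reached, and in the fragmented weaver network that may be only a small component. With `normalized = TRUE` it becomes the inverse of the average distance instead. Two star graphs of different size, used here purely as a stand-in, show the effect:

```r
library(igraph)

# Same shape (one hub, all other birds attached to it), different size
small <- make_star(10, mode = "undirected")
large <- make_star(50, mode = "undirected")

# Raw closeness of the hub: 1 / (sum of distances) = 1 / (n - 1)
closeness(small)[1]   # 1/9
closeness(large)[1]   # 1/49

# Normalized closeness: inverse of the average distance, comparable across sizes
closeness(small, normalized = TRUE)[1]   # 1
closeness(large, normalized = TRUE)[1]   # 1
```

For the bird graphs, `mean(closeness(g_weaver, normalized = TRUE))` and its songbird and sparrow counterparts would give averages that depend less on network and component size.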
##### Network Plots
#side-by-side plots for all 3 networks using centrality scores
par(mfrow = c(1,3), mar = c(1, 1, 3, 1), oma = c(5, 1, 2, 1))
#set the seed before computing layouts, since the DrL and Fruchterman-Reingold layouts are randomized
set.seed(123)
layout1 <- layout_with_drl(g_songbird)
layout2 <- layout_with_kk(g_sparrow)
layout3 <- layout_with_fr(g_weaver)
#creating a heat map style color palette to understand betweenness
#compute betweenness once per network
bw_so <- betweenness(g_songbird)
bw_sp <- betweenness(g_sparrow)
bw_we <- betweenness(g_weaver)
#min-max normalization to [0, 1]: the numerator (b - min) must be parenthesized
bw_norm_so <- (bw_so - min(bw_so)) / (max(bw_so) - min(bw_so))
bw_norm_sp <- (bw_sp - min(bw_sp)) / (max(bw_sp) - min(bw_sp))
bw_norm_we <- (bw_we - min(bw_we)) / (max(bw_we) - min(bw_we))
color_scale <- colorRampPalette(c("yellow", "orange", "red"))(100)
#bin each normalized score into one of 100 colors (yellow = low, red = high)
node_colors <- color_scale[as.numeric(cut(bw_norm_so, breaks = 100))]
node_colors2 <- color_scale[as.numeric(cut(bw_norm_sp, breaks = 100))]
node_colors3 <- color_scale[as.numeric(cut(bw_norm_we, breaks = 100))]
plot(g_songbird,
layout = layout1,
vertex.size = degree(g_songbird),
vertex.color = node_colors,
vertex.label = NA,
main = "")
title(main = "Songbird", cex.main = 1.5, font.main = 2)
box()
plot(g_sparrow,
layout = layout2,
vertex.size = degree(g_sparrow),
vertex.color = node_colors2,
vertex.label = NA,
main = "")
title(main = "Sparrow", cex.main = 1.5, font.main = 2)
box()
plot(g_weaver,
layout = layout3,
vertex.size = degree(g_weaver),
vertex.color = node_colors3,
vertex.label = NA,
main = "")
title(main = "Weaver", cex.main = 1.5, font.main = 2)
box()
#subtitle spanning all 3 objects
mtext("Bird Social Networks (Node size = degree, Node color = betweenness)",
side = 1, line = 3, outer = TRUE, cex = 1.2)
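Min-max scaling maps each score x to (x - min) / (max - min), so the lowest-betweenness bird lands at 0 and the highest at 1. Operator precedence is easy to get wrong here: in R, `x - min(x) / (max(x) - min(x))` divides only `min(x)`. The sketch below wraps the scaling and the palette lookup in two small helpers; `rescale01()` and `score_to_color()` are hypothetical names, and the scores are invented.

```r
# Min-max rescaling to [0, 1]; undefined (NaN) if all scores are equal
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Map rescaled scores onto a yellow-to-red palette with n_colors steps
score_to_color <- function(x, n_colors = 100) {
  pal <- colorRampPalette(c("yellow", "orange", "red"))(n_colors)
  pal[as.numeric(cut(rescale01(x), breaks = n_colors))]
}

bw <- c(0, 5, 10, 20, 40)            # hypothetical betweenness scores
rescale01(bw)                        # 0.000 0.125 0.250 0.500 1.000
bw - min(bw) / (max(bw) - min(bw))   # precedence bug: here just bw - 0
score_to_color(bw)                   # lowest score yellow, highest red
```

Because `cut(..., breaks = 100)` splits the observed range into equal-width bins, shifting or rescaling the scores leaves each bird in the same bin (apart from floating-point edge cases), so the precedence bug barely changes the plotted colors; the correct formula matters mainly when the normalized values are reused elsewhere.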
#box plots comparing centrality measures
par(mfrow = c(1, 2), mar = c(5, 2, 4, 1.5), oma = c(1, 0, 0, 4))
boxplot(degree(g_songbird),
degree(g_sparrow),
degree(g_weaver),
names = c("Songbird", "Sparrow", "Weaver"),
main = "Degree Distribution",
ylab = "Degree",
col = "skyblue")
boxplot(betweenness(g_songbird),
betweenness(g_sparrow),
betweenness(g_weaver),
names = c("Songbird", "Sparrow", "Weaver"),
main = "Betweenness Distribution",
ylab = "Betweenness",
col = "palegreen",
ylim = c(0, 1000)) # betweenness values above 1000 fall outside the plotted range
mtext("Y-axis represents total degree and betweenness (respectively)",
side = 1, line = -0.5, outer = TRUE, cex = 1.2)
#degree-betweenness scatter plot
par(mfrow = c(1,4), mar = c(4, 4, 3, 1), oma = c(4, 1, 2, 1))
#songbird
plot(degree(g_songbird), betweenness(g_songbird),
main = "Songbird",
xlab = "", ylab = "",
col = "red",
xlim = c(0, 60),
ylim = c(0,1000))
#sparrow
plot(degree(g_sparrow), betweenness(g_sparrow),
main = "Sparrow",
xlab = "", ylab = "",
col = "darkblue",
xlim = c(0, 60),
ylim = c(0,1000))
#weaver
plot(degree(g_weaver), betweenness(g_weaver),
main = "Weaver",
xlab = "", ylab = "",
col = "darkgreen",
xlim = c(0, 60),
ylim = c(0,1000))
mtext("Centrality Relationship: Each point is a bird. X = degree, Y = betweenness",
side = 1, line = 2, outer = TRUE, cex = 1.3, font = 2)
#One scatter plot summarizing previous 3 at once
plot(NA, NA,
xlim = c(0, 60), ylim = c(0, 1000),
xlab = "", ylab = "",
main = "")
title(main = "ALL Comparison", cex.main = 1.1, font.main = 2)
points(degree(g_songbird), betweenness(g_songbird), col = "red", pch = 4)
points(degree(g_sparrow), betweenness(g_sparrow), col = "darkblue", pch = 4)
points(degree(g_weaver), betweenness(g_weaver), col = "darkgreen", pch = 4)
legend("topright", legend = c("Songbird", "Sparrow", "Weaver"),
col = c("red", "darkblue", "darkgreen"), pch = 4)
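The four scatter plots can also be summarized numerically. A Spearman (rank) correlation between degree and betweenness gives one value per network and, because it works on ranks, is not affected by the large betweenness values that fall outside `ylim = c(0,1000)`. A minimal sketch, with igraph's built-in Zachary karate-club graph standing in for a bird network:

```r
library(igraph)

g <- make_graph("Zachary")   # stand-in for one of the bird networks

# Rank correlation between each vertex's degree and its betweenness
rho <- cor(degree(g), betweenness(g), method = "spearman")
rho
```

The same call with `g_songbird`, `g_sparrow`, and `g_weaver` would give one comparable value per species; values near 1 mean the highest-degree birds are also the main bridges.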
###### Plot Analysis
The first series of visualizations is intended to demonstrate the
structure behind each network and to show how the nodes interact with
each other. I decided to use node size to represent total degree
because we are less interested in specific nodes than in the flow of
the network as a whole. Using degree as node size lets us see the shape
of the network without worrying about individual nodes, because nodes
with low degree are overshadowed by nearby nodes with higher degree.
The choice to use color for betweenness was, in all honesty, mostly a
style choice, but I quickly realized that color highlights hot spots or
"bridge" nodes very well: you can tell right away which nodes carry the
most shortest-path traffic. I used a heat-map color scheme, which was
not easy to program, where red corresponds to higher betweenness and
yellow to lower betweenness. I believe this first dashboard does a
great job of telling the story of each network. Songbirds, for
starters, have a very tightly knit global network. The relationships
between individual nodes are weaker than in the other species, but the
spread suggests that songbirds prefer safety in numbers to small
flocks. Sparrows, on the other hand, are the middle ground between the
other two species in terms of distribution. Most of their centrality
and cohesion scores hinted at this, and the distribution here is very
balanced: rather than having only many small groups or only one
overwhelming component, the sparrow network has both. Moving on to
weavers, the results match our predictions from the table discussion.
Very few nodes in this network are red, which suggests that the flow of
information is weakest here. Given how low the weavers' edge density
was, this makes sense and suggests that weavers are by far the most
independent of the three species. They have tightly knit groups, but
many small, disconnected components suggest that they prefer small
flocks and rarely go out of their way to socialize with others.
The second set of visualizations consists of simple box plots that show the distribution of each group's centrality measurements. Since my first section was dedicated to visualizing shape and flow, I needed to show the distribution so that we can understand how individual nodes affect that flow. Box plots are well suited to this because they reveal trends within the data and make outliers easy to spot. Combining our understanding of the shape with the ability to notice specific nodes within the network allows us to judge its stability. Songbird and Sparrow do not have many outliers, and the ones they have tend to sit close together. This suggests that their flow and shape stay consistent from bird to bird, which supports the previous analysis of these species. However, the Weaver betweenness distribution shows no consistency between individual nodes. I believe this reflects weavers being a much more independent species than their counterparts. The lack of consistency in the distribution, together with the density of 0.014 that we saw in the table, tells me there is no social "pattern" within the species. Even so, the absence of a pattern is informative: it suggests that this species "thinks for itself", since individual birds differ so much from one another. I did some research on these birds to understand why my results came out this way, and found that the species comes in an extremely wide range of shapes and sizes. This variation probably limits the flow of information to some extent, and the species is also known for building extremely durable nests, so perhaps they simply don't need to move around much and grow accustomed to being on their own.
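The outliers that the box plots draw can also be listed numerically. `boxplot()` marks as outliers the points lying more than 1.5 times the hinge spread (roughly the interquartile range) beyond the box, and `boxplot.stats()` applies the same rule and returns the values. It is also a quick way to count the birds hidden by `ylim = c(0, 1000)` in the betweenness box plot. The scores below are hypothetical:

```r
# Hypothetical betweenness scores for one network
bw <- c(0, 2, 3, 5, 8, 10, 12, 15, 20, 400, 1500)

stats <- boxplot.stats(bw)   # same 1.5 * hinge-spread rule that boxplot() draws
stats$stats                  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out                    # points drawn individually as outliers
sum(bw > 1000)               # points clipped by ylim = c(0, 1000)
```

Running `boxplot.stats(betweenness(g_weaver))$out` on the real graph would list the individual weavers behind the spread seen in the plot.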
My last dashboard of four visualizations compares the two main centrality measures I have used up to this point. I made it because my box plots did not clearly show what I wanted them to. Their goal was to fully capture the distribution of each network, but the degree distribution results stumped me. I expected the median for songbirds to be much higher than for sparrows, given the shapes of their networks, but the two were almost equal. This, combined with the upper quartile for sparrows extending far above that of songbirds, was something I needed to analyze in more depth. So I created a cross-comparison of two key centrality measures, betweenness and total degree. Laying them out this way gave a much clearer picture of the distribution and of why the box plots looked the way they did. I initially forgot to consider the size of each group, and both degree and betweenness depend on that size: a bird can only connect to as many birds as the network contains, and a smaller network has fewer shortest paths for any bird to lie on. The sparrow group was the smallest of the three, with only about 500 recorded interactions (edge-list rows) compared with about 1000 for songbirds and 1500 for weavers. This is likely why sparrows scored so low on betweenness despite having such a uniform shape. If the datasets were equal in size, I think sparrows would have stayed in that middle position instead of sitting so low in my centrality relationship plots. In the final visualization, where all three species are overlaid, sparrows would likely have continued upward toward the legend in a roughly linear trend rather than flattening out. The rest of the data suggests that sparrows are the middle ground in their social structure, neither as isolated as the weaver nor as tightly knit as the songbird, so I believe the distribution is skewed by the unequal case counts. Regardless, I would not have spotted this inconsistency without scatter plots, because they highlight each individual case, which was the balance my dashboard needed.
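The size effect described above can be reduced with normalized centralities. igraph's `degree()` and `betweenness()` both accept `normalized = TRUE`; for an undirected graph, degree is divided by n - 1 and betweenness by (n - 1)(n - 2)/2, the largest values possible in a simple graph with n vertices. Two star graphs of different size, used only as a stand-in, illustrate this: the hub's raw scores grow with n, while its normalized scores are both 1.

```r
library(igraph)

small <- make_star(10, mode = "undirected")   # hub + 9 leaves
large <- make_star(50, mode = "undirected")   # hub + 49 leaves

# Raw scores of the hub grow with network size
c(degree(small)[1], degree(large)[1])                 # 9 and 49
c(betweenness(small)[1], betweenness(large)[1])       # 36 and 1176

# Normalized scores put both hubs at the maximum of 1
c(degree(small, normalized = TRUE)[1],
  degree(large, normalized = TRUE)[1])
c(betweenness(small, normalized = TRUE)[1],
  betweenness(large, normalized = TRUE)[1])
```

For the bird graphs this would mean, for example, `mean(betweenness(g_sparrow, normalized = TRUE))` in place of the raw mean. With repeated edges, as in these data, normalized degree can exceed 1, so applying `simplify()` first may be worth considering; that step is an assumption, not part of the original analysis.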
To summarize, the first series was meant to let us "see" each network and understand its flow. The second was meant to show the numbers behind the distributions and analyze the trends in each network. The third compared those trends with those of the peer networks to determine the results of my work.
Conclusion
The goal of this project was not only to analyze network structure and centrality across three bird social networks, but also to understand what these differences suggest about avian social systems. For starters, we can conclude that songbirds have the most intimate network and are the most interconnected of the three species. This conclusion rests on their centrality scores, which are never the lowest of the three and are close to the top for betweenness and degree, and on the network plots built from these measures. We can also conclude that weavers are the "lone wolves" among these species. It's not that they don't form connections, and the ones they do form are often extremely strong, but their network is heavily fragmented and there are no notable patterns across my measures. They have the highest average betweenness, which tells us there are some key nodes in this network that facilitate social connection, but beyond that there is little sense of centrality. To me, this lack of centrality signifies that the species is simply advanced and that individual birds operate largely on their own free will. However, this is speculative rather than certain; differences in body size and seasonal effects could also be at play. Much of my analysis also pointed to sparrows being the middle ground between these two species. Their network had a very uniform shape in the plots, although in the cohesion table they did not fall in the middle: they had the highest density and clustering coefficient and the lowest modularity, which may partly reflect their smaller network. However, I cannot make a firm claim about sparrows specifically, since I was limited by their case count. My chosen measures of centrality required an equal, or at least comparable, number of cases between species, and while I used averages to try to offset this, I did not consider that degree is limited by the number of cases. In a simple graph with only five nodes, the maximum possible average degree is 4.
This bias did limit my project to some extent, since it would be unfair to compare results across species when some had an advantage. However, despite this hiccup, I still think I achieved the goal I set for myself. I was able to decipher how centrality differs across these networks and what those differences suggest about each species as a whole, and to create visualizations that let others find their own answers to those questions. If I were to do this project again, I would start by randomly selecting 100 or so cases from each species and analyzing them in relation to one another. Random selection reduces bias, and doing this would have made my conclusions much more secure. Also, given more time, I would have liked to include many more bird species. My results led me to three distinct groups: one isolated, one social, and one in the middle. Taking in many more species would allow me to rank them on a scale and draw better conclusions about "the avian species as a whole", which was one of the questions I intended to answer. I think broadening my scope and collecting data in an unbiased manner would both make this project stronger. Still, I am proud of my efforts and believe this project offers some key takeaways about different bird species. I hope you are able to find them as well, and thank you for reading my project. -Jaden Jones, University of Washington.