Narrative Summary & Sources

The data I will be using for this project is as follows: @inproceedings{nr, title = {The Network Data Repository with Interactive Graph Analytics and Visualization}, author={Ryan A. Rossi and Nesreen K. Ahmed}, booktitle = {AAAI}, url={https://networkrepository.com}, year={2015}}

  Links to downloads for each of the three datasets:
  https://networkrepository.com/aves-songbird-social.php 
  https://networkrepository.com/aves-sparrow-social.php
  https://networkrepository.com/aves-weaver-social.php
  

For this project, I have chosen three distinct social networks, each corresponding to a different group of birds: songbirds, sparrows, and weavers. All of these networks are undirected. I intend to create a dashboard that visualizes link structure across these groups and to analyze the results to draw conclusions from them. For this project to be considered a success, I want to answer the following research questions:

  1. How do network structure and species centrality differ across the three selected bird social networks?
  2. What do these differences suggest about the organization of avian social systems?
  3. How do community detection patterns vary across bird networks? What does this say about their ecological or behavioral structure(s)?

How I plan to answer these questions:

  - Compute and compare network metrics such as density, clustering coefficient, and modularity.
  - Use centrality measures (betweenness, total degree, closeness) to examine structural properties.
  - Apply a community detection algorithm (Louvain) to identify subgroups. I chose Louvain because it supports weighted graphs and multi-edges, which my data contains.
  - Summarize the results in a dashboard containing individual statistics for each species and a comparative overview of the key metrics.

Note: all of this will be achieved using the “igraph” package in R.
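As a minimal sketch of the Louvain step in this plan, the toy graph below (two four-bird cliques joined by one bridge edge, not the real data) shows how `cluster_louvain()` and `modularity()` fit together. The edge list and its weights are made up for illustration. When a graph has an edge attribute named `weight`, igraph uses it by default.

```r
library(igraph)

# Toy graph (hypothetical, not the bird data): two 4-node cliques plus one bridge
clique_a <- combn(1:4, 2)   # all pairs among nodes 1-4
clique_b <- combn(5:8, 2)   # all pairs among nodes 5-8
toy_edges <- data.frame(
  from = c(clique_a[1, ], clique_b[1, ], 4),
  to   = c(clique_a[2, ], clique_b[2, ], 5)
)
toy_edges$weight <- 1       # made-up weights; a column named "weight" becomes the edge weight

g_toy <- graph_from_data_frame(toy_edges, directed = FALSE)

set.seed(1)                 # Louvain visits vertices in random order
toy_comm <- cluster_louvain(g_toy)

n_groups  <- length(toy_comm)       # number of detected communities
toy_mod   <- modularity(toy_comm)   # modularity of that partition
toy_sizes <- sizes(toy_comm)        # number of birds in each community
```

A high modularity here reflects many links inside each clique and only one link between them, which is the same reading applied to the bird networks later on.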

I believe that this work plan will enable me to answer the research questions outlined above and contribute to a deeper understanding of avian social organization. The analysis is provided below. Thank you for your time.

Setup code

# install.packages("igraph")  # run once if igraph is not installed
library(igraph)
## Warning: package 'igraph' was built under R version 4.4.3
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
aves.songbird.social <- read.table("aves-songbird-social.edges", header = FALSE, sep = "", stringsAsFactors = FALSE)
aves.sparrow.social <- read.table("aves-sparrow-social.edges", header = FALSE, sep = "", stringsAsFactors = FALSE)
aves.weaver.social <- read.table("aves-weaver-social.edges", header = FALSE, sep = "", stringsAsFactors = FALSE)

#turning imported edgelists into networks 
g_songbird = graph_from_data_frame(aves.songbird.social, directed = FALSE)
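#note: colremove (the sparrow table without its last column) is computed here but not used below;
#g_sparrow is built from the full table, so every extra column is kept as an edge attribute
colremove = aves.sparrow.social[, -ncol(aves.sparrow.social)]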
g_sparrow = graph_from_data_frame(aves.sparrow.social, directed = FALSE)
g_weaver = graph_from_data_frame(aves.weaver.social, directed = FALSE)

#static plots to introduce audience to network
par(mfrow = c(1, 3))
plot(g_songbird)
plot(g_sparrow)
plot(g_weaver)

## All 3 of the edgelists are x rows by 3 columns, where x is the number of recorded edges (each row is one interaction between two birds), not the number of birds. The vertices represent the distinct birds and the edges represent social interactions between birds. The data is hosted by the Network Repository (networkrepository.com) and can provide valuable insights about avian social link structure. The link is provided above, but here it is again: https://networkrepository.com
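A quick way to check how an edge list maps onto a network is to compare the row count with the vertex and edge counts. The sketch below uses a hypothetical five-row table in the same three-column layout (`V1`, `V2`, `V3`). The values are invented and are not taken from the bird files.

```r
library(igraph)

# Hypothetical 5-row edge list in the same 3-column layout (V1, V2, V3)
toy_edges <- data.frame(
  V1 = c(1, 1, 2, 2, 3),
  V2 = c(2, 2, 3, 4, 4),
  V3 = c(1, 1, 1, 1, 1)
)
g_toy <- graph_from_data_frame(toy_edges, directed = FALSE)

n_birds     <- vcount(g_toy)            # distinct vertices (birds)
n_edges     <- ecount(g_toy)            # one edge per row of the table
has_multi   <- any_multiple(g_toy)      # TRUE if some pair of birds appears more than once
extra_attrs <- edge_attr_names(g_toy)   # columns after the first two, kept as edge attributes
```

Here the third column is stored under the name `V3`. igraph only treats it as an edge weight by default if the attribute is named `weight`.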

Network Analysis

#First, I want to compute network metrics for each network.

#density
so_density <- edge_density(g_songbird)
sp_density <- edge_density(g_sparrow)
we_density <- edge_density(g_weaver)

#cluster coefficient
so_cluster <- transitivity(g_songbird)
sp_cluster <- transitivity(g_sparrow)
we_cluster <- transitivity(g_weaver)

#modularity (Louvain)
ml_songbird <- cluster_louvain(g_songbird)
so_modular <- modularity(ml_songbird)
ml_sparrow <- cluster_louvain(g_sparrow)
sp_modular <- modularity(ml_sparrow)
ml_weaver <- cluster_louvain(g_weaver)
we_modular <- modularity(ml_weaver)

#as a table
bird_metrics <- data.frame(
  Network = c("Songbird", "Sparrow", "Weaver"),
  Density = c(so_density, sp_density, we_density),
  Cluster_Coefficient = c(so_cluster, sp_cluster, we_cluster),
  Modularity_Louvain = c(so_modular, sp_modular, we_modular)
)
#round for clarity
bird_metrics[, 2:4] <- round(bird_metrics[, 2:4], 3)
print(bird_metrics)
##    Network Density Cluster_Coefficient Modularity_Louvain
## 1 Songbird   0.171               0.558              0.395
## 2  Sparrow   0.389               0.596              0.262
## 3   Weaver   0.014               0.588              0.893
#next, we will analyze measures of centrality
bird_centrality <- data.frame(
  Network = c("Songbird", "Sparrow", "Weaver"),

#Average Betweenness
Avg_betweenness = c(mean(betweenness(g_songbird)),
                   mean(betweenness(g_sparrow)),
                   mean(betweenness(g_weaver))
                   ),
#Average Total Degree
Avg_Total_Degree = c(mean(degree(g_songbird)),
                     mean(degree(g_sparrow)),
                     mean(degree(g_weaver))
                   ),
#Average Closeness 
Avg_Closeness = c(mean(closeness(g_songbird)),
                  mean(closeness(g_sparrow)),
                  mean(closeness(g_weaver))
  )
)
#as a table
bird_centrality[, 2:4] <- round(bird_centrality[,2:4], 3)
print(bird_centrality)
##    Network Avg_betweenness Avg_Total_Degree Avg_Closeness
## 1 Songbird          65.882           18.673         0.022
## 2  Sparrow          19.096           19.846         0.011
## 3   Weaver          73.240            6.409         0.044

Discussion of table results

After computing measures of cohesion and centrality, there are some interesting conclusions we can draw. For starters, all three networks scored below 0.4 on edge density. Sparrows scored highest in this category (0.389), which suggests that they are the most social of the three. Weavers scored extremely low (0.014), which suggests that their network is highly fragmented and that they are not a very social species.

Our next category of interest, the clustering coefficient, had very surprising results. Despite vast differences in density, all 3 species scored within 0.05 of each other. This perplexed me at first, but it allows us to understand the nature of weavers better: they may be unlikely to form social connections, but those that do are tightly knit and locally concentrated. The clustering coefficient also shows little difference between groups when it comes to local cliques, which suggests that birds have strong local cohesion regardless of species.

Moving on to modularity, we see an extremely high score for the weaver network (0.893). This supports our previous conclusion that weavers are heavily concentrated locally but quite sparse across the network globally. Songbirds and sparrows seem more likely to socialize and form groups, but those clusters are not strongly separated. Both are more social species, but their individual bonds appear less significant than those of weavers.
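Because the data contains multi-edges, density is worth checking on a simplified copy of each network as well. The sketch below uses a hypothetical triangle in which one pair of birds is recorded twice. It only illustrates the effect and does not check whether the values in the table are affected.

```r
library(igraph)

# Hypothetical triangle in which the pair 1-2 is recorded twice
toy_edges <- data.frame(from = c(1, 1, 2, 1), to = c(2, 2, 3, 3))
g_multi  <- graph_from_data_frame(toy_edges, directed = FALSE)
g_simple <- simplify(g_multi)   # merges repeated pairs and drops self-loops

density_multi  <- edge_density(g_multi)    # 4 edges over 3 possible pairs
density_simple <- edge_density(g_simple)   # 3 edges over 3 possible pairs
cluster_simple <- transitivity(g_simple)   # global clustering coefficient
```

With repeated edges the density can exceed 1, so comparing `edge_density()` before and after `simplify()` is one way to see how much repeated interactions contribute to the score.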

Moving on to measures of centrality, we see that songbirds and weavers score relatively high on average betweenness. This reiterates the point that weavers form intentional connections and are very selective about their social group, but flocks are very tightly connected and may have key birds within the flock that communicate with others. The vast difference in score between songbirds and sparrows also suggests that songbirds are much more connected across the network on a global scale than their sparrow counterparts. The evidence suggests that sparrows gather in small flocks and are unlikely to socialize outside of these flocks. Average total degree doesn’t tell us much about these networks in all honesty. Songbirds scored lower than I expected, and combining this with their high average betweenness suggests they are not as particular when it comes to socializing. Lastly, average closeness allows us to spot a trend among all 3 species of bird. Since closeness measures how efficiently information can spread throughout the network, and all networks scored very low, it seems reasonable to conclude that bird social networks are filled with cliques. Birds are likely to find a pack and stay there, but the size of the pack and the level of socialization differ greatly between species.
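Raw closeness and betweenness depend on how many birds are in a network, so averages from networks of different sizes are hard to compare directly. The sketch below uses two hypothetical star networks (one hub bird linked to every other bird) to show how the `normalized = TRUE` argument in igraph rescales the scores. The star graphs are an assumption made for illustration, not a model of the bird data.

```r
library(igraph)

# Two hypothetical star networks of different sizes; vertex 1 is the hub
g_small <- make_star(5, mode = "undirected")
g_large <- make_star(11, mode = "undirected")

# Raw closeness of the hub shrinks as the network grows
raw_small <- closeness(g_small)[1]
raw_large <- closeness(g_large)[1]

# Normalized versions rescale by network size, so the two hubs become comparable
norm_close_small <- closeness(g_small, normalized = TRUE)[1]
norm_close_large <- closeness(g_large, normalized = TRUE)[1]
norm_betw_small  <- betweenness(g_small, normalized = TRUE)[1]
norm_betw_large  <- betweenness(g_large, normalized = TRUE)[1]
```

Both hubs play the same structural role, and only the normalized scores reflect that. Applying the same argument to the three bird networks would be one way to test whether the low closeness values are a size effect.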

Network Plots

#side-by-side plots for all 3 networks using centrality scores
par(mfrow = c(1,3), mar = c(1, 1, 3, 1), oma = c(5, 1, 2, 1))
#seed is set before the layouts are computed, since the layout algorithms can start from random positions
set.seed(123)
layout1 <- layout_with_drl(g_songbird)
layout2 <- layout_with_kk(g_sparrow)
layout3 <- layout_with_fr(g_weaver)

#creating a heat map style color palette to understand betweenness
#min-max normalization: (x - min) / (max - min), with the subtraction grouped before the division
bw_so <- betweenness(g_songbird)
bw_sp <- betweenness(g_sparrow)
bw_we <- betweenness(g_weaver)
bw_norm_so <- (bw_so - min(bw_so)) / (max(bw_so) - min(bw_so))
bw_norm_sp <- (bw_sp - min(bw_sp)) / (max(bw_sp) - min(bw_sp))
bw_norm_we <- (bw_we - min(bw_we)) / (max(bw_we) - min(bw_we))
color_scale <- colorRampPalette(c("yellow", "orange", "red"))(100)
node_colors <- color_scale[as.numeric(cut(bw_norm_so, breaks = 100))]
node_colors2 <- color_scale[as.numeric(cut(bw_norm_sp, breaks = 100))]
node_colors3 <- color_scale[as.numeric(cut(bw_norm_we, breaks = 100))]

plot(g_songbird,
     layout = layout1,
     vertex.size = degree(g_songbird),
     vertex.color = node_colors,
     vertex.label = NA,
     main = "")
title(main = "Songbird", cex.main = 1.5, font.main = 2)
box()

plot(g_sparrow,
     layout = layout2,
     vertex.size = degree(g_sparrow),
     vertex.color = node_colors2,
     vertex.label = NA,
     main = "")
title(main = "Sparrow", cex.main = 1.5, font.main = 2)
box()

plot(g_weaver,
     layout = layout3,
     vertex.size = degree(g_weaver),
     vertex.color = node_colors3,
     vertex.label = NA,
     main = "")
title(main = "Weaver", cex.main = 1.5, font.main = 2)
box()

#subtitle spanning all 3 objects
mtext("Bird Social Networks (Node size = degree, Node color = betweenness)", 
      side = 1, line = 3, outer = TRUE, cex = 1.2)

#box plots comparing centrality measures
par(mfrow = c(1, 2), mar = c(5, 2, 4, 1.5), oma = c(1, 0, 0, 4))

boxplot(degree(g_songbird),
        degree(g_sparrow),
        degree(g_weaver),
        names = c("Songbird", "Sparrow", "Weaver"),
        main = "Degree Distribution",
        ylab = "Degree",
        col = "skyblue")

boxplot(betweenness(g_songbird),
        betweenness(g_sparrow),
        betweenness(g_weaver),
        names = c("Songbird", "Sparrow", "Weaver"),
        main = "Betweenness Distribution",
        ylab = "Betweenness",
        col = "palegreen",
        ylim = c(0, 1000))

mtext("Y-axis represents total degree and betweenness (respectively)", 
      side = 1, line = -0.5, outer = TRUE, cex = 1.2)

#degree-betweenness scatter plot

par(mfrow = c(1,4), mar = c(4, 4, 3, 1), oma = c(4, 1, 2, 1))

#songbird
plot(degree(g_songbird), betweenness(g_songbird),
     main = "Songbird",
     xlab = "", ylab = "",
     col = "red",
     xlim = c(0, 60),
     ylim = c(0,1000))

#sparrow     
plot(degree(g_sparrow), betweenness(g_sparrow),
     main = "Sparrow",
     xlab = "", ylab = "",
     col = "darkblue",
     xlim = c(0, 60),
     ylim = c(0,1000))

#weaver
plot(degree(g_weaver), betweenness(g_weaver),
     main = "Weaver",
     xlab = "", ylab = "",
     col = "darkgreen",
     xlim = c(0, 60),
     ylim = c(0,1000))

mtext("Centrality Relationship: Each point is a bird. X = degree, Y = betweenness",
      side = 1, line = 2, outer = TRUE, cex = 1.3, font = 2)

#One scatter plot summarizing previous 3 at once
plot(NA, NA,
     xlim = c(0, 60), ylim = c(0, 1000),
     xlab = "", ylab = "",
     main = "")
title(main = "ALL Comparison", cex.main = 1.1, font.main = 2)
points(degree(g_songbird), betweenness(g_songbird), col = "red", pch = 4)
points(degree(g_sparrow), betweenness(g_sparrow), col = "darkblue", pch = 4)
points(degree(g_weaver), betweenness(g_weaver), col = "darkgreen", pch = 4)

legend("topright", legend = c("Songbird", "Sparrow", "Weaver"),
       col = c("red", "darkblue", "darkgreen"), pch = 4)

Plot Analysis

The first series of visualizations is intended to demonstrate the structure behind each network and to provide an understanding of how the nodes interact with each other. I decided to use size to represent total degree because we are not very interested in specific nodes, but rather in the flow of the network as a whole. Using degree for node size allows us to determine the shape of the network without worrying about specific nodes, because nodes with a smaller degree will be overshadowed by nearby ones with higher values. The choice to use color to show betweenness was more of a style choice in all honesty, but I quickly realized that using color to highlight hot-spots or “bridge” nodes works really well, as you can tell right away which nodes carry the most traffic between other nodes. I decided to use a heat map color scheme, which was not easy to program: reds correspond to a higher betweenness score and yellows to a lower one.

I believe this first dashboard does a really great job at telling the story of each of these networks. Songbirds, for starters, have a very tightly knit global network. The relationships among nodes are weaker than in the other species, but the spread suggests that songbirds prefer safety in numbers rather than small flocks. Sparrows, on the other hand, are the middle ground between the other two species in terms of distribution. Most of their centrality and cohesion scores were hinting at this idea, and the distribution is very balanced here. Rather than having a bunch of small groups, or one overwhelming component, the sparrow network has both. Moving on to weavers, we see that the results match up with our predictions during the table discussion. There are very few red nodes in this network, which means that the flow of information is weakest here. Considering how low the weavers’ edge density value was, this makes sense, and suggests that weavers are by far the most independent of the three species. They have tightly knit groups, but also a lot of weak components, which tells us they prefer small flocks and do not often go out of their way to socialize with others.
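The heat map coloring in these plots relies on min-max normalization followed by binning into a color ramp. The base R sketch below shows the idea on a few hypothetical scores. It also shows why the parentheses matter, because division binds tighter than subtraction in R. The scores and the explicit `seq()` breaks are assumptions made for illustration.

```r
# Hypothetical betweenness scores; the minimum is deliberately not zero
scores <- c(2, 4, 6, 10)

# Min-max normalization: subtract the minimum first, then divide by the range
scaled <- (scores - min(scores)) / (max(scores) - min(scores))

# Without the inner parentheses, only min(scores) is divided by the range
unscaled <- scores - min(scores) / (max(scores) - min(scores))

# Map each scaled score to one of 100 colors from yellow (low) to red (high)
color_scale <- colorRampPalette(c("yellow", "orange", "red"))(100)
bins <- cut(scaled, breaks = seq(0, 1, length.out = 101),
            include.lowest = TRUE, labels = FALSE)
node_colors <- color_scale[bins]
```

The scaled values run from 0 to 1, while the unparenthesized version stays on roughly the original scale. The lowest score maps to yellow and the highest to red.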

The second set of visualizations consists of simple box plots that show the distribution of the centrality measurements within each group. Since my first section was dedicated to visualizing shape and flow, I needed to show the distribution so that we can understand how individual nodes affect that flow. Box plots are perfect for something like this because we can see trends within the data and spot outliers. Combining our understanding of the shape with the ability to notice specific nodes within our network allows us to judge its stability. Songbird and Sparrow do not have very many outliers, and the ones they do have are commonly near each other. This tells us that their flow and shape stay consistent from node to node, which supports the previous analysis of these species. However, the betweenness distribution for Weaver shows little consistency between individual nodes. I believe this translates to weavers simply being a much more independent species than their counterparts. The lack of consistency in the distribution, together with the density score of 0.014 that we saw in the table, suggests to me that there is no “pattern” among the species on a social level. The absence of a pattern may itself indicate that individual weavers behave independently, since behavior varies so much from case to case. I did do some research on these birds to understand why my results were what they are, and found that the species does come in an extremely wide range of shapes and sizes. This probably limits the flow to some extent. The species is also notorious for creating extremely durable nests, so maybe they simply don’t need to move around much and grow accustomed to being on their own.
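The outliers that a box plot draws can also be counted directly with `boxplot.stats()` from base R, which makes the comparison between species less dependent on reading the figure. The sketch below uses hypothetical betweenness values with two extreme "bridge" birds. The numbers are invented for illustration.

```r
# Hypothetical betweenness values for one network, with two extreme "bridge" birds
toy_betweenness <- c(0, 1, 2, 2, 3, 3, 4, 5, 40, 90)

stats <- boxplot.stats(toy_betweenness)
five_numbers   <- stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
outliers       <- stats$out     # points more than 1.5 box lengths beyond the hinges
share_outliers <- length(outliers) / length(toy_betweenness)
```

Running the same call on `betweenness(g_weaver)` and the other two networks would give an outlier share per species that can be compared side by side.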

My last dashboard of four visualizations compares the two main centrality measures I have used up to this point. The reason for this is that my box plots did not clearly show what I wanted them to. The goal behind them was to fully understand the distribution of my network, but the degree distribution results stumped me. I expected the midpoint for Songbirds to be much higher than for Sparrows, given their shapes, but they were almost equivalent. This, combined with the upper quartile of sparrows going far higher than that of songbirds, was something I needed to analyze in more depth. So I chose to create a cross comparison of two key centrality measures, betweenness and total degree. Laying them out in this way provided a much clearer understanding of the distribution and why the box plots looked how they did. I forgot to consider the size of each group initially, and total degree changes depending on that size, because a bird can only have as many distinct connections as there are other birds in the network. The sparrow group was the smallest of the three, having only 500 cases instead of 1000 (songbird) or 1500 (weaver). This is likely the reason why they scored so low in terms of betweenness distribution despite having such a uniform shape. If the datasets were equivalent in cases, I think we would have seen them stay in that middle spot, instead of being so low in my centrality relationship. In the final visualization, where all three are placed together, sparrows would likely have shifted upward in a line toward the legend rather than flattening out. The rest of the data suggests that sparrows are the middle ground regarding their social structure, not being as isolated as the weaver or as tightly knit as the songbird, so I believe the distribution is skewed due to inconsistent case numbers. Regardless, I would not have been able to spot this inconsistency without using scatter plots, as they really highlight each individual case, which was the balance my dashboard needed.
To summarize, the first series was meant to “see” the network and understand the flow. The second was to see the numbers behind the distribution and analyze trends within each network. The third was to compare those trends with those of its peers to determine the results of my work.
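The size effect on degree can be made concrete with two hypothetical fully connected networks. In a simple graph (no repeated edges), degree is capped at n - 1, and `degree(..., normalized = TRUE)` divides by n - 1 to put networks of different sizes on a common scale. This is a sketch on made-up graphs. With multi-edges, as in the bird data, raw degree can exceed that cap.

```r
library(igraph)

# Hypothetical fully connected networks: every bird is linked to every other bird
g_five   <- make_full_graph(5)
g_twenty <- make_full_graph(20)

# Raw average degree is capped at n - 1, so it grows with network size
avg_degree_five   <- mean(degree(g_five))     # 4
avg_degree_twenty <- mean(degree(g_twenty))   # 19

# Normalized degree divides by n - 1, putting both networks on a 0-1 scale
avg_norm_five   <- mean(degree(g_five, normalized = TRUE))
avg_norm_twenty <- mean(degree(g_twenty, normalized = TRUE))
```

Both networks are equally "complete", which only the normalized averages show. The raw averages differ purely because of the number of birds.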

Conclusion

The goal with this project was not only to analyze network structure and species centrality across three social networks of birds, but also to understand what these differences suggest about avian social systems. For starters, we can conclude that songbirds have the most intimate network and are the most interconnected of the three species. This is supported by their having the highest overall centrality scores and by the plots of their network built from these measures. We can also conclude that weavers are the “lone wolves” of the three. It’s not that they don’t form connections, and the ones they do form are often extremely strong, but they have a heavily fragmented network and there aren’t any notable patterns across my measures. They do score extremely high in betweenness, which tells us there are some key nodes in this network that facilitate social connection, but outside of that there is no sense of centrality. This lack of centrality, to me, signifies that the species is simply advanced and individual nodes operate on free will. However, this is theoretical and not an absolute guarantee; species size and seasonal differences could be at play. Also, the overwhelming majority of my data did point to the idea that sparrows are the middle ground between these two species. We saw that their data had a very uniform shape, and they scored in the middle of every cohesion measurement I conducted. However, I cannot make a certain claim about sparrows specifically, since I was limited by their case count. My chosen measures of centrality required an equivalent or at least comparable number of cases between species, and while I did use averages to hopefully combat this, I did not think about the fact that degree is limited by the number of cases. If I only have five nodes, and I take the average of their degree, the maximum result I can get is 4.
This level of bias did limit my project to some extent, as it would be unfair to draw conclusions in comparison to other species if they had the upper hand. However, despite this hiccup, I still think the goal I set for myself was achieved. I was able to decipher how species centrality differs across these networks and what these differences suggest about each species as a whole, and to create visualizations that allow others to find their own answers to those questions as well. If I were prompted to do this project again, I would start by randomly selecting 100 or so cases from each species and analyzing them in relation to one another. Random selection reduces bias, and doing this would have made my conclusions much more secure. Also, as long as I had the time, I would have liked to include many more species of bird. My results led me to find these 3 distinct groups, where one was isolated, one was social, and one was in the middle. However, taking in a lot more species would allow me to rank them on a scale and make better conclusions about “the avian species as a whole”, which was one of my intended questions to answer. I think expanding my lens and collecting data in an unbiased manner would both make this project stronger. However, I am still proud of my efforts and believe this project did allow for some key takeaways on differing species of birds. I hope you are able to find them as well, and thank you for reading my project. -Jaden Jones, University of Washington.