‘The Social Network Unveiled’ | Analyzing Facebook Friendships with Network Science

Introduction

Ah, Facebook! 😄 The one-stop-shop for awkward high school reunion conversations, political arguments with distant relatives, and cat videos.

But what lies beneath the surface of this social media giant? 🤔

Friendships, connections, and an intricate web of network science. 🤩

The study of social networks, or the ways in which people are connected to each other, has become an increasingly important field in our digital age. And what better way to explore this field than through the lens of Facebook, where the lines between acquaintances, friends, and frenemies are blurrier than ever before.

So let’s dive into the depths of Facebook friendships and uncover the hidden gems of network science that lie beneath. 😍

Description of the network

The network we have chosen to analyze is commonly referred to as the “‘0’ ego-Facebook network”.

The data for the “0” ego-Facebook network was obtained from the Stanford Network Analysis Project (SNAP), a collection of datasets widely used for research in network analysis and related fields.

The dataset is publicly available and was collected by Julian McAuley and Jure Leskovec at Stanford, who gathered it from survey participants through a Facebook app. The resulting dataset contains an anonymized snapshot of the social network of a single user with user ID “0” at a particular point in time. 👍

The dataset includes information on the users in the network, as well as their friend connections.
The use of this dataset is permitted under Facebook’s terms of service, and researchers are encouraged to cite the original source of the data.

In social network analysis, an ego network is a network centered around a single individual, called the ego. The ego network typically includes the ego as a node, and all the nodes that are directly connected to the ego. In the case of the “0” ego-Facebook network, it represents the social network of a single individual whose Facebook user ID is “0”. In other words, the network is centered around a specific Facebook user, and includes all the friends of that user, as well as the connections between them. 👨‍👨‍👦‍👦
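The definition above can be illustrated on a toy graph. The sketch below is not the Facebook data: the vertex names are made up, and igraph's `make_ego_graph()` is used only to show which nodes and edges an ego network keeps.

```r
library(igraph)

# Hypothetical toy friendship graph; "ego" is the focal user.
toy <- graph_from_literal(ego - a, ego - b, ego - c, a - b, c - d)

# Order-1 ego network: the ego, its direct friends, and the ties among them.
ego_net <- make_ego_graph(toy, order = 1, nodes = "ego")[[1]]

print(V(ego_net)$name)   # d is dropped: it is only a friend of a friend
print(ecount(ego_net))   # the c - d edge is dropped as well
```

Node `d` and the edge leading to it disappear because they are two steps away from the ego.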

The term “ego-Facebook” simply refers to the fact that the network is centered around a single individual on the Facebook platform.

Therefore, each node in the network corresponds to a Facebook user, and an edge (= link) between two nodes indicates that the corresponding users are friends on Facebook.

Facebook friendships are mutual, so the relationship is symmetric in practice. Our graph object is nevertheless stored as directed, with each friendship recorded as a pair of opposite edges; this is why the checks below report a directed network with a reciprocity of 1.
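Whether the ties in a directed graph object are in fact mutual can be checked with `reciprocity()`. Here is a minimal sketch on a made-up four-edge graph (not the Facebook data), which also shows what collapsing the graph to an undirected one does to the edge count.

```r
library(igraph)

# Toy directed graph (made-up names) in which every tie appears in both directions.
edges <- data.frame(from = c("a", "b", "a", "c"),
                    to   = c("b", "a", "c", "a"))
g <- graph_from_data_frame(edges, directed = TRUE)

# Fraction of directed edges that are reciprocated; 1 means all ties are mutual.
print(reciprocity(g))

# Collapsing each mutual pair into one undirected edge halves the edge count.
g_undirected <- as.undirected(g, mode = "collapse")
print(c(directed = ecount(g), undirected = ecount(g_undirected)))
```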

Why did we choose this network? 🤔

This network is interesting because it offers insights into the structure and dynamics of real-world social networks.
For example, by analyzing the network, one could identify key individuals who serve as “bridges” between different groups or clusters within the network, or study the spread of information or behavior within the network.

Consequently, the “0” ego-Facebook network can be super relevant for social network analysis and has been widely used as a benchmark dataset for studying social network properties and dynamics. 😉

Analysis

I/ Global Network Analysis

a) Checking Network Properties

cat("Number of nodes: ", vcount(network), "\n")

## Number of nodes:  347

cat("Number of edges: ", ecount(network), "\n")

## Number of edges:  5038

cat("Is the network directed? ", is_directed(network), "\n")

## Is the network directed?  TRUE

cat("Density of the network: ", edge_density(network), "\n")

## Density of the network:  0.04196165

cat("Reciprocity of the network: ", reciprocity(network), "\n")

## Reciprocity of the network:  1

cat("Diameter of the network: ", diameter(network, directed=is_directed(network)), "\n")

## Diameter of the network:  11

cat("Average path length: ", mean_distance(network, directed=is_directed(network)), "\n")

## Average path length:  3.752446

cat("Clustering coefficient: ", transitivity(network), "\n")

## Clustering coefficient:  0.4258694

# To compute the degree of each vertex in the network:
degree <- igraph::degree(network, mode = "all") 
# The argument mode = "all" counts both incoming and outgoing edges. Since every friendship is stored
# as two opposite edges (reciprocity = 1), this total degree is twice a user's number of friends in the network.

# The mean degree of the vertices in the network:
mean_degree <- mean(degree)
cat("Mean degree of the vertices in the network: ", mean_degree, "\n")

## Mean degree of the vertices in the network:  29.03746

# The standard deviation of the vertex degrees in the network:
sd_degree <- sd(degree)
cat("Standard deviation of degree in the network: ", sd_degree, "\n")

## Standard deviation of degree in the network:  31.01494

# Counting the number of vertices in the network that have a degree of 0:
num_degree_zero <- sum(degree == 0)
cat("Number of vertices in the network that have a degree of 0: ", num_degree_zero, "\n")

## Number of vertices in the network that have a degree of 0:  14

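# Compute the global clustering coefficient (transitivity). This is the ratio of closed to connected
# triplets, not the mean of the local coefficients, which would be type = "average":
avg_cc <- transitivity(network, type="global")
cat("Global clustering coefficient: ", avg_cc, "\n")

## Global clustering coefficient:  0.4258694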
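Some of the figures printed above can be cross-checked by hand from the node and edge counts alone. The sketch below uses base R only and assumes, as the reciprocity of 1 indicates, that each friendship is stored as two directed edges.

```r
# Node and edge counts as printed above.
n <- 347
m <- 5038   # directed edges; each mutual friendship is stored twice

# Density of a directed graph without self-loops: m / (n * (n - 1)).
density_check <- m / (n * (n - 1))

# Mean total degree (in + out): every directed edge adds 2 to the degree sum.
mean_degree_check <- 2 * m / n

# Mean number of distinct friends per user, assuming every edge is reciprocated.
mean_friends <- m / n

print(c(density = density_check,
        mean_degree = mean_degree_check,
        mean_friends = mean_friends))
```

The first two values match the density and mean degree reported above. The third is the mean number of friends each user has inside this ego network.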

b) Analysis of components

igraph::components(network, mode = "weak") # information about the connected components of the graph (components() is the current igraph name for clusters())

## $membership
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##   1   1   1   1   1   1   1   1   1   1   2   3   1   1   4   1   1   5   1   1 
##  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
##   1   1   1   1   1   1   1   1   1   1   1   1   6   1   1   1   7   1   1   1 
##  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
##   1   6   8   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   9   1   1   1   1   1   1 
##  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
##   1   1   1   1   1   1   1   1   1  10   1   1   1   1   1   1   1   1   1   1 
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 
##   1   1   1   1   1   1   1   1   1   1   1   1   1  11   1   1   1   1   1   1 
## 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 
##   1   1   1   1  10   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1  10   1 
## 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 
##   1   1   1   1   1   1   1   1  12  13   1   1   1   1  14   1   1   1   1   1 
## 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 
##   1   1   1   1   1   1   1   1   1   1   1   1  15   1   1   1   1   1   1   1 
## 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 
##   1   1   1  16   1   1   1   1   1   1   1   1   1   1   1  15   1   1   1   1 
## 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 
##   1  16   1   1   1   1  17   1   1   1   1  18   1   1   1   1   1   1   1   1 
## 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1  19   1   1   1   1   1 
## 341 342 343 344 345 346 347 
##   1   1   1   1   1   1   1 
## 
## $csize
##  [1] 324   1   1   1   1   2   1   1   1   3   1   1   1   1   2   2   1   1   1
## 
## $no
## [1] 19

clusters_name <- names(clusters(network))

cat("Name of clusters: ", clusters_name, "\n")

## Name of clusters:  membership csize no
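The raw `$csize` output is easier to read once summarised. The sketch below copies the component sizes from the output above and works only from that vector, using base R.

```r
# Component sizes copied from the $csize output above.
csize <- c(324, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 1)

# How many components there are of each size.
print(table(csize))

# Share of all nodes that sit in the largest (giant) component.
giant_share <- max(csize) / sum(csize)
print(round(giant_share, 3))

# Isolated nodes are the components of size 1.
print(sum(csize == 1))
```

The network therefore consists of one giant component holding about 93% of the users, 14 isolated users (the same 14 vertices of degree 0 counted earlier), and four small groups of two or three users.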

c) Assortativity of our network

What is “assortativity”? 🤔

It is a measure of the tendency of nodes in a network to be connected to other nodes with similar or dissimilar properties. It is a way to assess the degree of homophily or heterophily in a network. Homophily means that nodes with similar attributes (e.g., age, gender, occupation, interests) are more likely to be connected to each other, while heterophily means that nodes with different attributes are more likely to be connected.

Degree assortativity, the variant used here, quantifies this tendency by computing the correlation between the degrees of the nodes at either end of an edge in the network.
A positive assortativity coefficient indicates that nodes with high degree tend to be connected to other nodes with high degree, while a negative coefficient indicates that high-degree nodes tend to be connected to low-degree nodes.
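The link between a plain correlation and the assortativity coefficient can be checked on a small example. The sketch below uses a made-up undirected toy graph (not the Facebook data). For an undirected graph, each edge is listed in both directions before correlating, so that both endpoints are treated symmetrically.

```r
library(igraph)

# Hypothetical toy graph, not the Facebook data.
g <- graph_from_literal(a - b, b - c, c - d, b - d, d - e)
deg <- degree(g)

# List every undirected edge in both directions so that each endpoint
# appears once as "source" and once as "target".
el <- as_edgelist(g, names = FALSE)
src <- c(el[, 1], el[, 2])
dst <- c(el[, 2], el[, 1])

# Pearson correlation between the degrees at the two ends of each edge.
r_manual <- cor(deg[src], deg[dst])
r_igraph <- assortativity_degree(g, directed = FALSE)

print(c(manual = r_manual, igraph = r_igraph))
```

In our Facebook graph every friendship is already stored in both directions, which is why correlating the two columns of the edge list directly gives the same value as `assortativity_degree()`.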

Let’s calculate the assortativity of our network… 🤓

using the simple correlation coefficient:

# Calculate assortativity:
es <- get.edgelist(network)
assortativity <- cor(degree[es[, 1]], degree[es[, 2]])

cat("The assortativity coefficient is equal to: ", assortativity, "\n")

## The assortativity coefficient is equal to:  0.2360392

using igraph’s built-in assortativity_degree() function:

# ...using the built-in assortativity function:
assortativity_check_F <- assortativity_degree(network, directed = FALSE)
assortativity_check_T <- assortativity_degree(network, directed = TRUE)

cat("The assortativity coefficient is equal to: ", assortativity_check_F, "\n")

## The assortativity coefficient is equal to:  0.2360392

Therefore, an assortativity coefficient of 0.2360392 means that nodes in the network tend to be connected to other nodes with similar degrees. This is a measure of the degree correlation in the network. 👍

In the context of Facebook, the assortativity coefficient can provide insights into the way people connect with each other on the platform. A coefficient of 0.236 is moderately positive, as is commonly observed in social networks, and indicates a tendency for individuals to connect with others who have a similar number of connections.

Degree assortativity alone does not tell us what drives these connections. Shared interests, values, or demographic characteristics are plausible explanations, but confirming them would require the users’ attribute data. 👨‍👨‍👦‍👦

II/ Network Community Detection

Now, imagine a vast and complex web of connections between people, each represented by a node in a massive network graph. This network represents the social connections of a single individual on Facebook, known as the “0 ego-Facebook network.” It is a world of friendships and acquaintances, where every node represents a person and every edge represents a connection between two people who are friends on Facebook…

a) The Louvain Algorithm

Let’s now use the ‘Louvain algorithm’ to identify communities in our ‘0’ ego-Facebook network. 👨‍👨‍👦‍👦

The Louvain algorithm is a heuristic optimization method that maximizes a modularity score, which measures the quality of the network partition by comparing the number of edges within communities to the number of edges expected by chance. This algorithm tends to produce larger, more cohesive communities by optimizing a global measure of community structure.
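To make the modularity score concrete, here is a minimal sketch on a made-up toy graph (two triangles joined by one edge, not the Facebook data). It computes modularity by hand for the obvious two-community split and compares it with igraph's `modularity()`. The membership vector is set manually here instead of being found by Louvain, so the result is deterministic.

```r
library(igraph)

# Toy graph (made-up names): two triangles joined by a single bridge edge c - d.
g <- graph_from_literal(a - b, b - c, a - c, d - e, e - f, d - f, c - d)
membership_two_triangles <- c(1, 1, 1, 2, 2, 2)   # a, b, c | d, e, f

# Modularity by hand: for each community, the share of edges inside it
# minus the share expected from its total degree, (d_c / 2m)^2.
m <- ecount(g)                                   # 7 edges
internal_edges <- c(3, 3)                        # one triangle each
degree_sums <- tapply(degree(g), membership_two_triangles, sum)   # 7 and 7
q_manual <- sum(internal_edges / m - (degree_sums / (2 * m))^2)

# Same quantity from igraph.
q_igraph <- modularity(g, membership_two_triangles)

print(c(manual = q_manual, igraph = q_igraph))   # both equal 5/14
```

Louvain searches for the membership vector that makes this quantity as large as possible.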

# Convert the directed network to an undirected network:
undirected_network <- as.undirected(network)

# Apply the Louvain algorithm on the undirected network:
community <- cluster_louvain(undirected_network)

# Get the number of communities:
num_communities <- length(community)

# Print the number of communities:
cat("Number of communities detected:", num_communities, "\n")

## Number of communities detected: 28

# Set vertex color based on the community membership:
V(network)$color <- rainbow(num_communities)[membership(community)]

#==== Nicely plotted ====

# Apply Fruchterman-Reingold layout to the network
# (the old `area` argument is ignored by recent igraph versions, so it is omitted):
layout <- layout_with_fr(network)

# Plot the network with nodes colored according to their community
# (`col` is the vertex-color argument of the plot method for communities objects):
plot(community, network, col = membership(community), layout = layout, vertex.label=NA, edge.arrow.size=0.5, edge.curved=0.2)
title(main = "Facebook Network Community Detection using the Louvain Algorithm", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook, snap.stanford.edu", side = 1, line = 4, col = "red")

Identification of communities 👨‍👨‍👦‍👦:

To make sense of this network, we turned to community detection, a technique that identifies clusters of nodes that are more densely connected to each other than to nodes in other clusters. By using the Louvain algorithm, we were able to reveal the underlying structure of the network: 28 distinct communities of Facebook users. Part of this count comes from the 18 small disconnected components found in section I b) (14 isolated users and 4 groups of two or three users), which cannot be merged with anything else. The giant component of 324 users therefore accounts for roughly 10 well-defined communities.

Identification of key nodes 🤝:

Each community is represented by a unique color in our visualization, with nodes of the same color belonging to the same community. This visual representation allows us to see the complex web of relationships in a new light, revealing distinct groups of people who are more tightly connected to each other than to people outside their community. Through community detection, we have gained a deeper understanding of the social structure of the ‘0’ ego-Facebook network.

It is difficult to identify key nodes from this plot alone. Some nodes near the center of the layout appear more central than others, but confirming this requires the centrality measures computed in section IV.

Identification of network structure 🌉:

The network is highly decentralized, with no clear central node or cluster dominating the network. Instead, we observe a complex web of interconnected communities, with some individuals acting as “bridges” between different groups. These insights demonstrate the power of network community detection for identifying the overall structure and topology of a complex social network.

Identification of network dynamics 👌:

A few larger communities with a high number of nodes suggest established groups within the network, while the smaller communities may correspond to emerging or peripheral groups. Since the dataset is a single snapshot, this remains a hypothesis: confirming it would require observing the network at several points in time.

Overall, the Louvain algorithm is a powerful tool for identifying communities in social networks like Facebook. It allows us to uncover hidden patterns in the data, gain insights into the structure and behavior of the network, and identify key individuals and groups that play important roles within it. 👍

b) The Edge-Betweenness Algorithm

Although the ‘Louvain algorithm’ is often preferred for its ability to produce larger and more coherent communities that better reflect the underlying structure of the network, we have also applied another popular community detection algorithm, the ‘edge-betweenness algorithm’, to the “0” ego-Facebook network.

This algorithm (also known as Girvan-Newman) works by repeatedly removing the edge with the highest “betweenness” (i.e., the edge crossed by the largest number of shortest paths) and recomputing betweenness after each removal. As edges are removed, the network splits into progressively smaller components, which yields a hierarchy of partitions. The communities returned by igraph are those of the partition with the highest modularity in this hierarchy.
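A small sketch on a made-up toy graph (two triangles joined by a bridge, not the Facebook data) shows the idea and how to query the result. Note in particular that the number of communities is the `length()` of the communities object itself, whereas the `length()` of its membership vector is the number of vertices.

```r
library(igraph)

# Toy graph (made-up names): two triangles joined by a bridge edge c - d.
g <- graph_from_literal(a - b, b - c, a - c, d - e, e - f, d - f, c - d)

# The bridge carries every shortest path between the two triangles,
# so it has the highest edge betweenness and is removed first.
print(edge_betweenness(g))

eb <- cluster_edge_betweenness(g)

# Number of communities: length() of the communities object.
print(length(eb))
# Number of vertices: length() of the membership vector.
print(length(membership(eb)))
# Size of each community.
print(sizes(eb))
```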

# Run edge-betweenness algorithm
eb_community <- edge.betweenness.community(network)

# Get the number of communities detected.
# length() on a communities object counts communities; length(membership(...))
# would instead count vertices and always return vcount(network) = 347.
n_communities_eb <- length(eb_community)
# Print the number of communities:
cat("Number of communities detected:", n_communities_eb, "\n")

The communities found by this algorithm are visualized in the following plot, where each community is shown in a different color. By analyzing the structure of the network using the edge-betweenness algorithm, we can gain further insights into the patterns of social connections within the “0” ego-Facebook network.

The edge-betweenness algorithm can still provide valuable insights into the network’s structure by revealing finer details of the social connections between individuals.

# Set vertex color based on community membership
V(network)$color <- rainbow(n_communities_eb)[membership(eb_community)]

# Plot the network with community colors
plot(eb_community, network, vertex.color = V(network)$color, layout = layout, vertex.label=NA, edge.arrow.size=0.5, edge.curved=0.2)
title(main = "Facebook Community Detection using Edge-Betweenness Algorithm", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook, snap.stanford.edu", side = 1, line = 4, col = "red")

Identification of communities 👨‍👨‍👦‍👦:

The edge-betweenness algorithm gives a second partition of the network to compare with the 28 Louvain communities. A value of 347 should not be read as 347 communities: `length(membership(eb_community))` counts vertices, so it always equals the number of nodes, whereas `length(eb_community)` gives the number of communities. A partition made only of single-node communities would also have a modularity at or below zero, which does not match the score of about 0.416 obtained for this algorithm in section III b). What we can say is that edge betweenness is a divisive method that often yields finer partitions than Louvain, so a larger number of smaller communities is plausible for this 0 ego-Facebook network.

Identification of key nodes 🤝:

There are many small communities that are not well connected to the rest of the network. This suggests that there may be some nodes that are more important in terms of connecting different parts of the network. However, without further analysis, it is difficult to identify these nodes from the plot alone.

To identify key nodes, we need additional measures such as betweenness centrality or eigenvector centrality, computed later in this report, which can help us to identify nodes that are important in terms of connecting different communities or serving as hubs in the network.

Identification of network structure 🌉:

The plot suggests a more fragmented picture than the Louvain one, with many smaller communities. A few nodes appear to sit between several communities and may act as bridge nodes that help maintain the overall network structure; the betweenness centrality computed in section IV is the appropriate tool to confirm this.

Identification of network dynamics 👌:

The dataset is a single snapshot, so it cannot show individuals forming or leaving communities over time. Any statement about dynamics remains a hypothesis that would need several snapshots of the network to test.

Overall, the edge-betweenness algorithm provides a different perspective on the network structure compared to the Louvain algorithm. It tends to highlight smaller communities, giving a more fragmented picture of the network. If this picture is accurate, information or influence may not flow as easily as in a more tightly-knit network. 👍

⚠ However, it is important to note that the choice of algorithm used for community detection can significantly impact the results obtained, and different algorithms may highlight different aspects of the network structure.

III/ Modularity of our network

a) Comparison of modularity with a random graph

To compare the modularity of our network with that of a comparable random graph, we generate a random graph with the same number of nodes and edges as our original network and compute its modularity.

We can then compare the modularity of the random graph with that of our original network to see if our network has a higher modularity than expected by chance.

# Compute the modularity score for the network:
mod_score <- modularity(community)
cat("The modularity score of the network is:", mod_score, "\n")

## The modularity score of the network is: 0.4562043

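# Compute the modularity score for a random graph with the same number of nodes and edges.
# Note: ecount(network) counts each mutual friendship twice (once per direction), while
# sample_gnm() creates undirected edges by default, so this random graph has twice as many
# undirected edges as undirected_network; m = ecount(undirected_network) would match it exactly.
random_graph <- sample_gnm(n = vcount(network), m = ecount(network))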

random_community <- cluster_louvain(random_graph)$membership

random_mod_score <- modularity(random_graph, random_community)
cat("The modularity score for a random graph:", random_mod_score, "\n")

## The modularity score for a random graph: 0.1503969

# Compare the modularity of the random graph with that of your original network:
if (mod_score > random_mod_score) {
  cat("Therefore, the modularity of the network is HIGHER than that of a comparable random graph.\n")
} else {
  cat("Therefore, the modularity of the network is NOT HIGHER than that of a comparable random graph.\n")
}

## Therefore, the modularity of the network is HIGHER than that of a comparable random graph.

We notice that the modularity of the Facebook social network is higher than that of a comparable random graph. This suggests that the network has a non-random structure and is likely to have communities or clusters of nodes that are more densely connected to each other than to the rest of the network. 👨‍👨‍👦‍👦

This is a common characteristic of social networks, where nodes tend to be more connected to others who share similar interests, hobbies, beliefs, or backgrounds. 💞

The higher modularity score of the Facebook social network also suggests that there are identifiable communities or groups within the network. By further analyzing the structure of the network and the characteristics of the nodes within each community, it may be possible to gain insights into the nature of these groups and how they interact with each other. This information could be valuable for a variety of purposes, such as targeted marketing, social network analysis, or understanding social dynamics. ✅
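A single random graph gives only one draw of the baseline, so the comparison can be made more robust by averaging over several random graphs. The sketch below illustrates this on a made-up toy graph (two cliques joined by one edge) and not on the Facebook data. The seed and the number of repetitions (20) are arbitrary choices.

```r
library(igraph)
set.seed(42)   # arbitrary seed, only to make the sketch reproducible

# Toy "observed" graph: two 6-node cliques joined by one edge.
observed <- add_edges(disjoint_union(make_full_graph(6), make_full_graph(6)), c(1, 7))
observed_mod <- modularity(observed, rep(1:2, each = 6))

# Baseline: Louvain modularity of 20 random graphs with the same size.
random_mods <- replicate(20, {
  random_graph <- sample_gnm(n = vcount(observed), m = ecount(observed))
  modularity(cluster_louvain(random_graph))
})

print(observed_mod)
print(summary(random_mods))
```

For the report itself, `observed` would be replaced by `undirected_network` and `observed_mod` by the Louvain score computed above.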

b) Comparison of modularity between both ‘Louvain’ & ‘Edge-betweenness’ algorithms

Comparing the modularity score of the network using the Louvain method and the Edge-betweenness method can provide some insights into the community structure of the network.

# Compute the modularity score using the Louvain method:
modularity_louvain <- modularity(community, undirected_network)
cat("The modularity score of our network using the Louvain method is equal to:", modularity_louvain, "\n")

## The modularity score of our network using the Louvain method is equal to: 0.4562043

# Compute the modularity score using edge betweenness:
edge_betweenness <- edge.betweenness.community(undirected_network)
modularity_edge_between <- modularity(edge_betweenness, undirected_network)
cat("The modularity score of our network using the Edge-betweenness method is equal to:", modularity_edge_between, "\n")

## The modularity score of our network using the Edge-betweenness method is equal to: 0.4161461

Therefore, we can see that the Louvain method produced a higher modularity score (0.4562043) than the Edge-betweenness method (0.4161461). This suggests that the community structure of the network is better defined and more distinct when using the Louvain method. 👍

Additionally, the Louvain method is known for being more efficient and scalable than the Edge-betweenness method, which can be important factors to consider when working with larger networks.

Overall, this comparison provides some insights into the community structure of the network and can inform further analysis and interpretation of the network’s properties and dynamics.

IV/ Individual Network Analysis

As mentioned earlier, we need to perform additional analysis such as “betweenness centrality” or “eigenvector centrality”, which can help us to identify nodes that are important in terms of connecting different communities or serving as hubs in the network. 🤝

In network analysis, positioning refers to the arrangement of nodes in a network with respect to each other. There are different algorithms used for positioning, such as force-directed algorithms, spectral algorithms, and multidimensional scaling. The goal of positioning is to create visualizations that reveal the underlying structure and patterns in the network, such as clusters or communities, and to facilitate analysis and interpretation of the network data.

a) Global Centrality

Betweenness centrality 🎯: This measures the number of times a node lies on the shortest path between two other nodes. Higher betweenness centrality means a node is more important for the network’s communication flow. We can look at the nodes with the highest betweenness centrality to identify key players in the network.

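# Calculate betweenness:
    # Build the adjacency matrix and convert it to a network object
    # (as.network() comes from the network package, assumed to be installed):
adj_matrix <- as.matrix(as_adjacency_matrix(network))
network_betweenness <- network::as.network(adj_matrix)
    # Compute betweenness centrality with sna, which works on network objects
    # (igraph::betweenness(network) computes the same kind of measure directly on the igraph object):
betweenness <- sna::betweenness(network_betweenness)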

# Not displaying the values to not pollute the page with all the results... 😉
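Since the values are not displayed, a tiny example may help to see what the measure captures. The sketch below uses igraph's own `betweenness()` on a made-up five-node path (not the Facebook data), where the counts can be verified by hand.

```r
library(igraph)

# Toy path graph a - b - c - d - e (made-up names).
g <- graph_from_literal(a - b, b - c, c - d, d - e)

# For each node, count the shortest paths between other pairs that pass through it.
bc <- betweenness(g, directed = FALSE)
print(bc)
```

The middle node `c` lies on the shortest paths of four pairs (a-d, a-e, b-d, b-e), its neighbours `b` and `d` on three each, and the two end nodes on none.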

Global transitivity 🔺: This is the ratio of closed triplets (three times the number of triangles) to all connected triplets in the network. It measures how strongly the nodes cluster together: a higher global transitivity means that two friends of the same person are more likely to be friends themselves, i.e., a more tightly knit network.

# Calculate global and local transitivity:
    # Rebuild an undirected graph from adj_matrix, the adjacency matrix of the network:
network_transitivity <- graph_from_adjacency_matrix(adj_matrix, mode = "undirected")
    # Compute the global transitivity (the graph is already undirected):
global_transitivity <- transitivity(network_transitivity, type = "global")

# Not displaying the values to not pollute the page with all the results... 😉

Local transitivity 🔺: This measures the proportion of triangles around each node. A higher local transitivity means a node is more likely to be in a cluster of tightly connected nodes.

# Compute the local transitivity (one value per node; the graph is already undirected):
local_transitivity <- transitivity(network_transitivity, type = "local")

# Not displaying the values to not pollute the page with all the results... 😉
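The difference between the global and the local measure is easiest to see on a small case. The following sketch uses a made-up toy graph (a triangle with one pendant node, not the Facebook data) where both can be worked out by hand.

```r
library(igraph)

# Toy graph: triangle a - b - c plus a pendant node d attached to c.
g <- graph_from_literal(a - b, b - c, a - c, c - d)

# Global: 3 * triangles / connected triplets = 3 * 1 / 5.
global_t <- transitivity(g, type = "global")

# Local: share of a node's neighbour pairs that are themselves linked.
# Nodes with fewer than two neighbours (d) get NaN.
local_t <- transitivity(g, type = "local")

print(global_t)
print(local_t)
```

Nodes `a` and `b` have a local value of 1 and node `c` has 1/3, while the global value is 0.6. A mean of local values and the global transitivity are therefore different quantities in general.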

Eigenvector centrality 🎯🤝: This measures a node’s influence in the network based on the centrality of its neighbors. A node with high eigenvector centrality is connected to other nodes with high centrality, and thus has greater influence within the network.

# Calculate eigenvector centrality:
eigen_centrality <- eigen_centrality(network_transitivity)$vector

# Plot histogram of eigenvector centrality:
hist(eigen_centrality, 
     breaks = "Sturges",
     col = "#3B5998", 
     main = "\nHistogram of the Eigenvector Centrality (EC)", 
     cex.main = 1.2, 
     col.main = "red",
     xlab = "Eigenvector Centrality")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook, snap.stanford.edu", side = 1, line = 4, col = "red")

Based on the histogram of Eigenvector Centrality (EC), the majority of nodes in the network have a relatively low eigenvector centrality, with values between 0.0 and 0.1 being the most frequent. Influence, as measured by EC, is therefore concentrated in a minority of nodes.

The histogram also has a relatively long tail on the right, with a few nodes reaching very high eigenvector centrality values. These nodes are connected to other well-connected nodes and may be particularly important in the network, as they have the potential to strongly influence other nodes.

Overall, the histogram of eigenvector centrality can give us insights into the structure of the network and the relative importance of different nodes within it.
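To see how the scores behave, here is a minimal sketch on a toy star graph (not the Facebook data) in which the values can be derived by hand: the hub scores 1 and each leaf scores 0.5.

```r
library(igraph)

# Toy star graph: vertex 1 is the hub, vertices 2-5 are leaves.
g <- make_star(5, mode = "undirected")

# By default the scores are scaled so that the largest one equals 1.
ec <- eigen_centrality(g)$vector
print(round(ec, 3))
```

Because of this scaling, the values on the x-axis of the histogram are relative to the most central node in the network.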

b) Local Centrality

Local centrality measures help us identify the most important nodes in a network from a local perspective. In the context of Facebook, local centrality measures can help us identify individuals who are influential within specific communities or subgroups. This information can be used to better understand how information or ideas spread within a network and to design targeted interventions to influence behavior or opinions. 😉

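# Calculate the degree distribution of the undirected network:
degree_distribution <- degree.distribution(undirected_network, mode = "all")

# The mode argument is ignored for undirected graphs, so the directed graph is used for
# out- and in-degree. Because every edge is reciprocated, the two distributions coincide.

# Calculate out-degree:
outdegree_distribution <- degree.distribution(network, mode = "out")

# Calculate in-degree:
indegree_distribution <- degree.distribution(network, mode = "in")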

# Not displaying the values to not pollute the page with all the results... 😉
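As the values are not printed, the following sketch shows on a made-up toy graph (not the Facebook data) how to read the vector returned by igraph's degree distribution function: element k + 1 is the share of nodes with degree k.

```r
library(igraph)

# Toy graph: triangle a - b - c plus a pendant node d attached to c.
g <- graph_from_literal(a - b, b - c, a - c, c - d)
print(degree(g))   # a = 2, b = 2, c = 3, d = 1

# Element k + 1 of the result is the share of nodes with degree k.
dd <- degree_distribution(g)
names(dd) <- 0:(length(dd) - 1)
print(dd)

# On an undirected graph the mode argument has no effect.
print(identical(degree_distribution(g, mode = "in"),
                degree_distribution(g, mode = "out")))
```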

c) Clustering Coefficient

The clustering coefficient measures how much nodes in a network tend to cluster together. In other words, it measures the degree to which nodes in a network tend to form tightly-knit groups. This can be useful for understanding the social structure of a network, such as Facebook, as it can help identify closely-knit communities within the larger network.

clustering <- transitivity(undirected_network, type="local")

# Not displaying the values to not pollute the page with all the results... 😉

V/ Comparison between two networks 😉

Facebook has become one of the most popular social media platforms in the world, connecting people from different parts of the globe. Its unique structure, which revolves around the concept of ‘friendship’, has attracted the attention of network scientists. 🤓

By analyzing the patterns of connections between users, network scientists can gain insights into the functioning of the platform, the behavior of its users, and even the dynamics of social relationships.

To end this report with a bang 💥, we compare two Facebook networks (users ‘0’ and ‘348’) to delve into their similarities and differences, and to uncover some fascinating insights into the world of Facebook and network science.

This comparison is not only an exciting exercise in data analysis, but also a valuable opportunity to deepen our understanding of the mechanisms underlying social networks. Through this analysis, we hope to shed light on the factors that shape the structure of Facebook networks and the implications of these structures for the users and the platform itself. 🔦

By combining the power of network science and the richness of Facebook data, we can uncover hidden patterns and relationships that are crucial for our understanding of the digital world.

So let’s dive into the analysis and see what we can learn from these fascinating networks! 😍

# Read in the data for "348" ego-Facebook network:
ego_348_links <- read.table("348.edges")
ego_348_nodes <- read.table("348.feat")

# Generate a network_348 from data.frames:
network_348 <- graph.data.frame(ego_348_links, vertices=ego_348_nodes, directed=TRUE)

# Set up the plotting area to show two plots side by side:
par(mfrow=c(1,2))

# To get the plot of "Network_0":
plot(network, layout=layout_with_fr(network), 
     vertex.size=4, vertex.label.dist=0.5, 
     edge.arrow.size=0.5, vertex.label=NA,
     cex.main=1.2)
title(main = "Facebook Ego Network of ID '0'", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook", side = 1, line = 4, col = "red")

# Plot network '348':
plot(network_348, layout=layout_with_fr(network_348),
     vertex.size=4, vertex.label=NA,
     edge.arrow.size=0.5)
title(main = "Facebook Ego Network of ID '348'", col.main = "red", cex.main = 1.2)
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook", side = 1, line = 4, col = "red")

Great! 🤩 Now that we have both networks nicely plotted, we can perform some analyses and make comparisons to gain valuable insights into the structure of Facebook networks. 👍

Comparing the networks can be useful in understanding how Facebook users form and maintain connections, and how these connections may differ across different user groups or demographics.

This knowledge can be helpful in designing better algorithms for targeted advertising, improving user experience, and preventing the spread of fake news and misinformation. ✅

Moreover, by studying the structure of Facebook networks, we can also gain a deeper understanding of the principles of network science and their application to various fields such as sociology, psychology, and economics.

a) Number of nodes

# Compare the networks using the following metrics:
# 1. Number of nodes:
n_nodes_0 <- vcount(network)
n_nodes_348 <- vcount(network_348)
cat("Number of nodes in ego network 0:", n_nodes_0, "\n")

## Number of nodes in ego network 0: 347

cat("Number of nodes in ego network 348:", n_nodes_348, "\n")

## Number of nodes in ego network 348: 227

One relevant observation is that the ego network of user ID ‘0’ contains more nodes (347) than the ego network of user ID ‘348’ (227). This suggests that user ‘0’ may have a larger social circle or a wider range of social connections than user ‘348’.

However, it is important to note that the size of the ego network may not always be a reliable indicator of the social influence or connectivity of an individual, as the quality of connections and their significance can vary greatly among different individuals and networks.

Therefore, further analysis is needed to explore the structure and properties of the two networks in more detail…

b) Number of edges

# 2. Number of edges:
n_edges_0 <- ecount(network)
n_edges_348 <- ecount(network_348)
cat("Number of edges in ego network 0:", n_edges_0, "\n")

## Number of edges in ego network 0: 5038

cat("Number of edges in ego network 348:", n_edges_348, "\n")

## Number of edges in ego network 348: 6384

The ego network of user ID ‘348’ has more edges (6384) than the ego network of user ID ‘0’ (5038), even though it has fewer nodes (227 versus 347). This suggests that the users in ego network 348 are, on average, more interconnected than those in ego network 0. This could be due to a variety of factors such as the size of the network, the social behavior of the users, or the type of content shared on the platform.
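To put the edge counts in perspective, they can be related to the number of edges that could possibly exist (the density of the network). The following is a minimal sketch in base R that uses the node and edge counts reported above. It assumes a directed graph without self-loops, in line with the directed=TRUE setting used when building the graphs. The helper name directed_density is our own and not part of igraph.

```r
# Density of a directed graph without self-loops:
# observed edges divided by the n * (n - 1) possible ordered pairs.
# `directed_density` is a hypothetical helper written for this illustration.
directed_density <- function(n_nodes, n_edges) {
  n_edges / (n_nodes * (n_nodes - 1))
}

# Node and edge counts as reported in sections a) and b):
density_0   <- directed_density(347, 5038)
density_348 <- directed_density(227, 6384)

cat("Density of ego network 0:", round(density_0, 3), "\n")
cat("Density of ego network 348:", round(density_348, 3), "\n")
```

With these counts, the density is about 0.042 for network 0 and about 0.124 for network 348, so network 348 is roughly three times as dense. On the graph objects themselves, igraph's edge_density() should report the same quantity.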

c) Degree distribution

Based on the degree distribution plots, the two networks follow a similar overall pattern but differ in their tails. In both networks, the relative frequency drops rapidly between degree 0 and about 50 and then levels off. The tail, however, reaches a degree of about 150 in network ‘0’ and about 200 in network ‘348’. In both networks, most degree values have a relative frequency close to 0.00, which means that any given degree, especially a high one, is shared by only a few nodes.

Overall, these findings suggest that the two networks have different levels of connectivity and potentially different community structures.
The longer tail of network ‘348’ suggests that it contains some nodes with higher degrees than any node in network ‘0’, which is consistent with its larger number of edges spread over fewer nodes.
Network ‘0’, on the other hand, appears to have its nodes concentrated at lower degrees.

It is important to note that further analyses, such as centrality measures, would be necessary to fully understand the structural differences between these two networks.
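For readers who want to reproduce this kind of plot, a degree distribution can be computed along the following lines. This is a minimal sketch in base R on a small, made-up undirected edge list (toy_edges is hypothetical and not taken from the SNAP data). It counts how often each node appears as an edge endpoint and then turns these degrees into relative frequencies.

```r
# Hypothetical toy edge list (undirected); NOT taken from the SNAP data.
toy_edges <- data.frame(from = c(1, 1, 1, 2, 3),
                        to   = c(2, 3, 4, 3, 5))
n_nodes <- 5

# Degree of each node = number of edge endpoints it appears in.
deg <- tabulate(c(toy_edges$from, toy_edges$to), nbins = n_nodes)

# Relative frequency of every degree value from 0 up to the maximum degree.
deg_dist <- as.numeric(table(factor(deg, levels = 0:max(deg)))) / n_nodes
names(deg_dist) <- 0:max(deg)

print(deg)
print(deg_dist)
```

The first entry of deg_dist is the share of nodes with degree 0, the second the share with degree 1, and so on. igraph's degree_distribution() returns a vector with the same layout, which can be passed directly to plot().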

d) Clustering Coefficient

# 4. Clustering coefficient
cluster_coef_0 <- transitivity(network)
cluster_coef_348 <- transitivity(network_348)
cat("Clustering coefficient for ego network 0:", cluster_coef_0, "\n")

## Clustering coefficient for ego network 0: 0.4258694

cat("Clustering coefficient for ego network 348:", cluster_coef_348, "\n")

## Clustering coefficient for ego network 348: 0.4902791

The ego network of user ID ‘348’ has a higher clustering coefficient (0.490) than the ego network of user ID ‘0’ (0.426). This indicates that, in ego network 348, two users who share a common friend are more likely to be friends with each other than in ego network 0.

This could be due to various reasons, such as the nature of the relationships between the users, the frequency of interactions, or the size of the networks.

In the context of Facebook, a higher clustering coefficient can imply a higher level of social cohesion within the network, where users tend to form tightly-knit groups and share more common interests or characteristics.

This could be beneficial for Facebook as it can increase user engagement and retention, as well as enable targeted advertising and personalized recommendations based on users’ interests and behaviors.
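To make the clustering coefficient more concrete, here is a small worked example in base R. It is a sketch on a made-up graph (a triangle with one extra node attached), not on the Facebook data. It computes the global clustering coefficient as the number of closed triples divided by the number of connected triples, which is the definition behind igraph's transitivity() with its default type = "global".

```r
# Toy undirected graph: a triangle (1-2-3) plus a pendant node 4 attached to node 3.
edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4))

# Build the symmetric adjacency matrix.
A <- matrix(0, nrow = 4, ncol = 4)
A[edges] <- 1
A[edges[, 2:1]] <- 1

deg <- rowSums(A)

# trace(A^3) counts every triangle 6 times, so trace / 2 = 3 * (number of triangles),
# i.e. the number of closed triples.
closed_triples <- sum(diag(A %*% A %*% A)) / 2

# Each node with degree d is the centre of d * (d - 1) / 2 connected triples.
connected_triples <- sum(deg * (deg - 1) / 2)

global_cc <- closed_triples / connected_triples
cat("Global clustering coefficient:", global_cc, "\n")
```

In this toy graph there are 5 connected triples, 3 of which are closed, so the coefficient is 0.6. The values 0.426 and 0.490 reported above can be read the same way: roughly 43% and 49% of all "friend of a friend" triples are closed into triangles.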

e) Betweenness Centrality

The comparison of the betweenness centrality values for the two networks reveals interesting insights into the structure of the networks.

Based on the plotted betweenness centrality distributions, network 0 has a wider range of betweenness centrality values than network 348: its plot ranges from 0 to 30,000, while network 348’s plot ranges from 0 to 5,000. A likely reason is that network 0 has more nodes but fewer edges. In a larger and sparser network, shortest paths tend to be longer, and more of them pass through the same intermediate nodes.

Another notable difference between the two networks is the distribution of betweenness centrality values. Network 0 has a larger number of nodes with high betweenness centrality values, particularly in the range of 5,000 to 25,000, while network 348 has fewer nodes with high betweenness centrality values, with most values falling between 0 and 1,000. This suggests that network 0 contains more bridging nodes that sit on many shortest paths between otherwise weakly connected parts of the network, while in network 348 only a few nodes stand out.

Thus, the comparison of the betweenness centrality values and distributions suggests that network 0, the larger and sparser of the two, relies more heavily on bridging nodes, while network 348, the smaller and denser one, offers many alternative routes between its members, so that individual nodes are less critical for holding the network together.
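As a rough illustration of how betweenness centrality relates to network structure, the sketch below computes it in base R for two made-up five-node graphs: a star (sparse, everything goes through the centre) and a complete graph (dense, every pair is directly connected). It follows the idea of Brandes' algorithm for unweighted, undirected graphs. The function betweenness_undirected is our own illustrative helper, not the implementation used by igraph's betweenness().

```r
# Betweenness centrality for an unweighted, undirected graph given as an
# adjacency list (Brandes-style accumulation). Illustrative helper only.
betweenness_undirected <- function(adj_list) {
  n <- length(adj_list)
  bc <- numeric(n)
  for (s in seq_len(n)) {
    d <- rep(-1, n); d[s] <- 0            # BFS distance from s (-1 = unvisited)
    sigma <- numeric(n); sigma[s] <- 1    # number of shortest paths from s
    preds <- vector("list", n)            # predecessors on shortest paths
    visit_order <- integer(0)
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visit_order <- c(visit_order, v)
      for (w in adj_list[[v]]) {
        if (d[w] < 0) {                   # first time we reach w
          d[w] <- d[v] + 1
          queue <- c(queue, w)
        }
        if (d[w] == d[v] + 1) {           # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Back-propagate path dependencies, farthest nodes first.
    delta <- numeric(n)
    for (w in rev(visit_order)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) bc[w] <- bc[w] + delta[w]
    }
  }
  bc / 2  # every unordered pair was counted once from each endpoint
}

# Star: node 1 is connected to nodes 2..5, which are not connected to each other.
star <- list(2:5, 1L, 1L, 1L, 1L)
# Complete graph: every node is connected to every other node.
complete <- lapply(1:5, function(i) setdiff(1:5, i))

bc_star <- betweenness_undirected(star)
bc_complete <- betweenness_undirected(complete)
print(bc_star)
print(bc_complete)
```

In the star, the centre lies on the shortest path between all 6 pairs of outer nodes, so it has a betweenness of 6 and every other node has 0. In the complete graph, no node lies between any pair, so all values are 0. Real networks fall between these two extremes, but the example shows why a sparser network can produce much larger betweenness values than a denser one.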

Conclusion:
In the context of Facebook, our comparison of two ego networks reveals differences in their size, connectivity, and structure, with implications for user engagement and targeted advertising. Network 0 is larger (347 nodes) and shows a wider range of betweenness centrality, which points to a sparser structure that relies on a number of bridging nodes. Network 348 is smaller (227 nodes) but has more edges (6384 versus 5038) and a higher clustering coefficient (0.490 versus 0.426), which points to a denser, more tightly knit structure. These insights can help Facebook better understand and cater to the needs and interests of its user base.

Report by: ADUHIRE Ange, ADETUNJI Yusuff, COSTA Matthieu, GANGLOFF Romain, MOHANADAS Maxime & MONTIGNON Emélie, students from the ‘MSc Data Science & Organizational Behavior’ at Burgundy School of Business (France)

Supervised by: KOVARIK Jaromir, Professor for “Behavioral Strategies for Business”, Burgundy School of Business (France)