Ah, Facebook! 😄 The one-stop-shop for awkward high
school reunion conversations, political arguments with distant
relatives, and cat videos.
But what lies beneath the surface of this social media
giant? 🤔
Friendships, connections, and an intricate web of network
science. 🤩
The study of social networks, or the ways in which
people are connected to each other, has become an increasingly
important field in our digital age. And what better way to
explore this field than through the lens of Facebook,
where the lines between acquaintances, friends, and frenemies are
blurrier than ever before.
So let’s dive into the depths of Facebook friendships and
uncover the hidden gems of network science that lie
beneath. 😍
The network we have chosen to analyze is commonly referred to as the
“‘0’ ego-Facebook network”.
The data for the “0” ego-Facebook network was
obtained from the Stanford Network Analysis Project
(SNAP), a collection of datasets widely used for research in
network analysis and related fields.
The dataset is publicly available and was originally collected by Julian McAuley
and Jure Leskovec at Stanford. The data was gathered from survey participants
through a Facebook app and then anonymized, and the resulting dataset
contains a snapshot of the social network of a single user with
user ID “0” at a particular point in time. 👍
The dataset includes anonymized profile features for the users in the
network, as well as their friend
connections.
Researchers who use the dataset are encouraged to cite the original source of the
data.
In social network analysis, an ego network is
a network centered around a single individual, called
the ego. The ego network typically includes the ego as
a node, and all the nodes that are directly
connected to the ego. In the case of the “0”
ego-Facebook network, it represents the social network
of a single individual whose Facebook user ID is “0”. In other
words, the network is centered around a specific Facebook
user, and includes all the friends of that
user, as well as the connections between them.
👨‍👨‍👦‍👦
The term “ego-Facebook” simply refers to the fact that
the network is centered around a single individual on the Facebook
platform.
Therefore, each node in the network corresponds to
a Facebook user, and an edge (= link) between
two nodes indicates that the corresponding users are
friends on Facebook.
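To make the definition of an ego network concrete, here is a minimal sketch on a toy graph. The six users A to F and their ties are hypothetical and are not taken from the SNAP data; the sketch only illustrates how igraph extracts an ego network, with and without the ego itself.

```r
library(igraph)

# Toy friendship graph (hypothetical data): two triangles joined by the tie C-D.
g <- graph_from_literal(A-B, B-C, A-C, C-D, D-E, E-F, D-F)

# Ego network of "C": the ego, its direct friends, and the ties among them.
ego_c <- make_ego_graph(g, order = 1, nodes = "C")[[1]]
sort(V(ego_c)$name)
ecount(ego_c)

# The same network without the ego: only the friends and the ties among them.
alters <- delete_vertices(ego_c, "C")
sort(V(alters)$name)
ecount(alters)
```

Whether the ego is kept as a node is a modelling choice; dropping it leaves only the friends and the connections between them.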
The graph is stored as directed, but friendship on Facebook is
mutual: each friendship appears as a pair of opposite edges, so whenever
one person is listed as a friend of another, the reverse edge is present
as well. The reciprocity of 1 reported below confirms this.
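Whether the ties in a directed graph are mutual can be checked directly. The sketch below uses a hypothetical three-user edge list (not the SNAP file) in which every friendship is written in both directions, and shows what that does to reciprocity, degree, and the edge count.

```r
library(igraph)

# Toy data (hypothetical): two friendships, a-b and b-c,
# each stored as a pair of opposite directed edges.
edges <- data.frame(
  from = c("a", "b", "b", "c"),
  to   = c("b", "a", "c", "b")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Share of directed edges whose reverse edge also exists.
reciprocity(g)

# mode = "all" adds in- and out-edges, so every friendship is counted twice.
degree(g, mode = "all")
# mode = "out" counts each friend once.
degree(g, mode = "out")

# Collapse each reciprocal pair into a single undirected edge.
g_undirected <- as.undirected(g, mode = "collapse")
ecount(g)
ecount(g_undirected)
```

When reciprocity is 1, the directed edge count is exactly twice the number of friendships, and the total degree is twice the number of friends.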
Why did we choose this network? 🤔
This network is interesting because it offers
insights into the structure and dynamics of real-world social
networks.
For example, by analyzing the network, one could identify key
individuals who serve as “bridges” between different
groups or clusters within the network, or study the spread of
information or behavior within the network.
Consequently, the “0” ego-Facebook network can be super relevant
for social network analysis and has been widely used as
a benchmark dataset for studying social network properties and
dynamics. 😉
I/ Global Network Analysis
a) Checking Network Properties
cat("Number of nodes: ", vcount(network), "\n")
## Number of nodes: 347
cat("Number of edges: ", ecount(network), "\n")
## Number of edges: 5038
cat("Is the network directed? ", is_directed(network), "\n")
## Is the network directed? TRUE
cat("Density of the network: ", edge_density(network), "\n")
## Density of the network: 0.04196165
cat("Reciprocity of the network: ", reciprocity(network), "\n")
## Reciprocity of the network: 1
cat("Diameter of the network: ", diameter(network, directed=is_directed(network)), "\n")
## Diameter of the network: 11
cat("Average path length: ", mean_distance(network, directed=is_directed(network)), "\n")
## Average path length: 3.752446
cat("Clustering coefficient: ", transitivity(network), "\n")
## Clustering coefficient: 0.4258694
# To compute the degree of each vertex in the network:
degree <- igraph::degree(network, mode = "all")
# The argument mode = "all" counts both incoming and outgoing edges. Because every friendship is stored as two opposite edges (reciprocity is 1), this total is twice the number of friends.
# The mean degree of the vertices in the network:
mean_degree <- mean(degree)
cat("Mean degree of the vertices in the network: ", mean_degree, "\n")
## Mean degree of the vertices in the network: 29.03746
# The standard deviation of the degree of the vertices in the network:
sd_degree <- sd(degree)
cat("Standard deviation of degree in the network: ", sd_degree, "\n")
## Standard deviation of degree in the network: 31.01494
# Counting the number of vertices in the network that have a degree of 0:
num_degree_zero <- sum(degree == 0)
cat("Number of vertices in the network that have a degree of 0: ", num_degree_zero, "\n")
## Number of vertices in the network that have a degree of 0: 14
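# Compute the global clustering coefficient (transitivity). This is the same quantity as above; type = "average" would instead give the mean of the local coefficients:
avg_cc <- transitivity(network, type="global")
cat("Global clustering coefficient: ", avg_cc, "\n")
## Global clustering coefficient: 0.4258694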
b) Analysis of components
clusters(network, mode = "weak") # information about the clusters or components in your network/graph
## $membership
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 1 1 1 1 1 1 1 1 1 1 2 3 1 1 4 1 1 5 1 1
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 1 1 1 1 1 1 1 1 1 1 1 1 6 1 1 1 7 1 1 1
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 1 6 8 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## 1 1 1 1 1 1 1 1 1 1 1 1 1 9 1 1 1 1 1 1
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
## 1 1 1 1 1 1 1 1 1 10 1 1 1 1 1 1 1 1 1 1
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1
## 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
## 1 1 1 1 10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 1
## 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220
## 1 1 1 1 1 1 1 1 12 13 1 1 1 1 14 1 1 1 1 1
## 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240
## 1 1 1 1 1 1 1 1 1 1 1 1 15 1 1 1 1 1 1 1
## 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260
## 1 1 1 16 1 1 1 1 1 1 1 1 1 1 1 15 1 1 1 1
## 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300
## 1 16 1 1 1 1 17 1 1 1 1 18 1 1 1 1 1 1 1 1
## 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 19 1 1 1 1 1
## 341 342 343 344 345 346 347
## 1 1 1 1 1 1 1
##
## $csize
## [1] 324 1 1 1 1 2 1 1 1 3 1 1 1 1 2 2 1 1 1
##
## $no
## [1] 19
clusters_name <- names(clusters(network))
cat("Name of clusters: ", clusters_name, "\n")
## Name of clusters: membership csize no
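The three fields of the result are easier to read on a small case. The following sketch uses a hypothetical six-vertex graph and also shows one common way to extract the largest component for later analysis.

```r
library(igraph)

# Toy graph (hypothetical): a triangle (1-2-3), a pair (4-5), and the isolated vertex 6.
g <- make_graph(c(1, 2, 2, 3, 1, 3, 4, 5), n = 6, directed = FALSE)

comp <- components(g, mode = "weak")
comp$no          # number of components
comp$csize       # size of each component
comp$membership  # component id of each vertex

# Keep only the largest ("giant") component.
giant_id <- which.max(comp$csize)
giant <- induced_subgraph(g, which(comp$membership == giant_id))
vcount(giant)
```

In the output above, the same reading applies: `$no` is 19, and `$csize` shows one component of 324 users plus 18 very small ones.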
c) Assortativity of our network
What is “assortativity” ? 🤔
It is a measure of the tendency of nodes in a network to be
connected to other nodes with similar or dissimilar properties.
It is a way to assess the degree of homophily or heterophily
in a network. Homophily means that nodes with similar
attributes (e.g., age, gender, occupation, interests) are more likely to
be connected to each other, while heterophily means that nodes
with different attributes are more likely to be connected.
Degree assortativity, the variant used here, quantifies this tendency
by computing the correlation between the degrees of nodes at either end of an
edge in the network.
A positive assortativity coefficient indicates that
nodes with high degree tend to be connected to other nodes with high
degree, while a negative coefficient indicates that
high-degree nodes tend to be connected to low-degree nodes.
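Two extreme toy cases may help to calibrate the sign of the coefficient. Both graphs below are hypothetical constructions, chosen only because their degree assortativity can be worked out by hand.

```r
library(igraph)

# A star: every edge links the high-degree hub to a degree-1 leaf,
# so degrees at the two ends of an edge are perfectly anti-correlated.
star <- make_star(6, mode = "undirected")
r_star <- assortativity_degree(star, directed = FALSE)

# Two separate cliques of different sizes: every edge links two nodes
# of identical degree, so the correlation is perfect.
cliques <- disjoint_union(make_full_graph(3), make_full_graph(5))
r_cliques <- assortativity_degree(cliques, directed = FALSE)

cat("Star:", r_star, " Cliques:", r_cliques, "\n")
```

Real social networks fall between these extremes, usually on the positive side.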
Let’s calculate the assortativity of our
network… 🤓
# Calculate assortativity:
es <- get.edgelist(network)
assortativity <- cor(degree[es[, 1]], degree[es[, 2]])
cat("The assortativity coefficient is equal to: ", assortativity, "\n")
## The assortativity coefficient is equal to: 0.2360392
# ...and checking the result with igraph's assortativity_degree() function:
assortativity_check_F <- assortativity_degree(network, directed=F)
assortativity_check_T <- assortativity_degree(network, directed=T)
cat("The assortativity coefficient is equal to: ", assortativity_check_F, "\n")
## The assortativity coefficient is equal to: 0.2360392
Therefore, an assortativity coefficient of 0.2360392
means that nodes in the network tend, moderately, to be connected to other
nodes with similar degrees. This is a measure of the
degree correlation in the network. 👍
In the context of Facebook, the assortativity coefficient
can provide insights into the way people connect with each other
on the platform. A positive value such as 0.236 is typical of
social networks, which tend to be assortative by degree:
well-connected individuals tend to be friends with other
well-connected individuals, and sparsely connected individuals
with each other.
Degree assortativity on its own says nothing about shared interests,
values, or demographic characteristics; whether the network contains
communities or subgroups of that kind is the question addressed by
community detection in the next section. 👨‍👨‍👦‍👦
II/ Network Community Detection
Now, imagine a vast and complex web of connections between
people, each represented by a node in a
massive network graph. This network represents the
social connections of a single individual on Facebook,
known as the “0 ego-Facebook network.” It is a world of
friendships and acquaintances, where every node represents a
person and every edge represents a connection between two people who are
friends on Facebook…
a) The Louvain Algorithm
Let’s now use the ‘Louvain algorithm’ to identify communities in our ‘0’ ego-Facebook network. 👨‍👨‍👦‍👦
The Louvain algorithm is a heuristic optimization method that maximizes a modularity score, which measures the quality of the network partition by comparing the number of edges within communities to the number of edges expected by chance. This algorithm tends to produce larger, more cohesive communities by optimizing a global measure of community structure.
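The modularity score itself can be computed by hand on a small graph. The sketch below assumes the standard definition Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j) for an undirected, unweighted graph, and checks it against igraph on a hypothetical six-node example with an obvious two-group partition.

```r
library(igraph)

# Toy graph (hypothetical): two triangles joined by the single edge 3-4.
g <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4, 4, 5, 5, 6, 4, 6), directed = FALSE)
membership <- c(1, 1, 1, 2, 2, 2)

A <- as_adjacency_matrix(g, sparse = FALSE)  # adjacency matrix A_ij
k <- degree(g)                               # degree of each node
m <- ecount(g)                               # number of edges

# delta(c_i, c_j): TRUE when nodes i and j are in the same community.
same_community <- outer(membership, membership, "==")

# Observed minus expected edges, summed over same-community pairs.
q_manual <- sum((A - outer(k, k) / (2 * m)) * same_community) / (2 * m)
q_igraph <- modularity(g, membership)

cat("Manual:", q_manual, " igraph:", q_igraph, "\n")
```

The Louvain algorithm searches for the membership vector that makes this quantity as large as possible.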
# Convert the directed network to an undirected network:
undirected_network <- as.undirected(network)
# Apply the Louvain algorithm on the undirected network:
community <- cluster_louvain(undirected_network)
# Get the number of communities:
num_communities <- length(community)
# Print the number of communities:
cat("Number of communities detected:", num_communities, "\n")
## Number of communities detected: 28
# Set vertex color based on the community membership:
V(network)$color <- rainbow(num_communities)[membership(community)]
#==== Nicely plotted ====
# Apply Fruchterman-Reingold layout to the network:
layout <- layout_with_fr(network)
# Plot the network with nodes colored according to their community:
plot(community, network, vertex.color = membership(community), layout = layout, vertex.label=NA, edge.arrow.size=0.5, edge.curved=0.2)
title(main = "Facebook Network Community Detection using the Louvain Algorithm", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook, snap.stanford.edu", side = 1, line = 4, col = "red")
To make sense of this network, we turned to community
detection, a technique that identifies clusters of nodes that are
more densely connected to each other than to nodes in other clusters.
The Louvain algorithm returned 28 communities. Eighteen of them are
simply the small disconnected components found earlier (14 isolated
users, three pairs, and one triple), because modularity optimization
never merges groups that share no edge. The giant component of 324
users is therefore split into 10 communities, which suggests a
handful of well-defined groups of friends.
Each community is represented by a unique color in our visualization, with nodes of the same color belonging to the same community. This visual representation reveals distinct groups of people who are more tightly connected to each other than to people outside their community.
Through community detection, we have gained a deeper understanding of the social structure of the ‘0’ ego-Facebook network.
It is difficult to identify key nodes in the network from this plot
alone, although some nodes near the center of the layout appear
more central than others.
The network is highly decentralized, with no clear
central node or cluster dominating the network. Instead, we observe
a complex web of interconnected communities, with some
individuals acting as “bridges” between different
groups. These insights demonstrate the power of network
community detection for identifying the overall structure and
topology of a complex social network.
A few larger communities contain most of the nodes and
point to established groups of friends. The
smaller communities have only a few nodes each; since the data
is a single snapshot, it cannot tell us whether they are
emerging, evolving, or simply peripheral.
Overall, the Louvain algorithm is a powerful tool for
identifying communities in social networks like
Facebook. It allows us to uncover hidden patterns in the data,
gain insights into the structure of the
network, and identify the groups
that play important roles within it. 👍
b) The Edge-Betweenness Algorithm
Although the ‘Louvain algorithm’ is often preferred for its
ability to produce larger and more coherent communities that better
reflect the underlying structure of the network, we have also applied
another popular community detection algorithm, called
the ‘edge-betweenness algorithm’, to the “0”
ego-Facebook network.
This algorithm works by iteratively removing the edge with the
highest “betweenness” (i.e., the edge that lies on the largest number
of shortest paths between pairs of nodes) and recomputing the
betweenness of the remaining edges. As edges are removed, the network
falls apart into more and more connected components, and these
components form a hierarchy of candidate communities. igraph then
reports the level of this hierarchy with the highest modularity.
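A small sketch may make the procedure, and the way its result should be counted, easier to follow. The six-node graph is hypothetical; the function names are the current igraph spellings of the ones used in this report.

```r
library(igraph)

# Toy graph (hypothetical): two triangles joined by the single edge 3-4 (edge number 4).
g <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4, 4, 5, 5, 6, 4, 6), directed = FALSE)

# Every shortest path between the two triangles uses the joining edge,
# so it has the highest edge betweenness and is removed first.
eb_scores <- edge_betweenness(g)
bridge <- which.max(eb_scores)

eb <- cluster_edge_betweenness(g)

# length() of the communities object counts communities...
n_communities <- length(eb)
# ...while the membership vector has one entry per vertex.
n_vertices <- length(membership(eb))

cat("Communities:", n_communities, " Vertices:", n_vertices, "\n")
```

The distinction between the two lengths matters whenever the number of communities is reported.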
# Run edge-betweenness algorithm
eb_community <- edge.betweenness.community(network)
# Get the number of communities detected.
# length() of the communities object counts communities; length(membership(eb_community))
# counts vertices instead, which always gives 347 for this network.
n_communities_eb <- length(eb_community)
# Print the number of communities:
cat("Number of communities detected:", n_communities_eb, "\n")
A word of caution about the count: a value of 347 is the number of
nodes in the network, not the number of communities. It is what
length(membership(eb_community)) returns, since the membership vector
has one entry per vertex, whereas length(eb_community) counts the
communities themselves. The true number of communities found by this
algorithm is not reported here. In the following plot, the communities
are represented by different colors. By analyzing the structure of the
network using the edge-betweenness algorithm, we can gain further
insights into the patterns of social connections within the “0”
ego-Facebook network.
The edge-betweenness algorithm can still provide valuable
insights into the network’s structure by revealing finer
details of the social connections between individuals.
# Set vertex color based on community membership
V(network)$color <- rainbow(n_communities_eb)[membership(eb_community)]
# Plot the network with community colors
plot(eb_community, network, vertex.color = V(network)$color, layout = layout, vertex.label=NA, edge.arrow.size=0.5, edge.curved=0.2)
title(main = "Facebook Community Detection using Edge-Betweenness Algorithm", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook, snap.stanford.edu", side = 1, line = 4, col = "red")
The figure of 347 associated with the edge-betweenness algorithm cannot
be compared with the 28 communities found by the Louvain algorithm,
because 347 is the number of nodes rather than a number of communities.
What the two partitions can be compared on is their modularity, which
is done in the next section.
Both algorithms necessarily set apart the small components that are not connected to the rest of the network. Within the giant component, some nodes are likely to be more important than others in terms of connecting different parts of the network. However, without further analysis, it is difficult to identify these nodes from the plot alone.
To identify key nodes, we will later need to compute
measures such as betweenness centrality or eigenvector
centrality, which can help us to identify nodes that are
important in terms of connecting different communities or serving as
hubs in the network.
Because the community count for this algorithm is not available, we
cannot conclude from it that the network is more fragmented. A single
snapshot also cannot show whether individuals are forming or leaving
communities over time. Nodes that connect several communities may well
exist and act as bridges, but confirming this requires the betweenness
centrality computed later in this report.
Overall, the edge-betweenness algorithm provides a different
perspective on the network structure compared to the Louvain
algorithm: it builds communities by cutting the edges that carry
the most shortest paths, rather than by directly optimizing
modularity. 👍
⚠ However, it is important to note that the choice of algorithm
used for community detection can significantly impact the results
obtained, and different algorithms may highlight
different aspects of the network structure.
III/ Modularity of our network
a) Comparison of modularity with a random graph
To compare the modularity of our network with that of a
comparable random graph, we generate a random graph with the
same number of nodes and edges as our original network and compute its
modularity.
We can then compare the modularity of the random graph with that of our
original network to see if our network has a higher modularity than
expected by chance.
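Since both the random graph and the Louvain algorithm involve randomness, a single draw can be misleading. The sketch below shows one way to make the comparison more robust, by fixing a seed and looking at a whole sample of random graphs. It runs on a hypothetical toy graph (four cliques joined in a ring), not on the Facebook data, and the seed and the number of draws are arbitrary choices.

```r
library(igraph)
set.seed(42)  # arbitrary seed, for reproducibility only

# Toy graph (hypothetical): four cliques of six nodes, joined in a ring.
cliques <- lapply(1:4, function(i) make_full_graph(6))
g <- Reduce(disjoint_union, cliques)
g <- add_edges(g, c(6, 7, 12, 13, 18, 19, 24, 1))

observed_q <- modularity(cluster_louvain(g))

# Modularity of 20 random graphs with the same number of nodes and edges.
random_q <- replicate(20, {
  r <- sample_gnm(n = vcount(g), m = ecount(g))
  modularity(cluster_louvain(r))
})

cat("Observed:", observed_q, "\n")
cat("Random graphs: mean", mean(random_q), " max", max(random_q), "\n")
```

Applied to the Facebook data, the same loop would use the node and edge counts of the graph that the Louvain algorithm was actually run on.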
# Compute the modularity score for the network:
mod_score <- modularity(community)
cat("The modularity score of the network is:", mod_score, "\n")
## The modularity score of the network is: 0.4562043
# Compute the modularity score for a random graph with the same number of nodes and edges:
random_graph <- sample_gnm(n = vcount(network), m = ecount(network))
random_community <- cluster_louvain(random_graph)$membership
random_mod_score <- modularity(random_graph, random_community)
cat("The modularity score for a random graph:", random_mod_score, "\n")
## The modularity score for a random graph: 0.1503969
# Compare the modularity of the random graph with that of your original network:
if (mod_score > random_mod_score) {
cat("Therefore, the modularity of the network is HIGHER than that of a comparable random graph.\n")
} else {
cat("Therefore, the modularity of the network is NOT HIGHER than that of a comparable random graph.\n")
}
## Therefore, the modularity of the network is HIGHER than that of a comparable random graph.
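The modularity of the Facebook social network (0.456)
is higher than that of the random graph (0.150), which suggests
that the network has a non-random structure and is
likely to have communities or clusters of nodes that are more
densely connected to each other than to the rest of the
network. 👨‍👨‍👦‍👦
Two caveats apply. The random graph was generated with
ecount(network) = 5038 undirected edges, whereas the undirected
Facebook network on which the Louvain algorithm ran has 2519
friendships, so the baseline is denser than the network it is
compared with; using ecount(undirected_network) would give a fairer
comparison. The comparison also rests on a single random draw.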
This is a common characteristic of social networks,
where nodes tend to be more connected to others who
share similar interests, hobbies, beliefs, or
backgrounds. 💞
The higher modularity score of the Facebook social
network also suggests that there are identifiable communities or
groups within the network. By further analyzing the structure
of the network and the characteristics of the nodes within each
community, it may be possible to gain insights into the
nature of these groups and how they interact with each
other. This information could be valuable for
a variety of purposes, such as targeted marketing,
social network analysis, or understanding
social dynamics. ✅
b) Comparison of modularity between both ‘Louvain’ & ‘Edge-betweenness’ algorithms
Comparing the modularity score of the network using the Louvain method and the Edge-betweenness method can provide some insights into the community structure of the network.
# Compute the modularity score using the Louvain method:
modularity_louvain <- modularity(community, undirected_network)
cat("The modularity score of our network using the Louvain method is equal to:", modularity_louvain, "\n")
## The modularity score of our network using the Louvain method is equal to: 0.4562043
# Compute the modularity score using edge betweenness:
edge_betweenness <- edge.betweenness.community(undirected_network)
modularity_edge_between <- modularity(edge_betweenness, undirected_network)
cat("The modularity score of our network using the Edge-betweenness method is equal to:", modularity_edge_between, "\n")
## The modularity score of our network using the Edge-betweenness method is equal to: 0.4161461
Therefore, we can see that the Louvain method produced a
higher modularity score (0.4562043) than the
Edge-betweenness method (0.4161461). This suggests that the
community structure of the network is more well-defined and
distinct when using the Louvain method. 👍
Additionally, the Louvain method is known for being more
efficient and scalable than the Edge-betweenness method, which
can be important factors to consider when working with larger
networks.
Overall, this comparison provides some insights into the community
structure of the network and can inform further analysis and
interpretation of the network’s properties and dynamics.
IV/ Individual Network Analysis
As mentioned earlier, we need to perform additional analysis such
as “betweenness centrality” or “eigenvector centrality”, which
can help us to identify nodes that are important in
terms of connecting different communities or serving as hubs in the
network. 🤝
In network analysis, positioning refers to the
arrangement of nodes in a network with respect to each
other. There are different algorithms used for positioning,
such as force-directed algorithms, spectral algorithms, and
multidimensional scaling. The goal of positioning is to
create visualizations that reveal the underlying structure and patterns
in the network, such as clusters or
communities, and to facilitate analysis and
interpretation of the network data.
a) Global Centrality
# Calculate betweenness:
# Create a network object from the adjacency matrix:
adj_matrix <- as.matrix(network)
network_betweenness <- as.network(adj_matrix)
# Compute betweenness centrality:
betweenness <- betweenness(network_betweenness)
# Not displaying the values to not pollute the page with all the results... 😉
# Calculate global and local transitivity:
# Assuming adj_matrix is the adjacency matrix of your network
network_transitivity <- graph_from_adjacency_matrix(adj_matrix, mode = "undirected")
# Compute the global transitivity:
global_transitivity <- transitivity(as.undirected(network_transitivity), type = "global")
# Not displaying the values to not pollute the page with all the results... 😉
# Compute the local transitivity:
local_transitivity <- transitivity(as.undirected(network_transitivity), type = "local")
# Not displaying the values to not pollute the page with all the results... 😉
# Calculate eigenvector centrality:
eigen_centrality <- eigen_centrality(network_transitivity)$vector
# Plot histogram of eigenvector centrality:
hist(eigen_centrality,
breaks = "Sturges",
col = "#3B5998",
main = "\nHistogram of the Eigenvector Centrality (EC)",
cex.main = 1.2,
col.main = "red",
xlab = "Eigenvector Centrality")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook, snap.stanford.edu", side = 1, line = 4, col = "red")
Based on the histogram of Eigenvector Centrality (EC), the
majority of nodes in the network have a relatively low
eigenvector centrality, with values between 0.0 and 0.1
being the most frequent. igraph scales the scores so that the
most central node has a value of 1, so most users are far less
central than the best-connected ones, and the isolated users
score 0.
The histogram also has a relatively long
tail on the right: a small number of nodes have high
eigenvector centrality values. These nodes are connected to many
other well-connected nodes and may be particularly
influential in the network.
Overall, the histogram of eigenvector centrality can give us
insights into the structure of the network and the
relative importance of different nodes within it.
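To see what these centrality scores measure on a case small enough to check by hand, here is a sketch on a hypothetical six-node graph in which two nodes clearly act as bridges.

```r
library(igraph)

# Toy graph (hypothetical): two triangles joined by the single edge 3-4.
g <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4, 4, 5, 5, 6, 4, 6), directed = FALSE)

# Betweenness: number of shortest paths between other nodes that pass through a node.
btw <- betweenness(g, directed = FALSE)

# Eigenvector centrality: high when a node's neighbors are themselves central.
ev <- eigen_centrality(g)$vector

ranking <- data.frame(
  vertex = seq_len(vcount(g)),
  betweenness = btw,
  eigenvector = round(ev, 3)
)
# Most "bridging" nodes first.
ranking[order(-ranking$betweenness), ]
```

Nodes 3 and 4 carry every path between the two triangles, so they are the only nodes with non-zero betweenness, and they also share the highest eigenvector centrality.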
b) Local Centrality
Local centrality measures help us identify the most important nodes in a network from a local perspective. In the context of Facebook, local centrality measures can help us identify individuals who are influential within specific communities or subgroups. This information can be used to better understand how information or ideas spread within a network and to design targeted interventions to influence behavior or opinions. 😉
# Calculate the overall degree distribution:
degree_distribution <- degree.distribution(undirected_network, mode = "all")
# Calculate out-degree (the mode argument is ignored for undirected graphs, so this equals the overall distribution):
outdegree_distribution <- degree.distribution(undirected_network, mode = "out")
# Calculate in-degree:
indegree_distribution <- degree.distribution(undirected_network, mode = "in")
# Not displaying the values to not pollute the page with all the results... 😉
c) Clustering Coefficient
The clustering coefficient measures how much nodes in a network tend to cluster together. In other words, it measures the degree to which nodes in a network tend to form tightly-knit groups. This can be useful for understanding the social structure of a network, such as Facebook, as it can help identify closely-knit communities within the larger network.
clustering <- transitivity(undirected_network, type="local")
# Not displaying the values to not pollute the page with all the results... 😉
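The local and global variants of the clustering coefficient answer slightly different questions, which a hand-checkable example can show. The four-node graph below is hypothetical.

```r
library(igraph)

# Toy graph (hypothetical): a triangle 1-2-3 with a pendant node 4 attached to node 3.
g <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# Local coefficient: for each node, the share of pairs of its neighbors that are connected.
# Node 4 has a single neighbor, so its coefficient is undefined (NaN).
local_cc <- transitivity(g, type = "local")

# Global coefficient (transitivity): 3 x triangles / connected triples, over the whole graph.
global_cc <- transitivity(g, type = "global")

# Mean of the local coefficients.
average_cc <- transitivity(g, type = "average")

print(local_cc)
cat("Global:", global_cc, " Average of local values:", average_cc, "\n")
```

The global value weights high-degree nodes more heavily, so it generally differs from the average of the local values.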
V/ Comparison between two networks 😉
Facebook has become one of the most popular social media
platforms in the world, connecting people from different parts of the
globe. Its unique structure, which revolves
around the concept of ‘friendship’, has attracted the
attention of network scientists. 🤓
By analyzing the patterns of connections between users,
network scientists can gain insights into the functioning of the
platform, the behavior of its users, and even the dynamics of social
relationships.
To end this report with a bang 💥, we
compare two Facebook networks (users ‘0’ and ‘348’) to delve into their
similarities and differences, and to uncover some fascinating insights
into the world of Facebook and network science.
This comparison is not only an exciting exercise in data analysis, but
also a valuable opportunity to deepen our understanding of the
mechanisms underlying social networks. Through this analysis,
we hope to shed light on the factors that shape the structure of
Facebook networks and the implications of these
structures for the users and the platform itself. 🔦
By combining the power of network science and the richness of
Facebook data, we can uncover hidden patterns and relationships that are
crucial for our understanding of the digital world.
So let’s dive into the analysis and see what we can learn
from these fascinating networks! 😍
# Read in the data for "348" ego-Facebook network:
ego_348_links <- read.table("348.edges")
ego_348_nodes <- read.table("348.feat")
# Generate a network_348 from data.frames:
network_348 <- graph.data.frame(ego_348_links, vertices=ego_348_nodes, directed=TRUE)
# Set up the plotting area to show two plots side by side:
par(mfrow=c(1,2))
# To get the plot of "Network_0":
plot(network, layout=layout_with_fr(network),
vertex.size=4, vertex.label.dist=0.5,
edge.arrow.size=0.5, vertex.label=NA,
cex.main=1.2)
title(main = "Facebook Ego Network of ID '0'", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook", side = 1, line = 4, col = "red")
# To get the plot of "Network_348":
plot(network_348, layout=layout_with_fr(network_348),
vertex.size=4, vertex.label.dist=0.5,
edge.arrow.size=0.5, vertex.label=NA,
cex.main=1.2)
title(main = "Facebook Ego Network of ID '348'", col.main = "red")
# Add the caption below the plot:
mtext("Source: SNAP | Ego-Facebook", side = 1, line = 4, col = "red")
Great! 🤩 Now that we have both networks nicely
plotted, we can perform some analyses and make
comparisons to gain valuable insights into the
structure of Facebook networks. 👍
Comparing the networks can be useful in understanding how
Facebook users form and maintain connections, and how
these connections may differ across different user groups or
demographics.
This knowledge can be helpful in designing better algorithms for
targeted advertising, improving user experience, and preventing the
spread of fake news and misinformation. ✅
Moreover, by studying the structure of Facebook networks, we can also
gain a deeper understanding of the principles of network
science and their application to various
fields such as sociology, psychology, and
economics.
a) Number of nodes
# Compare the networks using the following metrics:
# 1. Number of nodes:
n_nodes_0 <- vcount(network)
n_nodes_348 <- vcount(network_348)
cat("Number of nodes in ego network 0:", n_nodes_0, "\n")
## Number of nodes in ego network 0: 347
cat("Number of nodes in ego network 348:", n_nodes_348, "\n")
## Number of nodes in ego network 348: 227
One relevant observation is that the number of nodes in the ego
network of user ID ‘0’ (347) is larger than the number of nodes in the ego
network of user ID ‘348’ (227). This suggests that
user ‘0’ may have a larger social circle or a wider range of
social connections than user ‘348’.
However, it is important to note that the size of the ego
network may not always be a reliable indicator of the social influence
or connectivity of an individual, as the quality of
connections and their significance can vary greatly among different
individuals and networks.
Therefore, further analysis is needed to explore the structure and
properties of the two networks in more detail…
b) Number of edges
# 2. Number of edges:
n_edges_0 <- ecount(network)
n_edges_348 <- ecount(network_348)
cat("Number of edges in ego network 0:", n_edges_0, "\n")
## Number of edges in ego network 0: 5038
cat("Number of edges in ego network 348:", n_edges_348, "\n")
## Number of edges in ego network 348: 6384
Based on the number of edges, we can see that the ego network of user ID ‘348’ has more edges (6384) than the ego network of user ID ‘0’ (5038), even though it has fewer nodes (227 versus 347). Its density is therefore roughly three times higher (about 0.124 versus 0.042), which means that the users in ego network 348 are considerably more interconnected than those in ego network 0. This could be due to a variety of factors, such as the social behavior of the users or the kind of social circles the two egos belong to.
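Edge counts are only comparable once the number of nodes is taken into account. The short calculation below does this with the node and edge counts printed above, using the usual density formula for a directed graph without self-loops, m / (n (n − 1)); the helper name directed_density is introduced here for illustration only.

```r
# Density of a directed graph without self-loops: edges / possible ordered pairs.
# (directed_density is a hypothetical helper, not an igraph function.)
directed_density <- function(n_nodes, n_edges) {
  n_edges / (n_nodes * (n_nodes - 1))
}

density_0   <- directed_density(347, 5038)  # counts reported for ego network 0
density_348 <- directed_density(227, 6384)  # counts reported for ego network 348

cat("Density of ego network 0:  ", density_0, "\n")
cat("Density of ego network 348:", density_348, "\n")
cat("Ratio:", density_348 / density_0, "\n")
```

The first value matches the density reported by edge_density() at the start of the report.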
c) Degree distribution
Based on the degree distribution plots, and reading the index on the
horizontal axis as the node degree, the two networks show different
patterns. In network ‘0’, the relative frequency decreases rapidly
between index 0 and 50 and then remains low and relatively stable
until index 150. In network ‘348’, it also decreases rapidly between
index 0 and 50, but the stable tail extends until index 200. In both
networks, most degree values have a relative frequency near 0.00,
meaning that only a small share of nodes has any given degree.
Overall, these findings suggest that the two networks have
different levels of connectivity and potentially different community
structures.
Network ‘0’ shows higher values in the early indices, which means
that a larger share of its nodes have relatively few connections.
Network ‘348’, on the other hand, has a longer tail that reaches
about 200 despite its smaller number of nodes. This is consistent
with its larger number of edges and points to a greater share of
well-connected nodes.
Further analyses, such as centrality measures, are necessary to
fully understand the structural differences between these two
networks.
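The report does not show the code behind the degree distribution plots. The following is a minimal sketch of how such a distribution can be computed with igraph. It uses a toy star graph (hypothetical, not the Facebook data) so that the result can be checked by hand. On the real data, `toy` would be replaced by `network` or `network_348`.

```r
library(igraph)

# Toy stand-in for an ego network (hypothetical, not the Facebook data):
# one hub connected to four leaves
toy <- make_star(5, mode = "undirected")

deg <- degree(toy)               # degree of every node
dd  <- degree_distribution(toy)  # dd[k + 1] is the share of nodes with degree k

print(deg)
print(dd)

# Degrees that correspond to the entries of dd (the vector index minus one)
degree_axis <- seq_along(dd) - 1

# To plot against the degree itself rather than the vector index:
# plot(degree_axis, dd, type = "h", xlab = "Degree", ylab = "Share of nodes")
```

Because the first element of the returned vector belongs to degree 0, plotting the vector directly shifts the axis by one. Plotting against `degree_axis` avoids that ambiguity.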
d) Clustering Coefficient
# 4. Clustering coefficient
cluster_coef_0 <- transitivity(network)
cluster_coef_348 <- transitivity(network_348)
cat("Clustering coefficient for ego network 0:", cluster_coef_0, "\n")
## Clustering coefficient for ego network 0: 0.4258694
cat("Clustering coefficient for ego network 348:", cluster_coef_348, "\n")
## Clustering coefficient for ego network 348: 0.4902791
The ego network of user ID ‘348’ has a higher clustering
coefficient (0.490) than the ego network of user ID ‘0’
(0.426). The clustering coefficient measures how often two
nodes that share a common neighbor are also connected to each
other, so in ego network 348 the friends of a given user are
more likely to be friends with each other than in ego network 0.
This could be due to various reasons, such as the
nature of the relationships between the users, the frequency of
interactions, or the size of the networks.
In the context of Facebook, a higher clustering
coefficient can imply a higher level of social cohesion within the
network, where users tend to form tightly-knit groups and share more
common interests or characteristics.
This could be beneficial for Facebook, as it can increase user
engagement and retention and enable targeted
advertising and personalized recommendations based on users’ interests
and behaviors.
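The values above come from `transitivity()` with its default settings, which in igraph return the global clustering coefficient (closed triples divided by all connected triples). The following is a minimal sketch on a toy graph (hypothetical, not the Facebook data). It shows how this differs from averaging the local coefficients of the individual nodes, which is another common definition.

```r
library(igraph)

# Toy graph (hypothetical): a triangle A-B-C with one extra friend D attached to C
toy <- graph_from_literal(A - B, B - C, A - C, C - D)

# Global coefficient: 3 * triangles / connected triples = 3 * 1 / 5
global_cc <- transitivity(toy, type = "global")

# Local coefficient per node; nodes with fewer than two neighbors give NaN
local_cc <- transitivity(toy, type = "local")

# Average of the local coefficients, ignoring the undefined (NaN) entries
mean_local_cc <- mean(local_cc, na.rm = TRUE)

print(global_cc)
print(local_cc)
print(mean_local_cc)
```

The two summaries can differ noticeably on the same graph, so any comparison between networks should use the same definition on both sides.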
e) Betweenness Centrality
Comparing the betweenness centrality values of the two networks
gives further insight into their structure.
Based on the plotted betweenness centrality distributions, network 0
has a wider range of values than network 348: its plot ranges
from 0 to 30,000, while network 348’s plot ranges from
0 to 5,000. Part of this gap is a size effect, since network 0
has more nodes and therefore more pairs of nodes whose shortest
paths can pass through a given node. It is also consistent with
network 0 having fewer edges among more nodes, which leaves fewer
alternative routes and funnels more shortest paths through the
same intermediaries.
The two distributions also differ in shape. Network
0 has a larger number of nodes with high betweenness
centrality values, particularly in the range of 5,000 to
25,000, while network 348 has fewer nodes with
high betweenness centrality values, with most values falling
between 0 and 1,000. This suggests that network 0 relies on a
larger set of bridging nodes to hold its parts together, while
in network 348 the brokerage role is concentrated in a few
highly influential nodes.
Thus, the comparison of the betweenness centrality values and
distributions suggests that brokerage is spread over more nodes
in network 0 and concentrated in a few nodes in network 348.
It does not imply that network 0 is more densely connected: the
node and edge counts reported above show that network 348 is the
denser of the two.
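Raw betweenness values grow with the number of nodes, which makes the two plots hard to compare directly. The following is a minimal sketch of this size effect and of the `normalized = TRUE` option of igraph's `betweenness()`. It uses two toy star graphs (hypothetical, not the Facebook data). The report does not say whether its plots use raw or normalized values, so raw values are assumed here given the reported ranges.

```r
library(igraph)

# Two toy networks with the same shape but different sizes (hypothetical)
small_star <- make_star(5, mode = "undirected")
large_star <- make_star(10, mode = "undirected")

# Raw betweenness of the hub: number of shortest paths passing through it
raw_small <- max(betweenness(small_star, directed = FALSE))
raw_large <- max(betweenness(large_star, directed = FALSE))

# Normalized betweenness rescales by the number of node pairs,
# so values from networks of different sizes become comparable
norm_small <- max(betweenness(small_star, directed = FALSE, normalized = TRUE))
norm_large <- max(betweenness(large_star, directed = FALSE, normalized = TRUE))

print(c(raw_small = raw_small, raw_large = raw_large))
print(c(norm_small = norm_small, norm_large = norm_large))
```

The hub plays the same role in both stars, yet its raw score is six times higher in the larger one. After normalization both hubs score 1. Applying `betweenness(network, normalized = TRUE)` and `betweenness(network_348, normalized = TRUE)` would give a fairer basis for comparing the two ego networks.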
Conclusion:
In the context of Facebook, our comparison of two ego networks
reveals differences in their size, connectivity, and structure, with
implications for user engagement and targeted advertising. Network 0
is larger, and its betweenness centrality values span a wider range,
with more nodes acting as bridges. Network 348 is smaller but has
more edges and a higher clustering coefficient, and its betweenness
is concentrated in a few highly influential nodes. These insights
can help Facebook better understand and cater to the needs and
interests of its user base.
Report by: ADUHIRE Ange, ADETUNJI Yusuff, COSTA Matthieu, GANGLOFF Romain,
MOHANADAS Maxime & MONTIGNON Emélie, students from the ‘MSc Data
Science & Organizational Behavior’ at Burgundy School of Business
(France)
Supervised by: KOVARIK Jaromir, Professor for
“Behavioral Strategies for Business”, Burgundy School of Business
(France)