Network Analysis Exercise

Strategic and Competitive Intelligence
Master Degree in Data Science and Business Informatics
Università di Pisa

Author

Irene Spada

This exercise introduces the fundamentals of network analysis using the tidygraph and ggraph packages in R. Building on the text mining concepts explored in the previous lecture—such as tokenization, n-grams, and word correlations—we now shift our focus toward network structures derived from relational data.

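You will learn how to:

  • Convert a data frame (edge list) into a graph object.

  • Compute centrality measures and detect communities.

  • Visualize the graph with ggraph.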

Preliminary step

# Load necessary libraries

library(tidyverse) #for data manipulation
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidygraph) #for graph manipulation

Attaching package: 'tidygraph'

The following object is masked from 'package:stats':

    filter
library(ggraph) #for graph visualization

Step 1: Convert a Data Frame to a Graph

We first define an edge list representing connections between individuals. Next, we use as_tbl_graph() to convert the edge list into a tbl_graph object, which is a tidy representation of a graph suitable for analysis with tidygraph.

# Create a sample edge list data frame

edges <- tibble::tibble(
  from = c("Alice", "Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace"),
  to   = c("Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Alice", "Bob")
)

# Convert the edge list into a tbl_graph object

graph <- as_tbl_graph(edges, directed = FALSE)

# View the graph structure

graph
# A tbl_graph: 7 nodes and 8 edges
#
# An undirected simple graph with 1 component
#
# Node Data: 7 × 1 (active)
  name 
  <chr>
1 Alice
2 Bob  
3 Carol
4 Dave 
5 Eve  
6 Frank
7 Grace
#
# Edge Data: 8 × 2
   from    to
  <int> <int>
1     1     2
2     1     3
3     2     4
# ℹ 5 more rows
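The printed summary shows that the edge table stores integer indices rather than names. The following sketch pulls the node and edge tables out separately and maps the indices back to names. The object names node_tbl, edge_tbl and edge_named are illustrative and not part of the exercise.

```r
library(dplyr)
library(tidygraph)

# Same sample edge list as above
edges <- tibble::tibble(
  from = c("Alice", "Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace"),
  to   = c("Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Alice", "Bob")
)
graph <- as_tbl_graph(edges, directed = FALSE)

# Node table: one row per unique name found in `from` or `to`
node_tbl <- graph %>%
  activate(nodes) %>%
  as_tibble()

# Edge table: `from` and `to` are integer row positions in the node table
edge_tbl <- graph %>%
  activate(edges) %>%
  as_tibble()

# Map the integer positions back to names for readability
edge_named <- edge_tbl %>%
  mutate(
    from_name = node_tbl$name[from],
    to_name   = node_tbl$name[to]
  )

edge_named
```

Working on one table at a time is the core idea behind activate(): the same dplyr verbs apply to nodes or edges depending on which table is active.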

Step 2: Compute Centrality Measures and Detect Communities

We first switch to node-level operations with activate(nodes). Next, centrality_degree() computes the degree centrality of each node, that is, the number of edges attached to it. Finally, group_louvain() detects communities with the Louvain algorithm, which groups nodes so as to maximize the modularity score of the resulting partition. It returns one integer group ID per node, which we store in the community column.

# Compute centrality and communities
graph <- graph %>%
  activate(nodes) %>% # Focus on node data
  mutate( degree = centrality_degree(), # Degree centrality 
          community = group_louvain() # Community detection 
          )

# View the updated node data
graph %>% as_tibble()
# A tibble: 7 × 3
  name  degree community
  <chr>  <dbl>     <int>
1 Alice      3         1
2 Bob        3         2
3 Carol      2         1
4 Dave       2         3
5 Eve        2         1
6 Frank      2         3
7 Grace      2         2
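As a sanity check, degree centrality in an undirected graph can be reproduced by counting how often each name appears at either end of an edge. The sketch below compares that manual count with centrality_degree(). The column name degree_manual and the call to set.seed() are additions for illustration. The seed is included on the assumption that community labels from the Louvain algorithm may differ between runs, depending on the installed igraph version.

```r
library(dplyr)
library(tidygraph)

# Same sample edge list as above
edges <- tibble::tibble(
  from = c("Alice", "Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace"),
  to   = c("Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Alice", "Bob")
)

# Degree by hand: count appearances of each name across both edge endpoints
degree_by_hand <- tibble::tibble(name = c(edges$from, edges$to)) %>%
  group_by(name) %>%
  summarise(degree_manual = n())

# Assumption: fixing the seed keeps the community labels stable across runs
set.seed(123)

node_tbl <- as_tbl_graph(edges, directed = FALSE) %>%
  activate(nodes) %>%
  mutate(
    degree    = centrality_degree(),
    community = group_louvain()
  ) %>%
  as_tibble() %>%
  left_join(degree_by_hand, by = "name")

# The two degree columns should agree for every node
node_tbl

# Number of nodes in each detected community
node_tbl %>% count(community)
```

The manual count only matches because the graph is undirected and has no self-loops. In a directed graph, centrality_degree() takes a mode argument to choose between incoming and outgoing edges.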

Step 3: Visualize the Graph with ggraph

First, we initialize the plot with ggraph(), specifying the layout. We then add the remaining layers and visual settings:

  • geom_edge_link() adds edges between nodes.

  • geom_node_point() plots nodes, sizing them by their degree centrality and coloring them by community membership.

  • geom_node_text() adds labels to the nodes.

  • theme_void() removes background annotations for a cleaner look.

  • labs() adds a title to the plot.

# Visualize the graph

ggraph(graph, layout = "fr") + # Use Fruchterman-Reingold layout
  geom_edge_link(alpha = 0.8) + # Draw edges with some transparency
  geom_node_point(aes(size = degree, color = as.factor(community))) + # Nodes sized by degree and colored by community 
  geom_node_text(aes(label = name), repel = TRUE, size = 3) + # Add node labels
  theme_void() + # Clean theme without axes
  labs(title = "Network Graph with Centrality and Community Detection")
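The Fruchterman-Reingold layout starts from random node positions, so the picture can look different each time the code runs. A minimal sketch for getting a repeatable plot is to fix the random seed before calling ggraph(). The seed value, the object name p and the file name in the commented ggsave() line are arbitrary choices made for illustration.

```r
library(tidygraph)
library(ggraph)
library(ggplot2)

# Same sample edge list as above
edges <- tibble::tibble(
  from = c("Alice", "Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace"),
  to   = c("Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Alice", "Bob")
)

graph <- as_tbl_graph(edges, directed = FALSE) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())

# Fix the seed so the force-directed layout is the same on every run
set.seed(42)

p <- ggraph(graph, layout = "fr") +
  geom_edge_link(alpha = 0.8) +
  geom_node_point(aes(size = degree)) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  theme_void() +
  labs(title = "Reproducible Fruchterman-Reingold layout")

p

# Optionally save the figure (hypothetical file name)
# ggsave("network_graph.png", plot = p, width = 6, height = 4)
```

The seed must be set before ggraph() is called, because the node coordinates are computed at that point rather than when the plot is printed.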