This case study explores network structures in the Witcher book series. I chose the data set because I like the Witcher series and it seemed like a fun data set to work with for my final project. The project also suggests ways I could apply network analysis to my main research interests in English and literary studies. My research question for this analysis is:
Research Question: How does the network structure change over the course of the seven books?
The network data is specifically built from the seven books in the Witcher series by Andrzej Sapkowski. For this data set, Ava Sadasivan (2022) created “a function which searched each line in the book for the name of the characters appearing in the sentence and added them as the Source character. Then [she] defined a window size of 5 lines and made it so that if two characters were listed within 5 lined of eachother there would be an edge with a weight of one assigned to them OR if they were already in the dataset for that book +1 would be added to their weight.”
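To make the windowing procedure quoted above more concrete, here is a minimal sketch of how I understand it. This is my own illustration, not Sadasivan's code: the sample lines, the `characters` vector, the plain substring matching, and the reading of "within 5 lines" as a line-number difference of at most 5 are all assumptions.

```r
# Hypothetical input: ten lines of text and three character names.
book_lines <- c(
  "Geralt rode into the town.",        # line 1
  "The inn was quiet.",
  "Yennefer was waiting by the fire.", # line 3
  "Nobody spoke.",
  "Nobody moved.",
  "The night passed slowly.",
  "It rained until dawn.",
  "The road was empty.",
  "Ciri woke early.",                  # line 9
  "Geralt was already gone."           # line 10
)
characters <- c("Geralt", "Yennefer", "Ciri")
window_size <- 5  # assumption: "within 5 lines" means a difference of at most 5

# Step 1: record every (line number, character) mention.
mentions <- do.call(rbind, lapply(characters, function(ch) {
  hits <- grep(ch, book_lines, fixed = TRUE)
  data.frame(line = hits, character = rep(ch, length(hits)),
             stringsAsFactors = FALSE)
}))
mentions <- mentions[order(mentions$line), ]

# Step 2: add 1 to a pair's weight each time two different characters
# are mentioned within the window of each other.
edge_weights <- numeric(0)
n <- nrow(mentions)
if (n > 1) {
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      a <- mentions$character[i]
      b <- mentions$character[j]
      close_enough <- mentions$line[j] - mentions$line[i] <= window_size
      if (a != b && close_enough) {
        # Sort the names so that A-B and B-A count as the same undirected pair.
        key <- paste(sort(c(a, b)), collapse = "|")
        edge_weights[key] <- sum(edge_weights[key], 1, na.rm = TRUE)
      }
    }
  }
}

# Step 3: reshape the counts into a Source / Target / Weight edge list.
parts <- strsplit(names(edge_weights), "|", fixed = TRUE)
edgelist <- data.frame(
  Source = vapply(parts, `[`, character(1), 1),
  Target = vapply(parts, `[`, character(1), 2),
  Weight = unname(edge_weights)
)
edgelist
```

In this toy text, Geralt and Yennefer are two lines apart and Ciri and Geralt are one line apart, so each pair gets an edge of weight 1. Yennefer and Ciri are six lines apart and get no edge.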
The key variables are:
Source & Target: The two characters who have had an interaction. Together, the source and target form a character set.
Type: The type of connection, directed or undirected. All of the connections in this data set are undirected, likely due to the manner of data collection.
Weight: Number of interactions between two characters.
Book: The number of the book that the character set comes from (character sets may occur in multiple books).
The primary audience for my analysis would be people who are interested in the Witcher series or who are interested in SNA for literary analysis. The second group could use this as a model for how to conduct similar analyses in the future.
The first step in preparing myself for actually working with the data is to load the relevant libraries.
library(tidygraph)
##
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
##
## filter
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks tidygraph::filter(), stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggraph)
library(igraph)
##
## Attaching package: 'igraph'
##
## The following objects are masked from 'package:lubridate':
##
## %--%, union
##
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
##
## The following objects are masked from 'package:purrr':
##
## compose, simplify
##
## The following object is masked from 'package:tidyr':
##
## crossing
##
## The following object is masked from 'package:tibble':
##
## as_data_frame
##
## The following object is masked from 'package:tidygraph':
##
## groups
##
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
##
## The following object is masked from 'package:base':
##
## union
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(tidytext)
library(dplyr)
Before working with my data, I had to make a few changes. This data set did not require a lot of wrangling because the raw data was in an easily usable format. I didn’t have to add new variables or features at this point in my project, although I do add a few during my actual analysis of my networks. My main focus was on building network objects for the entire Witcher series and for each individual book.
I uploaded the “witcher_network.csv” file I got from Kaggle to my project and then imported it as a data file. This is my raw data before doing any wrangling.
witcher_raw <- read.csv("witcher_network.csv")
My next step is to create edgelists. I decided to make an edgelist for the entire series and for each of the seven books. This made the data more manageable later in my project and made it less work to evaluate each book as I continued working. The only change from the raw data that I needed to make for the entire series was to remove the first column, labeled “X”. This column’s purpose was to number the entries and it interfered with making nodelists later. From there, I made an edgelist for each book by filtering the “book” feature in the data.
witcher_edgelist <- witcher_raw |>
select(!X)
book_1 <- witcher_edgelist |>
filter(book == "1")
book_2 <- witcher_edgelist |>
filter(book == "2")
book_3 <- witcher_edgelist |>
filter(book == "3")
book_4 <- witcher_edgelist |>
filter(book == "4")
book_5 <- witcher_edgelist |>
filter(book == "5")
book_6 <- witcher_edgelist |>
filter(book == "6")
book_7 <- witcher_edgelist |>
filter(book == "7")
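The seven filters above could also be written as a single split on the book column. The sketch below uses a small made-up edge list (`demo_edgelist` is a hypothetical stand-in for `witcher_edgelist`, and its values are invented) to show the idea with base R.

```r
# Made-up edge list with the same column names as the Witcher data.
demo_edgelist <- data.frame(
  Source = c("Geralt", "Geralt", "Ciri"),
  Target = c("Yennefer", "Dandelion", "Geralt"),
  Weight = c(3, 2, 5),
  book   = c(1, 2, 2)
)

# split() returns a named list with one data frame per book.
books <- split(demo_edgelist, demo_edgelist$book)

names(books)        # "1" "2"
nrow(books[["2"]])  # 2
```

Keeping the per-book edgelists in one list means that later steps can be applied to every book with `lapply()` instead of being copied seven times.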
After having all of my edgelists, I went on to make my nodelists. I selected the “Source” and “Target” features and then reshaped the data frame to have a single column of the “actors” in the network. I repeated this for all of the edgelists I created so I would have a pair for the entire series and for each book.
actors <- witcher_edgelist |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_1 <- book_1 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_2 <- book_2 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_3 <- book_3 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_4 <- book_4 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_5 <- book_5 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_6 <- book_6 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
actors_7 <- book_7 |>
select(Source, Target) |>
pivot_longer(cols = c(Source, Target)) |>
select(value) |>
rename(actors = value) |>
distinct()
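Because the same reshaping is repeated for every edgelist, it could be wrapped in a small helper. The function below is a base R sketch of the same idea; `make_nodelist` and `demo_edges` are hypothetical names of my own, and the toy values are invented.

```r
# Build a one-column nodelist ("actors") from an edge list
# that has Source and Target columns.
make_nodelist <- function(edges) {
  # rbind() interleaves Source and Target row by row, which mirrors the
  # order produced by reshaping the two columns into one long column.
  all_names <- as.vector(rbind(edges$Source, edges$Target))
  data.frame(actors = unique(all_names), stringsAsFactors = FALSE)
}

demo_edges <- data.frame(
  Source = c("Geralt", "Ciri"),
  Target = c("Yennefer", "Geralt"),
  stringsAsFactors = FALSE
)

make_nodelist(demo_edges)$actors  # "Geralt" "Yennefer" "Ciri"
```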
Next, I made network objects for the entire series and each individual book. This was necessary so I could analyze the network data.
witcher_network <- tbl_graph(edges = witcher_edgelist,
nodes = actors,
directed = FALSE)
book_1_network <- tbl_graph(edges = book_1,
nodes = actors_1,
directed = FALSE)
book_2_network <- tbl_graph(edges = book_2,
nodes = actors_2,
directed = FALSE)
book_3_network <- tbl_graph(edges = book_3,
nodes = actors_3,
directed = FALSE)
book_4_network <- tbl_graph(edges = book_4,
nodes = actors_4,
directed = FALSE)
book_5_network <- tbl_graph(edges = book_5,
nodes = actors_5,
directed = FALSE)
book_6_network <- tbl_graph(edges = book_6,
nodes = actors_6,
directed = FALSE)
book_7_network <- tbl_graph(edges = book_7,
nodes = actors_7,
directed = FALSE)
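The eight calls above follow one pattern, so they could be generated from a single function. The sketch below assumes the tidygraph package is installed; `build_network` and `demo_edges` are hypothetical names, and the toy edge list is invented.

```r
library(tidygraph)

# Made-up edge list covering two books.
demo_edges <- data.frame(
  Source = c("Geralt", "Geralt", "Ciri"),
  Target = c("Yennefer", "Dandelion", "Geralt"),
  Weight = c(3, 2, 5),
  book   = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# Build an undirected network object from one edge list.
build_network <- function(edges) {
  nodes <- data.frame(
    actors = unique(c(edges$Source, edges$Target)),
    stringsAsFactors = FALSE
  )
  # node_key names the node column that the Source/Target names are matched to.
  tbl_graph(nodes = nodes, edges = edges, directed = FALSE, node_key = "actors")
}

# One network per book, stored in a named list.
networks <- lapply(split(demo_edges, demo_edges$book), build_network)

names(networks)                # "1" "2"
igraph::vcount(networks[["2"]])  # 3 characters appear in the toy book 2
```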
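The first part of my analysis involved looking at the size of each of the networks. I added the feature “size” to each of my networks and looked at the local size for my actors. Local size is the number of nodes within one step of a given node; with tidygraph's default settings this count includes the node itself, so it is one more than the number of distinct characters the node is tied to. In order of largest local size, the first three actors for each network are:
Entire series: Geralt (138), Ciri (103), Yennefer (73)
Book 1: Geralt (59), Calanthe (15), Pavetta (12)
Book 2: Geralt (31), Dandelion (19), Ciri (13)
Book 3: Ciri (36), Geralt (26), Yennefer (23)
Book 4: Ciri (42), Emhyr (36), Geralt (33)
Book 5: Geralt (36), Dandelion (29), Milva (28)
Book 6: Ciri (38), Geralt (29), Emhyr (22)
Book 7: Ciri (50), Geralt (45), Yennefer (29)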
The size data shows how the main characters shift throughout the series and shows the network size for all of the books. Book 2 has the smallest network and book 7 has the largest network, apart from considering the entire series as a whole.
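As far as I can tell from the tidygraph documentation, `local_size()` wraps igraph's `ego_size()`. The sketch below uses a tiny star graph, which is my own toy example and not part of the Witcher data, to show what the default settings count.

```r
library(igraph)

# A star with one center (vertex 1) and three leaves.
star <- make_star(4, mode = "undirected")

# Default: nodes within one step, including the node itself.
ego_size(star, order = 1)               # 4 2 2 2

# mindist = 1 leaves out the node itself, giving distinct neighbors only.
ego_size(star, order = 1, mindist = 1)  # 3 1 1 1

# In a graph with no repeated edges, this matches the degree.
degree(star)                            # 3 1 1 1
```

Local size counts distinct characters, so it can be smaller than degree when the same pair of characters is connected by more than one edge.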
# total
witcher_network <- witcher_network |>
activate(nodes) |>
mutate(size = local_size())
witcher_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 224 × 2
## actors size
## <chr> <dbl>
## 1 Geralt 138
## 2 Ciri 103
## 3 Yennefer 73
## 4 Dandelion 70
## 5 Emhyr 63
## 6 Triss 46
## 7 Philippa 46
## 8 Vilgefortz 45
## 9 King 44
## 10 Calanthe 40
## # ℹ 214 more rows
# book 1
book_1_network <- book_1_network |>
activate(nodes) |>
mutate(size = local_size())
book_1_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 67 × 2
## actors size
## <chr> <dbl>
## 1 Geralt 59
## 2 Calanthe 15
## 3 Pavetta 12
## 4 Yennefer 11
## 5 Eist 11
## 6 Crach 10
## 7 Rainfarn 9
## 8 Mousesack 9
## 9 Stregobor 8
## 10 Renfri 8
## # ℹ 57 more rows
# book 2
book_2_network <- book_2_network |>
activate(nodes) |>
mutate(size = local_size())
book_2_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 36 × 2
## actors size
## <chr> <dbl>
## 1 Geralt 31
## 2 Dandelion 19
## 3 Ciri 13
## 4 Braenn 10
## 5 Yurga 10
## 6 Yennefer 8
## 7 Eithné 8
## 8 Sir 7
## 9 Agloval 7
## 10 Essi 7
## # ℹ 26 more rows
# book 3
book_3_network <- book_3_network |>
activate(nodes) |>
mutate(size = local_size())
book_3_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 56 × 2
## actors size
## <chr> <dbl>
## 1 Ciri 36
## 2 Geralt 26
## 3 Yennefer 23
## 4 Dandelion 19
## 5 Rience 17
## 6 Calanthe 14
## 7 Triss 14
## 8 King 13
## 9 Foltest 13
## 10 Yarpen 11
## # ℹ 46 more rows
# book 4
book_4_network <- book_4_network |>
activate(nodes) |>
mutate(size = local_size())
book_4_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 83 × 2
## actors size
## <chr> <dbl>
## 1 Ciri 42
## 2 Emhyr 36
## 3 Geralt 33
## 4 Yennefer 29
## 5 Vilgefortz 23
## 6 Philippa 20
## 7 Codringher 19
## 8 Sabrina 19
## 9 Gar 19
## 10 Tissaia 19
## # ℹ 73 more rows
# book 5
book_5_network <- book_5_network |>
activate(nodes) |>
mutate(size = local_size())
book_5_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 76 × 2
## actors size
## <chr> <dbl>
## 1 Geralt 36
## 2 Dandelion 29
## 3 Milva 28
## 4 Ciri 27
## 5 Francesca 26
## 6 Assire 24
## 7 Philippa 22
## 8 Yennefer 22
## 9 Emhyr 21
## 10 Cahir 21
## # ℹ 66 more rows
# book 6
book_6_network <- book_6_network |>
activate(nodes) |>
mutate(size = local_size())
book_6_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 72 × 2
## actors size
## <chr> <dbl>
## 1 Ciri 38
## 2 Geralt 29
## 3 Emhyr 22
## 4 Baron 16
## 5 Falka 14
## 6 Yennefer 14
## 7 Cahir 14
## 8 Vilgefortz 14
## 9 Rience 14
## 10 King 13
## # ℹ 62 more rows
# book 7
book_7_network <- book_7_network |>
activate(nodes) |>
mutate(size = local_size())
book_7_network |>
as_tibble() |>
arrange(desc(size)) |>
select(actors, size)
## # A tibble: 117 × 2
## actors size
## <chr> <dbl>
## 1 Ciri 50
## 2 Geralt 45
## 3 Yennefer 29
## 4 Dandelion 28
## 5 Emhyr 27
## 6 King 26
## 7 Zoltan 24
## 8 Triss 23
## 9 Count 22
## 10 Vilgefortz 22
## # ℹ 107 more rows
Next, I looked at the density of all of my networks. Density is the number of actual ties divided by the number of possible ties in the network. I didn’t add density as a feature to my network since I wasn’t considering using it when graphing my network models later in my analysis. The density values (rounded to 3 decimal places) for each of the networks are:
Entire series: 0.104
Book 1: 0.120
Book 2: 0.252
Book 3: 0.195
Book 4: 0.125
Book 5: 0.158
Book 6: 0.132
Book 7: 0.098
The densest social network is found in book 2 and the least dense is found in book 7. This aligns with book 2 having the fewest actors and book 7 having the most actors of any single book. Overall, the networks are not dense. This makes sense for a series in which most of the action centers on a few characters and the side characters rarely interact with each other.
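For an undirected network with n nodes and m ties, the density formula works out to m / (n(n - 1) / 2). The sketch below checks that against `edge_density()` on a made-up three-character edge list. It also shows that a repeated pair inflates the value. That may matter here, since some degree values later in this report exceed the number of other characters in the network (Geralt has degree 99 in the 67-node book 1 network), which suggests that some pairs appear more than once.

```r
library(igraph)

# Made-up edge list in which Geralt-Yennefer appears twice.
demo_edges <- data.frame(
  Source = c("Geralt", "Yennefer", "Geralt"),
  Target = c("Yennefer", "Ciri", "Yennefer")
)
g <- graph_from_data_frame(demo_edges, directed = FALSE)

n <- vcount(g)  # 3 nodes
m <- ecount(g)  # 3 edges, counting the repeated pair twice

manual_density <- m / (n * (n - 1) / 2)
manual_density   # 1
edge_density(g)  # 1, the same calculation

# Collapsing repeated pairs gives the share of distinct pairs that are tied.
g_simple <- simplify(g)
edge_density(g_simple)  # 0.6666667 (2 of 3 possible pairs)
```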
# total
edge_density(witcher_network)
## [1] 0.1040999
# book 1
edge_density(book_1_network)
## [1] 0.1198553
# book 2
edge_density(book_2_network)
## [1] 0.252381
# book 3
edge_density(book_3_network)
## [1] 0.1948052
# book 4
edge_density(book_4_network)
## [1] 0.1254775
# book 5
edge_density(book_5_network)
## [1] 0.1578947
# book 6
edge_density(book_6_network)
## [1] 0.1318466
# book 7
edge_density(book_7_network)
## [1] 0.09755379
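Centrality examines how much the network revolves around a specific node. I looked at three centrality measures: degree, closeness, and betweenness centrality. Degree centrality counts the ties attached to a node, closeness centrality is the inverse of a node's total shortest-path distance to every other node, and betweenness centrality counts how often a node lies on the shortest paths between other pairs of nodes.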
I arranged the tibbles in order of descending degree centrality so that I could look at the characters most central to the network by degree. These values were added to the networks as additional features.
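The top characters according to degree centrality for each network were almost exactly the same as the top characters according to local size. If we look just at Geralt and Ciri, we can see the trends in their centrality over each book:
Book   Geralt (degree, closeness, betweenness)   Ciri (degree, closeness, betweenness)
1      99, 0.0135, 1896                          not in the top ten
2      56, 0.025, 362                            19, 0.0169, 79.4
3      41, 0.0115, 205                           59, 0.0128, 581
4      50, 0.00763, 364                          65, 0.00826, 1067
5      48, 0.00885, 483                          44, 0.00826, 351
6      41, 0.00806, 489                          63, 0.00935, 999
7      69, 0.005, 1226                           81, 0.00521, 1663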
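While Geralt maintains a relatively high degree centrality throughout the whole series, there are small ebbs and flows. His closeness centrality falls from 0.0135 in book 1 to 0.005 in book 7. A lower closeness score means a character is, on average, farther from the others rather than closer to them. However, these values are not normalized and shrink as a network gains nodes, so part of the decline reflects the larger casts of the later books rather than a change in Geralt's position. Lastly, Geralt starts out often between characters and gradually becomes less so through book 3, and then from book 4 onward he is increasingly between characters again as the series finishes up.
Ciri becomes increasingly central to the plot as the series progresses, from not even being in book 1 to having the highest degree and betweenness centrality in book 7. Her closeness centrality decreases as the series progresses, from 0.0169 in book 2 to 0.00521 in book 7. As with Geralt, this does not indicate that she becomes closer to the other characters. Much of the decrease is likely due to the growing size of the networks, so the raw closeness values should be compared across books with caution.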
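To show why raw closeness values are hard to compare across networks of different sizes, the sketch below computes the three measures with igraph on two star graphs. These are toy examples of my own, not the Witcher data. In both stars the center is one step from every other node, yet its raw closeness is much lower in the larger star. Setting `normalized = TRUE` removes that size effect. I believe tidygraph's `centrality_closeness()` accepts the same `normalized` argument, but I have not re-run the analysis with it.

```r
library(igraph)

small_star <- make_star(5, mode = "undirected")   # center plus 4 leaves
large_star <- make_star(50, mode = "undirected")  # center plus 49 leaves

# Degree and betweenness of the center (vertex 1) in the small star.
degree(small_star)[1]                         # 4
betweenness(small_star, directed = FALSE)[1]  # 6 (every leaf pair passes through it)

# Raw closeness: 1 / (sum of distances to all other nodes).
closeness(small_star)[1]  # 1/4  = 0.25
closeness(large_star)[1]  # 1/49 = 0.02040816

# Normalized closeness multiplies by (n - 1), so both centers score 1.
closeness(small_star, normalized = TRUE)[1]  # 1
closeness(large_star, normalized = TRUE)[1]  # 1
```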
# total
# degree centrality
witcher_network <- witcher_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
witcher_network <- witcher_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
witcher_network <- witcher_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(witcher_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 224 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Geralt 404 0.00326 10855.
## 2 Ciri 331 0.00289 4726.
## 3 Yennefer 208 0.00262 1579.
## 4 Dandelion 188 0.00262 1998.
## 5 Emhyr 150 0.00254 1374.
## 6 Philippa 125 0.00242 765.
## 7 Triss 108 0.00243 333.
## 8 King 103 0.00242 681.
## 9 Vilgefortz 102 0.00240 516.
## 10 Calanthe 98 0.00236 645.
## # ℹ 214 more rows
# book 1
# degree centrality
book_1_network <- book_1_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_1_network <- book_1_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_1_network <- book_1_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_1_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 67 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Geralt 99 0.0135 1896.
## 2 Calanthe 21 0.00826 108.
## 3 Pavetta 18 0.00806 25.9
## 4 Yennefer 17 0.00769 12.0
## 5 Renfri 14 0.00769 36.2
## 6 Eist 14 0.00787 15.5
## 7 Crach 14 0.00781 1.87
## 8 Chireadan 13 0.00752 2.22
## 9 Rainfarn 12 0.00775 13.1
## 10 Nenneke 11 0.00746 2.92
## # ℹ 57 more rows
# book 2
# degree centrality
book_2_network <- book_2_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_2_network <- book_2_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_2_network <- book_2_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_2_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 36 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Geralt 56 0.025 362.
## 2 Dandelion 31 0.0192 94.7
## 3 Ciri 19 0.0169 79.4
## 4 Yurga 16 0.0161 22.6
## 5 Braenn 13 0.0161 12.2
## 6 Eithné 13 0.0156 5.22
## 7 Agloval 12 0.0147 1.47
## 8 Yennefer 11 0.0154 22.1
## 9 Calanthe 11 0.0154 3.05
## 10 Essi 10 0.0149 1.19
## # ℹ 26 more rows
# book 3
# degree centrality
book_3_network <- book_3_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_3_network <- book_3_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_3_network <- book_3_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_3_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 56 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Ciri 59 0.0128 581.
## 2 Geralt 41 0.0115 205.
## 3 Yennefer 35 0.0109 208.
## 4 Dandelion 31 0.0103 189.
## 5 Rience 23 0.0103 138.
## 6 Calanthe 21 0.0101 83.5
## 7 Triss 20 0.00935 21.6
## 8 Foltest 18 0.00893 131.
## 9 Yarpen 16 0.00901 43.4
## 10 King 15 0.00971 34.4
## # ℹ 46 more rows
# book 4
# degree centrality
book_4_network <- book_4_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_4_network <- book_4_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_4_network <- book_4_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_4_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 83 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Ciri 65 0.00826 1067.
## 2 Geralt 50 0.00763 364.
## 3 Emhyr 46 0.00794 1006.
## 4 Yennefer 46 0.00730 290.
## 5 Philippa 30 0.00645 49.1
## 6 Tissaia 29 0.00676 97.5
## 7 Vilgefortz 29 0.00699 223.
## 8 Sabrina 28 0.00621 94.5
## 9 Codringher 25 0.00676 118.
## 10 Gar 25 0.00621 112.
## # ℹ 73 more rows
# book 5
# degree centrality
book_5_network <- book_5_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_5_network <- book_5_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_5_network <- book_5_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_5_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 76 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Geralt 48 0.00885 483.
## 2 Milva 45 0.00806 426.
## 3 Dandelion 45 0.00826 343.
## 4 Ciri 44 0.00826 351.
## 5 Francesca 37 0.008 275.
## 6 Assire 37 0.00775 211.
## 7 Yennefer 34 0.00787 177.
## 8 Philippa 33 0.00752 165.
## 9 Cahir 28 0.00781 152.
## 10 Fringilla 28 0.00752 85.0
## # ℹ 66 more rows
# book 6
# degree centrality
book_6_network <- book_6_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_6_network <- book_6_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_6_network <- book_6_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_6_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 72 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Ciri 63 0.00935 999.
## 2 Geralt 41 0.00806 489.
## 3 Emhyr 25 0.00787 394.
## 4 Yennefer 24 0.00694 49.3
## 5 Falka 20 0.00694 115.
## 6 Baron 20 0.00704 110.
## 7 Rience 20 0.00694 110.
## 8 Cahir 19 0.00730 87.1
## 9 Crach 18 0.00671 22.2
## 10 Vilgefortz 18 0.00699 131.
## # ℹ 62 more rows
# book 7
# degree centrality
book_7_network <- book_7_network |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
# closeness centrality
book_7_network <- book_7_network |>
activate(nodes) |>
mutate(closeness = centrality_closeness(mode = "all"))
# betweenness centrality
book_7_network <- book_7_network |>
activate(nodes) |>
mutate(betweenness = centrality_betweenness(directed = FALSE))
as_tibble(book_7_network) |>
arrange(desc(degree)) |>
select(actors, degree, closeness, betweenness)
## # A tibble: 117 × 4
## actors degree closeness betweenness
## <chr> <dbl> <dbl> <dbl>
## 1 Ciri 81 0.00521 1663.
## 2 Geralt 69 0.005 1226.
## 3 Dandelion 45 0.00446 434.
## 4 Emhyr 42 0.00437 500.
## 5 Yennefer 41 0.00426 323.
## 6 Fringilla 36 0.004 211.
## 7 King 35 0.00444 489.
## 8 Philippa 33 0.00385 293.
## 9 Triss 32 0.00424 190.
## 10 Jarre 32 0.00426 713.
## # ℹ 107 more rows
For my network models, I decided to model each of my networks in the same way. I mapped degree to both the size and the color of each node and added a label for each character in the network. These plots present the changes seen in the size, density, and centrality measures in a more readable form.
ggraph(witcher_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: ggrepel: 93 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
ggraph(book_2_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
ggraph(book_3_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
ggraph(book_4_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
ggraph(book_5_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
ggraph(book_6_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
ggraph(book_7_network, layout = "fr") +
geom_node_point(aes(size = degree,
color = degree)) +
geom_node_text(aes(label = actors,
size = degree/2,
color = degree),
repel=TRUE) +
geom_edge_link(arrow = NULL,
end_cap = circle(3, 'mm'),
alpha = .3) +
theme_graph()
## Warning: ggrepel: 12 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
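Since every plot above uses the same layers, the plotting code could be wrapped in one function and reused. The sketch below assumes the tidygraph and ggraph packages are installed; `plot_network` and `demo_network` are hypothetical names, and the three-character network is invented for illustration.

```r
library(tidygraph)
library(ggraph)

# Same layers as the plots above, wrapped in a reusable function.
# The network needs "actors" and "degree" node columns.
plot_network <- function(network) {
  ggraph(network, layout = "fr") +
    geom_node_point(aes(size = degree, color = degree)) +
    geom_node_text(aes(label = actors, size = degree / 2, color = degree),
                   repel = TRUE) +
    geom_edge_link(arrow = NULL, end_cap = circle(3, "mm"), alpha = 0.3) +
    theme_graph()
}

# Made-up network: Geralt tied to Ciri and Yennefer.
demo_network <- tbl_graph(
  nodes = data.frame(actors = c("Geralt", "Ciri", "Yennefer")),
  edges = data.frame(from = c(1, 1), to = c(2, 3)),
  directed = FALSE
) |>
  mutate(degree = centrality_degree())

p <- plot_network(demo_network)
p
```

With a helper like this, each book's plot becomes a single call, and a change to the styling only has to be made in one place.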
Research Question: How does the network structure change over the course of the seven books?
The network structure changes in size over the course of the series and it changes in who is most central to the network. In the beginning, the series is focused most prominently on Geralt; however, as the series continues, Ciri begins to take a more prominent role. Outside of those two characters, the series gradually becomes more and more decentralized, which shows the increasing depth and development of many characters throughout the series.
Since this project isn’t focused on research that could improve an educational setting, this is a bit tricky. This type of analysis could be conducted by other people interested in literary research or who want to use this type of analysis as educational material. In these cases, it’s important to have a good data set to work with. Additionally, they might want to collect information about the specific characters that could make the analysis more intricate, such as gender, age, and profession. These demographic features would allow analyses that look for trends in relationships that would otherwise be out of reach, such as examining the social networks for just women or just men in the series.
I can’t think of any limitations of this data set for the particular question I set out to answer. Additionally, I don’t think there are any ethical or legal issues. The data set is open-source and the network is mapping the relationships of fictional characters.
Sadasivan, A. (2022). Witcher network. [Data set]. Kaggle. https://www.kaggle.com/datasets/avasadasivan/witcher-network?resource=download