Lab 9: Graphs and Social Networks in Pride & Prejudice
Author
Amanda R. Knudsen
Overview
In this lab assignment, you are going to construct a social network from the characters in the book “Pride & Prejudice”, a novel written by Jane Austen and available in the janeaustenr package. The social network will be a weighted graph connecting the characters, where the weight of each edge is the number of 10-line sections of the book in which the names of both characters appear. Once you create the graph, you will load it into tidygraph, make a visualization of the graph, and rank the most connected characters by a measure called degree centrality.
Problem 1
Load the text of Pride & Prejudice into R using the janeaustenr library. Then download and read pride_prejudice_characters.csv, the csv file from my GitHub page containing a list of characters in Pride & Prejudice and their aliases. Here, aliases refers to the different names that the characters go by in the book. For example, “Darcy” also goes by the names “Mr. Darcy” and “Mr. Fitzwilliam Darcy” (not to be confused with his cousin “Colonel Fitzwilliam”).
# A tibble: 13,030 × 1
text
<chr>
1 "PRIDE AND PREJUDICE"
2 ""
3 "By Jane Austen"
4 ""
5 ""
6 ""
7 "Chapter 1"
8 ""
9 ""
10 "It is a truth universally acknowledged, that a single man in possession"
# ℹ 13,020 more rows
Rows: 41 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): unique_name, alias
dbl (1): id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pride_prejudice_chars
# A tibble: 41 × 3
id unique_name alias
<dbl> <chr> <chr>
1 1 ElizabethBennet Elizabeth Bennet
2 1 ElizabethBennet Elizabeth
3 1 ElizabethBennet Miss Bennet
4 1 ElizabethBennet Miss Lizzy
5 1 ElizabethBennet Lizzy
6 1 ElizabethBennet Eliza Bennet
7 2 MrDarcy Fitzwilliam Darcy
8 2 MrDarcy Darcy
9 3 MrBennet Mr. Bennet
10 4 MrsBennet Mrs. Bennet
# ℹ 31 more rows
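The loading step above can be sketched roughly as follows. This is a minimal sketch, not necessarily the code that produced the output shown: the local path in chars_path is a hypothetical placeholder for wherever the downloaded csv file was saved, and the read is skipped if the file is not there.

```r
library(janeaustenr)
library(tibble)

# prideprejudice is a character vector with one element per line of the book
pride_text <- tibble(text = prideprejudice)

# Hypothetical local path (an assumption): adjust to where the csv was saved
chars_path <- "pride_prejudice_characters.csv"

if (file.exists(chars_path)) {
  # show_col_types = FALSE quiets the column specification message
  pride_prejudice_chars <- readr::read_csv(chars_path, show_col_types = FALSE)
}
```

Wrapping the vector in a tibble keeps one row per line of text, which is the shape the later steps (sectioning and tokenizing) expect.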
Process the text of Pride & Prejudice to replace each occurrence of an alias with the full name of the character. I recommend using the iteration techniques you learned earlier. I arranged the order of names in the csv file to minimize misidentifications, provided you replace names in the order that they appear in the file. Making this perfect would require a bit of effort, but we are ok if there are some misidentifications. Here the final name of each character will be a single word.
# A tibble: 13,030 × 1
text
<chr>
1 "PRIDE AND PREJUDICE"
2 ""
3 "By JaneBennet Austen"
4 ""
5 ""
6 ""
7 "Chapter 1"
8 ""
9 ""
10 "It is a truth universally acknowledged, that a single man in possession"
# ℹ 13,020 more rows
It looks like even the “Jane” in “Jane Austen” has been replaced with JaneBennet. I take this to be one of the misidentifications that the assignment says to expect.
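One way to do the in-order alias replacement is a plain loop over the rows of the alias table. The sketch below is an illustration on made-up toy lines, not the code used for the output above. The helper name replace_aliases and the toy data are assumptions. It wraps each alias in word boundaries so that a name that has already been replaced (for example ElizabethBennet) is not matched again by a shorter alias (Elizabeth).

```r
# Toy alias table in the same shape as the csv (a small assumed subset)
aliases <- data.frame(
  unique_name = c("ElizabethBennet", "ElizabethBennet", "MrDarcy", "MrDarcy"),
  alias       = c("Elizabeth Bennet", "Elizabeth", "Fitzwilliam Darcy", "Darcy"),
  stringsAsFactors = FALSE
)

# Made-up example lines, not quotations from the book
toy_lines <- c(
  "Elizabeth Bennet smiled at Darcy.",
  "Elizabeth had expected Fitzwilliam Darcy."
)

# Hypothetical helper: replace aliases one at a time, in file order
replace_aliases <- function(text, aliases) {
  for (i in seq_len(nrow(aliases))) {
    # \\Q...\\E quotes the alias literally (so the "." in "Mr." is not a wildcard);
    # \\b restricts matches to whole words
    pattern <- paste0("\\b\\Q", aliases$alias[i], "\\E\\b")
    text <- gsub(pattern, aliases$unique_name[i], text, perl = TRUE)
  }
  text
}

replaced <- replace_aliases(toy_lines, aliases)
```

Because longer aliases come first in the table, "Elizabeth Bennet" is collapsed before the shorter "Elizabeth" is tried, which is why the order of the file matters.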
Problem 2
Following the example in Chapter 4 of the Text Mining with R book, create a new column in the data frame corresponding to the Pride & Prejudice text that divides the text into sections of 10 lines each. Then use the pairwise_count function from widyr to determine the number of times each name occurs with each other name in the same 10-line section.
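The sectioning and counting step might look like the sketch below, shown on a made-up 20-line toy text so that it runs on its own. The toy lines and the character_names vector are assumptions. The section formula is a slight variant of the book's row_number() %/% 10, shifted so that the first section also holds a full 10 lines.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(widyr)

# Assumed toy data: single-word names, as produced by the alias replacement
character_names <- c("ElizabethBennet", "MrDarcy", "JaneBennet")

toy_text <- tibble(text = c(
  "ElizabethBennet spoke with MrDarcy",
  "MrDarcy answered ElizabethBennet",
  rep("a line that names nobody", 8),
  "JaneBennet wrote to ElizabethBennet",
  rep("a line that names nobody", 9)
))

name_tokens <- toy_text %>%
  # rows 1-10 fall in section 0, rows 11-20 in section 1, and so on
  mutate(section = (row_number() - 1) %/% 10) %>%
  # to_lower = FALSE keeps the capitalised single-word names intact
  unnest_tokens(word, text, to_lower = FALSE) %>%
  filter(word %in% character_names)

# upper = FALSE keeps each unordered pair once;
# n is the number of sections in which both names appear
pair_counts <- name_tokens %>%
  pairwise_count(word, section, sort = TRUE, upper = FALSE)
```

Note that pairwise_count counts sections, not mentions: two names that appear together several times inside one section still contribute 1 for that section.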
Create a data frame of nodes which contains the id and unique names of each character, and create a data frame of edges which contains three columns: a column named from, a column named to, and a column named weight, where from and to are the id numbers of each character and weight is the number of co-occurrences you found in Problem 2. Each pair should only appear once in the edge list (e.g. ElizabethBennet and MrDarcy, but not also MrDarcy and ElizabethBennet). Create a tidygraph object using tbl_graph that contains the social network data that we just constructed.
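A minimal sketch of building the node and edge tables and the graph object is shown below, using small made-up counts rather than the real results. The toy values in nodes and pair_counts are assumptions. It relies on the fact that tbl_graph interprets integer from and to values as row positions in the node table, so the nodes should be sorted so that row position equals id.

```r
library(dplyr)
library(tibble)
library(tidygraph)

# Assumed toy nodes: one row per character, row position equal to id
nodes <- tibble(
  id = 1:3,
  unique_name = c("ElizabethBennet", "MrDarcy", "JaneBennet")
)

# Assumed toy co-occurrence counts in the shape returned by pairwise_count
pair_counts <- tibble(
  item1 = c("ElizabethBennet", "ElizabethBennet"),
  item2 = c("MrDarcy", "JaneBennet"),
  n     = c(5, 3)
)

# Swap the names for ids and rename the count column to weight
edges <- pair_counts %>%
  inner_join(nodes, by = c("item1" = "unique_name")) %>%
  rename(from = id) %>%
  inner_join(nodes, by = c("item2" = "unique_name")) %>%
  rename(to = id) %>%
  select(from, to, weight = n)

# Co-occurrence has no direction, so the graph is undirected
graph <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)
```

With the real data, the node table would presumably come from the distinct id and unique_name pairs of the characters csv, arranged by id.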
Using ggraph, graph the connections between the characters. Make sure that each node is labeled by the character name, and make sure that the weight is represented by the thickness of the edge plotted between the two nodes. Then use the centrality_degree function to calculate the weighted degree centrality of each character, and make a plot which shows the degree centrality of each character where the characters are arranged in order of degree centrality.
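For the centrality part of this step, a sketch along the following lines may help. It uses a made-up three-character graph so that it runs on its own. The toy nodes and edges are assumptions, and the bar chart is only one of several reasonable ways to show the ranking.

```r
library(dplyr)
library(tibble)
library(tidygraph)
library(ggplot2)

# Assumed toy graph: edge weights stand in for co-occurrence counts
nodes <- tibble(id = 1:3,
                unique_name = c("ElizabethBennet", "MrDarcy", "JaneBennet"))
edges <- tibble(from = c(1L, 1L), to = c(2L, 3L), weight = c(5, 3))
graph <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)

# Weighted degree centrality: the sum of the weights of each node's edges
centrality <- graph %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(weights = weight)) %>%
  as_tibble() %>%
  arrange(desc(degree))

# Bar chart with characters ordered by degree centrality
centrality_plot <- ggplot(
  centrality,
  aes(x = degree, y = reorder(unique_name, degree))
) +
  geom_col() +
  labs(x = "Weighted degree centrality", y = NULL)
```

In this toy graph ElizabethBennet has weighted degree 5 + 3 = 8, MrDarcy has 5, and JaneBennet has 3, so reorder() places ElizabethBennet at the top of the chart.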
graph_plot <- ggraph(graph, layout = "stress") +
  geom_edge_link(aes(width = weight), alpha = 0.4, color = "lavender") +
  geom_node_point(size = 2, color = "purple") +
  geom_node_text(aes(label = unique_name), repel = TRUE, size = 4) +
  labs(title = "Connections between Characters in Pride & Prejudice") +
  theme_graph()

ggsave("graph_prideprejudice.png", plot = graph_plot, width = 10, height = 7, dpi = 300)
# graph_plot
I saved the graph to a PNG file and commented out the inline rendering because RStudio kept running into errors when I tried to render this document as a PDF. The image below shows the graph that the plot creates. If you un-comment graph_plot, the graph will render inline.