Tolson Chapel, built 1866
The First Person Narratives of the American South is a collection of personal accounts of the lives of farmers, women, enlisted men, laborers, Native Americans, African Americans, and others as the South reconstructed itself following the Civil War.
The data used for this text analysis is a small sample of the original 344-item collection housed at the University of North Carolina at Chapel Hill’s University Library. Beyond these narratives, the university’s holdings also include the Southern Historical Collection, one of the largest collections of Southern manuscripts in the nation, and the North Carolina Collections. The narratives were digitized and manually transcribed by UNC students and faculty. The data used here was obtained from Kaggle, a public data-sharing website.
This analysis uses 10 hand-selected narratives from the collection, chosen to capture a variety of perspectives. It addresses the following questions:
The hope is to identify common themes, patterns, and words across the narratives.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidytext)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.1 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.1.8
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(ggplot2)
library(igraph)
## Warning: package 'igraph' was built under R version 4.2.3
##
## Attaching package: 'igraph'
##
## The following objects are masked from 'package:lubridate':
##
## %--%, union
##
## The following objects are masked from 'package:purrr':
##
## compose, simplify
##
## The following object is masked from 'package:tidyr':
##
## crossing
##
## The following object is masked from 'package:tibble':
##
## as_data_frame
##
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
##
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
##
## The following object is masked from 'package:base':
##
## union
library(ggraph)
## Warning: package 'ggraph' was built under R version 4.2.3
library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.2.3
library(readxl)
library(writexl)
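Each narrative exists as a separate .txt file, so the texts must first be combined into a single data frame before they can be analyzed. Here, the combined texts are read from one Excel file, asnarratives.xlsx, which holds 6,699 rows in a single `text` column.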
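Here is a minimal base-R sketch of that combining step. It assumes the narratives sit as .txt files in a single folder. The helper name `combine_txt_files()` and the extra `narrative` column are hypothetical, and the workflow below reads the already-combined asnarratives.xlsx instead. Keeping the file name next to each line would make it possible to compare individual narratives later.

```r
# Hypothetical helper: read every .txt file in `dir` into one data frame,
# one row per line of text, tagged with the file it came from.
combine_txt_files <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE, encoding = "UTF-8")
    # rep() keeps an empty file from breaking data.frame()
    data.frame(narrative = rep(basename(f), length(lines)),
               text = lines,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}
```

For example, `combine_txt_files("narratives")` would combine a (hypothetical) `narratives/` folder. The result could then be saved with `writexl::write_xlsx()` if a single spreadsheet is preferred.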
asnarratives <- read_xlsx("asnarratives.xlsx")
asnarratives
## # A tibble: 6,699 × 1
## text
## <chr>
## 1 <NA>
## 2 [Cover Image]
## 3 [Frontispiece Image]
## 4 [Title Page Image]
## 5 FOREWORD.
## 6 THIS little book is written for my children and the descendants of those who…
## 7 From its perusal may they learn still more to reverence the memory of their …
## 8 To this record I have added my memories of the home of my youth, under South…
## 9 This long retrospect of mine, a retrospect of eighty years, portrays faithfu…
## 10 I write with a loving hand as I pay this tribute to the past.
## # … with 6,689 more rows
asn_bigrams <- asnarratives %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
asn_bigrams
## # A tibble: 590,020 × 1
## bigram
## <chr>
## 1 <NA>
## 2 cover image
## 3 frontispiece image
## 4 title page
## 5 page image
## 6 <NA>
## 7 this little
## 8 little book
## 9 book is
## 10 is written
## # … with 590,010 more rows
Even the first few rows of output include bigrams such as “cover image” and “title page,” which come from image placeholders rather than from the narratives themselves. These will need to be removed to get an accurate picture of the common words and themes across the texts.
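The base-R sketch below is a simplified two-word tokenizer that shows where these placeholder bigrams come from. It lowercases each line, strips punctuation, splits on whitespace, and pairs each word with the next. This only approximates `unnest_tokens(token = "ngrams", n = 2)`, which relies on the tokenizers package and has its own rules. Even so, it shows that a line like “[Title Page Image]” yields both “title page” and “page image.” It also shows that a one-word line like “FOREWORD.” yields no bigram (NA), which appears to explain the `<NA>` rows in the output. The function names are hypothetical.

```r
# Simplified bigram tokenizer: an approximation of
# unnest_tokens(token = "ngrams", n = 2), not its exact rules.
line_bigrams <- function(line) {
  if (is.na(line)) return(NA_character_)
  # Lowercase, then turn each run of non-letter/digit/apostrophe characters into a space
  cleaned <- gsub("[^a-z0-9']+", " ", tolower(line))
  words <- strsplit(trimws(cleaned), " ", fixed = TRUE)[[1]]
  words <- words[words != ""]
  # With fewer than two words there is no bigram to form
  if (length(words) < 2) return(NA_character_)
  # Pair each word with the word that follows it
  paste(head(words, -1), tail(words, -1))
}

# Apply to every line and flatten into one character vector
text_bigrams <- function(lines) unlist(lapply(lines, line_bigrams))
```

Running `text_bigrams()` on the first rows of asnarratives reproduces the pattern in the output above: an NA, then “cover image,” “frontispiece image,” “title page,” and “page image.”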
asn_bigrams %>%
count(bigram, sort = TRUE)
## # A tibble: 227,462 × 2
## bigram n
## <chr> <int>
## 1 of the 4767
## 2 in the 2836
## 3 to the 2183
## 4 on the 1347
## 5 it was 1276
## 6 and the 1259
## 7 to be 1160
## 8 for the 1107
## 9 i was 984
## 10 he was 879
## # … with 227,452 more rows
bigrams_separated <- asn_bigrams %>%
separate(bigram, c("Word1", "Word2"), sep = " ")
bigrams_filtered <- bigrams_separated %>%
filter(!(Word1 %in% stop_words$word)) %>%
filter(!(Word2 %in% stop_words$word))
bigram_counts <- bigrams_filtered %>%
count(Word1, Word2, sort = TRUE)
bigram_counts
## # A tibble: 42,889 × 3
## Word1 Word2 n
## <chr> <chr> <int>
## 1 <NA> <NA> 294
## 2 baton rouge 81
## 3 north carolina 74
## 4 thomas dabney 54
## 5 short time 50
## 6 port hudson 49
## 7 civil war 48
## 8 daughter emmy 41
## 9 south carolina 40
## 10 provost marshal 37
## # … with 42,879 more rows
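The stop-word filter keeps a bigram only when neither of its words appears in tidytext’s `stop_words` list. The first row of `bigram_counts` shows a side effect of this. A missing bigram separates into two NA words, and because `NA %in% stop_words$word` is `FALSE`, it passes both filters. The base-R sketch below reproduces that logic. It uses a small hypothetical stop list as a stand-in for `stop_words$word` and adds an optional explicit NA check. The function name is hypothetical.

```r
# Hypothetical stand-in for tidytext's stop_words$word
example_stops <- c("of", "the", "in", "to", "it", "was", "and", "be")

filter_stop_bigrams <- function(bigrams, stop_list, drop_na = TRUE) {
  # Split each bigram into its two words; an NA bigram gives NA for both
  parts <- strsplit(bigrams, " ", fixed = TRUE)
  word1 <- vapply(parts, function(p) p[1], character(1))
  word2 <- vapply(parts, function(p) p[2], character(1))
  # Same test as filter(!(Word1 %in% ...)) and filter(!(Word2 %in% ...)):
  # NA %in% stop_list is FALSE, so NA words survive this step
  keep <- !(word1 %in% stop_list) & !(word2 %in% stop_list)
  # Optionally drop the NA pairs explicitly
  if (drop_na) keep <- keep & !is.na(word1) & !is.na(word2)
  bigrams[keep]
}
```

With `drop_na = FALSE`, the NA bigram is kept, just as the `<NA> <NA>` row is kept in `bigram_counts`.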
As noted earlier, placeholder bigrams such as “cover image,” “title page,” “page image,” and “frontispiece image” keep appearing. They should not be factored into the results, so they are removed from the final united bigram data frame, along with the “NA NA” bigram.
# Image placeholders, plus the "NA NA" bigram left by empty and one-word lines
bigrams_to_remove <- c("cover image", "NA NA", "title page", "page image", "frontispiece image")
bigrams_united <- bigrams_filtered %>%
unite(bigram, Word1, Word2, sep = " ") %>%
filter(!(bigram %in% bigrams_to_remove))
bigrams_united
## # A tibble: 53,430 × 1
## bigram
## <chr>
## 1 heritage left
## 2 honorable lives
## 3 southern skies
## 4 southern woman
## 5 civil war
## 6 portrays faithfully
## 7 faithfully life
## 8 ante bellum
## 9 bellum times
## 10 mourning vestments
## # … with 53,420 more rows
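Instead of listing the placeholder bigrams one by one, another option is to drop the placeholder lines before tokenizing, so the artifact bigrams are never created. This sketch assumes that every image placeholder occupies its own line in the form “[… Image]”, as the first rows of asnarratives suggest. Placeholders that break that pattern would slip through. One-word lines such as “FOREWORD.” still produce NA bigrams, so a missing-value filter is still needed. The function name is hypothetical.

```r
# Assumes image placeholders are whole lines such as "[Cover Image]".
drop_placeholder_lines <- function(lines) {
  # Whole line is "[...Image]", allowing surrounding whitespace
  is_placeholder <- grepl("^\\s*\\[[^\\]]*Image\\]\\s*$", lines, perl = TRUE)
  # Also drop missing lines (grepl() returns FALSE for NA)
  lines[!is.na(lines) & !is_placeholder]
}
```

In the tidyverse pipeline, the same pattern could be used inside a `filter()` on asnarratives before `unnest_tokens()`.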
top_bigrams <- bigrams_united %>%
count(bigram, sort = TRUE) %>%
slice_max(n, n = 150)
wordcloud2(top_bigrams)
The top 150 bigrams include North Carolina, Civil War, Court House, Common Schools, Law Makers, and Detective Police, among many others.
bigram_graph <- bigram_counts %>%
graph_from_data_frame()
## Warning in graph_from_data_frame(.): In `d' `NA' elements were replaced with
## string "NA"
bigram_graph
## IGRAPH 4772364 DN-- 15861 42889 --
## + attr: name (v/c), n (e/n)
## + edges from 4772364 (vertex names):
## [1] NA ->NA baton ->rouge north ->carolina
## [4] thomas ->dabney short ->time port ->hudson
## [7] civil ->war daughter ->emmy south ->carolina
## [10] provost ->marshal thousand ->dollars hundred ->dollars
## [13] god ->bless court ->house pass ->christian
## [16] supreme ->court washington->city ten ->miles
## [19] days ->ago dr ->felton twenty ->miles
## [22] half ->past southern ->women burl ->quiney
## + ... omitted several edges
bigram_graph_filtered <- bigram_counts %>%
filter(n > 1) %>%
graph_from_data_frame()
## Warning in graph_from_data_frame(.): In `d' `NA' elements were replaced with
## string "NA"
bigram_graph_filtered
## IGRAPH 4784460 DN-- 3608 4754 --
## + attr: name (v/c), n (e/n)
## + edges from 4784460 (vertex names):
## [1] NA ->NA baton ->rouge north ->carolina
## [4] thomas ->dabney short ->time port ->hudson
## [7] civil ->war daughter ->emmy south ->carolina
## [10] provost ->marshal thousand ->dollars hundred ->dollars
## [13] god ->bless court ->house pass ->christian
## [16] supreme ->court washington->city ten ->miles
## [19] days ->ago dr ->felton twenty ->miles
## [22] half ->past southern ->women burl ->quiney
## + ... omitted several edges
set.seed(100)
# Skip the NA/NA pair left by empty and one-word lines, then keep the 75 most frequent bigrams
top_bigrams <- bigram_counts %>%
filter(!is.na(Word1), !is.na(Word2)) %>%
slice_head(n = 75)
bigram_graph_filtered <- graph_from_data_frame(top_bigrams, directed = TRUE)
ggraph(bigram_graph_filtered, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
geom_node_point(color = "mediumorchid2", size = 3) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
labs(title = "Top 75 Bigrams from the Narratives of the American South") +
theme_void()
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
Only the top 75 bigrams were used for the word network, to keep the visualization readable. There are some clear patterns among these bigrams. Many of the network’s clusters involve law enforcement and the government and judicial systems, such as police officers and detectives, provost marshals, the supreme court, the capitol prison, the White House, and lieutenants. The words “house” and “servants” also appear. Together, these clusters may reflect the laws and changes that were beginning to take effect, and the conversations surrounding the trajectory of the South after its defeat.
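These conversations appear to carry across the narratives despite differences in each author’s societal standing (wife/homemaker, Native American, laborer, soldier, slave, etc.). This makes sense, as such decisions would have had different implications for each group. However, the bigram counts pool all ten narratives together. Confirming that a theme runs through every narrative would therefore require tracking which text each line came from.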
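One way to put a number on the clustering described above is to count each word’s degree, that is, how many bigrams in the edge list it takes part in. Words such as “court” that link several bigrams act as hubs. On the igraph object itself, `degree()` does this. The base-R sketch below does the same on a plain edge list built from a handful of edges in the printed graph output. It is an illustration, not the full top-75 network, and the function name is hypothetical.

```r
# A few edges taken from the printed bigram graph (illustration only)
edges <- data.frame(
  from = c("baton", "north", "south", "supreme", "court", "provost", "civil"),
  to   = c("rouge", "carolina", "carolina", "court", "house", "marshal", "war"),
  stringsAsFactors = FALSE
)

word_degree <- function(edges) {
  # A word's degree counts every edge that touches it, as first or second word
  counts <- table(c(edges$from, edges$to))
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

word_degree(edges)
```

Here “carolina” and “court” each have degree 2, and every other word has degree 1. On the full graph, `degree(bigram_graph_filtered)` gives the equivalent counts.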
In the future, I am interested in combining this word network analysis with another method, such as topic modeling, to surface common themes across the texts. These texts would also pair well with sentiment analysis, which could give a sense of the authors’ moods and feelings as they were writing. This would add context beyond the bigrams alone.
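As a rough illustration of lexicon-based sentiment analysis, the sketch below looks up each word of a text in a sentiment lexicon and sums the matched scores. The lexicon here is a tiny hypothetical one made up for illustration. A real analysis would use an established lexicon, such as the “bing” lexicon from tidytext’s `get_sentiments()`, joined to the word tokens. The function name is also hypothetical.

```r
# Tiny hypothetical lexicon: +1 = positive word, -1 = negative word
lexicon <- c(bless = 1, faithfully = 1, loving = 1,
             mourning = -1, loss = -1, war = -1)

score_text <- function(text, lexicon) {
  # Lowercase, strip punctuation, and split into words
  words <- strsplit(gsub("[^a-z']+", " ", tolower(text)), " ", fixed = TRUE)[[1]]
  words <- words[words != ""]
  # Words missing from the lexicon give NA and are ignored in the sum
  sum(lexicon[words], na.rm = TRUE)
}
```

Scoring each narrative separately, rather than the pooled text, would show how the tone differs between authors.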
UNC-Chapel Hill has done an outstanding job compiling and housing this data and making it easy to access for research and analysis, so I’d also be interested in exploring some of the other texts that are available.