Tolson Chapel, built 1866

1. Introduction

The First-Person Narratives of the American South is a collection of personal accounts of the lives of individuals such as farmers, women, enlisted men, laborers, Native Americans and African Americans as the South reconstructed itself following the Civil War.

The data used for this text analysis is a small sample of the original 344-item collection housed at the University of North Carolina at Chapel Hill’s University Library. In addition to these narratives, the university collection also includes the Southern Historical Collection, which is one of the largest collections of Southern manuscripts in the nation, and the North Carolina Collections. The data was obtained from the open data website Kaggle, and was previously digitized and manually transcribed by UNC students and faculty.

This analysis uses 10 hand-selected narratives from the collection, chosen to capture a variety of perspectives. It addresses the following questions:

  1. What are the most common bigrams that exist across these texts?
  2. What word networks exist within these texts?

It is hoped that some common themes, patterns and words can be identified.

1b. Loading Packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidytext)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.1     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.1.8
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(ggplot2)
library(igraph)
## Warning: package 'igraph' was built under R version 4.2.3
## 
## Attaching package: 'igraph'
## 
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## 
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## 
## The following object is masked from 'package:tidyr':
## 
##     crossing
## 
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## 
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## 
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## 
## The following object is masked from 'package:base':
## 
##     union
library(ggraph)
## Warning: package 'ggraph' was built under R version 4.2.3
library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.2.3
library(readxl)
library(writexl)

2. Wrangle

2a. Importing Data

Each of the texts exists as a .txt file, so to use them for analysis they need to be combined into a single usable data frame. Here the combined text is read from a spreadsheet, asnarratives.xlsx, with the text held in a single column.

asnarratives <- read_xlsx("asnarratives.xlsx")
asnarratives
## # A tibble: 6,699 × 1
##    text                                                                         
##    <chr>                                                                        
##  1 <NA>                                                                         
##  2 [Cover Image]                                                                
##  3 [Frontispiece Image]                                                         
##  4 [Title Page Image]                                                           
##  5 FOREWORD.                                                                    
##  6 THIS little book is written for my children and the descendants of those who…
##  7 From its perusal may they learn still more to reverence the memory of their …
##  8 To this record I have added my memories of the home of my youth, under South…
##  9 This long retrospect of mine, a retrospect of eighty years, portrays faithfu…
## 10 I write with a loving hand as I pay this tribute to the past.                
## # … with 6,689 more rows
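
The import above reads an already-combined spreadsheet. As a minimal sketch of how separate .txt files could be stacked into one data frame, the example below writes two tiny files to a temporary folder and reads them back. The folder name, file names and the extra `source` column are assumptions for illustration and are not part of the original workflow.

```r
library(dplyr)
library(tibble)

# Hypothetical folder of narratives; two small files are created in a
# temporary directory so that the sketch runs on its own.
narrative_dir <- file.path(tempdir(), "narratives")
dir.create(narrative_dir, showWarnings = FALSE)
writeLines(c("FOREWORD.", "THIS little book is written for my children."),
           file.path(narrative_dir, "narrative_a.txt"))
writeLines(c("CHAPTER I.", "I was born in North Carolina."),
           file.path(narrative_dir, "narrative_b.txt"))

# Find every .txt file in the folder
txt_files <- list.files(narrative_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file line by line, tag each line with its file name,
# and stack the pieces into a single tibble
combined <- bind_rows(lapply(txt_files, function(path) {
  tibble(source = basename(path), text = readLines(path, warn = FALSE))
}))

combined
```

Keeping a `source` column is optional, but it would make it possible to compare bigrams between individual narratives later on.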

2b. Tokenizing Text Into Bigrams

asn_bigrams <- asnarratives %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

asn_bigrams
## # A tibble: 590,020 × 1
##    bigram            
##    <chr>             
##  1 <NA>              
##  2 cover image       
##  3 frontispiece image
##  4 title page        
##  5 page image        
##  6 <NA>              
##  7 this little       
##  8 little book       
##  9 book is           
## 10 is written        
## # … with 590,010 more rows

On inspection, the first few rows already reveal bigrams, such as “cover image” and “title page”, that don’t really describe the texts. They will need to be removed to get a more accurate picture of common words and themes across the texts.
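
To make the tokenization step concrete, here is a small sketch on a toy data frame (the three rows are borrowed from the first lines of the data shown earlier). `unnest_tokens()` lowercases the text, strips punctuation, and slides a two-word window across each row. A row with fewer than two words, such as the heading FOREWORD., has no bigram to offer, which is where the `<NA>` rows in the output above come from.

```r
library(dplyr)
library(tidytext)
library(tibble)

# Toy input: one sentence fragment, one single-word heading, one image placeholder
toy <- tibble(text = c("THIS little book is written",
                       "FOREWORD.",
                       "[Cover Image]"))

# Same call as in the analysis: overlapping two-word tokens
toy_bigrams <- toy %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

toy_bigrams
```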

2c. Bigram Counts

asn_bigrams %>%
  count(bigram, sort = TRUE)
## # A tibble: 227,462 × 2
##    bigram      n
##    <chr>   <int>
##  1 of the   4767
##  2 in the   2836
##  3 to the   2183
##  4 on the   1347
##  5 it was   1276
##  6 and the  1259
##  7 to be    1160
##  8 for the  1107
##  9 i was     984
## 10 he was    879
## # … with 227,452 more rows

2d. Removing Stop-Words

bigrams_separated <- asn_bigrams %>%
  separate(bigram, c("Word1", "Word2"), sep = " ")

bigrams_filtered <- bigrams_separated %>%
  filter(!(Word1 %in% stop_words$word)) %>%
  filter(!(Word2 %in% stop_words$word))

bigram_counts <- bigrams_filtered %>% 
  count(Word1, Word2, sort = TRUE)

bigram_counts
## # A tibble: 42,889 × 3
##    Word1    Word2        n
##    <chr>    <chr>    <int>
##  1 <NA>     <NA>       294
##  2 baton    rouge       81
##  3 north    carolina    74
##  4 thomas   dabney      54
##  5 short    time        50
##  6 port     hudson      49
##  7 civil    war         48
##  8 daughter emmy        41
##  9 south    carolina    40
## 10 provost  marshal     37
## # … with 42,879 more rows

2e. Uniting Bigrams

As previously mentioned, bigrams such as “cover image”, “title page”, “page image” and “frontispiece image” kept appearing. They come from image placeholders in the transcriptions rather than from the narratives themselves, so they shouldn’t be factored into the counts or the final united bigram data frame and are removed here, along with the empty “NA NA” rows.

bigrams_to_remove <- c("cover image", "NA NA", "title page", "page image", "frontispiece image")
bigrams_united <- bigrams_filtered %>%
  unite(bigram, Word1, Word2, sep = " ") %>%
  filter(!(bigram %in% bigrams_to_remove))

bigrams_united
## # A tibble: 53,430 × 1
##    bigram             
##    <chr>              
##  1 heritage left      
##  2 honorable lives    
##  3 southern skies     
##  4 southern woman     
##  5 civil war          
##  6 portrays faithfully
##  7 faithfully life    
##  8 ante bellum        
##  9 bellum times       
## 10 mourning vestments 
## # … with 53,420 more rows
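
Note that the filter above is applied only to `bigrams_united`. The separated `bigram_counts` table, which feeds the network graphs in section 3, still contains the `NA` pair and the image placeholders. A minimal sketch of applying the same clean-up to a counts table is shown below on a toy stand-in. The first three counts are copied from the table in 2d, while the "cover image" count is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for bigram_counts
toy_counts <- tibble(
  Word1 = c(NA, "baton", "north", "cover"),
  Word2 = c(NA, "rouge", "carolina", "image"),
  n     = c(294L, 81L, 74L, 10L)  # last count is illustrative only
)

bigrams_to_remove <- c("cover image", "title page", "page image",
                       "frontispiece image")

clean_counts <- toy_counts %>%
  # drop rows where either word is missing
  drop_na(Word1, Word2) %>%
  # drop the placeholder bigrams by rebuilding the two-word string
  filter(!(paste(Word1, Word2) %in% bigrams_to_remove))

clean_counts
```

Passing a cleaned table like this to `graph_from_data_frame()` should also avoid the warning about `NA` elements being replaced with the string "NA".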

2f. Word Cloud

top_bigrams <- bigrams_united %>%
  ungroup() %>%
  count(bigram, sort = TRUE) %>%
  top_n(150)
## Selecting by n
wordcloud2(top_bigrams)

The top 150 bigrams included North Carolina, Civil War, Court House, Common Schools, Law Makers and Detective Police, among many others.
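
The word cloud code selects rows with `top_n()`, which dplyr now documents as superseded by `slice_max()`. Both keep tied rows by default, so a "top 150" can contain more than 150 bigrams. The sketch below shows the `slice_max()` form on toy data (the rows are invented for illustration), with `with_ties = FALSE` to return exactly the requested number of rows.

```r
library(dplyr)
library(tibble)

# Toy bigrams, one row per occurrence
toy_bigrams <- tibble(
  bigram = c("civil war", "civil war", "north carolina",
             "court house", "civil war", "north carolina")
)

top_toy <- toy_bigrams %>%
  count(bigram, sort = TRUE) %>%
  # order by the count column n and keep exactly the 2 largest rows
  slice_max(n, n = 2, with_ties = FALSE)

top_toy
```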

3. Visualizing Word Network

3a. Creating Bigram Graph

bigram_graph <- bigram_counts %>%
  graph_from_data_frame()
## Warning in graph_from_data_frame(.): In `d' `NA' elements were replaced with
## string "NA"
bigram_graph
## IGRAPH 4772364 DN-- 15861 42889 -- 
## + attr: name (v/c), n (e/n)
## + edges from 4772364 (vertex names):
##  [1] NA        ->NA        baton     ->rouge     north     ->carolina 
##  [4] thomas    ->dabney    short     ->time      port      ->hudson   
##  [7] civil     ->war       daughter  ->emmy      south     ->carolina 
## [10] provost   ->marshal   thousand  ->dollars   hundred   ->dollars  
## [13] god       ->bless     court     ->house     pass      ->christian
## [16] supreme   ->court     washington->city      ten       ->miles    
## [19] days      ->ago       dr        ->felton    twenty    ->miles    
## [22] half      ->past      southern  ->women     burl      ->quiney   
## + ... omitted several edges

3b. Creating Filtered Bigram Graph from Bigrams that Appear More than Once

bigram_graph_filtered <- bigram_counts %>%
  filter(n > 1) %>%
  graph_from_data_frame()
## Warning in graph_from_data_frame(.): In `d' `NA' elements were replaced with
## string "NA"
bigram_graph_filtered
## IGRAPH 4784460 DN-- 3608 4754 -- 
## + attr: name (v/c), n (e/n)
## + edges from 4784460 (vertex names):
##  [1] NA        ->NA        baton     ->rouge     north     ->carolina 
##  [4] thomas    ->dabney    short     ->time      port      ->hudson   
##  [7] civil     ->war       daughter  ->emmy      south     ->carolina 
## [10] provost   ->marshal   thousand  ->dollars   hundred   ->dollars  
## [13] god       ->bless     court     ->house     pass      ->christian
## [16] supreme   ->court     washington->city      ten       ->miles    
## [19] days      ->ago       dr        ->felton    twenty    ->miles    
## [22] half      ->past      southern  ->women     burl      ->quiney   
## + ... omitted several edges
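
The first line of each printout summarizes the graph: `D` means directed, `N` means the vertices are named, and the two numbers are the vertex and edge counts. Filtering to `n > 1` therefore reduces the graph from 15861 words and 42889 bigrams to 3608 words and 4754 bigrams. The sketch below reproduces the idea on a toy counts table. The first three rows use counts from the table in 2d, and the fourth row's count is made up for illustration.

```r
library(dplyr)
library(tibble)
library(igraph)

# Toy stand-in for bigram_counts
toy_counts <- tibble(
  Word1 = c("baton", "north", "south", "mourning"),
  Word2 = c("rouge", "carolina", "carolina", "vestments"),
  n     = c(81L, 74L, 40L, 1L)  # last count is illustrative only
)

# Each row becomes a directed edge Word1 -> Word2; n is kept as an edge attribute
toy_graph <- toy_counts %>%
  filter(n > 1) %>%
  graph_from_data_frame()

vcount(toy_graph)       # number of distinct words (vertices)
ecount(toy_graph)       # number of bigrams (edges)
is_directed(toy_graph)  # edges run from the first word to the second
```

Because "carolina" is shared by two bigrams it appears as a single vertex with two incoming edges, which is what produces the clusters seen in the word net.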

3c. Word Net

set.seed(100)
top_bigrams <- bigram_counts[1:75,]
bigram_graph_filtered <- graph_from_data_frame(top_bigrams, directed = TRUE)
## Warning in graph_from_data_frame(top_bigrams, directed = TRUE): In `d' `NA'
## elements were replaced with string "NA"
# Force-directed ("fr") layout; edge opacity scales with the bigram count n
ggraph(bigram_graph_filtered, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
  geom_node_point(color = "mediumorchid2", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  labs(title = "Top 75 Bigrams from the Narratives of the American South") +
  theme_void()
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

4. Discussion

The word net was limited to the top 75 bigrams to keep the visualization readable. There are some clear patterns and clusters among these bigrams. Many of the network clusters mention law enforcement, the government and the judicial system: police officers and detectives, the provost marshal, the supreme court, prison capitol, the white house, and lieutenants. The words “house” and “servants” also appear. Together these may reflect the laws and changes that were beginning to take hold, and the conversations surrounding the trajectory of the South after its loss.

These conversations appear to extend into each of the narratives, despite the differences in each individual’s societal standing (wife/homemaker, Native American, laborer, soldier, slave, etc.), which makes sense as decisions would have had varying implications for each group.

In the future, I am interested in combining the word network analysis performed on these texts with another method, such as topic modeling, to see some of the common themes across them. I also think these texts would pair well with sentiment analysis, which could be used to get a feel for the mood and feelings of the authors as they were writing. This could give some additional context beyond looking at just the bigrams.

UNC-Chapel Hill has done an outstanding job compiling and housing this data and making it easy to access for research and analysis, so I’d be interested in exploring some of the other texts that are available as well.