Mormon Reformation GloVe Word Embedding

library(tokenizers)
library(text2vec)
library(Matrix)
library(dplyr)

library(ggplot2)
library(ggrepel)
library(broom)
library(tidyr)

library(readr)
library(stringr)

Beginning in September of 1856, a renewed spiritual awakening took hold among the Mormons in Utah under the direction of President Brigham Young. Fiery sermons were preached, especially by Jedediah M. Grant, a counselor to President Young. Saints were encouraged to turn from sin and recommit themselves to the Lord. As an outward sign of their renewed religious adherence, many members of the church were rebaptized [1]. Historians refer to this period as the Mormon Reformation.

Various articles and books have addressed, to varying degrees, the Mormon Reformation and its rhetoric. I propose running some text analysis tools on a corpus of sermons from this period. The sermons come from the Journal of Discourses (JofD), a 26-volume set comprising sermons and teachings by leaders of the Mormon church from 1854 to 1886. The texts carry some inaccuracy, since the sermons were recorded by individuals in the congregation (typically in shorthand) and only later collected and published. While the JofD is a great source for sermons, it is fairly limited in geographic scope (all the sermons in my original dataset were given in Salt Lake City), which means that I am still missing sermons from the Mormon Reformation period [2].

At any rate, they provide a good foundation of sermons spanning September 1856 through June 1857. I recorded not only the speaker but also who reported each sermon (sometimes there are two reporters) and the location within Salt Lake City where it was given (one of two places: the Tabernacle or the Bowery). While these fields won’t be used in the current analysis, they could prove interesting or useful if I expand these datasets.

sermons <- read_csv("Mormon_Reformation_JOD_Sermons.csv")
sermons

## Source: local data frame [77 x 9]
## 
##       id       date           speaker reporter_first reporter_second
##    (int)     (date)             (chr)          (chr)           (chr)
## 1      1 1856-09-21     Brigham Young     J. V. Long              NA
## 2      2 1856-09-21  Heber C. Kimball     J. V. Long              NA
## 3      3 1856-09-21 Jedidiah M. Grant     J. V. Long              NA
## 4      4 1856-09-21     Brigham Young     G. D. Watt              NA
## 5      5 1856-11-02     Brigham Young     G. D. Watt              NA
## 6      6 1856-11-02  Heber C. Kimball     G. D. Watt              NA
## 7      7 1856-11-02     Brigham Young     G. D. Watt              NA
## 8      8 1856-11-02 Jedidiah M. Grant     G. D. Watt              NA
## 9      9 1856-11-09     Brigham Young     G. D. Watt              NA
## 10    10 1856-11-09  Heber C. Kimball     J. V. Long              NA
## ..   ...        ...               ...            ...             ...
## Variables not shown: location (chr), city (chr), text (chr), url (chr)

Initially, I ran the entire dataset of sermons all the way through to explore the output of the tokenized words. However, I found the results to be relatively unremarkable. The rhetoric reflected what one might expect from religious sermons. No strong associations manifested themselves, and the associations that did exist were to be expected. At the suggestion of Lincoln, I separated the dataset into two. The first contains all the sermons that Brigham Young gave during this period, while the second contains everyone else (his two counselors and other leaders of the Church). The idea is that a dataset becomes interesting when it’s compared to something else. How different was the rhetoric coming from Brigham Young compared to that of other leaders at the time?

by_sermons <- sermons %>% 
  filter(speaker == "Brigham Young")

other_sermons <- sermons %>% 
  filter(speaker != "Brigham Young")

count(by_sermons)

## Source: local data frame [1 x 1]
## 
##       n
##   (int)
## 1    22

count(other_sermons)

## Source: local data frame [1 x 1]
## 
##       n
##   (int)
## 1    55

other_tokens <- tokenize_words(other_sermons$text)
by_tokens <- tokenize_words(by_sermons$text)

The datasets are not evenly sized. My Brigham Young dataset contains 22 sermons while my “other” dataset contains 55. If I were to pursue this topic further, I would need to address this disparity by either expanding my date range or locating other sources that contain sermons from the Reformation.
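Sermon counts are only one way to measure the imbalance. Because the embeddings are trained on words rather than on sermons, the total number of tokens on each side may be the more telling figure. The following is a minimal base R sketch of that check. The texts are invented placeholders, and `simple_tokens()` and `corpus_size()` are hypothetical helpers, not part of the analysis above.

```r
# Toy stand-ins for the sermon texts; these strings are invented for illustration.
by_text    <- c("Repent and be baptized.", "The Lord is with his people.")
other_text <- c("Believe in the gospel.", "Forsake your sins and repent.",
                "The saints must reform.")

# Hypothetical helper: lowercase, split on anything that is not a letter or
# apostrophe, and drop empty strings.
simple_tokens <- function(x) {
  lapply(strsplit(tolower(x), "[^a-z']+"), function(w) w[nzchar(w)])
}

# Number of documents and total number of tokens in a set of texts.
corpus_size <- function(texts) {
  tokens <- simple_tokens(texts)
  c(documents = length(tokens), tokens = sum(lengths(tokens)))
}

sizes <- rbind(young = corpus_size(by_text), others = corpus_size(other_text))
sizes
```

On the real data the same comparison could be made directly from `by_tokens` and `other_tokens` with `sum(lengths(...))`.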

After tokenizing each of the two datasets, I prune the vocabulary (dropping terms that appear fewer than five times or in more than 90% of the sermons) and then create the corpus and term co-occurrence matrix to work with.
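To make the two pruning parameters in the code that follows more concrete, here is a rough base R sketch of what a minimum term count and a maximum document proportion do to a vocabulary. The toy corpus is invented and the thresholds are scaled down to suit it. This illustrates the idea only and is not text2vec's implementation.

```r
# Toy tokenized corpus (invented): one character vector per document.
docs <- list(
  c("the", "lord", "the", "saints"),
  c("the", "lord", "repent"),
  c("the", "saints", "repent", "repent"),
  c("the", "gospel")
)

# Total occurrences of each term across the corpus.
term_count <- table(unlist(docs))

# Number of documents in which each term appears at least once.
doc_count <- table(unlist(lapply(docs, unique)))

# Thresholds scaled down for the toy data; the analysis uses 5 and 0.9.
term_count_min     <- 2
doc_proportion_max <- 0.9

# Keep terms that are frequent enough but not near-universal.
terms <- names(term_count)
keep  <- as.vector(term_count[terms]) >= term_count_min &
  as.vector(doc_count[terms]) / length(docs) <= doc_proportion_max

pruned_vocab <- sort(terms[keep])
pruned_vocab
```

Here "gospel" is dropped for being too rare and "the" for appearing in every document, which is the same kind of trimming applied to the sermon vocabularies.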

# Brigham Young's sermons
by_it <- itoken(by_tokens)
by_vocab <- create_vocabulary(by_it)
by_vocab_pruned <- prune_vocabulary(by_vocab, term_count_min = 5,
                                 doc_proportion_max = 0.9)
by_vectorizer <- vocab_vectorizer(by_vocab_pruned, grow_dtm = TRUE,
                               skip_grams_window = 10L)
by_it <- itoken(by_tokens)
by_corpus <- create_corpus(by_it, by_vectorizer)

by_tcm <- get_tcm(by_corpus)

# Other Mormon leaders' sermons
other_it <- itoken(other_tokens)
other_vocab <- create_vocabulary(other_it)
other_vocab_pruned <- prune_vocabulary(other_vocab, term_count_min = 5,
                                 doc_proportion_max = 0.9)
other_vectorizer <- vocab_vectorizer(other_vocab_pruned, grow_dtm = TRUE,
                               skip_grams_window = 10L)
other_it <- itoken(other_tokens)
other_corpus <- create_corpus(other_it, other_vectorizer)

other_tcm <- get_tcm(other_corpus)
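The term co-occurrence matrix (TCM) is what GloVe actually learns from, so it may help to see how one could be built by hand. The sketch below counts pairs of words that fall within a sliding window and weights each pair by 1 / distance. That weighting follows the reference GloVe implementation; whether the text2vec version used here weights pairs in exactly the same way is an assumption. The function name `cooccurrence()` and the toy tokens are hypothetical, and the result is a dense symmetric matrix for readability rather than the sparse matrix text2vec returns.

```r
# Build a symmetric term co-occurrence matrix from one token vector.
# A pair of words at distance d (within the window) contributes 1 / d.
cooccurrence <- function(tokens, vocab, window = 10L) {
  tcm <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
  n <- length(tokens)
  for (i in seq_len(n)) {
    a <- tokens[i]
    if (!(a %in% vocab)) next          # skip words outside the vocabulary
    for (d in seq_len(window)) {
      j <- i + d
      if (j > n) break                 # window runs past the end of the text
      b <- tokens[j]
      if (!(b %in% vocab)) next
      tcm[a, b] <- tcm[a, b] + 1 / d
      if (a != b) tcm[b, a] <- tcm[b, a] + 1 / d
    }
  }
  tcm
}

# Toy example (invented tokens) with a window of 2 instead of 10.
tokens <- c("repent", "your", "sins", "forsake", "your", "sins")
vocab  <- unique(tokens)
tcm    <- cooccurrence(tokens, vocab, window = 2L)
tcm
```

Words that repeatedly appear next to each other, such as "your" and "sins" here, accumulate the largest weights.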

To fit each GloVe model I ran 30 training iterations (epochs), using 100-dimensional word vectors.
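For reference, this is the weighted least-squares objective from the original GloVe formulation by Pennington, Socher, and Manning. Here $X_{ij}$ is an entry of the term co-occurrence matrix, $w_i$ and $\tilde{w}_j$ are the main and context word vectors, and $x_{\max}$ corresponds to the `x_max` argument. The exponent $\alpha = 0.75$ is the value from the original paper; that text2vec used the same value here is an assumption, as is the reading that the "expected cost" printed each epoch tracks this loss.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})
    \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

Because the model learns two sets of vectors, $w$ and $\tilde{w}$, the code further down adds them together to get a single vector per word.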

RcppParallel::setThreadOptions(numThreads = 8)
by_glove_fit <- glove(by_tcm, word_vectors_size = 100, x_max = 10, num_iters = 30)

## 2016-05-11 11:48:36 - epoch 1, expected cost 0.0812

## 2016-05-11 11:48:37 - epoch 2, expected cost 0.0553

## 2016-05-11 11:48:37 - epoch 3, expected cost 0.0438

## 2016-05-11 11:48:37 - epoch 4, expected cost 0.0366

## 2016-05-11 11:48:38 - epoch 5, expected cost 0.0316

## 2016-05-11 11:48:38 - epoch 6, expected cost 0.0278

## 2016-05-11 11:48:38 - epoch 7, expected cost 0.0248

## 2016-05-11 11:48:39 - epoch 8, expected cost 0.0224

## 2016-05-11 11:48:39 - epoch 9, expected cost 0.0205

## 2016-05-11 11:48:40 - epoch 10, expected cost 0.0188

## 2016-05-11 11:48:40 - epoch 11, expected cost 0.0174

## 2016-05-11 11:48:40 - epoch 12, expected cost 0.0161

## 2016-05-11 11:48:41 - epoch 13, expected cost 0.0151

## 2016-05-11 11:48:41 - epoch 14, expected cost 0.0141

## 2016-05-11 11:48:41 - epoch 15, expected cost 0.0132

## 2016-05-11 11:48:42 - epoch 16, expected cost 0.0125

## 2016-05-11 11:48:42 - epoch 17, expected cost 0.0118

## 2016-05-11 11:48:42 - epoch 18, expected cost 0.0111

## 2016-05-11 11:48:43 - epoch 19, expected cost 0.0106

## 2016-05-11 11:48:43 - epoch 20, expected cost 0.0101

## 2016-05-11 11:48:43 - epoch 21, expected cost 0.0096

## 2016-05-11 11:48:44 - epoch 22, expected cost 0.0091

## 2016-05-11 11:48:44 - epoch 23, expected cost 0.0087

## 2016-05-11 11:48:44 - epoch 24, expected cost 0.0083

## 2016-05-11 11:48:45 - epoch 25, expected cost 0.0080

## 2016-05-11 11:48:45 - epoch 26, expected cost 0.0076

## 2016-05-11 11:48:45 - epoch 27, expected cost 0.0073

## 2016-05-11 11:48:46 - epoch 28, expected cost 0.0070

## 2016-05-11 11:48:46 - epoch 29, expected cost 0.0068

## 2016-05-11 11:48:46 - epoch 30, expected cost 0.0065

other_glove_fit <- glove(other_tcm, word_vectors_size = 100, x_max = 10, num_iters = 30)

## 2016-05-11 11:48:47 - epoch 1, expected cost 0.0828

## 2016-05-11 11:48:47 - epoch 2, expected cost 0.0551

## 2016-05-11 11:48:48 - epoch 3, expected cost 0.0445

## 2016-05-11 11:48:48 - epoch 4, expected cost 0.0380

## 2016-05-11 11:48:49 - epoch 5, expected cost 0.0335

## 2016-05-11 11:48:49 - epoch 6, expected cost 0.0301

## 2016-05-11 11:48:50 - epoch 7, expected cost 0.0275

## 2016-05-11 11:48:51 - epoch 8, expected cost 0.0253

## 2016-05-11 11:48:51 - epoch 9, expected cost 0.0235

## 2016-05-11 11:48:52 - epoch 10, expected cost 0.0219

## 2016-05-11 11:48:52 - epoch 11, expected cost 0.0206

## 2016-05-11 11:48:53 - epoch 12, expected cost 0.0194

## 2016-05-11 11:48:53 - epoch 13, expected cost 0.0184

## 2016-05-11 11:48:54 - epoch 14, expected cost 0.0175

## 2016-05-11 11:48:54 - epoch 15, expected cost 0.0167

## 2016-05-11 11:48:55 - epoch 16, expected cost 0.0159

## 2016-05-11 11:48:55 - epoch 17, expected cost 0.0153

## 2016-05-11 11:48:56 - epoch 18, expected cost 0.0146

## 2016-05-11 11:48:56 - epoch 19, expected cost 0.0141

## 2016-05-11 11:48:57 - epoch 20, expected cost 0.0135

## 2016-05-11 11:48:57 - epoch 21, expected cost 0.0131

## 2016-05-11 11:48:58 - epoch 22, expected cost 0.0126

## 2016-05-11 11:48:58 - epoch 23, expected cost 0.0122

## 2016-05-11 11:48:59 - epoch 24, expected cost 0.0118

## 2016-05-11 11:48:59 - epoch 25, expected cost 0.0114

## 2016-05-11 11:49:00 - epoch 26, expected cost 0.0111

## 2016-05-11 11:49:00 - epoch 27, expected cost 0.0108

## 2016-05-11 11:49:01 - epoch 28, expected cost 0.0105

## 2016-05-11 11:49:01 - epoch 29, expected cost 0.0102

## 2016-05-11 11:49:02 - epoch 30, expected cost 0.0099

by_word_vectors <- by_glove_fit$word_vectors[[1]] + by_glove_fit$word_vectors[[2]]
other_word_vectors <- other_glove_fit$word_vectors[[1]] + other_glove_fit$word_vectors[[2]]

rownames(by_word_vectors) <- rownames(by_tcm)
rownames(other_word_vectors) <- rownames(other_tcm)

by_word_vectors_norm <- sqrt(rowSums(by_word_vectors ^ 2))
other_word_vectors_norm <- sqrt(rowSums(other_word_vectors ^ 2))

Finally, I created comparison functions to use on both of the datasets.

# Brigham Young's sermons
by_word_vec <- function(word) {
  by_word_vectors[word, , drop = FALSE]
}

by_closest_to <- function(by_word_vec, n = 10) {
  cos_dist <- text2vec:::cosine(by_word_vec, by_word_vectors, by_word_vectors_norm)
  head(sort(cos_dist[1, ], decreasing = TRUE), n)
}

by_similarities <- function(by_word_vec) {
  cos_dist <- text2vec:::cosine(by_word_vec, by_word_vectors, by_word_vectors_norm)
  cos_dist %>% t() %>% tidy() %>% rename(word = .rownames)
}

# Other Mormon leaders' sermons
other_word_vec <- function(word) {
  other_word_vectors[word, , drop = FALSE]
}

other_closest_to <- function(other_word_vec, n = 10) {
  cos_dist <- text2vec:::cosine(other_word_vec, other_word_vectors, other_word_vectors_norm)
  head(sort(cos_dist[1, ], decreasing = TRUE), n)
}

other_similarities <- function(other_word_vec) {
  cos_dist <- text2vec:::cosine(other_word_vec, other_word_vectors, other_word_vectors_norm)
  cos_dist %>% t() %>% tidy() %>% rename(word = .rownames)
}
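The functions above lean on `text2vec:::cosine`, an internal (unexported) function that may not exist in other versions of the package. As a hedged alternative, the same idea can be sketched in base R. The names `cosine_to_all()` and `closest_to()` and the toy two-dimensional vectors are made up for illustration.

```r
# Cosine similarity between one query vector and every row of a matrix.
cosine_to_all <- function(query, vectors) {
  query <- as.numeric(query)
  dots  <- as.vector(vectors %*% query)
  norms <- sqrt(rowSums(vectors ^ 2)) * sqrt(sum(query ^ 2))
  setNames(dots / norms, rownames(vectors))
}

# The n words whose vectors point in the most similar direction to `word`.
closest_to <- function(word, vectors, n = 10) {
  sims <- cosine_to_all(vectors[word, ], vectors)
  head(sort(sims, decreasing = TRUE), n)
}

# Toy 2-dimensional "embeddings" (invented numbers).
toy <- rbind(repent  = c(1, 0),
             sins    = c(2, 0.1),
             forsake = c(1, 1),
             joy     = c(0, 1))

nearest <- closest_to("repent", toy, n = 3)
nearest
```

A word is always most similar to itself (similarity 1), which is why the query word heads every list in the output that follows.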

Now that both models have comparison functions, I can start to compare words between the two camps. I chose the words “believe”, “sin”, and “repent”. The results begin to show the variation in rhetoric between Brigham Young and the other church leaders. The other leaders used the word “believe” in connection with Jesus, Joseph Smith, and Brigham Young, yet President Young did not speak of Joseph or Jesus when using the term “believe.” Furthermore, President Young’s “repent” is more weakly associated with “sins” than it is for the other LDS leaders (a cosine similarity of about 0.34 versus 0.53), and “forsake” does not appear among his ten closest words at all.

by_word_vec("believe") %>% by_closest_to()

##   believe    gospel    saints salvation  doctrine    bought   whether 
## 1.0000000 0.4170137 0.3462395 0.3460344 0.3371346 0.3335818 0.3333892 
##      lord      holy     truth 
## 0.3272827 0.3255280 0.2906673

other_word_vec("believe") %>% other_closest_to()

##   believe    cannot    joseph     jesus       did    father   brigham 
## 1.0000000 0.4378603 0.4159731 0.3893648 0.3437728 0.3388390 0.3230928 
##       son      weep      vain 
## 0.3155737 0.3120995 0.2946186

by_word_vec("sin") %>% by_closest_to()

##        sin        add especially    neglect      saved      seven 
##  1.0000000  0.3454590  0.3200679  0.2953026  0.2832285  0.2826048 
##    exhibit       pick      claim  committed 
##  0.2705552  0.2664833  0.2656206  0.2610359

other_word_vec("sin") %>% other_closest_to()

##       sin   against    others     among    commit direction     gives 
## 1.0000000 0.4321262 0.3416466 0.3248543 0.2989848 0.2983613 0.2910826 
##       try   suppose   discern 
## 0.2899688 0.2861000 0.2823848

by_word_vec("repent") %>% by_closest_to()

##    repent     souls      sins     happy  faithful    devils     doing 
## 1.0000000 0.4021526 0.3442230 0.3001911 0.2889863 0.2752443 0.2672160 
##     works   prepare naturally 
## 0.2559600 0.2482452 0.2411704

other_word_vec("repent") %>% other_closest_to()

##    repent      sins   forsake    unless      flee    chance    burden 
## 1.0000000 0.5291698 0.4430974 0.3342721 0.3175771 0.2910523 0.2769944 
##      both    throne  junction 
## 0.2753208 0.2686145 0.2669553

test_words <- c("repent", "believe", "hell", "faithful", "holy", "gospel", "sin", "reform")

by_word_sim <- by_word_vec(test_words) %>% by_similarities()
other_word_sim <- other_word_vec(test_words) %>% other_similarities()

I created a list of interesting words to filter the results when I graph or compare two different tokens. I generated the list with the reformation in mind, thus focusing on words like “repent” or “commit.”

interesting_words <- c("atonement", "repentance", "repent", "baptism",
                       "rebaptism", "faith", "recommit", "worldly", "world",
                       "gentile", "gentiles", "mormon", "mormonism", "devil",
                       "excitement","doctrine", "apostolic", "primitive",
                       "ancient", "scripture", "doctrines", "truth", "hell",
                       "apostolical", "scriptures", "teaching", "religion", "god",
                       "commit", "reformation", "damnation", "astray", "joy", "hope")

The narrative for the Mormon Reformation is that baptism, or rather rebaptism, was key to recommitting the saints to their faith. I searched each model’s vocabulary for the term “baptism” and found it absent from Brigham Young’s, which means he used the word rarely if at all in the sermons I have collected (a term must occur at least five times to survive pruning). The other LDS church leaders did use the term, but the words with which it had the strongest relationship don’t stand out as specific to the reformation. Furthermore, “rebaptism” does not appear in either model’s vocabulary.

"baptism" %in% rownames(by_word_vectors)

## [1] FALSE

"baptism" %in% rownames(other_word_vectors)

## [1] TRUE

other_word_vec("baptism") %>% other_closest_to()

##    baptism      start proclaimed      soles      going    promise 
##  1.0000000  0.3560951  0.3351214  0.3187571  0.3125649  0.3047831 
##  overboard       work  treasures    carried 
##  0.2987197  0.2932416  0.2919151  0.2870017

"rebaptism" %in% rownames(by_word_vectors)

## [1] FALSE

"rebaptism" %in% rownames(other_word_vectors)

## [1] FALSE
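A `FALSE` here only says that a word is missing from the pruned vocabulary. It cannot distinguish a word that was never spoken from one that was spoken only a few times. One way to tell the two apart would be to count occurrences in the raw tokens before pruning. The following is a small sketch of that check with invented tokens and a hypothetical helper, `raw_usage()`. On the real data the first argument would be `by_tokens` or `other_tokens`.

```r
# Toy token lists (invented): one character vector per sermon.
toy_tokens <- list(
  c("repent", "and", "be", "baptized"),
  c("baptism", "is", "the", "gate"),
  c("repent", "repent", "baptism")
)

# Hypothetical helper: total occurrences of a term and the number of
# documents that contain it, counted before any pruning.
raw_usage <- function(tokens, term) {
  per_doc <- vapply(tokens, function(doc) sum(doc == term), integer(1))
  c(total = sum(per_doc), documents = sum(per_doc > 0))
}

baptism_usage   <- raw_usage(toy_tokens, "baptism")    # present but rare
rebaptism_usage <- raw_usage(toy_tokens, "rebaptism")  # never occurs
baptism_usage
rebaptism_usage
```

Note that exact matching also treats "baptism", "baptized", and "baptize" as different words, so related forms would need to be checked separately.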

filter_interesting <- function(df, threshold = 0.33) {
  filter(df, abs(df[[2]]) >= threshold |
             abs(df[[3]]) >= threshold |
             df[[1]] %in% names(df)[2:3])
}

I decided to compare some words to each other within each model and then compare the outputs to each other in order to see variations between President Young and the other LDS church leaders.

by_word_sim %>%
  select(word, repent, reform) %>% 
  filter_interesting(threshold = 0.30) %>%
  ggplot(aes(x = repent, y = reform, label = word)) +
  geom_rect(xmin = -0.30, xmax = 0.30, ymin = -0.30, ymax = 0.30,
            fill = "lightgray", alpha = 0.1) +
  geom_point() +
  geom_text_repel() +
  theme_bw() +
  lims(x = c(-1.01, 1.01), y = c(-1.01, 1.01)) +
  labs(title = "Brigham Young: Words related to 'reform' and 'repent'")

other_word_sim %>%
  select(word, repent, reform) %>% 
  filter_interesting(threshold = 0.30) %>%
  ggplot(aes(x = repent, y = reform, label = word)) +
  geom_rect(xmin = -0.30, xmax = 0.30, ymin = -0.30, ymax = 0.30,
            fill = "lightgray", alpha = 0.1) +
  geom_point() +
  geom_text_repel() +
  theme_bw() +
  lims(x = c(-1.01, 1.01), y = c(-1.01, 1.01)) +
  labs(title = "Other LDS Leaders: Words related to 'reform' and 'repent'")

by_word_sim %>%
  select(word, sin, gospel) %>%
  filter_interesting(threshold = 0.33) %>%
  ggplot(aes(x = sin, y = gospel, label = word)) +
  geom_rect(xmin = -0.33, xmax = 0.33, ymin = -0.33, ymax = 0.33,
            fill = "lightgray", alpha = 0.1) +
  geom_point() +
  geom_text_repel() +
  theme_bw() +
  lims(x = c(-1.01, 1.01), y = c(-1.01, 1.01)) +
  labs(title = "Brigham Young: Words related to 'sin' and 'gospel'")

other_word_sim %>%
  select(word, sin, gospel) %>%
  filter_interesting(threshold = 0.33) %>%
  ggplot(aes(x = sin, y = gospel, label = word)) +
  geom_rect(xmin = -0.33, xmax = 0.33, ymin = -0.33, ymax = 0.33,
            fill = "lightgray", alpha = 0.1) +
  geom_point() +
  geom_text_repel() +
  theme_bw() +
  lims(x = c(-1.01, 1.01), y = c(-1.01, 1.01)) +
  labs(title = "Other LDS Leaders: Words related to 'sin' and 'gospel'")
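The plots compare the two models by eye. One rough way to put a number on the difference would be to measure how much the nearest-neighbour lists for a word overlap between the two models. The sketch below does this for “repent”, using the neighbours printed earlier. The Jaccard measure and the helper name `neighbour_overlap()` are my own choices for illustration, not something the analysis above relies on.

```r
# Nearest neighbours of "repent" copied from the output above
# (the query word itself is excluded).
young_repent  <- c("souls", "sins", "happy", "faithful", "devils",
                   "doing", "works", "prepare", "naturally")
others_repent <- c("sins", "forsake", "unless", "flee", "chance",
                   "burden", "both", "throne", "junction")

# Jaccard overlap: shared neighbours divided by all distinct neighbours.
neighbour_overlap <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

shared  <- intersect(young_repent, others_repent)
overlap <- neighbour_overlap(young_repent, others_repent)
shared
overlap
```

Comparing neighbour lists rather than raw vectors also sidesteps the fact that the two models were trained separately, so their coordinate systems are not directly comparable.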

Ultimately, I found some interesting relationships but none that were overwhelmingly strong or of great import. My first impression is that my dataset needs expanding (both in time frame and in source base). In addition, my list of interesting words probably needs some pruning and attention. Also, Jedediah Grant was known as the fiery preacher, yet the dataset contains only a handful of his sermons. What this analysis does show is that there were differences in the rhetoric of Brigham Young and the other church leaders. Recognizing this difference and exploring it further (with a much larger dataset) would prove fruitful for understanding the Mormon Reformation, President Young’s role in it, and even how the rhetoric of each changed as the reformation rose and then dissipated in late 1857.

[1] Peterson, Paul H. “The Mormon Reformation of 1856-1857: The Rhetoric and the Reality.” Journal of Mormon History 15 (January 1, 1989): 59–87.
[2] “Journal of Discourses | Compilation of Early LDS Church Sermons.” Accessed May 5, 2016. https://www.lds.org/topics/journal-of-discourses?lang=eng.

Jordan Bratt