Below are a number of dataframes and analytical visualizations rendered using data from the “ussher” package in R.
More information on the ussher package can be viewed in R using “?ussher”.
To start, the ussher data set is called, and the indexed paragraphs are tokenized by word, producing a tidy structure for NLP analysis. The resulting tables are various abbreviated views of the tokenized data.
ussher
## # A tibble: 6,998 × 8
## # Rowwise:
## Index EventTxt YearB…¹ Epoch BibBk1 AnnoM…² Season JulPer
## <dbl> <chr> <dbl> <chr> <chr> <dbl> <chr> <dbl>
## 1 1 In the beginning God create… -4004 1st … <NA> 1 Autumn 710
## 2 2 On the first day of the w… -4004 1st … Ge 1 Autumn 710
## 3 3 On the second day Monday… -4004 1st … Ge 1 Autumn 710
## 4 4 On the third day Tuesda… -4004 1st … Ge 1 Autumn 710
## 5 5 On the fourth day Wednesda… -4004 1st … <NA> 1 Autumn 710
## 6 6 On the fifth day Thursday … -4004 1st … <NA> 1 Autumn 710
## 7 7 On the sixth day Friday O… -4004 1st … <NA> 1 Autumn 710
## 8 8 Now on the seventh day Sa… -4004 1st … <NA> 1 Autumn 710
## 9 9 After the first week of the… -4004 1st … <NA> 1 Autumn 710
## 10 10 The Devil envied God s hono… -4004 1st … <NA> 1 Autumn 710
## # … with 6,988 more rows, and abbreviated variable names ¹YearBCAD, ²AnnoMund
## # ℹ Use `print(n = ...)` to see more rows
ussh.ind <- ussher
tidy_annals <- ussh.ind %>%
unnest_tokens(word, EventTxt)
# drop NA and blank/whitespace-only tokens before previewing
head(tidy_annals[!is.na(tidy_annals$word) & trimws(tidy_annals$word) != "", ])
## # A tibble: 6 × 8
## # Rowwise:
## Index YearBCAD Epoch BibBk1 AnnoMund Season JulPer word
## <dbl> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 1 -4004 1st Age <NA> 1 Autumn 710 in
## 2 1 -4004 1st Age <NA> 1 Autumn 710 the
## 3 1 -4004 1st Age <NA> 1 Autumn 710 beginning
## 4 1 -4004 1st Age <NA> 1 Autumn 710 god
## 5 1 -4004 1st Age <NA> 1 Autumn 710 created
## 6 1 -4004 1st Age <NA> 1 Autumn 710 the
Bigrams (two-word combinations) can also be tokenized. This first attempt at bigrams retains stop words (such as “in” or “of”) to show the bigrams plainly in succession (“in the”, “the beginning”, etc.).
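The chunk that builds ussher_bigrams is not shown; a minimal sketch of how it would presumably be produced, mirroring the word-level tokenization but with token = "ngrams":
ussher_bigrams <- ussh.ind %>%
  unnest_tokens(bigram, EventTxt, token = "ngrams", n = 2)
head(ussher_bigrams)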
## # A tibble: 6 × 8
## # Rowwise:
## Index YearBCAD Epoch BibBk1 AnnoMund Season JulPer bigram
## <dbl> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 1 -4004 1st Age <NA> 1 Autumn 710 in the
## 2 1 -4004 1st Age <NA> 1 Autumn 710 the beginning
## 3 1 -4004 1st Age <NA> 1 Autumn 710 beginning god
## 4 1 -4004 1st Age <NA> 1 Autumn 710 god created
## 5 1 -4004 1st Age <NA> 1 Autumn 710 created the
## 6 1 -4004 1st Age <NA> 1 Autumn 710 the heaven
Bigrams are then separated, filtered and united to develop variables that can be used in various visualizations and correlation analyses. One of the simplest tables to develop is a count of unique bigrams in the entire text, which becomes very useful upon deeper analysis and visualization.
For example, many of the bigrams that occur more than 200 times happen to be scholarly or source references. Isolated high-frequency bigrams may thus be useful for identifying James Ussher’s primary sources, contributing authors, and indexing- and appendix-related citations, or even for a study of Enlightenment-era scholarship and research practices and conventions. Moreover, this efficient “superindex” can not only cross-reference the location in the original corpus, but can also be compared to dates, Epochs and other features unique to the chronology.
bigrams_separated <- ussher_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ")
bigrams_filtered <- bigrams_separated %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word)
# new bigram counts:
bigram_counts <- bigrams_filtered %>%
count(word1, word2, sort = TRUE)
head(bigram_counts)
## # A tibble: 6 × 3
## word1 word2 n
## <chr> <chr> <int>
## 1 diod sic 525
## 2 foot soldiers 408
## 3 josephus antiq 375
## 4 tacitus annals 169
## 5 velleius paterculus 166
## 6 polyb legat 163
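The high-frequency slice behind the source-reference observation above can be surfaced directly (a sketch using the 200-occurrence threshold mentioned earlier):
bigram_counts %>%
  filter(n > 200)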
It is now useful to join the various slices of bigram information for better analysis of the data within the text. The resulting tables are presented here for continuity.
bigrams_united <- bigrams_filtered %>%
unite(bigram, word1, word2, sep = " ")
count_united <- bigrams_united %>%
add_count(bigram)
head(count_united)
## # A tibble: 6 × 9
## Index YearBCAD Epoch BibBk1 AnnoMund Season JulPer bigram n
## <dbl> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <int>
## 1 1 -4004 1st Age <NA> 1 Autumn 710 beginning god 1
## 2 1 -4004 1st Age <NA> 1 Autumn 710 god created 3
## 3 1 -4004 1st Age <NA> 1 Autumn 710 earth ge 1
## 4 1 -4004 1st Age <NA> 1 Autumn 710 chronology happened 1
## 5 1 -4004 1st Age <NA> 1 Autumn 710 evening preceding 1
## 6 1 -4004 1st Age <NA> 1 Autumn 710 julian calendar 20
The sample visualizations below do not rely on more complex ngrams, but here trigrams are generated as an example of possible further exploration beyond the scope of this report:
ussher_trigrams <- ussh.ind %>%
unnest_tokens(trigram, EventTxt, token = "ngrams", n = 3) %>%
separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
filter(!word1 %in% stop_words$word,
!word2 %in% stop_words$word,
!word3 %in% stop_words$word) %>%
count(word1, word2, word3, sort = TRUE)
head(ussher_trigrams)
## # A tibble: 6 × 4
## word1 word2 word3 n
## <chr> <chr> <chr> <int>
## 1 ad attic epist 93
## 2 cicero ad attic 79
## 3 appian civil war 59
## 4 caesar civil war 45
## 5 hirtius de bell 39
## 6 de bell alexandrin 32
As can be seen in some of the terms above, interesting references appear. “ad attic epist” is a scholarly reference to Cicero’s letters to Atticus, the Epistulae ad Atticum. That this trigram appears in the corpus 93 times indicates the source’s prominence and would make for an interesting review of appearances by date.
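As a sketch of such a review, the trigram’s appearances can be counted by Epoch and year:
ussh.ind %>%
  unnest_tokens(trigram, EventTxt, token = "ngrams", n = 3) %>%
  filter(trigram == "ad attic epist") %>%
  count(Epoch, YearBCAD, sort = TRUE)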
Once a bigram dataframe is established, it can be filtered. In the example below, the filtered table counts the various first words paired with the second word “son.” 96 different bigram combinations are identified by the Epoch in which they appear.
bigrams_filtered %>%
filter(word2 == "son") %>%
count(Epoch, word1, sort = FALSE)
## # A tibble: 96 × 3
## Epoch word1 n
## <chr> <chr> <int>
## 1 1st Age st 1
## 2 3rd Age begotten 1
## 3 3rd Age promised 1
## 4 4th Age jair 1
## 5 4th Age manasseh 1
## 6 4th Age naphtali 1
## 7 4th Age semiramis 1
## 8 5th Age baruch 1
## 9 5th Age cambyses 1
## 10 5th Age firstborn 1
## # … with 86 more rows
## # ℹ Use `print(n = ...)` to see more rows
In NLP, term frequency (tf) and inverse document frequency (idf) can be combined into a unified measure known as tf-idf. This provides a statistical method for estimating the impact of certain bigrams on the corpus containing them. We can then compare different segments of the chronology (in this case, segments based on the variable “Epoch,” one of Ussher’s designated “Seven Ages of the World”). In this way, an analyst can take a quick look at the most impactful bigrams of each Age and begin to approximate an initial “distant reading” of those segments.
Very basically, tf measures how frequently a bigram occurs within a segment, while idf puts more weight on bigrams that are particularly unique to a given Epoch. tf-idf is the product of these counterbalancing measures and helps surface thematic elements unique to each distinct Epoch.
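Concretely, under tidytext’s conventions (which bind_tf_idf() below follows), tf is a bigram’s share of all bigrams in its Epoch, and idf is the natural log of the total number of Epochs divided by the number of Epochs containing the bigram. A hand-rolled sketch of the same computation:
n_epochs <- n_distinct(bigrams_united$Epoch)
manual_tf_idf <- bigrams_united %>%
  count(Epoch, bigram) %>%
  group_by(Epoch) %>%
  mutate(tf = n / sum(n)) %>%            # share of the Epoch's bigrams
  group_by(bigram) %>%
  mutate(idf = log(n_epochs / n())) %>%  # rarer across Epochs = higher weight
  ungroup() %>%
  mutate(tf_idf = tf * idf) %>%
  arrange(desc(tf_idf))
This matches the idf values in the output below: a bigram confined to one of the seven Epochs gets ln(7/1) ≈ 1.95, while one appearing in two Epochs gets ln(7/2) ≈ 1.25.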
bigram_tf_idf <- bigrams_united %>%
count(Epoch, bigram) %>%
bind_tf_idf(bigram, Epoch, n) %>%
arrange(desc(tf_idf))
head(bigram_tf_idf)
## # A tibble: 6 × 6
## Epoch bigram n tf idf tf_idf
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 1st Age adam died 7 0.0407 1.95 0.0792
## 2 2nd Age noah died 5 0.0333 1.25 0.0418
## 3 1st Age friday september 3 0.0174 1.95 0.0339
## 4 1st Age god created 3 0.0174 1.95 0.0339
## 5 1st Age living creatures 4 0.0233 1.25 0.0291
## 6 7th Age tacitus annals 158 0.0210 1.25 0.0263
Each Epoch in Ussher’s chronology is distinguished by different tf-idf factors. A combination of these factors can function as a sort of a fingerprint of that particular section of the chronology.
The 5 strongest tf-idf bigrams by Epoch are visualized here.
library(forcats)
bigram_tf_idf %>%
group_by(Epoch) %>%
slice_max(tf_idf, n = 5) %>%
ungroup() %>%
ggplot(aes(tf_idf, fct_reorder(bigram, tf_idf), fill = Epoch)) +
geom_col(show.legend = FALSE) +
facet_wrap(~Epoch, ncol = 2, scales = "free") +
labs(x = "tf-idf", y = NULL)
bigrams_separated %>%
filter(word1 == "not") %>%
count(word1, word2, sort = TRUE)
## # A tibble: 792 × 3
## word1 word2 n
## <chr> <chr> <int>
## 1 not to 169
## 2 not be 109
## 3 not know 72
## 4 not only 60
## 5 not so 49
## 6 not go 47
## 7 not far 46
## 8 not have 46
## 9 not yet 44
## 10 not allow 43
## # … with 782 more rows
## # ℹ Use `print(n = ...)` to see more rows
In social media and many other text analysis settings, “sentiment analysis” is often thought of as the primary objective of NLP models. Sentiment is typically measured statistically by assigning negative or positive numeric weights to lists of words. For example, if Amazon wanted to detect “troll” 1-star reviews, it could run sentiment analysis on all 1-star reviews and determine which ones are disproportionately neutral or even positive, indicating that a substantively negative review was not actually written, despite the strong negative response implied by the single star.
Other works, including medical, technical and historical texts, tend to rely less on sentiment analysis for determining contextual meaning or detecting truth. There are other uses for sentiment analysis, however. One overlooked use is auditing governance and process in medical, legal, technical and historical documents by examining negation bigrams, i.e. bigrams whose first term is a negation word (in this example, the word “not”).
In the sentiment visualization below, a cursory examination reveals a few points of note: The strongest negative sentiment “not” pair is “not allow”, while the strongest positive sentiment “not” pair is “not kill.” This very limited examination rings true for a chronology that heavily covers the records and affairs of lawmakers, kings and social conflict.
library(textdata)
AFINN <- get_sentiments("afinn")
not_words <- bigrams_separated %>%
filter(word1 == "not") %>%
inner_join(AFINN, by = c(word2 = "word")) %>%
count(word2, value, sort = TRUE)
head(not_words)
## # A tibble: 6 × 3
## word2 value n
## <chr> <dbl> <int>
## 1 allow 1 43
## 2 want 1 39
## 3 like 2 20
## 4 agree 1 18
## 5 fight -1 16
## 6 stop -1 16
library(ggplot2)
not_words %>%
  mutate(contribution = n * value) %>%
  arrange(desc(abs(contribution))) %>%
  head(20) %>%
  mutate(word2 = reorder(word2, contribution)) %>%
  # sign is flipped so that, e.g., "not allow" plots as a negative contribution
  ggplot(aes(contribution * -1, word2, fill = contribution < 0)) +
  geom_col(show.legend = FALSE) +
  labs(x = "Sentiment value * number of occurrences",
       y = "Words preceded by \"not\"")
Here, frequently occurring single words contained in the 1st Age portion of the chronology are correlated. When computing correlations on words, it is important to slice the data to get a significant number of occurrences without creating too large a correlation matrix. In this example, words had to occur more than ten times within the 1st Age; the remaining word pairs are then positively or negatively correlated.
ussher_index_words <- ussh.ind %>%
filter(Epoch == "1st Age") %>%
filter(Index > 0) %>%
unnest_tokens(word, EventTxt) %>%
filter(!word %in% stop_words$word)
word_pairs <- ussher_index_words %>%
pairwise_count(word, Index, sort = TRUE)
head(word_pairs)
## # A tibble: 6 × 3
## item1 item2 n
## <chr> <chr> <dbl>
## 1 adam ge 14
## 2 ge adam 14
## 3 ge god 11
## 4 god ge 11
## 5 day ge 11
## 6 born ge 11
word_cors <- ussher_index_words %>%
group_by(word) %>%
filter(n() >= 10) %>%
pairwise_cor(word, Index, sort = TRUE) %>%
ungroup()
head(word_cors)
## # A tibble: 6 × 3
## item1 item2 correlation
## <chr> <chr> <dbl>
## 1 world god 0.625
## 2 god world 0.625
## 3 day earth 0.480
## 4 earth day 0.480
## 5 earth god 0.414
## 6 god earth 0.414
word_cors %>%
filter(item1 == "god")
## # A tibble: 6 × 3
## item1 item2 correlation
## <chr> <chr> <dbl>
## 1 god world 0.625
## 2 god earth 0.414
## 3 god day 0.354
## 4 god adam -0.0423
## 5 god ge -0.125
## 6 god born -0.350
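These correlations can be sanity-checked by hand: pairwise_cor() computes the phi coefficient, which is simply the Pearson correlation of 0/1 presence indicators. A minimal sketch (assuming tidyr’s pivot_wider() is available):
presence <- ussher_index_words %>%
  group_by(word) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  distinct(Index, word) %>%
  mutate(present = 1L) %>%
  pivot_wider(names_from = word, values_from = present, values_fill = 0L)
cor(presence$god, presence$world)  # should reproduce ~0.625 from word_cors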
library(ggcorrplot)
# Computing the correlation matrix
correlation_matrix <- xtabs(correlation ~ ., word_cors)
correlation_matrix
## item2
## item1 adam born day earth ge god
## adam 0.00000000 -0.14414999 -0.33793249 -0.14056338 0.19102329 -0.04225771
## born -0.14414999 0.00000000 -0.52277330 -0.17729434 0.26726124 -0.35025832
## day -0.33793249 -0.52277330 0.00000000 0.47997460 -0.36675724 0.35387166
## earth -0.14056338 -0.17729434 0.47997460 0.00000000 -0.31137996 0.41409498
## ge 0.19102329 0.26726124 -0.36675724 -0.31137996 0.00000000 -0.12535663
## god -0.04225771 -0.35025832 0.35387166 0.41409498 -0.12535663 0.00000000
## world -0.09507985 -0.29315098 0.28908807 0.09334108 -0.17094086 0.62500000
## item2
## item1 world
## adam -0.09507985
## born -0.29315098
## day 0.28908807
## earth 0.09334108
## ge -0.17094086
## god 0.62500000
## world 0.00000000
# Visualizing the correlation matrix using square and circle methods
ggcorrplot(correlation_matrix, method = "square")
ggcorrplot(correlation_matrix, method = "circle")
Multi-dimensional referencing of terms in ussher relies on correlation, dating and date categories (“Epochs”), and n-gram relationships. This referencing opens up a host of opportunities for data-oriented “distant reading” and analysis of a text.
Here are a few methods:
annals.count <- tidy_annals %>%
anti_join(stop_words) %>%
count(Epoch, word, sort = TRUE)
## Joining, by = "word"
epochs_dtm <- annals.count %>%
  cast_dtm(Epoch, word, n)
head(annals.count)
## # A tibble: 6 × 3
## # Rowwise:
## Epoch word n
## <chr> <chr> <int>
## 1 6th Age king 2135
## 2 6th Age alexander 1492
## 3 6th Age army 1486
## 4 6th Age city 1240
## 5 6th Age soldiers 1173
## 6 6th Age war 1132
library(topicmodels)  # provides LDA() and perplexity()
epochs_lda <- LDA(epochs_dtm, k = 10, control = list(seed = 1234))
epochs_lda
## A LDA_VEM topic model with 10 topics.
epochs_topics <- tidy(epochs_lda, matrix = "beta")
head(epochs_topics)
## # A tibble: 6 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 king 0.0193
## 2 2 king 0.0180
## 3 3 king 0.00857
## 4 4 king 0.00183
## 5 5 king 0.000931
## 6 6 king 0.0170
Topics can be explored by grouping top terms by epoch and visualizing them. Here topics are clustered into 10 groupings (a number selected somewhat arbitrarily, for variety; preliminary cluster analysis could narrow or expand the number of clusters more rigorously).
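One way such a preliminary check could look is a perplexity sweep over candidate values of k, an elbow-style heuristic; the k values below are illustrative:
ks <- c(2, 4, 7, 10)
perp <- sapply(ks, function(k) {
  fit <- LDA(epochs_dtm, k = k, control = list(seed = 1234))
  perplexity(fit, epochs_dtm)   # lower is better; look for the bend
})
plot(ks, perp, type = "b", xlab = "number of topics (k)", ylab = "perplexity")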
The position of a term within the top five makes a difference in subjectively evaluating the respective topic profiles.
top_terms <- epochs_topics %>%
group_by(topic) %>%
slice_max(beta, n = 5) %>%
ungroup() %>%
arrange(topic, -beta)
head(top_terms)
## # A tibble: 6 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 king 0.0193
## 2 1 josephus 0.0169
## 3 1 time 0.0140
## 4 1 city 0.0128
## 5 1 son 0.0112
## 6 2 king 0.0180
top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered()
Here LDA is Latent Dirichlet Allocation, an unsupervised topic-modeling algorithm (not to be confused with Linear Discriminant Analysis): it treats each document as a mixture of topics and each topic as a mixture of words.
This means that a gamma matrix can be used to illustrate how the different topics are distributed within different documents, or, in this case, across the different Epochs.
A number of observations can be made from visualizing the gamma matrix in this case:
Low-level cross-topic sharing is to be expected in a single unified corpus. If this method were used to compare different corpora, such as Ussher’s Chronology and Sun Tzu’s Art of War, one would expect significantly less cross-topic sharing. Moreover, the number of topic divisions selected at the start makes a difference, which is why, in full practice, an elbow, silhouette or other cluster-validation method is critical.
The first 3 Epochs share a single, identical topic: Topic 9 (genesis-god-egypt-day-jacob). The 4th and 5th Ages share a topic (king-god-son-time-people), while the 6th and 7th Ages each have more distinct, multi-topic profiles. Unique topic elements can provide leads for further distinguishing these profiles.
It makes sense, subjectively, that a topic emerging from such a chronology would include the figure of “jesus,” but that such a topic would not appear until the 7th Age of the World. Likewise, a topic including Plutarch would be expected only during the 6th Age.
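These subjective observations can be spot-checked against the beta matrix, assuming the tokens “jesus” and “plutarch” appear in the vocabulary:
epochs_topics %>%
  filter(term %in% c("jesus", "plutarch")) %>%
  arrange(term, desc(beta))   # which topics weight each figure most heavily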
epoch_gamma <- tidy(epochs_lda, matrix = "gamma")
head(epoch_gamma)
## # A tibble: 6 × 3
## document topic gamma
## <chr> <int> <dbl>
## 1 6th Age 1 0.000162
## 2 7th Age 1 0.353
## 3 5th Age 1 0.00000302
## 4 4th Age 1 0.00000453
## 5 3rd Age 1 0.0000107
## 6 1st Age 1 0.0000521
epoch_gamma %>%
  ggplot(aes(factor(topic), gamma)) +
  geom_boxplot() +
  facet_wrap(~ document) +
  labs(x = "topic", y = expression(gamma)) +
  ggtitle("Association of Topics by Epoch")
epoch_classifications <- epoch_gamma %>%
group_by(document) %>%
slice_max(gamma) %>%
ungroup()
epoch_classifications
## # A tibble: 7 × 3
## document topic gamma
## <chr> <int> <dbl>
## 1 1st Age 9 1.00
## 2 2nd Age 9 0.999
## 3 3rd Age 9 1.00
## 4 4th Age 6 1.00
## 5 5th Age 6 1.00
## 6 6th Age 10 0.277
## 7 7th Age 5 0.647
epoch_topics <- epoch_classifications %>%
count(document, topic) %>%
group_by(document) %>%
slice_max(n, n = 1) %>%
ungroup() %>%
transmute(consensus = document, topic)
epoch_classifications %>%
inner_join(epoch_topics, by = "topic") %>%
filter(document != consensus)
## # A tibble: 8 × 4
## document topic gamma consensus
## <chr> <int> <dbl> <chr>
## 1 1st Age 9 1.00 2nd Age
## 2 1st Age 9 1.00 3rd Age
## 3 2nd Age 9 0.999 1st Age
## 4 2nd Age 9 0.999 3rd Age
## 5 3rd Age 9 1.00 1st Age
## 6 3rd Age 9 1.00 2nd Age
## 7 4th Age 6 1.00 5th Age
## 8 5th Age 6 1.00 4th Age
library(scales)
assignments <- augment(epochs_lda, data = epochs_dtm)
assignments <- assignments %>%
inner_join(epoch_topics, by = c(".topic" = "topic"))
head(assignments)
## # A tibble: 6 × 5
## document term count .topic consensus
## <chr> <chr> <dbl> <dbl> <chr>
## 1 5th Age king 258 6 4th Age
## 2 5th Age king 258 6 5th Age
## 3 4th Age king 68 6 4th Age
## 4 4th Age king 68 6 5th Age
## 5 3rd Age king 21 9 1st Age
## 6 3rd Age king 21 9 2nd Age
assignments %>%
count(document, consensus, wt = count) %>%
mutate(across(c(document, consensus), ~str_wrap(., 20))) %>%
group_by(document) %>%
mutate(percent = n / sum(n)) %>%
ggplot(aes(consensus, document, fill = percent)) +
geom_tile() +
scale_fill_gradient2(high = "darkred", label = percent_format()) +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
panel.grid = element_blank()) +
labs(x = "Epoch words were assigned to",
y = "Epoch words came from",
fill = "% of assignments")
Using a different kind of correlation matrix, the above observations can be visualized more efficiently. Here it can be seen that the first three Epochs share the same topic, as do Epochs 4 and 5, while Epochs 6 and 7 each have their own unique fingerprints, sharing nothing in common with the other Ages.
Two novel interactive visualizations have been developed to indicate possible new frontiers in NLP-based distant-reading analytics using ngrams.
These examine the counts of frequently occurring bigrams and the years, seasons and Ages in which those bigrams occur. For the purposes of this visualization, bigrams with no associated season are excluded, but they could easily be included as a fifth season category if necessary (see the sketch below).
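If that fifth category were desired, a minimal sketch would be to recode the missing seasons before plotting rather than filtering them out:
count_united %>%
  mutate(Season = ifelse(is.na(Season), "No Season", Season)) %>%
  count(Season)   # NA rows become an explicit fifth category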
The first examines the seasonal distribution of various bigrams. One quickly notices that, in the First Age, Summer is by far the season most frequently associated with bigrams. Hovering over specific “x”s marking unique bigrams reveals that births and deaths in the First Age are associated with Summer, while “yearly fast” shows up in Autumn. No seasonal bigrams appear in Winter or Spring in the 1st Age.
The second visualization lends itself to tracking specific bigram use over time: bigrams are stratified by count, and aside from a few very common counts (such as 2), each bigram has a fairly unique count profile.
This means the analyst can zoom in on specific date ranges or positions in the visualization and review the qualities of a given bigram. For example, a very high-frequency bigram near the top of the visualization is “foot soldiers.” Hovering across its instances shows that “foot soldiers” is typically associated with the winter months and only begins to appear as a seasonal bigram in the 6th Age, in the Autumn of 538 BC.
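The rows behind that hover-based observation can also be pulled directly, as a sketch:
count_united %>%
  filter(bigram == "foot soldiers", !is.na(Season)) %>%
  arrange(YearBCAD) %>%
  head()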
This observation might be of “distant reading” use to researchers. Historical hypotheses are beyond the scope and expertise of this exercise, but analytical questions that may inform professional inquiry abound.
The Ussher data set can be further manipulated and modeled to pursue questions like these, and thousands more, toward data-centered analytical conclusions, thus providing tools and techniques for applying NLP model theory to novel pathways of analysis.
Finally, a Shiny app has been developed that illustrates how the NLP-driven “distant reading” analytical process can be made more interactive.
https://datascinet.shinyapps.io/UssherXplore/
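The deployed app’s source is not reproduced here, but a minimal Shiny skeleton wrapping the seasonal plotly view below might look like the following (input names and layout are illustrative, not the actual app code):
library(shiny)
library(plotly)
ui <- fluidPage(
  selectInput("epoch", "Epoch", choices = sort(unique(count_united$Epoch))),
  plotlyOutput("bigramPlot")
)
server <- function(input, output) {
  output$bigramPlot <- renderPlotly({
    d <- dplyr::filter(count_united, Epoch == input$epoch, !is.na(Season))
    p <- ggplot(d, aes(YearBCAD, Season, size = n, bigram = bigram)) +
      geom_point(shape = 4)
    ggplotly(p, tooltip = c("bigram", "YearBCAD", "n"))
  })
}
shinyApp(ui, server)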
count_united <- count_united %>%
  filter(n > 2) %>%
  filter(n < 500) %>%
  filter(!is.na(Season))
# bigram, Epoch and AnnoMund ride along as extra aesthetics so that
# ggplotly() below can surface them in the tooltip
usshplot <- ggplot(count_united,
                   aes(YearBCAD, Season, color = Epoch, size = n,
                       bigram = bigram, Epoch = Epoch, AnnoMund = AnnoMund)) +
  geom_point(shape = 4, alpha = 1) +
  xlab("Year BC or AD") + ylab("Season of Bigram Appearance") +
  ggtitle("Bigram Distribution by Season") +
  labs(color = "Epoch")
library(plotly)
ggplotly(usshplot,tooltip= c("bigram","YearBCAD","Epoch","AnnoMund","n"))
linegraph <- ggplot(count_united,
                    aes(YearBCAD, n, group = bigram, color = Season,
                        bigram = bigram, Epoch = Epoch, AnnoMund = AnnoMund)) +
  geom_line() +
  geom_point(shape = 1, alpha = 1) +
  xlab("Year BC or AD") + ylab("Number of Bigram Appearances in Entire Chronology") +
  ggtitle("Bigram Distribution by Count") +
  labs(color = "Season")
ggplotly(linegraph,tooltip= c("bigram","YearBCAD","Epoch","AnnoMund","n","Season"))