0.1 1. Project Summary and Goals

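This project explores the earliest century of English printed books, the 1400s, using a novel combination of generative AI, classical data science, and historical inquiry. Drawing from the Early English Books Online (EEBO) corpus, our goal is to surface new insights from texts published between 1400 and 1499, a relatively quiet but foundational period in the history of print. In practice, this means the final decades of the century: the first book printed in English appeared in the early 1470s, and William Caxton set up England's first press at Westminster in 1476.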

While the 1400s represent a small fraction of the overall EEBO collection, these texts are invaluable: they capture a moment when the written word began shifting from manuscript culture to the early mechanics of the printing press. By applying large language models (LLMs) and automated text analysis, we aim to make this underexplored slice of literary history more searchable, quantifiable, and meaningful.

0.1.1 Core Goals:

  • Digitally analyze and summarize all available books from the 1400s in the EEBO collection
  • Translate early English texts into modern English using LLMs
  • Extract high-level themes, word trends, and topics from translated content
  • Generate visual insights into the people, places, and ideas that dominated the printed landscape
  • Demonstrate a scalable methodology for using GenAI in historical research

This work serves as both a case study and a proof-of-concept: showing how modern AI tools can illuminate pre-modern texts and make the distant past more legible, searchable, and surprising.

0.2 2. Methodology

We focus on the earliest printed books in the EEBO corpus — those published between 1400 and 1499. Our approach combines large language models (LLMs), classical text mining, and data visualization to extract meaning from texts that are often linguistically and structurally challenging.

0.2.1 Data Preparation

We filter the EEBO dataset to books with publication dates from 1400 to 1499 (408 titles). We then drop entries whose text field holds only the placeholder “No Text” or is otherwise near-empty. The resulting 128 books are the transcribed portion of EEBO's 15th-century holdings, not the complete print output of the century. Residual OCR noise is left for the translation step to handle.

0.2.2 Translation via LLMs

We use API-connected large language models to translate both titles and full texts into modern English. The prompts instruct the model to preserve meaning and historical terms while modernizing archaic syntax and ignoring OCR artifacts. We run translations in batches, log errors, and save intermediate results, so that failed calls can be retried without redoing completed work.
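The batching, logging, and checkpointing described above might look like the following minimal sketch. It assumes a hypothetical `translate_fn()` that wraps the real API call; the Portkey request itself is not shown in this report. The helper name `translate_in_batches()` is also illustrative. A stub translator stands in for the model, so the example runs offline.

```r
# Minimal sketch: translate texts in batches, log failures, and checkpoint progress.
# `translate_fn` is a hypothetical stand-in for the real LLM API call.
translate_in_batches <- function(df, translate_fn, batch_size = 5,
                                 checkpoint = tempfile(fileext = ".csv")) {
  df$modern_text <- NA_character_
  errors <- data.frame(id = character(0), message = character(0),
                       stringsAsFactors = FALSE)
  row_ids <- seq_len(nrow(df))
  batches <- split(row_ids, ceiling(row_ids / batch_size))

  for (rows in batches) {
    for (i in rows) {
      result <- tryCatch(translate_fn(df$text[i]), error = function(e) e)
      if (inherits(result, "error")) {
        # Log the failure and move on; the row stays NA so it can be retried later
        errors <- rbind(errors, data.frame(id = as.character(df$id[i]),
                                           message = conditionMessage(result),
                                           stringsAsFactors = FALSE))
      } else {
        df$modern_text[i] <- result
      }
    }
    # Save intermediate results after every batch
    write.csv(df, checkpoint, row.names = FALSE)
  }
  list(data = df, errors = errors, checkpoint = checkpoint)
}

# Usage with a stub translator that fails on one input (no API call is made)
fake_translate <- function(x) {
  if (grepl("fail", x)) stop("simulated API timeout")
  toupper(x)
}
books <- data.frame(id = 1:3, text = c("ye olde booke", "fail here", "whanne"),
                    stringsAsFactors = FALSE)
out <- translate_in_batches(books, fake_translate, batch_size = 2)
out$data$modern_text
out$errors
```

Writing the checkpoint after every batch means a crash or rate-limit error costs at most one batch of work. Rows left as `NA` can be re-submitted later.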

0.2.3 Text Cleaning and Structuring

After translation, we clean the texts to remove formatting noise and tokenize the data for analysis. We apply filters for stopwords, word length, and character types, using tidytext in R for structured outputs.
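The three filters can be illustrated on a handful of tokens. This base-R sketch mirrors the conditions used in the word-count code later in this report: letters only (`^[a-z]+$`), more than three characters, and not a stopword. The tokens are made up, and the short `stopwords_demo` vector is a hypothetical stand-in for tidytext's `stop_words` table.

```r
# Hypothetical tokens, lowercased as unnest_tokens() would emit them
tokens <- c("the", "lord", "xxxi", "1498", "king's", "of", "mercy", "ye")
stopwords_demo <- c("the", "of", "and")  # stand-in for tidytext::stop_words

is_alpha <- grepl("^[a-z]+$", tokens)      # character types: letters only
is_long  <- nchar(tokens) > 3              # word length: four or more characters
not_stop <- !(tokens %in% stopwords_demo)  # stopword removal

kept <- tokens[is_alpha & is_long & not_stop]
kept
```

Two side effects are worth knowing. Roman numerals such as `xxxi` pass every filter. A possessive such as `king's` is dropped entirely rather than reduced to `king`.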

0.2.4 Analysis and Visualization

We analyze trends in word frequency, themes, and named entities. Topic models and n-gram patterns help us trace how certain concepts evolved. We visualize the findings using bar plots, word clouds, and time-based comparisons.
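The n-gram step can be sketched in base R as below. In the tidy pipeline, the same counts would come from `unnest_tokens(bigram, modern_text, token = "ngrams", n = 2)` followed by `count()`. The helper name `count_bigrams()` and the sample sentence are illustrative.

```r
# Hypothetical helper: count adjacent word pairs (bigrams) in one text
count_bigrams <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]  # split on anything that is not a letter
  words <- words[nzchar(words)]
  if (length(words) < 2) return(integer(0))
  bigrams <- paste(head(words, -1), tail(words, -1))  # word i followed by word i + 1
  tab <- table(bigrams)
  sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
}

bigram_counts <- count_bigrams("Lord have mercy. Christ have mercy. Lord have mercy.")
bigram_counts  # "have mercy" appears 3 times, "lord have" twice
```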

0.2.5 Tools Used

Our pipeline combines:

  • LLMs (via Portkey API)
  • R (tidyverse, tidytext, ggplot2)
  • Python (for optional NER and parsing)

This hybrid workflow allows us to explore the intellectual DNA of the 1400s and build scalable methods for future historical analysis.

0.3 EDA

Before diving into deeper interpretation, we perform an exploratory analysis of the translated 1400s EEBO texts to better understand their structure, content, and thematic focus.

0.3.1 Sample writings

library(tidyverse)   # readr, dplyr, stringr, ggplot2
library(tidytext)    # unnest_tokens(), stop_words, bind_tf_idf()
library(knitr)
library(kableExtra)

df <- read_csv("df_full_id_year.csv")

set.seed(12)
df_1400 <- df %>%
  select(id, year, title = Title, text = Text) %>%
  filter(year >= 1400, year < 1500)

df_1400 %>%
  slice_sample(n = 15) %>%
  mutate(across(where(is.character), ~ substr(., 1, 800))) %>%  # truncate long fields to 800 characters
  kable() %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
| id | year | title | text |
|---|---|---|---|
| 132123 | 1485 | I shal disceyue them by sum mene … | No Text |
| 132112 | 1485 | Vulgaria q[ue]da[m] abs There[n]cio in Anglicam linguam t[ra]ducta | No Text |
| 93563 | 1482 | [T]henne followyng this for Breton booke of Prolicronycon … | No Text |
| 132124 | 1485 | Thinges used ye may go surely em[on]g the peple … | No Text |
| 70992 | 1494 | [Here begynneth the table of a book entytled the book of good maners.] | No Text |
| 21283 | 1495 | KL Ianuarius habet dies .xxxi. Luna .xxx | No Text |
| 15475 | 1496 | Tenir per seruice de chiualer, est a tenir per homage foi alte … | Page [unnumbered] Page [unnumbered] Page [unnumbered] Dieu et mon droit Page [unnumbered] TEnir per seruice de chiualer: est a tenir per homage foi alte & estuage & tret a suy garder mariage & relif{is} ¶Et nota {qu} seruice de chiualer est seruice de terre ou de tene¦ment pour armes porter en guerre en defence du roial¦me. & doit garder mariage & relief {per} la raison que nul est able ne depoiar et ne puit auer conusaunce darmes porter auaunt {qu} il soit dage de .xxi. ans. Et pour taunt {qu} le seignour ne perdera ceo {qu} de droit doit auer & {qu} la poiar de la roialme de rien ne soit enfeble. ¶Le roy voet per cause de sountendre age que soun seignour luy auera en sa gard tan{que} al pleyn age de luy: cest assauoir .xxi. ans. tEnir per graunt sergeanti est si come vn home tient certei |
| 59659 | 1498 | Manipulus curatorum | No Text |
| 28047 | 1498 | In dei nomine amen nouerint vniversi cristifideles qualiter santissimi dominus felicis re | Page [unnumbered] In dei nomine Amen Nouerint vniuer•• cristifideles qualiter S•••••ssim{us} dominus noster felicis re¦cordacionis Innocentib{us} papa octau{us} co•••ssit de speciali preuilegio & gracis vt anima{rum} illo•um qui cum Caritate ab hac luce decesserint salus procuretur quod si qui parentes amici aut alii xpristifideles pietate comoti cuiusuis nacionib{us} et prouincie & vbicunq{que} fuerint. ac vbicū{que} degāt vicesimam partem vnius ducati pro anima vniuscuius{que} sic defuncti dederint aut miserint pro reedificatione hospitalis maioris apud sanctum Iacobum in cō¦postella nec non pro duarum capellarum in dicto hospitali fundacione quarum vna viris alia mulieribus tam dan∣tes & mittētes {quam} defuncti predicti In omnibus Suffragiis Precibus & Elimosinis. Ieiuniis. Oracion |
| 22927 | 1481 | Incipit tabula h[uius] libri | No Text |
| 143714 | 1489 | Lere [sic] endeth the book named the dictes or sayngis of the philosophers … | No Text |
| 54606 | 1498 | [A prognostication for 1498] | No Text |
| 110672 | 1499 | Incipit liber qui vocatur festiualis | No Text |
| 43803 | 1483 | Explanationes notabiles deuotissimi viri Richardi Hampole heremite sup[er] lectio[n]es illas Beati Iob … | No Text |
| 42336 | 1476 | Hic incipit paruus Catho | Page [unnumbered] Page [unnumbered] Page [unnumbered] Page [unnumbered] Hic Incipit paruus Catho •Vm aīduerterem quā hoīes grauiter errare Whanne I aduerte to my remembrance And see how sele folkes erren greuously In the wey of vertuouse gouernance I haue supposed in myn hert that I Oughte to supporte and conseille prudently Them to be vertuous in lyuing And how they shal them self in hōnour bring Igitur fili carissime docebo te quo pacto mores Therfore my leue child I shal now telle the Herken me wele the maner and the guyse How thi sowle inward shal acqueinted be With thewes good and vertues in al wyse Rede and conceiue for he is to dispise That redith ay and wote not what it ment Suche reding is no thing but wynde myspent Ita{que} deo supplica / parentes ama Pray thy god & preise hy |

0.3.2 Total Number of books in EEBO

total_books <- df %>% summarise(n = n()) %>% pull(n)
paste("Total number of books:", total_books)
[1] "Total number of books: 146323"
total_books_1400 <- df_1400 %>% summarise(n = n()) %>% pull(n)
paste("Total number of books from 1400s:", total_books_1400)
[1] "Total number of books from 1400s: 408"

Only 408 of the 146,323 books in the collection (about 0.28%) date from the 1400s.

0.6 Insights into Book Lengths

df_1400_summary <- df_1400 %>%
  mutate(
    title_char_count = nchar(title),
    title_word_count = str_count(title, "\\S+"),
    text_char_count = nchar(text),
    text_word_count = str_count(text, "\\S+")
  )

df_1400_summary %>%
  summarise(
    avg_title_words = mean(title_word_count, na.rm = TRUE),
    avg_text_words = mean(text_word_count, na.rm = TRUE),
    median_text_words = median(text_word_count, na.rm = TRUE),
    min_text_words = min(text_word_count, na.rm = TRUE),
    max_text_words = max(text_word_count, na.rm = TRUE),
    total_books = n()
  ) %>%
  kable(
    caption = "Summary Statistics for 1400s EEBO Books",
    digits = 1,  # rounds to 1 decimal place
    format = "markdown" 
  )
Summary Statistics for 1400s EEBO Books

| avg_title_words | avg_text_words | median_text_words | min_text_words | max_text_words | total_books |
|---:|---:|---:|---:|---:|---:|
| 12.6 | 18220.2 | 2 | 2 | 694182 | 408 |

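As the sample above showed, many of the books have “No Text” in their text column. This can happen for several reasons; typically the title is recorded, but the EEBO collection does not include the book's full text. Because the placeholder “No Text” counts as two words, these entries pull the median and minimum word counts down to 2. As a result, the mean of about 18,220 words says little about a typical book.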

0.7 Removing No Text

Next, we narrow our analysis to books that contain actual textual content, excluding entries labeled “No Text” or with two words or fewer. This leaves 128 of the 408 books; the remaining 280 have no usable text.

df_1400_text <- df_1400_summary %>%
  filter(text != "No Text", text_word_count > 2)  # drop placeholders and near-empty entries

df_1400_text %>%
  summarise(
    avg_title_words = mean(title_word_count, na.rm = TRUE),
    avg_text_words = mean(text_word_count, na.rm = TRUE),
    median_text_words = median(text_word_count, na.rm = TRUE),
    min_text_words = min(text_word_count, na.rm = TRUE),
    max_text_words = max(text_word_count, na.rm = TRUE),
    total_books = n()
  )  %>%
  kable(
    caption = "Summary Statistics for 1400s EEBO Books with Text",
    digits = 1,  # rounds to 1 decimal place
    format = "markdown" 
  )
Summary Statistics for 1400s EEBO Books with Text

| avg_title_words | avg_text_words | median_text_words | min_text_words | max_text_words | total_books |
|---:|---:|---:|---:|---:|---:|
| 17.9 | 58072.5 | 19613.5 | 13 | 694182 | 128 |

0.8 Translation via LLM API

To make the content of 15th-century books accessible for modern research and analysis, we use Large Language Models (LLMs) to translate both titles and full texts into contemporary English.

The process included several key steps:

  • Translation via API: Using a batch-processing pipeline, we passed each book’s original title and text through an LLM, producing two new fields:

    • modern_title: a plain English version of the original title
    • modern_text: the translated full text, preserving sentence structure and meaning
  • Noise tolerance: The prompt was crafted to be tolerant of OCR noise and early-modern spellings, instructing the model to focus on semantic clarity while ignoring non-linguistic artifacts.

After filtering, only books with substantive content were included in the final translated dataset.

The result is a modernized corpus of early printed texts that can now be used for downstream research, including topic modeling, keyword extraction, and content-based historical analysis.
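One way to build the prompt and split the model's reply into `modern_title` and `modern_text` is sketched below. The prompt wording, the `TITLE:`/`TEXT:` reply format, and both function names are assumptions for illustration. They are not the exact prompt used in this project.

```r
# Hypothetical prompt template; the wording and reply format are assumptions
build_translation_prompt <- function(title, text) {
  paste0(
    "Translate the following 15th-century English book into modern English. ",
    "Ignore OCR noise such as page markers and illegible characters, ",
    "and preserve the meaning and sentence structure.\n",
    "Reply in exactly this format:\nTITLE: <modern title>\nTEXT: <modern text>\n\n",
    "Original title: ", title, "\n",
    "Original text: ", text
  )
}

# Split a reply in the assumed format into the two output fields;
# returns NA fields when the reply does not follow the format
parse_translation <- function(reply) {
  if (!grepl("TITLE:", reply, fixed = TRUE) || !grepl("\nTEXT:", reply, fixed = TRUE)) {
    return(list(modern_title = NA_character_, modern_text = NA_character_))
  }
  title <- sub("(?s).*TITLE:\\s*(.*?)\\s*\\nTEXT:.*", "\\1", reply, perl = TRUE)
  text  <- sub("(?s).*\\nTEXT:\\s*", "", reply, perl = TRUE)
  list(modern_title = trimws(title), modern_text = trimws(text))
}

prompt <- build_translation_prompt("Hic incipit paruus Catho", "Whanne I aduerte to my remembrance")
parse_translation("TITLE: Here begins the little Cato\nTEXT: When I call to mind...")
```

Asking for a fixed reply format makes parsing simple. Returning `NA` for malformed replies lets them be flagged for a retry alongside API errors.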

0.8.1 Sample text, after translation

Below is an example of how the GenAI-powered translation pipeline transforms early printed English into modern, readable language.

set.seed(14)
df_1400_translated <- read_csv("df_translated_1400s.csv")

df_1400_translated %>% 
  slice_sample(n=5) %>% 
  mutate(across(where(is.character), ~ substr(., 1, 800))) %>%  # Truncate characters
  kable() %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
| id | year | modern_title | modern_text |
|---|---|---|---|
| 54608 | 1498 | The Death of Arthur | Sir Tristram was banished from Cornwall for ten years after a council with Sir Fergus and others. He swore an oath before the king and his barons and left the country, accompanied by many barons, some friends and some foes. Meanwhile, a knight named Dinadan came looking for Sir Tristram and was directed to him as he was preparing to board his ship. Dinadan asked Tristram to joust with him before he left, and Tristram agreed, with the barons’ permission. They charged at each other, and Tristram knocked Dinadan down. Dinadan then requested to join Tristram’s company, which Tristram welcomed, and they rode together to their ships. Once at sea, Tristram sent greetings to King Mark and his enemies, promising to return when he could. He reflected on his past battles, feeling he had been well re |
| 145254 | 1497 | Here begins a treatise called Perula | Here begins a treatise called “Perula.” A depiction of a master with his students. What should you do when you need to translate English into Latin? I will first repeat my English once, twice, or three times, and look for my main verb, then ask this question: who or what? The word that answers the question will be the nominative case for the verb. But if it is an impersonal verb, as in this example (The master teaches students), “teaches” is the verb. Who teaches? The master teaches. The master will not be the nominative case for the verb. The word that comes after the verb will most commonly be the accusative case, as in (Magister docet me). When I have an adjective noun, pronoun, participle, or relative pronoun, I will ask the same question: who or what? The word that answers will be the |
| 54610 | 1498 | A prediction for 1498 | In the second quarter of the year, a king will likely embark on a journey outside his realm, but if he does, he will not return home because of his enemies. Kings and princes will face troubles and sadness this year due to conflicts among themselves and uprisings from their subjects. Therefore, it is wise for kings and princes to keep themselves away from the common people and from poisons and other dangers. It would not be advisable for them to start a war in the first three quarters of the year, nor to send their men to war from mid-August to October 20. If war does arise, they should resist it through preparation and delay, as that would be best. Ultimately, kings and princes will defeat their adversaries, as I predicted last year, which the Cornishmen could have avoided if they had bee |
| 93568 | 1482 | Little Long | As I must go to the master, I need to go to the teacher. Good students enjoy learning. However, sometimes they use impersonal verbs and have nominative cases before them. For example, “This thing pleases me” and “This matter concerns me.” Therefore, the verb “interest” is constructed with a genitive case of nouns and an ablative case of pronouns like “my,” “your,” “his,” and “our.” In this example, it is for my benefit: “It concerns me.” It is for your benefit: “It concerns you.” It is for his benefit: “It concerns him.” Also, the verb “interest” after “precisely” is constructed with an ablative case of the feminine gender of this possessive noun: “Whose.” Therefore, “It pertains to me” and not “to me.” All the impersonal verbs in the following verses will be constructed with a dative ca |
| 143565 | 1478 | The Horse, the Sheep, and the Geese | The horse, the sheep, and the geese. Disputes, pleas, and disagreements between people were often resolved by looking for evidence in records. This was the custom of ancient times. Judges were appointed who had authority to hear cases that were presented fairly between the parties. People of all ranks were allowed to present their arguments based on the law that governed their disputes. Without exception, they were called to speak in front of the president, who first commanded silence. The parties were free to express their grievances and declare their rights. To conclude this matter briefly, not long ago, I will find a fitting analogy. A scene was skillfully painted on a wall showing two figures sitting in royal status: the fierce lion, famous in all kingdoms, and the imperial eagle, pie |

0.8.2 Top words, cumulative

To get a high-level view of recurring themes and vocabulary, we count the most common words across all translated 1400s texts. Before counting, we remove stopwords, tokens containing non-letter characters, and words of three letters or fewer.

top_words_cumulative <- df_1400_translated %>%
  select(id, modern_text) %>%
  unnest_tokens(word, modern_text) %>%
  filter(str_detect(word, "^[a-z]+$"), nchar(word) > 3) %>%  # Clean words
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE) %>% 
  slice_max(n, n = 20)

top_words_cumulative %>%
  kable(col.names = c("Word", "Frequency"), caption = "Top 20 Most Common Words Across All Translated Texts") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
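Top 20 Most Common Words Across All Translated Texts

| Word | Frequency |
|---|---:|
| lord | 125 |
| king | 102 |
| christ | 75 |
| heart | 62 |
| jesus | 62 |
| love | 61 |
| saint | 53 |
| world | 53 |
| father | 51 |
| holy | 51 |
| called | 48 |
| soul | 46 |
| time | 45 |
| death | 44 |
| people | 44 |
| body | 42 |
| grace | 40 |
| life | 40 |
| mercy | 39 |
| person | 37 |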

0.9 Distinctive Words per Book: Word Cloud Visualization

While raw word frequency highlights broad themes, TF-IDF (term frequency–inverse document frequency) surfaces the words most distinctive of each book. Below, we keep the top 5 TF-IDF words per book and draw a word cloud. Each word is sized by its TF-IDF score summed over the books where it ranks in the top 5.
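To make the weighting concrete, the sketch below computes TF-IDF by hand on a hypothetical three-book corpus. It uses the definitions that tidytext's `bind_tf_idf()` applies:

- term frequency is a word's count divided by the book's total word count;
- inverse document frequency is the natural log of the number of books divided by the number of books containing the word.

```r
# Hypothetical word counts for three toy "books" (each book totals 10 words)
counts <- data.frame(
  book = c("A", "A", "A", "B", "B", "C", "C"),
  word = c("lord", "arthur", "knight", "lord", "latin", "lord", "verb"),
  n    = c(5, 3, 2, 4, 6, 1, 9),
  stringsAsFactors = FALSE
)

# tf: share of a book's words taken by this word
counts$tf <- counts$n / ave(counts$n, counts$book, FUN = sum)

# idf: log(number of books / number of books containing the word)
n_books <- length(unique(counts$book))
docs_with_word <- tapply(counts$book, counts$word, function(b) length(unique(b)))
counts$idf <- log(n_books / as.numeric(docs_with_word[counts$word]))

counts$tf_idf <- counts$tf * counts$idf

# Most distinctive word per book (the report keeps the top 5)
top_per_book <- do.call(rbind, lapply(split(counts, counts$book),
                                      function(d) d[which.max(d$tf_idf), ]))
top_per_book[, c("book", "word", "tf_idf")]
```

Because "lord" appears in every toy book, its IDF is log(1) = 0, and it drops out entirely. This is why words that dominate the raw counts, such as lord and king, tend to be down-weighted in the TF-IDF view.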

library(wordcloud)

# Tokenize and calculate tf-idf
tfidf_words <- df_1400_translated %>%
  select(id, modern_text) %>%
  unnest_tokens(word, modern_text) %>%
  filter(str_detect(word, "^[a-z]+$"), nchar(word) > 3) %>%  # clean words
  anti_join(stop_words, by = "word") %>%
  count(id, word, sort = TRUE) %>%
  bind_tf_idf(word, id, n) %>%
  group_by(id) %>%
  slice_max(tf_idf, n = 5) %>%  # top 5 per book (ties can add more)
  ungroup()

# Size each word by its tf-idf summed over the books where it ranks in the top 5
tfidf_words %>%
  count(word, wt = tf_idf, sort = TRUE) %>%
  with(wordcloud(words = word, freq = n, max.words = 100,
                 min.freq = 0))  # tf-idf sums are small; the default min.freq = 3 could drop words

1 If you asked an LLM to summarize 15th-century Europe, what would it say?

We simulate this by sending GPT the translated full texts of the 1400s books, joined into a single prompt, and asking it to synthesize what it learns.

1.1 Step 1: Create a single promptable text block

set.seed(42)

text_for_prompt <- df_1400_translated %>%
  pull(modern_text) %>%
#  str_sub(1, 3000) %>%  # truncate each text for token limit
  paste(collapse = "\n---\n")

#writeLines(text_for_prompt, "sample_15th_century_text.txt")
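Concatenating every book into one block can exceed a model's context window, which the commented-out `str_sub()` line above guards against. A minimal alternative is to pack the texts into chunks under a character budget.

The 12,000-character default and the helper name `chunk_texts()` are assumptions. A common rule of thumb is roughly four characters per English token, but the real limit depends on the model.

```r
# Hypothetical helper: pack texts into chunks no longer than `max_chars`,
# joined with the same separator used above
chunk_texts <- function(texts, max_chars = 12000, sep = "\n---\n") {
  chunks <- character(0)
  current <- ""
  for (txt in texts) {
    txt <- substr(txt, 1, max_chars)  # truncate any single text longer than the budget
    candidate <- if (nzchar(current)) paste0(current, sep, txt) else txt
    if (nchar(candidate) > max_chars) {
      chunks <- c(chunks, current)  # budget exceeded: close the current chunk
      current <- txt
    } else {
      current <- candidate
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

# e.g. prompt_chunks <- chunk_texts(df_1400_translated$modern_text)
chunk_texts(c(strrep("a", 6), strrep("b", 6), strrep("c", 3)), max_chars = 10, sep = "|")
```

Each chunk can then be summarized separately, and the chunk summaries combined in a final prompt. This map-reduce pattern is common for long inputs.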

1.2 Summarize using LLM API

Below is the output of running the API on the full sample of about 30 translated books.

1.2.1 Insights from 15th-Century English Texts: A Window into Medieval Worldviews

1.2.1.1 1. Spiritual and Moral Foundations

The translated excerpts from 15th-century English texts reveal a rich tapestry of themes and concerns that reflect the worldview of the time. Central to these writings is a profound engagement with spiritual and moral questions—particularly the relationship between the divine and the individual, the nature of virtue and vice, and the societal structures that govern human conduct. These texts consistently emphasize faith, obedience to God, and the pursuit of virtue as essential for both personal salvation and communal harmony.

1.2.1.2 2. Spiritual Marriage and the Role of Women

One prominent theme is the concept of spiritual marriage, especially within the consecration of religious sisters. The texts liken a woman’s commitment to Christ to a marital bond, requiring the renunciation of worldly attachments and ambitions. This metaphor signals a higher spiritual calling—one that transcends social roles and earthly desires. Such narratives reflect a deeply hierarchical worldview, where divine order is mirrored in the social order, and any moral decay among the clergy or nobility threatens the spiritual and social fabric of society.

1.2.1.3 3. Ethical Leadership and Virtuous Governance

The behavior of rulers and clergy is a recurring focus. Leaders are called upon to embody justice, humility, and wisdom, while being warned against the perils of greed and corruption. The moral health of society is seen as a reflection of its leaders’ character. Exhortations directed at priests and nobles urge them to lead by example, reinforcing the belief that the virtues—or vices—of those in power have a ripple effect throughout the entire community.

1.2.1.4 4. Concern for the Afterlife and Collective Salvation

Another striking aspect is the concern for the afterlife and the fate of souls. These texts reflect the medieval preoccupation with death, judgment, and eternal salvation. Confession, penance, and the intercession of saints are portrayed as vital spiritual practices. Notably, the well-being of the dead is tied to the actions of the living—through prayers and acts of charity—highlighting a strong ethos of communal responsibility for spiritual outcomes.

1.2.1.5 5. Summary

In total, these 15th-century texts articulate a worldview that is:

  • Deeply spiritual in its moral and theological commitments,
  • Structured and hierarchical in its view of society and divine order,
  • Focused on ethical leadership as a cornerstone of social stability, and
  • Collectively oriented, where salvation is both a personal and communal concern.

These documents offer more than historical insight—they provide a vivid lens into how people of the time understood virtue, duty, leadership, and the human journey toward redemption.

2 Selected Excerpts & Commentary from 15th-Century English Texts

We sent the following prompt to the LLM API:

Prompt:
Based on the following translated excerpts from 15th-century English books,
identify 5–7 specific passages or quotations that are especially revealing or insightful.
For each one, explain what it tells us about the worldview, values, or concerns of the period.
You may focus on themes like spirituality, governance, morality, death, gender, or community.
Present each excerpt followed by your commentary in 2–3 sentences.

1. Marriage to Christ

“This marriage must be desired by you with all your heart and free will, not compelled by any of your friends, nor for worldly honor or comfort, but only for the love and service you owe to the mighty prince…”

Commentary: This excerpt reveals the deep spiritual values of the 15th century, emphasizing personal choice and devotion in religious life. The metaphor of marrying Christ illustrates a profound commitment to spiritual purity and divine service, reflecting the era’s belief in fulfillment through religious dedication rather than worldly attachments.


2. Bishop John of Ely’s Address

“He emphasized the importance of recognizing the role of priests in the downfall of the community and lamented the moral decay among the clergy.”

Commentary: This passage reflects anxiety over moral decline within the Church. It underscores the essential role of clergy as ethical exemplars, and the societal risk posed by their failure. Calls for reform like this illustrate broader cultural concerns about spiritual leadership and public morality.


3. The Art of Dying

“When someone is likely to die, it is most important to have a special friend who will sincerely help and pray for them…”

Commentary: This excerpt captures the communal aspect of death in medieval society. Spiritual preparation for dying was a shared responsibility, where friends played a key role in prayer and moral support, reflecting the deeply social and religious understanding of mortality.


4. Moral Sayings of Christians

“Without justice, no land or country can thrive in good prosperity. Without faith, no creature can please God…”

Commentary: The text encapsulates the belief that societal success hinges on moral integrity and religious faith. Rather than wealth or power, virtues like justice and faith were seen as the true cornerstones of both individual and collective well-being.


5. Pope Innocent VIII’s Letter

“If any parents, friends, or other Christians… give or send one-twentieth of a ducat for the soul of any such deceased person… both the donors and the deceased will be participants in all the suffrages…”

Commentary: This passage highlights the medieval belief in indulgences and spiritual economy. Material donations were thought to secure spiritual benefits, not only for the deceased but also for the donor—an example of the era’s blend of religious, social, and economic life.


6. Governance and Restraint of Desire

“Therefore, emperors, princes, and governors of the past have wisely and politically established various laws to restrain such excessive desires…”

Commentary: This excerpt reflects a vision of governance grounded in moral responsibility. Leaders were expected to curb human vice through wise laws, upholding social harmony and divine justice—suggesting that political leadership was a sacred duty.


7. Spiritual Plea for Mercy

“I cry for God’s mercy, I cry for God’s mercy, I cry for God’s mercy…”

Commentary: A raw and emotional expression of penitence, this repeated plea for mercy shows the intensity of religious devotion. It echoes a worldview in which human frailty demands constant repentance and divine grace, core themes in medieval spiritual life.