Department of Curriculum and Instruction
Program in Applied Data Science and Analytics
Howard University
nathan.alexander@howard.edu
Computational curriculum collective and community-centered teaching and learning lab
AI for Recovering Networks from Fragmented Archives
Archival collections offer important information about the intellectual and social networks of historical actors. Often, however, relationships are “hidden in plain sight” in archival materials due to their fragmented structures. In our individual investigations of Black mathematicians, our team has found important and valuable connections between different actors across time periods that extend and challenge how we think about the development of their lives and work in the mathematical sciences.
This project is joint work with Drs. John Stigall (Howard University, Philosophy), Robin Wilson (Loyola Marymount University), Terika Harris (Columbia University), Erica Walker (University of Toronto), and Edray Goins (Pomona College).
Dr. Erica Walker, expert on Black mathematicians
Motivation for this work
Dr. Euphemia Lofton Haynes
Dr. Kelly Miller
Dr. Kelly Miller, Dean at Howard University
Sigma Pi Phi, 1911 initiates
Sigma Pi Phi, 1911 initiates - Kelly Miller and Carter G. Woodson
American Negro Academy, circa 1910
For this project, we are expanding our existing research using named entity recognition (NER) and social network analysis to develop a set of open source computational records and data tools that can map the network structure of actors across multiple archival collections.
How do archival fragments reveal overlooked relationships among Black mathematicians, and what do those relationships show about the historical development of their intellectual and social worlds?
Density. Archival collections contain dense information about the intellectual and social networks of historical actors, and few studies by individual teams, as opposed to institutions, are able to map the mass of archival content.
Fragmentation. Many organizations exist for archival research; however, there are still issues of coordination. As a result, many relationships remain “hidden in plain sight” because they are scattered across fragmented, multimodal sources.
Explicit networks. Co-authored publications, conference proceedings, and formal organizational rosters capture only a narrow share of networks.
Hidden networks. Unseen relationships, including mentorship, informal collaborations, and social infrastructures, shape much of intellectual life.
To support our knowledge of the intellectual worlds of Black mathematicians, we develop detailed timelines tracking their lives.
We begin with archival collections using methods from Black archival practice (Okechukwu, 2022; Prosper, 2024; Sutherland & Collier, 2022).
Care as Stewardship: Reimagining archival labor as an “ethic of care”
Refusal and Fugitivity: The “refusal” to conform to traditional, often harmful, archival standards
Reparative Work and Re-membering: Using the archive to repair the fragmentation caused by slavery and systemic oppression
Embodied and Living Archives: Recognizing that memory exists not just in paper records, but in bodies and oral traditions
Community-Centered Approach: Shifting power back to the community to determine what is documented and how
Timeline of Miller and Haynes and Howard University
Our work begins with a formal modeling process of the data to be stored.
A relational database of metadata and extracted network links is built on set theory, relations, and relational algebra. In this framework, each table is a relation, each row is a tuple, and each query is an operation that returns new relations.
Let \(D_1, D_2, \ldots, D_n\) be domains, such as names, dates, or identifiers. An \(n\)-ary relation \(R\) is a subset of the Cartesian product \(D_1 \times D_2 \times \cdots \times D_n\), and each tuple in \(R\) is one record in the database.
A database schema specifies the attributes and their domains, while the database instance is the current set of tuples stored at a given time.
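The distinction between schema and instance can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual data model: the `PERSON_SCHEMA` attributes, the `conforms` helper, and the placeholder rows are all assumptions introduced here.

```python
# Hypothetical schema: attribute names mapped to their domains (here, Python types).
PERSON_SCHEMA = {"person_id": str, "name": str, "birth_year": int}


def conforms(row, schema):
    """Return True if the tuple has one value per attribute and each value lies in its domain."""
    return len(row) == len(schema) and all(
        isinstance(value, domain) for value, domain in zip(row, schema.values())
    )


# A database instance: the set of tuples stored at a given time.
# These rows are placeholders, not archival data.
persons = {
    ("p1", "Person A", 1900),
    ("p2", "Person B", 1910),
}

# A relation is a set, so adding an identical tuple leaves the instance unchanged.
persons.add(("p1", "Person A", 1900))

print(len(persons))
print(all(conforms(row, PERSON_SCHEMA) for row in persons))
```

Modeling the instance as a Python `set` mirrors the definition of a relation as a subset of a Cartesian product: order does not matter and duplicates cannot occur.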
Kelly Miller Network-Organization Database
Our metadata records describe the database itself: tables, columns, data types, keys, constraints, and indexes. We use these records in our NER analyses.
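As one way to see what such metadata looks like in practice, the sketch below reads tables, columns, data types, keys, and indexes back out of an in-memory SQLite database using Python's standard `sqlite3` module. SQLite is an assumption made for illustration (the source does not name a database engine), and the `person` and `document` tables are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE document (
    doc_id     INTEGER PRIMARY KEY,
    collection TEXT NOT NULL,
    doc_date   TEXT,
    sender_id  INTEGER REFERENCES person(person_id)
);
CREATE INDEX idx_document_sender ON document(sender_id);
""")

# sqlite_master lists the schema objects (tables and indexes).
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(document)").fetchall()
column_types = {name: col_type for _, name, col_type, _, _, _ in columns}
primary_key = [name for _, name, _, _, _, pk in columns if pk]

# PRAGMA foreign_key_list returns one row per foreign key:
# (id, seq, table, from, to, on_update, on_delete, match)
foreign_keys = [
    (row[3], row[2], row[4])
    for row in conn.execute("PRAGMA foreign_key_list(document)")
]

print(objects)
print(column_types)
print(primary_key, foreign_keys)
```

Column names recovered this way (for example `sender_id` or `doc_date`) are the kind of labels that extracted entities could later be mapped onto.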
NER and metadata interactions
In archival and network research, this metadata often becomes the structured representation of entities and sources, such as person, document, date, collection, and tie type.
If the archive is fragmented, this structure helps preserve provenance while making missingness and partial overlap explicit.
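A small sketch of how provenance, missingness, and partial overlap might be made explicit is given below. The record fields (`entity`, `collection`, `date`) and all values are hypothetical placeholders; `None` is used here as an assumed convention for a value that is missing from the source.

```python
# Each record keeps its provenance (the collection it came from).
# None marks a value that is missing in the source. All rows are placeholders.
records = [
    {"entity": "Person A", "collection": "Collection 1", "date": "1910"},
    {"entity": "Person B", "collection": "Collection 1", "date": None},
    {"entity": "Person B", "collection": "Collection 2", "date": "1912"},
    {"entity": "Person C", "collection": "Collection 2", "date": None},
]

# Group entities by the collection in which they were observed.
by_collection = {}
for record in records:
    by_collection.setdefault(record["collection"], set()).add(record["entity"])

# Partial overlap: entities that appear in both collections.
overlap = by_collection["Collection 1"] & by_collection["Collection 2"]

# Missingness: records whose date is absent, reported together with their provenance.
missing_dates = [
    (record["entity"], record["collection"])
    for record in records
    if record["date"] is None
]

print(overlap)
print(missing_dates)
```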
If links are extracted from documents, they can be represented as a binary relation:
\[ L \subseteq E \times E \]
where \(E\) is the set of entities and \(L\) is the set of observed ties.
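A more detailed link table can be modeled as a higher-arity relation \(L'\), for example:
\[ (\text{source}, \text{target}, \text{type}, \text{date}) \in L' \subseteq E \times E \times T \times D_{\text{date}} \]
where \(T\) is the set of tie types and \(D_{\text{date}}\) is the domain of dates.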
This allows directed, typed, weighted, or time-stamped links to be stored in one normalized structure. Importantly, this normalization helps to solve our issue of fragmentation.
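The relationship between the detailed link table and the binary relation \(L \subseteq E \times E\) can be sketched as follows. The `Link` type, its field names, and the placeholder rows are assumptions made for illustration; they are not drawn from the project's database.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """One row of a hypothetical typed, dated link table."""
    source: str
    target: str
    tie_type: str
    date: Optional[str]  # None when the source gives no date


# Placeholder rows, not archival data.
links = {
    Link("Person A", "Person B", "correspondence", "1910"),
    Link("Person A", "Person B", "correspondence", "1912"),
    Link("Person B", "Person C", "co-membership", None),
}

# Projecting onto (source, target) recovers the binary relation of observed ties.
binary_links = {(link.source, link.target) for link in links}

# Selecting on tie type keeps only one kind of relationship.
correspondence = {link for link in links if link.tie_type == "correspondence"}

print(binary_links)
print(len(correspondence))
```

Keeping the tuples directed (source first, target second) means an undirected view can always be derived later, while the reverse is not possible.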
The basic operations of relational algebra include selection, projection, join, union, and difference. These operations are closed over relations, meaning the output of each operation is again a relation.
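Closure can be illustrated with three standard relational algebra operations (selection, projection, and an equi-join) written over Python sets of tuples. The function names, the positional addressing of attributes, and the placeholder rows are choices made for this sketch only.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *positions):
    """Projection: keep only the attributes at the given positions."""
    return {tuple(row[i] for i in positions) for row in relation}


def join(left, right, left_pos, right_pos):
    """Equi-join: concatenate tuple pairs whose values agree at the given positions."""
    return {l + r for l in left for r in right if l[left_pos] == r[right_pos]}


# Placeholder relations: (sender_id, recipient_id, year) and (person_id, name).
letters = {("p1", "p2", "1910"), ("p2", "p3", "1912")}
people = {("p1", "Person A"), ("p2", "Person B"), ("p3", "Person C")}

# Every result is again a set of tuples, so the operations compose freely.
early = select(letters, lambda row: row[2] < "1911")
named = join(early, people, 0, 0)
result = project(named, 4, 1)

print(result)
```

Because each function takes sets of tuples and returns a set of tuples, any output can be fed into any other operation, which is what closure means in practice.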
Some of the archival correspondence has information about the sender, recipient, date, and context. A join can connect those records to person authority files, a projection can keep only sender and recipient, and the resulting edge list can be turned into a graph for centrality and broader community analysis.
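The join-then-project step described above might look like the following in SQL, run here through Python's standard `sqlite3` module. The `authority` and `letter` tables, their columns, and every row are hypothetical stand-ins; the choice of SQLite is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical person authority file.
CREATE TABLE authority (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Hypothetical correspondence records.
CREATE TABLE letter (
    letter_id    INTEGER PRIMARY KEY,
    sender_id    INTEGER REFERENCES authority(person_id),
    recipient_id INTEGER REFERENCES authority(person_id),
    letter_date  TEXT,
    context      TEXT
);
INSERT INTO authority VALUES (1, 'Person A'), (2, 'Person B'), (3, 'Person C');
INSERT INTO letter VALUES
    (10, 1, 2, '1910-01-01', 'placeholder'),
    (11, 2, 3, NULL,         'placeholder'),
    (12, 1, 2, '1912-01-01', 'placeholder');
""")

# Join each letter to the authority file twice (sender and recipient),
# then project onto the two names to obtain a directed edge list.
edge_list = conn.execute("""
    SELECT s.name, r.name
    FROM letter AS l
    JOIN authority AS s ON s.person_id = l.sender_id
    JOIN authority AS r ON r.person_id = l.recipient_id
    ORDER BY l.letter_id
""").fetchall()

print(edge_list)
```

SQL returns a multiset, so the repeated pair appears twice. That is useful when repeated letters should count toward a tie's weight; `SELECT DISTINCT` would give the set semantics of a relation instead.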
Letter from Kelly Miller to W. E. B. Du Bois
A network can be derived from a relational table by constructing an adjacency matrix \(A\), where
\[ A_{ij} = \begin{cases} 1, & \text{if a link exists from } i \text{ to } j \\ 0, & \text{otherwise} \end{cases} \]
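As a minimal sketch of this construction, the code below builds \(A\) from a directed edge list using plain Python lists. The node names and edges are placeholders, and sorting the node labels to fix the row and column order is a choice made here for reproducibility.

```python
# Placeholder directed edge list of (source, target) pairs.
edges = [
    ("Person A", "Person B"),
    ("Person B", "Person C"),
    ("Person A", "Person C"),
]

# Fix an ordering of the nodes so each one has a row and column index.
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# A[i][j] = 1 if a link exists from node i to node j, and 0 otherwise.
A = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    A[index[source]][index[target]] = 1

for node, row in zip(nodes, A):
    print(node, row)
```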
Weighted or directed networks replace the 0/1 value with counts, strengths, or direction-specific values. This gives a pipeline:
\[ \text{metadata} \rightarrow \text{relation} \rightarrow \text{relational operations} \]
\[ \text{relational operations} \rightarrow \text{edge list or matrix} \rightarrow \text{network analysis} \]
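The last two stages of the pipeline, from edge list to matrix to a simple network measure, can be sketched as below. The edge list is a placeholder, counting repeated pairs is one assumed way of assigning weights, and in- and out-strength stand in for whichever centrality measures an analysis actually uses.

```python
from collections import Counter

# Placeholder edge list in which a repeated pair represents a repeated tie.
edges = [
    ("Person A", "Person B"),
    ("Person A", "Person B"),
    ("Person B", "Person C"),
]

# Weight each directed pair by how many times it was observed.
weights = Counter(edges)

nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Weighted adjacency matrix: W[i][j] is the number of observed ties from i to j.
W = [[0] * len(nodes) for _ in nodes]
for (source, target), count in weights.items():
    W[index[source]][index[target]] = count

# Out-strength is a row sum; in-strength is a column sum.
out_strength = {node: sum(W[index[node]]) for node in nodes}
in_strength = {node: sum(row[index[node]] for row in W) for node in nodes}

print(W)
print(out_strength)
print(in_strength)
```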
Mathematically, our databases are not just for storage. They are a formal system for representing entities as sets of tuples and relationships as relations, then using algebraic operations to derive the networks we analyze.
We welcome your suggestions!
April 13 2026 – Howard University – CADSA Symposium