2023-07-11
SITREP Overview
- SITREP is an LLM-based tool that supports report generation for intelligence analysis.
- A key feature of SITREP is long-term memory - the ability to generate reports that reason over a large corpus of input documents of varying source, modality, and noise.
- We do this over time: as new documents come in, new reports are generated that capture the dynamics of the underlying corpus.
Baseline: Retrieval Augmented Generation (RAG)
- The baseline form of long-term memory is Retrieval Augmented Generation (RAG).
- RAG works by matching a user-provided query to relevant documents and using those documents to generate a report.
- One drawback of RAG is that, as a form of semantic search, it is query dependent: the report can only cover what the query retrieves.
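As a rough illustration of the baseline, the sketch below ranks documents against a query and assembles a generation prompt. It is a minimal sketch, not SITREP's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `DOCS`, `retrieve`, and `build_prompt` are hypothetical names introduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real deployment would index the full document set.
DOCS = [
    "port authority reports cargo delays at the northern harbor",
    "regional elections postponed after flooding",
    "cargo ships rerouted away from the northern harbor",
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list, k: int = 2) -> str:
    """Assemble the text that would be handed to the LLM for report generation."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nWrite a report answering: {query}"
```

Changing the query changes which documents reach the prompt, which is the query dependence noted above: `retrieve("elections", DOCS, k=1)` surfaces only the elections document, and the harbor documents never reach the report.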
Graph-Based Long-term Memory
- Our goal with long-term memory is to automatically detect domain-specific graphical representations of semantic structure across the corpus, and generate reports appropriate to that structure.
- The advantage of graph-based representations of semantic structure is that they are human interpretable.
- Further, we can expose the graphs to the user in the UI, enabling high-level analysis.
- While avoiding dependence on a query, we still let the user supply queries to augment the structure-based generation of reports.
Graph-Based Approaches: Graph Communities
- One graph-based analysis technique is learning community structure.
- This involves using the LLM to extract entities and their relationships from the documents, constructing a graph, and doing community detection on the learned graph (e.g., Leiden community detection, cluster detection in graph embedding space).
- Reports are generated per community.
- Communities can be hierarchical, enabling report generation at various levels of abstraction.
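The pipeline above (extract entities and relationships, build a graph, detect communities) can be sketched as follows. This is an illustration under stated assumptions: `TRIPLES` is hypothetical hand-written data standing in for LLM extraction output, and the detection step is a naive greedy modularity merge, swapped in for Leiden so that the example needs only the standard library.

```python
from itertools import combinations

# Hypothetical (entity, relation, entity) triples standing in for LLM extraction output.
TRIPLES = [
    ("Acme", "funds", "Borel"),
    ("Acme", "met_with", "Cato"),
    ("Borel", "employs", "Cato"),
    ("Cato", "contacted", "Delta"),
    ("Delta", "owns", "Echo"),
    ("Delta", "supplies", "Foxtrot"),
    ("Echo", "partners_with", "Foxtrot"),
]

def build_graph(triples):
    """Undirected adjacency map; relation labels are ignored for clustering."""
    adj = {}
    for src, _rel, dst in triples:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj

def modularity(adj, communities):
    """Newman modularity of a partition of an unweighted, undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

def greedy_communities(adj):
    """Start from singletons and repeatedly apply the merge that raises modularity most."""
    communities = [frozenset([n]) for n in sorted(adj)]
    while len(communities) > 1:
        base = modularity(adj, communities)
        best_gain, best_pair = 0.0, None
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            gain = modularity(adj, merged) - base
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity, so stop
            break
        a, b = best_pair
        communities = [c for c in communities if c not in (a, b)] + [a | b]
    return communities
```

On the toy triples this separates the Acme/Borel/Cato group from the Delta/Echo/Foxtrot group, and each resulting community would then seed its own report. The naive merge re-scores the whole partition on every step, so it is only suitable for illustration on small graphs.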
Graph-Based Approaches: Semantic Graphs
- Another approach is learning semantic graphs from the corpus (e.g., hierarchical clustering in text embedding space).
- Reports are generated per cluster.
- Again, we can generate reports at multiple layers of abstraction using hierarchical methods.
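To make the layered view concrete, here is a small sketch of average-linkage agglomerative clustering that records the partition at every level of the hierarchy. The 2-D vectors in `EMBEDDINGS` are hypothetical placeholders for real text embeddings, and `agglomerate` and `cut` are illustrative names, not part of SITREP.

```python
import math

# Hypothetical 2-D "embeddings"; a real system would use a text embedding model.
EMBEDDINGS = {
    "doc_a": (0.0, 0.0),
    "doc_b": (0.1, 0.0),
    "doc_c": (1.0, 1.0),
    "doc_d": (1.1, 1.0),
    "doc_e": (5.0, 5.0),
}

def linkage_distance(c1, c2, emb):
    """Average-linkage distance between two clusters of document ids."""
    total = sum(math.dist(emb[a], emb[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(emb):
    """Return the partition at every level, from singletons down to one cluster."""
    clusters = [frozenset([d]) for d in sorted(emb)]
    levels = [list(clusters)]
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        pairs = [
            (linkage_distance(a, b, emb), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        ]
        _, i, j = min(pairs)
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels.append(list(clusters))
    return levels

def cut(levels, k):
    """Pick the level of the hierarchy that has exactly k clusters."""
    return next(level for level in levels if len(level) == k)
```

Cutting at `k=3` gives fine-grained clusters (one report each), while `k=2` merges the two nearby groups into a single higher-level cluster, which is the sense in which reports can be generated at multiple layers of abstraction.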
Dynamic Graph Modeling
- A key element of the graph-based techniques is the ability to use dynamic graph modeling methods.
- This enables the user to see reports based on elements of the community or semantic structure that change (or stay stable) over time.
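One simple way to surface change versus stability is to compare community snapshots taken at two points in time. The sketch below labels each current community by its best Jaccard overlap with the previous snapshot. It is a deliberately simple stand-in for a full dynamic graph model; the snapshots `T0` and `T1` and the thresholds `stable_at` and `related_at` are assumptions chosen for illustration.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two non-empty communities, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

def track(previous, current, stable_at=0.8, related_at=0.3):
    """Label each current community by its best match in the previous snapshot."""
    labels = {}
    for comm in current:
        best = max((jaccard(comm, old) for old in previous), default=0.0)
        if best >= stable_at:
            labels[comm] = "stable"
        elif best >= related_at:
            labels[comm] = "changed"
        else:
            labels[comm] = "new"
    return labels

# Hypothetical community snapshots at two points in time.
T0 = [frozenset({"A", "B", "C"}), frozenset({"D", "E", "F"})]
T1 = [frozenset({"A", "B", "C"}), frozenset({"D", "E", "G", "H"}), frozenset({"X", "Y"})]
```

With these snapshots, `track(T0, T1)` marks the first community as stable, the second as changed (it lost one member and gained two), and the third as new. Reports could then be prioritized for the changed and new communities.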
Alternative Forms of Long-term Memory
- There are other methods for long-term memory that we can employ, including:
  - Embedding past user queries and reports as documents (Memorizing Transformer technique).
  - Using user behavior as supervision in online training of models that learn better embeddings of input documents.
- These can be combined with the proposed models.
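The first alternative, storing past queries and reports as documents, can be sketched as a corpus that writes each interaction back into itself. This is a hypothetical illustration only: the `Memory` class and its term-overlap `search` are invented here, and the search stands in for embedding-based retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Corpus that also stores past (query, report) pairs as retrievable documents."""
    docs: list = field(default_factory=list)

    def add(self, text: str, kind: str = "source") -> None:
        self.docs.append({"kind": kind, "text": text})

    def record_interaction(self, query: str, report: str) -> None:
        # Past interactions become ordinary documents, tagged so they can be filtered.
        self.add(f"Q: {query}\nReport: {report}", kind="interaction")

    def search(self, query: str, k: int = 3) -> list:
        # Toy term-overlap scoring; a real system would compare embeddings.
        terms = set(query.lower().split())
        scored = [(len(terms & set(d["text"].lower().split())), d) for d in self.docs]
        ranked = sorted(scored, key=lambda s: s[0], reverse=True)
        return [d for score, d in ranked[:k] if score > 0]
```

After `record_interaction("harbor status", "Delays persist at the harbor")`, a later search for "harbor status" can return the earlier report alongside source documents, so prior analysis becomes part of what the next report draws on.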