2023-07-11

SITREP Overview

  • SITREP is an LLM-based tool that supports report generation for intelligence analysis.
  • A key feature of SITREP is long-term memory: the ability to generate reports that reason over a large corpus of input documents that vary in source, modality, and noise level.
  • This happens continuously: as new documents arrive, new reports are generated that capture how the underlying corpus changes over time.

Baseline: Retrieval-Augmented Generation (RAG)

  • The baseline form of long-term memory is retrieval-augmented generation (RAG).
  • RAG works by matching a user-provided query to relevant documents and using those documents to generate a report.
  • One drawback of RAG is that, as a form of semantic search, it is query-dependent: the report covers only what the user thinks to ask about, so structure elsewhere in the corpus goes unreported.
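The retrieve-then-generate loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, a bag-of-words count vector stands in for a learned text-embedding model, and the LLM call itself is omitted (the sketch stops at the prompt that would be sent).

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Combine the retrieved documents and the query into an LLM prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nWrite a report answering: {query}"


# Invented example corpus.
docs = [
    "Convoy observed moving north near the border.",
    "Port traffic increased this week.",
    "Border checkpoint reports new convoy activity.",
]
print(build_prompt("convoy activity at the border", docs))
```

Because `retrieve` ranks documents only by similarity to the query, a different query hands the model a different context; this is the query dependence that motivates the graph-based approaches.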

Graph-Based Long-term Memory

  • Our goal with long-term memory is to automatically detect domain-specific graph representations of the semantic structure across the corpus, and to generate reports appropriate to that structure.
  • An advantage of graph-based representations of semantic structure is that they are human-interpretable.
  • Further, we can expose the graphs to the user in the UI, enabling high-level analysis.
  • Report generation does not depend on a query, but users can still supply queries to augment the structure-based generation of reports.

Graph-Based Approaches: Graph Communities

  • One graph-based analysis technique is learning community structure.
  • This involves using the LLM to extract entities and their relationships from the documents, constructing a graph from them, and running community detection on the resulting graph (e.g., Leiden community detection, or cluster detection in graph embedding space).
  • A report is generated for each community.
  • Communities can be hierarchical, enabling report generation at various levels of abstraction.
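A rough sketch of this pipeline, assuming the LLM extraction step has already produced (head, relation, tail) triples; the triples below are invented. Leiden requires third-party packages (e.g., `leidenalg` with `python-igraph`), so to stay dependency-free this sketch swaps in label propagation, a different and cruder community-detection algorithm. `build_graph`, `label_propagation`, and `community_prompts` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_graph(triples):
    """Undirected adjacency sets from (head, relation, tail) triples."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def label_propagation(adj, max_iter=20):
    """Stand-in for Leiden: each node repeatedly adopts its neighbors' most common label."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[n] for n in adj[node])
            top = max(counts.values())
            # Deterministic tie-break: smallest label among the most common ones.
            new_label = min(label for label, c in counts.items() if c == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


def community_prompts(communities, triples):
    """One report prompt per community, built from the relations inside it."""
    prompts = []
    for members in communities:
        facts = [f"{h} {r} {t}" for h, r, t in triples if h in members and t in members]
        prompts.append("Summarize this group of related entities:\n" + "\n".join(facts))
    return prompts


# Invented example triples, as if returned by an LLM extraction step.
triples = [
    ("Alpha Corp", "owns", "Vessel 7"),
    ("Vessel 7", "docked_at", "Port North"),
    ("Port North", "operated_by", "Alpha Corp"),
    ("Unit 12", "stationed_at", "Ridge Base"),
    ("Ridge Base", "supplied_by", "Depot South"),
    ("Depot South", "supports", "Unit 12"),
]
communities = label_propagation(build_graph(triples))
for prompt in community_prompts(communities, triples):
    print(prompt, end="\n\n")
```

Each prompt would then be sent to the LLM to produce one community report; a hierarchical detector such as Leiden run at several resolutions would yield one such set of prompts per level.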

Graph-Based Approaches: Semantic Graphs

  • Another approach is learning semantic graphs directly from the corpus (e.g., hierarchical clustering in text embedding space).
  • A report is generated for each cluster.
  • As with communities, hierarchical methods let us generate reports at multiple levels of abstraction.
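A minimal sketch of hierarchical clustering in embedding space, assuming each document has already been embedded; toy 2-D vectors stand in for real text embeddings. It uses naive average-linkage agglomerative clustering and records every merge level, so a single run yields clusterings at several levels of abstraction. The function names are hypothetical; a production system would more likely use a library such as `scipy.cluster.hierarchy` (`linkage` with `fcluster`).

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points):
    """Merge the closest pair of clusters until one remains; keep every level."""
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = [list(clusters)]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels


def cut(levels, k):
    """Return the level of the hierarchy that has exactly k clusters."""
    return levels[len(levels[0]) - k]


# Toy 2-D vectors standing in for document embeddings.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
levels = agglomerate(points)
print(cut(levels, 3))  # finer clusters -> more detailed reports
print(cut(levels, 2))  # coarser clusters -> higher-level reports
```

Cutting at a small k gives a few broad clusters suited to high-level reports; a larger k gives finer clusters for detailed ones. The naive search here is cubic or worse in the number of documents, which is one reason to prefer a library implementation at scale.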

Dynamic Graph Modeling

  • A key element of the graph-based techniques is the ability to apply dynamic graph modeling methods.
  • This lets the user see reports based on the elements of the community structure or semantic structure that change (or stay stable) over time.
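One simple way to realize this dynamic view, assumed here for illustration rather than taken from SITREP, is to detect communities (or clusters) in successive time snapshots and match them by the Jaccard overlap of their members. The 0.5 threshold, the snapshot data, and the function names `jaccard`, `track`, and `dissolved` are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two member sets; 1.0 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track(prev, curr, threshold=0.5):
    """Label each current community relative to the previous snapshot."""
    result = []
    for c in curr:
        best = max((jaccard(c, p) for p in prev), default=0.0)
        if best == 1.0:
            label = "stable"
        elif best >= threshold:
            label = "changed"
        else:
            label = "new"
        result.append((sorted(c), label))
    return result


def dissolved(prev, curr, threshold=0.5):
    """Previous communities with no sufficiently similar successor."""
    return [p for p in prev if all(jaccard(p, c) < threshold for c in curr)]


# Invented snapshots: community memberships at times t-1 and t.
prev = [{"A", "B", "C"}, {"D", "E"}, {"F", "G"}]
curr = [{"A", "B", "C"}, {"D", "E", "H"}, {"I", "J"}]
for members, label in track(prev, curr):
    print(label, members)
print("dissolved:", dissolved(prev, curr))
```

Reports can then focus on the "changed", "new", and dissolved communities, or confirm continuity for the "stable" ones.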

Alternative Forms of Long-term Memory

  • There are other methods for long-term memory that we can employ, including:
    – Embedding past user queries and reports as documents (Memorizing Transformer technique)
    – Using user behavior as supervision in online training of models that learn better embeddings of input documents
  • These can be combined with the graph-based methods proposed above.
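A minimal sketch of the first option, treating past queries and reports as searchable documents alongside the corpus. `MemoryStore` and its methods are hypothetical, and a bag-of-words vector again stands in for a real embedding model. This works at the document level, as the bullet describes; it does not reproduce the Memorizing Transformer's internal key-value memory.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryStore:
    """Corpus documents plus past queries and reports, all searchable alike."""
    entries: list[tuple[str, str, Counter]] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        self.entries.append((kind, text, embed(text)))

    def record_interaction(self, query: str, report: str) -> None:
        """Store a finished query/report pair as two new searchable entries."""
        self.add("query", query)
        self.add("report", report)

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(kind, text) for kind, text, _ in ranked[:k]]


store = MemoryStore()
store.add("document", "Convoy sighted near the northern border.")
store.add("document", "Harbor traffic doubled this week.")
store.record_interaction(
    "What is happening at the border?",
    "Report: convoy activity at the border is increasing.",
)
print(store.search("border convoy", k=2))
```

Once stored, an earlier report is retrieved like any other document, so later reports can build on previous analysis instead of starting from the raw corpus each time.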