9/17/2020

Background: Our Goal, as Distinct from the Authors'

Use RL to automate many battlefield decisions.

Concepts in RL

  • Key concepts: agent, environment, state, observation, action, reward, policy, Q-value (made concrete in the sketch below).
  • Extensions: multi-agent RL, hierarchical RL, partially observable RL.
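
As a concrete anchor for this vocabulary, a minimal tabular Q-learning sketch on a toy chain environment (in this fully observed toy, state and observation coincide). The environment, rewards, and hyperparameters are illustrative assumptions, not anything from the papers discussed.

```python
import random

# Toy chain environment: states 0..4, actions 0 (left) / 1 (right);
# +1 reward for reaching the rightmost state (illustrative values).
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

# Q[s][a] estimates the expected discounted return of action a in state s.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy policy: the agent's mapping from state to action
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # one-step Q-learning update toward the Bellman target
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) * (not done) - Q[s][a])
        s = s2
```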

Previous Efforts

  • Hierarchy over actions (combine low-level actions into high-level primitives) – domain specific
  • Intrinsic rewards for training (a bonus shaped for one task may not transfer to others; see the sketch after this list)
  • Our application?
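
A hedged sketch of the intrinsic-reward idea: a count-based novelty bonus added to the task reward. The bonus form and coefficient are assumptions for illustration, not the specific method from the papers above.

```python
import math
from collections import defaultdict

visit_counts = defaultdict(int)
BONUS_COEF = 0.01  # assumed scale; in practice tuned per task

def shaped_reward(state, extrinsic_reward):
    """Add a count-based novelty bonus to the task reward.

    Bonuses like this aid exploration during training, but the shaping
    is task-specific: a bonus tuned for one task may not transfer to
    others, which is the caveat noted in the list above.
    """
    visit_counts[state] += 1
    return extrinsic_reward + BONUS_COEF / math.sqrt(visit_counts[state])
```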

Pre-training

Learn a span of skills in a pretraining environment; later, a higher-level network switches between these skills (sketched after this list).

  • Our application?
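
A minimal sketch of this two-phase setup, assuming skills are indexed by a discrete latent code z and the high-level network picks a new skill every K steps; the env/policy interfaces and K are assumptions for illustration.

```python
K = 10  # assumed skill-commitment horizon (steps per high-level decision)

def run_episode(env, low_level_policy, high_level_policy, max_steps=1000):
    """Roll out pretrained skills under a high-level switcher.

    Assumed interfaces: env.reset() -> obs; env.step(action) ->
    (obs, reward, done); low_level_policy is conditioned on skill z.
    """
    obs, total_reward = env.reset(), 0.0
    for t in range(max_steps):
        if t % K == 0:
            z = high_level_policy(obs)      # switch skills every K steps
        action = low_level_policy(obs, z)   # skill-conditioned low-level action
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```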

SNN

  • Architecture: how the observation and latent skill code are combined (concatenation versus bilinear integration)
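
To make the two integration schemes concrete, a PyTorch sketch contrasting them; the layer sizes are placeholders, and the second input stands in for the latent skill code.

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, HIDDEN = 32, 6, 64  # placeholder sizes

class ConcatIntegration(nn.Module):
    """Concatenate observation and skill code, then a linear layer:
    the skill enters additively, like an extra input feature."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(OBS_DIM + LATENT_DIM, HIDDEN)

    def forward(self, obs, z):
        return self.fc(torch.cat([obs, z], dim=-1))

class BilinearIntegration(nn.Module):
    """Bilinear integration: features from every obs x skill product
    pair, giving the skill code multiplicative (gating-like) influence
    over how observations are processed."""
    def __init__(self):
        super().__init__()
        self.bilinear = nn.Bilinear(OBS_DIM, LATENT_DIM, HIDDEN)

    def forward(self, obs, z):
        return self.bilinear(obs, z)
```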

Information-Theoretic Regularization

  • Additive reward bonus to regularize training and keep skills distinct (see the sketch after this list).
  • Our application?
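
A hedged sketch of such a bonus in the DIAYN/SNN style, where a learned discriminator's log-probability of the active skill given the state is added to the reward; the discriminator interface, uniform prior, and coefficient are assumptions.

```python
import math

MI_COEF = 0.1  # assumed weight on the information-theoretic bonus

def regularized_reward(extrinsic_reward, state, z, discriminator, n_skills):
    """Add log q(z | state) - log p(z) to the reward.

    Maximizing this bonus pushes up a variational lower bound on the
    mutual information I(z; state), so each skill z is driven to visit
    distinguishable parts of the state space, keeping skills distinct.
    """
    log_q = discriminator.log_prob(z, state)  # assumed interface
    log_p = math.log(1.0 / n_skills)          # uniform prior over skills
    return extrinsic_reward + MI_COEF * (log_q - log_p)
```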

Learning High Level Policies

  • Low-level policies are typically frozen while the high-level policy trains (though not necessarily so); a sketch follows.
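
A sketch of the freezing step in PyTorch; the module is a placeholder for the pretrained skill network, and the "not necessarily so" caveat above corresponds to flipping the flag back for joint fine-tuning.

```python
import torch.nn as nn

low_level_policy = nn.Linear(32, 4)  # placeholder for the pretrained skills

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a network."""
    for p in module.parameters():
        p.requires_grad = trainable

# Freeze the pretrained skills while the high-level policy trains ...
set_trainable(low_level_policy, False)
# ... and, since freezing is not strictly required, optionally unfreeze
# later for joint fine-tuning:
# set_trainable(low_level_policy, True)
```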