9/17/2020

Background: Our Goal, as Distinct from the Authors'

Use RL to automate many battlefield decisions.

Concepts in RL

  • Key concepts: agent, environment, state, observation, action, reward, policy, Q-value (made concrete in the sketch below).
  • Extensions: multi-agent RL, hierarchical RL, partially observable RL.
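
As a concrete anchor for this vocabulary, a minimal tabular Q-learning sketch on a toy chain environment (in this fully observed toy, state and observation coincide). The environment, rewards, and hyperparameters are illustrative assumptions, not anything from the papers discussed.

```python
import random

# Toy chain environment: states 0..4, actions 0 (left) / 1 (right);
# +1 reward for reaching the rightmost state (illustrative values).
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

# Q[s][a] estimates the expected discounted return of action a in state s.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy policy: the agent's mapping from state to action
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # one-step Q-learning update toward the Bellman target
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) * (not done) - Q[s][a])
        s = s2
```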

Previous Efforts

  • Hierarchy over actions (combine low-level actions into high-level primitives) – domain specific
  • Intrinsic rewards for training (a bonus shaped for one task may not transfer to others; see the sketch after this list)
  • Our application?
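
A hedged sketch of the intrinsic-reward idea: a count-based novelty bonus added to the task reward. The bonus form and coefficient are assumptions for illustration, not the specific method from the papers above.

```python
import math
from collections import defaultdict

visit_counts = defaultdict(int)
BONUS_COEF = 0.01  # assumed scale; in practice tuned per task

def shaped_reward(state, extrinsic_reward):
    """Add a count-based novelty bonus to the task reward.

    Bonuses like this aid exploration during training, but the shaping
    is task-specific: a bonus tuned for one task may not transfer to
    others, which is the caveat noted in the list above.
    """
    visit_counts[state] += 1
    return extrinsic_reward + BONUS_COEF / math.sqrt(visit_counts[state])
```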

Pre-training

Learn a span of skills in a pretraining environment; later, a higher-level network switches between these skills (sketched after this list).

  • Our application?
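
A minimal sketch of this two-phase setup, assuming skills are indexed by a discrete latent code z and the high-level network picks a new skill every K steps; the env/policy interfaces and K are assumptions for illustration.

```python
K = 10  # assumed skill-commitment horizon (steps per high-level decision)

def run_episode(env, low_level_policy, high_level_policy, max_steps=1000):
    """Roll out pretrained skills under a high-level switcher.

    Assumed interfaces: env.reset() -> obs; env.step(action) ->
    (obs, reward, done); low_level_policy is conditioned on skill z.
    """
    obs, total_reward = env.reset(), 0.0
    for t in range(max_steps):
        if t % K == 0:
            z = high_level_policy(obs)      # switch skills every K steps
        action = low_level_policy(obs, z)   # skill-conditioned low-level action
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```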

SNN

  • Architecture: how the observation and latent skill code are combined (concatenation versus bilinear integration)
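
To make the two integration schemes concrete, a PyTorch sketch contrasting them; the layer sizes are placeholders, and the second input stands in for the latent skill code.

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, HIDDEN = 32, 6, 64  # placeholder sizes

class ConcatIntegration(nn.Module):
    """Concatenate observation and skill code, then a linear layer:
    the skill enters additively, like an extra input feature."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(OBS_DIM + LATENT_DIM, HIDDEN)

    def forward(self, obs, z):
        return self.fc(torch.cat([obs, z], dim=-1))

class BilinearIntegration(nn.Module):
    """Bilinear integration: features from every obs x skill product
    pair, giving the skill code multiplicative (gating-like) influence
    over how observations are processed."""
    def __init__(self):
        super().__init__()
        self.bilinear = nn.Bilinear(OBS_DIM, LATENT_DIM, HIDDEN)

    def forward(self, obs, z):
        return self.bilinear(obs, z)
```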

Information-Theoretic Regularization

  • Additive reward bonus to regularize training and keep skills distinct (see the sketch after this list).
  • Our application?
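
A hedged sketch of such a bonus in the DIAYN/SNN style, where a learned discriminator's log-probability of the active skill given the state is added to the reward; the discriminator interface, uniform prior, and coefficient are assumptions.

```python
import math

MI_COEF = 0.1  # assumed weight on the information-theoretic bonus

def regularized_reward(extrinsic_reward, state, z, discriminator, n_skills):
    """Add log q(z | state) - log p(z) to the reward.

    Maximizing this bonus pushes up a variational lower bound on the
    mutual information I(z; state), so each skill z is driven to visit
    distinguishable parts of the state space, keeping skills distinct.
    """
    log_q = discriminator.log_prob(z, state)  # assumed interface
    log_p = math.log(1.0 / n_skills)          # uniform prior over skills
    return extrinsic_reward + MI_COEF * (log_q - log_p)
```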

Learning High Level Policies

  • Low-level policies are typically frozen while the high-level policy trains (though not necessarily so); a sketch follows.
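
A sketch of the freezing step in PyTorch; the module is a placeholder for the pretrained skill network, and the "not necessarily so" caveat above corresponds to flipping the flag back for joint fine-tuning.

```python
import torch.nn as nn

low_level_policy = nn.Linear(32, 4)  # placeholder for the pretrained skills

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a network."""
    for p in module.parameters():
        p.requires_grad = trainable

# Freeze the pretrained skills while the high-level policy trains ...
set_trainable(low_level_policy, False)
# ... and, since freezing is not strictly required, optionally unfreeze
# later for joint fine-tuning:
# set_trainable(low_level_policy, True)
```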