Key concepts: agent, environment, state, observation, action, reward, policy, Q-value. Extensions: multi-agent RL, hierarchical RL, partially observable RL. (Loop sketch below.)
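A minimal sketch to pin the vocabulary down, assuming gymnasium's FrozenLake-v1 and epsilon-greedy tabular Q-learning; the hyperparameters are illustrative, not tuned.

```python
# Agent-environment loop: observation -> policy -> action -> reward.
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1")                               # environment
Q = np.zeros((env.observation_space.n, env.action_space.n))   # Q-values
alpha, gamma, eps = 0.1, 0.99, 0.1                            # assumed hyperparameters

for episode in range(5000):
    obs, _ = env.reset()                        # observation of the state
    done = False
    while not done:
        # policy: epsilon-greedy over current Q-values
        if np.random.rand() < eps:
            action = env.action_space.sample()  # explore
        else:
            action = int(np.argmax(Q[obs]))     # exploit
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update toward reward + discounted best next Q-value
        Q[obs, action] += alpha * (reward + gamma * np.max(Q[next_obs]) - Q[obs, action])
        obs = next_obs
```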
Hierarchy over actions (combine low-level actions into high-level primitives) – domain specific. Intrinsic rewards for training (skills learned for one task may not transfer to others). Our application? (Macro-action sketch below.)
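One simple reading of "high-level primitive" is a macro-action: a fixed sequence of low-level actions that the higher level can pick as a single choice. The function and action encoding below are illustrative assumptions, not from any specific paper.

```python
def run_macro(env, macro_actions):
    """Execute one high-level primitive (a list of low-level actions);
    return the final observation, summed reward, and done flag."""
    obs, total_reward, done = None, 0.0, False
    for a in macro_actions:
        obs, r, terminated, truncated, _ = env.step(a)
        total_reward += r
        done = terminated or truncated
        if done:
            break
    return obs, total_reward, done

# Example: in a grid world with actions 0=left, 1=down, 2=right, 3=up
# (assumed encoding), "move two cells right" becomes a single primitive.
MOVE_RIGHT_TWICE = [2, 2]
```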
Learns a set of skills in a pretraining environment; later, a higher-level network switches between these skills. Our application? (Two-level controller sketch below.)
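A sketch of the two-level control pattern, assuming a set of pretrained skill policies (each mapping an observation to a low-level action) and a high-level policy that picks which skill to run for the next k steps; all names and the fixed-k switching scheme are assumptions for illustration.

```python
def hierarchical_rollout(env, high_level_policy, skills, k=10, max_steps=500):
    """Roll out: the high level picks a skill, the skill acts for k steps."""
    obs, _ = env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        skill_id = high_level_policy(obs)     # high level: choose a skill
        skill = skills[skill_id]
        for _ in range(k):                    # low level: run it for k steps
            action = skill(obs)
            obs, r, terminated, truncated, _ = env.step(action)
            total_reward += r
            steps += 1
            done = terminated or truncated
            if done or steps >= max_steps:
                break
    return total_reward
```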