Learning to Score Behaviors

(This note is based on Learning to Score Behaviors for Guided Policy Optimization. I am trying to expand on and clarify some of the algorithms presented there. More content will be added to this note in the future!) The core question: What is the right measure of similarity between two policies acting on the same underlying MDP, and how can we devise algorithms that leverage this information for RL?...

September 21, 2022 · 9 min · Saeed Hedayatian