Generative Models & Games for Offline IRL

Research on achieving state-of-the-art (SOTA) results for offline inverse reinforcement learning (IRL)

This research project was conducted as part of COMPSCI 690S: Human-Centric Machine Learning, under the instruction of Professor Scott Niekum. Thank you to my collaborator Giang Nguyen.

Abstract from:

Ranking Games Meet Generative World Models for Offline Inverse Reinforcement Learning

The goal of Offline Inverse Reinforcement Learning (Offline IRL) is to recover the underlying reward structure and environment dynamics from a fixed dataset of previously collected experience. This paradigm is of particular interest for safety-sensitive applications, where further interaction with the environment may not be possible; an accurate model of the world therefore becomes crucial to avoid compounding errors in the estimated rewards. With limited demonstration data of varying expertise, it is also important to extrapolate beyond these demonstrations in order to infer high-quality reward functions. We introduce a bi-level optimization approach for offline IRL that accounts for uncertainty in an estimated world model and uses a ranking loss to encourage learning from intent. We demonstrate that the algorithm matches state-of-the-art offline IRL frameworks on the continuous control tasks in MuJoCo and multiple datasets in the D4RL benchmark.
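To make the ranking component of the abstract concrete, here is a minimal sketch of a Bradley-Terry style ranking loss over trajectory returns under a learned reward model, in the spirit of ranking-based IRL methods. The names `RewardNet` and `ranking_loss`, the MLP architecture, and all dimensions are illustrative assumptions, not taken from the project code.

```python
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Small MLP reward model r_theta(s, a). Architecture is illustrative."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Per-step reward for each (state, action) pair, shape (T,)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def ranking_loss(reward_net, traj_lo, traj_hi):
    """Bradley-Terry ranking loss: the higher-ranked trajectory should
    accumulate more learned reward than the lower-ranked one.

    traj_lo / traj_hi: (obs, act) tensors of shape (T, obs_dim) / (T, act_dim).
    """
    ret_lo = reward_net(*traj_lo).sum()
    ret_hi = reward_net(*traj_hi).sum()
    # Negative log-probability that traj_hi is preferred over traj_lo
    return -torch.log_softmax(torch.stack([ret_lo, ret_hi]), dim=0)[1]


# Example usage with random data (dimensions chosen arbitrarily):
obs_dim, act_dim, T = 17, 6, 100
net = RewardNet(obs_dim, act_dim)
worse = (torch.randn(T, obs_dim), torch.randn(T, act_dim))
better = (torch.randn(T, obs_dim), torch.randn(T, act_dim))
loss = ranking_loss(net, worse, better)
loss.backward()
```

Each ranked pair of trajectories yields a scalar loss that can be backpropagated into the reward network, which is how a ranking objective can push the learned reward to extrapolate beyond demonstrations of mixed quality.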

Source

Code