Estimating and minimizing the Wasserstein distance to an idealized target distribution in order to learn a goal-conditioned policy. Introducing the time-step metric as a way to measure the work of transporting probability mass in MDPs, and using it to estimate the Wasserstein distance.
Recommended citation: Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone. (2021). “Adversarial Intrinsic Motivation for Reinforcement Learning”. Proceedings of the International Conference on Neural Information Processing Systems, 2021.