Publications
Decentralized MARL approach using distribution matching for cooperative multi-agent coordination
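Distribution matching is often framed as driving an empirical state-visitation distribution toward a target distribution, e.g. by penalizing their KL divergence. A minimal illustrative sketch of that idea (a generic divergence-based reward, not the paper's specific objective):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) between two discrete distributions, smoothed to
    # avoid log(0) and renormalized after smoothing.
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical usage: reward agents for reducing the mismatch between
# the joint empirical state-visitation distribution and a target.
target = [0.25, 0.25, 0.25, 0.25]   # desired coverage of 4 regions
visits = [0.70, 0.10, 0.10, 0.10]   # current empirical visitation
reward = -kl_divergence(visits, target)  # higher when closer to target
```

The reward is maximal (zero) exactly when the visitation distribution matches the target.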
Estimating and maximizing the Wasserstein distance from a start state to learn skills in a controlled Markov process
Estimating and minimizing the Wasserstein distance to an idealized target distribution to learn a goal-conditioned policy
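Both of the Wasserstein-based items above rest on estimating the distance between two empirical distributions. In one dimension this has a closed form: the optimal transport plan matches samples in sorted order. A small sketch of that estimator (illustrative only; the papers' estimators and state spaces may differ):

```python
import numpy as np

def wasserstein_1d(x, y):
    # Empirical 1-D Wasserstein-1 distance between two equal-size
    # samples: sort both and average the pairwise absolute gaps,
    # since sorted order realizes the optimal coupling in 1-D.
    x = np.sort(np.asarray(x, float))
    y = np.sort(np.asarray(y, float))
    return float(np.mean(np.abs(x - y)))

# A skill that pushes the agent away from the start-state distribution
# increases this distance; a goal-conditioned policy would instead
# minimize it against samples from the target distribution.
start = np.zeros(100)        # states near the start
visited = np.full(100, 3.0)  # states reached under a skill
d = wasserstein_1d(start, visited)  # 3.0
```

Maximizing `d` rewards skills that travel far from the start; minimizing it rewards reaching the target.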
Using Imitation from Observation techniques to accelerate policy transfer between environments with mismatched dynamics
A mixing scheme that balances individual agent preferences against shared objectives, with an analysis of the resulting learning behavior
Reducing the error due to sampling in batch TD learning
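In batch TD learning, a fixed set of transitions is replayed until the values converge; the fixed point then reflects the empirical model implied by the batch, so sampling error in the batch propagates into the value estimates. A minimal TD(0) replay sketch (a generic setup, not the paper's correction method):

```python
import numpy as np

def batch_td0(transitions, n_states, alpha=0.1, gamma=0.9, sweeps=50):
    # Repeatedly sweep a fixed batch of (s, r, s') transitions with
    # TD(0); s' is None for terminal transitions. Whatever sampling
    # noise the finite batch contains is baked into the fixed point.
    V = np.zeros(n_states)
    for _ in range(sweeps):
        for s, r, s2 in transitions:
            target = r if s2 is None else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])
    return V

# Two-state chain: 0 -> 1 (reward 0), 1 -> terminal (reward 1).
batch = [(0, 0.0, 1), (1, 1.0, None)]
V = batch_td0(batch, n_states=2)  # V[1] -> 1.0, V[0] -> gamma * V[1]
```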
Using policy gradient to learn navigation in knowledge bases
Constraining TD updates to improve stability
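One common way to constrain a TD update is to bound the TD error before applying it, so no single transition can move a value estimate too far. A toy sketch of that idea (the paper's specific constraint may be different):

```python
def clipped_td_update(v, target, alpha=0.1, clip=1.0):
    # Clip the TD error (target - v) to [-clip, clip] before the
    # usual learning-rate step, limiting the size of any one update.
    delta = max(-clip, min(clip, target - v))
    return v + alpha * delta

# An outlier target of 10.0 moves the value only as far as a
# target of 1.0 would, since the error is clipped at 1.0.
v_outlier = clipped_td_update(0.0, 10.0)  # 0.1
v_normal = clipped_td_update(0.0, 0.5)    # 0.05
```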
Improving GAN training stability and quality with multiple discriminators
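With multiple discriminators, the generator's per-discriminator losses must be aggregated into a single training signal; averaging weights all discriminators equally, while a softmax over losses emphasizes the discriminators the generator currently fools least. A sketch of these two standard aggregation choices (illustrative; not necessarily the paper's scheme):

```python
import numpy as np

def aggregate_discriminator_losses(losses, mode="average", temp=1.0):
    # Combine the generator's losses against several discriminators.
    # "average": equal weight per discriminator.
    # "softmax": weight each loss by softmax(temp * losses), so the
    #            hardest discriminators dominate the gradient signal.
    losses = np.asarray(losses, float)
    if mode == "average":
        return float(losses.mean())
    w = np.exp(temp * losses)
    w /= w.sum()
    return float(np.dot(w, losses))

avg = aggregate_discriminator_losses([1.0, 3.0])                  # 2.0
hard = aggregate_discriminator_losses([1.0, 3.0], mode="softmax")  # > 2.0
```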
Predicting nonstationarity in real-world scenarios to improve off-policy evaluation
Mapping VAE decoder samples back to latent space for improved generative accuracy
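Mapping a decoded sample back to latent space can be posed as inversion: find the code z whose decoding best reconstructs the sample. A toy sketch using finite-difference gradient descent on a known linear "decoder" (the paper's mapping is learned; this only illustrates the inversion objective):

```python
import numpy as np

def invert_decoder(decode, x, z_dim, steps=500, lr=0.1, eps=1e-4):
    # Minimize ||decode(z) - x||^2 over z with forward-difference
    # gradients, so the sketch works for any black-box decoder.
    z = np.zeros(z_dim)
    for _ in range(steps):
        base = np.sum((decode(z) - x) ** 2)
        grad = np.zeros(z_dim)
        for i in range(z_dim):
            zp = z.copy()
            zp[i] += eps
            grad[i] = (np.sum((decode(zp) - x) ** 2) - base) / eps
        z -= lr * grad
    return z

# Toy linear decoder with a known latent: inversion should recover it.
W = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
decode = lambda z: W @ z
z_true = np.array([0.5, -0.3])
z_hat = invert_decoder(decode, decode(z_true), z_dim=2)
```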
Extending DQN with temporally extended macro-actions for improved exploration and learning
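A macro-action can be implemented as a wrapper that repeats a chosen primitive action for k steps, accumulating discounted reward, so the agent commits to temporally extended behavior. A minimal sketch with a toy chain environment (a gym-like `step` interface is assumed; this is not the paper's exact construction):

```python
class ChainEnv:
    # Toy environment: walk right along a chain; reward 1 at the end.
    def __init__(self, length=5):
        self.pos, self.length = 0, length

    def step(self, action):
        self.pos += 1 if action == 1 else 0
        done = self.pos >= self.length
        return self.pos, (1.0 if done else 0.0), done

class MacroActionEnv:
    # Repeats each chosen action k times, accumulating the discounted
    # reward, and stops early if the episode terminates mid-macro.
    def __init__(self, env, k, gamma=0.99):
        self.env, self.k, self.gamma = env, k, gamma

    def step(self, action):
        total, discount = 0.0, 1.0
        for _ in range(self.k):
            obs, r, done = self.env.step(action)
            total += discount * r
            discount *= self.gamma
            if done:
                break
        return obs, total, done

# One macro-action of "move right" x5 finishes the whole chain.
env = MacroActionEnv(ChainEnv(5), k=5)
obs, r, done = env.step(1)
```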
Swarm-inspired technique for convex optimization using cohort-based self-supervised learning