Publications

NeurIPS 2024 · 2024
Extending ad hoc teamwork to settings with multiple unknown agents, enabling cooperative behavior without prior coordination
NeurIPS 2023 · 2023
A unified framework for goal-conditioned RL that generalizes policy gradient methods using f-divergences
AAAI 2023 · 2023
Decentralized MARL approach using distribution matching for cooperative multi-agent coordination
Robot World Cup 2022 · 2022
A lightweight real-time object detection pipeline for robot soccer, nominated for best paper at RoboCup 2022
NeurIPS 2021 · 2021
Estimating and minimizing the Wasserstein distance to an idealized target distribution to learn a goal-conditioned policy
NeurIPS 2020 · 2020
Using Imitation from Observation techniques to speed up transfer of policies between environments with dynamics mismatch
IJCAI 2020 · 2020
A mixing scheme that balances individual agent preferences with shared objectives, with a study of the resulting learning behavior
ICML 2020 · 2020
Reducing the error due to sampling in batch TD learning
ICLR 2018 · 2018
Using policy gradient to learn navigation in knowledge bases
IEEE IGARSS 2017 · 2017
Using deep generative models to separate spectral signals from nuisance variables in hyperspectral unmixing
ICLR 2017 · 2017
Improving GAN training stability and quality with multiple discriminators
IAAI 2017 · 2017
Predicting nonstationarity in changing real-world scenarios for off-policy evaluation
IEEE SMC 2013 · 2013
Swarm-inspired technique for convex optimization using cohort-based self-supervised learning

Workshops

Deep RL Workshop & Offline RL Workshop, NeurIPS 2022 · 2022
An adversarial approach to offline imitation learning that seeks modes of the expert distribution rather than averaging over them
Deep RL Workshop, NeurIPS 2021 · 2021
Estimating and maximizing the Wasserstein distance from a start state to learn skills in a controlled Markov process
Offline RL Workshop, NeurIPS 2020 · 2020
Analysis of sampling error sources in batch TD learning and their effect on value prediction accuracy
RLDM 2019 · 2019
An actor-critic method that balances multiple reward preferences for multi-objective reinforcement learning
Deep RL Symposium, NeurIPS 2017 · 2017
Constraining TD updates to improve stability

Preprints

arXiv · 2025
Reformulating world modeling as a visual question answering problem, leveraging vision-language models for robotic planning with improved generalization over reconstruction-based methods
arXiv · 2025
Applying sequence modeling to enable agents to quickly adapt and cooperate with N unknown teammates in ad hoc teamwork settings
arXiv · 2016
Mapping VAE decoder samples back to latent space for improved generative accuracy
arXiv · 2016
Extending DQN with temporally extended macro-actions for improved exploration and learning