Publications

NeurIPS 2024 · 2024

Extending ad hoc teamwork to settings with multiple unknown agents, enabling cooperative behavior without prior coordination

f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences

NeurIPS 2023 · 2023

A unified framework for goal-conditioned RL that generalizes policy gradient methods using f-divergences

DM²: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching

AAAI 2023 · 2023

A decentralized MARL approach that uses distribution matching for cooperative multi-agent coordination

Towards a Real-Time, Low-Resource, End-to-End Object Detection Pipeline for Robot Soccer

Robot World Cup 2022 · 2022

A lightweight real-time object detection pipeline for robot soccer, nominated for best paper at RoboCup 2022

Adversarial Intrinsic Motivation for Reinforcement Learning

NeurIPS 2021 · 2021

Estimating and minimizing the Wasserstein distance to an idealized target distribution to learn a goal-conditioned policy

An Imitation from Observation Approach to Sim-to-Real Transfer

NeurIPS 2020 · 2020

Using Imitation from Observation techniques to speed up transfer of policies between environments with dynamics mismatch

Balancing Individual Preferences and Shared Objectives in Multiagent Reinforcement Learning

IJCAI 2020 · 2020

A mixing scheme that balances individual agent preferences with shared objectives, together with a study of the resulting learning behavior

Reducing Sampling Error in Batch Temporal Difference Learning

ICML 2020 · 2020

Reducing the error due to sampling in batch TD learning

Go for a Walk and Arrive at the Answer: Reasoning over Paths in Knowledge Bases using Reinforcement Learning

ICLR 2018 · 2018

Using policy gradient to learn navigation in knowledge bases

Unmixing in the Presence of Nuisances with Deep Generative Models

IEEE IGARSS 2017 · 2017

Using deep generative models to separate spectral signals from nuisance variables in hyperspectral unmixing

Generative Multi-Adversarial Networks

ICLR 2017 · 2017

Improving GAN training stability and quality with multiple discriminators

Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing

IAAI 2017 · 2017

Predicting nonstationarity in changing real-world scenarios for off-policy evaluation

Cohort Intelligence: A Self Supervised Learning Behavior

IEEE SMC 2013 · 2013

A swarm-inspired technique for convex optimization using cohort-based self-supervised learning

Workshops

ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning

Deep RL Workshop & Offline RL Workshop, NeurIPS 2022 · 2022

An adversarial approach to offline imitation learning that seeks modes of the expert distribution rather than averaging over them

Wasserstein Distance Maximizing Intrinsic Control

Deep RL Workshop, NeurIPS 2021 · 2021

Estimating and maximizing the Wasserstein distance from a start state to learn skills in a controlled Markov process

On Sampling Error in Batch Action-Value Prediction Algorithms

Offline RL Workshop, NeurIPS 2020 · 2020

An analysis of the sources of sampling error in batch TD learning and their effect on value prediction accuracy

Multi-Preference Actor Critic

RLDM 2019 · 2019

An actor-critic method that balances multiple reward preferences for multi-objective reinforcement learning

TD Learning with Constrained Gradients

Deep RL Symposium, NeurIPS 2017 · 2017

Constraining TD updates to improve stability

Preprints

Semantic World Models

arXiv · 2025

Reformulating world modeling as a visual question answering problem, leveraging vision-language models for robotic planning with improved generalization over reconstruction-based methods

Sequence Modeling for N-Agent Ad Hoc Teamwork

arXiv · 2025

Applying sequence modeling to enable agents to quickly adapt and cooperate with N unknown teammates in ad hoc teamwork settings

Inverting Variational Autoencoders for Improved Generative Accuracy

arXiv · 2016

Mapping VAE decoder samples back to latent space for improved generative accuracy

Deep Reinforcement Learning with Macro-Actions

arXiv · 2016

Extending DQN with temporally extended macro-actions for improved exploration and learning