Publications
Decentralized MARL approach using distribution matching for cooperative multi-agent coordination
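Distribution matching is often framed as driving an empirical state-visitation distribution toward a target distribution, e.g. by penalizing their KL divergence. A minimal illustrative sketch of that idea (a generic divergence-based reward, not the paper's specific objective):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) between two discrete distributions, smoothed to
    # avoid log(0) and renormalized after smoothing.
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical usage: reward agents for reducing the mismatch between
# the joint empirical state-visitation distribution and a target.
target = [0.25, 0.25, 0.25, 0.25]   # desired coverage of 4 regions
visits = [0.70, 0.10, 0.10, 0.10]   # current empirical visitation
reward = -kl_divergence(visits, target)  # higher when closer to target
```

The reward is maximal (zero) exactly when the visitation distribution matches the target.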
Estimating and maximizing the Wasserstein distance from a start state to learn skills in a controlled Markov process
Estimating and minimizing the Wasserstein distance to an idealized target distribution to learn a goal-conditioned policy
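Both of the Wasserstein-based items above rest on estimating the distance between two empirical distributions. In one dimension this has a closed form: the optimal transport plan matches samples in sorted order. A small sketch of that estimator (illustrative only; the papers' estimators and state spaces may differ):

```python
import numpy as np

def wasserstein_1d(x, y):
    # Empirical 1-D Wasserstein-1 distance between two equal-size
    # samples: sort both and average the pairwise absolute gaps,
    # since sorted order realizes the optimal coupling in 1-D.
    x = np.sort(np.asarray(x, float))
    y = np.sort(np.asarray(y, float))
    return float(np.mean(np.abs(x - y)))

# A skill that pushes the agent away from the start-state distribution
# increases this distance; a goal-conditioned policy would instead
# minimize it against samples from the target distribution.
start = np.zeros(100)        # states near the start
visited = np.full(100, 3.0)  # states reached under a skill
d = wasserstein_1d(start, visited)  # 3.0
```

Maximizing `d` rewards skills that travel far from the start; minimizing it rewards reaching the target.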
Using Imitation from Observation techniques to accelerate policy transfer between environments with mismatched dynamics
A mixing scheme that balances individual agent preferences against shared objectives, with an analysis of the resulting learning behavior
Reducing the error due to sampling in batch TD learning
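In batch TD learning, a fixed set of transitions is replayed until the values converge; the fixed point then reflects the empirical model implied by the batch, so sampling error in the batch propagates into the value estimates. A minimal TD(0) replay sketch (a generic setup, not the paper's correction method):

```python
import numpy as np

def batch_td0(transitions, n_states, alpha=0.1, gamma=0.9, sweeps=50):
    # Repeatedly sweep a fixed batch of (s, r, s') transitions with
    # TD(0); s' is None for terminal transitions. Whatever sampling
    # noise the finite batch contains is baked into the fixed point.
    V = np.zeros(n_states)
    for _ in range(sweeps):
        for s, r, s2 in transitions:
            target = r if s2 is None else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])
    return V

# Two-state chain: 0 -> 1 (reward 0), 1 -> terminal (reward 1).
batch = [(0, 0.0, 1), (1, 1.0, None)]
V = batch_td0(batch, n_states=2)  # V[1] -> 1.0, V[0] -> gamma * V[1]
```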
Using policy gradient to learn navigation in knowledge bases
Constraining TD updates to improve stability
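One common way to constrain a TD update is to bound the TD error before applying it, so no single transition can move a value estimate too far. A toy sketch of that idea (the paper's specific constraint may be different):

```python
def clipped_td_update(v, target, alpha=0.1, clip=1.0):
    # Clip the TD error (target - v) to [-clip, clip] before the
    # usual learning-rate step, limiting the size of any one update.
    delta = max(-clip, min(clip, target - v))
    return v + alpha * delta

# An outlier target of 10.0 moves the value only as far as a
# target of 1.0 would, since the error is clipped at 1.0.
v_outlier = clipped_td_update(0.0, 10.0)  # 0.1
v_normal = clipped_td_update(0.0, 0.5)    # 0.05
```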
Improving GAN training stability and quality with multiple discriminators
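With multiple discriminators, the generator's per-discriminator losses must be aggregated into a single training signal; averaging weights all discriminators equally, while a softmax over losses emphasizes the discriminators the generator currently fools least. A sketch of these two standard aggregation choices (illustrative; not necessarily the paper's scheme):

```python
import numpy as np

def aggregate_discriminator_losses(losses, mode="average", temp=1.0):
    # Combine the generator's losses against several discriminators.
    # "average": equal weight per discriminator.
    # "softmax": weight each loss by softmax(temp * losses), so the
    #            hardest discriminators dominate the gradient signal.
    losses = np.asarray(losses, float)
    if mode == "average":
        return float(losses.mean())
    w = np.exp(temp * losses)
    w /= w.sum()
    return float(np.dot(w, losses))

avg = aggregate_discriminator_losses([1.0, 3.0])                  # 2.0
hard = aggregate_discriminator_losses([1.0, 3.0], mode="softmax")  # > 2.0
```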
Predicting nonstationarity in real-world scenarios to improve off-policy evaluation
Mapping VAE decoder samples back to latent space for improved generative accuracy
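Mapping a decoded sample back to latent space can be posed as inversion: find the code z whose decoding best reconstructs the sample. A toy sketch using finite-difference gradient descent on a known linear "decoder" (the paper's mapping is learned; this only illustrates the inversion objective):

```python
import numpy as np

def invert_decoder(decode, x, z_dim, steps=500, lr=0.1, eps=1e-4):
    # Minimize ||decode(z) - x||^2 over z with forward-difference
    # gradients, so the sketch works for any black-box decoder.
    z = np.zeros(z_dim)
    for _ in range(steps):
        base = np.sum((decode(z) - x) ** 2)
        grad = np.zeros(z_dim)
        for i in range(z_dim):
            zp = z.copy()
            zp[i] += eps
            grad[i] = (np.sum((decode(zp) - x) ** 2) - base) / eps
        z -= lr * grad
    return z

# Toy linear decoder with a known latent: inversion should recover it.
W = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
decode = lambda z: W @ z
z_true = np.array([0.5, -0.3])
z_hat = invert_decoder(decode, decode(z_true), z_dim=2)
```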
Extending DQN with temporally extended macro-actions for improved exploration and learning
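A macro-action can be implemented as a wrapper that repeats a chosen primitive action for k steps, accumulating discounted reward, so the agent commits to temporally extended behavior. A minimal sketch with a toy chain environment (a gym-like `step` interface is assumed; this is not the paper's exact construction):

```python
class ChainEnv:
    # Toy environment: walk right along a chain; reward 1 at the end.
    def __init__(self, length=5):
        self.pos, self.length = 0, length

    def step(self, action):
        self.pos += 1 if action == 1 else 0
        done = self.pos >= self.length
        return self.pos, (1.0 if done else 0.0), done

class MacroActionEnv:
    # Repeats each chosen action k times, accumulating the discounted
    # reward, and stops early if the episode terminates mid-macro.
    def __init__(self, env, k, gamma=0.99):
        self.env, self.k, self.gamma = env, k, gamma

    def step(self, action):
        total, discount = 0.0, 1.0
        for _ in range(self.k):
            obs, r, done = self.env.step(action)
            total += discount * r
            discount *= self.gamma
            if done:
                break
        return obs, total, done

# One macro-action of "move right" x5 finishes the whole chain.
env = MacroActionEnv(ChainEnv(5), k=5)
obs, r, done = env.step(1)
```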
Swarm-inspired technique for convex optimization using cohort-based self-supervised learning