Learning Policy from Suboptimal Demonstrations under Transition Dynamic Mismatch. Kim, Taesu; Wang, Jianhong; Antotsiou, Dafni; Kim, Tae-Kyun, Thirty-ninth International Conference on Machine Learning, Workshop on Adaptive Experimental Design and Active Learning, ICML, 2022-07-22
Modelling hierarchical structure between dialogue policy and natural language generator with option framework for task-oriented dialogue system. Wang, Jianhong; Zhang, Y; Kim, Tae-Kyun; Gu, Yunjie, Ninth International Conference on Learning Representations (ICLR), The International Conference on Learning Representations (ICLR), 2021-05-05
Shapley Q-value: A Local Reward Approach to Solve Global Reward Games. Wang, Jianhong; Zhang, Yuan; Kim, Tae-Kyun; Gu, Yunjie, 34th AAAI Conference on Artificial Intelligence, AAAI 2020, pp. 7285-7292, AAAI, 2020-02-08
SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning. Wang, Jianhong; Wang, Jinxin; Zhang, Yuan; Gu, Yunjie; Kim, Tae-Kyun, 36th Conference on Neural Information Processing Systems, NeurIPS 2022, NeurIPS, 2022-11-30