SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning

DC Field: Value (Language)
dc.contributor.author: Park, Jongjin (ko)
dc.contributor.author: Seo, Younggyo (ko)
dc.contributor.author: Shin, Jinwoo (ko)
dc.contributor.author: Lee, Honglak (ko)
dc.contributor.author: Abbeel, Pieter (ko)
dc.contributor.author: Lee, Kimin (ko)
dc.date.accessioned: 2023-12-12T11:03:00Z
dc.date.available: 2023-12-12T11:03:00Z
dc.date.created: 2023-12-08
dc.date.issued: 2022-04-26
dc.identifier.citation: 10th International Conference on Learning Representations, ICLR 2022
dc.identifier.uri: http://hdl.handle.net/10203/316334
dc.description.abstract: Preference-based reinforcement learning (RL) has shown potential for teaching agents to perform target tasks without a costly, pre-defined reward function, by learning the reward from a supervisor's preferences between two agent behaviors. However, preference-based learning often requires a large amount of human feedback, making it difficult to apply this approach broadly. In the context of supervised learning, this data-efficiency problem has typically been addressed by using unlabeled samples or data augmentation techniques. Motivated by the recent success of these approaches, we present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation. To leverage unlabeled samples for reward learning, we infer pseudo-labels for the unlabeled samples based on the confidence of the preference predictor. To further improve the label-efficiency of reward learning, we introduce a new data augmentation technique that temporally crops consecutive subsequences from the original behaviors. Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the state-of-the-art preference-based method on a variety of locomotion and robotic manipulation tasks.
dc.language: English
dc.publisher: International Conference on Learning Representations
dc.title: SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning
dc.type: Conference
dc.identifier.scopusid: 2-s2.0-85140724160
dc.type.rims: CONF
dc.citation.publicationname: 10th International Conference on Learning Representations, ICLR 2022
dc.identifier.conferencecountry: US
dc.identifier.conferencelocation: Virtual
dc.contributor.localauthor: Shin, Jinwoo
dc.contributor.localauthor: Lee, Kimin
dc.contributor.nonIdAuthor: Lee, Honglak
dc.contributor.nonIdAuthor: Abbeel, Pieter
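
The two ingredients described in the abstract, confidence-based pseudo-labeling of unlabeled segment pairs and temporal cropping of behavior segments, can be sketched as follows. This is an illustrative sketch under a standard Bradley-Terry-style preference model, not the authors' implementation; the function names and the confidence threshold are assumptions.

```python
import numpy as np


def preference_probability(reward_fn, seg0, seg1):
    # Bradley-Terry-style preference model: the probability that seg0 is
    # preferred over seg1 from the summed predicted rewards of each segment.
    r0 = sum(reward_fn(s) for s in seg0)
    r1 = sum(reward_fn(s) for s in seg1)
    return 1.0 / (1.0 + np.exp(r1 - r0))


def pseudo_label(reward_fn, seg0, seg1, threshold=0.9):
    # Assign a pseudo-label only when the preference predictor is confident.
    # The 0.9 threshold is an illustrative choice, not the paper's value.
    p = preference_probability(reward_fn, seg0, seg1)
    if p > threshold:
        return 0          # seg0 pseudo-labeled as preferred
    if p < 1.0 - threshold:
        return 1          # seg1 pseudo-labeled as preferred
    return None           # discard the low-confidence pair


def temporal_crop(segment, min_len, max_len, rng):
    # Temporal cropping augmentation: sample a consecutive subsequence
    # of random length from the original behavior segment.
    length = int(rng.integers(min_len, max_len + 1))
    start = int(rng.integers(0, len(segment) - length + 1))
    return segment[start:start + length]
```

With a toy per-step reward such as `lambda s: s`, a pair whose first segment accumulates clearly higher reward gets pseudo-label 0, an ambiguous pair is discarded, and `temporal_crop` returns a random consecutive slice whose length lies in `[min_len, max_len]`.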
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
