Rewards Prediction-Based Credit Assignment for Reinforcement Learning With Sparse Binary Rewards

Cited 32 times in Web of Science · Cited 18 times in Scopus
  • Hits: 666
  • Downloads: 480
DC Field | Value | Language
dc.contributor.author | Seo, Minah | ko
dc.contributor.author | Vecchietti, Luiz Felipe | ko
dc.contributor.author | Lee, Sangkeum | ko
dc.contributor.author | Har, Dongsoo | ko
dc.date.accessioned | 2019-09-24T11:21:52Z | -
dc.date.available | 2019-09-24T11:21:52Z | -
dc.date.created | 2019-09-24 | -
dc.date.issued | 2019-08 | -
dc.identifier.citation | IEEE ACCESS, v.7, pp.118776 - 118791 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | http://hdl.handle.net/10203/267665 | -
dc.description.abstract | In reinforcement learning (RL), the reinforcement signal may be infrequent and delayed, not appearing immediately after the action that triggered the reward. Tracing back which actions in a sequence contribute to a delayed reward, i.e., credit assignment (CA), is one of the biggest challenges in RL. The challenge is aggravated under sparse binary rewards, especially when a reward is given only after successful completion of the task. To address this, a novel method is proposed that combines key-action detection, within the sequence of actions performed under sparse binary rewards, with a CA strategy. The key-action, defined as the most important action contributing to the reward, is detected by a deep neural network that predicts future rewards from environment information. The reward is then re-assigned to the key-action and its adjacent actions, defined as adjacent-key-actions. This re-assignment increases the success rate and convergence speed during training. For efficient re-assignment, two CA strategies are considered as part of the proposed method. The proposed method is combined with hindsight experience replay (HER) for experiments in the OpenAI Gym robotics suite. The experiments demonstrate that the proposed method detects key-actions and outperforms HER, increasing success rate and convergence speed, in the Fetch slide task, a task that is more demanding than the other tasks and is addressed by few publications in the literature. From the experiments, a guideline for selecting the CA strategy according to goal location is provided through goal distribution analysis with a dot map. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Rewards Prediction-Based Credit Assignment for Reinforcement Learning With Sparse Binary Rewards | -
dc.type | Article | -
dc.identifier.wosid | 000484355400003 | -
dc.identifier.scopusid | 2-s2.0-85089409425 | -
dc.type.rims | ART | -
dc.citation.volume | 7 | -
dc.citation.beginningpage | 118776 | -
dc.citation.endingpage | 118791 | -
dc.citation.publicationname | IEEE ACCESS | -
dc.identifier.doi | 10.1109/ACCESS.2019.2936863 | -
dc.contributor.localauthor | Har, Dongsoo | -
dc.contributor.nonIdAuthor | Seo, Minah | -
dc.description.isOpenAccess | Y | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Credit assignment | -
dc.subject.keywordAuthor | delayed rewards | -
dc.subject.keywordAuthor | goal distribution | -
dc.subject.keywordAuthor | reinforcement learning | -
dc.subject.keywordAuthor | reward shaping | -
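
The abstract describes the core mechanism: a reward-prediction network scores the transitions of an episode, the most important step is treated as the key-action, and the sparse binary reward is re-assigned to that step and to its adjacent-key-actions. The sketch below illustrates this idea in Python. It is not the authors' implementation: the predict_reward callable stands in for the paper's deep neural network, picking the step with the highest predicted future reward is one plausible detection rule, and the window and assigned_reward values are illustrative assumptions.

# Minimal sketch of reward re-assignment to a detected key-action, under the
# assumptions stated above (not the paper's implementation).
import numpy as np

def reassign_rewards(episode, predict_reward, window=2, assigned_reward=1.0):
    """episode: list of (obs, action, reward) tuples with sparse binary rewards.
    predict_reward: callable obs -> predicted future reward (stand-in for the DNN).
    Returns the re-assigned reward array and the detected key-action index."""
    observations = [t[0] for t in episode]
    rewards = np.array([t[2] for t in episode], dtype=float)
    # Score every step with the reward-prediction model; treat the highest-scoring
    # step as the key-action (assumed detection rule).
    scores = np.array([predict_reward(obs) for obs in observations])
    key_idx = int(np.argmax(scores))
    # Re-assign reward to the key-action and its neighbours (adjacent-key-actions).
    new_rewards = rewards.copy()
    lo, hi = max(0, key_idx - window), min(len(rewards), key_idx + window + 1)
    new_rewards[lo:hi] = assigned_reward
    return new_rewards, key_idx

# Toy usage: a 6-step episode with all-zero rewards and a dummy predictor
# whose score peaks at step 3, so step 3 is detected as the key-action.
episode = [(np.array([float(t)]), 0, 0.0) for t in range(6)]
dummy_predictor = lambda obs: -abs(obs[0] - 3.0)
print(reassign_rewards(episode, dummy_predictor))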
