PRTFNet: HRTF Individualization for Accurate Spectral Cues Using a Compact PRTF

DC Field: Value (Language)
dc.contributor.author: Ko, Byeong-Yun (ko)
dc.contributor.author: Lee, Gyeong-Tae (ko)
dc.contributor.author: Nam, Hyeonuk (ko)
dc.contributor.author: Park, Yong-Hwa (ko)
dc.date.accessioned: 2023-10-04T06:02:59Z
dc.date.available: 2023-10-04T06:02:59Z
dc.date.created: 2023-10-04
dc.date.issued: 2023-08
dc.identifier.citation: IEEE ACCESS, v.11, pp.96119 - 96130
dc.identifier.issn: 2169-3536
dc.identifier.uri: http://hdl.handle.net/10203/312973
dc.description.abstract: Spatial audio rendering relies on accurate localization perception, which requires individual head-related transfer functions (HRTFs). Previous deep neural network (DNN) methods for predicting HRTF magnitude spectra from pinna images used the HRTF log-magnitude as the network output during training. However, HRTFs also encompass the acoustical characteristics of the head and torso, making it challenging to reconstruct the spectral cues necessary for elevation localization. To tackle this issue, we propose PRTFNet, which reconstructs the individual spectral cues in HRTFs by mitigating the influence of the head and torso. PRTFNet is an end-to-end convolutional neural network (CNN) that uses as its output a compact pinna-related transfer function (PRTF), which removes the sound reflections of the head and torso from the head-related impulse response (HRIR). Additionally, we introduce HRTF phase personalization, a technique that takes the phase spectra of an HRTF selected from a database and scales the phase by the ratio of the target listener's head width to that of the selected HRTF's subject. We evaluated the proposed HRTF individualization methods on the HUTUBS dataset, and the results demonstrate that PRTFNet is highly effective in reconstructing the first and second spectral cues. In terms of log spectral distortion (LSD) and effective LSD (LSDE), PRTFNet outperforms a previous deep learning-based model. Furthermore, multiplying the selected phase by the head-width ratio reduces the root mean square error (RMSE) of the interaural time difference (ITD) by 0.003 ms.
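The abstract describes two concrete operations: scaling a selected HRTF's phase by a head-width ratio, and evaluating reconstruction with log spectral distortion. The following is a minimal NumPy sketch of both; the function names, the choice to operate on the unwrapped phase, and the epsilon guard are illustrative assumptions, not the authors' published implementation.

```python
import numpy as np

def personalize_phase(selected_hrtf, target_head_width, selected_head_width):
    """Illustrative sketch: scale the (unwrapped) phase of an HRTF selected from a
    database by the ratio of the target listener's head width to the head width of
    the selected HRTF's subject, keeping the magnitude spectrum unchanged."""
    magnitude = np.abs(selected_hrtf)
    phase = np.unwrap(np.angle(selected_hrtf))  # assumption: scaling applied to unwrapped phase
    scaled_phase = phase * (target_head_width / selected_head_width)
    return magnitude * np.exp(1j * scaled_phase)

def log_spectral_distortion(h_ref, h_est, eps=1e-12):
    """Standard log spectral distortion (dB) between a reference and an estimated
    HRTF spectrum; eps avoids taking the log of zero."""
    diff_db = 20.0 * np.log10((np.abs(h_ref) + eps) / (np.abs(h_est) + eps))
    return np.sqrt(np.mean(diff_db ** 2))
```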
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: PRTFNet: HRTF Individualization for Accurate Spectral Cues Using a Compact PRTF
dc.type: Article
dc.identifier.wosid: 001067569700001
dc.identifier.scopusid: 2-s2.0-85168689812
dc.type.rims: ART
dc.citation.volume: 11
dc.citation.beginningpage: 96119
dc.citation.endingpage: 96130
dc.citation.publicationname: IEEE ACCESS
dc.identifier.doi: 10.1109/ACCESS.2023.3308143
dc.contributor.localauthor: Park, Yong-Hwa
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Head-related transfer functions
dc.subject.keywordAuthor: individualization
dc.subject.keywordAuthor: pinna-related transfer functions
dc.subject.keywordAuthor: spectral cues
dc.subject.keywordAuthor: spatial hearing
dc.subject.keywordPlus: SPATIAL-AUDIO
dc.subject.keywordPlus: SOUND LOCALIZATION
dc.subject.keywordPlus: PARAMETRIC MODEL
dc.subject.keywordPlus: HEAD
dc.subject.keywordPlus: FREQUENCY
dc.subject.keywordPlus: SENSITIVITY
dc.subject.keywordPlus: REGRESSION
dc.subject.keywordPlus: PEAK
Appears in Collection
ME-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
