(A) study on transformer-based image compression framework of spatial scalability for arbitrary scaling applications

In recent years, differences in device quality, such as display resolution and computing performance, have grown among users in multimedia transmission environments. Accordingly, scalable image compression, which encodes images of multiple qualities for different user environments into a single shared bitstream, has been regarded as an essential technique. Recently, end-to-end scalable image compression methods based on convolutional neural networks (CNNs) have been proposed, offering an efficient optimization process and better rate-distortion (RD) performance than existing methods built on traditional compression standards. However, CNN-based scalable image compression has two critical limitations in functionality and performance. First, a purely CNN-based architecture cannot generate images at arbitrary desired scales because its CNN super-resolution module can only upscale inputs by discrete ratios. Second, the compression model cannot capture the global relationship among all pixels because its receptive field is limited by the kernel size of the CNN layers, which in turn limits RD performance. To overcome these two limitations, this thesis proposes a novel spatially scalable image compression framework that can generate arbitrarily scaled outputs with high image quality, regardless of the number of outputs or their resolutions. The proposed framework uses an implicit representation function, which represents pixel values as a continuous function of image coordinates, so scaled outputs can be produced at any target resolution. In addition, a Transformer-CNN hybrid architecture, which has recently shown strong results in computer vision, is adopted to improve the RD performance of the compression model. The proposed transformer module, the Multi-Window-size RSTB (MW-RSTB), consists of multiple Residual Swin Transformer Blocks whose local window self-attention layers use different window sizes. By adding this module to an existing CNN-based image compression model, the architecture can learn relationships across all pixels of the image. This thesis presents the first attempt at arbitrarily scalable image compression to any continuous scale, which has not yet been studied in the deep learning-based scalable image compression literature. We also demonstrate that the proposed framework produces consistent image quality for any scaled output. Furthermore, extensive experiments show that, at similar bitrates in the discrete-scale scenario, the framework outperforms SHVC (the scalable extension of HEVC) and a state-of-the-art CNN-based scalable image compression method by +1.30 dB and +0.99 dB in PSNR, respectively.
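
Since no files are attached to this record, the thesis code is not available; the following is only an illustrative sketch of the coordinate-based decoding idea the abstract describes, in the spirit of implicit image representations such as LIIF. The names (ImplicitDecoder, make_coords), network sizes, and feature-sampling scheme are assumptions for illustration, not the thesis implementation.

# Hypothetical sketch (not the thesis code): an implicit representation decoder.
# A small MLP maps (queried latent feature, continuous coordinate) pairs to RGB,
# so any output resolution can be queried from one decoded feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitDecoder(nn.Module):
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        # Input: per-query latent feature concatenated with its 2-D query coordinate.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),  # RGB value at the queried coordinate
        )

    def forward(self, feat, coords):
        # feat:   (B, C, h, w) decoded latent feature map
        # coords: (B, N, 2) continuous query coordinates in [-1, 1]
        # Sample the nearest latent vector for every query coordinate.
        q = F.grid_sample(feat, coords.unsqueeze(1), mode='nearest',
                          align_corners=False)            # (B, C, 1, N)
        q = q.squeeze(2).permute(0, 2, 1)                  # (B, N, C)
        return self.mlp(torch.cat([q, coords], dim=-1))    # (B, N, 3)

def make_coords(h, w, device='cpu'):
    # Pixel-centre coordinates of an arbitrary (h, w) target grid, in [-1, 1].
    ys = (torch.arange(h, device=device) + 0.5) / h * 2 - 1
    xs = (torch.arange(w, device=device) + 0.5) / w * 2 - 1
    gy, gx = torch.meshgrid(ys, xs, indexing='ij')
    return torch.stack([gx, gy], dim=-1).reshape(1, -1, 2)  # (1, H*W, 2)

# Usage: decode the same latent to two arbitrary, non-integer-ratio resolutions.
dec = ImplicitDecoder()
feat = torch.randn(1, 64, 32, 32)
for h, w in [(100, 150), (137, 211)]:
    rgb = dec(feat, make_coords(h, w)).reshape(1, h, w, 3)

Because the decoder is queried per coordinate rather than by a fixed upscaling layer, the same latent representation can serve any number of target resolutions, which is what enables the continuous-scale behaviour the abstract claims.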
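The MW-RSTB is likewise only described at a high level in the abstract. The sketch below illustrates the general multi-window idea (parallel window self-attention branches with different window sizes, fused and added back residually) using plain multi-head attention instead of full Residual Swin Transformer Blocks; MultiWindowBlock, the window sizes, and the 1x1-convolution fusion are assumptions for illustration, not the thesis design.

# Hypothetical sketch (not the thesis code) of the multi-window idea behind MW-RSTB:
# run window-based self-attention with several window sizes in parallel and fuse
# the branches, so the block mixes information over more than one spatial range.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim, window, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W); H and W are assumed divisible by the window size.
        B, C, H, W = x.shape
        w = self.window
        # Partition into non-overlapping w x w windows -> (B*nWin, w*w, C) tokens.
        t = x.reshape(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        t, _ = self.attn(t, t, t)
        # Merge the windows back into the (B, C, H, W) layout.
        t = t.reshape(B, H // w, W // w, w, w, C)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

class MultiWindowBlock(nn.Module):
    def __init__(self, dim=64, windows=(4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList(WindowAttention(dim, w) for w in windows)
        self.fuse = nn.Conv2d(dim * len(windows), dim, kernel_size=1)

    def forward(self, x):
        # Residual connection around the fused multi-window attention branches.
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(y)

# Usage: a feature map whose sides are divisible by every window size used.
block = MultiWindowBlock()
out = block(torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)

Mixing several window sizes is one plausible way to widen the effective receptive field beyond a single fixed window, which is the limitation of pure CNN layers the abstract points to.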
Advisors
Kim, Munchurl (김문철)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.8, [vi, 50 p.]

Keywords

Scalable coding; Image compression; Transformer; Deep learning

URI
http://hdl.handle.net/10203/309821
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008345&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
