Vision processor design based on unified FAST-BRIEF hardware and variation-aware power estimation technique

As the demand for high-quality vision processing on mobile devices such as smartphones, tablet PCs, and ADAS (advanced driver assistance systems) increases, hardware support for various vision algorithms becomes essential in mobile environments. To support these vision algorithms under the trend toward higher quality and higher resolution in embedded systems, the required hardware resources also grow super-linearly with the amount of data to be processed. Therefore, (1) high processing performance is required, along with (2) low power consumption and (3) a small implementation area. Furthermore, (4) sufficient external memory bandwidth must be guaranteed to support the maximum target frame rate. In this dissertation, an energy-efficient vision processor is presented for real-time low-level vision processing on mobile devices. It includes a reconfigurable dedicated accelerator with a unified memory scheme, a specialized cache memory for reducing external memory overhead, and parallel processing cores for general-purpose vision processing on a mobile platform. The vision accelerators based on the optimized memory architecture show better system efficiency in terms of power, performance, and area than the parallel processing cores when executing low-level vision algorithms. Meanwhile, a dynamic power management scheme based on a real-time variation analysis method is proposed, because the parallel processing cores generally consume more than 50% of the total power.

Interest point extraction and matching algorithms are essential in most vision tasks, such as object tracking, localization, SLAM (simultaneous localization and mapping), image matching, recognition, and image stitching. However, it is not easy to detect features from high-resolution video streams in real time even with high computing power. Many hardware architectures based on parallel processing cores have been proposed to resolve this problem, but state-of-the-art implementations achieve only 30 fps with VGA images ($640\times 480$) and suffer from massive area/power overhead. A unified interest point detection and matching accelerator is therefore presented for embedded vision applications. It performs image-based recognition in real time in both mobile and vehicle environments. The proposed system is implemented as a small IP and achieves 8 times higher throughput than state-of-the-art object recognition processors implemented on heterogeneous many-core systems. The accelerator has three key features: 1) joint algorithm-architecture optimizations for exploiting bit-level parallelism, 2) a low-power unified hardware platform for interest point detection and matching, and 3) a scalable hardware architecture. It consists of 78.3k logic gates and 128 kB of SRAM, integrated in a test chip for verification.

Both interest point detection and matching operations are required for the general recognition process. These two operations are functionally independent, so separate hardware would otherwise have to be implemented for the complete recognition process, which causes two critical problems: 1) area-efficiency loss and 2) unbalanced workload. In order to resolve these problems, a unified hardware platform is proposed that shares the same hardware between interest point detection and matching as a result of joint algorithm-architecture co-optimization. Since the proposed hardware is a multi-functional accelerator exploiting bit-level parallelism, it is a good solution for mitigating the area overhead in a mobile environment. Furthermore, it resolves the performance degradation caused by load unbalancing. The proposed hardware achieves a $9.5\times$ performance improvement with only 30% of the logic gates (including SRAM) of the state-of-the-art object recognition processors. The unified interest point detection and matching hardware with the optimized memory architecture is also used for a real-time high-resolution stereo matching system.
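The bit-level parallelism mentioned above stems from the fact that BRIEF descriptors are plain bit strings, so matching reduces to XOR and popcount operations. The following Python sketch illustrates only that structure; the sampling pattern, descriptor length, thresholds, and function names are illustrative assumptions, not the dissertation's hardware design.

```python
# Minimal BRIEF-style sketch (illustrative only, not the dissertation's RTL).
# A binary descriptor is built from pairwise intensity comparisons, and two
# descriptors are matched with a Hamming distance (XOR + popcount) -- the
# bit-level operations that a unified FAST-BRIEF datapath can exploit.
import numpy as np

rng = np.random.default_rng(0)
# 256 random point pairs inside a 31x31 patch (sampling pattern is an assumption).
PAIRS = rng.integers(-15, 16, size=(256, 4))

def brief_descriptor(image, x, y):
    """Return a 256-bit descriptor for keypoint (x, y) as a Python int."""
    desc = 0
    for i, (dx1, dy1, dx2, dy2) in enumerate(PAIRS):
        if image[y + dy1, x + dx1] < image[y + dy2, x + dx2]:
            desc |= 1 << i          # one comparison -> one descriptor bit
    return desc

def hamming(d1, d2):
    """Number of differing bits: XOR followed by popcount."""
    return (d1 ^ d2).bit_count()

def match(descs_a, descs_b, max_dist=64):
    """Brute-force nearest neighbour in Hamming space (no PCIM-style pruning)."""
    matches = []
    for i, da in enumerate(descs_a):
        j, dist = min(((j, hamming(da, db)) for j, db in enumerate(descs_b)),
                      key=lambda t: t[1])
        if dist <= max_dist:
            matches.append((i, j, dist))
    return matches
```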
In order to accelerate the stereo matching algorithm, the unified data-path in the hardware performs in real time not only interest point detection and matching algorithms such as FAST and BRIEF but also the Census transform, which is widely used in stereo matching. To achieve maximum performance, two special memory architectures are proposed: the reconfigurable image memory (RIM) and the point cloud index memory (PCIM). RIM is a unified memory architecture for loading pixel values from a raw image patch. Since FAST, BRIEF, and the Census transform have different and complex memory access patterns, the miss rate of memory accesses could otherwise increase, so RIM changes its memory configuration according to the algorithm. Because the joint algorithm-architecture co-optimization mitigates the performance degradation caused by bank conflicts, the unified reconfigurable memory scheme offers a great deal of flexibility at minimal hardware overhead. PCIM is a dedicated memory system that utilizes the geometric information of the cameras to reduce the off-chip memory bandwidth. Based on this geometric information, PCIM removes most of the redundant matching candidates. Since PCIM minimizes the off-chip memory bandwidth using a dedicated cache, the performance degradation is negligible compared to the exact nearest-neighbor method. The area-based stereo matching is accelerated on a GPGPU architecture, with the search range adaptively reduced according to the disparity of the matched correspondences.
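The Census transform shares the same compare-and-pack structure as BRIEF, which is what allows a unified data-path to serve both. The sketch below shows a generic 5x5 Census transform and a Hamming matching cost along one scanline with a bounded disparity search; the window size, search range, and winner-takes-all cost are assumptions for illustration, and the RIM/PCIM memory optimizations are not modeled.

```python
# Census-transform sketch (illustrative). Like BRIEF, the Census transform packs
# pixel comparisons into a bit string, so the same compare-and-popcount datapath
# can serve both detection/matching and stereo cost computation.
import numpy as np

def census_transform(image, radius=2):
    """5x5 Census transform: each pixel becomes a 24-bit signature of
    'neighbour < centre' comparisons (border wrap-around ignored for brevity)."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.uint32)
    bit = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            shifted = np.roll(np.roll(image, -dy, axis=0), -dx, axis=1)
            out |= (shifted < image).astype(np.uint32) << bit
            bit += 1
    return out

def disparity_wta(census_l, census_r, x, y, max_disp=64):
    """Hamming matching cost along one scanline; in the described system the
    search range would be reduced adaptively from already-matched points."""
    costs = []
    for d in range(min(max_disp, x) + 1):
        diff = int(census_l[y, x]) ^ int(census_r[y, x - d])
        costs.append(diff.bit_count())
    return int(np.argmin(costs))  # winner-takes-all disparity
```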
To fully support the massive computational requirements of vision algorithms, a GPGPU (general-purpose GPU)-based multi-core parallel processing architecture is essential. Since the parallel processing cores generally consume more than 50% of the total power, a many-core power management technique such as dynamic voltage and frequency scaling (DVFS) is required for energy-efficient vision processing. As advanced technology makes it possible to integrate more transistors on a chip, circuits suffer from large process, voltage, and temperature (PVT) variations that affect dynamic and leakage power consumption in deep-submicron technologies (20 nm and beyond). However, because state-of-the-art dynamic power management schemes do not consider these power variations, it is difficult to estimate power consumption accurately for a given power management configuration. To understand the power variation of the integrated parallel cores, a machine-learning-based variation analysis technique is proposed. It estimates the power characteristics of each core in real time from the total power and the activation events of the on-chip cores. A many-core power management technique is also presented to improve energy efficiency according to the target application; the proposed method finds the energy-optimal V-F configuration of all cores within a few microseconds. The overall hardware consists of 1.20M logic gates and consumes at most 185 mW. The interest point detection and matching accelerator achieves 106 frames per second (fps) at 1080p full-HD resolution and a 200 MHz operating frequency with 3,500 descriptors per image. The proposed many-core power management technique is verified in a 65 nm low-power CMOS process and also evaluated in a more advanced CMOS technology.
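As a rough software analogue of estimating per-core power characteristics from total power and activation events, per-core coefficients can be recovered by linear regression. The sketch below uses ordinary least squares on synthetic data; the linear power model, counter definitions, and numbers are assumptions and do not reproduce the on-chip machine-learning technique or its real-time constraints.

```python
# Assumption-laden software sketch of variation-aware power estimation: per-core
# dynamic coefficients and a shared leakage term are fitted from measurements of
# *total* chip power and per-core activation counts, so each core's PVT-dependent
# power characteristic can be estimated without per-core power sensors.
import numpy as np

def estimate_core_power(activity, total_power):
    """activity: (samples, cores) activation-event counts per interval.
    total_power: (samples,) measured total power per interval.
    Returns per-core dynamic coefficients and a shared static/leakage term."""
    samples, cores = activity.shape
    # Design matrix: one column per core plus a constant column for leakage.
    X = np.hstack([activity, np.ones((samples, 1))])
    coeffs, *_ = np.linalg.lstsq(X, total_power, rcond=None)
    return coeffs[:cores], coeffs[cores]

# Example with synthetic data: 4 cores whose per-event energies differ because
# of process variation (all values below are invented for illustration).
rng = np.random.default_rng(1)
true_dyn = np.array([0.9, 1.1, 1.0, 1.3])                    # mW per unit activity
activity = rng.integers(0, 100, size=(200, 4)).astype(float)
total = activity @ true_dyn + 25.0 + rng.normal(0, 0.5, 200)  # 25 mW leakage + noise
dyn_est, leak_est = estimate_core_power(activity, total)
# dyn_est recovers the per-core coefficients; a DVFS controller could then pick
# the V-F configuration that minimizes estimated energy for the target frame rate.
```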
Advisors
Kim, Lee-Sup (김이섭)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2015
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2015.8, [xii, 131 p.]

Keywords

Vision processor; Interest point detection; Interest point matching; Power estimation technique; DVFS; Power variation; Unified datapath; Memory architecture; Low power design

URI
http://hdl.handle.net/10203/241998
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=669329&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
