Attention modules for feed-forward neural networks

Deep learning has been a major tool for pattern recognition tasks due to its high representational power. Given powerful baseline architectures with residual connections [27], inception architectures [77], or densely-connected architectures [35], many studies try to find architectures with higher representational power and better generalizability via depth [75], width [95], cardinality [89, 12], and many other aspects of deep neural networks. In this dissertation, we investigate the effect of the attention mechanism in feed-forward models.

First, we propose a simple and lightweight attention module for deep convolutional neural networks (DCNNs), named the Bottleneck Attention Module (BAM). Inspired by the attentional bottleneck in the human brain [57], we place our attention module at every bottleneck of a DCNN. Bottleneck regions in the human brain reduce the quantity of information and keep only the relevant part. In DCNNs, pooling operations explicitly reduce the information in feature maps in a spatial manner. We define such pooling operations as the bottlenecks of DCNNs and add our attention module there to resemble the bottleneck regions of the human brain. Since our module can be added to any DCNN, we keep the parameter and computational overhead as small as possible. For an efficient design, we factorize the module into two separate yet complementary branches: a spatial branch and a channel-wise branch. The spatial branch generates a 2D spatial attention map in which the target object's activation is enhanced, so it looks for 'where' the target exists; the channel-wise branch generates a 1D channel-wise attention map, and since channels are often regarded as feature detectors, it looks for 'what' the target object is. The two branches are then combined into a single 3D attention map with the same size as the input 3D feature map. As a result, we show that BAM boosts performance across various baseline architectures and various tasks.

The second part of this dissertation concerns the attention mechanism with multiple modalities for better representation learning. Specifically, we investigate the use of the attention mechanism together with correspondence learning to tackle the adversarial attack problem. Previous works that rely on a single modality, such as images, are vulnerable to adversarial attacks or fraudulent inputs. To effectively detect fraudulent inputs, we propose a deep neural network that uses multi-modal inputs with attention mechanisms and a correspondence learning scheme. With the attention mechanisms, the network can effectively learn a representation from multiple modalities; with the correspondence learning scheme, the network is forced to check the correspondence among modalities and can thus identify fraudulent inputs. We investigate the proposed approach in a reverse vending machine system, Nephron, where the task is to classify an item into 3 given classes (can, PET, glass) and reject any suspicious input. Specifically, we use 3 different modalities (image, ultrasound, and weight) with multi-modal attention and correspondence learning. As a result, we show that our proposed model can effectively learn to exclude fraudulent inputs while keeping high accuracy in the given classification task.
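To make the factorized design concrete, below is a minimal PyTorch-style sketch of a BAM-like block. The abstract only specifies the two branches and their combination into one 3D attention map; the reduction ratio, the dilated convolutions in the spatial branch, the broadcast-sum combination followed by a sigmoid, and the residual refinement F' = F + F * sigmoid(A) follow common BAM implementations and should be read as assumptions rather than the exact code of the thesis.

# Hypothetical sketch of a BAM-style attention module (PyTorch).
# Assumed details (not stated in the abstract): reduction ratio, dilation,
# broadcast-sum combination of the two branches, residual refinement.
import torch
import torch.nn as nn


class ChannelBranch(nn.Module):
    """1D channel attention: global pooling + bottleneck MLP ('what')."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        pooled = x.mean(dim=(2, 3))            # global average pool -> (B, C)
        att = self.mlp(pooled)                 # (B, C)
        return att[:, :, None, None]           # broadcastable to (B, C, H, W)


class SpatialBranch(nn.Module):
    """2D spatial attention: dilated convs on a reduced map ('where')."""
    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1),              # single-channel map: (B, 1, H, W)
        )

    def forward(self, x):
        return self.body(x)


class BAM(nn.Module):
    """Broadcast the two branch outputs into one 3D map and refine the input."""
    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        self.channel = ChannelBranch(channels, reduction)
        self.spatial = SpatialBranch(channels, reduction, dilation)

    def forward(self, x):
        att = torch.sigmoid(self.channel(x) + self.spatial(x))  # (B, C, H, W)
        return x + x * att                     # residual refinement of the feature


# Usage: insert after a pooling stage ("bottleneck") of any backbone.
feat = torch.randn(2, 64, 56, 56)
refined = BAM(64)(feat)
print(refined.shape)                           # torch.Size([2, 64, 56, 56])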
Advisors
Kweon, In So
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2018.2, [v, 41 p.]

Keywords

Deep learning; neural network; artificial intelligence; attention mechanism; convolutional neural network

URI
http://hdl.handle.net/10203/266705
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=734017&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
