Since their advent, deep networks have shown remarkable performance on a wide range of computer vision tasks.
Two representative network models are the Convolutional Neural Network (CNN), a powerful architecture for processing spatial information in images, and the Long Short-Term Memory (LSTM) network, best known for its exceptional performance on sequential data. In this paper, we propose, for the first time, a method to diagnose both CNN and CNN+LSTM networks in classification tasks. Our idea is based on an analogy between CNNs and the object-processing stages of the human visual cortex. Drawing on this analogy with human visual perception, we introduce a novel method to identify the set of features critical to a CNN's classification performance, and we call this set of features the evidence. The visualized evidence can explain why misclassifications occur and helps in understanding the constraints of otherwise high-performing networks. Finally, we address the limitations of our work and suggest directions for further study.