In this paper, we propose a through-silicon via (TSV) array design optimization method based on a deep reinforcement learning (DRL) framework. An agent trained with the proposed method can provide an optimal TSV array that minimizes far-end crosstalk (FEXT) in a single step. We define the state, action, and reward, which are the elements of the Markov decision process (MDP), for optimizing the TSV array with respect to FEXT, and we train a deep Q-network (DQN) agent. For verification, we applied the proposed method to a 3 × 3 TSV array in the stacked DRAM of a High Bandwidth Memory (HBM). The network converged well, and as a result, the proposed method provided an optimal design that satisfies the target FEXT, which is 3 dB lower than that of the initial design.
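
To make the MDP formulation concrete, the following is a minimal illustrative sketch and not the method of this paper. It assumes a hypothetical state encoding (a flattened 3 × 3 grid with 1 for a signal TSV and 0 for a ground TSV), a hypothetical action set (swapping two positions), and a toy reward based on `crosstalk_proxy`, which simply counts adjacent signal pairs and stands in for a real FEXT evaluation. For self-containment it uses tabular Q-learning in place of a DQN; all names and hyperparameters are assumptions.

```python
import itertools
import random

N = 3  # 3 x 3 array, matching the size of the verification case

# Hypothetical action set: swap the contents of two array positions.
ACTIONS = list(itertools.combinations(range(N * N), 2))


def crosstalk_proxy(state):
    """Toy stand-in for FEXT (assumption): count horizontally or
    vertically adjacent signal-signal pairs in the flattened grid."""
    count = 0
    for r in range(N):
        for c in range(N):
            if state[r * N + c] != 1:
                continue
            if c + 1 < N and state[r * N + c + 1] == 1:
                count += 1
            if r + 1 < N and state[(r + 1) * N + c] == 1:
                count += 1
    return count


def step(state, action):
    """Apply a swap; the reward is the reduction in the proxy."""
    i, j = action
    nxt = list(state)
    nxt[i], nxt[j] = nxt[j], nxt[i]
    nxt = tuple(nxt)
    reward = crosstalk_proxy(state) - crosstalk_proxy(nxt)
    return nxt, reward


def train(initial, episodes=2000, horizon=4, alpha=0.5, gamma=0.9,
          eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration
    (a simple substitute for the DQN agent)."""
    rng = random.Random(seed)
    q = {}  # maps (state, action index) -> estimated value
    n_actions = len(ACTIONS)
    for _episode in range(episodes):
        s = initial
        for _t in range(horizon):
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda k: q.get((s, k), 0.0))
            s2, r = step(s, ACTIONS[a])
            best_next = max(q.get((s2, k), 0.0) for k in range(n_actions))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return q


if __name__ == "__main__":
    # Hypothetical initial design: four signal TSVs clustered in one corner.
    initial = (1, 1, 0,
               1, 1, 0,
               0, 0, 0)
    q = train(initial)
    best = max(range(len(ACTIONS)), key=lambda k: q.get((initial, k), 0.0))
    nxt, reward = step(initial, ACTIONS[best])
    print("greedy first action:", ACTIONS[best], "proxy after:", crosstalk_proxy(nxt))
```

In an actual design flow, the proxy would be replaced by a FEXT value obtained from a channel model or simulation, and the table `q` by a neural network that generalizes across array configurations.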