Tucker decomposition has been widely used to analyze multidimensional data modeled as tensors. Recently, as real-world tensors have grown overwhelmingly large, often containing billions of nonzeros, fast and scalable Tucker decomposition methods have become increasingly important. Several GPU-based Tucker decomposition methods have been proposed to accelerate the decomposition; however, they easily fail to process large-scale tensors because their memory requirements exceed the capacity of GPU memory. This paper presents GTucker, a scalable GPU-based Tucker decomposition method that carefully partitions a large-scale tensor into subtensors and processes them with reduced overhead on a single machine. Experimental results show that GTucker outperforms state-of-the-art methods in both scalability and decomposition speed.
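The partitioning idea mentioned above can be illustrated with a minimal sketch: splitting a sparse tensor in COO format into a grid of subtensors so that each block fits in GPU memory. The function name, COO layout, and blocking scheme here are illustrative assumptions, not GTucker's actual implementation.

```python
import numpy as np

def partition_coo_tensor(indices, values, dims, parts_per_mode):
    """Partition a sparse COO tensor into a grid of subtensors.

    indices:        (nnz, order) array of nonzero coordinates
    values:         (nnz,) array of nonzero values
    dims:           tensor dimensions, e.g. (I, J, K)
    parts_per_mode: number of partitions along each mode

    Returns a dict mapping each grid cell to its (indices, values).
    NOTE: a hypothetical helper for illustration only.
    """
    order = len(dims)
    # Block size along every mode (the last block may be smaller).
    block = [int(np.ceil(dims[m] / parts_per_mode[m])) for m in range(order)]
    # Grid cell of every nonzero: integer division of its coordinates.
    cells = np.stack([indices[:, m] // block[m] for m in range(order)], axis=1)
    subtensors = {}
    for cell in np.unique(cells, axis=0):
        mask = np.all(cells == cell, axis=1)
        subtensors[tuple(cell)] = (indices[mask], values[mask])
    return subtensors

# Toy 4x4x4 tensor with 5 nonzeros, split into 2 blocks per mode.
idx = np.array([[0, 0, 0], [1, 2, 3], [3, 3, 3], [2, 0, 1], [0, 3, 2]])
val = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
subs = partition_coo_tensor(idx, val, (4, 4, 4), (2, 2, 2))
print(sorted(subs.keys()))  # → [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
```

Each subtensor can then be streamed to the GPU and processed independently, which is what makes out-of-core computation on a single machine feasible.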