Cluster tools, which consist of multiple process chambers, are widely used in the semiconductor industry. As wafer processing becomes more sophisticated, the operation of cluster tools is also being improved. To operate cluster tools effectively, several rule-based schedules, such as the swap sequence, have been developed. However, scheduling under time variance has not been fully considered. In this paper, we propose a cluster tool modeling method that can handle time variance in a dual-armed cluster tool. We then present a reinforcement learning process based on the proposed cluster tool model to find new operational schedules for specific configurations. To measure the performance of the newly obtained schedule, we compare the makespan under the new policy with that under the swap policy. The makespan is reduced relative to the conventional swap policy, which suggests that reinforcement learning successfully learned the operational schedule in the time variance environment.