Assuring the stability of the guidance law for quadrotor-type Urban Air Mobility (UAM) vehicles is important because they are expected to operate in urban areas. Model-free reinforcement learning has been applied extensively for this purpose in recent studies. In reinforcement learning, the environment is an important part of training. The Proximal Policy Optimization (PPO) algorithm is widely used for reinforcement learning of quadrotors. However, PPO tends to fail to guarantee the stability of the guidance law as the search space of the environment increases. In this work, we show that applying the Soft Actor-Critic (SAC) reinforcement learning algorithm improves stability in a multi-agent quadrotor-type UAM environment. The simulations were performed in Unity. In the UAM environment, our approach achieved a reward three times higher than training with PPO, and it also trained faster than PPO.
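
For reference, the two algorithms compared above optimize different objectives. The block below is a sketch of their standard textbook forms, not a statement of the exact losses or hyperparameters used in this work. The symbols are the conventional ones and are assumptions here: $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ an advantage estimate, $\epsilon$ the clipping range, $R$ the reward function, $\alpha$ the entropy temperature, and $\mathcal{H}$ the policy entropy.

```latex
% PPO: clipped surrogate objective (on-policy)
\[
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
\]

% SAC: maximum-entropy objective (off-policy)
\[
J(\pi)
  = \mathbb{E}_{\tau \sim \pi}\!\left[
      \sum_{t} \Big( R(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)
    \right]
\]
```

The entropy bonus in the SAC objective encourages exploration, and SAC learns off-policy from replayed experience, whereas PPO updates only from freshly collected on-policy data. These general properties may contribute to the behaviour reported above in a large search space, although the abstract itself does not attribute the improvement to a specific mechanism.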