In this paper, we design a reinforcement learning environment for distributed patrolling agents. In this partially observable environment, each agent acts in its own interest, and the non-stationarity inherent in the multi-agent setting encourages agents not to invade other agents' regions. Patrolling routes are not specified explicitly; instead, they are generated implicitly by the agents' behavior. We propose several types of environments and evaluate them with different initial agent positions. We also show how the reinforcement learning algorithm changes the distribution of agents as training progresses.