This research proposes a procedure for optimizing dispatching policies for the initial process of manufacturing systems in which new jobs arrive continuously. Instead of re-solving the optimization problem at regular intervals, we focus on learning a single policy that generates good schedules across a variety of scenarios. To achieve this, the problem is divided into three themes: 1) how to generate realistic demand data for training; 2) how to find good features for dispatching policies, and how to learn the policy function from multiple demand data sets from a single-objective perspective; and 3) how to learn the policy function from a multi-objective perspective. The three themes are combined to provide practical solutions to scheduling problems that reflect real-world factory sizes and constraints.