This study presents a robust dual control method for batch processes under parametric uncertainty. Proximal policy optimization (PPO), a policy gradient reinforcement learning algorithm, is employed to construct an implicit dual controller in a computationally tractable way. The proposed method copes robustly and actively with the uncertainties that arise over a repeated sequence of batch operations: robustness comes from a penalty term for constraint violation in the reward function, and active behavior comes from accounting for the effect of control inputs on future uncertainty. An application to a bioethanol fermentation process demonstrates the effectiveness of the proposed control strategy. The results show that the robust dual controller exhibits active learning, which improves overall performance relative to a certainty-equivalence-based approach.
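
As an illustration of the two ingredients named above, the following is a minimal sketch and not the authors' implementation. It assumes constraints are written in the form g(x) <= 0 and that the penalty is linear in the total violation; the function names, the `penalty_weight` value, and the `clip_eps` default are hypothetical choices made for the example. The second function is the standard per-sample PPO clipped surrogate term.

```python
from typing import Sequence


def penalized_reward(performance: float,
                     constraint_values: Sequence[float],
                     penalty_weight: float = 10.0) -> float:
    """Performance term minus a weighted penalty on constraint violation.

    Assumption: each constraint is expressed as g(x) <= 0, so only the
    positive part max(0, g) counts as a violation. The linear penalty and
    its weight are illustrative, not taken from the paper.
    """
    violation = sum(max(0.0, g) for g in constraint_values)
    return performance - penalty_weight * violation


def ppo_clipped_term(ratio: float, advantage: float,
                     clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).

    `ratio` is the new-to-old policy probability ratio for one action and
    `advantage` is its estimated advantage.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # One satisfied constraint (-1.0) and one violated by 0.5.
    print(penalized_reward(5.0, [-1.0, 0.5]))
    # A ratio above 1 + clip_eps with positive advantage is clipped.
    print(ppo_clipped_term(1.5, 2.0))
```

In such a scheme a larger `penalty_weight` would make the learned policy more conservative near the constraint boundary, at some cost in nominal performance; how the paper balances the two is not specified here.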