Strategies to apply reinforcement learning for advanced chemical process control

A typical procedure for chemical process control is to build a first-principles model or a black-box model from the given data sets and prior knowledge, and then to construct and solve mathematical programming problems to obtain an optimal control policy. However, the performance of a model-based control method such as model predictive control (MPC) depends strongly on the quality of the model. Because developing an accurate model requires significant prior knowledge and sufficient data, model-plant mismatch is inevitable. In addition, most process control problems involve substantial disturbances, and optimization based on a deterministic model that ignores this uncertainty can be suboptimal and can lead to constraint violations.

Meanwhile, as interest in machine learning has grown, many researchers have investigated reinforcement learning (RL) algorithms and their applications. RL learns system information through interaction with the environment and derives an optimal policy from the collected data. This strategy is well suited to learning stochastic variations and deriving an optimal policy accordingly, so it has been proposed as an alternative to model-based control. Its prime advantages are that it does not require a precise model and that it moves a significant amount of online computation offline. However, existing RL algorithms were developed for applications such as games, robot control, and autonomous driving, and their application-specific features make it difficult to apply the published algorithms directly and successfully to chemical process control problems. It is therefore necessary to devise RL strategies that account for the characteristics of chemical processes. This dissertation develops such strategies for the effective application of RL to chemical process control problems.

First, an RL strategy is proposed for advanced batch process control, which is challenging because of the dynamic operation over a large operating envelope. We define the components of a reward function that reflects both the constraints and the reactor productivity. For the reward function design and the value/policy function representation, a phase segmentation approach is suggested. In addition, the deep deterministic policy gradient (DDPG) algorithm is modified with Monte-Carlo learning to ensure more stable and efficient learning behavior. A case study of a batch polymerization process producing polyols demonstrates the improvement brought by the proposed approach.

Second, we propose a dynamic penalty (DP) approach in which the penalty factor is gradually and systematically increased as the training episodes proceed. Adding penalty values to the reward function for constraint violations is a common way to handle constraints in deep RL applications, but while training neural networks to learn the value (or Q) function, one can run into numerical difficulties caused by sharp changes in the value function at the constraint boundary. The DP approach addresses this issue, as sketched below. Agents trained with the proposed approach are compared with agents trained with constant penalty functions on a vehicle control problem and a battery management control problem. The results show that the DP approach can improve the accuracy of the value function approximation, leading to superior results in constraint satisfaction and in the degree of violation.
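The dissertation record contains no code; the following minimal Python sketch only illustrates the idea of an episode-dependent penalty. The linear ramp schedule and the names penalty_factor and shaped_reward are assumptions made for illustration, not the author's implementation.

    import numpy as np

    def penalty_factor(episode, rho_init=0.1, rho_max=100.0, n_ramp=5000):
        """Hypothetical dynamic-penalty schedule: the weight grows
        linearly with the training episode, so early value-function fits
        stay smooth near the constraint boundary and only gradually
        tighten as training proceeds."""
        frac = min(episode / n_ramp, 1.0)
        return rho_init + frac * (rho_max - rho_init)

    def shaped_reward(base_reward, constraint_values, episode):
        """Penalize the total constraint violation. Each entry of
        constraint_values is g_i(x), with g_i(x) <= 0 meaning feasible."""
        violation = np.sum(np.maximum(constraint_values, 0.0))
        return base_reward - penalty_factor(episode) * violation

Under such a schedule, the small early penalty keeps the value target smooth where the constraints bind, while the growing factor eventually pushes the agent toward strict constraint satisfaction, in contrast to a constant penalty that makes the value target sharply discontinuous from the first episode.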
Finally, we propose an algorithm that uses MPC and RL complementarily. The suggested method, 'value function incorporated MPC' (VFMPC), uses a value function approximator as the terminal cost term of the MPC problem and updates the value function with the collected data through an RL algorithm. With a CSTR benchmark problem, we show that this algorithm can rectify the performance loss due to model-plant mismatch and achieve robust control performance against unmeasured disturbances.
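As before, no code appears in the source; the following is a minimal, assumed sketch of how a learned value function could enter an MPC objective as a terminal cost. The names vfmpc_action, f_model, stage_cost, and value_fn, and the use of SciPy's SLSQP solver, are illustrative choices rather than the dissertation's implementation.

    import numpy as np
    from scipy.optimize import minimize

    def vfmpc_action(x0, f_model, stage_cost, value_fn, horizon=10, nu=1):
        """One receding-horizon step of a VFMPC-style controller:
        minimize the finite-horizon stage cost plus a learned terminal
        value V(x_N) supplied by an RL critic trained on plant data."""
        def objective(u_flat):
            u_seq = u_flat.reshape(horizon, nu)
            x, cost = np.asarray(x0, dtype=float), 0.0
            for u in u_seq:
                cost += stage_cost(x, u)
                x = f_model(x, u)  # possibly imperfect prediction model
            return cost + value_fn(x)  # learned terminal cost term
        res = minimize(objective, np.zeros(horizon * nu), method="SLSQP")
        return res.x[:nu]  # apply only the first input, then re-solve

After each plant step, the observed transition would be stored and used to update value_fn with an RL algorithm, so the terminal cost gradually absorbs the effects of model-plant mismatch and unmeasured disturbances that the fixed prediction model misses.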
Advisors
Lee, Jay Hyung (이재형)
Description
Korea Advanced Institute of Science and Technology : Department of Chemical and Biomolecular Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description
Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Chemical and Biomolecular Engineering, 2022.2, [iv, 85 p.]

URI
http://hdl.handle.net/10203/308516
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=996300&flag=dissertation
Appears in Collection
CBE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
