When a system that handles dangerous materials or operates in a hazardous area develops a fault, it cannot be repaired immediately. A fault-tolerant design is therefore essential for maintaining the system's reliability. This thesis presents an optimal control algorithm for systems with faults. A fault is detected from the discrepancy between the model output and the real output, and from this discrepancy a new model of the system is generated with an on-line learning algorithm. With the new model, a controller is derived by reinforcement learning over continuous states and actions. First, we use on-line sparse Gaussian Process (GP) regression for system modeling, which allows the system to be modeled during real-time experiments. Because choosing the hyper-parameters of the GP is difficult, we propose a new information-theoretic optimization algorithm that handles the bias-variance trade-off. Second, using a model-based value-gradient control scheme with GP Reinforcement Learning (RL), we obtain an optimal control algorithm with reduced computation time. We adopt a Dyna-style framework that combines simulated experience from the learned model with real experience from the given environment. Using the BEB algorithm, we obtain a more rigorous treatment of the exploration-exploitation trade-off. Simulation results show that the proposed algorithm outperforms existing methods. We expect this study of learning methods for unknown systems to stimulate research on the fault-tolerant design of intelligent robots.
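The fault-detection idea summarized above — flag a fault when the residual between the model's predicted output and the measured output grows too large — can be sketched as follows. The nominal model, the threshold, and the toy data here are illustrative assumptions, not the thesis's actual system or GP model.

```python
import numpy as np

def detect_fault(model, inputs, outputs, threshold):
    """Flag time steps where |model prediction - measurement| exceeds threshold."""
    preds = np.array([model(x) for x in inputs])
    residuals = np.abs(preds - np.array(outputs))
    return residuals > threshold

# Toy example (hypothetical): nominal model y = 2x; the measurements
# deviate from the model after step 3, simulating a fault.
model = lambda x: 2.0 * x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 11.0, 13.2]  # last two points drift away from the model
flags = detect_fault(model, xs, ys, threshold=1.0)
print(flags.tolist())  # → [False, False, False, True, True]
```

In the thesis, the residual is not only a fault flag: the post-fault data are also fed to the on-line learning algorithm to produce an updated model for the controller.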