Joint maintenance optimization of a multi-component system is not straightforward because the components are inter-dependent and because reliability data are typically incomplete and scarce. Phase-type distributions are widely used in the analysis of multi-component systems because they can approximate non-Markovian models, which makes it possible to analyze complex systems under Markovian deterioration. Motivated by this, a novel approach is proposed that fits a restricted class of discrete phase-type distributions through a pre-specified hazard sequence from incomplete observations. In addition, a 4-parameter pre-specified hazard sequence is presented that unifies non-decreasing, hump-shaped, and bathtub-shaped hazard functions. Since reliability data are typically truncated and censored, an Expectation-Maximization algorithm is derived to fit the parameters of the proposed hazard sequence from left-truncated and right-censored observations. The maintenance optimization problem is then modeled with a model-based reinforcement learning scheme, in which the transition probabilities are derived from the discrete phase-type distribution linked to the fitted unified hazard sequence. Finding an optimal joint preventive maintenance policy is known to be challenging because of the combinatorial maintenance grouping problem. Hence, for homogeneous multi-component systems, a reduced action space that preserves optimality is proposed. For heterogeneous multi-component systems, a threshold policy is presented that is derived from a characterization of the optimal policy's decision boundaries. Moreover, the optimal policy's decision boundary is observed to be counter-intuitive, a behavior not seen in the literature, and it is analyzed in detail. Finally, the proposed Expectation-Maximization algorithm and the threshold policy are evaluated through extensive Monte Carlo simulations.
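To illustrate the link between a hazard sequence and a discrete phase-type distribution, the following is a minimal sketch rather than the construction used in this work. It assumes one simple restricted class: each phase corresponds to one age step, a component in phase k fails with probability h_k and otherwise moves to phase k+1, and the last phase loops on itself. The function names and the hazard values are hypothetical and chosen only for illustration; the 4-parameter hazard sequence itself is not reproduced here.

```python
from typing import List


def hazard_to_transition_matrix(hazard: List[float]) -> List[List[float]]:
    """Build an absorbing Markov chain from a discrete hazard sequence.

    States 0..n-1 are age phases; state n is the absorbing failure state.
    Assumption for this sketch: from phase k the component fails with
    probability hazard[k], otherwise it ages to phase k+1; the last phase
    loops on itself, so the hazard stays constant beyond it.
    """
    n = len(hazard)
    if n == 0 or any(not 0.0 <= h <= 1.0 for h in hazard):
        raise ValueError("hazard must be a non-empty sequence of probabilities")
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k, h in enumerate(hazard):
        next_phase = min(k + 1, n - 1)   # last phase loops on itself
        P[k][n] = h                      # failure (absorption)
        P[k][next_phase] += 1.0 - h      # survive and age by one step
    P[n][n] = 1.0                        # failure state is absorbing
    return P


def absorption_pmf(P: List[List[float]], horizon: int) -> List[float]:
    """Return P(T = t) for t = 1..horizon, starting from phase 0.

    T is the number of steps until absorption, i.e. the discrete
    phase-type lifetime implied by the chain.
    """
    size = len(P)
    dist = [1.0] + [0.0] * (size - 1)    # all mass starts in phase 0
    pmf = []
    absorbed = 0.0
    for _ in range(horizon):
        # One step of the chain: row vector times transition matrix.
        dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
        pmf.append(dist[-1] - absorbed)  # mass newly absorbed at this step
        absorbed = dist[-1]
    return pmf


if __name__ == "__main__":
    # Hypothetical increasing hazard sequence, for illustration only.
    P = hazard_to_transition_matrix([0.1, 0.2, 0.4])
    print(absorption_pmf(P, 4))
```

Under these assumptions the lifetime probabilities follow the usual product form, P(T = t) = h_{t-1} * (1 - h_0) * ... * (1 - h_{t-2}), and the rows of P for the non-absorbing phases are the kind of transition probabilities a model-based scheme could use for a single component when no maintenance action is taken.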