Unmanned aerial vehicles (UAVs) have been widely used in complex applications such as military operations, exploration, and rescue. Although quadcopters dominate these applications, a bi-copter, like a coaxial helicopter, has clear advantages in energy efficiency and scalability. Bi-copters are nevertheless challenging to use because they are difficult to control: additional mechanical structures are essential for stable movement. This paper tackles this problem by proposing a novel bi-rotor design called M-BRIC, which combines rotatable weight rods with a reinforcement learning-based controller. The two weight rods shift the model's center of mass (CoM), allowing higher maneuverability in horizontal directions. The controller is trained using Proximal Policy Optimization (PPO) to reliably reach a randomly placed target point. To train and test M-BRIC, we adopt NVIDIA Isaac Gym, a state-of-the-art physics simulator that supports fast parallel training. Finally, four reward functions with different characteristics are designed, and the tracking performance of the controller trained with each reward function is compared in simulation.
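To illustrate how rotating weight rods can shift the CoM, the following is a minimal planar sketch rather than the M-BRIC model itself. It assumes a point-mass body at the origin and two rods, each reduced to a point mass at its tip, hanging from pivots placed symmetrically about the body. The function name, the masses, the rod length, the pivot offset, and the angle convention are all hypothetical values chosen for illustration; none of them come from the paper.

```python
import math


def center_of_mass(body_mass, rod_mass, rod_length, pivot_offset,
                   angle_left, angle_right):
    """Return the planar CoM (x, z) of a body plus two rod tip masses.

    Hypothetical simplified model (not the M-BRIC geometry):
    - the body is a point mass at the origin,
    - the rod pivots sit at (-pivot_offset, 0) and (+pivot_offset, 0),
    - each rod is a point mass at its tip,
    - an angle of 0 means the rod hangs straight down (-z), and a
      positive angle swings the tip toward +x.
    """
    points = [(body_mass, 0.0, 0.0)]
    for pivot_x, angle in ((-pivot_offset, angle_left),
                           (pivot_offset, angle_right)):
        tip_x = pivot_x + rod_length * math.sin(angle)
        tip_z = -rod_length * math.cos(angle)
        points.append((rod_mass, tip_x, tip_z))

    # The CoM is the mass-weighted mean of the point positions.
    total_mass = sum(m for m, _, _ in points)
    com_x = sum(m * x for m, x, _ in points) / total_mass
    com_z = sum(m * z for m, _, z in points) / total_mass
    return com_x, com_z


# Illustrative values only: 1.0 kg body, 0.25 kg rods, 0.2 m rods,
# pivots 0.1 m either side of the body.
neutral = center_of_mass(1.0, 0.25, 0.2, 0.1, 0.0, 0.0)
tilted = center_of_mass(1.0, 0.25, 0.2, 0.1, math.pi / 6, math.pi / 6)
print("neutral CoM:", neutral)
print("tilted CoM: ", tilted)
```

In this toy model, rods hanging straight down leave the CoM on the vertical axis, while swinging both rods the same way moves it horizontally. Such an offset between the CoM and the thrust axis is one way a design of this kind could produce horizontal motion without tilting the rotors.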