Accurate estimation of 3D geometry and camera motion enables a wide range of tasks in robotics and autonomous driving. However, the lack of semantics and the performance degradation caused by dynamic objects hinder its application to real-world scenarios. To overcome these limitations, we design a novel neural semantic visual odometry (VO) architecture on top of the simultaneous VO, object detection and instance segmentation (SimVODIS) network. We further propose an attentive pose estimation architecture with a multi-task learning formulation that handles dynamic objects and improves VO performance. Extensive experiments show that the proposed SimVODIS++ improves VO performance in dynamic environments. In addition, SimVODIS++ focuses on salient regions while excluding featureless ones. While performing these experiments, we discovered and fixed a data leakage problem in the conventional experimental setting followed by numerous previous works, which we claim as one of our contributions. We make the source code publicly available.