Multi-robot state estimation is crucial for real-time and accurate operation, especially in complex environments where a global navigation satellite system is unavailable. Many researchers employ multiple sensor modalities, including cameras, LiDAR, and ultra-wideband (UWB), to achieve real-time state estimation. However, each sensor has specific requirements that can limit its use: LiDAR sensors demand a high payload capacity, cameras require image features to be matched between robots, and UWB sensors require known, fixed anchor locations for accurate positioning. This study introduces a robust localization system with a minimal sensor setup that removes these requirements. We use an anchor-free UWB setup to establish a global coordinate system shared by all robots. Each robot performs visual-inertial odometry to estimate its ego-motion in its local coordinate system. By optimizing the local odometry of each robot using inter-robot range measurements, the positions of the robots can be estimated robustly without relying on an extensive sensor setup or infrastructure. Our method offers a simple yet effective solution for accurate, real-time multi-robot state estimation in challenging environments.
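To illustrate the idea of aligning local odometry frames using only inter-robot ranges, the following is a minimal sketch and not the system described in this study. It assumes a simplified 2D setting with two robots, noise-free synthetic trajectories and ranges, and a generic Levenberg-Marquardt solver with a numerical Jacobian standing in for whatever optimizer is actually used. All names (`rot2d`, `range_residuals`, `align_frames`) and all numeric values are hypothetical.

```python
import numpy as np


def rot2d(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def range_residuals(params, traj_a, traj_b, ranges):
    """Predicted minus measured inter-robot ranges.

    params = (theta, tx, ty) maps robot B's local frame into robot A's frame.
    """
    theta, tx, ty = params
    b_in_a = traj_b @ rot2d(theta).T + np.array([tx, ty])
    return np.linalg.norm(traj_a - b_in_a, axis=1) - ranges


def numeric_jacobian(params, traj_a, traj_b, ranges, eps=1e-6):
    """Central-difference Jacobian of the residuals w.r.t. params."""
    jac = np.zeros((len(ranges), len(params)))
    for i in range(len(params)):
        d = np.zeros(len(params))
        d[i] = eps
        r_plus = range_residuals(params + d, traj_a, traj_b, ranges)
        r_minus = range_residuals(params - d, traj_a, traj_b, ranges)
        jac[:, i] = (r_plus - r_minus) / (2.0 * eps)
    return jac


def align_frames(traj_a, traj_b, ranges, init, iters=50):
    """Estimate (theta, tx, ty) with a basic Levenberg-Marquardt loop."""
    params = np.asarray(init, dtype=float)
    lam = 1e-3  # damping factor (assumed starting value)
    res = range_residuals(params, traj_a, traj_b, ranges)
    cost = res @ res
    for _ in range(iters):
        jac = numeric_jacobian(params, traj_a, traj_b, ranges)
        hess = jac.T @ jac + lam * np.eye(len(params))
        step = np.linalg.solve(hess, -jac.T @ res)
        cand = params + step
        cand_res = range_residuals(cand, traj_a, traj_b, ranges)
        cand_cost = cand_res @ cand_res
        if cand_cost < cost:
            # Accept the step and trust the quadratic model more.
            params, res, cost = cand, cand_res, cand_cost
            lam *= 0.1
        else:
            # Reject the step and increase damping.
            lam *= 10.0
        if np.linalg.norm(step) < 1e-12:
            break
    return params


# Synthetic example: each trajectory is expressed in its own local frame.
t = np.linspace(0.0, 2.0 * np.pi, 40)
traj_a = np.column_stack([2.0 * np.cos(t), np.sin(2.0 * t)])  # robot A, frame A
traj_b = np.column_stack([0.5 * t, np.sin(t)])                # robot B, frame B
true_params = np.array([0.6, 3.0, -1.0])  # assumed ground-truth frame offset
ranges = np.linalg.norm(
    traj_a - (traj_b @ rot2d(true_params[0]).T + true_params[1:]), axis=1
)

est = align_frames(traj_a, traj_b, ranges, init=[0.5, 2.8, -0.8])
print("estimated (theta, tx, ty):", est)
```

In this sketch the ranges play the role of the UWB measurements and the two trajectories play the role of per-robot odometry. A practical system would additionally have to handle measurement noise, odometry drift, 3D poses, and more than two robots, none of which is modeled here.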