
Contributions
- We propose a framework that incrementally builds a structured representation of the environment, enabling the VLM to make more informed decisions.
- We design an efficient two-stage navigation policy based on this representation, combining high-level planning guided by the VLM's reasoning with VLM-assisted low-level exploration.
- STRIVE achieves state-of-the-art results on simulated benchmarks (HM3D, RoboTHOR, MP3D) and performs strongly in diverse, complex real-world environments.