Abstract: With the continued progress of computer vision and artificial intelligence (AI) in recent years, embodied AI has attracted widespread attention from academia and industry worldwide. Embodied AI emphasizes that an agent should actively obtain real feedback from the physical world through situated interaction with its environment and improve its intelligence by learning from that feedback. As a concrete embodied AI task, object goal navigation requires an agent to search for and navigate to a specified object goal (e.g., "find a sink") in a previously unseen, complex, and semantically rich scene. Object goal navigation has great potential for applications in smart assistants that support daily human activities, and it serves as a fundamental prerequisite for other interaction-based embodied AI research. This survey systematically classifies current research on object goal navigation. First, background knowledge on environment representation and autonomous visual exploration is introduced, and existing object goal navigation methods are classified and analyzed from three perspectives. Second, two categories of higher-level object rearrangement tasks are introduced, together with a description of datasets that simulate realistic indoor environments, evaluation metrics, and a generic training paradigm for navigation policies. Finally, the performance of existing object goal navigation strategies is compared and analyzed across different datasets, the open challenges in this field are summarized, and future development trends are discussed.