IVGM: Large Scale Indoor Visual-Geometric Multimodal Dataset and Benchmark for Novel View Synthesis

Junming Cao1,2, Xiting Zhao3, Sören Schwertfeger3
1Shanghai Advanced Research Institute, Chinese Academy of Sciences, 2University of Chinese Academy of Sciences, 3ShanghaiTech University

Abstract

Accurate reconstruction of indoor environments is crucial for applications in augmented reality, virtual reality, and robotics. However, existing indoor datasets are often limited in scale, lack ground-truth point clouds, and provide insufficient viewpoints, which impedes the development of robust novel view synthesis (NVS) techniques. To address these limitations, we introduce a new large-scale indoor dataset that features diverse and challenging scenes, including basements and long corridors. This dataset offers panoramic image sequences for comprehensive coverage, high-resolution point clouds, meshes, and textures as ground truth, and a novel benchmark specifically designed to evaluate NVS algorithms in complex indoor environments. Our dataset and benchmark aim to advance indoor scene reconstruction and facilitate the creation of more effective NVS solutions for real-world applications.

Dataset Overview

The IVGM dataset encompasses a diverse array of environments, captured by our custom-designed data acquisition vehicle across three distinct scenes: two office floors of a school building and one underground garage. For each scene it provides:

| Sequence Name | Area (m²) | Point Count | Insta Images | Titan Images |
| --- | --- | --- | --- | --- |
| Office Area1 | 2,989.63 | 76,488,066 | 1,610 | 12,872 |
| Office Area2 | 2,651.00 | 86,233,513 | 2,669 | 21,608 |
| Underground Garage | 3,797.11 | 153,185,271 | 1,816 | 14,528 |
Overview of the IVGM dataset.
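The table above also determines the LiDAR point density of each sequence. As a quick sanity check, a short script (values copied directly from the table; sequence names as given there):

```python
# Compute LiDAR point density (points per square meter) for each IVGM
# sequence, using the area and point-count figures from the overview table.
sequences = {
    "Office Area1": (2989.63, 76_488_066),
    "Office Area2": (2651.00, 86_233_513),
    "Underground Garage": (3797.11, 153_185_271),
}

for name, (area_m2, n_points) in sequences.items():
    density = n_points / area_m2
    print(f"{name}: {density:,.0f} points/m^2")
```

All three sequences come out in the tens of thousands of points per square meter, far denser than a typical SfM reconstruction.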

Benchmark Results

To evaluate the applicability, versatility, and performance of our dataset for novel view synthesis, we benchmarked several recent, widely used NVS methods.

Key results show:

  • The choice of camera system significantly affects the performance of NVS algorithms.
  • Models trained on images from all five cameras fit views from all perspectives better than models trained on a single camera.
  • Including LiDAR point cloud data significantly improves the performance of NVS algorithms.
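One intuition behind the last point: point-based NVS methods such as Gaussian Splatting can be initialized directly from the dense LiDAR cloud instead of a sparse SfM reconstruction. The sketch below is illustrative only, not the exact IVGM pipeline; the function name and the nearest-neighbor scale heuristic are assumptions following common 3DGS practice.

```python
# Minimal sketch: initializing Gaussian Splatting parameters from a LiDAR
# point cloud. One Gaussian is centered on each point, with its initial
# scale set from local point density (nearest-neighbor distance).
import numpy as np

def init_gaussians_from_pointcloud(points: np.ndarray, colors: np.ndarray) -> dict:
    """points: (N, 3) LiDAR positions; colors: (N, 3) RGB values in [0, 1]."""
    n = points.shape[0]
    # One Gaussian centered on each LiDAR point.
    means = points.astype(np.float32)
    # Isotropic initial scale = distance to the nearest neighbor, so dense
    # regions start with small Gaussians and sparse regions with large ones.
    # (Brute-force O(N^2) for clarity; a k-d tree would be used at real scale.)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nn_dist = np.sqrt(d2.min(axis=1))
    scales = np.repeat(nn_dist[:, None], 3, axis=1).astype(np.float32)
    # Identity rotations (unit quaternions, w-first) and a low initial opacity.
    rotations = np.tile(np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32), (n, 1))
    opacities = np.full((n, 1), 0.1, dtype=np.float32)
    return {"means": means, "scales": scales, "rotations": rotations,
            "opacities": opacities, "colors": colors.astype(np.float32)}

# Demo on a synthetic cloud.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(1000, 3))
cols = rng.uniform(0.0, 1.0, size=(1000, 3))
g = init_gaussians_from_pointcloud(pts, cols)
print(g["means"].shape, g["scales"].shape)
```

With a dense LiDAR cloud, this initialization already covers the scene geometry, whereas an SfM-initialized model must grow Gaussians into unobserved regions during optimization.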
Quantitative comparison of different novel view synthesis methods on the IVGM dataset.
Results with different input image sets on the IVGM dataset. Insta-single results are rendered from the Insta-five camera poses, and vice versa.
Gaussian Splatting results with different input point clouds.