IM360: Large-scale Indoor Mapping with 360 Cameras

Anonymous


We propose a complete pipeline for indoor mapping using omnidirectional images, consisting of three key stages: (1) Spherical SfM, (2) Geometry Reconstruction, and (3) Texture Optimization.

Abstract

We present a novel 3D reconstruction pipeline that uses 360° cameras for 3D mapping and rendering of indoor environments. Traditional Structure-from-Motion (SfM) methods may not work well in large-scale indoor scenes due to the prevalence of textureless and repetitive regions. To overcome these challenges, our approach (IM360) leverages the wide field of view of omnidirectional images and integrates the spherical camera model into every core component of the SfM pipeline. To develop a comprehensive 3D reconstruction solution, we integrate a neural implicit surface reconstruction technique that generates high-quality surfaces from sparse input data. Additionally, we utilize a mesh-based neural rendering approach to refine texture maps and accurately capture view-dependent properties by combining diffuse and specular components. We evaluate our pipeline on large-scale indoor scenes from the Matterport3D and Stanford2D3D datasets. In practice, IM360 demonstrates superior performance in textured mesh reconstruction over state-of-the-art methods, improving camera localization and registration accuracy as well as the rendering of high-frequency details.
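Concretely, the spherical camera model replaces the pinhole projection throughout the SfM pipeline. As a minimal illustration (our own sketch, not the authors' released code; the axis convention is an assumption), the ERP pixel-to-ray mapping such a model is built on looks like this:

```python
# Minimal sketch of an equirectangular (spherical) camera model:
# map ERP pixels to unit ray directions on the sphere, and back.
import numpy as np

def erp_pixel_to_ray(u, v, width, height):
    """Map ERP pixel coordinates to a unit direction on the sphere."""
    lon = (u / width - 0.5) * 2.0 * np.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * np.pi        # latitude in [-pi/2, pi/2]
    return np.array([
        np.cos(lat) * np.sin(lon),          # x
        np.sin(lat),                        # y (up)
        np.cos(lat) * np.cos(lon),          # z (forward)
    ])

def ray_to_erp_pixel(d, width, height):
    """Inverse mapping: project a unit direction back to ERP pixel coords."""
    lon = np.arctan2(d[0], d[2])
    lat = np.arcsin(np.clip(d[1], -1.0, 1.0))
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v
```

Because every pixel maps to a bearing on the unit sphere, matching, relative pose estimation, and triangulation can all operate on rays instead of planar image points.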

We provide a visual comparison between sparse-matching perspective SfM and our proposed dense-matching spherical SfM. In (a), ERP images are converted into a cubemap representation, after which feature matching is performed across all 36 (6 × 6) face pairs between two panoramas, resulting in sparse and noisy correspondences. (b) demonstrates our approach, which finds dense and accurate correspondences directly on ERP images, thereby enabling the construction of a detailed 3D structure. A sketch of the baseline's cubemap resampling follows below.
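For reference, here is a hedged sketch of the cubemap resampling used by the perspective baseline in (a); the face orientations, resolution, and nearest-neighbor sampling are our own illustrative assumptions, not the baseline's exact code:

```python
# Resample an ERP panorama into one cubemap face (90° FoV). With 6 faces
# per image, matching two panoramas requires 6 x 6 = 36 perspective pairs.
import numpy as np

def cubemap_face(erp, face_rot, size=512):
    """Render one cubemap face by sampling the ERP image `erp` (H, W, 3)."""
    h, w = erp.shape[:2]
    # Pixel grid on the z = 1 face plane, with x/y spanning [-1, 1].
    xs, ys = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    dirs = np.stack([xs, -ys, np.ones_like(xs)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    dirs = dirs @ face_rot.T                       # rotate into this face
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(dirs[..., 1])
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = ((0.5 - lat / np.pi) * h).astype(int).clip(0, h - 1)
    return erp[v, u]

# e.g. cubemap_face(erp, np.eye(3)) renders the front (+z) face.
```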

By leveraging dense features on equirectangular projection (ERP) images, our method effectively finds correspondences in textureless regions, registers more camera poses, and reconstructs a greater number of 3D points.
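Once dense ERP correspondences are available, each match yields a pair of bearing rays rather than pinhole image points. A standard way to recover a 3D point from two such rays is midpoint triangulation; the sketch below is our simplified illustration, not the paper's exact solver:

```python
# Midpoint triangulation of two bearing rays c1 + t*d1 and c2 + s*d2
# (camera centers c1, c2 and unit directions d1, d2 in world coordinates).
# Assumes the rays are not parallel.
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the closest-approach segment of two rays."""
    b = c2 - c1
    d11, d22, d12 = d1 @ d1, d2 @ d2, d1 @ d2
    denom = d11 * d22 - d12 * d12
    # Solve for the ray parameters minimizing ||(c1 + t d1) - (c2 + s d2)||^2.
    t = (d22 * (b @ d1) - d12 * (b @ d2)) / denom
    s = (d12 * (b @ d1) - d11 * (b @ d2)) / denom
    return 0.5 * ((c1 + t * d1) + (c2 + s * d2))
```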

Results: Neural Surface Reconstruction

Using camera poses accurately estimated from a 360° vision sensor, we can reconstruct a geometric mesh of large-scale indoor scenes from sparse scans.
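As a rough illustration of this stage, a neural implicit surface (we assume an SDF-style field here, as is common in this family of methods) can be converted to a triangle mesh by sampling it on a grid and running marching cubes; `sdf_fn` is a hypothetical callable standing in for the trained network:

```python
# Extract a triangle mesh from a learned signed distance field.
import numpy as np
from skimage import measure

def extract_mesh(sdf_fn, resolution=256, bound=1.0):
    """Sample the SDF on a dense grid and run marching cubes at level 0."""
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sdf = sdf_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
    verts = verts / (resolution - 1) * 2 * bound - bound  # grid -> world coords
    return verts, faces, normals
```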

🚀 Try the Online Interactive Viewer Demo 🔍

Scenes: 2t7WUuJeko7, pLe4wQe7qrG, RPmz2sHmrrY, YVUc4YcDtcY
Because our method outputs a standard mesh with neural textures, it is readily compatible with popular graphics frameworks such as WebGL, demonstrating its potential for integration with widely available graphics pipelines.

Please note that the reconstructed mesh has been compressed to enable a smooth in-browser experience: we decimated 50% of the mesh triangles, so the visual quality is slightly degraded compared to the full-resolution results. ⚠️ If the renderer does not work properly, please press Ctrl + Shift + R to force-refresh the page. You may also open the Developer Console (F12) and check the logs; sometimes the viewer loads after a short delay.
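For readers who want to reproduce the compression step, here is a hedged example of 50% triangle decimation using Open3D's quadric simplification (our tooling choice; the authors' exact pipeline and file names may differ):

```python
# Decimate a reconstructed mesh to half its triangle count for web viewing.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("scene.ply")    # hypothetical input path
target = len(mesh.triangles) // 2                # keep 50% of triangles
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=target)
simplified.compute_vertex_normals()
# Write the lightweight mesh; convert to glTF afterwards for WebGL viewers.
o3d.io.write_triangle_mesh("scene_web.obj", simplified)
```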

Results: Texture Optimization

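As described in the abstract, the texture optimization stage models appearance as the sum of a diffuse component and a view-dependent specular component. The PyTorch sketch below illustrates one plausible form of this split; the module name, feature dimensions, and architecture are our assumptions rather than the paper's exact design:

```python
# Illustrative diffuse + specular neural texture shading.
import torch
import torch.nn as nn

class NeuralTextureShader(nn.Module):
    def __init__(self, feat_dim=8, hidden=64):
        super().__init__()
        # Small MLP mapping neural-texture features + view direction
        # to a view-dependent specular residual.
        self.specular_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, tex_feat, view_dir):
        # tex_feat: (N, feat_dim) features sampled at mesh UVs;
        # the first 3 channels act as the view-independent diffuse color.
        diffuse = torch.sigmoid(tex_feat[:, :3])
        specular = self.specular_mlp(torch.cat([tex_feat, view_dir], dim=-1))
        return torch.clamp(diffuse + specular, 0.0, 1.0)
```

Optimizing the texture features and MLP against the captured images lets the diffuse term absorb the view-independent appearance while the specular branch captures highlights and other view-dependent effects.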