
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

Aleksandar Jevtić* (1), Christoph Reich* (1,2,4,5), Felix Wimbauer (1,4), Oliver Hahn (2), Christian Rupprecht (3), Stefan Roth (2,5,6), Daniel Cremers (1,4,5)

(1) TU Munich  (2) TU Darmstadt  (3) University of Oxford  (4) MCML  (5) ELIZA  (6) hessian.AI  (*) Equal contribution

ICCV 2025

Paper (PDF) | Project Page

TL;DR: SceneDINO is unsupervised and infers 3D geometry and features from a single image in a feed-forward manner. Distilling and clustering SceneDINO's 3D feature field results in unsupervised semantic scene completion predictions. SceneDINO is trained using multi-view self-supervision.

Abstract

Semantic scene completion (SSC) aims to infer both the 3D geometry and semantics of a scene from single images. In contrast to prior work on SSC that heavily relies on expensive ground-truth annotations, we approach SSC in an unsupervised setting. Our novel method, SceneDINO, adapts techniques from self-supervised representation learning and 2D unsupervised scene understanding to SSC. Our training exclusively utilizes multi-view consistency self-supervision without any form of semantic or geometric ground truth. Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. Through a novel 3D feature distillation approach, we obtain unsupervised 3D semantics. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy. Linear probing our 3D features matches the segmentation accuracy of a current supervised SSC approach. Additionally, we showcase the domain generalization and multi-view consistency of SceneDINO, taking the first steps towards a strong foundation for single image 3D scene understanding.

News

  • 09/07/2025: arXiv preprint and code released. 🚀

Setup (Installation & Datasets)

Python Environment

Our Python environment is managed with Conda.

conda env create -f environment.yml
conda activate scenedino

Datasets

We provide configuration files for the datasets SceneDINO is trained and evaluated on. Adjust these files to your setup; most importantly, insert the paths to your local copies of the data. A sketch of the kind of edit required follows the list below.

configs/dataset/kitti_360_sscbench.yaml
configs/dataset/cityscapes_seg.yaml
configs/dataset/bdd_seg.yaml
configs/dataset/realestate10k.yaml
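
As an illustration, a dataset config will typically contain a path entry that has to point at your local data root. The key name below is hypothetical and only sketches the kind of edit required; check the respective YAML file for its actual fields.

# Hypothetical sketch of a dataset config entry; the real key names may differ.
data_path: /path/to/KITTI-360   # root directory of your local dataset copy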

KITTI-360

To download KITTI-360, create an account and follow the instructions on the official website. We require the perspective images, fisheye images, raw velodyne scans, calibrations, and vehicle poses.
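
For orientation, the official download unpacks into roughly the layout below. This sketch is inferred from the components listed above; verify the exact structure against the KITTI-360 documentation.

KITTI-360/
├── calibration/    # camera and sensor calibration files
├── data_2d_raw/    # perspective and fisheye images
├── data_3d_raw/    # raw Velodyne scans
└── data_poses/     # vehicle poses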

Checkpoints

Our pre-trained checkpoints are stored on the CVG webshare. Download a checkpoint using the dedicated script below. To replicate our ORB-SLAM3 results, we provide the estimated poses in datasets/kitti_360/orb_slam_poses.

# Download models trained on KITTI-360 (SSCBench split)
python download_checkpoint.py ssc-kitti-360-dino
python download_checkpoint.py ssc-kitti-360-dino-orb-slam
python download_checkpoint.py ssc-kitti-360-dinov2
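
To sanity-check a download, you can inspect the checkpoint file with PyTorch. The file name below is a placeholder for whatever the download script saved on your machine, and the assumption that the checkpoint is a (nested) dict is not guaranteed by the repository.

import torch

# Load the downloaded checkpoint on the CPU. The file name is a placeholder
# for the file produced by download_checkpoint.py.
ckpt = torch.load("ssc-kitti-360-dino.pt", map_location="cpu")

# Checkpoints are typically nested dicts; listing the top-level keys shows
# what is stored (model weights, optimizer state, config, ...).
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))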

Table 1. SSCBench-KITTI-360 results. We compare SceneDINO to the STEGO + S4C baseline in unsupervised SSC using the mean intersection over union (mIoU) in % at three evaluation ranges.

Method                      | Checkpoint                  | mIoU 12.8m | mIoU 25.6m | mIoU 51.2m
Baseline (STEGO + S4C)      | -                           | 10.53      | 9.26       | 6.60
SceneDINO                   | ssc-kitti-360-dino          | 10.76      | 10.01      | 8.00
SceneDINO (ORB-SLAM3 poses) | ssc-kitti-360-dino-orb-slam | 10.88      | 9.86       | 7.88
SceneDINO (DINOv2)          | ssc-kitti-360-dinov2        | 13.76      | 11.78      | 9.08

Inference Demo Script

This demo script shows how to load a model and run inference, both in 3D and as rendered 2D views. It can serve as a starting point for experimenting with SceneDINO feature fields.

python demo_script.py -h

# First image of kitti-360 test set
python demo_script.py --ckpt <PATH-MODEL-CKPT>
# Custom image
python demo_script.py --ckpt <PATH-MODEL-CKPT> --image <PATH-DEMO-IMAGE>

Training

For unsupervised SSC, training is performed in two stages. We provide a training configuration in configs/ for each stage.

SceneDINO

First, the 3D feature fields of SceneDINO are trained.

python train.py -cn train_scenedino_kitti_360
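
The -cn flag indicates that train.py loads its configuration by name, in the style of Hydra, so individual settings can presumably be overridden on the command line. The override key below is a hypothetical example, not a documented option; consult the files in configs/ for the actual keys.

# Hypothetical Hydra-style override; the key name is an assumption, see configs/ for real options.
python train.py -cn train_scenedino_kitti_360 batch_size=4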

Unsupervised SSC

Based on a SceneDINO checkpoint, we train the unsupervised SSC head.

python train.py -cn train_semantic_kitti_360
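
Note that the stage-2 configuration has to reference the SceneDINO checkpoint produced in stage 1. Whether this path is set inside the config file or passed as a command-line override is an assumption here; the key below is purely illustrative.

# Purely illustrative: point stage-2 training at a stage-1 checkpoint (key name is hypothetical).
python train.py -cn train_semantic_kitti_360 checkpoint=<PATH-SCENEDINO-CKPT>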

Logging

We use TensorBoard to keep track of losses, metrics, and qualitative results.

tensorboard --port 8000 --logdir out/

Evaluation

We further provide configurations to reproduce the evaluation results from the paper.

Unsupervised 2D Segmentation

# Unsupervised 2D Segmentation
python eval.py -cn evaluate_semantic_kitti_360

Unsupervised SSC

# Unsupervised SSC, adapted from S4C (https://github.com/ahayler/s4c)
python evaluate_model_sscbench.py -ssc <PATH-SSCBENCH> -vgt <PATH-SSCBENCH-LABELS> -cp <PATH-CHECKPOINT>.pt -f -m scenedino -p <RUN-NAME>

Citation

If you find our work useful, please consider citing our paper.

@inproceedings{Jevtic:2025:SceneDINO,
    author    = {Aleksandar Jevti{\'c} and
                 Christoph Reich and
                 Felix Wimbauer and
                 Oliver Hahn and
                 Christian Rupprecht and
                 Stefan Roth and
                 Daniel Cremers},
    title     = {Feed-Forward {SceneDINO} for Unsupervised Semantic Scene Completion},
    booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2025},
}

Acknowledgements

This repository is based on the Behind The Scenes (BTS) code base.