SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

1Shanghai Jiao Tong University 2Ant Group 3Visual Geometry Group, University of Oxford 4Meta AI 5Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo 6Zhejiang Key Laboratory of Industrial Intelligence and Digital Twin

*Denotes equal contribution     †Indicates corresponding author


CVPR 2026

Overview

SceneScribe-1M offers more than one million dynamic scenes spanning over 4,000 hours, featuring comprehensive semantic and geometric annotations.

Abstract

The convergence of 3D geometric perception and video synthesis has created an unprecedented demand for large-scale video data that is rich in both semantic and spatio-temporal information. While existing datasets have advanced either 3D understanding or video generation, a significant gap remains in providing a unified resource that supports both domains at scale. To bridge this gap, we introduce SceneScribe-1M, a new large-scale, multi-modal video dataset. It comprises one million in-the-wild videos, each meticulously annotated with detailed textual descriptions, precise camera parameters, dense depth maps, and consistent 3D point tracks. We demonstrate the versatility and value of SceneScribe-1M by establishing benchmarks across a wide array of downstream tasks, including monocular depth estimation, scene reconstruction, and dynamic point tracking, as well as generative tasks such as text-to-video synthesis with and without camera control. By open-sourcing SceneScribe-1M, we aim to provide a comprehensive benchmark and a catalyst for research, fostering the development of models that can both perceive the dynamic 3D world and generate controllable, realistic video content.
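
To make the per-video annotations above concrete, the sketch below shows, in Python, one hypothetical way a sample record could be organized, together with a standard pinhole projection of a 3D track point into a frame. All field names, array shapes, and the VideoAnnotation class are illustrative assumptions for this page, not the official SceneScribe-1M schema or API.

# Hypothetical per-video annotation record for a SceneScribe-1M-style dataset.
# All names and shapes below are illustrative assumptions, not the released schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class VideoAnnotation:
    caption: str                 # detailed textual description of the clip
    intrinsics: np.ndarray       # (T, 3, 3) per-frame camera intrinsic matrices
    extrinsics: np.ndarray       # (T, 4, 4) per-frame world-to-camera poses
    depth: np.ndarray            # (T, H, W) dense depth maps
    tracks: np.ndarray           # (N, T, 3) 3D point tracks over T frames
    visibility: np.ndarray       # (N, T) boolean visibility of each track point

def project_track(ann: VideoAnnotation, point_idx: int, frame: int) -> np.ndarray:
    """Project one 3D track point into pixel coordinates at a given frame."""
    xyz_world = np.append(ann.tracks[point_idx, frame], 1.0)   # homogeneous coords
    xyz_cam = (ann.extrinsics[frame] @ xyz_world)[:3]          # world -> camera
    uv = ann.intrinsics[frame] @ (xyz_cam / xyz_cam[2])        # perspective divide
    return uv[:2]

Under these assumptions, extrinsics are world-to-camera poses, so projecting a track point reduces to one matrix multiply per frame followed by a perspective divide.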

Annotation Visualization

Showcases without tracks.

Sample 1
Sample 2
Sample 3

Showcases with tracks.

Sample 1
Sample 2
Sample 3

BibTeX

@inproceedings{wang2026scenescribe1m,
title={SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations}, 
author={Yunnan Wang and Kecheng Zheng and Jianyuan Wang and Minghao Chen and David Novotny and Christian Rupprecht and Yinghao Xu and Xing Zhu and Wenjun Zeng and Xin Jin and Yujun Shen},
booktitle={CVPR},
year={2026}
}

Acknowledgements

This website is adapted from the Nerfies template.