This is the official implementation for the CVPR 2025 paper:
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos
Felix Wimbauer1,2,3, Weirong Chen1,2,3, Dominik Muhle1,2, Christian Rupprecht3, and Daniel Cremers1,2
1Technical University of Munich, 2MCML, 3University of Oxford
If you find our work useful, please consider citing our paper:
@inproceedings{wimbauer2025anycam,
title={AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos},
author={Wimbauer, Felix and Chen, Weirong and Muhle, Dominik and Rupprecht, Christian and Cremers, Daniel},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
2025/05 Added PyTorch Hub integration and uploaded model files to Hugging Face.
2025/04 Improved demo script and COLMAP export.
2025/04 Initial code release.
- Project page with interactive demos
- Demo script
- PyTorch Hub integration
- HuggingFace space
- Scripts for training data
To set up the environment, follow these steps individually or see the combined command below:

- Create a new conda environment with Python 3.11:
  conda create -n anycam python=3.11
- Activate the conda environment:
  conda activate anycam
- Install PyTorch according to your CUDA version:
  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
- Install the corresponding cudatoolkit for compilation:
  conda install -c nvidia cuda-toolkit
- Install the required packages from requirements.txt:
  pip install -r requirements.txt
Combined, this yields the following command. Building might take a few minutes.
conda create -n anycam python=3.11 -y && \
conda activate anycam && \
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 && \
conda install -c nvidia cuda-toolkit -y && \
pip install -r requirements.txt
We use slightly customized forks of UniMatch and UniDepth (to ensure backward compatibility). Furthermore, we use the minipytorch3d variant by VGGSfM.
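After installation, a quick sanity check (a suggestion, not part of the repository) is to confirm that the expected PyTorch build is active and that CUDA is visible:

import torch

# Verify the installed PyTorch version and CUDA availability.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())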
To download pretrained models, you can use the download_checkpoints.sh script. Follow these steps:

- Open a terminal and navigate to the root directory of the repository.
- Run the download_checkpoints.sh script with the desired model name. For example, to download the final anycam_seq8 model, use the following command:
  ./download_checkpoints.sh anycam_seq8

This will download and unpack the pretrained model into the pretrained_models directory. You can then use the downloaded model for evaluation or further training.
You can use the demo script to process custom videos and extract camera trajectories, depth maps, and 3D point clouds.
We provide a simple script to run the basic functionality of AnyCam and export the results. The results can either be visualized in rerun.io, or exported to the Colmap format.
To run the model in feed-forward-only mode, turn off the ba_refinement flag.
If the provided video has a high framerate, we recommend subsampling it to a lower framerate by adding the fps=10 flag.
# Full model
python anycam/scripts/anycam_demo.py \
++input_path=/path/to/video.mp4 \
++model_path=pretrained_models/anycam_seq8 \
++visualize=true
# Feed-forward only without refinement
python anycam/scripts/anycam_demo.py \
++input_path=/path/to/video.mp4 \
++model_path=pretrained_models/anycam_seq8 \
++ba_refinement=false \
++visualize=true
If you are developing on a remote server, you can start rerun.io as a webserver. First, open a new terminal on your remote machine and start the viewer:
rerun --serve-web
Then, forward port 9090 to your local machine. Finally, make sure to launch the script with the ++rerun_mode=connect flag.
You should be able to view the results in your browser under:
http://localhost:9090/?url=ws://localhost:9877
Export to COLMAP format:
python anycam/scripts/anycam_demo.py \
++input_path=/path/to/video.mp4 \
++model_path=pretrained_models/anycam_seq8 \
++export_colmap=true \
++output_path=/path/to/output_dir
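If you want to work with the COLMAP export programmatically, pycolmap can read a standard COLMAP model directory. The snippet below is a sketch under the assumption that the export produces such a directory inside the output path; adjust the path to wherever the cameras/images/points3D files end up:

import pycolmap

# Load the exported COLMAP model (point this at the directory that
# contains the cameras/images/points3D files written by the export).
reconstruction = pycolmap.Reconstruction("/path/to/output_dir")

print("Cameras:  ", len(reconstruction.cameras))
print("Images:   ", len(reconstruction.images))
print("3D points:", len(reconstruction.points3D))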
Save trajectory, depth maps, and other results:
python anycam/scripts/anycam_demo.py \
++input_path=/path/to/video.mp4 \
++model_path=pretrained_models/anycam_seq8 \
++output_path=/path/to/output_dir
AnyCam is also available through PyTorch Hub, making it easy to use the model in your own projects:
import torch

# Load the model
anycam = torch.hub.load('Brummi/anycam', 'AnyCam', version="1.0", training_variant="seq8", pretrained=True)
# Process a list of frames (H,W,3) [0,1] with or without bundle adjustment refinement
results = anycam.process_video(frames, ba_refinement=True)
# Access the results
trajectory = results["trajectory"] # Camera poses
depths = results["depths"] # Depth maps
uncertainties = results["uncertainties"] # Uncertainty maps
projection_matrix = results["projection_matrix"] # Camera intrinsics
The process_video function accepts the following parameters:
- frames: List of frames as numpy arrays with shape (H,W,3) and values in [0,1]
- config: Optional configuration dictionary for processing
- ba_refinement: Whether to perform bundle adjustment refinement (default: True)
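Putting this together, here is a minimal end-to-end sketch. Only the torch.hub.load call, process_video, and the result keys above come from this README; loading the video with imageio (which needs an FFmpeg/pyav backend) is an assumption, and any loader that yields (H,W,3) float arrays in [0,1] works just as well:

import imageio.v3 as iio
import numpy as np
import torch

# Load the pretrained AnyCam model from PyTorch Hub.
anycam = torch.hub.load('Brummi/anycam', 'AnyCam', version="1.0", training_variant="seq8", pretrained=True)

# Read a video and convert each frame to float32 (H,W,3) in [0,1].
video = iio.imread("/path/to/video.mp4")  # shape (N,H,W,3), uint8
frames = [frame.astype(np.float32) / 255.0 for frame in video]

# Run AnyCam, optionally with bundle adjustment refinement.
results = anycam.process_video(frames, ba_refinement=True)

print("Estimated", len(results["trajectory"]), "camera poses")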
To evaluate the AnyCam model, run the following command:
python anycam/scripts/evaluate_trajectories.py -cn evaluate_trajectories ++model_path=pretrained_models/anycam_seq8
You can also enable the with_rerun option during evaluation to visualize the process in rerun.io:
python anycam/scripts/evaluate_trajectories.py -cn evaluate_trajectories ++model_path=pretrained_models/anycam_seq8 ++fit_video.ba_refinement.with_rerun=true
You can use the Jupyter notebook anycam/scripts/anycam_4d_plot.ipynb
for visualizing the results.
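If you only need a quick look at the recovered trajectory outside the notebook, a small matplotlib sketch like the one below can help. It assumes that each entry of results["trajectory"] from process_video is a 4x4 pose matrix with the camera position in the last column, which may differ from the actual output format:

import numpy as np
import matplotlib.pyplot as plt

# Stack the per-frame pose matrices and extract the camera positions
# (assuming 4x4 matrices with the translation in the last column).
poses = np.stack([np.asarray(p) for p in results["trajectory"]])
positions = poses[:, :3, 3]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(positions[:, 0], positions[:, 1], positions[:, 2], marker="o", markersize=2)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.set_title("Estimated camera trajectory")
plt.show()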
For more details, refer to the individual scripts and configuration files in the repository.
We use five datasets to train AnyCam:
We will soon release instructions on how to set up the data.
To train the AnyCam model, run the following commands. The provided setup assumes two A100 40GB GPUs. If your setup is different, modify the ++nproc_per_node and ++backend flags.
First stage (2 frames):
python train_anycam.py -cn anycam_training \
++nproc_per_node=2 \
++backend=nccl \
++name=anycam_seq2 \
++output.unique_id=baseline
Second stage (8 frames):
python train_campred.py -cn anycam_training \
++nproc_per_node=2 \
++backend=nccl \
++name=anycam_seq8 \
++output.unique_id=baseline \
++batch_size=4 \
++dataset_params.frame_count=8 \
++training.optimizer.args.lr=1e-5 \
++training.from_pretrained=out/anycam_training/anycam_seq2_backend-nccl-2_baseline/training_checkpoint_247500.pt \
~dataloading.staged_datasets \
++validation.validation.fit_video_config=cam_pred/configs/eval_cfgs/train_eval.yaml \
++loss.0.lambda_label_scale=100