
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

This is the official implementation for the CVPR 2025 paper:

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

Felix Wimbauer^1,2,3, Weirong Chen^1,2,3, Dominik Muhle^1,2, Christian Rupprecht^3, and Daniel Cremers^1,2
^1 Technical University of Munich, ^2 MCML, ^3 University of Oxford

CVPR 2025 (arXiv)

If you find our work useful, please consider citing our paper:

@inproceedings{wimbauer2025anycam,
  title={AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos},
  author={Wimbauer, Felix and Chen, Weirong and Muhle, Dominik and Rupprecht, Christian and Cremers, Daniel},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

News

2025/05 Added PyTorch Hub integration and uploaded model files to HuggingFace.

2025/04 Improved demo script and COLMAP export.

2025/04 Initial code release.

ToDos

  • Project page with interactive demos
  • Demo script
  • PyTorch Hub integration
  • HuggingFace space
  • Scripts for training data

Setting Up the Environment

To set up the environment, follow these steps individually or use the combined command below:

  1. Create a new conda environment with Python 3.11:

    conda create -n anycam python=3.11
  2. Activate the conda environment:

    conda activate anycam
  3. Install PyTorch according to your CUDA version:

    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
  4. Install the corresponding CUDA toolkit for compilation:

    conda install -c nvidia cuda-toolkit
  5. Install the required packages from requirements.txt:

    pip install -r requirements.txt

Combined, this yields the following command. Building might take a few minutes.

conda create -n anycam python=3.11 -y && \
conda activate anycam && \
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 && \
conda install -c nvidia cuda-toolkit -y && \
pip install -r requirements.txt
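
To verify the installation, you can optionally run a quick sanity check in the activated environment (a minimal check; the expected version corresponds to the pip command above):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"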

Note on dependencies

We use slightly customized forks of UniMatch and UniDepth (to ensure backward compatibility). Furthermore, we use the minipytorch3d variant from VGGSfM.

Download pretrained checkpoint

To download pretrained models, you can use the download_checkpoints.sh script. Follow these steps:

  1. Open a terminal and navigate to the root directory of the repository.

  2. Run the download_checkpoints.sh script with the desired model name. For example, to download the final anycam_seq8 model, use the following command:

    ./download_checkpoints.sh anycam_seq8

This will download and unpack the pretrained model into the pretrained_models directory. You can then use the downloaded model for evaluation or further training.

Demo

You can use the demo script to process custom videos and extract camera trajectories, depth maps, and 3D point clouds.

[AnyCam demo example]

Basic Usage

We provide a simple script to run the basic functionality of AnyCam and export the results. The results can either be visualized in rerun.io or exported to the COLMAP format. To run the model in feed-forward-only mode, set the ba_refinement flag to false. If the provided video has a high framerate, we recommend subsampling it to a lower framerate by adding the fps=10 flag (see the example after the commands below).

# Full model
python anycam/scripts/anycam_demo.py \
    ++input_path=/path/to/video.mp4 \
    ++model_path=pretrained_models/anycam_seq8 \
    ++visualize=true

# Feed-forward only, without refinement
python anycam/scripts/anycam_demo.py \
    ++input_path=/path/to/video.mp4 \
    ++model_path=pretrained_models/anycam_seq8 \
    ++ba_refinement=false \
    ++visualize=true
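
For example, to subsample a high-framerate video, you can pass the fps flag mentioned above as an additional override (a sketch, assuming the flag is given as ++fps=10 in the same Hydra override syntax as the other options):

# Full model, with the input video subsampled to 10 fps (sketch)
python anycam/scripts/anycam_demo.py \
    ++input_path=/path/to/video.mp4 \
    ++model_path=pretrained_models/anycam_seq8 \
    ++fps=10 \
    ++visualize=true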

Visualization with Remote Setup

If you are developing on a remote server, you can start rerun.io as a web server. First, open a new terminal on your remote machine and start the viewer:

rerun --serve-web

Then, forward ports 9090 and 9877 to your local machine, for example via SSH port forwarding as shown below. Finally, make sure to launch the script with the ++rerun_mode=connect flag. You should then be able to view the results in your browser at:

http://localhost:9090/?url=ws://localhost:9877
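
One common way to forward the two ports is SSH local port forwarding, assuming you connect to the remote machine via SSH (user@remote is a placeholder for your server):

ssh -L 9090:localhost:9090 -L 9877:localhost:9877 user@remote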

Export Options

Export to COLMAP format:

python anycam/scripts/anycam_demo.py \
    ++input_path=/path/to/video.mp4 \
    ++model_path=pretrained_models/anycam_seq8 \
    ++export_colmap=true \
    ++output_path=/path/to/output_dir

Save trajectory, depth maps, and other results:

python anycam/scripts/anycam_demo.py \
    ++input_path=/path/to/video.mp4 \
    ++model_path=pretrained_models/anycam_seq8 \
    ++output_path=/path/to/output_dir

PyTorch Hub Integration

AnyCam is also available through PyTorch Hub, making it easy to use the model in your own projects:

# Load the model
anycam = torch.hub.load('Brummi/anycam', 'AnyCam', version="1.0", training_variant="seq8", pretrained=True)

# Process a list of frames (H,W,3) [0,1] with or without bundle adjustment refinement
results = anycam.process_video(frames, ba_refinement=True)

# Access the results
trajectory = results["trajectory"]  # Camera poses
depths = results["depths"]          # Depth maps
uncertainties = results["uncertainties"]  # Uncertainty maps
projection_matrix = results["projection_matrix"]  # Camera intrinsics

The process_video function accepts the following parameters:

  • frames: List of frames as numpy arrays with shape (H,W,3) and values in [0,1]
  • config: Optional configuration dictionary for processing
  • ba_refinement: Whether to perform bundle adjustment (default: True)
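
Below is a minimal end-to-end sketch of the PyTorch Hub workflow. It assumes the frames are read with imageio (not prescribed by AnyCam; any library that yields (H,W,3) arrays with values in [0,1] works) and that the video path is a placeholder:

import imageio.v3 as iio
import numpy as np
import torch

# Load the pretrained model from PyTorch Hub (as shown above)
anycam = torch.hub.load('Brummi/anycam', 'AnyCam', version="1.0", training_variant="seq8", pretrained=True)

# Read the video and convert it to a list of (H, W, 3) float arrays with values in [0, 1]
video = iio.imread("/path/to/video.mp4")  # (N, H, W, 3), uint8
frames = [frame.astype(np.float32) / 255.0 for frame in video]

# Run AnyCam with bundle adjustment refinement
results = anycam.process_video(frames, ba_refinement=True)

print(results["trajectory"][0])       # first camera pose
print(results["projection_matrix"])   # estimated camera intrinsics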

Evaluation

To evaluate the AnyCam model, run the following command:

python anycam/scripts/evaluate_trajectories.py -cn evaluate_trajectories ++model_path=pretrained_models/anycam_seq8

You can also enable the with_rerun option during evaluation to log the fitting process to rerun.io:

python anycam/scripts/evaluate_trajectories.py -cn evaluate_trajectories ++model_path=pretrained_models/anycam_seq8 ++fit_video.ba_refinement.with_rerun=true

Visualization

You can use the Jupyter notebook anycam/scripts/anycam_4d_plot.ipynb for visualizing the results.

For more details, refer to the individual scripts and configuration files in the repository.

Training

Data Preparation

We use five datasets to train AnyCam:

  1. RealEstate10K
  2. YouTube VOS
  3. WalkingTours
  4. OpenDV
  5. EpicKitchens

We will soon release instructions on how to set up the data.

Training Stages

To train the AnyCam model, run the following commands. The provided setup assumes two A100 40GB GPUs. If your setup is different, modify the ++nproc_per_node and ++backend flags.

First stage (2 frames):

python train_anycam.py -cn anycam_training  \
    ++nproc_per_node=2 \
    ++backend=nccl \
    ++name=anycam_seq2 \
    ++output.unique_id=baseline

Second stage (8 frames):

python train_campred.py -cn anycam_training \
    ++nproc_per_node=2 \
    ++backend=nccl \
    ++name=anycam_seq8 \
    ++output.unique_id=baseline \
    ++batch_size=4 \
    ++dataset_params.frame_count=8 \
    ++training.optimizer.args.lr=1e-5 \
    ++training.from_pretrained=out/anycam_training/anycam_seq2_backend-nccl-2_baseline/training_checkpoint_247500.pt \
    ~dataloading.staged_datasets \
    ++validation.validation.fit_video_config=cam_pred/configs/eval_cfgs/train_eval.yaml \
    ++loss.0.lambda_label_scale=100
