A novel 4D reconstruction method that directly generates a high-quality, animation-ready 4D mesh asset (.GLB file) from a single monocular video.

V2M4's preface

Give us a ⭐ if this repo helps you!

This repository is the official implementation of V2M4 (Accepted by ICCV 2025)! 🚀

If you have any questions about the paper, please feel free to contact us. You can open an issue or send an email to [email protected]. We also welcome any idea exchange and discussion.

Updates

[07/04/2025] Evaluation code released.

[06/30/2025] 🎉 Code is now publicly released! We invite you to try it out. The released version includes several NEW FEATURES beyond the initial paper:

  • Support for multiple state-of-the-art 3D generators: TRELLIS, Hunyuan3D-2.0, TripoSG, and CraftsMan3D
  • Integration of advanced stereo prediction techniques: DUSt3R and VGGT for improved camera search
  • Enhanced mesh registration with the CoTracker3 tracking technique

Explore these new capabilities and let us know your feedback! 🚀🚀🚀

[06/25/2025] Paper accepted by ICCV 2025. 🎉🎉

[03/18/2025] Repository init.

Table of Contents

  • Abstract
  • Requirements & Installation
  • 4D Mesh Animation Reconstruction
  • Rendering videos based on the reconstructed results
  • Evaluation
  • Results
  • Citation & Acknowledgments
  • License

Abstract

(Figure: V2M4's framework)

We present V2M4, a novel 4D reconstruction method that directly generates a usable 4D mesh animation asset from a single monocular video. Unlike existing approaches that rely on priors from multi-view image and video generation models, our method is based on native 3D mesh generation models. Naively applying 3D mesh generation models to generate a mesh for each frame in a 4D task can lead to issues such as incorrect mesh poses, misalignment of mesh appearance, and inconsistencies in mesh geometry and texture maps. To address these problems, we propose a structured workflow that includes camera search and mesh reposing, condition embedding optimization for mesh appearance refinement, pairwise mesh registration for topology consistency, and global texture map optimization for texture consistency. Our method outputs high-quality 4D animated assets that are compatible with mainstream graphics and game software. Experimental results across a variety of animation types and motion amplitudes demonstrate the generalization and effectiveness of our method.

Requirements & Installation

  1. Hardware Requirements

    • GPU: 1x high-end NVIDIA GPU with at least 40 GB of memory
  2. Installation

    • Please follow the detailed instructions in Install.md to install the required packages and set up the environment.
    • The code has been tested with Python 3.10 + PyTorch 2.4.0 + CUDA 12.1.
  3. Datasets

    • Demo datasets are provided in examples; you can directly run the reconstruction command below to see the results.
    • If you want to test your own videos, please follow the format of the demo datasets. (The input image frames can be either RGB images with background or transparent RGBA images.) We also provide some useful data preparation scripts in data_preparation_tools to help you prepare the input data. (For better performance, we recommend that the object in the first frame has minimal part overlap and that no parts touch each other; this helps avoid artificial topology issues during reconstruction.)
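
Before running the pipeline, you can quickly verify that your environment matches the tested configuration and that your GPU has enough memory. This is a minimal optional sketch assuming PyTorch is already installed per Install.md; it is not part of the repository's tooling.

# Quick environment sanity check (optional, not part of this repository's tooling).
import torch

print(f"PyTorch version: {torch.__version__}")   # tested with 2.4.0
print(f"CUDA version:    {torch.version.cuda}")  # tested with 12.1

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    mem_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, {mem_gb:.1f} GB")
    if mem_gb < 40:
        print("Warning: less than 40 GB of GPU memory; the pipeline may run out of memory.")
else:
    print("Warning: no CUDA-capable GPU detected.")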

4D Mesh Animation Reconstruction

To reconstruct a 4D mesh animation from a single monocular video, please refer to the following command:

python main.py \
  --root {your_video_folder} \
  --output {your_output_folder} \
  --model Hunyuan \                     # Base 3D generator; see the argument descriptions below for the performance order we observed
  --N 1 \
  --n 0 \
  --skip 5 \
  --seed 42 \
  --use_vggt \                          # (Highly recommended) Use VGGT for camera search; omit to use DUSt3R instead
  --baseline \                          # (Optional) Run the baseline, i.e., directly generate a mesh per frame with the 3D generator, without V2M4
  --use_tracking \                      # (Optional) Use point tracking to guide mesh registration; increases memory usage and runtime
  --blender_path {your_blender_path}    # Directory containing the Blender executable

Argument Descriptions:

  • --root: Root directory of the dataset
  • --output: Output directory for results (detailed intermediate results are saved in this folder for debugging and analysis; the final reconstructed GLB file is named output_animation.glb)
  • --model: Base model to use: TRELLIS, Hunyuan, TripoSG, or Craftsman (performance order in our experiments, from best to worst: Hunyuan ≈ TripoSG > TRELLIS ≈ Craftsman3D; actual performance may vary with your data and use case)
  • --N: Total number of parallel processes (default: 1); see the launcher sketch after this list
  • --n: Index of the current process (default: 0)
  • --skip: Frame skip interval, useful for large object movement (default: 5)
  • --seed: Random seed for reproducibility (default: 42)
  • --baseline: Run the baseline model without V2M4 (flag)
  • --use_vggt: (Highly recommended) Use VGGT for camera search; omit to use DUSt3R instead
  • --use_tracking: Use point tracking for mesh registration guidance (increases memory usage and runtime)
  • --blender_path: Path to the Blender executable directory (example: blender-4.2.1-linux-x64/)
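
To use the --N/--n pair, launch --N processes, each with a distinct index --n, so that every process handles its share of the work (for example, one process per GPU). The sketch below shows one possible launcher built on Python's subprocess module; it is an illustration only (the per-process GPU assignment via CUDA_VISIBLE_DEVICES is our assumption), not a launcher shipped with this repository.

# Hypothetical launcher: run V2M4 with --N parallel processes, one per GPU.
# GPU assignment via CUDA_VISIBLE_DEVICES is an assumption, not repository behavior.
import os
import subprocess

N = 2  # total number of parallel processes
procs = []
for n in range(N):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(n))
    cmd = [
        "python", "main.py",
        "--root", "examples",
        "--output", "results",
        "--model", "Hunyuan",
        "--N", str(N),
        "--n", str(n),
        "--skip", "5",
        "--seed", "42",
        "--use_vggt",
        "--blender_path", "blender-4.2.1-linux-x64/",
    ]
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()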

Example:

python main.py --root examples --output results --model Hunyuan --N 1 --n 0 --skip 5 --seed 42 --use_vggt --use_tracking --blender_path blender-4.2.1-linux-x64/

Note: In some cases, the reconstruction results may not be fully satisfactory. We recommend experimenting with different random seeds and adjusting the --skip value by ±1 to potentially achieve better outcomes.
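
Once a run finishes, you can quickly sanity-check the exported asset (named output_animation.glb, as noted above) outside of Blender, for example with the trimesh library. This is an optional convenience sketch, assuming trimesh is installed and that you substitute the actual path to the exported file in your output folder; it is not part of the V2M4 pipeline.

# Optional: inspect an exported .glb outside of Blender (assumes `pip install trimesh`).
import trimesh

# Substitute the actual path to the exported asset inside your output folder.
scene = trimesh.load("results/your_animation/output_animation.glb")

print(f"Geometries: {list(scene.geometry.keys())}")
for name, mesh in scene.geometry.items():
    print(f"  {name}: {len(mesh.vertices)} vertices, {len(mesh.faces)} faces")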

Rendering videos based on the reconstructed results

After reconstructing your 4D mesh animation, you can render videos from the generated mesh sequences using our provided script. This script will render images and videos from the reconstructed .glb mesh files for each animation.

Usage:

python rendering_video.py --result_path {your_results_folder} [--baseline] [--normal] [--interpolate N]

Argument Descriptions:

  • --result_path: Path to the folder containing your reconstructed results (default: results)
  • --baseline: (Optional) Also render videos for the baseline mesh results
  • --normal: (Optional) Render normal maps in addition to texture images
  • --interpolate N: (Optional) Number of interpolation steps between frames for smoother animation (default: 1)

Example:

python rendering_video.py --result_path results --baseline --normal

Output: This will generate the following outputs for each animation:

  • Individual rendered images in output_final_rendering_images/ subfolder
  • Main animation video: output_final_rendering_video.mp4
  • Interpolated animation video: output_final_rendering_video_interpolated_{N}.mp4 (if --interpolate > 1)
  • Normal map videos (if --normal flag is used)
  • Baseline comparison videos (if --baseline flag is used)
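
If you want to re-encode the rendered frames yourself (for example at a different frame rate), the individual images in the output_final_rendering_images/ subfolder can be assembled into a video with imageio. This is a small optional sketch assuming imageio and imageio-ffmpeg are installed and that the frames are PNG files whose sorted names give the frame order; rendering_video.py already produces the videos listed above.

# Optional: re-encode rendered frames at a custom frame rate.
# Assumes `pip install imageio imageio-ffmpeg`; the frame naming/order is an assumption.
import glob
import imageio.v2 as imageio

frames = sorted(glob.glob("results/your_animation/output_final_rendering_images/*.png"))

with imageio.get_writer("custom_rendering_video.mp4", fps=15) as writer:
    for path in frames:
        writer.append_data(imageio.imread(path))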

Requirements:

  • Your results folder should contain the reconstructed .glb files with the _texture_consistency_sample.glb suffix
  • An extrinsics_list.pkl file for each animation (automatically generated during reconstruction)
  • If using --baseline, files with the _baseline_sample.glb suffix should also be present
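
A quick way to confirm that a results folder satisfies these requirements before rendering is to scan it for the expected files. The helper below is hypothetical (not part of the repository) and only checks file presence by suffix or name.

# Hypothetical pre-rendering check: scan a results folder for the files
# rendering_video.py expects (presence only; not part of the repository).
from pathlib import Path

def check_results_folder(result_path="results", baseline=False):
    root = Path(result_path)
    glbs = list(root.rglob("*_texture_consistency_sample.glb"))
    extrinsics = list(root.rglob("extrinsics_list.pkl"))
    print(f"Reconstructed meshes: {len(glbs)} file(s) ending in _texture_consistency_sample.glb")
    print(f"Camera extrinsics:    {len(extrinsics)} extrinsics_list.pkl file(s)")
    if baseline:
        baselines = list(root.rglob("*_baseline_sample.glb"))
        print(f"Baseline meshes:      {len(baselines)} file(s) ending in _baseline_sample.glb")

check_results_folder("results", baseline=True)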

Evaluation

To evaluate the quality of animation reconstruction results against ground truth videos, we provide a comprehensive evaluation script that calculates several widely used video similarity metrics.

The evaluation includes the following metrics: FVD, LPIPS, DreamSim, and CLIP Loss.

cd evaluation
python evaluation.py --gt_videos_path {path_to_GT_videos} --result_videos_path {path_to_V2M4_rendering_videos}
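
For intuition about what these metrics capture, the snippet below computes a per-frame LPIPS score between a ground-truth video and a rendered video. It is a minimal, standalone illustration only (assuming the lpips and opencv-python packages, matching video resolutions and lengths, and placeholder file paths); use evaluation.py above for the actual reported numbers.

# Minimal per-frame LPIPS illustration (not the repository's evaluation script).
# Assumes `pip install lpips opencv-python` and videos of matching size/length.
import cv2
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")

def to_tensor(frame_bgr):
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    t = torch.from_numpy(rgb).permute(2, 0, 1).float() / 127.5 - 1.0  # scale to [-1, 1]
    return t.unsqueeze(0)

gt = cv2.VideoCapture("path_to_GT_videos/animation1.mp4")
pred = cv2.VideoCapture("path_to_V2M4_rendering_videos/animation1.mp4")
scores = []
while True:
    ok_a, a = gt.read()
    ok_b, b = pred.read()
    if not (ok_a and ok_b):
        break
    with torch.no_grad():
        scores.append(loss_fn(to_tensor(a), to_tensor(b)).item())

if scores:
    print(f"Mean per-frame LPIPS: {sum(scores) / len(scores):.4f}")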

Please ensure your video structure follows this format:

├── path_to_GT_videos/
│   ├── animation1.mp4
│   ├── animation2.mp4
│   └── ...
├── path_to_V2M4_rendering_videos/
│   ├── animation1.mp4
│   ├── animation2.mp4
│   └── ...
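
An optional way to verify this layout before running the evaluation is to compare the video filenames in both folders; the helper below is hypothetical and not part of evaluation.py.

# Hypothetical check that GT and rendered video folders contain matching files
# (not part of evaluation.py).
from pathlib import Path

gt_dir = Path("path_to_GT_videos")
result_dir = Path("path_to_V2M4_rendering_videos")
gt_names = {p.name for p in gt_dir.glob("*.mp4")}
result_names = {p.name for p in result_dir.glob("*.mp4")}

print(f"Matched videos:       {sorted(gt_names & result_names)}")
print(f"Missing renderings:   {sorted(gt_names - result_names)}")
print(f"Missing ground truth: {sorted(result_names - gt_names)}")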

Results

The results below were obtained using the TRELLIS generator with DUSt3R for camera search, as described in our initial paper. For improved performance, we recommend trying our newly supported models, such as Hunyuan3D-2.0 and VGGT.

Visual comparisons

Citation & Acknowledgments

If you find this paper useful in your research, please consider citing:

@article{chen2025v2m4,
  title={V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video},
  author={Chen, Jianqi and Zhang, Biao and Tang, Xiangjun and Wonka, Peter},
  journal={arXiv preprint arXiv:2503.09631},
  year={2025}
}

We gratefully acknowledge the authors and contributors of the following open-source projects, whose work made this research possible: TRELLIS, CraftsMan3D, TripoSG, Hunyuan3D-2.0, DUSt3R, VGGT, CoTracker3, etc. We appreciate their commitment to open research and the broader scientific community.

License

This project is licensed under the MIT license. See LICENSE for details.
