mindspore-lab/mindone

MindSpore ONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

News

  • [2025.04.10] We released v0.3.0, adding more than 15 SoTA generative models, including Flux, CogView4, OpenSora2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
  • [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
  • [2024.11.06] v0.2.0 is released

Quick tour

To install v0.3.0, first install MindSpore 2.5.0, then run: pip install mindone

Alternatively, to install the latest version from the master branch, run:

git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
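
To confirm that the environment matches the versions named above, a small check like the following may help. It is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution names `mindspore` and `mindone` are assumed to match the pip package names.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report what is installed so it can be compared with the versions above.
    for name in ("mindspore", "mindone"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

Reading the package metadata avoids importing MindSpore itself, so the check stays fast and also works on machines where the framework is not installed.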

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
image = pipe(prompt)[0][0]
image.save("sd3.png")
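
The call above relies on the pipeline defaults. As a sketch, the same steps can be wrapped in a helper that exposes the sampling parameters. `generate_image` is a hypothetical name, the default values are chosen for illustration, and `num_inference_steps` and `guidance_scale` are assumed to be accepted by the pipeline call as they are in hf diffusers, so check them against the mindone version you have installed.

```python
def generate_image(prompt, out_path="sd3.png", num_inference_steps=28, guidance_scale=7.0):
    """Generate one image with Stable Diffusion 3 and save it to out_path."""
    # Imported lazily so that defining this helper does not require MindSpore.
    import mindspore
    from mindone.diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        mindspore_dtype=mindspore.float16,
    )
    # As in the example above, the first element of the result holds the images.
    # The keyword arguments are assumed to mirror the hf diffusers call signature.
    image = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    )[0][0]
    image.save(out_path)
    return out_path
```

Calling `generate_image("A cat holding a sign that says 'Hello MindSpore'")` would then follow the same path as the snippet above. Fewer steps usually trade image quality for speed.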

run hf diffusers on mindspore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.5.0 on Ascend Atlas 800T A2 machines.
  • compatible with hf diffusers 0.32.2

| component | features |
|---|---|
| pipeline | supports text-to-image, text-to-video, and text-to-audio tasks (160+) |
| models | supports autoencoder & transformer base models, same as hf diffusers (50+) |
| schedulers | supports diffusion schedulers (e.g., ddpm and dpm solver), same as hf diffusers (35+) |

supported models under mindone/examples

| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Image-to-Video | hunyuanvideo-i2v 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | wan2.1 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | cogview4 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipuai |
| Text-to-Video | step_video_t2v 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
| Image-Text-to-Text | qwen2_vl 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Any-to-Any | janus 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 🔥🔥 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B 🔥🔥 | ✅ | ✅ | ✅ | Zhipu |
| Text-to-Video | open sora plan 1.3 🔥🔥 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
| Text-to-Video | movie gen 30B 🔥🔥 | ✅ | ✅ | ✅ | Meta |
| Video-Encode-Decode | magvit | ✅ | ✅ | ✅ | Google |
| Text-to-Image | story_diffusion | ✅ | ✖️ | ✖️ | ByteDance |
| Image-to-Video | dynamicrafter | ✅ | ✖️ | ✖️ | Tencent |
| Video-to-Video | venhancer | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
| Text-to-Video | t2v_turbo | ✅ | ✅ | ✅ | Google |
| Image-to-Video | svd | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Video | animate diff | ✅ | ✅ | ✅ | CUHK |
| Text/Image-to-Video | video composer | ✅ | ✅ | ✅ | Alibaba |
| Text-to-Image | flux 🔥 | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 🔥 | ✅ | ✅ | ✖️ | Stability AI |
| Text-to-Image | kohya_sd_scripts | ✅ | ✅ | ✖️ | kohya |
| Text-to-Image | stable diffusion xl | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Image | stable diffusion | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Image | hunyuan_dit | ✅ | ✅ | ✅ | Tencent |
| Text-to-Image | pixart_sigma | ✅ | ✅ | ✅ | Huawei |
| Text-to-Image | fit | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Class-to-Video | latte | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Class-to-Image | dit | ✅ | ✅ | ✅ | Meta |
| Text-to-Image | t2i-adapter | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Text-to-Image | ip adapter | ✅ | ✅ | ✅ | Tencent |
| Text-to-3D | mvdream | ✅ | ✅ | ✅ | ByteDance |
| Image-to-3D | instantmesh | ✅ | ✅ | ✅ | Tencent |
| Image-to-3D | sv3d | ✅ | ✅ | ✅ | Stability AI |
| Text/Image-to-3D | hunyuan3d-1.0 | ✅ | ✅ | ✅ | Tencent |

supported captioner

| task | model | inference | finetune | pretrain | features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava 🔥 | ✅ | ✖️ | ✖️ | supports video and image captioning |

About

one for all, Optimal generator with No Exception
