Juncheng Li1, Siliang Tang1, Hanwang Zhang2, Yueting Zhuang1
1Zhejiang University, 2Nanyang Technological University, 3Alibaba Group
*Equal Contribution.
AnyEdit is a comprehensive multimodal instruction editing dataset comprising 2.5 million high-quality editing pairs that span more than 20 editing types across five domains. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, an adaptive editing process, and automated selection of editing results. Using the dataset, we further train AnyEdit Stable Diffusion (AnySD), a novel model with task-aware routing and learnable task embeddings, for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models, opening up promising prospects for instruction-driven image editing models that support human creativity.
- [2025.04.05] 🎉AnyEdit has been accepted by CVPR 2025 (scores: 5, 5, 5) as an Oral presentation.
- We have released the training & inference scripts and the model weights of AnySD. For details on training, or on using AnySD for your own image editing, please refer to our model repo.
- [2024.12.23] We have finished uploading the AnyEdit dataset, the AnyEdit-Test benchmark, and the AnyEdit data curation pipelines.
- Release AnyEdit datasets.
- Release AnyEdit-Test Benchmark.
- Release data curation pipelines.
- Release inference code.
- Release training scripts.
The full training set and dev set are publicly available on Hugging Face. We provide the test split only as a zip file to prevent potential data contamination from foundation models crawling the test set for training. Please download the test set here.
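For programmatic download of the public splits, the snippet below is a minimal sketch using the `huggingface_hub` library; the repository id is a placeholder to replace with the dataset id shown on our Hugging Face page, and the local directory is only an example.

```python
# Minimal sketch: fetch the public AnyEdit splits from Hugging Face.
# "ANYEDIT_DATASET_ID" is a placeholder -- substitute the dataset id from our Hugging Face page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ANYEDIT_DATASET_ID",           # placeholder, e.g. "<org>/<dataset-name>"
    repo_type="dataset",                    # the splits are hosted as a dataset repo
    local_dir="Datasets/anyedit_datasets",  # example target directory
)
print(f"AnyEdit downloaded to {local_dir}")
```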
We comprehensively categorize image editing tasks into five groups based on the editing capabilities they require:
- (a) Local Editing, which focuses on region-based editing (green area);
- (b) Global Editing, which focuses on rendering the entire image (yellow area);
- (c) Camera Move Editing, which focuses on changing viewpoints rather than scenes (gray area);
- (d) Implicit Editing, which requires commonsense knowledge to complete complex edits (orange area);
- (e) Visual Editing, which incorporates additional visual inputs, addressing the requirements of multi-modal editing (blue area).
- General Data Preparation
- Diverse Instruction Generation
- Adaptive Editing Pipelines
- Data Quality Enhancement
{
  "edit": "change the airplane to green",  # editing instruction
  "edited object": "airplane",  # the edited region; only for local editing, otherwise None
  "input": "a small airplane sits stationary on a piece of concrete.",  # caption of the original image
  "output": "A green small airplane sits stationary on a piece of concrete.",  # caption of the edited image
  "edit_type": "color_alter",  # editing type
  "visual_input": "None",  # reference image for visual-input instructions, otherwise None
  "image_file": "COCO_train2014_000000521165.jpg",  # file name of the original image
  "edited_file": "xxxxx.png"  # file name of the edited image
}
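To make the format concrete, here is a minimal Python sketch that reads a list of such records and resolves the original/edited image paths; the file and directory names are illustrative, not fixed names from the release.

```python
import json
import os

# Illustrative paths -- point these at your instruction JSON and image roots.
INSTRUCTION_JSON = "Datasets/color_alter_instructions.json"
ORIGINAL_IMAGE_ROOT = "Datasets/coco/train2014"
EDITED_IMAGE_ROOT = "Datasets/flux_coco_images"

with open(INSTRUCTION_JSON, "r") as f:
    records = json.load(f)  # a list of dicts in the format shown above

for rec in records:
    src = os.path.join(ORIGINAL_IMAGE_ROOT, rec["image_file"])  # original image
    dst = os.path.join(EDITED_IMAGE_ROOT, rec["edited_file"])   # edited image
    print(f'{rec["edit_type"]}: "{rec["edit"]}" ({src} -> {dst})')
```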
- Create a new conda environment and download the pretrained weights
bash setup.sh
- Download all of our candidate datasets.
- Instruction Generation (please refer to CaptionsGenerator).
- Pre-filter for target images (before editing)
CUDA_VISIBLE_DEVICES=2 python pre_filter.py --instruction-path [xx.json] --instruction-type [] --image-root []
- Image Editing (refer to scripts for more examples)
- Post-filter for final datasets
CUDA_VISIBLE_DEVICES=2 python post_filter.py --instruction-type []
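The pre-filter, editing, and post-filter steps above can also be batched from Python. The sketch below is illustrative only (the instruction JSON names and GPU id are placeholders) and uses just the command-line flags documented above.

```python
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")  # pick your GPU
EDIT_TYPES = ["add", "remove", "replace"]         # example subset of editing types

for edit_type in EDIT_TYPES:
    # Pre-filter target images for this editing type (instruction path is a placeholder name).
    subprocess.run(
        ["python", "pre_filter.py",
         "--instruction-path", f"Datasets/{edit_type}_instructions.json",
         "--instruction-type", edit_type,
         "--image-root", "Datasets/coco/train2014"],
        env=env, check=True,
    )
    # ... run the corresponding editing scripts for this type here (see the scripts folder) ...
    # Post-filter the edited results into the final dataset.
    subprocess.run(
        ["python", "post_filter.py", "--instruction-type", edit_type],
        env=env, check=True,
    )
```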
- Datasets/
  - anyedit_datasets/
    - add
    - remove
    - replace
  - coco/
    - train2014/
      - 0.jpg
      - 1.jpg
  - flux_coco_images/
    - 0.jpg
    - 1.jpg
  - add_postfilter.json
  - remove_postfilter.json
  - replace_postfilter.json
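As a quick sanity check on this layout, the sketch below (assuming each `*_postfilter.json` holds a list of records in the instruction format shown earlier) verifies that every record points at existing original and edited images.

```python
import glob
import json
import os

DATASET_ROOT = "Datasets"  # adjust if your layout differs

for json_path in glob.glob(os.path.join(DATASET_ROOT, "*_postfilter.json")):
    with open(json_path, "r") as f:
        records = json.load(f)  # assumed: list of records in the instruction format above
    missing = 0
    for rec in records:
        src = os.path.join(DATASET_ROOT, "coco", "train2014", rec["image_file"])
        dst = os.path.join(DATASET_ROOT, "flux_coco_images", rec["edited_file"])
        if not (os.path.exists(src) and os.path.exists(dst)):
            missing += 1
    name = os.path.basename(json_path)
    print(f"{name}: {len(records)} records, {missing} with missing images")
```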
If you find this work useful for your research, please cite our paper and star our GitHub repo:
@article{yu2024anyedit,
  title={AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea},
  author={Yu, Qifan and Chow, Wei and Yue, Zhongqi and Pan, Kaihang and Wu, Yang and Wan, Xiaoyang and Li, Juncheng and Tang, Siliang and Zhang, Hanwang and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2411.15738},
  year={2024}
}