Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models (ICCV 2025)
⭐ If PURE is helpful to your images or projects, please help star this repo. Thanks! 🤗
- 2025.07.07 🎉🎉🎉 Training code is released! 🎉🎉🎉
- 2025.04.11 🎉🎉🎉 Inference code and checkpoints are released! 🎉🎉🎉
- 2025.03.13 🎉🎉🎉 PURE is released! 🎉🎉🎉
Clone the repository and set up the environment:

```bash
git clone https://github.com/nonwhy/PURE.git && cd PURE
conda create -n pure python=3.10 -y
conda activate pure
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install -e .
```
You need to download the VQ-VAE model from LlamaGen and place it at `pure/tokenizer/vq_ds8_c2i.pt`.
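For example, the checkpoint can be fetched programmatically with `huggingface_hub` (a minimal sketch; the `FoundationVision/LlamaGen` repo id is an assumption, adjust it to wherever the checkpoint is actually hosted):

```python
# Minimal sketch: fetch the LlamaGen VQ-VAE checkpoint via huggingface_hub.
# The repo id "FoundationVision/LlamaGen" is an assumption; adjust it if the
# checkpoint is hosted elsewhere.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="FoundationVision/LlamaGen",  # assumed hosting repo
    filename="vq_ds8_c2i.pt",
    local_dir="pure/tokenizer",
)
```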
A minimal example of PURE inference:
```python
from inference_solver import FlexARInferenceSolver
from PIL import Image
from utils.wavelet_color_fix import wavelet_color_fix

# Load the PURE checkpoint.
inference_solver = FlexARInferenceSolver(
    model_path="nonwhy/PURE",
    precision="bf16",
    target_size=512,
)

# Read the low-quality input and resize it to the model's 512x512 working size.
image_path = "/path/to/example_input.png"
image = Image.open(image_path)
if image.size != (512, 512):
    image = image.resize((512, 512), Image.BICUBIC)

# Build the perceive-understand-restore prompt; <|image|> marks the image slot.
q1 = "Perceive the degradation level, understand the image content, and restore the high-quality image. <|image|>"
images = [image]
qas = [[q1, None]]

generated = inference_solver.generate(
    images=images,
    qas=qas,
    max_gen_len=8192,
    temperature=0.9,
    logits_processor=inference_solver.create_logits_processor(
        cfg=0.8,
        text_top_k=1,
    ),
)

# generated[0] is the text response; generated[1] holds the restored image(s).
new_image = generated[1][0]
new_image = wavelet_color_fix(new_image, image)  # match colors to the input
new_image.save("./example_output.png", "PNG")

text = generated[0]
print(text)
```
The final output image resolution is 512x512, so your input image resolution should be 128x128 (4x), 256x256 (2x), or 512x512 (1x). You can adjust the temperature and CFG (classifier-free guidance) parameters to obtain different restoration results.
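To compare settings, a simple sweep over `temperature` and `cfg` can reuse `inference_solver`, `images`, `qas`, and `image` from the example above (the value grids below are illustrative, not recommended settings):

```python
# Illustrative sweep over temperature and CFG; reuses `inference_solver`,
# `images`, `qas`, and `image` from the inference example above. The value
# grids are our own choices, not recommended settings.
import itertools

for temperature, cfg in itertools.product([0.7, 0.9, 1.0], [0.6, 0.8, 1.0]):
    generated = inference_solver.generate(
        images=images,
        qas=qas,
        max_gen_len=8192,
        temperature=temperature,
        logits_processor=inference_solver.create_logits_processor(
            cfg=cfg,
            text_top_k=1,
        ),
    )
    restored = wavelet_color_fix(generated[1][0], image)
    restored.save(f"./example_output_t{temperature}_cfg{cfg}.png", "PNG")
```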
We can seamlessly accelerate inference through Speculative Jacobi Decoding:

```bash
python test_pure_jacobi.py
```
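For intuition, here is a toy sketch of the plain (greedy, non-speculative) Jacobi iteration that Speculative Jacobi Decoding builds on: a block of draft tokens is re-predicted in parallel until it stops changing, at which point it equals the sequential greedy output. The `next_token` stand-in model below is ours, not part of PURE; the actual SJD additionally uses a probabilistic acceptance rule so that sampling-based decoding is accelerated too.

```python
# Toy sketch of greedy (non-speculative) Jacobi decoding for an AR model.
# `next_token` is a dummy stand-in for a real model's greedy next-token call.
def next_token(prefix: list[int]) -> int:
    # Deterministic dummy "model": next token is a hash of the prefix.
    return (sum(prefix) * 31 + len(prefix)) % 100

def jacobi_decode(prompt: list[int], block_len: int = 8, max_iters: int = 50) -> list[int]:
    block = [0] * block_len  # arbitrary initial draft
    for _ in range(max_iters):
        # One Jacobi iteration: re-predict every position from the current draft.
        new_block = [next_token(prompt + block[:i]) for i in range(block_len)]
        if new_block == block:  # fixed point == sequential greedy output
            break
        block = new_block
    return block

print(jacobi_decode([1, 2, 3]))
```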
For training details, please refer to TRAIN.md.
Following SeeSR, we train PURE on LSDIR+FFHQ10k. To generate realistic LQ-HQ image pairs for training, we apply the degradation pipeline from Real-ESRGAN.
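For intuition, here is a minimal sketch of a single Real-ESRGAN-style degradation stage (blur, downsample, noise, JPEG); the actual pipeline chains two randomized stages, and all parameter ranges below are illustrative rather than the exact training configuration:

```python
# Minimal sketch of one Real-ESRGAN-style degradation stage
# (blur -> downsample -> noise -> JPEG). All parameter ranges are
# illustrative; the real pipeline chains two randomized stages.
import cv2
import numpy as np

def degrade(hq_bgr: np.ndarray, scale: int = 4) -> np.ndarray:
    rng = np.random.default_rng()
    # 1. Gaussian blur with a randomly sampled sigma.
    lq = cv2.GaussianBlur(hq_bgr, (21, 21), sigmaX=float(rng.uniform(0.2, 3.0)))
    # 2. Downsample with a randomly chosen interpolation kernel.
    h, w = lq.shape[:2]
    interp = int(rng.choice([cv2.INTER_AREA, cv2.INTER_LINEAR, cv2.INTER_CUBIC]))
    lq = cv2.resize(lq, (w // scale, h // scale), interpolation=interp)
    # 3. Additive Gaussian noise.
    noise = rng.normal(0.0, float(rng.uniform(1.0, 25.0)), lq.shape)
    lq = np.clip(lq.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # 4. JPEG compression round-trip.
    quality = int(rng.integers(30, 96))
    _, buf = cv2.imencode(".jpg", lq, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```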
| Model | Size | Resolution | Hugging Face |
| --- | --- | --- | --- |
| PURE | 7B | 512 | nonwhy/PURE |
Thanks to the following excellent open-source projects:
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
- Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
- Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
This project is released under the MIT License.
If you have any questions, please feel free to contact: [email protected]
If PURE helps your research or work, please consider citing our paper:
```bibtex
@misc{wei2025perceiveunderstandrestorerealworld,
  title={Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models},
  author={Hongyang Wei and Shuaizheng Liu and Chun Yuan and Lei Zhang},
  year={2025},
  eprint={2503.11073},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.11073},
}
```