[CVPR' 2025] JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

Yunlong Lin¹*♣, Zixu Lin¹*♣, Haoyu Chen²*, Panwang Pan³*, Chenxin Li⁶, Sixiang Chen², Kairun Wen¹, Yeying Jin⁴, Wenbo Li⁵†, Xinghao Ding¹†

¹Xiamen University, ²The Hong Kong University of Science and Technology (Guangzhou), ³Bytedance's Pico, ⁴Tencent, ⁵Huawei Noah's Ark Lab, ⁶The Chinese University of Hong Kong

Accepted by CVPR 2025

JarvisIR Gradio Demo: Showcasing image restoration capabilities under various adverse weather conditions

📮 Updates

2025.7.6: Released degradation synthesis code!
2025.6.26: Released inference code!
2025.6.17: Released a Chinese-language Zhihu walkthrough and the article "JarvisIR: With a VLM at the Helm, Giving Autonomous Driving 'Fiery Golden Eyes' That Do Not Fear Bad Weather", both introducing JarvisIR! 📝
2025.6.13: Released model weights (preview version) and Hugging Face online demo 🤗 🚀 ✨.
2025.6.9: Released Gradio demo, restoration tools, and SFT training code.
2025.4.8: Repository created.

♦️ Overview

JarvisIR (CVPR 2025) is a VLM-powered agent designed to tackle the challenges of vision-centric perception systems under unpredictable and coupled weather degradations. It leverages the VLM as a controller to manage multiple expert restoration models, enabling robust and autonomous operation in real-world conditions. JarvisIR employs a novel two-stage framework consisting of supervised fine-tuning and human feedback alignment, allowing it to effectively fine-tune on large-scale real-world data in an unsupervised manner. Supported by CleanBench, a comprehensive dataset with 150K synthetic and 80K real instruction-response pairs, JarvisIR demonstrates superior decision-making and restoration capabilities, achieving a 50% improvement in the average of all perception metrics on CleanBench-Real.
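The controller pattern described above can be illustrated with a toy sketch. It is not the repository's implementation: `plan_restoration` merely stands in for the VLM with a hard-coded priority order, and the experts created by `make_expert` are hypothetical placeholders that record which tool ran instead of processing pixels.

```python
from typing import Callable, Dict, List

# An "image" is modeled as a dict holding the degradations still present and
# the tools applied so far. Real experts would operate on pixel data instead.
Image = Dict[str, List[str]]
Expert = Callable[[Image], Image]


def make_expert(name: str, removes: str) -> Expert:
    """Build a hypothetical expert that removes one degradation type."""
    def run(image: Image) -> Image:
        remaining = [d for d in image["degradations"] if d != removes]
        # Return a new dict so the input image is left unmodified.
        return {"degradations": remaining, "history": image["history"] + [name]}
    return run


# Assumed mapping from degradation type to a single expert (illustrative only).
EXPERTS: Dict[str, Expert] = {
    "rain": make_expert("derain", "rain"),
    "haze": make_expert("dehaze", "haze"),
    "low_light": make_expert("enhance", "low_light"),
}


def plan_restoration(image: Image) -> List[str]:
    """Placeholder for the VLM controller: decide which degradations to
    address and in what order. The priority list here is arbitrary."""
    priority = ["low_light", "rain", "haze"]
    return [d for d in priority if d in image["degradations"]]


def restore(image: Image) -> Image:
    """Run the planned experts one after another on the image."""
    for degradation in plan_restoration(image):
        image = EXPERTS[degradation](image)
    return image


degraded: Image = {"degradations": ["haze", "rain"], "history": []}
restored = restore(degraded)
print(restored["history"])
```

The point of the sketch is the separation of concerns: the planner chooses the sequence of tools for a coupled degradation, while each expert handles one degradation type.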

💻 Getting Started

To run the Gradio demo, please follow:

Gradio Demo

For inference and model usage, please follow:

Inference Code

For image degradation data synthesis, please refer to:

Degradation Generator

For SFT training and environment setup, please follow:

SFT Training

🧰 Expert Models

JarvisIR integrates multiple expert restoration models to handle various types of image degradation. To test the performance of individual expert models, please refer to the instructions and scripts provided in ./package/agent_tools/.

Task	Model	Description
Super-resolution	Real-ESRGAN	Fast GAN-based model for super-resolution, deblurring, and artifact removal
Denoising	SCUNet	Hybrid UNet-based model combining convolution and transformer blocks for robust denoising
Deraining	UDR-S2Former	Uncertainty-aware transformer model for rain streak removal
Deraining	Img2img-turbo-rain	Efficient SD-turbo based model for fast and effective rain removal
Raindrop removal	IDT	Transformer-based model for de-raining and raindrop removal
Dehazing	RIDCP	Efficient dehazing model utilizing high-quality codebook priors
Dehazing	KANet	Efficient dehazing network using a localization-and-removal pipeline
Desnowing	Img2img-turbo-snow	Efficient model for removing snow artifacts while preserving natural scene details
Desnowing	Snowmaster	Real-world image desnowing via MLLM with multi-model feedback optimization
Low-light enhancement	Retinexformer	One-stage Retinex-based Transformer for low-light image enhancement
Low-light enhancement	HVICIDNet	Lightweight transformer for low-light and exposure correction
Low-light enhancement	LightenDiff	Diffusion-based framework for low-light enhancement
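For scripting against the expert list, the table can be transcribed into a small lookup structure. The sketch below is only a convenience illustration: the task and model names come from the table, while `EXPERT_MODELS` and `candidates` are hypothetical names that do not correspond to anything in `./package/agent_tools/`.

```python
from typing import Dict, List

# Task -> expert models, transcribed from the table above.
EXPERT_MODELS: Dict[str, List[str]] = {
    "Super-resolution": ["Real-ESRGAN"],
    "Denoising": ["SCUNet"],
    "Deraining": ["UDR-S2Former", "Img2img-turbo-rain"],
    "Raindrop removal": ["IDT"],
    "Dehazing": ["RIDCP", "KANet"],
    "Desnowing": ["Img2img-turbo-snow", "Snowmaster"],
    "Low-light enhancement": ["Retinexformer", "HVICIDNet", "LightenDiff"],
}


def candidates(task: str) -> List[str]:
    """Return the expert models listed for a task (case-insensitive match)."""
    wanted = task.strip().lower()
    for name, models in EXPERT_MODELS.items():
        if name.lower() == wanted:
            return list(models)  # copy so callers cannot mutate the registry
    known = ", ".join(EXPERT_MODELS)
    raise KeyError(f"Unknown task {task!r}; known tasks: {known}")


print(candidates("dehazing"))
```

Several tasks list more than one model, so a controller has to choose among alternatives as well as decide the order in which to apply them.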

🎪 Checklist

Release preview inference code and Gradio demo
Release SFT training code
Release inference code
Release Hugging Face online demo
Release degradation synthesis code
Release MRRHF training code
Release CleanBench dataset

🙏 Acknowledgements

We would like to thank HuggingGPT, XTuner, and RRHF for their valuable open-source contributions, which provided important technical references for our work.

🤟 Citation

@inproceedings{jarvisir2025,
  title={JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration},
  author={Lin, Yunlong and Lin, Zixu and Chen, Haoyu and Pan, Panwang and Li, Chenxin and Chen, Sixiang and Wen, Kairun and Jin, Yeying and Li, Wenbo and Ding, Xinghao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

