```bash
conda create -n proxyv python=3.10 -y
conda activate proxyv
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
```
For the pre-training stage, we use the 1.2M ShareGPT4V data, which can be downloaded at this link. For the fine-tuning stage, we use the public LLaVA-NeXT data, which can be downloaded at this link.
In our current implementation, we adopt the AnyRes strategy. The image features within each crop are flattened in raster order and concatenated crop by crop, similar to the UniRes strategy. We also append a newline separator token after each crop.
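As an illustration of this layout, the sketch below flattens each crop's feature grid in raster (row-major) order and appends a newline separator embedding after every crop. It is a minimal, hypothetical example rather than the repository code: the tensor shapes and the `newline_embed` name are assumptions.

```python
import torch

def flatten_crops_with_newline(crop_features: torch.Tensor,
                               newline_embed: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the AnyRes crop layout described above.

    crop_features: (num_crops, H, W, C) vision features, one grid per crop.
    newline_embed: (C,) separator embedding appended after each crop (assumed name).
    Returns a (num_crops * (H * W + 1), C) token sequence.
    """
    num_crops, h, w, c = crop_features.shape
    pieces = []
    for crop in crop_features:                     # concatenate crop by crop
        pieces.append(crop.reshape(h * w, c))      # raster (row-major) flattening
        pieces.append(newline_embed.unsqueeze(0))  # newline separator after the crop
    return torch.cat(pieces, dim=0)
```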
To process the vision tokens more conveniently, we pack the tokens in the order [vision tokens; proxy tokens; newline separator tokens; text tokens], and modify `position_ids` and `attention_masks` accordingly so that the original relative order is preserved.
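A rough sketch of this packing is shown below, assuming the per-group embeddings and their original positions are already available. All names and shapes are assumptions for illustration, and the real ProxyV attention pattern for proxy tokens is more involved.

```python
import torch

def pack_tokens(vision, proxy, newline, text,
                vision_pos, proxy_pos, newline_pos, text_pos):
    """Hypothetical sketch: concatenate embeddings in the packed
    [vision; proxy; newline separator; text] order, but keep the position ids
    of the original interleaved sequence so the model still sees the original
    relative order.
    """
    inputs_embeds = torch.cat([vision, proxy, newline, text], dim=0)
    position_ids = torch.cat([vision_pos, proxy_pos, newline_pos, text_pos], dim=0)

    # Causal mask defined by the *original* positions: token i may attend to
    # token j only if j's original position does not come after i's.
    attention_mask = position_ids.unsqueeze(1) >= position_ids.unsqueeze(0)
    return inputs_embeds, position_ids, attention_mask
```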
The pre-training scripts can be found within the `scripts/pretrain` folder, and fine-tuning example scripts are provided under the `scripts/finetune` folder. To enable ProxyV, set `--proxyv` to `true` in the script and set `--proxyv_start_layer` to the desired layer index.
The vicuna-1.5-7B ProxyV layer-12 model studied in the paper is provided at this link.
A simple inference example script is provided at `demo.py`.
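For reference, a rough inference sketch following the standard LLaVA-NeXT flow is given below. The checkpoint path, conversation template, image file, and question are placeholders, and the loading utilities are assumed to be unchanged in this fork; treat `demo.py` as the authoritative example.

```python
import copy
import torch
from PIL import Image
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

# Hypothetical checkpoint path; replace with the downloaded ProxyV model.
model_path = "checkpoints/vicuna-1.5-7b-proxyv-layer12"
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

image = Image.open("example.jpg")
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device="cuda") for t in image_tensor]

# Build a prompt containing the image placeholder token (vicuna template assumed).
conv = copy.deepcopy(conv_templates["vicuna_v1"])
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nDescribe this image.")
conv.append_message(conv.roles[1], None)
input_ids = tokenizer_image_token(
    conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to("cuda")

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```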
All benchmark evaluations can be conducted directly using lmms-eval with `--model` set to `llava`.
This project is released under the Apache-2.0 license. See LICENSE for details.
Please consider citing our paper if you find this project helpful for your research:
```bibtex
@article{ProxyV,
  author  = {Wu, Penghao and Lu, Lewei and Liu, Ziwei},
  title   = {Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM},
  journal = {arXiv preprint arXiv:2505.15816},
  year    = {2025}
}
```
- This work is built upon LLaVA-NeXT.