ProxyV: Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM [ICML 2025]


Figure: overview of the ProxyV pipeline.

Contents:

  1. Getting Started
  2. Image Encoding Scheme
  3. Training
  4. Evaluation
  5. License
  6. Citation
  7. Acknowledgement

Getting Started

Installation

conda create -n proxyv python=3.10 -y
conda activate proxyv
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"

Training Dataset

For the pre-training stage, we use the 1.2M ShareGPT4V data, which can be downloaded at this link. For the fine-tuning stage, we use the public LLaVA-NeXT data, which can be downloaded at this link.

Image Encoding Scheme

In our current implementation, we adopt the AnyRes strategy. The image features within each crop are flattened in raster order and concatenated crop by crop, similar to the UniRes strategy, and a newline separator token is appended after each crop.
To process the vision tokens more conveniently, we pack tokens in the order [vision tokens; proxy tokens; newline separator tokens; text tokens], and modify the position_ids and attention_masks accordingly to preserve the original relative order; a sketch of this packing follows.
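Below is a minimal sketch of how position_ids can be built for this packed layout. It assumes the proxy tokens originally sit after all vision tokens and newline separators, and it omits the matching attention_masks adjustment; the helper name and the shapes in the example are illustrative, not the repository's API.

import torch

def pack_position_ids(num_crops: int, tokens_per_crop: int,
                      num_proxy: int, num_text: int) -> torch.Tensor:
    """Return position_ids for the packed layout
    [vision tokens; proxy tokens; newline separators; text tokens]
    so each token keeps the position it had in the original order
    (crop tokens followed by one newline per crop, then proxy
    tokens, then text tokens)."""
    # Within crop i, the j-th vision token originally sits at
    # i * (tokens_per_crop + 1) + j, since one newline follows each crop.
    vision_pos = torch.cat([
        torch.arange(tokens_per_crop) + i * (tokens_per_crop + 1)
        for i in range(num_crops)
    ])
    # Each newline separator sits right after the last token of its crop.
    newline_pos = torch.arange(num_crops) * (tokens_per_crop + 1) + tokens_per_crop
    # Proxy tokens follow all vision tokens and newlines in the original order.
    offset = num_crops * (tokens_per_crop + 1)
    proxy_pos = torch.arange(num_proxy) + offset
    # Text tokens come last.
    text_pos = torch.arange(num_text) + offset + num_proxy
    # Concatenate in the packed order: vision, proxy, newline, text.
    return torch.cat([vision_pos, proxy_pos, newline_pos, text_pos])

# Example: 4 crops of 576 tokens each, 64 proxy tokens, 32 text tokens.
print(pack_position_ids(4, 576, 64, 32).shape)  # torch.Size([2404])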

Training

The pre-training scripts can be found within the scripts/pretrain folder, and fine-tuning example scripts are provided under the scripts/finetune folder.
To enable ProxyV, set --proxyv to true in the script and set --proxyv_start_layer to the desired layer index.
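For example, to reproduce the layer-12 setting evaluated in the paper, the launch command inside a fine-tuning script would include the following two arguments among the others (a sketch; the surrounding arguments follow the provided example scripts):

--proxyv true \
--proxyv_start_layer 12 \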

Evaluation

The vicuna-1.5-7B ProxyV layer-12 model studied in the paper is provided at this link. A simple inference example script is provided in demo.py.
All benchmark evaluations can be directly conducted using lmms-eval with --model set to llava.
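For example, a typical run looks like the following (a sketch: the checkpoint path is a placeholder for the downloaded model, and mme stands in for whichever benchmarks you want to run):

python -m lmms_eval \
  --model llava \
  --model_args pretrained=/path/to/proxyv_vicuna_7b_layer12 \
  --tasks mme \
  --batch_size 1 \
  --output_path ./logs/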

License

This project is under the Apache-2.0 license. See LICENSE for details.

Citation

Please consider citing our paper if you find this project helpful for your research:

@article{ProxyV,
  author  = {Wu, Penghao and Lu, Lewei and Liu, Ziwei},
  title   = {Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM},
  journal = {arXiv preprint arXiv:2505.15816},
  year    = {2025}
}

Acknowledgement
