TPRL is a reinforcement-learning-based visual token pruning framework that accelerates inference in Large Vision-Language Models (LVLMs).
TPRL formulates visual token pruning as a Markov Decision Process (MDP) and proceeds in three stages:
- Learning from Demonstrations (LfD): Generate demonstration trajectories using heuristics and pretrain the policy network.
- PPO Fine-tuning: Fine-tune the policy with Proximal Policy Optimization to jointly optimize task performance and computational efficiency.
- Inference: One-shot pruning that retains the most important visual tokens.
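A minimal sketch of the one-shot pruning step in the inference stage, assuming the policy network emits a per-token importance score (the function name, signature, and fixed keep ratio below are illustrative, not the repository's API):

```python
def one_shot_prune(tokens, keep_scores, keep_ratio=0.25):
    """Keep the highest-scoring visual tokens in a single pass.

    tokens:      list of token embeddings (one vector per visual token)
    keep_scores: per-token importance scores from the pruning policy
    keep_ratio:  fraction of visual tokens to retain
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank tokens by score, then restore the original (spatial) order
    top = sorted(range(len(tokens)), key=lambda i: keep_scores[i], reverse=True)[:n_keep]
    keep_idx = sorted(top)
    return [tokens[i] for i in keep_idx], keep_idx

# Toy usage: 8 tokens, keep the top quarter
tokens = [[float(i)] for i in range(8)]
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.4, 0.7, 0.5]
kept, idx = one_shot_prune(tokens, scores, keep_ratio=0.25)
print(idx)  # [1, 3]
```

Because the selection happens once (rather than layer by layer), the pruned token sequence can be passed straight to the LLM with no further overhead.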
## Architecture

```
visual input → ViT → Projector → [TPRL pruner] → LLM → output
```

## Installation

```bash
# Clone the repository
git clone https://github.com/MagicVicCoder/TPRL.git
cd TPRL

# Install requirements
pip install -r requirements.txt
```

## Training

```bash
# Stage 1: Learning from Demonstrations (LfD)
python train_lfd.py

# Stage 2: PPO fine-tuning
# Set the LfD checkpoint path in config.py first
python train_ppo.py
```

## Inference

```bash
python main.py
```

## Project Structure

```
TPRL/
├── model/
│   ├── autoencoder.py     # Token compression (optional)
│   ├── rl_networks.py     # Policy and value networks
│   ├── llava_mllm.py      # LLaVA model wrapper
│   └── qwen_mllm.py       # Qwen model wrapper
├── pruner/
│   ├── rl_pruner.py       # RL-based pruner
│   ├── random_pruner.py   # Baseline random pruner
│   └── mlp_pruner.py      # MLP-based pruner
├── train_lfd.py           # LfD training script
├── train_ppo.py           # PPO training script
├── config.py              # Configuration
└── main.py                # Evaluation / inference script
```

## MDP Formulation

- State: (visual tokens, text query)
- Action: keep / prune decision for each token
- Reward: downstream task performance + computational efficiency
```
reward = alpha * task_reward + beta * efficiency_reward
```

- `task_reward`: change in downstream task performance (e.g., IoU / accuracy)
- `efficiency_reward`: compression / efficiency metric
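As a minimal sketch of how this weighted reward could be computed (the function name, default weights, and the concrete efficiency metric below are assumptions, not the repository's API):

```python
def compute_reward(task_metric_new, task_metric_old,
                   tokens_kept, tokens_total,
                   alpha=1.0, beta=0.5):
    """Weighted sum of task and efficiency rewards.

    task_reward:       change in the downstream metric (e.g., IoU / accuracy)
    efficiency_reward: fraction of visual tokens pruned away (illustrative choice)
    """
    task_reward = task_metric_new - task_metric_old
    efficiency_reward = 1.0 - tokens_kept / tokens_total
    return alpha * task_reward + beta * efficiency_reward

# Pruning 75% of tokens with no change in accuracy yields reward = 0.5 * 0.75
print(compute_reward(0.80, 0.80, tokens_kept=144, tokens_total=576))  # 0.375
```

The two weights trade accuracy against compute: a larger `beta` pushes the policy toward more aggressive pruning, while a larger `alpha` penalizes any drop in task performance more heavily.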
## Requirements

- Python >= 3.8
- PyTorch >= 2.0
- Transformers >= 4.37.0
- See `requirements.txt` for the full dependency list
⭐ If you find this repository useful, please give it a Star!
## Citation

If you find this work useful, please cite:
```bibtex
@misc{cao2026languageguidedtokencompressionreinforcement,
  title={Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models},
  author={Sihan Cao and Jianwei Zhang and Pengcheng Zheng and Jiaxin Yan and Caiyan Qin and Yalan Ye and Wei Dong and Peng Wang and Yang Yang and Chaoning Zhang},
  year={2026},
  eprint={2603.13394},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.13394}
}
```