$D_{2}O$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models


The code for ICLR 2025 paper: $D_{2}O$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models.

📃 [Paper] • 💻 [GitHub] • 🤗 [Hugging Face]

If you find our project helpful, please give us a star ⭐ on GitHub to stay updated.

Setup Environment

We recommend using Anaconda to create a fresh environment and then installing the required packages:

conda create -n d2o_v2 python=3.10
conda activate d2o_v2
pip install --upgrade pip  # enable PEP 660 support
pip install -r requirements.txt

Quick Start

Run inference on the LongBench sample with the following command:

CUDA_VISIBLE_DEVICES=0 python run_pred_long_bench_sample.py --model_name_or_path meta-llama/Meta-Llama-3-8B \
    --cache_dir /your_hf_home_path \
    --use_d2o True \
    --model_type llama3 \
    --hh_ratio 0.1 \
    --recent_ratio 0.1 \
    --action_name d2o_0.2 \
    --e True 
  • cache_dir is the directory where the model weights are stored.
  • use_d2o enables the D2O strategy.
  • hh_ratio is the proportion of the cache budget reserved for important (heavy-hitter) tokens, as defined in the main paper.
  • recent_ratio is the proportion reserved for the window of tokens closest to the currently generated token.
  • action_name tags the output, and is reused at evaluation time (e.g., Meta-Llama-3-8B_d2o_0.2).
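The two ratios split the KV-cache budget between high-attention ("heavy-hitter") tokens and a recent window. The sketch below is not the D2O implementation itself; it is a minimal illustration of this kind of budget split, assuming per-token attention scores have already been accumulated (the function and variable names are ours):

```python
import numpy as np

def select_kv_cache(attn_scores, hh_ratio=0.1, recent_ratio=0.1):
    """Illustrative heavy-hitter + recent-window cache selection.

    attn_scores: accumulated attention score per cached token, shape (seq_len,).
    Returns sorted indices of the tokens kept in the KV cache.
    """
    seq_len = attn_scores.shape[0]
    n_recent = max(1, int(seq_len * recent_ratio))   # window nearest the generated token
    n_hh = max(1, int(seq_len * hh_ratio))           # "important" heavy-hitter tokens

    recent = np.arange(seq_len - n_recent, seq_len)  # always keep the recent window
    older = attn_scores[: seq_len - n_recent]        # rank remaining tokens by attention mass
    hh = np.argsort(older)[::-1][:n_hh]              # top-scoring older tokens

    return np.sort(np.concatenate([hh, recent]))

scores = np.array([0.9, 0.1, 0.05, 0.8, 0.2, 0.3, 0.1, 0.4, 0.6, 0.7])
kept = select_kv_cache(scores, hh_ratio=0.2, recent_ratio=0.2)
print(kept)  # keeps the two highest-scoring older tokens plus the two most recent
```

With hh_ratio=0.2 and recent_ratio=0.2, only 40% of the cache survives; the real method additionally applies the dynamic, layer-discriminative operations described in the paper.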

Then, evaluate the results:

python eval_long_bench.py --model Meta-Llama-3-8B_d2o_0.2 --e 

For tasks run through EleutherAI's lm-evaluation-harness, we recommend cloning the latest version:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git

Then, follow the installation instructions provided in the repository and execute our algorithm accordingly.

Citation

@article{wan2024d2o,
  title={D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models},
  author={Wan, Zhongwei and Wu, Xinjian and Zhang, Yu and Xin, Yi and Tao, Chaofan and Zhu, Zhihong and Wang, Xin and Luo, Siqi and Xiong, Jing and Zhang, Mi},
  journal={arXiv preprint arXiv:2406.13035},
  year={2024}
}

or

@inproceedings{wan2025textdtexto,
  title={$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models},
  author={Zhongwei Wan and Xinjian Wu and Yu Zhang and Yi Xin and Chaofan Tao and Zhihong Zhu and Xin Wang and Siqi Luo and Jing Xiong and Longyue Wang and Mi Zhang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=HzBfoUdjHt}
}
