$D_{2}O$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models


The code for ICLR 2025 paper: $D_{2}O$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models.

📃 [Paper] • 💻 [GitHub] • 🤗 [Hugging Face]

If you find our project helpful, please give us a star ⭐ on GitHub to stay updated.

Setup Environment

We recommend using Anaconda to create a fresh environment and then installing the required packages:

conda create -n d2o_v2 python=3.10
conda activate d2o_v2
pip install --upgrade pip  # enable PEP 660 support
pip install -r requirements.txt

Quick Start

Run inference on the LongBench sample with the following command:

CUDA_VISIBLE_DEVICES=0 python run_pred_long_bench_sample.py --model_name_or_path meta-llama/Meta-Llama-3-8B \
    --cache_dir /your_hf_home_path \
    --use_d2o True \
    --model_type llama3 \
    --hh_ratio 0.1 \
    --recent_ratio 0.1 \
    --action_name d2o_0.2 \
    --e True 
  • cache_dir is the directory where the model weights are stored.
  • use_d2o enables the D2O strategy.
  • hh_ratio is the proportion of the cache budget reserved for important (heavy-hitter) tokens, as defined in the main paper.
  • recent_ratio is the proportion reserved for the window of tokens closest to the currently generated token.
  • action_name tags the output, and is reused at evaluation time (e.g., Meta-Llama-3-8B_d2o_0.2).
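The two ratios split the KV-cache budget between high-attention ("heavy-hitter") tokens and a recent window. The sketch below is not the D2O implementation itself; it is a minimal illustration of this kind of budget split, assuming per-token attention scores have already been accumulated (the function and variable names are ours):

```python
import numpy as np

def select_kv_cache(attn_scores, hh_ratio=0.1, recent_ratio=0.1):
    """Illustrative heavy-hitter + recent-window cache selection.

    attn_scores: accumulated attention score per cached token, shape (seq_len,).
    Returns sorted indices of the tokens kept in the KV cache.
    """
    seq_len = attn_scores.shape[0]
    n_recent = max(1, int(seq_len * recent_ratio))   # window nearest the generated token
    n_hh = max(1, int(seq_len * hh_ratio))           # "important" heavy-hitter tokens

    recent = np.arange(seq_len - n_recent, seq_len)  # always keep the recent window
    older = attn_scores[: seq_len - n_recent]        # rank remaining tokens by attention mass
    hh = np.argsort(older)[::-1][:n_hh]              # top-scoring older tokens

    return np.sort(np.concatenate([hh, recent]))

scores = np.array([0.9, 0.1, 0.05, 0.8, 0.2, 0.3, 0.1, 0.4, 0.6, 0.7])
kept = select_kv_cache(scores, hh_ratio=0.2, recent_ratio=0.2)
print(kept)  # keeps the two highest-scoring older tokens plus the two most recent
```

With hh_ratio=0.2 and recent_ratio=0.2, only 40% of the cache survives; the real method additionally applies the dynamic, layer-discriminative operations described in the paper.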

Then, evaluate the results:

python eval_long_bench.py --model Meta-Llama-3-8B_d2o_0.2 --e 

For tasks run through EleutherAI's lm-evaluation-harness, we recommend cloning the latest version:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git

Then, follow the installation instructions provided in the repository and execute our algorithm accordingly.

Citation

@article{wan2024d2o,
  title={D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models},
  author={Wan, Zhongwei and Wu, Xinjian and Zhang, Yu and Xin, Yi and Tao, Chaofan and Zhu, Zhihong and Wang, Xin and Luo, Siqi and Xiong, Jing and Zhang, Mi},
  journal={arXiv preprint arXiv:2406.13035},
  year={2024}
}

or

@inproceedings{wan2025textdtexto,
  title={$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models},
  author={Zhongwei Wan and Xinjian Wu and Yu Zhang and Yi Xin and Chaofan Tao and Zhihong Zhu and Xin Wang and Siqi Luo and Jing Xiong and Longyue Wang and Mi Zhang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=HzBfoUdjHt}
}
