Hello everyone! I've run the Llama 3B Instruct model on mobile using Termux and llama.cpp. However, the performance is quite poor, with a prefill time of around 4.5 seconds and a decode time of around 5 seconds, and both get noticeably worse with the 8B model. Any suggestions to improve these metrics?
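For context, here is roughly the setup I'm using (a minimal sketch: the model filename, quantization level, and thread count below are placeholders, not my exact invocation):

```bash
# Install build tools in Termux
pkg install git cmake clang

# Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Run with an explicit thread count (-t); a Q4 quant of the 3B model is assumed here
./build/bin/llama-cli -m models/llama-3b-instruct-q4_k_m.gguf -t 4 -n 64 -p "Hello"

# llama-bench reports prefill (pp) and decode (tg) throughput separately,
# which makes it easier to compare thread counts and quantizations
./build/bin/llama-bench -m models/llama-3b-instruct-q4_k_m.gguf -t 4
```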