
Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows #1966

Open · wants to merge 17 commits into base: main

Conversation

@JamePeng JamePeng commented Mar 9, 2025

- Update the vendored llama.cpp [from 794fe2 to 2c9f833]
- Use `llama_sampler_init` instead of `llama_sampler()` for safe usage
- Sync llama: add Phi-4-mini support
- Sync llama: expose `llama_model_n_head_kv` in the API
- Sync tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars
- class `LlamaSampler`: append `add_xtc()`, `add_top_n_sigma()` and `add_dry()`
- Remove Tail-Free sampling
- Add TopN-Sigma/XTC/DRY sampler code into the sampler
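For context, the TopN-Sigma and XTC samplers added above can be sketched conceptually in plain Python. This is an illustration of the sampling logic only, not the llama.cpp implementation or the llama-cpp-python API; the function names and default parameters here are hypothetical:

```python
import math
import random

def top_n_sigma_filter(logits, n=1.0):
    """TopN-Sigma (sketch): keep only tokens whose logit lies within
    n standard deviations of the maximum logit; mask the rest to -inf."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * std
    return [x if x >= cutoff else float("-inf") for x in logits]

def xtc_filter(probs, threshold=0.2, probability=1.0, rng=None):
    """XTC "Exclude Top Choices" (sketch): with the given probability,
    if two or more tokens exceed the threshold, zero out all of them
    except the least likely one, shifting mass toward mid-probability
    tokens. Assumes probs is sorted in descending order."""
    rng = rng or random.Random(0)
    if rng.random() >= probability:
        return probs  # sampler not triggered this step
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return probs  # nothing to exclude
    drop = set(above[:-1])  # keep only the last (least likely) top choice
    return [0.0 if i in drop else p for i, p in enumerate(probs)]
```

In the real samplers the surviving logits/probabilities are renormalized and fed to the final token pick; the sketch stops at the filtering step, which is the part each sampler defines.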

@JamePeng JamePeng changed the title Sync LLAMA_API names with ggml-org/llama.cpp 20250309, support LLAMA_VOCAB_PRE_TYPE_GPT4O Sync LLAMA_API names with ggml-org/llama.cpp 20250309 Mar 9, 2025
JamePeng commented Mar 9, 2025

I adjusted the workflow to build pip wheels with VS2022, generating two CUDA versions (12.4.1 and 12.6.3) and Windows wheels for Python 3.10–3.12 for your convenience.
They should be compiled by now: https://github.com/JamePeng/llama-cpp-python/releases

@JamePeng JamePeng changed the title Sync LLAMA_API names with ggml-org/llama.cpp 20250309 Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows Mar 9, 2025
1 participant