
Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows #1966

Open · wants to merge 17 commits into base: main

Conversation

@JamePeng JamePeng commented Mar 9, 2025

- Update the vendored llama.cpp [from 794fe2 to 2c9f833]
- Use `llama_sampler_init` instead of `llama_sampler()` for safe usage
- Sync llama: add Phi-4-mini support
- Sync llama: expose `llama_model_n_head_kv` in the API
- Sync tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars
- class `LlamaSampler`: append `add_xtc()`, `add_top_n_sigma()` and `add_dry()`
- Remove Tail-Free sampling
- Add TopN-Sigma/XTC/DRY sampler code into the sampler
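For context, the TopN-Sigma and XTC samplers added above can be sketched conceptually in plain Python. This is an illustration of the sampling logic only, not the llama.cpp implementation or the llama-cpp-python API; the function names and default parameters here are hypothetical:

```python
import math
import random

def top_n_sigma_filter(logits, n=1.0):
    """TopN-Sigma (sketch): keep only tokens whose logit lies within
    n standard deviations of the maximum logit; mask the rest to -inf."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * std
    return [x if x >= cutoff else float("-inf") for x in logits]

def xtc_filter(probs, threshold=0.2, probability=1.0, rng=None):
    """XTC "Exclude Top Choices" (sketch): with the given probability,
    if two or more tokens exceed the threshold, zero out all of them
    except the least likely one, shifting mass toward mid-probability
    tokens. Assumes probs is sorted in descending order."""
    rng = rng or random.Random(0)
    if rng.random() >= probability:
        return probs  # sampler not triggered this step
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return probs  # nothing to exclude
    drop = set(above[:-1])  # keep only the last (least likely) top choice
    return [0.0 if i in drop else p for i, p in enumerate(probs)]
```

In the real samplers the surviving logits/probabilities are renormalized and fed to the final token pick; the sketch stops at the filtering step, which is the part each sampler defines.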

@JamePeng JamePeng changed the title Sync LLAMA_API names with ggml-org/llama.cpp 20250309, support LLAMA_VOCAB_PRE_TYPE_GPT4O Sync LLAMA_API names with ggml-org/llama.cpp 20250309 Mar 9, 2025
JamePeng commented Mar 9, 2025

I adjusted the workflow to build pip wheels with VS2022, generating two CUDA versions (12.4.1 and 12.6.3) and Windows wheels for Python 3.10–3.12 for your convenience.
They should be compiled by now: https://github.com/JamePeng/llama-cpp-python/releases

@JamePeng JamePeng changed the title Sync LLAMA_API names with ggml-org/llama.cpp 20250309 Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows Mar 9, 2025
1 participant