Comparing changes
base repository: ggml-org/llama.cpp
base: cbbd1efa06f8c09f9dff58ff9d9af509cc4c152b
head repository: ggml-org/llama.cpp
compare: cb49e0f8c906e5da49e9f6d64a57742a9a241c6a
- 5 commits
- 14 files changed
- 4 contributors
Commits on Feb 27, 2024
- llama : fix defrag bugs + add parameter (#5735) [commit 9d533a7]
  * llama : fix defrag bugs + enable by default (ggml-ci)
  * llama : add defrag_thold parameter (ggml-ci)
  * llama : cont
  * llama : disable log message (ggml-ci)
  * llama : fix graph size check during defrag
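  For context, the commit message above names the new defrag_thold parameter. A minimal sketch of how a caller might set it, assuming the llama.h API around commit 9d533a7 (field and function names are assumptions based on that era of the library, not verified against this exact revision):
  #include "llama.h"
  // Sketch: configure KV-cache defragmentation when creating a context.
  // Assumes the defrag_thold field added by #5735; per the PR, negative
  // values disable defragmentation.
  int main(void) {
      struct llama_context_params cparams = llama_context_default_params();
      // Assumed semantics: defragment when the fraction of holes in the
      // KV cache exceeds this threshold.
      cparams.defrag_thold = 0.1f;
      // ... load a model, then create the context with these params, e.g.:
      // struct llama_context * ctx = llama_new_context_with_model(model, cparams);
      return 0;
  }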
- Commit 1f30b7a
- Commit c24a2a6
- IQ4_XS: a 4.25 bpw quantization (#5747) [commit 0becb22]
  * Try IQ4_NL with blocks of 64 - does not look good
  * iq4_xs: go to super-blocks of 256 and 6-bit scales for blocks of 32
  * iq4_xs: CUDA works - 133.2 t/s
  * iq4_xs: AVX2 dot product
  * iq4_xs: ARM_NEON dot product
  * iq4_nl: Metal implementation. As usual, Metal / Apple Silicon don't like my quants.
  * iq3_xs: minor fix
  * iq4_xs: shrink by using IQ3_S for attn_k and attn_q
  * iq4_xs: revert using IQ3_S for attn_k and attn_v. PPL vs size is good, but CPU performance suffers: on M2 Max TG-128 drops to 21.7 t/s from 28.8 t/s, and on a Ryzen-7950X to 14.5 t/s from 15.8 t/s. On CUDA we have 135 t/s when using IQ3_S vs 133 t/s with pure IQ4_XS.
  * Fix CI
  * iq4_xs: Added forgotten check for 256 divisibility
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
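  The 4.25 bpw figure follows from the layout described in the commit message: super-blocks of 256 weights at 4 bits each, plus a 6-bit scale per 32-weight block. A back-of-the-envelope check; the 16-bit per-super-block scale is an assumption borrowed from the other k-quant layouts, the rest comes straight from the message:
  #include <stdio.h>
  // Bits-per-weight estimate for IQ4_XS as described in #5747.
  int main(void) {
      const int weights = 256;                 // weights per super-block
      const int quant_bits = 4 * weights;      // 1024: 4-bit quants
      const int scale_bits = 6 * (weights / 32); // 48: 6-bit block scales
      const int super_scale_bits = 16;         // assumed fp16 super-block scale
      double bpw = (double)(quant_bits + scale_bits + super_scale_bits) / weights;
      printf("IQ4_XS ~ %.2f bpw\n", bpw);      // prints 4.25
      return 0;
  }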
- Attempt to fix android build (#5752) [commit cb49e0f]
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This comparison is too large for GitHub to render. To view it on your machine, run the following command in a local clone:
git diff cbbd1efa06f8c09f9dff58ff9d9af509cc4c152b...cb49e0f8c906e5da49e9f6d64a57742a9a241c6a
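Note that both commits must exist in your local clone; if the diff fails with unknown-object errors, a plain `git fetch origin` beforehand should retrieve them.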