feat: add a local endpoint type for inference directly from chat-ui #1778
Conversation
Something is going wrong in the build step... Found this relevant issue, trying to fix.
Works well, you can do something like this:
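A rough sketch of what that configuration might look like in `.env.local`; the `"local"` endpoint type name and the `modelPath` option are assumptions for illustration, not taken verbatim from this excerpt:

```env
# .env.local (sketch; the "local" type and "modelPath" key are assumed)
MODELS=`[
  {
    "name": "local-llama",
    "endpoints": [
      {
        "type": "local",
        "modelPath": "models/example.Q4_K_M.gguf"
      }
    ]
  }
]`
```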
It will automatically use your GPU if available and download models to the `models/` folder. It's still super rough as it doesn't handle running out of memory gracefully, so I'm still working on dealing with this better. I also want to automatically expose any .gguf files in the `models/` folder as a model if `MODELS` is undefined.
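For example, a quantized model could be dropped into `models/` with something like this (the repository and filename below are placeholders, not taken from the PR):

```sh
# Fetch an example .gguf into models/ so chat-ui can pick it up
# (repository and filename are placeholders)
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  mistral-7b-instruct-v0.2.Q4_K_M.gguf --local-dir models/
```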
Merging for now, it works well in local testing! Will update the docs to explain this when I'm done with the quick setup.
feat: add a local endpoint type for inference directly from chat-ui (huggingface#1778)

* feat: add a local endpoint type running llama.cpp from chat-ui
* fix: build image
* fix: lock file
* wip: try to make it more reliable
* feat: load chat template from .gguf file
* feat: load gguf models from `models/` folder
* fix: default config
* feat: make endpoint use chatSession instead of completion
* refactor: improve exit handling, exit immediately on second signal
* fix: various fixes to improve reliability when calling multiple models at once
* docs: add instructions for adding .gguf files to the models directory
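The "make endpoint use chatSession instead of completion" item suggests the endpoint drives a stateful chat session rather than one-shot completions. A minimal sketch of that pattern, assuming node-llama-cpp is the llama.cpp binding in use (the binding choice, paths, and options are illustrative, not confirmed by this excerpt):

```ts
// Sketch only: assumes node-llama-cpp v3; not a verbatim excerpt from the PR.
import { getLlama, LlamaChatSession } from "node-llama-cpp";

// getLlama() selects a GPU-enabled build automatically when one is available.
const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: "models/example.Q4_K_M.gguf", // placeholder .gguf path
});

// A chat session keeps conversation state and applies the model's chat template,
// unlike a raw completion call.
const context = await model.createContext();
const session = new LlamaChatSession({ contextSequence: context.getSequence() });

const answer = await session.prompt("Hello, who are you?");
console.log(answer);
```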
Part of #1774