Replies: 2 comments
-
The server application has some limitations and does not work the same way as instruction mode. Keep in mind that you need to pass the exact same prompt template for the model to give its best responses. Could you provide an example?
-
Try surrounding the prompt with [INST] ... [/INST] tags.
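For example, if you are hitting the server's raw /completion endpoint, the template has to be written into the prompt by hand. A minimal sketch (host, port, and the question are placeholders):

```bash
# The /completion endpoint sends the prompt to the model verbatim,
# so the Mistral instruct template must be added explicitly.
curl http://localhost:8000/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "[INST] Where does the brand name Lancôme come from? [/INST]",
        "n_predict": 256
      }'
```

Mistral-Instruct models are trained on that [INST] ... [/INST] wrapping; without it the model just continues your text as if it were a plain document, which is where the made-up completions come from.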
-
Please help, help, help! I have spent months trying to solve this issue. I have asked this question many times and nobody has answered. I can't believe that no one is interested in solving it.
I can't get the llama.cpp server to work properly. Setting it up is quite simple, but the response is always awful with the same prompt: the server does not behave like llama.cpp's instruction mode. Meanwhile, when I run llama.cpp in instruction mode, I receive exactly what I want. Below I'll give you examples:
```
alex@M1 llama.cpp % ./main -m ~/ai/mistral-7b-instruct-v0.2.Q4_K_M.gguf -ins --color --multiline-input -ngl 99
```
Another example:
Everything is correct. Next, I try to do the same with the llama.cpp server:
```
~/ai/llama.cpp$ ./server -m models/mistral-7b-instruct-v0.2.Q4_K_M.gguf -c 2048 --host XXX.XXX.XXX.XX --port 8000
```
OK, let's do it with the simplest possible code and make an API request:
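The request is essentially equivalent to the following (the prompt shown here is just a stand-in for my actual question):

```bash
# Plain completion request: the prompt goes out exactly as written,
# with no instruct template around it.
curl http://XXX.XXX.XXX.XX:8000/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Where does the brand name Lancôme come from?",
        "n_predict": 128
      }'
```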
Below are examples of the responses:
WTF? "named after his girlfriend, Marie-Antoinette Lancret-de-Fortuny". Where did the model get this garbage data?
WTF? from the Latin word "lanka," meaning "slim," and "komos,"
WTF? who named the company after his favorite flower, the orchid (l'Orchidee in French)
Finally, an example with curl (the way it is done in the official instructions).
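That is, a request along these lines, modeled on the curl example in the server README (adapted to my host and prompt):

```bash
# Completion request in the same form as the llama.cpp server README example.
curl --request POST \
  --url http://XXX.XXX.XXX.XX:8000/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Where does the brand name Lancôme come from?", "n_predict": 128}'
```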
... again, hallucinations: The name Lancome is derived from the Latin word for lake, "lacuna," and the French word for beauty, "beau."
Meanwhile, the output on the same remote machine in instruct mode is correct.
Why does every basic instruct-mode response write the brand name correctly as Lancôme, while the server's response writes Lancome?
In other words, every time I make an API request to the server, the model hallucinates, while in instruction mode it behaves correctly.
Why does this happen? Why can't I get a response of the same quality from the llama.cpp server, when llama.cpp gives me exactly what I want in instruction mode? I have tried numerous queries with exactly the same parameters, but never got a quality response from the server.
P.S.: I also noticed that the strange behavior (hallucinations) happens only in completion mode. Even when I use the web interface for the llama.cpp server, everything works just fine in chat mode. I suppose llama.cpp drops some crucial parameters in completion mode.
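For reference, recent server builds also expose an OpenAI-style chat endpoint that formats the messages with a chat template on the server side, which would explain why chat mode behaves. A sketch, assuming the build supports /v1/chat/completions:

```bash
# Chat-style request: the server wraps the message in a chat template
# itself, so no [INST] tags are needed in the payload.
curl http://XXX.XXX.XXX.XX:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Where does the brand name Lancôme come from?"}
        ]
      }'
```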