
A CUDA error is thrown when using llama_local - ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL, exit code 3221226505 #2080

Closed
ropstah opened this issue Jan 9, 2025 · 7 comments

ropstah commented Jan 9, 2025

Describe the bug

A CUDA error is thrown when using llama_local

To Reproduce

Windows 10
wsl2
node -v = v23.5.0
python --version = Python 3.12.8

Followed the Quick Start:

  1. checkout tag v0.1.7
  2. pnpm install --no-frozen-lockfile
  3. pnpm build
  4. .env setup for llama_local: XAI_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo as described here
  5. Changed modelProvider to llama_local in trump.character.json (see the excerpt after this list)
  6. Run with pnpm start --character="characters/trump.character.json"
  7. Start client with pnpm start:client
  8. Type hi in chat
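
For step 5, the only change is the modelProvider field; an illustrative excerpt (the other fields of trump.character.json are omitted here, and the "name" value is an assumption):

```json
{
  "name": "trump",
  "modelProvider": "llama_local"
}
```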

Expected behavior

An indication of what the error is exactly. I also tried changing the model configuration, but that doesn't seem to be picked up. I'm at a loss.

Screenshot

[image attached]

Additional context

["◎ Generating text..."] 

 ℹ INFORMATIONS
   Generating text with options:
   {"modelProvider":"llama_local","model":"large"} 

 ℹ INFORMATIONS
   Selected model:
   NousResearch/Hermes-3-Llama-3.1-8B-GGUF/resolve/main/Hermes-3-Llama-3.1-8B.Q8_0.gguf?download=true 

 ["ℹ Model not initialized, starting initialization..."] 

 ["ℹ Checking model file..."] 

 ["⚠ Model already exists."] 

 ["ℹ LlamaService: CUDA detected, using GPU acceleration"] 

 ["ℹ Initializing Llama instance..."] 

(node:30868) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("ts-node/esm", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)
(node:30868) [DEP0180] DeprecationWarning: fs.Stats constructor is deprecated.
(Use `node --trace-deprecation ...` to show where the warning was created)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
 ["ℹ Creating JSON schema grammar..."] 

 ["ℹ Loading model..."] 

 ["ℹ Creating context and sequence..."] 

 ["✓ Model initialization complete"] 

C:\Users\me\Documents\Projects\eliza\node_modules\node-llama-cpp\llama\llama.cpp\ggml\src\ggml-cuda.cu:70: CUDA error
C:\Users\me\Documents\Projects\eliza\agent:
 ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL  @elizaos/agent@0.1.7 start: `node --loader ts-node/esm src/index.ts "--isRoot" "--character=characters/trump.character.json"`
Exit status 3221226505
 ELIFECYCLE  Command failed with exit code 3221226505.
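
For reference, the exit status decodes to a Windows NTSTATUS code (the decode below is plain arithmetic, not taken from the logs):

```typescript
// Exit status reported by pnpm, shown in hex:
console.log((3221226505).toString(16).toUpperCase()); // "C0000409"
// 0xC0000409 is STATUS_STACK_BUFFER_OVERRUN, which Windows also reports for
// fail-fast aborts; that is consistent with a native crash inside ggml-cuda.cu.
```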
ropstah added the bug label on Jan 9, 2025
github-actions bot commented Jan 9, 2025

Hello @ropstah! Welcome to the ai16z community. Thank you for opening your first issue; we appreciate your contribution. You are now an ai16z contributor!

IdosElmo commented Jan 10, 2025

I am also experiencing this issue and have been trying to solve it for a few days now without any luck.
I'm on Windows with an RTX 4070 Ti SUPER (16 GB). I also tried loading a 7B and a 3B model to rule out memory issues.
I tracked the GPU usage, and it seems to spike to about 30-40% and then crash right after with the error above.

The error is thrown when calling `this.sequence.evaluate(tokens, {...})` here: llama.ts.
I was able to manually run node-llama-cpp with the same model (and roughly the same inputs), and it seems to evaluate just fine (a minimal sketch of that kind of standalone check is below).

  • Evaluating empty tokens (`tokens = []`) works, but then errors on the missing response
  • Evaluating a single token (tokenizing `context = 'hi'`) also fails with the same error
  • If GPU usage is already high, I see a memory issue alongside the same error (two errors)
  • It works without GPU acceleration (or with Vulkan)

This might be another bug, but when using the CPU or Vulkan, the model gets stuck in an endless loop. I saw this PR which fixes it, but the same fix doesn't work for the issue mentioned here.
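
For illustration, a standalone check along these lines (a minimal sketch assuming node-llama-cpp v3's `getLlama`/`LlamaChatSession` API; the model path and the commented `gpu` options are placeholders, not eliza's actual wiring):

```typescript
// Standalone node-llama-cpp smoke test, run outside eliza.
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama({
    // gpu: false,     // uncomment to test without GPU acceleration
    // gpu: "vulkan",  // or force the Vulkan backend instead of CUDA
});
const model = await llama.loadModel({
    modelPath: "./models/Hermes-3-Llama-3.1-8B.Q8_0.gguf", // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Same minimal input as the failing eliza call path.
console.log(await session.prompt("hi"));
```

If a sketch like this runs cleanly on the same GGUF file while eliza crashes, that points at eliza's call into `sequence.evaluate` (or its context setup) rather than the model or the drivers.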

AIFlowML (Collaborator) commented

The local model should not have issues like this, especially on CUDA.
I will investigate.
We are also starting a finetune of a new local model to use in the repo.
In any case, I run the new, the old, and the develop branch on the GPU server, and I don't have any issues.

AIFlowML added the Need Feedback label and removed the bug label on Jan 12, 2025

lanly-dev commented Jan 25, 2025

I also got the `ggml-cuda.cu:70: CUDA error` from running eliza-starter

hello world
 ◎ LOGS
   Creating Memory 
   362489a6-7dfe-0adf-9c2c-85aa58c86100
   hello world

 ["◎ Generating message response.."] 

 ["◎ Generating text..."]

 ℹ INFORMATIONS
   Generating text with options:
   {"modelProvider":"llama_local","model":"large"}

 ℹ INFORMATIONS
   Selected model:
   NousResearch/Hermes-3-Llama-3.1-8B-GGUF/resolve/main/Hermes-3-Llama-3.1-8B.Q8_0.gguf?download=true

 ["ℹ Model not initialized, starting initialization..."] 

 ["ℹ Checking model file..."]

 ["⚠ Model already exists."]

 ["ℹ LlamaService: CUDA detected, using GPU acceleration"] 

 ["ℹ Initializing Llama instance..."]

(node:18076) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("ts-node/esm", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)
(node:18076) [DEP0180] DeprecationWarning: fs.Stats constructor is deprecated.
(Use `node --trace-deprecation ...` to show where the warning was created)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
 ["ℹ Creating JSON schema grammar..."]

 ["ℹ Loading model..."]

 ["ℹ Creating context and sequence..."]

 ["✓ Model initialization complete"]

D:\a\node-llama-cpp\node-llama-cpp\llama\llama.cpp\ggml\src\ggml-cuda.cu:70: CUDA error

And I only have a C: drive, so I'm not sure how this path shows up 🤔

[image attached]

vcastelin commented

Same error here with:

  • eliza v0.1.8-alpha.1
  • Windows 11 Pro
  • VS Code with Git Bash
  • GeForce RTX 4070 Laptop GPU

Nothing is really configured; I've just copied .env.example to .env. Maybe we're missing some configuration there?

I'll dig a bit.

ballboyredditor commented

Anyone able to resolve this issue?

lalalune closed this as not planned on Mar 2, 2025
lanly-dev commented

Is this fixed?
