
A CUDA error is thrown when using llama_local - ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL, exit code 3221226505 #2080

Closed
ropstah opened this issue Jan 9, 2025 · 7 comments

ropstah commented Jan 9, 2025

Describe the bug

A CUDA error is thrown when using llama_local

To Reproduce

Windows 10
wsl2
node -v = v23.5.0
python --version = Python 3.12.8

Followed the Quick Start:

  1. checkout tag v0.1.7
  2. pnpm install --no-frozen-lockfile
  3. pnpm build
  4. .env setup for llama_local: XAI_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo as described here
  5. Changed modelProvider to llama_local in trump.character.json (see the excerpt after this list)
  6. Run with pnpm start --character="characters/trump.character.json"
  7. Start client with pnpm start:client
  8. Type hi in chat
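
For step 5, the only change is the modelProvider field; an illustrative excerpt (the other fields of trump.character.json are omitted here, and the "name" value is an assumption):

```json
{
  "name": "trump",
  "modelProvider": "llama_local"
}
```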

Expected behavior

An indication of what the error is exactly. I also tried changing the model configuration, but that doesn't seem to be picked up. I'm at a loss.

Screenshot

[image attached]

Additional context

["◎ Generating text..."] 

 ℹ INFORMATIONS
   Generating text with options:
   {"modelProvider":"llama_local","model":"large"} 

 ℹ INFORMATIONS
   Selected model:
   NousResearch/Hermes-3-Llama-3.1-8B-GGUF/resolve/main/Hermes-3-Llama-3.1-8B.Q8_0.gguf?download=true 

 ["ℹ Model not initialized, starting initialization..."] 

 ["ℹ Checking model file..."] 

 ["⚠ Model already exists."] 

 ["ℹ LlamaService: CUDA detected, using GPU acceleration"] 

 ["ℹ Initializing Llama instance..."] 

(node:30868) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("ts-node/esm", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)
(node:30868) [DEP0180] DeprecationWarning: fs.Stats constructor is deprecated.
(Use `node --trace-deprecation ...` to show where the warning was created)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
 ["ℹ Creating JSON schema grammar..."] 

 ["ℹ Loading model..."] 

 ["ℹ Creating context and sequence..."] 

 ["✓ Model initialization complete"] 

C:\Users\me\Documents\Projects\eliza\node_modules\node-llama-cpp\llama\llama.cpp\ggml\src\ggml-cuda.cu:70: CUDA error
C:\Users\me\Documents\Projects\eliza\agent:
 ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL  @elizaos/agent@0.1.7 start: `node --loader ts-node/esm src/index.ts "--isRoot" "--character=characters/trump.character.json"`
Exit status 3221226505
 ELIFECYCLE  Command failed with exit code 3221226505.
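
For reference, the exit status decodes to a Windows NTSTATUS code (the decode below is plain arithmetic, not taken from the logs):

```typescript
// Exit status reported by pnpm, shown in hex:
console.log((3221226505).toString(16).toUpperCase()); // "C0000409"
// 0xC0000409 is STATUS_STACK_BUFFER_OVERRUN, which Windows also reports for
// fail-fast aborts; that is consistent with a native crash inside ggml-cuda.cu.
```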
ropstah added the bug label on Jan 9, 2025
github-actions bot commented Jan 9, 2025

Hello @ropstah! Welcome to the ai16z community. Thank you for opening your first issue; we appreciate your contribution. You are now an ai16z contributor!

IdosElmo commented Jan 10, 2025

I am also experiencing this issue and have been trying to solve it for a few days now without any luck.
I'm on Windows with an RTX 4070 Ti SUPER (16 GB). I also tried loading a 7B and a 3B model to rule out memory issues.
I tracked the GPU usage, and it seems to spike to about 30-40% and then crash right after with the error above.

The error is thrown when calling `this.sequence.evaluate(tokens, {...})` here: llama.ts.
I was able to manually run node-llama-cpp with the same model (and roughly the same inputs), and it seems to evaluate just fine (a minimal sketch of that kind of standalone check is below).

  • Evaluating empty tokens (`tokens = []`) works, but then errors on the missing response
  • Evaluating a single token (tokenizing `context = 'hi'`) also fails with the same error
  • If GPU usage is already high, I see a memory issue alongside the same error (two errors)
  • It works without GPU acceleration (or with Vulkan)

This might be another bug, but when using the CPU or Vulkan, the model gets stuck in an endless loop. I saw this PR which fixes it, but the same fix doesn't work for the issue mentioned here.
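
For illustration, a standalone check along these lines (a minimal sketch assuming node-llama-cpp v3's `getLlama`/`LlamaChatSession` API; the model path and the commented `gpu` options are placeholders, not eliza's actual wiring):

```typescript
// Standalone node-llama-cpp smoke test, run outside eliza.
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama({
    // gpu: false,     // uncomment to test without GPU acceleration
    // gpu: "vulkan",  // or force the Vulkan backend instead of CUDA
});
const model = await llama.loadModel({
    modelPath: "./models/Hermes-3-Llama-3.1-8B.Q8_0.gguf", // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Same minimal input as the failing eliza call path.
console.log(await session.prompt("hi"));
```

If a sketch like this runs cleanly on the same GGUF file while eliza crashes, that points at eliza's call into `sequence.evaluate` (or its context setup) rather than the model or the drivers.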

AIFlowML (Collaborator) commented

The local model should not have issues like this, especially on CUDA.
I will investigate.
We are also starting a finetune of a new local model to use in the repo.
In any case, I run the new, the old, and the develop branch on the GPU server, and I don't have any issues.

AIFlowML added the Need Feedback label and removed the bug label on Jan 12, 2025

lanly-dev commented Jan 25, 2025

I also got the `ggml-cuda.cu:70: CUDA error` from running eliza-starter

hello world
 ◎ LOGS
   Creating Memory 
   362489a6-7dfe-0adf-9c2c-85aa58c86100
   hello world

 ["◎ Generating message response.."] 

 ["◎ Generating text..."]

 ℹ INFORMATIONS
   Generating text with options:
   {"modelProvider":"llama_local","model":"large"}

 ℹ INFORMATIONS
   Selected model:
   NousResearch/Hermes-3-Llama-3.1-8B-GGUF/resolve/main/Hermes-3-Llama-3.1-8B.Q8_0.gguf?download=true

 ["ℹ Model not initialized, starting initialization..."] 

 ["ℹ Checking model file..."]

 ["⚠ Model already exists."]

 ["ℹ LlamaService: CUDA detected, using GPU acceleration"] 

 ["ℹ Initializing Llama instance..."]

(node:18076) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("ts-node/esm", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)
(node:18076) [DEP0180] DeprecationWarning: fs.Stats constructor is deprecated.
(Use `node --trace-deprecation ...` to show where the warning was created)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
 ["ℹ Creating JSON schema grammar..."]

 ["ℹ Loading model..."]

 ["ℹ Creating context and sequence..."]

 ["✓ Model initialization complete"]

D:\a\node-llama-cpp\node-llama-cpp\llama\llama.cpp\ggml\src\ggml-cuda.cu:70: CUDA error

And I only have a C: drive, so I'm not sure how this path shows up 🤔

[image attached]

vcastelin commented

Same error here with:

  • eliza v0.1.8-alpha.1
  • Windows 11 Pro
  • VS Code with Git Bash
  • GeForce RTX 4070 Laptop GPU

Nothing is really configured; I've just copied .env.example to .env. Maybe we're missing some configuration there?

I'll dig a bit.

ballboyredditor commented

Anyone able to resolve this issue?

lalalune closed this as not planned on Mar 2, 2025
lanly-dev commented

Is this fixed?
