Your current environment
PyTorch version: 2.6.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Arch Linux (x86_64)
GCC version: (GCC) 14.2.1 20250207
Clang version: 19.1.7
CMake version: version 4.0.0
Libc version: glibc-2.41
Python version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.13.8-arch1-1-x86_64-with-glibc2.41
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
CPU family: 6
Model: 142
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 12
CPU(s) scaling MHz: 47%
CPU max MHz: 4900.0000
CPU min MHz: 400.0000
BogoMIPS: 4599.93
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d Cache: 128 KiB (4 instances)
L1i Cache: 128 KiB (4 instances)
L2 Cache: 1 MiB (4 instances)
L3 Cache: 8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Vulnerable: No microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds: Vulnerable: No microcode
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.0
[pip3] torch==2.6.0+cpu
[pip3] torchaudio==2.6.0+cpu
[pip3] torchvision==0.21.0+cpu
[pip3] transformers==4.50.3
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pyzmq 26.3.0 pypi_0 pypi
[conda] torch 2.6.0+cpu pypi_0 pypi
[conda] torchaudio 2.6.0+cpu pypi_0 pypi
[conda] torchvision 0.21.0+cpu pypi_0 pypi
[conda] transformers 4.50.3 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.8.3.dev212+g58e234a7
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
NCCL_CUMEM_ENABLE=0
TORCHINDUCTOR_COMPILE_THREADS=1
How would you like to use vllm
I would like to run vLLM inside a Jupyter notebook environment like any other Python code snippet. When I run the example code (see below) from the CLI, it works as expected!
When I run the following snippet from your examples, I get an error:
from vllm import LLM
import PIL.Image

llm = LLM(model="OpenGVLab/InternVL2_5-1B")

# Refer to the HuggingFace repo for the correct format to use
prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"

# Load the image using PIL.Image
image = PIL.Image.open('/tmp/pic1.png')

# Single prompt inference
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": image},
})
The error is as follows:
/home/repodiac/anaconda3/envs/vllm/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
INFO 04-03 10:43:58 [__init__.py:239] Automatically detected platform cpu.
2025-04-03 10:43:59,209 INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.
INFO 04-03 10:44:06 [config.py:598] This model supports multiple tasks: {'classify', 'reward', 'score', 'embed', 'generate'}. Defaulting to 'generate'.
WARNING 04-03 10:44:06 [arg_utils.py:1707] device type=cpu is not supported by the V1 Engine. Falling back to V0.
WARNING 04-03 10:44:06 [cpu.py:98] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
WARNING 04-03 10:44:06 [cpu.py:111] uni is not supported on CPU, fallback to mp distributed executor backend.
INFO 04-03 10:44:06 [llm_engine.py:242] Initializing a V0 LLM engine (v0.8.3.dev212+g58e234a7) with config: model='OpenGVLab/InternVL2_5-1B', speculative_config=None, tokenizer='OpenGVLab/InternVL2_5-1B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=OpenGVLab/InternVL2_5-1B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 04-03 10:44:08 [cpu.py:44] Using Torch SDPA backend.
WARNING 04-03 10:44:08 [_custom_ops.py:21] Failed to import from vllm._C with ImportError("/home/repodiac/anaconda3/envs/vllm/lib/python3.12/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/repodiac/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/_C.abi3.so)")
INFO 04-03 10:44:08 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 04-03 10:44:08 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-03 10:44:08 [cpu.py:44] Using Torch SDPA backend.
INFO 04-03 10:44:08 [config.py:3317] cudagraph sizes specified by model runner [] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]
WARNING 04-03 10:44:08 [cpu.py:98] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
[W403 10:44:08.931695810 socket.cpp:759] [c10d] The client socket cannot be initialized to connect to [quark-247]:35995 (errno: 97 - Address family not supported by protocol).
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[1], line 3
1 from vllm import LLM
----> 3 llm = LLM(model="OpenGVLab/InternVL2_5-1B")
5 # Refer to the HuggingFace repo for the correct format to use
6 prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/utils.py:1096, in deprecate_args.<locals>.wrapper.<locals>.inner(*args, **kwargs)
1089 msg += f" {additional_message}"
1091 warnings.warn(
1092 DeprecationWarning(msg),
1093 stacklevel=3, # The inner function takes up one level
1094 )
-> 1096 return fn(*args, **kwargs)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/llm.py:243, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, allowed_local_media_path, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, hf_overrides, mm_processor_kwargs, task, override_pooler_config, compilation_config, **kwargs)
214 engine_args = EngineArgs(
215 model=model,
216 task=task,
(...)
239 **kwargs,
240 )
242 # Create the Engine (autoselects V0 vs V1)
--> 243 self.llm_engine = LLMEngine.from_engine_args(
244 engine_args=engine_args, usage_context=UsageContext.LLM_CLASS)
245 self.engine_class = type(self.llm_engine)
247 self.request_counter = Counter()
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/llm_engine.py:521, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
518 from vllm.v1.engine.llm_engine import LLMEngine as V1LLMEngine
519 engine_cls = V1LLMEngine
--> 521 return engine_cls.from_vllm_config(
522 vllm_config=vllm_config,
523 usage_context=usage_context,
524 stat_loggers=stat_loggers,
525 disable_log_stats=engine_args.disable_log_stats,
526 )
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/llm_engine.py:497, in LLMEngine.from_vllm_config(cls, vllm_config, usage_context, stat_loggers, disable_log_stats)
489 @classmethod
490 def from_vllm_config(
491 cls,
(...)
495 disable_log_stats: bool = False,
496 ) -> "LLMEngine":
--> 497 return cls(
498 vllm_config=vllm_config,
499 executor_class=cls._get_executor_cls(vllm_config),
500 log_stats=(not disable_log_stats),
501 usage_context=usage_context,
502 stat_loggers=stat_loggers,
503 )
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/llm_engine.py:281, in LLMEngine.__init__(self, vllm_config, executor_class, log_stats, usage_context, stat_loggers, input_registry, mm_registry, use_cached_outputs)
277 self.input_registry = input_registry
278 self.input_processor = input_registry.create_input_processor(
279 self.model_config)
--> 281 self.model_executor = executor_class(vllm_config=vllm_config, )
283 if self.model_config.runner_type != "pooling":
284 self._initialize_kv_caches()
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py:286, in DistributedExecutorBase.__init__(self, *args, **kwargs)
281 def __init__(self, *args, **kwargs):
282 # This is non-None when the execute model loop is running
283 # in the parallel workers. It's a coroutine in the AsyncLLMEngine case.
284 self.parallel_worker_tasks: Optional[Union[Any, Awaitable[Any]]] = None
--> 286 super().__init__(*args, **kwargs)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py:52, in ExecutorBase.__init__(self, vllm_config)
50 self.prompt_adapter_config = vllm_config.prompt_adapter_config
51 self.observability_config = vllm_config.observability_config
---> 52 self._init_executor()
53 self.is_sleeping = False
54 self.sleeping_tags: set[str] = set()
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py:125, in MultiprocessingDistributedExecutor._init_executor(self)
123 self._run_workers("init_worker", all_kwargs)
124 self._run_workers("init_device")
--> 125 self._run_workers("load_model",
126 max_concurrent_workers=self.parallel_config.
127 max_parallel_loading_workers)
128 self.driver_exec_model = make_async(self.driver_worker.execute_model)
129 self.pp_locks: Optional[List[asyncio.Lock]] = None
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py:185, in MultiprocessingDistributedExecutor._run_workers(***failed resolving arguments***)
179 # Start all remote workers first.
180 worker_outputs = [
181 worker.execute_method(sent_method, *args, **kwargs)
182 for worker in self.workers
183 ]
--> 185 driver_worker_output = run_method(self.driver_worker, sent_method,
186 args, kwargs)
188 # Get the results of the workers.
189 return [driver_worker_output
190 ] + [output.get() for output in worker_outputs]
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/utils.py:2347, in run_method(obj, method, args, kwargs)
2345 else:
2346 func = partial(method, obj) # type: ignore
-> 2347 return func(*args, **kwargs)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/cpu_worker.py:233, in CPUWorker.load_model(self)
232 def load_model(self):
--> 233 self.model_runner.load_model()
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/cpu_model_runner.py:491, in CPUModelRunnerBase.load_model(self)
490 def load_model(self) -> None:
--> 491 self.model = get_model(vllm_config=self.vllm_config)
493 if self.lora_config:
494 assert supports_lora(
495 self.model
496 ), f"{self.model.__class__.__name__} does not support LoRA yet."
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py:14, in get_model(vllm_config)
12 def get_model(*, vllm_config: VllmConfig) -> nn.Module:
13 loader = get_model_loader(vllm_config.load_config)
---> 14 return loader.load_model(vllm_config=vllm_config)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py:441, in DefaultModelLoader.load_model(self, vllm_config)
439 with set_default_torch_dtype(model_config.dtype):
440 with target_device:
--> 441 model = _initialize_model(vllm_config=vllm_config)
443 weights_to_load = {name for name, _ in model.named_parameters()}
444 loaded_weights = model.load_weights(
445 self._get_all_weights(model_config, model))
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py:127, in _initialize_model(vllm_config, prefix)
124 if "vllm_config" in all_params and "prefix" in all_params:
125 # new-style model class
126 with set_current_vllm_config(vllm_config, check_compile=True):
--> 127 return model_class(vllm_config=vllm_config, prefix=prefix)
129 msg = ("vLLM model class should accept `vllm_config` and `prefix` as "
130 "input arguments. Possibly you have an old-style model class"
131 " registered from out of tree and it is used for new vLLM version. "
132 "Check https://docs.vllm.ai/en/latest/design/arch_overview.html "
133 "for the design and update the model class accordingly.")
134 warnings.warn(msg, DeprecationWarning, stacklevel=2)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/internvl.py:714, in InternVLChatModel.__init__(self, vllm_config, prefix)
706 self.is_mono = self.llm_arch_name == 'InternLM2VEForCausalLM'
707 self.vision_model = self._init_vision_model(
708 config,
709 quant_config=quant_config,
710 is_mono=self.is_mono,
711 prefix=maybe_prefix(prefix, "vision_model"),
712 )
--> 714 self.language_model = init_vllm_registered_model(
715 vllm_config=vllm_config,
716 hf_config=config.text_config,
717 prefix=maybe_prefix(prefix, "language_model"),
718 )
720 self.mlp1 = self._init_mlp1(config)
722 self.img_context_token_id = None
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/utils.py:286, in init_vllm_registered_model(vllm_config, prefix, hf_config, architectures)
282 if hf_config is not None:
283 vllm_config = vllm_config.with_hf_config(hf_config,
284 architectures=architectures)
--> 286 return _initialize_model(vllm_config=vllm_config, prefix=prefix)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py:127, in _initialize_model(vllm_config, prefix)
124 if "vllm_config" in all_params and "prefix" in all_params:
125 # new-style model class
126 with set_current_vllm_config(vllm_config, check_compile=True):
--> 127 return model_class(vllm_config=vllm_config, prefix=prefix)
129 msg = ("vLLM model class should accept `vllm_config` and `prefix` as "
130 "input arguments. Possibly you have an old-style model class"
131 " registered from out of tree and it is used for new vLLM version. "
132 "Check https://docs.vllm.ai/en/latest/design/arch_overview.html "
133 "for the design and update the model class accordingly.")
134 warnings.warn(msg, DeprecationWarning, stacklevel=2)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:431, in Qwen2ForCausalLM.__init__(self, vllm_config, prefix)
428 self.lora_config = lora_config
430 self.quant_config = quant_config
--> 431 self.model = Qwen2Model(vllm_config=vllm_config,
432 prefix=maybe_prefix(prefix, "model"))
434 if get_pp_group().is_last_rank:
435 if config.tie_word_embeddings:
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/compilation/decorators.py:151, in _support_torch_compile.<locals>.__init__(self, vllm_config, prefix, **kwargs)
150 def __init__(self, *, vllm_config: VllmConfig, prefix: str = '', **kwargs):
--> 151 old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
152 self.vllm_config = vllm_config
153 # for CompilationLevel.DYNAMO_AS_IS , the upper level model runner
154 # will handle the compilation, so we don't need to do anything here.
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:300, in Qwen2Model.__init__(self, vllm_config, prefix)
297 else:
298 self.embed_tokens = PPMissingLayer()
--> 300 self.start_layer, self.end_layer, self.layers = make_layers(
301 config.num_hidden_layers,
302 lambda prefix: Qwen2DecoderLayer(config=config,
303 cache_config=cache_config,
304 quant_config=quant_config,
305 prefix=prefix),
306 prefix=f"{prefix}.layers",
307 )
309 self.make_empty_intermediate_tensors = (
310 make_empty_intermediate_tensors_factory(
311 ["hidden_states", "residual"], config.hidden_size))
312 if get_pp_group().is_last_rank:
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/utils.py:610, in make_layers(num_hidden_layers, layer_fn, prefix)
604 from vllm.distributed.utils import get_pp_indices
605 start_layer, end_layer = get_pp_indices(num_hidden_layers,
606 get_pp_group().rank_in_group,
607 get_pp_group().world_size)
608 modules = torch.nn.ModuleList(
609 [PPMissingLayer() for _ in range(start_layer)] + [
--> 610 maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
611 for idx in range(start_layer, end_layer)
612 ] + [PPMissingLayer() for _ in range(end_layer, num_hidden_layers)])
613 return start_layer, end_layer, modules
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:302, in Qwen2Model.__init__.<locals>.<lambda>(prefix)
297 else:
298 self.embed_tokens = PPMissingLayer()
300 self.start_layer, self.end_layer, self.layers = make_layers(
301 config.num_hidden_layers,
--> 302 lambda prefix: Qwen2DecoderLayer(config=config,
303 cache_config=cache_config,
304 quant_config=quant_config,
305 prefix=prefix),
306 prefix=f"{prefix}.layers",
307 )
309 self.make_empty_intermediate_tensors = (
310 make_empty_intermediate_tensors_factory(
311 ["hidden_states", "residual"], config.hidden_size))
312 if get_pp_group().is_last_rank:
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:218, in Qwen2DecoderLayer.__init__(self, config, cache_config, quant_config, prefix)
204 attn_type = AttentionType.ENCODER_ONLY
206 self.self_attn = Qwen2Attention(
207 hidden_size=self.hidden_size,
208 num_heads=config.num_attention_heads,
(...)
216 attn_type=attn_type,
217 )
--> 218 self.mlp = Qwen2MLP(
219 hidden_size=self.hidden_size,
220 intermediate_size=config.intermediate_size,
221 hidden_act=config.hidden_act,
222 quant_config=quant_config,
223 prefix=f"{prefix}.mlp",
224 )
225 self.input_layernorm = RMSNorm(config.hidden_size,
226 eps=config.rms_norm_eps)
227 self.post_attention_layernorm = RMSNorm(config.hidden_size,
228 eps=config.rms_norm_eps)
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:92, in Qwen2MLP.__init__(self, hidden_size, intermediate_size, hidden_act, quant_config, prefix)
89 if hidden_act != "silu":
90 raise ValueError(f"Unsupported activation: {hidden_act}. "
91 "Only silu is supported for now.")
---> 92 self.act_fn = SiluAndMul()
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/layers/activation.py:68, in SiluAndMul.__init__(self)
66 super().__init__()
67 if current_platform.is_cuda_alike() or current_platform.is_cpu():
---> 68 self.op = torch.ops._C.silu_and_mul
69 elif current_platform.is_xpu():
70 from vllm._ipex_ops import ipex_ops
File ~/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_ops.py:1232, in _OpNamespace.__getattr__(self, op_name)
1230 op, overload_names = _get_packet(qualified_op_name, module_name)
1231 if op is None:
-> 1232 raise AttributeError(
1233 f"'_OpNamespace' '{self.name}' object has no attribute '{op_name}'"
1234 )
1235 except RuntimeError as e:
1236 # Turn this into AttributeError so getattr(obj, key, default)
1237 # works (this is called by TorchScript with __origin__)
1238 raise AttributeError(
1239 f"'_OpNamespace' '{self.name}' object has no attribute '{op_name}'"
1240 ) from e
AttributeError: '_OpNamespace' '_C' object has no attribute 'silu_and_mul'
(With CUDA)
Not sure why it's working on the CLI, but not in a notebook.
Could it be an issue with how your notebook environment was launched or some other linker/python-env error?
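The "Failed to import from vllm._C" warning about GLIBCXX_3.4.32 earlier in the log is the likely root cause: the libstdc++.so.6 bundled inside the conda env is older than the one vllm/_C.abi3.so was built against, so the compiled kernels never get registered and torch.ops._C ends up without a silu_and_mul attribute. Below is a minimal diagnostic sketch, assuming a Linux host; it is not vLLM API, just a comparison you could run once in a notebook cell and once in the CLI Python that works, then diff the two outputs:

import os
import sys

# 1) Confirm both runs use the same interpreter / conda environment.
print("interpreter:", sys.executable)

# 2) Try to load the compiled extension that registers the torch.ops._C kernels
#    (silu_and_mul lives there); the GLIBCXX error from the log should reproduce here.
try:
    import vllm._C  # noqa: F401
    print("vllm._C loaded")
except ImportError as exc:
    print("vllm._C failed to load:", exc)

# 3) Show which libstdc++.so.6 this process actually mapped (Linux /proc only);
#    the copy bundled in the conda env vs. the system one is the usual suspect.
with open(f"/proc/{os.getpid()}/maps") as maps:
    paths = {line.split()[-1] for line in maps if "libstdc++" in line}
print("\n".join(sorted(paths)))

If the notebook process resolves the conda-bundled libstdc++ while the CLI process resolves the newer system copy, the usual workarounds are updating the library inside the env (for example via conda-forge's libstdcxx-ng package) or preloading the system libstdc++ with LD_PRELOAD before starting Jupyter.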