removed waits around enqueue_primitive in verbose mode #2973
Description
When running llama.cpp with the SYCL Graph feature enabled by the `GGML_SYCL_GRAPH` variable and with oneDNN verbose logging on, the `wait` call is executed on the SYCL stream. This is illegal while a graph is being recorded, and the exception "wait cannot be called for a queue which is recording to a command graph" is thrown by this line in the LLVM compiler.
I'd like to solve this `wait` problem in order to allow testing llama.cpp with SYCL and with oneDNN verbose logs, as active work on enabling fast graphs in llama.cpp is currently in progress.
What is the best solution
The simplest solution is to just remove the `wait` calls, as proposed in this PR, but in that case the kernel execution time reported in the verbose log will be calculated incorrectly.
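To illustrate why the timing becomes wrong once the waits are gone, here is a minimal host-only sketch. It does not use SYCL or oneDNN; `enqueue_fake_kernel` and `timed_submit_ms` are hypothetical names, and `std::async` merely stands in for an asynchronous primitive submission.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

using clock_type = std::chrono::steady_clock;

// Stand-in for an asynchronous submission: the returned future becomes
// ready only after `work_ms` milliseconds of simulated device work.
inline std::future<void> enqueue_fake_kernel(int work_ms) {
    assert(work_ms >= 0);
    return std::async(std::launch::async, [work_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
    });
}

// Measure the time around a submission. Without the wait, only the cost
// of launching the work is captured, not the work itself.
inline double timed_submit_ms(int work_ms, bool wait_for_completion) {
    const auto t0 = clock_type::now();
    std::future<void> done = enqueue_fake_kernel(work_ms);
    if (wait_for_completion) done.wait();
    const auto t1 = clock_type::now();
    done.wait(); // let outstanding work finish; this part is not timed
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With a 50 ms workload, `timed_submit_ms(50, true)` reports roughly the full 50 ms, while `timed_submit_ms(50, false)` reports only the launch overhead. That is the kind of discrepancy the verbose log would show if the waits were simply removed.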
Other solutions that I see are:
`get_profiling_info` function. This measures only the kernel execution time, not data transfers or kernel launch time.
Is there a perfect solution to this problem already designed, or is there any other possible solution that avoids waits but still computes and logs the kernel execution time correctly?
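As a rough sketch of what the `get_profiling_info` option would measure: in SYCL 2020 an event from a queue created with `sycl::property::queue::enable_profiling` reports `command_submit`, `command_start` and `command_end` timestamps in nanoseconds. The struct and helper names below are hypothetical, the code is plain C++ so that it compiles without a SYCL toolchain, and whether such timestamps can be obtained for commands recorded into a graph is not addressed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the timestamps (in nanoseconds) that would come from
//   ev.get_profiling_info<sycl::info::event_profiling::command_submit>()
//   ev.get_profiling_info<sycl::info::event_profiling::command_start>()
//   ev.get_profiling_info<sycl::info::event_profiling::command_end>()
struct profiling_stamps {
    std::uint64_t submit_ns;
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Device-side execution time only: command_end - command_start.
inline double kernel_time_ms(const profiling_stamps &p) {
    assert(p.end_ns >= p.start_ns);
    return static_cast<double>(p.end_ns - p.start_ns) / 1e6;
}

// Time between submission and start of execution. Host-side work before
// the submit (for example data preparation) is not visible in any of
// these timestamps.
inline double launch_latency_ms(const profiling_stamps &p) {
    assert(p.start_ns >= p.submit_ns);
    return static_cast<double>(p.start_ns - p.submit_ns) / 1e6;
}
```

Logging `kernel_time_ms` alone would need no host-side wait around the enqueue, but it would report a smaller number than the current wall-clock measurement, since it excludes the launch latency and anything that happens on the host.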