Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NPUW] Serialization fixes #28442

Merged
merged 17 commits into from
Jan 15, 2025

Conversation

smirnov-alexey
Copy link
Contributor

No description provided.

@moslex moslex added the priority: high High piority label Jan 15, 2025
Comment on lines 94 to 96
} else if (var.is<int>()) {
write(stream, int(AnyType::INT));
write(stream, var.as<int>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be not portable, but let's keep it aside as we don't claim any blob portability at the moment.

Comment on lines 226 to 229
} else if (type == AnyType::CHARS) {
std::string val;
read(stream, val);
var = std::move(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but didn't CHARS stand for const char* in Any when you serialized it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I serialize it as std::string to keep it simple

Comment on lines 224 to 228
if (iter_device != device_bank.storage.end()) {
// Already allocated
// Shouldn't be possible
NPUW_ASSERT(false);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a change in the contract here, what was OK previously is now not OK, can you please explain more here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs during bank deserialization. Our storage is an unordered_map with unique UIDs which all have been evaluated and serialized at this point. Thus it shouldn't be possible to encounter a duplicate UIDs during deserialization. Just a sanity check - this if can be removed altogether with no effect

@@ -237,6 +238,10 @@ void Bank::read_and_add_tensor(std::istream& stream, int64_t uid, const std::str
ov::Tensor allocated_tensor;

// FIXME: reading not via a dedicated function
bool is_intialized = false;
read(stream, is_intialized);
NPUW_ASSERT(is_intialized);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this too. Haven't you wrote empty tensors before?
if is_initialized must always be true, why write it to the stream at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed Tensor serialization in some of the previous PRs. It occurred when I was serializing empty scales/zerops. This change is just to align with the previous change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. Do you serialize empty tensors now or not?
If you don't, you don't need this bool in the stream either.

If you do, this code breaks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do but not for the bank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this code doesn't break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did I read this right you have a common routine to write a tensor to disk but you read it here in some specific way?
it doesn't sound good tbh.

you'd need to use the same function to read tensors as you write those. If that routine checks for false and does nothing, but you still need to raise an assert, you can make it in the caller code (= HERE) if the resulting tensor is still empty after read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed on call, this change is caused by optimization (reading from stream directly to the L0 buffer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed locally

Comment on lines 145 to 147
if (properties.count(ov::cache_dir.name())) {
OPENVINO_THROW("Option 'CACHE_DIR' is not supported with NPU_USE_NPUW!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we do have the necessary tools in place now, let's plan it for 25.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This PR is just a support for LLMCompiledModel for now. We will figure out the rest later

// CACHE_DIR isn't supported with NPU_USE_NPUW
if (properties.count(ov::cache_dir.name())) {
OPENVINO_THROW(
"Option 'CACHE_DIR' is not supported with configuration: NPU_USE_NPUW : YES, NPUW_LLM : NO!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be surprised if people who get this will start putting "NO!" to the config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -237,6 +238,10 @@ void Bank::read_and_add_tensor(std::istream& stream, int64_t uid, const std::str
ov::Tensor allocated_tensor;

// FIXME: reading not via a dedicated function
bool is_intialized = false;
read(stream, is_intialized);
NPUW_ASSERT(is_intialized);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did I read this right you have a common routine to write a tensor to disk but you read it here in some specific way?
it doesn't sound good tbh.

you'd need to use the same function to read tensors as you write those. If that routine checks for false and does nothing, but you still need to raise an assert, you can make it in the caller code (= HERE) if the resulting tensor is still empty after read.

@dmatveev dmatveev enabled auto-merge January 15, 2025 18:03
@dmatveev dmatveev added this pull request to the merge queue Jan 15, 2025
Merged via the queue into openvinotoolkit:master with commit 40b19c8 Jan 15, 2025
161 checks passed
@dmatveev dmatveev deleted the as/npuw_s11n_fixes branch January 15, 2025 21:34
github-merge-queue bot pushed a commit to openvinotoolkit/openvino.genai that referenced this pull request Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin Code Freeze priority: high High piority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants