
refactor(streaming): remove get_compacted_row from StateTable #20034

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

BugenZhao
Member

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

The point of StateTable::get_compacted_row is that, when a table uses value encoding, we can save a roundtrip of parsing the encoded row bytes into an OwnedRow and serializing them back.
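A minimal sketch of the roundtrip this function saved, using toy stand-in types and a toy encoding (the real `OwnedRow`, `CompactedRow`, and value encoding are RisingWave internals): when the storage encoding is already value encoding, the stored bytes can be handed over directly instead of being decoded and re-encoded.

```rust
#[derive(Debug, Clone, PartialEq)]
struct OwnedRow(Vec<i64>); // stand-in for a decoded row of datums

#[derive(Debug, Clone, PartialEq)]
struct CompactedRow {
    row: Vec<u8>, // value-encoded bytes of a row
}

// Toy "value encoding": little-endian concatenation of datums.
fn value_encode(row: &OwnedRow) -> Vec<u8> {
    row.0.iter().flat_map(|d| d.to_le_bytes()).collect()
}

fn value_decode(bytes: &[u8]) -> OwnedRow {
    OwnedRow(
        bytes
            .chunks(8)
            .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
            .collect(),
    )
}

// Slow path: decode the stored bytes into an OwnedRow, then re-encode.
fn get_row_then_encode(stored: &[u8]) -> CompactedRow {
    let row = value_decode(stored);
    CompactedRow { row: value_encode(&row) }
}

// Fast path: storage encoding == value encoding, so just move the bytes.
fn get_compacted_row(stored: &[u8]) -> CompactedRow {
    CompactedRow { row: stored.to_vec() }
}

fn main() {
    let stored = value_encode(&OwnedRow(vec![1, 2, 3]));
    // Both paths yield the same CompactedRow; the fast path skips the parse.
    assert_eq!(get_row_then_encode(&stored), get_compacted_row(&stored));
}
```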

This function is currently used only by MaterializeCache when a state table has an on-conflict behavior set. However, in such cases, the state table must be a user TABLE that adopts column-aware encoding. As a result, the optimized branch is effectively dead.

In this PR, we remove this function and inline it into its only caller.

This shows that there's no significant benefit in sticking with value encoding for its memory efficiency, and we may adopt a better encoding for this purpose (see #20017 for more details).

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@kwannoel
Contributor

kwannoel commented Jan 6, 2025

This function is currently used only by MaterializeCache when a state table has an on-conflict behavior set. However, in such cases, the state table must be a user TABLE that adopts column-aware encoding. As a result, the optimized branch is effectively dead.

Could you elaborate on this?

This shows us that there's no significant benefit to stick with value-encoding for its memory-efficiency

Is this in all cases? Or just for materialize?

Member

@fuyufjh fuyufjh left a comment


LGTM

@fuyufjh
Member

fuyufjh commented Jan 7, 2025

I thought this optimization should exist for streaming executors, such as HashJoin. But it turns out it doesn't?

@BugenZhao
Member Author

This function is currently used only by MaterializeCache when a state table has an on-conflict behavior set. However, in such cases, the state table must be a user TABLE that adopts column-aware encoding. As a result, the optimized branch is effectively dead.

Could you elaborate on this?

MaterializeExecutor resides either in a table or a materialized view.

For tables, we use column-aware encoding in the storage to support schema change. To get a CompactedRow and store it in MaterializeCache, one has to decode the raw bytes and re-encode with value encoding. So calling get_compacted_row is essentially get_row then encode.
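A toy sketch of why the optimized branch is dead for user tables (hypothetical encodings for illustration, not RisingWave's actual formats): with a column-aware storage encoding, the stored bytes are not a valid value-encoded `CompactedRow`, so producing one necessarily means decoding and re-encoding.

```rust
#[derive(Debug, Clone, PartialEq)]
struct OwnedRow(Vec<i64>);

// Toy "column-aware" encoding: (column_id, value) pairs, so columns can
// be added, dropped, or reordered without rewriting old rows.
fn column_aware_encode(row: &OwnedRow) -> Vec<u8> {
    let mut out = Vec::new();
    for (id, v) in row.0.iter().enumerate() {
        out.extend((id as u32).to_le_bytes());
        out.extend(v.to_le_bytes());
    }
    out
}

fn column_aware_decode(bytes: &[u8]) -> OwnedRow {
    OwnedRow(
        bytes
            .chunks(12) // 4-byte column id + 8-byte value per datum
            .map(|c| i64::from_le_bytes(c[4..12].try_into().unwrap()))
            .collect(),
    )
}

// Toy "value" encoding: just the values, positionally.
fn value_encode(row: &OwnedRow) -> Vec<u8> {
    row.0.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn main() {
    let row = OwnedRow(vec![10, 20]);
    let stored = column_aware_encode(&row);
    // The stored bytes are NOT already a value-encoded row, so there is
    // nothing for `get_compacted_row` to pass through untouched...
    assert_ne!(stored, value_encode(&row));
    // ...and it degenerates into `get_row` followed by an encode:
    assert_eq!(value_encode(&column_aware_decode(&stored)), value_encode(&row));
}
```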

For materialized views, we enforce the invariant of a retractable stream internally, so there's no need for conflict handling. As a result, MaterializeCache is not used at all in this scenario, so we don't even call this function.

This shows us that there's no significant benefit to stick with value-encoding for its memory-efficiency

Is this in all cases? Or just for materialize?

In streaming cache, there's a need for a more compact row representation than Vec<Datum>. Currently we use CompactedRow, which is exactly the value-encoding representation of a row. However, as mentioned in #20017, the major downside is that we have to decode the datums (involving copy and allocation) every time before evaluating them or appending them to a chunk. This has prompted us to explore alternatives to CompactedRow.
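The cost described above can be sketched as follows (toy encoding, hypothetical types): every use of a `CompactedRow` pays a full decode into owned datums, with the copying and allocation that entails.

```rust
struct CompactedRow {
    row: Vec<u8>, // value-encoded bytes kept in the streaming cache
}

// Decoding copies the bytes out into a freshly allocated Vec of datums;
// this must happen on every evaluation or append-to-chunk.
fn decode(r: &CompactedRow) -> Vec<i64> {
    r.row
        .chunks(8)
        .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

fn main() {
    let r = CompactedRow {
        row: 1i64.to_le_bytes().into_iter().chain(2i64.to_le_bytes()).collect(),
    };
    // The cached bytes cannot be evaluated directly; decode first:
    let datums = decode(&r);
    assert_eq!(datums, vec![1, 2]);
}
```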

Let's check the properties of CompactedRow:

Pros:

  1. no memory padding overhead, compared to Datum
  2. contiguous memory, lightweight to copy
  3. the same encoding as the storage, no need to re-encode (really?)

Cons:

  1. has to be decoded every time to use it
  2. cannot change the algorithm without breaking the storage persisted rows

If we are going to introduce a new in-memory compacted representation, it's easy to see that we can likely eliminate all cons, and retain all pros except for 3.
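One hypothetical shape such a representation could take (a sketch, not a proposal from this PR or #20017): compact value bytes plus a per-datum offset table, which keeps pros 1 and 2, allows per-field access without decoding the whole row (addressing con 1), and, since it is never persisted, can evolve freely (addressing con 2).

```rust
// Hypothetical in-memory compact row: contiguous bytes + field offsets.
struct CompactRowV2 {
    bytes: Vec<u8>,
    offsets: Vec<usize>, // start of each datum; last entry == bytes.len()
}

impl CompactRowV2 {
    fn from_datums(datums: &[i64]) -> Self {
        let mut bytes = Vec::new();
        let mut offsets = Vec::with_capacity(datums.len() + 1);
        for d in datums {
            offsets.push(bytes.len());
            bytes.extend(d.to_le_bytes());
        }
        offsets.push(bytes.len());
        Self { bytes, offsets }
    }

    // Random access to a single field: no full-row decode, no allocation.
    fn get(&self, i: usize) -> i64 {
        let s = &self.bytes[self.offsets[i]..self.offsets[i + 1]];
        i64::from_le_bytes(s.try_into().unwrap())
    }
}

fn main() {
    let row = CompactRowV2::from_datums(&[5, -7, 42]);
    assert_eq!(row.get(0), 5);
    assert_eq!(row.get(2), 42);
}
```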

However, as this PR has shown, this optimization is never actually utilized, so we are free to explore the alternatives with less concern.

@BugenZhao
Member Author

I thought this optimization should exist for streaming executors, such as HashJoin. But it turns out it doesn't?

I guess it's because iter is more widely used in streaming executors, but we only have this optimization for the point-get interface. However, after a second review, it turns out there is no obstacle preventing us from adopting it in executors like HashJoin...

Member

@stdrc stdrc left a comment


LGTM

Contributor

@wcy-fdu wcy-fdu left a comment


LGTM

@BugenZhao BugenZhao added this pull request to the merge queue Jan 7, 2025
Merged via the queue into main with commit b098e15 Jan 7, 2025
31 checks passed
@BugenZhao BugenZhao deleted the bz/remove-get-compacted-row branch January 7, 2025 10:56
lmatz pushed a commit that referenced this pull request Jan 10, 2025
…0034)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
6 participants