-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf: remove clone
on uninitiated_partitions
in SortPreservingMergeStream
#15562
Conversation
… custom fixed size queue this is done to avoid having to clone or change the actual values
I noticed something similar when profiling some click bench queries locally. I am going to take a look at this one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rluvaton -- this looks neat. I am running some benchmarks now to see if I can notice any improvement or not
I like the idea of using a fixed size queue rather than reallocating a lot
/// ```plain | ||
/// index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | ||
/// | -------------------------------------------------------- | | ||
/// value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find these examples easier to follow if index
and value
didn't have the same values. maybe something like
/// value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | |
/// value: | 9 | 8 | 7 | 5 | 5 | 4 | 3 | 2 | 1 | 0 | |
FYI @jayzhan211 |
BTW I filed a ticket to track this as I have seen it too |
I removed the custom fixed size queue and replaced with matching vec deque functions that are 0-cost as it's just index manipulation like my prev impl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rluvaton -- I am going to run some benchmarks on this PR but it is looking good to me
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @rluvaton
Unfortunately I didn't see much improvement in my local benchmark runs, but I think this is an improvement regardless
Interesting - I wonder if this is something related to how sampling is done with |
My current intuition about it is twofold:
|
I agree with this assesment -- so we save CPU work with this change but the total query time doesn't really decrease because we are not fully using all of the cores I also have found Samply super awesome -- thank you @comphead for showing that Something else I have observed when looking at Samply is that on my laptop at least, it appears there are several times where processing stalls due to parsing parquet metadata: I was running this
It would be an interesting experiment to see if we can improve performance by caching the metadata. I will file a ticket to investigate this |
|
merge
uninitiated_partitions
VecDeque<usize>
with custom fixed size queuemerge
uninitiated_partitions
VecDeque<usize>
merge
uninitiated_partitions
VecDeque<usize>
clone
on uninitiated_partitions
in SortPreservingMergeStream
Thanks @rluvaton |
Which issue does this PR close?
Related to:
Rationale for this change
I saw in my code that clone and pop front and push takes some of the time from
poll_next_inner
This is from my sampling
Before:

After:

What changes are included in this PR?
FixedSizeQueue
which avoid cloning the vector (could be avoided even without introducing this)Are these changes tested?
Existing tests
Are there any user-facing changes?
No