Skip to content

add concurrency to multipart_save method #739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

avoidik
Copy link

@avoidik avoidik commented May 13, 2025

my (naive) attempt to fix #738 and #737

Comment on lines 359 to 361
if part_tags.empty? #it is an error to have a multipart upload with no parts
part_upload = service.upload_part(directory.key, key, upload_id, 1, '', part_headers(''))
part_tags << part_upload.headers["ETag"]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no similar logic in multipart_copy() method 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably an oversight I guess?

chunk = nil
mutex.synchronize do
if body.respond_to?(:seek)
body.seek(start_pos) rescue nil
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hello, it is me, a little bottleneck right here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This should prevent multiple threads from trying to read the file at the same time I guess. They would take turns leading it into memory, but then actually do the upload in parallel (which is the slower part). I think mutexing the file read is probably necessary if I recall properly. Is that what you were thinking? Happy to discuss further.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will definitely be a bottleneck on a block storage side. Multiple threads trying to seek through blocks and run IO operations from one file, that is a reason why mutex is there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas for alternatives or workarounds?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this implementation it's implied that we could fine-tune this bottleneck using two existing properties, concurrency and multipart_chunk_size

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't try this without the mutex.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using some sort of buffering in addition to threads wouldn't make it any better.

@geemus
Copy link
Member

geemus commented May 19, 2025

@avoidik - This seems like a pretty reasonable next step to try to me. Have you had a chance to test it? Were there additional parts you wanted to change before we proceed?

@geemus
Copy link
Member

geemus commented May 19, 2025

@avoidik - also, the head error appears to be on main and unrelated to this, I'll see about addressing that separately.

@avoidik
Copy link
Author

avoidik commented May 19, 2025

@geemus I didn't test it thoroughly, I could run yet another performance benchmark again after your minimal input and some kind of a review

@geemus
Copy link
Member

geemus commented May 19, 2025

@avoidik Yeah, that sounds good. Overall this code appears to align well with the copy code that was already here, but I don't have direct experience using that either. It will also be helpful to know if and how much it helps with the performance issues you originally noted. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extremely slow transfer rate to S3
2 participants