-
-
Notifications
You must be signed in to change notification settings - Fork 353
add concurrency to multipart_save method #739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
if part_tags.empty? #it is an error to have a multipart upload with no parts | ||
part_upload = service.upload_part(directory.key, key, upload_id, 1, '', part_headers('')) | ||
part_tags << part_upload.headers["ETag"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no similar logic in multipart_copy()
method 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably an oversight I guess?
chunk = nil | ||
mutex.synchronize do | ||
if body.respond_to?(:seek) | ||
body.seek(start_pos) rescue nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hello, it is me, a little bottleneck right here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This should prevent multiple threads from trying to read the file at the same time I guess. They would take turns leading it into memory, but then actually do the upload in parallel (which is the slower part). I think mutexing the file read is probably necessary if I recall properly. Is that what you were thinking? Happy to discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this will definitely be a bottleneck on a block storage side. Multiple threads trying to seek through blocks and run IO operations from one file, that is a reason why mutex is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas for alternatives or workarounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this implementation it's implied that we could fine-tune this bottleneck using two existing properties, concurrency
and multipart_chunk_size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't try this without the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using some sort of buffering in addition to threads wouldn't make it any better.
@avoidik - This seems like a pretty reasonable next step to try to me. Have you had a chance to test it? Were there additional parts you wanted to change before we proceed? |
@avoidik - also, the head error appears to be on main and unrelated to this, I'll see about addressing that separately. |
@geemus I didn't test it thoroughly, I could run yet another performance benchmark again after your minimal input and some kind of a review |
@avoidik Yeah, that sounds good. Overall this code appears to align well with the copy code that was already here, but I don't have direct experience using that either. It will also be helpful to know if and how much it helps with the performance issues you originally noted. Thanks! |
my (naive) attempt to fix #738 and #737