
Feature Request: Helper functions for boto3 efficiency #4470

Open
1 of 2 tasks
FrcMoya opened this issue Mar 11, 2025 · 2 comments
Assignees
Labels
feature-request This issue requests a feature. p3 This is a minor priority issue

Comments

@FrcMoya

FrcMoya commented Mar 11, 2025

Describe the feature

Hello,

I would like to know why there are no helper functions in boto3. I understand that this library aims to provide a 1-to-1 mapping of the AWS API, but I believe it would be beneficial to include some helper functions to facilitate more efficient API calls.

For instance, the elasticsearch-py SDK includes helper functions like streaming_bulk, which optimize requests by packing as many records as possible into each call up to the maximum allowed request size. Currently, I use custom classes to manage this for different services, including CloudWatch Logs, Kinesis, and SQS.

Adding such helper functions to boto3 would greatly enhance its usability and efficiency.

Thanks in advance!

Use Case

Send several logs to different services (CloudWatch Logs, Kinesis, SQS, ...) while optimizing the requests

Proposed Solution

No response

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

1.35.85

Environment details (OS name and version, etc.)

macOS 15.3.1

@FrcMoya FrcMoya added feature-request This issue requests a feature. needs-triage This issue or PR still needs to be triaged. labels Mar 11, 2025
@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this Mar 17, 2025
@RyanFitzSimmonsAK RyanFitzSimmonsAK added investigating This issue is being investigated and/or work is in progress to resolve the issue. p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Mar 17, 2025
@RyanFitzSimmonsAK
Contributor

Hi @FrcMoya, thanks for reaching out. Boto3 actually does have some handwritten methods going beyond the 1:1 mapping to AWS APIs. Mostly, these are for S3 and DynamoDB. However, if you have any particular features you'd like supported for a client, that'd be a great thing to submit as a feature request. Do you have any specifics? Thanks!
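
For context, those existing handwritten helpers look roughly like this (the table, bucket, and key names below are placeholders, not part of this proposal):

import boto3

# DynamoDB: batch_writer buffers put/delete requests and flushes them in
# batches of up to 25 items via BatchWriteItem, resending unprocessed items.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")  # placeholder table name

with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"pk": str(i), "value": i})

# S3: upload_file is a managed transfer that switches to multipart uploads
# for large objects automatically.
s3 = boto3.client("s3")
s3.upload_file("local.bin", "example-bucket", "path/local.bin")  # placeholder bucket/key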

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Mar 18, 2025
@FrcMoya
Author

FrcMoya commented Mar 19, 2025

Hello @RyanFitzSimmonsAK . Thanks for your interest in this feature!

For example, in the case of Kinesis, it would be useful to have a managed class or function that wraps the put_records API call. This class could automatically handle AWS limits—such as 500 records per request, 1MiB per record, and 5MiB per batch—and send the batch automatically when a limit is reached.

A good reference is OpenSearchPy, which has streaming_bulk and bulk functions that work similarly. These functions consume an iterator and automatically send chunks when size or count limits are reached.
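
As a rough illustration of that pattern (the host, index, and limits below are just placeholders), streaming_bulk consumes a generator and sends a bulk request whenever the chunk_size or max_chunk_bytes threshold is reached:

from opensearchpy import OpenSearch
from opensearchpy.helpers import streaming_bulk

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder endpoint

def generate_actions():
    # Any iterator of actions works; streaming_bulk consumes it lazily
    for i in range(10000):
        yield {"_index": "logs", "_id": i, "message": f"log line {i}"}

# Chunks are flushed whenever the record count or byte-size limit is hit
for ok, result in streaming_bulk(
    client,
    generate_actions(),
    chunk_size=500,                   # max actions per bulk request
    max_chunk_bytes=5 * 1024 * 1024,  # max bytes per bulk request
):
    if not ok:
        print("failed:", result)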

To optimize my own use case, I built a wrapper class that buffers requests and only sends them when needed:

import uuid

class SendToKinesisWrapper:
    def __init__(self, kinesis_client, logger, stream_name):
        self.kinesis_client = kinesis_client
        self.logger = logger
        self.stream_name = stream_name

        # Limits enforced by the AWS API (boto3.client("kinesis").put_records)
        self._limit_bytes_per_record = 1048576   # 1 MiB per record (data + partition key)
        self._limit_bytes_per_batch = 5242880    # 5 MiB per request
        self._limit_records_per_batch = 500      # 500 records per request

        self._records_to_send = []
        self._total_bytes_to_send = 0
        self._batch_count = 1

    # __enter__ method for context manager
    def __enter__(self):
        return self

    # __exit__ method for context manager: flush anything still buffered
    def __exit__(self, exc_type, exc_value, traceback):
        self._send_last_batch()

    def put_record(self, data, partition_key=None):
        # Add data to the batch and check limits. If a limit would be
        # exceeded, send the current batch and start a new one.
        if isinstance(data, str):
            data = data.encode("utf-8")
        # If no partition key is given, use a random one (illustrative default)
        partition_key = partition_key or str(uuid.uuid4())
        record_size = len(data) + len(partition_key)
        if record_size > self._limit_bytes_per_record:
            raise ValueError("Record exceeds the 1 MiB per-record limit")

        if (len(self._records_to_send) >= self._limit_records_per_batch
                or self._total_bytes_to_send + record_size > self._limit_bytes_per_batch):
            self._send_batch()

        self._records_to_send.append({"Data": data, "PartitionKey": partition_key})
        self._total_bytes_to_send += record_size

    def _send_batch(self):
        # Send the buffered records (handling of FailedRecordCount omitted here)
        if not self._records_to_send:
            return
        self.logger.info("Sending batch %d (%d records)", self._batch_count, len(self._records_to_send))
        self.kinesis_client.put_records(Records=self._records_to_send, StreamName=self.stream_name)
        self._records_to_send = []
        self._total_bytes_to_send = 0
        self._batch_count += 1

    def _send_last_batch(self):
        # If something is still in the buffer, send it
        self._send_batch()

This allows for optimized usage like this:

with SendToKinesisWrapper(...) as kinesis_helper:
    for data in data_to_send:
        kinesis_helper.put_record(data)

I think this feature would help users optimize batch requests without manually handling API limits.

Let me know if you'd like more details. Thanks again!

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Mar 20, 2025