CloudWatch logs get_log_events returns nextForwardToken even though no more events appear AND limit is not respected #4472

Open
1 task
seanmann-synth opened this issue Mar 13, 2025 · 1 comment
Assignees
Labels
bug This issue is a confirmed bug. cloudwatch investigating This issue is being investigated and/or work is in progress to resolve the issue. p3 This is a minor priority issue

Comments

@seanmann-synth

Describe the bug

While diagnosing a bottleneck in our code, I found two oddities with get_log_events:

  1. nextForwardToken comes back with a new value even though the response contains no event records.
  2. The limit is not respected: each response is capped at about 0.08 MB.
  • Maybe I'm doing something wrong?
  • Is it safe to stop querying when a response contains no events, even though nextForwardToken has changed to a new token?

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

  1. The API should not return a nextForwardToken when there are no more events to retrieve.
  2. The API should return up to 1 MB or 10,000 events per call, as documented.

Current Behavior

Sample response:

Iteration: 1 -  Next token: None
Iter: 1
Events: 285
Response size: 0.08 MB
Content-Length from headers: 0.10 MB
Iteration: 2 -  Next token: f/38843677690954359307332779238523611073553355858145312768/s
Iter: 2
Events: 290
Response size: 0.08 MB
Content-Length from headers: 0.10 MB
Iteration: 3 -  Next token: f/38843871704337782941158364500062913595036750815702024192/s
Iter: 3
Events: 150
Response size: 0.04 MB
Content-Length from headers: 0.05 MB
Iteration: 4 -  Next token: f/38843972064626708085705530648379222384170020279955292160/s
No events found in response at iteration 4
Iteration: 5 -  Next token: f/38844167221179781711902907616856595068764451014568804351/s
No events found in response at iteration 5
Iteration: 6 -  Next token: f/38844367927886568487511181438321048904018004838312804351/s
No events found in response at iteration 6
Iteration: 7 -  Next token: f/38844568634593355263119455259785502739271558662056804351/s
No events found in response at iteration 7
Iteration: 8 -  Next token: f/38844769341300142038727729081249956574525112485800804351/s
No events found in response at iteration 8
Iteration: 9 -  Next token: f/38844970048006928814336002902714410409778666309544804351/s
No events found in response at iteration 9
Iteration: 10 -  Next token: f/38845170754713715589944276724178864245032220133288804351/s
No events found in response at iteration 10
Iteration: 11 -  Next token: f/38845371461420502365552550545643318080285773957032804351/s
No events found in response at iteration 11
Iteration: 12 -  Next token: f/38845572168127289141160824367107771915539327780776804351/s
No events found in response at iteration 12
Duration: 3.93 seconds
Count: 12
Duration: 0.00 seconds
Count: 725

Sample response 2 (note that the number of events and the size of each response went down):

Iteration: 1 -  Next token: None
Iter: 1
Events: 185
Response size: 0.05 MB
Content-Length from headers: 0.06 MB
Iteration: 2 -  Next token: f/38843610791461755374882617651150595545357572210706350080/s
Iter: 2
Events: 200
Response size: 0.05 MB
Content-Length from headers: 0.07 MB
Iteration: 3 -  Next token: f/38843744651974719242528930077070676199508848723704348672/s
Iter: 3
Events: 200
Response size: 0.05 MB
Content-Length from headers: 0.07 MB
Iteration: 4 -  Next token: f/38843878389186862907499427146054793968755222266668056576/s
Iter: 4
Events: 140
Response size: 0.04 MB
Content-Length from headers: 0.05 MB
Iteration: 5 -  Next token: f/38843972064626708085705530648379222384170020279955292160/s
No events found in response at iteration 5
Iteration: 6 -  Next token: f/38844167221179781711902907616856595068764451014568804351/s
No events found in response at iteration 6
Iteration: 7 -  Next token: f/38844367927886568487511181438321048904018004838312804351/s
No events found in response at iteration 7
Iteration: 8 -  Next token: f/38844568634593355263119455259785502739271558662056804351/s
No events found in response at iteration 8
Iteration: 9 -  Next token: f/38844769341300142038727729081249956574525112485800804351/s
No events found in response at iteration 9
Iteration: 10 -  Next token: f/38844970048006928814336002902714410409778666309544804351/s
No events found in response at iteration 10
Iteration: 11 -  Next token: f/38845170754713715589944276724178864245032220133288804351/s
No events found in response at iteration 11
Iteration: 12 -  Next token: f/38845371461420502365552550545643318080285773957032804351/s
No events found in response at iteration 12
Iteration: 13 -  Next token: f/38845572168127289141160824367107771915539327780776804351/s
No events found in response at iteration 13
Iteration: 14 -  Next token: f/38845645760586444292217191434978071655132297516149604351/s
No events found in response at iteration 14
Duration: 8.08 seconds
Count: 14
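To put the numbers from sample response 1 next to the documented limits, here is a rough back-of-the-envelope calculation. It is only a sketch: the page sizes are the rounded "Response size" values from the log above (which come from `len(str(response))`, so they include Python repr overhead), and the 1 MB / 10,000 event limits are the documented values as I understand them. All variable names are made up for illustration.

```python
# Rough arithmetic on the figures logged in sample response 1.
MB = 1024 * 1024
DOCUMENTED_MAX_BYTES = 1 * MB      # assumed from the docs: up to 1 MB per response
DOCUMENTED_MAX_EVENTS = 10_000     # assumed from the docs: up to 10,000 events per response

# (event count, approximate response size in MB) for iterations 1-3
pages = [(285, 0.08), (290, 0.08), (150, 0.04)]

total_events = sum(count for count, _ in pages)
total_bytes = sum(size_mb * MB for _, size_mb in pages)
avg_event_bytes = total_bytes / total_events

# How many events of this average size would fit if a response were filled to 1 MB
events_per_full_response = min(
    DOCUMENTED_MAX_EVENTS, int(DOCUMENTED_MAX_BYTES / avg_event_bytes)
)

# Would everything retrieved in this run have fit in a single response?
fits_in_one_response = (
    total_events <= DOCUMENTED_MAX_EVENTS and total_bytes <= DOCUMENTED_MAX_BYTES
)

print(f"Total events: {total_events}")
print(f"Approx. total size: {total_bytes / MB:.2f} MB")
print(f"Approx. bytes per event: {avg_event_bytes:.0f}")
print(f"Events that would fit in 1 MB at this size: {events_per_full_response}")
print(f"Fits in one response under documented limits: {fits_in_one_response}")
```

By this estimate, everything retrieved in the run is well under both documented limits, yet it arrived split across three pages.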

Reproduction Steps

Sample code:

import time

params = dict(
    logGroupName=self.log_group_name,
    logStreamName=log_stream_name,
    startFromHead=True,
    limit=10000,
)

events = []
response = dict()
next_token = None

start_time = time.time()
count = 0
while next_token != response.get("nextForwardToken", ""):
    count += 1
    next_token = response.get("nextForwardToken")
    LOG.error(f"Iteration: {count} -  Next token: {next_token}")
    if next_token:
        params["nextToken"] = next_token
    response = self.cloudwatch.get_log_events(**params)

    if response["events"]:
        LOG.error(f"Iter: {count}")
        LOG.error(f"Events: {len(response['events'])}")
        # Calculate size of response in MB
        response_size = len(str(response).encode("utf-8")) / (1024 * 1024)
        LOG.error(f"Response size: {response_size:.2f} MB")
        # Get content-length from response metadata if available
        content_length = int(
            response.get("ResponseMetadata", {}).get("HTTPHeaders", {}).get("content-length", 0)
        ) / (1024 * 1024)
        LOG.error(f"Content-Length from headers: {content_length:.2f} MB")
        events.extend(response["events"])
    else:
        LOG.error(f"No events found in response at iteration {count}")

end_time = time.time()
duration = end_time - start_time
LOG.error(f"Duration: {duration:.2f} seconds")
LOG.error(f"Count: {count}")
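The termination rule used by the loop above (keep going until nextForwardToken comes back unchanged) can be isolated into a small generator. This is a sketch under assumptions, not a recommended implementation: `iter_log_events`, `max_empty_pages`, and `FakeLogsClient` are hypothetical names, and the fake client only imitates the response shape seen in the logs (empty pages with advancing tokens). It never talks to AWS. The optional `max_empty_pages` cutoff is the "stop when nothing comes back" heuristic I am asking about; the stub shows how such a cutoff could miss events if an empty page is ever followed by a non-empty one, which is exactly what I am unsure about for the real service.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def iter_log_events(
    client: Any, max_empty_pages: Optional[int] = None, **params: Any
) -> Iterator[Dict[str, Any]]:
    """Yield events until nextForwardToken stops changing.

    max_empty_pages is a hypothetical heuristic: give up after that many
    consecutive empty pages, at the risk of missing later events.
    """
    token: Optional[str] = None
    empty_pages = 0
    while True:
        kwargs = dict(params)
        if token is not None:
            kwargs["nextToken"] = token
        response = client.get_log_events(**kwargs)
        events = response.get("events", [])
        yield from events

        new_token = response.get("nextForwardToken")
        if new_token is None or new_token == token:
            return  # same token returned: treated as end of stream
        empty_pages = 0 if events else empty_pages + 1
        if max_empty_pages is not None and empty_pages >= max_empty_pages:
            return  # heuristic early exit
        token = new_token


class FakeLogsClient:
    """Hypothetical stand-in that serves canned pages in order."""

    def __init__(self, pages: List[Tuple[List[Dict[str, Any]], str]]) -> None:
        self._pages = pages
        self.calls = 0

    def get_log_events(self, **kwargs: Any) -> Dict[str, Any]:
        index = min(self.calls, len(self._pages) - 1)
        self.calls += 1
        events, token = self._pages[index]
        return {"events": events, "nextForwardToken": token}


# Canned pages: two empty pages with new tokens, then one more event,
# then a page that repeats the previous token.
pages = [
    ([{"message": "a"}, {"message": "b"}], "f/1"),
    ([], "f/2"),
    ([], "f/3"),
    ([{"message": "c"}], "f/4"),
    ([], "f/4"),
]

full_client = FakeLogsClient(pages)
all_events = list(iter_log_events(full_client, logStreamName="example-stream"))

early_client = FakeLogsClient(pages)
early_events = list(
    iter_log_events(early_client, max_empty_pages=2, logStreamName="example-stream")
)

print(len(all_events), full_client.calls)
print(len(early_events), early_client.calls)
```

Against this stub, the token-based loop needs five calls to collect all three events, while the early-exit variant stops after three calls and misses the last one.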

Possible Solution

No response

Additional Information/Context

No response

SDK version used

Python 3.12.9 and boto3==1.37.12

Environment details (OS name and version, etc.)

MacOS 15.3.2

@seanmann-synth seanmann-synth added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Mar 13, 2025
@seanmann-synth
Author

I'm also seeing cases where it doesn't return the full results at all, even for logs that are only about 200 KB.

@adev-code adev-code self-assigned this Apr 7, 2025
@adev-code adev-code added cloudwatch p3 This is a minor priority issue investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Apr 7, 2025