Skip to content

Revert "SQS: Support Async JSON SQS Protocol & Message Attributes" #2260

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

auvipy
Copy link
Member

@auvipy auvipy commented Mar 15, 2025

to see if it fixes a regression report Reverts #2226

Copy link

codecov bot commented Mar 15, 2025

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.54%. Comparing base (571dd4d) to head (56619bd).

Files with missing lines Patch % Lines
kombu/asynchronous/aws/sqs/connection.py 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2260      +/-   ##
==========================================
- Coverage   81.57%   81.54%   -0.03%     
==========================================
  Files          77       77              
  Lines        9537     9523      -14     
  Branches     1161     1157       -4     
==========================================
- Hits         7780     7766      -14     
  Misses       1564     1564              
  Partials      193      193              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@Nusnus Nusnus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey bro, I've already completely investigated the situation, see my comment with the reproduction script.

This code is legit (in terms of the performance hit), we don't need to revert it.
You can close the PR :)

EDIT:
Referring to this: #2258

@hfern
Copy link
Contributor

hfern commented Mar 15, 2025

@Nusnus or @auvipy I am having a hard time following the conversation across multiple tickets. Please excuse me if I am missing something and please correct me if my understanding is wrong --

Hey bro, I've already completely investigated the situation, see my comment with the reproduction script.

This code is legit (in terms of the performance hit), we don't need to revert it.
You can close the PR :)

EDIT:
Referring to this: #2258

So there seem to be two issues:

  1. a large performance hit (pycurl vs urllib3) in Update to Kombu 5.5.0 stopped processing SQS messages #2258
  2. A new UnknownOperationException in the SQS broker reported in UnknownOperationException after upgrading Kombu to 5.5.0 #2257 and originating in SQS: Support Async JSON SQS Protocol & Message Attributes #2226.

Is that right?

I can dig into (2) but I am not familiar with (1). In addition, if you need a very fast fix for (2) feel free to revert #2226 and I can do debugging while not blocking anyone.

@Nusnus
Copy link
Member

Nusnus commented Mar 15, 2025

@hfern

Is that right?

Yes.

I can dig into (2)

Thank you, that will be very helpful!

In addition, if you need a very fast fix for (2) feel free to revert #2226 and I can do debugging while not blocking anyone.

Thank you. For now I prefer to avoid reverting it unless we know for sure it causes the problem.

@hfern
Copy link
Contributor

hfern commented Mar 15, 2025

@Nusnus thanks for the confirmation. I will work to repro today and get back to you.

Do you have a time frame for when you need a revert/keep decision?

@Nusnus
Copy link
Member

Nusnus commented Mar 15, 2025

@hfern

@Nusnus thanks for the confirmation. I will work to repro today and get back to you.

Do you have a time frame for when you need a revert/keep decision?

I aim to resolve everything for a fixed release by Monday.
The sooner the better though.

I’m available during this time for assistance and review if needed.

@hfern
Copy link
Contributor

hfern commented Mar 16, 2025

@Nusnus So far I have not been able to reproduce this issue without forcing an illegal combination of packages versions between celery, kombu, boto3, and botocore. So far my findings are that if you stay within the prescribed version ranges (as encoded in install_requires), you should not hit this bug.

I will spend more time on this tomorrow hunting for valid ranges that also have this bug.

You might have to consider whether Kombu wants to support running out-of-spec versions of botocore. If you want to support running very old botocore, Celery itself will have to lower the boto3 version requirement. Otherwise, if you want to preserve the current minimum-required botocore, we may have to instruct users having issues to upgrade to the latest minimum support botocore version from 2.5 years ago.

I am frankly surprised that whatever is assembling illegal package combinations is not throwing version-compatibility errors and is instead naively allow incompatible packages to run together. That seems like something that would require a very high bar to support.

@sarswamanishpl
Copy link

sarswamanishpl commented Mar 18, 2025

We are facing the same error with respect to containers crashing.
celery version - celery==5.3.0
And boto3 - boto3==1.26

Error - CRITICAL/MainProcess] Unrecoverable error: Exception("Request HTTP Error HTTP 404 Not Found (b'\n')")
What could be resolution ?

@hfern
Copy link
Contributor

hfern commented Mar 18, 2025

@sarswamanishpl Can you report the exact installed version of boto3 and botocore as well? You reported 1.26 for boto3 but there should be a patch number in there as well (in this case it's important).

You can find this with pip freeze if you're in a virtualenv.

@auvipy
Copy link
Member Author

auvipy commented Mar 18, 2025

I think we should bump the minimum version of botocore, is that right?

@hfern
Copy link
Contributor

hfern commented Mar 19, 2025

@auvipy I would agree with that, but for at least one of these folks they are running a botocore version that is already out of spec. So bumping would not fix his issue (though, I would hesitate to want to officially support allowing out-of-spec botocore versions).

I noted in the original PR, but the optimal version of botocore would be botocore>=1.34.90 from April 2024 and then we could actually clean all of this code up further. I will run some tests against that and verify celery + kombu + botocore==1.34.90 works.

Update: 'kombu==5.5.0' 'botocore==1.34.90' 'celery[sqs]' Works as expected.

@sarswamanishpl
Copy link

@hfern this is the botocore version we are using botocore==1.29.165

@hfern
Copy link
Contributor

hfern commented Mar 19, 2025

@sarswamanishpl thank you for that info! You have led me to the first quality repro case!

For anyone else following along, you can repro with poetry add 'kombu==5.5.0' 'botocore==1.29.165' 'celery[sqs]' and a tasks.py of

from celery import Celery

app = Celery(
    'kombudebug.tasks',
    broker='sqs://',
    broker_transport_options = {
        'region': 'us-west-2',
    },
)

app.conf.update(
    task_default_queue='kombu-debug-queue',
)

@app.task
def add(x, y):
    print("Got task with args:", x, y)
    return x + y

if __name__ == '__main__':
    print("Sending task...")
    add.delay(1, 2)
    print("Task sent.")

And the error occurs when running the worker: celery -A kombudebug.tasks worker --loglevel=DEBUG -c 1

[2025-03-18 23:22:42,421: INFO/MainProcess] celery@PhoenixDeux ready.
[2025-03-18 23:22:42,422: DEBUG/MainProcess] basic.qos: prefetch_count->4
[2025-03-18 23:22:42,426: DEBUG/MainProcess] Event choose-signer.sqs.ReceiveMessage: calling handler <function set_operation_specific_signer at 0x7633dfb45760>
[2025-03-18 23:22:42,428: DEBUG/MainProcess] Calculating signature using v4 auth.
[2025-03-18 23:22:42,428: DEBUG/MainProcess] CanonicalRequest:
POST
/123456/kombu-debug-queue

host:sqs.us-west-2.amazonaws.com
x-amz-date:20250319T062242Z

host;x-amz-date
XXX
[2025-03-18 23:22:42,428: DEBUG/MainProcess] StringToSign:
AWS4-HMAC-SHA256
20250319T062242Z
20250319/us-west-2/sqs/aws4_request
XXX
[2025-03-18 23:22:42,428: DEBUG/MainProcess] Signature:
XXX
[2025-03-18 23:22:42,429: DEBUG/MainProcess] Certificate path: /venv/lib/python3.12/site-packages/botocore/cacert.pem
[2025-03-18 23:22:42,429: DEBUG/MainProcess] Starting new HTTPS connection (1): sqs.us-west-2.amazonaws.com:443
[2025-03-18 23:22:42,621: DEBUG/MainProcess] https://sqs.us-west-2.amazonaws.com:443 "POST /123456/kombu-debug-queue HTTP/1.1" 404 29
[2025-03-18 23:22:42,623: CRITICAL/MainProcess] Unrecoverable error: Exception("Request HTTP Error  HTTP 404  Not Found (b'<UnknownOperationException/>\\n')")
Traceback (most recent call last):
  File "/venv/lib/python3.12/site-packages/celery/worker/worker.py", line 202, in start
    self.blueprint.start(self)
  File "/venv/lib/python3.12/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/venv/lib/python3.12/site-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
           ^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/celery/worker/consumer/consumer.py", line 340, in start
    blueprint.start(self)
  File "/venv/lib/python3.12/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/venv/lib/python3.12/site-packages/celery/worker/consumer/consumer.py", line 746, in start
    c.loop(*c.loop_args())
  File "/venv/lib/python3.12/site-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/hub.py", line 306, in create_loop
    item()
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 161, in __call__
    return self.throw()
           ^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 158, in __call__
    retval = fun(*final_args, **final_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/transport/SQS.py", line 600, in _schedule_queue
    self._get_bulk_async(
  File "/venv/lib/python3.12/site-packages/kombu/transport/SQS.py", line 617, in _get_bulk_async
    return self._get_async(queue, maxcount, callback=callback)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/transport/SQS.py", line 626, in _get_async
    return self._get_from_sqs(
           ^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/transport/SQS.py", line 647, in _get_from_sqs
    return connection.receive_message(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/aws/sqs/connection.py", line 169, in receive_message
    return self.get_list(
           ^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/aws/connection.py", line 209, in get_list
    return self.make_request(
           ^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/aws/sqs/connection.py", line 115, in make_request
    return self._mexe(prepared_request, callback=callback)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/aws/connection.py", line 152, in _mexe
    conn.getresponse(callback=callback)
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/aws/connection.py", line 101, in getresponse
    return self.http_client.add_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/http/urllib3_client.py", line 69, in add_request
    self._process_queue()
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/http/urllib3_client.py", line 210, in _process_queue
    self._process_pending_requests()
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/http/urllib3_client.py", line 152, in _process_pending_requests
    self._process_request(request)
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/http/urllib3_client.py", line 207, in _process_request
    request.on_ready(response_obj)
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 168, in __call__
    svpending(*ca, **ck)
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 161, in __call__
    return self.throw()
           ^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 158, in __call__
    retval = fun(*final_args, **final_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/vine/funtools.py", line 98, in _transback
    return callback(ret)
           ^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 161, in __call__
    return self.throw()
           ^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/vine/promises.py", line 158, in __call__
    retval = fun(*final_args, **final_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/vine/funtools.py", line 96, in _transback
    callback.throw()
  File "/venv/lib/python3.12/site-packages/vine/funtools.py", line 94, in _transback
    ret = filter_(*args + (ret,), **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/kombu/asynchronous/aws/connection.py", line 252, in _on_list_ready
    raise self._for_status(response, response.read())
Exception: Request HTTP Error  HTTP 404  Not Found (b'<UnknownOperationException/>\n')

So clearly, something is going wrong with that call construction for receiving messages.

I'll investigate further.

@hfern
Copy link
Contributor

hfern commented Mar 19, 2025

@auvipy I can confirm that adding 'application/x-www-form-urlencoded' to the AWS Query request headers fixes the issue. I have no idea how we lost that header value from my original PR's code changes -- it must be something deep down in botocore that was fixed at some point (or our manual request construction is wrong). I'll write a patch for it and PR it ASAP.

Copy link
Member Author

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this do not need the revert anymore as we found a fix

@auvipy auvipy closed this Mar 19, 2025
@auvipy auvipy deleted the revert-2226-hgf/sqs-json branch March 19, 2025 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants