
Add OpenAI Priority Load Balancer for Azure OpenAI #1626

Open · wants to merge 15 commits into main
Conversation

@simonkurtz-MSFT (Contributor) commented May 17, 2024

This PR introduces the openai-priority-loadbalancer as a native Python option to target one or more Azure OpenAI endpoints. Among the features of the load-balancer are:

  • Minimally necessary code and configuration to add abstracted load-balancing to the OpenAI Python API Library via a custom httpx client.
  • Priority-based load-balancing to address scenarios such as Provisioned Throughput Unit (PTU) over Consumption prioritization.
  • Respects Retry-After headers returned from Azure OpenAI to trigger a temporary open circuit for that endpoint.
  • Random distribution of Azure OpenAI requests across any available backends (non-429 && non-5xx status).
  • Automatic retries of failing requests across remaining available backends.
  • Returns a 429 status to the OpenAI Python API Library once all backends are exhausted. The returned Retry-After header carries the lowest (soonest) value across all backends, so the library's own retry is likely to succeed as soon as possible.
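The selection and circuit-breaking behavior in the list above can be sketched roughly as follows. This is an illustrative, self-contained sketch, not the package's actual implementation; all class, function, and host names here are made up:

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Backend:
    """One Azure OpenAI endpoint with a priority (1 = highest, e.g. PTU)."""
    host: str
    priority: int
    circuit_open_until: float = 0.0  # set from a 429's Retry-After header


def pick_backend(backends, now=None):
    """Randomly pick among available backends of the best (lowest) priority."""
    now = time.time() if now is None else now
    available = [b for b in backends if now >= b.circuit_open_until]
    if not available:
        # All circuits open: surface a 429 with the soonest Retry-After value.
        soonest = min(b.circuit_open_until for b in backends) - now
        raise RuntimeError(f"429: retry after {soonest:.0f}s")
    best = min(b.priority for b in available)
    return random.choice([b for b in available if b.priority == best])


def mark_throttled(backend, retry_after_seconds, now=None):
    """Open a temporary circuit for a backend that returned 429 or 5xx."""
    now = time.time() if now is None else now
    backend.circuit_open_until = now + retry_after_seconds
```

With a PTU backend at priority 1 and consumption backends at priority 2, requests go to the PTU endpoint until it is throttled, then fall back to the consumption backends until its circuit closes again.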

Relevant links:


This PR can be merged after @pamelafox's approval.

@simonkurtz-MSFT (Contributor, Author)

Hi @pamelafox & @kristapratico,

This is how the OpenAI Priority Load Balancer integrates. Never mind the hard-coded backend and the location of the backends list in this PR; I don't intend to ask for a merge, but this was the best way to give you an idea of the setup.

If you have two AOAI instances with the same model, you can plug them both in and should see load-balancing.

@simonkurtz-MSFT (Contributor, Author)

I brought up two AOAI instances and related assets and configured both instances as backends in app.py. Then I started to have a conversation.

[screenshot]

[screenshot]

Both backends are responding. It's important to note that this is not a uniform distribution, because the set of available backends is randomized (randomization is necessary for multi-process workloads).

[screenshot]

At no point did the conversation break down or show any kind of error through the chat bot.
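The non-uniform distribution mentioned above can be seen in a quick simulation. This is an illustrative sketch, not the package's code, and the backend names are made up:

```python
import random
from collections import Counter


def simulate(backends, requests, seed=0):
    """Each request independently picks a random available backend, so the
    per-request pattern is random rather than round-robin; counts even out
    only in aggregate."""
    rng = random.Random(seed)
    return Counter(rng.choice(backends) for _ in range(requests))


counts = simulate(["aoai-eastus", "aoai-westus"], 1000)
# Both backends serve traffic, roughly evenly across many requests.
```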

@pamelafox (Collaborator)

Cool! I made a few changes to the PR to make it a little easier to test out, by actually making the additional backend deployment, mind if I push them to the branch?

I think we should mention this option in the Productionizing guide, and if there are multiple customers wanting to use this approach, we could consider integrating it into main as an option.

@pamelafox (Collaborator)

Here are what my usage graphs look like during a load test btw:

[screenshots: usage graphs, 2024-06-02]

@simonkurtz-MSFT (Contributor, Author)

> Cool! I made a few changes to the PR to make it a little easier to test out, by actually making the additional backend deployment, mind if I push them to the branch?
>
> I think we should mention this option in the Productionizing guide, and if there are multiple customers wanting to use this approach, we could consider integrating it into main as an option.

Hi Pamela, please do push! I very much welcome your expertise and improvements. If there are aspects of the 1.0.9 package itself that should/need to be improved, I'm all ears there, too, of course.

Thank you so much! I know this is extraordinary time spent.

@simonkurtz-MSFT (Contributor, Author)

> Here are what my usage graphs look like during a load test btw:

Help me understand your test results, please. Are you hitting different backends or just different models?

@simonkurtz-MSFT simonkurtz-MSFT marked this pull request as ready for review June 3, 2024 15:10
@simonkurtz-MSFT (Contributor, Author) left a comment

@pamelafox, LGTM

@pamelafox (Collaborator)

@simonkurtz-MSFT Those graphs were for two different OpenAI instances in the same region.

@pamelafox (Collaborator)

@simonkurtz-MSFT Could you send a separate PR adding a mention of this approach to https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/productionizing.md#openai-capacity with a link to this PR? You could contrast when someone might opt for this over ACA/APIM (presumably cost/complexity).

@simonkurtz-MSFT mentioned this pull request Jun 3, 2024
@simonkurtz-MSFT (Contributor, Author)

Hi @pamelafox, could I trouble you for another review of this PR, please? Thank you very much for all your help!

scope: openAiResourceGroup
params: {
name: '${abbrs.cognitiveServicesAccounts}${resourceToken}-b2'
location: openAiResourceGroupLocation
Collaborator (review comment on the snippet above)

@simonkurtz-MSFT Do your customers typically deploy backends in multiple regions or same region? @mattgotteiner is wondering if the location should be a second location.

@simonkurtz-MSFT (Contributor, Author) commented Jun 12, 2024

@pamelafox & @mattgotteiner, that's a very important question. My customers almost exclusively deploy to multiple regions, so being able to define a second region would be helpful. If it's not defined, we could fall back to setting the second region to the value of the first.
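The fallback I'm describing could be as simple as the following. This is a sketch in Python for clarity rather than Bicep, and the parameter names are hypothetical:

```python
def resolve_openai_locations(primary_location, secondary_location=None):
    """Return (primary, secondary) regions for the two AOAI backends.
    If no second region is defined, fall back to the primary region."""
    return primary_location, secondary_location or primary_location
```

For example, `resolve_openai_locations("eastus")` yields a same-region pair, while `resolve_openai_locations("eastus", "westus")` spreads the backends across two regions.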
