-
Notifications
You must be signed in to change notification settings - Fork 7
Prefix Aware Scorer #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID - Remove datastore from scoreTargets, add datastore to SessionAffinityScorer - Rename ScoredPod to PodScore
…f ScoreMng - If some specific scorer failed to score pods - just log the problem, skip it and continue to the next scorer
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0. - [Commits](golang/net@v0.37.0...v0.38.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-version: 0.38.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
…es/golang.org/x/net-0.38.0 Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments First iteration of development deployments & environments
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
…ilds fix: basic container image builds for linux
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
empty top level kustomization.yaml - make CICD happy
Fix kustomize envs
provide more defaults in Makefile
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a new prefix store based on a radix tree and a prefix-aware scorer to route requests more efficiently. Key changes include:
- Adding a PrefixStore and its configuration to scheduler configuration with environmental overrides.
- Implementing a new PrefixAwareScorer and integrating it into the scheduler.
- Providing unit tests for the new store and scorer components.
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
pkg/epp/scheduling/types/types.go | Added a Prompt field to the LLMRequest structure. |
pkg/epp/scheduling/scheduler.go | Introduced prefix store configuration and integrated the new scorer. |
pkg/epp/scheduling/prefix_store.go | Implements the prefix store with entry addition, lookup, eviction, and maintenance. |
pkg/epp/scheduling/prefix_store_test.go | Unit tests for basic prefix operations, constraints, TTL expiration, max entries, and maintenance routine. |
pkg/epp/scheduling/prefix_aware_scorer.go | Defines the scorer which uses the prefix store to score pods. |
pkg/epp/scheduling/prefix_aware_scorer_test.go | Tests behavior of the new scorer with various input scenarios. |
Files not reviewed (1)
- go.mod: Language not supported
// Update LastUsed time for the matched entry | ||
entry.LastUsed = time.Now() | ||
ps.tree.Insert(matchedPrefix, entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying 'entry.LastUsed' under a read lock (RLock) in FindPodForPrefix may lead to a race condition. Consider acquiring the write lock (Lock) when updating mutable state or restructuring the update to ensure thread safety.
// Update LastUsed time for the matched entry | |
entry.LastUsed = time.Now() | |
ps.tree.Insert(matchedPrefix, entry) | |
// Update LastUsed time for the matched entry | |
ps.mu.RUnlock() // Release the read lock before acquiring the write lock | |
ps.mu.Lock() | |
entry.LastUsed = time.Now() | |
ps.tree.Insert(matchedPrefix, entry) | |
ps.mu.Unlock() |
Copilot uses AI. Check for mistakes.
@@ -43,6 +48,10 @@ const ( | |||
defaultQueueThresholdCritical = 5 | |||
defaultQueueingThresholdLoRA = 128 | |||
defaultLoraAffinityThreshold = 0.999 | |||
defaultPrefixStoreMaxEntries = 1000 | |||
defaultPrefixStoreMinLen = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably too small :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! testing defaults xD
defaultPrefixStoreMaxEntries = 1000 | ||
defaultPrefixStoreMinLen = 3 | ||
defaultPrefixStoreMaxLen = 100 | ||
defaultPrefixStoreTTLHours = 24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minute instead or hour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure what should be a reasonable TTL for LLM inferencing.
The I'll open a KVCacheAwareScorer PR soon and we can sync. |
Signed-off-by: Ricardo Noriega <rnoriega@redhat.com>
// between the request's prompt and stored prefixes. The score is normalized between 0 and 1, | ||
// where 1 represents the longest matching prefix. | ||
type PrefixAwareScorer struct { | ||
weight float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, weights are extracted to the layer above.
} | ||
|
||
// NewPrefixAwareScorer creates a new PrefixAwareScorer with the given weight and prefix store | ||
func NewPrefixAwareScorer(weight float64, prefixStore *PrefixStore) Scorer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for receiving (instead of creating internally) the prefixStore
?
Seems like this would be an internal implementation decision?
If so, PrefixStore
is likely not an exported type.
if !found { | ||
logger.V(logging.DEBUG).Info("No matching prefix found, returning zero scores for all pods") | ||
// If no matching prefix found, return zero scores for all pods | ||
for i, pod := range pods { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider moving this up to the function initialization?
That way, you'd be assured that you start with 0
s for all Pods and would not need to repeat it multiple times.
|
||
// PrefixEntry represents a single entry in the prefix store | ||
type PrefixEntry struct { | ||
PodRef types.NamespacedName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: is this how defined in PodMetrics.Pod or would a simple string suffice?
) | ||
|
||
// PrefixEntry represents a single entry in the prefix store | ||
type PrefixEntry struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: should the struct and its fields be exported?
On first glance seems that it should be internal use.
errutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/error" | ||
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to have a high level description of data structure and algorithms, and tie those to the processing steps of the LLM request (e.g., consulted in read only or also updated on prompt request, are responses handed, ...)
if entry.PodRef == pod && entry.ModelName == modelName { | ||
logger.V(logging.DEBUG).Info("Updating existing entry", "prefix", prefix, "pod", pod.String()) | ||
entry.LastUsed = time.Now() | ||
ps.tree.Insert(prefix, entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this allow multiple hits on a prefix (e.g., both pods A and B hold the prefix) or just one (last writer wins)?
I think we would want to have multiple (e.g., in case other scorers score these Pods differently).
|
||
// Check total entries limit | ||
if ps.tree.Len() >= ps.config.MaxEntries { | ||
logger.V(logging.DEBUG).Info("Store at capacity, evicting oldest entry", "currentSize", ps.tree.Len(), "maxSize", ps.config.MaxEntries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: would any entries "below" the evicted node become dangling following the eviction?
if len(prefix) < ps.config.MinPrefixLen { | ||
logger.V(logging.DEBUG).Info("Prefix too short", "prefix", prefix, "minLength", ps.config.MinPrefixLen) | ||
return types.NamespacedName{}, false | ||
} | ||
|
||
if len(prefix) > ps.config.MaxPrefixLen { | ||
logger.V(logging.DEBUG).Info("Truncating prefix", "originalLength", len(prefix), "maxLength", ps.config.MaxPrefixLen) | ||
prefix = prefix[:ps.config.MaxPrefixLen] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider extracting to an isValidPrompt
function that can be called from multiple places
} | ||
|
||
// Use LongestPrefix to find the best match | ||
matchedPrefix, val, found := ps.tree.LongestPrefix(prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation seems to score only the longest prefix and not any of the other pods?
@oglok would be great if you can rebase off latest dev for easier review. Most of the 78 modified files are coming in from different PRs and "polluting" your PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there downsides to having this feature (including scorer, store and tests) in its own directory?
That would minimize the changes in the existing code base
} | ||
} | ||
|
||
// ScoreTargets does the actual scoring of the target pods by the session affinity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear where/when the store is updated with new prompt requests and replies.
Is this currently handled?
Hey @elevran ! thanks for reviewing the PR. I'm working on the rebase, and rework my code because apparently the scorers interface merged upstream has changed. I'll get your comments in while doing that! |
This PR implements:
NOTE: there is a runMaintenance function in the store that is not being used. Need to figure out where to call it from without blocking all requests.