Skip to content

Prefix Aware Scorer #48

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 106 commits into
base: dev
Choose a base branch
from
Open

Prefix Aware Scorer #48

wants to merge 106 commits into from

Conversation

oglok
Copy link

@oglok oglok commented Apr 23, 2025

This PR implements:

  • prefix store based in a radix tree storing a map of prefixes (key) and prefixEntry (value) that contains the target pod, last time used and model name.
  • prefix aware scorer
  • addition of new scorer to the scorer manager
  • unit tests for new store and scorer

NOTE: there is a runMaintenance function in the store that is not being used. Need to figure out where to call it from without blocking all requests.

mayabar and others added 30 commits April 10, 2025 14:50
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID
- Remove datastore from scoreTargets, add datastore to SessionAffinityScorer
- Rename ScoredPod to PodScore
…f ScoreMng

- If some specific scorer failed to score pods - just log the problem, skip it and continue to the next scorer
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](golang/net@v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
…es/golang.org/x/net-0.38.0

Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments

First iteration of development deployments & environments
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
…ilds

fix: basic container image builds for linux
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
empty top level kustomization.yaml - make CICD happy
elevran and others added 15 commits April 22, 2025 23:49
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
@oglok oglok changed the title WIP: Prefix Aware Scorer Prefix Aware Scorer Apr 23, 2025
@rootfs rootfs requested a review from Copilot April 23, 2025 12:55
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a new prefix store based on a radix tree and a prefix-aware scorer to route requests more efficiently. Key changes include:

  • Adding a PrefixStore and its configuration to scheduler configuration with environmental overrides.
  • Implementing a new PrefixAwareScorer and integrating it into the scheduler.
  • Providing unit tests for the new store and scorer components.

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/epp/scheduling/types/types.go Added a Prompt field to the LLMRequest structure.
pkg/epp/scheduling/scheduler.go Introduced prefix store configuration and integrated the new scorer.
pkg/epp/scheduling/prefix_store.go Implements the prefix store with entry addition, lookup, eviction, and maintenance.
pkg/epp/scheduling/prefix_store_test.go Unit tests for basic prefix operations, constraints, TTL expiration, max entries, and maintenance routine.
pkg/epp/scheduling/prefix_aware_scorer.go Defines the scorer which uses the prefix store to score pods.
pkg/epp/scheduling/prefix_aware_scorer_test.go Tests behavior of the new scorer with various input scenarios.
Files not reviewed (1)
  • go.mod: Language not supported

Comment on lines +118 to +120
// Update LastUsed time for the matched entry
entry.LastUsed = time.Now()
ps.tree.Insert(matchedPrefix, entry)
Copy link
Preview

Copilot AI Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying 'entry.LastUsed' under a read lock (RLock) in FindPodForPrefix may lead to a race condition. Consider acquiring the write lock (Lock) when updating mutable state or restructuring the update to ensure thread safety.

Suggested change
// Update LastUsed time for the matched entry
entry.LastUsed = time.Now()
ps.tree.Insert(matchedPrefix, entry)
// Update LastUsed time for the matched entry
ps.mu.RUnlock() // Release the read lock before acquiring the write lock
ps.mu.Lock()
entry.LastUsed = time.Now()
ps.tree.Insert(matchedPrefix, entry)
ps.mu.Unlock()

Copilot uses AI. Check for mistakes.

@@ -43,6 +48,10 @@ const (
defaultQueueThresholdCritical = 5
defaultQueueingThresholdLoRA = 128
defaultLoraAffinityThreshold = 0.999
defaultPrefixStoreMaxEntries = 1000
defaultPrefixStoreMinLen = 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably too small :D

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! testing defaults xD

defaultPrefixStoreMaxEntries = 1000
defaultPrefixStoreMinLen = 3
defaultPrefixStoreMaxLen = 100
defaultPrefixStoreTTLHours = 24
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minute instead or hour?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what should be a reasonable TTL for LLM inferencing.

@vMaroon
Copy link
Member

vMaroon commented Apr 23, 2025

The dev branch was rebased on upstream main, which includes the new and different Scorer interface. The Scorer pulled from there defines a Score function that works per-pod and not group of pods. We need to discuss whether to propose changing that but it makes sense to first adapt to it.

I'll open a KVCacheAwareScorer PR soon and we can sync.

Signed-off-by: Ricardo Noriega <rnoriega@redhat.com>
// between the request's prompt and stored prefixes. The score is normalized between 0 and 1,
// where 1 represents the longest matching prefix.
type PrefixAwareScorer struct {
weight float64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, weights are extracted to the layer above.

}

// NewPrefixAwareScorer creates a new PrefixAwareScorer with the given weight and prefix store
func NewPrefixAwareScorer(weight float64, prefixStore *PrefixStore) Scorer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for receiving (instead of creating internally) the prefixStore?
Seems like this would be an internal implementation decision?
If so, PrefixStore is likely not an exported type.

if !found {
logger.V(logging.DEBUG).Info("No matching prefix found, returning zero scores for all pods")
// If no matching prefix found, return zero scores for all pods
for i, pod := range pods {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving this up to the function initialization?
That way, you'd be assured that you start with 0s for all Pods and would not need to repeat it multiple times.


// PrefixEntry represents a single entry in the prefix store
type PrefixEntry struct {
PodRef types.NamespacedName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: is this how defined in PodMetrics.Pod or would a simple string suffice?

)

// PrefixEntry represents a single entry in the prefix store
type PrefixEntry struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: should the struct and its fields be exported?
On first glance seems that it should be internal use.

errutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/error"
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to have a high level description of data structure and algorithms, and tie those to the processing steps of the LLM request (e.g., consulted in read only or also updated on prompt request, are responses handed, ...)

if entry.PodRef == pod && entry.ModelName == modelName {
logger.V(logging.DEBUG).Info("Updating existing entry", "prefix", prefix, "pod", pod.String())
entry.LastUsed = time.Now()
ps.tree.Insert(prefix, entry)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this allow multiple hits on a prefix (e.g., both pods A and B hold the prefix) or just one (last writer wins)?
I think we would want to have multiple (e.g., in case other scorers score these Pods differently).


// Check total entries limit
if ps.tree.Len() >= ps.config.MaxEntries {
logger.V(logging.DEBUG).Info("Store at capacity, evicting oldest entry", "currentSize", ps.tree.Len(), "maxSize", ps.config.MaxEntries)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: would any entries "below" the evicted node become dangling following the eviction?

Comment on lines +102 to +110
if len(prefix) < ps.config.MinPrefixLen {
logger.V(logging.DEBUG).Info("Prefix too short", "prefix", prefix, "minLength", ps.config.MinPrefixLen)
return types.NamespacedName{}, false
}

if len(prefix) > ps.config.MaxPrefixLen {
logger.V(logging.DEBUG).Info("Truncating prefix", "originalLength", len(prefix), "maxLength", ps.config.MaxPrefixLen)
prefix = prefix[:ps.config.MaxPrefixLen]
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider extracting to an isValidPrompt function that can be called from multiple places

}

// Use LongestPrefix to find the best match
matchedPrefix, val, found := ps.tree.LongestPrefix(prefix)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation seems to score only the longest prefix and not any of the other pods?

@elevran
Copy link
Collaborator

elevran commented Apr 27, 2025

@oglok would be great if you can rebase off latest dev for easier review. Most of the 78 modified files are coming in from different PRs and "polluting" your PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there downsides to having this feature (including scorer, store and tests) in its own directory?
That would minimize the changes in the existing code base

}
}

// ScoreTargets does the actual scoring of the target pods by the session affinity.
Copy link
Collaborator

@elevran elevran Apr 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear where/when the store is updated with new prompt requests and replies.
Is this currently handled?

@oglok
Copy link
Author

oglok commented Apr 28, 2025

Hey @elevran ! thanks for reviewing the PR. I'm working on the rebase, and rework my code because apparently the scorers interface merged upstream has changed. I'll get your comments in while doing that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants