SIMBA Improvements #8077


Open · wants to merge 18 commits into main

Conversation


@klopsahlong (Collaborator) commented Apr 17, 2025

This PR makes the following updates:

  • Support for teacher and prompt models. The teacher model is used to generate 1 of the N trajectories for each example, so that rule generation still targets the task model's failure modes.
  • Support for reasoning models. Because temperature must be kept at 1.0 for reasoning models, we create new trajectories by varying the seed instead. We also updated dspy.LM to handle the temperature / max_tokens defaults for reasoning models, which previously caused errors.
  • Don't append_demo if the demo score is below the 10th percentile. This helps us avoid adding poor demos.
  • Support for metric metadata. Additional metadata can now be passed back in a dspy.Prediction object alongside the score. The one downside is that users must know to name the score field 'score'; to address this for now, we've added an error message telling users to include a score field in their dspy.Prediction object (see the sketch after this list).
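For illustration, here is a minimal sketch of what a metric might look like under this change; the field names other than score (and the metric itself) are made up for the example, not taken from the PR.

import dspy

# Hypothetical metric illustrating the metadata support described above.
# The numeric score must live in a field literally named `score`; any
# other fields on the Prediction ride along as extra metadata.
def exact_match_with_feedback(example, prediction, trace=None):
    is_correct = prediction.answer.strip() == example.answer.strip()
    return dspy.Prediction(
        score=float(is_correct),
        feedback="matched" if is_correct else f"expected {example.answer!r}",
    )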

Handled here, but merged earlier in a separate commit:

  • Fixes the max recursion depth exceeded error. Previously this error showed up when we tried to save the optimized program, because one of the candidate_programs attached to best_program was best_program itself. This was fixed by making best_program a deep copy of the highest-scoring program in candidate_programs.


def parse_value(value, annotation):
annotation = _strip_optional(annotation)
klopsahlong (Collaborator Author):

This adds support for Optional fields (i.e. where a field could be either None or str), which is the case for KIE and was throwing errors before.
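As a rough sketch of what a helper like _strip_optional could do (the PR's actual implementation may differ):

from typing import Optional, Union, get_args, get_origin

# Unwrap Optional[X] / Union[X, None] to X so that parse_value can validate
# against the underlying type; other annotations pass through unchanged.
def _strip_optional(annotation):
    if get_origin(annotation) is Union:
        non_none = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return annotation

assert _strip_optional(Optional[str]) is str
assert _strip_optional(int) is int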

- temperature: float = 0.0,
- max_tokens: int = 1000,
+ temperature: Optional[float] = None,
+ max_tokens: Optional[int] = None,
klopsahlong (Collaborator Author):

Updating the defaults, which were throwing errors for reasoning models. Instead of always defaulting to a temperature of 0.0 and max_tokens of 1000 (which errors out immediately for o3-mini), we now set temperature and max_tokens based on whether the model is a reasoning model. If the user has intentionally set one of the values to something the reasoning model can't handle (e.g. temperature=0.7), we still throw an error.
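Roughly, the new behavior amounts to something like the following sketch (the reasoning-model token budget below is a placeholder, not the PR's exact value):

def resolve_generation_defaults(temperature, max_tokens, is_reasoning_model):
    # Reasoning models (e.g. o1 / o3-mini) reject temperature != 1.0, so the
    # defaults are chosen per model family when the caller leaves them unset.
    if temperature is None:
        temperature = 1.0 if is_reasoning_model else 0.0
    if max_tokens is None:
        max_tokens = 20_000 if is_reasoning_model else 1000  # placeholder budget
    if is_reasoning_model and temperature != 1.0:
        raise ValueError("Reasoning models require temperature=1.0.")
    return temperature, max_tokens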

@@ -41,6 +44,8 @@ def __init__(
self.num_candidates = num_candidates
self.max_steps = max_steps
self.max_demos = max_demos
klopsahlong (Collaborator Author):

Adding support for prompt / teacher models
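Hypothetically, the new options could be wired into the constructor along these lines (parameter names and default values here are assumptions, not necessarily the PR's exact API):

class SIMBASketch:
    def __init__(self, metric, prompt_model=None, teacher_settings=None,
                 num_candidates=6, max_steps=8, max_demos=4):
        self.metric = metric
        self.prompt_model = prompt_model              # model used to write rules / instructions
        self.teacher_settings = teacher_settings or {}  # LM used for 1 of the N trajectories
        self.num_candidates = num_candidates
        self.max_steps = max_steps
        self.max_demos = max_demos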

@@ -310,7 +316,7 @@ def register_new_program(prog: dspy.Module, score_list: list[float]):
trial_logs[idx_prog-1]["train_score"] = avg_score

best_idx = scores.index(max(scores)) if scores else 0
best_program = candidate_programs[best_idx]
klopsahlong (Collaborator Author):

Fixing the max recursion depth error.
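In sketch form, the fix described in the PR body amounts to deep-copying the winning candidate so the saved program no longer contains itself (a sketch, not the exact PR code):

from copy import deepcopy

def pick_best_program(candidate_programs, scores):
    best_idx = scores.index(max(scores)) if scores else 0
    # Deep-copying breaks the self-reference: the returned best_program no longer
    # appears inside its own candidate_programs list, so saving can't recurse forever.
    return deepcopy(candidate_programs[best_idx])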

# Check to see if our model is a reasoning model, which means temp must stay as 1.0
model_family = lm.model.split("/")[-1].lower() if "/" in lm.model else lm.model.lower()
model_pattern = re.match(r"^o([13])(?:-mini)?", model_family)

klopsahlong (Collaborator Author):

This function has been updated to:

  1. add support for teacher model (used for 1 of the N trajectories)
  2. add support for reasoning models by varying the seed (sketched below)
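A rough illustration of the seed-variation idea in point 2 (this assumes dspy.LM's copy accepts generation kwargs; the exact resampling code in the PR may differ):

def prepare_rollout_lms(lm, is_reasoning_model, num_trajectories, base_seed=0):
    # Reasoning models must keep temperature at 1.0, so each trajectory gets
    # its own seed instead of a different temperature.
    if is_reasoning_model:
        return [lm.copy(temperature=1.0, seed=base_seed + i) for i in range(num_trajectories)]
    # Non-reasoning models can diversify rollouts by sweeping temperature.
    return [lm.copy(temperature=0.7 + 0.1 * i) for i in range(num_trajectories)]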

@@ -28,30 +45,46 @@ def wrapped_program(example):
print(e)
trace = dspy.settings.trace.copy()

klopsahlong (Collaborator Author):

Updated to handle additional metric metadata in addition to the score. To do this, we check whether the output from the metric is a float or int (in which case we use it directly as the score) or a dspy.Prediction object (in which case we read the score from its score field and pass the remaining fields along as metadata).
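Schematically, the check could look like this (a sketch, not the PR's exact code):

import dspy

def unpack_metric_output(output):
    # Plain numbers are treated directly as the score, with no metadata.
    if isinstance(output, (int, float)):
        return float(output), {}
    # A dspy.Prediction must carry the score under `score`; everything else
    # is passed along as extra metric metadata.
    if isinstance(output, dspy.Prediction):
        if not hasattr(output, "score"):
            raise ValueError("The metric's dspy.Prediction must include a 'score' field.")
        extras = {k: v for k, v in output.items() if k != "score"}
        return float(output.score), extras
    raise TypeError(f"Unsupported metric output type: {type(output)}")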

@@ -116,12 +149,16 @@ def append_a_rule(bucket, system, **kwargs):
worse_program_outputs=dict(bad["prediction"] or {}),
worse_reward_value=bad["score"],
better_reward_value=good["score"],
klopsahlong (Collaborator Author):

Again, adding the additional metric metadata here.


- trace = bucket[0]["trace"]
+ good = bucket[0]
+ trace = good["trace"]
name2demo = {}

klopsahlong (Collaborator Author), Apr 17, 2025:

Double-checking that the demo we're appending is not below the 10th percentile of scores.
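An illustrative version of that guard (the exact population of scores used for the percentile may differ in the PR):

import numpy as np

def clears_tenth_percentile(demo_score, observed_scores):
    # Skip demos whose score falls below the 10th percentile of the scores
    # seen so far; low-scoring demos would otherwise get appended as examples.
    return demo_score >= np.percentile(observed_scores, 10)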

@klopsahlong marked this pull request as ready for review April 17, 2025 15:25