lexer error: too many states: 10000 >= 10000; stopping #1169

Open
slice-harshit opened this issue Mar 21, 2025 · 5 comments

@slice-harshit

Code:

import time
import itertools

import pandas as pd
from guidance import gen, select

# model, NEWLINE, allowed_category, and allowed_sub_category are defined elsewhere in the notebook.

BASE_PROMPT = """<s>[INST] <<SYS>>
You are a financial SMS analyzer specializing in Indian banking transactions. Your task is to carefully analyze each message.

<big prompt>

Now, you will analyze multiple SMS messages at once.
<</SYS>>
"""
def process(sms_texts):
    batch_results = []
    
    start_time = time.time()
    
    # Create batch prompt with all SMS using join instead of loop
    batch_prompt = BASE_PROMPT + "\n" + "\n".join(
        f"--- SMS #{i+1} ---\n{sms}" 
        for i, sms in enumerate(sms_texts)
    )

    # Finish the instruction part
    batch_prompt += "\n[/INST]\n\nAnalysis of each SMS message:\n"

    # Generate the guided selection/generation block for each SMS
    selection_blocks = []
    for i in range(len(sms_texts)):
        predicted_category = f"predicted_category_{i}"
        predicted_sub_category = f"predicted_sub_category_{i}"
        predicted_transaction_type = f"predicted_transaction_type_{i}"
        predicted_payment_method = f"predicted_payment_method_{i}"
        predicted_entity_name = f"predicted_entity_name_{i}"
        predicted_credit_account_info = f"predicted_credit_account_info_{i}"
        predicted_debit_account_info = f"predicted_debit_account_info_{i}"
        predicted_amount = f"predicted_amount_{i}"
        predicted_balance = f"predicted_balance_{i}"
        predicted_transaction_status = f"predicted_transaction_status_{i}"
        predicted_order_status = f"predicted_order_status_{i}"
        
        selection_blocks.append(f"""
--- SMS #{i+1} ---
Category: {select(allowed_category, name=predicted_category)}
Sub-Category: {select(allowed_sub_category, name=predicted_sub_category)}
Transaction Type: {select(["credit", "debit", "<unk>"], name=predicted_transaction_type)}
Payment Method: {select(["UPI", "Wallet","NACH", "NEFT", "RTGS", "IMPS", "Credit Card", "Debit Card", "Net Banking", "Auto-Debit", "ATM Withdrawal", "Cash Deposit", "Netbanking", "ECS", "<unk>"], name=predicted_payment_method)}
Entity Name: {gen(predicted_entity_name, stop=NEWLINE)}
Credit Account Info: {gen(predicted_credit_account_info, stop=[NEWLINE, ' ', '['])}
Debit Account Info: {gen(predicted_debit_account_info, stop=[NEWLINE, ' ', '['])}
Amount: {gen(predicted_amount, stop=[NEWLINE, ' ', '['])}
Balance: {gen(predicted_balance, stop=[NEWLINE, ' ', '['])}
Transaction Status: {select(["success", "failed", "pending", "<unk>"], name=predicted_transaction_status)}
Order Status: {select(["pending", "shipped", "delivered", "cancelled", "refund_pending", "refund_done", "refund_failed", "<unk>"], name=predicted_order_status)}
""")
    
    batch_prompt += "".join(selection_blocks) + "</s>"
    
    result = model + batch_prompt
    
    result_fields = [
        "category", "sub_category", "transaction_type", "payment_method", 
        "entity_name", "credit_account_info", "debit_account_info", 
        "amount", "balance", "transaction_status", "order_status"
    ]
    
    batch_results = [
        {
            "sms_text": sms,
            **{
                field: result[f"predicted_{field}_{i}"]
                for field in result_fields
            }
        }
        for i, sms in enumerate(sms_texts)
    ]
    
    print("inference_time: ", time.time() - start_time)
    return batch_results
def batch_iterator(iterable, batch_size):
    iterator = iter(iterable)
    while batch := list(itertools.islice(iterator, batch_size)):
        yield batch

def run_until_oom(sms_texts, batch_size=10):
    # interim_file_path and cleanup_gpu_memory are defined elsewhere in the notebook.
    results = []
    batch_count = 0
    for batch in batch_iterator(sms_texts, batch_size):
        try:
            batch_results = process(batch)
            results.extend(batch_results)

            # Save interim results after each batch
            pd.DataFrame(results).to_excel(interim_file_path, index=False)

            batch_count += 1
            if batch_count % 10 == 0:
                print(f"Successfully processed {batch_count} batches")
            if batch_count % 15 == 0:
                cleanup_gpu_memory()
        except (RuntimeError, MemoryError) as e:
            if "CUDA out of memory" in str(e) or "out of memory" in str(e).lower():
                print(f"GPU out of memory error after processing {batch_count} batches")
                # Save what we have so far
                pd.DataFrame(results).to_excel(interim_file_path, index=False)
                return batch_count
            else:
                # Save what we have so far before raising the error
                pd.DataFrame(results).to_excel(interim_file_path, index=False)
                raise
    return batch_count

With a batch_size of 10, I am getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 61
     57         print(f"Interim file deleted. Final results saved to {final_file_path}")
     59     return batch_count
---> 61 total_processed = run_until_oom(sms_texts)

Cell In[10], line 28, in run_until_oom(sms_texts, batch_size)
     26 for batch in batch_iterator(sms_texts_to_process, batch_size):
     27     try:
---> 28         batch_results = process(batch)
     29         results.extend(batch_results)
     31         # Save interim results after each batch

Cell In[8], line 193, in process(sms_texts)
    190 batch_prompt += "</s>"
    192 # Process the entire batch in a single model call
--> 193 result = model + batch_prompt
    195 for i in range(len(sms_texts)):
    196     batch_results.append({
    197         "sms_text": sms_texts[i],
    198         "category": result[f"predicted_category_{i}"],
   (...)
    208         "order_status": result[f"predicted_order_status_{i}"],
    209     })

File ~/devakar_pradhan/query_image/lang_sam.venv/lib/python3.11/site-packages/guidance/models/_model.py:1198, in Model.__add__(self, value)
   1195                 partial_grammar += string(part)
   1196             is_id = not is_id
-> 1198         out = lm + partial_grammar
   1200 # if we find a null value we do nothing
   1201 elif isinstance(value, Null):

File ~/devakar_pradhan/query_image/lang_sam.venv/lib/python3.11/site-packages/guidance/models/_model.py:1207, in Model.__add__(self, value)
   1205 elif isinstance(value, GrammarFunction):
   1206     lm._update_trace_node(lm._id, lm._parent_id, StatelessGuidanceInput(value=value))
-> 1207     out = lm._run_stateless(value)
   1209 # run stateful functions
   1210 else:
   1211     lm._update_trace_node(lm._id, lm._parent_id, StatefulGuidanceInput(value=value))

File ~/devakar_pradhan/query_image/lang_sam.venv/lib/python3.11/site-packages/guidance/models/_model.py:1413, in Model._run_stateless(self, stateless_function, temperature, top_p, n)
   1410 delayed_bytes = b""
   1411 # last_is_generated = False
-> 1413 for chunk in gen_obj:
   1414 
   1415     # we make everything full probability if we are not computing uncertainty
   1416     # if not self.engine.compute_log_probs:
   1417     #     chunk.new_bytes_prob = 1.0
   1418 
   1419     # convert the bytes to a string (delaying if we don't yet have a valid unicode string)
   1420     lm.token_count += chunk.new_token_count
   1421     chunk.new_bytes = delayed_bytes + chunk.new_bytes

File ~/devakar_pradhan/query_image/lang_sam.venv/lib/python3.11/site-packages/guidance/models/_model.py:431, in Engine.__call__(self, prompt, grammar, ensure_bos_token, echo)
    428 while not parser.done():
    429     t0 = time.time()
--> 431     tokens, mask_fut, backtrack = parser.advance(engine_output)
    433     # Note that has_pending_stop implies that the response is a stop response,
    434     # but the converse is not true. We can therefore avoid some (but not all)
    435     # unnecessary calls to get_logits on the final iteration.
    436     has_pending_stop = parser.has_pending_stop()

File ~/devakar_pradhan/query_image/lang_sam.venv/lib/python3.11/site-packages/guidance/_parser.py:78, in TokenParser.advance(self, engine_output)
     75 if self.done():
     76     raise TokenParserException("Cannot advance on a done parser")
---> 78 return self._generator.send(engine_output)

File ~/devakar_pradhan/query_image/lang_sam.venv/lib/python3.11/site-packages/guidance/_parser.py:153, in TokenParser._parse(self, prompt, ensure_bos_token)
    144 if not mask[engine_output.issued_token.token_id]:
    145     # Note: we could punt this probem to ll_interpreter.post_process,
    146     # but it's a bit clearer to handle it here
    147     raise InvalidTokenException(
    148         token=engine_output.issued_token.token_id,
    149         valid_tokens=[i for i in range(len(mask)) if mask[i]],
    150         prompt_tokens=tokens
    151     )            
--> 153 backtrack, ff_tokens = self.ll_interpreter.commit_token(
    154     engine_output.issued_token.token_id
    155 )
    156 if backtrack:
    157     tokens = tokens[:-backtrack]

ValueError: lexer error: too many states: 10000 >= 10000

The error "lexer error: too many states: 10000 >= 10000; stopping" occurs because the guidance library's parser has a maximum limit of 10,000 states, and your structured prompt with multiple select and gen calls for each SMS in the batch exceeds this limit.

It is important for me to run large batches of SMS messages, though. If someone has a way to tweak the limit so that I can run larger batches, that would be helpful.

Thank you!

@Harsha-Nori
Member

Thanks for reporting this @slice-harshit! I assume it's OK to slow down inference slightly if you really need this to work?

@hudson-ai and @mmoskal, I wonder if we should expose a new parameter for "fuel"-like settings at the Python level that users have to opt into.
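
Purely as an illustration of what such an opt-in knob could look like (the parameter name and placement below are hypothetical, not an existing guidance API):

# Hypothetical sketch only: "lexer_state_limit" is not a real guidance
# parameter today; it just illustrates the kind of opt-in setting meant here.
from guidance import models

lm = models.Transformers("org/model-id", lexer_state_limit=250_000)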

@mmoskal
Member

mmoskal commented Mar 26, 2025

The lexer state limit was raised to 50k in llguidance v0.6.0 and 250k in v0.6.28. @slice-harshit which version of guidance are you using?
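
(A quick way to check which versions are installed, using only the standard library:)

# Print the installed guidance / llguidance versions.
from importlib.metadata import version

print("guidance:", version("guidance"))
print("llguidance:", version("llguidance"))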

@Harsha-Nori I'm sure we can expose some knobs but I would rather have the defaults work!

@slice-harshit
Author

slice-harshit commented Mar 26, 2025

I was using guidance==0.2.0 and llguidance==0.5.1. I have now updated to guidance==0.2.1 and llguidance==0.6.31.

@Harsha-Nori
Member

Harsha-Nori commented Mar 26, 2025 via email

@slice-harshit
Author

@Harsha-Nori, I updated guidance; with the same batch_size, I am no longer getting this error.

@Harsha-Nori @mmoskal @VincentToups @hudson-ai
Also, this is a little off-topic, but my initial goal with batch processing was to reduce inference time. With this technique, I reduced it from 3 seconds to 1.8 seconds, which is good but still significant for my task. Are there other techniques that reduce inference time further without compromising accuracy and that are also compatible with guidance?
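
(Sketch only, for reference: one bound already available in the code above is max_tokens on the free-text gen() fields, which keeps each generation short; whether it helps without hurting accuracy depends on how long those fields actually run. The values below are illustrative.)

# Sketch: inside the selection-block f-string in process(), the open-ended
# fields could be bounded with max_tokens (values are illustrative):
Entity Name: {gen(predicted_entity_name, stop=NEWLINE, max_tokens=20)}
Amount: {gen(predicted_amount, stop=[NEWLINE, ' ', '['], max_tokens=12)}
Balance: {gen(predicted_balance, stop=[NEWLINE, ' ', '['], max_tokens=12)}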

Sorry for going off-topic, but any suggestions would be helpful.
Thanks!
