This project implements a LangGraph search agent powered by the Ministral 8B language model.
- AI Agent: Utilizes LangGraph and Ministral 8B to build an AI agent that can search the web.
- API: Provides a REST API for easy integration.
- Flexible Invocation: Supports both synchronous and asynchronous (queue-based) interactions.
- Deployment Options: Run locally or deploy to BentoCloud for scalability.
This project is a reference implementation designed to be hackable. Clone the source code and use it as a playground to build your own agent APIs:
git clone https://github.com/bentoml/BentoLangGraph.git
cd BentoLangGraph/langgraph-mistral
Install dependencies:
pip install -r requirements.txt
Set your Hugging Face API key, which is needed to download the model:
export HF_TOKEN=<your-api-key>
Spin up the REST API server:
bentoml serve .
Invoke with the Python API client auto-generated by BentoML:
import bentoml

# Connect to the local server and call the `invoke` endpoint; the keyword
# argument matches the `query` field in the JSON payload used below.
client = bentoml.SyncHTTPClient("http://localhost:3000")
response = client.invoke(query="What is the weather in San Francisco today?")
print(response)
Invoke with cURL:
curl -X POST http://localhost:3000/invoke \
-H 'Content-Type: application/json' \
-d '{"query": "what is the weather in San Francisco today?"}'
Example Output:
The weather in San Francisco today is mostly cloudy with a low around 57 degrees and a high of 69 degrees. There is a chance of rain later in the day.
The API also supports asynchronous, queue-based execution. Submit a task to the queue:
$ curl -X POST http://localhost:3000/invoke/submit \
-H 'Content-Type: application/json' \
-d '{"query": "what is the weather in San Francisco today?"}'
{"task_id":"b1fe7960470740ac9be58dcf740ee587","status":"in_progress"}
Check the status of the task, using the task_id returned above:
$ curl -s http://localhost:3000/invoke/status?task_id=$TASK_ID
{"task_id":"40451e21a6834c279d78433c5e1a4083","status":"success",
"created_at":"2024-09-23T05:09:36","executed_at":"2024-09-23T05:09:36"}
Get the result of the task once it has completed:
$ curl -s http://localhost:3000/invoke/get?task_id=$TASK_ID
{"task_id":"40451e21a6834c279d78433c5e1a4083","status":"success",
"created_at":"2024-09-23T05:09:36","executed_at":"2024-09-23T05:09:36"}
Start a development server that auto-reloads when the code changes:
bentoml serve . --reload
Inspect all events streamed from the agent via the /stream endpoint:
curl -X POST http://localhost:3000/stream \
-H 'Content-Type: application/json' \
-d '{"query": "what is the weather in San Francisco today?"}'
{'event': 'on_chain_start', 'data': ...}
{'event': 'on_chain_end', 'data': ...}
{'event': 'on_tool_start', 'data': ...}
{'event': 'on_tool_end', 'data': ...}
...
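The event stream can also be consumed from Python. Here is a minimal sketch, again assuming the requests package is installed; it posts the same payload to the /stream endpoint and prints each event line as it arrives.

# Minimal sketch of consuming the /stream endpoint, assuming the `requests`
# package is installed and the server is running on http://localhost:3000.
import requests

with requests.post(
    "http://localhost:3000/stream",
    json={"query": "what is the weather in San Francisco today?"},
    stream=True,
) as response:
    # Each non-empty line corresponds to one event emitted by the agent graph.
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))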
Log in to BentoCloud:
pip install bentoml
bentoml cloud login
Create a secret with your Hugging Face token:
bentoml secret create huggingface HF_TOKEN=$HF_TOKEN
Deploy to BentoCloud:
bentoml deploy . --name search-agent --secret huggingface
Invoke the deployed endpoint:
DEPLOYED_ENDPOINT=$(bentoml deployment get search-agent -o json | jq -r ".endpoint_urls[0]")
python client.py --query "What's the weather in San Francisco today?" --url $DEPLOYED_ENDPOINT
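The deployed endpoint can also be called with the same BentoML Python client shown earlier. A minimal sketch, assuming DEPLOYED_ENDPOINT from the command above has been exported as an environment variable:

# Minimal sketch: point the BentoML client at the BentoCloud deployment.
# Assumes DEPLOYED_ENDPOINT has been exported, e.g. `export DEPLOYED_ENDPOINT=...`.
import os

import bentoml

client = bentoml.SyncHTTPClient(os.environ["DEPLOYED_ENDPOINT"])
response = client.invoke(query="What's the weather in San Francisco today?")
print(response)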