Generate text using the loaded model.
Request Body:
```json
{
  "prompt": "string",
  "model_id": "string | null",
  "stream": "boolean",
  "max_length": "integer | null",
  "temperature": "float",
  "top_p": "float"
}
```
Response:
```json
{
  "response": "string",
  "usage": {
    "prompt_tokens": "integer",
    "completion_tokens": "integer",
    "total_tokens": "integer"
  }
}
```
Error Responses:
- 400 Bad Request: Invalid parameters
- 413 Payload Too Large: Input too long
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Model error
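A minimal request sketch for this endpoint, assuming the server listens on `http://localhost:8000` and the route is `/generate` (both the base URL and the path are assumptions; check your deployment):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host/port

# "/generate" is an assumed route name; substitute the path your server exposes.
resp = requests.post(
    f"{BASE_URL}/generate",
    json={
        "prompt": "Explain top-p sampling in one sentence.",
        "model_id": None,   # null -> use the currently loaded model
        "stream": False,
        "max_length": 128,
        "temperature": 0.7,
        "top_p": 0.9,
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["response"])
print("total tokens:", data["usage"]["total_tokens"])
```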
Chat completion endpoint similar to OpenAI's API.
Request Body:
```json
{
  "messages": [
    {
      "role": "string",
      "content": "string"
    }
  ],
  "model_id": "string | null",
  "stream": "boolean",
  "max_length": "integer | null",
  "temperature": "float",
  "top_p": "float"
}
```
Response:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "string"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": "integer",
    "completion_tokens": "integer",
    "total_tokens": "integer"
  }
}
```
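A client sketch for this endpoint; the `/chat/completions` path is an assumption chosen to mirror OpenAI's naming, and the base URL is the same assumed localhost server as above:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host/port

# "/chat/completions" is an assumed route, chosen to mirror OpenAI's API.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What does the temperature parameter do?"},
        ],
        "model_id": None,
        "stream": False,
        "max_length": 256,
        "temperature": 0.7,
        "top_p": 0.9,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```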
Load a specific model.
Request Body:
```json
{
  "model_id": "string"
}
```
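A short sketch, again against the assumed localhost server; the `/models/load` path and the model id below are placeholders, not documented values:

```python
import requests

# "/models/load" and the model id are hypothetical placeholders.
resp = requests.post(
    "http://localhost:8000/models/load",
    json={"model_id": "example-model"},
    timeout=300,  # loading large weights can take minutes
)
resp.raise_for_status()
print(resp.status_code)  # the response body shape is not documented above
```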
Get information about the currently loaded model.
List all available models in the registry.
Get detailed system information.
Check the health status of the server.
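The four read-only endpoints above take no request body. The sketch below polls each one; every path is a guess based on the descriptions, not a documented route:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host/port

# All paths are assumptions derived from the endpoint descriptions; adjust as needed.
for path in ("/models/current", "/models", "/system/info", "/health"):
    resp = requests.get(f"{BASE_URL}{path}", timeout=10)
    print(path, resp.status_code, resp.json())
```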
All endpoints return appropriate HTTP status codes:
- 200: Success
- 400: Bad Request
- 404: Not Found
- 500: Internal Server Error
Error responses include a detail message:
```json
{
  "detail": "Error message describing what went wrong"
}
```
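A sketch of reading that detail field from a failed call, reusing the assumed `/generate` route from earlier:

```python
import requests

resp = requests.post(
    "http://localhost:8000/generate",            # assumed route, as above
    json={"prompt": "hi", "temperature": -1.0},  # deliberately invalid parameter
    timeout=60,
)
if not resp.ok:
    # Error bodies carry a human-readable "detail" field per the schema above.
    print(f"HTTP {resp.status_code}: {resp.json()['detail']}")
```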
Requests are rate limited:
- 60 requests per minute
- Burst size of 10 requests
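Since the limiter answers with 429 Too Many Requests once the limit is exceeded, a client can back off and retry. A minimal sketch (the helper name and backoff schedule are my own, not part of the API):

```python
import time
import requests

def post_with_backoff(url: str, payload: dict, retries: int = 5) -> requests.Response:
    """Retry POSTs that hit the rate limit (HTTP 429) with exponential backoff."""
    for attempt in range(retries):
        resp = requests.post(url, json=payload, timeout=60)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    return resp  # give up after the final attempt
```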
Made with ❤️ by Utkarsh Tiwari
GitHub: UtkarshTheDev | Twitter: @UtkarshTheDev | LinkedIn: utkarshthedev