[E&A] Drafts initial conceptual docs for EIS #733

`explore-analyze/elastic-inference/eis.md` (97 additions, 2 deletions)
---
applies_to:
navigation_title: Elastic Inference Service (EIS)
---

# Elastic {{infer-cap}} Service [elastic-inference-service-eis]

The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster.
With EIS, you don't need to add, configure, and scale {{ml}} nodes to provide the infrastructure and resources that {{ml}} {{infer}} requires.
Instead, you can use {{ml}} models for ingest, search, and chat independently of your {{es}} infrastructure.

% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) %

## AI features powered by EIS [ai-features-powered-by-eis]

Your Elastic deployment or project comes with a default `Elastic LLM` connector. This connector is used in the AI Assistant, Attack Discovery, Automatic Import, and the Search Playground.

% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) %

## Available task types

EIS offers the following {{infer}} task types:

* `chat_completion`

## Default EIS endpoints [default-eis-inference-endpoints]

Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API:

* `.rainbow-sprinkles-elastic`

::::{note}

* This endpoint is used by the `Elastic LLM` AI connector, which in turn powers the AI Assistant, Attack Discovery, Automatic Import, and the Search Playground.

::::
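
To confirm that the default endpoint is available in your deployment, you can retrieve it with the get {{infer}} API. A minimal sketch (this check is optional; the endpoint comes preconfigured):

```json
GET /_inference/chat_completion/.rainbow-sprinkles-elastic
```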

% TO DO: Link to the AI assistant documentation in the different solutions and possibly connector docs. %

### Examples

The following example demonstrates how to perform a `chat_completion` task through EIS by using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint.

```json
POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream
{
    "messages": [
        {
            "role": "user",
            "content": "Say yes if it works."
        }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 300
}
```

The request returns the following response as a stream:

```json
event: message
data: {
    "id" : "unified-45ecde2b-6293-4fd6-a195-4252de76ee63",
    "choices" : [
        {
            "delta" : {
                "role" : "assistant"
            },
            "index" : 0
        }
    ],
    "model" : "rainbow-sprinkles",
    "object" : "chat.completion.chunk"
}

event: message
data: {
    "id" : "unified-45ecde2b-6293-4fd6-a195-4252de76ee63",
    "choices" : [
        {
            "delta" : {
                "content" : "Yes"
            },
            "index" : 0
        }
    ],
    "model" : "rainbow-sprinkles",
    "object" : "chat.completion.chunk"
}
```

## Regions [eis-regions]

All EIS requests are handled by one of these AWS regions:

* `us-east-1`
* `us-west-2`

However, projects and deployments can use the Elastic LLM regardless of their cloud provider or region; the request routing does not restrict where your deployments are located.

For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).

Refer to the [{{infer-cap}} APIs](https://www.elastic.co/docs/api/doc/elasticsearch) for further information.

Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` service.

## {{api-request-title}} [infer-service-elastic-api-request]

`PUT /_inference/<task_type>/<inference_id>`

## {{api-path-parms-title}} [infer-service-elastic-api-path-params]

`<inference_id>`
: (Required, string) The unique identifier of the {{infer}} endpoint.

`<task_type>`
: (Required, string) The type of the {{infer}} task that the model will perform.

    Available task types:

    * `chat_completion`,
    * `sparse_embedding`.

::::{note}
The `chat_completion` task type only supports streaming and only through the `_stream` API.

For more information on how to use the `chat_completion` task type, refer to the [chat completion documentation](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-stream-inference).

::::
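
For example, a `chat_completion` endpoint created with this service is invoked through the `_stream` API. A minimal sketch, assuming the `chat-completion-endpoint` endpoint created in the example at the end of this page:

```json
POST /_inference/chat_completion/chat-completion-endpoint/_stream
{
    "messages": [
        {
            "role": "user",
            "content": "Say yes if it works."
        }
    ]
}
```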

## {{api-request-body-title}} [infer-service-elastic-api-request-body]

`max_chunk_size`
: (Optional, integer) The maximum size of a chunk in words.

`service_settings`
: (Required, object) Settings used to install the {{infer}} model.

`model_id`
: (Required, string) The name of the model to use for the {{infer}} task.

## Elastic {{infer-cap}} Service example [inference-example-elastic]

The following example shows how to create an {{infer}} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.
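
A minimal sketch of such a request; the `model_id` value is illustrative, so substitute the name of the EIS model you want to use:

```json
PUT /_inference/sparse_embedding/elser-model-eis
{
    "service": "elastic",
    "service_settings": {
        "model_id": "elser_model_2"
    }
}
```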

The following example shows how to create an {{infer}} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type. The `model_id` value follows the `rainbow-sprinkles` model shown in the EIS examples earlier on this page:

```json
PUT /_inference/chat_completion/chat-completion-endpoint
{
    "service": "elastic",
    "service_settings": {
        "model_id": "rainbow-sprinkles"
    }
}
```
