[Feature Request] Track non-successful Search API calls across coordinator nodes #18377

@kkhatua

Is your feature request related to a problem? Please describe

OpenSearch returns a 200 status code for an _msearch call even when some of the individual searches within the request fail.
This is apparently done because a client might otherwise incorrectly retry the entire call, so the responsibility for identifying partial failures lies with the client. It would be very useful to know when such errors are observed across the nodes, because clients may not be fully aware of the partial failures.
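To illustrate what the client currently has to do on its own, here is a minimal sketch that tallies item-level failures in an _msearch response body. It assumes each entry under `responses` carries its own `status` field and that failed entries carry an `error` object; the function name `count_item_failures` and the sample payload are hypothetical.

```python
from collections import Counter


def count_item_failures(msearch_body: dict) -> Counter:
    """Tally per-item status codes >= 400 in an _msearch response body."""
    failures = Counter()
    for item in msearch_body.get("responses", []):
        # Assumption: an item with an error but no status is treated as a 500.
        status = item.get("status", 500 if "error" in item else 200)
        if status >= 400:
            failures[status] += 1
    return failures


# Hypothetical body returned with an overall HTTP 200.
sample = {
    "responses": [
        {"status": 200, "hits": {"hits": []}},
        {"status": 400, "error": {"type": "parsing_exception"}},
        {"status": 429, "error": {"type": "rejected_execution_exception"}},
        {"status": 400, "error": {"type": "parsing_exception"}},
    ]
}
print(dict(count_item_failures(sample)))
```

A client that only checks the top-level HTTP status would miss all three failed items in this sample.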

Describe the solution you'd like

Along the lines of #4562 (comment), the proposal is to:
1. Add a stat within the search actions (at the coordinating node) that counts the number of errors of each type.
2. Have these actions publish a counter regularly for each error type (400/429/500, etc.) while handling search requests.
3. Expose the counter through an API (e.g. the node stats API) that clients can query for the number of item-level failures.
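The proposed counter could be sketched roughly as follows. This is an illustration in Python rather than OpenSearch's actual Java code, and the class name `SearchFailureStats` and its methods are hypothetical; it only shows a cumulative, thread-safe tally keyed by status code that a stats API could read.

```python
import threading
from collections import Counter


class SearchFailureStats:
    """Hypothetical cumulative per-status failure counter for one node."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, status: int) -> None:
        """Count a search item outcome; only statuses >= 400 are tracked."""
        if status >= 400:
            with self._lock:
                self._counts[status] += 1

    def snapshot(self) -> dict:
        """Return a copy of the cumulative counts, e.g. for a stats response."""
        with self._lock:
            return dict(self._counts)


stats = SearchFailureStats()
for status in (200, 400, 400, 429):
    stats.record(status)
print(stats.snapshot())
```

Keeping the values cumulative (never reset while the node is up) matches how the other node stats are described in this request.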
Because OpenSearch publishes all of its stats as cumulative values, most clients already run a regular poller that compares the last value seen for a node with the current value to derive insights over a desired time period. The new counter would add value to all of those use cases.
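A poller of that kind might compute per-interval failures as sketched below. The function name `failure_delta` and the shape of the snapshots (a mapping from status code to cumulative count) are assumptions, since this request does not define the stats format.

```python
def failure_delta(previous: dict, current: dict) -> dict:
    """Failures per status code between two cumulative snapshots of one node."""
    delta = {}
    for status, total in current.items():
        last = previous.get(status, 0)
        # Assumption: a value lower than the last one means the node
        # restarted and the counter started again from zero.
        delta[status] = total - last if total >= last else total
    return delta


# Two hypothetical polls of the same node.
print(failure_delta({400: 5}, {400: 8, 429: 2}))
```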

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response

Labels

Search, Search:Query Insights, enhancement