Description
Is your feature request related to a problem? Please describe
OpenSearch returns a 200 status code for an _msearch call even when some of the individual searches within the request fail.
This is apparently done because the client might otherwise incorrectly retry the entire call, so the responsibility for identifying partial failures lies with the client. It would be very useful to know whether such errors are observed across the nodes, because clients may not be fully aware of the partial failures.
Describe the solution you'd like
Along the lines of #4562 (comment), the proposal is to:
1. Add a stat within the search actions (at the coordinating node) that counts the number of errors of each type.
2. Have these actions regularly publish a counter for each error type (400/429/500, etc.) while handling search requests.
3. Expose the counters through an API (e.g. the node stats API) that clients can query for the number of item-level failures.
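A minimal sketch of the proposed stat, assuming a hypothetical SearchItemFailureStats class (not an existing OpenSearch API) that the coordinating node would update for each failed item in an _msearch response. Item statuses are modeled as a plain int[]; how the real search action would extract them, and how the snapshot would be serialized into the node stats API, are left open.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-node stat: cumulative count of _msearch item-level
 * failures, keyed by HTTP status code (400, 429, 500, ...).
 * Not an existing OpenSearch class; all names are illustrative.
 */
final class SearchItemFailureStats {
    // One monotonically increasing counter per status code.
    private final ConcurrentHashMap<Integer, LongAdder> failuresByStatus = new ConcurrentHashMap<>();

    /** Increments the counter for a single failed item. */
    void recordItemFailure(int httpStatus) {
        failuresByStatus.computeIfAbsent(httpStatus, s -> new LongAdder()).increment();
    }

    /** Records every non-2xx status among the item statuses of one _msearch response. */
    void recordResponse(int[] itemStatuses) {
        for (int status : itemStatuses) {
            if (status < 200 || status >= 300) {
                recordItemFailure(status);
            }
        }
    }

    /** Point-in-time copy of the cumulative counters, sorted by status code. */
    Map<Integer, Long> snapshot() {
        Map<Integer, Long> copy = new TreeMap<>();
        failuresByStatus.forEach((status, count) -> copy.put(status, count.sum()));
        return copy;
    }
}
```

For example, recording a response whose items returned 200, 429, 200, 500 and 429 yields a snapshot of {429=2, 500=1}. LongAdder is used because many search threads may increment the same counter concurrently, and it keeps contention low compared with a single AtomicLong.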
Because OpenSearch publishes its stats as cumulative counters, most clients already run a regular poller that compares the last value seen for a node with the current value, which lets them derive insights over any desired time period. The new counters would add value to all of those existing use cases.
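On the client side, the poller described above could turn two successive snapshots of a node's cumulative counters into per-interval counts. This is a sketch under the assumption that the stats are exposed as a map from status code to cumulative count; FailureCounterDeltas and its snapshot format are hypothetical, not an OpenSearch API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical client-side helper: converts two successive cumulative
 * snapshots (status code -> count) from the same node into per-interval deltas.
 */
final class FailureCounterDeltas {
    private FailureCounterDeltas() {}

    static Map<Integer, Long> delta(Map<Integer, Long> previous, Map<Integer, Long> current) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, Long> e : current.entrySet()) {
            long prev = previous.getOrDefault(e.getKey(), 0L);
            long now = e.getValue();
            // A smaller current value means the counter was reset (e.g. the node restarted);
            // treat the current value as the count accumulated since the reset.
            result.put(e.getKey(), now >= prev ? now - prev : now);
        }
        return result;
    }
}
```

With a previous snapshot of {429=10, 500=3} and a current one of {400=1, 429=14, 500=3}, the deltas are 1, 4 and 0 respectively. Treating a decrease as a reset is a common convention for cumulative counters; it can undercount if a node restarts and fails more than the old value before the next poll.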
Related component
No response
Describe alternatives you've considered
No response
Additional context
No response