Commit 2cb5ad7

loki-gh-app[bot], jeschkies, JStickler, and trevorwhitney authored
feat: LID for query splitting (backport release-3.4.x) (#17378)
Signed-off-by: Karsten Jeschkies <k@jeschkies.xyz> Co-authored-by: Karsten Jeschkies <karsten.jeschkies@grafana.com> Co-authored-by: J Stickler <julie.stickler@grafana.com> Co-authored-by: Trevor Whitney <trevorjwhitney@gmail.com>
1 parent 62fe983 commit 2cb5ad7

1 file changed

+117
-0
lines changed

@@ -0,0 +1,117 @@
1+
---
2+
title: "0006: Expose Split Logic in API"
3+
description: "0006: Expose Split Logic in API"
4+
---
5+
6+
# 0006: Expose Split Logic in API
7+
8+
**Author:** Karsten Jeschkies (karsten.jeschkies@grafana.com)
9+
10+
**Date:** 03/2025
11+
12+
**Sponsor(s):** @trevorwhitney
13+
14+
**Type:** API
15+
16+
**Status:** Review
17+
18+
**Related issues/PRs:** N/A
19+
20+
**Thread from [mailing list](https://groups.google.com/forum/#!forum/lokiproject):** N/A
21+
22+
---
23+
24+
## Background
25+
26+
Loki has internal logic to split and shard log and metric queries by time into multiple queries. However, this logic is not
27+
accessible outside of the code base. This proposal intends to create an API for clients to split queries by exposing the
28+
internal split logic.
29+
30+
A split query is divided by time. The results of a split query can be concatenated in order to form the final
31+
result.
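
To illustrate what "divided by time" means, here is a minimal sketch that cuts a time range into contiguous intervals and shows that concatenating the per-interval results reproduces the full result. It is not Loki's internal split logic: the function name `split_by_time`, the even-width strategy, and the half-open `[start, end)` interval convention are assumptions made for this example.

```python
def split_by_time(start_ns: int, end_ns: int, splits: int) -> list[tuple[int, int]]:
    """Divide [start_ns, end_ns) into at most `splits` contiguous intervals."""
    if splits < 1 or end_ns <= start_ns:
        raise ValueError("need splits >= 1 and end_ns > start_ns")
    total = end_ns - start_ns
    splits = min(splits, total)  # never produce an empty interval
    step, remainder = divmod(total, splits)
    intervals = []
    cursor = start_ns
    for i in range(splits):
        # Spread the remainder over the first intervals so widths differ by at most 1.
        width = step + (1 if i < remainder else 0)
        intervals.append((cursor, cursor + width))
        cursor += width
    return intervals


# Hypothetical log entries as (timestamp_ns, line) pairs, sorted by time.
entries = [(1, "a"), (5, "b"), (9, "c")]
intervals = split_by_time(0, 10, 3)
parts = [[e for e in entries if lo <= e[0] < hi] for lo, hi in intervals]
joined = [e for part in parts for e in part]
```

Because the intervals do not overlap and together cover the whole range, `joined` contains every entry exactly once and in the original order.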
32+
33+
A sharded query is divided by label values. The results of a sharded query cannot always be concatenated but require some
34+
extra logic to form the final result. Some queries, such as `topk`, cannot be sharded at all.
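
As a generic illustration of that "extra logic" (not a description of Loki's implementation), consider an average computed over two shards. Averaging the per-shard averages gives the wrong answer when shards have different sizes; each shard has to report a sum and a count, which are then merged. The shard contents below are made-up values.

```python
# Hypothetical sample values held by two shards of the same query.
shard_a = [1.0, 2.0, 3.0]
shard_b = [10.0]


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Naive merge: average of the per-shard averages (weights shards equally).
naive = mean([mean(shard_a), mean(shard_b)])

# Correct merge: combine per-shard sums and counts, then divide once.
merged = (sum(shard_a) + sum(shard_b)) / (len(shard_a) + len(shard_b))

# Reference: the average over all samples without sharding.
unsharded = mean(shard_a + shard_b)
```

Here `naive` is 6.0 while `merged` and `unsharded` are both 4.0, so simple concatenation or re-aggregation of shard results is not enough for this kind of query.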
35+
36+
## Problem Statement
37+
38+
Loki clients such as the Grafana Loki datasource or the [Trino Loki
39+
connector](https://github.com/trinodb/trino/tree/master/plugin/trino-loki) benefit from splitting LogQL queries into multiple sub-queries either to process
40+
smaller chunks or to distribute work on query results.
41+
42+
Splitting a query requires parsing the LogQL query first, but there are no parsers for languages other than Go and
43+
JavaScript.
44+
45+
## Goals
46+
47+
The intended goal is to enable any client to split a query into multiple sub-queries that can be either executed
48+
sequentially or in parallel. The joined result of the sub-queries must be the same as the result of executing the original query.
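
The goal can be sketched as a small client-side check: running the sub-queries sequentially or in parallel and joining the parts in order should give the same result as the unsplit query. Everything here is hypothetical: `LOGS` stands in for Loki, `run_subquery` stands in for a call to the query endpoint, and the sub-queries are plain dictionaries with `start` and `end` keys.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "log store": (timestamp_ns, line) pairs sorted by time.
LOGS = [(1, "a"), (4, "b"), (7, "c"), (9, "d")]


def run_subquery(subquery: dict) -> list[tuple[int, str]]:
    """Stand-in for executing one sub-query; selects entries in [start, end)."""
    lo, hi = subquery["start"], subquery["end"]
    return [entry for entry in LOGS if lo <= entry[0] < hi]


def run_all(subqueries: list[dict], parallel: bool) -> list[tuple[int, str]]:
    """Execute all sub-queries and join the parts in sub-query order."""
    if parallel:
        with ThreadPoolExecutor(max_workers=4) as pool:
            # pool.map yields results in input order, regardless of completion order.
            parts = list(pool.map(run_subquery, subqueries))
    else:
        parts = [run_subquery(sq) for sq in subqueries]
    return [entry for part in parts for entry in part]


subqueries = [{"start": 0, "end": 5}, {"start": 5, "end": 10}]
original = run_subquery({"start": 0, "end": 10})
```

Both `run_all(subqueries, parallel=True)` and `run_all(subqueries, parallel=False)` should equal `original`.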
49+
50+
## Non-Goals
51+
52+
This proposal does not aim to provide pagination for query results.
53+
54+
## Proposals
55+
56+
### Proposal 0: Do nothing
57+
58+
Without an API, each client will have to use a LogQL parser.
59+
60+
*Pros*
61+
- The split logic in Loki can be changed at will without breaking client behavior.
62+
- There is no maintenance overhead for an API.
63+
64+
*Cons*
65+
- Currently, the LogQL grammar is specific to Go. It is not easy to port it and the parser to other languages.
66+
- Any changes to the splitting logic must be implemented for each client/platform.
67+
68+
### Proposal 1: Expose Splitting in an API
69+
70+
A new endpoint `GET /loki/api/v1/split_query` is introduced that takes a `splits` parameter and the same parameters as the [/loki/api/v1/query_range](https://grafana.com/docs/loki/latest/reference/loki-http-api/#query-logs-within-a-range-of-time) endpoint. The new endpoint returns sub-queries split by time.
71+
72+
The `splits` parameter optionally defines the number of desired splits. The API is allowed to return fewer splits than requested.
73+
74+
The `limit` parameter has extended semantics. Setting it to `0` for a log stream query indicates that all logs should be queried.
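
A request to the proposed endpoint might be built as in the sketch below. The endpoint exists only in this proposal, so the helper `build_split_query_url`, the base URL, and the choice to send only `query`, `start`, `end`, `limit`, and `splits` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_split_query_url(base_url, query, start_ns, end_ns, limit=0, splits=None):
    """Build a GET URL for the proposed /loki/api/v1/split_query endpoint."""
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    if splits is not None:
        # `splits` is optional; the API may return fewer sub-queries than requested.
        params["splits"] = splits
    return f"{base_url.rstrip('/')}/loki/api/v1/split_query?{urlencode(params)}"


# Hypothetical host and query; limit=0 asks for all logs of a log stream query.
url = build_split_query_url(
    "http://localhost:3100",
    '{app="foo"}',
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_003_600_000_000_000,
    limit=0,
    splits=4,
)
decoded = parse_qs(urlsplit(url).query)
```

`urlencode` percent-encodes the LogQL selector, so braces and quotes in the query survive the round trip through the URL.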
75+
76+
The response body is JSON encoded:
77+
78+
```json
79+
{
80+
  "resultType": "matrix" | "streams" | "vector",
81+
  "subqueries": [
82+
    {
83+
      "start": <timestamp nanoseconds>,
84+
      "end": <timestamp nanoseconds>,
85+
      "limit": <number>,
86+
      "query": <query string>
87+
    },
88+
    {
89+
      "start": <timestamp nanoseconds>,
90+
      "end": <timestamp nanoseconds>,
91+
      "limit": <number>,
92+
      "query": <query string>
93+
    }
94+
  ]
95+
}
96+
```
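
A client could decode such a response roughly as follows. This is a sketch under assumptions: the `SubQuery` dataclass and `parse_split_response` helper are hypothetical names, and the sample body uses made-up timestamps and a made-up query.

```python
import json
from dataclasses import dataclass

VALID_RESULT_TYPES = {"matrix", "streams", "vector"}


@dataclass(frozen=True)
class SubQuery:
    start: int  # nanoseconds since the Unix epoch
    end: int    # nanoseconds since the Unix epoch
    limit: int
    query: str


def parse_split_response(body: str) -> tuple[str, list[SubQuery]]:
    """Decode the proposed response body into a result type and sub-queries."""
    doc = json.loads(body)
    result_type = doc["resultType"]
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError(f"unexpected resultType: {result_type!r}")
    subqueries = [
        SubQuery(int(s["start"]), int(s["end"]), int(s["limit"]), s["query"])
        for s in doc["subqueries"]
    ]
    return result_type, subqueries


# Hypothetical response for a log stream query split into two sub-queries.
sample_body = json.dumps({
    "resultType": "streams",
    "subqueries": [
        {"start": 0, "end": 1800_000_000_000, "limit": 0, "query": '{app="foo"}'},
        {"start": 1800_000_000_000, "end": 3600_000_000_000, "limit": 0, "query": '{app="foo"}'},
    ],
})
result_type, subqueries = parse_split_response(sample_body)
```

Each `SubQuery` carries everything needed to issue an independent range query, so the list can be handed to workers in any order.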
97+
98+
*Pros*
99+
- Clients can split queries independent of the implementation language and platform.
100+
- Split logic is controlled by Loki and not the client. This means it can be improved, for example, by introducing sharding
101+
labels.
102+
103+
*Cons*
104+
- A new API endpoint increases the compatibility surface area and thus the maintenance overhead for Loki maintainers.
105+
106+
### Proposal 2: Support Apache Arrow Flight RPC
107+
108+
Loki could support Apache [Arrow Flight RPC](https://arrow.apache.org/docs/format/Flight.html) which is designed to
109+
exchange large data sets in shards between services.
110+
111+
*Pros*
112+
- Supporting an open standard comes with support for other non-Loki clients.
113+
114+
*Cons*
115+
- Loki would have to support Apache Arrow, which makes the implementation more complicated.
116+
- Arrow Flight RPC assumes the data is being queried on the first request, which means all shards are available at the
117+
same time. However, the intent of this document is that shards can be queried independently.
