
[Feature] Support cold metrics query for metrics/traces/logs in BanyanDB storage #13093

Open
2 of 3 tasks
wu-sheng opened this issue Mar 9, 2025 · 5 comments

wu-sheng commented Mar 9, 2025

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

With #12938 going to be added in 10.2, the UI and query APIs need to change accordingly. Cold data is long-term persistent data kept in low-cost storage, e.g. SATA disks or S3. Although the BanyanDB API will provide a nearly 100% compatible query by following the query time range to determine whether this kind of data is hit, in reality such a query will be much slower than a query against hot and warm data.

So, to be reasonable and keep response times predictable, we should make a query explicitly indicate, from the UI and API perspectives, whether it includes this kind of data.

I am proposing two kinds of APIs. @wankai123 @hanahmily

  1. A cold-data-supported API. There are two conditions for this API to return true: (1) BanyanDB storage is enabled, and (2) BanyanDB has a data node with the label=cold feature enabled and a proper cold-data TTL.
  2. An extra flag to indicate that this query for metrics/traces/logs should include cold data. We could use the term Archived historical data or Long-term historical data as the name of this flag.
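The two proposed APIs could be sketched roughly as follows; all names here (coldStageQueryable, includeColdStage, the class names) are illustrative assumptions, not the final SkyWalking query-protocol definitions:

```java
// A minimal sketch of the two proposed APIs; names are hypothetical,
// not part of the actual SkyWalking query protocol.
public class ColdDataApiSketch {
    // API (1): report whether cold-stage queries are available at all,
    // so the UI can decide whether to show the flag to the end user.
    public static boolean coldStageQueryable(boolean banyanDBStorageEnabled,
                                             boolean coldNodeWithColdTTLConfigured) {
        // Both conditions from the proposal must hold.
        return banyanDBStorageEnabled && coldNodeWithColdTTLConfigured;
    }

    // API (2): an opt-in flag on query conditions for metrics/traces/logs;
    // defaults to false so no query touches cold data unless the user asks.
    public static class QueryCondition {
        public boolean includeColdStage = false;
    }
}
```

The key design point is that the flag defaults to false, so existing dashboards and queries keep their current latency unless cold data is requested explicitly.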

@Fine0830 For the UI part, we should have a check-box (or some other way) to expose this flag to the end user, and by default, NO query should enable it.
API (1) could determine whether the UI shows this flag at all, as other storages (JDBC and Elasticsearch) don't have such a feature.

The UI style for trace and log queries is easy: we could simply add a check-box to these widgets.
But for metrics, the tricky point is that the query is slow. If we add this check-box to the global time-range selector, the dashboard will be very hard to load. @Fine0830 You could think about how we should add this. My idea for now is that we only support historical data queries in a pop-up box: for the metrics widget, we could add a menu item that the user clicks to pop up a query box, and in there the user could (with the check-box checked) query data from longer ago.

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

  • Yes, I am willing to submit a pull request on my own!

@wu-sheng wu-sheng added backend OAP backend related. feature New feature labels Mar 9, 2025
@wu-sheng wu-sheng added this to the 10.2.0 milestone Mar 9, 2025
@wu-sheng (Member, Author)

Another option is that we determine this automatically through the time range. This is a kind of trade-off: the good part is that it is transparent to the UI and the query, but the query experience differs, since querying cold data is much slower than querying warm/hot data.

@wu-sheng (Member, Author)

@wankai123 As the warm->cold movement happens after the warm TTL is reached, when we use the TTLs and the current time in OAP to calculate the hit stages (hot, warm, cold), we don't need to worry about picking the wrong stage (warm and hot are the default stages; they are always picked without explicit declaration).
I prefer to do the stage hit automatically in OAP; then we don't need a query-protocol-level change, and we could support the native UI and PromQL/Grafana automatically.
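The automatic stage-hit calculation described above could look roughly like this; the class and method names are hypothetical, and the sketch assumes only the warm TTL matters for deciding whether the cold stage is touched:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.EnumSet;

// Hypothetical sketch of how OAP could pick the storage stages a query hits
// from the configured TTLs; not the actual SkyWalking implementation.
public class StageHit {
    public enum Stage { HOT, WARM, COLD }

    // HOT and WARM are always default stages; COLD is added only when the
    // query's start time falls before (now - warm TTL), i.e. into data that
    // has already been moved to the cold nodes.
    public static EnumSet<Stage> hitStages(Instant queryStart, Instant now, Duration warmTTL) {
        EnumSet<Stage> stages = EnumSet.of(Stage.HOT, Stage.WARM);
        if (queryStart.isBefore(now.minus(warmTTL))) {
            stages.add(Stage.COLD);
        }
        return stages;
    }
}
```

Because the stage set is derived purely from the query time range and server-side configuration, neither the native UI nor PromQL/Grafana needs any protocol change.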


wu-sheng commented Mar 23, 2025

cc @Fine0830. Based on the above preference, I hope we can skip the UI change.

But for cold traces and logs, we may need a separate, pop-out page to query the historic data, with a longer timeout threshold.

@wu-sheng (Member, Author)

For cold data specifically, if we talk about cold traces and logs, it is hard to say our current design is good. Because the scale of the dataset is too huge and the query may take minutes, we may need some kind of SQL-DB-procedure-like mechanism to process this long-running scan.

@wu-sheng wu-sheng modified the milestones: 10.2.0, 10.3.0 Mar 23, 2025
@wu-sheng (Member, Author)

Changed this to the next milestone; we need some discussions to determine the final design.
