Add Hive and Iceberg Load benchmark #55

Open, wants to merge 15 commits into base: main

Conversation


@PingLiuPing PingLiuPing commented Apr 24, 2025

A loading (insert) benchmark is missing from pbench. This PR adds the initial files for a loading benchmark, including test files for the Hive and Iceberg connectors, both native and Java.
The data is loaded from the tpch connector on the fly.
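
As a rough illustration of what loading from the tpch connector on the fly can look like, the sketch below builds one CREATE TABLE ... AS SELECT statement per TPC-H table. The target catalog and schema names (`hive`, `load_bench`) and the helper `build_load_statements` are hypothetical and not taken from this PR; only the `tpch.sf100` source schema appears in the review thread, and the actual statements in the PR may differ (for example, a CREATE TABLE followed by INSERT INTO ... SELECT).

```python
# The eight standard TPC-H tables.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def build_load_statements(target_catalog, target_schema, source_schema="tpch.sf100"):
    """Return one CTAS statement per TPC-H table (hypothetical helper).

    target_catalog and target_schema are assumed names, e.g. "hive" or
    "iceberg" and a scratch schema; they are not defined by this PR.
    """
    return [
        f"CREATE TABLE {target_catalog}.{target_schema}.{table} AS "
        f"SELECT * FROM {source_schema}.{table}"
        for table in TPCH_TABLES
    ]


if __name__ == "__main__":
    # Print the statements for an assumed Hive target schema.
    for stmt in build_load_statements("hive", "load_bench"):
        print(stmt + ";")
```

Because the source rows come straight from the tpch connector, no data files need to be staged before such a load runs.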

Future enhancements are needed so that the benchmark runs in stages, such as a prepare stage, a main stage, and a cleanup stage.

@PingLiuPing PingLiuPing marked this pull request as ready for review April 24, 2025 16:55
@PingLiuPing PingLiuPing changed the title from "Add TPCH Load benchmark" to "Add Hive and Iceberg Load benchmark" Apr 24, 2025
@PingLiuPing PingLiuPing self-assigned this Aug 4, 2025
@PingLiuPing (Author)

@wanglinsong @ethanyzhang Sorry for the late response, this PR slipped my mind. I have addressed your comments; could you please take another look? Thanks.

@wanglinsong wanglinsong (Member) left a comment


I believe the DDL to create tables is the same across all scale factors. Can you parameterize or remove the hardcoded schema name "tpch.sf100."?

FROM tpch.sf100.customer;
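
One possible way to parameterize the schema, sketched under the assumption that the SQL files could carry a placeholder. The `${tpch_schema}` placeholder and the `render_sql` helper below are hypothetical and are not part of pbench.

```python
from string import Template


def render_sql(sql_text, tpch_schema):
    """Fill a hypothetical ${tpch_schema} placeholder in SQL text."""
    # substitute() raises KeyError if the text contains a placeholder
    # that was not supplied, so typos in the SQL surface early.
    return Template(sql_text).substitute(tpch_schema=tpch_schema)


# Illustrative statement only; the real benchmark SQL may look different.
template = "INSERT INTO customer SELECT * FROM ${tpch_schema}.customer;"

for scale in ("sf1", "sf100"):
    print(render_sql(template, f"tpch.{scale}"))
```

With this shape, one SQL file could serve every scale factor, at the cost of a rendering step before the statements are submitted.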

@PingLiuPing
Copy link
Author

> I believe the DDL to create tables is the same across all scale factors. Can you parameterize or remove the hardcoded schema name "tpch.sf100."?
>
> FROM tpch.sf100.customer;

Thanks. With the current framework, I think supporting that would need a lot of work.

@wanglinsong wanglinsong (Member) commented Aug 21, 2025

> > I believe the DDL to create tables is the same across all scale factors. Can you parameterize or remove the hardcoded schema name "tpch.sf100."?
> > FROM tpch.sf100.customer;
>
> Thanks. With the current framework, I think supporting that would need a lot of work.

Oh, this is an embedded connector. This is not an issue at all. Please ignore.

@PingLiuPing (Author)

Hi @wanglinsong, thanks for your comments. Do you think this PR is ready to be merged? Is there anything else you would like me to change?


3 participants