Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cst: streaming parsing for tx range manifests #24728

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

nvartolomei
Copy link
Contributor

test                         iterations      median         mad         min         max      allocs       tasks        inst
dom_deserializer.1k                 199     1.868ms     1.685us     1.866ms     1.870ms      57.025       1.036  25814372.0
sax_deserializer.1k                 222     1.392ms     1.436us     1.384ms     1.393ms      42.032       1.050  18845216.1

dom_deserializer.100k                 2   196.808ms   796.756us   196.011ms   198.198ms     896.000       2.500 2712139318.1
sax_deserializer.100k                 3   147.369ms    57.420us   147.311ms   149.829ms     578.067       2.600 2007473108.5

At 100k dom deserializer does large allocations seastar_memory - oversized allocation: 2273280 bytes while sax serializer never allocates more than 128k (this is actually the maximum allocation observed).

Downsides: the parser is very rigid. We can't evolve the file in place. I doubt we care as we'd rather move to tx2 file extension and binary serialization if we wanted to include additional information.

PTAL

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

src/v/cloud_storage/tx_range_manifest.cc Outdated Show resolved Hide resolved
src/v/cloud_storage/tx_range_manifest.cc Outdated Show resolved Hide resolved
src/v/cloud_storage/tx_range_manifest.cc Outdated Show resolved Hide resolved
@vbotbuildovich
Copy link
Collaborator

CI test results

test results on build#60480
test_id test_kind job_url test_status passed
rm_stm_tests_rpunit.rm_stm_tests_rpunit unit https://buildkite.com/redpanda/redpanda/builds/60480#01944a39-0600-4091-b609-b7315f4fdf1b FLAKY 1/2
rptest.tests.cpu_stress_injection_test.CpuStressInjectionTest.test_stress_fibers_ms ducktape https://buildkite.com/redpanda/redpanda/builds/60480#01944a92-400b-4b48-9da3-1a95ec42a9a8 FLAKY 5/6

Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parser is very rigid. We can't evolve the file in place

can you say more about how this restriction will effect us?

@nvartolomei
Copy link
Contributor Author

can you say more about how this restriction will effect us?

I'm not sure what more can be said there. No new fields can be added to the file. If we ever want to extend the manifest we will be able by using new file and bumping manifest/manifest name version.

@nvartolomei nvartolomei force-pushed the nv/cst-streaming-tx-manifest-parsing branch from f78ed3f to 7258b7c Compare January 12, 2025 21:29
Gets rid of large allocs caused by rapidjson DOM parsing.
@nvartolomei nvartolomei force-pushed the nv/cst-streaming-tx-manifest-parsing branch from 7258b7c to f518357 Compare January 12, 2025 21:33
@nvartolomei nvartolomei changed the title Nv/cst streaming tx manifest parsing cst: streaming parsing for tx range manifests Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants