
Conversation

cosmo0920 (Contributor) commented on Aug 5, 2025

With the Apache Arrow GLib Parquet library, we're able to support the Parquet format in out_s3.
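For context, here is a minimal sketch of how a GArrowTable can be written to an in-memory Parquet buffer with Arrow GLib / Parquet GLib, assuming a table has already been built from the log records. This is illustrative only and is not the PR's actual compress.c; the helper name table_to_parquet_buffer matches the one mentioned in the review summary below, but its signature and body here are assumptions.

#include <arrow-glib/arrow-glib.h>
#include <parquet-glib/parquet-glib.h>

/* Sketch: serialize a GArrowTable into an in-memory Parquet buffer.
 * Returns a GArrowBuffer the caller must g_object_unref(), or NULL on error. */
static GArrowBuffer *table_to_parquet_buffer(GArrowTable *table)
{
    GArrowSchema *schema;
    GArrowResizableBuffer *buffer;
    GArrowBufferOutputStream *sink;
    GParquetArrowFileWriter *writer;
    GError *error = NULL;

    buffer = garrow_resizable_buffer_new(0, &error);
    if (buffer == NULL) {
        g_error_free(error);
        return NULL;
    }
    sink = garrow_buffer_output_stream_new(buffer);

    schema = garrow_table_get_schema(table);
    writer = gparquet_arrow_file_writer_new_arrow(schema,
                                                  GARROW_OUTPUT_STREAM(sink),
                                                  NULL /* default writer properties */,
                                                  &error);
    g_object_unref(schema);
    if (writer == NULL) {
        g_error_free(error);
        g_object_unref(sink);
        g_object_unref(buffer);
        return NULL;
    }

    /* Write the whole table as a single row group, then close the Parquet file. */
    if (!gparquet_arrow_file_writer_write_table(writer, table,
                                                garrow_table_get_n_rows(table),
                                                &error) ||
        !gparquet_arrow_file_writer_close(writer, &error)) {
        g_error_free(error);
        g_object_unref(writer);
        g_object_unref(sink);
        g_object_unref(buffer);
        return NULL;
    }

    g_object_unref(writer);
    g_object_unref(sink);
    return GARROW_BUFFER(buffer);
}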


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    trace
    HTTP_Server  Off
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

[INPUT]
    Name dummy
    Tag  dummy.local
    dummy {"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}

[OUTPUT]
    Name  s3
    Match dummy*
    Region us-east-2
    bucket fbit-parquet-s3
    Use_Put_object true
    compression parquet
    # No need to specify schema
  • Debug log output from testing the change
Fluent Bit v4.1.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____ 
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/ 


[2025/08/05 17:27:09] [ info] Configuration:
[2025/08/05 17:27:09] [ info]  flush time     | 5.000000 seconds
[2025/08/05 17:27:09] [ info]  grace          | 5 seconds
[2025/08/05 17:27:09] [ info]  daemon         | 0
[2025/08/05 17:27:09] [ info] ___________
[2025/08/05 17:27:09] [ info]  inputs:
[2025/08/05 17:27:09] [ info]      dummy
[2025/08/05 17:27:09] [ info] ___________
[2025/08/05 17:27:09] [ info]  filters:
[2025/08/05 17:27:09] [ info] ___________
[2025/08/05 17:27:09] [ info]  outputs:
[2025/08/05 17:27:09] [ info]      s3.0
[2025/08/05 17:27:09] [ info] ___________
[2025/08/05 17:27:09] [ info]  collectors:
[2025/08/05 17:27:09] [ info] [fluent bit] version=4.1.0, commit=0afb495f86, pid=81424
[2025/08/05 17:27:09] [debug] [engine] coroutine stack size: 36864 bytes (36.0K)
[2025/08/05 17:27:09] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/08/05 17:27:09] [ info] [simd    ] NEON
[2025/08/05 17:27:09] [ info] [cmetrics] version=1.0.5
[2025/08/05 17:27:09] [ info] [ctraces ] version=0.6.6
[2025/08/05 17:27:09] [ info] [input:dummy:dummy.0] initializing
[2025/08/05 17:27:09] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/08/05 17:27:09] [debug] [dummy:dummy.0] created event channels: read=26 write=27
[2025/08/05 17:27:09] [debug] [s3:s3.0] created event channels: read=28 write=29
[2025/08/05 17:27:09] [debug] [tls] attempting to load certificates from system keychain of macOS
# <snip of loading certificates logs>
[2025/08/05 17:27:09] [debug] [tls] finished loading keychain certificates, total loaded: 153
[2025/08/05 17:27:09] [debug] [aws_credentials] Initialized Env Provider in standard chain
[2025/08/05 17:27:09] [debug] [aws_credentials] creating profile (null) provider
[2025/08/05 17:27:09] [debug] [aws_credentials] Initialized AWS Profile Provider in standard chain
[2025/08/05 17:27:09] [debug] [aws_credentials] Not initializing EKS provider because AWS_ROLE_ARN was not set
[2025/08/05 17:27:09] [debug] [aws_credentials] Not initializing ECS/EKS HTTP Provider because AWS_CONTAINER_CREDENTIALS_RELATIVE_URI and AWS_CONTAINER_CREDENTIALS_FULL_URI is not set
[2025/08/05 17:27:09] [debug] [aws_credentials] Initialized EC2 Provider in standard chain
[2025/08/05 17:27:09] [debug] [aws_credentials] Sync called on the EC2 provider
[2025/08/05 17:27:09] [debug] [aws_credentials] Init called on the env provider
[2025/08/05 17:27:09] [ info] [output:s3:s3.0] Sending locally buffered data from previous executions to S3; buffer=/tmp/fluent-bit/s3/fbit-parquet-s3
[2025/08/05 17:27:09] [ info] [output:s3:s3.0] Pre-compression chunk size is 2394, After compression, chunk is 2384 bytes
[2025/08/05 17:27:10] [debug] [upstream] KA connection #34 to s3.us-east-2.amazonaws.com:443 is connected
[2025/08/05 17:27:10] [debug] [http_client] not using http_proxy for header
[2025/08/05 17:27:10] [debug] [aws_credentials] Requesting credentials from the env provider..
[2025/08/05 17:27:10] [debug] [upstream] KA connection #34 to s3.us-east-2.amazonaws.com:443 is now available
[2025/08/05 17:27:10] [debug] [output:s3:s3.0] PutObject http status=200
[2025/08/05 17:27:10] [ info] [output:s3:s3.0] Successfully uploaded object /fluent-bit-logs/dummy.local/2025/08/05/08/27/09-objectt33OX2sM
[2025/08/05 17:27:10] [debug] [aws_credentials] upstream_set called on the EC2 provider
[2025/08/05 17:27:10] [ info] [output:s3:s3.0] worker #0 started
[2025/08/05 17:27:10] [ info] [sp] stream processor started
[2025/08/05 17:27:10] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
[2025/08/05 17:27:15] [debug] [task] created task=0x1137040a0 id=0 OK
[2025/08/05 17:27:15] [debug] [output:s3:s3.0] task_id=0 assigned to thread #0
[2025/08/05 17:27:15] [debug] [output:s3:s3.0] Creating upload timer with frequency 60s
[2025/08/05 17:27:15] [debug] [out flush] cb_destroy coro_id=0
[2025/08/05 17:27:15] [debug] [task] destroy task=0x1137040a0 (task_id=0)
[2025/08/05 17:27:18] [engine] caught signal (SIGTERM)
[2025/08/05 17:27:18] [ info] [input] pausing dummy.0
[2025/08/05 17:27:18] [ info] [output:s3:s3.0] thread worker #0 stopping...
[2025/08/05 17:27:18] [ info] [output:s3:s3.0] thread worker #0 stopped
[2025/08/05 17:27:18] [ info] [output:s3:s3.0] Sending all locally buffered data to S3
[2025/08/05 17:27:18] [ info] [output:s3:s3.0] Pre-compression chunk size is 504, After compression, chunk is 1904 bytes
[2025/08/05 17:27:18] [debug] [output:s3:s3.0] PutObject http status=200
[2025/08/05 17:27:18] [ info] [output:s3:s3.0] Successfully uploaded object /fluent-bit-logs/dummy.local/2025/08/05/08/27/15-objectkoBRRSAl
  • Attached Valgrind output that shows no leaks or memory corruption was found

Running the leaks tool on macOS reports no leaks:

Process:         fluent-bit [81424]
Path:            /Users/USER/*/fluent-bit
Load Address:    0x100f58000
Identifier:      fluent-bit
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [81423]
Target Type:     live task

Date/Time:       2025-08-05 17:27:19.085 +0900
Launch Time:     2025-08-05 17:27:09.515 +0900
OS Version:      macOS 15.5 (24F74)
Report Version:  7
Analysis Tool:   /usr/bin/leaks

Physical footprint:         15.6M
Physical footprint (peak):  18.8M
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 81424: 3008 nodes malloced for 428 KB
Process 81424: 0 leaks for 0 total leaked bytes.

[2025/08/05 17:27:19] [engine] caught signal (SIGCONT)
[2025/08/05 17:27:19] Fluent Bit Dump

With Valgrind:

==361658== LEAK SUMMARY:
==361658==    definitely lost: 0 bytes in 0 blocks
==361658==    indirectly lost: 0 bytes in 0 blocks
==361658==      possibly lost: 0 bytes in 0 blocks
==361658==    still reachable: 68,984 bytes in 677 blocks
==361658==         suppressed: 0 bytes in 0 blocks
==361658== Reachable blocks (those to which a pointer was found) are not shown.

If this is a change to packaging of containers or native binaries, please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set the ok-package-test label to test all targets (requires a maintainer).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0; by submitting this pull request, I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Added Parquet as a compression option for the S3 output (now supports gzip, arrow, parquet). Requires Apache Arrow support at build time.
  • Improvements

    • Enforced requirement to enable use_put_object when using Arrow or Parquet compression.
    • Clarified upload size handling: compressed uploads (any non-NONE compression) follow the 5 GB multipart limit, while uncompressed (NONE) uploads follow the 50 MB single-part limit (a size-cap sketch follows this summary).
  • Documentation

    • Updated configuration descriptions to include Parquet and build requirements.
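
To make the size handling concrete, here is a minimal sketch of the cap selection. The helper and constant names are hypothetical; only the 5 GB / 50 MB values and the non-NONE vs. NONE split come from the summary above.

#include <stdint.h>

/* Illustrative caps from the summary: 5 GB per part for compressed uploads,
 * 50 MB for uncompressed (NONE) uploads. Names are hypothetical. */
#define CAP_COMPRESSED_BYTES   (5ULL * 1024 * 1024 * 1024)
#define CAP_UNCOMPRESSED_BYTES (50ULL * 1024 * 1024)

/* compression_type follows the FLB_AWS_COMPRESS_* idea: 0 means NONE. */
static uint64_t upload_size_cap(int compression_type)
{
    if (compression_type != 0) {
        /* gzip, arrow, or parquet: bounded by the 5 GB multipart limit */
        return CAP_COMPRESSED_BYTES;
    }
    /* no compression: bounded by the 50 MB single-part limit */
    return CAP_UNCOMPRESSED_BYTES;
}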


coderabbitai bot commented Aug 5, 2025

Walkthrough

Adds Parquet compression support gated by Arrow GLib Parquet detection, extends S3 output to treat any non-NONE compression uniformly, enforces put-object for Arrow/Parquet, updates build scripts and CI to install Arrow/Parquet GLib, and exposes a new compressor API plus option wiring.

Changes

  • Build system: Arrow/Parquet detection (CMakeLists.txt, src/aws/compression/arrow/CMakeLists.txt)
    Adds a pkg-config check for parquet-glib; defines FLB_HAVE_ARROW_PARQUET when FLB_ARROW and ARROW_GLIB_PARQUET_FOUND are set; links Parquet GLib includes/libs to flb-aws-arrow; optionally links jemalloc.
  • Public API: compression constants (include/fluent-bit/aws/flb_aws_compress.h)
    Introduces FLB_AWS_COMPRESS_PARQUET 3; aligns constant formatting.
  • Compressor implementation, Arrow/Parquet (src/aws/compression/arrow/compress.c, src/aws/compression/arrow/compress.h)
    Adds Parquet compression: a table_to_parquet_buffer helper and the out_s3_compress_parquet API; the declaration is guarded by FLB_HAVE_ARROW_PARQUET.
  • Compression option registry (src/aws/flb_aws_compress.c)
    Registers the "parquet" option with FLB_AWS_COMPRESS_PARQUET and out_s3_compress_parquet under FLB_HAVE_ARROW_PARQUET (sketched after this list).
  • S3 output plugin behavior (plugins/out_s3/s3.c)
    Supports parquet in the config; requires use_put_object for Arrow/Parquet; generalizes compression handling for any non-NONE type; enforces the 5 GB compressed multipart limit for non-NONE and the 50 MB uncompressed limit for NONE; adjusts buffer free paths; gzip still sets Content-Encoding.
  • CI: Arrow/Parquet dependencies (.github/workflows/unit-tests.yaml)
    Adds -DFLB_ARROW=On to the matrix; installs libarrow-glib-dev and libparquet-glib-dev when FLB_ARROW is on; adds a clang row testing the Arrow path.
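
For readers unfamiliar with the registry, the sketch below shows one way such an option table is commonly laid out in C. The struct layout, field names, and the gzip prototype are assumptions for illustration; only the "parquet" keyword, the FLB_AWS_COMPRESS_PARQUET = 3 value, the out_s3_compress_parquet symbol, and the FLB_HAVE_ARROW_PARQUET guard come from the change summary.

#include <stddef.h>

/* Compression type IDs; PARQUET = 3 is the value this PR introduces.
 * The other values are assumed for illustration. */
#define FLB_AWS_COMPRESS_NONE    0
#define FLB_AWS_COMPRESS_GZIP    1
#define FLB_AWS_COMPRESS_ARROW   2
#define FLB_AWS_COMPRESS_PARQUET 3

/* Hypothetical registry entry: numeric type, keyword, compressor callback. */
struct compression_option {
    int type;
    const char *keyword;
    int (*compress)(void *in_data, size_t in_len,
                    void **out_data, size_t *out_len);
};

/* Compressor prototypes (signatures assumed for this sketch). */
int flb_gzip_compress(void *in_data, size_t in_len,
                      void **out_data, size_t *out_len);
#ifdef FLB_HAVE_ARROW
int out_s3_compress_arrow(void *in_data, size_t in_len,
                          void **out_data, size_t *out_len);
#endif
#ifdef FLB_HAVE_ARROW_PARQUET
int out_s3_compress_parquet(void *in_data, size_t in_len,
                            void **out_data, size_t *out_len);
#endif

/* "parquet" is only registered when parquet-glib was detected at build time. */
static const struct compression_option compression_options[] = {
    { FLB_AWS_COMPRESS_GZIP,    "gzip",    flb_gzip_compress },
#ifdef FLB_HAVE_ARROW
    { FLB_AWS_COMPRESS_ARROW,   "arrow",   out_s3_compress_arrow },
#endif
#ifdef FLB_HAVE_ARROW_PARQUET
    { FLB_AWS_COMPRESS_PARQUET, "parquet", out_s3_compress_parquet },
#endif
    { FLB_AWS_COMPRESS_NONE,    NULL,      NULL } /* sentinel */
};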

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User as User/Config
  participant S3 as S3 Plugin
  participant Comp as Compressor (gzip/arrow/parquet)
  participant AWS as Amazon S3

  User->>S3: Configure compression=(none|gzip|arrow|parquet), use_put_object
  alt compression in {arrow, parquet}
    S3->>S3: Verify use_put_object == true
    alt use_put_object false
      S3-->>User: Error: require put-object for Arrow/Parquet
      note right of S3: Abort send
    else use_put_object true
      S3->>Comp: Compress records
      alt parquet
        note over Comp: out_s3_compress_parquet\n(Arrow Table -> Parquet buffer)
      else arrow
        note over Comp: out_s3_compress_arrow\n(Arrow Table -> Feather)
      else gzip
        note over Comp: gzip buffer
      end
      Comp-->>S3: Compressed payload, size
      S3->>S3: Validate size (multipart 5GB compressed limit)
      S3->>AWS: Upload (PUT Object or Multipart)
      AWS-->>S3: Response
      S3-->>User: Result
    end
  else compression == none
    S3->>S3: No compression
    S3->>S3: Validate size (50MB uncompressed limit for multipart)
    S3->>AWS: Upload (PUT Object or Multipart)
    AWS-->>S3: Response
    S3-->>User: Result
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

I nibbled bytes and hopped through trees,
Found Parquet paths on Arrow’s breeze.
With PUT I stash my tidy pack,
Compress, confess—no turning back.
From CI burrows, green lights glow—
To S3’s clouds, my payloads go. 🐇📦✨


cosmo0920 force-pushed the cosmo0920-implement-parquet-comression-with-apache-arrow-glib-parquet branch from 85316ab to 0afb495 on August 5, 2025 07:55
cosmo0920 force-pushed the cosmo0920-implement-parquet-comression-with-apache-arrow-glib-parquet branch from 243f704 to 4eaed0d on August 5, 2025 08:16
cosmo0920 force-pushed the cosmo0920-implement-parquet-comression-with-apache-arrow-glib-parquet branch from 4eaed0d to e69bbd2 on August 5, 2025 08:21
cosmo0920 marked this pull request as ready for review on August 5, 2025 09:28
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
cosmo0920 force-pushed the cosmo0920-implement-parquet-comression-with-apache-arrow-glib-parquet branch from d165b26 to fbaab06 on August 25, 2025 11:30
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/out_s3/s3.c (1)

706-714: Critical: validation uses ctx->compression before it’s assigned (guard never triggers).

You compute ret = flb_aws_compression_get_type(tmp) but then check ctx->compression (still holding its old/default value) instead of ret. This allows Arrow/Parquet with use_put_object == FLB_FALSE to slip past initialization. Compare against ret (or assign first, then validate).

Apply this minimal fix:

         ret = flb_aws_compression_get_type(tmp);
         if (ret == -1) {
             flb_plg_error(ctx->ins, "unknown compression: %s", tmp);
             return -1;
         }
-        if (ctx->use_put_object == FLB_FALSE &&
-            (ctx->compression == FLB_AWS_COMPRESS_ARROW ||
-             ctx->compression == FLB_AWS_COMPRESS_PARQUET)) {
+        if (ctx->use_put_object == FLB_FALSE &&
+            (ret == FLB_AWS_COMPRESS_ARROW ||
+             ret == FLB_AWS_COMPRESS_PARQUET)) {
             flb_plg_error(ctx->ins,
-                          "use_put_object must be enabled when Apache Arrow or Parquet is enabled");
+                          "use_put_object must be enabled when Apache Arrow or Parquet compression is selected");
             return -1;
         }
         ctx->compression = ret;
♻️ Duplicate comments (1)
plugins/out_s3/s3.c (1)

1173-1177: Fix misleading log message (it’s reporting chunk size, not upload_chunk_size).

The message says “Pre-compression upload_chunk_size” but prints the pre-compression chunk size. Suggest clarifying wording.

This also reflects the earlier feedback about the Arrow/Parquet fallback branch; it’s now correctly limited to the gzip-only case.

Apply this tweak:

-                flb_plg_info(ctx->ins, "Pre-compression upload_chunk_size= %zu, After compression, chunk is only %zu bytes, "
-                                       "the chunk was too small, using PutObject to upload", preCompress_size, body_size);
+                flb_plg_info(ctx->ins,
+                             "Pre-compression chunk size was %zu bytes; after compression it's %zu bytes; "
+                             "chunk too small, using PutObject to upload",
+                             preCompress_size, body_size);
🧹 Nitpick comments (2)
plugins/out_s3/s3.c (2)

1195-1197: De-duplicate compressed-buffer frees with a single cleanup path.

You correctly free payload_buf across all exit paths, but the repetition makes future edits error-prone. Consider consolidating the frees under a single cleanup: label (or a small inline helper) that frees the compressed buffer when it is set.

Example pattern:

int ret = FLB_OK;
void *payload_buf = NULL;
bool compressed = false;

/* ... set compressed=true and payload_buf on success ... */

put_object:
ret = s3_put_object(...);

cleanup:
if (compressed && payload_buf != NULL) {
    flb_free(payload_buf);
}
return ret;

This keeps ownership clear and reduces maintenance overhead.

Also applies to: 1222-1224, 1236-1238, 1246-1248, 1263-1265


3996-3999: Clarify option docs: Arrow/Parquet require PutObject.

Since initialization enforces PutObject for Arrow/Parquet, reflect that in the compression option description to prevent config surprises.

-    "Compression type for S3 objects. 'gzip', 'arrow' and 'parquet' are the supported values. "
-    "'arrow' and 'parquet' are only available if Apache Arrow was enabled at compile time. "
+    "Compression type for S3 objects. Supported values: 'gzip', 'arrow', 'parquet'. "
+    "'arrow' and 'parquet' are available only if Apache Arrow was enabled at compile time, "
+    "and require 'use_put_object' to be enabled. "
     "Defaults to no compression. "
     "If 'gzip' is selected, the Content-Encoding HTTP Header will be set to 'gzip'."
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between d165b26 and fbaab06.

📒 Files selected for processing (4)
  • CMakeLists.txt (1 hunks)
  • plugins/out_s3/s3.c (9 hunks)
  • src/aws/compression/arrow/CMakeLists.txt (1 hunks)
  • src/aws/compression/arrow/compress.c (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (3)
plugins/out_s3/s3.c (3)

735-745: Multipart size caps: compressed vs. uncompressed limit — LGTM.

Good call enforcing 5 GB per-part cap for compressed payloads and retaining 50 MB for uncompressed. This aligns the code with S3 multipart constraints and avoids accidental oversizing.


1130-1141: Unified compression preprocessing for all non-NONE types — LGTM.

Centralizing gzip/arrow/parquet into a single compression pathway simplifies control flow and makes memory handling consistent. The early return on compression failure with FLB_RETRY is appropriate.


1352-1364: Graceful fallback to uncompressed on compression failure — LGTM.

When compression fails in put_all_chunks(), you log and proceed with the original buffer to prevent data loss. Memory ownership transitions (buffer -> payload_buf) and frees look correct.
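
A minimal sketch of that fallback pattern, with a hypothetical helper and logging rather than the plugin's actual put_all_chunks() code:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical compressor hook: returns 0 on success and fills out/out_size. */
typedef int (*compress_fn)(void *in, size_t in_len, void **out, size_t *out_len);

/* Try to compress the chunk; if compression fails, log a warning and keep the
 * original buffer so the data is still uploaded (uncompressed) rather than
 * dropped. The caller frees *payload only when it differs from buffer. */
static void prepare_payload(compress_fn compress, void *buffer, size_t size,
                            void **payload, size_t *payload_size)
{
    void *out = NULL;
    size_t out_size = 0;

    if (compress != NULL && compress(buffer, size, &out, &out_size) == 0) {
        *payload = out;            /* ownership of the compressed buffer moves to the caller */
        *payload_size = out_size;
        return;
    }

    fprintf(stderr, "warn: compression failed, sending uncompressed data\n");
    *payload = buffer;             /* fall back to the original buffer */
    *payload_size = size;
}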

edsiper merged commit 137f9f0 into master on Aug 28, 2025
55 checks passed
edsiper deleted the cosmo0920-implement-parquet-comression-with-apache-arrow-glib-parquet branch on August 28, 2025 02:12