Compress encrypted data #231

Open: wants to merge 10 commits into main

@jimhark jimhark commented Jul 21, 2025

Summary

Compress data before encryption. This reduces size by about 30% at minimal additional compute cost compared to encryption and signing, and it can result in an encrypted file that is smaller than the original.

Resolves

Resolves #230

Details

  • Compress before encryption / decompress after decryption
  • Use gzip compression
  • About 30% reduction observed
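
The order matters because encrypted output looks random and barely compresses, so gzip has to run on the plaintext. A minimal sketch of that ordering follows, using only Node built-ins. The function names and the choice of AES-256-CBC are assumptions for illustration and are not taken from this PR's code.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('node:crypto');
const { gzipSync, gunzipSync } = require('node:zlib');

// Hypothetical helper: gzip the plaintext, then encrypt the compressed bytes.
// AES-256-CBC is an assumed cipher; key must be 32 bytes.
function compressThenEncrypt(plain, key) {
  const iv = randomBytes(16);
  const cipher = createCipheriv('aes-256-cbc', key, iv);
  const compressed = gzipSync(plain);
  const encrypted = Buffer.concat([cipher.update(compressed), cipher.final()]);
  return { iv, encrypted };
}

// Hypothetical inverse: decrypt first, then gunzip the result.
function decryptThenDecompress({ iv, encrypted }, key) {
  const decipher = createDecipheriv('aes-256-cbc', key, iv);
  const compressed = Buffer.concat([decipher.update(encrypted), decipher.final()]);
  return gunzipSync(compressed);
}
```

The achieved ratio depends entirely on the input; the 30% figure above is what was observed for this project's data, not something the sketch guarantees.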

Testing

I manually tested Node encrypt, Node decrypt, and HTML wrapper decrypt.

Notes for Reviewers

In an attempt to avoid, or at least limit, having 3 representations of data simultaneously in memory (file data buffer, decrypted data buffer, decompressed data buffer), we sometimes pass a reader instead of a buffer. Reader here means a callable that can read the data. See 'msgReader' in codec.js, which provides this comment:

We take a message reader function instead of a message buffer so we can release its storage when it is no longer needed (caller isn't stuck holding a reference).
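
To illustrate the reader idea, here is a rough sketch. It is not the code in codec.js: only the name `msgReader` comes from this PR, while `decompressFromReader` and `makeOneShotReader` are hypothetical helpers invented for the example.

```javascript
const { gzipSync, gunzipSync } = require('node:zlib');

// Hypothetical consumer: it calls the reader itself, so the only reference
// to the input bytes lives in this function and can be dropped early.
function decompressFromReader(msgReader) {
  let msg = msgReader();
  const out = gunzipSync(msg);
  msg = null; // release the compressed bytes before returning
  return out;
}

// Hypothetical one-shot reader: hands over the buffer once and forgets it,
// so the caller is not left holding a reference.
function makeOneShotReader(buf) {
  let held = buf;
  return () => {
    const b = held;
    held = null;
    return b;
  };
}
```

With a plain buffer argument, the caller's variable keeps the input alive for the whole call; with a reader, the input becomes collectable as soon as the consumer is done with it.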

jimhark and others added 9 commits July 21, 2025 00:48
This is part of a push to support larger files. The focus is the switch
to using Uint8Array to store binary data. But also includes:

- When running on Node, use Buffer.from() for hex string conversions.

- To avoid a large buffer copy, signedMsg has been replaced by an object
    containing iv, encrypted, and hmac.

- hmac calculation has changed so it avoids copying (possibly very
    large) encrypted data. See signDigest() in lib/codec.js.

- Minor cleanup
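
One way to compute an HMAC over several parts without first concatenating them is to feed each part to the hash incrementally. The sketch below shows that general technique only; it is not the actual signDigest() from lib/codec.js, and the name `signParts` and the use of SHA-256 are assumptions.

```javascript
const { createHmac } = require('node:crypto');

// Hypothetical helper: HMAC over iv followed by encrypted, with no
// intermediate buffer holding both. SHA-256 is an assumed digest.
function signParts(key, iv, encrypted) {
  const hmac = createHmac('sha256', key);
  hmac.update(iv);        // small, fixed-size part
  hmac.update(encrypted); // possibly very large; read in place, not copied
  return hmac.digest();   // Buffer (a Uint8Array subclass)
}
```

Successive `update()` calls produce the same digest as a single call over the concatenated bytes, so the result is unchanged while the extra copy is avoided.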

Handling hex encode/decode at the input/output boundaries and using
Uint8Array internally for representing binary data has these benefits:

- More memory efficient, allows processing of 2x larger files.

- Aligns with cryptographic best practices: hashing is now performed
  on raw binary data (Uint8Array) instead of hex strings.

- Behavior is (mostly) unchanged
  - scripts/index_template.html textContent is not implemented and
    needs to be redesigned.
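
As a rough illustration of converting hex only at the boundaries on Node, the sketch below uses Buffer.from() and works with Uint8Array in between. The helper names `hexToBytes` and `bytesToHex` are hypothetical and not taken from this commit.

```javascript
// Hypothetical boundary helpers for Node.

// Decode a hex string to a Uint8Array view over the decoded bytes.
// Note: Buffer.from(hex, 'hex') stops at the first non-hex character
// instead of throwing, so validate untrusted input separately.
function hexToBytes(hex) {
  const buf = Buffer.from(hex, 'hex');
  return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);
}

// Encode a Uint8Array as hex without copying the underlying bytes first.
function bytesToHex(bytes) {
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString('hex');
}
```

A hex string needs two characters per byte, which is why keeping data as raw bytes internally roughly halves the memory needed for the same payload.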
makes unnamed function more self-documenting
Also removed use of recursion to improve readability (and debuggability).
As a bonus, function is actually shorter (LoC AND lines of text)
cuts size by 1/3 and noticeably improves performance
Refactored encrypted buffer handling to reduce memory usage
Minor cleanup

jimhark commented Jul 23, 2025

Committed a396bf0 to fix line of code that got messed up between testing and commit/push.


jimhark commented Jul 23, 2025

Committed a396bf0 to remove two variable definitions for variables that are no longer used. This was discovered while writing a previous pull request comment.

Development

Successfully merging this pull request may close these issues.

Compress/decompress encrypted data