Skip to content

Conversation

cruessler
Copy link
Contributor

@cruessler cruessler commented Aug 4, 2025

This PR changes ConsumeHunk::consume_hunk to accept typed data instead of strings.

This PR was initially motivated by gitui-org/gitui#2685.

Initial description (no longer accurate as of 2025-08-24)

This PR is an attempt at extracting a part of UnifiedDiff, more specifically a part that exposes an API for dealing with structured data as opposed to string-based data. It introduces ConsumeTypedHunk which is very similar to ConsumeHunk, except that is uses HunkHeader and DiffLineType to encode diff-related information in proper types.

There are still a few open issues/questions:

  • ConsumeTypedHunk is not NewlineSeparator-aware. Is that a good thing or a bad thing? I’m leaning more towards the former.
  • The PR contains a few TODO: comments.

What do you think about the PR’s overall design? If you think this works, I’d start addressing the remaining issues.

@cruessler cruessler marked this pull request as ready for review August 7, 2025 10:03
@cruessler cruessler marked this pull request as draft August 8, 2025 08:15
@cruessler
Copy link
Contributor Author

I’m going to mark this PR as ready for review in order to indicate that I’m now done with the initial version and would welcome a first round of feedback!

There’s a few genuine TODO: comments in places where I had questions myself. Feel free to mark this PR as draft in case you don’t think it’s ready yet!

@Byron Byron force-pushed the split-unified-diff branch from 98f5370 to b765afe Compare August 22, 2025 12:23
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from our session:

  • UnifiedDiffSink -> UnifiedDiff - make it a breaking change
  • .collect() - reuse the buffer for lines instead.

@Byron Byron force-pushed the split-unified-diff branch from b765afe to 1fe6b19 Compare August 22, 2025 12:41
@cruessler
Copy link
Contributor Author

I’ve now updated the PR.

  • The API is a bit more verbose as the responsibility for handling newlines was moved to the consumer. This had become necessary as HunkHeader::to_string() doesn’t include any newline at all. For consistency, newlines for the actual lines in a diff have to be handled by the consumer as well.
  • I did not manage to move buffer2 to UnifiedDiff as I ran into lifetime and ownership issues, but I might have missed something. Please check carefully! :-)

@cruessler cruessler changed the title Extract UnifiedDiffSink Add HunkHeader and DiffLineType to ConsumeHunk::consume_hunk Aug 24, 2025
@cruessler cruessler marked this pull request as ready for review August 24, 2025 05:07
@Byron
Copy link
Member

Byron commented Aug 26, 2025

Apologies for the late response!

In any case, I think this is the way to go. Pushing newline handling further down seems like perfectly reasonable, the new 'abstract' version doesn't need to know about that, after all it has a header, it has each line in a buffer, that's all that it needs.

Regarding the lifetime issues with the buffer, I see why would that be the case. There is this hidden lifetime in T that one would want there. In theory, it should be possible, practically I'd also have trouble expressing this, even though I also think that it should be possible, maybe with more generics, maybe with an explicit lifetime that is added as constraint and can be used with the buffer, like buffer: Vec<(Type, &'interned [u8])>.

I set this PR back to draft, but invite you to change it back to what it is now to clearly indicate when it's ready for review.

@Byron Byron marked this pull request as draft August 26, 2025 03:33
This commit modifies the public API of `ConsumeHunk::consume_hunk` to
use the new types `DiffLineType` and `HunkHeader`. It also shifts
responsibility for adding newlines to the API's consumer.
@cruessler
Copy link
Contributor Author

The lifetime issue fortunately was much easier to solve than expected. I removed the intermediate buffer2 and changed self.buffer to what you suggested, and that was already sufficient. The intermediate buffer2 was only needed by the previous version because the previous version had to add newlines and that required a separate buffer.

Once CI passes, this will be ready for review!

@cruessler cruessler marked this pull request as ready for review August 26, 2025 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants