Add .gptinclude functionality and fix XML CDATA handling #17

Closed
wants to merge 3 commits into from

Conversation

imertz
Contributor

@imertz imertz commented Apr 29, 2025

Overview

This PR adds support for a .gptinclude file, allowing users to explicitly specify which files should be included in the repository export. It also fixes an issue with XML output when file content contains CDATA end markers.

New Feature: .gptinclude Support

This feature complements the existing .gptignore functionality: instead of only listing what to exclude, users can list which files to include. When both files exist, git2gpt applies the .gptinclude patterns first, then excludes any files that match .gptignore patterns.

Changes:

  • Added new -I/--include command-line flag to specify a custom path to .gptinclude file
  • By default, the tool looks for .gptinclude in the repository root
  • Added file processing logic to respect both include and ignore lists
  • Added tests covering the new functionality
  • Updated documentation in README.md with examples
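
The include-then-ignore order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `matchPattern`, `matchAny`, and `shouldInclude` are hypothetical names, the matcher only understands a trailing `/**` plus whatever `path.Match` accepts, and treating an empty include list as "include everything" is an assumption based on the backward-compatibility claim.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPattern reports whether relPath matches one pattern.
// Simplification (assumption): "dir/**" matches everything under dir/;
// any other pattern is tried with path.Match against the full path
// and then against the base name.
func matchPattern(pattern, relPath string) bool {
	if strings.HasSuffix(pattern, "/**") {
		prefix := strings.TrimSuffix(pattern, "/**")
		return strings.HasPrefix(relPath, prefix+"/")
	}
	if ok, _ := path.Match(pattern, relPath); ok {
		return true
	}
	ok, _ := path.Match(pattern, path.Base(relPath))
	return ok
}

// matchAny reports whether relPath matches at least one pattern.
func matchAny(patterns []string, relPath string) bool {
	for _, p := range patterns {
		if matchPattern(p, relPath) {
			return true
		}
	}
	return false
}

// shouldInclude applies the include patterns first, then the ignore patterns.
// An empty include list is treated as "include everything" (assumption).
func shouldInclude(relPath string, include, ignore []string) bool {
	if len(include) > 0 && !matchAny(include, relPath) {
		return false
	}
	return !matchAny(ignore, relPath)
}

func main() {
	include := []string{"src/**"}
	ignore := []string{"src/test/**"}
	for _, p := range []string{"src/main.go", "src/test/main_test.go", "README.md"} {
		fmt.Printf("%-24s %v\n", p, shouldInclude(p, include, ignore))
	}
}
```

Checking the include list before the ignore list means a file has to pass both filters, which matches the "src/** minus src/test/**" scenario in the testing instructions.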

Bug Fix: XML CDATA Handling

Fixed an issue where the XML export would fail with "unexpected EOF in CDATA section" errors when file content contained the CDATA end marker sequence ]]>.

The fix:

  • Detects the ]]> sequence inside file content
  • Splits the content at each marker and emits consecutive CDATA sections (CDATA sections cannot nest), so the original text is preserved
  • Ensures XML output is valid regardless of source content
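
For reviewers unfamiliar with the technique, here is a minimal sketch of splitting content around ]]> using only the Go standard library. It is an illustration under assumptions, not the code in this PR: `wrapCDATA` and `decodeText` are hypothetical names, and the `<file>` element is made up for the round-trip check.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// wrapCDATA wraps content in CDATA. Every "]]>" in the content is split so
// that "]]" closes one section and ">" opens the next one.
func wrapCDATA(content string) string {
	safe := strings.ReplaceAll(content, "]]>", "]]]]><![CDATA[>")
	return "<![CDATA[" + safe + "]]>"
}

// decodeText parses a single element and returns its character data,
// which lets us confirm that the original content survives a round trip.
func decodeText(doc string) (string, error) {
	var v struct {
		Text string `xml:",chardata"`
	}
	err := xml.Unmarshal([]byte(doc), &v)
	return v.Text, err
}

func main() {
	original := "if (a[b[0]]>1) { /* ]]> */ }"
	doc := "<file>" + wrapCDATA(original) + "</file>"
	got, err := decodeText(doc)
	fmt.Println(doc)
	fmt.Println(got == original, err)
}
```

A parser concatenates the adjacent sections back into the original text, so the output keeps the content intact without entity-escaping the whole file.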

Testing Instructions

Testing .gptinclude functionality:

  1. Create a .gptinclude file in a test repository with patterns like:
    *.go
    docs/**
    
  2. Run: git2gpt -o output.txt /path/to/repo
  3. Verify only matching files are included in the output

Testing with both .gptinclude and .gptignore:

  1. Create a .gptinclude file with src/**
  2. Create a .gptignore file with src/test/**
  3. Run: git2gpt -o output.txt /path/to/repo
  4. Verify that the output includes files under src/ but none under src/test/

Testing XML output fix:

  1. Create a file containing the sequence ]]> (common in some code or XML files)
  2. Run: git2gpt -x -o output.xml /path/to/repo
  3. Verify the XML output is generated without errors

Automated Tests

The PR includes a comprehensive test for the .gptinclude functionality:

go test -v ./prompt -run TestGptIncludeAndIgnore

Documentation

README.md has been updated with:

  • Explanation of the .gptinclude feature
  • Usage examples for both .gptinclude and .gptignore
  • Updated command-line flags documentation

Potential Impact

These changes are backward compatible and don't affect existing functionality:

  • Projects using .gptignore will continue to work as before
  • The XML fix ensures more robust output without changing format

imertz added 3 commits April 29, 2025 11:08
- Fixed XML generation to properly handle special characters and CDATA sections
- Added protection against premature CDATA termination by escaping "]]>" sequences
- Improved XML formatting with consistent indentation and structure
- Simplified token placeholder replacement without breaking formatting

This commit adds support for a .gptinclude file, which allows users to
explicitly specify which files should be included in the repository export.
The feature complements the existing .gptignore functionality:

- When both .gptinclude and .gptignore exist, files are first filtered
  by the include patterns, then any matching ignore patterns are excluded
- Added new command-line flag: -I/--include to specify a custom path
  to the .gptinclude file
- Default behavior looks for .gptinclude in repository root
- Added comprehensive tests for the new functionality
- Updated README.md with documentation and examples

With this change, users gain more fine-grained control over which parts
of their repositories are processed by git2gpt, making it easier to focus
on specific areas when working with AI language models.

This commit fixes an issue where the XML export would fail with
"unexpected EOF in CDATA section" errors when file content contained
the CDATA end marker sequence ']]>'.

The fix implements a proper CDATA handling strategy that:
- Detects all occurrences of ']]>' in file content
- Splits the content around these markers
- Creates properly nested CDATA sections to preserve the original content
- Ensures all XML output is well-formed regardless of source content

This approach maintains the efficiency of CDATA for storing large code
blocks while ensuring compatibility with all possible file content.

Fixes the XML validation error that would occur when processing files
containing CDATA end marker sequences.

@chand1012
Owner

Duplicate of #17, closing.

@chand1012 chand1012 closed this Apr 29, 2025