Title duplication problem in markdown format #249

Open
Amoming opened this issue Apr 9, 2025 · 4 comments
Labels
enhancement New feature or request fix developed

Comments


Amoming commented Apr 9, 2025

Here is the parsed text: Image
Original text: Image
Basically, this happens in titles: the title text is duplicated in the Markdown output.
Version: 0.0.20
PDF file attached:

1219829161.PDF

@JorjMcKie
Collaborator

This file contains text portions that simulate bold by writing the same text repeatedly with small offsets. This technique is used when separate, explicitly bold fonts are deliberately avoided.
The problem will be fixed in a future version of (Py)MuPDF, which will be able to detect most fake-bold situations.
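
To illustrate the fake-bold pattern described above, here is a minimal sketch of how duplicated spans could be filtered out after extraction. It is not how MuPDF detects fake bold. The `(text, x, y)` span tuples, the `drop_fake_bold` helper, and the 1-point tolerance are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical simplified span: (text, x, y) with the origin in points.
Span = Tuple[str, float, float]


def drop_fake_bold(spans: List[Span], tol: float = 1.0) -> List[Span]:
    """Keep only the first of any spans that repeat the same text
    at an origin within `tol` points of an already kept span."""
    kept: List[Span] = []
    for text, x, y in spans:
        is_duplicate = any(
            text == k_text and abs(x - k_x) <= tol and abs(y - k_y) <= tol
            for k_text, k_x, k_y in kept
        )
        if not is_duplicate:
            kept.append((text, x, y))
    return kept


spans = [
    ("Title", 72.0, 100.0),      # original draw
    ("Title", 72.3, 100.0),      # redrawn with a small offset (fake bold)
    ("Title", 72.6, 100.2),      # redrawn again
    ("Body text", 72.0, 130.0),
    ("Title", 72.0, 400.0),      # same text far away: a genuine repeat
]
result = drop_fake_bold(spans)
for span in result:
    print(span)
```

Comparing positions as well as text keeps legitimate repeats, such as the same heading appearing elsewhere on the page, while dropping the overlapping copies.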

JorjMcKie added the enhancement (New feature or request) label Apr 9, 2025
Author

Amoming commented Apr 9, 2025

Thanks for your reply! ❤❤ I'm really looking forward to the new PyMuPDF features. Could you tell me when, and in which version, this fix will be available?

@JorjMcKie
Collaborator

No fixed date yet, but the MuPDF team is currently finalizing the important version 1.26.0.
Usually, a new PyMuPDF version follows just a few days after a MuPDF release.

@JorjMcKie
Collaborator

Once the new PyMuPDF version that includes MuPDF v1.26.0 is released, this issue will automatically be fixed in PyMuPDF4LLM.
