A Python tool that automatically renames invoice files based on their content, using OCR. It extracts the invoice date and amount from PDF and JPG files and renames each file in a standardized format: YYYYMMDD_AMOUNT元.pdf, where 元 means yuan.
Features:
- Supports PDF and JPG/JPEG files
- Extracts the invoice date and amount using PaddleOCR
- Processes a single file or an entire directory
- Writes renamed files to a separate 'rename' directory
- Supports the Chinese invoice format
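Once OCR has turned an invoice into text lines, the date and amount still have to be picked out of them. The sketch below shows one way to do that with regular expressions. It is not the project's actual parser. The patterns are assumptions: a date written as 2025年03月15日, amounts prefixed with a half-width (¥) or full-width (￥) yuan sign, and the largest such amount being the tax-inclusive total. The names `extract_date`, `extract_amount`, `DATE_RE` and `AMOUNT_RE` are hypothetical.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

# Assumed patterns (hypothetical, tune to your invoices): a date written as
# 2025年03月15日 and amounts prefixed with a half- or full-width yuan sign.
DATE_RE = re.compile(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日")
AMOUNT_RE = re.compile(r"[¥￥]\s*(\d+(?:,\d{3})*(?:\.\d{1,2})?)")


def extract_date(lines: list[str]) -> Optional[date]:
    """Return the first valid YYYY年MM月DD日 date found in the OCR lines."""
    for line in lines:
        for year, month, day in DATE_RE.findall(line):
            try:
                return date(int(year), int(month), int(day))
            except ValueError:
                continue  # OCR misread such as month 13; keep looking
    return None


def extract_amount(lines: list[str]) -> Optional[Decimal]:
    """Return the largest yuan amount, assumed to be the tax-inclusive total."""
    amounts = [
        Decimal(raw.replace(",", ""))  # drop thousands separators
        for line in lines
        for raw in AMOUNT_RE.findall(line)
    ]
    return max(amounts, default=None)
```

Taking the largest amount is a heuristic. On a typical invoice the pre-tax subtotal and the tax are both smaller than the total, but a layout-aware rule, such as reading the value next to 价税合计, may be more robust.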
Requirements:
- Python 3.11 or higher
- uv package manager
- macOS (the installation instructions are currently written for macOS)
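Installation (macOS):
```bash
# Install the uv package manager
pip install uv

# Get the source
git clone <repository-url>
cd invoice_renamer

# Create the virtual environment and install the dependencies from pyproject.toml
uv venv
uv sync

# Install the CPU build of PaddlePaddle from the PaddlePaddle package index, then PaddleOCR
uv pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
uv add paddleocr

# Install ccache with Homebrew
brew install ccache
```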
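After installation, a small script can confirm that the pieces the tool relies on are present. It is not part of the project. The file name (for example check_env.py, run with `uv run python check_env.py`) and the function name `check_environment` are hypothetical. The script assumes that the paddlepaddle wheel imports as `paddle`, and that pdf2image needs Poppler's `pdftoppm` on the PATH.

```python
import importlib.util
import shutil

# Import names, not package names: the paddlepaddle wheel provides `paddle`.
MODULES = ("paddle", "paddleocr", "pdf2image")


def check_environment() -> dict[str, bool]:
    """Report which runtime pieces are importable or available on PATH."""
    status = {name: importlib.util.find_spec(name) is not None for name in MODULES}
    # pdf2image shells out to Poppler's pdftoppm to rasterize PDF pages.
    status["pdftoppm"] = shutil.which("pdftoppm") is not None
    return status


for name, ok in check_environment().items():
    print(f"{name:10} {'ok' if ok else 'MISSING'}")
```

`find_spec` only checks that a module can be located, without importing it. This keeps the check fast even though importing paddle itself is slow.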
Usage:
```bash
# Rename a single invoice
python main.py path/to/invoice.pdf

# Rename every supported invoice in a directory
python main.py path/to/directory
```
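The single-file and directory modes can be sketched with argparse and pathlib as follows. This is only an illustration, not the contents of main.py. The names `parse_args`, `collect_invoices` and `SUPPORTED` are hypothetical. The sketch also assumes that directory mode is non-recursive and that extensions are matched case-insensitively.

```python
import argparse
from pathlib import Path

# Extensions the README lists as supported (compared in lowercase).
SUPPORTED = {".pdf", ".jpg", ".jpeg"}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a single positional path: an invoice file or a directory."""
    parser = argparse.ArgumentParser(description="Rename invoices by date and amount.")
    parser.add_argument("path", type=Path, help="invoice file or directory of invoices")
    return parser.parse_args(argv)


def collect_invoices(path: Path) -> list[Path]:
    """Return the file itself, or a directory's supported files (not recursive)."""
    if path.is_file():
        return [path] if path.suffix.lower() in SUPPORTED else []
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

Because only regular files are collected, a 'rename' subdirectory left over from an earlier run is skipped automatically.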
Renamed files follow the pattern YYYYMMDD_AMOUNT元.pdf: the invoice date, an underscore, then the amount in yuan (元) with two decimal places.
Example: 20250315_683.00元.pdf
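The naming rule, plus writing the result into a 'rename' directory, could look like the sketch below. Several details are assumptions and may differ from the tool's real behavior: the 'rename' directory sits next to each source file, files are copied rather than moved, the original extension is kept (lowercased), and duplicate names get a numeric suffix. `build_name` and `copy_renamed` are hypothetical names.

```python
import shutil
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from pathlib import Path


def build_name(invoice_date: date, amount: Decimal, suffix: str = ".pdf") -> str:
    """Format as YYYYMMDD_AMOUNT元<suffix>, with the amount to two decimals."""
    cents = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{invoice_date:%Y%m%d}_{cents}元{suffix}"


def copy_renamed(src: Path, invoice_date: date, amount: Decimal) -> Path:
    """Copy src into a 'rename' folder beside it under the standardized name."""
    out_dir = src.parent / "rename"
    out_dir.mkdir(exist_ok=True)
    base = build_name(invoice_date, amount, suffix="")
    suffix = src.suffix.lower()
    target = out_dir / f"{base}{suffix}"
    counter = 1
    while target.exists():
        # Two invoices with the same date and amount would otherwise collide.
        target = out_dir / f"{base}_{counter}{suffix}"
        counter += 1
    shutil.copy2(src, target)  # copy2 also preserves file timestamps
    return target
```

Using `Decimal` rather than `float` keeps amounts such as 683.00 exact when they are formatted into the file name.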
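Main dependencies (see pyproject.toml for the full list):
- paddleocr >= 2.10.0
- pdf2image >= 1.17.0 (requires the Poppler utilities, e.g. brew install poppler on macOS)
- argparse >= 1.4.0 (argparse is already part of the Python 3.11 standard library, so this PyPI package is not strictly needed)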
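The two OCR-related dependencies could fit together as in the sketch below. It is not the project's code. It assumes the paddleocr 2.x interface, matching the >= 2.10.0 pin: `PaddleOCR(...).ocr(image, cls=True)` returns one entry per image, and each entry is a list of `[box, (text, score)]` items, or `None` when no text is found. Newer PaddleOCR releases changed this interface, so check it against the installed version. The sketch also assumes that pdf2image can find Poppler. `texts_from_result` and `ocr_lines` are hypothetical names.

```python
from pathlib import Path


def texts_from_result(result) -> list[str]:
    """Flatten a PaddleOCR 2.x result (one entry per image, each a list of
    [box, (text, score)] items) into plain text lines."""
    texts: list[str] = []
    for page in result or []:
        # PaddleOCR 2.x yields None for an image with no detected text.
        for _box, (text, _score) in page or []:
            texts.append(text)
    return texts


def ocr_lines(path: Path, ocr) -> list[str]:
    """OCR a PDF or image file with an existing PaddleOCR 2.x instance."""
    import numpy as np
    from pdf2image import convert_from_path  # requires Poppler's pdftoppm

    if path.suffix.lower() == ".pdf":
        images = [
            # PIL yields RGB; flip to the BGR order cv2.imread would produce.
            np.ascontiguousarray(np.array(page.convert("RGB"))[:, :, ::-1])
            for page in convert_from_path(str(path), dpi=200)
        ]
    else:
        images = [str(path)]  # PaddleOCR can read image files from a path
    lines: list[str] = []
    for image in images:
        lines.extend(texts_from_result(ocr.ocr(image, cls=True)))
    return lines
```

The OCR engine is passed in rather than created inside `ocr_lines`, because loading the models is the slow part. A caller would build it once, for example `PaddleOCR(use_angle_cls=True, lang="ch")`, and reuse it for every file in a directory.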
This project is licensed under the MIT License - see the LICENSE file for details.