itvincent-git/invoice_renamer

Invoice Renamer

Description

A Python tool that automatically renames invoice files based on their content using OCR technology. The tool extracts the invoice date and amount from PDF/JPG files and renames them in a standardized format: YYYYMMDD_AMOUNT元.pdf.

Features

  • Supports PDF and JPG/JPEG file formats
  • Extracts invoice date and amount using PaddleOCR
  • Processes single files or entire directories
  • Creates renamed files in a separate 'rename' directory
  • Supports Chinese invoice format
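As a rough illustration of the extraction step, the sketch below pulls a date and an amount out of text lines that an OCR engine has already recognized. It does not reproduce this project's actual parsing logic: the function name `parse_invoice_fields`, the regular expressions, and the rule "take the largest ¥ amount as the total" are all assumptions made for the example.

```python
import re
from decimal import Decimal

# Assumed patterns for a Chinese invoice: a date written as 2025年03月15日
# and amounts prefixed with a yuan sign (half-width or full-width).
DATE_RE = re.compile(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日")
AMOUNT_RE = re.compile(r"[¥￥]\s*(\d+(?:\.\d{1,2})?)")


def parse_invoice_fields(lines):
    """Return (date, amount) as strings, or None for a field that is not found.

    `lines` is a list of text lines already produced by OCR (hypothetical input).
    """
    date = None
    amounts = []
    for line in lines:
        if date is None:
            match = DATE_RE.search(line)
            if match:
                year, month, day = match.groups()
                date = f"{year}{int(month):02d}{int(day):02d}"
        amounts.extend(Decimal(a) for a in AMOUNT_RE.findall(line))
    # Assumption: the tax-inclusive total is the largest amount on the invoice.
    amount = f"{max(amounts):.2f}" if amounts else None
    return date, amount
```

Using `Decimal` rather than `float` keeps the amount exactly as printed, which matters when the value ends up in a file name.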

Prerequisites

  • Python 3.11 or higher
  • uv package manager
  • macOS (the installation instructions below are written for macOS)

Installation

1. Install uv (Package Manager)

pip install uv

2. Clone and Setup Project

git clone <repository-url>
cd invoice_renamer
uv venv
uv sync

3. Install PaddlePaddle OCR Components

Install PaddlePaddle Core (macOS)

uv pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

Install PaddleOCR

uv add paddleocr

Install Additional Dependencies

brew install ccache

Usage

Process Single File

python main.py path/to/invoice.pdf

Process Entire Directory

python main.py path/to/directory
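Both invocations can be served by one helper that turns the path argument into a list of files to process. The following is a minimal sketch under stated assumptions, not the code in `main.py`: the name `collect_invoice_files` is hypothetical, and the choice to scan only the top level of a directory (no recursion) is an assumption.

```python
from pathlib import Path

# Extensions listed under Features; matching is case-insensitive.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg"}


def collect_invoice_files(target):
    """Return the supported invoice files for a single file or a directory."""
    path = Path(target)
    if path.is_file():
        return [path] if path.suffix.lower() in SUPPORTED_SUFFIXES else []
    if path.is_dir():
        # Assumption: only the directory's direct children are processed.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    # Path does not exist: nothing to process.
    return []
```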

Output Format

Files will be renamed to: YYYYMMDD_AMOUNT元.pdf

Example: 20250315_683.00元.pdf
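The naming rule can be sketched as a small helper that builds the target name and copies the file into a `rename` directory. This is an illustration only: the function names are hypothetical, and both placing `rename` next to the source file and keeping the source file's own extension for JPG input are assumptions that the description above does not confirm.

```python
import shutil
from pathlib import Path


def build_target_name(date, amount, suffix=".pdf"):
    """Build a name such as 20250315_683.00元.pdf from already-extracted fields."""
    return f"{date}_{amount}元{suffix}"


def copy_renamed(src, date, amount, out_dir_name="rename"):
    """Copy `src` into a sibling `rename` directory under its new name."""
    src = Path(src)
    # Assumption: the output directory lives next to the source file.
    out_dir = src.parent / out_dir_name
    out_dir.mkdir(exist_ok=True)
    # Assumption: the original extension is preserved (lower-cased).
    dest = out_dir / build_target_name(date, amount, src.suffix.lower())
    shutil.copy2(src, dest)  # the original file is left untouched
    return dest
```

Copying instead of moving means a misread date or amount never destroys the original invoice.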

Requirements

See pyproject.toml for detailed dependencies:

  • paddleocr >= 2.10.0
  • pdf2image >= 1.17.0
  • argparse >= 1.4.0

License

This project is licensed under the MIT License - see the LICENSE file for details.