LLMOCR uses a local LLM to read text from images.
You can also change the instruction prompt to have the LLM use the image in any way you describe.
- Local Processing: All processing is done locally on your machine.
- User-Friendly GUI: Includes a graphical interface. All AI functionality is provided by KoboldCpp, a single executable.
- GPU Acceleration: Uses Apple Metal, NVIDIA CUDA, or AMD (Vulkan) hardware when available to greatly speed up inference.
- Cross-Platform: Supports Windows, macOS ARM, and Linux.
- Python 3.8 or higher
- Clone the repository
- Install Python for Windows
- Open KoboldCpp or an OpenAI-compatible API and load a vision model
- Open `llmocr.bat`
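The KoboldCpp instance loaded in the steps above exposes an OpenAI-compatible endpoint, so you can check that the vision model is responding before opening the GUI. A minimal sketch, assuming KoboldCpp's default port 5001, the standard `/v1/chat/completions` route, and OpenAI-style `image_url` message content (verify all of these against your setup; `scan.png` is a placeholder image name):

```shell
# Base64-encode an image and send an OpenAI-style vision request.
# Port 5001 is KoboldCpp's default; adjust if you launched with another.
# (On macOS, use plain `base64` without `-w 0`.)
IMG_B64=$(base64 -w 0 scan.png)

curl -s http://localhost:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(cat <<EOF
{
  "max_tokens": 512,
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Read all of the text in this image."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,${IMG_B64}"}}
      ]
    }
  ]
}
EOF
)"
```

If the server replies with a chat completion rather than an error, the vision model is loaded and reachable.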
- Clone the repository or download and extract the ZIP file
- Install Python 3.8 or higher if not already installed
- Create a new Python environment and install the packages in `requirements.txt`
- Open KoboldCpp or an OpenAI-compatible API with a loaded vision model
- Run `llmocr.py` using Python