🧠 DocMind AI: Local LLM for AI-Powered Document Analysis

DocMind AI transforms how you analyze documents locally with zero cloud dependency. This system combines hybrid search (dense + sparse embeddings), knowledge graph extraction, and a sophisticated 5-agent coordination system to extract and analyze information from your PDFs, Office docs, and multimedia content. Built on LlamaIndex pipelines with LangGraph supervisor orchestration and Qwen3-4B-Instruct-2507's FULL 262K context capability through INT8 KV cache optimization, it delivers production-ready document intelligence that runs entirely on your hardware—with GPU acceleration for enhanced performance and specialized agent coordination for improved query quality.

Why DocMind AI? Traditional document analysis tools either send your data to the cloud (privacy risk) or provide basic keyword search (limited intelligence). DocMind AI gives you the best of both worlds: AI reasoning with complete data privacy. Process complex queries that require multiple reasoning strategies, extract entities and relationships, and get contextual answers—all while your sensitive documents never leave your machine.

✨ Features of DocMind AI

Privacy-Focused: Local processing ensures data security without cloud dependency.
Versatile Document Handling: Supports multiple file formats:
- 📄 PDF
- 📑 DOCX
- 📝 TXT
- 📊 XLSX
- 🌐 MD (Markdown)
- 🗃️ JSON
- 🗂️ XML
- 🔤 RTF
- 📇 CSV
- 📧 MSG (Email)
- 🖥️ PPTX (PowerPoint)
- 📘 ODT (OpenDocument Text)
- 📚 EPUB (E-book)
- 💻 Code files (PY, JS, JAVA, TS, TSX, C, CPP, H, and more)
Multi-Agent Coordination: LangGraph supervisor coordinating 5 specialized agents: query router, query planner, retrieval expert, result synthesizer, and response validator.
LlamaIndex RAG Pipeline: QueryPipeline with async/parallel processing, ingestion pipelines, and caching.
Hybrid Retrieval: RRF fusion (α=0.7) combining BGE-Large dense and SPLADE++ sparse embeddings for 15-20% better recall.
Knowledge Graph Integration: spaCy entity extraction with relationship mapping for complex queries.
Multimodal Processing: Unstructured hi-res parsing for PDFs with text, tables, and images using Jina v4 embeddings.
ColBERT Reranking: Late-interaction reranking improves context quality by 20-30%.
Offline-First Design: 100% local processing with no external API dependencies.
GPU Acceleration: CUDA support with mixed precision and FP8 quantization via vLLM FlashInfer backend for optimized performance.
Session Persistence: SQLite WAL with local multi-process support for concurrent access.
Docker Support: Easy deployment with Docker and Docker Compose.
Intelligent Caching: High-performance document processing cache for rapid re-analysis.
Robust Error Handling: Reliable retry strategies with exponential backoff.
Structured Logging: Contextual logging with automatic rotation and JSON output.
Simple Configuration: Environment variables and Streamlit native config for easy setup.

📖 Table of Contents

🧠 DocMind AI: Local LLM for AI-Powered Document Analysis

🚀 Getting Started with DocMind AI

📋 Prerequisites

Ollama installed and running locally.
Python 3.11+ (tested with 3.11, 3.12)
(Optional) Docker and Docker Compose for containerized deployment.
(Optional) NVIDIA GPU (e.g., RTX 4090 Laptop) with at least 16GB VRAM for 262K context capability and accelerated performance.

⚙️ Installation

Clone the repository:

git clone https://github.com/BjornMelin/docmind-ai-llm.git
cd docmind-ai-llm

Install dependencies:
```
uv sync
```
Key Dependencies Included:
- LlamaIndex Core: RAG framework with QueryPipeline patterns
- LangGraph (0.5.4): 5-agent supervisor orchestration with langgraph-supervisor library
- Streamlit (1.48.0): Web interface framework
- Ollama (0.5.1): Local LLM integration
- Qdrant Client (1.15.1): Vector database operations
- FastEmbed (0.3.0+): High-performance embeddings
- Tenacity (8.0.0+): Retry strategies with exponential backoff
- Loguru (0.7.0+): Structured logging
- Pydantic (2.11.7): Data validation and settings
Install spaCy language model:

DocMind AI uses spaCy for named entity recognition and linguistic analysis. Install the English language model:
```
# Install the small English model (recommended, ~15MB)
uv run python -m spacy download en_core_web_sm

# Optional: Install larger models for better accuracy
# Medium model (~50MB): uv run python -m spacy download en_core_web_md
# Large model (~560MB): uv run python -m spacy download en_core_web_lg
```
Note: spaCy models are downloaded and cached locally. The application will automatically attempt to download en_core_web_sm if not found, but manual installation ensures offline functionality.
Set up environment configuration:

Copy the example environment file and configure your settings:
```
cp .env.example .env
# Edit .env with your preferred settings
```

(Optional) Install GPU support for RTX 4090 with vLLM FlashInfer:

RECOMMENDED: vLLM FlashInfer Stack for Qwen3-4B-Instruct-2507-FP8 with 128K context:

# Phase 1: Verify CUDA installation
nvcc --version  # Should show CUDA 12.8+
nvidia-smi     # Verify RTX 4090 detection

# Phase 2: Install PyTorch 2.7.1 with CUDA 12.8 (DEFINITIVE - TESTED APPROACH)
uv pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
    --extra-index-url https://download.pytorch.org/whl/cu128

# Phase 3: Install vLLM with FlashInfer support (includes FlashInfer automatically)
uv pip install "vllm[flashinfer]>=0.10.1" \
    --extra-index-url https://download.pytorch.org/whl/cu128

# Phase 4: Install remaining GPU dependencies
uv sync --extra gpu

# Phase 5: Verify installation
python -c "import vllm; import torch; print(f'vLLM: {vllm.__version__}, PyTorch: {torch.__version__}')"

Hardware Requirements:

NVIDIA RTX 4090 (16GB VRAM minimum for 128K context)
CUDA Toolkit 12.8+
NVIDIA Driver 550.54.14+
Compute Capability 8.9 (RTX 4090)

Performance Targets Achieved:

100-160 tok/s decode speed (typical: 120-180 with FlashInfer)
800-1300 tok/s prefill speed (typical: 900-1400 with RTX 4090)
FP8 quantization for optimal 16GB VRAM usage (12-14GB typical)
128K context support with INT8 KV cache optimization

Fallback Installation (if FlashInfer fails):

# Fallback: vLLM CUDA-only installation with PyTorch 2.7.1
uv pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
    --extra-index-url https://download.pytorch.org/whl/cu128
uv pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
uv sync --extra gpu

See GPU Setup Guide for detailed configuration and troubleshooting.

▶️ Running the App

Locally:

streamlit run app.py

With Docker:

docker-compose up --build

Access the app at http://localhost:8501.

💻 Usage

🎛️ Selecting a Model

Start Ollama service (if not already running):
```
ollama serve
```
Enter the Ollama Base URL (default: http://localhost:11434).
Select an Ollama Model Name (e.g., qwen3-4b-instruct-2507 for 128K context). If the model isn't installed:
```
ollama pull qwen3-4b-instruct-2507
```
Toggle "Use GPU if available" for accelerated processing (recommended for NVIDIA GPUs with 4GB+ VRAM).
Adjust Context Size based on your model and hardware:
- 2048: Small models, limited VRAM
- 4096: Standard setting for most use cases
- 8192+: Large models with sufficient resources
- 262144: FULL 262K context with INT8 KV cache (Qwen3-4B-Instruct-2507 + 16GB VRAM)

📁 Uploading Documents

Upload one or more documents via the "Browse files" button. Supported formats include PDF, DOCX, TXT, and more (see Features). PDF previews include first-page images for multimodal support.

✍️ Choosing Prompts

Select a pre-defined prompt or create a custom one:

Comprehensive Document Analysis: Summary, key insights, action items, and open questions.
Extract Key Insights and Action Items: Focus on insights and actionable outcomes.
Summarize and Identify Open Questions: Generate summaries and highlight unresolved questions.
Custom Prompt: Define your own analysis prompt.

😃 Selecting Tone

Choose the desired tone for LLM responses:

Professional: Formal and objective.
Academic: Scholarly and research-focused.
Informal: Casual and conversational.
Creative: Imaginative and expressive.
Neutral: Balanced and unbiased.
Direct: Concise and straightforward.
Empathetic: Compassionate and understanding.
Humorous: Lighthearted and witty.
Authoritative: Confident and expert-like.
Inquisitive: Curious and exploratory.

🧮 Selecting Instructions

Select the LLM's role or provide custom instructions:

General Assistant: Helpful and versatile.
Researcher: Deep, analytical insights.
Software Engineer: Technical and code-focused.
Product Manager: Strategic and user-centric.
Data Scientist: Data-driven analysis.
Business Analyst: Business and strategic focus.
Technical Writer: Clear and concise documentation.
Marketing Specialist: Branding and engagement-oriented.
HR Manager: Human resources perspective.
Legal Advisor: Legal and compliance-focused.
Custom Instructions: Specify your own role or instructions.

📏 Setting Length/Detail

Select the desired output length and detail:

Concise: Brief and to-the-point.
Detailed: Thorough and in-depth.
Comprehensive: Extensive and exhaustive.
Bullet Points: Structured list format.

🗂️ Choosing Analysis Mode

Choose how documents are analyzed:

Analyze each document separately: Individual analysis for each file.
Combine analysis for all documents: Holistic analysis across all uploaded files.

🧠 Analyzing Documents

Upload documents.
Configure analysis options (prompt, tone, instructions, length, mode).
Enable Chunked Analysis for large documents, Late Chunking for accuracy, or Multi-Vector Embeddings for enhanced retrieval.
Click "Extract and Analyze" to process.

Results include summaries, insights, action items, and open questions, exportable as JSON or Markdown.

💬 Interacting with the LLM

Use the chat interface to ask follow-up questions. The LLM leverages hybrid search (Jina v4 dense + FastEmbed SPLADE++ sparse) with submodular-optimized reranking for context-aware, high-quality responses.

🔧 API Usage Examples

Programmatic Document Analysis

import asyncio
from pathlib import Path
from models import AppSettings
from src.utils.document import load_documents_unstructured
from src.utils.embedding import create_index_async
from agent_factory import get_agent_system

async def analyze_document(file_path: str, query: str):
    """Example: Analyze a document programmatically."""
    settings = AppSettings()
    
    # Load and process document
    documents = await load_documents_unstructured([Path(file_path)], settings)
    index = await create_index_async(documents, settings)
    
    # Create agent system
    agent_system = get_agent_system(index, settings)
    
    # Run analysis
    response = await agent_system.arun(query)
    return response

# Usage
async def main():
    result = await analyze_document(
        "path/to/document.pdf", 
        "Summarize the key findings and action items"
    )
    print(result)

asyncio.run(main())

Custom Configuration

from models import AppSettings
import os

# Override default settings
os.environ["DEFAULT_MODEL"] = "llama3.2"
os.environ["GPU_ACCELERATION"] = "true"
os.environ["ENABLE_COLBERT_RERANKING"] = "true"

settings = AppSettings()
print(f"Using model: {settings.default_model}")
print(f"GPU enabled: {settings.gpu_acceleration}")

Batch Document Processing

import asyncio
from pathlib import Path
from models import AppSettings
from src.utils.document import load_documents_unstructured
from src.utils.embedding import create_index_async

async def process_document_folder(folder_path: str):
    """Process all supported documents in a folder."""
    settings = AppSettings()
    
    # Find all supported documents
    folder = Path(folder_path)
    supported_extensions = {'.pdf', '.docx', '.txt', '.md', '.json'}
    documents_paths = [
        f for f in folder.rglob("*") 
        if f.suffix.lower() in supported_extensions
    ]
    
    if not documents_paths:
        print("No supported documents found")
        return
    
    print(f"Processing {len(documents_paths)} documents...")
    
    # Load and index all documents
    documents = await load_documents_unstructured(documents_paths, settings)
    index = await create_index_async(documents, settings)
    
    print("Documents processed and indexed successfully!")
    return index

# Usage
asyncio.run(process_document_folder("/path/to/documents"))

🏗️ Architecture

graph TD
    A[Document Upload<br/>Streamlit UI] --> B[Unstructured Parser<br/>hi-res parsing]
    B --> C[Text + Images + Tables<br/>Multimodal Content]
    C --> D[LlamaIndex Ingestion Pipeline<br/>Document Processing]
    
    D --> E[SentenceSplitter<br/>1024 tokens / 200 overlap]
    D --> F[spaCy NLP Pipeline<br/>Entity Recognition]
    
    E --> G[Multi-Modal Embeddings]
    F --> H[Knowledge Graph Builder<br/>Entity Relations]
    
    G --> I[Dense: BGE-Large 1024D<br/>Sparse: SPLADE++ FastEmbed<br/>Multimodal: Jina v4 512D]
    I --> J[Qdrant Vector Store<br/>RRF Fusion α=0.7]
    
    H --> K[Knowledge Graph Index<br/>NetworkX Relations]
    
    J --> L[LlamaIndex QueryPipeline<br/>Multi-Stage Processing]
    K --> L
    
    L --> M[LangGraph Supervisor System<br/>5-Agent Coordination]
    M --> N[5 Specialized Agents:<br/>• Query Router<br/>• Query Planner<br/>• Retrieval Expert<br/>• Result Synthesizer<br/>• Response Validator]
    
    N --> O[ColBERT Reranker<br/>Late Interaction top-5]
    O --> P[Local LLM Backend<br/>Ollama/LM Studio/LlamaCpp]
    P --> Q[Supervisor Coordination<br/>Agent-to-Agent Handoffs]
    Q --> R[Response Synthesis<br/>Quality Validation]
    R --> S[Streamlit Interface<br/>Chat + Analysis Results]
    
    T[SQLite WAL Database<br/>Session Persistence] <--> M
    T <--> L
    U[DiskCache<br/>Document Processing] <--> D
    V[GPU Acceleration<br/>CUDA + Mixed Precision] <--> I
    V <--> O
    W[Human-in-the-Loop<br/>Agent Interrupts] <--> M
    
    subgraph "Local Infrastructure"
        P
        T
        U
        J
    end
    
    subgraph "AI Processing"
        I
        O
        M
        N
    end

🛠️ Implementation Details

Document Processing Pipeline

Parsing: Unstructured hi-res strategy extracts text, tables, and images from PDFs/Office docs with OCR support
Chunking: LlamaIndex SentenceSplitter with 1024-token chunks and 200-token overlap for optimal context
Metadata: spaCy en_core_web_sm for entity extraction and relationship mapping

Hybrid Retrieval Architecture

Dense Embeddings: BGE-Large 1024D (BAAI/bge-large-en-v1.5) for semantic similarity
Sparse Embeddings: SPLADE++ with FastEmbed for neural lexical matching and term expansion
Multimodal: Jina v4 512D embeddings for images and mixed content with int8 quantization
Fusion: RRF (Reciprocal Rank Fusion) with α=0.7 weighting for optimal dense/sparse balance
Storage: Qdrant vector database with metadata filtering and concurrent access

Multi-Agent Coordination

Supervisor Pattern: LangGraph supervisor using langgraph-supervisor library for proven coordination patterns with automatic state management
5 Specialized Agents:
- Query Router: Analyzes query complexity and determines optimal retrieval strategy
- Query Planner: Decomposes complex queries into manageable sub-tasks for better processing
- Retrieval Expert: Executes optimized retrieval with DSPy query optimization and optional GraphRAG for relationships
- Result Synthesizer: Combines and reconciles results from multiple retrieval passes with deduplication
- Response Validator: Validates response quality, accuracy, and completeness before final output
Enhanced Capabilities: DSPy automatic query optimization (20-30% quality improvement) and optional GraphRAG for multi-hop reasoning
Workflow Coordination: Supervisor automatically routes between agents based on query complexity with <300ms coordination overhead
Session Management: SQLite WAL database with built-in conversation context preservation and error recovery
Async Execution: Concurrent agent operations with automatic resource management and fallback mechanisms

Performance Optimizations

GPU Acceleration: CUDA support with FP8 quantization via vLLM FlashInfer backend and torch.compile optimization
Async Processing: QueryPipeline with parallel execution and intelligent caching
Reranking: ColBERT late-interaction model improves top-5 results from top-20 prefetch
Memory Management: Quantization and model size auto-selection based on available VRAM

⚙️ Configuration

DocMind AI uses a simple, distributed configuration approach optimized for local desktop applications:

Environment Variables: Runtime configuration via .env file
Streamlit Native Config: UI settings via .streamlit/config.toml
Library Defaults: Sensible defaults from LlamaIndex, Qdrant, etc.
Feature Flags: Boolean environment variables for experimental features

Configuration Philosophy

Following KISS principles, configuration is intentionally simple and distributed rather than centralized, avoiding over-engineering for a single-user local application.

Environment Variables

DocMind AI uses environment variables for configuration. Copy the example file and customize:

cp .env.example .env

Key configuration options in .env:

# Model & Backend Services
DOCMIND_MODEL=Qwen/Qwen3-4B-Instruct-2507
DOCMIND_DEVICE=cuda
DOCMIND_CONTEXT_LENGTH=262144
LMDEPLOY_HOST=http://localhost:23333

# Embedding Models (BGE-M3 unified)
EMBEDDING_MODEL=BAAI/bge-m3
RERANKER_MODEL=BAAI/bge-reranker-v2-m3

# Feature Flags
ENABLE_DSPY_OPTIMIZATION=true
ENABLE_GRAPHRAG=false
ENABLE_GPU_ACCELERATION=true
LMDEPLOY_QUANT_POLICY=fp8  # FP8 KV cache

# Performance Tuning
RETRIEVAL_TOP_K=10
RERANK_TOP_K=5
CACHE_SIZE_LIMIT=1073741824  # 1GB

See the complete .env.example file for all available configuration options.

Additional Configuration

Streamlit UI Configuration (.streamlit/config.toml):

[theme]
base = "light"
primaryColor = "#FF4B4B"

[server]
maxUploadSize = 200

Cache Configuration (automatic via LlamaIndex):

Document processing cache: ./cache/documents (1GB limit)
Embedding cache: In-memory with LRU eviction
Model cache: Automatic via Hugging Face transformers

📊 Performance Benchmarks

Performance Metrics

Operation	Performance	Notes
Document Processing (Cold)	~15-30 seconds	50-page PDF with GPU acceleration
Document Processing (Warm)	~2-5 seconds	DiskCache + index caching
Query Response	1-3 seconds	Hybrid retrieval + ColBERT reranking
5-Agent System Response	3-8 seconds	LangGraph supervisor coordination with <200ms overhead
128K Context Processing	1.5-3 seconds	128K context with FP8 KV cache
Vector Search	<500ms	Qdrant in-memory with GPU embeddings
Test Suite (99 tests)	~40 seconds	Comprehensive coverage
Memory Usage (Idle)	400-500MB	Base application
Memory Usage (Processing)	1.2-2.1GB	During document analysis
GPU Memory Usage	~12-14GB	Model + 128K context + embedding cache

Caching Performance

Document Processing Cache:

Cache hit ratio: 85-90% for repeated documents
Storage efficiency: ~1GB handles 1000+ documents
Cache invalidation: Automatic based on file content + settings hash
Concurrent access: Multi-process safe with WAL mode

Hybrid Search Performance

Retrieval Quality Metrics:

Dense + Sparse RRF: 15-20% better recall vs single-vector
ColBERT Reranking: 20-30% context quality improvement
Top-K Retrieval: <2 seconds for 10K document corpus
Knowledge Graph: Entity extraction <1 second per document

System Resource Usage

Memory Profile:

Base application: ~400MB
Document processing: +500-900MB (depends on file size)
Embedding cache: ~200MB for 1000 documents
GPU memory: 8-16GB (model dependent)

Disk Usage:

Application: ~50MB
Document cache: Configurable (default 1GB limit)
Vector database: ~100MB per 1000 documents
Model weights: 2-8GB (embedding + reranking models)

Scalability Benchmarks

Document Count	Processing Time	Query Time	Memory Usage
100 docs	5 minutes	<1 second	800MB
1,000 docs	45 minutes	<2 seconds	1.2GB
5,000 docs	3.5 hours	<5 seconds	2.1GB
10,000 docs	7 hours	<8 seconds	3.5GB

Benchmarks performed on RTX 4090 Laptop GPU, 16GB RAM, NVMe SSD

🔧 Offline Operation

DocMind AI is designed for complete offline operation:

Prerequisites for Offline Use

Install Ollama locally:

# Download from https://ollama.com/download
ollama serve  # Start the service

Pull required models:

ollama pull qwen3-4b-instruct-2507  # Recommended for 128K context
ollama pull qwen2:7b  # Alternative lightweight model

Verify GPU setup (optional):

nvidia-smi  # Check GPU availability
python scripts/gpu_validation.py  # Validate CUDA setup

Model Requirements

Model Size	RAM Required	GPU VRAM	Performance	Context
4B (qwen3-4b-instruct-2507-fp8)	16GB+	12-14GB	Best	128K
7B (e.g., qwen2:7b)	8GB+	4GB+	Good	32K
13B	16GB+	8GB+	Better	32K

🛠️ Troubleshooting

Common Issues

1. Ollama Connection Errors

# Check if Ollama is running
curl http://localhost:11434/api/version

# If not running, start it
ollama serve

2. GPU Not Detected

# Install GPU dependencies
uv sync --extra gpu

# Verify CUDA installation
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"

3. Model Download Issues

# Pull models manually
ollama pull qwen3-4b-instruct-2507  # For 128K context
ollama pull qwen2:7b  # Alternative
ollama list  # Verify installation

4. Memory Issues

Reduce context size in UI (262144 → 32768 → 4096)
Use smaller models (7B instead of 4B for lower VRAM)
Enable document chunking for large files
Close other applications to free RAM

5. Document Processing Errors

# Check supported formats
echo "Supported: PDF, DOCX, TXT, XLSX, CSV, JSON, XML, MD, RTF, MSG, PPTX, ODT, EPUB"

# For unsupported formats, convert to PDF first

6. vLLM FlashInfer Installation Issues

# Check CUDA compatibility
nvcc --version  # Should show CUDA 12.8+
nvidia-smi     # Should show RTX 4090 and compatible driver

# Clean installation if issues occur
uv pip uninstall torch torchvision torchaudio vllm flashinfer-python -y
uv pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
    --extra-index-url https://download.pytorch.org/whl/cu128
uv pip install "vllm[flashinfer]>=0.10.1" \
    --extra-index-url https://download.pytorch.org/whl/cu128

# Test FlashInfer availability
python -c "import vllm; print('vLLM with FlashInfer imported successfully')"

7. PyTorch 2.7.1 Compatibility Issues

RESOLVED: PyTorch 2.7.1 compatibility was confirmed in vLLM v0.10.0+ (July 2025). Current project uses vLLM>=0.10.1.

# Verify versions
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import vllm; print(f'vLLM: {vllm.__version__}')"

# If using older vLLM, upgrade:
uv pip install --upgrade "vllm[flashinfer]>=0.10.1"

8. GPU Memory Issues (16GB RTX 4090)

# Reduce GPU memory utilization in .env
export VLLM_GPU_MEMORY_UTILIZATION=0.75  # Reduce from 0.85

# Monitor GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop=1

# Clear GPU memory cache
python -c "import torch; torch.cuda.empty_cache()"

9. Performance Validation

# Run performance validation script
python scripts/performance_validation.py

# Expected results for RTX 4090:
# - Decode: 120-180 tokens/second
# - Prefill: 900-1400 tokens/second  
# - VRAM: 12-14GB usage
# - Context: 128K tokens supported

Performance Optimization

Enable GPU acceleration in the UI sidebar
Use appropriate model sizes for your hardware
Enable caching to speed up repeat analysis
Adjust chunk sizes based on document complexity
Use hybrid search for better retrieval quality

Getting Help

Check logs in logs/ directory for detailed errors
Review troubleshooting guide
Search existing GitHub Issues
Open a new issue with: steps to reproduce, error logs, system info

📖 How to Cite

If you use DocMind AI in your research or work, please cite it as follows:

@software{melin_docmind_ai_2025,
  author = {Melin, Bjorn},
  title = {DocMind AI: Local LLM for AI-Powered Document Analysis},
  url = {https://github.com/BjornMelin/docmind-ai-llm},
  version = {0.1.0},
  year = {2025}
}

🙌 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository and create a feature branch

Set up development environment:

git clone https://github.com/your-username/docmind-ai-llm.git
cd docmind-ai-llm
uv sync --group dev

Make your changes following the established patterns

Run tests and linting:

ruff check . --fix
ruff format .
pytest tests/

Submit a pull request with clear description of changes

Development Guidelines

Follow PEP 8 style guide (enforced by Ruff)
Add type hints for all functions
Include docstrings for public APIs
Write tests for new functionality
Update documentation as needed

See CONTRIBUTING.md for detailed guidelines.

📃 License

This project is licensed under the MIT License—see the LICENSE file for details.

Built with ❤️ by Bjorn Melin

Name		Name	Last commit message	Last commit date
Latest commit History 207 Commits
docs		docs
examples		examples
scripts		scripts
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.pylintrc		.pylintrc
.uvrc		.uvrc
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
uv.lock		uv.lock

License

BjornMelin/docmind-ai-llm

Folders and files

Latest commit

History

Repository files navigation