An all-in-one control framework for unified visual content creation using GenAI.
Generate multi-view, diverse-scene, and task-specific high-resolution images from a single subject image, without fine-tuning.
ZenCtrl is a comprehensive toolkit built to tackle core challenges in image generation:
- No fine-tuning needed: works from a single subject image
- Maintains control over shape, pose, camera angle, context
- Supports high-resolution, multi-scene generation
- Modular toolkit for preprocessing, control, editing, and post-processing tasks
ZenCtrl is based on OminiControl, enhanced with finer-grained control, consistent subject preservation, and improved, ready-to-use models. Our goal is to build an agentic visual generation system that can orchestrate image/video creation from LLM-driven recipes.
- Background removal
- Matting
- Reshaping
- Segmentation
- Shape (HED, Scribble, Depth)
- Pose (OpenPose, DensePose)
- Mask control
- Camera/View control
- Deblurring
- Color fixing
- Natural blending
- Inpainting (removal, masked editing, replacement)
- Outpainting
- Transformation / Motion
- Relighting
- Background generation
- Controlled background generation
- Subject-consistent context-aware generation
- Object and subject placement (coming soon)
- In-context image/video generation (coming soon)
- Multi-object/subject merging & blending (coming soon)
- Video generation (coming soon)
- Product photography
- Fashion & accessory try-on
- Virtual try-on (shoes, hats, glasses, etc.)
- People & portrait control
- Illustration, animation, and ad creatives
All of these tasks can be mixed and layered: ZenCtrl is designed to support real-world visual workflows with agentic task composition.
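To make "mixed and layered" concrete, here is a minimal Python sketch of task composition. All the function names (`remove_background`, `generate_background`, `relight`, `compose`) are illustrative stand-ins, not part of the released ZenCtrl API; a real recipe would call the corresponding ZenCtrl modules.

```python
# Hypothetical sketch of agentic task composition: each task is a function
# from image to image, and a "recipe" is an ordered chain of tasks.
# These names are illustrative only; they are not the ZenCtrl API.

def compose(*tasks):
    """Chain task functions left to right into a single pipeline."""
    def pipeline(image):
        for task in tasks:
            image = task(image)
        return image
    return pipeline

# Stub tasks standing in for real ZenCtrl modules.
def remove_background(img):
    return img + ["bg_removed"]

def generate_background(img):
    return img + ["bg_generated"]

def relight(img):
    return img + ["relit"]

# A product-photography "recipe": preprocess, generate, post-process.
product_shot = compose(remove_background, generate_background, relight)
print(product_shot(["subject"]))
# -> ['subject', 'bg_removed', 'bg_generated', 'relit']
```

The point of the sketch is the shape of the workflow: preprocessing, control, and post-processing steps are interchangeable stages that an LLM-driven planner could reorder or swap.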
- 2025-03-24: First release! Model weights available on Hugging Face.
- 2025-05-06: Update! Source code release; latest model weights available on Hugging Face.
- Coming Soon: Quick Start guide, Upscaling source code, Example notebooks
- Next: Controlled fine-grain version on our platform and API (Pro version)
- Future: Video generation toolkit release
Before running the Gradio app, please install the requirements and download the weights from our Hugging Face repository:
https://huggingface.co/fotographerai/zenctrl_tools
We matched our original code with the OminiControl structure. Our model takes two inputs instead, but we will release the original code soon with the LLaMA task driver, so stay tuned. We will also update the tasks for specific verticals (e.g., virtual try-on, ad creatives).
You can follow the step-by-step setup instructions below:
```bat
:: Clone and set up ZenCtrl
git clone https://github.com/FotographerAI/ZenCtrl.git
cd ZenCtrl

:: Create and activate a virtual environment (Windows)
python -m venv venv
call venv\Scripts\activate.bat

:: Install PyTorch and the remaining requirements
pip install torch==2.7.0+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install --upgrade pip wheel setuptools
pip install -r requirements.txt

:: Download model weights
curl --create-dirs -L https://huggingface.co/fotographerai/zenctrl_tools/resolve/main/weights/zen2con_1440_17000/pytorch_lora_weights.safetensors -o weights\zen2con_1440_17000\pytorch_lora_weights.safetensors

:: All set! Launch the Gradio app
python app/gradio_app.py
```
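If curl is not available, the weights can also be fetched from Python. The helper below only assembles the standard Hugging Face `resolve` URL used by the curl command above (the actual download is left to the caller, e.g. via `hf_hub_download` from the `huggingface_hub` package); `hf_resolve_url` is a hypothetical helper, not part of the repository.

```python
# Build the Hugging Face "resolve" URL for a file in a model repo,
# matching the URL used by the curl command in the setup steps.

def hf_resolve_url(repo_id: str, file_path: str, revision: str = "main") -> str:
    """Return the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{file_path}"

url = hf_resolve_url(
    "fotographerai/zenctrl_tools",
    "weights/zen2con_1440_17000/pytorch_lora_weights.safetensors",
)
print(url)
```

With `huggingface_hub` installed, `hf_hub_download(repo_id="fotographerai/zenctrl_tools", filename="weights/zen2con_1440_17000/pytorch_lora_weights.safetensors")` downloads the same file and handles caching.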
Try it now on Hugging Face Space
| Type | Name | Base | Resolution | Description | Links |
|---|---|---|---|---|---|
| Subject Generation | zen2con_1440_17000 | FLUX.1 | 1024x1024 | Core model for subject-driven gen | link |
| Bg generation + Canny | bg_canny_58000_1024 | FLUX.1 | 1024x1024 | Enhanced background control | link |
| Deblurring Model | deblurr_1024_10000 | OminiControl | 1024x1024 | Quality recovery post-generation | link |
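For scripting against the model zoo above, the table can be mirrored as a small lookup structure. This is an illustrative sketch, not code from the repository; only the checkpoint names, bases, and the weights path layout from the setup step are taken from this README.

```python
# Model zoo from the table above, keyed by checkpoint name.
MODEL_ZOO = {
    "zen2con_1440_17000": {
        "type": "Subject Generation",
        "base": "FLUX.1",
        "resolution": (1024, 1024),
        "description": "Core model for subject-driven generation",
    },
    "bg_canny_58000_1024": {
        "type": "Bg generation + Canny",
        "base": "FLUX.1",
        "resolution": (1024, 1024),
        "description": "Enhanced background control",
    },
    "deblurr_1024_10000": {
        "type": "Deblurring",
        "base": "OminiControl",
        "resolution": (1024, 1024),
        "description": "Quality recovery post-generation",
    },
}

def weights_path(name: str) -> str:
    """Relative path where the setup step places each checkpoint."""
    if name not in MODEL_ZOO:
        raise KeyError(f"unknown ZenCtrl model: {name}")
    return f"weights/{name}/pytorch_lora_weights.safetensors"

print(weights_path("zen2con_1440_17000"))
```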
- Models currently perform best with objects and, to a lesser extent, humans.
- Resolution support is currently capped at 1024x1024 (higher quality coming soon).
- Performance with illustrations is currently limited.
- The models have not yet been trained on large-scale or highly diverse datasets; we plan to improve quality and variation by training on larger, more diverse datasets, especially for illustration and stylized content.
- Video support and the full agentic task pipeline are still under development.
- Release early pretrained model weights for defined tasks
- Release additional task-specific models and modes
- Release open source code
- Launch API access via Baseten for easier deployment
- Release Quick Start guide and example notebooks
- Launch API access via our app for easier deployment
- Release high-resolution models (1500×1500+)
- Enable full toolkit integration with agent API
- Add video generation module
- Discord: share ideas and feedback
- Landing Page
- Try it now on Hugging Face Space
We hope to collaborate closely with the open-source community to make ZenCtrl a powerful and extensible toolkit for visual content creation.
Once the source code is released, we welcome contributions in training, expanding supported use cases, and developing new task-specific modules.
Our vision is to make ZenCtrl the standard framework for agentic, high-quality image and video generation โ built together, for everyone.