Skip to content

Latest commit

 

History

History
62 lines (39 loc) · 2.04 KB

README.md

File metadata and controls

62 lines (39 loc) · 2.04 KB

RGDiffSR

The official pytorch implementation of Paper: RECOGNITION-GUIDED DIFFUSION MODEL FOR SCENE TEXT IMAGE SUPER-RESOLUTION

paper

Installation

Environment preparation: (Python 3.8 + PyTorch 1.7.0 + Torchvision 0.8.1 + pytorch_lightning 1.5.10 + CUDA 11.0)

conda create -n RGDiffSR python=3.8
git clone git@github.com:shercoo/RGDiffSR.git
cd RGDiffSR
pip install -r requirements.txt

You can also refer to taming-transformers for the installation of taming-transformers library (Needed if VQGAN is applied).

Dataset preparation

Download the TextZoom dataset at TextZoom.

Model checkpoints

Download the pre-trained recognizers Aster, Moran, CRNN.

Download the checkpoints of pre-trained VQGAN and RGDiffSR at Baidu Netdisk. Password: yws3

Training

First train the latent encoder (VQGAN) model.

CUDA_VISIBLE_DEVICES=<GPU_IDs> python main.py -b configs/autoencoder/vqgan_2x.yaml -t --gpus <GPU_IDS>     

Put the pre-trained VQGAN model in checkpoints/.

CUDA_VISIBLE_DEVICES=<GPU_IDs> python main.py -b configs/latent-diffusion/sr_best.yaml -t --gpus <GPU_IDS>

Testing

Put the pre-trained RGDiffSR model in checkpoints/.

CUDA_VISIBLE_DEVICES=<GPU_IDs> python test.py -b configs/latent-diffusion/sr_test.yaml  --gpus <GPU_IDS>

You can manually modify the test dataset directory in sr_test.yaml for test on different difficulty of TextZoom dataset.

License

The model is licensed under the MIT license.

Acknowledgement

Our code is built on the latent-diffusion and TATT repositories. Thanks to their research!