This guide covers setting up, training, and testing CycleGAN and Pix2Pix models on Linux, including on high-performance computing (HPC) clusters.
- Prerequisites
- Environment Setup
- Data Preparation
- Model Training
- Model Testing
- Using Batch Job Scripts
- Output and Monitoring
- GPU Monitoring
- Troubleshooting
- Additional Resources
Prerequisites:
- Linux operating system (Ubuntu 18.04 or later recommended)
- NVIDIA GPU with CUDA support
- Internet connection
- Sudo privileges
- Install CUDA and cuDNN:
  - Visit the NVIDIA CUDA Toolkit Archive
  - Install CUDA and cuDNN following the official instructions
- Install Anaconda:
  wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
  bash Anaconda3-2023.09-0-Linux-x86_64.sh
  source ~/.bashrc
- Create and activate a virtual environment:
  conda create -n pix2pix python=3.8
  conda activate pix2pix
- Install PyTorch:
  conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
- Clone the repository:
  git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git
  cd pytorch-CycleGAN-and-pix2pix
- Install dependencies:
  pip install -r requirements.txt
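Once the environment is installed, it can help to confirm that the PyTorch build actually sees the GPU before starting a long run. The following is a minimal sketch, not part of the repository; the function name `describe_torch_cuda` is made up for illustration, and it relies only on standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`).

```python
import importlib.util


def describe_torch_cuda():
    """Return a small report on whether PyTorch is installed and can see a GPU."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
        "gpu_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not importable in this environment

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

Run it inside the activated `pix2pix` environment. If `cuda_available` is False while `cuda_build` is set, the driver or CUDA installation is a likely place to look.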
- Organize your dataset in datasets/your_dataset_name:
  - For Pix2Pix: use /train, /val, and /test folders
  - For CycleGAN: use trainA, trainB, testA, and testB folders (lowercase prefix; folder names are case-sensitive on Linux)
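A quick layout check before training can catch a misnamed or empty folder early. This is a minimal sketch under stated assumptions: `missing_folders` and `EXPECTED_FOLDERS` are hypothetical names, and the CycleGAN folder names are written with a lowercase prefix (`trainA`, `testA`, ...) because the repository's unaligned data loader builds the path from the phase name plus `A` or `B`.

```python
from pathlib import Path

# Sub-folders each model is expected to find directly under the dataset root.
EXPECTED_FOLDERS = {
    "pix2pix": ["train", "val", "test"],
    "cycle_gan": ["trainA", "trainB", "testA", "testB"],
}


def missing_folders(dataset_root, model):
    """Return the expected sub-folders that are absent or empty under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for name in EXPECTED_FOLDERS[model]:
        folder = root / name
        # A folder counts as missing if it does not exist or contains no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

For example, `missing_folders("./datasets/your_dataset_name", "cycle_gan")` returns an empty list when all four folders exist and contain at least one file.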
For image generation, refer to the separate image generation guide.
Navigate to the repository folder:
cd path/to/pytorch-CycleGAN-and-pix2pix
Train a Pix2Pix model:
python train.py --dataroot ./datasets/your_dataset \
  --name your_experiment_name \
  --model pix2pix \
  --direction AtoB \
  --save_epoch_freq 1 \
  --n_epochs 500 \
  --batch_size 150
Train a CycleGAN model:
python train.py --dataroot ./datasets/your_dataset \
  --name your_experiment_name \
  --model cycle_gan \
  --direction AtoB \
  --n_epochs 10 \
  --batch_size 1
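Note that `--n_epochs` is the number of epochs at the initial learning rate, not necessarily the total. The sketch below works out how many epochs a run executes, assuming the training loop runs from `--epoch_count` through `n_epochs + n_epochs_decay` inclusive and that `--n_epochs_decay` keeps a default of 100; check `options/train_options.py` in your checkout to confirm both. The helper name `epoch_schedule` is hypothetical.

```python
def epoch_schedule(n_epochs, n_epochs_decay=100, epoch_count=1):
    """Summarise which epochs a training run will execute.

    Assumes the loop covers epoch_count .. n_epochs + n_epochs_decay (inclusive).
    """
    last_epoch = n_epochs + n_epochs_decay
    epochs_to_run = max(0, last_epoch - epoch_count + 1)
    return {
        "first_epoch": epoch_count,
        "last_epoch": last_epoch,
        "epochs_to_run": epochs_to_run,
    }


print(epoch_schedule(n_epochs=500))  # --n_epochs 500
print(epoch_schedule(n_epochs=10))   # --n_epochs 10
```

Under these assumptions, `--n_epochs 500` runs through epoch 600 and `--n_epochs 10` runs through epoch 110. Pass `--n_epochs_decay 0` if the decay phase is not wanted.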
Test a trained model (set --model to pix2pix or cycle_gan to match the model you trained):
python test.py --dataroot ./datasets/your_test_dataset \
  --name your_experiment_name \
  --model pix2pix \
  --num_test 1000
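After a test run, the generated images can be collected for further processing. The sketch below assumes the layout `<results_dir>/<experiment_name>/<phase>_<epoch>/images` with files named `<source>_<label>.png` (for example `..._fake_B.png`); verify this against your own results folder, since the layout may differ between repository versions. The function name `generated_images` is hypothetical.

```python
from pathlib import Path


def generated_images(results_dir, experiment_name, phase="test", epoch="latest", label="fake_B"):
    """List generated images for one test run, sorted by file name.

    Assumes outputs live in <results_dir>/<experiment_name>/<phase>_<epoch>/images
    and are named <source>_<label>.png.
    """
    image_dir = Path(results_dir) / experiment_name / f"{phase}_{epoch}" / "images"
    if not image_dir.is_dir():
        return []  # no test run found for this experiment
    return sorted(image_dir.glob(f"*_{label}.png"))
```

For example, `generated_images("./results", "your_experiment_name")` returns the translated outputs and skips the `real_A` and `real_B` images saved alongside them.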
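For HPC environments that use SLURM, two example scripts follow: the first resumes Pix2Pix training from a saved checkpoint, and the second tests the trained model.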
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=8:00:00
#SBATCH --job-name=train-ma-b-p2p-v100
#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100-sxm2:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32GB
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your.email@example.com
module load anaconda3/2022.05 cuda/11.8
source activate /path/to/your/conda/env
python /work/re-blocking/pytorch-CycleGAN-and-pix2pix/train.py \
--dataroot /work/re-blocking/data/ma-boston \
--checkpoints_dir /work/re-blocking/checkpoints \
--name ma-boston-p2p-200-150-v100 \
--model pix2pix \
--direction AtoB \
--save_epoch_freq 1 \
--continue_train \
--epoch_count 491 \
--n_epochs 500 \
--batch_size 150
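When resuming with `--continue_train`, the value for `--epoch_count` has to be worked out from the checkpoints already on disk. The following sketch does that, assuming checkpoints are saved as `<epoch>_net_<name>.pth` (for example `490_net_G.pth`) in `<checkpoints_dir>/<name>/`; the helper name `next_epoch_count` is hypothetical.

```python
import re
from pathlib import Path

# Matches numbered checkpoints such as 490_net_G.pth or 5_net_G_A.pth.
CHECKPOINT_PATTERN = re.compile(r"^(\d+)_net_\w+\.pth$")


def next_epoch_count(experiment_dir):
    """Return the epoch to pass as --epoch_count, or 1 if no numbered checkpoint exists."""
    epochs = []
    for path in Path(experiment_dir).glob("*_net_*.pth"):
        match = CHECKPOINT_PATTERN.match(path.name)
        if match:  # latest_net_*.pth has no epoch number and is skipped
            epochs.append(int(match.group(1)))
    return max(epochs) + 1 if epochs else 1
```

With `--save_epoch_freq 1` and a last saved checkpoint of `490_net_G.pth`, this returns 491, matching the value used in the script above.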
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
#SBATCH --job-name=test-ma-b-b-v100
#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100-sxm2:1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4GB
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your.email@example.com
module load anaconda3/2022.05 cuda/11.8
source activate /path/to/your/conda/env
python /work/re-blocking/pytorch-CycleGAN-and-pix2pix/test.py \
--dataroot /work/re-blocking/data/ny-brooklyn \
--checkpoints_dir /work/re-blocking/checkpoints \
--results_dir /work/re-blocking/results \
--name ma-boston-p2p-200-150-v100 \
--model pix2pix \
--num_test 1000
To use these scripts:
- Save them in your project directory
- Make them executable:
  chmod +x script_name.sh
- Submit the job:
  sbatch script_name.sh
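If jobs are submitted from a Python driver rather than by hand, the job ID can be captured from the message `sbatch` prints on success (`Submitted batch job <id>`). This is a minimal sketch; `parse_job_id` and `submit` are hypothetical helper names, and `submit` only works on a machine where `sbatch` is on the PATH.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"Unrecognised sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit(script_path):
    """Submit a batch script with sbatch and return its job ID."""
    result = subprocess.run(
        ["sbatch", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the script
    )
    return parse_job_id(result.stdout)
```

The returned ID can then be passed to `squeue -j <id>` to check whether the job is pending or running.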
- View training progress: ./checkpoints/your_experiment_name/web/index.html
- Monitor training logs: use logs-visualised.ipynb (work in progress)
- Results are saved in the results directory
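Until the notebook is finished, the training losses can be read directly from the text log. The sketch below assumes a `loss_log.txt` file in the experiment's checkpoint folder whose lines look like `(epoch: 3, iters: 400, time: 0.105, data: 0.002) G_GAN: 1.250 G_L1: 20.500 ...`; the exact format may vary between repository versions, so treat the regular expressions as a starting point. The function names are hypothetical.

```python
import re

HEADER = re.compile(r"\(epoch: (\d+), iters: (\d+), time: ([\d.]+), data: ([\d.]+)\)")
LOSS = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)")


def parse_loss_line(line):
    """Parse one loss line into a dict, or return None for lines that do not match."""
    header = HEADER.search(line)
    if header is None:
        return None  # e.g. the banner line written at the start of each run
    record = {"epoch": int(header.group(1)), "iters": int(header.group(2))}
    # Loss names and values follow the closing parenthesis of the header.
    for name, value in LOSS.findall(line[header.end():]):
        record[name] = float(value)
    return record


def load_losses(path):
    """Read a loss log and return one dict per parsed line."""
    with open(path) as handle:
        return [rec for rec in map(parse_loss_line, handle) if rec is not None]
```

The resulting list of dicts can be plotted per loss name or averaged per epoch with whatever tooling you prefer.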
Monitor NVIDIA GPU usage:
watch -n 0.1 nvidia-smi
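For logging GPU usage from a script rather than watching it interactively, `nvidia-smi` can emit machine-readable CSV through its `--query-gpu` and `--format` options. The sketch below parses that output; the function names are hypothetical, and it assumes every queried field is numeric (some GPUs report `[N/A]` for certain fields, which this version does not handle).

```python
import subprocess

QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_rows(csv_text):
    """Turn 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "util_percent": int(util),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_rows(result.stdout)
```

Calling `query_gpus()` in a loop with a sleep of a second or more is usually enough to see whether a job is GPU-bound or memory-bound.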
- CUDA errors: Ensure CUDA and PyTorch versions are compatible
- Memory issues: Reduce batch size or image size
- For other issues, consult the official PyTorch and project documentation
For detailed parameter explanations, refer to the options directory in the project repository.