RaptorPJPEG

Official repo for "Progressive decoding of DNA-stored JPEG data with on-the-fly error correction", 2025.

Description

This repo combines several packages and libraries into a single pipeline for end-to-end image encoding and decoding. Badread serves as the storage channel simulator, and its noisy reads are then passed to clustering.

Getting Started

Dependencies

CMake
GCC v4.1+/Clang (Unix-like) or Microsoft Visual C++ 2005 or later (Windows)
Badread

Installing

First, build libjpeg-turbo:

git clone 'https://github.com/libjpeg-turbo/libjpeg-turbo.git'
mkdir {build_directory}
cd {build_directory}
cmake -G"Unix Makefiles" [additional CMake flags] {libjpeg-turbo}
make

Modify the cjpeg executable path in NOREC4DNA/encode.py to point to your own {build_directory}/cjpeg executable:

cjpeg_path = ".../libjpeg/cjpeg"

Next, build the JPEG decoder:

cd jpeg
mkdir build
cd build
cmake ..
make

Modify the jpeg executable path in both NOREC4DNA/encode.py and NOREC4DNA/decode.py to point to your own jpeg/build/jpeg executable:

jpeg_path = ".../jpeg/build/jpeg"

Next, create a conda env and install NOREC4DNA:

conda create -n "norec" python=3.10
conda activate norec
cd NOREC4DNA
pip install -r requirements.txt
python setup.py install

Install MUSCLE and set its path in clustering_dna_storage/strand_reconstruction.py:

# Change the muscle path here
muscle_exe = ".../muscle-osx-arm64.v5.3"
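Since three executable paths (cjpeg, jpeg and MUSCLE) are edited by hand, a quick check that each one resolves to a runnable file can catch typos before encoding. The following is a minimal sketch, not part of the repo; the helper names and the placeholder paths in the demo are hypothetical and should be replaced with your own values.

```python
import os


def is_executable(path):
    """Return True if `path` is an existing file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_missing(paths):
    """Return the names in `paths` (name -> path) that are not executable files."""
    return [name for name, p in paths.items() if not is_executable(p)]


if __name__ == "__main__":
    # Hypothetical placeholder paths: substitute the values you configured above.
    configured = {
        "cjpeg": "/path/to/build_directory/cjpeg",
        "jpeg": "/path/to/jpeg/build/jpeg",
        "muscle": "/path/to/muscle",
    }
    for name in find_missing(configured):
        print(f"{name}: not found or not executable at {configured[name]}")
```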

Usage

Encoding and decoding

Encoding an uncompressed BMP image:

python encode.py {file/to/image.bmp} [additional flags]

This generates four files:

image.jpg = standard JPEG
image_FFDX.jpg = custom JPEG with base-8 RIs
image_FFDX.jpg_RU10.fasta = FASTA containing generated oligo packets
image_FFDX.jpg.ini = config file
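The naming scheme above can be sketched as a small helper that derives the expected output paths from the input image path. This is only an illustration: the function name is hypothetical, and it assumes the outputs are written next to the input file, as in the data/cat.bmp example later in this README.

```python
from pathlib import Path


def expected_outputs(bmp_path):
    """Map an input BMP path to the four output paths listed above.

    Assumes outputs are written in the same directory as the input.
    """
    src = Path(bmp_path)
    ffdx = f"{src.stem}_FFDX.jpg"
    return {
        "jpeg": src.with_name(f"{src.stem}.jpg"),     # standard JPEG
        "custom_jpeg": src.with_name(ffdx),           # custom JPEG with base-8 RIs
        "fasta": src.with_name(f"{ffdx}_RU10.fasta"), # generated oligo packets
        "config": src.with_name(f"{ffdx}.ini"),       # config file passed to decode.py
    }


if __name__ == "__main__":
    for kind, path in expected_outputs("data/cat.bmp").items():
        print(f"{kind}: {path}")
```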

Decoding a FASTA:

python decode.py {file/to/image_FFDX.jpg.ini}

This creates a tmp directory, then iteratively decodes and renders the image. Each render is saved as tmp/patched_IDX.bmp, and a GIF compiling all renders is generated at the end.
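If you want to inspect the intermediate renders yourself, they need to be ordered by index rather than alphabetically (otherwise patched_10.bmp sorts before patched_2.bmp). The sketch below assumes that IDX is a plain decimal integer, which this README does not state explicitly; the helper name is hypothetical.

```python
import re

# Assumption: IDX in tmp/patched_IDX.bmp is a decimal integer.
_FRAME = re.compile(r"patched_(\d+)\.bmp$")


def sorted_frames(names):
    """Return the render paths in `names` ordered by numeric index.

    Paths that do not match the patched_IDX.bmp pattern are ignored.
    """
    indexed = []
    for name in names:
        match = _FRAME.search(str(name))
        if match:
            indexed.append((int(match.group(1)), str(name)))
    return [name for _, name in sorted(indexed)]


if __name__ == "__main__":
    import glob

    for frame in sorted_frames(glob.glob("tmp/patched_*.bmp")):
        print(frame)
```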

Full pipeline

Raptor encoding an uncompressed image, cat.bmp:

cd NOREC4DNA
python encode.py data/cat.bmp --chunk_size 47 --error_correction reedsolomon --insert_header --overhead 0.5 --p_thr 0.4

Add checksum bases to oligos for clustering, via clustering_dna_storage/checksum.ipynb:

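# Read the RU10 FASTA, add checksum bases to each strand, and write a new FASTA.
# CheckSum4, read_synthesized_strands_from_file and create_fasta_file are assumed
# to be already imported in the notebook.
import os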

RU10_path = "data/cat_FFDX.jpg_RU10.fasta"
checksum_path = os.path.join(os.path.dirname(RU10_path), "cat_FFDX_checksum.fasta")
recovered_path = os.path.join(os.path.dirname(RU10_path), "cat_FFDX_rec.fasta")
oligoLen = 204

checksum = CheckSum4(reference_length=oligoLen)
original_strands, original_strand_ids = read_synthesized_strands_from_file(file_path=RU10_path)
encoded_strands = checksum.encode(original_strands)
create_fasta_file(ids=original_strand_ids, strands=encoded_strands, output_filepath=checksum_path)
print("Checksum encoded strands saved to", checksum_path)

Running data/cat_FFDX_checksum.fasta through Badread:

badread simulate --reference data/cat_FFDX_checksum.fasta --quantity 60x --identity 97,99,1.0 | gzip > reads.fastq.gz
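The Badread command above writes a gzip-compressed file, while the clustering step reads an uncompressed data/reads.fastq. One way to decompress it, using only the Python standard library, is sketched below; the helper name is hypothetical, and the paths in the demo assume the command was run from the NOREC4DNA directory.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress the gzip file `src` into `dst` and return `dst`."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in chunks so large read files are not loaded into memory at once.
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    gunzip("reads.fastq.gz", "data/reads.fastq")
```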

Run the noisy reads (unzipped to data/reads.fastq) through agglomerative k-mer clustering, via clustering_dna_storage/checksum.ipynb:

# Raptor
records = get_fastq_records(fastq_filepath="data/reads.fastq")
reads = [str(i.seq) for i in records]
ids = [get_badread_strand_id(i) for i in records]

clustering = Clustering(strand_pool=reads, reference_length=oligoLen+4, original_strands=encoded_strands, strand_pool_ids=ids, distance_threshold=40)
clustering.run_pipeline(eval=True)

# We want to recover the original strands after checksum
decoded_strands = checksum.decode(candidates=clustering.candidates, n_reference_strands=len(original_strands), clustered_seqs=clustering.clustered_seqs, n_guesses=5, guesses=True)
ids = ["" for i in range(len(decoded_strands))]
create_fasta_file(ids, decoded_strands, output_filepath=recovered_path)
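For intuition about what comparing reads by k-mer content means, the sketch below computes a Jaccard distance between the k-mer sets of two reads. It is an illustration only and is not the Clustering class used above: the function names, the choice of k=4 and the Jaccard measure are all assumptions, and the distance_threshold=40 in the notebook is not expressed in these units.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=4):
    """Jaccard distance between the k-mer sets of reads `a` and `b`.

    0.0 means identical k-mer content, 1.0 means no shared k-mers.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        # Both reads are shorter than k, so there is nothing to compare.
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


if __name__ == "__main__":
    clean = "ACGTACGTTGCA"
    noisy = "ACGTACGATGCA"  # one substitution
    print(kmer_distance(clean, clean), kmer_distance(clean, noisy))
```

Reads from the same oligo tend to share most of their k-mers even after a few sequencing errors, which is why k-mer content can be used to group them before reconstruction.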

Running Raptor decoding on the Badread-simulated, clustered strands:

python decode.py data/cat_FFDX.jpg.ini --badread data/cat_FFDX_rec.fasta

Authors

Ibrahim Nawaz

Version History

0.2
- Updated README, LICENSE and priority encoding
0.1
- Initial Release

License

This project is licensed under the MIT License

Acknowledgments

Clustering algorithm was kindly shared by Parv Argwal
NOREC4DNA was developed by Peter Michael Schwarz & Bernd Freisleben
