This repository shares the preprocessing scripts we use at the NIH Molecular Imaging Branch. Depending on your raw data and labels, several preprocessing steps are needed before deep learning models can be trained. The repository focuses mainly on Computer Vision and Natural Language Processing tasks.
1. Image patch extractor for VOI files
The VOI file format stores medical imaging segmentations. Although uncommon, it is still used by some applications, e.g. Brainmaker or NIH MIPAV. This library creates image patches and matching masks from segmentations in VOI format.
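To illustrate the idea of patch extraction, here is a minimal sketch and not this library's actual API. It assumes the VOI contours have already been rasterized into a binary mask with the same shape as the image. The function name `extract_patches` and its parameters (`patch_size`, `stride`, `min_foreground`) are hypothetical and chosen only for this example.

```python
import numpy as np


def extract_patches(image, mask, patch_size=64, stride=64, min_foreground=0.0):
    """Yield (image_patch, mask_patch) pairs that overlap the segmentation.

    image and mask are 2D arrays of the same shape; mask is non-zero inside
    the segmented region. Rasterizing the VOI contours into such a mask is
    assumed to have happened already and is not shown here.
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    height, width = mask.shape
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            mask_patch = mask[top:top + patch_size, left:left + patch_size]
            # Fraction of patch pixels that belong to the segmentation.
            foreground = np.count_nonzero(mask_patch) / mask_patch.size
            if foreground > min_foreground:
                image_patch = image[top:top + patch_size, left:left + patch_size]
                yield image_patch, mask_patch


# Toy example: an 8x8 image with a 3x3 segmented square.
image = np.arange(64, dtype=np.float32).reshape(8, 8)
mask = np.zeros((8, 8), dtype=np.uint8)
mask[1:4, 1:4] = 1
patches = list(extract_patches(image, mask, patch_size=4, stride=4))
```

Only patches containing at least some segmented pixels are kept. Raising `min_foreground` discards patches that merely touch the edge of a segmentation.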
2. DOC/DOCX document converter
Microsoft Office switched from the binary DOC format to the XML-based DOCX format with Office 2007. DOCX is better suited to Natural Language Processing because its text can be extracted and parsed directly. This script converts DOC files to DOCX files or vice versa and can be run from the command line.
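To show why DOCX is convenient for text extraction, here is a minimal sketch that uses only the Python standard library. It is not the converter script from this repository. It relies on the fact that a DOCX file is a ZIP archive whose main body is stored in `word/document.xml`. The function name `docx_paragraphs` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_file):
    """Return the text of each paragraph in a DOCX file as a list of strings.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

The sketch ignores headers, footers, footnotes, and formatting, which live in other parts of the archive. The legacy DOC format is a binary container and cannot be read this way, which is why converting to DOCX first is useful.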