generating manifest file for training titanet-large #12853

ukemamaster · 2025-04-02T09:56:50Z

Hi,
I am trying to train the titanet-large model on a huge dataset (10M+ samples), and generating the manifest.json file for the training data has already taken more than 20 hours (it is still running). I run:
python NeMo/scripts/speaker_tasks/filelist_to_manifest.py --filelist path/to/train.txt --id -2 --out path/to/manifest.json --split
Is there any way to make this faster, or to skip this step and train the model directly from the .txt file list?
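For context, the manifest is a JSON-lines file with one entry per audio file. The sketch below builds a single entry. It assumes the usual NeMo speaker-manifest fields (audio_filepath, offset, duration, label). It also assumes that --id -2 takes the speaker label from the second-to-last "/"-separated path component, which is the speaker directory in a speaker/utterance layout. The helper name manifest_line is hypothetical and is not part of the script.

```python
import json


def manifest_line(audio_path: str, duration: float, label_index: int = -2) -> str:
    """Build one manifest line (hypothetical helper, not the script's own code)."""
    # Assumed to mirror --id: split the path on "/" and take the component
    # at label_index as the speaker label (-2 = parent directory name).
    label = audio_path.split("/")[label_index]
    entry = {
        "audio_filepath": audio_path,
        "offset": 0,
        "duration": duration,  # seconds; reading this per file is the slow part
        "label": label,
    }
    return json.dumps(entry)
```

For example, manifest_line("data/spk001/utt1.wav", 3.2) returns '{"audio_filepath": "data/spk001/utt1.wav", "offset": 0, "duration": 3.2, "label": "spk001"}'. Everything except the duration comes from the path string alone. The per-file duration lookup is therefore where the time goes on 10M+ files.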


ukemamaster · 2025-04-02T11:10:41Z

Increasing max_workers in the script's process_map call solved the problem:
lines = process_map(get_duration, lines, chunksize=100, max_workers=128)
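The fix works because process_map (from tqdm.contrib.concurrent) fans get_duration out over a pool of worker processes. Each call mostly waits on file I/O, so more workers means more files are read concurrently. The sketch below reproduces the same pattern with the standard library's concurrent.futures, which process_map builds on. It is an illustration, not the script's code. It assumes PCM WAV input read with the stdlib wave module; other formats would need a different reader. The names wav_duration and get_durations are hypothetical, and so is the use_processes switch.

```python
import os
import wave
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def wav_duration(path: str) -> float:
    """Duration in seconds from the WAV header (assumes PCM WAV input)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def get_durations(paths, max_workers=None, chunksize=100, use_processes=True):
    """Compute durations in parallel, returned in the same order as paths."""
    # Default to one worker per CPU core rather than a small fixed pool.
    max_workers = max_workers or os.cpu_count() or 1
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=max_workers) as pool:
        # chunksize hands each process a batch of paths per task, cutting
        # inter-process overhead; ThreadPoolExecutor ignores it.
        return list(pool.map(wav_duration, paths, chunksize=chunksize))
```

With processes, wav_duration must stay at module level so it can be pickled, and the call should sit under an if __name__ == "__main__": guard. Setting max_workers well above the core count, as with 128 above, can still help when the disk or network file system is the bottleneck rather than the CPU.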
