RTF evaluation issues #12978

a779159990 · 2025-04-11T06:31:55Z

Thank you very much for your great work. I have a question about canary-1b-flash. The project reports an RTFx above 1000 on an A100, which corresponds to RTF < 0.001. In our own test, however, the measured speed is far below that. Can you help me find out where the problem is?
The test code is:

```python
# test code

from NeMo.nemo.collections.asr.models import EncDecMultiTaskModel

from NeMo.nemo.collections.asr.parts.utils.streaming_utils import FrameBatchMultiTaskAED
from NeMo.nemo.collections.asr.parts.utils.transcribe_utils import get_buffered_pred_feat_multitaskAED
import time
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
canary_model = EncDecMultiTaskModel.restore_from('canary-1b-flash/canary-1b-flash.nemo', map_location=device)
canary_model = canary_model.to(device)

# Greedy decoding (beam size 1)
decode_cfg = canary_model.cfg.decoding
decode_cfg.beam.beam_size = 1
canary_model.change_decoding_strategy(decode_cfg)
feature_stride = canary_model.cfg.preprocessor['window_stride']

model_stride_in_secs = feature_stride * 8

# First timed run
start = time.time()
output = canary_model.transcribe(
    "mix.wav",
    batch_size=128,  # batch size to run the inference with
    # pnc='yes',  # generate output with Punctuation and Capitalization
    # timestamps='yes',  # generate output with timestamps
)

predicted_text_1 = output[0].text
print(time.time() - start)
print(predicted_text_1)

# Second timed run on the same file
start = time.time()
output = canary_model.transcribe(
    "mix.wav",
    batch_size=128,  # batch size to run the inference with
    # pnc='yes',  # generate output with Punctuation and Capitalization
    # timestamps='yes',  # generate output with timestamps
    # source_lang="en",  # language of the audio input, set source_lang==target_lang for ASR, choices=['en','de','es','fr']
    # target_lang="de",  # language of the text output, choices=['en','de','es','fr']
)

predicted_text_1 = output[0].text
print(time.time() - start)
print(predicted_text_1)
```

The audio is about 10 s long, and processing it takes about 2 s. Is this expected?
Our device is a single A100-40GB.
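
To make the gap concrete, here is a minimal sketch of the arithmetic behind the numbers above, assuming RTF is processing time divided by audio duration and RTFx is its inverse. The helper names `real_time_factors` and `time_call` are hypothetical and not part of NeMo; `time_call` is only an illustration of averaging several runs after an untimed warm-up call, so that one-time start-up costs do not end up in the measurement.

```python
import time


def real_time_factors(processing_seconds, audio_seconds):
    """Return (RTF, RTFx) for one measurement.

    RTF  = processing time / audio duration (lower is faster)
    RTFx = audio duration / processing time (higher is faster)
    """
    rtf = processing_seconds / audio_seconds
    rtfx = audio_seconds / processing_seconds
    return rtf, rtfx


def time_call(fn, warmup=1, runs=3):
    """Average wall-clock seconds of fn() over `runs` calls.

    `warmup` untimed calls are made first. Hypothetical helper for illustration.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Numbers from this report: about 10 s of audio processed in about 2 s.
rtf, rtfx = real_time_factors(processing_seconds=2.0, audio_seconds=10.0)
print(f"RTF = {rtf:.3f}, RTFx = {rtfx:.1f}")  # RTF = 0.200, RTFx = 5.0
```

With these numbers we get RTFx of about 5, compared with the reported 1000+, so the difference is roughly a factor of 200.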
