Replies: 2 comments 3 replies
-
Instead of the YOLOv5 class, can you try with model = yolov5.load()? The YOLOv5 class from this repo is deprecated. Furthermore, when you provide an image path instead of an image numpy/torch array, reading from disk becomes the bottleneck rather than inference time. To overcome this, I suggest updating your benchmark script so that it does not include the image reading time. Have you compared the same against the original Ultralytics hub inference implementation? Can you also paste your inference time results so that I can understand the issue you are pointing out? Lastly, batch inference is beneficial when you are utilizing a dataloader with num_workers > 1; otherwise loading/arranging the images might take more time than the inference itself.
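To keep disk reads out of the measurement, as suggested above, the benchmark could be structured along these lines (a sketch, not the poster's actual script; the benchmark function name is illustrative, and the commented yolov5 usage assumes the pip-installable yolov5 package):

```python
import time

def benchmark(infer, batches, warmup=1):
    # Warm-up passes so one-time costs (model initialization, CUDA context
    # setup, cache warming) do not pollute the measurement.
    for _ in range(warmup):
        infer(batches[0])
    start = time.perf_counter()
    for batch in batches:
        infer(batch)
    elapsed = time.perf_counter() - start
    n_images = sum(len(batch) for batch in batches)
    return elapsed / n_images  # seconds per image

# Illustrative usage with the yolov5 pip package: decode the images once,
# before timing starts, so that only inference is measured.
#   import yolov5, cv2
#   model = yolov5.load("yolov5s.pt")
#   images = [cv2.imread(p) for p in image_paths]
#   seconds_per_image = benchmark(model, [images])
```

Because the harness takes any callable, the same timing code can compare this repo's model against the original Ultralytics hub model.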
-
The slowdown also appears to be present in the underlying Ultralytics implementation. My guess is that it is due to the complexity of the YoloV5 layers combined with the limited number of CUDA cores on the GTX 1060 6GB (1,280). If all the CUDA cores are occupied processing a single layer for inference on a single image, no free cores remain to simultaneously operate on a second image when the batch contains multiple images. On a GTX 1060 6GB, batch processing therefore gives no speedup.
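One way to check whether a device is already saturated at batch size 1 is to sweep batch sizes and watch the per-image latency. A rough sketch, with a numpy matrix multiply standing in for the model (all function names here are illustrative; with a real model you would pass its inference call as run_batch):

```python
import time
import numpy as np

def per_image_latency(run_batch, make_batch, batch_sizes, repeats=3):
    """Measure seconds per image at each batch size for a given workload."""
    results = {}
    for n in batch_sizes:
        batch = make_batch(n)
        run_batch(batch)  # warm-up pass, excluded from timing
        start = time.perf_counter()
        for _ in range(repeats):
            run_batch(batch)
        elapsed = time.perf_counter() - start
        results[n] = elapsed / (repeats * n)
    return results

# Stand-in workload: one large matrix multiply per "image".
weights = np.random.rand(256, 256)

def fake_model(batch):
    return [x @ weights for x in batch]

def make_batch(n):
    return [np.random.rand(256, 256) for _ in range(n)]

latency = per_image_latency(fake_model, make_batch, [1, 4, 8])
# If per-image latency barely drops as the batch size grows, the compute
# units are already saturated at batch size 1 and batching cannot help.
```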
-
Hello,
I am not sure whether my code is incorrect or whether YoloV5 has a bug. In particular, the inference time per image is roughly constant regardless of whether the images are passed to YoloV5 in a batch or one by one, so batching images for YoloV5 inference gives little benefit. I have attached a simple demo script that can be copy/pasted and run to verify this.
I am using an NVIDIA GTX 1060 6GB GPU, with a 12-core CPU and plenty of memory. Thank you so much for taking a look, I really appreciate it, and I believe this might be an issue many users are running into.
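The attached script is not reproduced here, but the comparison it describes can be sketched as follows, with a dummy callable standing in for the loaded YoloV5 model (with the real package you would load it via, e.g., yolov5.load and pass lists of numpy arrays):

```python
import time

def time_one_by_one(model, images):
    # Pass each image in its own batch of size 1.
    start = time.perf_counter()
    for img in images:
        model([img])
    return time.perf_counter() - start

def time_batched(model, images):
    # Pass all images in a single batch.
    start = time.perf_counter()
    model(images)
    return time.perf_counter() - start

# Dummy stand-in model: does a fixed amount of work per image in the batch.
def dummy_model(batch):
    return [sum(range(10_000)) for _ in batch]

images = list(range(16))
t_loop = time_one_by_one(dummy_model, images)
t_batch = time_batched(dummy_model, images)
# If t_batch / len(images) is about equal to t_loop / len(images),
# batching is giving no per-image speedup.
```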