This model is a real-time neural network for object detection that detects 20 different classes from the VOC 2007+2012 datasets. For information on network architecture, see the full YOLOv2 paper.
The model was converted to ONNX from a Core ML version of YOLOv2 using WinMLTools. The base source code for this conversion can be found here. The original network implemented in Darknet was modified to allow for conversion to Keras format. Additionally, to enable conversion (1) from Darknet format to Keras and (2) from Keras format to Core ML, modifications to layers of the `yolov2.cfg` file were required, and were performed according to link.
This version is based on the VOC dataset. For an ONNX YOLOv2 Model based on COCO, visit the YOLOv2 COCO page.
Model | Download | ONNX version | Opset version |
---|---|---|---|
YOLOv2 - VOC | 203.9 MB | 1.3 | 8 |
Input: shape `(1x3x416x416)`
Output: shape `(1x125x13x13)`
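Assuming the 1x3x416x416 shape above is the model input in (batch, channels, height, width) order, the sketch below rearranges an image that has already been resized to 416x416 into that layout. The helper name `to_model_input` is hypothetical, the float32 dtype is an assumption, and no pixel scaling or mean subtraction is applied because this page does not state what the model expects; check the conversion source before relying on it.

```python
import numpy as np

def to_model_input(image_hwc):
    """Rearrange a 416x416x3 image into a 1x3x416x416 float32 array.

    Hypothetical helper: it only changes layout and dtype. Any pixel
    scaling the model may need is NOT applied here (not stated in the source).
    """
    image = np.asarray(image_hwc, dtype=np.float32)
    if image.shape != (416, 416, 3):
        raise ValueError(f"expected shape (416, 416, 3), got {image.shape}")
    chw = np.transpose(image, (2, 0, 1))  # height, width, channel -> channel, height, width
    return chw[np.newaxis, ...]           # prepend the batch dimension
```

Resizing the source image to 416x416 is left to whichever imaging library is already in use.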
The output is a `(125x13x13)` tensor where 13x13 is the number of grid cells that the image gets divided into. Each grid cell corresponds to 125 channels, made up of the 5 bounding boxes predicted by the grid cell and the 25 values that describe each bounding box: 4 box coordinates, 1 objectness score, and 20 class scores (`5 x (20 + 5) = 125`). For more information on how to derive the final bounding boxes and their corresponding confidence scores, refer to this post.
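The channel arithmetic above can be illustrated with a minimal decoding sketch in NumPy. It rests on several assumptions that this page does not confirm: the 125 channels are ordered box by box (25 consecutive values per box: x, y, width, height, objectness, then 20 class scores), the anchor sizes are the ones commonly listed for the VOC configuration, and the standard YOLOv2 transforms (sigmoid on offsets and objectness, exponential on sizes, softmax on classes) apply. The names `decode`, `ANCHORS`, and the `threshold` default are hypothetical.

```python
import numpy as np

NUM_BOXES, NUM_CLASSES, GRID = 5, 20, 13

# ASSUMPTION: anchor (width, height) pairs in grid-cell units; verify against the .cfg file.
ANCHORS = np.array([
    [1.3221, 1.73145], [3.19275, 4.00944], [5.05587, 8.09892],
    [9.47112, 4.84053], [11.2364, 10.0071],
], dtype=np.float32)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def decode(output, anchors=ANCHORS, threshold=0.3):
    """Return (x_center, y_center, width, height, class_id, score) tuples.

    Box values are fractions of the image size. No non-maximum
    suppression is performed, so overlapping boxes are all returned.
    """
    # ASSUMPTION: channel index = box * 25 + element.
    pred = np.asarray(output, dtype=np.float32).reshape(
        NUM_BOXES, 5 + NUM_CLASSES, GRID, GRID)
    tx, ty, tw, th, to = (pred[:, i] for i in range(5))  # each (5, 13, 13)

    col = np.arange(GRID, dtype=np.float32).reshape(1, 1, GRID)  # cell x index
    row = np.arange(GRID, dtype=np.float32).reshape(1, GRID, 1)  # cell y index
    x = (col + sigmoid(tx)) / GRID
    y = (row + sigmoid(ty)) / GRID
    w = anchors[:, 0].reshape(-1, 1, 1) * np.exp(tw) / GRID
    h = anchors[:, 1].reshape(-1, 1, 1) * np.exp(th) / GRID

    # Per-class confidence = objectness * class probability.
    scores = sigmoid(to)[:, np.newaxis] * softmax(pred[:, 5:], axis=1)
    class_ids = scores.argmax(axis=1)  # (5, 13, 13)
    best = scores.max(axis=1)          # (5, 13, 13)

    detections = []
    for b, r, c in zip(*np.nonzero(best > threshold)):
        detections.append((float(x[b, r, c]), float(y[b, r, c]),
                           float(w[b, r, c]), float(h[b, r, c]),
                           int(class_ids[b, r, c]), float(best[b, r, c])))
    return detections
```

A real pipeline would typically follow this with non-maximum suppression and a mapping from `class_id` to the 20 VOC class names, neither of which is shown here.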
The YOLOv2 model was trained on the VOC dataset and was sourced from the original yolov2-voc `.cfg` and `.weights` files from link.
"YOLO9000: Better, Faster, Stronger" arXiv:1612.08242
MIT License