Object Detection Tutorial
This tutorial will walk you through the features related to object detection that ChainerCV supports.
We assume that readers have a basic understanding of the Chainer framework.
For users new to Chainer, please first read Introduction to Chainer.
In ChainerCV, object detection is defined as the task of localizing the objects in a given image with bounding boxes and assigning a category to each of them. ChainerCV supports the task by providing the following features:
- Detection Link
- Training script for various detection models
Here is a short example that runs inference and visualizes the output.
Please download an image and save it as sample.jpg.
```python
# In the rest of the tutorial, we assume that `plt`
# is imported before every code snippet.
import matplotlib.pyplot as plt

from chainercv.datasets import voc_bbox_label_names
from chainercv.links import SSD300
from chainercv.utils import read_image
from chainercv.visualizations import vis_bbox

# Read an RGB image and return it in CHW format.
img = read_image('sample.jpg')
model = SSD300(pretrained_model='voc0712')
# predict() takes a list of images and returns one result per image,
# so take the first (and only) entry of each list.
bboxes, labels, scores = model.predict([img])
bbox, label, score = bboxes[0], labels[0], scores[0]
vis_bbox(img, bbox, label, score, label_names=voc_bbox_label_names)
plt.show()
```
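To make the structure of the predict() output concrete, the numpy-only sketch below mimics what it returns for a single input image: three lists that hold one array per image. The arrays are made-up placeholder values rather than real model output, and the 0.5 score threshold is an arbitrary choice for illustration, not a ChainerCV default.

```python
import numpy as np

# Stand-in for the output of model.predict([img]) on one image:
# three lists with one array per input image. Values are placeholders.
bboxes = [np.array([[10, 20, 50, 80], [30, 30, 60, 90]], dtype=np.float32)]
labels = [np.array([14, 11], dtype=np.int32)]
scores = [np.array([0.92, 0.35], dtype=np.float32)]

# Take the predictions for the first (and only) image.
bbox, label, score = bboxes[0], labels[0], scores[0]

# Hypothetical post-processing step: keep only confident detections.
keep = score >= 0.5
print(bbox[keep].shape, label[keep].tolist())  # (1, 4) [14]
```

Because the boolean mask is applied to all three arrays, the surviving bounding boxes, labels and scores stay aligned row by row.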
Bounding boxes in ChainerCV
Bounding boxes in an image are represented as a two-dimensional array of shape \((R, 4)\),
where \(R\) is the number of bounding boxes and the second axis corresponds to the coordinates of bounding boxes.
The coordinates are ordered in the array as (y_min, x_min, y_max, x_max), where (y_min, x_min) and (y_max, x_max) are the (y, x) coordinates of the top-left and bottom-right vertices.
Note that ChainerCV orders coordinates in yx order, which is the opposite of the convention used by other libraries such as OpenCV.
This convention is adopted because it matches the memory layout of an image, which is stored in row-column order.
The dtype of a bounding box array is numpy.float32.
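Because libraries such as OpenCV use xy order, boxes coming from them have to be reordered before they are passed to ChainerCV. The sketch below assumes input boxes in (x_min, y_min, x_max, y_max) order; xy_to_yx and crop are hypothetical helpers written for this tutorial, not ChainerCV functions. The crop also shows why the yx order is convenient: the coordinates index a CHW image directly as img[:, y_min:y_max, x_min:x_max].

```python
import numpy as np


def xy_to_yx(bbox_xy):
    """Hypothetical helper: reorder (x_min, y_min, x_max, y_max) boxes
    into ChainerCV's (y_min, x_min, y_max, x_max) order."""
    bbox_xy = np.asarray(bbox_xy, dtype=np.float32)
    # Swap columns 0 <-> 1 and 2 <-> 3.
    return bbox_xy[:, [1, 0, 3, 2]]


def crop(img, bbox):
    """Hypothetical helper: crop a CHW image with a single yx-ordered box."""
    y_min, x_min, y_max, x_max = bbox.astype(np.int32)
    # Rows (y) come before columns (x), exactly like the box coordinates.
    return img[:, y_min:y_max, x_min:x_max]


img = np.zeros((3, 224, 224), dtype=np.float32)
# One box spanning x in [10, 40) and y in [10, 20), given in xy order.
bbox = xy_to_yx([[10, 10, 40, 20]])
print(bbox.tolist())  # [[10.0, 10.0, 20.0, 40.0]]
print(crop(img, bbox[0]).shape)  # (3, 10, 30)
```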
Here is an example with simple toy data.
```python
import numpy as np
from chainercv.visualizations import vis_bbox

img = np.zeros((3, 224, 224), dtype=np.float32)
# We call a variable/array of bounding boxes `bbox` throughout the library.
bbox = np.array([[10, 10, 20, 40], [150, 150, 200, 200]], dtype=np.float32)

vis_bbox(img, bbox)
plt.show()
```
In this example, two bounding boxes are displayed on top of a black image.
vis_bbox() is a utility function that visualizes
bounding boxes and an image together.
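vis_bbox() takes care of the coordinate order for you, but if you draw boxes with plain matplotlib you have to convert them yourself, since matplotlib.patches.Rectangle takes its anchor point as (x, y) followed by a width and a height. The helper bbox_to_rect_args below is hypothetical and only sketches that conversion; it is not how vis_bbox() is implemented.

```python
import numpy as np


def bbox_to_rect_args(bbox):
    """Hypothetical helper: turn yx-ordered boxes of shape (R, 4) into
    (xy, width, height) tuples in the argument order of
    matplotlib.patches.Rectangle."""
    args = []
    for y_min, x_min, y_max, x_max in bbox:
        # matplotlib expects the anchor as (x, y), so the order is swapped.
        args.append(((float(x_min), float(y_min)),
                     float(x_max - x_min), float(y_max - y_min)))
    return args


bbox = np.array([[10, 10, 20, 40], [150, 150, 200, 200]], dtype=np.float32)
for xy, width, height in bbox_to_rect_args(bbox):
    print(xy, width, height)
# (10.0, 10.0) 30.0 10.0
# (150.0, 150.0) 50.0 50.0
```

Each tuple can then be drawn on a matplotlib Axes ax with, for example, ax.add_patch(plt.Rectangle(xy, width, height, fill=False)).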
Bounding Box Dataset
ChainerCV provides dataset loaders whose examples can be accessed through a list-like indexing interface.
Dataset classes whose names end with BboxDataset contain annotations of where objects are located in an image and which categories they belong to.
These datasets can be indexed to return a tuple of an image, bounding boxes and labels.
The labels are stored in an np.int32 array of shape \((R,)\). Each element is the label of the object in the corresponding bounding box.
The mapping between integer labels and categories differs between datasets.
This mapping can be obtained from objects whose names end with label_names, such as voc_bbox_label_names.
These mappings are helpful when bounding boxes need to be visualized with label names.
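The tuple interface described above can be mimicked without downloading anything. ToyBboxDataset and toy_label_names below are hypothetical stand-ins, not ChainerCV classes; they only reproduce the shapes and dtypes of a BboxDataset example and show how a label_names tuple maps integer labels to category names.

```python
import numpy as np

# Hypothetical label names; real datasets ship their own label_names objects.
toy_label_names = ('cat', 'dog')


class ToyBboxDataset:
    """Hypothetical list-like dataset returning (img, bbox, label) tuples
    with the same layout as ChainerCV's BboxDataset classes."""

    def __init__(self, n_examples=3):
        self.n_examples = n_examples

    def __len__(self):
        return self.n_examples

    def __getitem__(self, i):
        if not 0 <= i < self.n_examples:
            raise IndexError(i)
        img = np.zeros((3, 64, 64), dtype=np.float32)      # CHW image
        bbox = np.array([[4, 4, 32, 32], [10, 20, 60, 50]],
                        dtype=np.float32)                   # (R, 4), yx order
        label = np.array([0, 1], dtype=np.int32)           # (R,)
        return img, bbox, label


dataset = ToyBboxDataset()
img, bbox, label = dataset[0]
print(bbox.shape, label.shape)  # (2, 4) (2,)
# Integer labels index into the label names.
print([toy_label_names[l] for l in label])  # ['cat', 'dog']
```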
The next example illustrates the interface of BboxDataset and how vis_bbox() displays label names.
```python
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.visualizations import vis_bbox

dataset = VOCBboxDataset(year='2012')
# Indexing returns a tuple of an image, bounding boxes and labels.
img, bbox, label = dataset[0]
print(bbox.shape)  # (2, 4)
print(label.shape)  # (2,)
vis_bbox(img, bbox, label, label_names=voc_bbox_label_names)
plt.show()
```
Note that this example downloads the VOC 2012 dataset the first time it is run on a machine.
ChainerCV provides functionality that makes it easy to evaluate detection links. It is offered at two levels: evaluator extensions and evaluation functions.
Evaluator extensions such as DetectionVOCEvaluator inherit from Evaluator and have a similar interface.
They are initialized with an iterator and a network that carries out prediction with its predict() method.
When an instance of DetectionVOCEvaluator is called, it takes several actions.
First, it iterates over the dataset using the iterator.
Second, the network makes predictions on the images collected from the dataset.
Last, an evaluation function is called with the ground truth annotations and the prediction results.
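The three steps can be sketched in plain Python as follows. The function evaluate, the toy predictor perfect_predict and the exact_match_rate metric are all hypothetical; the real DetectionVOCEvaluator is more involved (it reports its results through Chainer's reporting mechanism and computes VOC average precision). The iterator is assumed to be any iterable of batches, each batch being a list of (img, bbox, label) tuples.

```python
import numpy as np


def evaluate(iterator, predict, eval_fn):
    """Sketch of the three steps taken by an evaluator extension.

    `predict` and `eval_fn` are hypothetical callables, not ChainerCV APIs.
    """
    pred_bboxes, pred_labels, pred_scores = [], [], []
    gt_bboxes, gt_labels = [], []
    # Step 1: iterate over the dataset batch by batch.
    for batch in iterator:
        imgs = [img for img, _, _ in batch]
        # Step 2: let the network predict on the images of this batch.
        bboxes, labels, scores = predict(imgs)
        pred_bboxes.extend(bboxes)
        pred_labels.extend(labels)
        pred_scores.extend(scores)
        gt_bboxes.extend(bbox for _, bbox, _ in batch)
        gt_labels.extend(label for _, _, label in batch)
    # Step 3: compare the predictions with the ground truth.
    return eval_fn(pred_bboxes, pred_labels, pred_scores,
                   gt_bboxes, gt_labels)


# Toy data: five identical examples split into batches of two.
examples = [(np.zeros((3, 8, 8), dtype=np.float32),
             np.array([[0, 0, 4, 4]], dtype=np.float32),
             np.array([0], dtype=np.int32)) for _ in range(5)]
batches = [examples[i:i + 2] for i in range(0, len(examples), 2)]


def perfect_predict(imgs):
    # Hypothetical "network": outputs a fixed box matching the toy ground truth.
    n = len(imgs)
    return ([np.array([[0, 0, 4, 4]], dtype=np.float32)] * n,
            [np.array([0], dtype=np.int32)] * n,
            [np.array([1.0], dtype=np.float32)] * n)


def exact_match_rate(pred_bboxes, pred_labels, pred_scores,
                     gt_bboxes, gt_labels):
    # Toy metric: fraction of images whose predictions equal the ground truth.
    hits = [np.array_equal(pb, gb) and np.array_equal(pl, gl)
            for pb, pl, gb, gl in zip(pred_bboxes, pred_labels,
                                      gt_bboxes, gt_labels)]
    return {'exact_match_rate': sum(hits) / len(hits)}


print(evaluate(batches, perfect_predict, exact_match_rate))
# {'exact_match_rate': 1.0}
```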
In contrast to evaluators, which hide these details, evaluation functions are provided for those who need finer control.
These functions take the ground truth annotations and the prediction results as arguments and return the measured performance.
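As an illustration of what an evaluation function does, the sketch below computes a pairwise IoU (intersection over union) matrix for yx-ordered boxes and uses it in a deliberately simplified metric: the recall of ground-truth boxes at an IoU threshold. Both functions are hypothetical numpy-only sketches (ChainerCV ships its own bbox_iou in chainercv.utils). recall_at_iou ignores labels and scores, so it is not the VOC average precision that ChainerCV's VOC evaluation reports.

```python
import numpy as np


def bbox_iou(bbox_a, bbox_b):
    """Pairwise IoU between yx-ordered boxes of shape (N, 4) and (K, 4).
    Returns an (N, K) array."""
    # Top-left and bottom-right corners of every pairwise intersection.
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[None, :, :2])
    br = np.minimum(bbox_a[:, None, 2:], bbox_b[None, :, 2:])
    # Clip at zero so that disjoint boxes get zero intersection area.
    area_i = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
    return area_i / (area_a[:, None] + area_b[None, :] - area_i)


def recall_at_iou(pred_bboxes, gt_bboxes, thresh=0.5):
    """Toy evaluation function: fraction of ground-truth boxes overlapped
    by at least one prediction with IoU >= thresh."""
    n_hit, n_gt = 0, 0
    for pred, gt in zip(pred_bboxes, gt_bboxes):
        n_gt += len(gt)
        if len(pred) == 0:
            continue
        n_hit += int(np.sum(bbox_iou(gt, pred).max(axis=1) >= thresh))
    return n_hit / n_gt


gt_bboxes = [np.array([[0, 0, 10, 10]], dtype=np.float32)]
pred_bboxes = [np.array([[0, 0, 10, 20], [50, 50, 60, 60]], dtype=np.float32)]
print(bbox_iou(gt_bboxes[0], pred_bboxes[0]).tolist())  # [[0.5, 0.0]]
print(recall_at_iou(pred_bboxes, gt_bboxes))  # 1.0
```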
Here is a simple example that uses a detection evaluator.
```python
from chainer.iterators import SerialIterator
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator
from chainercv.links import SSD300

# Only use a subset of the dataset so that evaluation finishes quickly.
dataset = VOCBboxDataset(year='2007', split='test')
dataset = dataset[:6]
it = SerialIterator(dataset, 2, repeat=False, shuffle=False)
model = SSD300(pretrained_model='voc0712')
evaluator = DetectionVOCEvaluator(
    it, model, label_names=voc_bbox_label_names)
# result is a dictionary of evaluation scores. Print it and check it.
result = evaluator()
print(result)
```