SSD (Single Shot Multibox Detector)
Detection Links
SSD300

class chainercv.links.model.ssd.SSD300(n_fg_class=None, pretrained_model=None)
Single Shot Multibox Detector with 300x300 inputs.
This is a model of Single Shot Multibox Detector [1]. This model uses VGG16Extractor300 as its feature extractor.
[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 n_fg_class (int) – The number of classes excluding the background.
 pretrained_model (str) – The weight file to be loaded. This can take 'voc0712', 'imagenet', filepath or None. The default value is None.
  'voc0712': Load weights trained on the trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically. n_fg_class must be 20 or None. These weights were converted from the Caffe model provided by the original implementation. The conversion code is chainercv/examples/ssd/caffe2npz.py.
  'imagenet': Load weights of VGG16 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes only part of the weights; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
  filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
  None: Do not load weights.
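The rules that tie pretrained_model to n_fg_class can be summarized as a small check. This is a minimal sketch, not ChainerCV's code: check_pretrained_args is a hypothetical helper, and it assumes that n_fg_class is required whenever the chosen weights do not fix the number of classes.

```python
def check_pretrained_args(n_fg_class, pretrained_model):
    """Return the effective n_fg_class or raise ValueError (hypothetical helper)."""
    if pretrained_model == 'voc0712':
        # The VOC weights were trained with 20 foreground classes.
        if n_fg_class not in (None, 20):
            raise ValueError("'voc0712' requires n_fg_class to be 20 or None")
        return 20
    # 'imagenet', a file path, or None: assumed here that the caller
    # has to state the number of foreground classes explicitly.
    if n_fg_class is None:
        raise ValueError('n_fg_class must be specified')
    return n_fg_class
```

For example, check_pretrained_args(None, 'voc0712') yields 20, while passing 21 together with 'voc0712' raises ValueError.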
SSD512

class chainercv.links.model.ssd.SSD512(n_fg_class=None, pretrained_model=None)
Single Shot Multibox Detector with 512x512 inputs.
This is a model of Single Shot Multibox Detector [2]. This model uses VGG16Extractor512 as its feature extractor.
[2] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 n_fg_class (int) – The number of classes excluding the background.
 pretrained_model (str) – The weight file to be loaded. This can take 'voc0712', 'imagenet', filepath or None. The default value is None.
  'voc0712': Load weights trained on the trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically. n_fg_class must be 20 or None. These weights were converted from the Caffe model provided by the original implementation. The conversion code is chainercv/examples/ssd/caffe2npz.py.
  'imagenet': Load weights of VGG16 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes only part of the weights; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
  filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
  None: Do not load weights.
Utility
Multibox

class chainercv.links.model.ssd.Multibox(n_class, aspect_ratios, initialW=None, initial_bias=None)
Multibox head of Single Shot Multibox Detector.
This is the head part of Single Shot Multibox Detector [3]. This link computes mb_locs and mb_confs from feature maps. mb_locs contains the coordinates of bounding boxes and mb_confs contains the confidence scores of each class.
[3] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 n_class (int) – The number of classes possibly including the background.
 aspect_ratios (iterable of tuple or int) – The aspect ratios of default bounding boxes for each feature map.
 initialW – An initializer used in chainer.links.Convolution2D.__init__(). The default value is chainer.initializers.LeCunUniform.
 initial_bias – An initializer used in chainer.links.Convolution2D.__init__(). The default value is chainer.initializers.Zero.

__call__(xs)
Compute loc and conf from feature maps.
This method computes mb_locs and mb_confs from given feature maps.
Parameters: xs (iterable of chainer.Variable) – An iterable of feature maps. The number of feature maps must be the same as the number of aspect_ratios.
Returns: This method returns two chainer.Variable: mb_locs and mb_confs.
 mb_locs: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
 mb_confs: A variable of float arrays of shape \((B, K, n\_fg\_class + 1)\).
Return type: tuple of chainer.Variable
MultiboxCoder

class chainercv.links.model.ssd.MultiboxCoder(grids, aspect_ratios, steps, sizes, variance)
A helper class to encode/decode bounding boxes.
This class encodes (bbox, label) to (mb_loc, mb_label) and decodes (mb_loc, mb_conf) to (bbox, label, score). These encodings and decodings are used in Single Shot Multibox Detector [4].
 mb_loc: An array representing offsets and scales from the default bounding boxes. Its shape is \((K, 4)\), where \(K\) is the number of the default bounding boxes. The second axis is composed of \((\Delta y, \Delta x, \Delta h, \Delta w)\). These values are computed by the following formulas.
  \(\Delta y = (b_y - m_y) / (m_h * v_0)\)
  \(\Delta x = (b_x - m_x) / (m_w * v_0)\)
  \(\Delta h = \log(b_h / m_h) / v_1\)
  \(\Delta w = \log(b_w / m_w) / v_1\)
  \((m_y, m_x)\) and \((m_h, m_w)\) are the center coordinates and size of a default bounding box. \((b_y, b_x)\) and \((b_h, b_w)\) are the center coordinates and size of the given bounding box that is assigned to the default bounding box. \((v_0, v_1)\) are coefficients that can be set by the argument variance.
 mb_label: An array representing classes of ground truth bounding boxes. Its shape is \((K,)\).
 mb_conf: An array representing classes of predicted bounding boxes. Its shape is \((K, n\_fg\_class + 1)\).
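The mb_loc formulas can be written out directly in NumPy. This is a minimal sketch rather than the library's implementation: encode_loc and decode_loc are hypothetical names, boxes are assumed to be given already in (center_y, center_x, height, width) form, and the variance default (0.1, 0.2) is taken from the SSD class signature.

```python
import numpy as np

def encode_loc(box, default, variance=(0.1, 0.2)):
    """box, default: (K, 4) arrays as (center_y, center_x, height, width)."""
    v0, v1 = variance
    # Center offsets, scaled by the default box size and v0.
    d_yx = (box[:, :2] - default[:, :2]) / (default[:, 2:] * v0)
    # Log size ratios, scaled by v1.
    d_hw = np.log(box[:, 2:] / default[:, 2:]) / v1
    return np.hstack((d_yx, d_hw))

def decode_loc(mb_loc, default, variance=(0.1, 0.2)):
    """Inverse of encode_loc."""
    v0, v1 = variance
    yx = default[:, :2] + mb_loc[:, :2] * v0 * default[:, 2:]
    hw = default[:, 2:] * np.exp(mb_loc[:, 2:] * v1)
    return np.hstack((yx, hw))

default = np.array([[0.5, 0.5, 0.2, 0.2]])
box = np.array([[0.52, 0.5, 0.2, 0.4]])
mb_loc = encode_loc(box, default)
```

Decoding the result with the same default boxes and variance recovers the original box, which is a convenient sanity check when experimenting with different variance values.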
[4] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 grids (iterable of ints) – An iterable of integers. Each integer indicates the size of a feature map.
 aspect_ratios (iterable of tuples of ints) – An iterable of tuples of integers used to compute the default bounding boxes. Each tuple indicates the aspect ratios of the default bounding boxes at each feature map. The length of this iterable should be len(grids).
 steps (iterable of floats) – The step size for each feature map. The length of this iterable should be len(grids).
 sizes (iterable of floats) – The base size of default bounding boxes for each feature map. The length of this iterable should be len(grids) + 1.
 variance (tuple of floats) – Two coefficients for encoding/decoding the locations of bounding boxes. The first value is used to encode/decode coordinates of the centers. The second value is used to encode/decode the sizes of bounding boxes.
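To see how grids and aspect_ratios determine \(K\), the number of default bounding boxes, the count can be sketched as follows. Two things here are assumptions not stated in this reference: each feature map cell is taken to have two square boxes plus two boxes (a ratio and its reciprocal) per listed aspect ratio, and the grids and aspect_ratios values are the ones commonly quoted for SSD300. count_default_boxes is a hypothetical helper.

```python
def count_default_boxes(grids, aspect_ratios):
    """Count default boxes, assuming (len(ar) + 1) * 2 boxes per cell."""
    if len(grids) != len(aspect_ratios):
        raise ValueError('aspect_ratios must have len(grids) entries')
    return sum(g * g * (len(ar) + 1) * 2
               for g, ar in zip(grids, aspect_ratios))

# Values commonly used for SSD300 (assumed, not taken from this page).
grids = (38, 19, 10, 5, 3, 1)
aspect_ratios = ((2,), (2, 3), (2, 3), (2, 3), (2,), (2,))
K = count_default_boxes(grids, aspect_ratios)
```

Under these assumptions K is 8732, the number of default boxes reported for SSD300 in the original paper.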

decode(mb_loc, mb_conf, nms_thresh=0.45, score_thresh=0.6)
Decodes back to coordinates and classes of bounding boxes.
This method decodes mb_loc and mb_conf returned by an SSD network back to bbox, label and score.
Parameters:
 mb_loc (array) – A float array whose shape is \((K, 4)\), where \(K\) is the number of default bounding boxes.
 mb_conf (array) – A float array whose shape is \((K, n\_fg\_class + 1)\).
 nms_thresh (float) – The threshold value for chainercv.transforms.non_maximum_suppression(). The default value is 0.45.
 score_thresh (float) – The threshold value for confidence score. If the confidence score of a bounding box is lower than this value, the bounding box will be suppressed. The default value is 0.6.
Returns: This method returns a tuple of three arrays, (bbox, label, score).
 bbox: A float array of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by (y_min, x_min, y_max, x_max) in the second axis.
 label: An integer array of shape \((R,)\). Each value indicates the class of the bounding box.
 score: A float array of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of three arrays
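The roles of score_thresh and nms_thresh can be illustrated with a plain NumPy sketch for a single class. This is not the library's implementation: filter_boxes and iou_one_to_many are hypothetical names, and the greedy suppression shown is the textbook form of non-maximum suppression.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array; (y_min, x_min, y_max, x_max)."""
    tl = np.maximum(box[:2], boxes[:, :2])
    br = np.minimum(box[2:], boxes[:, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=1)
    area = np.prod(box[2:] - box[:2])
    areas = np.prod(boxes[:, 2:] - boxes[:, :2], axis=1)
    return inter / (area + areas - inter)

def filter_boxes(bbox, score, nms_thresh=0.45, score_thresh=0.6):
    # 1. Drop boxes whose confidence is below score_thresh.
    mask = score >= score_thresh
    bbox, score = bbox[mask], score[mask]
    # 2. Greedy NMS: visit boxes by descending score and keep a box only
    #    if it does not overlap an already kept box by more than nms_thresh.
    selected = []
    for i in np.argsort(-score):
        kept = bbox[np.array(selected, dtype=int)]
        if (iou_one_to_many(bbox[i], kept) <= nms_thresh).all():
            selected.append(i)
    selected = np.array(selected, dtype=int)
    return bbox[selected], score[selected]

bbox = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                 [20, 20, 30, 30], [0, 0, 5, 5]], dtype=np.float64)
score = np.array([0.9, 0.8, 0.7, 0.3])
kept_bbox, kept_score = filter_boxes(bbox, score)
```

In this example the last box is removed by the score threshold and the second box is removed because it overlaps the highest-scoring box too strongly.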

encode(bbox, label, iou_thresh=0.5)
Encodes coordinates and classes of bounding boxes.
This method encodes bbox and label to mb_loc and mb_label, which are used to compute multibox loss.
Parameters:
 bbox (array) – A float array of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by (y_min, x_min, y_max, x_max) in the second axis.
 label (array) – An integer array of shape \((R,)\). Each value indicates the class of the bounding box.
 iou_thresh (float) – The threshold value to determine whether a default bounding box is assigned to a ground truth or not. The default value is 0.5.
Returns: This method returns a tuple of two arrays, (mb_loc, mb_label).
 mb_loc: A float array of shape \((K, 4)\), where \(K\) is the number of default bounding boxes.
 mb_label: An integer array of shape \((K,)\).
Return type: tuple of two arrays
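How iou_thresh drives the assignment of default boxes to ground truth can be sketched as below. This is a simplified illustration under stated assumptions, not the library's code: pairwise_iou and assign_labels are hypothetical names, label 0 in mb_label is assumed to mean background (so foreground labels are shifted by one), and the extra rule from the original paper that matches every ground truth to its best default box is omitted.

```python
import numpy as np

def pairwise_iou(a, b):
    """IoU matrix of shape (len(a), len(b)); boxes are (y_min, x_min, y_max, x_max)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def assign_labels(default_bbox, bbox, label, iou_thresh=0.5):
    iou = pairwise_iou(default_bbox, bbox)        # shape (K, R)
    best_gt = iou.argmax(axis=1)                  # best ground truth per default box
    matched = iou.max(axis=1) >= iou_thresh
    mb_label = np.zeros(len(default_bbox), dtype=np.int32)  # 0: background (assumed)
    mb_label[matched] = label[best_gt[matched]] + 1
    return mb_label

default_bbox = np.array([[0, 0, 10, 10], [0, 0, 4, 4], [20, 20, 30, 30]], dtype=np.float64)
gt_bbox = np.array([[0, 0, 10, 9]], dtype=np.float64)
gt_label = np.array([3], dtype=np.int32)
mb_label = assign_labels(default_bbox, gt_bbox, gt_label)
```

Only the first default box overlaps the ground truth by at least 0.5, so it alone receives a foreground label.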
Normalize

class chainercv.links.model.ssd.Normalize(n_channel, initial=0, eps=1e-05)
Learnable L2 normalization [5].
This link normalizes input along the channel axis and scales it. The scale factors are trained channel-wise.
[5] Wei Liu, Andrew Rabinovich, Alexander C. Berg. ParseNet: Looking Wider to See Better. ICLR 2016. Parameters: 
__call__(x)
Normalize input and scale it.
Parameters: x (chainer.Variable) – A variable holding a 4-dimensional array. Its dtype is numpy.float32.
Returns: The shape and dtype are the same as those of the input.
Return type: chainer.Variable
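The operation performed by this link can be sketched in NumPy. This is an illustration under assumptions, not the link itself: l2_normalize_and_scale is a hypothetical function, the scale is passed in as a plain array instead of a learned parameter, and eps is assumed to be added to the norm before dividing.

```python
import numpy as np

def l2_normalize_and_scale(x, scale, eps=1e-5):
    """x: (B, C, H, W) float32 array; scale: (C,) per-channel factors."""
    # L2 norm over the channel axis at every spatial position.
    norm = np.sqrt(np.sum(x * x, axis=1, keepdims=True)) + eps
    return x / norm * scale[None, :, None, None]

x = np.random.RandomState(0).rand(2, 3, 4, 4).astype(np.float32) + 0.1
scale = np.full(3, 20, dtype=np.float32)
y = l2_normalize_and_scale(x, scale)
```

With a uniform scale, every spatial position of the output has a channel-wise L2 norm close to that scale, while the shape and dtype match the input.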

SSD

class chainercv.links.model.ssd.SSD(extractor, multibox, steps, sizes, variance=(0.1, 0.2), mean=0)
Base class of Single Shot Multibox Detector.
This is a base class of Single Shot Multibox Detector [6].
[6] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 extractor – A link which extracts feature maps. This link must have insize, grids and __call__().
  insize: An integer which indicates the size of input images. Images are resized to this size before feature extraction.
  grids: An iterable of integers. Each integer indicates the size of a feature map. This value is used by MultiboxCoder.
  __call__(): A method which computes feature maps. It must take a batch of images and return a batch of feature maps.
 multibox – A link which computes mb_locs and mb_confs from feature maps. This link must have n_class, aspect_ratios and __call__().
  n_class: An integer which indicates the number of classes. This value should include the background class.
  aspect_ratios: An iterable of tuples of integers. Each tuple indicates the aspect ratios of default bounding boxes at each feature map. This value is used by MultiboxCoder.
  __call__(): A method which computes mb_locs and mb_confs. It must take a batch of feature maps and return mb_locs and mb_confs.
 steps (iterable of float) – The step size for each feature map. This value is used by MultiboxCoder.
 sizes (iterable of float) – The base size of default bounding boxes for each feature map. This value is used by MultiboxCoder.
 variance (tuple of floats) – Two coefficients for decoding the locations of bounding boxes. This value is used by MultiboxCoder. The default value is (0.1, 0.2).
 nms_thresh (float) – The threshold value for chainercv.transforms.non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
 score_thresh (float) – The threshold value for confidence score. If the confidence score of a bounding box is lower than this value, the bounding box will be suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().

__call__(x)
Compute localization and classification from a batch of images.
This method computes two variables, mb_locs and mb_confs. self.coder.decode() converts these variables to bounding box coordinates and confidence scores. These variables are also used in training SSD.
Parameters: x (chainer.Variable) – A variable holding a batch of images. The images are preprocessed by _prepare().
Returns: This method returns two variables, mb_locs and mb_confs.
 mb_locs: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
 mb_confs: A variable of float arrays of shape \((B, K, n\_fg\_class + 1)\).
Return type: tuple of chainer.Variable

predict(imgs)
Detect objects from images.
This method predicts objects for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their values is \([0, 255]\).
Returns: This method returns a tuple of three lists, (bboxes, labels, scores).
 bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by (y_min, x_min, y_max, x_max) in the second axis.
 labels: A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
 scores: A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
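The input and output conventions of predict() can be exercised with a small wrapper. This is a sketch only: detect_one is a hypothetical helper, and FakeDetector is a stand-in that merely mimics the documented return structure so the snippet runs without pretrained weights. With ChainerCV installed, an instance such as SSD300(n_fg_class=20, pretrained_model='voc0712') would take its place.

```python
import numpy as np

def detect_one(model, img):
    """Run model.predict on one CHW, RGB image with values in [0, 255]."""
    if img.ndim != 3 or img.shape[0] != 3:
        raise ValueError('expected a CHW, RGB image')
    # predict() takes an iterable of images and returns three lists,
    # each with one entry per input image.
    bboxes, labels, scores = model.predict([img])
    return bboxes[0], labels[0], scores[0]

class FakeDetector:
    """Hypothetical stand-in that follows the predict() return convention."""
    def predict(self, imgs):
        bboxes = [np.array([[10, 20, 110, 220]], dtype=np.float32) for _ in imgs]
        labels = [np.array([6], dtype=np.int32) for _ in imgs]
        scores = [np.array([0.9], dtype=np.float32) for _ in imgs]
        return bboxes, labels, scores

img = np.zeros((3, 300, 400), dtype=np.float32)
bbox, label, score = detect_one(FakeDetector(), img)
for (y_min, x_min, y_max, x_max), l, s in zip(bbox, label, score):
    print(l, s, (y_min, x_min, y_max, x_max))
```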

use_preset(preset)
Use the given preset during prediction.
This method changes the values of nms_thresh and score_thresh. These values are, respectively, the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict().
If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
VGG16

class chainercv.links.model.ssd.VGG16
An extended VGG16 model for SSD300 and SSD512.
This is an extended VGG16 model proposed in [7]. The differences from the original VGG16 [8] are shown below.
 conv5_1, conv5_2 and conv5_3 are changed from Convolution2D to DilatedConvolution2D.
 Normalize is inserted after conv4_3.
 The parameters of max pooling after conv5_3 are changed.
 fc6 and fc7 are converted to conv6 and conv7.
[7] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
[8] Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.
VGG16Extractor300

class chainercv.links.model.ssd.VGG16Extractor300
A VGG16 based feature extractor for SSD300.
This is a feature extractor for SSD300. This extractor is based on VGG16.

__call__(x)
Compute feature maps from a batch of images.
This method extracts feature maps from conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2.
Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(300\times 300\).
Returns: Each variable contains a feature map.
Return type: list of Variable

VGG16Extractor512

class chainercv.links.model.ssd.VGG16Extractor512
A VGG16 based feature extractor for SSD512.
This is a feature extractor for SSD512. This extractor is based on VGG16.

__call__(x)
Compute feature maps from a batch of images.
This method extracts feature maps from conv4_3, conv7, conv8_2, conv9_2, conv10_2, conv11_2, and conv12_2.
Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(512\times 512\).
Returns: Each variable contains a feature map.
Return type: list of Variable

Train-only Utility
GradientScaling
multibox_loss

chainercv.links.model.ssd.multibox_loss(mb_locs, mb_confs, gt_mb_locs, gt_mb_labels, k)
Computes multibox losses.
This is a loss function used in [9]. This function returns loc_loss and conf_loss. loc_loss is a loss for localization and conf_loss is a loss for classification. The formulas of these losses can be found in equations (2) and (3) of the original paper.
[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 mb_locs (chainer.Variable or array) – The offsets and scales for predicted bounding boxes. Its shape is \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
 mb_confs (chainer.Variable or array) – The classes of predicted bounding boxes. Its shape is \((B, K, n\_class)\). This function assumes the first class is background (negative).
 gt_mb_locs (chainer.Variable or array) – The offsets and scales for ground truth bounding boxes. Its shape is \((B, K, 4)\).
 gt_mb_labels (chainer.Variable or array) – The classes of ground truth bounding boxes. Its shape is \((B, K)\).
 k (float) – A coefficient which is used for hard negative mining. This value determines the ratio between the number of positives and that of mined negatives. The value used in the original paper is 3.
Returns: This function returns two chainer.Variable: loc_loss and conf_loss.
Return type: tuple of chainer.Variable
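The effect of k on hard negative mining can be sketched for a single image as follows. This is an illustration, not the function's actual code: hard_negative_mask is a hypothetical helper that takes a per-box classification loss and a mask of positive (matched) boxes, and it selects the k times as many negatives with the largest loss.

```python
import numpy as np

def hard_negative_mask(conf_loss, positive, k=3):
    """conf_loss: (K,) per-box loss; positive: (K,) bool mask of matched boxes."""
    n_negative = int(k * positive.sum())
    # Exclude positives from the ranking by giving them the lowest priority.
    neg_loss = np.where(positive, -np.inf, conf_loss)
    order = np.argsort(-neg_loss)
    mined = np.zeros_like(positive)
    mined[order[:n_negative]] = True
    # Guard against selecting positives when negatives run out.
    mined &= ~positive
    return mined

conf_loss = np.array([0.1, 2.0, 0.5, 3.0, 0.2, 0.05, 1.0, 0.3])
positive = np.array([False, False, False, True, False, False, False, False])
mined = hard_negative_mask(conf_loss, positive, k=3)
```

Here one positive box and k = 3 lead to the three negatives with the highest loss being kept; the classification loss would then be accumulated over the positives and these mined negatives only.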
random_crop_with_bbox_constraints

chainercv.links.model.ssd.random_crop_with_bbox_constraints(img, bbox, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50, return_param=False)
Crop an image randomly with bounding box constraints.
This data augmentation is used in training of Single Shot Multibox Detector [10]. More details can be found in the data augmentation section of the original paper.
[10] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 img (ndarray) – An image array to be cropped. This is in CHW format.
 bbox (ndarray) – Bounding boxes used for constraints. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
 min_scale (float) – The minimum ratio between a cropped region and the original image. The default value is 0.3.
 max_scale (float) – The maximum ratio between a cropped region and the original image. The default value is 1.
 max_aspect_ratio (float) – The maximum aspect ratio of the cropped region. The default value is 2.
 constraints (iterable of tuples) – An iterable of constraints. Each constraint should be in (min_iou, max_iou) format. Setting min_iou or max_iou to None means that side is not limited. If this argument is not specified, ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1)) will be used.
 max_trial (int) – The maximum number of trials to be conducted for each constraint. If this function cannot find any region that satisfies the constraint in \(max\_trial\) trials, this function skips the constraint. The default value is 50.
 return_param (bool) – If True, this function returns information of intermediate values.
Returns: If return_param = False, returns an array img that is cropped from the input array.
If return_param = True, returns a tuple whose elements are img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
 constraint (tuple): The chosen constraint.
 y_slice (slice): A slice in vertical direction used to crop the input image.
 x_slice (slice): A slice in horizontal direction used to crop the input image.
Return type:
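Two pieces of this interface lend themselves to a short sketch: how a (min_iou, max_iou) constraint with None entries can be interpreted, and how the returned y_slice and x_slice relate to the cropped image. Both helpers are hypothetical, and the way the constraint is applied across several boxes (lowest IoU against min_iou, highest IoU against max_iou) is an assumption rather than something this page states.

```python
import numpy as np

def satisfies(ious, constraint):
    """Check IoUs between a candidate crop and all boxes against one constraint."""
    min_iou, max_iou = constraint
    if min_iou is not None and ious.min() < min_iou:
        return False
    if max_iou is not None and ious.max() > max_iou:
        return False
    return True

def apply_crop(img, param):
    """Crop a CHW image with the slices stored in a param-like dictionary."""
    return img[:, param['y_slice'], param['x_slice']]

ious = np.array([0.4, 0.8])
img = np.zeros((3, 100, 200), dtype=np.float32)
param = {'constraint': (0.3, None),
         'y_slice': slice(10, 60), 'x_slice': slice(20, 120)}
cropped = apply_crop(img, param)
```

Applying the stored slices to the original image reproduces the crop, which is useful when the same region has to be cut from an accompanying array such as a label map.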
random_distort

chainercv.links.model.ssd.random_distort(img, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18)
A color related data augmentation used in SSD.
This function is a combination of four augmentation methods: brightness, contrast, saturation and hue.
 brightness: Adding a random offset to the intensity of the image.
 contrast: Multiplying the intensity of the image by a random scale.
 saturation: Multiplying the saturation of the image by a random scale.
 hue: Adding a random offset to the hue of the image.
This data augmentation is used in training of Single Shot Multibox Detector [11].
Note that this function requires cv2.
[11] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
 img (ndarray) – An image array to be augmented. This is in CHW and RGB format.
 brightness_delta (float) – The offset for brightness will be drawn from \([-brightness\_delta, brightness\_delta]\). The default value is 32.
 contrast_low (float) – The scale for contrast will be drawn from \([contrast\_low, contrast\_high]\). The default value is 0.5.
 contrast_high (float) – See contrast_low. The default value is 1.5.
 saturation_low (float) – The scale for saturation will be drawn from \([saturation\_low, saturation\_high]\). The default value is 0.5.
 saturation_high (float) – See saturation_low. The default value is 1.5.
 hue_delta (float) – The offset for hue will be drawn from \([-hue\_delta, hue\_delta]\). The default value is 18.
Returns: An image in CHW and RGB format.
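The brightness and contrast steps can be approximated without cv2 as below. This is a simplified sketch under assumptions, not the function's implementation: distort_brightness_contrast is a hypothetical name, both steps are always applied in a fixed order, results are clipped to \([0, 255]\), and the saturation and hue steps (which need a color space conversion) are left out.

```python
import numpy as np

def distort_brightness_contrast(img, rng, brightness_delta=32,
                                contrast_low=0.5, contrast_high=1.5):
    """img: CHW, RGB array with values in [0, 255]; rng: np.random.RandomState."""
    img = img.astype(np.float32)
    # Brightness: add one random offset to every pixel.
    offset = rng.uniform(-brightness_delta, brightness_delta)
    img = np.clip(img + offset, 0, 255)
    # Contrast: multiply every pixel by one random scale.
    scale = rng.uniform(contrast_low, contrast_high)
    img = np.clip(img * scale, 0, 255)
    return img

rng = np.random.RandomState(0)
img = rng.randint(0, 256, size=(3, 32, 32)).astype(np.float32)
out = distort_brightness_contrast(img, rng)
```

Passing the random state explicitly keeps the augmentation reproducible, which helps when comparing training runs.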
resize_with_random_interpolation

chainercv.links.model.ssd.resize_with_random_interpolation(img, size, return_param=False)
Resize an image with a randomly selected interpolation method.
This function is similar to chainercv.transforms.resize(), but this chooses the interpolation method randomly.
This data augmentation is used in training of Single Shot Multibox Detector [12].
Note that this function requires cv2.
[12] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
Returns: If return_param = False, returns an array img that is the result of resizing.
If return_param = True, returns a tuple whose elements are img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
 interpolation: The chosen interpolation method.
Return type: