SSD (Single Shot Multibox Detector)

Utility

Multibox

class chainercv.links.model.ssd.Multibox(n_class, aspect_ratios, initialW=None, initial_bias=None)

Multibox head of Single Shot Multibox Detector.

This is the head part of the Single Shot Multibox Detector [3]. This link computes loc and conf from feature maps. loc carries the coordinate information of the bounding boxes, and conf carries their class information.

[3]Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
  • n_class (int) – The number of classes possibly including the background.
  • aspect_ratios (iterable of tuple or int) – The aspect ratios of default bounding boxes for each feature map.
  • initialW – An initializer used in chainer.links.Convolution2D.__init__(). The default value is chainer.initializers.GlorotUniform.
  • initial_bias – An initializer used in chainer.links.Convolution2D.__init__(). The default value is chainer.initializers.Zero.
__call__(xs)

Compute loc and conf from feature maps.

This method computes loc and conf from given feature maps.

Parameters:xs (iterable of chainer.Variable) – An iterable of feature maps. The number of feature maps must be the same as the number of aspect_ratios.
Returns:This method returns two chainer.Variable objects, loc and conf. loc is an array of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes. conf is an array of shape \((B, K, n\_class)\).
Return type:tuple of chainer.Variable
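
To make the shapes above concrete, the following sketch counts the default bounding boxes \(K\) for a given configuration. It is not part of the library. It assumes that each feature-map cell gets two square boxes plus two boxes per aspect ratio, and that the feature maps are square. The grid sizes and aspect ratios are the commonly cited SSD300 values and are likewise assumptions here, as are the helper name count_default_boxes and the example values of n_class and batch_size.

```python
def count_default_boxes(grids, aspect_ratios):
    """Return K, the total number of default bounding boxes.

    grids: side length of each (square) feature map.
    aspect_ratios: one tuple of aspect ratios per feature map.
    """
    if len(grids) != len(aspect_ratios):
        raise ValueError('grids and aspect_ratios must have the same length')
    total = 0
    for grid, ratios in zip(grids, aspect_ratios):
        # Assumption: two square boxes plus two boxes per aspect ratio.
        boxes_per_cell = (len(ratios) + 1) * 2
        total += grid * grid * boxes_per_cell
    return total


# Assumed SSD300-style configuration (not taken from this page).
grids = (38, 19, 10, 5, 3, 1)
aspect_ratios = ((2,), (2, 3), (2, 3), (2, 3), (2,), (2,))

K = count_default_boxes(grids, aspect_ratios)
n_class = 21      # illustrative: 20 foreground classes plus background
batch_size = 2    # illustrative

loc_shape = (batch_size, K, 4)
conf_shape = (batch_size, K, n_class)
print(K, loc_shape, conf_shape)
```

With these assumed values, \(K\) comes out to 8732, so loc has shape \((2, 8732, 4)\) and conf has shape \((2, 8732, 21)\).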

Normalize

class chainercv.links.model.ssd.Normalize(n_channel, initial=0, eps=1e-05)

Learnable L2 normalization [4].

This link normalizes input along the channel axis and scales it. The scale factors are trained channel-wise.

[4]Wei Liu, Andrew Rabinovich, Alexander C. Berg. ParseNet: Looking Wider to See Better. ICLR 2016.
Parameters:
  • n_channel (int) – The number of channels.
  • initial – A value to initialize the scale factors. It is passed to chainer.initializers._get_initializer(). The default value is 0.
  • eps (float) – A small value to avoid zero-division. The default value is \(1e-5\).
__call__(x)

Normalize input and scale it.

Parameters:x (chainer.Variable) – A variable holding a 4-dimensional array. Its dtype is numpy.float32.
Returns:A variable whose shape and dtype are the same as those of the input.
Return type:chainer.Variable
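
The computation described above can be sketched in plain NumPy. This is an illustration rather than the link's actual implementation: it assumes that eps is added to the norm (not placed inside the square root), and the function name l2_normalize_and_scale and the scale value of 20 are chosen for the example only.

```python
import numpy as np


def l2_normalize_and_scale(x, scale, eps=1e-5):
    """Normalize x along the channel axis, then scale channel-wise.

    x: array of shape (B, C, H, W) and dtype float32.
    scale: array of shape (C,), one factor per channel.
    """
    # L2 norm over channels, kept as (B, 1, H, W) so it broadcasts.
    norm = np.sqrt(np.sum(np.square(x), axis=1, keepdims=True))
    # Assumption: eps is added to the norm to avoid division by zero.
    x_hat = x / (norm + eps)
    # Broadcast the per-channel scale over batch and spatial axes.
    return x_hat * scale[None, :, None, None]


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 4)).astype(np.float32)
scale = np.full(3, 20, dtype=np.float32)  # illustrative value

y = l2_normalize_and_scale(x, scale)
print(y.shape, y.dtype)
```

Because every channel shares the same scale in this example, the channel-wise L2 norm of the output is close to that scale at every spatial position. In the real link the scale factors are learned, so they generally differ per channel.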

SSD

class chainercv.links.model.ssd.SSD(extractor, multibox, steps, sizes, variance=(0.1, 0.2), mean=0)

Base class of Single Shot Multibox Detector.

This is a base class of Single Shot Multibox Detector [5].

[5]Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
  • extractor

    A link which extracts feature maps. This link must have insize, grids and __call__().

    • insize: An integer which indicates the size of input images. Images are resized to this size before feature extraction.
    • grids: An iterable of integers. Each integer indicates the size of a feature map.
    • __call__(): A method which computes feature maps. It must take a batch of images and return batched feature maps.
  • multibox

    A link which computes loc and conf from feature maps. This link must have n_class, aspect_ratios and __call__().

    • n_class: An integer which indicates the number of classes. This value should include the background class.
    • aspect_ratios: An iterable of tuples of integers. Each tuple indicates the aspect ratios of default bounding boxes for one feature map.
    • __call__(): A method which computes loc and conf. It must take batched feature maps and return loc and conf.
  • steps (iterable of float) – The step size for each feature map.
  • sizes (iterable of float) – The base size of default bounding boxes for each feature map.
  • variance (tuple of float) – Two coefficients for encoding the locations of bounding boxes. The first value is used to encode the coordinates of the centers. The second value is used to encode the sizes of bounding boxes. The default value is (0.1, 0.2).
  • nms_thresh (float) – The threshold value for chainercv.transforms.non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
  • score_thresh (float) – The threshold value for the confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().
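
As a rough illustration of how grids, aspect_ratios, steps and sizes could combine into default bounding boxes, the sketch below follows the scheme of the SSD paper [5]. It is an assumption-laden sketch, not the library's code: the helper name make_default_boxes is hypothetical, the toy configuration is invented, and sizes is assumed to hold one more entry than there are feature maps so that the second square box can use the geometric mean of two consecutive sizes.

```python
import itertools
import math


def make_default_boxes(grids, aspect_ratios, steps, sizes):
    """Return default boxes as (center_y, center_x, height, width) tuples.

    Assumption: len(sizes) == len(grids) + 1.
    """
    boxes = []
    for k, grid in enumerate(grids):
        for v, u in itertools.product(range(grid), repeat=2):
            # Cell centers are spaced by the step size of this feature map.
            cy = (v + 0.5) * steps[k]
            cx = (u + 0.5) * steps[k]

            # Square box at the base size.
            s = sizes[k]
            boxes.append((cy, cx, s, s))

            # Square box at the geometric mean of this size and the next.
            s_next = math.sqrt(sizes[k] * sizes[k + 1])
            boxes.append((cy, cx, s_next, s_next))

            # Two boxes per aspect ratio: wide and tall.
            for ar in aspect_ratios[k]:
                r = math.sqrt(ar)
                boxes.append((cy, cx, s / r, s * r))
                boxes.append((cy, cx, s * r, s / r))
    return boxes


# Toy configuration, chosen only to keep the output small.
boxes = make_default_boxes(
    grids=(2, 1),
    aspect_ratios=((2,), (2, 3)),
    steps=(50, 100),
    sizes=(30, 60, 90),
)
print(len(boxes), boxes[0])
```

In this toy setup the first feature map contributes \(2 \times 2 \times 4\) boxes and the second contributes \(1 \times 1 \times 6\), for 22 boxes in total.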
__call__(x)

Compute localization and classification from a batch of images.

This method computes two variables, loc and conf. _decode() converts these variables to bounding box coordinates and confidence scores. These variables are also used in training SSD.

Parameters:x (chainer.Variable) – A variable holding a batch of images. The images are preprocessed by _prepare().
Returns:This method returns two variables, loc and conf.
  • loc: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
  • conf: A variable of float arrays of shape \((B, K, n\_fg\_class + 1)\).
Return type:tuple of chainer.Variable
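
The role of variance when loc is converted back to box coordinates can be sketched as follows. This mirrors the standard SSD decoding scheme and is only an approximation of what _decode() may do internally. The function name decode_loc, the (center_y, center_x, height, width) layout of the default boxes, and the sample values are all assumptions made for this example.

```python
import numpy as np


def decode_loc(loc, default_bbox, variance=(0.1, 0.2)):
    """Convert predicted offsets into (y_min, x_min, y_max, x_max) boxes.

    loc: (K, 4) offsets for a single image.
    default_bbox: (K, 4) boxes as (center_y, center_x, height, width).
    """
    bbox = default_bbox.astype(np.float32)  # astype returns a copy
    # Centers: shift by the offset, scaled by variance[0] and box size.
    bbox[:, :2] += loc[:, :2] * variance[0] * default_bbox[:, 2:]
    # Sizes: multiply by exp(offset * variance[1]).
    bbox[:, 2:] *= np.exp(loc[:, 2:] * variance[1])
    # (center, size) -> (top-left, bottom-right).
    top_left = bbox[:, :2] - bbox[:, 2:] / 2
    bottom_right = top_left + bbox[:, 2:]
    return np.hstack((top_left, bottom_right))


default_bbox = np.array([[150, 150, 100, 50]], dtype=np.float32)

# Zero offsets reproduce the default box in corner form.
zero_loc = np.zeros((1, 4), dtype=np.float32)
decoded_zero = decode_loc(zero_loc, default_bbox)

# An offset of 1 on center_y moves the box by 0.1 * height = 10 pixels.
shift_loc = np.array([[1, 0, 0, 0]], dtype=np.float32)
decoded_shift = decode_loc(shift_loc, default_bbox)
print(decoded_zero, decoded_shift)
```

The first variance coefficient scales the center offsets and the second scales the log-size offsets, which matches the description of variance in the parameter list.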
predict(imgs)

Detect objects from images.

This method predicts objects for each image.

Parameters:imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns:This method returns a tuple of three lists, (bboxes, labels, scores).
  • bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by (y_min, x_min, y_max, x_max) in the second axis.
  • labels: A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
  • scores: A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type:tuple of lists
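
How score_thresh and nms_thresh might interact during prediction can be sketched for a single class with a greedy non-maximum suppression in NumPy. This is an illustrative stand-in, not the library routine: the function name suppress is hypothetical, the real method is expected to handle every foreground class, and the sample boxes are made up.

```python
import numpy as np


def suppress(bbox, score, nms_thresh=0.45, score_thresh=0.6):
    """Drop low-score boxes, then apply greedy NMS.

    bbox: (R, 4) boxes as (y_min, x_min, y_max, x_max).
    score: (R,) confidence scores for one class.
    """
    # Discard boxes whose score is lower than score_thresh.
    mask = score >= score_thresh
    bbox, score = bbox[mask], score[mask]

    # Visit boxes from the highest score to the lowest.
    order = np.argsort(-score)
    bbox, score = bbox[order], score[order]
    area = np.prod(bbox[:, 2:] - bbox[:, :2], axis=1)

    selected = []
    for i in range(len(bbox)):
        keep = True
        for j in selected:
            # Intersection rectangle of box i and an already kept box j.
            tl = np.maximum(bbox[i, :2], bbox[j, :2])
            br = np.minimum(bbox[i, 2:], bbox[j, 2:])
            inter = np.prod(np.clip(br - tl, 0, None))
            iou = inter / (area[i] + area[j] - inter)
            if iou >= nms_thresh:
                keep = False
                break
        if keep:
            selected.append(i)

    selected = np.array(selected, dtype=int)
    return bbox[selected], score[selected]


bbox = np.array([[0, 0, 10, 10],
                 [1, 1, 11, 11],
                 [20, 20, 30, 30],
                 [0, 0, 5, 5]], dtype=np.float32)
score = np.array([0.9, 0.8, 0.7, 0.3], dtype=np.float32)

kept_bbox, kept_score = suppress(bbox, score)
print(kept_bbox, kept_score)
```

Here the last box is removed by the score threshold, and the second box is removed because its overlap with the first (IoU of about 0.68) exceeds nms_thresh, leaving two boxes.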
use_preset(preset)

Use the given preset during prediction.

This method changes the values of nms_thresh and score_thresh. nms_thresh is the threshold used for non-maximum suppression, and score_thresh is the threshold used to discard low-confidence proposals in predict().

If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.

Parameters:preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
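
A preset mechanism of this kind can be pictured as a small switch over the two public attributes. The class below is a hypothetical stand-in, not the SSD link. The 'visualize' values reuse the defaults documented above, while the 'evaluate' values are an assumption for illustration and should be checked against the library source.

```python
class ThresholdHolder:
    """Hypothetical stand-in that only models the two threshold attributes."""

    def __init__(self):
        # Defaults as documented in the parameter list.
        self.nms_thresh = 0.45
        self.score_thresh = 0.6

    def use_preset(self, preset):
        if preset == 'visualize':
            self.nms_thresh = 0.45
            self.score_thresh = 0.6
        elif preset == 'evaluate':
            # Assumption: evaluation keeps low-confidence boxes so that
            # the metric can sweep over the full range of scores.
            self.nms_thresh = 0.45
            self.score_thresh = 0.01
        else:
            raise ValueError("preset must be 'visualize' or 'evaluate'")


model = ThresholdHolder()
model.use_preset('evaluate')
evaluate_thresholds = (model.nms_thresh, model.score_thresh)

# Attributes can also be set directly when no preset fits.
model.score_thresh = 0.3
custom_thresholds = (model.nms_thresh, model.score_thresh)
print(evaluate_thresholds, custom_thresholds)
```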

VGG16

class chainercv.links.model.ssd.VGG16

An extended VGG-16 model for SSD300 and SSD512.

This is the extended VGG-16 model proposed in [6]. The differences from the original VGG-16 [7] are shown below.

  • conv5_1, conv5_2 and conv5_3 are changed from Convolution2D to DilatedConvolution2D.
  • Normalize is inserted after conv4_3.
  • The parameters of max pooling after conv5_3 are changed.
  • fc6 and fc7 are converted to conv6 and conv7.
[6]Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
[7]Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

VGG16Extractor300

class chainercv.links.model.ssd.VGG16Extractor300

A VGG-16 based feature extractor for SSD300.

This is a feature extractor for SSD300. This extractor is based on VGG16.

__call__(x)

Compute feature maps from a batch of images.

This method extracts feature maps from conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2.

Parameters:x (ndarray) – An array holding a batch of images. The images should be resized to \(300\times 300\).
Returns:Each variable contains a feature map.
Return type:list of Variable

VGG16Extractor512

class chainercv.links.model.ssd.VGG16Extractor512

A VGG-16 based feature extractor for SSD512.

This is a feature extractor for SSD512. This extractor is based on VGG16.

__call__(x)

Compute feature maps from a batch of images.

This method extracts feature maps from conv4_3, conv7, conv8_2, conv9_2, conv10_2, conv11_2, and conv12_2.

Parameters:x (ndarray) – An array holding a batch of images. The images should be resized to \(512\times 512\).
Returns:Each variable contains a feature map.
Return type:list of Variable