SSD (Single Shot Multibox Detector)
Detection Links
SSD300
class chainercv.links.model.ssd.SSD300(n_fg_class=None, pretrained_model=None)
Single Shot Multibox Detector with 300x300 inputs.
This is a model of Single Shot Multibox Detector [1]. This model uses VGG16Extractor300 as its feature extractor.
[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
- n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (str) – The weight file to be loaded. This can take 'voc0712', filepath or None. The default value is None.
  - 'voc0712': Load weights trained on the trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically. n_fg_class must be 20 or None. These weights were converted from the Caffe model provided by the original implementation. The conversion code is chainercv/examples/ssd/caffe2npz.py.
  - filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
  - None: Do not load weights.
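The rules relating n_fg_class and pretrained_model can be summarized in a short sketch. The helper resolve_n_fg_class is hypothetical and is not part of ChainerCV; it only mirrors the parameter description above. It additionally assumes that n_fg_class is required when no weights are loaded, which this page does not state.

```python
def resolve_n_fg_class(n_fg_class=None, pretrained_model=None):
    """Return the number of foreground classes implied by the arguments.

    Hypothetical helper that mirrors the documented rules; not ChainerCV code.
    """
    if pretrained_model == 'voc0712':
        # The VOC weights cover 20 foreground classes.
        if n_fg_class not in (None, 20):
            raise ValueError("n_fg_class must be 20 or None for 'voc0712'")
        return 20
    # A file path or None: the caller has to state the class count
    # (required for None is an assumption of this sketch).
    if n_fg_class is None:
        raise ValueError('n_fg_class must be specified')
    return n_fg_class


print(resolve_n_fg_class(pretrained_model='voc0712'))  # 20
```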
SSD512
class chainercv.links.model.ssd.SSD512(n_fg_class=None, pretrained_model=None)
Single Shot Multibox Detector with 512x512 inputs.
This is a model of Single Shot Multibox Detector [2]. This model uses VGG16Extractor512 as its feature extractor.
[2] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
- n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (str) – The weight file to be loaded. This can take 'voc0712', filepath or None. The default value is None.
  - 'voc0712': Load weights trained on the trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically. n_fg_class must be 20 or None. These weights were converted from the Caffe model provided by the original implementation. The conversion code is chainercv/examples/ssd/caffe2npz.py.
  - filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
  - None: Do not load weights.
Utility
Multibox
class chainercv.links.model.ssd.Multibox(n_class, aspect_ratios, initialW=None, initial_bias=None)
Multibox head of Single Shot Multibox Detector.
This is the head part of Single Shot Multibox Detector [3]. This link computes loc and conf from feature maps. loc contains the coordinates of bounding boxes and conf contains the class confidences.
[3] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
- n_class (int) – The number of classes possibly including the background.
- aspect_ratios (iterable of tuple or int) – The aspect ratios of default bounding boxes for each feature map.
- initialW – An initializer used in chainer.links.Convolution2D.__init__(). The default value is chainer.initializers.GlorotUniform.
- initial_bias – An initializer used in chainer.links.Convolution2D.__init__(). The default value is chainer.initializers.Zero.
__call__(xs)
Compute loc and conf from feature maps.
This method computes loc and conf from given feature maps.
Parameters: xs (iterable of chainer.Variable) – An iterable of feature maps. The number of feature maps must be the same as the number of aspect_ratios.
Returns: This method returns two chainer.Variable, loc and conf. loc is an array whose shape is \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes. conf is an array whose shape is \((B, K, n\_class)\).
Return type: tuple of chainer.Variable
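The number of default bounding boxes \(K\) follows from the feature map sizes and the aspect ratios. The sketch below is plain Python, not ChainerCV code. It assumes square feature maps and (len(ratios) + 1) * 2 default boxes per location, following the layout in the SSD paper. The grid sizes and aspect ratios are assumed SSD300 values that are not stated on this page.

```python
def count_default_boxes(grids, aspect_ratios):
    """Count default boxes K over all feature maps.

    Assumes square feature maps and (len(ratios) + 1) * 2 boxes
    per location, as in the SSD paper.
    """
    if len(grids) != len(aspect_ratios):
        raise ValueError('one aspect-ratio tuple is needed per feature map')
    total = 0
    for grid, ratios in zip(grids, aspect_ratios):
        boxes_per_location = (len(ratios) + 1) * 2
        total += grid * grid * boxes_per_location
    return total


# Assumed SSD300 configuration (not taken from this page).
grids = (38, 19, 10, 5, 3, 1)
aspect_ratios = ((2,), (2, 3), (2, 3), (2, 3), (2,), (2,))
k = count_default_boxes(grids, aspect_ratios)
print(k)  # 8732
```

With a batch of \(B\) images and these values, loc would have shape \((B, 8732, 4)\) and conf would have shape \((B, 8732, n\_class)\).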
Normalize
class chainercv.links.model.ssd.Normalize(n_channel, initial=0, eps=1e-05)
Learnable L2 normalization [4].
This link normalizes input along the channel axis and scales it. The scale factors are trained channel-wise.
[4] Wei Liu, Andrew Rabinovich, Alexander C. Berg. ParseNet: Looking Wider to See Better. ICLR 2016.
__call__(x)
Normalize input and scale it.
Parameters: x (chainer.Variable) – A variable holding a 4-dimensional array. Its dtype is numpy.float32.
Returns: The shape and dtype are the same as those of the input.
Return type: chainer.Variable
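The normalization can be illustrated for a single spatial location. The following is a minimal pure-Python sketch, not the link's implementation. It assumes eps is added to the L2 norm in the denominator, and the scale values are arbitrary numbers chosen for the example.

```python
import math


def l2_normalize_channels(values, scales, eps=1e-5):
    """Normalize one location's channel vector and scale it channel-wise.

    values: channel values at one spatial location.
    scales: one scale factor per channel (learned in the real link).
    """
    if len(values) != len(scales):
        raise ValueError('one scale factor is needed per channel')
    norm = math.sqrt(sum(v * v for v in values))
    # Assumption of this sketch: eps is added to the norm to avoid
    # division by zero.
    return [s * v / (norm + eps) for v, s in zip(values, scales)]


out = l2_normalize_channels([3.0, 4.0], scales=[20.0, 20.0])
print(out)  # close to [12.0, 16.0]
```

Applying the same operation independently at every location of every image in the batch gives an output with the same shape as the input.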
SSD
class chainercv.links.model.ssd.SSD(extractor, multibox, steps, sizes, variance=(0.1, 0.2), mean=0)
Base class of Single Shot Multibox Detector.
This is a base class of Single Shot Multibox Detector [5].
[5] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
- extractor – A link which extracts feature maps. This link must have insize, grids and __call__().
  - insize: An integer which indicates the size of input images. Images are resized to this size before feature extraction.
  - grids: An iterable of integers. Each integer indicates the size of a feature map.
  - __call__(): A method which computes feature maps. It must take a batch of images and return batched feature maps.
- multibox – A link which computes loc and conf from feature maps. This link must have n_class, aspect_ratios and __call__().
  - n_class: An integer which indicates the number of classes. This value should include the background class.
  - aspect_ratios: An iterable of tuples of integers. Each tuple indicates the aspect ratios of default bounding boxes at each feature map.
  - __call__(): A method which computes loc and conf. It must take batched feature maps and return loc and conf.
- steps (iterable of float) – The step size for each feature map.
- sizes (iterable of float) – The base size of default bounding boxes for each feature map.
- variance (tuple of float) – Two coefficients for encoding the locations of bounding boxes. The first value is used to encode coordinates of the centers. The second value is used to encode the sizes of bounding boxes. The default value is (0.1, 0.2).
- nms_thresh (float) – The threshold value for chainercv.utils.non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
- score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().
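The requirements on extractor and multibox amount to a duck-typed interface. The sketch below checks that interface with stand-in classes. DummyExtractor, DummyMultibox and check_ssd_components are hypothetical names used only for illustration, and the grid and aspect-ratio values are assumed, not taken from this page.

```python
class DummyExtractor:
    """Stand-in exposing the attributes required of an extractor."""
    insize = 300
    grids = (38, 19, 10, 5, 3, 1)  # assumed values

    def __call__(self, x):
        # Placeholder: one feature map per grid entry.
        return [None] * len(self.grids)


class DummyMultibox:
    """Stand-in exposing the attributes required of a multibox head."""
    n_class = 21  # 20 foreground classes plus the background
    aspect_ratios = ((2,), (2, 3), (2, 3), (2, 3), (2,), (2,))  # assumed

    def __call__(self, xs):
        # Placeholder for (loc, conf).
        return None, None


def check_ssd_components(extractor, multibox):
    """Raise if the two objects do not satisfy the documented interface."""
    for obj, names in ((extractor, ('insize', 'grids')),
                       (multibox, ('n_class', 'aspect_ratios'))):
        for name in names:
            if not hasattr(obj, name):
                raise TypeError('missing attribute: ' + name)
        if not callable(obj):
            raise TypeError('component must define __call__()')
    # One aspect-ratio tuple is expected per feature map.
    if len(extractor.grids) != len(multibox.aspect_ratios):
        raise ValueError('grids and aspect_ratios differ in length')
    return True


print(check_ssd_components(DummyExtractor(), DummyMultibox()))  # True
```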
__call__(x)
Compute localization and classification from a batch of images.
This method computes two variables, loc and conf. _decode() converts these variables to bounding box coordinates and confidence scores. These variables are also used in training SSD.
Parameters: x (chainer.Variable) – A variable holding a batch of images. The images are preprocessed by _prepare().
Returns: This method returns two variables, loc and conf.
- loc: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
- conf: A variable of float arrays of shape \((B, K, n\_fg\_class + 1)\).
Return type: tuple of chainer.Variable
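To illustrate how a loc vector relates to box coordinates, the sketch below decodes one loc entry against one default bounding box using the variance coefficients. It follows the encoding in the SSD paper and is not the body of _decode(). The (center_y, center_x, height, width) ordering and the example numbers are assumptions of this sketch.

```python
import math


def decode_box(loc, default_box, variance=(0.1, 0.2)):
    """Decode one loc vector against one default box.

    Both arguments are (center_y, center_x, height, width) 4-tuples;
    this ordering is an assumption of the sketch.
    Returns (y_min, x_min, y_max, x_max).
    """
    dy, dx, dh, dw = loc
    cy, cx, h, w = default_box
    # variance[0] scales the center offsets, measured in box sizes.
    center_y = cy + dy * variance[0] * h
    center_x = cx + dx * variance[0] * w
    # variance[1] scales the log-space size offsets.
    height = h * math.exp(dh * variance[1])
    width = w * math.exp(dw * variance[1])
    return (center_y - height / 2, center_x - width / 2,
            center_y + height / 2, center_x + width / 2)


# A zero offset reproduces the default box itself.
box = decode_box((0.0, 0.0, 0.0, 0.0), (150.0, 150.0, 60.0, 30.0))
print(box)  # (120.0, 135.0, 180.0, 165.0)
```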
predict(imgs)
Detect objects from images.
This method predicts objects for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their values is \([0, 255]\).
Returns: This method returns a tuple of three lists, (bboxes, labels, scores).
- bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by (y_min, x_min, y_max, x_max) in the second axis.
- labels: A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores: A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
use_preset(preset)
Use the given preset during prediction.
This method changes the values of nms_thresh and score_thresh. These values are the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict(), respectively.
If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
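The effect of the two thresholds can be sketched in plain Python. This is a simplified, class-agnostic illustration of score filtering followed by greedy non-maximum suppression. It does not call ChainerCV, and the helper names iou and filter_detections are hypothetical. The default arguments reuse the documented defaults of 0.45 and 0.6.

```python
def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(bboxes, scores, nms_thresh=0.45, score_thresh=0.6):
    """Return indices kept after score filtering and greedy NMS."""
    # Discard low-confidence boxes first.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it overlaps no kept box by more than nms_thresh.
        if all(iou(bboxes[i], bboxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


bboxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_detections(bboxes, scores)
print(kept)  # [0, 2]
```

Lowering score_thresh keeps more low-confidence boxes, which is the kind of trade-off the presets select between.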
VGG16
class chainercv.links.model.ssd.VGG16(**links)
An extended VGG-16 model for SSD300 and SSD512.
This is an extended VGG-16 model proposed in [6]. The differences from the original VGG-16 [7] are shown below.
- conv5_1, conv5_2 and conv5_3 are changed from Convolution2D to DilatedConvolution2D.
- Normalize is inserted after conv4_3.
- The parameters of max pooling after conv5_3 are changed.
- fc6 and fc7 are converted to conv6 and conv7.
[6] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
[7] Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.
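A dilated convolution enlarges the receptive field without changing the number of weights. The sketch below computes the output size of a convolution with dilation, using the standard formula with floor rounding. The layer parameters in the example are illustrative and are not taken from the VGG16 definition in ChainerCV.

```python
def conv_output_size(size, ksize, stride=1, pad=0, dilate=1):
    """Output size of a convolution along one axis (floor rounding)."""
    # A dilated kernel covers dilate * (ksize - 1) + 1 input positions.
    effective_ksize = dilate * (ksize - 1) + 1
    return (size + 2 * pad - effective_ksize) // stride + 1


# Illustrative parameters: both settings preserve a 19x19 feature map.
plain = conv_output_size(19, ksize=3, pad=1)
dilated = conv_output_size(19, ksize=3, pad=6, dilate=6)
print(plain, dilated)  # 19 19
```

In the second call the 3x3 kernel spans 13 input positions per axis, so a padding of 6 is needed to keep the spatial size unchanged.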
VGG16Extractor300
class chainercv.links.model.ssd.VGG16Extractor300
A VGG-16 based feature extractor for SSD300.
This is a feature extractor for SSD300. This extractor is based on VGG16.
__call__(x)
Compute feature maps from a batch of images.
This method extracts feature maps from conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2.
Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(300\times 300\).
Returns: Each variable contains a feature map.
Return type: list of Variable
VGG16Extractor512
class chainercv.links.model.ssd.VGG16Extractor512
A VGG-16 based feature extractor for SSD512.
This is a feature extractor for SSD512. This extractor is based on VGG16.
__call__(x)
Compute feature maps from a batch of images.
This method extracts feature maps from conv4_3, conv7, conv8_2, conv9_2, conv10_2, conv11_2, and conv12_2.
Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(512\times 512\).
Returns: Each variable contains a feature map.
Return type: list of Variable