FPN (Feature Pyramid Networks)
Detection Links
FasterRCNNFPNResNet50
class chainercv.links.model.fpn.FasterRCNNFPNResNet50(n_fg_class=None, pretrained_model=None)
Feature Pyramid Networks with ResNet-50.
This is a model of Feature Pyramid Networks [1]. This model uses ResNet50 as its base feature extractor.
[1] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017
Parameters:
- n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) – The weight file to be loaded. This can take 'coco', 'imagenet', filepath or None. The default value is None.
  - 'coco': Load weights trained on the train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
  - 'imagenet': Load weights of ResNet-50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights only partially; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
  - filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
  - None: Do not load weights.
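The relationship between pretrained_model and n_fg_class described above can be summarized in a small check. The helper resolve_n_fg_class below is hypothetical and not part of ChainerCV; it only mirrors the rules stated in the parameter list, and treating a missing n_fg_class as an error in the non-'coco' cases is an assumption. build_detector then forwards the arguments to the constructor documented here.

```python
def resolve_n_fg_class(n_fg_class=None, pretrained_model=None):
    """Hypothetical helper mirroring the documented argument rules."""
    if pretrained_model == 'coco':
        # COCO weights come with a fixed number of foreground classes.
        if n_fg_class not in (None, 80):
            raise ValueError("n_fg_class must be 80 or None for 'coco'")
        return 80
    # 'imagenet', a file path, or None: the caller chooses the class count.
    # Assumption: a missing value is rejected in these cases.
    if n_fg_class is None:
        raise ValueError('n_fg_class must be specified')
    return n_fg_class


def build_detector(n_fg_class=None, pretrained_model=None):
    """Construct the detector with the constructor documented above."""
    resolve_n_fg_class(n_fg_class, pretrained_model)
    # Imported here so that the check above works without ChainerCV installed.
    from chainercv.links.model.fpn import FasterRCNNFPNResNet50
    return FasterRCNNFPNResNet50(
        n_fg_class=n_fg_class, pretrained_model=pretrained_model)
```

For example, build_detector(pretrained_model='coco') would request the COCO weights, while build_detector(n_fg_class=20, pretrained_model='imagenet') would initialize only the ResNet-50 part.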
FasterRCNNFPNResNet101
class chainercv.links.model.fpn.FasterRCNNFPNResNet101(n_fg_class=None, pretrained_model=None)
Feature Pyramid Networks with ResNet-101.
This is a model of Feature Pyramid Networks [2]. This model uses ResNet101 as its base feature extractor.
[2] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017
Parameters:
- n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) – The weight file to be loaded. This can take 'coco', 'imagenet', filepath or None. The default value is None.
  - 'coco': Load weights trained on the train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
  - 'imagenet': Load weights of ResNet-101 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights only partially; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
  - filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
  - None: Do not load weights.
Utility
FasterRCNN
class chainercv.links.model.fpn.FasterRCNN(extractor, rpn, head)
Base class of Feature Pyramid Networks.
This is a base class of Feature Pyramid Networks [3].
[3] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017
Parameters:
- extractor (Link) – A link that extracts feature maps. This link must have scales, mean and __call__().
- rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.
- head (Link) – A link that has the same interface as Head. Please refer to the documentation found there.
- nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
- score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().
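The following is a sketch of how the three links might be chained at inference time. It is based only on the method signatures documented on this page, so the actual predict() implementation may differ. The function name detect is hypothetical, and the code assumes each feature map is an array of shape (N, C, H, W).

```python
def detect(model, x, scales, sizes):
    """Chain extractor, rpn and head using the signatures on this page.

    model is any object exposing extractor, rpn, head, nms_thresh and
    score_thresh. x is a preprocessed image batch, scales are the values
    returned by prepare(), and sizes are the (H_n, W_n) image sizes.
    """
    hs = model.extractor(x)  # multi-scale feature maps
    rpn_locs, rpn_confs = model.rpn(hs)
    # Assumption: each feature map is an (N, C, H, W) array.
    anchors = model.rpn.anchors([h.shape[2:] for h in hs])
    rois, roi_indices = model.rpn.decode(
        rpn_locs, rpn_confs, anchors, x.shape)
    # Assign every RoI to one feature map before running the head.
    rois, roi_indices = model.head.distribute(rois, roi_indices)
    head_locs, head_confs = model.head(hs, rois, roi_indices)
    return model.head.decode(
        rois, roi_indices, head_locs, head_confs,
        scales, sizes, model.nms_thresh, model.score_thresh)
```

Because the function relies only on duck typing, any extractor, rpn and head that follow the interfaces described here can be plugged in.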
predict(imgs)
Detect objects from images.
This method predicts objects for each image.
Parameters:
- imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns: This method returns a tuple of three lists, (bboxes, labels, scores).
- bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels: A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores: A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
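To make the return format concrete, the sketch below walks over a (bboxes, labels, scores) tuple and turns it into one record per detection. The function summarize_detections and the sample values are hypothetical; the sample lists merely stand in for what predict() would return for two images, the second of which has no detections.

```python
def summarize_detections(bboxes, labels, scores, label_names):
    """Flatten the (bboxes, labels, scores) lists into readable records."""
    records = []
    per_image = zip(bboxes, labels, scores)
    for img_idx, (bbox, label, score) in enumerate(per_image):
        for (y_min, x_min, y_max, x_max), lb, sc in zip(bbox, label, score):
            records.append({
                'image': img_idx,
                'name': label_names[int(lb)],
                'score': float(sc),
                # Convert from (y_min, x_min, y_max, x_max) to (x, y, w, h).
                'xywh': (float(x_min), float(y_min),
                         float(x_max - x_min), float(y_max - y_min)),
            })
    return records


# Hand-written stand-in for the output of predict() on two images.
bboxes = [[[10.0, 20.0, 110.0, 220.0]], []]
labels = [[1], []]
scores = [[0.9], []]
records = summarize_detections(bboxes, labels, scores, ['cat', 'dog'])
print(records)
```

Note that the box coordinates are ordered y first, which is easy to get wrong when passing them to drawing code that expects x first.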
prepare(imgs)
Preprocess images.
Parameters:
- imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns: Preprocessed images and the scales that were calculated in preprocessing.
Return type: Two arrays
use_preset(preset)
Use the given preset during prediction.
This method changes the values of nms_thresh and score_thresh. These are the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict(), respectively.
If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters:
- preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
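A preset mechanism of this kind can be sketched as a lookup table of attribute values. The class ThresholdHolder below is a hypothetical stand-in, not the ChainerCV implementation, and the numbers in the table are placeholders chosen for illustration only.

```python
class ThresholdHolder:
    """Hypothetical stand-in showing how a preset mechanism can work."""

    # Placeholder numbers chosen for illustration, not ChainerCV's presets.
    _presets = {
        'visualize': {'nms_thresh': 0.5, 'score_thresh': 0.7},
        'evaluate': {'nms_thresh': 0.5, 'score_thresh': 0.05},
    }

    def __init__(self):
        # Defaults as stated in the parameter list above.
        self.nms_thresh = 0.45
        self.score_thresh = 0.6

    def use_preset(self, preset):
        if preset not in self._presets:
            raise ValueError("preset must be 'visualize' or 'evaluate'")
        for name, value in self._presets[preset].items():
            setattr(self, name, value)


holder = ThresholdHolder()
holder.use_preset('evaluate')  # a low score threshold keeps more boxes
holder.score_thresh = 0.3      # direct attribute access also works
```

The design keeps both thresholds as plain public attributes, so a preset is only a convenient way to set the two of them together.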
FPN
class chainercv.links.model.fpn.FPN(base, n_base_output, scales)
An extractor class of Feature Pyramid Networks.
This class wraps a feature extractor and provides multi-scale features.
Parameters:
- base (Link) – A base feature extractor. It should have __call__() and mean. __call__() should take a batch of images and return feature maps of them. The size of the \(k+1\)-th feature map should be half that of the \(k\)-th feature map.
- n_base_output (int) – The number of feature maps that base returns.
- scales (tuple of floats) – The scales of feature maps.
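The halving rule for consecutive feature maps can be checked with a short calculation. The sketch below assumes example scales of 1/4 to 1/64 and rounds sizes up; both are assumptions made for illustration, and feature_map_sizes is a hypothetical helper.

```python
import math


def feature_map_sizes(height, width, scales):
    """Spatial size of each pyramid level for an input of height x width."""
    return [(math.ceil(height * s), math.ceil(width * s)) for s in scales]


# Assumed example scales: each level has half the resolution of the previous.
scales = (1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64)
sizes = feature_map_sizes(1024, 1024, scales)
print(sizes)
# [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
```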
Head
class chainercv.links.model.fpn.Head(n_class, scales)
Head network of Feature Pyramid Networks.
Parameters:
- n_class (int) – The number of classes including background.
- scales (tuple of floats) – The scales of feature maps.
__call__(hs, rois, roi_indices)
Calculates locs and confs for the given RoIs.
Parameters:
- hs (iterable of array) – An iterable of feature maps.
- rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).
Returns: locs and confs.
- locs: An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.
- confs: An array whose shape is \((R, n\_class)\).
Return type: tuple of two arrays
decode(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)
Decodes back to coordinates of RoIs.
This method decodes locs and confs returned by an FPN network back to bboxes, labels and scores.
Parameters:
- rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
- locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
- confs (array) – An array whose shape is \((R, n\_class)\).
- scales (list of floats) – A list of floats returned by prepare().
- sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
- nms_thresh (float) – The threshold value for non_maximum_suppression().
- score_thresh (float) – The threshold value for confidence score.
Returns: bboxes, labels and scores.
- bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels: A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores: A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.
Return type: tuple of three lists of arrays
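The roles of score_thresh and nms_thresh can be illustrated with a simplified filter. This is a minimal, class-agnostic sketch in plain Python, assuming greedy non-maximum suppression on intersection over union; it is not the library's non_maximum_suppression(), and the helpers iou and filter_boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = h * w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(bboxes, scores, nms_thresh, score_thresh):
    """Drop low-score boxes, then apply greedy non-maximum suppression."""
    # Keep only confident boxes, visiting the highest score first.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives if it does not overlap a kept box too strongly.
        if all(iou(bboxes[i], bboxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


bboxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = filter_boxes(bboxes, scores, nms_thresh=0.45, score_thresh=0.6)
print(kept)  # [0, 2]
```

Here the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the last box is discarded earlier because its score is below score_thresh.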
distribute(rois, roi_indices)
Assigns RoIs to feature maps according to their size.
Parameters:
- rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
- roi_indices (array) – An array of shape \((R,)\).
Returns: rois and roi_indices.
- rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices: A list of arrays of shape \((R_l,)\).
Return type: tuple of two lists
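Size-based assignment of RoIs can be sketched with the heuristic from the FPN paper (Lin et al., Eq. 1), which maps a RoI of width w and height h to level k = floor(k0 + log2(sqrt(w * h) / 224)). Whether distribute() uses exactly this rule and these constants is an assumption; the functions below are hypothetical and operate on plain Python lists.

```python
import math


def assign_level(roi, n_levels=4, k0=4, k_min=2, canonical_size=224):
    """Return the 0-based pyramid level index for one RoI.

    roi is (y_min, x_min, y_max, x_max). The constants follow Eq. 1 of
    the FPN paper; they are assumptions, not values read from ChainerCV.
    """
    h = max(roi[2] - roi[0], 0.0)
    w = max(roi[3] - roi[1], 0.0)
    size = math.sqrt(h * w)
    if size <= 0:
        return 0  # degenerate boxes go to the finest level
    k = math.floor(k0 + math.log2(size / canonical_size))
    k = min(max(k, k_min), k_min + n_levels - 1)  # clamp to valid levels
    return k - k_min


def distribute(rois, roi_indices, n_levels=4):
    """Group RoIs and their indices into one list per pyramid level."""
    out_rois = [[] for _ in range(n_levels)]
    out_indices = [[] for _ in range(n_levels)]
    for roi, index in zip(rois, roi_indices):
        level = assign_level(roi, n_levels)
        out_rois[level].append(roi)
        out_indices[level].append(index)
    return out_rois, out_indices


rois = [(0, 0, 224, 224), (0, 0, 112, 112), (0, 0, 10, 10), (0, 0, 900, 900)]
levels = [assign_level(r) for r in rois]
print(levels)  # [2, 1, 0, 3]
```

Small RoIs end up on the high-resolution maps and large RoIs on the coarse ones, which is why the outputs are lists with one entry per feature map.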
RPN
class chainercv.links.model.fpn.RPN(scales)
Region Proposal Network of Feature Pyramid Networks.
Parameters:
- scales (tuple of floats) – The scales of feature maps.
__call__(hs)
Calculates RoIs.
Parameters:
- hs (iterable of array) – An iterable of feature maps.
Returns: locs and confs.
- locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
- confs: A list of arrays whose shape is \((N, K_l)\).
Return type: tuple of two lists
anchors(sizes)
Calculates anchor boxes.
Parameters:
- sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map.
Returns: The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.
Return type: list of arrays
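The shape \((H_l * W_l * A, 4)\) follows from placing one anchor per ratio at every cell of the feature map. The sketch below generates such anchors for a single level. The function make_anchors, the anchor size, the ratios and the placement of anchor centres at cell centres are all assumptions made for illustration.

```python
import math


def make_anchors(height, width, scale, anchor_size, ratios=(0.5, 1, 2)):
    """Anchor boxes for one feature map of height x width cells.

    scale is the feature map scale relative to the input image, so one
    cell covers 1 / scale input pixels. ratios are height / width.
    """
    anchors = []
    for y in range(height):
        for x in range(width):
            # Centre of the cell, expressed in input image coordinates.
            cy, cx = (y + 0.5) / scale, (x + 0.5) / scale
            for r in ratios:
                # Keep the area fixed while changing the aspect ratio.
                h = anchor_size * math.sqrt(r)
                w = anchor_size / math.sqrt(r)
                anchors.append(
                    (cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


anchors = make_anchors(height=4, width=6, scale=1 / 16, anchor_size=128)
print(len(anchors))  # 4 * 6 * 3 = 72
```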
decode(locs, confs, anchors, in_shape)
Decodes back to coordinates of RoIs.
This method decodes locs and confs returned by an FPN network back to rois and roi_indices.
Parameters:
- locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
- confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).
- anchors (list of arrays) – Anchor boxes returned by anchors().
- in_shape (tuple of ints) – The shape of the input array of the feature extractor.
Returns: rois and roi_indices.
- rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
- roi_indices: An array of shape \((R,)\).
Return type: tuple of two arrays
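Since rois is a single flat array for the whole batch, roi_indices is what ties each RoI back to an image. The sketch below assumes that roi_indices holds the batch index of each RoI, as is common in Faster R-CNN implementations; this page does not state it explicitly. The helper group_rois_by_image is hypothetical.

```python
def group_rois_by_image(rois, roi_indices, batchsize):
    """Split a flat list of RoIs into one list per image in the batch."""
    per_image = [[] for _ in range(batchsize)]
    for roi, index in zip(rois, roi_indices):
        # Assumption: index is the position of the source image in the batch.
        per_image[int(index)].append(tuple(roi))
    return per_image


rois = [(0, 0, 8, 8), (4, 4, 16, 16), (2, 2, 6, 6)]
roi_indices = [0, 1, 0]
grouped = group_rois_by_image(rois, roi_indices, batchsize=2)
print(grouped)
# [[(0, 0, 8, 8), (2, 2, 6, 6)], [(4, 4, 16, 16)]]
```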
Train-only Utility
head_loss_pre
chainercv.links.model.fpn.head_loss_pre(rois, roi_indices, std, bboxes, labels)
Loss function for Head (pre).
This function processes RoIs for head_loss_post().
Parameters:
- rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
- std (tuple of floats) – Two coefficients used for encoding bounding boxes.
- bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
- labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).
head_loss_post
chainercv.links.model.fpn.head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)
Loss function for Head (post).
Parameters:
- locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
- confs (array) – An array whose shape is \((R, n\_class)\).
- roi_indices (list of arrays) – A list of arrays returned by head_loss_pre().
- gt_locs (list of arrays) – A list of arrays returned by head_loss_pre().
- gt_labels (list of arrays) – A list of arrays returned by head_loss_pre().
- batchsize (int) – The size of batch.
Returns: loc_loss and conf_loss.
Return type: tuple of two variables
rpn_loss
chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)
Loss function for RPN.
Parameters:
- locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
- confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
- anchors (list of arrays) – A list of arrays returned by anchors().
- sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
- bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
Returns: loc_loss and conf_loss.
Return type: tuple of two variables