FPN (Feature Pyramid Networks)
Detection Links
FasterRCNNFPNResNet50

class chainercv.links.model.fpn.FasterRCNNFPNResNet50(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)
Faster RCNN with ResNet50 and FPN.
Please refer to FasterRCNNFPNResNet.
FasterRCNNFPNResNet101

class chainercv.links.model.fpn.FasterRCNNFPNResNet101(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)
Faster RCNN with ResNet101 and FPN.
Please refer to FasterRCNNFPNResNet.
Instance Segmentation Links
MaskRCNNFPNResNet50

class chainercv.links.model.fpn.MaskRCNNFPNResNet50(n_fg_class=None, pretrained_model=None, return_values=['masks', 'labels', 'scores'], min_size=800, max_size=1333)
Mask RCNN with ResNet50 and FPN.
Please refer to FasterRCNNFPNResNet.
MaskRCNNFPNResNet101

class chainercv.links.model.fpn.MaskRCNNFPNResNet101(n_fg_class=None, pretrained_model=None, return_values=['masks', 'labels', 'scores'], min_size=800, max_size=1333)
Mask RCNN with ResNet101 and FPN.
Please refer to FasterRCNNFPNResNet.
Utility
FasterRCNN

class chainercv.links.model.fpn.FasterRCNN(extractor, rpn, bbox_head, mask_head, return_values, min_size=800, max_size=1333)
Base class of Faster RCNN with FPN.
 Parameters
extractor (Link) – A link that extracts feature maps. This link must have scales, mean and forward().
rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.
bbox_head (Link) – A link that has the same interface as BboxHead. Please refer to the documentation found there.
mask_head (Link) – A link that has the same interface as MaskHead. Please refer to the documentation found there.
return_values (list of strings) – Determines the values returned by predict().
min_size (int) – A preprocessing parameter for prepare(). Please refer to the docstring of prepare().
max_size (int) – A preprocessing parameter for prepare(). Note that the result of prepare() can exceed this size due to alignment with stride.
nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().

predict(imgs)
Conduct inference on the given images.
The value returned by this method is decided based on the argument return_values of __init__().
Examples
>>> from chainercv.links import FasterRCNNFPNResNet50
>>> model = FasterRCNNFPNResNet50(
...     pretrained_model='coco',
...     return_values=['rois', 'bboxes', 'labels', 'scores'])
>>> rois, bboxes, labels, scores = model.predict(imgs)
 Parameters
imgs (iterable of numpy.ndarray) – Inputs.
 Returns
The table below shows the input and possible outputs.
 Return type
tuple of lists
| Input name | shape | dtype | format |
|---|---|---|---|
| imgs | \([(3, H, W)]\) | float32 | RGB, \([0, 255]\) |

| Output name | shape | dtype | format |
|---|---|---|---|
| rois | \([(R', 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| bboxes | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| scores | \([(R,)]\) | float32 | – |
| labels | \([(R,)]\) | int32 | \([0, \#fg\_class - 1]\) |
| masks | \([(R, H, W)]\) | bool | – |
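As a minimal sketch of the input format in the table above, the snippet below converts an image stored as an (H, W, 3) uint8 array into the (3, H, W) float32 layout listed for imgs. The helper name to_chw_float32 and the random stand-in image are illustrative assumptions and are not part of chainercv.

```python
import numpy as np


def to_chw_float32(img_hwc):
    """Convert an (H, W, 3) RGB image to a (3, H, W) float32 array.

    Hypothetical helper for illustration; pixel values stay in [0, 255].
    """
    img = np.asarray(img_hwc, dtype=np.float32)
    # Move the channel axis to the front: HWC -> CHW.
    return img.transpose(2, 0, 1)


# A random stand-in for a decoded RGB image.
rng = np.random.default_rng(0)
img_hwc = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# predict() takes an iterable of such arrays, one per image.
imgs = [to_chw_float32(img_hwc)]
```

Images in a batch built this way may have different heights and widths, since each entry is a separate array.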

prepare(imgs)
Preprocess images.
 Parameters
imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
 Returns
Preprocessed images and the scales that were calculated during preprocessing.
 Return type
Two arrays
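The role of min_size, max_size and the stride alignment can be illustrated with a short sketch. It is not the library's implementation: the resizing rule (scale the shorter side to min_size unless the longer side would exceed max_size) and the stride of 32 are assumptions chosen to match the description of these parameters, and the function names are hypothetical.

```python
import math


def resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor under the assumed rule described above."""
    scale = min_size / min(height, width)
    if scale * max(height, width) > max_size:
        # The longer side would be too large, so it limits the scale.
        scale = max_size / max(height, width)
    return scale


def padded_size(height, width, scale, stride=32):
    """Resized size rounded up to a multiple of stride (assumed to be 32)."""
    new_h = int(round(height * scale))
    new_w = int(round(width * scale))
    pad_h = math.ceil(new_h / stride) * stride
    pad_w = math.ceil(new_w / stride) * stride
    return pad_h, pad_w


# A wide image: the longer side is limited by max_size, yet the padded
# width ends up larger than max_size because of the stride alignment.
scale = resize_scale(500, 2000)
size = padded_size(500, 2000, scale)
```

Under these assumptions the padded width of the wide image is a multiple of 32 that lies above 1333, which is the situation the note on max_size refers to.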

use_preset(preset)
Use the given preset during prediction.
This method changes values of nms_thresh and score_thresh. These values are a threshold value used for non maximum suppression and a threshold value to discard low confidence proposals in predict(), respectively.
If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
 Parameters
preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
FasterRCNNFPNResNet

class chainercv.links.model.fpn.FasterRCNNFPNResNet(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)
Base class for Faster RCNN with a ResNet backbone and FPN.
A subclass of this class should have _base and _models.
 Parameters
n_fg_class (int) – The number of classes excluding the background.
pretrained_model (string) – The weight file to be loaded. This can take 'coco', 'imagenet', a filepath or None. The default value is None.
'coco': Load weights trained on the train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
'imagenet': Load weights of ResNet50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights partially and the rest are initialized randomly. In this case, n_fg_class can be set to any number.
filepath: A path of npz file. In this case, n_fg_class must be specified properly.
None: Do not load weights.
return_values (list of strings) – Determines the values returned by predict().
min_size (int) – A preprocessing parameter for prepare(). Please refer to prepare().
max_size (int) – A preprocessing parameter for prepare().
FPN

class chainercv.links.model.fpn.FPN(base, n_base_output, scales)
An extractor class of Feature Pyramid Networks.
This class wraps a feature extractor and provides multiscale features.
 Parameters
base (Link) – A base feature extractor. It should have forward() and mean. forward() should take a batch of images and return feature maps of them. The size of the \(k+1\)th feature map should be half that of the \(k\)th feature map.
n_base_output (int) – The number of feature maps that base returns.
scales (tuple of floats) – The scales of feature maps.
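To make the halving relationship between consecutive feature maps concrete, the sketch below computes feature map sizes from an input size and a tuple of scales. The scale values and the helper name feature_map_sizes are assumptions picked for illustration; the source does not state which scales the provided models use.

```python
def feature_map_sizes(in_height, in_width, scales):
    """Spatial size of each feature map for a given input size.

    Hypothetical helper: each scale is the ratio between the feature map
    size and the input size.
    """
    return [(int(in_height * s), int(in_width * s)) for s in scales]


# Assumed example scales; each one is half of the previous one.
scales = (1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64)
sizes = feature_map_sizes(512, 768, scales)
```

With these example values, every feature map is half as tall and half as wide as the one before it, which matches the requirement on base.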
BboxHead

class chainercv.links.model.fpn.BboxHead(n_class, scales)
Bounding box head network of Feature Pyramid Networks.
 Parameters
n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.

decode(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)
Decodes back to coordinates of RoIs.
This method decodes locs and confs returned by a FPN network back to bboxes, labels and scores.
 Parameters
rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
scales (list of floats) – A list of floats returned by prepare().
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)th image.
nms_thresh (float) – The threshold value for non_maximum_suppression().
score_thresh (float) – The threshold value for confidence score.
 Returns
bboxes, labels and scores.
 Return type
tuple of three lists of arrays
bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
labels: A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
scores: A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.
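The effect of score_thresh and nms_thresh can be sketched with plain NumPy. This is an illustrative greedy non-maximum suppression for a single class, not the routine used by the library; the function names are hypothetical, and the default thresholds are the values 0.6 and 0.45 quoted earlier for FasterRCNN.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and many boxes in (y_min, x_min, y_max, x_max)."""
    tl = np.maximum(box[:2], boxes[:, :2])
    br = np.minimum(box[2:], boxes[:, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=1)
    area = np.prod(box[2:] - box[:2])
    areas = np.prod(boxes[:, 2:] - boxes[:, :2], axis=1)
    return inter / (area + areas - inter)


def filter_detections(bbox, score, nms_thresh=0.45, score_thresh=0.6):
    """Drop low-confidence boxes, then suppress overlapping ones greedily."""
    keep = score >= score_thresh
    bbox, score = bbox[keep], score[keep]
    order = np.argsort(-score)  # highest score first
    selected = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        selected.append(best)
        # Keep only boxes that do not overlap the selected box too much.
        order = rest[iou_one_to_many(bbox[best], bbox[rest]) <= nms_thresh]
    selected = np.array(selected, dtype=np.int64)
    return bbox[selected], score[selected]


bbox = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                 [20, 20, 30, 30], [0, 0, 5, 5]], dtype=np.float32)
score = np.array([0.9, 0.8, 0.7, 0.3], dtype=np.float32)
kept_bbox, kept_score = filter_detections(bbox, score)
```

In this toy input the last box falls below the score threshold and the second box is suppressed because it overlaps the first one heavily, so two boxes remain.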

distribute(rois, roi_indices)
Assigns RoIs to feature maps according to their size.
 Parameters
rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).
 Returns
rois and roi_indices.
rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)th feature map.
roi_indices: A list of arrays of shape \((R_l,)\).
 Return type
tuple of two lists
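One way to assign RoIs to feature maps by size is the heuristic from the FPN paper, where larger RoIs go to coarser levels. The sketch below is written under that assumption; the canonical size of 224, the canonical level of 2, the number of levels and the function names are illustrative choices and are not confirmed by this page.

```python
import numpy as np


def assign_levels(rois, n_level, canonical_scale=224, canonical_level=2):
    """Pick a pyramid level per RoI from the square root of its area."""
    size = np.sqrt(np.prod(rois[:, 2:] - rois[:, :2], axis=1))
    level = np.floor(np.log2(size / canonical_scale + 1e-6)).astype(np.int32)
    # Clip so that very small or very large RoIs still get a valid level.
    return np.clip(level + canonical_level, 0, n_level - 1)


def distribute_by_size(rois, roi_indices, n_level):
    """Split rois and roi_indices into one array per level."""
    level = assign_levels(rois, n_level)
    out_rois = [rois[level == l] for l in range(n_level)]
    out_indices = [roi_indices[level == l] for l in range(n_level)]
    return out_rois, out_indices


rois = np.array([[0, 0, 56, 56], [0, 0, 224, 224],
                 [0, 0, 448, 448], [0, 0, 2000, 2000]], dtype=np.float32)
roi_indices = np.zeros(4, dtype=np.int32)
out_rois, out_indices = distribute_by_size(rois, roi_indices, n_level=4)
```

The returned lists have one entry per level, and an entry can be an empty array of shape (0, 4) when no RoI falls on that level.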

forward(hs, rois, roi_indices)
Calculates locs and confs.
 Parameters
hs (iterable of array) – An iterable of feature maps.
rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)th feature map.
roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).
 Returns
locs and confs.
locs: An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.
confs: An array whose shape is \((R, n\_class)\).
 Return type
tuple of two arrays
RPN

class chainercv.links.model.fpn.RPN(scales)
Region Proposal Network of Feature Pyramid Networks.
 Parameters
scales (tuple of floats) – The scales of feature maps.

anchors(sizes)
Calculates anchor boxes.
 Parameters
sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)th feature map.
 Returns
The shape of the \(l\)th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.
 Return type
list of arrays
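The shape \((H_l * W_l * A, 4)\) can be illustrated by generating anchors for a single level. This is a sketch under assumptions: the stride, the anchor size, the three aspect ratios, the placement of reference points at cell corners and the function name are all hypothetical and may differ from what anchors() does.

```python
import numpy as np


def level_anchors(height, width, stride, anchor_size, ratios=(0.5, 1.0, 2.0)):
    """Anchors of one level as an (H * W * A, 4) float32 array."""
    ratios = np.asarray(ratios, dtype=np.float32)
    # Heights and widths per ratio; the area stays near anchor_size ** 2.
    hs = anchor_size * np.sqrt(ratios)
    ws = anchor_size / np.sqrt(ratios)
    # One reference point per feature map cell, in input image coordinates.
    cy, cx = np.meshgrid(np.arange(height) * stride,
                         np.arange(width) * stride, indexing='ij')
    cy = cy[:, :, None].astype(np.float32)
    cx = cx[:, :, None].astype(np.float32)
    # Broadcasting (H, W, 1) against (A,) yields (H, W, A) per coordinate.
    anchors = np.stack(
        (cy - hs / 2, cx - ws / 2, cy + hs / 2, cx + ws / 2), axis=-1)
    return anchors.reshape(-1, 4)


anchors = level_anchors(height=2, width=3, stride=16, anchor_size=64)
```

Here the feature map has 2 x 3 cells and 3 ratios, so the result has 18 rows in \((y_{min}, x_{min}, y_{max}, x_{max})\) order.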

decode(locs, confs, anchors, in_shape)
Decodes back to coordinates of RoIs.
This method decodes locs and confs returned by a FPN network back to rois and roi_indices.
 Parameters
locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)th level.
confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – Anchor boxes returned by anchors().
in_shape (tuple of ints) – The shape of the input array of the feature extractor.
 Returns
rois and roi_indices.
rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices: An array of shape \((R,)\).
 Return type
tuple of two arrays

forward(hs)
Calculates locs and confs.
 Parameters
hs (iterable of array) – An iterable of feature maps.
 Returns
locs and confs.
locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)th level.
confs: A list of arrays whose shape is \((N, K_l)\).
 Return type
tuple of two lists of arrays
MaskHead

class chainercv.links.model.fpn.MaskHead(n_class, scales)
Mask Head network of Mask RCNN.
 Parameters
n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.

decode(segms, bboxes, labels, sizes)
Decodes back to masks.
 Parameters
segms (iterable of arrays) – An iterable of arrays of shape \((R_n, n\_class, M, M)\).
bboxes (iterable of arrays) – An iterable of arrays of shape \((R_n, 4)\).
labels (iterable of arrays) – An iterable of arrays of shape \((R_n,)\).
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)th image.
 Returns
This list contains instance segmentation for each image in the batch. More precisely, this is a list of boolean arrays of shape \((R'_n, H_n, W_n)\), where \(R'_n\) is the number of bounding boxes in the \(n\)th image.
 Return type
list of arrays

distribute(rois, roi_indices)
Assigns feature levels to RoIs based on their size.
 Parameters
rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).
 Returns
out_rois, out_roi_indices and order.
out_rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)th feature map.
out_roi_indices: A list of arrays of shape \((R_l,)\).
order: A correspondence between the output and the input. The relationship below is satisfied.
xp.concatenate(out_rois, axis=0)[order[i]] == rois[i]
 Return type
two lists and one array
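The relationship that order satisfies can be reproduced with a small NumPy sketch. It assumes the level of each RoI is already known and uses NumPy in place of xp; the function name and the way levels are supplied are hypothetical, so this shows one way to obtain such an order rather than the library's code.

```python
import numpy as np


def group_with_order(rois, levels, n_level):
    """Group RoIs by level and return the index map back to the input."""
    out_rois = [rois[levels == l] for l in range(n_level)]
    # A stable sort lists input positions level by level, which is the
    # order in which the rows appear after concatenating out_rois.
    perm = np.argsort(levels, kind='stable')
    # The inverse permutation maps each input row to its concatenated row.
    order = np.argsort(perm)
    return out_rois, order


rng = np.random.default_rng(0)
rois = rng.random((6, 4)).astype(np.float32)
levels = np.array([2, 0, 1, 0, 2, 1])
out_rois, order = group_with_order(rois, levels, n_level=3)
restored = np.concatenate(out_rois, axis=0)[order]
```

Indexing the concatenated output with order puts the rows back into their input positions, which is what allows per-level results to be matched with the original RoIs.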
segm_to_mask

chainercv.links.model.fpn.segm_to_mask(segm, bbox, size)
Recover mask from cropped and resized mask.
This function requires cv2.
 Parameters
 Returns
See below.
 Return type
| name | shape | dtype | format |
|---|---|---|---|
| segm | \((R, S, S)\) | float32 | – |
| bbox | \((R, 4)\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| mask (output) | \((R, H, W)\) | bool | – |
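The idea of recovering a full-size mask from an \(S \times S\) segment can be sketched without cv2. The version below uses nearest-neighbor sampling and a threshold of 0.5, both of which are assumptions made for illustration; the function name paste_segm is hypothetical and the actual segm_to_mask may interpolate differently.

```python
import numpy as np


def paste_segm(segm, bbox, size, thresh=0.5):
    """Paste (R, S, S) segments into boolean masks of shape (R, H, W)."""
    H, W = size
    S = segm.shape[1]
    mask = np.zeros((len(segm), H, W), dtype=bool)
    for i, (sg, bb) in enumerate(zip(segm, bbox)):
        y0, x0, y1, x1 = np.round(bb).astype(int)
        # Clip the box to the image so that slicing stays in range.
        y0, x0, y1, x1 = max(y0, 0), max(x0, 0), min(y1, H), min(x1, W)
        if y1 <= y0 or x1 <= x0:
            continue
        # For each output pixel, pick the nearest cell of the segment.
        ys = (np.arange(y0, y1) + 0.5 - bb[0]) / (bb[2] - bb[0]) * S
        xs = (np.arange(x0, x1) + 0.5 - bb[1]) / (bb[3] - bb[1]) * S
        ys = ys.astype(int).clip(0, S - 1)
        xs = xs.astype(int).clip(0, S - 1)
        mask[i, y0:y1, x0:x1] = sg[np.ix_(ys, xs)] >= thresh
    return mask


segm = np.zeros((1, 4, 4), dtype=np.float32)
segm[0, :2, :] = 1.0  # only the upper half of the box is foreground
bbox = np.array([[2, 3, 6, 9]], dtype=np.float32)
mask = paste_segm(segm, bbox, size=(10, 12))
```

Pixels outside the bounding box stay False, and inside the box the segment is stretched to the box size before thresholding.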
Train-only Utility
bbox_head_loss_pre

chainercv.links.model.fpn.bbox_head_loss_pre(rois, roi_indices, std, bboxes, labels)
Loss function for Head (pre).
This function processes RoIs for bbox_head_loss_post().
 Parameters
rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
std (tuple of floats) – Two coefficients used for encoding bounding boxes.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).
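This page does not spell out how std enters the encoding. The sketch below assumes the parameterization commonly used for Faster R-CNN, where center offsets are divided by the first coefficient and log size ratios by the second. The function names, the default std of (0.1, 0.2) and the formula itself are assumptions for illustration.

```python
import numpy as np


def encode_boxes(rois, gt_bboxes, std=(0.1, 0.2)):
    """Encode ground truth boxes relative to RoIs (assumed formulation)."""
    roi_yx = (rois[:, :2] + rois[:, 2:]) / 2
    roi_hw = rois[:, 2:] - rois[:, :2]
    gt_yx = (gt_bboxes[:, :2] + gt_bboxes[:, 2:]) / 2
    gt_hw = gt_bboxes[:, 2:] - gt_bboxes[:, :2]
    d_yx = (gt_yx - roi_yx) / roi_hw / std[0]
    d_hw = np.log(gt_hw / roi_hw) / std[1]
    return np.hstack((d_yx, d_hw))


def decode_boxes(rois, locs, std=(0.1, 0.2)):
    """Invert encode_boxes and return (y_min, x_min, y_max, x_max) boxes."""
    roi_yx = (rois[:, :2] + rois[:, 2:]) / 2
    roi_hw = rois[:, 2:] - rois[:, :2]
    yx = roi_yx + locs[:, :2] * std[0] * roi_hw
    hw = roi_hw * np.exp(locs[:, 2:] * std[1])
    return np.hstack((yx - hw / 2, yx + hw / 2))


rois = np.array([[0.0, 0.0, 10.0, 10.0]])
gt_bboxes = np.array([[2.0, 2.0, 8.0, 12.0]])
locs = encode_boxes(rois, gt_bboxes)
recovered = decode_boxes(rois, locs)
```

Dividing by small coefficients enlarges the regression targets, which is the usual motivation for such a normalization.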
bbox_head_loss_post

chainercv.links.model.fpn.bbox_head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)
Loss function for Head (post).
 Parameters
locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
roi_indices (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
gt_locs (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
gt_labels (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
batchsize (int) – The size of batch.
 Returns
loc_loss and conf_loss.
 Return type
tuple of two variables
rpn_loss

chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)
Loss function for RPN.
 Parameters
locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)th level.
confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – A list of arrays returned by anchors().
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)th image.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
 Returns
loc_loss and conf_loss.
 Return type
tuple of two variables
mask_head_loss_pre

chainercv.links.model.fpn.mask_head_loss_pre(rois, roi_indices, gt_masks, gt_bboxes, gt_head_labels, segm_size)
Loss function for Mask Head (pre).
This function processes RoIs for mask_head_loss_post() by selecting RoIs for mask loss calculation and preparing ground truth network output.
 Parameters
rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
gt_masks (iterable of arrays) – An iterable of arrays whose shape is \((R_n, H, W)\), where \(R_n\) is the number of ground truth objects.
gt_head_labels (iterable of arrays) – An iterable of arrays of shape \((R_l,)\). This is a collection of ground truth labels assigned to rois during bounding box localization stage. The range of value is \((0, n\_class - 1)\).
segm_size (int) – Size of the ground truth network output.
 Returns
mask_rois, mask_roi_indices, gt_segms, and gt_mask_labels.
mask_rois: A list of arrays of shape \((R'_l, 4)\), where \(R'_l\) is the number of RoIs in the \(l\)th feature map.
mask_roi_indices: A list of arrays of shape \((R'_l,)\).
gt_segms: A list of arrays of shape \((R'_l, M, M)\). \(M\) is the argument segm_size.
gt_mask_labels: A list of arrays of shape \((R'_l,)\) indicating the classes of ground truth.
 Return type
tuple of four lists
mask_head_loss_post

chainercv.links.model.fpn.mask_head_loss_post(segms, mask_roi_indices, gt_segms, gt_mask_labels, batchsize)
Loss function for Mask Head (post).
 Parameters
segms (array) – An array whose shape is \((R, n\_class, M, M)\), where \(R\) is the total number of RoIs in the given batch.
mask_roi_indices (list of arrays) – A list of arrays returned by mask_head_loss_pre().
gt_segms (list of arrays) – A list of arrays returned by mask_head_loss_pre().
gt_mask_labels (list of arrays) – A list of arrays returned by mask_head_loss_pre().
batchsize (int) – The size of batch.
 Returns
Mask loss.
 Return type
chainer.Variable
mask_to_segm

chainercv.links.model.fpn.mask_to_segm(mask, bbox, segm_size, index=None)
Crop and resize mask.
This function requires cv2.
 Parameters
 Returns
See below.
 Return type
| name | shape | dtype | format |
|---|---|---|---|
| mask | \((N, H, W)\) | bool | – |
| bbox | \((R, 4)\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| index (optional) | \((R,)\) | int32 | – |
| segms (output) | \((R, S, S)\) | float32 | \([0, 1]\) |
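The crop-and-resize step can be sketched with nearest-neighbor sampling in NumPy instead of cv2. This is an illustration under assumptions: the function name crop_and_resize is hypothetical, index is taken to select which mask each box refers to, and the actual mask_to_segm may use a different interpolation that yields fractional values in \([0, 1]\).

```python
import numpy as np


def crop_and_resize(mask, bbox, segm_size, index=None):
    """Sample an S x S grid inside each box from the selected mask."""
    if index is None:
        # Assume the i-th box belongs to the i-th mask.
        index = np.arange(len(bbox))
    H, W = mask.shape[1:]
    segms = np.zeros((len(bbox), segm_size, segm_size), dtype=np.float32)
    for i, (bb, idx) in enumerate(zip(bbox, index)):
        # Centers of the grid cells laid over the box.
        ys = bb[0] + (np.arange(segm_size) + 0.5) * (bb[2] - bb[0]) / segm_size
        xs = bb[1] + (np.arange(segm_size) + 0.5) * (bb[3] - bb[1]) / segm_size
        yi = np.clip(np.floor(ys).astype(int), 0, H - 1)
        xi = np.clip(np.floor(xs).astype(int), 0, W - 1)
        segms[i] = mask[idx][np.ix_(yi, xi)]
    return segms


mask = np.zeros((1, 8, 8), dtype=bool)
mask[0, 2:6, 2:6] = True
bbox = np.array([[2, 2, 6, 6], [0, 0, 8, 8]], dtype=np.float32)
segms = crop_and_resize(mask, bbox, segm_size=4, index=np.array([0, 0]))
```

The first box matches the object exactly, so its segment is filled with ones, while the second box covers the whole image and yields a segment with a foreground block in the middle.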