FPN (Feature Pyramid Networks)¶

Detection Links¶

FasterRCNNFPNResNet50¶

class chainercv.links.model.fpn.FasterRCNNFPNResNet50(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶

Faster R-CNN with ResNet-50 and FPN.

Please refer to FasterRCNNFPNResNet.

FasterRCNNFPNResNet101¶

class chainercv.links.model.fpn.FasterRCNNFPNResNet101(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶

Faster R-CNN with ResNet-101 and FPN.

Please refer to FasterRCNNFPNResNet.

Instance Segmentation Links¶

MaskRCNNFPNResNet50¶

class chainercv.links.model.fpn.MaskRCNNFPNResNet50(n_fg_class=None, pretrained_model=None, return_values=['masks', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶

Mask R-CNN with ResNet-50 and FPN.

Please refer to FasterRCNNFPNResNet.

MaskRCNNFPNResNet101¶

class chainercv.links.model.fpn.MaskRCNNFPNResNet101(n_fg_class=None, pretrained_model=None, return_values=['masks', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶

Mask R-CNN with ResNet-101 and FPN.

Please refer to FasterRCNNFPNResNet.

Utility¶

FasterRCNN¶

class chainercv.links.model.fpn.FasterRCNN(extractor, rpn, bbox_head, mask_head, return_values, min_size=800, max_size=1333)[source]¶

Base class of Faster R-CNN with FPN.

Parameters

extractor (Link) – A link that extracts feature maps. This link must have scales, mean and forward().
rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.
bbox_head (Link) – A link that has the same interface as BboxHead. Please refer to the documentation found there.
mask_head (Link) – A link that has the same interface as MaskHead. Please refer to the documentation found there.
return_values (list of strings) – Determines the values returned by predict().
min_size (int) – A preprocessing parameter for prepare(). Please refer to the documentation of prepare().
max_size (int) – A preprocessing parameter for prepare(). Note that the result of prepare() can exceed this size due to alignment with stride.
nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().

predict(imgs)[source]¶

Conduct inference on the given images.

The value returned by this method is decided based on the argument return_values of __init__().

Examples

>>> from chainercv.links import FasterRCNNFPNResNet50
>>> model = FasterRCNNFPNResNet50(
...     pretrained_model='coco',
...     return_values=['rois', 'bboxes', 'labels', 'scores'])
>>> rois, bboxes, labels, scores = model.predict(imgs)

Parameters: imgs (iterable of numpy.ndarray) – Inputs.
Returns: The table below shows the input and possible outputs.
Return type: tuple of lists

Input name	shape	dtype	format
`imgs`	\([(3, H, W)]\)	`float32`	RGB, \([0, 255]\)

Output name	shape	dtype	format
`rois`	\([(R', 4)]\)	`float32`	\((y_{min}, x_{min}, y_{max}, x_{max})\)
`bboxes`	\([(R, 4)]\)	`float32`	\((y_{min}, x_{min}, y_{max}, x_{max})\)
`scores`	\([(R,)]\)	`float32`	–
`labels`	\([(R,)]\)	`int32`	\([0, \#fg\_class - 1]\)
`masks`	\([(R, H, W)]\)	`bool`	–
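As a small illustration of the output layout, the sketch below walks over per-image results shaped like the `bboxes`, `labels` and `scores` lists in the table above and converts each box from \((y_{min}, x_{min}, y_{max}, x_{max})\) to an (x, y, width, height) tuple. The helper name `to_xywh` and all sample values are hypothetical, and plain Python lists stand in for the arrays that predict() returns.

```python
def to_xywh(bbox):
    """Convert (y_min, x_min, y_max, x_max) to (x, y, width, height)."""
    y_min, x_min, y_max, x_max = bbox
    return (x_min, y_min, x_max - x_min, y_max - y_min)


# Made-up results for a batch of two images; the outer list is per image.
bboxes = [[(10.0, 20.0, 110.0, 220.0)],
          [(0.0, 0.0, 50.0, 40.0), (5.0, 5.0, 25.0, 15.0)]]
labels = [[3], [0, 7]]
scores = [[0.9], [0.8, 0.75]]

detections = []
for img_id, (bbox, label, score) in enumerate(zip(bboxes, labels, scores)):
    # Within one image, the R boxes, labels and scores are aligned by index.
    for bb, lb, sc in zip(bbox, label, score):
        detections.append(
            {'image': img_id, 'box_xywh': to_xywh(bb), 'label': lb, 'score': sc})
```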

prepare(imgs)[source]¶

Preprocess images.

Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and their values are in the range \([0, 255]\).
Returns: preprocessed images and scales that were calculated in preprocessing.
Return type: Two arrays
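To make the roles of min_size and max_size concrete, here is a minimal sketch of a resizing rule that is common in Faster R-CNN implementations: scale the shorter side to min_size, cap the longer side at max_size, then round both sides up to a multiple of the stride. Whether prepare() follows exactly this rule is an assumption, and the function name and the stride value of 32 are hypothetical.

```python
def resized_and_padded_size(height, width, min_size=800, max_size=1333, stride=32):
    """Return (scale, resized (H, W), padded (H, W)) for one image."""
    # Scale so that the shorter side becomes min_size ...
    scale = min_size / min(height, width)
    # ... unless that would make the longer side exceed max_size.
    if scale * max(height, width) > max_size:
        scale = max_size / max(height, width)
    resized = (int(height * scale), int(width * scale))
    # Round each side up to a multiple of the stride (assumed value).
    padded = tuple(-(-side // stride) * stride for side in resized)
    return scale, resized, padded


scale, resized, padded = resized_and_padded_size(500, 2000)
```

For this elongated 500 x 2000 input the padded width is 1344, which illustrates how the result can exceed max_size after alignment with the stride.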

use_preset(preset)[source]¶

Use the given preset during prediction.

This method changes the values of nms_thresh and score_thresh. These are, respectively, the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict().

If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.

Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
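The effect of the two thresholds can be sketched with a plain-Python greedy non-maximum suppression. This is an illustration only, not the library's non_maximum_suppression(); the function names, the boxes and the threshold values are all made up and are not the values stored in any preset.

```python
def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    ih = min(a[2], b[2]) - max(a[0], b[0])
    iw = min(a[3], b[3]) - max(a[1], b[1])
    if ih <= 0 or iw <= 0:
        return 0.0
    inter = ih * iw
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def filter_boxes(bboxes, scores, nms_thresh, score_thresh):
    """Return indices of the boxes kept after thresholding and greedy NMS."""
    # Discard low-confidence boxes, then visit the rest from high to low score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(bboxes[i], bboxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


bboxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
# A high score threshold suits visualization; a low one keeps more candidates.
strict = filter_boxes(bboxes, scores, nms_thresh=0.5, score_thresh=0.6)
lenient = filter_boxes(bboxes, scores, nms_thresh=0.5, score_thresh=0.05)
```

A high score threshold keeps only confident detections, while a low one retains more candidates, which is the kind of trade-off a preset is meant to capture.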

FasterRCNNFPNResNet¶

class chainercv.links.model.fpn.FasterRCNNFPNResNet(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶

Base class for Faster R-CNN with a ResNet backbone and FPN.

A subclass of this class should have _base and _models.

Parameters

n_fg_class (int) – The number of classes excluding the background.
pretrained_model (string) –
The weight file to be loaded. This can take 'coco', 'imagenet', filepath or None. The default value is None.
- 'coco': Load weights trained on the train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
- 'imagenet': Load weights of ResNet-50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights only partially; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
- filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
- None: Do not load weights.
return_values (list of strings) – Determines the values returned by predict().
min_size (int) – A preprocessing parameter for prepare(). Please refer to prepare().
max_size (int) – A preprocessing parameter for prepare().

FPN¶

class chainercv.links.model.fpn.FPN(base, n_base_output, scales)[source]¶

An extractor class of Feature Pyramid Networks.

This class wraps a feature extractor and provides multi-scale features.

Parameters

base (Link) – A base feature extractor. It should have forward() and mean. forward() should take a batch of images and return their feature maps. The size of the \((k+1)\)-th feature map should be half that of the \(k\)-th feature map.
n_base_output (int) – The number of feature maps that base returns.
scales (tuple of floats) – The scales of feature maps.
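The relationship between scales and feature-map sizes can be sketched as follows. The scale values below are an assumption chosen so that each level halves the previous one, and the helper name is hypothetical. Exact sizes in a real network also depend on the padding used by the base extractor, so the input here is a multiple of 64 to avoid rounding.

```python
def pyramid_sizes(height, width, scales):
    """Approximate spatial size of each feature map for a height x width input."""
    return [(int(height * s), int(width * s)) for s in scales]


# Assumed scales: each level has half the resolution of the previous one.
scales = (1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64)
sizes = pyramid_sizes(832, 1344, scales)
```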

BboxHead¶

class chainercv.links.model.fpn.BboxHead(n_class, scales)[source]¶

Bounding box head network of Feature Pyramid Networks.

Parameters

n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.

decode(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)[source]¶

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to bboxes, labels and scores.

Parameters

rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
scales (list of floats) – A list of floats returned by prepare()
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
nms_thresh (float) – The threshold value for non_maximum_suppression().
score_thresh (float) – The threshold value for confidence score.

Returns

bboxes, labels and scores.

Return type

tuple of three lists of arrays

bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
labels : A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
scores : A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.
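For intuition about what decoding a loc means, the sketch below applies the box parameterization commonly used by Faster R-CNN style heads: centre offsets relative to the RoI size and log-space size offsets, each multiplied by a coefficient. That decode() uses exactly this parameterization is an assumption, and the function name and the coefficient values (0.1, 0.2) are hypothetical.

```python
import math


def decode_box(roi, loc, std=(0.1, 0.2)):
    """Apply a (dy, dx, dh, dw) offset to a (y_min, x_min, y_max, x_max) RoI."""
    y_min, x_min, y_max, x_max = roi
    h, w = y_max - y_min, x_max - x_min
    cy, cx = y_min + h / 2, x_min + w / 2
    dy, dx, dh, dw = loc
    # Centre offsets are relative to the RoI size; sizes change in log space.
    cy += dy * std[0] * h
    cx += dx * std[0] * w
    h *= math.exp(dh * std[1])
    w *= math.exp(dw * std[1])
    return (cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2)


roi = (10.0, 20.0, 50.0, 80.0)
unchanged = decode_box(roi, (0.0, 0.0, 0.0, 0.0))   # zero offset keeps the RoI
shifted = decode_box(roi, (1.0, 0.0, 0.0, 0.0))     # moves the box down
```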

distribute(rois, roi_indices)[source]¶

Assigns RoIs to feature maps according to their size.

Parameters

rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).

Returns

rois and roi_indices.

rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices : A list of arrays of shape \((R_l,)\).

Return type

tuple of two lists
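One way to assign RoIs to levels by size is the heuristic from the Feature Pyramid Networks paper, in which an RoI whose side is close to a canonical size goes to a canonical level and each doubling in size moves it one level coarser. The sketch below is an assumption about the general approach, not a copy of this method; the function names, the canonical size of 224 and the canonical level index of 2 are hypothetical choices.

```python
import math


def assign_levels(rois, n_level, canonical_size=224, canonical_level=2):
    """Return, for each RoI, the index of the feature map it is assigned to."""
    levels = []
    for y_min, x_min, y_max, x_max in rois:
        size = math.sqrt(max(y_max - y_min, 0) * max(x_max - x_min, 0))
        # Larger RoIs go to coarser levels; 1e-6 guards against log2(0).
        level = canonical_level + math.floor(
            math.log2(size / canonical_size + 1e-6))
        levels.append(min(max(level, 0), n_level - 1))
    return levels


def distribute(rois, roi_indices, n_level):
    """Group RoIs and their image indices by assigned level."""
    out_rois = [[] for _ in range(n_level)]
    out_indices = [[] for _ in range(n_level)]
    for roi, idx, lv in zip(rois, roi_indices, assign_levels(rois, n_level)):
        out_rois[lv].append(roi)
        out_indices[lv].append(idx)
    return out_rois, out_indices


rois = [(0, 0, 32, 32), (0, 0, 224, 224), (0, 0, 900, 900)]
roi_indices = [0, 0, 1]
out_rois, out_indices = distribute(rois, roi_indices, n_level=4)
```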

forward(hs, rois, roi_indices)[source]¶

Calculates locs and confs.

Parameters

hs (iterable of array) – An iterable of feature maps.
rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).

Returns

locs and confs.

locs: An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.
confs: An array whose shape is \((R, n\_class)\).

Return type

tuple of two arrays

RPN¶

class chainercv.links.model.fpn.RPN(scales)[source]¶

Region Proposal Network of Feature Pyramid Networks.

Parameters: scales (tuple of floats) – The scales of feature maps.

anchors(sizes)[source]¶

Calculates anchor boxes.

Parameters: sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map.
Returns: The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.
Return type: list of arrays
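The shape \((H_l * W_l * A, 4)\) can be made concrete with a simplified anchor generator that places \(A\) boxes, one per aspect ratio, at the centre of every feature-map cell. This is a sketch under stated assumptions, not this method's implementation: the function name, the ratios, and the rule that an anchor spans `cells` feature-map cells per side are hypothetical.

```python
import math


def make_anchors(sizes, scales, ratios=(0.5, 1, 2), cells=8):
    """Return one list of (y_min, x_min, y_max, x_max) anchors per level."""
    anchors = []
    for (H, W), scale in zip(sizes, scales):
        stride = 1 / scale        # input pixels per feature-map cell
        side = cells * stride     # assumed anchor side for a square anchor
        level = []
        for v in range(H):
            for u in range(W):
                cy, cx = (v + 0.5) * stride, (u + 0.5) * stride
                for ratio in ratios:
                    # ratio = height / width, keeping the area constant.
                    h = side * math.sqrt(ratio)
                    w = side / math.sqrt(ratio)
                    level.append(
                        (cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
        anchors.append(level)
    return anchors


anchors = make_anchors([(4, 6), (2, 3)], scales=(1 / 4, 1 / 8))
```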

decode(locs, confs, anchors, in_shape)[source]¶

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to rois and roi_indices.

Parameters

locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – Anchor boxes returned by anchors().
in_shape (tuple of ints) – The shape of the input array of the feature extractor.

Returns

rois and roi_indices.

rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices : An array of shape \((R,)\).

Return type

tuple of two arrays

forward(hs)[source]¶

Calculates locs and confs.

Parameters

hs (iterable of array) – An iterable of feature maps.

Returns

locs and confs.

locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs: A list of arrays whose shape is \((N, K_l)\).

Return type

tuple of two lists of arrays

MaskHead¶

class chainercv.links.model.fpn.MaskHead(n_class, scales)[source]¶

Mask Head network of Mask R-CNN.

Parameters

n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.

decode(segms, bboxes, labels, sizes)[source]¶

Decodes back to masks.

Parameters

segms (iterable of arrays) – An iterable of arrays of shape \((R_n, n\_class, M, M)\).
bboxes (iterable of arrays) – An iterable of arrays of shape \((R_n, 4)\).
labels (iterable of arrays) – An iterable of arrays of shape \((R_n,)\).
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.

Returns

This list contains instance segmentation for each image in the batch. More precisely, this is a list of boolean arrays of shape \((R'_n, H_n, W_n)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image.

Return type

list of arrays

distribute(rois, roi_indices)[source]¶

Assigns feature levels to RoIs based on their size.

Parameters

rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).

Returns

out_rois, out_roi_indices and order.

out_rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
out_roi_indices : A list of arrays of shape \((R_l,)\).
order: A correspondence between the output and the input. The relationship below is satisfied.

xp.concatenate(out_rois, axis=0)[order[i]] == rois[i]

Return type

two lists and one array
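The relationship satisfied by order can be checked with a small plain-Python sketch. Tuples stand in for RoI rows, list concatenation stands in for xp.concatenate, and the level assignment is supplied by hand; the function name and all values are hypothetical.

```python
def distribute_with_order(rois, levels, n_level):
    """Group RoIs by level and record where each input RoI ended up."""
    out_rois = [[] for _ in range(n_level)]
    members = [[] for _ in range(n_level)]   # input positions per level
    for i, (roi, lv) in enumerate(zip(rois, levels)):
        out_rois[lv].append(roi)
        members[lv].append(i)
    # order[i] is the position of rois[i] in the concatenation of out_rois.
    order = [0] * len(rois)
    pos = 0
    for level_members in members:
        for i in level_members:
            order[i] = pos
            pos += 1
    return out_rois, order


rois = [(0, 0, 300, 300), (0, 0, 20, 20), (5, 5, 400, 400), (0, 0, 90, 90)]
levels = [2, 0, 2, 1]                        # hand-picked level per RoI
out_rois, order = distribute_with_order(rois, levels, n_level=3)
flat = [roi for level in out_rois for roi in level]
restored = [flat[order[i]] for i in range(len(rois))]
```

Indexing the concatenated output with order restores the input ordering, which is what allows per-level results to be mapped back to the original RoIs.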

segm_to_mask¶

chainercv.links.model.fpn.segm_to_mask(segm, bbox, size)[source]¶

Recover mask from cropped and resized mask.

This function requires cv2.

Parameters

segm (ndarray) – See below.
bbox (ndarray) – See below.
size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).

Returns

See below.

Return type

ndarray

name	shape	dtype	format
`segm`	\((R, S, S)\)	`float32`	–
`bbox`	\((R, 4)\)	`float32`	\((y_{min}, x_{min}, y_{max}, x_{max})\)
`mask` (output)	\((R, H, W)\)	`bool`	–
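The idea of recovering a full-size mask from a cropped and resized one can be sketched without cv2 by nearest-neighbour sampling followed by thresholding. This is a simplified stand-in, not the function's implementation: the name `paste_segms`, the 0.5 threshold and the sampling rule are assumptions, and nested lists replace arrays.

```python
import math


def paste_segms(segm, bbox, size, thresh=0.5):
    """Paste R soft S x S masks into R boolean H x W masks."""
    H, W = size
    masks = []
    for sg, (y_min, x_min, y_max, x_max) in zip(segm, bbox):
        S = len(sg)
        mask = [[False] * W for _ in range(H)]
        bh, bw = y_max - y_min, x_max - x_min
        if bh > 0 and bw > 0:
            for y in range(max(0, math.floor(y_min)), min(H, math.ceil(y_max))):
                # Row of the S x S grid covering the centre of pixel row y.
                sy = min(S - 1, max(0, int((y + 0.5 - y_min) / bh * S)))
                for x in range(max(0, math.floor(x_min)),
                               min(W, math.ceil(x_max))):
                    sx = min(S - 1, max(0, int((x + 0.5 - x_min) / bw * S)))
                    mask[y][x] = sg[sy][sx] >= thresh
        masks.append(mask)
    return masks


segm = [[[0.9, 0.2], [0.1, 0.8]]]            # R = 1, S = 2
bbox = [(1.0, 1.0, 3.0, 3.0)]
mask = paste_segms(segm, bbox, size=(4, 4))  # R = 1, H = W = 4
```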

Train-only Utility¶

bbox_head_loss_pre¶

chainercv.links.model.fpn.bbox_head_loss_pre(rois, roi_indices, std, bboxes, labels)[source]¶

Loss function for Head (pre).

This function processes RoIs for bbox_head_loss_post().

Parameters

rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
std (tuple of floats) – Two coefficients used for encoding bounding boxes.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).

bbox_head_loss_post¶

chainercv.links.model.fpn.bbox_head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)[source]¶

Loss function for Head (post).

Parameters

locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
roi_indices (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
gt_locs (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
gt_labels (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
batchsize (int) – The size of batch.

Returns

loc_loss and conf_loss.

Return type

tuple of two variables

rpn_loss¶

chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)[source]¶

Loss function for RPN.

Parameters

locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – A list of arrays returned by anchors().
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.

Returns

loc_loss and conf_loss.

Return type

tuple of two variables

mask_head_loss_pre¶

chainercv.links.model.fpn.mask_head_loss_pre(rois, roi_indices, gt_masks, gt_bboxes, gt_head_labels, segm_size)[source]¶

Loss function for Mask Head (pre).

This function processes RoIs for mask_head_loss_post() by selecting RoIs for mask loss calculation and preparing ground truth network output.

Parameters

rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
gt_masks (iterable of arrays) – An iterable of arrays whose shape is \((R_n, H, W)\), where \(R_n\) is the number of ground truth objects.
gt_head_labels (iterable of arrays) – An iterable of arrays of shape \((R_l,)\). This is a collection of ground-truth labels assigned to rois during the bounding box localization stage. Values are in the range \([0, n\_class - 1]\).
segm_size (int) – Size of the ground truth network output.

Returns

mask_rois, mask_roi_indices, gt_segms, and gt_mask_labels.

mask_rois: A list of arrays of shape \((R'_l, 4)\), where \(R'_l\) is the number of RoIs in the \(l\)-th feature map.
mask_roi_indices: A list of arrays of shape \((R'_l,)\).
gt_segms: A list of arrays of shape \((R'_l, M, M)\), where \(M\) is the argument segm_size.
gt_mask_labels: A list of arrays of shape \((R'_l,)\) indicating the classes of ground truth.

Return type

tuple of four lists

mask_head_loss_post¶

chainercv.links.model.fpn.mask_head_loss_post(segms, mask_roi_indices, gt_segms, gt_mask_labels, batchsize)[source]¶

Loss function for Mask Head (post).

Parameters

segms (array) – An array whose shape is \((R, n\_class, M, M)\), where \(R\) is the total number of RoIs in the given batch.
mask_roi_indices (list of arrays) – A list of arrays returned by mask_head_loss_pre().
gt_segms (list of arrays) – A list of arrays returned by mask_head_loss_pre().
gt_mask_labels (list of arrays) – A list of arrays returned by mask_head_loss_pre().
batchsize (int) – The size of batch.

Returns

Mask loss.

Return type

chainer.Variable

mask_to_segm¶

chainercv.links.model.fpn.mask_to_segm(mask, bbox, segm_size, index=None)[source]¶

Crop and resize mask.

This function requires cv2.

Parameters

mask (ndarray) – See below.
bbox (ndarray) – See below.
segm_size (int) – The size of segm \(S\).
index (ndarray) – See below. \(R = N\) when index is None.

Returns

See below.

Return type

ndarray

name	shape	dtype	format
`mask`	\((N, H, W)\)	`bool`	–
`bbox`	\((R, 4)\)	`float32`	\((y_{min}, x_{min}, y_{max}, x_{max})\)
`index` (optional)	\((R,)\)	`int32`	–
`segms` (output)	\((R, S, S)\)	`float32`	\([0, 1]\)
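The crop-and-resize step, including the role of index, can be sketched in plain Python with nearest-neighbour sampling. This is a simplified illustration rather than the cv2-based implementation: the function name and the sampling rule are assumptions, nested lists replace arrays, and the output values are only 0.0 or 1.0 because no interpolation is performed.

```python
def crop_and_resize(mask, bbox, segm_size, index=None):
    """Crop each box from its mask and resample it to segm_size x segm_size."""
    if index is None:
        index = range(len(bbox))      # box r belongs to mask r, so R == N
    segms = []
    for i, (y_min, x_min, y_max, x_max) in zip(index, bbox):
        m = mask[i]
        H, W = len(m), len(m[0])
        segm = []
        for sy in range(segm_size):
            # Sample at the centre of each output cell (nearest neighbour).
            y = y_min + (sy + 0.5) * (y_max - y_min) / segm_size
            yi = min(H - 1, max(0, int(y)))
            row = []
            for sx in range(segm_size):
                x = x_min + (sx + 0.5) * (x_max - x_min) / segm_size
                xi = min(W - 1, max(0, int(x)))
                row.append(1.0 if m[yi][xi] else 0.0)
            segm.append(row)
        segms.append(segm)
    return segms


mask = [[[False, False, False, False],
         [False, True, False, False],
         [False, False, True, False],
         [False, False, False, False]]]      # N = 1, H = W = 4
bbox = [(1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 4.0, 4.0)]
# Both boxes refer to mask 0, so R = 2 while N = 1.
segms = crop_and_resize(mask, bbox, segm_size=2, index=[0, 0])
```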