FPN (Feature Pyramid Networks)

Utility

FasterRCNN

class chainercv.links.model.fpn.FasterRCNN(extractor, rpn, bbox_head, mask_head, return_values, min_size=800, max_size=1333)[source]

Base class of Faster R-CNN with FPN.


Parameters
  • extractor (Link) – A link that extracts feature maps. This link must have scales, mean and forward().

  • rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.

  • bbox_head (Link) – A link that has the same interface as BboxHead. Please refer to the documentation found there.

  • mask_head (Link) – A link that has the same interface as MaskHead. Please refer to the documentation found there.

  • return_values (list of strings) – Determines the values returned by predict().

  • min_size (int) – A preprocessing parameter for prepare(). Please refer to the docstring of prepare().

  • max_size (int) – A preprocessing parameter for prepare(). Note that the result of prepare() can exceed this size due to alignment with stride.

  • nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().

  • score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().

predict(imgs)[source]

Conduct inference on the given images.

The value returned by this method is decided based on the argument return_values of __init__().

Examples

>>> from chainercv.links import FasterRCNNFPNResNet50
>>> model = FasterRCNNFPNResNet50(
...     pretrained_model='coco',
...     return_values=['rois', 'bboxes', 'labels', 'scores'])
>>> rois, bboxes, labels, scores = model.predict(imgs)
Parameters

imgs (iterable of numpy.ndarray) – Inputs.

Returns

The table below shows the input and possible outputs.

Return type

tuple of lists

Input name   shape            dtype    format
imgs         \([(3, H, W)]\)  float32  RGB, \([0, 255]\)

Output name  shape            dtype    format
rois         \([(R', 4)]\)    float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
bboxes       \([(R, 4)]\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
scores       \([(R,)]\)       float32
labels       \([(R,)]\)       int32    \([0, \#fg\_class - 1]\)
masks        \([(R, H, W)]\)  bool

prepare(imgs)[source]

Preprocess images.

Parameters

imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).

Returns

Preprocessed images and the scales that were calculated in preprocessing.

Return type

Two arrays
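The interaction between min_size, max_size and the stride alignment can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name scaled_and_padded_size, the stride value of 32 and the use of round() are assumptions made for the example, and it handles a single image rather than a batch.

```python
def scaled_and_padded_size(height, width, min_size=800, max_size=1333, stride=32):
    """Return (scale, resized (H, W), padded (H, W)) for one image.

    Hypothetical helper; stride=32 is an assumed value.
    """
    # Scale so that the shorter side becomes min_size ...
    scale = min_size / min(height, width)
    # ... unless that would push the longer side past max_size.
    if scale * max(height, width) > max_size:
        scale = max_size / max(height, width)
    resized = (round(height * scale), round(width * scale))
    # Round each side up to the next multiple of the stride.
    padded = tuple(-(-side // stride) * stride for side in resized)
    return scale, resized, padded


scale, resized, padded = scaled_and_padded_size(400, 1000)
print(resized, padded)  # (533, 1333) (544, 1344)
```

In this example the padded width (1344) is larger than max_size (1333), which is the situation the note on max_size describes.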

use_preset(preset)[source]

Use the given preset during prediction.

This method changes the values of nms_thresh and score_thresh. These are, respectively, the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict().

If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.

Parameters

preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
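The effect of the two thresholds can be illustrated with a small, self-contained sketch. The functions iou and filter_boxes are hypothetical names, the greedy suppression shown here is a generic formulation rather than the library's non_maximum_suppression(), and the threshold values in the usage lines are arbitrary.

```python
def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, bottom - top) * max(0.0, right - left)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(bboxes, scores, nms_thresh, score_thresh):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    # Discard boxes whose score is lower than score_thresh.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Visit the remaining boxes from the most to the least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(bboxes[i], bboxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


bboxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_boxes(bboxes, scores, nms_thresh=0.5, score_thresh=0.5))  # [0, 2]
```

A lower score_thresh keeps more low-confidence boxes, and a higher nms_thresh tolerates more overlap between kept boxes.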

FasterRCNNFPNResNet

class chainercv.links.model.fpn.FasterRCNNFPNResNet(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]

Base class for Faster R-CNN with a ResNet backbone and FPN.

A subclass of this class should have _base and _models.

Parameters
  • n_fg_class (int) – The number of classes excluding the background.

  • pretrained_model (string) –

    The weight file to be loaded. This can take 'coco', 'imagenet', filepath or None. The default value is None.

    • 'coco': Load weights trained on train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.

    • 'imagenet': Load weights of ResNet-50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes only part of the weights; the rest are initialized randomly. In this case, n_fg_class can be set to any number.

    • filepath: A path of npz file. In this case, n_fg_class must be specified properly.

    • None: Do not load weights.

  • return_values (list of strings) – Determines the values returned by predict().

  • min_size (int) – A preprocessing parameter for prepare(). Please refer to prepare().

  • max_size (int) – A preprocessing parameter for prepare().

FPN

class chainercv.links.model.fpn.FPN(base, n_base_output, scales)[source]

An extractor class of Feature Pyramid Networks.

This class wraps a feature extractor and provides multi-scale features.

Parameters
  • base (Link) – A base feature extractor. It should have forward() and mean. forward() should take a batch of images and return their feature maps. The size of the \(k+1\)-th feature map should be half that of the \(k\)-th feature map.

  • n_base_output (int) – The number of feature maps that base returns.

  • scales (tuple of floats) – The scales of feature maps.
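The relationship between scales and the halving of feature map sizes can be sketched as follows. The helper feature_map_sizes, the scale values and the input size are assumptions chosen for illustration; they are not read from the library.

```python
def feature_map_sizes(in_size, scales):
    """Return the (H_l, W_l) of each feature map for an input of in_size.

    Hypothetical helper: each scale is the ratio between the size of a
    feature map and the size of the input image.
    """
    height, width = in_size
    return [(int(height * s), int(width * s)) for s in scales]


# Assumed scales: every level is half the resolution of the previous one.
scales = (1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64)
sizes = feature_map_sizes((512, 1024), scales)
print(sizes)  # [(128, 256), (64, 128), (32, 64), (16, 32), (8, 16)]
```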

BboxHead

class chainercv.links.model.fpn.BboxHead(n_class, scales)[source]

Bounding box head network of Feature Pyramid Networks.

Parameters
  • n_class (int) – The number of classes including background.

  • scales (tuple of floats) – The scales of feature maps.

decode(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)[source]

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to bboxes, labels and scores.

Parameters
  • rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.

  • roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).

  • locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.

  • confs (array) – An array whose shape is \((R, n\_class)\).

  • scales (list of floats) – A list of floats returned by prepare()

  • sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.

  • nms_thresh (float) – The threshold value for non_maximum_suppression().

  • score_thresh (float) – The threshold value for confidence score.

Returns

bboxes, labels and scores.

Return type

tuple of three lists of arrays

  • bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.

  • labels : A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.

  • scores : A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.

distribute(rois, roi_indices)[source]

Assigns RoIs to feature maps according to their size.

Parameters
  • rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.

  • roi_indices (array) – An array of shape \((R,)\).

Returns

rois and roi_indices.

  • rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.

  • roi_indices : A list of arrays of shape \((R_l,)\).

Return type

tuple of two lists
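A size-based assignment of this kind can be sketched with the heuristic commonly used for feature pyramids, where larger RoIs go to coarser levels. The function names and the constants canonical_scale=224 and canonical_level=2 are assumptions for this example; the values used by the library may differ.

```python
import math


def assign_levels(rois, n_levels, canonical_scale=224, canonical_level=2):
    """Pick a pyramid level for each (y_min, x_min, y_max, x_max) RoI."""
    levels = []
    for y_min, x_min, y_max, x_max in rois:
        # Use the geometric mean of height and width as the RoI size.
        size = math.sqrt(max((y_max - y_min) * (x_max - x_min), 1e-12))
        level = math.floor(math.log2(size / canonical_scale) + canonical_level)
        # Clip to the range of available levels.
        levels.append(min(max(level, 0), n_levels - 1))
    return levels


def distribute(rois, roi_indices, n_levels):
    """Group RoIs and their image indices by the level assigned to them."""
    out_rois = [[] for _ in range(n_levels)]
    out_roi_indices = [[] for _ in range(n_levels)]
    for roi, index, level in zip(rois, roi_indices, assign_levels(rois, n_levels)):
        out_rois[level].append(roi)
        out_roi_indices[level].append(index)
    return out_rois, out_roi_indices


rois = [(0, 0, 56, 56), (0, 0, 224, 224), (0, 0, 1000, 1000)]
out_rois, out_roi_indices = distribute(rois, [0, 0, 1], n_levels=4)
print(out_roi_indices)  # [[0], [], [0], [1]]
```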

forward(hs, rois, roi_indices)[source]

Calculates locs and confs.

Parameters
  • hs (iterable of array) – An iterable of feature maps.

  • rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.

  • roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).

Returns

locs and confs.

  • locs: An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.

  • confs: An array whose shape is \((R, n\_class)\).

Return type

tuple of two arrays

RPN

class chainercv.links.model.fpn.RPN(scales)[source]

Region Proposal Network of Feature Pyramid Networks.

Parameters

scales (tuple of floats) – The scales of feature maps.

anchors(sizes)[source]

Calculates anchor boxes.

Parameters

sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map.

Returns

The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.

Return type

list of arrays
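The shape \((H_l * W_l * A, 4)\) follows from placing one anchor per ratio at every cell of the feature map. The sketch below is a simplified illustration: anchors_for_level, stride, base_size and the ratio values are hypothetical, and the library's anchor geometry may be computed differently.

```python
import math


def anchors_for_level(height, width, stride, base_size, ratios=(0.5, 1.0, 2.0)):
    """Return H * W * A anchors as (y_min, x_min, y_max, x_max) tuples."""
    anchors = []
    for y in range(height):
        for x in range(width):
            # Centre of the cell in input image coordinates.
            cy, cx = (y + 0.5) * stride, (x + 0.5) * stride
            for ratio in ratios:
                # ratio = h / w, with the area kept at base_size ** 2.
                h = base_size * math.sqrt(ratio)
                w = base_size / math.sqrt(ratio)
                anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


anchors = anchors_for_level(height=2, width=3, stride=4, base_size=32)
print(len(anchors))  # 18, that is 2 * 3 * 3
```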

decode(locs, confs, anchors, in_shape)[source]

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to rois and roi_indices.

Parameters
  • locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.

  • confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).

  • anchors (list of arrays) – Anchor boxes returned by anchors().

  • in_shape (tuple of ints) – The shape of the input array of the feature extractor.

Returns

rois and roi_indices.

  • rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.

  • roi_indices : An array of shape \((R,)\).

Return type

tuple of two arrays

forward(hs)[source]

Calculates locs and confs.

Parameters

hs (iterable of array) – An iterable of feature maps.

Returns

locs and confs.

  • locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.

  • confs: A list of arrays whose shape is \((N, K_l)\).

Return type

tuple of two lists of arrays

MaskHead

class chainercv.links.model.fpn.MaskHead(n_class, scales)[source]

Mask Head network of Mask R-CNN.

Parameters
  • n_class (int) – The number of classes including background.

  • scales (tuple of floats) – The scales of feature maps.

decode(segms, bboxes, labels, sizes)[source]

Decodes back to masks.

Parameters
  • segms (iterable of arrays) – An iterable of arrays of shape \((R_n, n\_class, M, M)\).

  • bboxes (iterable of arrays) – An iterable of arrays of shape \((R_n, 4)\).

  • labels (iterable of arrays) – An iterable of arrays of shape \((R_n,)\).

  • sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.

Returns

This list contains instance segmentation for each image in the batch. More precisely, this is a list of boolean arrays of shape \((R'_n, H_n, W_n)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image.

Return type

list of arrays

distribute(rois, roi_indices)[source]

Assigns feature levels to RoIs based on their size.

Parameters
  • rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.

  • roi_indices (array) – An array of shape \((R,)\).

Returns

out_rois, out_roi_indices and order.

  • out_rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.

  • out_roi_indices : A list of arrays of shape \((R_l,)\).

  • order: A correspondence between the output and the input. The relationship below is satisfied.

xp.concatenate(out_rois, axis=0)[order[i]] == rois[i]

Return type

two lists and one array
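The role of order can be checked with a small sketch that uses plain lists in place of arrays. The function distribute_with_order is hypothetical, and the level of each RoI is passed in directly instead of being derived from its size.

```python
def distribute_with_order(rois, levels, n_levels):
    """Group RoIs by level and record where each input RoI ends up."""
    out_rois = [[] for _ in range(n_levels)]
    sources = [[] for _ in range(n_levels)]  # input index of each output row
    for i, (roi, level) in enumerate(zip(rois, levels)):
        out_rois[level].append(roi)
        sources[level].append(i)
    # Position of every input RoI in the concatenation of out_rois.
    order = [0] * len(rois)
    flat_sources = [i for level_sources in sources for i in level_sources]
    for position, i in enumerate(flat_sources):
        order[i] = position
    return out_rois, order


rois = [(0, 0, 300, 300), (0, 0, 20, 20), (5, 5, 400, 400), (0, 0, 90, 90)]
out_rois, order = distribute_with_order(rois, levels=[2, 0, 2, 1], n_levels=3)

# Equivalent of xp.concatenate(out_rois, axis=0) for lists.
flat = [roi for level_rois in out_rois for roi in level_rois]
print(order)  # [2, 0, 3, 1]
print(all(flat[order[i]] == rois[i] for i in range(len(rois))))  # True
```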

segm_to_mask

chainercv.links.model.fpn.segm_to_mask(segm, bbox, size)[source]

Recover mask from cropped and resized mask.

This function requires cv2.

Parameters
  • segm (ndarray) – See below.

  • bbox (ndarray) – See below.

  • size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).

Returns

See below.

Return type

ndarray

name           shape          dtype    format
segm           \((R, S, S)\)  float32
bbox           \((R, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
mask (output)  \((R, H, W)\)  bool
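The idea of recovering a full-size mask can be sketched without cv2 by sampling the nearest segm cell for every pixel inside the bounding box. This is only an illustration for a single RoI: paste_segm and the 0.5 threshold are assumptions, and the actual function relies on cv2 for resizing.

```python
import math


def paste_segm(segm, bbox, size, threshold=0.5):
    """Paste an S x S segm (nested lists) into an H x W boolean mask."""
    height, width = size
    seg_size = len(segm)
    y_min, x_min, y_max, x_max = bbox
    box_h, box_w = y_max - y_min, x_max - x_min
    mask = [[False] * width for _ in range(height)]
    if box_h <= 0 or box_w <= 0:
        return mask
    # Only pixels inside both the box and the image can be foreground.
    for y in range(max(0, math.floor(y_min)), min(height, math.ceil(y_max))):
        for x in range(max(0, math.floor(x_min)), min(width, math.ceil(x_max))):
            # Map the pixel centre back to a cell of the segm grid.
            sy = int((y + 0.5 - y_min) / box_h * seg_size)
            sx = int((x + 0.5 - x_min) / box_w * seg_size)
            sy = min(seg_size - 1, max(0, sy))
            sx = min(seg_size - 1, max(0, sx))
            mask[y][x] = segm[sy][sx] > threshold
    return mask


segm = [[0.9, 0.1], [0.2, 0.8]]
mask = paste_segm(segm, bbox=(1, 1, 3, 3), size=(4, 4))
for row in mask:
    print("".join("#" if value else "." for value in row))
```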

Train-only Utility

bbox_head_loss_pre

chainercv.links.model.fpn.bbox_head_loss_pre(rois, roi_indices, std, bboxes, labels)[source]

Loss function for Head (pre).

This function processes RoIs for bbox_head_loss_post().

Parameters
  • rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.

  • roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).

  • std (tuple of floats) – Two coefficients used for encoding bounding boxes.

  • bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.

  • labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).
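The role of std can be illustrated with the usual encoding of a ground truth box relative to an RoI, where the first coefficient scales the centre offsets and the second scales the log size ratios. The helper names and the default std=(0.1, 0.2) are assumptions for this sketch, not values confirmed by this page.

```python
import math


def center_size(box):
    """Convert (y_min, x_min, y_max, x_max) to (cy, cx, h, w)."""
    y_min, x_min, y_max, x_max = box
    h, w = y_max - y_min, x_max - x_min
    return y_min + h / 2, x_min + w / 2, h, w


def encode_box(roi, gt_bbox, std=(0.1, 0.2)):
    """Encode gt_bbox relative to roi as (dy, dx, dh, dw)."""
    ry, rx, rh, rw = center_size(roi)
    gy, gx, gh, gw = center_size(gt_bbox)
    # Centre offsets are normalised by the RoI size, then divided by std[0].
    dy = (gy - ry) / rh / std[0]
    dx = (gx - rx) / rw / std[0]
    # Size ratios are taken in log space, then divided by std[1].
    dh = math.log(gh / rh) / std[1]
    dw = math.log(gw / rw) / std[1]
    return dy, dx, dh, dw


loc = encode_box(roi=(0, 0, 10, 10), gt_bbox=(1, 2, 11, 12))
print(loc)
```

An RoI that coincides with its ground truth box is encoded as four zeros, so a network trained on such targets predicts corrections rather than absolute coordinates.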

bbox_head_loss_post

chainercv.links.model.fpn.bbox_head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)[source]

Loss function for Head (post).

Parameters
  • locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.

  • confs (array) – An array whose shape is \((R, n\_class)\).

  • roi_indices (list of arrays) – A list of arrays returned by bbox_head_loss_pre().

  • gt_locs (list of arrays) – A list of arrays returned by bbox_head_loss_pre().

  • gt_labels (list of arrays) – A list of arrays returned by bbox_head_loss_pre().

  • batchsize (int) – The size of batch.

Returns

loc_loss and conf_loss.

Return type

tuple of two variables

rpn_loss

chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)[source]

Loss function for RPN.

Parameters
  • locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.

  • confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).

  • anchors (list of arrays) – A list of arrays returned by anchors().

  • sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.

  • bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.

Returns

loc_loss and conf_loss.

Return type

tuple of two variables

mask_head_loss_pre

chainercv.links.model.fpn.mask_head_loss_pre(rois, roi_indices, gt_masks, gt_bboxes, gt_head_labels, segm_size)[source]

Loss function for Mask Head (pre).

This function processes RoIs for mask_head_loss_post() by selecting RoIs for mask loss calculation and preparing ground truth network output.

Parameters
  • rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.

  • roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).

  • gt_masks (iterable of arrays) – An iterable of arrays whose shape is \((R_n, H, W)\), where \(R_n\) is the number of ground truth objects.

  • gt_head_labels (iterable of arrays) – An iterable of arrays of shape \((R_l,)\). This is a collection of ground-truth labels assigned to rois during the bounding box localization stage. The range of values is \([0, n\_class - 1]\).

  • segm_size (int) – Size of the ground truth network output.

Returns

mask_rois, mask_roi_indices, gt_segms, and gt_mask_labels.

  • mask_rois: A list of arrays of shape \((R'_l, 4)\), where \(R'_l\) is the number of RoIs in the \(l\)-th feature map.

  • mask_roi_indices: A list of arrays of shape \((R'_l,)\).

  • gt_segms: A list of arrays of shape \((R'_l, M, M)\), where \(M\) is the argument segm_size.

  • gt_mask_labels: A list of arrays of shape \((R'_l,)\) indicating the classes of ground truth.

Return type

tuple of four lists

mask_head_loss_post

chainercv.links.model.fpn.mask_head_loss_post(segms, mask_roi_indices, gt_segms, gt_mask_labels, batchsize)[source]

Loss function for Mask Head (post).

Parameters
  • segms (array) – An array whose shape is \((R, n\_class, M, M)\), where \(R\) is the total number of RoIs in the given batch.

  • mask_roi_indices (list of arrays) – A list of arrays returned by mask_head_loss_pre().

  • gt_segms (list of arrays) – A list of arrays returned by mask_head_loss_pre().

  • gt_mask_labels (list of arrays) – A list of arrays returned by mask_head_loss_pre().

  • batchsize (int) – The size of batch.

Returns

Mask loss.

Return type

chainer.Variable

mask_to_segm

chainercv.links.model.fpn.mask_to_segm(mask, bbox, segm_size, index=None)[source]

Crop and resize mask.

This function requires cv2.

Parameters
  • mask (ndarray) – See below.

  • bbox (ndarray) – See below.

  • segm_size (int) – The size of segm \(S\).

  • index (ndarray) – See below. \(R = N\) when index is None.

Returns

See below.

Return type

ndarray

name              shape          dtype    format
mask              \((N, H, W)\)  bool
bbox              \((R, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
index (optional)  \((R,)\)       int32
segms (output)    \((R, S, S)\)  float32  \([0, 1]\)
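The shapes in the table can be illustrated with a nearest-neighbour sketch that uses nested lists in place of arrays. The function crop_and_resize is hypothetical and does not use cv2, so its values are only 0.0 or 1.0, whereas an interpolating resize can produce intermediate values in \([0, 1]\).

```python
def crop_and_resize(mask, bbox, segm_size, index=None):
    """Crop each box from its mask and resize the crop to S x S.

    mask: list of N masks (H x W nested lists of bools).
    bbox: list of R boxes as (y_min, x_min, y_max, x_max).
    index: for each box, which mask to crop from; R == N when omitted.
    """
    if index is None:
        index = range(len(bbox))
    segms = []
    for (y_min, x_min, y_max, x_max), i in zip(bbox, index):
        source = mask[i]
        height, width = len(source), len(source[0])
        segm = []
        for sy in range(segm_size):
            # Sample at the centre of each output cell, clamped to the image.
            y = int(y_min + (sy + 0.5) * (y_max - y_min) / segm_size)
            y = min(height - 1, max(0, y))
            row = []
            for sx in range(segm_size):
                x = int(x_min + (sx + 0.5) * (x_max - x_min) / segm_size)
                x = min(width - 1, max(0, x))
                row.append(1.0 if source[y][x] else 0.0)
            segm.append(row)
        segms.append(segm)
    return segms


# One 4 x 4 mask with two foreground pixels on the diagonal.
mask = [[[(y, x) in {(1, 1), (2, 2)} for x in range(4)] for y in range(4)]]
bbox = [(1, 1, 3, 3), (0, 0, 4, 4)]
segms = crop_and_resize(mask, bbox, segm_size=2, index=[0, 0])
print(segms[0])  # [[1.0, 0.0], [0.0, 1.0]]
```

Passing index lets several boxes refer to the same mask, which is why \(R\) and \(N\) can differ.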