FPN (Feature Pyramid Networks)

Utility

FasterRCNN

class chainercv.links.model.fpn.FasterRCNN(extractor, rpn, head)[source]

Base class of Feature Pyramid Networks.

This is a base class of Feature Pyramid Networks [3].

[3] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017.
Parameters:
  • extractor (Link) – A link that extracts feature maps. This link must have scales, mean and __call__().
  • rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.
  • head (Link) – A link that has the same interface as Head. Please refer to the documentation found there.
  • nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
  • score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().
predict(imgs)[source]

Detect objects from images.

This method predicts objects for each image.

Parameters:imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns:This method returns a tuple of three lists, (bboxes, labels, scores).
  • bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
  • labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
  • scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type:tuple of lists
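
The role of score_thresh in the output of predict() can be illustrated with a small NumPy sketch. This is not ChainerCV's implementation: filter_by_score and the toy arrays are hypothetical names introduced here, and the sketch assumes that a box is kept when its score is greater than or equal to the threshold.

```python
import numpy as np


def filter_by_score(bbox, label, score, score_thresh):
    """Keep detections whose confidence is not lower than score_thresh.

    Hypothetical helper for illustration only; not part of ChainerCV.
    """
    keep = score >= score_thresh
    return bbox[keep], label[keep], score[keep]


# Toy detections for one image, each box as (y_min, x_min, y_max, x_max).
bbox = np.array(
    [[10, 10, 50, 60], [20, 30, 80, 90], [0, 0, 5, 5]], dtype=np.float32)
label = np.array([0, 2, 1], dtype=np.int32)
score = np.array([0.9, 0.4, 0.75], dtype=np.float32)

kept_bbox, kept_label, kept_score = filter_by_score(
    bbox, label, score, score_thresh=0.6)
```

The three returned arrays keep the same first dimension \(R\), matching the per-image layout of bboxes, labels and scores described above.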
prepare(imgs)[source]

Preprocess images.

Parameters:imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns:Preprocessed images and the scales that were calculated during preprocessing.
Return type:Two arrays
use_preset(preset)[source]

Use the given preset during prediction.

This method changes the values of nms_thresh and score_thresh. These are, respectively, the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict().

If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.

Parameters:preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
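
The interplay between presets and direct attribute access can be sketched with a toy class. ThresholdPresets and the numbers in its preset table are placeholders invented for this illustration; they are not taken from ChainerCV, and the real class may store its presets differently.

```python
class ThresholdPresets:
    """Toy stand-in for a detector exposing nms_thresh and score_thresh."""

    # Placeholder values for illustration only; NOT ChainerCV's presets.
    _PRESETS = {
        'visualize': {'nms_thresh': 0.5, 'score_thresh': 0.7},
        'evaluate': {'nms_thresh': 0.5, 'score_thresh': 0.05},
    }

    def __init__(self):
        self.use_preset('visualize')

    def use_preset(self, preset):
        if preset not in self._PRESETS:
            raise ValueError('preset must be "visualize" or "evaluate"')
        # Overwrite both public attributes with the preset's values.
        for name, value in self._PRESETS[preset].items():
            setattr(self, name, value)


model = ThresholdPresets()
model.use_preset('evaluate')
# A value outside the presets is set by assigning to the attribute directly.
model.score_thresh = 0.3
```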

FPN

class chainercv.links.model.fpn.FPN(base, n_base_output, scales)[source]

An extractor class of Feature Pyramid Networks.

This class wraps a feature extractor and provides multi-scale features.

Parameters:
  • base (Link) – A base feature extractor. It should have __call__() and mean. __call__() should take a batch of images and return their feature maps. The size of the \(k+1\)-th feature map should be half that of the \(k\)-th feature map.
  • n_base_output (int) – The number of feature maps that base returns.
  • scales (tuple of floats) – The scales of feature maps.
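
The halving relation between consecutive feature maps can be checked with a few lines of plain Python. The scale values and input size below are assumptions chosen for illustration, and pyramid_sizes is a hypothetical helper that simply multiplies the input size by each scale; a real extractor may round differently.

```python
def pyramid_sizes(in_size, scales):
    """Return the (height, width) of each feature map for a given input size.

    Hypothetical helper: assumes each map is the input scaled by scales[k].
    """
    height, width = in_size
    return [(int(height * s), int(width * s)) for s in scales]


# Assumed example: each scale is half of the previous one.
scales = (1 / 4, 1 / 8, 1 / 16)
sizes = pyramid_sizes((512, 768), scales)

# Each level is half the size of the one before it.
halved = all(
    (h0 // 2, w0 // 2) == (h1, w1)
    for (h0, w0), (h1, w1) in zip(sizes, sizes[1:]))
```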

RPN

class chainercv.links.model.fpn.RPN(scales)[source]

Region Proposal Network of Feature Pyramid Networks.

Parameters:scales (tuple of floats) – The scales of feature maps.
__call__(hs)[source]

Calculates RoIs.

Parameters:hs (iterable of array) – An iterable of feature maps.
Returns:locs and confs.
  • locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
  • confs: A list of arrays whose shape is \((N, K_l)\).
Return type:tuple of two arrays
anchors(sizes)[source]

Calculates anchor boxes.

Parameters:sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map.
Returns:Anchor boxes for each feature map. The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.
Return type:list of arrays
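
The output shape \((H_l * W_l * A, 4)\) can be reproduced with a minimal NumPy sketch. This is an illustration under stated assumptions, not the method's actual code: make_anchors, the base size of 8 cells per anchor, and the three aspect ratios are all hypothetical choices.

```python
import numpy as np


def make_anchors(sizes, scales, ratios=(0.5, 1.0, 2.0), cells=8):
    """Build one (H_l * W_l * A, 4) anchor array per feature map.

    Hypothetical sketch: anchors are centred on each cell and their base
    side length is `cells` times the stride of the level.
    """
    anchors = []
    sqrt_r = np.sqrt(np.asarray(ratios, dtype=np.float64))  # shape (A,)
    for (height, width), scale in zip(sizes, scales):
        stride = 1 / scale
        base = cells * stride
        # Cell centres in input-image coordinates.
        ys = (np.arange(height) + 0.5) * stride
        xs = (np.arange(width) + 0.5) * stride
        cy, cx = np.meshgrid(ys, xs, indexing='ij')
        cy, cx = cy[:, :, None], cx[:, :, None]  # (H, W, 1)
        hs, ws = base * sqrt_r, base / sqrt_r    # (A,)
        level = np.stack(
            [cy - hs / 2, cx - ws / 2, cy + hs / 2, cx + ws / 2], axis=-1)
        anchors.append(level.reshape(-1, 4))     # (H * W * A, 4)
    return anchors


anchors = make_anchors(sizes=[(4, 6), (2, 3)], scales=(1 / 4, 1 / 8))
```

Each row follows the \((y_{min}, x_{min}, y_{max}, x_{max})\) order used for bounding boxes elsewhere in this module.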
decode(locs, confs, anchors, in_shape)[source]

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to rois and roi_indices.

Parameters:
  • locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
  • confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).
  • anchors (list of arrays) – Anchor boxes returned by anchors().
  • in_shape (tuple of ints) – The shape of the input array of the feature extractor.
Returns:

rois and roi_indices.

  • rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
  • roi_indices : An array of shape \((R,)\).

Return type:

tuple of two arrays
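
The relationship between rois and roi_indices can be sketched as follows. The sketch assumes that each entry of roi_indices is the batch index of the image that the corresponding RoI belongs to; the arrays are toy data, not output of decode().

```python
import numpy as np

# Toy data: four RoIs, (y_min, x_min, y_max, x_max), from a batch of two images.
rois = np.array(
    [[0, 0, 10, 10], [5, 5, 20, 20], [1, 2, 3, 4], [8, 8, 16, 16]],
    dtype=np.float32)
# Assumed meaning: roi_indices[i] is the batch index of the image of rois[i].
roi_indices = np.array([0, 1, 0, 1], dtype=np.int32)

n_images = 2
# Split the flat (R, 4) array back into one array per image.
rois_per_image = [rois[roi_indices == i] for i in range(n_images)]
```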

Train-only Utility

head_loss_pre

chainercv.links.model.fpn.head_loss_pre(rois, roi_indices, std, bboxes, labels)[source]

Loss function for Head (pre).

This function processes RoIs for head_loss_post().

Parameters:
  • rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
  • roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
  • std (tuple of floats) – Two coefficients used for encoding bounding boxes.
  • bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
  • labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).
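
How two coefficients such as std can enter a bounding box encoding is sketched below. This follows a common Faster R-CNN style parameterization (centre offsets divided by std[0], log size ratios divided by std[1]); that this matches head_loss_pre() exactly is an assumption, and encode and the default values of std are hypothetical.

```python
import numpy as np


def encode(roi, gt_bbox, std=(0.1, 0.2)):
    """Encode gt_bbox relative to roi.

    Both inputs are (R, 4) arrays in (y_min, x_min, y_max, x_max) order.
    Hypothetical helper; the std defaults are placeholders.
    """
    roi_yx = (roi[:, :2] + roi[:, 2:]) / 2      # RoI centres
    roi_hw = roi[:, 2:] - roi[:, :2]            # RoI heights and widths
    gt_yx = (gt_bbox[:, :2] + gt_bbox[:, 2:]) / 2
    gt_hw = gt_bbox[:, 2:] - gt_bbox[:, :2]
    d_yx = (gt_yx - roi_yx) / roi_hw / std[0]   # normalized centre offset
    d_hw = np.log(gt_hw / roi_hw) / std[1]      # normalized log size ratio
    return np.hstack([d_yx, d_hw])


roi = np.array([[0, 0, 10, 10]], dtype=np.float64)
gt_bbox = np.array([[1, 1, 11, 11]], dtype=np.float64)
gt_loc = encode(roi, gt_bbox)
```

Dividing by small coefficients enlarges the regression targets, which is the usual motivation for such normalization.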

head_loss_post

chainercv.links.model.fpn.head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)[source]

Loss function for Head (post).

Parameters:
  • locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
  • confs (array) – An array whose shape is \((R, n\_class)\).
  • roi_indices (list of arrays) – A list of arrays returned by head_loss_pre().
  • gt_locs (list of arrays) – A list of arrays returned by head_loss_pre().
  • gt_labels (list of arrays) – A list of arrays returned by head_loss_pre().
  • batchsize (int) – The size of batch.
Returns:

loc_loss and conf_loss.

Return type:

tuple of two variables

rpn_loss

chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)[source]

Loss function for RPN.

Parameters:
  • locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
  • confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
  • anchors (list of arrays) – A list of arrays returned by anchors().
  • sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
  • bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
Returns:

loc_loss and conf_loss.

Return type:

tuple of two variables