FPN (Feature Pyramid Networks)

Detection Links

FasterRCNNFPNResNet50

class chainercv.links.model.fpn.FasterRCNNFPNResNet50(n_fg_class=None, pretrained_model=None)

Feature Pyramid Networks with ResNet-50.

This is a model of Feature Pyramid Networks [1]. This model uses ResNet50 as its base feature extractor.

[1]	Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017

Parameters:

n_fg_class (int) – The number of classes excluding the background.
pretrained_model (string) –
The weight file to be loaded. This can take 'coco', 'imagenet', filepath or None. The default value is None.
- 'coco': Load weights trained on train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
- 'imagenet': Load weights of ResNet-50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights only partially; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
- filepath: A path of npz file. In this case, n_fg_class must be specified properly.
- None: Do not load weights.
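
The options above can be combined with `predict()` roughly as follows. This is a minimal usage sketch, assuming ChainerCV is installed and that `chainercv.utils.read_image` is used to load the image; `detect` and `img_path` are hypothetical names introduced for illustration. The imports sit inside the function so that defining it neither requires ChainerCV nor triggers the weight download.

```python
def detect(img_path):
    """Run COCO-pretrained FPN detection on one image file (illustrative helper)."""
    # Local imports: ChainerCV is only needed when the function is called.
    from chainercv.links.model.fpn import FasterRCNNFPNResNet50
    from chainercv.utils import read_image

    # With 'coco', n_fg_class must be 80 or None.
    model = FasterRCNNFPNResNet50(n_fg_class=80, pretrained_model='coco')
    model.use_preset('visualize')

    # read_image returns a CHW, RGB array with values in [0, 255].
    img = read_image(img_path)
    bboxes, labels, scores = model.predict([img])

    # predict() returns one entry per input image; take the single result.
    return bboxes[0], labels[0], scores[0]
```

Swapping the class for `FasterRCNNFPNResNet101` should be the only change needed for the deeper backbone, since both constructors take the same arguments.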

FasterRCNNFPNResNet101

class chainercv.links.model.fpn.FasterRCNNFPNResNet101(n_fg_class=None, pretrained_model=None)

Feature Pyramid Networks with ResNet-101.

This is a model of Feature Pyramid Networks [2]. This model uses ResNet101 as its base feature extractor.

[2]	Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017

Parameters:

n_fg_class (int) – The number of classes excluding the background.
pretrained_model (string) –
The weight file to be loaded. This can take 'coco', 'imagenet', filepath or None. The default value is None.
- 'coco': Load weights trained on train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
- 'imagenet': Load weights of ResNet-101 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights only partially; the rest are initialized randomly. In this case, n_fg_class can be set to any number.
- filepath: A path of npz file. In this case, n_fg_class must be specified properly.
- None: Do not load weights.

Utility

FasterRCNN

class chainercv.links.model.fpn.FasterRCNN(extractor, rpn, head)

Base class of Feature Pyramid Networks.

This is a base class of Feature Pyramid Networks [3].

[3]	Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017

Parameters:

extractor (Link) – A link that extracts feature maps. This link must have scales, mean and __call__().
rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.
head (Link) – A link that has the same interface as Head. Please refer to the documentation found there.
nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().

predict(imgs)

Detect objects from images.

This method predicts objects for each image.

Parameters:	imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns:	This method returns a tuple of three lists, `(bboxes, labels, scores)`.
bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
labels: A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
scores: A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type:	tuple of lists
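
To make the return layout concrete, here is a small sketch that consumes the per-image arrays. The arrays are made-up values in the documented layout, not real model output, and `summarize` and `min_score` are hypothetical names.

```python
import numpy as np

# Made-up predictions for a single image (R = 3), in the documented layout.
# Each row of bbox is (y_min, x_min, y_max, x_max).
bbox = np.array([[10., 20., 110., 220.],
                 [50., 60., 90., 100.],
                 [0., 0., 30., 40.]], dtype=np.float32)
label = np.array([0, 2, 2], dtype=np.int32)
score = np.array([0.9, 0.4, 0.75], dtype=np.float32)


def summarize(bbox, label, score, min_score=0.5):
    """Keep confident boxes and return (label, score, height, width) tuples."""
    keep = score >= min_score
    heights = bbox[keep, 2] - bbox[keep, 0]  # y_max - y_min
    widths = bbox[keep, 3] - bbox[keep, 1]   # x_max - x_min
    return [(int(l), float(s), float(h), float(w))
            for l, s, h, w in zip(label[keep], score[keep], heights, widths)]


result = summarize(bbox, label, score)
```

With real output, `bbox`, `label` and `score` would be `bboxes[n]`, `labels[n]` and `scores[n]` for the \(n\)-th input image.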

prepare(imgs)

Preprocess images.

Parameters:	imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
Returns:	Preprocessed images and the scales that were calculated in preprocessing.
Return type:	Two arrays
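
The scales returned here are what later allows boxes to be mapped back to the original image. The sketch below shows one common way such a scale is chosen in Faster R-CNN style pipelines: resize so the shorter side reaches a target size without the longer side exceeding a cap. The limits 800 and 1333 and the helper names are assumptions for illustration; this page does not state the values the library uses.

```python
def compute_scale(height, width, min_size=800, max_size=1333):
    """Scale the shorter side to min_size, capped so the longer side <= max_size.

    min_size and max_size are assumed values, not taken from the library docs.
    """
    scale = min_size / min(height, width)
    if scale * max(height, width) > max_size:
        scale = max_size / max(height, width)
    return scale


def to_original(bbox, scale):
    """Map a (y_min, x_min, y_max, x_max) box from resized to original coordinates."""
    return tuple(v / scale for v in bbox)
```

For example, a 480x640 image would be scaled by 800 / 480, while a 400x1000 image would hit the cap and be scaled by 1333 / 1000.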

use_preset(preset)

Use the given preset during prediction.

This method changes the values of nms_thresh and score_thresh. These are, respectively, the threshold used for non-maximum suppression and the threshold used to discard low-confidence proposals in predict().

If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.

Parameters:	preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
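
The preset mechanism and the direct override described above can be pictured with a toy class. This is a sketch of the pattern only, not the ChainerCV implementation: `Detector` is a hypothetical stand-in and the threshold values in `_presets` are placeholders.

```python
class Detector:
    """Toy stand-in for a detector with threshold presets (not the ChainerCV class)."""

    # Placeholder values chosen for illustration only.
    _presets = {
        'visualize': {'nms_thresh': 0.5, 'score_thresh': 0.7},
        'evaluate': {'nms_thresh': 0.5, 'score_thresh': 0.05},
    }

    def __init__(self):
        self.use_preset('visualize')

    def use_preset(self, preset):
        if preset not in self._presets:
            raise ValueError('preset must be "visualize" or "evaluate"')
        self.nms_thresh = self._presets[preset]['nms_thresh']
        self.score_thresh = self._presets[preset]['score_thresh']


det = Detector()
det.use_preset('evaluate')  # switch both thresholds at once
det.score_thresh = 0.01     # then override one attribute directly
```

An evaluation preset typically keeps many low-score boxes so that precision-recall curves can be computed, whereas a visualization preset keeps only confident ones.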

FPN

class chainercv.links.model.fpn.FPN(base, n_base_output, scales)

An extractor class of Feature Pyramid Networks.

This class wraps a feature extractor and provides multi-scale features.

Parameters:

base (Link) – A base feature extractor. It should have __call__() and mean. __call__() should take a batch of images and return feature maps of them. The size of the \(k+1\)-th feature map should be half that of the \(k\)-th feature map.
n_base_output (int) – The number of feature maps that base returns.
scales (tuple of floats) – The scales of feature maps.
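
The halving relationship between consecutive feature maps can be checked with a few lines of arithmetic. In this sketch the `scales` tuple and the rounding with `math.ceil` are assumptions for illustration; the actual values depend on how the extractor is configured.

```python
import math


def feature_map_sizes(height, width, scales):
    """Approximate (H_l, W_l) of each pyramid level for a given input size."""
    return [(math.ceil(height * s), math.ceil(width * s)) for s in scales]


# Assumed scales: each level is half the resolution of the previous one.
scales = (1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64)
sizes = feature_map_sizes(800, 1344, scales)
```

Here `sizes` starts at (200, 336) and roughly halves at each level, which is the property `base` is expected to satisfy.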

Head

class chainercv.links.model.fpn.Head(n_class, scales)

Head network of Feature Pyramid Networks.

Parameters:

n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.

__call__(hs, rois, roi_indices)

Calculates locs and confs for the given RoIs.

Parameters:

hs (iterable of arrays) – An iterable of feature maps.
rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).

Returns:

locs and confs.

locs: An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.

confs: An array whose shape is \((R, n\_class)\).

Return type:

tuple of two arrays

decode(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to bboxes, labels and scores.

Parameters:

rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
scales (list of floats) – A list of floats returned by prepare()
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
nms_thresh (float) – The threshold value for non_maximum_suppression().
score_thresh (float) – The threshold value for confidence score.

Returns:

bboxes, labels and scores.

Return type:

tuple of three lists of arrays

bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
labels : A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
scores : A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.
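
The roles of `score_thresh` and `nms_thresh` can be illustrated with a plain NumPy sketch of score filtering followed by greedy non-maximum suppression. This is a simplified, single-class illustration under assumed conventions, not the library's implementation; `iou_one_to_many` and `filter_detections` are hypothetical helpers.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (y_min, x_min, y_max, x_max) box and an (M, 4) array."""
    tl = np.maximum(box[:2], boxes[:, :2])   # top-left of the intersection
    br = np.minimum(box[2:], boxes[:, 2:])   # bottom-right of the intersection
    inter = np.prod(np.clip(br - tl, 0, None), axis=1)
    area = np.prod(box[2:] - box[:2])
    areas = np.prod(boxes[:, 2:] - boxes[:, :2], axis=1)
    return inter / (area + areas - inter)


def filter_detections(bbox, score, nms_thresh, score_thresh):
    """Drop low-score boxes, then greedily suppress overlapping ones."""
    mask = score >= score_thresh
    bbox, score = bbox[mask], score[mask]
    order = np.argsort(-score)               # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        ious = iou_one_to_many(bbox[i], bbox[rest])
        order = rest[ious <= nms_thresh]     # discard boxes overlapping too much
    keep = np.array(keep, dtype=int)
    return bbox[keep], score[keep]


bbox = np.array([[0., 0., 10., 10.],
                 [1., 1., 11., 11.],
                 [20., 20., 30., 30.],
                 [0., 0., 5., 5.]])
score = np.array([0.9, 0.8, 0.7, 0.1])
kept_bbox, kept_score = filter_detections(bbox, score, nms_thresh=0.5, score_thresh=0.3)
```

In this toy input the last box is removed by the score threshold and the second box is suppressed because it overlaps the first with an IoU of about 0.68.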

distribute(rois, roi_indices)

Assigns RoIs to feature maps according to their size.

Parameters:

rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).

Returns:

rois and roi_indices.

rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices : A list of arrays of shape \((R_l,)\).

Return type:

tuple of two lists
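
For intuition about size-based assignment, the sketch below applies the heuristic from the FPN paper [3], \(k = \lfloor k_0 + \log_2(\sqrt{wh} / 224) \rfloor\) with \(k_0 = 4\), and then splits the RoIs by level. Whether the library uses exactly these constants is an assumption; `assign_levels`, `distribute_sketch` and the level range 2 to 5 are illustrative.

```python
import numpy as np


def assign_levels(rois, k_min=2, k_max=5, k0=4, canonical_size=224):
    """Pyramid level per RoI, following the FPN paper's size heuristic."""
    h = rois[:, 2] - rois[:, 0]
    w = rois[:, 3] - rois[:, 1]
    # The small epsilon keeps log2 finite for degenerate (zero-area) RoIs.
    k = np.floor(k0 + np.log2(np.sqrt(h * w) / canonical_size + 1e-6))
    return np.clip(k, k_min, k_max).astype(int)


def distribute_sketch(rois, roi_indices, k_min=2, k_max=5):
    """Split (R, 4) rois and (R,) roi_indices into per-level lists."""
    levels = assign_levels(rois, k_min, k_max)
    rois_l = [rois[levels == k] for k in range(k_min, k_max + 1)]
    indices_l = [roi_indices[levels == k] for k in range(k_min, k_max + 1)]
    return rois_l, indices_l


rois = np.array([[0., 0., 224., 224.],
                 [0., 0., 112., 112.],
                 [0., 0., 32., 32.],
                 [0., 0., 1000., 1000.]])
roi_indices = np.array([0, 0, 1, 1])
levels = assign_levels(rois)
rois_l, indices_l = distribute_sketch(rois, roi_indices)
```

Small RoIs land on the finer (lower) levels and large RoIs on the coarser ones, so each RoI is pooled from a feature map whose resolution suits its size.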

RPN

class chainercv.links.model.fpn.RPN(scales)

Region Proposal Network of Feature Pyramid Networks.

Parameters:	scales (tuple of floats) – The scales of feature maps.

__call__(hs)

Calculates locs and confs.

Parameters: hs (iterable of arrays) – An iterable of feature maps.

Returns:

locs and confs.

locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.

confs: A list of arrays whose shape is \((N, K_l)\).

Return type: tuple of two lists

anchors(sizes)

Calculates anchor boxes.

Parameters:	sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map.
Returns:	Anchor boxes for each feature map. The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.
Return type:	list of arrays
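
The \((H_l * W_l * A, 4)\) shape follows from placing \(A\) boxes at every cell of an \(H_l \times W_l\) grid. The sketch below generates such anchors for one level; the centering at cell midpoints, the `stride`, `anchor_size` and `ratios` values, and the name `make_anchors` are all assumptions for illustration.

```python
import numpy as np


def make_anchors(height, width, stride, anchor_size, ratios=(0.5, 1, 2)):
    """Anchors for one level as an (H * W * A, 4) array of (y_min, x_min, y_max, x_max)."""
    # Anchor centers in input-image coordinates, one per feature-map cell.
    ys = (np.arange(height) + 0.5) * stride
    xs = (np.arange(width) + 0.5) * stride
    cy, cx = np.meshgrid(ys, xs, indexing='ij')    # each (H, W)

    # Height/width per ratio (ratio = h / w), keeping the area anchor_size ** 2.
    r = np.asarray(ratios, dtype=np.float64)
    hs = anchor_size * np.sqrt(r)                   # (A,)
    ws = anchor_size / np.sqrt(r)                   # (A,)

    cy = cy[:, :, None]                             # (H, W, 1) broadcasts against (A,)
    cx = cx[:, :, None]
    anchors = np.stack(
        [cy - hs / 2, cx - ws / 2, cy + hs / 2, cx + ws / 2], axis=-1)  # (H, W, A, 4)
    return anchors.reshape(-1, 4)


anchors = make_anchors(height=2, width=3, stride=4, anchor_size=32)
```

For a 2x3 feature map with three ratios this yields 18 anchors, and every anchor has the same area regardless of its aspect ratio.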

decode(locs, confs, anchors, in_shape)

Decodes back to coordinates of RoIs.

This method decodes locs and confs returned by an FPN network back to rois and roi_indices.

Parameters:

locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – Anchor boxes returned by anchors().
in_shape (tuple of ints) – The shape of the input array to the feature extractor.

Returns:

rois and roi_indices.

rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices : An array of shape \((R,)\).

Return type:

tuple of two arrays
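
Decoding turns per-anchor offsets into box coordinates. The sketch below uses the standard Faster R-CNN parameterization: center shifts proportional to the anchor size and log-scale size changes. The offset order `(dy, dx, dh, dw)` and the absence of any extra scaling are assumptions, and `decode_locs` is a hypothetical helper rather than the library's method.

```python
import numpy as np


def decode_locs(anchors, locs):
    """Apply (dy, dx, dh, dw) offsets to (y_min, x_min, y_max, x_max) anchors."""
    size = anchors[:, 2:] - anchors[:, :2]        # (h, w) of each anchor
    center = anchors[:, :2] + size / 2            # (cy, cx) of each anchor
    new_center = center + locs[:, :2] * size      # shift relative to anchor size
    new_size = size * np.exp(locs[:, 2:])         # log-space scaling
    return np.hstack([new_center - new_size / 2, new_center + new_size / 2])


anchors = np.array([[0., 0., 10., 20.]])
same = decode_locs(anchors, np.zeros((1, 4)))
moved = decode_locs(anchors, np.array([[0.1, 0., 0., np.log(2.)]]))
```

Zero offsets reproduce the anchor, while the second call moves the center down by one tenth of the anchor height and doubles the width. After decoding, `roi_indices` records which image in the batch each RoI belongs to, so `rois[roi_indices == n]` selects the RoIs of the \(n\)-th image.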

Train-only Utility

head_loss_pre

chainercv.links.model.fpn.head_loss_pre(rois, roi_indices, std, bboxes, labels)

Loss function for Head (pre).

This function processes RoIs for head_loss_post().

Parameters:

rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
std (tuple of floats) – Two coefficients used for encoding bounding boxes.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).

head_loss_post

chainercv.links.model.fpn.head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)

Loss function for Head (post).

Parameters:

locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
roi_indices (list of arrays) – A list of arrays returned by `head_loss_pre()`.
gt_locs (list of arrays) – A list of arrays returned by `head_loss_pre()`.
gt_labels (list of arrays) – A list of arrays returned by `head_loss_pre()`.
batchsize (int) – The size of batch.

Returns:	`loc_loss` and `conf_loss`.
Return type:	tuple of two variables

rpn_loss

chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)

Loss function for RPN.

Parameters:

locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – A list of arrays returned by `anchors()`.
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.

Returns:	`loc_loss` and `conf_loss`.
Return type:	tuple of two variables
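
An RPN loss needs a training target for every anchor, which is usually derived from the overlap between anchors and ground truth boxes. The sketch below labels anchors by their best IoU using the thresholds from the original Faster R-CNN paper (foreground at 0.7 or above, background below 0.3). Whether `rpn_loss` uses these exact thresholds is an assumption, and `pairwise_iou` and `label_anchors` are hypothetical helpers.

```python
import numpy as np


def pairwise_iou(a, b):
    """IoU matrix of shape (len(a), len(b)) for (y_min, x_min, y_max, x_max) boxes."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def label_anchors(anchors, gt_bboxes, fg_thresh=0.7, bg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) for each anchor."""
    max_iou = pairwise_iou(anchors, gt_bboxes).max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int32)
    labels[max_iou < bg_thresh] = 0
    labels[max_iou >= fg_thresh] = 1
    return labels


anchors = np.array([[0., 0., 10., 10.],
                    [0., 0., 10., 20.],
                    [50., 50., 60., 60.]])
gt_bboxes = np.array([[0., 0., 10., 10.]])
anchor_labels = label_anchors(anchors, gt_bboxes)
```

Anchors labeled -1 fall between the two thresholds and would typically be excluded from the confidence loss, while only foreground anchors contribute to the localization loss.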