FPN (Feature Pyramid Networks)¶
Detection Links¶
FasterRCNNFPNResNet50¶
-
class
chainercv.links.model.fpn.FasterRCNNFPNResNet50(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶ Faster R-CNN with ResNet-50 and FPN.
Please refer to
FasterRCNNFPNResNet.
FasterRCNNFPNResNet101¶
-
class
chainercv.links.model.fpn.FasterRCNNFPNResNet101(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶ Faster R-CNN with ResNet-101 and FPN.
Please refer to
FasterRCNNFPNResNet.
Instance Segmentation Links¶
MaskRCNNFPNResNet50¶
-
class
chainercv.links.model.fpn.MaskRCNNFPNResNet50(n_fg_class=None, pretrained_model=None, return_values=['masks', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶ Mask R-CNN with ResNet-50 and FPN.
Please refer to
FasterRCNNFPNResNet.
MaskRCNNFPNResNet101¶
-
class
chainercv.links.model.fpn.MaskRCNNFPNResNet101(n_fg_class=None, pretrained_model=None, return_values=['masks', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶ Mask R-CNN with ResNet-101 and FPN.
Please refer to
FasterRCNNFPNResNet.
Utility¶
FasterRCNN¶
-
class
chainercv.links.model.fpn.FasterRCNN(extractor, rpn, bbox_head, mask_head, return_values, min_size=800, max_size=1333)[source]¶ Base class of Faster R-CNN with FPN.
This is a base class of Faster R-CNN with FPN.
- Parameters
extractor (Link) – A link that extracts feature maps. This link must have scales, mean and forward().
rpn (Link) – A link that has the same interface as RPN. Please refer to the documentation found there.
bbox_head (Link) – A link that has the same interface as BboxHead. Please refer to the documentation found there.
mask_head (Link) – A link that has the same interface as MaskHead. Please refer to the documentation found there.
return_values (list of strings) – Determines the values returned by predict().
min_size (int) – A preprocessing parameter for prepare(). Please refer to the docstring of prepare().
max_size (int) – A preprocessing parameter for prepare(). Note that the result of prepare() can exceed this size due to alignment with stride.
nms_thresh (float) – The threshold value for non_maximum_suppression(). The default value is 0.45. This value can be changed directly or by using use_preset().
score_thresh (float) – The threshold value for confidence score. A bounding box whose confidence score is lower than this value is suppressed. The default value is 0.6. This value can be changed directly or by using use_preset().
-
predict(imgs)[source]¶ Conduct inference on the given images.
The value returned by this method is decided based on the argument return_values of __init__().
Examples
>>> from chainercv.links import FasterRCNNFPNResNet50
>>> model = FasterRCNNFPNResNet50(
...     pretrained_model='coco',
...     return_values=['rois', 'bboxes', 'labels', 'scores'])
>>> rois, bboxes, labels, scores = model.predict(imgs)
- Parameters
imgs (iterable of numpy.ndarray) – Inputs.
- Returns
The table below shows the input and possible outputs.
- Return type
tuple of lists
| Input name | shape | dtype | format |
| --- | --- | --- | --- |
| imgs | \([(3, H, W)]\) | float32 | RGB, \([0, 255]\) |

| Output name | shape | dtype | format |
| --- | --- | --- | --- |
| rois | \([(R', 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| bboxes | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| scores | \([(R,)]\) | float32 | – |
| labels | \([(R,)]\) | int32 | \([0, \#fg\_class - 1]\) |
| masks | \([(R, H, W)]\) | bool | – |
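Each output is a list with one entry per input image, so results are usually consumed image by image. The following is a minimal sketch of that access pattern. It uses hand-written stand-in values rather than real model output, and `summarize_detections` and `label_names` are hypothetical names introduced only for illustration.

```python
def summarize_detections(bboxes, labels, scores, label_names):
    """Flatten per-image detection lists into readable records."""
    records = []
    # Each argument holds one entry per input image.
    for img_index, (bbox, label, score) in enumerate(zip(bboxes, labels, scores)):
        # Within one image, the r-th box, label and score belong together.
        for bb, lb, sc in zip(bbox, label, score):
            y_min, x_min, y_max, x_max = bb
            records.append({
                'image': img_index,
                'name': label_names[lb],
                'score': sc,
                'height': y_max - y_min,
                'width': x_max - x_min,
            })
    return records


# Stand-in values for two images (not produced by a model).
bboxes = [[(10.0, 20.0, 110.0, 220.0)], [(0.0, 0.0, 50.0, 50.0), (5.0, 5.0, 25.0, 45.0)]]
labels = [[1], [0, 1]]
scores = [[0.9], [0.8, 0.7]]
records = summarize_detections(bboxes, labels, scores, ['person', 'dog'])
```

The same loops work unchanged when the per-image entries are arrays instead of nested lists.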
-
prepare(imgs)[source]¶ Preprocess images.
- Parameters
imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\).
- Returns
Preprocessed images and the scales that were calculated during preprocessing.
- Return type
Two arrays
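The interaction of min_size, max_size and stride alignment can be sketched as follows. The resizing rule used here (scale the shorter side to min_size, cap the longer side at max_size, then pad up to a multiple of the stride), the stride value of 32, and the helper name `resized_shape` are assumptions made for illustration. They are not taken from the chainercv source.

```python
def resized_shape(height, width, min_size=800, max_size=1333, stride=32):
    """Return (scale, resized (H, W), padded (H, W)) for one image.

    This follows an assumed rule, not verified library behaviour.
    """
    # Scale so that the shorter side matches min_size.
    scale = min_size / min(height, width)
    # If the longer side would exceed max_size, fit it to max_size instead.
    if scale * max(height, width) > max_size:
        scale = max_size / max(height, width)
    resized = (int(round(height * scale)), int(round(width * scale)))
    # Pad each side up to the next multiple of the stride.
    padded = tuple(-(-side // stride) * stride for side in resized)
    return scale, resized, padded


# A wide image: the longer side is capped, then padding pushes it past max_size.
scale, resized, padded = resized_shape(500, 2000)
```

Under these assumptions the padded width is larger than max_size, which is the situation the note on max_size describes.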
-
use_preset(preset)[source]¶ Use the given preset during prediction.
This method changes values of nms_thresh and score_thresh. These values are a threshold value used for non maximum suppression and a threshold value to discard low confidence proposals in predict(), respectively.
If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
- Parameters
preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
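To illustrate what the two thresholds control, here is a minimal sketch of score filtering followed by greedy non maximum suppression. It is a generic, class-agnostic version written with the standard library only. It is not the implementation behind non_maximum_suppression(), and the helper names `box_iou` and `filter_detections` are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(bboxes, scores, nms_thresh, score_thresh):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    # Discard low-confidence boxes first.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Visit the remaining boxes from the highest score to the lowest.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(box_iou(bboxes[i], bboxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


bboxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = filter_detections(bboxes, scores, nms_thresh=0.45, score_thresh=0.6)
```

Raising score_thresh removes more low-confidence boxes before suppression, while raising nms_thresh lets more overlapping boxes survive.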
FasterRCNNFPNResNet¶
-
class
chainercv.links.model.fpn.FasterRCNNFPNResNet(n_fg_class=None, pretrained_model=None, return_values=['bboxes', 'labels', 'scores'], min_size=800, max_size=1333)[source]¶ Base class for Faster R-CNN with a ResNet backbone and FPN.
A subclass of this class should have _base and _models.
- Parameters
n_fg_class (int) – The number of classes excluding the background.
pretrained_model (string) – The weight file to be loaded. This can take 'coco', 'imagenet', a filepath or None. The default value is None.
'coco': Load weights trained on the train split of MS COCO 2017. The weight file is downloaded and cached automatically. n_fg_class must be 80 or None.
'imagenet': Load weights of ResNet-50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights partially, and the rest are initialized randomly. In this case, n_fg_class can be set to any number.
filepath: A path to an npz file. In this case, n_fg_class must be specified properly.
None: Do not load weights.
return_values (list of strings) – Determines the values returned by predict().
min_size (int) – A preprocessing parameter for prepare(). Please refer to prepare().
max_size (int) – A preprocessing parameter for prepare().
FPN¶
-
class
chainercv.links.model.fpn.FPN(base, n_base_output, scales)[source]¶ An extractor class of Feature Pyramid Networks.
This class wraps a feature extractor and provides multi-scale features.
- Parameters
base (Link) – A base feature extractor. It should have forward() and mean. forward() should take a batch of images and return feature maps of them. The size of the \(k+1\)-th feature map should be half that of the \(k\)-th feature map.
n_base_output (int) – The number of feature maps that base returns.
scales (tuple of floats) – The scales of feature maps.
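The relation between scales and feature map sizes can be sketched as follows. The scale values used here (1/4 down to 1/32) and the helper names `feature_map_sizes` and `is_halving` are assumptions for illustration. The actual scales depend on the extractor.

```python
def feature_map_sizes(in_height, in_width, scales):
    """Spatial size of each feature map for a given input size."""
    return [(int(in_height * s), int(in_width * s)) for s in scales]


def is_halving(sizes):
    """Check that every feature map is half the size of the previous one."""
    return all(
        (2 * h_next, 2 * w_next) == (h, w)
        for (h, w), (h_next, w_next) in zip(sizes, sizes[1:]))


scales = (1 / 4, 1 / 8, 1 / 16, 1 / 32)  # assumed values
sizes = feature_map_sizes(800, 1088, scales)
```

The halving check holds exactly only when the input size is a multiple of the largest stride, which is one reason to align inputs to the stride during preprocessing.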
BboxHead¶
-
class
chainercv.links.model.fpn.BboxHead(n_class, scales)[source]¶ Bounding box head network of Feature Pyramid Networks.
- Parameters
n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.
-
decode(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)[source]¶ Decodes back to coordinates of RoIs.
This method decodes locs and confs returned by an FPN network back to bboxes, labels and scores.
- Parameters
rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
scales (list of floats) – A list of floats returned by prepare().
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
nms_thresh (float) – The threshold value for non_maximum_suppression().
score_thresh (float) – The threshold value for confidence score.
- Returns
bboxes, labels and scores.
- Return type
tuple of three lists of arrays
bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
labels : A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
scores : A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.
-
distribute(rois, roi_indices)[source]¶ Assigns RoIs to feature maps according to their size.
- Parameters
rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).
- Returns
rois and roi_indices.
rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices: A list of arrays of shape \((R_l,)\).
- Return type
tuple of two lists
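One way to assign RoIs to feature maps by size is the level rule from the FPN paper, \(k = \lfloor k_0 + \log_2(\sqrt{wh} / 224) \rfloor\). The sketch below applies that rule with plain Python lists. Whether this method uses exactly this rule and these constants is an assumption, and `distribute_by_size` is a hypothetical helper name.

```python
import math


def distribute_by_size(rois, roi_indices, n_level=4, k0=4, k_min=2,
                       canonical_size=224):
    """Group RoIs into per-level lists according to their size."""
    out_rois = [[] for _ in range(n_level)]
    out_roi_indices = [[] for _ in range(n_level)]
    for roi, index in zip(rois, roi_indices):
        y_min, x_min, y_max, x_max = roi
        # Geometric mean of the side lengths, guarded against zero area.
        size = math.sqrt(max((y_max - y_min) * (x_max - x_min), 1e-12))
        k = math.floor(k0 + math.log2(size / canonical_size))
        # Map pyramid level k to a list index and clip to the valid range.
        level = min(max(k - k_min, 0), n_level - 1)
        out_rois[level].append(roi)
        out_roi_indices[level].append(index)
    return out_rois, out_roi_indices


rois = [(0, 0, 224, 224), (0, 0, 112, 112), (0, 0, 448, 448), (0, 0, 10, 10)]
roi_indices = [0, 0, 1, 1]
out_rois, out_roi_indices = distribute_by_size(rois, roi_indices)
```

Small RoIs land on the finest feature map and large RoIs on the coarsest one, so each list in the output has its own length \(R_l\).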
-
forward(hs, rois, roi_indices)[source]¶ Calculates locs and confs.
- Parameters
hs (iterable of array) – An iterable of feature maps.
rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).
- Returns
locs and confs.
locs: An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.
confs: An array whose shape is \((R, n\_class)\).
- Return type
tuple of two arrays
RPN¶
-
class
chainercv.links.model.fpn.RPN(scales)[source]¶ Region Proposal Network of Feature Pyramid Networks.
- Parameters
scales (tuple of floats) – The scales of feature maps.
-
anchors(sizes)[source]¶ Calculates anchor boxes.
- Parameters
sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map.
- Returns
The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios.
- Return type
list of arrays
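The shape \((H_l * W_l * A, 4)\) follows from placing \(A\) anchors at every cell of the feature map. The sketch below generates anchors for a single level with plain Python. The anchor size, the ratio values, the cell-centre placement and the helper name `make_anchors` are assumptions for illustration, not the library's settings.

```python
import math


def make_anchors(height, width, scale, anchor_size, ratios=(0.5, 1.0, 2.0)):
    """Return H * W * A anchors as (y_min, x_min, y_max, x_max) tuples."""
    stride = 1.0 / scale  # distance in input pixels between adjacent cells
    anchors = []
    for v in range(height):
        for u in range(width):
            # Centre of the cell in input image coordinates.
            cy = (v + 0.5) * stride
            cx = (u + 0.5) * stride
            for ratio in ratios:
                # Keep the area fixed while changing height / width.
                h = anchor_size * math.sqrt(ratio)
                w = anchor_size / math.sqrt(ratio)
                anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


anchors = make_anchors(height=2, width=3, scale=1 / 16, anchor_size=128)
```

With a 2 x 3 feature map and three ratios, the list holds 2 * 3 * 3 = 18 anchors.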
-
decode(locs, confs, anchors, in_shape)[source]¶ Decodes back to coordinates of RoIs.
This method decodes locs and confs returned by an FPN network back to rois and roi_indices.
- Parameters
locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs (list of arrays) – A list of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – Anchor boxes returned by anchors().
in_shape (tuple of ints) – The shape of the input array to the feature extractor.
- Returns
rois and roi_indices.
rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices: An array of shape \((R,)\).
- Return type
tuple of two arrays
-
forward(hs)[source]¶ Calculates locs and confs.
- Parameters
hs (iterable of array) – An iterable of feature maps.
- Returns
locs and confs.
locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs: A list of arrays whose shape is \((N, K_l)\).
- Return type
tuple of two lists
MaskHead¶
-
class
chainercv.links.model.fpn.MaskHead(n_class, scales)[source]¶ Mask Head network of Mask R-CNN.
- Parameters
n_class (int) – The number of classes including background.
scales (tuple of floats) – The scales of feature maps.
-
decode(segms, bboxes, labels, sizes)[source]¶ Decodes back to masks.
- Parameters
segms (iterable of arrays) – An iterable of arrays of shape \((R_n, n\_class, M, M)\).
bboxes (iterable of arrays) – An iterable of arrays of shape \((R_n, 4)\).
labels (iterable of arrays) – An iterable of arrays of shape \((R_n,)\).
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
- Returns
This list contains instance segmentation for each image in the batch. More precisely, this is a list of boolean arrays of shape \((R'_n, H_n, W_n)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image.
- Return type
list of arrays
-
distribute(rois, roi_indices)[source]¶ Assigns feature levels to RoIs based on their size.
- Parameters
rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
roi_indices (array) – An array of shape \((R,)\).
- Returns
out_rois, out_roi_indices and order.
out_rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
out_roi_indices: A list of arrays of shape \((R_l,)\).
order: A correspondence between the output and the input. The relationship below is satisfied.
xp.concatenate(out_rois, axis=0)[order[i]] == rois[i]
- Return type
two lists and one array
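The role of order can be checked with a small stand-alone sketch. It uses plain lists instead of arrays and takes the level of each RoI as an input, so it does not depend on how levels are chosen. `distribute_with_order` and its `levels` argument are hypothetical and introduced only for illustration.

```python
def distribute_with_order(rois, levels, n_level):
    """Group RoIs by level and record where each input RoI ends up."""
    out_rois = [[] for _ in range(n_level)]
    positions = []
    for roi, level in zip(rois, levels):
        # Remember the level and the position inside that level's list.
        positions.append((level, len(out_rois[level])))
        out_rois[level].append(roi)
    # Offset of each level inside the concatenated output.
    offsets = []
    total = 0
    for level_rois in out_rois:
        offsets.append(total)
        total += len(level_rois)
    order = [offsets[level] + pos for level, pos in positions]
    return out_rois, order


rois = ['a', 'b', 'c', 'd']  # stand-ins for four RoIs
out_rois, order = distribute_with_order(rois, levels=[1, 0, 1, 0], n_level=2)
# Plain-list equivalent of xp.concatenate(out_rois, axis=0).
flat = [roi for level_rois in out_rois for roi in level_rois]
```

Indexing the concatenated output with order restores the input ordering, which is what lets per-RoI results computed level by level be matched back to the original RoIs.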
segm_to_mask¶
-
chainercv.links.model.fpn.segm_to_mask(segm, bbox, size)[source]¶ Recover mask from cropped and resized mask.
This function requires cv2.
- Parameters
- Returns
See below.
- Return type
name
shape
dtype
format
segm\((R, S, S)\)
float32–
bbox\((R, 4)\)
float32\((y_{min}, x_{min}, y_{max}, x_{max})\)
mask(output)\((R, H, W)\)
–
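The idea of recovering a full-size mask from a cropped and resized one can be sketched for a single object as follows. This simplified version uses nearest-neighbour lookup on nested lists and a threshold of 0.5. Both choices, and the helper name `paste_segm`, are assumptions for illustration. The function documented above relies on cv2 and handles \(R\) objects at once.

```python
import math


def paste_segm(segm, bbox, size, threshold=0.5):
    """Paste one S x S soft mask into an H x W boolean mask."""
    height, width = size
    y_min, x_min, y_max, x_max = bbox
    s = len(segm)
    mask = [[False] * width for _ in range(height)]
    if y_max <= y_min or x_max <= x_min:
        return mask  # degenerate box: nothing to paste
    # Visit only the pixels covered by the box, clipped to the image.
    for y in range(max(math.floor(y_min), 0), min(math.ceil(y_max), height)):
        for x in range(max(math.floor(x_min), 0), min(math.ceil(x_max), width)):
            # Nearest cell of the S x S mask for this pixel centre.
            sy = min(max(int((y + 0.5 - y_min) / (y_max - y_min) * s), 0), s - 1)
            sx = min(max(int((x + 0.5 - x_min) / (x_max - x_min) * s), 0), s - 1)
            mask[y][x] = segm[sy][sx] >= threshold
    return mask


segm = [[0.9, 0.1], [0.2, 0.8]]
mask = paste_segm(segm, bbox=(1, 1, 3, 3), size=(4, 4))
```

Pixels outside the bounding box stay False, so the output has the size of the image rather than the size of the box.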
Train-only Utility¶
bbox_head_loss_pre¶
-
chainercv.links.model.fpn.bbox_head_loss_pre(rois, roi_indices, std, bboxes, labels)[source]¶ Loss function for Head (pre).
This function processes RoIs for bbox_head_loss_post().
- Parameters
rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
std (tuple of floats) – Two coefficients used for encoding bounding boxes.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
labels (list of arrays) – A list of arrays whose shape is \((R_n,)\).
bbox_head_loss_post¶
-
chainercv.links.model.fpn.bbox_head_loss_post(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)[source]¶ Loss function for Head (post).
- Parameters
locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
confs (array) – An array whose shape is \((R, n\_class)\).
roi_indices (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
gt_locs (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
gt_labels (list of arrays) – A list of arrays returned by bbox_head_loss_pre().
batchsize (int) – The size of batch.
- Returns
loc_loss and conf_loss.
- Return type
tuple of two variables
rpn_loss¶
-
chainercv.links.model.fpn.rpn_loss(locs, confs, anchors, sizes, bboxes)[source]¶ Loss function for RPN.
- Parameters
locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
anchors (list of arrays) – A list of arrays returned by anchors().
sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
- Returns
loc_loss and conf_loss.
- Return type
tuple of two variables
mask_head_loss_pre¶
-
chainercv.links.model.fpn.mask_head_loss_pre(rois, roi_indices, gt_masks, gt_bboxes, gt_head_labels, segm_size)[source]¶ Loss function for Mask Head (pre).
This function processes RoIs for mask_head_loss_post() by selecting RoIs for mask loss calculation and preparing ground truth network output.
- Parameters
rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
gt_masks (iterable of arrays) – An iterable of arrays whose shape is \((R_n, H, W)\), where \(R_n\) is the number of ground truth objects.
gt_head_labels (iterable of arrays) – An iterable of arrays of shape \((R_l,)\). This is a collection of ground-truth labels assigned to rois during the bounding box localization stage. The range of values is \((0, n\_class - 1)\).
segm_size (int) – Size of the ground truth network output.
- Returns
mask_rois, mask_roi_indices, gt_segms, and gt_mask_labels.
mask_rois: A list of arrays of shape \((R'_l, 4)\), where \(R'_l\) is the number of RoIs in the \(l\)-th feature map.
mask_roi_indices: A list of arrays of shape \((R'_l,)\).
gt_segms: A list of arrays of shape \((R'_l, M, M)\), where \(M\) is the argument segm_size.
gt_mask_labels: A list of arrays of shape \((R'_l,)\) indicating the classes of ground truth.
- Return type
tuple of four lists
mask_head_loss_post¶
-
chainercv.links.model.fpn.mask_head_loss_post(segms, mask_roi_indices, gt_segms, gt_mask_labels, batchsize)[source]¶ Loss function for Mask Head (post).
- Parameters
segms (array) – An array whose shape is \((R, n\_class, M, M)\), where \(R\) is the total number of RoIs in the given batch.
mask_roi_indices (list of arrays) – A list of arrays returned by mask_head_loss_pre().
gt_segms (list of arrays) – A list of arrays returned by mask_head_loss_pre().
gt_mask_labels (list of arrays) – A list of arrays returned by mask_head_loss_pre().
batchsize (int) – The size of batch.
- Returns
Mask loss.
- Return type
chainer.Variable
mask_to_segm¶
-
chainercv.links.model.fpn.mask_to_segm(mask, bbox, segm_size, index=None)[source]¶ Crop and resize mask.
This function requires cv2.
- Parameters
- Returns
See below.
- Return type
| name | shape | dtype | format |
| --- | --- | --- | --- |
| mask | \((N, H, W)\) | bool | – |
| bbox | \((R, 4)\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\) |
| index (optional) | \((R,)\) | int32 | – |
| segms (output) | \((R, S, S)\) | float32 | \([0, 1]\) |