FCIS

Utility

FCIS

class chainercv.experimental.links.model.fcis.FCIS(extractor, rpn, head, mean, min_size, max_size, loc_normalize_mean, loc_normalize_std)

Base class for FCIS.

This is a base class for FCIS links supporting the instance segmentation API [1]. The following three stages constitute FCIS.

  1. Feature extraction: Images are taken and their feature maps are calculated.
  2. Region Proposal Networks: Given the feature maps calculated in the previous stage, produce a set of RoIs around objects.
  3. Localization, Segmentation and Classification Heads: Using feature maps that belong to the proposed RoIs, segment the regions of the objects, classify their categories and refine the localizations.

Each stage is carried out by one of the callable chainer.Chain objects extractor, rpn and head. There are two functions predict() and __call__() to conduct instance segmentation. predict() takes images and returns masks, object labels and their scores. __call__() is provided for a scenario in which intermediate outputs are needed, for instance, for training and debugging.

Links that support the instance segmentation API have a method predict() with the same interface. Please refer to predict() for further details.

[1]Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017.
Parameters:
  • extractor (callable Chain) – A callable that takes a BCHW image array and returns feature maps.
  • rpn (callable Chain) – A callable that has the same interface as RegionProposalNetwork. Please refer to the documentation found there.
  • head (callable Chain) – A callable that takes a BCHW array, RoIs and batch indices for RoIs. This returns class-agnostic segmentation scores, class-agnostic localization parameters, class scores, improved RoIs and batch indices for RoIs.
  • mean (numpy.ndarray) – A value to be subtracted from an image in prepare().
  • min_size (int) – A preprocessing parameter for prepare(). Please refer to a docstring found for prepare().
  • max_size (int) – A preprocessing parameter for prepare().
  • loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
  • loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
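
FCIS itself is rarely instantiated directly; a concrete link such as FCISResNet101 assembles an extractor, an RPN and a head internally and passes them to this constructor. Below is a minimal construction sketch; the availability of the 'sbd' pretrained weights is an assumption for illustration.

    from chainercv.experimental.links import FCISResNet101

    # A sketch: FCISResNet101 wires ResNet101Extractor, a RegionProposalNetwork
    # and FCISResNet101Head together and hands them to the FCIS base class.
    # The 'sbd' pretrained weights are assumed to be available for download.
    model = FCISResNet101(pretrained_model='sbd')
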
__call__(x, scale=1.0)

Forward FCIS.

The scaling parameter scale is used by the RPN to determine the threshold for selecting small objects, which are rejected irrespective of their confidence scores.

Here are notations used.

  • \(N\) is the batch size.
  • \(R'\) is the total number of RoIs produced across batches. Given \(R_i\) proposed RoIs from the \(i\) th image, \(R' = \sum _{i=1} ^ N R_i\).
  • \(L\) is the number of classes excluding the background.
  • \(RH\) is the height of the pooled image produced by Position Sensitive ROI pooling.
  • \(RW\) is the width of the pooled image produced by Position Sensitive ROI pooling.

Classes are ordered by the background, the first class, …, and the \(L\) th class.

Parameters:
  • x (Variable) – 4D image variable.
  • scale (float) – Amount of scaling applied to the raw image during preprocessing.
Returns:

Returns tuple of five values listed below.

  • roi_ag_seg_scores: Class-agnostic clipped mask scores for the proposed RoIs. Its shape is \((R', 2, RH, RW)\).
  • ag_locs: Class-agnostic offsets and scalings for the proposed RoIs. Its shape is \((R', 2, 4)\).
  • roi_cls_scores: Class predictions for the proposed RoIs. Its shape is \((R', L + 1)\).
  • rois: RoIs proposed by RPN. Its shape is \((R', 4)\).
  • roi_indices: Batch indices of RoIs. Its shape is \((R',)\).

Return type:

Variable, Variable, Variable, array, array
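
A hedged sketch of driving __call__() directly, assuming the concrete FCISResNet101 link with randomly initialized weights and a dummy input; n_fg_class=20 and the input size are assumptions, and the comments restate the shapes listed above.

    import numpy as np
    import chainer
    from chainercv.experimental.links import FCISResNet101

    # Assumptions: 20 foreground classes and randomly initialized weights,
    # purely to illustrate the output shapes.
    model = FCISResNet101(n_fg_class=20)
    x = chainer.Variable(
        np.random.uniform(0, 255, (1, 3, 600, 800)).astype(np.float32))
    with chainer.using_config('train', False), chainer.no_backprop_mode():
        roi_ag_seg_scores, ag_locs, roi_cls_scores, rois, roi_indices = \
            model(x, scale=1.0)
    # roi_ag_seg_scores: (R', 2, RH, RW), ag_locs: (R', 2, 4),
    # roi_cls_scores: (R', L + 1), rois: (R', 4), roi_indices: (R',)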

predict(imgs)

Segment object instances from images.

This method predicts instance-aware object regions for each image.

Parameters:imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their values is \([0, 255]\).
Returns:This method returns a tuple of three lists, (masks, labels, scores).
  • masks: A list of boolean arrays of shape \((R, H, W)\), where \(R\) is the number of masks in an image. Each value indicates whether the pixel belongs to the object or not.
  • labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the mask. Values are in range \([0, L - 1]\), where \(L\) is the number of foreground classes.
  • scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type:tuple of lists
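
A usage sketch of predict(); the image path and the 'sbd' pretrained weights are assumptions for illustration.

    from chainercv.experimental.links import FCISResNet101
    from chainercv.utils import read_image

    model = FCISResNet101(pretrained_model='sbd')
    img = read_image('sample.jpg')                # CHW, RGB, values in [0, 255]
    masks, labels, scores = model.predict([img])
    mask, label, score = masks[0], labels[0], scores[0]
    # mask: (R, H, W) bool, label: (R,) int in [0, L - 1], score: (R,) float
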
prepare(img)

Preprocess an image for feature extraction.

The length of the shorter edge is scaled to self.min_size. After the scaling, if the length of the longer edge is longer than self.max_size, the image is scaled to fit the longer edge to self.max_size.

After resizing, the mean value self.mean is subtracted from the image.

Parameters:img (ndarray) – An image. This is in CHW and RGB format. The range of its value is \([0, 255]\).
Returns:A preprocessed image.
Return type:ndarray
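
A sketch showing how prepare() could be combined with __call__(); computing the scale from the resized width is an assumption about how a caller recovers the preprocessing scale, and the image path is illustrative.

    import numpy as np
    from chainercv.experimental.links import FCISResNet101
    from chainercv.utils import read_image

    model = FCISResNet101(pretrained_model='sbd')
    img = read_image('sample.jpg')                # CHW, RGB, [0, 255]
    prepared = model.prepare(img)                 # resized and mean-subtracted
    scale = prepared.shape[2] / img.shape[2]      # scaling applied during preprocessing
    x = prepared[np.newaxis]                      # add batch axis -> BCHW
    outputs = model(x, scale=scale)
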
use_preset(preset)

Use the given preset during prediction.

This method changes the values of self.nms_thresh, self.score_thresh, self.mask_merge_thresh, self.binary_thresh, self.limit and self.min_drop_size. These values are, respectively, a threshold used for non maximum suppression, a threshold to discard low confidence proposals in predict(), a threshold to merge masks in predict(), a threshold to binarize segmentation scores in predict(), a limit on the number of predicted masks in one image and a threshold to discard small bounding boxes.

If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.

Parameters:preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
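
A short sketch of switching presets and then overriding a single attribute, as suggested above; the 'sbd' pretrained weights are an assumption.

    from chainercv.experimental.links import FCISResNet101

    model = FCISResNet101(pretrained_model='sbd')
    model.use_preset('visualize')   # thresholds tuned for visualization
    model.score_thresh = 0.6        # directly override one public attribute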

FCISResNet101Head

class chainercv.experimental.links.model.fcis.FCISResNet101Head(n_class, roi_size, group_size, spatial_scale, loc_normalize_mean, loc_normalize_std, iter2, initialW=None)

FCIS Head for ResNet101 based implementation.

This class is used as a head for FCIS. This outputs class-agnostic segmentation scores, class-agnostic localizations and classification scores based on feature maps in the given RoIs.

Parameters:
  • n_class (int) – The number of classes possibly including the background.
  • roi_size (int) – Height and width of the feature maps after Position Sensitive RoI pooling.
  • group_size (int) – Group height and width for Position Sensitive ROI pooling.
  • spatial_scale (float) – Scale by which RoI coordinates are resized to match the resolution of the feature maps.
  • loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
  • loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
  • iter2 (bool) – If True, Position Sensitive ROI pooling is executed twice. In the second pass, the pooling uses RoIs refined by the localization parameters calculated in the first pass.
  • initialW (callable) – Initializer for the layers.
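
A hedged construction sketch; the roi_size, group_size and spatial_scale values below follow typical FCIS settings (21, 7 and 1/16) and are illustrative assumptions, not requirements of this class.

    from chainercv.experimental.links.model.fcis import FCISResNet101Head

    # Illustrative values: 20 foreground classes plus background,
    # 21x21 position-sensitive pooling with 7x7 groups, 1/16 feature stride.
    head = FCISResNet101Head(
        n_class=21, roi_size=21, group_size=7, spatial_scale=1. / 16,
        loc_normalize_mean=(0., 0., 0., 0.),
        loc_normalize_std=(0.1, 0.1, 0.2, 0.2),
        iter2=True)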

mask_voting

chainercv.experimental.links.model.fcis.mask_voting(seg_prob, bbox, cls_prob, size, score_thresh, nms_thresh, mask_merge_thresh, binary_thresh, limit=100, bg_label=0)

Refine mask probabilities by merging multiple masks.

First, this function discards invalid masks with non maximum suppression. Then, it merges masks with weights calculated from the class probabilities and IoU. This function improves the mask quality by merging overlapping masks predicted as the same object class.

Here are notations used.

  • \(R\) is the total number of RoIs produced in one image.
  • \(L\) is the number of classes excluding the background.
  • \(RH\) is the height of the pooled image.
  • \(RW\) is the width of the pooled image.

Parameters:
  • seg_prob (array) – A mask probability array whose shape is \((R, RH, RW)\).
  • bbox (array) – A bounding box array whose shape is \((R, 4)\).
  • cls_prob (array) – A class probability array whose shape is \((R, L + 1)\).
  • size (tuple of int) – Original image size.
  • score_thresh (float) – A threshold value of the class score.
  • nms_thresh (float) – A threshold value of non maximum suppression.
  • mask_merge_thresh (float) – A threshold value of the bounding box IoU for mask merging.
  • binary_thresh (float) – A threshold value of mask score for mask merging.
  • limit (int) – The maximum number of outputs.
  • bg_label (int) – The id of the background label.
Returns:

  • v_seg_prob: Merged mask probability. Its shape is \((N, RH, RW)\), where \(N\) is the number of masks after merging.
  • v_bbox: Bounding boxes for the merged masks. Its shape is \((N, 4)\).
  • v_label: Class labels for the merged masks. Its shape is \((N, )\).
  • v_score: Class probabilities for the merged masks. Its shape is \((N, )\).

Return type:

array, array, array, array
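
A sketch calling mask_voting() with random inputs, just to illustrate the expected shapes; all numbers and thresholds below are made up for illustration.

    import numpy as np
    from chainercv.experimental.links.model.fcis import mask_voting

    # R = 5 RoIs, L = 20 foreground classes, 21x21 mask probabilities,
    # and a 300x400 image; thresholds are illustrative.
    R, L, RH, RW = 5, 20, 21, 21
    seg_prob = np.random.uniform(size=(R, RH, RW)).astype(np.float32)
    bbox = np.array([[10, 10, 110, 110],
                     [20, 30, 120, 130],
                     [0, 0, 50, 60],
                     [40, 40, 140, 160],
                     [15, 25, 90, 100]], dtype=np.float32)
    cls_prob = np.random.uniform(size=(R, L + 1)).astype(np.float32)

    v_seg_prob, v_bbox, v_label, v_score = mask_voting(
        seg_prob, bbox, cls_prob, size=(300, 400),
        score_thresh=0.7, nms_thresh=0.3,
        mask_merge_thresh=0.5, binary_thresh=0.4)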

ResNet101Extractor

class chainercv.experimental.links.model.fcis.ResNet101Extractor(initialW=None)

ResNet101 Extractor for FCIS ResNet101 implementation.

This class is used as an extractor for FCISResNet101. This outputs feature maps. Dilated convolution is used in the C5 stage.

Parameters:initialW – Initializer for ResNet101 extractor.