FCIS¶
Instance Segmentation Link¶
FCISResNet101¶
class chainercv.experimental.links.model.fcis.FCISResNet101(n_fg_class=None, pretrained_model=None, min_size=600, max_size=1000, roi_size=21, group_size=7, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32], loc_normalize_mean=(0.0, 0.0, 0.0, 0.0), loc_normalize_std=(0.2, 0.2, 0.5, 0.5), iter2=True, resnet_initialW=None, rpn_initialW=None, head_initialW=None, proposal_creator_params=None)
FCIS based on ResNet101.
When you specify the path of a pre-trained chainer model serialized as a npz file in the constructor, this chain model automatically initializes all the parameters with it. When a string from the predefined set below is provided, a pretrained model is loaded from weights distributed on the Internet. The supported pretrained models are as follows:
sbd: Loads weights trained with the trainval split of Semantic Boundaries Dataset.
For descriptions of the interface of this model, please refer to FCIS.
FCISResNet101 supports finer control over the random initialization of weights through the arguments resnet_initialW, rpn_initialW and head_initialW. Each accepts a callable that takes an array and edits its values. If None is passed as an initializer, the default initializer is used.
- Parameters
n_fg_class (int) – The number of classes excluding the background.
pretrained_model (str) – The path to the pre-trained chainer model serialized as a npz file. If this is one of the strings described above, it automatically loads weights stored under the directory $CHAINER_DATASET_ROOT/pfnet/chainercv/models/, where $CHAINER_DATASET_ROOT is set to $HOME/.chainer/dataset unless you specify another value by modifying the environment variable.
min_size (int) – A preprocessing parameter for prepare().
max_size (int) – A preprocessing parameter for prepare().
roi_size (int) – Height and width of the feature maps after Position Sensitive RoI pooling.
group_size (int) – Group height and width for Position Sensitive RoI pooling.
ratios (list of floats) – The ratios of width to height of the anchors.
anchor_scales (list of numbers) – Determines the areas of the anchors. Each area is the product of the square of an element in anchor_scales and the original area of the reference window.
loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
iter2 (bool) – If True, Position Sensitive RoI pooling is executed twice. The second pass uses RoIs refined by the localization parameters calculated in the first pass.
resnet_initialW (callable) – Initializer for the layers corresponding to the ResNet101 layers.
rpn_initialW (callable) – Initializer for Region Proposal Network layers.
head_initialW (callable) – Initializer for the head layers.
proposal_creator_params (dict) – Key-value parameters for ProposalCreator.
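To make the relationship between ratios, anchor_scales and the anchor area concrete, here is a minimal pure-Python sketch. It is not ChainerCV's implementation: the base_size of 16 pixels for the reference window and the choice of which side a ratio stretches are assumptions made for illustration.

```python
import math


def anchor_shapes(base_size=16, ratios=(0.5, 1, 2), anchor_scales=(8, 16, 32)):
    """Return (height, width) pairs, one per (ratio, scale) combination.

    base_size and the orientation of the ratio are illustrative assumptions.
    """
    shapes = []
    for ratio in ratios:
        for scale in anchor_scales:
            side = base_size * scale  # side of a square anchor of this scale
            # Stretch one side and shrink the other so the area is preserved.
            height = side * math.sqrt(ratio)
            width = side / math.sqrt(ratio)
            shapes.append((height, width))
    return shapes


shapes = anchor_shapes()
```

With the default arguments this yields nine base shapes, and every shape produced from scale s has area (16 * s) ** 2 regardless of its ratio.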
Utility¶
FCIS¶
class chainercv.experimental.links.model.fcis.FCIS(extractor, rpn, head, mean, min_size, max_size, loc_normalize_mean, loc_normalize_std)
Base class for FCIS.
This is a base class for FCIS links supporting the instance segmentation API [1]. The following three stages constitute FCIS.
Feature extraction: Images are taken and their feature maps are calculated.
Region Proposal Networks: Given the feature maps calculated in the previous stage, produce a set of RoIs around objects.
Localization, Segmentation and Classification Heads: Using feature maps that belong to the proposed RoIs, segment regions of the objects, classify the categories of the objects in the RoIs and improve localizations.
Each stage is carried out by one of the callable chainer.Chain objects extractor, rpn and head.
There are two functions, predict() and forward(), to conduct instance segmentation. predict() takes images and returns masks, object labels and their scores. forward() is provided for scenarios in which intermediate outputs are needed, for instance, for training and debugging.
Links that support the instance segmentation API have a predict() method with the same interface. Please refer to predict() for further details.
[1] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017.
- Parameters
extractor (callable Chain) – A callable that takes a BCHW image array and returns feature maps.
rpn (callable Chain) – A callable that has the same interface as RegionProposalNetwork. Please refer to the documentation found there.
head (callable Chain) – A callable that takes a BCHW array, RoIs and batch indices for RoIs. It returns class-agnostic segmentation scores, class-agnostic localization parameters, class scores, improved RoIs and batch indices for RoIs.
mean (numpy.ndarray) – A value to be subtracted from an image in prepare().
min_size (int) – A preprocessing parameter for prepare(). Please refer to the docstring of prepare().
max_size (int) – A preprocessing parameter for prepare().
loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
forward(x, scales=None)
Forward FCIS.
The scaling parameter scales is used by RPN to determine the threshold to select small objects, which are going to be rejected irrespective of their confidence scores.
Here are notations used.
\(N\) is the batch size.
\(R'\) is the total number of RoIs produced across batches. Given \(R_i\) proposed RoIs from the \(i\) th image, \(R' = \sum _{i=1} ^ N R_i\).
\(L\) is the number of classes excluding the background.
\(RH\) is the height of the image pooled by Position Sensitive RoI pooling.
\(RW\) is the width of the image pooled by Position Sensitive RoI pooling.
Classes are ordered by the background, the first class, …, and the \(L\) th class.
- Parameters
x (Variable) – 4D image variable.
scales (tuple of floats) – Amount of scaling applied to each input image during preprocessing.
- Returns
Returns tuple of five values listed below.
roi_ag_seg_scores: Class-agnostic clipped mask scores for the proposed RoIs. Its shape is \((R', 2, RH, RW)\).
ag_locs: Class-agnostic offsets and scalings for the proposed RoIs. Its shape is \((R', 2, 4)\).
roi_cls_scores: Class predictions for the proposed RoIs. Its shape is \((R', L + 1)\).
rois: RoIs proposed by RPN. Its shape is \((R', 4)\).
roi_indices: Batch indices of RoIs. Its shape is \((R',)\).
- Return type
Variable, Variable, Variable, array, array
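The relationship between the per-image RoI counts \(R_i\), the total \(R'\) and the batch indices of RoIs can be illustrated with a small sketch. The counts below are made-up values and the helper name build_roi_indices is hypothetical; it only mirrors the layout described above.

```python
def build_roi_indices(rois_per_image):
    """Flatten per-image RoI counts into one batch index per RoI."""
    indices = []
    for batch_index, count in enumerate(rois_per_image):
        indices.extend([batch_index] * count)
    return indices


rois_per_image = [3, 2]          # R_1 = 3, R_2 = 2 (made-up counts)
roi_indices = build_roi_indices(rois_per_image)
total_rois = len(roi_indices)    # R' = R_1 + R_2
```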
predict(imgs)
Segment object instances from images.
This method predicts instance-aware object regions for each image.
- Parameters
imgs (iterable of numpy.ndarray) – Arrays holding images of shape \((B, C, H, W)\). All images are in CHW and RGB format and the range of their value is \([0, 255]\).
- Returns
This method returns a tuple of three lists, (masks, labels, scores).
masks: A list of boolean arrays of shape \((R, H, W)\), where \(R\) is the number of masks in an image. Each pixel indicates whether it is inside the object or not.
labels: A list of integer arrays of shape \((R,)\). Each value indicates the class of the masks. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
scores: A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
- Return type
tuple of lists
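Because labels are in \([0, L - 1]\) and exclude the background, they can index a list of foreground class names directly. The sketch below uses plain lists as stand-ins for the arrays returned for one image; the class names, the values and the helper name are illustrative only.

```python
def describe_instances(labels, scores, class_names, min_score=0.5):
    """Pair each sufficiently confident instance with its class name."""
    described = []
    for label, score in zip(labels, scores):
        if score >= min_score:
            # No offset is needed: label 0 is the first foreground class.
            described.append((class_names[label], score))
    return described


class_names = ['aeroplane', 'bicycle', 'bird']   # hypothetical foreground classes
labels = [2, 0, 1]                               # stand-in for one entry of labels
scores = [0.9, 0.3, 0.75]                        # stand-in for one entry of scores
instances = describe_instances(labels, scores, class_names)
```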
prepare(img)
Preprocess an image for feature extraction.
The length of the shorter edge is scaled to self.min_size. After the scaling, if the length of the longer edge is longer than self.max_size, the image is scaled to fit the longer edge to self.max_size.
After resizing the image, the mean value self.mean is subtracted from it.
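The resizing rule can be written down as a short sketch. This is an illustration of the rule as described above, not the library's code; the function name and the rounding of the output size are assumptions.

```python
def resized_shape(height, width, min_size=600, max_size=1000):
    """Return (scale, new_height, new_width) under the rule described above."""
    # First scale so that the shorter edge becomes min_size.
    scale = min_size / min(height, width)
    # If the longer edge would exceed max_size, fit it to max_size instead.
    if scale * max(height, width) > max_size:
        scale = max_size / max(height, width)
    return scale, round(height * scale), round(width * scale)


print(resized_shape(375, 500))    # shorter edge is scaled to min_size
print(resized_shape(300, 1000))   # longer edge is capped at max_size
```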
use_preset(preset)
Use the given preset during prediction.
This method changes the values of self.nms_thresh, self.score_thresh, self.mask_merge_thresh, self.binary_thresh, self.limit and self.min_drop_size. These values are, respectively, a threshold value used for non maximum suppression, a threshold value to discard low confidence proposals in predict(), a threshold value to merge masks in predict(), a threshold value to binarize segmentation scores in predict(), a limit on the number of predicted masks in one image and a threshold value to discard small bounding boxes.
If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
- Parameters
preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
FCISResNet101Head¶
class chainercv.experimental.links.model.fcis.FCISResNet101Head(n_class, roi_size, group_size, spatial_scale, loc_normalize_mean, loc_normalize_std, iter2, initialW=None)
FCIS Head for ResNet101 based implementation.
This class is used as a head for FCIS. It outputs class-agnostic segmentation scores, class-agnostic localizations and classification based on feature maps in the given RoIs.
- Parameters
n_class (int) – The number of classes possibly including the background.
roi_size (int) – Height and width of the feature maps after Position Sensitive RoI pooling.
group_size (int) – Group height and width for Position Sensitive ROI pooling.
spatial_scale (float) – The scale by which the RoIs are resized.
loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
iter2 (bool) – If True, Position Sensitive RoI pooling is executed twice. The second pass uses RoIs refined by the localization parameters calculated in the first pass.
initialW (callable) – Initializer for the layers.
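The role of loc_normalize_mean and loc_normalize_std can be sketched as a per-component standardization of the four localization values. The helper names below are hypothetical, and where exactly the library applies these operations is not shown here; the sketch only illustrates the arithmetic.

```python
def normalize_loc(loc, mean=(0.0, 0.0, 0.0, 0.0), std=(0.2, 0.2, 0.5, 0.5)):
    """Standardize raw localization values component by component."""
    return [(v - m) / s for v, m, s in zip(loc, mean, std)]


def denormalize_loc(loc, mean=(0.0, 0.0, 0.0, 0.0), std=(0.2, 0.2, 0.5, 0.5)):
    """Map standardized values back to raw offsets and scales."""
    return [v * s + m for v, m, s in zip(loc, mean, std)]


raw = [0.1, -0.2, 0.25, 0.5]            # made-up offsets and scales
normalized = normalize_loc(raw)
restored = denormalize_loc(normalized)  # round trip back to the raw values
```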
mask_voting¶
chainercv.experimental.links.model.fcis.mask_voting(seg_prob, bbox, cls_prob, size, score_thresh, nms_thresh, mask_merge_thresh, binary_thresh, limit=100, bg_label=0)
Refine mask probabilities by merging multiple masks.
First, this function discards invalid masks with non maximum suppression. Then, it merges masks with weights calculated from class probabilities and IoU. This function improves the mask quality by merging overlapping masks predicted as the same object class.
Here are notations used.
\(R\) is the total number of RoIs produced in one image.
\(L\) is the number of classes excluding the background.
\(RH\) is the height of the pooled image.
\(RW\) is the width of the pooled image.
- Parameters
seg_prob (array) – A mask probability array whose shape is \((R, RH, RW)\).
bbox (array) – A bounding box array whose shape is \((R, 4)\).
cls_prob (array) – A class probability array whose shape is \((R, L + 1)\).
size (tuple of int) – Original image size.
score_thresh (float) – A threshold value of the class score.
nms_thresh (float) – A threshold value of non maximum suppression.
mask_merge_thresh (float) – A threshold value of the bounding box iou for mask merging.
binary_thresh (float) – A threshold value of mask score for mask merging.
limit (int) – The maximum number of outputs.
bg_label (int) – The id of the background label.
- Returns
v_seg_prob: Merged mask probability. Its shape is \((N, RH, RW)\), where \(N\) is the number of merged masks.
v_bbox: Bounding boxes for the merged masks. Its shape is \((N, 4)\).
v_label: Class labels for the merged masks. Its shape is \((N, )\).
v_score: Class probabilities for the merged masks. Its shape is \((N, )\).
- Return type
array, array, array, array
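The merging step can be illustrated with a simplified pure-Python sketch. It assumes boxes in (y_min, x_min, y_max, x_max) order and masks given as equally sized nested lists, and it omits non maximum suppression, per-class handling and resizing, so it should be read as an outline of the idea rather than the actual mask_voting implementation.

```python
def bbox_iou(a, b):
    """IoU of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_masks(keep_box, boxes, masks, probs, mask_merge_thresh=0.5):
    """Average masks whose boxes overlap keep_box, weighted by class probability."""
    selected = [i for i, box in enumerate(boxes)
                if bbox_iou(keep_box, box) >= mask_merge_thresh]
    total = sum(probs[i] for i in selected)
    height, width = len(masks[0]), len(masks[0][0])
    merged = [[0.0] * width for _ in range(height)]
    for i in selected:
        weight = probs[i] / total
        for y in range(height):
            for x in range(width):
                merged[y][x] += weight * masks[i][y][x]
    return merged


# Made-up inputs: the first two boxes overlap strongly, the third is far away.
boxes = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30)]
masks = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
probs = [0.75, 0.25, 0.9]
merged = merge_masks(boxes[0], boxes, masks, probs)
```

Only the first two masks contribute to the result here, because the third box does not overlap the kept box.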
ResNet101Extractor¶
class chainercv.experimental.links.model.fcis.ResNet101Extractor(initialW=None)
ResNet101 Extractor for FCIS ResNet101 implementation.
This class is used as an extractor for FCISResNet101. This outputs feature maps. Dilated convolution is used in the C5 stage.
- Parameters
initialW – Initializer for ResNet101 extractor.
Train-only Utility¶
FCISTrainChain¶
class chainercv.experimental.links.model.fcis.FCISTrainChain(fcis, rpn_sigma=3.0, roi_sigma=1.0, anchor_target_creator=<chainercv.links.model.faster_rcnn.utils.anchor_target_creator.AnchorTargetCreator object>, proposal_target_creator=<chainercv.experimental.links.model.fcis.utils.proposal_target_creator.ProposalTargetCreator object>)
Calculate losses for FCIS and report them.
This is used to train FCIS in the joint training scheme [2].
The losses include:
rpn_loc_loss: The localization loss for Region Proposal Network (RPN).
rpn_cls_loss: The classification loss for RPN.
roi_loc_loss: The localization loss for the head module.
roi_cls_loss: The classification loss for the head module.
roi_mask_loss: The mask loss for the head module.
[2] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017.
- Parameters
fcis (FCIS) – A FCIS model for training.
rpn_sigma (float) – Sigma parameter for the localization loss of Region Proposal Network (RPN). The default value is 3, which is the value used in [2].
roi_sigma (float) – Sigma parameter for the localization loss of the head. The default value is 1, which is the value used in [2].
anchor_target_creator – An instantiation of AnchorTargetCreator.
proposal_target_creator – An instantiation of ProposalTargetCreator.
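The effect of rpn_sigma and roi_sigma can be seen in the sigma-parameterized smooth L1 loss commonly used in Faster R-CNN style training. The sketch below assumes that formulation; it is written from the general definition rather than copied from this class, so treat the exact form as an assumption.

```python
def smooth_l1(diff, sigma):
    """Sigma-parameterized smooth L1 loss for a single residual."""
    sigma2 = sigma ** 2
    abs_diff = abs(diff)
    if abs_diff < 1.0 / sigma2:
        # Quadratic region near zero, steeper for larger sigma.
        return 0.5 * sigma2 * diff ** 2
    # Linear region for large residuals.
    return abs_diff - 0.5 / sigma2


for sigma in (1.0, 3.0):
    print(sigma, [smooth_l1(d, sigma) for d in (0.0, 0.05, 1.0)])
```

A larger sigma narrows the quadratic region to residuals smaller than 1 / sigma ** 2, so the loss behaves like plain L1 for all but very small residuals.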
forward(imgs, masks, labels, bboxes, scale)
Forward FCIS and calculate losses.
Here are notations used.
\(N\) is the batch size.
\(R\) is the number of bounding boxes per image.
\(H\) is the image height.
\(W\) is the image width.
Currently, only \(N=1\) is supported.
- Parameters
imgs (Variable) – A variable with a batch of images.
masks (Variable) – A batch of masks. Its shape is \((N, R, H, W)\).
labels (Variable) – A batch of labels. Its shape is \((N, R)\). The background is excluded from the definition, which means that the range of the value is \([0, L - 1]\). \(L\) is the number of foreground classes.
bboxes (Variable) – A batch of bounding boxes. Its shape is \((N, R, 4)\).
scale (float or Variable) – Amount of scaling applied to the raw image during preprocessing.
- Returns
Scalar loss variable. This is the sum of losses for Region Proposal Network and the head module.
- Return type
chainer.Variable
ProposalTargetCreator¶
class chainercv.experimental.links.model.fcis.ProposalTargetCreator(n_sample=128, pos_ratio=0.25, pos_iou_thresh=0.5, neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.1, binary_thresh=0.4)
Assign ground truth classes, bounding boxes and masks to given RoIs.
The __call__() of this class generates training targets for each object proposal. This is used to train FCIS [3].
[3] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017.
- Parameters
n_sample (int) – The number of sampled regions.
pos_ratio (float) – Fraction of regions that is labeled as a foreground.
pos_iou_thresh (float) – IoU threshold for a RoI to be considered as a foreground.
neg_iou_thresh_hi (float) – RoI is considered to be the background if IoU is in [neg_iou_thresh_lo, neg_iou_thresh_hi).
neg_iou_thresh_lo (float) – See above.
binary_thresh (float) – Threshold for resized mask.
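The thresholding rule described by these parameters can be sketched as follows. The function name and the 'ignored' outcome for RoIs that fall in neither range are illustrative assumptions; the actual class additionally samples RoIs and builds mask and localization targets.

```python
def classify_roi(max_iou, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.1):
    """Classify a RoI by its highest IoU with any ground truth box."""
    if max_iou >= pos_iou_thresh:
        return 'foreground'
    if neg_iou_thresh_lo <= max_iou < neg_iou_thresh_hi:
        return 'background'
    return 'ignored'


n_sample, pos_ratio = 128, 0.25
max_foreground = int(round(n_sample * pos_ratio))  # upper bound on foreground RoIs
kinds = [classify_roi(iou) for iou in (0.7, 0.3, 0.05)]
```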
__call__(roi, mask, label, bbox, loc_normalize_mean=(0.0, 0.0, 0.0, 0.0), loc_normalize_std=(0.2, 0.2, 0.5, 0.5), mask_size=(21, 21))
Assigns ground truth to sampled proposals.
This function samples a total of self.n_sample RoIs from the combination of roi, mask, label and bbox. The RoIs are assigned the ground truth class labels as well as bounding box offsets and scales to match the ground truth bounding boxes. As many as pos_ratio * self.n_sample RoIs are sampled as foregrounds.
Offsets and scales of bounding boxes are calculated using chainercv.links.model.faster_rcnn.bbox2loc(). Also, the types of the input arrays and the output arrays are the same.
Here are notations.
\(S\) is the total number of sampled RoIs, which equals self.n_sample.
\(L\) is the number of object classes possibly including the background.
\(H\) is the image height.
\(W\) is the image width.
\(RH\) is the mask height.
\(RW\) is the mask width.
- Parameters
roi (array) – Regions of Interest (RoIs) from which we sample. Its shape is \((R, 4)\).
mask (array) – The coordinates of ground truth masks. Its shape is \((R', H, W)\).
label (array) – Ground truth bounding box labels. Its shape is \((R',)\). Its range is \([0, L - 1]\), where \(L\) is the number of foreground classes.
bbox (array) – The coordinates of ground truth bounding boxes. Its shape is \((R', 4)\).
loc_normalize_mean (tuple of four floats) – Mean values to normalize coordinates of bounding boxes.
loc_normalize_std (tuple of four floats) – Standard deviation of the coordinates of bounding boxes.
mask_size (tuple of int or int) – Generated mask size, which is equal to \((RH, RW)\).
- Returns
sample_roi: Regions of interests that are sampled. Its shape is \((S, 4)\).
gt_roi_mask: Masks assigned to sampled RoIs. Its shape is \((S, RH, RW)\).
gt_roi_label: Labels assigned to sampled RoIs. Its shape is \((S,)\). Its range is \([0, L]\). The label with value 0 is the background.
gt_roi_loc: Offsets and scales to match the sampled RoIs to the ground truth bounding boxes. Its shape is \((S, 4)\).
- Return type
(array, array, array, array)