ChainerCV

ChainerCV is a deep learning based computer vision library built on top of Chainer.

Install Guide

Pip

You can install ChainerCV using pip.

pip install -U numpy
pip install chainercv

Anaconda

Build instructions using Anaconda are as follows.

# For python 3
# wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh

bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
conda config --set always_yes yes --set changeps1 no
conda update -q conda

# Download ChainerCV and go to the root directory of ChainerCV
git clone https://github.com/chainer/chainercv
cd chainercv
conda env create -f environment.yml
source activate chainercv

# Install ChainerCV
pip install -e .

# Try our demos at examples/* !

Reference Manual

ChainerCV Reference Manual

Datasets

DirectoryParsingLabelDataset
class chainercv.datasets.DirectoryParsingLabelDataset(root, check_img_file=None, color=True, numerical_sort=False)

A label dataset whose label names are the names of the subdirectories.

The label names are the names of the directories located one layer below the root directory. All images under these subdirectories are categorized into the classes named after their subdirectories. An image is parsed only when the function check_img_file returns True when given the path to the image as an argument. If check_img_file is None, any path with an image extension is parsed.

Example

The directory structure should look like the one below.

root
|-- class_0
|   |-- img_0.png
|   |-- img_1.png
|
--- class_1
    |-- img_0.png
>>> from chainercv.datasets import DirectoryParsingLabelDataset
>>> dataset = DirectoryParsingLabelDataset('root')
>>> dataset.paths
['root/class_0/img_0.png', 'root/class_0/img_1.png',
'root/class_1/img_0.png']
>>> dataset.labels
array([0, 0, 1])
Parameters:
  • root (str) – The root directory.
  • check_img_file (callable) – A function to determine if a file should be included in the dataset.
  • color (bool) – If True, this dataset reads images as color images.
  • numerical_sort (bool) – Label names are sorted numerically. This means that label 2 is before label 10, which is not the case when string sort is used. Regardless of this option, string sort is used for the order of files with the same label. The default value is False.
directory_parsing_label_names
chainercv.datasets.directory_parsing_label_names(root, numerical_sort=False)

Get label names from the directories whose names are the labels.

The label names are the names of the directories that locate a layer below the root directory.

The label names can be used together with chainercv.datasets.DirectoryParsingLabelDataset. The index of a label name corresponds to the label id that is used by the dataset to refer to the label.

Parameters:
  • root (str) – The root directory.
  • numerical_sort (bool) – Label names are sorted numerically. This means that label 2 is before label 10, which is not the case when string sort is used. The default value is False.
Returns:
list of strings: Sorted names of classes.
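
For example, with the directory structure shown above for DirectoryParsingLabelDataset:

>>> from chainercv.datasets import directory_parsing_label_names
>>> directory_parsing_label_names('root')
['class_0', 'class_1']
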
TransformDataset
TransformDataset
class chainercv.datasets.TransformDataset(dataset, transform)

Dataset that indexes data of a base dataset and transforms it.

This dataset wraps a base dataset by modifying the behavior of the base dataset’s __getitem__(). Arrays returned by __getitem__() of the base dataset with an integer index are transformed by the given function transform.

The function transform takes, as an argument, in_data, which is output of the base dataset’s __getitem__(), and returns transformed arrays as output. Please see the following example.

>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import TransformDataset
>>> dataset, _ = get_mnist()
>>> def transform(in_data):
...     img, label = in_data
...     img -= 0.5  # scale to [-0.5, 0.5]
...     return img, label
>>> dataset = TransformDataset(dataset, transform)

Note

The index used to access data is either an integer or a slice. If it is a slice, the base dataset is assumed to return a list of outputs each corresponding to the output of the integer indexing.

Note

This class is deprecated. Please use chainer.datasets.TransformDataset instead.

Parameters:
  • dataset – Underlying dataset. The index of this dataset corresponds to the index of the base dataset.
  • transform (callable) – A function that is called to transform values returned by the underlying dataset’s __getitem__().
ADE20K
ADE20KSemanticSegmentationDataset
class chainercv.datasets.ADE20KSemanticSegmentationDataset(data_dir='auto', split='train')

Semantic segmentation dataset for ADE20K.

This is the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 20,210 training images and 2,000 validation images.

Parameters:
  • data_dir (string) – Path to the dataset directory. The directory should contain the ADEChallengeData2016 directory, and that directory should contain at least the images and annotations directories. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.
  • split ({'train', 'val'}) – Select from dataset splits used in MIT Scene Parsing Benchmark dataset (ADE20K).
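
A minimal usage sketch (with data_dir='auto', the dataset is downloaded automatically on first use):

>>> from chainercv.datasets import ADE20KSemanticSegmentationDataset
>>> dataset = ADE20KSemanticSegmentationDataset(split='val')
>>> img, label = dataset[0]  # img is CHW; label is an integer map of shape (H, W)
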
ADE20KTestImageDataset
class chainercv.datasets.ADE20KTestImageDataset(data_dir='auto')

Image dataset for test split of ADE20K.

This is an image dataset of the test split of the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 3,352 test images.

Parameters:data_dir (string) – Path to the dataset directory. The directory should contain the release_test dir. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.
CamVid
CamVidDataset
class chainercv.datasets.CamVidDataset(data_dir='auto', split='train')

Semantic segmentation dataset for the CamVid dataset.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/camvid.
  • split ({'train', 'val', 'test'}) – Select from dataset splits used in CamVid Dataset.
CityscapesSemanticSegmentationDataset
class chainercv.datasets.CityscapesSemanticSegmentationDataset(data_dir=None, label_resolution=None, split='train', ignore_labels=True)

Semantic segmentation dataset for Cityscapes dataset.

Note

Please download the data manually, because redistribution of the Cityscapes dataset is not allowed.

Parameters:
  • data_dir (string) – Path to the dataset directory. The directory should contain at least two directories, leftImg8bit and either gtFine or gtCoarse. If None is given, it uses $CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes by default.
  • label_resolution ({'fine', 'coarse'}) – The resolution of the labels. It should be either fine or coarse.
  • split ({'train', 'val'}) – Select from dataset splits used in Cityscapes dataset.
  • ignore_labels (bool) – If True, the labels marked ignoreInEval defined in the original cityscapesScripts (https://github.com/mcordts/cityscapesScripts) will be replaced with -1 in the get_example() method. The default value is True.
CUB
CUBLabelDataset
class chainercv.datasets.CUBLabelDataset(data_dir='auto', return_bb=False, prob_map_dir='auto', return_prob_map=False)

Caltech-UCSD Birds-200-2011 dataset with annotated class labels.

When queried by an index, this dataset returns a corresponding img, label, a tuple of an image and a class id. The image is in RGB and CHW format. The class ids are between 0 and 199. If return_bb = True, a bounding box bb is appended to the tuple. If return_prob_map = True, a probability map prob_map is appended.

A bounding box is a one-dimensional array of shape \((4,)\). The elements of the bounding box correspond to (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices. This information can optionally be retrieved from the dataset by setting return_bb = True.

The probability map of a bird shows how likely the bird is located at each pixel. If the value is close to 1, it is likely that the bird is located at that pixel. The shape of this array is \((H, W)\), where \(H\) and \(W\) are the height and width of the image respectively. This information can optionally be retrieved from the dataset by setting return_prob_map = True.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • return_bb (bool) – If True, this returns a bounding box around a bird. The default value is False.
  • prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • return_prob_map (bool) – If True, a probability map of the bird is included in the returned tuple. The default value is False.
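
For example, requesting the optional bounding box appends it to the returned tuple:

>>> from chainercv.datasets import CUBLabelDataset
>>> dataset = CUBLabelDataset(return_bb=True)
>>> img, label, bb = dataset[0]  # bb is (y_min, x_min, y_max, x_max)
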
CUBKeypointDataset
class chainercv.datasets.CUBKeypointDataset(data_dir='auto', return_bb=False, prob_map_dir='auto', return_prob_map=False)

Caltech-UCSD Birds-200-2011 dataset with annotated keypoints.

An index corresponds to each image.

When queried by an index, this dataset returns the corresponding img, keypoint, kp_mask, a tuple of an image, keypoints and a keypoint mask that indicates visible keypoints in the image. The data types of the three elements are float32, float32 and bool, respectively. If return_bb = True, a bounding box bb is appended to the tuple. If return_prob_map = True, a probability map prob_map is appended.

keypoints are packed into a two dimensional array of shape \((K, 2)\), where \(K\) is the number of keypoints. Note that \(K=15\) in the CUB dataset. Also note that not all fifteen keypoints are visible in an image. When a keypoint is not visible, the values stored for that keypoint are undefined. The second axis corresponds to the \(y\) and \(x\) coordinates of the keypoints in the image.

A keypoint mask array indicates whether a keypoint is visible in the image or not. This is a boolean array of shape \((K,)\).

A bounding box is a one-dimensional array of shape \((4,)\). The elements of the bounding box correspond to (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices. This information can optionally be retrieved from the dataset by setting return_bb = True.

The probability map of a bird shows how likely the bird is located at each pixel. If the value is close to 1, it is likely that the bird is located at that pixel. The shape of this array is \((H, W)\), where \(H\) and \(W\) are the height and width of the image respectively. This information can optionally be retrieved from the dataset by setting return_prob_map = True.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • return_bb (bool) – If True, this returns a bounding box around a bird. The default value is False.
  • prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • return_prob_map (bool) – If True, a probability map of the bird is included in the returned tuple. The default value is False.
OnlineProducts
OnlineProductsDataset
class chainercv.datasets.OnlineProductsDataset(data_dir='auto', split='train')

Dataset class for Stanford Online Products Dataset.

When queried by an index, this dataset returns a corresponding img, class_id, super_class_id, a tuple of an image, a class id and a coarse level class id. Images are in RGB and CHW format. Class ids start from 0. The name of the \(l\) th coarse level class is the \(l\) th element of chainercv.datasets.online_products_super_label_names.

The split argument selects the train or test split of the dataset, as done in [1]. The train split contains the first 11318 classes and the test split contains the remaining 11316 classes.

[1]Hyun Oh Song, Yu Xiang, Stefanie Jegelka, Silvio Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. arXiv 2015.
Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/online_products.
  • split ({'train', 'test'}) – Select a split of the dataset.
PASCAL VOC
VOCBboxDataset
class chainercv.datasets.VOCBboxDataset(data_dir='auto', split='train', year='2012', use_difficult=False, return_difficult=False)

Bounding box dataset for PASCAL VOC.

The index corresponds to each image.

When queried by an index, if return_difficult == False, this dataset returns a corresponding img, bbox, label, a tuple of an image, bounding boxes and labels. This is the default behaviour. If return_difficult == True, this dataset returns corresponding img, bbox, label, difficult. difficult is a boolean array that indicates whether bounding boxes are labeled as difficult or not.

The bounding boxes are packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.

The labels are packed into a one dimensional tensor of shape \((R,)\). \(R\) is the number of bounding boxes in the image. The class name of the label \(l\) is the \(l\) th element of chainercv.datasets.voc_bbox_label_names.

The array difficult is a one dimensional boolean array of shape \((R,)\). \(R\) is the number of bounding boxes in the image. If use_difficult is False, this array is a boolean array with all False.

The types of the image, the bounding boxes and the labels are as follows.

  • img.dtype == numpy.float32
  • bbox.dtype == numpy.float32
  • label.dtype == numpy.int32
  • difficult.dtype == numpy.bool
Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.
  • split ({'train', 'val', 'trainval', 'test'}) – Select a split of the dataset. test split is only available for 2007 dataset.
  • year ({'2007', '2012'}) – Use a dataset prepared for a challenge held in year.
  • use_difficult (bool) – If true, use images that are labeled as difficult in the original annotation.
  • return_difficult (bool) – If true, this dataset returns a boolean array that indicates whether bounding boxes are labeled as difficult or not. The default value is False.
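
For example, loading the 2007 test split with difficult annotations returned:

>>> from chainercv.datasets import VOCBboxDataset
>>> dataset = VOCBboxDataset(year='2007', split='test', return_difficult=True)
>>> img, bbox, label, difficult = dataset[0]
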
VOCSemanticSegmentationDataset
class chainercv.datasets.VOCSemanticSegmentationDataset(data_dir='auto', split='train')

Semantic segmentation dataset for PASCAL VOC2012.

The class name of the label \(l\) is the \(l\) th element of chainercv.datasets.voc_semantic_segmentation_label_names.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.
  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

Evaluations

Detection VOC
eval_detection_voc
chainercv.evaluations.eval_detection_voc(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5, use_07_metric=False)

Calculate average precisions based on evaluation code of PASCAL VOC.

This function evaluates predicted bounding boxes obtained from a dataset which has \(N\) images by using average precision for each class. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters:
  • pred_bboxes (iterable of numpy.ndarray) – An iterable of \(N\) sets of bounding boxes. Its index corresponds to an index for the base dataset. Each element of pred_bboxes is a set of coordinates of bounding boxes. This is an array whose shape is \((R, 4)\), where \(R\) corresponds to the number of bounding boxes, which may vary among images. The second axis corresponds to y_min, x_min, y_max, x_max of a bounding box.
  • pred_labels (iterable of numpy.ndarray) – An iterable of labels. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
  • pred_scores (iterable of numpy.ndarray) – An iterable of confidence scores for predicted bounding boxes. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
  • gt_bboxes (iterable of numpy.ndarray) – An iterable of ground truth bounding boxes whose length is \(N\). An element of gt_bboxes is a bounding box whose shape is \((R, 4)\). Note that the number of bounding boxes in each image does not need to be same as the number of corresponding predicted boxes.
  • gt_labels (iterable of numpy.ndarray) – An iterable of ground truth labels which are organized similarly to gt_bboxes.
  • gt_difficults (iterable of numpy.ndarray) – An iterable of boolean arrays which is organized similarly to gt_bboxes. This tells whether the corresponding ground truth bounding box is difficult or not. By default, this is None. In that case, this function considers all bounding boxes to be not difficult.
  • iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.
  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.
Returns:

The keys, value-types and the description of the values are listed below.

  • ap (numpy.ndarray): An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.
  • map (float): The average of Average Precisions over classes.

Return type:

dict
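
A self-contained toy check of the interface (one image, one class and a perfect prediction, so both AP and mAP are 1):

>>> import numpy as np
>>> from chainercv.evaluations import eval_detection_voc
>>> pred_bboxes = [np.array([[0., 0., 10., 10.]], dtype=np.float32)]
>>> pred_labels = [np.array([0], dtype=np.int32)]
>>> pred_scores = [np.array([0.9], dtype=np.float32)]
>>> gt_bboxes = [np.array([[0., 0., 10., 10.]], dtype=np.float32)]
>>> gt_labels = [np.array([0], dtype=np.int32)]
>>> result = eval_detection_voc(
...     pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
>>> result['map']
1.0
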

calc_detection_voc_ap
chainercv.evaluations.calc_detection_voc_ap(prec, rec, use_07_metric=False)

Calculate average precisions based on evaluation code of PASCAL VOC.

This function calculates average precisions from given precisions and recalls. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters:
  • prec (list of numpy.array) – A list of arrays. prec[l] indicates precision for class \(l\). If prec[l] is None, this function returns numpy.nan for class \(l\).
  • rec (list of numpy.array) – A list of arrays. rec[l] indicates recall for class \(l\). If rec[l] is None, this function returns numpy.nan for class \(l\).
  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.
Returns:

This function returns an array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If prec[l] or rec[l] is None, the corresponding value is set to numpy.nan.

Return type:

ndarray

calc_detection_voc_prec_rec
chainercv.evaluations.calc_detection_voc_prec_rec(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5)

Calculate precision and recall based on evaluation code of PASCAL VOC.

This function calculates precision and recall of predicted bounding boxes obtained from a dataset which has \(N\) images. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters:
  • pred_bboxes (iterable of numpy.ndarray) – An iterable of \(N\) sets of bounding boxes. Its index corresponds to an index for the base dataset. Each element of pred_bboxes is a set of coordinates of bounding boxes. This is an array whose shape is \((R, 4)\), where \(R\) corresponds to the number of bounding boxes, which may vary among images. The second axis corresponds to y_min, x_min, y_max, x_max of a bounding box.
  • pred_labels (iterable of numpy.ndarray) – An iterable of labels. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
  • pred_scores (iterable of numpy.ndarray) – An iterable of confidence scores for predicted bounding boxes. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
  • gt_bboxes (iterable of numpy.ndarray) – An iterable of ground truth bounding boxes whose length is \(N\). An element of gt_bboxes is a bounding box whose shape is \((R, 4)\). Note that the number of bounding boxes in each image does not need to be same as the number of corresponding predicted boxes.
  • gt_labels (iterable of numpy.ndarray) – An iterable of ground truth labels which are organized similarly to gt_bboxes.
  • gt_difficults (iterable of numpy.ndarray) – An iterable of boolean arrays which is organized similarly to gt_bboxes. This tells whether the corresponding ground truth bounding box is difficult or not. By default, this is None. In that case, this function considers all bounding boxes to be not difficult.
  • iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.
Returns:

This function returns two lists: prec and rec.

  • prec: A list of arrays. prec[l] is precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, prec[l] is set to None.
  • rec: A list of arrays. rec[l] is recall for class \(l\). If class \(l\) that is not marked as difficult does not exist in gt_labels, rec[l] is set to None.

Return type:

tuple of two lists

Semantic Segmentation IoU
eval_semantic_segmentation
chainercv.evaluations.eval_semantic_segmentation(pred_labels, gt_labels)

Evaluate metrics used in Semantic Segmentation.

This function calculates Intersection over Union (IoU), Pixel Accuracy and Class Accuracy for the task of semantic segmentation.

The definition of metrics calculated by this function is as follows, where \(N_{ij}\) is the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

  • \(\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)
  • \(\text{mIoU} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)
  • \(\text{Pixel Accuracy} = \frac {\sum_{i=1}^k N_{ii}} {\sum_{i=1}^k \sum_{j=1}^k N_{ij}}\)
  • \(\text{Class Accuracy} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}\)
  • \(\text{Mean Class Accuracy} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}\)

The more detailed description of the above metrics can be found in a review on semantic segmentation [1].

The number of classes \(n\_class\) is \(\max(pred\_labels, gt\_labels) + 1\), which is the maximum class id of the inputs plus one.

[1]Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017.
Parameters:
  • pred_labels (iterable of numpy.ndarray) – A collection of predicted labels. The shape of a label array is \((H, W)\). \(H\) and \(W\) are height and width of the label. For example, this is a list of labels [label_0, label_1, ...], where label_i.shape = (H_i, W_i).
  • gt_labels (iterable of numpy.ndarray) – A collection of ground truth labels. The shape of a ground truth label array is \((H, W)\), and its corresponding prediction label should have the same shape. A pixel with value -1 will be ignored during evaluation.
Returns:

The keys, value-types and the description of the values are listed below.

  • iou (numpy.ndarray): An array of IoUs for the \(n\_class\) classes. Its shape is \((n\_class,)\).
  • miou (float): The average of IoUs over classes.
  • pixel_accuracy (float): The computed pixel accuracy.
  • class_accuracy (numpy.ndarray): An array of class accuracies for the \(n\_class\) classes. Its shape is \((n\_class,)\).
  • mean_class_accuracy (float): The average of class accuracies.

Return type:

dict
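
A toy check with a perfect two-class prediction:

>>> import numpy as np
>>> from chainercv.evaluations import eval_semantic_segmentation
>>> pred_labels = [np.array([[0, 0], [1, 1]], dtype=np.int32)]
>>> gt_labels = [np.array([[0, 0], [1, 1]], dtype=np.int32)]
>>> result = eval_semantic_segmentation(pred_labels, gt_labels)
>>> result['miou'], result['pixel_accuracy']
(1.0, 1.0)
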

calc_semantic_segmentation_confusion
chainercv.evaluations.calc_semantic_segmentation_confusion(pred_labels, gt_labels)

Collect a confusion matrix.

The number of classes \(n\_class\) is \(\max(pred\_labels, gt\_labels) + 1\), which is the maximum class id of the inputs plus one.

Parameters:
  • pred_labels (iterable of numpy.ndarray) – A collection of predicted labels. The shape of a label array is \((H, W)\). \(H\) and \(W\) are height and width of the label.
  • gt_labels (iterable of numpy.ndarray) – A collection of ground truth labels. The shape of a ground truth label array is \((H, W)\), and its corresponding prediction label should have the same shape. A pixel with value -1 will be ignored during evaluation.
Returns:

A confusion matrix. Its shape is \((n\_class, n\_class)\). The \((i, j)\) th element corresponds to the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

Return type:

numpy.ndarray

calc_semantic_segmentation_iou
chainercv.evaluations.calc_semantic_segmentation_iou(confusion)

Calculate Intersection over Union with a given confusion matrix.

The definition of Intersection over Union (IoU) is as follows, where \(N_{ij}\) is the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

  • \(\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)
Parameters:confusion (numpy.ndarray) – A confusion matrix. Its shape is \((n\_class, n\_class)\). The \((i, j)\) th element corresponds to the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.
Returns:An array of IoUs for the \(n\_class\) classes. Its shape is \((n\_class,)\).
Return type:numpy.ndarray

Extensions

Evaluator
DetectionVOCEvaluator
class chainercv.extensions.DetectionVOCEvaluator(iterator, target, use_07_metric=False, label_names=None)

An extension that evaluates a detection model by PASCAL VOC metric.

This extension iterates over an iterator and evaluates the prediction results by average precision (AP) for each class and the mean of the APs (mean Average Precision, mAP). This extension reports the following values with keys. Please note that 'ap/<label_names[l]>' is reported only if label_names is specified.

  • 'map': Mean of average precisions (mAP).
  • 'ap/<label_names[l]>': Average precision for class label_names[l], where \(l\) is the index of the class. For example, this evaluator reports 'ap/aeroplane', 'ap/bicycle', etc. if label_names is voc_bbox_label_names. If there is no bounding box assigned to class label_names[l] in either ground truth or prediction, it reports numpy.nan as its average precision. In this case, mAP is computed without this class.
Parameters:
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, bbox, label or img, bbox, label, difficult. img is an image, bbox is the coordinates of bounding boxes, label is the labels of the bounding boxes and difficult indicates whether the bounding boxes are difficult or not. If difficult is returned, difficult ground truth will be ignored in the evaluation.
  • target (chainer.Link) – A detection link. This link must have predict() method that takes a list of images and returns bboxes, labels and scores.
  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.
  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precision for each class is also reported with the key 'ap/<label_names[l]>'.
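
A typical way to attach this extension to a trainer (a sketch; trainer, val_iterator and model are assumed to be set up elsewhere):

from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator

trainer.extend(
    DetectionVOCEvaluator(
        val_iterator, model, use_07_metric=True,
        label_names=voc_bbox_label_names),
    trigger=(1, 'epoch'))
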
SemanticSegmentationEvaluator
class chainercv.extensions.SemanticSegmentationEvaluator(iterator, target, label_names=None)

An extension that evaluates a semantic segmentation model.

This extension iterates over an iterator and evaluates the prediction results of the model by common evaluation metrics for semantic segmentation. This extension reports values with keys below. Please note that 'iou/<label_names[l]>' and 'class_accuracy/<label_names[l]>' are reported only if label_names is specified.

  • 'miou': Mean of IoUs (mIoU).
  • 'iou/<label_names[l]>': IoU for class label_names[l], where \(l\) is the index of the class. For example, if label_names is camvid_label_names, this evaluator reports 'iou/Sky', 'iou/Building', etc.
  • 'mean_class_accuracy': Mean of class accuracies.
  • 'class_accuracy/<label_names[l]>': Class accuracy for class label_names[l], where \(l\) is the index of the class.
  • 'pixel_accuracy': Pixel accuracy.

If there is no label assigned to class label_names[l] in the ground truth, values corresponding to keys 'iou/<label_names[l]>' and 'class_accuracy/<label_names[l]>' are numpy.nan. In that case, the means of them are calculated by excluding them from calculation.

For details on the evaluation metrics, please see the documentation for chainercv.evaluations.eval_semantic_segmentation().

Parameters:
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, label. img is an image and label is a pixel-wise label.
  • target (chainer.Link) – A semantic segmentation link. This link should have predict() method that takes a list of images and returns labels.
  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, IoU and class accuracy for each class are also reported with the keys 'iou/<label_names[l]>' and 'class_accuracy/<label_names[l]>'.
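
As with DetectionVOCEvaluator, this extension is attached to a trainer (a sketch; trainer, val_iterator and model are assumed to be set up elsewhere):

from chainercv.datasets import camvid_label_names
from chainercv.extensions import SemanticSegmentationEvaluator

trainer.extend(
    SemanticSegmentationEvaluator(
        val_iterator, model, label_names=camvid_label_names),
    trigger=(1, 'epoch'))
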
Visualization Report
DetectionVisReport
class chainercv.extensions.DetectionVisReport(iterator, target, label_names=None, filename='detection_iter={iteration}_idx={index}.jpg')

An extension that visualizes output of a detection model.

This extension visualizes the predicted bounding boxes together with the ground truth bounding boxes.

Internally, this extension takes examples from an iterator, predicts bounding boxes from the images in the examples, and visualizes them using chainercv.visualizations.vis_bbox(). The process is illustrated in the following code.

batch = next(iterator)
# Convert batch -> imgs, gt_bboxes, gt_labels
pred_bboxes, pred_labels, pred_scores = target.predict(imgs)
# Visualization code
for img, gt_bbox, gt_label, pred_bbox, pred_label, pred_score \
        in zip(imgs, gt_bboxes, gt_labels,
               pred_bboxes, pred_labels, pred_scores):
    # the ground truth
    vis_bbox(img, gt_bbox, gt_label)
    # the prediction
    vis_bbox(img, pred_bbox, pred_label, pred_score)

Note

gt_bbox and pred_bbox are float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. Each bounding box is organized by (y_min, x_min, y_max, x_max) in the second axis.

gt_label and pred_label are integer arrays of shape \((R,)\). Each label indicates the class of the bounding box.

pred_score is a float array of shape \((R,)\). Each score indicates how confident the prediction is.

Parameters:
  • iterator – Iterator object that produces images and ground truth.
  • target – Link object used for detection.
  • label_names (iterable of str) – Name of labels ordered according to label ids. If this is None, labels will be skipped.
  • filename (str) – Basename for the saved image. It can contain two keywords, '{iteration}' and '{index}'. They are replaced with the iteration of the trainer and the index of the sample when this extension saves an image. The default value is 'detection_iter={iteration}_idx={index}.jpg'.

Transforms

Image
center_crop
chainercv.transforms.center_crop(img, size, return_param=False, copy=False)

Center crop an image by size.

An image is cropped to size. The center of the output image and the center of the input image are the same.

Parameters:
  • img (ndarray) – An image array to be cropped. This is in CHW format.
  • size (tuple) – The size of output image after cropping. This value is \((height, width)\).
  • return_param (bool) – If True, this function returns information of slices.
  • copy (bool) – If False, a view of img is returned.
Returns:

If return_param = False, returns an array out_img that is cropped from the input array.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.

  • x_slice (slice): Similar to y_slice.

    out_img = img[:, y_slice, x_slice]
    

Return type:

ndarray or (ndarray, dict)
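
For example, the returned slices reproduce the crop:

>>> import numpy as np
>>> from chainercv.transforms import center_crop
>>> img = np.zeros((3, 48, 64), dtype=np.float32)
>>> out_img, param = center_crop(img, (32, 32), return_param=True)
>>> out_img.shape
(3, 32, 32)
>>> np.array_equal(out_img, img[:, param['y_slice'], param['x_slice']])
True
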

flip
chainercv.transforms.flip(img, y_flip=False, x_flip=False, copy=False)

Flip an image in vertical or horizontal direction as specified.

Parameters:
  • img (ndarray) – An array that gets flipped. This is in CHW format.
  • y_flip (bool) – Flip in vertical direction.
  • x_flip (bool) – Flip in horizontal direction.
  • copy (bool) – If False, a view of img will be returned.
Returns:

Transformed img in CHW format.

pca_lighting
chainercv.transforms.pca_lighting(img, sigma, eigen_value=None, eigen_vector=None)

AlexNet style color augmentation.

This method adds a noise vector drawn from a Gaussian. The direction of the Gaussian is the same as that of the principal components of the dataset.

This method is used in training of AlexNet [1].

[1]Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
Parameters:
  • img (ndarray) – An image array to be augmented. This is in CHW and RGB format.
  • sigma (float) – Standard deviation of the Gaussian. In the original paper, this value is 10% of the range of intensity (25.5 if the range is \([0, 255]\)).
  • eigen_value (ndarray) – An array of eigen values. The shape has to be \((3,)\). If it is not specified, the values computed from ImageNet are used.
  • eigen_vector (ndarray) – An array of eigen vectors. The shape has to be \((3, 3)\). If it is not specified, the vectors computed from ImageNet are used.
Returns:

An image in CHW format.

random_crop
chainercv.transforms.random_crop(img, size, return_param=False, copy=False)

Crop array randomly into size.

The input image is cropped by a randomly selected region whose shape is size.

Parameters:
  • img (ndarray) – An image array to be cropped. This is in CHW format.
  • size (tuple) – The size of output image after cropping. This value is \((height, width)\).
  • return_param (bool) – If True, this function returns information of slices.
  • copy (bool) – If False, a view of img is returned.
Returns:

If return_param = False, returns an array out_img that is cropped from the input array.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.

  • x_slice (slice): Similar to y_slice.

    out_img = img[:, y_slice, x_slice]
    

Return type:

ndarray or (ndarray, dict)

random_expand
chainercv.transforms.random_expand(img, max_ratio=4, fill=0, return_param=False)

Expand an image randomly.

This method randomly places the input image on a larger canvas. The size of the canvas is \((rH, rW)\), where \((H, W)\) is the size of the input image and \(r\) is a random ratio drawn from \([1, max\_ratio]\). The canvas is filled with the value fill except for the region where the original image is placed.

This data augmentation trick is used to create a “zoom out” effect [2].

[2]Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
  • img (ndarray) – An image array to be augmented. This is in CHW format.
  • max_ratio (float) – The maximum ratio of expansion. In the original paper, this value is 4.
  • fill (float, tuple or ndarray) – The value of padded pixels. In the original paper, this value is the mean of ImageNet. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of img.
  • return_param (bool) – Returns random parameters.
Returns:

If return_param = False, returns an array out_img that is the result of expansion.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • ratio (float): The sampled value used to make the canvas.
  • y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
  • x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.

Return type:

ndarray or (ndarray, dict)

random_flip
chainercv.transforms.random_flip(img, y_random=False, x_random=False, return_param=False, copy=False)

Randomly flip an image in vertical or horizontal direction.

Parameters:
  • img (ndarray) – An array that gets flipped. This is in CHW format.
  • y_random (bool) – Randomly flip in vertical direction.
  • x_random (bool) – Randomly flip in horizontal direction.
  • return_param (bool) – Returns information of flip.
  • copy (bool) – If False, a view of img will be returned.
Returns:

If return_param = False, returns an array out_img that is the result of flipping.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_flip (bool): Whether the image was flipped in the vertical direction or not.
  • x_flip (bool): Whether the image was flipped in the horizontal direction or not.

Return type:

ndarray or (ndarray, dict)
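
The sampled flip direction can be retrieved and reused, e.g. to flip bounding boxes or keypoints consistently (see flip_bbox and flip_keypoint):

>>> import numpy as np
>>> from chainercv.transforms import random_flip
>>> img = np.zeros((3, 32, 32), dtype=np.float32)
>>> out_img, param = random_flip(img, x_random=True, return_param=True)
>>> x_flip = param['x_flip']  # True or False, each with probability 0.5
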

random_rotate
chainercv.transforms.random_rotate(img, return_param=False)

Randomly rotate images by 90, 180, 270 or 360 degrees.

Parameters:
  • img (ndarray) – An array that gets rotated. This is in CHW format.
  • return_param (bool) – Returns information of rotation.
Returns:

If return_param = False, returns an array out_img that is the result of rotation.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • k (int): The integer that represents the number of times the image is rotated by 90 degrees.

Return type:

ndarray or (ndarray, dict)

resize
chainercv.transforms.resize(img, size, interpolation=2)

Resize image to match the given shape.

This method uses cv2 or PIL as the backend. If cv2 is installed, this function uses the implementation in cv2, which is faster than the implementation in PIL. In an Anaconda environment, cv2 can be installed with the following command.

$ conda install -c menpo opencv3=3.2.0
Parameters:
  • img (ndarray) – An array to be transformed. This is in CHW format and the type should be numpy.float32.
  • size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).
  • interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.
Returns:

A resized array in CHW format.

Return type:

ndarray

resize_contain
chainercv.transforms.resize_contain(img, size, fill=0, return_param=False)

Resize the image to fit in the given area while keeping aspect ratio.

If both the height and the width in size are larger than the height and the width of the img, the img is placed on the center with an appropriate padding to match size.

Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving aspect ratio.

Parameters:
  • img (ndarray) – An array to be transformed. This is in CHW format.
  • size (tuple of two ints) – A tuple of two elements: height, width. The size of the image after resizing.
  • fill (float, tuple or ndarray) – The value of padded pixels. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of img.
  • return_param (bool) – Returns information of resizing and offsetting.
Returns:

If return_param = False, returns an array out_img that is the result of resizing.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
  • x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
  • scaled_size (tuple): The size to which the image is scaled to before placing it on a canvas. This is a tuple of two elements: height, width.

Return type:

ndarray or (ndarray, dict)

scale
chainercv.transforms.scale(img, size, fit_short=True, interpolation=2)

Rescale the input image to the given size.

When fit_short == True, the input image will be resized so that the shorter edge is scaled to length size. For example, if the height of the image is larger than its width, the image will be resized to (size * height / width, size).

Otherwise, the input image will be resized so that the longer edge will be scaled to length size after resizing.

Parameters:
  • img (ndarray) – An image array to be scaled. This is in CHW format.
  • size (int) – The length of the smaller edge.
  • fit_short (bool) – Determines whether to match the length of the shorter edge or the longer edge to size.
  • interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.
Returns:

A scaled image in CHW format.

Return type:

ndarray
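
For example, with fit_short left at its default, the shorter edge becomes size and the aspect ratio is kept:

>>> import numpy as np
>>> from chainercv.transforms import scale
>>> img = np.zeros((3, 240, 320), dtype=np.float32)
>>> scale(img, 600).shape
(3, 600, 800)
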

ten_crop
chainercv.transforms.ten_crop(img, size)

Crop 10 regions from an array.

This method crops 10 regions. All regions have the shape size. These regions consist of 1 center crop, 4 corner crops and horizontal flips of all of them.

The crops are ordered in this order.

  • center crop
  • top-left crop
  • bottom-left crop
  • top-right crop
  • bottom-right crop
  • center crop (flipped horizontally)
  • top-left crop (flipped horizontally)
  • bottom-left crop (flipped horizontally)
  • top-right crop (flipped horizontally)
  • bottom-right crop (flipped horizontally)
Parameters:
  • img (ndarray) – An image array to be cropped. This is in CHW format.
  • size (tuple) – The size of output images after cropping. This value is \((height, width)\).
Returns:

The cropped arrays. The shape of the tensor is \((10, C, H, W)\).
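
For example:

>>> import numpy as np
>>> from chainercv.transforms import ten_crop
>>> img = np.zeros((3, 256, 256), dtype=np.float32)
>>> ten_crop(img, (224, 224)).shape
(10, 3, 224, 224)

A common use is to average a classifier's predictions over the ten crops at test time.
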

Bounding Box
crop_bbox
chainercv.transforms.crop_bbox(bbox, y_slice=None, x_slice=None, allow_outside_center=True, return_param=False)

Translate bounding boxes to fit within the cropped area of an image.

This method is mainly used together with image cropping. This method translates the coordinates of bounding boxes like translate_bbox(). In addition, this function truncates the bounding boxes to fit within the cropped area. If a bounding box does not overlap with the cropped area, this bounding box will be removed.

The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.

Parameters:
  • bbox (ndarray) – Bounding boxes to be transformed. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
  • y_slice (slice) – The slice of y axis.
  • x_slice (slice) – The slice of x axis.
  • allow_outside_center (bool) – If this argument is False, bounding boxes whose centers are outside of the cropped area are removed. The default value is True.
  • return_param (bool) – If True, this function returns indices of kept bounding boxes.
Returns:

If return_param = False, returns an array bbox.

If return_param = True, returns a tuple whose elements are bbox, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • index (numpy.ndarray): An array holding the indices of the kept bounding boxes.

Return type:

ndarray or (ndarray, dict)
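
A common pattern is to crop an image at random and then crop the bounding boxes with the same slices (a sketch; img and bbox are assumed given):

>>> from chainercv.transforms import crop_bbox, random_crop
>>> img, param = random_crop(img, (240, 320), return_param=True)
>>> bbox = crop_bbox(
...     bbox, y_slice=param['y_slice'], x_slice=param['x_slice'])
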

flip_bbox
chainercv.transforms.flip_bbox(bbox, size, y_flip=False, x_flip=False)

Flip bounding boxes accordingly.

The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.

Parameters:
  • bbox (ndarray) – An array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes.
  • size (tuple) – A tuple of length 2. The height and the width of the image which is associated with the bounding boxes.
  • y_flip (bool) – Flip bounding box according to a vertical flip of an image.
  • x_flip (bool) – Flip bounding box according to a horizontal flip of an image.
Returns:

Bounding boxes flipped according to the given flips.

Return type:

ndarray
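
For example, combined with random_flip() so that the boxes follow the image (a sketch; img and bbox are assumed given):

>>> from chainercv.transforms import flip_bbox, random_flip
>>> img, param = random_flip(img, x_random=True, return_param=True)
>>> bbox = flip_bbox(bbox, img.shape[1:], x_flip=param['x_flip'])
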

resize_bbox
chainercv.transforms.resize_bbox(bbox, in_size, out_size)

Resize bounding boxes according to image resize.

The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.

Parameters:
  • bbox (ndarray) – An array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes.
  • in_size (tuple) – A tuple of length 2. The height and the width of the image before resized.
  • out_size (tuple) – A tuple of length 2. The height and the width of the image after resized.
Returns:

Bounding boxes rescaled according to the given image shapes.

Return type:

ndarray
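
For example, combined with resize() (a sketch; img and bbox are assumed given):

>>> from chainercv.transforms import resize, resize_bbox
>>> in_size = img.shape[1:]
>>> img = resize(img, (300, 300))
>>> bbox = resize_bbox(bbox, in_size, (300, 300))
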

translate_bbox
chainercv.transforms.translate_bbox(bbox, y_offset=0, x_offset=0)

Translate bounding boxes.

This method is mainly used together with image transforms, such as padding and cropping, which translate the top left point of the image from coordinate \((0, 0)\) to coordinate \((y, x) = (y\_offset, x\_offset)\).

The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.

Parameters:
  • bbox (ndarray) – Bounding boxes to be transformed. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
  • y_offset (int or float) – The offset along y axis.
  • x_offset (int or float) – The offset along x axis.
Returns:

Bounding boxes translated according to the given offsets.

Return type:

ndarray

Keypoint
flip_keypoint
chainercv.transforms.flip_keypoint(keypoint, size, y_flip=False, x_flip=False)

Modify keypoints according to image flips.

Parameters:
  • keypoint (ndarray) – Keypoints in the image. The shape of this array is \((K, 2)\). \(K\) is the number of keypoints in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the keypoints.
  • size (tuple) – A tuple of length 2. The height and the width of the image which is associated with the keypoints.
  • y_flip (bool) – Modify keypoints according to a vertical flip of an image.
  • x_flip (bool) – Modify keypoints according to a horizontal flip of an image.
Returns:

Keypoints modified according to image flips.

Return type:

ndarray
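
For example, combined with random_flip() so that the keypoints follow the image (a sketch; img and keypoint are assumed given):

>>> from chainercv.transforms import flip_keypoint, random_flip
>>> img, param = random_flip(img, x_random=True, return_param=True)
>>> keypoint = flip_keypoint(keypoint, img.shape[1:], x_flip=param['x_flip'])
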

resize_keypoint
chainercv.transforms.resize_keypoint(keypoint, in_size, out_size)

Change values of keypoints according to the parameters for resizing an image.

Parameters:
  • keypoint (ndarray) – Keypoints in the image. The shape of this array is \((K, 2)\). \(K\) is the number of keypoints in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the keypoints.
  • in_size (tuple) – A tuple of length 2. The height and the width of the image before resized.
  • out_size (tuple) – A tuple of length 2. The height and the width of the image after resized.
Returns:

Keypoints rescaled according to the given image shapes.

Return type:

ndarray

translate_keypoint
chainercv.transforms.translate_keypoint(keypoint, y_offset=0, x_offset=0)

Translate keypoints.

This method is mainly used together with image transforms, such as padding and cropping, which translate the top left point of the image to the coordinate \((y, x) = (y\_offset, x\_offset)\).

Parameters:
  • keypoint (ndarray) – Keypoints in the image. The shape of this array is \((K, 2)\). \(K\) is the number of keypoints in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the keypoints.
  • y_offset (int or float) – The offset along y axis.
  • x_offset (int or float) – The offset along x axis.
Returns:

Keypoints modified according to the translation of an image.

Return type:

ndarray

Visualizations

vis_bbox
chainercv.visualizations.vis_bbox(img, bbox, label=None, score=None, label_names=None, ax=None)

Visualize bounding boxes inside image.

Example

>>> from chainercv.datasets import VOCBboxDataset
>>> from chainercv.datasets import voc_bbox_label_names
>>> from chainercv.visualizations import vis_bbox
>>> import matplotlib.pyplot as plot
>>> dataset = VOCBboxDataset()
>>> img, bbox, label = dataset[60]
>>> vis_bbox(img, bbox, label,
...         label_names=voc_bbox_label_names)
>>> plot.show()
Parameters:
  • img (ndarray) – An array of shape \((3, height, width)\). This is in RGB format and the range of its value is \([0, 255]\).
  • bbox (ndarray) – An array of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. Each element is organized by (y_min, x_min, y_max, x_max) in the second axis.
  • label (ndarray) – An integer array of shape \((R,)\). The values correspond to id for label names stored in label_names. This is optional.
  • score (ndarray) – A float array of shape \((R,)\). Each value indicates how confident the prediction is. This is optional.
  • label_names (iterable of strings) – Name of labels ordered according to label ids. If this is None, labels will be skipped.
  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.
Returns:

Returns the Axes object with the plot for further tweaking.

Return type:

Axes

vis_image
chainercv.visualizations.vis_image(img, ax=None)

Visualize a color image.

Parameters:
  • img (ndarray) – An array of shape \((3, height, width)\). This is in RGB format and the range of its value is \([0, 255]\).
  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.
Returns:

Returns the Axes object with the plot for further tweaking.

Return type:

Axes

vis_keypoint
chainercv.visualizations.vis_keypoint(img, keypoint, kp_mask=None, ax=None)

Visualize keypoints in an image.

Example

>>> import chainercv
>>> import matplotlib.pyplot as plot
>>> dataset = chainercv.datasets.CUBKeypointDataset()
>>> img, keypoint, kp_mask = dataset[0]
>>> chainercv.visualizations.vis_keypoint(img, keypoint, kp_mask)
>>> plot.show()
Parameters:
  • img (ndarray) – An image of shape \((3, height, width)\). This is in RGB format and the range of its value is \([0, 255]\). This should be visualizable using matplotlib.pyplot.imshow(img).
  • keypoint (ndarray) – An array with keypoint pairs whose shape is \((K, 2)\), where \(K\) is the number of keypoints in the array. The second axis corresponds to \(y\) and \(x\) coordinates of the keypoint.
  • kp_mask (ndarray, optional) – A boolean array whose shape is \((K,)\). If the \(i\) th index is True, the \(i\) th keypoint is displayed. If not specified, all keypoints in keypoint will be displayed.
  • ax (matplotlib.axes.Axes, optional) – If provided, plot on this axis.
Returns:

Returns the Axes object with the plot for further tweaking.

Return type:

Axes

vis_semantic_segmentation
chainercv.visualizations.vis_semantic_segmentation(label, label_names=None, label_colors=None, ignore_label_color=(0, 0, 0), alpha=1, all_label_names_in_legend=False, ax=None)

Visualize a semantic segmentation.

Example

>>> from chainercv.datasets import VOCSemanticSegmentationDataset
>>> from chainercv.datasets \
...     import voc_semantic_segmentation_label_colors
>>> from chainercv.datasets \
...     import voc_semantic_segmentation_label_names
>>> from chainercv.visualizations import vis_image
>>> from chainercv.visualizations import vis_semantic_segmentation
>>> import matplotlib.pyplot as plot
>>> dataset = VOCSemanticSegmentationDataset()
>>> img, label = dataset[60]
>>> ax = vis_image(img)
>>> _, legend_handles = vis_semantic_segmentation(
...     label,
...     label_names=voc_semantic_segmentation_label_names,
...     label_colors=voc_semantic_segmentation_label_colors,
...     alpha=0.9, ax=ax)
>>> ax.legend(handles=legend_handles, bbox_to_anchor=(1, 1), loc=2)
>>> plot.show()
Parameters:
  • label (ndarray) – An integer array of shape \((height, width)\). The values correspond to id for label names stored in label_names.
  • label_names (iterable of strings) – Name of labels ordered according to label ids.
  • label_colors (iterable of tuple) – An iterable of colors for regular labels. Each color is in RGB format and the range of its values is \([0, 255]\). If label_colors is None, the default color map is used.
  • ignore_label_color (tuple) – Color for ignored label. This is RGB format and the range of its values is \([0, 255]\). The default value is (0, 0, 0).
  • alpha (float) – The value which determines transparency of the figure. The range of this value is \([0, 1]\). If this value is 0, the figure will be completely transparent. The default value is 1. This option is useful for overlaying the label on the source image.
  • all_label_names_in_legend (bool) – Determines whether to include all label names in a legend. If this is False, the legend does not contain the names of unused labels. An unused label is defined as a label that does not appear in label. The default value is False.
  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.
Returns:

Returns ax and legend_handles. ax is a matplotlib.axes.Axes with the plot, which can be used for further tweaking. legend_handles is a list of legend handles, which can be passed to matplotlib.pyplot.legend() to show a legend.

Return type:

matplotlib.axes.Axes and list of matplotlib.patches.Patch

Utils

Bounding Box Utilities
bbox_iou
chainercv.utils.bbox_iou(bbox_a, bbox_b)

Calculate the Intersection over Union (IoU) between bounding boxes.

IoU is calculated as the ratio of the area of the intersection to the area of the union.

This function accepts both numpy.ndarray and cupy.ndarray as inputs. Please note that bbox_a and bbox_b need to be the same type. The output is the same type as the inputs.

Parameters:
  • bbox_a (array) – An array whose shape is \((N, 4)\). \(N\) is the number of bounding boxes. The dtype should be numpy.float32.
  • bbox_b (array) – An array similar to bbox_a, whose shape is \((K, 4)\). The dtype should be numpy.float32.
Returns:

An array whose shape is \((N, K)\). An element at index \((n, k)\) contains IoUs between \(n\) th bounding box in bbox_a and \(k\) th bounding box in bbox_b.

Return type:

array
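
For example, two partially overlapping boxes (intersection 16, union 112):

>>> import numpy as np
>>> from chainercv.utils import bbox_iou
>>> bbox_a = np.array([[0., 0., 8., 8.]], dtype=np.float32)
>>> bbox_b = np.array([[4., 4., 12., 12.]], dtype=np.float32)
>>> iou = bbox_iou(bbox_a, bbox_b)  # shape (1, 1); 16 / 112, roughly 0.143
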

non_maximum_suppression
chainercv.utils.non_maximum_suppression(bbox, thresh, score=None, limit=None)

Suppress bounding boxes according to their IoUs.

This method checks each bounding box sequentially and selects it if its Intersection over Union (IoU) with every previously selected bounding box is less than thresh. This method is mainly used as postprocessing of object detection. The bounding boxes are selected starting from the ones with higher scores. If score is not provided as an argument, the bounding boxes are ordered by their indices in ascending order.

The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.

score is a float array of shape \((R,)\). Each score indicates confidence of prediction.

This function accepts both numpy.ndarray and cupy.ndarray as inputs. Please note that bbox and score need to be the same type. The output is the same type as the inputs.

Parameters:
  • bbox (array) – Bounding boxes to be transformed. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
  • thresh (float) – Threshold of IoUs.
  • score (array) – An array of confidences whose shape is \((R,)\).
  • limit (int) – The upper bound of the number of the output bounding boxes. If it is not specified, this method selects as many bounding boxes as possible.
Returns:

An array with indices of bounding boxes that are selected. They are sorted by the scores of bounding boxes in descending order. The shape of this array is \((K,)\) and its dtype is numpy.int32. Note that \(K \leq R\).

Return type:

array
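
A minimal sketch with made-up coordinates and scores:

>>> import numpy as np
>>> from chainercv.utils import non_maximum_suppression
>>> bbox = np.array(
...     [[0, 0, 4, 4], [0, 1, 4, 5], [10, 10, 14, 14]], dtype=np.float32)
>>> score = np.array([0.9, 0.6, 0.8], dtype=np.float32)
>>> keep = non_maximum_suppression(bbox, thresh=0.5, score=score)
>>> # keep contains [0, 2]: the second box is suppressed because its IoU
>>> # with the first box is 12 / 20 = 0.6, which exceeds thresh.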

Download Utilities
cached_download
chainercv.utils.cached_download(url)

Downloads a file and caches it.

This is different from chainer.dataset.cached_download in that the download progress is reported.

It downloads a file from the URL if there is no corresponding cache. After the download, this function stores a cache to the directory under the dataset root (see set_dataset_root()). If there is already a cache for the given URL, it just returns the path to the cache without downloading the same file.

Parameters:url (str) – URL to download from.
Returns:Path to the downloaded file.
Return type:str
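
A minimal usage sketch; the URL is a hypothetical placeholder:

>>> from chainercv.utils import cached_download
>>> # The first call downloads the file; later calls return the cached path.
>>> path = cached_download('http://example.com/data.zip')
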
download_model
chainercv.utils.download_model(url)

Downloads a model file and puts it under the model directory.

It downloads a file from the URL and puts it under the model directory. For example, if url is http://example.com/subdir/model.npz, the pretrained weights file will be saved to $CHAINER_DATASET_ROOT/pfnet/chainercv/models/model.npz. If there is already a file at the destination path, it just returns the path without downloading the same file.

Parameters:url (str) – URL to download from.
Returns:Path to the downloaded file.
Return type:str
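
A sketch reusing the hypothetical URL from the description above:

>>> from chainercv.utils import download_model
>>> path = download_model('http://example.com/subdir/model.npz')
>>> # path: $CHAINER_DATASET_ROOT/pfnet/chainercv/models/model.npz
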
extractall
chainercv.utils.extractall(file_path, destination, ext)

Extracts an archive file.

This function extracts an archive file to a destination.

Parameters:
  • file_path (str) – The path of a file to be extracted.
  • destination (str) – A directory path. The archive file will be extracted under this directory.
  • ext (str) – An extension suffix of the archive file. This function supports '.zip', '.tar', '.gz' and '.tgz'.
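
For illustration, a sketch combining cached_download with extractall; the URL is a hypothetical placeholder:

>>> from chainercv.utils import cached_download, extractall
>>> path = cached_download('http://example.com/archive.tar')
>>> extractall(path, 'data', '.tar')  # extracts the archive under ./data
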
Image Utilities
read_image
chainercv.utils.read_image(path, dtype=numpy.float32, color=True)

Read an image from a file.

This function reads an image from a given file. The image is in CHW format and the range of its values is \([0, 255]\). If color = True, the order of the channels is RGB.

Parameters:
  • path (str) – A path of an image file.
  • dtype – The type of array. The default value is float32.
  • color (bool) – This option determines the number of channels. If True, the number of channels is three. In this case, the order of the channels is RGB. This is the default behaviour. If False, this function returns a grayscale image.
Returns:

An image.

Return type:

ndarray
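
A minimal sketch, assuming a hypothetical file sample.png:

>>> from chainercv.utils import read_image
>>> img = read_image('sample.png')
>>> # img is a float32 array of shape (3, H, W) with values in [0, 255].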

tile_images
chainercv.utils.tile_images(imgs, n_col, pad=2, fill=0)

Make a tile of images.

Parameters:
  • imgs (numpy.ndarray) – A batch of images whose shape is BCHW.
  • n_col (int) – The number of columns in a tile.
  • pad (int) – Amount of pad. The default value is 2.
  • fill (float, tuple or ndarray) – The value of padded pixels. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of imgs.
Returns:

An image array in CHW format. The size of this image is \(((H + pad) \times \lceil B / n_{col} \rceil, (W + pad) \times n_{col})\).

Return type:

ndarray
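
A minimal sketch with a dummy batch of images:

>>> import numpy as np
>>> from chainercv.utils import tile_images
>>> imgs = np.zeros((8, 3, 32, 32), dtype=np.float32)  # B=8 dummy images
>>> tile = tile_images(imgs, n_col=4)
>>> # tile is a CHW array laying the eight images out in a 2 x 4 grid.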

write_image
chainercv.utils.write_image(img, path)

Save an image to a file.

This function saves an image to a given file. The image should be in CHW format and the range of its values should be \([0, 255]\).

Parameters:
  • img (ndarray) – An image to be saved.
  • path (str) – The path of an image file.
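
A round-trip sketch, reusing the hypothetical sample.png from the read_image example above and writing to a hypothetical copy.png:

>>> from chainercv.utils import read_image, write_image
>>> img = read_image('sample.png')
>>> write_image(img, 'copy.png')
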
Iterator Utilities
apply_prediction_to_iterator
chainercv.utils.apply_prediction_to_iterator(predict, iterator, hook=None)

Apply a prediction function/method to an iterator.

This function applies a prediction function/method to an iterator. It assumes that the iterator returns a batch of images or a batch of tuples whose first element is an image. In the latter case, the remaining elements are treated as ground truth values.

>>> imgs = next(iterator)
>>> # imgs: [img]
or
>>> batch = next(iterator)
>>> # batch: [(img, gt_val0, gt_val1)]

This function applies predict() to a batch of images and gets predicted value(s). predict() should take a batch of images and return a batch of prediction values or a tuple of batches of prediction values.

>>> pred_vals0 = predict(imgs)
>>> # pred_vals0: [pred_val0]
or
>>> pred_vals0, pred_vals1 = predict(imgs)
>>> # pred_vals0: [pred_val0]
>>> # pred_vals1: [pred_val1]

Here is an example, which applies a pretrained Faster R-CNN to the PASCAL VOC dataset.

>>> from chainer import iterators
>>>
>>> from chainercv.datasets import VOCDetectionDataset
>>> from chainercv.links import FasterRCNNVGG16
>>> from chainercv.utils import apply_prediction_to_iterator
>>>
>>> dataset = VOCDetectionDataset(year='2007', split='test')
>>> # next(iterator) -> [(img, gt_bbox, gt_label)]
>>> iterator = iterators.SerialIterator(
...     dataset, 2, repeat=False, shuffle=False)
>>>
>>> # model.predict([img]) -> ([pred_bbox], [pred_label], [pred_score])
>>> model = FasterRCNNVGG16(pretrained_model='voc07')
>>>
>>> imgs, pred_values, gt_values = apply_prediction_to_iterator(
...     model.predict, iterator)
>>>
>>> # pred_values contains three iterators
>>> pred_bboxes, pred_labels, pred_scores = pred_values
>>> # gt_values contains two iterators
>>> gt_bboxes, gt_labels = gt_values
Parameters:
  • predict – A callable that takes a batch of images and returns prediction.
  • iterator (chainer.Iterator) – An iterator. Each sample should have an image as its first element. This image is passed to predict() as an argument. The remaining elements are treated as ground truth values.
  • hook – A callable that is called after each iteration. imgs, pred_values and gt_values are passed as arguments. Note that these values do not contain data from the previous iterations.
Returns:

This function returns an iterator and two tuples of iterators: imgs, pred_values and gt_values.

  • imgs: An iterator that returns an image.
  • pred_values: A tuple of iterators. Each iterator returns a corresponding predicted value. For example, if predict() returns ([pred_val0], [pred_val1]), next(pred_values[0]) and next(pred_values[1]) will be pred_val0 and pred_val1.
  • gt_values: A tuple of iterators. Each iterator returns a corresponding ground truth value. For example, if the iterator returns [(img, gt_val0, gt_val1)], next(gt_values[0]) and next(gt_values[1]) will be gt_val0 and gt_val1. If the input iterator does not give any ground truth values, this tuple will be empty.

Return type:

An iterator and two tuples of iterators

unzip
chainercv.utils.unzip(iterable)

Converts an iterable of tuples into a tuple of iterators.

This function converts an iterable of tuples into a tuple of iterators. This is an inverse function of six.moves.zip().

>>> from chainercv.utils import unzip
>>> data = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
>>> int_iter, str_iter = unzip(data)
>>>
>>> next(int_iter)  # 0
>>> next(int_iter)  # 1
>>> next(int_iter)  # 2
>>>
>>> next(str_iter)  # 'a'
>>> next(str_iter)  # 'b'
>>> next(str_iter)  # 'c'
Parameters:iterable (iterable) – An iterable of tuples. All tuples should have the same length.
Returns:Each iterator corresponds to each element of the input tuples. Note that each iterator stores values until they are popped. To reduce memory usage, it is recommended to delete unused iterators.
Return type:tuple of iterators
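
Following the memory note above, a minimal sketch that keeps only one of the returned iterators:

>>> from chainercv.utils import unzip
>>> data = [(0, 'a'), (1, 'b'), (2, 'c')]
>>> int_iter, str_iter = unzip(data)
>>> del str_iter  # drop the unused iterator so its buffered values are freed
>>> list(int_iter)  # [0, 1, 2]
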
Testing Utilities
assert_is_bbox
chainercv.utils.assert_is_bbox(bbox, size=None)

Checks if bounding boxes satisfy the bounding box format.

This function checks if given bounding boxes satisfy the bounding box format or not. If the bounding boxes do not satisfy the format, this function raises an AssertionError.

Parameters:
  • bbox (ndarray) – Bounding boxes to be checked.
  • size (tuple of ints) – The size of an image. If this argument is specified, each bounding box should be within the image.
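
A minimal sketch with a made-up bounding box array:

>>> import numpy as np
>>> from chainercv.utils import assert_is_bbox
>>> bbox = np.array([[10, 20, 30, 40], [0, 0, 5, 5]], dtype=np.float32)
>>> assert_is_bbox(bbox, size=(64, 64))  # passes silently
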
assert_is_bbox_dataset
chainercv.utils.assert_is_bbox_dataset(dataset, n_fg_class, n_example=None)

Checks if a dataset satisfies bounding box dataset APIs.

This function checks if a given dataset satisfies bounding box dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an AssertionError.

Parameters:
  • dataset – A dataset to be checked.
  • n_fg_class (int) – The number of foreground classes.
  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.
assert_is_image
chainercv.utils.assert_is_image(img, color=True, check_range=True)

Checks if an image satisfies image format.

This function checks if a given image satisfies the image format or not. If the image does not satisfy the format, this function raises an AssertionError.

Parameters:
  • img (ndarray) – An image to be checked.
  • color (bool) – A boolean that determines the expected channel size. If it is True, the number of channels should be 3. Otherwise, it should be 1. The default value is True.
  • check_range (bool) – A boolean that determines whether the range of values is checked or not. If it is True, the values of the image must be in \([0, 255]\). Otherwise, this function does not check the range. The default value is True.
assert_is_label_dataset
chainercv.utils.assert_is_label_dataset(dataset, n_class, n_example=None, color=True)

Checks if a dataset satisfies label dataset APIs.

This function checks if a given dataset satisfies label dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an AssertionError.

Parameters:
  • dataset – A dataset to be checked.
  • n_class (int) – The number of classes.
  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.
  • color (bool) – A boolean that determines the expected channel size. If it is True, the number of channels should be 3. Otherwise, it should be 1. The default value is True.
assert_is_semantic_segmentation_dataset
chainercv.utils.assert_is_semantic_segmentation_dataset(dataset, n_class, n_example=None)

Checks if a dataset satisfies semantic segmentation dataset APIs.

This function checks if a given dataset satisfies semantic segmentation dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an AssertionError.

Parameters:
  • dataset – A dataset to be checked.
  • n_class (int) – The number of classes including background.
  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.
generate_random_bbox
chainercv.utils.generate_random_bbox(n, img_size, min_length, max_length)

Generate valid bounding boxes with random position and shape.

Parameters:
  • n (int) – The number of bounding boxes.
  • img_size (tuple) – A tuple of length 2. The height and the width of the image on which bounding boxes locate.
  • min_length (float) – The minimum length of edges of bounding boxes.
  • max_length (float) – The maximum length of edges of bounding boxes.
Returns:

Coordinates of bounding boxes. Its shape is \((R, 4)\). Here, \(R\) equals n. The second axis contains \(y_{min}, x_{min}, y_{max}, x_{max}\), where \(min\_length \leq y_{max} - y_{min} < max\_length\) and \(min\_length \leq x_{max} - x_{min} < max\_length\).

Return type:

numpy.ndarray
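
A minimal sketch generating a few boxes for a 256 x 256 image:

>>> from chainercv.utils import generate_random_bbox
>>> bbox = generate_random_bbox(4, (256, 256), 16, 128)
>>> # bbox has shape (4, 4); each edge length lies in [16, 128).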
