ChainerCV¶
ChainerCV is a deep learning based computer vision library built on top of Chainer.
Installation Guide¶
Anaconda¶
Build instructions using Anaconda are as follows.
# For Python 3, use Miniconda3 instead:
# wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
conda config --set always_yes yes --set changeps1 no
conda update -q conda
# Download ChainerCV and go to the root directory of ChainerCV
git clone https://github.com/chainer/chainercv
cd chainercv
conda env create -f environment.yml
source activate chainercv
# Install ChainerCV
pip install -e .
# Try our demos at examples/* !
ChainerCV Tutorial¶
Object Detection Tutorial¶
This tutorial will walk you through the features related to object detection that ChainerCV supports.
We assume that readers have a basic understanding of the Chainer framework (e.g. understand chainer.Link).
For users new to Chainer, please first read Introduction to Chainer.
In ChainerCV, we define object detection as the task of localizing objects in an image with bounding boxes and assigning them category labels. ChainerCV supports the task by providing the following features:
- Visualization
- BboxDataset
- Detection Link
- DetectionEvaluator
- Training script for various detection models
Here is a short example that runs inference and visualizes the output.
Please download the image from the link below and save it as sample.jpg.
https://cloud.githubusercontent.com/assets/2062128/26187667/9cb236da-3bd5-11e7-8bcf-7dbd4302e2dc.jpg
# In the rest of the tutorial, we assume that `plt`
# is imported before every code snippet.
import matplotlib.pyplot as plt
from chainercv.datasets import voc_bbox_label_names
from chainercv.links import SSD300
from chainercv.utils import read_image
from chainercv.visualizations import vis_bbox
# Read an RGB image and return it in CHW format.
img = read_image('sample.jpg')
model = SSD300(pretrained_model='voc0712')
bboxes, labels, scores = model.predict([img])
vis_bbox(img, bboxes[0], labels[0], scores[0],
label_names=voc_bbox_label_names)
plt.show()
Bounding boxes in ChainerCV¶
Bounding boxes in an image are represented as a two-dimensional array of shape \((R, 4)\),
where \(R\) is the number of bounding boxes and the second axis corresponds to the coordinates of each bounding box.
The coordinates are ordered in the array as (y_min, x_min, y_max, x_max), where (y_min, x_min) and (y_max, x_max) are the (y, x) coordinates of the top-left and bottom-right vertices.
Note that ChainerCV orders coordinates in yx order, which is the opposite of the convention used by other libraries such as OpenCV.
This convention is adopted because it is more consistent with the memory order of an image, which follows row-column order.
Also, the dtype of a bounding box array is numpy.float32.
Here is an example with simple toy data.
from chainercv.visualizations import vis_bbox
import numpy as np
img = np.zeros((3, 224, 224), dtype=np.float32)
# Throughout the library, an array of bounding boxes is called `bbox`
bbox = np.array([[10, 10, 20, 40], [150, 150, 200, 200]], dtype=np.float32)
vis_bbox(img, bbox)
plt.show()
In this example, two bounding boxes are displayed on top of a black image.
vis_bbox() is a utility function that visualizes bounding boxes and an image together.
Bounding Box Dataset¶
ChainerCV supports dataset loaders, which can be used to easily index examples with list-like interfaces.
Dataset classes whose names end with BboxDataset contain annotations of where objects are located in an image and which categories they are assigned to.
These datasets can be indexed to return a tuple of an image, bounding boxes and labels.
The labels are stored in an np.int32 array of shape \((R,)\). Each element corresponds to the label of the object in the corresponding bounding box.
The mapping between an integer label and a category differs between datasets.
This mapping can be obtained from objects whose names end with label_names, such as voc_bbox_label_names.
Such mappings are helpful when bounding boxes need to be visualized with label names.
The next example illustrates the interface of BboxDataset and the functionality of vis_bbox() to visualize label names.
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.visualizations import vis_bbox
dataset = VOCBboxDataset(year='2012')
img, bbox, label = dataset[0]
print(bbox.shape) # (2, 4)
print(label.shape) # (2,)
vis_bbox(img, bbox, label, label_names=voc_bbox_label_names)
plt.show()
Note that the example downloads the VOC 2012 dataset at runtime when it is used for the first time on the machine.
Detection Link¶
ChainerCV provides several network implementations that carry out object detection.
For example, Single Shot MultiBox Detector (SSD) [Liu16] and Faster R-CNN [Ren15] are supported.
Although the models differ in how prediction is carried out internally, they support a common prediction method called predict().
This method takes a list of images and returns the prediction result, which is a tuple of lists bboxes, labels, scores.
A more detailed description can be found in the documentation of predict().
The necessary pre-trained weights are downloaded from the internet automatically, so inference on these models runs out of the box.
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.links import SSD300
from chainercv.visualizations import vis_bbox
dataset = VOCBboxDataset(year='2007', split='test')
img_0, _, _ = dataset[0]
img_1, _, _ = dataset[1]
model = SSD300(pretrained_model='voc0712')
# Note that `predict` takes a list of images.
bboxes, labels, scores = model.predict([img_0, img_1])
# Visualize output of the first image on the left and
# the second image on the right.
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
vis_bbox(img_0, bboxes[0], labels[0], scores[0],
label_names=voc_bbox_label_names, ax=ax1)
vis_bbox(img_1, bboxes[1], labels[1], scores[1],
label_names=voc_bbox_label_names, ax=ax2)
plt.show()
The above example puts together the functionality of a detection link.
It instantiates an SSD300 model with weights trained on the VOC 2007 and VOC 2012 datasets.
The model runs prediction using predict(), and the outputs are visualized using vis_bbox().
Note that in this case, confidence scores are visualized together with the other data.
Many detection algorithms post-process bounding box proposals calculated from the output of neural networks by removing unnecessary ones.
Faster R-CNN and SSD use non-maximum suppression to remove overlapping bounding boxes.
Also, they remove bounding boxes with low confidence scores.
These two models have the attributes nms_thresh and score_thresh, which configure the post-processing.
In the following example, the algorithm runs with a very low score_thresh so that bounding boxes with low scores are kept.
It is known that a lower score_thresh produces a higher mAP.
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.links import SSD300
from chainercv.visualizations import vis_bbox
dataset = VOCBboxDataset(year='2007', split='test')
img, _, _ = dataset[0]
model = SSD300(pretrained_model='voc0712')
# Alternatively, you can use predefined parameters by
# model.use_preset('evaluate')
model.score_thresh = 0.01
bboxes, labels, scores = model.predict([img])
vis_bbox(img, bboxes[0], labels[0], scores[0],
label_names=voc_bbox_label_names)
plt.show()
Detection Evaluator¶
ChainerCV provides functionalities that make evaluating detection links easy. They are provided at two levels: evaluator extensions and evaluation functions.
Evaluator extensions such as DetectionVOCEvaluator inherit from Evaluator and have a similar interface.
They are initialized with an iterator and a network that carries out prediction with the method predict().
When this class is called (i.e. __call__() of DetectionVOCEvaluator), several actions are taken.
First, it iterates over the dataset based on the iterator.
Second, the network makes predictions using the images collected from the dataset.
Last, an evaluation function is called with the ground truth annotations and the prediction results.
In contrast to evaluators that hide details, evaluation functions such as eval_detection_voc() are provided for those who need a finer level of control.
These functions take the ground truth annotations and prediction results as arguments and return the measured performance.
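As a minimal sketch (using hypothetical toy arrays instead of real dataset output), the evaluation function can also be called directly:
import numpy as np
from chainercv.evaluations import eval_detection_voc
# One image with a single ground truth box of class 0 and one
# overlapping prediction with a high score.
gt_bboxes = [np.array([[10, 10, 50, 50]], dtype=np.float32)]
gt_labels = [np.array([0], dtype=np.int32)]
pred_bboxes = [np.array([[12, 12, 48, 48]], dtype=np.float32)]
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.9], dtype=np.float32)]
result = eval_detection_voc(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
print(result['ap'])   # per-class average precision
print(result['map'])  # mean average precision over classes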
Here is a simple example that uses a detection evaluator.
from chainer.iterators import SerialIterator
from chainer.datasets import SubDataset
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator
from chainercv.links import SSD300
# Only use a subset of the dataset so that evaluation finishes quickly.
dataset = VOCBboxDataset(year='2007', split='test')
dataset = dataset[:6]
it = SerialIterator(dataset, 2, repeat=False, shuffle=False)
model = SSD300(pretrained_model='voc0712')
evaluator = DetectionVOCEvaluator(it, model,
label_names=voc_bbox_label_names)
# result is a dictionary of evaluation scores. Print it and check it.
result = evaluator()
Training Detection Links¶
By putting together all the functions and utilities, training scripts can be easily written. Please check the training scripts contained in the examples. Also, ChainerCV reports the performance achieved by running those training scripts in the README.
References¶
[Ren15] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.
[Liu16] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Tips using Links¶
Fine-tuning¶
Models in ChainerCV support the argument pretrained_model to load pretrained weights.
This functionality is of limited use when fine-tuning pretrained weights, because the layers specific to the classes of the original dataset may need to be randomly initialized.
In this section, we give a procedure to cope with this problem.
Copying a subset of weights in a chain can be done in a few lines of code. Here is a block of code that does this.
# src is a model with pretrained weights
# dst is a randomly initialized model
# ignore_names is a list of the names of parameters to skip
# For the case of VGG16, this should be ['/fc7/W', '/fc7/b']
ignore_names = []
src_params = {p[0]: p[1] for p in src.namedparams()}
for dst_named_param in dst.namedparams():
    name = dst_named_param[0]
    if name not in ignore_names:
        dst_named_param[1].array[:] = src_params[name].array[:]
Fine-tuning to a dataset with a different number of classes¶
When the number of classes in the target dataset differs from that of the source dataset during fine-tuning, the names of the weights to skip can be found automatically with the following method.
def get_shape_mismatch_names(src, dst):
    # all parameters are assumed to be initialized
    mismatch_names = []
    src_params = {p[0]: p[1] for p in src.namedparams()}
    for dst_named_param in dst.namedparams():
        name = dst_named_param[0]
        dst_param = dst_named_param[1]
        src_param = src_params[name]
        if src_param.shape != dst_param.shape:
            mismatch_names.append(name)
    return mismatch_names
Finally, this is a complete example using SSD300.
from chainercv.links import SSD300
import numpy as np
src = SSD300(pretrained_model='voc0712')
# the number of classes in VOC is different from 50
dst = SSD300(n_fg_class=50)
# initialize uninitialized weights by running a dummy forward pass
dst(np.zeros((1, 3, dst.insize, dst.insize), dtype=np.float32))
# the method described above
ignore_names = get_shape_mismatch_names(src, dst)
src_params = {p[0]: p[1] for p in src.namedparams()}
for dst_named_param in dst.namedparams():
    name = dst_named_param[0]
    if name not in ignore_names:
        dst_named_param[1].array[:] = src_params[name].array[:]
# check that the weights are transferred
np.testing.assert_equal(dst.extractor.conv1_1.W.data,
                        src.extractor.conv1_1.W.data)
# the names of the weights that were skipped
print(ignore_names)
Sliceable Dataset¶
This tutorial will walk you through the features related to sliceable datasets.
We assume that readers have a basic understanding of Chainer datasets (e.g. understand chainer.dataset.DatasetMixin).
In ChainerCV, we introduce the sliceable feature to datasets.
Sliceable datasets support slice(), which returns a view of the dataset.
The following example shows the basic usage.
# VOCBboxDataset supports sliceable feature
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()
# keys returns the names of data
print(dataset.keys) # ('img', 'bbox', 'label')
# we can get an example by []
img, bbox, label = dataset[0]
# get a view of the first 100 examples
view = dataset.slice[:100]
print(len(view)) # 100
# get a view of image and label
view = dataset.slice[:, ('img', 'label')]
# the view is also sliceable, so we can access keys
print(view.keys) # ('img', 'label')
# we can get an example by []
img, label = view[0]
Motivation¶
slice() returns a view of the dataset without actually loading data, whereas DatasetMixin.__getitem__() calls get_example() for all required examples.
Users can write efficient code by using these views.
This example counts the number of images that contain dogs. With the sliceable feature, we can access the label information without loading the images from disk. Therefore, the first case runs faster.
import time
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
dataset = VOCBboxDataset()
dog_lb = voc_bbox_label_names.index('dog')
# with slice
t = time.time()
count = 0
# get a view of label
view = dataset.slice[:, 'label']
for i in range(len(view)):
    # we can focus on label
    label = view[i]
    if dog_lb in label:
        count += 1
print('w/ slice: {} secs'.format(time.time() - t))
print('{} images contain dogs'.format(count))
print()
# without slice
t = time.time()
count = 0
for i in range(len(dataset)):
    # img and bbox are loaded but not needed
    img, bbox, label = dataset[i]
    if dog_lb in label:
        count += 1
print('w/o slice: {} secs'.format(time.time() - t))
print('{} images contain dogs'.format(count))
print()
Usage: slice along the axis of examples¶
slice() takes indices of examples as its first argument.
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()
# the view of the first 100 examples
view = dataset.slice[:100]
# the view of the last 100 examples
view = dataset.slice[-100:]
# the view of the 3rd, 5th, and 7th examples
view = dataset.slice[3:8:2]
# the view of the 3rd, 1st, and 4th examples
view = dataset.slice[[3, 1, 4]]
Also, it can take a list of booleans as its first argument. Note that the length of the list should be the same as len(dataset).
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()
# make booleans
bboxes = dataset.slice[:, 'bbox']
booleans = [len(bbox) >= 3 for bbox in bboxes]
# a collection of samples that contain at least three bounding boxes
view = dataset.slice[booleans]
Usage: slice along the axis of data¶
slice() takes names or indices of data as its second argument. keys returns all available names.
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()
# the view of image
# note that : of the first argument means all examples
view = dataset.slice[:, 'img']
print(view.keys) # 'img'
img = view[0]
# the view of image and label
view = dataset.slice[:, ('img', 'label')]
print(view.keys) # ('img', 'label')
img, label = view[0]
# the view of image (returns a tuple)
view = dataset.slice[:, ('img',)]
print(view.keys) # ('img',)
img, = view[0]
# use an index instead of a name
view = dataset.slice[:, 1]
print(view.keys) # 'bbox'
bbox = view[0]
# mixture of names and indices
view = dataset.slice[:, (1, 'label')]
print(view.keys) # ('bbox', 'label')
bbox, label = view[0]
# use booleans
# note that the number of booleans should be the same as len(dataset.keys)
view = dataset.slice[:, (True, True, False)]
print(view.keys) # ('img', 'bbox')
img, bbox = view[0]
Usage: slice along both axes¶
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()
# the view of the labels of the first 100 examples
view = dataset.slice[:100, 'label']
Concatenate and transform¶
ChainerCV provides ConcatenatedDataset and TransformDataset.
The difference from chainer.datasets.ConcatenatedDataset and chainer.datasets.TransformDataset is that they take sliceable dataset(s) and return a sliceable dataset.
from chainercv.chainer_experimental.datasets.sliceable import ConcatenatedDataset
from chainercv.chainer_experimental.datasets.sliceable import TransformDataset
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
dataset_07 = VOCBboxDataset(year='2007')
print('07:', dataset_07.keys, len(dataset_07)) # 07: ('img', 'bbox', 'label') 2501
dataset_12 = VOCBboxDataset(year='2012')
print('12:', dataset_12.keys, len(dataset_12)) # 12: ('img', 'bbox', 'label') 5717
# concatenate
dataset_0712 = ConcatenatedDataset(dataset_07, dataset_12)
print('0712:', dataset_0712.keys, len(dataset_0712)) # 0712: ('img', 'bbox', 'label') 8218
# transform
def transform(in_data):
    img, bbox, label = in_data
    dog_lb = voc_bbox_label_names.index('dog')
    bbox_dog = bbox[label == dog_lb]
    return img, bbox_dog
# we need to specify the names of data that the transform function returns
dataset_0712_dog = TransformDataset(dataset_0712, ('img', 'bbox_dog'), transform)
print('0712_dog:', dataset_0712_dog.keys, len(dataset_0712_dog)) # 0712_dog: ('img', 'bbox_dog') 8218
Make your own dataset¶
ChainerCV provides GetterDataset to construct a new sliceable dataset.
This example implements a sliceable bounding box dataset.
import numpy as np
from chainercv.chainer_experimental.datasets.sliceable import GetterDataset
from chainercv.utils import generate_random_bbox
class SampleBboxDataset(GetterDataset):
    def __init__(self):
        super(SampleBboxDataset, self).__init__()
        # register getter method for image
        self.add_getter('img', self.get_image)
        # register getter method for bbox and label
        self.add_getter(('bbox', 'label'), self.get_annotation)

    def __len__(self):
        return 20

    def get_image(self, i):
        print('get_image({})'.format(i))
        # generate dummy image
        img = np.random.uniform(0, 255, size=(3, 224, 224)).astype(np.float32)
        return img

    def get_annotation(self, i):
        print('get_annotation({})'.format(i))
        # generate dummy annotations
        bbox = generate_random_bbox(10, (224, 224), 10, 224)
        label = np.random.randint(0, 9, size=10).astype(np.int32)
        return bbox, label
dataset = SampleBboxDataset()
img, bbox, label = dataset[0] # get_image(0) and get_annotation(0)
view = dataset.slice[:, 'label']
label = view[1] # get_annotation(1)
If you have arrays of data, you can use TupleDataset.
import numpy as np
from chainercv.chainer_experimental.datasets.sliceable import TupleDataset
from chainercv.utils import generate_random_bbox
n = 20
imgs = np.random.uniform(0, 255, size=(n, 3, 224, 224)).astype(np.float32)
bboxes = [generate_random_bbox(10, (224, 224), 10, 224) for _ in range(n)]
labels = np.random.randint(0, 9, size=(n, 10)).astype(np.int32)
dataset = TupleDataset(('img', imgs), ('bbox', bboxes), ('label', labels))
print(dataset.keys) # ('img', 'bbox', 'label')
view = dataset.slice[:, 'label']
label = view[1]
ChainerCV Reference Manual¶
Chainer Experimental¶
This module contains WIP modules of Chainer. After they are merged into chainer, these modules will be removed from ChainerCV.
Datasets¶
Sliceable¶
This module supports the sliceable feature. Please note that this module will be removed after Chainer implements the sliceable feature.
class chainercv.chainer_experimental.datasets.sliceable.ConcatenatedDataset(*datasets)¶
A sliceable version of chainer.datasets.ConcatenatedDataset.
Here is an example.
>>> dataset_a = TupleDataset([0, 1, 2], [0, 1, 4])
>>> dataset_b = TupleDataset([3, 4, 5], [9, 16, 25])
>>>
>>> dataset = ConcatenatedDataset(dataset_a, dataset_b)
>>> dataset.slice[:, 0][:]  # [0, 1, 2, 3, 4, 5]
Parameters: datasets – The underlying datasets. Each dataset should inherit SliceableDataset and should have the same keys.
class chainercv.chainer_experimental.datasets.sliceable.GetterDataset¶
A sliceable dataset class that is defined with getters.
Please refer to the tutorial for a more detailed explanation.
Here is an example.
>>> class SliceableLabeledImageDataset(GetterDataset):
>>>     def __init__(self, pairs, root='.'):
>>>         super(SliceableLabeledImageDataset, self).__init__()
>>>         with open(pairs) as f:
>>>             self._pairs = [l.split() for l in f]
>>>         self._root = root
>>>
>>>         self.add_getter('img', self.get_image)
>>>         self.add_getter('label', self.get_label)
>>>
>>>     def __len__(self):
>>>         return len(self._pairs)
>>>
>>>     def get_image(self, i):
>>>         path, _ = self._pairs[i]
>>>         return read_image(os.path.join(self._root, path))
>>>
>>>     def get_label(self, i):
>>>         _, label = self._pairs[i]
>>>         return np.int32(label)
>>>
>>> dataset = SliceableLabeledImageDataset('list.txt')
>>>
>>> # get a subset with label = 0, 1, 2
>>> # no images are loaded
>>> indices = [i for i, label in
...            enumerate(dataset.slice[:, 'label']) if label in {0, 1, 2}]
>>> dataset_012 = dataset.slice[indices]
class chainercv.chainer_experimental.datasets.sliceable.TupleDataset(*datasets)¶
A sliceable version of chainer.datasets.TupleDataset.
Here is an example.
>>> # omit keys
>>> dataset = TupleDataset([0, 1, 2], [0, 1, 4])
>>> dataset.keys  # (None, None)
>>> dataset.slice[:, 0][:]  # [0, 1, 2]
>>>
>>> dataset_more = TupleDataset(dataset, [0, 1, 8])
>>> dataset_more.keys  # (None, None, None)
>>> dataset_more.slice[:, [1, 2]][:]  # [(0, 0), (1, 1), (4, 8)]
>>>
>>> # specify the name of a key
>>> named_dataset = TupleDataset(('feat0', [0, 1, 2]), [0, 1, 4])
>>> named_dataset.keys  # ('feat0', None)
>>> # slice takes both key and index (or their mixture)
>>> named_dataset.slice[:, ['feat0', 1]][:]  # [(0, 0), (1, 1), (2, 4)]
Parameters: datasets – The underlying datasets. The following datasets are acceptable.
- An inheritance of SliceableDataset.
- A tuple of a name and a data array. The data array should be a list or numpy.ndarray.
- A data array. In this case, the name of the key is None.
class chainercv.chainer_experimental.datasets.sliceable.TransformDataset(dataset, keys, transform)¶
A sliceable version of chainer.datasets.TransformDataset.
Note that it requires keys to determine the names of the returned values.
Here is an example.
>>> def transform(in_data):
>>>     img, bbox, label = in_data
>>>     ...
>>>     return new_img, new_label
>>>
>>> dataset = TransformDataset(dataset, ('img', 'label'), transform)
>>> dataset.keys  # ('img', 'label')
Parameters: - dataset – The underlying dataset. This dataset should have __len__() and __getitem__().
- keys (int or string or tuple of strings) – The number or name(s) of data that the transform function returns.
- transform (callable) – A function that is called to transform values returned by the underlying dataset's __getitem__().
Training¶
Extensions¶
chainercv.chainer_experimental.training.extensions.make_shift(attr, optimizer=None)¶
Decorator to make shift extensions.
This decorator wraps a function and makes a shift extension. The base function should take trainer and return a new value of attr.
Here is an example.
>>> # define an extension that updates the 'lr' attribute
>>> @make_shift('lr')
>>> def warmup(trainer):
>>>     base_lr = 0.01
>>>     rate = 0.1
>>>
>>>     iteration = trainer.updater.iteration
>>>     if iteration < 1000:
>>>         return base_lr * (rate + (1 - rate) * iteration / 1000)
>>>     else:
>>>         return base_lr
>>>
>>> # use the extension
>>> trainer.extend(warmup)
Parameters: - attr (str) – Name of the attribute to shift.
- optimizer (Optimizer) – Target optimizer to adjust the attribute. If it is None, the main optimizer of the updater is used.
Datasets¶
General datasets¶
DirectoryParsingLabelDataset¶
class chainercv.datasets.DirectoryParsingLabelDataset(root, check_img_file=None, color=True, numerical_sort=False)¶
A label dataset whose label names are the names of the subdirectories.
The label names are the names of the directories located one layer below the root directory. All images under a subdirectory are categorized into the class with that subdirectory's name. An image is parsed only when the function check_img_file returns True when given the path to the image as an argument. If check_img_file is None, any path with an image extension is parsed.
Example
A directory structure should look like below.
root
|-- class_0
|   |-- img_0.png
|   |-- img_1.png
|-- class_1
    |-- img_0.png
>>> from chainercv.datasets import DirectoryParsingLabelDataset
>>> dataset = DirectoryParsingLabelDataset('root')
>>> dataset.paths
['root/class_0/img_0.png', 'root/class_0/img_1.png',
 'root/class_1/img_0.png']
>>> dataset.labels
array([0, 0, 1])
Parameters: - root (string) – The root directory.
- check_img_file (callable) – A function to determine if a file should be included in the dataset.
- color (bool) – If True, this dataset reads images as color images. The default value is True.
- numerical_sort (bool) – Label names are sorted numerically. This means that label 2 comes before label 10, which is not the case when string sort is used. Regardless of this option, string sort is used for the order of files with the same label. The default value is False.

This dataset returns the following data.

name  | shape             | dtype   | format
img   | \((3, H, W)\) [1] | float32 | RGB, \([0, 255]\)
label | scalar            | int32   | \([0, \#class - 1]\)

[1] \((1, H, W)\) if color = False.
directory_parsing_label_names¶
chainercv.datasets.directory_parsing_label_names(root, numerical_sort=False)¶
Get label names from the directories that are named after them.
The label names are the names of the directories located one layer below the root directory.
The label names can be used together with DirectoryParsingLabelDataset. The index of a label name corresponds to the label id that is used by the dataset to refer to the label.
Parameters: - root (string) – The root directory.
- numerical_sort (bool) – Sort label names numerically. See DirectoryParsingLabelDataset.
Returns: Sorted names of classes.
Return type: list of strings
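For instance, with the directory structure shown above for DirectoryParsingLabelDataset, the function can be used as follows (a short sketch; 'root' is the hypothetical example directory above).
from chainercv.datasets import directory_parsing_label_names
label_names = directory_parsing_label_names('root')
print(label_names)  # ['class_0', 'class_1']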
MixUpSoftLabelDataset¶
class chainercv.datasets.MixUpSoftLabelDataset(dataset, n_class)¶
Dataset which returns mixed images and labels for mixup learning [2].
MixUpSoftLabelDataset mixes two pairs of labeled images fetched from the base dataset.
Unlike LabeledImageDatasets, the label is a one-dimensional float array with at most two nonnegative weights (i.e. a soft label). The sum of the two weights is one.
Example
We construct a mixup dataset from MNIST.
>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> from chainercv.datasets import MixUpSoftLabelDataset
>>> mnist, _ = get_mnist()
>>> base_dataset = SiameseDataset(mnist, mnist)
>>> dataset = MixUpSoftLabelDataset(base_dataset, 10)
>>> mixed_image, mixed_label = dataset[0]
>>> mixed_label.shape
(10,)
>>> mixed_label.dtype
dtype('float32')
Parameters: - dataset – The underlying dataset. The dataset returns img_0, label_0, img_1, label_1, which is a tuple containing two pairs of an image and a label. Typically, dataset is SiameseDataset. The shapes of images and labels should be constant.
- n_class (int) – The number of classes in the base dataset.
[2] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz. mixup: Beyond Empirical Risk Minimization. arXiv 2017.

This dataset returns the following data.

name  | shape          | dtype   | format
img   | [3]            | [3]     | [3]
label | \((\#class,)\) | float32 | \([0, 1]\)

[3] Same as dataset.
SiameseDataset¶
class chainercv.datasets.SiameseDataset(dataset_0, dataset_1, pos_ratio=None, length=None, labels_0=None, labels_1=None)¶
A dataset that returns samples fetched from two datasets.
The dataset returns samples from the two base datasets. If pos_ratio is not None, SiameseDataset can be configured to return positive pairs at the ratio of pos_ratio and negative pairs at the ratio of 1 - pos_ratio. In this mode, the base datasets are assumed to be label datasets that return an image and a label as a sample.
Example
We construct a siamese dataset from MNIST.
>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> mnist, _ = get_mnist()
>>> dataset = SiameseDataset(mnist, mnist, pos_ratio=0.3)
# The probability of the two samples having the same label
# is 0.3 as specified by pos_ratio.
>>> img_0, label_0, img_1, label_1 = dataset[0]
# The returned examples may change in the next
# call even if the index is the same as before
# because SiameseDataset picks examples randomly
# (e.g., img_0_new may differ from img_0).
>>> img_0_new, label_0_new, img_1_new, label_1_new = dataset[0]
Parameters: - dataset_0 – The first base dataset.
- dataset_1 – The second base dataset.
- pos_ratio (float) – If this is not None, this dataset tries to construct positive pairs at the given rate. If None, this dataset randomly samples examples from the base datasets. The default value is None.
- length (int) – The length of this dataset. If None, the length of the first base dataset is the length of this dataset.
- labels_0 (numpy.ndarray) – The labels associated with the first base dataset. The length should be the same as the length of the first dataset. If this is None, the labels are automatically fetched using the following line of code: [ex[1] for ex in dataset_0]. By setting labels_0 and skipping the fetching iteration, the computation cost can be reduced. Also, if pos_ratio is None, this value is ignored. The default value is None. If labels_1 is specified and dataset_0 and dataset_1 are the same, labels_0 can be skipped.
- labels_1 (numpy.ndarray) – The labels associated with the second base dataset. If labels_0 is specified and dataset_0 and dataset_1 are the same, labels_1 can be skipped. Please consult the explanation for labels_0.

This dataset returns the following data.

name    | shape  | dtype | format
img_0   | [4]    | [4]   | [4]
label_0 | scalar | int32 | \([0, \#class - 1]\)
img_1   | [5]    | [5]   | [5]
label_1 | scalar | int32 | \([0, \#class - 1]\)

[4] Same as dataset_0.
[5] Same as dataset_1.
ADE20K¶
ADE20KSemanticSegmentationDataset¶
class chainercv.datasets.ADE20KSemanticSegmentationDataset(data_dir='auto', split='train')¶
Semantic segmentation dataset for ADE20K.
This is the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 20,210 training images and 2,000 validation images.
Parameters: - data_dir (string) – Path to the dataset directory. The directory should contain the ADEChallengeData2016 directory, and that directory should contain at least the images and annotations directories. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.
- split ({'train', 'val'}) – Select from dataset splits used in the MIT Scene Parsing Benchmark dataset (ADE20K).

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)
ADE20KTestImageDataset¶
class chainercv.datasets.ADE20KTestImageDataset(data_dir='auto')¶
Image dataset for the test split of ADE20K.
This is an image dataset of the test split of the ADE20K dataset distributed at the MIT Scene Parsing Benchmark website. It has 3,352 test images.
Parameters: data_dir (string) – Path to the dataset directory. The directory should contain the release_test directory. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.

This dataset returns the following data.

name | shape         | dtype   | format
img  | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
CamVid¶
CamVidDataset¶
class chainercv.datasets.CamVidDataset(data_dir='auto', split='train')¶
Semantic segmentation dataset for CamVid.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/camvid.
- split ({'train', 'val', 'test'}) – Select from dataset splits used in the CamVid dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)
Cityscapes¶
CityscapesSemanticSegmentationDataset¶
class chainercv.datasets.CityscapesSemanticSegmentationDataset(data_dir='auto', label_resolution=None, split='train', ignore_labels=True)¶
Semantic segmentation dataset for the Cityscapes dataset.
Note
Please manually download the data because it is not allowed to re-distribute the Cityscapes dataset.
Parameters: - data_dir (string) – Path to the dataset directory. The directory should contain at least two directories, leftImg8bit and either gtFine or gtCoarse. If auto is given, it uses $CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes by default.
- label_resolution ({'fine', 'coarse'}) – The resolution of the labels. It should be either fine or coarse.
- split ({'train', 'val'}) – Select from dataset splits used in the Cityscapes dataset.
- ignore_labels (bool) – If True, the labels marked ignoreInEval defined in the original cityscapesScripts will be replaced with -1 in the get_example() method. The default value is True.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)
CityscapesTestImageDataset¶
-
class
chainercv.datasets.
CityscapesTestImageDataset
(data_dir='auto')[source]¶ Image dataset for test split of Cityscapes dataset.
Note
Please manually download the data because it is not allowed to re-distribute Cityscapes dataset.
Parameters: data_dir (string) – Path to the dataset directory. The directory should contain the leftImg8bit
directory. Ifauto
is given, it uses$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes
by default.This dataset returns the following data.
name shape dtype format img
\((3, H, W)\) float32
RGB, \([0, 255]\)
CUB¶
CUBLabelDataset¶
class chainercv.datasets.CUBLabelDataset(data_dir='auto', return_bb=False, prob_map_dir='auto', return_prob_map=False)¶
Caltech-UCSD Birds-200-2011 dataset with annotated class labels.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
- return_bb (bool) – If True, this returns a bounding box around a bird. The default value is False.
- prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
- return_prob_map (bool) – Decide whether to include a probability map of the bird in a tuple served for a query. The default value is False.

This dataset returns the following data.

name         | shape         | dtype   | format
img          | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label        | scalar        | int32   | \([0, \#class - 1]\)
bb [6]       | \((4,)\)      | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
prob_map [7] | \((H, W)\)    | float32 | \([0, 1]\)

[6] bb indicates the location of a bird. It is available if return_bb = True.
[7] prob_map indicates how likely a bird is located at each pixel. It is available if return_prob_map = True.
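A short usage sketch (the data is downloaded automatically on first use; the order of the returned values follows the table above).
from chainercv.datasets import CUBLabelDataset
# With return_bb=True, each example additionally contains a bounding box.
dataset = CUBLabelDataset(return_bb=True)
img, label, bb = dataset[0]
print(img.shape)  # (3, H, W)
print(bb)         # (y_min, x_min, y_max, x_max)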
CUBPointDataset¶
class chainercv.datasets.CUBPointDataset(data_dir='auto', return_bb=False, prob_map_dir='auto', return_prob_map=False)¶
Caltech-UCSD Birds-200-2011 dataset with annotated points.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
- return_bb (bool) – If True, this returns a bounding box around a bird. The default value is False.
- prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
- return_prob_map (bool) – Decide whether to include a probability map of the bird in a tuple served for a query. The default value is False.

This dataset returns the following data.

name         | shape         | dtype   | format
img          | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
point        | \((P, 2)\)    | float32 | \((y, x)\)
mask         | \((P,)\)      | bool    | –
bb [8]       | \((4,)\)      | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
prob_map [9] | \((H, W)\)    | float32 | \([0, 1]\)

[8] bb indicates the location of a bird. It is available if return_bb = True.
[9] prob_map indicates how likely a bird is located at each pixel. It is available if return_prob_map = True.
MS COCO¶
COCOBboxDataset¶
class chainercv.datasets.COCOBboxDataset(data_dir='auto', split='train', year='2017', use_crowded=False, return_area=False, return_crowded=False)¶
Bounding box dataset for MS COCO.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.
- split ({'train', 'val', 'minival', 'valminusminival'}) – Select a split of the dataset.
- year ({'2014', '2017'}) – Use a dataset released in year. Splits minival and valminusminival are only supported in year 2014.
- use_crowded (bool) – If true, use bounding boxes that are labeled as crowded in the original annotation. The default value is False.
- return_area (bool) – If true, this dataset returns areas of masks around objects. The default value is False.
- return_crowded (bool) – If true, this dataset returns a boolean array that indicates whether bounding boxes are labeled as crowded or not. The default value is False.

This dataset returns the following data.

name           | shape         | dtype   | format
img            | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
bbox [10]      | \((R, 4)\)    | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
label [10]     | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
area [10] [11] | \((R,)\)      | float32 | –
crowded [12]   | \((R,)\)      | bool    | –

[10] If use_crowded = True, bbox, label and area contain crowded instances.
[11] area is available if return_area = True.
[12] crowded is available if return_crowded = True.

When there are more than ten objects from the same category, bounding boxes correspond to a crowd of instances instead of individual instances. Please see more detail in Fig. 12 (e) of the summary paper [13].
[13] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar. Microsoft COCO: Common Objects in Context. arXiv 2014.
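A short usage sketch (the returned values follow the table above; the data is downloaded automatically on first use).
from chainercv.datasets import COCOBboxDataset
dataset = COCOBboxDataset(year='2017', split='val')
img, bbox, label = dataset[0]
print(bbox.shape)   # (R, 4)
print(label.shape)  # (R,)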
COCOInstanceSegmentationDataset¶
class chainercv.datasets.COCOInstanceSegmentationDataset(data_dir='auto', split='train', year='2017', use_crowded=False, return_crowded=False, return_area=False)¶
Instance segmentation dataset for MS COCO.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.
- split ({'train', 'val', 'minival', 'valminusminival'}) – Select a split of the dataset.
- year ({'2014', '2017'}) – Use a dataset released in year. Splits minival and valminusminival are only supported in year 2014.
- use_crowded (bool) – If true, use masks that are labeled as crowded in the original annotation.
- return_crowded (bool) – If true, this dataset returns a boolean array that indicates whether masks are labeled as crowded or not. The default value is False.
- return_area (bool) – If true, this dataset returns areas of masks around objects.

This dataset returns the following data.

name           | shape         | dtype   | format
img            | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
mask [14]      | \((R, H, W)\) | bool    | –
label [14]     | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
area [14] [15] | \((R,)\)      | float32 | –
crowded [16]   | \((R,)\)      | bool    | –

[14] If use_crowded = True, mask, label and area contain crowded instances.
[15] area is available if return_area = True.
[16] crowded is available if return_crowded = True.

When there are more than ten objects from the same category, masks correspond to a crowd of instances instead of individual instances. Please see more detail in Fig. 12 (e) of the summary paper [17].
[17] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar. Microsoft COCO: Common Objects in Context. arXiv 2014.
COCOSemanticSegmentationDataset¶
class chainercv.datasets.COCOSemanticSegmentationDataset(data_dir='auto', split='train')¶
Semantic segmentation dataset for MS COCO.
Semantic segmentations are generated from panoptic segmentations as done in the official toolkit.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.
- split ({'train', 'val'}) – Select a split of the dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)
OnlineProducts¶
OnlineProductsDataset¶
class chainercv.datasets.OnlineProductsDataset(data_dir='auto', split='train')¶
Dataset class for the Stanford Online Products dataset.
The split selects the train and test split of the dataset as done in [18]. The train split contains the first 11318 classes and the test split contains the remaining 11316 classes.
[18] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, Silvio Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. arXiv 2015.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/online_products.
- split ({'train', 'test'}) – Select a split of the dataset.

This dataset returns the following data.

name        | shape         | dtype   | format
img         | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label       | scalar        | int32   | \([0, \#class - 1]\)
super_label | scalar        | int32   | \([0, \#super\_class - 1]\)
PASCAL VOC¶
VOCBboxDataset¶
class chainercv.datasets.VOCBboxDataset(data_dir='auto', split='train', year='2012', use_difficult=False, return_difficult=False)¶
Bounding box dataset for PASCAL VOC.
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.
- split ({'train', 'val', 'trainval', 'test'}) – Select a split of the dataset. The test split is only available for the 2007 dataset.
- year ({'2007', '2012'}) – Use a dataset prepared for a challenge held in year.
- use_difficult (bool) – If True, use images that are labeled as difficult in the original annotation.
- return_difficult (bool) – If True, this dataset returns a boolean array that indicates whether bounding boxes are labeled as difficult or not. The default value is False.

This dataset returns the following data.

name           | shape         | dtype   | format
img            | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
bbox [19]      | \((R, 4)\)    | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
label [19]     | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
difficult [20] | \((R,)\)      | bool    | –

[19] If use_difficult = True, bbox and label contain difficult instances.
[20] difficult is available if return_difficult = True.
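A short usage sketch, mirroring the tutorial above (the data is downloaded automatically on first use).
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset(year='2007', split='test')
# Indexing returns a tuple of an image, bounding boxes and labels.
img, bbox, label = dataset[0]
print(img.shape)    # (3, H, W)
print(bbox.shape)   # (R, 4)
print(label.shape)  # (R,)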
VOCInstanceSegmentationDataset¶
-
class
chainercv.datasets.
VOCInstanceSegmentationDataset
(data_dir='auto', split='train')[source]¶ Instance segmentation dataset for PASCAL VOC2012.
Parameters: - data_dir (string) – Path to the root of the training data. If this is
auto
, this class will automatically download data for you under$CHAINER_DATASET_ROOT/pfnet/chainercv/voc
. - split ({'train', 'val', 'trainval'}) – Select a split of the dataset.
This dataset returns the following data.
name shape dtype format img
\((3, H, W)\) float32
RGB, \([0, 255]\) mask
\((R, H, W)\) bool
– label
\((R,)\) int32
\([0, \#fg\_class - 1]\) - data_dir (string) – Path to the root of the training data. If this is
VOCSemanticSegmentationDataset¶
-
class
chainercv.datasets.
VOCSemanticSegmentationDataset
(data_dir='auto', split='train')[source]¶ Semantic segmentation dataset for PASCAL VOC2012.
Parameters: - data_dir (string) – Path to the root of the training data. If this is
auto
, this class will automatically download data for you under$CHAINER_DATASET_ROOT/pfnet/chainercv/voc
. - split ({'train', 'val', 'trainval'}) – Select a split of the dataset.
This dataset returns the following data.
name shape dtype format img
\((3, H, W)\) float32
RGB, \([0, 255]\) label
\((H, W)\) int32
\([-1, \#class - 1]\) - data_dir (string) – Path to the root of the training data. If this is
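A short usage sketch (the returned values follow the table above; the data is downloaded automatically on first use).
from chainercv.datasets import VOCSemanticSegmentationDataset
dataset = VOCSemanticSegmentationDataset(split='val')
img, label = dataset[0]
print(img.shape)    # (3, H, W)
print(label.shape)  # (H, W)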
Semantic Boundaries Dataset¶
SBDInstanceSegmentationDataset¶
class chainercv.datasets.SBDInstanceSegmentationDataset(data_dir='auto', split='train')¶
Instance segmentation dataset for the Semantic Boundaries Dataset (SBD).
Parameters: - data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/sbd.
- split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
mask  | \((R, H, W)\) | bool    | –
label | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
Evaluations¶
Detection COCO¶
eval_detection_coco¶
chainercv.evaluations.eval_detection_coco(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_areas=None, gt_crowdeds=None)¶
Evaluate detections based on evaluation code of MS COCO.
This function evaluates predicted bounding boxes obtained from a dataset by using average precision for each class. The code is based on the evaluation code used in MS COCO.
Parameters: - pred_bboxes (iterable of numpy.ndarray) – See the table below.
- pred_labels (iterable of numpy.ndarray) – See the table below.
- pred_scores (iterable of numpy.ndarray) – See the table below.
- gt_bboxes (iterable of numpy.ndarray) – See the table below.
- gt_labels (iterable of numpy.ndarray) – See the table below.
- gt_areas (iterable of numpy.ndarray) – See the table below. If None, some scores are not returned.
- gt_crowdeds (iterable of numpy.ndarray) – See the table below.
name        | shape        | dtype   | format
pred_bboxes | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
pred_labels | \([(R,)]\)   | int32   | \([0, \#fg\_class - 1]\)
pred_scores | \([(R,)]\)   | float32 | –
gt_bboxes   | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
gt_labels   | \([(R,)]\)   | int32   | \([0, \#fg\_class - 1]\)
gt_areas    | \([(R,)]\)   | float32 | –
gt_crowdeds | \([(R,)]\)   | bool    | –

All inputs should have the same length. For a more detailed explanation of the inputs, please refer to chainercv.datasets.COCOBboxDataset.

Returns: The keys, value-types and the description of the values are listed below. The APs and ARs are calculated with different IoU thresholds, sizes of objects, and numbers of detections per image. For more details on the 12 patterns of evaluation metrics, please refer to COCO's official evaluation page.

key                                        | type                          | description
ap/iou=0.50:0.95/area=all/max_dets=100     | numpy.ndarray                 | [1]
ap/iou=0.50/area=all/max_dets=100          | numpy.ndarray                 | [1]
ap/iou=0.75/area=all/max_dets=100          | numpy.ndarray                 | [1]
ap/iou=0.50:0.95/area=small/max_dets=100   | numpy.ndarray                 | [1] [5]
ap/iou=0.50:0.95/area=medium/max_dets=100  | numpy.ndarray                 | [1] [5]
ap/iou=0.50:0.95/area=large/max_dets=100   | numpy.ndarray                 | [1] [5]
ar/iou=0.50:0.95/area=all/max_dets=1       | numpy.ndarray                 | [2]
ar/iou=0.50/area=all/max_dets=10           | numpy.ndarray                 | [2]
ar/iou=0.75/area=all/max_dets=100          | numpy.ndarray                 | [2]
ar/iou=0.50:0.95/area=small/max_dets=100   | numpy.ndarray                 | [2] [5]
ar/iou=0.50:0.95/area=medium/max_dets=100  | numpy.ndarray                 | [2] [5]
ar/iou=0.50:0.95/area=large/max_dets=100   | numpy.ndarray                 | [2] [5]
map/iou=0.50:0.95/area=all/max_dets=100    | float                         | [3]
map/iou=0.50/area=all/max_dets=100         | float                         | [3]
map/iou=0.75/area=all/max_dets=100         | float                         | [3]
map/iou=0.50:0.95/area=small/max_dets=100  | float                         | [3] [5]
map/iou=0.50:0.95/area=medium/max_dets=100 | float                         | [3] [5]
map/iou=0.50:0.95/area=large/max_dets=100  | float                         | [3] [5]
mar/iou=0.50:0.95/area=all/max_dets=1      | float                         | [4]
mar/iou=0.50/area=all/max_dets=10          | float                         | [4]
mar/iou=0.75/area=all/max_dets=100         | float                         | [4]
mar/iou=0.50:0.95/area=small/max_dets=100  | float                         | [4] [5]
mar/iou=0.50:0.95/area=medium/max_dets=100 | float                         | [4] [5]
mar/iou=0.50:0.95/area=large/max_dets=100  | float                         | [4] [5]
coco_eval                                  | pycocotools.cocoeval.COCOeval | result from pycocotools
existent_labels                            | numpy.ndarray                 | used labels

Return type: dict

[1] An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.
[2] An array of average recalls. The \(l\)-th value corresponds to the average recall for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.
[3] The average of average precisions over classes.
[4] The average of average recalls over classes.
[5] Skipped if gt_areas is None.
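A minimal sketch of calling this function on toy arrays (pycocotools must be installed; the key below follows the table above).
import numpy as np
from chainercv.evaluations import eval_detection_coco
# One image with a single ground truth box of class 0 and one overlapping prediction.
gt_bboxes = [np.array([[10, 10, 50, 50]], dtype=np.float32)]
gt_labels = [np.array([0], dtype=np.int32)]
pred_bboxes = [np.array([[12, 12, 48, 48]], dtype=np.float32)]
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.9], dtype=np.float32)]
result = eval_detection_coco(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
print(result['map/iou=0.50:0.95/area=all/max_dets=100'])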
Detection VOC¶
eval_detection_voc¶
chainercv.evaluations.eval_detection_voc(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5, use_07_metric=False)¶
Calculate average precisions based on evaluation code of PASCAL VOC.
This function evaluates predicted bounding boxes obtained from a dataset which has \(N\) images by using average precision for each class. The code is based on the evaluation code used in the PASCAL VOC Challenge.
Parameters: - pred_bboxes (iterable of numpy.ndarray) – An iterable of \(N\) sets of bounding boxes. Its index corresponds to an index for the base dataset. Each element of pred_bboxes is a set of coordinates of bounding boxes. This is an array whose shape is \((R, 4)\), where \(R\) corresponds to the number of bounding boxes, which may vary among images. The second axis corresponds to \(y_{min}, x_{min}, y_{max}, x_{max}\) of a bounding box.
- pred_labels (iterable of numpy.ndarray) – An iterable of labels. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
- pred_scores (iterable of numpy.ndarray) – An iterable of confidence scores for predicted bounding boxes. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
- gt_bboxes (iterable of numpy.ndarray) – An iterable of ground truth bounding boxes whose length is \(N\). An element of gt_bboxes is a bounding box whose shape is \((R, 4)\). Note that the number of bounding boxes in each image does not need to be the same as the number of corresponding predicted boxes.
- gt_labels (iterable of numpy.ndarray) – An iterable of ground truth labels which are organized similarly to gt_bboxes.
- gt_difficults (iterable of numpy.ndarray) – An iterable of boolean arrays which is organized similarly to gt_bboxes. This tells whether the corresponding ground truth bounding box is difficult or not. By default, this is None. In that case, this function considers all bounding boxes to be not difficult.
- iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.
- use_07_metric (bool) – Whether to use the PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.
Returns: The keys, value-types and the description of the values are listed below.
- ap (numpy.ndarray): An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.
- map (float): The average of Average Precisions over classes.
Return type: dict
calc_detection_voc_ap¶
chainercv.evaluations.calc_detection_voc_ap(prec, rec, use_07_metric=False)¶
Calculate average precisions based on evaluation code of PASCAL VOC.
This function calculates average precisions from given precisions and recalls. The code is based on the evaluation code used in the PASCAL VOC Challenge.
Parameters: - prec (list of numpy.array) – A list of arrays. prec[l] indicates precision for class \(l\). If prec[l] is None, this function returns numpy.nan for class \(l\).
- rec (list of numpy.array) – A list of arrays. rec[l] indicates recall for class \(l\). If rec[l] is None, this function returns numpy.nan for class \(l\).
- use_07_metric (bool) – Whether to use the PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.
Returns: This function returns an array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If prec[l] or rec[l] is None, the corresponding value is set to numpy.nan.
Return type: numpy.ndarray
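A minimal sketch with hypothetical cumulative precision/recall arrays for two classes (class 1 is absent, so its entries are None).
import numpy as np
from chainercv.evaluations import calc_detection_voc_ap
prec = [np.array([1.0, 0.5, 0.67]), None]
rec = [np.array([0.5, 0.5, 1.0]), None]
ap = calc_detection_voc_ap(prec, rec)
print(ap.shape)  # (2,)
print(ap[1])     # nan, because prec[1] and rec[1] are None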
calc_detection_voc_prec_rec¶
chainercv.evaluations.calc_detection_voc_prec_rec(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5)¶
Calculate precision and recall based on evaluation code of PASCAL VOC.
This function calculates precision and recall of predicted bounding boxes obtained from a dataset which has \(N\) images. The code is based on the evaluation code used in the PASCAL VOC Challenge.
Parameters: - pred_bboxes (iterable of numpy.ndarray) – An iterable of \(N\) sets of bounding boxes. Its index corresponds to an index for the base dataset. Each element of pred_bboxes is a set of coordinates of bounding boxes. This is an array whose shape is \((R, 4)\), where \(R\) corresponds to the number of bounding boxes, which may vary among images. The second axis corresponds to \(y_{min}, x_{min}, y_{max}, x_{max}\) of a bounding box.
- pred_labels (iterable of numpy.ndarray) – An iterable of labels. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
- pred_scores (iterable of numpy.ndarray) – An iterable of confidence scores for predicted bounding boxes. Similar to pred_bboxes, its index corresponds to an index for the base dataset. Its length is \(N\).
- gt_bboxes (iterable of numpy.ndarray) – An iterable of ground truth bounding boxes whose length is \(N\). An element of gt_bboxes is a bounding box whose shape is \((R, 4)\). Note that the number of bounding boxes in each image does not need to be the same as the number of corresponding predicted boxes.
- gt_labels (iterable of numpy.ndarray) – An iterable of ground truth labels which are organized similarly to gt_bboxes.
- gt_difficults (iterable of numpy.ndarray) – An iterable of boolean arrays which is organized similarly to gt_bboxes. This tells whether the corresponding ground truth bounding box is difficult or not. By default, this is None. In that case, this function considers all bounding boxes to be not difficult.
- iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.
Returns: This function returns two lists:
prec
andrec
.prec
: A list of arrays.prec[l]
is precision for class \(l\). If class \(l\) does not exist in eitherpred_labels
orgt_labels
,prec[l]
is set toNone
.rec
: A list of arrays.rec[l]
is recall for class \(l\). If class \(l\) that is not marked as difficult does not exist ingt_labels
,rec[l]
is set toNone
.
Return type: tuple of two lists
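A minimal sketch of the expected input format follows; it uses a single toy image with hand-made predictions and ground truth (the coordinates, labels and scores are invented for illustration) and then converts the precision/recall curves into average precisions.
import numpy as np
from chainercv.evaluations import (
    calc_detection_voc_ap, calc_detection_voc_prec_rec)

# One image; bounding boxes are (y_min, x_min, y_max, x_max) in float32.
pred_bboxes = [np.array([[10, 10, 50, 50], [20, 20, 60, 60]], dtype=np.float32)]
pred_labels = [np.array([0, 0], dtype=np.int32)]
pred_scores = [np.array([0.9, 0.4], dtype=np.float32)]
gt_bboxes = [np.array([[12, 12, 48, 48]], dtype=np.float32)]
gt_labels = [np.array([0], dtype=np.int32)]

prec, rec = calc_detection_voc_prec_rec(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
    iou_thresh=0.5)
ap = calc_detection_voc_ap(prec, rec)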
Instance Segmentation COCO¶
eval_instance_segmentation_coco¶
-
chainercv.evaluations.
eval_instance_segmentation_coco
(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, gt_areas=None, gt_crowdeds=None)[source]¶ Evaluate instance segmentations based on evaluation code of MS COCO.
This function evaluates predicted instance segmentations obtained from a dataset by using average precision for each class. The code is based on the evaluation code used in MS COCO.
Parameters: - pred_masks (iterable of numpy.ndarray) – See the table below.
- pred_labels (iterable of numpy.ndarray) – See the table below.
- pred_scores (iterable of numpy.ndarray) – See the table below.
- gt_masks (iterable of numpy.ndarray) – See the table below.
- gt_labels (iterable of numpy.ndarray) – See the table below.
- gt_areas (iterable of numpy.ndarray) – See the table below. If
None
, some scores are not returned. - gt_crowdeds (iterable of numpy.ndarray) – See the table below.
name | shape | dtype | format
pred_masks | \([(R, H, W)]\) | bool | –
pred_labels | \([(R,)]\) | int32 | \([0, \#fg\_class - 1]\)
pred_scores | \([(R,)]\) | float32 | –
gt_masks | \([(R, H, W)]\) | bool | –
gt_labels | \([(R,)]\) | int32 | \([0, \#fg\_class - 1]\)
gt_areas | \([(R,)]\) | float32 | –
gt_crowdeds | \([(R,)]\) | bool | –
All inputs should have the same length. For more detailed explanation of the inputs, please refer to
chainercv.datasets.COCOInstanceSegmentationDataset
. Returns: The keys, value-types and the descriptions of the values are listed below. The APs and ARs are calculated with different IoU thresholds, sizes of objects, and numbers of detections per image. For more details on the 12 patterns of evaluation metrics, please refer to COCO’s official evaluation page.
key | type | description
ap/iou=0.50:0.95/area=all/max_dets=100 | numpy.ndarray | [6]
ap/iou=0.50/area=all/max_dets=100 | numpy.ndarray | [6]
ap/iou=0.75/area=all/max_dets=100 | numpy.ndarray | [6]
ap/iou=0.50:0.95/area=small/max_dets=100 | numpy.ndarray | [6] [10]
ap/iou=0.50:0.95/area=medium/max_dets=100 | numpy.ndarray | [6] [10]
ap/iou=0.50:0.95/area=large/max_dets=100 | numpy.ndarray | [6] [10]
ar/iou=0.50:0.95/area=all/max_dets=1 | numpy.ndarray | [7]
ar/iou=0.50/area=all/max_dets=10 | numpy.ndarray | [7]
ar/iou=0.75/area=all/max_dets=100 | numpy.ndarray | [7]
ar/iou=0.50:0.95/area=small/max_dets=100 | numpy.ndarray | [7] [10]
ar/iou=0.50:0.95/area=medium/max_dets=100 | numpy.ndarray | [7] [10]
ar/iou=0.50:0.95/area=large/max_dets=100 | numpy.ndarray | [7] [10]
map/iou=0.50:0.95/area=all/max_dets=100 | float | [8]
map/iou=0.50/area=all/max_dets=100 | float | [8]
map/iou=0.75/area=all/max_dets=100 | float | [8]
map/iou=0.50:0.95/area=small/max_dets=100 | float | [8] [10]
map/iou=0.50:0.95/area=medium/max_dets=100 | float | [8] [10]
map/iou=0.50:0.95/area=large/max_dets=100 | float | [8] [10]
mar/iou=0.50:0.95/area=all/max_dets=1 | float | [9]
mar/iou=0.50/area=all/max_dets=10 | float | [9]
mar/iou=0.75/area=all/max_dets=100 | float | [9]
mar/iou=0.50:0.95/area=small/max_dets=100 | float | [9] [10]
mar/iou=0.50:0.95/area=medium/max_dets=100 | float | [9] [10]
mar/iou=0.50:0.95/area=large/max_dets=100 | float | [9] [10]
coco_eval | pycocotools.cocoeval.COCOeval | result from pycocotools
existent_labels | numpy.ndarray | used labels
Return type: dict
[6] (1, 2, 3, 4, 5, 6) An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
.[7] (1, 2, 3, 4, 5, 6) An array of average recalls. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
.[8] (1, 2, 3, 4, 5, 6) The average of average precisions over classes. [9] (1, 2, 3, 4, 5, 6) The average of average recalls over classes. [10] (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) Skip if gt_areas
isNone
.
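As a minimal sketch of the input format, the following evaluates a single toy image with hand-made boolean masks (pycocotools needs to be installed; the masks, labels and scores are invented for illustration).
import numpy as np
from chainercv.evaluations import eval_instance_segmentation_coco

# One 32x32 image with two predicted masks and one ground truth mask.
pred_masks = [np.zeros((2, 32, 32), dtype=bool)]
pred_masks[0][0, 5:20, 5:20] = True
pred_masks[0][1, 10:30, 10:30] = True
pred_labels = [np.array([0, 1], dtype=np.int32)]
pred_scores = [np.array([0.8, 0.6], dtype=np.float32)]
gt_masks = [np.zeros((1, 32, 32), dtype=bool)]
gt_masks[0][0, 6:19, 6:19] = True
gt_labels = [np.array([0], dtype=np.int32)]

result = eval_instance_segmentation_coco(
    pred_masks, pred_labels, pred_scores, gt_masks, gt_labels)
print(result['map/iou=0.50:0.95/area=all/max_dets=100'])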
Instance Segmentation VOC¶
eval_instance_segmentation_voc¶
-
chainercv.evaluations.
eval_instance_segmentation_voc
(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, iou_thresh=0.5, use_07_metric=False)[source]¶ Calculate average precisions based on evaluation code of PASCAL VOC.
This function evaluates predicted masks obtained from a dataset which has \(N\) images by using average precision for each class. The code is based on the evaluation code used in FCIS.
Parameters: - pred_masks (iterable of numpy.ndarray) – An iterable of \(N\)
sets of masks. Its index corresponds to an index for the base
dataset. Each element of
pred_masks
is an object mask and is an array whose shape is \((R, H, W)\), where \(R\) corresponds to the number of masks, which may vary among images. - pred_labels (iterable of numpy.ndarray) – An iterable of labels.
Similar to
pred_masks
, its index corresponds to an index for the base dataset. Its length is \(N\). - pred_scores (iterable of numpy.ndarray) – An iterable of confidence
scores for predicted masks. Similar to
pred_masks
, its index corresponds to an index for the base dataset. Its length is \(N\). - gt_masks (iterable of numpy.ndarray) – An iterable of ground truth
masks whose length is \(N\). An element of
gt_masks
is an object mask whose shape is \((R, H, W)\). Note that the number of masks \(R\) in each image does not need to be same as the number of corresponding predicted masks. - gt_labels (iterable of numpy.ndarray) – An iterable of ground truth
labels which are organized similarly to
gt_masks
. Its length is \(N\). - iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.
- use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric
for calculating average precision. The default value is
False
.
Returns: The keys, value-types and the description of the values are listed below.
- ap (numpy.ndarray): An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either
pred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
. - map (float): The average of Average Precisions over classes.
Return type: dict
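A minimal sketch with a single toy image (masks, labels and scores invented for illustration):
import numpy as np
from chainercv.evaluations import eval_instance_segmentation_voc

pred_masks = [np.zeros((1, 32, 32), dtype=bool)]
pred_masks[0][0, 5:20, 5:20] = True
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.9], dtype=np.float32)]
gt_masks = [np.zeros((1, 32, 32), dtype=bool)]
gt_masks[0][0, 6:19, 6:19] = True
gt_labels = [np.array([0], dtype=np.int32)]

result = eval_instance_segmentation_voc(
    pred_masks, pred_labels, pred_scores, gt_masks, gt_labels,
    iou_thresh=0.5)
print(result['ap'], result['map'])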
calc_instance_segmentation_voc_prec_rec¶
-
chainercv.evaluations.
calc_instance_segmentation_voc_prec_rec
(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, iou_thresh)[source]¶ Calculate precision and recall based on evaluation code of PASCAL VOC.
This function calculates precision and recall of predicted masks obtained from a dataset which has \(N\) images. The code is based on the evaluation code used in FCIS.
Parameters: - pred_masks (iterable of numpy.ndarray) – An iterable of \(N\)
sets of masks. Its index corresponds to an index for the base
dataset. Each element of
pred_masks
is an object mask and is an array whose shape is \((R, H, W)\), where \(R\) corresponds to the number of masks, which may vary among images. - pred_labels (iterable of numpy.ndarray) – An iterable of labels.
Similar to
pred_masks
, its index corresponds to an index for the base dataset. Its length is \(N\). - pred_scores (iterable of numpy.ndarray) – An iterable of confidence
scores for predicted masks. Similar to
pred_masks
, its index corresponds to an index for the base dataset. Its length is \(N\). - gt_masks (iterable of numpy.ndarray) – An iterable of ground truth
masks whose length is \(N\). An element of
gt_masks
is an object mask whose shape is \((R, H, W)\). Note that the number of masks \(R\) in each image does not need to be same as the number of corresponding predicted masks. - gt_labels (iterable of numpy.ndarray) – An iterable of ground truth
labels which are organized similarly to
gt_masks
. Its length is \(N\). - iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.
Returns: This function returns two lists:
prec
andrec
.prec
: A list of arrays.prec[l]
is precision for class \(l\). If class \(l\) does not exist in eitherpred_labels
orgt_labels
,prec[l]
is set toNone
.rec
: A list of arrays.rec[l]
is recall for class \(l\). If class \(l\) that is not marked as difficult does not exist ingt_labels
,rec[l]
is set toNone
.
Return type: tuple of two lists
Semantic Segmentation IoU¶
eval_semantic_segmentation¶
-
chainercv.evaluations.
eval_semantic_segmentation
(pred_labels, gt_labels)[source]¶ Evaluate metrics used in Semantic Segmentation.
This function calculates Intersection over Union (IoU), Pixel Accuracy and Class Accuracy for the task of semantic segmentation.
The definition of metrics calculated by this function is as follows, where \(N_{ij}\) is the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.
- \(\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)
- \(\text{mIoU} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)
- \(\text{Pixel Accuracy} = \frac {\sum_{i=1}^k N_{ii}} {\sum_{i=1}^k \sum_{j=1}^k N_{ij}}\)
- \(\text{Class Accuracy} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}\)
- \(\text{Mean Class Accuracy} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}\)
The more detailed description of the above metrics can be found in a review on semantic segmentation [11].
The number of classes \(n\_class\) is \(max(pred\_labels, gt\_labels) + 1\), which is the maximum class id of the inputs plus one.
[11] Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017. Parameters: - pred_labels (iterable of numpy.ndarray) – A collection of predicted
labels. The shape of a label array
is \((H, W)\). \(H\) and \(W\)
are height and width of the label.
For example, this is a list of labels
[label_0, label_1, ...]
, wherelabel_i.shape = (H_i, W_i)
. - gt_labels (iterable of numpy.ndarray) – A collection of ground
truth labels. The shape of a ground truth label array is
\((H, W)\), and its corresponding prediction label should
have the same shape.
A pixel with value
-1
will be ignored during evaluation.
Returns: The keys, value-types and the description of the values are listed below.
- iou (numpy.ndarray): An array of IoUs for the \(n\_class\) classes. Its shape is \((n\_class,)\).
- miou (float): The average of IoUs over classes.
- pixel_accuracy (float): The computed pixel accuracy.
- class_accuracy (numpy.ndarray): An array of class accuracies for the \(n\_class\) classes. Its shape is \((n\_class,)\).
- mean_class_accuracy (float): The average of class accuracies.
Return type: dict
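A minimal sketch with a single toy label image (the label values are made up for illustration):
import numpy as np
from chainercv.evaluations import eval_semantic_segmentation

# One 2x3 prediction and its ground truth; class ids are 0, 1 and 2.
pred_labels = [np.array([[0, 1, 1], [2, 2, 0]], dtype=np.int32)]
gt_labels = [np.array([[0, 1, 2], [2, 2, 0]], dtype=np.int32)]

result = eval_semantic_segmentation(pred_labels, gt_labels)
print(result['miou'], result['pixel_accuracy'])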
calc_semantic_segmentation_confusion¶
-
chainercv.evaluations.
calc_semantic_segmentation_confusion
(pred_labels, gt_labels)[source]¶ Collect a confusion matrix.
The number of classes \(n\_class\) is \(max(pred\_labels, gt\_labels) + 1\), which is the maximum class id of the inputs plus one.
Parameters: - pred_labels (iterable of numpy.ndarray) – A collection of predicted labels. The shape of a label array is \((H, W)\). \(H\) and \(W\) are height and width of the label.
- gt_labels (iterable of numpy.ndarray) – A collection of ground
truth labels. The shape of a ground truth label array is
\((H, W)\), and its corresponding prediction label should
have the same shape.
A pixel with value
-1
will be ignored during evaluation.
Returns: A confusion matrix. Its shape is \((n\_class, n\_class)\). The \((i, j)\) th element corresponds to the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.
Return type: numpy.ndarray
calc_semantic_segmentation_iou¶
-
chainercv.evaluations.
calc_semantic_segmentation_iou
(confusion)[source]¶ Calculate Intersection over Union with a given confusion matrix.
The definition of Intersection over Union (IoU) is as follows, where \(N_{ij}\) is the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.
- \(\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)
Parameters: confusion (numpy.ndarray) – A confusion matrix. Its shape is \((n\_class, n\_class)\). The \((i, j)\) th element corresponds to the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction. Returns: An array of IoUs for the \(n\_class\) classes. Its shape is \((n\_class,)\). Return type: numpy.ndarray
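The two helpers above can also be combined by hand, as in this minimal sketch with toy labels:
import numpy as np
from chainercv.evaluations import (
    calc_semantic_segmentation_confusion, calc_semantic_segmentation_iou)

pred_labels = [np.array([[0, 1], [1, 1]], dtype=np.int32)]
gt_labels = [np.array([[0, 1], [0, 1]], dtype=np.int32)]

# confusion[i, j] counts pixels labeled i by the ground truth and j by the prediction.
confusion = calc_semantic_segmentation_confusion(pred_labels, gt_labels)
iou = calc_semantic_segmentation_iou(confusion)
print(confusion)
print(iou)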
Experimental¶
Links¶
Detection¶
Detection links share a common method predict()
to detect objects in images.
YOLO¶
-
class
chainercv.experimental.links.model.yolo.
YOLOv2Tiny
(n_fg_class=None, pretrained_model=None)[source]¶ YOLOv2 tiny.
This is a model of YOLOv2 tiny a.k.a. Tiny YOLO. This model uses
DarknetExtractor
as its feature extractor.Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'voc0712'
, filepath orNone
. The default value isNone
.'voc0712'
: Load weights trained on trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically.n_fg_class
must be20
orNone
. These weights were converted from the darknet model provided by the original implementation. The conversion code is chainercv/examples/yolo/darknet2npz.py.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
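A minimal inference sketch, assuming an RGB image file sample.jpg such as the one used in the tutorial (loading 'voc0712' downloads and caches the pretrained weights):
from chainercv.experimental.links.model.yolo import YOLOv2Tiny
from chainercv.utils import read_image

model = YOLOv2Tiny(pretrained_model='voc0712')
img = read_image('sample.jpg')  # CHW, RGB, values in [0, 255]
bboxes, labels, scores = model.predict([img])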
Semantic Segmentation¶
Semantic segmentation links share a common method predict()
to conduct semantic segmentation of images.
PSPNet¶
-
chainercv.experimental.links.model.pspnet.
convolution_crop
(img, size, stride, return_param=False)[source]¶ Strided cropping.
This extracts cropped images from the input. The cropped images are extracted from the entire image, while taking constant steps between neighboring patches.
Parameters: - img (ndarray) – An image array to be cropped. This is in CHW format.
- size (tuple) – The size of output image after cropping. This value is \((height, width)\).
- stride (tuple) – The stride between crops. This contains two values: stride in the vertical and horizontal directions.
- return_param (bool) – If
True
, this function returns information of slices.
Returns: If
return_param = False
, returns an arraycrop_imgs
that is a stack of cropped images.If
return_param = True
, returns a tuple whose elements arecrop_imgs, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.y_slices (list slices): Slices used to crop the input image. The relation below holds together with
x_slices
.x_slices (list of slices): Similar to
y_slices
.crop_y_slices (list of slices): This indicates the region of the cropped image that is actually extracted from the input. This is relevant only when borders of the input are cropped.
crop_x_slices (list of slices): Similar to
crop_y_slices
.crop_img = crop_imgs[i][:, crop_y_slices[i], crop_x_slices[i]] crop_img == img[:, y_slices[i], x_slices[i]]
Return type: numpy.ndarray or tuple
Examples
>>> import numpy as np >>> from chainercv.datasets import VOCBboxDataset >>> from chainercv.transforms import resize >>> from chainercv.experimental.links.model.pspnet import ... convolution_crop >>> >>> img, _, _ = VOCBboxDataset(year='2007')[0] >>> img = resize(img, (300, 300)) >>> imgs, param = convolution_crop( >>> img, (128, 128), (96, 96), return_param=True) >>> # Restore the original image from the cropped images. >>> output = np.zeros((3, 300, 300)) >>> count = np.zeros((300, 300)) >>> for i in range(len(imgs)): >>> crop_y_slice = param['crop_y_slices'][i] >>> crop_x_slice = param['crop_x_slices'][i] >>> y_slice = param['y_slices'][i] >>> x_slice = param['x_slices'][i] >>> output[:, y_slice, x_slice] += ... imgs[i][:, crop_y_slice, crop_x_slice] >>> count[y_slice, x_slice] += 1 >>> output = output / count[None] >>> np.testing.assert_equal(output, img) >>> >>> # Visualization of the cropped images >>> import matplotlib.pyplot as plt >>> from chainercv.utils import tile_images >>> from chainercv.visualizations import vis_image >>> v_imgs = tile_images(imgs, 5, fill=122.5) >>> vis_image(v_imgs) >>> plt.show()
-
class
chainercv.experimental.links.model.pspnet.
PSPNet
(n_class=None, pretrained_model=None, input_size=None, initialW=None)[source]¶ Pyramid Scene Parsing Network.
This is a PSPNet [1] model for semantic segmentation. This is based on the implementation found here.
[1] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia. “Pyramid Scene Parsing Network” CVPR, 2017 Parameters: - n_class (int) – The number of channels in the last convolution layer.
- pretrained_model (string) – The weight file to be loaded.
This can take
'cityscapes'
, filepath orNone
. The default value isNone
. *'cityscapes'
: Load weights trained on the train split of Cityscapes dataset.n_class
must be19
orNone
. *'ade20k'
: Load weights trained on the train split of ADE20K dataset.n_class
must be150
orNone
. *'imagenet'
: Load ImageNet pretrained weights for the extractor. * filepath: A path of npz file. In this case,n_class
must be specified properly. *None
: Do not load weights. - input_size (tuple) – The size of the input. This value is \((height, width)\).
- initialW (callable) – Initializer for the weights of convolution kernels.
-
predict
(imgs)[source]¶ Conduct semantic segmentation from images.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their values is \([0, 255]\). Returns: List of integer labels predicted from each image in the input list. Return type: list of numpy.ndarray
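A minimal inference sketch, assuming an RGB image file sample.jpg (loading 'cityscapes' downloads the pretrained weights and sets n_class to 19):
from chainercv.experimental.links.model.pspnet import PSPNet
from chainercv.utils import read_image

model = PSPNet(pretrained_model='cityscapes')
img = read_image('sample.jpg')  # CHW, RGB, values in [0, 255]
labels = model.predict([img])
label = labels[0]  # (H, W) integer label map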
Instance Segmentation¶
Instance segmentation links share a common method predict()
to segment objects in images.
For more details, please read FCIS.predict()
.
FCIS¶
-
class
chainercv.experimental.links.model.fcis.
FCISResNet101
(n_fg_class=None, pretrained_model=None, min_size=600, max_size=1000, roi_size=21, group_size=7, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32], loc_normalize_mean=(0.0, 0.0, 0.0, 0.0), loc_normalize_std=(0.2, 0.2, 0.5, 0.5), iter2=True, resnet_initialW=None, rpn_initialW=None, head_initialW=None, proposal_creator_params=None)[source]¶ FCIS based on ResNet101.
When you specify the path of a pre-trained chainer model serialized as a
npz
file in the constructor, this chain model automatically initializes all the parameters with it. When a string in the prespecified set is provided, a pretrained model is loaded from weights distributed on the Internet. The list of supported pretrained models is as follows:sbd
: Loads weights trained with the trainval split of Semantic Boundaries Dataset.
For descriptions on the interface of this model, please refer to
FCIS
.FCISResNet101
supports finer control on random initializations of weights by argumentsresnet_initialW
,rpn_initialW
andhead_initialW
. It accepts a callable that takes an array and edits its values. IfNone
is passed as an initializer, the default initializer is used.Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (str) – The destination of the pre-trained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - min_size (int) – A preprocessing parameter for
prepare()
. - max_size (int) – A preprocessing parameter for
prepare()
. - roi_size (int) – Height and width of the feature maps after Position Sensitive RoI pooling.
- group_size (int) – Group height and width for Position Sensitive ROI pooling.
- ratios (list of floats) – This is ratios of width to height of the anchors.
- anchor_scales (list of numbers) – This is areas of anchors.
Those areas will be the product of the square of an element in
anchor_scales
and the original area of the reference window. - loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
- loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
- iter2 (bool) – if the value is set
True
, Position Sensitive ROI pooling is executed twice. In the second time, Position Sensitive ROI pooling uses improved ROIs by the localization parameters calculated in the first time. - resnet_initialW (callable) – Initializer for the layers corresponding to the ResNet101 layers.
- rpn_initialW (callable) – Initializer for Region Proposal Network layers.
- head_initialW (callable) – Initializer for the head layers.
- proposal_creator_params (dict) – Key valued parameters for
ProposalCreator
.
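A minimal inference sketch, assuming an RGB image file sample.jpg (loading 'sbd' downloads the pretrained weights and sets n_fg_class to 20):
from chainercv.experimental.links.model.fcis import FCISResNet101
from chainercv.utils import read_image

model = FCISResNet101(pretrained_model='sbd')
img = read_image('sample.jpg')  # CHW, RGB, values in [0, 255]
masks, labels, scores = model.predict([img])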
-
class
chainercv.experimental.links.model.fcis.
FCIS
(extractor, rpn, head, mean, min_size, max_size, loc_normalize_mean, loc_normalize_std)[source]¶ Base class for FCIS.
This is a base class for FCIS links supporting instance segmentation API [1]. The following three stages constitute FCIS.
- Feature extraction: Images are taken and their feature maps are calculated.
- Region Proposal Networks: Given the feature maps calculated in the previous stage, produce set of RoIs around objects.
- Localization, Segmentation and Classification Heads: Using feature maps that belong to the proposed RoIs, segment regions of the objects, classify the categories of the objects in the RoIs and improve localizations.
Each stage is carried out by one of the callable
chainer.Chain
objectsfeature
,rpn
andhead
. There are two functionspredict()
and__call__()
to conduct instance segmentation.predict()
takes images and returns masks, object labels and their scores.__call__()
is provided for a scenario when intermediate outputs are needed, for instance, for training and debugging. Links that support instance segmentation API have method
predict()
with the same interface. Please refer topredict()
for further details.[1] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017. Parameters: - extractor (callable Chain) – A callable that takes a BCHW image array and returns feature maps.
- rpn (callable Chain) – A callable that has the same interface as
RegionProposalNetwork
. Please refer to the documentation found there. - head (callable Chain) – A callable that takes a BCHW array, RoIs and batch indices for RoIs. This returns class-agnostic segmentation scores, class-agnostic localization parameters, class scores, improved RoIs and batch indices for RoIs.
- mean (numpy.ndarray) – A value to be subtracted from an image
in
prepare()
. - min_size (int) – A preprocessing parameter for
prepare()
. Please refer to a docstring found forprepare()
. - max_size (int) – A preprocessing parameter for
prepare()
. - loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
- loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
-
__call__
(x, scale=1.0)[source]¶ Forward FCIS.
Scaling parameter
scale
is used by RPN to determine the threshold to select small objects, which are going to be rejected irrespective of their confidence scores.Here are notations used.
- \(N\) is the batch size.
- \(R'\) is the total number of RoIs produced across batches. Given \(R_i\) proposed RoIs from the \(i\) th image, \(R' = \sum _{i=1} ^ N R_i\).
- \(L\) is the number of classes excluding the background.
- \(RH\) is the height of the pooled image by Position Sensitive ROI pooling.
- \(RW\) is the width of the pooled image by Position Sensitive ROI pooling.
Classes are ordered by the background, the first class, …, and the \(L\) th class.
Parameters: - x (Variable) – 4D image variable.
- scale (float) – Amount of scaling applied to the raw image during preprocessing.
Returns: Returns tuple of five values listed below.
- roi_ag_seg_scores: Class-agnostic clipped mask scores for the proposed ROIs. Its shape is \((R', 2, RH, RW)\)
- ag_locs: Class-agnostic offsets and scalings for the proposed RoIs. Its shape is \((R', 2, 4)\).
- roi_cls_scores: Class predictions for the proposed RoIs. Its shape is \((R', L + 1)\).
- rois: RoIs proposed by RPN. Its shape is \((R', 4)\).
- roi_indices: Batch indices of RoIs. Its shape is \((R',)\).
Return type: Variable, Variable, Variable, array, array
-
predict
(imgs)[source]¶ Segment object instances from images.
This method predicts instance-aware object regions for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images of shape \((B, C, H, W)\). All images are in CHW and RGB format and the range of their value is \([0, 255]\). Returns: This method returns a tuple of three lists, (masks, labels, scores)
- masks: A list of boolean arrays of shape \((R, H, W)\), where \(R\) is the number of masks in an image. Each pixel value indicates whether the pixel belongs to the object or not.
- labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the masks. Values are in range \([0, L - 1]\), where \(L\) is the number of foreground classes.
- scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
-
prepare
(img)[source]¶ Preprocess an image for feature extraction.
The length of the shorter edge is scaled to
self.min_size
. After the scaling, if the length of the longer edge is longer thanself.max_size
, the image is scaled to fit the longer edge toself.max_size
After resizing the image, the mean image value
self.mean
is subtracted from the image. Parameters: img (ndarray) – An image. This is in CHW and RGB format. The range of its value is \([0, 255]\). Returns: A preprocessed image. Return type: ndarray
-
use_preset
(preset)[source]¶ Use the given preset during prediction.
This method changes values of
self.nms_thresh
,self.score_thresh
,self.mask_merge_thresh
,self.binary_thresh
,self.limit
andself.min_drop_size
. These values are a threshold value used for non maximum suppression, a threshold value to discard low confidence proposals inpredict()
, a threshold value to merge mask inpredict()
, a threshold value to binarize segmentation scores inpredict()
, a limit number of predicted masks in one image and a threshold value to discard small bounding boxes respectively.If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
-
class
chainercv.experimental.links.model.fcis.
FCISResNet101Head
(n_class, roi_size, group_size, spatial_scale, loc_normalize_mean, loc_normalize_std, iter2, initialW=None)[source]¶ FCIS Head for ResNet101 based implementation.
This class is used as a head for FCIS. This outputs class-agnostic segmentation scores, class-agnostic localizations and classifications based on feature maps in the given RoIs.
Parameters: - n_class (int) – The number of classes possibly including the background.
- roi_size (int) – Height and width of the feature maps after Position Sensitive RoI pooling.
- group_size (int) – Group height and width for Position Sensitive ROI pooling.
- spatial_scale (float) – Scale by which the RoI is resized.
- loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
- loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
- iter2 (bool) – if the value is set
True
, Position Sensitive ROI pooling is executed twice. In the second time, Position Sensitive ROI pooling uses improved ROIs by the localization parameters calculated in the first time. - initialW (callable) – Initializer for the layers.
-
chainercv.experimental.links.model.fcis.
mask_voting
(seg_prob, bbox, cls_prob, size, score_thresh, nms_thresh, mask_merge_thresh, binary_thresh, limit=100, bg_label=0)[source]¶ Refine mask probabilities by merging multiple masks.
First, this function discards invalid masks with non maximum suppression. Then, it merges masks with weights calculated from class probabilities and IoU. This function improves the mask quality by merging overlapping masks predicted as the same object class.
Here are notations used. * \(R\) is the total number of RoIs produced in one image. * \(L\) is the number of classes excluding the background. * \(RH\) is the height of the pooled image. * \(RW\) is the width of the pooled image.
Parameters: - seg_prob (array) – A mask probability array whose shape is \((R, RH, RW)\).
- bbox (array) – A bounding box array whose shape is \((R, 4)\).
- cls_prob (array) – A class probability array whose shape is \((R, L + 1)\).
- size (tuple of int) – Original image size.
- score_thresh (float) – A threshold value of the class score.
- nms_thresh (float) – A threshold value of non maximum suppression.
- mask_merge_thresh (float) – A threshold value of the bounding box iou for mask merging.
- binary_thresh (float) – A threshold value of mask score for mask merging.
- limit (int) – The maximum number of outputs.
- bg_label (int) – The id of the background label.
Returns: - v_seg_prob: Merged mask probability. Its shape is \((N, RH, RW)\).
- v_bbox: Bounding boxes for the merged masks. Its shape is \((N, 4)\).
- v_label: Class labels for the merged masks. Its shape is \((N, )\).
- v_score: Class probabilities for the merged masks. Its shape is \((N, )\).
Return type: array, array, array, array
-
class
chainercv.experimental.links.model.fcis.
ResNet101Extractor
(initialW=None)[source]¶ ResNet101 Extractor for FCIS ResNet101 implementation.
This class is used as an extractor for FCISResNet101. This outputs feature maps. Dilated convolution is used in the C5 stage.
Parameters: initialW – Initializer for ResNet101 extractor.
-
class
chainercv.experimental.links.model.fcis.
FCISTrainChain
(fcis, rpn_sigma=3.0, roi_sigma=1.0, anchor_target_creator=<chainercv.links.model.faster_rcnn.utils.anchor_target_creator.AnchorTargetCreator object>, proposal_target_creator=<chainercv.experimental.links.model.fcis.utils.proposal_target_creator.ProposalTargetCreator object>)[source]¶ Calculate losses for FCIS and report them.
This is used to train FCIS in the joint training scheme [2].
The losses include:
rpn_loc_loss
: The localization loss for Region Proposal Network (RPN).rpn_cls_loss
: The classification loss for RPN.roi_loc_loss
: The localization loss for the head module.roi_cls_loss
: The classification loss for the head module.roi_mask_loss
: The mask loss for the head module.
[2] (1, 2, 3) Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017. Parameters: - fcis (FCIS) – A FCIS model for training.
- rpn_sigma (float) – Sigma parameter for the localization loss of Region Proposal Network (RPN). The default value is 3, which is the value used in [2].
- roi_sigma (float) – Sigma parameter for the localization loss of the head. The default value is 1, which is the value used in [2].
- anchor_target_creator – An instantiation of
AnchorTargetCreator
. - proposal_target_creator – An instantiation of
ProposalTargetCreator
.
-
__call__
(imgs, masks, labels, bboxes, scale)[source]¶ Forward FCIS and calculate losses.
Here are notations used.
- \(N\) is the batch size.
- \(R\) is the number of bounding boxes per image.
- \(H\) is the image height.
- \(W\) is the image width.
Currently, only \(N=1\) is supported.
Parameters: - imgs (Variable) – A variable with a batch of images.
- masks (Variable) – A batch of masks. Its shape is \((N, R, H, W)\).
- labels (Variable) – A batch of labels. Its shape is \((N, R)\). The background is excluded from the definition, which means that the range of the value is \([0, L - 1]\). \(L\) is the number of foreground classes.
- bboxes (Variable) – A batch of bounding boxes. Its shape is \((N, R, 4)\).
- scale (float or Variable) – Amount of scaling applied to the raw image during preprocessing.
Returns: Scalar loss variable. This is the sum of losses for Region Proposal Network and the head module.
Return type: chainer.Variable
-
class
chainercv.experimental.links.model.fcis.
ProposalTargetCreator
(n_sample=128, pos_ratio=0.25, pos_iou_thresh=0.5, neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.1, binary_thresh=0.4)[source]¶ Assign ground truth classes, bounding boxes and masks to given RoIs.
The
__call__()
of this class generates training targets for each object proposal. This is used to train FCIS [3].[3] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. CVPR 2017. Parameters: - n_sample (int) – The number of sampled regions.
- pos_ratio (float) – Fraction of regions that is labeled as a foreground.
- pos_iou_thresh (float) – IoU threshold for a RoI to be considered as a foreground.
- neg_iou_thresh_hi (float) – RoI is considered to be the background
if IoU is in
[
neg_iou_thresh_lo
,neg_iou_thresh_hi
). - neg_iou_thresh_lo (float) – See above.
- binary_thresh (float) – Threshold for resized mask.
-
__call__
(roi, mask, label, bbox, loc_normalize_mean=(0.0, 0.0, 0.0, 0.0), loc_normalize_std=(0.2, 0.2, 0.5, 0.5), mask_size=(21, 21))[source]¶ Assigns ground truth to sampled proposals.
This function samples total of
self.n_sample
RoIs from the combination ofroi
,mask
,label
and :obj: bbox. The RoIs are assigned with the ground truth class labels as well as bounding box offsets and scales to match the ground truth bounding boxes. As many aspos_ratio * self.n_sample
RoIs are sampled as foregrounds.Offsets and scales of bounding boxes are calculated using
chainercv.links.model.faster_rcnn.bbox2loc()
. Also, types of input arrays and output arrays are same.Here are notations.
- \(S\) is the total number of sampled RoIs, which equals
self.n_sample
. - \(L\) is number of object classes possibly including the background.
- \(H\) is the image height.
- \(W\) is the image width.
- \(RH\) is the mask height.
- \(RW\) is the mask width.
Parameters: - roi (array) – Region of Interests (RoIs) from which we sample. Its shape is \((R, 4)\)
- mask (array) – The coordinates of ground truth masks. Its shape is \((R', H, W)\).
- label (array) – Ground truth bounding box labels. Its shape is \((R',)\). Its range is \([0, L - 1]\), where \(L\) is the number of foreground classes.
- bbox (array) – The coordinates of ground truth bounding boxes. Its shape is \((R', 4)\).
- loc_normalize_mean (tuple of four floats) – Mean values to normalize coordinates of bounding boxes.
- loc_normalize_std (tuple of four floats) – Standard deviation of the coordinates of bounding boxes.
- mask_size (tuple of int or int) – Generated mask size, which is equal to \((RH, RW)\).
Returns: - sample_roi: Regions of interests that are sampled. Its shape is \((S, 4)\).
- gt_roi_mask: Masks assigned to sampled RoIs. Its shape is \((S, RH, RW)\).
- gt_roi_label: Labels assigned to sampled RoIs. Its shape is \((S,)\). Its range is \([0, L]\). The label with value 0 is the background.
- gt_roi_loc: Offsets and scales to match the sampled RoIs to the ground truth bounding boxes. Its shape is \((S, 4)\).
Return type: (array, array, array, array)
Extensions¶
Evaluator¶
DetectionCOCOEvaluator¶
-
class
chainercv.extensions.
DetectionCOCOEvaluator
(iterator, target, label_names=None)[source]¶ An extension that evaluates a detection model by MS COCO metric.
This extension iterates over an iterator and evaluates the prediction results. The results consist of average precisions (APs) and average recalls (ARs) as well as the mean of each (mean average precision and mean average recall). This extension reports the following values with keys. Please note that if
label_names
is not specified, only the mAPs and mARs are reported.The underlying dataset of the iterator is assumed to return
img, bbox, label
orimg, bbox, label, area, crowded
.key description ap/iou=0.50:0.95/area=all/max_dets=100/<label_names[l]> [1] ap/iou=0.50/area=all/max_dets=100/<label_names[l]> [1] ap/iou=0.75/area=all/max_dets=100/<label_names[l]> [1] ap/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]> [1] [5] ap/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]> [1] [5] ap/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]> [1] [5] ar/iou=0.50:0.95/area=all/max_dets=1/<label_names[l]> [2] ar/iou=0.50/area=all/max_dets=10/<label_names[l]> [2] ar/iou=0.75/area=all/max_dets=100/<label_names[l]> [2] ar/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]> [2] [5] ar/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]> [2] [5] ar/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]> [2] [5] map/iou=0.50:0.95/area=all/max_dets=100 [3] map/iou=0.50/area=all/max_dets=100 [3] map/iou=0.75/area=all/max_dets=100 [3] map/iou=0.50:0.95/area=small/max_dets=100 [3] [5] map/iou=0.50:0.95/area=medium/max_dets=100 [3] [5] map/iou=0.50:0.95/area=large/max_dets=100 [3] [5] ar/iou=0.50:0.95/area=all/max_dets=1 [4] ar/iou=0.50/area=all/max_dets=10 [4] ar/iou=0.75/area=all/max_dets=100 [4] ar/iou=0.50:0.95/area=small/max_dets=100 [4] [5] ar/iou=0.50:0.95/area=medium/max_dets=100 [4] [5] ar/iou=0.50:0.95/area=large/max_dets=100 [4] [5] [1] (1, 2, 3, 4, 5, 6) Average precision for class label_names[l]
, where \(l\) is the index of the class. If class \(l\) does not exist in eitherpred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
.[2] (1, 2, 3, 4, 5, 6) Average recall for class label_names[l]
, where \(l\) is the index of the class. If class \(l\) does not exist in eitherpred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
.[3] (1, 2, 3, 4, 5, 6) The average of average precisions over classes. [4] (1, 2, 3, 4, 5, 6) The average of average recalls over classes. [5] (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) Skip if gt_areas
isNone
.Parameters: - iterator (chainer.Iterator) – An iterator. Each sample should be
following tuple
img, bbox, label, area, crowded
. - target (chainer.Link) – A detection link. This link must have
predict()
method that takes a list of images and returnsbboxes
,labels
andscores
. - label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precision and average recalls for each class are reported.
DetectionVOCEvaluator¶
-
class
chainercv.extensions.
DetectionVOCEvaluator
(iterator, target, use_07_metric=False, label_names=None)[source]¶ An extension that evaluates a detection model by PASCAL VOC metric.
This extension iterates over an iterator and evaluates the prediction results by average precisions (APs) and mean of them (mean Average Precision, mAP). This extension reports the following values with keys. Please note that
'ap/<label_names[l]>'
is reported only iflabel_names
is specified.'map'
: Mean of average precisions (mAP).'ap/<label_names[l]>'
: Average precision for classlabel_names[l]
, where \(l\) is the index of the class. For example, this evaluator reports'ap/aeroplane'
,'ap/bicycle'
, etc. iflabel_names
isvoc_bbox_label_names
. If there is no bounding box assigned to classlabel_names[l]
in either ground truth or prediction, it reportsnumpy.nan
as its average precision. In this case, mAP is computed without this class.
Parameters: - iterator (chainer.Iterator) – An iterator. Each sample should be
following tuple
img, bbox, label
orimg, bbox, label, difficult
.img
is an image,bbox
is coordinates of bounding boxes,label
is labels of the bounding boxes anddifficult
is whether the bounding boxes are difficult or not. Ifdifficult
is returned, difficult ground truth will be ignored from evaluation. - target (chainer.Link) – A detection link. This link must have
predict()
method that takes a list of images and returnsbboxes
,labels
andscores
. - use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric
for calculating average precision. The default value is
False
. - label_names (iterable of strings) – An iterable of names of classes.
If this value is specified, average precision for each class is
also reported with the key
'ap/<label_names[l]>'
.
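A minimal sketch of evaluating a pretrained detector on the VOC2007 test split; the dataset and weights are downloaded automatically on first use, and the evaluator can also be attached to a trainer with trainer.extend(evaluator):
from chainer.iterators import SerialIterator
from chainercv.datasets import VOCBboxDataset, voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator
from chainercv.links import SSD300

dataset = VOCBboxDataset(
    year='2007', split='test', use_difficult=True, return_difficult=True)
model = SSD300(pretrained_model='voc0712')
it = SerialIterator(dataset, batch_size=32, repeat=False, shuffle=False)
evaluator = DetectionVOCEvaluator(
    it, model, use_07_metric=True, label_names=voc_bbox_label_names)
result = evaluator()  # invoked directly, outside of a trainer
print(result['main/map'])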
InstanceSegmentationCOCOEvaluator¶
-
class
chainercv.extensions.
InstanceSegmentationCOCOEvaluator
(iterator, target, label_names=None)[source]¶ An extension that evaluates an instance segmentation model by MS COCO metric.
This extension iterates over an iterator and evaluates the prediction results. The results consist of average precisions (APs) and average recalls (ARs) as well as the mean of each (mean average precision and mean average recall). This extension reports the following values with keys. Please note that if
label_names
is not specified, only the mAPs and mARs are reported.The underlying dataset of the iterator is assumed to return
img, mask, label
orimg, mask, label, area, crowded
.key description ap/iou=0.50:0.95/area=all/max_dets=100/<label_names[l]> [6] ap/iou=0.50/area=all/max_dets=100/<label_names[l]> [6] ap/iou=0.75/area=all/max_dets=100/<label_names[l]> [6] ap/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]> [6] [10] ap/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]> [6] [10] ap/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]> [6] [10] ar/iou=0.50:0.95/area=all/max_dets=1/<label_names[l]> [7] ar/iou=0.50/area=all/max_dets=10/<label_names[l]> [7] ar/iou=0.75/area=all/max_dets=100/<label_names[l]> [7] ar/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]> [7] [10] ar/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]> [7] [10] ar/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]> [7] [10] map/iou=0.50:0.95/area=all/max_dets=100 [8] map/iou=0.50/area=all/max_dets=100 [8] map/iou=0.75/area=all/max_dets=100 [8] map/iou=0.50:0.95/area=small/max_dets=100 [8] [10] map/iou=0.50:0.95/area=medium/max_dets=100 [8] [10] map/iou=0.50:0.95/area=large/max_dets=100 [8] [10] ar/iou=0.50:0.95/area=all/max_dets=1 [9] ar/iou=0.50/area=all/max_dets=10 [9] ar/iou=0.75/area=all/max_dets=100 [9] ar/iou=0.50:0.95/area=small/max_dets=100 [9] [10] ar/iou=0.50:0.95/area=medium/max_dets=100 [9] [10] ar/iou=0.50:0.95/area=large/max_dets=100 [9] [10] [6] (1, 2, 3, 4, 5, 6) Average precision for class label_names[l]
, where \(l\) is the index of the class. If class \(l\) does not exist in eitherpred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
.[7] (1, 2, 3, 4, 5, 6) Average recall for class label_names[l]
, where \(l\) is the index of the class. If class \(l\) does not exist in eitherpred_labels
orgt_labels
, the corresponding value is set tonumpy.nan
.[8] (1, 2, 3, 4, 5, 6) The average of average precisions over classes. [9] (1, 2, 3, 4, 5, 6) The average of average recalls over classes. [10] (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) Skip if gt_areas
isNone
.Parameters: - iterator (chainer.Iterator) – An iterator. Each sample should be
following tuple
img, mask, label, area, crowded
. - target (chainer.Link) – An instance segmentation link. This link must have
predict()
method that takes a list of images and returnsmasks
,labels
andscores
. - label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precision and average recalls for each class are reported.
InstanceSegmentationVOCEvaluator¶
-
class
chainercv.extensions.
InstanceSegmentationVOCEvaluator
(iterator, target, iou_thresh=0.5, use_07_metric=False, label_names=None)[source]¶ An evaluation extension of instance-segmentation by PASCAL VOC metric.
This extension iterates over an iterator and evaluates the prediction results by average precisions (APs) and mean of them (mean Average Precision, mAP). This extension reports the following values with keys. Please note that
'ap/<label_names[l]>'
is reported only iflabel_names
is specified.'map'
: Mean of average precisions (mAP).'ap/<label_names[l]>'
: Average precision for classlabel_names[l]
, where \(l\) is the index of the class. For example, this evaluator reports'ap/aeroplane'
,'ap/bicycle'
, etc. iflabel_names
issbd_instance_segmentation_label_names
. If there is no bounding box assigned to classlabel_names[l]
in either ground truth or prediction, it reportsnumpy.nan
as its average precision. In this case, mAP is computed without this class.
Parameters: - iterator (chainer.Iterator) – An iterator. Each sample should be
following tuple
img, bbox, label
orimg, bbox, label, difficult
.img
is an image,bbox
is coordinates of bounding boxes,label
is labels of the bounding boxes anddifficult
is whether the bounding boxes are difficult or not. Ifdifficult
is returned, difficult ground truth will be ignored from evaluation. - target (chainer.Link) – An instance-segmentation link. This link must
have
predict()
method that takes a list of images and returnsmasks
,labels
andscores
. - iou_thresh (float) – Intersection over Union (IoU) threshold for calculating average precision. The default value is 0.5. - use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric
- use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric
for calculating average precision. The default value is
False
. - label_names (iterable of strings) – An iterable of names of classes.
If this value is specified, average precision for each class is
also reported with the key
'ap/<label_names[l]>'
.
SemanticSegmentationEvaluator¶
-
class
chainercv.extensions.
SemanticSegmentationEvaluator
(iterator, target, label_names=None)[source]¶ An extension that evaluates a semantic segmentation model.
This extension iterates over an iterator and evaluates the prediction results of the model by common evaluation metrics for semantic segmentation. This extension reports values with keys below. Please note that
'iou/<label_names[l]>'
and'class_accuracy/<label_names[l]>'
are reported only iflabel_names
is specified.'miou'
: Mean of IoUs (mIoU).'iou/<label_names[l]>'
: IoU for classlabel_names[l]
, where \(l\) is the index of the class. For example, iflabel_names
iscamvid_label_names
, this evaluator reports'iou/Sky'
,'ap/Building'
, etc.'mean_class_accuracy'
: Mean of class accuracies.'class_accuracy/<label_names[l]>'
: Class accuracy for classlabel_names[l]
, where \(l\) is the index of the class.'pixel_accuracy'
: Pixel accuracy.
If there is no label assigned to class
label_names[l]
in the ground truth, values corresponding to keys'iou/<label_names[l]>'
and'class_accuracy/<label_names[l]>'
arenumpy.nan
. In that case, the means of them are calculated by excluding them from calculation.For details on the evaluation metrics, please see the documentation for
chainercv.evaluations.eval_semantic_segmentation()
.Parameters: - iterator (chainer.Iterator) – An iterator. Each sample should be
following tuple
img, label
.img
is an image,label
is pixel-wise label. - target (chainer.Link) – A semantic segmentation link. This link should
have
predict()
method that takes a list of images and returnslabels
. - label_names (iterable of strings) – An iterable of names of classes.
If this value is specified, IoU and class accuracy for each class
are also reported with the keys
'iou/<label_names[l]>'
and'class_accuracy/<label_names[l]>'
.
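A minimal sketch of evaluating a pretrained semantic segmentation model on the CamVid test split (dataset and weights are downloaded automatically on first use):
from chainer.iterators import SerialIterator
from chainercv.datasets import CamVidDataset, camvid_label_names
from chainercv.extensions import SemanticSegmentationEvaluator
from chainercv.links import SegNetBasic

dataset = CamVidDataset(split='test')
model = SegNetBasic(pretrained_model='camvid')
it = SerialIterator(dataset, batch_size=8, repeat=False, shuffle=False)
evaluator = SemanticSegmentationEvaluator(
    it, model, label_names=camvid_label_names)
result = evaluator()
print(result['main/miou'])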
Visualization Report¶
DetectionVisReport¶
-
class
chainercv.extensions.
DetectionVisReport
(iterator, target, label_names=None, filename='detection_iter={iteration}_idx={index}.jpg')[source]¶ An extension that visualizes output of a detection model.
This extension visualizes the predicted bounding boxes together with the ground truth bounding boxes.
Internally, this extension takes examples from an iterator, predicts bounding boxes from the images in the examples, and visualizes them using
chainercv.visualizations.vis_bbox()
. The process can be illustrated in the following code.
batch = next(iterator)
# Convert batch -> imgs, gt_bboxes, gt_labels
pred_bboxes, pred_labels, pred_scores = target.predict(imgs)
# Visualization code
for img, gt_bbox, gt_label, pred_bbox, pred_label, pred_score \
        in zip(imgs, gt_bboxes, gt_labels,
               pred_bboxes, pred_labels, pred_scores):
    # the ground truth
    vis_bbox(img, gt_bbox, gt_label)
    # the prediction
    vis_bbox(img, pred_bbox, pred_label, pred_score)
Note
gt_bbox
andpred_bbox
are float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.gt_label
andpred_label
are intenger arrays of shape \((R,)\). Each label indicates the class of the bounding box.pred_score
is a float array of shape \((R,)\). Each score indicates how confident the prediction is.Parameters: - iterator – Iterator object that produces images and ground truth.
- target – Link object used for detection.
- label_names (iterable of strings) – Name of labels ordered according
to label ids. If this is
None
, labels will be skipped. - filename (str) – Basename for the saved image. It can contain two
keywords,
'{iteration}'
and'{index}'
. They are replaced with the iteration of the trainer and the index of the sample when this extension save an image. The default value is'detection_iter={iteration}_idx={index}.jpg'
.
Functions¶
Spatial Pooling¶
psroi_pooling_2d¶
-
chainercv.functions.
psroi_pooling_2d
(x, rois, roi_indices, out_c, out_h, out_w, spatial_scale, group_size)[source]¶ Position Sensitive Region of Interest (ROI) pooling function.
This function computes the position sensitive average of the input spatial patches for the given regions of interest. Each RoI is split into \((group\_size, group\_size)\) regions, and a position sensitive value is computed for each region.
Parameters: - x (Variable) – Input variable. The shape is expected to be 4 dimensional: (n: batch, c: channel, h: height, w: width).
- rois (array) – Input roi. The shape is expected to
be \((R, 4)\), and each datum is set as below:
(y_min, x_min, y_max, x_max). The dtype is
numpy.float32
. - roi_indices (array) – Input roi indices. The shape is expected to
be \((R, )\). The dtype is
numpy.int32
. - out_c (int) – Channels of the output image after pooling.
- out_h (int) – Height of the output image after pooling.
- out_w (int) – Width of the output image after pooling.
- spatial_scale (float) – Scale by which the RoI is resized.
- group_size (int) – Position sensitive group size.
Returns: Output variable.
Return type: Variable
See the original paper proposing PSROIPooling: R-FCN.
Links¶
Model¶
General Chain¶
General Chain¶
-
class
chainercv.links.
FeaturePredictor
(extractor, crop_size, scale_size=None, crop='center', mean=None)[source]¶ Wrapper that adds a prediction method to a feature extraction link.
The
predict()
takes three steps to make a prediction.- Preprocess input images
- Forward the preprocessed images to the network
- Average the features when more than one crop is extracted.
Example
>>> from chainercv.links import VGG16 >>> from chainercv.links import FeaturePredictor >>> base_model = VGG16() >>> model = FeaturePredictor(base_model, 224, 256) >>> prob = model.predict([img]) # Predicting multiple features >>> model.extractor.pick = ['conv5_3', 'fc7'] >>> conv5_3, fc7 = model.predict([img])
When
self.crop == 'center'
,predict()
extracts features from the center crop of the input images. Whenself.crop == '10'
,predict()
extracts features from patches that are ten-cropped from the input images.When extracting more than one crop from an image, the output of
predict()
returns the average of the features computed from the crops.Parameters: - extractor – A feature extraction link. This is a callable chain that takes a batch of images and returns a variable or a tuple of variables.
- crop_size (int or tuple) – The height and the width of an image after cropping in preprocessing. If this is an integer, the image is cropped to \((crop\_size, crop\_size)\).
- scale_size (int or tuple) – If
scale_size
isNone
, neither scaling nor resizing is conducted during preprocessing. This is the default behavior. If this is an integer, an image is resized so that the length of the shorter edge is equal toscale_size
. If this is a tuple(height, width)
, the image is resized to \((height, width)\). - crop ({'center', '10'}) – Determines the style of cropping.
- mean (numpy.ndarray) – A mean value. If this is
None
,extractor.mean
is used as the mean value.
-
predict
(imgs)[source]¶ Predict features from images.
Given \(N\) input images, this method outputs a batched array with batchsize \(N\).
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW format and the range of their values is \([0, 255]\). Returns: A batch of features or a tuple of them. Return type: numpy.ndarray or tuple of numpy.ndarray
-
class
chainercv.links.
PickableSequentialChain
[source]¶ A sequential chain that can pick intermediate layers.
Callable objects, such as
chainer.Link
andchainer.Function
, can be registered to this chain withinit_scope()
. This chain keeps the order of registrations and__call__()
executes callables in that order. Achainer.Link
object in the sequence will be added as a child link of this link.__call__()
returns single or multiple layers that are picked up through a stream of computation. These layers can be specified bypick
, which contains the names of the layers that are collected. Whenpick
is a string, single layer is returned. Whenpick
is an iterable of strings, a tuple of layers is returned. The order of the layers is the same as the order of the strings inpick
. Whenpick
isNone
, the last layer is returned.Examples
>>> import chainer.functions as F >>> import chainer.links as L >>> model = PickableSequentialChain() >>> with model.init_scope(): >>> model.l1 = L.Linear(None, 1000) >>> model.l1_relu = F.relu >>> model.l2 = L.Linear(None, 1000) >>> model.l2_relu = F.relu >>> model.l3 = L.Linear(None, 10) >>> # This is layer l3 >>> layer3 = model(x) >>> # The layers to be collected can be changed. >>> model.pick = ('l2_relu', 'l1_relu') >>> # These are layers l2_relu and l1_relu. >>> layer2, layer1 = model(x)
Parameters: - pick (string or iterable of strings) – Names of layers that are collected during the forward pass.
- layer_names (iterable of strings) – Names of layers that can be collected from this chain. The names are ordered in the order of computation.
-
copy
(*args, **kargs)[source]¶ Copies the link hierarchy to new one.
The whole hierarchy rooted by this link is copied. There are three modes to perform copy. Please see the documentation for the argument
mode
below.The name of the link is reset on the copy, since the copied instance does not belong to the original parent chain (even if exists).
Parameters: mode (str) – It should be either init
,copy
, orshare
.init
means parameter variables under the returned link object is re-initialized by calling theirinitialize()
method, so that all the parameters may have different initial values from the original link.copy
means that the link object is deeply copied, so that its parameters are not re-initialized but are also deeply copied. Thus, all parameters have same initial values but can be changed independently.share
means that the link is shallowly copied, so that its parameters’ arrays are shared with the original one. Thus, their values are changed synchronously. The defaultmode
isshare
.Returns: Copied link object. Return type: Link
Feature Extraction¶
Feature extraction links extract feature(s) from given images.
ResNet¶
-
class
chainercv.links.model.resnet.
ResNet
(n_layer, n_class=None, pretrained_model=None, mean=None, initialW=None, fc_kwargs={}, arch='fb')[source]¶ Base class for ResNet architecture.
This is a pickable sequential link. The network can choose output layers from set of all intermediate layers. The attribute
pick
is the names of the layers that are going to be picked by__call__()
. The attributelayer_names
is the names of all layers that can be picked.Examples
>>> model = ResNet50() # By default, __call__ returns a probability score (after Softmax). >>> prob = model(imgs) >>> model.pick = 'res5' # This is layer res5 >>> res5 = model(imgs) >>> model.pick = ['res5', 'fc6'] >>> # These are layers res5 and fc6. >>> res5, fc6 = model(imgs)
See also
chainercv.links.model.PickableSequentialChain
When pretrained_model is the path of a pre-trained chainer model serialized as a npz file in the constructor, this chain model automatically initializes all the parameters with it. When a string in the prespecified set is provided, a pretrained model is loaded from weights distributed on the Internet. The supported pretrained models are as follows:
imagenet: Loads weights trained with ImageNet. When arch=='he', the weights distributed at Model Zoo are used.
Parameters: - n_layer (int) – The number of layers.
- n_class (int) – The number of classes. If
None
, the default values are used. If a supported pretrained model is used, the number of classes used to train the pretrained model is used. Otherwise, the number of classes in ILSVRC 2012 dataset is used. - pretrained_model (string) – The destination of the pre-trained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - mean (numpy.ndarray) – A mean value. If
None
, the default values are used. If a supported pretrained model is used, the mean value used to train the pretrained model is used. Otherwise, the mean value calculated from ILSVRC 2012 dataset is used. - initialW (callable) – Initializer for the weights of convolution kernels.
- fc_kwargs (dict) – Keyword arguments passed to initialize
the
chainer.links.Linear
. - arch (string) – If
fb
, use Facebook ResNet architecture. Whenhe
, use the architecture presented by the original ResNet paper. This option changes where to apply strided convolution. The default value isfb
.
-
class
chainercv.links.model.resnet.
Bottleneck
(in_channels, mid_channels, out_channels, stride=1, dilate=1, groups=1, initialW=None, bn_kwargs={}, residual_conv=False, stride_first=False, add_seblock=False)[source]¶ A bottleneck layer.
Parameters: - in_channels (int) – The number of channels of the input array.
- mid_channels (int) – The number of channels of intermediate arrays.
- out_channels (int) – The number of channels of the output array.
- stride (int or tuple of ints) – Stride of filter application.
- dilate (int or tuple of ints) – Dilation factor of filter applications.
dilate=d
anddilate=(d, d)
are equivalent. - groups (int) – The number of groups to use grouped convolution in the second layer. The default is one, where grouped convolution is not used.
- initialW (callable) – Initial weight value used in the convolutional layers.
- bn_kwargs (dict) – Keyword arguments passed to initialize
chainer.links.BatchNormalization
. - residual_conv (bool) – If
True
, apply a 1x1 convolution to the residual. - stride_first (bool) – If
True
, apply strided convolution with the first convolution layer. Otherwise, apply strided convolution with the second convolution layer. - add_seblock (bool) – If
True
, apply a squeeze-and-excitation block to each residual block.
-
class
chainercv.links.model.resnet.
ResBlock
(n_layer, in_channels, mid_channels, out_channels, stride, dilate=1, groups=1, initialW=None, bn_kwargs={}, stride_first=False, add_seblock=False)[source]¶ A building block for ResNets.
in –> Bottleneck with residual_conv –> Bottleneck * (n_layer - 1) –> out
Parameters: - n_layer (int) – The number of layers used in the building block.
- in_channels (int) – The number of channels of the input array.
- mid_channels (int) – The number of channels of intermediate arrays.
- out_channels (int) – The number of channels of the output array.
- stride (int or tuple of ints) – Stride of filter application.
- dilate (int or tuple of ints) – Dilation factor of filter applications.
dilate=d
anddilate=(d, d)
are equivalent. - groups (int) – The number of groups to use grouped convolution in the second layer of each bottleneck. The default is one, where grouped convolution is not used.
- initialW (callable) – Initial weight value used in the convolutional layers.
- bn_kwargs (dict) – Keyword arguments passed to initialize
chainer.links.BatchNormalization
. - stride_first (bool) – This determines the behavior of the
bottleneck with a shortcut. If
True
, apply strided convolution with the first convolution layer. Otherwise, apply strided convolution with the second convolution layer. - add_seblock (bool) – If
True
, apply a squeeze-and-excitation block to each residual block.
SEResNet¶
-
class
chainercv.links.model.senet.
SEResNet
(n_layer, n_class=None, pretrained_model=None, mean=None, initialW=None, fc_kwargs={})[source]¶ Base class for SE-ResNet architecture.
This architecture is based on ResNet. A squeeze-and-excitation block is applied at the end of each non-identity branch of residual block. Please refer to the original paper for a detailed description of network architecture.
Similar to chainercv.links.model.resnet.ResNet, ImageNet pretrained weights are downloaded when the pretrained_model argument is imagenet. They were originally distributed at the GitHub repository by one of the paper authors.
Parameters: - n_layer (int) – The number of layers.
- n_class (int) – The number of classes. If
None
, the default values are used. If a supported pretrained model is used, the number of classes used to train the pretrained model is used. Otherwise, the number of classes in ILSVRC 2012 dataset is used. - pretrained_model (string) – The destination of the pre-trained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - mean (numpy.ndarray) – A mean value. If
None
, the default values are used. If a supported pretrained model is used, the mean value used to train the pretrained model is used. Otherwise, the mean value calculated from ILSVRC 2012 dataset is used. - initialW (callable) – Initializer for the weights of convolution kernels.
- fc_kwargs (dict) – Keyword arguments passed to initialize
the
chainer.links.Linear
.
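A rough usage sketch, assuming the SEResNet50 variant exported from chainercv.links is available in your installation (the dummy input is only for illustration):
import numpy as np
from chainercv.links import SEResNet50

# A dummy batch with one RGB image in CHW format.
imgs = np.zeros((1, 3, 224, 224), dtype=np.float32)
model = SEResNet50(pretrained_model='imagenet')
# By default, __call__ returns probabilities after Softmax.
prob = model(imgs)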
-
class
chainercv.links.model.senet.
SEResNeXt
(n_layer, n_class=None, pretrained_model=None, mean=None, initialW=None, fc_kwargs={})[source]¶ Base class for SE-ResNeXt architecture.
ResNeXt is a ResNet-based architecture, where grouped convolution is adopted in the second convolution layer of each bottleneck block. In addition, a squeeze-and-excitation block is applied at the end of each non-identity branch of a residual block. Please refer to Aggregated Residual Transformations for Deep Neural Networks and Squeeze-and-Excitation Networks for a detailed description of the network architecture.
Similar to chainercv.links.model.resnet.ResNet, ImageNet pretrained weights are downloaded when the pretrained_model argument is imagenet. They were originally distributed at the GitHub repository by one of the paper authors of SENet.
See also
chainercv.links.model.resnet.ResNet
chainercv.links.model.senet.SEResNet
chainercv.links.connection.SEBlock
Parameters: - n_layer (int) – The number of layers.
- n_class (int) – The number of classes. If
None
, the default values are used. If a supported pretrained model is used, the number of classes used to train the pretrained model is used. Otherwise, the number of classes in ILSVRC 2012 dataset is used. - pretrained_model (string) – The destination of the pre-trained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - mean (numpy.ndarray) – A mean value. If
None
, the default values are used. If a supported pretrained model is used, the mean value used to train the pretrained model is used. Otherwise, the mean value calculated from ILSVRC 2012 dataset is used. - initialW (callable) – Initializer for the weights of convolution kernels.
- fc_kwargs (dict) – Keyword arguments passed to initialize
the
chainer.links.Linear
.
VGG¶
-
class
chainercv.links.model.vgg.
VGG16
(n_class=None, pretrained_model=None, mean=None, initialW=None, initial_bias=None)[source]¶ VGG-16 Network.
This is a pickable sequential link. The network can choose output layers from the set of all intermediate layers. The attribute pick is the names of the layers that are going to be picked by __call__(). The attribute layer_names is the names of all layers that can be picked.
Examples
>>> model = VGG16()
>>> # By default, __call__ returns a probability score (after Softmax).
>>> prob = model(imgs)
>>> model.pick = 'conv5_3'
>>> # This is layer conv5_3 (after ReLU).
>>> conv5_3 = model(imgs)
>>> model.pick = ['conv5_3', 'fc6']
>>> # These are layers conv5_3 (after ReLU) and fc6 (before ReLU).
>>> conv5_3, fc6 = model(imgs)
See also
chainercv.links.model.PickableSequentialChain
When pretrained_model is the path of a pre-trained chainer model serialized as a npz file in the constructor, this chain model automatically initializes all the parameters with it. When a string in the prespecified set is provided, a pretrained model is loaded from weights distributed on the Internet. The supported pretrained models are as follows:
imagenet: Loads weights trained with ImageNet and distributed at Model Zoo.
Parameters: - n_class (int) – The number of classes. If
None
, the default values are used. If a supported pretrained model is used, the number of classes used to train the pretrained model is used. Otherwise, the number of classes in ILSVRC 2012 dataset is used. - pretrained_model (string) – The destination of the pre-trained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - mean (numpy.ndarray) – A mean value. If
None
, the default values are used. If a supported pretrained model is used, the mean value used to train the pretrained model is used. Otherwise, the mean value calculated from ILSVRC 2012 dataset is used. - initialW (callable) – Initializer for the weights.
- initial_bias (callable) – Initializer for the biases.
Detection¶
Detection links share a common method predict()
to detect objects in images.
For more details, please read FasterRCNN.predict()
.
Faster R-CNN¶
-
class
chainercv.links.model.faster_rcnn.
FasterRCNNVGG16
(n_fg_class=None, pretrained_model=None, min_size=600, max_size=1000, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32], vgg_initialW=None, rpn_initialW=None, loc_initialW=None, score_initialW=None, proposal_creator_params={})[source]¶ Faster R-CNN based on VGG-16.
When you specify the path of a pre-trained chainer model serialized as a npz file in the constructor, this chain model automatically initializes all the parameters with it. When a string in the prespecified set is provided, a pretrained model is loaded from weights distributed on the Internet. The supported pretrained models are as follows:
voc07: Loads weights trained with the trainval split of PASCAL VOC2007 Detection Dataset.
imagenet: Loads weights trained with the ImageNet classification task for the feature extractor and the head modules. Weights that do not have a corresponding layer in VGG-16 will be randomly initialized.
For descriptions on the interface of this model, please refer to FasterRCNN.
FasterRCNNVGG16 supports finer control over the random initialization of weights through the arguments vgg_initialW, rpn_initialW, loc_initialW and score_initialW. Each accepts a callable that takes an array and edits its values. If None is passed as an initializer, the default initializer is used.
Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) – The destination of the pre-trained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - min_size (int) – A preprocessing parameter for
prepare()
. - max_size (int) – A preprocessing parameter for
prepare()
. - ratios (list of floats) – This is ratios of width to height of the anchors.
- anchor_scales (list of numbers) – This is areas of anchors.
Those areas will be the product of the square of an element in
anchor_scales
and the original area of the reference window. - vgg_initialW (callable) – Initializer for the layers corresponding to the VGG-16 layers.
- rpn_initialW (callable) – Initializer for Region Proposal Network layers.
- loc_initialW (callable) – Initializer for the localization head.
- score_initialW (callable) – Initializer for the score head.
- proposal_creator_params (dict) – Key-valued parameters for
ProposalCreator
.
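A quick usage sketch with the VOC07 pretrained weights (the file name sample.jpg is an assumption for illustration):
from chainercv.datasets import voc_bbox_label_names
from chainercv.links import FasterRCNNVGG16
from chainercv.utils import read_image

img = read_image('sample.jpg')  # CHW, RGB, values in [0, 255]
model = FasterRCNNVGG16(pretrained_model='voc07')
bboxes, labels, scores = model.predict([img])
# bboxes[0] has shape (R, 4) in (y_min, x_min, y_max, x_max) order,
# and labels[0] indexes into voc_bbox_label_names.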
-
chainercv.links.model.faster_rcnn.
bbox2loc
(src_bbox, dst_bbox)[source]¶ Encodes the source and the destination bounding boxes to “loc”.
Given bounding boxes, this function computes offsets and scales to match the source bounding boxes to the target bounding boxes. Mathematically, given a bounding box whose center is \((y, x) = (p_y, p_x)\) and size \(p_h, p_w\) and the target bounding box whose center is \(g_y, g_x\) and size \(g_h, g_w\), the offsets and scales \(t_y, t_x, t_h, t_w\) can be computed by the following formulas.
- \(t_y = \frac{(g_y - p_y)} {p_h}\)
- \(t_x = \frac{(g_x - p_x)} {p_w}\)
- \(t_h = \log(\frac{g_h} {p_h})\)
- \(t_w = \log(\frac{g_w} {p_w})\)
The output has the same type as the inputs. The encoding formulas are used in works such as R-CNN [1].
[1] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014. Parameters: - src_bbox (array) – An image coordinate array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes. These coordinates are \(p_{ymin}, p_{xmin}, p_{ymax}, p_{xmax}\).
- dst_bbox (array) – An image coordinate array whose shape is \((R, 4)\). These coordinates are \(g_{ymin}, g_{xmin}, g_{ymax}, g_{xmax}\).
Returns: Bounding box offsets and scales from
src_bbox
todst_bbox
. This has shape \((R, 4)\). The second axis contains four values \(t_y, t_x, t_h, t_w\).Return type: array
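A small numeric sketch of the encoding (the boxes here are made up for illustration): a source box with center (15, 25) and size (10, 30) matched to a target box of the same size shifted by 5 pixels in y gives t_y = 0.5 and zero for the other components.
import numpy as np
from chainercv.links.model.faster_rcnn import bbox2loc

src_bbox = np.array([[10, 10, 20, 40]], dtype=np.float32)  # center (15, 25), size (10, 30)
dst_bbox = np.array([[15, 10, 25, 40]], dtype=np.float32)  # shifted by 5 in y
loc = bbox2loc(src_bbox, dst_bbox)
# loc is approximately [[0.5, 0.0, 0.0, 0.0]]:
# t_y = (20 - 15) / 10 = 0.5, t_x = 0, t_h = log(10 / 10) = 0, t_w = 0.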
-
class
chainercv.links.model.faster_rcnn.
FasterRCNN
(extractor, rpn, head, mean, min_size=600, max_size=1000, loc_normalize_mean=(0.0, 0.0, 0.0, 0.0), loc_normalize_std=(0.1, 0.1, 0.2, 0.2))[source]¶ Base class for Faster R-CNN.
This is a base class for Faster R-CNN links supporting object detection API [2]. The following three stages constitute Faster R-CNN.
- Feature extraction: Images are taken and their feature maps are calculated.
- Region Proposal Networks: Given the feature maps calculated in the previous stage, produce a set of RoIs around objects.
- Localization and Classification Heads: Using feature maps that belong to the proposed RoIs, classify the categories of the objects in the RoIs and improve localizations.
Each stage is carried out by one of the callable chainer.Chain objects feature, rpn and head.
There are two functions predict() and __call__() to conduct object detection. predict() takes images and returns bounding boxes that are converted to image coordinates. This will be useful for a scenario when Faster R-CNN is treated as a black box function, for instance. __call__() is provided for a scenario when intermediate outputs are needed, for instance, for training and debugging.
Links that support the object detection API have the method
predict()
with the same interface. Please refer topredict()
for further details.[2] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Parameters: - extractor (callable Chain) – A callable that takes a BCHW image array and returns feature maps.
- rpn (callable Chain) – A callable that has the same interface as
RegionProposalNetwork
. Please refer to the documentation found there. - head (callable Chain) – A callable that takes a BCHW array, RoIs and batch indices for RoIs. This returns class dependent localization paramters and class scores.
- mean (numpy.ndarray) – A value to be subtracted from an image
in
prepare()
. - min_size (int) – A preprocessing paramter for
prepare()
. Please refer to a docstring found forprepare()
. - max_size (int) – A preprocessing paramter for
prepare()
. - loc_normalize_mean (tuple of four floats) – Mean values of localization estimates.
- loc_normalize_std (tuple of four floats) – Standard deviation of localization estimates.
-
__call__
(x, scales=None)[source]¶ Forward Faster R-CNN.
Scaling parameter
scales
is used by RPN to determine the threshold to select small objects, which are going to be rejected irrespective of their confidence scores.Here are notations used.
- \(N\) is the batch size.
- \(R'\) is the total number of RoIs produced across batches. Given \(R_i\) proposed RoIs from the \(i\) th image, \(R' = \sum _{i=1} ^ N R_i\).
- \(L\) is the number of classes excluding the background.
Classes are ordered by the background, the first class, …, and the \(L\) th class.
Parameters: - x (Variable) – 4D image variable.
- scales (tuple of floats) – Amount of scaling applied to each input image during preprocessing.
Returns: Returns tuple of four values listed below.
- roi_cls_locs: Offsets and scalings for the proposed RoIs. Its shape is \((R', (L + 1) \times 4)\).
- roi_scores: Class predictions for the proposed RoIs. Its shape is \((R', L + 1)\).
- rois: RoIs proposed by RPN. Its shape is \((R', 4)\).
- roi_indices: Batch indices of RoIs. Its shape is \((R',)\).
Return type: Variable, Variable, array, array
-
predict
(imgs)[source]¶ Detect objects from images.
This method predicts objects for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\). Returns: This method returns a tuple of three lists, (bboxes, labels, scores)
.- bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
-
prepare
(img)[source]¶ Preprocess an image for feature extraction.
The length of the shorter edge is scaled to
self.min_size
. After the scaling, if the length of the longer edge is longer thanself.max_size
, the image is scaled to fit the longer edge toself.max_size
.After resizing the image, the image is subtracted by a mean image value
self.mean
.Parameters: img (ndarray) – An image. This is in CHW and RGB format. The range of its value is \([0, 255]\). Returns: A preprocessed image. Return type: ndarray
-
use_preset
(preset)[source]¶ Use the given preset during prediction.
This method changes values of
self.nms_thresh
andself.score_thresh
. These values are a threshold value used for non maximum suppression and a threshold value to discard low confidence proposals inpredict()
, respectively.If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
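A minimal sketch, assuming model is a Faster R-CNN instance constructed as in the examples above:
model.use_preset('evaluate')  # switch to the preset thresholds for evaluation
model.score_thresh = 0.7      # or override a single attribute directly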
-
chainercv.links.model.faster_rcnn.
generate_anchor_base
(base_size=16, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32])[source]¶ Generate anchor base windows by enumerating aspect ratio and scales.
Generate anchors that are scaled and modified to the given aspect ratios. The area of a scaled anchor is preserved when it is modified to the given aspect ratio.
R = len(ratios) * len(anchor_scales)
anchors are generated by this function. Thei * len(anchor_scales) + j
th anchor corresponds to an anchor generated byratios[i]
andanchor_scales[j]
.For example, if the scale is \(8\) and the ratio is \(0.25\), the width and the height of the base window will be stretched by \(8\). For modifying the anchor to the given aspect ratio, the height is halved and the width is doubled.
Parameters: - base_size (number) – The width and the height of the reference window.
- ratios (list of floats) – This is ratios of width to height of the anchors.
- anchor_scales (list of numbers) – This is areas of anchors.
Those areas will be the product of the square of an element in
anchor_scales
and the original area of the reference window.
Returns: An array of shape \((R, 4)\). Each element is a set of coordinates of a bounding box. The second axis corresponds to \((y_{min}, x_{min}, y_{max}, x_{max})\) of a bounding box.
Return type:
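With the default arguments the function enumerates 3 ratios x 3 scales, so a sketch of a call looks like this:
from chainercv.links.model.faster_rcnn import generate_anchor_base

anchor_base = generate_anchor_base(base_size=16, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32])
# anchor_base.shape == (9, 4); each row is (y_min, x_min, y_max, x_max)
# of an anchor centered on the 16x16 reference window.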
-
chainercv.links.model.faster_rcnn.
loc2bbox
(src_bbox, loc)[source]¶ Decode bounding boxes from bounding box offsets and scales.
Given bounding box offsets and scales computed by
bbox2loc()
, this function decodes the representation to coordinates in 2D image coordinates.Given scales and offsets \(t_y, t_x, t_h, t_w\) and a bounding box whose center is \((y, x) = p_y, p_x\) and size \(p_h, p_w\), the decoded bounding box’s center \(\hat{g}_y\), \(\hat{g}_x\) and size \(\hat{g}_h\), \(\hat{g}_w\) are calculated by the following formulas.
- \(\hat{g}_y = p_h t_y + p_y\)
- \(\hat{g}_x = p_w t_x + p_x\)
- \(\hat{g}_h = p_h \exp(t_h)\)
- \(\hat{g}_w = p_w \exp(t_w)\)
The decoding formulas are used in works such as R-CNN [3].
The output has the same type as the inputs.
[3] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014. Parameters: - src_bbox (array) – A coordinates of bounding boxes. Its shape is \((R, 4)\). These coordinates are \(p_{ymin}, p_{xmin}, p_{ymax}, p_{xmax}\).
- loc (array) – An array with offsets and scales.
The shapes of
src_bbox
andloc
should be same. This contains values \(t_y, t_x, t_h, t_w\).
Returns: Decoded bounding box coordinates. Its shape is \((R, 4)\). The second axis contains four values \(\hat{g}_{ymin}, \hat{g}_{xmin}, \hat{g}_{ymax}, \hat{g}_{xmax}\).
Return type: array
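Because loc2bbox() inverts bbox2loc(), a round trip recovers the target boxes (toy values for illustration):
import numpy as np
from chainercv.links.model.faster_rcnn import bbox2loc, loc2bbox

src_bbox = np.array([[10, 10, 20, 40]], dtype=np.float32)
dst_bbox = np.array([[15, 12, 27, 38]], dtype=np.float32)
loc = bbox2loc(src_bbox, dst_bbox)
decoded = loc2bbox(src_bbox, loc)
# decoded is approximately equal to dst_bbox.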
-
class
chainercv.links.model.faster_rcnn.
ProposalCreator
(nms_thresh=0.7, n_train_pre_nms=12000, n_train_post_nms=2000, n_test_pre_nms=6000, n_test_post_nms=300, force_cpu_nms=False, min_size=16)[source]¶ Proposal regions are generated by calling this object.
The
__call__()
of this object outputs object detection proposals by applying estimated bounding box offsets to a set of anchors.
This class takes parameters to control the number of bounding boxes to pass to NMS and to keep after NMS. If the parameters are negative, it uses all the bounding boxes supplied or keeps all the bounding boxes returned by NMS.
This class is used for Region Proposal Networks introduced in Faster R-CNN [4].
[4] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Parameters: - nms_thresh (float) – Threshold value used when calling NMS.
- n_train_pre_nms (int) – Number of top scored bounding boxes to keep before passing to NMS in train mode.
- n_train_post_nms (int) – Number of top scored bounding boxes to keep after passing to NMS in train mode.
- n_test_pre_nms (int) – Number of top scored bounding boxes to keep before passing to NMS in test mode.
- n_test_post_nms (int) – Number of top scored bounding boxes to keep after passing to NMS in test mode.
- force_cpu_nms (bool) – If this is
True
, always use NMS in CPU mode. IfFalse
, the NMS mode is selected based on the type of inputs. - min_size (int) – A parameter to determine the threshold for discarding bounding boxes based on their sizes.
-
__call__
(loc, score, anchor, img_size, scale=1.0)[source]¶ Propose RoIs.
Inputs
loc, score, anchor
refer to the same anchor when indexed by the same index.
On notations, \(R\) is the total number of anchors. This is equal to the product of the height and the width of an image and the number of anchor bases per pixel.
The type of the output is the same as that of the inputs.
Parameters: - loc (array) – Predicted offsets and scaling to anchors. Its shape is \((R, 4)\).
- score (array) – Predicted foreground probability for anchors. Its shape is \((R,)\).
- anchor (array) – Coordinates of anchors. Its shape is \((R, 4)\).
- img_size (tuple of ints) – A tuple
height, width
, which contains image size after scaling. - scale (float) – The scaling factor used to scale an image after reading it from a file.
Returns: An array of coordinates of proposal boxes. Its shape is \((S, 4)\). \(S\) is less than
self.n_test_post_nms
in test time and less thanself.n_train_post_nms
in train time. \(S\) depends on the size of the predicted bounding boxes and the number of bounding boxes discarded by NMS.Return type: array
-
class
chainercv.links.model.faster_rcnn.
RegionProposalNetwork
(in_channels=512, mid_channels=512, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32], feat_stride=16, initialW=None, proposal_creator_params={})[source]¶ Region Proposal Network introduced in Faster R-CNN.
This is the Region Proposal Network introduced in Faster R-CNN [5]. It takes features extracted from images and proposes class-agnostic bounding boxes around "objects".
[5] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Parameters: - in_channels (int) – The channel size of input.
- mid_channels (int) – The channel size of the intermediate tensor.
- ratios (list of floats) – This is ratios of width to height of the anchors.
- anchor_scales (list of numbers) – This is areas of anchors.
Those areas will be the product of the square of an element in
anchor_scales
and the original area of the reference window. - feat_stride (int) – Stride size after extracting features from an image.
- initialW (callable) – Initial weight value. If
None
then this function uses a Gaussian distribution scaled by 0.1 to initialize the weights. May also be a callable that takes an array and edits its values. - proposal_creator_params (dict) – Key-valued parameters for
ProposalCreator
.
See also
-
__call__
(x, img_size, scales=None)[source]¶ Forward Region Proposal Network.
Here are notations.
- \(N\) is the batch size.
- \(C\) is the channel size of the input.
- \(H\) and \(W\) are the height and width of the input feature map.
- \(A\) is the number of anchors assigned to each pixel.
Parameters: - x (Variable) – The features extracted from images. Its shape is \((N, C, H, W)\).
- img_size (tuple of ints) – A tuple
height, width
, which contains image size after scaling. - scales (tuple of floats) – The amount of scaling done to each input image during preprocessing.
Returns: This is a tuple of five following values.
- rpn_locs: Predicted bounding box offsets and scales for anchors. Its shape is \((N, H W A, 4)\).
- rpn_scores: Predicted foreground scores for anchors. Its shape is \((N, H W A, 2)\).
- rois: A bounding box array containing coordinates of proposal boxes. This is a concatenation of bounding box arrays from multiple images in the batch. Its shape is \((R', 4)\). Given \(R_i\) predicted bounding boxes from the \(i\) th image, \(R' = \sum _{i=1} ^ N R_i\).
- roi_indices: An array containing indices of images to which RoIs correspond to. Its shape is \((R',)\).
- anchor: Coordinates of enumerated shifted anchors. Its shape is \((H W A, 4)\).
Return type: (Variable, Variable, array, array, array)
-
class
chainercv.links.model.faster_rcnn.
VGG16RoIHead
(n_class, roi_size, spatial_scale, vgg_initialW=None, loc_initialW=None, score_initialW=None)[source]¶ Faster R-CNN Head for VGG-16 based implementation.
This class is used as a head for Faster R-CNN. This outputs class-wise localizations and classification based on feature maps in the given RoIs.
Parameters: - n_class (int) – The number of classes possibly including the background.
- roi_size (int) – Height and width of the feature maps after RoI-pooling.
- spatial_scale (float) – Scale by which RoI coordinates are resized to match the feature maps.
- vgg_initialW (callable) – Initializer for the layers corresponding to the VGG-16 layers.
- loc_initialW (callable) – Initializer for the localization head.
- score_initialW (callable) – Initializer for the score head.
-
class
chainercv.links.model.faster_rcnn.
AnchorTargetCreator
(n_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5)[source]¶ Assign the ground truth bounding boxes to anchors.
Assigns the ground truth bounding boxes to anchors for training Region Proposal Networks introduced in Faster R-CNN [6].
Offsets and scales to match anchors to the ground truth are calculated using the encoding scheme of
bbox2loc()
.[6] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Parameters: - n_sample (int) – The number of regions to produce.
- pos_iou_thresh (float) – Anchors with IoU above this threshold will be assigned as positive.
- neg_iou_thresh (float) – Anchors with IoU below this threshold will be assigned as negative.
- pos_ratio (float) – Ratio of positive regions in the sampled regions.
-
__call__
(bbox, anchor, img_size)[source]¶ Assign ground truth supervision to sampled subset of anchors.
The types of the input arrays and output arrays are the same.
Here are notations.
- \(S\) is the number of anchors.
- \(R\) is the number of bounding boxes.
Parameters: - bbox (array) – Coordinates of bounding boxes. Its shape is \((R, 4)\).
- anchor (array) – Coordinates of anchors. Its shape is \((S, 4)\).
- img_size (tuple of ints) – A tuple
H, W
, which is a tuple of height and width of an image.
Returns: - loc: Offsets and scales to match the anchors to the ground truth bounding boxes. Its shape is \((S, 4)\).
- label: Labels of anchors with values
(1=positive, 0=negative, -1=ignore)
. Its shape is \((S,)\).
Return type: (array, array)
-
class
chainercv.links.model.faster_rcnn.
FasterRCNNTrainChain
(faster_rcnn, rpn_sigma=3.0, roi_sigma=1.0, anchor_target_creator=<chainercv.links.model.faster_rcnn.utils.anchor_target_creator.AnchorTargetCreator object>, proposal_target_creator=<chainercv.links.model.faster_rcnn.utils.proposal_target_creator.ProposalTargetCreator object>)[source]¶ Calculate losses for Faster R-CNN and report them.
This is used to train Faster R-CNN in the joint training scheme [7].
The losses include:
rpn_loc_loss
: The localization loss for Region Proposal Network (RPN).rpn_cls_loss
: The classification loss for RPN.roi_loc_loss
: The localization loss for the head module.roi_cls_loss
: The classification loss for the head module.
[7] (1, 2, 3) Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Parameters: - faster_rcnn (FasterRCNN) – A Faster R-CNN model that is going to be trained.
- rpn_sigma (float) – Sigma parameter for the localization loss of Region Proposal Network (RPN). The default value is 3, which is the value used in [7].
- roi_sigma (float) – Sigma paramter for the localization loss of the head. The default value is 1, which is the value used in [7].
- anchor_target_creator – An instantiation of
AnchorTargetCreator
. - proposal_target_creator – An instantiation of
ProposalTargetCreator
.
-
__call__
(imgs, bboxes, labels, scales)[source]¶ Forward Faster R-CNN and calculate losses.
Here are notations used.
- \(N\) is the batch size.
- \(R\) is the number of bounding boxes per image.
Currently, only \(N=1\) is supported.
Parameters: - imgs (Variable) – A variable with a batch of images.
- bboxes (Variable) – A batch of bounding boxes. Its shape is \((N, R, 4)\).
- labels (Variable) – A batch of labels. Its shape is \((N, R)\). The background is excluded from the definition, which means that the range of the value is \([0, L - 1]\). \(L\) is the number of foreground classes.
- scales (Variable) – Amount of scaling applied to each input image during preprocessing.
Returns: Scalar loss variable. This is the sum of losses for Region Proposal Network and the head module.
Return type: chainer.Variable
-
class
chainercv.links.model.faster_rcnn.
ProposalTargetCreator
(n_sample=128, pos_ratio=0.25, pos_iou_thresh=0.5, neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0)[source]¶ Assign ground truth bounding boxes to given RoIs.
The
__call__()
of this class generates training targets for each object proposal. This is used to train Faster RCNN [8].[8] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Parameters: - n_sample (int) – The number of sampled regions.
- pos_ratio (float) – Fraction of regions that is labeled as a foreground.
- pos_iou_thresh (float) – IoU threshold for a RoI to be considered as a foreground.
- neg_iou_thresh_hi (float) – RoI is considered to be the background
if IoU is in
[
neg_iou_thresh_lo
,neg_iou_thresh_hi
). - neg_iou_thresh_lo (float) – See above.
-
__call__
(roi, bbox, label, loc_normalize_mean=(0.0, 0.0, 0.0, 0.0), loc_normalize_std=(0.1, 0.1, 0.2, 0.2))[source]¶ Assigns ground truth to sampled proposals.
This function samples a total of
self.n_sample
RoIs from the combination ofroi
andbbox
. The RoIs are assigned with the ground truth class labels as well as bounding box offsets and scales to match the ground truth bounding boxes. As many aspos_ratio * self.n_sample
RoIs are sampled as foregrounds.Offsets and scales of bounding boxes are calculated using
chainercv.links.model.faster_rcnn.bbox2loc()
. Also, the types of the input arrays and output arrays are the same.
Here are notations.
- \(S\) is the total number of sampled RoIs, which equals
self.n_sample
- \(L\) is the number of object classes possibly including the background.
Parameters: - roi (array) – Region of Interests (RoIs) from which we sample. Its shape is \((R, 4)\)
- bbox (array) – The coordinates of ground truth bounding boxes. Its shape is \((R', 4)\).
- label (array) – Ground truth bounding box labels. Its shape is \((R',)\). Its range is \([0, L - 1]\), where \(L\) is the number of foreground classes.
- loc_normalize_mean (tuple of four floats) – Mean values to normalize coordinates of bounding boxes.
- loc_normalize_std (tuple of four floats) – Standard deviation of the coordinates of bounding boxes.
Returns: - sample_roi: Regions of interests that are sampled. Its shape is \((S, 4)\).
- gt_roi_loc: Offsets and scales to match the sampled RoIs to the ground truth bounding boxes. Its shape is \((S, 4)\).
- gt_roi_label: Labels assigned to sampled RoIs. Its shape is \((S,)\). Its range is \([0, L]\). The label with value 0 is the background.
Return type: (array, array, array)
- \(S\) is the total number of sampled RoIs, which equals
FPN (Feature Pyramid Networks)¶
-
class
chainercv.links.model.fpn.
FasterRCNNFPNResNet50
(n_fg_class=None, pretrained_model=None)[source]¶ Feature Pyramid Networks with ResNet-50.
This is a model of Feature Pyramid Networks [1]. This model uses
ResNet50
as its base feature extractor.[1] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017 Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'coco'
, filepath orNone
. The default value isNone
.'coco'
: Load weights trained on train split of MS COCO 2017. The weight file is downloaded and cached automatically.n_fg_class
must be80
orNone
.'imagenet'
: Load weights of ResNet-50 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights partially and the rest are initialized randomly. In this case, n_fg_class
can be set to any number.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
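A minimal usage sketch with the COCO pretrained weights (the file name sample.jpg is an assumption for illustration):
from chainercv.links import FasterRCNNFPNResNet50
from chainercv.utils import read_image

img = read_image('sample.jpg')  # CHW, RGB, values in [0, 255]
model = FasterRCNNFPNResNet50(pretrained_model='coco')
bboxes, labels, scores = model.predict([img])
# labels[0] indexes into the 80 MS COCO foreground classes.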
-
class
chainercv.links.model.fpn.
FasterRCNNFPNResNet101
(n_fg_class=None, pretrained_model=None)[source]¶ Feature Pyramid Networks with ResNet-101.
This is a model of Feature Pyramid Networks [2]. This model uses
ResNet101
as its base feature extractor.[2] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017 Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'coco'
, filepath orNone
. The default value isNone
.'coco'
: Load weights trained on train split of MS COCO 2017. The weight file is downloaded and cached automatically.n_fg_class
must be80
orNone
.'imagenet'
: Load weights of ResNet-101 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights partially and the rest are initialized randomly. In this case, n_fg_class
can be set to any number.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
-
class
chainercv.links.model.fpn.
FasterRCNN
(extractor, rpn, head)[source]¶ Base class of Feature Pyramid Networks.
This is a base class of Feature Pyramid Networks [3].
[3] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. CVPR 2017 Parameters: - extractor (Link) – A link that extracts feature maps.
This link must have
scales
,mean
and__call__()
. - rpn (Link) – A link that has the same interface as
RPN
. Please refer to the documentation found there. - head (Link) – A link that has the same interface as
Head
. Please refer to the documentation found there. - nms_thresh (float) – The threshold value
for
non_maximum_suppression()
. The default value is0.45
. This value can be changed directly or by usinguse_preset()
. - score_thresh (float) – The threshold value for confidence score.
A bounding box whose confidence score is lower than this value will be suppressed.
The default value is
0.6
. This value can be changed directly or by usinguse_preset()
.
-
predict
(imgs)[source]¶ Detect objects from images.
This method predicts objects for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\). Returns: This method returns a tuple of three lists, (bboxes, labels, scores)
.- bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
-
prepare
(imgs)[source]¶ Preprocess images.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\). Returns: Preprocessed images and scales that were calculated in preprocessing. Return type: Two arrays
-
use_preset
(preset)[source]¶ Use the given preset during prediction.
This method changes values of
nms_thresh
andscore_thresh
. These values are a threshold value used for non maximum suppression and a threshold value to discard low confidence proposals inpredict()
, respectively.If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
- extractor (Link) – A link that extracts feature maps.
This link must have
-
class
chainercv.links.model.fpn.
FPN
(base, n_base_output, scales)[source]¶ An extractor class of Feature Pyramid Networks.
This class wraps a feature extractor and provides multi-scale features.
Parameters: - base (Link) – A base feature extractor.
It should have
__call__()
andmean
.__call__()
should take a batch of images and return feature maps of them. The size of the \(k+1\)-th feature map should be the half as that of the \(k\)-th feature map. - n_base_output (int) – The number of feature maps
that
base
returns. - scales (tuple of floats) – The scales of feature maps.
- base (Link) – A base feature extractor.
It should have
-
class
chainercv.links.model.fpn.
Head
(n_class, scales)[source]¶ Head network of Feature Pyramid Networks.
Parameters: - n_class (int) – The number of classes including background.
- scales (tuple of floats) – The scales of feature maps.
-
__call__
(hs, rois, roi_indices)[source]¶ Calculates RoIs.
Parameters: - hs (iterable of array) – An iterable of feature maps.
- rois (list of arrays) – A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices (list of arrays) – A list of arrays of shape \((R_l,)\).
Returns: locs
andconfs
.- locs: An arrays whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the batch.
- confs: An array whose shape is \((R, n\_class)\).
Return type: tuple of two arrays
-
decode
(rois, roi_indices, locs, confs, scales, sizes, nms_thresh, score_thresh)[source]¶ Decodes back to coordinates of RoIs.
This method decodes
locs
andconfs
returned by a FPN network back tobboxes
,labels
andscores
.Parameters: - rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
- locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
- confs (array) – An array whose shape is \((R, n\_class)\).
- scales (list of floats) – A list of floats returned
by
prepare()
- sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
- nms_thresh (float) – The threshold value
for
non_maximum_suppression()
. - score_thresh (float) – The threshold value for confidence score.
Returns: bboxes
,labels
andscores
.Return type: tuple of three list of arrays
- bboxes: A list of float arrays of shape \((R'_n, 4)\), where \(R'_n\) is the number of bounding boxes in the \(n\)-th image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels : A list of integer arrays of shape \((R'_n,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores : A list of float arrays of shape \((R'_n,)\). Each value indicates how confident the prediction is.
-
distribute
(rois, roi_indices)[source]¶ Assigns RoIs to feature maps according to their size.
Parameters: - rois (array) – An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
- roi_indices (array) – An array of shape \((R,)\).
Returns: rois
androi_indices
.- rois: A list of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices : A list of arrays of shape \((R_l,)\).
Return type: tuple of two lists
-
class
chainercv.links.model.fpn.
RPN
(scales)[source]¶ Region Proposal Network of Feature Pyramid Networks.
Parameters: scales (tuple of floats) – The scales of feature maps. -
__call__
(hs)[source]¶ Calculates RoIs.
Parameters: hs (iterable of array) – An iterable of feature maps. Returns: locs
andconfs
.- locs: A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
- confs: A list of arrays whose shape is \((N, K_l)\).
Return type: tuple of two arrays
-
anchors
(sizes)[source]¶ Calculates anchor boxes.
Parameters: sizes (iterable of tuples of two ints) – An iterable of \((H_l, W_l)\), where \(H_l\) and \(W_l\) are height and width of the \(l\)-th feature map. Returns: The shape of the \(l\)-th array is \((H_l * W_l * A, 4)\), where \(A\) is the number of anchor ratios. Return type: list of arrays
-
decode
(locs, confs, anchors, in_shape)[source]¶ Decodes back to coordinates of RoIs.
This method decodes
locs
andconfs
returned by a FPN network back torois
androi_indices
.Parameters: - locs (list of arrays) – A list of arrays whose shape is \((N, K_l, 4)\), where \(N\) is the size of batch and \(N_l\) is the number of the anchor boxes of the \(l\)-th level.
- confs (list of arrays) – A list of array whose shape is \((N, K_l)\).
- anchors (list of arrays) – Anchor boxes returned by
anchors()
- in_shape (tuple of ints) – The shape of the input array to the feature extractor.
Returns: rois
androi_indices
.- rois: An array of shape \((R, 4)\), where \(R\) is the total number of RoIs in the given batch.
- roi_indices : An array of shape \((R,)\).
Return type: tuple of two arrays
-
-
chainercv.links.model.fpn.
head_loss_pre
(rois, roi_indices, std, bboxes, labels)[source]¶ Loss function for Head (pre).
This function processes RoIs for
head_loss_post()
.Parameters: - rois (iterable of arrays) – An iterable of arrays of shape \((R_l, 4)\), where \(R_l\) is the number of RoIs in the \(l\)-th feature map.
- roi_indices (iterable of arrays) – An iterable of arrays of shape \((R_l,)\).
- std (tuple of floats) – Two coefficients used for encoding bounding boxes.
- bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
- labels – A list of arrays whose shape is \((R_n,)\).
-
chainercv.links.model.fpn.
head_loss_post
(locs, confs, roi_indices, gt_locs, gt_labels, batchsize)[source]¶ Loss function for Head (post).
Parameters: - locs (array) – An array whose shape is \((R, n\_class, 4)\), where \(R\) is the total number of RoIs in the given batch.
- confs (array) – An iterable of arrays whose shape is \((R, n\_class)\).
- roi_indices (list of arrays) – A list of arrays returned by
head_loss_pre()
. - gt_locs (list of arrays) – A list of arrays returned by
head_loss_pre()
. - gt_labels (list of arrays) – A list of arrays returned by
head_loss_pre()
. - batchsize (int) – The size of batch.
Returns: loc_loss
andconf_loss
.Return type: tuple of two variables
-
chainercv.links.model.fpn.
rpn_loss
(locs, confs, anchors, sizes, bboxes)[source]¶ Loss function for RPN.
Parameters: - locs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l, 4)\), where \(K_l\) is the number of the anchor boxes of the \(l\)-th level.
- confs (iterable of arrays) – An iterable of arrays whose shape is \((N, K_l)\).
- anchors (list of arrays) – A list of arrays returned by
anchors()
. - sizes (list of tuples of two ints) – A list of \((H_n, W_n)\), where \(H_n\) and \(W_n\) are height and width of the \(n\)-th image.
- bboxes (list of arrays) – A list of arrays whose shape is \((R_n, 4)\), where \(R_n\) is the number of ground truth bounding boxes.
Returns: loc_loss
andconf_loss
.Return type: tuple of two variables
SSD (Single Shot Multibox Detector)¶
-
class
chainercv.links.model.ssd.
SSD300
(n_fg_class=None, pretrained_model=None)[source]¶ Single Shot Multibox Detector with 300x300 inputs.
This is a model of Single Shot Multibox Detector [1]. This model uses
VGG16Extractor300
as its feature extractor.[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'voc0712'
, filepath orNone
. The default value isNone
.'voc0712'
: Load weights trained on trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically.n_fg_class
must be20
orNone
. These weights were converted from the Caffe model provided by the original implementation. The conversion code is chainercv/examples/ssd/caffe2npz.py.'imagenet'
: Load weights of VGG-16 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights partially and the rest are initialized randomly. In this case, n_fg_class
can be set to any number.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
-
class
chainercv.links.model.ssd.
SSD512
(n_fg_class=None, pretrained_model=None)[source]¶ Single Shot Multibox Detector with 512x512 inputs.
This is a model of Single Shot Multibox Detector [2]. This model uses
VGG16Extractor512
as its feature extractor.[2] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'voc0712'
, filepath orNone
. The default value isNone
.'voc0712'
: Load weights trained on trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically.n_fg_class
must be20
orNone
. These weights were converted from the Caffe model provided by the original implementation. The conversion code is chainercv/examples/ssd/caffe2npz.py.'imagenet'
: Load weights of VGG-16 trained on ImageNet. The weight file is downloaded and cached automatically. This option initializes the weights partially and the rest are initialized randomly. In this case, n_fg_class
can be set to any number.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
-
class
chainercv.links.model.ssd.
Multibox
(n_class, aspect_ratios, initialW=None, initial_bias=None)[source]¶ Multibox head of Single Shot Multibox Detector.
This is a head part of Single Shot Multibox Detector [3]. This link computes
mb_locs
andmb_confs
from feature maps.mb_locs
contains information of the coordinates of bounding boxes andmb_confs
contains confidence scores for each class.
[3] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - n_class (int) – The number of classes possibly including the background.
- aspect_ratios (iterable of tuple or int) – The aspect ratios of default bounding boxes for each feature map.
- initialW – An initializer used in
chainer.links.Convolution2D.__init__()
. The default value ischainer.initializers.LeCunUniform
. - initial_bias – An initializer used in
chainer.links.Convolution2D.__init__()
. The default value ischainer.initializers.Zero
.
-
__call__
(xs)[source]¶ Compute loc and conf from feature maps
This method computes
mb_locs
andmb_confs
from given feature maps.Parameters: xs (iterable of chainer.Variable) – An iterable of feature maps. The number of feature maps must be same as the number of aspect_ratios
.Returns: This method returns two chainer.Variable
:mb_locs
andmb_confs
.- mb_locs: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
- mb_confs: A variable of float arrays of shape \((B, K, n\_fg\_class + 1)\).
Return type: tuple of chainer.Variable
-
class
chainercv.links.model.ssd.
MultiboxCoder
(grids, aspect_ratios, steps, sizes, variance)[source]¶ A helper class to encode/decode bounding boxes.
This class encodes
(bbox, label)
to(mb_loc, mb_label)
and decodes(mb_loc, mb_conf)
to(bbox, label, score)
. These encoding/decoding are used in Single Shot Multibox Detector [4].mb_loc
: An array representing offsets and scales from the default bounding boxes. Its shape is \((K, 4)\), where \(K\) is the number of the default bounding boxes. The second axis is composed by \((\Delta y, \Delta x, \Delta h, \Delta w)\). These values are computed by the following formulas.- \(\Delta y = (b_y - m_y) / (m_h * v_0)\)
- \(\Delta x = (b_x - m_x) / (m_w * v_0)\)
- \(\Delta h = log(b_h / m_h) / v_1\)
- \(\Delta w = log(b_w / m_w) / v_1\)
\((m_y, m_x)\) and \((m_h, m_w)\) are the center coordinates and size of a default bounding box. \((b_y, b_x)\) and \((b_h, b_w)\) are the center coordinates and size of a given bounding box that is assigned to the default bounding box. \((v_0, v_1)\) are coefficients that can be set by the argument
variance
.mb_label
: An array representing classes of ground truth bounding boxes. Its shape is \((K,)\).mb_conf
: An array representing classes of predicted bounding boxes. Its shape is \((K, n\_fg\_class + 1)\).
[4] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - grids (iterable of ints) – An iterable of integers. Each integer indicates the size of a feature map.
- aspect_ratios (iterable of tuples of ints) – An iterable of tuples of integers
used to compute the default bounding boxes.
Each tuple indicates the aspect ratios of
the default bounding boxes at each feature maps.
The length of this iterable should be
len(grids)
. - steps (iterable of floats) – The step size for each feature map.
The length of this iterable should be
len(grids)
. - sizes (iterable of floats) – The base size of default bounding boxes
for each feature map.
The length of this iterable should be
len(grids) + 1
. - variance (tuple of floats) – Two coefficients for encoding/decoding the locations of bounding boxes. The first value is used to encode/decode coordinates of the centers. The second value is used to encode/decode the sizes of bounding boxes.
-
decode
(mb_loc, mb_conf, nms_thresh=0.45, score_thresh=0.6)[source]¶ Decodes back to coordinates and classes of bounding boxes.
This method decodes
mb_loc
andmb_conf
returned by a SSD network back tobbox
,label
andscore
.Parameters: - mb_loc (array) – A float array whose shape is \((K, 4)\), \(K\) is the number of default bounding boxes.
- mb_conf (array) – A float array whose shape is \((K, n\_fg\_class + 1)\).
- nms_thresh (float) – The threshold value
for
non_maximum_suppression()
. The default value is0.45
. - score_thresh (float) – The threshold value for confidence score.
A bounding box whose confidence score is lower than this value will be suppressed.
The default value is
0.6
.
Returns: This method returns a tuple of three arrays,
(bbox, label, score)
.- bbox: A float array of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- label : An integer array of shape \((R,)\). Each value indicates the class of the bounding box.
- score : A float array of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of three arrays
-
encode
(bbox, label, iou_thresh=0.5)[source]¶ Encodes coordinates and classes of bounding boxes.
This method encodes
bbox
andlabel
tomb_loc
andmb_label
, which are used to compute multibox loss.Parameters: - bbox (array) – A float array of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- label (array) – An integer array of shape \((R,)\). Each value indicates the class of the bounding box.
- iou_thresh (float) – The threshold value to determine whether a default bounding box is assigned to a ground truth or not. The default value is
0.5
.
Returns: This method returns a tuple of two arrays,
(mb_loc, mb_label)
.- mb_loc: A float array of shape \((K, 4)\), where \(K\) is the number of default bounding boxes.
- mb_label: An integer array of shape \((K,)\).
Return type: tuple of two arrays
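As a sketch, a coder does not need to be built by hand; the one owned by an SSD model can be reused (n_fg_class=20 and the toy bounding box are arbitrary choices for illustration):
import numpy as np
from chainercv.links import SSD300

model = SSD300(n_fg_class=20)  # randomly initialized, no pretrained weights
coder = model.coder            # a MultiboxCoder configured for this model
bbox = np.array([[10, 10, 100, 120]], dtype=np.float32)
label = np.array([3], dtype=np.int32)
mb_loc, mb_label = coder.encode(bbox, label)
# mb_loc has shape (K, 4) and mb_label has shape (K,),
# where K is the number of default bounding boxes of SSD300.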
-
class
chainercv.links.model.ssd.
Normalize
(n_channel, initial=0, eps=1e-05)[source]¶ Learnable L2 normalization [5].
This link normalizes input along the channel axis and scales it. The scale factors are trained channel-wise.
[5] Wei Liu, Andrew Rabinovich, Alexander C. Berg. ParseNet: Looking Wider to See Better. ICLR 2016. Parameters:
-
class
chainercv.links.model.ssd.
SSD
(extractor, multibox, steps, sizes, variance=(0.1, 0.2), mean=0)[source]¶ Base class of Single Shot Multibox Detector.
This is a base class of Single Shot Multibox Detector [6].
[6] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - extractor –
A link which extracts feature maps. This link must have
insize
,grids
and__call__()
.insize
: An integer which indicates the size of input images. Images are resized to this size before feature extraction.grids
: An iterable of integers. Each integer indicates the size of a feature map. This value is used by MultiboxCoder
. __call__()
: A method which computes feature maps. It must take batched images and return batched feature maps.
- multibox –
A link which computes
mb_locs
andmb_confs
from feature maps. This link must haven_class
,aspect_ratios
and__call__()
.n_class
: An integer which indicates the number of classes. This value should include the background class.aspect_ratios
: An iterable of tuples of integers. Each tuple indicates the aspect ratios of default bounding boxes at each feature map. This value is used by MultiboxCoder
.__call__()
: A method which computesmb_locs
andmb_confs
. It must take batched feature maps and return mb_locs
andmb_confs
.
- steps (iterable of float) – The step size for each feature map.
This value is used by
MultiboxCoder
. - sizes (iterable of float) – The base size of default bounding boxes
for each feature map. This value is used by
MultiboxCoder
. - variance (tuple of floats) – Two coefficients for decoding
the locations of bounding boxes.
This value is used by
MultiboxCoder
. The default value is(0.1, 0.2)
. - nms_thresh (float) – The threshold value
for
non_maximum_suppression()
. The default value is0.45
. This value can be changed directly or by usinguse_preset()
. - score_thresh (float) – The threshold value for confidence score.
A bounding box whose confidence score is lower than this value
will be suppressed.
The default value is
0.6
. This value can be changed directly or by usinguse_preset()
.
-
__call__
(x)[source]¶ Compute localization and classification from a batch of images.
This method computes two variables,
mb_locs
andmb_confs
.self.coder.decode()
converts these variables to bounding box coordinates and confidence scores. These variables are also used in training SSD.Parameters: x (chainer.Variable) – A variable holding a batch of images. The images are preprocessed by _prepare()
.Returns: This method returns two variables, mb_locs
andmb_confs
.- mb_locs: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
- mb_confs: A variable of float arrays of shape \((B, K, n\_fg\_class + 1)\).
Return type: tuple of chainer.Variable
-
predict
(imgs)[source]¶ Detect objects from images.
This method predicts objects for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\). Returns: This method returns a tuple of three lists, (bboxes, labels, scores)
- bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
-
to_cpu
()[source]¶ Copies parameter variables and persistent values to CPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to CPU, the link implementation must override this method to do so.
Returns: self
-
to_gpu
(device=None)[source]¶ Copies parameter variables and persistent values to GPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to GPU, the link implementation must override this method to do so.
Parameters: device – Target device specifier. If omitted, the current device is used. Returns: self
-
use_preset
(preset)[source]¶ Use the given preset during prediction.
This method changes values of
nms_thresh
andscore_thresh
. These values are a threshold value used for non maximum suppression and a threshold value to discard low confidence proposals inpredict()
, respectively.If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
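The following is a minimal sketch of switching presets on a pretrained detector, assuming the VOC pretrained SSD300 weights are available; the manual override at the end is only an illustration.
from chainercv.links import SSD300

model = SSD300(pretrained_model='voc0712')
model.use_preset('evaluate')    # thresholds tuned for benchmark evaluation
# ... run model.predict(...) here ...
model.use_preset('visualize')   # thresholds tuned for drawing detections
# The thresholds can also be overridden directly:
model.score_thresh = 0.5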
-
class
chainercv.links.model.ssd.
VGG16
[source]¶ An extended VGG-16 model for SSD300 and SSD512.
This is an extended VGG-16 model proposed in [7]. The differences from original VGG-16 [8] are shown below.
conv5_1
,conv5_2
andconv5_3
are changed fromConvolution2d
toDilatedConvolution2d
.Normalize
is inserted afterconv4_3
.- The parameters of max pooling after
conv5_3
are changed. fc6
andfc7
are converted toconv6
andconv7
.
[7] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. [8] Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.
-
class
chainercv.links.model.ssd.
VGG16Extractor300
[source]¶ A VGG-16 based feature extractor for SSD300.
This is a feature extractor for
SSD300
. This extractor is based onVGG16
.-
__call__
(x)[source]¶ Compute feature maps from a batch of images.
This method extracts feature maps from
conv4_3
,conv7
,conv8_2
,conv9_2
,conv10_2
, andconv11_2
.Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(300\times 300\). Returns: Each variable contains a feature map. Return type: list of Variable
-
-
class
chainercv.links.model.ssd.
VGG16Extractor512
[source]¶ A VGG-16 based feature extractor for SSD512.
This is a feature extractor for
SSD512
. This extractor is based onVGG16
.-
__call__
(x)[source]¶ Compute feature maps from a batch of images.
This method extracts feature maps from
conv4_3
,conv7
,conv8_2
,conv9_2
,conv10_2
,conv11_2
, andconv12_2
.Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(512\times 512\). Returns: Each variable contains a feature map. Return type: list of Variable
-
-
chainercv.links.model.ssd.
multibox_loss
(mb_locs, mb_confs, gt_mb_locs, gt_mb_labels, k, comm=None)[source]¶ Computes multibox losses.
This is a loss function used in [9]. This function returns
loc_loss
andconf_loss
.loc_loss
is a loss for localization andconf_loss
is a loss for classification. The formulas of these losses can be found in the equation (2) and (3) in the original paper.[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - mb_locs (chainer.Variable or array) – The offsets and scales for predicted bounding boxes. Its shape is \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
- mb_confs (chainer.Variable or array) – The classes of predicted bounding boxes. Its shape is \((B, K, n\_class)\). This function assumes the first class is background (negative).
- gt_mb_locs (chainer.Variable or array) – The offsets and scales for ground truth bounding boxes. Its shape is \((B, K, 4)\).
- gt_mb_labels (chainer.Variable or array) – The classes of ground truth bounding boxes. Its shape is \((B, K)\).
- k (float) – A coefficient which is used for hard negative mining.
This value determines the ratio between the number of positives
and that of mined negatives. The value used in the original paper
is
3
. - comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, the number of positive examples is computed among all GPUs.
Returns: This function returns two
chainer.Variable
:loc_loss
andconf_loss
.Return type: tuple of chainer.Variable
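A minimal sketch of calling multibox_loss() on random arrays, only to illustrate the expected shapes; the batch size, number of default boxes and class count below are arbitrary assumptions.
import numpy as np
from chainercv.links.model.ssd import multibox_loss

B, K, n_class = 2, 8732, 21  # assumed sizes; n_class includes the background
mb_locs = np.random.randn(B, K, 4).astype(np.float32)
mb_confs = np.random.randn(B, K, n_class).astype(np.float32)
gt_mb_locs = np.random.randn(B, K, 4).astype(np.float32)
gt_mb_labels = np.random.randint(0, n_class, size=(B, K)).astype(np.int32)
# k=3 keeps three mined negatives per positive, as in the original paper.
loc_loss, conf_loss = multibox_loss(
    mb_locs, mb_confs, gt_mb_locs, gt_mb_labels, k=3)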
-
chainercv.links.model.ssd.
random_crop_with_bbox_constraints
(img, bbox, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50, return_param=False)[source]¶ Crop an image randomly with bounding box constraints.
This data augmentation is used in training of Single Shot Multibox Detector [10]. More details can be found in data augmentation section of the original paper.
[10] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - img (ndarray) – An image array to be cropped. This is in CHW format.
- bbox (ndarray) – Bounding boxes used for constraints. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
- min_scale (float) – The minimum ratio between a cropped
region and the original image. The default value is
0.3
. - max_scale (float) – The maximum ratio between a cropped
region and the original image. The default value is
1
. - max_aspect_ratio (float) – The maximum aspect ratio of cropped region.
The default value is
2
- constraints (iterable of tuples) – An iterable of constraints.
Each constraint should be in
(min_iou, max_iou)
format. If you setmin_iou
ormax_iou
toNone
, that bound is not limited. If this argument is not specified, ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1))
will be used. - max_trial (int) – The maximum number of trials to be conducted
for each constraint. If this function
can not find any region that satisfies the constraint in
\(max\_trial\) trials, this function skips the constraint.
The default value is
50
. - return_param (bool) – If
True
, this function returns information of intermediate values.
Returns: If
return_param = False
, returns an arrayimg
that is cropped from the input array.If
return_param = True
, returns a tuple whose elements areimg, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- constraint (tuple): The chosen constraint.
- y_slice (slice): A slice in vertical direction used to crop the input image.
- x_slice (slice): A slice in horizontal direction used to crop the input image.
Return type:
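A minimal sketch combining this crop with crop_bbox() so that the boxes follow the cropped image; the image and boxes below are toy data.
import numpy as np
from chainercv.links.model.ssd import random_crop_with_bbox_constraints
from chainercv.transforms import crop_bbox

img = np.random.randint(0, 256, size=(3, 300, 300)).astype(np.float32)
bbox = np.array([[30, 30, 120, 150]], dtype=np.float32)
out_img, param = random_crop_with_bbox_constraints(img, bbox, return_param=True)
# Translate and truncate the boxes into the cropped coordinate system.
out_bbox, _ = crop_bbox(
    bbox, y_slice=param['y_slice'], x_slice=param['x_slice'],
    allow_outside_center=False, return_param=True)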
-
chainercv.links.model.ssd.
random_distort
(img, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18)[source]¶ A color related data augmentation used in SSD.
This function is a combination of four augmentation methods: brightness, contrast, saturation and hue.
- brightness: Adding a random offset to the intensity of the image.
- contrast: Multiplying the intensity of the image by a random scale.
- saturation: Multiplying the saturation of the image by a random scale.
- hue: Adding a random offset to the hue of the image.
This data augmentation is used in training of Single Shot Multibox Detector [11].
Note that this function requires
cv2
.[11] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - img (ndarray) – An image array to be augmented. This is in CHW and RGB format.
- brightness_delta (float) – The offset for brightness will be
drawn from \([-brightness\_delta, brightness\_delta]\).
The default value is
32
. - contrast_low (float) – The scale for contrast will be
drawn from \([contrast\_low, contrast\_high]\).
The default value is
0.5
. - contrast_high (float) – See
contrast_low
. The default value is1.5
. - saturation_low (float) – The scale for saturation will be
drawn from \([saturation\_low, saturation\_high]\).
The default value is
0.5
. - saturation_high (float) – See
saturation_low
. The default value is1.5
. - hue_delta (float) – The offset for hue will be
drawn from \([-hue\_delta, hue\_delta]\).
The default value is
18
.
Returns: An image in CHW and RGB format.
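A minimal sketch of the photometric distortion on a toy image, assuming cv2 is installed; the default brightness, contrast, saturation and hue ranges are used.
import numpy as np
from chainercv.links.model.ssd import random_distort

img = np.random.randint(0, 256, size=(3, 300, 300)).astype(np.float32)
# A CHW, RGB image with randomized brightness, contrast, saturation and hue.
out_img = random_distort(img)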
-
chainercv.links.model.ssd.
resize_with_random_interpolation
(img, size, return_param=False)[source]¶ Resize an image with a randomly selected interpolation method.
This function is similar to
chainercv.transforms.resize()
, but this chooses the interpolation method randomly.This data augmentation is used in training of Single Shot Multibox Detector [12].
Note that this function requires
cv2
.[12] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: Returns: If
return_param = False
, returns an arrayimg
that is the result of rotation.If
return_param = True
, returns a tuple whose elements areimg, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- interpolatation: The chosen interpolation method.
Return type:
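A minimal sketch of this resize, assuming cv2 is installed; the input and output sizes are arbitrary.
import numpy as np
from chainercv.links.model.ssd import resize_with_random_interpolation

img = np.random.randint(0, 256, size=(3, 300, 400)).astype(np.float32)
out_img = resize_with_random_interpolation(img, (300, 300))  # (3, 300, 300)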
YOLO¶
-
class
chainercv.links.model.yolo.
YOLOv2
(n_fg_class=None, pretrained_model=None)[source]¶ YOLOv2.
This is a model of YOLOv2 [1]. This model uses
Darknet19Extractor
as its feature extractor.[1] Joseph Redmon, Ali Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017. Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'voc0712'
, filepath orNone
. The default value isNone
.'voc0712'
: Load weights trained on trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically.n_fg_class
must be20
orNone
. These weights were converted from the darknet model provided by the original implementation. The conversion code is chainercv/examples/yolo/darknet2npz.py.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
-
class
chainercv.links.model.yolo.
YOLOv3
(n_fg_class=None, pretrained_model=None)[source]¶ YOLOv3.
This is a model of YOLOv3 [2]. This model uses
Darknet53Extractor
as its feature extractor.[2] Joseph Redmon, Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv 2018. Parameters: - n_fg_class (int) – The number of classes excluding the background.
- pretrained_model (string) –
The weight file to be loaded. This can take
'voc0712'
, filepath orNone
. The default value isNone
.'voc0712'
: Load weights trained on trainval split of PASCAL VOC 2007 and 2012. The weight file is downloaded and cached automatically.n_fg_class
must be20
orNone
. These weights were converted from the darknet model. The conversion code is chainercv/examples/yolo/darknet2npz.py.- filepath: A path of npz file. In this case,
n_fg_class
must be specified properly. None
: Do not load weights.
-
to_cpu
()[source]¶ Copies parameter variables and persistent values to CPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to CPU, the link implementation must override this method to do so.
Returns: self
-
to_gpu
(device=None)[source]¶ Copies parameter variables and persistent values to GPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to GPU, the link implementation must override this method to do so.
Parameters: device – Target device specifier. If omitted, the current device is used. Returns: self
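A minimal inference sketch for YOLOv3 with the VOC pretrained weights; 'sample.jpg' is a placeholder path.
from chainercv.links.model.yolo import YOLOv3
from chainercv.utils import read_image

img = read_image('sample.jpg')  # CHW, RGB, float32
model = YOLOv3(pretrained_model='voc0712')
# bboxes, labels and scores are lists with one array per input image.
bboxes, labels, scores = model.predict([img])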
-
class
chainercv.links.model.yolo.
Darknet53Extractor
[source]¶ A Darknet53 based feature extractor for YOLOv3.
This is a feature extractor for
YOLOv3.
-
__call__
(x)[source]¶ Compute feature maps from a batch of images.
This method extracts feature maps from 3 layers.
Parameters: x (ndarray) – An array holding a batch of images. The images should be resized to \(416\times 416\). Returns: Each variable contains a feature map. Return type: list of Variable
-
-
class
chainercv.links.model.yolo.
YOLOBase
(**links)[source]¶ Base class for YOLOv2 and YOLOv3.
A subclass of this class should have
extractor
,__call__()
, and_decode()
.-
predict
(imgs)[source]¶ Detect objects from images.
This method predicts objects for each image.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their value is \([0, 255]\). Returns: This method returns a tuple of three lists, (bboxes, labels, scores)
- bboxes: A list of float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in an image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- labels : A list of integer arrays of shape \((R,)\). Each value indicates the class of the bounding box. Values are in range \([0, L - 1]\), where \(L\) is the number of the foreground classes.
- scores : A list of float arrays of shape \((R,)\). Each value indicates how confident the prediction is.
Return type: tuple of lists
-
use_preset
(preset)[source]¶ Use the given preset during prediction.
This method changes values of
nms_thresh
andscore_thresh
. These values are a threshold value used for non maximum suppression and a threshold value to discard low confidence proposals inpredict()
, respectively.If the attributes need to be changed to something other than the values provided in the presets, please modify them by directly accessing the public attributes.
Parameters: preset ({'visualize', 'evaluate'}) – A string to determine the preset to use.
-
-
class
chainercv.links.model.yolo.
YOLOv2Base
(n_fg_class=None, pretrained_model=None)[source]¶ Base class for YOLOv2 and YOLOv2Tiny.
A subclass of this class should have
_extractor
,_models
, and_anchors
.-
__call__
(x)[source]¶ Compute localization, objectness, and classification from a batch of images.
This method computes three variables,
locs
,objs
, andconfs
.self._decode()
converts these variables to bounding box coordinates and confidence scores. These variables are also used in training YOLOv2.Parameters: x (chainer.Variable) – A variable holding a batch of images. Returns: This method returns three variables, locs
,objs
, andconfs
.- locs: A variable of float arrays of shape \((B, K, 4)\), where \(B\) is the number of samples in the batch and \(K\) is the number of default bounding boxes.
- objs: A variable of float arrays of shape \((B, K)\).
- confs: A variable of float arrays of shape \((B, K, n\_fg\_class)\).
Return type: tuple of chainer.Variable
-
to_cpu
()[source]¶ Copies parameter variables and persistent values to CPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to CPU, the link implementation must override this method to do so.
Returns: self
-
to_gpu
(device=None)[source]¶ Copies parameter variables and persistent values to GPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to GPU, the link implementation must override this method to do so.
Parameters: device – Target device specifier. If omitted, the current device is used. Returns: self
-
Semantic Segmentation¶
Semantic segmentation links share a common method predict()
to conduct semantic segmentation of images.
For more details, please read SegNetBasic.predict()
.
SegNet¶
-
class
chainercv.links.model.segnet.
SegNetBasic
(n_class=None, pretrained_model=None, initialW=None)[source]¶ SegNet Basic for semantic segmentation.
This is a SegNet [1] model for semantic segmentation. This is based on the SegNetBasic model that is found here.
When you specify the path of a pretrained chainer model serialized as a
npz
file in the constructor, this chain model automatically initializes all the parameters with it. When a string in prespecified set is provided, a pretrained model is loaded from weights distributed on the Internet. The list of pretrained models supported are as follows:camvid
: Loads weights trained with the train split of CamVid dataset.
[1] Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” PAMI, 2017 Parameters: - n_class (int) – The number of classes. If
None
, it can be inferred if pretrained_model
is given. - pretrained_model (string) – The destination of the pretrained
chainer model serialized as a
npz
file. If this is one of the strings described above, it automatically loads weights stored under a directory$CHAINER_DATASET_ROOT/pfnet/chainercv/models/
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. - initialW (callable) – Initializer for convolution layers.
-
__call__
(x)[source]¶ Compute an image-wise score from a batch of images.
Parameters: x (chainer.Variable) – A variable with 4D image array. Returns: An image-wise score. Its channel size is self.n_class
.Return type: chainer.Variable
-
predict
(imgs)[source]¶ Conduct semantic segmentations from images.
Parameters: imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format and the range of their values are \([0, 255]\). Returns: List of integer labels predicted from each image in the input list. Return type: list of numpy.ndarray
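A minimal inference sketch for SegNetBasic with the CamVid pretrained weights; 'sample.jpg' is a placeholder path.
from chainercv.links.model.segnet import SegNetBasic
from chainercv.utils import read_image

img = read_image('sample.jpg')
model = SegNetBasic(pretrained_model='camvid')
labels = model.predict([img])  # list with one (H, W) integer label map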
Classifiers¶
Classifier¶
-
class
chainercv.links.
PixelwiseSoftmaxClassifier
(predictor, ignore_label=-1, class_weight=None)[source]¶ A pixel-wise classifier.
It computes the loss based on a given input/label pair for semantic segmentation.
Parameters: - predictor (Link) – Predictor network.
- ignore_label (int) – A class id that is going to be ignored in evaluation. The default value is -1.
- class_weight (array) – An array
that contains constant weights that will be multiplied with the
loss values along with the channel dimension. This will be
used in
chainer.functions.softmax_cross_entropy()
.
-
__call__
(x, t)[source]¶ Computes the loss value for an image and label pair.
Parameters: - x (Variable) – A variable with a batch of images.
- t (Variable) – A variable with the ground truth image-wise label.
Returns: Loss value.
Return type: Variable
-
to_cpu
()[source]¶ Copies parameter variables and persistent values to CPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to CPU, the link implementation must override this method to do so.
Returns: self
-
to_gpu
(device=None)[source]¶ Copies parameter variables and persistent values to GPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to GPU, the link implementation must override this method to do so.
Parameters: device – Target device specifier. If omitted, the current device is used. Returns: self
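A minimal training-side sketch that wraps a semantic segmentation predictor with PixelwiseSoftmaxClassifier; the class count and optimizer settings are arbitrary assumptions.
import chainer
from chainercv.links import PixelwiseSoftmaxClassifier
from chainercv.links.model.segnet import SegNetBasic

model = PixelwiseSoftmaxClassifier(SegNetBasic(n_class=11))
optimizer = chainer.optimizers.MomentumSGD(lr=0.01)
optimizer.setup(model)
# Inside a training loop, the loss for a batch of images x and
# image-wise labels t is computed as: loss = model(x, t)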
Connection¶
Connection¶
Conv2DActiv¶
-
class
chainercv.links.connection.
Conv2DActiv
(in_channels, out_channels, ksize=None, stride=1, pad=0, dilate=1, nobias=False, initialW=None, initial_bias=None, activ=<function relu>)[source]¶ Convolution2D –> Activation
This is a chain that does two-dimensional convolution and applies an activation.
The arguments are the same as those of
chainer.links.Convolution2D
except foractiv
.Example
There are several ways to initialize a
Conv2DActiv
.Give the first three arguments explicitly:
>>> l = Conv2DActiv(5, 10, 3)
Omit
in_channels
or fill it withNone
:In these ways, attributes are initialized at runtime based on the channel size of the input.
>>> l = Conv2DActiv(10, 3) >>> l = Conv2DActiv(None, 10, 3)
Parameters: - in_channels (int or None) – Number of channels of input arrays.
If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined. - out_channels (int) – Number of channels of output arrays.
- ksize (int or tuple of ints) – Size of filters (a.k.a. kernels).
ksize=k
andksize=(k, k)
are equivalent. - stride (int or tuple of ints) – Stride of filter applications.
stride=s
andstride=(s, s)
are equivalent. - pad (int or tuple of ints) – Spatial padding width for input arrays.
pad=p
andpad=(p, p)
are equivalent. - dilate (int or tuple of ints) – Dilation factor of filter applications.
dilate=d
anddilate=(d, d)
are equivalent. - nobias (bool) – If
True
, then this link does not use the bias term. - initialW (callable) – Initial weight value. If
None
, the default initializer is used. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value. - initial_bias (callable) – Initial bias value. If
None
, the bias is set to 0. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value. - activ (callable) – An activation function. The default value is
chainer.functions.relu()
. If this isNone
, no activation is applied (i.e. the activation is the identity function).
Conv2DBNActiv¶
-
class
chainercv.links.connection.
Conv2DBNActiv
(in_channels, out_channels, ksize=None, stride=1, pad=0, dilate=1, groups=1, nobias=True, initialW=None, initial_bias=None, activ=<function relu>, bn_kwargs={})[source]¶ Convolution2D –> Batch Normalization –> Activation
This is a chain that sequentially applies a two-dimensional convolution, a batch normalization and an activation.
The arguments are the same as those of
chainer.links.Convolution2D
except foractiv
andbn_kwargs
.bn_kwargs
can includecomm
key and a communicator of ChainerMN as the value to usechainermn.links.MultiNodeBatchNormalization
. Ifcomm
is not included inbn_kwargs
,chainer.links.BatchNormalization
link from Chainer is used. Note that the default value for thenobias
is changed toTrue
.Example
There are several ways to initialize a
Conv2DBNActiv
.Give the first three arguments explicitly:
>>> l = Conv2DBNActiv(5, 10, 3)
Omit
in_channels
or fill it withNone
:In these ways, attributes are initialized at runtime based on the channel size of the input.
>>> l = Conv2DBNActiv(10, 3) >>> l = Conv2DBNActiv(None, 10, 3)
Parameters: - in_channels (int or None) – Number of channels of input arrays.
If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined. - out_channels (int) – Number of channels of output arrays.
- ksize (int or tuple of ints) – Size of filters (a.k.a. kernels).
ksize=k
andksize=(k, k)
are equivalent. - stride (int or tuple of ints) – Stride of filter applications.
stride=s
andstride=(s, s)
are equivalent. - pad (int or tuple of ints) – Spatial padding width for input arrays.
pad=p
andpad=(p, p)
are equivalent. - dilate (int or tuple of ints) – Dilation factor of filter applications.
dilate=d
anddilate=(d, d)
are equivalent. - groups (int) – The number of groups to use grouped convolution. The default is one, where grouped convolution is not used.
- nobias (bool) – If
True
, then this link does not use the bias term. - initialW (callable) – Initial weight value. If
None
, the default initializer is used. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value. - initial_bias (callable) – Initial bias value. If
None
, the bias is set to 0. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value. - activ (callable) – An activation function. The default value is
chainer.functions.relu()
. If this isNone
, no activation is applied (i.e. the activation is the identity function). - bn_kwargs (dict) – Keyword arguments passed to initialize
chainer.links.BatchNormalization
. If a ChainerMN communicator (CommunicatorBase
) is given with the keycomm
,MultiNodeBatchNormalization
will be used for the batch normalization. Otherwise,BatchNormalization
will be used.
SEBlock¶
-
class
chainercv.links.connection.
SEBlock
(n_channel, ratio=16)[source]¶ A squeeze-and-excitation block.
This block is part of squeeze-and-excitation networks. Channel-wise multiplication weights are inferred from and applied to the input feature map. Please refer to the original paper for more details.
See also
Parameters:
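A minimal sketch of applying an SEBlock to a feature map; the channel count and spatial size are arbitrary assumptions.
import numpy as np
from chainercv.links.connection import SEBlock

se = SEBlock(n_channel=64)
x = np.random.randn(1, 64, 32, 32).astype(np.float32)
# The output is a chainer.Variable with the same shape as x,
# rescaled channel-wise by the inferred weights.
y = se(x)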
Transforms¶
Image¶
center_crop¶
-
chainercv.transforms.
center_crop
(img, size, return_param=False, copy=False)[source]¶ Center crop an image by size.
An image is cropped to
size
. The center of the output image and the center of the input image are the same.Parameters: Returns: If
return_param = False
, returns an arrayout_img
that is cropped from the input array.If
return_param = True
, returns a tuple whose elements areout_img, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.y_slice (slice): A slice used to crop the input image. The relation below holds together with
x_slice
.x_slice (slice): Similar to
y_slice
.out_img = img[:, y_slice, x_slice]
Return type:
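A minimal sketch of center cropping a toy image; the sizes are arbitrary.
import numpy as np
from chainercv.transforms import center_crop

img = np.random.randn(3, 256, 256).astype(np.float32)
out_img, param = center_crop(img, (224, 224), return_param=True)
# out_img equals img[:, param['y_slice'], param['x_slice']]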
flip¶
pca_lighting¶
-
chainercv.transforms.
pca_lighting
(img, sigma, eigen_value=None, eigen_vector=None)[source]¶ AlexNet style color augmentation.
This method adds a noise vector drawn from a Gaussian. The direction of the Gaussian is same as that of the principal components of the dataset.
This method is used in training of AlexNet [1].
[1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012. Parameters: - img (ndarray) – An image array to be augmented. This is in CHW and RGB format.
- sigma (float) – Standard deviation of the Gaussian. In the original paper, this value is 10% of the range of intensity (25.5 if the range is \([0, 255]\)).
- eigen_value (ndarray) – An array of eigen values. The shape has to be \((3,)\). If it is not specified, the values computed from ImageNet are used.
- eigen_vector (ndarray) – An array of eigen vectors. The shape has to be \((3, 3)\). If it is not specified, the vectors computed from ImageNet are used.
Returns: An image in CHW format.
random_crop¶
-
chainercv.transforms.
random_crop
(img, size, return_param=False, copy=False)[source]¶ Crop array randomly into size.
The input image is cropped by a randomly selected region whose shape is
size
.Parameters: Returns: If
return_param = False
, returns an arrayout_img
that is cropped from the input array.If
return_param = True
, returns a tuple whose elements areout_img, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.y_slice (slice): A slice used to crop the input image. The relation below holds together with
x_slice
.x_slice (slice): Similar to
x_slice
.out_img = img[:, y_slice, x_slice]
Return type:
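A minimal sketch of random cropping a toy image; the sizes are arbitrary.
import numpy as np
from chainercv.transforms import random_crop

img = np.random.randn(3, 256, 256).astype(np.float32)
out_img = random_crop(img, (224, 224))  # (3, 224, 224)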
random_expand¶
-
chainercv.transforms.
random_expand
(img, max_ratio=4, fill=0, return_param=False)[source]¶ Expand an image randomly.
This method randomly places the input image on a larger canvas. The size of the canvas is \((rH, rW)\), where \((H, W)\) is the size of the input image and \(r\) is a random ratio drawn from \([1, max\_ratio]\). The canvas is filled by a value
fill
except for the region where the original image is placed. This data augmentation trick is used to create a “zoom out” effect [2].
[2] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016. Parameters: - img (ndarray) – An image array to be augmented. This is in CHW format.
- max_ratio (float) – The maximum ratio of expansion. In the original paper, this value is 4.
- fill (float, tuple or ndarray) – The value of padded pixels.
In the original paper, this value is the mean of ImageNet.
If it is
numpy.ndarray
, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels ofimg
. - return_param (bool) – Returns random parameters.
Returns: If
return_param = False
, returns an arrayout_img
that is the result of expansion.If
return_param = True
, returns a tuple whose elements areout_img, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- ratio (float): The sampled value used to make the canvas.
- y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
- x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
Return type:
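A minimal sketch of the zoom-out expansion; the per-channel fill value below is an assumption roughly matching common ImageNet channel means.
import numpy as np
from chainercv.transforms import random_expand

img = np.random.randint(0, 256, size=(3, 300, 300)).astype(np.float32)
mean = np.array([123, 117, 104], dtype=np.float32).reshape((3, 1, 1))
out_img, param = random_expand(img, max_ratio=4, fill=mean, return_param=True)
# param['ratio'], param['y_offset'] and param['x_offset'] describe the canvas.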
random_flip¶
-
chainercv.transforms.
random_flip
(img, y_random=False, x_random=False, return_param=False, copy=False)[source]¶ Randomly flip an image in vertical or horizontal direction.
Parameters: Returns: If
return_param = False
, returns an arrayout_img
that is the result of flipping.If
return_param = True
, returns a tuple whose elements areout_img, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- y_flip (bool): Whether the image was flipped in the vertical direction or not.
- x_flip (bool): Whether the image was flipped in the horizontal direction or not.
Return type:
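A minimal sketch of random horizontal flipping; flip_bbox() (documented later on this page) is used so that the boxes stay consistent with the flipped image.
import numpy as np
from chainercv.transforms import flip_bbox, random_flip

img = np.random.randn(3, 300, 300).astype(np.float32)
bbox = np.array([[10, 10, 100, 100]], dtype=np.float32)
img, param = random_flip(img, x_random=True, return_param=True)
bbox = flip_bbox(bbox, (300, 300), x_flip=param['x_flip'])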
random_rotate¶
-
chainercv.transforms.
random_rotate
(img, return_param=False)[source]¶ Randomly rotate images by 90, 180, 270 or 360 degrees.
Parameters: Returns: If
return_param = False
, returns an arrayout_img
that is the result of rotation.If
return_param = True
, returns a tuple whose elements areout_img, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- k (int): The integer that represents the number of times the image is rotated by 90 degrees.
Return type:
random_sized_crop¶
-
chainercv.transforms.
random_sized_crop
(img, scale_ratio_range=(0.08, 1), aspect_ratio_range=(0.75, 1.3333333333333333), return_param=False, copy=False)[source]¶ Crop an image to random size and aspect ratio.
The size \((H_{crop}, W_{crop})\) and the left top coordinate \((y_{start}, x_{start})\) of the crop are calculated as follows:
- \(H_{crop} = \lfloor{\sqrt{s \times H \times W \times a}}\rfloor\)
- \(W_{crop} = \lfloor{\sqrt{s \times H \times W \div a}}\rfloor\)
- \(y_{start} \sim Uniform\{0, H - H_{crop}\}\)
- \(x_{start} \sim Uniform\{0, W - W_{crop}\}\)
- \(s \sim Uniform(s_1, s_2)\)
- \(b \sim Uniform(a_1, a_2)\) and \(a = b\) or \(a = \frac{1}{b}\) in 50/50 probability.
Here, \(s_1, s_2\) are the two floats in
scale_ratio_range
and \(a_1, a_2\) are the two floats inaspect_ratio_range
. Also, \(H\) and \(W\) are the height and the width of the image. Note that \(s \approx \frac{H_{crop} \times W_{crop}}{H \times W}\) and \(a \approx \frac{H_{crop}}{W_{crop}}\). The approximations come from flooring floats to integers.Note
When it fails to sample a valid scale and aspect ratio for ten times, it picks values in a non-uniform way. If this happens, the selected scale ratio can be smaller than
scale_ratio_range[0]
.Parameters: - img (ndarray) – An image array. This is in CHW format.
- scale_ratio_range (tuple of two floats) – Determines the distribution from which a scale ratio is sampled. The default values are selected so that the area of the crop is 8~100% of the original image. This is the default setting used to train ResNets in Torch style.
- aspect_ratio_range (tuple of two floats) – Determines the distribution from which an aspect ratio is sampled. The default values are \(\frac{3}{4}\) and \(\frac{4}{3}\), which are also the default setting to train ResNets in Torch style.
- return_param (bool) – Returns parameters if
True
.
Returns: If
return_param = False
, returns only the cropped image.If
return_param = True
, returns a tuple of cropped image andparam
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.y_slice (slice): A slice used to crop the input image. The relation below holds together with
x_slice
.x_slice (slice): Similar to
y_slice
.out_img = img[:, y_slice, x_slice]
scale_ratio (float): \(s\) in the description (see above).
aspect_ratio (float): \(a\) in the description.
Return type:
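A minimal sketch of the random sized crop followed by a resize to a fixed input size, as is commonly done for classification training; the sizes are arbitrary assumptions.
import numpy as np
from chainercv.transforms import random_sized_crop, resize

img = np.random.randn(3, 480, 640).astype(np.float32)
out_img = random_sized_crop(img)
out_img = resize(out_img, (224, 224))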
resize¶
-
chainercv.transforms.
resize
(img, size, interpolation=2)[source]¶ Resize image to match the given shape.
The backend used by
resize()
is configured bychainer.global_config.cv_resize_backend
. Two backends are supported: “cv2” and “PIL”.Parameters: - img (ndarray) – An array to be transformed.
This is in CHW format and the type should be
numpy.float32
. - size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).
- interpolation (int) – Determines sampling strategy. This is one of
PIL.Image.NEAREST
,PIL.Image.BILINEAR
,PIL.Image.BICUBIC
,PIL.Image.LANCZOS
. Bilinear interpolation is the default strategy.
Returns: A resized array in CHW format.
Return type: ndarray
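A minimal sketch of resizing a toy image to a fixed (height, width); the sizes are arbitrary.
import numpy as np
from chainercv.transforms import resize

img = np.random.randn(3, 480, 640).astype(np.float32)
out_img = resize(img, (300, 300))  # bilinear interpolation by default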
resize_contain¶
-
chainercv.transforms.
resize_contain
(img, size, fill=0, interpolation=2, return_param=False)[source]¶ Resize the image to fit in the given area while keeping aspect ratio.
If both the height and the width in
size
are larger than the height and the width of theimg
, theimg
is placed on the center with an appropriate padding to matchsize
.Otherwise, the input image is scaled to fit in a canvas whose size is
size
while preserving aspect ratio.Parameters: - img (ndarray) – An array to be transformed. This is in CHW format.
- size (tuple of two ints) – A tuple of two elements:
height, width
. The size of the image after resizing. - fill (float, tuple or ndarray) – The value of padded pixels.
If it is
numpy.ndarray
, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels ofimg
. - interpolation (int) – Determines sampling strategy. This is one of
PIL.Image.NEAREST
,PIL.Image.BILINEAR
,PIL.Image.BICUBIC
,PIL.Image.LANCZOS
. Bilinear interpolation is the default strategy. - return_param (bool) – Returns information of resizing and offsetting.
Returns: If
return_param = False
, returns an arrayout_img
that is the result of resizing.If
return_param = True
, returns a tuple whose elements areout_img, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
- x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
- scaled_size (tuple): The size to which the image is scaled to before placing it on a canvas. This is a tuple of two elements:
height, width
.
Return type:
rotate¶
-
chainercv.transforms.
rotate
(img, angle, expand=True, fill=0, interpolation=2)[source]¶ Rotate images by degrees.
The backend used by
rotate()
is configured bychainer.global_config.cv_rotate_backend
. Two backends are supported: “cv2” and “PIL”.Parameters: - img (ndarray) – An arrays that get rotated. This is in CHW format.
- angle (float) – Counter clock-wise rotation angle (degree).
- expand (bool) – Whether the output shape is adapted or not.
If
True
, the input image is contained complete in the output. - fill (float) – The value used for pixels outside the boundaries.
- interpolation (int) – Determines sampling strategy. This is one of
PIL.Image.NEAREST
,PIL.Image.BILINEAR
,PIL.Image.BICUBIC
. Bilinear interpolation is the default strategy.
Returns: returns an array
out_img
that is the result of rotation.Return type:
scale¶
-
chainercv.transforms.
scale
(img, size, fit_short=True, interpolation=2)[source]¶ Rescales the input image to the given “size”.
When
fit_short == True
, the input image will be resized so that the shorter edge will be scaled to lengthsize
after resizing. For example, if the height of the image is larger than its width, image will be resized to (size * height / width, size).Otherwise, the input image will be resized so that the longer edge will be scaled to length
size
after resizing.Parameters: - img (ndarray) – An image array to be scaled. This is in CHW format.
- size (int) – The length of the smaller edge.
- fit_short (bool) – Determines whether to match the length
of the shorter edge or the longer edge to
size
. - interpolation (int) – Determines sampling strategy. This is one of
PIL.Image.NEAREST
,PIL.Image.BILINEAR
,PIL.Image.BICUBIC
,PIL.Image.LANCZOS
. Bilinear interpolation is the default strategy.
Returns: A scaled image in CHW format.
Return type:
ten_crop¶
-
chainercv.transforms.
ten_crop
(img, size)[source]¶ Crop 10 regions from an array.
This method crops 10 regions. All regions will be in shape
size
. These regions consist of 1 center crop and 4 corner crops and their horizontal flips. The crops are ordered as follows.
- center crop
- top-left crop
- bottom-left crop
- top-right crop
- bottom-right crop
- center crop (flipped horizontally)
- top-left crop (flipped horizontally)
- bottom-left crop (flipped horizontally)
- top-right crop (flipped horizontally)
- bottom-right crop (flipped horizontally)
Parameters: Returns: The cropped arrays. The shape of tensor is \((10, C, H, W)\).
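A minimal sketch of ten-crop evaluation preprocessing; the crop size is an assumption.
import numpy as np
from chainercv.transforms import ten_crop

img = np.random.randn(3, 256, 256).astype(np.float32)
crops = ten_crop(img, (224, 224))
print(crops.shape)  # (10, 3, 224, 224)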
Bounding Box¶
crop_bbox¶
-
chainercv.transforms.
crop_bbox
(bbox, y_slice=None, x_slice=None, allow_outside_center=True, return_param=False)[source]¶ Translate bounding boxes to fit within the cropped area of an image.
This method is mainly used together with image cropping. This method translates the coordinates of bounding boxes like
translate_bbox()
. In addition, this function truncates the bounding boxes to fit within the cropped area. If a bounding box does not overlap with the cropped area, this bounding box will be removed.The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are \((y_{min}, x_{min}, y_{max}, x_{max})\), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters: - bbox (ndarray) – Bounding boxes to be transformed. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
- y_slice (slice) – The slice of y axis.
- x_slice (slice) – The slice of x axis.
- allow_outside_center (bool) – If this argument is
False
, bounding boxes whose centers are outside of the cropped area are removed. The default value isTrue
. - return_param (bool) – If
True
, this function returns indices of kept bounding boxes.
Returns: If
return_param = False
, returns an arraybbox
.If
return_param = True
, returns a tuple whose elements arebbox, param
.param
is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.- index (numpy.ndarray): An array holding indices of used bounding boxes.
- truncated_index (numpy.ndarray): An array holding indices of truncated bounding boxes, with respect to the returned
bbox
, rather than originalbbox
.
Return type:
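A minimal sketch of truncating toy boxes to a crop region; the slices are arbitrary assumptions.
import numpy as np
from chainercv.transforms import crop_bbox

bbox = np.array([[10, 10, 100, 100], [150, 150, 250, 250]], dtype=np.float32)
out_bbox, param = crop_bbox(
    bbox, y_slice=slice(0, 128), x_slice=slice(0, 128), return_param=True)
# param['index'] holds the indices of the boxes kept after cropping;
# here only the first box overlaps the crop region.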
flip_bbox¶
-
chainercv.transforms.
flip_bbox
(bbox, size, y_flip=False, x_flip=False)[source]¶ Flip bounding boxes accordingly.
The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are \((y_{min}, x_{min}, y_{max}, x_{max})\), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters: - bbox (ndarray) – An array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes.
- size (tuple) – A tuple of length 2. The height and the width of the image before resized.
- y_flip (bool) – Flip bounding box according to a vertical flip of an image.
- x_flip (bool) – Flip bounding box according to a horizontal flip of an image.
Returns: Bounding boxes flipped according to the given flips.
Return type:
resize_bbox¶
-
chainercv.transforms.
resize_bbox
(bbox, in_size, out_size)[source]¶ Resize bounding boxes according to image resize.
The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are \((y_{min}, x_{min}, y_{max}, x_{max})\), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters: Returns: Bounding boxes rescaled according to the given image shapes.
Return type:
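A minimal sketch of keeping boxes consistent with an image resize; the sizes are arbitrary.
import numpy as np
from chainercv.transforms import resize, resize_bbox

img = np.random.randn(3, 480, 640).astype(np.float32)
bbox = np.array([[10, 10, 100, 100]], dtype=np.float32)
img = resize(img, (300, 300))
bbox = resize_bbox(bbox, (480, 640), (300, 300))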
rotate_bbox¶
translate_bbox¶
-
chainercv.transforms.
translate_bbox
(bbox, y_offset=0, x_offset=0)[source]¶ Translate bounding boxes.
This method is mainly used together with image transforms, such as padding and cropping, which translates the left top point of the image from coordinate \((0, 0)\) to coordinate \((y, x) = (y_{offset}, x_{offset})\).
The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are \((y_{min}, x_{min}, y_{max}, x_{max})\), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters: Returns: Bounding boxes translated according to the given offsets.
Return type:
Point¶
flip_point¶
-
chainercv.transforms.
flip_point
(point, size, y_flip=False, x_flip=False)[source]¶ Modify points according to image flips.
Parameters: - point (ndarray) – Points in the image. The shape of this array is \((P, 2)\). \(P\) is the number of points in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the points.
- size (tuple) – A tuple of length 2. The height and the width of the image, which is associated with the points.
- y_flip (bool) – Modify points according to a vertical flip of an image.
- x_flip (bool) – Modify points according to a horizontal flip of an image.
Returns: Points modified according to image flips.
Return type:
resize_point¶
-
chainercv.transforms.
resize_point
(point, in_size, out_size)[source]¶ Adapt point coordinates to the rescaled image space.
Parameters: - point (ndarray) – Points in the image. The shape of this array is \((P, 2)\). \(P\) is the number of points in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the points.
- in_size (tuple) – A tuple of length 2. The height and the width of the image before resized.
- out_size (tuple) – A tuple of length 2. The height and the width of the image after resized.
Returns: Points rescaled according to the given image shapes.
Return type:
translate_point¶
-
chainercv.transforms.
translate_point
(point, y_offset=0, x_offset=0)[source]¶ Translate points.
This method is mainly used together with image transforms, such as padding and cropping, which translates the top left point of the image to the coordinate \((y, x) = (y_{offset}, x_{offset})\).
Parameters: - point (ndarray) – Points in the image. The shape of this array is \((P, 2)\). \(P\) is the number of points in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the points.
- y_offset (int or float) – The offset along y axis.
- x_offset (int or float) – The offset along x axis.
Returns: Points modified according to the translation of an image.
Return type:
Visualizations¶
vis_bbox¶
-
chainercv.visualizations.
vis_bbox
(img, bbox, label=None, score=None, label_names=None, instance_colors=None, alpha=1.0, linewidth=3.0, ax=None)[source]¶ Visualize bounding boxes inside image.
Example
>>> from chainercv.datasets import VOCBboxDataset >>> from chainercv.datasets import voc_bbox_label_names >>> from chainercv.visualizations import vis_bbox >>> import matplotlib.pyplot as plt >>> dataset = VOCBboxDataset() >>> img, bbox, label = dataset[60] >>> vis_bbox(img, bbox, label, ... label_names=voc_bbox_label_names) >>> plt.show()
This example visualizes by displaying the same colors for bounding boxes assigned to the same labels.
>>> from chainercv.datasets import VOCBboxDataset >>> from chainercv.datasets import voc_bbox_label_names >>> from chainercv.visualizations import vis_bbox >>> from chainercv.visualizations.colormap import voc_colormap >>> import matplotlib.pyplot as plt >>> dataset = VOCBboxDataset() >>> img, bbox, label = dataset[61] >>> colors = voc_colormap(label + 1) >>> vis_bbox(img, bbox, label, ... label_names=voc_bbox_label_names, ... instance_colors=colors) >>> plt.show()
Parameters: - img (ndarray) – An array of shape \((3, height, width)\).
This is in RGB format and the range of its value is
\([0, 255]\). If this is
None
, no image is displayed. - bbox (ndarray) – An array of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. Each element is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.
- label (ndarray) – An integer array of shape \((R,)\).
The values correspond to id for label names stored in
label_names
. This is optional. - score (ndarray) – A float array of shape \((R,)\). Each value indicates how confident the prediction is. This is optional.
- label_names (iterable of strings) – Name of labels ordered according
to label ids. If this is
None
, labels will be skipped. - instance_colors (iterable of tuples) – List of colors.
Each color is RGB format and the range of its values is
\([0, 255]\). The
i
-th element is the color used to visualize thei
-th instance. Ifinstance_colors
isNone
, the red is used for all boxes. - alpha (float) – The value which determines transparency of the bounding boxes. The range of this value is \([0, 1]\).
- linewidth (float) – The thickness of the edges of the bounding boxes.
- ax (matplotlib.axes.Axis) – The visualization is displayed on this
axis. If this is
None
(default), a new axis is created.
Returns: Returns the Axes object with the plot for further tweaking.
Return type: Axes
vis_image¶
vis_instance_segmentation¶
-
chainercv.visualizations.
vis_instance_segmentation
(img, mask, label=None, score=None, label_names=None, instance_colors=None, alpha=0.7, ax=None)[source]¶ Visualize instance segmentation.
Example
This example visualizes an image and an instance segmentation.
>>> from chainercv.datasets import SBDInstanceSegmentationDataset >>> from chainercv.datasets ... import sbd_instance_segmentation_label_names >>> from chainercv.visualizations import vis_instance_segmentation >>> import matplotlib.pyplot as plt >>> dataset = SBDInstanceSegmentationDataset() >>> img, mask, label = dataset[0] >>> vis_instance_segmentation( ... img, mask, label, ... label_names=sbd_instance_segmentation_label_names) >>> plt.show()
This example visualizes an image, an instance segmentation and bounding boxes.
>>> from chainercv.datasets import SBDInstanceSegmentationDataset >>> from chainercv.datasets ... import sbd_instance_segmentation_label_names >>> from chainercv.visualizations import vis_bbox >>> from chainercv.visualizations import vis_instance_segmentation >>> from chainercv.visualizations.colormap import voc_colormap >>> from chainercv.utils import mask_to_bbox >>> import matplotlib.pyplot as plt >>> dataset = SBDInstanceSegmentationDataset() >>> img, mask, label = dataset[0] >>> bbox = mask_to_bbox(mask) >>> colors = voc_colormap(list(range(1, len(mask) + 1))) >>> ax = vis_bbox(img, bbox, label, ... label_names=sbd_instance_segmentation_label_names, ... instance_colors=colors, alpha=0.7, linewidth=0.5) >>> vis_instance_segmentation( ... None, mask, instance_colors=colors, alpha=0.7, ax=ax) >>> plt.show()
Parameters: - img (ndarray) – An array of shape \((3, H, W)\).
This is in RGB format and the range of its value is
\([0, 255]\). If this is
None
, no image is displayed. - mask (ndarray) – A bool array of shape
\((R, H, W)\).
If there is an object, the value of the pixel is
True
, and otherwise, it isFalse
. - label (ndarray) – An integer array of shape \((R, )\).
The values correspond to id for label names stored in
label_names
. - score (ndarray) – A float array of shape \((R,)\). Each value indicates how confident the prediction is. This is optional.
- label_names (iterable of strings) – Name of labels ordered according to label ids.
- instance_colors (iterable of tuple) – List of colors.
Each color is RGB format and the range of its values is
\([0, 255]\). The
i
-th element is the color used to visualize thei
-th instance. Ifinstance_colors
isNone
, the default color map is used. - alpha (float) – The value which determines transparency of the figure.
The range of this value is \([0, 1]\). If this
value is
0
, the figure will be completely transparent. The default value is0.7
. This option is useful for overlaying the label on the source image. - ax (matplotlib.axes.Axis) – The visualization is displayed on this
axis. If this is
None
(default), a new axis is created.
Returns: Returns
ax
.ax
is a matplotlib.axes.Axes
with the plot. Return type: matplotlib.axes.Axes
vis_point¶
-
chainercv.visualizations.
vis_point
(img, point, mask=None, ax=None)[source]¶ Visualize points in an image.
Example
>>> import chainercv >>> import matplotlib.pyplot as plt >>> dataset = chainercv.datasets.CUBPointDataset() >>> img, point, mask = dataset[0] >>> chainercv.visualizations.vis_point(img, point, mask) >>> plt.show()
Parameters: - img (ndarray) – An image of shape \((3, height, width)\).
This is in RGB format and the range of its value is
\([0, 255]\). This should be visualizable using
matplotlib.pyplot.imshow(img)
- point (ndarray) – An array of point coordinates whose shape is \((P, 2)\), where \(P\) is the number of points. The second axis corresponds to \(y\) and \(x\) coordinates of the points.
- mask (ndarray) – A boolean array whose shape is
\((P,)\). If \(i\) th element is
True
, the \(i\) th point is not displayed. If not specified, all points inpoint
will be displayed. - ax (matplotlib.axes.Axes) – If provided, plot on this axis.
Returns: Returns the Axes object with the plot for further tweaking.
Return type: Axes
vis_semantic_segmentation¶
-
chainercv.visualizations.
vis_semantic_segmentation
(img, label, label_names=None, label_colors=None, ignore_label_color=(0, 0, 0), alpha=1, all_label_names_in_legend=False, ax=None)[source]¶ Visualize a semantic segmentation.
Example
>>> from chainercv.datasets import VOCSemanticSegmentationDataset >>> from chainercv.datasets ... import voc_semantic_segmentation_label_colors >>> from chainercv.datasets ... import voc_semantic_segmentation_label_names >>> from chainercv.visualizations import vis_semantic_segmentation >>> import matplotlib.pyplot as plt >>> dataset = VOCSemanticSegmentationDataset() >>> img, label = dataset[60] >>> ax, legend_handles = vis_semantic_segmentation( ... img, label, ... label_names=voc_semantic_segmentation_label_names, ... label_colors=voc_semantic_segmentation_label_colors, ... alpha=0.9) >>> ax.legend(handles=legend_handles, bbox_to_anchor=(1, 1), loc=2) >>> plt.show()
Parameters: - img (ndarray) – An array of shape \((3, height, width)\).
This is in RGB format and the range of its value is
\([0, 255]\). If this is
None
, no image is displayed. - label (ndarray) – An integer array of shape
\((height, width)\).
The values correspond to id for label names stored in
label_names
. - label_names (iterable of strings) – Name of labels ordered according to label ids.
- label_colors – (iterable of tuple): An iterable of colors for regular
labels.
Each color is RGB format and the range of its values is
\([0, 255]\).
If
colors
isNone
, the default color map is used. - ignore_label_color (tuple) – Color for ignored label.
This is RGB format and the range of its values is \([0, 255]\).
The default value is
(0, 0, 0)
. - alpha (float) – The value which determines transparency of the figure.
The range of this value is \([0, 1]\). If this
value is
0
, the figure will be completely transparent. The default value is1
. This option is useful for overlaying the label on the source image. - all_label_names_in_legend (bool) – Determines whether to include
all label names in a legend. If this is
False
, the legend does not contain the names of unused labels. An unused label is defined as a label that does not appear inlabel
. The default value isFalse
. - ax (matplotlib.axes.Axis) – The visualization is displayed on this
axis. If this is
None
(default), a new axis is created.
Returns: Returns
ax
andlegend_handles
.ax
is a matplotlib.axes.Axes
with the plot. It can be used for further tweaking. legend_handles
is a list of legends. It can be passed to matplotlib.pyplot.legend()
to show a legend. Return type: matplotlib.axes.Axes and list of matplotlib.patches.Patch
Utils¶
Bounding Box Utilities¶
bbox_iou¶
-
chainercv.utils.
bbox_iou
(bbox_a, bbox_b)[source]¶ Calculate the Intersection of Unions (IoUs) between bounding boxes.
IoU is calculated as a ratio of area of the intersection and area of the union.
This function accepts both
numpy.ndarray
andcupy.ndarray
as inputs. Please note that bothbbox_a
andbbox_b
need to be the same type. The output is the same type as the inputs.Parameters: - bbox_a (array) – An array whose shape is \((N, 4)\).
\(N\) is the number of bounding boxes.
The dtype should be
numpy.float32
. - bbox_b (array) – An array similar to
bbox_a
, whose shape is \((K, 4)\). The dtype should benumpy.float32
.
Returns: An array whose shape is \((N, K)\). An element at index \((n, k)\) contains IoUs between \(n\) th bounding box in
bbox_a
and \(k\) th bounding box inbbox_b
.Return type: array
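As a quick, minimal sketch (the coordinates below are made up purely for illustration), bbox_iou can be used like this:
>>> import numpy as np
>>> from chainercv.utils import bbox_iou
>>> # Boxes are in (y_min, x_min, y_max, x_max) order.
>>> bbox_a = np.array([[0, 0, 10, 10], [5, 5, 15, 15]], dtype=np.float32)
>>> bbox_b = np.array([[0, 0, 10, 10]], dtype=np.float32)
>>> iou = bbox_iou(bbox_a, bbox_b)
>>> # iou has shape (N, K) = (2, 1); iou[0, 0] is 1.0 because the
>>> # first box of bbox_a coincides with the box in bbox_b.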
non_maximum_suppression¶
-
chainercv.utils.
non_maximum_suppression
(bbox, thresh, score=None, limit=None)[source]¶ Suppress bounding boxes according to their IoUs.
This method checks each bounding box sequentially and selects the bounding box if the Intersection over Unions (IoUs) between the bounding box and the previously selected bounding boxes is less than
thresh
. This method is mainly used as postprocessing of object detection. The bounding boxes are selected from ones with higher scores. Ifscore
is not provided as an argument, the bounding box is ordered by its index in ascending order.The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are \((y_{min}, x_{min}, y_{max}, x_{max})\), where the four attributes are coordinates of the top left and the bottom right vertices.
score
is a float array of shape \((R,)\). Each score indicates confidence of prediction.This function accepts both
numpy.ndarray
andcupy.ndarray
as an input. Please note that bothbbox
andscore
need to be the same type. The type of the output is the same as the input.Parameters: - bbox (array) – Bounding boxes to be transformed. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.
- thresh (float) – Threshold of IoUs.
- score (array) – An array of confidences whose shape is \((R,)\).
- limit (int) – The upper bound of the number of the output bounding boxes. If it is not specified, this method selects as many bounding boxes as possible.
Returns: An array with indices of bounding boxes that are selected. They are sorted by the scores of bounding boxes in descending order. The shape of this array is \((K,)\) and its dtype is
numpy.int32
. Note that \(K \leq R\).Return type: array
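A minimal sketch with toy boxes and scores (the numbers are made up for illustration):
>>> import numpy as np
>>> from chainercv.utils import non_maximum_suppression
>>> # The first two boxes overlap heavily; the third is far away.
>>> bbox = np.array([[0, 0, 10, 10],
...                  [1, 1, 10, 10],
...                  [50, 50, 60, 60]], dtype=np.float32)
>>> score = np.array([0.9, 0.8, 0.7], dtype=np.float32)
>>> keep = non_maximum_suppression(bbox, thresh=0.5, score=score)
>>> # keep holds the indices of the selected boxes; here the second box
>>> # is suppressed because its IoU with the first box exceeds 0.5.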
Download Utilities¶
cached_download¶
-
chainercv.utils.
cached_download
(url)[source]¶ Downloads a file and caches it.
This is different from the original
cached_download()
in that the download progress is reported.It downloads a file from the URL if there is no corresponding cache. After the download, this function stores a cache to the directory under the dataset root (see
set_dataset_root()
). If there is already a cache for the given URL, it just returns the path to the cache without downloading the same file.Parameters: url (string) – URL to download from. Returns: Path to the downloaded file. Return type: string
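A minimal usage sketch; the URL below is only a placeholder and has to be replaced with a real downloadable file:
>>> from chainercv.utils import cached_download
>>> # Placeholder URL for illustration only.
>>> path = cached_download('https://example.com/some_file.npz')
>>> # `path` points to the cached copy under the dataset root; calling
>>> # cached_download again with the same URL reuses the cache.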
download_model¶
-
chainercv.utils.
download_model
(url)[source]¶ Downloads a model file and puts it under model directory.
It downloads a file from the URL and puts it under the model directory. For example, if
url
is http://example.com/subdir/model.npz, the pretrained weights file will be saved to $CHAINER_DATASET_ROOT/pfnet/chainercv/models/model.npz. If there is already a file at the destination path, it just returns the path without downloading the same file.Parameters: url (string) – URL to download from. Returns: Path to the downloaded file. Return type: string
extractall¶
-
chainercv.utils.
extractall
(file_path, destination, ext)[source]¶ Extracts an archive file.
This function extracts an archive file to a destination.
Parameters: - file_path (string) – The path of a file to be extracted.
- destination (string) – A directory path. The archive file will be extracted under this directory.
- ext (string) – An extension suffix of the archive file.
This function supports
'.zip'
,'.tar'
,'.gz'
and'.tgz'
.
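A minimal usage sketch with hypothetical paths (both the archive and the destination directory are placeholders):
>>> from chainercv.utils import extractall
>>> # Hypothetical paths used purely for illustration.
>>> extractall('downloads/images.zip', 'data/images', ext='.zip')
>>> # The contents of images.zip are now available under data/images.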
Image Utilities¶
read_image¶
-
chainercv.utils.
read_image
(path, dtype=<type 'numpy.float32'>, color=True, alpha=None)[source]¶ Read an image from a file.
This function reads an image from a given file. The image is in CHW format and the range of its value is \([0, 255]\). If
color = True
, the order of the channels is RGB.The backend used by
read_image()
is configured bychainer.global_config.cv_read_image_backend
. Two backends are supported: “cv2” and “PIL”.Parameters: - path (string) – A path of image file.
- dtype – The type of array. The default value is
float32
. - color (bool) – This option determines the number of channels.
If
True
, the number of channels is three. In this case, the order of the channels is RGB. This is the default behaviour. IfFalse
, this function returns a grayscale image. - alpha (None or {'ignore', 'blend_with_white', 'blend_with_black'}) –
Choose how RGBA images are handled. By default, an error is raised. Here are the other possible behaviors:
- 'ignore': Ignore alpha channel.
- 'blend_with_white': Blend RGB image multiplied by alpha on a white image.
- 'blend_with_black': Blend RGB image multiplied by alpha on a black image.
Returns: An image.
Return type:
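A minimal usage sketch; 'sample.jpg' is a placeholder for any image file on disk:
>>> from chainercv.utils import read_image
>>> # 'sample.jpg' is a placeholder path.
>>> img = read_image('sample.jpg')
>>> # img is a float32 array in CHW order with RGB channels,
>>> # i.e. img.shape == (3, height, width).
>>> gray = read_image('sample.jpg', color=False)
>>> # gray has a single channel: (1, height, width).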
read_label¶
-
chainercv.utils.
read_label
(path, dtype=<type 'numpy.int32'>)[source]¶ Read a label image from a file.
This function reads a label image from a given file. If reading the label doesn’t work correctly, try
read_image()
with a parametercolor=True
.Parameters: - path (string) – A path of image file.
- dtype – The type of array. The default value is
int32
.
Returns: An image.
Return type:
tile_images¶
-
chainercv.utils.
tile_images
(imgs, n_col, pad=2, fill=0)[source]¶ Make a tile of images
Parameters: - imgs (numpy.ndarray) – A batch of images whose shape is BCHW.
- n_col (int) – The number of columns in a tile.
- pad (int or tuple of two ints) –
pad_y, pad_x
. This is the amounts of padding in y and x directions. If this is an integer, the amounts of padding in the two directions are the same. The default value is 2. - fill (float, tuple or ndarray) – The value of padded pixels.
If it is
numpy.ndarray
, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels ofimg
.
Returns: An image array in CHW format. The size of this image is \(((H + pad_{y}) \times \lceil B / n_{col} \rceil, (W + pad_{x}) \times n_{col})\).
Return type:
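A minimal sketch that tiles a random toy batch (the batch contents are made up):
>>> import numpy as np
>>> from chainercv.utils import tile_images
>>> # A toy batch of eight 3-channel 32x32 images.
>>> imgs = np.random.uniform(0, 255, (8, 3, 32, 32)).astype(np.float32)
>>> tile = tile_images(imgs, n_col=3, pad=2, fill=255)
>>> # tile is a single CHW image laying the batch out in a grid with
>>> # three columns (and ceil(8 / 3) = 3 rows); padded pixels are 255.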
Iterator Utilities¶
apply_to_iterator¶
-
chainercv.utils.
apply_to_iterator
(func, iterator, n_input=1, hook=None)[source]¶ Apply a function/method to batches from an iterator.
This function applies a function/method to an iterator of batches.
It assumes that the iterator iterates over a collection of tuples that contain inputs to
func()
. Additionally, the tuples may contain values that are not used by func(). For convenience, we allow the iterator to iterate over a collection of inputs that are not tuples. Here is an illustration of the expected behavior of the iterator. This behaviour is the same as chainer.Iterator.
>>> batch = next(iterator)
>>> # batch: [in_val] or
>>> # batch: [(in_val0, ..., in_val{n_input - 1})] or
>>> # batch: [(in_val0, ..., in_val{n_input - 1}, rest_val0, ...)]
func() should take batch(es) of data and return batch(es) of computed values. Here is an illustration of the expected behavior of the function.
>>> out_vals = func([in_val0], ..., [in_val{n_input - 1}])
>>> # out_vals: [out_val] or
>>> out_vals0, out_vals1, ... = func([in_val0], ..., [in_val{n_input - 1}])
>>> # out_vals0: [out_val0]
>>> # out_vals1: [out_val1]
With apply_to_iterator(), users can get iterator(s) of values returned by func(). It also returns iterator(s) of input values and values that are not used for computation.
>>> in_values, out_values, rest_values = apply_to_iterator(
>>>     func, iterator, n_input)
>>> # in_values: (iter of in_val0, ..., iter of in_val{n_input - 1})
>>> # out_values: (iter of out_val0, ...)
>>> # rest_values: (iter of rest_val0, ...)
Here is an example, which applies a pretrained Faster R-CNN to the PASCAL VOC dataset.
>>> from chainer import iterators
>>>
>>> from chainercv.datasets import VOCBBoxDataset
>>> from chainercv.links import FasterRCNNVGG16
>>> from chainercv.utils import apply_to_iterator
>>>
>>> dataset = VOCBBoxDataset(year='2007', split='test')
>>> # next(iterator) -> [(img, gt_bbox, gt_label)]
>>> iterator = iterators.SerialIterator(
...     dataset, 2, repeat=False, shuffle=False)
>>>
>>> # model.predict([img]) -> ([pred_bbox], [pred_label], [pred_score])
>>> model = FasterRCNNVGG16(pretrained_model='voc07')
>>>
>>> in_values, out_values, rest_values = apply_to_iterator(
...     model.predict, iterator)
>>>
>>> # in_values contains one iterator
>>> imgs, = in_values
>>> # out_values contains three iterators
>>> pred_bboxes, pred_labels, pred_scores = out_values
>>> # rest_values contains two iterators
>>> gt_bboxes, gt_labels = rest_values
Parameters: - func – A callable that takes batch(es) of input data and returns computed data.
- iterator (iterator) – An iterator of batches.
The first
n_input
elements in each sample are treated as input values. They are passed tofunc
. - n_input (int) – The number of input data. The default value is
1
. - hook – A callable that is called after each iteration.
in_values
,out_values
, andrest_values
are passed as arguments. Note that these values do not contain data from the previous iterations.
Returns: This function returns three tuples of iterators:
in_values
,out_values
andrest_values
.in_values
: A tuple of iterators. Each iterator returns a corresponding input value. For example, iffunc()
takes[in_val0], [in_val1]
,next(in_values[0])
andnext(in_values[1])
will bein_val0
andin_val1
.out_values
: A tuple of iterators. Each iterator returns a corresponding computed value. For example, iffunc()
returns([out_val0], [out_val1])
,next(out_values[0])
andnext(out_values[1])
will beout_val0
andout_val1
.rest_values
: A tuple of iterators. Each iterator returns a corresponding rest value. For example, if theiterator
returns[(in_val0, in_val1, rest_val0, rest_val1)]
,next(rest_values[0])
andnext(rest_values[1])
will berest_val0
andrest_val1
. If the input iterator does not give any rest values, this tuple will be empty.
Return type: Three tuples of iterators
ProgressHook¶
unzip¶
-
chainercv.utils.
unzip
(iterable)[source]¶ Converts an iterable of tuples into a tuple of iterators.
This function converts an iterable of tuples into a tuple of iterators. This is an inverse function of
six.moves.zip()
>>> from chainercv.utils import unzip
>>> data = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
>>> int_iter, str_iter = unzip(data)
>>>
>>> next(int_iter)  # 0
>>> next(int_iter)  # 1
>>> next(int_iter)  # 2
>>>
>>> next(str_iter)  # 'a'
>>> next(str_iter)  # 'b'
>>> next(str_iter)  # 'c'
Parameters: iterable (iterable) – An iterable of tuples. All tuples should have the same length. Returns: Each iterator corresponds to an element of the input tuples. Note that each iterator stores values until they are popped. To reduce memory usage, it is recommended to delete unused iterators. Return type: tuple of iterators
Link Utilities¶
prepare_pretrained_model¶
-
chainercv.utils.
prepare_pretrained_model
(param, pretrained_model, models, default={})[source]¶ Select parameters based on the existence of pretrained model.
Parameters: - param (dict) – Map from the name of the parameter to values.
- pretrained_model (string) – Name of the pretrained weight,
path to the pretrained weight or
None
. - models (dict) –
Map from the name of the pretrained weight to
model
, which is a dictionary containing the configuration used by the selected weight.model
has four keys:param
,overwritable
,url
andcv2
.- param (dict): Parameters assigned to the pretrained weight.
- overwritable (set): Names of parameters that are overwritable (i.e.,
param[key] != model['param'][key]
is accepted). - url (string): Location of the pretrained weight.
- cv2 (bool): If
True
, a warning is raised ifcv2
is not installed.
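The structure of the models argument is easiest to see in a sketch. The entry below is hypothetical: the inner keys follow the description above, but the weight name, parameter values and URL are made up.
>>> # A hypothetical `models` mapping with a single pretrained weight.
>>> models = {
...     'voc0712': {
...         'param': {'n_fg_class': 20},
...         'overwritable': {'mean'},
...         'url': 'https://example.com/weights/model_voc0712.npz',
...         'cv2': True,
...     },
... }
>>> # A link implementation would pass user-supplied parameters together
>>> # with this mapping to prepare_pretrained_model() to resolve the final
>>> # parameter values and the location of the weight file.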
Mask Utilities¶
mask_iou¶
-
chainercv.utils.
mask_iou
(mask_a, mask_b)[source]¶ Calculate the Intersection of Unions (IoUs) between masks.
IoU is calculated as a ratio of area of the intersection and area of the union.
This function accepts both
numpy.ndarray
andcupy.ndarray
as inputs. Please note that bothmask_a
andmask_b
need to be the same type. The output is the same type as the inputs.Parameters: - mask_a (array) – An array whose shape is \((N, H, W)\).
\(N\) is the number of masks.
The dtype should be
numpy.bool
. - mask_b (array) – An array similar to
mask_a
, whose shape is \((K, H, W)\). The dtype should benumpy.bool
.
Returns: An array whose shape is \((N, K)\). An element at index \((n, k)\) contains IoUs between \(n\) th mask in
mask_a
and \(k\) th mask inmask_b
.Return type: array
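A minimal sketch with toy boolean masks (the masks are made up):
>>> import numpy as np
>>> from chainercv.utils import mask_iou
>>> mask_a = np.zeros((2, 4, 4), dtype=bool)
>>> mask_a[0, :2, :2] = True   # top-left 2x2 block
>>> mask_a[1, 2:, 2:] = True   # bottom-right 2x2 block
>>> mask_b = np.zeros((1, 4, 4), dtype=bool)
>>> mask_b[0, :2, :] = True    # top half of the canvas
>>> iou = mask_iou(mask_a, mask_b)
>>> # iou has shape (2, 1); iou[0, 0] = 4 / 8 = 0.5 and iou[1, 0] = 0.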
mask_to_bbox¶
-
chainercv.utils.
mask_to_bbox
(mask)[source]¶ Compute the bounding boxes around the masked regions.
This function accepts both
numpy.ndarray
andcupy.ndarray
as inputs.Parameters: mask (array) – An array whose shape is \((R, H, W)\). \(R\) is the number of masks. The dtype should be numpy.bool
.Returns: The bounding boxes around the masked regions. This is an array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes. The dtype should be numpy.float32
.Return type: array
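A minimal usage sketch with a toy mask:
>>> import numpy as np
>>> from chainercv.utils import mask_to_bbox
>>> mask = np.zeros((1, 32, 32), dtype=bool)
>>> mask[0, 10:20, 5:25] = True
>>> bbox = mask_to_bbox(mask)
>>> # bbox is a (1, 4) float32 array of (y_min, x_min, y_max, x_max)
>>> # coordinates enclosing the masked region.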
Testing Utilities¶
assert_is_bbox¶
-
chainercv.utils.
assert_is_bbox
(bbox, size=None)[source]¶ Checks if bounding boxes satisfy bounding box format.
This function checks if given bounding boxes satisfy the bounding box format or not. If the bounding boxes do not satisfy the format, this function raises an
AssertionError
.Parameters: - bbox (ndarray) – Bounding boxes to be checked.
- size (tuple of ints) – The size of an image. If this argument is specified, each bounding box should be within the image.
assert_is_bbox_dataset¶
-
chainercv.utils.
assert_is_bbox_dataset
(dataset, n_fg_class, n_example=None)[source]¶ Checks if a dataset satisfies the bounding box dataset API.
This function checks if a given dataset satisfies the bounding box dataset API or not. If the dataset does not satisfy the API, this function raises an
AssertionError
.Parameters:
assert_is_detection_link¶
-
chainercv.utils.
assert_is_detection_link
(link, n_fg_class)[source]¶ Checks if a link satisfies detection link APIs.
This function checks if a given link satisfies detection link APIs or not. If the link does not satisfy the APIs, this function raises an
AssertionError
.Parameters: - link – A link to be checked.
- n_fg_class (int) – The number of foreground classes.
assert_is_image¶
-
chainercv.utils.
assert_is_image
(img, color=True, check_range=True)[source]¶ Checks if an image satisfies image format.
This function checks if a given image satisfies the image format or not. If the image does not satisfy the format, this function raises an
AssertionError
.Parameters: - img (ndarray) – An image to be checked.
- color (bool) – A boolean that determines the expected channel size.
If it is
True
, the number of channels should be3
. Otherwise, it should be1
. The default value isTrue
. - check_range (bool) – A boolean that determines whether the range
of values are checked or not. If it is
True
, the values of the image must be in \([0, 255]\). Otherwise, this function does not check the range. The default value isTrue
.
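A minimal sketch with a random toy image:
>>> import numpy as np
>>> from chainercv.utils import assert_is_image
>>> # A color image in CHW format with values in [0, 255].
>>> img = np.random.uniform(0, 255, (3, 48, 64)).astype(np.float32)
>>> assert_is_image(img)  # passes silently
>>> # assert_is_image(img, color=False) would raise an AssertionError
>>> # because the image has three channels, not one.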
assert_is_instance_segmentation_dataset¶
-
chainercv.utils.
assert_is_instance_segmentation_dataset
(dataset, n_fg_class, n_example=None)[source]¶ Checks if a dataset satisfies instance segmentation dataset APIs.
This function checks if a given dataset satisfies instance segmentation dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an
AssertionError
.Parameters:
assert_is_label_dataset¶
-
chainercv.utils.
assert_is_label_dataset
(dataset, n_class, n_example=None, color=True)[source]¶ Checks if a dataset satisfies the label dataset API.
This function checks if a given dataset satisfies the label dataset API or not. If the dataset does not satisfy the API, this function raises an
AssertionError
.Parameters: - dataset – A dataset to be checked.
- n_class (int) – The number of classes.
- n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.
- color (bool) – A boolean that determines the expected channel size.
If it is
True
, the number of channels should be3
. Otherwise, it should be1
. The default value isTrue
.
assert_is_point¶
-
chainercv.utils.
assert_is_point
(point, mask=None, size=None)[source]¶ Checks if points satisfy the format.
This function checks if given points satisfy the format and raises an
AssertionError
when the points violate the convention.Parameters:
assert_is_point_dataset¶
-
chainercv.utils.
assert_is_point_dataset
(dataset, n_point=None, n_example=None, no_mask=False)[source]¶ Checks if a dataset satisfies the point dataset API.
This function checks if a given dataset satisfies the point dataset API or not. If the dataset does not satisfy the API, this function raises an
AssertionError
.Parameters: - dataset – A dataset to be checked.
- n_point (int) – The number of expected points per image.
If this is
None
, the number of points per image can be arbitrary. - n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.
- no_mask (bool) – If
True
, we assume that a point mask is never contained. IfFalse
, a point mask may or may not be contained.
assert_is_semantic_segmentation_dataset¶
-
chainercv.utils.
assert_is_semantic_segmentation_dataset
(dataset, n_class, n_example=None)[source]¶ Checks if a dataset satisfies semantic segmentation dataset APIs.
This function checks if a given dataset satisfies semantic segmentation dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an
AssertionError
.Parameters:
assert_is_semantic_segmentation_link¶
-
chainercv.utils.
assert_is_semantic_segmentation_link
(link, n_class)[source]¶ Checks if a link satisfies semantic segmentation link APIs.
This function checks if a given link satisfies semantic segmentation link APIs or not. If the link does not satisfy the APIs, this function raises an
AssertionError
.Parameters: - link – A link to be checked.
- n_class (int) – The number of classes including background.
ConstantStubLink¶
-
class
chainercv.utils.
ConstantStubLink
(outputs)[source]¶ A chainer.Link that returns constant value(s).
This is a
chainer.Link
that returns constantchainer.Variable
(s) when__call__()
method is called.Parameters: outputs (ndarray or tuple of ndarrays) – The value(s) of variable(s) returned by __call__()
. If an array is specified,__call__()
returns achainer.Variable
. Otherwise, it returns a tuple ofchainer.Variable
.-
to_cpu
()[source]¶ Copies parameter variables and persistent values to CPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to CPU, the link implementation must override this method to do so.
Returns: self
-
to_gpu
()[source]¶ Copies parameter variables and persistent values to GPU.
This method does not handle non-registered attributes. If some of such attributes must be copied to GPU, the link implementation must override this method to do so.
Parameters: device – Target device specifier. If omitted, the current device is used. Returns: self
-
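A minimal usage sketch. It assumes that, as for a typical stub, the inputs passed to __call__() do not affect the returned value; the dummy input below is only illustrative.
>>> import numpy as np
>>> from chainercv.utils import ConstantStubLink
>>> # The stub always returns a Variable wrapping this array.
>>> outputs = np.ones((1, 3, 8, 8), dtype=np.float32)
>>> link = ConstantStubLink(outputs)
>>> x = np.zeros((1, 3, 8, 8), dtype=np.float32)  # dummy input
>>> y = link(x)
>>> # y is a chainer.Variable whose data equals `outputs`.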
generate_random_bbox¶
-
chainercv.utils.
generate_random_bbox
(n, img_size, min_length, max_length)[source]¶ Generate valid bounding boxes with random position and shape.
Parameters: Returns: Coordinates of bounding boxes. Its shape is \((R, 4)\). Here, \(R\) equals
n
. The second axis contains \(y_{min}, x_{min}, y_{max}, x_{max}\), where \(min\_length \leq y_{max} - y_{min} < max\_length\) and \(min\_length \leq x_{max} - x_{min} < max\_length\).Return type:
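A minimal sketch combining this function with assert_is_bbox; it assumes that img_size is given as (height, width), which the truncated parameter list above does not spell out.
>>> from chainercv.utils import assert_is_bbox, generate_random_bbox
>>> # Assumed: img_size is (height, width).
>>> bbox = generate_random_bbox(8, (128, 256), 16, 48)
>>> # bbox is an (8, 4) float32 array of valid boxes inside a 128x256
>>> # image whose side lengths lie in [16, 48).
>>> assert_is_bbox(bbox, size=(128, 256))  # passes silently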
Naming Conventions¶
Here are the notations used.
- \(B\) is the size of a batch.
- \(H\) is the height of an image.
- \(W\) is the width of an image.
- \(C\) is the number of channels.
- \(R\) is the total number of instances in an image.
- \(L\) is the number of classes.
Data objects¶
Images¶
imgs
: \((B, C, H, W)\) or \([(C, H, W)]\)img
: \((C, H, W)\)
Note
image
is used for a name of a function or a class (e.g., chainercv.utils.write_image()
).
Bounding boxes¶
bboxes
: \((B, R, 4)\) or \([(R, 4)]\)bbox
: \((R, 4)\)bb
: \((4,)\)
Labels¶
name | classification | detection and instance segmentation | semantic segmentation
---|---|---|---
labels | \((B,)\) | \((B, R)\) or \([(R,)]\) | \((B, H, W)\)
label | \(()\) | \((R,)\) | \((H, W)\)
l or lb | – | \(()\) | –
Scores and probabilities¶
score represents an unbounded confidence value.
On the other hand, probability is bounded in [0, 1]
and sums to 1.
name | classification | detection and instance segmentation | semantic segmentation
---|---|---|---
scores or probs | \((B, L)\) | \((B, R, L)\) or \([(R, L)]\) | \((B, L, H, W)\)
score or prob | \((L,)\) | \((R, L)\) | \((L, H, W)\)
sc or pb | – | \((L,)\) | –
Note
Objects that satisfy the definition of probability may still be named score.
Instance segmentations¶
masks
: \((B, R, H, W)\) or \([(R, H, W)]\)mask
: \((R, H, W)\)msk
: \((H, W)\)
Attributing an additional meaning to a basic data object¶
RoIs¶
rois
: \((R', 4)\), which consists of bounding boxes for multiple images. Assuming that there are \(B\) images each containing \(R_i\) bounding boxes, the formula \(R' = \sum R_i\) is true.roi_indices
: An array of shape \((R',)\) that contains batch indices of images to which bounding boxes correspond.roi
: \((R, 4)\). These are the RoIs for a single image (see the sketch below).
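A small NumPy sketch of these shapes with toy numbers:
>>> import numpy as np
>>> # A batch of B = 2 images with R_0 = 2 and R_1 = 1 boxes, so R' = 3.
>>> rois = np.array([[10, 10, 50, 60],
...                  [20, 30, 80, 120],
...                  [5, 5, 40, 40]], dtype=np.float32)   # (R', 4)
>>> roi_indices = np.array([0, 0, 1], dtype=np.int32)     # (R',)
>>> roi = rois[roi_indices == 0]                          # (2, 4), first image only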
Attributes associated to RoIs¶
RoIs may have additional attributes, such as class scores and masks.
These attributes are named by adding the roi_ prefix (e.g., a scores-like object is named roi_scores).
roi_xs
: \((R',) + x_{shape}\)roi_x
: \((R,) + x_{shape}\)
In the case of scores
with shape \((L,)\), roi_xs
would have shape \((R', L)\).
Note
roi_nouns = roi_noun = noun
when batchsize=1
.
Using these names interchangeably is fine in that case.
Class-wise vs class-independent¶
cls_nouns
is a multi-class version of nouns
.
For instance, cls_locs
is \((B, R, L, 4)\) and locs
is \((B, R, 4)\).
Note
cls_probs
and probs
can be used interchangeably in the case
when there is no confusion.
Arbitrary input¶
x
is a variable whose shape can be inferred from the context.
It can be used only when there is no confusion on its shape.
This is usually the case when naming an input to a neural network.
License¶
Source Code¶
The source code of ChainerCV is licensed under the MIT License.
Pretrained Models¶
Pretrained models provided by ChainerCV benefit from the following resources. See these resources for the terms of use of a model whose weights were pretrained using any of them.
model | resource
---|---
ResNet50/101/152 (imagenet) |
SEResNet50/101/152 (imagenet) |
SEResNeXt50/101 (imagenet) |
VGG16 (imagenet) |
FasterRCNNVGG16 (imagenet) |
FasterRCNNVGG16 (voc07/voc0712) |
SSD300/SSD512 (imagenet) |
SSD300/SSD512 (voc0712) |
YOLOv2 (voc0712) |
YOLOv3 (voc0712) |
PSPNetResNet101 (cityscapes) |
SegNetBasic (camvid) |
FCISResNet101 (sbd) |