Datasets

General datasets

DirectoryParsingLabelDataset

class chainercv.datasets.DirectoryParsingLabelDataset(root, check_img_file=None, color=True, numerical_sort=False)[source]

A label dataset whose label names are the names of the subdirectories.

The label names are the names of the directories located one layer below the root directory. All images under a subdirectory are assigned to the class named after that subdirectory. An image is parsed only when the function check_img_file returns True when given the path to the image as its argument. If check_img_file is None, paths with any standard image extension are parsed.

Example

The directory structure should look like the one below.

root
|-- class_0
|   |-- img_0.png
|   |-- img_1.png
|
`-- class_1
    |-- img_0.png
>>> from chainercv.datasets import DirectoryParsingLabelDataset
>>> dataset = DirectoryParsingLabelDataset('root')
>>> dataset.img_paths
['root/class_0/img_0.png', 'root/class_0/img_1.png',
'root/class_1/img_0.png']
>>> dataset.labels
array([0, 0, 1])
Parameters
  • root (string) – The root directory.

  • check_img_file (callable) – A function to determine if a file should be included in the dataset.

  • color (bool) – If True, this dataset reads images as color images. The default value is True.

  • numerical_sort (bool) – If True, label names are sorted numerically, so that label 2 comes before label 10 (which is not the case with string sort). Regardless of this option, string sort is used to order files within the same label. The default value is False.

This dataset returns the following data.

name   shape              dtype    format
img    \((3, H, W)\) [1]  float32  RGB, \([0, 255]\)
label  scalar             int32    \([0, \#class - 1]\)

[1] \((1, H, W)\) if color = False.

directory_parsing_label_names

chainercv.datasets.directory_parsing_label_names(root, numerical_sort=False)[source]

Get label names from the directories that are named after them.

The label names are the names of the directories located one layer below the root directory.

The label names can be used together with DirectoryParsingLabelDataset. The index of a label name corresponds to the label id that is used by the dataset to refer to the label.

Parameters
  • root (string) – The root directory.

  • numerical_sort (bool) – If True, label names are sorted numerically, so that label 2 comes before label 10 (which is not the case with string sort). The default value is False.

Returns

Sorted names of classes.

Return type

list of strings
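
The two APIs can be used together. The sketch below is a usage illustration only: it assumes the root directory laid out as in the example above and introduces a hypothetical check_png predicate to restrict parsing to PNG files, then maps a label id back to its subdirectory name.

>>> from chainercv.datasets import DirectoryParsingLabelDataset
>>> from chainercv.datasets import directory_parsing_label_names
>>> # Hypothetical predicate: accept only PNG files.
>>> def check_png(path):
...     return path.lower().endswith('.png')
>>> label_names = directory_parsing_label_names('root')
>>> dataset = DirectoryParsingLabelDataset('root', check_img_file=check_png)
>>> img, label = dataset[0]
>>> label_names[label]
'class_0'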

MixUpSoftLabelDataset

class chainercv.datasets.MixUpSoftLabelDataset(dataset, n_class, alpha=1.0)[source]

Dataset which returns mixed images and labels for mixup learning [2].

MixUpSoftLabelDataset mixes two pairs of labeled images fetched from the base dataset.

Unlike LabeledImageDatasets, the label is a one-dimensional float array with at most two nonnegative weights (i.e. a soft label). The two weights sum to one.

Example

We construct a mixup dataset from MNIST.

>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> from chainercv.datasets import MixUpSoftLabelDataset
>>> mnist, _ = get_mnist()
>>> base_dataset = SiameseDataset(mnist, mnist)
>>> dataset = MixUpSoftLabelDataset(base_dataset, 10)
>>> mixed_image, mixed_label = dataset[0]
>>> mixed_label.shape
(10,)
>>> mixed_label.dtype
dtype('float32')
Parameters
  • dataset

    The underlying dataset. The dataset returns img_0, label_0, img_1, label_1, which is a tuple containing two pairs of an image and a label. Typically, dataset is SiameseDataset.

    The shapes of images and labels should be constant.

  • n_class (int) – The number of classes in the base dataset.

  • alpha (float) – A hyperparameter of the Beta distribution. mix_ratio is sampled from \(B(\alpha,\alpha)\). The default value is \(1.0\), which makes the distribution equivalent to the uniform distribution on \([0, 1]\) (see the sketch after the table below).

[2] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz. mixup: Beyond Empirical Risk Minimization. arXiv 2017.

This dataset returns the following data.

name   shape           dtype    format
img    [3]             [3]      [3]
label  \((\#class,)\)  float32  \([0, 1]\)

[3] Same as dataset.
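
The mixing itself can be sketched in a few lines of NumPy. The function below is a conceptual illustration of mixup under the description above, not the library's internal implementation: a mix ratio is drawn from \(B(\alpha, \alpha)\) and used to blend two images and their one-hot labels; when the two labels coincide, the two weights collapse onto a single entry.

import numpy as np

def mixup_example(img_0, label_0, img_1, label_1, n_class, alpha=1.0):
    # Conceptual sketch of mixup, under the assumptions stated above.
    mix_ratio = np.random.beta(alpha, alpha)
    mixed_img = mix_ratio * img_0 + (1 - mix_ratio) * img_1
    mixed_label = np.zeros(n_class, dtype=np.float32)
    mixed_label[label_0] += mix_ratio        # weight of the first label
    mixed_label[label_1] += 1 - mix_ratio    # weight of the second label
    return mixed_img.astype(np.float32), mixed_label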

SiameseDataset

class chainercv.datasets.SiameseDataset(dataset_0, dataset_1, pos_ratio=None, length=None, labels_0=None, labels_1=None)[source]

A dataset that returns samples fetched from two datasets.

The dataset returns samples from the two base datasets. If pos_ratio is not None, SiameseDataset can be configured to return positive pairs at the ratio of pos_ratio and negative pairs at the ratio of 1 - pos_ratio. In this mode, the base datasets are assumed to be label datasets that return an image and a label as a sample.

Example

We construct a siamese dataset from MNIST.

>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> mnist, _ = get_mnist()
>>> dataset = SiameseDataset(mnist, mnist, pos_ratio=0.3)
# The probability of the two samples having the same label
# is 0.3 as specified by pos_ratio.
>>> img_0, label_0, img_1, label_1 = dataset[0]
# The returned examples may change in the next
# call even if the index is the same as before
# because SiameseDataset picks examples randomly
# (e.g., img_0_new may differ from img_0).
>>> img_0_new, label_0_new, img_1_new, label_1_new = dataset[0]
Parameters
  • dataset_0 – The first base dataset.

  • dataset_1 – The second base dataset.

  • pos_ratio (float) – If this is not None, this dataset tries to construct positive pairs at the given rate. If None, this dataset randomly samples examples from the base datasets. The default value is None.

  • length (int) – The length of this dataset. If None, the length of the first base dataset is the length of this dataset.

  • labels_0 (numpy.ndarray) – The labels associated with the first base dataset. The length should be the same as the length of the first dataset. If this is None, the labels are automatically fetched using the following line of code: [ex[1] for ex in dataset_0]. By setting labels_0 and skipping the fetching iteration, the computation cost can be reduced (see the sketch after the table below). Also, if pos_ratio is None, this value is ignored. The default value is None. If labels_1 is specified and dataset_0 and dataset_1 are the same, labels_0 can be skipped.

  • labels_1 (numpy.ndarray) – The labels associated with the second base dataset. If labels_0 is specified and dataset_0 and dataset_1 are the same, labels_1 can be skipped. Please consult the explanation for labels_0.

This dataset returns the following data.

name     shape   dtype  format
img_0    [4]     [4]    [4]
label_0  scalar  int32  \([0, \#class - 1]\)
img_1    [5]     [5]    [5]
label_1  scalar  int32  \([0, \#class - 1]\)

[4] Same as dataset_0.
[5] Same as dataset_1.
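
As noted for labels_0 and labels_1, precomputing the labels lets SiameseDataset skip its own fetching iteration. A usage sketch with MNIST (assuming the labels are computed once and reused, e.g. from a cache):

>>> import numpy as np
>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> mnist, _ = get_mnist()
>>> # Compute the labels once; SiameseDataset then skips its own iteration.
>>> labels = np.array([ex[1] for ex in mnist], dtype=np.int32)
>>> dataset = SiameseDataset(mnist, mnist, pos_ratio=0.3,
...                          labels_0=labels, labels_1=labels)
>>> img_0, label_0, img_1, label_1 = dataset[0]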

ADE20K

ADE20KSemanticSegmentationDataset

class chainercv.datasets.ADE20KSemanticSegmentationDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for ADE20K.

This is the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 20,210 training images and 2,000 validation images.

Parameters
  • data_dir (string) – Path to the dataset directory. The directory should contain the ADEChallengeData2016 directory, and that directory should contain at least the images and annotations directories. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.

  • split ({'train', 'val'}) – Select from dataset splits used in MIT Scene Parsing Benchmark dataset (ADE20K).

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
label  \((H, W)\)     int32    \([-1, \#class - 1]\)
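
A minimal usage sketch, assuming the automatic download into $CHAINER_DATASET_ROOT succeeds:

>>> from chainercv.datasets import ADE20KSemanticSegmentationDataset
>>> dataset = ADE20KSemanticSegmentationDataset(split='val')
>>> img, label = dataset[0]
>>> # img is a float32 RGB array of shape (3, H, W) with values in [0, 255];
>>> # label is an int32 array of shape (H, W), where -1 marks ignored pixels.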

ADE20KTestImageDataset

class chainercv.datasets.ADE20KTestImageDataset(data_dir='auto')[source]

Image dataset for test split of ADE20K.

This is an image dataset of the test split of the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 3,352 test images.

Parameters

data_dir (string) – Path to the dataset directory. The directory should contain the release_test directory. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.

This dataset returns the following data.

name  shape          dtype    format
img   \((3, H, W)\)  float32  RGB, \([0, 255]\)

CamVid

CamVidDataset

class chainercv.datasets.CamVidDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for CamVid.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/camvid.

  • split ({'train', 'val', 'test'}) – Select from dataset splits used in CamVid Dataset.

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
label  \((H, W)\)     int32    \([-1, \#class - 1]\)
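
A minimal usage sketch; it assumes the class names are exposed as chainercv.datasets.camvid_label_names, with the label ids in the table indexing into that list.

>>> from chainercv.datasets import CamVidDataset, camvid_label_names
>>> dataset = CamVidDataset(split='test')
>>> img, label = dataset[0]
>>> # Pixels with value l belong to camvid_label_names[l]; -1 means ignore.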

Cityscapes

CityscapesSemanticSegmentationDataset

class chainercv.datasets.CityscapesSemanticSegmentationDataset(data_dir='auto', label_resolution=None, split='train', ignore_labels=True)[source]

Semantic segmentation dataset for Cityscapes dataset.

Note

Please download the data manually, because redistribution of the Cityscapes dataset is not permitted.

Parameters
  • data_dir (string) – Path to the dataset directory. The directory should contain at least two directories, leftImg8bit and either gtFine or gtCoarse. If auto is given, it uses $CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes by default.

  • label_resolution ({'fine', 'coarse'}) – The resolution of the labels. It should be either fine or coarse.

  • split ({'train', 'val'}) – Select from dataset splits used in Cityscapes dataset.

  • ignore_labels (bool) – If True, the labels marked ignoreInEval defined in the original cityscapesScripts will be replaced with -1 in the get_example() method. The default value is True.

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
label  \((H, W)\)     int32    \([-1, \#class - 1]\)

CityscapesTestImageDataset

class chainercv.datasets.CityscapesTestImageDataset(data_dir='auto')[source]

Image dataset for test split of Cityscapes dataset.

Note

Please download the data manually, because redistribution of the Cityscapes dataset is not permitted.

Parameters

data_dir (string) – Path to the dataset directory. The directory should contain the leftImg8bit directory. If auto is given, it uses $CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes by default.

This dataset returns the following data.

name  shape          dtype    format
img   \((3, H, W)\)  float32  RGB, \([0, 255]\)

CUB

CUBLabelDataset

class chainercv.datasets.CUBLabelDataset(data_dir='auto', return_bbox=False, prob_map_dir='auto', return_prob_map=False)[source]

Caltech-UCSD Birds-200-2011 dataset with annotated class labels.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_bbox (bool) – If True, this returns a bounding box around a bird. The default value is False.

  • prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_prob_map (bool) – If True, this returns a probability map of the bird as part of each example. The default value is False.

This dataset returns the following data.

name          shape          dtype    format
img           \((3, H, W)\)  float32  RGB, \([0, 255]\)
label         scalar         int32    \([0, \#class - 1]\)
bbox [6]      \((1, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
prob_map [7]  \((H, W)\)     float32  \([0, 1]\)

[6] bbox indicates the location of a bird. It is available if return_bbox = True.
[7] prob_map indicates how likely a bird is located at each pixel. It is available if return_prob_map = True.
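
A usage sketch with both optional outputs enabled; the tuple is assumed to follow the row order of the table above.

>>> from chainercv.datasets import CUBLabelDataset
>>> dataset = CUBLabelDataset(return_bbox=True, return_prob_map=True)
>>> img, label, bbox, prob_map = dataset[0]
>>> bbox.shape, prob_map.dtype
((1, 4), dtype('float32'))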

CUBKeypointDataset

class chainercv.datasets.CUBKeypointDataset(data_dir='auto', return_bbox=False, prob_map_dir='auto', return_prob_map=False)[source]

Caltech-UCSD Birds-200-2011 dataset with annotated points.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_bbox (bool) – If True, this returns a bounding box around a bird. The default value is False.

  • prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_prob_map (bool) – If True, this returns a probability map of the bird as part of each example. The default value is False.

This dataset returns the following data.

name          shape           dtype    format
img           \((3, H, W)\)   float32  RGB, \([0, 255]\)
point         \((1, 15, 2)\)  float32  \((y, x)\)
visible       \((1, 15)\)     bool     --
bbox [8]      \((1, 4)\)      float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
prob_map [9]  \((H, W)\)      float32  \([0, 1]\)

[8] bbox indicates the location of a bird. It is available if return_bbox = True.
[9] prob_map indicates how likely a bird is located at each pixel. It is available if return_prob_map = True.

MS COCO

COCOBboxDataset

class chainercv.datasets.COCOBboxDataset(data_dir='auto', split='train', year='2017', use_crowded=False, return_area=False, return_crowded=False)[source]

Bounding box dataset for MS COCO.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.

  • split ({'train', 'val', 'minival', 'valminusminival'}) – Select a split of the dataset.

  • year ({'2014', '2017'}) – Use a dataset released in year. Splits minival and valminusminival are only supported in year 2014.

  • use_crowded (bool) – If true, use bounding boxes that are labeled as crowded in the original annotation. The default value is False.

  • return_area (bool) – If true, this dataset returns areas of masks around objects. The default value is False.

  • return_crowded (bool) – If true, this dataset returns a boolean array that indicates whether bounding boxes are labeled as crowded or not. The default value is False.

This dataset returns the following data.

name           shape          dtype    format
img            \((3, H, W)\)  float32  RGB, \([0, 255]\)
bbox [10]      \((R, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
label [10]     \((R,)\)       int32    \([0, \#fg\_class - 1]\)
area [10][11]  \((R,)\)       float32  --
crowded [12]   \((R,)\)       bool     --

[10] If use_crowded = True, bbox, label and area contain crowded instances.
[11] area is available if return_area = True.
[12] crowded is available if return_crowded = True.

When there are more than ten objects of the same category, the bounding boxes correspond to a crowd of instances rather than to individual instances. See Fig. 12 (e) of the summary paper [13] for more detail.

[13] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar. Microsoft COCO: Common Objects in Context. arXiv 2014.
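
A usage sketch with the optional outputs enabled; the tuple is assumed to follow the row order of the table above, and crowded can be used to filter out crowd annotations when use_crowded = True.

>>> from chainercv.datasets import COCOBboxDataset
>>> dataset = COCOBboxDataset(split='val', year='2017',
...                           use_crowded=True,
...                           return_area=True, return_crowded=True)
>>> img, bbox, label, area, crowded = dataset[0]
>>> # Keep only non-crowded instances.
>>> bbox, label = bbox[~crowded], label[~crowded]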

COCOInstanceSegmentationDataset

class chainercv.datasets.COCOInstanceSegmentationDataset(data_dir='auto', split='train', year='2017', use_crowded=False, return_crowded=False, return_area=False, return_bbox=False)[source]

Instance segmentation dataset for MS COCO.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.

  • split ({'train', 'val', 'minival', 'valminusminival'}) – Select a split of the dataset.

  • year ({'2014', '2017'}) – Use a dataset released in year. Splits minival and valminusminival are only supported in year 2014.

  • use_crowded (bool) – If true, use masks that are labeled as crowded in the original annotation.

  • return_crowded (bool) – If true, this dataset returns a boolean array that indicates whether masks are labeled as crowded or not. The default value is False.

  • return_area (bool) – If true, this dataset returns areas of masks around objects.

  • return_bbox (bool) – If true, this dataset returns bounding boxes around objects.

This dataset returns the following data.

name           shape          dtype    format
img            \((3, H, W)\)  float32  RGB, \([0, 255]\)
mask [14]      \((R, H, W)\)  bool     --
label [14]     \((R,)\)       int32    \([0, \#fg\_class - 1]\)
area [14][15]  \((R,)\)       float32  --
crowded [16]   \((R,)\)       bool     --
bbox [14]      \((R, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)

[14] If use_crowded = True, mask, label, area and bbox contain crowded instances.
[15] area is available if return_area = True.
[16] crowded is available if return_crowded = True.

When there are more than ten objects of the same category, the masks correspond to a crowd of instances rather than to individual instances. See Fig. 12 (e) of the summary paper [17] for more detail.

[17] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar. Microsoft COCO: Common Objects in Context. arXiv 2014.

COCOSemanticSegmentationDataset

class chainercv.datasets.COCOSemanticSegmentationDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for MS COCO.

Semantic segmentations are generated from panoptic segmentations as done in the official toolkit.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.

  • split ({'train', 'val'}) – Select a split of the dataset.

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
label  \((H, W)\)     int32    \([-1, \#class - 1]\)

OnlineProducts

OnlineProductsDataset

class chainercv.datasets.OnlineProductsDataset(data_dir='auto', split='train')[source]

Dataset class for Stanford Online Products Dataset.

The split argument selects the train or test split of the dataset, following [18]. The train split contains the first 11,318 classes and the test split contains the remaining 11,316 classes.

[18] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, Silvio Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. arXiv 2015.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/online_products.

  • split ({'train', 'test'}) – Select a split of the dataset.

This dataset returns the following data.

name         shape          dtype    format
img          \((3, H, W)\)  float32  RGB, \([0, 255]\)
label        scalar         int32    \([0, \#class - 1]\)
super_label  scalar         int32    \([0, \#super\_class - 1]\)
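
A minimal usage sketch; each example carries both a fine-grained product label and a super-category label, as in the table above.

>>> from chainercv.datasets import OnlineProductsDataset
>>> dataset = OnlineProductsDataset(split='train')
>>> img, label, super_label = dataset[0]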

PASCAL VOC

VOCBboxDataset

class chainercv.datasets.VOCBboxDataset(data_dir='auto', split='train', year='2012', use_difficult=False, return_difficult=False)[source]

Bounding box dataset for PASCAL VOC.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.

  • split ({'train', 'val', 'trainval', 'test'}) – Select a split of the dataset. The test split is only available for the 2007 dataset.

  • year ({'2007', '2012'}) – Use a dataset prepared for a challenge held in year.

  • use_difficult (bool) – If True, use images that are labeled as difficult in the original annotation.

  • return_difficult (bool) – If True, this dataset returns a boolean array that indicates whether bounding boxes are labeled as difficult or not. The default value is False.

This dataset returns the following data.

name            shape          dtype    format
img             \((3, H, W)\)  float32  RGB, \([0, 255]\)
bbox [19]       \((R, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
label [19]      \((R,)\)       int32    \([0, \#fg\_class - 1]\)
difficult [20]  \((R,)\)       bool     --

[19] If use_difficult = True, bbox and label contain difficult instances.
[20] difficult is available if return_difficult = True.
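
A usage sketch for the 2007 test split, mapping label ids to class names via chainercv.datasets.voc_bbox_label_names; the tuple is assumed to follow the row order of the table above.

>>> from chainercv.datasets import VOCBboxDataset, voc_bbox_label_names
>>> dataset = VOCBboxDataset(year='2007', split='test',
...                          use_difficult=True, return_difficult=True)
>>> img, bbox, label, difficult = dataset[0]
>>> names = [voc_bbox_label_names[l] for l in label]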

VOCInstanceSegmentationDataset

class chainercv.datasets.VOCInstanceSegmentationDataset(data_dir='auto', split='train')[source]

Instance segmentation dataset for PASCAL VOC2012.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.

  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
mask   \((R, H, W)\)  bool     --
label  \((R,)\)       int32    \([0, \#fg\_class - 1]\)

VOCSemanticSegmentationDataset

class chainercv.datasets.VOCSemanticSegmentationDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for PASCAL VOC2012.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.

  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
label  \((H, W)\)     int32    \([-1, \#class - 1]\)

Semantic Boundaries Dataset

SBDInstanceSegmentationDataset

class chainercv.datasets.SBDInstanceSegmentationDataset(data_dir='auto', split='train')[source]

Instance segmentation dataset for the Semantic Boundaries Dataset (SBD).

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/sbd.

  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
mask   \((R, H, W)\)  bool     --
label  \((R,)\)       int32    \([0, \#fg\_class - 1]\)
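
Since mask is a boolean array, per-instance pixel areas can be computed directly from it. A minimal usage sketch:

>>> from chainercv.datasets import SBDInstanceSegmentationDataset
>>> dataset = SBDInstanceSegmentationDataset(split='val')
>>> img, mask, label = dataset[0]
>>> # Number of pixels covered by each of the R instances.
>>> areas = mask.reshape(mask.shape[0], -1).sum(axis=1)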