Datasets

TransformDataset

TransformDataset

chainercv.datasets.TransformDataset(dataset, transform)

Dataset that indexes data of a base dataset and transforms it.

This dataset wraps a base dataset by modifying the behavior of the base dataset’s __getitem__(). Arrays returned by __getitem__() of the base dataset with an integer index are transformed by the given function transform.

The function transform takes, as an argument, in_data, which is the output of the base dataset’s __getitem__(), and returns the transformed arrays. Please see the following example.

>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import TransformDataset
>>> dataset, _ = get_mnist()
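>>> def transform(in_data):
...     img, label = in_data
...     # Shift pixel values from [0, 1] to [-0.5, 0.5]; build a new array
...     # rather than modifying the base dataset's array in place.
...     img = img - 0.5
...     return img, label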
>>> dataset = TransformDataset(dataset, transform)

Note

The index used to access data is either an integer or a slice. If it is a slice, the base dataset is assumed to return a list of outputs each corresponding to the output of the integer indexing.

Parameters:
  • dataset – Underlying dataset. The index of this dataset corresponds to the index of the base dataset.
  • transform (callable) – A function that is called to transform values returned by the underlying dataset’s __getitem__().
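The wrapping behaviour described above, including the slice case from the note, can be sketched in a few lines of plain Python. This is a minimal illustration and not ChainerCV's actual implementation; the class name SimpleTransformDataset and the toy base dataset are hypothetical.

```python
class SimpleTransformDataset:
    """Hypothetical stand-in for TransformDataset, for illustration only."""

    def __init__(self, dataset, transform):
        self._dataset = dataset
        self._transform = transform

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, index):
        in_data = self._dataset[index]
        if isinstance(index, slice):
            # A slice yields a list of examples; transform each one.
            return [self._transform(example) for example in in_data]
        # An integer index yields a single example.
        return self._transform(in_data)


# A toy base dataset of (value, label) pairs.
base = [(0.0, 0), (0.5, 1), (1.0, 0)]


def transform(in_data):
    value, label = in_data
    return value - 0.5, label


dataset = SimpleTransformDataset(base, transform)
print(dataset[0])   # a single transformed example
print(dataset[1:])  # a list of transformed examples
```

Because the transform runs inside __getitem__(), it is applied lazily on every access, which is what makes this pattern suitable for random data augmentation.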

CamVid

CamVidDataset

chainercv.datasets.CamVidDataset(data_dir='auto', split='train')

Dataset class for a semantic segmentation task on CamVid.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/camvid.
  • split ({'train', 'val', 'test'}) – Select from dataset splits used in CamVid Dataset.
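As a rough sketch of where data_dir='auto' places files, the snippet below joins a dataset root with the relative path given above. The helper resolve_data_dir is hypothetical, and the fallback root of ~/.chainer/dataset when $CHAINER_DATASET_ROOT is unset is an assumption about Chainer's default, not something stated in this page.

```python
import os


def resolve_data_dir(name, root=None):
    """Hypothetical helper: join a dataset root with a relative name."""
    if root is None:
        # Assumed default when CHAINER_DATASET_ROOT is not set.
        default_root = os.path.join(
            os.path.expanduser('~'), '.chainer', 'dataset')
        root = os.environ.get('CHAINER_DATASET_ROOT', default_root)
    return os.path.join(root, name)


# With an explicit root, the CamVid files would live under this directory.
camvid_dir = resolve_data_dir('pfnet/chainercv/camvid', root='/tmp/datasets')
print(camvid_dir)
```

The same pattern applies to the other datasets in this section, with camvid replaced by cub, online_products or voc.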

CUB

CUBLabelDataset

chainercv.datasets.CUBLabelDataset(data_dir='auto', crop_bbox=True)

Caltech-UCSD Birds-200-2011 dataset with annotated class labels.

When queried by an index, this dataset returns the corresponding img, label: a tuple of an image and a class id. The image is in RGB and CHW format. Class ids are between 0 and 199.

There are 200 labels of birds in total.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • crop_bbox (bool) – If true, this class returns an image cropped by the bounding box of the bird inside it.
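Many plotting and image libraries expect images in HWC layout rather than the CHW layout returned here. The sketch below shows the conversion on a synthetic array standing in for a returned image; the array contents and size are made up for illustration, and no assumption is made about the value range of real images.

```python
import numpy as np

# Synthetic stand-in for an image returned by the dataset: CHW, RGB order.
img = np.zeros((3, 4, 5), dtype=np.float32)
img[0] = 255.0  # fill the first (red) channel

# Move the channel axis to the end: (C, H, W) -> (H, W, C).
img_hwc = img.transpose(1, 2, 0)
print(img_hwc.shape)
```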

CUBKeypointDataset

chainercv.datasets.CUBKeypointDataset(data_dir='auto', crop_bbox=True, mask_dir='auto', return_mask=False)

Caltech-UCSD Birds-200-2011 dataset with annotated keypoints.

An index corresponds to each image.

When queried by an index, this dataset returns the corresponding img, keypoint, kp_mask: a tuple of an image, keypoints and a keypoint mask that indicates which keypoints are visible in the image. The data types of the three elements are float32, float32 and bool, respectively. If return_mask = True, mask is returned as well, making the returned tuple of length four. mask is a uint8 image that indicates the region of the image where the bird is located.

keypoints are packed into a two dimensional array of shape \((K, 2)\), where \(K\) is the number of keypoints. The second axis corresponds to the \(y\) and \(x\) coordinates of the keypoints in the image. Note that \(K=15\) in the CUB dataset, and that not all fifteen keypoints are visible in every image. When a keypoint is not visible, the values stored for that keypoint are undefined.

A keypoint mask array indicates whether a keypoint is visible in the image or not. This is a boolean array of shape \((K,)\).

The mask image of the bird shows how likely it is that the bird is located at a given pixel. The closer the value is to 255, the more likely it is that the bird is located at that pixel. The shape of this array is \((1, H, W)\), where \(H\) and \(W\) are the height and width of the image respectively.
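Since coordinates of invisible keypoints are undefined, they should be filtered out with kp_mask before use. The following sketch works on synthetic arrays that follow the shapes and dtypes described above; the random coordinates, the choice of visible keypoints and the threshold of 128 on the bird mask are arbitrary assumptions for illustration.

```python
import numpy as np

K = 15  # number of keypoints in CUB
rng = np.random.RandomState(0)

# Synthetic keypoints of shape (K, 2); the second axis is (y, x).
keypoint = rng.uniform(0, 100, size=(K, 2)).astype(np.float32)

# Synthetic visibility mask of shape (K,): only three keypoints visible.
kp_mask = np.zeros((K,), dtype=bool)
kp_mask[[0, 3, 7]] = True

# Boolean indexing keeps only the rows of visible keypoints.
visible = keypoint[kp_mask]
ys, xs = visible[:, 0], visible[:, 1]

# Synthetic bird mask of shape (1, H, W) with uint8 values.
mask = np.zeros((1, 8, 8), dtype=np.uint8)
mask[0, 2:6, 2:6] = 255

# Arbitrary threshold: treat values of 128 or more as "bird".
bird_region = mask[0] >= 128
print(visible.shape, int(bird_region.sum()))
```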

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • crop_bbox (bool) – If true, this class returns an image cropped by the bounding box of the bird inside it.
  • mask_dir (string) – Path to the root of the mask data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.
  • return_mask (bool) – Decide whether to include mask image of the bird in a tuple served for a query.

OnlineProducts

OnlineProductsDataset

chainercv.datasets.OnlineProductsDataset(data_dir='auto', split='train')

Dataset class for Stanford Online Products Dataset.

When queried by an index, this dataset returns a corresponding img, class_id, super_class_id, a tuple of an image, a class id and a coarse level class id. Images are in RGB and CHW format. Class ids start from 0.

The split argument selects the train or test split of the dataset, as done in [1]. The train split contains the first 11318 classes and the test split contains the remaining 11316 classes.

[1] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, Silvio Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. arXiv 2015.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/online_products.
  • split ({'train', 'test'}) – Select a split of the dataset.

PASCAL VOC

VOCDetectionDataset

chainercv.datasets.VOCDetectionDataset(data_dir='auto', split='train', year='2012', use_difficult=False, return_difficult=False)

Dataset class for the detection task of PASCAL VOC.

The index corresponds to each image.

When queried by an index, if return_difficult == False, this dataset returns a corresponding img, bbox, label, a tuple of an image, bounding boxes and labels. This is the default behaviour. If return_difficult == True, this dataset returns corresponding img, bbox, label, difficult. difficult is a boolean array that indicates whether bounding boxes are labeled as difficult or not.

The bounding boxes are packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are the coordinates of the top left and the bottom right vertices.

The labels are packed into a one dimensional tensor of shape \((R,)\). \(R\) is the number of bounding boxes in the image. The class name of the label \(l\) is the \(l\)-th element of chainercv.datasets.voc_detection_label_names.

The array difficult is a one dimensional boolean array of shape \((R,)\). \(R\) is the number of bounding boxes in the image.
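The array layout described above can be exercised with a small sketch. The arrays below are synthetic stand-ins for the values returned by the dataset, and label_names is a hypothetical placeholder for chainercv.datasets.voc_detection_label_names, used so that the snippet runs without downloading data.

```python
import numpy as np

# Synthetic annotations for one image with R = 2 bounding boxes.
# Each row is (y_min, x_min, y_max, x_max).
bbox = np.array([[10, 20, 110, 220],
                 [30, 40, 60, 90]], dtype=np.float32)
label = np.array([2, 0], dtype=np.int32)
difficult = np.array([False, True])

# Hypothetical placeholder for the real list of label names.
label_names = ('class_0', 'class_1', 'class_2')

# Box sizes follow directly from the (y_min, x_min, y_max, x_max) layout.
heights = bbox[:, 2] - bbox[:, 0]
widths = bbox[:, 3] - bbox[:, 1]

# Drop boxes flagged as difficult, then look up class names by index.
keep = ~difficult
easy_bbox = bbox[keep]
easy_names = [label_names[l] for l in label[keep]]
print(heights, widths, easy_names)
```

Because bbox, label and difficult all share the first axis of length \(R\), one boolean mask can be used to filter all three consistently.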

The data types of the image, the bounding boxes, the labels and the difficult array are as follows.

  • img.dtype == numpy.float32
  • bbox.dtype == numpy.float32
  • label.dtype == numpy.int32
  • difficult.dtype == numpy.bool

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.
  • split ({'train', 'val', 'trainval', 'test'}) – Select a split of the dataset. The test split is only available for the 2007 dataset.
  • year ({'2007', '2012'}) – Use a dataset prepared for a challenge held in year.
  • use_difficult (bool) – If true, use images that are labeled as difficult in the original annotation.
  • return_difficult (bool) – If true, this dataset returns a boolean array that indicates whether bounding boxes are labeled as difficult or not. The default value is False.

VOCSemanticSegmentationDataset

chainercv.datasets.VOCSemanticSegmentationDataset(data_dir='auto', split='train')

Dataset class for the semantic segmentation task of PASCAL VOC2012.

The class name of the label \(l\) is the \(l\)-th element of chainercv.datasets.voc_semantic_segmentation_label_names.

Parameters:
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.
  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.