PSPNet

Utility

convolution_crop

chainercv.experimental.links.model.pspnet.convolution_crop(img, size, stride, return_param=False)[source]

Strided cropping.

This extracts cropped images from the input. The crops are taken over the entire image at a constant stride between neighboring patches.

Parameters
  • img (ndarray) – An image array to be cropped. This is in CHW format.

  • size (tuple) – The size of output image after cropping. This value is \((height, width)\).

  • stride (tuple) – The stride between crops. This contains two values: stride in the vertical and horizontal directions.

  • return_param (bool) – If True, this function also returns the slices used for cropping.

Returns

If return_param = False, returns an array crop_imgs that is a stack of cropped images.

If return_param = True, returns a tuple whose elements are crop_imgs, param. param is a dictionary of intermediate parameters whose contents are listed below with their key, value type, and a description of the value.

  • y_slices (list of slices): Slices used to crop the input image. Together with x_slices, the relation below holds.

  • x_slices (list of slices): Similar to y_slices.

  • crop_y_slices (list of slices): This indicates the region of the cropped image that is actually extracted from the input. This is relevant only when borders of the input are cropped.

  • crop_x_slices (list of slices): Similar to crop_y_slices.

    crop_img = crop_imgs[i][:, crop_y_slices[i], crop_x_slices[i]]
    crop_img == img[:, y_slices[i], x_slices[i]]
    

Return type

ndarray or (ndarray, dict)
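To make the slice bookkeeping concrete, here is a minimal NumPy sketch of strided cropping. It is an illustrative reimplementation, not ChainerCV's actual code: it clamps the last window to the image border instead of padding, so it only reproduces the y_slices/x_slices part of the output, and the helper name strided_crop is hypothetical.

```python
import numpy as np

def strided_crop(img, size, stride):
    """Slide a (height, width) window over a CHW image with the given
    stride, clamping the last window to the border so every pixel is
    covered. Illustrative sketch, not ChainerCV's implementation."""
    _, H, W = img.shape
    h, w = size
    sy, sx = stride
    # Window start positions; the final position is clamped to the border.
    ys = sorted(set(list(range(0, H - h, sy)) + [H - h]))
    xs = sorted(set(list(range(0, W - w, sx)) + [W - w]))
    crops, y_slices, x_slices = [], [], []
    for y in ys:
        for x in xs:
            y_sl, x_sl = slice(y, y + h), slice(x, x + w)
            crops.append(img[:, y_sl, x_sl])
            y_slices.append(y_sl)
            x_slices.append(x_sl)
    return np.stack(crops), {'y_slices': y_slices, 'x_slices': x_slices}

img = np.arange(3 * 10 * 10, dtype=np.float32).reshape(3, 10, 10)
crops, param = strided_crop(img, (4, 4), (3, 3))
print(crops.shape)  # (9, 3, 4, 4): 3 vertical x 3 horizontal positions
```

Each crop satisfies crops[i] == img[:, y_slices[i], x_slices[i]], which mirrors the relation documented above for the non-padded case.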

Examples

>>> import numpy as np
>>> from chainercv.datasets import VOCBboxDataset
>>> from chainercv.transforms import resize
>>> from chainercv.experimental.links.model.pspnet import \
...     convolution_crop
>>>
>>> img, _, _ = VOCBboxDataset(year='2007')[0]
>>> img = resize(img, (300, 300))
>>> imgs, param = convolution_crop(
...     img, (128, 128), (96, 96), return_param=True)
>>> # Restore the original image from the cropped images.
>>> output = np.zeros((3, 300, 300))
>>> count = np.zeros((300, 300))
>>> for i in range(len(imgs)):
...     crop_y_slice = param['crop_y_slices'][i]
...     crop_x_slice = param['crop_x_slices'][i]
...     y_slice = param['y_slices'][i]
...     x_slice = param['x_slices'][i]
...     output[:, y_slice, x_slice] += \
...         imgs[i][:, crop_y_slice, crop_x_slice]
...     count[y_slice, x_slice] += 1
>>> output = output / count[None]
>>> np.testing.assert_equal(output, img)
>>>
>>> # Visualization of the cropped images
>>> import matplotlib.pyplot as plt
>>> from chainercv.utils import tile_images
>>> from chainercv.visualizations import vis_image
>>> v_imgs = tile_images(imgs, 5, fill=122.5)
>>> vis_image(v_imgs)
>>> plt.show()

PSPNet

class chainercv.experimental.links.model.pspnet.PSPNet(n_class=None, pretrained_model=None, input_size=None, initialW=None)[source]

Pyramid Scene Parsing Network.

This is a PSPNet 1 model for semantic segmentation. This is based on the implementation found here.

1. Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia. "Pyramid Scene Parsing Network." CVPR, 2017.

Parameters
  • n_class (int) – The number of channels in the last convolution layer.

  • pretrained_model (string) – The weight file to be loaded. This can take 'cityscapes', 'ade20k', 'imagenet', a filepath, or None. The default value is None.

      • 'cityscapes': Load weights trained on the train split of the Cityscapes dataset. n_class must be 19 or None.

      • 'ade20k': Load weights trained on the train split of the ADE20K dataset. n_class must be 150 or None.

      • 'imagenet': Load ImageNet pretrained weights for the extractor.

      • filepath: A path to an npz file. In this case, n_class must be specified properly.

      • None: Do not load weights.

  • input_size (tuple) – The size of the input. This value is \((height, width)\).

  • initialW (callable) – Initializer for the weights of convolution kernels.
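The interplay between n_class and pretrained_model documented above can be sketched as a small resolver. This is a hypothetical illustration of the documented rules, not ChainerCV's actual code; the function name resolve_n_class and the dictionary are assumptions.

```python
# Hypothetical sketch of the documented consistency rules between
# n_class and pretrained_model (not ChainerCV's implementation).
_PRETRAINED_N_CLASS = {'cityscapes': 19, 'ade20k': 150}

def resolve_n_class(n_class, pretrained_model):
    expected = _PRETRAINED_N_CLASS.get(pretrained_model)
    if expected is None:
        # 'imagenet', a filepath, or None: the caller must supply n_class.
        return n_class
    if n_class is not None and n_class != expected:
        raise ValueError(
            'n_class must be {} or None for {!r}'.format(
                expected, pretrained_model))
    return expected

print(resolve_n_class(None, 'cityscapes'))  # 19
print(resolve_n_class(150, 'ade20k'))       # 150
```

In other words, a named pretrained model fixes the number of classes, while a filepath or no weights leaves it up to the caller.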

predict(imgs)[source]

Conduct semantic segmentation from images.

Parameters

imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format, and the range of their values is \([0, 255]\).

Returns

List of integer labels predicted from each image in the input list.

Return type

list of numpy.ndarray
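Since many image loaders return HWC uint8 arrays, here is a minimal sketch of preparing an input in the CHW, RGB, \([0, 255]\) layout that predict expects. The variable names are illustrative, and the random array stands in for a real image.

```python
import numpy as np

# An HWC uint8 image (as returned by many loaders), converted to the
# CHW float layout in the [0, 255] range described above.
hwc_img = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
chw_img = hwc_img.transpose(2, 0, 1).astype(np.float32)
print(chw_img.shape)  # (3, 300, 400)

# predict() takes an iterable of such arrays and returns a list with
# one integer label array per image:
# labels = model.predict([chw_img])
```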