PSPNet

Semantic Segmentation Link

PSPNetResNet101

class chainercv.experimental.links.model.pspnet.PSPNetResNet101(n_class=None, pretrained_model=None, input_size=None, initialW=None, comm=None)

PSPNet with Dilated ResNet101 as the feature extractor.
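"Dilated" refers to dilated (atrous) convolutions. In a dilated ResNet, some strided convolutions in the later stages are typically replaced with dilated ones, so the feature map keeps a higher resolution while the receptive field still grows. The NumPy sketch below shows only the general technique on a 1-D signal; `dilated_conv1d` and `effective_span` are hypothetical helpers, not part of ChainerCV.

```python
import numpy as np


def effective_span(k, dilation):
    """Number of input samples covered by a k-tap kernel with the given dilation."""
    return (k - 1) * dilation + 1


def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D cross-correlation of x with kernel w whose taps are
    spaced `dilation` samples apart (a dilated / atrous convolution)."""
    span = effective_span(len(w), dilation)
    out_len = len(x) - span + 1
    out = np.empty(out_len, dtype=np.result_type(x, w))
    for i in range(out_len):
        # Take every `dilation`-th input inside the window starting at i.
        out[i] = np.dot(x[i:i + span:dilation], w)
    return out


x = np.arange(10, dtype=np.float64)
w = np.array([1.0, 0.0, -1.0])
plain = dilated_conv1d(x, w, dilation=1)    # 8 outputs, each -2.0
dilated = dilated_conv1d(x, w, dilation=2)  # 6 outputs, each -4.0
```

With the same three weights, dilation 2 covers five input samples instead of three, without any pooling or striding.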

Utility

convolution_crop

chainercv.experimental.links.model.pspnet.convolution_crop(img, size, stride, return_param=False)

Strided cropping.

This extracts cropped images (patches) from the entire input image, keeping a constant step between neighboring patches.
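To see how many patches a given size and stride produce along one axis, the sketch below places windows every `stride` pixels until the axis is covered; the last window may run past the border. `window_starts` is a hypothetical helper, and it anchors the first window at 0, whereas convolution_crop may distribute the overhang differently.

```python
import math


def window_starts(length, size, stride):
    """Start offsets of windows of `size` placed every `stride` pixels so
    that together they cover [0, length). The last window may extend past
    `length`; that part has no input pixels behind it."""
    if length <= size:
        return [0]
    n = math.ceil((length - size) / stride) + 1
    return [i * stride for i in range(n)]


ys = window_starts(300, 128, 96)  # [0, 96, 192]; the last window ends at 320
xs = window_starts(300, 128, 96)
n_crops = len(ys) * len(xs)       # patches form a grid over both axes
```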

Parameters:

img (ndarray) – An image array to be cropped. This is in CHW format.
size (tuple) – The size of each cropped image. This value is \((height, width)\).
stride (tuple) – The stride between crops. This contains two values: stride in the vertical and horizontal directions.
return_param (bool) – If True, this function also returns information about the slices used for cropping.

Returns:

If return_param = False, returns an array crop_imgs that is a stack of cropped images.

If return_param = True, returns a tuple whose elements are crop_imgs, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

y_slices (list of slices): Slices used to crop the input image. Together with x_slices, they satisfy the relation below (element-wise) for every index i:

crop_img = crop_imgs[i][:, crop_y_slices[i], crop_x_slices[i]]
crop_img == img[:, y_slices[i], x_slices[i]]

x_slices (list of slices): Similar to y_slices.
crop_y_slices (list of slices): The region of each cropped image that is actually extracted from the input. This matters only when a crop extends past a border of the input.
crop_x_slices (list of slices): Similar to crop_y_slices.

Return type:

ndarray or (ndarray, dict)
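The slice pairs in param can be understood by cropping a single window that may hang over the image border. The sketch below is a hypothetical helper, not the library's code; it assumes the window overlaps the image at least partly and fills positions outside the input with zeros.

```python
import numpy as np


def crop_with_slices(img, y_min, x_min, size):
    """Crop a (C, H, W) array with a window of shape `size` whose top-left
    corner is (y_min, x_min); the window may extend past the borders.
    Returns the crop and the slice pairs (input side, crop side)."""
    _, H, W = img.shape
    h, w = size
    y_max, x_max = y_min + h, x_min + w
    # Part of the window that lies inside the input image.
    y_slice = slice(max(y_min, 0), min(y_max, H))
    x_slice = slice(max(x_min, 0), min(x_max, W))
    # The same region expressed in the crop's own coordinates.
    crop_y_slice = slice(y_slice.start - y_min, y_slice.stop - y_min)
    crop_x_slice = slice(x_slice.start - x_min, x_slice.stop - x_min)
    # Positions outside the input stay zero in this sketch.
    crop = np.zeros((img.shape[0], h, w), dtype=img.dtype)
    crop[:, crop_y_slice, crop_x_slice] = img[:, y_slice, x_slice]
    return crop, y_slice, x_slice, crop_y_slice, crop_x_slice


img = np.arange(3 * 5 * 7, dtype=np.float32).reshape(3, 5, 7)
# The window starts one row above the image and runs past its right edge.
crop, ys, xs, cys, cxs = crop_with_slices(img, -1, 4, (4, 4))
assert np.array_equal(crop[:, cys, cxs], img[:, ys, xs])
```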

Examples

>>> import numpy as np
>>> from chainercv.datasets import VOCBboxDataset
>>> from chainercv.transforms import resize
>>> from chainercv.experimental.links.model.pspnet import \
...     convolution_crop
>>>
>>> img, _, _ = VOCBboxDataset(year='2007')[0]
>>> img = resize(img, (300, 300))
>>> imgs, param = convolution_crop(
...     img, (128, 128), (96, 96), return_param=True)
>>> # Restore the original image from the cropped images.
>>> output = np.zeros((3, 300, 300))
>>> count = np.zeros((300, 300))
>>> for i in range(len(imgs)):
...     crop_y_slice = param['crop_y_slices'][i]
...     crop_x_slice = param['crop_x_slices'][i]
...     y_slice = param['y_slices'][i]
...     x_slice = param['x_slices'][i]
...     output[:, y_slice, x_slice] += \
...         imgs[i][:, crop_y_slice, crop_x_slice]
...     count[y_slice, x_slice] += 1
>>> output = output / count[None]
>>> np.testing.assert_equal(output, img)
>>>
>>> # Visualization of the cropped images
>>> import matplotlib.pyplot as plt
>>> from chainercv.utils import tile_images
>>> from chainercv.visualizations import vis_image
>>> v_imgs = tile_images(imgs, 5, fill=122.5)
>>> vis_image(v_imgs)
>>> plt.show()

PSPNet

class chainercv.experimental.links.model.pspnet.PSPNet(extractor, n_class, input_size, initialW=None, bn_kwargs=None)

Pyramid Scene Parsing Network.

This is a PSPNet [1] model for semantic segmentation. This is based on the implementation found here.

[1] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia. “Pyramid Scene Parsing Network”. CVPR, 2017.
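The central component of PSPNet in [1] is the pyramid pooling module: the feature map is average-pooled into several grid sizes (1×1, 2×2, 3×3 and 6×6 in [1]), each pooled map is upsampled back to the feature resolution, and all of them are concatenated with the original features. The NumPy sketch below illustrates that idea only; the function names are hypothetical, it uses nearest-neighbour instead of bilinear upsampling, and it omits the 1×1 convolutions that [1] applies to each branch.

```python
import numpy as np


def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid.
    Cell edges use floor/ceil so every pixel falls in some cell."""
    C, H, W = feat.shape
    out = np.empty((C, bins, bins), dtype=feat.dtype)
    for i in range(bins):
        y0, y1 = (i * H) // bins, -(-((i + 1) * H) // bins)
        for j in range(bins):
            x0, x1 = (j * W) // bins, -(-((j + 1) * W) // bins)
            out[:, i, j] = feat[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out


def upsample_nearest(grid, H, W):
    """Nearest-neighbour upsampling of a (C, b, b) grid to (C, H, W)."""
    b = grid.shape[1]
    rows = (np.arange(H) * b) // H
    cols = (np.arange(W) * b) // W
    return grid[:, rows[:, None], cols[None, :]]


def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input features with pooled-and-upsampled context."""
    _, H, W = feat.shape
    branches = [feat]
    for b in bin_sizes:
        # [1] also reduces channels with a 1x1 conv here; omitted in this sketch.
        branches.append(upsample_nearest(adaptive_avg_pool(feat, b), H, W))
    return np.concatenate(branches, axis=0)


feat = np.random.default_rng(0).random((4, 12, 12))
context = pyramid_pooling(feat)  # shape (4 * 5, 12, 12)
```

The 1×1 branch carries global context (one mean per channel), while the finer grids keep coarse spatial layout.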

Parameters:

extractor (chainer.Chain) – A feature extractor.
n_class (int) – The number of classes, which is the number of output channels of the last convolution layer.
input_size (tuple) – The size of the input. This value is \((height, width)\).
initialW (callable) – Initializer for the weights of convolution kernels.
bn_kwargs (dict) – Keyword arguments passed to initialize chainer.links.BatchNormalization. If a ChainerMN communicator (CommunicatorBase) is given with the key comm, MultiNodeBatchNormalization will be used for the batch normalization. Otherwise, BatchNormalization will be used.
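The bn_kwargs rule above amounts to a simple dispatch on the presence of the comm key. The sketch below assumes stand-in classes `LocalBN` and `MultiNodeBN` (hypothetical, in place of chainer.links.BatchNormalization and ChainerMN's MultiNodeBatchNormalization) so it runs without Chainer; `make_bn` is likewise hypothetical.

```python
class LocalBN:
    """Hypothetical stand-in for chainer.links.BatchNormalization."""
    def __init__(self, size, **kwargs):
        self.size, self.kwargs = size, kwargs


class MultiNodeBN:
    """Hypothetical stand-in for ChainerMN's MultiNodeBatchNormalization."""
    def __init__(self, size, comm, **kwargs):
        self.size, self.comm, self.kwargs = size, comm, kwargs


def make_bn(size, bn_kwargs=None):
    """Pick the batch-normalization link according to bn_kwargs."""
    bn_kwargs = {} if bn_kwargs is None else bn_kwargs
    if 'comm' in bn_kwargs:
        # A communicator was supplied: normalize across all workers.
        return MultiNodeBN(size, **bn_kwargs)
    return LocalBN(size, **bn_kwargs)


single = make_bn(64, {'decay': 0.95})
distributed = make_bn(64, {'comm': object(), 'decay': 0.95})
```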

predict(imgs)

Conduct semantic segmentation on images.

Parameters:

imgs (iterable of numpy.ndarray) – Arrays holding images. All images are in CHW and RGB format, and their values are in the range \([0, 255]\).

Returns:

A list of integer label arrays, one predicted for each image in the input list.

Return type:

list of numpy.ndarray
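Images loaded with other libraries (e.g. PIL) are often HWC uint8 arrays, so they need transposing before being passed to predict; chainercv.utils.read_image already returns CHW float32 arrays. The sketch below shows that conversion and, on the output side, how an integer label map is commonly obtained from per-class scores by a per-pixel argmax. Both helpers are hypothetical, and the argmax step is an assumption about how such labels are usually derived, not a description of PSPNet's internals.

```python
import numpy as np


def to_chw_rgb(img_hwc):
    """Convert an HWC RGB image to CHW float32, keeping values in [0, 255]."""
    return np.asarray(img_hwc, dtype=np.float32).transpose(2, 0, 1)


def scores_to_labels(score):
    """Turn per-class scores of shape (n_class, H, W) into an (H, W)
    int32 label map by picking the highest-scoring class per pixel."""
    return np.argmax(score, axis=0).astype(np.int32)


img = np.zeros((4, 5, 3), dtype=np.uint8)
img[..., 0] = 255            # a pure red HWC image
chw = to_chw_rgb(img)        # shape (3, 4, 5)
# labels = model.predict([chw])  # `model`: a PSPNet instance (not created here)
```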