Transforms
Image
center_crop
chainercv.transforms.center_crop(img, size, return_param=False, copy=False)
Center crop an image by size.
An image is cropped to size. The center of the output image and the center of the input image are the same.
Parameters:
Returns: If return_param = False, returns an array out_img that is cropped from the input array.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
- y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.
- x_slice (slice): Similar to y_slice.
out_img = img[:, y_slice, x_slice]
Return type:
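The relation between out_img, y_slice and x_slice can be illustrated with a small NumPy sketch. This is not the library's implementation: center_crop_sketch is a hypothetical name, and the floor division used for odd margins is an assumption.

```python
import numpy as np


def center_crop_sketch(img, size):
    """Crop the central (h, w) window of a CHW array and report the slices."""
    _, H, W = img.shape
    h, w = size
    # Assumption: floor division picks the offset when the margin is odd.
    y_offset = (H - h) // 2
    x_offset = (W - w) // 2
    y_slice = slice(y_offset, y_offset + h)
    x_slice = slice(x_offset, x_offset + w)
    return img[:, y_slice, x_slice], {'y_slice': y_slice, 'x_slice': x_slice}


img = np.arange(3 * 10 * 8, dtype=np.float32).reshape(3, 10, 8)
out_img, param = center_crop_sketch(img, (4, 4))
# The documented relation holds by construction.
same = np.array_equal(out_img, img[:, param['y_slice'], param['x_slice']])
```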
flip
chainercv.transforms.flip(img, y_flip=False, x_flip=False, copy=False)
Flip an image in vertical or horizontal direction as specified.
Parameters:
Returns: Transformed img in CHW format.
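For a CHW array, a vertical flip reverses axis 1 and a horizontal flip reverses axis 2. The sketch below shows this with plain NumPy slicing; flip_sketch is a hypothetical name and the copy argument is left out.

```python
import numpy as np


def flip_sketch(img, y_flip=False, x_flip=False):
    """Flip a CHW array along the height and/or width axis."""
    if y_flip:
        img = img[:, ::-1, :]  # reverse rows
    if x_flip:
        img = img[:, :, ::-1]  # reverse columns
    return img


img = np.arange(6).reshape(1, 2, 3)
x_flipped = flip_sketch(img, x_flip=True)
y_flipped = flip_sketch(img, y_flip=True)
```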
pca_lighting
chainercv.transforms.pca_lighting(img, sigma, eigen_value=None, eigen_vector=None)
AlexNet style color augmentation.
This method adds a noise vector drawn from a Gaussian. The direction of the Gaussian is the same as that of the principal components of the dataset.
This method is used in training of AlexNet [1].
[1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
Parameters:
- img (ndarray) – An image array to be augmented. This is in CHW and RGB format.
- sigma (float) – Standard deviation of the Gaussian. In the original paper, this value is 10% of the range of intensity (25.5 if the range is \([0, 255]\)).
- eigen_value (ndarray) – An array of eigen values. The shape has to be \((3,)\). If it is not specified, the values computed from ImageNet are used.
- eigen_vector (ndarray) – An array of eigen vectors. The shape has to be \((3, 3)\). If it is not specified, the vectors computed from ImageNet are used.
Returns: An image in CHW format.
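Following the AlexNet formulation, the noise added to every pixel is the eigenvector matrix applied to the eigenvalues scaled by Gaussian samples. The sketch below is a minimal NumPy version under stated assumptions: pca_lighting_sketch is a hypothetical name, the eigenvalues and eigenvectors are toy placeholders supplied by the caller (not the ImageNet statistics), and the rng argument is added here only for reproducibility.

```python
import numpy as np


def pca_lighting_sketch(img, sigma, eigen_value, eigen_vector, rng=None):
    """Add one PCA-aligned colour offset to every pixel of a CHW RGB image."""
    rng = np.random.default_rng() if rng is None else rng
    # One Gaussian sample per principal component.
    alpha = rng.normal(0, sigma, size=3)
    # Noise vector: eigenvectors weighted by (eigenvalue * alpha).
    noise = eigen_vector.dot(eigen_value * alpha)
    return img + noise.reshape(3, 1, 1)


# Toy statistics for illustration only.
toy_value = np.array([0.5, 0.3, 0.2])
toy_vector = np.eye(3)
img = np.zeros((3, 4, 4), dtype=np.float64)
out = pca_lighting_sketch(img, 25.5, toy_value, toy_vector,
                          rng=np.random.default_rng(0))
```

Because the same offset is added to all pixels of a channel, the augmentation shifts overall colour balance without changing spatial structure.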
random_crop
chainercv.transforms.random_crop(img, size, return_param=False, copy=False)
Crop array randomly into size.
The input image is cropped by a randomly selected region whose shape is size.
Parameters:
Returns: If return_param = False, returns an array out_img that is cropped from the input array.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
- y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.
- x_slice (slice): Similar to y_slice.
out_img = img[:, y_slice, x_slice]
Return type:
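A random crop differs from a center crop only in how the offsets are chosen. The following NumPy sketch draws them uniformly; random_crop_sketch and its rng argument are hypothetical and not part of the documented signature.

```python
import numpy as np


def random_crop_sketch(img, size, rng=None):
    """Crop a randomly placed (h, w) window from a CHW array."""
    rng = np.random.default_rng() if rng is None else rng
    _, H, W = img.shape
    h, w = size
    # Offsets are drawn so that the window stays inside the image.
    y_offset = int(rng.integers(0, H - h + 1))
    x_offset = int(rng.integers(0, W - w + 1))
    y_slice = slice(y_offset, y_offset + h)
    x_slice = slice(x_offset, x_offset + w)
    return img[:, y_slice, x_slice], {'y_slice': y_slice, 'x_slice': x_slice}


img = np.arange(3 * 10 * 8).reshape(3, 10, 8)
out_img, param = random_crop_sketch(img, (4, 5), rng=np.random.default_rng(0))
```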
random_expand
chainercv.transforms.random_expand(img, max_ratio=4, fill=0, return_param=False)
Expand an image randomly.
This method randomly places the input image on a larger canvas. The size of the canvas is \((rH, rW)\), where \((H, W)\) is the size of the input image and \(r\) is a random ratio drawn from \([1, max\_ratio]\). The canvas is filled by a value fill except for the region where the original image is placed.
This data augmentation trick is used to create a “zoom out” effect [2].
[2] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:
- img (ndarray) – An image array to be augmented. This is in CHW format.
- max_ratio (float) – The maximum ratio of expansion. In the original paper, this value is 4.
- fill (float, tuple or ndarray) – The value of padded pixels. In the original paper, this value is the mean of ImageNet.
- return_param (bool) – Returns random parameters.
Returns: If return_param = False, returns an array out_img that is the result of expansion.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
- ratio (float): The sampled value used to make the canvas.
- y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
- x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
Return type:
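The canvas construction can be sketched in NumPy as follows. random_expand_sketch is a hypothetical name; the sketch assumes a scalar fill, truncates the canvas size to integers, and takes an rng argument that the documented function does not have.

```python
import numpy as np


def random_expand_sketch(img, max_ratio=4, fill=0, rng=None):
    """Place a CHW image at a random position on a larger, filled canvas."""
    rng = np.random.default_rng() if rng is None else rng
    C, H, W = img.shape
    ratio = float(rng.uniform(1, max_ratio))
    # Assumption: canvas size is truncated to whole pixels.
    out_H, out_W = int(ratio * H), int(ratio * W)
    y_offset = int(rng.integers(0, out_H - H + 1))
    x_offset = int(rng.integers(0, out_W - W + 1))
    out_img = np.full((C, out_H, out_W), fill, dtype=img.dtype)
    out_img[:, y_offset:y_offset + H, x_offset:x_offset + W] = img
    param = {'ratio': ratio, 'y_offset': y_offset, 'x_offset': x_offset}
    return out_img, param


img = np.ones((3, 4, 5), dtype=np.float32)
out_img, param = random_expand_sketch(img, rng=np.random.default_rng(0))
```

The returned y_offset and x_offset are the values one would pass to a bounding box or keypoint translation so that annotations follow the image onto the canvas.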
random_flip
chainercv.transforms.random_flip(img, y_random=False, x_random=False, return_param=False, copy=False)
Randomly flip an image in vertical or horizontal direction.
Parameters:
Returns: If return_param = False, returns an array out_img that is the result of flipping.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
- y_flip (bool): Whether the image was flipped in the vertical direction or not.
- x_flip (bool): Whether the image was flipped in the horizontal direction or not.
Return type:
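The returned flags are what make it possible to apply the same flip to annotations afterwards. A minimal NumPy sketch, assuming each enabled direction is flipped with probability 0.5 (the probability is not stated above); random_flip_sketch and rng are hypothetical.

```python
import numpy as np


def random_flip_sketch(img, y_random=False, x_random=False, rng=None):
    """Randomly flip a CHW array and report which flips were applied."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: a fair coin decides each enabled direction.
    y_flip = bool(rng.integers(0, 2)) if y_random else False
    x_flip = bool(rng.integers(0, 2)) if x_random else False
    if y_flip:
        img = img[:, ::-1, :]
    if x_flip:
        img = img[:, :, ::-1]
    return img, {'y_flip': y_flip, 'x_flip': x_flip}


img = np.arange(12).reshape(1, 3, 4)
out_img, param = random_flip_sketch(img, x_random=True,
                                    rng=np.random.default_rng(0))
```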
random_rotate
chainercv.transforms.random_rotate(img, return_param=False)
Randomly rotate images by 90, 180, 270 or 360 degrees.
Parameters:
Returns: If return_param = False, returns an array out_img that is the result of rotation.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
- k (int): The integer that represents the number of times the image is rotated by 90 degrees.
Return type:
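A rotation by multiples of 90 degrees can be sketched with numpy.rot90 applied to the spatial axes. The rotation direction and the range from which k is drawn are assumptions here (k = 0 stands in for the 360 degree case); random_rotate_sketch and rng are hypothetical.

```python
import numpy as np


def random_rotate_sketch(img, rng=None):
    """Rotate a CHW array by k * 90 degrees in the (H, W) plane."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: k is uniform over {0, 1, 2, 3}.
    k = int(rng.integers(0, 4))
    return np.rot90(img, k, axes=(1, 2)), {'k': k}


img = np.arange(24).reshape(2, 3, 4)
out_img, param = random_rotate_sketch(img, rng=np.random.default_rng(0))
quarter_turn = np.rot90(img, 1, axes=(1, 2))
```

Note that an odd k swaps height and width, so non-square inputs change shape.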
resize
chainercv.transforms.resize(img, size, interpolation=2)
Resize image to match the given shape.
This method uses cv2 or PIL for the backend. If cv2 is installed, this function uses the implementation in cv2. This implementation is faster than the implementation in PIL. Under an Anaconda environment, cv2 can be installed by the following command.
$ conda install -c menpo opencv3=3.2.0
Parameters:
- img (ndarray) – An array to be transformed. This is in CHW format and the type should be numpy.float32.
- size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).
- interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.
Returns: A resized array in CHW format.
Return type:
resize_contain
chainercv.transforms.resize_contain(img, size, fill=0, return_param=False)
Resize the image to fit in the given area while keeping aspect ratio.
If both the height and the width in size are larger than the height and the width of img, img is placed at the center with appropriate padding to match size.
Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving aspect ratio.
Parameters:
- img (ndarray) – An array to be transformed. This is in CHW format.
- size (tuple of two ints) – A tuple of two elements: height, width. The size of the image after resizing.
- fill (float, tuple or ndarray) – The value of padded pixels.
- return_param (bool) – Returns information of resizing and offsetting.
Returns: If return_param = False, returns an array out_img that is the result of resizing.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
- y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
- x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
- scaled_size (tuple): The size to which the image is scaled before placing it on a canvas. This is a tuple of two elements: height, width.
Return type:
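The two cases described above (pad only, or scale down then pad) can be captured by a single ratio capped at 1. The sketch below computes only the returned parameters, not the resized pixels; resize_contain_params is a hypothetical helper, and the truncation and floor division used for rounding are assumptions.

```python
def resize_contain_params(in_shape, size):
    """Compute scaled_size and offsets for fitting (H, W) into size."""
    H, W = in_shape
    out_H, out_W = size
    # Never upscale: the ratio is capped at 1 when the canvas is larger.
    ratio = min(1.0, out_H / H, out_W / W)
    # Assumption: scaled size is truncated to whole pixels.
    scaled_size = (int(H * ratio), int(W * ratio))
    # Assumption: floor division centres the image when the margin is odd.
    y_offset = (out_H - scaled_size[0]) // 2
    x_offset = (out_W - scaled_size[1]) // 2
    return {'y_offset': y_offset, 'x_offset': x_offset,
            'scaled_size': scaled_size}


# Canvas larger than the image in both directions: padding only.
small = resize_contain_params((10, 20), (30, 40))
# Image taller than the canvas: scale by 0.5, then pad horizontally.
large = resize_contain_params((100, 50), (50, 50))
```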
scale
chainercv.transforms.scale(img, size, fit_short=True)
Rescales the input image to the given “size”.
When fit_short == True, the input image will be resized so that the shorter edge will be scaled to length size after resizing. For example, if the height of the image is larger than its width, the image will be resized to (size * height / width, size).
Otherwise, the input image will be resized so that the longer edge will be scaled to length size after resizing.
Parameters:
Returns: A scaled image in CHW format.
Return type:
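The output shape implied by this rule can be worked out without touching any pixels. The helper below is a hypothetical sketch (scaled_shape is not a library function) and assumes the non-matched edge is truncated to an integer.

```python
def scaled_shape(in_shape, size, fit_short=True):
    """Return the (height, width) that scale() would target for in_shape."""
    H, W = in_shape
    # The height is the matched edge when it is the shorter edge and
    # fit_short is set, or the longer edge and fit_short is not set.
    height_is_matched = (H < W) if fit_short else (H > W)
    if height_is_matched:
        return size, int(size * W / H)
    return int(size * H / W), size


tall_short = scaled_shape((400, 200), 100)                  # shorter edge is the width
tall_long = scaled_shape((400, 200), 100, fit_short=False)  # longer edge is the height
wide_short = scaled_shape((200, 400), 100)                  # shorter edge is the height
```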
ten_crop
chainercv.transforms.ten_crop(img, size)
Crop 10 regions from an array.
This method crops 10 regions. All regions will be in shape size. These regions consist of 1 center crop, 4 corner crops, and the horizontal flips of these five.
The crops are ordered as follows.
- center crop
- top-left crop
- bottom-left crop
- top-right crop
- bottom-right crop
- center crop (flipped horizontally)
- top-left crop (flipped horizontally)
- bottom-left crop (flipped horizontally)
- top-right crop (flipped horizontally)
- bottom-right crop (flipped horizontally)
Parameters:
Returns: The cropped arrays. The shape of the tensor is \((10, C, H, W)\).
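The ordering listed above can be reproduced with NumPy slicing: stack the five crops, then append their horizontal flips. ten_crop_sketch is a hypothetical name, and the floor division for the center crop offset is an assumption.

```python
import numpy as np


def ten_crop_sketch(img, size):
    """Return center + 4 corner crops and their horizontal flips."""
    h, w = size
    _, H, W = img.shape
    y0, x0 = (H - h) // 2, (W - w) // 2
    crops = np.stack([
        img[:, y0:y0 + h, x0:x0 + w],  # center
        img[:, :h, :w],                # top-left
        img[:, H - h:, :w],            # bottom-left
        img[:, :h, W - w:],            # top-right
        img[:, H - h:, W - w:],        # bottom-right
    ])
    # The last five entries are the first five flipped along the width axis.
    return np.concatenate([crops, crops[:, :, :, ::-1]])


img = np.arange(3 * 8 * 8).reshape(3, 8, 8)
crops = ten_crop_sketch(img, (4, 4))
```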
Bounding Box
flip_bbox
chainercv.transforms.flip_bbox(bbox, size, y_flip=False, x_flip=False)
Flip bounding boxes accordingly.
The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters:
- bbox (ndarray) – An array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes.
- size (tuple) – A tuple of length 2. The height and the width of the image that is flipped.
- y_flip (bool) – Flip bounding box according to a vertical flip of an image.
- x_flip (bool) – Flip bounding box according to a horizontal flip of an image.
Returns: Bounding boxes flipped according to the given flips.
Return type:
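Flipping a box mirrors both of its edges and then swaps which one is the minimum. The sketch below treats coordinates as continuous positions, so a flip maps y to H - y; whether the library subtracts an extra 1 for pixel indices is not stated above, so this convention is an assumption. flip_bbox_sketch is a hypothetical name.

```python
import numpy as np


def flip_bbox_sketch(bbox, size, y_flip=False, x_flip=False):
    """Mirror (y_min, x_min, y_max, x_max) boxes inside an (H, W) image."""
    H, W = size
    bbox = bbox.copy()
    if y_flip:
        # The old y_min becomes the new y_max and vice versa.
        y_max = H - bbox[:, 0]
        y_min = H - bbox[:, 2]
        bbox[:, 0], bbox[:, 2] = y_min, y_max
    if x_flip:
        x_max = W - bbox[:, 1]
        x_min = W - bbox[:, 3]
        bbox[:, 1], bbox[:, 3] = x_min, x_max
    return bbox


bbox = np.array([[10, 20, 30, 60]], dtype=np.float32)
x_flipped = flip_bbox_sketch(bbox, (100, 200), x_flip=True)
y_flipped = flip_bbox_sketch(bbox, (100, 200), y_flip=True)
```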
resize_bbox
chainercv.transforms.resize_bbox(bbox, in_size, out_size)
Resize bounding boxes according to image resize.
The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters:
Returns: Bounding boxes rescaled according to the given image shapes.
Return type:
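Rescaling boxes amounts to multiplying the y coordinates by the height ratio and the x coordinates by the width ratio. A minimal NumPy sketch, assuming in_size and out_size are (height, width) tuples; resize_bbox_sketch is a hypothetical name.

```python
import numpy as np


def resize_bbox_sketch(bbox, in_size, out_size):
    """Scale (y_min, x_min, y_max, x_max) boxes from in_size to out_size."""
    bbox = bbox.astype(np.float32)  # astype returns a copy
    y_scale = out_size[0] / in_size[0]
    x_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= y_scale
    bbox[:, [1, 3]] *= x_scale
    return bbox


bbox = np.array([[10, 20, 30, 60]], dtype=np.float32)
resized = resize_bbox_sketch(bbox, in_size=(100, 200), out_size=(50, 400))
```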
translate_bbox
chainercv.transforms.translate_bbox(bbox, y_offset=0, x_offset=0)
Translate bounding boxes.
This method is mainly used together with image transforms, such as padding and cropping, which translate the top left point of the image from coordinate \((0, 0)\) to coordinate \((y, x) = (y\_offset, x\_offset)\).
The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are (y_min, x_min, y_max, x_max), where the four attributes are coordinates of the top left and the bottom right vertices.
Parameters:
Returns: Bounding boxes translated according to the given offsets.
Return type:
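As an illustration of pairing this with a crop: if an image is cropped starting at row 5 and column 20, its top left corner moves by (-5, -20), and the boxes shift by the same amount. translate_bbox_sketch is a hypothetical NumPy version, not the library code.

```python
import numpy as np


def translate_bbox_sketch(bbox, y_offset=0, x_offset=0):
    """Shift (y_min, x_min, y_max, x_max) boxes by the given offsets."""
    out = bbox.copy()
    out[:, [0, 2]] += y_offset
    out[:, [1, 3]] += x_offset
    return out


bbox = np.array([[10, 20, 30, 60]], dtype=np.float32)
# Offsets are the negated start of the crop window.
y_slice, x_slice = slice(5, 45), slice(20, 80)
shifted = translate_bbox_sketch(bbox, y_offset=-y_slice.start,
                                x_offset=-x_slice.start)
```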
Keypoint
flip_keypoint
chainercv.transforms.flip_keypoint(keypoint, size, y_flip=False, x_flip=False)
Modify keypoints according to image flips.
Parameters: - keypoint (ndarray) – Keypoints in the image. The shape of this array is \((K, 2)\). \(K\) is the number of keypoints in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the keypoints.
- size (tuple) – A tuple of length 2. The height and the width of the image which is associated with the keypoints.
- y_flip (bool) – Modify keypoints according to a vertical flip of an image.
- x_flip (bool) – Modify keypoints according to a horizontal flip of an image.
Returns: Keypoints modified according to image flips.
Return type:
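For keypoints stored as (y, x) rows, a flip mirrors one coordinate around the image extent. As with the bounding box case, the sketch assumes continuous coordinates (y maps to H - y); an index-based convention would subtract an extra 1. flip_keypoint_sketch is a hypothetical name.

```python
import numpy as np


def flip_keypoint_sketch(keypoint, size, y_flip=False, x_flip=False):
    """Mirror (K, 2) keypoints of (y, x) inside an (H, W) image."""
    H, W = size
    keypoint = keypoint.copy()
    if y_flip:
        keypoint[:, 0] = H - keypoint[:, 0]
    if x_flip:
        keypoint[:, 1] = W - keypoint[:, 1]
    return keypoint


keypoint = np.array([[10, 20], [50, 150]], dtype=np.float32)
flipped = flip_keypoint_sketch(keypoint, (100, 200), x_flip=True)
```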
resize_keypoint
chainercv.transforms.resize_keypoint(keypoint, in_size, out_size)
Change values of keypoints according to the parameters for resizing an image.
Parameters:
- keypoint (ndarray) – Keypoints in the image. The shape of this array is \((K, 2)\). \(K\) is the number of keypoints in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the keypoints.
- in_size (tuple) – A tuple of length 2. The height and the width of the image before resizing.
- out_size (tuple) – A tuple of length 2. The height and the width of the image after resizing.
Returns: Keypoints rescaled according to the given image shapes.
Return type:
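The rescaling is a per-axis multiplication by the ratio of output size to input size. A minimal NumPy sketch with a hypothetical function name:

```python
import numpy as np


def resize_keypoint_sketch(keypoint, in_size, out_size):
    """Scale (K, 2) keypoints of (y, x) from in_size to out_size."""
    keypoint = keypoint.astype(np.float32)  # astype returns a copy
    keypoint[:, 0] *= out_size[0] / in_size[0]
    keypoint[:, 1] *= out_size[1] / in_size[1]
    return keypoint


keypoint = np.array([[10, 20]], dtype=np.float32)
resized = resize_keypoint_sketch(keypoint, in_size=(100, 200),
                                 out_size=(50, 400))
```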
translate_keypoint
chainercv.transforms.translate_keypoint(keypoint, y_offset=0, x_offset=0)
Translate keypoints.
This method is mainly used together with image transforms, such as padding and cropping, which translate the top left point of the image to the coordinate \((y, x) = (y\_offset, x\_offset)\).
Parameters: - keypoint (ndarray) – Keypoints in the image. The shape of this array is \((K, 2)\). \(K\) is the number of keypoints in the image. The last dimension is composed of \(y\) and \(x\) coordinates of the keypoints.
- y_offset (int or float) – The offset along y axis.
- x_offset (int or float) – The offset along x axis.
Returns: Keypoints modified according to the translation of an image.
Return type:
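Translation adds the offsets to the y and x columns. The sketch below pairs it with a padding example: placing an image 5 rows down and 20 columns right on a canvas moves every keypoint by (5, 20). translate_keypoint_sketch is a hypothetical name.

```python
import numpy as np


def translate_keypoint_sketch(keypoint, y_offset=0, x_offset=0):
    """Shift (K, 2) keypoints of (y, x) by the given offsets."""
    out = keypoint.copy()
    out[:, 0] += y_offset
    out[:, 1] += x_offset
    return out


keypoint = np.array([[10, 20], [0, 0]], dtype=np.float32)
shifted = translate_keypoint_sketch(keypoint, y_offset=5, x_offset=20)
```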