Transforms¶
Image¶
center_crop¶
chainercv.transforms.center_crop(img, size, return_param=False, copy=False)
Center crop an image by size.
An image is cropped to size. The center of the output image and the center of the input image are the same.
- Parameters
- Returns
If return_param = False, returns an array out_img that is cropped from the input array.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.
x_slice (slice): Similar to y_slice.
out_img = img[:, y_slice, x_slice]
- Return type
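The relation between out_img, y_slice and x_slice can be illustrated with a minimal NumPy sketch. The name center_crop_sketch is hypothetical, and the rounding of an odd margin (floor division here) is an assumption rather than the library's confirmed behavior.

```python
import numpy as np


def center_crop_sketch(img, size):
    """Crop the centered (height, width) region from a CHW array."""
    _, H, W = img.shape
    out_h, out_w = size
    # Assumption: floor division decides where an odd margin is split.
    y_start = (H - out_h) // 2
    x_start = (W - out_w) // 2
    y_slice = slice(y_start, y_start + out_h)
    x_slice = slice(x_start, x_start + out_w)
    out_img = img[:, y_slice, x_slice]
    return out_img, {'y_slice': y_slice, 'x_slice': x_slice}


img = np.arange(3 * 6 * 8, dtype=np.float32).reshape(3, 6, 8)
out_img, param = center_crop_sketch(img, (4, 4))
```

The returned slices are enough to reproduce the crop on any array that shares the image's spatial layout, such as a label map.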
flip¶
pca_lighting¶
chainercv.transforms.pca_lighting(img, sigma, eigen_value=None, eigen_vector=None)
AlexNet style color augmentation.
This method adds a noise vector drawn from a Gaussian. The direction of the Gaussian is the same as that of the principal components of the dataset.
This method is used in training of AlexNet [1].
[1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
- Parameters
img (ndarray) – An image array to be augmented. This is in CHW and RGB format.
sigma (float) – Standard deviation of the Gaussian. In the original paper, this value is 10% of the range of intensity (25.5 if the range is \([0, 255]\)).
eigen_value (ndarray) – An array of eigenvalues. The shape has to be \((3,)\). If it is not specified, the values computed from ImageNet are used.
eigen_vector (ndarray) – An array of eigenvectors. The shape has to be \((3, 3)\). If it is not specified, the vectors computed from ImageNet are used.
- Returns
An image in CHW format.
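A rough sketch of the augmentation, assuming that one Gaussian coefficient is drawn per principal component and that the same color offset is added to every pixel. The function name pca_lighting_sketch is hypothetical, and the eigen_value and eigen_vector arrays below are placeholders, not the ImageNet statistics the library defaults to.

```python
import numpy as np


def pca_lighting_sketch(img, sigma, eigen_value, eigen_vector, rng):
    """Add one PCA-aligned color offset to a CHW RGB image."""
    # One Gaussian coefficient per principal component.
    alpha = rng.normal(0.0, sigma, size=3)
    # Weight each principal direction by its eigenvalue and coefficient.
    offset = eigen_vector.dot(eigen_value * alpha)  # shape (3,)
    # Broadcast the per-channel offset over all pixels.
    return img + offset.reshape(3, 1, 1)


# Placeholder statistics for illustration only (not the ImageNet values).
eigen_value = np.array([0.5, 0.2, 0.1])
eigen_vector = np.eye(3)
rng = np.random.default_rng(0)
img = np.zeros((3, 4, 4), dtype=np.float32)
out = pca_lighting_sketch(img, 25.5, eigen_value, eigen_vector, rng)
unchanged = pca_lighting_sketch(img, 0.0, eigen_value, eigen_vector, rng)
```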
random_crop¶
chainercv.transforms.random_crop(img, size, return_param=False, copy=False)
Crop array randomly into size.
The input image is cropped by a randomly selected region whose shape is size.
- Parameters
- Returns
If return_param = False, returns an array out_img that is cropped from the input array.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.
x_slice (slice): Similar to y_slice.
out_img = img[:, y_slice, x_slice]
- Return type
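A minimal NumPy sketch of a random crop that reports its slices. The name random_crop_sketch and the explicit rng argument are assumptions made for reproducibility; the library function takes no such argument.

```python
import numpy as np


def random_crop_sketch(img, size, rng):
    """Crop a randomly placed (height, width) region from a CHW array."""
    _, H, W = img.shape
    out_h, out_w = size
    if out_h > H or out_w > W:
        raise ValueError('crop size must not exceed the image size')
    # integers() excludes the upper bound, so add 1 to allow the last start.
    y_start = int(rng.integers(0, H - out_h + 1))
    x_start = int(rng.integers(0, W - out_w + 1))
    y_slice = slice(y_start, y_start + out_h)
    x_slice = slice(x_start, x_start + out_w)
    return img[:, y_slice, x_slice], {'y_slice': y_slice, 'x_slice': x_slice}


rng = np.random.default_rng(0)
img = np.arange(3 * 6 * 8, dtype=np.float32).reshape(3, 6, 8)
out_img, param = random_crop_sketch(img, (4, 5), rng)
```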
random_expand¶
chainercv.transforms.random_expand(img, max_ratio=4, fill=0, return_param=False)
Expand an image randomly.
This method randomly places the input image on a larger canvas. The size of the canvas is \((rH, rW)\), where \((H, W)\) is the size of the input image and \(r\) is a random ratio drawn from \([1, max\_ratio]\). The canvas is filled by a value fill except for the region where the original image is placed.
This data augmentation trick is used to create a “zoom out” effect [2].
[2] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
- Parameters
img (ndarray) – An image array to be augmented. This is in CHW format.
max_ratio (float) – The maximum ratio of expansion. In the original paper, this value is 4.
fill (float, tuple or ndarray) – The value of padded pixels. In the original paper, this value is the mean of ImageNet. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of img.
return_param (bool) – Returns random parameters.
- Returns
If return_param = False, returns an array out_img that is the result of expansion.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
ratio (float): The sampled value used to make the canvas.
y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
- Return type
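The canvas construction can be sketched in NumPy as follows. The name random_expand_sketch, the explicit rng argument, and the truncation of the canvas size to integers are assumptions for illustration.

```python
import numpy as np


def random_expand_sketch(img, max_ratio, fill, rng):
    """Place a CHW image at a random position on a larger, filled canvas."""
    C, H, W = img.shape
    ratio = rng.uniform(1, max_ratio)
    # Assumption: the canvas size is truncated to integers.
    out_h, out_w = int(ratio * H), int(ratio * W)
    y_offset = int(rng.integers(0, out_h - H + 1))
    x_offset = int(rng.integers(0, out_w - W + 1))
    canvas = np.empty((C, out_h, out_w), dtype=img.dtype)
    # A scalar, a length-C tuple, or a (C, 1, 1) array all broadcast here.
    canvas[:] = np.array(fill).reshape(-1, 1, 1)
    canvas[:, y_offset:y_offset + H, x_offset:x_offset + W] = img
    return canvas, {'ratio': ratio, 'y_offset': y_offset, 'x_offset': x_offset}


rng = np.random.default_rng(0)
img = np.ones((3, 4, 6), dtype=np.float32)
out_img, param = random_expand_sketch(img, 4, 0, rng)
y, x = param['y_offset'], param['x_offset']
```

The offsets are what a caller would pass to translate_bbox() or translate_point() to keep annotations aligned with the expanded image.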
random_flip¶
chainercv.transforms.random_flip(img, y_random=False, x_random=False, return_param=False, copy=False)
Randomly flip an image in vertical or horizontal direction.
- Parameters
- Returns
If return_param = False, returns an array out_img that is the result of flipping.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
y_flip (bool): Whether the image was flipped in the vertical direction or not.
x_flip (bool): Whether the image was flipped in the horizontal direction or not.
- Return type
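A short NumPy sketch of the flip and its reported parameters. The name random_flip_sketch and the explicit rng argument are hypothetical, and the 50/50 flip probability is an assumption.

```python
import numpy as np


def random_flip_sketch(img, y_random, x_random, rng):
    """Flip a CHW image along the enabled axes, each with probability 0.5."""
    y_flip = bool(rng.integers(0, 2)) if y_random else False
    x_flip = bool(rng.integers(0, 2)) if x_random else False
    if y_flip:
        img = img[:, ::-1, :]  # reverse rows: vertical flip
    if x_flip:
        img = img[:, :, ::-1]  # reverse columns: horizontal flip
    return img, {'y_flip': y_flip, 'x_flip': x_flip}


rng = np.random.default_rng(0)
img = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)
out_img, param = random_flip_sketch(img, True, True, rng)
same_img, same_param = random_flip_sketch(img, False, False, rng)
```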
random_rotate¶
chainercv.transforms.random_rotate(img, return_param=False)
Randomly rotate images by 90, 180, 270 or 360 degrees.
- Parameters
- Returns
If return_param = False, returns an array out_img that is the result of rotation.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
k (int): The integer that represents the number of times the image is rotated by 90 degrees.
- Return type
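A sketch of a quarter-turn rotation on the spatial axes of a CHW array using numpy.rot90. The name random_rotate_sketch and the rng argument are hypothetical, and drawing k from 0 to 3 (with 0 standing in for a 360 degree turn) is an assumption.

```python
import numpy as np


def random_rotate_sketch(img, rng):
    """Rotate a CHW image by k quarter turns on its (H, W) axes."""
    # Assumption: k is drawn from {0, 1, 2, 3}; k = 0 equals a full turn.
    k = int(rng.integers(0, 4))
    out_img = np.rot90(img, k, axes=(1, 2))
    return out_img, {'k': k}


rng = np.random.default_rng(0)
img = np.arange(3 * 4 * 6, dtype=np.float32).reshape(3, 4, 6)
out_img, param = random_rotate_sketch(img, rng)
# Rotating back by the same number of quarter turns restores the input.
restored = np.rot90(out_img, -param['k'], axes=(1, 2))
```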
random_sized_crop¶
chainercv.transforms.random_sized_crop(img, scale_ratio_range=(0.08, 1), aspect_ratio_range=(0.75, 1.3333333333333333), return_param=False, copy=False)
Crop an image to random size and aspect ratio.
The size \((H_{crop}, W_{crop})\) and the top-left coordinate \((y_{start}, x_{start})\) of the crop are calculated as follows:
\(H_{crop} = \lfloor{\sqrt{s \times H \times W \times a}}\rfloor\)
\(W_{crop} = \lfloor{\sqrt{s \times H \times W \div a}}\rfloor\)
\(y_{start} \sim Uniform\{0, H - H_{crop}\}\)
\(x_{start} \sim Uniform\{0, W - W_{crop}\}\)
\(s \sim Uniform(s_1, s_2)\)
\(b \sim Uniform(a_1, a_2)\) and \(a = b\) or \(a = \frac{1}{b}\) in 50/50 probability.
Here, \(s_1, s_2\) are the two floats in scale_ratio_range and \(a_1, a_2\) are the two floats in aspect_ratio_range. Also, \(H\) and \(W\) are the height and the width of the image. Note that \(s \approx \frac{H_{crop} \times W_{crop}}{H \times W}\) and \(a \approx \frac{H_{crop}}{W_{crop}}\). The approximations come from flooring floats to integers.
Note
When it fails to sample a valid scale and aspect ratio ten times, it picks values in a non-uniform way. If this happens, the selected scale ratio can be smaller than scale_ratio_range[0].
- Parameters
img (ndarray) – An image array. This is in CHW format.
scale_ratio_range (tuple of two floats) – Determines the distribution from which a scale ratio is sampled. The default values are selected so that the area of the crop is 8~100% of the original image. This is the default setting used to train ResNets in Torch style.
aspect_ratio_range (tuple of two floats) – Determines the distribution from which an aspect ratio is sampled. The default values are \(\frac{3}{4}\) and \(\frac{4}{3}\), which are also the default setting to train ResNets in Torch style.
- Returns
If return_param = False, returns only the cropped image.
If return_param = True, returns a tuple of cropped image and param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.
x_slice (slice): Similar to y_slice.
out_img = img[:, y_slice, x_slice]
scale_ratio (float): \(s\) in the description (see above).
aspect_ratio (float): \(a\) in the description.
- Return type
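The sampling formulas above can be sketched as follows. The name random_sized_crop_sketch and the rng argument are hypothetical. The fallback used after ten failed attempts (keeping the whole image) is a simplification chosen for this sketch, not the library's non-uniform rule.

```python
import math

import numpy as np


def random_sized_crop_sketch(img, scale_ratio_range=(0.08, 1),
                             aspect_ratio_range=(3 / 4, 4 / 3), rng=None):
    """Crop a CHW image to a randomly sampled area and aspect ratio."""
    rng = np.random.default_rng() if rng is None else rng
    _, H, W = img.shape
    for _ in range(10):
        s = rng.uniform(*scale_ratio_range)
        b = rng.uniform(*aspect_ratio_range)
        a = b if rng.random() < 0.5 else 1 / b
        h_crop = math.floor(math.sqrt(s * H * W * a))
        w_crop = math.floor(math.sqrt(s * H * W / a))
        if 1 <= h_crop <= H and 1 <= w_crop <= W:
            break
    else:
        # Simplified fallback for this sketch only: keep the whole image.
        h_crop, w_crop, s, a = H, W, 1.0, H / W
    y_start = int(rng.integers(0, H - h_crop + 1))
    x_start = int(rng.integers(0, W - w_crop + 1))
    y_slice = slice(y_start, y_start + h_crop)
    x_slice = slice(x_start, x_start + w_crop)
    param = {'y_slice': y_slice, 'x_slice': x_slice,
             'scale_ratio': s, 'aspect_ratio': a}
    return img[:, y_slice, x_slice], param


rng = np.random.default_rng(0)
img = np.arange(3 * 32 * 48, dtype=np.float32).reshape(3, 32, 48)
out_img, param = random_sized_crop_sketch(img, rng=rng)
```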
resize¶
chainercv.transforms.resize(img, size, interpolation=2)
Resize image to match the given shape.
The backend used by resize() is configured by chainer.global_config.cv_resize_backend. Two backends are supported: “cv2” and “PIL”. If this is None, “cv2” is used whenever “cv2” is installed, and “PIL” is used when “cv2” is not installed.
- Parameters
img (ndarray) – An array to be transformed. This is in CHW format and the type should be numpy.float32.
size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).
interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.
- Returns
A resized array in CHW format.
- Return type
resize_contain¶
chainercv.transforms.resize_contain(img, size, fill=0, interpolation=2, return_param=False)
Resize the image to fit in the given area while keeping aspect ratio.
If both the height and the width in size are larger than the height and the width of the img, the img is placed on the center with an appropriate padding to match size.
Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving aspect ratio.
- Parameters
img (ndarray) – An array to be transformed. This is in CHW format.
size (tuple of two ints) – A tuple of two elements: height, width. The size of the image after resizing.
fill (float, tuple or ndarray) – The value of padded pixels. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of img.
interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.
return_param (bool) – Returns information of resizing and offsetting.
- Returns
If return_param = False, returns an array out_img that is the result of resizing.
If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.
x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.
scaled_size (tuple): The size to which the image is scaled before placing it on a canvas. This is a tuple of two elements: height, width.
- Return type
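The geometry behind the returned parameters can be sketched without performing any interpolation. The helper name resize_contain_geometry is hypothetical, and the rounding choices (truncating the scaled size, floor-dividing the margins) are assumptions rather than confirmed library behavior.

```python
def resize_contain_geometry(in_size, out_size):
    """Compute the scaled size and centering offsets for a padded resize."""
    H, W = in_size
    out_h, out_w = out_size
    # Capping the scale at 1 means small images are padded, never enlarged.
    scale = min(out_h / H, out_w / W, 1.0)
    # Assumption: the scaled size is truncated to integers.
    scaled_size = (int(H * scale), int(W * scale))
    # Assumption: margins are split with floor division.
    y_offset = (out_h - scaled_size[0]) // 2
    x_offset = (out_w - scaled_size[1]) // 2
    return {'y_offset': y_offset, 'x_offset': x_offset,
            'scaled_size': scaled_size}


# A small image is only padded; a large one is shrunk, then padded.
padded_only = resize_contain_geometry((100, 200), (300, 300))
shrunk = resize_contain_geometry((400, 200), (100, 100))
```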
rotate¶
chainercv.transforms.rotate(img, angle, expand=True, fill=0, interpolation=2)
Rotate images by degrees.
The backend used by rotate() is configured by chainer.global_config.cv_rotate_backend. Two backends are supported: “cv2” and “PIL”. If this is None, “cv2” is used whenever “cv2” is installed, and “PIL” is used when “cv2” is not installed.
- Parameters
img (ndarray) – An array to be rotated. This is in CHW format.
angle (float) – Counter-clockwise rotation angle in degrees.
expand (bool) – Whether the output shape is adapted. If True, the input image is contained completely in the output.
fill (float) – The value used for pixels outside the boundaries.
interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC. Bilinear interpolation is the default strategy.
- Returns
Returns an array out_img that is the result of rotation.
- Return type
scale¶
chainercv.transforms.scale(img, size, fit_short=True, interpolation=2)
Rescales the input image to the given “size”.
When fit_short == True, the input image will be resized so that the shorter edge will be scaled to length size after resizing. For example, if the height of the image is larger than its width, the image will be resized to (size * height / width, size).
Otherwise, the input image will be resized so that the longer edge will be scaled to length size after resizing.
- Parameters
img (ndarray) – An image array to be scaled. This is in CHW format.
size (int) – The target edge length: the shorter edge when fit_short is True, the longer edge otherwise.
fit_short (bool) – Determines whether to match the length of the shorter edge or the longer edge to size.
interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.
- Returns
A scaled image in CHW format.
- Return type
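The output size implied by fit_short can be worked out with a small helper. The name scaled_size is hypothetical, and truncating the non-target edge with int() is an assumption about rounding.

```python
def scaled_size(in_size, size, fit_short=True):
    """Return the (height, width) produced by scaling one edge to `size`."""
    H, W = in_size
    # The height is the target edge when it is the shorter edge and
    # fit_short is True, or the longer edge and fit_short is False.
    if (H < W) == fit_short:
        return (size, int(size * W / H))
    return (int(size * H / W), size)


tall_short = scaled_size((400, 200), 100)                  # width -> 100
tall_long = scaled_size((400, 200), 100, fit_short=False)  # height -> 100
wide_short = scaled_size((200, 400), 100)                  # height -> 100
```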
ten_crop¶
chainercv.transforms.ten_crop(img, size)
Crop 10 regions from an array.
This method crops 10 regions. All regions will be in shape size. These regions consist of 1 center crop and 4 corner crops and horizontal flips of them.
The crops are returned in the following order:
center crop
top-left crop
bottom-left crop
top-right crop
bottom-right crop
center crop (flipped horizontally)
top-left crop (flipped horizontally)
bottom-left crop (flipped horizontally)
top-right crop (flipped horizontally)
bottom-right crop (flipped horizontally)
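The ordering above can be reproduced with a minimal NumPy sketch. The name ten_crop_sketch is hypothetical, and stacking the crops into a single array with a leading axis of length 10 is an assumption about the return layout.

```python
import numpy as np


def ten_crop_sketch(img, size):
    """Return center and corner crops of a CHW image plus their x-flips."""
    _, H, W = img.shape
    h, w = size
    cy, cx = (H - h) // 2, (W - w) // 2
    # (y_start, x_start) in the listed order: center, top-left,
    # bottom-left, top-right, bottom-right.
    starts = [(cy, cx), (0, 0), (H - h, 0), (0, W - w), (H - h, W - w)]
    crops = [img[:, y:y + h, x:x + w] for y, x in starts]
    flipped = [crop[:, :, ::-1] for crop in crops]
    # Assumption: the ten crops are stacked along a new leading axis.
    return np.stack(crops + flipped)


img = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
crops = ten_crop_sketch(img, (4, 4))
```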
Bounding Box¶
crop_bbox¶
chainercv.transforms.crop_bbox(bbox, y_slice=None, x_slice=None, allow_outside_center=True, return_param=False)
Translate bounding boxes to fit within the cropped area of an image.
This method is mainly used together with image cropping. This method translates the coordinates of bounding boxes like translate_bbox(). In addition, this function truncates the bounding boxes to fit within the cropped area. If a bounding box does not overlap with the cropped area, this bounding box will be removed.
- Parameters
bbox (ndarray) – See the table below.
y_slice (slice) – The slice of y axis.
x_slice (slice) – The slice of x axis.
allow_outside_center (bool) – If this argument is False, bounding boxes whose centers are outside of the cropped area are removed. The default value is True.
return_param (bool) – If True, this function returns indices of kept bounding boxes.
name | shape | dtype | format
bbox | \((R, 4)\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
- Returns
If return_param = False, returns an array bbox.
If return_param = True, returns a tuple whose elements are bbox, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.
index (numpy.ndarray): An array holding indices of used bounding boxes.
trancated_index (numpy.ndarray): An array holding indices of truncated bounding boxes, with respect to returned bbox, rather than original bbox.
- Return type
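The translate, truncate and remove steps can be sketched in NumPy as below. The name crop_bbox_sketch is hypothetical, both slices are required to have explicit bounds (a simplification), and only the index entry of param is reproduced.

```python
import numpy as np


def crop_bbox_sketch(bbox, y_slice, x_slice, allow_outside_center=True):
    """Clip (y_min, x_min, y_max, x_max) boxes to a crop and shift them."""
    crop_bb = np.array(
        (y_slice.start, x_slice.start, y_slice.stop, x_slice.stop),
        dtype=bbox.dtype)
    if allow_outside_center:
        mask = np.ones(bbox.shape[0], dtype=bool)
    else:
        center = (bbox[:, :2] + bbox[:, 2:]) / 2
        mask = np.logical_and(
            crop_bb[:2] <= center, center < crop_bb[2:]).all(axis=1)
    out = bbox.copy()
    # Truncate each box to the cropped area.
    out[:, :2] = np.maximum(out[:, :2], crop_bb[:2])
    out[:, 2:] = np.minimum(out[:, 2:], crop_bb[2:])
    # Move the origin to the top-left corner of the crop.
    out[:, :2] -= crop_bb[:2]
    out[:, 2:] -= crop_bb[:2]
    # Boxes with no remaining area did not overlap the crop.
    mask = np.logical_and(mask, (out[:, :2] < out[:, 2:]).all(axis=1))
    return out[mask], {'index': np.flatnonzero(mask)}


bbox = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [5, 5, 25, 25]],
                dtype=np.float32)
kept, param = crop_bbox_sketch(bbox, slice(5, 15), slice(5, 15))
strict, strict_param = crop_bbox_sketch(
    bbox, slice(5, 15), slice(5, 15), allow_outside_center=False)
```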
flip_bbox¶
resize_bbox¶
rotate_bbox¶
translate_bbox¶
chainercv.transforms.translate_bbox(bbox, y_offset=0, x_offset=0)
Translate bounding boxes.
This method is mainly used together with image transforms, such as padding and cropping, which translate the top-left point of the image from coordinate \((0, 0)\) to coordinate \((y, x) = (y_{offset}, x_{offset})\).
- Parameters
name | shape | dtype | format
bbox | \((R, 4)\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
- Returns
Bounding boxes translated according to the given offsets.
- Return type
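A one-line NumPy sketch of the translation, assuming boxes in the \((y_{min}, x_{min}, y_{max}, x_{max})\) format described above. The name translate_bbox_sketch is hypothetical.

```python
import numpy as np


def translate_bbox_sketch(bbox, y_offset=0, x_offset=0):
    """Shift (y_min, x_min, y_max, x_max) boxes by the given offsets."""
    shift = np.array([y_offset, x_offset, y_offset, x_offset],
                     dtype=bbox.dtype)
    return bbox + shift


bbox = np.array([[10, 20, 30, 40]], dtype=np.float32)
moved = translate_bbox_sketch(bbox, y_offset=5, x_offset=-3)
```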
Point¶
flip_point¶
chainercv.transforms.flip_point(point, size, y_flip=False, x_flip=False)
Modify points according to image flips.
- Parameters
point (ndarray or list of arrays) – See the table below.
size (tuple) – A tuple of length 2. The height and the width of the image, which is associated with the points.
y_flip (bool) – Modify points according to a vertical flip of an image.
x_flip (bool) – Modify points according to a horizontal flip of an image.
name | shape | dtype | format
point | \((R, K, 2)\) or \([(K, 2)]\) | float32 | \((y, x)\)
- Returns
Points modified according to image flips.
- Return type
ndarray or list of arrays
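A sketch of the coordinate change for the \((R, K, 2)\) array form only. The name flip_point_sketch is hypothetical, and mapping \(y\) to \(H - y\) (rather than \(H - 1 - y\)) is an assumed coordinate convention, not confirmed library behavior.

```python
import numpy as np


def flip_point_sketch(point, size, y_flip=False, x_flip=False):
    """Mirror (y, x) points inside an image of the given (H, W) size."""
    H, W = size
    out = np.array(point, dtype=np.float32)  # work on a copy
    if y_flip:
        out[..., 0] = H - out[..., 0]  # assumed convention: y -> H - y
    if x_flip:
        out[..., 1] = W - out[..., 1]  # assumed convention: x -> W - x
    return out


point = np.array([[[1, 2], [3, 4]]], dtype=np.float32)
flipped = flip_point_sketch(point, (10, 20), x_flip=True)
# Applying the same flip twice returns the original coordinates.
twice = flip_point_sketch(flipped, (10, 20), x_flip=True)
```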
resize_point¶
chainercv.transforms.resize_point(point, in_size, out_size)
Adapt point coordinates to the rescaled image space.
- Parameters
name | shape | dtype | format
point | \((R, K, 2)\) or \([(K, 2)]\) | float32 | \((y, x)\)
- Returns
Points rescaled according to the given image shapes.
- Return type
ndarray or list of arrays
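A sketch of the rescaling for the \((R, K, 2)\) array form, assuming each coordinate is multiplied by the ratio of output to input size along its axis. The name resize_point_sketch is hypothetical.

```python
import numpy as np


def resize_point_sketch(point, in_size, out_size):
    """Scale (y, x) points from an in_size image to an out_size image."""
    y_scale = out_size[0] / in_size[0]
    x_scale = out_size[1] / in_size[1]
    return point * np.array([y_scale, x_scale], dtype=point.dtype)


point = np.array([[[10, 20]]], dtype=np.float32)
resized = resize_point_sketch(point, in_size=(100, 200), out_size=(50, 400))
```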
translate_point¶
chainercv.transforms.translate_point(point, y_offset=0, x_offset=0)
Translate points.
This method is mainly used together with image transforms, such as padding and cropping, which translate the top-left point of the image to the coordinate \((y, x) = (y_{offset}, x_{offset})\).
- Parameters
name | shape | dtype | format
point | \((R, K, 2)\) or \([(K, 2)]\) | float32 | \((y, x)\)
- Returns
Points modified according to the translation of an image.
- Return type
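The pairing with padding can be illustrated by marking one pixel, placing the image on a larger canvas, and checking that the translated point still lands on the marked pixel. The name translate_point_sketch and the canvas construction are hypothetical; only the \((R, K, 2)\) array form is covered.

```python
import numpy as np


def translate_point_sketch(point, y_offset=0, x_offset=0):
    """Shift (y, x) points by the given offsets."""
    return point + np.array([y_offset, x_offset], dtype=point.dtype)


# Mark one pixel, then place the image on a larger canvas.
img = np.zeros((1, 4, 4), dtype=np.float32)
img[0, 1, 2] = 1.0
point = np.array([[[1, 2]]], dtype=np.float32)

y_offset, x_offset = 3, 5
canvas = np.zeros((1, 10, 10), dtype=np.float32)
canvas[:, y_offset:y_offset + 4, x_offset:x_offset + 4] = img

moved = translate_point_sketch(point, y_offset, x_offset)
y, x = moved[0, 0].astype(int)
```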