Source code for chainercv.links.model.faster_rcnn.utils.bbox2loc

from chainer.backends import cuda


[docs]def bbox2loc(src_bbox, dst_bbox): """Encodes the source and the destination bounding boxes to "loc". Given bounding boxes, this function computes offsets and scales to match the source bounding boxes to the target bounding boxes. Mathematcially, given a bounding box whose center is :math:`(y, x) = p_y, p_x` and size :math:`p_h, p_w` and the target bounding box whose center is :math:`g_y, g_x` and size :math:`g_h, g_w`, the offsets and scales :math:`t_y, t_x, t_h, t_w` can be computed by the following formulas. * :math:`t_y = \\frac{(g_y - p_y)} {p_h}` * :math:`t_x = \\frac{(g_x - p_x)} {p_w}` * :math:`t_h = \\log(\\frac{g_h} {p_h})` * :math:`t_w = \\log(\\frac{g_w} {p_w})` The output is same type as the type of the inputs. The encoding formulas are used in works such as R-CNN [#]_. .. [#] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. \ Rich feature hierarchies for accurate object detection and semantic \ segmentation. CVPR 2014. Args: src_bbox (array): An image coordinate array whose shape is :math:`(R, 4)`. :math:`R` is the number of bounding boxes. These coordinates are :math:`p_{ymin}, p_{xmin}, p_{ymax}, p_{xmax}`. dst_bbox (array): An image coordinate array whose shape is :math:`(R, 4)`. These coordinates are :math:`g_{ymin}, g_{xmin}, g_{ymax}, g_{xmax}`. Returns: array: Bounding box offsets and scales from :obj:`src_bbox` \ to :obj:`dst_bbox`. \ This has shape :math:`(R, 4)`. The second axis contains four values :math:`t_y, t_x, t_h, t_w`. """ xp = cuda.get_array_module(src_bbox) height = src_bbox[:, 2] - src_bbox[:, 0] width = src_bbox[:, 3] - src_bbox[:, 1] ctr_y = src_bbox[:, 0] + 0.5 * height ctr_x = src_bbox[:, 1] + 0.5 * width base_height = dst_bbox[:, 2] - dst_bbox[:, 0] base_width = dst_bbox[:, 3] - dst_bbox[:, 1] base_ctr_y = dst_bbox[:, 0] + 0.5 * base_height base_ctr_x = dst_bbox[:, 1] + 0.5 * base_width eps = xp.finfo(height.dtype).eps height = xp.maximum(height, eps) width = xp.maximum(width, eps) dy = (base_ctr_y - ctr_y) / height dx = (base_ctr_x - ctr_x) / width dh = xp.log(base_height / height) dw = xp.log(base_width / width) loc = xp.vstack((dy, dx, dh, dw)).transpose() return loc