Naming Conventions

The following notation is used throughout.

  • \(B\) is the size of a batch.
  • \(H\) is the height of an image.
  • \(W\) is the width of an image.
  • \(C\) is the number of channels.
  • \(R\) is the total number of instances in an image.
  • \(L\) is the number of classes.

Data objects

Images


  • imgs: \((B, C, H, W)\) or \([(C, H, W)]\)
  • img: \((C, H, W)\)


Note: The full word image is used in the names of functions and classes (e.g., chainercv.utils.write_image()).
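The two forms of imgs can be illustrated with a minimal NumPy sketch. The image sizes and the float32 dtype are assumptions chosen for illustration, not requirements stated in this document.

```python
import numpy as np

# Hypothetical data: three RGB images (C=3) of identical size H=4, W=6.
same_size = [np.zeros((3, 4, 6), dtype=np.float32) for _ in range(3)]

# Equal shapes can be stacked into one batched array: imgs is (B, C, H, W).
imgs = np.stack(same_size)
print(imgs.shape)  # (3, 3, 4, 6)

# Images of different sizes cannot be stacked, so imgs stays a list: [(C, H, W)].
imgs_list = [np.zeros((3, 4, 6), dtype=np.float32),
             np.zeros((3, 5, 7), dtype=np.float32)]

# A single element of either form is an img with shape (C, H, W).
img = imgs_list[1]
print(img.shape)  # (3, 5, 7)
```

The list form is the natural choice whenever the images in a batch differ in height or width.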

Bounding boxes

  • bboxes: \((B, R, 4)\) or \([(R, 4)]\)
  • bbox: \((R, 4)\)
  • bb: \((4,)\)
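As a rough illustration of how the three names nest, the sketch below walks from bboxes down to a single bb. The coordinate values are made up, and the meaning of the four coordinates is not specified in this section.

```python
import numpy as np

# Hypothetical boxes for B=2 images with R=2 and R=1 instances.
# The list form [(R, 4)] is used because R differs between the images.
bboxes = [
    np.array([[0, 0, 10, 10], [5, 5, 20, 30]], dtype=np.float32),  # (2, 4)
    np.array([[2, 3, 8, 9]], dtype=np.float32),                    # (1, 4)
]

for bbox in bboxes:      # bbox: (R, 4), all boxes of one image
    for bb in bbox:      # bb: (4,), a single box
        assert bb.shape == (4,)

bbox = bboxes[0]
bb = bbox[1]
print(bbox.shape, bb.shape)  # (2, 4) (4,)
```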


Labels

| name | classification | detection and instance segmentation | semantic segmentation |
| --- | --- | --- | --- |
| labels | \((B,)\) | \((B, R)\) or \([(R,)]\) | \((B, H, W)\) |
| label | \(()\) | \((R,)\) | \((H, W)\) |
| l or lb | \(()\) | -- | -- |
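The label shapes for the three tasks can be sketched as follows. The sizes are hypothetical, and storing labels as int32 class indices is an assumption made for this example.

```python
import numpy as np

B, R, H, W = 2, 3, 4, 5  # hypothetical sizes

# Classification: one class index per image.
labels_cls = np.array([1, 0], dtype=np.int32)     # labels: (B,)
label_cls = labels_cls[0]                         # label: ()

# Detection / instance segmentation: one class index per instance.
labels_det = np.zeros((B, R), dtype=np.int32)     # labels: (B, R)
label_det = labels_det[0]                         # label: (R,)

# Semantic segmentation: one class index per pixel.
labels_seg = np.zeros((B, H, W), dtype=np.int32)  # labels: (B, H, W)
label_seg = labels_seg[0]                         # label: (H, W)
```

The suffixes _cls, _det, and _seg are used here only to keep the three cases apart in one script; they are not part of the naming convention.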

Scores and probabilities

A score is an unbounded confidence value. A probability, by contrast, is bounded in [0, 1] and sums to 1.

| name | classification | detection and instance segmentation | semantic segmentation |
| --- | --- | --- | --- |
| scores or probs | \((B, L)\) | \((B, R, L)\) or \([(R, L)]\) | \((B, L, H, W)\) |
| score or prob | \((L,)\) | \((R, L)\) | \((L, H, W)\) |
| sc or pb | \((L,)\) | -- | -- |


Note: An object that satisfies the definition of a probability may still be named score.
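To make the distinction concrete, the sketch below turns scores into probs with a softmax over the class axis. Softmax is only one common way to obtain probabilities from scores and is an assumption of this example; the helper function is hypothetical and not part of any library API described here.

```python
import numpy as np

def softmax(score, axis=-1):
    """Convert unbounded scores into probabilities along the given axis."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = score - score.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical classification scores for B=2 images and L=3 classes.
scores = np.array([[2.0, -1.0, 0.5],
                   [0.0, 0.0, 0.0]])  # scores: (B, L), unbounded
probs = softmax(scores, axis=1)       # probs: (B, L), each row sums to 1
```

Because probs also satisfies every property of scores, passing it to code that expects scores is consistent with the note above.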

Instance segmentations

  • masks: \((B, R, H, W)\) or \([(R, H, W)]\)
  • mask: \((R, H, W)\)
  • msk: \((H, W)\)
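A small sketch of the three mask names, assuming boolean arrays and made-up sizes (neither is prescribed by this section):

```python
import numpy as np

R, H, W = 2, 4, 4  # hypothetical sizes

# mask: (R, H, W), one binary mask per instance in a single image.
mask = np.zeros((R, H, W), dtype=bool)
mask[0, :2, :2] = True  # instance 0 covers the top-left corner
mask[1, 2:, 2:] = True  # instance 1 covers the bottom-right corner

msk = mask[0]           # msk: (H, W), the mask of one instance
masks = [mask]          # masks: [(R, H, W)], here a list holding B=1 image
```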

Attributing an additional meaning to a basic data object


  • rois: \((R', 4)\), which consists of bounding boxes for multiple images. Assuming that there are \(B\) images, each containing \(R_i\) bounding boxes, \(R' = \sum_{i} R_i\) holds.
  • roi_indices: An array of shape \((R',)\) that contains, for each bounding box, the batch index of the image to which it belongs.
  • roi: \((R, 4)\). These are the RoIs for a single image.
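The relationship between per-image boxes, rois, and roi_indices can be sketched with NumPy. The box counts and values below are hypothetical, and concatenation is just one straightforward way to build these arrays.

```python
import numpy as np

# Hypothetical per-image boxes: B=2 images with R_0=2 and R_1=3 boxes.
bboxes = [np.zeros((2, 4), dtype=np.float32),
          np.ones((3, 4), dtype=np.float32)]

# rois: (R', 4) with R' = R_0 + R_1 = 5.
rois = np.concatenate(bboxes, axis=0)

# roi_indices: (R',), the batch index of the image each RoI came from.
roi_indices = np.concatenate(
    [np.full(len(bbox), i, dtype=np.int32) for i, bbox in enumerate(bboxes)])

# roi: (R, 4), the RoIs of a single image (here image 1).
roi = rois[roi_indices == 1]
```

Keeping roi_indices alongside rois is what allows the flat \((R', 4)\) array to be split back into per-image groups.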

Attributes associated with RoIs

RoIs may have additional attributes, such as class scores and masks. These attributes are named by prepending roi_ (e.g., a scores-like object is named roi_scores).

  • roi_xs: \((R',) + x_{shape}\)
  • roi_x: \((R,) + x_{shape}\)

For example, when x is a score of shape \((L,)\), roi_xs (that is, roi_scores) has shape \((R', L)\).
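As an illustration of the roi_ prefix, the sketch below builds a roi_scores array and regroups it per image. The values, the number of classes, and the variable name roi_score_per_image are hypothetical.

```python
import numpy as np

L = 3  # hypothetical number of classes
roi_indices = np.array([0, 0, 1, 1, 1], dtype=np.int32)  # (R',) with R' = 5

# roi_scores: (R', L), one score vector of shape (L,) per RoI.
roi_scores = np.arange(5 * L, dtype=np.float32).reshape(5, L)

# Regroup into one roi_score array of shape (R, L) per image.
B = int(roi_indices.max()) + 1
roi_score_per_image = [roi_scores[roi_indices == i] for i in range(B)]
print([s.shape for s in roi_score_per_image])  # [(2, 3), (3, 3)]
```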


Note: When the batch size is 1, roi_nouns, roi_noun, and noun are equivalent, so the names may be used interchangeably.

Class-wise vs class-independent

cls_nouns is a multi-class version of nouns. For instance, cls_locs has shape \((B, R, L, 4)\), whereas locs has shape \((B, R, 4)\).


Note: cls_probs and probs can be used interchangeably when there is no risk of confusion.
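One way to see how the two shapes relate is to reduce a class-wise cls_locs to a class-independent locs by selecting one class per instance. This selection step is an assumption made for illustration; the document does not say how such arrays are derived from each other. All sizes and values are hypothetical.

```python
import numpy as np

B, R, L = 2, 3, 4  # hypothetical sizes
rng = np.random.default_rng(0)

cls_locs = rng.normal(size=(B, R, L, 4))  # one 4-vector per class
labels = rng.integers(0, L, size=(B, R))  # a chosen class per instance

# Pick, for every instance, the 4-vector of its chosen class.
index = labels[:, :, None, None]                    # (B, R, 1, 1)
locs = np.take_along_axis(cls_locs, index, axis=2)  # (B, R, 1, 4)
locs = locs[:, :, 0, :]                             # (B, R, 4)
```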

Arbitrary input

x is a variable whose shape can be inferred from the context. It should be used only when there is no ambiguity about its shape, which is usually the case when naming an input to a neural network.