# Evaluations

## Detection COCO

### eval_detection_coco

chainercv.evaluations.eval_detection_coco(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_areas=None, gt_crowdeds=None)

Evaluate detections based on evaluation code of MS COCO.

This function evaluates predicted bounding boxes obtained from a dataset by using average precision for each class. The code is based on the evaluation code used in MS COCO.

Parameters
• pred_bboxes (iterable of numpy.ndarray) – See the table below.

• pred_labels (iterable of numpy.ndarray) – See the table below.

• pred_scores (iterable of numpy.ndarray) – See the table below.

• gt_bboxes (iterable of numpy.ndarray) – See the table below.

• gt_labels (iterable of numpy.ndarray) – See the table below.

• gt_areas (iterable of numpy.ndarray) – See the table below. If None, some scores are not returned.

• gt_crowdeds (iterable of numpy.ndarray) – See the table below.

| name | shape | dtype | format |
| --- | --- | --- | --- |
| pred_bboxes | $$[(R, 4)]$$ | float32 | $$(y_{min}, x_{min}, y_{max}, x_{max})$$ |
| pred_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| pred_scores | $$[(R,)]$$ | float32 | |
| gt_bboxes | $$[(R, 4)]$$ | float32 | $$(y_{min}, x_{min}, y_{max}, x_{max})$$ |
| gt_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| gt_areas | $$[(R,)]$$ | float32 | |
| gt_crowdeds | $$[(R,)]$$ | bool | |

All inputs should have the same length. For a more detailed explanation of the inputs, please refer to chainercv.datasets.COCOBboxDataset.
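Raw COCO annotation files store each box as [x, y, width, height], whereas the inputs described above use $$(y_{min}, x_{min}, y_{max}, x_{max})$$. The following is a minimal conversion sketch for inputs that are built by hand from raw annotations. The helper name `coco_xywh_to_yxyx` is hypothetical and is not part of ChainerCV.

```python
def coco_xywh_to_yxyx(box):
    """Convert a COCO-style [x, y, width, height] box
    to (y_min, x_min, y_max, x_max)."""
    x, y, w, h = box
    return (y, x, y + h, x + w)


# Two illustrative boxes in raw COCO annotation order.
boxes_xywh = [[10.0, 20.0, 30.0, 40.0], [0.0, 0.0, 5.0, 8.0]]
boxes_yxyx = [coco_xywh_to_yxyx(b) for b in boxes_xywh]
print(boxes_yxyx)
```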

Returns

The keys, value types, and descriptions of the values are listed below. The APs and ARs are calculated with different IoU thresholds, object sizes, and numbers of detections per image. For more details on the 12 patterns of evaluation metrics, please refer to COCO’s official evaluation page.

| key | type | description |
| --- | --- | --- |
| ap/iou=0.50:0.95/area=all/max_dets=100 | numpy.ndarray | [1] |
| ap/iou=0.50/area=all/max_dets=100 | numpy.ndarray | [1] |
| ap/iou=0.75/area=all/max_dets=100 | numpy.ndarray | [1] |
| ap/iou=0.50:0.95/area=small/max_dets=100 | numpy.ndarray | [1] [5] |
| ap/iou=0.50:0.95/area=medium/max_dets=100 | numpy.ndarray | [1] [5] |
| ap/iou=0.50:0.95/area=large/max_dets=100 | numpy.ndarray | [1] [5] |
| ar/iou=0.50:0.95/area=all/max_dets=1 | numpy.ndarray | [2] |
| ar/iou=0.50/area=all/max_dets=10 | numpy.ndarray | [2] |
| ar/iou=0.75/area=all/max_dets=100 | numpy.ndarray | [2] |
| ar/iou=0.50:0.95/area=small/max_dets=100 | numpy.ndarray | [2] [5] |
| ar/iou=0.50:0.95/area=medium/max_dets=100 | numpy.ndarray | [2] [5] |
| ar/iou=0.50:0.95/area=large/max_dets=100 | numpy.ndarray | [2] [5] |
| map/iou=0.50:0.95/area=all/max_dets=100 | float | [3] |
| map/iou=0.50/area=all/max_dets=100 | float | [3] |
| map/iou=0.75/area=all/max_dets=100 | float | [3] |
| map/iou=0.50:0.95/area=small/max_dets=100 | float | [3] [5] |
| map/iou=0.50:0.95/area=medium/max_dets=100 | float | [3] [5] |
| map/iou=0.50:0.95/area=large/max_dets=100 | float | [3] [5] |
| mar/iou=0.50:0.95/area=all/max_dets=1 | float | [4] |
| mar/iou=0.50/area=all/max_dets=10 | float | [4] |
| mar/iou=0.75/area=all/max_dets=100 | float | [4] |
| mar/iou=0.50:0.95/area=small/max_dets=100 | float | [4] [5] |
| mar/iou=0.50:0.95/area=medium/max_dets=100 | float | [4] [5] |
| mar/iou=0.50:0.95/area=large/max_dets=100 | float | [4] [5] |
| coco_eval | pycocotools.cocoeval.COCOeval | result from pycocotools |
| existent_labels | numpy.ndarray | used labels |

Return type

dict

[1] An array of average precisions. The $$l$$-th value corresponds to the average precision for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[2] An array of average recalls. The $$l$$-th value corresponds to the average recall for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[3] The average of average precisions over classes.

[4] The average of average recalls over classes.

[5] Skipped if gt_areas is None.
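The key names above encode an IoU range and an object-size bucket. As a rough illustration of what `iou=0.50:0.95` and `area=small/medium/large` stand for under the COCO conventions (thresholds from 0.50 to 0.95 in steps of 0.05; size boundaries at $$32^2$$ and $$96^2$$ pixels), here is a small sketch. Both function names are hypothetical, and the handling of areas that fall exactly on a boundary is simplified compared with pycocotools.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds behind 'iou=0.50:0.95': 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def coco_area_bucket(area):
    """Assign an object area (in pixels) to a COCO size bucket.

    Simplification: boundary values are assigned to the larger bucket.
    """
    if area < 32 ** 2:
        return 'small'
    if area < 96 ** 2:
        return 'medium'
    return 'large'


print(coco_iou_thresholds())
print([coco_area_bucket(a) for a in (100, 5000, 10000)])
```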

## Detection VOC

### eval_detection_voc

chainercv.evaluations.eval_detection_voc(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5, use_07_metric=False)

Calculate average precisions based on evaluation code of PASCAL VOC.

This function evaluates predicted bounding boxes obtained from a dataset which has $$N$$ images by using average precision for each class. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters
• pred_bboxes (iterable of numpy.ndarray) – See the table below.

• pred_labels (iterable of numpy.ndarray) – See the table below.

• pred_scores (iterable of numpy.ndarray) – See the table below.

• gt_bboxes (iterable of numpy.ndarray) – See the table below.

• gt_labels (iterable of numpy.ndarray) – See the table below.

• gt_difficults (iterable of numpy.ndarray) – See the table below. By default, this is None. In that case, this function considers all bounding boxes to be not difficult.

• iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.

• use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

| name | shape | dtype | format |
| --- | --- | --- | --- |
| pred_bboxes | $$[(R, 4)]$$ | float32 | $$(y_{min}, x_{min}, y_{max}, x_{max})$$ |
| pred_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| pred_scores | $$[(R,)]$$ | float32 | |
| gt_bboxes | $$[(R, 4)]$$ | float32 | $$(y_{min}, x_{min}, y_{max}, x_{max})$$ |
| gt_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| gt_difficults | $$[(R,)]$$ | bool | |
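The iou_thresh parameter is compared against the Intersection over Union of a predicted box and a ground truth box. A minimal, dependency-free sketch of that quantity for boxes in the $$(y_{min}, x_{min}, y_{max}, x_{max})$$ format follows. The function name `bbox_iou` is chosen for illustration, and the sketch handles a single pair of boxes rather than whole arrays.

```python
def bbox_iou(box_a, box_b):
    """IoU of two boxes given as (y_min, x_min, y_max, x_max)."""
    # Corners of the intersection rectangle.
    y_min = max(box_a[0], box_b[0])
    x_min = max(box_a[1], box_b[1])
    y_max = min(box_a[2], box_b[2])
    x_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, y_max - y_min) * max(0.0, x_max - x_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(bbox_iou((0, 0, 10, 10), (0, 5, 10, 15)))
```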

Returns

The keys, value-types and the description of the values are listed below.

• ap (numpy.ndarray): An array of average precisions. The $$l$$-th value corresponds to the average precision for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

• map (float): The average of Average Precisions over classes.

Return type

dict
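Because ap may contain numpy.nan for classes that do not appear, averaging it into map needs some care. The sketch below assumes that NaN entries are skipped, in the manner of numpy.nanmean; that behavior is an assumption here, not something stated above. The helper name `nan_mean` is hypothetical.

```python
import math


def nan_mean(values):
    """Mean of the values, skipping NaN entries (assumed behavior)."""
    valid = [v for v in values if not math.isnan(v)]
    # With no valid entry at all, the mean itself is undefined.
    return sum(valid) / len(valid) if valid else float('nan')


# Class 1 is absent from the data, so its AP is NaN.
ap = [0.8, float('nan'), 0.6]
print(nan_mean(ap))
```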

### calc_detection_voc_ap

chainercv.evaluations.calc_detection_voc_ap(prec, rec, use_07_metric=False)

Calculate average precisions based on evaluation code of PASCAL VOC.

This function calculates average precisions from given precisions and recalls. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters
• prec (list of numpy.array) – A list of arrays. prec[l] indicates precision for class $$l$$. If prec[l] is None, this function returns numpy.nan for class $$l$$.

• rec (list of numpy.array) – A list of arrays. rec[l] indicates recall for class $$l$$. If rec[l] is None, this function returns numpy.nan for class $$l$$.

• use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

Returns

This function returns an array of average precisions. The $$l$$-th value corresponds to the average precision for class $$l$$. If prec[l] or rec[l] is None, the corresponding value is set to numpy.nan.

Return type

ndarray
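To make the two settings of use_07_metric concrete, here is a simplified single-class sketch of the usual PASCAL VOC computations: 11-point interpolation for VOC 2007, and the area under the monotonically decreasing precision envelope otherwise. It is an illustration written in plain Python, not the library's implementation, and it assumes that prec contains no NaN values. The name `voc_ap` is hypothetical.

```python
def voc_ap(prec, rec, use_07_metric=False):
    """Average precision for one class from precision/recall lists."""
    if prec is None or rec is None:
        return float('nan')
    if use_07_metric:
        # VOC 2007: average the best precision reachable at
        # recall >= 0.0, 0.1, ..., 1.0 (11 points).
        ap = 0.0
        for i in range(11):
            t = i / 10
            reachable = [p for p, r in zip(prec, rec) if r >= t]
            ap += max(reachable, default=0.0) / 11
        return ap
    # Otherwise: pad the curve, make precision non-increasing from
    # right to left, then sum the area of each recall step.
    mpre = [0.0] + list(prec) + [0.0]
    mrec = [0.0] + list(rec) + [1.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i]
               for i in range(1, len(mrec)))


prec = [1.0, 0.5, 2 / 3]
rec = [0.5, 0.5, 1.0]
print(voc_ap(prec, rec), voc_ap(prec, rec, use_07_metric=True))
```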

### calc_detection_voc_prec_rec

chainercv.evaluations.calc_detection_voc_prec_rec(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5)

Calculate precision and recall based on evaluation code of PASCAL VOC.

This function calculates precision and recall of predicted bounding boxes obtained from a dataset which has $$N$$ images. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters
Returns

This function returns two lists: prec and rec.

• prec: A list of arrays. prec[l] is precision for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, prec[l] is set to None.

• rec: A list of arrays. rec[l] is recall for class $$l$$. If gt_labels contains no instance of class $$l$$ that is not marked as difficult, rec[l] is set to None.

Return type

tuple of two lists
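The precision and recall arrays for one class can be pictured as running ratios over the detections sorted by descending score. The sketch below assumes that each detection has already been matched against the ground truth (1 for a true positive, 0 for a false positive, -1 for a detection to be ignored, for example one matched to a difficult box). The matching step itself is not shown, and the function and argument names are hypothetical.

```python
def voc_prec_rec(scores, matches, n_pos):
    """Running precision and recall for a single class.

    scores:  confidence of each detection.
    matches: 1 = true positive, 0 = false positive, -1 = ignored.
    n_pos:   number of non-difficult ground truth boxes of the class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prec, rec = [], []
    for i in order:
        if matches[i] == 1:
            tp += 1
        elif matches[i] == 0:
            fp += 1
        # Precision is undefined until a non-ignored detection is seen.
        prec.append(tp / (tp + fp) if tp + fp else float('nan'))
        if n_pos > 0:
            rec.append(tp / n_pos)
    # Without any positive ground truth, recall is undefined.
    return prec, (rec if n_pos > 0 else None)


prec, rec = voc_prec_rec([0.7, 0.9, 0.8], [1, 1, 0], n_pos=2)
print(prec, rec)
```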

## Instance Segmentation COCO

### eval_instance_segmentation_coco

chainercv.evaluations.eval_instance_segmentation_coco(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, gt_areas=None, gt_crowdeds=None)

Evaluate instance segmentations based on evaluation code of MS COCO.

This function evaluates predicted instance segmentations obtained from a dataset by using average precision for each class. The code is based on the evaluation code used in MS COCO.

Parameters
• pred_masks (iterable of numpy.ndarray) – See the table below.

• pred_labels (iterable of numpy.ndarray) – See the table below.

• pred_scores (iterable of numpy.ndarray) – See the table below.

• gt_masks (iterable of numpy.ndarray) – See the table below.

• gt_labels (iterable of numpy.ndarray) – See the table below.

• gt_areas (iterable of numpy.ndarray) – See the table below. If None, some scores are not returned.

• gt_crowdeds (iterable of numpy.ndarray) – See the table below.

| name | shape | dtype | format |
| --- | --- | --- | --- |
| pred_masks | $$[(R, H, W)]$$ | bool | |
| pred_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| pred_scores | $$[(R,)]$$ | float32 | |
| gt_masks | $$[(R, H, W)]$$ | bool | |
| gt_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| gt_areas | $$[(R,)]$$ | float32 | |
| gt_crowdeds | $$[(R,)]$$ | bool | |

All inputs should have the same length. For a more detailed explanation of the inputs, please refer to chainercv.datasets.COCOInstanceSegmentationDataset.

Returns

The keys, value types, and descriptions of the values are listed below. The APs and ARs are calculated with different IoU thresholds, object sizes, and numbers of detections per image. For more details on the 12 patterns of evaluation metrics, please refer to COCO’s official evaluation page.

| key | type | description |
| --- | --- | --- |
| ap/iou=0.50:0.95/area=all/max_dets=100 | numpy.ndarray | [6] |
| ap/iou=0.50/area=all/max_dets=100 | numpy.ndarray | [6] |
| ap/iou=0.75/area=all/max_dets=100 | numpy.ndarray | [6] |
| ap/iou=0.50:0.95/area=small/max_dets=100 | numpy.ndarray | [6] [10] |
| ap/iou=0.50:0.95/area=medium/max_dets=100 | numpy.ndarray | [6] [10] |
| ap/iou=0.50:0.95/area=large/max_dets=100 | numpy.ndarray | [6] [10] |
| ar/iou=0.50:0.95/area=all/max_dets=1 | numpy.ndarray | [7] |
| ar/iou=0.50/area=all/max_dets=10 | numpy.ndarray | [7] |
| ar/iou=0.75/area=all/max_dets=100 | numpy.ndarray | [7] |
| ar/iou=0.50:0.95/area=small/max_dets=100 | numpy.ndarray | [7] [10] |
| ar/iou=0.50:0.95/area=medium/max_dets=100 | numpy.ndarray | [7] [10] |
| ar/iou=0.50:0.95/area=large/max_dets=100 | numpy.ndarray | [7] [10] |
| map/iou=0.50:0.95/area=all/max_dets=100 | float | [8] |
| map/iou=0.50/area=all/max_dets=100 | float | [8] |
| map/iou=0.75/area=all/max_dets=100 | float | [8] |
| map/iou=0.50:0.95/area=small/max_dets=100 | float | [8] [10] |
| map/iou=0.50:0.95/area=medium/max_dets=100 | float | [8] [10] |
| map/iou=0.50:0.95/area=large/max_dets=100 | float | [8] [10] |
| mar/iou=0.50:0.95/area=all/max_dets=1 | float | [9] |
| mar/iou=0.50/area=all/max_dets=10 | float | [9] |
| mar/iou=0.75/area=all/max_dets=100 | float | [9] |
| mar/iou=0.50:0.95/area=small/max_dets=100 | float | [9] [10] |
| mar/iou=0.50:0.95/area=medium/max_dets=100 | float | [9] [10] |
| mar/iou=0.50:0.95/area=large/max_dets=100 | float | [9] [10] |
| coco_eval | pycocotools.cocoeval.COCOeval | result from pycocotools |
| existent_labels | numpy.ndarray | used labels |

Return type

dict

[6] An array of average precisions. The $$l$$-th value corresponds to the average precision for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[7] An array of average recalls. The $$l$$-th value corresponds to the average recall for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[8] The average of average precisions over classes.

[9] The average of average recalls over classes.

[10] Skipped if gt_areas is None.
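Every metric key in the returned dict follows the same `metric/iou=.../area=.../max_dets=...` pattern, so it can be split into its components when results need to be filtered or tabulated. The sketch below shows one possible way to do this. The helper name `parse_metric_key` is hypothetical and is not part of ChainerCV; the keys coco_eval and existent_labels do not follow the pattern and should not be passed to it.

```python
def parse_metric_key(key):
    """Split a key such as 'map/iou=0.50:0.95/area=small/max_dets=100'
    into its metric name and settings (all kept as strings)."""
    metric, *fields = key.split('/')
    parsed = {'metric': metric}
    for field in fields:
        name, value = field.split('=', 1)
        parsed[name] = value
    return parsed


key = 'map/iou=0.50:0.95/area=small/max_dets=100'
print(parse_metric_key(key))
```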

## Instance Segmentation VOC

### eval_instance_segmentation_voc

chainercv.evaluations.eval_instance_segmentation_voc(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, iou_thresh=0.5, use_07_metric=False)

Calculate average precisions based on evaluation code of PASCAL VOC.

This function evaluates predicted masks obtained from a dataset which has $$N$$ images by using average precision for each class. The code is based on the evaluation code used in FCIS.

Parameters
• pred_masks (iterable of numpy.ndarray) – See the table below.

• pred_labels (iterable of numpy.ndarray) – See the table below.

• pred_scores (iterable of numpy.ndarray) – See the table below.

• gt_masks (iterable of numpy.ndarray) – See the table below.

• gt_labels (iterable of numpy.ndarray) – See the table below.

• iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.

• use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

| name | shape | dtype | format |
| --- | --- | --- | --- |
| pred_masks | $$[(R, H, W)]$$ | bool | |
| pred_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |
| pred_scores | $$[(R,)]$$ | float32 | |
| gt_masks | $$[(R, H, W)]$$ | bool | |
| gt_labels | $$[(R,)]$$ | int32 | $$[0, \#fg\_class - 1]$$ |

Returns

The keys, value-types and the description of the values are listed below.

• ap (numpy.ndarray): An array of average precisions. The $$l$$-th value corresponds to the average precision for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

• map (float): The average of Average Precisions over classes.

Return type

dict

### calc_instance_segmentation_voc_prec_rec

chainercv.evaluations.calc_instance_segmentation_voc_prec_rec(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, iou_thresh)

Calculate precision and recall based on evaluation code of PASCAL VOC.

This function calculates precision and recall of predicted masks obtained from a dataset which has $$N$$ images. The code is based on the evaluation code used in FCIS.

Parameters
• pred_masks (iterable of numpy.ndarray) – An iterable of $$N$$ sets of masks. Its index corresponds to an index for the base dataset. Each element of pred_masks is an object mask and is an array whose shape is $$(R, H, W)$$, where $$R$$ corresponds to the number of masks, which may vary among images.

• pred_labels (iterable of numpy.ndarray) – An iterable of labels. Similar to pred_masks, its index corresponds to an index for the base dataset. Its length is $$N$$.

• pred_scores (iterable of numpy.ndarray) – An iterable of confidence scores for predicted masks. Similar to pred_masks, its index corresponds to an index for the base dataset. Its length is $$N$$.

• gt_masks (iterable of numpy.ndarray) – An iterable of ground truth masks whose length is $$N$$. An element of gt_masks is an object mask whose shape is $$(R, H, W)$$. Note that the number of masks $$R$$ in each image does not need to be same as the number of corresponding predicted masks.

• gt_labels (iterable of numpy.ndarray) – An iterable of ground truth labels which are organized similarly to gt_masks. Its length is $$N$$.

• iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.

Returns

This function returns two lists: prec and rec.

• prec: A list of arrays. prec[l] is precision for class $$l$$. If class $$l$$ does not exist in either pred_labels or gt_labels, prec[l] is set to None.

• rec: A list of arrays. rec[l] is recall for class $$l$$. If gt_labels contains no instance of class $$l$$ that is not marked as difficult, rec[l] is set to None.

Return type

tuple of two lists
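For masks, the Intersection over Union that iou_thresh is compared against is computed per pixel rather than from box corners. A minimal sketch for a single pair of $$(H, W)$$ boolean masks, represented here as nested lists to stay dependency-free, is shown below. The function name `mask_iou` is chosen for illustration.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two boolean masks of identical shape (H x W nested lists)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += bool(a and b)  # pixel set in both masks
            union += bool(a or b)   # pixel set in at least one mask
    return inter / union if union else 0.0


mask_a = [[True, True, False],
          [True, True, False]]
mask_b = [[False, True, True],
          [False, True, True]]
print(mask_iou(mask_a, mask_b))
```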

## Semantic Segmentation IoU

### eval_semantic_segmentation

chainercv.evaluations.eval_semantic_segmentation(pred_labels, gt_labels)

Evaluate metrics used in Semantic Segmentation.

This function calculates Intersection over Union (IoU), Pixel Accuracy and Class Accuracy for the task of semantic segmentation.

The metrics calculated by this function are defined as follows, where $$N_{ij}$$ is the number of pixels that are labeled as class $$i$$ by the ground truth and class $$j$$ by the prediction, and $$k$$ is the number of classes.

• $$\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}$$

• $$\text{mIoU} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}$$

• $$\text{Pixel Accuracy} = \frac {\sum_{i=1}^k N_{ii}} {\sum_{i=1}^k \sum_{j=1}^k N_{ij}}$$

• $$\text{Class Accuracy of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}$$

• $$\text{Mean Class Accuracy} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}$$

A more detailed description of the above metrics can be found in a review on semantic segmentation [11].

The number of classes $$n\_class$$ is $$max(pred\_labels, gt\_labels) + 1$$, that is, the largest class id in the inputs plus one.

[11] Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017.

Parameters
• pred_labels (iterable of numpy.ndarray) – See the table below.

• gt_labels (iterable of numpy.ndarray) – See the table below.

| name | shape | dtype | format |
| --- | --- | --- | --- |
| pred_labels | $$[(H, W)]$$ | int32 | $$[0, \#class - 1]$$ |
| gt_labels | $$[(H, W)]$$ | int32 | $$[-1, \#class - 1]$$ |

Returns

The keys, value-types and the description of the values are listed below.

• iou (numpy.ndarray): An array of IoUs for the $$n\_class$$ classes. Its shape is $$(n\_class,)$$.

• miou (float): The average of IoUs over classes.

• pixel_accuracy (float): The computed pixel accuracy.

• class_accuracy (numpy.ndarray): An array of class accuracies for the $$n\_class$$ classes. Its shape is $$(n\_class,)$$.

• mean_class_accuracy (float): The average of class accuracies.

Return type

dict
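The accuracy metrics defined for this function can all be derived from a confusion matrix whose $$(i, j)$$ entry is $$N_{ij}$$. The sketch below computes pixel accuracy, class accuracy, and mean class accuracy from a small matrix given as nested lists. The function name is hypothetical, and skipping undefined (NaN) class accuracies when averaging is an assumption of this sketch.

```python
def pixel_and_class_accuracy(confusion):
    """confusion[i][j]: pixels with ground truth class i predicted as j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    pixel_accuracy = correct / total
    # Per-class accuracy: correct pixels over ground truth pixels of the class.
    class_accuracy = [
        confusion[i][i] / sum(confusion[i]) if sum(confusion[i]) else float('nan')
        for i in range(k)
    ]
    valid = [a for a in class_accuracy if a == a]  # drop NaN entries
    mean_class_accuracy = sum(valid) / len(valid)
    return pixel_accuracy, class_accuracy, mean_class_accuracy


confusion = [[3, 1],
             [2, 4]]
print(pixel_and_class_accuracy(confusion))
```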

### calc_semantic_segmentation_confusion

chainercv.evaluations.calc_semantic_segmentation_confusion(pred_labels, gt_labels)

Collect a confusion matrix.

The number of classes $$n\_class$$ is $$max(pred\_labels, gt\_labels) + 1$$, that is, the largest class id in the inputs plus one.

Parameters
Returns

A confusion matrix. Its shape is $$(n\_class, n\_class)$$. The $$(i, j)$$-th element corresponds to the number of pixels that are labeled as class $$i$$ by the ground truth and class $$j$$ by the prediction.

Return type

numpy.ndarray
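As an illustration of how such a matrix is accumulated, the sketch below counts (ground truth, prediction) pairs over all pixels of all images, using nested lists in place of arrays. It assumes that pixels whose ground truth label is -1 are ignored; that treatment is an assumption of this sketch. The function name `confusion_matrix` is hypothetical.

```python
def confusion_matrix(pred_labels, gt_labels):
    """pred_labels, gt_labels: iterables of H x W nested lists of class ids."""
    pairs = []
    n_class = 0
    for pred, gt in zip(pred_labels, gt_labels):
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                # n_class is the largest class id seen, plus one.
                n_class = max(n_class, p + 1, g + 1)
                if g >= 0:  # assumption: -1 marks pixels to ignore
                    pairs.append((g, p))
    confusion = [[0] * n_class for _ in range(n_class)]
    for g, p in pairs:
        confusion[g][p] += 1
    return confusion


pred = [[[0, 1], [1, 1]]]   # one 2 x 2 image
gt = [[[0, 1], [0, -1]]]
print(confusion_matrix(pred, gt))
```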

### calc_semantic_segmentation_iou

chainercv.evaluations.calc_semantic_segmentation_iou(confusion)

Calculate Intersection over Union with a given confusion matrix.

The definition of Intersection over Union (IoU) is as follows, where $$N_{ij}$$ is the number of pixels that are labeled as class $$i$$ by the ground truth and class $$j$$ by the prediction, and $$k$$ is the number of classes.

• $$\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}$$

Parameters

confusion (numpy.ndarray) – A confusion matrix. Its shape is $$(n\_class, n\_class)$$. The $$(i, j)$$-th element corresponds to the number of pixels that are labeled as class $$i$$ by the ground truth and class $$j$$ by the prediction.

Returns

An array of IoUs for the $$n\_class$$ classes. Its shape is $$(n\_class,)$$.

Return type

numpy.ndarray
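The IoU formula above translates directly into row and column sums of the confusion matrix. A plain Python sketch follows, with the matrix given as nested lists. The function name is hypothetical, and returning NaN for a class with an empty union is a choice made for this sketch.

```python
def iou_from_confusion(confusion):
    """Per-class IoU; confusion[i][j] counts ground truth i predicted as j."""
    k = len(confusion)
    ious = []
    for i in range(k):
        gt_pixels = sum(confusion[i])                        # sum_j N_ij
        pred_pixels = sum(confusion[j][i] for j in range(k))  # sum_j N_ji
        union = gt_pixels + pred_pixels - confusion[i][i]
        ious.append(confusion[i][i] / union if union else float('nan'))
    return ious


confusion = [[3, 1],
             [2, 4]]
print(iou_from_confusion(confusion))
```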