Complete IoU Loss

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

解决了什么问题

为了解决以下l损失函数的问题:

\[\left\{\begin{matrix} \begin{align} IoU &: 无法解决目标框与预测框无交集的情况\\ GIoU &: 收敛速度慢\\ DIoU &: 预测框与目标框不一致 \end{align} \end{matrix}\right.\]

在物体检测中, 通常用到的损失函数是IoU(Intersection over Union)和GIoU(generalized IoU), 虽然GIoU补充了IoU在两个box不重叠时候梯度消失无法检测的情况, 但是GIoU还是存在问题(收敛速度慢) Loading...

绿色框为真实目标框, 黑色框为首次出现的预测框, 蓝色框是GIoU_Loss预测框, 红色框为DIoU_Loss预测框.

当GIoU与目标框没有交集时, 它必须扩大自己的面积与目标产生交集, 这是导致它收敛慢的主要原因.所以DIoU的提出便是为了解决这个问题. 而DIoU依然存在一个问题, 预测框的大小和目标框不一致. CIoU_Loss的提出便是为了解决这一个问题

方法

\[\begin{eqnarray} \mathcal{L}_{CIoU} & = & 1 - IoU + \frac{\rho^{2}(\textbf{b},\textbf{b}^{gt})}{c^{2}} + \alpha \upsilon\\ v & = & \frac{4}{\pi^{2}} (arctan\frac{w^{gt}}{h^{gt}} - arctan\frac{w}{h})^{2}\\ \alpha & = & \frac{v}{(1-IoU) + v} \\ IoU & = & \frac{\left | B \cap B^{gt} \right | }{\left | B \cup B^{gt} \right |} \\ B^{gt} & = & (x^{gt}, y^{gt}, w^{gt}, h^{gt}) \\ B & = & (x, y, w, h) \end{eqnarray}\]

参数:

A B C
1 \(\textbf{b}^{gt}\) 真实目标框
2 \(\textbf{b}\) 预测框(x,y,w,h)或(x1,y1,x2,y2)
3 \(\rho\) 欧氏距离
4 \(c\) 真实框与预测框的最小包围框的对角线长度
5 \(\rho(\textbf{b},\textbf{b}^{gt})\) 真实框与预测框两中心的距离

Loading...

检测算法

  1. YOLOv3
  2. SSD
  3. Faster R-CNN

数据集

  1. PASCAL VOC(VOC 2007VOC 2012)
  2. MS COCO 2017

效果

Distinguish

在GIoU和IoU这种情况下相同无法区分时, DIoU依然可以区分 Loading...

Regression error

在Fig4可以看到, 回归误差E(T,n)在DIoU上是最好的, 所有点都能处在一个低的误差范围 Loading... Loading...

On YOLOv3 with PASCAL VOC 2007

DIoU loss 可以比IoU增加3.29% AP和6.02% AP75 Loading...

On SSD with PASCAL VOC 2007

Loading...

On Faster R-CNN with MS COCO 2017

Loading...

Comparison of DIoU_NMS and original NMS

我们在阈值为[0.43,0.48]范围内比较原NMS和DIoU_NMS。从图9可以看出,DIoU_NMS在各个方面都优于原NMS 所以不需要仔细调整阈值ε也能得到表现比原NMS好的结果。 Loading...

改进方向

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

在所有IoU上添加一个α次方,由实验数据可知, \(\alpha \in [2,4]\)效果不错,且\(\alpha = 3\)时最好.

\[\begin{eqnarray} \mathcal{L}_{IoU} &=& 1 − IoU \Rightarrow \mathcal{L}_{\alpha-IoU} = 1 − IoU^{\alpha} \\ \mathcal{L}_{GIoU} &=& 1 − IoU +\frac{|C - B \cup B^{gt}|}{|C|} \Rightarrow \mathcal{L}_{\alpha -GIoU} = 1 − IoU^{\alpha} + (\frac{|C - B \cup B^{gt}|}{|C|})^{\alpha}\\ \mathcal{L}_{DIoU} &=& 1 − IoU +\frac{\rho^{2}(b, bgt)}{c^2} \Rightarrow \mathcal{L}_{\alpha-DIoU} = 1 − IoU^{\alpha} +\frac{\rho^{2\alpha }(b, b^{gt})}{c^{2\alpha} }\\ \mathcal{L}_{CIoU} &=& 1 − IoU +\frac{\rho^{2}(b, bgt)}{c^2}+ \beta v \Rightarrow \mathcal{L}_{\alpha -CIoU} = 1 − IoU^{\alpha} +\frac{\rho^{2\alpha}(b, b^{gt})}{c^{2\alpha}}+ (\beta v)^{\alpha } \end{eqnarray}\]

Code精读

def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
    # Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
    box2 = box2.T #转置后形状(4,n)

    # Get the coordinates of bounding boxes
    if x1y1x2y2:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:  # transform from xywh to xyxy
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2

    # 交集面积
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # 并集面积
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps

    iou = inter / union
    if GIoU or DIoU or CIoU:
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # convex (smallest enclosing box) width最小包围框的宽
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # convex height最小包围框的高
        if CIoU or DIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared最小包围框的对角线长度的平方
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
                    (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center distance squared两个框中心距离的平方
            if DIoU:
                return iou - rho2 / c2  # DIoU
            elif CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return iou - (rho2 / c2 + v * alpha)  # CIoU
        else:  # GIoU https://arxiv.org/pdf/1902.09630.pdf
            c_area = cw * ch + eps  # convex area
            return iou - (c_area - union) / c_area  # GIoU
    else:
        return iou  # IoU
  1. 此代码位于yolov5/utils/metrics.py, box1为预测框, box2为真实目标框.
  2. 在yolov5/utils/loss.py中, 其x1y1x2y2=False,传入的box参数为(x,y,w,h)
    iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)  # iou(prediction, target)
    lbox += (1.0 - iou).mean()  # iou loss
    
  3. 在物体识别中, 坐标系原点一般为左上角, 整个图像处于第一象限,所以x,y没有负值, clamp(0)只会识别出两个框不相交的情况

扩展

NMS

def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            l = labels[xi]
            v = torch.zeros((len(l), nc + 5), device=x.device)
            v[:, :4] = l[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(l)), l[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            print(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output

参考

  1. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression
  2. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression