Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
解决了什么问题
为了解决以下l损失函数的问题:
\[\left\{\begin{matrix} \begin{align} IoU &: 无法解决目标框与预测框无交集的情况\\ GIoU &: 收敛速度慢\\ DIoU &: 预测框与目标框不一致 \end{align} \end{matrix}\right.\]在物体检测中, 通常用到的损失函数是IoU(Intersection over Union)和GIoU(generalized IoU), 虽然GIoU补充了IoU在两个box不重叠时候梯度消失无法检测的情况, 但是GIoU还是存在问题(收敛速度慢)
绿色框为真实目标框, 黑色框为首次出现的预测框, 蓝色框是GIoU_Loss预测框, 红色框为DIoU_Loss预测框.
当GIoU与目标框没有交集时, 它必须扩大自己的面积与目标产生交集, 这是导致它收敛慢的主要原因.所以DIoU的提出便是为了解决这个问题. 而DIoU依然存在一个问题, 预测框的大小和目标框不一致. CIoU_Loss的提出便是为了解决这一个问题
方法
\[\begin{eqnarray} \mathcal{L}_{CIoU} & = & 1 - IoU + \frac{\rho^{2}(\textbf{b},\textbf{b}^{gt})}{c^{2}} + \alpha \upsilon\\ v & = & \frac{4}{\pi^{2}} (arctan\frac{w^{gt}}{h^{gt}} - arctan\frac{w}{h})^{2}\\ \alpha & = & \frac{v}{(1-IoU) + v} \\ IoU & = & \frac{\left | B \cap B^{gt} \right | }{\left | B \cup B^{gt} \right |} \\ B^{gt} & = & (x^{gt}, y^{gt}, w^{gt}, h^{gt}) \\ B & = & (x, y, w, h) \end{eqnarray}\]参数:
A | B | C |
---|---|---|
1 | \(\textbf{b}^{gt}\) | 真实目标框 |
2 | \(\textbf{b}\) | 预测框(x,y,w,h)或(x1,y1,x2,y2) |
3 | \(\rho\) | 欧氏距离 |
4 | \(c\) | 真实框与预测框的最小包围框的对角线长度 |
5 | \(\rho(\textbf{b},\textbf{b}^{gt})\) | 真实框与预测框两中心的距离 |
检测算法
数据集
- PASCAL VOC(VOC 2007和VOC 2012)
- MS COCO 2017
效果
Distinguish
在GIoU和IoU这种情况下相同无法区分时, DIoU依然可以区分
Regression error
在Fig4可以看到, 回归误差E(T,n)在DIoU上是最好的, 所有点都能处在一个低的误差范围
On YOLOv3 with PASCAL VOC 2007
DIoU loss 可以比IoU增加3.29% AP和6.02% AP75
On SSD with PASCAL VOC 2007
On Faster R-CNN with MS COCO 2017
Comparison of DIoU_NMS and original NMS
我们在阈值为[0.43,0.48]范围内比较原NMS和DIoU_NMS。从图9可以看出,DIoU_NMS在各个方面都优于原NMS 所以不需要仔细调整阈值ε也能得到表现比原NMS好的结果。
改进方向
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression
在所有IoU上添加一个α次方,由实验数据可知, \(\alpha \in [2,4]\)效果不错,且\(\alpha = 3\)时最好.
\[\begin{eqnarray} \mathcal{L}_{IoU} &=& 1 − IoU \Rightarrow \mathcal{L}_{\alpha-IoU} = 1 − IoU^{\alpha} \\ \mathcal{L}_{GIoU} &=& 1 − IoU +\frac{|C - B \cup B^{gt}|}{|C|} \Rightarrow \mathcal{L}_{\alpha -GIoU} = 1 − IoU^{\alpha} + (\frac{|C - B \cup B^{gt}|}{|C|})^{\alpha}\\ \mathcal{L}_{DIoU} &=& 1 − IoU +\frac{\rho^{2}(b, bgt)}{c^2} \Rightarrow \mathcal{L}_{\alpha-DIoU} = 1 − IoU^{\alpha} +\frac{\rho^{2\alpha }(b, b^{gt})}{c^{2\alpha} }\\ \mathcal{L}_{CIoU} &=& 1 − IoU +\frac{\rho^{2}(b, bgt)}{c^2}+ \beta v \Rightarrow \mathcal{L}_{\alpha -CIoU} = 1 − IoU^{\alpha} +\frac{\rho^{2\alpha}(b, b^{gt})}{c^{2\alpha}}+ (\beta v)^{\alpha } \end{eqnarray}\]Code精读
def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
# Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
box2 = box2.T #转置后形状(4,n)
# Get the coordinates of bounding boxes
if x1y1x2y2: # x1, y1, x2, y2 = box1
b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
else: # transform from xywh to xyxy
b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
# 交集面积
inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
(torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
# 并集面积
w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
union = w1 * h1 + w2 * h2 - inter + eps
iou = inter / union
if GIoU or DIoU or CIoU:
cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1) # convex (smallest enclosing box) width最小包围框的宽
ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1) # convex height最小包围框的高
if CIoU or DIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
c2 = cw ** 2 + ch ** 2 + eps # convex diagonal squared最小包围框的对角线长度的平方
rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
(b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4 # center distance squared两个框中心距离的平方
if DIoU:
return iou - rho2 / c2 # DIoU
elif CIoU: # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
with torch.no_grad():
alpha = v / (v - iou + (1 + eps))
return iou - (rho2 / c2 + v * alpha) # CIoU
else: # GIoU https://arxiv.org/pdf/1902.09630.pdf
c_area = cw * ch + eps # convex area
return iou - (c_area - union) / c_area # GIoU
else:
return iou # IoU
- 此代码位于yolov5/utils/metrics.py, box1为预测框, box2为真实目标框.
- 在yolov5/utils/loss.py中, 其x1y1x2y2=False,传入的box参数为(x,y,w,h)
iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True) # iou(prediction, target) lbox += (1.0 - iou).mean() # iou loss
- 在物体识别中, 坐标系原点一般为左上角, 整个图像处于第一象限,所以x,y没有负值, clamp(0)只会识别出两个框不相交的情况
扩展
NMS
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
labels=(), max_det=300):
"""Runs Non-Maximum Suppression (NMS) on inference results
Returns:
list of detections, on (n,6) tensor per image [xyxy, conf, cls]
"""
nc = prediction.shape[2] - 5 # number of classes
xc = prediction[..., 4] > conf_thres # candidates
# Checks
assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
# Settings
min_wh, max_wh = 2, 4096 # (pixels) minimum and maximum box width and height
max_nms = 30000 # maximum number of boxes into torchvision.ops.nms()
time_limit = 10.0 # seconds to quit after
redundant = True # require redundant detections
multi_label &= nc > 1 # multiple labels per box (adds 0.5ms/img)
merge = False # use merge-NMS
t = time.time()
output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
for xi, x in enumerate(prediction): # image index, image inference
# Apply constraints
# x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0 # width-height
x = x[xc[xi]] # confidence
# Cat apriori labels if autolabelling
if labels and len(labels[xi]):
l = labels[xi]
v = torch.zeros((len(l), nc + 5), device=x.device)
v[:, :4] = l[:, 1:5] # box
v[:, 4] = 1.0 # conf
v[range(len(l)), l[:, 0].long() + 5] = 1.0 # cls
x = torch.cat((x, v), 0)
# If none remain process next image
if not x.shape[0]:
continue
# Compute conf
x[:, 5:] *= x[:, 4:5] # conf = obj_conf * cls_conf
# Box (center x, center y, width, height) to (x1, y1, x2, y2)
box = xywh2xyxy(x[:, :4])
# Detections matrix nx6 (xyxy, conf, cls)
if multi_label:
i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
else: # best class only
conf, j = x[:, 5:].max(1, keepdim=True)
x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]
# Filter by class
if classes is not None:
x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]
# Apply finite constraint
# if not torch.isfinite(x).all():
# x = x[torch.isfinite(x).all(1)]
# Check shape
n = x.shape[0] # number of boxes
if not n: # no boxes
continue
elif n > max_nms: # excess boxes
x = x[x[:, 4].argsort(descending=True)[:max_nms]] # sort by confidence
# Batched NMS
c = x[:, 5:6] * (0 if agnostic else max_wh) # classes
boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores
i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS
if i.shape[0] > max_det: # limit detections
i = i[:max_det]
if merge and (1 < n < 3E3): # Merge NMS (boxes merged using weighted mean)
# update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
iou = box_iou(boxes[i], boxes) > iou_thres # iou matrix
weights = iou * scores[None] # box weights
x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True) # merged boxes
if redundant:
i = i[iou.sum(1) > 1] # require redundancy
output[xi] = x[i]
if (time.time() - t) > time_limit:
print(f'WARNING: NMS time limit {time_limit}s exceeded')
break # time limit exceeded
return output