I recently had an interesting idea for a mobile app.
Overview: camera quality is a major selling point when phones are marketed, but the competition is usually over who uses the more advanced hardware, while the consumer pain point is holding a great phone and still not getting great photos.
Thinking: an AI assistant that helps consumers take good photos would have a large market.
Existing competitors: current photo-assistant apps mostly optimize the picture after the fact, for example with AI color adjustment.
Concrete idea: pin travelers' check-in photos to a live map (a photo only appears when its weather and shooting time match the current user's conditions). The user picks a photo they like, navigates to that spot, and shoots from the same angle as the chosen reference, which directly solves the "can't compose, can't control the light" problem; adding existing, mature AI color grading on top then yields a polished photo.
Advantages: it helps consumers take photos at the classic check-in spots of every scenic area that exceed their own eye for composition; if it became a built-in phone feature, it could reduce the hardware arms race around cameras and cut cost.
Difficulties: it needs to crawl good photos from the web and align them with the map, and it needs control over the camera's focusing hardware (with the photographer's consent).
#------------------------# # darknet #------------------------# class Darknet(nn.Cell): """ Darknet for yolox-darknet53 """ # number of block from dark2 to dark5. depth2block = {21: [1, 2, 2, 1], 53: [2, 8, 8, 4]} def __init__( self, depth, in_channels=3, stem_out_channels=32, out_features=("dark3", "dark4", "dark5"), ): """ Args: depth (int): depth of darknet used in model, usually use [21, 53] for this param. in_channels (int): number of input channels, for example, use 3 for RGB image. stem_out_channels (int): number of output channels of darknet stem. It decides channels of darknet layer2 to layer5. out_features (Tuple[str]): desired output layer name. """ super(Darknet, self).__init__() assert out_features, "please provide output features of Darknet" self.out_features = out_features self.stem = nn.SequentialCell( BaseConv(in_channels=in_channels, out_channels=stem_out_channels, ksize=3, stride=1, act="lrelu"), *self.make_group_layer(stem_out_channels, num_blocks=1, stride=2), ) in_channels = stem_out_channels * 2 num_blocks = Darknet.depth2block[depth] # create darknet with `stem_out_channels` and `num_blocks` layers. # to make model structure more clear, we don't use `for` statement in python. self.dark2 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[0], stride=2) ) in_channels *= 2 # 128 self.dark3 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[1], stride=2) ) in_channels *= 2 # 256 self.dark4 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[2], stride=2) ) in_channels *= 2 # 512 self.dark5 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[3], stride=2), *self.make_spp_block([in_channels, in_channels * 2], in_channels * 2), ) def make_group_layer(self, in_channels: int, num_blocks: int, stride: int = 1): "starts with conv layer then has `num_blocks` `ResLayer`" return [ BaseConv(in_channels, in_channels * 2, ksize=3, stride=stride, act="lrelu"), *[(ResLayer(in_channels * 2)) for _ in range(num_blocks)], ] def make_spp_block(self, filters_list, in_filters): """ spatial pyramid pooling block""" m = nn.SequentialCell( *[ BaseConv(in_filters, filters_list[0], 1, stride=1, act="lrelu"), BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"), SPPBottleneck( in_channels=filters_list[1], out_channels=filters_list[0], activation="lrelu", ), BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"), BaseConv(filters_list[1], filters_list[0], 1, stride=1, act="lrelu"), ] ) return m def construct(self, x): """ forward """ outputs = {} x = self.stem(x) outputs["stem"] = x x = self.dark2(x) outputs["dark2"] = x x = self.dark3(x) outputs["dark3"] = x x = self.dark4(x) outputs["dark4"] = x x = self.dark5(x) outputs["dark5"] = x return outputs["dark3"], outputs["dark4"], outputs["dark5"] class CSPDarknet(nn.Cell): """ Darknet with CSP block for yolox-s m l x""" def __init__( self, dep_mul, wid_mul, out_features=("dark3", "dark4", "dark5"), depthwise=False, act="silu" ): super(CSPDarknet, self).__init__() assert out_features, "please provide output features of Darknet" self.out_features = out_features Conv = DWConv if depthwise else BaseConv base_channels = int(wid_mul * 64) base_depth = max(round(dep_mul * 3), 1) # stem self.stem = Focus(3, base_channels, ksize=3, act=act) # dark2 self.dark2 = nn.SequentialCell( Conv(base_channels, base_channels * 2, 3, 2, act=act), CSPLayer( base_channels * 2, 
base_channels * 2, n=base_depth, depthwise=depthwise, act=act, ), ) # dark3 self.dark3 = nn.SequentialCell( Conv(base_channels * 2, base_channels * 4, 3, 2, act=act), CSPLayer( base_channels * 4, base_channels * 4, n=base_depth * 3, depthwise=depthwise, act=act, ), ) # dark4 self.dark4 = nn.SequentialCell( Conv(base_channels * 4, base_channels * 8, 3, 2, act=act), CSPLayer( base_channels * 8, base_channels * 8, n=base_depth * 3, depthwise=depthwise, act=act, ), ) # dark5 self.dark5 = nn.SequentialCell( Conv(base_channels * 8, base_channels * 16, 3, 2, act=act), SPPBottleneck(base_channels * 16, base_channels * 16, activation=act), CSPLayer( base_channels * 16, base_channels * 16, n=base_depth, shortcut=False, depthwise=depthwise, act=act, ), ) def construct(self, x): """ forward """ outputs = {} x = self.stem(x) outputs["stem"] = x x = self.dark2(x) outputs["dark2"] = x x = self.dark3(x) outputs["dark3"] = x x = self.dark4(x) outputs["dark4"] = x x = self.dark5(x) outputs["dark5"] = x return outputs["dark3"], outputs["dark4"], outputs["dark5"] 3.6.3 backbon+neck 两种结构,如下图所示: YOLOFPN,采用Darknet为backbone,使用yolov3 baseline的Neck结构,都采用FPN结构进行融合 YOLOPAFPN, 在FPN基础上引入PAN结构 #------------------------# # YOLOFPN #------------------------# class YOLOFPN(nn.Cell): """ YOLOFPN module, Darknet53 is the default backbone of this model """ def __init__(self, input_w, input_h, depth=53, in_features=None): super(YOLOFPN, self).__init__() if in_features is None: in_features = ["dark3", "dark4", "dark5"] self.backbone = Darknet(depth) self.in_features = in_features # out 1 self.out1_cbl = self._make_cbl(512, 256, 1) self.out1 = self._make_embedding([256, 512], 512 + 256) # out 2 self.out2_cbl = self._make_cbl(256, 128, 1) self.out2 = self._make_embedding([128, 256], 256 + 128) # upsample self.upsample0 = P.ResizeNearestNeighbor((input_h // 16, input_w // 16)) self.upsample1 = P.ResizeNearestNeighbor((input_h // 8, input_w // 8)) def _make_cbl(self, _in, _out, ks): """ make cbl layer """ return BaseConv(_in, _out, ks, stride=1, act="lrelu") def _make_embedding(self, filters_list, in_filters): """ make embedding """ m = nn.SequentialCell( *[ self._make_cbl(in_filters, filters_list[0], 1), self._make_cbl(filters_list[0], filters_list[1], 3), self._make_cbl(filters_list[1], filters_list[0], 1), self._make_cbl(filters_list[0], filters_list[1], 3), self._make_cbl(filters_list[1], filters_list[0], 1), ] ) return m def construct(self, inputs): """ forward """ out_features = self.backbone(inputs) x2, x1, x0 = out_features # yolo branch 1 x1_in = self.out1_cbl(x0) x1_in = self.upsample0(x1_in) x1_in = P.Concat(axis=1)([x1_in, x1]) out_dark4 = self.out1(x1_in) # yolo branch 2 x2_in = self.out2_cbl(out_dark4) x2_in = self.upsample1(x2_in) x2_in = P.Concat(axis=1)([x2_in, x2]) out_dark3 = self.out2(x2_in) outputs = (out_dark3, out_dark4, x0) return outputs #------------------------# # YOLOPAFPN #------------------------# class YOLOPAFPN(nn.Cell): """ YOLOv3 model. 
Darknet 53 is the default backbone of this model """ def __init__( self, input_w, input_h, depth=1.0, width=1.0, in_features=("dark3", "dark4", "dark5"), in_channels=None, depthwise=False, act="silu" ): super(YOLOPAFPN, self).__init__() if in_channels is None: in_channels = [256, 512, 1024] self.input_w = input_w self.input_h = input_h self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act) self.in_features = in_features self.in_channels = in_channels Conv = DWConv if depthwise else BaseConv self.upsample0 = P.ResizeNearestNeighbor((input_h // 16, input_w // 16)) self.upsample1 = P.ResizeNearestNeighbor((input_h // 8, input_w // 8)) self.lateral_conv0 = BaseConv(int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act) self.C3_p4 = CSPLayer( int(2 * in_channels[1] * width), int(in_channels[1] * width), round(3 * depth), False, depthwise=depthwise, act=act ) self.reduce_conv1 = BaseConv( int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act ) self.C3_p3 = CSPLayer( int(2 * in_channels[0] * width), int(in_channels[0] * width), round(3 * depth), False, depthwise=depthwise, act=act, ) # bottom-up conv self.bu_conv2 = Conv( int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act ) self.C3_n3 = CSPLayer( int(2 * in_channels[0] * width), int(in_channels[1] * width), round(3 * depth), False, depthwise=depthwise, act=act, ) # bottom-up conv self.bu_conv1 = Conv( int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act ) self.C3_n4 = CSPLayer( int(2 * in_channels[1] * width), int(in_channels[2] * width), round(3 * depth), False, depthwise=depthwise, act=act, ) self.concat = P.Concat(axis=1) def construct(self, inputs): """ Args: inputs: input images. Returns: Tuple[Tensor]: FPN feature. 
""" x2, x1, x0 = self.backbone(inputs) fpn_out0 = self.lateral_conv0(x0) # 1024->512 /32 f_out0 = self.upsample0(fpn_out0) # 512 /16 f_out0 = self.concat((f_out0, x1)) # 512->1024 /16 f_out0 = self.C3_p4(f_out0) # 1024->512 /16 fpn_out1 = self.reduce_conv1(f_out0) # 512->256 /16 f_out1 = self.upsample1(fpn_out1) # 256 /8 f_out1 = self.concat((f_out1, x2)) # 256->512 /8 pan_out2 = self.C3_p3(f_out1) # 512->256 /16 p_out1 = self.bu_conv2(pan_out2) # 256->256 /16 p_out1 = self.concat((p_out1, fpn_out1)) # 256->512 /16 pan_out1 = self.C3_n3(p_out1) # 512->512/16 p_out0 = self.bu_conv1(pan_out1) # 512->512/32 p_out0 = self.concat((p_out0, fpn_out0)) # 512->1024/32 pan_out0 = self.C3_n4(p_out0) # 1024->1024/32 return pan_out2, pan_out1, pan_out0 3.7 bbox iou计算相关 #------------------------# # bbox iou #------------------------# @constexpr def raise_bbox_error(): raise IndexError("Index error, shape of input must be 4!") def bboxes_iou(bboxes_a, bboxes_b, xyxy=True): """ calculate iou Args: bboxes_a: bboxes_b: xyxy: Returns: """ if bboxes_a.shape[1] != 4 or bboxes_b.shape[1] != 4: raise_bbox_error() if xyxy: tl = P.Maximum()(bboxes_a[:, None, :2], bboxes_b[:, :2]) br = P.Minimum()(bboxes_a[:, None, 2:], bboxes_b[:, 2:]) area_a = bboxes_a[:, 2:] - bboxes_a[:, :2] area_a = (area_a[:, 0:1] * area_a[:, 1:2]).squeeze(-1) area_b = bboxes_b[:, 2:] - bboxes_b[:, :2] area_b = (area_b[:, 0:1] * area_b[:, 1:2]).squeeze(-1) else: tl = P.Maximum()( (bboxes_a[:, None, :2] - bboxes_a[:, None, 2:] / 2), (bboxes_b[:, :2] - bboxes_b[:, 2:] / 2), ) br = P.Minimum()( (bboxes_a[:, None, :2] + bboxes_a[:, None, 2:] / 2), (bboxes_b[:, :2] + bboxes_b[:, 2:] / 2), ) area_a = (bboxes_a[:, 2:3] * bboxes_a[:, 3:4]).squeeze(-1) area_b = (bboxes_b[:, 2:3] * bboxes_b[:, 3:4]).squeeze(-1) en = (tl < br).astype(tl.dtype) en = (en[..., 0:1] * en[..., 1:2]).squeeze(-1) area_i = tl - br area_i = (area_i[:, :, 0:1] * area_i[:, :, 1:2]).squeeze(-1) * en return area_i / (area_a[:, None] + area_b - area_i) def batch_bboxes_iou(batch_bboxes_a, batch_bboxes_b, xyxy=True): """ calculate iou for one batch Args: batch_bboxes_a: batch_bboxes_b: xyxy: Returns: """ if batch_bboxes_a.shape[-1] != 4 or batch_bboxes_b.shape[-1] != 4: raise_bbox_error() ious = [] for i in range(len(batch_bboxes_a)): if xyxy: iou = bboxes_iou(batch_bboxes_a[i], batch_bboxes_b[i], True) else: iou = bboxes_iou(batch_bboxes_a[i], batch_bboxes_b[i], False) iou = P.ExpandDims()(iou, 0) ious.append(iou) ious = P.Concat(axis=0)(ious) return ious 3.8 模型、Loss相关 DetectionBlock为完整的yolox结构,用于声明后续训练声明网络结构 yololoss ema指数移动平均,对模型权重进行加权平均,使其更加鲁棒 3.8.1 网络损失函数 和网络的预测结果一样,YOLOX网络的损失函数也由三个部分组成,分别是Reg部分、Obj部分和Cls部分。Reg部分是特征点的回归参数判断,Obj部分是特征点是否包含物体判断,Cls部分是特征点包含的物体的种类。 在YoloX中,物体的真实框落在哪些特征点内就由该特征点来预测。 对于每一个真实框需要求取所有特征点与它的空间位置情况,作为正样本的特征点需要满足以下几个特点: 1)特征点落在物体的真实框内; 2)特征点距离物体中心尽量要在一定半径内。 满足这两点保证了属于正样本的特征点会落在物体真实框内部,特征点中心与物体真实框中心要相近。但是这两个条件仅用作正样本的初步筛选,在YoloX中,使用了SimOTA方法进行动态的正样本数量分配。 在YoloX中,会计算一个Cost代价矩阵,代表每个真实框和每个特征点之间的代价关系,Cost代价矩阵由三个部分组成: 1)每个真实框和当前特征点预测框的重合程度; 2)每个真实框和当前特征点预测框的种类预测准确度; 3)每个真实框的中心是否落在了特征点的一定半径内。 Cost代价矩阵的目的是自适应的找到当前特征点应该去拟合的真实框,重合度越高越需要拟合,分类越准越需要拟合,在一定半径内越需要拟合。 在SimOTA中,不同目标设定不同的正样本数量(dynamic k),以旷视科技官方回答中的蚂蚁和西瓜为例子,传统的正样本分配方案常常为同一场景下的西瓜和蚂蚁分配同样的正样本数,那要么蚂蚁有很多低质量的正样本,要么西瓜仅仅只有一两个正样本,这样的结果对于哪个分配方式都是不合适的。 动态的正样本设置的关键在于如何确定k,SimOTA具体的做法是首先计算每个目标Cost最低的10特征点,然后把这十个特征点对应的预测框与真实框的IOU加起来求得最终的k。 因此,SimOTA的过程总结如下: 1)计算每个真实框和当前特征点预测框的重合程度; 2)计算将重合度最高的十个预测框与真实框的IOU加起来求得每个真实框的k,也就代表每个真实框有k个特征点与之对应; 3)计算每个真实框和当前特征点预测框的种类预测准确度; 4)判断真实框的中心是否落在了特征点的一定半径内; 5)计算Cost代价矩阵; 
6)将Cost最低的k个点作为该真实框的正样本。 由前文所述可知,YoloX的损失由三个部分组成: 1.Reg部分,由SimOTA可以知道每个真实框对应的特征点,获取到每个框对应的特征点后,取出该特征点的预测框,利用真实框和预测框计算IOU损失,作为Reg部分的Loss组成。 2.Obj部分,由SimOTA可知道每个真实框对应的特征点,所有真实框对应的特征点都是正样本,剩余的特征点均为负样本,根据正负样本和特征点的是否包含物体的预测结果计算交叉熵损失,作为Obj部分的Loss组成。 3.Cls部分,由SimOTA可知道每个真实框对应的特征点,获取到每个框对应的特征点后,取出该特征点的种类预测结果,根据真实框的种类和特征点的种类预测结果计算交叉熵损失,作为Cls部分的Loss组成。 其中Cls和Obj部分采用的都是二值交叉熵损失(BCELoss),Reg部分采用的是IoULoss。值得注意的是,Cls和Reg部分只计算正样本的损失,而Obj既计算正样本也计算负样本的损失。 其中: Lcls代表分类损失,Lreg代表定位损失,Lobj代表obj损失,λ代表定位损失的平衡系数,源码中设置是5.0,Npos代表被分为正样的Anchor Point数。 #------------------------# # yolox model #------------------------# class DetectionPerFPN(nn.Cell): """ head """ def __init__(self, num_classes, scale, in_channels=None, act="silu", width=1.0): super(DetectionPerFPN, self).__init__() if in_channels is None: in_channels = [1024, 512, 256] self.scale = scale self.num_classes = num_classes Conv = BaseConv if scale == 's': self.stem = BaseConv(in_channels=int(in_channels[0] * width), out_channels=int(256 * width), ksize=1, stride=1, act=act) elif scale == 'm': self.stem = BaseConv(in_channels=int(in_channels[1] * width), out_channels=int(256 * width), ksize=1, stride=1, act=act) elif scale == 'l': self.stem = BaseConv(in_channels=int(in_channels[2] * width), out_channels=int(256 * width), ksize=1, stride=1, act=act) else: raise KeyError("Invalid scale value for DetectionBlock") self.cls_convs = nn.SequentialCell( [ Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), ] ) self.reg_convs = nn.SequentialCell( [ Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), ] ) self.cls_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=self.num_classes, kernel_size=1, stride=1, pad_mode="pad", has_bias=True) self.reg_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=4, kernel_size=1, stride=1, pad_mode="pad", has_bias=True) self.obj_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=1, kernel_size=1, stride=1, pad_mode="pad", has_bias=True) def construct(self, x): """ forward """ x = self.stem(x) cls_x = x reg_x = x cls_feat = self.cls_convs(cls_x) cls_output = self.cls_preds(cls_feat) reg_feat = self.reg_convs(reg_x) reg_output = self.reg_preds(reg_feat) obj_output = self.obj_preds(reg_feat) return cls_output, reg_output, obj_output class DetectionBlock(nn.Cell): """ connect yolox backbone and head """ def __init__(self, config, backbone="yolopafpn"): super(DetectionBlock, self).__init__() self.num_classes = config.num_classes self.attr_num = self.num_classes + 5 self.depthwise = config.depth_wise self.strides = Tensor([8, 16, 32], mindspore.float32) self.input_size = config.input_size # network if backbone == "yolopafpn": self.backbone = YOLOPAFPN(depth=1.33, width=1.25, input_w=self.input_size[1], input_h=self.input_size[0]) self.head_inchannels = [1024, 512, 256] self.activation = "silu" self.width = 1.25 else: self.backbone = YOLOFPN(input_w=self.input_size[1], input_h=self.input_size[0]) self.head_inchannels = [512, 256, 128] self.activation = "lrelu" self.width = 1.0 self.head_l = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='l', act=self.activation, width=self.width) self.head_m = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='m', 
act=self.activation, width=self.width) self.head_s = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='s', act=self.activation, width=self.width) def construct(self, x): """ forward """ outputs = [] x_l, x_m, x_s = self.backbone(x) cls_output_l, reg_output_l, obj_output_l = self.head_l(x_l) # (bs, 80, 80, 80)(bs, 4, 80, 80)(bs, 1, 80, 80) cls_output_m, reg_output_m, obj_output_m = self.head_m(x_m) # (bs, 80, 40, 40)(bs, 4, 40, 40)(bs, 1, 40, 40) cls_output_s, reg_output_s, obj_output_s = self.head_s(x_s) # (bs, 80, 20, 20)(bs, 4, 20, 20)(bs, 1, 20, 20) if self.training: output_l = P.Concat(axis=1)((reg_output_l, obj_output_l, cls_output_l)) # (bs, 85, 80, 80) output_m = P.Concat(axis=1)((reg_output_m, obj_output_m, cls_output_m)) # (bs, 85, 40, 40) output_s = P.Concat(axis=1)((reg_output_s, obj_output_s, cls_output_s)) # (bs, 85, 20, 20) output_l = self.mapping_to_img(output_l, stride=self.strides[0]) # (bs, 6400, 85)x_c, y_c, w, h output_m = self.mapping_to_img(output_m, stride=self.strides[1]) # (bs, 1600, 85)x_c, y_c, w, h output_s = self.mapping_to_img(output_s, stride=self.strides[2]) # (bs, 400, 85)x_c, y_c, w, h else: output_l = P.Concat(axis=1)( (reg_output_l, P.Sigmoid()(obj_output_l), P.Sigmoid()(cls_output_l))) # bs, 85, 80, 80 output_m = P.Concat(axis=1)( (reg_output_m, P.Sigmoid()(obj_output_m), P.Sigmoid()(cls_output_m))) # bs, 85, 40, 40 output_s = P.Concat(axis=1)( (reg_output_s, P.Sigmoid()(obj_output_s), P.Sigmoid()(cls_output_s))) # bs, 85, 20, 20 output_l = self.mapping_to_img(output_l, stride=self.strides[0]) # (bs, 6400, 85)x_c, y_c, w, h output_m = self.mapping_to_img(output_m, stride=self.strides[1]) # (bs, 1600, 85)x_c, y_c, w, h output_s = self.mapping_to_img(output_s, stride=self.strides[2]) # (bs, 400, 85)x_c, y_c, w, h outputs.append(output_l) outputs.append(output_m) outputs.append(output_s) return P.Concat(axis=1)(outputs) # batch_size, 8400, 85 def mapping_to_img(self, output, stride): """ map to origin image scale for each fpn """ batch_size = P.Shape()(output)[0] n_ch = self.attr_num grid_size = P.Shape()(output)[2:4] range_x = range(grid_size[1]) range_y = range(grid_size[0]) stride = P.Cast()(stride, output.dtype) grid_x = P.Cast()(F.tuple_to_array(range_x), output.dtype) grid_y = P.Cast()(F.tuple_to_array(range_y), output.dtype) grid_y = P.ExpandDims()(grid_y, 1) grid_x = P.ExpandDims()(grid_x, 0) yv = P.Tile()(grid_y, (1, grid_size[1])) xv = P.Tile()(grid_x, (grid_size[0], 1)) grid = P.Stack(axis=2)([xv, yv]) # (80, 80, 2) grid = P.Reshape()(grid, (1, 1, grid_size[0], grid_size[1], 2)) # (1,1,80,80,2) output = P.Reshape()(output, (batch_size, n_ch, grid_size[0], grid_size[1])) # bs, 6400, 85-->(bs,85,80,80) output = P.Transpose()(output, (0, 2, 1, 3)) # (bs,85,80,80)-->(bs,80,85,80) output = P.Transpose()(output, (0, 1, 3, 2)) # (bs,80,85,80)--->(bs, 80, 80, 85) output = P.Reshape()(output, (batch_size, 1 * grid_size[0] * grid_size[1], -1)) # bs, 6400, 85 grid = P.Reshape()(grid, (1, -1, 2)) # grid(1, 6400, 2) # reconstruct output_xy = output[..., :2] output_xy = (output_xy + grid) * stride output_wh = output[..., 2:4] output_wh = P.Exp()(output_wh) * stride output_other = output[..., 4:] output_t = P.Concat(axis=-1)([output_xy, output_wh, output_other]) return output_t # bs, 6400, 85 grid(1, 6400, 2) #------------------------# # yolox Loss #------------------------# class YOLOLossCell(nn.Cell): """ yolox with loss cell """ def __init__(self, network=None, config=None): super(YOLOLossCell, self).__init__() 
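# The loss assembled in construct() below matches the formula described in section 3.8.1:
#     Loss = (lambda * L_iou + L_obj + L_cls + L_l1) / N_pos
# with lambda = 5 weighting the IoU (regression) term and N_pos approximated by
# sum(obj_target) + 1e-3; L_iou, L_cls and L_l1 are masked to positive samples only,
# while L_obj is computed over all predictions.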
self.network = network self.n_candidate_k = config.n_candidate_k self.on_value = Tensor(1.0, mindspore.float32) self.off_value = Tensor(0.0, mindspore.float32) self.depth = config.num_classes self.unsqueeze = P.ExpandDims() self.reshape = P.Reshape() self.one_hot = P.OneHot() self.zeros = P.ZerosLike() self.sort_ascending = P.Sort(descending=False) self.bce_loss = nn.BCEWithLogitsLoss(reduction="none") self.l1_loss = nn.L1Loss(reduction="none") self.batch_iter = Tensor(np.arange(0, config.per_batch_size * config.max_gt), mindspore.int32) self.strides = config.fpn_strides self.grids = [(config.input_size[0] // _stride) * (config.input_size[1] // _stride) for _stride in config.fpn_strides] self.use_l1 = config.use_l1 def construct(self, img, labels=None, pre_fg_mask=None, is_inbox_and_incenter=None): """ forward with loss return """ batch_size = P.Shape()(img)[0] gt_max = P.Shape()(labels)[1] outputs = self.network(img) # batch_size, 8400, 85 total_num_anchors = P.Shape()(outputs)[1] bbox_preds = outputs[:, :, :4] # batch_size, 8400, 4 obj_preds = outputs[:, :, 4:5] # batch_size, 8400, 1 cls_preds = outputs[:, :, 5:] # (batch_size, 8400, 80) # process label bbox_true = labels[:, :, 1:] # (batch_size, gt_max, 4) gt_classes = F.cast(labels[:, :, 0:1].squeeze(-1), mindspore.int32) pair_wise_ious = batch_bboxes_iou(bbox_true, bbox_preds, xyxy=False) pair_wise_ious = pair_wise_ious * pre_fg_mask pair_wise_iou_loss = -P.Log()(pair_wise_ious + 1e-8) * pre_fg_mask gt_classes_ = self.one_hot(gt_classes, self.depth, self.on_value, self.off_value) gt_classes_expaned = ops.repeat_elements(self.unsqueeze(gt_classes_, 2), rep=total_num_anchors, axis=2) gt_classes_expaned = F.stop_gradient(gt_classes_expaned) cls_preds_ = P.Sigmoid()(ops.repeat_elements(self.unsqueeze(cls_preds, 1), rep=gt_max, axis=1)) * \ P.Sigmoid()( ops.repeat_elements(self.unsqueeze(obj_preds, 1), rep=gt_max, axis=1) ) pair_wise_cls_loss = P.ReduceSum()( P.BinaryCrossEntropy(reduction="none")(P.Sqrt()(cls_preds_), gt_classes_expaned, None), -1) pair_wise_cls_loss = pair_wise_cls_loss * pre_fg_mask cost = pair_wise_cls_loss + 3.0 * pair_wise_iou_loss punishment_cost = 1000.0 * (1.0 - F.cast(is_inbox_and_incenter, mindspore.float32)) cost = F.cast(cost + punishment_cost, mindspore.float16) # dynamic k matching ious_in_boxes_matrix = pair_wise_ious # (batch_size, gt_max, 8400) ious_in_boxes_matrix = F.cast(pre_fg_mask * ious_in_boxes_matrix, mindspore.float16) topk_ious, _ = P.TopK(sorted=True)(ious_in_boxes_matrix, self.n_candidate_k) dynamic_ks = P.ReduceSum()(topk_ious, 2).astype(mindspore.int32).clip(xmin=1, xmax=total_num_anchors - 1, dtype=mindspore.int32) # (1, batch_size * gt_max, 2) dynamic_ks_indices = P.Stack(axis=1)((self.batch_iter, dynamic_ks.reshape((-1,)))) dynamic_ks_indices = F.stop_gradient(dynamic_ks_indices) values, _ = P.TopK(sorted=True)(-cost, self.n_candidate_k) # b_s , 50, 8400 values = P.Reshape()(-values, (-1, self.n_candidate_k)) max_neg_score = self.unsqueeze(P.GatherNd()(values, dynamic_ks_indices).reshape(batch_size, -1), 2) pos_mask = F.cast(cost < max_neg_score, mindspore.float32) # (batch_size, gt_num, 8400) pos_mask = pre_fg_mask * pos_mask # ----dynamic_k---- END----------------------------------------------------------------------------------------- cost_t = cost * pos_mask + (1.0 - pos_mask) * 2000. 
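# Dynamic-k matching (SimOTA), as summarised in section 3.8.1: for each ground-truth
# box, the IoUs of its top n_candidate_k (default 10) candidate predictions are summed
# and cast to an integer to obtain that box's k (dynamic_ks), and the k lowest-cost
# candidates are kept as positive samples (pos_mask). The ArgMinWithValue / OneHot step
# below then resolves conflicts: an anchor matched to several ground-truth boxes keeps
# only the one with the lowest cost.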
min_index, _ = P.ArgMinWithValue(axis=1)(cost_t) ret_posk = P.Transpose()(nn.OneHot(depth=gt_max, axis=-1)(min_index), (0, 2, 1)) pos_mask = pos_mask * ret_posk pos_mask = F.stop_gradient(pos_mask) # AA problem--------------END ---------------------------------------------------------------------------------- # calculate target --------------------------------------------------------------------------------------------- # Cast precision pos_mask = F.cast(pos_mask, mindspore.float16) bbox_true = F.cast(bbox_true, mindspore.float16) gt_classes_ = F.cast(gt_classes_, mindspore.float16) reg_target = P.BatchMatMul(transpose_a=True)(pos_mask, bbox_true) # (batch_size, 8400, 4) pred_ious_this_matching = self.unsqueeze(P.ReduceSum()((ious_in_boxes_matrix * pos_mask), 1), -1) cls_target = P.BatchMatMul(transpose_a=True)(pos_mask, gt_classes_) cls_target = cls_target * pred_ious_this_matching obj_target = P.ReduceMax()(pos_mask, 1) # (batch_size, 8400) # calculate l1_target reg_target = F.stop_gradient(reg_target) cls_target = F.stop_gradient(cls_target) obj_target = F.stop_gradient(obj_target) bbox_preds = F.cast(bbox_preds, mindspore.float32) reg_target = F.cast(reg_target, mindspore.float32) obj_preds = F.cast(obj_preds, mindspore.float32) obj_target = F.cast(obj_target, mindspore.float32) cls_preds = F.cast(cls_preds, mindspore.float32) cls_target = F.cast(cls_target, mindspore.float32) loss_l1 = 0.0 if self.use_l1: l1_target = self.get_l1_format(reg_target) l1_preds = self.get_l1_format(bbox_preds) l1_target = F.stop_gradient(l1_target) l1_target = F.cast(l1_target, mindspore.float32) l1_preds = F.cast(l1_preds, mindspore.float32) loss_l1 = P.ReduceSum()(self.l1_loss(l1_preds, l1_target), -1) * obj_target loss_l1 = P.ReduceSum()(loss_l1) # calculate target -----------END------------------------------------------------------------------------------- loss_iou = IOUloss()(P.Reshape()(bbox_preds, (-1, 4)), reg_target).reshape(batch_size, -1) * obj_target loss_iou = P.ReduceSum()(loss_iou) loss_obj = self.bce_loss(P.Reshape()(obj_preds, (-1, 1)), P.Reshape()(obj_target, (-1, 1))) loss_obj = P.ReduceSum()(loss_obj) loss_cls = P.ReduceSum()(self.bce_loss(cls_preds, cls_target), -1) * obj_target loss_cls = P.ReduceSum()(loss_cls) loss_all = (5 * loss_iou + loss_cls + loss_obj + loss_l1) / (P.ReduceSum()(obj_target) + 1e-3) return loss_all def get_l1_format_single(self, reg_target, stride, eps): """ calculate L1 loss related """ reg_target = reg_target / stride reg_target_xy = reg_target[:, :, :2] reg_target_wh = reg_target[:, :, 2:] reg_target_wh = P.Log()(reg_target_wh + eps) return P.Concat(-1)((reg_target_xy, reg_target_wh)) def get_l1_format(self, reg_target, eps=1e-8): """ calculate L1 loss related """ reg_target_l = reg_target[:, 0:self.grids[0], :] # (bs, 6400, 4) reg_target_m = reg_target[:, self.grids[0]:self.grids[1] + self.grids[0], :] # (bs, 1600, 4) reg_target_s = reg_target[:, -self.grids[2]:, :] # (bs, 400, 4) reg_target_l = self.get_l1_format_single(reg_target_l, self.strides[0], eps) reg_target_m = self.get_l1_format_single(reg_target_m, self.strides[1], eps) reg_target_s = self.get_l1_format_single(reg_target_s, self.strides[2], eps) l1_target = P.Concat(axis=1)([reg_target_l, reg_target_m, reg_target_s]) return l1_target class IOUloss(nn.Cell): """ Iou loss """ def __init__(self, reduction="none"): super(IOUloss, self).__init__() self.reduction = reduction self.reshape = P.Reshape() def construct(self, pred, target): """ forward """ pred = self.reshape(pred, (-1, 4)) target = 
self.reshape(target, (-1, 4)) tl = P.Maximum()(pred[:, :2] - pred[:, 2:] / 2, target[:, :2] - target[:, 2:] / 2) br = P.Minimum()(pred[:, :2] + pred[:, 2:] / 2, target[:, :2] + target[:, 2:] / 2) area_p = (pred[:, 2:3] * pred[:, 3:4]).squeeze(-1) area_g = (target[:, 2:3] * target[:, 3:4]).squeeze(-1) en = F.cast((tl < br), tl.dtype) en = (en[:, 0:1] * en[:, 1:2]).squeeze(-1) area_i = br - tl area_i = (area_i[:, 0:1] * area_i[:, 1:2]).squeeze(-1) * en area_u = area_p + area_g - area_i iou = area_i / (area_u + 1e-16) loss = 1 - iou * iou if self.reduction == "mean": loss = loss.mean() elif self.reduction == "sum": loss = loss.sum() return loss grad_scale = C.MultitypeFuncGraph("grad_scale") reciprocal = P.Reciprocal() @grad_scale.register("Tensor", "Tensor") def tensor_grad_scale(scale, grad): return grad * reciprocal(scale) _grad_overflow = C.MultitypeFuncGraph("_grad_overflow") grad_overflow = P.FloatStatus() @_grad_overflow.register("Tensor") def _tensor_grad_overflow(grad): return grad_overflow(grad) #------------------------# # ema #------------------------# class TrainOneStepWithEMA(nn.TrainOneStepWithLossScaleCell): """ Train one step with ema model """ def __init__(self, network, optimizer, scale_sense, ema=True, decay=0.9998, updates=0, moving_name=None, ema_moving_weight=None): super(TrainOneStepWithEMA, self).__init__(network, optimizer, scale_sense) self.ema = ema self.moving_name = moving_name self.ema_moving_weight = ema_moving_weight if self.ema: self.ema_weight = self.weights.clone("ema", init='same') self.decay = decay self.updates = Parameter(Tensor(updates, mindspore.float32)) self.assign = ops.Assign() self.ema_moving_parameters() def ema_moving_parameters(self): self.moving_name = {} moving_list = [] idx = 0 for key, param in self.network.parameters_and_names(): if "moving_mean" in key or "moving_variance" in key: new_param = param.clone() new_param.name = "ema." + param.name moving_list.append(new_param) self.moving_name["ema." + key] = idx idx += 1 self.ema_moving_weight = ParameterTuple(moving_list) def ema_update(self): """Update EMA parameters.""" if self.ema: self.updates += 1 d = self.decay * (1 - ops.Exp()(-self.updates / 2000)) # update trainable parameters for ema_v, weight in zip(self.ema_weight, self.weights): tep_v = ema_v * d self.assign(ema_v, (1.0 - d) * weight + tep_v) return self.updates # moving_parameter_update is executed inside the callback(EMACallBack) def moving_parameter_update(self): if self.ema: d = (self.decay * (1 - ops.Exp()(-self.updates / 2000))).asnumpy().item() # update moving mean and moving var for key, param in self.network.parameters_and_names(): if "moving_mean" in key or "moving_variance" in key: idx = self.moving_name["ema." 
+ key] moving_weight = param.asnumpy() tep_v = self.ema_moving_weight[idx] * d ema_value = (1.0 - d) * moving_weight + tep_v self.ema_moving_weight[idx] = ema_value def construct(self, *inputs): """ Forward """ weights = self.weights loss = self.network(*inputs) scaling_sens = self.scale_sense status, scaling_sens = self.start_overflow_check(loss, scaling_sens) scaling_sens_filled = C.ones_like(loss) * F.cast(scaling_sens, F.dtype(loss)) grads = self.grad(self.network, weights)(*inputs, scaling_sens_filled) grads = self.hyper_map(F.partial(grad_scale, scaling_sens), grads) # apply grad reducer on grads grads = self.grad_reducer(grads) self.ema_update() # get the overflow buffer cond = self.get_overflow_status(status, grads) overflow = self.process_loss_scale(cond) # if there is no overflow, do optimize if not overflow: loss = F.depend(loss, self.optimizer(grads)) return loss, cond, scaling_sens 3.9 设备函数 针对平台设备的相关函数 #------------------------# # device adapter #------------------------# def local_adp_get_device_id(): device_id = os.getenv('DEVICE_ID', '0') return int(device_id) def local_adp_get_device_num(): device_num = os.getenv('RANK_SIZE', '1') return int(device_num) def local_adp_get_rank_id(): global_rank_id = os.getenv('RANK_ID', '0') return int(global_rank_id) def local_adp_get_job_id(): return "Local Job" def moxing_adp_get_device_id(): device_id = os.getenv('DEVICE_ID', '0') return int(device_id) def moxing_adp_get_device_num(): device_num = os.getenv('RANK_SIZE', '1') return int(device_num) def moxing_adp_get_rank_id(): global_rank_id = os.getenv('RANK_ID', '0') return int(global_rank_id) def moxing_adp_get_job_id(): job_id = os.getenv('JOB_ID') job_id = job_id if job_id != "" else "default" return job_id def sync_data(from_path, to_path): """ Download data from remote obs to local directory if the first url is remote url and the second one is local path Upload data from local directory to remote obs in contrast. """ import moxing as mox global _global_sync_count sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count) _global_sync_count += 1 # Each server contains 8 devices as most. if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock): print("from path: ", from_path) print("to path: ", to_path) mox.file.copy_parallel(from_path, to_path) print("===finish data synchronization===") try: os.mknod(sync_lock) except IOError: pass print("===save flag===") while True: if os.path.exists(sync_lock): break time.sleep(1) print("Finish sync data from {} to {}.".format(from_path, to_path)) def moxing_wrapper(pre_process=None, post_process=None): """ Moxing wrapper to download dataset and upload outputs. 
""" def wrapper(run_func): @functools.wraps(run_func) def wrapped_func(*args, **kwargs): # Download data from data_url if config.enable_modelarts: if config.data_url: sync_data(config.data_url, config.data_path) print("Dataset downloaded: ", os.listdir(config.data_path)) if config.checkpoint_url: sync_data(config.checkpoint_url, config.load_path) print("Preload downloaded: ", os.listdir(config.load_path)) if config.train_url: sync_data(config.train_url, config.output_path) print("Workspace downloaded: ", os.listdir(config.output_path)) context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id()))) config.device_num = get_device_num() config.device_id = get_device_id() if not os.path.exists(config.output_path): os.makedirs(config.output_path) if pre_process: pre_process() # Run the main function run_func(*args, **kwargs) # Upload data to train_url if config.enable_modelarts: if post_process: post_process() if config.train_url: print("Start to copy output directory") sync_data(config.output_path, config.train_url) return wrapped_func return wrapper if config.enable_modelarts: get_device_id = moxing_adp_get_device_id get_device_num = moxing_adp_get_device_num get_rank_id = moxing_adp_get_rank_id get_job_id = moxing_adp_get_job_id else: get_device_id = local_adp_get_device_id get_device_num = local_adp_get_device_num get_rank_id = local_adp_get_rank_id get_job_id = local_adp_get_job_id
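Section 3.8 above says that DetectionBlock declares the complete YOLOX network, YOLOLossCell wraps it with the SimOTA-based loss, and TrainOneStepWithEMA keeps an exponential-moving-average copy of the weights. The following is a minimal wiring sketch of how those pieces fit together, assuming config comes from parase_config(); the learning rate, optimizer and loss-scale values are illustrative placeholders, not the project's actual training script.

#------------------------#
# model + loss + EMA wiring (sketch)
#------------------------#
network = DetectionBlock(config, backbone="yolopafpn")      # full YOLOX network
network = YOLOLossCell(network, config)                     # attach the SimOTA loss
opt = nn.Momentum(params=network.trainable_params(),
                  learning_rate=0.01,
                  momentum=config.momentum,
                  weight_decay=config.weight_decay)
scale_sense = nn.FixedLossScaleUpdateCell(loss_scale_value=1024.0)
train_net = TrainOneStepWithEMA(network, opt, scale_sense,
                                ema=True, decay=0.9998)     # keeps an EMA weight copy
train_net.set_train(True)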
1. Algorithm overview
YOLOX is an object detector proposed by Megvii in 2021. It builds on the YOLO series with a set of empirical improvements, the main ones being a decoupled head, an anchor-free design, and an advanced label-assignment strategy (SimOTA).
Paper: YOLOX: Exceeding YOLO Series in 2021
Paper link: https://arxiv.org/abs/2107.08430
The overall network is built on a YOLOv3 + Darknet baseline and, as shown in the figure below, consists of three parts: CSPDarknet, an FPN, and the Yolo Head.
CSPDarknet is the backbone feature extractor. The input image first passes through CSPDarknet, and the extracted feature maps are collections of features of the input image. Three feature maps are taken from the backbone for the next stage of the network; these are called the effective feature layers.
The FPN is the enhanced feature-extraction network. The three effective feature layers from the backbone are fused here; the purpose of the fusion is to combine feature information from different scales, and the fused layers are used to extract features further. YOLOX uses the PANet structure also used in YOLOv4: features are not only upsampled and fused top-down, but also downsampled and fused again bottom-up.
The Yolo Head is the classifier and regressor. After CSPDarknet and the FPN we have three enhanced feature layers, each with a width, a height and a number of channels, so each feature map can be viewed as a collection of feature points, every point carrying a channel-length feature vector. The Yolo Head decides, for every feature point, whether an object corresponds to it. Earlier YOLO versions used a coupled head, i.e. classification and regression were produced by the same 1x1 convolution, which the YOLOX authors argue hurts recognition. In YOLOX the head is split into two branches that are computed separately and only merged at prediction time.
In short, the whole YOLOX network performs feature extraction, feature enhancement, and prediction of the object associated with each feature point.
1.1 Backbone
The backbone feature extractor, CSPDarknet, has the following characteristics:
1. It uses residual blocks. A residual convolution in CSPDarknet has a main path consisting of a 1x1 convolution followed by a 3x3 convolution, and a shortcut path that leaves the input untouched and adds it to the output of the main path. The whole YOLOX backbone is built from such residual convolutions. Residual networks are easy to optimize and can gain accuracy from considerably increased depth; their internal skip connections alleviate the vanishing-gradient problem that comes with depth.
2. It uses the CSPNet structure. CSPNet is not complicated: the original stack of residual blocks is split into two parts. The main part still stacks the residual blocks; the other part, like a residual shortcut, is only lightly processed and connected directly to the end. One can therefore think of CSP as containing one large residual edge.
3. It uses the Focus structure, an interesting structure introduced in YOLOv5. It takes every other pixel of the image, producing four independent feature maps, and stacks them, so width and height information is moved into the channel dimension and the number of input channels is multiplied by four. As the figure below shows, the stacked feature map has twelve channels instead of the original three.
4. It uses the SiLU activation. SiLU is an improvement over Sigmoid and ReLU: it is unbounded above, bounded below, smooth and non-monotonic. SiLU outperforms ReLU on deep models and can be viewed as a smooth ReLU.
5. It uses an SPP block, which extracts features with max-pooling at several kernel sizes to enlarge the receptive field. In YOLOv4 the SPP block sits in the FPN; in YOLOX the SPP module is placed inside the backbone.
1.2 Building the FPN feature pyramid
For detection, YOLOX extracts three feature layers from different depths of CSPDarknet: the middle, lower-middle and bottom stages. With a 640x640x3 input, their shapes are feat1 = 80x80x256, feat2 = 40x40x512 and feat3 = 20x20x1024.
With the three effective feature layers in hand, the FPN is built as follows:
1. feat3 (20x20x1024) goes through a 1x1 convolution to give P5; P5 is upsampled and combined with feat2 (40x40x512), and a CSPLayer extracts features to give P5_upsample, of size 40x40x512.
2. P5_upsample (40x40x512) goes through a 1x1 convolution to give P4; P4 is upsampled and combined with feat1 (80x80x256), and a CSPLayer gives P3_out, of size 80x80x256.
3. P3_out (80x80x256) goes through a 3x3 convolution, is downsampled and stacked with P4, and a CSPLayer gives P4_out, of size 40x40x512.
4. P4_out (40x40x512) goes through a 3x3 convolution, is downsampled and stacked with P5, and a CSPLayer gives P5_out, of size 20x20x1024.
The feature pyramid fuses feature layers of different shapes, which helps extract better features.
1.3 Predictions from the Yolo Head
The FPN feature pyramid yields three enhanced features of shapes 20x20x1024, 40x40x512 and 80x80x256, which are passed to the Yolo Head to obtain predictions. The YOLOX head differs from earlier YOLO heads: previously classification and regression shared a single 1x1 convolution, which YOLOX argues harms recognition, so in YOLOX the head is split into two branches that are merged only at prediction time, as shown in the figure below.
For each feature layer the head produces three outputs:
1) Reg (HxWx4): the regression parameters of each feature point; after decoding they give the predicted box;
2) Obj (HxWx1): whether each feature point contains an object;
3) Cls (HxWxC): the class of the object at each feature point (C is the number of classes).
Stacking the three gives Out (HxWx(4+1+C)) per feature layer: the first four values are the regression parameters, the fifth is the objectness, and the last C values are the class scores.
2. Dataset
This case uses COCO2017. COCO (Common Objects in Context) is a dataset released by Microsoft for image recognition and is currently most often used for detection and localization. It is an image recognition, segmentation and captioning dataset whose annotations contain not only categories and locations but also semantic text descriptions of the images. The MS COCO images are split into training, validation and test sets and cover 80 categories such as person, bicycle, car, motorbike, aeroplane and bus.
COCO2017 contains 118,287 training images and 5,000 validation images.
Dataset website: https://cocodataset.org/#home
The dataset directory layout is as follows:
├── dataset
    ├── coco2017
        ├── annotations
        │   ├─ train.json
        │   └─ val.json
        ├─ train
        │   ├─ picture1.jpg
        │   ├─ ...
        │   └─ picturen.jpg
        └─ val
            ├─ picture1.jpg
            ├─ ...
└─picturen.jpg 3.实现 模型分为两个训练阶段,第一阶段使用数据增强,第二阶段不使用数据增强 3.1导入包 import os import time import datetime import argparse import sys import logging import numpy as np import cv2 import multiprocessing import random import math import json import stat import functools from functools import reduce from collections import Counter from pycocotools.coco import COCO from pycocotools.cocoeval import COCOeval from tqdm import tqdm import colorsys import mindspore import mindspore.dataset as de import mindspore.nn as nn import mindspore.common.dtype as mstype from mindspore import load_checkpoint, load_param_into_net, save_checkpoint, Tensor, Parameter, ops, context, Model, DynamicLossScaleManager from mindspore.profiler.profiling import Profiler from mindspore.communication.management import init, get_rank, get_group_size from mindspore.context import ParallelMode from mindspore.common.parameter import ParameterTuple from mindspore.common import set_seed, initializer from mindspore.common.initializer import Initializer as MeInitializer from mindspore.train.callback import Callback, CheckpointConfig, ModelCheckpoint from mindspore.ops import composite as C from mindspore.ops import functional as F from mindspore.ops import operations as P from mindspore.ops.primitive import constexpr 3.2运行设置 关键设置: device_targe,根据平台设置相应设备 max_epoch,当前训练阶段的最大epoch, 设置为第一阶段训练最大epoch #------------------------# # settings #------------------------# def parase_config(): parase = argparse.ArgumentParser(description=__doc__) parase.add_argument('--backbone', default='yolox_darknet53', help='option for backbone, you can choose yolox_darknet53 or yolox_x') parase.add_argument('--data_aug', default=True, help='stage one use data aug, stage two not use') parase.add_argument('--device_target', default='Ascend', help='Ascend GPU CPU platform') parase.add_argument('--outputs_dir', default='./') #train opt parase.add_argument('--save_graphs', default=False) parase.add_argument('--lr_scheduler', default='yolox_warm_cos_lr') parase.add_argument('--max_epoch', default=10, help='max epoch for one train stage') parase.add_argument('--total_epoch', default=15, help='total epoch for all train stages') parase.add_argument('--data_dir', default='test_coco', help='data with coco form') parase.add_argument('--yolox_no_aug_ckpt', default='', help='last no data aug related') parase.add_argument('--need_profiler', default=0) parase.add_argument('--pretrained', default=None, help='pretrained backbon path') parase.add_argument('--resume_yolox', default=None, help='resume weight path') parase.add_argument('--flip_prob', default=0.5, help='related to data aug') parase.add_argument('--hsv_prob', default=1.0, help='data aug related') parase.add_argument('--per_batch_size', default=2, help='batch size') #net config parase.add_argument('--depth_wise', default=False) parase.add_argument('--max_gt', default=120) parase.add_argument('--num_classes', default=3, help='match the classes num your dataset owns') parase.add_argument('--input_size', default=[640, 640]) parase.add_argument('--fpn_strides', default=[8, 16, 32]) parase.add_argument('--use_l1', default=False, help='use l1 loss when stage_2') parase.add_argument('--use_syc_bn', default=True) parase.add_argument('--updates', default=0.0) parase.add_argument('--n_candidate_k', default=10, help='dynamic_k') #optimizer parase.add_argument('--lr', default=0.01, help='set 0.04 for yolox-x') parase.add_argument('--min_lr_ratio', default=0.001) parase.add_argument('--warmup_epochs', default=5) 
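# two-stage schedule implied by the options above and below: stage one trains
# max_epoch epochs with mosaic/mixup (data_aug=True); stage two runs the remaining
# no_aug_epochs (= total_epoch - max_epoch) epochs with augmentation turned off and
# use_l1 enabled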
parase.add_argument('--weight_decay', default=0.0005) parase.add_argument('--momentum', default=0.9) parase.add_argument('--no_aug_epochs', default=5, help='set equal to total_epoch - max_epoch') #logger parase.add_argument('--log_interval', default=30) parase.add_argument('--ckpt_interval', default=10) parase.add_argument('--is_save_on_master', default=1) parase.add_argument('--ckpt_max_num', default=60) parase.add_argument('--opt', default='Momentum') #distributed parase.add_argument('--is_distributed', default=0) parase.add_argument('--rank', default=0) parase.add_argument('--group_size', default=1) parase.add_argument('--bind_cpu', default=True) parase.add_argument('--device_num', default=1) #model arts parase.add_argument('--is_modelArts', default=0) parase.add_argument('--enable_modelarts', default=False) parase.add_argument('--need_modelarts_dataset_unzip',default=False) parase.add_argument('--modelarts_dataset_unzip_name', default='coco2017') parase.add_argument('--data_url', default='') parase.add_argument('--train_url', default='') parase.add_argument('--checkpoint_url', default='') parase.add_argument('--data_path', default='') parase.add_argument('--output_path', default='./') parase.add_argument('--load_path', default='') parase.add_argument('--ckpt_path', default='./save_weights', help='save ckpt') #eval parase.add_argument('--log_path', default='./eval_logs') parase.add_argument('--val_ckpt', default='') parase.add_argument('--conf_thre', default=0.001) parase.add_argument('--nms_thre', default=0.65) parase.add_argument('--eval_interval', default=10) parase.add_argument('--run_eval', default=False) #pred parase.add_argument('--pred_ckpt', default='') parase.add_argument('--pred_conf_thre', default=0.01) parase.add_argument('--pred_nms_thre', default=0.5) parase.add_argument('--classes_path', default='test_coco/classes.txt') parase.add_argument('--pred_input', default='test_coco/val2017') parase.add_argument('--pred_output', default='./pred_output') #modelarts parase.add_argument('--is_modelart', default=False) parase.add_argument('--result_path', default='') #export opt parase.add_argument('--file_format', default='MINDIR') parase.add_argument('--export_bs', default=1) args = parase.parse_args(args=[]) return args config = parase_config() 3.3 logger相关 #------------------------# # logger #------------------------# class LOGGER(logging.Logger): """ Logger. Args: logger_name: String. Logger name. rank: Integer. Rank id. 
""" def __init__(self, logger_name, rank=0): super(LOGGER, self).__init__(logger_name) self.rank = rank if rank % 8 == 0: console = logging.StreamHandler(sys.stdout) console.setLevel(logging.INFO) formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') console.setFormatter(formatter) self.addHandler(console) def setup_logging_file(self, log_dir, rank=0): """Setup logging file.""" self.rank = rank if not os.path.exists(log_dir): os.makedirs(log_dir, exist_ok=True) log_name = datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank) self.log_fn = os.path.join(log_dir, log_name) fh = logging.FileHandler(self.log_fn) fh.setLevel(logging.INFO) formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') fh.setFormatter(formatter) self.addHandler(fh) def info(self, msg, *args, **kwargs): if self.isEnabledFor(logging.INFO): self._log(logging.INFO, msg, args, **kwargs) def save_args(self, args): self.info('Args:') args_dict = vars(args) for key in args_dict.keys(): self.info('--> %s: %s', key, args_dict[key]) self.info('') def important_info(self, msg, *args, **kwargs): if self.isEnabledFor(logging.INFO) and self.rank == 0: line_width = 2 important_msg = '\n' important_msg += ('*'*70 + '\n')*line_width important_msg += ('*'*line_width + '\n')*2 important_msg += '*'*line_width + ' '*8 + msg + '\n' important_msg += ('*'*line_width + '\n')*2 important_msg += ('*'*70 + '\n')*line_width self.info(important_msg, *args, **kwargs) def get_logger(path, rank): """Get Logger.""" logger = LOGGER('yolox', rank) logger.setup_logging_file(path, rank) return logger 3.4 image transform相关 进行图像预处理,包含Mosaic和MixUp所需函数, 获取SimOTA中的动态正样本。 在SimOTA中,不同目标设定不同的正样本数量(dynamic k),以旷视科技官方回答中的蚂蚁和西瓜为例子,传统的正样本分配方案常常为同一场景下的西瓜和蚂蚁分配同样的正样本数,那要么蚂蚁有很多低质量的正样本,要么西瓜仅仅只有一两个正样本,这样的结果对于哪个分配方式都是不合适的。 动态的正样本设置的关键在于如何确定k,SimOTA具体的做法是首先计算每个目标Cost最低的10特征点,然后把这十个特征点对应的预测框与真实框的IOU加起来求得最终的k。 因此,SimOTA的过程总结如下: 1)计算每个真实框和当前特征点预测框的重合程度; 2)计算将重合度最高的十个预测框与真实框的IOU加起来求得每个真实框的k,也就代表每个真实框有k个特征点与之对应; 3)计算每个真实框和当前特征点预测框的种类预测准确度; 4)判断真实框的中心是否落在了特征点的一定半径内; 5)计算Cost代价矩阵; 6)将Cost最低的k个点作为该真实框的正样本。 Cost代价矩阵的目的是自适应的找到当前特征点应该去拟合的真实框,重合度越高越需要拟合,分类越准越需要拟合,在一定半径内越需要拟合。 #------------------------# # image transform #------------------------# def get_aug_params(value, center=0): if isinstance(value, float): min_v = center - value max_v = center + value elif len(value) == 2: min_v = value[0] max_v = value[1] else: raise ValueError( "Affine params should be either a sequence containing two values\ or single float values. 
Got {}".format(value) ) return random.uniform(min_v, max_v) def get_affine_matrix( target_size, degrees=10, translate=0.1, scales=0.1, shear=10, ): twidth, theight = target_size # Rotation and Scale angle = get_aug_params(degrees) scale = get_aug_params(scales, center=1.0) if scale <= 0.0: raise ValueError("Argument scale should be positive") R = cv2.getRotationMatrix2D(angle=angle, center=(0, 0), scale=scale) M = np.ones([2, 3]) # Shear shear_x = math.tan(get_aug_params(shear) * math.pi / 180) shear_y = math.tan(get_aug_params(shear) * math.pi / 180) M[0] = R[0] + shear_y * R[1] M[1] = R[1] + shear_x * R[0] # Translation translation_x = get_aug_params(translate) * twidth # x translation (pixels) translation_y = get_aug_params(translate) * theight # y translation (pixels) M[0, 2] = translation_x M[1, 2] = translation_y return M, scale def apply_affine_to_bboxes(targets, target_size, M, scale): num_gts = len(targets) # warp corner points twidth, theight = target_size corner_points = np.ones((4 * num_gts, 3)) corner_points[:, :2] = targets[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( 4 * num_gts, 2 ) # x1y1, x2y2, x1y2, x2y1 corner_points = corner_points @ M.T # apply affine transform corner_points = corner_points.reshape(num_gts, 8) # create new boxes corner_xs = corner_points[:, 0::2] corner_ys = corner_points[:, 1::2] new_bboxes = ( np.concatenate((corner_xs.min(1), corner_ys.min(1), corner_xs.max(1), corner_ys.max(1))).reshape(4, num_gts).T) # clip boxes new_bboxes[:, 0::2] = new_bboxes[:, 0::2].clip(0, twidth) new_bboxes[:, 1::2] = new_bboxes[:, 1::2].clip(0, theight) targets[:, :4] = new_bboxes return targets def random_affine( img, targets=(), target_size=(640, 640), degrees=10, translate=0.1, scales=0.1, shear=10, ): M, scale = get_affine_matrix(target_size, degrees, translate, scales, shear) img = cv2.warpAffine(img, M, dsize=target_size, borderValue=(114, 114, 114)) # Transform label coordinates target_length = len(targets) if target_length: targets = apply_affine_to_bboxes(targets, target_size, M, scale) return img, targets def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.2): # box1(4,n), box2(4,n) # Compute candidate boxes which include following 5 things: # box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio w1, h1 = box1[2] - box1[0], box1[3] - box1[1] w2, h2 = box2[2] - box2[0], box2[3] - box2[1] ar = np.maximum(w2 / (h2 + 1e-16), h2 / (w2 + 1e-16)) # aspect ratio return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + 1e-16) > area_thr) & (ar < ar_thr) # candidates def augment_hsv(img, hgain=0.015, sgain=0.7, vgain=0.4): """ hsv augment """ r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1 # random gains hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV)) dtype = img.dtype x = np.arange(0, 256, dtype=np.int16) lut_hue = ((x * r[0]) % 180).astype(dtype) lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) lut_val = np.clip(x * r[2], 0, 255).astype(dtype) img_hsv = cv2.merge( (cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)) ).astype(dtype) cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img) def _mirror(image, boxes, prob=0.5): _, width, _ = image.shape if random.random() < prob: image = image[:, ::-1] boxes[:, 0::2] = width - boxes[:, 2::-2] return image, boxes def preproc(img, input_size, swap=(2, 0, 1)): """ padding image and transpose dim """ if len(img.shape) == 3: padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114 else: padded_img = np.ones(input_size, 
dtype=np.uint8) * 114 r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1]) resized_img = cv2.resize( img, (int(img.shape[1] * r), int(img.shape[0] * r)), interpolation=cv2.INTER_LINEAR, ).astype(np.uint8) padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img padded_img = padded_img.transpose(swap) padded_img = np.ascontiguousarray(padded_img, dtype=np.float32) return padded_img, r class TrainTransform: """ image transform for training """ def __init__(self, max_labels=50, flip_prob=0.5, hsv_prob=1.0, config=None): if config: self.max_labels = config.max_gt self.flip_prob = config.flip_prob self.hsv_prob = config.hsv_prob self.strides = config.fpn_strides self.input_size = config.input_size else: self.hsv_prob = 1.0 self.flip_prob = 0.5 self.max_labels = max_labels self.strides = [8, 16, 32] self.input_size = (640, 640) self.grid_size = [(self.input_size[0] / x) * (self.input_size[1] / x) for x in self.strides] self.num_total_anchor = int(sum(self.grid_size)) def __call__(self, image, targets, input_dim): """ Tran transform call """ boxes = targets[:, :4] labels = targets[:, 4] if not boxes.size: targets = np.zeros((self.max_labels, 5), dtype=np.float32) image, r_o = preproc(image, input_dim) is_in_boxes_all = np.zeros((self.max_labels, self.num_total_anchor)).astype(np.bool_) is_in_boxes_and_center = np.zeros((self.max_labels, self.num_total_anchor)).astype(np.bool_) return image, targets, is_in_boxes_all, is_in_boxes_and_center image_o = image.copy() targets_o = targets.copy() boxes_o = targets_o[:, :4] labels_o = targets_o[:, 4] boxes_o = xyxy2cxcywh(boxes_o) if random.random() < self.hsv_prob: augment_hsv(image) image_t, boxes = _mirror(image, boxes, self.flip_prob) image_t, r_ = preproc(image_t, input_dim) boxes = xyxy2cxcywh(boxes) boxes *= r_ mask_b = np.minimum(boxes[:, 2], boxes[:, 3]) > 1 boxes_t = boxes[mask_b] labels_t = labels[mask_b] if not boxes_t.size: image_t, r_o = preproc(image_o, input_dim) boxes_o *= r_o boxes_t = boxes_o labels_t = labels_o labels_t = np.expand_dims(labels_t, 1) targets_t = np.hstack((labels_t, boxes_t)) padded_labels = np.zeros((self.max_labels, 5)) true_labels = len(targets_t) padded_labels[range(len(targets_t))[: self.max_labels]] = targets_t[: self.max_labels] padded_labels = np.ascontiguousarray(padded_labels, dtype=np.float32) gt_bboxes_per_image = padded_labels[:, 1:5] # is_in_boxes_all [gt_max, 8400] is_in_boxes_all, is_in_boxes_and_center = self.get_in_boxes_info(gt_bboxes_per_image, true_labels) # is_in_boxes_all [gt_max, 8400] is_in_boxes_all = is_in_boxes_all.any(1).reshape((-1, 1)) * is_in_boxes_all.any(0).reshape((1, -1)) return image_t, padded_labels, is_in_boxes_all, is_in_boxes_and_center def get_grid(self): """ get grid in each image """ grid_size_x = [] grid_size_y = [] x_shifts = [] # (1, 6400) (1,1600) (1, 400) -->(1, 8400) y_shifts = [] # (1, 6400) (1,1600) (1, 400) expanded_strides = [] # (1, 6400) (1,1600) (1, 400) for _stride in self.strides: grid_size_x.append(int(self.input_size[0] / _stride)) grid_size_y.append(int(self.input_size[1] / _stride)) for i in range(len(grid_size_x)): xv, yv = np.meshgrid(np.arange(0, grid_size_y[i]), np.arange(0, grid_size_x[i])) grid = np.stack((xv, yv), 2).reshape(1, 1, grid_size_x[i], grid_size_y[i], 2) grid = grid.reshape(1, -1, 2) x_shifts.append(grid[:, :, 0]) y_shifts.append(grid[:, :, 1]) this_stride = np.zeros((1, grid.shape[1])) this_stride.fill(self.strides[i]) this_stride = this_stride.astype(np.float32) expanded_strides.append(this_stride) x_shifts 
= np.concatenate(x_shifts, axis=1) y_shifts = np.concatenate(y_shifts, axis=1) expanded_strides = np.concatenate(expanded_strides, axis=1) return x_shifts, y_shifts, expanded_strides def get_in_boxes_info(self, gt_bboxes_per_image, true_labels): """ get the pre in-center and in-box info for each image """ x_shifts, y_shifts, expanded_strides = self.get_grid() num_total_anchor = x_shifts.shape[1] expanded_strides = expanded_strides[0] x_shifts_per_image = x_shifts[0] * expanded_strides y_shifts_per_image = y_shifts[0] * expanded_strides x_centers_per_image = np.expand_dims((x_shifts_per_image + 0.5 * expanded_strides), axis=0) x_centers_per_image = np.repeat(x_centers_per_image, self.max_labels, axis=0) y_centers_per_image = np.expand_dims((y_shifts_per_image + 0.5 * expanded_strides), axis=0) y_centers_per_image = np.repeat(y_centers_per_image, self.max_labels, axis=0) gt_bboxes_per_image_l = np.expand_dims((gt_bboxes_per_image[:, 0] - 0.5 * gt_bboxes_per_image[:, 2]), axis=1) gt_bboxes_per_image_l = np.repeat(gt_bboxes_per_image_l, num_total_anchor, axis=1) gt_bboxes_per_image_r = np.expand_dims((gt_bboxes_per_image[:, 0] + 0.5 * gt_bboxes_per_image[:, 2]), axis=1) gt_bboxes_per_image_r = np.repeat(gt_bboxes_per_image_r, num_total_anchor, axis=1) gt_bboxes_per_image_t = np.expand_dims((gt_bboxes_per_image[:, 1] - 0.5 * gt_bboxes_per_image[:, 3]), axis=1) gt_bboxes_per_image_t = np.repeat(gt_bboxes_per_image_t, num_total_anchor, axis=1) gt_bboxes_per_image_b = np.expand_dims((gt_bboxes_per_image[:, 1] + 0.5 * gt_bboxes_per_image[:, 3]), axis=1) gt_bboxes_per_image_b = np.repeat(gt_bboxes_per_image_b, num_total_anchor, axis=1) b_l = x_centers_per_image - gt_bboxes_per_image_l b_r = gt_bboxes_per_image_r - x_centers_per_image b_t = y_centers_per_image - gt_bboxes_per_image_t b_b = gt_bboxes_per_image_b - y_centers_per_image bbox_deltas = np.stack([b_l, b_t, b_r, b_b], 2) is_in_boxes = bbox_deltas.min(axis=-1) > 0.0 is_in_boxes[true_labels:, ...] = False center_radius = 2.5 gt_bboxes_per_image_l = np.repeat(np.expand_dims((gt_bboxes_per_image[:, 0]), 1), num_total_anchor, 1) - \ center_radius * np.expand_dims(expanded_strides, 0) gt_bboxes_per_image_r = np.repeat(np.expand_dims((gt_bboxes_per_image[:, 0]), 1), num_total_anchor, 1) + \ center_radius * np.expand_dims(expanded_strides, 0) gt_bboxes_per_image_t = np.repeat(np.expand_dims((gt_bboxes_per_image[:, 1]), 1), num_total_anchor, 1) - \ center_radius * np.expand_dims(expanded_strides, 0) gt_bboxes_per_image_b = np.repeat(np.expand_dims((gt_bboxes_per_image[:, 1]), 1), num_total_anchor, 1) + \ center_radius * np.expand_dims(expanded_strides, 0) c_l = x_centers_per_image - gt_bboxes_per_image_l c_r = gt_bboxes_per_image_r - x_centers_per_image c_t = y_centers_per_image - gt_bboxes_per_image_t c_b = gt_bboxes_per_image_b - y_centers_per_image center_deltas = np.stack([c_l, c_r, c_t, c_b], 2) is_in_centers = center_deltas.min(axis=-1) > 0.0 is_in_centers[true_labels:, ...] 
= False # padding gts are set False is_in_boxes_all = is_in_boxes | is_in_centers is_in_boxes_and_center = is_in_boxes & is_in_centers return is_in_boxes_all, is_in_boxes_and_center class ValTransform: """ image transform for val """ def __init__(self, swap=(2, 0, 1), legacy=False): self.swap = swap self.legacy = legacy self.mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) self.std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) def __call__(self, img, input_size): img, _ = preproc(img, input_size, self.swap) if self.legacy: img = img[::-1, :, :].copy() / 255.0 img = (img - self.mean) / self.std return img, np.zeros((1, 5)) def xyxy2cxcywh(bboxes): bboxes[:, 2] = bboxes[:, 2] - bboxes[:, 0] bboxes[:, 3] = bboxes[:, 3] - bboxes[:, 1] bboxes[:, 0] = bboxes[:, 0] + bboxes[:, 2] * 0.5 bboxes[:, 1] = bboxes[:, 1] + bboxes[:, 3] * 0.5 return bboxes def xyxy2xywh(bboxes): bboxes[:, 2] = bboxes[:, 2] - bboxes[:, 0] bboxes[:, 3] = bboxes[:, 3] - bboxes[:, 1] return bboxes def statistic_normalize_img(img, statistic_norm): """Statistic normalize images.""" img = np.transpose(img, (1, 2, 0)) img = img / 255. mean = np.array([0.485, 0.456, 0.406]) std = np.array([0.229, 0.224, 0.225]) if statistic_norm: img = (img - mean) / std return np.transpose(img, (2, 0, 1)).astype(np.float32) 3.5 数据集创建 #------------------------# # dataset #------------------------# min_keypoints_per_image = 10 def _has_only_empty_bbox(anno): return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno) def _count_visible_keypoints(anno): return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno) def has_valid_annotation(anno): """Check annotation file.""" # if it's empty, there is no annotation if not anno: return False # if all boxes have close to zero area, there is no annotation if _has_only_empty_bbox(anno): return False # keypoints task have a slight different criteria for considering # if an annotation is valid if "keypoints" not in anno[0]: return True # for keypoint detection tasks, only consider valid images those # containing at least min_keypoints_per_image if _count_visible_keypoints(anno) >= min_keypoints_per_image: return True return False def get_mosaic_coordinate(mosaic_image, mosaic_index, xc, yc, w, h, input_h, input_w): """ Get mosaic coordinate """ # index0 to top left part of image if mosaic_index == 0: x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc small_coord = w - (x2 - x1), h - (y2 - y1), w, h # index1 to top right part of image elif mosaic_index == 1: x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc small_coord = 0, h - (y2 - y1), min(w, x2 - x1), h # index2 to bottom left part of image elif mosaic_index == 2: x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h) small_coord = w - (x2 - x1), 0, w, min(y2 - y1, h) # index2 to bottom right part of image elif mosaic_index == 3: x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, yc + h) # noqa small_coord = 0, 0, min(w, x2 - x1), min(y2 - y1, h) return (x1, y1, x2, y2), small_coord def adjust_box_anns(bbox, scale_ratio, padw, padh, w_max, h_max): bbox[:, 0::2] = np.clip(bbox[:, 0::2] * scale_ratio + padw, 0, w_max) bbox[:, 1::2] = np.clip(bbox[:, 1::2] * scale_ratio + padh, 0, h_max) return bbox class COCOYoloXDataset: """ YoloX Dataset for COCO """ def __init__(self, root, ann_file, remove_images_without_annotations=True, filter_crowd_anno=True, is_training=True, mosaic=True, img_size=(640, 640), preproc=None, input_dim=(640, 640), mosaic_prob=1.0, enable_mosaic=True, 
eable_mixup=True, mixup_prob=1.0): self.coco = COCO(ann_file) self.img_ids = list(self.coco.imgs.keys()) self.filter_crowd_anno = filter_crowd_anno self.is_training = is_training self.root = root self.mosaic = mosaic self.img_size = img_size self.preproc = preproc self.input_dim = input_dim self.mosaic_prob = mosaic_prob self.enable_mosaic = enable_mosaic self.degrees = 10.0 self.translate = 0.1 self.scale = (0.5, 1.5) self.mixup_scale = (0.5, 1.5) self.shear = 2.0 self.perspective = 0.0 self.mixup_prob = mixup_prob self.enable_mixup = eable_mixup if remove_images_without_annotations: img_ids = [] for img_id in self.img_ids: ann_ids = self.coco.getAnnIds(imgIds=img_id, iscrowd=None) anno = self.coco.loadAnns(ann_ids) if has_valid_annotation(anno): img_ids.append(img_id) self.img_ids = img_ids self.categories = {cat["id"]: cat["name"] for cat in self.coco.cats.values()} self.cat_ids_to_continuous_ids = {v: i for i, v in enumerate(self.coco.getCatIds())} self.continuous_ids_cat_ids = {v: k for k, v in self.cat_ids_to_continuous_ids.items()} def pull_item(self, index): """ pull image and label """ res, img_info, _ = self.load_anno_from_ids(index) img = self.load_resized_img(index) return img, res.copy(), img_info, np.array([self.img_ids[index]]) def mosaic_proc(self, idx): """ Mosaic data augment """ if self.enable_mosaic and random.random() < self.mosaic_prob: mosaic_labels = [] input_dim = self.input_dim input_h, input_w = input_dim[0], input_dim[1] yc = int(random.uniform(0.5 * input_h, 1.5 * input_h)) xc = int(random.uniform(0.5 * input_w, 1.5 * input_w)) # 3 additional image indices indices = [idx] + [random.randint(0, len(self.img_ids) - 1) for _ in range(3)] for i_mosaic, index in enumerate(indices): img, _labels, _, _ = self.pull_item(index) h0, w0 = img.shape[:2] # orig hw scale = min(1. * input_h / h0, 1. * input_w / w0) img = cv2.resize( img, (int(w0 * scale), int(h0 * scale)), interpolation=cv2.INTER_LINEAR ) # generate output mosaic image (h, w, c) = img.shape[:3] if i_mosaic == 0: mosaic_img = np.full((input_h * 2, input_w * 2, c), 114, dtype=np.uint8) # suffix l means large image, while s means small image in mosaic aug. 
(l_x1, l_y1, l_x2, l_y2), (s_x1, s_y1, s_x2, s_y2) = get_mosaic_coordinate( mosaic_img, i_mosaic, xc, yc, w, h, input_h, input_w ) mosaic_img[l_y1:l_y2, l_x1:l_x2] = img[s_y1:s_y2, s_x1:s_x2] padw, padh = l_x1 - s_x1, l_y1 - s_y1 labels = _labels.copy() # Normalized xywh to pixel xyxy format if _labels.size > 0: labels[:, 0] = scale * _labels[:, 0] + padw labels[:, 1] = scale * _labels[:, 1] + padh labels[:, 2] = scale * _labels[:, 2] + padw labels[:, 3] = scale * _labels[:, 3] + padh mosaic_labels.append(labels) if mosaic_labels: mosaic_labels = np.concatenate(mosaic_labels, 0) np.clip(mosaic_labels[:, 0], 0, 2 * input_w, out=mosaic_labels[:, 0]) np.clip(mosaic_labels[:, 1], 0, 2 * input_h, out=mosaic_labels[:, 1]) np.clip(mosaic_labels[:, 2], 0, 2 * input_w, out=mosaic_labels[:, 2]) np.clip(mosaic_labels[:, 3], 0, 2 * input_h, out=mosaic_labels[:, 3]) mosaic_img, mosaic_labels = random_affine( mosaic_img, mosaic_labels, target_size=(input_w, input_h), degrees=self.degrees, translate=self.translate, scales=self.scale, shear=self.shear, ) if ( self.enable_mixup and not mosaic_labels.size == 0 and random.random() < self.mixup_prob ): mosaic_img, mosaic_labels = self.mixup(mosaic_img, mosaic_labels, self.input_dim) mix_img, padded_labels, pre_fg_mask, is_inbox_and_incenter = self.preproc(mosaic_img, mosaic_labels, self.input_dim) # ----------------------------------------------------------------- # img_info and img_id are not used for training. # They are also hard to be specified on a mosaic image. # ----------------------------------------------------------------- return mix_img, padded_labels, pre_fg_mask, is_inbox_and_incenter img, label, _, _ = self.pull_item(idx) img, label, pre_fg_mask, is_inbox_and_incenter = self.preproc(img, label, self.input_dim) return img, label, pre_fg_mask, is_inbox_and_incenter def mixup(self, origin_img, origin_labels, input_dim): """ Mixup data augment """ jit_factor = random.uniform(*self.mixup_scale) FLIP = random.uniform(0, 1) > 0.5 cp_labels = np.empty(0) while not cp_labels.size: cp_index = random.randint(0, self.__len__() - 1) cp_labels, _, _ = self.load_anno_from_ids(cp_index) img, cp_labels, _, _ = self.pull_item(cp_index) if len(img.shape) == 3: cp_img = np.ones((input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114 else: cp_img = np.ones(input_dim, dtype=np.uint8) * 114 cp_scale_ratio = min(input_dim[0] / img.shape[0], input_dim[1] / img.shape[1]) resized_img = cv2.resize( img, (int(img.shape[1] * cp_scale_ratio), int(img.shape[0] * cp_scale_ratio)), interpolation=cv2.INTER_LINEAR, ) cp_img[: int(img.shape[0] * cp_scale_ratio), : int(img.shape[1] * cp_scale_ratio)] = resized_img cp_img = cv2.resize( cp_img, (int(cp_img.shape[1] * jit_factor), int(cp_img.shape[0] * jit_factor)), ) cp_scale_ratio *= jit_factor if FLIP: cp_img = cp_img[:, ::-1, :] origin_h, origin_w = cp_img.shape[:2] target_h, target_w = origin_img.shape[:2] padded_img = np.zeros( (max(origin_h, target_h), max(origin_w, target_w), 3), dtype=np.uint8 ) padded_img[:origin_h, :origin_w] = cp_img x_offset, y_offset = 0, 0 if padded_img.shape[0] > target_h: y_offset = random.randint(0, padded_img.shape[0] - target_h - 1) if padded_img.shape[1] > target_w: x_offset = random.randint(0, padded_img.shape[1] - target_w - 1) padded_cropped_img = padded_img[y_offset: y_offset + target_h, x_offset: x_offset + target_w] cp_bboxes_origin_np = adjust_box_anns( cp_labels[:, :4].copy(), cp_scale_ratio, 0, 0, origin_w, origin_h ) if FLIP: cp_bboxes_origin_np[:, 0::2] = (origin_w - 
cp_bboxes_origin_np[:, 0::2][:, ::-1]) cp_bboxes_transformed_np = cp_bboxes_origin_np.copy() cp_bboxes_transformed_np[:, 0::2] = np.clip( cp_bboxes_transformed_np[:, 0::2] - x_offset, 0, target_w ) cp_bboxes_transformed_np[:, 1::2] = np.clip( cp_bboxes_transformed_np[:, 1::2] - y_offset, 0, target_h ) keep_list = box_candidates(cp_bboxes_origin_np.T, cp_bboxes_transformed_np.T, 5) if keep_list.sum() >= 1.0: cls_labels = cp_labels[keep_list, 4:5].copy() box_labels = cp_bboxes_transformed_np[keep_list] labels = np.hstack((box_labels, cls_labels)) origin_labels = np.vstack((origin_labels, labels)) origin_img = origin_img.astype(np.float32) origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype(np.float32) return origin_img.astype(np.uint8), origin_labels def load_anno_from_ids(self, index): """ load annotations via ids """ img_id = self.img_ids[index] im_ann = self.coco.loadImgs(img_id)[0] width = im_ann["width"] height = im_ann["height"] ann_ids = self.coco.getAnnIds(imgIds=img_id) annotations = self.coco.loadAnns(ann_ids) objs = [] for obj in annotations: x1 = np.max((0, obj["bbox"][0])) y1 = np.max((0, obj["bbox"][1])) x2 = np.min((width, x1 + np.max((0, obj["bbox"][2])))) y2 = np.min((height, y1 + np.max((0, obj["bbox"][3])))) if obj["area"] > 0 and x2 >= x1 and y2 >= y1: obj["clean_bbox"] = [x1, y1, x2, y2] objs.append(obj) nums_objs = len(objs) res = np.zeros((nums_objs, 5)) for ix, obj in enumerate(objs): cls = self.cat_ids_to_continuous_ids[obj["category_id"]] res[ix, 0:4] = obj["clean_bbox"] res[ix, 4] = cls r = min(self.img_size[0] / height, self.img_size[1] / width) res[:, :4] *= r img_info = (height, width) resize_info = (int(height * r), int(width * r)) return res, img_info, resize_info def load_resized_img(self, index): """ resize to fix size """ img_id = self.img_ids[index] img_path = self.coco.loadImgs(img_id)[0]["file_name"] img_path = os.path.join(self.root, img_path) img = cv2.imread(img_path) img = np.array(img) r = min(self.img_size[0] / img.shape[0], self.img_size[1] / img.shape[1]) resize_img = cv2.resize( img, (int(img.shape[1] * r), int(img.shape[0] * r)), interpolation=cv2.INTER_LINEAR, ).astype(np.uint8) return resize_img def __getitem__(self, index): if self.is_training: img, labels, pre_fg_mask, is_inbox_and_incenter = self.mosaic_proc(index) return img, labels, pre_fg_mask, is_inbox_and_incenter img, _, img_info, img_id = self.pull_item(index) if self.preproc is not None: img, _ = self.preproc(img, self.input_dim) img = img.astype(np.float32) return img, img_info, img_id def __len__(self): return len(self.img_ids) def create_yolox_dataset(image_dir, anno_path, batch_size, device_num, rank, data_aug=True, is_training=True): """ create yolox dataset """ cv2.setNumThreads(0) if is_training: filter_crowd = False remove_empty_anno = False else: filter_crowd = False remove_empty_anno = False img_size = config.input_size input_dim = img_size if is_training: yolo_dataset = COCOYoloXDataset(root=image_dir, ann_file=anno_path, filter_crowd_anno=filter_crowd, remove_images_without_annotations=remove_empty_anno, is_training=is_training, mosaic=data_aug, eable_mixup=data_aug, enable_mosaic=data_aug, preproc=TrainTransform(config=config), img_size=img_size, input_dim=input_dim) else: yolo_dataset = COCOYoloXDataset( root=image_dir, ann_file=anno_path, filter_crowd_anno=filter_crowd, remove_images_without_annotations=remove_empty_anno, is_training=is_training, mosaic=False, eable_mixup=False, img_size=img_size, input_dim=input_dim, preproc=ValTransform(legacy=False) ) 
cores = multiprocessing.cpu_count() num_parallel_workers = int(cores / device_num) if is_training: dataset_column_names = ["image", "labels", "pre_fg_mask", "is_inbox_and_inCenter"] ds = de.GeneratorDataset(yolo_dataset, column_names=dataset_column_names, num_parallel_workers=min(8, num_parallel_workers), python_multiprocessing=True, shard_id=rank, num_shards=device_num, shuffle=True) ds = ds.batch(batch_size, drop_remainder=True) else: # for val ds = de.GeneratorDataset(yolo_dataset, column_names=["image", "image_shape", "img_id"], num_parallel_workers=min(8, num_parallel_workers), shuffle=False) ds = ds.batch(batch_size, drop_remainder=False) ds = ds.repeat(1) return ds 3.6 模型构建 模型结构图如图所示: #### 3.6.1 基础模块 模型基础组件,BaseConv, DWConv卷积块, Bottleneck、SPPBottleneck、CSPLayer以及Focus, 在后续模型构建中使用 #------------------------# # network blocks #------------------------# class SiLU(nn.Cell): def __init__(self): super(SiLU, self).__init__() self.silu = nn.Sigmoid() def construct(self, x): return x * self.silu(x) def get_activation(name="silu"): """ get the activation function """ if name == "silu": module = SiLU() elif name == "relu": module = nn.ReLU() elif name == "lrelu": module = nn.LeakyReLU(0.1) else: raise AttributeError("Unsupported activate type: {}".format(name)) return module class BaseConv(nn.Cell): """ A conv2d -> BatchNorm -> silu/leaky relu block """ def __init__( self, in_channels, out_channels, ksize, stride, groups=1, bias=False, act="silu"): super(BaseConv, self).__init__() # same padding pad = (ksize - 1) // 2 self.conv = nn.Conv2d( in_channels=in_channels, out_channels=out_channels, kernel_size=ksize, stride=stride, padding=pad, pad_mode="pad", group=groups, has_bias=bias ) self.bn = nn.BatchNorm2d(out_channels) self.act = get_activation(act) def construct(self, x): x = self.act(self.bn(self.conv(x))) return x def use_syc_bn(network): """Use synchronized batchnorm layer""" for _, cell in network.cells_and_names(): if isinstance(cell, BaseConv): out_channels = cell.bn.num_features cell.bn = nn.SyncBatchNorm(out_channels) class DWConv(nn.Cell): """Depthwise Conv + Point Conv""" def __init__(self, in_channels, out_channels, ksize, stride=1, act="silu"): super(DWConv, self).__init__() self.dconv = BaseConv( in_channels, in_channels, ksize=ksize, stride=stride, groups=in_channels, act=act ) self.pconv = BaseConv( in_channels, out_channels, ksize=1, stride=1, groups=1, act=act, ) def construct(self, x): x = self.dconv(x) return self.pconv(x) class Bottleneck(nn.Cell): """ Standard bottleneck """ def __init__( self, in_channels, out_channels, shortcut=True, expansion=0.5, depthwise=False, act="silu" ): super(Bottleneck, self).__init__() hidden_channels = int(out_channels * expansion) Conv = DWConv if depthwise else BaseConv self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act) self.conv2 = Conv(hidden_channels, out_channels, 3, stride=1, act=act) self.use_add = shortcut and in_channels == out_channels def construct(self, x): y = self.conv2(self.conv1(x)) if self.use_add: y = y + x return y class ResLayer(nn.Cell): "Residual layer with `in_channels` inputs." 
def __init__(self, in_channels: int): super().__init__() mid_channels = in_channels // 2 self.layer1 = BaseConv( in_channels, mid_channels, ksize=1, stride=1, act="lrelu" ) self.layer2 = BaseConv( mid_channels, in_channels, ksize=3, stride=1, act="lrelu" ) def construct(self, x): out = self.layer2(self.layer1(x)) return x + out class SPPBottleneck(nn.Cell): """Spatial pyramid pooling layer used in YOLOv3-SPP """ def __init__( self, in_channels, out_channels, kernel_sizes=(5, 9, 13), activation="silu" ): super(SPPBottleneck, self).__init__() hidden_channels = in_channels // 2 self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=activation) self.m = nn.CellList( [ nn.MaxPool2d(kernel_size=ks, stride=1) for ks in kernel_sizes ] ) self.pad0 = ops.Pad(((0, 0), (0, 0), (kernel_sizes[0] // 2, kernel_sizes[0] // 2), (kernel_sizes[0] // 2, kernel_sizes[0] // 2))) self.pad1 = ops.Pad(((0, 0), (0, 0), (kernel_sizes[1] // 2, kernel_sizes[1] // 2), (kernel_sizes[1] // 2, kernel_sizes[1] // 2))) self.pad2 = ops.Pad(((0, 0), (0, 0), (kernel_sizes[2] // 2, kernel_sizes[2] // 2), (kernel_sizes[2] // 2, kernel_sizes[2] // 2))) conv2_channels = hidden_channels * (len(kernel_sizes) + 1) self.conv2 = BaseConv(conv2_channels, out_channels, 1, stride=1, act=activation) def construct(self, x): x = self.conv1(x) op = ops.Concat(axis=1) x1 = self.m[0](self.pad0(x)) x2 = self.m[1](self.pad1(x)) x3 = self.m[2](self.pad2(x)) x = op((x, x1, x2, x3)) x = self.conv2(x) return x class CSPLayer(nn.Cell): """C3 in yolov5, CSP Bottleneck with 3 convolutions""" def __init__( self, in_channels, out_channels, n=1, shortcut=True, expansion=0.5, depthwise=False, act="silu", ): """ Args: in_channels (int): input channels. out_channels (int): output channels. n (int): number of Bottlenecks. Default value: 1. 
""" # ch_in, ch_out, number, shortcut, groups, expansion super().__init__() hidden_channels = int(out_channels * expansion) # hidden channels self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act) self.conv2 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act) self.conv3 = BaseConv(2 * hidden_channels, out_channels, 1, stride=1, act=act) module_list = [ Bottleneck( hidden_channels, hidden_channels, shortcut, 1.0, depthwise, act=act ) for _ in range(n) ] self.m = nn.SequentialCell(module_list) def construct(self, x): x_1 = self.conv1(x) x_2 = self.conv2(x) x_1 = self.m(x_1) op = ops.Concat(axis=1) x = op((x_1, x_2)) return self.conv3(x) class Focus(nn.Cell): """Focus width and height information into channel space.""" def __init__(self, in_channels, out_channels, ksize=1, stride=1, act="silu"): super().__init__() self.conv = BaseConv(in_channels * 4, out_channels, ksize, stride, act=act) def construct(self, x): """ Focus forward """ # shape of x (b,c,w,h) -> y(b,4c,w/2,h/2) patch_top_left = x[..., ::2, ::2] patch_top_right = x[..., ::2, 1::2] patch_bot_left = x[..., 1::2, ::2] patch_bot_right = x[..., 1::2, 1::2] op = ops.Concat(axis=1) x = op( (patch_top_left, patch_bot_left, patch_top_right, patch_bot_right) ) return self.conv(x) 3.6.2 Darknet 两种darknet结构,本代码默认使用Darknet(用于YOLOFPN),需要使用CSPDarknet(用于YOLOPAFPN)需将backbone设置修改为yolox_xtotal_epoch, 两个阶段总共训练的epoch data_dir, coco格式数据集的位置(按照目录存放) per_batch_size, 训练、验证时的batch size num_classes, 数据集对应的种类数 input_size, 输入模型的图像尺寸,设置为32的倍数 no_aug_epochs, total_epoch - max_epoch log_path, 存放eval结果的路径 run_eval, 是否在训练时启用eval eval_interval, eval的频率 classes_path, 存放数据集类别名称的文件位置,需要pred时使用 pred_input, 测试图片位置,需要pred时使用 pred_output, 测试输出位置,需要pred时使用 ckpt_path, 存放输出权重
-
前言基于java使用SDK实现图像识别,主要以媒资图像标签和名人识别为例。一、环境配置Maven(没有直接下载华为的SDK包,而是使用Maven安装依赖)JDK19(官方的SDK包要求JDK版本必须高于JDK8版本,大家根据自己只要满足版本要求即可)开发工具:IDEA 2023.3(其他版本也可)能创建Maven项目即可开通图像识别服务(目前是免费体验):这里我开通的是图像标签/媒资图像标签和名人识别服务。设置访问密钥服务区域:我开通的服务区域是华北-北京四关键步骤Maven项目的创建和Java环境变量的配置我就不再赘诉,这是大家学习java早已熟练掌握的,这里只讲诉易错的。开通图像识别服务 华为云首页就有云产品体验区(找不到就在搜索栏检索),勾选AI: 点击“立即体验”后,找到服务列表,开通你想要的服务(点击开通): 设置访问密钥 在控制台找到“我的凭证”: 找到“访问密钥”,如果没有就新增,新增后一定要下载密钥的CSV文件,他会有提示让你下载,防止你忘记: 下载完csv文件后用记事本打开即可看到AK和SK: Maven引入依赖配置 版本可以自己切换 <dependency> <groupId>com.huaweicloud.sdk</groupId> <artifactId>huaweicloud-sdk-image</artifactId> <version>3.1.8</version> </dependency> <dependency> <groupId>com.alibaba</groupId> <artifactId>fastjson</artifactId> <version>1.2.70</version> </dependency> <dependency> <groupId>org.apache.httpcomponents</groupId> <artifactId>httpcore</artifactId> <version>4.4.16</version> </dependency> <dependency> <groupId>commons-codec</groupId> <artifactId>commons-codec</artifactId> <version>1.16.0</version> </dependency> <dependency> <groupId>commons-io</groupId> <artifactId>commons-io</artifactId> <version>2.13.0</version> </dependency>二、图像识别实例媒资图像标签功能介绍:对用户传入的图像可以返回图像中的物体名称、所属类别及置信度信息。使用图片是网上的,仅作学习使用: 代码如下:/** * @Version: 1.0.0 * @Author: Dragon_王 * @ClassName: RunImageMediaTaggingSolution * @Description: 媒资图像标签 * @Date: 2024/1/8 11:51 */ /** * 此demo仅供测试使用,强烈建议使用SDK * 使用前需配置依赖jar包。jar包可通过下载SDK获取 */ import com.huaweicloud.sdk.core.auth.ICredential; import com.huaweicloud.sdk.core.auth.BasicCredentials; import com.huaweicloud.sdk.core.exception.ConnectionException; import com.huaweicloud.sdk.core.exception.RequestTimeoutException; import com.huaweicloud.sdk.core.exception.ServiceResponseException; import com.huaweicloud.sdk.image.v2.region.ImageRegion; import com.huaweicloud.sdk.image.v2.*; import com.huaweicloud.sdk.image.v2.model.*; public class RunImageMediaTaggingSolution { public static void main(String[] args) { //此处需要输入您的AK/SK信息 String ak = "你的AK"; String sk = "你的SK"; ICredential auth = new BasicCredentials() .withAk(ak) .withSk(sk); ImageClient client = ImageClient.newBuilder() .withCredential(auth) .withRegion(ImageRegion.valueOf("cn-north-4")) //此处替换为您开通服务的区域 .build(); RunImageMediaTaggingRequest request = new RunImageMediaTaggingRequest(); ImageMediaTaggingReq body = new ImageMediaTaggingReq(); body.withThreshold(10f); body.withLanguage("zh"); body.withUrl("https://tse2-mm.cn.bing.net/th/id/OIP-C.SIuEnb1-arhtDNqfdICVqAHaE7?rs=1&pid=ImgDetMain"); //此处替换为公网可以访问的图片地址 request.withBody(body); try { RunImageMediaTaggingResponse response = client.runImageMediaTagging(request); System.out.println(response.toString()); } catch (ConnectionException e) { e.printStackTrace(); } catch (RequestTimeoutException e) { e.printStackTrace(); } catch (ServiceResponseException e) { e.printStackTrace(); System.out.println(e.getHttpStatusCode()); System.out.println(e.getErrorCode()); System.out.println(e.getErrorMsg()); } } }运行结果: //运行结果如下 class RunImageMediaTaggingResponse { result: class ImageMediaTaggingResponseResult { tags: [class ImageMediaTaggingItemBody { confidence: 83.63 type: 动物 tag: 金毛犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 金毛犬 en: Golden retriever } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 81.78 type: 动物 tag: 金毛 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 金毛 en: Golden hair } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 
77.00 type: 动物 tag: 金毛寻猎犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 金毛寻猎犬 en: Golden Retriever } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 62.60 type: 动物 tag: 贵妇犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 贵妇犬 en: Poodle } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 59.02 type: 生活 tag: 狗链 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 狗链 en: Dog chain } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 生活 en: Life } instances: [] }, class ImageMediaTaggingItemBody { confidence: 53.84 type: 动物 tag: 宠物狗 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 宠物狗 en: Pet dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 48.01 type: 动物 tag: 狗狗 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 狗狗 en: Dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 44.02 type: 动物 tag: 犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 犬 en: Dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 42.11 type: 动物 tag: 纯种犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 纯种犬 en: Purebred dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }, class ImageMediaTaggingItemBody { confidence: 38.65 type: 动物 tag: 中华田园犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 中华田园犬 en: Chinese pastoral dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }] } } Process finished with exit code 0名人识别功能介绍:分析并识别图片中包含的敏感人物、明星及网红人物,返回人物信息及人脸坐标。使用照片是网上的照片,仅作学习使用: 代码如下:/** * @Version: 1.0.0 * @Author: Dragon_王 * @ClassName: RunCelebrityRecognitionSolution * @Description: 媒资标签 * @Date: 2024/1/9 16:23 */ import com.alibaba.fastjson.JSON; import com.huaweicloud.sdk.core.auth.ICredential; import com.huaweicloud.sdk.core.auth.BasicCredentials; import com.huaweicloud.sdk.core.exception.ConnectionException; import com.huaweicloud.sdk.core.exception.RequestTimeoutException; import com.huaweicloud.sdk.core.exception.ServiceResponseException; import com.huaweicloud.sdk.image.v2.ImageClient; import com.huaweicloud.sdk.image.v2.model.RunCelebrityRecognitionRequest; import com.huaweicloud.sdk.image.v2.region.ImageRegion; import com.huaweicloud.sdk.image.v2.model.CelebrityRecognitionReq; import com.huaweicloud.sdk.image.v2.model.RunCelebrityRecognitionResponse; public class RunCelebrityRecognitionSolution { public static void main(String[] args) { // 认证用的ak和sk硬编码到代码中或者明文存储都有很大的安全风险,建议在配置文件或者环境变量中密文存放,使用时解密,确保安全 // 本示例以ak和sk保存在环境变量中来实现身份验证为例,运行本示例前请先在本地环境中设置环境变量HUAWEICLOUD_SDK_AK和HUAWEICLOUD_SDK_SK String ak = "你的AK"; String sk = "你的SK"; ICredential auth = new BasicCredentials() .withAk(ak) .withSk(sk); ImageClient client = ImageClient.newBuilder() .withCredential(auth) .withRegion(ImageRegion.valueOf("cn-north-4")) //此处替换为您开通服务的区域 .build(); RunCelebrityRecognitionRequest request = new RunCelebrityRecognitionRequest(); CelebrityRecognitionReq body = new CelebrityRecognitionReq(); body.withThreshold(0f); body.withUrl("https://tse1-mm.cn.bing.net/th/id/OIP-C.tM6jifW1xaCDP7Kia9QiYwHaKD?rs=1&pid=ImgDetMain"); //此处替换为公网可以访问的图片地址 request.withBody(body); try { 
            RunCelebrityRecognitionResponse response = client.runCelebrityRecognition(request);
            System.out.println(response.getHttpStatusCode());
            System.out.println(JSON.toJSONString(response));
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}

Running result:
200
{"httpStatusCode":200,"result":[{"confidence":0.9985551,"faceDetail":{"w":132,"h":186,"x":197,"y":79},"label":"成龙"}]}
Process finished with exit code 0

Summary: the above covers calling Huawei Cloud's AI image recognition services (media image tagging and celebrity recognition); the official documentation is provided here for further reference.
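For quick post-processing of tagging results like the ones printed earlier in this post, a small sketch is shown below. The tag/confidence values are copied from the RunImageMediaTagging output above; the 60-point cutoff is an arbitrary example of applying a stricter threshold than the `withThreshold(10f)` used in the request.

```python
# Sketch: the tags printed above, reduced to (tag, confidence) pairs for post-processing.
# Values are copied from the RunImageMediaTagging output; the structure is deliberately simplified.
tags = [
    ("金毛犬", 83.63), ("金毛", 81.78), ("金毛寻猎犬", 77.00), ("贵妇犬", 62.60),
    ("狗链", 59.02), ("宠物狗", 53.84), ("狗狗", 48.01), ("犬", 44.02),
    ("纯种犬", 42.11), ("中华田园犬", 38.65),
]
keep = [(t, c) for t, c in tags if c >= 60]   # keep only high-confidence tags
print(keep)  # [('金毛犬', 83.63), ('金毛', 81.78), ('金毛寻猎犬', 77.0), ('贵妇犬', 62.6)]
```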
-
Garbage classification based on Huawei Cloud ModelArts
I. Project background
To protect the environment and save resources, the state has introduced garbage-sorting policies, but manual sorting still faces many difficulties. This project uses Huawei Cloud ModelArts automatic learning (自动学习) to implement garbage classification.
Huawei Cloud automatic learning is a low-barrier, highly flexible, zero-code tool for developing custom models and building AI applications. Based on the labeled data, it automatically designs the model, tunes hyperparameters, trains, compresses, and deploys it. Developers need no professional development background or coding skills: simply upload the data and follow the guided UI to complete model training and deployment. Applying this automatic-learning capability to a custom image classification task makes garbage classification achievable.
II. Project steps
1. Development environment preparation
(1) Register a Huawei Cloud account and add an access key, then proceed to the next step.
(2) Create a ModelArts image classification project.
(3) Create an OBS bucket, and create an input folder and an output folder inside it.
(4) Request the dataset resources.
2. Data labeling
(1) Upload the dataset obtained above; it contains two sets of images, a training set and a test set.
(2) The training set contains many kinds of garbage; label the images by garbage type, e.g. "disposable lunch box - other waste", "soda can - recyclable".
(3) Once labeling is complete, start automatic training.
(4) After training finishes, deploy the model as an online service, upload the test set, and click Predict to run predictions.
(5) After finishing the lab, delete the automatic learning project and the OBS resources.
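Step (4) above runs predictions from the console. For reference, a deployed online service can also be called over HTTP; the sketch below is only illustrative, and the endpoint URL, token, and multipart field name are placeholders that must be replaced with the values shown on the service's own usage page.

```python
# Illustrative only: POST one test image to a deployed ModelArts online service.
# ENDPOINT, TOKEN, the file path and the field name "images" are placeholders, not real values.
import requests

ENDPOINT = "https://<modelarts-inference-endpoint>/v1/infers/<service-id>"  # placeholder
TOKEN = "<iam-token>"                                                        # placeholder

with open("test/can_001.jpg", "rb") as f:                 # placeholder test image
    resp = requests.post(
        ENDPOINT,
        headers={"X-Auth-Token": TOKEN},                  # IAM token authentication
        files={"images": f},                              # field name depends on the model's input config
        timeout=30,
    )
print(resp.status_code, resp.json())                      # expected: predicted label and confidence
```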
-
I've won awards in this kind of competition before and have done some Ascend hardware projects, so the two can be combined. If you're in, add me.
-
This training run keeps reporting an error, something about adjusting the input/output functions.
-
About the image tagging API response: why do some photos return "instances": [], while others return "instances": [{"bounding_box": {"height": 198.77829749767596,"top_left_x": 2.7452523158146787,"top_left_y": 15.79517181103046,"width": 211.2192412156325},"confidence": "98.91"}]? Exactly what kind of photo produces results in this instances field?
-
{ "result": { "tags": [ { "confidence": "97.16", "i18n_tag": { "en": "Rabbit", "zh": "兔" }, "i18n_type": { "en": "Animal", "zh": "动物" }, "instances": [], "tag": "兔", "type": "动物" } ] }}
-
Happy start to the Year of the Rabbit! Here is your back-to-work red packet: Huawei Cloud 开天aPaaS is handing out New Year gifts! Episode 2 of the 《大话aPaaS》 explainer series is now live on Bilibili. This episode shows how to use the Huawei Cloud 开天集成工作台 to generate New Year gifts for friends and family. If you are interested, complete the homework for a chance to win a Huawei speaker, a Logitech mouse, a folding bag, a USB drive, a Yunbao plush toy, and other prizes. Click through to the video to find out more and have fun!
*Parameters needed for the flow orchestration in the video: 佐糖 (PicWish) image restoration X-API-KEY: wxffglo10hgj1rpg7 (limited number of calls; if it runs out, register with a phone number to get your own API KEY at https://picwish.cn/api-account); Baidu AI Cloud client_id: OaqGjPWahSroKqNrltGiOYGK; Baidu AI Cloud client_secret: BlUpXoyaR3UkkcDQ7k88YW6RMWVbhAqj; Baidu AI Cloud grant_type: client_credentials
*Code needed to create the function: var url = inputData.url; var url1 = encodeURIComponent(url); let result = url1; return result;
[Homework] If you are interested, you can also use the Huawei Cloud 开天集成工作台 to create a connector and orchestrate a flow for any scenario you like: any topic, any number of entries, free creation. When done, send screenshots of the flow run log and the run result to zhouyao46@h-partners.com. Entries will be judged on creativity, feasibility, and other dimensions, and we will send out Huawei speakers, Logitech mice, folding bags, USB drives, Yunbao plush toys, and other prizes. There is no deadline; submissions are always welcome.
PS: 《大话aPaaS》 is an explainer series launched on Bilibili by the 开天aPaaS operations team that introduces 开天aPaaS and its application scenarios in a light, playful way. This month's release is Episode 2. Previous episode: 《大话aPaaS》EP1: It's 2023, I won't allow you to still not know about Huawei Cloud 开天aPaaS! Please share and follow; more highlights are linked here.
Other related recommendations: How to use the integration workbench >> User guide; 3 short videos to get started with the integration workbench >> 01 - How to create an automated flow from a template; 02 - Create a flow from scratch, starting with a connector; 03 - The difference between manually triggered flows and automated flows
-
But I have already downloaded the MNIST dataset and put it into an OBS bucket, in the dataset folder shown in the figure below, split into train and test subfolders. In theory the training set should be found and training should start. The boot file is train-lenet.py, with data_path = "num-lenet/dataset/", and the data loading points at the train folder, but the data just cannot be found. I am not sure how this should be handled on the ModelArts platform and hope someone can explain. My code was downloaded from Gitee; this is a Huawei lab exercise, but I could not find the official run script, so I downloaded it from Gitee instead.
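A common pattern on ModelArts is to copy the OBS directory to local or /cache storage with moxing at the start of the boot script, then point data_path at the local copy. The sketch below assumes the dataset sits at num-lenet/dataset/ as in the post; the bucket name is a placeholder.

```python
# Sketch only: copy the OBS dataset to local storage before building the dataset on ModelArts.
# "my-bucket" is a placeholder bucket name; "num-lenet/dataset/" comes from the post above.
import os
import moxing as mox  # moxing is available inside ModelArts training environments

obs_data_url = "obs://my-bucket/num-lenet/dataset/"   # hypothetical OBS path
local_data_path = "/cache/dataset/"

mox.file.copy_parallel(obs_data_url, local_data_path)  # recursive copy OBS -> local disk
print(os.listdir(local_data_path))                     # expect: ['train', 'test']

# Then have the training script read the local copy, e.g. data_path = "/cache/dataset/"
```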
-
I currently want to run an image recognition workload on the Atlas 200 DK, so I took the following steps: 1. Trained a CNN model M (.h5) on MNIST with TensorFlow 2.10.0; 2. Converted M (.h5) into a frozen graph Frozen_M (.pb) that the Atlas 200 DK can work with; 3. On the Atlas 200 DK, converted Frozen_M into npu_M (.om) with ATC, then used ACL to run inference with the .om file and measured the inference speed NPU_speed; 4. On the Atlas 200 DK, loaded M (.h5) directly with TensorFlow and ran inference on the CPU, measuring CPU_speed. The final result is that NPU_speed is not only barely better than CPU_speed but sometimes worse, as shown in Figure 1; NPU utilization while running .om inference is shown in Figure 2. Posts online report that on the Atlas 200 DK, YOLO object detection runs 20-300x faster on the NPU than on the CPU, so why can't the NPU show its performance on a CNN image classification task? Figure 1: the upper box is NPU_speed, the lower box is CPU_speed. Figure 2: NPU memory consumption and core utilization.
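One possible explanation (offered as a hypothesis, not a diagnosis) is that a LeNet-scale MNIST model does so little compute per image that per-call overhead, such as host-device copies and single-image invocations, dominates the measurement. The sketch below shows a comparison harness that warms up first and feeds batched inputs; npu_infer and cpu_infer are hypothetical wrappers around the ACL and TensorFlow paths and are not defined here.

```python
# Sketch: a fairer latency comparison for a small model, assuming `infer(batch)` wraps
# either the ACL .om call or the TensorFlow CPU call (both hypothetical placeholders).
import time
import numpy as np

def benchmark(infer, batch, warmup=20, iters=200):
    for _ in range(warmup):              # warm-up: exclude one-time graph/stream setup
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0      # mean latency in ms per batched call

batch = np.random.rand(32, 28, 28, 1).astype(np.float32)  # batching amortizes per-call overhead
# print(benchmark(npu_infer, batch), benchmark(cpu_infer, batch))
```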
-
Ascend chips, CANN, a China-developed AI platform, the 200DK developer board
-
Borrowing a phrase from Li Qingzhao's ci poem 《如梦令》 (Like a Dream).
-
Parameters: Field, the happy girl, the flowers, Fairy tale style, full-color.