• [问题求助] python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);" 报错
    安装了910b的cann,RC2,安装了torch2.3.1和torch_npu2.3.1,python3.10.15,一直报错python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);" 报错  [W compiler_depend.ts:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator()) EH9999: Inner Error! EH9999  [Init][Env]init env failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]         TraceBack (most recent call last):         build op model failed, result = 500001[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]  Traceback (most recent call last):   File "<string>", line 1, in <module>   File "/root/.local/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__     return torch._tensor_str._str(self, tensor_contents=tensor_contents)   File "/root/.local/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str     return _str_intern(self, tensor_contents=tensor_contents)   File "/root/.local/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern     tensor_str = _tensor_str(self, indent)   File "/root/.local/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str     formatter = _Formatter(get_summarized_data(self) if summarize else self)   File "/root/.local/lib/python3.10/site-packages/torch/_tensor_str.py", line 138, in __init__     tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0) RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is IsFinite. Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1. [ERROR] 2024-11-21-10:49:01 (PID:20848, Device:0, RankID:0) ERR00100 PTA call acl api failed 
  • [问题求助] 第九期问题求助:给的测试集图片可以直接标注后拿来训练吗
    第九期问题求助:给的4000多张测试集图片可以直接标注后拿来训练吗
  • [其他] 关于手机拍照的一些有趣想法
    最近忽然有个有趣的想法,关于手机应用的。 概况:目前拍照是手机售卖时竞争点,但往往比拼的是用了什么先进的硬件。而消费者的痛点是拿着好手机拍不出好照片。 思考:如果能用一个AI辅助工具帮助消费者拍出好照片,会很有市场。 目前竞品:目前的照相辅助app都是通过AI调节色度等来优化图片 具体想法:将网友旅游打卡照片在地图上实时定位(照片天气、拍摄时间同拍摄者相同才会在地图显示),APP使用者只用选择喜欢的照片,导航到该位置,然后按APP上自己喜欢的照片角度拍照,直接解决不会构图,不会控制光线的问题,再借鉴整合现有成熟的AI调色,就可以拍出完美照片。 优点:帮助消费者在所有景区经典打卡地拍出超出自身审美的优秀照片;如果能变成手机自身功能之一,可以降低手机对拍照硬件的比拼需求,降低成本。困难:需要抓取网络拍摄的优秀照片并将照片与地图结合,需要控制相机部件的调焦(在征求拍照人意见后)
  • [训练管理] yolov5训练报错,如何更换环境中的numpy版本
    我在进行yolov5模型训练时,得到了报错,TypeError: 'numpy.float64' object cannot be interpreted as an integer,根据CSDN上的回答,我尝试修改numpy版本,但依旧不成功。我查看日志发现,似乎知识成功下载了numpy:Looking in indexes: http://pip.modelarts.private.com:8888/repository/pypi/simple Collecting numpy==1.16.4   Downloading http://pip.modelarts.private.com:8888/repository/pypi/packages/numpy/1.16.4/numpy-1.16.4.zip (5.1 MB) Requirement already satisfied: pillow in /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages (from -r /home/ma-user/modelarts/user-job-dir/YOLOv5/requirements.txt (line 2)) (9.5.0),但却并没能成功安装,依旧是原来的版本Requirement already satisfied: numpy>=1.16.4 in /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages (from moxing-framework==2.1.6.879ab2f4) (1.21.2)请问我要如何才能正确修改numpy版本为我指定的版本呢?或者有没有别的办法来解决这个报错呢?
  • [训练管理] Modelarts上训练yolov5模型报错
    我是在gitee上下载的2.1版的官方mindspore yolov5模型代码,模型代码链接是cid:link_0,修改了配置文件来适应我的数据集,即配置文件有关数据集的部分被改动了(只改动了数据路径的部分,没有被删掉的项)(我采用了自己的数据集,标注格式和coco2017相同,数据集和和模型都存储在obs上),其他都没有变化,但在训练的时候出现这个错误:TypeError: modelarts_pre_process() missing 1 required positional argument: 'args'请问有人知道问题在哪里吗?我应该怎么进行修改呢?
  • [数据管理] 数据集
    求求大佬们教教我,PCB的数据集咋弄阿。感觉我的模型没问题的。
  • [问题求助] CANN ImportError: libhccl.so: cannot open shared object file
    python=3.8, torch=2.1.0 torch-npu=2.1.0,CANN_toolkit=7.0.1.1,版本应该都是对应的python是miniconda下的环境,有source set_env.sh ,可以find到libhccl.so这个文件,但是export LD_LIBRARY_PATH没有用报错信息:Traceback (most recent call last):   File "/home/x/miniconda3/envs/pytorch38/lib/python3.8/site-packages/torch_npu/__init__.py", line 14, in <module>     import torch_npu.npu   File "/home/x/miniconda3/envs/pytorch38/lib/python3.8/site-packages/torch_npu/npu/__init__.py", line 110, in <module>     from .utils import (synchronize, device_count, can_device_access_peer, set_device, current_device, get_device_name,   File "/home/x/miniconda3/envs/pytorch38/lib/python3.8/site-packages/torch_npu/npu/utils.py", line 11, in <module>     import torch_npu._CImportError: libhccl.so: cannot open shared object file: No such file or directory. Please check that the cann package is installed. Please run 'source set_env.sh' in the CANN installation path.
  • [训练管理] 训练作业的过程中怎么显示没有medpy模块报错,应该怎么解决?
    训练作业的过程中怎么显示没有这个模块报错,应该怎么解决?
  • [其他问题] loss曲线怎么调出来
    loss曲线和accuracy曲线是从哪里看啊
  • [问题求助] mdc 编码问题
    请问下文档中的ImageData结构体中的rawData 图像数据指针 mbufData Mbuf数据指针 这两个数据有啥区别呢?,是1)、里面存的东西是一样的,只是存储的位置不一样?2)、里面的数据不一样,存储的位置也不一样。文档中没有详细的说明。
  • [其他] YOLOv5(2)
     2.6 学习率  提供了4种不同的学习率形式选择,分别是:exponential,cosine_annealing,cosine_annealing_V2,cosine_annealing_sample。这里选择的是cosine_annealing  """Learning rate scheduler.""" import math from collections import Counter  import numpy as np  def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):     """Linear learning rate."""     lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)     lr = float(init_lr) + lr_inc * current_step     return lr  def warmup_step_lr(lr, lr_epochs, steps_per_epoch, warmup_epochs, max_epoch, gamma=0.1):     """Warmup step learning rate."""     base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)     milestones = lr_epochs     milestones_steps = []     for milestone in milestones:         milestones_step = milestone * steps_per_epoch         milestones_steps.append(milestones_step)      lr_each_step = []     lr = base_lr     milestones_steps_counter = Counter(milestones_steps)     for i in range(total_steps):         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             lr = lr * gamma**milestones_steps_counter[i]         lr_each_step.append(lr)      return np.array(lr_each_step).astype(np.float32)  def multi_step_lr(lr, milestones, steps_per_epoch, max_epoch, gamma=0.1):     return warmup_step_lr(lr, milestones, steps_per_epoch, 0, max_epoch, gamma=gamma)  def step_lr(lr, epoch_size, steps_per_epoch, max_epoch, gamma=0.1):     lr_epochs = []     for i in range(1, max_epoch):         if i % epoch_size == 0:             lr_epochs.append(i)     return multi_step_lr(lr, lr_epochs, steps_per_epoch, max_epoch, gamma=gamma)  def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):     """Cosine annealing learning rate."""     base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)      lr_each_step = []     for i in range(total_steps):         last_epoch = i // steps_per_epoch         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2         lr_each_step.append(lr)      return np.array(lr_each_step).astype(np.float32)  def warmup_cosine_annealing_lr_V2(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):     """Cosine annealing learning rate V2."""     base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)      last_lr = 0     last_epoch_V1 = 0      T_max_V2 = int(max_epoch * 1 / 3)      lr_each_step = []     for i in range(total_steps):         last_epoch = i // steps_per_epoch         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             if i < total_steps * 2 / 3:                 lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2                 last_lr = lr                 last_epoch_V1 = last_epoch             else:                 base_lr = last_lr                 last_epoch = last_epoch - last_epoch_V1                 lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max_V2)) / 2          lr_each_step.append(lr)     return np.array(lr_each_step).astype(np.float32)  def warmup_cosine_annealing_lr_sample(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):     """Warmup cosine annealing learning rate."""     start_sample_epoch = 60     step_sample = 2     tobe_sampled_epoch = 60     end_sampled_epoch = start_sample_epoch + step_sample * tobe_sampled_epoch     max_sampled_epoch = max_epoch + tobe_sampled_epoch     T_max = max_sampled_epoch      base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     total_sampled_steps = int(max_sampled_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)      lr_each_step = []      for i in range(total_sampled_steps):         last_epoch = i // steps_per_epoch         if last_epoch in range(start_sample_epoch, end_sampled_epoch, step_sample):             continue         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2         lr_each_step.append(lr)      assert total_steps == len(lr_each_step)     return np.array(lr_each_step).astype(np.float32)  def get_lr(args, steps_per_epoch):     """generate learning rate."""     if args.lr_scheduler == 'exponential':         lr = warmup_step_lr(args.lr, args.lr_epochs, steps_per_epoch, args.warmup_epochs, args.max_epoch,                             gamma=args.lr_gamma)     elif args.lr_scheduler == 'cosine_annealing':         lr = warmup_cosine_annealing_lr(args.lr, steps_per_epoch, args.warmup_epochs,                                         args.max_epoch, args.T_max, args.eta_min)     elif args.lr_scheduler == 'cosine_annealing_V2':         lr = warmup_cosine_annealing_lr_V2(args.lr, steps_per_epoch, args.warmup_epochs,                                            args.max_epoch, args.T_max, args.eta_min)     elif args.lr_scheduler == 'cosine_annealing_sample':         lr = warmup_cosine_annealing_lr_sample(args.lr, steps_per_epoch, args.warmup_epochs,                                                args.max_epoch, args.T_max, args.eta_min)     else:         raise NotImplementedError(args.lr_scheduler)     return lr 2.7 模型训练 import os import time import mindspore as ms import mindspore.nn as nn import mindspore.communication as comm  from src.logger import get_logger from src.util import AverageMeter, get_param_groups, cpu_affinity from src.initializer import default_recurisive_init, load_yolov5_params  ms.set_seed(1)  def train_preprocess():     if args.lr_scheduler == 'cosine_annealing' and args.max_epoch > args.T_max:         args.T_max = args.max_epoch      args.lr_epochs = list(map(int, args.lr_epochs.split(',')))     args.data_root = os.path.join(args.data_dir, 'train2017')     args.annFile = os.path.join(args.data_dir, 'annotations/instances_train2017.json')     if args.pretrained_checkpoint:         args.pretrained_checkpoint = args.pretrained_checkpoint     args.device_id = int(os.getenv('DEVICE_ID', '1'))     ms.set_context(mode=ms.GRAPH_MODE, device_target=args.device_target)      if args.is_distributed:         # init distributed         init_distribute()      # for promoting performance in GPU device     if args.device_target == "GPU" and args.bind_cpu:         cpu_affinity(args.rank, min(args.group_size, 8))          # logger module is managed by config, it is used in other function. e.x. config.logger.info("xxx")     args.logger = get_logger(args.output_dir, args.rank)   def run_train():     train_preprocess()      loss_meter = AverageMeter('loss')     dict_version = {'yolov5s': 0, 'yolov5m': 1, 'yolov5l': 2, 'yolov5x': 3}     network = YOLOV5(is_training=True, version=dict_version[args.yolov5_version])     # default is kaiming-normal     default_recurisive_init(network)     load_yolov5_params(args, network)     network = YoloWithLossCell(network)      ds = create_yolo_dataset(image_dir=args.data_root, anno_path=args.annFile, is_training=True,                              batch_size=args.per_batch_size, device_num=args.group_size,                              rank=args.rank, config=args)     args.logger.info('Finish loading dataset')      steps_per_epoch = ds.get_dataset_size()     lr = get_lr(args, steps_per_epoch)     opt = nn.Momentum(params=get_param_groups(network), momentum=args.momentum, learning_rate=ms.Tensor(lr),                       weight_decay=args.weight_decay, loss_scale=args.loss_scale)     network = nn.TrainOneStepCell(network, opt, args.loss_scale // 2)     network.set_train()      data_loader = ds.create_tuple_iterator(do_copy=False)     first_step = True     t_end = time.time()      for epoch_idx in range(args.max_epoch):         for step_idx, data in enumerate(data_loader):             images = data[0]             input_shape = images.shape[2:4]             input_shape = ms.Tensor(tuple(input_shape[::-1]), ms.float32)             loss = network(images, data[2], data[3], data[4], data[5], data[6],                            data[7], input_shape)             loss_meter.update(loss.asnumpy())              # it is used for loss, performance output per config.log_interval steps.             if (epoch_idx * steps_per_epoch + step_idx) % args.log_interval == 0:                 time_used = time.time() - t_end                 if first_step:                     fps = args.per_batch_size * args.group_size / time_used                     per_step_time = time_used * 1000                     first_step = False                 else:                     fps = args.per_batch_size * args.log_interval * args.group_size / time_used                     per_step_time = time_used / args.log_interval * 1000                 args.logger.info('epoch[{}], iter[{}], {}, fps:{:.2f} imgs/sec, '                                    'lr:{}, per step time: {}ms'.format(                                     epoch_idx + 1, step_idx + 1, loss_meter, fps,                                      lr[epoch_idx * steps_per_epoch + step_idx], per_step_time))                 t_end = time.time()                 loss_meter.reset()         if args.rank == 0:             ckpt_name = os.path.join(args.output_dir, "yolov5_{}_{}.ckpt".format(epoch_idx + 1, steps_per_epoch))             ms.save_checkpoint(network, ckpt_name)      args.logger.info('==========end training===============')  if __name__ == "__main__":     run_train() (一大堆结果)2.8 模型测试 import os import time  import numpy as np  import mindspore as ms # from model_utils.config import config # from src.yolo import YOLOV5 from src.logger import get_logger from src.util import DetectionEngine # from src.yolo_dataset import create_yolo_dataset  from src.util import DetectionEngine  def eval_preprocess():     args.data_root = os.path.join(args.data_dir, 'val2017')     args.ann_file = os.path.join(args.data_dir, 'annotations/instances_val2017.json')     ms.set_context(mode=ms.GRAPH_MODE, device_target=args.device_target)      # logger module is managed by config, it is used in other function. e.x. config.logger.info("xxx")     args.logger = get_logger(args.output_dir, args.rank)  def load_parameters(network, filename):     args.logger.info("yolov5 pretrained network model: %s", filename)     param_dict = ms.load_checkpoint(filename)     param_dict_new = {}     for key, values in param_dict.items():         if key.startswith('moments.'):             continue         elif key.startswith('yolo_network.'):             param_dict_new[key[13:]] = values         else:             param_dict_new[key] = values     ms.load_param_into_net(network, param_dict_new)     args.logger.info('load_model %s success', filename)  def run_eval():     eval_preprocess()     start_time = time.time()     args.logger.info('Creating Network....')     dict_version = {'yolov5s': 0, 'yolov5m': 1, 'yolov5l': 2, 'yolov5x': 3}     network = YOLOV5(is_training=False, version=dict_version[args.yolov5_version])      if os.path.isfile(args.pretrained):         load_parameters(network, args.pretrained)     else:         raise FileNotFoundError(f"{args.pretrained} is not a filename.")      ds = create_yolo_dataset(args.data_root, args.ann_file, is_training=False, batch_size=args.per_batch_size,                              device_num=1, rank=0, shuffle=False, config=args)      args.logger.info('testing shape : %s', args.test_img_shape)     args.logger.info('total %d images to eval', ds.get_dataset_size() * args.per_batch_size)      network.set_train(False)      # init detection engine     detection = DetectionEngine(args, args.test_ignore_threshold)      input_shape = ms.Tensor(tuple(args.test_img_shape), ms.float32)     args.logger.info('Start inference....')     for index, data in enumerate(ds.create_dict_iterator(output_numpy=True, num_epochs=1)):         image = data["image"]         # adapt network shape of input data         image = np.concatenate((image[..., ::2, ::2], image[..., 1::2, ::2],                                 image[..., ::2, 1::2], image[..., 1::2, 1::2]), axis=1)         image = ms.Tensor(image)         image_shape_ = data["image_shape"]         image_id_ = data["img_id"]         output_big, output_me, output_small = network(image, input_shape)         output_big = output_big.asnumpy()         output_me = output_me.asnumpy()         output_small = output_small.asnumpy()         detection.detect([output_small, output_me, output_big], args.per_batch_size, image_shape_, image_id_)          if index % 50 == 0:             args.logger.info('Processing... {:.2f}% '.format(index / ds.get_dataset_size() * 100))      args.logger.info('Calculating mAP...')     detection.do_nms_for_results()     result_file_path = detection.write_result()     args.logger.info('result file path: %s', result_file_path)     eval_result = detection.get_eval_result()      cost_time = time.time() - start_time     eval_log_string = '\n=============coco eval result=========\n' + eval_result     args.logger.info(eval_log_string)     args.logger.info('testing cost time %.2f h', cost_time / 3600.)  if __name__ == "__main__":     run_eval() (一大堆结果)
  • [其他] YOLOv5(1)
     1、算法介绍 YOLOv5是一种单阶段目标检测算法,该算法在YOLOv4的基础上添加了一些新的改进思路,使其速度与精度都得到了极大的性能提升。需要说明的是,YOLOv5没有论文,其作者是Mosaic Augmentation 的创造者,YOLOv5在gtihub上的链接为:https://github.com/ultralytics/yolov5  1.1 模型结构: YOLOv5网络模型主要有四个部分组成,分别为:输入端,Backbone,Neck,Prediction。和YOLOv4相比,YOLOv5做了一些优化,主要有(1) 输入端:Mosaic数据增强,(2)Backbone:Focus结构,CSP结构, (3)Neck:FPN+PAN结构,(4)Prediction:GIOU_Loss。下面是YOLOv5的整体网络结构图: %E5%9B%BE%E7%89%87-4.png  2、模型实现 2.1 环境准备与数据读取  案例基于MindSpore1.8的GPU版本实现,在GXT1080TI单卡上完成训练。 案例使用数据集为coco_mini,是从COCO数据集中分离出来的一小部分数据,其中训练集50张图像,测试集10张图像,数据格式为图像和json文件。coco_mini数据集的下载链接为: https://pan.baidu.com/s/1FJ_Css0KoXqKqifmUzmBUw 提取码: g55f。下载好的数据集包括3个文件,分别对应数据标签,训练数据,测试数据,文件路径结构如下:    .datasets/     └── coco_mini_dataset         ├── annotations             ├──instances_train2017.json             └──instances_val2017.json          ├── train2017         └── val2017          > 下面是数据的可视化展示:  import numpy as np import matplotlib import os import glob import math from PIL import Image, ImageSequence from matplotlib import pyplot as plt  #显示下载好的数据 train_image_path = "dataset/mini_coco_dataset/train2017/"  image = [] for root, dirs, files in os.walk(train_image_path):     for i in range(6):         image.append(files[i])  def show_image(image_list,num = 6):     '''     #image_list: 图像序列,numpy数组     #num: 显示图片的数量     '''     img_titles = []     for ind,img in enumerate(image_list):         if ind == num:             break         img_titles.append(ind)      for i in range(len(img_titles)):         if len(img_titles) > 6:             row = 3         elif 3<len(img_titles)<=6:             row = 2         else:             row = 1         col = math.ceil(len(img_titles)/row)         plt.subplot(row,col,i+1),plt.imshow(Image.open(os.path.join(train_image_path, image[i])))         plt.title(img_titles[i])         plt.xticks([]),plt.yticks([])     plt.show()      show_image(image,num=4) 2.2 参数定义(包括lr,epoch, pretrained_checkpoints) import argparse import mindspore as ms  import sys sys.argv=[''] del sys  ms.set_seed(1)  parser = argparse.ArgumentParser('mindspore coco training')  # device related parser.add_argument('--device_target', type=str, default='GPU', help='device where the code will be implemented.')  # dataset related parser.add_argument('--data_dir', default='./dataset/mini_coco_dataset/', type=str, help='Train dataset directory.') parser.add_argument('--output_dir', default='./output', type=str, help='output') parser.add_argument('--pretrained_checkpoint', default='',type=str, help='pretrained_checkpoint') parser.add_argument('--per_batch_size', default=8, type=int, help='Batch size for Training. Default: 8')  # network related parser.add_argument('--yolov5_version', default='yolov5s', type=str,                     help='The version of YOLOv5, options: yolov5s, yolov5m, yolov5l, yolov5x') parser.add_argument('--pretrained_backbone', default='', type=str, help='The pretrained file of yolov5. Default: "".') parser.add_argument('--resume_yolov5', default='', type=str,                     help='The ckpt file of YOLOv5, which used to fine tune. Default: ""')  # optimizer and lr related parser.add_argument('--lr_scheduler', default='cosine_annealing', type=str,                     help='Learning rate scheduler, options: exponential, cosine_annealing. Default: exponential') parser.add_argument('--lr', default=0.0005, type=float, help='Learning rate. Default: 0.01') parser.add_argument('--lr_epochs', type=str, default='220,250',                     help='Epoch of changing of lr changing, split with ",". Default: 220,250') parser.add_argument('--lr_gamma', type=float, default=0.1,                     help='Decrease lr by a factor of exponential lr_scheduler. Default: 0.1') parser.add_argument('--eta_min', type=float, default=0., help='Eta_min in cosine_annealing scheduler. Default: 0') parser.add_argument('--T_max', type=int, default=300, help='T-max in cosine_annealing scheduler. Default: 320') parser.add_argument('--max_epoch', type=int, default=300, help='Max epoch num to train the model. Default: 320') parser.add_argument('--warmup_epochs', default=4, type=float, help='Warmup epochs. Default: 0') parser.add_argument('--weight_decay', type=float, default=0.0005, help='Weight decay factor. Default: 0.0005') parser.add_argument('--momentum', type=float, default=0.9, help='Momentum. Default: 0.9') parser.add_argument('--bind_cpu', default= True, help='Whether bind cpu when distributed training. Default: True') parser.add_argument('--resize_rate', default= 10, help='resize_rate') parser.add_argument('--anchor_scales', default= [[12, 16],[19, 36],[40, 28],[36, 75],[76, 55],[72, 146],[142, 110],                                                 [192, 243],                                                 [459, 401]], help='resize_rate') parser.add_argument('--input_shape', default= [[3, 32, 64, 128, 256, 512, 1],[3, 48, 96, 192, 384, 768, 2],                                               [3, 64, 128, 256, 512, 1024, 3],[3, 80, 160, 320, 640, 1280, 4]],                                                help='resize_rate') parser.add_argument('--num_classes', default= 80, help='num_classes') parser.add_argument('--max_box', default= 150, help='num_classes') parser.add_argument('--hue', default= 0.015, help='num_classes') parser.add_argument('--saturation', default= 1.5, help='num_classes') parser.add_argument('--value', default= 0.4, help='num_classes') parser.add_argument('--jitter', default= 0.3, help='num_classes')  # loss related parser.add_argument('--loss_scale', type=int, default=1024, help='Static loss scale. Default: 1024') parser.add_argument('--label_smooth', type=int, default=0, help='Whether to use label smooth in CE. Default:0') parser.add_argument('--label_smooth_factor', type=float, default=0.1,                     help='Smooth strength of original one-hot. Default: 0.1')  # logging related parser.add_argument('--log_interval', type=int, default=6, help='Logging interval steps. Default: 100') parser.add_argument('--ckpt_path', type=str, default='outputs/', help='Checkpoint save location. Default: outputs/') parser.add_argument('--ckpt_interval', type=int, default=None, help='Save checkpoint interval. Default: None')  # distributed related parser.add_argument('--is_distributed', type=int, default=0,                     help='Distribute train or not, 1 for yes, 0 for no. Default: 1') parser.add_argument('--rank', type=int, default=0, help='Local rank of distributed. Default: 0') parser.add_argument('--group_size', type=int, default=1, help='World size of device. Default: 1')  # test related parser.add_argument('--pretrained',default='./output/yolov5_300_6.ckpt', type=str, help='checkpoints') parser.add_argument('--test_img_shape',default=[640,640], help='test image shape') parser.add_argument('--test_ignore_threshold',default=0.001, help='test_ignore_threshold') parser.add_argument('--eval_nms_thresh',default=0.5, help='eval_nms_thresh') parser.add_argument('--ignore_threshold',default=0.5, help='ignore_threshold') parser.add_argument('--multi_label',default=True, help='ignore_threshold') parser.add_argument('--multi_label_thresh',default=0.1, help='ignore_threshold') parser.add_argument('--labels',default=[ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',           'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',           'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',           'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',           'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',           'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',           'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',           'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',           'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book',           'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'], help='labels') parser.add_argument('--coco_ids',default=[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27,             28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53,             54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80,             81, 82, 84, 85, 86, 87, 88, 89, 90 ], help='coco_ids')  args, _ = parser.parse_known_args()  2.3 数据集加载  由于YOLOv5主要目的是进行目标检测,因此在这里构建的数据集格式是参照COCO数据集格式进行构建,还定义了一些函数来对没有标注的数据进行筛选。  YOLOv5的数据集加载中最主要是利用了mosaic实现数据增强手段。Mosaic利用了四张图片,对四张图片进行拼接,每一张图片都有其对应的框,将四张图片拼接之后就获得一张新的图片,同时也获得这张图片对应的框,然后将这样一张新的图片传入到神经网络当中去学习。  """YOLOV5 dataset.""" from __future__ import division import os import multiprocessing import random import numpy as np import cv2 from PIL import Image import math import numpy as np from pycocotools.coco import COCO import mindspore.dataset as ds import mindspore.dataset.vision.c_transforms as CV from mindspore.dataset.vision import Normalize,HWC2CHW from src.transforms import reshape_fn, MultiScaleTrans, PreprocessTrueBox  min_keypoints_per_image = 10 GENERATOR_PARALLEL_WORKER = 8  class DistributedSampler:     """Distributed sampler."""      def __init__(self, dataset_size, num_replicas=None, rank=None, shuffle=True):         if num_replicas is None:             print("***********Setting world_size to 1 since it is not passed in ******************")             num_replicas = 1         if rank is None:             print("***********Setting rank to 0 since it is not passed in ******************")             rank = 0         self.dataset_size = dataset_size         self.num_replicas = num_replicas         self.rank = rank         self.epoch = 0         self.num_samples = int(math.ceil(dataset_size * 1.0 / self.num_replicas))         self.total_size = self.num_samples * self.num_replicas         self.shuffle = shuffle      def __iter__(self):         # deterministically shuffle based on epoch         if self.shuffle:             indices = np.random.RandomState(seed=self.epoch).permutation(self.dataset_size)             # np.array type. number from 0 to len(dataset_size)-1, used as             # index of dataset             indices = indices.tolist()             self.epoch += 1             # change to list type         else:             indices = list(range(self.dataset_size))          # add extra samples to make it evenly divisible         indices += indices[:(self.total_size - len(indices))]         assert len(indices) == self.total_size          # subsample         indices = indices[self.rank:self.total_size:self.num_replicas]         assert len(indices) == self.num_samples          return iter(indices)      def __len__(self):         return self.num_samples  def _has_only_empty_bbox(anno):     return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno)  def _count_visible_keypoints(anno):     return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno)  def has_valid_annotation(anno):     """Check annotation file."""     # if it's empty, there is no annotation     if not anno:         return False     # if all boxes have close to zero area, there is no annotation     if _has_only_empty_bbox(anno):         return False     # keypoints task have a slight different criteria for considering     # if an annotation is valid     if "keypoints" not in anno[0]:         return True     # for keypoint detection tasks, only consider valid images those     # containing at least min_keypoints_per_image     if _count_visible_keypoints(anno) >= min_keypoints_per_image:         return True     return False  class COCOYoloDataset:     """YOLOV5 Dataset for COCO."""     def __init__(self, root, ann_file, remove_images_without_annotations=True,                  filter_crowd_anno=True, is_training=True):         self.coco = COCO(ann_file)         self.root = root         self.img_ids = list(sorted(self.coco.imgs.keys()))         self.filter_crowd_anno = filter_crowd_anno         self.is_training = is_training         self.mosaic = True         # filter images without any annotations         if remove_images_without_annotations:             img_ids = []             for img_id in self.img_ids:                 ann_ids = self.coco.getAnnIds(imgIds=img_id, iscrowd=None)                 anno = self.coco.loadAnns(ann_ids)                 if has_valid_annotation(anno):                     img_ids.append(img_id)             self.img_ids = img_ids          self.categories = {cat["id"]: cat["name"] for cat in self.coco.cats.values()}          self.cat_ids_to_continuous_ids = {             v: i for i, v in enumerate(self.coco.getCatIds())         }         self.continuous_ids_cat_ids = {             v: k for k, v in self.cat_ids_to_continuous_ids.items()         }         self.count = 0      def _mosaic_preprocess(self, index, input_size):         labels4 = []         s = 384         self.mosaic_border = [-s // 2, -s // 2]         yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]         indices = [index] + [random.randint(0, len(self.img_ids) - 1) for _ in range(3)]         for i, img_ids_index in enumerate(indices):             coco = self.coco             img_id = self.img_ids[img_ids_index]             img_path = coco.loadImgs(img_id)[0]["file_name"]             img = Image.open(os.path.join(self.root, img_path)).convert("RGB")             img = np.array(img)             h, w = img.shape[:2]              if i == 0:  # top left                 img4 = np.full((s * 2, s * 2, img.shape[2]), 128, dtype=np.uint8)  # base image with 4 tiles                 x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)                 x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)             elif i == 1:  # top right                 x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc                 x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h             elif i == 2:  # bottom left                 x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)                 x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)             elif i == 3:  # bottom right                 x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)                 x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)              img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]              padw = x1a - x1b             padh = y1a - y1b              ann_ids = coco.getAnnIds(imgIds=img_id)             target = coco.loadAnns(ann_ids)             # filter crowd annotations             if self.filter_crowd_anno:                 annos = [anno for anno in target if anno["iscrowd"] == 0]             else:                 annos = [anno for anno in target]              target = {}             boxes = [anno["bbox"] for anno in annos]             target["bboxes"] = boxes              classes = [anno["category_id"] for anno in annos]             classes = [self.cat_ids_to_continuous_ids[cl] for cl in classes]             target["labels"] = classes              bboxes = target['bboxes']             labels = target['labels']             out_target = []              for bbox, label in zip(bboxes, labels):                 tmp = []                 # convert to [x_min y_min x_max y_max]                 bbox = self._convetTopDown(bbox)                 tmp.extend(bbox)                 tmp.append(int(label))                 # tmp [x_min y_min x_max y_max, label]                 out_target.append(tmp)  # 这里out_target是label的实际宽高,对应于图片中的实际度量              labels = out_target.copy()             labels = np.array(labels)             out_target = np.array(out_target)              labels[:, 0] = out_target[:, 0] + padw             labels[:, 1] = out_target[:, 1] + padh             labels[:, 2] = out_target[:, 2] + padw             labels[:, 3] = out_target[:, 3] + padh             labels4.append(labels)          if labels4:             labels4 = np.concatenate(labels4, 0)             np.clip(labels4[:, :4], 0, 2 * s, out=labels4[:, :4])  # use with random_perspective         flag = np.array([1])         return img4, labels4, input_size, flag      def __getitem__(self, index):         """         Args:             index (int): Index          Returns:             (img, target) (tuple): target is a dictionary contains "bbox", "segmentation" or "keypoints",                 generated by the image's annotation. img is a PIL image.         """         coco = self.coco         img_id = self.img_ids[index]         img_path = coco.loadImgs(img_id)[0]["file_name"]         if not self.is_training:             img = Image.open(os.path.join(self.root, img_path)).convert("RGB")             return img, img_id          input_size = [640, 640]         if self.mosaic and random.random() < 0.5:             return self._mosaic_preprocess(index, input_size)         img = np.fromfile(os.path.join(self.root, img_path), dtype='int8')         ann_ids = coco.getAnnIds(imgIds=img_id)         target = coco.loadAnns(ann_ids)         # filter crowd annotations         if self.filter_crowd_anno:             annos = [anno for anno in target if anno["iscrowd"] == 0]         else:             annos = [anno for anno in target]          target = {}         boxes = [anno["bbox"] for anno in annos]         target["bboxes"] = boxes          classes = [anno["category_id"] for anno in annos]         classes = [self.cat_ids_to_continuous_ids[cl] for cl in classes]         target["labels"] = classes          bboxes = target['bboxes']         labels = target['labels']         out_target = []         for bbox, label in zip(bboxes, labels):             tmp = []             # convert to [x_min y_min x_max y_max]             bbox = self._convetTopDown(bbox)             tmp.extend(bbox)             tmp.append(int(label))             # tmp [x_min y_min x_max y_max, label]             out_target.append(tmp)         flag = np.array([0])         return img, out_target, input_size, flag      def __len__(self):         return len(self.img_ids)      def _convetTopDown(self, bbox):         x_min = bbox[0]         y_min = bbox[1]         w = bbox[2]         h = bbox[3]         return [x_min, y_min, x_min+w, y_min+h]  def create_yolo_dataset(image_dir, anno_path, batch_size, device_num, rank,                         config=args, is_training=True, shuffle=True):     """Create dataset for YOLOV5."""     cv2.setNumThreads(0)     ds.config.set_enable_shared_mem(True)     if is_training:         filter_crowd = True         remove_empty_anno = True     else:         filter_crowd = False         remove_empty_anno = False      yolo_dataset = COCOYoloDataset(root=image_dir, ann_file=anno_path, filter_crowd_anno=filter_crowd,                                    remove_images_without_annotations=remove_empty_anno, is_training=is_training)     distributed_sampler = DistributedSampler(len(yolo_dataset), device_num, rank, shuffle=shuffle)     yolo_dataset.size = len(distributed_sampler)     hwc_to_chw = HWC2CHW()      args.dataset_size = len(yolo_dataset)      cores = multiprocessing.cpu_count()     num_parallel_workers = int(cores / device_num)     # num_parallel_workers = 1     if is_training:         multi_scale_trans = MultiScaleTrans(args, device_num)         yolo_dataset.transforms = multi_scale_trans          dataset_column_names = ["image", "annotation", "input_size", "mosaic_flag"]         output_column_names = ["image", "annotation", "bbox1", "bbox2", "bbox3",                                "gt_box1", "gt_box2", "gt_box3"]         map1_out_column_names = ["image", "annotation", "size"]         map2_in_column_names = ["annotation", "size"]         map2_out_column_names = ["annotation", "bbox1", "bbox2", "bbox3",                                  "gt_box1", "gt_box2", "gt_box3"]          dataset = ds.GeneratorDataset(yolo_dataset, column_names=dataset_column_names, sampler=distributed_sampler,                                       python_multiprocessing=True, num_parallel_workers=min(4, num_parallel_workers))         dataset = dataset.map(operations=multi_scale_trans, input_columns=dataset_column_names,                               output_columns=map1_out_column_names, column_order=map1_out_column_names,                               num_parallel_workers=min(12, num_parallel_workers), python_multiprocessing=True)         dataset = dataset.map(operations=PreprocessTrueBox(args), input_columns=map2_in_column_names,                               output_columns=map2_out_column_names, column_order=output_column_names,                               num_parallel_workers=min(4, num_parallel_workers), python_multiprocessing=False)         mean = [m * 255 for m in [0.485, 0.456, 0.406]]         std = [s * 255 for s in [0.229, 0.224, 0.225]]         dataset = dataset.map([Normalize(mean, std), hwc_to_chw],                               num_parallel_workers=min(4, num_parallel_workers))          def concatenate(images):             images = np.concatenate((images[..., ::2, ::2], images[..., 1::2, ::2],                                      images[..., ::2, 1::2], images[..., 1::2, 1::2]), axis=0)             return images         dataset = dataset.map(operations=concatenate, input_columns="image",                               num_parallel_workers=min(4, num_parallel_workers))         dataset = dataset.batch(batch_size, num_parallel_workers=min(4, num_parallel_workers), drop_remainder=True)     else:         dataset = ds.GeneratorDataset(yolo_dataset, column_names=["image", "img_id"],                                       sampler=distributed_sampler)         compose_map_func = (lambda image, img_id: reshape_fn(image, img_id, args))         dataset = dataset.map(operations=compose_map_func, input_columns=["image", "img_id"],                               output_columns=["image", "image_shape", "img_id"],                               column_order=["image", "image_shape", "img_id"],                               num_parallel_workers=8)         dataset = dataset.map(operations=hwc_to_chw, input_columns=["image"], num_parallel_workers=8)         dataset = dataset.batch(batch_size, drop_remainder=True)     return dataset  2.4 模型实现 Backbone  (1)Focus结构:其中比较关键是切片操作。比如下图的切片示意图,4 * 4 * 3的图像切片后变成2 * 2 * 12的特征图  %E5%9B%BE%E7%89%87-4.png  (2)CSP结构: Yolov5中设计了两种CSP结构,CSP1_X结构应用于Backbone主干网络,另一种CSP2_X结构则应用于Neck中。  %E5%9B%BE%E7%89%87-3.png  """DarkNet model.""" import mindspore.nn as nn import mindspore.ops as ops  class Bottleneck(nn.Cell):     # Standard bottleneck     # ch_in, ch_out, shortcut, groups, expansion     def __init__(self, c1, c2, shortcut=True, e=0.5):         super(Bottleneck, self).__init__()         c_ = int(c2 * e)  # hidden channels         self.conv1 = Conv(c1, c_, 1, 1)         self.conv2 = Conv(c_, c2, 3, 1)         self.add = shortcut and c1 == c2      def construct(self, x):         c1 = self.conv1(x)         c2 = self.conv2(c1)         out = c2         if self.add:             out = x + out         return out  class BottleneckCSP(nn.Cell):     # CSP Bottleneck with 3 convolutions     def __init__(self, c1, c2, n=1, shortcut=True, e=0.5):         super(BottleneckCSP, self).__init__()         c_ = int(c2 * e)  # hidden channels         self.conv1 = Conv(c1, c_, 1, 1)         self.conv2 = Conv(c1, c_, 1, 1)         self.conv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)         self.m = nn.SequentialCell(             [Bottleneck(c_, c_, shortcut, e=1.0) for _ in range(n)])         self.concat = ops.Concat(axis=1)      def construct(self, x):         c1 = self.conv1(x)         c2 = self.m(c1)         c3 = self.conv2(x)         c4 = self.concat((c2, c3))         c5 = self.conv3(c4)          return c5  class SPP(nn.Cell):     # Spatial pyramid pooling layer used in YOLOv3-SPP     def __init__(self, c1, c2, k=(5, 9, 13)):         super(SPP, self).__init__()         c_ = c1 // 2  # hidden channels         self.conv1 = Conv(c1, c_, 1, 1)         self.conv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)          self.maxpool1 = nn.MaxPool2d(kernel_size=5, stride=1, pad_mode='same')         self.maxpool2 = nn.MaxPool2d(kernel_size=9, stride=1, pad_mode='same')         self.maxpool3 = nn.MaxPool2d(kernel_size=13, stride=1, pad_mode='same')         self.concat = ops.Concat(axis=1)      def construct(self, x):         c1 = self.conv1(x)         m1 = self.maxpool1(c1)         m2 = self.maxpool2(c1)         m3 = self.maxpool3(c1)         c4 = self.concat((c1, m1, m2, m3))         c5 = self.conv2(c4)         return c5  class Focus(nn.Cell):     # Focus wh information into c-space     def __init__(self, c1, c2, k=1, s=1, p=None, act=True):         super(Focus, self).__init__()         self.conv = Conv(c1 * 4, c2, k, s, p, act)      def construct(self, x):         c1 = self.conv(x)         return c1  class SiLU(nn.Cell):     def __init__(self):         super(SiLU, self).__init__()         self.sigmoid = ops.Sigmoid()      def construct(self, x):         return x * self.sigmoid(x)  def auto_pad(k, p=None):  # kernel, padding     # Pad to 'same'     if p is None:         p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad     return p  class Conv(nn.Cell):     # Standard convolution     def __init__(self, c1, c2, k=1, s=1, p=None,                  dilation=1,                  alpha=0.1,                  momentum=0.97,                  eps=1e-3,                  pad_mode="same",                  act=True):  # ch_in, ch_out, kernel, stride, padding         super(Conv, self).__init__()         self.padding = auto_pad(k, p)         self.pad_mode = None         if self.padding == 0:             self.pad_mode = 'same'         elif self.padding == 1:             self.pad_mode = 'pad'         self.conv = nn.Conv2d(             c1,             c2,             k,             s,             padding=self.padding,             pad_mode=self.pad_mode,             has_bias=False)         self.bn = nn.BatchNorm2d(c2, momentum=momentum, eps=eps)         self.act = SiLU() if act is True else (             act if isinstance(act, nn.Cell) else ops.Identity())      def construct(self, x):         return self.act(self.bn(self.conv(x)))  class YOLOv5Backbone(nn.Cell):     def __init__(self, shape):         super(YOLOv5Backbone, self).__init__()         self.focus = Focus(shape[0], shape[1], k=3, s=1)         self.conv1 = Conv(shape[1], shape[2], k=3, s=2)         self.CSP1 = BottleneckCSP(shape[2], shape[2], n=1 * shape[6])         self.conv2 = Conv(shape[2], shape[3], k=3, s=2)         self.CSP2 = BottleneckCSP(shape[3], shape[3], n=3 * shape[6])         self.conv3 = Conv(shape[3], shape[4], k=3, s=2)         self.CSP3 = BottleneckCSP(shape[4], shape[4], n=3 * shape[6])         self.conv4 = Conv(shape[4], shape[5], k=3, s=2)         self.spp = SPP(shape[5], shape[5], k=[5, 9, 13])         self.CSP4 = BottleneckCSP(shape[5], shape[5], n=1 * shape[6], shortcut=False)      def construct(self, x):         """construct method"""         c1 = self.focus(x)         c2 = self.conv1(c1)         c3 = self.CSP1(c2)         c4 = self.conv2(c3)         # out         c5 = self.CSP2(c4)         c6 = self.conv3(c5)         # out         c7 = self.CSP3(c6)         c8 = self.conv4(c7)         c9 = self.spp(c8)         # out         c10 = self.CSP4(c9)         return c5, c7, c10  2.4 模型实现 Neck  这是Pytorch版本的YOLOv5代码中对neck部分的配置文件,从配置文件中可以看出Neck部分的组件较为单一,基本上就由CBS(ConV)、Upsample、Concat和不带shortcut的CSP(C3)组成:    [[-1, 1, Conv, [512, 1, 1]],    [-1, 1, nn.Upsample, [None, 2, 'nearest']],    [[-1, 6], 1, Concat, [1]],  # cat backbone P4    [-1, 3, C3, [512, False]],  # 13     [-1, 1, Conv, [256, 1, 1]],    [-1, 1, nn.Upsample, [None, 2, 'nearest']],    [[-1, 4], 1, Concat, [1]],  # cat backbone P3    [-1, 3, C3, [256, False]],  # 17 (P3/8-small)     [-1, 1, Conv, [256, 3, 2]],    [[-1, 14], 1, Concat, [1]],  # cat head P4    [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)     [-1, 1, Conv, [512, 3, 2]],    [[-1, 10], 1, Concat, [1]],  # cat head P5    [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)   ] > 另外,Neck的网络结构设计也是沿用了FPN+PAN的结构。FPN就是使用一种 自顶向下的侧边连接在所有尺度上构建出高级语义特征图,构造了特征金字塔的经典结构;PAN的结构也不稀奇,对于PAN,底层的目标信息已经非常模糊了,因此PAN又加入了自底向上的路线,弥补并加强了定位信息  jupyter  #  model import mindspore as ms import mindspore.nn as nn import mindspore.ops as ops  class YoloBlock(nn.Cell):      def __init__(self, in_channels, out_channels):         super(YoloBlock, self).__init__()          self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, has_bias=True)      def construct(self, x):         """construct method"""          out = self.conv(x)         return out  class YOLO(nn.Cell):     def __init__(self, backbone, shape):         super(YOLO, self).__init__()         self.backbone = backbone         self.out_channel = (80 + 5) * 3          self.conv1 = Conv(shape[5], shape[4], k=1, s=1)         self.CSP5 = BottleneckCSP(shape[5], shape[4], n=1*shape[6], shortcut=False)         self.conv2 = Conv(shape[4], shape[3], k=1, s=1)         self.CSP6 = BottleneckCSP(shape[4], shape[3], n=1*shape[6], shortcut=False)         self.conv3 = Conv(shape[3], shape[3], k=3, s=2)         self.CSP7 = BottleneckCSP(shape[4], shape[4], n=1*shape[6], shortcut=False)         self.conv4 = Conv(shape[4], shape[4], k=3, s=2)         self.CSP8 = BottleneckCSP(shape[5], shape[5], n=1*shape[6], shortcut=False)         self.back_block1 = YoloBlock(shape[3], self.out_channel)         self.back_block2 = YoloBlock(shape[4], self.out_channel)         self.back_block3 = YoloBlock(shape[5], self.out_channel)          self.concat = ops.Concat(axis=1)      def construct(self, x):                  img_height = x.shape[2] * 2         img_width = x.shape[3] * 2          feature_map1, feature_map2, feature_map3 = self.backbone(x)          c1 = self.conv1(feature_map3)         ups1 = ops.ResizeNearestNeighbor((img_height // 16, img_width // 16))(c1)         c2 = self.concat((ups1, feature_map2))         c3 = self.CSP5(c2)         c4 = self.conv2(c3)         ups2 = ops.ResizeNearestNeighbor((img_height // 8, img_width // 8))(c4)         c5 = self.concat((ups2, feature_map1))         # out         c6 = self.CSP6(c5)         c7 = self.conv3(c6)          c8 = self.concat((c7, c4))         # out         c9 = self.CSP7(c8)         c10 = self.conv4(c9)         c11 = self.concat((c10, c1))         # out         c12 = self.CSP8(c11)         small_object_output = self.back_block1(c6)         medium_object_output = self.back_block2(c9)         big_object_output = self.back_block3(c12)         return small_object_output, medium_object_output, big_object_output  2.4 模型构建 backbone+Neck+Head  Backbone作用:特征提取  Neck作用:对特征进行一波混合与组合,并且把这些特征传递给预测层  Head作用:进行最终的预测输出  jupyter  # backbone+Neck+Head  class DetectionBlock(nn.Cell):      def __init__(self, scale, is_training=True):         super(DetectionBlock, self).__init__()         if scale == 's':             idx = (0, 1, 2)             self.scale_x_y = 1.2             self.offset_x_y = 0.1         elif scale == 'm':             idx = (3, 4, 5)             self.scale_x_y = 1.1             self.offset_x_y = 0.05         elif scale == 'l':             idx = (6, 7, 8)             self.scale_x_y = 1.05             self.offset_x_y = 0.025         else:             raise KeyError("Invalid scale value for DetectionBlock")         self.anchors = ms.Tensor([args.anchor_scales[i] for i in idx], ms.float32)         self.num_anchors_per_scale = 3         self.num_attrib = 4+1+args.num_classes         self.lambda_coord = 1          self.sigmoid = nn.Sigmoid()         self.reshape = ops.Reshape()         self.tile = ops.Tile()         self.concat = ops.Concat(axis=-1)         self.pow = ops.Pow()         self.transpose = ops.Transpose()         self.exp = ops.Exp()         self.conf_training = is_training      def construct(self, x, input_shape):         """construct method"""         num_batch = x.shape[0]         grid_size = x.shape[2:4]          # Reshape and transpose the feature to [n, grid_size[0], grid_size[1], 3, num_attrib]         prediction = self.reshape(x, (num_batch,                                       self.num_anchors_per_scale,                                       self.num_attrib,                                       grid_size[0],                                       grid_size[1]))         prediction = self.transpose(prediction, (0, 3, 4, 1, 2))          grid_x = ms.numpy.arange(grid_size[1])         grid_y = ms.numpy.arange(grid_size[0])         # Tensor of shape [grid_size[0], grid_size[1], 1, 1] representing the coordinate of x/y axis for each grid         # [batch, gridx, gridy, 1, 1]         grid_x = self.tile(self.reshape(grid_x, (1, 1, -1, 1, 1)), (1, grid_size[0], 1, 1, 1))         grid_y = self.tile(self.reshape(grid_y, (1, -1, 1, 1, 1)), (1, 1, grid_size[1], 1, 1))         # Shape is [grid_size[0], grid_size[1], 1, 2]         grid = self.concat((grid_x, grid_y))          box_xy = prediction[:, :, :, :, :2]         box_wh = prediction[:, :, :, :, 2:4]         box_confidence = prediction[:, :, :, :, 4:5]         box_probs = prediction[:, :, :, :, 5:]          # gridsize1 is x         # gridsize0 is y         box_xy = (self.scale_x_y * self.sigmoid(box_xy) - self.offset_x_y + grid) / \                  ops.cast(ops.tuple_to_array((grid_size[1], grid_size[0])), ms.float32)         # box_wh is w->h         box_wh = self.exp(box_wh) * self.anchors / input_shape          box_confidence = self.sigmoid(box_confidence)         box_probs = self.sigmoid(box_probs)          if self.conf_training:             return prediction, box_xy, box_wh         return self.concat((box_xy, box_wh, box_confidence, box_probs))  class YOLOV5(nn.Cell):     """     YOLOV5 network.      Args:         is_training: Bool. Whether train or not.      Returns:         Cell, cell instance of YOLOV5 neural network.      Examples:         YOLOV5s(True)     """      def __init__(self, is_training, version=0):         super(YOLOV5, self).__init__()          # YOLOv5 network         self.shape = args.input_shape[version]         self.feature_map = YOLO(backbone=YOLOv5Backbone(shape=self.shape), shape=self.shape)          # prediction on the default anchor boxes         self.detect_1 = DetectionBlock('l', is_training=is_training)         self.detect_2 = DetectionBlock('m', is_training=is_training)         self.detect_3 = DetectionBlock('s', is_training=is_training)      def construct(self, x, input_shape):         small_object_output, medium_object_output, big_object_output = self.feature_map(x)         output_big = self.detect_1(big_object_output, input_shape)         output_me = self.detect_2(medium_object_output, input_shape)         output_small = self.detect_3(small_object_output, input_shape)         # big is the final output which has smallest feature map         return output_big, output_me, output_small  2.5 损失定义  YOLOv5一共有三种损失函数:  分类损失cls_loss:计算锚框与对应的标定分类是否正确 定位损失box_loss:预测框与标定框之间的误差(GIoU) 置信度损失obj_loss:计算网络的置信度 总的损失函数=分类损失+定位损失+置信度损失  分类损失cls_loss和置信度损失obj_loss使用的是二元交叉熵损失函数BCEWithLogitsLoss计算。  BCEwithlogitsloss = BCELoss + Sigmoid。  定位损失box_loss使用的是GIoU loss(可以的话,也可以替换成CIoU,EIoU,SIoU等IoU损失) > IOU损失     IOU Loss的定义是先求出预测框和真实框之间的交集和并集之比,再求负对数,但是在实际使用中我们常常将IOU Loss写成1-IOU。如果两个框重合则交并比等于1,Loss为0说明重合度非常高。IOU满足非负性、同一性、对称性、三角不等性,相比于L1/L2等损失函数还具有尺度不变性,不论box的尺度大小,输出的iou损失总是在0-1之间。所以能够较好的反映预测框与真实框的检测效果。     IOU的公式如下: jupyter     IOU的图示如下:     jupyter     普通IOU的优缺点很明显,优点:1、IOU具有尺度不变性2、满足非负性。同时,由于IOU并没有考虑框之间的距离,所以它的作为loss函数的时候也有相应的缺点:1、在A框与B框不重合的时候IOU为0,不能正确反映两者的距离大小。2. IoU无法精确的反映两者的重合度大小。    GIOU是为克服IOU的缺点同时充分利用优点而提出的.(论文:Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression)        GIOU的公式如下:     jupyter     GIOU的图示如下:   jupyter     计算过程如下:1.假设A为预测框,B为真实框,S是所有框的集合,2.不管A与B是否相交,C是包含A与B的最小框(包含A与B的最小凸闭合框),C也属于S集合,3.首先计算IoU,A与B的交并比,4.再计算C框中没有A与B的面积,比上C框面积;5.IoU减去前面算出的比;得到GIoU。  # loss  class ConfidenceLoss(nn.Cell):     """Loss for confidence."""      def __init__(self):         super(ConfidenceLoss, self).__init__()         self.cross_entropy = ops.SigmoidCrossEntropyWithLogits()         self.reduce_sum = ops.ReduceSum()      def construct(self, object_mask, predict_confidence, ignore_mask):         confidence_loss = self.cross_entropy(predict_confidence, object_mask)         confidence_loss = object_mask * confidence_loss + (1 - object_mask) * confidence_loss * ignore_mask         confidence_loss = self.reduce_sum(confidence_loss, ())         return confidence_loss  class ClassLoss(nn.Cell):     """Loss for classification."""      def __init__(self):         super(ClassLoss, self).__init__()         self.cross_entropy = ops.SigmoidCrossEntropyWithLogits()         self.reduce_sum = ops.ReduceSum()      def construct(self, object_mask, predict_class, class_probs):         class_loss = object_mask * self.cross_entropy(predict_class, class_probs)         class_loss = self.reduce_sum(class_loss, ())         return class_loss  class Iou(nn.Cell):     """Calculate the iou of boxes"""     def __init__(self):         super(Iou, self).__init__()         self.min = ops.Minimum()         self.max = ops.Maximum()         self.squeeze = ops.Squeeze(-1)      def construct(self, box1, box2):         """         box1: pred_box [batch, gx, gy, anchors, 1,      4] ->4: [x_center, y_center, w, h]         box2: gt_box   [batch, 1,  1,  1,       maxbox, 4]         convert to topLeft and rightDown         """         box1_xy = box1[:, :, :, :, :, :2]         box1_wh = box1[:, :, :, :, :, 2:4]         box1_mins = box1_xy - box1_wh / ops.scalar_to_array(2.0) # topLeft         box1_maxs = box1_xy + box1_wh / ops.scalar_to_array(2.0) # rightDown          box2_xy = box2[:, :, :, :, :, :2]         box2_wh = box2[:, :, :, :, :, 2:4]         box2_mins = box2_xy - box2_wh / ops.scalar_to_array(2.0)         box2_maxs = box2_xy + box2_wh / ops.scalar_to_array(2.0)          intersect_mins = self.max(box1_mins, box2_mins)         intersect_maxs = self.min(box1_maxs, box2_maxs)         intersect_wh = self.max(intersect_maxs - intersect_mins, ops.scalar_to_array(0.0))         # self.squeeze: for effiecient slice         intersect_area = self.squeeze(intersect_wh[:, :, :, :, :, 0:1]) * \                          self.squeeze(intersect_wh[:, :, :, :, :, 1:2])         box1_area = self.squeeze(box1_wh[:, :, :, :, :, 0:1]) * \                     self.squeeze(box1_wh[:, :, :, :, :, 1:2])         box2_area = self.squeeze(box2_wh[:, :, :, :, :, 0:1]) * \                     self.squeeze(box2_wh[:, :, :, :, :, 1:2])         iou = intersect_area / (box1_area + box2_area - intersect_area)         # iou : [batch, gx, gy, anchors, maxboxes]         return iou      class GIou(nn.Cell):     """Calculating giou"""     def __init__(self):         super(GIou, self).__init__()         self.reshape = ops.Reshape()         self.min = ops.Minimum()         self.max = ops.Maximum()         self.concat = ops.Concat(axis=1)         self.mean = ops.ReduceMean()         self.div = ops.RealDiv()         self.eps = 0.000001      def construct(self, box_p, box_gt):         """construct method"""         box_p_area = (box_p[..., 2:3] - box_p[..., 0:1]) * (box_p[..., 3:4] - box_p[..., 1:2])         box_gt_area = (box_gt[..., 2:3] - box_gt[..., 0:1]) * (box_gt[..., 3:4] - box_gt[..., 1:2])         x_1 = self.max(box_p[..., 0:1], box_gt[..., 0:1])         x_2 = self.min(box_p[..., 2:3], box_gt[..., 2:3])         y_1 = self.max(box_p[..., 1:2], box_gt[..., 1:2])         y_2 = self.min(box_p[..., 3:4], box_gt[..., 3:4])         intersection = (y_2 - y_1) * (x_2 - x_1)         xc_1 = self.min(box_p[..., 0:1], box_gt[..., 0:1])         xc_2 = self.max(box_p[..., 2:3], box_gt[..., 2:3])         yc_1 = self.min(box_p[..., 1:2], box_gt[..., 1:2])         yc_2 = self.max(box_p[..., 3:4], box_gt[..., 3:4])         c_area = (xc_2 - xc_1) * (yc_2 - yc_1)         union = box_p_area + box_gt_area - intersection         union = union + self.eps         c_area = c_area + self.eps         iou = self.div(ops.cast(intersection, ms.float32), ops.cast(union, ms.float32))         res_mid0 = c_area - union         res_mid1 = self.div(ops.cast(res_mid0, ms.float32), ops.cast(c_area, ms.float32))         giou = iou - res_mid1         giou = ops.clip_by_value(giou, -1.0, 1.0)         return giou  def xywh2x1y1x2y2(box_xywh):     boxes_x1 = box_xywh[..., 0:1] - box_xywh[..., 2:3] / 2     boxes_y1 = box_xywh[..., 1:2] - box_xywh[..., 3:4] / 2     boxes_x2 = box_xywh[..., 0:1] + box_xywh[..., 2:3] / 2     boxes_y2 = box_xywh[..., 1:2] + box_xywh[..., 3:4] / 2     boxes_x1y1x2y2 = ops.Concat(-1)((boxes_x1, boxes_y1, boxes_x2, boxes_y2))      return boxes_x1y1x2y2  class YoloLossBlock(nn.Cell):     """     Loss block cell of YOLOV5 network.     """     def __init__(self, scale, config=args):         super(YoloLossBlock, self).__init__()         self.config = config         if scale == 's':             # anchor mask             idx = (0, 1, 2)         elif scale == 'm':             idx = (3, 4, 5)         elif scale == 'l':             idx = (6, 7, 8)         else:             raise KeyError("Invalid scale value for DetectionBlock")         self.anchors = ms.Tensor([self.config.anchor_scales[i] for i in idx], ms.float32)         self.ignore_threshold = ms.Tensor(self.config.ignore_threshold, ms.float32)         self.concat = ops.Concat(axis=-1)         self.iou = Iou()         self.reduce_max = ops.ReduceMax(keep_dims=False)         self.confidence_loss = ConfidenceLoss()         self.class_loss = ClassLoss()          self.reduce_sum = ops.ReduceSum()         self.select = ops.Select()         self.equal = ops.Equal()         self.reshape = ops.Reshape()         self.expand_dims = ops.ExpandDims()         self.ones_like = ops.OnesLike()         self.log = ops.Log()         self.tuple_to_array = ops.TupleToArray()         self.g_iou = GIou()      def construct(self, prediction, pred_xy, pred_wh, y_true, gt_box, input_shape):         """         prediction : origin output from yolo         pred_xy: (sigmoid(xy)+grid)/grid_size         pred_wh: (exp(wh)*anchors)/input_shape         y_true : after normalize         gt_box: [batch, maxboxes, xyhw] after normalize         """         object_mask = y_true[:, :, :, :, 4:5]         class_probs = y_true[:, :, :, :, 5:]         true_boxes = y_true[:, :, :, :, :4]          grid_shape = prediction.shape[1:3]         grid_shape = ops.cast(self.tuple_to_array(grid_shape[::-1]), ms.float32)          pred_boxes = self.concat((pred_xy, pred_wh))         true_wh = y_true[:, :, :, :, 2:4]         true_wh = self.select(self.equal(true_wh, 0.0),                               self.ones_like(true_wh),                               true_wh)         true_wh = self.log(true_wh / self.anchors * input_shape)         # 2-w*h for large picture, use small scale, since small obj need more precise         box_loss_scale = 2 - y_true[:, :, :, :, 2:3] * y_true[:, :, :, :, 3:4]          gt_shape = gt_box.shape         gt_box = self.reshape(gt_box, (gt_shape[0], 1, 1, 1, gt_shape[1], gt_shape[2]))          # add one more dimension for broadcast         iou = self.iou(self.expand_dims(pred_boxes, -2), gt_box)         # gt_box is x,y,h,w after normalize         # [batch, grid[0], grid[1], num_anchor, num_gt]         best_iou = self.reduce_max(iou, -1)         # [batch, grid[0], grid[1], num_anchor]          # ignore_mask IOU too small         ignore_mask = best_iou < self.ignore_threshold         ignore_mask = ops.cast(ignore_mask, ms.float32)         ignore_mask = self.expand_dims(ignore_mask, -1)         # ignore_mask backpro will cause a lot maximunGrad and minimumGrad time consume.         # so we turn off its gradient         ignore_mask = ops.stop_gradient(ignore_mask)          confidence_loss = self.confidence_loss(object_mask, prediction[:, :, :, :, 4:5], ignore_mask)         class_loss = self.class_loss(object_mask, prediction[:, :, :, :, 5:], class_probs)          object_mask_me = self.reshape(object_mask, (-1, 1))  # [8, 72, 72, 3, 1]         box_loss_scale_me = self.reshape(box_loss_scale, (-1, 1))         pred_boxes_me = xywh2x1y1x2y2(pred_boxes)         pred_boxes_me = self.reshape(pred_boxes_me, (-1, 4))         true_boxes_me = xywh2x1y1x2y2(true_boxes)         true_boxes_me = self.reshape(true_boxes_me, (-1, 4))         c_iou = self.g_iou(pred_boxes_me, true_boxes_me)         c_iou_loss = object_mask_me * box_loss_scale_me * (1 - c_iou)         c_iou_loss_me = self.reduce_sum(c_iou_loss, ())         loss = c_iou_loss_me * 4 + confidence_loss + class_loss         batch_size = prediction.shape[0]         return loss / batch_size  class YoloWithLossCell(nn.Cell):     """YOLOV5 loss."""     def __init__(self, network):         super(YoloWithLossCell, self).__init__()         self.yolo_network = network         self.config = args         self.loss_big = YoloLossBlock('l', self.config)         self.loss_me = YoloLossBlock('m', self.config)         self.loss_small = YoloLossBlock('s', self.config)         self.tenser_to_array = ops.TupleToArray()      def construct(self, x, y_true_0, y_true_1, y_true_2, gt_0, gt_1, gt_2, input_shape):         input_shape = x.shape[2:4]         input_shape = ops.cast(self.tenser_to_array(input_shape) * 2, ms.float32)          yolo_out = self.yolo_network(x, input_shape)         loss_l = self.loss_big(*yolo_out[0], y_true_0, gt_0, input_shape)         loss_m = self.loss_me(*yolo_out[1], y_true_1, gt_1, input_shape)         loss_s = self.loss_small(*yolo_out[2], y_true_2, gt_2, input_shape)         return loss_l + loss_m + loss_s * 0.2 
  • [其他] YoloX(4)
    3.12 训练相关函数 #------------------------# # train func #------------------------# set_seed(888)  def set_default():     """ set default """     if config.enable_modelarts:         config.data_root = os.path.join(config.data_dir, 'coco2017/train2017')         config.annFile = os.path.join(config.data_dir, 'coco2017/annotations')         outputs_dir = os.path.join(config.outputs_dir, config.ckpt_path)     else:         config.data_root = os.path.join(config.data_dir, 'train2017')         config.annFile = os.path.join(config.data_dir, 'annotations/instances_train2017.json')         outputs_dir = config.ckpt_path      # logger      config.outputs_dir = os.path.join(outputs_dir, datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))     config.logger = get_logger(config.outputs_dir, config.rank)     config.logger.save_args(config)  def set_graph_kernel_context():     if context.get_context("device_target") == "GPU":         context.set_context(enable_graph_kernel=True)         context.set_context(graph_kernel_flags="--enable_parallel_fusion "                                                "--enable_trans_op_optimize "                                                "--disable_cluster_ops=ReduceMax,Reshape "                                                "--enable_expand_ops=Conv2D")  def network_init(cfg):     """ Network init """     device_id = int(os.getenv('DEVICE_ID', '0'))     context.set_context(mode=context.GRAPH_MODE,                         device_target=cfg.device_target, save_graphs=cfg.save_graphs, device_id=device_id,                         save_graphs_path="ir_path")     set_graph_kernel_context()      profiler = None     if cfg.need_profiler:         profiling_dir = os.path.join(cfg.outputs_dir,                                      datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))         profiler = Profiler(output_path=profiling_dir, is_detail=True, is_show_op_path=True)      # init distributed     cfg.use_syc_bn = False     if cfg.is_distributed:         cfg.use_syc_bn = True         init()         cfg.rank = get_rank()         cfg.group_size = get_group_size()         context.reset_auto_parallel_context()         context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,                                           device_num=cfg.group_size)      # select for master rank save ckpt or all rank save, compatible for model parallel     cfg.rank_save_ckpt_flag = 0     if cfg.is_save_on_master:         if cfg.rank == 0:             cfg.rank_save_ckpt_flag = 1     else:         cfg.rank_save_ckpt_flag = 1      # logger     cfg.outputs_dir = os.path.join(cfg.ckpt_path,                                    datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))     cfg.logger = get_logger(cfg.outputs_dir, cfg.rank)     cfg.logger.save_args(cfg)     return profiler  def parallel_init(args):     context.reset_auto_parallel_context()     parallel_mode = ParallelMode.STAND_ALONE     degree = 1     if args.is_distributed:         parallel_mode = ParallelMode.DATA_PARALLEL         degree = get_group_size()     context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=degree)  def modelarts_pre_process():     '''modelarts pre process function.'''      def unzip(zip_file, save_dir):         import zipfile         s_time = time.time()         if not os.path.exists(os.path.join(save_dir, config.modelarts_dataset_unzip_name)):             zip_isexist = zipfile.is_zipfile(zip_file)             if zip_isexist:                 fz = zipfile.ZipFile(zip_file, 'r')                 data_num = len(fz.namelist())                 print("Extract Start...")                 print("unzip file num: {}".format(data_num))                 data_print = int(data_num / 100) if data_num > 100 else 1                 i = 0                 for file in fz.namelist():                     if i % data_print == 0:                         print("unzip percent: {}%".format(int(i * 100 / data_num)), flush=True)                     i += 1                     fz.extract(file, save_dir)                 print("cost time: {}min:{}s.".format(int((time.time() - s_time) / 60),                                                      int(int(time.time() - s_time) % 60)))                 print("Extract Done.")             else:                 print("This is not zip.")         else:             print("Zip has been extracted.")      if config.need_modelarts_dataset_unzip:         zip_file_1 = os.path.join(config.data_path, config.modelarts_dataset_unzip_name + ".zip")         save_dir_1 = os.path.join(config.data_path)          sync_lock = "/tmp/unzip_sync.lock"          # Each server contains 8 devices as most.         if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):             print("Zip file path: ", zip_file_1)             print("Unzip file save dir: ", save_dir_1)             unzip(zip_file_1, save_dir_1)             print("===Finish extract data synchronization===")             try:                 os.mknod(sync_lock)             except IOError:                 pass          while True:             if os.path.exists(sync_lock):                 break             time.sleep(1)          print("Device: {}, Finish sync unzip data from {} to {}.".format(get_device_id(), zip_file_1, save_dir_1))      config.ckpt_path = os.path.join(config.output_path, config.ckpt_path)  def parser_init():     parser = argparse.ArgumentParser(description='Yolox train.')     parser.add_argument('--data_url', required=False, default=None, help='Location of data.')     parser.add_argument('--train_url', required=False, default=None, help='Location of training outputs.')     parser.add_argument('--backbone', required=False, default="yolox_darknet53")     parser.add_argument('--min_lr_ratio', required=False, default=0.05)     parser.add_argument('--data_aug', required=False, default=True)     return parser  def get_val_dataset():     val_root = os.path.join(config.data_dir, 'val2017')     ann_file = os.path.join(config.data_dir, 'annotations/instances_val2017.json')     ds_test = create_yolox_dataset(val_root, ann_file, is_training=False, batch_size=config.per_batch_size,                                    device_num=config.group_size,                                    rank=config.rank)     config.logger.info("Finish loading the val dataset!")     return ds_test  def get_optimizer(cfg, network, lr):     param_group = get_param_groups(network, cfg.weight_decay)     if cfg.opt == "SGD":         from mindspore.nn import SGD         opt = SGD(params=param_group, learning_rate=Tensor(lr), momentum=config.momentum, nesterov=True)         cfg.logger.info("Use SGD Optimizer")     else:         from mindspore.nn import Momentum         opt = Momentum(params=param_group,                        learning_rate=Tensor(lr),                        momentum=cfg.momentum,                        use_nesterov=True)         cfg.logger.info("Use Momentum Optimizer")     return opt  def load_resume_checkpoint(cfg, network, ckpt_path):     param_dict = load_checkpoint(ckpt_path)      ema_train_weight = []     ema_moving_weight = []     param_load = {}     for key, param in param_dict.items():         if key.startswith("network.") or key.startswith("moments."):             param_load[key] = param         elif "updates" in key:             cfg.updates = param             network.updates = cfg.updates             config.logger.info("network_ema updates:%s" % network.updates.asnumpy().item())     load_param_into_net(network, param_load)      for key, param in network.parameters_and_names():         if key.startswith("ema.") and "moving_mean" not in key and "moving_variance" not in key:             ema_train_weight.append(param_dict[key])         elif key.startswith("ema.") and ("moving_mean" in key or "moving_variance" in key):             ema_moving_weight.append(param_dict[key])      if network.ema:         if ema_train_weight and ema_moving_weight:             network.ema_weight = ParameterTuple(ema_train_weight)             network.ema_moving_weight = ParameterTuple(ema_moving_weight)             config.logger.info("successful loading ema weights") 4 运行 4.1 训练@moxing_wrapper(pre_process=modelarts_pre_process) def run_train(train_stage='stage_1', profiler=None):     """ Launch Train process """     parser = parser_init()     args_opt, _ = parser.parse_known_args()     if not config.data_aug:  # Train the last no data augment epochs         config.use_l1 = True  # Add L1 loss         config.max_epoch = config.total_epoch - config.max_epoch         config.lr_scheduler = "no_aug_lr"  # fix the min lr for last no data aug epochs     if config.enable_modelarts:         import moxing as mox         local_data_url = os.path.join(config.data_path, str(config.rank))         local_annFile = os.path.join(config.data_path, str(config.rank))         mox.file.copy_parallel(config.data_root, local_data_url)         config.data_dir = os.path.join(config.data_path, 'coco2017')         mox.file.copy_parallel(config.annFile, local_annFile)         config.annFile = os.path.join(local_data_url, 'instances_train2017.json')     if config.backbone == "yolox_darknet53":         backbone = "yolofpn"     else:         backbone = "yolopafpn"     base_network = DetectionBlock(config, backbone=backbone)     if config.pretrained:         base_network = load_backbone(base_network, config.pretrained, config)     config.logger.info('Training backbone is: %s' % config.backbone)     if config.use_syc_bn:         config.logger.info("Using Synchronized batch norm layer...")         use_syc_bn(base_network)     default_recurisive_init(base_network)     config.logger.info("Network weights have been initialized...")     network = YOLOLossCell(base_network, config)     config.logger.info('Finish getting network...')     config.data_root = os.path.join(config.data_dir, 'train2017')     config.annFile = os.path.join(config.data_dir, 'annotations/instances_train2017.json')     ds = create_yolox_dataset(image_dir=config.data_root, anno_path=config.annFile, batch_size=config.per_batch_size,                               device_num=config.group_size, rank=config.rank, data_aug=config.data_aug)     ds_test = get_val_dataset()     config.logger.info('Finish loading training dataset! batch size:%s' % config.per_batch_size)     config.steps_per_epoch = ds.get_dataset_size()     config.logger.info('%s steps for one epoch.' % config.steps_per_epoch)     if config.ckpt_interval <= 0:         config.ckpt_interval = 1     lr = get_lr(config)     config.logger.info("Learning rate scheduler:%s, base_lr:%s, min lr ratio:%s" % (config.lr_scheduler, config.lr,                                                                                     config.min_lr_ratio))     opt = get_optimizer(config, network, lr)     loss_scale_manager = DynamicLossScaleManager(init_loss_scale=2 ** 22)     update_cell = loss_scale_manager.get_update_cell()     network_ema = TrainOneStepWithEMA(network, opt, update_cell,                                       ema=True, decay=0.9998, updates=config.updates).set_train()     if config.resume_yolox:         resume_steps = config.updates.asnumpy().items()         config.resume_epoch = resume_steps // config.steps_per_epoch         lr = lr[resume_steps:]         opt = get_optimizer(config, network, lr)         network_ema = TrainOneStepWithEMA(network, opt, update_cell,                                           ema=True, decay=0.9998, updates=resume_steps).set_train()         load_resume_checkpoint(config, network_ema, config.resume_yolox)     if not config.data_aug:         if os.path.isfile(config.yolox_no_aug_ckpt):  # Loading the resume checkpoint for the last no data aug epochs             load_resume_checkpoint(config, network_ema, config.yolox_no_aug_ckpt)             config.logger.info("Finish load the resume checkpoint, begin to train the last...")         else:             raise FileNotFoundError('{} not exist or not a pre-trained file'.format(config.yolox_no_aug_ckpt))     config.logger.info("Add ema model")     model = Model(network_ema, amp_level="O0")     cb = []     save_ckpt_path = None     if config.rank_save_ckpt_flag:         cb.append(EMACallBack(network_ema, config.steps_per_epoch))         ckpt_config = CheckpointConfig(save_checkpoint_steps=config.steps_per_epoch * config.ckpt_interval,                                        keep_checkpoint_max=config.ckpt_max_num)         save_ckpt_path = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/')         cb.append(ModelCheckpoint(config=ckpt_config, directory=save_ckpt_path, prefix='{}'.format(config.backbone)))     cb.append(YOLOXCB(config.logger, config.steps_per_epoch, lr=lr, save_ckpt_path=save_ckpt_path,                       is_modelart=config.enable_modelarts,                       per_print_times=config.log_interval, train_url=args_opt.train_url))     if config.run_eval:         test_block = DetectionBlock(config, backbone=backbone)         cb.append(             EvalCallBack(ds_test, test_block, network_ema, DetectionEngine(config), config,                          interval=config.eval_interval))     if config.need_profiler:         model.train(3, ds, callbacks=cb, dataset_sink_mode=True, sink_size=config.log_interval)         profiler.analyse()     else:         config.logger.info("Epoch number:%s" % config.max_epoch)         config.logger.info("All steps number:%s" % (config.max_epoch * config.steps_per_epoch))         config.logger.info("==================Start Training " + train_stage + "=========================")         model.train(config.max_epoch, ds, callbacks=cb, dataset_sink_mode=False, sink_size=-1)     config.logger.info("==================Training END " + train_stage + "======================")     mindspore.save_checkpoint(network_ema, os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt'))     config.yolox_no_aug_ckpt = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt')     config.val_ckpt = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt')     config.pred_ckpt = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt') 4.2 验证 #------------------------# # eval func #------------------------# def run_eval():     """The function of eval"""     config.data_root = os.path.join(config.data_dir, 'val2017')     config.annFile = os.path.join(config.data_dir, 'annotations/instances_val2017.json')      # logger     config.outputs_dir = os.path.join(         config.log_path, datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')     )     rank_id = int(os.getenv('RANK_ID', '0'))     config.logger = get_logger(config.outputs_dir, rank_id)      context.reset_auto_parallel_context()     parallel_mode = ParallelMode.STAND_ALONE     context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=1)     # ------------------network create----------------------------------------------------------------------------     config.logger.info('Begin Creating Network....')     if config.backbone == "yolox_darknet53":         backbone = "yolofpn"     else:         backbone = "yolopafpn"     network = DetectionBlock(config, backbone=backbone)  # default yolo-darknet53     default_recurisive_init(network)     config.logger.info(config.val_ckpt)     if os.path.isfile(config.val_ckpt):         param_dict = load_checkpoint(config.val_ckpt)         ema_param_dict = {}         for param in param_dict:             if param.startswith("ema."):                 new_name = param.split("ema.")[1]                 data = param_dict[param]                 data.name = new_name                 ema_param_dict[new_name] = data          load_param_into_net(network, ema_param_dict)         config.logger.info('load model %s success', config.val_ckpt)     else:         config.logger.info('%s doesn''t exist or is not a pre-trained file', config.val_ckpt)         raise FileNotFoundError('{} not exist or not a pre-trained file'.format(config.val_ckpt))     data_root = config.data_root     anno_file = config.annFile     ds = create_yolox_dataset(data_root, anno_file, is_training=False, batch_size=config.per_batch_size, device_num=1,                               rank=rank_id)     data_size = ds.get_dataset_size()     config.logger.info(         'Finish loading the dataset, totally %s images to eval, iters %s' % (data_size * config.per_batch_size, \                                                                                  data_size))     network.set_train(False)     # init detection engine     detection = DetectionEngine(config)     config.logger.info('Start inference...')     for _, data in enumerate(             tqdm(ds.create_dict_iterator(num_epochs=1), total=data_size,                  colour="GREEN")):         image = data['image']         img_info = data['image_shape']         img_id = data['img_id']         prediction = network(image)         prediction = prediction.asnumpy()         img_shape = img_info.asnumpy()         img_id = img_id.asnumpy()         detection.detection(prediction, img_shape, img_id)      config.logger.info('Calculating mAP...')     result_file_path = detection.evaluate_prediction()     config.logger.info('result file path: %s', result_file_path)     eval_result, _ = detection.get_eval_result()     eval_print_str = '\n=============coco eval result=========\n' + eval_result     config.logger.info(eval_print_str)  4.3 测试 #------------------------# # pred func(to be fixed) #------------------------# def run_pred():     if not os.path.exists(config.pred_output):         os.makedirs(config.pred_output)      context.reset_auto_parallel_context()     parallel_mode = ParallelMode.STAND_ALONE     context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=1)     if config.backbone == "yolox_darknet53":         backbone = "yolofpn"     else:         backbone = "yolopafpn"     network = DetectionBlock(config, backbone=backbone)      default_recurisive_init(network)      if os.path.isfile(config.pred_ckpt):         param_dict = load_checkpoint(config.pred_ckpt)         ema_param_dict = {}         for param in param_dict:             if param.startswith("ema."):                 new_name = param.split("ema.")[1]                 data = param_dict[param]                 data.name = new_name                 ema_param_dict[new_name] = data          load_param_into_net(network, ema_param_dict)     else:         raise FileNotFoundError('{} not exist or not a pre-trained file'.format(config.pred_ckpt))      pred_transform = ValTransform(legacy=False)          data_list = os.listdir(config.pred_input)     prediction_engine = PredictionEngine(config=config)     network.set_train(False)     for image_name in tqdm(data_list):         image_path = os.path.join(config.pred_input, image_name)         image = np.array(cv2.imread(image_path))         r = min(config.input_size[0] / image.shape[0], config.input_size[1] / image.shape[1])         image_data = cv2.resize(             image,             (int(image.shape[1] * r), int(image.shape[0] * r)),             interpolation=cv2.INTER_LINEAR,         ).astype(np.float32)         image_data, _ = pred_transform(image_data, config.input_size)         image_data = np.expand_dims(image_data,0)         image_data = Tensor(image_data)         output = network(image_data).asnumpy()                  mask = prediction_engine.prediction(output, image.shape).astype(image.dtype)          if not mask is None:             pred_image = cv2.addWeighted(image,1,mask,0.3,0)             cv2.imwrite(os.path.join(config.pred_output, image_name), pred_image)  #------------------------# # process train #------------------------# def run():     set_default()     profiler = network_init(config)     parallel_init(config)     config.data_aug = True     run_train('stage_1', profiler)     config.data_aug = False     run_train('stage_2', profiler)          run_eval()      #run_pred()  if __name__ == "__main__":     run() (一大堆结果)
  • [其他] YoloX(3)
    3.10 训练与评估相关函数 针对学习率相关的函数 回调机制,保存训练纪录和验证纪录,更新EMA权重 DetectionEngine与PredictionEngine分别作用于验证和测试模块 #------------------------# # lr and callback utils  #------------------------# def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):     """Linear learning rate."""     lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)     lr = float(init_lr) + lr_inc * current_step     return lr  def warmup_step_lr(lr, lr_epochs, steps_per_epoch, warmup_epochs, max_epoch, gamma=0.1):     """Warmup step learning rate."""     base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)     milestones = lr_epochs     milestones_steps = []     for milestone in milestones:         milestones_step = milestone * steps_per_epoch         milestones_steps.append(milestones_step)      lr_each_step = []     lr = base_lr     milestones_steps_counter = Counter(milestones_steps)     for i in range(total_steps):         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             lr = lr * gamma ** milestones_steps_counter[i]         lr_each_step.append(lr)      return np.array(lr_each_step).astype(np.float32)  def multi_step_lr(lr, milestones, steps_per_epoch, max_epoch, gamma=0.1):     return warmup_step_lr(lr, milestones, steps_per_epoch, 0, max_epoch, gamma=gamma)  def step_lr(lr, epoch_size, steps_per_epoch, max_epoch, gamma=0.1):     lr_epochs = []     for i in range(1, max_epoch):         if i % epoch_size == 0:             lr_epochs.append(i)     return multi_step_lr(lr, lr_epochs, steps_per_epoch, max_epoch, gamma=gamma)  def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0):     """Cosine annealing learning rate."""     base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)      lr_each_step = []     for i in range(total_steps):         last_epoch = i // steps_per_epoch         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / t_max)) / 2         lr_each_step.append(lr)      return np.array(lr_each_step).astype(np.float32)  def yolox_warm_cos_lr(         lr,         steps_per_epoch,         warmup_epochs,         max_epoch,         no_aug_epochs,         warmup_lr_start=0,         min_lr_ratio=0.05 ):     """Cosine learning rate with warm up."""     base_lr = lr     min_lr = lr * min_lr_ratio     total_iters = int(max_epoch * steps_per_epoch)     warmup_total_iters = int(warmup_epochs * steps_per_epoch)     no_aug_iter = no_aug_epochs * steps_per_epoch     lr_each_step = []     for i in range(total_iters):         if i < warmup_total_iters:             lr = (base_lr - warmup_lr_start) * pow(                 (i + 1) / float(warmup_total_iters), 2             ) + warmup_lr_start         elif i >= total_iters - no_aug_iter:             lr = min_lr         else:             lr = min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(                 math.pi * (i - warmup_total_iters) / (total_iters - warmup_total_iters - no_aug_iter)))         lr_each_step.append(lr)     return np.array(lr_each_step).astype(np.float32)  def warmup_cosine_annealing_lr_v2(lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0):     """Cosine annealing learning rate V2."""     base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)      last_lr = 0     last_epoch_v1 = 0      t_max_v2 = int(max_epoch * 1 / 3)      lr_each_step = []     for i in range(total_steps):         last_epoch = i // steps_per_epoch         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             if i < total_steps * 2 / 3:                 lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / t_max)) / 2                 last_lr = lr                 last_epoch_v1 = last_epoch             else:                 base_lr = last_lr                 last_epoch = last_epoch - last_epoch_v1                 lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / t_max_v2)) / 2          lr_each_step.append(lr)     return np.array(lr_each_step).astype(np.float32)  def warmup_cosine_annealing_lr_sample(lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0):     """Warmup cosine annealing learning rate."""     start_sample_epoch = 60     step_sample = 2     tobe_sampled_epoch = 60     end_sampled_epoch = start_sample_epoch + step_sample * tobe_sampled_epoch     max_sampled_epoch = max_epoch + tobe_sampled_epoch     t_max = max_sampled_epoch      base_lr = lr     warmup_init_lr = 0     total_steps = int(max_epoch * steps_per_epoch)     total_sampled_steps = int(max_sampled_epoch * steps_per_epoch)     warmup_steps = int(warmup_epochs * steps_per_epoch)      lr_each_step = []      for i in range(total_sampled_steps):         last_epoch = i // steps_per_epoch         if last_epoch in range(start_sample_epoch, end_sampled_epoch, step_sample):             continue         if i < warmup_steps:             lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)         else:             lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / t_max)) / 2         lr_each_step.append(lr)      assert total_steps == len(lr_each_step)     return np.array(lr_each_step).astype(np.float32)  def yolox_no_aug_lr(base_lr, steps_per_epoch, max_epoch, min_lr_ratio=0.05):     total_iters = int(max_epoch * steps_per_epoch)     lr = base_lr * min_lr_ratio     lr_each_step = []     for _ in range(total_iters):         lr_each_step.append(lr)     return np.array(lr_each_step).astype(np.float32)  def get_lr(args):     """generate learning rate."""     if args.lr_scheduler == 'exponential':         lr = warmup_step_lr(args.lr,                             args.lr_epochs,                             args.steps_per_epoch,                             args.warmup_epochs,                             args.max_epoch,                             gamma=args.lr_gamma,                             )     elif args.lr_scheduler == 'cosine_annealing':         lr = warmup_cosine_annealing_lr(args.lr,                                         args.steps_per_epoch,                                         args.warmup_epochs,                                         args.max_epoch,                                         args.t_max,                                         args.eta_min)     elif args.lr_scheduler == 'cosine_annealing_V2':         lr = warmup_cosine_annealing_lr_v2(args.lr,                                            args.steps_per_epoch,                                            args.warmup_epochs,                                            args.max_epoch,                                            args.t_max,                                            args.eta_min)     elif args.lr_scheduler == 'cosine_annealing_sample':         lr = warmup_cosine_annealing_lr_sample(args.lr,                                                args.steps_per_epoch,                                                args.warmup_epochs,                                                args.max_epoch,                                                args.t_max,                                                args.eta_min)     elif args.lr_scheduler == 'yolox_warm_cos_lr':         lr = yolox_warm_cos_lr(lr=args.lr,                                steps_per_epoch=args.steps_per_epoch,                                warmup_epochs=args.warmup_epochs,                                max_epoch=args.total_epoch,                                no_aug_epochs=args.no_aug_epochs,                                min_lr_ratio=args.min_lr_ratio)     elif args.lr_scheduler == 'no_aug_lr':         lr = yolox_no_aug_lr(             args.lr,             args.steps_per_epoch,             args.max_epoch,             min_lr_ratio=args.min_lr_ratio         )     else:         raise NotImplementedError(args.lr_scheduler)     return lr  def get_param_groups(network, weight_decay):     """Param groups for optimizer."""     decay_params = []     no_decay_params = []     for x in network.trainable_params():         parameter_name = x.name         if parameter_name.endswith('.bias'):             # all bias not using weight decay             no_decay_params.append(x)         elif parameter_name.endswith('.gamma'):             # bn weight bias not using weight decay, be carefully for now x not include BN             no_decay_params.append(x)         elif parameter_name.endswith('.beta'):             # bn weight bias not using weight decay, be carefully for now x not include BN             no_decay_params.append(x)         else:             decay_params.append(x)      return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params, 'weight_decay': weight_decay}]  def load_backbone(net, ckpt_path, args):     """Load darknet53 backbone checkpoint."""     param_dict = load_checkpoint(ckpt_path)     load_param_into_net(net, param_dict)      param_not_load = []     for _, param in net.parameters_and_names():         if param.name in param_dict:             pass         else:             param_not_load.append(param.name)     args.logger.info("not loading param is :", len(param_not_load))     return net  class AverageMeter:     """Computes and stores the average and current value"""      def __init__(self, name, fmt=':f', tb_writer=None):         self.name = name         self.fmt = fmt         self.reset()         self.tb_writer = tb_writer         self.cur_step = 1         self.val = 0         self.avg = 0         self.sum = 0         self.count = 0      def reset(self):         self.val = 0         self.avg = 0         self.sum = 0         self.count = 0      def update(self, val, n=1):         self.val = val         self.sum += val * n         self.count += n         self.avg = self.sum / self.count         if self.tb_writer is not None:             self.tb_writer.add_scalar(self.name, self.val, self.cur_step)         self.cur_step += 1      def __str__(self):         print("loss update----------------------------------------------------------------------")         fmtstr = '{name}:{avg' + self.fmt + '}'         return fmtstr.format(**self.__dict__)  def keep_loss_fp32(network):     """Keep loss of network with float32"""     for _, cell in network.cells_and_names():         if isinstance(cell, (YOLOLossCell,)):             cell.to_float(mstype.float32)  class EMACallBack(Callback):      def __init__(self, network, steps_per_epoch, cur_steps=0):         self.steps_per_epoch = steps_per_epoch         self.cur_steps = cur_steps         self.network = network      def on_train_epoch_begin(self, run_context):         if self.network.ema:             if not isinstance(self.network.ema_moving_weight, list):                 tmp_moving = []                 for weight in self.network.ema_moving_weight:                     tmp_moving.append(weight.asnumpy())                 self.network.ema_moving_weight = tmp_moving      def on_train_step_end(self, run_context):         if self.network.ema:             self.network.moving_parameter_update()             self.cur_steps += 1              if self.cur_steps % self.steps_per_epoch == 0:                 if isinstance(self.network.ema_moving_weight, list):                     tmp_moving = []                     moving_name = []                     idx = 0                     for key in self.network.moving_name:                         moving_name.append(key)                      for weight in self.network.ema_moving_weight:                         param = Parameter(Tensor(weight), name=moving_name[idx])                         tmp_moving.append(param)                         idx += 1                     self.network.ema_moving_weight = ParameterTuple(tmp_moving)  class YOLOXCB(Callback):     """     YOLOX Callback.     """      def __init__(self, logger, step_per_epoch, lr, save_ckpt_path, is_modelart=False, per_print_times=1,                  train_url=None):         super(YOLOXCB, self).__init__()         self.train_url = train_url         if not isinstance(per_print_times, int) or per_print_times < 0:             raise ValueError("print_step must be int and >= 0.")         self._per_print_times = per_print_times         self.lr = lr         self.is_modelarts = is_modelart         self.step_per_epoch = step_per_epoch         self.current_step = 0         self.save_ckpt_path = save_ckpt_path         self.iter_time = time.time()         self.epoch_start_time = time.time()         self.average_loss = []         self.logger = logger      def on_train_epoch_begin(self, run_context):         """         Called before each epoch beginning.          Args:             run_context (RunContext): Include some information of the model.         """         self.epoch_start_time = time.time()         self.iter_time = time.time()      def on_train_epoch_end(self, run_context):         """         Called after each epoch finished.          Args:             run_context (RunContext): Include some information of the model.         """         cb_params = run_context.original_args()         cur_epoch = cb_params.cur_epoch_num         loss = cb_params.net_outputs         loss = "loss: %.4f, overflow: %s, scale: %s" % (float(loss[0].asnumpy()),                                                         bool(loss[1].asnumpy()),                                                         int(loss[2].asnumpy()))         self.logger.info(             "epoch: %s epoch time %.2fs %s" % (cur_epoch, time.time() - self.epoch_start_time, loss))          if self.current_step % (self.step_per_epoch * 1) == 0:             if self.is_modelarts:                 import moxing as mox                 if self.save_ckpt_path and self.train_url:                     mox.file.copy_parallel(src_url=self.save_ckpt_path, dst_url=self.train_url)                     cur_epoch = self.current_step // self.step_per_epoch                     self.logger.info(                         "[epoch {}]copy ckpt from{} to {}".format(self.save_ckpt_path, cur_epoch, self.train_url))      def on_train_step_begin(self, run_context):         """         Called before each step beginning.          Args:             run_context (RunContext): Include some information of the model.         """      def on_train_step_end(self, run_context):         """         Called after each step finished.          Args:             run_context (RunContext): Include some information of the model.         """          cur_epoch_step = (self.current_step + 1) % self.step_per_epoch         if cur_epoch_step % self._per_print_times == 0 and cur_epoch_step != 0:             cb_params = run_context.original_args()             cur_epoch = cb_params.cur_epoch_num             loss = cb_params.net_outputs             loss = "loss: %.4f, overflow: %s, scale: %s" % (float(loss[0].asnumpy()),                                                             bool(loss[1].asnumpy()),                                                             int(loss[2].asnumpy()))             self.logger.info("epoch: %s step: [%s/%s], %s, lr: %.6f, avg step time: %.2f ms" % (                 cur_epoch, cur_epoch_step, self.step_per_epoch, loss, self.lr[self.current_step],                 (time.time() - self.iter_time) * 1000 / self._per_print_times))             self.iter_time = time.time()         self.current_step += 1      def on_train_end(self, run_context):         """         Called once after network training.          Args:             run_context (RunContext): Include some information of the model.         """  class EvalCallBack(Callback):     def __init__(self, dataset, test_net, train_net, detection, config, start_epoch=0, interval=1):         self.dataset = dataset         self.network = train_net         self.test_network = test_net         self.detection = detection         self.logger = config.logger         self.start_epoch = start_epoch         self.interval = interval         self.max_epoch = config.max_epoch         self.best_result = 0         self.best_epoch = 0         self.rank = config.rank      def load_ema_parameter(self):         param_dict = {}         for name, param in self.network.parameters_and_names():             if name.startswith("ema."):                 new_name = name.split('ema.')[-1]                 param_new = param.clone()                 param_new.name = new_name                 param_dict[new_name] = param_new         load_param_into_net(self.test_network, param_dict)      def load_network_parameter(self):         param_dict = {}         for name, param in self.network.parameters_and_names():             if name.startswith("network."):                 param_new = param.clone()                 param_dict[name] = param_new         load_param_into_net(self.test_network, param_dict)      def epoch_end(self, run_context):         cb_param = run_context.original_args()         cur_epoch = cb_param.cur_epoch_num         if cur_epoch >= self.start_epoch:             if (cur_epoch - self.start_epoch) % self.interval == 0 or cur_epoch == self.max_epoch:                 self.load_network_parameter()                 self.test_network.set_train(False)                 eval_print_str, results = self.inference()                 if results >= self.best_result:                     self.best_result = results                     self.best_epoch = cur_epoch                     if os.path.exists('best.ckpt'):                         self.remove_ckpoint_file('best.ckpt')                     save_checkpoint(cb_param.train_network, 'best.ckpt')                     self.logger.info("Best result %s at %s epoch" % (self.best_result, self.best_epoch))                 self.logger.info(eval_print_str)                 self.logger.info('Ending inference...')      def end(self, run_context):         self.logger.info("Best result %s at %s epoch" % (self.best_result, self.best_epoch))      def inference(self):         self.logger.info('Start inference...')         self.logger.info("eval dataset size, %s" % self.dataset.get_dataset_size())         counts = 0         for data in self.dataset.create_dict_iterator(num_epochs=1):             image = data['image']             img_info = data['image_shape']             img_id = data['img_id']             prediction = self.test_network(image)             prediction = prediction.asnumpy()             img_shape = img_info.asnumpy()             img_id = img_id.asnumpy()             counts = counts + 1             self.detection.detection(prediction, img_shape, img_id)             self.logger.info('Calculating mAP...%s' % counts)          self.logger.info('Calculating mAP...%s' % counts)         result_file_path = self.detection.evaluate_prediction()         self.logger.info('result file path: %s', result_file_path)         eval_result, results = self.detection.get_eval_result()         if eval_result is not None and results is not None:             eval_print_str = '\n=============coco eval result=========\n' + eval_result             return eval_print_str, results         return None, 0      def remove_ckpoint_file(self, file_name):         """Remove the specified checkpoint file from this checkpoint manager and also from the directory."""         try:             os.chmod(file_name, stat.S_IWRITE)             os.remove(file_name)         except OSError:             self.logger.info("OSError, failed to remove the older ckpt file %s.", file_name)         except ValueError:             self.logger.info("ValueError, failed to remove the older ckpt file %s.", file_name)  class Redirct:     def __init__(self):         self.content = ""      def write(self, content):         self.content += content      def flush(self):         self.content = ""  class DetectionEngine:     """ Detection engine """      def __init__(self, config):         self.config = config         self.input_size = self.config.input_size         self.strides = self.config.fpn_strides  # [8, 16, 32]          self.expanded_strides = None         self.grids = None          self.num_classes = config.num_classes          self.conf_thre = config.conf_thre         self.nms_thre = config.nms_thre         self.annFile = os.path.join(config.data_dir, 'annotations/instances_val2017.json')         self._coco = COCO(self.annFile)         self._img_ids = list(sorted(self._coco.imgs.keys()))         self.coco_catIds = self._coco.getCatIds()         self.save_prefix = config.outputs_dir         self.file_path = ''          self.data_list = []      def detection(self, outputs, img_shape, img_ids):         # post process nms         outputs = self.postprocess(outputs, self.num_classes, self.conf_thre, self.nms_thre)         self.data_list.extend(self.convert_to_coco_format(outputs, info_imgs=img_shape, ids=img_ids))      def postprocess(self, prediction, num_classes, conf_thre=0.7, nms_thre=0.45, class_agnostic=False):         """ nms """         box_corner = np.zeros_like(prediction)         box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2         box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2         box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2         box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2         prediction[:, :, :4] = box_corner[:, :, :4]         output = [None for _ in range(len(prediction))]         for i, image_pred in enumerate(prediction):             if not image_pred.shape[0]:                 continue             # Get score and class with highest confidence             class_conf = np.max(image_pred[:, 5:5 + num_classes], axis=-1)  # (8400)             class_pred = np.argmax(image_pred[:, 5:5 + num_classes], axis=-1)  # (8400)             conf_mask = (image_pred[:, 4] * class_conf >= conf_thre).squeeze()  # (8400)             class_conf = np.expand_dims(class_conf, axis=-1)  # (8400, 1)             class_pred = np.expand_dims(class_pred, axis=-1).astype(np.float16)  # (8400, 1)             # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)             detections = np.concatenate((image_pred[:, :5], class_conf, class_pred), axis=1)             detections = detections[conf_mask]             if not detections.shape[0]:                 continue             if class_agnostic:                 nms_out_index = self._nms(detections[:, :4], detections[:, 4] * detections[:, 5], nms_thre)             else:                 nms_out_index = self._batch_nms(detections[:, :4], detections[:, 4] * detections[:, 5],                                                 detections[:, 6], nms_thre)             detections = detections[nms_out_index]             if output[i] is None:                 output[i] = detections             else:                 output[i] = np.concatenate((output[i], detections))         return output      def _nms(self, xyxys, scores, threshold):         """Calculate NMS"""         x1 = xyxys[:, 0]         y1 = xyxys[:, 1]         x2 = xyxys[:, 2]         y2 = xyxys[:, 3]         scores = scores         areas = (x2 - x1 + 1) * (y2 - y1 + 1)         order = scores.argsort()[::-1]         reserved_boxes = []         while order.size > 0:             i = order[0]             reserved_boxes.append(i)             max_x1 = np.maximum(x1[i], x1[order[1:]])             max_y1 = np.maximum(y1[i], y1[order[1:]])             min_x2 = np.minimum(x2[i], x2[order[1:]])             min_y2 = np.minimum(y2[i], y2[order[1:]])              intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1)             intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1)             intersect_area = intersect_w * intersect_h              ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area)             indexes = np.where(ovr <= threshold)[0]             order = order[indexes + 1]         return reserved_boxes      def _batch_nms(self, xyxys, scores, idxs, threshold, use_offset=True):         """Calculate Nms based on class info,Each index value correspond to a category,         and NMS will not be applied between elements of different categories."""         if use_offset:             max_coordinate = xyxys.max()             offsets = idxs * (max_coordinate + np.array([1]))             boxes_for_nms = xyxys + offsets[:, None]             keep = self._nms(boxes_for_nms, scores, threshold)             return keep         keep_mask = np.zeros_like(scores, dtype=np.bool_)         for class_id in np.unique(idxs):             curr_indices = np.where(idxs == class_id)[0]             curr_keep_indices = self._nms(xyxys[curr_indices], scores[curr_indices], threshold)             keep_mask[curr_indices[curr_keep_indices]] = True         keep_indices = np.where(keep_mask)[0]         return keep_indices[np.argsort(-scores[keep_indices])]      def convert_to_coco_format(self, outputs, info_imgs, ids):         """ convert to coco format """         data_list = []         for (output, img_h, img_w, img_id) in zip(                 outputs, info_imgs[:, 0], info_imgs[:, 1], ids         ):             if output is None:                 continue             bboxes = output[:, 0:4]             scale = min(                 self.input_size[0] / float(img_h), self.input_size[1] / float(img_w)             )              bboxes = bboxes / scale             bboxes[:, [0, 2]] = np.clip(bboxes[:, [0, 2]], 0, img_w)             bboxes[:, [1, 3]] = np.clip(bboxes[:, [1, 3]], 0, img_h)             bboxes = xyxy2xywh(bboxes)              cls = output[:, 6]             scores = output[:, 4] * output[:, 5]             for ind in range(bboxes.shape[0]):                 label = self.coco_catIds[int(cls[ind])]                 pred_data = {                     "image_id": int(img_id),                     "category_id": label,                     "bbox": bboxes[ind].tolist(),                     "score": scores[ind].item(),                     "segmentation": [],                 }  # COCO json format                 data_list.append(pred_data)         return data_list      def evaluate_prediction(self):         """ generate prediction coco json file """         print('Evaluate in main process...')         # write result to coco json format          t = datetime.datetime.now().strftime('_%Y_%m_%d_%H_%M_%S')         try:             self.file_path = self.save_prefix + '/predict' + t + '.json'             f = open(self.file_path, 'w')             json.dump(self.data_list, f)         except IOError as e:             raise RuntimeError("Unable to open json file to dump. What():{}".format(str(e)))         else:             f.close()             if not self.data_list:                 self.file_path = ''                 return self.file_path              self.data_list.clear()             return self.file_path      def get_eval_result(self):         """Get eval result"""         if not self.file_path:             return None, None          cocoGt = self._coco         cocoDt = cocoGt.loadRes(self.file_path)         cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')         cocoEval.evaluate()         cocoEval.accumulate()         rdct = Redirct()         stdout = sys.stdout         sys.stdout = rdct         cocoEval.summarize()         sys.stdout = stdout         return rdct.content, cocoEval.stats[0]  class PredictionEngine:     def __init__(self, config):         self.input_size = config.input_size          self.num_classes = config.num_classes          self.conf_thre = config.pred_conf_thre         self.nms_thre = config.pred_nms_thre                  self.class_names = self.get_classes(config.classes_path)          hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]         self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))         self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))      def prediction(self, outputs, image_shape):         outputs = self.postprocess(outputs, self.num_classes, self.conf_thre, self.nms_thre)                  if outputs[0] is None:             return None          top_label = outputs[0][:, 6].astype('int32')         top_conf = outputs[0][:, 4] * outputs[0][:, 5]         top_boxes = outputs[0][:, :4]         scale = min(self.input_size[0] / float(image_shape[0]), self.input_size[1] / float(image_shape[1]))         top_boxes = top_boxes / scale         top_boxes[:, [0, 2]] = np.clip(top_boxes[:, [0, 2]], 0, image_shape[1])         top_boxes[:, [1, 3]] = np.clip(top_boxes[:, [1, 3]], 0, image_shape[0])          info_mask = np.zeros((image_shape[0], image_shape[1], 3))         for i, c in list(enumerate(top_label)):             label_name = self.class_names[int(c)-1]#id start with 1             box = top_boxes[i]             score = top_conf[i]              left, top, right, bottom = box             top     = max(0, np.floor(top).astype('int32'))             left    = max(0, np.floor(left).astype('int32'))             bottom  = min(image_shape[1], np.floor(bottom).astype('int32'))             right   = min(image_shape[0], np.floor(right).astype('int32'))             cv2.rectangle(info_mask, (left, top), (right, bottom), self.colors[int(c)-1], 1)             text = "{}: {:.4f}".format(label_name, score)              cv2.putText(info_mask, text, (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, self.colors[int(c)-1], 1)         return info_mask      def postprocess(self, prediction, num_classes, conf_thre=0.7, nms_thre=0.45, class_agnostic=False):         """ nms """         box_corner = np.zeros_like(prediction)         box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2         box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2         box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2         box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2         prediction[:, :, :4] = box_corner[:, :, :4]         output = [None for _ in range(len(prediction))]         for i, image_pred in enumerate(prediction):             if not image_pred.shape[0]:                 continue             # Get score and class with highest confidence             class_conf = np.max(image_pred[:, 5:5 + num_classes], axis=-1)  # (8400)             class_pred = np.argmax(image_pred[:, 5:5 + num_classes], axis=-1)  # (8400)             conf_mask = (image_pred[:, 4] * class_conf >= conf_thre).squeeze()  # (8400)             class_conf = np.expand_dims(class_conf, axis=-1)  # (8400, 1)             class_pred = np.expand_dims(class_pred, axis=-1).astype(np.float16)  # (8400, 1)             # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)             detections = np.concatenate((image_pred[:, :5], class_conf, class_pred), axis=1)             detections = detections[conf_mask]             if not detections.shape[0]:                 continue             if class_agnostic:                 nms_out_index = self._nms(detections[:, :4], detections[:, 4] * detections[:, 5], nms_thre)             else:                 nms_out_index = self._batch_nms(detections[:, :4], detections[:, 4] * detections[:, 5],                                                 detections[:, 6], nms_thre)             detections = detections[nms_out_index]             if output[i] is None:                 output[i] = detections             else:                 output[i] = np.concatenate((output[i], detections))         return output      def _nms(self, xyxys, scores, threshold):         """Calculate NMS"""         x1 = xyxys[:, 0]         y1 = xyxys[:, 1]         x2 = xyxys[:, 2]         y2 = xyxys[:, 3]         scores = scores         areas = (x2 - x1 + 1) * (y2 - y1 + 1)         order = scores.argsort()[::-1]         reserved_boxes = []         while order.size > 0:             i = order[0]             reserved_boxes.append(i)             max_x1 = np.maximum(x1[i], x1[order[1:]])             max_y1 = np.maximum(y1[i], y1[order[1:]])             min_x2 = np.minimum(x2[i], x2[order[1:]])             min_y2 = np.minimum(y2[i], y2[order[1:]])              intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1)             intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1)             intersect_area = intersect_w * intersect_h              ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area)             indexes = np.where(ovr <= threshold)[0]             order = order[indexes + 1]         return reserved_boxes      def _batch_nms(self, xyxys, scores, idxs, threshold, use_offset=True):         """Calculate Nms based on class info,Each index value correspond to a category,         and NMS will not be applied between elements of different categories."""         if use_offset:             max_coordinate = xyxys.max()             offsets = idxs * (max_coordinate + np.array([1]))             boxes_for_nms = xyxys + offsets[:, None]             keep = self._nms(boxes_for_nms, scores, threshold)             return keep         keep_mask = np.zeros_like(scores, dtype=np.bool_)         for class_id in np.unique(idxs):             curr_indices = np.where(idxs == class_id)[0]             curr_keep_indices = self._nms(xyxys[curr_indices], scores[curr_indices], threshold)             keep_mask[curr_indices[curr_keep_indices]] = True         keep_indices = np.where(keep_mask)[0]         return keep_indices[np.argsort(-scores[keep_indices])]          def get_classes(self, classes_path):         with open(classes_path, encoding='utf-8') as f:             class_names = f.readlines()         class_names = [c.strip() for c in class_names]         return class_names  3.11 网络权重初始化 #------------------------# # network initialized  #------------------------# def calculate_gain(nonlinearity, param=None):     r"""Return the recommended gain value for the given nonlinearity function.     The values are as follows:      ================= ====================================================     nonlinearity      gain     ================= ====================================================     Linear / Identity :math:`1`     Conv{1,2,3}D      :math:`1`     Sigmoid           :math:`1`     Tanh              :math:`\frac{5}{3}`     ReLU              :math:`\sqrt{2}`     Leaky Relu        :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}`     ================= ====================================================      Args:         nonlinearity: the non-linear function (`nn.functional` name)         param: optional parameter for the non-linear function      Examples:         >>> gain = nn.init.calculate_gain('leaky_relu', 0.2)  # leaky_relu with negative_slope=0.2     """     linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']     if nonlinearity in linear_fns or nonlinearity == 'sigmoid':         return 1     if nonlinearity == 'tanh':         return 5.0 / 3     if nonlinearity == 'relu':         return math.sqrt(2.0)     if nonlinearity == 'leaky_relu':         if param is None:             negative_slope = 0.01         elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float):             # True/False are instances of int, hence check above             negative_slope = param         else:             raise ValueError("negative_slope {} not a valid number".format(param))         return math.sqrt(2.0 / (1 + negative_slope ** 2))      raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))  def _assignment(arr, num):     """Assign the value of 'num' and 'arr'."""     if arr.shape == ():         arr = arr.reshape((1))         arr[:] = num         arr = arr.reshape(())     else:         if isinstance(num, np.ndarray):             arr[:] = num[:]         else:             arr[:] = num     return arr  def _calculate_correct_fan(array, mode):     mode = mode.lower()     valid_modes = ['fan_in', 'fan_out']     if mode not in valid_modes:         raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))      fan_in, fan_out = _calculate_fan_in_and_fan_out(array)     return fan_in if mode == 'fan_in' else fan_out  def kaiming_uniform_(arr, a=0, mode='fan_in', nonlinearity='leaky_relu'):     r"""Fills the input `Tensor` with values according to the method     described in `Delving deep into rectifiers: Surpassing human-level     performance on ImageNet classification` - He, K. et al. (2015), using a     uniform distribution. The resulting tensor will have values sampled from     :math:`\mathcal{U}(-\text{bound}, \text{bound})` where      .. math::         \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}      Also known as He initialization.      Args:         tensor: an n-dimensional `Tensor`         a: the negative slope of the rectifier used after this layer (only         used with ``'leaky_relu'``)         mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``             preserves the magnitude of the variance of the weights in the             forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the             backwards pass.         nonlinearity: the non-linear function (`nn.functional` name),             recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).      Examples:         >>> w = np.empty(3, 5)         >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')     """     fan = _calculate_correct_fan(arr, mode)     gain = calculate_gain(nonlinearity, a)     std = gain / math.sqrt(fan)     bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation     return np.random.uniform(-bound, bound, arr.shape)  def _calculate_fan_in_and_fan_out(arr):     """Calculate fan in and fan out."""     dimensions = len(arr.shape)     if dimensions < 2:         raise ValueError("Fan in and fan out can not be computed for array with fewer than 2 dimensions")      num_input_fmaps = arr.shape[1]     num_output_fmaps = arr.shape[0]     receptive_field_size = 1     if dimensions > 2:         receptive_field_size = reduce(lambda x, y: x * y, arr.shape[2:])     fan_in = num_input_fmaps * receptive_field_size     fan_out = num_output_fmaps * receptive_field_size      return fan_in, fan_out  class KaimingUniform(MeInitializer):     """Kaiming uniform initializer."""      def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'):         super(KaimingUniform, self).__init__()         self.a = a         self.mode = mode         self.nonlinearity = nonlinearity      def _initialize(self, arr):         tmp = kaiming_uniform_(arr, self.a, self.mode, self.nonlinearity)         _assignment(arr, tmp)  def default_recurisive_init(custom_cell, prior_prob=1e-2):     """Initialize parameter."""     for _, cell in custom_cell.cells_and_names():         if isinstance(cell, nn.Conv2d):             cell.weight.set_data(initializer.initializer(KaimingUniform(a=math.sqrt(5)),                                                   cell.weight.shape,                                                   cell.weight.dtype))             if cell.bias is not None:                 fan_in, _ = _calculate_fan_in_and_fan_out(cell.weight)                 bound = 1 / math.sqrt(fan_in)                 cell.bias.set_data(initializer.initializer(initializer.Uniform(bound),                                                     cell.bias.shape,                                                     cell.bias.dtype))                 if "cls_preds" in cell.bias.name or "obj_preds" in cell.bias.name:                     cell.bias.set_data(initializer.initializer(-math.log((1 - prior_prob) / prior_prob), cell.bias.shape,                                                         cell.bias.dtype))         elif isinstance(cell, nn.Dense):             cell.weight.set_data(initializer.initializer(KaimingUniform(a=math.sqrt(5)),                                                   cell.weight.shape,                                                   cell.weight.dtype))             if cell.bias is not None:                 fan_in, _ = _calculate_fan_in_and_fan_out(cell.weight)                 bound = 1 / math.sqrt(fan_in)                 cell.bias.set_data(initializer.initializer(initializer.Uniform(bound),                                                     cell.bias.shape,                                                     cell.bias.dtype))         elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d, nn.SyncBatchNorm)):             cell.momentum = 0.97             cell.eps = 0.001         else:             pass         initialize_head_biases(custom_cell, prior_prob=0.01)  def initialize_head_biases(network, prior_prob):     for name, cell in network.cells_and_names():         if name.endswith("cls_preds") or name.endswith("obj_preds"):             cell.bias.set_data(initializer.initializer(-math.log((1 - prior_prob) / prior_prob), cell.bias.shape,                                                 cell.bias.dtype))  def load_yolox_params(args, network):     """Load yolox darknet parameter from checkpoint."""     if args.pretrained_backbone:         network = load_backbone(network, args.pretrained_backbone, args)         args.logger.info('load pre-trained backbone {} into network'.format(args.pretrained_backbone))     else:         args.logger.info('Not load pre-trained backbone, please be careful')  def load_resume_params(args, network):     if args.resume_yolox:         args.logger.info('Start to load resume parameters...')         network = load_backbone(network, args.resume_yolox, args)         args.logger.info('resume finished')         args.logger.info('load_model {} success'.format(args.resume_yolox))     else:         args.logger.info('Not load resume!') 
  • [其他] YoloX(2)
     #------------------------# # darknet #------------------------# class Darknet(nn.Cell):     """ Darknet for yolox-darknet53 """     # number of block from dark2 to dark5.     depth2block = {21: [1, 2, 2, 1], 53: [2, 8, 8, 4]}      def __init__(             self,             depth,             in_channels=3,             stem_out_channels=32,             out_features=("dark3", "dark4", "dark5"),     ):         """         Args:             depth (int): depth of darknet used in model, usually use [21, 53] for this param.             in_channels (int): number of input channels, for example, use 3 for RGB image.             stem_out_channels (int): number of output channels of darknet stem.                 It decides channels of darknet layer2 to layer5.             out_features (Tuple[str]): desired output layer name.         """         super(Darknet, self).__init__()         assert out_features, "please provide output features of Darknet"         self.out_features = out_features         self.stem = nn.SequentialCell(             BaseConv(in_channels=in_channels, out_channels=stem_out_channels, ksize=3, stride=1, act="lrelu"),             *self.make_group_layer(stem_out_channels, num_blocks=1, stride=2),         )         in_channels = stem_out_channels * 2          num_blocks = Darknet.depth2block[depth]         # create darknet with `stem_out_channels` and `num_blocks` layers.         # to make model structure more clear, we don't use `for` statement in python.         self.dark2 = nn.SequentialCell(             *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[0], stride=2)         )         in_channels *= 2  # 128         self.dark3 = nn.SequentialCell(             *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[1], stride=2)         )         in_channels *= 2  # 256         self.dark4 = nn.SequentialCell(             *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[2], stride=2)         )         in_channels *= 2  # 512         self.dark5 = nn.SequentialCell(             *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[3], stride=2),             *self.make_spp_block([in_channels, in_channels * 2], in_channels * 2),         )      def make_group_layer(self, in_channels: int, num_blocks: int, stride: int = 1):         "starts with conv layer then has `num_blocks` `ResLayer`"         return [             BaseConv(in_channels, in_channels * 2, ksize=3, stride=stride, act="lrelu"),             *[(ResLayer(in_channels * 2)) for _ in range(num_blocks)],         ]      def make_spp_block(self, filters_list, in_filters):         """ spatial pyramid pooling block"""         m = nn.SequentialCell(             *[                 BaseConv(in_filters, filters_list[0], 1, stride=1, act="lrelu"),                 BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"),                 SPPBottleneck(                     in_channels=filters_list[1],                     out_channels=filters_list[0],                     activation="lrelu",                 ),                 BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"),                 BaseConv(filters_list[1], filters_list[0], 1, stride=1, act="lrelu"),             ]         )         return m      def construct(self, x):         """ forward """         outputs = {}         x = self.stem(x)         outputs["stem"] = x         x = self.dark2(x)         outputs["dark2"] = x         x = self.dark3(x)         outputs["dark3"] = x         x = self.dark4(x)         outputs["dark4"] = x         x = self.dark5(x)         outputs["dark5"] = x         return outputs["dark3"], outputs["dark4"], outputs["dark5"]  class CSPDarknet(nn.Cell):     """ Darknet with CSP block for yolox-s m l x"""      def __init__(             self,             dep_mul,             wid_mul,             out_features=("dark3", "dark4", "dark5"),             depthwise=False,             act="silu"     ):         super(CSPDarknet, self).__init__()         assert out_features, "please provide output features of Darknet"         self.out_features = out_features         Conv = DWConv if depthwise else BaseConv         base_channels = int(wid_mul * 64)         base_depth = max(round(dep_mul * 3), 1)          # stem         self.stem = Focus(3, base_channels, ksize=3, act=act)          # dark2         self.dark2 = nn.SequentialCell(             Conv(base_channels, base_channels * 2, 3, 2, act=act),             CSPLayer(                 base_channels * 2,                 base_channels * 2,                 n=base_depth,                 depthwise=depthwise,                 act=act,             ),         )          # dark3         self.dark3 = nn.SequentialCell(             Conv(base_channels * 2, base_channels * 4, 3, 2, act=act),             CSPLayer(                 base_channels * 4,                 base_channels * 4,                 n=base_depth * 3,                 depthwise=depthwise,                 act=act,             ),         )          # dark4         self.dark4 = nn.SequentialCell(             Conv(base_channels * 4, base_channels * 8, 3, 2, act=act),             CSPLayer(                 base_channels * 8,                 base_channels * 8,                 n=base_depth * 3,                 depthwise=depthwise,                 act=act,             ),         )          # dark5         self.dark5 = nn.SequentialCell(             Conv(base_channels * 8, base_channels * 16, 3, 2, act=act),             SPPBottleneck(base_channels * 16, base_channels * 16, activation=act),             CSPLayer(                 base_channels * 16,                 base_channels * 16,                 n=base_depth,                 shortcut=False,                 depthwise=depthwise,                 act=act,             ),         )      def construct(self, x):         """ forward """         outputs = {}         x = self.stem(x)         outputs["stem"] = x         x = self.dark2(x)         outputs["dark2"] = x         x = self.dark3(x)         outputs["dark3"] = x         x = self.dark4(x)         outputs["dark4"] = x         x = self.dark5(x)         outputs["dark5"] = x         return outputs["dark3"], outputs["dark4"], outputs["dark5"]  3.6.3 backbon+neck 两种结构,如下图所示:  YOLOFPN,采用Darknet为backbone,使用yolov3 baseline的Neck结构,都采用FPN结构进行融合  YOLOPAFPN, 在FPN基础上引入PAN结构   #------------------------# # YOLOFPN #------------------------# class YOLOFPN(nn.Cell):     """     YOLOFPN module, Darknet53 is the default backbone of this model     """      def __init__(self, input_w, input_h, depth=53, in_features=None):         super(YOLOFPN, self).__init__()         if in_features is None:             in_features = ["dark3", "dark4", "dark5"]         self.backbone = Darknet(depth)         self.in_features = in_features          # out 1         self.out1_cbl = self._make_cbl(512, 256, 1)         self.out1 = self._make_embedding([256, 512], 512 + 256)          # out 2         self.out2_cbl = self._make_cbl(256, 128, 1)         self.out2 = self._make_embedding([128, 256], 256 + 128)         # upsample         self.upsample0 = P.ResizeNearestNeighbor((input_h // 16, input_w // 16))         self.upsample1 = P.ResizeNearestNeighbor((input_h // 8, input_w // 8))      def _make_cbl(self, _in, _out, ks):         """ make cbl layer """         return BaseConv(_in, _out, ks, stride=1, act="lrelu")      def _make_embedding(self, filters_list, in_filters):         """ make embedding """         m = nn.SequentialCell(             *[                 self._make_cbl(in_filters, filters_list[0], 1),                 self._make_cbl(filters_list[0], filters_list[1], 3),                 self._make_cbl(filters_list[1], filters_list[0], 1),                 self._make_cbl(filters_list[0], filters_list[1], 3),                 self._make_cbl(filters_list[1], filters_list[0], 1),             ]         )         return m      def construct(self, inputs):         """ forward """         out_features = self.backbone(inputs)         x2, x1, x0 = out_features          #  yolo branch 1         x1_in = self.out1_cbl(x0)         x1_in = self.upsample0(x1_in)         x1_in = P.Concat(axis=1)([x1_in, x1])         out_dark4 = self.out1(x1_in)          #  yolo branch 2         x2_in = self.out2_cbl(out_dark4)         x2_in = self.upsample1(x2_in)         x2_in = P.Concat(axis=1)([x2_in, x2])         out_dark3 = self.out2(x2_in)         outputs = (out_dark3, out_dark4, x0)         return outputs  #------------------------# # YOLOPAFPN #------------------------# class YOLOPAFPN(nn.Cell):     """     YOLOv3 model. Darknet 53 is the default backbone of this model     """      def __init__(             self,             input_w,             input_h,             depth=1.0,             width=1.0,             in_features=("dark3", "dark4", "dark5"),             in_channels=None,             depthwise=False,             act="silu"     ):         super(YOLOPAFPN, self).__init__()         if in_channels is None:             in_channels = [256, 512, 1024]         self.input_w = input_w         self.input_h = input_h         self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)         self.in_features = in_features         self.in_channels = in_channels         Conv = DWConv if depthwise else BaseConv          self.upsample0 = P.ResizeNearestNeighbor((input_h // 16, input_w // 16))         self.upsample1 = P.ResizeNearestNeighbor((input_h // 8, input_w // 8))         self.lateral_conv0 = BaseConv(int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act)         self.C3_p4 = CSPLayer(             int(2 * in_channels[1] * width),             int(in_channels[1] * width),             round(3 * depth),             False,             depthwise=depthwise,             act=act         )         self.reduce_conv1 = BaseConv(             int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act         )         self.C3_p3 = CSPLayer(             int(2 * in_channels[0] * width),             int(in_channels[0] * width),             round(3 * depth),             False,             depthwise=depthwise,             act=act,         )         # bottom-up conv         self.bu_conv2 = Conv(             int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act         )         self.C3_n3 = CSPLayer(             int(2 * in_channels[0] * width),             int(in_channels[1] * width),             round(3 * depth),             False,             depthwise=depthwise,             act=act,         )          # bottom-up conv         self.bu_conv1 = Conv(             int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act         )         self.C3_n4 = CSPLayer(             int(2 * in_channels[1] * width),             int(in_channels[2] * width),             round(3 * depth),             False,             depthwise=depthwise,             act=act,         )         self.concat = P.Concat(axis=1)      def construct(self, inputs):         """         Args:             inputs: input images.          Returns:             Tuple[Tensor]: FPN feature.         """          x2, x1, x0 = self.backbone(inputs)         fpn_out0 = self.lateral_conv0(x0)  # 1024->512  /32         f_out0 = self.upsample0(fpn_out0)  # 512    /16         f_out0 = self.concat((f_out0, x1))  # 512->1024    /16         f_out0 = self.C3_p4(f_out0)  # 1024->512  /16          fpn_out1 = self.reduce_conv1(f_out0)  # 512->256  /16         f_out1 = self.upsample1(fpn_out1)  # 256  /8         f_out1 = self.concat((f_out1, x2))  # 256->512  /8         pan_out2 = self.C3_p3(f_out1)  # 512->256  /16          p_out1 = self.bu_conv2(pan_out2)  # 256->256  /16         p_out1 = self.concat((p_out1, fpn_out1))  # 256->512  /16         pan_out1 = self.C3_n3(p_out1)  # 512->512/16          p_out0 = self.bu_conv1(pan_out1)  # 512->512/32         p_out0 = self.concat((p_out0, fpn_out0))  # 512->1024/32         pan_out0 = self.C3_n4(p_out0)  # 1024->1024/32          return pan_out2, pan_out1, pan_out0 3.7 bbox iou计算相关 #------------------------# # bbox iou #------------------------# @constexpr def raise_bbox_error():     raise IndexError("Index error, shape of input must be 4!")  def bboxes_iou(bboxes_a, bboxes_b, xyxy=True):     """     calculate iou     Args:         bboxes_a:         bboxes_b:         xyxy:      Returns:      """     if bboxes_a.shape[1] != 4 or bboxes_b.shape[1] != 4:         raise_bbox_error()      if xyxy:         tl = P.Maximum()(bboxes_a[:, None, :2], bboxes_b[:, :2])          br = P.Minimum()(bboxes_a[:, None, 2:], bboxes_b[:, 2:])          area_a = bboxes_a[:, 2:] - bboxes_a[:, :2]         area_a = (area_a[:, 0:1] * area_a[:, 1:2]).squeeze(-1)          area_b = bboxes_b[:, 2:] - bboxes_b[:, :2]         area_b = (area_b[:, 0:1] * area_b[:, 1:2]).squeeze(-1)      else:         tl = P.Maximum()(             (bboxes_a[:, None, :2] - bboxes_a[:, None, 2:] / 2),             (bboxes_b[:, :2] - bboxes_b[:, 2:] / 2),         )         br = P.Minimum()(             (bboxes_a[:, None, :2] + bboxes_a[:, None, 2:] / 2),             (bboxes_b[:, :2] + bboxes_b[:, 2:] / 2),         )         area_a = (bboxes_a[:, 2:3] * bboxes_a[:, 3:4]).squeeze(-1)         area_b = (bboxes_b[:, 2:3] * bboxes_b[:, 3:4]).squeeze(-1)     en = (tl < br).astype(tl.dtype)     en = (en[..., 0:1] * en[..., 1:2]).squeeze(-1)     area_i = tl - br     area_i = (area_i[:, :, 0:1] * area_i[:, :, 1:2]).squeeze(-1) * en     return area_i / (area_a[:, None] + area_b - area_i)  def batch_bboxes_iou(batch_bboxes_a, batch_bboxes_b, xyxy=True):     """         calculate iou for one batch     Args:         batch_bboxes_a:         batch_bboxes_b:         xyxy:      Returns:      """     if batch_bboxes_a.shape[-1] != 4 or batch_bboxes_b.shape[-1] != 4:         raise_bbox_error()     ious = []     for i in range(len(batch_bboxes_a)):         if xyxy:             iou = bboxes_iou(batch_bboxes_a[i], batch_bboxes_b[i], True)         else:             iou = bboxes_iou(batch_bboxes_a[i], batch_bboxes_b[i], False)         iou = P.ExpandDims()(iou, 0)         ious.append(iou)     ious = P.Concat(axis=0)(ious)     return ious  3.8 模型、Loss相关 DetectionBlock为完整的yolox结构,用于声明后续训练声明网络结构  yololoss  ema指数移动平均,对模型权重进行加权平均,使其更加鲁棒  3.8.1 网络损失函数 和网络的预测结果一样,YOLOX网络的损失函数也由三个部分组成,分别是Reg部分、Obj部分和Cls部分。Reg部分是特征点的回归参数判断,Obj部分是特征点是否包含物体判断,Cls部分是特征点包含的物体的种类。  在YoloX中,物体的真实框落在哪些特征点内就由该特征点来预测。  对于每一个真实框需要求取所有特征点与它的空间位置情况,作为正样本的特征点需要满足以下几个特点:  1)特征点落在物体的真实框内;  2)特征点距离物体中心尽量要在一定半径内。  满足这两点保证了属于正样本的特征点会落在物体真实框内部,特征点中心与物体真实框中心要相近。但是这两个条件仅用作正样本的初步筛选,在YoloX中,使用了SimOTA方法进行动态的正样本数量分配。  在YoloX中,会计算一个Cost代价矩阵,代表每个真实框和每个特征点之间的代价关系,Cost代价矩阵由三个部分组成:  1)每个真实框和当前特征点预测框的重合程度;  2)每个真实框和当前特征点预测框的种类预测准确度;  3)每个真实框的中心是否落在了特征点的一定半径内。  Cost代价矩阵的目的是自适应的找到当前特征点应该去拟合的真实框,重合度越高越需要拟合,分类越准越需要拟合,在一定半径内越需要拟合。  在SimOTA中,不同目标设定不同的正样本数量(dynamic k),以旷视科技官方回答中的蚂蚁和西瓜为例子,传统的正样本分配方案常常为同一场景下的西瓜和蚂蚁分配同样的正样本数,那要么蚂蚁有很多低质量的正样本,要么西瓜仅仅只有一两个正样本,这样的结果对于哪个分配方式都是不合适的。  动态的正样本设置的关键在于如何确定k,SimOTA具体的做法是首先计算每个目标Cost最低的10特征点,然后把这十个特征点对应的预测框与真实框的IOU加起来求得最终的k。  因此,SimOTA的过程总结如下:  1)计算每个真实框和当前特征点预测框的重合程度;  2)计算将重合度最高的十个预测框与真实框的IOU加起来求得每个真实框的k,也就代表每个真实框有k个特征点与之对应;  3)计算每个真实框和当前特征点预测框的种类预测准确度;  4)判断真实框的中心是否落在了特征点的一定半径内;  5)计算Cost代价矩阵;  6)将Cost最低的k个点作为该真实框的正样本。  由前文所述可知,YoloX的损失由三个部分组成:  1.Reg部分,由SimOTA可以知道每个真实框对应的特征点,获取到每个框对应的特征点后,取出该特征点的预测框,利用真实框和预测框计算IOU损失,作为Reg部分的Loss组成。  2.Obj部分,由SimOTA可知道每个真实框对应的特征点,所有真实框对应的特征点都是正样本,剩余的特征点均为负样本,根据正负样本和特征点的是否包含物体的预测结果计算交叉熵损失,作为Obj部分的Loss组成。  3.Cls部分,由SimOTA可知道每个真实框对应的特征点,获取到每个框对应的特征点后,取出该特征点的种类预测结果,根据真实框的种类和特征点的种类预测结果计算交叉熵损失,作为Cls部分的Loss组成。  其中Cls和Obj部分采用的都是二值交叉熵损失(BCELoss),Reg部分采用的是IoULoss。值得注意的是,Cls和Reg部分只计算正样本的损失,而Obj既计算正样本也计算负样本的损失。  其中:  Lcls代表分类损失,Lreg代表定位损失,Lobj代表obj损失,λ代表定位损失的平衡系数,源码中设置是5.0,Npos代表被分为正样的Anchor Point数。  #------------------------# # yolox model #------------------------# class DetectionPerFPN(nn.Cell):     """ head  """      def __init__(self, num_classes, scale, in_channels=None, act="silu", width=1.0):         super(DetectionPerFPN, self).__init__()         if in_channels is None:             in_channels = [1024, 512, 256]         self.scale = scale         self.num_classes = num_classes         Conv = BaseConv         if scale == 's':             self.stem = BaseConv(in_channels=int(in_channels[0] * width), out_channels=int(256 * width), ksize=1,                                  stride=1, act=act)         elif scale == 'm':             self.stem = BaseConv(in_channels=int(in_channels[1] * width), out_channels=int(256 * width), ksize=1,                                  stride=1, act=act)         elif scale == 'l':             self.stem = BaseConv(in_channels=int(in_channels[2] * width), out_channels=int(256 * width), ksize=1,                                  stride=1, act=act)         else:             raise KeyError("Invalid scale value for DetectionBlock")          self.cls_convs = nn.SequentialCell(             [                 Conv(                     in_channels=int(256 * width),                     out_channels=int(256 * width),                     ksize=3,                     stride=1,                     act=act,                 ),                 Conv(                     in_channels=int(256 * width),                     out_channels=int(256 * width),                     ksize=3,                     stride=1,                     act=act,                 ),             ]         )         self.reg_convs = nn.SequentialCell(             [                 Conv(                     in_channels=int(256 * width),                     out_channels=int(256 * width),                     ksize=3,                     stride=1,                     act=act,                 ),                 Conv(                     in_channels=int(256 * width),                     out_channels=int(256 * width),                     ksize=3,                     stride=1,                     act=act,                 ),             ]         )         self.cls_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=self.num_classes, kernel_size=1, stride=1,                                    pad_mode="pad", has_bias=True)          self.reg_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=4, kernel_size=1, stride=1,                                    pad_mode="pad",                                    has_bias=True)          self.obj_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=1, kernel_size=1, stride=1,                                    pad_mode="pad",                                    has_bias=True)      def construct(self, x):         """ forward """         x = self.stem(x)         cls_x = x         reg_x = x         cls_feat = self.cls_convs(cls_x)          cls_output = self.cls_preds(cls_feat)          reg_feat = self.reg_convs(reg_x)         reg_output = self.reg_preds(reg_feat)         obj_output = self.obj_preds(reg_feat)          return cls_output, reg_output, obj_output  class DetectionBlock(nn.Cell):     """ connect yolox backbone and head """      def __init__(self, config, backbone="yolopafpn"):         super(DetectionBlock, self).__init__()         self.num_classes = config.num_classes         self.attr_num = self.num_classes + 5         self.depthwise = config.depth_wise         self.strides = Tensor([8, 16, 32], mindspore.float32)         self.input_size = config.input_size          # network         if backbone == "yolopafpn":             self.backbone = YOLOPAFPN(depth=1.33, width=1.25, input_w=self.input_size[1], input_h=self.input_size[0])             self.head_inchannels = [1024, 512, 256]             self.activation = "silu"             self.width = 1.25         else:             self.backbone = YOLOFPN(input_w=self.input_size[1], input_h=self.input_size[0])             self.head_inchannels = [512, 256, 128]             self.activation = "lrelu"             self.width = 1.0          self.head_l = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='l',                                       act=self.activation, width=self.width)         self.head_m = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='m',                                       act=self.activation, width=self.width)         self.head_s = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='s',                                       act=self.activation, width=self.width)      def construct(self, x):         """ forward """         outputs = []         x_l, x_m, x_s = self.backbone(x)         cls_output_l, reg_output_l, obj_output_l = self.head_l(x_l)  # (bs, 80, 80, 80)(bs, 4, 80, 80)(bs, 1, 80, 80)         cls_output_m, reg_output_m, obj_output_m = self.head_m(x_m)  # (bs, 80, 40, 40)(bs, 4, 40, 40)(bs, 1, 40, 40)         cls_output_s, reg_output_s, obj_output_s = self.head_s(x_s)  # (bs, 80, 20, 20)(bs, 4, 20, 20)(bs, 1, 20, 20)         if self.training:             output_l = P.Concat(axis=1)((reg_output_l, obj_output_l, cls_output_l))  # (bs, 85, 80, 80)             output_m = P.Concat(axis=1)((reg_output_m, obj_output_m, cls_output_m))  # (bs, 85, 40, 40)             output_s = P.Concat(axis=1)((reg_output_s, obj_output_s, cls_output_s))  # (bs, 85, 20, 20)              output_l = self.mapping_to_img(output_l, stride=self.strides[0])  # (bs, 6400, 85)x_c, y_c, w, h             output_m = self.mapping_to_img(output_m, stride=self.strides[1])  # (bs, 1600, 85)x_c, y_c, w, h             output_s = self.mapping_to_img(output_s, stride=self.strides[2])  # (bs,  400, 85)x_c, y_c, w, h          else:              output_l = P.Concat(axis=1)(                 (reg_output_l, P.Sigmoid()(obj_output_l), P.Sigmoid()(cls_output_l)))  # bs, 85, 80, 80              output_m = P.Concat(axis=1)(                 (reg_output_m, P.Sigmoid()(obj_output_m), P.Sigmoid()(cls_output_m)))  # bs, 85, 40, 40              output_s = P.Concat(axis=1)(                 (reg_output_s, P.Sigmoid()(obj_output_s), P.Sigmoid()(cls_output_s)))  # bs, 85, 20, 20             output_l = self.mapping_to_img(output_l, stride=self.strides[0])  # (bs, 6400, 85)x_c, y_c, w, h             output_m = self.mapping_to_img(output_m, stride=self.strides[1])  # (bs, 1600, 85)x_c, y_c, w, h             output_s = self.mapping_to_img(output_s, stride=self.strides[2])  # (bs,  400, 85)x_c, y_c, w, h         outputs.append(output_l)         outputs.append(output_m)         outputs.append(output_s)         return P.Concat(axis=1)(outputs)  # batch_size, 8400, 85      def mapping_to_img(self, output, stride):         """ map to origin image scale for each fpn """         batch_size = P.Shape()(output)[0]         n_ch = self.attr_num         grid_size = P.Shape()(output)[2:4]         range_x = range(grid_size[1])         range_y = range(grid_size[0])         stride = P.Cast()(stride, output.dtype)         grid_x = P.Cast()(F.tuple_to_array(range_x), output.dtype)         grid_y = P.Cast()(F.tuple_to_array(range_y), output.dtype)         grid_y = P.ExpandDims()(grid_y, 1)         grid_x = P.ExpandDims()(grid_x, 0)         yv = P.Tile()(grid_y, (1, grid_size[1]))         xv = P.Tile()(grid_x, (grid_size[0], 1))         grid = P.Stack(axis=2)([xv, yv])  # (80, 80, 2)         grid = P.Reshape()(grid, (1, 1, grid_size[0], grid_size[1], 2))  # (1,1,80,80,2)         output = P.Reshape()(output,                              (batch_size, n_ch, grid_size[0], grid_size[1]))  # bs, 6400, 85-->(bs,85,80,80)         output = P.Transpose()(output, (0, 2, 1, 3))  # (bs,85,80,80)-->(bs,80,85,80)         output = P.Transpose()(output, (0, 1, 3, 2))  # (bs,80,85,80)--->(bs, 80, 80, 85)         output = P.Reshape()(output, (batch_size, 1 * grid_size[0] * grid_size[1], -1))  # bs, 6400, 85         grid = P.Reshape()(grid, (1, -1, 2))  # grid(1, 6400, 2)          # reconstruct         output_xy = output[..., :2]         output_xy = (output_xy + grid) * stride         output_wh = output[..., 2:4]         output_wh = P.Exp()(output_wh) * stride         output_other = output[..., 4:]         output_t = P.Concat(axis=-1)([output_xy, output_wh, output_other])         return output_t  # bs, 6400, 85           grid(1, 6400, 2)  #------------------------# # yolox Loss #------------------------# class YOLOLossCell(nn.Cell):     """ yolox with loss cell """      def __init__(self, network=None, config=None):         super(YOLOLossCell, self).__init__()         self.network = network         self.n_candidate_k = config.n_candidate_k         self.on_value = Tensor(1.0, mindspore.float32)         self.off_value = Tensor(0.0, mindspore.float32)         self.depth = config.num_classes          self.unsqueeze = P.ExpandDims()         self.reshape = P.Reshape()         self.one_hot = P.OneHot()         self.zeros = P.ZerosLike()         self.sort_ascending = P.Sort(descending=False)         self.bce_loss = nn.BCEWithLogitsLoss(reduction="none")         self.l1_loss = nn.L1Loss(reduction="none")         self.batch_iter = Tensor(np.arange(0, config.per_batch_size * config.max_gt), mindspore.int32)         self.strides = config.fpn_strides         self.grids = [(config.input_size[0] // _stride) * (config.input_size[1] // _stride) for _stride in                       config.fpn_strides]         self.use_l1 = config.use_l1      def construct(self, img, labels=None, pre_fg_mask=None, is_inbox_and_incenter=None):         """ forward with loss return """         batch_size = P.Shape()(img)[0]         gt_max = P.Shape()(labels)[1]         outputs = self.network(img)  # batch_size, 8400, 85         total_num_anchors = P.Shape()(outputs)[1]         bbox_preds = outputs[:, :, :4]  # batch_size, 8400, 4          obj_preds = outputs[:, :, 4:5]  # batch_size, 8400, 1         cls_preds = outputs[:, :, 5:]  # (batch_size, 8400, 80)          # process label         bbox_true = labels[:, :, 1:]  # (batch_size, gt_max, 4)          gt_classes = F.cast(labels[:, :, 0:1].squeeze(-1), mindspore.int32)         pair_wise_ious = batch_bboxes_iou(bbox_true, bbox_preds, xyxy=False)         pair_wise_ious = pair_wise_ious * pre_fg_mask         pair_wise_iou_loss = -P.Log()(pair_wise_ious + 1e-8) * pre_fg_mask         gt_classes_ = self.one_hot(gt_classes, self.depth, self.on_value, self.off_value)         gt_classes_expaned = ops.repeat_elements(self.unsqueeze(gt_classes_, 2), rep=total_num_anchors, axis=2)         gt_classes_expaned = F.stop_gradient(gt_classes_expaned)          cls_preds_ = P.Sigmoid()(ops.repeat_elements(self.unsqueeze(cls_preds, 1), rep=gt_max, axis=1)) * \                      P.Sigmoid()(                          ops.repeat_elements(self.unsqueeze(obj_preds, 1), rep=gt_max, axis=1)                      )         pair_wise_cls_loss = P.ReduceSum()(             P.BinaryCrossEntropy(reduction="none")(P.Sqrt()(cls_preds_), gt_classes_expaned, None), -1)         pair_wise_cls_loss = pair_wise_cls_loss * pre_fg_mask         cost = pair_wise_cls_loss + 3.0 * pair_wise_iou_loss         punishment_cost = 1000.0 * (1.0 - F.cast(is_inbox_and_incenter, mindspore.float32))         cost = F.cast(cost + punishment_cost, mindspore.float16)         # dynamic k matching         ious_in_boxes_matrix = pair_wise_ious  # (batch_size, gt_max, 8400)         ious_in_boxes_matrix = F.cast(pre_fg_mask * ious_in_boxes_matrix, mindspore.float16)         topk_ious, _ = P.TopK(sorted=True)(ious_in_boxes_matrix, self.n_candidate_k)          dynamic_ks = P.ReduceSum()(topk_ious, 2).astype(mindspore.int32).clip(xmin=1, xmax=total_num_anchors - 1,                                                                               dtype=mindspore.int32)          # (1, batch_size * gt_max, 2)         dynamic_ks_indices = P.Stack(axis=1)((self.batch_iter, dynamic_ks.reshape((-1,))))          dynamic_ks_indices = F.stop_gradient(dynamic_ks_indices)          values, _ = P.TopK(sorted=True)(-cost, self.n_candidate_k)  # b_s , 50, 8400         values = P.Reshape()(-values, (-1, self.n_candidate_k))         max_neg_score = self.unsqueeze(P.GatherNd()(values, dynamic_ks_indices).reshape(batch_size, -1), 2)         pos_mask = F.cast(cost < max_neg_score, mindspore.float32)  # (batch_size, gt_num, 8400)         pos_mask = pre_fg_mask * pos_mask         # ----dynamic_k---- END-----------------------------------------------------------------------------------------         cost_t = cost * pos_mask + (1.0 - pos_mask) * 2000.         min_index, _ = P.ArgMinWithValue(axis=1)(cost_t)         ret_posk = P.Transpose()(nn.OneHot(depth=gt_max, axis=-1)(min_index), (0, 2, 1))         pos_mask = pos_mask * ret_posk         pos_mask = F.stop_gradient(pos_mask)         # AA problem--------------END ----------------------------------------------------------------------------------          # calculate target ---------------------------------------------------------------------------------------------         # Cast precision         pos_mask = F.cast(pos_mask, mindspore.float16)         bbox_true = F.cast(bbox_true, mindspore.float16)         gt_classes_ = F.cast(gt_classes_, mindspore.float16)          reg_target = P.BatchMatMul(transpose_a=True)(pos_mask, bbox_true)  # (batch_size, 8400, 4)         pred_ious_this_matching = self.unsqueeze(P.ReduceSum()((ious_in_boxes_matrix * pos_mask), 1), -1)         cls_target = P.BatchMatMul(transpose_a=True)(pos_mask, gt_classes_)          cls_target = cls_target * pred_ious_this_matching         obj_target = P.ReduceMax()(pos_mask, 1)  # (batch_size, 8400)          # calculate l1_target         reg_target = F.stop_gradient(reg_target)         cls_target = F.stop_gradient(cls_target)         obj_target = F.stop_gradient(obj_target)         bbox_preds = F.cast(bbox_preds, mindspore.float32)         reg_target = F.cast(reg_target, mindspore.float32)         obj_preds = F.cast(obj_preds, mindspore.float32)         obj_target = F.cast(obj_target, mindspore.float32)         cls_preds = F.cast(cls_preds, mindspore.float32)         cls_target = F.cast(cls_target, mindspore.float32)         loss_l1 = 0.0         if self.use_l1:             l1_target = self.get_l1_format(reg_target)             l1_preds = self.get_l1_format(bbox_preds)             l1_target = F.stop_gradient(l1_target)             l1_target = F.cast(l1_target, mindspore.float32)             l1_preds = F.cast(l1_preds, mindspore.float32)             loss_l1 = P.ReduceSum()(self.l1_loss(l1_preds, l1_target), -1) * obj_target             loss_l1 = P.ReduceSum()(loss_l1)         # calculate target -----------END-------------------------------------------------------------------------------         loss_iou = IOUloss()(P.Reshape()(bbox_preds, (-1, 4)), reg_target).reshape(batch_size, -1) * obj_target         loss_iou = P.ReduceSum()(loss_iou)         loss_obj = self.bce_loss(P.Reshape()(obj_preds, (-1, 1)), P.Reshape()(obj_target, (-1, 1)))         loss_obj = P.ReduceSum()(loss_obj)          loss_cls = P.ReduceSum()(self.bce_loss(cls_preds, cls_target), -1) * obj_target         loss_cls = P.ReduceSum()(loss_cls)         loss_all = (5 * loss_iou + loss_cls + loss_obj + loss_l1) / (P.ReduceSum()(obj_target) + 1e-3)         return loss_all      def get_l1_format_single(self, reg_target, stride, eps):         """ calculate L1 loss related """         reg_target = reg_target / stride         reg_target_xy = reg_target[:, :, :2]         reg_target_wh = reg_target[:, :, 2:]         reg_target_wh = P.Log()(reg_target_wh + eps)         return P.Concat(-1)((reg_target_xy, reg_target_wh))      def get_l1_format(self, reg_target, eps=1e-8):         """ calculate L1 loss related """         reg_target_l = reg_target[:, 0:self.grids[0], :]  # (bs, 6400, 4)         reg_target_m = reg_target[:, self.grids[0]:self.grids[1] + self.grids[0], :]  # (bs, 1600, 4)         reg_target_s = reg_target[:, -self.grids[2]:, :]  # (bs, 400, 4)          reg_target_l = self.get_l1_format_single(reg_target_l, self.strides[0], eps)         reg_target_m = self.get_l1_format_single(reg_target_m, self.strides[1], eps)         reg_target_s = self.get_l1_format_single(reg_target_s, self.strides[2], eps)          l1_target = P.Concat(axis=1)([reg_target_l, reg_target_m, reg_target_s])         return l1_target  class IOUloss(nn.Cell):     """ Iou loss """      def __init__(self, reduction="none"):         super(IOUloss, self).__init__()         self.reduction = reduction         self.reshape = P.Reshape()      def construct(self, pred, target):         """ forward """         pred = self.reshape(pred, (-1, 4))         target = self.reshape(target, (-1, 4))         tl = P.Maximum()(pred[:, :2] - pred[:, 2:] / 2, target[:, :2] - target[:, 2:] / 2)         br = P.Minimum()(pred[:, :2] + pred[:, 2:] / 2, target[:, :2] + target[:, 2:] / 2)         area_p = (pred[:, 2:3] * pred[:, 3:4]).squeeze(-1)         area_g = (target[:, 2:3] * target[:, 3:4]).squeeze(-1)         en = F.cast((tl < br), tl.dtype)         en = (en[:, 0:1] * en[:, 1:2]).squeeze(-1)         area_i = br - tl         area_i = (area_i[:, 0:1] * area_i[:, 1:2]).squeeze(-1) * en         area_u = area_p + area_g - area_i          iou = area_i / (area_u + 1e-16)         loss = 1 - iou * iou         if self.reduction == "mean":             loss = loss.mean()         elif self.reduction == "sum":             loss = loss.sum()         return loss  grad_scale = C.MultitypeFuncGraph("grad_scale") reciprocal = P.Reciprocal()  @grad_scale.register("Tensor", "Tensor") def tensor_grad_scale(scale, grad):     return grad * reciprocal(scale)  _grad_overflow = C.MultitypeFuncGraph("_grad_overflow") grad_overflow = P.FloatStatus()  @_grad_overflow.register("Tensor") def _tensor_grad_overflow(grad):     return grad_overflow(grad)  #------------------------# #  ema #------------------------# class TrainOneStepWithEMA(nn.TrainOneStepWithLossScaleCell):     """ Train one step with ema model """      def __init__(self, network, optimizer, scale_sense, ema=True, decay=0.9998, updates=0, moving_name=None,                  ema_moving_weight=None):         super(TrainOneStepWithEMA, self).__init__(network, optimizer, scale_sense)         self.ema = ema         self.moving_name = moving_name         self.ema_moving_weight = ema_moving_weight         if self.ema:             self.ema_weight = self.weights.clone("ema", init='same')             self.decay = decay             self.updates = Parameter(Tensor(updates, mindspore.float32))             self.assign = ops.Assign()             self.ema_moving_parameters()      def ema_moving_parameters(self):         self.moving_name = {}         moving_list = []         idx = 0         for key, param in self.network.parameters_and_names():             if "moving_mean" in key or "moving_variance" in key:                 new_param = param.clone()                 new_param.name = "ema." + param.name                 moving_list.append(new_param)                 self.moving_name["ema." + key] = idx                 idx += 1         self.ema_moving_weight = ParameterTuple(moving_list)      def ema_update(self):         """Update EMA parameters."""         if self.ema:             self.updates += 1             d = self.decay * (1 - ops.Exp()(-self.updates / 2000))             # update trainable parameters             for ema_v, weight in zip(self.ema_weight, self.weights):                 tep_v = ema_v * d                 self.assign(ema_v, (1.0 - d) * weight + tep_v)         return self.updates      # moving_parameter_update is executed inside the callback(EMACallBack)     def moving_parameter_update(self):         if self.ema:             d = (self.decay * (1 - ops.Exp()(-self.updates / 2000))).asnumpy().item()             # update moving mean and moving var             for key, param in self.network.parameters_and_names():                 if "moving_mean" in key or "moving_variance" in key:                     idx = self.moving_name["ema." + key]                     moving_weight = param.asnumpy()                     tep_v = self.ema_moving_weight[idx] * d                     ema_value = (1.0 - d) * moving_weight + tep_v                     self.ema_moving_weight[idx] = ema_value      def construct(self, *inputs):         """ Forward """         weights = self.weights         loss = self.network(*inputs)         scaling_sens = self.scale_sense          status, scaling_sens = self.start_overflow_check(loss, scaling_sens)          scaling_sens_filled = C.ones_like(loss) * F.cast(scaling_sens, F.dtype(loss))         grads = self.grad(self.network, weights)(*inputs, scaling_sens_filled)         grads = self.hyper_map(F.partial(grad_scale, scaling_sens), grads)         # apply grad reducer on grads         grads = self.grad_reducer(grads)         self.ema_update()          # get the overflow buffer         cond = self.get_overflow_status(status, grads)         overflow = self.process_loss_scale(cond)         # if there is no overflow, do optimize         if not overflow:             loss = F.depend(loss, self.optimizer(grads))         return loss, cond, scaling_sens 3.9 设备函数 针对平台设备的相关函数 #------------------------# # device adapter #------------------------# def local_adp_get_device_id():     device_id = os.getenv('DEVICE_ID', '0')     return int(device_id)  def local_adp_get_device_num():     device_num = os.getenv('RANK_SIZE', '1')     return int(device_num)  def local_adp_get_rank_id():     global_rank_id = os.getenv('RANK_ID', '0')     return int(global_rank_id)  def local_adp_get_job_id():     return "Local Job"  def moxing_adp_get_device_id():     device_id = os.getenv('DEVICE_ID', '0')     return int(device_id)  def moxing_adp_get_device_num():     device_num = os.getenv('RANK_SIZE', '1')     return int(device_num)  def moxing_adp_get_rank_id():     global_rank_id = os.getenv('RANK_ID', '0')     return int(global_rank_id)  def moxing_adp_get_job_id():     job_id = os.getenv('JOB_ID')     job_id = job_id if job_id != "" else "default"     return job_id  def sync_data(from_path, to_path):     """     Download data from remote obs to local directory if the first url is remote url and the second one is local path     Upload data from local directory to remote obs in contrast.     """     import moxing as mox     global _global_sync_count     sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count)     _global_sync_count += 1      # Each server contains 8 devices as most.     if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):         print("from path: ", from_path)         print("to path: ", to_path)         mox.file.copy_parallel(from_path, to_path)         print("===finish data synchronization===")         try:             os.mknod(sync_lock)         except IOError:             pass         print("===save flag===")      while True:         if os.path.exists(sync_lock):             break         time.sleep(1)      print("Finish sync data from {} to {}.".format(from_path, to_path))  def moxing_wrapper(pre_process=None, post_process=None):     """     Moxing wrapper to download dataset and upload outputs.     """     def wrapper(run_func):         @functools.wraps(run_func)         def wrapped_func(*args, **kwargs):             # Download data from data_url             if config.enable_modelarts:                 if config.data_url:                     sync_data(config.data_url, config.data_path)                     print("Dataset downloaded: ", os.listdir(config.data_path))                 if config.checkpoint_url:                     sync_data(config.checkpoint_url, config.load_path)                     print("Preload downloaded: ", os.listdir(config.load_path))                 if config.train_url:                     sync_data(config.train_url, config.output_path)                     print("Workspace downloaded: ", os.listdir(config.output_path))                  context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id())))                 config.device_num = get_device_num()                 config.device_id = get_device_id()                 if not os.path.exists(config.output_path):                     os.makedirs(config.output_path)                  if pre_process:                     pre_process()              # Run the main function             run_func(*args, **kwargs)              # Upload data to train_url             if config.enable_modelarts:                 if post_process:                     post_process()                  if config.train_url:                     print("Start to copy output directory")                     sync_data(config.output_path, config.train_url)         return wrapped_func     return wrapper  if config.enable_modelarts:     get_device_id = moxing_adp_get_device_id     get_device_num = moxing_adp_get_device_num     get_rank_id = moxing_adp_get_rank_id     get_job_id = moxing_adp_get_job_id else:     get_device_id = local_adp_get_device_id     get_device_num = local_adp_get_device_num     get_rank_id = local_adp_get_rank_id     get_job_id = local_adp_get_job_id  
总条数:41 到第
上滑加载中