Image Recognition (Image) | Tags | Developers

[Other questions] How do I bring up the loss curve?

Where can I view the loss curve and the accuracy curve?

yd_223127372, posted 2024-05-18 16:27:34; last reply by 黄生, 2024-05-20 22:51:24
71 2

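Image Recognition (Image), Data Visualization (DLV)
[Help request] MDC encoding question

In the documentation, the ImageData struct has two fields: rawData (image data pointer) and mbufData (Mbuf data pointer). What is the difference between them? Is it that (1) they hold the same content and differ only in where it is stored, or (2) both the content and the storage location differ? The documentation does not explain this in detail.

yd_241974171, posted 2024-03-28 10:02:56; last reply by *LJ_2021, 2024-03-28 10:47:34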
143 2

Image Recognition (Image)
[Other] YOLOv5 (2)

2.6 Learning rate

Four learning rate schedules are provided: exponential, cosine_annealing, cosine_annealing_V2, and cosine_annealing_sample. This example uses cosine_annealing.

"""Learning rate scheduler."""
import math
from collections import Counter

import numpy as np


def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
    """Linear learning rate."""
    lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
    lr = float(init_lr) + lr_inc * current_step
    return lr


def warmup_step_lr(lr, lr_epochs, steps_per_epoch, warmup_epochs, max_epoch, gamma=0.1):
    """Warmup step learning rate."""
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    milestones = lr_epochs
    milestones_steps = []
    for milestone in milestones:
        milestones_step = milestone * steps_per_epoch
        milestones_steps.append(milestones_step)

    lr_each_step = []
    lr = base_lr
    milestones_steps_counter = Counter(milestones_steps)
    for i in range(total_steps):
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            lr = lr * gamma**milestones_steps_counter[i]
        lr_each_step.append(lr)

    return np.array(lr_each_step).astype(np.float32)


def multi_step_lr(lr, milestones, steps_per_epoch, max_epoch, gamma=0.1):
    return warmup_step_lr(lr, milestones, steps_per_epoch, 0, max_epoch, gamma=gamma)


def step_lr(lr, epoch_size, steps_per_epoch, max_epoch, gamma=0.1):
    lr_epochs = []
    for i in range(1, max_epoch):
        if i % epoch_size == 0:
            lr_epochs.append(i)
    return multi_step_lr(lr, lr_epochs, steps_per_epoch, max_epoch, gamma=gamma)


def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):
    """Cosine annealing learning rate."""
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)

    lr_each_step = []
    for i in range(total_steps):
        last_epoch = i // steps_per_epoch
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2
        lr_each_step.append(lr)

    return np.array(lr_each_step).astype(np.float32)


def warmup_cosine_annealing_lr_V2(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):
    """Cosine annealing learning rate V2."""
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)

    last_lr = 0
    last_epoch_V1 = 0

    T_max_V2 = int(max_epoch * 1 / 3)

    lr_each_step = []
    for i in range(total_steps):
        last_epoch = i // steps_per_epoch
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            if i < total_steps * 2 / 3:
                lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2
                last_lr = lr
                last_epoch_V1 = last_epoch
            else:
                base_lr = last_lr
                last_epoch = last_epoch - last_epoch_V1
                lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max_V2)) / 2

        lr_each_step.append(lr)
    return np.array(lr_each_step).astype(np.float32)


def warmup_cosine_annealing_lr_sample(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):
    """Warmup cosine annealing learning rate."""
    start_sample_epoch = 60
    step_sample = 2
    tobe_sampled_epoch = 60
    end_sampled_epoch = start_sample_epoch + step_sample * tobe_sampled_epoch
    max_sampled_epoch = max_epoch + tobe_sampled_epoch
    # The T_max argument is overridden here.
    T_max = max_sampled_epoch

    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    total_sampled_steps = int(max_sampled_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)

    lr_each_step = []

    for i in range(total_sampled_steps):
        last_epoch = i // steps_per_epoch
        if last_epoch in range(start_sample_epoch, end_sampled_epoch, step_sample):
            continue
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2
        lr_each_step.append(lr)

    assert total_steps == len(lr_each_step)
    return np.array(lr_each_step).astype(np.float32)


def get_lr(args, steps_per_epoch):
    """generate learning rate."""
    if args.lr_scheduler == 'exponential':
        lr = warmup_step_lr(args.lr, args.lr_epochs, steps_per_epoch, args.warmup_epochs,
                            args.max_epoch, gamma=args.lr_gamma)
    elif args.lr_scheduler == 'cosine_annealing':
        lr = warmup_cosine_annealing_lr(args.lr, steps_per_epoch, args.warmup_epochs,
                                        args.max_epoch, args.T_max, args.eta_min)
    elif args.lr_scheduler == 'cosine_annealing_V2':
        lr = warmup_cosine_annealing_lr_V2(args.lr, steps_per_epoch, args.warmup_epochs,
                                           args.max_epoch, args.T_max, args.eta_min)
    elif args.lr_scheduler == 'cosine_annealing_sample':
        lr = warmup_cosine_annealing_lr_sample(args.lr, steps_per_epoch, args.warmup_epochs,
                                               args.max_epoch, args.T_max, args.eta_min)
    else:
        raise NotImplementedError(args.lr_scheduler)
    return lr

2.7 Model training

import os
import time
import mindspore as ms
import mindspore.nn as nn
import mindspore.communication as comm
from src.logger import get_logger
from src.util import AverageMeter, get_param_groups, cpu_affinity
from src.initializer import default_recurisive_init, load_yolov5_params

ms.set_seed(1)


def train_preprocess():
    if args.lr_scheduler == 'cosine_annealing' and args.max_epoch > args.T_max:
        args.T_max = args.max_epoch

    args.lr_epochs = list(map(int, args.lr_epochs.split(',')))
    args.data_root = os.path.join(args.data_dir, 'train2017')
    args.annFile = os.path.join(args.data_dir, 'annotations/instances_train2017.json')
    args.device_id = int(os.getenv('DEVICE_ID', '1'))
    ms.set_context(mode=ms.GRAPH_MODE, device_target=args.device_target)

    if args.is_distributed:
        # init distributed (init_distribute is not defined in this post)
        init_distribute()

    # for promoting performance in GPU device
    if args.device_target == "GPU" and args.bind_cpu:
        cpu_affinity(args.rank, min(args.group_size, 8))

    # logger module is managed by config, it is used in other function. e.g. config.logger.info("xxx")
    args.logger = get_logger(args.output_dir, args.rank)


def run_train():
    train_preprocess()

    loss_meter = AverageMeter('loss')
    dict_version = {'yolov5s': 0, 'yolov5m': 1, 'yolov5l': 2, 'yolov5x': 3}
    network = YOLOV5(is_training=True, version=dict_version[args.yolov5_version])
    # default is kaiming-normal
    default_recurisive_init(network)
    load_yolov5_params(args, network)
    network = YoloWithLossCell(network)

    ds = create_yolo_dataset(image_dir=args.data_root, anno_path=args.annFile, is_training=True,
                             batch_size=args.per_batch_size, device_num=args.group_size,
                             rank=args.rank, config=args)
    args.logger.info('Finish loading dataset')

    steps_per_epoch = ds.get_dataset_size()
    lr = get_lr(args, steps_per_epoch)
    opt = nn.Momentum(params=get_param_groups(network), momentum=args.momentum,
                      learning_rate=ms.Tensor(lr), weight_decay=args.weight_decay,
                      loss_scale=args.loss_scale)
    network = nn.TrainOneStepCell(network, opt, args.loss_scale // 2)
    network.set_train()

    data_loader = ds.create_tuple_iterator(do_copy=False)
    first_step = True
    t_end = time.time()

    for epoch_idx in range(args.max_epoch):
        for step_idx, data in enumerate(data_loader):
            images = data[0]
            input_shape = images.shape[2:4]
            input_shape = ms.Tensor(tuple(input_shape[::-1]), ms.float32)
            loss = network(images, data[2], data[3], data[4], data[5], data[6], data[7], input_shape)
            loss_meter.update(loss.asnumpy())

            # it is used for loss, performance output per config.log_interval steps.
            if (epoch_idx * steps_per_epoch + step_idx) % args.log_interval == 0:
                time_used = time.time() - t_end
                if first_step:
                    fps = args.per_batch_size * args.group_size / time_used
                    per_step_time = time_used * 1000
                    first_step = False
                else:
                    fps = args.per_batch_size * args.log_interval * args.group_size / time_used
                    per_step_time = time_used / args.log_interval * 1000
                args.logger.info('epoch[{}], iter[{}], {}, fps:{:.2f} imgs/sec, '
                                 'lr:{}, per step time: {}ms'.format(
                                     epoch_idx + 1, step_idx + 1, loss_meter, fps,
                                     lr[epoch_idx * steps_per_epoch + step_idx], per_step_time))
                t_end = time.time()
                loss_meter.reset()
        if args.rank == 0:
            ckpt_name = os.path.join(args.output_dir,
                                     "yolov5_{}_{}.ckpt".format(epoch_idx + 1, steps_per_epoch))
            ms.save_checkpoint(network, ckpt_name)

    args.logger.info('==========end training===============')


if __name__ == "__main__":
    run_train()

(lengthy output omitted)

2.8 Model testing

import os
import time
import numpy as np
import mindspore as ms
# from model_utils.config import config
# from src.yolo import YOLOV5
from src.logger import get_logger
from src.util import DetectionEngine
# from src.yolo_dataset import create_yolo_dataset


def eval_preprocess():
    args.data_root = os.path.join(args.data_dir, 'val2017')
    args.ann_file = os.path.join(args.data_dir, 'annotations/instances_val2017.json')
    ms.set_context(mode=ms.GRAPH_MODE, device_target=args.device_target)

    # logger module is managed by config, it is used in other function. e.g. config.logger.info("xxx")
    args.logger = get_logger(args.output_dir, args.rank)


def load_parameters(network, filename):
    args.logger.info("yolov5 pretrained network model: %s", filename)
    param_dict = ms.load_checkpoint(filename)
    param_dict_new = {}
    for key, values in param_dict.items():
        if key.startswith('moments.'):
            # skip optimizer state
            continue
        elif key.startswith('yolo_network.'):
            # strip the 13-character 'yolo_network.' prefix
            param_dict_new[key[13:]] = values
        else:
            param_dict_new[key] = values
    ms.load_param_into_net(network, param_dict_new)
    args.logger.info('load_model %s success', filename)


def run_eval():
    eval_preprocess()
    start_time = time.time()
    args.logger.info('Creating Network....')
    dict_version = {'yolov5s': 0, 'yolov5m': 1, 'yolov5l': 2, 'yolov5x': 3}
    network = YOLOV5(is_training=False, version=dict_version[args.yolov5_version])

    if os.path.isfile(args.pretrained):
        load_parameters(network, args.pretrained)
    else:
        raise FileNotFoundError(f"{args.pretrained} is not a filename.")

    ds = create_yolo_dataset(args.data_root, args.ann_file, is_training=False,
                             batch_size=args.per_batch_size, device_num=1, rank=0,
                             shuffle=False, config=args)

    args.logger.info('testing shape : %s', args.test_img_shape)
    args.logger.info('total %d images to eval', ds.get_dataset_size() * args.per_batch_size)

    network.set_train(False)

    # init detection engine
    detection = DetectionEngine(args, args.test_ignore_threshold)

    input_shape = ms.Tensor(tuple(args.test_img_shape), ms.float32)
    args.logger.info('Start inference....')
    for index, data in enumerate(ds.create_dict_iterator(output_numpy=True, num_epochs=1)):
        image = data["image"]
        # adapt network shape of input data
        image = np.concatenate((image[..., ::2, ::2], image[..., 1::2, ::2],
                                image[..., ::2, 1::2], image[..., 1::2, 1::2]), axis=1)
        image = ms.Tensor(image)
        image_shape_ = data["image_shape"]
        image_id_ = data["img_id"]
        output_big, output_me, output_small = network(image, input_shape)
        output_big = output_big.asnumpy()
        output_me = output_me.asnumpy()
        output_small = output_small.asnumpy()
        detection.detect([output_small, output_me, output_big], args.per_batch_size,
                         image_shape_, image_id_)

        if index % 50 == 0:
            args.logger.info('Processing... {:.2f}% '.format(index / ds.get_dataset_size() * 100))

    args.logger.info('Calculating mAP...')
    detection.do_nms_for_results()
    result_file_path = detection.write_result()
    args.logger.info('result file path: %s', result_file_path)
    eval_result = detection.get_eval_result()

    cost_time = time.time() - start_time
    eval_log_string = '\n=============coco eval result=========\n' + eval_result
    args.logger.info(eval_log_string)
    args.logger.info('testing cost time %.2f h', cost_time / 3600.)


if __name__ == "__main__":
    run_eval()

(lengthy output omitted)
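To make the cosine_annealing schedule from section 2.6 easier to inspect, here is a minimal standalone sketch in pure Python. It follows the same rule as warmup_cosine_annealing_lr (linear warmup from 0, then a cosine decay that changes once per epoch), but the function name cosine_with_warmup and the choice to return a plain list are assumptions made for illustration. The sample values (lr 0.0005, 4 warmup epochs, 300 epochs, T_max 300) are the defaults from the post's argument list; 6 steps per epoch is inferred from the checkpoint name yolov5_300_6.ckpt.

```python
import math


def cosine_with_warmup(base_lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0.0):
    """Return one learning rate per training step.

    Warmup: rises linearly from base_lr / warmup_steps to base_lr.
    Afterwards: cosine decay evaluated on the epoch index, so the value
    is constant within an epoch and drops between epochs.
    """
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    schedule = []
    for step in range(total_steps):
        if step < warmup_steps:
            lr = base_lr * (step + 1) / warmup_steps
        else:
            epoch = step // steps_per_epoch
            lr = eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max)) / 2
        schedule.append(lr)
    return schedule


schedule = cosine_with_warmup(base_lr=0.0005, steps_per_epoch=6,
                              warmup_epochs=4, max_epoch=300, t_max=300)
print(len(schedule), schedule[0], schedule[23], schedule[-1])
```

Printing or plotting a few entries of the list is a quick way to check that warmup ends at the base learning rate and that the final value approaches eta_min before committing to a long training run.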

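The evaluation loop in section 2.8 rearranges each image with four strided slices before passing it to the network; this is the Focus slicing step, which trades spatial resolution for channels. The sketch below reproduces that rearrangement on a tiny NumPy array so the shapes can be checked by hand. The helper name focus_slice and the 1 x 3 x 4 x 4 toy input are assumptions for illustration; the slice order matches the np.concatenate call in the post.

```python
import numpy as np


def focus_slice(x):
    """Space-to-depth on an (N, C, H, W) array: returns (N, 4*C, H/2, W/2).

    Each of the four slices picks every second pixel with a different
    row/column offset, so no pixel is lost, only moved into channels.
    """
    return np.concatenate(
        (x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]),
        axis=1,
    )


# Toy input: one 3-channel 4x4 image filled with 0..47.
image = np.arange(1 * 3 * 4 * 4, dtype=np.float32).reshape(1, 3, 4, 4)
sliced = focus_slice(image)
print(image.shape, "->", sliced.shape)
```

With a 3-channel 4 x 4 input the result has 12 channels of size 2 x 2, which is why the Focus layer's convolution takes c1 * 4 input channels.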
yd_233394255, posted 2024-03-09 17:45:25; last reply by 运气男孩, 2024-04-01 08:56:54
36 1

Image Recognition (Image)
[Other] YOLOv5 (1)

1、算法介绍 YOLOv5是一种单阶段目标检测算法，该算法在YOLOv4的基础上添加了一些新的改进思路，使其速度与精度都得到了极大的性能提升。需要说明的是，YOLOv5没有论文，其作者是Mosaic Augmentation 的创造者，YOLOv5在gtihub上的链接为：https://github.com/ultralytics/yolov5 1.1 模型结构： YOLOv5网络模型主要有四个部分组成，分别为：输入端，Backbone，Neck，Prediction。和YOLOv4相比，YOLOv5做了一些优化，主要有(1) 输入端：Mosaic数据增强，（2）Backbone：Focus结构，CSP结构, (3）Neck：FPN+PAN结构,（4）Prediction：GIOU_Loss。下面是YOLOv5的整体网络结构图： %E5%9B%BE%E7%89%87-4.png 2、模型实现 2.1 环境准备与数据读取案例基于MindSpore1.8的GPU版本实现，在GXT1080TI单卡上完成训练。案例使用数据集为coco_mini，是从COCO数据集中分离出来的一小部分数据,其中训练集50张图像，测试集10张图像，数据格式为图像和json文件。coco_mini数据集的下载链接为: https://pan.baidu.com/s/1FJ_Css0KoXqKqifmUzmBUw 提取码: g55f。下载好的数据集包括3个文件，分别对应数据标签，训练数据，测试数据，文件路径结构如下： .datasets/ └── coco_mini_dataset ├── annotations ├──instances_train2017.json └──instances_val2017.json ├── train2017 └── val2017 > 下面是数据的可视化展示: import numpy as np import matplotlib import os import glob import math from PIL import Image, ImageSequence from matplotlib import pyplot as plt #显示下载好的数据 train_image_path = "dataset/mini_coco_dataset/train2017/" image = [] for root, dirs, files in os.walk(train_image_path): for i in range(6): image.append(files[i]) def show_image(image_list,num = 6): ''' #image_list: 图像序列，numpy数组 #num: 显示图片的数量 ''' img_titles = [] for ind,img in enumerate(image_list): if ind == num: break img_titles.append(ind) for i in range(len(img_titles)): if len(img_titles) > 6: row = 3 elif 3<len(img_titles)<=6: row = 2 else: row = 1 col = math.ceil(len(img_titles)/row) plt.subplot(row,col,i+1),plt.imshow(Image.open(os.path.join(train_image_path, image[i]))) plt.title(img_titles[i]) plt.xticks([]),plt.yticks([]) plt.show() show_image(image,num=4) 2.2 参数定义（包括lr,epoch, pretrained_checkpoints) import argparse import mindspore as ms import sys sys.argv=[''] del sys ms.set_seed(1) parser = argparse.ArgumentParser('mindspore coco training') # device related parser.add_argument('--device_target', type=str, default='GPU', help='device where the code will be implemented.') # dataset related 
parser.add_argument('--data_dir', default='./dataset/mini_coco_dataset/', type=str, help='Train dataset directory.') parser.add_argument('--output_dir', default='./output', type=str, help='output') parser.add_argument('--pretrained_checkpoint', default='',type=str, help='pretrained_checkpoint') parser.add_argument('--per_batch_size', default=8, type=int, help='Batch size for Training. Default: 8') # network related parser.add_argument('--yolov5_version', default='yolov5s', type=str, help='The version of YOLOv5, options: yolov5s, yolov5m, yolov5l, yolov5x') parser.add_argument('--pretrained_backbone', default='', type=str, help='The pretrained file of yolov5. Default: "".') parser.add_argument('--resume_yolov5', default='', type=str, help='The ckpt file of YOLOv5, which used to fine tune. Default: ""') # optimizer and lr related parser.add_argument('--lr_scheduler', default='cosine_annealing', type=str, help='Learning rate scheduler, options: exponential, cosine_annealing. Default: exponential') parser.add_argument('--lr', default=0.0005, type=float, help='Learning rate. Default: 0.01') parser.add_argument('--lr_epochs', type=str, default='220,250', help='Epoch of changing of lr changing, split with ",". Default: 220,250') parser.add_argument('--lr_gamma', type=float, default=0.1, help='Decrease lr by a factor of exponential lr_scheduler. Default: 0.1') parser.add_argument('--eta_min', type=float, default=0., help='Eta_min in cosine_annealing scheduler. Default: 0') parser.add_argument('--T_max', type=int, default=300, help='T-max in cosine_annealing scheduler. Default: 320') parser.add_argument('--max_epoch', type=int, default=300, help='Max epoch num to train the model. Default: 320') parser.add_argument('--warmup_epochs', default=4, type=float, help='Warmup epochs. Default: 0') parser.add_argument('--weight_decay', type=float, default=0.0005, help='Weight decay factor. Default: 0.0005') parser.add_argument('--momentum', type=float, default=0.9, help='Momentum. 
Default: 0.9') parser.add_argument('--bind_cpu', default= True, help='Whether bind cpu when distributed training. Default: True') parser.add_argument('--resize_rate', default= 10, help='resize_rate') parser.add_argument('--anchor_scales', default= [[12, 16],[19, 36],[40, 28],[36, 75],[76, 55],[72, 146],[142, 110], [192, 243], [459, 401]], help='resize_rate') parser.add_argument('--input_shape', default= [[3, 32, 64, 128, 256, 512, 1],[3, 48, 96, 192, 384, 768, 2], [3, 64, 128, 256, 512, 1024, 3],[3, 80, 160, 320, 640, 1280, 4]], help='resize_rate') parser.add_argument('--num_classes', default= 80, help='num_classes') parser.add_argument('--max_box', default= 150, help='num_classes') parser.add_argument('--hue', default= 0.015, help='num_classes') parser.add_argument('--saturation', default= 1.5, help='num_classes') parser.add_argument('--value', default= 0.4, help='num_classes') parser.add_argument('--jitter', default= 0.3, help='num_classes') # loss related parser.add_argument('--loss_scale', type=int, default=1024, help='Static loss scale. Default: 1024') parser.add_argument('--label_smooth', type=int, default=0, help='Whether to use label smooth in CE. Default:0') parser.add_argument('--label_smooth_factor', type=float, default=0.1, help='Smooth strength of original one-hot. Default: 0.1') # logging related parser.add_argument('--log_interval', type=int, default=6, help='Logging interval steps. Default: 100') parser.add_argument('--ckpt_path', type=str, default='outputs/', help='Checkpoint save location. Default: outputs/') parser.add_argument('--ckpt_interval', type=int, default=None, help='Save checkpoint interval. Default: None') # distributed related parser.add_argument('--is_distributed', type=int, default=0, help='Distribute train or not, 1 for yes, 0 for no. Default: 1') parser.add_argument('--rank', type=int, default=0, help='Local rank of distributed. Default: 0') parser.add_argument('--group_size', type=int, default=1, help='World size of device. 
Default: 1') # test related parser.add_argument('--pretrained',default='./output/yolov5_300_6.ckpt', type=str, help='checkpoints') parser.add_argument('--test_img_shape',default=[640,640], help='test image shape') parser.add_argument('--test_ignore_threshold',default=0.001, help='test_ignore_threshold') parser.add_argument('--eval_nms_thresh',default=0.5, help='eval_nms_thresh') parser.add_argument('--ignore_threshold',default=0.5, help='ignore_threshold') parser.add_argument('--multi_label',default=True, help='ignore_threshold') parser.add_argument('--multi_label_thresh',default=0.1, help='ignore_threshold') parser.add_argument('--labels',default=[ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'], help='labels') parser.add_argument('--coco_ids',default=[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90 ], help='coco_ids') args, _ = parser.parse_known_args() 2.3 
数据集加载由于YOLOv5主要目的是进行目标检测，因此在这里构建的数据集格式是参照COCO数据集格式进行构建，还定义了一些函数来对没有标注的数据进行筛选。 YOLOv5的数据集加载中最主要是利用了mosaic实现数据增强手段。Mosaic利用了四张图片，对四张图片进行拼接，每一张图片都有其对应的框，将四张图片拼接之后就获得一张新的图片，同时也获得这张图片对应的框，然后将这样一张新的图片传入到神经网络当中去学习。 """YOLOV5 dataset.""" from __future__ import division import os import multiprocessing import random import numpy as np import cv2 from PIL import Image import math import numpy as np from pycocotools.coco import COCO import mindspore.dataset as ds import mindspore.dataset.vision.c_transforms as CV from mindspore.dataset.vision import Normalize,HWC2CHW from src.transforms import reshape_fn, MultiScaleTrans, PreprocessTrueBox min_keypoints_per_image = 10 GENERATOR_PARALLEL_WORKER = 8 class DistributedSampler: """Distributed sampler.""" def __init__(self, dataset_size, num_replicas=None, rank=None, shuffle=True): if num_replicas is None: print("***********Setting world_size to 1 since it is not passed in ******************") num_replicas = 1 if rank is None: print("***********Setting rank to 0 since it is not passed in ******************") rank = 0 self.dataset_size = dataset_size self.num_replicas = num_replicas self.rank = rank self.epoch = 0 self.num_samples = int(math.ceil(dataset_size * 1.0 / self.num_replicas)) self.total_size = self.num_samples * self.num_replicas self.shuffle = shuffle def __iter__(self): # deterministically shuffle based on epoch if self.shuffle: indices = np.random.RandomState(seed=self.epoch).permutation(self.dataset_size) # np.array type. 
number from 0 to len(dataset_size)-1, used as # index of dataset indices = indices.tolist() self.epoch += 1 # change to list type else: indices = list(range(self.dataset_size)) # add extra samples to make it evenly divisible indices += indices[:(self.total_size - len(indices))] assert len(indices) == self.total_size # subsample indices = indices[self.rank:self.total_size:self.num_replicas] assert len(indices) == self.num_samples return iter(indices) def __len__(self): return self.num_samples def _has_only_empty_bbox(anno): return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno) def _count_visible_keypoints(anno): return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno) def has_valid_annotation(anno): """Check annotation file.""" # if it's empty, there is no annotation if not anno: return False # if all boxes have close to zero area, there is no annotation if _has_only_empty_bbox(anno): return False # keypoints task have a slight different criteria for considering # if an annotation is valid if "keypoints" not in anno[0]: return True # for keypoint detection tasks, only consider valid images those # containing at least min_keypoints_per_image if _count_visible_keypoints(anno) >= min_keypoints_per_image: return True return False class COCOYoloDataset: """YOLOV5 Dataset for COCO.""" def __init__(self, root, ann_file, remove_images_without_annotations=True, filter_crowd_anno=True, is_training=True): self.coco = COCO(ann_file) self.root = root self.img_ids = list(sorted(self.coco.imgs.keys())) self.filter_crowd_anno = filter_crowd_anno self.is_training = is_training self.mosaic = True # filter images without any annotations if remove_images_without_annotations: img_ids = [] for img_id in self.img_ids: ann_ids = self.coco.getAnnIds(imgIds=img_id, iscrowd=None) anno = self.coco.loadAnns(ann_ids) if has_valid_annotation(anno): img_ids.append(img_id) self.img_ids = img_ids self.categories = {cat["id"]: cat["name"] for cat in 
self.coco.cats.values()} self.cat_ids_to_continuous_ids = { v: i for i, v in enumerate(self.coco.getCatIds()) } self.continuous_ids_cat_ids = { v: k for k, v in self.cat_ids_to_continuous_ids.items() } self.count = 0 def _mosaic_preprocess(self, index, input_size): labels4 = [] s = 384 self.mosaic_border = [-s // 2, -s // 2] yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border] indices = [index] + [random.randint(0, len(self.img_ids) - 1) for _ in range(3)] for i, img_ids_index in enumerate(indices): coco = self.coco img_id = self.img_ids[img_ids_index] img_path = coco.loadImgs(img_id)[0]["file_name"] img = Image.open(os.path.join(self.root, img_path)).convert("RGB") img = np.array(img) h, w = img.shape[:2] if i == 0: # top left img4 = np.full((s * 2, s * 2, img.shape[2]), 128, dtype=np.uint8) # base image with 4 tiles x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc # xmin, ymin, xmax, ymax (large image) x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h # xmin, ymin, xmax, ymax (small image) elif i == 1: # top right x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h elif i == 2: # bottom left x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h) x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h) elif i == 3: # bottom right x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h) x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h) img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b] # img4[ymin:ymax, xmin:xmax] padw = x1a - x1b padh = y1a - y1b ann_ids = coco.getAnnIds(imgIds=img_id) target = coco.loadAnns(ann_ids) # filter crowd annotations if self.filter_crowd_anno: annos = [anno for anno in target if anno["iscrowd"] == 0] else: annos = [anno for anno in target] target = {} boxes = [anno["bbox"] for anno in annos] target["bboxes"] = boxes classes = [anno["category_id"] for anno in annos] classes = 
[self.cat_ids_to_continuous_ids[cl] for cl in classes] target["labels"] = classes bboxes = target['bboxes'] labels = target['labels'] out_target = [] for bbox, label in zip(bboxes, labels): tmp = [] # convert to [x_min y_min x_max y_max] bbox = self._convetTopDown(bbox) tmp.extend(bbox) tmp.append(int(label)) # tmp [x_min y_min x_max y_max, label] out_target.append(tmp) # 这里out_target是label的实际宽高，对应于图片中的实际度量 labels = out_target.copy() labels = np.array(labels) out_target = np.array(out_target) labels[:, 0] = out_target[:, 0] + padw labels[:, 1] = out_target[:, 1] + padh labels[:, 2] = out_target[:, 2] + padw labels[:, 3] = out_target[:, 3] + padh labels4.append(labels) if labels4: labels4 = np.concatenate(labels4, 0) np.clip(labels4[:, :4], 0, 2 * s, out=labels4[:, :4]) # use with random_perspective flag = np.array([1]) return img4, labels4, input_size, flag def __getitem__(self, index): """ Args: index (int): Index Returns: (img, target) (tuple): target is a dictionary contains "bbox", "segmentation" or "keypoints", generated by the image's annotation. img is a PIL image. 
""" coco = self.coco img_id = self.img_ids[index] img_path = coco.loadImgs(img_id)[0]["file_name"] if not self.is_training: img = Image.open(os.path.join(self.root, img_path)).convert("RGB") return img, img_id input_size = [640, 640] if self.mosaic and random.random() < 0.5: return self._mosaic_preprocess(index, input_size) img = np.fromfile(os.path.join(self.root, img_path), dtype='int8') ann_ids = coco.getAnnIds(imgIds=img_id) target = coco.loadAnns(ann_ids) # filter crowd annotations if self.filter_crowd_anno: annos = [anno for anno in target if anno["iscrowd"] == 0] else: annos = [anno for anno in target] target = {} boxes = [anno["bbox"] for anno in annos] target["bboxes"] = boxes classes = [anno["category_id"] for anno in annos] classes = [self.cat_ids_to_continuous_ids[cl] for cl in classes] target["labels"] = classes bboxes = target['bboxes'] labels = target['labels'] out_target = [] for bbox, label in zip(bboxes, labels): tmp = [] # convert to [x_min y_min x_max y_max] bbox = self._convetTopDown(bbox) tmp.extend(bbox) tmp.append(int(label)) # tmp [x_min y_min x_max y_max, label] out_target.append(tmp) flag = np.array([0]) return img, out_target, input_size, flag def __len__(self): return len(self.img_ids) def _convetTopDown(self, bbox): x_min = bbox[0] y_min = bbox[1] w = bbox[2] h = bbox[3] return [x_min, y_min, x_min+w, y_min+h] def create_yolo_dataset(image_dir, anno_path, batch_size, device_num, rank, config=args, is_training=True, shuffle=True): """Create dataset for YOLOV5.""" cv2.setNumThreads(0) ds.config.set_enable_shared_mem(True) if is_training: filter_crowd = True remove_empty_anno = True else: filter_crowd = False remove_empty_anno = False yolo_dataset = COCOYoloDataset(root=image_dir, ann_file=anno_path, filter_crowd_anno=filter_crowd, remove_images_without_annotations=remove_empty_anno, is_training=is_training) distributed_sampler = DistributedSampler(len(yolo_dataset), device_num, rank, shuffle=shuffle) yolo_dataset.size = 
len(distributed_sampler) hwc_to_chw = HWC2CHW() args.dataset_size = len(yolo_dataset) cores = multiprocessing.cpu_count() num_parallel_workers = int(cores / device_num) # num_parallel_workers = 1 if is_training: multi_scale_trans = MultiScaleTrans(args, device_num) yolo_dataset.transforms = multi_scale_trans dataset_column_names = ["image", "annotation", "input_size", "mosaic_flag"] output_column_names = ["image", "annotation", "bbox1", "bbox2", "bbox3", "gt_box1", "gt_box2", "gt_box3"] map1_out_column_names = ["image", "annotation", "size"] map2_in_column_names = ["annotation", "size"] map2_out_column_names = ["annotation", "bbox1", "bbox2", "bbox3", "gt_box1", "gt_box2", "gt_box3"] dataset = ds.GeneratorDataset(yolo_dataset, column_names=dataset_column_names, sampler=distributed_sampler, python_multiprocessing=True, num_parallel_workers=min(4, num_parallel_workers)) dataset = dataset.map(operations=multi_scale_trans, input_columns=dataset_column_names, output_columns=map1_out_column_names, column_order=map1_out_column_names, num_parallel_workers=min(12, num_parallel_workers), python_multiprocessing=True) dataset = dataset.map(operations=PreprocessTrueBox(args), input_columns=map2_in_column_names, output_columns=map2_out_column_names, column_order=output_column_names, num_parallel_workers=min(4, num_parallel_workers), python_multiprocessing=False) mean = [m * 255 for m in [0.485, 0.456, 0.406]] std = [s * 255 for s in [0.229, 0.224, 0.225]] dataset = dataset.map([Normalize(mean, std), hwc_to_chw], num_parallel_workers=min(4, num_parallel_workers)) def concatenate(images): images = np.concatenate((images[..., ::2, ::2], images[..., 1::2, ::2], images[..., ::2, 1::2], images[..., 1::2, 1::2]), axis=0) return images dataset = dataset.map(operations=concatenate, input_columns="image", num_parallel_workers=min(4, num_parallel_workers)) dataset = dataset.batch(batch_size, num_parallel_workers=min(4, num_parallel_workers), drop_remainder=True) else: dataset = 
ds.GeneratorDataset(yolo_dataset, column_names=["image", "img_id"], sampler=distributed_sampler) compose_map_func = (lambda image, img_id: reshape_fn(image, img_id, args)) dataset = dataset.map(operations=compose_map_func, input_columns=["image", "img_id"], output_columns=["image", "image_shape", "img_id"], column_order=["image", "image_shape", "img_id"], num_parallel_workers=8) dataset = dataset.map(operations=hwc_to_chw, input_columns=["image"], num_parallel_workers=8) dataset = dataset.batch(batch_size, drop_remainder=True) return dataset 2.4 模型实现 Backbone （1）Focus结构：其中比较关键是切片操作。比如下图的切片示意图，4 * 4 * 3的图像切片后变成2 * 2 * 12的特征图 %E5%9B%BE%E7%89%87-4.png （2）CSP结构： Yolov5中设计了两种CSP结构，CSP1_X结构应用于Backbone主干网络，另一种CSP2_X结构则应用于Neck中。 %E5%9B%BE%E7%89%87-3.png """DarkNet model.""" import mindspore.nn as nn import mindspore.ops as ops class Bottleneck(nn.Cell): # Standard bottleneck # ch_in, ch_out, shortcut, groups, expansion def __init__(self, c1, c2, shortcut=True, e=0.5): super(Bottleneck, self).__init__() c_ = int(c2 * e) # hidden channels self.conv1 = Conv(c1, c_, 1, 1) self.conv2 = Conv(c_, c2, 3, 1) self.add = shortcut and c1 == c2 def construct(self, x): c1 = self.conv1(x) c2 = self.conv2(c1) out = c2 if self.add: out = x + out return out class BottleneckCSP(nn.Cell): # CSP Bottleneck with 3 convolutions def __init__(self, c1, c2, n=1, shortcut=True, e=0.5): super(BottleneckCSP, self).__init__() c_ = int(c2 * e) # hidden channels self.conv1 = Conv(c1, c_, 1, 1) self.conv2 = Conv(c1, c_, 1, 1) self.conv3 = Conv(2 * c_, c2, 1) # act=FReLU(c2) self.m = nn.SequentialCell( [Bottleneck(c_, c_, shortcut, e=1.0) for _ in range(n)]) self.concat = ops.Concat(axis=1) def construct(self, x): c1 = self.conv1(x) c2 = self.m(c1) c3 = self.conv2(x) c4 = self.concat((c2, c3)) c5 = self.conv3(c4) return c5 class SPP(nn.Cell): # Spatial pyramid pooling layer used in YOLOv3-SPP def __init__(self, c1, c2, k=(5, 9, 13)): super(SPP, self).__init__() c_ = c1 // 2 # hidden channels self.conv1 = 
Conv(c1, c_, 1, 1) self.conv2 = Conv(c_ * (len(k) + 1), c2, 1, 1) self.maxpool1 = nn.MaxPool2d(kernel_size=5, stride=1, pad_mode='same') self.maxpool2 = nn.MaxPool2d(kernel_size=9, stride=1, pad_mode='same') self.maxpool3 = nn.MaxPool2d(kernel_size=13, stride=1, pad_mode='same') self.concat = ops.Concat(axis=1) def construct(self, x): c1 = self.conv1(x) m1 = self.maxpool1(c1) m2 = self.maxpool2(c1) m3 = self.maxpool3(c1) c4 = self.concat((c1, m1, m2, m3)) c5 = self.conv2(c4) return c5 class Focus(nn.Cell): # Focus wh information into c-space def __init__(self, c1, c2, k=1, s=1, p=None, act=True): super(Focus, self).__init__() self.conv = Conv(c1 * 4, c2, k, s, p, act) def construct(self, x): c1 = self.conv(x) return c1 class SiLU(nn.Cell): def __init__(self): super(SiLU, self).__init__() self.sigmoid = ops.Sigmoid() def construct(self, x): return x * self.sigmoid(x) def auto_pad(k, p=None): # kernel, padding # Pad to 'same' if p is None: p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad return p class Conv(nn.Cell): # Standard convolution def __init__(self, c1, c2, k=1, s=1, p=None, dilation=1, alpha=0.1, momentum=0.97, eps=1e-3, pad_mode="same", act=True): # ch_in, ch_out, kernel, stride, padding super(Conv, self).__init__() self.padding = auto_pad(k, p) self.pad_mode = None if self.padding == 0: self.pad_mode = 'same' elif self.padding == 1: self.pad_mode = 'pad' self.conv = nn.Conv2d( c1, c2, k, s, padding=self.padding, pad_mode=self.pad_mode, has_bias=False) self.bn = nn.BatchNorm2d(c2, momentum=momentum, eps=eps) self.act = SiLU() if act is True else ( act if isinstance(act, nn.Cell) else ops.Identity()) def construct(self, x): return self.act(self.bn(self.conv(x))) class YOLOv5Backbone(nn.Cell): def __init__(self, shape): super(YOLOv5Backbone, self).__init__() self.focus = Focus(shape[0], shape[1], k=3, s=1) self.conv1 = Conv(shape[1], shape[2], k=3, s=2) self.CSP1 = BottleneckCSP(shape[2], shape[2], n=1 * shape[6]) self.conv2 = 
Conv(shape[2], shape[3], k=3, s=2)
        self.CSP2 = BottleneckCSP(shape[3], shape[3], n=3 * shape[6])
        self.conv3 = Conv(shape[3], shape[4], k=3, s=2)
        self.CSP3 = BottleneckCSP(shape[4], shape[4], n=3 * shape[6])
        self.conv4 = Conv(shape[4], shape[5], k=3, s=2)
        self.spp = SPP(shape[5], shape[5], k=[5, 9, 13])
        self.CSP4 = BottleneckCSP(shape[5], shape[5], n=1 * shape[6], shortcut=False)

    def construct(self, x):
        """construct method"""
        c1 = self.focus(x)
        c2 = self.conv1(c1)
        c3 = self.CSP1(c2)
        c4 = self.conv2(c3)
        # out
        c5 = self.CSP2(c4)
        c6 = self.conv3(c5)
        # out
        c7 = self.CSP3(c6)
        c8 = self.conv4(c7)
        c9 = self.spp(c8)
        # out
        c10 = self.CSP4(c9)
        return c5, c7, c10

2.4 Model implementation: Neck

This is the neck section of the configuration file in the PyTorch version of YOLOv5. The configuration shows that the Neck is built from only a few kinds of components: CBS (Conv), Upsample, Concat, and CSP without shortcut (C3):

[[-1, 1, Conv, [512, 1, 1]],
 [-1, 1, nn.Upsample, [None, 2, 'nearest']],
 [[-1, 6], 1, Concat, [1]],  # cat backbone P4
 [-1, 3, C3, [512, False]],  # 13

 [-1, 1, Conv, [256, 1, 1]],
 [-1, 1, nn.Upsample, [None, 2, 'nearest']],
 [[-1, 4], 1, Concat, [1]],  # cat backbone P3
 [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

 [-1, 1, Conv, [256, 3, 2]],
 [[-1, 14], 1, Concat, [1]],  # cat head P4
 [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

 [-1, 1, Conv, [512, 3, 2]],
 [[-1, 10], 1, Concat, [1]],  # cat head P5
 [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
]

> The Neck also keeps the FPN+PAN design. FPN uses top-down lateral connections to build high-level semantic feature maps at every scale, which is the classic feature pyramid structure. PAN is not exotic either: because low-level target information has already become quite blurred by that point, PAN adds a bottom-up path that compensates for and strengthens the localization information.

# model
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops


class YoloBlock(nn.Cell):
    def __init__(self, in_channels, out_channels):
        super(YoloBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, has_bias=True)

    def construct(self, x):
        """construct method"""
        out = self.conv(x)
        return out


class YOLO(nn.Cell):
    def __init__(self, backbone, shape):
        super(YOLO, self).__init__()
        self.backbone = backbone
        self.out_channel = (80 + 5) * 3
        self.conv1 = 
Conv(shape[5], shape[4], k=1, s=1)
        self.CSP5 = BottleneckCSP(shape[5], shape[4], n=1*shape[6], shortcut=False)
        self.conv2 = Conv(shape[4], shape[3], k=1, s=1)
        self.CSP6 = BottleneckCSP(shape[4], shape[3], n=1*shape[6], shortcut=False)
        self.conv3 = Conv(shape[3], shape[3], k=3, s=2)
        self.CSP7 = BottleneckCSP(shape[4], shape[4], n=1*shape[6], shortcut=False)
        self.conv4 = Conv(shape[4], shape[4], k=3, s=2)
        self.CSP8 = BottleneckCSP(shape[5], shape[5], n=1*shape[6], shortcut=False)
        self.back_block1 = YoloBlock(shape[3], self.out_channel)
        self.back_block2 = YoloBlock(shape[4], self.out_channel)
        self.back_block3 = YoloBlock(shape[5], self.out_channel)

        self.concat = ops.Concat(axis=1)

    def construct(self, x):
        img_height = x.shape[2] * 2
        img_width = x.shape[3] * 2

        feature_map1, feature_map2, feature_map3 = self.backbone(x)

        c1 = self.conv1(feature_map3)
        ups1 = ops.ResizeNearestNeighbor((img_height // 16, img_width // 16))(c1)
        c2 = self.concat((ups1, feature_map2))
        c3 = self.CSP5(c2)
        c4 = self.conv2(c3)
        ups2 = ops.ResizeNearestNeighbor((img_height // 8, img_width // 8))(c4)
        c5 = self.concat((ups2, feature_map1))
        # out
        c6 = self.CSP6(c5)
        c7 = self.conv3(c6)

        c8 = self.concat((c7, c4))
        # out
        c9 = self.CSP7(c8)
        c10 = self.conv4(c9)
        c11 = self.concat((c10, c1))
        # out
        c12 = self.CSP8(c11)
        small_object_output = self.back_block1(c6)
        medium_object_output = self.back_block2(c9)
        big_object_output = self.back_block3(c12)
        return small_object_output, medium_object_output, big_object_output

2.4 Model construction: backbone + Neck + Head

Backbone: extracts features.
Neck: mixes and combines the features and passes them on to the prediction layer.
Head: produces the final prediction output.

# backbone+Neck+Head
class DetectionBlock(nn.Cell):
    def __init__(self, scale, is_training=True):
        super(DetectionBlock, self).__init__()
        if scale == 's':
            idx = (0, 1, 2)
            self.scale_x_y = 1.2
            self.offset_x_y = 0.1
        elif scale == 'm':
            idx = (3, 4, 5)
            self.scale_x_y = 1.1
            self.offset_x_y = 0.05
        elif scale == 'l':
            idx = (6, 7, 8)
            self.scale_x_y = 1.05
            self.offset_x_y = 0.025
        else:
            raise KeyError("Invalid 
scale value for DetectionBlock") self.anchors = ms.Tensor([args.anchor_scales[i] for i in idx], ms.float32) self.num_anchors_per_scale = 3 self.num_attrib = 4+1+args.num_classes self.lambda_coord = 1 self.sigmoid = nn.Sigmoid() self.reshape = ops.Reshape() self.tile = ops.Tile() self.concat = ops.Concat(axis=-1) self.pow = ops.Pow() self.transpose = ops.Transpose() self.exp = ops.Exp() self.conf_training = is_training def construct(self, x, input_shape): """construct method""" num_batch = x.shape[0] grid_size = x.shape[2:4] # Reshape and transpose the feature to [n, grid_size[0], grid_size[1], 3, num_attrib] prediction = self.reshape(x, (num_batch, self.num_anchors_per_scale, self.num_attrib, grid_size[0], grid_size[1])) prediction = self.transpose(prediction, (0, 3, 4, 1, 2)) grid_x = ms.numpy.arange(grid_size[1]) grid_y = ms.numpy.arange(grid_size[0]) # Tensor of shape [grid_size[0], grid_size[1], 1, 1] representing the coordinate of x/y axis for each grid # [batch, gridx, gridy, 1, 1] grid_x = self.tile(self.reshape(grid_x, (1, 1, -1, 1, 1)), (1, grid_size[0], 1, 1, 1)) grid_y = self.tile(self.reshape(grid_y, (1, -1, 1, 1, 1)), (1, 1, grid_size[1], 1, 1)) # Shape is [grid_size[0], grid_size[1], 1, 2] grid = self.concat((grid_x, grid_y)) box_xy = prediction[:, :, :, :, :2] box_wh = prediction[:, :, :, :, 2:4] box_confidence = prediction[:, :, :, :, 4:5] box_probs = prediction[:, :, :, :, 5:] # gridsize1 is x # gridsize0 is y box_xy = (self.scale_x_y * self.sigmoid(box_xy) - self.offset_x_y + grid) / \ ops.cast(ops.tuple_to_array((grid_size[1], grid_size[0])), ms.float32) # box_wh is w->h box_wh = self.exp(box_wh) * self.anchors / input_shape box_confidence = self.sigmoid(box_confidence) box_probs = self.sigmoid(box_probs) if self.conf_training: return prediction, box_xy, box_wh return self.concat((box_xy, box_wh, box_confidence, box_probs)) class YOLOV5(nn.Cell): """ YOLOV5 network. Args: is_training: Bool. Whether train or not. 
    Returns:
        Cell, cell instance of YOLOV5 neural network.

    Examples:
        YOLOV5s(True)
    """

    def __init__(self, is_training, version=0):
        super(YOLOV5, self).__init__()
        # YOLOv5 network
        self.shape = args.input_shape[version]
        self.feature_map = YOLO(backbone=YOLOv5Backbone(shape=self.shape), shape=self.shape)

        # prediction on the default anchor boxes
        self.detect_1 = DetectionBlock('l', is_training=is_training)
        self.detect_2 = DetectionBlock('m', is_training=is_training)
        self.detect_3 = DetectionBlock('s', is_training=is_training)

    def construct(self, x, input_shape):
        small_object_output, medium_object_output, big_object_output = self.feature_map(x)
        output_big = self.detect_1(big_object_output, input_shape)
        output_me = self.detect_2(medium_object_output, input_shape)
        output_small = self.detect_3(small_object_output, input_shape)
        # big is the final output which has smallest feature map
        return output_big, output_me, output_small

2.5 Loss definition

YOLOv5 uses three loss terms:
Classification loss cls_loss: whether the anchor box is assigned the correct labeled class.
Localization loss box_loss: the error between the predicted box and the labeled box (GIoU).
Confidence loss obj_loss: the confidence of the network.

Total loss = classification loss + localization loss + confidence loss (the YoloLossBlock code below weights the localization term by 4).

cls_loss and obj_loss are computed with the binary cross-entropy loss BCEWithLogitsLoss, where BCEWithLogitsLoss = Sigmoid + BCELoss. box_loss uses the GIoU loss (it can also be replaced with other IoU losses such as CIoU, EIoU, or SIoU).

> IoU loss

IoU Loss is defined as the negative logarithm of the ratio of the intersection to the union of the predicted box and the ground-truth box, but in practice it is usually written as 1 - IoU. If the two boxes coincide, the IoU equals 1 and the loss is 0, which indicates a very high overlap. Used as a distance, 1 - IoU satisfies non-negativity, identity, symmetry, and the triangle inequality. Compared with L1/L2 losses it is also scale-invariant: whatever the size of the boxes, the IoU loss always lies between 0 and 1. It therefore reflects well how closely the predicted box matches the ground-truth box.

The IoU formula is:

IoU = |A ∩ B| / |A ∪ B|

[Figure: IoU illustration]

Plain IoU has clear strengths and weaknesses. Strengths: 1. IoU is scale-invariant. 2. It is non-negative. Because IoU does not take the distance between boxes into account, it also has drawbacks when used as a loss: 1. When box A and box B do not overlap, the IoU is 0 and cannot reflect how far apart they are. 2. IoU cannot precisely reflect how the two boxes overlap.

GIoU was proposed to overcome these drawbacks of IoU while keeping its strengths (paper: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression).

The GIoU formula is:

GIoU = IoU - |C \ (A ∪ B)| / |C|

[Figure: GIoU illustration]

The computation proceeds as follows:
1. Let A be the predicted box, B the ground-truth box, and S the set of all boxes.
2. Whether or not A and B intersect, C is the smallest box that contains both A and B (the smallest enclosing convex box), and C also belongs to S.
3. Compute the IoU of A and B.
4. Compute the area inside C that is covered by neither A nor B, and divide it by the area of C.
5. Subtract that ratio from the IoU to get the GIoU.

# loss
class ConfidenceLoss(nn.Cell):
    """Loss for confidence."""

    def __init__(self):
        super(ConfidenceLoss, self).__init__()
        self.cross_entropy = ops.SigmoidCrossEntropyWithLogits()
        self.reduce_sum = ops.ReduceSum()

    def construct(self, object_mask, predict_confidence, ignore_mask):
        confidence_loss = self.cross_entropy(predict_confidence, object_mask)
        confidence_loss = object_mask * confidence_loss + (1 - object_mask) * confidence_loss * ignore_mask
        confidence_loss = self.reduce_sum(confidence_loss, ())
        return confidence_loss


class ClassLoss(nn.Cell):
    """Loss for classification."""

    def __init__(self):
        super(ClassLoss, self).__init__()
        self.cross_entropy = ops.SigmoidCrossEntropyWithLogits()
        self.reduce_sum = ops.ReduceSum()

    def construct(self, object_mask, predict_class, class_probs):
        class_loss = object_mask * self.cross_entropy(predict_class, class_probs)
        class_loss = self.reduce_sum(class_loss, ())
        return class_loss


class Iou(nn.Cell):
    """Calculate the iou of boxes"""

    def __init__(self):
        super(Iou, self).__init__()
        self.min = ops.Minimum()
        self.max = ops.Maximum()
        self.squeeze = ops.Squeeze(-1)

    def construct(self, box1, box2):
        """
        box1: pred_box [batch, gx, gy, anchors, 1, 4] ->4: [x_center, y_center, w, h]
        box2: gt_box [batch, 1, 1, 1, maxbox, 4]
        convert to topLeft and rightDown
        """
        box1_xy = box1[:, :, :, :, :, :2]
        box1_wh = box1[:, :, :, :, :, 2:4]
        box1_mins = box1_xy - box1_wh / ops.scalar_to_array(2.0)  # topLeft
        box1_maxs = box1_xy + box1_wh / ops.scalar_to_array(2.0)  # rightDown

        box2_xy = box2[:, :, :, :, :, :2]
        box2_wh = box2[:, :, :, :, :, 2:4]
        box2_mins = box2_xy - box2_wh / ops.scalar_to_array(2.0)
        box2_maxs = box2_xy + box2_wh / 
ops.scalar_to_array(2.0) intersect_mins = self.max(box1_mins, box2_mins) intersect_maxs = self.min(box1_maxs, box2_maxs) intersect_wh = self.max(intersect_maxs - intersect_mins, ops.scalar_to_array(0.0)) # self.squeeze: for effiecient slice intersect_area = self.squeeze(intersect_wh[:, :, :, :, :, 0:1]) * \ self.squeeze(intersect_wh[:, :, :, :, :, 1:2]) box1_area = self.squeeze(box1_wh[:, :, :, :, :, 0:1]) * \ self.squeeze(box1_wh[:, :, :, :, :, 1:2]) box2_area = self.squeeze(box2_wh[:, :, :, :, :, 0:1]) * \ self.squeeze(box2_wh[:, :, :, :, :, 1:2]) iou = intersect_area / (box1_area + box2_area - intersect_area) # iou : [batch, gx, gy, anchors, maxboxes] return iou class GIou(nn.Cell): """Calculating giou""" def __init__(self): super(GIou, self).__init__() self.reshape = ops.Reshape() self.min = ops.Minimum() self.max = ops.Maximum() self.concat = ops.Concat(axis=1) self.mean = ops.ReduceMean() self.div = ops.RealDiv() self.eps = 0.000001 def construct(self, box_p, box_gt): """construct method""" box_p_area = (box_p[..., 2:3] - box_p[..., 0:1]) * (box_p[..., 3:4] - box_p[..., 1:2]) box_gt_area = (box_gt[..., 2:3] - box_gt[..., 0:1]) * (box_gt[..., 3:4] - box_gt[..., 1:2]) x_1 = self.max(box_p[..., 0:1], box_gt[..., 0:1]) x_2 = self.min(box_p[..., 2:3], box_gt[..., 2:3]) y_1 = self.max(box_p[..., 1:2], box_gt[..., 1:2]) y_2 = self.min(box_p[..., 3:4], box_gt[..., 3:4]) intersection = (y_2 - y_1) * (x_2 - x_1) xc_1 = self.min(box_p[..., 0:1], box_gt[..., 0:1]) xc_2 = self.max(box_p[..., 2:3], box_gt[..., 2:3]) yc_1 = self.min(box_p[..., 1:2], box_gt[..., 1:2]) yc_2 = self.max(box_p[..., 3:4], box_gt[..., 3:4]) c_area = (xc_2 - xc_1) * (yc_2 - yc_1) union = box_p_area + box_gt_area - intersection union = union + self.eps c_area = c_area + self.eps iou = self.div(ops.cast(intersection, ms.float32), ops.cast(union, ms.float32)) res_mid0 = c_area - union res_mid1 = self.div(ops.cast(res_mid0, ms.float32), ops.cast(c_area, ms.float32)) giou = iou - res_mid1 giou = 
ops.clip_by_value(giou, -1.0, 1.0) return giou def xywh2x1y1x2y2(box_xywh): boxes_x1 = box_xywh[..., 0:1] - box_xywh[..., 2:3] / 2 boxes_y1 = box_xywh[..., 1:2] - box_xywh[..., 3:4] / 2 boxes_x2 = box_xywh[..., 0:1] + box_xywh[..., 2:3] / 2 boxes_y2 = box_xywh[..., 1:2] + box_xywh[..., 3:4] / 2 boxes_x1y1x2y2 = ops.Concat(-1)((boxes_x1, boxes_y1, boxes_x2, boxes_y2)) return boxes_x1y1x2y2 class YoloLossBlock(nn.Cell): """ Loss block cell of YOLOV5 network. """ def __init__(self, scale, config=args): super(YoloLossBlock, self).__init__() self.config = config if scale == 's': # anchor mask idx = (0, 1, 2) elif scale == 'm': idx = (3, 4, 5) elif scale == 'l': idx = (6, 7, 8) else: raise KeyError("Invalid scale value for DetectionBlock") self.anchors = ms.Tensor([self.config.anchor_scales[i] for i in idx], ms.float32) self.ignore_threshold = ms.Tensor(self.config.ignore_threshold, ms.float32) self.concat = ops.Concat(axis=-1) self.iou = Iou() self.reduce_max = ops.ReduceMax(keep_dims=False) self.confidence_loss = ConfidenceLoss() self.class_loss = ClassLoss() self.reduce_sum = ops.ReduceSum() self.select = ops.Select() self.equal = ops.Equal() self.reshape = ops.Reshape() self.expand_dims = ops.ExpandDims() self.ones_like = ops.OnesLike() self.log = ops.Log() self.tuple_to_array = ops.TupleToArray() self.g_iou = GIou() def construct(self, prediction, pred_xy, pred_wh, y_true, gt_box, input_shape): """ prediction : origin output from yolo pred_xy: (sigmoid(xy)+grid)/grid_size pred_wh: (exp(wh)*anchors)/input_shape y_true : after normalize gt_box: [batch, maxboxes, xyhw] after normalize """ object_mask = y_true[:, :, :, :, 4:5] class_probs = y_true[:, :, :, :, 5:] true_boxes = y_true[:, :, :, :, :4] grid_shape = prediction.shape[1:3] grid_shape = ops.cast(self.tuple_to_array(grid_shape[::-1]), ms.float32) pred_boxes = self.concat((pred_xy, pred_wh)) true_wh = y_true[:, :, :, :, 2:4] true_wh = self.select(self.equal(true_wh, 0.0), self.ones_like(true_wh), true_wh) true_wh 
= self.log(true_wh / self.anchors * input_shape) # 2-w*h for large picture, use small scale, since small obj need more precise box_loss_scale = 2 - y_true[:, :, :, :, 2:3] * y_true[:, :, :, :, 3:4] gt_shape = gt_box.shape gt_box = self.reshape(gt_box, (gt_shape[0], 1, 1, 1, gt_shape[1], gt_shape[2])) # add one more dimension for broadcast iou = self.iou(self.expand_dims(pred_boxes, -2), gt_box) # gt_box is x,y,h,w after normalize # [batch, grid[0], grid[1], num_anchor, num_gt] best_iou = self.reduce_max(iou, -1) # [batch, grid[0], grid[1], num_anchor] # ignore_mask IOU too small ignore_mask = best_iou < self.ignore_threshold ignore_mask = ops.cast(ignore_mask, ms.float32) ignore_mask = self.expand_dims(ignore_mask, -1) # ignore_mask backpro will cause a lot maximunGrad and minimumGrad time consume. # so we turn off its gradient ignore_mask = ops.stop_gradient(ignore_mask) confidence_loss = self.confidence_loss(object_mask, prediction[:, :, :, :, 4:5], ignore_mask) class_loss = self.class_loss(object_mask, prediction[:, :, :, :, 5:], class_probs) object_mask_me = self.reshape(object_mask, (-1, 1)) # [8, 72, 72, 3, 1] box_loss_scale_me = self.reshape(box_loss_scale, (-1, 1)) pred_boxes_me = xywh2x1y1x2y2(pred_boxes) pred_boxes_me = self.reshape(pred_boxes_me, (-1, 4)) true_boxes_me = xywh2x1y1x2y2(true_boxes) true_boxes_me = self.reshape(true_boxes_me, (-1, 4)) c_iou = self.g_iou(pred_boxes_me, true_boxes_me) c_iou_loss = object_mask_me * box_loss_scale_me * (1 - c_iou) c_iou_loss_me = self.reduce_sum(c_iou_loss, ()) loss = c_iou_loss_me * 4 + confidence_loss + class_loss batch_size = prediction.shape[0] return loss / batch_size class YoloWithLossCell(nn.Cell): """YOLOV5 loss.""" def __init__(self, network): super(YoloWithLossCell, self).__init__() self.yolo_network = network self.config = args self.loss_big = YoloLossBlock('l', self.config) self.loss_me = YoloLossBlock('m', self.config) self.loss_small = YoloLossBlock('s', self.config) self.tenser_to_array = 
ops.TupleToArray() def construct(self, x, y_true_0, y_true_1, y_true_2, gt_0, gt_1, gt_2, input_shape): input_shape = x.shape[2:4] input_shape = ops.cast(self.tenser_to_array(input_shape) * 2, ms.float32) yolo_out = self.yolo_network(x, input_shape) loss_l = self.loss_big(*yolo_out[0], y_true_0, gt_0, input_shape) loss_m = self.loss_me(*yolo_out[1], y_true_1, gt_1, input_shape) loss_s = self.loss_small(*yolo_out[2], y_true_2, gt_2, input_shape) return loss_l + loss_m + loss_s * 0.2
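
The five GIoU steps in section 2.5 can be traced on concrete numbers. The following is a minimal plain-Python sketch, not the post's MindSpore cell: the function name `giou` and the `(x1, y1, x2, y2)` tuple layout are assumptions chosen for illustration. Unlike the `GIou` cell above, it clamps the intersection width and height at 0, so two boxes that are disjoint on both axes do not end up with a spurious positive intersection area (the product of two negative extents).

```python
def giou(box_p, box_gt, eps=1e-6):
    """GIoU of two boxes given as (x1, y1, x2, y2) tuples (illustrative layout)."""
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])

    # Step 3: intersection and union of A and B (extents clamped at 0).
    inter_w = max(min(box_p[2], box_gt[2]) - max(box_p[0], box_gt[0]), 0.0)
    inter_h = max(min(box_p[3], box_gt[3]) - max(box_p[1], box_gt[1]), 0.0)
    intersection = inter_w * inter_h
    union = area_p + area_gt - intersection + eps
    iou = intersection / union

    # Step 2: smallest enclosing box C.
    c_w = max(box_p[2], box_gt[2]) - min(box_p[0], box_gt[0])
    c_h = max(box_p[3], box_gt[3]) - min(box_p[1], box_gt[1])
    c_area = c_w * c_h + eps

    # Steps 4 and 5: subtract the share of C covered by neither box.
    return iou - (c_area - union) / c_area


print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: close to 1
print(giou((0, 0, 2, 2), (1, 0, 3, 2)))  # half overlap, C fully covered: close to 1/3
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: negative, close to -7/9
```

For the disjoint pair the IoU is 0, but the GIoU still moves toward 0 as the boxes approach each other, which is the property the loss relies on.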

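The Focus slicing from section 2.4 (a 4 * 4 * 3 image becoming a 2 * 2 * 12 feature map) can be checked with plain NumPy. This is a small sketch rather than the post's code: `focus_slice` is a hypothetical name, and it assumes a channel-first `(C, H, W)` array. The slice order mirrors the `concatenate` helper in the dataset pipeline above, which is where this post performs the slicing, so the `Focus` cell itself only applies a convolution on `c1 * 4` input channels.

```python
import numpy as np


def focus_slice(image_chw):
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2) by taking every second pixel."""
    return np.concatenate(
        (image_chw[..., ::2, ::2],     # even rows, even columns
         image_chw[..., 1::2, ::2],    # odd rows, even columns
         image_chw[..., ::2, 1::2],    # even rows, odd columns
         image_chw[..., 1::2, 1::2]),  # odd rows, odd columns
        axis=0,
    )


image = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
sliced = focus_slice(image)
print(sliced.shape)  # (12, 2, 2)
```

No pixel is discarded: the spatial resolution halves while the channel count quadruples.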
yd_233394255 posted 2024-03-09 17:43:07; last reply: yd_233394255 0
48 0

Image recognition · Image
[Other] YoloX (4)

3.12 Training-related functions

#------------------------#
# train func
#------------------------#
set_seed(888)


def set_default():
    """ set default """
    if config.enable_modelarts:
        config.data_root = os.path.join(config.data_dir, 'coco2017/train2017')
        config.annFile = os.path.join(config.data_dir, 'coco2017/annotations')
        outputs_dir = os.path.join(config.outputs_dir, config.ckpt_path)
    else:
        config.data_root = os.path.join(config.data_dir, 'train2017')
        config.annFile = os.path.join(config.data_dir, 'annotations/instances_train2017.json')
        outputs_dir = config.ckpt_path

    # logger
    config.outputs_dir = os.path.join(outputs_dir, datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
    config.logger = get_logger(config.outputs_dir, config.rank)
    config.logger.save_args(config)


def set_graph_kernel_context():
    if context.get_context("device_target") == "GPU":
        context.set_context(enable_graph_kernel=True)
        context.set_context(graph_kernel_flags="--enable_parallel_fusion "
                                               "--enable_trans_op_optimize "
                                               "--disable_cluster_ops=ReduceMax,Reshape "
                                               "--enable_expand_ops=Conv2D")


def network_init(cfg):
    """ Network init """
    device_id = int(os.getenv('DEVICE_ID', '0'))
    context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target, save_graphs=cfg.save_graphs,
                        device_id=device_id, save_graphs_path="ir_path")
    set_graph_kernel_context()

    profiler = None
    if cfg.need_profiler:
        profiling_dir = os.path.join(cfg.outputs_dir, datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
        profiler = Profiler(output_path=profiling_dir, is_detail=True, is_show_op_path=True)

    # init distributed
    cfg.use_syc_bn = False
    if cfg.is_distributed:
        cfg.use_syc_bn = True
        init()
        cfg.rank = get_rank()
        cfg.group_size = get_group_size()
        context.reset_auto_parallel_context()
        context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,
                                          device_num=cfg.group_size)

    # select for master rank save ckpt or all rank save, compatible for model parallel
    cfg.rank_save_ckpt_flag = 0
    if 
cfg.is_save_on_master: if cfg.rank == 0: cfg.rank_save_ckpt_flag = 1 else: cfg.rank_save_ckpt_flag = 1 # logger cfg.outputs_dir = os.path.join(cfg.ckpt_path, datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) cfg.logger = get_logger(cfg.outputs_dir, cfg.rank) cfg.logger.save_args(cfg) return profiler def parallel_init(args): context.reset_auto_parallel_context() parallel_mode = ParallelMode.STAND_ALONE degree = 1 if args.is_distributed: parallel_mode = ParallelMode.DATA_PARALLEL degree = get_group_size() context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=degree) def modelarts_pre_process(): '''modelarts pre process function.''' def unzip(zip_file, save_dir): import zipfile s_time = time.time() if not os.path.exists(os.path.join(save_dir, config.modelarts_dataset_unzip_name)): zip_isexist = zipfile.is_zipfile(zip_file) if zip_isexist: fz = zipfile.ZipFile(zip_file, 'r') data_num = len(fz.namelist()) print("Extract Start...") print("unzip file num: {}".format(data_num)) data_print = int(data_num / 100) if data_num > 100 else 1 i = 0 for file in fz.namelist(): if i % data_print == 0: print("unzip percent: {}%".format(int(i * 100 / data_num)), flush=True) i += 1 fz.extract(file, save_dir) print("cost time: {}min:{}s.".format(int((time.time() - s_time) / 60), int(int(time.time() - s_time) % 60))) print("Extract Done.") else: print("This is not zip.") else: print("Zip has been extracted.") if config.need_modelarts_dataset_unzip: zip_file_1 = os.path.join(config.data_path, config.modelarts_dataset_unzip_name + ".zip") save_dir_1 = os.path.join(config.data_path) sync_lock = "/tmp/unzip_sync.lock" # Each server contains 8 devices as most. 
if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock): print("Zip file path: ", zip_file_1) print("Unzip file save dir: ", save_dir_1) unzip(zip_file_1, save_dir_1) print("===Finish extract data synchronization===") try: os.mknod(sync_lock) except IOError: pass while True: if os.path.exists(sync_lock): break time.sleep(1) print("Device: {}, Finish sync unzip data from {} to {}.".format(get_device_id(), zip_file_1, save_dir_1)) config.ckpt_path = os.path.join(config.output_path, config.ckpt_path) def parser_init(): parser = argparse.ArgumentParser(description='Yolox train.') parser.add_argument('--data_url', required=False, default=None, help='Location of data.') parser.add_argument('--train_url', required=False, default=None, help='Location of training outputs.') parser.add_argument('--backbone', required=False, default="yolox_darknet53") parser.add_argument('--min_lr_ratio', required=False, default=0.05) parser.add_argument('--data_aug', required=False, default=True) return parser def get_val_dataset(): val_root = os.path.join(config.data_dir, 'val2017') ann_file = os.path.join(config.data_dir, 'annotations/instances_val2017.json') ds_test = create_yolox_dataset(val_root, ann_file, is_training=False, batch_size=config.per_batch_size, device_num=config.group_size, rank=config.rank) config.logger.info("Finish loading the val dataset!") return ds_test def get_optimizer(cfg, network, lr): param_group = get_param_groups(network, cfg.weight_decay) if cfg.opt == "SGD": from mindspore.nn import SGD opt = SGD(params=param_group, learning_rate=Tensor(lr), momentum=config.momentum, nesterov=True) cfg.logger.info("Use SGD Optimizer") else: from mindspore.nn import Momentum opt = Momentum(params=param_group, learning_rate=Tensor(lr), momentum=cfg.momentum, use_nesterov=True) cfg.logger.info("Use Momentum Optimizer") return opt def load_resume_checkpoint(cfg, network, ckpt_path): param_dict = load_checkpoint(ckpt_path) ema_train_weight = [] 
    ema_moving_weight = []
    param_load = {}
    for key, param in param_dict.items():
        if key.startswith("network.") or key.startswith("moments."):
            param_load[key] = param
        elif "updates" in key:
            cfg.updates = param
            network.updates = cfg.updates
            config.logger.info("network_ema updates:%s" % network.updates.asnumpy().item())
    load_param_into_net(network, param_load)

    for key, param in network.parameters_and_names():
        if key.startswith("ema.") and "moving_mean" not in key and "moving_variance" not in key:
            ema_train_weight.append(param_dict[key])
        elif key.startswith("ema.") and ("moving_mean" in key or "moving_variance" in key):
            ema_moving_weight.append(param_dict[key])

    if network.ema:
        if ema_train_weight and ema_moving_weight:
            network.ema_weight = ParameterTuple(ema_train_weight)
            network.ema_moving_weight = ParameterTuple(ema_moving_weight)
            config.logger.info("successful loading ema weights")

4 Running

4.1 Training

@moxing_wrapper(pre_process=modelarts_pre_process)
def run_train(train_stage='stage_1', profiler=None):
    """ Launch Train process """
    parser = parser_init()
    args_opt, _ = parser.parse_known_args()
    if not config.data_aug:  # Train the last no data augment epochs
        config.use_l1 = True  # Add L1 loss
        config.max_epoch = config.total_epoch - config.max_epoch
        config.lr_scheduler = "no_aug_lr"  # fix the min lr for last no data aug epochs
    if config.enable_modelarts:
        import moxing as mox
        local_data_url = os.path.join(config.data_path, str(config.rank))
        local_annFile = os.path.join(config.data_path, str(config.rank))
        mox.file.copy_parallel(config.data_root, local_data_url)
        config.data_dir = os.path.join(config.data_path, 'coco2017')
        mox.file.copy_parallel(config.annFile, local_annFile)
        config.annFile = os.path.join(local_data_url, 'instances_train2017.json')
    if config.backbone == "yolox_darknet53":
        backbone = "yolofpn"
    else:
        backbone = "yolopafpn"
    base_network = DetectionBlock(config, backbone=backbone)
    if config.pretrained:
        base_network = load_backbone(base_network, config.pretrained, config)
    config.logger.info('Training backbone is: %s' % config.backbone)
    if config.use_syc_bn:
        config.logger.info("Using Synchronized batch norm layer...")
        use_syc_bn(base_network)
    default_recurisive_init(base_network)
    config.logger.info("Network weights have been initialized...")
    network = YOLOLossCell(base_network, config)
    config.logger.info('Finish getting network...')
    config.data_root = os.path.join(config.data_dir, 'train2017')
    config.annFile = os.path.join(config.data_dir, 'annotations/instances_train2017.json')
    ds = create_yolox_dataset(image_dir=config.data_root, anno_path=config.annFile,
                              batch_size=config.per_batch_size, device_num=config.group_size,
                              rank=config.rank, data_aug=config.data_aug)
    ds_test = get_val_dataset()
    config.logger.info('Finish loading training dataset! batch size:%s' % config.per_batch_size)
    config.steps_per_epoch = ds.get_dataset_size()
    config.logger.info('%s steps for one epoch.' % config.steps_per_epoch)
    if config.ckpt_interval <= 0:
        config.ckpt_interval = 1
    lr = get_lr(config)
    config.logger.info("Learning rate scheduler:%s, base_lr:%s, min lr ratio:%s" %
                       (config.lr_scheduler, config.lr, config.min_lr_ratio))
    opt = get_optimizer(config, network, lr)
    loss_scale_manager = DynamicLossScaleManager(init_loss_scale=2 ** 22)
    update_cell = loss_scale_manager.get_update_cell()
    network_ema = TrainOneStepWithEMA(network, opt, update_cell,
                                      ema=True, decay=0.9998, updates=config.updates).set_train()
    if config.resume_yolox:
        # .item() (not .items()) converts the one-element array to a Python scalar,
        # matching its use in load_resume_checkpoint
        resume_steps = config.updates.asnumpy().item()
        config.resume_epoch = resume_steps // config.steps_per_epoch
        lr = lr[resume_steps:]
        opt = get_optimizer(config, network, lr)
        network_ema = TrainOneStepWithEMA(network, opt, update_cell,
                                          ema=True, decay=0.9998, updates=resume_steps).set_train()
        load_resume_checkpoint(config, network_ema, config.resume_yolox)
    if not config.data_aug:
        if os.path.isfile(config.yolox_no_aug_ckpt):  # Loading the resume checkpoint for the last no data aug epochs
            load_resume_checkpoint(config, network_ema, 
config.yolox_no_aug_ckpt) config.logger.info("Finish load the resume checkpoint, begin to train the last...") else: raise FileNotFoundError('{} not exist or not a pre-trained file'.format(config.yolox_no_aug_ckpt)) config.logger.info("Add ema model") model = Model(network_ema, amp_level="O0") cb = [] save_ckpt_path = None if config.rank_save_ckpt_flag: cb.append(EMACallBack(network_ema, config.steps_per_epoch)) ckpt_config = CheckpointConfig(save_checkpoint_steps=config.steps_per_epoch * config.ckpt_interval, keep_checkpoint_max=config.ckpt_max_num) save_ckpt_path = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/') cb.append(ModelCheckpoint(config=ckpt_config, directory=save_ckpt_path, prefix='{}'.format(config.backbone))) cb.append(YOLOXCB(config.logger, config.steps_per_epoch, lr=lr, save_ckpt_path=save_ckpt_path, is_modelart=config.enable_modelarts, per_print_times=config.log_interval, train_url=args_opt.train_url)) if config.run_eval: test_block = DetectionBlock(config, backbone=backbone) cb.append( EvalCallBack(ds_test, test_block, network_ema, DetectionEngine(config), config, interval=config.eval_interval)) if config.need_profiler: model.train(3, ds, callbacks=cb, dataset_sink_mode=True, sink_size=config.log_interval) profiler.analyse() else: config.logger.info("Epoch number:%s" % config.max_epoch) config.logger.info("All steps number:%s" % (config.max_epoch * config.steps_per_epoch)) config.logger.info("==================Start Training " + train_stage + "=========================") model.train(config.max_epoch, ds, callbacks=cb, dataset_sink_mode=False, sink_size=-1) config.logger.info("==================Training END " + train_stage + "======================") mindspore.save_checkpoint(network_ema, os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt')) config.yolox_no_aug_ckpt = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/' + train_stage + 
'/' + train_stage+'_final.ckpt')
    config.val_ckpt = os.path.join(config.outputs_dir,
                                   'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt')
    config.pred_ckpt = os.path.join(config.outputs_dir,
                                    'ckpt_' + str(config.rank) + '/' + train_stage + '/' + train_stage+'_final.ckpt')

4.2 Validation

#------------------------#
# eval func
#------------------------#
def run_eval():
    """The function of eval"""
    config.data_root = os.path.join(config.data_dir, 'val2017')
    config.annFile = os.path.join(config.data_dir, 'annotations/instances_val2017.json')

    # logger
    config.outputs_dir = os.path.join(
        config.log_path, datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')
    )
    rank_id = int(os.getenv('RANK_ID', '0'))
    config.logger = get_logger(config.outputs_dir, rank_id)

    context.reset_auto_parallel_context()
    parallel_mode = ParallelMode.STAND_ALONE
    context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=1)
    # ------------------network create----------------------------------------------------------------------------
    config.logger.info('Begin Creating Network....')
    if config.backbone == "yolox_darknet53":
        backbone = "yolofpn"
    else:
        backbone = "yolopafpn"
    network = DetectionBlock(config, backbone=backbone)  # default yolo-darknet53
    default_recurisive_init(network)
    config.logger.info(config.val_ckpt)
    if os.path.isfile(config.val_ckpt):
        param_dict = load_checkpoint(config.val_ckpt)
        ema_param_dict = {}
        for param in param_dict:
            if param.startswith("ema."):
                new_name = param.split("ema.")[1]
                data = param_dict[param]
                data.name = new_name
                ema_param_dict[new_name] = data

        load_param_into_net(network, ema_param_dict)
        config.logger.info('load model %s success', config.val_ckpt)
    else:
        # double-quoted so the apostrophe survives ('doesn''t' concatenates to "doesnt")
        config.logger.info("%s doesn't exist or is not a pre-trained file", config.val_ckpt)
        raise FileNotFoundError('{} not exist or not a pre-trained file'.format(config.val_ckpt))
    data_root = config.data_root
    anno_file = config.annFile
    ds = 
create_yolox_dataset(data_root, anno_file, is_training=False, batch_size=config.per_batch_size,
                              device_num=1, rank=rank_id)
    data_size = ds.get_dataset_size()
    config.logger.info(
        'Finish loading the dataset, totally %s images to eval, iters %s' %
        (data_size * config.per_batch_size, data_size))
    network.set_train(False)
    # init detection engine
    detection = DetectionEngine(config)
    config.logger.info('Start inference...')
    for _, data in enumerate(
            tqdm(ds.create_dict_iterator(num_epochs=1), total=data_size, colour="GREEN")):
        image = data['image']
        img_info = data['image_shape']
        img_id = data['img_id']
        prediction = network(image)
        prediction = prediction.asnumpy()
        img_shape = img_info.asnumpy()
        img_id = img_id.asnumpy()
        detection.detection(prediction, img_shape, img_id)

    config.logger.info('Calculating mAP...')
    result_file_path = detection.evaluate_prediction()
    config.logger.info('result file path: %s', result_file_path)
    eval_result, _ = detection.get_eval_result()
    eval_print_str = '\n=============coco eval result=========\n' + eval_result
    config.logger.info(eval_print_str)

4.3 Testing

#------------------------#
# pred func(to be fixed)
#------------------------#
def run_pred():
    if not os.path.exists(config.pred_output):
        os.makedirs(config.pred_output)
    context.reset_auto_parallel_context()
    parallel_mode = ParallelMode.STAND_ALONE
    context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=1)
    if config.backbone == "yolox_darknet53":
        backbone = "yolofpn"
    else:
        backbone = "yolopafpn"
    network = DetectionBlock(config, backbone=backbone)
    default_recurisive_init(network)
    if os.path.isfile(config.pred_ckpt):
        param_dict = load_checkpoint(config.pred_ckpt)
        ema_param_dict = {}
        for param in param_dict:
            if param.startswith("ema."):
                new_name = param.split("ema.")[1]
                data = param_dict[param]
                data.name = new_name
                ema_param_dict[new_name] = data
        load_param_into_net(network, ema_param_dict)
    else:
        raise FileNotFoundError('{} not exist or not a pre-trained 
file'.format(config.pred_ckpt)) pred_transform = ValTransform(legacy=False) data_list = os.listdir(config.pred_input) prediction_engine = PredictionEngine(config=config) network.set_train(False) for image_name in tqdm(data_list): image_path = os.path.join(config.pred_input, image_name) image = np.array(cv2.imread(image_path)) r = min(config.input_size[0] / image.shape[0], config.input_size[1] / image.shape[1]) image_data = cv2.resize( image, (int(image.shape[1] * r), int(image.shape[0] * r)), interpolation=cv2.INTER_LINEAR, ).astype(np.float32) image_data, _ = pred_transform(image_data, config.input_size) image_data = np.expand_dims(image_data,0) image_data = Tensor(image_data) output = network(image_data).asnumpy() mask = prediction_engine.prediction(output, image.shape).astype(image.dtype) if not mask is None: pred_image = cv2.addWeighted(image,1,mask,0.3,0) cv2.imwrite(os.path.join(config.pred_output, image_name), pred_image) #------------------------# # process train #------------------------# def run(): set_default() profiler = network_init(config) parallel_init(config) config.data_aug = True run_train('stage_1', profiler) config.data_aug = False run_train('stage_2', profiler) run_eval() #run_pred() if __name__ == "__main__": run() （一大堆结果）
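The `r = min(...)` line in `run_pred` picks one scale factor so that the resized image fits inside `config.input_size` without changing its aspect ratio; detected boxes are later divided by the same kind of factor to get back to source pixels. A minimal sketch of that round trip in plain Python follows. The helper names `fit_scale` and `to_original` and the image sizes are assumptions made for illustration, not part of the post.

```python
def fit_scale(input_size, image_shape):
    """Largest uniform scale at which an (h, w) image fits inside input_size (h, w)."""
    return min(input_size[0] / image_shape[0], input_size[1] / image_shape[1])


def to_original(box, scale):
    """Map an (x1, y1, x2, y2) box from network-input pixels back to source pixels."""
    return tuple(v / scale for v in box)


# Illustrative numbers: a 480 x 1280 (height x width) image and a 640 x 640 input.
scale = fit_scale((640, 640), (480, 1280))
resized_hw = (int(480 * scale), int(1280 * scale))
box_in_input = (100.0, 50.0, 200.0, 150.0)
box_in_source = to_original(box_in_input, scale)
```

The width is the limiting side here, so the image is halved and the remaining rows of the 640 x 640 input are left for padding by the transform.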

yd_233394255 posted 2024-03-09 17:40:25, last reply by 运气男孩 2024-04-01 08:56:58
34 1

Image recognition, Image
[Other] YoloX (3)

3.10 Training and evaluation functions

These helpers cover the learning-rate functions and the callback mechanism: saving training and validation records and updating the EMA weights. DetectionEngine and PredictionEngine serve the validation and testing modules, respectively.

#------------------------#
# lr and callback utils
#------------------------#
# imports needed by the learning-rate helpers below
import math
from collections import Counter

import numpy as np


def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
    """Linear learning rate."""
    lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
    lr = float(init_lr) + lr_inc * current_step
    return lr


def warmup_step_lr(lr, lr_epochs, steps_per_epoch, warmup_epochs, max_epoch, gamma=0.1):
    """Warmup step learning rate."""
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    milestones = lr_epochs
    milestones_steps = []
    for milestone in milestones:
        milestones_step = milestone * steps_per_epoch
        milestones_steps.append(milestones_step)

    lr_each_step = []
    lr = base_lr
    milestones_steps_counter = Counter(milestones_steps)
    for i in range(total_steps):
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            # multiply by gamma once for every milestone that falls on step i
            lr = lr * gamma ** milestones_steps_counter[i]
        lr_each_step.append(lr)

    return np.array(lr_each_step).astype(np.float32)


def multi_step_lr(lr, milestones, steps_per_epoch, max_epoch, gamma=0.1):
    return warmup_step_lr(lr, milestones, steps_per_epoch, 0, max_epoch, gamma=gamma)


def step_lr(lr, epoch_size, steps_per_epoch, max_epoch, gamma=0.1):
    lr_epochs = []
    for i in range(1, max_epoch):
        if i % epoch_size == 0:
            lr_epochs.append(i)
    return multi_step_lr(lr, lr_epochs, steps_per_epoch, max_epoch, gamma=gamma)


def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0):
    """Cosine annealing learning rate."""
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)

    lr_each_step = []
    for i in range(total_steps):
        last_epoch = i // steps_per_epoch
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            lr = eta_min + (base_lr - eta_min) * 
(1. + math.cos(math.pi * last_epoch / t_max)) / 2 lr_each_step.append(lr) return np.array(lr_each_step).astype(np.float32) def yolox_warm_cos_lr( lr, steps_per_epoch, warmup_epochs, max_epoch, no_aug_epochs, warmup_lr_start=0, min_lr_ratio=0.05 ): """Cosine learning rate with warm up.""" base_lr = lr min_lr = lr * min_lr_ratio total_iters = int(max_epoch * steps_per_epoch) warmup_total_iters = int(warmup_epochs * steps_per_epoch) no_aug_iter = no_aug_epochs * steps_per_epoch lr_each_step = [] for i in range(total_iters): if i < warmup_total_iters: lr = (base_lr - warmup_lr_start) * pow( (i + 1) / float(warmup_total_iters), 2 ) + warmup_lr_start elif i >= total_iters - no_aug_iter: lr = min_lr else: lr = min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos( math.pi * (i - warmup_total_iters) / (total_iters - warmup_total_iters - no_aug_iter))) lr_each_step.append(lr) return np.array(lr_each_step).astype(np.float32) def warmup_cosine_annealing_lr_v2(lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0): """Cosine annealing learning rate V2.""" base_lr = lr warmup_init_lr = 0 total_steps = int(max_epoch * steps_per_epoch) warmup_steps = int(warmup_epochs * steps_per_epoch) last_lr = 0 last_epoch_v1 = 0 t_max_v2 = int(max_epoch * 1 / 3) lr_each_step = [] for i in range(total_steps): last_epoch = i // steps_per_epoch if i < warmup_steps: lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr) else: if i < total_steps * 2 / 3: lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / t_max)) / 2 last_lr = lr last_epoch_v1 = last_epoch else: base_lr = last_lr last_epoch = last_epoch - last_epoch_v1 lr = eta_min + (base_lr - eta_min) * (1. 
+ math.cos(math.pi * last_epoch / t_max_v2)) / 2 lr_each_step.append(lr) return np.array(lr_each_step).astype(np.float32) def warmup_cosine_annealing_lr_sample(lr, steps_per_epoch, warmup_epochs, max_epoch, t_max, eta_min=0): """Warmup cosine annealing learning rate.""" start_sample_epoch = 60 step_sample = 2 tobe_sampled_epoch = 60 end_sampled_epoch = start_sample_epoch + step_sample * tobe_sampled_epoch max_sampled_epoch = max_epoch + tobe_sampled_epoch t_max = max_sampled_epoch base_lr = lr warmup_init_lr = 0 total_steps = int(max_epoch * steps_per_epoch) total_sampled_steps = int(max_sampled_epoch * steps_per_epoch) warmup_steps = int(warmup_epochs * steps_per_epoch) lr_each_step = [] for i in range(total_sampled_steps): last_epoch = i // steps_per_epoch if last_epoch in range(start_sample_epoch, end_sampled_epoch, step_sample): continue if i < warmup_steps: lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr) else: lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / t_max)) / 2 lr_each_step.append(lr) assert total_steps == len(lr_each_step) return np.array(lr_each_step).astype(np.float32) def yolox_no_aug_lr(base_lr, steps_per_epoch, max_epoch, min_lr_ratio=0.05): total_iters = int(max_epoch * steps_per_epoch) lr = base_lr * min_lr_ratio lr_each_step = [] for _ in range(total_iters): lr_each_step.append(lr) return np.array(lr_each_step).astype(np.float32) def get_lr(args): """generate learning rate.""" if args.lr_scheduler == 'exponential': lr = warmup_step_lr(args.lr, args.lr_epochs, args.steps_per_epoch, args.warmup_epochs, args.max_epoch, gamma=args.lr_gamma, ) elif args.lr_scheduler == 'cosine_annealing': lr = warmup_cosine_annealing_lr(args.lr, args.steps_per_epoch, args.warmup_epochs, args.max_epoch, args.t_max, args.eta_min) elif args.lr_scheduler == 'cosine_annealing_V2': lr = warmup_cosine_annealing_lr_v2(args.lr, args.steps_per_epoch, args.warmup_epochs, args.max_epoch, args.t_max, args.eta_min) elif 
args.lr_scheduler == 'cosine_annealing_sample':
        lr = warmup_cosine_annealing_lr_sample(args.lr, args.steps_per_epoch, args.warmup_epochs,
                                               args.max_epoch, args.t_max, args.eta_min)
    elif args.lr_scheduler == 'yolox_warm_cos_lr':
        lr = yolox_warm_cos_lr(lr=args.lr, steps_per_epoch=args.steps_per_epoch,
                               warmup_epochs=args.warmup_epochs, max_epoch=args.total_epoch,
                               no_aug_epochs=args.no_aug_epochs, min_lr_ratio=args.min_lr_ratio)
    elif args.lr_scheduler == 'no_aug_lr':
        lr = yolox_no_aug_lr(
            args.lr, args.steps_per_epoch, args.max_epoch, min_lr_ratio=args.min_lr_ratio
        )
    else:
        raise NotImplementedError(args.lr_scheduler)
    return lr


def get_param_groups(network, weight_decay):
    """Param groups for optimizer."""
    decay_params = []
    no_decay_params = []
    for x in network.trainable_params():
        parameter_name = x.name
        if parameter_name.endswith('.bias'):
            # biases do not use weight decay
            no_decay_params.append(x)
        elif parameter_name.endswith('.gamma'):
            # BN weight/bias do not use weight decay; note that for now x does not include BN
            no_decay_params.append(x)
        elif parameter_name.endswith('.beta'):
            # BN weight/bias do not use weight decay; note that for now x does not include BN
            no_decay_params.append(x)
        else:
            decay_params.append(x)

    return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params, 'weight_decay': weight_decay}]


def load_backbone(net, ckpt_path, args):
    """Load darknet53 backbone checkpoint."""
    param_dict = load_checkpoint(ckpt_path)
    load_param_into_net(net, param_dict)

    param_not_load = []
    for _, param in net.parameters_and_names():
        if param.name not in param_dict:
            param_not_load.append(param.name)
    # the count needs a %s placeholder, otherwise logging reports a formatting error
    args.logger.info("not loading param is : %s", len(param_not_load))
    return net


class AverageMeter:
    """Computes and stores the average and current value"""

    def __init__(self, name, fmt=':f', tb_writer=None):
        self.name = name
        self.fmt = fmt
        self.reset()
        self.tb_writer = tb_writer
        self.cur_step = 1
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
        if self.tb_writer is not None:
            self.tb_writer.add_scalar(self.name, self.val, self.cur_step)
        self.cur_step += 1

    def __str__(self):
        print("loss update----------------------------------------------------------------------")
        fmtstr = '{name}:{avg' + self.fmt + '}'
        return fmtstr.format(**self.__dict__)


def keep_loss_fp32(network):
    """Keep loss of network with float32"""
    for _, cell in network.cells_and_names():
        if isinstance(cell, (YOLOLossCell,)):
            cell.to_float(mstype.float32)


class EMACallBack(Callback):

    def __init__(self, network, steps_per_epoch, cur_steps=0):
        self.steps_per_epoch = steps_per_epoch
        self.cur_steps = cur_steps
        self.network = network

    def on_train_epoch_begin(self, run_context):
        if self.network.ema:
            if not isinstance(self.network.ema_moving_weight, list):
                # convert the EMA parameters to NumPy arrays at the start of the epoch
                tmp_moving = []
                for weight in self.network.ema_moving_weight:
                    tmp_moving.append(weight.asnumpy())
                self.network.ema_moving_weight = tmp_moving

    def on_train_step_end(self, run_context):
        if self.network.ema:
            self.network.moving_parameter_update()
            self.cur_steps += 1

            if self.cur_steps % self.steps_per_epoch == 0:
                if isinstance(self.network.ema_moving_weight, list):
                    # at the end of an epoch, wrap the arrays back into named Parameters
                    tmp_moving = []
                    moving_name = []
                    idx = 0
                    for key in self.network.moving_name:
                        moving_name.append(key)

                    for weight in self.network.ema_moving_weight:
                        param = Parameter(Tensor(weight), name=moving_name[idx])
                        tmp_moving.append(param)
                        idx += 1
                    self.network.ema_moving_weight = ParameterTuple(tmp_moving)


class YOLOXCB(Callback):
    """
    YOLOX Callback.
    """

    def __init__(self, logger, step_per_epoch, lr, save_ckpt_path, is_modelart=False, per_print_times=1,
                 train_url=None):
        super(YOLOXCB, self).__init__()
        self.train_url = train_url
        if not isinstance(per_print_times, int) or per_print_times < 0:
            raise ValueError("print_step must be int and >= 0.")
        self._per_print_times = per_print_times
        self.lr = lr
        self.is_modelarts = is_modelart
        self.step_per_epoch = step_per_epoch
        self.current_step = 0
        self.save_ckpt_path = save_ckpt_path
        self.iter_time = time.time()
        self.epoch_start_time = time.time()
        self.average_loss = []
        self.logger = logger

    def on_train_epoch_begin(self, run_context):
        """
        Called before each epoch beginning.

        Args:
            run_context (RunContext): Include some information of the model.
        """
        self.epoch_start_time = time.time()
        self.iter_time = time.time()

    def on_train_epoch_end(self, run_context):
        """
        Called after each epoch finished.

        Args:
            run_context (RunContext): Include some information of the model.
        """
        cb_params = run_context.original_args()
        cur_epoch = cb_params.cur_epoch_num
        loss = cb_params.net_outputs
        loss = "loss: %.4f, overflow: %s, scale: %s" % (float(loss[0].asnumpy()),
                                                        bool(loss[1].asnumpy()),
                                                        int(loss[2].asnumpy()))
        self.logger.info(
            "epoch: %s epoch time %.2fs %s" % (cur_epoch, time.time() - self.epoch_start_time, loss))

        if self.current_step % (self.step_per_epoch * 1) == 0:
            if self.is_modelarts:
                import moxing as mox
                if self.save_ckpt_path and self.train_url:
                    mox.file.copy_parallel(src_url=self.save_ckpt_path, dst_url=self.train_url)
                    cur_epoch = self.current_step // self.step_per_epoch
                    # arguments follow the placeholder order: epoch, source, destination
                    self.logger.info(
                        "[epoch {}]copy ckpt from {} to {}".format(cur_epoch, self.save_ckpt_path, self.train_url))

    def on_train_step_begin(self, run_context):
        """
        Called before each step beginning.

        Args:
            run_context (RunContext): Include some information of the model.
        """

    def on_train_step_end(self, run_context):
        """
        Called after each step finished.

        Args:
            run_context (RunContext): Include some information of the model.
""" cur_epoch_step = (self.current_step + 1) % self.step_per_epoch if cur_epoch_step % self._per_print_times == 0 and cur_epoch_step != 0: cb_params = run_context.original_args() cur_epoch = cb_params.cur_epoch_num loss = cb_params.net_outputs loss = "loss: %.4f, overflow: %s, scale: %s" % (float(loss[0].asnumpy()), bool(loss[1].asnumpy()), int(loss[2].asnumpy())) self.logger.info("epoch: %s step: [%s/%s], %s, lr: %.6f, avg step time: %.2f ms" % ( cur_epoch, cur_epoch_step, self.step_per_epoch, loss, self.lr[self.current_step], (time.time() - self.iter_time) * 1000 / self._per_print_times)) self.iter_time = time.time() self.current_step += 1 def on_train_end(self, run_context): """ Called once after network training. Args: run_context (RunContext): Include some information of the model. """ class EvalCallBack(Callback): def __init__(self, dataset, test_net, train_net, detection, config, start_epoch=0, interval=1): self.dataset = dataset self.network = train_net self.test_network = test_net self.detection = detection self.logger = config.logger self.start_epoch = start_epoch self.interval = interval self.max_epoch = config.max_epoch self.best_result = 0 self.best_epoch = 0 self.rank = config.rank def load_ema_parameter(self): param_dict = {} for name, param in self.network.parameters_and_names(): if name.startswith("ema."): new_name = name.split('ema.')[-1] param_new = param.clone() param_new.name = new_name param_dict[new_name] = param_new load_param_into_net(self.test_network, param_dict) def load_network_parameter(self): param_dict = {} for name, param in self.network.parameters_and_names(): if name.startswith("network."): param_new = param.clone() param_dict[name] = param_new load_param_into_net(self.test_network, param_dict) def epoch_end(self, run_context): cb_param = run_context.original_args() cur_epoch = cb_param.cur_epoch_num if cur_epoch >= self.start_epoch: if (cur_epoch - self.start_epoch) % self.interval == 0 or cur_epoch == self.max_epoch: 
self.load_network_parameter() self.test_network.set_train(False) eval_print_str, results = self.inference() if results >= self.best_result: self.best_result = results self.best_epoch = cur_epoch if os.path.exists('best.ckpt'): self.remove_ckpoint_file('best.ckpt') save_checkpoint(cb_param.train_network, 'best.ckpt') self.logger.info("Best result %s at %s epoch" % (self.best_result, self.best_epoch)) self.logger.info(eval_print_str) self.logger.info('Ending inference...') def end(self, run_context): self.logger.info("Best result %s at %s epoch" % (self.best_result, self.best_epoch)) def inference(self): self.logger.info('Start inference...') self.logger.info("eval dataset size, %s" % self.dataset.get_dataset_size()) counts = 0 for data in self.dataset.create_dict_iterator(num_epochs=1): image = data['image'] img_info = data['image_shape'] img_id = data['img_id'] prediction = self.test_network(image) prediction = prediction.asnumpy() img_shape = img_info.asnumpy() img_id = img_id.asnumpy() counts = counts + 1 self.detection.detection(prediction, img_shape, img_id) self.logger.info('Calculating mAP...%s' % counts) self.logger.info('Calculating mAP...%s' % counts) result_file_path = self.detection.evaluate_prediction() self.logger.info('result file path: %s', result_file_path) eval_result, results = self.detection.get_eval_result() if eval_result is not None and results is not None: eval_print_str = '\n=============coco eval result=========\n' + eval_result return eval_print_str, results return None, 0 def remove_ckpoint_file(self, file_name): """Remove the specified checkpoint file from this checkpoint manager and also from the directory.""" try: os.chmod(file_name, stat.S_IWRITE) os.remove(file_name) except OSError: self.logger.info("OSError, failed to remove the older ckpt file %s.", file_name) except ValueError: self.logger.info("ValueError, failed to remove the older ckpt file %s.", file_name) class Redirct: def __init__(self): self.content = "" def write(self, 
content): self.content += content def flush(self): self.content = "" class DetectionEngine: """ Detection engine """ def __init__(self, config): self.config = config self.input_size = self.config.input_size self.strides = self.config.fpn_strides # [8, 16, 32] self.expanded_strides = None self.grids = None self.num_classes = config.num_classes self.conf_thre = config.conf_thre self.nms_thre = config.nms_thre self.annFile = os.path.join(config.data_dir, 'annotations/instances_val2017.json') self._coco = COCO(self.annFile) self._img_ids = list(sorted(self._coco.imgs.keys())) self.coco_catIds = self._coco.getCatIds() self.save_prefix = config.outputs_dir self.file_path = '' self.data_list = [] def detection(self, outputs, img_shape, img_ids): # post process nms outputs = self.postprocess(outputs, self.num_classes, self.conf_thre, self.nms_thre) self.data_list.extend(self.convert_to_coco_format(outputs, info_imgs=img_shape, ids=img_ids)) def postprocess(self, prediction, num_classes, conf_thre=0.7, nms_thre=0.45, class_agnostic=False): """ nms """ box_corner = np.zeros_like(prediction) box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2 box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2 box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2 box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2 prediction[:, :, :4] = box_corner[:, :, :4] output = [None for _ in range(len(prediction))] for i, image_pred in enumerate(prediction): if not image_pred.shape[0]: continue # Get score and class with highest confidence class_conf = np.max(image_pred[:, 5:5 + num_classes], axis=-1) # (8400) class_pred = np.argmax(image_pred[:, 5:5 + num_classes], axis=-1) # (8400) conf_mask = (image_pred[:, 4] * class_conf >= conf_thre).squeeze() # (8400) class_conf = np.expand_dims(class_conf, axis=-1) # (8400, 1) class_pred = np.expand_dims(class_pred, axis=-1).astype(np.float16) # (8400, 1) # Detections ordered as (x1, y1, x2, y2, 
obj_conf, class_conf, class_pred) detections = np.concatenate((image_pred[:, :5], class_conf, class_pred), axis=1) detections = detections[conf_mask] if not detections.shape[0]: continue if class_agnostic: nms_out_index = self._nms(detections[:, :4], detections[:, 4] * detections[:, 5], nms_thre) else: nms_out_index = self._batch_nms(detections[:, :4], detections[:, 4] * detections[:, 5], detections[:, 6], nms_thre) detections = detections[nms_out_index] if output[i] is None: output[i] = detections else: output[i] = np.concatenate((output[i], detections)) return output def _nms(self, xyxys, scores, threshold): """Calculate NMS""" x1 = xyxys[:, 0] y1 = xyxys[:, 1] x2 = xyxys[:, 2] y2 = xyxys[:, 3] scores = scores areas = (x2 - x1 + 1) * (y2 - y1 + 1) order = scores.argsort()[::-1] reserved_boxes = [] while order.size > 0: i = order[0] reserved_boxes.append(i) max_x1 = np.maximum(x1[i], x1[order[1:]]) max_y1 = np.maximum(y1[i], y1[order[1:]]) min_x2 = np.minimum(x2[i], x2[order[1:]]) min_y2 = np.minimum(y2[i], y2[order[1:]]) intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1) intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1) intersect_area = intersect_w * intersect_h ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area) indexes = np.where(ovr <= threshold)[0] order = order[indexes + 1] return reserved_boxes def _batch_nms(self, xyxys, scores, idxs, threshold, use_offset=True): """Calculate Nms based on class info,Each index value correspond to a category, and NMS will not be applied between elements of different categories.""" if use_offset: max_coordinate = xyxys.max() offsets = idxs * (max_coordinate + np.array([1])) boxes_for_nms = xyxys + offsets[:, None] keep = self._nms(boxes_for_nms, scores, threshold) return keep keep_mask = np.zeros_like(scores, dtype=np.bool_) for class_id in np.unique(idxs): curr_indices = np.where(idxs == class_id)[0] curr_keep_indices = self._nms(xyxys[curr_indices], scores[curr_indices], threshold) 
keep_mask[curr_indices[curr_keep_indices]] = True keep_indices = np.where(keep_mask)[0] return keep_indices[np.argsort(-scores[keep_indices])] def convert_to_coco_format(self, outputs, info_imgs, ids): """ convert to coco format """ data_list = [] for (output, img_h, img_w, img_id) in zip( outputs, info_imgs[:, 0], info_imgs[:, 1], ids ): if output is None: continue bboxes = output[:, 0:4] scale = min( self.input_size[0] / float(img_h), self.input_size[1] / float(img_w) ) bboxes = bboxes / scale bboxes[:, [0, 2]] = np.clip(bboxes[:, [0, 2]], 0, img_w) bboxes[:, [1, 3]] = np.clip(bboxes[:, [1, 3]], 0, img_h) bboxes = xyxy2xywh(bboxes) cls = output[:, 6] scores = output[:, 4] * output[:, 5] for ind in range(bboxes.shape[0]): label = self.coco_catIds[int(cls[ind])] pred_data = { "image_id": int(img_id), "category_id": label, "bbox": bboxes[ind].tolist(), "score": scores[ind].item(), "segmentation": [], } # COCO json format data_list.append(pred_data) return data_list def evaluate_prediction(self): """ generate prediction coco json file """ print('Evaluate in main process...') # write result to coco json format t = datetime.datetime.now().strftime('_%Y_%m_%d_%H_%M_%S') try: self.file_path = self.save_prefix + '/predict' + t + '.json' f = open(self.file_path, 'w') json.dump(self.data_list, f) except IOError as e: raise RuntimeError("Unable to open json file to dump. 
What():{}".format(str(e)))
        else:
            f.close()
            if not self.data_list:
                self.file_path = ''
                return self.file_path

            self.data_list.clear()
            return self.file_path

    def get_eval_result(self):
        """Get eval result"""
        if not self.file_path:
            return None, None
        cocoGt = self._coco
        cocoDt = cocoGt.loadRes(self.file_path)
        cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
        cocoEval.evaluate()
        cocoEval.accumulate()
        # capture the summary that COCOeval prints to stdout
        rdct = Redirct()
        stdout = sys.stdout
        sys.stdout = rdct
        cocoEval.summarize()
        sys.stdout = stdout
        return rdct.content, cocoEval.stats[0]


class PredictionEngine:
    def __init__(self, config):
        self.input_size = config.input_size
        self.num_classes = config.num_classes
        self.conf_thre = config.pred_conf_thre
        self.nms_thre = config.pred_nms_thre
        self.class_names = self.get_classes(config.classes_path)
        # one evenly spaced hue per class, converted to 0-255 colour tuples
        hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))

    def prediction(self, outputs, image_shape):
        outputs = self.postprocess(outputs, self.num_classes, self.conf_thre, self.nms_thre)
        if outputs[0] is None:
            return None
        top_label = outputs[0][:, 6].astype('int32')
        top_conf = outputs[0][:, 4] * outputs[0][:, 5]
        top_boxes = outputs[0][:, :4]
        scale = min(self.input_size[0] / float(image_shape[0]), self.input_size[1] / float(image_shape[1]))
        top_boxes = top_boxes / scale
        top_boxes[:, [0, 2]] = np.clip(top_boxes[:, [0, 2]], 0, image_shape[1])
        top_boxes[:, [1, 3]] = np.clip(top_boxes[:, [1, 3]], 0, image_shape[0])
        info_mask = np.zeros((image_shape[0], image_shape[1], 3))
        for i, c in list(enumerate(top_label)):
            label_name = self.class_names[int(c)-1]  # id start with 1
            box = top_boxes[i]
            score = top_conf[i]
            left, top, right, bottom = box
            top = max(0, np.floor(top).astype('int32'))
            left = max(0, np.floor(left).astype('int32'))
            # image_shape is (height, width, channels): clamp y by height and x by width
            bottom = min(image_shape[0], np.floor(bottom).astype('int32'))
            right = min(image_shape[1], 
np.floor(right).astype('int32')) cv2.rectangle(info_mask, (left, top), (right, bottom), self.colors[int(c)-1], 1) text = "{}: {:.4f}".format(label_name, score) cv2.putText(info_mask, text, (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, self.colors[int(c)-1], 1) return info_mask def postprocess(self, prediction, num_classes, conf_thre=0.7, nms_thre=0.45, class_agnostic=False): """ nms """ box_corner = np.zeros_like(prediction) box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2 box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2 box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2 box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2 prediction[:, :, :4] = box_corner[:, :, :4] output = [None for _ in range(len(prediction))] for i, image_pred in enumerate(prediction): if not image_pred.shape[0]: continue # Get score and class with highest confidence class_conf = np.max(image_pred[:, 5:5 + num_classes], axis=-1) # (8400) class_pred = np.argmax(image_pred[:, 5:5 + num_classes], axis=-1) # (8400) conf_mask = (image_pred[:, 4] * class_conf >= conf_thre).squeeze() # (8400) class_conf = np.expand_dims(class_conf, axis=-1) # (8400, 1) class_pred = np.expand_dims(class_pred, axis=-1).astype(np.float16) # (8400, 1) # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred) detections = np.concatenate((image_pred[:, :5], class_conf, class_pred), axis=1) detections = detections[conf_mask] if not detections.shape[0]: continue if class_agnostic: nms_out_index = self._nms(detections[:, :4], detections[:, 4] * detections[:, 5], nms_thre) else: nms_out_index = self._batch_nms(detections[:, :4], detections[:, 4] * detections[:, 5], detections[:, 6], nms_thre) detections = detections[nms_out_index] if output[i] is None: output[i] = detections else: output[i] = np.concatenate((output[i], detections)) return output def _nms(self, xyxys, scores, threshold): """Calculate NMS""" x1 = xyxys[:, 0] y1 = xyxys[:, 1] 
x2 = xyxys[:, 2] y2 = xyxys[:, 3] scores = scores areas = (x2 - x1 + 1) * (y2 - y1 + 1) order = scores.argsort()[::-1] reserved_boxes = [] while order.size > 0: i = order[0] reserved_boxes.append(i) max_x1 = np.maximum(x1[i], x1[order[1:]]) max_y1 = np.maximum(y1[i], y1[order[1:]]) min_x2 = np.minimum(x2[i], x2[order[1:]]) min_y2 = np.minimum(y2[i], y2[order[1:]]) intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1) intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1) intersect_area = intersect_w * intersect_h ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area) indexes = np.where(ovr <= threshold)[0] order = order[indexes + 1] return reserved_boxes def _batch_nms(self, xyxys, scores, idxs, threshold, use_offset=True): """Calculate Nms based on class info,Each index value correspond to a category, and NMS will not be applied between elements of different categories.""" if use_offset: max_coordinate = xyxys.max() offsets = idxs * (max_coordinate + np.array([1])) boxes_for_nms = xyxys + offsets[:, None] keep = self._nms(boxes_for_nms, scores, threshold) return keep keep_mask = np.zeros_like(scores, dtype=np.bool_) for class_id in np.unique(idxs): curr_indices = np.where(idxs == class_id)[0] curr_keep_indices = self._nms(xyxys[curr_indices], scores[curr_indices], threshold) keep_mask[curr_indices[curr_keep_indices]] = True keep_indices = np.where(keep_mask)[0] return keep_indices[np.argsort(-scores[keep_indices])] def get_classes(self, classes_path): with open(classes_path, encoding='utf-8') as f: class_names = f.readlines() class_names = [c.strip() for c in class_names] return class_names 3.11 网络权重初始化 #------------------------# # network initialized #------------------------# def calculate_gain(nonlinearity, param=None): r"""Return the recommended gain value for the given nonlinearity function. 
The values are as follows: ================= ==================================================== nonlinearity gain ================= ==================================================== Linear / Identity :math:`1` Conv{1,2,3}D :math:`1` Sigmoid :math:`1` Tanh :math:`\frac{5}{3}` ReLU :math:`\sqrt{2}` Leaky Relu :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}` ================= ==================================================== Args: nonlinearity: the non-linear function (`nn.functional` name) param: optional parameter for the non-linear function Examples: >>> gain = nn.init.calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 """ linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d'] if nonlinearity in linear_fns or nonlinearity == 'sigmoid': return 1 if nonlinearity == 'tanh': return 5.0 / 3 if nonlinearity == 'relu': return math.sqrt(2.0) if nonlinearity == 'leaky_relu': if param is None: negative_slope = 0.01 elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float): # True/False are instances of int, hence check above negative_slope = param else: raise ValueError("negative_slope {} not a valid number".format(param)) return math.sqrt(2.0 / (1 + negative_slope ** 2)) raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) def _assignment(arr, num): """Assign the value of 'num' and 'arr'.""" if arr.shape == (): arr = arr.reshape((1)) arr[:] = num arr = arr.reshape(()) else: if isinstance(num, np.ndarray): arr[:] = num[:] else: arr[:] = num return arr def _calculate_correct_fan(array, mode): mode = mode.lower() valid_modes = ['fan_in', 'fan_out'] if mode not in valid_modes: raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes)) fan_in, fan_out = _calculate_fan_in_and_fan_out(array) return fan_in if mode == 'fan_in' else fan_out def kaiming_uniform_(arr, a=0, mode='fan_in', nonlinearity='leaky_relu'): 
r"""Fills the input `Tensor` with values according to the method described in `Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification` - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from :math:`\mathcal{U}(-\text{bound}, \text{bound})` where .. math:: \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}} Also known as He initialization. Args: tensor: an n-dimensional `Tensor` a: the negative slope of the rectifier used after this layer (only used with ``'leaky_relu'``) mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'`` preserves the magnitude of the variance of the weights in the forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the backwards pass. nonlinearity: the non-linear function (`nn.functional` name), recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default). Examples: >>> w = np.empty(3, 5) >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu') """ fan = _calculate_correct_fan(arr, mode) gain = calculate_gain(nonlinearity, a) std = gain / math.sqrt(fan) bound = math.sqrt(3.0) * std # Calculate uniform bounds from standard deviation return np.random.uniform(-bound, bound, arr.shape) def _calculate_fan_in_and_fan_out(arr): """Calculate fan in and fan out.""" dimensions = len(arr.shape) if dimensions < 2: raise ValueError("Fan in and fan out can not be computed for array with fewer than 2 dimensions") num_input_fmaps = arr.shape[1] num_output_fmaps = arr.shape[0] receptive_field_size = 1 if dimensions > 2: receptive_field_size = reduce(lambda x, y: x * y, arr.shape[2:]) fan_in = num_input_fmaps * receptive_field_size fan_out = num_output_fmaps * receptive_field_size return fan_in, fan_out class KaimingUniform(MeInitializer): """Kaiming uniform initializer.""" def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'): super(KaimingUniform, self).__init__() self.a = a self.mode = mode 
self.nonlinearity = nonlinearity def _initialize(self, arr): tmp = kaiming_uniform_(arr, self.a, self.mode, self.nonlinearity) _assignment(arr, tmp) def default_recurisive_init(custom_cell, prior_prob=1e-2): """Initialize parameter.""" for _, cell in custom_cell.cells_and_names(): if isinstance(cell, nn.Conv2d): cell.weight.set_data(initializer.initializer(KaimingUniform(a=math.sqrt(5)), cell.weight.shape, cell.weight.dtype)) if cell.bias is not None: fan_in, _ = _calculate_fan_in_and_fan_out(cell.weight) bound = 1 / math.sqrt(fan_in) cell.bias.set_data(initializer.initializer(initializer.Uniform(bound), cell.bias.shape, cell.bias.dtype)) if "cls_preds" in cell.bias.name or "obj_preds" in cell.bias.name: cell.bias.set_data(initializer.initializer(-math.log((1 - prior_prob) / prior_prob), cell.bias.shape, cell.bias.dtype)) elif isinstance(cell, nn.Dense): cell.weight.set_data(initializer.initializer(KaimingUniform(a=math.sqrt(5)), cell.weight.shape, cell.weight.dtype)) if cell.bias is not None: fan_in, _ = _calculate_fan_in_and_fan_out(cell.weight) bound = 1 / math.sqrt(fan_in) cell.bias.set_data(initializer.initializer(initializer.Uniform(bound), cell.bias.shape, cell.bias.dtype)) elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d, nn.SyncBatchNorm)): cell.momentum = 0.97 cell.eps = 0.001 else: pass initialize_head_biases(custom_cell, prior_prob=0.01) def initialize_head_biases(network, prior_prob): for name, cell in network.cells_and_names(): if name.endswith("cls_preds") or name.endswith("obj_preds"): cell.bias.set_data(initializer.initializer(-math.log((1 - prior_prob) / prior_prob), cell.bias.shape, cell.bias.dtype)) def load_yolox_params(args, network): """Load yolox darknet parameter from checkpoint.""" if args.pretrained_backbone: network = load_backbone(network, args.pretrained_backbone, args) args.logger.info('load pre-trained backbone {} into network'.format(args.pretrained_backbone)) else: args.logger.info('Not load pre-trained backbone, please be 
careful') def load_resume_params(args, network): if args.resume_yolox: args.logger.info('Start to load resume parameters...') network = load_backbone(network, args.resume_yolox, args) args.logger.info('resume finished') args.logger.info('load_model {} success'.format(args.resume_yolox)) else: args.logger.info('Not load resume!')
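
The initializer code above derives every bound from the weight shape alone, so the numbers are easy to check by hand. The following is a minimal sketch in plain Python (no MindSpore). The helper names and the example shape (64, 32, 3, 3) are illustrative assumptions, not part of the post. It suggests that with a = sqrt(5) the Kaiming uniform bound reduces to 1 / sqrt(fan_in), the same bound the code uses for the bias, and that a head bias of -log((1 - p) / p) makes sigmoid(bias) equal the prior probability p.

```python
import math
from functools import reduce


def fan_in_and_fan_out(shape):
    """Fan-in / fan-out for a weight laid out as (out, in, *kernel)."""
    if len(shape) < 2:
        raise ValueError("need at least 2 dimensions")
    receptive_field = reduce(lambda x, y: x * y, shape[2:], 1)
    return shape[1] * receptive_field, shape[0] * receptive_field


def leaky_relu_gain(a):
    """Gain for leaky ReLU with negative slope a: sqrt(2 / (1 + a^2))."""
    return math.sqrt(2.0 / (1.0 + a ** 2))


def kaiming_uniform_bound(shape, a=0.0):
    """Half-width of the uniform range in fan_in mode."""
    fan_in, _ = fan_in_and_fan_out(shape)
    std = leaky_relu_gain(a) / math.sqrt(fan_in)
    return math.sqrt(3.0) * std


def prior_bias(prior_prob):
    """Bias b chosen so that sigmoid(b) equals prior_prob."""
    return -math.log((1.0 - prior_prob) / prior_prob)


# Hypothetical Conv2d weight: 64 output channels, 32 input channels, 3x3 kernel.
conv_shape = (64, 32, 3, 3)
fan_in, fan_out = fan_in_and_fan_out(conv_shape)
weight_bound = kaiming_uniform_bound(conv_shape, a=math.sqrt(5))
bias_bound = 1.0 / math.sqrt(fan_in)

b = prior_bias(0.01)
sigmoid_b = 1.0 / (1.0 + math.exp(-b))

print(fan_in, fan_out)           # fan-in and fan-out of the example shape
print(weight_bound, bias_bound)  # the two bounds coincide for a = sqrt(5)
print(b, sigmoid_b)              # the bias and the probability it encodes
```

Starting the cls_preds and obj_preds biases at a low prior such as 0.01 means the head initially predicts "no object" almost everywhere, which tends to keep the objectness and classification losses from being dominated by the many negative feature points in the first iterations.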

yd_233394255 posted 2024-03-09 17:38:57, last reply by 运气男孩 2024-04-01 08:56:58
30 1

Image Recognition (Image)
[Other] YoloX (2)

#------------------------# # darknet #------------------------# class Darknet(nn.Cell): """ Darknet for yolox-darknet53 """ # number of block from dark2 to dark5. depth2block = {21: [1, 2, 2, 1], 53: [2, 8, 8, 4]} def __init__( self, depth, in_channels=3, stem_out_channels=32, out_features=("dark3", "dark4", "dark5"), ): """ Args: depth (int): depth of darknet used in model, usually use [21, 53] for this param. in_channels (int): number of input channels, for example, use 3 for RGB image. stem_out_channels (int): number of output channels of darknet stem. It decides channels of darknet layer2 to layer5. out_features (Tuple[str]): desired output layer name. """ super(Darknet, self).__init__() assert out_features, "please provide output features of Darknet" self.out_features = out_features self.stem = nn.SequentialCell( BaseConv(in_channels=in_channels, out_channels=stem_out_channels, ksize=3, stride=1, act="lrelu"), *self.make_group_layer(stem_out_channels, num_blocks=1, stride=2), ) in_channels = stem_out_channels * 2 num_blocks = Darknet.depth2block[depth] # create darknet with `stem_out_channels` and `num_blocks` layers. # to make model structure more clear, we don't use `for` statement in python. 
self.dark2 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[0], stride=2) ) in_channels *= 2 # 128 self.dark3 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[1], stride=2) ) in_channels *= 2 # 256 self.dark4 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[2], stride=2) ) in_channels *= 2 # 512 self.dark5 = nn.SequentialCell( *self.make_group_layer(in_channels=in_channels, num_blocks=num_blocks[3], stride=2), *self.make_spp_block([in_channels, in_channels * 2], in_channels * 2), ) def make_group_layer(self, in_channels: int, num_blocks: int, stride: int = 1): "starts with conv layer then has `num_blocks` `ResLayer`" return [ BaseConv(in_channels, in_channels * 2, ksize=3, stride=stride, act="lrelu"), *[(ResLayer(in_channels * 2)) for _ in range(num_blocks)], ] def make_spp_block(self, filters_list, in_filters): """ spatial pyramid pooling block""" m = nn.SequentialCell( *[ BaseConv(in_filters, filters_list[0], 1, stride=1, act="lrelu"), BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"), SPPBottleneck( in_channels=filters_list[1], out_channels=filters_list[0], activation="lrelu", ), BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"), BaseConv(filters_list[1], filters_list[0], 1, stride=1, act="lrelu"), ] ) return m def construct(self, x): """ forward """ outputs = {} x = self.stem(x) outputs["stem"] = x x = self.dark2(x) outputs["dark2"] = x x = self.dark3(x) outputs["dark3"] = x x = self.dark4(x) outputs["dark4"] = x x = self.dark5(x) outputs["dark5"] = x return outputs["dark3"], outputs["dark4"], outputs["dark5"] class CSPDarknet(nn.Cell): """ Darknet with CSP block for yolox-s m l x""" def __init__( self, dep_mul, wid_mul, out_features=("dark3", "dark4", "dark5"), depthwise=False, act="silu" ): super(CSPDarknet, self).__init__() assert out_features, "please provide output features of Darknet" 
self.out_features = out_features Conv = DWConv if depthwise else BaseConv base_channels = int(wid_mul * 64) base_depth = max(round(dep_mul * 3), 1) # stem self.stem = Focus(3, base_channels, ksize=3, act=act) # dark2 self.dark2 = nn.SequentialCell( Conv(base_channels, base_channels * 2, 3, 2, act=act), CSPLayer( base_channels * 2, base_channels * 2, n=base_depth, depthwise=depthwise, act=act, ), ) # dark3 self.dark3 = nn.SequentialCell( Conv(base_channels * 2, base_channels * 4, 3, 2, act=act), CSPLayer( base_channels * 4, base_channels * 4, n=base_depth * 3, depthwise=depthwise, act=act, ), ) # dark4 self.dark4 = nn.SequentialCell( Conv(base_channels * 4, base_channels * 8, 3, 2, act=act), CSPLayer( base_channels * 8, base_channels * 8, n=base_depth * 3, depthwise=depthwise, act=act, ), ) # dark5 self.dark5 = nn.SequentialCell( Conv(base_channels * 8, base_channels * 16, 3, 2, act=act), SPPBottleneck(base_channels * 16, base_channels * 16, activation=act), CSPLayer( base_channels * 16, base_channels * 16, n=base_depth, shortcut=False, depthwise=depthwise, act=act, ), ) def construct(self, x): """ forward """ outputs = {} x = self.stem(x) outputs["stem"] = x x = self.dark2(x) outputs["dark2"] = x x = self.dark3(x) outputs["dark3"] = x x = self.dark4(x) outputs["dark4"] = x x = self.dark5(x) outputs["dark5"] = x return outputs["dark3"], outputs["dark4"], outputs["dark5"]
3.6.3 backbone + neck
Two structures are provided (illustrated by a figure in the original post). YOLOFPN uses Darknet as the backbone together with the neck of the yolov3 baseline, fusing features with an FPN structure. YOLOPAFPN adds a PAN structure on top of the FPN.
#------------------------# # YOLOFPN #------------------------# class YOLOFPN(nn.Cell): """ YOLOFPN module, Darknet53 is the default backbone of this model """ def __init__(self, input_w, input_h, depth=53, in_features=None): super(YOLOFPN, self).__init__() if in_features is None: in_features = ["dark3", "dark4", "dark5"] self.backbone = Darknet(depth) self.in_features = in_features # out 1 self.out1_cbl = self._make_cbl(512, 256, 1) self.out1 = self._make_embedding([256, 512], 
512 + 256) # out 2 self.out2_cbl = self._make_cbl(256, 128, 1) self.out2 = self._make_embedding([128, 256], 256 + 128) # upsample self.upsample0 = P.ResizeNearestNeighbor((input_h // 16, input_w // 16)) self.upsample1 = P.ResizeNearestNeighbor((input_h // 8, input_w // 8)) def _make_cbl(self, _in, _out, ks): """ make cbl layer """ return BaseConv(_in, _out, ks, stride=1, act="lrelu") def _make_embedding(self, filters_list, in_filters): """ make embedding """ m = nn.SequentialCell( *[ self._make_cbl(in_filters, filters_list[0], 1), self._make_cbl(filters_list[0], filters_list[1], 3), self._make_cbl(filters_list[1], filters_list[0], 1), self._make_cbl(filters_list[0], filters_list[1], 3), self._make_cbl(filters_list[1], filters_list[0], 1), ] ) return m def construct(self, inputs): """ forward """ out_features = self.backbone(inputs) x2, x1, x0 = out_features # yolo branch 1 x1_in = self.out1_cbl(x0) x1_in = self.upsample0(x1_in) x1_in = P.Concat(axis=1)([x1_in, x1]) out_dark4 = self.out1(x1_in) # yolo branch 2 x2_in = self.out2_cbl(out_dark4) x2_in = self.upsample1(x2_in) x2_in = P.Concat(axis=1)([x2_in, x2]) out_dark3 = self.out2(x2_in) outputs = (out_dark3, out_dark4, x0) return outputs #------------------------# # YOLOPAFPN #------------------------# class YOLOPAFPN(nn.Cell): """ YOLOv3 model. 
Darknet 53 is the default backbone of this model """ def __init__( self, input_w, input_h, depth=1.0, width=1.0, in_features=("dark3", "dark4", "dark5"), in_channels=None, depthwise=False, act="silu" ): super(YOLOPAFPN, self).__init__() if in_channels is None: in_channels = [256, 512, 1024] self.input_w = input_w self.input_h = input_h self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act) self.in_features = in_features self.in_channels = in_channels Conv = DWConv if depthwise else BaseConv self.upsample0 = P.ResizeNearestNeighbor((input_h // 16, input_w // 16)) self.upsample1 = P.ResizeNearestNeighbor((input_h // 8, input_w // 8)) self.lateral_conv0 = BaseConv(int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act) self.C3_p4 = CSPLayer( int(2 * in_channels[1] * width), int(in_channels[1] * width), round(3 * depth), False, depthwise=depthwise, act=act ) self.reduce_conv1 = BaseConv( int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act ) self.C3_p3 = CSPLayer( int(2 * in_channels[0] * width), int(in_channels[0] * width), round(3 * depth), False, depthwise=depthwise, act=act, ) # bottom-up conv self.bu_conv2 = Conv( int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act ) self.C3_n3 = CSPLayer( int(2 * in_channels[0] * width), int(in_channels[1] * width), round(3 * depth), False, depthwise=depthwise, act=act, ) # bottom-up conv self.bu_conv1 = Conv( int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act ) self.C3_n4 = CSPLayer( int(2 * in_channels[1] * width), int(in_channels[2] * width), round(3 * depth), False, depthwise=depthwise, act=act, ) self.concat = P.Concat(axis=1) def construct(self, inputs): """ Args: inputs: input images. Returns: Tuple[Tensor]: FPN feature. 
""" x2, x1, x0 = self.backbone(inputs) fpn_out0 = self.lateral_conv0(x0) # 1024->512 /32 f_out0 = self.upsample0(fpn_out0) # 512 /16 f_out0 = self.concat((f_out0, x1)) # 512->1024 /16 f_out0 = self.C3_p4(f_out0) # 1024->512 /16 fpn_out1 = self.reduce_conv1(f_out0) # 512->256 /16 f_out1 = self.upsample1(fpn_out1) # 256 /8 f_out1 = self.concat((f_out1, x2)) # 256->512 /8 pan_out2 = self.C3_p3(f_out1) # 512->256 /16 p_out1 = self.bu_conv2(pan_out2) # 256->256 /16 p_out1 = self.concat((p_out1, fpn_out1)) # 256->512 /16 pan_out1 = self.C3_n3(p_out1) # 512->512/16 p_out0 = self.bu_conv1(pan_out1) # 512->512/32 p_out0 = self.concat((p_out0, fpn_out0)) # 512->1024/32 pan_out0 = self.C3_n4(p_out0) # 1024->1024/32 return pan_out2, pan_out1, pan_out0 3.7 bbox iou计算相关 #------------------------# # bbox iou #------------------------# @constexpr def raise_bbox_error(): raise IndexError("Index error, shape of input must be 4!") def bboxes_iou(bboxes_a, bboxes_b, xyxy=True): """ calculate iou Args: bboxes_a: bboxes_b: xyxy: Returns: """ if bboxes_a.shape[1] != 4 or bboxes_b.shape[1] != 4: raise_bbox_error() if xyxy: tl = P.Maximum()(bboxes_a[:, None, :2], bboxes_b[:, :2]) br = P.Minimum()(bboxes_a[:, None, 2:], bboxes_b[:, 2:]) area_a = bboxes_a[:, 2:] - bboxes_a[:, :2] area_a = (area_a[:, 0:1] * area_a[:, 1:2]).squeeze(-1) area_b = bboxes_b[:, 2:] - bboxes_b[:, :2] area_b = (area_b[:, 0:1] * area_b[:, 1:2]).squeeze(-1) else: tl = P.Maximum()( (bboxes_a[:, None, :2] - bboxes_a[:, None, 2:] / 2), (bboxes_b[:, :2] - bboxes_b[:, 2:] / 2), ) br = P.Minimum()( (bboxes_a[:, None, :2] + bboxes_a[:, None, 2:] / 2), (bboxes_b[:, :2] + bboxes_b[:, 2:] / 2), ) area_a = (bboxes_a[:, 2:3] * bboxes_a[:, 3:4]).squeeze(-1) area_b = (bboxes_b[:, 2:3] * bboxes_b[:, 3:4]).squeeze(-1) en = (tl < br).astype(tl.dtype) en = (en[..., 0:1] * en[..., 1:2]).squeeze(-1) area_i = tl - br area_i = (area_i[:, :, 0:1] * area_i[:, :, 1:2]).squeeze(-1) * en return area_i / (area_a[:, None] + area_b - area_i) def 
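batch_bboxes_iou(batch_bboxes_a, batch_bboxes_b, xyxy=True): """ calculate iou for one batch Args: batch_bboxes_a: batch_bboxes_b: xyxy: Returns: """ if batch_bboxes_a.shape[-1] != 4 or batch_bboxes_b.shape[-1] != 4: raise_bbox_error() ious = [] for i in range(len(batch_bboxes_a)): if xyxy: iou = bboxes_iou(batch_bboxes_a[i], batch_bboxes_b[i], True) else: iou = bboxes_iou(batch_bboxes_a[i], batch_bboxes_b[i], False) iou = P.ExpandDims()(iou, 0) ious.append(iou) ious = P.Concat(axis=0)(ious) return ious
3.8 Model and loss
DetectionBlock is the complete yolox structure and declares the network used in the later training step.
yololoss is the yolox loss (YOLOLossCell in the code).
ema is an exponential moving average: it keeps a weighted average of the model weights to make the model more robust.
3.8.1 Network loss function
Like the network's predictions, the YOLOX loss function has three parts: Reg, Obj and Cls. Reg is the regression of the box parameters at a feature point, Obj judges whether the feature point contains an object, and Cls is the class of the object at the feature point. In YoloX, a ground-truth box is predicted by the feature points it covers. For every ground-truth box, its spatial relation to all feature points is computed, and a feature point that serves as a positive sample has to satisfy the following:
1) the feature point falls inside the ground-truth box;
2) the feature point lies within a certain radius of the object centre.
Together these two conditions ensure that positive feature points fall inside the ground-truth box and close to its centre. They are only a preliminary screening, however: YoloX uses the SimOTA method to assign the number of positive samples dynamically. YoloX computes a Cost matrix that represents the cost between every ground-truth box and every feature point. The Cost matrix has three components:
1) the overlap between each ground-truth box and the box predicted at the feature point;
2) the classification accuracy of the box predicted at the feature point with respect to each ground-truth box;
3) whether the centre of each ground-truth box lies within a certain radius of the feature point.
The purpose of the Cost matrix is to find adaptively which ground-truth box each feature point should fit: the higher the overlap, the more accurate the classification, and the closer to the centre, the more that feature point should fit the box. In SimOTA, different targets get different numbers of positive samples (dynamic k). Take the ant and watermelon example from Megvii's official answer: a traditional assignment scheme gives the watermelon and the ant in the same scene the same number of positive samples, so either the ant ends up with many low-quality positives or the watermelon ends up with only one or two. Neither outcome is suitable. The key to dynamic positive-sample assignment is how to determine k. SimOTA takes, for each target, the 10 feature points whose predicted boxes have the highest IoU with the ground-truth box and sums those ten IoU values; the sum, truncated to an integer and at least 1, is k. The SimOTA procedure can therefore be summarized as:
1) compute the overlap between each ground-truth box and the box predicted at each feature point;
2) sum the IoU values of the ten predicted boxes with the highest overlap to obtain k for each ground-truth box, meaning that k feature points are assigned to that box;
3) compute the classification accuracy of the box predicted at each feature point with respect to each ground-truth box;
4) check whether the centre of the ground-truth box lies within a certain radius of the feature point;
5) compute the Cost matrix;
6) take the k points with the lowest Cost as the positive samples of that ground-truth box.
As stated above, the YoloX loss consists of three parts:
1. Reg: SimOTA gives the feature points matched to each ground-truth box. The predicted boxes of those feature points are taken out, and the IoU loss between the ground-truth box and the predicted box forms the Reg part of the loss.
2. Obj: SimOTA gives the feature points matched to each ground-truth box. All matched feature points are positive samples and the remaining feature points are negative samples. The cross-entropy between these labels and the predicted objectness of each feature point forms the Obj part of the loss.
3. Cls: SimOTA gives the feature points matched to each ground-truth box. The class predictions of those feature points are taken out, and the cross-entropy between the class of the ground-truth box and the predicted class forms the Cls part of the loss.
Cls and Obj both use binary cross-entropy loss (BCELoss), and Reg uses IoULoss. Note that Cls and Reg are computed on positive samples only, whereas Obj is computed on both positive and negative samples. The total loss is L = (Lcls + λ·Lreg + Lobj) / Npos, where: 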
Lcls is the classification loss, Lreg is the localization loss, Lobj is the objectness loss, λ is the balancing coefficient of the localization loss (set to 5.0 in the source code), and Npos is the number of anchor points assigned as positive samples.
#------------------------# # yolox model #------------------------# class DetectionPerFPN(nn.Cell): """ head """ def __init__(self, num_classes, scale, in_channels=None, act="silu", width=1.0): super(DetectionPerFPN, self).__init__() if in_channels is None: in_channels = [1024, 512, 256] self.scale = scale self.num_classes = num_classes Conv = BaseConv if scale == 's': self.stem = BaseConv(in_channels=int(in_channels[0] * width), out_channels=int(256 * width), ksize=1, stride=1, act=act) elif scale == 'm': self.stem = BaseConv(in_channels=int(in_channels[1] * width), out_channels=int(256 * width), ksize=1, stride=1, act=act) elif scale == 'l': self.stem = BaseConv(in_channels=int(in_channels[2] * width), out_channels=int(256 * width), ksize=1, stride=1, act=act) else: raise KeyError("Invalid scale value for DetectionBlock") self.cls_convs = nn.SequentialCell( [ Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), ] ) self.reg_convs = nn.SequentialCell( [ Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), Conv( in_channels=int(256 * width), out_channels=int(256 * width), ksize=3, stride=1, act=act, ), ] ) self.cls_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=self.num_classes, kernel_size=1, stride=1, pad_mode="pad", has_bias=True) self.reg_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=4, kernel_size=1, stride=1, pad_mode="pad", has_bias=True) self.obj_preds = nn.Conv2d(in_channels=int(256 * width), out_channels=1, kernel_size=1, stride=1, pad_mode="pad", has_bias=True) def construct(self, x): """ forward """ x = self.stem(x) cls_x = x reg_x = x cls_feat = self.cls_convs(cls_x) cls_output = self.cls_preds(cls_feat) reg_feat = self.reg_convs(reg_x) reg_output = 
self.reg_preds(reg_feat) obj_output = self.obj_preds(reg_feat) return cls_output, reg_output, obj_output class DetectionBlock(nn.Cell): """ connect yolox backbone and head """ def __init__(self, config, backbone="yolopafpn"): super(DetectionBlock, self).__init__() self.num_classes = config.num_classes self.attr_num = self.num_classes + 5 self.depthwise = config.depth_wise self.strides = Tensor([8, 16, 32], mindspore.float32) self.input_size = config.input_size # network if backbone == "yolopafpn": self.backbone = YOLOPAFPN(depth=1.33, width=1.25, input_w=self.input_size[1], input_h=self.input_size[0]) self.head_inchannels = [1024, 512, 256] self.activation = "silu" self.width = 1.25 else: self.backbone = YOLOFPN(input_w=self.input_size[1], input_h=self.input_size[0]) self.head_inchannels = [512, 256, 128] self.activation = "lrelu" self.width = 1.0 self.head_l = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='l', act=self.activation, width=self.width) self.head_m = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='m', act=self.activation, width=self.width) self.head_s = DetectionPerFPN(in_channels=self.head_inchannels, num_classes=self.num_classes, scale='s', act=self.activation, width=self.width) def construct(self, x): """ forward """ outputs = [] x_l, x_m, x_s = self.backbone(x) cls_output_l, reg_output_l, obj_output_l = self.head_l(x_l) # (bs, 80, 80, 80)(bs, 4, 80, 80)(bs, 1, 80, 80) cls_output_m, reg_output_m, obj_output_m = self.head_m(x_m) # (bs, 80, 40, 40)(bs, 4, 40, 40)(bs, 1, 40, 40) cls_output_s, reg_output_s, obj_output_s = self.head_s(x_s) # (bs, 80, 20, 20)(bs, 4, 20, 20)(bs, 1, 20, 20) if self.training: output_l = P.Concat(axis=1)((reg_output_l, obj_output_l, cls_output_l)) # (bs, 85, 80, 80) output_m = P.Concat(axis=1)((reg_output_m, obj_output_m, cls_output_m)) # (bs, 85, 40, 40) output_s = P.Concat(axis=1)((reg_output_s, obj_output_s, cls_output_s)) # (bs, 85, 20, 20) 
output_l = self.mapping_to_img(output_l, stride=self.strides[0]) # (bs, 6400, 85)x_c, y_c, w, h output_m = self.mapping_to_img(output_m, stride=self.strides[1]) # (bs, 1600, 85)x_c, y_c, w, h output_s = self.mapping_to_img(output_s, stride=self.strides[2]) # (bs, 400, 85)x_c, y_c, w, h else: output_l = P.Concat(axis=1)( (reg_output_l, P.Sigmoid()(obj_output_l), P.Sigmoid()(cls_output_l))) # bs, 85, 80, 80 output_m = P.Concat(axis=1)( (reg_output_m, P.Sigmoid()(obj_output_m), P.Sigmoid()(cls_output_m))) # bs, 85, 40, 40 output_s = P.Concat(axis=1)( (reg_output_s, P.Sigmoid()(obj_output_s), P.Sigmoid()(cls_output_s))) # bs, 85, 20, 20 output_l = self.mapping_to_img(output_l, stride=self.strides[0]) # (bs, 6400, 85)x_c, y_c, w, h output_m = self.mapping_to_img(output_m, stride=self.strides[1]) # (bs, 1600, 85)x_c, y_c, w, h output_s = self.mapping_to_img(output_s, stride=self.strides[2]) # (bs, 400, 85)x_c, y_c, w, h outputs.append(output_l) outputs.append(output_m) outputs.append(output_s) return P.Concat(axis=1)(outputs) # batch_size, 8400, 85 def mapping_to_img(self, output, stride): """ map to origin image scale for each fpn """ batch_size = P.Shape()(output)[0] n_ch = self.attr_num grid_size = P.Shape()(output)[2:4] range_x = range(grid_size[1]) range_y = range(grid_size[0]) stride = P.Cast()(stride, output.dtype) grid_x = P.Cast()(F.tuple_to_array(range_x), output.dtype) grid_y = P.Cast()(F.tuple_to_array(range_y), output.dtype) grid_y = P.ExpandDims()(grid_y, 1) grid_x = P.ExpandDims()(grid_x, 0) yv = P.Tile()(grid_y, (1, grid_size[1])) xv = P.Tile()(grid_x, (grid_size[0], 1)) grid = P.Stack(axis=2)([xv, yv]) # (80, 80, 2) grid = P.Reshape()(grid, (1, 1, grid_size[0], grid_size[1], 2)) # (1,1,80,80,2) output = P.Reshape()(output, (batch_size, n_ch, grid_size[0], grid_size[1])) # bs, 6400, 85-->(bs,85,80,80) output = P.Transpose()(output, (0, 2, 1, 3)) # (bs,85,80,80)-->(bs,80,85,80) output = P.Transpose()(output, (0, 1, 3, 2)) # (bs,80,85,80)--->(bs, 80, 80, 
85) output = P.Reshape()(output, (batch_size, 1 * grid_size[0] * grid_size[1], -1)) # bs, 6400, 85 grid = P.Reshape()(grid, (1, -1, 2)) # grid(1, 6400, 2) # reconstruct output_xy = output[..., :2] output_xy = (output_xy + grid) * stride output_wh = output[..., 2:4] output_wh = P.Exp()(output_wh) * stride output_other = output[..., 4:] output_t = P.Concat(axis=-1)([output_xy, output_wh, output_other]) return output_t # bs, 6400, 85 grid(1, 6400, 2) #------------------------# # yolox Loss #------------------------# class YOLOLossCell(nn.Cell): """ yolox with loss cell """ def __init__(self, network=None, config=None): super(YOLOLossCell, self).__init__() self.network = network self.n_candidate_k = config.n_candidate_k self.on_value = Tensor(1.0, mindspore.float32) self.off_value = Tensor(0.0, mindspore.float32) self.depth = config.num_classes self.unsqueeze = P.ExpandDims() self.reshape = P.Reshape() self.one_hot = P.OneHot() self.zeros = P.ZerosLike() self.sort_ascending = P.Sort(descending=False) self.bce_loss = nn.BCEWithLogitsLoss(reduction="none") self.l1_loss = nn.L1Loss(reduction="none") self.batch_iter = Tensor(np.arange(0, config.per_batch_size * config.max_gt), mindspore.int32) self.strides = config.fpn_strides self.grids = [(config.input_size[0] // _stride) * (config.input_size[1] // _stride) for _stride in config.fpn_strides] self.use_l1 = config.use_l1 def construct(self, img, labels=None, pre_fg_mask=None, is_inbox_and_incenter=None): """ forward with loss return """ batch_size = P.Shape()(img)[0] gt_max = P.Shape()(labels)[1] outputs = self.network(img) # batch_size, 8400, 85 total_num_anchors = P.Shape()(outputs)[1] bbox_preds = outputs[:, :, :4] # batch_size, 8400, 4 obj_preds = outputs[:, :, 4:5] # batch_size, 8400, 1 cls_preds = outputs[:, :, 5:] # (batch_size, 8400, 80) # process label bbox_true = labels[:, :, 1:] # (batch_size, gt_max, 4) gt_classes = F.cast(labels[:, :, 0:1].squeeze(-1), mindspore.int32) pair_wise_ious = 
batch_bboxes_iou(bbox_true, bbox_preds, xyxy=False) pair_wise_ious = pair_wise_ious * pre_fg_mask pair_wise_iou_loss = -P.Log()(pair_wise_ious + 1e-8) * pre_fg_mask gt_classes_ = self.one_hot(gt_classes, self.depth, self.on_value, self.off_value) gt_classes_expaned = ops.repeat_elements(self.unsqueeze(gt_classes_, 2), rep=total_num_anchors, axis=2) gt_classes_expaned = F.stop_gradient(gt_classes_expaned) cls_preds_ = P.Sigmoid()(ops.repeat_elements(self.unsqueeze(cls_preds, 1), rep=gt_max, axis=1)) * \ P.Sigmoid()( ops.repeat_elements(self.unsqueeze(obj_preds, 1), rep=gt_max, axis=1) ) pair_wise_cls_loss = P.ReduceSum()( P.BinaryCrossEntropy(reduction="none")(P.Sqrt()(cls_preds_), gt_classes_expaned, None), -1) pair_wise_cls_loss = pair_wise_cls_loss * pre_fg_mask cost = pair_wise_cls_loss + 3.0 * pair_wise_iou_loss punishment_cost = 1000.0 * (1.0 - F.cast(is_inbox_and_incenter, mindspore.float32)) cost = F.cast(cost + punishment_cost, mindspore.float16) # dynamic k matching ious_in_boxes_matrix = pair_wise_ious # (batch_size, gt_max, 8400) ious_in_boxes_matrix = F.cast(pre_fg_mask * ious_in_boxes_matrix, mindspore.float16) topk_ious, _ = P.TopK(sorted=True)(ious_in_boxes_matrix, self.n_candidate_k) dynamic_ks = P.ReduceSum()(topk_ious, 2).astype(mindspore.int32).clip(xmin=1, xmax=total_num_anchors - 1, dtype=mindspore.int32) # (1, batch_size * gt_max, 2) dynamic_ks_indices = P.Stack(axis=1)((self.batch_iter, dynamic_ks.reshape((-1,)))) dynamic_ks_indices = F.stop_gradient(dynamic_ks_indices) values, _ = P.TopK(sorted=True)(-cost, self.n_candidate_k) # b_s , 50, 8400 values = P.Reshape()(-values, (-1, self.n_candidate_k)) max_neg_score = self.unsqueeze(P.GatherNd()(values, dynamic_ks_indices).reshape(batch_size, -1), 2) pos_mask = F.cast(cost < max_neg_score, mindspore.float32) # (batch_size, gt_num, 8400) pos_mask = pre_fg_mask * pos_mask # ----dynamic_k---- END----------------------------------------------------------------------------------------- cost_t = cost 
* pos_mask + (1.0 - pos_mask) * 2000. min_index, _ = P.ArgMinWithValue(axis=1)(cost_t) ret_posk = P.Transpose()(nn.OneHot(depth=gt_max, axis=-1)(min_index), (0, 2, 1)) pos_mask = pos_mask * ret_posk pos_mask = F.stop_gradient(pos_mask) # AA problem--------------END ---------------------------------------------------------------------------------- # calculate target --------------------------------------------------------------------------------------------- # Cast precision pos_mask = F.cast(pos_mask, mindspore.float16) bbox_true = F.cast(bbox_true, mindspore.float16) gt_classes_ = F.cast(gt_classes_, mindspore.float16) reg_target = P.BatchMatMul(transpose_a=True)(pos_mask, bbox_true) # (batch_size, 8400, 4) pred_ious_this_matching = self.unsqueeze(P.ReduceSum()((ious_in_boxes_matrix * pos_mask), 1), -1) cls_target = P.BatchMatMul(transpose_a=True)(pos_mask, gt_classes_) cls_target = cls_target * pred_ious_this_matching obj_target = P.ReduceMax()(pos_mask, 1) # (batch_size, 8400) # calculate l1_target reg_target = F.stop_gradient(reg_target) cls_target = F.stop_gradient(cls_target) obj_target = F.stop_gradient(obj_target) bbox_preds = F.cast(bbox_preds, mindspore.float32) reg_target = F.cast(reg_target, mindspore.float32) obj_preds = F.cast(obj_preds, mindspore.float32) obj_target = F.cast(obj_target, mindspore.float32) cls_preds = F.cast(cls_preds, mindspore.float32) cls_target = F.cast(cls_target, mindspore.float32) loss_l1 = 0.0 if self.use_l1: l1_target = self.get_l1_format(reg_target) l1_preds = self.get_l1_format(bbox_preds) l1_target = F.stop_gradient(l1_target) l1_target = F.cast(l1_target, mindspore.float32) l1_preds = F.cast(l1_preds, mindspore.float32) loss_l1 = P.ReduceSum()(self.l1_loss(l1_preds, l1_target), -1) * obj_target loss_l1 = P.ReduceSum()(loss_l1) # calculate target -----------END------------------------------------------------------------------------------- loss_iou = IOUloss()(P.Reshape()(bbox_preds, (-1, 4)), 
reg_target).reshape(batch_size, -1) * obj_target loss_iou = P.ReduceSum()(loss_iou) loss_obj = self.bce_loss(P.Reshape()(obj_preds, (-1, 1)), P.Reshape()(obj_target, (-1, 1))) loss_obj = P.ReduceSum()(loss_obj) loss_cls = P.ReduceSum()(self.bce_loss(cls_preds, cls_target), -1) * obj_target loss_cls = P.ReduceSum()(loss_cls) loss_all = (5 * loss_iou + loss_cls + loss_obj + loss_l1) / (P.ReduceSum()(obj_target) + 1e-3) return loss_all def get_l1_format_single(self, reg_target, stride, eps): """ calculate L1 loss related """ reg_target = reg_target / stride reg_target_xy = reg_target[:, :, :2] reg_target_wh = reg_target[:, :, 2:] reg_target_wh = P.Log()(reg_target_wh + eps) return P.Concat(-1)((reg_target_xy, reg_target_wh)) def get_l1_format(self, reg_target, eps=1e-8): """ calculate L1 loss related """ reg_target_l = reg_target[:, 0:self.grids[0], :] # (bs, 6400, 4) reg_target_m = reg_target[:, self.grids[0]:self.grids[1] + self.grids[0], :] # (bs, 1600, 4) reg_target_s = reg_target[:, -self.grids[2]:, :] # (bs, 400, 4) reg_target_l = self.get_l1_format_single(reg_target_l, self.strides[0], eps) reg_target_m = self.get_l1_format_single(reg_target_m, self.strides[1], eps) reg_target_s = self.get_l1_format_single(reg_target_s, self.strides[2], eps) l1_target = P.Concat(axis=1)([reg_target_l, reg_target_m, reg_target_s]) return l1_target class IOUloss(nn.Cell): """ Iou loss """ def __init__(self, reduction="none"): super(IOUloss, self).__init__() self.reduction = reduction self.reshape = P.Reshape() def construct(self, pred, target): """ forward """ pred = self.reshape(pred, (-1, 4)) target = self.reshape(target, (-1, 4)) tl = P.Maximum()(pred[:, :2] - pred[:, 2:] / 2, target[:, :2] - target[:, 2:] / 2) br = P.Minimum()(pred[:, :2] + pred[:, 2:] / 2, target[:, :2] + target[:, 2:] / 2) area_p = (pred[:, 2:3] * pred[:, 3:4]).squeeze(-1) area_g = (target[:, 2:3] * target[:, 3:4]).squeeze(-1) en = F.cast((tl < br), tl.dtype) en = (en[:, 0:1] * en[:, 1:2]).squeeze(-1) 
area_i = br - tl area_i = (area_i[:, 0:1] * area_i[:, 1:2]).squeeze(-1) * en area_u = area_p + area_g - area_i iou = area_i / (area_u + 1e-16) loss = 1 - iou * iou if self.reduction == "mean": loss = loss.mean() elif self.reduction == "sum": loss = loss.sum() return loss grad_scale = C.MultitypeFuncGraph("grad_scale") reciprocal = P.Reciprocal() @grad_scale.register("Tensor", "Tensor") def tensor_grad_scale(scale, grad): return grad * reciprocal(scale) _grad_overflow = C.MultitypeFuncGraph("_grad_overflow") grad_overflow = P.FloatStatus() @_grad_overflow.register("Tensor") def _tensor_grad_overflow(grad): return grad_overflow(grad) #------------------------# # ema #------------------------# class TrainOneStepWithEMA(nn.TrainOneStepWithLossScaleCell): """ Train one step with ema model """ def __init__(self, network, optimizer, scale_sense, ema=True, decay=0.9998, updates=0, moving_name=None, ema_moving_weight=None): super(TrainOneStepWithEMA, self).__init__(network, optimizer, scale_sense) self.ema = ema self.moving_name = moving_name self.ema_moving_weight = ema_moving_weight if self.ema: self.ema_weight = self.weights.clone("ema", init='same') self.decay = decay self.updates = Parameter(Tensor(updates, mindspore.float32)) self.assign = ops.Assign() self.ema_moving_parameters() def ema_moving_parameters(self): self.moving_name = {} moving_list = [] idx = 0 for key, param in self.network.parameters_and_names(): if "moving_mean" in key or "moving_variance" in key: new_param = param.clone() new_param.name = "ema." + param.name moving_list.append(new_param) self.moving_name["ema." 
+ key] = idx idx += 1 self.ema_moving_weight = ParameterTuple(moving_list) def ema_update(self): """Update EMA parameters.""" if self.ema: self.updates += 1 d = self.decay * (1 - ops.Exp()(-self.updates / 2000)) # update trainable parameters for ema_v, weight in zip(self.ema_weight, self.weights): tep_v = ema_v * d self.assign(ema_v, (1.0 - d) * weight + tep_v) return self.updates # moving_parameter_update is executed inside the callback(EMACallBack) def moving_parameter_update(self): if self.ema: d = (self.decay * (1 - ops.Exp()(-self.updates / 2000))).asnumpy().item() # update moving mean and moving var for key, param in self.network.parameters_and_names(): if "moving_mean" in key or "moving_variance" in key: idx = self.moving_name["ema." + key] moving_weight = param.asnumpy() tep_v = self.ema_moving_weight[idx] * d ema_value = (1.0 - d) * moving_weight + tep_v self.ema_moving_weight[idx] = ema_value def construct(self, *inputs): """ Forward """ weights = self.weights loss = self.network(*inputs) scaling_sens = self.scale_sense status, scaling_sens = self.start_overflow_check(loss, scaling_sens) scaling_sens_filled = C.ones_like(loss) * F.cast(scaling_sens, F.dtype(loss)) grads = self.grad(self.network, weights)(*inputs, scaling_sens_filled) grads = self.hyper_map(F.partial(grad_scale, scaling_sens), grads) # apply grad reducer on grads grads = self.grad_reducer(grads) self.ema_update() # get the overflow buffer cond = self.get_overflow_status(status, grads) overflow = self.process_loss_scale(cond) # if there is no overflow, do optimize if not overflow: loss = F.depend(loss, self.optimizer(grads)) return loss, cond, scaling_sens
3.9 Device functions
Helper functions for the platform devices.
#------------------------# # device adapter #------------------------# def local_adp_get_device_id(): device_id = os.getenv('DEVICE_ID', '0') return int(device_id) def local_adp_get_device_num(): device_num = os.getenv('RANK_SIZE', '1') return int(device_num) def local_adp_get_rank_id(): global_rank_id = 
os.getenv('RANK_ID', '0') return int(global_rank_id) def local_adp_get_job_id(): return "Local Job" def moxing_adp_get_device_id(): device_id = os.getenv('DEVICE_ID', '0') return int(device_id) def moxing_adp_get_device_num(): device_num = os.getenv('RANK_SIZE', '1') return int(device_num) def moxing_adp_get_rank_id(): global_rank_id = os.getenv('RANK_ID', '0') return int(global_rank_id) def moxing_adp_get_job_id(): job_id = os.getenv('JOB_ID') job_id = job_id if job_id != "" else "default" return job_id def sync_data(from_path, to_path): """ Download data from remote obs to local directory if the first url is remote url and the second one is local path Upload data from local directory to remote obs in contrast. """ import moxing as mox global _global_sync_count sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count) _global_sync_count += 1 # Each server contains 8 devices as most. if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock): print("from path: ", from_path) print("to path: ", to_path) mox.file.copy_parallel(from_path, to_path) print("===finish data synchronization===") try: os.mknod(sync_lock) except IOError: pass print("===save flag===") while True: if os.path.exists(sync_lock): break time.sleep(1) print("Finish sync data from {} to {}.".format(from_path, to_path)) def moxing_wrapper(pre_process=None, post_process=None): """ Moxing wrapper to download dataset and upload outputs. 
""" def wrapper(run_func): @functools.wraps(run_func) def wrapped_func(*args, **kwargs): # Download data from data_url if config.enable_modelarts: if config.data_url: sync_data(config.data_url, config.data_path) print("Dataset downloaded: ", os.listdir(config.data_path)) if config.checkpoint_url: sync_data(config.checkpoint_url, config.load_path) print("Preload downloaded: ", os.listdir(config.load_path)) if config.train_url: sync_data(config.train_url, config.output_path) print("Workspace downloaded: ", os.listdir(config.output_path)) context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id()))) config.device_num = get_device_num() config.device_id = get_device_id() if not os.path.exists(config.output_path): os.makedirs(config.output_path) if pre_process: pre_process() # Run the main function run_func(*args, **kwargs) # Upload data to train_url if config.enable_modelarts: if post_process: post_process() if config.train_url: print("Start to copy output directory") sync_data(config.output_path, config.train_url) return wrapped_func return wrapper if config.enable_modelarts: get_device_id = moxing_adp_get_device_id get_device_num = moxing_adp_get_device_num get_rank_id = moxing_adp_get_rank_id get_job_id = moxing_adp_get_job_id else: get_device_id = local_adp_get_device_id get_device_num = local_adp_get_device_num get_rank_id = local_adp_get_rank_id get_job_id = local_adp_get_job_id

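A note on the sync_data pattern in the code above: one process per server performs the copy and creates a lock file, while all the others poll for that file. The sketch below reproduces only that handshake so it can be tried without ModelArts. It rests on assumptions: shutil.copytree stands in for mox.file.copy_parallel, and sync_once, is_leader, and demo are hypothetical names that do not appear in the original module.

```python
import os
import shutil
import tempfile
import time


def sync_once(src, dst, lock_path, is_leader, poll_seconds=0.1):
    """Copy src to dst on the leader, then block until the lock file exists."""
    if is_leader and not os.path.exists(lock_path):
        # Assumption: shutil.copytree stands in for mox.file.copy_parallel.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        # An empty flag file tells the waiting processes the copy is finished.
        with open(lock_path, "w"):
            pass
    # Every process, leader included, waits here until the flag file is visible.
    while not os.path.exists(lock_path):
        time.sleep(poll_seconds)


def demo():
    """Run the leader and then a follower against a temporary directory."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")
    lock_path = os.path.join(root, "copy_sync.lock0")
    os.makedirs(src)
    with open(os.path.join(src, "sample.txt"), "w") as f:
        f.write("data")
    sync_once(src, dst, lock_path, is_leader=True)
    sync_once(src, dst, lock_path, is_leader=False)  # flag already exists, so no wait
    return sorted(os.listdir(dst)), os.path.exists(lock_path)
```

As in the original, a follower that starts before the leader simply keeps polling until the flag file appears, so the leader must eventually run.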
yd_233394255 posted 2024-03-09 17:37:25 · last reply by 运气男孩 2024-04-01 08:57:00
24 1

Image Recognition (Image), Image Tagging
[Other] VGG

Foreword: this program takes VGG16 as an example and uses the MindSpore framework for dataset processing, network construction, training, and testing.

VGGNet is a model proposed by the Visual Geometry Group at the University of Oxford. In ILSVRC-2014, the 2014 ImageNet classification and localization challenge, it took second place in the classification task and first place in the localization task. Its main contribution is showing that very small convolutions, combined with greater network depth, effectively improve performance. VGG builds on AlexNet while having a clear trait of its own: a deeper network. The paper defines configurations A to E with 11, 13, 16, 16, and 19 weight layers (plus an A-LRN variant with 11); the most typical are VGG16 and VGG19. This program uses VGG16 with the CIFAR-10 dataset.

Reference blog: https://blog.csdn.net/hgnuxc_1993/article/details/115956774
Reference blog: https://blog.csdn.net/weixin_43496706/article/details/10121098
Reference blog: https://zhuanlan.zhihu.com/p/9100284

VGG image classification

Image classification is the most basic computer vision application and belongs to supervised learning: given an image (cat, dog, airplane, car, and so on), decide which class it belongs to. This chapter uses the VGG16 network to classify the CIFAR-10 dataset.

VGG network introduction

VGG takes its name from Oxford's Visual Geometry Group. The work was done for ILSVRC 2014, and its main result is that increasing network depth affects final performance to a certain degree. The two commonly used variants, VGG16 and VGG19, do not differ in any essential way; only the depth differs.

(figure: image.png)

One improvement of VGG16 over AlexNet is that it uses several consecutive 3x3 kernels in place of the larger kernels used in earlier networks such as AlexNet (11x11, 7x7, 5x5). For a given receptive field (the local region of the input image that affects an output), stacked small kernels are better than one large kernel: the extra non-linear layers add depth, which lets the network learn more complex patterns, and the cost is lower (fewer parameters). In short, VGG uses three 3x3 kernels in place of a 7x7 kernel and two 3x3 kernels in place of a 5x5 kernel. The purpose is to increase depth while keeping the same receptive field, which improves the network's results to a certain degree. For more detail, see the VGG paper.

The download.download function is used here to fetch CIFAR-10; run "pip install download" in a console first.

CIFAR-10 consists of 60000 32x32 color images in 10 classes, 6000 images per class, split into 50000 training samples and 10000 test samples. The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset is divided into 5 training batches and 1 test batch of 10000 images each. The test batch holds 1000 randomly chosen images from every class. The remaining 50000 images, 5000 per class, are shuffled and split randomly into the 5 training batches, so a single training batch may contain more images of one class than of another.

A folder named datasets-cifar10-bin is created in the current directory and the dataset is downloaded into it automatically. After the download the layout is:

datasets-cifar10-bin/cifar-10-batches-bin
├── batches.meta.txt
├── data_batch_1.bin
├── data_batch_2.bin
├── data_batch_3.bin
├── data_batch_4.bin
├── data_batch_5.bin
├── readme.html
└── test_batch.bin

!pip install download

from download import download

url = "http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz"
download(url, "./datasets-cifar10-bin", kind="tar.gz")

The next part builds the training and test datasets.

import mindspore as ms
import numpy as np
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
from mindspore import nn, ops

data_dir = "./datasets-cifar10-bin/cifar-10-batches-bin"  # dataset root directory
batch_size = 6  # batch size
image_size = 32  # spatial size of the training images
workers = 4  # number of parallel workers
num_classes = 10  # number of classes
batch_size,image_size,workers,num_classes

CIFAR-10 is processed with Cifar10Dataset from mindspore.dataset. It reads and parses the CIFAR-10 source files and builds a dataset with two columns, [image, label]. The image column has dtype uint8 and the label column has dtype uint32. See the API documentation for details: https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.Cifar10Dataset.html?highlight=cifar10dataset

dataset_dir: dataset root directory
usage: "train" or "test", selecting the training set or the test set
resize: image size after processing, set to 32 in this program
batch_size: batch size
workers: number of parallel workers
return: the processed dataset
shuffle: shuffle=True shuffles the dataset, so samples are drawn in random order instead of sequentially

def create_dataset_cifar10(dataset_dir, usage, resize, batch_size, workers):
    data_set = ds.Cifar10Dataset(dataset_dir=dataset_dir,
                                 usage=usage,
                                 num_parallel_workers=workers,
                                 shuffle=True)

    trans = []  # list of transforms to apply

    # For the training set, apply random crop and random flip first.
    # mindspore.dataset.vision.RandomCrop crops a random (32, 32) region;
    # (4, 4, 4, 4) pads 4 blank pixels on each side of the image before cropping.
    # mindspore.dataset.vision.RandomHorizontalFlip flips the image horizontally
    # with a probability of 50%.
    if usage == "train":
        trans += [
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5)
        ]

    # Resize, rescale, and normalize the images (normalization helps convergence).
    trans += [
        vision.Resize(resize),
        vision.Rescale(1.0 / 255.0, 0.0),
        vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        vision.HWC2CHW()
    ]

    # Transform applied to the label
    target_trans = [(lambda x: np.array([x]).astype(np.int32)[0])]

    # Map operations
    data_set = data_set.map(
        operations=trans,
        input_columns='image',
        num_parallel_workers=workers)
    data_set = data_set.map(
        operations=target_trans,
        input_columns='label',
        num_parallel_workers=workers)

    # Batch operation
    data_set = data_set.batch(batch_size)
    return data_set

# Use the function above to get the processed training and test datasets
dataset_train = create_dataset_cifar10(dataset_dir=data_dir,
                                       usage="train",
                                       resize=image_size,
                                       batch_size=batch_size,
                                       workers=workers)
step_size_train = dataset_train.get_dataset_size()
index_label_dict = dataset_train.get_class_indexing()

dataset_val = create_dataset_cifar10(dataset_dir=data_dir,
                                     usage="test",
                                     resize=image_size,
                                     batch_size=batch_size,
                                     workers=workers)
step_size_val = dataset_val.get_dataset_size()
step_size_val,step_size_train,index_label_dict,dataset_train

Visualize the training dataset:

import matplotlib.pyplot as plt
import numpy as np

data_iter = next(dataset_train.create_dict_iterator())
images = data_iter["image"].asnumpy()
labels = data_iter["label"].asnumpy()
print(f"Image shape: {images.shape}, Label: {labels}")

classes = []
with open(data_dir+"/batches.meta.txt", "r") as f:
    for line in f:
        line = line.rstrip()
        if line != '':
            classes.append(line)

plt.figure()
for i in range(6):
    plt.subplot(2, 3, i+1)
    image_trans = np.transpose(images[i], (1, 2, 0))
    mean = np.array([0.4914, 0.4822, 0.4465])
    std = np.array([0.2023, 0.1994, 0.2010])
    image_trans = std * image_trans + mean
    image_trans = np.clip(image_trans, 0, 1)
    plt.title(f"{classes[labels[i]]}")
    plt.imshow(image_trans)
    plt.axis("off")
# Show the training data (label and original image)
plt.show()

Building the VGG16 network

The VGG16 structure is as follows (figure: image.png):

1. Input a 224x224x3 image and apply two conv+ReLU layers with 64 3x3 kernels; the size becomes 224x224x64.
2. Apply max pooling with a 2x2 pooling unit (which halves the image size); the size becomes 112x112x64.
3. Apply two conv+ReLU layers with 128 3x3 kernels; the size becomes 112x112x128.
4. Apply 2x2 max pooling; the size becomes 56x56x128.
5. Apply three conv+ReLU layers with 256 3x3 kernels; the size becomes 56x56x256.
6. Apply 2x2 max pooling; the size becomes 28x28x256.
7. Apply three conv+ReLU layers with 512 3x3 kernels; the size becomes 28x28x512.
8. Apply 2x2 max pooling; the size becomes 14x14x512.
9. Apply three conv+ReLU layers with 512 3x3 kernels; the size becomes 14x14x512.
10. Apply 2x2 max pooling; the size becomes 7x7x512.
11. Fully connect with two 1x1x4096 layers and one 1x1x1000 layer, with ReLU (three layers in total).
12. Output 1000 predictions through softmax (the one with the highest probability is taken as the final prediction).

ATTENTION: every image in the dataset used here is 32*32, so the input size of the VGG network has been adjusted to match. In practice, for a different input size, only the parameters of the nn.Dense layers in the last block need to be adjusted.

The VGG16 model diagram is shown below (figure: image.png).

Test details

At test time the paper converts the fully connected layers into convolutional layers: the first fully connected layer becomes a 7x7 convolution and the last two become 1x1 convolutions, as illustrated below (figure: image.png). Only the dimensions of the weights are reshaped and extended. The converted network has no fully connected layers, so it accepts input of any size instead of only a fixed size. Its output is a class score map in which every channel stands for one class and whose resolution varies with the input image size. To obtain an output vector, the class score map is spatially averaged.

class VGG16(nn.Cell):
    def __init__(self):
        super().__init__()
        numClasses = 10
        self.all_sequential = nn.SequentialCell(
            nn.Conv2d(3, 64, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(64, 128, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(128, 256, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(256), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(256), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(256, 512, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(512), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(512), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(512), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(512, 512, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(512), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(512), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1, pad_mode="pad"), nn.BatchNorm2d(512), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # The original VGG16 takes 224*224 images; this dataset uses 32*32, 7 times smaller.
            # Adjust as needed: for a 224*224 input, since 224/32=7, change the first
            # nn.Dense argument to 512*7*7 and leave everything else unchanged.
            nn.Flatten(),
            nn.Dense(512*1*1, 256), nn.ReLU(), nn.Dropout(),
            nn.Dense(256, 256), nn.ReLU(), nn.Dropout(),
            nn.Dense(256, numClasses),
        )

    def construct(self, x):
        x = self.all_sequential(x)
        return x

The next function handles a pretrained model: if pretrained parameters should be loaded before training, set pretrained to True. No pretrained model is provided here, so it is set to False in the training below. Without pretraining, the function amounts to "model = VGG16()" followed by returning model.

from mindspore import load_checkpoint, load_param_into_net

def _vgg16(pretrained: bool = False):
    """VGG16 model"""
    model = VGG16()
    # Download URL of the pretrained model
    model_url = "https://download.mindspore.cn/model_zoo/official/cv/vgg/vgg16_ascend_0.5.0_cifar10_official_classification_20200715/vgg16.ckpt"
    # Storage path
    model_ckpt = "./LoadPretrainedModel/vgg16_0715.ckpt"

    if pretrained:
        download(url=model_url, path=model_ckpt)
        param_dict = load_checkpoint(model_ckpt)
        load_param_into_net(model, param_dict)

    return model

Training and evaluation

import mindspore as ms

# Define the VGG16 network without pretraining, i.e. pretrained=False
vgg16 = _vgg16(pretrained=False)

# param.requires_grad = True means every parameter needs a gradient and is updated.
for param in vgg16.get_parameters():
    param.requires_grad = True

# Set the number of epochs and the learning rate; 40 epochs here
num_epochs = 40
# Compute the learning rate with cosine decay. The minimum is 0.0001 and the maximum is 0.0005. API details:
# https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.cosine_decay_lr.html?highlight=cosine_decay_lr
lr = nn.cosine_decay_lr(min_lr=0.0001, max_lr=0.0005, total_step=step_size_train * num_epochs,
                        step_per_epoch=step_size_train, decay_epoch=num_epochs)

# Define the optimizer and the loss function
# Adam optimizer, see the paper https://arxiv.org/abs/1412.6980
opt = nn.Adam(params=vgg16.trainable_params(), learning_rate=lr)
# Cross-entropy loss
loss_fn = nn.CrossEntropyLoss()

# Forward pass: compute the loss
def forward_fn(inputs, targets):
    logits = vgg16(inputs)
    loss = loss_fn(logits, targets)
    return loss

# Compute the gradients and the loss
grad_fn = ops.value_and_grad(forward_fn, None, opt.parameters)

def train_step(inputs, targets):
    loss, grads = grad_fn(inputs, targets)
    opt(grads)
    return loss

# Instantiate the model
model = ms.Model(vgg16, loss_fn, opt, metrics={"Accuracy": nn.Accuracy()})

# Create the iterators
data_loader_train = dataset_train.create_tuple_iterator(num_epochs=num_epochs)
data_loader_val = dataset_val.create_tuple_iterator(num_epochs=num_epochs)

# Path for the best model
best_acc = 0
best_ckpt_dir = "./BestCheckpoint"
best_ckpt_path = "./BestCheckpoint/vgg16-best.ckpt"

import os
import stat

# Start the training loop
print("Start Training Loop ...")

for epoch in range(num_epochs):
    losses = []
    vgg16.set_train()

    # Read in the data for this epoch
    for i, (images, labels) in enumerate(data_loader_train):
        loss = train_step(images, labels)
        if i%100 == 0 or i == step_size_train -1:
            print('Epoch: [%3d/%3d], Steps: [%3d/%3d], Train Loss: [%5.3f]'%(
                epoch+1, num_epochs, i+1, step_size_train, loss))
        losses.append(loss)

    # Validate the accuracy after every epoch
    acc = model.eval(dataset_val)['Accuracy']

    print("-" * 50)
    print("Epoch: [%3d/%3d], Average Train Loss: [%5.3f], Accuracy: [%5.3f]" % (
        epoch+1, num_epochs, sum(losses)/len(losses), acc
    ))
    print("-" * 50)

    if acc > best_acc:
        best_acc = acc
        if not os.path.exists(best_ckpt_dir):
            os.mkdir(best_ckpt_dir)
        if os.path.exists(best_ckpt_path):
            os.chmod(best_ckpt_path, stat.S_IWRITE)  # clear the read-only attribute, otherwise the file cannot be removed
            os.remove(best_ckpt_path)
        ms.save_checkpoint(vgg16, best_ckpt_path)

print("=" * 80)
print(f"End of validation the best Accuracy is: {best_acc: 5.3f}, "
      f"save the best ckpt file in {best_ckpt_path}", flush=True)

# Validate, evaluate, and visualize the results
import matplotlib.pyplot as plt

def visualize_model(best_ckpt_path, dataset_val):
    net = _vgg16(pretrained=False)
    # Load the model parameters
    param_dict = ms.load_checkpoint(best_ckpt_path)
    ms.load_param_into_net(net, param_dict)
    model = ms.Model(net)
    # Load validation data
    data = next(dataset_val.create_dict_iterator())
    images = data["image"].asnumpy()
    labels = data["label"].asnumpy()
    # Predict the image classes
    output = model.predict(ms.Tensor(data['image']))
    pred = np.argmax(output.asnumpy(), axis=1)

    # Class names
    classes = []
    with open(data_dir+"/batches.meta.txt", "r") as f:
        for line in f:
            line = line.rstrip()
            if line != '':
                classes.append(line)

    # Show the images with their predictions
    plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i+1)
        # Blue title for a correct prediction, red for a wrong one
        color = 'blue' if pred[i] == labels[i] else 'red'
        plt.title('predict:{}'.format(classes[pred[i]]), color=color)
        picture_show = np.transpose(images[i], (1, 2, 0))
        mean = np.array([0.4914, 0.4822, 0.4465])
        std = np.array([0.2023, 0.1994, 0.2010])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')

    plt.show()

# Validate with the test dataset
visualize_model(best_ckpt_path=best_ckpt_path, dataset_val=dataset_val)

(A long stretch of output is omitted here.) In another run, accuracy reached 55% after 10 epochs. The output was:

Start Training Loop ...
Epoch: [ 1/ 10], Steps: [ 1/8334], Train Loss: [2.304]
Epoch: [ 1/ 10], Steps: [101/8334], Train Loss: [2.298]
……
Epoch: [ 1/ 10], Steps: [8301/8334], Train Loss: [2.492]
Epoch: [ 1/ 10], Steps: [8334/8334], Train Loss: [1.792]
Epoch: [ 1/ 10], Average Train Loss: [2.073], Accuracy: [0.190]
Epoch: [ 2/ 10], Steps: [ 1/8334], Train Loss: [1.713]
……
Epoch: [ 2/ 10], Steps: [8334/8334], Train Loss: [1.864]
Epoch: [ 2/ 10], Average Train Loss: [1.907], Accuracy: [0.195]
Epoch: [ 3/ 10], Steps: [ 1/8334], Train Loss: [1.432]
……
Epoch: [ 3/ 10], Steps: [8334/8334], Train Loss: [1.967]
Epoch: [ 3/ 10], Average Train Loss: [1.834], Accuracy: [0.297]
Epoch: [ 4/ 10], Steps: [ 1/8334], Train Loss: [1.815]
……
Epoch: [ 4/ 10], Steps: [8334/8334], Train Loss: [0.952]
Epoch: [ 4/ 10], Average Train Loss: [1.680], Accuracy: [0.382]
Epoch: [ 5/ 10], Steps: [ 1/8334], Train Loss: [1.300]
……
Epoch: [ 5/ 10], Steps: [8334/8334], Train Loss: [2.586]
Epoch: [ 5/ 10], Average Train Loss: [1.557], Accuracy: [0.395]
Epoch: [ 6/ 10], Steps: [ 1/8334], Train Loss: [1.660]
……
Epoch: [ 6/ 10], Steps: [8334/8334], Train Loss: [0.922]
Epoch: [ 6/ 10], Average Train Loss: [1.460], Accuracy: [0.423]
Epoch: [ 7/ 10], Steps: [ 1/8334], Train Loss: [1.651]
……
Epoch: [ 7/ 10], Steps: [8334/8334], Train Loss: [1.811]
Epoch: [ 7/ 10], Average Train Loss: [1.374], Accuracy: [0.514]
Epoch: [ 8/ 10], Steps: [ 1/8334], Train Loss: [0.895]
……
Epoch: [ 8/ 10], Steps: [8334/8334], Train Loss: [1.344]
Epoch: [ 8/ 10], Average Train Loss: [1.296], Accuracy: [0.514]
Epoch: [ 9/ 10], Steps: [ 1/8334], Train Loss: [1.672]
……
Epoch: [ 9/ 10], Steps: [8334/8334], Train Loss: [0.477]
Epoch: [ 9/ 10], Average Train Loss: [1.234], Accuracy: [0.512]
Epoch: [ 10/ 10], Steps: [ 1/8334], Train Loss: [1.143]
……
Epoch: [ 10/ 10], Steps: [8334/8334], Train Loss: [4.882]
Epoch: [ 10/ 10], Average Train Loss: [1.192], Accuracy: [0.550]
================================================================================
End of validation the best Accuracy is: 0.550, save the best ckpt file in ./BestCheckpoint/resnet50-best.ckpt
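Two pieces of arithmetic from the post can be checked in a few lines: the claim that stacked 3x3 convolutions match the receptive field of a 7x7 or 5x5 kernel with fewer parameters, and the size of the first nn.Dense input (512*1*1 for 32x32 images, 512*7*7 for 224x224). The following is a minimal sketch, not part of the original program. The helper names are hypothetical, biases are ignored, and every layer is assumed to have the same number of input and output channels.

```python
def conv_weights(kernel, channels, layers=1):
    """Weight count (no bias) of `layers` stacked kernel x kernel convs, channels in == channels out."""
    return layers * kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions with the same kernel size."""
    return 1 + layers * (kernel - 1)


def dense_input_size(image_size, num_pools=5, channels=512):
    """Flattened feature size after `num_pools` 2x2, stride-2 poolings (padding keeps conv sizes)."""
    side = image_size // (2 ** num_pools)
    return channels * side * side


channels = 64  # illustrative channel count, chosen only for the comparison
print(stacked_receptive_field(3, 3), conv_weights(3, channels, layers=3), conv_weights(7, channels))
print(stacked_receptive_field(3, 2), conv_weights(3, channels, layers=2), conv_weights(5, channels))
print(dense_input_size(32), dense_input_size(224))
```

With 64 channels, three 3x3 layers cover a 7x7 receptive field with 110592 weights against 200704 for a single 7x7 layer, which is the 27C² versus 49C² ratio behind the post's "fewer parameters" remark.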

yd_233394255 posted 2024-03-09 17:31:12 · last reply by 运气男孩 2024-04-01 08:56:59
36 1

Image Recognition (Image)
[Other] resnet50

# Data download and preparation
from download import download

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz"
download(url, "./datasets-cifar10-bin", kind="tar.gz", replace=True)

# Data loading
import mindspore as ms  # MindSpore framework, aliased as ms for brevity
import mindspore.dataset as ds  # dataset module, for processing and managing datasets
import mindspore.dataset.vision as vision  # vision module, for loading and preprocessing image data
import mindspore.dataset.transforms as transforms  # transforms module, for augmentation, normalization, and similar operations
from mindspore import dtype as mstype  # data types (dtype) for tensors and arrays, aliased as mstype

data_dir = "./datasets-cifar10-bin/cifar-10-batches-bin"  # dataset root directory
batch_size = 256  # batch size
image_size = 32  # spatial size of the training images
workers = 4  # number of parallel workers
num_classes = 10  # number of classes

def create_dataset_cifar10(dataset_dir, usage, resize, batch_size, workers):
    # Creates the CIFAR-10 dataset from the directory, usage, image size, batch size, and worker count.
    data_set = ds.Cifar10Dataset(dataset_dir=dataset_dir,
                                 usage=usage,
                                 num_parallel_workers=workers,
                                 shuffle=True)  # Cifar10Dataset with directory, usage, workers, and shuffling

    trans = []  # list of image transforms
    if usage == "train":  # augmentation is applied to the training set only
        trans += [  # random crop and horizontal flip
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5)
        ]
    trans += [  # resize, rescale, normalize, and channel conversion
        vision.Resize(resize),
        vision.Rescale(1.0 / 255.0, 0.0),
        vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        vision.HWC2CHW()
    ]

    target_trans = transforms.TypeCast(mstype.int32)  # cast the labels to int32

    # Map operation on the images
    data_set = data_set.map(operations=trans,
                            input_columns='image',
                            num_parallel_workers=workers)
    # Map operation on the labels (type cast)
    data_set = data_set.map(operations=target_trans,
                            input_columns='label',
                            num_parallel_workers=workers)
    # Batch operation
    data_set = data_set.batch(batch_size)
    return data_set

# Get the processed training and test datasets
# Create the training dataset with create_dataset_cifar10.
dataset_train = create_dataset_cifar10(dataset_dir=data_dir,
                                       usage="train",
                                       resize=image_size,
                                       batch_size=batch_size,
                                       workers=workers)
# Number of batches in the training dataset
step_size_train = dataset_train.get_dataset_size()

# Create the test dataset with create_dataset_cifar10.
dataset_val = create_dataset_cifar10(dataset_dir=data_dir,
                                     usage="test",
                                     resize=image_size,
                                     batch_size=batch_size,
                                     workers=workers)
# Number of batches in the validation dataset, for use during evaluation
step_size_val = dataset_val.get_dataset_size()

# Visualize the data
import matplotlib.pyplot as plt
import numpy as np

data_iter = next(dataset_train.create_dict_iterator())
images = data_iter["image"].asnumpy()
labels = data_iter["label"].asnumpy()
print(f"Image shape: {images.shape}, Label shape: {labels.shape}")

# Labels of the first six training images
print(f"Labels: {labels[:6]}")

classes = []
with open(data_dir + "/batches.meta.txt", "r") as f:
    for line in f:
        line = line.rstrip()
        if line:
            classes.append(line)

# The first six training images
plt.figure()
for i in range(6):
    plt.subplot(2, 3, i + 1)
    image_trans = np.transpose(images[i], (1, 2, 0))  # move the channel axis to the end
    mean = np.array([0.4914, 0.4822, 0.4465])  # per-channel mean
    std = np.array([0.2023, 0.1994, 0.2010])  # per-channel standard deviation
    image_trans = std * image_trans + mean  # undo the normalization
    image_trans = np.clip(image_trans, 0, 1)  # limit pixel values to [0, 1]
    plt.title(f"{classes[labels[i]]}")  # title: class name of the label
    plt.imshow(image_trans)  # show the image in the subplot
    plt.axis("off")  # hide the axes
plt.show()  # display the figure

# Building the network
# Building the residual structure
# As the residual structure diagram shows, a residual block has two branches: a main branch and
# a shortcut (the arc in the diagram). The main branch is a stack of convolutions; the shortcut
# goes straight from input to output. The main branch's feature matrix F(x) plus the shortcut's
# feature matrix x gives F(x) + x, and passing that through ReLU gives the block's final output.
# The code below defines the ResidualBlockBase class, which implements the Building Block structure.
from typing import Type, Union, List, Optional  # type hints for parameters and return values
import mindspore.nn as nn  # neural network module, for building the model
from mindspore.common.initializer import Normal  # Normal initializer, for parameter initialization

# Initializers for the convolution and BatchNorm parameters
weight_init = Normal(mean=0, sigma=0.02)  # normal distribution, mean 0, standard deviation 0.02
gamma_init = Normal(mean=1, sigma=0.02)  # normal distribution for the BatchNorm gamma parameter

class ResidualBlockBase(nn.Cell):  # a component of the network
    expansion: int = 1  # the last convolution has as many kernels as the first

    def __init__(self, in_channel: int, out_channel: int,
                 stride: int = 1, norm: Optional[nn.Cell] = None,
                 down_sample: Optional[nn.Cell] = None) -> None:
        # Takes the input channels, output channels, stride, an optional normalization cell,
        # and an optional down-sampling cell; returns nothing.
        super(ResidualBlockBase, self).__init__()  # initialize the nn.Cell parent
        if not norm:  # use BatchNorm2d unless a normalization cell is passed in
            self.norm = nn.BatchNorm2d(out_channel)
        else:
            self.norm = norm

        # First 2D convolution: input channels, output channels, kernel size, stride, initializer
        self.conv1 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=3, stride=stride,
                               weight_init=weight_init)
        # Second 2D convolution. It consumes conv1's output, so its input channel count is
        # out_channel (the original passed in_channel, which only works when the two are equal).
        self.conv2 = nn.Conv2d(out_channel, out_channel,
                               kernel_size=3, weight_init=weight_init)
        self.relu = nn.ReLU()  # ReLU activation
        self.down_sample = down_sample  # down-sampling cell, None if not passed in

    def construct(self, x):  # forward pass of ResidualBlockBase
        """ResidualBlockBase construct."""
        identity = x  # shortcut branch

        out = self.conv1(x)  # main branch, first layer: 3*3 convolution
        out = self.norm(out)  # normalize
        out = self.relu(out)  # ReLU
        out = self.conv2(out)  # main branch, second layer: 3*3 convolution
        out = self.norm(out)  # normalize again

        if self.down_sample is not None:  # down-sample the input x for the shortcut if needed
            identity = self.down_sample(x)
        out += identity  # output is the sum of the main branch and the shortcut
        out = self.relu(out)  # ReLU on the sum

        return out

# The code below defines the ResidualBlock class, which implements the Bottleneck structure.
class ResidualBlock(nn.Cell):  # a component of the network
    expansion = 4  # the last convolution has 4 times as many kernels as the first

    def __init__(self, in_channel: int, out_channel: int,
                 stride: int = 1, down_sample: Optional[nn.Cell] = None) -> None:
        # Takes the input channels, output channels, stride, and an optional down-sampling cell;
        # returns nothing.
        super(ResidualBlock, self).__init__()  # initialize the nn.Cell parent

        # conv1: 1x1 convolution from in_channel to out_channel, initialized with weight_init
        self.conv1 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=1, weight_init=weight_init)
        self.norm1 = nn.BatchNorm2d(out_channel)  # BatchNorm2d over out_channel channels
        # conv2: 3x3 convolution from out_channel to out_channel, stride set by the stride argument
        self.conv2 = nn.Conv2d(out_channel, out_channel,
                               kernel_size=3, stride=stride,
                               weight_init=weight_init)
        self.norm2 = nn.BatchNorm2d(out_channel)  # BatchNorm2d over out_channel channels
        # conv3: 1x1 convolution from out_channel to out_channel * self.expansion (4 times)
        self.conv3 = nn.Conv2d(out_channel, out_channel * self.expansion,
                               kernel_size=1, weight_init=weight_init)
        self.norm3 = nn.BatchNorm2d(out_channel * self.expansion)  # BatchNorm2d over out_channel * self.expansion channels

        self.relu = nn.ReLU()  # ReLU activation
        self.down_sample = down_sample  # down-sampling cell, None if not passed in

    def construct(self, x):  # forward pass of ResidualBlock
        identity = x  # shortcut branch

        out = self.conv1(x)  # main branch, first layer: 1*1 convolution
        out = self.norm1(out)  # normalize
        out = self.relu(out)  # ReLU
        out = self.conv2(out)  # main branch, second layer: 3*3 convolution
        out = self.norm2(out)  # normalize
        out = self.relu(out)  # ReLU
        out = self.conv3(out)  # main branch, third layer: 1*1 convolution
        out = self.norm3(out)  # normalize

        if self.down_sample is not None:
            identity = self.down_sample(x)  # down-sample the input x for the shortcut if needed

        out += identity  # output is the sum of the main branch and the shortcut
        out = self.relu(out)  # ReLU on the sum

        return out

# Building the ResNet50 network
def make_layer(last_out_channel, block: Type[Union[ResidualBlockBase, ResidualBlock]],
               channel: int, block_nums: int, stride: int = 1):
    # Takes last_out_channel (output channels of the previous layer), block (residual block type),
    # channel (channel count), block_nums (number of blocks), and stride;
    # returns the stacked blocks as an nn.SequentialCell.
    down_sample = None  # shortcut branch

    if stride != 1 or last_out_channel != channel * block.expansion:
        # When the stride is not 1, or the previous output channels differ from
        # channel * block.expansion, build a down-sampling cell from Conv2d and BatchNorm2d
        # so that the dimensions match.
        down_sample = nn.SequentialCell([
            nn.Conv2d(last_out_channel, channel * block.expansion,
                      kernel_size=1, stride=stride, weight_init=weight_init),
            nn.BatchNorm2d(channel * block.expansion, gamma_init=gamma_init)
        ])

    layers = []  # holds the residual blocks
    # First residual block, given last_out_channel, channel, stride, and down_sample
    layers.append(block(last_out_channel, channel, stride=stride, down_sample=down_sample))

    in_channel = channel * block.expansion  # input channels of the following blocks
    # Stack the remaining residual blocks
    for _ in range(1, block_nums):  # block_nums - 1 more blocks
        layers.append(block(in_channel, channel))  # each takes in_channel and channel

    return nn.SequentialCell(layers)  # wrap the blocks in a SequentialCell and return it

# The code below builds the ResNet50 model; calling resnet50 constructs it. Its parameters are:
# num_classes: number of classes, 1000 by default.
# pretrained: download the pretrained model and load its parameters into the network.
from mindspore import load_checkpoint, load_param_into_net  # for loading checkpoint parameters into the network

class ResNet(nn.Cell):  # the network model
    def __init__(self, block: Type[Union[ResidualBlockBase, ResidualBlock]],
                 layer_nums: List[int], num_classes: int, input_channel: int) -> None:
        # Takes block (residual block type), layer_nums (blocks per layer), num_classes
        # (number of classes), and input_channel (input size of the fully connected layer);
        # returns nothing.
        super(ResNet, self).__init__()  # initialize the nn.Cell parent

        self.relu = nn.ReLU()
        # First convolution: 3 input channels (color image), 64 output channels
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, weight_init=weight_init)
        self.norm = nn.BatchNorm2d(64)
        # Max pooling reduces the image size: 3x3 kernel, stride 2, 'same' padding
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
        # Residual layers
        self.layer1 = make_layer(64, block, 64, layer_nums[0])
        self.layer2 = make_layer(64 * block.expansion, block, 128, layer_nums[1], stride=2)
        self.layer3 = make_layer(128 * block.expansion, block, 256, layer_nums[2], stride=2)
        self.layer4 = make_layer(256 * block.expansion, block, 512, layer_nums[3], stride=2)
        # Average pooling layer
        self.avg_pool = nn.AvgPool2d()
        # Flatten layer
        self.flatten = nn.Flatten()
        # Fully connected layer
        self.fc = nn.Dense(in_channels=input_channel, out_channels=num_classes)

    def construct(self, x):  # forward pass of the network
        x = self.conv1(x)
        x = self.norm(x)
        x = self.relu(x)
        x = self.max_pool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avg_pool(x)
        x = self.flatten(x)
        x = self.fc(x)

        return x

def _resnet(model_url: str, block: Type[Union[ResidualBlockBase, ResidualBlock]],
            layers: List[int], num_classes: int, pretrained: bool, pretrained_ckpt: str,
            input_channel: int):
    model = ResNet(block, layers, num_classes, input_channel)

    if pretrained:
        # Load the pretrained model
        download(url=model_url, path=pretrained_ckpt, replace=True)
        param_dict = load_checkpoint(pretrained_ckpt)
        load_param_into_net(model, param_dict)

    return model

def resnet50(num_classes: int = 1000, pretrained: bool = False):
    """ResNet50 model"""
    resnet50_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/application/resnet50_224_new.ckpt"
    resnet50_ckpt = "./LoadPretrainedModel/resnet50_224_new.ckpt"
    return _resnet(resnet50_url, ResidualBlock, [3, 4, 6, 3], num_classes,
                   pretrained, resnet50_ckpt, 2048)

# Model training and evaluation
# Define the ResNet50 network
network = resnet50(pretrained=True)

# Input size of the fully connected layer
in_channel = network.fc.in_channels
fc = nn.Dense(in_channels=in_channel, out_channels=10)
# Replace the fully connected layer
network.fc = fc

# Set the learning rate
num_epochs = 5
lr = nn.cosine_decay_lr(min_lr=0.00001, max_lr=0.001, total_step=step_size_train * num_epochs,
                        step_per_epoch=step_size_train, decay_epoch=num_epochs)
# Define the optimizer and the loss function
opt = nn.Momentum(params=network.trainable_params(), learning_rate=lr, momentum=0.9)
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')

def forward_fn(inputs, targets):
    logits = network(inputs)
    loss = loss_fn(logits, targets)
    return loss

grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters)

def train_step(inputs, targets):
    loss, grads = grad_fn(inputs, targets)
    opt(grads)
    return loss

import os

# Create the iterators
data_loader_train = dataset_train.create_tuple_iterator(num_epochs=num_epochs)
data_loader_val = dataset_val.create_tuple_iterator(num_epochs=num_epochs)

# Path for the best model
best_acc = 0
best_ckpt_dir = "./BestCheckpoint"
best_ckpt_path = "./BestCheckpoint/resnet50-best.ckpt"

if not os.path.exists(best_ckpt_dir):
    os.mkdir(best_ckpt_dir)

import mindspore.ops as ops

def train(data_loader, epoch):
    """Model training"""
    losses = []
    network.set_train(True)

    for i, (images, labels) in enumerate(data_loader):
        loss = train_step(images, labels)
        if i % 100 == 0 or i == step_size_train - 1:
            print('Epoch: [%3d/%3d], Steps: [%3d/%3d], Train Loss: [%5.3f]' %
                  (epoch + 1, num_epochs, i + 1, step_size_train, loss))
        losses.append(loss)

    return sum(losses) / len(losses)

def evaluate(data_loader):
    """Model validation"""
    network.set_train(False)

    correct_num = 0.0  # number of correct predictions
    total_num = 0.0  # total number of predictions

    for images, labels in data_loader:
        logits = network(images)
        pred = logits.argmax(axis=1)  # predicted classes
        correct = ops.equal(pred, labels).reshape((-1, ))
        correct_num += correct.sum().asnumpy()
        total_num += correct.shape[0]

    acc = correct_num / total_num  # accuracy

    return acc

# Start the training loop
print("Start Training Loop ...")

for epoch in range(num_epochs):
    curr_loss = train(data_loader_train, epoch)
    curr_acc = evaluate(data_loader_val)

    print("-" * 50)
    print("Epoch: [%3d/%3d], Average Train Loss: [%5.3f], Accuracy: [%5.3f]" % (
        epoch+1, num_epochs, curr_loss, curr_acc
    ))
    print("-" * 50)

    # Save the model with the highest accuracy so far
    if curr_acc > best_acc:
        best_acc = curr_acc
        ms.save_checkpoint(network, best_ckpt_path)

print("=" * 80)
print(f"End of validation the best Accuracy is: {best_acc: 5.3f}, "
      f"save the best ckpt file in {best_ckpt_path}", flush=True)

# Visualize the model predictions
import matplotlib.pyplot as plt

def visualize_model(best_ckpt_path, dataset_val):
    num_class = 10  # the 10 CIFAR-10 classes
    net = resnet50(num_class)
    # Load the model parameters
    param_dict = ms.load_checkpoint(best_ckpt_path)
    ms.load_param_into_net(net, param_dict)
    # Load validation data
    data = next(dataset_val.create_dict_iterator())
    images = data["image"]
    labels = data["label"]
    # Predict the image classes
    output = net(data['image'])
    pred = np.argmax(output.asnumpy(), axis=1)

    # Class names
    classes = []
    with open(data_dir + "/batches.meta.txt", "r") as f:
        for line in f:
            line = line.rstrip()
            if line:
                classes.append(line)

    # Show the images with their predictions
    plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        # Blue title for a correct prediction, red for a wrong one
        color = 'blue' if pred[i] == labels.asnumpy()[i] else 'red'
        plt.title('predict:{}'.format(classes[pred[i]]), color=color)
        picture_show = np.transpose(images.asnumpy()[i], (1, 2, 0))
        mean = np.array([0.4914, 0.4822, 0.4465])
        std = np.array([0.2023, 0.1994, 0.2010])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')

    plt.show()

# Validate with the test dataset
visualize_model(best_ckpt_path=best_ckpt_path, dataset_val=dataset_val)
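The learning-rate list in this post comes from nn.cosine_decay_lr, which returns one value per training step. To see what that list looks like without installing MindSpore, here is a plain-Python sketch. It assumes the per-epoch cosine formula given in the nn.cosine_decay_lr documentation (the rate changes once per epoch and stays constant within it); cosine_decay_schedule is a hypothetical name, and 196 steps per epoch is an assumed value for 50000 training images at batch size 256 with the last partial batch kept.

```python
import math


def cosine_decay_schedule(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Per-step learning rates that follow a cosine curve evaluated once per epoch."""
    lrs = []
    for step in range(total_step):
        current_epoch = step // step_per_epoch  # the rate only changes at epoch boundaries
        cosine = 1 + math.cos(math.pi * current_epoch / decay_epoch)
        lrs.append(min_lr + 0.5 * (max_lr - min_lr) * cosine)
    return lrs


steps_per_epoch = 196  # assumed: ceil(50000 / 256)
num_epochs = 5
schedule = cosine_decay_schedule(min_lr=0.00001, max_lr=0.001,
                                 total_step=steps_per_epoch * num_epochs,
                                 step_per_epoch=steps_per_epoch,
                                 decay_epoch=num_epochs)
# Print the rate used in each epoch
for epoch in range(num_epochs):
    print(epoch + 1, schedule[epoch * steps_per_epoch])
```

Under this formula the first epoch runs at max_lr, and the last epoch ends slightly above min_lr, because the cosine is evaluated at the start of each epoch instead of at its end.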

yd_233394255 posted 2024-03-09 15:51:06 · last reply by 运气男孩 2024-04-01 08:57:01
50 1

Image Recognition (Image)
[Other] The OM model generated for Ascend310P3 works, but the one generated for Ascend310B4 does not

I generated a model for Ascend310P3 with ATC and it works. The command was:

atc --framework=5 --model=./fairmot.onnx --input_format=NCHW --input_shape="actual_input_1:1,3,2048,512" --output=./fairmot_bs1 --log=error --soc_version=Ascend310P3

It runs inference correctly on the 300I. With the same model, I used:

atc --framework=5 --model=./fairmot.onnx --input_format=NCHW --input_shape="actual_input_1:1,3,2048,512" --output=./fairmot_bs1_B4 --log=error --soc_version=Ascend310B4

to generate an Ascend310B4 model, which fails on the 200 DK. The error output is:

(FairMot) HwHiAiUser@davinci-mini:~/altas/project/FairMOT/test$ python3 -m ais_bench --model fairmot_bs1_B4.om --input ./pre_dataset/ --output ./out --output_dirname result --outfmt BIN
[INFO] acl init success
[INFO] open device 0 success
E19999: Inner Error!
Get stub function failed, name=1351718655777264385_trans_Cast_0.[FUNC:GetFunctionByName][FILE:logger.cc][LINE:91]
rtGetFunctionByName execute failed, reason=[kernel lookup error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
E19999 Call rtGetFunctionByName failed op:trans_Cast_0(Cast), bin_file_key:1351718655777264385_trans_Cast_0, ret:0x7BC93[FUNC:InitTVMTask][FILE:kernel_task_info.cc][LINE:1167]
TraceBack (most recent call last):
Task index:1 init failed, ret:507027.[FUNC:InitTaskInfo][FILE:davinci_model.cc][LINE:3818]
[Model][FromData]load model from data failed, ge result[507027][FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[ERROR] load model from file failed, model file is fairmot_bs1_B4.om
[WARN] Check failed:processModel->LoadModelFromFile(modelPath), ret:1
[WARN] no model had been loaded, unload failed
Traceback (most recent call last):
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/site-packages/ais_bench/__main__.py", line 18, in <module>
    exec(open(os.path.join(cur_path, "infer/__main__.py")).read())
  File "<string>", line 280, in <module>
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/site-packages/ais_bench/infer/infer_process.py", line 752, in infer_process
    main(args)
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/site-packages/ais_bench/infer/infer_process.py", line 428, in main
    session = init_inference_session(args, tmp_acl_json_path if tmp_acl_json_path is not None else acl_json_path)
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/site-packages/ais_bench/infer/infer_process.py", line 115, in init_inference_session
    session = InferSession(args.device, args.model, acl_json_path, args.debug, args.loop)
  File "/home/HwHiAiUser/.conda/envs/FairMot/lib/python3.7/site-packages/ais_bench/infer/interface.py", line 94, in __init__
    self.session = aclruntime.InferenceSession(self.model_path, self.device_id, self.options)
RuntimeError: [1][ACL: invalid parameter]
[INFO] end to reset device 0
[INFO] end to finalize acl

The files are:

(FairMot) HwHiAiUser@davinci-mini:~/altas/project/FairMOT/test$ ls -l
total 123140
-r--r--r-- 1 HwHiAiUser HwHiAiUser 41906916 Mar 1 20:33 fairmot_bs1.om
-r--r--r-- 1 HwHiAiUser HwHiAiUser 42070773 Mar 1 20:33 fairmot_bs1_B1.om
-r--r--r-- 1 HwHiAiUser HwHiAiUser 42071677 Mar 2 21:58 fairmot_bs1_B4.om
-rw-rw-r-- 1 HwHiAiUser HwHiAiUser 5412 Mar 1 20:31 fairmot_postprocess.py
drwx------ 2 HwHiAiUser HwHiAiUser 4096 Mar 3 21:49 out
dr-xr-xr-x 2 HwHiAiUser HwHiAiUser 4096 Mar 1 20:31 pre_dataset
-rw-rw-r-- 1 HwHiAiUser HwHiAiUser 8970 Mar 1 20:31 test.py

The system reports error 507027. What could be the cause? Is the model too large?

stockerc posted 2024-03-03 22:12:25, last reply by 运气男孩 2024-04-01 08:56:57
176 3

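Image Recognition (Image), Atlas 200 DK Developer Kit
[Help] How do I get an image reconstruction program written in Jupyter to run on the Atlas 200 DK development board?

We wrote our own U-Net model program. How do we get it running on the Atlas 200 DK development board? We have already made the SD card and changed the IP address. Thanks in advance, everyone.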

yd_219508362 posted 2024-02-21 16:02:48, last reply by yd_219508362 2024-02-22 15:34:40
41 3

Image Recognition (Image)
[Help] FairMOT inference error

I am using https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/contrib/cv/detection/FairMOT and followed the README exactly. Every earlier step succeeded, but the final command

python3 fairmot_postprocess.py --data_dir=./dataset --input_root=/home/HwHiAiUser/test/out/result

fails with:

Fix size testing.
training chunk_sizes: [6, 6]
The output will be saved to ./FairMOT/src/lib/../../exp/mot/default
heads {'hm': 1, 'wh': 4, 'id': 128, 'reg': 2}
2024-02-16 16:10:09 [INFO]: start seq: MOT17-02-SDP
2024-02-16 16:10:09 [INFO]: start seq: MOT17-02-SDP
Traceback (most recent call last):
  File "fairmot_postprocess.py", line 153, in <module>
    save_videos=False)
  File "fairmot_postprocess.py", line 66, in process
    nf, ta, tc = eval_seq(opt, dataloader, data_type, result_filename, seq,save_dir=output_dir, show_image=show_image, frame_rate=frame_rate)
  File "fairmot_postprocess.py", line 104, in eval_seq
    hm_eval = torch.from_numpy(np.fromfile(dataloader[i + 3], dtype='float32').reshape(1, 1, 152, 272))
ValueError: cannot reshape array of size 82688 into shape (1,1,152,272)

Why does this happen, and how can I fix it? My environment: Ubuntu 20.04, kernel 5.4.0-26-generic; Ascend310P3; torch 1.5; Python 3.7.
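A quick size check may help narrow this down. The sketch below is only a diagnostic aid, not part of the FairMOT scripts: it compares an element count against the head channel counts printed in the log, using the 152 x 272 spatial size from the failing `reshape`. The helper names `matching_heads` and `check_file` are hypothetical. Since 82688 = 2 x 152 x 272, the file being read as `hm` has the size of a 2-channel output, which would be consistent with the output files being picked up in a different order than the script expects; that is an assumption to verify, not a confirmed cause.

```python
import numpy as np

# Head name -> channel count, copied from the "heads {...}" log line in the post.
HEADS = {"hm": 1, "wh": 4, "id": 128, "reg": 2}
H, W = 152, 272  # spatial size used by the failing reshape


def matching_heads(n_elements, heads=HEADS, h=H, w=W):
    """Return the head names whose 1 x C x h x w size equals n_elements."""
    return [name for name, channels in heads.items() if channels * h * w == n_elements]


def check_file(path, dtype="float32"):
    """Read a raw .bin output file and report which heads its size could match."""
    data = np.fromfile(path, dtype=dtype)
    return data.size, matching_heads(data.size)


print(matching_heads(82688))      # size reported in the ValueError
print(matching_heads(1 * H * W))  # size the script expects for 'hm'
```

Running `check_file` on each of the output files for one frame would show which file actually holds the 1-channel heatmap.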

stockerc posted 2024-02-16 16:18:42, last reply by 福州司马懿 2024-02-27 10:02:38
103 6

Image Recognition (Image), Atlas 200 DK Developer Kit
[Tutorial] Calling the image recognition service the easy way

Preface
This tutorial uses the Java SDK to call the image recognition service, taking media asset image tagging and celebrity recognition as examples.

I. Environment setup
Maven (I did not download Huawei's SDK package directly; the dependencies are installed through Maven)
JDK 19 (the official SDK requires a JDK newer than JDK 8; any version that meets the requirement is fine)
IDE: IDEA 2023.3 (other versions also work; anything that can create a Maven project will do)
Enable the image recognition service (currently a free trial): I enabled the image tagging / media asset image tagging service and the celebrity recognition service.
Set up an access key
Service region: I enabled the service in CN North-Beijing4

Key steps
I will not go over creating a Maven project or configuring the Java environment variables, which anyone learning Java already knows well. Only the error-prone steps are covered here.

Enabling the image recognition service
The Huawei Cloud home page has a cloud product trial area (if you cannot find it, use the search bar). Select AI, click "Try Now", find the service list, and click to enable the services you want.

Setting up an access key
In the console, open "My Credentials" and go to "Access Keys". If you have no key, create one. After creating it, be sure to download the key's CSV file; a prompt reminds you to do so. Open the CSV file in Notepad to see the AK and SK.

Maven dependencies (you can change the versions yourself)
<dependency>
    <groupId>com.huaweicloud.sdk</groupId>
    <artifactId>huaweicloud-sdk-image</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.70</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.4.16</version>
</dependency>
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.16.0</version>
</dependency>
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.13.0</version>
</dependency>

II. Image recognition examples

Media asset image tagging
What it does: for an image supplied by the user, it returns the names of the objects in the image, their categories, and confidence scores. The test image comes from the internet and is used for learning purposes only. The code is as follows:

/**
 * @Version: 1.0.0
 * @Author: Dragon_王
 * @ClassName: RunImageMediaTaggingSolution
 * @Description: Media asset image tagging
 * @Date: 2024/1/8 11:51
 */
/**
 * This demo is for testing only; using the SDK is strongly recommended.
 * Configure the dependency JARs before use. They can be obtained by downloading the SDK.
 */
import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.image.v2.region.ImageRegion;
import com.huaweicloud.sdk.image.v2.*;
import com.huaweicloud.sdk.image.v2.model.*;

public class RunImageMediaTaggingSolution {
    public static void main(String[] args) {
        // Enter your AK/SK here
        String ak = "your AK";
        String sk = "your SK";
        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);
        ImageClient client = ImageClient.newBuilder()
                .withCredential(auth)
                .withRegion(ImageRegion.valueOf("cn-north-4")) // Replace with the region where you enabled the service
                .build();
        RunImageMediaTaggingRequest request = new RunImageMediaTaggingRequest();
        ImageMediaTaggingReq body = new ImageMediaTaggingReq();
        body.withThreshold(10f);
        body.withLanguage("zh");
        // Replace with an image URL that is reachable from the public internet
        body.withUrl("https://tse2-mm.cn.bing.net/th/id/OIP-C.SIuEnb1-arhtDNqfdICVqAHaE7?rs=1&pid=ImgDetMain");
        request.withBody(body);
        try {
            RunImageMediaTaggingResponse response = client.runImageMediaTagging(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}

Output:

class RunImageMediaTaggingResponse {
  result: class ImageMediaTaggingResponseResult {
    tags: [
      class ImageMediaTaggingItemBody { confidence: 83.63 type: 动物 tag: 金毛犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 金毛犬 en: Golden retriever } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 81.78 type: 动物 tag: 金毛 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 金毛 en: Golden hair } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 77.00 type: 动物 tag: 金毛寻猎犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 金毛寻猎犬 en: Golden Retriever } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 62.60 type: 动物 tag: 贵妇犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 贵妇犬 en: Poodle } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 59.02 type: 生活 tag: 狗链 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 狗链 en: Dog chain } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 生活 en: Life } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 53.84 type: 动物 tag: 宠物狗 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 宠物狗 en: Pet dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 48.01 type: 动物 tag: 狗狗 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 狗狗 en: Dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 44.02 type: 动物 tag: 犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 犬 en: Dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 42.11 type: 动物 tag: 纯种犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 纯种犬 en: Purebred dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] },
      class ImageMediaTaggingItemBody { confidence: 38.65 type: 动物 tag: 中华田园犬 i18nTag: class ImageMediaTaggingItemBodyI18nTag { zh: 中华田园犬 en: Chinese pastoral dog } i18nType: class ImageMediaTaggingItemBodyI18nType { zh: 动物 en: Animal } instances: [] }
    ]
  }
}
Process finished with exit code 0

Celebrity recognition
What it does: analyzes an image and identifies sensitive persons, celebrities, and internet personalities in it, returning the person information and face coordinates. The test photo comes from the internet and is used for learning purposes only. The code is as follows:

/**
 * @Version: 1.0.0
 * @Author: Dragon_王
 * @ClassName: RunCelebrityRecognitionSolution
 * @Description: Celebrity recognition
 * @Date: 2024/1/9 16:23
 */
import com.alibaba.fastjson.JSON;
import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.image.v2.ImageClient;
import com.huaweicloud.sdk.image.v2.model.RunCelebrityRecognitionRequest;
import com.huaweicloud.sdk.image.v2.region.ImageRegion;
import com.huaweicloud.sdk.image.v2.model.CelebrityRecognitionReq;
import com.huaweicloud.sdk.image.v2.model.RunCelebrityRecognitionResponse;

public class RunCelebrityRecognitionSolution {
    public static void main(String[] args) {
        // Hard-coding the AK and SK in code or storing them in plaintext is a serious security risk.
        // Store them encrypted in a configuration file or in environment variables and decrypt them at use.
        // This example reads them from environment variables; before running it, set
        // HUAWEICLOUD_SDK_AK and HUAWEICLOUD_SDK_SK in your local environment.
        String ak = System.getenv("HUAWEICLOUD_SDK_AK");
        String sk = System.getenv("HUAWEICLOUD_SDK_SK");
        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);
        ImageClient client = ImageClient.newBuilder()
                .withCredential(auth)
                .withRegion(ImageRegion.valueOf("cn-north-4")) // Replace with the region where you enabled the service
                .build();
        RunCelebrityRecognitionRequest request = new RunCelebrityRecognitionRequest();
        CelebrityRecognitionReq body = new CelebrityRecognitionReq();
        body.withThreshold(0f);
        // Replace with an image URL that is reachable from the public internet
        body.withUrl("https://tse1-mm.cn.bing.net/th/id/OIP-C.tM6jifW1xaCDP7Kia9QiYwHaKD?rs=1&pid=ImgDetMain");
        request.withBody(body);
        try {
            RunCelebrityRecognitionResponse response = client.runCelebrityRecognition(request);
            System.out.println(response.getHttpStatusCode());
            System.out.println(JSON.toJSONString(response));
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}

Output:

200
{"httpStatusCode":200,"result":[{"confidence":0.9985551,"faceDetail":{"w":132,"h":186,"x":197,"y":79},"label":"成龙"}]}
Process finished with exit code 0

Summary
That covers calling Huawei Cloud's AI image recognition service. The official documentation is provided here
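The JSON printed by the celebrity recognition example can be post-processed in a few lines. The sketch below is an illustration only: it parses the exact response string from the post and converts each `faceDetail` into corner coordinates. It assumes that `x` and `y` are the top-left corner of the face box and that `w` and `h` are its width and height, which the post does not state; the function name `extract_faces` and the `min_confidence` parameter are hypothetical.

```python
import json

# Response string copied verbatim from the tutorial's output.
SAMPLE = (
    '{"httpStatusCode":200,"result":[{"confidence":0.9985551,'
    '"faceDetail":{"w":132,"h":186,"x":197,"y":79},"label":"成龙"}]}'
)


def extract_faces(response_text, min_confidence=0.0):
    """Return label, confidence, and box corners for each recognized face."""
    payload = json.loads(response_text)
    faces = []
    for item in payload.get("result", []):
        if item["confidence"] < min_confidence:
            continue  # drop low-confidence matches
        box = item["faceDetail"]
        faces.append({
            "label": item["label"],
            "confidence": item["confidence"],
            # Assumption: (x, y) is the top-left corner, w/h are width/height.
            "left": box["x"],
            "top": box["y"],
            "right": box["x"] + box["w"],
            "bottom": box["y"] + box["h"],
        })
    return faces


for face in extract_faces(SAMPLE, min_confidence=0.9):
    print(face)
```

Filtering on the client side like this is an alternative to raising the request threshold, which the example sets to 0.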

浩泽学编程 posted 2024-01-09 17:41:16, last reply by Yanamaria 2024-07-11 15:02:44
511 6

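Java, Celebrity Recognition (ROC), Image Recognition (Image), Image Tagging, API/SDK
[Help] What is the URL of the HoloSens one-stop development platform?

Setting up my own YOLOv5 environment, training, and then converting the model to wk is far too much trouble. The documentation says there is a platform that can train and convert models online, but I cannot find its URL anywhere. What is the URL?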

yd_241193574 posted 2023-12-22 15:45:43, last reply by yd_244274466 2024-05-07 22:44:29
224 12

Image Recognition (Image), Smart Camera
[Best Practice] K-Means iris clustering experiment on Huawei Cloud ModelArts

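I. Experiment name
K-Means iris clustering experiment

II. Objectives
Knowledge points covered and to be mastered:
1. Understand the basic concepts of clustering algorithms and K-Means;
2. Understand how to run a K-Means clustering experiment with MindSpore.

III. Experiment content
Environment requirements
1. MindSpore 1.3
2. Huawei Cloud ModelArts
3. PC 64bit
What was done
MindSpore is used to run a cluster analysis on the iris dataset. The K-Means algorithm uses simple iteration to group the dataset into k clusters.

IV. Principles
1. The K-Means procedure
(1) Choose the value of K, the number of groups the dataset should be clustered into;
(2) Randomly select K data points from the dataset as the initial centroids (points that are not samples may also be chosen);
(3) For every sample in the set, compute its distance to each of the K centroids (usually Euclidean or cosine distance), find the nearest centroid, and assign the sample to that centroid's cluster;
(4) Once every point has been assigned, the M samples are divided into K clusters. Recompute the centroid of each cluster (the mean of its points) and use it as the new centroid;
(5) If the distance between the new and old centroids is below a set threshold (the recomputed centroids barely move, so the result is stable, or converged), the clustering has reached the expected result and the algorithm stops;
(6) If the new centroids are far from the old ones, repeat steps 3 to 5.
2. Main advantages of K-Means
(1) The principle is simple, it is easy to implement, and it converges quickly.
(2) The clustering quality is good.
(3) The algorithm is fairly interpretable.
(4) The only major parameter to tune is the number of clusters k.
3. Main disadvantages of K-Means
(1) A good value of K is hard to choose;
(2) It converges poorly on non-convex datasets;
(3) If the hidden classes are imbalanced, for example their sizes differ greatly or their variances differ, the clustering quality is poor;
(4) Because it is iterative, the result is only a local optimum;
(5) It is sensitive to noise and outliers.

V. Procedure
A detailed record of the experiment, including the given inputs, code, results or outputs, specific steps, problems encountered, and how they were solved.
1. Data preparation
(1) Download the data from the official Iris dataset site and extract it to get the iris.data file. The iris dataset contains 4 features and 1 class label.
(2) Download main.py. Open the newly created Notebook environment and select MindSpore; click the upload icon and upload the iris.data file saved earlier.
2. Create an OBS bucket
(1) Create the bucket, then configure it
(2) Create a folder, then create an object (upload the actual files)
3. Create a training job
(1) Create a pip-requirements.txt file and add it to the bucket
(2) Configure the job (old console UI / new console UI)
4. Click submit to start training
5. Notebook implementation
(1) Upload main.py and pip-requirements.txt to the Notebook you created
(2) Select Launcher, then select MindSpore to get a new Untitled.ipynb
(3) Run pip install seaborn, specifying the package name and version of the dependency
(4) Restart the kernel and execute run main.py to see the result.

VI. Reflections
The PDF guide was somewhat hard to follow. At first its logic seemed confusing, and I had not worked out how the bucket and the training job relate to each other. I was also unsure whether the parameters in main.py were file paths or URLs, partly because I am not very familiar with Python. In the end the training job stayed in the queue the whole time, and it only completed after I repeated the same steps under a classmate's account. While waiting on my own account, I completed the experiment in the Notebook development environment, and the printed output showed that it finished successfully. Through this experiment I gained a better understanding of the K-Means algorithm.
(1) Basic principle of K-Means: K-Means is an unsupervised learning algorithm that partitions a dataset into K groups (clusters), with each data point belonging to the cluster whose center is nearest. Its main steps are choosing the initial centroids, assigning each data point to the cluster of its nearest centroid, and updating the cluster centers, repeated until convergence.
(2) Importance of choosing K: choosing the right K (the number of clusters) is critical. Try different values of K experimentally, compare the clustering results, and pick the K that gives high similarity within clusters and low similarity between clusters.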
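The six steps of the K-Means procedure described in the post can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the `main.py` from the experiment and not MindSpore code: it uses Euclidean distance, picks the initial centroids from the samples, and keeps a centroid unchanged if its cluster becomes empty. The function name `kmeans` and the parameters `tol`, `max_iter`, and `seed` are hypothetical.

```python
import numpy as np


def kmeans(points, k, tol=1e-4, max_iter=100, seed=0):
    """Cluster `points` (shape M x D) into k groups; return (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct samples as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)

    for _ in range(max_iter):
        # Step 3: Euclidean distance from every sample to every centroid,
        # then assign each sample to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Step 4: the new centroid is the mean of the samples in each cluster.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Steps 5 and 6: stop when no centroid moved more than tol, else repeat.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift < tol:
            break

    # Final assignment against the converged centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


# Two small, well-separated groups of 2-D points as a toy input.
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers, assignment = kmeans(data, k=2)
print(centers)
print(assignment)
```

For the iris data, `points` would be the four feature columns of iris.data and `k` would typically be 3. Because the result depends on the random initialization, running with several seeds and comparing results is a common way to reduce the local-optimum issue noted in the list of disadvantages.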

yd_272343087 posted 2023-12-12 14:41:36, last reply by 未来日记 2023-12-19 11:09:11
183 3

AI Development Platform ModelArts, Image Recognition (Image)
