• [Tuning Experience] 【MindSpore易点通】 A Summary of Common Performance Tuning Methods
    This post introduces several commonly used performance tuning methods.

1. Adjust the number of data-processing workers

Some operators in MindSpore Data accept a num_parallel_workers parameter, which sets the number of worker threads the operator uses. Increasing num_parallel_workers in the pipeline, according to the number of physical cores and the load on the target system, can improve performance. This is especially true for any map operator that contains compute-intensive tensor operations (such as Decode).

With the default number of workers (1):

```python
if do_train:
    cifar_ds = ds.Cifar10Dataset(dataset_dir=data_home,
                                 shuffle=True, usage='train')
else:
    cifar_ds = ds.Cifar10Dataset(dataset_dir=data_home,
                                 shuffle=False, usage='test')
cifar_ds = cifar_ds.map(operations=transform_label, input_columns="label")
cifar_ds = cifar_ds.map(operations=transform_data, input_columns="image")
cifar_ds = cifar_ds.batch(batch_size, drop_remainder=True)
```

With the number of workers set to 2:

```python
if do_train:
    cifar_ds = ds.Cifar10Dataset(dataset_dir=data_home, num_parallel_workers=2,
                                 shuffle=True, usage='train')
else:
    cifar_ds = ds.Cifar10Dataset(dataset_dir=data_home, num_parallel_workers=2,
                                 shuffle=False, usage='test')
cifar_ds = cifar_ds.map(operations=transform_label, num_parallel_workers=2,
                        python_multiprocessing=True, input_columns="label")
cifar_ds = cifar_ds.map(operations=transform_data, num_parallel_workers=2,
                        python_multiprocessing=True, input_columns="image")
cifar_ds = cifar_ds.batch(batch_size, num_parallel_workers=2,
                          drop_remainder=True)
```

Measured comparison: default number of workers: 2200 imgs/sec; 2 workers: 2300 imgs/sec.

2. Load data in MindRecord format

The specific-format dataset currently supported by MindSpore Data is MindRecord, and loading data in MindRecord format performs better. MindRecord loading flow:

```python
import mindspore.dataset as ds

CV_FILE_NAME = "./cifar10.mindrecord"
cifar_ds = ds.MindDataset(dataset_file=CV_FILE_NAME,
                          columns_list=["data", "label"], shuffle=True)
```

Measured comparison: custom-dataset loading of cifar10: 850 imgs/sec; MindRecord loading of cifar10: 2200 imgs/sec.

3. Shuffle while loading the dataset

When shuffling at dataset-load time, the loading operator can shuffle the data internally before producing its output tensors. This internal shuffle usually performs better than an explicit shuffle operator in the pipeline.

Explicit shuffle in the pipeline:

```python
import mindspore.dataset as ds

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, usage='train')
cifar_ds = cifar_ds.shuffle(buffer_size=10000)
```

Shuffle at load time:

```python
import mindspore.dataset as ds

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, shuffle=True, usage='train')
```

Measured comparison: no significant gain was observed in this sample.

4. Mixed precision

Mixed precision training accelerates deep neural network training by mixing single-precision and half-precision data formats, while preserving the accuracy achievable with single-precision training. It speeds up computation, reduces memory usage and traffic, and makes it possible to train larger models or batch sizes on specific hardware.

Measured comparison: high-level API: 2200 imgs/sec; high-level API with mixed precision: 3300 imgs/sec; low-level API: 2000 imgs/sec; low-level API with mixed precision: 3200 imgs/sec.

5. Data sinking

In dataset sink mode, the training computation is sunk into the hardware platform for execution, which can improve training performance. Set dataset_sink_mode to True in the Model.train interface to enable it.

Without data sinking:

```python
model.train(..., dataset_sink_mode=False, ...)
```

With data sinking:

```python
model.train(..., dataset_sink_mode=True, sink_size=steps_per_epoch_train)
```

Note: the low-level API does not support data sinking; to use it, train through the Model.train interface.

Measured comparison: without data sinking: 2000 imgs/sec; with data sinking: 2200 imgs/sec.

6. Avoid mixing c_transform and py_transform

c_transform manages its buffers in C++, while py_transform manages them in Python. Because switching between Python and C++ has a performance cost, mixing operators from the two modules is discouraged; it degrades training performance.

c_transform and py_transform mixed:

```python
if do_train:
    # Transformation on train data
    transform_data = py_trans.Compose([
        CV.RandomCrop((32, 32), (4, 4, 4, 4)),
        py_vision.ToPIL(),
        py_vision.RandomHorizontalFlip(),
        CV.Rescale(rescale, shift),
        CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        CV.HWC2CHW()
    ])
else:
    # Transformation on validation data
    transform_data = py_trans.Compose([
        CV.Rescale(rescale, shift),
        CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        CV.HWC2CHW()
    ])
cifar_ds = cifar_ds.map(operations=transform_data, input_columns="image")
```

c_transform and py_transform not mixed:

```python
if do_train:
    # Transformation on train data
    transform_data = C.Compose([
        CV.RandomCrop((32, 32), (4, 4, 4, 4)),
        CV.RandomHorizontalFlip(),
        CV.Rescale(rescale, shift),
        CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        CV.HWC2CHW()
    ])
else:
    # Transformation on validation data
    transform_data = C.Compose([
        CV.Rescale(rescale, shift),
        CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        CV.HWC2CHW()
    ])
cifar_ds = cifar_ds.map(operations=transform_data, input_columns="image")
```

Measured comparison: mixed: 1100 imgs/sec; not mixed: 2200 imgs/sec.

7. Merge multiple maps into a single map

During data preprocessing, a map operator can accept a list of tensor operators and apply them all in order. Compared with using a separate map operator for each tensor operator, this gives better performance.

Multiple maps:

```python
randomcrop_op = CV.RandomCrop((32, 32), (4, 4, 4, 4))
randomhorizontalflip_op = CV.RandomHorizontalFlip()
rescale_op = CV.Rescale(rescale, shift)
normalize_op = CV.Normalize((0.4914, 0.4822, 0.4465),
                            (0.2023, 0.1994, 0.2010))
hwc2chw_op = CV.HWC2CHW()
if do_train:
    cifar_ds = cifar_ds.map(operations=[randomcrop_op], input_columns="image")
    cifar_ds = cifar_ds.map(operations=[randomhorizontalflip_op],
                            input_columns="image")
    cifar_ds = cifar_ds.map(operations=[rescale_op], input_columns="image")
    cifar_ds = cifar_ds.map(operations=[normalize_op], input_columns="image")
    cifar_ds = cifar_ds.map(operations=[hwc2chw_op], input_columns="image")
else:
    cifar_ds = cifar_ds.map(operations=[rescale_op], input_columns="image")
    cifar_ds = cifar_ds.map(operations=[normalize_op], input_columns="image")
    cifar_ds = cifar_ds.map(operations=[hwc2chw_op], input_columns="image")
```

Merged into a single map:

```python
if do_train:
    # Transformation on train data
    transform_data = C.Compose([
        CV.RandomCrop((32, 32), (4, 4, 4, 4)),
        CV.RandomHorizontalFlip(),
        CV.Rescale(rescale, shift),
        CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        CV.HWC2CHW()
    ])
else:
    # Transformation on validation data
    transform_data = C.Compose([
        CV.Rescale(rescale, shift),
        CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        CV.HWC2CHW()
    ])
cifar_ds = cifar_ds.map(operations=transform_data, input_columns="image")
```

Measured comparison: no significant gain was observed in this sample.

8. Fused operators

MindSpore Data provides some fused operators, which aggregate the functionality of two or more operators into one. Compared with a pipeline of their individual components, a fused operator gives better performance. A good example is RandomCropDecodeResize, which decodes and then randomly crops and resizes any given tensor. Check the operator API documentation for fused operators that can replace your existing ones to gain performance.

9. Automatic data acceleration

MindSpore provides an automatic data tuning tool, Dataset AutoTune, which adjusts the parallelism of the data-processing pipeline during training according to the available system resources, maximizing resource utilization to speed up the pipeline. Throughout training, the Dataset AutoTune module keeps checking whether the current bottleneck is on the data side or the network side. If it detects a data-side bottleneck, it further tunes the parameters of the operators in the pipeline (data operators such as GeneratorDataset, map, and batch).

For the full code, please download the attachment.
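The "merge multiple maps into one" advice can be illustrated outside MindSpore: a map that receives a list of operations applies them in order during a single pass over the data, instead of one pass per operator. Below is a minimal pure-Python sketch; the rescale/shift/double functions are hypothetical stand-ins for the CV operators, not MindSpore APIs.

```python
def compose(transforms):
    """Apply each transform in order, mimicking a single map
    that receives a list of operations."""
    def fused(sample):
        for t in transforms:
            sample = t(sample)
        return sample
    return fused

# Hypothetical stand-in transforms (not MindSpore ops).
rescale = lambda x: x / 255.0
shift = lambda x: x - 0.5
double = lambda x: x * 2

data = [0.0, 51.0, 255.0]

# One map per operator: the data is traversed once per transform.
multi_map = data
for op in (rescale, shift, double):
    multi_map = [op(x) for x in multi_map]

# One fused map: a single traversal applies the whole list.
single_map = [compose([rescale, shift, double])(x) for x in data]

assert multi_map == single_map  # same result, fewer passes
```

The results are identical because each element sees the same operators in the same order; the fused version simply avoids materializing an intermediate dataset after every operator.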
• [Tuning Experience] 【MindSpore易点通】 Notes on Using Mixed Precision Training
    1. Overview

Mixed precision training accelerates deep neural network training by mixing single-precision and half-precision data formats, while preserving the accuracy achievable with single-precision training. It speeds up computation, reduces memory usage and traffic, and makes it possible to train larger models or batch sizes on specific hardware. The typical mixed precision computation flow in MindSpore, shown in the figure below, is:

1. Parameters are stored in FP32.
2. During the forward pass, when an FP16 operator is encountered, the operator's inputs and parameters are cast from FP32 to FP16 for computation.
3. The loss layer is set to compute in FP32.
4. During the backward pass, gradients are first multiplied by the loss scale value, to avoid underflow when gradients are very small.
5. FP16 parameters take part in the gradient computation, and the results are cast back to FP32.
6. Gradients are divided by the loss scale value to undo the scaling.
7. The gradients are checked for overflow; if they overflowed, the update is skipped; otherwise the optimizer updates the original FP32 parameters.

2. When to use it

Because mixed precision speeds up computation and reduces memory usage, consider it when: (1) memory is insufficient; (2) training is slow.

3. Prerequisites

This document targets two kinds of users: (1) those about to start a MindSpore training-code migration task who have a basic understanding of MindSpore; (2) those who have finished migrating, i.e. who already have working MindSpore training code.

4. Examples

4.1 Mixed precision with the high-level MindSpore API

MindSpore wraps mixed precision inside the mindspore.Model interface for easy use. The implementation steps are no different from writing ordinary training code; you only need to set the mixed-precision-related parameters on Model, such as amp_level, loss_scale_manager, and keep_batchnorm_fp32. Change the Model call in the high-level API code and set amp_level to "O3", and the network will train in mixed precision:

```python
net = Model(net, loss, opt, metrics=metrics, amp_level="O3")
```

4.2 Mixed precision with the low-level MindSpore API

With the low-level API, you only need to set the network to mixed precision in the model-construction step. The following compares the two ways of constructing the model.

Model construction in the low-level API code:

```python
class BuildTrainNetwork(nn.Cell):
    '''Build train network.'''
    def __init__(self, my_network, my_criterion, train_batch_size, class_num):
        super(BuildTrainNetwork, self).__init__()
        self.network = my_network
        self.criterion = my_criterion
        self.print = P.Print()
        # Initialize self.output
        self.output = mindspore.Parameter(
            Tensor(np.ones((train_batch_size, class_num)), mindspore.float32),
            requires_grad=False)

    def construct(self, input_data, label):
        output = self.network(input_data)
        # Get the network output and assign it to self.output
        self.output = output
        loss0 = self.criterion(output, label)
        return loss0


class TrainOneStepCellV2(TrainOneStepCell):
    def __init__(self, network, optimizer, sens=1.0):
        super(TrainOneStepCellV2, self).__init__(network, optimizer, sens=1.0)

    def construct(self, *inputs):
        weights = self.weights
        loss = self.network(*inputs)
        # Obtain self.network from BuildTrainNetwork
        output = self.network.output
        sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        # Get the gradient of the network parameters
        grads = self.grad(self.network, weights)(*inputs, sens)
        grads = self.grad_reducer(grads)
        # Optimize model parameters
        loss = F.depend(loss, self.optimizer(grads))
        return loss, output


model_constructed = BuildTrainNetwork(net, loss_function,
                                      TRAIN_BATCH_SIZE, CLASS_NUM)
model_constructed = TrainOneStepCellV2(model_constructed, opt)
```

Model construction in the low-level API mixed precision code reuses BuildTrainNetwork and TrainOneStepCellV2 unchanged and adds a helper that builds the mixed precision training cell:

```python
def build_train_network_step2(network, optimizer, loss_fn=None,
                              level='O0', **kwargs):
    """Build the mixed precision training cell automatically."""
    amp.validator.check_value_type('network', network, nn.Cell)
    amp.validator.check_value_type('optimizer', optimizer, nn.Optimizer)
    amp.validator.check('level', level, "", ['O0', 'O2', 'O3', "auto"],
                        amp.Rel.IN)
    if level == "auto":
        device_target = context.get_context('device_target')
        if device_target == "GPU":
            level = "O2"
        elif device_target == "Ascend":
            level = "O3"
        else:
            raise ValueError(
                "Level `auto` only support when `device_target` is GPU or Ascend.")
    amp._check_kwargs(kwargs)
    config = dict(amp._config_level[level], **kwargs)
    config = amp.edict(config)
    if config.cast_model_type == mstype.float16:
        network.to_float(mstype.float16)
        if config.keep_batchnorm_fp32:
            amp._do_keep_batchnorm_fp32(network)
    if loss_fn:
        network = amp._add_loss_network(network, loss_fn,
                                        config.cast_model_type)
    if amp._get_parallel_mode() in (amp.ParallelMode.SEMI_AUTO_PARALLEL,
                                    amp.ParallelMode.AUTO_PARALLEL):
        network = amp._VirtualDatasetCell(network)
    loss_scale = 1.0
    if config.loss_scale_manager is not None:
        loss_scale_manager = config.loss_scale_manager
        loss_scale = loss_scale_manager.get_loss_scale()
        update_cell = loss_scale_manager.get_update_cell()
        if update_cell is not None:
            # Only CPU does not support `TrainOneStepWithLossScaleCell`
            # for control flow.
            if (not context.get_context("enable_ge")
                    and context.get_context("device_target") == "CPU"):
                raise ValueError("Only `loss_scale_manager=None` and "
                                 "`loss_scale_manager=FixedLossScaleManager` "
                                 "are supported in current version. If you use "
                                 "`O2` option, use `loss_scale_manager=None` "
                                 "or `FixedLossScaleManager`")
            network = TrainOneStepCellV2(network, optimizer)
            return network
    network = TrainOneStepCellV2(network, optimizer)
    return network


model_constructed = BuildTrainNetwork(net, loss_function,
                                      TRAIN_BATCH_SIZE, CLASS_NUM)
model_constructed = build_train_network_step2(model_constructed, opt, level="O3")
```

5. Performance comparison

Compared with full-precision training, mixed precision brings a substantial speedup: low-level API: 2000 imgs/sec; low-level API with mixed precision: 3200 imgs/sec; high-level API: 2200 imgs/sec; high-level API with mixed precision: 3300 imgs/sec.

See the attachment for the full code.
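The loss-scaling steps in the overview (multiply by the scale, divide it back out, skip the update on overflow) can be sketched without MindSpore. The following pure-Python illustration of a fixed loss scale uses illustrative names, not MindSpore APIs:

```python
import math

def scaled_step(params, raw_grads, lr, loss_scale):
    """One optimizer step with fixed loss scaling: gradients arrive
    multiplied by `loss_scale`; unscale them, and skip the update
    entirely if any gradient overflowed."""
    grads = [g / loss_scale for g in raw_grads]
    if any(math.isinf(g) or math.isnan(g) for g in grads):
        return params, False  # overflow detected: skip this update
    return [p - lr * g for p, g in zip(params, grads)], True

params = [1.0, 2.0]

# Normal step: scaled gradients are unscaled, then applied.
params, ok = scaled_step(params, [1024.0, 2048.0], lr=0.1, loss_scale=1024.0)
assert ok and params == [0.9, 1.8]

# Overflowed step: the update is skipped and parameters stay unchanged.
params, ok = scaled_step(params, [float("inf"), 0.0], lr=0.1, loss_scale=1024.0)
assert not ok and params == [0.9, 1.8]
```

Skipping rather than clipping on overflow is what keeps a too-large scale harmless: the step is simply wasted, and a dynamic scale manager would lower the scale afterwards.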
• [Data Loading and Processing] 【MindSpore易点通】 MindSpore Data Explained
    1. Introduction

MindSpore Data provides concise and rich functionality for data reading, processing, and augmentation. Reading data follows three main steps (similar to data reading in PyTorch):

- Dataset loading: choose the simplest, most efficient loading method for your data format;
- Data augmentation: expand the dataset with basic image-processing techniques such as geometric transforms, color transforms, rotation, translation, and scaling;
- Data processing: apply repeat, batch, shuffle, map, zip, and similar operations to the dataset.

2. Usage

2.1 Dataset loading

First load the dataset to be used. Depending on its actual format, pick one of three loading methods:

- Common standard datasets: e.g. ImageNet, MNIST, CIFAR-10, VOC;
- Specific-format datasets: data in a specific storage format, e.g. MindRecord;
- Custom datasets: datasets whose organization is user-defined.

2.1.1 Loading common datasets

The common datasets currently supported are MNIST, CIFAR-10, CIFAR-100, VOC, ImageNet, and CelebA. If you use one of these open datasets, or have already arranged your data in one of these standard formats, you can load it directly, taking CIFAR-10 as an example:

```python
import mindspore.dataset as ds

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR)
```

Once the dataset is loaded, call create_dict_iterator() to create an iterator and read the data (the same applies to the other two loading methods):

```python
for data in cifar_ds.create_dict_iterator():
    # In CIFAR-10 dataset, each dictionary of data has keys "image" and "label".
    print(data["image"])
    print(data["label"])
```

2.1.2 Loading specific-format datasets

The specific format currently supported is MindRecord. Reading MindRecord data performs better, so converting your data to MindRecord is recommended. Conversion example:

```python
from mindspore.mindrecord import Cifar10ToMR

cifar10_path = "./cifar-10-batches-py"
mindrecord_path = "./cifar10.mindrecord"
cifar10_transformer = Cifar10ToMR(cifar10_path, mindrecord_path)
cifar10_transformer.transform(["label"])
```

Loading MindRecord data:

```python
import mindspore.dataset as ds

CV_FILE_NAME = "./cifar10.mindrecord"
cifar_ds = ds.MindDataset(dataset_file=CV_FILE_NAME,
                          columns_list=["data", "label"], shuffle=True)
```

2.1.3 Loading custom datasets

Custom datasets are loaded through the GeneratorDataset interface, which requires you to implement a generator that produces the training data and labels; it suits more complex tasks:

```python
from PIL import Image
import mindspore.dataset as ds


class Dataset:
    def __init__(self, image_list, label_list):
        super(Dataset, self).__init__()
        self.imgs = image_list
        self.labels = label_list

    def __getitem__(self, index):
        img = Image.open(self.imgs[index]).convert('RGB')
        return img, self.labels[index]

    def __len__(self):
        return len(self.imgs)


class MySampler():
    def __init__(self, dataset):
        self.__num_data = len(dataset)

    def __iter__(self):
        indices = list(range(self.__num_data))
        return iter(indices)


dataset = Dataset(save_image_list, save_label_list)
sampler = MySampler(dataset)
cifar_ds = ds.GeneratorDataset(dataset, column_names=["image", "label"],
                               sampler=sampler)
```

In the example above, dataset is a generator that produces image and label.

2.2 Data augmentation

Two modules, c_transforms and py_transforms, are provided for data augmentation. They compare as follows:

| Module | Implementation | Pros and cons |
| c_transforms | C++, based on OpenCV | Higher performance |
| py_transforms | Python, based on PIL | Lower performance, but supports custom augmentation functions |

Recommendation: if you do not need custom augmentation functions and c_transforms has the corresponding implementation, use the c_transforms module.

2.2.1 The c_transforms module

The c_transforms interface currently has two parts: mindspore.dataset.transforms.c_transforms and mindspore.dataset.vision.c_transforms. Usage:

1. Define the augmentation functions: put several of them into a list and wrap it with Compose;
2. Call dataset.map() to apply the defined functions or operators to the specified data column.

Example:

```python
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV_transforms
import mindspore.dataset.transforms.c_transforms as C_transforms

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, shuffle=True, usage='train')
# Define the list of augmentation operators
transforms_list = C_transforms.Compose([
    CV_transforms.RandomCrop((32, 32), (4, 4, 4, 4)),
    CV_transforms.RandomHorizontalFlip(),
    CV_transforms.Rescale(rescale, shift),
    CV_transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    CV_transforms.HWC2CHW()
])
# Call map()
cifar_ds = cifar_ds.map(operations=transforms_list, input_columns="image")
```

Here input_columns specifies the data column to augment, and operations is the defined augmentation function.

2.2.2 The py_transforms module

The py_transforms interface also has two parts: mindspore.dataset.transforms.py_transforms and mindspore.dataset.vision.py_transforms. Usage is similar to the c_transforms module:

```python
import mindspore.dataset as ds
import mindspore.dataset.vision.py_transforms as py_vision
import mindspore.dataset.transforms.py_transforms as py_transforms

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, shuffle=True, usage='train')
transforms_list = py_transforms.Compose([
    py_vision.ToPIL(),
    py_vision.RandomCrop((32, 32), (4, 4, 4, 4)),
    py_vision.RandomHorizontalFlip(),
    py_vision.ToTensor(),
    py_vision.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
cifar_ds = cifar_ds.map(operations=transforms_list, input_columns="image")
```

Custom augmentation functions with py_transforms: refer to the py_transforms_util.py script in the MindSpore source. Taking RandomBrightness as an example, a custom augmentation operator is defined as follows:

```python
import random


class RandomBrightness(object):
    """
    Randomly adjust the brightness of the input image.

    Args:
        brightness (float): Brightness adjustment factor (default=0.0).

    Returns:
        numpy.ndarray, image.
    """
    def __init__(self, brightness=0.0):
        self.brightness = brightness

    def __call__(self, img):
        alpha = random.uniform(-self.brightness, self.brightness)
        return (1 - alpha) * img
```

Custom operators are invoked no differently from the operators in py_transforms_util.py.

2.3 Data processing

The available processing operations are zip, shuffle, map, batch, and repeat:

| Operation | Description |
| zip | Merge multiple datasets |
| shuffle | Shuffle the data |
| map | Apply functions or operators to the specified columns |
| batch | Batch the data; each iteration returns one batch |
| repeat | Replicate the dataset |

A typical training pipeline uses shuffle, map, batch, and repeat, for example:

```python
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV_transforms
import mindspore.dataset.transforms.c_transforms as C_transforms

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, shuffle=True, usage='train')
transforms_list = C_transforms.Compose([
    CV_transforms.RandomCrop((32, 32), (4, 4, 4, 4)),
    CV_transforms.RandomHorizontalFlip(),
    CV_transforms.Rescale(rescale, shift),
    CV_transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    CV_transforms.HWC2CHW()
])
# map()
cifar_ds = cifar_ds.map(operations=transforms_list, input_columns="image")
# batch()
cifar_ds = cifar_ds.batch(batch_size, drop_remainder=True)
# repeat()
cifar_ds = cifar_ds.repeat(repeat_num)
```

When combining these operations, the recommended order for best performance is: load the dataset with shuffle -> map -> batch -> repeat. Brief usage notes for each operation follow.

2.3.1 Dataset loading with shuffle

Option 1: shuffle while loading:

```python
import mindspore.dataset as ds

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, shuffle=True, usage='train')
```

Option 2: shuffle after loading:

```python
import mindspore.dataset as ds

DATA_DIR = "./cifar-10-batches-bin/"
cifar_ds = ds.Cifar10Dataset(DATA_DIR, usage='train')
cifar_ds = cifar_ds.shuffle(buffer_size=10000)
```

Parameter note: the larger buffer_size is, the more thorough the shuffling, and the higher the time cost.

2.3.2 map

```python
func = lambda x: x * 2
cifar_ds = cifar_ds.map(input_columns="data", operations=func)
```

Parameter notes: input_columns is the column the function operates on; operations is the function applied to the data.

2.3.3 batch

```python
cifar_ds = cifar_ds.batch(batch_size=32, drop_remainder=True,
                          num_parallel_workers=4)
```

Parameter notes: drop_remainder discards the last incomplete batch; num_parallel_workers is the number of threads used to read the data.

2.3.4 repeat

```python
cifar_ds = cifar_ds.repeat(count=2)
```

Parameter note: count is the number of dataset copies.

2.3.5 zip

```python
import mindspore.dataset as ds

DATA_DIR_1 = "custom_dataset_dir_1/"
DATA_DIR_2 = "custom_dataset_dir_2/"
imagefolder_dataset_1 = ds.ImageFolderDatasetV2(DATA_DIR_1)
imagefolder_dataset_2 = ds.ImageFolderDatasetV2(DATA_DIR_2)
imagefolder_dataset = ds.zip((imagefolder_dataset_1, imagefolder_dataset_2))
```

See the attachment for the full code.
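The custom-dataset pattern above (a random-access Dataset plus a sampler that yields the visit order) can be seen in isolation by replacing file loading with an in-memory list. This pure-Python sketch mirrors the Dataset/MySampler interplay; names and data are illustrative, and no MindSpore interfaces are involved:

```python
import random

class Dataset:
    """Random-access source: __getitem__ returns (image, label)."""
    def __init__(self, image_list, label_list):
        self.imgs = image_list
        self.labels = label_list

    def __getitem__(self, index):
        return self.imgs[index], self.labels[index]

    def __len__(self):
        return len(self.imgs)

class MySampler:
    """Yields the indices in which the dataset will be visited."""
    def __init__(self, dataset, shuffle=False, seed=0):
        self.num_data = len(dataset)
        self.shuffle = shuffle
        self.seed = seed

    def __iter__(self):
        indices = list(range(self.num_data))
        if self.shuffle:
            random.Random(self.seed).shuffle(indices)
        return iter(indices)

dataset = Dataset(["img0", "img1", "img2"], [0, 1, 2])
# Sequential order: the sampler drives which samples are fetched.
rows = [dataset[i] for i in MySampler(dataset)]
assert rows == [("img0", 0), ("img1", 1), ("img2", 2)]
# Shuffled order: same rows, different visit order.
shuffled = [dataset[i] for i in MySampler(dataset, shuffle=True, seed=42)]
assert sorted(shuffled) == sorted(rows)
```

Keeping shuffling inside the sampler is why a dataset and its visit order stay decoupled: the Dataset only answers index lookups.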
• [Model Training] 【MindSpore易点通】 Converting PyTorch Source Code to MindSpore Low-Level API and Training on a Single Ascend Device
    1. Overview

This post describes how to convert PyTorch source code into MindSpore low-level API code and run single-machine, single-device training on an Ascend chip. The figure below shows the differences between the training flows of the MindSpore high-level API, the MindSpore low-level API, and PyTorch. As with the high-level API, low-level API training also requires configuring the run context, reading and preprocessing data, defining the network, and defining the loss function and optimizer. These steps are the same as for the high-level API.

2. Constructing the model (low-level API)

To construct the model, first wrap the network prototype together with the loss function, then wrap that combined model with the optimizer, finally producing a network that can be trained. Because accuracy on the training set must be computed during training and validation, the return value needs to include the network's output:

```python
import mindspore
from mindspore import Model
import mindspore.nn as nn
from mindspore.ops import functional as F
from mindspore.ops import operations as P


class BuildTrainNetwork(nn.Cell):
    '''Build train network.'''
    def __init__(self, my_network, my_criterion, train_batch_size, class_num):
        super(BuildTrainNetwork, self).__init__()
        self.network = my_network
        self.criterion = my_criterion
        self.print = P.Print()
        # Initialize self.output
        self.output = mindspore.Parameter(
            Tensor(np.ones((train_batch_size, class_num)), mindspore.float32),
            requires_grad=False)

    def construct(self, input_data, label):
        output = self.network(input_data)
        # Get the network output and assign it to self.output
        self.output = output
        loss0 = self.criterion(output, label)
        return loss0


class TrainOneStepCellV2(TrainOneStepCell):
    '''Build train network.'''
    def __init__(self, network, optimizer, sens=1.0):
        super(TrainOneStepCellV2, self).__init__(network, optimizer, sens=1.0)

    def construct(self, *inputs):
        weights = self.weights
        loss = self.network(*inputs)
        # Obtain self.network from BuildTrainNetwork
        output = self.network.output
        sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        # Get the gradient of the network parameters
        grads = self.grad(self.network, weights)(*inputs, sens)
        grads = self.grad_reducer(grads)
        # Optimize model parameters
        loss = F.depend(loss, self.optimizer(grads))
        return loss, output


# Construct model
model_constructed = BuildTrainNetwork(net, loss_function, TRAIN_BATCH_SIZE,
                                      CLASS_NUM)
model_constructed = TrainOneStepCellV2(model_constructed, opt)
```

3. Training and validation (low-level API)

Similar to PyTorch, the low-level API trains and validates the network as follows:

```python
class CorrectLabelNum(nn.Cell):
    def __init__(self):
        super(CorrectLabelNum, self).__init__()
        self.print = P.Print()
        self.argmax = mindspore.ops.Argmax(axis=1)
        self.sum = mindspore.ops.ReduceSum()

    def construct(self, output, target):
        output = self.argmax(output)
        correct = self.sum((output == target).astype(mindspore.dtype.float32))
        return correct


def train_net(model, network, criterion, epoch_max, train_path, val_path,
              train_batch_size, val_batch_size, repeat_size):
    """Define the training method."""
    # Create dataset
    ds_train, steps_per_epoch_train = create_dataset(
        train_path, do_train=True, batch_size=train_batch_size,
        repeat_num=repeat_size)
    ds_val, steps_per_epoch_val = create_dataset(
        val_path, do_train=False, batch_size=val_batch_size,
        repeat_num=repeat_size)
    # CheckPoint CallBack definition
    config_ck = CheckpointConfig(save_checkpoint_steps=steps_per_epoch_train,
                                 keep_checkpoint_max=epoch_max)
    ckpoint_cb = ModelCheckpoint(prefix="train_resnet_cifar10",
                                 directory="./", config=config_ck)
    # Create dict to save internal callback object's parameters
    cb_params = _InternalCallbackParam()
    cb_params.train_network = model
    cb_params.epoch_num = epoch_max
    cb_params.batch_num = steps_per_epoch_train
    cb_params.cur_epoch_num = 0
    cb_params.cur_step_num = 0
    run_context = RunContext(cb_params)
    ckpoint_cb.begin(run_context)
    print("============== Starting Training ==============")
    correct_num = CorrectLabelNum()
    correct_num.set_train(False)

    for epoch in range(epoch_max):
        print(" Epoch:", epoch + 1, "/", epoch_max)
        train_loss = 0
        train_correct = 0
        train_total = 0
        for _, (data, gt_classes) in enumerate(ds_train):
            model.set_train()
            loss, output = model(data, gt_classes)
            train_loss += loss
            correct = correct_num(output, gt_classes)
            correct = correct.asnumpy()
            train_correct += correct.sum()
            # Update current step number
            cb_params.cur_step_num += 1
            # Check whether to save checkpoint or not
            ckpoint_cb.step_end(run_context)
        cb_params.cur_epoch_num += 1
        my_train_loss = train_loss / steps_per_epoch_train
        my_train_accuracy = 100 * train_correct / (train_batch_size *
                                                   steps_per_epoch_train)
        print('Train Loss:', my_train_loss)
        print('Train Accuracy:', my_train_accuracy, '%')

        print('evaluating {}/{} ...'.format(epoch + 1, epoch_max))
        val_loss = 0
        val_correct = 0
        for _, (data, gt_classes) in enumerate(ds_val):
            network.set_train(False)
            output = network(data)
            loss = criterion(output, gt_classes)
            val_loss += loss
            correct = correct_num(output, gt_classes)
            correct = correct.asnumpy()
            val_correct += correct.sum()
        my_val_loss = val_loss / steps_per_epoch_val
        my_val_accuracy = 100 * val_correct / (val_batch_size *
                                               steps_per_epoch_val)
        print('Validation Loss:', my_val_loss)
        print('Validation Accuracy:', my_val_accuracy, '%')
    print("--------- trains out ---------")
```

4. Running the script

Launch command:

```shell
python MindSpore_1P_low_API.py --data_path=xxx --epoch_num=xxx
```

Run the script in the development environment's terminal to see the network's output. Note: because the high-level API trains in dataset sink mode and the low-level API does not support data sinking, the high-level API trains faster than the low-level API. Performance comparison: low-level API: 2000 imgs/sec; high-level API: 2200 imgs/sec.

See the attachment for the full code.
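The per-epoch accounting in train_net (accumulate the loss and the count of correct predictions over all batches, then average using the step count and batch size) can be checked with plain Python on synthetic logits. Everything below is an illustrative stand-in, not MindSpore code:

```python
def argmax(row):
    # Index of the largest logit, like Argmax(axis=1) applied per sample.
    return max(range(len(row)), key=row.__getitem__)

def epoch_stats(batches, batch_size):
    """Mirror train_net's bookkeeping: sum loss and correct predictions
    over all batches, then average at epoch end."""
    total_loss, total_correct, steps = 0.0, 0, 0
    for logits, labels, loss in batches:
        total_loss += loss
        total_correct += sum(argmax(o) == y for o, y in zip(logits, labels))
        steps += 1
    mean_loss = total_loss / steps
    accuracy = 100 * total_correct / (batch_size * steps)
    return mean_loss, accuracy

# Two synthetic batches of (logits, labels, batch loss).
batches = [
    ([[0.1, 0.9], [0.8, 0.2]], [1, 0], 0.4),  # both predictions correct
    ([[0.3, 0.7], [0.6, 0.4]], [0, 0], 0.6),  # one correct, one wrong
]
mean_loss, acc = epoch_stats(batches, batch_size=2)
assert abs(mean_loss - 0.5) < 1e-9
assert acc == 75.0
```

Dividing by batch_size * steps rather than the raw sample count is why the real loop pairs drop_remainder=True with this formula: every step is guaranteed a full batch.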
• [Model Training] 【MindSpore易点通】 Model Testing and Validation
    1. Model testing

After training finishes, the model needs to be tested on the test set. Depending on how the model is evaluated, there are two cases.

1.1 The evaluation metric is already implemented in MindSpore

MindSpore provides a number of metrics: Accuracy, Precision, Recall, F1, TopKCategoricalAccuracy, Top1CategoricalAccuracy, Top5CategoricalAccuracy, MSE, MAE, and Loss. To use a built-in metric during testing, define a dict containing the metrics to use and pass it when defining the Model; a later call to model.eval() returns a dict whose contents are the metric names and results:

```python
def test_net(network, model, test_data_path, test_batch):
    """Define the evaluation method."""
    print("============== Start Testing ==============")
    # Load the saved model for evaluation
    param_dict = load_checkpoint("./train_resnet_cifar10-1_390.ckpt")
    # Load parameter to the network
    load_param_into_net(network, param_dict)
    # Load testing dataset
    ds_test = create_dataset(test_data_path, do_train=False,
                             batch_size=test_batch)
    acc = model.eval(ds_test, dataset_sink_mode=False)
    print("============== test result:{} ==============".format(acc))


if __name__ == "__main__":
    ...
    net = resnet()
    loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True,
                                                 reduction='mean')
    opt = nn.SGD(net.trainable_params(), LR_ORI, MOMENTUM_ORI, WEIGHT_DECAY)
    metrics = {
        'accuracy': nn.Accuracy(),
        'loss': nn.Loss()
    }
    model = Model(net, loss, opt, metrics=metrics)
    test_net(net, model, TEST_PATH, TEST_BATCH_SIZE)
```

1.2 The evaluation metric is not implemented in MindSpore

If the built-in evaluation functions do not meet your needs, you can define your own metric (see accuracy.py for reference) by inheriting from the Metric base class and overriding the three methods clear, update, and eval. After obtaining the network output via model.predict(), compute the result according to your custom criterion. The following implements a custom metric that computes test-set accuracy:

```python
class AccuracyV2(EvaluationBase):
    def __init__(self, eval_type='classification'):
        super(AccuracyV2, self).__init__(eval_type)
        self.clear()

    def clear(self):
        """Clears the internal evaluation result."""
        self._correct_num = 0
        self._total_num = 0

    def update(self, output_y, label_input):
        y_pred = self._convert_data(output_y)
        y = self._convert_data(label_input)
        indices = y_pred.argmax(axis=1)
        results = (np.equal(indices, y) * 1).reshape(-1)
        self._correct_num += results.sum()
        self._total_num += label_input.shape[0]

    def eval(self):
        if self._total_num == 0:
            raise RuntimeError('Accuracy can not be calculated')
        return self._correct_num / self._total_num


def test_net(network, model, test_data_path, test_batch):
    """Define the evaluation method."""
    print("============== Start Testing ==============")
    # Load the saved model for evaluation
    param_dict = load_checkpoint("./train_resnet_cifar10-1_390.ckpt")
    # Load parameter to the network
    load_param_into_net(network, param_dict)
    # Load testing dataset
    ds_test = create_dataset(test_data_path, do_train=False,
                             batch_size=test_batch)
    metric = AccuracyV2()
    metric.clear()
    for data, label in ds_test.create_tuple_iterator():
        output = model.predict(data)
        metric.update(output, label)
    results = metric.eval()
    print("============== New Metric:{} ==============".format(results))


if __name__ == "__main__":
    ...
    net = resnet()
    loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True,
                                                 reduction='mean')
    opt = nn.SGD(net.trainable_params(), LR_ORI, MOMENTUM_ORI, WEIGHT_DECAY)
    model_constructed = Model(net, loss, opt)
    test_net(net, model_constructed, TEST_PATH, TEST_BATCH_SIZE)
```

2. Validation during training

To test the model on the validation set while training, MindSpore currently offers two approaches.

2.1 Alternate calls to model.train() and model.eval():

```python
def train_and_val(model, dataset_train, dataset_val, steps_per_train,
                  epoch_max, evaluation_interval):
    config_ck = CheckpointConfig(save_checkpoint_steps=steps_per_train,
                                 keep_checkpoint_max=epoch_max)
    ckpoint_cb = ModelCheckpoint(prefix="train_resnet_cifar10",
                                 directory="./", config=config_ck)
    model.train(evaluation_interval, dataset_train,
                callbacks=[ckpoint_cb, LossMonitor()], dataset_sink_mode=True)
    acc = model.eval(dataset_val, dataset_sink_mode=False)
    print("============== Evaluation:{} ==============".format(acc))


if __name__ == "__main__":
    ...
    ds_train, steps_per_epoch_train = create_dataset(
        TRAIN_PATH, do_train=True, batch_size=TRAIN_BATCH_SIZE, repeat_num=1)
    ds_val, steps_per_epoch_val = create_dataset(
        VAL_PATH, do_train=False, batch_size=VAL_BATCH_SIZE, repeat_num=1)
    net = resnet()
    loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True,
                                                 reduction='mean')
    opt = nn.SGD(net.trainable_params(), LR_ORI, MOMENTUM_ORI, WEIGHT_DECAY)
    metrics = {
        'accuracy': nn.Accuracy(),
        'loss': nn.Loss()
    }
    net = Model(net, loss, opt, metrics=metrics)
    for i in range(int(EPOCH_MAX / EVAL_INTERVAL)):
        train_and_val(net, ds_train, ds_val, steps_per_epoch_train,
                      EPOCH_MAX, EVAL_INTERVAL)
```

2.2 Call model.train() and pass a custom EvalCallBack instance in callbacks, so training and validation happen together:

```python
class EvalCallBack(Callback):
    def __init__(self, model, eval_dataset, eval_epoch, result_evaluation):
        self.model = model
        self.eval_dataset = eval_dataset
        self.eval_epoch = eval_epoch
        self.result_evaluation = result_evaluation

    def epoch_end(self, run_context):
        cb_param = run_context.original_args()
        cur_epoch = cb_param.cur_epoch_num
        if cur_epoch % self.eval_epoch == 0:
            acc = self.model.eval(self.eval_dataset, dataset_sink_mode=False)
            self.result_evaluation["epoch"].append(cur_epoch)
            self.result_evaluation["acc"].append(acc["accuracy"])
            self.result_evaluation["loss"].append(acc["loss"])
            print(acc)


if __name__ == "__main__":
    ...
    ds_train, steps_per_epoch_train = create_dataset(
        TRAIN_PATH, do_train=True, batch_size=TRAIN_BATCH_SIZE,
        repeat_num=REPEAT_SIZE)
    ds_val, steps_per_epoch_val = create_dataset(
        VAL_PATH, do_train=False, batch_size=VAL_BATCH_SIZE,
        repeat_num=REPEAT_SIZE)
    net = resnet()
    loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True,
                                                 reduction='mean')
    opt = nn.SGD(net.trainable_params(), LR_ORI, MOMENTUM_ORI, WEIGHT_DECAY)
    metrics = {
        'accuracy': nn.Accuracy(),
        'loss': nn.Loss()
    }
    net = Model(net, loss, opt, metrics=metrics)
    result_eval = {"epoch": [], "acc": [], "loss": []}
    eval_cb = EvalCallBack(net, ds_val, EVAL_PER_EPOCH, result_eval)
    net.train(EPOCH_MAX, ds_train,
              callbacks=[ckpoint_cb, LossMonitor(), eval_cb],
              dataset_sink_mode=True, sink_size=steps_per_epoch_train)
```

3. Using the sample code

The sample code for this post is a classification network that trains ResNet50 on CIFAR-10. It reads the binary version of the CIFAR-10 dataset through the datasets.Cifar10Dataset interface, so download "CIFAR-10 binary version (suitable for C programs)" and configure the data path in the code. Launch command:

```shell
python xxx.py --data_path=xxx --epoch_num=xxx
```

Run the script to see the network's output. See the attachment for the full code.
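The clear/update/eval contract that a custom metric must satisfy can be exercised without the framework. This sketch keeps the same accuracy logic as the example above but drops the EvaluationBase base class and uses plain lists in place of tensors (an illustrative stand-in, not MindSpore code):

```python
class AccuracyV2:
    """Custom metric following the clear/update/eval contract."""
    def __init__(self):
        self.clear()

    def clear(self):
        # Reset internal state before a new evaluation pass.
        self._correct_num = 0
        self._total_num = 0

    def update(self, output_y, label_input):
        # output_y: batch of logits; label_input: batch of int labels.
        for logits, label in zip(output_y, label_input):
            pred = max(range(len(logits)), key=logits.__getitem__)
            self._correct_num += int(pred == label)
        self._total_num += len(label_input)

    def eval(self):
        if self._total_num == 0:
            raise RuntimeError("Accuracy can not be calculated")
        return self._correct_num / self._total_num

metric = AccuracyV2()
metric.clear()
metric.update([[0.9, 0.1], [0.2, 0.8]], [0, 1])  # both correct
metric.update([[0.7, 0.3], [0.4, 0.6]], [1, 1])  # one correct
assert metric.eval() == 0.75
```

Because update only accumulates counters, the metric works batch by batch over any iterator, which is exactly how the test_net loop above feeds it.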
  • [Basics] [MindSpore Tips] Deep Learning Series: More Optimization Algorithms
    In earlier installments we covered mini-batch gradient descent and exponentially weighted averages. Still not satisfied? Don't worry, this time we bring several algorithms at once. Let's get going!

    Gradient descent with momentum. Momentum almost always runs faster than standard gradient descent. The basic idea is to compute an exponentially weighted average of the gradients and use that average to update the weights. When optimizing the cost function shown in the figure above (the red dot marks the minimum), gradient descent starting from the blue point, whether batch or mini-batch, needs many small steps and wastes a lot of time; a larger learning rate (purple arrows) may overshoot the valley, so a small learning rate has to be chosen to keep the oscillations in check. In other words, we want slow learning in the vertical direction but fast learning in the horizontal direction, moving quickly from left to right toward the red dot. With momentum, on iteration t we compute the gradients dW and db on the current mini-batch and then update:

        v_dW = beta * v_dW + (1 - beta) * dW
        v_db = beta * v_db + (1 - beta) * db
        W = W - alpha * v_dW
        b = b - alpha * v_db

    The vertical oscillations average out to roughly zero while the horizontal components reinforce each other, which damps the descent. Setting the two hyperparameters takes some care: the learning rate alpha, and beta, which controls the exponentially weighted average and is commonly set to 0.9. v_dW starts at zero with the same dimensions as W, and v_db starts as a zero vector with the same dimensions as b. That is all there is to momentum; it usually speeds up a learning algorithm.

    RMSprop. RMSprop (root mean square prop) can also accelerate gradient descent. Say the vertical axis represents parameter b and the horizontal axis parameter W (really W1, W2, and so on; we simplify to W). RMSprop damps learning in the b direction without slowing the horizontal direction. On iteration t it computes dW and db on the mini-batch and keeps an exponentially weighted average of the squared gradients:

        S_dW = beta2 * S_dW + (1 - beta2) * dW^2
        S_db = beta2 * S_db + (1 - beta2) * db^2

    then updates

        W = W - alpha * dW / sqrt(S_dW)
        b = b - alpha * db / sqrt(S_db)

    Since we want fast horizontal learning and damped vertical oscillation, S_dW should be relatively small and S_db relatively large. The update path then looks like the green line in the figure, and a larger learning rate can be used; keep in mind that in practice dW is a high-dimensional parameter vector. The squaring followed by the square root is where the "root mean square" name comes from; a tiny epsilon (about 10^-8) is usually added to the denominator so the algorithm never divides by zero.

    Adam. The Adam optimization algorithm neatly combines the two:
    1. Initialize v_dW = 0, S_dW = 0, v_db = 0, S_db = 0.
    2. On iteration t, compute dW and db on the mini-batch.
    3. Momentum averages: v_dW = beta1 * v_dW + (1 - beta1) * dW, and likewise for v_db.
    4. RMSprop averages: S_dW = beta2 * S_dW + (1 - beta2) * dW^2, and likewise for S_db.
    5. Bias correction: v_dW_hat = v_dW / (1 - beta1^t), S_dW_hat = S_dW / (1 - beta2^t), and likewise for b.
    6. Update: W = W - alpha * v_dW_hat / (sqrt(S_dW_hat) + epsilon), b = b - alpha * v_db_hat / (sqrt(S_db_hat) + epsilon).

    As mentioned before, beta1 (for the moving average of dW) is usually 0.9, beta2 is recommended at 0.999, and epsilon at 10^-8; you can then try different learning rates alpha and see which works best. Adam combines momentum and RMSprop, is an extremely common learning algorithm, and has proven effective across a wide range of neural network architectures.

    Learning rate decay. Slowly reducing the learning rate over time is called learning rate decay. With mini-batch gradient descent and mini-batches of, say, 64 or 128 samples, the updates are noisy (blue line), so the algorithm never truly converges and just wanders around the minimum. If the learning rate decays, steps are large early on, when fast learning is fine, and shrink later, so the trajectory (green line) settles into a small region around the minimum instead of bouncing widely around it. One pass through the training data, split into mini-batches, is one epoch: the first pass is epoch 1, the second epoch 2, and so on. A common schedule is

        alpha = alpha_0 / (1 + decay_rate * epoch_num)

    where decay_rate is the decay rate, epoch_num the epoch index, and alpha_0 the initial learning rate. Other formulas also work, for example exponential decay alpha = k^epoch_num * alpha_0 with k < 1 (say 0.95), which drops the learning rate exponentially.

    That wraps up the optimization algorithms; pay special attention to the parameters and the formulas! Next time we cover hyperparameter tuning. See you!
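    The six-step Adam update above can be sketched in a few lines of NumPy. This is a minimal illustration of the formulas, not any particular framework's implementation; the function name adam_step and the toy quadratic objective are my own for the example.

```python
import numpy as np

def adam_step(w, grad, state, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array w (steps 3-6 above)."""
    v, s = state
    v = beta1 * v + (1 - beta1) * grad           # step 3: momentum average
    s = beta2 * s + (1 - beta2) * grad ** 2      # step 4: RMSprop average
    v_hat = v / (1 - beta1 ** t)                 # step 5: bias correction
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)  # step 6: parameter update
    return w, (v, s)

# Toy problem: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
state = (np.zeros_like(w), np.zeros_like(w))     # step 1: v and S start at zero
for t in range(1, 501):                          # t starts at 1 for bias correction
    w, state = adam_step(w, 2 * w, state, t)
print(np.abs(w).max())                           # settles near 0
```

    Note that t must start at 1; otherwise the bias-correction denominators 1 - beta^t are zero on the first step.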
  • [Other] Exploring Adversarial Attacks and Defenses in Vision Transformers Trained with DINO
    This study provides the first analysis of the robustness of self-supervised Vision Transformers trained with DINO against adversarial attacks. First, we evaluate whether features learned through self-supervised learning are more robust to adversarial attacks than features learned through supervised learning. Then, we characterize the properties of attacks performed in the latent space. Finally, we evaluate whether three well-known defense strategies can increase adversarial robustness on downstream tasks when, under limited computational resources, only the classification head is fine-tuned: adversarial training, ensemble adversarial training, and ensembles of specialized networks. https://www.zhuanzhi.ai/paper/1fd3439a6ac6312d164fc46827f9cbb2
  • [Model Training] [MindSpore Tips] Migrating PyTorch Code to Ascend for Single-Machine, Single-Device Training
    This article shows how to migrate PyTorch code to MindSpore and run single-machine, single-device training on an Ascend chip. The PyTorch code used is a ResNet-50 + CIFAR-10 image classification task.
    Sample code: contains both the PyTorch and the MindSpore code
    Dataset: CIFAR-10
    MindSpore API main categories

    1. Training flow comparison
    Because MindSpore's architecture differs from the PyTorch framework, the training flow and code implementation also differ; the figure below shows the differences.

    2. Training code
    2.1 Basic parameter configuration
    This part is essentially the same as in PyTorch:

```python
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='MindSpore CIFAR-10 Example')
    parser.add_argument('--pre_trained', type=str, default=None, help='Pretrained checkpoint path')
    parser.add_argument('--data_path', type=str, default=None, help='data_path')
    parser.add_argument('--epoch_num', type=int, default=200, help='epoch_num')
    parser.add_argument('--checkpoint_max_num', type=int, default=5, help='Max num of checkpoint')
    args = parser.parse_args()

    LR_ORI = 0.01
    EPOCH_MAX = args.epoch_num
    TRAIN_BATCH_SIZE = 128
    VAL_BATCH_SIZE = 100
    MOMENTUM_ORI = 0.9
    WEIGHT_DECAY = 5e-4
    CHECKPOINT_MAX_NUM = args.checkpoint_max_num

    # Data path
    TRAIN_PATH = args.data_path
    VAL_PATH = args.data_path
```

    2.2 Configuring run information
    MindSpore configures run information (execution mode, backend, hardware, and so on) through context.set_context. Here we use graph mode on an Ascend chip:

```python
from mindspore import context

context.set_context(mode=context.GRAPH_MODE, device_target='Ascend')
```

    2.3 Dataset loading and processing
    The PyTorch data augmentations are RandomCrop, RandomHorizontalFlip, Normalize, and ToTensor. Using the API mapping documentation, find the corresponding MindSpore interfaces and migrate the code. The c_transforms interfaces used here form a high-performance image augmentation module built on C++ OpenCV, so the final HWC2CHW() converts the HWC layout to the CHW layout MindSpore expects. The migrated dataset code:

```python
import mindspore
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C

def create_dataset(data_home, do_train, batch_size):
    # Define dataset
    if do_train:
        cifar_ds = ds.Cifar10Dataset(dataset_dir=data_home, num_parallel_workers=8,
                                     shuffle=True, usage='train')
    else:
        cifar_ds = ds.Cifar10Dataset(dataset_dir=data_home, num_parallel_workers=8,
                                     shuffle=False, usage='test')

    if do_train:
        # Transformation on train data
        transform_data = C.Compose([CV.RandomCrop((32, 32), (4, 4, 4, 4)),
                                    CV.RandomHorizontalFlip(),
                                    CV.Rescale(1.0 / 255.0, 0.0),
                                    CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
                                    CV.HWC2CHW()])
    else:
        # Transformation on validation data
        transform_data = C.Compose([CV.Rescale(1.0 / 255.0, 0.0),
                                    CV.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
                                    CV.HWC2CHW()])

    # Transformation on label
    transform_label = C.TypeCast(mindspore.dtype.int32)

    # Apply map operations on images
    cifar_ds = cifar_ds.map(operations=transform_label, num_parallel_workers=8,
                            python_multiprocessing=True, input_columns="label")
    cifar_ds = cifar_ds.map(operations=transform_data, num_parallel_workers=8,
                            python_multiprocessing=True, input_columns="image")
    cifar_ds = cifar_ds.batch(batch_size, num_parallel_workers=8, drop_remainder=True)

    steps_per_epoch = cifar_ds.get_dataset_size()
    return cifar_ds, steps_per_epoch
```

    Once defined, call it from the main function:

```python
# Create dataset
ds_train, steps_per_epoch_train = create_dataset(TRAIN_PATH, do_train=True, batch_size=TRAIN_BATCH_SIZE)
ds_val, steps_per_epoch_val = create_dataset(VAL_PATH, do_train=False, batch_size=VAL_BATCH_SIZE)
```

    MindSpore is already well adapted to the following three cases:
    1. Loading common datasets
    2. Loading datasets in a specific format (MindRecord)
    3. Loading custom datasets

    2.4 Network definition
    Analyze the operators in the PyTorch network, find the MindSpore equivalents via the API mapping documentation and the MindSpore API reference, and build the ResNet. MindSpore networks are built with nn.Cell: define the required operators in the Cell's __init__ function, connect them in the construct function, and return the output.

    Build the ResNet block unit. Note: _conv2d and _dense are defined so that weight initialization matches PyTorch.

```python
import math
import mindspore
import mindspore.nn as nn
from mindspore.ops import operations as P

EXPANSION = 4

def _conv2d(in_channel, out_channel, kernel_size, stride=1, padding=0):
    scale = math.sqrt(1 / (in_channel * kernel_size * kernel_size))
    if padding == 0:
        return nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=stride,
                         padding=padding, pad_mode='same',
                         weight_init=mindspore.common.initializer.Uniform(scale=scale))
    return nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=stride,
                     padding=padding, pad_mode='pad',
                     weight_init=mindspore.common.initializer.Uniform(scale=scale))

def _dense(in_channel, out_channel):
    scale = math.sqrt(1 / in_channel)
    return nn.Dense(in_channel, out_channel,
                    weight_init=mindspore.common.initializer.Uniform(scale=scale),
                    bias_init=mindspore.common.initializer.Uniform(scale=scale))

class ResidualBlock(nn.Cell):
    def __init__(self, in_planes, planes, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = _conv2d(in_planes, planes, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = _conv2d(planes, planes, kernel_size=3, stride=stride, padding=1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = _conv2d(planes, EXPANSION * planes, kernel_size=1)
        self.bn3 = nn.BatchNorm2d(EXPANSION * planes)
        self.shortcut = nn.SequentialCell()
        if stride != 1 or in_planes != EXPANSION * planes:
            self.shortcut = nn.SequentialCell(
                _conv2d(in_planes, EXPANSION * planes, kernel_size=1, stride=stride),
                nn.BatchNorm2d(EXPANSION * planes))
        self.relu = nn.ReLU()
        self.add = P.Add()

    def construct(self, x_input):
        out = self.relu(self.bn1(self.conv1(x_input)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        identity = self.shortcut(x_input)
        out = self.add(out, identity)
        out = self.relu(out)
        return out
```

    Build the backbone. ResNet has many repeated structures, so use a loop to construct multiple Cell instances and chain them with SequentialCell, reducing code duplication. Connect the operators in the construct function and return the network output:

```python
class ResNet(nn.Cell):
    def __init__(self, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = _conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.layer1 = self._make_layer(64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(512, num_blocks[3], stride=2)
        self.avgpool2d = nn.AvgPool2d(kernel_size=4, stride=4)
        self.reshape = mindspore.ops.Reshape()
        self.linear = _dense(2048, num_classes)

    def _make_layer(self, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(ResidualBlock(self.in_planes, planes, stride))
            self.in_planes = EXPANSION * planes
        return nn.SequentialCell(*layers)

    def construct(self, x_input):
        x_input = self.conv1(x_input)
        out = self.relu(self.bn1(x_input))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool2d(out)
        out = self.reshape(out, (out.shape[0], 2048))
        out = self.linear(out)
        return out
```

    The external interface:

```python
def resnet_50():
    return ResNet([3, 4, 6, 3])
```

    The figure below shows how PyTorch and MindSpore differ when defining a small CNN.

    2.5 Loss function and optimizer
    PyTorch loss function and optimizer:

```python
# Define network
net = resnet_50()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = net.to(device)

# Define the loss function
criterion = torch.nn.CrossEntropyLoss()

# Define the optimizer
optimizer = torch.optim.SGD(net.parameters(), LR_ORI, MOMENTUM_ORI, WEIGHT_DECAY)
```

    The migrated MindSpore loss function and optimizer:

```python
# Define network
net = resnet_50()

# Define the loss function
loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')

# Define the optimizer
opt = nn.SGD(net.trainable_params(), LR_ORI, MOMENTUM_ORI, WEIGHT_DECAY)
```

    2.6 Building the model
    MindSpore recommends wrapping the network with the mindspore.Model interface, which builds the training flow internally. Pass the network, loss function, optimizer, and metrics to Model; for evaluation, MindSpore provides various metrics such as Accuracy, Precision, Recall, and F1. Note: amp_level="O3" is enabled here to exploit the Ascend chip's compute performance.

```python
from mindspore import Model

# Create train model
metrics = {'accuracy': nn.Accuracy(), 'loss': nn.Loss()}
model = Model(net, loss, opt, metrics=metrics, amp_level="O3")
```

    2.7 Training and validation
    MindSpore calls Model.train, passing the built-in ModelCheckpoint and LossMonitor plus the custom EvalCallBack and PrintFps instances through callbacks, to train and validate:

```python
import time
from mindspore.train.callback import Callback, ModelCheckpoint, LossMonitor
from mindspore.train.callback import CheckpointConfig

class EvalCallBack(Callback):
    def __init__(self, eval_model, eval_dataset, eval_per_epoch):
        self.eval_model = eval_model
        self.eval_dataset = eval_dataset
        self.eval_per_epoch = eval_per_epoch

    def epoch_end(self, run_context):
        cb_param = run_context.original_args()
        cur_epoch = cb_param.cur_epoch_num
        if cur_epoch % self.eval_per_epoch == 0:
            acc = self.eval_model.eval(self.eval_dataset, dataset_sink_mode=False)
            print(acc)

class PrintFps(Callback):
    def __init__(self, step_num, start_time):
        self.step_num = step_num
        self.start_time = start_time
        self.end_time = time.time()

    def epoch_begin(self, run_context):
        self.start_time = time.time()

    def epoch_end(self, run_context):
        self.end_time = time.time()
        cb_param = run_context.original_args()
        cur_epoch = cb_param.cur_epoch_num
        fps = self.step_num / (self.end_time - self.start_time)
        print("Epoch:{}, {:.2f}imgs/sec".format(cur_epoch, fps))

# CheckPoint CallBack definition
config_ck = CheckpointConfig(save_checkpoint_steps=steps_per_epoch_train,
                             keep_checkpoint_max=CHECKPOINT_MAX_NUM)
ckpoint_cb = ModelCheckpoint(prefix="train_resnet_cifar10", directory="./checkpoint/",
                             config=config_ck)

# Eval CallBack definition
EVAL_PER_EPOCH = 1
eval_cb = EvalCallBack(model, ds_val, EVAL_PER_EPOCH)

# FPS CallBack definition
train_data_num = steps_per_epoch_train * TRAIN_BATCH_SIZE
init_time = time.time()
fps_cb = PrintFps(train_data_num, init_time)

# Train
print("============== Starting Training ==============")
model.train(EPOCH_MAX, ds_train,
            callbacks=[LossMonitor(), eval_cb, fps_cb, ckpoint_cb],
            dataset_sink_mode=True, sink_size=steps_per_epoch_train)
```

    3. Running
    Launch command: python MindSpore_1P.py --epoch_num=xxx --data_path=xxx
    Run the script in a terminal to see the network output. The full code can be downloaded from the attachment.
  • [Execution Issue] Pretrained YOLOv4 weights for MindSpore 1.5
    I recently got an Ascend instance with MindSpore 1.5.1 on ModelArts and wanted to train my own dataset with the YOLOv4 from the Model Zoo, but the MindSpore 1.5 YOLOv4 expects you to produce the pretrained weights yourself. Is there a pretrained CSPDarkNet checkpoint for MindSpore 1.5, or can 1.5 use the 1.6 pretrained weights? I previously trained my dataset with the SSD model on 1.6, and converting to OM with 1.7 on a 310 then failed with an operator error. Would the operator differences between 1.6 and 1.5 similarly affect the CSP pretrained weights? Could a 1.5 CSP checkpoint simply be provided?
  • [Atlas300] [Atlas300T] [Cascade_RCNN training] internal NPU error, EOFError, BrokenPipeError
    [Module] Cascade_RCNN, source: https://www.hiascend.com/zh/software/modelzoo/models/detail/2/6b3530aa27304214972f593383b5eda3, using the COCO dataset. [Steps & symptom] Run bash ./test/train_performance_1p.sh [Screenshots] [Logs] (optional: log contents or attachment)
  • [Technical Deep Dive] [CANN Training Camp] Deploying the PaddlePaddle PicoDet lightweight object detection model on the Ascend 310, with 4 ms inference
    # Deploying the PicoDet model on the Ascend 310

    ## 1. Introduction

    This project deploys the PicoDet model on the Ascend 310 inference chip to perform object detection. Project link: [Ascend310-PicoDet](https://gitee.com/dzm9999/ascend310-pico-det). See the readme in the project for how to test inference. For the PicoDet model itself, see [PaddleDetection-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet).

    ## 2. Implementation

    The project follows the Paddle2ONNX deployment flow for PicoDet under onnxruntime, adding code that runs inference on the Ascend 310. The overall flow is that of a typical object detection application: read image -> preprocess -> inference -> postprocess -> save image. The image is read with cv2.imread, resized with its aspect ratio preserved and padded to fit the model's input size; after normalization and channel transposition, acl.model performs the model operations and runs inference. Postprocessing (NMS, coordinate transforms) converts the raw outputs into box coordinates on the original image and per-class confidences, and the result is saved as an image file.

    ## 3. Results

    The two images below show the inference result on the Ascend 310 and, for comparison, the result of running the ONNX model on a PC. With a 320x320 FP16 input, Ascend 310 inference takes 4 ms.

    ![ascend](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/20228/9/1660026972588107109.jpg)
    ![onnx](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/20228/9/1660026990129983670.jpg)

    ## 4. Further optimization

    There is still plenty of room for optimization; for example, preprocessing currently uses OpenCV, and switching to a DVPP + AIPP implementation could speed it up considerably. The project drew on: [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX), [AclLite](https://gitee.com/ascend/samples/tree/master/python/common/acllite).
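    The preprocessing chain described above (aspect-preserving resize with padding, normalization, HWC-to-CHW transposition) can be sketched roughly as follows. This is an illustrative NumPy-only version: the letterbox helper, the nearest-neighbor resize (the real project uses OpenCV), and the placeholder mean/std values are my assumptions, not code from the project.

```python
import numpy as np

def letterbox(img, dst_h, dst_w, pad_value=114):
    """Resize keeping aspect ratio (nearest neighbor), then pad to (dst_h, dst_w)."""
    h, w = img.shape[:2]
    scale = min(dst_h / h, dst_w / w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index sampling; a real pipeline would use cv2.resize.
    ys = np.clip((np.arange(new_h) / scale).astype(np.int64), 0, h - 1)
    xs = np.clip((np.arange(new_w) / scale).astype(np.int64), 0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((dst_h, dst_w, img.shape[2]), pad_value, dtype=img.dtype)
    top, left = (dst_h - new_h) // 2, (dst_w - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out, scale, (top, left)  # scale/offsets are needed to map boxes back later

def preprocess(img, size=320):
    padded, scale, offset = letterbox(img, size, size)
    x = padded.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5                # placeholder normalization (assumed values)
    x = x.transpose(2, 0, 1)[None]     # HWC -> CHW, add batch dimension
    return x, scale, offset

img = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)  # dummy image
x, scale, offset = preprocess(img)
print(x.shape)  # (1, 3, 320, 320)
```

    The returned scale and padding offsets are what the postprocessing step uses to map predicted boxes back onto the original image.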
  • [University Developer Zone] [HCSD-DevCloud Training Camp Notes] Moving an aircraft-shooter game to the cloud
    Learning goals. Contents: the five elements of DevOps; the DevOps lifecycle.
  • [Help Wanted] [Ascend dev boards] [Training and inference] Newcomer looking for an Ascend dev board that can both train and infer
    I'm new to this and want a fully domestic AI hardware and software stack. Could anyone recommend an Ascend development board that can do both training and inference? It just needs to run MindSpore. Any advice is appreciated, thanks.
  • [Execution Issue] MindSpore gradient computation error
    [Module] Optimizing mask_tensor. [Steps & symptom]
    1. The optimizer as defined:

```python
self.opt = nn.Adam(params=[Parameter(default_input=self.mask_tensor, name="mask", requires_grad=True)],
                   learning_rate=0.01, weight_decay=0.0001)
```

    2. The custom train-one-step cell:

```python
class TrainOneStepCell_3(nn.Cell):
    """Custom training network wrapper."""

    def __init__(self, network, optimizer, sens=1.0):
        """Three arguments: the training network, the optimizer, and the backprop scale factor."""
        super(TrainOneStepCell_3, self).__init__(auto_prefix=False)
        self.network = network
        self.network.set_grad()
        self.optimizer = optimizer
        self.weights = self.optimizer.parameters
        self.grad = ops.composite.GradOperation(get_all=True, sens_param=False)

    def construct(self, input_tensor, mask_tensor, ref):
        loss = self.network(input_tensor, mask_tensor, ref)
        grads = self.grad(self.network)(input_tensor, mask_tensor, ref)
        # loss = ops.depend(loss, self.optimizer(grads[1]))
        self.optimizer(grads[1])
        return grads, loss
```

    [Screenshot] Error message: [Logs] (optional: log contents or attachment)