• [Help] Running YOLOv5 object detection on an Ascend chip with the MindSpore framework reports errors I cannot interpret
    The error occurs while running train.py for training. Everything that needed changing has been changed, and web searches turn up nothing, so I have no idea where to fix it. The error output is:

    [EXCEPTION] ANALYZER(8834,ffffb5fff5d0,python):2022-02-13-21:39:45.035.352 [mindspore/ccsrc/pipeline/jit/static_analysis/prim.cc:954] GetEvaluatedValueForBuiltinTypeAttrOrMethod] Not supported to get attribute item name:'arange' of a type[kMetaTypeNone]
    [ERROR] ME(8834:281473735194064,MainProcess):2022-02-13-21:39:45.156.789 [mindspore/dataset/engine/datasets.py:2686] Uncaught exception:
    Traceback (most recent call last):
      File "train.py", line 131, in <module>
        run_train()
      File "/home/ma-user/work/model_utils/moxing_adapter.py", line 167, in wrapped_func
        run_func(*args, **kwargs)
      File "train.py", line 105, in run_train
        data[7], input_shape)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 404, in __call__
        out = self.compile_and_run(*inputs)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 682, in compile_and_run
        self.compile(*inputs)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 669, in compile
        _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 548, in compile
        result = self._graph_executor.compile(obj, args_list, phase, use_vm, self.queue_name)
    RuntimeError: mindspore/ccsrc/pipeline/jit/static_analysis/prim.cc:954 GetEvaluatedValueForBuiltinTypeAttrOrMethod] Not supported to get attribute item name:'arange' of a type[kMetaTypeNone]
    The function call stack (See file '/home/ma-user/work/rank_0/om/analyze_fail.dat' for more details):
    # 0 In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/wrap/cell_wrapper.py(353)
        loss = self.network(*inputs)
    # 1 In file /home/ma-user/work/src/yolo.py(394)
        yolo_out = self.yolo_network(x, input_shape)
    # 2 In file /home/ma-user/work/src/yolo.py(358)
        output_big = self.detect_1(big_object_output, input_shape)
    # 3 In file /home/ma-user/work/src/yolo.py(192)
        if self.conf_training:
    # 4 In file /home/ma-user/work/src/yolo.py(168)
        grid_x = ms.numpy.arange(grid_size[1])
    [ERROR] MD(8834,ffff1d8f91e0,python):2022-02-13-21:39:53.946.184 [mindspore/ccsrc/minddata/dataset/util/task.cc:67] operator()] Task: GeneratorOp(ID:8) - thread(281471177691616) is terminated with err msg: Exception thrown from PyFunc. The actual amount of data read from generator 952 is different from generator.len 117266, you should adjust generator.len to make them match.
    Line of code : 208
    File         : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_CentOS@2/mindspore/mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc
    [ERROR] MD(8834,ffff1d8f91e0,python):2022-02-13-21:39:53.946.266 [mindspore/ccsrc/minddata/dataset/util/task_manager.cc:217] InterruptMaster] Task is terminated with err msg(more detail in info level log): Exception thrown from PyFunc. The actual amount of data read from generator 952 is different from generator.len 117266, you should adjust generator.len to make them match.
    Line of code : 208
    File         : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_CentOS@2/mindspore/mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc
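The second error in the log ("The actual amount of data read from generator 952 is different from generator.len 117266") fires when the dataset source's declared length disagrees with how many items it actually yields. A minimal pure-Python sketch of that consistency check, independent of MindSpore (the class and function names here are hypothetical, for illustration only):

```python
class ListSource:
    """Toy generator source whose declared length may not match reality."""

    def __init__(self, items, declared_len):
        self.items = items
        self.declared_len = declared_len

    def __len__(self):
        return self.declared_len

    def __iter__(self):
        return iter(self.items)


def read_all(source):
    """Mimic the dataset engine: drain the source, then verify the count."""
    count = sum(1 for _ in source)
    if count != len(source):
        raise RuntimeError(
            f"The actual amount of data read from generator {count} is "
            f"different from generator.len {len(source)}")
    return count


assert read_all(ListSource([1, 2, 3], declared_len=3)) == 3
try:
    read_all(ListSource([1, 2, 3], declared_len=5))
except RuntimeError as err:
    print(err)
```

In the YOLOv5 case this suggests the dataset's `__len__` (117266) does not match the annotations actually loaded (952), so checking the data paths and list files is a reasonable first step.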
  • [Execution issue] [MindSpore] Sample cannot be run using the MindSpore image
    [Module] On ascend310T, after starting a MindSpore 1.5.0 container, programs cannot run normally inside the container.
    [Steps & symptom]
    1. The container with MindSpore installed runs normally and login succeeds.
    2. Running the program fails at import: ImportError: cannot import name 'context' from 'mindspore'
    [Screenshot]
  • [Inference] [Ascend310] [MindSpore] Sample fails to run after installing MindSpore
    [Module] On Ascend 310, the sample fails to run after installing MindSpore 1.6.0.
    [Steps & symptom] My platform: Ascend 310, companion package toolkit 5.0.4.alpha003, Python 3.7.5, ubuntu-aarch_64.
    1. Installed the toolkit per the official site; environment variables are set.
    2. Installed MindSpore via pip:
       pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.0/MindSpore/ascend/aarch64/mindspore_ascend-1.6.0-cp37-cp37m-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
    3. Ran the ascend310_single_op_sample following the official commands; cmake and make produced the executable tensor_add_sample, but running ./tensor_add_sample prints "Build model failed".
    [Screenshot] The error log:
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    [WARNING] ME(25498,ffff71ca5f90,tensor_add_sample):2022-02-13-00:26:36.767.528 [mindspore/ccsrc/cxx_api/model/model_converter_utils/multi_process.cc:230] HeartbeatThreadFuncInner] Peer stopped
    [ERROR] ME(25498,ffff78e50000,tensor_add_sample):2022-02-13-00:26:36.768.463 [mindspore/ccsrc/cxx_api/model/acl/model_converter.cc:172] operator()] Receive result model from child process failed
    [ERROR] ME(25498,ffff78e50000,tensor_add_sample):2022-02-13-00:26:36.768.551 [mindspore/ccsrc/cxx_api/model/model_converter_utils/multi_process.cc:111] ParentProcess] Parent process process failed
    [ERROR] ME(25498,ffff78e50000,tensor_add_sample):2022-02-13-00:26:36.772.071 [mindspore/ccsrc/cxx_api/model/acl/model_converter.cc:205] LoadMindIR] Convert MindIR model to OM model failed
    [ERROR] ME(25498,ffff78e50000,tensor_add_sample):2022-02-13-00:26:36.772.116 [mindspore/ccsrc/cxx_api/model/acl/acl_model.cc:79] Build] Load MindIR failed.
    Build model failed.
  • [Execution issue] Can if statements be used in a MindSpore construct network?
    [Module]
    [Steps & symptom]
    ```python
    def construct(self, x):
        x = self.pad(x)
        print(x.shape)
        if self._use_batch_norm:
            x = self.bn(x)
        if self._activation_fn is not None:
            x = self._activation_fn(x)
        return x
    ```
    [Screenshot] [Logs] (optional; upload log content or an attachment)
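For context on the question above: in graph mode, an `if` whose condition is a plain Python value fixed at construction time (like `self._use_batch_norm` set in `__init__`) is resolved when the graph is compiled, so each instance ends up with one fixed path; only conditions on tensor values create real control flow. A plain-Python analogue of how such attribute-driven branches resolve per instance (the class here is a hypothetical stand-in, not a MindSpore Cell):

```python
class ConvBlock:
    """Plain-Python stand-in for a Cell: both branches depend only on
    attributes fixed in __init__, so each instance takes one fixed path."""

    def __init__(self, use_batch_norm, activation_fn=None):
        self._use_batch_norm = use_batch_norm
        self._activation_fn = activation_fn

    def construct(self, x):
        if self._use_batch_norm:        # constant per instance
            x = x * 2                   # stand-in for self.bn(x)
        if self._activation_fn is not None:
            x = self._activation_fn(x)
        return x


relu = lambda v: max(v, 0)
assert ConvBlock(True, relu).construct(-3) == 0     # "bn" then relu
assert ConvBlock(False, None).construct(-3) == -3   # both branches skipped
```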
  • [API usage] [MindSpore] [Dataset feature] Cannot view the dataset
    [Code]
    ```python
    from mindspore import context
    context.set_context(mode=context.GRAPH_MODE, device_target='CPU')
    import mindspore.dataset as ds
    import mindspore.dataset.transforms.c_transforms as C
    import mindspore.dataset.vision.c_transforms as CV
    from mindspore.dataset.vision import Inter
    from mindspore import dtype as mstype

    train_path = "datasets/MNIST_Data/train"
    test_path = "datasets/MNIST_Data/test"

    def create_dataset(data_path, batch_size=32, repeat_size=1, num_parallel_workers=1):
        # define the dataset
        mnist_ds = ds.MnistDataset(data_path)
        resize_height, resize_width = 32, 32
        rescale = 1.0 / 255.0
        shift = 0.0
        rescale_nml = 1 / 0.3081
        shift_nml = -1 * 0.1307 / 0.3081
        # define the map operations to apply
        resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)
        rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
        rescale_op = CV.Rescale(rescale, shift)
        hwc2chw_op = CV.HWC2CHW()
        type_cast_op = C.TypeCast(mstype.int32)
        # apply the operations to the dataset with map
        mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=[resize_op, rescale_op, rescale_nml_op, hwc2chw_op], input_columns="image", num_parallel_workers=num_parallel_workers)
        # shuffle, batch, repeat
        buffer_size = 10000
        mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)
        mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
        mnist_ds = mnist_ds.repeat(count=repeat_size)
        return mnist_ds

    datasets = create_dataset(train_path)
    for data in datasets.create_dict_iterator():
        print("data: {}".format(data["data"]))
        print("label: {}".format(data["label"]))
    ```
    This prints:
    Traceback (most recent call last):
      File "D:\mindspore\手写数字识别\手写数字识别.py", line 43, in <module>
        for data in datasets.create_dict_iterator():
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\site-packages\mindspore\dataset\engine\iterators.py", line 148, in __next__
        data = self._get_next()
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\site-packages\mindspore\dataset\engine\iterators.py", line 203, in _get_next
        raise err
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\lib\site-packages\mindspore\dataset\engine\iterators.py", line 196, in _get_next
        return {k: self._transform_tensor(t) for k, t in self._iterator.GetNextAsMap().items()}
    RuntimeError
    An error occurred and the data cannot be viewed. The data comes from here. Thanks in advance, everyone.
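As a side note on the snippet above: the two chained Rescale operations compose into the standard MNIST normalization (x/255 - 0.1307)/0.3081. Checking the post's constants in plain Python (no MindSpore needed):

```python
# constants exactly as in the post
rescale, shift = 1.0 / 255.0, 0.0
rescale_nml, shift_nml = 1 / 0.3081, -1 * 0.1307 / 0.3081

def apply(x):
    # Rescale(rescale, shift) followed by Rescale(rescale_nml, shift_nml),
    # as the two map() operations do: each is x * scale + shift
    x = x * rescale + shift
    return x * rescale_nml + shift_nml

def reference(x):
    # standard MNIST mean/std normalization
    return (x / 255.0 - 0.1307) / 0.3081

for px in (0, 128, 255):
    assert abs(apply(px) - reference(px)) < 1e-12
```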
  • [API usage] [MindSpore] [MindConverter] ONNX conversion problem
    I am converting an ONNX model with MindSpore 1.6.0 + MindConverter 1.6.0. I first exported the PyTorch model to ONNX, then got an error when running the ONNX-to-MindSpore conversion script. The ONNX model itself is fine: it opens normally in Netron, except that every operator is named by a number. Details are in the attachment. With MindSpore 1.5.0 + MindConverter 1.5.0 this error does not occur, but some operators are unsupported there, so I wanted to try 1.6, and 1.6 reports the error.
  • [Model training] Say Hello to MindSpore — reproducing the LeNet5 handwritten-digit recognition case on ModelArts and WSL2+GPU
    ### Introduction
    LeNet5 + MNIST is known as the "Hello world" of deep learning. This experiment develops and trains a LeNet5 model on the MNIST handwritten-digit dataset with MindSpore and validates its accuracy. This is a simple, reproducible model case; full details are in the official reference documentation.
    ### Experiment preparation
    Dataset: MNIST is a handwritten-digit dataset with 60,000 training images and 10,000 test images across 10 classes. Official site: THE MNIST DATABASE.
    • Option 1: download the following four files from the MNIST site and extract them locally:
      train-images-idx3-ubyte.gz: training set images (9912422 bytes)
      train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
      t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
      t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
    • Option 2: download the MNIST dataset from Huawei Cloud OBS and extract it.
    • Option 3 (recommended): when using a ModelArts training job/Notebook, copy the dataset from a shared OBS bucket; see "Adapting the training job" and the data-copy steps below.
    Scripts: download the scripts for this experiment from the course gitee repository, and organize scripts and dataset as follows:
    ```
    lenet5
    ├── MNIST
    │   ├── test
    │   │   ├── t10k-images-idx3-ubyte
    │   │   └── t10k-labels-idx1-ubyte
    │   └── train
    │       ├── train-images-idx3-ubyte
    │       └── train-labels-idx1-ubyte
    └── main.py
    ```
    ### Creating an OBS bucket
    ModelArts training jobs/Notebooks store experiment scripts and datasets in Huawei Cloud OBS; see "Quickly uploading and downloading files via the OBS console" for creating buckets and uploading/downloading files (steps below). Note: new Huawei Cloud users usually need to create and configure an access key for OBS; follow the prompts, or see "Obtaining an access key and completing ModelArts global configuration". Open the OBS console, click "Create Bucket" in the top-right corner, and use this reference configuration:
    • Region: CN North-Beijing4
    • Data redundancy policy: single-AZ storage
    • Bucket name: a globally unique string
    • Storage class: Standard
    • Bucket policy: public read
    • Direct reading of archived data: off
    • Enterprise project, tags, etc.: leave unset
    ### Uploading files
    Click the new bucket's name, open the "Objects" tab, and upload the scripts and dataset using "Upload Object" and "Create Folder". After uploading, check the "Task Management" status bar at the bottom of the page (running, completed, failed) to confirm all files uploaded. If an upload fails:
    • check the upload-object size limits or switch the upload method,
    • check the common causes of upload failures,
    • or open a service ticket (product: Object Storage Service; issue type: buckets and objects) for assistance.
    ### Experiment steps (ModelArts training job)
    ModelArts training jobs have a large resource pool and job queueing, suiting large-scale concurrent use. If you need to modify and debug code while using training jobs, there are three options:
    1. modify the code locally and re-upload it;
    2. use PyCharm ToolKit to set up a local PyCharm + ModelArts environment for uploading code, submitting jobs, and fetching logs;
    3. create a Notebook on ModelArts and enable Sync OBS, so code can be edited online and synced to OBS automatically. Since the Notebook is only used for editing, the lowest-spec CPU Notebook is enough.
    ### Adapting the training job
    When a training job is created, run parameters are passed to the script as command-line arguments, so the script must parse them, e.g. data_url and train_url for the data path (OBS) and training-output path (OBS). After parsing, the values live in args for later use:
    ```python
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
    parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
    args, unknown = parser.parse_known_args()
    ```
    MindSpore does not yet provide a direct OBS access interface; use the moxing framework that ships with ModelArts to interact with OBS.
    • Option 1: copy the dataset from your own account's OBS bucket into the execution container:
    ```python
    import moxing
    # src_url is like 's3://OBS/PATH', the dataset path in the OBS bucket; dst_url is a path inside the container
    moxing.file.copy_parallel(src_url=args.data_url, dst_url='MNIST/')
    ```
    • Option 2 (recommended): copy a dataset shared in someone else's OBS bucket, provided that bucket is public read / public read-write. If the bucket was created private, change its policy per "Configuring a standard bucket policy":
    ```python
    import moxing
    moxing.file.copy_parallel(src_url="s3://share-course/dataset/MNIST/", dst_url='MNIST/')
    ```
    • Option 3 (not recommended): link the other account first, then copy from their bucket; this requires their Access Key, Secret Access Key, and the bucket's Endpoint (bucket Overview - Basic Information):
    ```python
    import moxing
    # set the other account's keys; ak: Access Key Id, sk: Secret Access Key, server: endpoint of the OBS bucket
    moxing.file.set_auth(ak='Access Key', sk='Secret Access Key', server="obs.cn-north-4.myhuaweicloud.com")
    moxing.file.copy_parallel(src_url="s3://share-course/dataset/MNIST/", dst_url='MNIST/')
    ```
    ### Creating the training job
    See "Training a model with a common framework" for creating and starting a training job (steps below). Open the ModelArts console - Training Management - Training Jobs, click "Create", and use this reference configuration:
    • Algorithm source: common frameworks -> Ascend-Powered-Engine -> MindSpore
    • Code directory: the lenet5 directory in the bucket created above (upload with OBS Browser+)
    • Boot file: main.py under that lenet5 directory (quick algorithm creation)
    • Data source: data storage location -> the MNIST directory under lenet5
    • Training output path: create an output directory under lenet5 and select it
    • Job log path: same as the training output path
    • Specification: Ascend: 1*Ascend 910
    • Everything else: default
    Start the job and watch the training:
    1. click Submit to start training;
    2. the new job appears in the training-job list, with version management on the job page;
    3. click the running job to view its configuration and the continuously refreshing logs; logs can also be downloaded after the job completes;
    4. as in the Notebook steps below, find the corresponding log output to check that the experiment succeeded:
    ```
    epoch: 1 step: 1875, loss is 1.9393733
    epoch: 2 step: 1875, loss is 0.04649485
    epoch: 3 step: 1875, loss is 0.06732483
    [WARNING] SESSION(168,ffff0ffff1e0,python):2022-02-10-19:52:59.136.619 [mindspore/ccsrc/backend/session/ascend_session.cc:1806] SelectKernel] There are 4 node/nodes used reduce precision to selected the kernel!
    Metrics: {'loss': 0.07129916341009682, 'acc': 0.9781650641025641}
    ```
    ### Experiment steps (ModelArts Notebook)
    The ModelArts Notebook resource pool is small, and each running Notebook holds its Device resources until stopped, so it does not suit large-scale concurrent use (stop the instance when idle to release resources).
    Creating the Notebook: see "Creating and opening a Notebook" (steps below). Open the ModelArts console - DevEnviron - Notebook, click "Create", and use this reference configuration:
    • Billing mode: pay-per-use
    • Name: notebook-lenet5
    • Environment: public image -> Ascend+ARM algorithm development and training base image, with TensorFlow and MindSpore preinstalled
    • Resource pool: public
    • Type: Ascend
    • Specification: 1*Ascend 910
    • Storage: Object Storage Service (OBS) -> the lenet5 folder in the bucket created above (this is the old UI flow; the new UI differs, see the details below)
    • Auto stop: on -> after 1 hour (adjustable later from within the Notebook)
    After creation, add the data to the parallel file system and mount it under the data directory. Open the Notebook and select the MindSpore environment as the kernel.
    ### Importing modules
    Import the MindSpore and helper modules, and set the MindSpore context (execution mode, device, etc.):
    ```python
    import os
    # os.environ['DEVICE_ID'] = '0'
    import mindspore as ms
    import mindspore.context as context
    import mindspore.dataset.transforms.c_transforms as C
    import mindspore.dataset.vision.c_transforms as CV
    from mindspore import nn
    from mindspore.train import Model
    from mindspore.train.callback import LossMonitor

    context.set_context(mode=context.GRAPH_MODE, device_target='Ascend')  # Ascend, CPU, GPU
    ```
    ### Data processing
    Before training, the data needs preprocessing:
    ```python
    def create_dataset(data_dir, training=True, batch_size=32, resize=(32, 32),
                       rescale=1/(255*0.3081), shift=-0.1307/0.3081, buffer_size=64):
        data_train = os.path.join(data_dir, 'train')  # train set
        data_test = os.path.join(data_dir, 'test')    # test set
        ds = ms.dataset.MnistDataset(data_train if training else data_test)
        ds = ds.map(input_columns=["image"], operations=[CV.Resize(resize), CV.Rescale(rescale, shift), CV.HWC2CHW()])
        ds = ds.map(input_columns=["label"], operations=C.TypeCast(ms.int32))
        # When `dataset_sink_mode=True` on Ascend, append `ds = ds.repeat(num_epochs)` to the end
        ds = ds.shuffle(buffer_size=buffer_size).batch(batch_size, drop_remainder=True)
        return ds
    ```
    Visualize a few samples; each image is a 32x32 handwritten digit:
    ```python
    import matplotlib.pyplot as plt
    ds = create_dataset('data/lenet/lenet5/MNIST', training=False)  # change to your mount path
    data = ds.create_dict_iterator(output_numpy=True).get_next()
    images = data['image']
    labels = data['label']
    for i in range(1, 5):
        plt.subplot(2, 2, i)
        plt.imshow(images[i][0])
        plt.title('Number: %s' % labels[i])
        plt.xticks([])
    plt.show()
    ```
    ### Defining the model
    Define the LeNet5 model:
    ```python
    class LeNet5(nn.Cell):
        def __init__(self):
            super(LeNet5, self).__init__()
            self.conv1 = nn.Conv2d(1, 6, 5, stride=1, pad_mode='valid')
            self.conv2 = nn.Conv2d(6, 16, 5, stride=1, pad_mode='valid')
            self.relu = nn.ReLU()
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            self.flatten = nn.Flatten()
            self.fc1 = nn.Dense(400, 120)
            self.fc2 = nn.Dense(120, 84)
            self.fc3 = nn.Dense(84, 10)

        def construct(self, x):
            x = self.relu(self.conv1(x))
            x = self.pool(x)
            x = self.relu(self.conv2(x))
            x = self.pool(x)
            x = self.flatten(x)
            x = self.fc1(x)
            x = self.fc2(x)
            x = self.fc3(x)
            return x
    ```
    ### Training
    Train the LeNet5 model defined above on MNIST. The training strategy is shown below; you can adjust it and observe the effect, with the requirement that validation accuracy exceeds 95%.
    | batch size | number of epochs | learning rate | optimizer |
    | ---------- | ---------------- | ------------- | ------------ |
    | 32         | 3                | 0.01          | Momentum 0.9 |
    ```python
    def train(data_dir, lr=0.01, momentum=0.9, num_epochs=3):
        ds_train = create_dataset(data_dir)
        ds_eval = create_dataset(data_dir, training=False)
        net = LeNet5()
        loss = nn.loss.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
        opt = nn.Momentum(net.trainable_params(), lr, momentum)
        loss_cb = LossMonitor(per_print_times=ds_train.get_dataset_size())
        model = Model(net, loss, opt, metrics={'acc', 'loss'})
        # dataset_sink_mode can be True when using Ascend
        model.train(num_epochs, ds_train, callbacks=[loss_cb], dataset_sink_mode=False)
        metrics = model.eval(ds_eval, dataset_sink_mode=False)
        print('Metrics:', metrics)

    train('data/lenet/lenet5/MNIST/')  # change to your own mount path
    ```
    Training is complete.
    ### Experiment steps (local CPU/GPU/Ascend)
    MindSpore also runs in local CPU/GPU/Ascend environments, such as Windows/Ubuntu x64 laptops, NVIDIA GPU servers, and Atlas Ascend servers. Configure the environment per the installation guide first. To run the experiment on a Windows/Ubuntu x64 laptop:
    ```shell
    # edit main.py: set the context on line 15 to device_target='CPU' or 'GPU'
    python main.py --data_url=.\MNIST  # copied to the current folder; a custom path also works
    ```
    ### Summary
    This experiment showed how to do handwritten-digit recognition with MindSpore by developing and training a LeNet5 model. After a few epochs of training, the LeNet5 model recognizes handwritten digits with accuracy above 95%, i.e. it has learned handwritten-digit recognition.
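A note on why `fc1` in the LeNet5 above is `nn.Dense(400, 120)`: with a 32x32 input, each 5x5 valid convolution shrinks the side by 4 and each 2x2 pool halves it, leaving 16 channels of 5x5 before Flatten. The arithmetic in plain Python (a sanity check, not part of the model code):

```python
def lenet5_flat_features(side=32):
    """Feature count after LeNet5's conv/pool stack, before the Dense layers."""
    side = side - 4          # conv1: 5x5 valid conv, 1 -> 6 channels
    side = side // 2         # pool: 2x2, stride 2
    side = side - 4          # conv2: 5x5 valid conv, 6 -> 16 channels
    side = side // 2         # pool: 2x2, stride 2
    return 16 * side * side  # channels * H * W after Flatten


assert lenet5_flat_features(32) == 400  # matches nn.Dense(400, 120)
```

This is also why the create_dataset above resizes MNIST's native 28x28 images to 32x32: feeding 28x28 directly would change the flattened size and break `fc1`.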
  • [Hands-on practice] Getting started with MindSpore — running the DeepLabV3 model, part 2
    [Getting started with MindSpore — running the DeepLabV3 model, part 1](https://bbs.huaweicloud.com/forum/thread-179316-1-1.html) [Getting started with MindSpore — running the DeepLabV3 model, part 2](https://bbs.huaweicloud.com/forum/thread-179317-1-1.html) ## 4. Model evaluation ### 4.1 Evaluate s16 on the voc val dataset The evaluation command: ```shell nohup python3 eval.py --data_root=./ --data_lst=./raw_data/voc_val_lst.txt --batch_size=8 --crop_size=513 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s16 --scales_type=0 --freeze_bn=True --ckpt_path=./s16_aug_train_1g/ckpt/deeplab_v3_s16-200_661.ckpt > ms_log/eval_s16.log 2>&1 & ``` Check the result with `tail -n 132 ms_log/eval_s16.log`. ### 4.2 Evaluate s8 on the voc val dataset The evaluation command: ```shell nohup python3 eval.py --data_root=./ --data_lst=./raw_data/voc_val_lst.txt --batch_size=8 --crop_size=513 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s8 --scales_type=0 --freeze_bn=True --ckpt_path=./s8_voc_train_1g/ckpt/deeplab_v3_s8-200_183.ckpt > ms_log/eval_s8.log 2>&1 & ``` Check the result with `tail -n 132 ms_log/eval_s8.log`. ### 4.3 Evaluate multi-scale s8 on the voc val dataset The evaluation command: ```shell nohup python3 eval.py --data_root=./ --data_lst=./raw_data/voc_val_lst.txt --batch_size=8 --crop_size=513 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s8 --scales_type=1 --freeze_bn=True --ckpt_path=./s8_voc_train_1g/ckpt/deeplab_v3_s8-200_183.ckpt > ms_log/eval_s8_multiscale.log 2>&1 & ``` Check the result with `tail -n 132 ms_log/eval_s8_multiscale.log`. ### 4.4 Evaluate multi-scale + flip s8 on the voc val dataset The evaluation command: ```shell nohup python3 eval.py --data_root=./ --data_lst=./raw_data/voc_val_lst.txt --batch_size=8 --crop_size=513 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s8 --scales_type=1 --flip=True --freeze_bn=True --ckpt_path=./s8_voc_train_1g/ckpt/deeplab_v3_s8-200_183.ckpt > ms_log/eval_s8_multiscale_flip.log 2>&1 & ``` Check the result with `tail -n 132 ms_log/eval_s8_multiscale_flip.log`. ## 5. Online inference

> The deeplabv3 inference code currently supports Ascend hardware only (the author tried patching it for GPU/CPU support, but some errors remain; more detailed debugging will have to wait), so an Ascend server with the right environment is needed for the following steps.
>
> Notes
>
> - The deeplabv3 data is large; attach and mount an extra cloud disk in advance, or choose a 200 GB system disk when creating the server.
> - For Ascend server setup and the MindSpore environment, see the author's article [Installing MindSpore 1.5.0 on an Ascend server](https://my.oschina.net/kaierlong/blog/5375343)

Downloading data to the cloud

> Refer to the earlier data-preparation section; only downloading and extracting the data is needed. Remember to clone the code repository to the cloud in advance.

On the cloud, create the `SegmentationClassGray` directory under `deeplabv3`:

```shell
mkdir -p raw_data/VOCdevkit/VOC2012/SegmentationClassGray
```

Locally, export the models

> Replace `ckpt_file` accordingly

```shell
python3 export.py --ckpt_file=./s16_aug_train_1g/ckpt/deeplab_v3_s16-200_661.ckpt --file_name=deeplab_v3_s16_200_661 --file_format=MINDIR --export_model=deeplab_v3_s16 --device_target=GPU
python3 export.py --ckpt_file=./s8_voc_train_1g/ckpt/deeplab_v3_s8-200_183.ckpt --file_name=deeplab_v3_s8_200_183 --file_format=MINDIR --export_model=deeplab_v3_s8 --device_target=GPU
```

Locally, upload the data

> Replace `s8_voc_train_1g/ckpt/deeplab_v3_s8-200_183.ckpt` accordingly

```shell
remote_ip="your_server_ip"
scp *.mindir root@${remote_ip}:/root/codes/models/official/cv/deeplabv3/
scp s8_voc_train_1g/ckpt/deeplab_v3_s8-200_183.ckpt root@${remote_ip}:/root/codes/models/official/cv/deeplabv3/
scp raw_data/*.txt root@${remote_ip}:/root/codes/models/official/cv/deeplabv3/raw_data/
scp -r raw_data/VOCdevkit/VOC2012/SegmentationClassGray/* root@${remote_ip}:/root/codes/models/official/cv/deeplabv3/raw_data/VOCdevkit/VOC2012/SegmentationClassGray/
```

On the cloud, model inference

```shell
cd /root/codes/models/official/cv/deeplabv3/
mkdir ascend310_infer_result && cd ascend310_infer_result
cp ../scripts/run_infer_310.sh ./
chmod a+x run_infer_310.sh
nohup ./run_infer_310.sh ../deeplab_v3_s8_200_183.mindir ./ ../ ../raw_data/voc_val_lst.txt 0 &
```

> View the inference result with `cat acc.log`

On the cloud, quantized inference

> Because of software limitations, `amct_mindspore` cannot be used here, so the `post_quant.py` step is replaced.

```shell
cd /root/codes/models/official/cv/deeplabv3/ascend310_quant_infer
nohup python3 export_bin.py --model=deeplab_v3_s8 --data_root=../raw_data --data_lst=../raw_data/voc_val_lst.txt > data.log 2>&1 &
chmod a+x run_quant_infer.sh
# the post_quant.py step is replaced with export.py
# python3 post_quant.py --model=deeplab_v3_s8 --data_root=../raw_data --data_lst=../raw_data/voc_val_lst.txt --ckpt_file=../deeplab_v3_s8-200_183.ckpt
python3 export.py --ckpt_file=deeplab_v3_s8-200_183.ckpt --file_name=deeplab_v3_s8_200_183 --file_format=AIR --export_model=deeplab_v3_s8 --device_target=Ascend
nohup ./run_quant_infer.sh ../deeplab_v3_s8_200_183.air ./data/00_data/ ./data/01_label ./data/shape.npy &
```

> View the inference result with `cat acc.log`

## Summary

This article covered running the `deeplabv3` training and evaluation steps in a GPU environment, and running `deeplabv3` model inference in the online `Ascend` environment.

## Problems

> Problems the author hit, and how they were resolved

Problem 1

> the dataset_sink_mode problem

```shell
Traceback (most recent call last):
  File "train.py", line 213, in <module>
    train()
  File "/mnt/data_0301_12t/xingchaolong/home/codes/gitee/mindspore_models/official/cv/deeplabv3/model_utils/moxing_adapter.py", line 105, in wrapped_func
    run_func(*args, **kwargs)
  File "train.py", line 209, in train
    model.train(args.train_epochs, dataset, callbacks=cbs, dataset_sink_mode=(args.device_target != "CPU"))
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/train/model.py", line 722, in train
    self._train(epoch,
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/train/model.py", line 504, in _train
    self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/train/model.py", line 566, in _train_dataset_sink_process
    outputs = self._train_network(*inputs)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/nn/cell.py", line 404, in __call__
    out = self.compile_and_run(*inputs)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/nn/cell.py", line 682, in compile_and_run
    self.compile(*inputs)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/nn/cell.py", line 669, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/common/api.py", line 548, in compile
    result = self._graph_executor.compile(obj, args_list, phase, use_vm, self.queue_name)
TypeError: mindspore/ccsrc/runtime/device/gpu/kernel_info_setter.cc:355 PrintUnsupportedTypeException] Select GPU kernel op[SoftmaxCrossEntropyWithLogits] fail! Incompatible data type! The supported data types are in[float32 float32], out[float32 float32]; , but get in [float16 float16 ] out [float16 float16 ]
```

Problem 2

> GPU out-of-memory problem

```shell
[ERROR] RUNTIME_FRAMEWORK(1715024,7f1962ffd700,python3):2021-12-10-11:06:52.083.474 [mindspore/ccsrc/runtime/framework/actor/memory_manager_actor.cc:182] SetOpContextMemoryAllocFail] Device(id:0) memory isn't enough and alloc failed, kernel name: Default/network-TrainOneStepCell/network-BuildTrainNetwork/network-DeepLabV3/resnet-Resnet/layer3-SequentialCell/0-Bottleneck/bn3-BatchNorm2d/BatchNorm-op4408, alloc size: 142737408B.
[EXCEPTION] VM(1715024,7f1ab171c740,python3):2021-12-10-11:06:52.083.822 [mindspore/ccsrc/vm/backend.cc:835] RunGraph] The actor runs failed, actor name: kernel_graph_1
load_model ./resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt success
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    train()
  File "/mnt/data_0301_12t/xingchaolong/home/codes/gitee/mindspore_models/official/cv/deeplabv3/model_utils/moxing_adapter.py", line 105, in wrapped_func
    run_func(*args, **kwargs)
  File "train.py", line 217, in train
    model.train(args.train_epochs, dataset, callbacks=cbs, dataset_sink_mode=(args.device_target != "CPU"))
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/train/model.py", line 722, in train
    self._train(epoch,
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/train/model.py", line 504, in _train
    self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/train/model.py", line 566, in _train_dataset_sink_process
    outputs = self._train_network(*inputs)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/nn/cell.py", line 404, in __call__
    out = self.compile_and_run(*inputs)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/nn/cell.py", line 698, in compile_and_run
    return _cell_graph_executor(self, *new_inputs, phase=self.phase)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/common/api.py", line 627, in __call__
    return self.run(obj, *args, phase=phase)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/common/api.py", line 655, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/common/api.py", line 78, in wrapper
    results = fn(*arg, **kwargs)
  File "/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0/lib/python3.8/site-packages/mindspore/common/api.py", line 638, in _exec_pip
    return self._graph_executor(args_list, phase)
RuntimeError: mindspore/ccsrc/vm/backend.cc:835 RunGraph] The actor runs failed, actor name: kernel_graph_1
```

Problem 3

> sudo apt install libgl1-mesa-glx

```shell
Traceback (most recent call last):
  File "../postprocess.py", line 20, in <module>
    import cv2
  File "/root/pyenvs/env_mindspore_ascend_1.5.0/lib/python3.7/site-packages/cv2/__init__.py", line 8, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
```

## References

- [mindspore models deeplabv3](https://gitee.com/mindspore/models/tree/master/official/cv/deeplabv3)
  • [Hands-on practice] Getting started with MindSpore — running the DeepLabV3 model, part 1
    [MindSpore入门--跑通DeepLabV3模型之一](https://bbs.huaweicloud.com/forum/thread-179316-1-1.html) [MindSpore入门--跑通DeepLabV3模型之二](https://bbs.huaweicloud.com/forum/thread-179317-1-1.html) # An Introduction To MindSpore -- DeepLabV3 > MindSpore入门--跑通DeepLabV3模型 本文开发环境如下 > 训练评估部分 > > - Ubuntu Server 20.04 x86_64 > - Python 3.8.10 > - Cuda 11.1.0 > - RTX 3090 * 4 -- 实际仅用单卡 > - MindSpore 1.5.0 > > 线上推理部分 > > - Ubuntu 18.04 x86_64 > - Python 3.7.5 > - Ascend > - MindSpore 1.5.0 本文主要内容如下 > - 环境准备 > - 数据准备 > - 模型训练 > - 模型评估 > - 线上推理 > - 总结 > - 问题 > - 参考 ## 1. 环境准备 > 注意事项: > > - 先按照[基于GPU服务器安装MindSpore 1.5.0](https://my.oschina.net/kaierlong/blog/5375003)搭建基础开发环境 ### 1.1 克隆仓库并进入到本地`deeplabv3`目录 ```shell git clone https://gitee.com/mindspore/models.git mindspore_models cd mindspore_models/official/cv/deeplabv3 ``` 可以使用`tree`查看`deeplabv3`目录结构,目录结构如下所示。 ```shell . ├── ascend310_infer │   ├── build.sh │   ├── CMakeLists.txt │   ├── fusion_switch.cfg │   ├── inc │   │   └── utils.h │   └── src │   ├── main.cc │   └── utils.cc ├── ascend310_quant_infer │   ├── acc.py │   ├── config.cfg │   ├── export_bin.py │   ├── fusion_switch.cfg │   ├── inc │   │   ├── model_process.h │   │   ├── sample_process.h │   │   └── utils.h │   ├── post_quant.py │   ├── run_quant_infer.sh │   └── src │   ├── acl.json │   ├── build.sh │   ├── CMakeLists.txt │   ├── main.cpp │   ├── model_process.cpp │   ├── sample_process.cpp │   └── utils.cpp ├── default_config.yaml ├── Dockerfile ├── eval.py ├── export.py ├── mindspore_hub_conf.py ├── model_utils │   ├── config.py │   ├── device_adapter.py │   ├── __init__.py │   ├── local_adapter.py │   └── moxing_adapter.py ├── postprocess.py ├── README_CN.md ├── README.md ├── requirements.txt ├── scripts │   ├── build_data.sh │   ├── docker_start.sh │   ├── run_distribute_train_s16_r1.sh │   ├── run_distribute_train_s8_r1.sh │   ├── run_distribute_train_s8_r2.sh │   ├── run_eval_s16.sh │   ├── run_eval_s8_multiscale_flip.sh │   ├── run_eval_s8_multiscale.sh │   ├── 
run_eval_s8.sh │   ├── run_infer_310.sh │   ├── run_standalone_train_cpu.sh │   └── run_standalone_train.sh ├── src │   ├── data │   │   ├── build_seg_data.py │   │   ├── dataset.py │   │   ├── get_dataset_lst.py │   │   └── __init__.py │   ├── __init__.py │   ├── loss │   │   ├── __init__.py │   │   └── loss.py │   ├── nets │   │   ├── deeplab_v3 │   │   │   ├── deeplab_v3.py │   │   │   └── __init__.py │   │   ├── __init__.py │   │   └── net_factory.py │   ├── tools │   │   ├── get_multicards_json.py │   │   └── __init__.py │   └── utils │   ├── __init__.py │   └── learning_rates.py └── train.py 15 directories, 64 files ``` ### 1.2 准备开发环境 ```shell pip3 install -r requirements.txt ``` ## 2.数据准备 ### 2.1 下载数据集 > 数据集下载地址 > > - Pascal VOC数据集 > - 主页地址 http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html > - 下载地址 http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar > > - 语义边界数据集 > - 主页地址 http://home.bharathh.info/pubs/codes/SBD/download.html > - 下载地址 http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz > > 注意事项 > > - 如果使用wget下载速度慢,可以使用迅雷等下载工具下载完成后再上传到服务器。 #### 2.1.1 创建原始数据保存目录,并下载数据集 ```shell mkdir raw_data && cd raw_data wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar wget http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz ``` #### 2.1.2 检测数据集MD5(可跳过) ```shell md5sum benchmark.tgz VOCtrainval_11-May-2012.tar ``` > 会输出如下内容 > > ```shell > 82b4d87ceb2ed10f6038a1cba92111cb benchmark.tgz > 6cd6e144f989b92b3379bac3b3de84fd VOCtrainval_11-May-2012.tar > ``` #### 2.1.3 解压数据集 ```shell tar zxvf benchmark.tgz tar xvf VOCtrainval_11-May-2012.tar ``` #### 2.1.4 查看数据集目录结构 ```shell tree -d benchmark_RELEASE/ ``` > 会输出如下内容 > > ```shell > benchmark_RELEASE/ > ├── benchmark_code_RELEASE > │   ├── cp_src > │   └── demo > │   ├── datadir > │   │   ├── cls > │   │   ├── img > │   │   └── inst > │   ├── indir > │   └── outdir > └── dataset > 
├── cls > ├── img > └── inst > ``` ```shell tree -d VOCdevkit ``` > 会输出如下内容 > > ```shell > VOCdevkit > └── VOC2012 > ├── Annotations > ├── ImageSets > │   ├── Action > │   ├── Layout > │   ├── Main > │   └── Segmentation > ├── JPEGImages > ├── SegmentationClass > └── SegmentationObject > ``` #### 2.1.5 生成数据清单文件 > raw_data下生成三个新文件`voc_train_lst.txt`,`voc_val_lst.txt`,`vocaug_train_lst.txt` ```shell cd .. python3 src/data/get_dataset_lst.py --data_dir ./raw_data ``` > 会输出如下内容 > > ```shell > Data dir is: ./raw_data > converting voc color png to gray png ... > converting done. > generating voc train list success. > generating voc val list success. > converting sbd annotations to png ... > converting done > generating voc train aug list success. > ``` #### 2.1.6 将数据集转换为MindRecords 创建保存目录 ```shell mkdir vocaug_mindrecords mkdir voctrain_mindrecords mkdir vocval_mindrecords ``` 转换`vocaug_train`数据 ```shell # 注意data_root和dst_path python3 src/data/build_seg_data.py --data_root ./ --data_lst ./raw_data/vocaug_train_lst.txt --dst_path ./vocaug_mindrecords/mindrecord_ --num_shards 8 --shuffle True ``` > 会输出如下内容 > > ```shell > number of samples: 10582 > number of samples written: 1000 > number of samples written: 2000 > number of samples written: 3000 > number of samples written: 4000 > number of samples written: 5000 > number of samples written: 6000 > number of samples written: 7000 > number of samples written: 8000 > number of samples written: 9000 > number of samples written: 10000 > number of samples written: 10582 > ``` > > 可以使用`tree vocaug_mindrecords/`,查看转换后的数据目录,输出如下内容 > > ```shell > vocaug_mindrecords/ > ├── mindrecord_0 > ├── mindrecord_0.db > ├── mindrecord_1 > ├── mindrecord_1.db > ├── mindrecord_2 > ├── mindrecord_2.db > ├── mindrecord_3 > ├── mindrecord_3.db > ├── mindrecord_4 > ├── mindrecord_4.db > ├── mindrecord_5 > ├── mindrecord_5.db > ├── mindrecord_6 > ├── mindrecord_6.db > ├── mindrecord_7 > └── mindrecord_7.db > ``` 依次转换`voc_train`和`voc_val`数据集 ```shell 
python3 src/data/build_seg_data.py --data_root ./ --data_lst ./raw_data/voc_train_lst.txt --dst_path ./voctrain_mindrecords/mindrecord_ --num_shards 8 --shuffle True
python3 src/data/build_seg_data.py --data_root ./ --data_lst ./raw_data/voc_val_lst.txt --dst_path ./vocval_mindrecords/mindrecord_ --num_shards 8 --shuffle True
```

## 3. 模型训练

### 3.1 预训练模型下载

```shell
wget https://download.mindspore.cn/model_zoo/r1.2/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
```

### 3.2 GPU训练支持

> deeplabv3目前只支持CPU和Ascend,需要增加GPU支持。虽然笔者使用的机器有4张GPU,为保险起见,仅修改代码支持单机单卡GPU。

#### 3.2.1 代码备份

```shell
cp train.py train.py.bak
cp default_config.yaml default_config.yaml.bak
```

#### 3.2.2 代码修改

修改原`train.py`文件中109到113行为如下内容:

> 注意:读者需要根据GPU实际显存,调整max_device_memory参数。

```python
if args.device_target == "CPU":
    context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target="CPU")
elif args.device_target == "GPU":
    context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target="GPU",
                        device_id=get_device_id(), max_device_memory="16GB")
else:
    context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target="Ascend",
                        device_id=get_device_id())
```

修改原`train.py`文件中192行为如下内容:

```python
if args.device_target == "Ascend":
    amp_level = "O3"
else:  # CPU GPU
    amp_level = "O0"
```

修改原`train.py`中205行为如下内容:

```python
model.train(args.train_epochs, dataset, callbacks=cbs,
            dataset_sink_mode=(args.device_target == "Ascend"))
```

修改原`default_config.yaml`中第11行内容为:

```yaml
device_target: "GPU"  # ['Ascend', 'CPU', 'GPU']
```

### 3.3 使用VOCaug数据集训练s16,微调ResNet-101预训练模型

> 注意:
>
> - --data_file
>   - If dataset_file is a str, it represents for a file name of one component of a mindrecord source, other files with identical source in the same path will be found and loaded automatically.
> > - 读者需要根据实际显存情况调整batch_size。具体大小可参考如下。 > > | GPU_memory_size | batch_size | > | --------------- | ---------- | > | 8GB | 4 | > | 16GB | 16 | 步骤如下 进入`deeplabv3`项目根目录,创建`ckpt`文件夹用来保存模型参数。 ```shell mkdir ms_log mkdir -p s16_aug_train_1g/ckpt ``` 设置指定GPU可见 > 单卡GPU机器可跳过本步骤 ```shell export CUDA_VISIBLE_DEVICES=1 ``` 检测指定GPU是否生效 ```shell echo $CUDA_VISIBLE_DEVICES ``` > 会输出如下内容 > > ```shell > 1 > ``` 使用如下命令进行GPU训练 ```shell nohup python3 train.py --train_dir s16_aug_train_1g/ckpt --data_file ./vocaug_mindrecords/mindrecord_0 --device_target GPU --train_epochs=100 --batch_size=16 --crop_size=513 --base_lr=0.015 --lr_type=cos --min_scale=0.5 --max_scale=2.0 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s16 --ckpt_pre_trained=./resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt --save_steps=1500 --keep_checkpoint_max=20 > ms_log/s16_aug_train_1g.log 2>&1 & ``` 训练过程中,可使用如下命令检测GPU使用情况 ```shell watch -n 0.1 -d nvidia-smi ``` 下图为笔者截取了20s左右时间的GPU使用情况。 ![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/150529ksndbusnpv86dofi.gif) <center>图 GPU使用情况</center> ### 3.4 使用VOCaug数据集训练s8,微调上一步的模型 > 注意: > > - ckpt_pre_trained需要替换为上一步训练的ckpt文件地址 > - 例如:`./s16_aug_train_1g/ckpt/deeplab_v3_s16-7_534.ckpt` 步骤如下 ```shell mkdir -p s8_aug_train_1g/ckpt nohup python3 train.py --train_dir=s8_aug_train_1g/ckpt --data_file=./vocaug_mindrecords/mindrecord_0 --device_target=GPU --train_epochs=200 --batch_size=8 --crop_size=513 --base_lr=0.02 --lr_type=cos --min_scale=0.5 --max_scale=2.0 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s8 --loss_scale=2048 --ckpt_pre_trained=./s16_aug_train_1g/ckpt/deeplab_v3_s16-200_661.ckpt --save_steps=1000 --keep_checkpoint_max=20 > ms_log/s8_aug_train_1g.log 2>&1 & ``` ### 3.5 使用VOCtrain数据集训练s8,微调上一步的模型 > 注意: > > - ckpt_pre_trained需要替换为上一步训练的ckpt文件地址 > - 例如:`./s8_aug_train_1g/ckpt/deeplab_v3_s8-7_534.ckpt` 步骤如下 ```shell mkdir -p s8_voc_train_1g/ckpt nohup python3 train.py --train_dir=s8_voc_train_1g/ckpt 
--data_file=./voctrain_mindrecords/mindrecord_0 --device_target GPU --train_epochs=200 --batch_size=8 --crop_size=513 --base_lr=0.008 --lr_type=cos --min_scale=0.5 --max_scale=2.0 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s8 --loss_scale=2048 --ckpt_pre_trained=s8_aug_train_1g/ckpt/deeplab_v3_s8-200_1322.ckpt --save_steps=50 --keep_checkpoint_max=200 > ms_log/s8_voc_train_1g.log 2>&1 & ```
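附:上述各训练命令均使用了`--lr_type=cos`。下面用纯Python给出余弦退火学习率的简化示意,帮助理解该参数的含义(`cosine_lr`为笔者为演示自拟的函数,并非`src/utils/learning_rates.py`的原始实现):

```python
import math

def cosine_lr(base_lr, total_steps, step):
    """余弦退火:学习率从 base_lr 随训练步数平滑衰减,趋近于 0。"""
    return base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

# 例如 base_lr=0.015(对应 3.3 节的 s16 训练),总步数取 100 演示
lrs = [cosine_lr(0.015, 100, s) for s in range(100)]
print(lrs[0], lrs[-1])  # 初始为 base_lr,末期接近 0
```

可以看到学习率在训练后期衰减得更慢更小,这正是`cos`调度有利于收敛后期微调的原因。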
  • [安装经验] MindSpore入门--基于Ascend服务器安装MindSpore 1.5.0
# MindSpore Install On Ascend Server

> 基于Ascend服务器安装MindSpore 1.5.0

本文开发环境

> - Ubuntu 18.04 x86_64
> - Ascend 310
> - Python 3.7.5

本文内容摘要

> - 购买服务器
> - 环境配置
> - Mindspore安装
> - Mindspore测试
> - 总结
> - 问题
> - 参考

## 0. 缘起

笔者在参加`MindCon`比赛过程中,需要用到`Ascend`处理器来跑一些推理步骤。但是当开通相应服务器及使用相应镜像后,在测试MindSpore是否安装成功的环节,出现了如下错误:

```shell
./tensor_add_sample: error while loading shared libraries: libopskernel.so: cannot open shared object file: No such file or directory
```

这使我意识到Ascend的环境配置并没有想象中那么简单,也许大家也有相关的困扰,故而整理这篇文章供大家参考借鉴。

## 1. 购买服务器

> 购买昇腾AI云服务器
>
> 注意:
>
> - 系统镜像选择`Ubuntu 18.04 server 64bit for Ai1s`
> - 规格选择AI加速型
> - 默认购买的服务器没有开通公网,需要购买一个弹性公网IP进行绑定

前往[昇腾AI云服务器](https://www.huaweicloud.com/product/ecs/ascend.html),按自己需要购买合适服务器。新手可以选择按需模式,节省资金。

本文购买的服务器型号如下图所示。

![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/1453230kw7w4lmw2pmqqys.png)

之后根据提示配置网络等,开通服务器。

## 2. 环境配置

### 2.1 更新系统

```shell
sudo apt update
sudo apt upgrade
sudo apt autoremove
sudo apt autoclean
sudo apt clean
```

### 2.2 创建目录

> 创建目录,用于存储后续的虚拟环境和代码

```shell
mkdir pyenvs
mkdir codes
```

### 2.3 更新cmake

> 系统自带的`cmake`版本较低,无法满足后续的一些操作,故升级。
>
> 注意事项:
>
> - 在服务器端使用`wget`下载`cmake`源码可能速度太慢,建议本地下载好之后上传到服务器

```shell
sudo apt autoremove cmake
cd ~/codes
wget https://github.com/Kitware/CMake/archive/refs/tags/v3.20.1.tar.gz
# 若下载过慢,可在本地下载后上传:scp v3.20.1.tar.gz root@124.71.78.174:/root/codes/
tar -zxvf v3.20.1.tar.gz
cd CMake-3.20.1/
./bootstrap
make
sudo make install
```

### 2.4 安装gflags

> 后续的一些编译工作需要用到`gflags`

```shell
cd ~/codes
wget https://github.com/gflags/gflags/archive/refs/tags/v2.2.2.tar.gz
# 若下载过慢,可在本地下载后上传:scp v2.2.2.tar.gz root@124.71.78.174:/root/codes/
tar -zxvf v2.2.2.tar.gz
cd gflags-2.2.2/
mkdir build && cd build
export CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0
cmake ..
-DBUILD_SHARED_LIBS=ON make -j 10 sudo make install ``` ### 2.5 更新Ascend驱动及工具包 > 本文因条件限制采用社区版 > > 社区版下载地址如下: > > [Ascend驱动下载地址](https://www.hiascend.com/hardware/firmware-drivers?tag=community) > > [Ascend工具包下载地址](https://www.hiascend.com/software/cann/community) > > 注意事项: > > - 如果具备商业版下载条件,建议下载商业版。 > - [商业版驱动下载地址](https://www.hiascend.com/hardware/firmware-drivers?tag=commercial) > - [商业版工具包下载地址](https://www.hiascend.com/zh/software/cann/commercial) #### 2.5.1 Ascend相关文件下载 > 注意事项: > > - 建议本地下载完成相应文件后,上传到服务器 驱动文件如下图所示。 ![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/145406pqxlv4cmcw2zdrqn.png) 工具包文件如下图所示 ![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/1454202hcnmczekxzzl1ns.png) #### 2.5.2 驱动更新 > 本文上传到了服务器`root`账户的`install`文件夹下 进入安装文件所在目录 ```shell cd ~/install ``` 安装驱动 ```shell # wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/5.0.4.alpha003/Ascend-cann-toolkit_5.0.4.alpha003_linux-x86_64.run chmod a+x A300-3010-npu-driver_21.0.3.1_linux-x86_64.run sudo ./A300-3010-npu-driver_21.0.3.1_linux-x86_64.run --upgrade ``` 驱动安装输出如下内容 ```shell Verifying archive integrity... 100% SHA256 checksums are OK. All good. Uncompressing npu-driver-run-package 100% [Driver] [2021-12-16 16:12:38] [INFO]Start time: 2021-12-16 16:12:38 [Driver] [2021-12-16 16:12:38] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log [Driver] [2021-12-16 16:12:38] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log [Driver] [2021-12-16 16:12:38] [INFO]base version is 20.1.0. [Driver] [2021-12-16 16:12:38] [WARNING]Do not power off or restart the system during the installation/upgrade [Driver] [2021-12-16 16:12:38] [INFO]set username and usergroup, HwHiAiUser:HwHiAiUser deleting install files... remove install files successfully! deleting installed folders... remove install folders successfully! 
[Driver] [2021-12-16 16:12:45] [INFO]driver install type: DKMS [Driver] [2021-12-16 16:12:45] [INFO]upgradePercentage:10% [Driver] [2021-12-16 16:12:48] [INFO]upgradePercentage:30% [Driver] [2021-12-16 16:12:48] [INFO]upgradePercentage:40% [Driver] [2021-12-16 16:13:35] [INFO]upgradePercentage:90% [Driver] [2021-12-16 16:13:35] [INFO]upgradePercentage:100% [Driver] [2021-12-16 16:13:35] [INFO]Driver package upgrade success! Reboot needed for installation/upgrade to take effect! [Driver] [2021-12-16 16:13:35] [INFO]End time: 2021-12-16 16:13:35 ``` #### 2.5.3 工具包更新 配置python环境 ```shell cd /usr/local/bin ln -s /usr/local/python3.7.5/bin/pip3 ``` 安装工具包 ```shell cd ~/install chmod a+x Ascend-cann-toolkit_5.0.4.alpha003_linux-x86_64.run sudo ./Ascend-cann-toolkit_5.0.4.alpha003_linux-x86_64.run --upgrade ``` 工具包安装输出如下内容 ```shell Verifying archive integrity... 100% SHA256 checksums are OK. All good. Uncompressing ASCEND_RUN_PACKAGE 100% [Toolkit] [20211216-16:20:33] [INFO] LogFile:/var/log/ascend_seclog/ascend_toolkit_install.log [Toolkit] [20211216-16:20:33] [INFO] upgrade start [Toolkit] [20211216-16:21:56] [INFO] Ascend-fwkacllib-1.75.22.0.220-linux.x86_64.run uninstall start [Fwkacllib] [2021-12-16 16:21:56] [INFO]: Start time:2021-12-16 16:21:56 [Fwkacllib] [2021-12-16 16:21:56] [INFO]: LogFile:/var/log/ascend_seclog/ascend_install.log [Fwkacllib] [2021-12-16 16:21:56] [INFO]: InputParams:--uninstall [Fwkacllib] [2021-12-16 16:21:56] [INFO]: uninstall /usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux full [Fwkacllib] [2021-12-16 16:22:00] [INFO]: Fwkacllib package uninstall success! Uninstallation takes effect immediately. 
[Fwkacllib] [2021-12-16 16:22:00] [INFO]: End time:2021-12-16 16:22:00 [Toolkit] [20211216-16:22:00] [INFO] rm soft link fwkacllib [Toolkit] [20211216-16:22:00] [INFO] Ascend-toolkit-1.75.22.0.220-linux.x86_64.run uninstall start [INFO] LogFile: /var/log/ascend_seclog/ascend_install.log [INFO] OperationLogFile: /var/log/ascend_seclog/operation.log [INFO] InputParams: --quiet [INFO] base version is 1.75.22.0.220. 2021-12-16,16:22:00 [INFO] [MSVP] install_profiling_hiprof.sh: Begin to uninstall profiling... 2021-12-16,16:22:00 [INFO] [MSVP] install_profiling_hiprof.sh: Begin to stop profiling... 2021-12-16,16:22:00 [INFO] [MSVP] install_profiling_hiprof.sh: Stop profiler success 2021-12-16,16:22:00 [INFO] [MSVP] install_profiling_hiprof.sh: Uninstall profiling success. deleting install files... remove install files successfully! deleting installed folders... remove install folders successfully! [INFO] Toolkit package uninstall success! Uninstallation takes effect immediately. [Toolkit] [20211216-16:22:07] [INFO] rm soft link toolkit [Toolkit] [20211216-16:22:07] [INFO] Ascend-opp-1.75.22.0.220-linux.x86_64.run uninstall start [opp] [INFO]:Execute the opp run package. [opp] [INFO]:OperationLogFile path: /var/log/ascend_seclog/ascend_install.log. [opp] [INFO]:Input params: --uninstall --install-path=/usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux --quiet [opp] [INFO]:Begin uninstall the opp module. [opp] [INFO]:Delete the ops soft link (/usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux/ops). [opp] [INFO]:Delete the installed opp source files in (/usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux). [opp] [INFO]:Delete the version info file (/usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux/opp/version.info). [opp] [INFO]:Delete the install info file (/usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux/opp/ascend_install.info). [opp] [INFO]:Opp package uninstall success! Uninstallation takes effect immediately. 
Install MAJOR root 2021-12-16 16:22:09 127.0.0.1 OPP success install_type=full; cmdlist=--uninstall --install-path=/usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux --quiet. [Toolkit] [20211216-16:22:09] [INFO] rm soft link opp [Toolkit] [20211216-16:22:09] [INFO] Ascend-atc-1.75.22.0.220-linux.x86_64.run uninstall start [Atc] [2021-12-16 16:22:09] [INFO]: Start time:2021-12-16 16:22:09 [Atc] [2021-12-16 16:22:09] [INFO]: LogFile:/var/log/ascend_seclog/ascend_install.log [Atc] [2021-12-16 16:22:09] [INFO]: InputParams:--uninstall [Atc] [2021-12-16 16:22:09] [INFO]: uninstall /usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux full [Atc] [2021-12-16 16:22:11] [INFO]: Atc package uninstall success! Uninstallation takes effect immediately. [Atc] [2021-12-16 16:22:11] [INFO]: End time:2021-12-16 16:22:11 [Toolkit] [20211216-16:22:11] [INFO] rm soft link atc [Toolkit] [20211216-16:22:11] [INFO] Ascend-pyACL-20.1.rc1-linux.x86_64.run uninstall start [Toolkit] [20211216-16:22:11] [INFO] rm soft link pyACL [Toolkit] [20211216-16:22:11] [INFO] Ascend-acllib-1.75.22.0.220-linux.x86_64.run uninstall start [Acllib] [2021-12-16 16:22:11] [INFO]: Start time:2021-12-16 16:22:11 [Acllib] [2021-12-16 16:22:11] [INFO]: LogFile:/var/log/ascend_seclog/ascend_install.log [Acllib] [2021-12-16 16:22:11] [INFO]: InputParams:--uninstall [Acllib] [2021-12-16 16:22:11] [INFO]: uninstall /usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux full [Acllib] [2021-12-16 16:22:11] [INFO]: step into run_acllib_uninstall.sh ...... [Acllib] [2021-12-16 16:22:11] [INFO]: uninstall targetdir /usr/local/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux, type full. [Acllib] [2021-12-16 16:22:12] [INFO]: Acllib package uninstall success! Uninstallation takes effect immediately. 
[Acllib] [2021-12-16 16:22:12] [INFO]: End time:2021-12-16 16:22:12 [Toolkit] [20211216-16:22:12] [INFO] rm soft link acllib_linux.x86_64 [Toolkit] [20211216-16:22:12] [INFO] rm soft link acllib [Toolkit] [20211216-16:22:12] [INFO] /usr/local/Ascend/ascend-toolkit not empty [Toolkit] [20211216-16:22:12] [INFO] toolkit uninstall success [Toolkit] [20211216-16:22:12] [INFO] process end [Toolkit] [20211216-16:22:12] [INFO] upgrade package CANN-runtime-5.0.4.alpha003-linux.x86_64.run start [Toolkit] [20211216-16:22:13] [INFO] CANN-runtime-5.0.4.alpha003-linux.x86_64.run --full --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:22:13] [INFO] upgrade package CANN-compiler-5.0.4.alpha003-linux.x86_64.run start WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv [Toolkit] [20211216-16:22:53] [INFO] CANN-compiler-5.0.4.alpha003-linux.x86_64.run --full --pylocal --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:22:53] [INFO] upgrade package CANN-opp-5.0.4.alpha003-linux.x86_64.run start [Toolkit] [20211216-16:23:16] [INFO] CANN-opp-5.0.4.alpha003-linux.x86_64.run --full --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:23:16] [INFO] upgrade package CANN-toolkit-5.0.4.alpha003-linux.x86_64.run start [Toolkit] [20211216-16:23:55] [INFO] CANN-toolkit-5.0.4.alpha003-linux.x86_64.run --full --pylocal --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:23:55] [INFO] upgrade package Ascend-mindstudio-toolkit_3.0.3.B120_linux-x86_64.run start [Toolkit] [20211216-16:23:56] [INFO] Ascend-mindstudio-toolkit_3.0.3.B120_linux-x86_64.run --full --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:23:56] [INFO] upgrade package Ascend-test-ops_5.0.3_linux.run start [Toolkit] [20211216-16:23:56] [INFO] Ascend-test-ops_5.0.3_linux.run --full --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:23:56] [INFO] upgrade package Ascend-pyACL_5.0.3_linux-x86_64.run start [Toolkit] [20211216-16:23:56] [INFO] Ascend-pyACL_5.0.3_linux-x86_64.run --full --quiet --nox11 --install-path=/usr/local/Ascend/ascend-toolkit/5.0.4.alpha003/x86_64-linux --install-for-all upgrade success [Toolkit] [20211216-16:23:57] [INFO] /etc/Ascend/ascend_cann_install.info generate success [Toolkit] 
[20211216-16:23:57] [INFO] Please make sure that:
PATH includes :
/usr/local/Ascend/ascend-toolkit/latest/bin:
/usr/local/Ascend/ascend-toolkit/latest/compiler/ccec_compiler/bin:
LD_LIBRARY_PATH includes :
/usr/local/Ascend/ascend-toolkit/latest/lib64:
/usr/local/Ascend/ascend-toolkit/latest/compiler/lib64/plugin/opskernel:
/usr/local/Ascend/ascend-toolkit/latest/compiler/lib64/plugin/nnengine:
PYTHONPATH includes :
/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:
/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe:
ASCEND_AICPU_PATH includes :
/usr/local/Ascend/ascend-toolkit/latest:
ASCEND_OPP_PATH includes :
/usr/local/Ascend/ascend-toolkit/latest/opp:
TOOLCHAIN_HOME includes :
/usr/local/Ascend/ascend-toolkit/latest/toolkit:
ASCEND_AUTOML_PATH includes :
/usr/local/Ascend/latest/tools:
[Toolkit] [20211216-16:23:57] [INFO] If your service is started using the shell script, you can call the /usr/local/Ascend/ascend-toolkit/set_env.sh script to configure environment variables. Note that this script can not be executed mannually.
[Toolkit] [20211216-16:23:57] [INFO] Ascend-cann-toolkit_5.0.4.alpha003_linux-x86_64 upgrade success,The install path is /usr/local/Ascend !
```

#### 2.5.4 设置环境变量

> 将新版本工具包生成的环境变量文件复制到指定目录,便于使用

```shell
mkdir ~/scripts
cp /usr/local/Ascend/ascend-toolkit/set_env.sh ~/scripts/
```

#### 2.5.5 安装amct工具(可跳过)

> 在使用Ascend服务器和MindSpore进行量化推理时,会用到该工具。但笔者尚未在社区版找到可用版本,仅商业版支持。
>
> 下面仅给出相关工具包下载地址,安装过程暂略。

[Ascend-cann-amct_1.76.22.10.220_ubuntu18.04-x86_64.tar.gz](https://support.huawei.com/enterprise/zh/software/251707127-ESW2000346859)

[上述文件说明文档](https://support.huawei.com/enterprise/en/doc/EDOC1100206689/d6f46f4b/installation)

[商业版见参考2]()

## 3.
MindSpore安装

### 3.1 设置python环境

> 主要设置pip和virtualenv

```shell
pip3 install --upgrade pip
pip3 config set global.index-url https://mirror.baidu.com/pypi/simple
pip3 install virtualenv
```

### 3.2 安装mindspore

```shell
cd ~/pyenvs
virtualenv -p python3 env_mindspore_ascend_1.5.0
source env_mindspore_ascend_1.5.0/bin/activate
pip3 list
Package         Version
--------------- -------
auto-tune       0.1.0
hccl            0.1.0
op-gen          0.1
op-test-frame   0.1
pip             21.3.1
schedule-search 0.0.1
setuptools      58.3.0
te              0.4.0
topi            0.4.0
wheel           0.37.0
pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/MindSpore/ascend/x86_64/mindspore_ascend-1.5.0-cp37-cp37m-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## 4. MindSpore测试

### 4.1 安装缺失环境

```shell
pip3 install decorator
pip3 install sympy
```

### 4.2 测试是否安装成功

下载测试代码并编译

```shell
cd ~/codes
wget --no-check-certificate https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/sample_resources/ascend310_single_op_sample.zip
unzip ascend310_single_op_sample.zip
cd ascend310_single_op_sample
cmake . -DMINDSPORE_PATH=`pip3 show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`
make
```

测试命令如下

```shell
./tensor_add_sample
```

如果安装成功,会输出如下内容

```shell
3 5 7 9
```

## 总结

本文主要介绍了在`Ascend`服务器上配置安装`MindSpore`,及对其进行简单测试的相关操作。

## 问题

暂无

## 参考

1. [弹性云服务器--AI加速型](https://support.huaweicloud.com/productdesc-ecs/ecs_01_0047.html)
2. [amct量化工具包中没有sample文件](https://toscode.gitee.com/ascend/modelzoo/issues/I4HA8A)
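附:安装whl包时,文件名中的`cp37`等标签必须与当前Python版本匹配,否则pip会直接拒绝安装。下面用纯Python演示如何从wheel文件名解析该标签(示意代码,`wheel_python_tag`为笔者自拟的辅助函数):

```python
import re
import sys

def wheel_python_tag(wheel_name):
    """从 wheel 文件名中解析 cpXY 标签,返回 (X, Y) 形式的Python版本号。"""
    m = re.search(r"-cp(\d)(\d+)-", wheel_name)
    return (int(m.group(1)), int(m.group(2))) if m else None

name = "mindspore_ascend-1.5.0-cp37-cp37m-linux_x86_64.whl"
print(wheel_python_tag(name))                          # (3, 7)
print(wheel_python_tag(name) == sys.version_info[:2])  # 当前解释器是否可安装该包
```

这也解释了本文为何使用Python 3.7.5:上述whl包只提供了cp37版本。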
  • [安装经验] MindSpore入门--基于GPU服务器安装MindSpore 1.5.0
# MindSpore Install On GPU Server

> 基于GPU服务器安装MindSpore 1.5.0

本文开发环境如下:

> - Ubuntu Server 20.04 x86_64
> - Python 3.8.10
> - Cuda 11.1.0
> - RTX 3090 * 4
> - MindSpore 1.5.0

本文主要内容如下:

> - 系统安装(略过)
> - GPU环境配置
> - Mindspore安装及测试
> - Mindspore Serving安装及测试
> - 问题
> - 参考

## 0. 系统安装

> `Ubuntu Server 20.04`安装过程略

## 1. GPU环境配置

### 1.1 NVIDIA驱动配置

> 注意事项:
>
> - NVIDIA驱动版本:NVIDIA-Linux-x86_64-470.86.run
> - 关于dkms:由于采用run格式文件安装,为避免系统内核升级重启后需要重新安装驱动,这里引入dkms模块。

```shell
sudo apt install dkms
wget -c https://cn.download.nvidia.com/XFree86/Linux-x86_64/470.86/NVIDIA-Linux-x86_64-470.86.run
chmod a+x NVIDIA-Linux-x86_64-470.86.run
sudo ./NVIDIA-Linux-x86_64-470.86.run
```

具体安装过程如下图1.1所示。

![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/143929dsepoq1p9yqi50d0.gif)

<center>图1.1 NVIDIA驱动安装过程</center>

安装完成后,使用命令`nvidia-smi`查看是否安装成功。如果安装成功,会输出类似如下内容。

```shell
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:24:00.0 Off | N/A |
| 30% 34C P0 103W / 350W | 0MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:41:00.0 Off | N/A |
| 30% 35C P0 107W / 350W | 0MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:81:00.0 Off | N/A |
| 30% 34C P0 103W / 350W | 0MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ...
Off | 00000000:E1:00.0 Off | N/A | | 30% 34C P0 103W / 350W | 0MiB / 24268MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+ ``` ### 1.2 CUDA环境配置 #### 1.2.1 安装CUDA ```shell wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run chmod a+x cuda_11.1.0_455.23.05_linux.run sudo ./cuda_11.1.0_455.23.05_linux.run ``` 具体安装过程如下图1.2所示。 ![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/144021oxv0n47jiw4gcpdv.gif)图1.2 CUDA 11.1.0 安装过程安装成功后输出如下内容: ```shell =========== = Summary = =========== Driver: Not Selected Toolkit: Installed in /usr/local/cuda-11.1/ Samples: Not Selected Please make sure that - PATH includes /usr/local/cuda-11.1/bin - LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.1 functionality to work. 
To install the driver using this installer, run the following command, replacing with the name of this run file:
sudo .run --silent --driver
Logfile is /var/log/cuda-installer.log
```

#### 1.2.2 配置环境变量

> 添加`CUDA`环境变量的常规方法是添加在`/etc/profile`或用户目录下的`.bashrc`文件中,这样每次登录终端便会自动生效。
>
> 但如果需要在多个深度学习框架之间切换,且各个框架需要的`CUDA`环境不同,上述方法就不太方便。
>
> 因此,本文将`CUDA`添加到特定的bash文件中,需要使用某个`CUDA`环境时,直接`source`即可。

```shell
vim ~/env_cuda_11.1.0.sh
```

使用上述命令在用户目录下创建`bash`文件,写入如下内容。

> 注意:`bash`文件名可以自定义。

```shell
#!/bin/bash
# cuda 11.1.0

###########################
##      cuda 11.1.0      ##
###########################
export PATH="/usr/local/cuda-11.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH"
```

如果要使用本文`CUDA`环境,只需要执行以下命令即可。

```shell
source ~/env_cuda_11.1.0.sh
```

使用`nvcc -V`命令检测`CUDA`环境是否生效。生效后输出如下内容。

```shell
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
```

### 1.3 cuDNN环境配置

> 注意事项:`cuDNN`文件下载需要注册NVIDIA账号。

从`cuDNN`官网下载相应包,并上传到服务器。需要下载的文件如下图所示。

![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/144341zasa1wnlwyqlhzwo.png)

然后使用如下命令进行安装。

```shell
tar -zxvf cudnn-11.1-linux-x64-v8.0.5.39.tgz
cd cuda
sudo cp include/*.h /usr/local/cuda-11.1/include/
sudo cp -d lib64/libcudnn* /usr/local/cuda-11.1/lib64/
```

如果需要查看解压内容,可在解压后`cuda`目录下使用命令`tree`。

```shell
.
├── include │   ├── cudnn_adv_infer.h │   ├── cudnn_adv_train.h │   ├── cudnn_backend.h │   ├── cudnn_cnn_infer.h │   ├── cudnn_cnn_train.h │   ├── cudnn.h │   ├── cudnn_ops_infer.h │   ├── cudnn_ops_train.h │   └── cudnn_version.h ├── lib64 │   ├── libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8 │   ├── libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.0.5 │   ├── libcudnn_adv_infer.so.8.0.5 │   ├── libcudnn_adv_train.so -> libcudnn_adv_train.so.8 │   ├── libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.0.5 │   ├── libcudnn_adv_train.so.8.0.5 │   ├── libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8 │   ├── libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.0.5 │   ├── libcudnn_cnn_infer.so.8.0.5 │   ├── libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8 │   ├── libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.0.5 │   ├── libcudnn_cnn_train.so.8.0.5 │   ├── libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8 │   ├── libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.0.5 │   ├── libcudnn_ops_infer.so.8.0.5 │   ├── libcudnn_ops_train.so -> libcudnn_ops_train.so.8 │   ├── libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.0.5 │   ├── libcudnn_ops_train.so.8.0.5 │   ├── libcudnn.so -> libcudnn.so.8 │   ├── libcudnn.so.8 -> libcudnn.so.8.0.5 │   ├── libcudnn.so.8.0.5 │   └── libcudnn_static.a └── NVIDIA_SLA_cuDNN_Support.txt ``` ### 1.4 NCCL环境配置 > 注意事项: > > - 如果非多GPU环境,可跳过本步骤。 > > - `NCCL`文件下载需要注册NVIDIA账号。 从`NCCL`官网下载相应包,并上传到服务器。需要下载的文件如下图所示。 ![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/144114y5oijmwvmn4xi21p.png) 然后使用如下命令进行安装。 ```shell tar -xvf nccl_2.8.4-1+cuda11.1_x86_64.txz cd nccl_2.8.4-1+cuda11.1_x86_64 sudo cp include/*.h /usr/local/cuda-11.1/include/ sudo cp -d -r lib/* /usr/local/cuda-11.1/lib64/ ``` 如果需要查看解压内容,可在解压后`nccl_2.8.4-1+cuda11.1_x86_64`目录下使用命令`tree`。 ```shell . 
├── include
│   ├── nccl.h
│   └── nccl_net.h
├── lib
│   ├── libnccl.so -> libnccl.so.2
│   ├── libnccl.so.2 -> libnccl.so.2.8.4
│   ├── libnccl.so.2.8.4
│   ├── libnccl_static.a
│   └── pkgconfig
│       └── nccl.pc
└── LICENSE.txt
```

### 1.5 TensorRT环境配置

> 注意事项:
>
> - 如果不使用`MindSpore Serving`推理服务,可跳过本步骤。
> - 本文采用tar文件安装方式,需要先注册NVIDIA账号下载相应文件。

从`TensorRT`官网下载相应包,并上传到服务器。需要下载的文件如下图所示。

![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/144219cx7r9uhbipppqscn.png)

然后使用如下命令进行安装。

- 基础依赖库安装。

```shell
tar zxvf TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz
mkdir ~/tensorrt
mv TensorRT-7.2.2.3 ~/tensorrt/
```

在1.2.2中的`env_cuda_11.1.0.sh`文件中增加`tensorrt`环境。内容如下。

```shell
# ${TensorRT-path}为变量,需要替换为你的真实目录。
# 例如:LD_LIBRARY_PATH=/home/ubuntu/tensorrt/TensorRT-7.2.2.3/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${TensorRT-path}/lib:$LD_LIBRARY_PATH
```

- python包安装

> 建议:在完成Mindspore虚拟环境创建并激活后安装

```shell
cd ~/tensorrt/TensorRT-7.2.2.3/
pip3 install python/tensorrt-7.2.2.3-cp38-none-linux_x86_64.whl
pip3 install uff/uff-0.6.9-py2.py3-none-any.whl
pip3 install graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
pip3 install onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
```

## 2. MindSpore安装及测试

#### 2.1 安装`Python`基础环境

> 如果已经安装,可跳过本步骤。

```shell
sudo apt update
sudo apt install python3 python3-dev python3-pip
sudo pip3 install virtualenv
```

可以使用`python3 -V`命令检查安装的`python`版本。本文版本信息如下。

```shell
Python 3.8.10
```

#### 2.2.
安装`MindSpore` - 创建虚拟环境 ```shell virtualenv -p python3 env_mindspore_1.5.0 ``` 输出内容如下 ```shell created virtual environment CPython3.8.10.final.0-64 in 292ms creator CPython3Posix(dest=/mnt/data_0301_12t/xingchaolong/home/pyenvs/env_mindspore_1.5.0, clear=False, no_vcs_ignore=False, global=False) seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/xingchaolong/.local/share/virtualenv) added seed packages: pip==21.3.1, setuptools==58.3.0, wheel==0.37.0 activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator ``` - 激活虚拟环境 ```shell source env_mindspore_1.5.0/bin/activate ``` - 激活`CUDA`环境 ```shell source ~/env_cuda_11.1.0.sh ``` - 安装`mindspore` ```shell pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/MindSpore/gpu/x86_64/cuda-11.1/mindspore_gpu-1.5.0-cp38-cp38-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple ``` #### 2.3 测试`Mindspore` 使用如下命令测试是否安装成功。 ```shell python3 -c "import mindspore;mindspore.run_check()" ``` 如果安装成功,输出如下内容。 ```shell MindSpore version: 1.5.0 The result of multiplication calculation is correct, MindSpore has been installed successfully! ``` ## 3. MindSpore Serving安装及测试 > 注意事项: > > - MindSpore Serving目前并未安装成功,以下步骤仅供参考。 #### 3.1 安装`MindSpore Serving` ```shell pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/Serving/x86_64/mindspore_serving-1.5.0-cp38-cp38-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple ``` #### 3.2 测试`MindSpore Serving` ```shell python3 -c "from mindspore_serving import server" ``` 如果没有报错,则表示安装成功。 ## 4. 
总结 本文介绍了Ubuntu Server 20.04环境下,基于英伟达GPU硬件配置相关环境,安装MindSpore及对其测试。 ## 问题 - MindSpore Serving测试时出现以下问题 ```shell free(): invalid pointer Aborted (core dumped) ``` 目前官方已修复该问题。 ## 参考 - [DKMS简介](http://abcdxyzk.github.io/blog/2020/09/21/kernel-dkms/) - [TensorRT安装](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar) - [安装MindSpore](https://www.mindspore.cn/install) - [安装MindSpore Serving](https://www.mindspore.cn/serving/docs/zh-CN/r1.5/serving_install.html)
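附:1.2.2节"按需source"方案的核心,是把目标CUDA目录前置到`PATH`/`LD_LIBRARY_PATH`中,使其先于系统默认路径被搜索。下面用纯Python模拟这一前置逻辑(示意代码,目录仅为举例):

```python
def prepend_path(env_value, new_dir):
    """模拟 export PATH="new_dir:$PATH" 的效果:前置新目录,并去掉已存在的重复项。"""
    parts = [p for p in env_value.split(":") if p and p != new_dir]
    return ":".join([new_dir] + parts)

path = "/usr/local/bin:/usr/bin"
path = prepend_path(path, "/usr/local/cuda-11.1/bin")
print(path)  # /usr/local/cuda-11.1/bin:/usr/local/bin:/usr/bin
```

与shell里反复`export`会让`PATH`不断变长不同,这里做了去重;切换CUDA版本时,只需`source`另一个环境文件即可。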
  • [安装经验] MindSpore进阶--基于macOS编译MindQuantum
# build mindquantum on macos

> 基于macOS编译MindQuantum

本文开发环境

> MacBook Pro (16-inch, 2019)
>
> macOS Catalina 10.15.7 (19H1615)
>
> Python 3.9.0
>
> MindSpore 1.6.0
>
> cmake 3.22.2
>
> libomp 13.0.1

本文主要内容

- 环境准备
- 软件编译
- 安装测试
- 本文总结
- 问题思路
- 本文参考

## 1. 环境准备

### 1.1 python 3.9.0

> - 本文安装的是Python 3.9.0
>   - [下载地址](https://www.python.org/downloads/release/python-390/)
> - 安装完成后,安装目录为`/Library/Frameworks/Python.framework/Versions/3.9/`

具体所需文件如下图所示,下载后点击安装即可。

![](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202202/10/115842m3o7koriaguux6lr.png)

安装完成后,可以使用如下命令检测是否安装成功:

```shell
python3.9 --version
```

输出如下内容,则表示成功。

```shell
Python 3.9.0
```

### 1.2 brew

#### 1.2.1 安装brew

> 使用`brew`安装的相关软件及库会存放在`/usr/local/Cellar/`文件夹下。

安装命令如下:

```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew update
```

#### 1.2.2 安装cmake

安装命令如下:

```shell
brew install cmake
```

校验安装版本:

```shell
cmake --version
```

> 会输出如下版本内容:
>
> ```shell
> cmake version 3.22.2
>
> CMake suite maintained and supported by Kitware (kitware.com/cmake).
> ```

#### 1.2.3 Install libomp

> The library installs to `/usr/local/Cellar/libomp/13.0.1`.

Install it with:

```shell
brew install libomp
```

You can inspect the library contents with the `tree` command.

> `tree` can be installed with `brew install tree`.

```shell
tree /usr/local/Cellar/libomp/13.0.1
```

> It prints the following:
>
> ```shell
> /usr/local/Cellar/libomp/13.0.1
> ├── INSTALL_RECEIPT.json
> ├── LICENSE.TXT
> ├── README.rst
> ├── include
> │   ├── omp-tools.h
> │   ├── omp.h
> │   └── ompt.h
> └── lib
>     ├── libomp.a
>     └── libomp.dylib
>
> 2 directories, 8 files
> ```

#### 1.2.4 Install MindSpore

##### 1.2.4.1 Install virtualenv

Omitted.

##### 1.2.4.2 Install MindSpore

Create and activate a virtual environment:

```shell
mkdir ~/pyenvs && cd pyenvs  # create and enter the virtualenv directory
virtualenv -p /Library/Frameworks/Python.framework/Versions/3.9/bin/python3.9 env_mindquantum
source env_mindquantum/bin/activate  # activate the virtual environment
```

Install `MindSpore`:

```shell
pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.0/MindSpore/cpu/x86_64/mindspore-1.6.0-cp39-cp39-macosx_10_15_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Check whether `MindSpore` installed successfully:

```shell
python3 -c "import mindspore;mindspore.run_check()"
```

> If it prints the following, the installation succeeded:
>
> ```shell
> MindSpore version: 1.6.0
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> [WARNING] DEBUG(11665,0x114151dc0,Python):2022-02-10-11:00:55.264.923 [mindspore/ccsrc/debug/debugger/debugger.cc:95] Debugger] Not enabling debugger. Debugger does not support CPU.
> The result of multiplication calculation is correct, MindSpore has been installed successfully!
> ```

## 2. Building the software

> Note:
>
> - All of the build steps below are performed with the virtual environment above activated!
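The `omp.h` location installed above is exactly the path that section 2.3 will hard-code into `utils.h`. The following is a sketch of locating and patching it automatically; the `find`/`sed` approach and paths are my own assumptions, not part of the original steps:

```shell
# Sketch (assumed paths): find the omp.h that brew's libomp installed and
# rewrite the bare <omp.h> include in utils.h to its absolute path.
OMP_H="$(find /usr/local/Cellar/libomp -name omp.h 2>/dev/null | head -n1)"
# Keep a backup of the original header before editing it.
cp mindquantum/src/core/utils.h mindquantum/src/core/utils.h.bak
# Replace '#include <omp.h>' with the absolute path found above.
sed -i.orig "s|# *include *<omp.h>|#include \"${OMP_H}\"|" mindquantum/src/core/utils.h
```

Editing the include line by hand, as the article does, achieves the same thing; the sketch only saves re-typing the version-specific Cellar path when libomp upgrades.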
### 2.1 Clone the repository and enter the source directory

```shell
git clone https://gitee.com/mindspore/mindquantum.git && cd mindquantum
```

### 2.2 Install the required dependencies

```shell
pip3 install -r requirements.txt
```

### 2.3 Modify part of `utils.h`

Back up the original source file first:

```shell
cp mindquantum/src/core/utils.h mindquantum/src/core/utils.h.bak
```

Change line 21 of `utils.h` to include the following path:

```shell
/usr/local/Cellar/libomp/13.0.1/include/omp.h
```

> The path to use here is simply the absolute path of `omp.h` inside the `libomp` installed in 1.2.3.

### 2.4 Build

```shell
bash build.sh
```

If the build succeeds, the output looks like this:

```shell
......
running install
running install_lib
creating build/bdist.macosx-10.9-x86_64
creating build/bdist.macosx-10.9-x86_64/wheel
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/core
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/core/gates
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/core/parameterresolver
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/core/operators
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/core/circuit
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/core/third_party
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/framework
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/io
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/io/qasm
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/io/display
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/utils
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/algorithm
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/algorithm/library
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/algorithm/nisq
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/algorithm/nisq/qaoa
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/algorithm/nisq/qnn
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/algorithm/nisq/chem
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/third_party
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/simulator
creating build/bdist.macosx-10.9-x86_64/wheel/mindquantum/engine
running install_egg_info
Copying mindquantum.egg-info to build/bdist.macosx-10.9-x86_64/wheel/mindquantum-0.5.0rc1-py3.9.egg-info
running install_scripts
[WARNING] This wheel needs a higher macOS version than the version your Python interpreter is compiled against. To silence this warning, set MACOSX_DEPLOYMENT_TARGET to at least 10_15 or recreate these files with lower MACOSX_DEPLOYMENT_TARGET: build/bdist.macosx-10.9-x86_64/wheel/mindquantum/mqbackend.cpython-39-darwin.so
[WARNING] This wheel needs a higher macOS version than the version your Python interpreter is compiled against. To silence this warning, set MACOSX_DEPLOYMENT_TARGET to at least 10_15 or recreate these files with lower MACOSX_DEPLOYMENT_TARGET: build/bdist.macosx-10.9-x86_64/wheel/mindquantum/mqbackend.cpython-39-darwin.so
------Successfully created mindquantum package------
```

An `output` folder is also generated; inspecting it with `tree output` shows:

```shell
output
├── mindquantum-0.5.0rc1-cp39-cp39-macosx_10_15_x86_64.whl
└── mindquantum-0.5.0rc1-cp39-cp39-macosx_10_15_x86_64.whl.sha256

0 directories, 2 files
```

## 3. Installation and testing

### 3.1 Install `mindquantum`

Install it with:

```shell
pip3 install output/mindquantum-0.5.0rc1-cp39-cp39-macosx_10_15_x86_64.whl
```

Check whether it installed successfully with the following commands:

> Note: make sure to leave the source directory first!
```shell
cd
python3 -c 'import mindquantum'
```

If nothing is printed, the installation succeeded.

### 3.2 Test `mindquantum`

Section 3.1 only checked that the package installs and imports; the demo below further verifies that the build is free of problems.

Test code:

```python
import mindspore as ms
import numpy as np
from mindquantum import *


def demo():
    encoder = Circuit().h(0).rx({'a0': 2}, 0).ry('a1', 1)
    print("==" * 32)
    print(encoder)
    print(encoder.get_qs(pr={'a0': np.pi / 2, 'a1': np.pi / 2}, ket=True))
    print("==" * 32)

    ansatz = CPN(encoder.hermitian(), {'a0': 'b0', 'a1': 'b1'})
    sim = Simulator('projectq', 2)
    ham = Hamiltonian(-QubitOperator('Z0 Z1'))
    grad_ops = sim.get_expectation_with_grad(
        ham, encoder + ansatz,
        encoder_params_name=encoder.params_name,
        ansatz_params_name=ansatz.params_name)

    ms.context.set_context(mode=ms.context.PYNATIVE_MODE, device_target="CPU")
    net = MQLayer(grad_ops)
    encoder_data = ms.Tensor(np.array([[np.pi / 2, np.pi / 2]]))
    opti = ms.nn.Adam(net.trainable_params(), learning_rate=0.1)
    train_net = ms.nn.TrainOneStepCell(net, opti)
    for i in range(100):
        train_net(encoder_data)
    print(dict(zip(ansatz.params_name, net.trainable_params()[0].asnumpy())))


if __name__ == "__main__":
    demo()
```

Create a new file `demo.py`, paste the code above into it, and run:

```shell
python3 demo.py
```

If it prints output like the following, the compiled package is working well:

```shell
================================================================
q0: ────H───────RX(2*a0)──

q1: ──RY(a1)──────────────
-1/2j¦00⟩
-1/2j¦01⟩
-1/2j¦10⟩
-1/2j¦11⟩
================================================================
WARNING: Logging before InitGoogleLogging() is written to STDERR
[WARNING] DEBUG(13135,0x10ad9fdc0,Python):2022-02-10-11:30:23.404.250 [mindspore/ccsrc/debug/debugger/debugger.cc:95] Debugger] Not enabling debugger. Debugger does not support CPU.
{'b1': 1.571565, 'b0': 0.0050863153}
```

## 4. Summary

This article built `mindquantum` from the `MindSpore` project on macOS, then installed and tested the resulting `whl` package.

## 5. Problems and solutions
### Problem 1

> The build could not find the `omp.h` header. I first checked whether the corresponding library was installed, and found that the header still could not be located even though it was. In the end I replaced the include with the file's absolute path.

```shell
running bdist_wheel
running build
running build_py
running egg_info
writing mindquantum.egg-info/PKG-INFO
writing dependency_links to mindquantum.egg-info/dependency_links.txt
writing requirements to mindquantum.egg-info/requires.txt
writing top-level names to mindquantum.egg-info/top_level.txt
reading manifest file 'mindquantum.egg-info/SOURCES.txt'
writing manifest file 'mindquantum.egg-info/SOURCES.txt'
running build_ext
using cmake command: cmake
----- Configuring from /Users/kaierlong/Documents/Codes/gitee/mindquantum ------
CMake command: cmake /Users/kaierlong/Documents/Codes/gitee/mindquantum -DPython_EXECUTABLE:FILEPATH=/Users/kaierlong/Documents/PyEnv/env_mindspore_1.6.0/bin/python3 -DBUILD_TESTING:BOOL=OFF -DIN_PLACE_BUILD:BOOL=OFF -DIS_PYTHON_BUILD:BOOL=ON -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DVERSION_INFO="{self.distribution.get_version()}" -DCMAKE_BUILD_TYPE=Release -DENABLE_PROJECTQ:BOOL=ON -DENABLE_QUEST:BOOL=OFF -DENABLE_OPENMP:BOOL=ON -DQUEST_OUTPUT_DIR=/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/lib.macosx-10.9-x86_64-3.9/mindquantum -DMQBACKEND_OUTPUT_DIR=/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/lib.macosx-10.9-x86_64-3.9/mindquantum
cwd: build/temp.macosx-10.9-x86_64-3.9/mindquantum
-- Detected processor: x86_64
-- Looking for pybind11
-- Looking for pybind11-global in virtualenv
-- Looking for pybind11-global in virtualenv - Not-found
-- Looking for pybind11 >= 2.6.0 in virtualenv
-- Could NOT find PYMOD_pybind11 (missing: PYMOD_PYBIND11_PATH) (Required is at least version "2.6.0")
-- Looking for pybind11 >= 2.6.0 in virtualenv - Not-found
-- Could NOT find PYMOD_pybind11_cmake (missing: PYMOD_PYBIND11_CMAKE_PATH)
-- Looking for pybind11-global in global and user sites
-- Looking for pybind11-global in global and user sites - Not-found
-- Looking for pybind11 - Failed
-- Could NOT find pybind11 (missing: pybind11_DIR)
-- pybind11 was not
found on your system or it is an incompatible version -- -> will be fetching pybind11 from an external Git repository -- Populating pybind11 -- Configuring done -- Generating done -- Build files have been written to: /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/_deps/pybind11-subbuild [100%] Built target pybind11-populate -- pybind11 v2.7.1 CMake Warning (dev) at /usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/CMakeDependentOption.cmake:84 (message): Policy CMP0127 is not set: cmake_dependent_option() supports full Condition Syntax. Run "cmake --help-policy CMP0127" for policy details. Use the cmake_policy command to set the policy and suppress this warning. Call Stack (most recent call first): build/temp.macosx-10.9-x86_64-3.9/mindquantum/_deps/pybind11-src/CMakeLists.txt:98 (cmake_dependent_option) This warning is for project developers. Use -Wno-dev to suppress it. -- Populating projectq -- Configuring done -- Generating done -- Build files have been written to: /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/_deps/projectq-subbuild [100%] Built target projectq-populate -- Configuring done -- Generating done -- Build files have been written to: /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum --- End configuring from /Users/kaierlong/Documents/Codes/gitee/mindquantum ---- ------------------------ Building mindquantum.libQuEST ------------------------- CMake command: {" ".join(self.cmake_cmd + ["--build", ".", "--target", ext.target] + self.build_args)} cwd: build/temp.macosx-10.9-x86_64-3.9/mindquantum make: *** No rule to make target `QuEST'. Stop. 
Failed to compile optional extension QuEST (not an error) ---------------------- End building mindquantum.libQuEST ----------------------- ------------------------ Building mindquantum.mqbackend ------------------------ CMake command: {" ".join(self.cmake_cmd + ["--build", ".", "--target", ext.target] + self.build_args)} cwd: build/temp.macosx-10.9-x86_64-3.9/mindquantum /usr/local/Cellar/cmake/3.22.2/bin/cmake -S/Users/kaierlong/Documents/Codes/gitee/mindquantum -B/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum --check-build-system CMakeFiles/Makefile.cmake 0 /Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Makefile2 mqbackend /usr/local/Cellar/cmake/3.22.2/bin/cmake -S/Users/kaierlong/Documents/Codes/gitee/mindquantum -B/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum --check-build-system CMakeFiles/Makefile.cmake 0 /usr/local/Cellar/cmake/3.22.2/bin/cmake -E cmake_progress_start /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/CMakeFiles 4 /Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Makefile2 mindquantum/src/CMakeFiles/mqbackend.dir/all /Applications/Xcode.app/Contents/Developer/usr/bin/make -f mindquantum/src/CMakeFiles/mq_base.dir/build.make mindquantum/src/CMakeFiles/mq_base.dir/depend cd /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum && /usr/local/Cellar/cmake/3.22.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/kaierlong/Documents/Codes/gitee/mindquantum /Users/kaierlong/Documents/Codes/gitee/mindquantum/mindquantum/src /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/mindquantum/src 
/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/mindquantum/src/CMakeFiles/mq_base.dir/DependInfo.cmake --color= Dependencies file "mindquantum/src/CMakeFiles/mq_base.dir/utils.cc.o.d" is newer than depends file "/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/mindquantum/src/CMakeFiles/mq_base.dir/compiler_depend.internal". Consolidate compiler generated dependencies of target mq_base /Applications/Xcode.app/Contents/Developer/usr/bin/make -f mindquantum/src/CMakeFiles/mq_base.dir/build.make mindquantum/src/CMakeFiles/mq_base.dir/build [ 25%] Building CXX object mindquantum/src/CMakeFiles/mq_base.dir/utils.cc.o cd /Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/mindquantum/src && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DENABLE_OPENMP -DVERSION_INFO="\"{self.distribution.get_version()}\"" -D_FORTIFY_SOURCE=2 -I/Users/kaierlong/Documents/Codes/gitee/mindquantum/build/temp.macosx-10.9-x86_64-3.9/mindquantum/_deps/pybind11-src/include -I/Users/kaierlong/Documents/Codes/gitee/mindquantum/mindquantum/src -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk -mmacosx-version-min=10.15 -fPIC -ffast-math -O3 -fstack-protector-all -std=c++17 -MD -MT mindquantum/src/CMakeFiles/mq_base.dir/utils.cc.o -MF CMakeFiles/mq_base.dir/utils.cc.o.d -o CMakeFiles/mq_base.dir/utils.cc.o -c /Users/kaierlong/Documents/Codes/gitee/mindquantum/mindquantum/src/utils.cc In file included from /Users/kaierlong/Documents/Codes/gitee/mindquantum/mindquantum/src/utils.cc:17: /Users/kaierlong/Documents/Codes/gitee/mindquantum/mindquantum/src/core/utils.h:21:14: fatal error: 'omp.h' file not found # include ^~~~~~~ 1 error generated. 
make[3]: *** [mindquantum/src/CMakeFiles/mq_base.dir/utils.cc.o] Error 1 make[2]: *** [mindquantum/src/CMakeFiles/mq_base.dir/all] Error 2 make[1]: *** [mindquantum/src/CMakeFiles/mqbackend.dir/rule] Error 2 make: *** [mqbackend] Error 2 ---------------------- End building mindquantum.mqbackend ---------------------- Traceback (most recent call last): File "/Users/kaierlong/Documents/Codes/gitee/mindquantum/setup.py", line 368, in build_extension subprocess.check_call( File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 373, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'mqbackend', '--config', 'Release', '-j 12', '--']' returned non-zero exit status 2. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/Users/kaierlong/Documents/Codes/gitee/mindquantum/setup.py", line 499, in setuptools.setup( File "/Users/kaierlong/Documents/PyEnv/env_mindspore_1.6.0/lib/python3.9/site-packages/setuptools/__init__.py", line 163, in setup return distutils.core.setup(**attrs) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/core.py", line 148, in setup dist.run_commands() File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/Users/kaierlong/Documents/PyEnv/env_mindspore_1.6.0/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 290, in run self.run_command('build') File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/dist.py", line 985, in run_command cmd_obj.run() File 
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/Users/kaierlong/Documents/PyEnv/env_mindspore_1.6.0/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 87, in run
    _build_ext.run(self)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/Users/kaierlong/Documents/Codes/gitee/mindquantum/setup.py", line 291, in build_extensions
    build_ext.build_extensions(self)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/Users/kaierlong/Documents/Codes/gitee/mindquantum/setup.py", line 374, in build_extension
    raise BuildFailed() from err
__main__.BuildFailed
```

## 6. References

- [brew](https://brew.sh/)
- [mindspore](https://www.mindspore.cn/install)
- [mindquantum](https://gitee.com/mindspore/mindquantum/tree/master/)
  • [Installation experience] Another attempt at building MindSpore 1.6 GPU from source on Jetson Nano (ongoing series)
    With everyone's support, Zhang Xiaobai, unwilling to admit defeat, began another attempt, following on from:

Nvidia Jetson Nano B01 First Experience (1) https://bbs.huaweicloud.com/blogs/330158

Nvidia Jetson Nano B01 First Experience (2) https://bbs.huaweicloud.com/blogs/330177

Nvidia Jetson Nano B01 First Experience (3) https://bbs.huaweicloud.com/blogs/330290
  • [Installation experience] Build log: compiling MindSpore 1.6 GPU from source on Jetson Nano
    This post is excerpted from the original article: Nvidia Jetson Nano B01 First Experience (3), https://bbs.huaweicloud.com/blogs/330290

(17) Exploring again: building MindSpore 1.6.0 from source

When Zhang Xiaobai complained in the group chat that MindSpore has no GPU+aarch64 build, 月月鸟, the elder brother of 小口鸟, pointed out the broad road forward: build it from source. However reluctant (a million times over), Zhang Xiaobai figured following the expert's direction could not be wrong. Jokes aside, on a thorny road you still follow the lighthouse, or at least glance at the traffic lights.

First, checked the remaining space on the 32 GB TF card: 13 GB free. In theory that should be enough (enough to tinker with, anyway). Time to start. Reference: https://bbs.huaweicloud.com/blogs/198349. The last time Zhang Xiaobai built V1.0 from source was a year and a half ago.

The build environment requirements:

(1) gcc 7.3.0. Currently 7.5.0; from past experience, 7.5 should also work.

(2) gmp 6.1.2. Already installed in the previous post; not repeated here.

(3) Python 3.9.0. Just enter a virtualenv environment; we can create a separate one for the source build:

```shell
virtualenv -p ~/.pyenv/versions/3.9.0/bin/python mindspore-source
source ~/mindspore-source/bin/activate
```

(4) cmake greater than 3.18.3. Already satisfied.

(5) patch greater than 2.5:

```shell
wget http://ftp.gnu.org/gnu/patch/patch-2.5.4.tar.gz
tar zxvf patch-2.5.4.tar.gz
./configure
make -j8
sudo make install -j8
patch --version
```

Satisfied.

(6) flex greater than 2.5.35:

```shell
wget https://github.com/westes/flex/releases/download/v2.6.4/flex-2.6.4.tar.gz
tar zxvf flex-2.6.4.tar.gz
./configure
make -j8
```

It core dumped! Possibly the swap space was insufficient, so tried plain `make` without `-j`. Same result.

Tip: how to add virtual memory on the Nano:

```shell
free -m
sudo fallocate -l 4G /var/swapfile
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile
sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'
free -m
```

Tried `make -j8` and `sudo make install -j8` again. Same failure. Another way, then:

```shell
sudo apt-get install flex
flex --version
```

That worked; building from source was unnecessary after all. But it is strange that the source build kept failing, and Zhang Xiaobai never figured out why.

(7) wheel greater than 0.32.0:

```shell
pip3 install wheel==0.32.0
```

wheel was in fact already present, so restored the newer version:

```shell
pip3 install wheel==0.37.1
```

(8) openssl greater than 1.1.1:

```shell
wget https://github.com/openssl/openssl/archive/OpenSSL_1_1_1g.tar.gz
tar -zxvf OpenSSL_1_1_1g.tar.gz
./config --prefix=/usr/local/openssl
make -j8
sudo make install -j8
```

Edit `~/.bashrc` and append one line:

```shell
export OPENSSL_ROOT_DIR=/usr/local/openssl
```

Run `source ~/.bashrc` to apply it, then re-enter the virtualenv with `source ~/mindspore-source/bin/activate`.

(9) NUMA greater than 2.0.11:

```shell
sudo apt-get install libnuma-dev
```

(10) Download the MindSpore source:

```shell
git clone https://gitee.com/mindspore/mindspore.git -b r1.6
```

(11) Start the build (hit a "Python not found" problem):

```shell
bash build.sh -e gpu -j12
```

Looks like an old problem. Try a virtualenv based on Python 3.7.5 instead:

```shell
deactivate
virtualenv -p ~/.pyenv/versions/3.7.5/bin/python mindspore-source375
source ~/mindspore-source375/bin/activate
bash build.sh -e gpu -j12
```

Still the same.

(12) Build Python 3.7.5 from source. Might as well:

```shell
wget https://www.python.org/ftp/python/3.7.5/Python-3.7.5.tgz
tar -xzf Python-3.7.5.tgz
cd Python-3.7.5
./configure --prefix=/usr/local/python3.7.5 --with-openssl=/usr/local/openssl --enable-shared
make -j 12
sudo make install -j 12
export PATH=/usr/local/python3.7.5/bin:$HOME/.local/bin:$PATH
python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())"
python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))"
```

(13) Another attempt at the "Python not found" problem. Zhang Xiaobai suddenly remembered the pitfalls from that earlier build: https://bbs.huaweicloud.com/forum/forum.php?mod=viewthread&tid=80520&ordertype=2&page=4. It has to be done that way after all. Tracing through `build.sh` shows it ends up executing `./scripts/build/build_mindspore.sh`. Open that script and add the `ENABLE_GITEE=ON` option so it pulls from gitee rather than github; modify the corresponding `CMakeLists.txt` to force it to skip `find_package`; and edit `cmake/check_requirements.cmake`.

(14) Fixing "gmp not found". Run it again:

```shell
bash build.sh -e gpu -j12
```

The python3.7.5-not-found error was finally gone, replaced by a gmp-not-found error. So install gmp:

```shell
sudo apt-get install -y build-essential m4 libgmp-dev libmpfr-dev libmpc-dev
```

(15) Build again with `bash build.sh -e gpu -j12`. It finally started compiling. It seemed to still be building dependency libraries; be patient (github used to be even slower). While compiling the flatbuffers package, the build suddenly aborted. An old revolutionary meets a new problem. Tried adding options in `build_mindspore.sh`; running again gave what looked like the same error.

(16) Install gcc 7.3.0 from source. The `CMAKE_CXX_COMPILER` error felt like a gcc 7.5.0 compiler problem, so why not try gcc 7.3.0? A quick `df -h` showed the 32 GB TF card was nearly full.

```shell
wget http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.gz
tar -xzf gcc-7.3.0.tar.gz
cd gcc-7.3.0
./configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
make -j 12
sudo make install -j 12
```

Back up gcc 7.5 first, then point the symlinks at gcc 7.3:

```shell
cd /usr/bin
sudo mv gcc gcc7.5
sudo mv g++ g++7.5
sudo ln -s /usr/local/bin/gcc gcc
sudo ln -s /usr/local/bin/g++ g++
gcc --version
g++ --version
```

After this build finished, `df -h` made it clear some cleanup was needed, or MindSpore would not have enough space to compile.

(17) Build yet again. Since much of the earlier work was compiled with gcc 7.5.0, switch everything over to gcc 7.3.0: wipe the source tree, re-clone, and rebuild:

```shell
rm -rf ~/mindspore
git clone https://gitee.com/mindspore/mindspore.git -b r1.6
vi ~/mindspore/scripts/build/build_mindspore.sh
```

Add `-DENABLE_GITEE=ON` to both of the indicated lines so it stays off the painfully slow github, then run `bash build.sh -e gpu -j12`.

Just as Zhang Xiaobai was happily getting ready to call it a day, CUDA errored out with "No CMAKE_CUDA_COMPILER". A quick search says this error is caused by PATH missing the cuda directory. Checking the current PATH confirmed it; had the paths defined in `.bashrc` been dropped during the build? After running `source ~/.bashrc` and confirming that PATH contained /usr/local/cuda/bin, rebuild:

```shell
bash build.sh -e gpu -j12
```

Now it reported a `pybind11_add_module` error. Is pybind11 not installed? Try it:

```shell
sudo pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple/ pybind11
```

No; pybind11 was already there. Running `bash build.sh -e gpu -j12` once more gave the same result (nothing new had been installed, so of course it did). Re-reading the CMakeLists.txt mentioned in the error log: perhaps it is not pybind11 that is missing but the `_ms_mpi` package. Is openmpi not installed? Try installing it:

```shell
wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz
tar -xvzf openmpi-4.0.3.tar.gz
cd openmpi-4.0.3
./configure
make -j12
sudo make install -j12
whereis openmpi
```

Build again with `bash build.sh -e gpu -j12`. The bug remained, through several more sunsets. Checking the dependency list, only NCCL seemed left uninstalled, and installing it felt unlikely to help. Stuck for now.

Posted the question on the forum: https://bbs.huaweicloud.com/forum/thread-179083-1-1.html. The expert's reply was rather awkward; it seems the MindSpore team has no time for this problem either. Oh well. So it goes for now.
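The "No CMAKE_CUDA_COMPILER" failure above boils down to `nvcc` not being visible on PATH when CMake configures the project. A small sketch of the check-and-fix; the CUDA path here is the usual default install location, so adjust it if yours differs:

```shell
# Make sure the CUDA toolchain is visible before configuring: CMake locates
# the CUDA compiler by looking up nvcc on PATH (or via CMAKE_CUDA_COMPILER).
export PATH=/usr/local/cuda/bin:$PATH
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version   # confirm which CUDA toolkit will be picked up
else
    echo "nvcc not found - check that CUDA is installed under /usr/local/cuda"
fi
```

Putting the `export PATH=...` line in `~/.bashrc` (and re-sourcing it in any shell that runs the build) avoids the problem recurring, which matches what eventually worked in the log above.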
  • [Installation experience] Installing MindSpore 1.6 CPU on Jetson Nano: a record
    This post is excerpted from the original article: Nvidia Jetson Nano B01 First Experience (2), https://bbs.huaweicloud.com/blogs/330177, section 16.

(16) Exploring: installing MindSpore

MindSpore has no CUDA 10.2 aarch64 build, only an aarch64 CPU build, and the system Python is only 3.6.9. But as a MindSpore developer, Zhang Xiaobai still wanted to run MindSpore on the Nano's CPU. The idea: install Anaconda for aarch64 first, so the Nano can get Python 3.7.5.

Open https://www.anaconda.com/products/individual and copy the ARM64 installer link:

```shell
wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-aarch64.sh
./Anaconda3-2021.11-Linux-aarch64.sh
```

Press Enter to continue, type yes, press Enter again... no good, it core dumped. Delete it with tears:

```shell
rm -rf ~/anaconda3
```

Open https://docs.conda.io/en/latest/miniconda.html#linux-installers and copy the link:

```shell
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-aarch64.sh
bash ./Miniconda3-py37_4.10.3-Linux-aarch64.sh
```

The same core dump, so conda is a dead end here. Delete again, with tears: `rm -rf ~/miniconda3`.

Next, try pyenv and virtualenv. In principle, pyenv can install any Python version, and virtualenv can create a virtual environment from any given interpreter path, so together they can produce a virtual environment of any version.

```shell
sudo -H pip3 install virtualenv virtualenvwrapper -i https://pypi.tuna.tsinghua.edu.cn/simple/
mkdir ~/.virtualenvs
vi ~/.bashrc
source ~/.bashrc
```

Install pyenv:

```shell
curl -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
```

Add the required lines to `~/.bashrc`, then `source ~/.bashrc` to apply them. List the available Python versions with `pyenv install --list`; there are quite a lot. Install Python 3.7.5 with pyenv:

```shell
pyenv install 3.7.5
```

The download takes quite a while; be patient. It then complained about a missing ssl library:

```shell
sudo apt-get install openssl libssl-dev
```

Press Y to continue, then try again:

```shell
pyenv install 3.7.5
```

The installed Python can be seen under /home/zhanghui/.pyenv/versions/3.7.5. Refresh pyenv and list the versions:

```shell
pyenv rehash
pyenv versions
```

Create a Python 3.7.5 environment for MindSpore (named mindspore-py375); this creates a mindspore-py375 directory under the current directory:

```shell
virtualenv -p ~/.pyenv/versions/3.7.5/bin/python mindspore-py375
source ~/mindspore-py375/bin/activate
```

That switches to the Python 3.7.5 environment. Install the MindSpore CPU (aarch64) 1.6.0 wheel:

```shell
pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.0/MindSpore/cpu/aarch64/mindspore-1.6.0-cp37-cp37m-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```

It errored. Install gmp 6.1.2 first:

```shell
wget https://gmplib.org/download/gmp/gmp-6.1.2.tar.xz
xz -d gmp-6.1.2.tar.xz
tar -xvf gmp-6.1.2.tar
cd gmp-6.1.2
./configure
make
sudo make install
```

Try upgrading pip:

```shell
python -m pip install --upgrade pip
```

Try dropping the Tsinghua mirror:

```shell
pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.0/MindSpore/cpu/aarch64/mindspore-1.6.0-cp37-cp37m-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com
```

Installation is slower this way; be patient. Then more dependencies:

```shell
pip3 install --upgrade setuptools
pip3 install pip==21.3.1
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl
```

Then build cmake from source:

```shell
wget https://github.com/Kitware/CMake/releases/download/v3.18.5/cmake-3.18.5.tar.gz
tar -zxvf cmake-3.18.5.tar.gz
cd cmake-3.18.5
./bootstrap
make -j8
sudo make install -j8
cmake --version
```

It turned out both /usr/bin and /usr/local/bin contained a cmake. Rename the one under /usr/bin and retry:

```shell
pip3 install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.0/MindSpore/cpu/aarch64/mindspore-1.6.0-cp37-cp37m-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Still no luck. Time to switch to Python 3.9.0 and try once more. Leave the mindspore-py375 environment and set up a 3.9.0 one:

```shell
deactivate
pyenv install 3.9.0
pyenv rehash
pyenv versions
virtualenv -p ~/.pyenv/versions/3.9.0/bin/python mindspore-py390
source ~/mindspore-py390/bin/activate
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.6.0/MindSpore/cpu/aarch64/mindspore-1.6.0-cp39-cp39-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```

This one actually installed successfully! Run a test:

```shell
python -c "import mindspore;mindspore.run_check()"
```

Done. Not that CPU-only MindSpore is of much use; Zhang Xiaobai simply refused to give up. A GPU+aarch64 build would be the genuinely useful one.
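One thing worth knowing when juggling interpreter versions and prebuilt wheels, as this post does, is pip's wheel-tag matching: a wheel filename encodes the Python version, ABI, and platform it was built for, and pip only accepts wheels whose tags match the running interpreter. The sketch below is a simplified illustration of that matching (real pip also checks ABI and platform tags and handles many more filename forms; whether tag matching was the actual root cause of the cp37 failure here is not clear from the logs):

```python
# Simplified illustration of the python-tag matching pip applies to wheels.
import sys

def wheel_tags(filename):
    """Split 'name-version-pytag-abitag-plattag.whl' into its three tags."""
    stem = filename[:-len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return py_tag, abi_tag, plat_tag

def python_tag_matches(filename):
    """True if the wheel's python tag matches the running interpreter."""
    current = "cp%d%d" % (sys.version_info.major, sys.version_info.minor)
    return wheel_tags(filename)[0] == current

cp37_wheel = "mindspore-1.6.0-cp37-cp37m-linux_aarch64.whl"
cp39_wheel = "mindspore-1.6.0-cp39-cp39-linux_aarch64.whl"
print(wheel_tags(cp39_wheel))  # ('cp39', 'cp39', 'linux_aarch64')
print(python_tag_matches(cp37_wheel), python_tag_matches(cp39_wheel))
```

This is why the cp39 wheel could only ever install into the Python 3.9.0 environment, and a cp37 wheel only into a 3.7.x one.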