• [Installation] [Mindspore-GPU] [Build process] Error while building from source
    [Feature module] When building from source I get the error shown in the screenshot. My environment is Ubuntu 20.04, CUDA 11.0, cuDNN 8.0. [Steps & symptom] 1. The build from source fails at roughly 78% with a "make [...] Error" message (environment as above: Ubuntu 20.04, CUDA 11.0, cuDNN 8.0). Can this be resolved, and how? [Screenshots] [Logs] (optional; upload log content or attachments)
  • [Execution issue] [MindSpore] [GPU training] TypeError: 'NoneType' is not callable
    [Feature module] Training a Faster R-CNN model on GPU. [Steps & symptom] 1. The following error occurs when training the Faster R-CNN model on GPU. [Screenshots] Log link: https://paste.ubuntu.com/p/GVyhCCwD3m/ [Logs] (optional; upload log content or attachments)
  • [Distributed] docker1.2.1-gpu error: Failed to init nccl communicator for group
    [Feature module] from mindspore.communication.management import init / from mindspore.context import ParallelMode
    [Steps & symptom] 1. Running the following distributed script under the docker-1.2.1-gpu image:

        from mindspore import context
        from mindspore.communication.management import init

        context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
        init()

        import numpy as np
        from mindspore import dataset as ds
        from mindspore import nn, Tensor, Model
        import time
        from mindspore.train.callback import Callback, LossMonitor, ModelCheckpoint, CheckpointConfig
        from mindspore.context import ParallelMode
        import mindspore as ms

        ms.common.set_seed(0)
        start_time = time.time()

        def get_data(num, a=2.0, b=3.0, c=5.0):
            for _ in range(num):
                x = np.random.uniform(-1.0, 1.0)
                y = np.random.uniform(-1.0, 1.0)
                noise = np.random.normal(0, 0.03)
                z = a * x ** 2 + b * y ** 3 + c + noise
                yield np.array([[x**2], [y**3]], dtype=np.float32).reshape(1, 2), np.array([z]).astype(np.float32)

        def create_dataset(num_data, batch_size=16, repeat_size=1):
            input_data = ds.GeneratorDataset(list(get_data(num_data)), column_names=['xy', 'z'])
            input_data = input_data.batch(batch_size)
            input_data = input_data.repeat(repeat_size)
            return input_data

        data_number = 1600
        batch_number = 64
        repeat_number = 20

        context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL)
        ds_train = create_dataset(data_number, batch_size=batch_number, repeat_size=repeat_number)
        dict_datasets = next(ds_train.create_dict_iterator())

        class LinearNet(nn.Cell):
            def __init__(self):
                super(LinearNet, self).__init__()
                self.fc = nn.Dense(2, 1, 0.02, 0.02)

            def construct(self, x):
                x = self.fc(x)
                return x

        net = LinearNet()
        model_params = net.trainable_params()
        print('Param Shape is: {}'.format(len(model_params)))
        for net_param in net.trainable_params():
            print(net_param, net_param.asnumpy())

        net_loss = nn.loss.MSELoss()
        optim = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.6)
        ckpt_config = CheckpointConfig()
        ckpt_callback = ModelCheckpoint(prefix='data_parallel', config=ckpt_config)

        model = Model(net, net_loss, optim)
        epoch = 1000
        #model.train(epoch, ds_train, dataset_sink_mode=True)
        #model.train(epoch, ds_train, callbacks=[ckpt_callback], dataset_sink_mode=True)
        model.train(epoch, ds_train, callbacks=[LossMonitor(500)], dataset_sink_mode=True)

        for net_param in net.trainable_params():
            print(net_param, net_param.asnumpy())
        print('The total time cost is: {}s'.format(time.time() - start_time))

    Original source of the code: https://www.cnblogs.com/dechinphy/p/dms.html
    2. Run command: mpirun -n 4 python ./test_nonlinear.py
    Environment: Hardware: Intel CPU, 4x Titan GPUs. Software: Ubuntu 18.04 host, running the MindSpore-gpu-1.2.1 docker image in a container.
    [Screenshots] [Logs] (optional; upload log content or attachments) Failed to init nccl communicator for group nccl_world_group
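Independently of MindSpore and NCCL, the get_data generator in the script above can be exercised with plain NumPy to confirm the shapes and dtypes it yields (a standalone sanity check, not part of the original post):

```python
import numpy as np

def get_data(num, a=2.0, b=3.0, c=5.0):
    # Same generator as in the script: features are [x**2, y**3], label is z.
    for _ in range(num):
        x = np.random.uniform(-1.0, 1.0)
        y = np.random.uniform(-1.0, 1.0)
        noise = np.random.normal(0, 0.03)
        z = a * x ** 2 + b * y ** 3 + c + noise
        yield np.array([[x**2], [y**3]], dtype=np.float32).reshape(1, 2), np.array([z]).astype(np.float32)

# Each sample is a (1, 2) float32 feature row and a (1,) float32 label.
features, label = next(get_data(1))
print(features.shape, label.shape)
```

If the shapes here are as expected, the dataset pipeline itself is not the culprit and the NCCL failure points at the communication setup instead.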
  • [Technical article] Fixing the "Unable to determine the device handle for GPU" error when installing an NVIDIA card in a Kunpeng server
    [Symptom] After installing an NVIDIA 2080 Ti card in a Kunpeng server and installing the NVIDIA-Linux-aarch64-460.73.01.run driver, running nvidia-smi reports: Unable to determine the device handle for GPU 0000:82:00.0: Unknown Error. [Diagnosis] The message above gives no hint of the cause, and a web search turned up nothing useful. After installing the 455.23.05 driver bundled with cuda_11.1.0_455.23.05_linux_sbsa.run, nvidia-smi instead reports: Unable to determine the device handle for GPU 0000:82:00.0: Unable to communicate with GPU because it is insufficiently powered. This may be because not all required external power cables are attached, or the attached cables are not seated properly. This message makes the cause clear: the NVIDIA card is under-powered. [Fix] Power down the Kunpeng server, connect the GPU power cable to the riser card according to the card's power connector layout, then power the server back on and reinstall the NVIDIA-Linux-aarch64-460.73.01.run driver. After that, nvidia-smi displays normally. Note: the 2080 Ti and the NVIDIA 30-series cards all require the GPU power cable to be connected before the GPU will work.
  • [Installation] GPU installation verification error
    >>> import numpy as np
    >>> from mindspore import Tensor
    >>> import mindspore.ops as ops
    >>> import mindspore.context as context
    >>> context.set_context(device_target="GPU")
    >>> x = Tensor(np.ones([1,3,3,4]).astype(np.float32))
    >>> y = Tensor(np.ones([1,3,3,4]).astype(np.float32))
    >>> print(ops.add(x, y))
    [[[[0. 0. 0. 0.]
       [0. 0. 0. 0.]
       [0. 0. 0. 0.]]

      [[0. 0. 0. 0.]
       [0. 0. 0. 0.]
       [0. 0. 0. 0.]]

      [[0. 0. 0. 0.]
       [0. 0. 0. 0.]
       [0. 0. 0. 0.]]]]
    >>>
    Environment: Ubuntu 18.04, CUDA 10.1, GeForce Titan X GPU.
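The verification snippet is expected to print a tensor of all 2s (ones plus ones), so the all-zero output above indicates the add never actually executed on the GPU. As a reference for the expected result (a plain-NumPy cross-check, not a MindSpore diagnosis), the same elementwise add gives:

```python
import numpy as np

# Reference result for the MindSpore verification snippet:
# adding two all-ones tensors must yield an all-2s tensor of the same shape.
x = np.ones([1, 3, 3, 4], dtype=np.float32)
y = np.ones([1, 3, 3, 4], dtype=np.float32)
result = x + y
print(result[0, 0])  # every element is 2.0
```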
  • [Dev environment] [Dev environment] [Type] No free CPU/GPU option
    [Feature module] When creating a Notebook instance in the ModelArts development environment, there is no free CPU or GPU flavor to choose. [Steps & symptom] 1. Have not yet created a Notebook instance. 2. But no free CPU or GPU option is available. [Screenshots] [Logs] (optional; upload log content or attachments)
  • [Other] Are the evaluation machines for the "Huawei Cloud Cup" 2021 AI Application Innovation Contest CPU- or GPU-based?
    Are the evaluation machines for the "Huawei Cloud Cup" 2021 AI Application Innovation Contest CPU- or GPU-based?
  • [Technical article] Installing MindSpore 1.2 GPU with Docker
    Background: In an earlier post we covered Docker deployment of the MindSpore CPU version with a simple test case; at the time the official project did not yet support GPU deployment via Docker. Thanks to the MindSpore team's work, version 1.2.0 of MindSpore-GPU now ships a Docker-based installation, which we test directly in this post, filling in a few details that are easy to overlook.

    Switching to the Huawei Cloud mirror: find the source matching your system in Huawei Cloud's official mirror repository and follow its instructions. Locally we use Ubuntu 20.04, where the apt sources can be updated as follows:

        root@ubuntu2004:~# cp -a /etc/apt/sources.list /etc/apt/sources.list.bak
        root@ubuntu2004:~# sed -i "s@http://.*archive.ubuntu.com@http://repo.huaweicloud.com@g" /etc/apt/sources.list
        root@ubuntu2004:~# sed -i "s@http://.*security.ubuntu.com@http://repo.huaweicloud.com@g" /etc/apt/sources.list
        root@ubuntu2004:~# apt-get update

    Refreshing the mirror takes a while; just wait, as this step rarely goes wrong.

    Installing MindSpore-GPU via Docker: follow the official MindSpore guide step by step; where something unusual comes up we handle it below:

        root@ubuntu2004:~# DISTRIBUTION=$(. /etc/os-release; echo $ID$VERSION_ID)
        root@ubuntu2004:~# curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
        gpg: no valid OpenPGP data found.

    This second step is the one that commonly fails: the local host table cannot resolve that link's IP address. At first, like the author of reference 1, I assumed this required access to Google, but the workaround in reference 1 turns out to work. Here is the fix:

        root@ubuntu2004:~# vi /etc/hosts   # append the following four IP/hostname entries at the end of the file
        root@ubuntu2004:~# cat /etc/hosts | grep nvidia   # verify the change
        185.199.108.153 nvidia.github.io
        185.199.109.153 nvidia.github.io
        185.199.110.153 nvidia.github.io
        185.199.111.153 nvidia.github.io

    With that simple configuration in place, continue the MindSpore-GPU-Docker installation:

        root@ubuntu2004:~# curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
        OK
        root@ubuntu2004:~# curl -s -L https://nvidia.github.io/nvidia-docker/$DISTRIBUTION/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
        deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
        #deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
        deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
        #deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
        deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
        root@ubuntu2004:~# apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2
        root@ubuntu2004:~# systemctl restart docker

    That completes the required dependencies. One last step is to edit the Docker configuration file so MindSpore can use Docker's nvidia-container-runtime:

        root@ubuntu2004:~# vi /etc/docker/daemon.json   # change it to the configuration shown below
        root@ubuntu2004:~# cat /etc/docker/daemon.json
        {
            "runtimes": {
                "nvidia": {
                    "path": "nvidia-container-runtime",
                    "runtimeArgs": []
                }
            }
        }
        root@ubuntu2004:~# systemctl daemon-reload   # reload the configuration
        root@ubuntu2004:~# systemctl restart docker  # restart Docker

    With all of the above done, the final step is to pull the official MindSpore image:

        root@ubuntu2004:~# docker pull swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu:1.2.0
        1.2.0: Pulling from mindspore/mindspore-gpu
        6e0aa5e7af40: Pull complete
        d47239a868b3: Pull complete
        ... (remaining "Pull complete" layer lines omitted) ...
        Digest: sha256:3318c68d63cfe110e85d7ed93398b308f8458624dc96aad9a4d31bc6d345daa7
        Status: Downloaded newer image for swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu:1.2.0
        swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu:1.2.0

    Two extra notes on Docker: on Ubuntu 20.04, Docker is installed with apt-get install docker.io, not apt-get install docker; for more Docker usage examples, see these earlier posts (post 1, post 2, post 3, post 4, post 5).

    Testing MindSpore-GPU: the test case also comes from the official MindSpore documentation; the only addition here is a local directory mapping, mounting the test directory /home/dechin/projects/mindspore/test/ as /home inside the MindSpore container so that file changes made inside the container are mirrored in the local test directory. Note that this directory mapping must use an absolute path; a relative path will not work:

        dechin@ubuntu2004:~/projects/mindspore/test$ sudo docker run -it -v /dev/shm:/dev/shm -v /home/dechin/projects/mindspore/test/:/home --runtime=nvidia --privileged=true swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu:1.2.0 /bin/bash
        [sudo] password for dechin:
        root@0b44a5a66fca:/# cd /home/
        root@0b44a5a66fca:/home# vim mindspore_test.py   # the content of the python file is given below
        root@0b44a5a66fca:/home# python mindspore_test.py
        [[[[2. 2. 2. 2.]
           [2. 2. 2. 2.]
           [2. 2. 2. 2.]]

          [[2. 2. 2. 2.]
           [2. 2. 2. 2.]
           [2. 2. 2. 2.]]

          [[2. 2. 2. 2.]
           [2. 2. 2. 2.]
           [2. 2. 2. 2.]]]]

    The python code used in the container for the test:

        # mindspore_test.py
        import numpy as np
        import mindspore.context as context
        import mindspore.ops as ops
        from mindspore import Tensor

        context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU")

        x = Tensor(np.ones([1,3,3,4]).astype(np.float32))
        y = Tensor(np.ones([1,3,3,4]).astype(np.float32))
        print(ops.add(x, y))

    As we can see, it runs successfully, which means the Docker-based MindSpore-GPU environment was deployed correctly.

    Summary: following the earlier post on Docker deployment of the MindSpore CPU version, the MindSpore team has released a Docker deployment for the GPU version; this post installs and tests it and works through some details that can trip up the installation. The main motivation for a containerized solution is isolating the SDK environment from the programming environment, relieving the pressure of local environment configuration and deployment, and, of course, keeping the local development environment "clean".

    Original article: https://blog.csdn.net/baidu_37157624/article/details/117315758
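Since docker -v rejects relative host paths, the mapping argument can be built programmatically; the helper below is an illustrative sketch (the function name volume_arg is mine, not from the original post):

```python
import os

def volume_arg(host_dir, container_dir):
    # docker -v requires an absolute host path; expand it explicitly
    # so the same command works from any working directory.
    return "{}:{}".format(os.path.abspath(host_dir), container_dir)

# For example, run from ~/projects/mindspore, "test" expands to an
# absolute path such as /home/dechin/projects/mindspore/test:/home.
print(volume_arg("test", "/home"))
```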
  • [Installation] Building mindspore1.2-gpu in a docker environment fails
    [Feature module] MindSpore [Steps & symptom] Pulled a docker image with CUDA and cuDNN preinstalled, followed the official environment setup guide inside that container, downloaded the MindSpore 1.2 source code, and ran "bash build.sh -e gpu -S on" to build the GPU version of MindSpore; the build fails while compiling OpenMPI. [Screenshots] [Logs] See attachment.
  • [Installation] GPU version installed via docker cannot find the device at runtime
    [Feature module] [Steps & symptom] 1. cudaSetDevice failed, ret[100], no CUDA-capable device is detected 2. GPUSession failed to set current device id. [Screenshots] [Logs] (optional; upload log content or attachments)
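A common cause of "no CUDA-capable device is detected" inside a container is starting it without the NVIDIA runtime. The daemon.json fragment below registers nvidia-container-runtime with Docker; it is a sketch of the usual setup (assuming a standard nvidia-container-runtime installation), not a confirmed diagnosis of this report:

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

After restarting the Docker daemon, the container also has to be started with --runtime=nvidia (or --gpus all on newer Docker versions) for CUDA devices to be visible inside it.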
  • [Help] For deployment testing on an ARM cloud, GPU testing requires installing anaconda and pytorch — have these been adapted yet?
    For deployment testing on an ARM cloud instance, installing and testing the GPU requires anaconda and pytorch. Have these two packages been adapted for the platform yet?
  • [Typical case] Adding a GPU resource group to a GPU VM in the FC portal fails with "the VM operating system does not support virtualized GPU devices"
    [Fault type] At a customer site, adding a GPU resource group to a GPU VM in the FC portal fails with the error "the VM operating system does not support virtualized GPU devices" [Version] Generic [Author] [Keywords] The VM operating system does not support virtualized GPU devices [Symptom] At the site, adding a GPU resource group to a GPU VM in the FC portal fails with "the VM operating system does not support virtualized GPU devices" [Alarms] [Analysis] 1. The site reported the error above when adding a GPU resource group in the FC portal (screenshot below). 2. Checked the guest OS version; it is on the compatibility list, ruling out an incompatible guest OS. 3. Checked the VM OS type and found it set to: Windows 10 Enterprise 2019 LTSC 64bit. 4. After changing the OS type to anything other than "Windows 10 Enterprise 2019 LTSC 64bit", the resource group was added normally. [Solution] Per the investigation above, the FC platform fails to add a GPU resource group when the GPU VM's OS type is "Windows 10 Enterprise 2019 LTSC 64bit"; switching to another OS type allows the group to be added normally. [Summary & suggestions] None.
  • [Other] [Crowd-intelligence] [PR gate] test_hsigmoid_dynamic_float32 fails in Smoke_GPU during the PR gate
    [Feature module] During the PR gate for the Ascend compute operators HSigmoid and HSigmoidGrad, the Smoke_GPU stage fails when running generate_dynamic_testcase(np.float32) inside test_hsigmoid_dynamic_float32. [Steps & symptom] 1. Smoke_Ascend passes, so it is unclear whether the failure is caused by the code. 2. Could it be that the main repository was updated after the PR was submitted and the branch is out of sync with it? [Screenshots] [Logs] (optional; upload log content or attachments)
  • [Help] Which domestic ARM-architecture server brands support GPUs?
    Among domestic servers, which models support GPUs? Specifically ARM-architecture servers.
  • [Installation] Error installing mindspore gpu 1.2.0
    When installing mindspore_gpu 1.2.0 and running the verification program, I get the error below. What could be the cause?

        (mindspore_GPU) lzh@lzh-HP-288-Pro-G4-MT-Business-PC:~$ pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.2.0/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.2.0-cp37-cp37m-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
        Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
        Collecting mindspore-gpu==1.2.0
          Using cached https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.2.0/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.2.0-cp37-cp37m-linux_x86_64.whl (145.8 MB)
        Requirement already satisfied: sympy>=1.4 in ./anaconda3/envs/mindspore/lib/python3.7/site-packages (from mindspore-gpu==1.2.0) (1.8)
        Requirement already satisfied: astunparse>=1.6.3 in ./anaconda3/envs/mindspore/lib/python3.7/site-packages (from mindspore-gpu==1.2.0) (1.6.3)
        ... (remaining "Requirement already satisfied" lines omitted) ...
        (mindspore_GPU) lzh@lzh-HP-288-Pro-G4-MT-Business-PC:~$ cd /home/lzh/下载/mindvision
        (mindspore_GPU) lzh@lzh-HP-288-Pro-G4-MT-Business-PC:~/下载/mindvision$ python
        Python 3.7.5 (default, Oct 25 2019, 15:51:11)
        [GCC 7.3.0] :: Anaconda, Inc. on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import numpy as np
        >>> from mindspore import Tensor
        >>> import mindspore.ops as ops
        >>> import mindspore.context as context
        >>> context.set_context(device_target="GPU")
        >>> x = Tensor(np.ones([1,3,3,4]).astype(np.float32))
        >>> y = Tensor(np.ones([1,3,3,4]).astype(np.float32))
        >>> print(ops.add(x, y))
        [ERROR] PYNATIVE(5956,python):2021-06-04-15:29:11.009.871 [mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:1493] RunOpInMs] : The pointer[session] is null.
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/home/lzh/anaconda3/envs/mindspore/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 188, in __call__
            return _run_op(self, self.name, args)
          File "/home/lzh/anaconda3/envs/mindspore/lib/python3.7/site-packages/mindspore/common/api.py", line 75, in wrapper
            results = fn(*arg, **kwargs)
          File "/home/lzh/anaconda3/envs/mindspore/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 556, in _run_op
            output = real_run_op(obj, op_name, args)
        RuntimeError: mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:1493 RunOpInMs] : The pointer[session] is null.