-
[Award List]

| No. | Project ID | School | Team | Work | Award |
|----:|-----------:|--------|------|------|-------|
| 1 | 480719 | Shanghai Jiao Tong University | 全都对队 | 基于昇腾NPU的训推一体加速优化方案 | Champion (1st of the Grand Prize) |
| 2 | 514025 | Jinan University | 想去研究大模型 | 智模昇算——基于全自主技术栈软硬协同的大模型系统调优方案 | Grand Prize |
| 3 | 474858 | Hangzhou Dianzi University | CEATRG | 推理大模型的训练调优与性能加速 | Grand Prize |
| 4 | 512717 | Harbin Institute of Technology | 纳算力克大工坊 | 面向昇腾平台的大语言模型推理调优与性能加速实践 | First Prize |
| 5 | 515501 | Beijing University of Posts and Telecommunications | BUPT-ParCIS | 推理大模型的训练调优与性能加速的协同优化 | First Prize |
| 6 | 473176 | Huazhong University of Science and Technology | 二进制萝卜培育中心 | 大模型参数高效微调与推理引擎加速 | First Prize |
| 7 | 518313 | Huaiyin Institute of Technology | 智在必得 | 基于Ascend的端到端推理优化大模型方案 | First Prize |
| 8 | 480007 | Harbin Medical University | 璃月医科大学孤云阁校区 | 基于PPO+华为昇腾的推理模型加速系统 | Second Prize |
| 9 | 475128 | South China Normal University | 试试 | 推理大模型的训练调优与性能加速创新方案 | Second Prize |
| 10 | 474329 | East China Normal University | ECNU_ELRM | 基于国产化推理大模型高效训推技术 | Second Prize |
| 11 | 512943 | Xi'an Jiaotong University | 西北智联 | 推理大模型的训练调优与性能加速助力全栈自主AI | Second Prize |
| 12 | 511717 | Hubei University of Technology | TEMP | 基于全栈自主AI的大模型训练调优与应用 | Second Prize |
| 13 | 471760 | Huazhong University of Science and Technology | 拳头花可火 | 基于GRPO强化学习,知识蒸馏和多算子融合的推理大模型综合调优与加速技术 | Second Prize |
| 14 | 519134 | Fudan University | CodeWisdom | AdaMind | Third Prize |
| 15 | 479466 | China Jiliang University | 智枢拓界 | 量衡昇算:赋能极致推理的大模型加速引擎 | Third Prize |
| 16 | 513186 | Tianjin University of Technology | 重生之我在昇腾摸鱼 | 基于AscendC加速大模型推理 | Third Prize |
| 17 | 472861 | University of Science and Technology of China | 点子王 | 面向自主大模型推理增强的训练调优与性能加速方案 | Third Prize |
| 18 | 526004 | Guilin University of Electronic Technology | Create | 推理大模型的训练调优与性能加速助力全栈自主AI | Third Prize |
| 19 | 517087 | Northwestern Polytechnical University | NPU-IIL-AscendMind | AscendMind:基于昇腾AI平台的轻量化推理优化模型 | Third Prize |
| 20 | 473984 | Taizhou Vocational College of Science and Technology | 永宁永胜 | 基于知识蒸馏的大模型训练调优和性能加速 | Third Prize |
| 21 | 470775 | Beijing University of Posts and Telecommunications | bupt735 | 基于Qwen的电子电路实验虚拟助教 | Third Prize |
| 22 | 508583 | Guilin University of Electronic Technology | ken | 推理大模型的训练调优与性能加速助力全栈自主AI | Merit Award |
| 23 | 508208 | Tongji University | 三角矩阵 | 基于华为全栈AI技术生态的推理大模型性能优化研究 | Merit Award |
| 24 | 529681 | Guilin University of Electronic Technology | 好想要MatebookFold队 | 基于华为AI技术的推理大模型的训练调优与性能加速 | Merit Award |
| 25 | 521477 | Wuhan Institute of Shipbuilding Technology | 破晓者 | 推理大模型的训练调优与性能加速助力全栈自主AI | Merit Award |
| 26 | 506591 | Nanchang University | 马桶蹲累了 | 基于昇腾CANN的轻量级大模型推理增强与性能加速研究 | Merit Award |
| 27 | 503585 | Fuzhou University of International Studies and Trade | 昇腾芯链 | 昇腾芯链:轻量级推理模型的蒸馏优化与端侧加速 | Merit Award |
| 28 | 473846 | Guangdong Mechanical & Electrical Polytechnic | 昇腾智推大模型 | 昇腾智推大模型 | Merit Award |
| 29 | 471801 | Guangdong University of Petrochemical Technology | [object Object] | 推理大模型的训练调优与性能加速助力全栈自主AI方案 | Merit Award |
| 30 | 524083 | East China University of Science and Technology | 华东理工大学AIMC实验室 | 面向全栈自主AI的大规模预训练模型训练调优与推理加速方法研究 | Merit Award |
| 31 | 518595 | Guilin University of Electronic Technology | 这对吗 | 推理大模型的训练调优与性能加速助力全栈自主AI | Merit Award |
| 32 | 526548 | Guilin University of Technology | RookieRush | 基于昇腾AI的轻量级推理大模型训练调优与性能加速方案 | Merit Award |
| 33 | 518067 | East China Normal University | lab308 | 推理大模型的训练调优与性能加速助力全栈自主AI | Merit Award |
| 34 | 524572 | Xuhai College, China University of Mining and Technology | 远帆 | 推理大模型的训练调优与性能加速助力全栈自主AI | Merit Award |

Note: The public notice period for the proposed awarded works runs from October 20, 2025 to November 20, 2025.
-
Live replay link: cid:link_0
-
Deploying Qwen2.5-VL-7B on the Ascend 310P for Smoking-Action Recognition

The OrangePi AI Studio Pro is a new-generation high-performance inference and analysis card built around two Ascend 310P processors. It provides general-purpose compute plus strong AI compute and integrates the full underlying software stack for both training and inference in a single device. Its FP16 half-precision AI compute is about 176 TFLOPS, and INT8 compute reaches 352 TOPS. This article walks through deploying the Qwen2.5-VL-7B multimodal understanding model on the Ascend 310P to recognize smoking actions.

I. Environment setup

On the OrangePi AI Studio, we deploy MindIE in a Docker container:

docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-300I-Duo-py311-openeuler24.03-lts

root@orangepi:~# docker images
REPOSITORY                                          TAG                                         IMAGE ID       CREATED         SIZE
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie   2.1.RC1-300I-Duo-py311-openeuler24.03-lts   0574b8d4403f   3 months ago    20.4GB
langgenius/dify-web                                 1.0.1                                       b2b7363571c2   8 months ago    475MB
langgenius/dify-api                                 1.0.1                                       3dd892f50a2d   8 months ago    2.14GB
langgenius/dify-plugin-daemon                       0.0.4-local                                 3f180f39bfbe   8 months ago    1.35GB
ubuntu/squid                                        latest                                      dae40da440fe   8 months ago    243MB
postgres                                            15-alpine                                   afbf3abf6aeb   8 months ago    273MB
nginx                                               latest                                      b52e0b094bc0   9 months ago    192MB
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie   1.0.0-300I-Duo-py311-openeuler24.03-lts     74a5b9615370   10 months ago   17.5GB
redis                                               6-alpine                                    6dd588768b9b   10 months ago   30.2MB
langgenius/dify-sandbox                             0.2.10                                      4328059557e8   13 months ago   567MB
semitechnologies/weaviate                           1.19.0                                      8ec9f084ab23   2 years ago     52.5MB

Next, create a startup script named start-docker.sh with the following content:

NAME=$1
if [ $# -ne 1 ]; then
    echo "warning: need input container name. Use default: mindie"
    NAME=mindie
fi
docker run --name ${NAME} -it -d --net=host --shm-size=500g \
    --privileged=true \
    -w /usr/local/Ascend/atb-models \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    --entrypoint=bash \
    -v /models:/models \
    -v /data:/data \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/sbin:/usr/local/sbin \
    -v /home:/home \
    -v /tmp:/tmp \
    -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
    -e http_proxy=$http_proxy \
    -e https_proxy=$https_proxy \
    -e "PATH=/usr/local/python3.11.6/bin:$PATH" \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-300I-Duo-py311-openeuler24.03-lts

After starting the container with bash start-docker.sh, we enter it, replace a few HCCL libraries, and install the Ascend-cann-nnal package:

root@orangepi:~# docker exec -it mindie bash
Welcome to 5.15.0-126-generic
System information as of time: Sat Nov 15 22:06:48 CST 2025
System load: 1.87    Memory used: 6.3%
Swap used: 0.0%      Usage On: 33%
Users online: 0
[root@orangepi atb-models]# cd /usr/local/Ascend/ascend-toolkit/8.2.RC1/lib64/
[root@orangepi lib64]# ls /data/fix_openeuler_docker/fixhccl/8.2hccl/
libhccl.so  libhccl_alg.so  libhccl_heterog.so  libhccl_plf.so
[root@orangepi lib64]# cp /data/fix_openeuler_docker/fixhccl/8.2hccl/* ./
cp: overwrite './libhccl.so'?
cp: overwrite './libhccl_alg.so'?
cp: overwrite './libhccl_heterog.so'?
cp: overwrite './libhccl_plf.so'?
[root@orangepi lib64]# source /usr/local/Ascend/ascend-toolkit/set_env.sh
[root@orangepi lib64]# chmod +x /data/fix_openeuler_docker/Ascend-cann-nnal/Ascend-cann-nnal_8.3.RC1_linux-x86_64.run
[root@orangepi lib64]# /data/fix_openeuler_docker/Ascend-cann-nnal/Ascend-cann-nnal_8.3.RC1_linux-x86_64.run --install --quiet
[NNAL] [20251115-22:41:45] [INFO] LogFile:/var/log/ascend_seclog/ascend_nnal_install.log
[NNAL] [20251115-22:41:45] [INFO] Ascend-cann-atb_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 start
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager.
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[NNAL] [20251115-22:41:58] [INFO] Ascend-cann-atb_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 install success
[NNAL] [20251115-22:41:58] [INFO] Ascend-cann-SIP_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 start
[NNAL] [20251115-22:41:59] [INFO] Ascend-cann-SIP_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 install success
[NNAL] [20251115-22:41:59] [INFO] Ascend-cann-nnal_8.3.RC1_linux-x86_64.run install success
Warning!!! If the environment variables of atb and asdsip are set at the same time, unexpected consequences will occur.
Import the corresponding environment variables based on the usage scenarios: atb for large model scenarios, asdsip for embedded scenarios.
Please make sure that the environment variables have been configured.
If you want to use atb module:
- To take effect for current user, you can exec command below:
  source /usr/local/Ascend/nnal/atb/set_env.sh
  or add "source /usr/local/Ascend/nnal/atb/set_env.sh" to ~/.bashrc.
If you want to use asdsip module:
- To take effect for current user, you can exec command below:
  source /usr/local/Ascend/nnal/asdsip/set_env.sh
  or add "source /usr/local/Ascend/nnal/asdsip/set_env.sh" to ~/.bashrc.
[root@orangepi lib64]# cat /usr/local/Ascend/nnal/atb/latest/version.info
Ascend-cann-atb : 8.3.RC1
Ascend-cann-atb Version : 8.3.RC1.B106
Platform : x86_64
branch : 8.3.rc1-0702
commit id : 16004f23040e0dcdd3cf0c64ecf36622487038ba

Set the logical NPU cores used for inference to 0 and 1, then test the multimodal understanding model Qwen2.5-VL-7B-Instruct. As the run below shows, Qwen2.5-VL-7B-Instruct averages about 20 output tokens per second on 2 x Ascend 310P while accurately identifying the people and actions in the image.

[root@orangepi atb-models]# bash examples/models/qwen2_vl/run_pa.sh --model_path /models/Qwen2.5-VL-7B-Instruct/ --input_image /root/pic/test.jpg
[2025-11-15 22:12:49,663] torch.distributed.run: [WARNING]
[2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] *****************************************
[2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] *****************************************
/usr/local/lib64/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
/usr/local/lib64/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn( 2025-11-15 22:12:53.250 7934 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llmmodels] [loglevel] [file:line] [status code] msg 2025-11-15 22:12:53.250 7933 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llmmodels] [loglevel] [file:line] [status code] msg [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] llama_LlamaDecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] llama_LlamaDecoderModel model already exists, but the duplication doesn't matter. 
[2025-11-15 22:12:55,335] [7934] [139886327420160] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 1, device_id: 1, numa_id: 0, shard_devices: [0, 1], cpus: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-11-15 22:12:55,336] [7934] [139886327420160] [llmmodels] [INFO] [cpu_binding.py-280] : process 7934, new_affinity is [8, 9, 10, 11, 12, 13, 14, 15], cpu count 8 [2025-11-15 22:12:55,356] [7933] [139649439929600] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 0, device_id: 0, numa_id: 0, shard_devices: [0, 1], cpus: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-11-15 22:12:55,357] [7933] [139649439929600] [llmmodels] [INFO] [cpu_binding.py-280] : process 7933, new_affinity is [0, 1, 2, 3, 4, 5, 6, 7], cpu count 8 [2025-11-15 22:12:56,032] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-156] : model_runner.quantize: None, model_runner.kv_quant_type: None, model_runner.fa_quant_type: None, model_runner.dtype: torch.float16 [2025-11-15 22:13:01,826] [7933] [139649439929600] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set [2025-11-15 22:13:01,827] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-187] : init tokenizer done Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [2025-11-15 22:13:02,070] [7934] [139886327420160] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 
[W InferFormat.cpp:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. (function operator()) [W InferFormat.cpp:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. (function operator()) [2025-11-15 22:13:08,435] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-153] : >>>> qwen_QwenDecoderModel is called. [2025-11-15 22:13:08,526] [7934] [139886327420160] [llmmodels] [INFO] [flash_causal_qwen2.py-153] : >>>> qwen_QwenDecoderModel is called. [2025-11-15 22:13:16.666] [7933] [139649439929600] [llmmodels] [WARN] [operation_factory.cpp:42] OperationName: TransdataOperation not find in operation factory map [2025-11-15 22:13:16.698] [7934] [139886327420160] [llmmodels] [WARN] [operation_factory.cpp:42] OperationName: TransdataOperation not find in operation factory map [2025-11-15 22:13:22,379] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-282] : model: FlashQwen2vlForCausalLM( (rotary_embedding): PositionRotaryEmbedding() (attn_mask): AttentionMask() (vision_tower): Qwen25VisionTransformerPretrainedModelATB( (encoder): Qwen25VLVisionEncoderATB( (layers): ModuleList( (0-31): 32 x Qwen25VLVisionLayerATB( (attn): VisionAttention( (qkv): TensorParallelColumnLinear( (linear): FastLinear() ) (proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (mlp): VisionMlp( (gate_up_proj): TensorParallelColumnLinear( (linear): FastLinear() ) (down_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (norm1): BaseRMSNorm() (norm2): BaseRMSNorm() ) ) (patch_embed): FastPatchEmbed( (proj): TensorReplicatedLinear( (linear): FastLinear() ) ) (patch_merger): PatchMerger( (patch_merger_mlp_0): TensorParallelColumnLinear( (linear): FastLinear() ) (patch_merger_mlp_2): TensorParallelRowLinear( (linear): FastLinear() ) (patch_merger_ln_q): BaseRMSNorm() ) ) (rotary_pos_emb): VisionRotaryEmbedding() ) (language_model): FlashQwen2UsingMROPEForCausalLM( 
(rotary_embedding): PositionRotaryEmbedding() (attn_mask): AttentionMask() (transformer): FlashQwenModel( (wte): TensorEmbeddingWithoutChecking() (h): ModuleList( (0-27): 28 x FlashQwenLayer( (attn): FlashQwenAttention( (rotary_emb): PositionRotaryEmbedding() (c_attn): TensorParallelColumnLinear( (linear): FastLinear() ) (c_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (mlp): QwenMLP( (act): SiLU() (w2_w1): TensorParallelColumnLinear( (linear): FastLinear() ) (c_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (ln_1): QwenRMSNorm() (ln_2): QwenRMSNorm() ) ) (ln_f): QwenRMSNorm() ) (lm_head): TensorParallelHead( (linear): FastLinear() ) ) ) [2025-11-15 22:13:24,268] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-134] : hbm_capacity(GB): 87.5078125, init_memory(GB): 11.376015624962747 [2025-11-15 22:13:24,789] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-342] : pa_runner: PARunner(model_path=/models/Qwen2.5-VL-7B-Instruct/, input_text=请用超过500个字详细说明图片的内容,并仔细判断画面中的人物是否有吸烟动作。, max_position_embeddings=None, max_input_length=16384, max_output_length=1024, max_prefill_tokens=-1, load_tokenizer=True, enable_atb_torch=False, max_prefill_batch_size=None, max_batch_size=1, dtype=torch.float16, block_size=128, model_config=ModelConfig(num_heads=14, num_kv_heads=2, num_kv_heads_origin=4, head_size=128, k_head_size=128, v_head_size=128, num_layers=28, device=npu:0, dtype=torch.float16, soc_info=NPUSocInfo(soc_name='', soc_version=200, need_nz=True, matmul_nd_nz=False), kv_quant_type=None, fa_quant_type=None, mapping=Mapping(world_size=2, rank=0, num_nodes=1,pp_rank=0, pp_groups=[[0], [1]], micro_batch_size=1, attn_dp_groups=[[0], [1]], attn_tp_groups=[[0, 1]], attn_inner_sp_groups=[[0], [1]], attn_cp_groups=[[0], [1]], attn_o_proj_tp_groups=[[0], [1]], mlp_tp_groups=[[0, 1]], moe_ep_groups=[[0], [1]], moe_tp_groups=[[0, 1]]), cla_share_factor=1, model_type=qwen2_5_vl, enable_nz=False), max_memory=93960798208, [2025-11-15 22:13:24,794] 
[7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-122] : ---------------Begin warm_up--------------- [2025-11-15 22:13:24,794] [7933] [139649439929600] [llmmodels] [INFO] [cache.py-154] : kv cache will allocate 0.46484375GB memory [2025-11-15 22:13:24,821] [7934] [139886327420160] [llmmodels] [INFO] [cache.py-154] : kv cache will allocate 0.46484375GB memory [2025-11-15 22:13:24,827] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1139] : ------total req num: 1, infer start-------- [2025-11-15 22:13:26,002] [7934] [139886327420160] [llmmodels] [INFO] [flash_causal_qwen2.py-680] : <<<<<<<after transdata k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,023] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-676] : <<<<<<< ori k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,023] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-680] : <<<<<<<after transdata k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,024] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-705] : >>>>>>id of kcache is 139645634198608 id of vcache is 139645634198320 [2025-11-15 22:13:34,363] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 9476.590633392334ms, Prefill average time: 9476.590633392334ms, Decode token time: 54.94809150695801ms, E2E time: 9531.538724899292ms [2025-11-15 22:13:34,363] [7934] [139886327420160] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 9452.020645141602ms, Prefill average time: 9452.020645141602ms, Decode token time: 54.654598236083984ms, E2E time: 9506.675243377686ms [2025-11-15 22:13:34,366] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1326] : -------------------performance dumped------------------------ [2025-11-15 22:13:34,371] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1329] : | batch_size | input_seq_len | output_seq_len | e2e_time(ms) | prefill_time(ms) | 
decoder_token_time(ms) | prefill_count | prefill_average_time(ms) | |-------------:|----------------:|-----------------:|---------------:|-------------------:|-------------------------:|----------------:|---------------------------:| | 1 | 16384 | 2 | 9531.54 | 9476.59 | 54.95 | 1 | 9476.59 | /usr/local/lib64/python3.11/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( [2025-11-15 22:13:35,307] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-148] : warmup_memory(GB): 15.75 [2025-11-15 22:13:35,307] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-153] : ---------------End warm_up--------------- /usr/local/lib64/python3.11/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( [2025-11-15 22:13:35,363] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1139] : ------total req num: 1, infer start-------- [2025-11-15 22:13:50,021] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 1004.0028095245361ms, Prefill average time: 1004.0028095245361ms, Decode token time: 13.301290491575836ms, E2E time: 14611.222982406616ms [2025-11-15 22:13:50,021] [7934] [139886327420160] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 1067.9974555969238ms, Prefill average time: 1067.9974555969238ms, Decode token time: 13.300292536193908ms, E2E time: 14674.196720123291ms [2025-11-15 22:13:50,025] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1326] : -------------------performance dumped------------------------ [2025-11-15 22:13:50,028] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1329] : | batch_size | input_seq_len | output_seq_len | e2e_time(ms) | prefill_time(ms) | decoder_token_time(ms) | prefill_count | prefill_average_time(ms) | |-------------:|----------------:|-----------------:|---------------:|-------------------:|-------------------------:|----------------:|---------------------------:| | 1 | 1675 | 1024 | 14611.2 | 1004 | 13.3 | 1 | 1004 | [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-385] : Question[0]: [{'image': '/root/pic/test.jpg'}, {'text': '请用超过500个字详细说明图片的内容,并仔细判断画面中的人物是否有吸烟动作。'}] [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-386] : Answer[0]: 这张图片展示了一个无人机航拍的场景,画面中可以看到两名工人站在一个雪地或冰面上。他们穿着橙色的安全背心和红色的安全帽,显得非常醒目。背景中可以看到一些雪地和一些金属结构,可能是桥梁或工业设施的一部分。 从图片的细节来看,画面右侧的工人右手放在嘴边,似乎在吸烟。他的姿势和动作与吸烟者的典型姿势相符。然而,由于图片的分辨率和角度限制,无法完全确定这个动作是否真实发生。如果要准确判断,可能需要更多的视频片段或更清晰的图像。 从无人机航拍的角度来看,这个场景可能是在进行某种工业或建筑项目的检查或监控。两名工人可能正在进行现场检查或讨论工作事宜。雪地和金属结构表明这可能是一个寒冷的冬季,或者是一个寒冷的气候区域。 无人机航拍技术在工业和建筑领域中非常常见,因为它可以提供高空视角,帮助工程师和管理人员更好地了解现场情况。这种技术不仅可以节省时间和成本,还可以提高工作效率和安全性。在进行航拍时,确保遵守当地的法律法规和安全规定是非常重要的。 
总的来说,这张图片展示了一个无人机航拍的场景,画面中两名工人站在雪地上,其中一人似乎在吸烟。虽然无法完全确定这个动作是否真实发生,但根据他们的姿势和动作,可以合理推测这个动作的存在。
[2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-387] : Generate[0] token num: 282
[2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-389] : Latency(s): 14.721353530883789
[2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-390] : Throughput(tokens/s): 19.15584728050956

This article has walked through the full process of deploying the MindIE environment in a Docker container on the OrangePi AI Studio and running the Qwen2.5-VL-7B-Instruct multimodal model for smoking-action recognition, validating that multimodal understanding models run reliably on Ascend 310P devices.
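As a sanity check, the reported throughput can be reproduced from two other values the log prints: the generated token count and the end-to-end latency. A minimal sketch using only figures taken from the run above:

```python
# Reproduce the throughput figure printed by run_pa.py from its own log values.
generated_tokens = 282          # "Generate[0] token num: 282"
latency_s = 14.721353530883789  # "Latency(s): 14.72..."

throughput = generated_tokens / latency_s
print(f"Throughput(tokens/s): {throughput:.2f}")  # ~19.16, matching the log

# The decode-only rate is much higher; the gap is the one-time prefill cost
# (about 1 s for the 1675-token prompt in this run).
decode_token_time_ms = 13.3     # "Decode token time: 13.30...ms"
decode_rate = 1000 / decode_token_time_ms
print(f"Decode-only rate(tokens/s): {decode_rate:.1f}")  # ~75 tokens/s
```

The ~20 tokens/s headline number is therefore an end-to-end average over prefill plus 282 decode steps, not the raw per-token decode speed.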
-
The npu-smi command cannot retrieve the Serial Number. Is a component missing, or is something else wrong?

root@davinci-mini:/home/HwHiAiUser# npu-smi info -t board -i 0
NPU ID : 0
Product Name :
Model :
Manufacturer :
Serial Number :
Software Version : 21.0.3.1
Firmware Version : 1.79.22.5.220
Board ID : 0xbbc
PCB ID : NA
BOM ID : 0
Chip Count : 1
Faulty Chip Count : 0

root@davinci-mini:/home/HwHiAiUser# npu-smi info
+------------------------------------------------------------------------------+
| npu-smi 21.0.3.1                  Version: 21.0.3.1                          |
+-------------------+-----------------+----------------------------------------+
| NPU   Name        | Health          | Power(W)   Temp(C)                     |
| Chip  Device      | Bus-Id          | AICore(%)  Memory-Usage(MB)            |
+===================+=================+========================================+
| 0     310         | OK              | 8.0        51                          |
| 0     0           | NA              | 0          3440 / 8192                 |
+===================+=================+========================================+
-
My code ran fine on my original Huawei Cloud notebook, but on a newly purchased notebook it fails with the error shown in the screenshot. The new notebook's instance ID is c768c7a7-178f-41b8-86cb-6aaeda31b331. Could you tell me what is wrong with the new notebook?
-
1. Download the model weights

Set up the Python environment:

conda create -n qwq_model python==3.13.6
conda activate qwq_model
pip install modelscope

Download the model via the ModelScope SDK (https://www.modelscope.cn/models/Qwen/QwQ-32B) to the specified directory:

mkdir -p /usr/local/data/model_list/model/QwQ-32B
modelscope download --model Qwen/QwQ-32B --local_dir /usr/local/data/model_list/model/QwQ-32B

2. Deploy the model

Edit /etc/sysctl.conf and set net.ipv4.ip_forward to 1, then apply it (note: /etc/sysctl.conf is not a shell script, so apply it with sysctl -p rather than source):

vim /etc/sysctl.conf    # set net.ipv4.ip_forward = 1
sysctl -p

docker pull swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129

Start the container:

docker run -itd \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  --device=/dev/davinci_manager \
  --device=/dev/devmm_svm \
  --device=/dev/hisi_hdc \
  -v /var/log/npu/:/usr/slog \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v /usr/local/data/model_list/model:/usr/local/data/model_list/model \
  --net=host \
  --name vllm-qwen \
  91c374f329e4 \
  /bin/bash

Enter the container (the general form is docker exec -it -u ma-user ${container_name} /bin/bash; here the container is named vllm-qwen):

docker exec -it -u ma-user vllm-qwen /bin/bash

Set the parameters inside the container:

export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_PLUGINS=ascend

# VPC subnet: must be edited manually to match your VPC; see the notes below
VPC_CIDR="192.168.0.0/16"
VPC_PREFIX=$(echo "$VPC_CIDR" | cut -d'/' -f1 | cut -d'.' -f1-2)
POD_INET_IP=$(ifconfig | grep -oP "(?<=inet\s)$VPC_PREFIX\.\d+\.\d+" | head -n 1)
POD_NETWORK_IFNAME=$(ifconfig | grep -B 1 "$POD_INET_IP" | head -n 1 | awk '{print $1}' | sed 's/://')
echo "POD_INET_IP: $POD_INET_IP"
echo "POD_NETWORK_IFNAME: $POD_NETWORK_IFNAME"

# Specify the communication NICs
export GLOO_SOCKET_IFNAME=$POD_NETWORK_IFNAME
export TP_SOCKET_IFNAME=$POD_NETWORK_IFNAME
export HCCL_SOCKET_IFNAME=$POD_NETWORK_IFNAME
# Multi-node configuration
export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1
# Enable memory optimization
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
# Expand communication-algorithm orchestration on the device-side AI Vector Core units
export HCCL_OP_EXPANSION_MODE=AIV
# Specify the visible devices, as needed
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Enable CPU core binding, as needed
export CPU_AFFINITY_CONF=1
export LD_PRELOAD=/usr/local/lib/libjemalloc.so.2:${LD_PRELOAD}
# ascend-turbo-graph mode is enabled by default; specify its startup plugin
export VLLM_PLUGINS=ascend_vllm
# For acl-graph or eager mode, use this plugin instead
# export VLLM_PLUGINS=ascend
# Use the vLLM v1 backend
export VLLM_USE_V1=1
# Specify the vLLM version
export VLLM_VERSION=0.9.0
export USE_MM_ALL_REDUCE_OP=1
export MM_ALL_REDUCE_OP_THRESHOLD=256
# The following environment variables must not be set
unset ENABLE_QWEN_HYPERDRIVE_OPT
unset ENABLE_QWEN_MICROBATCH
unset ENABLE_PHASE_AWARE_QKVO_QUANT
unset DISABLE_QWEN_DP_PROJ

source /home/ma-user/AscendCloud/AscendTurbo/set_env.bash

Start the API service:

nohup python -m vllm.entrypoints.openai.api_server \
  --model /usr/local/data/model_list/model/QwQ-32B \
  --max-num-seqs=256 \
  --max-model-len=512 \
  --max-num-batched-tokens=512 \
  --tensor-parallel-size=4 \
  --block-size=128 \
  --host=192.168.0.127 \
  --port=18186 \
  --gpu-memory-utilization=0.95 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --additional-config='{"ascend_turbo_graph_config": {"enabled": true}, "ascend_scheduler_config": {"enabled": true}}' > QwQ-32B.log 2>&1 &

The port number can be customized; just avoid ports already in use.

3. Verify the API service

curl http://192.168.0.127:18186/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/usr/local/data/model_list/model/QwQ-32B", "prompt": "What is moon", "max_tokens": 64, "temperature": 0.5}'
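Besides curl, the OpenAI-compatible endpoint can be exercised from Python. A minimal standard-library sketch, assuming the host, port, and model path from the deployment above; the helper names are illustrative, not part of vLLM:

```python
import json
from urllib import request

# Host/port must match the --host/--port flags used when launching vLLM.
API_URL = "http://192.168.0.127:18186/v1/completions"

def build_completion_request(prompt, max_tokens=64, temperature=0.5):
    """Build the JSON body expected by vLLM's OpenAI-compatible /v1/completions."""
    return {
        "model": "/usr/local/data/model_list/model/QwQ-32B",  # must match --model
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, **kwargs):
    """POST the request and return the first completion's text."""
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode()
    req = request.Request(API_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

With the server running, `complete("What is moon")` mirrors the curl verification step; note that with --max-model-len=512, prompt plus completion must stay within 512 tokens.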
-
Hello, I am setting up the Ascend environment on a notebook and need some extra storage for data and other files, but I ran into problems mounting OBS. The documentation says: "Select a running notebook instance, click the instance name to open the notebook instance details page, go to the 'Storage Configuration' tab, click 'Add Data Storage', and set the mount parameters." I followed this and opened the notebook details page, but could not find where to mount PFS. Could you tell me where this "Storage Configuration" tab is? Also, I see that CN North-Beijing4 and CN East-Shanghai1 both support mounting OBS, but they have no Ascend compute. Please help.
-
[Inference by Day, Training by Night] YOLOv8 NPU Training and Inference on the Ascend 310P

Backed by Huawei's MindSpore framework, we ran the complete YOLOv8m training flow on the OrangePi AI Studio Pro development board. Training YOLOv8m on a single NPU, one epoch over roughly 7,000 images takes only about 6.93 minutes (the first epoch takes 8.34 minutes because the computation graph must be compiled), so 10 epochs finish in roughly 70 minutes. The training logs show the loss falling from 6.45 in the first epoch to around 2.58 in the last, indicating that the model trains well. Throughout training, the NPU's AICore utilization and memory usage stay at reasonable levels, demonstrating that the Ascend 310P performs well on object-detection tasks, comparable to an NVIDIA GPU, and offers developers another efficient AI compute platform. Through the open-source mindyolo repository, other developers can reproduce this result and build on it.

The OrangePi AI Studio Pro carries the same Ascend 310P chip as the Atlas 300V Pro video-analysis card: two chips in total, each with 96 GB of memory, providing 176 TFLOPS of training compute and 352 TOPS of inference compute. The figure above shows the AICore utilization and memory usage while training yolov8m on a single NPU; each epoch over the dataset takes about 6.93 minutes:

2025-09-24 16:47:11,931 [INFO]
2025-09-24 16:47:11,931 [INFO] Please check the above information for the configurations
2025-09-24 16:47:12,050 [WARNING] Parse Model, args: nearest, keep str type
2025-09-24 16:47:12,069 [WARNING] Parse Model, args: nearest, keep str type
2025-09-24 16:47:12,184 [INFO] number of network params, total: 25.896391M, trainable: 25.863252M
2025-09-24 16:47:16,786 [WARNING] Parse Model, args: nearest, keep str type
2025-09-24 16:47:16,807 [WARNING] Parse Model, args: nearest, keep str type
2025-09-24 16:47:16,920 [INFO] number of network params, total: 25.896391M, trainable: 25.863252M
2025-09-24 16:47:31,011 [INFO] ema_weight not exist, default pretrain weight is currently used.
2025-09-24 16:47:31,118 [INFO] Dataset Cache file hash/version check success.
2025-09-24 16:47:31,118 [INFO] Load dataset cache from [/home/orangepi/workspace/mindyolo/examples/finetune_visdrone/train.cache.npy] success.
2025-09-24 16:47:31,142 [INFO] Dataloader num parallel workers: [8]
2025-09-24 16:47:31,240 [INFO] Dataset Cache file hash/version check success.
2025-09-24 16:47:31,240 [INFO] Load dataset cache from [/home/orangepi/workspace/mindyolo/examples/finetune_visdrone/train.cache.npy] success.
2025-09-24 16:47:31,264 [INFO] Dataloader num parallel workers: [8] 2025-09-24 16:47:31,438 [INFO] 2025-09-24 16:47:31,445 [INFO] got 1 active callback as follows: 2025-09-24 16:47:31,445 [INFO] SummaryCallback() 2025-09-24 16:47:31,445 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :). 2025-09-24 16:50:38,076 [INFO] Epoch 1/10, Step 100/404, imgsize (640, 640), loss: 6.4507, lbox: 3.8446, lcls: 0.5687, dfl: 2.0375, cur_lr: 0.09257426112890244 2025-09-24 16:50:38,970 [INFO] Epoch 1/10, Step 100/404, step time: 1875.26 ms 2025-09-24 16:52:21,629 [INFO] Epoch 1/10, Step 200/404, imgsize (640, 640), loss: 4.8078, lbox: 3.0080, lcls: 0.4118, dfl: 1.3880, cur_lr: 0.08514851331710815 2025-09-24 16:52:21,653 [INFO] Epoch 1/10, Step 200/404, step time: 1026.83 ms 2025-09-24 16:54:04,347 [INFO] Epoch 1/10, Step 300/404, imgsize (640, 640), loss: 4.0795, lbox: 2.4281, lcls: 0.3466, dfl: 1.3048, cur_lr: 0.07772277295589447 2025-09-24 16:54:04,371 [INFO] Epoch 1/10, Step 300/404, step time: 1027.18 ms 2025-09-24 16:55:47,067 [INFO] Epoch 1/10, Step 400/404, imgsize (640, 640), loss: 3.8245, lbox: 2.1755, lcls: 0.3567, dfl: 1.2923, cur_lr: 0.07029703259468079 2025-09-24 16:55:47,091 [INFO] Epoch 1/10, Step 400/404, step time: 1027.19 ms 2025-09-24 16:55:52,087 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-1_404.ckpt 2025-09-24 16:55:52,087 [INFO] Epoch 1/10, epoch time: 8.34 min. 
2025-09-24 16:57:34,759 [INFO] Epoch 2/10, Step 100/404, imgsize (640, 640), loss: 3.8083, lbox: 2.2584, lcls: 0.3404, dfl: 1.2095, cur_lr: 0.062162574380636215
2025-09-24 16:57:34,768 [INFO] Epoch 2/10, Step 100/404, step time: 1026.80 ms
2025-09-24 16:59:17,441 [INFO] Epoch 2/10, Step 200/404, imgsize (640, 640), loss: 3.7835, lbox: 2.2670, lcls: 0.3574, dfl: 1.1592, cur_lr: 0.05465514957904816
2025-09-24 16:59:17,450 [INFO] Epoch 2/10, Step 200/404, step time: 1026.82 ms
2025-09-24 17:01:00,127 [INFO] Epoch 2/10, Step 300/404, imgsize (640, 640), loss: 3.5251, lbox: 2.0144, lcls: 0.3210, dfl: 1.1898, cur_lr: 0.0471477210521698
2025-09-24 17:01:00,136 [INFO] Epoch 2/10, Step 300/404, step time: 1026.85 ms
2025-09-24 17:02:42,826 [INFO] Epoch 2/10, Step 400/404, imgsize (640, 640), loss: 3.5596, lbox: 2.0947, lcls: 0.3086, dfl: 1.1563, cur_lr: 0.03964029625058174
2025-09-24 17:02:42,835 [INFO] Epoch 2/10, Step 400/404, step time: 1026.99 ms
2025-09-24 17:02:47,745 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-2_404.ckpt
2025-09-24 17:02:47,745 [INFO] Epoch 2/10, epoch time: 6.93 min.
2025-09-24 17:04:30,489 [INFO] Epoch 3/10, Step 100/404, imgsize (640, 640), loss: 3.5524, lbox: 2.1004, lcls: 0.2938, dfl: 1.1582, cur_lr: 0.031090890988707542
2025-09-24 17:04:30,497 [INFO] Epoch 3/10, Step 100/404, step time: 1027.52 ms
2025-09-24 17:06:13,196 [INFO] Epoch 3/10, Step 200/404, imgsize (640, 640), loss: 3.8549, lbox: 2.2845, lcls: 0.3526, dfl: 1.2178, cur_lr: 0.02350178174674511
2025-09-24 17:06:13,205 [INFO] Epoch 3/10, Step 200/404, step time: 1027.07 ms
2025-09-24 17:07:55,875 [INFO] Epoch 3/10, Step 300/404, imgsize (640, 640), loss: 3.6236, lbox: 2.1016, lcls: 0.3113, dfl: 1.2106, cur_lr: 0.015912672504782677
2025-09-24 17:07:55,883 [INFO] Epoch 3/10, Step 300/404, step time: 1026.78 ms
2025-09-24 17:09:38,572 [INFO] Epoch 3/10, Step 400/404, imgsize (640, 640), loss: 3.5586, lbox: 2.0730, lcls: 0.3314, dfl: 1.1542, cur_lr: 0.008323564194142818
2025-09-24 17:09:38,581 [INFO] Epoch 3/10, Step 400/404, step time: 1026.97 ms
2025-09-24 17:09:43,528 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-3_404.ckpt
2025-09-24 17:09:43,529 [INFO] Epoch 3/10, epoch time: 6.93 min.
2025-09-24 17:11:26,211 [INFO] Epoch 4/10, Step 100/404, imgsize (640, 640), loss: 3.3767, lbox: 1.9760, lcls: 0.2928, dfl: 1.1079, cur_lr: 0.007029999978840351
2025-09-24 17:11:26,218 [INFO] Epoch 4/10, Step 100/404, step time: 1026.90 ms
2025-09-24 17:13:08,899 [INFO] Epoch 4/10, Step 200/404, imgsize (640, 640), loss: 3.4213, lbox: 1.9382, lcls: 0.3052, dfl: 1.1779, cur_lr: 0.007029999978840351
2025-09-24 17:13:08,908 [INFO] Epoch 4/10, Step 200/404, step time: 1026.89 ms
2025-09-24 17:14:51,583 [INFO] Epoch 4/10, Step 300/404, imgsize (640, 640), loss: 2.8313, lbox: 1.5666, lcls: 0.2380, dfl: 1.0267, cur_lr: 0.007029999978840351
2025-09-24 17:14:51,591 [INFO] Epoch 4/10, Step 300/404, step time: 1026.83 ms
2025-09-24 17:16:34,277 [INFO] Epoch 4/10, Step 400/404, imgsize (640, 640), loss: 3.2905, lbox: 1.9274, lcls: 0.2889, dfl: 1.0741, cur_lr: 0.007029999978840351
2025-09-24 17:16:34,285 [INFO] Epoch 4/10, Step 400/404, step time: 1026.94 ms
2025-09-24 17:16:39,232 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-4_404.ckpt
2025-09-24 17:16:39,232 [INFO] Epoch 4/10, epoch time: 6.93 min.
2025-09-24 17:18:21,892 [INFO] Epoch 5/10, Step 100/404, imgsize (640, 640), loss: 3.1534, lbox: 1.7844, lcls: 0.2581, dfl: 1.1109, cur_lr: 0.006039999891072512
2025-09-24 17:18:21,900 [INFO] Epoch 5/10, Step 100/404, step time: 1026.67 ms
2025-09-24 17:20:04,596 [INFO] Epoch 5/10, Step 200/404, imgsize (640, 640), loss: 3.1152, lbox: 1.7685, lcls: 0.2518, dfl: 1.0949, cur_lr: 0.006039999891072512
2025-09-24 17:20:04,604 [INFO] Epoch 5/10, Step 200/404, step time: 1027.04 ms
2025-09-24 17:21:47,284 [INFO] Epoch 5/10, Step 300/404, imgsize (640, 640), loss: 3.3179, lbox: 1.8412, lcls: 0.2888, dfl: 1.1880, cur_lr: 0.006039999891072512
2025-09-24 17:21:47,292 [INFO] Epoch 5/10, Step 300/404, step time: 1026.88 ms
2025-09-24 17:23:29,968 [INFO] Epoch 5/10, Step 400/404, imgsize (640, 640), loss: 3.2193, lbox: 1.8366, lcls: 0.2620, dfl: 1.1207, cur_lr: 0.006039999891072512
2025-09-24 17:23:29,976 [INFO] Epoch 5/10, Step 400/404, step time: 1026.84 ms
2025-09-24 17:23:34,954 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-5_404.ckpt
2025-09-24 17:23:34,954 [INFO] Epoch 5/10, epoch time: 6.93 min.
2025-09-24 17:25:17,530 [INFO] Epoch 6/10, Step 100/404, imgsize (640, 640), loss: 2.7642, lbox: 1.5834, lcls: 0.2164, dfl: 0.9643, cur_lr: 0.005049999803304672
2025-09-24 17:25:17,538 [INFO] Epoch 6/10, Step 100/404, step time: 1025.84 ms
2025-09-24 17:27:00,125 [INFO] Epoch 6/10, Step 200/404, imgsize (640, 640), loss: 2.6854, lbox: 1.4272, lcls: 0.2080, dfl: 1.0502, cur_lr: 0.005049999803304672
2025-09-24 17:27:00,134 [INFO] Epoch 6/10, Step 200/404, step time: 1025.96 ms
2025-09-24 17:28:42,720 [INFO] Epoch 6/10, Step 300/404, imgsize (640, 640), loss: 2.7541, lbox: 1.5028, lcls: 0.2171, dfl: 1.0342, cur_lr: 0.005049999803304672
2025-09-24 17:28:42,728 [INFO] Epoch 6/10, Step 300/404, step time: 1025.94 ms
2025-09-24 17:30:25,315 [INFO] Epoch 6/10, Step 400/404, imgsize (640, 640), loss: 2.8092, lbox: 1.5545, lcls: 0.2121, dfl: 1.0427, cur_lr: 0.005049999803304672
2025-09-24 17:30:25,323 [INFO] Epoch 6/10, Step 400/404, step time: 1025.95 ms
2025-09-24 17:30:30,293 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-6_404.ckpt
2025-09-24 17:30:30,294 [INFO] Epoch 6/10, epoch time: 6.92 min.
2025-09-24 17:32:12,881 [INFO] Epoch 7/10, Step 100/404, imgsize (640, 640), loss: 3.0997, lbox: 1.8226, lcls: 0.2402, dfl: 1.0369, cur_lr: 0.00406000018119812
2025-09-24 17:32:12,890 [INFO] Epoch 7/10, Step 100/404, step time: 1025.96 ms
2025-09-24 17:33:55,477 [INFO] Epoch 7/10, Step 200/404, imgsize (640, 640), loss: 2.8140, lbox: 1.5979, lcls: 0.2143, dfl: 1.0018, cur_lr: 0.00406000018119812
2025-09-24 17:33:55,485 [INFO] Epoch 7/10, Step 200/404, step time: 1025.96 ms
2025-09-24 17:35:38,072 [INFO] Epoch 7/10, Step 300/404, imgsize (640, 640), loss: 3.0294, lbox: 1.6439, lcls: 0.2544, dfl: 1.1310, cur_lr: 0.00406000018119812
2025-09-24 17:35:38,081 [INFO] Epoch 7/10, Step 300/404, step time: 1025.95 ms
2025-09-24 17:37:20,660 [INFO] Epoch 7/10, Step 400/404, imgsize (640, 640), loss: 2.8015, lbox: 1.5686, lcls: 0.2252, dfl: 1.0077, cur_lr: 0.00406000018119812
2025-09-24 17:37:20,669 [INFO] Epoch 7/10, Step 400/404, step time: 1025.88 ms
2025-09-24 17:37:25,643 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-7_404.ckpt
2025-09-24 17:37:25,644 [INFO] Epoch 7/10, epoch time: 6.92 min.
2025-09-24 17:39:08,227 [INFO] Epoch 8/10, Step 100/404, imgsize (640, 640), loss: 2.5091, lbox: 1.3373, lcls: 0.1711, dfl: 1.0007, cur_lr: 0.0030700000934302807
2025-09-24 17:39:08,236 [INFO] Epoch 8/10, Step 100/404, step time: 1025.92 ms
2025-09-24 17:40:50,818 [INFO] Epoch 8/10, Step 200/404, imgsize (640, 640), loss: 2.5926, lbox: 1.4141, lcls: 0.1923, dfl: 0.9863, cur_lr: 0.0030700000934302807
2025-09-24 17:40:50,826 [INFO] Epoch 8/10, Step 200/404, step time: 1025.91 ms
2025-09-24 17:42:33,392 [INFO] Epoch 8/10, Step 300/404, imgsize (640, 640), loss: 2.5341, lbox: 1.3811, lcls: 0.1869, dfl: 0.9660, cur_lr: 0.0030700000934302807
2025-09-24 17:42:33,400 [INFO] Epoch 8/10, Step 300/404, step time: 1025.74 ms
2025-09-24 17:44:15,994 [INFO] Epoch 8/10, Step 400/404, imgsize (640, 640), loss: 3.0024, lbox: 1.6379, lcls: 0.2284, dfl: 1.1361, cur_lr: 0.0030700000934302807
2025-09-24 17:44:16,002 [INFO] Epoch 8/10, Step 400/404, step time: 1026.02 ms
2025-09-24 17:44:20,974 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-8_404.ckpt
2025-09-24 17:44:20,975 [INFO] Epoch 8/10, epoch time: 6.92 min.
2025-09-24 17:46:03,561 [INFO] Epoch 9/10, Step 100/404, imgsize (640, 640), loss: 3.0890, lbox: 1.8395, lcls: 0.2321, dfl: 1.0174, cur_lr: 0.0020800000056624413
2025-09-24 17:46:03,569 [INFO] Epoch 9/10, Step 100/404, step time: 1025.94 ms
2025-09-24 17:47:46,157 [INFO] Epoch 9/10, Step 200/404, imgsize (640, 640), loss: 2.9621, lbox: 1.6608, lcls: 0.2360, dfl: 1.0652, cur_lr: 0.0020800000056624413
2025-09-24 17:47:46,166 [INFO] Epoch 9/10, Step 200/404, step time: 1025.96 ms
2025-09-24 17:49:28,755 [INFO] Epoch 9/10, Step 300/404, imgsize (640, 640), loss: 2.4801, lbox: 1.3320, lcls: 0.1753, dfl: 0.9728, cur_lr: 0.0020800000056624413
2025-09-24 17:49:28,763 [INFO] Epoch 9/10, Step 300/404, step time: 1025.97 ms
2025-09-24 17:51:11,359 [INFO] Epoch 9/10, Step 400/404, imgsize (640, 640), loss: 2.8075, lbox: 1.5971, lcls: 0.1995, dfl: 1.0109, cur_lr: 0.0020800000056624413
2025-09-24 17:51:11,367 [INFO] Epoch 9/10, Step 400/404, step time: 1026.03 ms
2025-09-24 17:51:16,330 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-9_404.ckpt
2025-09-24 17:51:16,331 [INFO] Epoch 9/10, epoch time: 6.92 min.
2025-09-24 17:52:58,913 [INFO] Epoch 10/10, Step 100/404, imgsize (640, 640), loss: 2.6278, lbox: 1.4529, lcls: 0.1860, dfl: 0.9889, cur_lr: 0.0010900000343099236
2025-09-24 17:52:58,921 [INFO] Epoch 10/10, Step 100/404, step time: 1025.90 ms
2025-09-24 17:54:41,521 [INFO] Epoch 10/10, Step 200/404, imgsize (640, 640), loss: 2.7550, lbox: 1.5724, lcls: 0.2083, dfl: 0.9742, cur_lr: 0.0010900000343099236
2025-09-24 17:54:41,529 [INFO] Epoch 10/10, Step 200/404, step time: 1026.08 ms
2025-09-24 17:56:24,125 [INFO] Epoch 10/10, Step 300/404, imgsize (640, 640), loss: 2.4470, lbox: 1.2448, lcls: 0.1758, dfl: 1.0263, cur_lr: 0.0010900000343099236
2025-09-24 17:56:24,133 [INFO] Epoch 10/10, Step 300/404, step time: 1026.03 ms
2025-09-24 17:58:06,727 [INFO] Epoch 10/10, Step 400/404, imgsize (640, 640), loss: 2.5783, lbox: 1.3733, lcls: 0.1848, dfl: 1.0202, cur_lr: 0.0010900000343099236
2025-09-24 17:58:06,736 [INFO] Epoch 10/10, Step 400/404, step time: 1026.02 ms
2025-09-24 17:58:11,744 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-10_404.ckpt
2025-09-24 17:58:11,745 [INFO] Epoch 10/10, epoch time: 6.92 min.
2025-09-24 17:58:12,149 [INFO] End Train.
2025-09-24 17:58:12,561 [INFO] Training completed.

After training the model for 10 epochs, here is the inference result, produced on the NPU, for a test-set image:

2025-09-24 18:13:24,511 [WARNING] Parse Model, args: nearest, keep str type
2025-09-24 18:13:24,532 [WARNING] Parse Model, args: nearest, keep str type
2025-09-24 18:13:24,639 [INFO] number of network params, total: 25.896391M, trainable: 25.863252M
2025-09-24 18:13:29,405 [INFO] Load checkpoint from [/home/orangepi/workspace/mindyolo/runs/2025.09.24-16.47.11/weights/yolov8m-10_404.ckpt] success.
2025-09-24 18:13:53,915 [INFO] Predict result is: {'category_id': [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 5, 10, 4, 1, 4, 2, 4, 1, 5, 10, 4, 2, 4, 1], 'bbox': [[866.402, 359.922, 125.209, 179.961], [619.836, 379.246, 140.848, 229.434], [704.238, 192.678, 102.631, 112.359], [572.588, 189.689, 108.707, 103.76], [80.484, 471.75, 334.953, 243.844], [739.99, 15.987, 60.305, 60.944], [1179.242, 68.017, 143.637, 56.163], [1220.215, 154.843, 138.523, 76.782], [1217.559, 108.026, 140.516, 63.733], [822.475, 15.34, 56.744, 75.039], [621.438, 70.781, 19.938, 55.292], [1106.859, 128.463, 79.986, 95.99], [773.168, 90.047, 71.42, 95.293], [773.467, 88.951, 70.988, 95.924], [1122.158, 371.145, 48.12, 90.512], [1168.982, 2.274, 83.141, 77.081], [723.45, 65.277, 21.877, 51.017], [1145.906, 0.556, 76.467, 46.708], [672.513, 71.818, 25.857, 46.933], [488.816, 350.559, 107.844, 117.605], [672.778, 71.918, 26.172, 48.194], [1106.826, 128.612, 79.621, 96.239], [1058.831, 319.314, 35.087, 75.056], [1146.62, 0.365, 54.586, 48.643], [1124.963, 370.945, 42.359, 66.473], [1148.197, 1.046, 92.537, 51.581], [526.153, 87.349, 29.123, 37.91]], 'score': [0.93223, 0.92336, 0.90671, 0.90539, 0.84414, 0.83682, 0.83292, 0.75641, 0.74857, 0.74295, 0.72221, 0.63341, 0.62439, 0.5829, 0.50411, 0.48259, 0.42391, 0.42188, 0.42185, 0.36533, 0.29963, 0.29451, 0.29264, 0.28265, 0.26525, 0.2585, 0.25038]}
2025-09-24 18:13:53,915 [INFO] Speed: 24481.6/5.7/24487.3 ms inference/NMS/total per 640x640 image at batch-size 1;
2025-09-24 18:13:53,915 [INFO] Detect a image success.
2025-09-24 18:13:53,924 [INFO] Infer completed.

The training and inference code can be downloaded from the mindyolo repository: https://github.com/mindspore-lab/mindyolo
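The predict result above is a plain Python dict of parallel lists: category_id, bbox (each [x, y, w, h]), and score. A minimal post-processing sketch is shown below; the function name, the category id, and the 0.5 score threshold are illustrative assumptions, not values defined by mindyolo:

```python
# Filter raw detections by category and confidence.
# filter_detections, the category id and the threshold below are
# illustrative assumptions, not part of the mindyolo API.

def filter_detections(result, category_id, score_threshold=0.5):
    """Return (bbox, score) pairs for one category above a threshold.

    `result` follows the predict-dict layout shown above: parallel
    lists under 'category_id', 'bbox' ([x, y, w, h]) and 'score'.
    """
    kept = []
    for cid, bbox, score in zip(result["category_id"],
                                result["bbox"],
                                result["score"]):
        if cid == category_id and score >= score_threshold:
            kept.append((bbox, score))
    # Highest-confidence detections first
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept

if __name__ == "__main__":
    demo = {
        "category_id": [4, 1, 4],
        "bbox": [[866.402, 359.922, 125.209, 179.961],
                 [621.438, 70.781, 19.938, 55.292],
                 [488.816, 350.559, 107.844, 117.605]],
        "score": [0.93223, 0.72221, 0.36533],
    }
    print(filter_detections(demo, category_id=4, score_threshold=0.5))
```

An application-level rule (e.g. "smoking detected") would then only need to check whether the filtered list is non-empty.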
-
How to upgrade CANN, PyTorch, and MindSpore on the OrangePi Studio Pro

1. Install CANN and PyTorch

First, in the Ascend resource download center, under hardware select product series "accelerator card", product model "Atlas 300V Pro video parsing card", and CANN version 8.2.RC1; download the CANN packages and obtain the PyTorch packages.

Once the downloads finish, install CANN and PyTorch. I am using the OrangePi-provided Ubuntu 22.04 test image with the AI environment preinstalled, so only three components need to be upgraded: Ascend-cann-toolkit_8.2.RC1_linux-x86_64.run, Ascend-cann-kernels-310p_8.2.RC1_linux-x86_64.run, and torch_npu-2.1.0.post13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

First switch to the root user, update the package lists, and install g++-12:

sudo apt update
sudo apt install -y g++-12

Then change into the directory containing the downloaded CANN packages and run the following commands in order:

chmod +x ./Ascend-cann-toolkit_8.2.RC1_linux-x86_64.run
./Ascend-cann-toolkit_8.2.RC1_linux-x86_64.run --full --quiet
chmod +x ./Ascend-cann-kernels-310p_8.2.RC1_linux-x86_64.run
./Ascend-cann-kernels-310p_8.2.RC1_linux-x86_64.run --install --quiet
pip3 install torch_npu-2.1.0.post13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Run the following to verify that CANN and torch_npu are installed correctly:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"

2. Upgrade MindSpore

Visit the MindSpore website and select the CANN 8.2.RC1 version we just installed; choose the remaining options to match your device. Switch to the root user and run the install command:

sudo su
pip3 install mindspore==2.7.0 -i https://repo.mindspore.cn/pypi/simple --trusted-host repo.mindspore.cn --extra-index-url https://repo.huaweicloud.com/repository/pypi/simple

After installation, run the following to verify it:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
python3 -c "import mindspore;mindspore.set_context(device_target='Ascend');mindspore.run_check()"

If you see output like the following, MindSpore is installed successfully:

[WARNING] ME(1621400:139701939115840,MainProcess):2025-09-24-10:46:21.978.000 [mindspore/context.py:1412] For 'context.set_context', the parameter 'device_target' will be deprecated and removed in a future version. Please use the api mindspore.set_device() instead.
MindSpore version: 2.7.0
[WARNING] GE_ADPT(1621400,7f0e18710640,python3):2025-09-24-10:46:23.323.570 [mindspore/ops/kernel/ascend/acl_ir/op_api_exec.cc:169] GetAscendDefaultCustomPath] Checking whether the so exists or if permission to access it is available: /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize_vision/op_api/lib/libcust_opapi.so
The result of multiplication calculation is correct, MindSpore has been installed on platform [Ascend] successfully!

3. Summary

This article walked through the full procedure for upgrading CANN, PyTorch, and MindSpore on the OrangePi Studio Pro development board. Following these steps, developers can bring these key AI components up to the latest versions and make full use of the AI compute capability of the OrangePi Studio Pro hardware platform.
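As a small convenience, the framework imports exercised by the verification commands above can be probed from one hedged Python script. This sketch only checks that the modules import (and so runs even on a machine without an NPU); the on-device tests remain the tensor-add and run_check() commands shown earlier:

```python
# Probe whether the upgraded frameworks can at least be imported.
# This is a convenience sketch; it does not replace the on-device
# verification commands shown above.
import importlib

def probe(module_name):
    """Return a human-readable availability string for one module."""
    try:
        mod = importlib.import_module(module_name)
        version = getattr(mod, "__version__", "unknown version")
        return f"{module_name}: available ({version})"
    except ImportError as exc:
        return f"{module_name}: not available ({exc.__class__.__name__})"

if __name__ == "__main__":
    for name in ("torch", "torch_npu", "mindspore"):
        print(probe(name))
```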
-
Technical notes on installing a text-generation LLM on the Ascend platform

1. Check the environment

1.1 Make sure the NPU devices are healthy

npu-smi info                 # run on each instance node to see the NPU card status
npu-smi info -l | grep Total # run on each instance node to see the total card count and confirm the expected cards are attached
npu-smi info -t board -i 1 | egrep -i "software|firmware"  # check the driver and firmware versions

1.2 Make sure Docker is healthy

docker -v  # check whether Docker is installed
yum install -y docker-engine.aarch64 docker-engine-selinux.noarch docker-runc.aarch64

1.3 Configure IP forwarding

vim /etc/sysctl.conf   # set net.ipv4.ip_forward=1
sysctl -p              # apply the change

2. Build the container

2.1 Pull the image

docker pull swr.cn-southwest-2.myhuaweicloud.com/ei_ascendcloud_devops/llm_inference:906_a2_20250821

This is the image that runs the LLM service.

2.2 Start the container

docker run -itd \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
-v /etc/localtime:/etc/localtime \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/ascend_install.info:/etc/ascend_install.info \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /var/log/npu/:/usr/slog \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v ${dir}:${container_model_path} \
--net=host \
--name ${container_name} \
${image_id} \
/bin/bash

${container_name}: the container name, used later when entering the container; you can pick any name here.
${image_id}: the Docker image ID, which can be found with docker images.

Example:

docker run -itd \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
-v /etc/localtime:/etc/localtime \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/ascend_install.info:/etc/ascend_install.info \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /var/log/npu/:/usr/slog \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /usr/local/data/model_list/model:/usr/local/data/model_list/model \
--net=host \
--name vllm-qwen \
91c374f329e4 \
/bin/bash

2.3 Prepare the container environment

Run the command: docker exec
-it -u ma-user ${container_name} /bin/bash

Then configure the environment inside the container:

export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_PLUGINS=ascend

# VPC subnet
# Must be edited by hand (see the notes below); VPC_CIDR is the server's private network
VPC_CIDR="192.168.0.0/16"
VPC_PREFIX=$(echo "$VPC_CIDR" | cut -d'/' -f1 | cut -d'.' -f1-2)
POD_INET_IP=$(ifconfig | grep -oP "(?<=inet\s)$VPC_PREFIX\.\d+\.\d+" | head -n 1)
POD_NETWORK_IFNAME=$(ifconfig | grep -B 1 "$POD_INET_IP" | head -n 1 | awk '{print $1}' | sed 's/://')
echo "POD_INET_IP: $POD_INET_IP"
echo "POD_NETWORK_IFNAME: $POD_NETWORK_IFNAME"

# Pin the communication NICs
export GLOO_SOCKET_IFNAME=$POD_NETWORK_IFNAME
export TP_SOCKET_IFNAME=$POD_NETWORK_IFNAME
export HCCL_SOCKET_IFNAME=$POD_NETWORK_IFNAME
# Multi-node setting
export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1

# Enable memory optimization
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
# Expand the orchestration of communication algorithms on the device-side AI Vector Core units
export HCCL_OP_EXPANSION_MODE=AIV
# Cards to use, adjust as needed
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# CPU core binding, adjust as needed
export CPU_AFFINITY_CONF=1
export LD_PRELOAD=/usr/local/lib/libjemalloc.so.2:${LD_PRELOAD}
# ascend-turbo-graph mode is enabled by default; select the launch plugin
export VLLM_PLUGINS=ascend_vllm
# For acl-graph or eager mode, select this plugin instead
# export VLLM_PLUGINS=ascend
# Use the vLLM v1 backend
export VLLM_USE_V1=1
# Pin the vLLM version
export VLLM_VERSION=0.9.0

export USE_MM_ALL_REDUCE_OP=1
export MM_ALL_REDUCE_OP_THRESHOLD=256

# The following environment variables should not be set
unset ENABLE_QWEN_HYPERDRIVE_OPT
unset ENABLE_QWEN_MICROBATCH
unset ENABLE_PHASE_AWARE_QKVO_QUANT
unset DISABLE_QWEN_DP_PROJ

source /home/ma-user/AscendCloud/AscendTurbo/set_env.bash

2.4 Start the LLM API service

nohup python -m vllm.entrypoints.openai.api_server \
--model /usr/local/data/model_list/model/QwQ-32B \
--max-num-seqs=256 \
--max-model-len=512 \
--max-num-batched-tokens=512 \
--tensor-parallel-size=4 \
--block-size=128 \
--host=192.168.0.127 \
--port=18186 \
--gpu-memory-utilization=0.95 \
--trust-remote-code \
--no-enable-prefix-caching \
--additional-config='{"ascend_turbo_graph_config": {"enabled": true}, "ascend_scheduler_config": {"enabled": true}}' > QwQ-32B.log 2>&1 &
--model: the path to the model weights
--host: the server's private IP, which can be found with ifconfig
--port: the API port, freely chosen
QwQ-32B.log: the log file to write to, freely chosen

2.5 Validate the LLM API service

curl http://${docker_ip}:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{ "model": "${container_model_path}", "prompt": "hello", "max_tokens": 128, "temperature": 0 }'

Replace ${docker_ip} with the host machine's actual IP address; ${container_model_path} is the model path.

Example request:

curl http://192.168.0.127:18186/v1/completions \
-H "Content-Type: application/json" \
-d '{ "model": "/usr/local/data/model_list/model/QwQ-32B", "prompt": "What is moon", "max_tokens": 128, "temperature": 0.5 }'

Example response:

{"id":"cmpl-e96e239e2a3b490da361622879eb9c2c","object":"text_completion","created":1757919227,"model":"/usr/local/data/model_list/model/QwQ-32B","choices":[{"index":0,"text":"light made of?\n\nWhat is moon made of?\n\nPlease tell me if those questions are the same.\nOkay, so I need to figure out what moonlight is made of and what the moon itself is made of. Let me start by breaking down each question.\n\nFirst, \"What is moonlight made of?\" Hmm, moonlight. I know that the moon doesn't produce its own light. So, moonlight must be reflected sunlight, right? Like, the sun shines on the moon, and then the moon reflects that light back to Earth. So, if that's the case, then moonlight is just sunlight that's been reflected","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":3,"total_tokens":131,"completion_tokens":128,"prompt_tokens_details":null},"kv_transfer_params":null}
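The curl validation above can equally be done from Python with only the standard library. In this sketch the host, port, and model path are the example values from this section and must be adjusted to your deployment; the request/response shapes follow the example above:

```python
# Query the vLLM OpenAI-compatible /v1/completions endpoint.
# The host, port and model path are the example values from this
# section; replace them with your own deployment's values.
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=128, temperature=0.0):
    """Build the JSON payload expected by /v1/completions."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def extract_text(response):
    """Pull the generated text out of a completions response dict."""
    return response["choices"][0]["text"]

def complete(base_url, payload, timeout=60):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    payload = build_completion_request(
        "/usr/local/data/model_list/model/QwQ-32B", "What is moon", temperature=0.5
    )
    print(json.dumps(payload))
    # Against a live deployment you would then run:
    #   result = complete("http://192.168.0.127:18186", payload)
    #   print(extract_text(result))
```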
-
cid:link_0
-
A machine in Guiyang, instance ID c768c7a7-9633-47d0-adcf-4ed17a252381, name notebook-c51a:

ERROR 08-25 09:20:47 [core.py:586] File "/vllm-workspace/LMCache-Ascend/lmcache_ascend/integration/vllm/vllm_v1_adapter.py", line 155, in init_lmcache_engine
ERROR 08-25 09:20:47 [core.py:586] engine = LMCacheEngineBuilder.get_or_create(
ERROR 08-25 09:20:47 [core.py:586] File "/vllm-workspace/LMCache/lmcache/v1/cache_engine.py", line 947, in get_or_create
ERROR 08-25 09:20:47 [core.py:586] memory_allocator = cls._Create_memory_allocator(config, metadata)
ERROR 08-25 09:20:47 [core.py:586] File "/vllm-workspace/LMCache-Ascend/lmcache_ascend/v1/cache_engine.py", line 21, in _ascend_create_memory_allocator
ERROR 08-25 09:20:47 [core.py:586] return AscendMixedMemoryAllocator(int(max_local_cpu_size * 1024**3))
ERROR 08-25 09:20:47 [core.py:586] File "/vllm-workspace/LMCache-Ascend/lmcache_ascend/v1/memory_management.py", line 69, in __init__
ERROR 08-25 09:20:47 [core.py:586] lmc_ops.host_register(self.buffer)
ERROR 08-25 09:20:47 [core.py:586] RuntimeError: Unable to pin host memory with error code: -1
ERROR 08-25 09:20:47 [core.py:586] Exception raised from halRegisterHostPtr at /vllm-workspace/LMCache-Ascend/csrc/managed_mem.cpp:109 (most recent call first):
ERROR 08-25 09:20:47 [core.py:586] frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0xb8 (0xfffc2cf2c908 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch/lib/libc10.so)
ERROR 08-25 09:20:47 [core.py:586] frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x6c (0xfffc2cedb404 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch/lib/libc10.so)
ERROR 08-25 09:20:47 [core.py:586] frame #2: <unknown function> + 0x1abf8 (0xfff9c407abf8 in /vllm-workspace/LMCache-Ascend/lmcache_ascend/c_ops.cpython-310-aarch64-linux-gnu.so)

Running LMCache-Ascend hit the problem above. The cause is a limit on how much host memory can be pinned: the system's memory-lock (RLIMIT_MEMLOCK) limit is too low, and in a container environment there is no permission to run ulimit -l unlimited to raise it. The service configuration also cannot be adjusted to lift the memory-lock limit.

The following reference material describes how to adjust the containerd service configuration to lift the memory-lock limit:

1. Edit the containerd service configuration file. It is usually located at /usr/lib/systemd/system/containerd.service (the path may differ between systems; systemctl status containerd shows where the unit file lives).
2. Add the memory-lock setting: in the [Service] section of that file, add LimitMEMLOCK=infinity, which removes the memory-lock limit.
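Whether the memory-lock limit is the culprit can be checked from inside the container with the standard library. This is only a diagnostic sketch; the fix itself still requires the containerd LimitMEMLOCK change above or ulimit -l:

```python
# Inspect the RLIMIT_MEMLOCK soft/hard limits from inside the container.
# If the soft limit is small (often 64 KiB by default), pinning a
# multi-GiB LMCache buffer will fail as in the traceback above.
import resource

def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)

def describe(limit):
    """Render one rlimit value as a readable string."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"RLIMIT_MEMLOCK soft={describe(soft)} hard={describe(hard)}")
```

After adding LimitMEMLOCK=infinity and restarting containerd (systemctl daemon-reload && systemctl restart containerd), this script should report "unlimited" inside newly started containers.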
-
RuntimeError: Unable to pin host memory with error code: -1 · Issue #5 · LMCache/LMCache-Ascend

When running LMCache-Ascend, the error above appears. The main fixes are to set LimitMEMLOCK when deploying the instance, or to raise the limit with ulimit -l. But there is no root access in the notebook, so the latter is not possible; and since neither docker run nor docker-compose can be used, the former is not possible either. How can this memory-lock limit be resolved?
-
Below is the Dockerfile:

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM quay.io/ascend/cann:8.2.rc1-910b-openeuler22.03-py3.11

# Set the user ma-user whose UID is 1000 and the user group ma-group whose GID is 100
USER root
RUN default_user=$(getent passwd 1000 | awk -F ':' '{print $1}') || echo "uid: 1000 does not exist" && \
    default_group=$(getent group 100 | awk -F ':' '{print $1}') || echo "gid: 100 does not exist" && \
    if [ ! -z ${default_user} ] && [ ${default_user} != "ma-user" ]; then \
        userdel -r ${default_user}; \
    fi && \
    if [ ! -z ${default_group} ] && [ ${default_group} != "ma-group" ]; then \
        groupdel -f ${default_group}; \
    fi && \
    groupadd -g 100 ma-group && useradd -d /home/ma-user -m -u 1000 -g 100 -s /bin/bash ma-user && \
    chmod -R 750 /home/ma-user

ARG PIP_INDEX_URL="https://mirrors.aliyun.com/pypi/simple"
ARG COMPILE_CUSTOM_KERNELS=1
ENV COMPILE_CUSTOM_KERNELS=${COMPILE_CUSTOM_KERNELS}

RUN yum update -y && \
    yum install -y python3-pip git vim wget net-tools gcc gcc-c++ make cmake numactl-devel && \
    rm -rf /var/cache/yum

RUN pip config set global.index-url ${PIP_INDEX_URL}
# Set pip source to a faster mirror
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

WORKDIR /workspace

COPY . /workspace/LMCache-Ascend/

# Install vLLM
ARG VLLM_REPO=https://githubfast.com/vllm-project/vllm.git
ARG VLLM_TAG=v0.9.2
RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /workspace/vllm
# In x86, triton will be installed by vllm. But in Ascend, triton doesn't work correctly. we need to uninstall it.
RUN VLLM_TARGET_DEVICE="empty" python3 -m pip install -e /workspace/vllm/ --extra-index https://download.pytorch.org/whl/cpu/ --retries 5 --timeout 30 && \
    python3 -m pip uninstall -y triton

# Install vLLM-Ascend
ARG VLLM_ASCEND_REPO=https://githubfast.com/vllm-project/vllm-ascend.git
ARG VLLM_ASCEND_TAG=v0.9.2rc1
RUN git clone --depth 1 $VLLM_ASCEND_REPO --branch $VLLM_ASCEND_TAG /workspace/vllm-ascend
RUN cd /workspace/vllm-ascend && \
    git apply -p1 /workspace/LMCache-Ascend/docker/kv-connector-v1.diff
RUN export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi && \
    source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    source /usr/local/Ascend/nnal/atb/set_env.sh && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
    python3 -m pip install -v -e /workspace/vllm-ascend/ --extra-index https://download.pytorch.org/whl/cpu/

# Install modelscope (for fast download) and ray (for multinode)
RUN python3 -m pip install modelscope ray

# Install LMCache
ARG LMCACHE_REPO=https://githubfast.com/LMCache/LMCache.git
ARG LMCACHE_TAG=v0.3.3
RUN git clone --depth 1 $LMCACHE_REPO --branch $LMCACHE_TAG /workspace/LMCache
# our build is based on arm64
RUN sed -i "s/^infinistore$/infinistore; platform_machine == 'x86_64'/" /workspace/LMCache/requirements/common.txt
# Install LMCache with retries and timeout
RUN export NO_CUDA_EXT=1 && python3 -m pip install -v -e /workspace/LMCache --retries 5 --timeout 30

# Install LMCache-Ascend
RUN cd /workspace/LMCache-Ascend && \
    source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    source /usr/local/Ascend/nnal/atb/set_env.sh && \
    export SOC_VERSION=ASCEND910B3 && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
    python3 -m pip install -v --no-build-isolation -e . && \
    python3 -m pip cache purge

# Switch to user ma-user
USER ma-user

CMD ["/bin/bash"]

The image was registered with the options shown on the image management page, and the notebook was created with the parameters shown, but in the end creating the notebook failed.