• [Help Wanted] Conv3DBackpropFilter format error when using DeepSpeed
    An operator error occurs during single-machine multi-card training with DeepSpeed and transformers. Without DeepSpeed, the model (Qwen2.5-VL 3B) trains normally.

    Image: pytorch_2.1.0-cann_8.0.rc2-py_3.9-euler_2.10.7-aarch64-snt9b
    Machine: 2-card 910B2
    Environment:
    torch                   2.4.0
    torch-npu               2.4.0.post2
    torchvision             0.19.0
    transformers            4.51.3
    deepspeed               0.16.5

    Launch command:
    ASCEND_LAUNCH_BLOCKING=1 accelerate launch --config_file deep_config.yaml engine.py

    deep_config.yaml:
    compute_environment: LOCAL_MACHINE
    distributed_type: DEEPSPEED
    deepspeed_config:
      path: ds_config.json
    debug: false
    gpu_ids: "0,1"
    num_processes: 2
    use_cpu: false
    num_machines: 1
    machine_rank: 0
    same_network: true
    rdzv_backend: static
    main_training_function: main
    main_process_port: 29503
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false

    deepconfig.json:
    {
      "train_micro_batch_size_per_gpu": 1,
      "gradient_accumulation_steps": 1,
      "gradient_clipping": 1.0,
      "bf16": { "enabled": true },
      "fp16": { "enabled": false },
      "zero_optimization": {
        "stage": 2,
        "contiguous_gradients": true,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "allgather_bucket_size": 5e8
      },
      "optimizer": {
        "type": "AdamW",
        "params": {
          "lr": 5e-5,
          "betas": [0.9, 0.999],
          "eps": 1e-8,
          "weight_decay": 0.01
        }
      },
      "scheduler": {
        "type": "WarmupLR",
        "params": {
          "warmup_min_lr": 0.0,
          "warmup_max_lr": 5e-5,
          "warmup_num_steps": 0
        }
      },
      "activation_checkpointing": {
        "partition_activations": false,
        "contiguous_memory_optimization": false,
        "cpu_checkpointing": false
      },
      "wall_clock_breakdown": false
    }

    Error output:
    loss ar: 59.6875, computed!
performing backward pass...
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/ma-user/work/DAR/engine.py", line 179, in <module>
[rank1]:     dar_trainer.train()
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
[rank1]:     return inner_training_loop(
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:   File "/home/ma-user/work/DAR/dar_trainer.py", line 35, in training_step
[rank1]:     return self.consistency_training_step(model, inputs)
[rank1]:   File "/home/ma-user/work/DAR/dar_trainer.py", line 78, in consistency_training_step
[rank1]:     self.accelerator.backward(loss_ar)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/accelerate/accelerator.py", line 2446, in backward
[rank1]:     self.deepspeed_engine_wrapped.backward(loss, **kwargs)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 266, in backward
[rank1]:     self.engine.backward(loss, **kwargs)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2187, in backward
[rank1]:     self._do_optimizer_backward(loss, retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2133, in _do_optimizer_backward
[rank1]:     self.optimizer.backward(loss, retain_graph=retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2089, in backward
[rank1]:     self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
[rank1]:     scaled_loss.backward(retain_graph=retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/_tensor.py", line 521, in backward
[rank1]:     torch.autograd.backward(
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/autograd/__init__.py", line 289, in backward
[rank1]:     _engine_run_backward(
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
[rank1]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank1]: RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:218 OPS function error: Conv3DBackpropFilter, error code is 500002
[rank1]: [ERROR] 2025-09-02-22:43:34 (PID:1305427, Device:1, RankID:1) ERR01100 OPS call acl api failed
[rank1]: [Error]: A GE error occurs in the system.
[rank1]:         Rectify the fault based on the error information in the ascend log.
[rank1]: E69999: Inner Error!
[rank1]: E69999: [PID: 1305427] 2025-09-02-22:43:34.306.673 op[Conv3DBackpropFilter3], illegal format of x.[FUNC:Conv3DBackpropFilterInfer][FILE:nn_calculation_ops.cc][LINE:9783]
[rank1]:         TraceBack (most recent call last):
[rank1]:        Sessin_id 0 does not exist, graph_id 2[FUNC:GetJsonObject][FILE:analyzer.cc][LINE:155]
[rank1]:        Param:graph_info is nullptr, check invalid[FUNC:DoAnalyze][FILE:analyzer.cc][LINE:253]
[rank1]:        Param:graph_info is nullptr, check invalid[FUNC:SaveAnalyzerDataToFile][FILE:analyzer.cc][LINE:210]
[rank1]:        Call InferShapeAndType for node:Conv3DBackpropFilter3(Conv3DBackpropFilter) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:117]
[rank1]:        process pass InferShapePass on node:Conv3DBackpropFilter3 failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:563]
[rank1]:        build graph failed, graph id:2, ret:1343242270[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
[rank1]:        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[rank1]:        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
[rank1]:        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]

rank0 (PID:1305426, Device:0, RankID:0) fails with the identical Conv3DBackpropFilter traceback and GE error.
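The GE log points at an illegal input format during InferShape for Conv3DBackpropFilter, the weight-gradient kernel behind nn.Conv3d (used by Qwen2.5-VL's vision patch embedding). One hedged way to narrow this down is to reproduce the Conv3d backward pass in bf16 with DeepSpeed out of the loop, and to disable torch_npu's private internal storage formats so the operator only ever sees plain ND layouts. The sketch below is a diagnostic, not a confirmed fix: it needs an Ascend device, `allow_internal_format` is a torch_npu switch that may not exist in every version, and the shapes are only an approximation of the real model's patch embedding.

```python
# Diagnostic sketch (assumptions: Ascend NPU available; torch_npu exposes
# npu.config.allow_internal_format; shapes approximate a ViT-style 3D patch
# embedding). Reproduces the Conv3DBackpropFilter path without DeepSpeed.
import torch
import torch_npu  # noqa: F401  (registers the "npu" device)

# Hypothesis: the "illegal format of x" comes from a private NPU storage
# format reaching Conv3DBackpropFilter; forcing ND formats is a common
# first check for format-related GE errors.
torch_npu.npu.config.allow_internal_format = False

device = "npu:0"
conv = torch.nn.Conv3d(3, 1280, kernel_size=(2, 14, 14),
                       stride=(2, 14, 14)).to(device, torch.bfloat16)
x = torch.randn(1, 3, 4, 224, 224, device=device,
                dtype=torch.bfloat16, requires_grad=True)

out = conv(x)
out.sum().backward()  # triggers Conv3DBackpropFilter for conv.weight.grad
print("conv weight grad ok:", conv.weight.grad is not None)
```

If this standalone script fails the same way, the problem is in the op/format handling rather than in DeepSpeed itself; if it passes, compare the dtype and memory format of the vision-tower inputs under DeepSpeed against this baseline.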
  • [Technical Article] Pine Wilt Disease Detection
    Pine Wilt Disease Detection

    1. Data slicing
    UAV wide-angle imagery has a high resolution (4000x3000), so we first slice the manually annotated pine wilt disease dataset: large images are cut into small tiles at several tile sizes (e.g. 1000x1000, 1500x1500, 2000x2000) and overlap ratios (e.g. 0%, 10%, 20%, 30%) before being fed to the model for training.

    2. Model training
    Since its release in 2023, YOLOv8 has been through many optimization iterations, and its architecture (e.g. the C2f module and dynamic label assignment) and training pipeline have matured. Embedded devices rely on v8's lightweight design, and in medical detection its high recall has been clinically validated. Although YOLO12 and other successors surpass YOLOv8 on paper, v8's inference speed remains hard to replace, and it is still the version most widely deployed in industry. We train YOLOv8 on both the proportionally scaled original images and the sliced pine wilt disease dataset to improve generalization across target sizes. Each iteration trains models in two sizes, s and m, used respectively for live video detection and automatic image annotation. Our models have been adapted to both domestic Ascend and NVIDIA accelerator cards, support automated training jobs, and are automatically converted and quantized for the different compute chips.

    3. Cloud annotation
    Our models perform sliced detection and automatic annotation on images and video returned by the UAV. Different tile sizes and overlap ratios can be set for targets of different sizes and classes, enabling fine-grained detection of UAV imagery.

    4. Live inference
    Our AI live-stream inference pipeline runs its stages concurrently. It is developed in Python combined with C++, with modular components for efficient operation, and can be deployed on RK3588 and Jetson-series boards. For the pine wilt disease scenario it currently supports real-time recognition of 9 classes of infected trees.

    ---- Reposted from the blog: https://bbs.huaweicloud.com/blogs/458003
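    The slicing scheme in step 1 can be sketched as follows. `tile_grid` is a hypothetical helper, not code from the project: it computes the top-left corners of overlapping tiles for a given tile size and overlap ratio, shifting the last row and column inward so every tile stays inside the image.

```python
def tile_grid(width, height, tile, overlap):
    """Top-left corners of overlapping `tile` x `tile` crops covering a
    width x height image. `overlap` is a fraction of the tile size in [0, 0.9];
    assumes the image is at least one tile wide and tall."""
    stride = max(1, int(tile * (1.0 - overlap)))
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # Clamp a final row/column so the right and bottom edges are covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

# A 4000x3000 UAV frame cut into 1000x1000 tiles with 20% overlap.
tiles = tile_grid(4000, 3000, 1000, 0.2)
```

    Each `(x, y)` origin then defines one crop `image[y:y+tile, x:x+tile]` to annotate or feed to the detector; the overlap keeps targets that straddle a tile boundary fully visible in at least one tile.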
  • [Industry News] [Topic Discussion] What is the future development trend for AI chips (NPU vs. GPGPU)?
    Rumor has it that Huawei's Ascend NPU is shifting toward a GPGPU design. What does everyone think about this? Feel free to speak your mind!
  • [Technical Article] Huawei Cloud Developer Space ☁️ AI Industrial Quality Inspection with Ascend NPU
    In this walkthrough we use the free 🎉 Ascend NPU 910B ✨ in the Huawei Cloud Developer Space workbench ⚙️ AI Notebook to train a YOLO11 model and apply the SAHI slicing-aided hyper inference framework to PCB defect detection.

    1. Download the model and dataset 📦
    First, paste and run the following code in a Notebook cell to download and unpack the training data and model files used in this walkthrough:

    import os
    import zipfile

    if not os.path.exists('yolo11_train_ascend.zip'):
        os.system('wget -q https://orangepi-ascend.obs.cn-north-4.myhuaweicloud.com/yolo11_train_ascend.zip')
    if not os.path.exists('yolo11_train_ascend'):
        zip_file = zipfile.ZipFile('yolo11_train_ascend.zip')
        zip_file.extractall()
        zip_file.close()

    2. Install dependencies 🛠️
    Install the packages YOLO11 needs plus the SAHI library to set up the project's runtime environment:

    !pip install ultralytics==8.3.160 ultralytics-thop==2.0.14 sahi==0.11.26 numpy==1.26.4

    Defaulting to user installation because normal site-packages is not writeable
    Looking in indexes: https://mirrors.huaweicloud.com/repository/pypi/simple
    Collecting ultralytics==8.3.160
    Collecting ultralytics-thop==2.0.14
    Collecting sahi==0.11.26
    Collecting numpy==1.26.4
    ... (dependency download and resolution log trimmed) ...
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    gradio 5.9.1 requires markupsafe~=2.0, but you have markupsafe 3.0.2 which is incompatible.
    openmind 0.9.1 requires datasets<=2.21.0,>=2.18.0, but you have datasets 3.2.0 which is incompatible.
    openmind 0.9.1 requires openmind-hub==0.9.0, but you have openmind-hub 0.9.1 which is incompatible.
    openmind-datasets 0.7.1 requires datasets==2.18.0, but you have datasets 3.2.0 which is incompatible.
    openmind-evaluate 0.7.0 requires datasets==2.18.0, but you have datasets 3.2.0 which is incompatible.
    Successfully installed fire-0.7.0 numpy-1.26.4 pybboxes-0.1.6 sahi-0.11.26 shapely-2.1.1 termcolor-3.1.0 terminaltables-3.1.10 ultralytics-8.3.160 ultralytics-thop-2.0.14
    [notice] A new release of pip is available: 24.3.1 -> 25.1.1
    [notice] To update, run: pip install --upgrade pip

    3. Edit the configuration file 📝
    In the configuration file we specify the dataset path, class names, and related information for the subsequent training:

    %%writefile yolo11_train_ascend/pcb.yaml
    # Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
    path: /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced  # dataset root dir (absolute path)
    train: images/train  # train images (relative to 'path')
    val: images/val  # val images (relative to 'path')
    test:  # test images (optional)

    # Classes
    names:
      0: mouse_bite
      1: open_circuit
      2: short
      3: spur
      4: spurious_copper

    Writing yolo11_train_ascend/pcb.yaml

    4. Download the Arial.ttf font 🖋️
    To avoid interrupting training later, download the font file in advance and copy it to /home/service/.config/Ultralytics:

    !wget https://orangepi-ascend.obs.cn-north-4.myhuaweicloud.com/Arial.ttf
    !mkdir -p /home/service/.config/Ultralytics
    !cp Arial.ttf /home/service/.config/Ultralytics/Arial.ttf

    --2025-06-28 05:55:59--  https://pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com/Arial.ttf
    Resolving pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com (pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com)... 100.125.6.3, 100.125.7.3, 100.125.6.131
    Connecting to pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com (pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com)|100.125.6.3|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 773236 (755K) [application/x-font-ttf]
    Saving to: 'Arial.ttf'
    Arial.ttf 100%[===================>] 755.11K --.-KB/s in 0.004s
    2025-06-28 05:55:59 (188 MB/s) - 'Arial.ttf' saved [773236/773236]

    5. Train the model 🧠🔥
    We start from the yolo11n.pt pretrained weights and accelerate training on the Ascend NPU, training for 10 epochs at a 640x640 image size with 8 data-loading workers and 32 images per batch:

    %cd yolo11_train_ascend
    import torch
    import torch_npu
    from torch_npu.contrib import transfer_to_npu
    from ultralytics import YOLO

    # Load a model
    model = YOLO('yolo11n.pt')  # load a pretrained model (recommended for training)

    # Train the model
    results = model.train(data='pcb.yaml', epochs=10, imgsz=640, workers=8, batch=32)
    %cd ..

    /home/service/.local/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.
      self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
    /opt/huawei/edu-apaas/src/init/yolo11_train_ascend
    /home/service/.local/lib/python3.10/site-packages/torch_npu/utils/path_manager.py:82: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/latest owner does not match the current user.
      warnings.warn(f"Warning: The {path} owner does not match the current user.")
    /home/service/.local/lib/python3.10/site-packages/torch_npu/utils/path_manager.py:82: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/8.0.RC3/aarch64-linux/ascend_toolkit_install.info owner does not match the current user.
      warnings.warn(f"Warning: The {path} owner does not match the current user.")
    /home/service/.local/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:301: ImportWarning:
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, ... (full list of patched functions trimmed) ..., torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************
      warnings.warn(msg, ImportWarning)
    /home/service/.local/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
      warnings.warn(msg, RuntimeWarning)
    Creating new Ultralytics Settings v0.0.6 file ✅
    View Ultralytics Settings with 'yolo settings' or at '/home/service/.config/Ultralytics/settings.json'
    Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
    [W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
    Ultralytics 8.3.160 🚀 Python-3.10.15 torch-2.1.0 CUDA:0 (Ascend910B3, 62432MiB)
    engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=32, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=pcb.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/train, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=True, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
    Overriding model.yaml nc=80 with nc=5

                   from  n    params  module                                       arguments
      0              -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
      1              -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
      2              -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]
      3              -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
      4              -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]
      5              -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
      6              -1  1     87040  ultralytics.nn.modules.block.C3k2            [128, 128, 1, True]
      7              -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
      8              -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]
      9              -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
     10              -1  1    249728  ultralytics.nn.modules.block.C2PSA           [256, 256, 1]
     11              -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
     12         [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     13              -1  1    111296  ultralytics.nn.modules.block.C3k2            [384, 128, 1, False]
     14              -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
     15         [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     16              -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]
     17              -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
     18        [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     19              -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]
     20              -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
     21        [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     22              -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]
     23    [16, 19, 22]  1
431647 ultralytics.nn.modules.head.Detect [5, [64, 128, 256]] /home/service/.local/lib/python3.10/site-packages/torch_npu/utils/storage.py:38: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() if self.device.type != 'cpu': YOLO11n summary: 181 layers, 2,590,815 parameters, 2,590,799 gradients, 6.4 GFLOPs Transferred 448/499 items from pretrained weights Freezing layer 'model.23.dfl.conv.weight' AMP: running Automatic Mixed Precision (AMP) checks... [W compiler_depend.ts:51] Warning: CAUTION: The operator 'torchvision::nms' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) AMP: checks passed ✅ train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 620.7±42.3 MB/s, size: 454.2 KB) train: Scanning /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/train... 4646 images, 0 backgrounds, 0 corrupt: 100%|██████████| 4646/4646 [00:05<00:00, 848.20it/s] train: New cache created: /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/train.cache val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 471.4±135.5 MB/s, size: 448.2 KB) val: Scanning /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/val... 422 images, 0 backgrounds, 0 corrupt: 100%|██████████| 422/422 [00:00<00:00, 520.44it/s] val: New cache created: /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/val.cache Plotting labels to runs/detect/train/labels.jpg... optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.001111, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0) Image sizes 640 train, 640 val Using 8 dataloader workers Logging results to runs/detect/train Starting training for 10 epochs... Closing dataloader mosaic Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 0%| | 0/146 [00:00<?, ?it/s] . /home/service/.local/lib/python3.10/site-packages/ultralytics/utils/tal.py:274: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.) target_scores = torch.where(fg_scores_mask > 0, target_scores, 0) [W compiler_depend.ts:103] Warning: Non finite check and unscale on NPU device! (function operator()) 1/10 7.77G 2.238 5.333 1.761 8 640: 100%|██████████| 146/146 [01:31<00:00, 1.60it/s] Class Images Instances Box(P R mAP50 mAP50-95): 0%| | 0/7 [00:00<?, ?it/s] ..... 
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:58<00:00, 8.39s/it] all 422 604 0.39 0.0656 0.0888 0.023 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 2/10 8.2G 1.876 2.724 1.462 8 640: 100%|██████████| 146/146 [01:16<00:00, 1.92it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.58it/s] all 422 604 0.451 0.238 0.214 0.0639 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 3/10 8.2G 1.825 1.912 1.445 8 640: 100%|██████████| 146/146 [01:12<00:00, 2.00it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.64it/s] all 422 604 0.339 0.291 0.244 0.0742 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 4/10 8.2G 1.748 1.571 1.398 4 640: 100%|██████████| 146/146 [01:12<00:00, 2.02it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.58it/s] all 422 604 0.409 0.361 0.335 0.117 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 5/10 8.2G 1.703 1.343 1.372 6 640: 100%|██████████| 146/146 [01:11<00:00, 2.05it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.66it/s] all 422 604 0.442 0.34 0.321 0.118 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 6/10 8.2G 1.673 1.26 1.343 5 640: 100%|██████████| 146/146 [01:11<00:00, 2.03it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.62it/s] all 422 604 0.605 0.49 0.53 0.224 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 7/10 8.2G 1.614 1.145 1.316 6 640: 100%|██████████| 146/146 [01:12<00:00, 2.00it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.58it/s] all 422 604 0.595 0.542 0.525 0.206 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 8/10 8.2G 1.578 1.067 1.294 7 640: 100%|██████████| 146/146 [01:11<00:00, 2.03it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.73it/s] all 422 604 0.754 0.629 
0.685      0.307

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/10      8.21G      1.551      1.009      1.275          8        640: 100%|██████████| 146/146 [01:11<00:00, 2.04it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.57it/s]
                   all        422        604      0.782      0.618      0.703      0.315

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/10      8.21G        1.5     0.9621      1.255          6        640: 100%|██████████| 146/146 [01:12<00:00, 2.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:03<00:00, 1.83it/s]
                   all        422        604        0.8      0.661      0.732      0.354

10 epochs completed in 0.236 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 5.5MB
Optimizer stripped from runs/detect/train/weights/best.pt, 5.5MB

Validating runs/detect/train/weights/best.pt...
Ultralytics 8.3.160 🚀 Python-3.10.15 torch-2.1.0 CUDA:0 (Ascend910B3, 62432MiB)
YOLO11n summary (fused): 100 layers, 2,583,127 parameters, 0 gradients, 6.3 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00, 1.14s/it]
                   all        422        604      0.799      0.663      0.732      0.355
            mouse_bite        107        169      0.806      0.785      0.829        0.4
          open_circuit         73        101      0.656      0.471      0.492      0.219
                 short         69         87      0.889       0.54      0.701      0.314
                  spur         95        134      0.864      0.714       0.76      0.342
       spurious_copper         95        113      0.782      0.805       0.88        0.5
Speed: 0.1ms preprocess, 8.3ms inference, 0.0ms loss, 2.5ms postprocess per image
Results saved to runs/detect/train

Once training completes, you can inspect the results under runs/detect/train — for example the loss curves and evaluation metrics such as mAP 📈💪.

6. 
Sliced-Image Detection ✂️🔍

Finally, we use the SAHI framework to run sliced inference on high-resolution PCB images, detecting PCB defect classes more precisely.

```python
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel
from PIL import Image

detection_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="yolo11_train_ascend/runs/detect/train/weights/best.pt",
    confidence_threshold=0.4,
    device="cuda:0",
)
```

Here we apply the sliding-window detection 🔍 technique: the original image is split into 640x640 sub-images 🖼️ with a certain amount of overlap, each sub-image is predicted separately, and all detection results are then merged 🛠️.

```python
image_path = "https://orangepi-ascend.obs.cn-north-4.myhuaweicloud.com/001.bmp"
result = get_sliced_prediction(
    image_path,
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.1,
    overlap_width_ratio=0.1,
    perform_standard_pred=False,
    postprocess_class_agnostic=True,
    postprocess_match_threshold=0.1,
)
result.export_visuals(export_dir="output/", file_name="sliced_result")
Image.open("output/sliced_result.png")
```

Output: `Performing prediction on 24 slices.` As we can see, the model accurately predicts the location, class, and confidence of each PCB defect 😄

7. Summary 📌

This case uses the Ascend 910B NPU in Huawei Cloud Developer Space 💡 to train a YOLO11 model and detect PCB defects, combined with SAHI for efficient sliced inference 🚀. The Developer Space 💻 AI Notebook works out of the box — come and try it! 🤗

----Reposted from blog: https://bbs.huaweicloud.com/blogs/455280
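The tiling scheme described above (640x640 windows with 10% overlap) can be sketched in a few lines. This is an illustrative re-implementation of the windowing arithmetic only, not SAHI's actual slicer; the name `slice_boxes` is our own.

```python
def slice_boxes(img_w, img_h, slice_w=640, slice_h=640, overlap=0.1):
    """Enumerate (x1, y1, x2, y2) tile windows over an image, clamping the
    last row/column to the image border so every tile keeps the full size."""
    step_x = max(1, int(slice_w * (1 - overlap)))  # 576 for 640px tiles, 10% overlap
    step_y = max(1, int(slice_h * (1 - overlap)))
    boxes = []
    y = 0
    while True:
        y2 = min(y + slice_h, img_h)
        x = 0
        while True:
            x2 = min(x + slice_w, img_w)
            boxes.append((max(0, x2 - slice_w), max(0, y2 - slice_h), x2, y2))
            if x2 >= img_w:
                break
            x += step_x
        if y2 >= img_h:
            break
        y += step_y
    return boxes
```

For example, a 1280x640 image yields three overlapping 640x640 tiles along its width; each tile is then predicted independently and the boxes are merged back, which is what `get_sliced_prediction` does for us.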
  • [Technical Guide] Deploying Qwen2-VL-7B-Instruct in a Developer Space Notebook
    1. Background

Qwen2-VL-7B-Instruct is a multimodal large model in the Tongyi Qianwen series with strong vision and language understanding. It keeps a relatively small footprint while offering excellent visual understanding and language generation, making it one of the best choices in the Chinese multimodal AI field today. Huawei Developer Space ships with built-in Ascend NPU resources, and each developer gets two hours of free usage per day. This post walks through the full workflow of deploying Qwen2-VL-7B-Instruct for image understanding in a Huawei Developer Space Notebook.

2. Environment Setup and Model Deployment

First, open ModelScope in a browser to locate the Qwen2-VL-7B-Instruct model files. Then enter the Huawei Developer Space Notebook, open a terminal, and download the model (the original post's clone URL pointed at Qwen2.5-VL; corrected here to match the model paths used below):

```shell
git clone https://www.modelscope.cn/Qwen/Qwen2-VL-7B-Instruct.git
```

After the download finishes, install the required packages. Click "+" in the upper left to open a new code cell and run:

```shell
pip install qwen-vl-utils
pip install --upgrade transformers peft diffusers accelerate
```

With the packages installed, note the model directory path, then paste the following code into a cell, replacing the model path with your own.

```python
import os
import torch

# NPU memory optimization
os.environ["PYTORCH_NPU_ALLOC_CONF"] = "expandable_segments:True"

# Patch torch.compiler for compatibility
if not hasattr(torch.compiler, 'is_compiling'):
    torch.compiler.is_compiling = lambda: False

import torch_npu
from modelscope import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Step 1: confirm the NPU is available
assert torch.npu.is_available(), "NPU not available"

# Step 2: load the model in bfloat16
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "/opt/huawei/edu-apaas/src/init/model/Qwen2-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map=None,
    trust_remote_code=True,
)
model = model.eval()  # disable training mode
model = model.to("npu:0")

# Step 3: cap the number of image tokens (🔥 key step)
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    "/opt/huawei/edu-apaas/src/init/model/Qwen2-VL-7B-Instruct",
    trust_remote_code=True,
    use_fast=False,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)

# Step 4: build the input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "/opt/huawei/edu-apaas/src/init/model/Qwen2-VL-7B-Instruct/fed651d4f97246c4_big.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("npu:0")

# Step 5: inference
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)

# Decode: strip the prompt tokens from each generated sequence
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print("Output:", output_text)
```

After pasting the code, upload the image you want the model to interpret to the Notebook and replace the image path in the code accordingly. Run the cell, and the model will output a description of the uploaded image. With that, the Qwen2-VL-7B-Instruct deployment is complete.
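The min_pixels/max_pixels settings above bound how many visual tokens the processor spends per image. Assuming the 28x28-pixels-per-token convention those constants are written in (256 and 1280 are token counts multiplied by 28*28), the budget can be made explicit. `visual_token_budget` is an illustrative helper, not part of the transformers API.

```python
PATCH = 28  # assumption: one visual token per 28x28 pixel block, per the constants above

def visual_token_budget(min_pixels: int, max_pixels: int) -> tuple:
    """Convert the processor's pixel limits into per-image visual token counts."""
    tokens_per_patch = PATCH * PATCH
    return (min_pixels // tokens_per_patch, max_pixels // tokens_per_patch)
```

With the values used above, `visual_token_budget(256 * 28 * 28, 1280 * 28 * 28)` gives (256, 1280): each image costs between 256 and 1280 tokens of context, which is why tightening max_pixels reduces NPU memory pressure.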
  • [Discussion] Education Innovation and Industry-Education Integration Driven by X+AI
    This live session focused on education innovation and industry-education integration driven by X+AI. It attracted more than 7,200 online views, cumulative social-media plays exceeded 6,000, and enterprise developers accounted for 56% of the audience.
  • [Technical Exchange] Model Migration on Ascend Cloud — PyTorch Model Migration and Training Optimization from GPU to NPU
    1. Background and Problem

In AI model development, differences in hardware architecture across training platforms pose significant challenges for model migration and performance optimization. Mainstream PyTorch was originally designed for NVIDIA GPUs, and its code is deeply tied to the CUDA ecosystem (the torch.cuda.* interfaces, the nn.DataParallel parallel mode, and so on). With the rise of domestic compute platforms, there are now increasingly mature ways to migrate such models to domestic NPU platforms while preserving training stability and performance. The core migration problems are:

- Hardware architecture differences: GPUs and NPUs differ fundamentally in memory management (VRAM vs. NPU memory allocation strategies), compute units (SIMD vs. tensor cores), and communication mechanisms (CUDA vs. HCCL);
- Interface adaptation complexity: PyTorch-native interfaces must be replaced layer by layer with NPU-adapted ones (e.g. npu() in place of cuda()), and some operators need custom development;
- Distributed-training compatibility: the multi-card communication backend must switch from NCCL to HCCL, and the distributed logic must be validated for stability on the NPU platform;
- Performance bottleneck localization: hyperparameters (batch size, learning rate) must be tuned for NPU characteristics, and the post-migration accuracy and time cost must be quantified.

2. How It Works

Operator-level replacement. Interface mapping: specify the device via torch.device("npu"), replace tensor.cuda() with tensor.npu() one by one, and change init_process_group(backend="nccl") to backend="hccl". Operator adaptation: for CUDA operators without a direct mapping, use the official extension library (torch_npu) to provide equivalents.

Environment configuration. Environment-variable isolation: point ASCEND_HOME, LD_LIBRARY_PATH, etc. at the NPU driver paths so PyTorch correctly loads the NPU runtime dependencies. Resource allocation: for single-card training, adjust the batch size to the NPU memory capacity; for multi-card training, wrap the model with DistributedDataParallel and rely on HCCL for efficient communication.

Distributed training optimization. Communication backend switch: HCCL (Huawei's high-performance communication library) replaces NCCL, providing efficient multi-card data synchronization and gradient aggregation. Linear speedup verification: multi-card training runs verify HCCL's communication-overhead control, ensuring speedup close to the theoretical linear growth.

3. Main Technical Work

3.1 Operator and interface replacement

Background: the original model was developed with PyTorch + CUDA and relies heavily on CUDA interfaces and operators such as `torch.cuda.*`, `nn.DataParallel()`, and `torch.utils.checkpoint`.

Work done (manual migration, replacing key operators one by one):
- Replace device = torch.device("cuda") with device = torch.device("npu");
- Replace all device-related calls, e.g. tensor.cuda() → tensor.npu();
- Replace the distributed communication interface: torch.distributed.init_process_group(backend="nccl") → backend="hccl";
- For operators without automatic mapping, develop custom replacements or find substitutes — for example, use the equivalent operators in Ascend's torch_npu extension library, or search the Ascend community for replacement operators.

3.2 Environment variables and run configuration

Set the environment variables so PyTorch can correctly recognize and schedule the NPU devices:

```shell
export ASCEND_HOME=/usr/local/Ascend/latest
export PATH=${ASCEND_HOME}/compiler/bin:${ASCEND_HOME}/tools/bin:${PATH}
export PYTHONPATH=${ASCEND_HOME}/pyACL/python/site-packages/acl:$PYTHONPATH
export LD_LIBRARY_PATH=${ASCEND_HOME}/lib64:${ASCEND_HOME}/runtime/lib64:${LD_LIBRARY_PATH}
```

Modify the training launch script to target the NPU:

```python
device = torch.device("npu")
model.to(device)
```

For multi-card training, add:

```python
torch.distributed.init_process_group(backend='hccl', world_size=world_size, rank=rank)
```

3.3 Single-card NPU training verification

Steps: import the original training script into the Ascend environment; replace all CUDA interfaces with NPU interfaces; set batch size, learning rate, and other hyperparameters to fit the NPU memory limits; verify that the loss trend, accuracy, and convergence speed are normal; export model checkpoints and compare against the GPU version to confirm accuracy consistency.

Performance:

| Metric | GPU (V100) | NPU (Ascend 910) |
| --- | --- | --- |
| Per-step training time | 180 ms | 205 ms |
| Final accuracy | 87.2% | 87.0% |

Note: the NPU incurs compilation latency on the first run; subsequent iterations drop to about 170 ms.

3.4 Single-machine multi-card NPU distributed training

Architecture changes — wrap the model in `DistributedDataParallel` mode and switch the communication backend to HCCL:

```python
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
torch.distributed.init_process_group(backend='hccl')
```

Launch multi-card training with torchrun or a custom script:

```shell
torchrun --nproc_per_node=8 train.py
```

Performance test results:

| Cards | Total training time (per epoch) | Speedup (vs. single card) |
| --- | --- | --- |
| 1 | 1h20m | 1x |
| 2 | 42m | 1.9x |
| 4 | 23m | 3.5x |
| 8 | 13m | 6.1x |

Note: the speedup is close to linear, indicating well-controlled communication overhead and excellent HCCL performance.

4. Troubleshooting and Optimization

Typical problems and solutions:

| Problem | Solution |
| --- | --- |
| Unsupported CUDA operators | Check the Ascend documentation for operator mappings, or use the `torch_npu` extension library |
| Multi-card communication failures | Use the `hccl_tools` utilities to inspect communication-group setup and confirm the rank configuration |
| High memory usage | Reduce the batch size, enable mixed-precision training (`amp`), turn off redundant logging |
| Initialization failures or invisible devices | Check that environment variables, driver version, and firmware version match |

Tuning suggestions: use the NPU's memory-reuse mechanism to reduce memory footprint; use the Ascend Profiler to locate training bottlenecks (operator time, communication overhead). General migration methodology: build a CUDA-CANN operator mapping table to speed up interface replacement; validate in stages (single card → multi-card → performance tuning) to reduce migration risk.

5. Results

The PyTorch model was fully migrated from the GPU platform to the Ascend NPU platform; a stable single-card NPU training pipeline was established; single-machine multi-card distributed training works with good scalability and stability; with functional consistency preserved, performance basically met expectations; and practical experience of PyTorch training on NPUs was accumulated, laying a foundation for future large-scale migrations.
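The interface replacements listed above are mechanical enough to script when auditing a codebase before migration. The sketch below illustrates the idea on plain source strings; it is a toy helper under our own names, not part of torch_npu, and a real migration still needs per-operator checks.

```python
# Illustrative CUDA -> NPU substitution table from the migration notes above.
# "nccl" is listed first so the backend rename is applied before the generic
# "cuda" -> "npu" rewrite, which also covers .cuda() calls and torch.cuda.*.
CUDA_TO_NPU = {
    "nccl": "hccl",   # torch.distributed.init_process_group backend
    "cuda": "npu",    # torch.device("cuda"), tensor.cuda(), torch.cuda.* ...
}

def map_to_npu(source_line: str) -> str:
    """Rewrite one line of CUDA-flavored PyTorch code to its NPU spelling."""
    for cuda_name, npu_name in CUDA_TO_NPU.items():
        source_line = source_line.replace(cuda_name, npu_name)
    return source_line
```

For example, `map_to_npu('torch.device("cuda")')` yields `'torch.device("npu")'`, and the same pass turns `backend='nccl'` into `backend='hccl'` — exactly the textual substitutions described above, applied uniformly.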
  • [Course Learning] [Huawei Cloud Academy] Unlock the AI Future — Three Golden Learning Paths Have Arrived!
            Unlock LLM application development from 0 to 1: these 3 learning paths will help you master intelligent applications! [Learning link] cid:link_0

Don't let the AI wave leave you stranded on the beach — here is your large-model study guide! In an era where AI applications spring up like bamboo shoots after rain, do you also want to stand at the technical frontier? Facing the seemingly insurmountable mountain of large-model development, enthusiasm is often extinguished by not knowing where to start. Don't worry: the large-model learning paths carefully crafted by Huawei Cloud Academy's technical experts are a "walkthrough guide" tailor-made for you. Following these 3 paths, you will experience:

LLM Application Development Learning Path — break down core large-model technology from 0 to 1. This path is a master key that takes you from zero through the full large-model application development workflow. Imagine building an intelligent assistant serving millions of users, or a personalized recommendation system tailored for education or healthcare — making large models a practical tool for real problems. Whether you are a developer or an AI enthusiast, you will master the hard skills of building intelligent applications and let your ideas blossom. Five steps from beginner to expert: 1. understand AI and large models; 2. use AI large models; 3. deploy and integrate models; 4. develop large-model applications; 5. fine-tune models.

RAG Development Learning Path — build your first large-model application. RAG is the sword that conquers large-model "hallucination". By systematically studying vector-database construction, semantic-retrieval optimization, and dynamic knowledge-fusion strategies, you will gain the superpower of building precise intelligent Q&A and professional knowledge-base systems. When your RAG work makes a company's knowledge management efficient and its customer-service bot genuinely "clever", you are the hero behind the scenes driving industry change. Four steps from beginner to expert: 1. a first look at RAG; 2. master RAG's key techniques; 3. build a RAG application; 4. optimize RAG performance.

AI Agent Development Learning Path — master the core skills of AI Agent development. This futuristic path explores the world of autonomous agent decision-making. You will study frontier techniques such as task planning and multi-agent collaboration, and build self-thinking office assistants and automated operations Agents with your own hands. Five rungs from principles to practice: 1. a first look at AI Agents; 2. common AI Agent architectures; 3. a deep dive into the common tools in AI Agents; 4. MCP, the Agent tool-integration technique; 5. build an AI Agent application.

Even better, we have prepared a "deluxe learning package": Developer Space — a developer's own cloud growth space, preloaded with Huawei root-technology tools and resources, with one-stop services enabling continuous exploration and innovation, plus a full large-model learning library: labs, courses, cases, and Huawei Cloud Developer Certification all in one place, with authoritative certificates to accelerate developers' growth.

Stop hesitating and stop watching from the sidelines! Rather than waiting passively in the AI wave, take the initiative, set out on the large-model learning road, unlock your own AI future, and become the next tech achiever who changes the world. Click the link to begin this learning journey full of surprises and challenges!
  • [Discussion] A Benefit for Shenzhen Talent! Registration Opens for 5 Free "National-Team"-Level Xinchuang Technology Training Courses
    While others are already earning 30k+ RMB a month taking orders in the HarmonyOS ecosystem, are you still grinding away at Java? While large models reshape IT jobs, traditional ops roles are being phased out in batches... The considerate Shenzhen Human Resources and Social Security Bureau once again serves up a survival-skills feast in next-generation information technologies — Xinchuang, HarmonyOS, Kylin, and more:

•       Xinchuang intelligent computing and large-model technology
•       Open-source Gauss database technology
•       HarmonyOS native application development
•       OpenHarmony device application technology
•       Xinchuang foundational-software adaptation, migration, and O&M on the Kylin operating system

[Classroom theory, on-site practice, team challenges] Say goodbye to dry stacks of theory: industry experts serve as instructors, and 5 free courses help you seize the high ground of information technology and level up! Most importantly, it is all completely free — open the door to growth and transformation without spending a cent. What are the requirements, and how do you sign up? Read on!

1. Eligibility
Applicants should have an industry or academic background related to next-generation electronic information technologies and meet one of the following:
1. Shenzhen household registration;
2. currently paying social insurance in Shenzhen;
3. registered unemployed in Shenzhen;
4. Shenzhen graduates unemployed for two years after leaving school and real-name registered with a municipal public employment service agency;
5. graduates of the graduation year (graduating between January 1 and December 31, 2025) from Shenzhen universities, or Shenzhen-registered students at universities elsewhere (including senior-worker and pre-technician classes at technician colleges).

Reminders: (1) each worker may join only one project-based training per year; if you joined the 2024 project-based training but completed less than 50% of the required hours, you unfortunately cannot join the 2025 round. (2) Within the same year, enterprise new-apprenticeship, student-apprenticeship, and technical-trainee apprenticeship training may each be taken only once, and none can be combined with project-based training.

[Extra subsidy] If you are one of the following, you can not only take the courses for free but also receive a 500 RMB living allowance 💴: a Shenzhen resident with employment difficulties; a member of a zero-employment family; an employed person with disabilities; a member of an urban minimum-subsistence family; a rural trainee among "post-school dropouts" within two years of graduation; or a poverty-alleviated person seeking work in Shenzhen.

2. Courses and Registration
Next-generation electronic information technology project-based training. Guidance: Shenzhen Human Resources and Social Security Bureau. Organizer: Shenzhen Vocational Skills Training Guidance Center. Undertaken by: Shenzhen Polytechnic University.

1. Xinchuang intelligent computing and large-model technology (6 days, about 300 seats): DeepSeek model building and optimization on the Ascend platform; private-scenario large-model deployment backed by Huawei Cloud Ascend compute; self-trained private large models with Ascend and DeepSeek; innovative large-model applications in e-commerce. Registration QR code and QQ group available.
2. Open-source Gauss database technology (6 days, about 250 seats): Gauss database installation and object-management practice; scenario-based database labs; database AI strategies and techniques; data security management and protection. Registration QR code and QQ group available.
3. HarmonyOS native application development (6 days, about 250 seats): building a training cloud platform with the ArkTS UI framework; a native office check-in system; a real-time social app on the Next version linked to DeepSeek for chat Q&A; a music-recommendation app built on service widgets; a native health-monitoring app. Registration QR code and QQ group available.
4. OpenHarmony device application technology (6 days, about 250 seats): OpenHarmony setup and configuration; device driver development and integration; HAL-layer development on OpenHarmony; smart-home software and hardware development on OpenHarmony. Registration QR code and QQ group available.
5. Xinchuang foundational-software adaptation, migration, and O&M on the Kylin operating system (6 days, about 250 seats): managing and applying Kylin Desktop OS V10; adaptation-testing fundamentals and software/hardware adaptation-testing skills. Registration QR code and QQ group available.

3. Growth and Rewards
1. Master practical skills, raise your professional competence, strengthen your employment competitiveness, and improve your career path;
2. Receive a Shenzhen Vocational Skills Upgrading Training Certificate upon meeting attendance requirements and passing the assessment;
3. Optionally sit authoritative industry certifications (exam fees not included): HCIA-AI, HCIA-openGauss, HarmonyOS Application Developer, OpenAtom OpenHarmony talent certification, KYCA, KYCP.

4. Class Schedule
- July 29 - September 30: daily premium classes (Monday to Saturday) and weekend premium classes (single-day, Saturday or Sunday)
- July 29 - October 20: weekend premium classes (single-day, Saturday or Sunday)

5. Contact
Mr. Huang: 13528095312 (same number on WeChat); Mr. Zhou: 0755-26019607. Consulting hours: weekdays 9:00-18:00; at other times, contact the QQ-group staff.

6. Venue
Shenzhen Polytechnic University, Xili Lake campus (Information Building). Green travel suggested: 800 m on foot from Exit F of Xili Station on Shenzhen Metro Line 5. Bus stop: Shenzhen Polytechnic University (Xili Lake campus), served by routes M197, M182, M176, M492, Peak Express 59, 325, M535, M217, 67, 326, and others.

Stop hesitating — seize this rare opportunity to transform yourself through public-interest training. Sign up now and start your growth journey!
  • [Competition News] Preliminary Leaderboard A Format-Score Statistics (as of July 20, 23:59)
    团队名称格式得分精度得分西北智联161153CEATRG0149.5hid_t7v_sdh548dhkkz0148.5hid_cwfo0xxj8regp6r0146鸿蒙极客队162143.48BUPT-ParCIS160136.34notrickno154136.08武汉船院计算机2307#2153133.49ECNU_ELRM139132.15全都对队162131.56judgeyang98131myf gogogo98129.11二进制萝卜培育中心98127擎狮0126纳算力克大工坊105125.45不玊之客劣等兵朴昌罗桂夏0123奇点0122.73点子王0122.45破晓者155120.4想去研究大模型108119.9bupt735162117.47Decoder-Only161116123462115[object Object]0114.88浙安院云计算63112.54挑战杯揭榜挂帅华为0111.75说人话0111.11昇腾推理智速引擎0109.08试试0107.1蒜鸟你搞不赢队0106.55ken0102.35hid_b2ydyl88e3z7tqc160102.02智在必得098.79yangs_wdxw12096.88TEMP12096.88这对吗096.03拳头花可火091.64西北文科大学队13891武汉船院计算机2307090.71昇腾芯链088.33hid_77fv2kg9-fgvjfg087.81三角矩阵085.99随便起一个13585.08永宁永胜9883.93123--083.79PACKPACK11378.32华东理工大学AIMC实验室074.49昇腾智推大模型072.53CodeWisdom070.54GT-ejdkd068.7Create3267扬花落满肩064.14马桶蹲累了160.97急急急060.11AAA建材王哥058璃月医科大学孤云阁校区16257.22lab3083851.42fengerhu14946被资本做局041.24智枢拓界025.66hid_x7qejp3bft91lsd16012.02hid_hyh--co6xfj6nhk010challenger X07A-team02hw035532519500.4CCD队00.4没机基队00
  • [AI] Ascend 310P Inference Performance
    I built an image inference service on a 310P, and the yolov5l model delivers about 100 fps. Is this the upper limit of what this card can do?
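Whether 100 fps is the card's ceiling depends on how the pipeline is driven, not only on the NPU itself. A back-of-the-envelope model helps frame the question (illustrative only; real 310P throughput also depends on batch size, AIPP preprocessing offload, and multi-stream scheduling, and the function name and timings here are assumptions):

```python
def pipeline_fps(preprocess_ms: float, infer_ms: float, postprocess_ms: float,
                 streams: int = 1) -> float:
    """Frames/s when the three stages run back to back on `streams`
    independent pipelines. With one stream, every stage sits on the
    critical path; overlapping stages or adding streams raises the rate."""
    per_frame_ms = preprocess_ms + infer_ms + postprocess_ms
    return streams * 1000.0 / per_frame_ms
```

For a hypothetical 10 ms end-to-end cost per frame, one stream gives 100 fps while two concurrent streams roughly double it — so profiling where the per-frame time goes (and whether decode/preprocess runs on the CPU) is the first step before concluding the card is saturated.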
  • [Competition News] Preliminary Leaderboard A Format-Score Statistics (as of July 16, 14:30)
    截止7月16日14:30前,各团队初赛A榜注:仅最高精度分对应的格式得分,如最高精度得分相同,以最早提交的最高精度得分为准。团队名称格式得分精度得分西北智联161153hid_t7v_sdh548dhkkz0148.5hid_cwfo0xxj8regp6r0146鸿蒙极客队162143.48BUPT-ParCIS160136.34CEATRG0135.5武汉船院计算机2307#2153133.49ECNU_ELRM139132.15全都对队162131.56judgeyang98131myf gogogo98129.11二进制萝卜培育中心98127擎狮0126纳算力克大工坊105125.45不玊之客劣等兵朴昌罗桂夏0123奇点0122.73点子王0122.45破晓者155120.4bupt735162117.47Decoder-Only161116123462115浙安院云计算63112.54试试0107.1ken0102.35hid_b2ydyl88e3z7tqc160102.02蒜鸟你搞不赢队0101.19智在必得098.79说人话098.77这对吗096.03TEMP095.17拳头花可火091.64武汉船院计算机2307090.71挑战杯揭榜挂帅华为088.8hid_77fv2kg9-fgvjfg087.81随便起一个13585.08昇腾芯链084.74yangs_wdxw082.65三角矩阵081.22PACKPACK11378.32永宁永胜075.39华东理工大学AIMC实验室074.49昇腾智推大模型072.53CodeWisdom070.54GT-ejdkd068.7急急急060.11璃月医科大学孤云阁校区16257.22想去研究大模型056.65马桶蹲累了056.05fengerhu14946扬花落满肩118.43Create017.11hid_x7qejp3bft91lsd16012.02智枢拓界011.59hid_hyh--co6xfj6nhk010challenger X07A-team02hw035532519500.4CCD队00.4没机基队00
  • [Competition News] Preliminary Leaderboard A Format-Score Statistics (as of July 9, 14:00)
    截止7月9日14点00分前,各团队初赛A榜注:仅最高精度分对应的格式得分,如最高精度得分相同,以最早提交的最高精度得分为准。团队名称格式得分精度得分hid_t7v_sdh548dhkkz0148.5hid_cwfo0xxj8regp6r0146鸿蒙极客队162143.48CEATRG0135.5武汉船院计算机2307#2153133.49ECNU_ELRM144131.34擎狮0126纳算力克大工坊105125.45奇点0122.73点子王0122.45破晓者155120.4bupt735162117.47西北智联133117Decoder-Only161116123462115judgeyang0114.07浙安院云计算63112.54二进制萝卜培育中心0112.06BUPT-ParCIS76108.52myf gogogo0107.23试试0107.1不玊之客劣等兵朴昌罗桂夏0106.4ken0102.35hid_b2ydyl88e3z7tqc160102.02智在必得098.79这对吗096.03蒜鸟你搞不赢队092.13拳头花可火091.64武汉船院计算机2307090.71说人话089.05hid_77fv2kg9-fgvjfg087.81TEMP085.96yangs_wdxw082.65昇腾芯链082.39永宁永胜075.39华东理工大学AIMC实验室074.49挑战杯揭榜挂帅华为073.73CodeWisdom070.54GT-ejdkd068.7急急急060.11璃月医科大学孤云阁校区16257.22想去研究大模型056.65全都对队4846fengerhu14946扬花落满肩118.43Create017.11hid_x7qejp3bft91lsd16012.02hid_hyh--co6xfj6nhk010智枢拓界09.56challenger X07A-team02hw035532519500.4没机基队00
  • [Course Learning] The Ascend AI Zone Learning Path Is Officially Released — Begin Your Journey of AI Innovation and Application
    Systematically learn AI and master hardcore AI skills — the playbook is here!!! The Ascend AI Zone's carefully designed "6-step progressive learning path" is now live, giving learners a comprehensive, systematic route to master the key technologies and drive AI innovation and application. Whether you are a beginner or an experienced developer, you will find a direction that suits you: a clear path, step by step, from entry level to the summit of AI.

1. Artificial Intelligence: Python, the master key to the AI world. Python is the "lingua franca" of AI and one of the most efficient tools for learning it — it lets you focus your energy on AI algorithms and business logic instead of being shackled by complex syntax. Whether you are starting machine learning or developing complex large-model applications, Python is an indispensable foundation. Step one begins with hands-on Python, from basic syntax to advanced programming techniques to real projects, building core Python knowledge and skills. ① Hands-on Python ② Python data processing ③ Practical AI libraries ④ Hands-on machine learning ⑤ Hands-on deep learning

2. Ascend Fundamentals: entering Ascend AI. In this stage learners formally enter the Ascend AI domain. It introduces the full-stack technology system of Ascend AI hardware, dissects the core modules of the Ascend architecture, explains the performance characteristics and scenario fit of the 910/310 AI processors, and practices full-stack Atlas deployment, building chip-level tuning skills for Ascend solutions. It also covers the fundamentals of the CANN (Compute Architecture for Neural Networks) heterogeneous computing architecture: learners master high-performance application development with AscendCL, understand GE graph-engine optimization, and become fluent with core tools such as ATC, AOE, and AMCT, systematically developing high-performance operator skills for Ascend AI processors. ① Ascend hardware ② CANN heterogeneous computing architecture ③ Ascend operator development

3. Ascend Model Development: mastering the core of model building. This block explores model development on the Ascend platform in depth. It systematically builds full-stack PyTorch skills on Ascend, covering environment setup, model migration, performance optimization, and precision tuning end to end, and provides a seamless migration stack that deeply integrates the MindSpore framework with Ascend hardware. ① Ascend PyTorch development ② Ascend model-development toolchain ③ Ascend PyTorch third-party libraries ④ Classic PyTorch tasks on Ascend ⑤ Ascend MindSpore development ⑥ Ascend MindSpore migration development

4. Large-Model Development: exploring Ascend's large-model frameworks. Here learners dig into large-model principles, architectures, training methods, and efficient inference, studying Ascend's core large-model training technology with a focus on the MindSpeed framework's three dimensions: parallel-architecture innovation, compute acceleration, and extreme memory compression. ① Ascend large-model training frameworks ② Ascend large-model inference frameworks

5. Application Development: turning AI into real value. Application development is the key link that converts AI technology into practical value. Learners build efficient interaction with LLMs — from prompt-engineering basics to advanced applications, mastering model behavior control, task decomposition, and multi-turn dialogue orchestration through precise prompt design to raise both output quality and reliability. The block focuses on core LangChain practice for automated decomposition and orchestration of complex tasks, details the applications of classic large models such as Pangu-Pro-MoE and DeepSeek in prompt engineering, code writing, information extraction, and Agent building, and covers intelligent application development with the LlamaIndex framework. ① Prompt engineering ② LangChain LLM application development ③ Classic LLM application development ④ LlamaIndex LLM applications

6. AI4S: expanding AI into new domains. AI4S (AI for Science) applies AI to scientific research, bringing new methods and ideas. Learners fuse AI with frontier life-science technology, focusing on intelligent modeling of biological sequences to provide a data-driven research paradigm for precision medicine and bioengineering; they also study AI in the earth sciences, learning to solve earth-system problems with machine learning and deep learning. This block widens the horizon of AI applications and supports future innovation in scientific research. ① AI and life sciences ② AI and math/physics solving ③ AI and earth sciences

Beyond these, the Ascend AI Zone learning path covers popular tracks such as classic large-model application development, using AI libraries, and hands-on machine learning; through hands-on practice and case studies, learners gain a deeper grasp of the technology. The path provides a comprehensive, systematic, progressive framework for mastering Ascend's key technologies and driving AI innovation and application. Whether you hope to launch an AI career or pursue technical breakthroughs, you will find your own growth road here. Join the Ascend AI Zone learning journey — click here >> cid:link_0 — and explore the unlimited possibilities of the AI world!
  • [Case Co-creation] Huawei Developer Space + DeepSeek: Building an Efficient End-to-End Data Analysis Workflow
    Preface

In today's digital economy, data is undoubtedly one of a company's most core assets. Amid fierce market competition, a company that wants to decide precisely and respond quickly must rely on efficient data analysis. Data analysis helps companies see market trends and predict consumer behavior, and it also optimizes operations, lowers costs, and raises overall competitiveness. A simple example from an apparel e-commerce platform: just before Double Eleven, a data analyst spotted a key trend — searches for "oversized hoodies" had surged 300% within a week, yet the conversion rate was only 2.1%, far below the 5.8% average. Digging into user behavior trajectories, the team found the problem: average dwell time on the product detail page was only 15 seconds, and 90% of users closed the page right after clicking the size chart. Further research showed consumers had divergent understandings of what an "oversized" cut actually meant and worried the garment would run too large or too small. Acting on this insight, the company immediately adjusted: it added real-model try-on videos annotated with the models' height, weight, and worn size; replaced the traditional size table with a "looseness comparison chart"; and pushed a dedicated "30-day no-questions-asked returns" service to users searching for oversized hoodies. Within 48 hours, conversion for the category rose to 6.2% and sales grew 180% month over month, while competitors who failed to respond to the data signal in time missed the traffic wave. Behind this success story is an efficient data-analysis workflow. Yet in practice many companies and developers still face challenges at every link — collection, cleaning, modeling, and visualization can each become an efficiency bottleneck. Building a smooth, intelligent end-to-end analysis workflow has become the key problem to solve. This article explores how the combination of Huawei Developer Space and DeepSeek can power an efficient end-to-end data-analysis workflow, helping developers and companies move faster and more steadily on the data-driven road.

Typical pain points in the data-analysis pipeline

Although the importance of data analysis is widely recognized, companies still face concrete challenges. First, data collection and processing are hard. Enterprise data is usually scattered across ERP, CRM, e-commerce platforms, social media, IoT devices, and other systems, in wildly varied formats (structured, semi-structured, unstructured) with no unified standards. Acquiring it requires integrating all kinds of APIs and configuring complex ETL pipelines — a time- and labor-intensive process. Companies spend enormous effort acquiring, cleaning, and integrating data to ensure accuracy and consistency, and this tedious preprocessing severely drags down overall efficiency. Second, the analysis process itself is cumbersome: from preprocessing to modeling to result validation, analysts juggle many tools and platforms (Python, R, SQL, Tableau, Power BI, and more), switching frequently, with high learning costs and low collaboration efficiency. Analysis also often lacks standardized processes and version management, so the same task can produce different results at different times or with different people, hurting credibility and reproducibility. Third, data-analysis talent is scarce: purely technical analysts often lack deep business understanding, while business experts lack analysis skills, leaving a wide gap between results and needs; professional analysts are hard and expensive to hire and slow to train, so companies struggle to assemble effective analysis teams. Finally, results are poorly visualized and collaboration is fragmented: if results cannot be presented intuitively to decision-makers, and analysis spans IT, business, and management without a unified collaboration platform or standardized process, communication costs soar and projects crawl. These pain points mean companies urgently need a more efficient, smarter, simpler analysis solution. Using cloud computing, AI, and other emerging technologies to build an end-to-end, low-threshold, high-performance data-analysis workflow has become a key topic of digital transformation.

The technical foundation: Huawei Developer Space and DeepSeek

Huawei Developer Space is a dedicated space built for developers worldwide. It aggregates development resources and tools for Huawei's root technologies — HarmonyOS, Ascend, Kunpeng, GaussDB, openEuler, and more. On the hardware side, backed by Huawei's engineering strength, it provides powerful cloud hosts that guarantee the compute needs of development, running everything from complex model training to large-scale data processing efficiently. On the tooling side, it offers a one-stop development environment: the CodeArts IDE for Cangjie editor, for example, provides syntax highlighting, error diagnostics, and auto-completion with real-time feedback, plus reverse debugging for reviewing historical debug information; it comes preinstalled on the cloud host, ready out of the box, greatly lowering setup cost. A rich set of command-line tools covers varied developer habits and complex needs. Together with end-to-end sample cases from coding through debugging and cloud storage, this forms a complete development loop where developers can learn, experiment, and innovate efficiently within Huawei's root-technology ecosystem.

When developing DeepSeek-based solutions on a Developer Space cloud host, the two fit together on several levels. The powerful cloud host gives DeepSeek development solid hardware support; the preinstalled CodeArts IDE for Cangjie editor (with highlighting, diagnostics, completion, and reverse debugging) makes it convenient to debug and optimize DeepSeek model development; and the integrated command-line tools cover the varied scenarios of DeepSeek work. From a development-loop perspective, the coding-to-debugging sample cases plus cloud storage support the full DeepSeek workflow. Meanwhile, DeepSeek's fully open-source strategy — R1 and other models under the MIT license — matches Developer Space's open ecosystem, so developers can freely customize and extend DeepSeek models on the cloud host, furthering technology sharing and collaboration. Its open, favorable API pricing also lets more small and midsize companies develop and apply DeepSeek on Developer Space cloud hosts, lowering the barrier to AI and promoting broad adoption across industries.

Building and applying the end-to-end data-analysis workflow

1. Configure the cloud host

First enter Developer Space, then open the workbench and configure a cloud host. This machine simulates our development/production environment. Enter the cloud development space, open a terminal, and run:

```shell
curl -fsSL https://dtse-mirrors.obs.cn-north-4.myhuaweicloud.com/case/0035/install.sh | sudo bash
```

2. Install Ollama and deploy DeepSeek

After downloading Ollama, use it to deploy the DeepSeek large model — here the deepseek-r1:1.5b version (a more capable model can be deployed if the hardware allows):

```shell
ollama run deepseek-r1:1.5b
```

With deployment done, you can type a prompt to test the output. Check the local port Ollama exposes with:

```shell
sudo netstat -tunlp
```

3. CodeArts IDE for Python

Next, open CodeArts IDE for Python and talk to that port; once this step works we can start building agents. ollama serve listens on http://localhost:11434 by default. First install the requests library:

```shell
pip install requests
```

Then verify communication from Python:

```python
import requests

def chat_with_ollama(prompt, model="deepseek-r1:1.5b"):
    url = "http://localhost:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # disable streaming for a simple test
    }
    try:
        response = requests.post(url, json=data, headers=headers)
        response.raise_for_status()
        result = response.json()
        print("Model reply:", result["response"])
    except requests.exceptions.RequestException as e:
        print("❌ Request error:", e)
    except Exception as e:
        print("❌ Other error:", e)

if __name__ == "__main__":
    chat_with_ollama("你好")
```

If we can get output from the model, everything so far is working, and we can begin building the end-to-end data-analysis workflow.

4. Building the workflow

4.1 Data acquisition / cleaning / extraction

Consider the acquisition layer first: the data can be stored enterprise and historical data, or relevant data crawled from the internet — no AI is needed there. Cleaning and extraction, however, can be handed over to the large model, which performs them exceptionally efficiently. We wrap this agent as a Python class with the following capabilities:

| Function | Description |
| --- | --- |
| clean_text(text) | Text cleaning: strip whitespace, punctuation, HTML, useless tokens, etc. |
| extract_fields(text, instruction) | Use the large model to extract specified structured fields (names, addresses, company names, ...) |
| batch_process(data_list) | Batch data cleaning and field extraction |
| custom_prompt(data, task) | Build more complex cleaning/extraction tasks from a custom task prompt |

The code can be written as follows:

```python
import requests
import re

class DataCleaningAgent:
    def __init__(self, model_name="deepseek-r1:1.5b", host="http://localhost:11434"):
        self.model = model_name
        self.api_url = f"{host}/api/generate"

    def _call_model(self, prompt, stream=False):
        payload = {"model": self.model, "prompt": prompt, "stream": stream}
        try:
            response = requests.post(self.api_url, json=payload)
            response.raise_for_status()
            result = response.json()
            return result.get("response", "").strip()
        except Exception as e:
            print("Call failed:", e)
            return ""

    def clean_text(self, text):
        """Basic cleaning: strip HTML tags, special characters, repeated spaces."""
        text = re.sub(r"<.*?>", "", text)                 # strip HTML
        text = re.sub(r"\s+", " ", text)                  # collapse whitespace
        text = re.sub(r"[^\w\s\u4e00-\u9fff]", "", text)  # strip special symbols
        return text.strip()

    def extract_fields(self, text, instruction="请从中提取所有公司名称和联系人"):
        """Field extraction via the LLM."""
        prompt = f"以下是原始数据:\n{text}\n\n{instruction}"
        return self._call_model(prompt)

    def batch_process(self, data_list, instruction):
        results = []
        for i, text in enumerate(data_list):
            print(f"Processing item {i + 1}...")
            cleaned = self.clean_text(text)
            extracted = self.extract_fields(cleaned, instruction)
            results.append({"original": text, "cleaned": cleaned, "extracted": extracted})
        return results

# Example usage
if __name__ == "__main__":
    agent = DataCleaningAgent()
    sample_data = [
        "联系人:张三,联系电话:123456789,公司:江西省招标有限公司",
        "地址:南昌市东湖区,北京华为技术有限公司,联系人王五",
    ]
    instruction = "请提取所有公司名称和联系人姓名,返回JSON格式"
    results = agent.batch_process(sample_data, instruction)
    for res in results:
        print("\nOriginal:", res["original"])
        print("Cleaned:", res["cleaned"])
        print("Extracted:", res["extracted"])
```

You can verify this yourself.

4.2 Reading from the database via SQL

Once cleaned, the data goes into the database. The cleaning/extraction agent above is designed for externally collected data, but most companies already store their own business data, usually directly in databases. Retrieving it involves fairly tedious steps: analysts typically write SQL, yet customers or managers who are not analysts also want direct access for analysis. To lower the threshold, we can set up a data-retrieval agent that returns the base data. In short, we build a system that goes natural language → database query → automatic SQL execution → returned results, giving managers, operations, product, and other non-technical users low-threshold data access. Note that this places high demands on the prompt and the knowledge base: we need a data dictionary built from the business database — one that explains every business data field and introduces each table module. A partial prompt example:

```
请根据下方表结构,从表 `sales_data` 中查询 2024 年每个月的销售总额和增长率:

表结构:
CREATE TABLE sales_data (
    id INT,
    sale_date DATE,
    amount DECIMAL,
    region VARCHAR(50)
)

问题: 2024年每个月的销售额是多少?同比去年增长了多少?
```

We can start with the part that needs code — the execution layer, implementing database access and result return in Python:

```python
from sqlalchemy import create_engine
import pandas as pd

def execute_sql(sql: str, db_url: str):
    engine = create_engine(db_url)
    with engine.connect() as conn:
        df = pd.read_sql_query(sql, conn)
    return df
```

| Component | Meaning |
| --- | --- |
| sqlalchemy.create_engine | Creates the database connection object; supports MySQL, PostgreSQL, SQLite, and more |
| db_url | Connection string, e.g. mysql+pymysql://user:password@host:port/dbname |
| pd.read_sql_query(sql, conn) | Executes the SQL with pandas and returns the result as a DataFrame |
| conn | Connection context; releases the connection automatically |
| df (return value) | The query result as a DataFrame, ready for printing, export, or charting |

Example call:

```python
sql = "SELECT region, SUM(amount) as total FROM sales_data GROUP BY region"
db_url = "mysql+pymysql://root:123456@localhost:3306/mydb"
df = execute_sql(sql, db_url)
print(df)
```

Next we wrap the DeepSeek large model into a DataQueryAgent, sketched as:

```python
class DataQueryAgent:
    def __init__(self, db_url: str, model_host="http://localhost:11434", model="deepseek-r1:1.5b"):
        ...

    def _generate_sql(self, question: str, schema_hint: str = ""):
        ...

    def query(self, question: str, schema_hint: str = ""):
        ...
```
This is the user-facing wrapper class for "querying the database in natural language": it bundles the two steps of SQL generation and execution. `__init__` is a straightforward definition:

```python
def __init__(self, db_url: str, model_host="http://localhost:11434", model="deepseek-r1:1.5b"):
    self.db_url = db_url
    self.model_host = model_host
    self.model = model
```

`_generate_sql()` implements the core function (natural language → SQL):

```python
def _generate_sql(self, question: str, schema_hint: str = ""):
    prompt = f"Given the following database schema:\n{schema_hint}\nGenerate the SQL statement that answers: {question}"
    response = requests.post(f"{self.model_host}/api/generate", json={
        "model": self.model,
        "prompt": prompt,
        "stream": False
    })
    return response.json().get("response", "")
```

It sends the natural-language question plus the database schema as a prompt to the locally deployed DeepSeek model and parses the response into SQL. `query()` then runs the complete workflow (question → SQL → query result):

```python
def query(self, question: str, schema_hint: str = ""):
    sql = self._generate_sql(question, schema_hint)
    print("Generated SQL:", sql)
    try:
        engine = create_engine(self.db_url)
        with engine.connect() as conn:
            df = pd.read_sql_query(sql, conn)
        return df
    except Exception as e:
        return f"SQL execution failed: {e}"
```

(Note the class also needs `import requests`, `import pandas as pd`, and `from sqlalchemy import create_engine` at module level.) Example call:

```python
agent = DataQueryAgent(
    db_url="mysql+pymysql://root:password@localhost:3306/mydb"
)

schema_hint = """
CREATE TABLE orders (
    id INT,
    customer_name VARCHAR(50),
    amount DECIMAL(10,2),
    order_date DATE,
    status VARCHAR(20)
);
"""

question = "Total order amount per month this year"
df = agent.query(question, schema_hint)
print(df)
```

This gives us a complete DataFrame, which we can then visualize with very little effort.

4.3 DataFrame Visualization Agent

We can also hand the DataFrame to an LLM agent and let it decide, from the data itself, how to visualize it — automatically generating chart code (with matplotlib / plotly / seaborn). This is essentially a DataFrame Visualization Agent: given the data structure, field semantics, and task, it designs and renders charts for the user. The implementation logic is simple: DataFrame → LLM → auto-generated chart. The agent class:

```python
import requests
import pandas as pd
import contextlib
import io
import traceback

class VisualizationAgent:
    def __init__(self, model_host="http://localhost:11434", model="deepseek-r1:1.5b"):
        self.model_host = model_host
        self.model = model

    def _generate_plot_code(self, df: pd.DataFrame, task_description: str):
        # Serialize the DataFrame to CSV so the LLM can read it
        csv_data = df.to_csv(index=False)
        prompt = f"""
You are a data analysis expert. Below is CSV-formatted data and the user's visualization request.
Generate Python plotting code for this data, using matplotlib (preferred) or plotly.

Request: {task_description}

CSV data:
{csv_data}

Return complete, runnable Python plotting code:
"""
        response = requests.post(f"{self.model_host}/api/generate", json={
            "model": self.model,
            "prompt": prompt,
            "stream": False
        })
        return response.json().get("response", "")

    def _safe_exec(self, code: str, local_vars=None):
        """Execute model-generated code with basic containment."""
        local_vars = local_vars or {}
        global_vars = {"__builtins__": __builtins__, "pd": pd}
        try:
            # Redirect stdout so generated prints don't pollute the console
            with contextlib.redirect_stdout(io.StringIO()):
                exec(code, global_vars, local_vars)
        except Exception:
            print("⚠️ Code execution error:")
            traceback.print_exc()

    def visualize(self, df: pd.DataFrame, task_description: str = "Plot a bar chart of sales",
                  show_code: bool = True):
        code = self._generate_plot_code(df, task_description)
        if show_code:
            print("Model-generated plotting code:\n")
            print(code)
            print("\nExecuting plotting code...")
        self._safe_exec(code, local_vars={"df": df})
```

Usage example:

```python
if __name__ == "__main__":
    df = pd.DataFrame({
        "region": ["East China", "South China", "North China", "Southwest"],
        "sales": [120000, 95000, 78000, 67000]
    })
    agent = VisualizationAgent()
    agent.visualize(df, task_description="Generate a bar chart of sales by region")
```

Running this calls the model to generate chart code, then executes it and displays the chart. One caveat: despite its name, `_safe_exec` only redirects stdout and catches exceptions — it passes full builtins to `exec()` and does not sandbox the generated code, so real deployments should add the keyword filtering discussed in the implementation notes below.

5. Summary

In this chapter we turned an idea, step by step, into an end-to-end intelligent data-analysis pipeline:

- Environment ready — spin up a high-spec cloud host in the cloud dev space, initialize the environment in one shot with `curl install.sh`, deploy the deepseek-r1:1.5b model with Ollama, and confirm the port with `netstat`;
- Model connectivity — verify the model chats normally with a minimal `requests` script, so every later agent can call it;
- Data acquisition & cleaning — a `DataCleaningAgent` lets the LLM take over tedious text denoising and entity extraction, emitting structured JSON in batches;
- Natural-language querying — `DataQueryAgent` chains "one sentence → SQL → DataFrame", so non-technical colleagues can query operational data in seconds;
- Intelligent visualization — hand the DataFrame to `VisualizationAgent`; the LLM produces matplotlib/plotly code and executes it immediately, charts appearing on the spot;
- Closed loop — data flows from acquisition → cleaning → storage → query → display, fully automated: a low-barrier, efficient, reusable enterprise analytics workflow.

Key benefits:

- Zero SQL / zero plotting skills required: business users just speak; everything behind the scenes is automatic.
- Highly pluggable: each agent is an independent module; cleaning, querying, and visualization can be recombined or swapped for stronger models as needed.
- Private and controllable: everything runs on the company's own cloud hosts and databases, secure and compliant.

Enterprise case study

Building on the full-chain analytics workflow above, we now have everything needed to build a supply-chain weekly-report auto-generation system. The one-sentence goal: let the supply-chain manager hand "writing the weekly report" entirely to code and the LLM — the data fetches itself, the charts draw themselves, the conclusions write themselves; a human only signs off.
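Before the case study, one practical wrinkle worth flagging in the two agents above: reasoning models such as deepseek-r1 typically wrap their answer in a `<think>…</think>` block plus a markdown code fence, which neither `exec()` nor `read_sql_query()` can consume directly. A minimal extraction helper (the function name and regexes are my own, not from the article) might look like:

```python
import re

def extract_code(reply: str) -> str:
    """Strip <think> reasoning and markdown fences from a model reply,
    keeping only the code/SQL payload."""
    # Drop the chain-of-thought block emitted by reasoning models.
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    # Prefer the first fenced code block if one exists.
    m = re.search(r"```(?:\w+)?\n(.*?)```", reply, flags=re.DOTALL)
    return (m.group(1) if m else reply).strip()

reply = (
    "<think>user wants totals per region</think>\n"
    "```sql\nSELECT region, SUM(amount) FROM sales_data GROUP BY region;\n```"
)
print(extract_code(reply))  # SELECT region, SUM(amount) FROM sales_data GROUP BY region;
```

Calling this on the raw `response.json().get("response", "")` value in `_generate_sql()` and `_generate_plot_code()` makes both agents noticeably more robust to model formatting.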
1. Background and pain points

| Typical pain point | Traditional approach | Resulting problem |
| --- | --- | --- |
| Many data sources, messy formats | Manually export Excel from ERP / WMS / OMS | Crude joins, easily missed fields, slow and painful |
| Many dimensions, many metrics | Hand-written SQL pulls | Ops/procurement can't write SQL; heavy dependence on the data team |
| Repetitive charting | "Copy data → paste into template → tweak styles" | One keystroke turns into a whole morning |
| Written conclusions | Manager writes from experience | Subjective, easy to miss trends |

Result: a roughly 10-page supply-chain weekly report typically takes 3-4 people half a day to produce.

2. Approach: bring in the full-chain workflow

3. Core modules in practice

3.1 Data cleaning & ingestion
- Sources: ERP (procurement, inbound), WMS (inventory), OMS (sales orders)
- Tool: DataCleaningAgent
- Tasks:
  - Disambiguate product names and specs ("A4纸80g" = "A4-80g")
  - Parse free-text remarks, extracting a supplier-complaint field
  - Emit standardized CSV, loaded daily into the `supply_dw` data warehouse

3.2 Natural-language data pulls
- Users: procurement managers, supply-chain analysts
- Typical questions:
  - "Weekly procurement spend for the last 12 weeks vs. the same period last year"
  - "Top 5 and bottom 5 suppliers by on-time delivery this week"
- Flow:
  1. DataQueryAgent reads the data dictionary (auto-generated or hand-written)
  2. The DeepSeek model turns question + schema into SQL
  3. Execution returns a DataFrame

Quick sanity check:

```python
question = "Inventory turnover days, weekly trend, last 4 weeks"
schema_hint = "TABLE inv_summary(id, item, turnover_days, stat_week)"
df_turnover = agent.query(question, schema_hint)
```

3.3 Auto visualization + conclusion generation
- VisualizationAgent: DataFrame → chart code → instant rendering; supports multi-chart layouts, dual-axis line charts, funnels, heat maps
- Conclusion generation: call DeepSeek again:

```python
prompt = f"Below is a table of inventory turnover days for the last 12 weeks. Write a ~200-word summary analysis:\n{df.to_markdown()}"
summary = llm(prompt)
```

3.4 Report assembly & delivery
- ReportBot (Python): collects all chart PNGs plus the written conclusions
- Builds the weekly deck with python-pptx, or exports a PDF with WeasyPrint
- Calls the corporate IM / email API: "Hi @Team, this week's report is ready 👉 link/attachment"
- Scheduling: cron / Airflow, every Monday at 8 a.m.

4. Key code skeleton at a glance

```python
def weekly_report():
    # 1. Pull core metrics
    metrics = {
        "procurement_spend": query_agent.query("Procurement spend trend, last 4 weeks", sales_schema),
        "on_time_delivery": query_agent.query("On-time delivery rate, last 4 weeks", inbound_schema),
        "inventory_turnover": query_agent.query("Inventory turnover days, last 12 weeks", inv_schema),
    }

    # 2. Generate charts
    charts = {}
    for k, df in metrics.items():
        charts[k] = viz_agent.visualize(df, f"Plot a line chart of {k}", show_code=False)

    # 3. AI-written summary
    summary = llm("Based on the three tables and charts above, summarize this week's main supply-chain issues and recommendations")

    # 4. Assemble the PPT
    ppt = build_ppt(charts, summary)
    ppt.save("weekly_supply_report.pptx")

    # 5. Deliver
    send_mail("weekly_supply_report.pptx", to=["boss@corp.com"])
```

5. Results at a glance

| Metric | Traditional | Fully automated |
| --- | --- | --- |
| Report production time | ≈ 4 person-hours | ≤ 10 machine-minutes |
| People involved | Ops + data + design | 0 (unattended) |
| Traceability | Low (hand-edited Excel) | High (SQL + Git + logs) |
| Conclusion consistency | Subjective, manual | Recomputed from the data by AI |

6. Implementation notes & pitfalls
- The data dictionary must be detailed enough: field meanings, units, value ranges — the LLM has to be able to "read" it.
- SQL safety: whitelist at the execution layer; allow SELECT only.
- Safe execution of chart code: VisualizationAgent must filter dangerous keywords such as `os` and `subprocess`.
- Access control: give each caller a dedicated API key plus a bounded query scope.
- Caching / incremental runs: weekly reports roll forward, so caching the previous N-1 weeks of data speeds things up considerably.
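The "SELECT only" rule from the implementation notes can be enforced with a small gatekeeper in front of the execution layer. This is a minimal sketch of my own (helper name and keyword list are illustrative, not from the article); a production system would want a real SQL parser rather than regexes:

```python
import re

# Statement types a read-only reporting agent should never issue.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|attach|pragma)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Allow a single read-only SELECT statement, nothing else."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:                          # reject stacked statements
        return False
    if not re.match(r"(?i)\s*select\b", stmt):
        return False                         # must start with SELECT
    return not FORBIDDEN.search(stmt)        # no write/DDL keywords anywhere

print(is_safe_select("SELECT region, SUM(amount) FROM sales_data GROUP BY region"))  # True
print(is_safe_select("DROP TABLE sales_data"))                                       # False
print(is_safe_select("SELECT 1; DELETE FROM sales_data"))                            # False
```

Dropping this check into `DataQueryAgent.query()` before `pd.read_sql_query` means a hallucinated or injected write statement is rejected instead of executed.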
7. Value recap

"Let the system write the report; let people make the decisions."

- Efficiency: the weekly report produces itself, freeing the data team for deep analysis.
- Accuracy: unified definitions and automatic cross-checks eliminate hand-copied spreadsheet errors.
- Transparency: the whole chain, from raw data to chart code, leaves a trail — auditable and replayable.
- Extensibility: the same framework can be replicated for finance, marketing, and customer service, forming an enterprise-grade "intelligent report factory".

To roll this system out to another business line, you only need to:
1. Add the new tables' schemas;
2. Write the corresponding KPI prompts;
3. Register a new scheduled job in the scheduler.

Practical advice and key technical takeaways

There are many LLM optimization methods (pruning, quantization, operator fusion, and so on), but no universal optimum. Choose a strategy by weighing business needs, hardware, cost, and other factors together — don't chase technical novelty at the expense of real-world results.

1. Scenario-driven choices
The core logic: different scenarios stress different aspects of model performance, so the optimization target must match the scenario.
- Real-time interaction (chatbots, autonomous driving): needs low latency — prefer quantization, operator fusion, or distillation.
- Batch processing (offline analytics, large-scale prediction): needs high throughput — model parallelism or dynamic batching.
- Edge deployment (phones, IoT devices): constrained compute and power — model compression plus lightweight architecture design.
For example, when deploying a smart-home voice assistant that must respond in real time, prefer quantization and pruning even at a small accuracy cost, to guarantee inference speed.

2. Balancing cost, performance, and hardware fit
The core logic: hardware varies in how well it supports each optimization method, so play to the hardware's strengths.
- GPU/TPU accelerators suit highly parallel optimizations such as operator fusion and model parallelism.
- Edge devices need lightweight models (pruning, quantization) plus lightweight inference frameworks (e.g. TensorFlow Lite), avoiding heavy computation.
- CPU servers can use multithreading or memory optimizations (e.g. onnxruntime's CPU optimizations).
For instance, on a GPU cluster, TensorRT's operator fusion and graph optimization can markedly speed up inference; on a phone, a quantized model with the MNN framework is the better fit.

3. Holistic decisions in practice
Real deployments weigh several principles at once. Consider an autonomous-driving company deploying a real-time object-detection model on in-vehicle (edge) hardware:
- Scenario fit: choose low-latency optimizations — model pruning and quantization.
- Hardware match: use tooling specific to the in-vehicle chip (e.g. NVIDIA TensorRT for GPU targets).
- Cost balance: distill a lightweight model to reduce dependence on high-end hardware.
- Extensibility: design an adjustable pipeline that can adapt quickly if sensors or algorithms change.

Together these four principles — scenario, hardware, cost, extensibility — give a complete path from requirements to deployment: start from the scenario, apply hardware and cost constraints, and pick a flexible, extensible plan. Decided systematically, a company can deploy efficiently today while leaving technical headroom for future growth.

Future trends in agent orchestration

As LLMs evolve from "a clever answering machine" into "task-oriented executors", multi-agent orchestration is becoming the core engine of the next wave of AI applications. Compared with traditional single-model calls, orchestration is closer to building an embodied intelligent system or a team of "digital employees", chaining perception, reasoning, decision, and execution end to end — a leap from "answering you" to "doing it for you".

The data-analysis system we built already contains proto-agents: DataCleaningAgent, DataQueryAgent, VisualizationAgent, and so on. Though these modules are split by function, each already follows the agent behavior pattern "receive task → understand intent → execute → return result". In more complex future workloads, a single agent often cannot complete a task alone; multiple agents must divide the work, pass information, and jointly complete complex tasks, possibly with some autonomy. For example, in an "enterprise performance analysis" system:

| Agent role | Responsibility |
| --- | --- |
| Coordinator Agent | Understands high-level commands (e.g. "generate the monthly business report") and decomposes them into subtasks |
| Query Agent | Queries the database as needed and returns a DataFrame |
| Viz Agent | Generates chart code and renders the result |
| Insight Agent | Produces summary analysis and trend judgments from the data |
| Report Agent | Assembles charts and text into Markdown/PPT/email formats |
| Monitor Agent | Logs execution, collects user feedback, and improves the flow |

Each agent is a reusable capability unit, and the handoffs between them are scheduled by an agent framework or orchestration platform. To actually ship a multi-agent collaboration system, the following capabilities become unavoidable for a technical team:
- Intent parsing: turn the user's natural language into a structured task chain (AWEL DSL, JSON graphs, etc.);
- Task decomposition and dispatch: use tree or DAG structures, supporting automatic decomposition and concurrent execution;
- Agent capability registration and scheduling: a plugin-like mechanism for registering agents and dispatching calls;
- Memory and context management: let agents share context and pass intermediate variables across multi-turn collaboration;
- Error handling and rollback: business-grade fault tolerance — failure retries, resumable checkpoints;
- Security and permissions: bound each agent's data access, preventing privilege escalation and data leaks.

Tomorrow's AI is no longer just an "answerer" but a proactively working "digital collaborator". From data analysis to knowledge summarization, from task execution to process coordination, we are already seeing multi-agent systems take on new roles on the team.

"Make the system part of the staff, not merely a tool."

This is not only the direction in which data intelligence is evolving; it is the starting point of a new era of intelligent productivity for enterprises. In the next stage, agents will stop being mere function-calling proxies and gradually evolve into digital individuals with reasoning, perception, and social abilities — genuinely embedded in the business, participating in decisions, and continuously learning and growing.

I am taking part in [Case Co-creation] Round 4: application-building practice with Huawei Developer Space + Cangjie/DeepSeek/MCP — https://bbs.huaweicloud.com/forum/thread-02127182415062274055-1-1.html