• [Tech Notes] C++ TensorRT YOLOv8-SAHI High-Performance Deployment Guide
    C++ TensorRT YOLOv8-SAHI High-Performance Deployment Guide

    Project overview
    This project shows how to deploy YOLOv8-SAHI with high performance on embedded devices such as Jetson, with the focus on an optimized Int8 engine. On a Jetson Orin Nano (8GB), image slicing plus batched inference takes less than 0.05 s per frame, and sliced detection with ByteTrack tracking on 1080p video runs at close to 15 FPS.

    # Code repository: https://github.com/HouYanSong/tensorrtx-yolov8-sahi

    Exporting the YOLOv8 Int8 quantized model
    We fix the input image size at 1440x1080 and slice it into 640x640 sub-images with an overlap of more than 20%. Together with the original image, a single inference pass runs detection on 8 images, so we export an Int8-quantized model with BatchSize = 8.

    Generate the yolov8s.wts weight file from yolov8s.pt:

    pip install ultralytics
    python gen_wts.py

    Export the yolov8s.engine engine file (BatchSize = 8) from yolov8s.wts:

    sudo apt install libeigen3-dev
    rm -fr build
    cmake -S . -B build
    cmake --build build
    cd build
    ./yolov8_sahi -s ../weights/yolov8s.wts ../weights/yolov8s.engine s

    Model configuration
    The model configuration file is include/config.h. Here we use the official yolov8s pretrained model, which has a 640x640 input and 80 classes. We set kBatchSize = 8 so that up to 8 images can be inferred in a single pass, and point the calibration image path at the images used when exporting the Int8-quantized model.

    #ifndef CONFIG_H
    #define CONFIG_H

    // #define USE_FP16
    #define USE_INT8

    #include <string>
    #include <vector>

    const static char *kInputTensorName = "images";
    const static char *kOutputTensorName = "output";
    const static int kNumClass = 80;
    const static int kBatchSize = 8;
    const static int kGpuId = 0;
    const static int kInputH = 640;
    const static int kInputW = 640;
    const static float kNmsThresh = 0.55f;
    const static float kConfThresh = 0.45f;
    const static int kMaxInputImageSize = 3000 * 3000;
    const static int kMaxNumOutputBbox = 1000;

    const std::string trtFile = "../weights/yolov8s.engine";
    const std::string cacheFile = "./int8calib.table";
    const std::string calibrationDataPath = "../images/";  // images used for int8 quantization calibration

    const std::vector<std::string> vClassNames {
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
        "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard",
        "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
        "scissors", "teddy bear", "hair drier", "toothbrush"
    };

    #endif  // CONFIG_H

    YOLOv8-SAHI sliced detection
    To verify the accuracy of the quantized model and the performance of batched inference, we run sliced detection with the Int8 engine directly on the calibration images:

    cd build
    ./yolov8_sahi -d ../weights/yolov8s.engine ../images/

    On a Jetson Orin Nano (8GB), the Int8 YOLOv8-SAHI engine performs as follows:

    sample0102.png YOLOv8-SAHI: 1775ms
    sample0206.png YOLOv8-SAHI: 46ms
    sample0121.png YOLOv8-SAHI: 44ms
    sample0058.png YOLOv8-SAHI: 44ms
    sample0070.png YOLOv8-SAHI: 44ms
    sample0324.png YOLOv8-SAHI: 43ms
    sample0122.png YOLOv8-SAHI: 44ms
    sample0086.png YOLOv8-SAHI: 45ms
    sample0124.png YOLOv8-SAHI: 45ms
    sample0230.png YOLOv8-SAHI: 45ms
    ...

    Apart from the first image, which includes one-time initialization overhead, sliced inference on a single image takes about 45 ms (under 0.05 s), which is fast enough for real-time detection.

    YOLOv8-SAHI-ByteTrack video tracking
    We can combine the ByteTrack tracker with sliced detection to process a video file in real time. In the build directory run:

    cd build
    ./yolov8_sahi_track ../media/c3_1080.mp4

    On a Jetson Orin Nano (8GB), YOLOv8-SAHI-ByteTrack performs as follows:

    Total frames: 341
    Init ByteTrack!
    Processing frame 20 (8 fps)
    Processing frame 40 (11 fps)
    Processing frame 60 (12 fps)
    Processing frame 80 (12 fps)
    Processing frame 100 (13 fps)
    Processing frame 120 (13 fps)
    Processing frame 140 (13 fps)
    Processing frame 160 (14 fps)
    Processing frame 180 (14 fps)
    Processing frame 200 (14 fps)
    Processing frame 220 (14 fps)
    Processing frame 240 (14 fps)
    Processing frame 260 (14 fps)
    Processing frame 280 (14 fps)
    Processing frame 300 (14 fps)
    Processing frame 320 (14 fps)
    Processing frame 340 (15 fps)
    FPS: 15

    Sliced detection on 1080p video reaches close to 15 FPS, and ByteTrack tracking quality is very good.

    Summary
    With this project, developers can run efficient YOLOv8 sliced detection and tracking on resource-constrained embedded devices, which is especially useful for edge-computing scenarios that need real-time processing.
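    As a rough illustration of the slicing scheme described above (a 1440x1080 frame cut into 640x640 tiles with more than 20% overlap, plus the full frame, fed to the engine as one batch), here is a minimal Python sketch. The tile-placement logic and helper names are assumptions for illustration only; the repository's actual C++ implementation may lay out the tiles differently.

    import cv2
    import numpy as np

    def slice_boxes(img_w, img_h, tile=640, overlap=0.2):
        # Corners of tiles covering the image with the given overlap ratio.
        step = int(tile * (1 - overlap))
        xs = list(range(0, max(img_w - tile, 0) + 1, step))
        ys = list(range(0, max(img_h - tile, 0) + 1, step))
        if xs[-1] + tile < img_w:      # make sure the right border is covered
            xs.append(img_w - tile)
        if ys[-1] + tile < img_h:      # make sure the bottom border is covered
            ys.append(img_h - tile)
        return [(x, y, x + tile, y + tile) for y in ys for x in xs]

    def build_batch(frame, tile=640, overlap=0.2):
        # Crop every tile, then append the resized full frame as the last batch entry.
        h, w = frame.shape[:2]
        crops = [frame[y1:y2, x1:x2] for (x1, y1, x2, y2) in slice_boxes(w, h, tile, overlap)]
        crops.append(cv2.resize(frame, (tile, tile)))
        return np.stack(crops)

    frame = np.zeros((1080, 1440, 3), dtype=np.uint8)   # stand-in for one 1440x1080 video frame
    batch = build_batch(frame)
    print(batch.shape)   # tile count depends on the step size; the project fixes the batch to 8 images

    Detections from each tile then have to be shifted back by that tile's (x1, y1) offset and merged with NMS, which is the merging step SAHI-style pipelines perform after batched inference.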
  • [Tech Notes] Cat Face Keypoint Detection (ModelBox)
    猫脸关键点检测(ModelBox)一、模型训练与转换ResNet50V2是改进版的深度卷积神经网络,基于 ResNet 架构发展而来。它采用前置激活(将 BN 和 ReLU 移至卷积前)与身份映射,优化了信息传播和模型训练性能。作为 50 层深度的网络,ResNet50V2 广泛应用于图像分类、目标检测等任务,支持迁移学习,适合快速适配新数据集,具有良好的泛化能力和较高准确率。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、应用开发1. 创建工程在ModelBox sdk目录下使用create.bat创建ResNet50V2工程:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n ResNet50V2 ... success: create ResNet50V2 in D:\modelbox-win10-x64-1.5.3\workspacecreate.bat工具的参数中,-t参数,表示所创建实例的类型,包括server(ModelBox工程)、python(Python功能单元)、c++(C++功能单元)、infer(推理功能单元)等;-n参数,表示所创建实例的名称;-s参数,表示将使用后面参数值代表的模板创建工程,而不是创建空的工程。2. 创建推理功能单元在ModelBox sdk目录下使用create.bat创建resnet50v2_infer推理功能单元:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t infer -n resnet50_infer -p ResNet50V2 ... success: create infer resnet50_infer in D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2/model/resnet50_infercreate.bat工具使用时,-t infer即表示创建的是推理功能单元;-n xxx_infer表示创建的功能单元名称为xxx_infer;-p表示所创建的功能单元属于ResNet50V2应用。下载转换好的ResNet50V2.onnx模型到ResNet50V2\model目录下,修改推理功能单元resnet50v2_infer.toml模型的配置文件:# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved. [base] name = "resnet50_infer" device = "cpu" version = "1.0.0" description = "your description" entry = "./ResNet50V2.onnx" # model file path, use relative path type = "inference" virtual_type = "onnx" # inference engine type: win10 now only support onnx group_type = "Inference" # flowunit group attribution, do not change # Input ports description [input] [input.input1] # input port number, Format is input.input[N] name = "Input" # input port name type = "float" # input port data type ,e.g. float or uint8 device = "cpu" # input buffer type: cpu, win10 now copy input from cpu # Output ports description [output] [output.output1] # output port number, Format is output.output[N] name = "Output" # output port name type = "float" # output port data type ,e.g. float or uint83. 创建后处理功能单元在ModelBox sdk目录下使用create.bat创建resnet50v2_post后处理功能单元:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t python -n resnet50v2_post -p ResNet50V2 ... success: create python resnet50v2_post in D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2/etc/flowunit/resnet50v2_postcreate.bat工具使用时,-t python即表示创建的是通用功能单元;-n xxx_post表示创建的功能单元名称为xxx_post;-p表示所创建的功能单元属于ResNet50V2应用。a. 修改配置文件我们的模型有一个输入和输出,总共包含猫脸的9个关键点:# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. # Basic config [base] name = "resnet50v2_post" # The FlowUnit name device = "cpu" # The flowunit runs on cpu version = "1.0.0" # The version of the flowunit type = "python" # Fixed value, do not change description = "description" # The description of the flowunit entry = "resnet50v2_post@resnet50v2_postFlowUnit" # Python flowunit entry function group_type = "Generic" # flowunit group attribution, change as Input/Output/Image/Generic ... 
# Flowunit Type stream = false # Whether the flowunit is a stream flowunit condition = false # Whether the flowunit is a condition flowunit collapse = false # Whether the flowunit is a collapse flowunit collapse_all = false # Whether the flowunit will collapse all the data expand = false # Whether the flowunit is a expand flowunit # The default Flowunit config [config] keypoints = 9 # Input ports description [input] [input.input1] # Input port number, the format is input.input[N] name = "in_feat" # Input port name type = "float" # Input port type # Output ports description [output] [output.output1] # Output port number, the format is output.output[N] name = "out_data" # Output port name type = "string" # Output port typeb. 修改逻辑代码# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. #!/usr/bin/env python # -*- coding: utf-8 -*- import _flowunit as modelbox import numpy as np import json class resnet50v2_postFlowUnit(modelbox.FlowUnit): # Derived from modelbox.FlowUnit def __init__(self): super().__init__() def open(self, config): # Open the flowunit to obtain configuration information self.params = {} self.params['keypoints'] = config.get_int('keypoints') return modelbox.Status.StatusCode.STATUS_SUCCESS def process(self, data_context): # Process the data in_data = data_context.input("in_feat") out_data = data_context.output("out_data") # resnet50v2_post process code. # Remove the following code and add your own code here. for buffer_feat in in_data: feat_data = np.array(buffer_feat.as_object(), copy=False) keypoints = feat_data.reshape(-1, 2).tolist() result = {"keypoints": keypoints} result_str = json.dumps(result) out_buffer = modelbox.Buffer(self.get_bind_device(), result_str) out_data.push_back(out_buffer) return modelbox.Status.StatusCode.STATUS_SUCCESS def close(self): # Close the flowunit return modelbox.Status() def data_pre(self, data_context): # Before streaming data starts return modelbox.Status() def data_post(self, data_context): # After streaming data ends return modelbox.Status() def data_group_pre(self, data_context): # Before all streaming data starts return modelbox.Status() def data_group_post(self, data_context): # After all streaming data ends return modelbox.Status() 4. 修改应用的流程图ResNet50V2工程graph目录下存放流程图,默认的流程图ResNet50V2.toml与工程同名:# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved. 
[driver] dir = ["${HILENS_APP_ROOT}/etc/flowunit", "${HILENS_APP_ROOT}/etc/flowunit/cpp", "${HILENS_APP_ROOT}/model", "${HILENS_MB_SDK_PATH}/flowunit"] skip-default = true [profile] profile=false trace=false dir="${HILENS_DATA_DIR}/mb_profile" [graph] format = "graphviz" graphconf = """digraph ResNet50V2 { node [shape=Mrecord] queue_size = 4 batch_size = 1 input1[type=input,flowunit=input,device=cpu,deviceid=0] httpserver_sync_receive[type=flowunit, flowunit=httpserver_sync_receive_v2, device=cpu, deviceid=0, time_out_ms=5000, endpoint="http://0.0.0.0:1234/v1/ResNet50V2", max_requests=100] image_decoder[type=flowunit, flowunit=image_decoder, device=cpu, key="image_base64", queue_size=4] image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=224, image_height=224] normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="0.003921568627450,0.003921568627450,0.003921568627450"] resnet50v2_infer[type=flowunit, flowunit=resnet50v2_infer, device=cpu, deviceid=0, batch_size=1] resnet50v2_post[type=flowunit, flowunit=resnet50v2_post, device=cpu, deviceid=0] httpserver_sync_reply[type=flowunit, flowunit=httpserver_sync_reply_v2, device=cpu, deviceid=0] input1:input -> httpserver_sync_receive:in_url httpserver_sync_receive:out_request_info -> image_decoder:in_encoded_image image_decoder:out_image -> image_resize:in_image image_resize:out_image -> normalize:in_data normalize:out_data -> resnet50v2_infer:Input resnet50v2_infer:Output -> resnet50v2_post:in_feat resnet50v2_post:out_data -> httpserver_sync_reply:in_reply_info }""" [flow] desc = "ResNet50V2 run in modelbox-win10-x64" 在命令行中运行.\create.bat -t editor即可打开ModelBox图编排界面,可以实时修改并查看项目的流程图:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor5. 运行应用在ResNet50V2工程目录下执行.\bin\main.bat运行应用:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2 PS D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2> .\bin\main.bat在ResNet50V2工程data目录下新建test_http.py测试脚本:#!/usr/bin/env python # -*- coding: utf-8 -*- # Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. 
import os import cv2 import json import base64 import http.client class HttpConfig: '''http调用的参数配置''' def __init__(self, host_ip, port, url, img_base64_str): self.hostIP = host_ip self.Port = port self.httpMethod = "POST" self.requstURL = url self.headerdata = { "Content-Type": "application/json" } self.test_data = { "image_base64": img_base64_str } self.body = json.dumps(self.test_data) def read_image(img_path): '''读取图片数据并转为base64编码的字符串''' img_data = cv2.imread(img_path) img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB) img_str = cv2.imencode('.jpg', img_data)[1].tobytes() img_bin = base64.b64encode(img_str) img_base64_str = str(img_bin, encoding='utf8') return img_data, img_base64_str def test_image(img_path, ip, port, url): '''单张图片测试''' img_data, img_base64_str = read_image(img_path) http_config = HttpConfig(ip, port, url, img_base64_str) conn = http.client.HTTPConnection(host=http_config.hostIP, port=http_config.Port) conn.request(method=http_config.httpMethod, url=http_config.requstURL, body=http_config.body, headers=http_config.headerdata) response = conn.getresponse().read().decode() print('response: ', response) result = json.loads(response) w, h = img_data.shape[1], img_data.shape[0] for x, y in result["keypoints"]: if x > 0 and y > 0: cv2.circle(img_data, (int(x * w), int(y * h)), 5, (0, 255, 0), -1) cv2.imwrite('./result-' + os.path.basename(img_path), img_data[..., ::-1]) if __name__ == "__main__": port = 1234 ip = "127.0.0.1" url = "/v1/ResNet50V2" img_folder = './test_imgs' file_list = os.listdir(img_folder) for img_file in file_list: print("\n================ {} ================".format(img_file)) img_path = os.path.join(img_folder, img_file) test_image(img_path, ip, port, url) 在ResNet50V2工程data目录下新建test_imgs文件夹存放测试图片:在另一个终端中进入ResNet50V2工程目录data文件夹下运行test_http.py脚本发起HTTP请求测试:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2\data PS D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe .\test_http.py ================ 2256.jpg ================ response: {"keypoints": [[0.19147011637687683, 0.26770520210266113], [0.29639703035354614, 0.26533427834510803], [0.24554343521595, 0.35762542486190796], [0.11009970307350159, 0.2090619057416916], [0.08408773690462112, 0.09547536075115204], [0.17451311647891998, 0.169035404920578], [0.2880205512046814, 0.168979212641716], [0.3739408254623413, 0.0717596635222435], [0.34669068455696106, 0.20229394733905792]]} ================ 6899.jpg ================ response: {"keypoints": [[0.3829421401023865, 0.41393953561782837], [0.47102952003479004, 0.42683106660842896], [0.4321300983428955, 0.5082458853721619], [0.3185971677303314, 0.36286458373069763], [0.33502572774887085, 0.2243150770664215], [0.3852037489414215, 0.29658034443855286], [0.4819968640804291, 0.30954840779304504], [0.5504774451255798, 0.2711380124092102], [0.5290539264678955, 0.3962092399597168]]} 在ResNet50V2工程data目录下即可查看测试图片的推理结果:三、小结本节介绍了如何使用ModelArts和ModelBox训练开发一个ResNet50V2猫脸关键点检测的AI应用,我们只需要准备模型文件以及简单的配置即可创建一个HTTP服务。同时我们可以了解到ResNet50V2网络的基本结构、数据处理和模型训练方法,以及对应推理应用的逻辑。----转自博客:https://bbs.huaweicloud.com/blogs/451999
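    A quick way to double-check that the names and shapes used in resnet50v2_infer.toml (Input/Output, float, 224x224 input) really match the exported model is to inspect the ONNX file with onnxruntime before wiring up the flow graph. This is a generic sketch, not part of the ModelBox tooling; the file name follows the article:

    import onnxruntime as ort

    # Load the converted model and list its inputs/outputs.
    session = ort.InferenceSession("ResNet50V2.onnx", providers=["CPUExecutionProvider"])
    for tensor in session.get_inputs():
        print("input :", tensor.name, tensor.shape, tensor.type)
    for tensor in session.get_outputs():
        print("output:", tensor.name, tensor.shape, tensor.type)
    # The names printed here are what the [input]/[output] sections of the infer toml and the
    # graph connections (e.g. normalize:out_data -> resnet50v2_infer:Input) must agree with.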
  • [Tech Notes] Fruit and Vegetable Disease and Pest Segmentation (ModelBox)
    果蔬病虫害分割(ModelBox)一、模型训练与转换FCN(全卷积网络,Fully Convolutional Networks)是用于语义分割任务的一种深度学习模型架构,引入了跳跃结构(Skip Architecture),通过融合浅层和深层的特征图,保留更多的细节信息,提升分割精度。此外,FCN还利用多尺度上下文聚合,捕捉不同层级的特征,增强了对不同大小目标的识别能力。FCN的成功推动了语义分割领域的发展,成为后续许多先进模型的基础。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、应用开发1. 创建工程在ModelBox sdk目录下使用create.bat创建FCN工程:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n FCN ... success: create FCN in D:\modelbox-win10-x64-1.5.3\workspacecreate.bat工具的参数中,-t参数,表示所创建实例的类型,包括server(ModelBox工程)、python(Python功能单元)、c++(C++功能单元)、infer(推理功能单元)等;-n参数,表示所创建实例的名称;-s参数,表示将使用后面参数值代表的模板创建工程,而不是创建空的工程。2. 创建推理功能单元在ModelBox sdk目录下使用create.bat创建fcn_infer推理功能单元:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t infer -n fcn_infer -p FCN ... success: create infer fcn_infer in D:\modelbox-win10-x64-1.5.3\workspace\FCN/model/fcn_infercreate.bat工具使用时,-t infer即表示创建的是推理功能单元;-n xxx_infer表示创建的功能单元名称为xxx_infer;-p表示所创建的功能单元属于FCN应用。下载转换好的FCN.onnx模型到FCN\model目录下,修改推理功能单元fcn_infer.toml模型的配置文件:# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved. [base] name = "fcn_infer" device = "cpu" version = "1.0.0" description = "your description" entry = "./FCN.onnx" # model file path, use relative path type = "inference" virtual_type = "onnx" # inference engine type: win10 now only support onnx group_type = "Inference" # flowunit group attribution, do not change # Input ports description [input] [input.input1] # input port number, Format is input.input[N] name = "Input" # input port name type = "float" # input port data type ,e.g. float or uint8 device = "cpu" # input buffer type: cpu, win10 now copy input from cpu # Output ports description [output] [output.output1] # output port number, Format is output.output[N] name = "Output" # output port name type = "float" # output port data type ,e.g. float or uint83. 创建后处理功能单元在ModelBox sdk目录下使用create.bat创建fcn_post后处理功能单元:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t python -n fcn_post -p FCN ... success: create python fcn_post in D:\modelbox-win10-x64-1.5.3\workspace\FCN/etc/flowunit/fcn_postcreate.bat工具使用时,-t python即表示创建的是通用功能单元;-n xxx_post表示创建的功能单元名称为xxx_post;-p表示所创建的功能单元属于FCN应用。a. 修改配置文件我们的模型有一个输入和输出,对116种果蔬病虫害进行分割,加上背景总共是117类:# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. # Basic config [base] name = "fcn_post" # The FlowUnit name device = "cpu" # The flowunit runs on cpu version = "1.0.0" # The version of the flowunit type = "python" # Fixed value, do not change description = "description" # The description of the flowunit entry = "fcn_post@fcn_postFlowUnit" # Python flowunit entry function group_type = "Generic" # flowunit group attribution, change as Input/Output/Image/Generic ... 
# Flowunit Type stream = false # Whether the flowunit is a stream flowunit condition = false # Whether the flowunit is a condition flowunit collapse = false # Whether the flowunit is a collapse flowunit collapse_all = false # Whether the flowunit will collapse all the data expand = false # Whether the flowunit is a expand flowunit # The default Flowunit config [config] num_classes = 117 net_w = 224 net_h = 224 # Input ports description [input] [input.input1] # Input port number, the format is input.input[N] name = "in_image" # Input port name type = "uint8" # Input port type [input.input2] # Input port number, the format is input.input[N] name = "in_feat" # Input port name type = "float" # Input port type # Output ports description [output] [output.output1] # Output port number, the format is output.output[N] name = "out_image" # Output port name type = "uint8" # Output port typeb. 修改逻辑代码# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. #!/usr/bin/env python # -*- coding: utf-8 -*- import _flowunit as modelbox import numpy as np import cv2 class fcn_postFlowUnit(modelbox.FlowUnit): # Derived from modelbox.FlowUnit def __init__(self): super().__init__() def open(self, config): # Open the flowunit to obtain configuration information self.params = {} self.params['num_classes'] = config.get_int('num_classes') self.params['net_w'] = config.get_int('net_w') self.params['net_h'] = config.get_int('net_h') return modelbox.Status.StatusCode.STATUS_SUCCESS def process(self, data_context): # Process the data in_image = data_context.input("in_image") in_feat = data_context.input("in_feat") out_image = data_context.output("out_image") # fcn_post process code. # Remove the following code and add your own code here. for buffer_image, buffer_feat in zip(in_image, in_feat): channel = buffer_image.get('channel') width = buffer_image.get('width') height = buffer_image.get('height') image = np.array(buffer_image.as_object(), dtype=np.uint8, copy=False) image = image.reshape(height, width, channel) feat = np.array(buffer_feat.as_object(), dtype=np.float32, copy=False) feat = feat.reshape(self.params['net_h'], self.params['net_w'], self.params['num_classes']) mask = np.argmax(feat, axis=-1).astype(np.uint8) mask = cv2.resize(mask, (width, height), interpolation=cv2.INTER_NEAREST) overlay = np.zeros_like(image) for i in range(1, self.params['num_classes']): color = np.random.randint(0, 255, (3,)).tolist() overlay[mask==i] = color result_image = cv2.addWeighted(image[..., ::-1], 0.5, overlay, 0.5, 0) add_buffer = modelbox.Buffer(self.get_bind_device(), result_image) add_buffer.copy_meta(buffer_image) out_image.push_back(add_buffer) return modelbox.Status.StatusCode.STATUS_SUCCESS def close(self): # Close the flowunit return modelbox.Status() def data_pre(self, data_context): # Before streaming data starts return modelbox.Status() def data_post(self, data_context): # After streaming data ends return modelbox.Status() def data_group_pre(self, data_context): # Before all streaming data starts return modelbox.Status() def data_group_post(self, data_context): # After all streaming data ends return modelbox.Status() 4. 修改应用的流程图FCN工程graph目录下存放流程图,默认的流程图FCN.toml与工程同名:# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved. 
[driver] dir = ["${HILENS_APP_ROOT}/etc/flowunit", "${HILENS_APP_ROOT}/etc/flowunit/cpp", "${HILENS_APP_ROOT}/model", "${HILENS_MB_SDK_PATH}/flowunit"] skip-default = true [profile] profile=false trace=false dir="${HILENS_DATA_DIR}/mb_profile" [graph] format = "graphviz" graphconf = """digraph FCN { node [shape=Mrecord] queue_size = 4 batch_size = 1 input1[type=input,flowunit=input,device=cpu,deviceid=0] httpserver_sync_receive[type=flowunit, flowunit=httpserver_sync_receive_v2, device=cpu, deviceid=0, time_out_ms=5000, endpoint="http://0.0.0.0:1234/v1/FCN", max_requests=100] image_decoder[type=flowunit, flowunit=image_decoder, device=cpu, key="image_base64", queue_size=4] image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=224, image_height=224] normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="0.003921568627450,0.003921568627450,0.003921568627450"] fcn_infer[type=flowunit, flowunit=fcn_infer, device=cpu, deviceid=0, batch_size=1] fcn_post[type=flowunit, flowunit=fcn_post, device=cpu, deviceid=0] httpserver_sync_reply[type=flowunit, flowunit=httpserver_sync_reply_v2, device=cpu, deviceid=0] input1:input -> httpserver_sync_receive:in_url httpserver_sync_receive:out_request_info -> image_decoder:in_encoded_image image_decoder:out_image -> image_resize:in_image image_resize:out_image -> normalize:in_data normalize:out_data -> fcn_infer:Input image_decoder:out_image -> fcn_post:in_image fcn_infer:Output -> fcn_post:in_feat fcn_post:out_image -> httpserver_sync_reply:in_reply_info }""" [flow] desc = "FCN run in modelbox-win10-x64" 在命令行中运行.\create.bat -t editor即可打开ModelBox图编排界面,可以实时修改并查看项目的流程图:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor5. 运行应用在FCN工程目录下执行.\bin\main.bat运行应用:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\FCN PS D:\modelbox-win10-x64-1.5.3\workspace\FCN> .\bin\main.bat在FCN工程data目录下新建test_http.py测试脚本:import cv2 import json import base64 import requests import numpy as np if __name__ == "__main__": port = 1234 ip = "127.0.0.1" url = "/v1/FCN" img_path = "apple_black_rot_google_0056.jpg" img_data = cv2.imread(img_path) img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB) img_str = cv2.imencode('.jpg', img_data)[1].tobytes() img = base64.b64encode(img_str) img_base64_str = str(img, encoding='utf8') params = {"image_base64": img_base64_str} response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params), headers={"Content-Type": "application/json"}) h, w, c = img_data.shape img_array = np.frombuffer(response.content, np.uint8) img_array = img_array.reshape((h, -1, c)) cv2.imwrite("res.jpg", img_array) 在FCN工程data目录下存放测试图片:在另一个终端中进入FCN工程目录data文件夹下:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\FCN\data首先安装requests依赖包:PS D:\modelbox-win10-x64-1.5.3\workspace\FCN\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe -m pip install requests然后运行test_http.py脚本发起HTTP请求测试:PS D:\modelbox-win10-x64-1.5.3\workspace\FCN\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe .\test_http.py测试图片的分割结果res.jpg将保存在FCN工程data目录下:三、小结本节介绍了如何使用ModelArts和ModelBox训练开发一个FCN果蔬病虫害分割的AI应用,我们只需要准备模型文件以及简单的配置即可创建一个HTTP服务。同时我们可以了解到FCN网络的基本结构、数据处理和模型训练方法,以及对应推理应用的逻辑。----转自博客:https://bbs.huaweicloud.com/blogs/449045
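    One detail of the fcn_post unit above: it draws each class with a freshly generated random color on every request, so the same class can change color between images. If stable colors are preferred, a palette can be generated once (for example in open()) with a fixed seed; a minimal sketch, where the seed value is an arbitrary choice rather than something from the original article:

    import numpy as np

    def build_palette(num_classes: int, seed: int = 0) -> np.ndarray:
        # Fixed seed -> the same color per class across requests; index 0 stays black for background.
        rng = np.random.default_rng(seed)
        palette = rng.integers(0, 255, size=(num_classes, 3), dtype=np.uint8)
        palette[0] = 0
        return palette

    palette = build_palette(117)
    # In process(), replace the per-class random color with: overlay[mask == i] = palette[i]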
  • [Tech Notes] Deep-Sea Fish Detection (ModelBox)
    深海鱼类检测(ModelBox)一、模型训练和转换YOLOX是YOLO系列的优化版本,引入了解耦头、数据增强、无锚点以及标签分类等目标检测领域的优秀进展,拥有较好的精度表现,同时对工程部署友好。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、ModelBox 应用开发1. 创建工程在ModelBox sdk目录下使用create.bat创建fish_det工程:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n fish_det -s car_det ... success: create fish_det in D:\modelbox-win10-x64-1.5.3\workspacecreate.bat工具的参数中,-t参数,表示所创建实例的类型,包括server(ModelBox工程)、python(Python功能单元)、c++(C++功能单元)、infer(推理功能单元)等;-n参数,表示所创建实例的名称;-s参数,表示将使用后面参数值代表的模板创建工程,而不是创建空的工程。2. 修改推理功能单元下载转换好的yolox_fish.onnx模型到fish_det\model目录下,修改推理功能单元yolox_infer.toml模型的配置文件:# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. [base] name = "yolox_infer" device = "cpu" version = "1.0.0" description = "fish detection" entry = "./yolox_fish.onnx" # model file path, use relative path type = "inference" virtual_type = "onnx" # inference engine type: win10 now only support onnx group_type = "Inference" # flowunit group attribution, do not change # input port description, suporrt multiple input ports [input] [input.input1] name = "input" type = "float" device = "cpu" # output port description, suporrt multiple output ports [output] [output.output1] name = "output" type = "float" 3. 修改后处理功能单元我们的模型的输入大小为320,类别数量是1,修改fish_det\etc\flowunit\yolox_post目录下的yolox_post.toml配置文件:# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. # Basic config [base] name = "yolox_post" # The FlowUnit name device = "cpu" # The device the flowunit runs on,cpu,cuda,ascend。 version = "1.0.0" # The version of the flowunit description = "description" # The description of the flowunit entry = "yolox_post@yolox_postFlowUnit" # Python flowunit entry function type = "python" # Fixed value group_type = "Generic" # flowunit group attribution, change as Input/Output/Image/Generic ... # Flowunit Type stream = false # Whether the flowunit is a stream flowunit condition = false # Whether the flowunit is a condition flowunit collapse = false # Whether the flowunit is a collapse flowunit collapse_all = false # Whether the flowunit will collapse all the data expand = false # Whether the flowunit is a expand flowunit [config] net_h = 320 net_w = 320 num_classes = 1 strides = ['8', '16', '32'] conf_threshold = 0.25 iou_threshold = 0.45 [input] [input.input1] name = "in_feat" type = "float" [output] [output.output1] name = "out_data" type = "string" 4. 修改绘图功能单元我们这里只有一个类别,所以修改coco_car_labels = [0]只检测鱼这个类别:... def decode_car_bboxes(self, bbox_str, input_shape): try: coco_car_labels = [0] # fish det_result = json.loads(bbox_str)['det_result'] if (det_result == "None"): return [] bboxes = json.loads(det_result) car_bboxes = list(filter(lambda x: int(x[5]) in coco_car_labels, bboxes)) except Exception as ex: modelbox.error(str(ex)) return [] else: for bbox in car_bboxes: bbox[0] = int(bbox[0] * input_shape[1]) bbox[1] = int(bbox[1] * input_shape[0]) bbox[2] = int(bbox[2] * input_shape[1]) bbox[3] = int(bbox[3] * input_shape[0]) return car_bboxes ... 5. 修改应用的流程图修改image_resize图像预处理功能单元参数image_width=320, image_height=320与模型的输入大小保持一致:# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. 
[driver] dir = ["${HILENS_APP_ROOT}/etc/flowunit", "${HILENS_APP_ROOT}/etc/flowunit/cpp", "${HILENS_APP_ROOT}/model", "${HILENS_MB_SDK_PATH}/flowunit"] skip-default = true [profile] profile=false trace=false dir="${HILENS_DATA_DIR}/mb_profile" [graph] format = "graphviz" graphconf = """digraph fish_det { node [shape=Mrecord] queue_size = 1 batch_size = 1 input1[type=input,flowunit=input,device=cpu,deviceid=0] data_source_parser[type=flowunit, flowunit=data_source_parser, device=cpu, deviceid=0] video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu, deviceid=0] video_decoder[type=flowunit, flowunit=video_decoder, device=cpu, deviceid=0, pix_fmt=bgr] image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=320, image_height=320] image_transpose[type=flowunit, flowunit=packed_planar_transpose, device=cpu, deviceid=0] normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="1,1,1"] car_detection[type=flowunit, flowunit=yolox_infer, device=cpu, deviceid=0, batch_size = 1] yolox_post[type=flowunit, flowunit=yolox_post, device=cpu, deviceid=0] draw_car_bbox[type=flowunit, flowunit=draw_car_bbox, device=cpu, deviceid=0] video_out[type=flowunit, flowunit=video_out, device=cpu, deviceid=0] input1:input -> data_source_parser:in_data data_source_parser:out_video_url -> video_demuxer:in_video_url video_demuxer:out_video_packet -> video_decoder:in_video_packet video_decoder:out_video_frame -> image_resize:in_image image_resize:out_image -> image_transpose:in_image image_transpose:out_image -> normalize:in_data normalize:out_data -> car_detection:input car_detection:output -> yolox_post:in_feat video_decoder:out_video_frame -> draw_car_bbox:in_image yolox_post:out_data -> draw_car_bbox:in_bbox draw_car_bbox:out_image -> video_out:in_video_frame }""" [flow] desc = "fish_det run in modelbox-win10-x64" 在命令行中运行.\create.bat -t editor即可打开ModelBox图编排界面,可以实时修改并查看项目的流程图:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor6. 配置应用的输入输出下载测试视频到fish_det\data目录下,修改应用fish_det\bin\mock_task.toml配置文件:# 用于本地mock文件读取任务,脚本中已经配置了IVA_SVC_CONFIG环境变量, 添加了此文件路径 ########### 请确定使用linux的路径类型,比如在windows上要用 D:/xxx/xxx 不能用D:\xxx\xxx ########### # 任务的参数为一个压缩并转义后的json字符串 # 直接写需要转义双引号, 也可以用 content_file 添加一个json文件 [common] content = "{\"param_str\":\"string param\",\"param_int\":10,\"param_float\":10.5}" # 任务输入配置,mock模拟目前仅支持一路rtsp或者本地url, 当前支持以下几种输入方式: # 1. rtsp摄像头或rtsp视频流:type="rtsp", url="rtsp://xxx.xxx" (type为rtsp的时候,支持视频中断自动重连) # 2. 设备自带摄像头或者USB摄像头:type="url",url="摄像头编号,比如 0 或者 1 等" (需配合local_camera功能单元使用) # 3. 本地视频文件:type="url",url="视频文件路径" (可以是相对路径 -- 相对这个mock_task.toml文件, 也支持从环境变量${HILENS_APP_ROOT}所在目录文件输入) # 4. http服务:type="url", url="http://xxx.xxx"(指的是任务作为http服务启动,此处需填写对外暴露的http服务地址,需配合httpserver类的功能单元使用) [input] type = "url" url = "${HILENS_APP_ROOT}/data/Test_ROV_video_h264_decim.mp4" # 任务输出配置,当前支持以下几种输出方式: # 1. rtsp视频流:type="local", url="rtsp://xxx.xxx" # 2. 本地屏幕:type="local", url="0:xxx" (设备需要接显示器,系统需要安装桌面) # 3. 本地视频文件:type="local",url="视频文件路径" (可以是相对路径——相对这个mock_task.toml文件, 也支持输出到环境变量${HILENS_DATA_DIR}所在目录或子目录) # 4. http服务:type="webhook", url="http://xxx.xxx" (指的是任务产生的数据上报给某个http服务,此处需填写上传的http服务地址) [output] type = "local" url = "0" 7. 
运行应用在fish_det工程目录下执行.\bin\main.bat运行应用,本地屏幕上会自动弹出鱼群的实时检测画面:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\fish_det PS D:\modelbox-win10-x64-1.5.3\workspace\fish_det> .\bin\main.bat三、小结本节介绍了如何使用ModelArts和ModelBox训练开发一个YOLOX鱼类目标检测的AI应用,我们只需要准备模型并配置对应的toml文件,即可快速实现模型的高效推理和部署。 ----转自博客:https://bbs.huaweicloud.com/blogs/449038
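    For reference, the decode step that a YOLOX post-processing unit performs with net_h = net_w = 320 and strides 8/16/32 (anchor-free, decoupled head) follows the official YOLOX demo_postprocess logic: grid offsets are added to the predicted centers and the width/height channels are exponentiated, both scaled by the stride. A simplified NumPy sketch of that decode is shown below; the flowunit's exact implementation may differ:

    import numpy as np

    def yolox_decode(raw, net_size=320, strides=(8, 16, 32)):
        # raw: (num_anchors, 5 + num_classes) network output, anchors ordered stride by stride.
        # For a 320x320 input this gives 40*40 + 20*20 + 10*10 = 2100 anchors.
        grids, expanded_strides = [], []
        for s in strides:
            cells = net_size // s
            xv, yv = np.meshgrid(np.arange(cells), np.arange(cells))
            grid = np.stack((xv, yv), axis=2).reshape(-1, 2)
            grids.append(grid)
            expanded_strides.append(np.full((grid.shape[0], 1), s))
        grids = np.concatenate(grids, axis=0)
        expanded_strides = np.concatenate(expanded_strides, axis=0)

        out = raw.copy()
        out[:, :2] = (out[:, :2] + grids) * expanded_strides    # box centers in input pixels
        out[:, 2:4] = np.exp(out[:, 2:4]) * expanded_strides    # box width/height in input pixels
        return out

    The decoded boxes are still in the 320x320 network space; they are then filtered by confidence, run through NMS, and converted to the normalized coordinates that draw_car_bbox scales back to the original frame size.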
  • [Tech Notes] Animal Classification (ModelBox)
    动物分类(ModelBox)一、模型训练与转换Inception V3,GoogLeNet的改进版本,采用InceptionModule和全局平均池化层,v3一个最重要的改进是分解(Factorization),将7x7分解成两个一维的卷积(1x7,7x1),3x3也是一样(1x3,3x1),这样的好处,既可以加速计算(多余的计算能力可以用来加深网络),又可以将1个conv拆成2个conv,使得网络深度进一步增加,增加了网络的非线性。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、ModelBox 应用开发1. 创建工程在ModelBox sdk目录下使用create.bat创建InceptionV3工程:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n InceptionV3 ... success: create InceptionV3 in D:\modelbox-win10-x64-1.5.3\workspacecreate.bat工具的参数中,-t参数,表示所创建实例的类型,包括server(ModelBox工程)、python(Python功能单元)、c++(C++功能单元)、infer(推理功能单元)等;-n参数,表示所创建实例的名称;-s参数,表示将使用后面参数值代表的模板创建工程,而不是创建空的工程。2. 创建推理功能单元在ModelBox sdk目录下使用create.bat创建inceptionv3_infer推理功能单元:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t infer -n inceptionv3_infer -p InceptionV3 ... success: create infer inceptionv3_infer in D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3/model/inceptionv3_infercreate.bat工具使用时,-t infer即表示创建的是推理功能单元;-n xxx_infer表示创建的功能单元名称为xxx_infer;-p表示所创建的功能单元属于InceptionV3应用。下载转换好的InceptionV3.onnx模型到InceptionV3\model目录下,修改推理功能单元inceptionv3_infer.toml模型的配置文件:# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved. [base] name = "inceptionv3_infer" device = "cpu" version = "1.0.0" description = "your description" entry = "./InceptionV3.onnx" # model file path, use relative path type = "inference" virtual_type = "onnx" # inference engine type: win10 now only support onnx group_type = "Inference" # flowunit group attribution, do not change # Input ports description [input] [input.input1] # input port number, Format is input.input[N] name = "Input" # input port name type = "float" # input port data type ,e.g. float or uint8 device = "cpu" # input buffer type: cpu, win10 now copy input from cpu # Output ports description [output] [output.output1] # output port number, Format is output.output[N] name = "Output" # output port name type = "float" # output port data type ,e.g. float or uint83. 创建后处理功能单元在ModelBox sdk目录下使用create.bat创建inceptionv3_post后处理功能单元:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t python -n inceptionv3_post -p InceptionV3 ... success: create python inceptionv3_post in D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3/etc/flowunit/inceptionv3_postcreate.bat工具使用时,-t python即表示创建的是通用功能单元;-n xxx_post表示创建的功能单元名称为xxx_post;-p表示所创建的功能单元属于InceptionV3应用。a. 修改配置文件我们的模型有一个输入和输出,总共包含90种动物类别:# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. # Basic config [base] name = "inceptionv3_post" # The FlowUnit name device = "cpu" # The flowunit runs on cpu version = "1.0.0" # The version of the flowunit type = "python" # Fixed value, do not change description = "description" # The description of the flowunit entry = "inceptionv3_post@inceptionv3_postFlowUnit" # Python flowunit entry function group_type = "Generic" # flowunit group attribution, change as Input/Output/Image/Generic ... 
# Flowunit Type stream = false # Whether the flowunit is a stream flowunit condition = false # Whether the flowunit is a condition flowunit collapse = false # Whether the flowunit is a collapse flowunit collapse_all = false # Whether the flowunit will collapse all the data expand = false # Whether the flowunit is a expand flowunit # The default Flowunit config [config] num_classes = 90 # Input ports description [input] [input.input1] # Input port number, the format is input.input[N] name = "in_feat" # Input port name type = "float" # Input port type # Output ports description [output] [output.output1] # Output port number, the format is output.output[N] name = "out_data" # Output port name type = "string" # Output port typeb. 修改逻辑代码# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. #!/usr/bin/env python # -*- coding: utf-8 -*- import _flowunit as modelbox import numpy as np import json class inceptionv3_postFlowUnit(modelbox.FlowUnit): # Derived from modelbox.FlowUnit def __init__(self): super().__init__() def open(self, config): # Open the flowunit to obtain configuration information self.params = {} self.params['num_classes'] = config.get_int('num_classes') return modelbox.Status.StatusCode.STATUS_SUCCESS def process(self, data_context): # Process the data in_feat = data_context.input("in_feat") out_data = data_context.output("out_data") # inceptionv3_post process code. # Remove the following code and add your own code here. for buffer_feat in in_feat: feat_data = np.array(buffer_feat.as_object(), copy=False) clsse = np.argmax(feat_data).astype(np.int32).item() score = feat_data[clsse].astype(np.float32).item() result = {"clsse": clsse, "score":score} result_str = json.dumps(result) out_buffer = modelbox.Buffer(self.get_bind_device(), result_str) out_data.push_back(out_buffer) return modelbox.Status.StatusCode.STATUS_SUCCESS def close(self): # Close the flowunit return modelbox.Status() def data_pre(self, data_context): # Before streaming data starts return modelbox.Status() def data_post(self, data_context): # After streaming data ends return modelbox.Status() def data_group_pre(self, data_context): # Before all streaming data starts return modelbox.Status() def data_group_post(self, data_context): # After all streaming data ends return modelbox.Status() 4. 修改应用的流程图InceptionV3工程graph目录下存放流程图,默认的流程图InceptionV3.toml与工程同名:# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved. 
[driver] dir = ["${HILENS_APP_ROOT}/etc/flowunit", "${HILENS_APP_ROOT}/etc/flowunit/cpp", "${HILENS_APP_ROOT}/model", "${HILENS_MB_SDK_PATH}/flowunit"] skip-default = true [profile] profile=false trace=false dir="${HILENS_DATA_DIR}/mb_profile" [graph] format = "graphviz" graphconf = """digraph InceptionV3 { node [shape=Mrecord] queue_size = 4 batch_size = 1 input1[type=input,flowunit=input,device=cpu,deviceid=0] httpserver_sync_receive[type=flowunit, flowunit=httpserver_sync_receive_v2, device=cpu, deviceid=0, time_out_ms=5000, endpoint="http://0.0.0.0:1234/v1/InceptionV3", max_requests=100] image_decoder[type=flowunit, flowunit=image_decoder, device=cpu, key="image_base64", queue_size=4] image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=224, image_height=224] normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="0.003921568627450,0.003921568627450,0.003921568627450"] inceptionv3_infer[type=flowunit, flowunit=inceptionv3_infer, device=cpu, deviceid=0, batch_size=1] inceptionv3_post[type=flowunit, flowunit=inceptionv3_post, device=cpu, deviceid=0] httpserver_sync_reply[type=flowunit, flowunit=httpserver_sync_reply_v2, device=cpu, deviceid=0] input1:input -> httpserver_sync_receive:in_url httpserver_sync_receive:out_request_info -> image_decoder:in_encoded_image image_decoder:out_image -> image_resize:in_image image_resize:out_image -> normalize:in_data normalize:out_data -> inceptionv3_infer:Input inceptionv3_infer:Output -> inceptionv3_post:in_feat inceptionv3_post:out_data -> httpserver_sync_reply:in_reply_info }""" [flow] desc = "InceptionV3 run in modelbox-win10-x64" 在命令行中运行.\create.bat -t editor即可打开ModelBox图编排界面,可以实时修改并查看项目的流程图:PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor5. 运行应用在InceptionV3工程目录下执行.\bin\main.bat运行应用:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3 PS D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3> .\bin\main.bat在InceptionV3工程data目录下新建test_http.py测试脚本:#!/usr/bin/env python # -*- coding: utf-8 -*- # Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved. 
import os import cv2 import json import base64 import http.client class HttpConfig: '''http调用的参数配置''' def __init__(self, host_ip, port, url, img_base64_str): self.hostIP = host_ip self.Port = port self.httpMethod = "POST" self.requstURL = url self.headerdata = { "Content-Type": "application/json" } self.test_data = { "image_base64": img_base64_str } self.body = json.dumps(self.test_data) def read_image(img_path): '''读取图片数据并转为base64编码的字符串''' img_data = cv2.imread(img_path) img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB) img_str = cv2.imencode('.jpg', img_data)[1].tobytes() img_bin = base64.b64encode(img_str) img_base64_str = str(img_bin, encoding='utf8') return img_data, img_base64_str def decode_result_str(result_str): try: result = json.loads(result_str) except Exception as ex: print(str(ex)) return [] else: return result labels = ['antelope', 'badger', 'bat', 'bear', 'bee', 'beetle', 'bison', 'boar', 'butterfly', 'cat', 'caterpillar', 'chimpanzee', 'cockroach', 'cow', 'coyote', 'crab', 'crow', 'deer', 'dog', 'dolphin', 'donkey', 'dragonfly', 'duck', 'eagle', 'elephant', 'flamingo', 'fly', 'fox', 'goat', 'goldfish', 'goose', 'gorilla', 'grasshopper', 'hamster', 'hare', 'hedgehog', 'hippopotamus', 'hornbill', 'horse', 'hummingbird', 'hyena', 'jellyfish', 'kangaroo', 'koala', 'ladybugs', 'leopard', 'lion', 'lizard', 'lobster', 'mosquito', 'moth', 'mouse', 'octopus', 'okapi', 'orangutan', 'otter', 'owl', 'ox', 'oyster', 'panda', 'parrot', 'pelecaniformes', 'penguin', 'pig', 'pigeon', 'porcupine', 'possum', 'raccoon', 'rat', 'reindeer', 'rhinoceros', 'sandpiper', 'seahorse', 'seal', 'shark', 'sheep', 'snake', 'sparrow', 'squid', 'squirrel', 'starfish', 'swan', 'tiger', 'turkey', 'turtle', 'whale', 'wolf', 'wombat', 'woodpecker', 'zebra'] def test_image(img_path, ip, port, url): '''单张图片测试''' img_data, img_base64_str = read_image(img_path) http_config = HttpConfig(ip, port, url, img_base64_str) conn = http.client.HTTPConnection(host=http_config.hostIP, port=http_config.Port) conn.request(method=http_config.httpMethod, url=http_config.requstURL, body=http_config.body, headers=http_config.headerdata) response = conn.getresponse().read().decode() print('response: ', response) result = decode_result_str(response) clsse, score = result["clsse"], result["score"] result_str = f"{labels[clsse]}:{round(score, 2)}" cv2.putText(img_data, result_str, (0, 100), cv2.FONT_HERSHEY_TRIPLEX, 4, (0, 255, 0), 2) cv2.imwrite('./result-' + os.path.basename(img_path), img_data[..., ::-1]) if __name__ == "__main__": port = 1234 ip = "127.0.0.1" url = "/v1/InceptionV3" img_folder = './test_imgs' file_list = os.listdir(img_folder) for img_file in file_list: print("\n================ {} ================".format(img_file)) img_path = os.path.join(img_folder, img_file) test_image(img_path, ip, port, url) 在InceptionV3工程data目录下新建test_imgs文件夹存放测试图片:在另一个终端中进入InceptionV3工程目录data文件夹下运行test_http.py脚本发起HTTP请求测试:PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3\data PS D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe .\test_http.py ================ 61cf5127ce.jpg ================ response: {"clsse": 63, "score": 0.9996486902236938} ================ 7e2a453559.jpg ================ response: {"clsse": 81, "score": 0.999880313873291} 
在InceptionV3工程data目录下即可查看测试图片的推理结果:三、小结本节介绍了如何使用ModelArts和ModelBox训练开发一个InceptionV3动物图片分类的AI应用,我们只需要准备模型文件以及简单的配置即可创建一个HTTP服务。同时我们可以了解到InceptionV3网络的基本结构、数据处理和模型训练方法,以及对应推理应用的逻辑。----转自博客:https://bbs.huaweicloud.com/blogs/449036
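    The inceptionv3_post unit above returns only the argmax class and its score. If the client should see the best few candidates instead, the same reshaped feature vector can be turned into a top-k result; a minimal sketch, where the value of k and the JSON field names are arbitrary choices rather than something from the original article:

    import json
    import numpy as np

    def topk_result(feat_data: np.ndarray, k: int = 5) -> str:
        # Indices of the k highest scores, best first.
        idx = np.argsort(feat_data)[::-1][:k]
        result = {"topk": [{"clsse": int(i), "score": float(feat_data[i])} for i in idx]}
        return json.dumps(result)

    # Example with a dummy 90-class score vector:
    print(topk_result(np.random.rand(90).astype(np.float32)))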
  • [Tech Notes] Implementing Few-Shot Classification
    Implementing Few-Shot Classification

    Metric-learning approaches

    1. Prototypical Networks
    - Compute a prototype vector (class center) for each class.
    - Classify a query sample by its distance to each prototype.

    import torch
    import torch.nn as nn

    class PrototypicalNetwork(nn.Module):
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder

        def forward(self, support_images, support_labels, query_images):
            # Encode the support set and the query set
            support_embeddings = self.encoder(support_images)
            query_embeddings = self.encoder(query_images)
            # Compute class prototypes
            prototypes = compute_prototypes(support_embeddings, support_labels)
            # Compute distances and classify
            distances = compute_distances(query_embeddings, prototypes)
            return -distances

    2. Siamese Networks
    - Learn a similarity metric between pairs of samples.
    - Classify a query sample by comparing its similarity with the support samples.

    Core idea: a Siamese network uses a twin architecture and judges the similarity of two inputs by comparing their feature representations:
    - Network structure: two identical sub-networks with shared weights.
    - Input: one pair of samples at a time (a positive or a negative pair).
    - Objective: learn a distance metric function.

    For best practices, see the Notebook examples in AI Gallery:
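    The PrototypicalNetwork snippet above calls compute_prototypes and compute_distances without defining them. Below is a minimal sketch of what those helpers typically look like, using class-wise mean embeddings and squared Euclidean distances (a common choice, not necessarily identical to the original notebook):

    import torch

    def compute_prototypes(support_embeddings: torch.Tensor,
                           support_labels: torch.Tensor) -> torch.Tensor:
        # Average the support embeddings of each class; rows follow the sorted label order.
        classes = torch.unique(support_labels)
        prototypes = torch.stack([
            support_embeddings[support_labels == c].mean(dim=0) for c in classes
        ])
        return prototypes  # shape: (num_classes, embedding_dim)

    def compute_distances(query_embeddings: torch.Tensor,
                          prototypes: torch.Tensor) -> torch.Tensor:
        # Squared Euclidean distance between every query embedding and every prototype.
        return torch.cdist(query_embeddings, prototypes, p=2) ** 2

    Returning -distances from forward() then lets the result be used directly as logits in a cross-entropy loss over the query labels.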
  • [Tech Notes] 【朝推夜训】How to Set Up a Deep Learning Development Environment on an Edge Device
    【朝推夜训】How to Set Up a Deep Learning Development Environment on an Edge Device

    Taking the Jetson Orin Nano as an example, this post shows how to install Miniconda on the board, configure the conda and pip mirrors, and install PyTorch and Torchvision.

    1. Install Miniconda
    First download the latest Miniconda installer:

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh

    Run the Miniconda3-latest-Linux-aarch64.sh installer:

    bash ~/Miniconda3-latest-Linux-aarch64.sh

    Close and reopen the terminal for the installation to take full effect, or refresh the current shell with:

    source ~/.bashrc

    2. Switching the conda mirror
    First edit the .condarc file:

    vi ~/.condarc

    We use the Tsinghua mirror; change the file to the following content to add the free Anaconda Python repositories:

    channels:
      - defaults
    show_channel_urls: true
    default_channels:
      - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
      - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
      - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
    custom_channels:
      conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
      pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

    Clear the index cache and create a Python 3.8 environment:

    conda clean -i
    conda create -n py38 python=3.8

    3. Switching the pip mirror
    Create a .pip directory in the home directory and edit pip.conf:

    cd ~
    mkdir .pip
    cd .pip
    vi pip.conf

    Write the following into pip.conf and save it:

    [global]
    index-url = https://pypi.tuna.tsinghua.edu.cn/simple/
    [install]
    trusted-host = pypi.tuna.tsinghua.edu.cn

    4. Install PyTorch and Torchvision
    First open the PyTorch for Jetson page and download the wheel matching your JetPack version:
    https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048

    cd ~/Downloads
    wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl

    The version downloaded here is PyTorch v2.1.0. Next activate the py38 conda environment we just created and install PyTorch:

    conda activate py38
    sudo apt-get install python3-pip libopenblas-base libopenmpi-dev libomp-dev
    pip install ~/Downloads/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl

    After installation, verify it from the command line:

    python -c "import torch; print(torch.cuda.is_available())"

    If the result is True, the installation succeeded.

    Next install Torchvision. PyTorch v2.1.0 pairs with torchvision v0.16.1:

    PyTorch v1.0 - torchvision v0.2.2
    PyTorch v1.1 - torchvision v0.3.0
    PyTorch v1.2 - torchvision v0.4.0
    PyTorch v1.3 - torchvision v0.4.2
    PyTorch v1.4 - torchvision v0.5.0
    PyTorch v1.5 - torchvision v0.6.0
    PyTorch v1.6 - torchvision v0.7.0
    PyTorch v1.7 - torchvision v0.8.1
    PyTorch v1.8 - torchvision v0.9.0
    PyTorch v1.9 - torchvision v0.10.0
    PyTorch v1.10 - torchvision v0.11.1
    PyTorch v1.11 - torchvision v0.12.0
    PyTorch v1.12 - torchvision v0.13.0
    PyTorch v1.13 - torchvision v0.13.0
    PyTorch v1.14 - torchvision v0.14.1
    PyTorch v2.0 - torchvision v0.15.1
    PyTorch v2.1 - torchvision v0.16.1
    PyTorch v2.2 - torchvision v0.17.1
    PyTorch v2.3 - torchvision v0.18.0

    The Torchvision install commands are as follows:

    sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev
    git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
    cd torchvision
    conda activate py38
    export BUILD_VERSION=0.16.1
    pip install numpy==1.23.5 Pillow==9.5.0 requests==2.32.4
    python setup.py install --user

    Once Torchvision has finished building, verify it from the command line:

    python -c "import torchvision; print(torchvision.__version__)"

    If the version number prints successfully, the installation is complete.
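    After both installs, the whole stack can be checked in one short script rather than two separate one-liners; a generic verification sketch (not from the original post):

    import torch
    import torchvision

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        # Run one tiny op on the GPU to make sure the CUDA runtime really works.
        x = torch.rand(2, 3, device="cuda")
        print((x @ x.T).cpu())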
  • [Tech Notes] RK3588 AI Application Development (ResNet50V2 Keypoint Detection)
    RK3588 AI 应用开发 (ResNet50V2-关键点检测)一、模型训练与转换ResNet50V2 是改进版的深度卷积神经网络,基于 ResNet 架构发展而来。它采用前置激活(将 BN 和 ReLU 移至卷积前)与身份映射,优化了信息传播和模型训练性能。作为 50 层深度的网络,ResNet50V2 广泛应用于图像分类、目标检测等任务,支持迁移学习,适合快速适配新数据集,具有良好的泛化能力和较高准确率。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、应用开发1. 开发 Gradio 界面import cv2 import json import base64 import requests import numpy as np import gradio as gr def test_image(image_path): try: image_bgr = cv2.imread(image_path) image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') params = {"image_base64": image_base64} response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params), headers={"Content-Type": "application/json"}) if response.status_code == 200: image_base64 = response.json().get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR) else: image_rgb = None except Exception as e: return None else: return image_rgb if __name__ == "__main__": port = 8000 ip = "127.0.0.1" url = "/v1/ResNet50V2" demo = gr.Interface(fn=test_image, inputs=gr.Image(type="filepath"), outputs=["image"], title="ResNet50V2 猫脸关键点检测") demo.launch(share=False, server_port=3000) /home/orangepi/miniconda3/envs/python-3.10.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm * Running on local URL: http://127.0.0.1:3000 * To create a public link, set `share=True` in `launch()`. 2. 编写推理代码class ResNet50V2: def __init__(self, model_path): self.rknn_lite = RKNNLite() self.rknn_lite.load_rknn(model_path) self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) def preprocess(self, image): image = image[:, :, ::-1] image = cv2.resize(image, (224, 224)) return np.expand_dims(image, axis=0) def rknn_infer(self, data): outputs = self.rknn_lite.inference(inputs=[data]) return outputs[0] def post_process(self, pred): feat = pred.squeeze().reshape(-1, 2) return feat def predict(self, image): # 图像预处理 data = self.preprocess(image) # 模型推理 pred = self.rknn_infer(data) # 模型后处理 keypoints = self.post_process(pred) # 绘制关键点检测结果 h, w, _ = image.shape for x, y in keypoints: cv2.circle(image, (int(x * w), int(y * h)), 5, (0, 255, 0), -1) return image[..., ::-1] def release(self): self.rknn_lite.release() 3. 图片批量预测import os import cv2 import numpy as np import matplotlib.pyplot as plt from rknnlite.api import RKNNLite model = ResNet50V2('model/ResNet50V2.rknn') for image in os.listdir("image"): image = cv2.imread(os.path.join("image", image)) image = model.predict(image) plt.imshow(image) plt.axis('off') plt.show() model.release() 4. 
创建 Flask 服务import cv2 import base64 import numpy as np from rknnlite.api import RKNNLite from flask import Flask, request, jsonify from flask_cors import CORS app = Flask(__name__) CORS(app) @app.route('/v1/ResNet50V2', methods=['POST']) def inference(): data = request.get_json() image_base64 = data.get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR) image_rgb = model.predict(image_bgr) image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') return jsonify({ "image_base64": image_base64 }), 200 if __name__ == '__main__': model = ResNet50V2('model/ResNet50V2.rknn') app.run(host='0.0.0.0', port=8000) model.release() W rknn-toolkit-lite2 version: 2.3.2 * Serving Flask app '__main__' * Debug mode: off WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. * Running on all addresses (0.0.0.0) * Running on http://127.0.0.1:8000 * Running on http://192.168.3.50:8000 Press CTRL+C to quit 127.0.0.1 - - [02/May/2025 02:13:40] "POST /v1/ResNet50V2 HTTP/1.1" 200 - 127.0.0.1 - - [02/May/2025 02:13:46] "POST /v1/ResNet50V2 HTTP/1.1" 200 - 5. 上传图片预测三、小结本章介绍了基于 RK3588 的 ResNet50V2 关键点检测应用开发全流程,包括模型训练与转换、Gradio 界面设计、推理代码实现、批量预测处理及 Flask 服务部署,完整实现了从模型到端到端应用的落地。----转自博客:https://bbs.huaweicloud.com/blogs/451999
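    The article relies on a pre-converted ResNet50V2.rknn produced by the AI Gallery notebook. For readers who want to convert the model themselves, the sketch below shows roughly what a conversion script with rknn-toolkit2 (run on a PC, not on the board) looks like; the source model format, preprocessing values and quantization settings are assumptions, and the actual notebook script may differ:

    from rknn.api import RKNN   # rknn-toolkit2, installed on the x86 host

    rknn = RKNN()
    # target_platform must match the board; the mean/std values assume the model takes raw 0-255 RGB
    # input, as the preprocess() above suggests -- use whatever the training notebook actually defines.
    rknn.config(target_platform="rk3588", mean_values=[[0, 0, 0]], std_values=[[1, 1, 1]])
    rknn.load_onnx(model="ResNet50V2.onnx")   # load_tflite/load_pytorch exist for other source formats
    rknn.build(do_quantization=False)         # or True plus a calibration dataset for int8
    rknn.export_rknn("model/ResNet50V2.rknn")
    rknn.release()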
  • [Tech Notes] RK3588 AI Application Development (FCN Semantic Segmentation)
    RK3588 AI 应用开发 (FCN-语义分割)一、模型训练与转换FCN(全卷积网络,Fully Convolutional Networks)是用于语义分割任务的一种深度学习模型架构,引入了跳跃结构(Skip Architecture),通过融合浅层和深层的特征图,保留更多的细节信息,提升分割精度。此外,FCN还利用多尺度上下文聚合,捕捉不同层级的特征,增强了对不同大小目标的识别能力。FCN的成功推动了语义分割领域的发展,成为后续许多先进模型的基础。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、应用开发1. 开发 Gradio 界面import cv2 import json import base64 import requests import numpy as np import gradio as gr def test_image(image_path): try: image_bgr = cv2.imread(image_path) image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') params = {"image_base64": image_base64} response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params), headers={"Content-Type": "application/json"}) if response.status_code == 200: image_base64 = response.json().get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR) else: image_rgb = None except Exception as e: return None else: return image_rgb if __name__ == "__main__": port = 8000 ip = "127.0.0.1" url = "/v1/FCN" demo = gr.Interface(fn=test_image, inputs=gr.Image(type="filepath"), outputs=["image"], title="FCN 果蔬病虫害分割") demo.launch(share=False, server_port=3000) /home/orangepi/miniconda3/envs/python-3.10.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm * Running on local URL: http://127.0.0.1:3000 * To create a public link, set `share=True` in `launch()`. 2. 编写推理代码class FCN: def __init__(self, model_path): self.num_classes = 117 self.rknn_lite = RKNNLite() self.rknn_lite.load_rknn(model_path) self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) self.color_list = np.random.randint(0, 255, size=(self.num_classes, 3), dtype=np.uint8).tolist() def preprocess(self, image): image = image[:, :, ::-1] image = cv2.resize(image, (224, 224)) return np.expand_dims(image, axis=0) def rknn_infer(self, data): outputs = self.rknn_lite.inference(inputs=[data]) return outputs[0] def post_process(self, pred): feat = pred.squeeze() return np.argmax(feat, axis=-1).astype(np.uint8) def predict(self, image): # 图像预处理 data = self.preprocess(image) # 模型推理 pred = self.rknn_infer(data) # 模型后处理 feat = self.post_process(pred) # 生成图像分割结果 canv = np.zeros_like(image) mask = cv2.resize(feat, image.shape[:2][::-1], interpolation=cv2.INTER_NEAREST) for i in range(1, self.num_classes): canv[mask==i] = self.color_list[i] return cv2.addWeighted(image[..., ::-1], 0.5, canv, 0.5, 0) def release(self): self.rknn_lite.release() 3. 图片批量预测import os import cv2 import numpy as np import matplotlib.pyplot as plt from rknnlite.api import RKNNLite model = FCN('model/FCN.rknn') for image in os.listdir("image"): image = cv2.imread(os.path.join("image", image)) image = model.predict(image) plt.imshow(image) plt.axis('off') plt.show() model.release() 4. 
创建 Flask 服务import cv2 import base64 import numpy as np from rknnlite.api import RKNNLite from flask import Flask, request, jsonify from flask_cors import CORS app = Flask(__name__) CORS(app) @app.route('/v1/FCN', methods=['POST']) def inference(): data = request.get_json() image_base64 = data.get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR) image_rgb = model.predict(image_bgr) image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') return jsonify({ "image_base64": image_base64 }), 200 if __name__ == '__main__': model = FCN('model/FCN.rknn') app.run(host='0.0.0.0', port=8000) model.release() W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [00:06:51.738] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14) I RKNN: [00:06:51.738] RKNN Driver Information: version: 0.9.6 I RKNN: [00:06:51.739] RKNN Model Information: version: 1, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: TFLite, framework layout: NHWC * Serving Flask app '__main__' * Debug mode: off WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. * Running on all addresses (0.0.0.0) * Running on http://127.0.0.1:8000 * Running on http://192.168.3.50:8000 Press CTRL+C to quit 127.0.0.1 - - [02/May/2025 00:07:17] "POST /v1/FCN HTTP/1.1" 200 - 127.0.0.1 - - [02/May/2025 00:07:24] "POST /v1/FCN HTTP/1.1" 200 - 127.0.0.1 - - [02/May/2025 00:07:31] "POST /v1/FCN HTTP/1.1" 200 - 127.0.0.1 - - [02/May/2025 00:07:39] "POST /v1/FCN HTTP/1.1" 200 - 5. 上传图片预测三、小结本章介绍了基于RK3588平台使用FCN模型进行语义分割的AI应用开发全流程,包括模型训练与转换、Gradio界面开发、推理代码编写、批量预测实现及Flask服务部署。通过该流程,开发者可实现高效的图像分割任务,并在本地或云端进行预测和展示。----转自博客:https://bbs.huaweicloud.com/blogs/451998
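    The log above shows Flask's development-server warning against production use. Since a single RKNNLite instance is shared by the whole process, one straightforward hardening step is to run the same app under a WSGI server with a single worker so requests are served one at a time. A hedged example, assuming the service file is saved as app.py (gunicorn is not part of the original article):

    pip install gunicorn
    gunicorn --workers 1 --bind 0.0.0.0:8000 app:app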
  • [Tech Notes] RK3588 AI Application Development (YOLOX Object Detection)
    RK3588 AI 应用开发 (YOLOX-目标检测)一、模型训练和转换YOLOX是YOLO系列的优化版本,引入了解耦头、数据增强、无锚点以及标签分类等目标检测领域的优秀进展,拥有较好的精度表现,同时对工程部署友好。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、应用开发1. 开发 Gradio 界面import cv2 import json import base64 import requests import numpy as np import gradio as gr def test_image(image_path): try: image_bgr = cv2.imread(image_path) image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') params = {"image_base64": image_base64} response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params), headers={"Content-Type": "application/json"}) if response.status_code == 200: image_base64 = response.json().get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR) else: image_rgb = None except Exception as e: return None else: return image_rgb if __name__ == "__main__": port = 8000 ip = "127.0.0.1" url = "/v1/fish_det" demo = gr.Interface(fn=test_image, inputs=gr.Image(type="filepath"), outputs=["image"], title="YOLOX 深海鱼类检测") demo.launch(share=False, server_port=3000) * Running on local URL: http://127.0.0.1:3000 * To create a public link, set `share=True` in `launch()`. 2. 编写推理代码%%writefile YOLOX/yolox/data/datasets/voc_classes.py #!/usr/bin/env python3 # -*- coding:utf-8 -*- # Copyright (c) Megvii, Inc. and its affiliates. # VOC_CLASSES = ( '__background__', # always index 0 VOC_CLASSES = ( "fish", ) Overwriting YOLOX/yolox/data/datasets/voc_classes.pyimport sys sys.path.append("YOLOX") from yolox.utils import demo_postprocess, multiclass_nms, vis from yolox.data.data_augment import preproc as preprocess from yolox.data.datasets.voc_classes import VOC_CLASSESimport cv2 import numpy as np import ipywidgets as widgets from rknnlite.api import RKNNLite from IPython.display import display class YOLOX: def __init__(self, model_path): self.ratio = None self.rknn_lite = RKNNLite() self.rknn_lite.load_rknn(model_path) self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) def preprocess(self, image): start_img, self.ratio = preprocess(image, (320, 320), swap=(0, 1, 2)) return np.expand_dims(start_img, axis=0) def rknn_infer(self, data): outputs = self.rknn_lite.inference(inputs=[data]) return outputs[0] def post_process(self, pred): predictions = demo_postprocess(pred.squeeze(), (320, 320)) boxes = predictions[:, :4] scores = predictions[:, 4:5] * predictions[:, 5:] boxes_xyxy = np.ones_like(boxes) boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2. boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2. boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2. boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2. 
boxes_xyxy /= self.ratio dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.25) return dets def predict(self, image): # 图像预处理 data = self.preprocess(image) # 模型推理 pred = self.rknn_infer(data) # 模型后处理 dets = self.post_process(pred) # 绘制目标检测结果 if dets is not None: final_boxes = dets[:, :4] final_scores, final_cls_inds = dets[:, 4], dets[:, 5] image = vis(image, final_boxes, final_scores, final_cls_inds, conf=0.25, class_names=VOC_CLASSES) return image[..., ::-1] def img2bytes(self, image): """将图片转换为字节码""" return bytes(cv2.imencode('.jpg', image)[1]) def infer_video(self, video_path): """视频推理""" image_widget = widgets.Image(format='jpeg', width=800, height=600) display(image_widget) cap = cv2.VideoCapture(video_path) while True: ret, img_frame = cap.read() if not ret: break image_pred = self.predict(img_frame) image_widget.value = self.img2bytes(image_pred) cap.release() def release(self): """释放资源""" self.rknn_lite.release() 3. 图像预测4. 视频推理5. 创建 Flask 服务import cv2 import base64 import numpy as np from rknnlite.api import RKNNLite from flask import Flask, request, jsonify from flask_cors import CORS app = Flask(__name__) CORS(app) @app.route('/v1/fish_det', methods=['POST']) def inference(): data = request.get_json() image_base64 = data.get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR) image_rgb = model.predict(image_bgr) image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') return jsonify({ "image_base64": image_base64 }), 200 if __name__ == '__main__': model = YOLOX('model/yolox_fish.rknn') app.run(host='0.0.0.0', port=8000) model.release() 6. 上传图片预测三、小结本章介绍了基于RK3588平台使用YOLOX进行目标检测的全流程,包括模型训练与转换、Gradio界面开发、推理代码编写、图像和视频预测实现,以及Flask服务部署。整体实现了高效的鱼类检测应用,适用于嵌入式设备部署与实际场景应用。----转自博客:https://bbs.huaweicloud.com/blogs/452001
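补充:上文第3步"图像预测"和第4步"视频推理"未展示调用代码,下面给出一个基于前文 YOLOX 类的最小调用示例(仅为示意,需在已定义 YOLOX 类的 Notebook 中运行,模型路径沿用文中的 model/yolox_fish.rknn,fish.jpg 和 fish.mp4 为假设的测试文件):

import cv2
import matplotlib.pyplot as plt

# 加载前文定义的YOLOX检测类,模型路径与Flask服务保持一致
model = YOLOX('model/yolox_fish.rknn')

# 图像预测:predict返回绘制好检测框的RGB图像
image_bgr = cv2.imread('fish.jpg')
image_rgb = model.predict(image_bgr)
plt.imshow(image_rgb)
plt.axis('off')
plt.show()

# 视频推理:在Notebook中通过ipywidgets实时刷新显示检测结果
model.infer_video('fish.mp4')

# 释放NPU资源
model.release()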
  • [技术干货] RK3588 AI 应用开发 (InceptionV3-图像分类)
    RK3588 AI 应用开发 (InceptionV3-图像分类)一、模型训练与转换Inception V3,GoogLeNet的改进版本,采用InceptionModule和全局平均池化层,v3一个最重要的改进是分解(Factorization),将7x7分解成两个一维的卷积(1x7,7x1),3x3也是一样(1x3,3x1),这样的好处,既可以加速计算(多余的计算能力可以用来加深网络),又可以将1个conv拆成2个conv,使得网络深度进一步增加,增加了网络的非线性。模型的训练与转换教程已经开放在AI Gallery中,其中包含训练数据、训练代码、模型转换脚本。在ModelArts的Notebook环境中训练后,再转换成对应平台的模型格式:onnx格式可以用在Windows设备上,RK系列设备上需要转换为rknn格式。二、应用开发1. 开发 Gradio 界面import cv2 import json import base64 import requests import numpy as np import gradio as gr def test_image(image_path): try: image_bgr = cv2.imread(image_path) image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') params = {"image_base64": image_base64} response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params), headers={"Content-Type": "application/json"}) if response.status_code == 200: image_base64 = response.json().get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR) else: image_rgb = None except Exception as e: return None else: return image_rgb if __name__ == "__main__": port = 8000 ip = "127.0.0.1" url = "/v1/InceptionV3" demo = gr.Interface(fn=test_image, inputs=gr.Image(type="filepath"), outputs=["image"], title="InceptionV3 动物分类") demo.launch(share=False, server_port=3000) /home/orangepi/miniconda3/envs/python-3.10.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm * Running on local URL: http://127.0.0.1:3000 * To create a public link, set `share=True` in `launch()`. 2. 
编写推理代码class InceptionV3: def __init__(self, model_path): self.rknn_lite = RKNNLite() self.rknn_lite.load_rknn(model_path) self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) self.label = ['antelope', 'badger', 'bat', 'bear', 'bee', 'beetle', 'bison', 'boar', 'butterfly', 'cat', 'caterpillar', 'chimpanzee', 'cockroach', 'cow', 'coyote', 'crab', 'crow', 'deer', 'dog', 'dolphin', 'donkey', 'dragonfly', 'duck', 'eagle', 'elephant', 'flamingo', 'fly', 'fox', 'goat', 'goldfish', 'goose', 'gorilla', 'grasshopper', 'hamster', 'hare', 'hedgehog', 'hippopotamus', 'hornbill', 'horse', 'hummingbird', 'hyena', 'jellyfish', 'kangaroo', 'koala', 'ladybugs', 'leopard', 'lion', 'lizard', 'lobster', 'mosquito', 'moth', 'mouse', 'octopus', 'okapi', 'orangutan', 'otter', 'owl', 'ox', 'oyster', 'panda', 'parrot', 'pelecaniformes', 'penguin', 'pig', 'pigeon', 'porcupine', 'possum', 'raccoon', 'rat', 'reindeer', 'rhinoceros', 'sandpiper', 'seahorse', 'seal', 'shark', 'sheep', 'snake', 'sparrow', 'squid', 'squirrel', 'starfish', 'swan', 'tiger', 'turkey', 'turtle', 'whale', 'wolf', 'wombat', 'woodpecker', 'zebra'] def preprocess(self, image): image = image[:, :, ::-1] image = cv2.resize(image, (224, 224)) return np.expand_dims(image, axis=0) def rknn_infer(self, data): outputs = self.rknn_lite.inference(inputs=[data]) return outputs[0] def post_process(self, pred): clsse = np.argmax(pred, axis=-1) score = pred[0][clsse[0]].item() return self.label[clsse[0]], round(score * 100, 2) def predict(self, image): # 图像预处理 data = self.preprocess(image) # 模型推理 pred = self.rknn_infer(data) # 模型后处理 label, score = self.post_process(pred) # 绘制识别结果 print(f'{label}:{score}%') image = cv2.putText(image, f'{label}:{score}%', (0, 100), cv2.FONT_HERSHEY_TRIPLEX, 4, (0, 255, 0), 8) return image[..., ::-1] def release(self): self.rknn_lite.release() 3. 图片批量预测import os import cv2 import numpy as np import matplotlib.pyplot as plt from rknnlite.api import RKNNLite model = InceptionV3('model/InceptionV3.rknn') for image in os.listdir("image"): image = cv2.imread(os.path.join("image", image)) image = model.predict(image) plt.imshow(image) plt.axis('off') plt.show() model.release() 4. 创建 Flask 服务import cv2 import base64 import numpy as np from rknnlite.api import RKNNLite from flask import Flask, request, jsonify from flask_cors import CORS app = Flask(__name__) CORS(app) @app.route('/v1/InceptionV3', methods=['POST']) def inference(): data = request.get_json() image_base64 = data.get("image_base64") image_binary = base64.b64decode(image_base64) image_array = np.frombuffer(image_binary, dtype=np.uint8) image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR) image_rgb = model.predict(image_bgr) image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes() image_base64 = base64.b64encode(image_string).decode('utf-8') return jsonify({ "image_base64": image_base64 }), 200 if __name__ == '__main__': model = InceptionV3('model/InceptionV3.rknn') app.run(host='0.0.0.0', port=8000) model.release() W rknn-toolkit-lite2 version: 2.3.2 * Serving Flask app '__main__' * Debug mode: off WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. 
* Running on all addresses (0.0.0.0) * Running on http://127.0.0.1:8000 * Running on http://192.168.3.50:8000 Press CTRL+C to quit 127.0.0.1 - - [01/May/2025 20:37:00] "POST /v1/InceptionV3 HTTP/1.1" 200 - pig:99.95% 127.0.0.1 - - [01/May/2025 20:37:09] "POST /v1/InceptionV3 HTTP/1.1" 200 - swan:99.95% 127.0.0.1 - - [01/May/2025 20:37:20] "POST /v1/InceptionV3 HTTP/1.1" 200 - cat:97.02% 5. 上传图片预测三、小结本章介绍了基于RK3588平台的InceptionV3图像分类应用开发全流程,包括模型训练与格式转换、Gradio界面设计、推理代码实现、批量预测处理及Flask服务部署,实现了从本地到Web端的高效AI推理应用。----转自博客:https://bbs.huaweicloud.com/blogs/451978
  • [技术干货] 松材线虫病检测
    松材线虫病检测1. 数据切分无人机广角拍摄的影像分辨率较高(4000x3000),首先对人工标注好的松材线虫病数据集进行切分,将大图切分成小图,并设置不同的切分尺寸(例如:1000x1000、1500x1500、2000x2000)和重叠比例(例如:0%、10%、20%、30%)后送入模型进行训练(本节末尾附有一个简化的切分示意代码)。2. 模型训练YOLOv8自2023年推出后,经过多次优化迭代,其架构设计(如C2F模块、动态标签分配)与训练流程已趋于成熟。例如,嵌入式设备部署依赖v8的轻量化特性;在医疗检测领域,v8的高召回率也已得到临床验证。YOLO12等新版本虽在指标上有所超越,但v8在推理速度与部署成熟度上仍具明显优势,目前在工业界仍被广泛采用进行部署。我们使用YOLOv8对等比例缩放后的原始图像和切分后的松材线虫病检测数据集进行训练,提高模型对不同大小目标的泛化能力,每次迭代训练s和m两种尺寸的模型,分别用于视频直播检测和图像的自动标注。目前我们的模型已经适配国产昇腾和英伟达的算力卡,可以实现模型的自动化训练作业,并针对不同算力芯片进行模型的自动转换和量化。3. 云上标注我们的模型可以对无人机回传的图片和视频进行切分检测和自动标注,针对不同大小的目标和类别可以设置不同的切分尺寸和重叠比例,实现无人机影像的细粒度检测。4. 直播推理我们的AI直播推理业务Pipeline并发运行,使用Python结合C++进行开发,功能模块化,业务运行更高效,可以在RK3588、Jetson系列开发板上进行部署。目前针对松材线虫病检测的场景,已经支持对9种疫木的实时识别。----转自博客:https://bbs.huaweicloud.com/blogs/458003
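附:下面是一个大图滑窗切分的简化示意代码(仅为示意,非项目源码,路径与参数均为假设值),按给定的切分尺寸和重叠比例将高分辨率影像裁剪为若干子图,思路与上文"数据切分"一致:

import os
import cv2

def slice_image(image, tile_size=1000, overlap=0.2):
    """按tile_size和重叠比例overlap用滑动窗口切分大图,返回(左上角坐标, 子图)列表"""
    h, w = image.shape[:2]
    stride = int(tile_size * (1 - overlap))  # 相邻子图之间的滑动步长
    tiles = []
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            # 靠近右/下边界时向内收缩,保证子图尺寸尽量一致
            x0 = min(x, max(w - tile_size, 0))
            y0 = min(y, max(h - tile_size, 0))
            tiles.append(((x0, y0), image[y0:y0 + tile_size, x0:x0 + tile_size]))
            if x0 + tile_size >= w:
                break
        if y0 + tile_size >= h:
            break
    return tiles

os.makedirs("slices", exist_ok=True)
image = cv2.imread("uav_image.jpg")  # uav_image.jpg为假设的无人机影像文件
for (x0, y0), tile in slice_image(image, tile_size=1000, overlap=0.2):
    cv2.imwrite(f"slices/tile_{x0}_{y0}.jpg", tile)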
  • [技术干货] 如何使用 Python 开发 AI 图编排应用
    如何使用 Python 开发 AI 图编排应用本文将介绍使用Python开发一个简单的AI图编排应用,我们的目标是实现AI应用在RK3588上灵活编排和高效部署。首先我们定义的图是由边和节点组成的有向无环图,边代表任务队列,表示数据在节点之间的流动关系,每个节点都是一个计算单元,用于处理特定的任务。之后我们可以定义一组的处理特定任务的函数节点也称为计算单元,例如:read_frame、model_infer、kf_tracker、draw_boxes、push_frame、redis_push,分别用于读取视频、模型检测、目标跟踪、图像绘制、视频输出以及结果推送。每个节点可以有一个输入和多个输出,数据在节点之间是单向流动的,节点之间通过边进行连接,每个节点通过队列消费和传递数据。代码地址:https://github.com/HouYanSong/modelbox-rk3588一. 计算节点的实现我们在Json文件中定义每一个节点的的数据结构并使用Python进行代码实现:读流计算单元有4个参数:pull_video_url、height、width、fps,分别代表视频地址、视频高度和宽度以及读取帧率,它仅作为生产者,产生的数据可以输出到多个队列。"read_frame": { "config": { "pull_video_url": { "type": "str", "required": true, "default": null, "desc": "pull video url", "source": "mp4|flv|rtmp|rtsp" }, "height": { "type": "int", "required": true, "default": null, "max": 1440, "min": 720, "desc": "video height" }, "width": { "type": "int", "required": true, "default": null, "max": 1920, "min": 960, "desc": "video width" }, "fps": { "type": "int", "required": true, "default": null, "max": 15, "min": 5, "desc": "frame rate" } }, "multi_output": [] } 函数代码的实现如下,我们可以对视频文件或者视频流使用ffmpeg进行硬件解码,并将解码后的帧数据写入到队列中,用于后续任务节点的计算。def read_frame(share_dict, flowunit_data, queue_dict, data): pull_video_url = flowunit_data["config"]["pull_video_url"] height = flowunit_data["config"]["height"] width = flowunit_data["config"]["width"] fps = flowunit_data["config"]["fps"] ffmpeg_cmd = [ 'ffmpeg', '-c:v', 'h264_rkmpp', '-i', pull_video_url, '-r', f'{fps}', '-loglevel', 'info', '-s', f'{width}x{height}', '-an', '-f', 'rawvideo', '-pix_fmt', 'bgr24', 'pipe:' ] ffmpeg_process = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE, stderr=sp.DEVNULL, bufsize=10**7) index = 0 while True: index += 1 raw_frame = ffmpeg_process.stdout.read(width * height * 3) if not raw_frame: break else: frame = np.frombuffer(raw_frame, dtype=np.uint8).reshape((height, width, -1)) data["frame"] = frame for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) ffmpeg_process.stdout.close() ffmpeg_process.terminate() 推理计算单元的函数定义如下,它有一个输入和多个输出,我们可以指定模型和配置文件路径以及单次图像推理的批次大小等参数。"model_infer": { "config": { "model_file": { "type": "str", "required": true, "default": null, "desc": "model file path, rk3588 mostly ends with .rknn" }, "model_info": { "type": "str", "required": true, "default": null, "desc": "model info file path, mostly use json file" }, "batch_size": { "type": "int", "required": true, "default": null, "max": 8, "min": 1, "desc": "batch size" } }, "single_input": null, "multi_output": [] } 对应的函数实现如下,这里我们通过创建线程池的方式对图像进行批量推理,BatchSize的大小代表创建线程池的数量,将一个批次的推理结果写入到输出队列中,输出队列不唯一,可以为空或有多个输出队列。def model_infer(share_dict, flowunit_data, queue_dict, data): model_file = flowunit_data["config"]["model_file"] model_info = flowunit_data["config"]["model_info"] batch_size = flowunit_data["config"]["batch_size"] rknn_lite_list = [] for i in range(batch_size): rknn_lite = RKNNLite() rknn_lite.load_rknn(model_file) rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) rknn_lite_list.append(rknn_lite) with open(model_info, "r") as f: model_info = json.load(f) labels = [] for label in list(model_info["model_classes"].values()): labels.append(label) IMG_SIZE = model_info["input_shape"][0][-2:] OBJ_THRESH = model_info["conf_threshold"] NMS_THRESH = model_info["nms_threshold"] exist = False index = 0 while True: index += 1 image_batch = [] if flowunit_data["single_input"] is not None: for i in range(batch_size): data = 
queue_dict[flowunit_data["single_input"]].get() # 图片数据为None就退出循环 if data["frame"] is None: exist = True break image_batch.append(data) else: break with ThreadPoolExecutor(max_workers=batch_size) as executor: results = list(executor.map(infer_single_image, [(data["frame"], rknn_lite_list[i % batch_size], IMG_SIZE, OBJ_THRESH, NMS_THRESH) for i, data in enumerate(image_batch)])) for i, (boxes, classes, scores) in enumerate(results): classes = [labels[class_id] for class_id in classes] data = image_batch[i] if data.get("boxes") is None: data["boxes"] = boxes data["classes"] = classes data["scores"] = scores else: data["boxes"].extend(boxes) data["classes"].extend(classes) data["scores"].extend(scores) for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) if exist: break # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) for rknn_lite in rknn_lite_list: rknn_lite.release() 跟踪功能单元的可以对推理结果添加跟踪ID,如果没有推理结果,则直接返回原始数据,其定义如下:"kf_tracker": { "config": {}, "single_input": null, "multi_output": [] } 对应的函数代码实现如下:def kf_tracker(share_dict, flowunit_data, queue_dict, data): tracker = CentroidKF_Tracker(max_lost=30) index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break boxes, classes, scores = data.get("boxes"), data.get("classes"), data.get("scores") boxes = np.array(boxes) classes = np.array(classes) scores = np.array(scores) boxes[:, 2] = boxes[:, 2] - boxes[:, 0] boxes[:, 3] = boxes[:, 3] - boxes[:, 1] results = tracker.update(boxes, scores, classes) boxes = [] classes = [] scores = [] tracks = [] for result in results: frame_num, id, bb_left, bb_top, bb_width, bb_height, confidence, x, y, z, class_id = result boxes.append([bb_left, bb_top, bb_left + bb_width, bb_top + bb_height]) classes.append(class_id) scores.append(confidence) tracks.append(id) data["boxes"] = boxes data["classes"] = classes data["scores"] = scores data["tracks"] = tracks for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) 绘制功能单元可以对检测和跟踪结果进行绘制,如果检测结果或跟踪结果为空,则直接返回原始数据,其定义如下:"draw_boxes": { "single_input": null, "config": {}, "multi_output": [] } 代码逻辑如下:def draw_boxes(share_dict, flowunit_data, queue_dict, data): index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break boxes, classes, scores = data.get("boxes"), data.get("classes"), data.get("scores") if boxes is not None: tracks = data.get("tracks") if tracks is not None: for boxe, clss, track in zip(boxes, classes, tracks): cv2.rectangle(data["frame"], (boxe[0], boxe[1]), (boxe[2], boxe[3]), (0, 255, 0), 2) cv2.putText(data["frame"], f"{clss} {track}", (boxe[0], boxe[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2) else: for boxe, clss, conf in zip(boxes, classes, scores): cv2.rectangle(data["frame"], (boxe[0], boxe[1]), (boxe[2], boxe[3]), (0, 255, 0), 2) cv2.putText(data["frame"], f"{clss} {conf * 100:.2f}%", (boxe[0], boxe[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2) for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: 
queue_dict[queue_name].put(data) 输出功能单元可以将视频帧编码成视频输到到视频文件或者推流到RTMP服务器,其参数定义如下:"push_frame": { "config": { "push_video_url": { "type": "str", "required": true, "default": null, "desc": "push video url", "source": "rtmp|flv|mp4" }, "format": { "type": "str", "required": true, "default": null, "desc": "vodeo format", "source": "flv|mp4" }, "height": { "type": "int", "required": true, "default": null, "max": 1920, "min": 720, "desc": "video height" }, "width": { "type": "int", "required": true, "default": null, "max": 1920, "min": 960, "desc": "video width" }, "fps": { "type": "int", "required": true, "default": null, "max": 15, "min": 5, "desc": "frame rate" } }, "single_input": null } push_video_url参数是推流地址,也可以输出到本地视频文件。format参数指定视频格式,支持flv和mp4。height和width为视频分辨率,fps是输出帧率。它仅作为消费者,具体函数代码实现如下:def push_frame(share_dict, flowunit_data, queue_dict, data): push_video_url = flowunit_data["config"]["push_video_url"] format = flowunit_data["config"]["format"] height = flowunit_data["config"]["height"] width = flowunit_data["config"]["width"] fps = flowunit_data["config"]["fps"] process_stdin = ( ffmpeg .input('pipe:', format='rawvideo', pix_fmt='bgr24', s="{}x{}".format(width, height), framerate=fps) .filter('fps', fps=fps, round='up') .output( push_video_url, vcodec='h264_rkmpp', bitrate='2500k', f=format, g=fps, an=None, timeout='0' ) .overwrite_output() .run_async(cmd=["ffmpeg", "-re"], pipe_stdin=True) ) index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break frame = data["frame"] frame = cv2.resize(frame, (width, height)) process_stdin.stdin.write(frame.tobytes()) process_stdin.stdin.close() process_stdin.terminate() 消息功能单元可以将检测或跟踪结果发送到Redis服务器,具体可以根据实际情况进行调整。"redis_push": { "config": { "task_id": { "type": "str", "required": true, "default": null, "desc": "task id" }, "host": { "type": "str", "required": true, "default": null, "desc": "redis host" }, "port": { "type": "int", "required": true, "default": null, "desc": "redis port" }, "username": { "type": "str", "required": true, "default": null, "desc": "redis username" }, "password": { "type": "str", "required": true, "default": null, "desc": "redis password" }, "db": { "type": "int", "required": true, "default": null, "desc": "redis db" } }, "single_input": null } 同样,它也仅作为消费者,只有一个输入,具体函数代码如下:def redis_push(share_dict, flowunit_data, queue_dict, data): task_id = flowunit_data["config"]["task_id"] host = flowunit_data["config"]["host"] port = flowunit_data["config"]["port"] username = flowunit_data["config"]["username"] password = flowunit_data["config"]["password"] db = flowunit_data["config"]["db"] r = redis.Redis( host = host, port = port, username = username, password = password, db = db, decode_responses = True ) index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break track_objs = [] height, width = data["frame"].shape[:2] boxes, classes, scores, tracks = data.get("boxes"), data.get("classes"), data.get("scores"), data.get("tracks") if boxes is not None: for boxe, clss, conf, track in zip(boxes, classes, scores, tracks): x1 = float(boxe[0] / width) y1 = float(boxe[1] / height) x2 = float(boxe[2] / width) y2 = float(boxe[3] / height) track_obj = { "bbox": [x1, y1, x2, y2], "track_id": int(track), "class_id": 0, "class_name": str(clss) } track_objs.append(track_obj) key 
= 'vision:track:' + str(task_id) + ':frame:' + str(index) value = json.dumps({"track_result": track_objs}) r.set(key, value) r.expire(key, 2) print(track_objs) r.close() 二、流程图编排定义好节点,我们就可以定义管道也就是“边”将“节点”的输入和输出连接起来,这里我们定义6条边也就是实例化6个队列,在配置文件中声明每条管道的名称以及队列的最大容量。"queue_size": 16, "queue_list": [ "frame_queue", "infer_queue_1", "infer_queue_2", "track_queue", "draw_queue_1", "draw_queue_2" ] 之后就是对每一个节点的参数进行配置,并定义功能单元的输入和输出。"graph_edge": { "读流功能单元": { "read_frame": { "config": { "pull_video_url": "/home/orangepi/workspace/modelbox/data/car.mp4", "height": 720, "width": 1280, "fps": 20 }, "multi_output": [ "frame_queue" ] } }, "推理功能单元": { "model_infer": { "config": { "model_file": "/home/orangepi/workspace/modelbox/model/yolov8n_800x800_int8.rknn", "model_info": "/home/orangepi/workspace/modelbox/model/yolov8n_800x800_int8.json", "batch_size": 8 }, "single_input": "frame_queue", "multi_output": [ "infer_queue_1", "infer_queue_2" ] } }, "跟踪功能单元_2": { "kf_tracker": { "config": {}, "single_input": "infer_queue_2", "multi_output": [ "track_queue" ] } }, "绘图功能单元_1": { "draw_boxes": { "config": {}, "single_input": "infer_queue_1", "multi_output": [ "draw_queue_1" ] } }, "绘图功能单元_2": { "draw_boxes": { "single_input": "track_queue", "config": {}, "multi_output": [ "draw_queue_2" ] } }, "推流功能单元_1": { "push_frame": { "config": { "push_video_url": "/home/orangepi/workspace/modelbox/output/det_result.mp4", "format": "mp4", "height": 720, "width": 1280, "fps": 20 }, "single_input": "draw_queue_1" } }, "推流功能单元_2": { "push_frame": { "config": { "push_video_url": "/home/orangepi/workspace/modelbox/output/track_result.mp4", "format": "mp4", "height": 720, "width": 1280, "fps": 20 }, "single_input": "draw_queue_2" } } } 每个功能单元需要起一个节点名称用于功能单元的创建,每个节点名称保证全局唯一,正如字典中的键值不能重复。之后根据这份图文件编排启动AI应用,Python代码如下:import os import sys import json import argparse sys.path.append(os.path.join(os.path.dirname(__file__), '..')) from etc.flowunit import * from multiprocessing import Process, Queue, Manager if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument('graph_path', type=str, nargs='?', default='/home/orangepi/workspace/modelbox/graph/person_car.json') args = parser.parse_args() # 初始化数据 data = {"frame": None} config = {} # 读取流程图 with open(args.graph_path) as f: graph = json.load(f) # 创建队列 queue_dict = {} queue_size = graph["queue_size"] for queue_name in graph["queue_list"]: queue_dict[queue_name] = Queue(maxsize=queue_size) with Manager() as manager: # 创建共享字典 share_dict = manager.dict() # 创建进程 process_list = [] graph_edge = graph["graph_edge"] for process in graph_edge.keys(): p = Process(target=eval(list(graph_edge[process].keys())[0]), args=(share_dict, list(graph_edge[process].values())[0], queue_dict, data,)) process_list.append(p) print("=============Start Process...=============") # 启动进程 for p in process_list: p.start() # 等待进程结束 for p in process_list: p.join() print("==========All Process Finished.===========") 这里我们读取一段测试视频分别将检测结果和跟踪结果保存为两个视频文件输出到output目录下:(python-3.9.15) orangepi@orangepi5plus:~$ python /home/orangepi/workspace/modelbox/graph/graph.py /home/orangepi/workspace/modelbox/graph/person_car.json =============Start Process...============= ffmpeg version 04f5eaa Copyright (c) 2000-2023 the FFmpeg developers built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04) configuration: --prefix=/usr --enable-gpl --enable-version3 --enable-libdrm --enable-rkmpp --enable-rkrga libavutil 58. 29.100 / 58. 29.100 libavcodec 60. 31.102 / 60. 31.102 libavformat 60. 16.100 / 60. 16.100 libavdevice 60. 
3.100 / 60. 3.100 libavfilter 9. 12.100 / 9. 12.100 libswscale 7. 5.100 / 7. 5.100 libswresample 4. 12.100 / 4. 12.100 libpostproc 57. 3.100 / 57. 3.100 ffmpeg version 04f5eaa Copyright (c) 2000-2023 the FFmpeg developers built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04) configuration: --prefix=/usr --enable-gpl --enable-version3 --enable-libdrm --enable-rkmpp --enable-rkrga libavutil 58. 29.100 / 58. 29.100 libavcodec 60. 31.102 / 60. 31.102 libavformat 60. 16.100 / 60. 16.100 libavdevice 60. 3.100 / 60. 3.100 libavfilter 9. 12.100 / 9. 12.100 libswscale 7. 5.100 / 7. 5.100 libswresample 4. 12.100 / 4. 12.100 libpostproc 57. 3.100 / 57. 3.100 W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.190] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.191] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.192] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.248] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.338] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.338] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.339] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.384] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.459] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.459] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.460] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.504] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) 
W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.606] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.606] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.608] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.658] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.761] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.761] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.762] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.814] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.910] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.910] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.912] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.962] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:48.069] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:48.070] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:48.071] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:48.122] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) 
W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:48.228] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:48.228] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:48.229] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:48.280] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) Input #0, rawvideo, from 'pipe:': Duration: N/A, start: 0.000000, bitrate: 442368 kb/s Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 442368 kb/s, 20 tbr, 20 tbn Stream mapping: Stream #0:0 (rawvideo) -> fps:default fps:default -> Stream #0:0 (h264_rkmpp) Output #0, mp4, to '/home/orangepi/workspace/modelbox/output/det_result.mp4': Metadata: encoder : Lavf60.16.100 Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), bgr24(progressive), 1280x720, q=2-31, 2000 kb/s, 20 fps, 10240 tbn Metadata: encoder : Lavc60.31.102 h264_rkmpp Input #0, rawvideo, from 'pipe:': 0kB time=N/A bitrate=N/A speed=N/A Duration: N/A, start: 0.000000, bitrate: 442368 kb/s Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 442368 kb/s, 20 tbr, 20 tbn Stream mapping: Stream #0:0 (rawvideo) -> fps:default fps:default -> Stream #0:0 (h264_rkmpp) Output #0, mp4, to '/home/orangepi/workspace/modelbox/output/track_result.mp4': Metadata: encoder : Lavf60.16.100 Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), bgr24(progressive), 1280x720, q=2-31, 2000 kb/s, 20 fps, 10240 tbn Metadata: encoder : Lavc60.31.102 h264_rkmpp [out#0/mp4 @ 0x558f2625e0] video:2330kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.062495% frame= 132 fps= 19 q=-0.0 Lsize= 2331kB time=00:00:06.55 bitrate=2915.8kbits/s speed=0.924x Exiting normally, received signal 15. [out#0/mp4 @ 0x557e4875e0] video:1990kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.072192% frame= 131 fps= 18 q=-0.0 Lsize= 1991kB time=00:00:06.50 bitrate=2509.6kbits/s speed= 0.9x ==========All Process Finished.=========== Exiting normally, received signal 15.应用推理的帧率取决于视频读取的帧率以及耗时最久的功能单元,实测FPS约为20左右,满足AI实时检测的场景。
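附:图中所有功能单元都遵循 (share_dict, flowunit_data, queue_dict, data) 的统一函数签名——从 single_input 队列取数据、处理后写入 multi_output 各队列,收到 frame 为 None 的数据即退出并向下游透传结束信号。下面给出一个自定义功能单元的最小示意(仅为示意,非仓库源码,节点名 frame_counter 为假设名称),新增节点时按同样方式实现函数,并在图文件中为其分配全局唯一的节点名称和输入输出队列即可:

def frame_counter(share_dict, flowunit_data, queue_dict, data):
    """示例节点:统计经过的帧数并写入共享字典,帧数据原样向下游传递"""
    index = 0
    while True:
        if flowunit_data["single_input"] is not None:
            data = queue_dict[flowunit_data["single_input"]].get()
        else:
            break
        # 图片数据为None就退出循环
        if data["frame"] is None:
            break
        index += 1
        share_dict["frame_count"] = index  # 通过Manager共享字典对外暴露统计值
        for queue_name in flowunit_data["multi_output"]:
            queue_dict[queue_name].put(data)
    # 读取结束,图片数据置为None,通知下游节点退出
    data["frame"] = None
    for queue_name in flowunit_data["multi_output"]:
        queue_dict[queue_name].put(data)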
  • [技术干货] 无人机巡检数据集:空中语义分割
    无人机巡检数据集:空中语义分割由计算机图形与视觉研究所(ICG)开发的语义无人机数据集,旨在推动城市场景的语义理解研究,提升自主无人机飞行与着陆的安全性。以下从数量、类别、分布及分辨率四个核心维度展开说明:一、数据数量与划分该数据集包含600张高分辨率图像,按用途分为训练集和测试集两部分:训练集:400张图像,公开可获取,包含完整的标注信息,支持模型训练与算法验证。测试集:200张图像,为私有数据,主要用于评估模型的泛化能力,确保研究结果的客观性。此外,数据集还提供了丰富的辅助数据,包括1Hz采集的高分辨率图像序列、5Hz的鱼眼立体图像(同步配有IMU测量数据)、1Hz的热成像图像,以及3栋房屋的地面控制点和全站仪获取的3D真值数据,进一步扩展了其应用场景。二、语义类别与标注数据集针对语义分割任务定义了20个核心类别,覆盖城市场景中的典型元素,具体分类如下:自然元素:树(tree)、草(gras)、其他植被(other vegetation)、泥土(dirt)、碎石(gravel)、岩石(rocks)、水(water);人工构造:铺装区域(paved area)、泳池(pool)、屋顶(roof)、墙(wall)、栅栏(fence)、栅栏柱(fence-pole)、窗户(window)、门(door)、障碍物(obstacle);动态目标:人(person)、狗(dog)、汽车(car)、自行车(bicycle)。标注精度达到像素级,确保语义分割任务的准确性;同时,针对人物检测任务,提供了训练集和测试集的边界框(bounding box)标注,支持多任务研究。三、数据分布特点数据集的图像均通过无人机从天底视角(鸟瞰视角)采集,覆盖超过20栋房屋的城市区域,拍摄高度在地面以上5至30米之间,确保场景的真实性与多样性。从分布来看:场景覆盖:包含居民区内的建筑、植被、道路、休闲区域(如泳池)等,兼顾自然与人工环境的混合场景;目标密度:图像中包含不同数量的动态目标(人、动物、车辆)和静态结构,适合测试算法在复杂目标交互场景下的表现;辅助数据分布:热成像、立体图像等辅助数据与主图像时空同步,可用于多模态融合研究,提升模型对环境的感知能力。四、分辨率与数据格式数据集的核心图像采用高分辨率相机采集,单张图像尺寸为6000×4000像素(2400万像素),确保细节信息的完整性,满足精细语义分割的需求。训练集提供多种格式的标注文件,包括:Python pickle格式的边界框数据;可选的XML格式边界框标注;可选的掩码图像(mask images),便于不同算法框架的直接使用(本节末尾附有读取标注的简单示意代码)。五、数据集下载地址# 数据集地址 https://developer.huaweicloud.com/develop/aigallery/dataset/detail?id=0efe9613-5248-4a43-925e-4d6d377b6996综上,该数据集凭借大尺寸、多类别、高精度的特点,为无人机视觉、语义分割、目标检测等领域的研究提供了高质量的基准数据,其丰富的辅助信息也为多模态感知与3D重建等任务奠定了基础。
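附:下面是一个读取该数据集标注文件的简单示意(仅为示意,文件名均为假设值,pickle 的具体数据结构和掩码的存储格式请以数据集官方说明为准):

import cv2
import pickle
import numpy as np

# 读取pickle格式的人物边界框标注(bounding_boxes.pkl为假设文件名)
with open("bounding_boxes.pkl", "rb") as f:
    person_bboxes = pickle.load(f)
print(type(person_bboxes))

# 读取原图与对应的语义掩码(假设掩码以单通道类别ID图保存)
image = cv2.imread("images/000.jpg")
mask = cv2.imread("masks/000.png", cv2.IMREAD_GRAYSCALE)

# 将掩码归一化着色后与原图叠加,便于人工检查标注质量
mask_u8 = cv2.normalize(mask, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
color_mask = cv2.applyColorMap(mask_u8, cv2.COLORMAP_JET)
overlay = cv2.addWeighted(image, 0.6, color_mask, 0.4, 0)
cv2.imwrite("overlay_000.jpg", overlay)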
  • [技术干货] 华为云开发者空间☁️昇腾NPU实现AI工业质检
    华为云开发者空间☁️昇腾NPU实现AI工业质检本案例将在华为云开发者空间工作台⚙️AI Notebook 中使用免费的🎉 昇腾 NPU 910B ✨完成YOLO11模型训练,利用SAHI切片辅助超推理框架实现PCB缺陷检测。1. 下载模型和数据集📦首先在Notebook的代码块中粘贴并运行下面的代码,下载解压本案例所需的训练数据和模型文件:import os import zipfile if not os.path.exists('yolo11_train_ascend.zip'): os.system('wget -q https://orangepi-ascend.obs.cn-north-4.myhuaweicloud.com/yolo11_train_ascend.zip') if not os.path.exists('yolo11_train_ascend'): zip_file = zipfile.ZipFile('yolo11_train_ascend.zip') zip_file.extractall() zip_file.close() 2. 安装依赖包🛠️安装YOLO11所需的依赖包以及SAHI库,构建项目的运行环境:!pip install ultralytics==8.3.160 ultralytics-thop==2.0.14 sahi==0.11.26 numpy==1.26.4 Defaulting to user installation because normal site-packages is not writeable Looking in indexes: https://mirrors.huaweicloud.com/repository/pypi/simple Collecting ultralytics==8.3.160 Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/7b/8d/924524ff26c0ed0ba43b90cc598887e2b06f3bf00dd51a505a754ecb138d/ultralytics-8.3.160-py3-none-any.whl (1.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 9.8 MB/s eta 0:00:00 Collecting ultralytics-thop==2.0.14 Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/a6/10/251f036b4c5d77249f9a119cc89dafe8745dc1ad1f1a5f06b6a3988ca454/ultralytics_thop-2.0.14-py3-none-any.whl (26 kB) Collecting sahi==0.11.26 Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/5e/8a/9782c8088af52e6f41fee59c77b5117783c0d6eafde45c96ca3912ec197f/sahi-0.11.26-py3-none-any.whl (115 kB) Collecting numpy==1.26.4 Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/fc/a5/4beee6488160798683eed5bdb7eead455892c3b4e1f78d79d8d3f3b084ac/numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.2/14.2 MB 42.9 MB/s eta 0:00:00 0:00:01 Requirement already satisfied: matplotlib>=3.3.0 in /home/service/.local/lib/python3.10/site-packages (from ultralytics==8.3.160) (3.10.0) Requirement already satisfied: opencv-python>=4.6.0 in /home/service/.local/lib/python3.10/site-packages (from ultralytics==8.3.160) (4.10.0.84) Requirement already satisfied: pillow>=7.1.2 in /usr/local/python3.10/lib/python3.10/site-packages (from ultralytics==8.3.160) (11.0.0) Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/python3.10/lib/python3.10/site-packages (from ultralytics==8.3.160) (6.0.2) Requirement already satisfied: requests>=2.23.0 in /home/service/.local/lib/python3.10/site-packages (from ultralytics==8.3.160) (2.32.3) Requirement already satisfied: scipy>=1.4.1 in /usr/local/python3.10/lib/python3.10/site-packages (from ultralytics==8.3.160) (1.14.1) Requirement already satisfied: torch>=1.8.0 in /usr/local/python3.10/lib/python3.10/site-packages (from ultralytics==8.3.160) (2.1.0) Requirement already satisfied: torchvision>=0.9.0 in /home/service/.local/lib/python3.10/site-packages (from ultralytics==8.3.160) (0.16.0) Requirement already satisfied: tqdm>=4.64.0 in /usr/local/python3.10/lib/python3.10/site-packages (from ultralytics==8.3.160) (4.67.1) Requirement already satisfied: psutil in /home/service/.local/lib/python3.10/site-packages (from ultralytics==8.3.160) (5.9.8) Requirement already satisfied: py-cpuinfo in /home/service/.local/lib/python3.10/site-packages (from ultralytics==8.3.160) (9.0.0) Requirement already satisfied: pandas>=1.1.4 in /usr/local/python3.10/lib/python3.10/site-packages (from ultralytics==8.3.160) (2.2.3) Requirement already satisfied: click in /usr/local/python3.10/lib/python3.10/site-packages (from sahi==0.11.26) 
(8.1.8) Collecting fire (from sahi==0.11.26) Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/6b/b6/82c7e601d6d3c3278c40b7bd35e17e82aa227f050aa9f66cb7b7fce29471/fire-0.7.0.tar.gz (87 kB) Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Collecting pybboxes==0.1.6 (from sahi==0.11.26) Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/3c/3f/46f6613b41a3c2b4f7af3b526035771ca5bb12d6fdf3b23145899f785e36/pybboxes-0.1.6-py3-none-any.whl (24 kB) Collecting shapely>=2.0.0 (from sahi==0.11.26) Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/29/51/0b158a261df94e33505eadfe737db9531f346dfa60850945ad25fd4162f1/shapely-2.1.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 22.0 MB/s eta 0:00:00 Collecting terminaltables (from sahi==0.11.26) Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/c4/fb/ea621e0a19733e01fe4005d46087d383693c0f4a8f824b47d8d4122c87e0/terminaltables-3.1.10-py2.py3-none-any.whl (15 kB) Requirement already satisfied: contourpy>=1.0.1 in /home/service/.local/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (1.3.1) Requirement already satisfied: cycler>=0.10 in /home/service/.local/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in /home/service/.local/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (4.55.3) Requirement already satisfied: kiwisolver>=1.3.1 in /home/service/.local/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (1.4.8) Requirement already satisfied: packaging>=20.0 in /usr/local/python3.10/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (24.2) Requirement already satisfied: pyparsing>=2.3.1 in /home/service/.local/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (3.2.0) Requirement already satisfied: python-dateutil>=2.7 in /usr/local/python3.10/lib/python3.10/site-packages (from matplotlib>=3.3.0->ultralytics==8.3.160) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in /usr/local/python3.10/lib/python3.10/site-packages (from pandas>=1.1.4->ultralytics==8.3.160) (2024.2) Requirement already satisfied: tzdata>=2022.7 in /usr/local/python3.10/lib/python3.10/site-packages (from pandas>=1.1.4->ultralytics==8.3.160) (2024.2) Requirement already satisfied: charset-normalizer<4,>=2 in /home/service/.local/lib/python3.10/site-packages (from requests>=2.23.0->ultralytics==8.3.160) (3.4.1) Requirement already satisfied: idna<4,>=2.5 in /usr/local/python3.10/lib/python3.10/site-packages (from requests>=2.23.0->ultralytics==8.3.160) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /home/service/.local/lib/python3.10/site-packages (from requests>=2.23.0->ultralytics==8.3.160) (2.3.0) Requirement already satisfied: certifi>=2017.4.17 in /home/service/.local/lib/python3.10/site-packages (from requests>=2.23.0->ultralytics==8.3.160) (2024.12.14) Requirement already satisfied: filelock in /usr/local/python3.10/lib/python3.10/site-packages (from torch>=1.8.0->ultralytics==8.3.160) (3.16.1) Requirement already satisfied: typing-extensions in /usr/local/python3.10/lib/python3.10/site-packages (from torch>=1.8.0->ultralytics==8.3.160) (4.12.2) Requirement already satisfied: sympy in 
/usr/local/python3.10/lib/python3.10/site-packages (from torch>=1.8.0->ultralytics==8.3.160) (1.13.3) Requirement already satisfied: networkx in /usr/local/python3.10/lib/python3.10/site-packages (from torch>=1.8.0->ultralytics==8.3.160) (3.4.2) Requirement already satisfied: jinja2 in /usr/local/python3.10/lib/python3.10/site-packages (from torch>=1.8.0->ultralytics==8.3.160) (3.1.5) Requirement already satisfied: fsspec in /home/service/.local/lib/python3.10/site-packages (from torch>=1.8.0->ultralytics==8.3.160) (2024.9.0) Collecting termcolor (from fire->sahi==0.11.26) Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/4f/bd/de8d508070629b6d84a30d01d57e4a65c69aa7f5abe7560b8fad3b50ea59/termcolor-3.1.0-py3-none-any.whl (7.7 kB) Requirement already satisfied: six>=1.5 in /usr/local/python3.10/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib>=3.3.0->ultralytics==8.3.160) (1.16.0) Requirement already satisfied: MarkupSafe>=2.0 in /home/service/.local/lib/python3.10/site-packages (from jinja2->torch>=1.8.0->ultralytics==8.3.160) (3.0.2) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/python3.10/lib/python3.10/site-packages (from sympy->torch>=1.8.0->ultralytics==8.3.160) (1.3.0) Building wheels for collected packages: fire Building wheel for fire (pyproject.toml) ... done Created wheel for fire: filename=fire-0.7.0-py3-none-any.whl size=114330 sha256=a1f27d511635da524f8f51fa2d35ae22862e400cec55285acbc05ced6ef91371 Stored in directory: /home/service/.cache/pip/wheels/9b/dc/c7/06491fe82713723ab64494dbcfd521bdbe80cf26b5fcb5f564 Successfully built fire Installing collected packages: terminaltables, termcolor, numpy, shapely, pybboxes, fire, ultralytics-thop, sahi, ultralytics Attempting uninstall: numpy Found existing installation: numpy 1.24.4 Uninstalling numpy-1.24.4: Successfully uninstalled numpy-1.24.4 WARNING: The script f2py is installed in '/home/service/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. WARNING: The script sahi is installed in '/home/service/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. WARNING: The scripts ultralytics and yolo are installed in '/home/service/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. gradio 5.9.1 requires markupsafe~=2.0, but you have markupsafe 3.0.2 which is incompatible. openmind 0.9.1 requires datasets<=2.21.0,>=2.18.0, but you have datasets 3.2.0 which is incompatible. openmind 0.9.1 requires openmind-hub==0.9.0, but you have openmind-hub 0.9.1 which is incompatible. openmind-datasets 0.7.1 requires datasets==2.18.0, but you have datasets 3.2.0 which is incompatible. openmind-evaluate 0.7.0 requires datasets==2.18.0, but you have datasets 3.2.0 which is incompatible. Successfully installed fire-0.7.0 numpy-1.26.4 pybboxes-0.1.6 sahi-0.11.26 shapely-2.1.1 termcolor-3.1.0 terminaltables-3.1.10 ultralytics-8.3.160 ultralytics-thop-2.0.14 [notice] A new release of pip is available: 24.3.1 -> 25.1.1 [notice] To update, run: pip install --upgrade pip3. 
修改配置文件📝我们在配置文件中指定数据集路径和类别等信息,用于后续模型的训练:%%writefile yolo11_train_ascend/pcb.yaml # Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] path: /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced # dataset root dir (absolute path) train: images/train # train images (relative to 'path') val: images/val # val images (relative to 'path') test: # test images (optional) # Classes,类别 names: 0: mouse_bite 1: open_circuit 2: short 3: spur 4: spurious_copperWriting yolo11_train_ascend/pcb.yaml4. 下载 Arial.ttf 字体🖋️为了避免影响训练进展,可以先提前下载字体文件并拷贝到 /home/service/.config/Ultralytics 路径下。!wget https://orangepi-ascend.obs.cn-north-4.myhuaweicloud.com/Arial.ttf !mkdir -p /home/service/.config/Ultralytics !cp Arial.ttf /home/service/.config/Ultralytics/Arial.ttf--2025-06-28 05:55:59-- https://pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com/Arial.ttf Resolving pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com (pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com)... 100.125.6.3, 100.125.7.3, 100.125.6.131 Connecting to pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com (pcb-sahi-public.obs.cn-southwest-2.myhuaweicloud.com)|100.125.6.3|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 773236 (755K) [application/x-font-ttf] Saving to: 'Arial.ttf' Arial.ttf 100%[===================>] 755.11K --.-KB/s in 0.004s 2025-06-28 05:55:59 (188 MB/s) - 'Arial.ttf' saved [773236/773236] 5. 模型训练🧠🔥我们使用yolo11n.pt预训练模型,利用昇腾NPU进行模型加速,设置模型的训练次数为10轮、图像的大小为640x640、开启8个数据加载线程每次送入模型32张图像进行迭代优化。%cd yolo11_train_ascend import torch import torch_npu from torch_npu.contrib import transfer_to_npu from ultralytics import YOLO # Load a model model = YOLO('yolo11n.pt') # load a pretrained model (recommended for training) # Train the model results = model.train(data='pcb.yaml', epochs=10, imgsz=640, workers=8, batch=32) %cd .. /home/service/.local/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library. self.shell.db['dhist'] = compress_dhist(dhist)[-100:] /opt/huawei/edu-apaas/src/init/yolo11_train_ascend /home/service/.local/lib/python3.10/site-packages/torch_npu/utils/path_manager.py:82: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/latest owner does not match the current user. warnings.warn(f"Warning: The {path} owner does not match the current user.") /home/service/.local/lib/python3.10/site-packages/torch_npu/utils/path_manager.py:82: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/8.0.RC3/aarch64-linux/ascend_toolkit_install.info owner does not match the current user. warnings.warn(f"Warning: The {path} owner does not match the current user.") /home/service/.local/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:301: ImportWarning: ************************************************************************************************************* The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now.. The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now.. The backend in torch.distributed.init_process_group set to hccl now.. The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now.. 
The device parameters have been replaced with npu in the function below: torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty ************************************************************************************************************* warnings.warn(msg, ImportWarning) /home/service/.local/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu. warnings.warn(msg, RuntimeWarning) Creating new Ultralytics Settings v0.0.6 file ✅ View Ultralytics Settings with 'yolo settings' or at '/home/service/.config/Ultralytics/settings.json' Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings. [W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. 
(function operator()) Ultralytics 8.3.160 🚀 Python-3.10.15 torch-2.1.0 CUDA:0 (Ascend910B3, 62432MiB) engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=32, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=pcb.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/train, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=True, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None Overriding model.yaml nc=80 with nc=5 from n params module arguments 0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2] 1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2] 2 -1 1 6640 ultralytics.nn.modules.block.C3k2 [32, 64, 1, False, 0.25] 3 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2] 4 -1 1 26080 ultralytics.nn.modules.block.C3k2 [64, 128, 1, False, 0.25] 5 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2] 6 -1 1 87040 ultralytics.nn.modules.block.C3k2 [128, 128, 1, True] 7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2] 8 -1 1 346112 ultralytics.nn.modules.block.C3k2 [256, 256, 1, True] 9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5] 10 -1 1 249728 ultralytics.nn.modules.block.C2PSA [256, 256, 1] 11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] 12 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1] 13 -1 1 111296 ultralytics.nn.modules.block.C3k2 [384, 128, 1, False] 14 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] 15 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1] 16 -1 1 32096 ultralytics.nn.modules.block.C3k2 [256, 64, 1, False] 17 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2] 18 [-1, 13] 1 0 ultralytics.nn.modules.conv.Concat [1] 19 -1 1 86720 ultralytics.nn.modules.block.C3k2 [192, 128, 1, False] 20 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2] 21 [-1, 10] 1 0 ultralytics.nn.modules.conv.Concat [1] 22 -1 1 378880 ultralytics.nn.modules.block.C3k2 [384, 256, 1, True] 23 [16, 19, 22] 1 431647 ultralytics.nn.modules.head.Detect [5, [64, 128, 256]] /home/service/.local/lib/python3.10/site-packages/torch_npu/utils/storage.py:38: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() if self.device.type != 'cpu': YOLO11n summary: 181 layers, 2,590,815 parameters, 2,590,799 gradients, 6.4 GFLOPs Transferred 448/499 items from pretrained weights Freezing layer 'model.23.dfl.conv.weight' AMP: running Automatic Mixed Precision (AMP) checks... [W compiler_depend.ts:51] Warning: CAUTION: The operator 'torchvision::nms' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) AMP: checks passed ✅ train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 620.7±42.3 MB/s, size: 454.2 KB) train: Scanning /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/train... 4646 images, 0 backgrounds, 0 corrupt: 100%|██████████| 4646/4646 [00:05<00:00, 848.20it/s] train: New cache created: /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/train.cache val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 471.4±135.5 MB/s, size: 448.2 KB) val: Scanning /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/val... 422 images, 0 backgrounds, 0 corrupt: 100%|██████████| 422/422 [00:00<00:00, 520.44it/s] val: New cache created: /opt/huawei/edu-apaas/src/init/yolo11_train_ascend/pcb_sliced/labels/val.cache Plotting labels to runs/detect/train/labels.jpg... optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... optimizer: AdamW(lr=0.001111, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0) Image sizes 640 train, 640 val Using 8 dataloader workers Logging results to runs/detect/train Starting training for 10 epochs... Closing dataloader mosaic Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 0%| | 0/146 [00:00<?, ?it/s] . /home/service/.local/lib/python3.10/site-packages/ultralytics/utils/tal.py:274: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.) target_scores = torch.where(fg_scores_mask > 0, target_scores, 0) [W compiler_depend.ts:103] Warning: Non finite check and unscale on NPU device! (function operator()) 1/10 7.77G 2.238 5.333 1.761 8 640: 100%|██████████| 146/146 [01:31<00:00, 1.60it/s] Class Images Instances Box(P R mAP50 mAP50-95): 0%| | 0/7 [00:00<?, ?it/s] ..... 
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:58<00:00, 8.39s/it] all 422 604 0.39 0.0656 0.0888 0.023 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 2/10 8.2G 1.876 2.724 1.462 8 640: 100%|██████████| 146/146 [01:16<00:00, 1.92it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.58it/s] all 422 604 0.451 0.238 0.214 0.0639 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 3/10 8.2G 1.825 1.912 1.445 8 640: 100%|██████████| 146/146 [01:12<00:00, 2.00it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.64it/s] all 422 604 0.339 0.291 0.244 0.0742 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 4/10 8.2G 1.748 1.571 1.398 4 640: 100%|██████████| 146/146 [01:12<00:00, 2.02it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.58it/s] all 422 604 0.409 0.361 0.335 0.117 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 5/10 8.2G 1.703 1.343 1.372 6 640: 100%|██████████| 146/146 [01:11<00:00, 2.05it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.66it/s] all 422 604 0.442 0.34 0.321 0.118 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 6/10 8.2G 1.673 1.26 1.343 5 640: 100%|██████████| 146/146 [01:11<00:00, 2.03it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.62it/s] all 422 604 0.605 0.49 0.53 0.224 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 7/10 8.2G 1.614 1.145 1.316 6 640: 100%|██████████| 146/146 [01:12<00:00, 2.00it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.58it/s] all 422 604 0.595 0.542 0.525 0.206 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 8/10 8.2G 1.578 1.067 1.294 7 640: 100%|██████████| 146/146 [01:11<00:00, 2.03it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.73it/s] all 422 604 0.754 0.629 0.685 0.307 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 9/10 8.21G 1.551 1.009 1.275 8 640: 100%|██████████| 146/146 [01:11<00:00, 2.04it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:04<00:00, 1.57it/s] all 422 604 0.782 0.618 0.703 0.315 Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size 10/10 8.21G 1.5 0.9621 1.255 6 640: 100%|██████████| 146/146 [01:12<00:00, 2.02it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:03<00:00, 1.83it/s] all 422 604 0.8 0.661 0.732 0.354 10 epochs completed in 0.236 hours. Optimizer stripped from runs/detect/train/weights/last.pt, 5.5MB Optimizer stripped from runs/detect/train/weights/best.pt, 5.5MB Validating runs/detect/train/weights/best.pt... Ultralytics 8.3.160 🚀 Python-3.10.15 torch-2.1.0 CUDA:0 (Ascend910B3, 62432MiB) YOLO11n summary (fused): 100 layers, 2,583,127 parameters, 0 gradients, 6.3 GFLOPs ... Class Images Instances Box(P R mAP50 mAP50-95): 0%| | 0/7 [00:00<?, ?it/s] . 
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 7/7 [00:07<00:00, 1.14s/it] all 422 604 0.799 0.663 0.732 0.355 mouse_bite 107 169 0.806 0.785 0.829 0.4 open_circuit 73 101 0.656 0.471 0.492 0.219 short 69 87 0.889 0.54 0.701 0.314 spur 95 134 0.864 0.714 0.76 0.342 spurious_copper 95 113 0.782 0.805 0.88 0.5 Speed: 0.1ms preprocess, 8.3ms inference, 0.0ms loss, 2.5ms postprocess per image Results saved to runs/detect/train /opt/huawei/edu-apaas/src/init /home/service/.local/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library. self.shell.db['dhist'] = compress_dhist(dhist)[-100:] 模型训练好后,可以在runs/detect/train目录下查看训练结果,例如损失函数的变化曲线、mAP等评价指标📈💪。6. 图像切分检测✂️🔍最后我们利用SAHI框架对高清PCB图像进行切片推理,从而更精准地检测出PCB的瑕疵类别。import torch import torch_npu from torch_npu.contrib import transfer_to_npu from sahi.predict import get_sliced_prediction from sahi import AutoDetectionModel from PIL import Image detection_model = AutoDetectionModel.from_pretrained( model_type = 'ultralytics', model_path = "yolo11_train_ascend/runs/detect/train/weights/best.pt", confidence_threshold = 0.4, device = "cuda:0" ) 这里我们使用滑窗检测🔍的技术,将原始图像切分成640x640大小的子图🖼️,同时设置一定的重叠度,再分别预测每张子图,最后将所有的检测结果进行合并处理🛠️。image_path = "https://orangepi-ascend.obs.cn-north-4.myhuaweicloud.com/001.bmp" result = get_sliced_prediction( image_path, detection_model, slice_height = 640, slice_width = 640, overlap_height_ratio = 0.1, overlap_width_ratio = 0.1, perform_standard_pred = False, postprocess_class_agnostic = True, postprocess_match_threshold = 0.1, ) result.export_visuals(export_dir="output/", file_name="sliced_result") Image.open("output/sliced_result.png") Performing prediction on 24 slices.可以看到,模型准确无误的预测出PCB缺陷的位置、类别和置信度😄7. 小结📌本案例借助华为云开发者空间💡昇腾910B NPU完成YOLO11模型训练与PCB缺陷检测,并且结合SAHI实现高效切片推理🚀,华为云开发者空间💻AI Notebook开箱即用,大家快来体验吧!🤗 ----转自博客:https://bbs.huaweicloud.com/blogs/455280
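补充:除导出可视化结果外,get_sliced_prediction 返回的 result 也可以直接取出结构化的检测结果,用于后续统计或落盘,下面是一个简单示意(基于 sahi 的通用接口,具体字段请以所用版本的文档为准):

# 将切片推理结果转换为COCO风格的标注列表
coco_annotations = result.to_coco_annotations()
for ann in coco_annotations:
    # 每条结果通常包含类别名称、置信度和[x, y, w, h]形式的边界框
    print(ann["category_name"], round(ann["score"], 3), ann["bbox"])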