-
RK3588部署CNN-LSTM驾驶行为识别模型CNN(卷积神经网络)擅长提取图像的空间特征,LSTM(长短期记忆网络)则擅长处理序列数据的时间特征。首先使用CNN提取视频每一帧特征,之后将提取出的所有特征送入LSTM捕捉视频中的时空特征并对视频特征序列进行分类,实现正常驾驶、闭眼、打哈欠、打电话、左顾右盼5种驾驶行为的识别。一. 模型训练我们在ModelArts创建Notebook完成模型的训练,使用规格是GPU: 1*Pnt1(16GB)|CPU: 8核 64GB,镜像为tensorflow_2.1.0-cuda_10.1-py_3.7-ubuntu_18.04,首先下载数据集:import os import moxing as mox if not os.path.exists('fatigue_driving'): mox.file.copy_parallel('obs://modelbox-course/fatigue_driving', 'fatigue_driving') if not os.path.exists('rknn_toolkit2-2.3.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl'): mox.file.copy_parallel('obs://modelbox-course/rknn_toolkit2-2.3.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', 'rknn_toolkit2-2.3.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl') 该数据集包含1525段视频,总共有5个类别:0:正常驾驶、1:闭眼、2:打哈欠、3:打电话、4:左顾右盼我们从原视频中裁剪出主驾驶位画面,并将画面缩放到特征提取网络的输入大小:def crop_driving_square(frame): h, w = frame.shape[:2] start_x = w // 2 end_x = w start_y = 0 end_y = h return frame[start_y:end_y, start_x:end_x] 使用在imagenet上预训练的MobileNetV2网络作为卷积基创建并保存图像特征提取器:def get_feature_extractor(): feature_extractor = keras.applications.mobilenet_v2.MobileNetV2( weights = 'imagenet', include_top = False, pooling = 'avg', input_shape = (IMG_SIZE, IMG_SIZE, 3) ) preprocess_input = keras.applications.mobilenet_v2.preprocess_input inputs = keras.Input((IMG_SIZE, IMG_SIZE, 3)) preprocessed = preprocess_input(inputs) outputs = feature_extractor(preprocessed) model = keras.Model(inputs, outputs, name = 'feature_extractor') return model feature_extractor = get_feature_extractor() feature_extractor.save('feature_extractor') feature_extractor.summary() Model: "feature_extractor" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_2 (InputLayer) [(None, 256, 256, 3)] 0 _________________________________________________________________ tf_op_layer_truediv (TensorF [(None, 256, 256, 3)] 0 _________________________________________________________________ tf_op_layer_sub (TensorFlowO [(None, 256, 256, 3)] 0 _________________________________________________________________ mobilenetv2_1.00_224 (Model) (None, 1280) 2257984 ================================================================= Total params: 2,257,984 Trainable params: 2,223,872 Non-trainable params: 34,112 设置网络的输入大小为256x256,每隔6帧截取一帧提取视频的图像特征,特征向量的大小为1280,最终得到每个视频的特征序列,序列的最大长度为40,不足用0补齐:def load_video(file_name): cap = cv2.VideoCapture(file_name) frame_interval = 6 frames = [] count = 0 while True: ret, frame = cap.read() if not ret: break if count % frame_interval == 0: frame = crop_driving_square(frame) frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE)) frame = frame[:, :, [2, 1, 0]] frames.append(frame) count += 1 return np.array(frames) def load_data(videos, labels): video_features = [] for video in tqdm(videos): frames = load_video(video) counts = len(frames) # 如果帧数小于MAX_SEQUENCE_LENGTH if counts < MAX_SEQUENCE_LENGTH: # 补白 diff = MAX_SEQUENCE_LENGTH - counts # 创建全0的numpy数组 padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3)) # 数组拼接 frames = np.concatenate((frames, padding)) # 获取前MAX_SEQUENCE_LENGTH帧画面 frames = frames[:MAX_SEQUENCE_LENGTH, :] # 批量提取图像特征 video_feature = feature_extractor.predict(frames) video_features.append(video_feature) return np.array(video_features), np.array(labels) video_features, classes = load_data(videos, labels) video_features.shape, classes.shape((1525, 40, 1280), (1525,)) 总共提取了1525个视频的特征序列,按照8:2的比例划分训练集和测试集(batchsize的大小设为16):batch_size = 16 
dataset = tf.data.Dataset.from_tensor_slices((video_features, classes)) dataset = dataset.shuffle(len(videos)) test_count = int(len(videos) * 0.2) train_count = len(videos) - test_count dataset_train = dataset.skip(test_count).cache().repeat() dataset_test = dataset.take(test_count).cache().repeat() train_dataset = dataset_train.shuffle(train_count).batch(batch_size) test_dataset = dataset_test.shuffle(test_count).batch(batch_size) train_dataset, train_count, test_dataset, test_count(<BatchDataset shapes: ((None, 40, 1280), (None,)), types: (tf.float32, tf.int64)>, 1220, <BatchDataset shapes: ((None, 40, 1280), (None,)), types: (tf.float32, tf.int64)>, 305) 之后创建LSTM提取视频特征序列的时间信息送入Dense分类器,模型的定义如下:def video_cls_model(class_vocab): # 类别数量 classes_num = len(class_vocab) # 定义模型 model = keras.Sequential([ layers.Input(shape=(MAX_SEQUENCE_LENGTH, NUM_FEATURES)), layers.LSTM(64, return_sequences=True), layers.Flatten(), layers.Dense(classes_num, activation='softmax') ]) # 编译模型 model.compile(optimizer = keras.optimizers.Adam(1e-5), loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'] ) return model # 模型实例化 model = video_cls_model(np.unique(labels)) # 保存检查点 checkpoint = keras.callbacks.ModelCheckpoint(filepath='best.h5', monitor='val_loss', save_weights_only=True, save_best_only=True, verbose=1, mode='min') # 模型结构 model.summary() 网络的输入大小为(N, 40, 1280),使用softmax进行激活,输出5个类别的概率:Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= lstm (LSTM) (None, 40, 64) 344320 _________________________________________________________________ flatten (Flatten) (None, 2560) 0 _________________________________________________________________ dense (Dense) (None, 5) 12805 ================================================================= Total params: 357,125 Trainable params: 357,125 Non-trainable params: 0 _________________________________________________________________实验表明模型训练300个Epoch基本收敛:history = model.fit(train_dataset, epochs = 300, steps_per_epoch = train_count // batch_size, validation_steps = test_count // batch_size, validation_data = test_dataset, callbacks=[checkpoint]) plt.plot(history.epoch, history.history['loss'], 'r', label='loss') plt.plot(history.epoch, history.history['val_loss'], 'g--', label='val_loss') plt.title('LSTM') plt.xlabel('Epoch') plt.ylabel('Loss') plt.legend() plt.plot(history.epoch, history.history['accuracy'], 'r', label='acc') plt.plot(history.epoch, history.history['val_accuracy'], 'g--', label='val_acc') plt.title('LSTM') plt.xlabel('Epoch') plt.ylabel('Accuracy') plt.legend() 加载模型最优权重,模型在测试集上的分类准确率为95.8%,保存为saved_model格式:model.load_weights('best.h5') model.evaluate(dataset.batch(batch_size)) model.save('saved_model') 96/96 [==============================] - 0s 5ms/step - loss: 0.2169 - accuracy: 0.9580 [0.21687692414949802, 0.9580328] 二、模型转换首先将图像特征提取器feature_extractor转为tflite格式,并开启模型量化:import tensorflow as tf converter = tf.lite.TFLiteConverter.from_saved_model('feature_extractor') converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS] converter.post_training_quantize = True # 模型量化 tflite_model = converter.convert() with open('mbv2.tflite', 'wb') as f: f.write(tflite_model) 再将视频序列分类模型转为onnx格式,由于lstm参数量较少,不需要进行量化:python -m tf2onnx.convert --saved-model saved_model --output lstm.onnx --opset 12 最后导出RKNN格式的模型,可根据需要设置target_platform为rk3568/rk3588:from rknn.api import 
RKNN rknn = RKNN(verbose=False) rknn.config(target_platform="rk3588") rknn.load_tflite(model="mbv2.tflite") rknn.build(do_quantization=False) rknn.export_rknn('mbv2.rknn') rknn.release() rknn = RKNN(verbose=False) rknn.config(target_platform="rk3588") rknn.load_onnx( model="lstm.onnx", inputs=['input_3'], # 输入节点名称 input_size_list=[[1, 40, 1280]] # 固定输入尺寸 ) rknn.build(do_quantization=False) rknn.export_rknn('lstm.rknn') rknn.release() 三、模型部署我们在RK3588上部署MobileNetV2和LSTM模型,以下是板侧的推理代码:import os import cv2 import glob import shutil import imageio import numpy as np from IPython.display import Image from rknnlite.api import RKNNLite MAX_SEQUENCE_LENGTH = 40 IMG_SIZE = 256 NUM_FEATURES = 1280 def crop_driving_square(img): h, w = img.shape[:2] start_x = w // 2 end_x = w start_y = 0 end_y = h result = img[start_y:end_y, start_x:end_x] return result def load_video(file_name): cap = cv2.VideoCapture(file_name) # 每隔多少帧抽取一次 frame_interval = 6 frames = [] count = 0 while True: ret, frame = cap.read() if not ret: break # 每隔frame_interval帧保存一次 if count % frame_interval == 0: # 中心裁剪 frame = crop_driving_square(frame) # 缩放 frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE)) # BGR -> RGB [0,1,2] -> [2,1,0] frame = frame[:, :, [2, 1, 0]] frames.append(frame) count += 1 cap.release() return np.array(frames).astype(np.uint8) # 获取视频特征序列 def getVideoFeat(frames): frames_count = len(frames) # 如果帧数小于MAX_SEQUENCE_LENGTH if frames_count < MAX_SEQUENCE_LENGTH: # 补白 diff = MAX_SEQUENCE_LENGTH - frames_count # 创建全0的numpy数组 padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3)) # 数组拼接 frames = np.concatenate((frames, padding)) # 取前MAX_SEQ_LENGTH帧 frames = frames[:MAX_SEQUENCE_LENGTH,:] frames = frames.astype(np.float32) # 提取视频每一帧特征 feats = [] for frame in frames: frame = np.expand_dims(frame, axis=0) result = rknn_lite_mbv2.inference(inputs=[frame]) feats.append(result[0]) return feats rknn_lite_mbv2 = RKNNLite() rknn_lite_lstm = RKNNLite() rknn_lite_mbv2.load_rknn('model/mbv2.rknn') rknn_lite_lstm.load_rknn('model/lstm.rknn') rknn_lite_mbv2.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) rknn_lite_lstm.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) files = glob.glob("video/*.mp4") for video_path in files: label_to_name = {0:'正常驾驶', 1:'闭眼', 2:'打哈欠', 3:'打电话', 4:'左顾右盼'} frames = load_video(video_path) frames = frames[:MAX_SEQUENCE_LENGTH] imageio.mimsave('test.gif', frames, durations=10, loop=0) display(Image(open('test.gif', 'rb').read())) feats = getVideoFeat(frames) feats = np.concatenate(feats, axis=0) feats = np.expand_dims(feats, axis=0) preds = rknn_lite_lstm.inference(inputs=[feats])[0][0] for i in np.argsort(preds)[::-1][:5]: print('{}: {}%'.format(label_to_name[i], round(preds[i]*100, 2))) rknn_lite_mbv2.release() rknn_lite_lstm.release() 最终的视频识别效果如下:🚀四、本文小结本文详细阐述了基于RK3588平台的CNN-LSTM驾驶行为识别模型全流程,利用MobileNetV2提取图像的空间特征、LSTM处理视频的时序特征完成对正常驾驶、闭眼、打哈欠、打电话和左顾右盼5类驾驶行为的精准识别,在ModelArts上训练达到95.8%分类准确率,并分别将mbv2.tflite和lstm.onnx转换为RKNN格式实现板侧的高效推理部署。
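作为延伸,上述板侧推理也可以改造成滑动窗口式的"准实时"识别:每隔若干帧用MobileNetV2提取一次特征,维护一个长度为MAX_SEQUENCE_LENGTH的特征队列,每次更新后送入LSTM得到当前的行为类别。以下只是一段示意代码,摄像头编号、模型路径均为假设,RKNNLite的加载方式与前文一致;另外训练时补白用的是全0图像,这里为简化直接补全0特征向量,仅供参考:
import cv2
import numpy as np
from collections import deque
from rknnlite.api import RKNNLite

MAX_SEQUENCE_LENGTH = 40
IMG_SIZE = 256
NUM_FEATURES = 1280
FRAME_INTERVAL = 6
label_to_name = {0: '正常驾驶', 1: '闭眼', 2: '打哈欠', 3: '打电话', 4: '左顾右盼'}

# 按前文方式加载两个RKNN模型(路径为假设)
rknn_lite_mbv2 = RKNNLite()
rknn_lite_lstm = RKNNLite()
rknn_lite_mbv2.load_rknn('model/mbv2.rknn')
rknn_lite_lstm.load_rknn('model/lstm.rknn')
rknn_lite_mbv2.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
rknn_lite_lstm.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)

feat_queue = deque(maxlen=MAX_SEQUENCE_LENGTH)  # 滑动特征队列,只保留最近40次抽帧的特征
cap = cv2.VideoCapture(0)                       # 摄像头编号为假设
count = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    if count % FRAME_INTERVAL == 0:
        # 与训练时保持一致:裁剪主驾驶位、缩放到256x256、BGR转RGB
        h, w = frame.shape[:2]
        roi = frame[:, w // 2:]
        roi = cv2.resize(roi, (IMG_SIZE, IMG_SIZE))[:, :, ::-1]
        roi = np.expand_dims(roi, axis=0).astype(np.float32)
        feat = rknn_lite_mbv2.inference(inputs=[roi])[0]     # 形状约为(1, 1280)
        feat_queue.append(feat)
        # 不足MAX_SEQUENCE_LENGTH时补0(训练时补的是全0图像的特征,这里仅作近似)
        feats = np.concatenate(list(feat_queue), axis=0)
        if feats.shape[0] < MAX_SEQUENCE_LENGTH:
            pad = np.zeros((MAX_SEQUENCE_LENGTH - feats.shape[0], NUM_FEATURES), dtype=np.float32)
            feats = np.concatenate((feats, pad), axis=0)
        preds = rknn_lite_lstm.inference(inputs=[np.expand_dims(feats, axis=0)])[0][0]
        print(label_to_name[int(np.argmax(preds))], float(np.max(preds)))
    count += 1
cap.release()
rknn_lite_mbv2.release()
rknn_lite_lstm.release()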
-
在CUDA编程中,一个CUDA Kernel由众多线程(threads)组成,而这些线程又可以被组织成一个或多个线程块(block)。在同一线程块中,线程ID是从0开始连续编号的,可以通过内置变量threadIdx来获取:

// 获取本线程的索引,blockIdx 指的是线程块的索引,blockDim 指的是线程块的大小,threadIdx 指的是本线程块中的线程索引
int tid = blockIdx.x * blockDim.x + threadIdx.x;

以对图像的归一化处理为例,需要对图片中每一个像素点的三个通道值分别除以255。相比于使用CPU进行串行计算,我们可以使用CUDA核函数创建更多的线程和线程块来充分利用GPU的并行处理能力:

// 计算需要的线程总量(高度 x 宽度):640*640=409600
int jobs = dst_height * dst_width;
// 一个线程块包含256个线程
int threads = 256;
// 计算线程块的数量(向上取整)
int blocks = ceil(jobs / (float)threads);
// 调用kernel函数
preprocess_kernel<<<blocks, threads>>>(
    img_buffer_device, dst, dst_width, dst_height, jobs); // 函数的参数

这里我们定义每个线程块的线程数量为256,线程块的数量为ceil(jobs / (float)threads),线程总量要大于等于图片的像素数量。当启动Kernel函数时,GPU上的每个线程都会执行相同的程序代码,从而实现更高效的并行计算,函数具体实现如下:

// 一个线程处理一个像素点
__global__ void preprocess_kernel(
    uint8_t *src, float *dst, int dst_width, int dst_height, int edge) {
  int tid = blockDim.x * blockIdx.x + threadIdx.x;
  if (tid >= edge) return;

  int dx = tid % dst_width; // 计算当前线程对应的目标图像的x坐标
  int dy = tid / dst_width; // 计算当前线程对应的目标图像的y坐标

  // normalization(对原图中(x,y)坐标的像素点3个通道进行归一化)
  float c0 = src[dy * dst_width * 3 + dx * 3 + 0] / 255.0f;
  float c1 = src[dy * dst_width * 3 + dx * 3 + 1] / 255.0f;
  float c2 = src[dy * dst_width * 3 + dx * 3 + 2] / 255.0f;

  // bgr to rgb
  float t = c2;
  c2 = c0;
  c0 = t;

  // rgbrgbrgb to rrrgggbbb
  // NHWC to NCHW
  int area = dst_width * dst_height;
  float *pdst_c0 = dst + dy * dst_width + dx;
  float *pdst_c1 = pdst_c0 + area;
  float *pdst_c2 = pdst_c1 + area;
  *pdst_c0 = c0;
  *pdst_c1 = c1;
  *pdst_c2 = c2;
}

其中tid是本线程的索引,dst_width和dst_height是图像的宽和高,edge是图片的像素数量,每一个线程处理一个像素点。由于线程索引是从0开始计数的,我们要确保tid不能超过图片的像素数量edge:

int tid = blockDim.x * blockIdx.x + threadIdx.x;
if (tid >= edge) return;

图像数据以行优先(row-major)顺序连续存储在内存中,每个像素由3个字节表示(BGR)。为了获取每个线程所处理的像素点在内存中的起始位置,我们可以先计算当前线程对应的图像x和y坐标,即dx和dy:

int dx = tid % dst_width; // 计算当前线程对应的目标图像的x坐标
int dy = tid / dst_width; // 计算当前线程对应的目标图像的y坐标

然后得到当前线程所处理的像素点在内存中的起始位置:dy * dst_width * 3 + dx * 3,乘以3是因为每个像素点有3个通道值,在内存中的排列方式为BGRBGRBGR...,最后再除以255对原图中(x,y)坐标像素点的3个通道值进行归一化:

// normalization
float c0 = src[dy * dst_width * 3 + dx * 3 + 0] / 255.0f;
float c1 = src[dy * dst_width * 3 + dx * 3 + 1] / 255.0f;
float c2 = src[dy * dst_width * 3 + dx * 3 + 2] / 255.0f;

dy * dst_width * 3:定位到第dy行的起始位置;dx * 3:在当前行中定位到第dx个像素的起始位置;+ 0, + 1, + 2:分别访问B、G、R三个通道的值。归一化之后,通过交换变量完成BGR到RGB的通道转换:

// bgr to rgb
float t = c2;
c2 = c0;
c0 = t;

目标图像(RGB)像素点在内存中的排列方式为RRR...GGG...BBB,当前像素点R通道的值在目标图像中的内存地址为(dst + dy * dst_width + dx),G通道的值的内存地址为(dst + dy * dst_width + dx) + area,即加上1个通道的偏移量area,以此类推,完成对图像的通道转换:

// NHWC to NCHW
// rgbrgbrgb to rrrgggbbb
int area = dst_width * dst_height;
float *pdst_c0 = dst + dy * dst_width + dx;
float *pdst_c1 = pdst_c0 + area;
float *pdst_c2 = pdst_c1 + area;
*pdst_c0 = c0;
*pdst_c1 = c1;
*pdst_c2 = c2;
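为了验证核函数输出是否正确,可以在主机端用NumPy写一份等价的参考实现,把GPU结果拷回主机后与CPU结果进行对比。以下仅为示意代码,假设输入是一张640x640的BGR uint8图像,dst_gpu表示从设备端拷回的结果,变量名均为示例:

import numpy as np

def preprocess_reference(src_bgr):
    """CPU参考实现:除以255归一化 + BGR转RGB + HWC转CHW,与CUDA核函数逻辑一致"""
    img = src_bgr.astype(np.float32) / 255.0   # 归一化到[0, 1]
    img = img[:, :, ::-1]                      # BGR -> RGB
    chw = np.transpose(img, (2, 0, 1))         # HWC -> CHW,即RRR...GGG...BBB
    return np.ascontiguousarray(chw)

src = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)
dst_cpu = preprocess_reference(src)
print(dst_cpu.shape, dst_cpu.dtype)            # (3, 640, 640) float32
# 将核函数输出dst_gpu拷回主机并reshape后,可用如下断言检查两者是否一致:
# np.testing.assert_allclose(dst_gpu.reshape(3, 640, 640), dst_cpu, atol=1e-6)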
-
Ascend310部署Qwen-VL-7B实现吸烟动作识别OrangePi AI Studio Pro是基于2个昇腾310P处理器的新一代高性能推理解析卡,提供基础通用算力+超强AI算力,整合了训练和推理的全部底层软件栈,实现训推一体。其中AI半精度FP16算力约为176TFLOPS,整数Int8精度可达352TOPS,本文将带领大家在Ascend 310P上部署Qwen2.5-VL-7B多模态理解大模型实现吸烟动作的识别。一、环境配置我们在OrangePi AI Stuido上使用Docker容器部署MindIE:docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-300I-Duo-py311-openeuler24.03-ltsroot@orangepi:~# docker images REPOSITORY TAG IMAGE ID CREATED SIZE swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie 2.1.RC1-300I-Duo-py311-openeuler24.03-lts 0574b8d4403f 3 months ago 20.4GB langgenius/dify-web 1.0.1 b2b7363571c2 8 months ago 475MB langgenius/dify-api 1.0.1 3dd892f50a2d 8 months ago 2.14GB langgenius/dify-plugin-daemon 0.0.4-local 3f180f39bfbe 8 months ago 1.35GB ubuntu/squid latest dae40da440fe 8 months ago 243MB postgres 15-alpine afbf3abf6aeb 8 months ago 273MB nginx latest b52e0b094bc0 9 months ago 192MB swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie 1.0.0-300I-Duo-py311-openeuler24.03-lts 74a5b9615370 10 months ago 17.5GB redis 6-alpine 6dd588768b9b 10 months ago 30.2MB langgenius/dify-sandbox 0.2.10 4328059557e8 13 months ago 567MB semitechnologies/weaviate 1.19.0 8ec9f084ab23 2 years ago 52.5MB之后创建一个名为start-docker.sh的启动脚本,内容如下:NAME=$1 if [ $# -ne 1 ]; then echo "warning: need input container name.Use default: mindie" NAME=mindie fi docker run --name ${NAME} -it -d --net=host --shm-size=500g \ --privileged=true \ -w /usr/local/Ascend/atb-models \ --device=/dev/davinci_manager \ --device=/dev/hisi_hdc \ --device=/dev/devmm_svm \ --entrypoint=bash \ -v /models:/models \ -v /data:/data \ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ -v /usr/local/dcmi:/usr/local/dcmi \ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ -v /usr/local/sbin:/usr/local/sbin \ -v /home:/home \ -v /tmp:/tmp \ -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \ -e http_proxy=$http_proxy \ -e https_proxy=$https_proxy \ -e "PATH=/usr/local/python3.11.6/bin:$PATH" \ swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-300I-Duo-py311-openeuler24.03-ltsbash start-docker.sh启动容器后,我们需要替换几个文件并安装Ascend-cann-nnal软件包:root@orangepi:~# docker exec -it mindie bash Welcome to 5.15.0-126-generic System information as of time: Sat Nov 15 22:06:48 CST 2025 System load: 1.87 Memory used: 6.3% Swap used: 0.0% Usage On: 33% Users online: 0 [root@orangepi atb-models]# cd /usr/local/Ascend/ascend-toolkit/8.2.RC1/lib64/ [root@orangepi lib64]# ls /data/fix_openeuler_docker/fixhccl/8.2hccl/ libhccl.so libhccl_alg.so libhccl_heterog.so libhccl_plf.so [root@orangepi lib64]# cp /data/fix_openeuler_docker/fixhccl/8.2hccl/* ./ cp: overwrite './libhccl.so'? cp: overwrite './libhccl_alg.so'? cp: overwrite './libhccl_heterog.so'? cp: overwrite './libhccl_plf.so'? [root@orangepi lib64]# source /usr/local/Ascend/ascend-toolkit/set_env.sh [root@orangepi lib64]# chmod +x /data/fix_openeuler_docker/Ascend-cann-nnal/Ascend-cann-nnal_8.3.RC1_linux-x86_64.run [root@orangepi lib64]# /data/fix_openeuler_docker/Ascend-cann-nnal/Ascend-cann-nnal_8.3.RC1_linux-x86_64.run --install --quiet [NNAL] [20251115-22:41:45] [INFO] LogFile:/var/log/ascend_seclog/ascend_nnal_install.log [NNAL] [20251115-22:41:45] [INFO] Ascend-cann-atb_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 start WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv [NNAL] [20251115-22:41:58] [INFO] Ascend-cann-atb_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 install success [NNAL] [20251115-22:41:58] [INFO] Ascend-cann-SIP_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 start [NNAL] [20251115-22:41:59] [INFO] Ascend-cann-SIP_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 install success [NNAL] [20251115-22:41:59] [INFO] Ascend-cann-nnal_8.3.RC1_linux-x86_64.run install success Warning!!! If the environment variables of atb and asdsip are set at the same time, unexpected consequences will occur. Import the corresponding environment variables based on the usage scenarios: atb for large model scenarios, asdsip for embedded scenarios. Please make sure that the environment variables have been configured. If you want to use atb module: - To take effect for current user, you can exec command below: source /usr/local/Ascend/nnal/atb/set_env.sh or add "source /usr/local/Ascend/nnal/atb/set_env.sh" to ~/.bashrc. If you want to use asdsip module: - To take effect for current user, you can exec command below: source /usr/local/Ascend/nnal/asdsip/set_env.sh or add "source /usr/local/Ascend/nnal/asdsip/set_env.sh" to ~/.bashrc. [root@orangepi lib64]# cat /usr/local/Ascend/nnal/atb/latest/version.info Ascend-cann-atb : 8.3.RC1 Ascend-cann-atb Version : 8.3.RC1.B106 Platform : x86_64 branch : 8.3.rc1-0702 commit id : 16004f23040e0dcdd3cf0c64ecf36622487038ba修改推理使用的逻辑NPU核心为0,1,测试多模态理解大模型:Qwen2.5-VL-7B-Instruct:运行结果表明,Qwen2.5-VL-7B-Instruct在2 x Ascned 310P上推理平均每秒可以输出20个tokens,同时准确理解画面中的人物信息和行为动作。[root@orangepi atb-models]# bash examples/models/qwen2_vl/run_pa.sh --model_path /models/Qwen2.5-VL-7B-Instruct/ --input_image /root/pic/test.jpg [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] ***************************************** [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] ***************************************** /usr/local/lib64/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( /usr/local/lib64/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? 
warn( 2025-11-15 22:12:53.250 7934 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llmmodels] [loglevel] [file:line] [status code] msg 2025-11-15 22:12:53.250 7933 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llmmodels] [loglevel] [file:line] [status code] msg [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] llama_LlamaDecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] llama_LlamaDecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:55,335] [7934] [139886327420160] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 1, device_id: 1, numa_id: 0, shard_devices: [0, 1], cpus: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-11-15 22:12:55,336] [7934] [139886327420160] [llmmodels] [INFO] [cpu_binding.py-280] : process 7934, new_affinity is [8, 9, 10, 11, 12, 13, 14, 15], cpu count 8 [2025-11-15 22:12:55,356] [7933] [139649439929600] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 0, device_id: 0, numa_id: 0, shard_devices: [0, 1], cpus: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-11-15 22:12:55,357] [7933] [139649439929600] [llmmodels] [INFO] [cpu_binding.py-280] : process 7933, new_affinity is [0, 1, 2, 3, 4, 5, 6, 7], cpu count 8 [2025-11-15 22:12:56,032] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-156] : model_runner.quantize: None, model_runner.kv_quant_type: None, model_runner.fa_quant_type: None, model_runner.dtype: torch.float16 [2025-11-15 22:13:01,826] [7933] [139649439929600] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set [2025-11-15 22:13:01,827] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-187] : init tokenizer done Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [2025-11-15 22:13:02,070] [7934] [139886327420160] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [W InferFormat.cpp:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. 
(function operator()) [W InferFormat.cpp:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. (function operator()) [2025-11-15 22:13:08,435] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-153] : >>>> qwen_QwenDecoderModel is called. [2025-11-15 22:13:08,526] [7934] [139886327420160] [llmmodels] [INFO] [flash_causal_qwen2.py-153] : >>>> qwen_QwenDecoderModel is called. [2025-11-15 22:13:16.666] [7933] [139649439929600] [llmmodels] [WARN] [operation_factory.cpp:42] OperationName: TransdataOperation not find in operation factory map [2025-11-15 22:13:16.698] [7934] [139886327420160] [llmmodels] [WARN] [operation_factory.cpp:42] OperationName: TransdataOperation not find in operation factory map [2025-11-15 22:13:22,379] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-282] : model: FlashQwen2vlForCausalLM( (rotary_embedding): PositionRotaryEmbedding() (attn_mask): AttentionMask() (vision_tower): Qwen25VisionTransformerPretrainedModelATB( (encoder): Qwen25VLVisionEncoderATB( (layers): ModuleList( (0-31): 32 x Qwen25VLVisionLayerATB( (attn): VisionAttention( (qkv): TensorParallelColumnLinear( (linear): FastLinear() ) (proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (mlp): VisionMlp( (gate_up_proj): TensorParallelColumnLinear( (linear): FastLinear() ) (down_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (norm1): BaseRMSNorm() (norm2): BaseRMSNorm() ) ) (patch_embed): FastPatchEmbed( (proj): TensorReplicatedLinear( (linear): FastLinear() ) ) (patch_merger): PatchMerger( (patch_merger_mlp_0): TensorParallelColumnLinear( (linear): FastLinear() ) (patch_merger_mlp_2): TensorParallelRowLinear( (linear): FastLinear() ) (patch_merger_ln_q): BaseRMSNorm() ) ) (rotary_pos_emb): VisionRotaryEmbedding() ) (language_model): FlashQwen2UsingMROPEForCausalLM( (rotary_embedding): PositionRotaryEmbedding() (attn_mask): AttentionMask() (transformer): FlashQwenModel( (wte): TensorEmbeddingWithoutChecking() (h): ModuleList( (0-27): 28 x FlashQwenLayer( (attn): FlashQwenAttention( (rotary_emb): PositionRotaryEmbedding() (c_attn): TensorParallelColumnLinear( (linear): FastLinear() ) (c_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (mlp): QwenMLP( (act): SiLU() (w2_w1): TensorParallelColumnLinear( (linear): FastLinear() ) (c_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (ln_1): QwenRMSNorm() (ln_2): QwenRMSNorm() ) ) (ln_f): QwenRMSNorm() ) (lm_head): TensorParallelHead( (linear): FastLinear() ) ) ) [2025-11-15 22:13:24,268] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-134] : hbm_capacity(GB): 87.5078125, init_memory(GB): 11.376015624962747 [2025-11-15 22:13:24,789] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-342] : pa_runner: PARunner(model_path=/models/Qwen2.5-VL-7B-Instruct/, input_text=请用超过500个字详细说明图片的内容,并仔细判断画面中的人物是否有吸烟动作。, max_position_embeddings=None, max_input_length=16384, max_output_length=1024, max_prefill_tokens=-1, load_tokenizer=True, enable_atb_torch=False, max_prefill_batch_size=None, max_batch_size=1, dtype=torch.float16, block_size=128, model_config=ModelConfig(num_heads=14, num_kv_heads=2, num_kv_heads_origin=4, head_size=128, k_head_size=128, v_head_size=128, num_layers=28, device=npu:0, dtype=torch.float16, soc_info=NPUSocInfo(soc_name='', soc_version=200, need_nz=True, matmul_nd_nz=False), kv_quant_type=None, fa_quant_type=None, mapping=Mapping(world_size=2, rank=0, num_nodes=1,pp_rank=0, pp_groups=[[0], [1]], micro_batch_size=1, 
attn_dp_groups=[[0], [1]], attn_tp_groups=[[0, 1]], attn_inner_sp_groups=[[0], [1]], attn_cp_groups=[[0], [1]], attn_o_proj_tp_groups=[[0], [1]], mlp_tp_groups=[[0, 1]], moe_ep_groups=[[0], [1]], moe_tp_groups=[[0, 1]]), cla_share_factor=1, model_type=qwen2_5_vl, enable_nz=False), max_memory=93960798208, [2025-11-15 22:13:24,794] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-122] : ---------------Begin warm_up--------------- [2025-11-15 22:13:24,794] [7933] [139649439929600] [llmmodels] [INFO] [cache.py-154] : kv cache will allocate 0.46484375GB memory [2025-11-15 22:13:24,821] [7934] [139886327420160] [llmmodels] [INFO] [cache.py-154] : kv cache will allocate 0.46484375GB memory [2025-11-15 22:13:24,827] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1139] : ------total req num: 1, infer start-------- [2025-11-15 22:13:26,002] [7934] [139886327420160] [llmmodels] [INFO] [flash_causal_qwen2.py-680] : <<<<<<<after transdata k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,023] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-676] : <<<<<<< ori k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,023] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-680] : <<<<<<<after transdata k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,024] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-705] : >>>>>>id of kcache is 139645634198608 id of vcache is 139645634198320 [2025-11-15 22:13:34,363] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 9476.590633392334ms, Prefill average time: 9476.590633392334ms, Decode token time: 54.94809150695801ms, E2E time: 9531.538724899292ms [2025-11-15 22:13:34,363] [7934] [139886327420160] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 9452.020645141602ms, Prefill average time: 9452.020645141602ms, Decode token time: 54.654598236083984ms, E2E time: 9506.675243377686ms [2025-11-15 22:13:34,366] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1326] : -------------------performance dumped------------------------ [2025-11-15 22:13:34,371] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1329] : | batch_size | input_seq_len | output_seq_len | e2e_time(ms) | prefill_time(ms) | decoder_token_time(ms) | prefill_count | prefill_average_time(ms) | |-------------:|----------------:|-----------------:|---------------:|-------------------:|-------------------------:|----------------:|---------------------------:| | 1 | 16384 | 2 | 9531.54 | 9476.59 | 54.95 | 1 | 9476.59 | /usr/local/lib64/python3.11/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( [2025-11-15 22:13:35,307] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-148] : warmup_memory(GB): 15.75 [2025-11-15 22:13:35,307] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-153] : ---------------End warm_up--------------- /usr/local/lib64/python3.11/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( [2025-11-15 22:13:35,363] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1139] : ------total req num: 1, infer start-------- [2025-11-15 22:13:50,021] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 1004.0028095245361ms, Prefill average time: 1004.0028095245361ms, Decode token time: 13.301290491575836ms, E2E time: 14611.222982406616ms [2025-11-15 22:13:50,021] [7934] [139886327420160] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 1067.9974555969238ms, Prefill average time: 1067.9974555969238ms, Decode token time: 13.300292536193908ms, E2E time: 14674.196720123291ms [2025-11-15 22:13:50,025] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1326] : -------------------performance dumped------------------------ [2025-11-15 22:13:50,028] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1329] : | batch_size | input_seq_len | output_seq_len | e2e_time(ms) | prefill_time(ms) | decoder_token_time(ms) | prefill_count | prefill_average_time(ms) | |-------------:|----------------:|-----------------:|---------------:|-------------------:|-------------------------:|----------------:|---------------------------:| | 1 | 1675 | 1024 | 14611.2 | 1004 | 13.3 | 1 | 1004 | [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-385] : Question[0]: [{'image': '/root/pic/test.jpg'}, {'text': '请用超过500个字详细说明图片的内容,并仔细判断画面中的人物是否有吸烟动作。'}] [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-386] : Answer[0]: 这张图片展示了一个无人机航拍的场景,画面中可以看到两名工人站在一个雪地或冰面上。他们穿着橙色的安全背心和红色的安全帽,显得非常醒目。背景中可以看到一些雪地和一些金属结构,可能是桥梁或工业设施的一部分。 从图片的细节来看,画面右侧的工人右手放在嘴边,似乎在吸烟。他的姿势和动作与吸烟者的典型姿势相符。然而,由于图片的分辨率和角度限制,无法完全确定这个动作是否真实发生。如果要准确判断,可能需要更多的视频片段或更清晰的图像。 从无人机航拍的角度来看,这个场景可能是在进行某种工业或建筑项目的检查或监控。两名工人可能正在进行现场检查或讨论工作事宜。雪地和金属结构表明这可能是一个寒冷的冬季,或者是一个寒冷的气候区域。 无人机航拍技术在工业和建筑领域中非常常见,因为它可以提供高空视角,帮助工程师和管理人员更好地了解现场情况。这种技术不仅可以节省时间和成本,还可以提高工作效率和安全性。在进行航拍时,确保遵守当地的法律法规和安全规定是非常重要的。 总的来说,这张图片展示了一个无人机航拍的场景,画面中两名工人站在雪地上,其中一人似乎在吸烟。虽然无法完全确定这个动作是否真实发生,但根据他们的姿势和动作,可以合理推测这个动作的存在。 [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-387] : Generate[0] token num: 282 [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-389] : Latency(s): 14.721353530883789 [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-390] : Throughput(tokens/s): 19.15584728050956 本文详细介绍了在OrangePi AI Studio上使用Docker容器部署MindIE环境并运行Qwen2.5-VL-7B-Instruct多模态大模型实现吸烟动作识别的完整过程,验证了在Ascned 310p设备上运行多模态理解大模型的可靠性。
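附:如果需要对一批抓拍图片做吸烟动作筛查,可以在容器内用一个简单脚本循环调用上文的run_pa.sh,并从日志中提取回答。以下仅为示意,图片目录为假设,提示词沿用脚本内配置的input_text,"Answer[0]:"的日志格式基于前文输出,实际字段以所用MindIE/atb-models版本为准;该方式每张图片都会重新加载模型,只适合小批量验证:

import glob
import subprocess

for image_path in sorted(glob.glob("/root/pic/*.jpg")):  # 图片目录为假设
    # 调用前文使用的推理脚本,--model_path与--input_image为该脚本已有参数
    result = subprocess.run(
        ["bash", "examples/models/qwen2_vl/run_pa.sh",
         "--model_path", "/models/Qwen2.5-VL-7B-Instruct/",
         "--input_image", image_path],
        capture_output=True, text=True,
        cwd="/usr/local/Ascend/atb-models")
    # 从标准输出/错误中抓取回答行(依赖前文 "Answer[0]:" 的日志格式,仅作示意)
    output = result.stdout + result.stderr
    answers = [line for line in output.splitlines() if "Answer[0]:" in line]
    print(image_path, answers[-1] if answers else "未解析到回答")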
-
松材线虫病边缘模型训练与推理部署本文详细介绍了松材线虫病检测的边缘模型训练与推理部署全流程。首先,针对无人机拍摄的4032×3024原始图像进行预处理,缩放到1024×1024避免内存溢出,并定义了9个类别(包括麻栎、罩网、疑似、早期、轻度、中度、重度、死亡和逾年)。随后采用20%重叠率对图像进行切分,生成训练集60000张、验证集6495张的sahi数据集。模型训练基于yolo11s.yaml配置,在pwd数据集上进行10个Epoch的训练,虽然实际应用建议至少100个Epoch。评估结果显示,模型在pwd(重度)类别上表现最佳(mAP50达0.707),而pwd_early(早期)类别表现较差。为提升推理效率,将模型导出为TensorRT FP16引擎,GPU推理速度提升高达5倍,单张图片推理耗时约20ms。最后,通过Gradio构建了用户友好的检测应用,实现了松材线虫病的实时检测功能,为林业病害监测提供了有效的技术解决方案,具有较强的实用价值和推广前景。1. 原始数据无人机拍摄原始图像大小是4032 x 3024,这里缩放到1024 x 1024,避免在模型训练时内存溢出:%%writefile pwd.yaml # Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] path: /home/jetson/ultralytics/dataset/pwd # dataset root dir (absolute path) train: train/images # train images (relative to 'path') val: val/images # val images (relative to 'path') test: # test images (optional) # Classes,类别 names: 0: hardwood # 麻栎 1: net # 罩网 2: abnormal # 疑似 3: pwd_pre_early # 早期 4: pwd_early # 轻度 5: pwd_moderate # 中度 6: pwd # 重度 7: dead_recent # 死亡 8: dead # 逾年 Overwriting pwd.yaml训练集3000张图像,验证集529张图像,查看验证集标注情况:import os import cv2 import yaml import random import numpy as np from matplotlib import pyplot as plt %matplotlib inline with open('pwd.yaml', 'r', encoding='utf-8') as f: data = yaml.load(f.read(), Loader=yaml.FullLoader) classes = data['names'] file_path = os.path.join(data['path'], 'val/images') file_list = os.listdir(file_path) img_paths = random.sample(file_list, 4) img_lists = [] for img_path in img_paths: img_path = os.path.join(file_path, img_path) img = cv2.imread(img_path) h, w, _ = img.shape tl = round(0.002 * (h + w) / 2) + 1 color = (0, 255, 255) if img_path.endswith('.png'): with open(img_path.replace("images", "labels").replace(".png", ".txt")) as f: labels = f.readlines() if img_path.endswith('.jpg'): with open(img_path.replace("images", "labels").replace(".jpg", ".txt")) as f: labels = f.readlines() if img_path.endswith('.jpeg'): with open(img_path.replace("images", "labels").replace(".jpeg", ".txt")) as f: labels = f.readlines() for label in labels: l, x, y, wc, hc = [float(x) for x in label.strip().split()] x1 = int((x - wc / 2) * w) y1 = int((y - hc / 2) * h) x2 = int((x + wc / 2) * w) y2 = int((y + hc / 2) * h) cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness=tl, lineType=cv2.LINE_AA) cv2.putText(img,classes[int(l)],(x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 1, color, 2, cv2.LINE_AA) img_lists.append(cv2.resize(img, (1024, 1024))) image = np.concatenate([np.concatenate(img_lists[:2], axis=1), np.concatenate(img_lists[2:], axis=1)], axis=0) cv2.imwrite("sample-pwd.png", image) plt.rcParams["figure.figsize"] = (16, 16) plt.imshow(image[:,:,::-1]) plt.axis('off') plt.show() 2. 切分数据对dataset/pwd数据集进行图像切分,切分大小为1024 x 1024,重叠率是20%,生成新的数据集dataset/pwd-sahi:%%writefile pwd-sahi.yaml # Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] 
path: /home/jetson/ultralytics/dataset/pwd-sahi # dataset root dir (absolute path) train: train/images # train images (relative to 'path') val: val/images # val images (relative to 'path') test: # test images (optional) # Classes,类别 names: 0: hardwood # 麻栎 1: net # 罩网 2: abnormal # 疑似 3: pwd_pre_early # 早期 4: pwd_early # 轻度 5: pwd_moderate # 中度 6: pwd # 重度 7: dead_recent # 死亡 8: dead # 逾年 Overwriting pwd-sahi.yaml其中训练集60000张图像(部分为背景图),验证集6495张图像(不含背景图),查看验证集的标注情况:import os import cv2 import yaml import random import numpy as np from matplotlib import pyplot as plt %matplotlib inline with open('pwd-sahi.yaml', 'r', encoding='utf-8') as f: data = yaml.load(f.read(), Loader=yaml.FullLoader) classes = data['names'] file_path = os.path.join(data['path'], 'val/images') file_list = os.listdir(file_path) img_paths = random.sample(file_list, 4) img_lists = [] for img_path in img_paths: img_path = os.path.join(file_path, img_path) img = cv2.imread(img_path) h, w, _ = img.shape tl = round(0.002 * (h + w) / 2) + 1 color = (0, 255, 255) if img_path.endswith('.png'): with open(img_path.replace("images", "labels").replace(".png", ".txt")) as f: labels = f.readlines() if img_path.endswith('.jpg'): with open(img_path.replace("images", "labels").replace(".jpg", ".txt")) as f: labels = f.readlines() if img_path.endswith('.jpeg'): with open(img_path.replace("images", "labels").replace(".jpeg", ".txt")) as f: labels = f.readlines() for label in labels: l, x, y, wc, hc = [float(x) for x in label.strip().split()] x1 = int((x - wc / 2) * w) y1 = int((y - hc / 2) * h) x2 = int((x + wc / 2) * w) y2 = int((y + hc / 2) * h) cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness=tl, lineType=cv2.LINE_AA) cv2.putText(img,classes[int(l)],(x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 1, color, 2, cv2.LINE_AA) img_lists.append(cv2.resize(img, (1024, 1024))) image = np.concatenate([np.concatenate(img_lists[:2], axis=1), np.concatenate(img_lists[2:], axis=1)], axis=0) cv2.imwrite("sample-pwd-sahi.png", image) plt.rcParams["figure.figsize"] = (16, 16) plt.imshow(image[:,:,::-1]) plt.axis('off') plt.show() 3. 模型训练我们加载yolo11s.yaml模型的配置文件在dataset/pwd数据集上训练10个Epoch,模型的训练结果保存在pine_wilt_disease/yolo11s_10目录下:%%writefile train.py from ultralytics import YOLO # Load a model model = YOLO('yolo11s.yaml') # load yaml model # Train the model results = model.train(data='pwd.yaml', epochs=10, imgsz=640, workers=4, batch=8, project="pine_wilt_disease", name="yolo11s_10") Overwriting train.py在终端中运行:/home/jetson/ultralytics/train.sh在另一个终端中运行/home/jetson/ultralytics/tensorboard.sh可以监控模型的训练情况:4. 模型评估加载训练好的模型,这里我们仅训练了10个Epoch,实际训练至少100个Epoch才能取得较好的效果:from ultralytics import YOLO # Load a model model = YOLO('pine_wilt_disease/yolo11s_10/weights/best.pt') # load the best model # Evaluate the model metrics = model.val( data='pwd.yaml', # 数据集配置 imgsz=640, # 模型输入大小 workers=4, # 数据加载线程 batch=8, # 验证批次大小 plots=True, # 生成验证结果图 split='val' # 指定使用验证集 ) Ultralytics 8.3.55 🚀 Python-3.10.12 torch-2.5.0a0+872d972e41.nv24.08 CUDA:0 (Orin, 7620MiB) YOLO11s summary (fused): 238 layers, 9,416,283 parameters, 0 gradients, 21.3 GFLOPs val: Scanning /home/jetson/ultralytics/dataset/pwd/val/labels.cache... 
529 images, 0 backgrounds, 0 corrupt: 100%|██████████| 529/529 [00:00<?, ?it/s] Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 67/67 [00:22<00:00, 2.92it/s] all 529 8612 0.602 0.445 0.442 0.261 net 383 4210 0.681 0.625 0.683 0.401 pwd_early 167 375 1 0 0.0362 0.0168 pwd_moderate 253 503 0.4 0.225 0.224 0.108 pwd 383 1141 0.582 0.765 0.707 0.462 dead_recent 376 1456 0.503 0.443 0.444 0.26 dead 229 927 0.444 0.613 0.557 0.318 Speed: 0.9ms preprocess, 24.5ms inference, 0.0ms loss, 4.1ms postprocess per image Results saved to runs/detect/val注意,图片实际标注的类别只有6类,不包含麻栎和疑似。5. 模型导出导出到TensorRT,GPU推理速度提升高达5倍:https://docs.ultralytics.com/zh/integrations/tensorrt/from ultralytics import YOLO model = YOLO("pine_wilt_disease/yolo11s_10/weights/best.pt") # TensorRT FP16 model.export(format="engine", imgsz=640, batch=1, half=True) WARNING ⚠️ TensorRT requires GPU export, automatically assigning device=0 Ultralytics 8.3.55 🚀 Python-3.10.12 torch-2.5.0a0+872d972e41.nv24.08 CUDA:0 (Orin, 7620MiB) YOLO11s summary (fused): 238 layers, 9,416,283 parameters, 0 gradients, 21.3 GFLOPs [34m[1mPyTorch:[0m starting from 'pine_wilt_disease/yolo11s_10/weights/best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 13, 8400) (18.3 MB) [34m[1mONNX:[0m starting export with onnx 1.17.0 opset 19... [34m[1mONNX:[0m slimming with onnxslim 0.1.47... [34m[1mONNX:[0m export success ✅ 3.2s, saved as 'pine_wilt_disease/yolo11s_10/weights/best.onnx' (36.2 MB) [34m[1mTensorRT:[0m starting export with TensorRT 10.7.0... [11/15/2025-16:23:19] [TRT] [I] [MemUsageChange] Init CUDA: CPU -2, GPU +0, now: CPU 1395, GPU 7158 (MiB) [11/15/2025-16:23:25] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +970, GPU +258, now: CPU 2322, GPU 7418 (MiB) [11/15/2025-16:23:26] [TRT] [I] ---------------------------------------------------------------- [11/15/2025-16:23:26] [TRT] [I] Input filename: pine_wilt_disease/yolo11s_10/weights/best.onnx [11/15/2025-16:23:26] [TRT] [I] ONNX IR version: 0.0.9 [11/15/2025-16:23:26] [TRT] [I] Opset version: 19 [11/15/2025-16:23:26] [TRT] [I] Producer name: pytorch [11/15/2025-16:23:26] [TRT] [I] Producer version: 2.5.0 [11/15/2025-16:23:26] [TRT] [I] Domain: [11/15/2025-16:23:26] [TRT] [I] Model version: 0 [11/15/2025-16:23:26] [TRT] [I] Doc string: [11/15/2025-16:23:26] [TRT] [I] ---------------------------------------------------------------- [34m[1mTensorRT:[0m input "images" with shape(1, 3, 640, 640) DataType.FLOAT [34m[1mTensorRT:[0m output "output0" with shape(1, 13, 8400) DataType.FLOAT [34m[1mTensorRT:[0m building FP16 engine as pine_wilt_disease/yolo11s_10/weights/best.engine [11/15/2025-16:23:26] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored. [11/15/2025-16:28:08] [TRT] [I] Compiler backend is used during engine build. [11/15/2025-16:31:48] [TRT] [I] Detected 1 inputs and 1 output network tensors. [11/15/2025-16:31:53] [TRT] [I] Total Host Persistent Memory: 543184 bytes [11/15/2025-16:31:53] [TRT] [I] Total Device Persistent Memory: 0 bytes [11/15/2025-16:31:53] [TRT] [I] Max Scratch Memory: 2764800 bytes [11/15/2025-16:31:53] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 162 steps to complete. [11/15/2025-16:31:53] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 19.653ms to assign 10 blocks to 162 nodes requiring 19046912 bytes. 
[11/15/2025-16:31:53] [TRT] [I] Total Activation Memory: 19046400 bytes [11/15/2025-16:31:53] [TRT] [I] Total Weights Memory: 18914082 bytes [11/15/2025-16:31:53] [TRT] [I] Compiler backend is used during engine execution. [11/15/2025-16:31:53] [TRT] [I] Engine generation completed in 506.948 seconds. [11/15/2025-16:31:53] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 140 MiB [34m[1mTensorRT:[0m export success ✅ 519.0s, saved as 'pine_wilt_disease/yolo11s_10/weights/best.engine' (21.6 MB) Export complete (519.7s) Results saved to [1m/home/jetson/ultralytics/pine_wilt_disease/yolo11s_10/weights[0m Predict: yolo predict task=detect model=pine_wilt_disease/yolo11s_10/weights/best.engine imgsz=640 half Validate: yolo val task=detect model=pine_wilt_disease/yolo11s_10/weights/best.engine imgsz=640 data=pwd.yaml half Visualize: https://netron.app导出FP16精度的量化模型大概需要10分钟左右。6. 模型推理使用TensorRT引擎加载模型对验证集的部分图片进行推理,每张图片的推理耗时约20ms:import cv2 import glob from ultralytics import YOLO import matplotlib.pyplot as plt %matplotlib inline # Load the TensorRT engine model model = YOLO("pine_wilt_disease/yolo11s_10/weights/best.engine") # Define the prediction function def predict(image_path): reuslts = model.predict(image_path, conf=0.45, iou=0.55) return reuslts[0].plot() # Load the images for inference images_path = glob.glob("dataset/pwd/val/images/*.jpeg") # Perform inference and display results for image_path in images_path[:10]: result = predict(image_path) result = cv2.cvtColor(result, cv2.COLOR_BGR2RGB) result = cv2.resize(result, (4032 // 4, 3024 // 4)) plt.imshow(result) plt.axis("off") plt.show() WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'. Loading pine_wilt_disease/yolo11s_10/weights/best.engine for TensorRT inference... [11/15/2025-16:31:54] [TRT] [I] Loaded engine size: 21 MiB [11/15/2025-16:31:54] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18, now: CPU 0, GPU 36 (MiB) image 1/1 /home/jetson/ultralytics/dataset/pwd/val/images/1d1d160a-ae4f-4fe4-801d-f001d4e7ff6d.jpeg: 640x640 4 nets, 1 pwd, 1 dead_recent, 20.1ms Speed: 45.6ms preprocess, 20.1ms inference, 49.3ms postprocess per image at shape (1, 3, 640, 640) ... 构建Gradio应用程序,上传图片实现松材线虫病检测的功能:至此,本章结束。
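附:下面给出一个最小可用的Gradio检测应用示意,复用前文导出的best.engine,置信度与IoU阈值和推理代码保持一致;界面布局、监听端口等均为假设,实际实现以自己的应用为准:

import cv2
import gradio as gr
from ultralytics import YOLO

# 加载前文导出的TensorRT引擎,显式指定task避免自动推断告警
model = YOLO("pine_wilt_disease/yolo11s_10/weights/best.engine", task="detect")

def detect(image_path):
    # 与前文推理代码相同的阈值设置
    results = model.predict(image_path, conf=0.45, iou=0.55)
    plotted = results[0].plot()                 # 返回BGR图像
    return cv2.cvtColor(plotted, cv2.COLOR_BGR2RGB)

demo = gr.Interface(
    fn=detect,
    inputs=gr.Image(type="filepath", label="上传无人机图片"),
    outputs=gr.Image(type="numpy", label="检测结果"),
    title="松材线虫病检测",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)  # 端口为假设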
-
Pi0☁云端推理部署Pi0 是一个通用机器人策略基础模型,专为解决机器人学习中的数据稀缺、泛化能力差和鲁棒性不足等核心挑战而设计。借鉴大语言模型的训练方法,Pi0 通过大规模预训练掌握广泛的机器人操作技能,并能通过微调快速适应具体任务需求。该模型支持高效的数据利用与快速部署,在有限数据下也能实现良好性能,为机器人智能化提供了一种可扩展的解决方案。1. 环境配置在之前LeRobot安装及使用教程和Pi0 模型训练的基础上,Pi0的推理也建议使用云端部署的方式运行。a. 我租赁的服务器是H20,在服务端(openpi环境)中运行:cd /root/autodl-tmp/openpi_episode1_student uv run scripts/serve_policy.py policy:checkpoint --policy.config=enpei_robot_demo_move_toy_low_mem_finetune --policy.dir=checkpoints/enpei_robot_demo_move_toy_low_mem_finetune/my_experiment/9999启动服务器后,他会暴露一个服务端口(默认是 6006):warning: The `tool.uv.dev-dependencies` field (used in `packages/openpi-client/pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead INFO:root:Loading model... INFO:2025-11-09 14:40:27,523:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' INFO:2025-11-09 14:40:27,524:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory INFO:absl:orbax-checkpoint version: 0.11.13 INFO:absl:Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f78fd908250> INFO:absl:Restoring checkpoint from /root/autodl-tmp/openpi_episode1_student/checkpoints/enpei_robot_demo_move_toy_low_mem_finetune/my_experiment/9999/params. INFO:absl:[thread=MainThread] Failed to get flag value for EXPERIMENTAL_ORBAX_USE_DISTRIBUTED_PROCESS_ID. INFO:absl:[process=0] /jax/checkpoint/read/bytes_per_sec: 1.1 GiB/s (total bytes: 6.1 GiB) (time elapsed: 5 seconds) (per-host) INFO:absl:Finished restoring checkpoint in 5.72 seconds from /root/autodl-tmp/openpi_episode1_student/checkpoints/enpei_robot_demo_move_toy_low_mem_finetune/my_experiment/9999/params. INFO:root:Loaded norm stats from /root/autodl-tmp/openpi_episode1_student/assets/enpei_robot_demo_move_toy_low_mem_finetune/hou/demo_move_toy_openpi INFO:root:Loaded norm stats from /root/autodl-tmp/openpi_episode1_student/checkpoints/enpei_robot_demo_move_toy_low_mem_finetune/my_experiment/9999/assets/hou/demo_move_toy_openpi INFO:root:Creating server (host: autodl-container-9af746813b-5ccfe20e, ip: 172.17.0.8) INFO:websockets.server:server listening on 0.0.0.0:6006 INFO:websockets.server:connection open INFO:openpi.serving.websocket_policy_server:Connection from ('127.0.0.1', 53688) opened INFO:openpi.serving.websocket_policy_server:Connection from ('127.0.0.1', 53688) closed INFO:websockets.server:connection open INFO:openpi.serving.websocket_policy_server:Connection from ('127.0.0.1', 56818) opened INFO:openpi.serving.websocket_policy_server:Connection from ('127.0.0.1', 56818) closed在AutoDL中使用它的端口转发服务(自定义服务):b. 在客户端(lerobot环境)进入Lerobot仓库代码,中安装openpi-client:cd ./packages/openpi-client pip install -e . c. 
启动机械臂进行归零,安装腕部相机和夹爪,查看相机ID:(base) hou@hou-Ubuntu:~/workspace/lerobot_single_student$ conda activate lerobot (lerobot) hou@hou-Ubuntu:~/workspace/lerobot_single_student$ python -m lerobot.episode_default_position --ip="localhost" --port=12345 INFO 2025-11-09 14:25:56 _position.py:33 Connected to EnpeiRobot controller at localhost:12345 INFO 2025-11-09 14:26:02 _position.py:46 Moving to default position, estimated time: 1.72s INFO 2025-11-09 14:26:04 _position.py:50 Successfully moved to default position (lerobot) hou@hou-Ubuntu:~/workspace/lerobot_single_student$ python -m lerobot.find_cameras opencv --- Detected Cameras --- Camera #0: Name: OpenCV Camera @ /dev/video0 Type: OpenCV Id: /dev/video0 Backend api: V4L2 Default stream profile: Format: 0.0 Width: 640 Height: 480 Fps: 30.0 -------------------- Camera #1: Name: OpenCV Camera @ /dev/video2 Type: OpenCV Id: /dev/video2 Backend api: V4L2 Default stream profile: Format: 0.0 Width: 640 Height: 480 Fps: 30.0 -------------------- Camera #2: Name: OpenCV Camera @ /dev/video4 Type: OpenCV Id: /dev/video4 Backend api: V4L2 Default stream profile: Format: 0.0 Width: 640 Height: 480 Fps: 30.0 -------------------- Finalizing image saving... Image capture finished. Images saved to outputs/captured_images2. Pi0 推理启动客户端,在终端中运行run.sh:host 服务器地址(这里因为用了 AutoDL 本地转发,所以是 localhost)port 服务器端口instruction 文本指令,保持和采集数据一致enpei_use_radian 需要使用弧度制#!/bin/bash python -m lerobot.test_openpi \ --robot.ip_address="localhost" \ --robot.port=12345 \ --robot.type=enpei_follower \ --robot.id=enpei_follower \ --robot.cameras="{ handeye: {type: opencv, index_or_path: 4, width: 320, height: 240, fps: 30}, fixed: {type: opencv, index_or_path: 0, width: 320, height: 240, fps: 30}}" \ --host=localhost \ --port=6006 \ --instruction="Put the toy to the white box" \ --fps=30 \ --enpei_use_radian=true (lerobot) hou@hou-Ubuntu:~/workspace/lerobot_single_student$ ./run.sh INFO 2025-11-09 14:48:58 nt_policy.py:30 Waiting for server at ws://localhost:6006... Connected to remote policy server at localhost:6006 Server metadata: {} Using instruction: Put the toy to the white box 设置enpei follower机器人角度单位: 弧度 INFO 2025-11-09 14:48:58 follower.py:134 Connected to EnpeiRobot controller INFO 2025-11-09 14:49:03 a_opencv.py:176 OpenCVCamera(4) connected. INFO 2025-11-09 14:49:05 a_opencv.py:176 OpenCVCamera(0) connected. INFO 2025-11-09 14:49:05 follower.py:156 enpei_follower EnpeiFollower connected. Robot connected successfully 运行策略,目标FPS: 30 推理线程启动 等待第一个动作生成... 
wait for frame: 0.0018558219999249559 wait for frame: 0.0018982179999511573 Loop duration: 1.4765s, Real FPS: 0.68, Action FPS: 33.87, Action count: 50, Action shape: (50, 7) Loop duration: 1.4735s, Real FPS: 0.68, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.4755s, Real FPS: 0.68, Action FPS: 33.89, Action count: 50, Action shape: (50, 7) Loop duration: 1.4973s, Real FPS: 0.67, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5040s, Real FPS: 0.66, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5590s, Real FPS: 0.64, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5400s, Real FPS: 0.65, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.4911s, Real FPS: 0.67, Action FPS: 33.91, Action count: 50, Action shape: (50, 7) Loop duration: 1.4938s, Real FPS: 0.67, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.4853s, Real FPS: 0.67, Action FPS: 33.92, Action count: 50, Action shape: (50, 7) Loop duration: 1.4957s, Real FPS: 0.67, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.4930s, Real FPS: 0.67, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.4914s, Real FPS: 0.67, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.4878s, Real FPS: 0.67, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.4911s, Real FPS: 0.67, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.4914s, Real FPS: 0.67, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.5189s, Real FPS: 0.66, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.4915s, Real FPS: 0.67, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.5207s, Real FPS: 0.66, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.4865s, Real FPS: 0.67, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.5499s, Real FPS: 0.65, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.6458s, Real FPS: 0.61, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.6337s, Real FPS: 0.61, Action FPS: 33.98, Action count: 50, Action shape: (50, 7) Loop duration: 1.5200s, Real FPS: 0.66, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5218s, Real FPS: 0.66, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5184s, Real FPS: 0.66, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5273s, Real FPS: 0.65, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5247s, Real FPS: 0.66, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5285s, Real FPS: 0.65, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5576s, Real FPS: 0.64, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5193s, Real FPS: 0.66, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5519s, Real FPS: 0.64, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.5240s, Real FPS: 0.66, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.6051s, Real FPS: 0.62, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5251s, Real FPS: 0.66, Action FPS: 33.92, Action count: 50, Action shape: (50, 7) Loop 
duration: 1.5533s, Real FPS: 0.64, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.5564s, Real FPS: 0.64, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5579s, Real FPS: 0.64, Action FPS: 33.92, Action count: 50, Action shape: (50, 7) Loop duration: 1.5550s, Real FPS: 0.64, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.5526s, Real FPS: 0.64, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5507s, Real FPS: 0.64, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5511s, Real FPS: 0.64, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5557s, Real FPS: 0.64, Action FPS: 33.92, Action count: 50, Action shape: (50, 7) Loop duration: 1.5514s, Real FPS: 0.64, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5539s, Real FPS: 0.64, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5573s, Real FPS: 0.64, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5767s, Real FPS: 0.63, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.5736s, Real FPS: 0.64, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5553s, Real FPS: 0.64, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5537s, Real FPS: 0.64, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.5514s, Real FPS: 0.64, Action FPS: 33.98, Action count: 50, Action shape: (50, 7) Loop duration: 1.6261s, Real FPS: 0.61, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.6378s, Real FPS: 0.61, Action FPS: 33.94, Action count: 50, Action shape: (50, 7) Loop duration: 1.5520s, Real FPS: 0.64, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) Loop duration: 1.5588s, Real FPS: 0.64, Action FPS: 33.98, Action count: 50, Action shape: (50, 7) Loop duration: 1.5567s, Real FPS: 0.64, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.5534s, Real FPS: 0.64, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5481s, Real FPS: 0.65, Action FPS: 33.97, Action count: 50, Action shape: (50, 7) Loop duration: 1.6520s, Real FPS: 0.61, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5514s, Real FPS: 0.64, Action FPS: 33.96, Action count: 50, Action shape: (50, 7) Loop duration: 1.5552s, Real FPS: 0.64, Action FPS: 33.95, Action count: 50, Action shape: (50, 7) Loop duration: 1.5494s, Real FPS: 0.65, Action FPS: 33.93, Action count: 50, Action shape: (50, 7) ^C Client stopped by user 推理线程结束测试视频:3. 小结本文介绍了Pi0云端推理部署的完整流程。首先配置好服务器端环境并启动策略服务,然后在客户端安装所需依赖并连接机械臂与摄像头。实测下来,即使对于没见过的云宝机械臂的抓取放置成功率也能达到100%,体现了模型强大的泛化能力!
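附:如果想在不连接机械臂的情况下先验证服务端是否正常工作,可以直接用openpi-client的WebsocketClientPolicy发送一帧假数据做连通性测试。以下仅为示意,观测字段名、图像分辨率和状态维度取决于训练时的配置,这里的取值均为假设:

import numpy as np
from openpi_client import websocket_client_policy

# 连接经AutoDL端口转发到本地的策略服务(端口与前文一致)
client = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=6006)

# 构造一帧假观测:两路相机图像 + 7维关节状态 + 文本指令(字段名为假设,需与训练配置一致)
obs = {
    "observation/image": np.zeros((240, 320, 3), dtype=np.uint8),
    "observation/wrist_image": np.zeros((240, 320, 3), dtype=np.uint8),
    "observation/state": np.zeros(7, dtype=np.float32),
    "prompt": "Put the toy to the white box",
}

result = client.infer(obs)
actions = np.asarray(result["actions"])   # 前文日志显示动作块形状为 (50, 7)
print("action chunk shape:", actions.shape)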
-
Pi0 模型训练Pi0 是一个通用机器人策略基础模型,专为解决机器人学习中的数据稀缺、泛化能力差和鲁棒性不足等核心挑战而设计。借鉴大语言模型的训练方法,Pi0 通过大规模预训练掌握广泛的机器人操作技能,并能通过微调快速适应具体任务需求。该模型支持高效的数据利用与快速部署,在有限数据下也能实现良好性能,为机器人智能化提供了一种可扩展的解决方案。1. 环境安装在LeRobot安装及使用教程的基础上,训练Pi0需要显存较大的显卡,我租赁的实例是 H20,使用VS Code SSH远程连接服务器:# ssh -p 17825 root@region-42.seetacloud.com Host region-42 HostName region-42.seetacloud.com User root Port 17825 在安装Pi0的过程中需要访问Github和HuggingFace等国外网站,建议开启AutoDL内置的学术加速服务,由于该加速服务可能对正常网络造成一定影响,安装过程中如果失败可以多次尝试,当不再需要时建议取消学术加速。# 启用学术加速 source /etc/network_turbo cd /root/autodl-tmp git clone https://github.com/enpeizhao/openpi_episode1_student.git cd openpi_episode1_student/ pip install uv GIT_LFS_SKIP_SMUDGE=1 uv sync GIT_LFS_SKIP_SMUDGE=1 uv pip install -e . # 取消学术加速 unset http_proxy && unset https_proxy2. 转换数据Pi0训练必须是弧度制数据,安装ffmpeg转换为openpi格式:sudo apt update sudo apt install ffmpeg uv run ./examples/libero/lerobot2oppi.py \ --source-repo-id=hou/demo_move_toy \ --target-repo-id=hou/demo_move_toy_openpi \ --output-path=./demo_move_toy_openpi \ --source-dataset-root=/root/autodl-tmp/demo_move_toy \ --max-episodes=100 3. 修改配置以下是我的配置文件,为了节省训练时间num_train_steps设置为10_000:# 单臂配置:openpi_episode1_student/src/openpi/training/config.py TrainConfig( name="enpei_robot_demo_move_toy_low_mem_finetune", # Here is an example of loading a pi0 model for LoRA fine-tuning. model=pi0.Pi0Config(paligemma_variant="gemma_2b_lora", action_expert_variant="gemma_300m_lora"), data=LeRobotLiberoDataConfig( repo_id="hou/demo_move_toy_openpi", # 数据集repo_od root="./demo_move_toy_openpi", # 数据集路径 base_config=DataConfig(prompt_from_task=True), ), weight_loader=weight_loaders.CheckpointWeightLoader("/root/autodl-tmp/pi0_base/params"), num_train_steps=10_000, # The freeze filter defines which parameters should be frozen during training. # We have a convenience function in the model config that returns the default freeze filter # for the given model config for LoRA finetuning. Just make sure it matches the model config # you chose above. freeze_filter=pi0.Pi0Config( paligemma_variant="gemma_2b_lora", action_expert_variant="gemma_300m_lora" ).get_freeze_filter(), # Turn off EMA for LoRA finetuning. ema_decay=None, ), 4. 模型训练# 计算 normalization uv run scripts/compute_norm_stats.py --config-name enpei_robot_demo_move_toy_low_mem_finetune # 开启训练 XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py enpei_robot_demo_move_toy_low_mem_finetune --exp-name=my_experiment --overwrite在训练过程中可以看到模型的LOSS在逐渐降低,也可以在终端中查看GPU的利用率,采集100个episodes数据使用H20迭代10_000步大概需要花费9个小时:训练过程中Pi0会保存多个时间点的权重checkpoint,每次约消耗磁盘12G,因此建议将代码目录放到/root/autodl-tmp数据盘目录下,这样训练过程中产生的权重文件会自动保存在/root/autodl-tmp/checkpoints下,避免系统盘因空间不足导致训练失败。5. 本文小结本文介绍了在H20实例上安装和训练Pi0模型的完整流程,包括环境配置、数据转换、修改训练配置以及模型训练等关键步骤,并提供了相关命令和注意事项,帮助读者快速上手Pi0模型训练。
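附:上面提到 Pi0 训练要求关节数据为弧度制。如果采集到的 LeRobot 数据集中关节角是角度制,可以在转换为 openpi 格式之前先做一次换算,下面是一个最小示意(字段名与数据组织方式均为假设,请以实际数据集为准):
import numpy as np

def deg_to_rad(joint_deg):
    # 将角度制的关节状态/动作数组换算为弧度制
    return np.deg2rad(np.asarray(joint_deg, dtype=np.float32))

# 用法示意:假设某一帧包含6个角度制的关节角
joints_deg = [0.0, 45.0, -90.0, 30.0, 60.0, -15.0]
print(deg_to_rad(joints_deg))  # 输出对应的弧度值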
-
如何在Python中调用C++版本的ByteTrack跟踪算法这个项目提供了基于ByteTrack-TensorRT的Python插件,并在原有算法基础上提供了跟踪目标的类别信息,Jetson Orin Nano在之前YOLOv5插件的基础上实现了高达83 FPS的实时检测跟踪性能。⚡ 极致性能: 基于TensorRT优化,充分利用硬件加速📦 开箱即用:构建过程简单,快速部署您的跟踪应用🐍 Python 友好: 使用Pybind11提供简洁Python接口📱 边缘设备优化: 特别针对Jetson边缘设备进行适配Build plugin首先安装必要的依赖库,然后克隆仓库并构建项目,注意JetPack 5.x版本才能正常运行:sudo apt update sudo apt install ffmpeg sudo apt install pybind11-dev sudo apt install libeigen3-dev git clone https://github.com/HouYanSong/bytetrack_pybind11.git cd bytetrack_pybind11 pip install pybind11 rm -fr build cmake -S . -B build cmake --build build[ 12%] Building CXX object CMakeFiles/bytetrack.dir/bytetrack/src/BYTETracker.cpp.o [ 25%] Building CXX object CMakeFiles/bytetrack.dir/bytetrack/src/STrack.cpp.o [ 37%] Building CXX object CMakeFiles/bytetrack.dir/bytetrack/src/kalmanFilter.cpp.o [ 50%] Building CXX object CMakeFiles/bytetrack.dir/bytetrack/src/lapjv.cpp.o [ 62%] Building CXX object CMakeFiles/bytetrack.dir/bytetrack/src/utils.cpp.o [ 75%] Linking CXX shared library libbytetrack.so [ 75%] Built target bytetrack [ 87%] Building CXX object CMakeFiles/bytetrack_trt.dir/bytetrack_trt.cpp.o [100%] Linking CXX shared module bytetrack_trt.cpython-38-aarch64-linux-gnu.so [100%] Built target bytetrack_trtRun demo我们提供了一个简单的Python示例,只需要导入C++构建的Python动态链接库就可以非常方便地调用ByteTrack跟踪算法,返回目标位置、跟踪ID和类别信息。import cv2 import time import ctypes # 加载依赖库 ctypes.CDLL("./yolov5_trt_plugin/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL) ctypes.CDLL("./yolov5_trt_plugin/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL) ctypes.CDLL("./build/libbytetrack.so", mode=ctypes.RTLD_GLOBAL) # 导入YOLOv5检测器和ByteTrack跟踪器 from yolov5_trt_plugin import yolov5_trt from build import bytetrack_trt def draw_image(image, detections, tracks, fps): for track in tracks: x, y, w, h = track.tlwh track_id = track.track_id class_id = track.label x1, y1, x2, y2 = int(x), int(y), int(x+w), int(y+h) cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2) cv2.putText(image, f"C:{class_id} T:{track_id}", (x1, y1 - 10), cv2.FONT_HERSHEY_PLAIN, 1.2, (0, 0, 255), 2) cv2.putText(image, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 2) return image def main(input_path, output_path): cap = cv2.VideoCapture(input_path) fps_value = int(cap.get(cv2.CAP_PROP_FPS)) width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'MJPG'), fps_value, (width, height)) detector = yolov5_trt.YOLOv5Detector("./yolov5_trt_plugin/yolov5s.engine", width, height) tracker = bytetrack_trt.BYTETracker(frame_rate = fps_value, track_buffer = 30) fps_list = [] frame_count = 0 total_time = 0.0 while cap.isOpened(): ret, frame = cap.read() if not ret: break start_time = time.time() # 目标检测 detections = detector.detect(input_image=frame, input_w=640, input_h=640, conf_thresh=0.45, nms_thresh=0.55) objects = [] for det in detections: x1, y1, x2, y2 = det['bbox'] rect = bytetrack_trt.RectFloat(x1, y1, x2-x1, y2-y1) # x, y, width, height obj = bytetrack_trt.Object() obj.rect = rect obj.label = det['class_id'] obj.prob = det['confidence'] objects.append(obj) # 目标跟踪 tracks = tracker.update(objects) process_time = time.time() - start_time current_fps = 1.0 / process_time if process_time > 0 else 0 frame_count += 1 total_time += process_time fps_list.append(current_fps) # 图像绘制 image = draw_image(frame, detections, tracks, current_fps) writer.write(image) cap.release() writer.release() if frame_count > 0: avg_fps = frame_count / total_time if total_time > 0
else 0 print(f"Processed {frame_count} frames") print(f"Average FPS: {avg_fps:.2f}") print(f"Min FPS: {min(fps_list):.2f}") print(f"Max FPS: {max(fps_list):.2f}") if __name__ == "__main__": input_video = "./media/sample_720p.mp4" output_video = "./result.avi" main(input_video, output_video) 仅需在终端中运行yolov5_bytetrack.py脚本:python yolov5_bytetrack.py[11/07/2025-17:13:10] [I] [TRT] Loaded engine size: 8 MiB Deserialize yoloLayer plugin: YoloLayer [11/07/2025-17:13:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +702, now: CPU 841, GPU 3927 (MiB) [11/07/2025-17:13:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +94, now: CPU 924, GPU 4021 (MiB) [11/07/2025-17:13:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7, now: CPU 0, GPU 7 (MiB) [11/07/2025-17:13:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 924, GPU 4021 (MiB) [11/07/2025-17:13:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +1, now: CPU 924, GPU 4022 (MiB) [11/07/2025-17:13:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 18 (MiB) Init ByteTrack! Processed 1442 frames Average FPS: 83.78 Min FPS: 68.31 Max FPS: 113.35 Conclusion Remarks本文实现了ByteTrack-TensorRT跟踪算法的Python插件,并在原有算法基础上提供了跟踪目标的类别信息,Jetson Orin Nano (8GB)上的YOLOv5实时目标检测和跟踪速度高达80FPS,满足对快速运动目标的跟踪需求。
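附:如果想进一步评估跟踪效果,可以把每一帧的跟踪结果导出为 MOT 格式的文本行(frame, id, x, y, w, h, conf, class, -1)。下面是一个最小示意,只用到了上文已经出现的 track.tlwh、track.track_id 和 track.label 字段,置信度统一填 1(能否直接拿到每条轨迹的得分取决于绑定的具体实现):
def tracks_to_mot(frame_id, tracks):
    # 将当前帧的跟踪结果转换为MOT格式文本行
    lines = []
    for t in tracks:
        x, y, w, h = t.tlwh
        lines.append(f"{frame_id},{t.track_id},{x:.1f},{y:.1f},{w:.1f},{h:.1f},1,{t.label},-1")
    return lines

# 用法示意:在main()的循环中累积每一帧的结果,处理完后写入文件
# mot_lines += tracks_to_mot(frame_count, tracks)
# with open("result_mot.txt", "w") as f:
#     f.write("\n".join(mot_lines))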
-
如何在Jetson上将YOLOv5实时检测速度提升至120+FPS这个项目提供了基于 Pybind11 的 TensorRT YOLOv5 插件 Python 绑定,实现了令人难以置信的实时目标检测性能!⚡ 超100FPS性能: 在 Jetson Orin Nano 上轻松实现超过 120 帧/秒的检测速度🎯 高精度检测: 基于成熟的 YOLOv5 架构,准确识别COCO数据集上的80类目标🔌 即插即用: 简单的 Python 接口,无需复杂的配置🛠️ 工业级优化: 采用 TensorRT 进行模型优化和加速1. Building the plugin首先安装必要的库克隆仓库构建项目,注意JetPack 5.x版本才能正常运行:sudo apt update sudo apt install ffmpeg sudo apt install pybind11-dev git clone https://github.com/HouYanSong/yolov5_trt_pybind11.git cd yolov5_trt_pybind11 pip install pybind11 rm -fr build cmake -S . -B build cmake --build build2. Model quantization生成量化图片对YOLOv5s模型进行Int8量化,保存量化后的模型:./media/gen_calib.sh ./build/build weights/yolov5s.onnx 1 ./media/ ./media/filelist.txt weights/yolov5s.engine[11/06/2025-11:57:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +221, GPU +0, now: CPU 249, GPU 4229 (MiB) [11/06/2025-11:57:39] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +277, now: CPU 574, GPU 4529 (MiB) [11/06/2025-11:57:39] [I] [TRT] ---------------------------------------------------------------- [11/06/2025-11:57:39] [I] [TRT] Input filename: weights/yolov5s.onnx [11/06/2025-11:57:39] [I] [TRT] ONNX IR version: 0.0.7 [11/06/2025-11:57:39] [I] [TRT] Opset version: 12 [11/06/2025-11:57:39] [I] [TRT] Producer name: [11/06/2025-11:57:39] [I] [TRT] Producer version: [11/06/2025-11:57:39] [I] [TRT] Domain: [11/06/2025-11:57:39] [I] [TRT] Model version: 0 [11/06/2025-11:57:39] [I] [TRT] Doc string: [11/06/2025-11:57:39] [I] [TRT] ---------------------------------------------------------------- [11/06/2025-11:57:39] [I] [TRT] No importer registered for op: YoloLayer_TRT. Attempting to import as plugin. [11/06/2025-11:57:39] [I] [TRT] Searching for plugin: YoloLayer_TRT, plugin_version: 1, plugin_namespace: [11/06/2025-11:57:39] [I] [TRT] Successfully created plugin: YoloLayer_TRT [11/06/2025-11:57:39] [I] sample0001.png [11/06/2025-11:57:39] [I] sample0002.png [11/06/2025-11:57:39] [I] sample0003.png [11/06/2025-11:57:39] [I] sample0004.png [11/06/2025-11:57:39] [I] sample0005.png [11/06/2025-11:57:39] [I] sample0006.png [11/06/2025-11:57:39] [I] sample0007.png [11/06/2025-11:57:39] [I] sample0008.png [11/06/2025-11:57:39] [I] sample0009.png [11/06/2025-11:57:39] [I] sample0010.png [11/06/2025-11:57:39] [I] sample0011.png [11/06/2025-11:57:39] [I] sample0012.png [11/06/2025-11:57:39] [I] sample0013.png [11/06/2025-11:57:39] [I] sample0014.png [11/06/2025-11:57:39] [I] sample0015.png [11/06/2025-11:57:39] [I] sample0016.png [11/06/2025-11:57:39] [I] sample0017.png [11/06/2025-11:57:39] [I] sample0018.png [11/06/2025-11:57:39] [I] sample0019.png [11/06/2025-11:57:39] [I] sample0020.png [11/06/2025-11:57:39] [I] sample0021.png [11/06/2025-11:57:39] [I] sample0022.png [11/06/2025-11:57:39] [I] sample0023.png [11/06/2025-11:57:39] [I] sample0024.png [11/06/2025-11:57:39] [I] sample0025.png [11/06/2025-11:57:39] [I] sample0026.png [11/06/2025-11:57:39] [I] sample0027.png [11/06/2025-11:57:39] [I] sample0028.png [11/06/2025-11:57:39] [I] sample0029.png [11/06/2025-11:57:39] [I] sample0030.png [11/06/2025-11:57:39] [I] sample0031.png [11/06/2025-11:57:39] [I] sample0032.png [11/06/2025-11:57:39] [I] sample0033.png [11/06/2025-11:57:39] [I] sample0034.png [11/06/2025-11:57:39] [I] sample0035.png [11/06/2025-11:57:39] [I] sample0036.png [11/06/2025-11:57:39] [I] sample0037.png [11/06/2025-11:57:39] [I] sample0038.png [11/06/2025-11:57:39] [I] sample0039.png [11/06/2025-11:57:39] [I] sample0040.png [11/06/2025-11:57:39] [I] sample0041.png [11/06/2025-11:57:39] [I] 
sample0042.png ……(sample0043.png 至 sample0127.png 的校准图片日志格式相同,此处省略)…… [11/06/2025-11:57:39] [I] sample0128.png
[11/06/2025-11:57:39] [I] sample0129.png [11/06/2025-11:57:39] [I] sample0130.png [11/06/2025-11:57:39] [I] sample0131.png [11/06/2025-11:57:39] [I] sample0132.png [11/06/2025-11:57:39] [I] sample0133.png [11/06/2025-11:57:39] [I] sample0134.png [11/06/2025-11:57:39] [I] sample0135.png [11/06/2025-11:57:39] [I] sample0136.png [11/06/2025-11:57:39] [I] sample0137.png [11/06/2025-11:57:39] [I] sample0138.png [11/06/2025-11:57:39] [I] sample0139.png [11/06/2025-11:57:39] [I] sample0140.png [11/06/2025-11:57:39] [I] sample0141.png [11/06/2025-11:57:39] [I] sample0142.png [11/06/2025-11:57:39] [I] sample0143.png [11/06/2025-11:57:39] [I] sample0144.png [11/06/2025-11:57:39] [I] sample0145.png CalibrationDataReader: 145 images, 145 batches. [11/06/2025-11:57:39] [I] [TRT] Reading Calibration Cache for calibrator: MinMaxCalibration [11/06/2025-11:57:39] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales. [11/06/2025-11:57:39] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache. [11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeNumDetection, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeDetectionClasses, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on DLA ---------- [11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on GPU ---------- [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.0/conv/Conv + PWN(PWN(/model.0/act/Sigmoid), /model.0/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.1/conv/Conv + PWN(PWN(/model.1/act/Sigmoid), /model.1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv1/conv/Conv + PWN(PWN(/model.2/cv1/act/Sigmoid), /model.2/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv2/conv/Conv + PWN(PWN(/model.2/cv2/act/Sigmoid), /model.2/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv1/conv/Conv + PWN(PWN(/model.2/m/m.0/cv1/act/Sigmoid), /model.2/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.2/m/m.0/cv2/act/Sigmoid), /model.2/m/m.0/cv2/act/Mul), /model.2/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv3/conv/Conv + PWN(PWN(/model.2/cv3/act/Sigmoid), /model.2/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.3/conv/Conv + PWN(PWN(/model.3/act/Sigmoid), /model.3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv1/conv/Conv + PWN(PWN(/model.4/cv1/act/Sigmoid), /model.4/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv2/conv/Conv + PWN(PWN(/model.4/cv2/act/Sigmoid), /model.4/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv1/conv/Conv + PWN(PWN(/model.4/m/m.0/cv1/act/Sigmoid), /model.4/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.0/cv2/act/Sigmoid), /model.4/m/m.0/cv2/act/Mul), /model.4/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: 
/model.4/m/m.1/cv1/conv/Conv + PWN(PWN(/model.4/m/m.1/cv1/act/Sigmoid), /model.4/m/m.1/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.1/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.1/cv2/act/Sigmoid), /model.4/m/m.1/cv2/act/Mul), /model.4/m/m.1/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv3/conv/Conv + PWN(PWN(/model.4/cv3/act/Sigmoid), /model.4/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.5/conv/Conv + PWN(PWN(/model.5/act/Sigmoid), /model.5/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv1/conv/Conv + PWN(PWN(/model.6/cv1/act/Sigmoid), /model.6/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv2/conv/Conv + PWN(PWN(/model.6/cv2/act/Sigmoid), /model.6/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv1/conv/Conv + PWN(PWN(/model.6/m/m.0/cv1/act/Sigmoid), /model.6/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.0/cv2/act/Sigmoid), /model.6/m/m.0/cv2/act/Mul), /model.6/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv1/conv/Conv + PWN(PWN(/model.6/m/m.1/cv1/act/Sigmoid), /model.6/m/m.1/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.1/cv2/act/Sigmoid), /model.6/m/m.1/cv2/act/Mul), /model.6/m/m.1/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv1/conv/Conv + PWN(PWN(/model.6/m/m.2/cv1/act/Sigmoid), /model.6/m/m.2/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.2/cv2/act/Sigmoid), /model.6/m/m.2/cv2/act/Mul), /model.6/m/m.2/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv3/conv/Conv + PWN(PWN(/model.6/cv3/act/Sigmoid), /model.6/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.7/conv/Conv + PWN(PWN(/model.7/act/Sigmoid), /model.7/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv1/conv/Conv + PWN(PWN(/model.8/cv1/act/Sigmoid), /model.8/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv2/conv/Conv + PWN(PWN(/model.8/cv2/act/Sigmoid), /model.8/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv1/conv/Conv + PWN(PWN(/model.8/m/m.0/cv1/act/Sigmoid), /model.8/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.8/m/m.0/cv2/act/Sigmoid), /model.8/m/m.0/cv2/act/Mul), /model.8/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv3/conv/Conv + PWN(PWN(/model.8/cv3/act/Sigmoid), /model.8/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv1/conv/Conv + PWN(PWN(/model.9/cv1/act/Sigmoid), /model.9/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m/MaxPool [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_1/MaxPool [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_2/MaxPool [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/cv1/act/Mul_output_0 copy 
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m/MaxPool_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m_1/MaxPool_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv2/conv/Conv + PWN(PWN(/model.9/cv2/act/Sigmoid), /model.9/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.10/conv/Conv + PWN(PWN(/model.10/act/Sigmoid), /model.10/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.11/Resize [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.11/Resize_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv1/conv/Conv + PWN(PWN(/model.13/cv1/act/Sigmoid), /model.13/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv2/conv/Conv + PWN(PWN(/model.13/cv2/act/Sigmoid), /model.13/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv1/conv/Conv + PWN(PWN(/model.13/m/m.0/cv1/act/Sigmoid), /model.13/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv2/conv/Conv + PWN(PWN(/model.13/m/m.0/cv2/act/Sigmoid), /model.13/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv3/conv/Conv + PWN(PWN(/model.13/cv3/act/Sigmoid), /model.13/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.14/conv/Conv + PWN(PWN(/model.14/act/Sigmoid), /model.14/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.15/Resize [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.15/Resize_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.4/cv3/act/Mul_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv1/conv/Conv + PWN(PWN(/model.17/cv1/act/Sigmoid), /model.17/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv2/conv/Conv + PWN(PWN(/model.17/cv2/act/Sigmoid), /model.17/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv1/conv/Conv + PWN(PWN(/model.17/m/m.0/cv1/act/Sigmoid), /model.17/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv2/conv/Conv + PWN(PWN(/model.17/m/m.0/cv2/act/Sigmoid), /model.17/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv3/conv/Conv + PWN(PWN(/model.17/cv3/act/Sigmoid), /model.17/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.18/conv/Conv + PWN(PWN(/model.18/act/Sigmoid), /model.18/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.0/Conv + PWN(/model.24/Sigmoid) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.14/act/Mul_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv1/conv/Conv + PWN(PWN(/model.20/cv1/act/Sigmoid), /model.20/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv2/conv/Conv + PWN(PWN(/model.20/cv2/act/Sigmoid), /model.20/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv1/conv/Conv + PWN(PWN(/model.20/m/m.0/cv1/act/Sigmoid), /model.20/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv2/conv/Conv + PWN(PWN(/model.20/m/m.0/cv2/act/Sigmoid), /model.20/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv3/conv/Conv + PWN(PWN(/model.20/cv3/act/Sigmoid), /model.20/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.21/conv/Conv + 
PWN(PWN(/model.21/act/Sigmoid), /model.21/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.1/Conv + PWN(/model.24/Sigmoid_1) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.10/act/Mul_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv1/conv/Conv + PWN(PWN(/model.23/cv1/act/Sigmoid), /model.23/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv2/conv/Conv + PWN(PWN(/model.23/cv2/act/Sigmoid), /model.23/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv1/conv/Conv + PWN(PWN(/model.23/m/m.0/cv1/act/Sigmoid), /model.23/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv2/conv/Conv + PWN(PWN(/model.23/m/m.0/cv2/act/Sigmoid), /model.23/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv3/conv/Conv + PWN(PWN(/model.23/cv3/act/Sigmoid), /model.23/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.2/Conv + PWN(/model.24/Sigmoid_2) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] PLUGIN_V2: YoloLayer [11/06/2025-11:57:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +689, now: CPU 1137, GPU 5200 (MiB) [11/06/2025-11:57:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +132, now: CPU 1220, GPU 5332 (MiB) [11/06/2025-11:57:41] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [11/06/2025-12:00:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes. [11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 1115794944 [11/06/2025-12:01:03] [I] [TRT] Detected 1 inputs and 4 output network tensors. [11/06/2025-12:01:03] [I] [TRT] Total Host Persistent Memory: 175984 [11/06/2025-12:01:03] [I] [TRT] Total Device Persistent Memory: 614912 [11/06/2025-12:01:03] [I] [TRT] Total Scratch Memory: 0 [11/06/2025-12:01:03] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 7 MiB, GPU 553 MiB [11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 67 steps to complete. [11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 2.77161ms to assign 6 blocks to 67 nodes requiring 10925056 bytes. [11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 10925056 [11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB) [11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB) [11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +8, now: CPU 7, GPU 8 (MiB) Engine build success! 
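上面的 ./media/gen_calib.sh 用于准备 Int8 量化所需的校准图片和 filelist.txt,具体实现以仓库为准。其大致思路是从样例视频中按固定间隔抽帧保存为 png,并把文件名写入 filelist.txt,下面给出一个等价的 Python 最小示意(路径、抽帧间隔和图片数量均为假设):
import os
import cv2

def gen_calib_images(video_path, out_dir, interval=10, max_images=145):
    # 从视频中每隔interval帧保存一张校准图片,并生成filelist.txt
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    names, count = [], 0
    while len(names) < max_images:
        ret, frame = cap.read()
        if not ret:
            break
        if count % interval == 0:
            name = f"sample{len(names) + 1:04d}.png"
            cv2.imwrite(os.path.join(out_dir, name), frame)
            names.append(name)
        count += 1
    cap.release()
    with open(os.path.join(out_dir, "filelist.txt"), "w") as f:
        f.write("\n".join(names))

gen_calib_images("./media/sample_720p.mp4", "./media", interval=10)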
Python call example以下是一个简单Python示例调用C++生成的动态链接库,仅需指定模型文件的路径和视频输入的大小,就能返回视频每一帧的检测结果,并且在视频推理过程中可以动态调整置信度和交并比等参数的阈值。import cv2 import time import ctypes ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL) ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL) from build import yolov5_trt def draw_detections(image, detections, fps): for detection in detections: class_id = detection['class_id'] x1, y1, x2, y2 = detection['bbox'] confidence = detection['confidence'] cv2.rectangle(image, (x1, y1), (x2, y2), (0x27, 0xC1, 0x36), 2) cv2.putText(image, f"{class_id}:{confidence:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_PLAIN, 1.2, (0x27, 0xC1, 0x36), 2) cv2.putText(image, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 2) return image def main(input_path, output_path): cap = cv2.VideoCapture(input_path) fps = int(cap.get(cv2.CAP_PROP_FPS)) width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) detector = yolov5_trt.YOLOv5Detector("./weights/yolov5s.engine", width, height) writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'MJPG'), fps, (width, height)) fps_list = [] frame_count = 0 total_time = 0.0 while cap.isOpened(): ret, frame = cap.read() if not ret: break start_time = time.time() detections = detector.detect(input_image=frame, input_w=640, input_h=640, conf_thresh=0.45, nms_thresh=0.55) process_time = time.time() - start_time current_fps = 1.0 / process_time if process_time > 0 else 0 frame_count += 1 total_time += process_time fps_list.append(current_fps) image = draw_detections(frame, detections, current_fps) writer.write(image) cap.release() writer.release() if frame_count > 0: avg_fps = frame_count / total_time if total_time > 0 else 0 print(f"Processed {frame_count} frames") print(f"Average FPS: {avg_fps:.2f}") print(f"Min FPS: {min(fps_list):.2f}") print(f"Max FPS: {max(fps_list):.2f}") if __name__ == "__main__": input_video = "./media/sample_720p.mp4" output_video = "./result.avi" main(input_video, output_video) 对应的C++推理代码如下:#include "NvInfer.h" #include "logger.h" #include "common.h" #include "buffers.h" #include "utils/preprocess.h" #include "utils/postprocess.h" #include "utils/types.h" #include "utils/utils.h" #include <pybind11/pybind11.h> #include <pybind11/numpy.h> #include <pybind11/stl.h> #include <memory> #include <mutex> namespace py = pybind11; // 将numpy数组转换为cv::Mat cv::Mat numpy_to_mat(py::array_t<unsigned char>& input) { py::buffer_info buf_info = input.request(); if (buf_info.ndim == 3) { // 彩色图像 int height = buf_info.shape[0]; int width = buf_info.shape[1]; int channels = buf_info.shape[2]; cv::Mat mat(height, width, CV_8UC3, (unsigned char*)buf_info.ptr); return mat.clone(); } else if (buf_info.ndim == 2) { // 灰度图像 int height = buf_info.shape[0]; int width = buf_info.shape[1]; cv::Mat mat(height, width, CV_8UC1, (unsigned char*)buf_info.ptr); return mat.clone(); } throw std::runtime_error("Unsupported array dimensions"); } // 将cv::Mat转换为numpy数组 py::array_t<unsigned char> mat_to_numpy(cv::Mat& mat) { if (mat.empty()) { return py::array_t<unsigned char>(); } if (mat.channels() == 1) { // 灰度图像 auto result = py::array_t<unsigned char>({mat.rows, mat.cols}); auto buf = result.request(); memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total()); return result; } else { // 彩色图像 auto result = py::array_t<unsigned char>({mat.rows, mat.cols, mat.channels()}); auto buf = result.request(); memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total() * mat.channels()); return result; 
} } // 加载模型文件 std::vector<unsigned char> load_engine_file(const std::string &file_name) { std::vector<unsigned char> engine_data; std::ifstream engine_file(file_name, std::ios::binary); assert(engine_file.is_open() && "Unable to load engine file."); engine_file.seekg(0, engine_file.end); int length = engine_file.tellg(); engine_data.resize(length); engine_file.seekg(0, engine_file.beg); engine_file.read(reinterpret_cast<char *>(engine_data.data()), length); return engine_data; } // YOLOv5推理器类 class YOLOv5Detector { private: std::unique_ptr<nvinfer1::IRuntime> runtime; std::shared_ptr<nvinfer1::ICudaEngine> engine; std::unique_ptr<nvinfer1::IExecutionContext> context; std::unique_ptr<samplesCommon::BufferManager> buffers; bool initialized = false; public: YOLOv5Detector(const std::string& engine_file, int frame_width, int frame_height) { initialize(engine_file); int img_size = frame_width * frame_height; cuda_preprocess_init(img_size); // 申请cuda内存 } void initialize(const std::string& engine_file) { // ========== 1. 创建推理运行时runtime ========== runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(sample::gLogger.getTRTLogger())); if (!runtime) { throw std::runtime_error("Failed to create TensorRT runtime"); } // ========== 2. 反序列化生成engine ========== auto plan = load_engine_file(engine_file); engine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(plan.data(), plan.size())); if (!engine) { throw std::runtime_error("Failed to deserialize engine"); } // ========== 3. 创建执行上下文context ========== context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext()); if (!context) { throw std::runtime_error("Failed to create execution context"); } // ========== 4. 创建输入输出缓冲区 ========== buffers = std::make_unique<samplesCommon::BufferManager>(engine); initialized = true; } py::list detect(py::array_t<unsigned char>& input_image, int input_w=kInputW, int input_h=kInputH, float conf_thresh=kConfThresh, float nms_thresh=kNmsThresh) { if (!initialized) { throw std::runtime_error("Detector not initialized"); } // 将numpy数组转换为cv::Mat cv::Mat frame = numpy_to_mat(input_image); if (frame.empty()) { throw std::runtime_error("Invalid input image"); } // CUDA预处理 process_input_gpu(frame, (float *)buffers->getDeviceBuffer(kInputTensorName), input_w, input_h); // ========== 5. 
执行推理 ========== context->executeV2(buffers->getDeviceBindings().data()); // 拷贝回host buffers->copyOutputToHost(); // 从buffer manager中获取模型输出 int32_t *num_det = (int32_t *)buffers->getHostBuffer(kOutNumDet); int32_t *cls = (int32_t *)buffers->getHostBuffer(kOutDetCls); float *conf = (float *)buffers->getHostBuffer(kOutDetScores); float *bbox = (float *)buffers->getHostBuffer(kOutDetBBoxes); // 执行nms(非极大值抑制) std::vector<Detection> bboxs; yolo_nms(bboxs, num_det, cls, conf, bbox, conf_thresh, nms_thresh); // 返回检测结果 py::list result_list; for (size_t j = 0; j < bboxs.size(); j++) { cv::Rect r = get_rect(frame, bboxs[j].bbox, input_w, input_h); py::dict detection; detection["class_id"] = (int)bboxs[j].class_id; detection["confidence"] = (float)bboxs[j].conf; detection["bbox"] = py::cast(std::vector<int>{r.x, r.y, r.x + r.width, r.y + r.height}); result_list.append(detection); } return result_list; } }; // Python绑定代码 PYBIND11_MODULE(yolov5_trt, m) { m.doc() = "YOLOv5 TensorRT Python bindings"; py::class_<YOLOv5Detector>(m, "YOLOv5Detector") .def(py::init<const std::string&, int, int>(), "Initialize detector with engine file", py::arg("engine_file"), py::arg("frame_width"), py::arg("frame_height")) .def("detect", &YOLOv5Detector::detect, "Perform detection on input image", py::arg("input_image"), py::arg("input_w") = kInputW, py::arg("input_h") = kInputH, py::arg("conf_thresh") = kConfThresh, py::arg("nms_thresh") = kNmsThresh); } 实际在Jetson Oron Nano (8GB)上对720P输入大小的视频进行目标检测,平均帧率稳定在120+ FPS,满足工业场景下对实时性的要求。python yolov5_infer.py[11/06/2025-15:23:26] [I] [TRT] Loaded engine size: 7 MiB Deserialize yoloLayer plugin: YoloLayer [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +955, now: CPU 830, GPU 4470 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +149, now: CPU 913, GPU 4619 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7, now: CPU 0, GPU 7 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 913, GPU 4620 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 913, GPU 4623 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 18 (MiB) Processed 1442 frames Average FPS: 127.51 Min FPS: 75.75 Max FPS: 134.67 Conclusion Remarks最后我们还提供了ByteTrack跟踪算法的Python绑定,基于Pybind11实现,并在原有算法基础上提供了跟踪目标的类别信息,Jetson Orin Nano也能在此基础上也能实现高达83 FPS的实时目标检测和跟踪性能:ByteTrack-Pybind11: 高性能实时目标跟踪解决方案 🚀
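附:detect() 返回的是包含 class_id、confidence 和 bbox 的字典列表,实际使用时往往只关心部分类别。下面是一个按类别和置信度过滤检测结果的最小示意(这里假设使用官方 COCO 80 类权重,class_id 为 0 通常对应 person,阈值仅作演示):
def filter_detections(detections, keep_class_ids=(0,), conf_thresh=0.5):
    # 只保留指定类别且置信度达到阈值的检测结果
    return [det for det in detections
            if det['class_id'] in keep_class_ids and det['confidence'] >= conf_thresh]

# 用法示意:persons = filter_detections(detections, keep_class_ids=(0,), conf_thresh=0.5)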
-
CNN-RNN 视频动态手势识别人工智能的发展日新月异,也深刻的影响到人机交互领域的发展。手势动作作为一种自然、快捷的交互方式,在智能驾驶、虚拟现实等领域有着广泛的应用。手势识别的任务是,当操作者做出某个手势动作后,计算机能够快速准确的判断出该手势的类型。本文将使用ModelArts开发训练一个视频动态手势识别的算法模型,对上滑、下滑、左滑、右滑、打开、关闭等动态手势类别进行检测,实现类似华为手机隔空手势的功能。算法简介CNN-RNN视频动态手势识别算法首先使用预训练网络InceptionResNetV2逐帧提取视频动作片段特征,然后输入LSTM进行分类。我们使用全栈AI黑客松决赛样例数据对算法进行测试,总共包含108段视频,数据集包含无效手势、上滑、下滑、左滑、右滑、打开、关闭、放大、缩小等9种手势的视频,数据集下载链接如下:https://developer.huaweicloud.com/develop/aigallery/dataset/detail?id=7e9c0d90-461f-4af2-93b3-67a8df76c109代码实现首先我们将采集的视频文件解码抽取关键帧,每隔4帧保存一次,然后对图像进行中心裁剪和预处理,代码如下:def load_video(file_name): cap = cv2.VideoCapture(file_name) # 每隔多少帧抽取一次 frame_interval = 4 frames = [] count = 0 while True: ret, frame = cap.read() if not ret: break # 每隔frame_interval帧保存一次 if count % frame_interval == 0: # 中心裁剪 frame = crop_center_square(frame) # 缩放 frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE)) # BGR -> RGB [0,1,2] -> [2,1,0] frame = frame[:, :, [2, 1, 0]] frames.append(frame) count += 1 return np.array(frames) 然后我们创建图像特征提取器,使用预训练模型InceptionResNetV2提取图像特征,代码如下:def get_feature_extractor(): feature_extractor = keras.applications.inception_resnet_v2.InceptionResNetV2( weights = 'imagenet', include_top = False, pooling = 'avg', input_shape = (IMG_SIZE, IMG_SIZE, 3) ) preprocess_input = keras.applications.inception_resnet_v2.preprocess_input inputs = keras.Input((IMG_SIZE, IMG_SIZE, 3)) preprocessed = preprocess_input(inputs) outputs = feature_extractor(preprocessed) model = keras.Model(inputs, outputs, name = 'feature_extractor') return model接着提取视频特征向量,如果视频不足40帧就创建全0数组进行补白:def load_data(videos, labels): video_features = [] for video in tqdm(videos): frames = load_video(video) counts = len(frames) # 如果帧数小于MAX_SEQUENCE_LENGTH if counts < MAX_SEQUENCE_LENGTH: # 补白 diff = MAX_SEQUENCE_LENGTH - counts # 创建全0的numpy数组 padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3)) # 数组拼接 frames = np.concatenate((frames, padding)) # 获取前MAX_SEQUENCE_LENGTH帧画面 frames = frames[:MAX_SEQUENCE_LENGTH, :] # 批量提取特征 video_feature = feature_extractor.predict(frames) video_features.append(video_feature) return np.array(video_features), np.array(labels) 最后创建LSTM Model,代码如下:def video_cls_model(class_vocab): # 类别数量 classes_num = len(class_vocab) # 定义模型 model = keras.Sequential([ layers.Input(shape=(MAX_SEQUENCE_LENGTH, NUM_FEATURES)), layers.LSTM(64, return_sequences=True), layers.Flatten(), layers.Dense(classes_num, activation='softmax') ]) # 编译模型 model.compile(optimizer = keras.optimizers.Adam(1e-5), loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False), metrics = ['accuracy'] ) return model模型训练体验完整的训练流程可以点击Run in ModelArts运行我发布的Notebook使用云上的免费算力训练,代码链接如下:https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=e8914ecc-953e-48b1-8e5d-90b37b3bc8e9视频推理首先加载LSTM Model,获取视频类别索引标签:import random # 加载模型 model = tf.keras.models.load_model('saved_model') # 类别标签 label_to_name = {0:'无效手势', 1:'上滑', 2:'下滑', 3:'左滑', 4:'右滑', 5:'打开', 6:'关闭', 7:'放大', 8:'缩小'} 然后使用图像特征提取器InceptionResNetV2提取视频特征:# 获取视频特征 def getVideoFeat(frames): frames_count = len(frames) # 如果帧数小于MAX_SEQUENCE_LENGTH if frames_count < MAX_SEQUENCE_LENGTH: # 补白 diff = MAX_SEQUENCE_LENGTH - frames_count # 创建全0的numpy数组 padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3)) # 数组拼接 frames = np.concatenate((frames, padding)) # 取前MAX_SEQ_LENGTH帧 frames = frames[:MAX_SEQUENCE_LENGTH,:] # 计算视频特征 N, 1536 video_feat = feature_extractor.predict(frames) return video_feat最后将视频序列的特征向量输入LSTM进行预测:# 视频预测 def testVideo(): test_file = random.sample(videos, 1)[0] label = test_file.split('_')[-2] print('文件名:{}'.format(test_file) ) 
print('真实类别:{}'.format(label_to_name.get(int(label))) ) # 读取视频每一帧 frames = load_video(test_file) # 挑选前帧MAX_SEQUENCE_LENGTH显示 frames = frames[:MAX_SEQUENCE_LENGTH].astype(np.uint8) # 保存为GIF imageio.mimsave('animation.gif', frames, duration=10) # 获取特征 feat = getVideoFeat(frames) # 模型推理 prob = model.predict(tf.expand_dims(feat, axis=0))[0] print('预测类别:') for i in np.argsort(prob)[::-1][:5]: print('{}: {}%'.format(label_to_name[i], round(prob[i]*100, 2))) return display(Image(open('animation.gif', 'rb').read())) 运行testVideo()函数,会随机选择一个视频进行预测,并显示模型的预测结果:文件名:hand_gesture/man_005_3_1.mp4 真实类别:左滑 预测类别: 左滑: 78.32% 右滑: 8.54% 下滑: 5.51% 无效手势: 4.33% 上滑: 1.67%文章小结本文介绍了基于CNN-RNN架构的视频动态手势识别算法,通过结合InceptionResNetV2卷积神经网络和LSTM循环神经网络,实现了对多种手势动作(如上滑、下滑、左滑、右滑、打开、关闭等)的高效识别。
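附:上文 load_video 中用到的 crop_center_square 函数没有贴出,常见做法是取画面中心的最大正方形区域,下面是一个参考实现(仅作示意):
def crop_center_square(frame):
    # 从画面中心裁剪出最大的正方形区域
    h, w = frame.shape[:2]
    side = min(h, w)
    start_y = (h - side) // 2
    start_x = (w - side) // 2
    return frame[start_y:start_y + side, start_x:start_x + side]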
-
OrangePi AI Studio Pro基于MindYolo实现YOLOv8模型训练及验证OrangePi AI Studio Pro是基于 2 个昇腾 310P 处理器的新一代高性能推理解析卡,提供基础通用算力+超强AI算力,整合了训练和推理的全部底层软件栈,实现训推一体。其中AI半精度FP16算力约为176TFLOPS,整数Int8精度可达352TOPS。本章将介绍如何在昇腾310上基于mindyolo实现YOLOv8模型的训练及验证。一、环境准备首先检查昇腾310P的NPU驱动,在命令行中输入:npu-smi info,可以看到两块昇腾310P的AICore的利用率和内存的占用情况。+--------------------------------------------------------------------------------------------------------+ | npu-smi v1.0 Version: 24.1.rc4.b999 | +-------------------------------+-----------------+------------------------------------------------------+ | NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) | | Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) | +===============================+=================+======================================================+ | 30208 310P1 | OK | NA 41 0 / 0 | | 0 0 | 0000:77:00.0 | 0 1416 / 89608 | +-------------------------------+-----------------+------------------------------------------------------+ | 30208 310P1 | OK | NA 40 0 / 0 | | 1 1 | 0000:77:00.0 | 0 1622 / 89085 | +===============================+=================+======================================================+ +-------------------------------+-----------------+------------------------------------------------------+ | NPU Chip | Process id | Process name | Process memory(MB) | +===============================+=================+======================================================+ | No running processes found in NPU 30208 | +===============================+=================+======================================================+之后升级CANN的版本以及更新MindSpore,可以参考我的另一篇文章:如何在OrangePi Studio Pro上升级CANN以及的Pytorch和MindSpore,升级完成后,检查MindSpore的安装情况,我使用的版本是2.7.0。source /usr/local/Ascend/ascend-toolkit/set_env.sh python3 -c "import mindspore;mindspore.set_context(device_target='Ascend');mindspore.run_check()" [WARNING] ME(1621400:139701939115840,MainProcess):2025-09-24-10:46:21.978.000 [mindspore/context.py:1412] For 'context.set_context', the parameter 'device_target' will be deprecated and removed in a future version. Please use the api mindspore.set_device() instead. MindSpore version: 2.7.0 [WARNING] GE_ADPT(1621400,7f0e18710640,python3):2025-09-24-10:46:23.323.570 [mindspore/ops/kernel/ascend/acl_ir/op_api_exec.cc:169] GetAscendDefaultCustomPath] Checking whether the so exists or if permission to access it is available: /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize_vision/op_api/lib/libcust_opapi.so The result of multiplication calculation is correct, MindSpore has been installed on platform [Ascend] successfully! 克隆mindyolo仓库,我们使用由天津大学发布的无人机视觉挑战赛数据集VisDrone-Dataset进行模型的训练及验证。git clone https://github.com/mindspore-lab/mindyolo.git正克隆到 'mindyolo'... remote: Enumerating objects: 3505, done. remote: Counting objects: 100% (157/157), done. remote: Compressing objects: 100% (69/69), done. remote: Total 3505 (delta 114), reused 88 (delta 88), pack-reused 3348 (from 2) 接收对象中: 100% (3505/3505), 6.74 MiB | 8.91 MiB/s, 完成. 处理 delta 中: 100% (2048/2048), 完成.我们将下载后的数据集首先转换成YOLO格式,具体的转换教程可以参考网上的公开资料,经过转换后的visdrone数据集包括以下内容:visdrone ├── train │ ├── images │ │ ├── 000001.jpg │ │ ├── 000002.jpg │ │ ├── ... │ │ └── ... │ └── labels │ ├── 000001.txt │ ├── 000002.txt │ ├── ... │ └── ... └── val ├── images │ ├── 000001.jpg │ ├── 000002.jpg │ ├── ... │ └── ... └── labels ├── 000001.txt ├── 000001.txt ├── ... └── ... 
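上面从 VisDrone 原始标注到 YOLO 格式的转换也可以用类似下面的脚本完成:VisDrone 每行标注为 left,top,width,height,score,category,truncation,occlusion,这里直接把 category(0~11)作为 YOLO 的类别索引,与后文 12 类的顺序保持一致(目录结构、文件命名以及是否保留 ignored regions 等细节请以实际数据集和公开教程为准,仅作示意):
import os
import cv2

def visdrone2yolo(img_dir, anno_dir, label_dir):
    # 将VisDrone标注转换为YOLO格式:class cx cy w h(均归一化到0~1)
    os.makedirs(label_dir, exist_ok=True)
    for anno_name in os.listdir(anno_dir):
        if not anno_name.endswith('.txt'):
            continue
        img = cv2.imread(os.path.join(img_dir, anno_name.replace('.txt', '.jpg')))
        if img is None:
            continue
        h, w = img.shape[:2]
        lines = []
        with open(os.path.join(anno_dir, anno_name)) as f:
            for row in f:
                items = row.strip().split(',')
                if len(items) < 6:
                    continue
                x, y, bw, bh, _, cls = [float(v) for v in items[:6]]
                cx, cy = (x + bw / 2) / w, (y + bh / 2) / h
                lines.append(f"{int(cls)} {cx:.6f} {cy:.6f} {bw / w:.6f} {bh / h:.6f}")
        with open(os.path.join(label_dir, anno_name), 'w') as f:
            f.write('\n'.join(lines))

# 用法示意(路径为假设):
# visdrone2yolo('VisDrone2019-DET-train/images', 'VisDrone2019-DET-train/annotations', 'visdrone/train/labels')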
二、数据格式转换由于mindyolo中的train过程使用的数据是yolo格式,而eval过程使用coco数据集中的json文件,因此需要再增加coco格式的标注文件instances_train2017.json、instances_val2017.json以及train.txt和val.txt文件,经过转换后的visdrone数据集包括以下内容:visdrone_COCO_format ├── train.txt ├── val.txt ├── train │ ├── images │ │ ├── 000001.jpg │ │ ├── 000002.jpg │ │ ├── ... │ │ └── ... │ └── labels │ ├── 000001.txt │ ├── 000002.txt │ ├── ... │ └── ... ├── annotations │ ├── instances_train2017.json │ └── instances_val2017.json └── val ├── images │ ├── 000001.jpg │ ├── 000002.jpg │ ├── ... │ └── ... └── labels ├── 000001.txt ├── 000001.txt ├── ... └── ... 我们先把YOLO格式的数据集转换为COCO格式,在mindyolo中实现yolov5_yaml_to_coco.py脚本,具体代码如下:# -*- encoding: utf-8 -*- # @Author: SWHL # @Contact: liekkaskono@163.com import argparse import glob import json import os import shutil import time from pathlib import Path import cv2 import yaml from tqdm import tqdm def read_txt(txt_path): with open(str(txt_path), "r", encoding="utf-8") as f: data = list(map(lambda x: x.rstrip("\n"), f)) return data def mkdir(dir_path): Path(dir_path).mkdir(parents=True, exist_ok=True) def verify_exists(file_path): file_path = Path(file_path).resolve() if not file_path.exists(): raise FileNotFoundError(f"The {file_path} is not exists!!!") class YOLOV5CFG2COCO: def __init__(self, yaml_path): verify_exists(yaml_path) with open(yaml_path, "r", encoding="UTF-8") as f: self.data_cfg = yaml.safe_load(f) self.root_dir = Path(yaml_path).parent.parent self.root_data_dir = Path(self.data_cfg.get("path")) self.train_path = self._get_data_dir("train") self.val_path = self._get_data_dir("val") nc = self.data_cfg["nc"] if "names" in self.data_cfg: self.names = self.data_cfg.get("names") else: # assign class names if missing self.names = [f"class{i}" for i in range(self.data_cfg["nc"])] assert ( len(self.names) == nc ), f"{len(self.names)} names found for nc={nc} dataset in {yaml_path}" # 构建COCO格式目录 self.dst = self.root_dir / f"{Path(self.root_data_dir).stem}_COCO_format" self.coco_train = "train/images" self.coco_val = "val/images" self.coco_annotation = "annotations" self.coco_train_json = ( self.dst / self.coco_annotation / f"instances_train2017.json" ) self.coco_val_json = ( self.dst / self.coco_annotation / f"instances_val2017.json" ) mkdir(self.dst) mkdir(self.dst / self.coco_train) mkdir(self.dst / self.coco_val) mkdir(self.dst / self.coco_annotation) # 构建json内容结构 self.type = "instances" self.categories = [] self._get_category() self.annotation_id = 1 cur_year = time.strftime("%Y", time.localtime(time.time())) self.info = { "year": int(cur_year), "version": "1.0", "description": "For object detection", "date_created": cur_year, } self.licenses = [ { "id": 1, "name": "Apache License v2.0", "url": "https://choosealicense.com/licenses/apache-2.0/", } ] def _get_data_dir(self, mode): data_dir = self.data_cfg.get(mode) if data_dir: if isinstance(data_dir, str): full_path = [str(self.root_data_dir / data_dir)] elif isinstance(data_dir, list): full_path = [str(self.root_data_dir / one_dir) for one_dir in data_dir] else: raise TypeError(f"{data_dir} is not str or list.") else: raise ValueError(f"{mode} dir is not in the yaml.") return full_path def _get_category(self): for i, category in enumerate(self.names, start=1): self.categories.append( { "supercategory": category, "id": i, "name": category, } ) def generate(self): self.train_files = self.get_files(self.train_path) self.valid_files = self.get_files(self.val_path) train_dest_dir = Path(self.dst) / self.coco_train self.gen_dataset( self.train_files, train_dest_dir, 
self.coco_train_json, mode="train" ) val_dest_dir = Path(self.dst) / self.coco_val self.gen_dataset(self.valid_files, val_dest_dir, self.coco_val_json, mode="val") print(f"The output directory is: {self.dst}") def get_files(self, path): IMG_FORMATS = ["bmp", "dng", "jpeg", "jpg", "mpo", "png", "tif", "tiff", "webp"] f = [] for p in path: p = Path(p) if p.is_dir(): f += glob.glob(str(p / "**" / "*.*"), recursive=True) elif p.is_file(): # file with open(p, "r", encoding="utf-8") as t: t = t.read().strip().splitlines() parent = str(p.parent) + os.sep f += [ x.replace("./", parent) if x.startswith("./") else x for x in t ] else: raise FileExistsError(f"{p} does not exist") im_files = sorted( x.replace("/", os.sep) for x in f if x.split(".")[-1].lower() in IMG_FORMATS ) return im_files def gen_dataset(self, img_paths, target_img_path, target_json, mode): """ https://cocodataset.org/#format-data """ images = [] annotations = [] sa, sb = ( os.sep + "images" + os.sep, os.sep + "labels" + os.sep, ) # /images/, /labels/ substrings for img_id, img_path in enumerate(tqdm(img_paths, desc=mode), 1): label_path = sb.join(img_path.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" img_path = Path(img_path) verify_exists(img_path) imgsrc = cv2.imread(str(img_path)) height, width = imgsrc.shape[:2] dest_file_name = f"{img_id:012d}.jpg" save_img_path = target_img_path / dest_file_name if img_path.suffix.lower() == ".jpg": shutil.copyfile(img_path, save_img_path) else: cv2.imwrite(str(save_img_path), imgsrc) images.append( { "date_captured": "2021", "file_name": dest_file_name, "id": img_id, "height": height, "width": width, } ) if Path(label_path).exists(): new_anno = self.read_annotation(label_path, img_id, height, width) if len(new_anno) > 0: annotations.extend(new_anno) else: raise ValueError(f"{label_path} is empty") else: raise FileNotFoundError(f"{label_path} not exists") json_data = { "info": self.info, "images": images, "licenses": self.licenses, "type": self.type, "annotations": annotations, "categories": self.categories, } with open(target_json, "w", encoding="utf-8") as f: json.dump(json_data, f, ensure_ascii=False) def read_annotation(self, txt_file, img_id, height, width): annotation = [] all_info = read_txt(txt_file) for label_info in all_info: # 遍历一张图中不同标注对象 label_info = label_info.split(" ") if len(label_info) < 5: continue category_id, vertex_info = label_info[0], label_info[1:] segmentation, bbox, area = self._get_annotation(vertex_info, height, width) annotation.append( { "segmentation": segmentation, "area": area, "iscrowd": 0, "image_id": img_id, "bbox": bbox, "category_id": int(category_id) + 1, "id": self.annotation_id, } ) self.annotation_id += 1 return annotation @staticmethod def _get_annotation(vertex_info, height, width): cx, cy, w, h = [float(i) for i in vertex_info] cx = cx * width cy = cy * height box_w = w * width box_h = h * height x0 = max(cx - box_w / 2, 0) y0 = max(cy - box_h / 2, 0) x1 = min(x0 + box_w, width) y1 = min(y0 + box_h, height) segmentation = [[x0, y0, x1, y0, x1, y1, x0, y1]] bbox = [x0, y0, box_w, box_h] area = box_w * box_h return segmentation, bbox, area def main(): parser = argparse.ArgumentParser("Datasets converter from YOLOV5 to COCO") parser.add_argument( "--yaml_path", type=str, default="dataset/YOLOV5_yaml/sample.yaml", help="Dataset cfg file", ) args = parser.parse_args() converter = YOLOV5CFG2COCO(args.yaml_path) converter.generate() if __name__ == "__main__": main() 之后在mindyolo目录下创建YOLO格式的配置文件visdrone.yaml:# Train/val/test sets as 1) dir: path/to/imgs, 2) 
file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] path: /root/workspace/dataset/visdrone # dataset root dir (absolute path) train: train/images # train images (relative to 'path') val: val/images # val images (relative to 'path') test: # test images (optional) nc: 12 # Classes,类别 names: 0: ignored regions 1: pedestrian 2: people 3: bicycle 4: car 5: van 6: truck 7: tricycle 8: awning-tricycle 9: bus 10: motor 11: others在终端中运行如下命令将YOLO格式的数据集转换为COCO格式:python3 yolov5_yaml_to_coco.py --yaml_path visdrone.yamltrain: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6471/6471 [01:13<00:00, 88.07it/s] val: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 548/548 [00:03<00:00, 148.22it/s] The output directory is: visdrone_COCO_format再创建coco2yolo.py的Python脚本将COCO格式的标注文件.json导出为labels文件夹中YOLO格式的标注文件.txt:import json import os import argparse parser = argparse.ArgumentParser(description='Test yolo data.') parser.add_argument('-j', help='JSON file', dest='json', required=True) parser.add_argument('-o', help='path to output folder', dest='out',required=True) args = parser.parse_args() json_file = args.json output = args.out class COCO2YOLO: def __init__(self): self._check_file_and_dir(json_file, output) self.labels = json.load(open(json_file, 'r', encoding='utf-8')) self.coco_id_name_map = self._categories() self.coco_name_list = list(self.coco_id_name_map.values()) print("total images", len(self.labels['images'])) print("total categories", len(self.labels['categories'])) print("total labels", len(self.labels['annotations'])) def _check_file_and_dir(self, file_path, dir_path): if not os.path.exists(file_path): raise ValueError("file not found") if not os.path.exists(dir_path): os.makedirs(dir_path) def _categories(self): categories = {} for cls in self.labels['categories']: categories[cls['id']] = cls['name'] return categories def _load_images_info(self): images_info = {} for image in self.labels['images']: id = image['id'] file_name = image['file_name'] if file_name.find('\\') > -1: file_name = file_name[file_name.index('\\')+1:] w = image['width'] h = image['height'] images_info[id] = (file_name, w, h) return images_info def _bbox_2_yolo(self, bbox, img_w, img_h): x, y, w, h = bbox[0], bbox[1], bbox[2], bbox[3] centerx = bbox[0] + w / 2 centery = bbox[1] + h / 2 dw = 1 / img_w dh = 1 / img_h centerx *= dw w *= dw centery *= dh h *= dh return centerx, centery, w, h def _convert_anno(self, images_info): anno_dict = dict() for anno in self.labels['annotations']: bbox = anno['bbox'] image_id = anno['image_id'] category_id = anno['category_id'] image_info = images_info.get(image_id) image_name = image_info[0] img_w = image_info[1] img_h = image_info[2] yolo_box = self._bbox_2_yolo(bbox, img_w, img_h) anno_info = (image_name, category_id, yolo_box) anno_infos = anno_dict.get(image_id) if not anno_infos: anno_dict[image_id] = [anno_info] else: anno_infos.append(anno_info) anno_dict[image_id] = anno_infos return anno_dict def save_classes(self): sorted_classes = list(map(lambda x: x['name'], sorted(self.labels['categories'], key=lambda x: x['id']))) print('coco names', sorted_classes) with open('coco.names', 'w', encoding='utf-8') as f: for cls in sorted_classes: f.write(cls + '\n') f.close() def coco2yolo(self): print("loading image info...") images_info = self._load_images_info() print("loading done, 
total images", len(images_info)) print("start converting...") anno_dict = self._convert_anno(images_info) print("converting done, total labels", len(anno_dict)) print("saving txt file...") self._save_txt(anno_dict) print("saving done") def _save_txt(self, anno_dict): for k, v in anno_dict.items(): file_name = os.path.splitext(v[0][0])[0] + ".txt" with open(os.path.join(output, file_name), 'w', encoding='utf-8') as f: print(k, v) for obj in v: cat_name = self.coco_id_name_map.get(obj[1]) category_id = self.coco_name_list.index(cat_name) box = ['{:.6f}'.format(x) for x in obj[2]] box = ' '.join(box) line = str(category_id) + ' ' + box f.write(line + '\n') if __name__ == '__main__': c2y = COCO2YOLO() c2y.coco2yolo() 在终端中切换到mindyolo目录下依次运行如下命令导出instances_train2017.json和instances_val2017.json文件对应的YOLO格式的标注文件到labels文件夹中:python3 coco2yolo.py -j ./visdrone_COCO_format/annotations/instances_train2017.json -o ./visdrone_COCO_format/train/labelspython3 coco2yolo.py -j ./visdrone_COCO_format/annotations/instances_val2017.json -o ./visdrone_COCO_format/val/labels最后创建generate_txt.sh脚本在COCO数据集目录下生成train.txt和val.txt,指定训练图片和验证图片的在数据集中的相对路径:#!/bin/bash # 检查是否提供了数据集路径参数 if [ $# -eq 0 ]; then echo "Usage: $0 <dataset_path>" echo "Example: $0 /path/to/visdrone" exit 1 fi # 获取数据集路径 DATASET_PATH="$1" # 检查数据集路径是否存在 if [ ! -d "$DATASET_PATH" ]; then echo "Error: Dataset path '$DATASET_PATH' does not exist." exit 1 fi # 定义训练和验证图片目录 TRAIN_DIR="$DATASET_PATH/train/images" VAL_DIR="$DATASET_PATH/val/images" # 检查训练和验证目录是否存在 if [ ! -d "$TRAIN_DIR" ]; then echo "Error: Train directory '$TRAIN_DIR' does not exist." exit 1 fi if [ ! -d "$VAL_DIR" ]; then echo "Error: Validation directory '$VAL_DIR' does not exist." exit 1 fi # 生成 train.txt TRAIN_TXT="$DATASET_PATH/train.txt" ls "$TRAIN_DIR" | grep '\.jpg$' | sort | sed 's/^/\.\/train\/images\//' > "$TRAIN_TXT" echo "Generated $TRAIN_TXT" # 生成 val.txt VAL_TXT="$DATASET_PATH/val.txt" ls "$VAL_DIR" | grep '\.jpg$' | sort | sed 's/^/\.\/val\/images\//' > "$VAL_TXT" echo "Generated $VAL_TXT" echo "Successfully generated train.txt and val.txt in $DATASET_PATH" 在终端中运行generate_txt.sh,并传入前面COCO数据集的路径:chmod +x generate_txt.sh ./generate_txt.sh visdrone_COCO_formatGenerated visdrone_COCO_format/train.txt Generated visdrone_COCO_format/val.txt Successfully generated train.txt and val.txt in visdrone_COCO_format最终生成的visdrone_COCO_format数据集的格式如下,可以直接用于MindYOLOv8模型的训练:visdrone_COCO_format ├── train.txt ├── val.txt ├── train │ ├── images │ │ ├── 000001.jpg │ │ ├── 000002.jpg │ │ ├── ... │ │ └── ... │ └── labels │ ├── 000001.txt │ ├── 000002.txt │ ├── ... │ └── ... ├── annotations │ ├── instances_train2017.json │ └── instances_val2017.json └── val ├── images │ ├── 000001.jpg │ ├── 000002.jpg │ ├── ... │ └── ... └── labels ├── 000001.txt ├── 000001.txt ├── ... └── ... 
三、模型训练MindYOLO支持yaml文件继承机制,因此新编写的配置文件只需要继承MindYOLO提供的原生yaml配置文件,再补充需要修改的部分即可。在configs目录下编写MindYOLO数据集的yaml配置文件,指定训练图片和验证图片的路径以及模型的类别标签:data: dataset_name: visdrone_COCO_format train_set: /root/workspace/mindyolo/visdrone_COCO_format/train.txt val_set: /root/workspace/mindyolo/visdrone_COCO_format/val.txt test_set: /root/workspace/mindyolo/visdrone_COCO_format/val.txt nc: 12 # class names names: ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others' ] train_transforms: [] test_transforms: [] 修改configs/yolov8/yolov8s.yaml文件,注释掉原有的coco.yaml配置文件,指定我们自己的数据集,同时添加epochs、img_size、per_batch_size、multi-stage data augment等自定义训练参数:__BASE__: [ # '../coco.yaml', '../visdrone.yaml', './hyp.scratch.low.yaml', './yolov8-base.yaml' ] overflow_still_update: False network: depth_multiple: 0.33 # scales module repeats width_multiple: 0.50 # scales convolution channels max_channels: 1024 epochs: 10 img_size: 1024 per_batch_size: 16 data: num_parallel_workers: 8 # multi-stage data augment train_transforms: { stage_epochs: [ 5, 5 ], trans_list: [ [ { func_name: mosaic, prob: 1.0 }, { func_name: resample_segments }, { func_name: random_perspective, prob: 1.0, degrees: 0.0, translate: 0.1, scale: 0.5, shear: 0.0 }, {func_name: albumentations}, {func_name: hsv_augment, prob: 1.0, hgain: 0.015, sgain: 0.7, vgain: 0.4}, {func_name: fliplr, prob: 0.5}, {func_name: label_norm, xyxy2xywh_: True}, {func_name: label_pad, padding_size: 160, padding_value: -1}, {func_name: image_norm, scale: 255.}, {func_name: image_transpose, bgr2rgb: True, hwc2chw: True} ], [ {func_name: letterbox, scaleup: True}, {func_name: resample_segments}, {func_name: random_perspective, prob: 1.0, degrees: 0.0, translate: 0.1, scale: 0.5, shear: 0.0}, {func_name: albumentations}, {func_name: hsv_augment, prob: 1.0, hgain: 0.015, sgain: 0.7, vgain: 0.4}, {func_name: fliplr, prob: 0.5}, {func_name: label_norm, xyxy2xywh_: True}, {func_name: label_pad, padding_size: 160, padding_value: -1}, {func_name: image_norm, scale: 255.}, {func_name: image_transpose, bgr2rgb: True, hwc2chw: True} ]] } test_transforms: [ {func_name: letterbox, scaleup: False, only_image: True}, {func_name: image_norm, scale: 255.}, {func_name: image_transpose, bgr2rgb: True, hwc2chw: True} ] 在终端中运行train.py进行模型训练,指定模型的配置文件以及使用昇腾NPU:python3 train.py --config ./configs/yolov8/yolov8s.yaml --device_target Ascend训练默认跑在0卡上,也可以通过环境变量DEVICE_ID让训练代码跑在1卡上,例如在训练入口处、导入mindspore之前设置:import os os.environ["DEVICE_ID"] = "1" 如果不想设置环境变量,也可以修改mindyolo/mindyolo/utils/utils.py中默认的参数:import os import random import yaml import cv2 from datetime import datetime import numpy as np import mindspore as ms from mindspore import ops, Tensor, nn from mindspore.communication.management import get_group_size, get_rank, init from mindspore import ParallelMode from mindyolo.utils import logger def set_seed(seed=2): np.random.seed(seed) random.seed(seed) ms.set_seed(seed) def set_default(args): # Set Context ms.set_context(mode=args.ms_mode) ms.set_recursion_limit(args.max_call_depth) if args.ms_mode == 0: ms.set_context(jit_config={"jit_level": "O2"}) if args.device_target == "Ascend": ms.set_device("Ascend", int(os.getenv("DEVICE_ID", 1))) ...
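除了上面两种改法,也可以完全不改动源码,在拉起训练进程前把 DEVICE_ID 写入子进程的环境变量。下面是一个假设的启动脚本示意(run_train.py 为虚构文件名,train.py 的参数以 mindyolo 仓库实际为准):

# run_train.py(示意脚本,非 mindyolo 官方提供):在拉起 train.py 之前
# 把 DEVICE_ID 注入环境变量,让训练跑在指定的 NPU 卡上
import os
import subprocess
import sys


def launch(device_id=1):
    env = os.environ.copy()
    env["DEVICE_ID"] = str(device_id)  # 训练侧通过 os.getenv("DEVICE_ID") 读取
    cmd = [
        sys.executable, "train.py",
        "--config", "./configs/yolov8/yolov8s.yaml",
        "--device_target", "Ascend",
    ]
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    launch(int(sys.argv[1]) if len(sys.argv) > 1 else 1)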
2025-10-23 14:48:02,364 [INFO] parse_args: 2025-10-23 14:48:02,364 [INFO] task detect 2025-10-23 14:48:02,364 [INFO] device_target Ascend 2025-10-23 14:48:02,364 [INFO] save_dir ./runs/2025.10.23-14.48.02 2025-10-23 14:48:02,364 [INFO] log_level INFO 2025-10-23 14:48:02,364 [INFO] is_parallel False 2025-10-23 14:48:02,364 [INFO] ms_mode 0 2025-10-23 14:48:02,364 [INFO] max_call_depth 2000 2025-10-23 14:48:02,364 [INFO] ms_amp_level O0 2025-10-23 14:48:02,364 [INFO] keep_loss_fp32 True 2025-10-23 14:48:02,364 [INFO] anchor_base False 2025-10-23 14:48:02,364 [INFO] ms_loss_scaler static 2025-10-23 14:48:02,364 [INFO] ms_loss_scaler_value 1024.0 2025-10-23 14:48:02,364 [INFO] ms_jit True 2025-10-23 14:48:02,364 [INFO] ms_enable_graph_kernel False 2025-10-23 14:48:02,364 [INFO] ms_datasink False 2025-10-23 14:48:02,364 [INFO] overflow_still_update False 2025-10-23 14:48:02,364 [INFO] clip_grad False 2025-10-23 14:48:02,364 [INFO] clip_grad_value 10.0 2025-10-23 14:48:02,364 [INFO] ema True 2025-10-23 14:48:02,364 [INFO] weight 2025-10-23 14:48:02,364 [INFO] ema_weight 2025-10-23 14:48:02,364 [INFO] freeze [] 2025-10-23 14:48:02,364 [INFO] epochs 10 2025-10-23 14:48:02,364 [INFO] per_batch_size 16 2025-10-23 14:48:02,364 [INFO] img_size 1024 2025-10-23 14:48:02,364 [INFO] nbs 64 2025-10-23 14:48:02,364 [INFO] accumulate 1 2025-10-23 14:48:02,364 [INFO] auto_accumulate False 2025-10-23 14:48:02,364 [INFO] log_interval 100 2025-10-23 14:48:02,364 [INFO] single_cls False 2025-10-23 14:48:02,364 [INFO] sync_bn False 2025-10-23 14:48:02,364 [INFO] keep_checkpoint_max 100 2025-10-23 14:48:02,364 [INFO] run_eval False 2025-10-23 14:48:02,364 [INFO] run_eval_interval 1 2025-10-23 14:48:02,364 [INFO] conf_thres 0.001 2025-10-23 14:48:02,364 [INFO] iou_thres 0.7 2025-10-23 14:48:02,364 [INFO] conf_free True 2025-10-23 14:48:02,364 [INFO] rect False 2025-10-23 14:48:02,364 [INFO] nms_time_limit 20.0 2025-10-23 14:48:02,364 [INFO] recompute False 2025-10-23 14:48:02,364 [INFO] recompute_layers 0 2025-10-23 14:48:02,364 [INFO] seed 2 2025-10-23 14:48:02,364 [INFO] summary True 2025-10-23 14:48:02,364 [INFO] profiler False 2025-10-23 14:48:02,364 [INFO] profiler_step_num 1 2025-10-23 14:48:02,364 [INFO] opencv_threads_num 0 2025-10-23 14:48:02,364 [INFO] strict_load True 2025-10-23 14:48:02,364 [INFO] enable_modelarts False 2025-10-23 14:48:02,364 [INFO] data_url 2025-10-23 14:48:02,364 [INFO] ckpt_url 2025-10-23 14:48:02,364 [INFO] multi_data_url 2025-10-23 14:48:02,364 [INFO] pretrain_url 2025-10-23 14:48:02,364 [INFO] train_url 2025-10-23 14:48:02,364 [INFO] data_dir /cache/data/ 2025-10-23 14:48:02,364 [INFO] ckpt_dir /cache/pretrain_ckpt/ 2025-10-23 14:48:02,364 [INFO] data.dataset_name result 2025-10-23 14:48:02,364 [INFO] data.train_set /root/workspace/mindyolo/visdrone_COCO_format/train.txt 2025-10-23 14:48:02,364 [INFO] data.val_set /root/workspace/mindyolo/visdrone_COCO_format/val.txt 2025-10-23 14:48:02,364 [INFO] data.test_set /root/workspace/mindyolo/visdrone_COCO_format/val.txt 2025-10-23 14:48:02,364 [INFO] data.nc 12 2025-10-23 14:48:02,364 [INFO] data.names ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'] 2025-10-23 14:48:02,364 [INFO] train_transforms.stage_epochs [5, 5] 2025-10-23 14:48:02,364 [INFO] train_transforms.trans_list [[{'func_name': 'mosaic', 'prob': 1.0}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 
'shear': 0.0}, {'func_name': 'albumentations'}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}], [{'func_name': 'letterbox', 'scaleup': True}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}, {'func_name': 'albumentations'}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}]] 2025-10-23 14:48:02,364 [INFO] data.test_transforms [{'func_name': 'letterbox', 'scaleup': False, 'only_image': True}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}] 2025-10-23 14:48:02,364 [INFO] data.num_parallel_workers 8 2025-10-23 14:48:02,364 [INFO] optimizer.optimizer momentum 2025-10-23 14:48:02,364 [INFO] optimizer.lr_init 0.01 2025-10-23 14:48:02,364 [INFO] optimizer.momentum 0.937 2025-10-23 14:48:02,364 [INFO] optimizer.nesterov True 2025-10-23 14:48:02,364 [INFO] optimizer.loss_scale 1.0 2025-10-23 14:48:02,364 [INFO] optimizer.warmup_epochs 3 2025-10-23 14:48:02,364 [INFO] optimizer.warmup_momentum 0.8 2025-10-23 14:48:02,364 [INFO] optimizer.warmup_bias_lr 0.1 2025-10-23 14:48:02,364 [INFO] optimizer.min_warmup_step 1000 2025-10-23 14:48:02,364 [INFO] optimizer.group_param yolov8 2025-10-23 14:48:02,364 [INFO] optimizer.gp_weight_decay 0.0005 2025-10-23 14:48:02,364 [INFO] optimizer.start_factor 1.0 2025-10-23 14:48:02,364 [INFO] optimizer.end_factor 0.01 2025-10-23 14:48:02,364 [INFO] optimizer.epochs 10 2025-10-23 14:48:02,364 [INFO] optimizer.nbs 64 2025-10-23 14:48:02,364 [INFO] optimizer.accumulate 1 2025-10-23 14:48:02,364 [INFO] optimizer.total_batch_size 16 2025-10-23 14:48:02,364 [INFO] loss.name YOLOv8Loss 2025-10-23 14:48:02,364 [INFO] loss.box 7.5 2025-10-23 14:48:02,364 [INFO] loss.cls 0.5 2025-10-23 14:48:02,364 [INFO] loss.dfl 1.5 2025-10-23 14:48:02,364 [INFO] loss.reg_max 16 2025-10-23 14:48:02,364 [INFO] network.model_name yolov8 2025-10-23 14:48:02,364 [INFO] network.nc 80 2025-10-23 14:48:02,364 [INFO] network.reg_max 16 2025-10-23 14:48:02,364 [INFO] network.stride [8, 16, 32] 2025-10-23 14:48:02,364 [INFO] network.backbone [[-1, 1, 'ConvNormAct', [64, 3, 2]], [-1, 1, 'ConvNormAct', [128, 3, 2]], [-1, 3, 'C2f', [128, True]], [-1, 1, 'ConvNormAct', [256, 3, 2]], [-1, 6, 'C2f', [256, True]], [-1, 1, 'ConvNormAct', [512, 3, 2]], [-1, 6, 'C2f', [512, True]], [-1, 1, 'ConvNormAct', [1024, 3, 2]], [-1, 3, 'C2f', [1024, True]], [-1, 1, 'SPPF', [1024, 5]]] 2025-10-23 14:48:02,364 [INFO] network.head [[-1, 1, 'Upsample', ['None', 2, 'nearest']], [[-1, 6], 1, 'Concat', [1]], [-1, 3, 'C2f', [512]], [-1, 1, 'Upsample', ['None', 2, 'nearest']], [[-1, 4], 1, 'Concat', [1]], [-1, 3, 'C2f', [256]], [-1, 1, 'ConvNormAct', [256, 3, 2]], [[-1, 12], 1, 'Concat', [1]], [-1, 3, 'C2f', [512]], [-1, 1, 'ConvNormAct', [512, 3, 2]], [[-1, 9], 1, 'Concat', [1]], [-1, 3, 'C2f', [1024]], [[15, 18, 21], 1, 'YOLOv8Head', ['nc', 'reg_max', 
'stride']]] 2025-10-23 14:48:02,364 [INFO] network.depth_multiple 0.33 2025-10-23 14:48:02,364 [INFO] network.width_multiple 0.5 2025-10-23 14:48:02,364 [INFO] network.max_channels 1024 2025-10-23 14:48:02,364 [INFO] config ./configs/yolov8/yolov8s.yaml 2025-10-23 14:48:02,364 [INFO] rank 0 2025-10-23 14:48:02,364 [INFO] rank_size 1 2025-10-23 14:48:02,364 [INFO] total_batch_size 16 2025-10-23 14:48:02,364 [INFO] callback [] 2025-10-23 14:48:02,364 [INFO] 2025-10-23 14:48:02,365 [INFO] Please check the above information for the configurations 2025-10-23 14:48:02,441 [WARNING] Parse Model, args: nearest, keep str type 2025-10-23 14:48:02,451 [WARNING] Parse Model, args: nearest, keep str type 2025-10-23 14:48:02,528 [INFO] number of network params, total: 11.160279M, trainable: 11.140228M [WARNING] GE_ADPT(336686,7ff4350e8740,python3):2025-10-23-14:48:13.472.732 [mindspore/ops/kernel/ascend/acl_ir/op_api_exec.cc:169] GetAscendDefaultCustomPath] Checking whether the so exists or if permission to access it is available: /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize_vision/op_api/lib/libcust_opapi.so 2025-10-23 14:48:14,547 [WARNING] Parse Model, args: nearest, keep str type 2025-10-23 14:48:14,558 [WARNING] Parse Model, args: nearest, keep str type 2025-10-23 14:48:14,646 [INFO] number of network params, total: 11.160279M, trainable: 11.140228M .2025-10-23 14:48:30,416 [INFO] ema_weight not exist, default pretrain weight is currently used. 2025-10-23 14:48:30,421 [INFO] No dataset cache available, caching now... Scanning images: 0%| | 0/6471 [00:00<?, ?it/s]WARNING ⚠️ /root/workspace/mindyolo/visdrone_COCO_format/train/images/000000000335.jpg: 1 duplicate labels removed Scanning '/root/workspace/mindyolo/visdrone_COCO_format/train.cache' images and labels... 397 found, 0 missing, 0 empty, 0 corrupted: 6%|████ | 397/6471 [00:00<00:01, 3960.91it/s]WARNING ⚠️ /root/workspace/mindyolo/visdrone_COCO_format/train/images/000000000427.jpg: 1 duplicate labels removed Scanning '/root/workspace/mindyolo/visdrone_COCO_format/train.cache' images and labels... 1261 found, 0 missing, 0 empty, 0 corrupted: 19%|████████████▋ | 1261/6471 [00:00<00:01, 4238.38it/s]WARNING ⚠️ /root/workspace/mindyolo/visdrone_COCO_format/train/images/000000001492.jpg: 1 duplicate labels removed Scanning '/root/workspace/mindyolo/visdrone_COCO_format/train.cache' images and labels... 3866 found, 0 missing, 0 empty, 0 corrupted: 60%|██████████████████████████████████████▊ | 3866/6471 [00:00<00:00, 4332.85it/s]WARNING ⚠️ /root/workspace/mindyolo/visdrone_COCO_format/train/images/000000003868.jpg: 1 duplicate labels removed Scanning '/root/workspace/mindyolo/visdrone_COCO_format/train.cache' images and labels... 5607 found, 0 missing, 0 empty, 0 corrupted: 87%|████████████████████████████████████████████████████████▎ | 5607/6471 [00:01<00:00, 4337.04it/s]WARNING ⚠️ /root/workspace/mindyolo/visdrone_COCO_format/train/images/000000005742.jpg: 1 duplicate labels removed Scanning '/root/workspace/mindyolo/visdrone_COCO_format/train.cache' images and labels... 6471 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████████████████████████████████████████████████████| 6471/6471 [00:01<00:00, 4307.45it/s] 2025-10-23 14:48:32,028 [INFO] New cache created: /root/workspace/mindyolo/visdrone_COCO_format/train.cache.npy 2025-10-23 14:48:32,029 [INFO] Dataset caching success. 2025-10-23 14:48:32,051 [INFO] Dataloader num parallel workers: [8] 2025-10-23 14:48:32,135 [INFO] Dataset Cache file hash/version check success. 
2025-10-23 14:48:32,135 [INFO] Load dataset cache from [/root/workspace/mindyolo/visdrone_COCO_format/train.cache.npy] success. Scanning '/root/workspace/mindyolo/visdrone_COCO_format/train.cache.npy' images and labels... 6471 found, 0 missing, 0 empty, 0 corrupted: 100%|███████████████████████████████████████████████████████████████████████| 6471/6471 [00:00<?, ?it/s] 2025-10-23 14:48:32,157 [INFO] Dataloader num parallel workers: [8] 2025-10-23 14:48:32,273 [INFO] Registry(name=callback, total=4) 2025-10-23 14:48:32,273 [INFO] (0): YoloxSwitchTrain in mindyolo/utils/callback.py 2025-10-23 14:48:32,273 [INFO] (1): EvalWhileTrain in mindyolo/utils/callback.py 2025-10-23 14:48:32,273 [INFO] (2): SummaryCallback in mindyolo/utils/callback.py 2025-10-23 14:48:32,273 [INFO] (3): ProfilerCallback in mindyolo/utils/callback.py 2025-10-23 14:48:32,273 [INFO] 2025-10-23 14:48:32,276 [INFO] got 1 active callback as follows: 2025-10-23 14:48:32,276 [INFO] SummaryCallback() 2025-10-23 14:48:32,276 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :). albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success [INFO] albumentations load success [INFO] albumentations load success [INFO] albumentations load success [INFO] albumentations load success [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, 
blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) [INFO] albumentations load success .........2025-10-23 14:52:54,293 [INFO] Epoch 1/10, Step 100/404, imgsize (1024, 1024), loss: 5.5585, lbox: 3.2052, lcls: 0.4855, dfl: 1.8678, cur_lr: 0.09257426112890244 2025-10-23 14:52:55,203 [INFO] Epoch 1/10, Step 100/404, step time: 2629.27 ms 2025-10-23 14:55:40,115 [INFO] Epoch 1/10, Step 200/404, imgsize (1024, 1024), loss: 4.5693, lbox: 2.5884, lcls: 0.4230, dfl: 1.5578, cur_lr: 0.08514851331710815 2025-10-23 14:55:40,138 [INFO] Epoch 1/10, Step 200/404, step time: 1649.36 ms 2025-10-23 14:58:25,055 [INFO] Epoch 1/10, Step 300/404, imgsize (1024, 1024), loss: 3.9681, lbox: 2.1428, lcls: 0.3853, dfl: 1.4400, cur_lr: 0.07772277295589447 2025-10-23 14:58:25,078 [INFO] Epoch 1/10, Step 300/404, step time: 1649.39 ms 2025-10-23 15:01:10,020 [INFO] Epoch 1/10, Step 400/404, imgsize (1024, 1024), loss: 3.6795, lbox: 2.0528, lcls: 0.3339, dfl: 1.2929, cur_lr: 0.07029703259468079 2025-10-23 15:01:10,044 [INFO] Epoch 1/10, Step 400/404, step time: 1649.65 ms 2025-10-23 15:01:17,111 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-1_404.ckpt 2025-10-23 15:01:17,111 [INFO] Epoch 1/10, epoch time: 12.75 min. 2025-10-23 15:04:02,010 [INFO] Epoch 2/10, Step 100/404, imgsize (1024, 1024), loss: 3.5361, lbox: 1.9678, lcls: 0.3183, dfl: 1.2500, cur_lr: 0.062162574380636215 2025-10-23 15:04:02,018 [INFO] Epoch 2/10, Step 100/404, step time: 1649.07 ms 2025-10-23 15:06:46,939 [INFO] Epoch 2/10, Step 200/404, imgsize (1024, 1024), loss: 3.3767, lbox: 1.8395, lcls: 0.3042, dfl: 1.2329, cur_lr: 0.05465514957904816 2025-10-23 15:06:46,947 [INFO] Epoch 2/10, Step 200/404, step time: 1649.28 ms 2025-10-23 15:09:31,885 [INFO] Epoch 2/10, Step 300/404, imgsize (1024, 1024), loss: 3.3604, lbox: 1.8753, lcls: 0.3134, dfl: 1.1718, cur_lr: 0.0471477210521698 2025-10-23 15:09:31,894 [INFO] Epoch 2/10, Step 300/404, step time: 1649.46 ms 2025-10-23 15:12:16,806 [INFO] Epoch 2/10, Step 400/404, imgsize (1024, 1024), loss: 3.2902, lbox: 1.8262, lcls: 0.2795, dfl: 1.1846, cur_lr: 0.03964029625058174 2025-10-23 15:12:16,814 [INFO] Epoch 2/10, Step 400/404, step time: 1649.20 ms 2025-10-23 15:12:23,860 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-2_404.ckpt 2025-10-23 15:12:23,860 [INFO] Epoch 2/10, epoch time: 11.11 min. 
2025-10-23 15:15:08,782 [INFO] Epoch 3/10, Step 100/404, imgsize (1024, 1024), loss: 3.3220, lbox: 1.7991, lcls: 0.3124, dfl: 1.2106, cur_lr: 0.031090890988707542 2025-10-23 15:15:08,791 [INFO] Epoch 3/10, Step 100/404, step time: 1649.30 ms 2025-10-23 15:17:53,703 [INFO] Epoch 3/10, Step 200/404, imgsize (1024, 1024), loss: 3.1162, lbox: 1.6879, lcls: 0.2824, dfl: 1.1460, cur_lr: 0.02350178174674511 2025-10-23 15:17:53,711 [INFO] Epoch 3/10, Step 200/404, step time: 1649.20 ms 2025-10-23 15:20:38,631 [INFO] Epoch 3/10, Step 300/404, imgsize (1024, 1024), loss: 3.0332, lbox: 1.6024, lcls: 0.2703, dfl: 1.1605, cur_lr: 0.015912672504782677 2025-10-23 15:20:38,639 [INFO] Epoch 3/10, Step 300/404, step time: 1649.28 ms 2025-10-23 15:23:23,580 [INFO] Epoch 3/10, Step 400/404, imgsize (1024, 1024), loss: 3.1371, lbox: 1.7095, lcls: 0.2808, dfl: 1.1469, cur_lr: 0.008323564194142818 2025-10-23 15:23:23,589 [INFO] Epoch 3/10, Step 400/404, step time: 1649.49 ms 2025-10-23 15:23:30,617 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-3_404.ckpt 2025-10-23 15:23:30,617 [INFO] Epoch 3/10, epoch time: 11.11 min. 2025-10-23 15:26:15,527 [INFO] Epoch 4/10, Step 100/404, imgsize (1024, 1024), loss: 3.2965, lbox: 1.8179, lcls: 0.2614, dfl: 1.2172, cur_lr: 0.007029999978840351 2025-10-23 15:26:15,535 [INFO] Epoch 4/10, Step 100/404, step time: 1649.18 ms 2025-10-23 15:29:00,451 [INFO] Epoch 4/10, Step 200/404, imgsize (1024, 1024), loss: 3.1855, lbox: 1.7697, lcls: 0.2504, dfl: 1.1654, cur_lr: 0.007029999978840351 2025-10-23 15:29:00,459 [INFO] Epoch 4/10, Step 200/404, step time: 1649.24 ms 2025-10-23 15:31:45,369 [INFO] Epoch 4/10, Step 300/404, imgsize (1024, 1024), loss: 2.9900, lbox: 1.6270, lcls: 0.2307, dfl: 1.1323, cur_lr: 0.007029999978840351 2025-10-23 15:31:45,378 [INFO] Epoch 4/10, Step 300/404, step time: 1649.18 ms 2025-10-23 15:34:30,277 [INFO] Epoch 4/10, Step 400/404, imgsize (1024, 1024), loss: 3.1742, lbox: 1.7506, lcls: 0.2590, dfl: 1.1646, cur_lr: 0.007029999978840351 2025-10-23 15:34:30,285 [INFO] Epoch 4/10, Step 400/404, step time: 1649.07 ms 2025-10-23 15:34:37,315 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-4_404.ckpt 2025-10-23 15:34:37,316 [INFO] Epoch 4/10, epoch time: 11.11 min. 2025-10-23 15:37:22,195 [INFO] Epoch 5/10, Step 100/404, imgsize (1024, 1024), loss: 2.9632, lbox: 1.6123, lcls: 0.2424, dfl: 1.1085, cur_lr: 0.006039999891072512 2025-10-23 15:37:22,204 [INFO] Epoch 5/10, Step 100/404, step time: 1648.88 ms 2025-10-23 15:40:07,094 [INFO] Epoch 5/10, Step 200/404, imgsize (1024, 1024), loss: 2.7776, lbox: 1.4777, lcls: 0.2025, dfl: 1.0975, cur_lr: 0.006039999891072512 2025-10-23 15:40:07,103 [INFO] Epoch 5/10, Step 200/404, step time: 1648.99 ms 2025-10-23 15:42:52,021 [INFO] Epoch 5/10, Step 300/404, imgsize (1024, 1024), loss: 2.7209, lbox: 1.4253, lcls: 0.2130, dfl: 1.0826, cur_lr: 0.006039999891072512 2025-10-23 15:42:52,029 [INFO] Epoch 5/10, Step 300/404, step time: 1649.26 ms 2025-10-23 15:45:36,965 [INFO] Epoch 5/10, Step 400/404, imgsize (1024, 1024), loss: 2.7360, lbox: 1.4817, lcls: 0.2157, dfl: 1.0387, cur_lr: 0.006039999891072512 2025-10-23 15:45:36,973 [INFO] Epoch 5/10, Step 400/404, step time: 1649.44 ms 2025-10-23 15:45:44,037 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-5_404.ckpt 2025-10-23 15:45:44,037 [INFO] Epoch 5/10, epoch time: 11.11 min. 
2025-10-23 15:48:28,914 [INFO] Epoch 6/10, Step 100/404, imgsize (1024, 1024), loss: 2.6675, lbox: 1.4472, lcls: 0.2042, dfl: 1.0161, cur_lr: 0.005049999803304672 2025-10-23 15:48:28,923 [INFO] Epoch 6/10, Step 100/404, step time: 1648.85 ms 2025-10-23 15:51:13,798 [INFO] Epoch 6/10, Step 200/404, imgsize (1024, 1024), loss: 2.7114, lbox: 1.4235, lcls: 0.1986, dfl: 1.0893, cur_lr: 0.005049999803304672 2025-10-23 15:51:13,807 [INFO] Epoch 6/10, Step 200/404, step time: 1648.84 ms 2025-10-23 15:53:58,688 [INFO] Epoch 6/10, Step 300/404, imgsize (1024, 1024), loss: 2.6783, lbox: 1.4169, lcls: 0.1985, dfl: 1.0629, cur_lr: 0.005049999803304672 2025-10-23 15:53:58,697 [INFO] Epoch 6/10, Step 300/404, step time: 1648.90 ms 2025-10-23 15:56:43,578 [INFO] Epoch 6/10, Step 400/404, imgsize (1024, 1024), loss: 2.7539, lbox: 1.4734, lcls: 0.2037, dfl: 1.0768, cur_lr: 0.005049999803304672 2025-10-23 15:56:43,586 [INFO] Epoch 6/10, Step 400/404, step time: 1648.89 ms 2025-10-23 15:56:50,613 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-6_404.ckpt 2025-10-23 15:56:50,613 [INFO] Epoch 6/10, epoch time: 11.11 min. 2025-10-23 15:59:35,561 [INFO] Epoch 7/10, Step 100/404, imgsize (1024, 1024), loss: 2.9109, lbox: 1.6203, lcls: 0.2210, dfl: 1.0696, cur_lr: 0.00406000018119812 2025-10-23 15:59:35,569 [INFO] Epoch 7/10, Step 100/404, step time: 1649.56 ms 2025-10-23 16:02:20,470 [INFO] Epoch 7/10, Step 200/404, imgsize (1024, 1024), loss: 2.6941, lbox: 1.4727, lcls: 0.2068, dfl: 1.0147, cur_lr: 0.00406000018119812 2025-10-23 16:02:20,479 [INFO] Epoch 7/10, Step 200/404, step time: 1649.10 ms 2025-10-23 16:05:05,384 [INFO] Epoch 7/10, Step 300/404, imgsize (1024, 1024), loss: 2.8098, lbox: 1.4810, lcls: 0.2188, dfl: 1.1101, cur_lr: 0.00406000018119812 2025-10-23 16:05:05,391 [INFO] Epoch 7/10, Step 300/404, step time: 1649.12 ms 2025-10-23 16:07:50,302 [INFO] Epoch 7/10, Step 400/404, imgsize (1024, 1024), loss: 2.8426, lbox: 1.5529, lcls: 0.2108, dfl: 1.0788, cur_lr: 0.00406000018119812 2025-10-23 16:07:50,310 [INFO] Epoch 7/10, Step 400/404, step time: 1649.18 ms 2025-10-23 16:07:57,341 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-7_404.ckpt 2025-10-23 16:07:57,342 [INFO] Epoch 7/10, epoch time: 11.11 min. 2025-10-23 16:10:42,225 [INFO] Epoch 8/10, Step 100/404, imgsize (1024, 1024), loss: 2.4095, lbox: 1.2257, lcls: 0.1704, dfl: 1.0134, cur_lr: 0.0030700000934302807 2025-10-23 16:10:42,233 [INFO] Epoch 8/10, Step 100/404, step time: 1648.92 ms 2025-10-23 16:13:27,126 [INFO] Epoch 8/10, Step 200/404, imgsize (1024, 1024), loss: 2.6034, lbox: 1.3788, lcls: 0.1872, dfl: 1.0374, cur_lr: 0.0030700000934302807 2025-10-23 16:13:27,134 [INFO] Epoch 8/10, Step 200/404, step time: 1649.00 ms 2025-10-23 16:16:12,032 [INFO] Epoch 8/10, Step 300/404, imgsize (1024, 1024), loss: 2.6074, lbox: 1.3916, lcls: 0.1787, dfl: 1.0371, cur_lr: 0.0030700000934302807 2025-10-23 16:16:12,041 [INFO] Epoch 8/10, Step 300/404, step time: 1649.07 ms 2025-10-23 16:18:56,946 [INFO] Epoch 8/10, Step 400/404, imgsize (1024, 1024), loss: 2.8867, lbox: 1.4981, lcls: 0.2189, dfl: 1.1697, cur_lr: 0.0030700000934302807 2025-10-23 16:18:56,954 [INFO] Epoch 8/10, Step 400/404, step time: 1649.13 ms 2025-10-23 16:19:03,973 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-8_404.ckpt 2025-10-23 16:19:03,973 [INFO] Epoch 8/10, epoch time: 11.11 min. 
2025-10-23 16:21:48,883 [INFO] Epoch 9/10, Step 100/404, imgsize (1024, 1024), loss: 2.8544, lbox: 1.6248, lcls: 0.2181, dfl: 1.0115, cur_lr: 0.0020800000056624413 2025-10-23 16:21:48,891 [INFO] Epoch 9/10, Step 100/404, step time: 1649.18 ms 2025-10-23 16:24:33,791 [INFO] Epoch 9/10, Step 200/404, imgsize (1024, 1024), loss: 2.9393, lbox: 1.6026, lcls: 0.2223, dfl: 1.1145, cur_lr: 0.0020800000056624413 2025-10-23 16:24:33,799 [INFO] Epoch 9/10, Step 200/404, step time: 1649.08 ms 2025-10-23 16:27:18,695 [INFO] Epoch 9/10, Step 300/404, imgsize (1024, 1024), loss: 2.4632, lbox: 1.2884, lcls: 0.1701, dfl: 1.0047, cur_lr: 0.0020800000056624413 2025-10-23 16:27:18,703 [INFO] Epoch 9/10, Step 300/404, step time: 1649.04 ms 2025-10-23 16:30:03,567 [INFO] Epoch 9/10, Step 400/404, imgsize (1024, 1024), loss: 2.7216, lbox: 1.4867, lcls: 0.2002, dfl: 1.0346, cur_lr: 0.0020800000056624413 2025-10-23 16:30:03,575 [INFO] Epoch 9/10, Step 400/404, step time: 1648.72 ms 2025-10-23 16:30:10,627 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-9_404.ckpt 2025-10-23 16:30:10,627 [INFO] Epoch 9/10, epoch time: 11.11 min. 2025-10-23 16:32:55,537 [INFO] Epoch 10/10, Step 100/404, imgsize (1024, 1024), loss: 2.5899, lbox: 1.4239, lcls: 0.1668, dfl: 0.9992, cur_lr: 0.0010900000343099236 2025-10-23 16:32:55,545 [INFO] Epoch 10/10, Step 100/404, step time: 1649.18 ms 2025-10-23 16:35:40,433 [INFO] Epoch 10/10, Step 200/404, imgsize (1024, 1024), loss: 2.5535, lbox: 1.3745, lcls: 0.1813, dfl: 0.9976, cur_lr: 0.0010900000343099236 2025-10-23 16:35:40,441 [INFO] Epoch 10/10, Step 200/404, step time: 1648.95 ms 2025-10-23 16:38:25,358 [INFO] Epoch 10/10, Step 300/404, imgsize (1024, 1024), loss: 2.4509, lbox: 1.2441, lcls: 0.1717, dfl: 1.0351, cur_lr: 0.0010900000343099236 2025-10-23 16:38:25,366 [INFO] Epoch 10/10, Step 300/404, step time: 1649.25 ms 2025-10-23 16:41:10,260 [INFO] Epoch 10/10, Step 400/404, imgsize (1024, 1024), loss: 2.6832, lbox: 1.4217, lcls: 0.1896, dfl: 1.0719, cur_lr: 0.0010900000343099236 2025-10-23 16:41:10,268 [INFO] Epoch 10/10, Step 400/404, step time: 1649.02 ms 2025-10-23 16:41:17,324 [INFO] Saving model to ./runs/2025.10.23-14.48.02/weights/yolov8s-10_404.ckpt 2025-10-23 16:41:17,324 [INFO] Epoch 10/10, epoch time: 11.11 min. 2025-10-23 16:41:17,742 [INFO] End Train. 
2025-10-23 16:41:18,446 [INFO] Training completed.平均每个epoch耗时约10min左右,在训练过程中我们也可以查看AI Core的利用率以及内存的占用情况:npu-smi info+--------------------------------------------------------------------------------------------------------+ | npu-smi v1.0 Version: 24.1.rc4.b999 | +-------------------------------+-----------------+------------------------------------------------------+ | NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) | | Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) | +===============================+=================+======================================================+ | 30208 310P1 | OK | NA 52 11372 / 11372 | | 0 0 | 0000:77:00.0 | 99 24288/ 89608 | +-------------------------------+-----------------+------------------------------------------------------+ | 30208 310P1 | OK | NA 42 0 / 0 | | 1 1 | 0000:77:00.0 | 0 1576 / 89085 | +===============================+=================+======================================================+ +-------------------------------+-----------------+------------------------------------------------------+ | NPU Chip | Process id | Process name | Process memory(MB) | +===============================+=================+======================================================+ | 30208 0 | 336686 | python3 | 22835 | +===============================+=================+======================================================+四、模型验证这里我们仅训练了10个epoch进行模型的验证,可以看到模型的精度和召回率如下:python3 test.py --config ./configs/yolov8/yolov8s.yaml --device_target Ascend --weight ./runs/2025.10.23-14.48.02/weights/yolov8s-10_404.ckpt2025-10-23 16:46:18,824 [INFO] parse_args: 2025-10-23 16:46:18,824 [INFO] task detect 2025-10-23 16:46:18,824 [INFO] device_target Ascend 2025-10-23 16:46:18,824 [INFO] ms_mode 0 2025-10-23 16:46:18,824 [INFO] ms_amp_level O0 2025-10-23 16:46:18,824 [INFO] ms_enable_graph_kernel False 2025-10-23 16:46:18,824 [INFO] precision_mode None 2025-10-23 16:46:18,824 [INFO] weight ./runs/2025.10.23-14.48.02/weights/yolov8s-10_404.ckpt 2025-10-23 16:46:18,824 [INFO] per_batch_size 16 2025-10-23 16:46:18,824 [INFO] img_size 1024 2025-10-23 16:46:18,824 [INFO] single_cls False 2025-10-23 16:46:18,824 [INFO] rect False 2025-10-23 16:46:18,824 [INFO] exec_nms True 2025-10-23 16:46:18,824 [INFO] nms_time_limit 60.0 2025-10-23 16:46:18,824 [INFO] conf_thres 0.001 2025-10-23 16:46:18,824 [INFO] iou_thres 0.7 2025-10-23 16:46:18,824 [INFO] conf_free True 2025-10-23 16:46:18,824 [INFO] seed 2 2025-10-23 16:46:18,824 [INFO] log_level INFO 2025-10-23 16:46:18,824 [INFO] save_dir ./runs_test/2025.10.23-16.46.18 2025-10-23 16:46:18,824 [INFO] enable_modelarts False 2025-10-23 16:46:18,824 [INFO] data_url 2025-10-23 16:46:18,824 [INFO] ckpt_url 2025-10-23 16:46:18,824 [INFO] train_url 2025-10-23 16:46:18,824 [INFO] data_dir /cache/data/ 2025-10-23 16:46:18,824 [INFO] is_parallel False 2025-10-23 16:46:18,824 [INFO] ckpt_dir /cache/pretrain_ckpt/ 2025-10-23 16:46:18,824 [INFO] data.dataset_name result 2025-10-23 16:46:18,824 [INFO] data.train_set /root/workspace/mindyolo/visdrone_COCO_format/train.txt 2025-10-23 16:46:18,824 [INFO] data.val_set /root/workspace/mindyolo/visdrone_COCO_format/val.txt 2025-10-23 16:46:18,824 [INFO] data.test_set /root/workspace/mindyolo/visdrone_COCO_format/val.txt 2025-10-23 16:46:18,824 [INFO] data.nc 12 2025-10-23 16:46:18,824 [INFO] data.names ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'] 2025-10-23 16:46:18,824 [INFO] train_transforms.stage_epochs 
[5, 5] 2025-10-23 16:46:18,824 [INFO] train_transforms.trans_list [[{'func_name': 'mosaic', 'prob': 1.0}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}, {'func_name': 'albumentations'}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}], [{'func_name': 'letterbox', 'scaleup': True}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}, {'func_name': 'albumentations'}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}]] 2025-10-23 16:46:18,824 [INFO] data.test_transforms [{'func_name': 'letterbox', 'scaleup': False, 'only_image': True}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}] 2025-10-23 16:46:18,824 [INFO] data.num_parallel_workers 8 2025-10-23 16:46:18,824 [INFO] optimizer.optimizer momentum 2025-10-23 16:46:18,824 [INFO] optimizer.lr_init 0.01 2025-10-23 16:46:18,824 [INFO] optimizer.momentum 0.937 2025-10-23 16:46:18,824 [INFO] optimizer.nesterov True 2025-10-23 16:46:18,824 [INFO] optimizer.loss_scale 1.0 2025-10-23 16:46:18,824 [INFO] optimizer.warmup_epochs 3 2025-10-23 16:46:18,824 [INFO] optimizer.warmup_momentum 0.8 2025-10-23 16:46:18,824 [INFO] optimizer.warmup_bias_lr 0.1 2025-10-23 16:46:18,824 [INFO] optimizer.min_warmup_step 1000 2025-10-23 16:46:18,824 [INFO] optimizer.group_param yolov8 2025-10-23 16:46:18,824 [INFO] optimizer.gp_weight_decay 0.0005 2025-10-23 16:46:18,824 [INFO] optimizer.start_factor 1.0 2025-10-23 16:46:18,824 [INFO] optimizer.end_factor 0.01 2025-10-23 16:46:18,824 [INFO] loss.name YOLOv8Loss 2025-10-23 16:46:18,824 [INFO] loss.box 7.5 2025-10-23 16:46:18,824 [INFO] loss.cls 0.5 2025-10-23 16:46:18,824 [INFO] loss.dfl 1.5 2025-10-23 16:46:18,824 [INFO] loss.reg_max 16 2025-10-23 16:46:18,824 [INFO] epochs 10 2025-10-23 16:46:18,824 [INFO] sync_bn True 2025-10-23 16:46:18,824 [INFO] anchor_base False 2025-10-23 16:46:18,824 [INFO] opencv_threads_num 0 2025-10-23 16:46:18,824 [INFO] network.model_name yolov8 2025-10-23 16:46:18,824 [INFO] network.nc 80 2025-10-23 16:46:18,824 [INFO] network.reg_max 16 2025-10-23 16:46:18,824 [INFO] network.stride [8, 16, 32] 2025-10-23 16:46:18,824 [INFO] network.backbone [[-1, 1, 'ConvNormAct', [64, 3, 2]], [-1, 1, 'ConvNormAct', [128, 3, 2]], [-1, 3, 'C2f', [128, True]], [-1, 1, 'ConvNormAct', [256, 3, 2]], [-1, 6, 'C2f', [256, True]], [-1, 1, 'ConvNormAct', [512, 3, 2]], [-1, 6, 'C2f', [512, True]], [-1, 1, 'ConvNormAct', [1024, 3, 2]], [-1, 3, 'C2f', [1024, True]], [-1, 1, 'SPPF', [1024, 5]]] 2025-10-23 16:46:18,824 [INFO] network.head [[-1, 1, 'Upsample', ['None', 2, 'nearest']], [[-1, 6], 1, 'Concat', [1]], [-1, 3, 'C2f', [512]], [-1, 1, 'Upsample', ['None', 2, 'nearest']], [[-1, 4], 1, 'Concat', [1]], [-1, 3, 'C2f', [256]], [-1, 1, 'ConvNormAct', [256, 
3, 2]], [[-1, 12], 1, 'Concat', [1]], [-1, 3, 'C2f', [512]], [-1, 1, 'ConvNormAct', [512, 3, 2]], [[-1, 9], 1, 'Concat', [1]], [-1, 3, 'C2f', [1024]], [[15, 18, 21], 1, 'YOLOv8Head', ['nc', 'reg_max', 'stride']]] 2025-10-23 16:46:18,824 [INFO] network.depth_multiple 0.33 2025-10-23 16:46:18,824 [INFO] network.width_multiple 0.5 2025-10-23 16:46:18,824 [INFO] network.max_channels 1024 2025-10-23 16:46:18,824 [INFO] overflow_still_update False 2025-10-23 16:46:18,824 [INFO] config ./configs/yolov8/yolov8s.yaml 2025-10-23 16:46:18,824 [INFO] rank 0 2025-10-23 16:46:18,824 [INFO] rank_size 1 2025-10-23 16:46:18,824 [INFO] 2025-10-23 16:46:18,898 [WARNING] Parse Model, args: nearest, keep str type 2025-10-23 16:46:18,909 [WARNING] Parse Model, args: nearest, keep str type 2025-10-23 16:46:18,984 [INFO] number of network params, total: 11.160279M, trainable: 11.140228M [WARNING] GE_ADPT(540183,7efcd8e26740,python3):2025-10-23-16:46:22.493.658 [mindspore/ops/kernel/ascend/acl_ir/op_api_exec.cc:169] GetAscendDefaultCustomPath] Checking whether the so exists or if permission to access it is available: /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize_vision/op_api/lib/libcust_opapi.so 2025-10-23 16:46:23,434 [INFO] Load checkpoint from [./runs/2025.10.23-14.48.02/weights/yolov8s-10_404.ckpt] success. 2025-10-23 16:46:23,437 [INFO] No dataset cache available, caching now... Scanning '/root/workspace/mindyolo/visdrone_COCO_format/val.cache' images and labels... 548 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████████████████████████████████████████████████████████████████| 548/548 [00:00<00:00, 3754.44it/s] 2025-10-23 16:46:23,595 [INFO] New cache created: /root/workspace/mindyolo/visdrone_COCO_format/val.cache.npy 2025-10-23 16:46:23,595 [INFO] Dataset caching success. 2025-10-23 16:46:23,597 [INFO] Dataloader num parallel workers: [8] 2025-10-23 16:46:23,607 [WARNING] unable to load fast_coco_eval api, use normal one instead Warning: tiling offset out of range, index: 32 ..2025-10-23 16:46:55,297 [INFO] Sample 35/1, time cost: 30512.14 ms. 2025-10-23 16:46:57,108 [INFO] Sample 35/2, time cost: 1722.38 ms. 2025-10-23 16:46:58,628 [INFO] Sample 35/3, time cost: 1420.95 ms. 2025-10-23 16:47:00,538 [INFO] Sample 35/4, time cost: 1809.91 ms. 2025-10-23 16:47:02,502 [INFO] Sample 35/5, time cost: 1865.30 ms. 2025-10-23 16:47:04,321 [INFO] Sample 35/6, time cost: 1718.46 ms. 2025-10-23 16:47:06,724 [INFO] Sample 35/7, time cost: 2303.35 ms. 2025-10-23 16:47:08,940 [INFO] Sample 35/8, time cost: 2117.25 ms. 2025-10-23 16:47:11,018 [INFO] Sample 35/9, time cost: 1978.46 ms. 2025-10-23 16:47:13,101 [INFO] Sample 35/10, time cost: 1982.41 ms. 2025-10-23 16:47:14,871 [INFO] Sample 35/11, time cost: 1671.05 ms. 2025-10-23 16:47:17,112 [INFO] Sample 35/12, time cost: 2140.79 ms. 2025-10-23 16:47:19,142 [INFO] Sample 35/13, time cost: 1930.53 ms. 2025-10-23 16:47:20,984 [INFO] Sample 35/14, time cost: 1741.35 ms. 2025-10-23 16:47:23,393 [INFO] Sample 35/15, time cost: 2307.50 ms. 2025-10-23 16:47:25,557 [INFO] Sample 35/16, time cost: 2060.89 ms. 2025-10-23 16:47:27,324 [INFO] Sample 35/17, time cost: 1664.00 ms. 2025-10-23 16:47:29,254 [INFO] Sample 35/18, time cost: 1824.31 ms. 2025-10-23 16:47:31,281 [INFO] Sample 35/19, time cost: 1921.78 ms. 2025-10-23 16:47:33,331 [INFO] Sample 35/20, time cost: 1942.85 ms. 2025-10-23 16:47:35,806 [INFO] Sample 35/21, time cost: 2368.87 ms. 2025-10-23 16:47:38,165 [INFO] Sample 35/22, time cost: 2255.00 ms. 
2025-10-23 16:47:40,453 [INFO] Sample 35/23, time cost: 2182.96 ms. 2025-10-23 16:47:42,588 [INFO] Sample 35/24, time cost: 2029.14 ms. 2025-10-23 16:47:44,490 [INFO] Sample 35/25, time cost: 1796.02 ms. 2025-10-23 16:47:46,804 [INFO] Sample 35/26, time cost: 2207.91 ms. 2025-10-23 16:47:49,181 [INFO] Sample 35/27, time cost: 2270.69 ms. 2025-10-23 16:47:50,926 [INFO] Sample 35/28, time cost: 1638.70 ms. 2025-10-23 16:47:53,079 [INFO] Sample 35/29, time cost: 2046.37 ms. 2025-10-23 16:47:55,061 [INFO] Sample 35/30, time cost: 1875.28 ms. 2025-10-23 16:47:57,140 [INFO] Sample 35/31, time cost: 1972.00 ms. 2025-10-23 16:47:59,895 [INFO] Sample 35/32, time cost: 2647.24 ms. 2025-10-23 16:48:02,196 [INFO] Sample 35/33, time cost: 2191.50 ms. 2025-10-23 16:48:04,739 [INFO] Sample 35/34, time cost: 2434.77 ms. ..2025-10-23 16:48:20,509 [INFO] Sample 35/35, time cost: 15723.18 ms. 2025-10-23 16:48:20,509 [INFO] loading annotations into memory... 2025-10-23 16:48:20,639 [INFO] Done (t=0.13s) 2025-10-23 16:48:20,639 [INFO] creating index... 2025-10-23 16:48:20,650 [INFO] index created! 2025-10-23 16:48:20,650 [INFO] Loading and preparing results... 2025-10-23 16:48:21,106 [INFO] DONE (t=0.46s) 2025-10-23 16:48:21,106 [INFO] creating index... 2025-10-23 16:48:21,134 [INFO] index created! 2025-10-23 16:48:21,135 [INFO] Running per image evaluation... 2025-10-23 16:48:21,135 [INFO] Evaluate annotation type *bbox* 2025-10-23 16:48:31,087 [INFO] DONE (t=9.95s). 2025-10-23 16:48:31,087 [INFO] Accumulating evaluation results... 2025-10-23 16:48:31,996 [INFO] DONE (t=0.91s). 2025-10-23 16:48:31,996 [INFO] Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.019 2025-10-23 16:48:31,996 [INFO] Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.036 2025-10-23 16:48:31,996 [INFO] Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.019 2025-10-23 16:48:31,996 [INFO] Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016 2025-10-23 16:48:31,997 [INFO] Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.024 2025-10-23 16:48:31,997 [INFO] Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.056 2025-10-23 16:48:31,997 [INFO] Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.009 2025-10-23 16:48:31,997 [INFO] Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.049 2025-10-23 16:48:31,997 [INFO] Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.076 2025-10-23 16:48:31,997 [INFO] Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.057 2025-10-23 16:48:31,997 [INFO] Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.106 2025-10-23 16:48:31,997 [INFO] Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.161 2025-10-23 16:48:31,997 [INFO] Speed: 99.0/100.3/199.3 ms inference/NMS/total per 1024x1024 image at batch-size 16; 2025-10-23 16:48:31,997 [INFO] Testing completed, cost 133.18s.使用predict.py测试训练模型参数的结果并进行可视化推理,运行方式如下:python3 examples/finetune_visdrone/predict.py --config ./configs/yolov8/yolov8s.yaml --weight=./runs/2025.10.23-14.48.02/weights/yolov8s-120_404.ckpt --image_path ./visdrone_COCO_format/val/images/000000000001.jpg训练120个epoch后,模型的推理效果如下:五、小结本文详细阐述了在OrangePi AI Studio Pro上基于昇腾310P使用MindYolo框架实现YOLOv8模型训练与验证的完整流程,涵盖环境准备、数据集格式转换、模型训练参数配置及性能评估。
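作为补充,如果想对验证集中的多张图片批量运行可视化推理,可以用类似下面的示意脚本循环调用 predict.py(权重与路径均沿用上文训练产物的示例,predict.py 每次调用都会重新加载模型,仅作演示用途):

# batch_predict.py(示意脚本):循环调用 mindyolo 的 predict.py,
# 对 val/images 下的前若干张图片做可视化推理
import glob
import subprocess
import sys

CONFIG = "./configs/yolov8/yolov8s.yaml"
WEIGHT = "./runs/2025.10.23-14.48.02/weights/yolov8s-10_404.ckpt"

for img in sorted(glob.glob("./visdrone_COCO_format/val/images/*.jpg"))[:20]:
    subprocess.run(
        [sys.executable, "examples/finetune_visdrone/predict.py",
         "--config", CONFIG, "--weight", WEIGHT, "--image_path", img],
        check=True,
    )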
-
【朝推夜训】Ascend310p YOLOv8 NPU 训练和推理在华为昇思MindSpore框架的加持下,我们在OrangePi AI Studio Pro开发板上实现了YOLOv8m模型的完整训练流程。在单块NPU上训练YOLOv8m模型,每轮训练约7000张图像仅需6.92分钟,10轮训练总耗时约69分钟。从训练日志可以看出,模型损失值loss从第一轮的6.45逐步下降到最后一轮的2.58左右,表明模型训练效果良好。训练过程中,NPU的AICore利用率和内存占用情况都保持在合理水平,证明了Ascend 310P芯片在目标检测任务中的优异表现,其性能可与NVIDIA GPU相媲美,为开发者提供了另一种高效的AI计算平台选择。通过mindyolo开源仓库,其他开发者也可以复现这一成果并进行进一步的开发和优化。我们在昇腾310P AI加速卡上使用昇思MindSpore跑通了YOLOv8模型的NPU训练和推理,性能不输于NVIDIA的GPU。OrangePi AI Studio Pro与Atlas 300V Pro视频解析卡搭载的是同款Ascend 310P芯片,总共有两块,每块有96G内存,可以提供176 TFLOPS的训练算力和352 TOPS的推理算力。上图是在单块NPU上训练YOLOv8m模型时AI Core的利用率以及内存的占用情况,训练集总共约7000张图像,每轮训练仅需6.92分钟:2025-09-24 16:47:11,931 [INFO] 2025-09-24 16:47:11,931 [INFO] Please check the above information for the configurations 2025-09-24 16:47:12,050 [WARNING] Parse Model, args: nearest, keep str type 2025-09-24 16:47:12,069 [WARNING] Parse Model, args: nearest, keep str type 2025-09-24 16:47:12,184 [INFO] number of network params, total: 25.896391M, trainable: 25.863252M 2025-09-24 16:47:16,786 [WARNING] Parse Model, args: nearest, keep str type 2025-09-24 16:47:16,807 [WARNING] Parse Model, args: nearest, keep str type 2025-09-24 16:47:16,920 [INFO] number of network params, total: 25.896391M, trainable: 25.863252M 2025-09-24 16:47:31,011 [INFO] ema_weight not exist, default pretrain weight is currently used. 2025-09-24 16:47:31,118 [INFO] Dataset Cache file hash/version check success. 2025-09-24 16:47:31,118 [INFO] Load dataset cache from [/home/orangepi/workspace/mindyolo/examples/finetune_visdrone/train.cache.npy] success. 2025-09-24 16:47:31,142 [INFO] Dataloader num parallel workers: [8] 2025-09-24 16:47:31,240 [INFO] Dataset Cache file hash/version check success. 2025-09-24 16:47:31,240 [INFO] Load dataset cache from [/home/orangepi/workspace/mindyolo/examples/finetune_visdrone/train.cache.npy] success. 2025-09-24 16:47:31,264 [INFO] Dataloader num parallel workers: [8] 2025-09-24 16:47:31,438 [INFO] 2025-09-24 16:47:31,445 [INFO] got 1 active callback as follows: 2025-09-24 16:47:31,445 [INFO] SummaryCallback() 2025-09-24 16:47:31,445 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :). 2025-09-24 16:50:38,076 [INFO] Epoch 1/10, Step 100/404, imgsize (640, 640), loss: 6.4507, lbox: 3.8446, lcls: 0.5687, dfl: 2.0375, cur_lr: 0.09257426112890244 2025-09-24 16:50:38,970 [INFO] Epoch 1/10, Step 100/404, step time: 1875.26 ms 2025-09-24 16:52:21,629 [INFO] Epoch 1/10, Step 200/404, imgsize (640, 640), loss: 4.8078, lbox: 3.0080, lcls: 0.4118, dfl: 1.3880, cur_lr: 0.08514851331710815 2025-09-24 16:52:21,653 [INFO] Epoch 1/10, Step 200/404, step time: 1026.83 ms 2025-09-24 16:54:04,347 [INFO] Epoch 1/10, Step 300/404, imgsize (640, 640), loss: 4.0795, lbox: 2.4281, lcls: 0.3466, dfl: 1.3048, cur_lr: 0.07772277295589447 2025-09-24 16:54:04,371 [INFO] Epoch 1/10, Step 300/404, step time: 1027.18 ms 2025-09-24 16:55:47,067 [INFO] Epoch 1/10, Step 400/404, imgsize (640, 640), loss: 3.8245, lbox: 2.1755, lcls: 0.3567, dfl: 1.2923, cur_lr: 0.07029703259468079 2025-09-24 16:55:47,091 [INFO] Epoch 1/10, Step 400/404, step time: 1027.19 ms 2025-09-24 16:55:52,087 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-1_404.ckpt 2025-09-24 16:55:52,087 [INFO] Epoch 1/10, epoch time: 8.34 min. 
2025-09-24 16:57:34,759 [INFO] Epoch 2/10, Step 100/404, imgsize (640, 640), loss: 3.8083, lbox: 2.2584, lcls: 0.3404, dfl: 1.2095, cur_lr: 0.062162574380636215 2025-09-24 16:57:34,768 [INFO] Epoch 2/10, Step 100/404, step time: 1026.80 ms 2025-09-24 16:59:17,441 [INFO] Epoch 2/10, Step 200/404, imgsize (640, 640), loss: 3.7835, lbox: 2.2670, lcls: 0.3574, dfl: 1.1592, cur_lr: 0.05465514957904816 2025-09-24 16:59:17,450 [INFO] Epoch 2/10, Step 200/404, step time: 1026.82 ms 2025-09-24 17:01:00,127 [INFO] Epoch 2/10, Step 300/404, imgsize (640, 640), loss: 3.5251, lbox: 2.0144, lcls: 0.3210, dfl: 1.1898, cur_lr: 0.0471477210521698 2025-09-24 17:01:00,136 [INFO] Epoch 2/10, Step 300/404, step time: 1026.85 ms 2025-09-24 17:02:42,826 [INFO] Epoch 2/10, Step 400/404, imgsize (640, 640), loss: 3.5596, lbox: 2.0947, lcls: 0.3086, dfl: 1.1563, cur_lr: 0.03964029625058174 2025-09-24 17:02:42,835 [INFO] Epoch 2/10, Step 400/404, step time: 1026.99 ms 2025-09-24 17:02:47,745 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-2_404.ckpt 2025-09-24 17:02:47,745 [INFO] Epoch 2/10, epoch time: 6.93 min. 2025-09-24 17:04:30,489 [INFO] Epoch 3/10, Step 100/404, imgsize (640, 640), loss: 3.5524, lbox: 2.1004, lcls: 0.2938, dfl: 1.1582, cur_lr: 0.031090890988707542 2025-09-24 17:04:30,497 [INFO] Epoch 3/10, Step 100/404, step time: 1027.52 ms 2025-09-24 17:06:13,196 [INFO] Epoch 3/10, Step 200/404, imgsize (640, 640), loss: 3.8549, lbox: 2.2845, lcls: 0.3526, dfl: 1.2178, cur_lr: 0.02350178174674511 2025-09-24 17:06:13,205 [INFO] Epoch 3/10, Step 200/404, step time: 1027.07 ms 2025-09-24 17:07:55,875 [INFO] Epoch 3/10, Step 300/404, imgsize (640, 640), loss: 3.6236, lbox: 2.1016, lcls: 0.3113, dfl: 1.2106, cur_lr: 0.015912672504782677 2025-09-24 17:07:55,883 [INFO] Epoch 3/10, Step 300/404, step time: 1026.78 ms 2025-09-24 17:09:38,572 [INFO] Epoch 3/10, Step 400/404, imgsize (640, 640), loss: 3.5586, lbox: 2.0730, lcls: 0.3314, dfl: 1.1542, cur_lr: 0.008323564194142818 2025-09-24 17:09:38,581 [INFO] Epoch 3/10, Step 400/404, step time: 1026.97 ms 2025-09-24 17:09:43,528 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-3_404.ckpt 2025-09-24 17:09:43,529 [INFO] Epoch 3/10, epoch time: 6.93 min. 2025-09-24 17:11:26,211 [INFO] Epoch 4/10, Step 100/404, imgsize (640, 640), loss: 3.3767, lbox: 1.9760, lcls: 0.2928, dfl: 1.1079, cur_lr: 0.007029999978840351 2025-09-24 17:11:26,218 [INFO] Epoch 4/10, Step 100/404, step time: 1026.90 ms 2025-09-24 17:13:08,899 [INFO] Epoch 4/10, Step 200/404, imgsize (640, 640), loss: 3.4213, lbox: 1.9382, lcls: 0.3052, dfl: 1.1779, cur_lr: 0.007029999978840351 2025-09-24 17:13:08,908 [INFO] Epoch 4/10, Step 200/404, step time: 1026.89 ms 2025-09-24 17:14:51,583 [INFO] Epoch 4/10, Step 300/404, imgsize (640, 640), loss: 2.8313, lbox: 1.5666, lcls: 0.2380, dfl: 1.0267, cur_lr: 0.007029999978840351 2025-09-24 17:14:51,591 [INFO] Epoch 4/10, Step 300/404, step time: 1026.83 ms 2025-09-24 17:16:34,277 [INFO] Epoch 4/10, Step 400/404, imgsize (640, 640), loss: 3.2905, lbox: 1.9274, lcls: 0.2889, dfl: 1.0741, cur_lr: 0.007029999978840351 2025-09-24 17:16:34,285 [INFO] Epoch 4/10, Step 400/404, step time: 1026.94 ms 2025-09-24 17:16:39,232 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-4_404.ckpt 2025-09-24 17:16:39,232 [INFO] Epoch 4/10, epoch time: 6.93 min. 
2025-09-24 17:18:21,892 [INFO] Epoch 5/10, Step 100/404, imgsize (640, 640), loss: 3.1534, lbox: 1.7844, lcls: 0.2581, dfl: 1.1109, cur_lr: 0.006039999891072512 2025-09-24 17:18:21,900 [INFO] Epoch 5/10, Step 100/404, step time: 1026.67 ms 2025-09-24 17:20:04,596 [INFO] Epoch 5/10, Step 200/404, imgsize (640, 640), loss: 3.1152, lbox: 1.7685, lcls: 0.2518, dfl: 1.0949, cur_lr: 0.006039999891072512 2025-09-24 17:20:04,604 [INFO] Epoch 5/10, Step 200/404, step time: 1027.04 ms 2025-09-24 17:21:47,284 [INFO] Epoch 5/10, Step 300/404, imgsize (640, 640), loss: 3.3179, lbox: 1.8412, lcls: 0.2888, dfl: 1.1880, cur_lr: 0.006039999891072512 2025-09-24 17:21:47,292 [INFO] Epoch 5/10, Step 300/404, step time: 1026.88 ms 2025-09-24 17:23:29,968 [INFO] Epoch 5/10, Step 400/404, imgsize (640, 640), loss: 3.2193, lbox: 1.8366, lcls: 0.2620, dfl: 1.1207, cur_lr: 0.006039999891072512 2025-09-24 17:23:29,976 [INFO] Epoch 5/10, Step 400/404, step time: 1026.84 ms 2025-09-24 17:23:34,954 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-5_404.ckpt 2025-09-24 17:23:34,954 [INFO] Epoch 5/10, epoch time: 6.93 min. 2025-09-24 17:25:17,530 [INFO] Epoch 6/10, Step 100/404, imgsize (640, 640), loss: 2.7642, lbox: 1.5834, lcls: 0.2164, dfl: 0.9643, cur_lr: 0.005049999803304672 2025-09-24 17:25:17,538 [INFO] Epoch 6/10, Step 100/404, step time: 1025.84 ms 2025-09-24 17:27:00,125 [INFO] Epoch 6/10, Step 200/404, imgsize (640, 640), loss: 2.6854, lbox: 1.4272, lcls: 0.2080, dfl: 1.0502, cur_lr: 0.005049999803304672 2025-09-24 17:27:00,134 [INFO] Epoch 6/10, Step 200/404, step time: 1025.96 ms 2025-09-24 17:28:42,720 [INFO] Epoch 6/10, Step 300/404, imgsize (640, 640), loss: 2.7541, lbox: 1.5028, lcls: 0.2171, dfl: 1.0342, cur_lr: 0.005049999803304672 2025-09-24 17:28:42,728 [INFO] Epoch 6/10, Step 300/404, step time: 1025.94 ms 2025-09-24 17:30:25,315 [INFO] Epoch 6/10, Step 400/404, imgsize (640, 640), loss: 2.8092, lbox: 1.5545, lcls: 0.2121, dfl: 1.0427, cur_lr: 0.005049999803304672 2025-09-24 17:30:25,323 [INFO] Epoch 6/10, Step 400/404, step time: 1025.95 ms 2025-09-24 17:30:30,293 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-6_404.ckpt 2025-09-24 17:30:30,294 [INFO] Epoch 6/10, epoch time: 6.92 min. 2025-09-24 17:32:12,881 [INFO] Epoch 7/10, Step 100/404, imgsize (640, 640), loss: 3.0997, lbox: 1.8226, lcls: 0.2402, dfl: 1.0369, cur_lr: 0.00406000018119812 2025-09-24 17:32:12,890 [INFO] Epoch 7/10, Step 100/404, step time: 1025.96 ms 2025-09-24 17:33:55,477 [INFO] Epoch 7/10, Step 200/404, imgsize (640, 640), loss: 2.8140, lbox: 1.5979, lcls: 0.2143, dfl: 1.0018, cur_lr: 0.00406000018119812 2025-09-24 17:33:55,485 [INFO] Epoch 7/10, Step 200/404, step time: 1025.96 ms 2025-09-24 17:35:38,072 [INFO] Epoch 7/10, Step 300/404, imgsize (640, 640), loss: 3.0294, lbox: 1.6439, lcls: 0.2544, dfl: 1.1310, cur_lr: 0.00406000018119812 2025-09-24 17:35:38,081 [INFO] Epoch 7/10, Step 300/404, step time: 1025.95 ms 2025-09-24 17:37:20,660 [INFO] Epoch 7/10, Step 400/404, imgsize (640, 640), loss: 2.8015, lbox: 1.5686, lcls: 0.2252, dfl: 1.0077, cur_lr: 0.00406000018119812 2025-09-24 17:37:20,669 [INFO] Epoch 7/10, Step 400/404, step time: 1025.88 ms 2025-09-24 17:37:25,643 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-7_404.ckpt 2025-09-24 17:37:25,644 [INFO] Epoch 7/10, epoch time: 6.92 min. 
2025-09-24 17:39:08,227 [INFO] Epoch 8/10, Step 100/404, imgsize (640, 640), loss: 2.5091, lbox: 1.3373, lcls: 0.1711, dfl: 1.0007, cur_lr: 0.0030700000934302807 2025-09-24 17:39:08,236 [INFO] Epoch 8/10, Step 100/404, step time: 1025.92 ms 2025-09-24 17:40:50,818 [INFO] Epoch 8/10, Step 200/404, imgsize (640, 640), loss: 2.5926, lbox: 1.4141, lcls: 0.1923, dfl: 0.9863, cur_lr: 0.0030700000934302807 2025-09-24 17:40:50,826 [INFO] Epoch 8/10, Step 200/404, step time: 1025.91 ms 2025-09-24 17:42:33,392 [INFO] Epoch 8/10, Step 300/404, imgsize (640, 640), loss: 2.5341, lbox: 1.3811, lcls: 0.1869, dfl: 0.9660, cur_lr: 0.0030700000934302807 2025-09-24 17:42:33,400 [INFO] Epoch 8/10, Step 300/404, step time: 1025.74 ms 2025-09-24 17:44:15,994 [INFO] Epoch 8/10, Step 400/404, imgsize (640, 640), loss: 3.0024, lbox: 1.6379, lcls: 0.2284, dfl: 1.1361, cur_lr: 0.0030700000934302807 2025-09-24 17:44:16,002 [INFO] Epoch 8/10, Step 400/404, step time: 1026.02 ms 2025-09-24 17:44:20,974 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-8_404.ckpt 2025-09-24 17:44:20,975 [INFO] Epoch 8/10, epoch time: 6.92 min. 2025-09-24 17:46:03,561 [INFO] Epoch 9/10, Step 100/404, imgsize (640, 640), loss: 3.0890, lbox: 1.8395, lcls: 0.2321, dfl: 1.0174, cur_lr: 0.0020800000056624413 2025-09-24 17:46:03,569 [INFO] Epoch 9/10, Step 100/404, step time: 1025.94 ms 2025-09-24 17:47:46,157 [INFO] Epoch 9/10, Step 200/404, imgsize (640, 640), loss: 2.9621, lbox: 1.6608, lcls: 0.2360, dfl: 1.0652, cur_lr: 0.0020800000056624413 2025-09-24 17:47:46,166 [INFO] Epoch 9/10, Step 200/404, step time: 1025.96 ms 2025-09-24 17:49:28,755 [INFO] Epoch 9/10, Step 300/404, imgsize (640, 640), loss: 2.4801, lbox: 1.3320, lcls: 0.1753, dfl: 0.9728, cur_lr: 0.0020800000056624413 2025-09-24 17:49:28,763 [INFO] Epoch 9/10, Step 300/404, step time: 1025.97 ms 2025-09-24 17:51:11,359 [INFO] Epoch 9/10, Step 400/404, imgsize (640, 640), loss: 2.8075, lbox: 1.5971, lcls: 0.1995, dfl: 1.0109, cur_lr: 0.0020800000056624413 2025-09-24 17:51:11,367 [INFO] Epoch 9/10, Step 400/404, step time: 1026.03 ms 2025-09-24 17:51:16,330 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-9_404.ckpt 2025-09-24 17:51:16,331 [INFO] Epoch 9/10, epoch time: 6.92 min. 2025-09-24 17:52:58,913 [INFO] Epoch 10/10, Step 100/404, imgsize (640, 640), loss: 2.6278, lbox: 1.4529, lcls: 0.1860, dfl: 0.9889, cur_lr: 0.0010900000343099236 2025-09-24 17:52:58,921 [INFO] Epoch 10/10, Step 100/404, step time: 1025.90 ms 2025-09-24 17:54:41,521 [INFO] Epoch 10/10, Step 200/404, imgsize (640, 640), loss: 2.7550, lbox: 1.5724, lcls: 0.2083, dfl: 0.9742, cur_lr: 0.0010900000343099236 2025-09-24 17:54:41,529 [INFO] Epoch 10/10, Step 200/404, step time: 1026.08 ms 2025-09-24 17:56:24,125 [INFO] Epoch 10/10, Step 300/404, imgsize (640, 640), loss: 2.4470, lbox: 1.2448, lcls: 0.1758, dfl: 1.0263, cur_lr: 0.0010900000343099236 2025-09-24 17:56:24,133 [INFO] Epoch 10/10, Step 300/404, step time: 1026.03 ms 2025-09-24 17:58:06,727 [INFO] Epoch 10/10, Step 400/404, imgsize (640, 640), loss: 2.5783, lbox: 1.3733, lcls: 0.1848, dfl: 1.0202, cur_lr: 0.0010900000343099236 2025-09-24 17:58:06,736 [INFO] Epoch 10/10, Step 400/404, step time: 1026.02 ms 2025-09-24 17:58:11,744 [INFO] Saving model to ./runs/2025.09.24-16.47.11/weights/yolov8m-10_404.ckpt 2025-09-24 17:58:11,745 [INFO] Epoch 10/10, epoch time: 6.92 min. 2025-09-24 17:58:12,149 [INFO] End Train. 
2025-09-24 17:58:12,561 [INFO] Training completed.以下是模型训练了10个epoch的使用NPU在测试集图片上的推理结果:2025-09-24 18:13:24,511 [WARNING] Parse Model, args: nearest, keep str type 2025-09-24 18:13:24,532 [WARNING] Parse Model, args: nearest, keep str type 2025-09-24 18:13:24,639 [INFO] number of network params, total: 25.896391M, trainable: 25.863252M 2025-09-24 18:13:29,405 [INFO] Load checkpoint from [/home/orangepi/workspace/mindyolo/runs/2025.09.24-16.47.11/weights/yolov8m-10_404.ckpt] success. 2025-09-24 18:13:53,915 [INFO] Predict result is: {'category_id': [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 5, 10, 4, 1, 4, 2, 4, 1, 5, 10, 4, 2, 4, 1], 'bbox': [[866.402, 359.922, 125.209, 179.961], [619.836, 379.246, 140.848, 229.434], [704.238, 192.678, 102.631, 112.359], [572.588, 189.689, 108.707, 103.76], [80.484, 471.75, 334.953, 243.844], [739.99, 15.987, 60.305, 60.944], [1179.242, 68.017, 143.637, 56.163], [1220.215, 154.843, 138.523, 76.782], [1217.559, 108.026, 140.516, 63.733], [822.475, 15.34, 56.744, 75.039], [621.438, 70.781, 19.938, 55.292], [1106.859, 128.463, 79.986, 95.99], [773.168, 90.047, 71.42, 95.293], [773.467, 88.951, 70.988, 95.924], [1122.158, 371.145, 48.12, 90.512], [1168.982, 2.274, 83.141, 77.081], [723.45, 65.277, 21.877, 51.017], [1145.906, 0.556, 76.467, 46.708], [672.513, 71.818, 25.857, 46.933], [488.816, 350.559, 107.844, 117.605], [672.778, 71.918, 26.172, 48.194], [1106.826, 128.612, 79.621, 96.239], [1058.831, 319.314, 35.087, 75.056], [1146.62, 0.365, 54.586, 48.643], [1124.963, 370.945, 42.359, 66.473], [1148.197, 1.046, 92.537, 51.581], [526.153, 87.349, 29.123, 37.91]], 'score': [0.93223, 0.92336, 0.90671, 0.90539, 0.84414, 0.83682, 0.83292, 0.75641, 0.74857, 0.74295, 0.72221, 0.63341, 0.62439, 0.5829, 0.50411, 0.48259, 0.42391, 0.42188, 0.42185, 0.36533, 0.29963, 0.29451, 0.29264, 0.28265, 0.26525, 0.2585, 0.25038]} 2025-09-24 18:13:53,915 [INFO] Speed: 24481.6/5.7/24487.3 ms inference/NMS/total per 640x640 image at batch-size 1; 2025-09-24 18:13:53,915 [INFO] Detect a image success. 2025-09-24 18:13:53,924 [INFO] Infer completed.模型训练和推理代码可以从mindyolo仓库上下载:https://github.com/mindspore-lab/mindyolo
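推理日志中打印的检测结果是一个包含 category_id、bbox 和 score 的字典。下面给出一个简化示意代码(假设 bbox 为 COCO 风格的 [x_min, y_min, w, h] 格式,且结果字典结构与上面日志一致,文件名与类别表均为示例假设),用于按置信度阈值过滤并把检测框绘制到原图上,便于直观查看训练效果:

```python
import cv2

def draw_predictions(image_path, result, class_names, score_thr=0.5, save_path="vis_result.jpg"):
    """示意:解析 NPU 推理输出的结果字典并绘制到原图。"""
    img = cv2.imread(image_path)
    for cls_id, bbox, score in zip(result["category_id"], result["bbox"], result["score"]):
        if score < score_thr:
            continue
        # 假设 bbox 为 COCO 风格的 [x_min, y_min, w, h]
        x, y, w, h = map(int, bbox)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, f"{class_names[cls_id]} {score:.2f}", (x, max(y - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    cv2.imwrite(save_path, img)

# 用法示例(路径与类别名列表为假设值):
# draw_predictions("test.jpg", predict_result, class_names=coco_class_names, score_thr=0.5)
```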
-
华为云开发者空间基于昇腾NPU实现CT肺炎影像分割模型训练与推理本案例将介绍如何在华为云开发者空间的AI Notebook环境中,利用昇腾NPU 910B4硬件资源训练和推理一个用于CT肺炎影像分割的深度学习模型,涵盖从数据准备、预处理、模型构建、训练到推理可视化的完整深度学习工作流。首先,从OBS存储桶下载并解压了COVID-19 CT扫描数据集,该数据集包含原始CT扫描图像、肺部掩码、感染区域掩码以及肺部和感染区域的组合掩码。import os import zipfile # 下载数据集 if not os.path.exists('Covid-19.zip'): os.system('wget -q https://orangepi-ai-studio.obs.cn-north-4.myhuaweicloud.com/Covid-19.zip') # 解压数据集 if not os.path.exists('Covid-19'): zip_file = zipfile.ZipFile('Covid-19.zip') zip_file.extractall() zip_file.close() 读取数据集中的元数据并显示前5行:import pandas as pd data = pd.read_csv('Covid-19/metadata.csv') data.head() ct_scanlung_maskinfection_masklung_and_infection_mask0…/input/covid19-ct-scans/ct_scans/coronacases……/input/covid19-ct-scans/lung_mask/coronacase……/input/covid19-ct-scans/infection_mask/coron……/input/covid19-ct-scans/lung_and_infection_m…1…/input/covid19-ct-scans/ct_scans/coronacases……/input/covid19-ct-scans/lung_mask/coronacase……/input/covid19-ct-scans/infection_mask/coron……/input/covid19-ct-scans/lung_and_infection_m…2…/input/covid19-ct-scans/ct_scans/coronacases……/input/covid19-ct-scans/lung_mask/coronacase……/input/covid19-ct-scans/infection_mask/coron……/input/covid19-ct-scans/lung_and_infection_m…3…/input/covid19-ct-scans/ct_scans/coronacases……/input/covid19-ct-scans/lung_mask/coronacase……/input/covid19-ct-scans/infection_mask/coron……/input/covid19-ct-scans/lung_and_infection_m…4…/input/covid19-ct-scans/ct_scans/coronacases……/input/covid19-ct-scans/lung_mask/coronacase……/input/covid19-ct-scans/infection_mask/coron……/input/covid19-ct-scans/lung_and_infection_m…我们分别获取原始图像、肺部mask、感染mask、肺部和感染mask的文件路径:# 原始图像 ct_scan_sample_file = data.loc[0,'ct_scan'].replace('../input/covid19-ct-scans','Covid-19') # 肺部mask lung_mask_sample_file = data.loc[0,'lung_mask'].replace('../input/covid19-ct-scans','Covid-19') # 感染mask infection_mask_sample_file = data.loc[0,'infection_mask'].replace('../input/covid19-ct-scans','Covid-19') # 肺部和感染mask lung_and_infection_mask_sample_file = data.loc[0,'lung_and_infection_mask'].replace('../input/covid19-ct-scans','Covid-19') 安装nibabel库读取NIfTI格式的医学影像文件,使用matplotlib库进行可视化展示:!pip install nibabelimport numpy as np import nibabel as nib # 读取nifti文件 def read_nii_file(fileName): img = nib.load(fileName) img_data = img.get_fdata() img_data = np.rot90(np.array(img_data)) return img_data # 读取 ct_scan_imgs = read_nii_file(ct_scan_sample_file) lung_mas_imgs = read_nii_file(lung_mask_sample_file) infection_mask_imgs = read_nii_file(infection_mask_sample_file) lung_and_infection_mas_imgs = read_nii_file(lung_and_infection_mask_sample_file) # 查看大小 print(ct_scan_imgs.shape) print(lung_mas_imgs.shape) (512, 512, 301) (512, 512, 301) # 绘制 import matplotlib.pyplot as plt %matplotlib inline color_map = 'spring' layer_index = 180 fig = plt.figure(figsize=(20, 4)) plt.subplot(1, 4, 1) plt.imshow(ct_scan_imgs[:,:,layer_index], cmap='bone') plt.title('Original Image') plt.axis('off') plt.subplot(1,4,2) plt.imshow(ct_scan_imgs[:,:,layer_index], cmap='bone') mask_ = np.ma.masked_where(lung_mas_imgs[:,:,layer_index]== 0, lung_mas_imgs[:,:,layer_index]) plt.imshow(mask_, alpha=0.8, cmap=color_map) plt.title('Lung Mask') plt.axis('off') plt.subplot(1,4,3) plt.imshow(ct_scan_imgs[:,:,layer_index], cmap='bone') mask_ = np.ma.masked_where(infection_mask_imgs[:,:,layer_index]== 0, infection_mask_imgs[:,:,layer_index]) plt.imshow(mask_, alpha=0.8, cmap=color_map) plt.title('Infection Mask') plt.axis('off') plt.subplot(1,4,4) plt.imshow(ct_scan_imgs[:,:,layer_index], cmap='bone') mask_ = 
np.ma.masked_where(lung_and_infection_mas_imgs[:,:,layer_index]== 0, lung_and_infection_mas_imgs[:,:,layer_index]) plt.imshow(mask_, alpha=0.8, cmap=color_map) plt.title('Lung and Infection Mask') plt.axis('off') plt.show() 之后对数据进行标准化和归一化,划分训练集和测试集,并统一缩放到256x256的大小保存为npy文件。标准化x′=x−mean(x)σx'= \frac{x-mean(x)}{\sigma} x′=σx−mean(x)归一化x′=x−min(x)max(x)−min(x)x'= \frac{x-min(x)}{max(x)-min(x)} x′=max(x)−min(x)x−min(x)# 标准化 def standardize(data): # 计算均值 mean = data.mean() # 计算标准差 std = np.std(data) # 计算结果 standardized = (data - mean) / std return standardized # 归一化 def normalize(data): # 计算最大最小值 max_val = data.max() min_val = data.min() normalized = (data - min_val) / (max_val - min_val) return normalized std = standardize(ct_scan_imgs) normalize(std).max(),normalize(std).min() (1.0, 0.0) # 处理所有文件 import cv2 import glob train_file_list =[file_path.replace('../input/covid19-ct-scans','Covid-19') for file_path in data.loc[:,'ct_scan']] train_label_list = [file_path.replace('../input/covid19-ct-scans','Covid-19') for file_path in data.loc[:,'infection_mask']] train_file_list[:5], len(train_label_list), train_label_list[:5], len(train_file_list) (['Covid-19/ct_scans/coronacases_org_001.nii', 'Covid-19/ct_scans/coronacases_org_002.nii', 'Covid-19/ct_scans/coronacases_org_003.nii', 'Covid-19/ct_scans/coronacases_org_004.nii', 'Covid-19/ct_scans/coronacases_org_005.nii'], 20, ['Covid-19/infection_mask/coronacases_001.nii', 'Covid-19/infection_mask/coronacases_002.nii', 'Covid-19/infection_mask/coronacases_003.nii', 'Covid-19/infection_mask/coronacases_004.nii', 'Covid-19/infection_mask/coronacases_005.nii'], 20) from tqdm import tqdm for index in tqdm(range(len(train_file_list))): # 读取 img = nib.load(train_file_list[index]) mask = nib.load(train_label_list[index]) img_data = img.get_fdata() mask_data = mask.get_fdata().astype(np.uint8) # 标准化和归一化 std = standardize(img_data) normalized = normalize(std) # 分为训练数据和测试数据 if index < 17: save_dir = 'processed/train/' else: save_dir = 'processed/test/' # 遍历所有层,分层存入文件夹,存储路径格式:'processed/train/0/img_0.npy','processed/train/0/label_0.npy', layer_num = normalized.shape[-1] for i in range(layer_num): layer = normalized[:,:,i] mask = mask_data[:,:,i] # 缩放 layer = cv2.resize(layer, (256, 256)) mask = cv2.resize(mask, (256, 256), interpolation=cv2.INTER_NEAREST) # 创建文件夹 img_dir = save_dir + str(index) if not os.path.exists(img_dir): os.makedirs(img_dir) # 保存为npy文件 np.save(img_dir+'/img_'+str(i), layer) np.save(img_dir+'/label_'+str(i), mask) 100%|██████████| 20/20 [01:08<00:00, 3.44s/it] 同时采用imgaug库进行数据增强,包括图像的缩放、旋转和弹性变换等操作,以提升模型的泛化能力。!pip install imgaugimport imgaug as ia import imgaug.augmenters as iaa from torch.utils.data import Dataset from imgaug.augmentables.segmaps import SegmentationMapsOnImage class SegmentDataset(Dataset): def __init__(self,where='train',seq=None): # 获取数据 self.img_list = glob.glob('processed/{}/*/img_*'.format(where)) self.mask_list = glob.glob('processed/{}/*/img_*') # 数据增强pipeline self.seq = seq def __len__(self): # 返回数据大小 return len(self.img_list) def __getitem__(self, idx): # 获取具体每一个数据 # 获取图片 img_file = self.img_list[idx] mask_file = img_file.replace('img','label') img = np.load(img_file) # 获取mask mask = np.load(mask_file) # 如果需要数据增强 if self.seq: segmap = SegmentationMapsOnImage(mask, shape=mask.shape) img,mask = seq(image=img, segmentation_maps=segmap) # 直接获取数组内容 mask = mask.get_arr() # 灰度图扩张维度成张量 return np.expand_dims(img,0) , np.expand_dims(mask,0) # 数据增强处理流程 seq = iaa.Sequential([ iaa.Affine(scale=(0.8, 1.2), # 缩放 rotate=(-45, 45)), 
# 旋转 iaa.ElasticTransformation() # 变换 ]) 创建dataloader,开启8个线程一次加载16张图片进行处理。import torch import torch_npu from torch_npu.contrib import transfer_to_npu # 使用dataloader加载 batch_size = 16 num_workers = 8 train_dataset = SegmentDataset('train', seq) test_dataset = SegmentDataset('test', None) train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, num_workers=num_workers, shuffle=True) test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False) 我们可以查看单张图像增强后的效果:# 对同一个图片显示多次 fig = plt.figure(figsize=(12, 12)) for i in range(16): plt.subplot(4, 4, i+1) img , mask = train_dataset[101] plt.imshow(img[0], cmap='bone') mask_ = np.ma.masked_where(mask[0]== 0, mask[0]) plt.imshow(mask_, alpha=0.8, cmap="spring") plt.axis('off') plt.show() 数据准备完成后,我们开始从头构建Unet的网络结构。U-Net是一种经典的编码器-解码器结构,特别适用于医学图像分割任务。网络包含四个编码层和四个解码层,通过跳跃连接将编码器的特征图与解码器对应层进行融合,保留了丰富的空间信息。# 定义两次卷积操作 class ConvBlock(torch.nn.Module): def __init__(self,in_channels,out_channels): super().__init__() self.step = torch.nn.Sequential( # 第一次卷积 torch.nn.Conv2d(in_channels=in_channels,out_channels=out_channels,kernel_size=3,padding=1,stride=1), # ReLU torch.nn.ReLU(), # 第二次卷积 torch.nn.Conv2d(in_channels=out_channels,out_channels=out_channels,kernel_size=3,padding=1,stride=1), # ReLU torch.nn.ReLU() ) def forward(self,x): return self.step(x) class UNet(torch.nn.Module): def __init__(self): super().__init__() # 定义左侧编码器的操作 self.layer1 = ConvBlock(1,64) self.layer2 = ConvBlock(64,128) self.layer3 = ConvBlock(128,256) self.layer4 = ConvBlock(256,512) # 定义右侧解码器的操作 self.layer5 = ConvBlock(256+512,256) self.layer6 = ConvBlock(128+256,128) self.layer7 = ConvBlock(64+128,64) #最后一个卷积 self.layer8 = torch.nn.Conv2d(in_channels=64,out_channels=1,kernel_size=1,padding=0,stride=1) # 定一些其他操作 # 池化 self.maxpool = torch.nn.MaxPool2d(kernel_size=2) #上采样 self.upsample = torch.nn.Upsample(scale_factor=2,mode='bilinear') # sigmoid self.sigmoid = torch.nn.Sigmoid() def forward(self,x): # 对输入数据进行处理 # 定义下采样部分 # input:1X256x256, output: 64x256x256 x1 = self.layer1(x) # input:64x256x256, output: 64 x 128 x 128 x1_p = self.maxpool(x1) # input: 64 x 128 x 128 , output: 128 x 128 x 128 x2 = self.layer2(x1_p) # input:128 x 128 x 128 , output: 128 x 64 x 64 x2_p = self.maxpool(x2) # input: 128 x 64 x 64, output: 256 x 64 x 64 x3 = self.layer3(x2_p) #input:256 x 64 x 64, output: 256 x 32 x 32 x3_p = self.maxpool(x3) #input: 256 x 32 x 32, output: 512 x 32 x 32 x4 = self.layer4(x3_p) # 定义上采样 # input: 512 x 32 x 32,output: 512 x 64 x 64 x5 = self.upsample(x4) # 拼接,output: 768x 64 x 64 x5 = torch.cat([x5,x3],dim=1) # input: 768x 64 x 64,output: 256 x 64 x 64 x5 = self.layer5(x5) # input: 256 x 64 x 64,output: 256 x 128 x 128 x6 = self.upsample(x5) # 拼接,output: 384 x 128 x 128 x6 = torch.cat([x6,x2],dim=1) # input: 384 x 128 x 128, output: 128 x 128 x 128 x6 = self.layer6(x6) # input:128 x 128 x 128, output: 128 x 256 x 256 x7 = self.upsample(x6) # 拼接, output: 192 x 256 x256 x7 = torch.cat([x7,x1],dim=1) # input: 192 x 256 x256, output: 64 x 256 x 256 x7 = self.layer7(x7) # 最后一次卷积,input: 64 x 256 x 256, output: 1 x 256 x 256 x8 = self.layer8(x7) #sigmoid # x9= self.sigmoid(x8) return x8网络定义完成后我们可以安装torchsummary库将搭建好的模型可视化。!pip install torchsummary# 模型架构可视化 from torchsummary import summary # device device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") model = UNet().to(device) summary(model,(1, 256, 256)) [W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. 
You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator()) ---------------------------------------------------------------- Layer (type) Output Shape Param # ================================================================ Conv2d-1 [-1, 64, 256, 256] 640 ReLU-2 [-1, 64, 256, 256] 0 Conv2d-3 [-1, 64, 256, 256] 36,928 ReLU-4 [-1, 64, 256, 256] 0 ConvBlock-5 [-1, 64, 256, 256] 0 MaxPool2d-6 [-1, 64, 128, 128] 0 Conv2d-7 [-1, 128, 128, 128] 73,856 ReLU-8 [-1, 128, 128, 128] 0 Conv2d-9 [-1, 128, 128, 128] 147,584 ReLU-10 [-1, 128, 128, 128] 0 ConvBlock-11 [-1, 128, 128, 128] 0 MaxPool2d-12 [-1, 128, 64, 64] 0 Conv2d-13 [-1, 256, 64, 64] 295,168 ReLU-14 [-1, 256, 64, 64] 0 Conv2d-15 [-1, 256, 64, 64] 590,080 ReLU-16 [-1, 256, 64, 64] 0 ConvBlock-17 [-1, 256, 64, 64] 0 MaxPool2d-18 [-1, 256, 32, 32] 0 Conv2d-19 [-1, 512, 32, 32] 1,180,160 ReLU-20 [-1, 512, 32, 32] 0 Conv2d-21 [-1, 512, 32, 32] 2,359,808 ReLU-22 [-1, 512, 32, 32] 0 ConvBlock-23 [-1, 512, 32, 32] 0 Upsample-24 [-1, 512, 64, 64] 0 Conv2d-25 [-1, 256, 64, 64] 1,769,728 ReLU-26 [-1, 256, 64, 64] 0 Conv2d-27 [-1, 256, 64, 64] 590,080 ReLU-28 [-1, 256, 64, 64] 0 ConvBlock-29 [-1, 256, 64, 64] 0 Upsample-30 [-1, 256, 128, 128] 0 Conv2d-31 [-1, 128, 128, 128] 442,496 ReLU-32 [-1, 128, 128, 128] 0 Conv2d-33 [-1, 128, 128, 128] 147,584 ReLU-34 [-1, 128, 128, 128] 0 ConvBlock-35 [-1, 128, 128, 128] 0 Upsample-36 [-1, 128, 256, 256] 0 Conv2d-37 [-1, 64, 256, 256] 110,656 ReLU-38 [-1, 64, 256, 256] 0 Conv2d-39 [-1, 64, 256, 256] 36,928 ReLU-40 [-1, 64, 256, 256] 0 ConvBlock-41 [-1, 64, 256, 256] 0 Conv2d-42 [-1, 1, 256, 256] 65 ================================================================ Total params: 7,781,761 Trainable params: 7,781,761 Non-trainable params: 0 ---------------------------------------------------------------- Input size (MB): 0.25 Forward/backward pass size (MB): 706.50 Params size (MB): 29.69 Estimated Total Size (MB): 736.44 ----------------------------------------------------------------random_input = torch.randn(1, 1, 256, 256).to(device) output = model(random_input) output.shapetorch.Size([1, 1, 256, 256]) 最后定义损失函数和优化器,编写模型的训练代码,在昇腾NPU上训练50轮,使用Adam优化器和BCEWithLogitsLoss损失函数,并通过ReduceLROnPlateau调度器动态调整模型的学习率,每轮训练结束后保存模型的最优权重。import time from torch.optim.lr_scheduler import ReduceLROnPlateau # 定义损失 loss_fn = torch.nn.BCEWithLogitsLoss() # 定义优化器 optimizer = torch.optim.Adam(model.parameters(), lr=1e-4) # 动态减少LR scheduler = ReduceLROnPlateau(optimizer, 'min') # 计算测试集的loss def check_test_loss(loader,model): loss = 0 # 不记录梯度 with torch.no_grad(): for i, (x, y) in enumerate(loader): # 图片 x = x.to(device,dtype=torch.float32) # 标签 y = y.to(device,dtype=torch.float32) # 预测值 y_pred = model(x) #计算损失 loss_batch = loss_fn(y_pred, y) loss += loss_batch return loss / len(loader) !pip install tensorboard progressbar# 使用tensorboard记录参数 from torch.utils.tensorboard import SummaryWriter # 使用progressbar打印进度 from progressbar import ProgressBar, Percentage, Bar, Timer, ETA, FileTransferSpeed # 记录变量 writer = SummaryWriter(log_dir='./log') # 打印进度 widgets = ['Progress: ', Percentage(), ' ', Bar('#'), ' ', Timer(), ' ', ETA(), ' ', FileTransferSpeed()] # 训练100个epoch EPOCH_NUM = 50 # 记录最好的测试acc best_test_loss = 10 # 保存训练结果 train_loss_results = [] test_loss_results = [] for epoch in range(EPOCH_NUM): # 获取批次图像 print('Epoch:{}'.format(epoch+1)) start_time = time.time() loss = 0 progress = ProgressBar(widgets=widgets) for (x, y) in progress(train_loader): # !!!每次update前清空梯度 model.zero_grad() # 
获取数据 # 图片 x = x.to(device,dtype=torch.float32) # 标签 y = y.to(device,dtype=torch.float32) # 预测值 y_pred = model(x) #计算损失 loss_batch = loss_fn(y_pred, y) # 计算梯度 loss_batch.backward() optimizer.step() optimizer.zero_grad() # 记录每个batch的train loss loss_batch = loss_batch.detach().cpu() loss += loss_batch # 每个epoch的loss loss = loss / len(train_loader) # 如果降低LR:如果loss连续10个epoch不再下降,就减少LR scheduler.step(loss) # 计算测试集的loss test_loss = check_test_loss(test_loader, model) # tensorboard 记录 Loss/train writer.add_scalar('Loss/train', loss, epoch) # tensorboard 记录 Loss/test writer.add_scalar('Loss/test', test_loss, epoch) #保存信息 train_loss_results.append(loss.item()) test_loss_results.append(test_loss.item()) # 保存最新模型 torch.save(model.state_dict(), 'unet_latest.pt') # 记录最好的测试loss,并保存模型 if best_test_loss > test_loss: print('test loss improved from {:.4f} to {:.4f}'.format(best_test_loss, test_loss.item())) best_test_loss = test_loss # 保存模型 torch.save(model.state_dict(), 'unet_best.pt') Epoch:1 Progress: 0% | | Elapsed Time: 0:00:00 ETA: --:--:-- 0.00 B/s . Progress: 100% |##############| Elapsed Time: 0:00:29 Time: 0:00:29 7.17 B/s test loss improved from 50.0000 to 0.1825 Epoch:2 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.90 B/s test loss improved from 0.1825 to 0.1356 Epoch:3 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.88 B/s test loss improved from 0.1356 to 0.1190 Epoch:4 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.88 B/s test loss improved from 0.1190 to 0.1071 Epoch:5 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.99 B/s test loss improved from 0.1071 to 0.0826 Epoch:6 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.98 B/s test loss improved from 0.0826 to 0.0783 Epoch:7 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s Epoch:8 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.01 B/s test loss improved from 0.0783 to 0.0564 Epoch:9 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.01 B/s Epoch:10 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.02 B/s Epoch:11 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.96 B/s Epoch:12 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s Epoch:13 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.99 B/s test loss improved from 0.0564 to 0.0546 Epoch:14 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.99 B/s Epoch:15 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.89 B/s Epoch:16 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.99 B/s Epoch:17 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.01 B/s Epoch:18 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.01 B/s test loss improved from 0.0546 to 0.0502 Epoch:19 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.02 B/s Epoch:20 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s Epoch:21 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.98 B/s Epoch:22 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s test loss improved from 0.0502 to 0.0434 Epoch:23 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s Epoch:24 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.98 B/s Epoch:25 Progress: 100% |##############| 
Elapsed Time: 0:00:26 Time: 0:00:26 7.99 B/s Epoch:26 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.94 B/s Epoch:27 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.84 B/s Epoch:28 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.93 B/s Epoch:29 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.89 B/s Epoch:30 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.96 B/s Epoch:31 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.94 B/s Epoch:32 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.92 B/s Epoch:33 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.88 B/s Epoch:34 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.93 B/s Epoch:35 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.97 B/s Epoch:36 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.92 B/s Epoch:37 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.91 B/s Epoch:38 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.93 B/s Epoch:39 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.91 B/s Epoch:40 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.89 B/s Epoch:41 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.87 B/s Epoch:42 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.96 B/s Epoch:43 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.97 B/s Epoch:44 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.97 B/s test loss improved from 0.0434 to 0.0378 Epoch:45 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.96 B/s Epoch:46 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.91 B/s Epoch:47 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s Epoch:48 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.99 B/s Epoch:49 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 7.98 B/s Epoch:50 Progress: 100% |##############| Elapsed Time: 0:00:26 Time: 0:00:26 8.00 B/s 在模型的训练过程中,我们可以在另一个终端中查看NPU的利用率:训练结束后我们使用matplotlib绘制模型的损失曲线,观察在训练过程中loss的收敛情况。plt.plot(range(1, len(train_loss_results)+1), train_loss_results, label='train_loss') plt.plot(range(1, len(test_loss_results)+1), test_loss_results, label='test_loss') plt.title('Model Loss') plt.legend() 这里我们加载模型在训练过程中的最优权重在测试集上评估模型的训练效果:model.load_state_dict(torch.load('unet_best.pt')) model.eval() UNet( (layer1): ConvBlock( (step): Sequential( (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer2): ConvBlock( (step): Sequential( (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer3): ConvBlock( (step): Sequential( (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer4): ConvBlock( (step): Sequential( (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer5): ConvBlock( (step): Sequential( (0): Conv2d(768, 256, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer6): ConvBlock( (step): Sequential( (0): Conv2d(384, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer7): ConvBlock( (step): Sequential( (0): Conv2d(192, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() ) ) (layer8): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1)) (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (upsample): Upsample(scale_factor=2.0, mode='bilinear') (sigmoid): Sigmoid() ) class SegmentDataset(Dataset): def __init__(self,where='train',seq=None): # 获取数据 self.img_list =natsorted(glob.glob('processed/{}/*/img_*'.format(where))) self.mask_list =natsorted( glob.glob('processed/{}/*/img_*') ) # 数据增强pipeline self.seq = seq def __len__(self): # 返回数据大小 return len(self.img_list) def __getitem__(self, idx): # 获取具体每一个数据 # 获取图片 img_file = self.img_list[idx] mask_file = img_file.replace('img','label') img = np.load(img_file) # 获取mask mask = np.load(mask_file) # 如果需要数据增强 if self.seq: segmap = SegmentationMapsOnImage(mask, shape=mask.shape) img,mask = seq(image=img, segmentation_maps=segmap) # 直接获取数组内容 mask = mask.get_arr() # 灰度图扩张维度成张量 return np.expand_dims(img,0) , np.expand_dims(mask,0) !pip install natsortfrom natsort import natsorted # 使用dataloader加载 batch_size = 12 num_workers = 8 test_dataset = SegmentDataset('test',None) test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False) !pip install celluloidfrom tqdm import tqdm from celluloid import Camera from IPython.display import Image # 将每层画面制作成视频 fig = plt.figure(figsize=(10, 10)) camera = Camera(fig) # 遍历所有数据 index = 0 for x, y in tqdm(test_dataset): # 输出输入 input = torch.tensor([x]).to(device,dtype=torch.float32) # 推理 y_pred = model(input) # 获取mask mask_data = (y_pred.detach().cpu().numpy()[0][0] > 0.5) plt.subplot(1, 2, 1) plt.imshow(x[0], cmap='bone') mask_ = np.ma.masked_where(y[0] == 0, y[0]) plt.imshow(mask_, alpha=0.8, cmap="spring") plt.title('truth') plt.axis('off') plt.subplot(1, 2, 2) plt.imshow(x[0], cmap='bone') mask_ = np.ma.masked_where(mask_data == 0, mask_data) plt.imshow(mask_, alpha=0.8, cmap="spring") plt.title('prediction') plt.axis('off') camera.snap() index +=1 if index > 500: break animation = camera.animate() 0%| | 0/177 [00:00<?, ?it/s]/home/service/.local/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:151: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /pytorch/torch/csrc/utils/tensor_new.cpp:261.) return fn(*args, **kwargs) 100%|██████████| 177/177 [00:06<00:00, 29.07it/s] # convert the animation to a video animation.save('animation.gif', writer='imagemagick') Image(open('animation.gif','rb').read()) 可以看到,模型准确地分割出肺炎的病变区域,为医疗诊断提供决策支持。本案例充分体现了华为云昇腾AI平台在医学图像处理领域的强大计算能力和易用性,为医疗AI应用的开发提供了完整的实践范例。
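补充说明:UNet 的 forward 中 sigmoid 被注释掉了(配合 BCEWithLogitsLoss 使用),因此模型输出的是 logits;推理时直接用 0.5 对 logits 取阈值并不严谨,更稳妥的做法是先做 sigmoid 再取阈值(或直接以 0 为阈值)。此外,除了 loss,分割任务常用 Dice 系数来量化效果。下面是一个在此基础上的简化示意(假设 model、test_loader、device 与前文一致),仅供参考:

```python
import torch

@torch.no_grad()
def evaluate_dice(model, loader, device, threshold=0.5, eps=1e-6):
    """在测试集上计算平均 Dice 系数(示意实现)。"""
    model.eval()
    dice_sum, batch_num = 0.0, 0
    for x, y in loader:
        x = x.to(device, dtype=torch.float32)
        y = y.to(device, dtype=torch.float32)
        logits = model(x)
        # 模型输出为 logits,先 sigmoid 再按阈值二值化
        pred = (torch.sigmoid(logits) > threshold).float()
        inter = (pred * y).sum(dim=(1, 2, 3))
        union = pred.sum(dim=(1, 2, 3)) + y.sum(dim=(1, 2, 3))
        dice = (2 * inter + eps) / (union + eps)
        dice_sum += dice.mean().item()
        batch_num += 1
    return dice_sum / max(batch_num, 1)

# 用法示例:
# print("test dice:", evaluate_dice(model, test_loader, device))
```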
-
破解目标检测中的“隐形杀手”:样本失衡难题攻克之道在目标检测领域,样本失衡是困扰工程师和研究者的“隐形杀手”。它普遍存在于现实场景中:背景区域远多于目标区域、常见类别目标(如行人、车辆)的样本量远超稀有类别(如交通锥、特定动物)。这种失衡会导致模型严重偏向多数类,对稀有目标的检测性能急剧下降,直接影响模型的实用价值。那么,如何有效攻克这一难题?以下是一些关键策略:数据层面:开源与增流重采样: 核心思路是调整训练数据的分布。过采样: 复制稀有目标样本或应用高级技巧(如SMOTE及其变种)在特征空间合成新样本。需警惕单纯复制导致的过拟合。欠采样: 随机或有策略地减少多数类样本(如背景或大类别)。优点是加速训练,但可能损失重要信息。通常结合其他方法使用。数据增强: 对少数类样本应用更积极、多样化的增强(旋转、缩放、裁剪、色彩变换、MixUp、CutMix等),显著增加其“曝光度”和多样性,提高模型鲁棒性。算法层面:加权与重构损失函数改造:Focal Loss: 这是针对样本失衡的里程碑式解决方案。它通过降低易分类样本(通常是多数类)的权重,让模型更关注难分类样本(通常是少数类或困难正样本),有效缓解失衡问题。代价敏感学习: 在损失函数中为不同类别的错误预测分配不同的惩罚权重。通常对误检稀有目标的惩罚远大于误检背景,迫使模型重视小类。模型结构调整: 针对高度失衡问题(如背景占绝对主导),可考虑两阶段策略(先粗筛疑似目标区域,再精细分类)或引入在线困难样本挖掘机制,让模型在训练中自动聚焦于那些分类困难的样本(常包含小目标)。评估与调优:洞察全局选用合适的评估指标: 单纯追求高mAP可能掩盖问题。需特别关注小类、稀有类的精确率、召回率以及F1-Score。mAP@0.5:0.95、P-R曲线能更全面反映模型在类别不平衡下的性能。消融实验: 严格测试不同策略(单独或组合)的效果,记录它们对各类别性能的影响,找到最优解。实用技巧:知己知彼: 开始前,务必可视化数据集,精确掌握每个类别的样本数量和分布特点。组合拳出击: 单一方法往往效果有限。结合使用数据增强(针对小类)、Focal Loss(或加权损失) 是目前最常见有效的组合。持续监控: 在验证集和测试集上,持续跟踪各个类别的单独表现,特别是那些你关心的关键少数类。拥抱新技术: Few-Shot Learning、自监督预训练、域适应等技术在缓解小样本问题方面展现出潜力,值得持续关注。小结:攻克目标检测样本失衡,本质上是对数据进行深度理解和智慧干预。需要在数据的源头(采样、增强)、模型学习的核心(损失函数)以及最终的评估环节(指标选择)进行系统性优化。没有“银弹”,但通过科学组合上述策略,并辅以细致的分析调优,我们完全能够显著提升模型在真实、复杂场景下的均衡检测能力,让“隐形杀手”无所遁形。
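以 Focal Loss 为例,它在交叉熵的基础上引入调制因子 (1 - p_t)^γ 来降低易分类样本的权重,使模型更关注难样本。下面给出一个二分类(前景/背景)场景下基于 PyTorch 的最小示意实现(α、γ 取常用经验值,并非唯一选择,仅用于说明思路):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal Loss 最小示意:在 BCE 基础上加入 (1 - p_t)^gamma 调制因子。"""
    # 逐元素的二元交叉熵(不做归约),即 -log(p_t)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t:模型对真实类别的预测概率
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t:正负样本的类别权重
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    loss = alpha_t * (1 - p_t) ** gamma * bce
    return loss.mean()

# 用法示例(随机数据,仅演示接口):
# logits = torch.randn(8, 1)
# targets = torch.randint(0, 2, (8, 1)).float()
# print(binary_focal_loss(logits, targets))
```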
-
视觉多模态模型切分检测和边缘推理一、NanoOWL + SAHINanoOWL(边缘实时开放词汇目标检测模型)基于Vision Transformer架构,结合CLIP的图文对齐能力,可通过文本查询在图像中检测任意类别目标。我们可以结合SAHI框架使用“文本提示+图像切分”在Jetson Orin等嵌入式设备上实现低空目标的多模态检测和TensorRT推理。二、代码实现首先我们拉取官方代码仓库https://github.com/dusty-nv/jetson-containers并运行安装命令install.sh:git clone https://github.com/dusty-nv/jetson-containers bash jetson-containers/install.sh之后使用jetson-containers run和autotag命令自动提取并构建兼容的容器:jetson-containers run --workdir /opt/nanoowl $(autotag nanoowl) 我们在终端中查看容器的ID并将容器中nanoowl拷贝到自定义目录下/home/vsuav/workspace/vit:sudo docker ps -asudo docker cp af063d738879:/opt/nanoowl /home/vsuav/workspace/vit将/home/vsuav/workspace/vit/nanoowl目录及其内部所有文件和子目录的所有者和所属组都设置为当前登录用户,从而确保我们可以正常访问和修改这些文件。sudo chown -R $(whoami):$(whoami) /home/vsuav/workspace/vit/nanoowl运行jetson-containers run命令并指定--workdir参数,将nanoowl目录挂载到容器中,设置容器的名称为NanoOWL:jetson-containers run -v /home/vsuav/workspace/vit/nanoowl:/opt/nanoowl --name NanoOWL --workdir /opt/nanoowl $(autotag nanoowl) 运行docker start NanoOWL启动容器并进入容器的终端:sudo docker start NanoOWL sudo docker exec -it NanoOWL bash 我们在容器内部使用pip3安装python库sahi并创建main.pypip3 install sahi -i https://pypi.tuna.tsinghua.edu.cn/simple我们在examples/owl_predict.py基础上添加SAHI切分检测的逻辑,将无人机拍摄的高清大图切分为640x640和1280x1280的子图并叠加原图和类别名称的编码信息送入模型进行预测,最后把推理结果映射到原图上使用GreedyNMM进行合并后处理,完整代码如下:import os import cv2 import PIL.Image import numpy as np from sahi.slicing import get_slice_bboxes from nanoowl.owl_predictor import OwlPredictor from sahi.postprocess.utils import ObjectPrediction from sahi.postprocess.combine import GreedyNMMPostprocess class OWL_SAHI: def __init__(self, model_path, label_list, OBJ_THRESH, NMS_THRESH, overlap_ratio, slice, slice_scales): self.model_path = model_path if label_list != []: self.label_list = label_list else: self.label_list = [""] self.OBJ_THRESH = OBJ_THRESH self.NMS_THRESH = NMS_THRESH self.overlap_ratio = overlap_ratio self.slice = slice self.slice_scales = slice_scales self.predictor = OwlPredictor( "google/owlvit-base-patch32", image_encoder_engine = self.model_path ) self.text_encodings = self.predictor.encode_text(self.label_list) self.postprocess = GreedyNMMPostprocess( match_threshold = self.NMS_THRESH, match_metric = "IOS", class_agnostic = False, ) def getImageSlices(self, image): img_height, img_width = image.shape[:2] slice_bboxes = [] if self.slice: for slice_scale in self.slice_scales: slice_bboxe = get_slice_bboxes( image_height = img_height, image_width = img_width, auto_slice_resolution = True, slice_height = slice_scale[1], slice_width = slice_scale[0], overlap_height_ratio = self.overlap_ratio, overlap_width_ratio = self.overlap_ratio, ) slice_bboxes.extend(slice_bboxe) slice_bboxes.append([0, 0, img_width, img_height]) else: slice_bboxes = [[0, 0, img_width, img_height]] img_batch = [] for bbox in slice_bboxes: l, t, r, b = bbox img_batch.append(image[t:b, l:r]) return img_batch, slice_bboxes def predict(self, image_path): image = cv2.imread(image_path) image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) img_batch, slice_bboxes = self.getImageSlices(image) all_boxes = [] for img, slice_bbox in zip(img_batch, slice_bboxes): img_pil = PIL.Image.fromarray(img) output = self.predictor.predict( image = img_pil, text = self.label_list, text_encodings = self.text_encodings, threshold = self.OBJ_THRESH, pad_square = True ) boxes = output.boxes.cpu().numpy() if boxes.shape[0] > 0: boxes[:, 0] = boxes[:, 0] + slice_bbox[0] boxes[:, 1] = boxes[:, 1] + slice_bbox[1] boxes[:, 2] = boxes[:, 2] + slice_bbox[0] boxes[:, 3] = boxes[:, 3] + slice_bbox[1] boxes = 
boxes.astype(np.int32).tolist() for i in range(len(boxes)): obj_item = ObjectPrediction( bbox = boxes[i], score = float(output.scores[i]), category_id = int(output.labels[i]) ) all_boxes.append(obj_item) if len(all_boxes) > 0: all_boxes = self.postprocess(all_boxes) return all_boxes if __name__ == "__main__": model_path = "./data/owl_image_encoder_patch32.engine" label_list = ["car", "tower"] OBJ_THRESH = 0.1 NMS_THRESH = 0.5 overlap_ratio = 0.25 slice = True slice_scales = [[640, 640], [1280, 1280]] owl_sahi = OWL_SAHI(model_path, label_list, OBJ_THRESH, NMS_THRESH, overlap_ratio, slice, slice_scales) images = os.listdir("images") for image_file in images: print(image_file) image_path = os.path.join("images", image_file) all_boxes_processed = owl_sahi.predict(image_path) image = cv2.imread(image_path) for box in all_boxes_processed: xmin, ymin, xmax, ymax = box.bbox.to_xyxy() score = box.score.value clsse = box.category.id cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 4) cv2.putText(image, '{0} {1:.2f}'.format(label_list[clsse], score), (xmin, ymin - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 4, cv2.LINE_AA) cv2.imwrite(f"output/{image_file}", image) 三、小结本文介绍了基于NanoOWL和SAHI框架的视觉多模态模型切分检测方案,通过结合文本提示与图像切分技术,在Jetson Orin等边缘设备上实现开放词汇目标检测。该方案利用Vision Transformer架构和CLIP图文对齐能力,支持任意类别目标检测,并通过SAHI进行图像切片处理与后处理合并,提升检测精度与效率,适用于低空无人机目标检测等边缘推理场景。
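为便于理解上面"子图坐标映射回原图 + 合并后处理"这一步,下面给出一个与具体框架无关的简化示意:先把子图坐标系下的检测框平移回原图,再按得分做贪心抑制。注意这里用最常见的 NMS 式抑制来近似说明合并思路,SAHI 的 GreedyNMM 实际会对匹配到的框做合并而非直接丢弃,二者并不完全等价:

```python
import numpy as np

def shift_boxes(boxes, slice_bbox):
    """把子图坐标系下的 [x1, y1, x2, y2] 平移回原图坐标系。"""
    boxes = np.asarray(boxes, dtype=np.float32)
    boxes[:, [0, 2]] += slice_bbox[0]
    boxes[:, [1, 3]] += slice_bbox[1]
    return boxes

def greedy_suppress(boxes, scores, iou_thr=0.5):
    """按得分从高到低贪心保留候选框,抑制与其 IoU 过高的框(近似合并后处理的思路)。"""
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        xx1 = np.maximum(boxes[i, 0], rest[:, 0])
        yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2])
        yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter + 1e-6)
        order = order[1:][iou < iou_thr]
    return keep
```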
-
【朝推夜训】松材线虫病高清图片切分检测我们以 Jetson Orin Nano 为例,介绍如何使用Python在资源受限的嵌入式设备上实现高清大图切分检测。一、模型导出1. 安装 Cmake,创建 24G swap 空间模型导出时依赖更高版本的Cmake,这里我们直接编译安装:sudo apt update sudo apt install libssl-dev git clone -b v3.25.1 https://github.com/Kitware/CMake.git cd CMake ./bootstrap && make && sudo make install cmake --version交换空间是操作系统用来拓展可用内存的一种机制,可以在内存不足的情况下继续运行,避免程序崩溃或者系统卡死,但是交换空间的访问速度远低于物理内存!禁用Jetson设备上的ZRAM交换配置:ZRAM会将内存页面压缩并存储在内存中,以减少对磁盘的依赖。sudo systemctl disable nvzramconfig使用fallocate创建一个24GB大小的文件,位于/var/24GB.swap路径。sudo fallocate -l 24G /var/24GB.swap设置交换空间格式sudo mkswap /var/24GB.swap启用交换空间sudo swapon /var/24GB.swap永久自启交换空间echo "/var/24GB.swap none swap sw 0 0" | sudo tee -a /etc/fstab重启系统后,系统交换空间增加至24GB:sudo reboot 2. 安装 ultralytics,创建 TensorRT 软连接pip install ultralytics pip install tqdm pandas pip install onnx==1.12.0 onnxslim==0.1.65 protobuf==3.20.1 pip install onnx-simplifier==0.3.10 pip install /home/vsuav/Downloads/onnxruntime_gpu-1.12.1-cp38-cp38-linux_aarch64.whl导出TensorRT模型时依赖其Python安装包,一般在系统Python目录下,以Jetson Orin Nano为例,我们可以建立软连接指向TensorRT安装路径:sudo ln -s /usr/lib/python3.8/dist-packages/tensorrt* /home/vsuav/miniconda3/envs/py38/lib/python3.8/site-packages3. 导出 TensorRT FP16 精度的模型from ultralytics import YOLO model = YOLO("yolov8-1_640x640_amd64_fp32.pt") model.export( format="engine", workspace=4, imgsz=640, half=True, device=0, batch=1 ) 二、切分检测1. 安装 SAHI 库pip install sahi==0.11.18这里我们使用0.11.18版本,修改/home/vsuav/miniconda3/envs/py38/lib/python3.8/site-packages/sahi/models/yolov8.py文件,注释掉第33行代码使其能够加载导出的Engine模型。class Yolov8DetectionModel(DetectionModel): def check_dependencies(self) -> None: check_requirements(["ultralytics"]) def load_model(self): """ Detection model is initialized and set to self.model. """ from ultralytics import YOLO try: model = YOLO(self.model_path) # model.to(self.device) self.set_model(model) except Exception as e: raise TypeError("model_path is not a valid yolov8 model path: ", e) 2. 运行检测代码加载模型import cv2 import numpy as np import matplotlib.pyplot as plt from sahi import AutoDetectionModel from sahi.predict import get_sliced_prediction detection_model = AutoDetectionModel.from_pretrained( model_type='yolov8', model_path="yolov8-1_640x640_amd64_fp16.engine", confidence_threshold=0.45 ) WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'. Loading yolov8-1_640x640_amd64_fp16.engine for TensorRT inference... [09/04/2025-11:27:15] [TRT] [I] Loaded engine size: 51 MiB [09/04/2025-11:27:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +616, GPU +757, now: CPU 1052, GPU 5537 (MiB) [09/04/2025-11:27:17] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +49, now: CPU 0, GPU 49 (MiB) [09/04/2025-11:27:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +29, now: CPU 1001, GPU 5519 (MiB) [09/04/2025-11:27:18] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +27, now: CPU 0, GPU 76 (MiB) 图片切分检测image_path = "28c56c0b-3ff7-4997-88b3-5a8330f7ea88.jpeg" result = get_sliced_prediction( image_path, detection_model, slice_height = 640, slice_width = 640, overlap_height_ratio = 0.2, overlap_width_ratio = 0.2, perform_standard_pred = True, postprocess_class_agnostic = True, postprocess_match_threshold = 0.55, ) Performing prediction on 6 slices. Loading yolov8-1_640x640_amd64_fp16.engine for TensorRT inference... 
[09/04/2025-11:27:20] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value. [09/04/2025-11:27:20] [TRT] [I] Loaded engine size: 51 MiB [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +32, now: CPU 1581, GPU 6577 (MiB) [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +50, now: CPU 0, GPU 126 (MiB) [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +6, now: CPU 1530, GPU 6537 (MiB) [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +27, now: CPU 1, GPU 153 (MiB) 导出检测结果result.export_visuals(export_dir="output/", file_name="sliced_result") result_img_split = cv2.imread("output/sliced_result.png") plt.imshow(result_img_split[:, :, ::-1]) plt.axis('off') plt.show() 三、小结该方案成功在Jetson Orin Nano设备上运行,能够有效处理高清大图的松材线虫病检测任务,在保证检测精度的同时充分利用了嵌入式设备的硬件资源,为林业病虫害防治提供了实用的技术方案。
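除了用 export_visuals 直接出图,也可以遍历切分检测的结果对象,取出每个目标的坐标、类别和置信度做进一步统计或落盘。下面是一个简化示意(假设 result.object_prediction_list 中的每个元素与前文的 ObjectPrediction 一致,提供 bbox.to_xyxy()、score.value、category 等属性):

```python
import json

def export_detections(result, save_path="detections.json"):
    """遍历 SAHI 的切分检测结果,导出为简单的 JSON 列表(示意)。"""
    detections = []
    for pred in result.object_prediction_list:
        x1, y1, x2, y2 = [int(v) for v in pred.bbox.to_xyxy()]
        detections.append({
            "bbox_xyxy": [x1, y1, x2, y2],
            "score": float(pred.score.value),
            "category_id": int(pred.category.id),
            "category_name": pred.category.name,
        })
    with open(save_path, "w", encoding="utf-8") as f:
        json.dump(detections, f, ensure_ascii=False, indent=2)
    return detections

# 用法示例:
# dets = export_detections(result)
# print("检测到目标数量:", len(dets))
```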