-
Deploying a CNN-LSTM Driving-Behavior Recognition Model on RK3588

A CNN (convolutional neural network) is good at extracting spatial features from images, while an LSTM (long short-term memory network) is good at modeling the temporal structure of sequences. We first use a CNN to extract a feature vector from every video frame, then feed the resulting feature sequence into an LSTM to capture the spatio-temporal information and classify it, recognizing five driving behaviors: normal driving, eyes closed, yawning, phone use, and looking around.

1. Model Training

We train the model in a ModelArts Notebook with the flavor GPU: 1*Pnt1(16GB) | CPU: 8 cores, 64 GB, using the image tensorflow_2.1.0-cuda_10.1-py_3.7-ubuntu_18.04. First, download the dataset:

```python
import os
import moxing as mox

if not os.path.exists('fatigue_driving'):
    mox.file.copy_parallel('obs://modelbox-course/fatigue_driving', 'fatigue_driving')
if not os.path.exists('rknn_toolkit2-2.3.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl'):
    mox.file.copy_parallel('obs://modelbox-course/rknn_toolkit2-2.3.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl',
                           'rknn_toolkit2-2.3.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl')
```

The dataset contains 1,525 video clips covering 5 classes: 0: normal driving, 1: eyes closed, 2: yawning, 3: phone use, 4: looking around.

We crop the driver-side half of each frame out of the original video and resize it to the input size of the feature-extraction network:

```python
def crop_driving_square(frame):
    h, w = frame.shape[:2]
    start_x = w // 2
    end_x = w
    start_y = 0
    end_y = h
    return frame[start_y:end_y, start_x:end_x]
```

Using MobileNetV2 pre-trained on ImageNet as the convolutional base, we create and save the image feature extractor:

```python
def get_feature_extractor():
    feature_extractor = keras.applications.mobilenet_v2.MobileNetV2(
        weights='imagenet',
        include_top=False,
        pooling='avg',
        input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    preprocess_input = keras.applications.mobilenet_v2.preprocess_input

    inputs = keras.Input((IMG_SIZE, IMG_SIZE, 3))
    preprocessed = preprocess_input(inputs)
    outputs = feature_extractor(preprocessed)
    model = keras.Model(inputs, outputs, name='feature_extractor')
    return model

feature_extractor = get_feature_extractor()
feature_extractor.save('feature_extractor')
feature_extractor.summary()
```

```
Model: "feature_extractor"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 256, 256, 3)]     0
_________________________________________________________________
tf_op_layer_truediv (TensorF [(None, 256, 256, 3)]     0
_________________________________________________________________
tf_op_layer_sub (TensorFlowO [(None, 256, 256, 3)]     0
_________________________________________________________________
mobilenetv2_1.00_224 (Model) (None, 1280)              2257984
=================================================================
Total params: 2,257,984
Trainable params: 2,223,872
Non-trainable params: 34,112
```

The network input size is 256x256, and we sample one frame out of every 6 to extract image features; each feature vector has 1280 dimensions. This yields one feature sequence per video with a maximum length of 40, zero-padded when a video has fewer sampled frames:

```python
def load_video(file_name):
    cap = cv2.VideoCapture(file_name)
    frame_interval = 6
    frames = []
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if count % frame_interval == 0:
            frame = crop_driving_square(frame)
            frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE))
            frame = frame[:, :, [2, 1, 0]]
            frames.append(frame)
        count += 1
    return np.array(frames)

def load_data(videos, labels):
    video_features = []
    for video in tqdm(videos):
        frames = load_video(video)
        counts = len(frames)
        # if the number of frames is less than MAX_SEQUENCE_LENGTH
        if counts < MAX_SEQUENCE_LENGTH:
            # pad the sequence
            diff = MAX_SEQUENCE_LENGTH - counts
            # create an all-zero numpy array
            padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3))
            # concatenate it with the sampled frames
            frames = np.concatenate((frames, padding))
        # keep the first MAX_SEQUENCE_LENGTH frames
        frames = frames[:MAX_SEQUENCE_LENGTH, :]
        # extract image features in one batch
        video_feature = feature_extractor.predict(frames)
        video_features.append(video_feature)
    return np.array(video_features), np.array(labels)

video_features, classes = load_data(videos, labels)
video_features.shape, classes.shape
```

```
((1525, 40, 1280), (1525,))
```

In total, feature sequences were extracted for all 1,525 videos.
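As a quick sanity check that the saved extractor really produces 1280-dimensional features, a single dummy frame can be pushed through it (an illustrative sketch, not part of the original notebook):

```python
import numpy as np

# Illustrative check: one dummy 256x256 RGB frame (IMG_SIZE = 256) through the extractor
dummy_frame = np.zeros((1, 256, 256, 3), dtype=np.float32)
feature = feature_extractor.predict(dummy_frame)
print(feature.shape)  # expected: (1, 1280), one 1280-d vector per frame
```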
The dataset is split into training and test sets at a ratio of 8:2, with the batch size set to 16:

```python
batch_size = 16
dataset = tf.data.Dataset.from_tensor_slices((video_features, classes))
dataset = dataset.shuffle(len(videos))

test_count = int(len(videos) * 0.2)
train_count = len(videos) - test_count

dataset_train = dataset.skip(test_count).cache().repeat()
dataset_test = dataset.take(test_count).cache().repeat()

train_dataset = dataset_train.shuffle(train_count).batch(batch_size)
test_dataset = dataset_test.shuffle(test_count).batch(batch_size)

train_dataset, train_count, test_dataset, test_count
```

```
(<BatchDataset shapes: ((None, 40, 1280), (None,)), types: (tf.float32, tf.int64)>,
 1220,
 <BatchDataset shapes: ((None, 40, 1280), (None,)), types: (tf.float32, tf.int64)>,
 305)
```

Next, an LSTM extracts the temporal information from each video feature sequence and passes it to a Dense classifier. The model is defined as follows:

```python
def video_cls_model(class_vocab):
    # number of classes
    classes_num = len(class_vocab)
    # define the model
    model = keras.Sequential([
        layers.Input(shape=(MAX_SEQUENCE_LENGTH, NUM_FEATURES)),
        layers.LSTM(64, return_sequences=True),
        layers.Flatten(),
        layers.Dense(classes_num, activation='softmax')
    ])
    # compile the model
    model.compile(optimizer=keras.optimizers.Adam(1e-5),
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])
    return model

# instantiate the model
model = video_cls_model(np.unique(labels))
# save the best checkpoint
checkpoint = keras.callbacks.ModelCheckpoint(filepath='best.h5', monitor='val_loss',
                                             save_weights_only=True, save_best_only=True,
                                             verbose=1, mode='min')
# model structure
model.summary()
```

The network input has shape (N, 40, 1280); the softmax activation outputs the probabilities of the 5 classes:

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 40, 64)            344320
_________________________________________________________________
flatten (Flatten)            (None, 2560)              0
_________________________________________________________________
dense (Dense)                (None, 5)                 12805
=================================================================
Total params: 357,125
Trainable params: 357,125
Non-trainable params: 0
_________________________________________________________________
```

Experiments show that the model has largely converged after 300 epochs of training:

```python
history = model.fit(train_dataset,
                    epochs=300,
                    steps_per_epoch=train_count // batch_size,
                    validation_steps=test_count // batch_size,
                    validation_data=test_dataset,
                    callbacks=[checkpoint])

plt.plot(history.epoch, history.history['loss'], 'r', label='loss')
plt.plot(history.epoch, history.history['val_loss'], 'g--', label='val_loss')
plt.title('LSTM')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.plot(history.epoch, history.history['accuracy'], 'r', label='acc')
plt.plot(history.epoch, history.history['val_accuracy'], 'g--', label='val_acc')
plt.title('LSTM')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
```

Load the best weights and evaluate the model (classification accuracy reaches 95.8%), then save it in saved_model format:

```python
model.load_weights('best.h5')
model.evaluate(dataset.batch(batch_size))
model.save('saved_model')
```

```
96/96 [==============================] - 0s 5ms/step - loss: 0.2169 - accuracy: 0.9580
[0.21687692414949802, 0.9580328]
```

2. Model Conversion

First convert the image feature extractor feature_extractor to TFLite format with post-training quantization enabled:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('feature_extractor')
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
converter.post_training_quantize = True  # model quantization
tflite_model = converter.convert()
with open('mbv2.tflite', 'wb') as f:
    f.write(tflite_model)
```

Then convert the video-sequence classification model to ONNX format; since the LSTM has few parameters, no quantization is needed:

```bash
python -m tf2onnx.convert --saved-model saved_model --output lstm.onnx --opset 12
```
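Before the RKNN export, the exported lstm.onnx can optionally be verified on the PC with onnxruntime (a minimal sketch, assuming onnxruntime is installed; the input name is read from the session rather than hard-coded):

```python
import numpy as np
import onnxruntime as ort

# Illustrative check: feed one zero-valued feature sequence through lstm.onnx
sess = ort.InferenceSession("lstm.onnx")
input_name = sess.get_inputs()[0].name   # e.g. 'input_3' in this export
dummy_seq = np.zeros((1, 40, 1280), dtype=np.float32)
probs = sess.run(None, {input_name: dummy_seq})[0]
print(probs.shape)  # expected: (1, 5) class probabilities
```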
Finally, export both models to RKNN format; target_platform can be set to rk3568 or rk3588 as needed:

```python
from rknn.api import RKNN

rknn = RKNN(verbose=False)
rknn.config(target_platform="rk3588")
rknn.load_tflite(model="mbv2.tflite")
rknn.build(do_quantization=False)
rknn.export_rknn('mbv2.rknn')
rknn.release()

rknn = RKNN(verbose=False)
rknn.config(target_platform="rk3588")
rknn.load_onnx(
    model="lstm.onnx",
    inputs=['input_3'],              # input node name
    input_size_list=[[1, 40, 1280]]  # fixed input shape
)
rknn.build(do_quantization=False)
rknn.export_rknn('lstm.rknn')
rknn.release()
```

3. Model Deployment

We deploy the MobileNetV2 and LSTM models on the RK3588; the on-board inference code is as follows:

```python
import os
import cv2
import glob
import shutil
import imageio
import numpy as np
from IPython.display import Image, display
from rknnlite.api import RKNNLite

MAX_SEQUENCE_LENGTH = 40
IMG_SIZE = 256
NUM_FEATURES = 1280

def crop_driving_square(img):
    h, w = img.shape[:2]
    start_x = w // 2
    end_x = w
    start_y = 0
    end_y = h
    result = img[start_y:end_y, start_x:end_x]
    return result

def load_video(file_name):
    cap = cv2.VideoCapture(file_name)
    # sample one frame every frame_interval frames
    frame_interval = 6
    frames = []
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # keep every frame_interval-th frame
        if count % frame_interval == 0:
            # crop the driver-side half
            frame = crop_driving_square(frame)
            # resize
            frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE))
            # BGR -> RGB  [0,1,2] -> [2,1,0]
            frame = frame[:, :, [2, 1, 0]]
            frames.append(frame)
        count += 1
    cap.release()
    return np.array(frames).astype(np.uint8)

# extract the feature sequence of a video
def getVideoFeat(frames):
    frames_count = len(frames)
    # if the number of frames is less than MAX_SEQUENCE_LENGTH
    if frames_count < MAX_SEQUENCE_LENGTH:
        # pad the sequence
        diff = MAX_SEQUENCE_LENGTH - frames_count
        # create an all-zero numpy array
        padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3))
        # concatenate it with the sampled frames
        frames = np.concatenate((frames, padding))
    # keep the first MAX_SEQUENCE_LENGTH frames
    frames = frames[:MAX_SEQUENCE_LENGTH, :]
    frames = frames.astype(np.float32)
    # extract the feature of every frame
    feats = []
    for frame in frames:
        frame = np.expand_dims(frame, axis=0)
        result = rknn_lite_mbv2.inference(inputs=[frame])
        feats.append(result[0])
    return feats

rknn_lite_mbv2 = RKNNLite()
rknn_lite_lstm = RKNNLite()
rknn_lite_mbv2.load_rknn('model/mbv2.rknn')
rknn_lite_lstm.load_rknn('model/lstm.rknn')
rknn_lite_mbv2.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
rknn_lite_lstm.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)

files = glob.glob("video/*.mp4")
for video_path in files:
    label_to_name = {0: '正常驾驶', 1: '闭眼', 2: '打哈欠', 3: '打电话', 4: '左顾右盼'}
    frames = load_video(video_path)
    frames = frames[:MAX_SEQUENCE_LENGTH]
    imageio.mimsave('test.gif', frames, durations=10, loop=0)
    display(Image(open('test.gif', 'rb').read()))
    feats = getVideoFeat(frames)
    feats = np.concatenate(feats, axis=0)
    feats = np.expand_dims(feats, axis=0)
    preds = rknn_lite_lstm.inference(inputs=[feats])[0][0]
    for i in np.argsort(preds)[::-1][:5]:
        print('{}: {}%'.format(label_to_name[i], round(preds[i] * 100, 2)))

rknn_lite_mbv2.release()
rknn_lite_lstm.release()
```

The final video recognition results are as follows:

🚀 4. Summary

This article walked through the full pipeline of a CNN-LSTM driving-behavior recognition model on the RK3588 platform. MobileNetV2 extracts spatial image features and an LSTM models the temporal features of the video, enabling accurate recognition of five driving behaviors: normal driving, eyes closed, yawning, phone use, and looking around. The model trained on ModelArts reaches 95.8% classification accuracy, and mbv2.tflite and lstm.onnx are each converted to RKNN format for efficient on-device inference.
-
In CUDA programming, a CUDA kernel is executed by a large number of threads, and these threads are organized into one or more blocks. Within a block, thread IDs are numbered consecutively starting from 0 and can be obtained through the built-in variable threadIdx:

```cpp
// Global index of this thread: blockIdx is the block index, blockDim is the block size,
// and threadIdx is the thread index within the current block
int tid = blockIdx.x * blockDim.x + threadIdx.x;
```

Take image normalization as an example: the three channel values of every pixel must each be divided by 255. Instead of serial computation on the CPU, we can launch a CUDA kernel with many threads and blocks to fully exploit the GPU's parallel processing power:

```cpp
// Total number of threads needed (height x width): 640*640 = 409600
int jobs = dst_height * dst_width;
// 256 threads per block
int threads = 256;
// number of blocks (rounded up)
int blocks = ceil(jobs / (float)threads);
// launch the kernel
preprocess_kernel<<<blocks, threads>>>(
    img_buffer_device, dst, dst_width, dst_height, jobs); // kernel arguments
```

Here each block contains 256 threads and the number of blocks is ceil(jobs / (float)threads), so the total number of threads is greater than or equal to the number of pixels in the image. When the kernel is launched, every thread on the GPU runs the same code, which gives efficient parallel computation. The kernel is implemented as follows:

```cpp
// one thread processes one pixel
__global__ void preprocess_kernel(
    uint8_t *src, float *dst, int dst_width,
    int dst_height, int edge)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid >= edge) return;

    int dx = tid % dst_width; // x coordinate of this thread's pixel in the target image
    int dy = tid / dst_width; // y coordinate of this thread's pixel in the target image

    // normalization (normalize the 3 channels of the pixel at (x, y) in the source image)
    float c0 = src[dy * dst_width * 3 + dx * 3 + 0] / 255.0f;
    float c1 = src[dy * dst_width * 3 + dx * 3 + 1] / 255.0f;
    float c2 = src[dy * dst_width * 3 + dx * 3 + 2] / 255.0f;

    // bgr to rgb
    float t = c2;
    c2 = c0;
    c0 = t;

    // rgbrgbrgb to rrrgggbbb
    // NHWC to NCHW
    int area = dst_width * dst_height;
    float *pdst_c0 = dst + dy * dst_width + dx;
    float *pdst_c1 = pdst_c0 + area;
    float *pdst_c2 = pdst_c1 + area;
    *pdst_c0 = c0;
    *pdst_c1 = c1;
    *pdst_c2 = c2;
}
```

Here tid is the index of the current thread, dst_width and dst_height are the image width and height, edge is the number of pixels, and each thread handles one pixel. Because thread indices start from 0, we must make sure tid does not exceed the pixel count edge:

```cpp
int tid = blockDim.x * blockIdx.x + threadIdx.x;
if (tid >= edge) return;
```

The image data is stored contiguously in memory in row-major order, with each pixel represented by 3 bytes (BGR). To find the starting position in memory of the pixel a thread handles, we first compute the x and y coordinates of that pixel, i.e. dx and dy:

```cpp
int dx = tid % dst_width; // x coordinate of this thread's pixel in the target image
int dy = tid / dst_width; // y coordinate of this thread's pixel in the target image
```

The starting offset of that pixel in memory is then dy * dst_width * 3 + dx * 3; the factor of 3 comes from the 3 channel values per pixel, laid out as BGRBGRBGR... Finally, dividing by 255 normalizes the 3 channel values of the pixel at (x, y) in the source image:

```cpp
// normalization
float c0 = src[dy * dst_width * 3 + dx * 3 + 0] / 255.0f;
float c1 = src[dy * dst_width * 3 + dx * 3 + 1] / 255.0f;
float c2 = src[dy * dst_width * 3 + dx * 3 + 2] / 255.0f;
```

- dy * dst_width * 3: offset of the start of row dy
- dx * 3: offset of pixel dx within the current row
- + 0, + 1, + 2: the B, G, and R channel values, each divided by 255

Swapping variables converts the channel order from BGR to RGB:

```cpp
// bgr to rgb
float t = c2;
c2 = c0;
c0 = t;
```

In the target (RGB) image the pixel values are laid out in memory as RRR...GGG...BBB. The address of the current pixel's R value in the target image is (dst + dy * dst_width + dx); its G value is at (dst + dy * dst_width + dx) + area, i.e. offset by one channel plane of size area, and so on, which completes the channel-layout conversion:

```cpp
// NHWC to NCHW
// rgbrgbrgb to rrrgggbbb
int area = dst_width * dst_height;
float *pdst_c0 = dst + dy * dst_width + dx;
float *pdst_c1 = pdst_c0 + area;
float *pdst_c2 = pdst_c1 + area;
*pdst_c0 = c0;
*pdst_c1 = c1;
*pdst_c2 = c2;
```
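For reference, the same preprocessing can be expressed in a few lines of NumPy and used to validate the kernel's output on the host side (a minimal sketch; the function and array names are illustrative, not part of the original project):

```python
import numpy as np

def preprocess_reference(bgr_hwc: np.ndarray) -> np.ndarray:
    """CPU reference of the CUDA kernel: divide by 255, BGR->RGB, HWC->CHW."""
    rgb = bgr_hwc[:, :, ::-1].astype(np.float32) / 255.0   # normalize + swap B/R channels
    return np.ascontiguousarray(rgb.transpose(2, 0, 1))    # HWC -> CHW (RRR...GGG...BBB)

# Example comparison against the flattened float buffer copied back from the GPU:
# gpu_out = ...  # float32 array of length 3 * H * W obtained via cudaMemcpy
# ref = preprocess_reference(frame).ravel()
# assert np.allclose(gpu_out, ref, atol=1e-5)
```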
-
Deploying Qwen2.5-VL-7B on the Ascend 310P for Smoking-Action Recognition

The OrangePi AI Studio Pro is a new-generation high-performance inference card built around two Ascend 310P processors. It provides general-purpose compute plus strong AI compute and integrates the complete low-level software stack for both training and inference. Its FP16 half-precision AI compute is roughly 176 TFLOPS, and its INT8 compute reaches 352 TOPS. This article walks through deploying the Qwen2.5-VL-7B multimodal understanding model on the Ascend 310P to recognize smoking actions.

1. Environment Setup

We deploy MindIE on the OrangePi AI Studio in a Docker container:

```bash
docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-300I-Duo-py311-openeuler24.03-lts
```

```
root@orangepi:~# docker images
REPOSITORY                                          TAG                                         IMAGE ID       CREATED         SIZE
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie   2.1.RC1-300I-Duo-py311-openeuler24.03-lts   0574b8d4403f   3 months ago    20.4GB
langgenius/dify-web                                 1.0.1                                       b2b7363571c2   8 months ago    475MB
langgenius/dify-api                                 1.0.1                                       3dd892f50a2d   8 months ago    2.14GB
langgenius/dify-plugin-daemon                       0.0.4-local                                 3f180f39bfbe   8 months ago    1.35GB
ubuntu/squid                                        latest                                      dae40da440fe   8 months ago    243MB
postgres                                            15-alpine                                   afbf3abf6aeb   8 months ago    273MB
nginx                                               latest                                      b52e0b094bc0   9 months ago    192MB
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie   1.0.0-300I-Duo-py311-openeuler24.03-lts     74a5b9615370   10 months ago   17.5GB
redis                                               6-alpine                                    6dd588768b9b   10 months ago   30.2MB
langgenius/dify-sandbox                             0.2.10                                      4328059557e8   13 months ago   567MB
semitechnologies/weaviate                           1.19.0                                      8ec9f084ab23   2 years ago     52.5MB
```

Next, create a launch script named start-docker.sh with the following content:

```bash
NAME=$1
if [ $# -ne 1 ]; then
    echo "warning: need input container name.Use default: mindie"
    NAME=mindie
fi
docker run --name ${NAME} -it -d --net=host --shm-size=500g \
    --privileged=true \
    -w /usr/local/Ascend/atb-models \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    --entrypoint=bash \
    -v /models:/models \
    -v /data:/data \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/sbin:/usr/local/sbin \
    -v /home:/home \
    -v /tmp:/tmp \
    -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
    -e http_proxy=$http_proxy \
    -e https_proxy=$https_proxy \
    -e "PATH=/usr/local/python3.11.6/bin:$PATH" \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-300I-Duo-py311-openeuler24.03-lts
```

After starting the container with bash start-docker.sh, we need to replace a few files and install the Ascend-cann-nnal package:

```
root@orangepi:~# docker exec -it mindie bash
Welcome to 5.15.0-126-generic

System information as of time:  Sat Nov 15 22:06:48 CST 2025

System load:    1.87
Memory used:    6.3%
Swap used:      0.0%
Usage On:       33%
Users online:   0

[root@orangepi atb-models]# cd /usr/local/Ascend/ascend-toolkit/8.2.RC1/lib64/
[root@orangepi lib64]# ls /data/fix_openeuler_docker/fixhccl/8.2hccl/
libhccl.so  libhccl_alg.so  libhccl_heterog.so  libhccl_plf.so
[root@orangepi lib64]# cp /data/fix_openeuler_docker/fixhccl/8.2hccl/* ./
cp: overwrite './libhccl.so'?
cp: overwrite './libhccl_alg.so'?
cp: overwrite './libhccl_heterog.so'?
cp: overwrite './libhccl_plf.so'?
[root@orangepi lib64]# source /usr/local/Ascend/ascend-toolkit/set_env.sh
[root@orangepi lib64]# chmod +x /data/fix_openeuler_docker/Ascend-cann-nnal/Ascend-cann-nnal_8.3.RC1_linux-x86_64.run
[root@orangepi lib64]# /data/fix_openeuler_docker/Ascend-cann-nnal/Ascend-cann-nnal_8.3.RC1_linux-x86_64.run --install --quiet
[NNAL] [20251115-22:41:45] [INFO] LogFile:/var/log/ascend_seclog/ascend_nnal_install.log
[NNAL] [20251115-22:41:45] [INFO] Ascend-cann-atb_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 start
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager.
```
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv [NNAL] [20251115-22:41:58] [INFO] Ascend-cann-atb_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 install success [NNAL] [20251115-22:41:58] [INFO] Ascend-cann-SIP_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 start [NNAL] [20251115-22:41:59] [INFO] Ascend-cann-SIP_8.3.RC1_linux-x86_64.run --install --install-path=/usr/local/Ascend/nnal --install-for-all --quiet --nox11 install success [NNAL] [20251115-22:41:59] [INFO] Ascend-cann-nnal_8.3.RC1_linux-x86_64.run install success Warning!!! If the environment variables of atb and asdsip are set at the same time, unexpected consequences will occur. Import the corresponding environment variables based on the usage scenarios: atb for large model scenarios, asdsip for embedded scenarios. Please make sure that the environment variables have been configured. If you want to use atb module: - To take effect for current user, you can exec command below: source /usr/local/Ascend/nnal/atb/set_env.sh or add "source /usr/local/Ascend/nnal/atb/set_env.sh" to ~/.bashrc. If you want to use asdsip module: - To take effect for current user, you can exec command below: source /usr/local/Ascend/nnal/asdsip/set_env.sh or add "source /usr/local/Ascend/nnal/asdsip/set_env.sh" to ~/.bashrc. [root@orangepi lib64]# cat /usr/local/Ascend/nnal/atb/latest/version.info Ascend-cann-atb : 8.3.RC1 Ascend-cann-atb Version : 8.3.RC1.B106 Platform : x86_64 branch : 8.3.rc1-0702 commit id : 16004f23040e0dcdd3cf0c64ecf36622487038ba修改推理使用的逻辑NPU核心为0,1,测试多模态理解大模型:Qwen2.5-VL-7B-Instruct:运行结果表明,Qwen2.5-VL-7B-Instruct在2 x Ascned 310P上推理平均每秒可以输出20个tokens,同时准确理解画面中的人物信息和行为动作。[root@orangepi atb-models]# bash examples/models/qwen2_vl/run_pa.sh --model_path /models/Qwen2.5-VL-7B-Instruct/ --input_image /root/pic/test.jpg [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] ***************************************** [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. [2025-11-15 22:12:49,663] torch.distributed.run: [WARNING] ***************************************** /usr/local/lib64/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( /usr/local/lib64/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? 
warn( 2025-11-15 22:12:53.250 7934 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llmmodels] [loglevel] [file:line] [status code] msg 2025-11-15 22:12:53.250 7933 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llmmodels] [loglevel] [file:line] [status code] msg [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] deepseekV2_DecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7934] [139886327420160] [llmmodels] [WARN] [model_factory.cpp:28] llama_LlamaDecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:53.250] [7933] [139649439929600] [llmmodels] [WARN] [model_factory.cpp:28] llama_LlamaDecoderModel model already exists, but the duplication doesn't matter. [2025-11-15 22:12:55,335] [7934] [139886327420160] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 1, device_id: 1, numa_id: 0, shard_devices: [0, 1], cpus: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-11-15 22:12:55,336] [7934] [139886327420160] [llmmodels] [INFO] [cpu_binding.py-280] : process 7934, new_affinity is [8, 9, 10, 11, 12, 13, 14, 15], cpu count 8 [2025-11-15 22:12:55,356] [7933] [139649439929600] [llmmodels] [INFO] [cpu_binding.py-254] : rank_id: 0, device_id: 0, numa_id: 0, shard_devices: [0, 1], cpus: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-11-15 22:12:55,357] [7933] [139649439929600] [llmmodels] [INFO] [cpu_binding.py-280] : process 7933, new_affinity is [0, 1, 2, 3, 4, 5, 6, 7], cpu count 8 [2025-11-15 22:12:56,032] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-156] : model_runner.quantize: None, model_runner.kv_quant_type: None, model_runner.fa_quant_type: None, model_runner.dtype: torch.float16 [2025-11-15 22:13:01,826] [7933] [139649439929600] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set [2025-11-15 22:13:01,827] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-187] : init tokenizer done Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [2025-11-15 22:13:02,070] [7934] [139886327420160] [llmmodels] [INFO] [dist.py-81] : initialize_distributed has been Set Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [W InferFormat.cpp:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. 
(function operator()) [W InferFormat.cpp:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. (function operator()) [2025-11-15 22:13:08,435] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-153] : >>>> qwen_QwenDecoderModel is called. [2025-11-15 22:13:08,526] [7934] [139886327420160] [llmmodels] [INFO] [flash_causal_qwen2.py-153] : >>>> qwen_QwenDecoderModel is called. [2025-11-15 22:13:16.666] [7933] [139649439929600] [llmmodels] [WARN] [operation_factory.cpp:42] OperationName: TransdataOperation not find in operation factory map [2025-11-15 22:13:16.698] [7934] [139886327420160] [llmmodels] [WARN] [operation_factory.cpp:42] OperationName: TransdataOperation not find in operation factory map [2025-11-15 22:13:22,379] [7933] [139649439929600] [llmmodels] [INFO] [model_runner.py-282] : model: FlashQwen2vlForCausalLM( (rotary_embedding): PositionRotaryEmbedding() (attn_mask): AttentionMask() (vision_tower): Qwen25VisionTransformerPretrainedModelATB( (encoder): Qwen25VLVisionEncoderATB( (layers): ModuleList( (0-31): 32 x Qwen25VLVisionLayerATB( (attn): VisionAttention( (qkv): TensorParallelColumnLinear( (linear): FastLinear() ) (proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (mlp): VisionMlp( (gate_up_proj): TensorParallelColumnLinear( (linear): FastLinear() ) (down_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (norm1): BaseRMSNorm() (norm2): BaseRMSNorm() ) ) (patch_embed): FastPatchEmbed( (proj): TensorReplicatedLinear( (linear): FastLinear() ) ) (patch_merger): PatchMerger( (patch_merger_mlp_0): TensorParallelColumnLinear( (linear): FastLinear() ) (patch_merger_mlp_2): TensorParallelRowLinear( (linear): FastLinear() ) (patch_merger_ln_q): BaseRMSNorm() ) ) (rotary_pos_emb): VisionRotaryEmbedding() ) (language_model): FlashQwen2UsingMROPEForCausalLM( (rotary_embedding): PositionRotaryEmbedding() (attn_mask): AttentionMask() (transformer): FlashQwenModel( (wte): TensorEmbeddingWithoutChecking() (h): ModuleList( (0-27): 28 x FlashQwenLayer( (attn): FlashQwenAttention( (rotary_emb): PositionRotaryEmbedding() (c_attn): TensorParallelColumnLinear( (linear): FastLinear() ) (c_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (mlp): QwenMLP( (act): SiLU() (w2_w1): TensorParallelColumnLinear( (linear): FastLinear() ) (c_proj): TensorParallelRowLinear( (linear): FastLinear() ) ) (ln_1): QwenRMSNorm() (ln_2): QwenRMSNorm() ) ) (ln_f): QwenRMSNorm() ) (lm_head): TensorParallelHead( (linear): FastLinear() ) ) ) [2025-11-15 22:13:24,268] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-134] : hbm_capacity(GB): 87.5078125, init_memory(GB): 11.376015624962747 [2025-11-15 22:13:24,789] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-342] : pa_runner: PARunner(model_path=/models/Qwen2.5-VL-7B-Instruct/, input_text=请用超过500个字详细说明图片的内容,并仔细判断画面中的人物是否有吸烟动作。, max_position_embeddings=None, max_input_length=16384, max_output_length=1024, max_prefill_tokens=-1, load_tokenizer=True, enable_atb_torch=False, max_prefill_batch_size=None, max_batch_size=1, dtype=torch.float16, block_size=128, model_config=ModelConfig(num_heads=14, num_kv_heads=2, num_kv_heads_origin=4, head_size=128, k_head_size=128, v_head_size=128, num_layers=28, device=npu:0, dtype=torch.float16, soc_info=NPUSocInfo(soc_name='', soc_version=200, need_nz=True, matmul_nd_nz=False), kv_quant_type=None, fa_quant_type=None, mapping=Mapping(world_size=2, rank=0, num_nodes=1,pp_rank=0, pp_groups=[[0], [1]], micro_batch_size=1, 
attn_dp_groups=[[0], [1]], attn_tp_groups=[[0, 1]], attn_inner_sp_groups=[[0], [1]], attn_cp_groups=[[0], [1]], attn_o_proj_tp_groups=[[0], [1]], mlp_tp_groups=[[0, 1]], moe_ep_groups=[[0], [1]], moe_tp_groups=[[0, 1]]), cla_share_factor=1, model_type=qwen2_5_vl, enable_nz=False), max_memory=93960798208, [2025-11-15 22:13:24,794] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-122] : ---------------Begin warm_up--------------- [2025-11-15 22:13:24,794] [7933] [139649439929600] [llmmodels] [INFO] [cache.py-154] : kv cache will allocate 0.46484375GB memory [2025-11-15 22:13:24,821] [7934] [139886327420160] [llmmodels] [INFO] [cache.py-154] : kv cache will allocate 0.46484375GB memory [2025-11-15 22:13:24,827] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1139] : ------total req num: 1, infer start-------- [2025-11-15 22:13:26,002] [7934] [139886327420160] [llmmodels] [INFO] [flash_causal_qwen2.py-680] : <<<<<<<after transdata k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,023] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-676] : <<<<<<< ori k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,023] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-680] : <<<<<<<after transdata k_caches[0].shape=torch.Size([136, 16, 128, 16]) [2025-11-15 22:13:26,024] [7933] [139649439929600] [llmmodels] [INFO] [flash_causal_qwen2.py-705] : >>>>>>id of kcache is 139645634198608 id of vcache is 139645634198320 [2025-11-15 22:13:34,363] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 9476.590633392334ms, Prefill average time: 9476.590633392334ms, Decode token time: 54.94809150695801ms, E2E time: 9531.538724899292ms [2025-11-15 22:13:34,363] [7934] [139886327420160] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 9452.020645141602ms, Prefill average time: 9452.020645141602ms, Decode token time: 54.654598236083984ms, E2E time: 9506.675243377686ms [2025-11-15 22:13:34,366] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1326] : -------------------performance dumped------------------------ [2025-11-15 22:13:34,371] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1329] : | batch_size | input_seq_len | output_seq_len | e2e_time(ms) | prefill_time(ms) | decoder_token_time(ms) | prefill_count | prefill_average_time(ms) | |-------------:|----------------:|-----------------:|---------------:|-------------------:|-------------------------:|----------------:|---------------------------:| | 1 | 16384 | 2 | 9531.54 | 9476.59 | 54.95 | 1 | 9476.59 | /usr/local/lib64/python3.11/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( [2025-11-15 22:13:35,307] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-148] : warmup_memory(GB): 15.75 [2025-11-15 22:13:35,307] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-153] : ---------------End warm_up--------------- /usr/local/lib64/python3.11/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( [2025-11-15 22:13:35,363] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1139] : ------total req num: 1, infer start-------- [2025-11-15 22:13:50,021] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 1004.0028095245361ms, Prefill average time: 1004.0028095245361ms, Decode token time: 13.301290491575836ms, E2E time: 14611.222982406616ms [2025-11-15 22:13:50,021] [7934] [139886327420160] [llmmodels] [INFO] [generate.py-1294] : Prefill time: 1067.9974555969238ms, Prefill average time: 1067.9974555969238ms, Decode token time: 13.300292536193908ms, E2E time: 14674.196720123291ms [2025-11-15 22:13:50,025] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1326] : -------------------performance dumped------------------------ [2025-11-15 22:13:50,028] [7933] [139649439929600] [llmmodels] [INFO] [generate.py-1329] : | batch_size | input_seq_len | output_seq_len | e2e_time(ms) | prefill_time(ms) | decoder_token_time(ms) | prefill_count | prefill_average_time(ms) | |-------------:|----------------:|-----------------:|---------------:|-------------------:|-------------------------:|----------------:|---------------------------:| | 1 | 1675 | 1024 | 14611.2 | 1004 | 13.3 | 1 | 1004 | [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-385] : Question[0]: [{'image': '/root/pic/test.jpg'}, {'text': '请用超过500个字详细说明图片的内容,并仔细判断画面中的人物是否有吸烟动作。'}] [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-386] : Answer[0]: 这张图片展示了一个无人机航拍的场景,画面中可以看到两名工人站在一个雪地或冰面上。他们穿着橙色的安全背心和红色的安全帽,显得非常醒目。背景中可以看到一些雪地和一些金属结构,可能是桥梁或工业设施的一部分。 从图片的细节来看,画面右侧的工人右手放在嘴边,似乎在吸烟。他的姿势和动作与吸烟者的典型姿势相符。然而,由于图片的分辨率和角度限制,无法完全确定这个动作是否真实发生。如果要准确判断,可能需要更多的视频片段或更清晰的图像。 从无人机航拍的角度来看,这个场景可能是在进行某种工业或建筑项目的检查或监控。两名工人可能正在进行现场检查或讨论工作事宜。雪地和金属结构表明这可能是一个寒冷的冬季,或者是一个寒冷的气候区域。 无人机航拍技术在工业和建筑领域中非常常见,因为它可以提供高空视角,帮助工程师和管理人员更好地了解现场情况。这种技术不仅可以节省时间和成本,还可以提高工作效率和安全性。在进行航拍时,确保遵守当地的法律法规和安全规定是非常重要的。 总的来说,这张图片展示了一个无人机航拍的场景,画面中两名工人站在雪地上,其中一人似乎在吸烟。虽然无法完全确定这个动作是否真实发生,但根据他们的姿势和动作,可以合理推测这个动作的存在。 [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-387] : Generate[0] token num: 282 [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-389] : Latency(s): 14.721353530883789 [2025-11-15 22:13:50,035] [7933] [139649439929600] [llmmodels] [INFO] [run_pa.py-390] : Throughput(tokens/s): 19.15584728050956 本文详细介绍了在OrangePi AI Studio上使用Docker容器部署MindIE环境并运行Qwen2.5-VL-7B-Instruct多模态大模型实现吸烟动作识别的完整过程,验证了在Ascned 310p设备上运行多模态理解大模型的可靠性。
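As a quick cross-check of the figures reported in the log above, the throughput is simply the number of generated tokens divided by the end-to-end latency:

```python
generated_tokens = 282           # "Generate[0] token num" from the log
latency_s = 14.721353530883789   # "Latency(s)" from the log
print(generated_tokens / latency_s)  # ~19.16 tokens/s, matching the reported throughput
```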
-
如何在Jetson上将YOLOv5实时检测速度提升至120+FPS这个项目提供了基于 Pybind11 的 TensorRT YOLOv5 插件 Python 绑定,实现了令人难以置信的实时目标检测性能!⚡ 超100FPS性能: 在 Jetson Orin Nano 上轻松实现超过 120 帧/秒的检测速度🎯 高精度检测: 基于成熟的 YOLOv5 架构,准确识别COCO数据集上的80类目标🔌 即插即用: 简单的 Python 接口,无需复杂的配置🛠️ 工业级优化: 采用 TensorRT 进行模型优化和加速1. Building the plugin首先安装必要的库克隆仓库构建项目,注意JetPack 5.x版本才能正常运行:sudo apt update sudo apt install ffmpeg sudo apt install pybind11-dev git clone https://github.com/HouYanSong/yolov5_trt_pybind11.git cd yolov5_trt_pybind11 pip install pybind11 rm -fr build cmake -S . -B build cmake --build build2. Model quantization生成量化图片对YOLOv5s模型进行Int8量化,保存量化后的模型:./media/gen_calib.sh ./build/build weights/yolov5s.onnx 1 ./media/ ./media/filelist.txt weights/yolov5s.engine[11/06/2025-11:57:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +221, GPU +0, now: CPU 249, GPU 4229 (MiB) [11/06/2025-11:57:39] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +277, now: CPU 574, GPU 4529 (MiB) [11/06/2025-11:57:39] [I] [TRT] ---------------------------------------------------------------- [11/06/2025-11:57:39] [I] [TRT] Input filename: weights/yolov5s.onnx [11/06/2025-11:57:39] [I] [TRT] ONNX IR version: 0.0.7 [11/06/2025-11:57:39] [I] [TRT] Opset version: 12 [11/06/2025-11:57:39] [I] [TRT] Producer name: [11/06/2025-11:57:39] [I] [TRT] Producer version: [11/06/2025-11:57:39] [I] [TRT] Domain: [11/06/2025-11:57:39] [I] [TRT] Model version: 0 [11/06/2025-11:57:39] [I] [TRT] Doc string: [11/06/2025-11:57:39] [I] [TRT] ---------------------------------------------------------------- [11/06/2025-11:57:39] [I] [TRT] No importer registered for op: YoloLayer_TRT. Attempting to import as plugin. [11/06/2025-11:57:39] [I] [TRT] Searching for plugin: YoloLayer_TRT, plugin_version: 1, plugin_namespace: [11/06/2025-11:57:39] [I] [TRT] Successfully created plugin: YoloLayer_TRT [11/06/2025-11:57:39] [I] sample0001.png [11/06/2025-11:57:39] [I] sample0002.png [11/06/2025-11:57:39] [I] sample0003.png [11/06/2025-11:57:39] [I] sample0004.png [11/06/2025-11:57:39] [I] sample0005.png [11/06/2025-11:57:39] [I] sample0006.png [11/06/2025-11:57:39] [I] sample0007.png [11/06/2025-11:57:39] [I] sample0008.png [11/06/2025-11:57:39] [I] sample0009.png [11/06/2025-11:57:39] [I] sample0010.png [11/06/2025-11:57:39] [I] sample0011.png [11/06/2025-11:57:39] [I] sample0012.png [11/06/2025-11:57:39] [I] sample0013.png [11/06/2025-11:57:39] [I] sample0014.png [11/06/2025-11:57:39] [I] sample0015.png [11/06/2025-11:57:39] [I] sample0016.png [11/06/2025-11:57:39] [I] sample0017.png [11/06/2025-11:57:39] [I] sample0018.png [11/06/2025-11:57:39] [I] sample0019.png [11/06/2025-11:57:39] [I] sample0020.png [11/06/2025-11:57:39] [I] sample0021.png [11/06/2025-11:57:39] [I] sample0022.png [11/06/2025-11:57:39] [I] sample0023.png [11/06/2025-11:57:39] [I] sample0024.png [11/06/2025-11:57:39] [I] sample0025.png [11/06/2025-11:57:39] [I] sample0026.png [11/06/2025-11:57:39] [I] sample0027.png [11/06/2025-11:57:39] [I] sample0028.png [11/06/2025-11:57:39] [I] sample0029.png [11/06/2025-11:57:39] [I] sample0030.png [11/06/2025-11:57:39] [I] sample0031.png [11/06/2025-11:57:39] [I] sample0032.png [11/06/2025-11:57:39] [I] sample0033.png [11/06/2025-11:57:39] [I] sample0034.png [11/06/2025-11:57:39] [I] sample0035.png [11/06/2025-11:57:39] [I] sample0036.png [11/06/2025-11:57:39] [I] sample0037.png [11/06/2025-11:57:39] [I] sample0038.png [11/06/2025-11:57:39] [I] sample0039.png [11/06/2025-11:57:39] [I] sample0040.png [11/06/2025-11:57:39] [I] sample0041.png [11/06/2025-11:57:39] [I] 
sample0042.png [11/06/2025-11:57:39] [I] sample0043.png [11/06/2025-11:57:39] [I] sample0044.png [11/06/2025-11:57:39] [I] sample0045.png [11/06/2025-11:57:39] [I] sample0046.png [11/06/2025-11:57:39] [I] sample0047.png [11/06/2025-11:57:39] [I] sample0048.png [11/06/2025-11:57:39] [I] sample0049.png [11/06/2025-11:57:39] [I] sample0050.png [11/06/2025-11:57:39] [I] sample0051.png [11/06/2025-11:57:39] [I] sample0052.png [11/06/2025-11:57:39] [I] sample0053.png [11/06/2025-11:57:39] [I] sample0054.png [11/06/2025-11:57:39] [I] sample0055.png [11/06/2025-11:57:39] [I] sample0056.png [11/06/2025-11:57:39] [I] sample0057.png [11/06/2025-11:57:39] [I] sample0058.png [11/06/2025-11:57:39] [I] sample0059.png [11/06/2025-11:57:39] [I] sample0060.png [11/06/2025-11:57:39] [I] sample0061.png [11/06/2025-11:57:39] [I] sample0062.png [11/06/2025-11:57:39] [I] sample0063.png [11/06/2025-11:57:39] [I] sample0064.png [11/06/2025-11:57:39] [I] sample0065.png [11/06/2025-11:57:39] [I] sample0066.png [11/06/2025-11:57:39] [I] sample0067.png [11/06/2025-11:57:39] [I] sample0068.png [11/06/2025-11:57:39] [I] sample0069.png [11/06/2025-11:57:39] [I] sample0070.png [11/06/2025-11:57:39] [I] sample0071.png [11/06/2025-11:57:39] [I] sample0072.png [11/06/2025-11:57:39] [I] sample0073.png [11/06/2025-11:57:39] [I] sample0074.png [11/06/2025-11:57:39] [I] sample0075.png [11/06/2025-11:57:39] [I] sample0076.png [11/06/2025-11:57:39] [I] sample0077.png [11/06/2025-11:57:39] [I] sample0078.png [11/06/2025-11:57:39] [I] sample0079.png [11/06/2025-11:57:39] [I] sample0080.png [11/06/2025-11:57:39] [I] sample0081.png [11/06/2025-11:57:39] [I] sample0082.png [11/06/2025-11:57:39] [I] sample0083.png [11/06/2025-11:57:39] [I] sample0084.png [11/06/2025-11:57:39] [I] sample0085.png [11/06/2025-11:57:39] [I] sample0086.png [11/06/2025-11:57:39] [I] sample0087.png [11/06/2025-11:57:39] [I] sample0088.png [11/06/2025-11:57:39] [I] sample0089.png [11/06/2025-11:57:39] [I] sample0090.png [11/06/2025-11:57:39] [I] sample0091.png [11/06/2025-11:57:39] [I] sample0092.png [11/06/2025-11:57:39] [I] sample0093.png [11/06/2025-11:57:39] [I] sample0094.png [11/06/2025-11:57:39] [I] sample0095.png [11/06/2025-11:57:39] [I] sample0096.png [11/06/2025-11:57:39] [I] sample0097.png [11/06/2025-11:57:39] [I] sample0098.png [11/06/2025-11:57:39] [I] sample0099.png [11/06/2025-11:57:39] [I] sample0100.png [11/06/2025-11:57:39] [I] sample0101.png [11/06/2025-11:57:39] [I] sample0102.png [11/06/2025-11:57:39] [I] sample0103.png [11/06/2025-11:57:39] [I] sample0104.png [11/06/2025-11:57:39] [I] sample0105.png [11/06/2025-11:57:39] [I] sample0106.png [11/06/2025-11:57:39] [I] sample0107.png [11/06/2025-11:57:39] [I] sample0108.png [11/06/2025-11:57:39] [I] sample0109.png [11/06/2025-11:57:39] [I] sample0110.png [11/06/2025-11:57:39] [I] sample0111.png [11/06/2025-11:57:39] [I] sample0112.png [11/06/2025-11:57:39] [I] sample0113.png [11/06/2025-11:57:39] [I] sample0114.png [11/06/2025-11:57:39] [I] sample0115.png [11/06/2025-11:57:39] [I] sample0116.png [11/06/2025-11:57:39] [I] sample0117.png [11/06/2025-11:57:39] [I] sample0118.png [11/06/2025-11:57:39] [I] sample0119.png [11/06/2025-11:57:39] [I] sample0120.png [11/06/2025-11:57:39] [I] sample0121.png [11/06/2025-11:57:39] [I] sample0122.png [11/06/2025-11:57:39] [I] sample0123.png [11/06/2025-11:57:39] [I] sample0124.png [11/06/2025-11:57:39] [I] sample0125.png [11/06/2025-11:57:39] [I] sample0126.png [11/06/2025-11:57:39] [I] sample0127.png [11/06/2025-11:57:39] [I] sample0128.png 
[11/06/2025-11:57:39] [I] sample0129.png [11/06/2025-11:57:39] [I] sample0130.png [11/06/2025-11:57:39] [I] sample0131.png [11/06/2025-11:57:39] [I] sample0132.png [11/06/2025-11:57:39] [I] sample0133.png [11/06/2025-11:57:39] [I] sample0134.png [11/06/2025-11:57:39] [I] sample0135.png [11/06/2025-11:57:39] [I] sample0136.png [11/06/2025-11:57:39] [I] sample0137.png [11/06/2025-11:57:39] [I] sample0138.png [11/06/2025-11:57:39] [I] sample0139.png [11/06/2025-11:57:39] [I] sample0140.png [11/06/2025-11:57:39] [I] sample0141.png [11/06/2025-11:57:39] [I] sample0142.png [11/06/2025-11:57:39] [I] sample0143.png [11/06/2025-11:57:39] [I] sample0144.png [11/06/2025-11:57:39] [I] sample0145.png CalibrationDataReader: 145 images, 145 batches. [11/06/2025-11:57:39] [I] [TRT] Reading Calibration Cache for calibrator: MinMaxCalibration [11/06/2025-11:57:39] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales. [11/06/2025-11:57:39] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache. [11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeNumDetection, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [11/06/2025-11:57:39] [W] [TRT] Missing scale and zero-point for tensor DecodeDetectionClasses, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on DLA ---------- [11/06/2025-11:57:39] [I] [TRT] ---------- Layers Running on GPU ---------- [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.0/conv/Conv + PWN(PWN(/model.0/act/Sigmoid), /model.0/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.1/conv/Conv + PWN(PWN(/model.1/act/Sigmoid), /model.1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv1/conv/Conv + PWN(PWN(/model.2/cv1/act/Sigmoid), /model.2/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv2/conv/Conv + PWN(PWN(/model.2/cv2/act/Sigmoid), /model.2/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv1/conv/Conv + PWN(PWN(/model.2/m/m.0/cv1/act/Sigmoid), /model.2/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.2/m/m.0/cv2/act/Sigmoid), /model.2/m/m.0/cv2/act/Mul), /model.2/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.2/cv3/conv/Conv + PWN(PWN(/model.2/cv3/act/Sigmoid), /model.2/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.3/conv/Conv + PWN(PWN(/model.3/act/Sigmoid), /model.3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv1/conv/Conv + PWN(PWN(/model.4/cv1/act/Sigmoid), /model.4/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv2/conv/Conv + PWN(PWN(/model.4/cv2/act/Sigmoid), /model.4/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv1/conv/Conv + PWN(PWN(/model.4/m/m.0/cv1/act/Sigmoid), /model.4/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.0/cv2/act/Sigmoid), /model.4/m/m.0/cv2/act/Mul), /model.4/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: 
/model.4/m/m.1/cv1/conv/Conv + PWN(PWN(/model.4/m/m.1/cv1/act/Sigmoid), /model.4/m/m.1/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/m/m.1/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.4/m/m.1/cv2/act/Sigmoid), /model.4/m/m.1/cv2/act/Mul), /model.4/m/m.1/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.4/cv3/conv/Conv + PWN(PWN(/model.4/cv3/act/Sigmoid), /model.4/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.5/conv/Conv + PWN(PWN(/model.5/act/Sigmoid), /model.5/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv1/conv/Conv + PWN(PWN(/model.6/cv1/act/Sigmoid), /model.6/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv2/conv/Conv + PWN(PWN(/model.6/cv2/act/Sigmoid), /model.6/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv1/conv/Conv + PWN(PWN(/model.6/m/m.0/cv1/act/Sigmoid), /model.6/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.0/cv2/act/Sigmoid), /model.6/m/m.0/cv2/act/Mul), /model.6/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv1/conv/Conv + PWN(PWN(/model.6/m/m.1/cv1/act/Sigmoid), /model.6/m/m.1/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.1/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.1/cv2/act/Sigmoid), /model.6/m/m.1/cv2/act/Mul), /model.6/m/m.1/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv1/conv/Conv + PWN(PWN(/model.6/m/m.2/cv1/act/Sigmoid), /model.6/m/m.2/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/m/m.2/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.6/m/m.2/cv2/act/Sigmoid), /model.6/m/m.2/cv2/act/Mul), /model.6/m/m.2/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.6/cv3/conv/Conv + PWN(PWN(/model.6/cv3/act/Sigmoid), /model.6/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.7/conv/Conv + PWN(PWN(/model.7/act/Sigmoid), /model.7/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv1/conv/Conv + PWN(PWN(/model.8/cv1/act/Sigmoid), /model.8/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv2/conv/Conv + PWN(PWN(/model.8/cv2/act/Sigmoid), /model.8/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv1/conv/Conv + PWN(PWN(/model.8/m/m.0/cv1/act/Sigmoid), /model.8/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/m/m.0/cv2/conv/Conv [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(PWN(/model.8/m/m.0/cv2/act/Sigmoid), /model.8/m/m.0/cv2/act/Mul), /model.8/m/m.0/Add) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.8/cv3/conv/Conv + PWN(PWN(/model.8/cv3/act/Sigmoid), /model.8/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv1/conv/Conv + PWN(PWN(/model.9/cv1/act/Sigmoid), /model.9/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m/MaxPool [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_1/MaxPool [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] POOLING: /model.9/m_2/MaxPool [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/cv1/act/Mul_output_0 copy 
[11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m/MaxPool_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.9/m_1/MaxPool_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.9/cv2/conv/Conv + PWN(PWN(/model.9/cv2/act/Sigmoid), /model.9/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.10/conv/Conv + PWN(PWN(/model.10/act/Sigmoid), /model.10/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.11/Resize [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.11/Resize_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv1/conv/Conv + PWN(PWN(/model.13/cv1/act/Sigmoid), /model.13/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv2/conv/Conv + PWN(PWN(/model.13/cv2/act/Sigmoid), /model.13/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv1/conv/Conv + PWN(PWN(/model.13/m/m.0/cv1/act/Sigmoid), /model.13/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/m/m.0/cv2/conv/Conv + PWN(PWN(/model.13/m/m.0/cv2/act/Sigmoid), /model.13/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.13/cv3/conv/Conv + PWN(PWN(/model.13/cv3/act/Sigmoid), /model.13/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.14/conv/Conv + PWN(PWN(/model.14/act/Sigmoid), /model.14/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] RESIZE: /model.15/Resize [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.15/Resize_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.4/cv3/act/Mul_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv1/conv/Conv + PWN(PWN(/model.17/cv1/act/Sigmoid), /model.17/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv2/conv/Conv + PWN(PWN(/model.17/cv2/act/Sigmoid), /model.17/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv1/conv/Conv + PWN(PWN(/model.17/m/m.0/cv1/act/Sigmoid), /model.17/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/m/m.0/cv2/conv/Conv + PWN(PWN(/model.17/m/m.0/cv2/act/Sigmoid), /model.17/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.17/cv3/conv/Conv + PWN(PWN(/model.17/cv3/act/Sigmoid), /model.17/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.18/conv/Conv + PWN(PWN(/model.18/act/Sigmoid), /model.18/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.0/Conv + PWN(/model.24/Sigmoid) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.14/act/Mul_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv1/conv/Conv + PWN(PWN(/model.20/cv1/act/Sigmoid), /model.20/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv2/conv/Conv + PWN(PWN(/model.20/cv2/act/Sigmoid), /model.20/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv1/conv/Conv + PWN(PWN(/model.20/m/m.0/cv1/act/Sigmoid), /model.20/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/m/m.0/cv2/conv/Conv + PWN(PWN(/model.20/m/m.0/cv2/act/Sigmoid), /model.20/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.20/cv3/conv/Conv + PWN(PWN(/model.20/cv3/act/Sigmoid), /model.20/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.21/conv/Conv + 
PWN(PWN(/model.21/act/Sigmoid), /model.21/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.1/Conv + PWN(/model.24/Sigmoid_1) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] COPY: /model.10/act/Mul_output_0 copy [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv1/conv/Conv + PWN(PWN(/model.23/cv1/act/Sigmoid), /model.23/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv2/conv/Conv + PWN(PWN(/model.23/cv2/act/Sigmoid), /model.23/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv1/conv/Conv + PWN(PWN(/model.23/m/m.0/cv1/act/Sigmoid), /model.23/m/m.0/cv1/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/m/m.0/cv2/conv/Conv + PWN(PWN(/model.23/m/m.0/cv2/act/Sigmoid), /model.23/m/m.0/cv2/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.23/cv3/conv/Conv + PWN(PWN(/model.23/cv3/act/Sigmoid), /model.23/cv3/act/Mul) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] CONVOLUTION: /model.24/m.2/Conv + PWN(/model.24/Sigmoid_2) [11/06/2025-11:57:39] [I] [TRT] [GpuLayer] PLUGIN_V2: YoloLayer [11/06/2025-11:57:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +689, now: CPU 1137, GPU 5200 (MiB) [11/06/2025-11:57:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +132, now: CPU 1220, GPU 5332 (MiB) [11/06/2025-11:57:41] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [11/06/2025-12:00:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes. [11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 1115794944 [11/06/2025-12:01:03] [I] [TRT] Detected 1 inputs and 4 output network tensors. [11/06/2025-12:01:03] [I] [TRT] Total Host Persistent Memory: 175984 [11/06/2025-12:01:03] [I] [TRT] Total Device Persistent Memory: 614912 [11/06/2025-12:01:03] [I] [TRT] Total Scratch Memory: 0 [11/06/2025-12:01:03] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 7 MiB, GPU 553 MiB [11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 67 steps to complete. [11/06/2025-12:01:03] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 2.77161ms to assign 6 blocks to 67 nodes requiring 10925056 bytes. [11/06/2025-12:01:03] [I] [TRT] Total Activation Memory: 10925056 [11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB) [11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1557, GPU 5945 (MiB) [11/06/2025-12:01:04] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +8, now: CPU 7, GPU 8 (MiB) Engine build success! 
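The calibrator reads the calibration images from ./media/ according to ./media/filelist.txt. If that list ever needs to be regenerated, a few lines of Python suffice, assuming the list simply contains one image file name per line (an assumption based on the names echoed in the build log above, not a documented format):

```python
import glob
import os

# Write one calibration image name per line, matching the names printed in the build log
samples = sorted(glob.glob("./media/sample*.png"))
with open("./media/filelist.txt", "w") as f:
    for path in samples:
        f.write(os.path.basename(path) + "\n")
print(f"{len(samples)} calibration images listed")
```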
Python call example以下是一个简单Python示例调用C++生成的动态链接库,仅需指定模型文件的路径和视频输入的大小,就能返回视频每一帧的检测结果,并且在视频推理过程中可以动态调整置信度和交并比等参数的阈值。import cv2 import time import ctypes ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL) ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL) from build import yolov5_trt def draw_detections(image, detections, fps): for detection in detections: class_id = detection['class_id'] x1, y1, x2, y2 = detection['bbox'] confidence = detection['confidence'] cv2.rectangle(image, (x1, y1), (x2, y2), (0x27, 0xC1, 0x36), 2) cv2.putText(image, f"{class_id}:{confidence:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_PLAIN, 1.2, (0x27, 0xC1, 0x36), 2) cv2.putText(image, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 2) return image def main(input_path, output_path): cap = cv2.VideoCapture(input_path) fps = int(cap.get(cv2.CAP_PROP_FPS)) width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) detector = yolov5_trt.YOLOv5Detector("./weights/yolov5s.engine", width, height) writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'MJPG'), fps, (width, height)) fps_list = [] frame_count = 0 total_time = 0.0 while cap.isOpened(): ret, frame = cap.read() if not ret: break start_time = time.time() detections = detector.detect(input_image=frame, input_w=640, input_h=640, conf_thresh=0.45, nms_thresh=0.55) process_time = time.time() - start_time current_fps = 1.0 / process_time if process_time > 0 else 0 frame_count += 1 total_time += process_time fps_list.append(current_fps) image = draw_detections(frame, detections, current_fps) writer.write(image) cap.release() writer.release() if frame_count > 0: avg_fps = frame_count / total_time if total_time > 0 else 0 print(f"Processed {frame_count} frames") print(f"Average FPS: {avg_fps:.2f}") print(f"Min FPS: {min(fps_list):.2f}") print(f"Max FPS: {max(fps_list):.2f}") if __name__ == "__main__": input_video = "./media/sample_720p.mp4" output_video = "./result.avi" main(input_video, output_video) 对应的C++推理代码如下:#include "NvInfer.h" #include "logger.h" #include "common.h" #include "buffers.h" #include "utils/preprocess.h" #include "utils/postprocess.h" #include "utils/types.h" #include "utils/utils.h" #include <pybind11/pybind11.h> #include <pybind11/numpy.h> #include <pybind11/stl.h> #include <memory> #include <mutex> namespace py = pybind11; // 将numpy数组转换为cv::Mat cv::Mat numpy_to_mat(py::array_t<unsigned char>& input) { py::buffer_info buf_info = input.request(); if (buf_info.ndim == 3) { // 彩色图像 int height = buf_info.shape[0]; int width = buf_info.shape[1]; int channels = buf_info.shape[2]; cv::Mat mat(height, width, CV_8UC3, (unsigned char*)buf_info.ptr); return mat.clone(); } else if (buf_info.ndim == 2) { // 灰度图像 int height = buf_info.shape[0]; int width = buf_info.shape[1]; cv::Mat mat(height, width, CV_8UC1, (unsigned char*)buf_info.ptr); return mat.clone(); } throw std::runtime_error("Unsupported array dimensions"); } // 将cv::Mat转换为numpy数组 py::array_t<unsigned char> mat_to_numpy(cv::Mat& mat) { if (mat.empty()) { return py::array_t<unsigned char>(); } if (mat.channels() == 1) { // 灰度图像 auto result = py::array_t<unsigned char>({mat.rows, mat.cols}); auto buf = result.request(); memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total()); return result; } else { // 彩色图像 auto result = py::array_t<unsigned char>({mat.rows, mat.cols, mat.channels()}); auto buf = result.request(); memcpy(buf.ptr, mat.data, sizeof(unsigned char) * mat.total() * mat.channels()); return result; 
} } // 加载模型文件 std::vector<unsigned char> load_engine_file(const std::string &file_name) { std::vector<unsigned char> engine_data; std::ifstream engine_file(file_name, std::ios::binary); assert(engine_file.is_open() && "Unable to load engine file."); engine_file.seekg(0, engine_file.end); int length = engine_file.tellg(); engine_data.resize(length); engine_file.seekg(0, engine_file.beg); engine_file.read(reinterpret_cast<char *>(engine_data.data()), length); return engine_data; } // YOLOv5推理器类 class YOLOv5Detector { private: std::unique_ptr<nvinfer1::IRuntime> runtime; std::shared_ptr<nvinfer1::ICudaEngine> engine; std::unique_ptr<nvinfer1::IExecutionContext> context; std::unique_ptr<samplesCommon::BufferManager> buffers; bool initialized = false; public: YOLOv5Detector(const std::string& engine_file, int frame_width, int frame_height) { initialize(engine_file); int img_size = frame_width * frame_height; cuda_preprocess_init(img_size); // 申请cuda内存 } void initialize(const std::string& engine_file) { // ========== 1. 创建推理运行时runtime ========== runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(sample::gLogger.getTRTLogger())); if (!runtime) { throw std::runtime_error("Failed to create TensorRT runtime"); } // ========== 2. 反序列化生成engine ========== auto plan = load_engine_file(engine_file); engine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(plan.data(), plan.size())); if (!engine) { throw std::runtime_error("Failed to deserialize engine"); } // ========== 3. 创建执行上下文context ========== context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext()); if (!context) { throw std::runtime_error("Failed to create execution context"); } // ========== 4. 创建输入输出缓冲区 ========== buffers = std::make_unique<samplesCommon::BufferManager>(engine); initialized = true; } py::list detect(py::array_t<unsigned char>& input_image, int input_w=kInputW, int input_h=kInputH, float conf_thresh=kConfThresh, float nms_thresh=kNmsThresh) { if (!initialized) { throw std::runtime_error("Detector not initialized"); } // 将numpy数组转换为cv::Mat cv::Mat frame = numpy_to_mat(input_image); if (frame.empty()) { throw std::runtime_error("Invalid input image"); } // CUDA预处理 process_input_gpu(frame, (float *)buffers->getDeviceBuffer(kInputTensorName), input_w, input_h); // ========== 5. 
执行推理 ========== context->executeV2(buffers->getDeviceBindings().data()); // 拷贝回host buffers->copyOutputToHost(); // 从buffer manager中获取模型输出 int32_t *num_det = (int32_t *)buffers->getHostBuffer(kOutNumDet); int32_t *cls = (int32_t *)buffers->getHostBuffer(kOutDetCls); float *conf = (float *)buffers->getHostBuffer(kOutDetScores); float *bbox = (float *)buffers->getHostBuffer(kOutDetBBoxes); // 执行nms(非极大值抑制) std::vector<Detection> bboxs; yolo_nms(bboxs, num_det, cls, conf, bbox, conf_thresh, nms_thresh); // 返回检测结果 py::list result_list; for (size_t j = 0; j < bboxs.size(); j++) { cv::Rect r = get_rect(frame, bboxs[j].bbox, input_w, input_h); py::dict detection; detection["class_id"] = (int)bboxs[j].class_id; detection["confidence"] = (float)bboxs[j].conf; detection["bbox"] = py::cast(std::vector<int>{r.x, r.y, r.x + r.width, r.y + r.height}); result_list.append(detection); } return result_list; } }; // Python绑定代码 PYBIND11_MODULE(yolov5_trt, m) { m.doc() = "YOLOv5 TensorRT Python bindings"; py::class_<YOLOv5Detector>(m, "YOLOv5Detector") .def(py::init<const std::string&, int, int>(), "Initialize detector with engine file", py::arg("engine_file"), py::arg("frame_width"), py::arg("frame_height")) .def("detect", &YOLOv5Detector::detect, "Perform detection on input image", py::arg("input_image"), py::arg("input_w") = kInputW, py::arg("input_h") = kInputH, py::arg("conf_thresh") = kConfThresh, py::arg("nms_thresh") = kNmsThresh); } 实际在Jetson Oron Nano (8GB)上对720P输入大小的视频进行目标检测,平均帧率稳定在120+ FPS,满足工业场景下对实时性的要求。python yolov5_infer.py[11/06/2025-15:23:26] [I] [TRT] Loaded engine size: 7 MiB Deserialize yoloLayer plugin: YoloLayer [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +955, now: CPU 830, GPU 4470 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +149, now: CPU 913, GPU 4619 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7, now: CPU 0, GPU 7 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 913, GPU 4620 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 913, GPU 4623 (MiB) [11/06/2025-15:23:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 18 (MiB) Processed 1442 frames Average FPS: 127.51 Min FPS: 75.75 Max FPS: 134.67 Conclusion Remarks最后我们还提供了ByteTrack跟踪算法的Python绑定,基于Pybind11实现,并在原有算法基础上提供了跟踪目标的类别信息,Jetson Orin Nano也能在此基础上也能实现高达83 FPS的实时目标检测和跟踪性能:ByteTrack-Pybind11: 高性能实时目标跟踪解决方案 🚀
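作为补充,下面给出一个基于上文 Python 绑定的拉流推理示意(仅为示例草稿:其中的 RTSP 地址、动态库与引擎路径均为假设,需按实际工程替换),展示如何把同一套 detect 接口用于实时视频流,并按类别 ID 过滤检测结果:

import cv2
import ctypes

# 与上文相同,先预加载插件与工具动态库(路径为示例)
ctypes.CDLL("./build/libyolo_plugin.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL("./build/libyolo_utils.so", mode=ctypes.RTLD_GLOBAL)
from build import yolov5_trt

def run_rtsp(rtsp_url, engine_path="./weights/yolov5s.engine", target_class_ids=None):
    cap = cv2.VideoCapture(rtsp_url)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    detector = yolov5_trt.YOLOv5Detector(engine_path, width, height)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        detections = detector.detect(input_image=frame, input_w=640, input_h=640,
                                     conf_thresh=0.45, nms_thresh=0.55)
        # 可选:只保留关心的类别(例如 COCO 中 0: person、2: car)
        if target_class_ids is not None:
            detections = [d for d in detections if d["class_id"] in target_class_ids]
        print(f"detected {len(detections)} objects")
    cap.release()

if __name__ == "__main__":
    run_rtsp("rtsp://192.168.1.10:554/stream", target_class_ids=[0, 2])

由于检测结果是普通的 Python 字典列表,后续无论是推流、落盘还是上报消息队列,都可以直接在 Python 侧完成,而无需改动 C++ 推理代码。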
-
视觉多模态模型切分检测和边缘推理一、NanoOWL + SAHINanoOWL(边缘实时开放词汇目标检测模型)基于Vision Transformer架构,结合CLIP的图文对齐能力,可通过文本查询在图像中检测任意类别目标。我们可以结合SAHI框架使用“文本提示+图像切分”在Jetson Orin等嵌入式设备上实现低空目标的多模态检测和TensorRT推理。二、代码实现首先我们拉取官方代码仓库https://github.com/dusty-nv/jetson-containers并运行安装命令install.sh:git clone https://github.com/dusty-nv/jetson-containers bash jetson-containers/install.sh之后使用jetson-containers run和autotag命令自动提取并构建兼容的容器:jetson-containers run --workdir /opt/nanoowl $(autotag nanoowl) 我们在终端中查看容器的ID并将容器中nanoowl拷贝到自定义目录下/home/vsuav/workspace/vit:sudo docker ps -asudo docker cp af063d738879:/opt/nanoowl /home/vsuav/workspace/vit将/home/vsuav/workspace/vit/nanoowl目录及其内部所有文件和子目录的所有者和所属组都设置为当前登录用户,从而确保我们可以正常访问和修改这些文件。sudo chown -R $(whoami):$(whoami) /home/vsuav/workspace/vit/nanoowl运行jetson-containers run命令并指定--workdir参数,将nanoowl目录挂载到容器中,设置容器的名称为NanoOWL:jetson-containers run -v /home/vsuav/workspace/vit/nanoowl:/opt/nanoowl --name NanoOWL --workdir /opt/nanoowl $(autotag nanoowl) 运行docker start NanoOWL启动容器并进入容器的终端:sudo docker start NanoOWL sudo docker exec -it NanoOWL bash 我们在容器内部使用pip3安装python库sahi并创建main.pypip3 install sahi -i https://pypi.tuna.tsinghua.edu.cn/simple我们在examples/owl_predict.py基础上添加SAHI切分检测的逻辑,将无人机拍摄的高清大图切分为640x640和1280x1280的子图并叠加原图和类别名称的编码信息送入模型进行预测,最后把推理结果映射到原图上使用GreedyNMM进行合并后处理,完整代码如下:import os import cv2 import PIL.Image import numpy as np from sahi.slicing import get_slice_bboxes from nanoowl.owl_predictor import OwlPredictor from sahi.postprocess.utils import ObjectPrediction from sahi.postprocess.combine import GreedyNMMPostprocess class OWL_SAHI: def __init__(self, model_path, label_list, OBJ_THRESH, NMS_THRESH, overlap_ratio, slice, slice_scales): self.model_path = model_path if label_list != []: self.label_list = label_list else: self.label_list = [""] self.OBJ_THRESH = OBJ_THRESH self.NMS_THRESH = NMS_THRESH self.overlap_ratio = overlap_ratio self.slice = slice self.slice_scales = slice_scales self.predictor = OwlPredictor( "google/owlvit-base-patch32", image_encoder_engine = self.model_path ) self.text_encodings = self.predictor.encode_text(self.label_list) self.postprocess = GreedyNMMPostprocess( match_threshold = self.NMS_THRESH, match_metric = "IOS", class_agnostic = False, ) def getImageSlices(self, image): img_height, img_width = image.shape[:2] slice_bboxes = [] if self.slice: for slice_scale in self.slice_scales: slice_bboxe = get_slice_bboxes( image_height = img_height, image_width = img_width, auto_slice_resolution = True, slice_height = slice_scale[1], slice_width = slice_scale[0], overlap_height_ratio = self.overlap_ratio, overlap_width_ratio = self.overlap_ratio, ) slice_bboxes.extend(slice_bboxe) slice_bboxes.append([0, 0, img_width, img_height]) else: slice_bboxes = [[0, 0, img_width, img_height]] img_batch = [] for bbox in slice_bboxes: l, t, r, b = bbox img_batch.append(image[t:b, l:r]) return img_batch, slice_bboxes def predict(self, image_path): image = cv2.imread(image_path) image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) img_batch, slice_bboxes = self.getImageSlices(image) all_boxes = [] for img, slice_bbox in zip(img_batch, slice_bboxes): img_pil = PIL.Image.fromarray(img) output = self.predictor.predict( image = img_pil, text = self.label_list, text_encodings = self.text_encodings, threshold = self.OBJ_THRESH, pad_square = True ) boxes = output.boxes.cpu().numpy() if boxes.shape[0] > 0: boxes[:, 0] = boxes[:, 0] + slice_bbox[0] boxes[:, 1] = boxes[:, 1] + slice_bbox[1] boxes[:, 2] = boxes[:, 2] + slice_bbox[0] boxes[:, 3] = boxes[:, 3] + slice_bbox[1] boxes = 
boxes.astype(np.int32).tolist() for i in range(len(boxes)): obj_item = ObjectPrediction( bbox = boxes[i], score = float(output.scores[i]), category_id = int(output.labels[i]) ) all_boxes.append(obj_item) if len(all_boxes) > 0: all_boxes = self.postprocess(all_boxes) return all_boxes if __name__ == "__main__": model_path = "./data/owl_image_encoder_patch32.engine" label_list = ["car", "tower"] OBJ_THRESH = 0.1 NMS_THRESH = 0.5 overlap_ratio = 0.25 slice = True slice_scales = [[640, 640], [1280, 1280]] owl_sahi = OWL_SAHI(model_path, label_list, OBJ_THRESH, NMS_THRESH, overlap_ratio, slice, slice_scales) images = os.listdir("images") for image_file in images: print(image_file) image_path = os.path.join("images", image_file) all_boxes_processed = owl_sahi.predict(image_path) image = cv2.imread(image_path) for box in all_boxes_processed: xmin, ymin, xmax, ymax = box.bbox.to_xyxy() score = box.score.value clsse = box.category.id cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 4) cv2.putText(image, '{0} {1:.2f}'.format(label_list[clsse], score), (xmin, ymin - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 4, cv2.LINE_AA) cv2.imwrite(f"output/{image_file}", image) 三、小结本文介绍了基于NanoOWL和SAHI框架的视觉多模态模型切分检测方案,通过结合文本提示与图像切分技术,在Jetson Orin等边缘设备上实现开放词汇目标检测。该方案利用Vision Transformer架构和CLIP图文对齐能力,支持任意类别目标检测,并通过SAHI进行图像切片处理与后处理合并,提升检测精度与效率,适用于低空无人机目标检测等边缘推理场景。
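在上文 OWL_SAHI 的基础上,如果希望在运行时切换文本提示(例如从检测 car、tower 切换为检测 person、drone),只需对新的类别文本重新编码即可。下面是一个简单示意,其中 update_prompts 为本文补充的示例函数,并非 NanoOWL 官方接口,图片路径也仅为假设:

# 承接主脚本中已创建的 owl_sahi 实例
def update_prompts(owl_sahi, new_labels):
    # 重新编码类别文本,后续 predict 即按新的提示词进行开放词汇检测
    owl_sahi.label_list = new_labels
    owl_sahi.text_encodings = owl_sahi.predictor.encode_text(new_labels)

update_prompts(owl_sahi, ["person", "drone"])
all_boxes = owl_sahi.predict("images/demo.jpg")

这种"文本即类别"的用法不需要重新训练或导出模型,适合在边缘端按任务动态调整检测目标。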
-
【朝推夜训】松材线虫病高清图片切分检测我们以 Jetson Orin Nano 为例,介绍如何使用Python在资源受限的嵌入式设备上实现高清大图切分检测。一、模型导出1. 安装 Cmake,创建 24G swap 空间模型导出时依赖更高版本的Cmake,这里我们直接编译安装:sudo apt update sudo apt install libssl-dev git clone -b v3.25.1 https://github.com/Kitware/CMake.git cd CMake ./bootstrap && make && sudo make install cmake --version交换空间是操作系统用来拓展可用内存的一种机制,可以在内存不足的情况下继续运行,避免程序崩溃或者系统卡死,但是交换空间的访问速度远低于物理内存!禁用Jetson设备上的ZRAM交换配置:ZRAM会将内存页面压缩并存储在内存中,以减少对磁盘的依赖。sudo systemctl disable nvzramconfig使用fallocate创建一个24GB大小的文件,位于/var/24GB.swap路径。sudo fallocate -l 24G /var/24GB.swap设置交换空间格式sudo mkswap /var/24GB.swap启用交换空间sudo swapon /var/24GB.swap永久自启交换空间echo "/var/24GB.swap none swap sw 0 0" | sudo tee -a /etc/fstab重启系统后,系统交换空间增加至24GB:sudo reboot 2. 安装 ultralytics,创建 TensorRT 软连接pip install ultralytics pip install tqdm pandas pip install onnx==1.12.0 onnxslim==0.1.65 protobuf==3.20.1 pip install onnx-simplifier==0.3.10 pip install /home/vsuav/Downloads/onnxruntime_gpu-1.12.1-cp38-cp38-linux_aarch64.whl导出TensorRT模型时依赖其Python安装包,一般在系统Python目录下,以Jetson Orin Nano为例,我们可以建立软连接指向TensorRT安装路径:sudo ln -s /usr/lib/python3.8/dist-packages/tensorrt* /home/vsuav/miniconda3/envs/py38/lib/python3.8/site-packages3. 导出 TensorRT FP16 精度的模型from ultralytics import YOLO model = YOLO("yolov8-1_640x640_amd64_fp32.pt") model.export( format="engine", workspace=4, imgsz=640, half=True, device=0, batch=1 ) 二、切分检测1. 安装 SAHI 库pip install sahi==0.11.18这里我们使用0.11.18版本,修改/home/vsuav/miniconda3/envs/py38/lib/python3.8/site-packages/sahi/models/yolov8.py文件,注释掉第33行代码使其能够加载导出的Engine模型。class Yolov8DetectionModel(DetectionModel): def check_dependencies(self) -> None: check_requirements(["ultralytics"]) def load_model(self): """ Detection model is initialized and set to self.model. """ from ultralytics import YOLO try: model = YOLO(self.model_path) # model.to(self.device) self.set_model(model) except Exception as e: raise TypeError("model_path is not a valid yolov8 model path: ", e) 2. 运行检测代码加载模型import cv2 import numpy as np import matplotlib.pyplot as plt from sahi import AutoDetectionModel from sahi.predict import get_sliced_prediction detection_model = AutoDetectionModel.from_pretrained( model_type='yolov8', model_path="yolov8-1_640x640_amd64_fp16.engine", confidence_threshold=0.45 ) WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'. Loading yolov8-1_640x640_amd64_fp16.engine for TensorRT inference... [09/04/2025-11:27:15] [TRT] [I] Loaded engine size: 51 MiB [09/04/2025-11:27:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +616, GPU +757, now: CPU 1052, GPU 5537 (MiB) [09/04/2025-11:27:17] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +49, now: CPU 0, GPU 49 (MiB) [09/04/2025-11:27:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +29, now: CPU 1001, GPU 5519 (MiB) [09/04/2025-11:27:18] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +27, now: CPU 0, GPU 76 (MiB) 图片切分检测image_path = "28c56c0b-3ff7-4997-88b3-5a8330f7ea88.jpeg" result = get_sliced_prediction( image_path, detection_model, slice_height = 640, slice_width = 640, overlap_height_ratio = 0.2, overlap_width_ratio = 0.2, perform_standard_pred = True, postprocess_class_agnostic = True, postprocess_match_threshold = 0.55, ) Performing prediction on 6 slices. Loading yolov8-1_640x640_amd64_fp16.engine for TensorRT inference... 
[09/04/2025-11:27:20] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value. [09/04/2025-11:27:20] [TRT] [I] Loaded engine size: 51 MiB [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +32, now: CPU 1581, GPU 6577 (MiB) [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +50, now: CPU 0, GPU 126 (MiB) [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +6, now: CPU 1530, GPU 6537 (MiB) [09/04/2025-11:27:20] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +27, now: CPU 1, GPU 153 (MiB) 导出检测结果result.export_visuals(export_dir="output/", file_name="sliced_result") result_img_split = cv2.imread("output/sliced_result.png") plt.imshow(result_img_split[:, :, ::-1]) plt.axis('off') plt.show() 三、小结该方案成功在Jetson Orin Nano设备上运行,能够有效处理高清大图的松材线虫病检测任务,在保证检测精度的同时充分利用了嵌入式设备的硬件资源,为林业病虫害防治提供了实用的技术方案。
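除了 export_visuals,也可以直接遍历 result.object_prediction_list,用 OpenCV 自定义绘制样式并保存,便于控制线宽、字体和输出格式。下面是承接上文 result 与 image_path 的一个简单示意(输出路径仅为示例):

import cv2

image = cv2.imread(image_path)
for pred in result.object_prediction_list:
    # bbox 为原图坐标系下的检测框,score 与 category 分别为置信度和类别信息
    x1, y1, x2, y2 = map(int, pred.bbox.to_xyxy())
    label = "{0} {1:.2f}".format(pred.category.name, pred.score.value)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
cv2.imwrite("output/custom_result.jpg", image)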
-
C++ TensorRT YOLOv8-SAHI 高性能部署指南项目介绍本项目将介绍如何在Jetson等嵌入式设备上实现YOLOv8-SAHI的高性能部署,特别是使用Int8引擎的优化方案。在Jetson Orin Nano (8GB)设备上,图像切片和批量推理的测试时间消耗小于0.05秒,对1080p视频进行切分检测和bytetrack跟踪性能接近15FPS。# 代码仓库: https://github.com/HouYanSong/tensorrtx-yolov8-sahi导出 YOLOv8 Int8 量化模型我们固定输入图像尺寸1440x1080进行切分,其中每张切分子图的大小为640x640重叠度>20%,加上原始图像一次推理对8张图像进行检测,导出Int8量化后BatchSize=8的模型。从yolov8.pt生成yolov8s.wts权重文件pip install ultralytics python gen_wts.py从yolov8s.wts导出yolov8s.engine引擎文件,BatchSize大小为8sudo apt install libeigen3-devrm -fr build cmake -S . -B build cmake --build build cd build ./yolov8_sahi -s ../weights/yolov8s.wts ../weights/yolov8s.engine s模型的参数配置模型的配置文件为include/config.h,这里我们使用yolov8s官方预训练模型,模型的输入大小为640x640总共有80个类别,并且设置模型的kBatchSize = 8,一次最多可8以推理8张图像,指定量化图片的路径导出Int8量化后的模型。#ifndef CONFIG_H #define CONFIG_H // #define USE_FP16 #define USE_INT8 #include <string> #include <vector> const static char *kInputTensorName = "images"; const static char *kOutputTensorName = "output"; const static int kNumClass = 80; const static int kBatchSize = 8; const static int kGpuId = 0; const static int kInputH = 640; const static int kInputW = 640; const static float kNmsThresh = 0.55f; const static float kConfThresh = 0.45f; const static int kMaxInputImageSize = 3000 * 3000; const static int kMaxNumOutputBbox = 1000; const std::string trtFile = "../weights/yolov8s.engine"; const std::string cacheFile = "./int8calib.table"; const std::string calibrationDataPath = "../images/"; // 存放用于 int8 量化校准的图像 const std::vector<std::string> vClassNames { "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush" }; #endif // CONFIG_H YOLOv8-SAHI 切分检测为了验证量化后模型精度以及Batch推理的性能,这里我们使用Int8量化后的模型直接对量化图片进行切分检测,推理命令如下:cd build ./yolov8_sahi -d ../weights/yolov8s.engine ../images/在Jetson Orin Nano (8GB)上使用Int8引擎的YOLOv8-SAHI性能表现如下:sample0102.png YOLOv8-SAHI: 1775ms sample0206.png YOLOv8-SAHI: 46ms sample0121.png YOLOv8-SAHI: 44ms sample0058.png YOLOv8-SAHI: 44ms sample0070.png YOLOv8-SAHI: 44ms sample0324.png YOLOv8-SAHI: 43ms sample0122.png YOLOv8-SAHI: 44ms sample0086.png YOLOv8-SAHI: 45ms sample0124.png YOLOv8-SAHI: 45ms sample0230.png YOLOv8-SAHI: 45ms ...可以看到模型对单张图片的推理时间小于0.5毫秒,可以达到实时检测的要求。YOLOv8-SAHI-ByteTrack 视频跟踪我们可以结合ByteTrack跟踪算法对视频文件进行实时的切分检测和跟踪,在build目录下执行:cd build ./yolov8_sahi_track ../media/c3_1080.mp4 在Jetson Orin Nano (8GB)上YOLOv8-SAHI-ByteTrack性能表现如下:Total frames: 341 Init ByteTrack! 
Processing frame 20 (8 fps) Processing frame 40 (11 fps) Processing frame 60 (12 fps) Processing frame 80 (12 fps) Processing frame 100 (13 fps) Processing frame 120 (13 fps) Processing frame 140 (13 fps) Processing frame 160 (14 fps) Processing frame 180 (14 fps) Processing frame 200 (14 fps) Processing frame 220 (14 fps) Processing frame 240 (14 fps) Processing frame 260 (14 fps) Processing frame 280 (14 fps) Processing frame 300 (14 fps) Processing frame 320 (14 fps) Processing frame 340 (15 fps) FPS: 15 可以看到模型在1080p的视频上切分检测的帧率接近15FPS,并且ByteTrack的跟踪效果非常优秀。小结通过本项目,开发者可以在资源受限的嵌入式设备上实现高效的YOLOv8切分检测和跟踪,特别适用于需要实时处理的边缘计算场景。
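为便于理解上文"固定输入尺寸切分为 640x640 子图并保留原图"的策略,下面用 Python 给出滑窗切分坐标计算的一个简化示意(仅用于说明思路,实际工程中该逻辑由 C++ 预处理完成,子图数量与重叠度的具体取值有关):

def compute_slices(img_w, img_h, slice_w=640, slice_h=640, overlap=0.2):
    # 相邻子图的步长 = 子图尺寸 * (1 - 重叠度)
    step_x = int(slice_w * (1 - overlap))
    step_y = int(slice_h * (1 - overlap))
    slices = []
    y = 0
    while True:
        y2 = min(y + slice_h, img_h)
        y1 = max(0, y2 - slice_h)  # 靠近边界时向内回退,保证子图尺寸固定
        x = 0
        while True:
            x2 = min(x + slice_w, img_w)
            x1 = max(0, x2 - slice_w)
            slices.append((x1, y1, x2, y2))
            if x2 >= img_w:
                break
            x += step_x
        if y2 >= img_h:
            break
        y += step_y
    return slices

# 1440x1080 输入下打印所有子图坐标,这些子图再加上整体缩放后的原图共同组成一个 batch
for box in compute_slices(1440, 1080):
    print(box)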
-
如何在虚拟环境中调用TensorRTPython版本的TensorRT是跟随Jetpack已经安装好的,但只适配了Jetpack自带的Python版本,因此我们在使用conda创建虚拟环境时Python版本尽量和系统版本保持一致。我们首先激活虚拟环境,我创建的名称是py38:conda activate py38输入pip list查看已经安装好的Python依赖包:Package Version ------------------ ----------------------- certifi 2025.8.3 charset-normalizer 3.4.3 filelock 3.16.1 fsspec 2025.3.0 idna 3.10 Jinja2 3.1.6 MarkupSafe 2.1.5 mpmath 1.3.0 networkx 3.1 numpy 1.23.5 Pillow 9.5.0 pip 24.2 requests 2.32.4 setuptools 75.1.0 sympy 1.13.3 torch 2.1.0a0+41361538.nv23.6 torchvision 0.16.1 typing_extensions 4.13.2 urllib3 2.2.3 wheel 0.44.0可以看到没有TensorRT,其安装路径一般在系统Python目录下,以Jetson Orin Nano为例,我们可以建立软连接指向TensorRT安装路径:sudo ln -s /usr/lib/python3.8/dist-packages/tensorrt* /home/houyansong/miniconda3/envs/py38/lib/python3.8/site-packages之后再次输入pip list可以看到已经包含TensoRT依赖包:Package Version ------------------ ----------------------- certifi 2025.8.3 charset-normalizer 3.4.3 filelock 3.16.1 fsspec 2025.3.0 idna 3.10 Jinja2 3.1.6 MarkupSafe 2.1.5 mpmath 1.3.0 networkx 3.1 numpy 1.23.5 Pillow 9.5.0 pip 24.2 requests 2.32.4 setuptools 75.1.0 sympy 1.13.3 tensorrt 8.5.2.2 torch 2.1.0a0+41361538.nv23.6 torchvision 0.16.1 typing_extensions 4.13.2 urllib3 2.2.3 wheel 0.44.0我们可以验证一下,在命令行中输入:python -c "import tensorrt; print(tensorrt.__version__)" 输出结果为8.5.2.2,说明安装成功。
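如果希望进一步确认软连接之后 TensorRT 的运行库也能在虚拟环境中正常加载,可以参考下面的自检脚本(仅为示意):

import tensorrt as trt
import torch

print("TensorRT:", trt.__version__)
print("CUDA available:", torch.cuda.is_available())

# 能成功创建 Builder,说明 TensorRT 的底层 C++ 库也已正确链接
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
print("TensorRT Builder created:", builder is not None)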
-
【朝推夜训】如何在边缘设备上搭建深度学习开发环境如何在边缘设备上搭建深度学习的开发环境,我们以Jetson Orin Nano为例,介绍如何在开发板上安装Miniconda并配置conda源,以及如何安装Pytorch和Torchvision。1. 安装 Miniconda首先下载Miniconda最新安装包:wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh运行Miniconda3-latest-Linux-aarch64.sh安装脚本进行安装:bash ~/Miniconda3-latest-Linux-aarch64.sh关闭并重新打开终端窗口以使安装完全生效,或者使用以下命令刷新终端:source ~/.bashrc2. conda 换源首先编辑.condarc文件:vi ~/.condarc我们使用清华源,将文件修改为如下内容,即可添加Anaconda Python免费仓库。channels: - defaults show_channel_urls: true default_channels: - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2 custom_channels: conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud使用下列命令清除索引缓存,并创建Python-3.8开发环境。conda clean -i conda create -n py38 python=3.8 3. pip 换源在用户目录创建.pip目录,并编辑pip.conf文件:cd ~ mkdir .pip cd .pip vi pip.confpip.conf写入以下内容:[global] index-url = https://pypi.tuna.tsinghua.edu.cn/simple/ [install] trusted-host = pypi.tuna.tsinghua.edu.cn保存pip.conf4. 安装 Pytorch 和 Torchvision首先打开PyTorch for Jetson官网根据自己的JetPack版本选择合适安装包进行下载:https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048 cd ~/Downloads wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl我下载版本的是PyTorch v2.1.0,之后激活我们刚刚创建好的conda环境py38并安装Pytorch:conda activate py38 sudo apt-get install python3-pip libopenblas-base libopenmpi-dev libomp-dev pip install ~/Downloads/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl安装完成后,我们可以验证一下,在命令行中输入:python -c "import torch; print(torch.cuda.is_available())" 结果为True,则说明安装成功。之后安装Torchvision,Pytorch v2.1.0对应torchvision v0.16.1:PyTorch v1.0 - torchvision v0.2.2PyTorch v1.1 - torchvision v0.3.0PyTorch v1.2 - torchvision v0.4.0PyTorch v1.3 - torchvision v0.4.2PyTorch v1.4 - torchvision v0.5.0PyTorch v1.5 - torchvision v0.6.0PyTorch v1.6 - torchvision v0.7.0PyTorch v1.7 - torchvision v0.8.1PyTorch v1.8 - torchvision v0.9.0PyTorch v1.9 - torchvision v0.10.0PyTorch v1.10 - torchvision v0.11.1PyTorch v1.11 - torchvision v0.12.0PyTorch v1.12 - torchvision v0.13.0PyTorch v1.13 - torchvision v0.13.0PyTorch v1.14 - torchvision v0.14.1PyTorch v2.0 - torchvision v0.15.1PyTorch v2.1 - torchvision v0.16.1PyTorch v2.2 - torchvision v0.17.1PyTorch v2.3 - torchvision v0.18.0Torchvision安装命令如下:sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision cd torchvision conda activate py38 export BUILD_VERSION=0.16.1 pip install numpy==1.23.5 Pillow==9.5.0 requests==2.32.4 python setup.py install --user等待Trochvision安装完成后,我们可以验证一下,在命令行中输入:python -c "import torchvision; print(torchvision.__version__)" 成功打印版本号说明安装成功!
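安装完成后,也可以用下面的小脚本做一次整体自检,确认 torch 与 torchvision 版本匹配且 GPU 前向推理可用(仅为示意,网络结构可以任意选择):

import torch
import torchvision

print(torch.__version__, torchvision.__version__)
device = "cuda" if torch.cuda.is_available() else "cpu"
print("device:", device)

# 在 GPU 上跑一次极简的前向推理,验证 CUDA 内核可以正常执行
model = torchvision.models.mobilenet_v2(weights=None).to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)
with torch.no_grad():
    y = model(x)
print("output shape:", tuple(y.shape))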
-
松材线虫病检测1. 数据切分无人机广角拍摄的影像分辨率较高(4000x3000),首先对人工标注好的松材线虫病数据集进行切分,将大图切分成小图并设置不同的切分尺寸(例如:1000x1000、1500x1500、2000x2000)和重叠比例(例如:0%、10%、20%、30%)送入模型进行训练。2. 模型训练YOLOv8自2023年推出后,经过多次优化迭代,其架构设计(如C2F模块、动态标签分配)与训练流程已趋于成熟。例如嵌入式设备依赖v8的轻量化特性;在医疗检测领域,v8的高召回率已被临床验证。YOLO12等虽在理论上超越YOLOv8,但是v8的推理速度仍具不可替代性,目前在工业界广泛采用该版本进行部署。我们使用YOLOv8对等比例缩放后的原始图像和切分后的松材线虫病检测数据集进行训练,提高模型对不同大小目标的泛化能力,每次迭代训练s和m两种尺寸的模型,分别用于视频直播检测和图像的自动标注。目前我们的模型已经适配国产昇腾和英伟达的算力卡,可以实现模型的自动化训练作业,并针对不同算力芯片进行模型的自动转换和量化。3. 云上标注我们的模型可以对无人机回传的图片和视频进行切分检测和自动标注,针对不同大小的目标和类别可以设置不同的切分尺寸和重叠比例,实现无人机影像的细粒度检测。4. 直播推理我们的AI直播推理业务Pipeline并发运行,使用Python结合C++进行开发,功能模块化,业务运行更高效,可以在RK3588、Jetson系列开发板上进行部署。目前针对松材线虫病检测的场景,已经支持对9种疫木的实时识别。----转自博客:https://bbs.huaweicloud.com/blogs/458003
-
如何使用 Python 开发 AI 图编排应用本文将介绍使用Python开发一个简单的AI图编排应用,我们的目标是实现AI应用在RK3588上灵活编排和高效部署。首先我们定义的图是由边和节点组成的有向无环图,边代表任务队列,表示数据在节点之间的流动关系,每个节点都是一个计算单元,用于处理特定的任务。之后我们可以定义一组的处理特定任务的函数节点也称为计算单元,例如:read_frame、model_infer、kf_tracker、draw_boxes、push_frame、redis_push,分别用于读取视频、模型检测、目标跟踪、图像绘制、视频输出以及结果推送。每个节点可以有一个输入和多个输出,数据在节点之间是单向流动的,节点之间通过边进行连接,每个节点通过队列消费和传递数据。代码地址:https://github.com/HouYanSong/modelbox-rk3588一. 计算节点的实现我们在Json文件中定义每一个节点的的数据结构并使用Python进行代码实现:读流计算单元有4个参数:pull_video_url、height、width、fps,分别代表视频地址、视频高度和宽度以及读取帧率,它仅作为生产者,产生的数据可以输出到多个队列。"read_frame": { "config": { "pull_video_url": { "type": "str", "required": true, "default": null, "desc": "pull video url", "source": "mp4|flv|rtmp|rtsp" }, "height": { "type": "int", "required": true, "default": null, "max": 1440, "min": 720, "desc": "video height" }, "width": { "type": "int", "required": true, "default": null, "max": 1920, "min": 960, "desc": "video width" }, "fps": { "type": "int", "required": true, "default": null, "max": 15, "min": 5, "desc": "frame rate" } }, "multi_output": [] } 函数代码的实现如下,我们可以对视频文件或者视频流使用ffmpeg进行硬件解码,并将解码后的帧数据写入到队列中,用于后续任务节点的计算。def read_frame(share_dict, flowunit_data, queue_dict, data): pull_video_url = flowunit_data["config"]["pull_video_url"] height = flowunit_data["config"]["height"] width = flowunit_data["config"]["width"] fps = flowunit_data["config"]["fps"] ffmpeg_cmd = [ 'ffmpeg', '-c:v', 'h264_rkmpp', '-i', pull_video_url, '-r', f'{fps}', '-loglevel', 'info', '-s', f'{width}x{height}', '-an', '-f', 'rawvideo', '-pix_fmt', 'bgr24', 'pipe:' ] ffmpeg_process = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE, stderr=sp.DEVNULL, bufsize=10**7) index = 0 while True: index += 1 raw_frame = ffmpeg_process.stdout.read(width * height * 3) if not raw_frame: break else: frame = np.frombuffer(raw_frame, dtype=np.uint8).reshape((height, width, -1)) data["frame"] = frame for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) ffmpeg_process.stdout.close() ffmpeg_process.terminate() 推理计算单元的函数定义如下,它有一个输入和多个输出,我们可以指定模型和配置文件路径以及单次图像推理的批次大小等参数。"model_infer": { "config": { "model_file": { "type": "str", "required": true, "default": null, "desc": "model file path, rk3588 mostly ends with .rknn" }, "model_info": { "type": "str", "required": true, "default": null, "desc": "model info file path, mostly use json file" }, "batch_size": { "type": "int", "required": true, "default": null, "max": 8, "min": 1, "desc": "batch size" } }, "single_input": null, "multi_output": [] } 对应的函数实现如下,这里我们通过创建线程池的方式对图像进行批量推理,BatchSize的大小代表创建线程池的数量,将一个批次的推理结果写入到输出队列中,输出队列不唯一,可以为空或有多个输出队列。def model_infer(share_dict, flowunit_data, queue_dict, data): model_file = flowunit_data["config"]["model_file"] model_info = flowunit_data["config"]["model_info"] batch_size = flowunit_data["config"]["batch_size"] rknn_lite_list = [] for i in range(batch_size): rknn_lite = RKNNLite() rknn_lite.load_rknn(model_file) rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) rknn_lite_list.append(rknn_lite) with open(model_info, "r") as f: model_info = json.load(f) labels = [] for label in list(model_info["model_classes"].values()): labels.append(label) IMG_SIZE = model_info["input_shape"][0][-2:] OBJ_THRESH = model_info["conf_threshold"] NMS_THRESH = model_info["nms_threshold"] exist = False index = 0 while True: index += 1 image_batch = [] if flowunit_data["single_input"] is not None: for i in range(batch_size): data = 
queue_dict[flowunit_data["single_input"]].get() # 图片数据为None就退出循环 if data["frame"] is None: exist = True break image_batch.append(data) else: break with ThreadPoolExecutor(max_workers=batch_size) as executor: results = list(executor.map(infer_single_image, [(data["frame"], rknn_lite_list[i % batch_size], IMG_SIZE, OBJ_THRESH, NMS_THRESH) for i, data in enumerate(image_batch)])) for i, (boxes, classes, scores) in enumerate(results): classes = [labels[class_id] for class_id in classes] data = image_batch[i] if data.get("boxes") is None: data["boxes"] = boxes data["classes"] = classes data["scores"] = scores else: data["boxes"].extend(boxes) data["classes"].extend(classes) data["scores"].extend(scores) for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) if exist: break # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) for rknn_lite in rknn_lite_list: rknn_lite.release() 跟踪功能单元的可以对推理结果添加跟踪ID,如果没有推理结果,则直接返回原始数据,其定义如下:"kf_tracker": { "config": {}, "single_input": null, "multi_output": [] } 对应的函数代码实现如下:def kf_tracker(share_dict, flowunit_data, queue_dict, data): tracker = CentroidKF_Tracker(max_lost=30) index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break boxes, classes, scores = data.get("boxes"), data.get("classes"), data.get("scores") boxes = np.array(boxes) classes = np.array(classes) scores = np.array(scores) boxes[:, 2] = boxes[:, 2] - boxes[:, 0] boxes[:, 3] = boxes[:, 3] - boxes[:, 1] results = tracker.update(boxes, scores, classes) boxes = [] classes = [] scores = [] tracks = [] for result in results: frame_num, id, bb_left, bb_top, bb_width, bb_height, confidence, x, y, z, class_id = result boxes.append([bb_left, bb_top, bb_left + bb_width, bb_top + bb_height]) classes.append(class_id) scores.append(confidence) tracks.append(id) data["boxes"] = boxes data["classes"] = classes data["scores"] = scores data["tracks"] = tracks for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) 绘制功能单元可以对检测和跟踪结果进行绘制,如果检测结果或跟踪结果为空,则直接返回原始数据,其定义如下:"draw_boxes": { "single_input": null, "config": {}, "multi_output": [] } 代码逻辑如下:def draw_boxes(share_dict, flowunit_data, queue_dict, data): index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break boxes, classes, scores = data.get("boxes"), data.get("classes"), data.get("scores") if boxes is not None: tracks = data.get("tracks") if tracks is not None: for boxe, clss, track in zip(boxes, classes, tracks): cv2.rectangle(data["frame"], (boxe[0], boxe[1]), (boxe[2], boxe[3]), (0, 255, 0), 2) cv2.putText(data["frame"], f"{clss} {track}", (boxe[0], boxe[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2) else: for boxe, clss, conf in zip(boxes, classes, scores): cv2.rectangle(data["frame"], (boxe[0], boxe[1]), (boxe[2], boxe[3]), (0, 255, 0), 2) cv2.putText(data["frame"], f"{clss} {conf * 100:.2f}%", (boxe[0], boxe[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2) for queue_name in flowunit_data["multi_output"]: queue_dict[queue_name].put(data) # 读取结束,图片数据置为None data["frame"] = None for queue_name in flowunit_data["multi_output"]: 
queue_dict[queue_name].put(data) 输出功能单元可以将视频帧编码成视频输到到视频文件或者推流到RTMP服务器,其参数定义如下:"push_frame": { "config": { "push_video_url": { "type": "str", "required": true, "default": null, "desc": "push video url", "source": "rtmp|flv|mp4" }, "format": { "type": "str", "required": true, "default": null, "desc": "vodeo format", "source": "flv|mp4" }, "height": { "type": "int", "required": true, "default": null, "max": 1920, "min": 720, "desc": "video height" }, "width": { "type": "int", "required": true, "default": null, "max": 1920, "min": 960, "desc": "video width" }, "fps": { "type": "int", "required": true, "default": null, "max": 15, "min": 5, "desc": "frame rate" } }, "single_input": null } push_video_url参数是推流地址,也可以输出到本地视频文件。format参数指定视频格式,支持flv和mp4。height和width为视频分辨率,fps是输出帧率。它仅作为消费者,具体函数代码实现如下:def push_frame(share_dict, flowunit_data, queue_dict, data): push_video_url = flowunit_data["config"]["push_video_url"] format = flowunit_data["config"]["format"] height = flowunit_data["config"]["height"] width = flowunit_data["config"]["width"] fps = flowunit_data["config"]["fps"] process_stdin = ( ffmpeg .input('pipe:', format='rawvideo', pix_fmt='bgr24', s="{}x{}".format(width, height), framerate=fps) .filter('fps', fps=fps, round='up') .output( push_video_url, vcodec='h264_rkmpp', bitrate='2500k', f=format, g=fps, an=None, timeout='0' ) .overwrite_output() .run_async(cmd=["ffmpeg", "-re"], pipe_stdin=True) ) index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break frame = data["frame"] frame = cv2.resize(frame, (width, height)) process_stdin.stdin.write(frame.tobytes()) process_stdin.stdin.close() process_stdin.terminate() 消息功能单元可以将检测或跟踪结果发送到Redis服务器,具体可以根据实际情况进行调整。"redis_push": { "config": { "task_id": { "type": "str", "required": true, "default": null, "desc": "task id" }, "host": { "type": "str", "required": true, "default": null, "desc": "redis host" }, "port": { "type": "int", "required": true, "default": null, "desc": "redis port" }, "username": { "type": "str", "required": true, "default": null, "desc": "redis username" }, "password": { "type": "str", "required": true, "default": null, "desc": "redis password" }, "db": { "type": "int", "required": true, "default": null, "desc": "redis db" } }, "single_input": null } 同样,它也仅作为消费者,只有一个输入,具体函数代码如下:def redis_push(share_dict, flowunit_data, queue_dict, data): task_id = flowunit_data["config"]["task_id"] host = flowunit_data["config"]["host"] port = flowunit_data["config"]["port"] username = flowunit_data["config"]["username"] password = flowunit_data["config"]["password"] db = flowunit_data["config"]["db"] r = redis.Redis( host = host, port = port, username = username, password = password, db = db, decode_responses = True ) index = 0 while True: index += 1 if flowunit_data["single_input"] is not None: data = queue_dict[flowunit_data["single_input"]].get() else: break # 图片数据为None就退出循环 if data["frame"] is None: break track_objs = [] height, width = data["frame"].shape[:2] boxes, classes, scores, tracks = data.get("boxes"), data.get("classes"), data.get("scores"), data.get("tracks") if boxes is not None: for boxe, clss, conf, track in zip(boxes, classes, scores, tracks): x1 = float(boxe[0] / width) y1 = float(boxe[1] / height) x2 = float(boxe[2] / width) y2 = float(boxe[3] / height) track_obj = { "bbox": [x1, y1, x2, y2], "track_id": int(track), "class_id": 0, "class_name": str(clss) } track_objs.append(track_obj) key 
= 'vision:track:' + str(task_id) + ':frame:' + str(index) value = json.dumps({"track_result": track_objs}) r.set(key, value) r.expire(key, 2) print(track_objs) r.close() 二、流程图编排定义好节点,我们就可以定义管道也就是“边”将“节点”的输入和输出连接起来,这里我们定义6条边也就是实例化6个队列,在配置文件中声明每条管道的名称以及队列的最大容量。"queue_size": 16, "queue_list": [ "frame_queue", "infer_queue_1", "infer_queue_2", "track_queue", "draw_queue_1", "draw_queue_2" ] 之后就是对每一个节点的参数进行配置,并定义功能单元的输入和输出。"graph_edge": { "读流功能单元": { "read_frame": { "config": { "pull_video_url": "/home/orangepi/workspace/modelbox/data/car.mp4", "height": 720, "width": 1280, "fps": 20 }, "multi_output": [ "frame_queue" ] } }, "推理功能单元": { "model_infer": { "config": { "model_file": "/home/orangepi/workspace/modelbox/model/yolov8n_800x800_int8.rknn", "model_info": "/home/orangepi/workspace/modelbox/model/yolov8n_800x800_int8.json", "batch_size": 8 }, "single_input": "frame_queue", "multi_output": [ "infer_queue_1", "infer_queue_2" ] } }, "跟踪功能单元_2": { "kf_tracker": { "config": {}, "single_input": "infer_queue_2", "multi_output": [ "track_queue" ] } }, "绘图功能单元_1": { "draw_boxes": { "config": {}, "single_input": "infer_queue_1", "multi_output": [ "draw_queue_1" ] } }, "绘图功能单元_2": { "draw_boxes": { "single_input": "track_queue", "config": {}, "multi_output": [ "draw_queue_2" ] } }, "推流功能单元_1": { "push_frame": { "config": { "push_video_url": "/home/orangepi/workspace/modelbox/output/det_result.mp4", "format": "mp4", "height": 720, "width": 1280, "fps": 20 }, "single_input": "draw_queue_1" } }, "推流功能单元_2": { "push_frame": { "config": { "push_video_url": "/home/orangepi/workspace/modelbox/output/track_result.mp4", "format": "mp4", "height": 720, "width": 1280, "fps": 20 }, "single_input": "draw_queue_2" } } } 每个功能单元需要起一个节点名称用于功能单元的创建,每个节点名称保证全局唯一,正如字典中的键值不能重复。之后根据这份图文件编排启动AI应用,Python代码如下:import os import sys import json import argparse sys.path.append(os.path.join(os.path.dirname(__file__), '..')) from etc.flowunit import * from multiprocessing import Process, Queue, Manager if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument('graph_path', type=str, nargs='?', default='/home/orangepi/workspace/modelbox/graph/person_car.json') args = parser.parse_args() # 初始化数据 data = {"frame": None} config = {} # 读取流程图 with open(args.graph_path) as f: graph = json.load(f) # 创建队列 queue_dict = {} queue_size = graph["queue_size"] for queue_name in graph["queue_list"]: queue_dict[queue_name] = Queue(maxsize=queue_size) with Manager() as manager: # 创建共享字典 share_dict = manager.dict() # 创建进程 process_list = [] graph_edge = graph["graph_edge"] for process in graph_edge.keys(): p = Process(target=eval(list(graph_edge[process].keys())[0]), args=(share_dict, list(graph_edge[process].values())[0], queue_dict, data,)) process_list.append(p) print("=============Start Process...=============") # 启动进程 for p in process_list: p.start() # 等待进程结束 for p in process_list: p.join() print("==========All Process Finished.===========") 这里我们读取一段测试视频分别将检测结果和跟踪结果保存为两个视频文件输出到output目录下:(python-3.9.15) orangepi@orangepi5plus:~$ python /home/orangepi/workspace/modelbox/graph/graph.py /home/orangepi/workspace/modelbox/graph/person_car.json =============Start Process...============= ffmpeg version 04f5eaa Copyright (c) 2000-2023 the FFmpeg developers built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04) configuration: --prefix=/usr --enable-gpl --enable-version3 --enable-libdrm --enable-rkmpp --enable-rkrga libavutil 58. 29.100 / 58. 29.100 libavcodec 60. 31.102 / 60. 31.102 libavformat 60. 16.100 / 60. 16.100 libavdevice 60. 
3.100 / 60. 3.100 libavfilter 9. 12.100 / 9. 12.100 libswscale 7. 5.100 / 7. 5.100 libswresample 4. 12.100 / 4. 12.100 libpostproc 57. 3.100 / 57. 3.100 ffmpeg version 04f5eaa Copyright (c) 2000-2023 the FFmpeg developers built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04) configuration: --prefix=/usr --enable-gpl --enable-version3 --enable-libdrm --enable-rkmpp --enable-rkrga libavutil 58. 29.100 / 58. 29.100 libavcodec 60. 31.102 / 60. 31.102 libavformat 60. 16.100 / 60. 16.100 libavdevice 60. 3.100 / 60. 3.100 libavfilter 9. 12.100 / 9. 12.100 libswscale 7. 5.100 / 7. 5.100 libswresample 4. 12.100 / 4. 12.100 libpostproc 57. 3.100 / 57. 3.100 W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.190] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.191] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.192] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.248] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.338] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.338] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.339] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.384] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.459] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.459] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.460] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.504] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) 
W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.606] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.606] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.608] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.658] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.761] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.761] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.762] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.814] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:47.910] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:47.910] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:47.912] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:47.962] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:48.069] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:48.070] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:48.071] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:48.122] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) 
W rknn-toolkit-lite2 version: 2.3.2 I RKNN: [13:10:48.228] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) I RKNN: [13:10:48.228] RKNN Driver Information, version: 0.9.6 I RKNN: [13:10:48.229] RKNN Model Information, version: 2, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape W RKNN: [13:10:48.280] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.) Input #0, rawvideo, from 'pipe:': Duration: N/A, start: 0.000000, bitrate: 442368 kb/s Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 442368 kb/s, 20 tbr, 20 tbn Stream mapping: Stream #0:0 (rawvideo) -> fps:default fps:default -> Stream #0:0 (h264_rkmpp) Output #0, mp4, to '/home/orangepi/workspace/modelbox/output/det_result.mp4': Metadata: encoder : Lavf60.16.100 Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), bgr24(progressive), 1280x720, q=2-31, 2000 kb/s, 20 fps, 10240 tbn Metadata: encoder : Lavc60.31.102 h264_rkmpp Input #0, rawvideo, from 'pipe:': 0kB time=N/A bitrate=N/A speed=N/A Duration: N/A, start: 0.000000, bitrate: 442368 kb/s Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 442368 kb/s, 20 tbr, 20 tbn Stream mapping: Stream #0:0 (rawvideo) -> fps:default fps:default -> Stream #0:0 (h264_rkmpp) Output #0, mp4, to '/home/orangepi/workspace/modelbox/output/track_result.mp4': Metadata: encoder : Lavf60.16.100 Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), bgr24(progressive), 1280x720, q=2-31, 2000 kb/s, 20 fps, 10240 tbn Metadata: encoder : Lavc60.31.102 h264_rkmpp [out#0/mp4 @ 0x558f2625e0] video:2330kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.062495% frame= 132 fps= 19 q=-0.0 Lsize= 2331kB time=00:00:06.55 bitrate=2915.8kbits/s speed=0.924x Exiting normally, received signal 15. [out#0/mp4 @ 0x557e4875e0] video:1990kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.072192% frame= 131 fps= 18 q=-0.0 Lsize= 1991kB time=00:00:06.50 bitrate=2509.6kbits/s speed= 0.9x ==========All Process Finished.=========== Exiting normally, received signal 15.应用推理的帧率取决于视频读取的帧率以及耗时最久的功能单元,实测FPS约为20左右,满足AI实时检测的场景。
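按照上文功能单元的函数签名和队列约定,扩展新的节点非常容易。下面给出一个自定义消费者节点 save_image 的示意,它按固定间隔把收到的帧保存为图片(其中 save_dir、interval 等参数为本示例假设,需要在图文件的 config 中自行定义):

import os
import cv2

def save_image(share_dict, flowunit_data, queue_dict, data):
    save_dir = flowunit_data["config"]["save_dir"]
    interval = flowunit_data["config"]["interval"]
    os.makedirs(save_dir, exist_ok=True)
    index = 0
    while True:
        index += 1
        if flowunit_data["single_input"] is not None:
            data = queue_dict[flowunit_data["single_input"]].get()
        else:
            break
        # 图片数据为None就退出循环
        if data["frame"] is None:
            break
        if index % interval == 0:
            cv2.imwrite(os.path.join(save_dir, f"frame_{index:06d}.jpg"), data["frame"])

只要在图文件中为它声明一个输入队列,并把该队列加入某个绘图节点的 multi_output,这个节点就可以与推流、消息推送等节点并行消费同一路数据。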
-
最近入手了国产的香橙派和英伟达的Jetson开发板,如何在板子上搭建深度学习开发环境进行AI应用开发?
-
通过改进采样策略,扩散模型可以在保持生成质量的同时显著减少推理时间。以下是核心方法及其数学依据的详细解析:一、传统扩散模型的采样瓶颈扩散模型的生成过程需要逐步去噪(通常需数千步),每一步均需运行噪声预测网络(如UNet)。例如,DDPM生成512×512图像需1000步,耗时约10秒。其核心瓶颈在于:马尔可夫链的线性依赖:每一步仅依赖前一步的状态,无法跳步。局部线性近似:传统方法(如DDPM)假设反向过程是局部线性的,导致收敛速度慢。二、加速采样策略的核心方法1. DDIM(Denoising Diffusion Implicit Models)核心思想:将扩散过程参数化为非马尔可夫过程,允许跳步生成。数学依据:重新参数化反向过程:传统DDPM定义反向过程为 x_{t-1} = f(x_t, t),而DDIM将其扩展为:其中 \lambda 为跳步比例,允许直接从 x_t 生成 x_{t-\lambda}。确定性生成:通过固定随机种子,DDIM可一步生成完整图像(类似GAN)。效果:在ImageNet上,仅需50步即可达到DDPM 1000步的FID(25.6 vs 25.8)。2. PLMS(Pseudo Linear Multi-Step Sampling)核心思想:用线性插值估计多步后的状态,减少迭代次数。数学依据:假设多步噪声预测可近似为线性组合:权重 w_i 通过最小化MSE优化。效果:在50步时FID为26.1,接近DDPM 1000步效果。3. Stable Consistency Models(SCM)核心思想:直接建模多步一致性,避免迭代。数学依据:定义一致性损失函数:其中 \text{Iterate} 表示从 x_t 经过 T-t 步生成 x_0 的过程。效果:仅需10步即可生成高质量图像,速度提升100倍。4. 动态步长调整(Dynamic Step Selection)核心思想:根据生成中间结果的置信度自适应调整步数。数学依据:使用强化学习策略(如PPO)选择步数:其中状态 s 为当前去噪图像,动作 a 为选择步数。效果:平均步数从1000降至300,速度提升3倍。三、数学核心:扩散过程的重新参数化所有加速方法均基于对扩散过程的重新参数化,其理论基础可归纳为:非马尔可夫性:允许反向过程跨越多步,打破马尔可夫链的线性依赖。噪声预测的泛化性:假设噪声预测网络 \epsilon_\theta 能够隐式建模多步分布:重参数化技巧:通过引入虚拟变量(如DDIM的 \lambda),将多步过程映射到单步空间。四、实际效果与优化组合DiT-XL/2 + DDIM:在ImageNet 256×256生成任务中,仅需50步即可达到FID 29.7(接近1000步的38.5)。SCM + 潜在扩散模型:在3D生成中,10步生成质量与1000步相当,显存占用减少90%。混合策略:结合动态步长(前100步)与SCM(后900步),总步数减少至200步,速度提升5倍。五、未来方向神经微分方程求解:将扩散过程建模为ODE,用自适应求解器(如DPM-Solver)动态调整步数。硬件感知优化:针对GPU/NPU特性设计并行化采样算法(如CUDA核融合)。多模态联合训练:共享噪声预测网络,提升跨任务采样效率。总结改进采样策略的核心在于打破扩散过程的线性依赖和增强噪声预测的泛化能力。通过数学上的重新参数化与非马尔可夫建模,DDIM、SCM等方法可将推理时间从小时级缩短至秒级,同时保持生成质量。未来方向是结合硬件特性与多模态架构,进一步突破效率瓶颈。
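补充说明:原文中 DDIM 更新公式对应的图片在排版中丢失,这里给出标准的 DDIM 采样公式作为参考(记号与原文可能略有出入,\bar{\alpha}_t 为累积噪声系数,\sigma_t = 0 时退化为确定性采样):

x_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\,\epsilon_\theta(x_t, t) + \sigma_t \epsilon_t

跳步采样时只需在时间步子序列上应用同一公式,即把 t-1 替换为更早的 t-\lambda,这正是文中"允许直接从 x_t 生成 x_{t-\lambda}"的含义。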
-
1. 执行 build_sdk.sh 报如下图错误(Docker Hub 网站内未能找到 huawei-ec-iot/sdk:base 镜像)
2. 镜像服务应该是正常的,可以下载 nginx
3. docker 镜像拉取采用了国内代理
-
8月21日-23日,由云原生计算基金会(CNCF)和 Linux 基金会联合主办的KubeCon + CloudNativeCon + Open Source Summit + Al_dev China 2024 大会在中国香港盛大召开。会上,华为云云原生开源负责人,CNCF TOC王泽锋,蔚来汽车战略新业务数字系统架构师蒋旭辉联合发表“云原生技术加速电动汽车创新”主题演讲,深入探讨云原生解决方案在革新EV领域中的转变影响和未来前景。KubeCon China 2024 主题演讲作为一家全球化的智能电动汽车公司,蔚来致力于提供高性能的智能电动汽车与极致用户体验,坚持核心技术的正向研发,建立了由12个领域的技术栈构成的“蔚来技术全栈”。硬件基础决定软件形态,随着车载算力的不断增强,车端软件数量也在爆发式的增长。车端作为其团队重点,在新的行业变革中也产生了新的需求和挑战。E/E架构与SDV趋势下车端软件开发挑战根据博世2019年提出的整车电子电器架构的演进图,当前的新能源汽车有一部分已经达到了3.0时代,即区域控制器和车载电脑;在向车云计算的演进过程中,部分功能已在实现车云协同。基于3.0架构,汽车行业有一个比较热门的话题,是软件定义汽车。软件定义汽车实际是SOA架构和中央计算E/E架构的合体。其中的核心就是中央计算单元。当前的中央计算单元已经融合核座舱、网联、智驾的能力,软件平台的重要性更加突出。在规划中央计算单元的规划定义阶段,将云端的能力当成整体平台的一部分,实现车云的一体化设计。行业趋势 – SDV蔚来数字系统团队,主要聚焦于整个平台中的智能网联和工具链的部分。在智能网联的研发环节,面临的行业环境变化有:敏捷开发敏捷交付需求:软件研发周期变短,汽车换代时间由以前的8年左右现在提速到1年多。随着软件比重的增加,交付后版本更新成为一个必须项。硬件平台异构,开发人员很并行开发难度高。研发与测试管理成本提升:汽车软件除了一些硬件的差异化配置外,软件也开始出现差异化。为了实现软件的千人千面,需要平台提供定向推送的能力,管理复杂。传统的汽车厂商作为集成商,更多的是做整车的功能测试。随着汽车厂商的软件自研能力提高,软件测试项目的内容和复杂度也大幅提高,这些变化带来了测试成本的挑战。跨领域团队协作愈发频繁:中央计算单元集成的功能递增,车和云之间,自动驾驶、网联、座舱等团队的交叉协作越来越密切。汽车软件的开发也在引入互联网的模式,由传统的V模型,转变到V模型与敏捷开发混合。技术生态双重优势云原生助力车端软件平台构建对于当前车企研发所面临的问题,王泽锋提到,构建车端软件平台,云原生从技术维度和生态维度均具备明显优势。技术层面,云原生提供便捷的软件依赖管理,灵活的编排部署策略,技术栈开放,灵活可定制;生态层面,成熟的云原生生态为企业提供了丰富的选择,厂商基于标准接口提供服务,互操作性强且开源为主,拥有丰富的标准软件生态,与此同时,云原生行业人才系统成熟,这为车企提供了众多方案选择与研发力量后盾。CNCF TOC 华为云云原生开源负责人 王泽锋如何基于云原生技术构建车端软件平台?将云原生技术栈应用到车的领域,也面临着以下挑战:1. 算力稀缺:车端算力成本比云数据中心、消费电子高出很多;2. 海量边缘节点接入:汽车的接入数量级在数十万到数百万之间,对于平台的管理规模本身就是巨大的挑战;3. 运行环境差异:汽车的网络环境稳定性差(经常处于地下室、隧道等无网络环境),本身的高速移动也会表现为网络的高延迟高丢包现象。以KubeEdge为核心构建蔚来整套车云协同平台蒋旭辉提到,经过大量调研和选型工作后,我们发现KubeEdge能够很好地解决这些挑战,因此我们选择使用KubeEdge作为平台的核心,以Kubernetes + KubeEdge为技术底座,构建了整套车云协同平台。在实车端应用的容器化后,蔚来在车上引入了KubeEdge,将车端的容器应用也纳入到API-Server统一管理。KubeEdge在给车端带来容器应用编排能力的同时,自身占用资源较少,并且启动非常迅速,可以满足汽车软件的使用场景需求。借助KubeEdge的离线自治能力,在弱网/断网环境下,平台也可以实现车端软件的稳定运行和故障恢复。蔚来汽车战略新业务数字系统架构师 蒋旭辉KubeEdge架构优势作为专为云边协同开发的平台,KubeEdge兼顾各种边缘场景的特殊性:使用K8s作为控制面,并将KubeEdge的额外功能也通过K8s API提供,最大限度地帮助用户融合云数据中心与边缘的生态;针对边缘环境受限的场景,KubeEdge在完成自身轻量化的基础上支持用户自定义功能裁剪,以满足不同的资源需求。并且KubeEdge提供了节点级元数据持久化,支持边缘离线自治;KubeEdge双向多路复用的云边消息通道,替代原本的节点与控制面之间链接,实现对于APIserver连接数的放大,并且引入全时段可靠增量同步的机制应对弱网环境挑战。KubeEdge设计理念在车上引入KubeEdge,将车端的容器应用也纳入到API-Server统一管理,在给车端带来容器应用编排能力的同时,KubeEdge自身占用资源较少,并且启动非常迅速,可以满足汽车软件的使用场景需求。借助KubeEdge的离线自治能力,在弱网/断网环境下,也可以实现车端软件的稳定运行和故障恢复,蒋旭辉在演讲中表示。▍突破APIserver连接数限制,实现超大规模边缘汽车管理在量产车型大规模接入的场景中,需要实现高出传统云数据中心几个数量级的节点管理规模,并且应对节点联接的潮汐效应问题。在KubeEdge的云边通信机制中,配合车端的持久化存储,我们实现了全时段的增量同步机制,可以有效降低车辆启动和断联恢复时的网络冲击,以及状态同步过程中持续开销。通过云边消息通道的双向多路复用机制,KubeEdge可以突破APIserver的连接数限制,实现超大规模的边缘汽车管理。蔚来基于KubeEdge构建车云协同平台架构KubeEdge使用K8s作为控制面,将车的Node、Pod等资源对象的管理实现为K8s原生的API,屏蔽了车端与云端资源的管理差异。业务系统可以很方便地管理车上的容器应用,而不需要感知应用在不同环境应该如何部署。▍场景实际落地, 开发速度、软件质量提升,有效降低使用成本新能源汽车电池健康安全数据分析新能源汽车电池安全一直是用户比较关心的重点,蔚来在电池安全和电池健康方面也一直投入了大量的精力去实现更优的体验,除了电池本身的技术演进外,还运用大数据和人工智能算法来预测和分析电池健康程度,从而优化电池策略,提高电池寿命。场景1 数据分析-电池健康安全检测在具体的工程侧,由于成本和网络的限制,数据分析团队需要进行车和云端结合的算法来达到最佳效果。边缘算法部署在车端,进行特征提取等计算,云端进行时间序列分析等。基于此场景,蔚来数字系统团队创新使用云原生技术,在算法开发阶段,算法开发同事使用容器化的方式进行边缘算法的开发。统一使用容器打包镜像,通过K8s,使云端的算法和车端的算法同步部署。在工程车辆验证阶段,算法团队只需切换依赖的基础镜像,就可以将边缘计算的容器应用快速小批量地部署到工程车辆,进行算法的验证。验证通过后,整个算法主体部分开发完成,算法团队只需根据目标车型替换对应的量产基础镜像,即可完成量产包的制作,无需关心车端的运行环境、系统版本等细节问题。引入云原生能力构建车端软件测试管理平台蔚来在开发阶段使用云原生技术以外,在软件测试阶段也引入云原生的能力。以往的的测试台架资源主要为离线的人工管理方式,不能充分利用台架资源。实车、台架本身具备较大的差异,各测试阶段和测试环境比较孤立,难以覆盖组合场景的测试需求。场景二 功能软件测试引入云原生能力后,Virtual car、台架和实车通过接入到K8s的统一监控和管理,可以更合理地安排测试任务,从而提高测试资源的利用率。蔚来团队同时创新性地将Testcase也进行了容器化,通过基于K8s 
Job的调度机制,可以更灵活地让我们的测试用例在不同测试环境上交叉执行,覆盖更多的场景。通过以上的两种场景应用,实现效能提升:开发速度提升:平台提供了统一的容器化环境依赖管理和部署方式,降低了开发门槛,提高了效率;软件质量提升:平台提供了多环境多节点的统一管理,可以支持规模化的自动化测试并行执行;使用成本方面:平台学习门槛低,灵活的发布策略使得整个平台的台架等硬件环境可以更高效合理地被分配和使用。车载硬件和算力的提升带来了车端软件新的发展,在车云协同的当下,智能汽车领域更需要更新的平台技术,来支撑汽车软件的持续演进。蔚来汽车基于Kubernetes + KubeEdge开发云原生车云协同平台,并且首次搭载于量产车型,这是云原生生态领域中一次全新的尝试,为车企带来开发交付效率、团队协作等方面的巨大提升。也相信云原生技术将持续推进整个车端软件的研发创新与深入应用,助力汽车行业迎来更广阔的未来。更多云原生技术动向关注容器魔方