• [Tech Share] Recognizing your gestures: it's quite handy
CNN-RNN video dynamic gesture recognition

Artificial intelligence is advancing rapidly and is deeply reshaping human-computer interaction. Gestures, a natural and fast way to interact, are widely used in fields such as intelligent driving and virtual reality. The gesture-recognition task is this: when an operator performs a gesture, the computer should quickly and accurately determine its type. This article uses ModelArts to develop and train a video dynamic gesture recognition model that detects dynamic gestures such as swipe up, swipe down, swipe left, swipe right, open, and close, similar to the air-gesture feature on Huawei phones.

Algorithm overview

The CNN-RNN video dynamic gesture recognition algorithm first extracts per-frame features from video action clips with the pretrained InceptionResNetV2 network, then feeds the feature sequence into an LSTM for classification. We test the algorithm on the sample data from the full-stack AI hackathon finals: 108 videos in total, covering 9 gesture classes (invalid gesture, swipe up, swipe down, swipe left, swipe right, open, close, zoom in, zoom out). Dataset download link:
https://developer.huaweicloud.com/develop/aigallery/dataset/detail?id=7e9c0d90-461f-4af2-93b3-67a8df76c109

Code walkthrough

First we decode the captured video files and extract key frames, keeping one frame out of every 4, then center-crop and preprocess each image:

```python
def load_video(file_name):
    cap = cv2.VideoCapture(file_name)
    # sample one frame out of every frame_interval frames
    frame_interval = 4
    frames = []
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # keep one frame every frame_interval frames
        if count % frame_interval == 0:
            # center crop
            frame = crop_center_square(frame)
            # resize
            frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE))
            # BGR -> RGB: channel order [0,1,2] -> [2,1,0]
            frame = frame[:, :, [2, 1, 0]]
            frames.append(frame)
        count += 1
    return np.array(frames)
```
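The `crop_center_square` helper is referenced above but its definition is not shown in the post; a minimal sketch of what it most likely does (crop the largest centered square so resizing does not distort the aspect ratio):

```python
def crop_center_square(frame):
    # frame is an H x W x C image array; crop the largest centered square.
    h, w = frame.shape[:2]
    side = min(h, w)
    y0 = (h - side) // 2
    x0 = (w - side) // 2
    return frame[y0:y0 + side, x0:x0 + side]
```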
Next we build the image feature extractor, using the pretrained InceptionResNetV2 model:

```python
def get_feature_extractor():
    feature_extractor = keras.applications.inception_resnet_v2.InceptionResNetV2(
        weights='imagenet',
        include_top=False,
        pooling='avg',
        input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    preprocess_input = keras.applications.inception_resnet_v2.preprocess_input

    inputs = keras.Input((IMG_SIZE, IMG_SIZE, 3))
    preprocessed = preprocess_input(inputs)
    outputs = feature_extractor(preprocessed)
    model = keras.Model(inputs, outputs, name='feature_extractor')
    return model
```

Then we extract a feature vector per video, padding with all-zero arrays when a video has fewer than 40 frames:

```python
def load_data(videos, labels):
    video_features = []
    for video in tqdm(videos):
        frames = load_video(video)
        counts = len(frames)
        # if the frame count is below MAX_SEQUENCE_LENGTH
        if counts < MAX_SEQUENCE_LENGTH:
            # pad with zeros
            diff = MAX_SEQUENCE_LENGTH - counts
            padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3))
            frames = np.concatenate((frames, padding))
        # keep the first MAX_SEQUENCE_LENGTH frames
        frames = frames[:MAX_SEQUENCE_LENGTH, :]
        # batch feature extraction
        video_feature = feature_extractor.predict(frames)
        video_features.append(video_feature)
    return np.array(video_features), np.array(labels)
```

Finally we create the LSTM model:

```python
def video_cls_model(class_vocab):
    # number of classes
    classes_num = len(class_vocab)
    # define the model
    model = keras.Sequential([
        layers.Input(shape=(MAX_SEQUENCE_LENGTH, NUM_FEATURES)),
        layers.LSTM(64, return_sequences=True),
        layers.Flatten(),
        layers.Dense(classes_num, activation='softmax')
    ])
    # compile the model
    model.compile(
        optimizer=keras.optimizers.Adam(1e-5),
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy']
    )
    return model
```

Model training

To try the full training workflow, click "Run in ModelArts" to run the Notebook I published on free cloud compute:
https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=e8914ecc-953e-48b1-8e5d-90b37b3bc8e9

Video inference

First load the LSTM model and the class index-to-label mapping:

```python
import random
# load the model
model = tf.keras.models.load_model('saved_model')
# class labels
label_to_name = {0: 'invalid gesture', 1: 'swipe up', 2: 'swipe down',
                 3: 'swipe left', 4: 'swipe right', 5: 'open',
                 6: 'close', 7: 'zoom in', 8: 'zoom out'}
```

Then extract video features with the InceptionResNetV2 feature extractor:

```python
# get video features
def getVideoFeat(frames):
    frames_count = len(frames)
    # if the frame count is below MAX_SEQUENCE_LENGTH
    if frames_count < MAX_SEQUENCE_LENGTH:
        # pad with zeros
        diff = MAX_SEQUENCE_LENGTH - frames_count
        padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3))
        frames = np.concatenate((frames, padding))
    # keep the first MAX_SEQUENCE_LENGTH frames
    frames = frames[:MAX_SEQUENCE_LENGTH, :]
    # compute video features, shape (N, 1536)
    video_feat = feature_extractor.predict(frames)
    return video_feat
```

Finally, feed the video's feature sequence into the LSTM for prediction:

```python
# video prediction
def testVideo():
    test_file = random.sample(videos, 1)[0]
    label = test_file.split('_')[-2]

    print('File: {}'.format(test_file))
    print('Ground truth: {}'.format(label_to_name.get(int(label))))

    # read the sampled frames of the video
    frames = load_video(test_file)
    # keep the first MAX_SEQUENCE_LENGTH frames for display
    frames = frames[:MAX_SEQUENCE_LENGTH].astype(np.uint8)
    # save as a GIF
    imageio.mimsave('animation.gif', frames, duration=10)
    # extract features
    feat = getVideoFeat(frames)
    # model inference
    prob = model.predict(tf.expand_dims(feat, axis=0))[0]

    print('Predicted classes:')
    for i in np.argsort(prob)[::-1][:5]:
        print('{}: {}%'.format(label_to_name[i], round(prob[i]*100, 2)))
    return display(Image(open('animation.gif', 'rb').read()))
```

Running testVideo() picks a random video, predicts it, and displays the result:

File: hand_gesture/man_005_3_1.mp4
Ground truth: swipe left
Predicted classes:
swipe left: 78.32%
swipe right: 8.54%
swipe down: 5.51%
invalid gesture: 4.33%
swipe up: 1.67%

Summary

This article introduced a CNN-RNN video dynamic gesture recognition algorithm that combines the InceptionResNetV2 convolutional network with an LSTM recurrent network to efficiently recognize gestures such as swipe up, swipe down, swipe left, swipe right, open, and close.
• [AI Gallery] Dataset cannot be published after uploading data to AI Gallery
After uploading the dataset, clicking Publish gets no response. Could someone help me figure out what is going on?
• Quick notes on image enhancement
The core goals of image enhancement are to improve image quality (e.g., sharpness, contrast), highlight key information, or prepare data for downstream tasks (object detection, medical image analysis, remote-sensing interpretation). By technical principle, methods fall into three families: spatial-domain enhancement, frequency-domain enhancement, and deep-learning enhancement.

I. Spatial-domain enhancement: operating directly on pixels (the most basic and most widely used)

Spatial-domain methods modify pixel gray values or spatial positions directly, with no transform to another domain; they are computationally cheap and suit real-time scenarios (surveillance, phone photography).

1. Gray-level transforms: reshaping the gray distribution (contrast / brightness)
Each pixel's gray value is mapped through a function; the core idea is to optimize the dynamic range, fixing "too dark / overexposed" and "low contrast" problems.

(1) Linear transform (contrast stretching)
Principle: adjust gray levels with the linear function g(x,y) = a*f(x,y) + b, where a controls contrast (a > 1 boosts it, 0 < a < 1 reduces it) and b controls brightness (b > 0 brightens, b < 0 darkens).
Scenarios: images that are globally too dark or too bright (backlit photos, night-time surveillance frames).
Example: stretch the gray range [50, 200] to [0, 255] to bring out shadow detail.

(2) Nonlinear transform (gamma correction)
Principle: adjust with the power function g(x,y) = c*f(x,y)^γ, where γ is the key parameter:
γ < 1 boosts dark tones (good for underexposed images, such as dim indoor shots);
γ > 1 boosts bright tones (good for overexposed images, such as sunlit snow scenes).
Characteristics: matches the eye's nonlinear brightness perception better than a linear transform; phone cameras' "night mode" commonly relies on it.

(3) Histogram processing (equalization / matching)
Global histogram equalization (GHE): spreads a concentrated gray histogram toward a uniform one, maximizing dynamic range. Suited to low-contrast images with evenly distributed gray levels (foggy photos, medical X-rays). Drawback: it can overexpose locally (large dark regions get overstretched).
Adaptive histogram equalization (CLAHE): splits the image into tiles (e.g., 8×8) and equalizes each tile separately, avoiding global overexposure. Core scenario: medical imaging (CT, fundus photos), where local detail such as tumor edges must be preserved without blowing out the whole image.
Histogram matching (specification): reshapes an image's histogram toward a target histogram (e.g., that of a reference image), used for consistency correction (e.g., batch-processing frames from multiple surveillance cameras).

2. Spatial filtering: neighborhood-based local enhancement (denoising / sharpening)
A filter (convolution) kernel is convolved with the image to change the local gray distribution, either to denoise or to sharpen.

(1) Smoothing filters (mainly denoising)
The core idea is to average neighboring pixels, suppressing high-frequency noise (salt-and-pepper, Gaussian) at the cost of slightly blurred edges.
Mean filter: replaces the center pixel with the neighborhood mean; mediocre against Gaussian noise and blurs detail easily.
Gaussian filter: a weighted average with Gaussian weights (high at the center, low at the edges); handles Gaussian noise well, with controllable blur (larger sigma, more blur).
Median filter: replaces the center pixel with the neighborhood median; best against salt-and-pepper noise (black/white specks) and preserves edges (nonlinear, so it does not average across edges).
Scenarios: surveillance-video denoising (Gaussian filter), speck removal in old-photo restoration (median filter).

(2) Sharpening filters (edge emphasis)
The core idea is to amplify gray differences between neighbors, emphasizing high-frequency detail (edges, texture) and compensating for smoothing blur.
Laplacian filter: strengthens edges (text, object contours) from the gray difference between a pixel and its neighborhood, but amplifies noise (denoise first, then sharpen).
Sobel filter: computes horizontal and vertical edge gradients separately, so horizontal or vertical edges can be enhanced independently (roads and building outlines in remote-sensing imagery).
USM (unsharp masking): blur the image to form a "mask", subtract the mask from the original, and boost detail contrast; this is the core of Photoshop's Sharpen.

3. Geometric transforms: adjusting spatial layout (alignment / adaptation)
These leave gray values unchanged and only move pixel coordinates; they are "preprocessing-style" enhancement preparing for later tasks (object detection, image stitching).
Common types: translation (alignment after cropping), rotation (straightening tilted photos), scaling (by interpolation: bilinear for smoothness, bicubic to preserve detail), and affine transforms (perspective correction, e.g., deskewing a phone-scanned document).
Core scenarios: OCR (deskew the document first), drone remote-sensing mosaics (translate and scale to align multiple images).

II. Frequency-domain enhancement: global enhancement via frequency components (periodic-noise removal / global sharpening)

The image is transformed to the frequency domain via the Fourier transform, separating low-frequency components (overall contours, large uniform regions such as sky or walls) from high-frequency ones (detail, edges, noise). The spectrum is modified (bands kept or suppressed), then inverse-transformed back to the spatial domain.

1. Low-pass filtering (keep low frequencies; smooth and denoise)
Suppresses high-frequency noise while keeping low-frequency contours, similar to spatial smoothing but with more uniform global effect. Example: Gaussian low-pass filtering suppresses the high-frequency band and beats spatial mean filtering on global Gaussian noise; well suited to astronomical imagery (e.g., removing cosmic-ray noise from star-field photos).

2. High-pass filtering (keep high frequencies; global sharpening)
Suppresses low-frequency blur while keeping high-frequency detail, similar to spatial sharpening but more natural globally. Example: an ideal high-pass filter keeps the high band and enhances small targets in remote-sensing imagery (field boundaries, power lines), but produces "ringing" (light/dark fringes along edges); a Gaussian high-pass filter mitigates this.

3. Band-pass / band-stop filtering (targeted processing)
Band-pass: keep a specific frequency band (e.g., texture between the low and high bands); used to enhance vessel texture in medical images (vessels in fundus photos).
Band-stop: suppress a specific band; best for periodic noise. Horizontal or vertical stripes caused by power interference in surveillance cameras sit at fixed frequencies and can be suppressed precisely.
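Before moving on to learning-based methods, here is a minimal OpenCV sketch of two of the spatial-domain operations above, gamma correction followed by CLAHE; the input path is a hypothetical example, not from the original post:

```python
import cv2
import numpy as np

# Load a grayscale image (hypothetical path; substitute your own file).
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Gamma correction via a 256-entry lookup table: gamma < 1 brightens
# dark regions, gamma > 1 compresses highlights.
gamma = 0.6
lut = np.array([(i / 255.0) ** gamma * 255 for i in range(256)]).astype(np.uint8)
brightened = cv2.LUT(img, lut)

# CLAHE: equalize each 8x8 tile separately; clipLimit caps contrast
# amplification so large uniform regions are not blown out.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(brightened)

cv2.imwrite("enhanced.jpg", enhanced)
```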
III. Deep-learning enhancement: the modern approach (far better than traditional methods on complex scenes)

Traditional methods rely on hand-designed rules (filter kernels, gray-level functions) and have limited effect on complex problems (super-resolution, dehazing, medical detail enhancement); deep learning learns the enhancement rules from data and can handle more complex image degradations.

1. Super-resolution (SR): raising image resolution
The core task is to generate a high-resolution (HR) image from a low-resolution (LR) one, addressing blur and missing detail (enlarging old photos, sharpening surveillance footage). Classic models:
SRCNN (the first CNN-based SR model): three convolutional layers learn the LR-to-HR mapping; better than traditional (bicubic) interpolation but short on fine detail.
ESRGAN (GAN-based SR): adversarial training yields more realistic detail (hair, texture); it is the core of image-upscaling tools such as Topaz Gigapixel AI.
Real-ESRGAN: tuned for real-world degraded images (old photos, compression artifacts); joint deblurring plus SR, restoring far better than traditional methods.

2. Degradation removal (dehazing / denoising / deblurring)
Dehazing: traditional methods (e.g., the dark channel prior) depend on the atmospheric scattering model and struggle with dense haze; deep models (DehazeNet, GCA-Net) learn from large sets of hazy/clear image pairs and directly output dehazed images, handling dense and uneven haze (urban smog photos).
Denoising: traditional methods (e.g., BM3D) falter on complex noise (mixed Gaussian plus salt-and-pepper); deep models (DnCNN, RIDNet) learn the noise pattern via residual learning, preserving more detail while denoising (e.g., medical denoising without blurring tumor edges).
Deblurring: for motion blur (hand-shake photos) and defocus blur, deep models such as DeblurGAN can learn the blur kernel and invert it, outperforming traditional blind deblurring.

3. Specialized enhancement for medical / remote-sensing imagery
Medical: UNet++ to enhance lung-nodule edges in CT images, or attention mechanisms (e.g., ResUNet) to highlight nerve-fiber tracts in MRI, assisting diagnosis.
Remote sensing: SegSRNet for joint super-resolution and land-cover classification enhancement (e.g., separating farmland from buildings), improving interpretation accuracy.

4. Data augmentation (serving model training)
This family artificially generates diverse data; its goal is better generalization for deep models, not higher quality for a single image. Common methods:
Basic operations: random flips, rotations, crops, scaling (expanding the training set);
Advanced operations: color jitter (random brightness / contrast / saturation, simulating different lighting), MixUp (weighted blending of two images, improving robustness), CutMix (replacing a cropped region with a patch from another image, preserving local structure);
Core scenarios: object-detection and classification training (e.g., augmenting autonomous-driving datasets to simulate roads under different weather and lighting).

IV. Common scenarios and methods

| Scenario | Recommended methods | Why |
|---|---|---|
| Medical imaging (CT / fundus) | CLAHE, ResUNet, medical-specific denoisers | Local detail must be preserved; key structures must not be overexposed or blurred |
| Surveillance video / old-photo restoration | Median filter (specks), Real-ESRGAN (SR) | Handles salt-and-pepper noise plus low resolution; improves clarity |
| Hazy / low-light photos | Gamma correction, DehazeNet (dehazing), USM sharpening | Brightens shadows, removes haze, and emphasizes detail to restore the real scene |
| Remote-sensing interpretation | Gaussian high-pass (edge enhancement), SegSRNet (SR) | Emphasizes land-cover edges; improves recognition of small targets (e.g., power lines) |
| Deep-model training | Random flips, CutMix, color jitter | Expands dataset diversity; improves generalization |

To sum up: the choice of enhancement method should combine the task (denoise / sharpen / super-resolve), the degradation type (noise / blur / haze), and real-time requirements. Real-time scenarios (surveillance, phone photography) favor traditional spatial-domain methods (Gaussian filtering, CLAHE); complex scenarios (medical imaging, old-photo restoration) favor deep-learning methods (ESRGAN, medical-specific models); periodic noise (stripe interference) favors frequency-domain band-stop filtering. As AI technology develops, deep-learning enhancement is steadily replacing traditional methods as the first choice for complex scenes.
• [Moderator's Picks] [Collection] Top Q&A from the AI board
Top 10 AI questions:
1. Why are convolutional neural networks equally effective on one-dimensional signals (e.g., speech, EEG)?
https://bbs.huaweicloud.com/forum/thread-0223191003341365066-1-1.html
2. When post-processing detection results from sliced images, what are the differences between GREEDYNMM, NMM, NMS, and LSNMS?
https://bbs.huaweicloud.com/forum/thread-0228191663236259099-1-1.html
3. What high-performance model inference frameworks are there in deep learning?
https://bbs.huaweicloud.com/forum/thread-0223191663216494117-1-1.html
4. For irregularly shaped objects such as smoke, which works better: object detection or image segmentation?
https://bbs.huaweicloud.com/forum/thread-0294191663195690113-1-1.html
5. Are there object-tracking algorithms suited to edge devices that deliver both accuracy and speed?
https://bbs.huaweicloud.com/forum/thread-0293191663050060106-1-1.html
6. With the low-altitude economy taking off, should drone AI algorithms be deployed onboard, at the edge, or in the cloud, and what are the advantages of each?
https://bbs.huaweicloud.com/forum/thread-02114191663070338117-1-1.html
7. To get started with AI, should one learn Python first, or C++?
https://bbs.huaweicloud.com/forum/thread-0293191663097134107-1-1.html
8. Can Jetson-series dev boards be used to train YOLO models?
https://bbs.huaweicloud.com/forum/thread-0228191663116373098-1-1.html
9. Does a larger model input size increase GPU memory usage?
https://bbs.huaweicloud.com/forum/thread-02114191663139067118-1-1.html
10. How many TOPS of compute are needed, at minimum, for real-time object detection?
https://bbs.huaweicloud.com/forum/thread-02122191663160676103-1-1.html
• [Tech Share] Cat-Face Keypoint Detection (ModelBox)
Cat-Face Keypoint Detection (ModelBox)

I. Model training and conversion

ResNet50V2 is an improved deep convolutional network derived from the ResNet architecture. It uses pre-activation (moving BN and ReLU before the convolution) and identity mappings, improving information propagation and trainability. As a 50-layer network, ResNet50V2 is widely used in image classification, object detection, and similar tasks; it supports transfer learning, adapts quickly to new datasets, and offers good generalization and accuracy.

The training and conversion tutorial is available in AI Gallery, including the training data, training code, and model conversion script. After training in a ModelArts Notebook, convert the model to the format for your target platform: onnx for Windows devices, rknn for RK-series devices.

II. Application development

1. Create the project
Use create.bat in the ModelBox SDK directory to create the ResNet50V2 project:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n ResNet50V2
...
success: create ResNet50V2 in D:\modelbox-win10-x64-1.5.3\workspace

Among create.bat's parameters: -t sets the instance type, one of server (a ModelBox project), python (a Python flowunit), c++ (a C++ flowunit), or infer (an inference flowunit); -n sets the instance name; -s creates the project from the named template instead of an empty project.

2. Create the inference flowunit
Use create.bat in the ModelBox SDK directory to create the resnet50v2_infer inference flowunit:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t infer -n resnet50v2_infer -p ResNet50V2
...
success: create infer resnet50v2_infer in D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2/model/resnet50v2_infer

With -t infer the tool creates an inference flowunit; -n xxx_infer names the flowunit; -p assigns it to the ResNet50V2 application. Download the converted ResNet50V2.onnx model into the ResNet50V2\model directory and edit the inference flowunit's configuration file resnet50v2_infer.toml:

```toml
# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved.
[base]
name = "resnet50v2_infer"
device = "cpu"
version = "1.0.0"
description = "your description"
entry = "./ResNet50V2.onnx"  # model file path, use relative path
type = "inference"
virtual_type = "onnx"        # inference engine type: win10 now only supports onnx
group_type = "Inference"     # flowunit group attribution, do not change

# Input ports description
[input]
[input.input1]   # input port number, format is input.input[N]
name = "Input"   # input port name
type = "float"   # input port data type, e.g. float or uint8
device = "cpu"   # input buffer type: cpu, win10 now copies input from cpu

# Output ports description
[output]
[output.output1]  # output port number, format is output.output[N]
name = "Output"   # output port name
type = "float"    # output port data type, e.g. float or uint8
```

3. Create the post-processing flowunit
Use create.bat in the ModelBox SDK directory to create the resnet50v2_post flowunit:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t python -n resnet50v2_post -p ResNet50V2
...
success: create python resnet50v2_post in D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2/etc/flowunit/resnet50v2_post

With -t python the tool creates a generic (Python) flowunit; -n xxx_post names it; -p assigns it to the ResNet50V2 application.

a. Edit the configuration file
Our model has one input and one output, covering 9 cat-face keypoints in total:

```toml
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
# Basic config
[base]
name = "resnet50v2_post"     # The FlowUnit name
device = "cpu"               # The flowunit runs on cpu
version = "1.0.0"            # The version of the flowunit
type = "python"              # Fixed value, do not change
description = "description"  # The description of the flowunit
entry = "resnet50v2_post@resnet50v2_postFlowUnit"  # Python flowunit entry function
group_type = "Generic"       # flowunit group attribution: Input/Output/Image/Generic

# Flowunit Type
stream = false        # Whether the flowunit is a stream flowunit
condition = false     # Whether the flowunit is a condition flowunit
collapse = false      # Whether the flowunit is a collapse flowunit
collapse_all = false  # Whether the flowunit collapses all the data
expand = false        # Whether the flowunit is an expand flowunit

# The default Flowunit config
[config]
keypoints = 9

# Input ports description
[input]
[input.input1]    # Input port number, the format is input.input[N]
name = "in_feat"  # Input port name
type = "float"    # Input port type

# Output ports description
[output]
[output.output1]   # Output port number, the format is output.output[N]
name = "out_data"  # Output port name
type = "string"    # Output port type
```

b. Edit the logic code

```python
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import _flowunit as modelbox
import numpy as np
import json

class resnet50v2_postFlowUnit(modelbox.FlowUnit):
    # Derived from modelbox.FlowUnit
    def __init__(self):
        super().__init__()

    def open(self, config):
        # Open the flowunit to obtain configuration information
        self.params = {}
        self.params['keypoints'] = config.get_int('keypoints')
        return modelbox.Status.StatusCode.STATUS_SUCCESS

    def process(self, data_context):
        # Process the data
        in_data = data_context.input("in_feat")
        out_data = data_context.output("out_data")

        for buffer_feat in in_data:
            feat_data = np.array(buffer_feat.as_object(), copy=False)
            # flat feature vector -> list of (x, y) keypoint pairs
            keypoints = feat_data.reshape(-1, 2).tolist()
            result = {"keypoints": keypoints}
            result_str = json.dumps(result)
            out_buffer = modelbox.Buffer(self.get_bind_device(), result_str)
            out_data.push_back(out_buffer)
        return modelbox.Status.StatusCode.STATUS_SUCCESS

    def close(self):
        # Close the flowunit
        return modelbox.Status()

    def data_pre(self, data_context):
        # Before streaming data starts
        return modelbox.Status()

    def data_post(self, data_context):
        # After streaming data ends
        return modelbox.Status()

    def data_group_pre(self, data_context):
        # Before all streaming data starts
        return modelbox.Status()

    def data_group_post(self, data_context):
        # After all streaming data ends
        return modelbox.Status()
```
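Before moving on to the graph, it can help to sanity-check the exported ONNX model outside ModelBox. A small sketch not from the original post, assuming onnxruntime is installed and that the model takes NHWC input normalized to [0, 1], matching the resize and normalize steps configured below:

```python
import numpy as np
import onnxruntime as ort

# Load the exported model and query its input name.
sess = ort.InferenceSession("ResNet50V2.onnx")
input_name = sess.get_inputs()[0].name

# One dummy 224x224 RGB image with values in [0, 1), mirroring the
# graph's resize + normalize (1/255) preprocessing.
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)
out = sess.run(None, {input_name: dummy})[0]
print(out.shape)  # expected: (1, 18), i.e. 9 (x, y) keypoint pairs
```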
4. Edit the application graph
The ResNet50V2 project's graph directory holds the flow graphs; the default graph ResNet50V2.toml shares the project's name:

```toml
# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved.
[driver]
dir = ["${HILENS_APP_ROOT}/etc/flowunit",
       "${HILENS_APP_ROOT}/etc/flowunit/cpp",
       "${HILENS_APP_ROOT}/model",
       "${HILENS_MB_SDK_PATH}/flowunit"]
skip-default = true

[profile]
profile = false
trace = false
dir = "${HILENS_DATA_DIR}/mb_profile"

[graph]
format = "graphviz"
graphconf = """digraph ResNet50V2 {
    node [shape=Mrecord]
    queue_size = 4
    batch_size = 1

    input1[type=input, flowunit=input, device=cpu, deviceid=0]
    httpserver_sync_receive[type=flowunit, flowunit=httpserver_sync_receive_v2, device=cpu, deviceid=0, time_out_ms=5000, endpoint="http://0.0.0.0:1234/v1/ResNet50V2", max_requests=100]
    image_decoder[type=flowunit, flowunit=image_decoder, device=cpu, key="image_base64", queue_size=4]
    image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=224, image_height=224]
    normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="0.003921568627450,0.003921568627450,0.003921568627450"]
    resnet50v2_infer[type=flowunit, flowunit=resnet50v2_infer, device=cpu, deviceid=0, batch_size=1]
    resnet50v2_post[type=flowunit, flowunit=resnet50v2_post, device=cpu, deviceid=0]
    httpserver_sync_reply[type=flowunit, flowunit=httpserver_sync_reply_v2, device=cpu, deviceid=0]

    input1:input -> httpserver_sync_receive:in_url
    httpserver_sync_receive:out_request_info -> image_decoder:in_encoded_image
    image_decoder:out_image -> image_resize:in_image
    image_resize:out_image -> normalize:in_data
    normalize:out_data -> resnet50v2_infer:Input
    resnet50v2_infer:Output -> resnet50v2_post:in_feat
    resnet50v2_post:out_data -> httpserver_sync_reply:in_reply_info
}"""

[flow]
desc = "ResNet50V2 run in modelbox-win10-x64"
```

Run .\create.bat -t editor to open the ModelBox visual graph editor, where the project's flow graph can be edited and viewed live:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor

5. Run the application
Execute .\bin\main.bat in the ResNet50V2 project directory:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2
PS D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2> .\bin\main.bat

Create a test_http.py test script in the ResNet50V2 project's data directory:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
import os
import cv2
import json
import base64
import http.client

class HttpConfig:
    '''Parameter configuration for the HTTP call'''
    def __init__(self, host_ip, port, url, img_base64_str):
        self.hostIP = host_ip
        self.Port = port
        self.httpMethod = "POST"
        self.requstURL = url
        self.headerdata = {"Content-Type": "application/json"}
        self.test_data = {"image_base64": img_base64_str}
        self.body = json.dumps(self.test_data)

def read_image(img_path):
    '''Read an image and encode it as a base64 string'''
    img_data = cv2.imread(img_path)
    img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB)
    img_str = cv2.imencode('.jpg', img_data)[1].tobytes()
    img_bin = base64.b64encode(img_str)
    img_base64_str = str(img_bin, encoding='utf8')
    return img_data, img_base64_str

def test_image(img_path, ip, port, url):
    '''Single-image test'''
    img_data, img_base64_str = read_image(img_path)
    http_config = HttpConfig(ip, port, url, img_base64_str)

    conn = http.client.HTTPConnection(host=http_config.hostIP, port=http_config.Port)
    conn.request(method=http_config.httpMethod, url=http_config.requstURL,
                 body=http_config.body, headers=http_config.headerdata)
    response = conn.getresponse().read().decode()
    print('response: ', response)

    result = json.loads(response)
    w, h = img_data.shape[1], img_data.shape[0]
    # keypoints are normalized to [0, 1]; scale them to pixel coordinates
    for x, y in result["keypoints"]:
        if x > 0 and y > 0:
            cv2.circle(img_data, (int(x * w), int(y * h)), 5, (0, 255, 0), -1)
    cv2.imwrite('./result-' + os.path.basename(img_path), img_data[..., ::-1])

if __name__ == "__main__":
    port = 1234
    ip = "127.0.0.1"
    url = "/v1/ResNet50V2"
    img_folder = './test_imgs'
    file_list = os.listdir(img_folder)
    for img_file in file_list:
        print("\n================ {} ================".format(img_file))
        img_path = os.path.join(img_folder, img_file)
        test_image(img_path, ip, port, url)
```

Create a test_imgs folder in the ResNet50V2 project's data directory to hold the test images, then, in another terminal, run test_http.py from the data directory to send HTTP test requests:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2\data
PS D:\modelbox-win10-x64-1.5.3\workspace\ResNet50V2\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe .\test_http.py

================ 2256.jpg ================
response:  {"keypoints": [[0.19147011637687683, 0.26770520210266113], [0.29639703035354614, 0.26533427834510803], [0.24554343521595, 0.35762542486190796], [0.11009970307350159, 0.2090619057416916], [0.08408773690462112, 0.09547536075115204], [0.17451311647891998, 0.169035404920578], [0.2880205512046814, 0.168979212641716], [0.3739408254623413, 0.0717596635222435], [0.34669068455696106, 0.20229394733905792]]}

================ 6899.jpg ================
response:  {"keypoints": [[0.3829421401023865, 0.41393953561782837], [0.47102952003479004, 0.42683106660842896], [0.4321300983428955, 0.5082458853721619], [0.3185971677303314, 0.36286458373069763], [0.33502572774887085, 0.2243150770664215], [0.3852037489414215, 0.29658034443855286], [0.4819968640804291, 0.30954840779304504], [0.5504774451255798, 0.2711380124092102], [0.5290539264678955, 0.3962092399597168]]}

The annotated results for the test images can then be viewed in the ResNet50V2 project's data directory.

III. Summary
This section showed how to use ModelArts and ModelBox to train and develop a ResNet50V2 cat-face keypoint detection AI application: with just a model file and some simple configuration, we get an HTTP service. Along the way we covered the basic structure of ResNet50V2, its data processing and training, and the logic of the corresponding inference application.

----Adapted from the blog: https://bbs.huaweicloud.com/blogs/451999
• [Tech Share] Fruit and Vegetable Pest/Disease Segmentation (ModelBox)
Fruit and Vegetable Pest/Disease Segmentation (ModelBox)

I. Model training and conversion

FCN (Fully Convolutional Networks) is a deep-learning architecture for semantic segmentation. It introduced the skip architecture, fusing shallow and deep feature maps to retain more detail and improve segmentation accuracy. FCN also aggregates multi-scale context, capturing features at different levels and strengthening recognition of targets of different sizes. Its success propelled the semantic-segmentation field and it underpins many later models.

The training and conversion tutorial is available in AI Gallery, including the training data, training code, and model conversion script. After training in a ModelArts Notebook, convert the model to the format for your target platform: onnx for Windows devices, rknn for RK-series devices.

II. Application development

1. Create the project
Use create.bat in the ModelBox SDK directory to create the FCN project:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n FCN
...
success: create FCN in D:\modelbox-win10-x64-1.5.3\workspace

Among create.bat's parameters: -t sets the instance type, one of server (a ModelBox project), python (a Python flowunit), c++ (a C++ flowunit), or infer (an inference flowunit); -n sets the instance name; -s creates the project from the named template instead of an empty project.

2. Create the inference flowunit
Use create.bat to create the fcn_infer inference flowunit:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t infer -n fcn_infer -p FCN
...
success: create infer fcn_infer in D:\modelbox-win10-x64-1.5.3\workspace\FCN/model/fcn_infer

With -t infer the tool creates an inference flowunit; -n xxx_infer names it; -p assigns it to the FCN application. Download the converted FCN.onnx model into the FCN\model directory and edit the inference flowunit's configuration file fcn_infer.toml:

```toml
# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved.
[base]
name = "fcn_infer"
device = "cpu"
version = "1.0.0"
description = "your description"
entry = "./FCN.onnx"      # model file path, use relative path
type = "inference"
virtual_type = "onnx"     # inference engine type: win10 now only supports onnx
group_type = "Inference"  # flowunit group attribution, do not change

# Input ports description
[input]
[input.input1]   # input port number, format is input.input[N]
name = "Input"   # input port name
type = "float"   # input port data type, e.g. float or uint8
device = "cpu"   # input buffer type: cpu, win10 now copies input from cpu

# Output ports description
[output]
[output.output1]  # output port number, format is output.output[N]
name = "Output"   # output port name
type = "float"    # output port data type, e.g. float or uint8
```

3. Create the post-processing flowunit
Use create.bat to create the fcn_post flowunit:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t python -n fcn_post -p FCN
...
success: create python fcn_post in D:\modelbox-win10-x64-1.5.3\workspace\FCN/etc/flowunit/fcn_post

With -t python the tool creates a generic (Python) flowunit; -n xxx_post names it; -p assigns it to the FCN application.

a. Edit the configuration file
Our model has one input and one output and segments 116 fruit/vegetable pest and disease classes, which makes 117 classes including the background:

```toml
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
# Basic config
[base]
name = "fcn_post"            # The FlowUnit name
device = "cpu"               # The flowunit runs on cpu
version = "1.0.0"            # The version of the flowunit
type = "python"              # Fixed value, do not change
description = "description"  # The description of the flowunit
entry = "fcn_post@fcn_postFlowUnit"  # Python flowunit entry function
group_type = "Generic"       # flowunit group attribution: Input/Output/Image/Generic

# Flowunit Type
stream = false        # Whether the flowunit is a stream flowunit
condition = false     # Whether the flowunit is a condition flowunit
collapse = false      # Whether the flowunit is a collapse flowunit
collapse_all = false  # Whether the flowunit collapses all the data
expand = false        # Whether the flowunit is an expand flowunit

# The default Flowunit config
[config]
num_classes = 117
net_w = 224
net_h = 224

# Input ports description
[input]
[input.input1]     # Input port number, the format is input.input[N]
name = "in_image"  # Input port name
type = "uint8"     # Input port type

[input.input2]     # Input port number, the format is input.input[N]
name = "in_feat"   # Input port name
type = "float"     # Input port type

# Output ports description
[output]
[output.output1]    # Output port number, the format is output.output[N]
name = "out_image"  # Output port name
type = "uint8"      # Output port type
```

b. Edit the logic code

```python
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import _flowunit as modelbox
import numpy as np
import cv2

class fcn_postFlowUnit(modelbox.FlowUnit):
    # Derived from modelbox.FlowUnit
    def __init__(self):
        super().__init__()

    def open(self, config):
        # Open the flowunit to obtain configuration information
        self.params = {}
        self.params['num_classes'] = config.get_int('num_classes')
        self.params['net_w'] = config.get_int('net_w')
        self.params['net_h'] = config.get_int('net_h')
        return modelbox.Status.StatusCode.STATUS_SUCCESS

    def process(self, data_context):
        # Process the data
        in_image = data_context.input("in_image")
        in_feat = data_context.input("in_feat")
        out_image = data_context.output("out_image")

        for buffer_image, buffer_feat in zip(in_image, in_feat):
            channel = buffer_image.get('channel')
            width = buffer_image.get('width')
            height = buffer_image.get('height')

            image = np.array(buffer_image.as_object(), dtype=np.uint8, copy=False)
            image = image.reshape(height, width, channel)
            feat = np.array(buffer_feat.as_object(), dtype=np.float32, copy=False)
            feat = feat.reshape(self.params['net_h'], self.params['net_w'],
                                self.params['num_classes'])

            # per-pixel argmax over classes, resized back to the input size
            mask = np.argmax(feat, axis=-1).astype(np.uint8)
            mask = cv2.resize(mask, (width, height), interpolation=cv2.INTER_NEAREST)

            # colorize each non-background class and blend with the input image
            overlay = np.zeros_like(image)
            for i in range(1, self.params['num_classes']):
                color = np.random.randint(0, 255, (3,)).tolist()
                overlay[mask == i] = color
            result_image = cv2.addWeighted(image[..., ::-1], 0.5, overlay, 0.5, 0)

            add_buffer = modelbox.Buffer(self.get_bind_device(), result_image)
            add_buffer.copy_meta(buffer_image)
            out_image.push_back(add_buffer)
        return modelbox.Status.StatusCode.STATUS_SUCCESS

    def close(self):
        # Close the flowunit
        return modelbox.Status()

    def data_pre(self, data_context):
        # Before streaming data starts
        return modelbox.Status()

    def data_post(self, data_context):
        # After streaming data ends
        return modelbox.Status()

    def data_group_pre(self, data_context):
        # Before all streaming data starts
        return modelbox.Status()

    def data_group_post(self, data_context):
        # After all streaming data ends
        return modelbox.Status()
```
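As a small usage sketch (not from the original post): once the argmax mask from the code above is available, per-class pixel statistics, such as the fraction of the image covered by each disease class, fall out directly:

```python
import numpy as np

def class_pixel_ratios(mask: np.ndarray) -> dict:
    # `mask` is the uint8 argmax output, values in [0, 116], 0 = background.
    classes, counts = np.unique(mask, return_counts=True)
    return {int(c): float(n) / mask.size for c, n in zip(classes, counts)}
```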
4. Edit the application graph
The FCN project's graph directory holds the flow graphs; the default graph FCN.toml shares the project's name:

```toml
# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved.
[driver]
dir = ["${HILENS_APP_ROOT}/etc/flowunit",
       "${HILENS_APP_ROOT}/etc/flowunit/cpp",
       "${HILENS_APP_ROOT}/model",
       "${HILENS_MB_SDK_PATH}/flowunit"]
skip-default = true

[profile]
profile = false
trace = false
dir = "${HILENS_DATA_DIR}/mb_profile"

[graph]
format = "graphviz"
graphconf = """digraph FCN {
    node [shape=Mrecord]
    queue_size = 4
    batch_size = 1

    input1[type=input, flowunit=input, device=cpu, deviceid=0]
    httpserver_sync_receive[type=flowunit, flowunit=httpserver_sync_receive_v2, device=cpu, deviceid=0, time_out_ms=5000, endpoint="http://0.0.0.0:1234/v1/FCN", max_requests=100]
    image_decoder[type=flowunit, flowunit=image_decoder, device=cpu, key="image_base64", queue_size=4]
    image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=224, image_height=224]
    normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="0.003921568627450,0.003921568627450,0.003921568627450"]
    fcn_infer[type=flowunit, flowunit=fcn_infer, device=cpu, deviceid=0, batch_size=1]
    fcn_post[type=flowunit, flowunit=fcn_post, device=cpu, deviceid=0]
    httpserver_sync_reply[type=flowunit, flowunit=httpserver_sync_reply_v2, device=cpu, deviceid=0]

    input1:input -> httpserver_sync_receive:in_url
    httpserver_sync_receive:out_request_info -> image_decoder:in_encoded_image
    image_decoder:out_image -> image_resize:in_image
    image_resize:out_image -> normalize:in_data
    normalize:out_data -> fcn_infer:Input
    image_decoder:out_image -> fcn_post:in_image
    fcn_infer:Output -> fcn_post:in_feat
    fcn_post:out_image -> httpserver_sync_reply:in_reply_info
}"""

[flow]
desc = "FCN run in modelbox-win10-x64"
```

Run .\create.bat -t editor to open the ModelBox visual graph editor, where the project's flow graph can be edited and viewed live:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor

5. Run the application
Execute .\bin\main.bat in the FCN project directory:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\FCN
PS D:\modelbox-win10-x64-1.5.3\workspace\FCN> .\bin\main.bat

Create a test_http.py test script in the FCN project's data directory:

```python
import cv2
import json
import base64
import requests
import numpy as np

if __name__ == "__main__":
    port = 1234
    ip = "127.0.0.1"
    url = "/v1/FCN"

    img_path = "apple_black_rot_google_0056.jpg"
    img_data = cv2.imread(img_path)
    img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB)
    img_str = cv2.imencode('.jpg', img_data)[1].tobytes()
    img = base64.b64encode(img_str)
    img_base64_str = str(img, encoding='utf8')

    params = {"image_base64": img_base64_str}
    response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params),
                             headers={"Content-Type": "application/json"})

    # the reply body carries the raw blended image bytes; reshape to H x W x C
    h, w, c = img_data.shape
    img_array = np.frombuffer(response.content, np.uint8)
    img_array = img_array.reshape((h, -1, c))
    cv2.imwrite("res.jpg", img_array)
```

Put a test image in the FCN project's data directory, then, in another terminal, enter the data folder:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\FCN\data

First install the requests dependency:

PS D:\modelbox-win10-x64-1.5.3\workspace\FCN\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe -m pip install requests

Then run the test_http.py script to send an HTTP test request:

PS D:\modelbox-win10-x64-1.5.3\workspace\FCN\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe .\test_http.py

The segmentation result res.jpg is saved in the FCN project's data directory.

III. Summary
This section showed how to use ModelArts and ModelBox to train and develop an FCN fruit/vegetable pest and disease segmentation AI application: with just a model file and some simple configuration, we get an HTTP service. Along the way we covered the basic structure of FCN, its data processing and training, and the logic of the corresponding inference application.

----Adapted from the blog: https://bbs.huaweicloud.com/blogs/449045
• [Tech Share] Deep-Sea Fish Detection (ModelBox)
Deep-Sea Fish Detection (ModelBox)

I. Model training and conversion

YOLOX is an optimized version of the YOLO series. It adopts strong recent advances from the object-detection literature, including a decoupled head, data augmentation, an anchor-free design, and improved label assignment, delivering good accuracy while remaining friendly to engineering deployment.

The training and conversion tutorial is available in AI Gallery, including the training data, training code, and model conversion script. After training in a ModelArts Notebook, convert the model to the format for your target platform: onnx for Windows devices, rknn for RK-series devices.

II. ModelBox application development

1. Create the project
Use create.bat in the ModelBox SDK directory to create the fish_det project from the car_det template:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n fish_det -s car_det
...
success: create fish_det in D:\modelbox-win10-x64-1.5.3\workspace

Among create.bat's parameters: -t sets the instance type, one of server (a ModelBox project), python (a Python flowunit), c++ (a C++ flowunit), or infer (an inference flowunit); -n sets the instance name; -s creates the project from the named template instead of an empty project.

2. Modify the inference flowunit
Download the converted yolox_fish.onnx model into the fish_det\model directory and edit the inference flowunit's configuration file yolox_infer.toml:

```toml
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
[base]
name = "yolox_infer"
device = "cpu"
version = "1.0.0"
description = "fish detection"
entry = "./yolox_fish.onnx"  # model file path, use relative path
type = "inference"
virtual_type = "onnx"        # inference engine type: win10 now only supports onnx
group_type = "Inference"     # flowunit group attribution, do not change

# input port description, supports multiple input ports
[input]
[input.input1]
name = "input"
type = "float"
device = "cpu"

# output port description, supports multiple output ports
[output]
[output.output1]
name = "output"
type = "float"
```

3. Modify the post-processing flowunit
Our model's input size is 320 and the class count is 1; edit the yolox_post.toml configuration file under fish_det\etc\flowunit\yolox_post:

```toml
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
# Basic config
[base]
name = "yolox_post"          # The FlowUnit name
device = "cpu"               # The device the flowunit runs on: cpu, cuda, ascend
version = "1.0.0"            # The version of the flowunit
description = "description"  # The description of the flowunit
entry = "yolox_post@yolox_postFlowUnit"  # Python flowunit entry function
type = "python"              # Fixed value
group_type = "Generic"       # flowunit group attribution: Input/Output/Image/Generic

# Flowunit Type
stream = false        # Whether the flowunit is a stream flowunit
condition = false     # Whether the flowunit is a condition flowunit
collapse = false      # Whether the flowunit is a collapse flowunit
collapse_all = false  # Whether the flowunit collapses all the data
expand = false        # Whether the flowunit is an expand flowunit

[config]
net_h = 320
net_w = 320
num_classes = 1
strides = ['8', '16', '32']
conf_threshold = 0.25
iou_threshold = 0.45

[input]
[input.input1]
name = "in_feat"
type = "float"

[output]
[output.output1]
name = "out_data"
type = "string"
```
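The conf_threshold and iou_threshold settings above drive the usual score filtering and non-maximum suppression. For reference, a minimal single-class NMS sketch (not the flowunit's actual code) showing how iou_threshold is typically applied; boxes are (x1, y1, x2, y2):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    # Greedy NMS: keep the highest-scoring box, drop overlapping ones.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the kept box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # keep only boxes whose overlap with the kept box is small enough
        order = order[1:][iou <= iou_threshold]
    return keep
```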
4. Modify the drawing flowunit
There is only one class here, so set coco_car_labels = [0] to keep only the fish class:

```python
...
    def decode_car_bboxes(self, bbox_str, input_shape):
        try:
            coco_car_labels = [0]  # fish
            det_result = json.loads(bbox_str)['det_result']
            if (det_result == "None"):
                return []
            bboxes = json.loads(det_result)
            car_bboxes = list(filter(lambda x: int(x[5]) in coco_car_labels, bboxes))
        except Exception as ex:
            modelbox.error(str(ex))
            return []
        else:
            # scale normalized box coordinates back to the frame size
            for bbox in car_bboxes:
                bbox[0] = int(bbox[0] * input_shape[1])
                bbox[1] = int(bbox[1] * input_shape[0])
                bbox[2] = int(bbox[2] * input_shape[1])
                bbox[3] = int(bbox[3] * input_shape[0])
            return car_bboxes
...
```

5. Edit the application graph
Set the image_resize preprocessing flowunit's parameters image_width=320, image_height=320 to match the model's input size:

```toml
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
[driver]
dir = ["${HILENS_APP_ROOT}/etc/flowunit",
       "${HILENS_APP_ROOT}/etc/flowunit/cpp",
       "${HILENS_APP_ROOT}/model",
       "${HILENS_MB_SDK_PATH}/flowunit"]
skip-default = true

[profile]
profile = false
trace = false
dir = "${HILENS_DATA_DIR}/mb_profile"

[graph]
format = "graphviz"
graphconf = """digraph fish_det {
    node [shape=Mrecord]
    queue_size = 1
    batch_size = 1

    input1[type=input, flowunit=input, device=cpu, deviceid=0]
    data_source_parser[type=flowunit, flowunit=data_source_parser, device=cpu, deviceid=0]
    video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu, deviceid=0]
    video_decoder[type=flowunit, flowunit=video_decoder, device=cpu, deviceid=0, pix_fmt=bgr]
    image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=320, image_height=320]
    image_transpose[type=flowunit, flowunit=packed_planar_transpose, device=cpu, deviceid=0]
    normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="1,1,1"]
    car_detection[type=flowunit, flowunit=yolox_infer, device=cpu, deviceid=0, batch_size=1]
    yolox_post[type=flowunit, flowunit=yolox_post, device=cpu, deviceid=0]
    draw_car_bbox[type=flowunit, flowunit=draw_car_bbox, device=cpu, deviceid=0]
    video_out[type=flowunit, flowunit=video_out, device=cpu, deviceid=0]

    input1:input -> data_source_parser:in_data
    data_source_parser:out_video_url -> video_demuxer:in_video_url
    video_demuxer:out_video_packet -> video_decoder:in_video_packet
    video_decoder:out_video_frame -> image_resize:in_image
    image_resize:out_image -> image_transpose:in_image
    image_transpose:out_image -> normalize:in_data
    normalize:out_data -> car_detection:input
    car_detection:output -> yolox_post:in_feat
    video_decoder:out_video_frame -> draw_car_bbox:in_image
    yolox_post:out_data -> draw_car_bbox:in_bbox
    draw_car_bbox:out_image -> video_out:in_video_frame
}"""

[flow]
desc = "fish_det run in modelbox-win10-x64"
```

Run .\create.bat -t editor to open the ModelBox visual graph editor, where the project's flow graph can be edited and viewed live:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor

6. Configure the application's input and output
Download the test video into the fish_det\data directory and edit the fish_det\bin\mock_task.toml configuration file:

```toml
# For local mock file-reading tasks; the startup script already sets the IVA_SVC_CONFIG environment variable to this file's path.
########### Use Linux-style paths: on Windows write D:/xxx/xxx, not D:\xxx\xxx ###########
# The task parameter is a compressed, escaped JSON string
# (double quotes must be escaped if written inline; a JSON file can also be supplied via content_file)
[common]
content = "{\"param_str\":\"string param\",\"param_int\":10,\"param_float\":10.5}"

# Task input. Mock mode currently supports a single rtsp stream or a local url:
# 1. rtsp camera or stream: type="rtsp", url="rtsp://xxx.xxx" (rtsp input auto-reconnects if the video is interrupted)
# 2. built-in or USB camera: type="url", url="camera index, e.g. 0 or 1" (use with the local_camera flowunit)
# 3. local video file: type="url", url="video file path" (relative to this mock_task.toml; paths under ${HILENS_APP_ROOT} are also supported)
# 4. http service: type="url", url="http://xxx.xxx" (the task runs as an http service; fill in the exposed address and use the httpserver flowunits)
[input]
type = "url"
url = "${HILENS_APP_ROOT}/data/Test_ROV_video_h264_decim.mp4"

# Task output. Currently supported:
# 1. rtsp stream: type="local", url="rtsp://xxx.xxx"
# 2. local screen: type="local", url="0:xxx" (a display and desktop environment are required)
# 3. local video file: type="local", url="video file path" (relative to this mock_task.toml; the ${HILENS_DATA_DIR} directory and subdirectories are also supported)
# 4. http service: type="webhook", url="http://xxx.xxx" (task results are posted to the given http service)
[output]
type = "local"
url = "0"
```
7. Run the application
Execute .\bin\main.bat in the fish_det project directory; a window with real-time fish detection pops up on the local screen:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\fish_det
PS D:\modelbox-win10-x64-1.5.3\workspace\fish_det> .\bin\main.bat

III. Summary
This section showed how to use ModelArts and ModelBox to train and develop a YOLOX fish detection AI application: with just the model and the corresponding toml configuration, the model can be deployed quickly for efficient inference.

----Adapted from the blog: https://bbs.huaweicloud.com/blogs/449038
• [Tech Share] Animal Classification (ModelBox)
Animal Classification (ModelBox)

I. Model training and conversion

Inception V3 is an improved version of GoogLeNet that uses Inception modules and global average pooling. One of V3's most important changes is factorization: the 7x7 convolution is decomposed into two one-dimensional convolutions (1x7 and 7x1), and likewise 3x3 into (1x3 and 3x1). This both speeds up computation (the spare capacity can be spent on a deeper network) and splits one conv into two, further increasing network depth and nonlinearity.

The training and conversion tutorial is available in AI Gallery, including the training data, training code, and model conversion script. After training in a ModelArts Notebook, convert the model to the format for your target platform: onnx for Windows devices, rknn for RK-series devices.

II. ModelBox application development

1. Create the project
Use create.bat in the ModelBox SDK directory to create the InceptionV3 project:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t server -n InceptionV3
...
success: create InceptionV3 in D:\modelbox-win10-x64-1.5.3\workspace

Among create.bat's parameters: -t sets the instance type, one of server (a ModelBox project), python (a Python flowunit), c++ (a C++ flowunit), or infer (an inference flowunit); -n sets the instance name; -s creates the project from the named template instead of an empty project.

2. Create the inference flowunit
Use create.bat to create the inceptionv3_infer inference flowunit:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t infer -n inceptionv3_infer -p InceptionV3
...
success: create infer inceptionv3_infer in D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3/model/inceptionv3_infer

With -t infer the tool creates an inference flowunit; -n xxx_infer names it; -p assigns it to the InceptionV3 application. Download the converted InceptionV3.onnx model into the InceptionV3\model directory and edit the inference flowunit's configuration file inceptionv3_infer.toml:

```toml
# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved.
[base]
name = "inceptionv3_infer"
device = "cpu"
version = "1.0.0"
description = "your description"
entry = "./InceptionV3.onnx"  # model file path, use relative path
type = "inference"
virtual_type = "onnx"         # inference engine type: win10 now only supports onnx
group_type = "Inference"      # flowunit group attribution, do not change

# Input ports description
[input]
[input.input1]   # input port number, format is input.input[N]
name = "Input"   # input port name
type = "float"   # input port data type, e.g. float or uint8
device = "cpu"   # input buffer type: cpu, win10 now copies input from cpu

# Output ports description
[output]
[output.output1]  # output port number, format is output.output[N]
name = "Output"   # output port name
type = "float"    # output port data type, e.g. float or uint8
```

3. Create the post-processing flowunit
Use create.bat to create the inceptionv3_post flowunit:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t python -n inceptionv3_post -p InceptionV3
...
success: create python inceptionv3_post in D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3/etc/flowunit/inceptionv3_post

With -t python the tool creates a generic (Python) flowunit; -n xxx_post names it; -p assigns it to the InceptionV3 application.

a. Edit the configuration file
Our model has one input and one output, covering 90 animal classes in total:

```toml
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
# Basic config
[base]
name = "inceptionv3_post"    # The FlowUnit name
device = "cpu"               # The flowunit runs on cpu
version = "1.0.0"            # The version of the flowunit
type = "python"              # Fixed value, do not change
description = "description"  # The description of the flowunit
entry = "inceptionv3_post@inceptionv3_postFlowUnit"  # Python flowunit entry function
group_type = "Generic"       # flowunit group attribution: Input/Output/Image/Generic

# Flowunit Type
stream = false        # Whether the flowunit is a stream flowunit
condition = false     # Whether the flowunit is a condition flowunit
collapse = false      # Whether the flowunit is a collapse flowunit
collapse_all = false  # Whether the flowunit collapses all the data
expand = false        # Whether the flowunit is an expand flowunit

# The default Flowunit config
[config]
num_classes = 90

# Input ports description
[input]
[input.input1]    # Input port number, the format is input.input[N]
name = "in_feat"  # Input port name
type = "float"    # Input port type

# Output ports description
[output]
[output.output1]   # Output port number, the format is output.output[N]
name = "out_data"  # Output port name
type = "string"    # Output port type
```

b. Edit the logic code

```python
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import _flowunit as modelbox
import numpy as np
import json

class inceptionv3_postFlowUnit(modelbox.FlowUnit):
    # Derived from modelbox.FlowUnit
    def __init__(self):
        super().__init__()

    def open(self, config):
        # Open the flowunit to obtain configuration information
        self.params = {}
        self.params['num_classes'] = config.get_int('num_classes')
        return modelbox.Status.StatusCode.STATUS_SUCCESS

    def process(self, data_context):
        # Process the data
        in_feat = data_context.input("in_feat")
        out_data = data_context.output("out_data")

        for buffer_feat in in_feat:
            feat_data = np.array(buffer_feat.as_object(), copy=False)
            # argmax class index and its probability
            clsse = np.argmax(feat_data).astype(np.int32).item()
            score = feat_data[clsse].astype(np.float32).item()
            result = {"clsse": clsse, "score": score}
            result_str = json.dumps(result)
            out_buffer = modelbox.Buffer(self.get_bind_device(), result_str)
            out_data.push_back(out_buffer)
        return modelbox.Status.StatusCode.STATUS_SUCCESS

    def close(self):
        # Close the flowunit
        return modelbox.Status()

    def data_pre(self, data_context):
        # Before streaming data starts
        return modelbox.Status()

    def data_post(self, data_context):
        # After streaming data ends
        return modelbox.Status()

    def data_group_pre(self, data_context):
        # Before all streaming data starts
        return modelbox.Status()

    def data_group_post(self, data_context):
        # After all streaming data ends
        return modelbox.Status()
```
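The flowunit above returns only the argmax class. If top-k results are wanted instead, a small extension sketch, under the assumption that feat_data already holds softmax probabilities (the near-1.0 scores later in this post suggest it does):

```python
import numpy as np

def top_k(feat_data: np.ndarray, k: int = 5):
    # Indices of the k largest probabilities, highest first,
    # paired with their scores.
    idx = np.argsort(feat_data)[::-1][:k]
    return [(int(i), float(feat_data[i])) for i in idx]
```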
4. Edit the application graph
The InceptionV3 project's graph directory holds the flow graphs; the default graph InceptionV3.toml shares the project's name:

```toml
# Copyright (C) 2020 Huawei Technologies Co., Ltd. All rights reserved.
[driver]
dir = ["${HILENS_APP_ROOT}/etc/flowunit",
       "${HILENS_APP_ROOT}/etc/flowunit/cpp",
       "${HILENS_APP_ROOT}/model",
       "${HILENS_MB_SDK_PATH}/flowunit"]
skip-default = true

[profile]
profile = false
trace = false
dir = "${HILENS_DATA_DIR}/mb_profile"

[graph]
format = "graphviz"
graphconf = """digraph InceptionV3 {
    node [shape=Mrecord]
    queue_size = 4
    batch_size = 1

    input1[type=input, flowunit=input, device=cpu, deviceid=0]
    httpserver_sync_receive[type=flowunit, flowunit=httpserver_sync_receive_v2, device=cpu, deviceid=0, time_out_ms=5000, endpoint="http://0.0.0.0:1234/v1/InceptionV3", max_requests=100]
    image_decoder[type=flowunit, flowunit=image_decoder, device=cpu, key="image_base64", queue_size=4]
    image_resize[type=flowunit, flowunit=resize, device=cpu, deviceid=0, image_width=224, image_height=224]
    normalize[type=flowunit, flowunit=normalize, device=cpu, deviceid=0, standard_deviation_inverse="0.003921568627450,0.003921568627450,0.003921568627450"]
    inceptionv3_infer[type=flowunit, flowunit=inceptionv3_infer, device=cpu, deviceid=0, batch_size=1]
    inceptionv3_post[type=flowunit, flowunit=inceptionv3_post, device=cpu, deviceid=0]
    httpserver_sync_reply[type=flowunit, flowunit=httpserver_sync_reply_v2, device=cpu, deviceid=0]

    input1:input -> httpserver_sync_receive:in_url
    httpserver_sync_receive:out_request_info -> image_decoder:in_encoded_image
    image_decoder:out_image -> image_resize:in_image
    image_resize:out_image -> normalize:in_data
    normalize:out_data -> inceptionv3_infer:Input
    inceptionv3_infer:Output -> inceptionv3_post:in_feat
    inceptionv3_post:out_data -> httpserver_sync_reply:in_reply_info
}"""

[flow]
desc = "InceptionV3 run in modelbox-win10-x64"
```

Run .\create.bat -t editor to open the ModelBox visual graph editor, where the project's flow graph can be edited and viewed live:

PS D:\modelbox-win10-x64-1.5.3> .\create.bat -t editor

5. Run the application
Execute .\bin\main.bat in the InceptionV3 project directory:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3
PS D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3> .\bin\main.bat

Create a test_http.py test script in the InceptionV3 project's data directory:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) Huawei Technologies Co., Ltd. 2022. All rights reserved.
import os
import cv2
import json
import base64
import http.client

class HttpConfig:
    '''Parameter configuration for the HTTP call'''
    def __init__(self, host_ip, port, url, img_base64_str):
        self.hostIP = host_ip
        self.Port = port
        self.httpMethod = "POST"
        self.requstURL = url
        self.headerdata = {"Content-Type": "application/json"}
        self.test_data = {"image_base64": img_base64_str}
        self.body = json.dumps(self.test_data)

def read_image(img_path):
    '''Read an image and encode it as a base64 string'''
    img_data = cv2.imread(img_path)
    img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB)
    img_str = cv2.imencode('.jpg', img_data)[1].tobytes()
    img_bin = base64.b64encode(img_str)
    img_base64_str = str(img_bin, encoding='utf8')
    return img_data, img_base64_str

def decode_result_str(result_str):
    try:
        result = json.loads(result_str)
    except Exception as ex:
        print(str(ex))
        return []
    else:
        return result

labels = ['antelope', 'badger', 'bat', 'bear', 'bee', 'beetle', 'bison', 'boar',
          'butterfly', 'cat', 'caterpillar', 'chimpanzee', 'cockroach', 'cow',
          'coyote', 'crab', 'crow', 'deer', 'dog', 'dolphin', 'donkey',
          'dragonfly', 'duck', 'eagle', 'elephant', 'flamingo', 'fly', 'fox',
          'goat', 'goldfish', 'goose', 'gorilla', 'grasshopper', 'hamster',
          'hare', 'hedgehog', 'hippopotamus', 'hornbill', 'horse', 'hummingbird',
          'hyena', 'jellyfish', 'kangaroo', 'koala', 'ladybugs', 'leopard',
          'lion', 'lizard', 'lobster', 'mosquito', 'moth', 'mouse', 'octopus',
          'okapi', 'orangutan', 'otter', 'owl', 'ox', 'oyster', 'panda',
          'parrot', 'pelecaniformes', 'penguin', 'pig', 'pigeon', 'porcupine',
          'possum', 'raccoon', 'rat', 'reindeer', 'rhinoceros', 'sandpiper',
          'seahorse', 'seal', 'shark', 'sheep', 'snake', 'sparrow', 'squid',
          'squirrel', 'starfish', 'swan', 'tiger', 'turkey', 'turtle', 'whale',
          'wolf', 'wombat', 'woodpecker', 'zebra']

def test_image(img_path, ip, port, url):
    '''Single-image test'''
    img_data, img_base64_str = read_image(img_path)
    http_config = HttpConfig(ip, port, url, img_base64_str)

    conn = http.client.HTTPConnection(host=http_config.hostIP, port=http_config.Port)
    conn.request(method=http_config.httpMethod, url=http_config.requstURL,
                 body=http_config.body, headers=http_config.headerdata)
    response = conn.getresponse().read().decode()
    print('response: ', response)

    result = decode_result_str(response)
    clsse, score = result["clsse"], result["score"]
    result_str = f"{labels[clsse]}:{round(score, 2)}"
    cv2.putText(img_data, result_str, (0, 100), cv2.FONT_HERSHEY_TRIPLEX, 4, (0, 255, 0), 2)
    cv2.imwrite('./result-' + os.path.basename(img_path), img_data[..., ::-1])

if __name__ == "__main__":
    port = 1234
    ip = "127.0.0.1"
    url = "/v1/InceptionV3"
    img_folder = './test_imgs'
    file_list = os.listdir(img_folder)
    for img_file in file_list:
        print("\n================ {} ================".format(img_file))
        img_path = os.path.join(img_folder, img_file)
        test_image(img_path, ip, port, url)
```

Create a test_imgs folder in the InceptionV3 project's data directory to hold the test images, then, in another terminal, run test_http.py from the data directory to send HTTP test requests:

PS D:\modelbox-win10-x64-1.5.3> cd D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3\data
PS D:\modelbox-win10-x64-1.5.3\workspace\InceptionV3\data> D:\modelbox-win10-x64-1.5.3\python-embed\python.exe .\test_http.py

================ 61cf5127ce.jpg ================
response:  {"clsse": 63, "score": 0.9996486902236938}

================ 7e2a453559.jpg ================
response:  {"clsse": 81, "score": 0.999880313873291}

The annotated inference results can then be viewed in the InceptionV3 project's data directory.

III. Summary
This section showed how to use ModelArts and ModelBox to train and develop an InceptionV3 animal-image classification AI application: with just a model file and some simple configuration, we get an HTTP service. Along the way we covered the basic structure of InceptionV3, its data processing and training, and the logic of the corresponding inference application.

----Adapted from the blog: https://bbs.huaweicloud.com/blogs/449036
• [Tech Share] RK3588 AI Application Development (ResNet50V2 Keypoint Detection)
RK3588 AI Application Development (ResNet50V2 Keypoint Detection)

I. Model training and conversion

ResNet50V2 is an improved deep convolutional network derived from the ResNet architecture. It uses pre-activation (moving BN and ReLU before the convolution) and identity mappings, optimizing information propagation and training. As a 50-layer network, ResNet50V2 is widely used in image classification, object detection, and similar tasks; it supports transfer learning, adapts quickly to new datasets, and offers good generalization and accuracy.

The training and conversion tutorial is available in AI Gallery, including the training data, training code, and model conversion script. After training in a ModelArts Notebook, convert the model to the format for your target platform: onnx for Windows devices, rknn for RK-series devices.

II. Application development

1. Build the Gradio UI

```python
import cv2
import json
import base64
import requests
import numpy as np
import gradio as gr

def test_image(image_path):
    try:
        image_bgr = cv2.imread(image_path)
        image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes()
        image_base64 = base64.b64encode(image_string).decode('utf-8')
        params = {"image_base64": image_base64}
        response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params),
                                 headers={"Content-Type": "application/json"})
        if response.status_code == 200:
            image_base64 = response.json().get("image_base64")
            image_binary = base64.b64decode(image_base64)
            image_array = np.frombuffer(image_binary, dtype=np.uint8)
            image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        else:
            image_rgb = None
    except Exception as e:
        return None
    else:
        return image_rgb

if __name__ == "__main__":
    port = 8000
    ip = "127.0.0.1"
    url = "/v1/ResNet50V2"
    demo = gr.Interface(fn=test_image, inputs=gr.Image(type="filepath"),
                        outputs=["image"], title="ResNet50V2 cat-face keypoint detection")
    demo.launch(share=False, server_port=3000)
```

```
/home/orangepi/miniconda3/envs/python-3.10.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
* Running on local URL:  http://127.0.0.1:3000
* To create a public link, set `share=True` in `launch()`.
```

2. Write the inference code

```python
import cv2
import numpy as np
from rknnlite.api import RKNNLite

class ResNet50V2:
    def __init__(self, model_path):
        self.rknn_lite = RKNNLite()
        self.rknn_lite.load_rknn(model_path)
        self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)

    def preprocess(self, image):
        # BGR -> RGB, resize to the model input size, add a batch dimension
        image = image[:, :, ::-1]
        image = cv2.resize(image, (224, 224))
        return np.expand_dims(image, axis=0)

    def rknn_infer(self, data):
        outputs = self.rknn_lite.inference(inputs=[data])
        return outputs[0]

    def post_process(self, pred):
        # flat output -> (x, y) keypoint pairs
        feat = pred.squeeze().reshape(-1, 2)
        return feat

    def predict(self, image):
        # image preprocessing
        data = self.preprocess(image)
        # model inference
        pred = self.rknn_infer(data)
        # post-processing
        keypoints = self.post_process(pred)
        # draw the keypoint detection results
        h, w, _ = image.shape
        for x, y in keypoints:
            cv2.circle(image, (int(x * w), int(y * h)), 5, (0, 255, 0), -1)
        return image[..., ::-1]

    def release(self):
        self.rknn_lite.release()
```

3. Batch image prediction

```python
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from rknnlite.api import RKNNLite

model = ResNet50V2('model/ResNet50V2.rknn')
for image in os.listdir("image"):
    image = cv2.imread(os.path.join("image", image))
    image = model.predict(image)
    plt.imshow(image)
    plt.axis('off')
    plt.show()
model.release()
```
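Before exposing the model as a service, a rough idea of NPU latency can be useful. A small probe sketch (not from the original post), assuming the ResNet50V2 wrapper above and some loaded BGR test image `img`:

```python
import time

data = model.preprocess(img)   # `img` is any loaded BGR test image
t0 = time.perf_counter()
for _ in range(20):
    model.rknn_infer(data)     # NPU inference only, no pre/post-processing
avg = (time.perf_counter() - t0) / 20
print(f"average inference time: {avg * 1000:.1f} ms")
```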
4. Create the Flask service

```python
import cv2
import base64
import numpy as np
from rknnlite.api import RKNNLite
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/v1/ResNet50V2', methods=['POST'])
def inference():
    data = request.get_json()
    image_base64 = data.get("image_base64")
    image_binary = base64.b64decode(image_base64)
    image_array = np.frombuffer(image_binary, dtype=np.uint8)
    image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    image_rgb = model.predict(image_bgr)
    image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes()
    image_base64 = base64.b64encode(image_string).decode('utf-8')
    return jsonify({"image_base64": image_base64}), 200

if __name__ == '__main__':
    model = ResNet50V2('model/ResNet50V2.rknn')
    app.run(host='0.0.0.0', port=8000)
    model.release()
```

```
W rknn-toolkit-lite2 version: 2.3.2
* Serving Flask app '__main__'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:8000
* Running on http://192.168.3.50:8000
Press CTRL+C to quit
127.0.0.1 - - [02/May/2025 02:13:40] "POST /v1/ResNet50V2 HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2025 02:13:46] "POST /v1/ResNet50V2 HTTP/1.1" 200 -
```

5. Upload an image for prediction

III. Summary
This chapter walked through the full development flow of a ResNet50V2 keypoint detection application on RK3588: model training and conversion, Gradio UI design, inference code, batch prediction, and Flask service deployment, taking the model all the way to an end-to-end application.

----Adapted from the blog: https://bbs.huaweicloud.com/blogs/451999
• [Tech Share] RK3588 AI Application Development (FCN Semantic Segmentation)
RK3588 AI Application Development (FCN Semantic Segmentation)

I. Model training and conversion

FCN (Fully Convolutional Networks) is a deep-learning architecture for semantic segmentation. It introduced the skip architecture, fusing shallow and deep feature maps to retain more detail and improve segmentation accuracy. FCN also aggregates multi-scale context, capturing features at different levels and strengthening recognition of targets of different sizes. Its success propelled the semantic-segmentation field and it underpins many later models.

The training and conversion tutorial is available in AI Gallery, including the training data, training code, and model conversion script. After training in a ModelArts Notebook, convert the model to the format for your target platform: onnx for Windows devices, rknn for RK-series devices.

II. Application development

1. Build the Gradio UI

```python
import cv2
import json
import base64
import requests
import numpy as np
import gradio as gr

def test_image(image_path):
    try:
        image_bgr = cv2.imread(image_path)
        image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes()
        image_base64 = base64.b64encode(image_string).decode('utf-8')
        params = {"image_base64": image_base64}
        response = requests.post(f'http://{ip}:{port}{url}', data=json.dumps(params),
                                 headers={"Content-Type": "application/json"})
        if response.status_code == 200:
            image_base64 = response.json().get("image_base64")
            image_binary = base64.b64decode(image_base64)
            image_array = np.frombuffer(image_binary, dtype=np.uint8)
            image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        else:
            image_rgb = None
    except Exception as e:
        return None
    else:
        return image_rgb

if __name__ == "__main__":
    port = 8000
    ip = "127.0.0.1"
    url = "/v1/FCN"
    demo = gr.Interface(fn=test_image, inputs=gr.Image(type="filepath"),
                        outputs=["image"], title="FCN fruit/vegetable pest and disease segmentation")
    demo.launch(share=False, server_port=3000)
```

```
/home/orangepi/miniconda3/envs/python-3.10.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
* Running on local URL:  http://127.0.0.1:3000
* To create a public link, set `share=True` in `launch()`.
```

2. Write the inference code

```python
import cv2
import numpy as np
from rknnlite.api import RKNNLite

class FCN:
    def __init__(self, model_path):
        self.num_classes = 117
        self.rknn_lite = RKNNLite()
        self.rknn_lite.load_rknn(model_path)
        self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
        # one fixed random color per class
        self.color_list = np.random.randint(0, 255, size=(self.num_classes, 3),
                                            dtype=np.uint8).tolist()

    def preprocess(self, image):
        # BGR -> RGB, resize to the model input size, add a batch dimension
        image = image[:, :, ::-1]
        image = cv2.resize(image, (224, 224))
        return np.expand_dims(image, axis=0)

    def rknn_infer(self, data):
        outputs = self.rknn_lite.inference(inputs=[data])
        return outputs[0]

    def post_process(self, pred):
        # per-pixel argmax over classes
        feat = pred.squeeze()
        return np.argmax(feat, axis=-1).astype(np.uint8)

    def predict(self, image):
        # image preprocessing
        data = self.preprocess(image)
        # model inference
        pred = self.rknn_infer(data)
        # post-processing
        feat = self.post_process(pred)
        # build the segmentation overlay
        canv = np.zeros_like(image)
        mask = cv2.resize(feat, image.shape[:2][::-1], interpolation=cv2.INTER_NEAREST)
        for i in range(1, self.num_classes):
            canv[mask == i] = self.color_list[i]
        return cv2.addWeighted(image[..., ::-1], 0.5, canv, 0.5, 0)

    def release(self):
        self.rknn_lite.release()
```

3. Batch image prediction

```python
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from rknnlite.api import RKNNLite

model = FCN('model/FCN.rknn')
for image in os.listdir("image"):
    image = cv2.imread(os.path.join("image", image))
    image = model.predict(image)
    plt.imshow(image)
    plt.axis('off')
    plt.show()
model.release()
```
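To evaluate segmentation quality against labeled data, a minimal per-class IoU sketch (not from the original post; it assumes integer masks `pred` and `gt` of equal shape with class ids in [0, 117)):

```python
import numpy as np

def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int = 117) -> dict:
    # Intersection-over-union for every class that appears in pred or gt.
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = float(inter) / float(union)
    return ious
```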
4. Create the Flask service

```python
import cv2
import base64
import numpy as np
from rknnlite.api import RKNNLite
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/v1/FCN', methods=['POST'])
def inference():
    data = request.get_json()
    image_base64 = data.get("image_base64")
    image_binary = base64.b64decode(image_base64)
    image_array = np.frombuffer(image_binary, dtype=np.uint8)
    image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    image_rgb = model.predict(image_bgr)
    image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes()
    image_base64 = base64.b64encode(image_string).decode('utf-8')
    return jsonify({"image_base64": image_base64}), 200

if __name__ == '__main__':
    model = FCN('model/FCN.rknn')
    app.run(host='0.0.0.0', port=8000)
    model.release()
```

```
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [00:06:51.738] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [00:06:51.738] RKNN Driver Information: version: 0.9.6
I RKNN: [00:06:51.739] RKNN Model Information: version: 1, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T12:50:09)), target: RKNPU v2, target platform: rk3588, framework name: TFLite, framework layout: NHWC
* Serving Flask app '__main__'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:8000
* Running on http://192.168.3.50:8000
Press CTRL+C to quit
127.0.0.1 - - [02/May/2025 00:07:17] "POST /v1/FCN HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2025 00:07:24] "POST /v1/FCN HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2025 00:07:31] "POST /v1/FCN HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2025 00:07:39] "POST /v1/FCN HTTP/1.1" 200 -
```

5. Upload an image for prediction

III. Summary
This chapter walked through the full development flow of an FCN semantic-segmentation AI application on RK3588: model training and conversion, Gradio UI development, inference code, batch prediction, and Flask service deployment. With this flow, developers can run image-segmentation tasks efficiently and serve predictions locally or remotely.

----Adapted from the blog: https://bbs.huaweicloud.com/blogs/451998
• [Tech Share] RK3588 AI Application Development (YOLOX Object Detection)
RK3588 AI Application Development (YOLOX Object Detection)

I. Model Training and Conversion

YOLOX is an optimized member of the YOLO family. It incorporates several strong advances from the object-detection literature, including a decoupled head, stronger data augmentation, an anchor-free design, and improved label assignment, achieving good accuracy while remaining friendly to engineering deployment.

The training and conversion tutorial is available in AI Gallery and includes the training data, training code, and model conversion scripts. After training in a ModelArts Notebook, convert the model to the format of the target platform: the onnx format runs on Windows devices, while RK-series devices require the rknn format.

II. Application Development

1. Building the Gradio Interface

import cv2
import json
import base64
import requests
import numpy as np
import gradio as gr

def test_image(image_path):
    try:
        # read the image, JPEG-encode it, and send it as base64 JSON
        image_bgr = cv2.imread(image_path)
        image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes()
        image_base64 = base64.b64encode(image_string).decode('utf-8')
        params = {"image_base64": image_base64}
        response = requests.post(f'http://{ip}:{port}{url}',
                                 data=json.dumps(params),
                                 headers={"Content-Type": "application/json"})
        if response.status_code == 200:
            # decode the annotated image returned by the service
            image_base64 = response.json().get("image_base64")
            image_binary = base64.b64decode(image_base64)
            image_array = np.frombuffer(image_binary, dtype=np.uint8)
            image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        else:
            image_rgb = None
    except Exception:
        return None
    else:
        return image_rgb

if __name__ == "__main__":
    port = 8000
    ip = "127.0.0.1"
    url = "/v1/fish_det"
    demo = gr.Interface(fn=test_image,
                        inputs=gr.Image(type="filepath"),
                        outputs=["image"],
                        title="YOLOX 深海鱼类检测")
    demo.launch(share=False, server_port=3000)

* Running on local URL:  http://127.0.0.1:3000
* To create a public link, set `share=True` in `launch()`.

2. Writing the Inference Code

%%writefile YOLOX/yolox/data/datasets/voc_classes.py
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

# VOC_CLASSES = ( '__background__', )  # always index 0
VOC_CLASSES = (
    "fish",
)

Overwriting YOLOX/yolox/data/datasets/voc_classes.py

import sys
sys.path.append("YOLOX")

from yolox.utils import demo_postprocess, multiclass_nms, vis
from yolox.data.data_augment import preproc as preprocess
from yolox.data.datasets.voc_classes import VOC_CLASSES

import cv2
import numpy as np
import ipywidgets as widgets
from rknnlite.api import RKNNLite
from IPython.display import display

class YOLOX:
    def __init__(self, model_path):
        self.ratio = None
        self.rknn_lite = RKNNLite()
        self.rknn_lite.load_rknn(model_path)
        self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)

    def preprocess(self, image):
        # letterbox-resize to 320x320 with YOLOX's preproc (kept in HWC order);
        # save the scale ratio for mapping boxes back later
        start_img, self.ratio = preprocess(image, (320, 320), swap=(0, 1, 2))
        return np.expand_dims(start_img, axis=0)

    def rknn_infer(self, data):
        outputs = self.rknn_lite.inference(inputs=[data])
        return outputs[0]

    def post_process(self, pred):
        predictions = demo_postprocess(pred.squeeze(), (320, 320))
        boxes = predictions[:, :4]
        scores = predictions[:, 4:5] * predictions[:, 5:]
        # convert (cx, cy, w, h) to (x1, y1, x2, y2)
        boxes_xyxy = np.ones_like(boxes)
        boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
        boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
        boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
        boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
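        # demo_postprocess has already decoded the raw grid outputs, so the
        # boxes above live in the 320x320 letterboxed input space; dividing by
        # self.ratio (saved in preprocess) maps them back to the original
        # image resolution before NMS.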
        boxes_xyxy /= self.ratio
        dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.25)
        return dets

    def predict(self, image):
        # image preprocessing
        data = self.preprocess(image)
        # model inference
        pred = self.rknn_infer(data)
        # model postprocessing
        dets = self.post_process(pred)
        # draw the detection results
        if dets is not None:
            final_boxes = dets[:, :4]
            final_scores, final_cls_inds = dets[:, 4], dets[:, 5]
            image = vis(image, final_boxes, final_scores, final_cls_inds,
                        conf=0.25, class_names=VOC_CLASSES)
        return image[..., ::-1]

    def img2bytes(self, image):
        """Convert an image to JPEG bytes."""
        return bytes(cv2.imencode('.jpg', image)[1])

    def infer_video(self, video_path):
        """Run inference on a video, frame by frame."""
        image_widget = widgets.Image(format='jpeg', width=800, height=600)
        display(image_widget)
        cap = cv2.VideoCapture(video_path)
        while True:
            ret, img_frame = cap.read()
            if not ret:
                break
            image_pred = self.predict(img_frame)
            image_widget.value = self.img2bytes(image_pred)
        cap.release()

    def release(self):
        """Release NPU resources."""
        self.rknn_lite.release()

3. Image Prediction

4. Video Inference

5. Creating the Flask Service

import cv2
import base64
import numpy as np
from rknnlite.api import RKNNLite
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/v1/fish_det', methods=['POST'])
def inference():
    # decode the base64 request image, detect, return the drawn result as base64
    data = request.get_json()
    image_base64 = data.get("image_base64")
    image_binary = base64.b64decode(image_base64)
    image_array = np.frombuffer(image_binary, dtype=np.uint8)
    image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    image_rgb = model.predict(image_bgr)
    image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes()
    image_base64 = base64.b64encode(image_string).decode('utf-8')
    return jsonify({"image_base64": image_base64}), 200

if __name__ == '__main__':
    model = YOLOX('model/yolox_fish.rknn')
    app.run(host='0.0.0.0', port=8000)
    model.release()

6. Uploading an Image for Prediction

III. Summary

This chapter covered the full workflow of YOLOX object detection on the RK3588 platform: model training and conversion, Gradio interface development, inference code, image and video prediction, and Flask service deployment. The result is an efficient fish-detection application suited to embedded deployment and real-world use.

---- From the blog: https://bbs.huaweicloud.com/blogs/452001
  • [Tech Share] RK3588 AI Application Development (InceptionV3 Image Classification)
RK3588 AI Application Development (InceptionV3 Image Classification)

I. Model Training and Conversion

Inception V3 is an improved version of GoogLeNet built from Inception modules and a global average pooling layer. One of v3's most important improvements is factorization: a 7x7 convolution is decomposed into two one-dimensional convolutions (1x7 and 7x1), and a 3x3 convolution likewise into (1x3 and 3x1). This both speeds up computation (the freed-up compute can be spent on deepening the network) and splits one conv into two, making the network deeper and adding extra non-linearity; a short sketch of the idea follows the Gradio section below.

The training and conversion tutorial is available in AI Gallery and includes the training data, training code, and model conversion scripts. After training in a ModelArts Notebook, convert the model to the format of the target platform: the onnx format runs on Windows devices, while RK-series devices require the rknn format.

II. Application Development

1. Building the Gradio Interface

import cv2
import json
import base64
import requests
import numpy as np
import gradio as gr

def test_image(image_path):
    try:
        # read the image, JPEG-encode it, and send it as base64 JSON
        image_bgr = cv2.imread(image_path)
        image_string = cv2.imencode('.jpg', image_bgr)[1].tobytes()
        image_base64 = base64.b64encode(image_string).decode('utf-8')
        params = {"image_base64": image_base64}
        response = requests.post(f'http://{ip}:{port}{url}',
                                 data=json.dumps(params),
                                 headers={"Content-Type": "application/json"})
        if response.status_code == 200:
            # decode the annotated image returned by the service
            image_base64 = response.json().get("image_base64")
            image_binary = base64.b64decode(image_base64)
            image_array = np.frombuffer(image_binary, dtype=np.uint8)
            image_rgb = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        else:
            image_rgb = None
    except Exception:
        return None
    else:
        return image_rgb

if __name__ == "__main__":
    port = 8000
    ip = "127.0.0.1"
    url = "/v1/InceptionV3"
    demo = gr.Interface(fn=test_image,
                        inputs=gr.Image(type="filepath"),
                        outputs=["image"],
                        title="InceptionV3 动物分类")
    demo.launch(share=False, server_port=3000)

* Running on local URL:  http://127.0.0.1:3000
* To create a public link, set `share=True` in `launch()`.
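To make the factorization mentioned above concrete, here is a minimal Keras sketch. It is illustrative only: the layer width (192 channels) is made up and this is not the actual Inception V3 block. Replacing one 7x7 convolution with a 1x7 followed by a 7x1 cuts the weight count by roughly 3.5x while inserting an extra activation:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input((35, 35, 192))

# standard 7x7 convolution: 7*7*192*192 ≈ 1.81M weights
full = layers.Conv2D(192, (7, 7), padding='same', activation='relu')(inputs)

# factorized form: (1*7 + 7*1)*192*192 ≈ 0.52M weights, one extra non-linearity
fact = layers.Conv2D(192, (1, 7), padding='same', activation='relu')(inputs)
fact = layers.Conv2D(192, (7, 1), padding='same', activation='relu')(fact)

print(keras.Model(inputs, full).count_params())  # ~1.81M (plus biases)
print(keras.Model(inputs, fact).count_params())  # ~0.52M (plus biases)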
2. Writing the Inference Code

class InceptionV3:
    def __init__(self, model_path):
        self.rknn_lite = RKNNLite()
        self.rknn_lite.load_rknn(model_path)
        self.rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
        self.label = ['antelope', 'badger', 'bat', 'bear', 'bee', 'beetle', 'bison', 'boar',
                      'butterfly', 'cat', 'caterpillar', 'chimpanzee', 'cockroach', 'cow',
                      'coyote', 'crab', 'crow', 'deer', 'dog', 'dolphin', 'donkey', 'dragonfly',
                      'duck', 'eagle', 'elephant', 'flamingo', 'fly', 'fox', 'goat', 'goldfish',
                      'goose', 'gorilla', 'grasshopper', 'hamster', 'hare', 'hedgehog',
                      'hippopotamus', 'hornbill', 'horse', 'hummingbird', 'hyena', 'jellyfish',
                      'kangaroo', 'koala', 'ladybugs', 'leopard', 'lion', 'lizard', 'lobster',
                      'mosquito', 'moth', 'mouse', 'octopus', 'okapi', 'orangutan', 'otter',
                      'owl', 'ox', 'oyster', 'panda', 'parrot', 'pelecaniformes', 'penguin',
                      'pig', 'pigeon', 'porcupine', 'possum', 'raccoon', 'rat', 'reindeer',
                      'rhinoceros', 'sandpiper', 'seahorse', 'seal', 'shark', 'sheep', 'snake',
                      'sparrow', 'squid', 'squirrel', 'starfish', 'swan', 'tiger', 'turkey',
                      'turtle', 'whale', 'wolf', 'wombat', 'woodpecker', 'zebra']

    def preprocess(self, image):
        # BGR -> RGB, resize to the model input size, add a batch dimension
        image = image[:, :, ::-1]
        image = cv2.resize(image, (224, 224))
        return np.expand_dims(image, axis=0)

    def rknn_infer(self, data):
        outputs = self.rknn_lite.inference(inputs=[data])
        return outputs[0]

    def post_process(self, pred):
        # argmax over the class scores -> label name and confidence percentage
        classes = np.argmax(pred, axis=-1)
        score = pred[0][classes[0]].item()
        return self.label[classes[0]], round(score * 100, 2)

    def predict(self, image):
        # image preprocessing
        data = self.preprocess(image)
        # model inference
        pred = self.rknn_infer(data)
        # model postprocessing
        label, score = self.post_process(pred)
        # draw the classification result
        print(f'{label}:{score}%')
        image = cv2.putText(image, f'{label}:{score}%', (0, 100),
                            cv2.FONT_HERSHEY_TRIPLEX, 4, (0, 255, 0), 8)
        return image[..., ::-1]

    def release(self):
        self.rknn_lite.release()

3. Batch Image Prediction

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from rknnlite.api import RKNNLite

model = InceptionV3('model/InceptionV3.rknn')
for image in os.listdir("image"):
    image = cv2.imread(os.path.join("image", image))
    image = model.predict(image)
    plt.imshow(image)
    plt.axis('off')
    plt.show()
model.release()

4. Creating the Flask Service

import cv2
import base64
import numpy as np
from rknnlite.api import RKNNLite
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/v1/InceptionV3', methods=['POST'])
def inference():
    # decode the base64 request image, classify, return the drawn result as base64
    data = request.get_json()
    image_base64 = data.get("image_base64")
    image_binary = base64.b64decode(image_base64)
    image_array = np.frombuffer(image_binary, dtype=np.uint8)
    image_bgr = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    image_rgb = model.predict(image_bgr)
    image_string = cv2.imencode('.jpg', image_rgb)[1].tobytes()
    image_base64 = base64.b64encode(image_string).decode('utf-8')
    return jsonify({"image_base64": image_base64}), 200

if __name__ == '__main__':
    model = InceptionV3('model/InceptionV3.rknn')
    app.run(host='0.0.0.0', port=8000)
    model.release()

W rknn-toolkit-lite2 version: 2.3.2
* Serving Flask app '__main__'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:8000
* Running on http://192.168.3.50:8000
Press CTRL+C to quit
127.0.0.1 - - [01/May/2025 20:37:00] "POST /v1/InceptionV3 HTTP/1.1" 200 -
pig:99.95%
127.0.0.1 - - [01/May/2025 20:37:09] "POST /v1/InceptionV3 HTTP/1.1" 200 -
swan:99.95%
127.0.0.1 - - [01/May/2025 20:37:20] "POST /v1/InceptionV3 HTTP/1.1" 200 -
cat:97.02%

5. Uploading an Image for Prediction

III. Summary

This chapter presented the full workflow of an InceptionV3 image-classification application on the RK3588 platform: model training and format conversion, Gradio interface design, inference code, batch prediction, and Flask service deployment, taking the application from local inference to an efficient web-served AI service.

---- From the blog: https://bbs.huaweicloud.com/blogs/451978
  • [Tech Share] Drone Inspection Dataset: Aerial Semantic Segmentation
Drone Inspection Dataset: Aerial Semantic Segmentation

This semantic drone dataset, developed by the Institute of Computer Graphics and Vision (ICG), aims to advance semantic understanding of urban scenes and improve the safety of autonomous drone flight and landing. The following describes it along four core dimensions: size, categories, distribution, and resolution.

I. Data Size and Split

The dataset contains 600 high-resolution images, divided by purpose into a training set and a test set:

Training set: 400 publicly available images with complete annotations, supporting model training and algorithm validation.
Test set: 200 private images, used mainly to evaluate model generalization and keep evaluation results objective.

In addition, the dataset provides rich auxiliary data: high-resolution image sequences captured at 1 Hz, 5 Hz fisheye stereo images (with synchronized IMU measurements), 1 Hz thermal images, plus ground control points for 3 houses and 3D ground-truth data acquired with a total station, further extending its application scenarios.

II. Semantic Categories and Annotation

The dataset defines 20 core categories for semantic segmentation, covering the typical elements of urban scenes:

Natural elements: tree, gras, other vegetation, dirt, gravel, rocks, water;
Man-made structures: paved area, pool, roof, wall, fence, fence-pole, window, door, obstacle;
Dynamic targets: person, dog, car, bicycle.

Annotation is pixel-accurate, ensuring precision for semantic-segmentation tasks; for person detection, bounding-box annotations are additionally provided for both the training and test sets, supporting multi-task research.

III. Data Distribution

All images were captured by drone from a nadir (bird's-eye) view, covering an urban area of more than 20 houses at altitudes of 5 to 30 meters above ground, ensuring realistic and diverse scenes. In terms of distribution:

Scene coverage: buildings, vegetation, roads, and leisure areas (such as pools) in residential districts, mixing natural and man-made environments;
Target density: images contain varying numbers of dynamic targets (people, animals, vehicles) and static structures, suitable for testing algorithms in scenes with complex target interactions;
Auxiliary data: the thermal and stereo imagery is spatio-temporally synchronized with the main images, enabling multi-modal fusion research and stronger environmental perception.

IV. Resolution and Data Formats

The core images were captured with a high-resolution camera at 6000x4000 pixels (24 megapixels) per image, preserving the fine detail needed for precise semantic segmentation. The training set ships annotations in several formats (a hedged loading sketch follows section V below):

bounding boxes as Python pickle files;
optional bounding boxes in XML format;
optional mask images, for direct use in different algorithm frameworks.

V. Dataset Download

# dataset URL
https://developer.huaweicloud.com/develop/aigallery/dataset/detail?id=0efe9613-5248-4a43-925e-4d6d377b6996

In summary, with its large image size, many categories, and high annotation precision, this dataset provides a high-quality benchmark for research in drone vision, semantic segmentation, and object detection, and its rich auxiliary information also lays a foundation for tasks such as multi-modal perception and 3D reconstruction.
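As a quick orientation for working with the annotation formats listed in section IV, here is a hedged loading sketch. The file paths and the pickle structure are assumptions (check the dataset's README for the real layout); the mask-reading part only assumes masks are single-channel images whose pixel values are class ids:

import pickle
import cv2
import numpy as np

# hypothetical paths -- the actual directory layout may differ
with open('bounding_boxes/person_bboxes.pkl', 'rb') as f:
    bboxes = pickle.load(f)  # assumed: {image_name: array of (x1, y1, x2, y2)}

mask = cv2.imread('label_images/042.png', cv2.IMREAD_GRAYSCALE)
print(np.unique(mask))       # class ids present in this image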
  • [Discussion] [Topic] AI Innovation in Niche Scenarios
Grain storage now uses AI multi-modal monitoring for intelligent temperature and humidity control, cutting loss rates by more than 30%. Seeing news like this, I really feel AI applications are everywhere now. What other niche but genuinely useful scenarios are out there? Everyone is welcome to share.
  • [Discussion] [Topic] The Impact of AI Agents on Traditional Occupations, and How to Respond
With the rapid development of AI, more and more traditional industries are feeling the impact. Is this ultimately a good thing or a bad thing, and where exactly do the pros and cons lie? Everyone is welcome to join the discussion.