• [API Usage] Problem with the HWC2CHW operator in mindspore.dataset.vision
    Using MindSpore 1.7; the source code is
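    For reference, here is a minimal sketch of typical HWC2CHW usage in MindSpore 1.7; the GeneratorDataset pipeline below is illustrative only, not the poster's (truncated) code. A common source of errors is applying HWC2CHW to data that is not yet a decoded H x W x C array.
    ```python
    import numpy as np
    import mindspore.dataset as ds
    import mindspore.dataset.vision.c_transforms as C

    def gen():
        # fake decoded HWC uint8 images standing in for real data
        for _ in range(4):
            yield (np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8),)

    dataset = ds.GeneratorDataset(gen, column_names=["image"])
    # HWC2CHW transposes (H, W, C) -> (C, H, W); it must run on decoded image data
    dataset = dataset.map(operations=[C.HWC2CHW()], input_columns="image")

    for item in dataset.create_dict_iterator(output_numpy=True):
        print(item["image"].shape)  # (3, 32, 32)
    ```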
  • [Help Wanted] [D2150-10-SIU] [YOLOv3 dog-recognition demo] No alarm events received after the demo is enabled
    Could someone please help look into this issue? I really cannot find the cause.
  • [Help Wanted] D2150-10-SIU: no sound from alarm-linkage audio?
    On the camera's web page I open Configuration -> Audio/Video -> Audio -> Alarm Linkage and click "Audition", but there is no sound. If no extra external device is required, how do I get the audio to play? If an external amplifier is needed, the camera has no spare interface, so how should it be connected?
  • [Other] How to implement a computer vision project
    Like any innovation worth pursuing inside an organization, a computer vision project should be approached strategically. Successful innovation with computer vision depends on the overall business strategy, resources, and data. The following questions can help you build a strategic roadmap for a computer vision project.
    1. Should the solution reduce costs or increase revenue? A successful computer vision project either cuts costs or raises revenue (or both), so define the project's goal up front; only then can it have a meaningful impact on the organization and its growth.
    2. How will you measure success? Every computer vision project is different, and you need a success metric specific to it. Once the metric is set, make sure it is accepted by both the business stakeholders and the data scientists.
    3. Can access to information be guaranteed? When a computer vision project kicks off, data scientists should have easy access to the data. They need to work with key colleagues from other departments (such as IT), who should support them with their business knowledge; internal bureaucracy would otherwise become the main constraint.
    4. Is the data the organization collects suitable? Computer vision algorithms are not magic. They need data to work, and the quality of the input data determines their performance. Depending on your goal, there are many methods and sources for collecting suitable data. In any case, the more input data you have, the more likely the computer vision model is to perform well. If you have doubts about the quantity or quality of the data, ask a data scientist to help assess the dataset and, where necessary, to find the best way to acquire third-party data.
    5. Is the data collected in the right format? Besides having the right amount and type of data, you need to make sure it is in the right format. Suppose you train an object detection algorithm on thousands of perfect phone photos (high resolution, white background), and then discover the algorithm does not work, because the real use case is detecting people holding phones under varying lighting/contrast/background conditions rather than detecting the phones themselves. Your earlier data-collection effort is then largely wasted and you have to start over. Also be aware that if the data carries a bias, the algorithm will learn that bias.
  • [Other] Distinguishing computer vision from related fields
    Computer vision tackles tasks that go beyond neighboring fields such as image processing and machine vision, even though they share some common ground. Let's look at how these fields differ.
    Image processing. Image processing aims to process a raw image in order to apply some transformation. The goal is usually to improve the image or to prepare it as input for a specific task, whereas the goal of computer vision is to describe and interpret the image. For example, typical image-processing operations such as noise reduction, contrast adjustment, or rotation can be performed at the pixel level, without any holistic understanding of the image.
    Machine vision. Machine vision is the special case of computer vision used to trigger certain actions, typically on production lines. In the chemical industry, for instance, a machine vision system can inspect containers on the line (clean, empty, undamaged) or check that finished products are properly packaged, thereby supporting manufacturing.
    Computer vision. Computer vision can solve more complex problems, such as face recognition, detailed image analysis (which enables visual search, as in Google Images), or biometric identification methods.
  • [MindX SDK] MindX SDK -- TextSnake curved-text detection reference design
    MindX SDK -- TextSnake Curved-Text Detection Reference Design
    1 Case Overview
    1.1 Summary
    The goal of this system is to develop, based on MindX SDK and the Huawei Cloud Ascend platform, an end-to-end reference design for TextSnake curved-text detection that detects curved text in images and meets the functional requirements.
    1.2 Model
    The project uses the TextSnake model. The TextSnake model files can be downloaded here: cid:link_2
    1.3 Implementation Flow
    1. Base environment: Ascend 310, mxVision, Ascend-CANN-toolkit, Ascend Driver
    2. Model conversion: convert the ONNX model to an Ascend offline model: TextSnake.onnx --> TextSnake_bs1.om
    3. Pipeline arrangement and configuration
    4. Python inference-flow development
    The technical flow chart is as follows: (figure omitted)
    1.4 Code Location
    The code for this project is available at: cid:link_6
    2 Software Design
    2.1 Architecture
    The system is organized into functional modules. The main flow is: an image is fed into the stream, the TextSnake detection model detects curved text, and detection boxes are drawn from the detection results. The modules are described in Table 2.1.
    Table 2.1 Module functions:
    No. | Subsystem               | Function
    1   | Image input             | Feeds the image in through the MindX SDK appsrc element
    2   | Curved-text detection   | Uses the TextSnake detection model to detect the position and shape of curved text in the image
    3   | Post-processing         | Uses the detected position/shape information to draw detection boxes on the original image
    4   | Result output           | Outputs the detection results
    2.2 Directory Structure
    The project is named TextSnake and is laid out as follows:
    ├── main.py          // main entry of the project
    ├── evaluate.py      // accuracy evaluation
    ├── t.pipeline       // pipeline
    ├── model            // model files
    ├── test.jpg         // test image
    ├── result.jpg       // output result
    ├── sdk.png          // flow chart
    ├── pipeline.png     // pipeline flow chart
    └── README.md
    3 Preparation
    3.1 Dependencies
    Software and versions:
    Software            | Version
    MindX SDK           | 2.0.4
    Ascend-CANN-toolkit | 5.0.4
    Ubuntu              | 18.04.1 LTS
    Python              | 3.9.2
    cv2                 | 4.1.2
    numpy               | 1.15.1
    onnx                | 1.8.0
    torch               | 1.5.0
    torchvision         | 0.6.0
    scikit_image        | 0.16.2
    scipy               | 1.5.4
    easydict            | 1.8
    tqdm                | 4.62.3
    3.2 Environment Setup
    3.2.1 Environment variables
    . ${MX_SDK_HOME}/set_env.sh
    . ${ascend-toolkit-path}/set_env.sh
    3.3 Model Conversion
    3.3.1 TextSnake model conversion
    Step 1: Download the TextSnake model from ModelZoo: cid:link_3
    Step 2: Place the TextSnake ONNX file under ./model.
    Step 3: Convert the model. In the ./model directory, run the following command:
    atc --model=TextSnake.onnx --framework=5 --output=TextSnake_bs1 --input_format=NCHW --input_shape="image:1,3,512,512" --log=info --soc_version=Ascend310
    After the conversion script finishes, the corresponding TextSnake_bs1.om model file is generated.
    4 Curved-Text Detection Flow
    4.1 Pipeline arrangement
    appsrc              # input
    mxpi_tensorinfer    # model inference (curved-text detection)
    mxpi_dataserialize  # data serialization
    appsink             # output
    4.2 Main program
    1. Initialize the stream manager.
    2. Load the image and run inference.
    3. Fetch the outputs of the pipeline plugins and parse them.
    4. Draw detection boxes on the image according to the results.
    5. Destroy the stream.
    (A minimal sketch of this flow is given at the end of this post.)
    5 Build and Run
    Step 1: Place any JPEG image in the current directory (./TextSnake) and name it test.jpg. If the pipeline file (or the test image) is not in the current directory (./TextSnake), change the pipeline (or image) path in main.py to point at its location. In addition, download the util folder from cid:link_1 into the current directory (./TextSnake) and modify its detection.py as follows (line numbers refer to the original code):
    (1) Change line 12 to:
    def __init__(self, tr_thresh=0.4, tcl_thresh=0.6):
    and delete the model-related statements in that constructor.
    (2) Change line 38:
    in_poly = cv2.pointPolygonTest(cont, (xmean, i), False)
    to
    in_poly = cv2.pointPolygonTest(cont, (int(xmean), int(i)), False)
    Change line 56 to:
    if cv2.pointPolygonTest(cont, (int(test_pt[0]), int(test_pt[1])), False) > 0
    Change line 67 to:
    return cv2.pointPolygonTest(cont, (int(x), int(y)), False) > 0
    (3) Before and after line 315, add respectively:
    conts = list(conts)
    conts = tuple(conts)
    Step 2: Run
    python3 main.py
    6 Accuracy Test
    Step 1: Prepare the dataset for the accuracy test. The TotalText dataset must be downloaded manually.
    Images: cid:link_4 totaltext.zip
    Ground truth: cid:link_5 groundtruth_text.zip
    Arrange the downloaded dataset as follows:
    ├── main.py          // main entry of the project
    ├── evaluate.py      // accuracy evaluation
    ├── t.pipeline       // pipeline
    ├── model            // model files
    ├── test.jpg         // test image
    ├── result.jpg       // output result
    ├── sdk.png          // flow chart
    ├── pipeline.png     // pipeline flow chart
    ├── data
    │   └── total-text
    │       ├── gt
    │       │   └── Test
    │       │       ├── poly_gt_img1.mat   // test-set ground truth
    │       │       └── ...
    │       ├── img1.jpg                   // test-set image
    │       └── ...
    └── README.md
    Step 2: Besides the util folder downloaded earlier, also download Deteval.py and polygon_wrapper.py from the following address and put them in the util folder: cid:link_0
    Step 3: Run the accuracy test from the command line:
    python3 evaluate.py
    Compared with the PyTorch implementation, the resulting accuracy differs by less than 1%, which meets the accuracy requirement.
    7 References
    cid:link_7
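    For orientation, the section 4.2 flow can be sketched with the MindX SDK Python API roughly as follows. This is a minimal illustration, not the project's main.py; the stream name "detection" and the use of t.pipeline from the current directory are assumptions.
    ```python
    from StreamManagerApi import StreamManagerApi, MxDataInput

    stream_manager = StreamManagerApi()
    assert stream_manager.InitManager() == 0                 # 1. initialize the stream manager

    with open("./t.pipeline", "rb") as f:                    # load the pipeline description
        pipeline = f.read()
    assert stream_manager.CreateMultipleStreams(pipeline) == 0

    data_input = MxDataInput()
    with open("./test.jpg", "rb") as f:                      # 2. load the image
        data_input.data = f.read()

    stream_name = b"detection"                               # assumed stream name
    unique_id = stream_manager.SendData(stream_name, 0, data_input)   # feed appsrc

    infer_result = stream_manager.GetResult(stream_name, unique_id)   # 3. fetch plugin output
    print(infer_result.data.decode())                        # 4. parse results, then draw boxes

    stream_manager.DestroyAllStreams()                       # 5. destroy the stream
    ```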
  • [MindX SDK] Camouflaged object segmentation reference design case
    MindX SDK -- Camouflaged Object Segmentation Reference Design
    1 Case Overview
    1.1 Summary
    The goal of this system is to develop, based on MindX SDK and the Huawei Cloud Ascend platform, an end-to-end reference design for camouflaged object segmentation that detects camouflaged objects in images and meets the functional requirements.
    1.2 Model
    The project is based on the DGNet model for generic camouflaged object segmentation. A detailed description of the model is in the original paper: cid:link_4. Implementation details are in the PyTorch code: cid:link_3. The public dataset used is NC4K, downloadable here: cid:link_1. The model used is the EfficientNet-B4 variant of DGNet; the original PyTorch weights can be downloaded here: cid:link_0
    1.3 Implementation Flow
    Base environment: Ascend 310, mxVision, Ascend-CANN-toolkit, Ascend Driver
    Model conversion: convert the ONNX model (.onnx) to an Ascend offline model (.om)
    Offline-model inference-flow development
    1.4 Code Location
    The code for this project is available at: cid:link_2
    1.5 Features and Applicable Scenarios
    The project targets natural-scene images that are complete and sharp, without blur or ghosting; input images should be JPEG-encoded and no larger than 10 MB. Note: owing to model limitations, the project currently only supports detecting camouflaged animals in natural scenes and must not be used for other purposes.
    2 Software Design
    2.1 Architecture
    The system is organized into functional modules. The main flow is: an image is fed into the stream, the DGNet detection model detects camouflaged objects, and the detections are output as a per-pixel probability map. The modules are described in Table 2.1.
    Table 2.1 Module functions:
    No. | Subsystem                    | Function
    1   | Image input                  | Loads the input image with cv2's image-loading function
    2   | Image preprocessing          | Resizes the input image to 352 x 352 and normalizes it
    3   | Camouflaged object detection | Uses the DGNet detection model to detect camouflaged objects in the image
    4   | Data distribution            | Forwards DGNet's per-pixel probability map to the next plugin
    5   | Result output                | Outputs and saves the predicted camouflage probability map
    2.2 Directory Structure
    The project is named DGNet and is laid out as follows:
    ./
    ├── assets                 # figures
    │   ├── 74.jpg
    │   └── 74.png
    ├── data                   # dataset path
    │   └── NC4K
    ├── inference_om.py        # offline-model inference script (Python)
    ├── README.md              # this file
    ├── seg_results_om
    │   └── Exp-DGNet-OM       # predicted segmentation maps
    ├── snapshots
    │   └── DGNet              # model files
    3 Preparation
    3.1 Dependencies
    Software and versions:
    Software        | Version
    Ubuntu          | 18.04.1 LTS
    MindX SDK       | 2.0.4
    Python          | 3.9.2
    CANN            | 5.0.4
    numpy           | 1.21.2
    opencv-python   | 4.5.3.56
    mindspore (cpu) | 1.9.0
    3.2 Environment Setup
    Before building and running the project, set the environment variables:
    # MindX SDK environment variables:
    . ${SDK-path}/set_env.sh
    # CANN environment variables:
    . ${ascend-toolkit-path}/set_env.sh
    # where
    # SDK-path: mxVision installation path
    # ascend-toolkit-path: CANN installation path
    3.3 Model Conversion
    Step 1: Download the ONNX model of DGNet (EfficientNet-B4): cid:link_0
    Step 2: Place the downloaded DGNet ONNX file at ./snapshots/DGNet/DGNet.onnx.
    Step 3: Convert the model:
    # enter the directory
    cd ./snapshots/DGNet/
    # convert the ONNX model (.onnx) to an Ascend offline model (.om)
    atc --framework=5 --model=DGNet.onnx --output=DGNet --input_shape="image:1,3,352,352" --log=debug --soc_version=Ascend310
    After the conversion script finishes, DGNet.om (the offline model used on the Ascend platform in this project) is generated in the directory.
    4 Inference and Evaluation
    Step 0: Download a test dataset as described in section 1.2: cid:link_1
    Step 1: Run the offline inference script:
    python inference_om.py --om_path ./snapshots/DGNet/DGNet.om --save_path ./seg_results_om/Exp-DGNet-OM/NC4K/ --data_path ./data/NC4K/Imgs
    Step 2: Quantitative evaluation, using the standard evaluation code from the original GitHub repository:
    # clone the original repository
    git clone https://github.com/GewelsJI/DGNet.git
    # move the following two items into the current project
    mv ./DGNet/lib_ascend/eval ./contrib/CamouflagedObjectDetection/
    mv ./DGNet/lib_ascend/evaluation.py ./contrib/CamouflagedObjectDetection/
    # run the evaluation
    python evaluation.py
    This produces a table of evaluation metrics. DGNet's S-measure is 0.856, which exceeds the "greater than 0.84" requirement stated in the project deliverables.
    | Dataset | Method       | Smeasure | wFmeasure | MAE   | adpEm | meanEm | maxEm | adpFm | meanFm | maxFm |
    | NC4K    | Exp-DGNet-OM | 0.856    | 0.782     | 0.043 | 0.909 | 0.91   | 0.921 | 0.8   | 0.812  | 0.833 |
    Qualitative evaluation: input camouflaged image and predicted segmentation result (figures omitted).
    5 Reference Citations
    The main references are the following three papers:
    @article{ji2022gradient,
      title={Deep Gradient Learning for Efficient Camouflaged Object Detection},
      author={Ji, Ge-Peng and Fan, Deng-Ping and Chou, Yu-Cheng and Dai, Dengxin and Liniger, Alexander and Van Gool, Luc},
      journal={Machine Intelligence Research},
      year={2023}
    }
    @article{fan2021concealed,
      title={Concealed Object Detection},
      author={Fan, Deng-Ping and Ji, Ge-Peng and Cheng, Ming-Ming and Shao, Ling},
      journal={IEEE TPAMI},
      year={2022}
    }
    @inproceedings{fan2020camouflaged,
      title={Camouflaged object detection},
      author={Fan, Deng-Ping and Ji, Ge-Peng and Sun, Guolei and Cheng, Ming-Ming and Shen, Jianbing and Shao, Ling},
      booktitle={IEEE CVPR},
      pages={2777--2787},
      year={2020}
    }
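    As a reference for the preprocessing module in Table 2.1, here is a minimal sketch of the resize-and-normalize step. The ImageNet mean/std values and the RGB conversion are assumptions; inference_om.py in the project is the authoritative version.
    ```python
    import cv2
    import numpy as np

    def preprocess(image_path):
        img = cv2.imread(image_path)                       # BGR, HWC, uint8
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (352, 352)).astype(np.float32) / 255.0
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet stats
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        img = (img - mean) / std
        img = img.transpose(2, 0, 1)[np.newaxis]           # HWC -> NCHW, add batch dim
        return np.ascontiguousarray(img)                   # (1, 3, 352, 352), matching the .om input

    print(preprocess("./data/NC4K/Imgs/74.jpg").shape)
    ```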
  • [Tech Share] Instance segmentation: the Mask R-CNN model
    Instance segmentation: the Mask R-CNN model
    In this case we walk through training and testing the instance segmentation model Mask R-CNN. In computer vision, instance segmentation is the task of identifying each individual object instance in an image and labeling it at the pixel level. Instance segmentation is widely used in autonomous driving, medical imaging, high-precision GIS recognition, 3D-modeling assistance, and other fields. This case gives a brief introduction to Mask R-CNN, a classic model in instance segmentation, and uses the Matterport open-source Mask R-CNN implementation to show how to train the model on Huawei Cloud ModelArts. Click through for a detailed explanation of the Mask R-CNN model.
    Notes:
    Framework used in this case: TensorFlow-1.13.1
    Hardware used in this case: 8 vCPU + 64 GiB + 1 x Tesla V100-PCIE-32GB
    Entering the environment: follow the link to AI Gallery and click the Run in ModelArts button; to use a GPU, switch hardware in the workspace panel on the right of the ModelArts JupyterLab interface
    Running the code: click the triangular run button in the menu bar at the top of the page, or press Ctrl+Enter, to run each code cell
    Detailed JupyterLab usage: see the "ModelArts JupyterLab User Guide"
    Troubleshooting: see "ModelArts JupyterLab FAQ"
    1. Install and import packages
    !pip install pycocotools==2.0.0
    (installation log omitted)
    Successfully installed pycocotools-2.0.0
    !pip install imgaug==0.2.9
    (installation log omitted)
    Successfully installed Shapely-1.7.1 imgaug-0.2.9
    2. Download the required code and data
    import os
    from modelarts.session import Session
    session = Session()

    if session.region_name == 'cn-north-1':
        bucket_path = "modelarts-labs/end2end/mask_rcnn/instance_segmentation.tar.gz"
    elif session.region_name == 'cn-north-4':
        bucket_path = "modelarts-labs-bj4/end2end/mask_rcnn/instance_segmentation.tar.gz"
    else:
        print("Please switch the region to Beijing-1 or Beijing-4")

    if not os.path.exists('./src/mrcnn'):
        session.download_data(bucket_path=bucket_path, path='./instance_segmentation.tar.gz')
        if os.path.exists('./instance_segmentation.tar.gz'):
            # unpack the resource archive with tar
            os.system("tar zxf ./instance_segmentation.tar.gz")
            # clean up the archive
            os.system("rm ./instance_segmentation.tar.gz")
    Successfully download file modelarts-labs-bj4/end2end/mask_rcnn/instance_segmentation.tar.gz from OBS to local ./instance_segmentation.tar.gz
    3. Mask R-CNN training
    3.1 Step 1: Import the Python libraries and prepare the pretrained model
    import sys
    import random
    import math
    import re
    import time
    import numpy as np
    import cv2
    import matplotlib
    import matplotlib.pyplot as plt
    from src.mrcnn.config import Config
    from src.mrcnn import utils
    import src.mrcnn.model as modellib
    from src.mrcnn import visualize
    from src.mrcnn.model import log
    %matplotlib inline

    # Directory to save logs and trained model
    MODEL_DIR = "logs"
    # Local path to trained weights file
    COCO_MODEL_PATH = "data/mask_rcnn_coco.h5"
    (NumPy FutureWarning output omitted)
    Using TensorFlow backend.
    3.2 Step 2: Build the configuration
    We define MyTrainConfig, a subclass of Config, and set the key parameters:
    NAME: a unique name for the Config
    NUM_CLASSES: number of classes; COCO has 80 object classes plus background
    IMAGE_MIN_DIM and IMAGE_MAX_DIM: minimum and maximum image sizes; in this case images are uniformly resized to 1024, so both are set to 1024
    TRAIN_ROIS_PER_IMAGE: number of RoIs trained per image
    STEPS_PER_EPOCH and VALIDATION_STEPS: step counts per epoch for training and validation; fewer steps speed up training at the cost of detection accuracy
    class MyTrainConfig(Config):
        # a recognizable name
        NAME = "my_train"
        # number of GPUs and images per GPU; adjust to your hardware (reference: Nvidia Tesla P100)
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
        # number of object classes: 80 COCO classes + background
        NUM_CLASSES = 1 + 80  # background + 80 shapes
        # images are uniformly resized to 1024; this can be reduced if needed
        IMAGE_MIN_DIM = 1024
        IMAGE_MAX_DIM = 1024
        # smaller anchors could be used for RoI detection on small shape images
        # RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels
        # number of RoIs trained per image; for small images with few objects this
        # can be reduced, since fewer anchors suffice to cover the objects
        TRAIN_ROIS_PER_IMAGE = 200
        # steps per training epoch
        STEPS_PER_EPOCH = 100
        # steps per validation round
        VALIDATION_STEPS = 20

    config = MyTrainConfig()
    config.display()
    Configurations:
    BACKBONE                       resnet101
    BACKBONE_STRIDES               [4, 8, 16, 32, 64]
    BATCH_SIZE                     1
    BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
    COMPUTE_BACKBONE_SHAPE         None
    DETECTION_MAX_INSTANCES        100
    DETECTION_MIN_CONFIDENCE       0.7
    DETECTION_NMS_THRESHOLD        0.3
    FPN_CLASSIF_FC_LAYERS_SIZE     1024
    GPU_COUNT                      1
    GRADIENT_CLIP_NORM             5.0
    IMAGES_PER_GPU                 1
    IMAGE_CHANNEL_COUNT            3
    IMAGE_MAX_DIM                  1024
    IMAGE_META_SIZE                93
    IMAGE_MIN_DIM                  1024
    IMAGE_MIN_SCALE                0
    IMAGE_RESIZE_MODE              square
    IMAGE_SHAPE                    [1024 1024 3]
    LEARNING_MOMENTUM              0.9
    LEARNING_RATE                  0.001
    LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
    MASK_POOL_SIZE                 14
    MASK_SHAPE                     [28, 28]
    MAX_GT_INSTANCES               100
    MEAN_PIXEL                     [123.7 116.8 103.9]
    MINI_MASK_SHAPE                (56, 56)
    NAME                           my_train
    NUM_CLASSES                    81
    POOL_SIZE                      7
    POST_NMS_ROIS_INFERENCE        1000
    POST_NMS_ROIS_TRAINING         2000
    PRE_NMS_LIMIT                  6000
    ROI_POSITIVE_RATIO             0.33
    RPN_ANCHOR_RATIOS              [0.5, 1, 2]
    RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
    RPN_ANCHOR_STRIDE              1
    RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
    RPN_NMS_THRESHOLD              0.7
    RPN_TRAIN_ANCHORS_PER_IMAGE    256
    STEPS_PER_EPOCH                100
    TOP_DOWN_PYRAMID_SIZE          256
    TRAIN_BN                       False
    TRAIN_ROIS_PER_IMAGE           200
    USE_MINI_MASK                  True
    USE_RPN_ROIS                   True
    VALIDATION_STEPS               20
    WEIGHT_DECAY                   0.0001
    3.3 Step 3: Prepare the datasets
    We use the packaged CocoDataset class to build the training and validation sets.
    from src.mrcnn.coco import CocoDataset
    COCO_DIR = 'data'

    # build the training set
    dataset_train = CocoDataset()
    dataset_train.load_coco(COCO_DIR, "train")  # load the training data
    dataset_train.prepare()
    loading annotations into memory... Done (t=0.04s) creating index... index created!
    # build the validation set
    dataset_val = CocoDataset()
    dataset_val.load_coco(COCO_DIR, "val")  # load the validation data
    dataset_val.prepare()
    loading annotations into memory... Done (t=0.17s) creating index... index created!
    4. Create the model
    4.1 Step 1: Create the model object in "training" mode
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)
    (TensorFlow deprecation warnings omitted)
    [DEBUG] <__main__.MyTrainConfig object at 0x7f9b6edc7c50>
    [DEBUG] Tensor("rpn_class/concat:0", shape=(?, ?, 2), dtype=float32) Tensor("rpn_bbox_1/concat:0", shape=(?, ?, 4), dtype=float32) <tf.Variable 'anchors/Variable:0' shape=(1, 261888, 4) dtype=float32_ref>
    4.2 Step 2: Load the pretrained weights
    model.load_weights(COCO_MODEL_PATH, by_name=True)
    Next, starting from the pretrained weights, we train the model on the dataset prepared above.
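    As a sanity check on the (1, 261888, 4) anchor tensor in the debug output above, the anchor count follows directly from the configuration values printed by config.display():
    ```python
    # One anchor per aspect ratio at every cell of each FPN level
    # (RPN_ANCHOR_STRIDE is 1). Values taken from config.display() above.
    image_dim = 1024                         # IMAGE_SHAPE [1024 1024 3]
    backbone_strides = [4, 8, 16, 32, 64]    # BACKBONE_STRIDES
    ratios_per_cell = 3                      # RPN_ANCHOR_RATIOS [0.5, 1, 2]

    total = sum((image_dim // s) ** 2 * ratios_per_cell for s in backbone_strides)
    print(total)  # 261888
    ```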
    5. Train the model
    Keras models can be trained layer by layer; the layers argument of the train method selects which layers to train. It accepts the following presets:
    heads: train only the classification, mask, and bbox-regression heads
    all: all layers
    3+: train ResNet stage 3 and later stages
    4+: train ResNet stage 4 and later stages
    5+: train ResNet stage 5 and later stages
    The layers argument also accepts a regular expression to select layers by name; call model.keras_model.summary() to list the layer names, then specify the layers you need.
    The step below trains all layers for 1 epoch, which takes about 4 minutes.
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=1,
                layers='all')
    model_savepath = 'my_mrcnn_model.h5'
    model.keras_model.save_weights(model_savepath)
    Starting at epoch 0. LR=0.001
    Checkpoint Path: logs/my_train20210309T1458/mask_rcnn_my_train_{epoch:04d}.h5
    (TensorFlow and Keras warnings omitted)
    Epoch 1/1 100/100 [==============================] - 111s 1s/step - loss: 0.4283 - rpn_class_loss: 0.0090 - rpn_bbox_loss: 0.0787 - mrcnn_class_loss: 0.0627 - mrcnn_bbox_loss: 0.0758 - mrcnn_mask_loss: 0.2021 - val_loss: 0.4290 - val_rpn_class_loss: 0.0100 - val_rpn_bbox_loss: 0.1086 - val_mrcnn_class_loss: 0.0920 - val_mrcnn_bbox_loss: 0.0539 - val_mrcnn_mask_loss: 0.1645
    6. Detect objects in images with Mask R-CNN
    6.1 Step 1: Define InferenceConfig and create an "inference"-mode model
    class InferenceConfig(MyTrainConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1

    inference_config = InferenceConfig()
    inference_model = modellib.MaskRCNN(mode="inference", config=inference_config, model_dir=MODEL_DIR)
    (debug and deprecation output omitted)
    Load in the weights we just trained:
    # load the weights of the model file we trained ourselves
    print("Loading weights from ", model_savepath)
    inference_model.load_weights(model_savepath, by_name=True)
    Loading weights from my_mrcnn_model.h5
    6.2 Step 2: Pick a random image from the validation set, predict, and display the result
    # pick a random image for testing
    image_id = random.choice(dataset_val.image_ids)
    original_image, image_meta, gt_class_id, gt_bbox, gt_mask = \
        modellib.load_image_gt(dataset_val, inference_config, image_id, use_mini_mask=False)
    log("original_image", original_image)
    log("image_meta", image_meta)
    log("gt_class_id", gt_class_id)
    log("gt_bbox", gt_bbox)
    log("gt_mask", gt_mask)
    det_instances_savepath = 'random.det_instances.jpg'
    visualize.display_instances(original_image, gt_bbox, gt_mask, gt_class_id,
                                dataset_train.class_names, figsize=(8, 8),
                                save_path=det_instances_savepath)
    original_image shape: (1024, 1024, 3) min: 0.00000 max: 255.00000 uint8
    image_meta shape: (93,) min: 0.00000 max: 1024.00000 float64
    gt_class_id shape: (17,) min: 1.00000 max: 74.00000 int32
    gt_bbox shape: (17, 4) min: 1.00000 max: 1024.00000 int32
    gt_mask shape: (1024, 1024, 17) min: 0.00000 max: 1.00000 bool
    # helper that sets up the rows and columns of matplotlib subplots
    def get_ax(rows=1, cols=1, size=8):
        _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
        return ax

    results = inference_model.detect([original_image], verbose=1)
    r = results[0]
    prediction_savepath = 'random.prediction.jpg'
    visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'],
                                dataset_val.class_names, r['scores'], ax=get_ax(),
                                save_path=prediction_savepath)
    Processing 1 images
    image shape: (1024, 1024, 3) min: 0.00000 max: 255.00000 uint8
    molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 151.10000 float64
    image_metas shape: (1, 93) min: 0.00000 max: 1024.00000 int64
    anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32
    6.3 Step 3: Test other images. The data/val2014 directory contains many test images; change the file name assigned to test_path below to try different images.
    test_path = './data/val2014/COCO_val2014_000000019176.jpg'
    import skimage.io
    image = skimage.io.imread(test_path)
    results = inference_model.detect([image], verbose=1)
    r = results[0]
    prediction_savepath = 'self.prediction.jpg'
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                                dataset_val.class_names, r['scores'], ax=get_ax(),
                                save_path=prediction_savepath)
    Processing 1 images
    image shape: (480, 640, 3) min: 0.00000 max: 255.00000 uint8
    molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 151.10000 float64
    image_metas shape: (1, 93) min: 0.00000 max: 1024.00000 float64
    anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32
    7. Evaluate the model
    Here we run a simple evaluation of the model we trained, computing its mean average precision (mAP).
    # compute VOC-style mAP at IoU=0.5
    # only 10 images are sampled below; more images make the evaluation more reliable
    image_ids = np.random.choice(dataset_val.image_ids, 10)
    APs = []
    for image_id in image_ids:
        # Load image and ground truth data
        image, image_meta, gt_class_id, gt_bbox, gt_mask = \
            modellib.load_image_gt(dataset_val, inference_config, image_id, use_mini_mask=False)
        molded_images = np.expand_dims(modellib.mold_image(image, inference_config), 0)
        # Run object detection
        results = inference_model.detect([image], verbose=0)
        r = results[0]
        # Compute AP
        AP, precisions, recalls, overlaps = \
            utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                             r["rois"], r["class_ids"], r["scores"], r['masks'])
        APs.append(AP)
    print("mAP: ", np.mean(APs))
    mAP: 0.6203394930987131
    This concludes the case.
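    utils.compute_ap above returns the VOC-style average precision at IoU 0.5. As background, here is a minimal sketch of the underlying idea: take the monotone precision envelope of the precision/recall curve and integrate it over recall. It is illustrative only; utils.compute_ap is the version actually used.
    ```python
    import numpy as np

    def average_precision(precisions, recalls):
        # pad the curve so it spans recall 0..1
        p = np.concatenate(([0.0], precisions, [0.0]))
        r = np.concatenate(([0.0], recalls, [1.0]))
        # enforce a monotonically decreasing precision envelope
        for i in range(len(p) - 2, -1, -1):
            p[i] = max(p[i], p[i + 1])
        # sum precision * recall-step wherever recall changes
        idx = np.where(r[1:] != r[:-1])[0]
        return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

    print(average_precision(np.array([1.0, 0.5, 0.66]), np.array([0.5, 0.5, 1.0])))  # 0.83
    ```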
  • [Tech Share] Computer-vision-based rebar counting
    Counting rebar with a camera
    Case overview: construction sites in China consume huge quantities of rebar every year. When a truckload of rebar arrives on site, workers have to tally it, usually by counting the bars one by one, which is slow and laborious. To improve efficiency, the industry proposed photographing the rebar and using an AI algorithm to count the bars in the image; practice has shown this approach is not only highly accurate but also much faster. This case takes an object-detection approach: using 250 manually annotated images and about 25 minutes of training, the model detects the rebar cross-sections in an image and thereby counts the bars.
    Notes:
    Recommended AI framework for this case: PyTorch-1.0.0
    Entering the environment: follow the link to AI Gallery and click the Run in ModelArts button; to use a GPU, see the "ModelArts JupyterLab Hardware Specification Guide" for how to switch hardware; first-time JupyterLab users, see the "ModelArts JupyterLab User Guide"; if you hit errors in JupyterLab, see the "ModelArts JupyterLab FAQ".
    Steps
    1. Start of the rebar-counting case: download the code and datasets
    import os
    if not os.path.exists('./rebar_count'):
        print('Downloading code and datasets...')
        os.system("wget -N https://modelarts-labs-bj4-v2.obs.cn-north-4.myhuaweicloud.com/notebook/DL_rebar_count/rebar_count_code.zip")
        os.system("wget -N https://cnnorth4-modelhub-datasets-obsfs-sfnua.obs.cn-north-4.myhuaweicloud.com/content/c2c1853f-d6a6-4c9d-ac0e-203d4c304c88/NkxX5K/dataset/rebar_count_datasets.zip")
        os.system("unzip rebar_count_code.zip; rm rebar_count_code.zip")
        os.system("unzip -q rebar_count_datasets.zip; rm rebar_count_datasets.zip")
        os.system("mv rebar_count_code rebar_count; mv rebar_count_datasets rebar_count/datasets")
        if os.path.exists('./rebar_count'):
            print('Download code and datasets success')
        else:
            print('Download code and datasets failed, please check the download url is valid or not.')
    else:
        print('./rebar_count already exists')
    ./rebar_count already exists
    2. Import the required Python modules
    import os
    import sys
    sys.path.insert(0, './rebar_count/src')
    import cv2
    import time
    import random
    import torch
    import numpy as np
    from PIL import Image, ImageDraw
    import xml.etree.ElementTree as ET
    from datetime import datetime
    from collections import OrderedDict
    import torch.optim as optim
    import torch.utils.data as data
    import torch.backends.cudnn as cudnn
    from torch.autograd import Variable  # used by the training loop below
    from data import VOCroot, VOC_Config, AnnotationTransform, VOCDetection, detection_collate, BaseTransform, preproc
    from models.RFB_Net_vgg import build_net
    from layers.modules import MultiBoxLoss
    from layers.functions import Detect, PriorBox
    from utils.visualize import *
    from utils.nms_wrapper import nms
    from utils.timer import Timer
    import matplotlib.pyplot as plt
    %matplotlib inline

    ROOT_DIR = os.getcwd()
    seed = 0
    cudnn.benchmark = False
    cudnn.deterministic = True
    torch.manual_seed(seed)            # seed the CPU RNG
    torch.cuda.manual_seed_all(seed)   # seed all GPUs
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)  # fix the hash seed
    3. Inspect a training sample
    def read_xml(xml_path):
        '''read an XML annotation file'''
        tree = ET.parse(xml_path)
        root = tree.getroot()
        boxes = []
        labels = []
        for element in root.findall('object'):
            label = element.find('name').text
            if label == 'steel':
                bndbox = element.find('bndbox')
                xmin = bndbox.find('xmin').text
                ymin = bndbox.find('ymin').text
                xmax = bndbox.find('xmax').text
                ymax = bndbox.find('ymax').text
                boxes.append([xmin, ymin, xmax, ymax])
                labels.append(label)
        return np.array(boxes, dtype=np.float64), labels
    4. Display original images with their annotation boxes
    train_img_dir = './rebar_count/datasets/VOC2007/JPEGImages'
    train_xml_dir = './rebar_count/datasets/VOC2007/Annotations'
    files = os.listdir(train_img_dir)
    files.sort()
    for index, file_name in enumerate(files[:2]):
        img_path = os.path.join(train_img_dir, file_name)
        xml_path = os.path.join(train_xml_dir, file_name.split('.jpg')[0] + '.xml')
        boxes, labels = read_xml(xml_path)
        img = Image.open(img_path)
        resize_scale = 2048.0 / max(img.size)
        img = img.resize((int(img.size[0] * resize_scale), int(img.size[1] * resize_scale)))
        boxes *= resize_scale
        plt.figure(figsize=(img.size[0]/100.0, img.size[1]/100.0))
        plt.subplot(2, 1, 1)
        plt.imshow(img)
        img = img.convert('RGB')
        img = np.array(img)
        img = img.copy()
        for box in boxes:
            xmin, ymin, xmax, ymax = box.astype(np.int)
            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), thickness=3)
        plt.subplot(2, 1, 2)
        plt.imshow(img)
        plt.show()
    5. Define training hyperparameters and the model/log save paths
    # training hyperparameters
    num_classes = 2   # the dataset has a single label, steel, plus background: 2 classes in total
    max_epoch = 25    # default is 1; values above 20 give better results
    batch_size = 4
    ngpu = 1
    initial_lr = 0.01
    img_dim = 416     # model input size
    train_sets = [('2007', 'trainval')]   # training set
    cfg = VOC_Config
    rgb_means = (104, 117, 123)   # RGB means of the ImageNet dataset

    save_folder = './rebar_count/model_snapshots'   # model save path
    if not os.path.exists(save_folder):
        os.mkdir(save_folder)
    log_path = os.path.join('./rebar_count/logs', datetime.now().isoformat())   # log save path
    if not os.path.exists(log_path):
        os.makedirs(log_path)
    6. Build the model, define the optimizer and the loss function
    net = build_net('train', img_dim, num_classes=num_classes)
    if ngpu > 1:
        net = torch.nn.DataParallel(net)
    net.cuda()   # this case can only be trained on GPU
    cudnn.benchmark = True
    # optimizer
    optimizer = optim.SGD(net.parameters(), lr=initial_lr, momentum=0.9, weight_decay=0)
    # loss function
    criterion = MultiBoxLoss(num_classes,
                             overlap_thresh=0.4,
                             prior_for_matching=True,
                             bkg_label=0,
                             neg_mining=True,
                             neg_pos=3,
                             neg_overlap=0.3,
                             encode_target=False)
    priorbox = PriorBox(cfg)
    with torch.no_grad():
        priors = priorbox.forward()
        priors = priors.cuda()
    7. Define the adaptive learning-rate function
    def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
        """adaptive learning rate: linear warm-up for the first 10 epochs, then step decay"""
        if epoch < 11:
            lr = 1e-8 + (initial_lr - 1e-8) * iteration / (epoch_size * 10)
        else:
            lr = initial_lr * (gamma ** (step_index))
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
        return lr
    8. Define the training function
    def train():
        """training loop: logs every 10 iterations; after epoch 20, saves the model once per epoch"""
        net.train()
        loc_loss = 0
        conf_loss = 0
        epoch = 0
        print('Loading dataset...')
        dataset = VOCDetection(VOCroot, train_sets, preproc(img_dim, rgb_means, p=0.0), AnnotationTransform())
        epoch_size = len(dataset) // batch_size
        max_iter = max_epoch * epoch_size
        stepvalues = (25 * epoch_size, 35 * epoch_size)
        step_index = 0
        start_iter = 0
        lr = initial_lr
        for iteration in range(start_iter, max_iter):
            if iteration % epoch_size == 0:
                if epoch > 20:
                    torch.save(net.state_dict(),
                               os.path.join(save_folder, 'epoch_' + repr(epoch).zfill(3) + '_loss_' + '%.4f' % loss.item() + '.pth'))
                batch_iterator = iter(data.DataLoader(dataset, batch_size, shuffle=True, num_workers=1, collate_fn=detection_collate))
                loc_loss = 0
                conf_loss = 0
                epoch += 1
            load_t0 = time.time()
            if iteration in stepvalues:
                step_index += 1
            lr = adjust_learning_rate(optimizer, 0.2, epoch, step_index, iteration, epoch_size)
            images, targets = next(batch_iterator)
            images = Variable(images.cuda())
            targets = [Variable(anno.cuda()) for anno in targets]
            # forward
            t0 = time.time()
            out = net(images)
            # backprop
            optimizer.zero_grad()
            loss_l, loss_c = criterion(out, priors, targets)
            loss = loss_l + loss_c
            loss.backward()
            optimizer.step()
            t1 = time.time()
            loc_loss += loss_l.item()
            conf_loss += loss_c.item()
            load_t1 = time.time()
            if iteration % 10 == 0:
                print('Epoch:' + repr(epoch) + ' || epochiter: ' + repr(iteration % epoch_size) + '/' + repr(epoch_size) +
                      '|| Totel iter ' + repr(iteration) + ' || L: %.4f C: %.4f||' % (loss_l.item(), loss_c.item()) +
                      'Batch time: %.4f sec. ||' % (load_t1 - load_t0) + 'LR: %.8f' % (lr))
        torch.save(net.state_dict(),
                   os.path.join(save_folder, 'epoch_' + repr(epoch).zfill(3) + '_loss_' + '%.4f' % loss.item() + '.pth'))
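    Before looking at the training log, a quick check of the linear warm-up branch in adjust_learning_rate: for the first 10 epochs the learning rate ramps from about 1e-8 up to initial_lr. With epoch_size = 50, as the log below reports ("epochiter: .../50"), the rate should reach about half of initial_lr at iteration 250:
    ```python
    # Sanity check of the warm-up branch of adjust_learning_rate (section 7)
    initial_lr = 0.01
    epoch_size = 50

    def warmup_lr(iteration):
        return 1e-8 + (initial_lr - 1e-8) * iteration / (epoch_size * 10)

    print(warmup_lr(250))  # ~0.005, matching "LR: 0.00500001" at total iter 250 below
    ```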
    9. Start training; each epoch takes about 60 seconds
    t1 = time.time()
    print('Training starts; %d epochs in total, each taking about 60 seconds' % max_epoch)
    train()
    print('training cost %.2f s' % (time.time() - t1))
    Training starts; 25 epochs in total, each taking about 60 seconds
    Loading dataset...
    Epoch:1 || epochiter: 0/50|| Totel iter 0 || L: 3.7043 C: 3.7730||Batch time: 2.6931 sec. ||LR: 0.00000001
    Epoch:1 || epochiter: 10/50|| Totel iter 10 || L: 3.1277 C: 3.1485||Batch time: 1.3692 sec. ||LR: 0.00020001
    Epoch:6 || epochiter: 0/50|| Totel iter 250 || L: 0.7907 C: 0.9508||Batch time: 1.2835 sec. ||LR: 0.00500001
    Epoch:11 || epochiter: 0/50|| Totel iter 500 || L: 0.8902 C: 0.8956||Batch time: 1.8625 sec. ||LR: 0.01000000
    (training log abridged; the losses decrease steadily over the 25 epochs)
    Epoch:25 || epochiter: 40/50|| Totel iter 1240 || L: 0.6699 C: 0.8887||Batch time: 1.7034 sec. ||LR: 0.01000000
    training cost 1572.48 s
    10. Training is complete; next we test the model. First define the object-detection class
    cfg = VOC_Config
    img_dim = 416
    rgb_means = (104, 117, 123)

    priorbox = PriorBox(cfg)
    with torch.no_grad():
        priors = priorbox.forward()
        if torch.cuda.is_available():
            priors = priors.cuda()

    class ObjectDetector:
        """object-detection wrapper: forward pass, then per-class score filtering and NMS"""
        def __init__(self, net, detection, transform, num_classes=num_classes, thresh=0.01, cuda=True):
            self.net = net
            self.detection = detection
            self.transform = transform
            self.num_classes = num_classes
            self.thresh = thresh
            self.cuda = torch.cuda.is_available()

        def predict(self, img):
            _t = {'im_detect': Timer(), 'misc': Timer()}
            scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
            with torch.no_grad():
                x = self.transform(img).unsqueeze(0)
                if self.cuda:
                    x = x.cuda()
                    scale = scale.cuda()
            _t['im_detect'].tic()
            out = net(x)  # forward pass
            boxes, scores = self.detection.forward(out, priors)
            detect_time = _t['im_detect'].toc()
            boxes = boxes[0]
            scores = scores[0]
            # scale each detection back up to the image
            boxes *= scale
            boxes = boxes.cpu().numpy()
            scores = scores.cpu().numpy()
            _t['misc'].tic()
            all_boxes = [[] for _ in range(num_classes)]
            for j in range(1, num_classes):
                inds = np.where(scores[:, j] > self.thresh)[0]
                if len(inds) == 0:
                    all_boxes[j] = np.zeros([0, 5], dtype=np.float32)
                    continue
                c_bboxes = boxes[inds]
                c_scores = scores[inds, j]
                c_dets = np.hstack((c_bboxes, c_scores[:, np.newaxis])).astype(np.float32, copy=False)
                keep = nms(c_dets, 0.2, force_cpu=False)
                c_dets = c_dets[keep, :]
                all_boxes[j] = c_dets
            nms_time = _t['misc'].toc()
            total_time = detect_time + nms_time
            return all_boxes, total_time
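    The nms call in ObjectDetector.predict keeps the highest-scoring boxes and discards boxes whose IoU with an already-kept box exceeds the threshold (0.2 here). For reference, a pure-NumPy sketch of the same greedy procedure; the project actually uses utils.nms_wrapper.
    ```python
    import numpy as np

    def greedy_nms(dets, iou_thresh):
        """dets: (N, 5) array of [x1, y1, x2, y2, score]; returns kept row indices."""
        x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]            # highest score first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # intersection of the current best box with the remaining boxes
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            order = order[1:][iou <= iou_thresh]  # drop boxes that overlap too much
        return keep
    ```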
    11. Define the inference network and load the lowest-loss model from the training above
    trained_models = os.listdir(os.path.join(ROOT_DIR, './rebar_count/model_snapshots'))  # model directory
    lowest_loss = 9999
    best_model_name = ''
    for model_name in trained_models:
        if not model_name.endswith('pth'):
            continue
        loss = float(model_name.split('_loss_')[1].split('.pth')[0])
        if loss < lowest_loss:
            lowest_loss = loss
            best_model_name = model_name
    best_model_path = os.path.join(ROOT_DIR, './rebar_count/model_snapshots', best_model_name)
    print('loading model from', best_model_path)

    net = build_net('test', img_dim, num_classes)  # build the inference network
    state_dict = torch.load(best_model_path)
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        head = k[:7]
        if head == 'module.':
            name = k[7:]  # strip the DataParallel prefix
        else:
            name = k
        new_state_dict[name] = v
    net.load_state_dict(new_state_dict)
    net.eval()
    print('Finish load model!')
    if torch.cuda.is_available():
        net = net.cuda()
        cudnn.benchmark = True
    else:
        net = net.cpu()
    detector = Detect(num_classes, 0, cfg)
    transform = BaseTransform(img_dim, rgb_means, (2, 0, 1))
    object_detector = ObjectDetector(net, detector, transform)
    loading model from /home/ma-user/work/./rebar_count/model_snapshots/epoch_023_loss_1.0207.pth
    Finish load model!
    12. Test images: output the position of each bar and the total bar count per image
    test_img_dir = r'./rebar_count/datasets/test_dataset'  # directory of images to predict
    files = os.listdir(test_img_dir)
    files.sort()
    for i, file_name in enumerate(files[:2]):
        image_src = cv2.imread(os.path.join(test_img_dir, file_name))
        detect_bboxes, tim = object_detector.predict(image_src)
        image_draw = image_src.copy()
        rebar_count = 0
        for class_id, class_collection in enumerate(detect_bboxes):
            if len(class_collection) > 0:
                for i in range(class_collection.shape[0]):
                    if class_collection[i, -1] > 0.6:
                        pt = class_collection[i]
                        cv2.circle(image_draw,
                                   (int((pt[0] + pt[2]) * 0.5), int((pt[1] + pt[3]) * 0.5)),
                                   int((pt[2] - pt[0]) * 0.5 * 0.6), (255, 0, 0), -1)
                        rebar_count += 1
        cv2.putText(image_draw, 'rebar_count: %d' % rebar_count, (25, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
        plt.figure(i, figsize=(30, 20))
        plt.imshow(image_draw)
        plt.show()
    This concludes the case.
  • [Tech Share] Hands-on object detection with YOLOv3
    Hands-on object detection with YOLOv3
    Object detection is an important research area in computer vision, with wide applications in crowd detection, pedestrian tracking, autonomous driving, medical imaging, and more. Unlike plain image classification, object detection aims to identify targets in an image precisely, including both their positions and their classes, which makes it applicable to higher-level vision-processing scenarios. In autonomous driving, for example, the vehicles, pedestrians, and traffic signs in camera images, together with their positions, must be recognized so that a driving strategy can be derived from this data. This case focuses on the YOLO algorithm; YOLO (You Only Look Once) is a one-stage object detection algorithm.
    Notes:
    Framework used in this case: TensorFlow-1.13.1
    Hardware used in this case: GPU V100
    Entering the environment: follow the link to AI Gallery and click the Run in ModelArts button; to use a GPU, switch hardware in the workspace panel on the right of the ModelArts JupyterLab interface
    Running the code: click the triangular run button in the menu bar at the top of the page, or press Ctrl+Enter, to run each code cell
    Detailed JupyterLab usage: see the "ModelArts JupyterLab User Guide"
    Troubleshooting: see "ModelArts JupyterLab FAQ"
    1. Download the data and code
    Run the code below to download and unpack the data and code. This case uses COCO data with 80 classes in total.
    import os
    from modelarts.session import Session
    sess = Session()

    if sess.region_name == 'cn-north-1':
        bucket_path = "modelarts-labs/notebook/DL_object_detection_yolo/yolov3.tar.gz"
    elif sess.region_name == 'cn-north-4':
        bucket_path = "modelarts-labs-bj4/notebook/DL_object_detection_yolo/yolov3.tar.gz"
    else:
        print("Please switch the region to Beijing-1 or Beijing-4")

    if not os.path.exists('./yolo3'):
        sess.download_data(bucket_path=bucket_path, path="./yolov3.tar.gz")
        if os.path.exists('./yolov3.tar.gz'):
            # unpack the archive
            os.system("tar -xf ./yolov3.tar.gz")
            # clean up the archive
            os.system("rm -r ./yolov3.tar.gz")
    2. Prepare the data
    2.1 File path definitions
    from train import get_classes, get_anchors
    # data files
    data_path = "./coco/coco_data"
    # COCO class-definition file
    classes_path = './model_data/coco_classes.txt'
    # COCO anchor file
    anchors_path = './model_data/yolo_anchors.txt'
    # COCO annotation file
    annotation_path = './coco/coco_train.txt'
    # pretrained weights file
    weights_path = "./model_data/yolo.h5"
    # model save path
    save_path = "./result/models/"

    classes = get_classes(classes_path)
    anchors = get_anchors(anchors_path)
    # numbers of classes and anchors
    num_classes = len(classes)
    num_anchors = len(anchors)
    Using TensorFlow backend.
    (NumPy FutureWarning output omitted)
    2.2 Read the annotation data
    import numpy as np
    # train/validation split ratio
    val_split = 0.1
    with open(annotation_path) as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    np.random.seed(None)
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val
    2.3 Data-reading functions: build a data generator that reads one batch into memory at a time for training and applies data augmentation.
    def data_generator(annotation_lines, batch_size, input_shape, data_path, anchors, num_classes):
        n = len(annotation_lines)
        i = 0
        while True:
            image_data = []
            box_data = []
            for b in range(batch_size):
                if i == 0:
                    np.random.shuffle(annotation_lines)
                # randomly pick one sample for the batch
                image, box = get_random_data(annotation_lines[i], input_shape, data_path, random=True)
                image_data.append(image)
                box_data.append(box)
                i = (i+1) % n
            image_data = np.array(image_data)
            box_data = np.array(box_data)
            # preprocess the ground-truth boxes and filter out invalid ones
            y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
            yield [image_data, *y_true], np.zeros(batch_size)

    def data_generator_wrapper(annotation_lines, batch_size, input_shape, data_path, anchors, num_classes):
        n = len(annotation_lines)
        if n == 0 or batch_size <= 0:
            return None
        return data_generator(annotation_lines, batch_size, input_shape, data_path, anchors, num_classes)
    3. Train the model
    This case builds the YOLOv3 network with the Keras deep-learning framework. You can browse the corresponding folders to read the source implementation.
    3.1 Build the network (see ./yolo3/model.py for details)
    import keras.backend as K
    from yolo3.model import preprocess_true_boxes, yolo_body, yolo_loss
    from keras.layers import Input, Lambda
    from keras.models import Model

    # reset the session
    K.clear_session()
    # input image size
    input_shape = (416, 416)
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    # ground-truth inputs for the three downsampling scales used in multi-scale detection
    y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], num_anchors//3, num_classes+5)) for l in range(3)]
    # build the YOLO model structure
    model_body = yolo_body(image_input, num_anchors//3, num_classes)
    # load the pretrained YOLO weights; delete this line to train from scratch
    model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
    # define the YOLO loss function
    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
                        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})([*model_body.output, *y_true])
    # build the Model for training
    model = Model([model_body.input, *y_true], model_loss)
    (TensorFlow deprecation warnings omitted)
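    The three y_true inputs defined above correspond to YOLOv3's three detection scales (strides 32, 16, and 8). For the 416 x 416 input used here, with 9 anchors (3 per scale) and 80 COCO classes, their shapes work out as follows:
    ```python
    # 85 = 4 box coordinates + 1 objectness score + 80 class probabilities
    h = w = 416
    num_anchors, num_classes = 9, 80
    for l, stride in enumerate([32, 16, 8]):
        print(l, (h // stride, w // stride, num_anchors // 3, num_classes + 5))
    # 0 (13, 13, 3, 85)
    # 1 (26, 26, 3, 85)
    # 2 (52, 52, 3, 85)
    ```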
    # print the model structure, layer by layer
    model.summary()
    (the output of this cell is very long and is omitted)
    Define the training callbacks:
    from keras.callbacks import ReduceLROnPlateau, EarlyStopping
    # learning-rate decay policy
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    # early-stopping policy
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)
    3.2 Start training
    from keras.optimizers import Adam
    from yolo3.utils import get_random_data

    # make all layers trainable
    for i in range(len(model.layers)):
        model.layers[i].trainable = True
    # choose the Adam optimizer and set the learning rate
    learning_rate = 1e-4
    model.compile(optimizer=Adam(lr=learning_rate), loss={'yolo_loss': lambda y_true, y_pred: y_pred})
    # set the batch size and number of epochs
    batch_size = 16
    max_epochs = 2
    print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
    # start training
    model.fit_generator(
        data_generator_wrapper(lines[:num_train], batch_size, input_shape, data_path, anchors, num_classes),
        steps_per_epoch=max(1, num_train//batch_size),
        validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, data_path, anchors, num_classes),
        validation_steps=max(1, num_val//batch_size),
        epochs=max_epochs,
        initial_epoch=0,
        callbacks=[reduce_lr, early_stopping])
    Train on 179 samples, val on 19 samples, with batch size 16.
    Epoch 1/2 11/11 [==============================] - 25s 2s/step - loss: 46.6694 - val_loss: 39.1381
    Epoch 2/2 11/11 [==============================] - 5s 452ms/step - loss: 45.5145 - val_loss: 43.6707
    <keras.callbacks.History at 0x7fbff60659e8>
    3.3 Save the model
    import os
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    # save the weights
    model.save_weights(os.path.join(save_path, 'trained_weights_final.h5'))
    4. Test the model
    4.1 Open a test image
    from PIL import Image
    import numpy as np
    # test file path
    test_file_path = './test.jpg'
    # open the test file
    image = Image.open(test_file_path)
    image_ori = np.array(image)
    image_ori.shape
    (640, 481, 3)
    4.2 Preprocess the image
    from yolo3.utils import letterbox_image
    new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32))
    boxed_image = letterbox_image(image, new_image_size)
    image_data = np.array(boxed_image, dtype='float32')
    image_data /= 255.
    image_data = np.expand_dims(image_data, 0)
    image_data.shape
    (1, 640, 480, 3)
    import keras.backend as K
    sess = K.get_session()
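    letterbox_image in yolo3/utils.py, used in section 4.2 above, resizes the image while preserving its aspect ratio and pads the borders to the target size. A minimal PIL sketch of the usual implementation, for reference (the repository's function is authoritative):
    ```python
    from PIL import Image

    def letterbox(image, size):
        iw, ih = image.size
        w, h = size
        scale = min(w / iw, h / ih)                       # fit inside the target box
        nw, nh = int(iw * scale), int(ih * scale)
        resized = image.resize((nw, nh), Image.BICUBIC)
        canvas = Image.new('RGB', size, (128, 128, 128))  # gray padding
        canvas.paste(resized, ((w - nw) // 2, (h - nh) // 2))
        return canvas
    ```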
4. Model testing
4.1 Open a test image

from PIL import Image
import numpy as np

# Path of the test file
test_file_path = './test.jpg'
# Open the test image
image = Image.open(test_file_path)
image_ori = np.array(image)
image_ori.shape

Output: (640, 481, 3)

4.2 Image preprocessing

from yolo3.utils import letterbox_image

# Letterbox-resize to the nearest multiple of 32, then normalize to [0, 1]
new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32))
boxed_image = letterbox_image(image, new_image_size)
image_data = np.array(boxed_image, dtype='float32')
image_data /= 255.
image_data = np.expand_dims(image_data, 0)
image_data.shape

Output: (1, 640, 480, 3)

import keras.backend as K
sess = K.get_session()

4.3 Building the model

from yolo3.model import yolo_body
from keras.layers import Input

# Path of the COCO anchor file
anchor_path = "./model_data/yolo_anchors.txt"
with open(anchor_path) as f:
    anchors = f.readline()
anchors = [float(x) for x in anchors.split(',')]
anchors = np.array(anchors).reshape(-1, 2)
yolo_model = yolo_body(Input(shape=(None, None, 3)), len(anchors) // 3, num_classes)

4.4 Load the model weights; alternatively, replace the path with the model trained in the previous step

# Path of the model weights
weights_path = "./model_data/yolo.h5"
yolo_model.load_weights(weights_path)

4.5 Define the IoU and score thresholds:
iou: boxes whose overlap with a higher-scoring box exceeds this IoU are removed as redundant (non-maximum suppression)
score: only boxes whose prediction score exceeds this threshold are kept

iou = 0.45
score = 0.8

4.6 Build the outputs [boxes, scores, classes]

from yolo3.model import yolo_eval

input_image_shape = K.placeholder(shape=(2,))
boxes, scores, classes = yolo_eval(
    yolo_model.output, anchors, num_classes, input_image_shape,
    score_threshold=score, iou_threshold=iou)

4.7 Run the prediction

out_boxes, out_scores, out_classes = sess.run(
    [boxes, scores, classes],
    feed_dict={
        yolo_model.input: image_data,
        input_image_shape: [image.size[1], image.size[0]],
        K.learning_phase(): 0
    })
class_coco = get_classes(classes_path)
out_coco = []
for i in out_classes:
    out_coco.append(class_coco[i])
print(out_boxes)
print(out_scores)
print(out_coco)

Output:
[[152.69937  166.2726   649.0503   459.9374 ]
 [ 68.62158   21.843088 465.66208  452.6878 ]]
[0.9838943 0.999688 ]
['person', 'umbrella']

4.8 Draw the predictions on the image

from PIL import Image, ImageFont, ImageDraw

font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                          size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
thickness = (image.size[0] + image.size[1]) // 300
for i, c in reversed(list(enumerate(out_coco))):
    predicted_class = c
    box = out_boxes[i]
    score = out_scores[i]
    label = '{} {:.2f}'.format(predicted_class, score)
    draw = ImageDraw.Draw(image)
    label_size = draw.textsize(label, font)
    top, left, bottom, right = box
    top = max(0, np.floor(top + 0.5).astype('int32'))
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
    right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
    print(label, (left, top), (right, bottom))
    if top - label_size[1] >= 0:
        text_origin = np.array([left, top - label_size[1]])
    else:
        text_origin = np.array([left, top + 1])
    # Draw a border 'thickness' pixels wide; use t rather than i to avoid
    # shadowing the outer loop index
    for t in range(thickness):
        draw.rectangle([left + t, top + t, right - t, bottom - t], outline=225)
    draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=225)
    draw.text(text_origin, label, fill=(0, 0, 0), font=font)
    del draw

Output:
umbrella 1.00 (22, 69) (453, 466)
person 0.98 (166, 153) (460, 640)

image
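As a usage note, steps 4.1 to 4.7 can be wrapped into a single helper. This is a minimal sketch that reuses the session and tensors built above (sess, boxes, scores, classes, yolo_model, input_image_shape, letterbox_image, Image, np, and K are all assumed to be in scope from the cells above):

def detect(image_path):
    # Preprocess exactly as in section 4.2
    image = Image.open(image_path)
    new_size = (image.width - image.width % 32, image.height - image.height % 32)
    data = np.array(letterbox_image(image, new_size), dtype='float32') / 255.
    data = np.expand_dims(data, 0)
    # Run the graph built in sections 4.3 to 4.6
    return sess.run(
        [boxes, scores, classes],
        feed_dict={yolo_model.input: data,
                   input_image_shape: [image.size[1], image.size[0]],
                   K.learning_phase(): 0})

out_boxes, out_scores, out_classes = detect('./test.jpg')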
  • [Experience sharing] An illustrated case study: offline inference for the English text detection model en_PP-OCRv3_det with MindStudio
    Contents: 1. Model introduction — 2. Getting the code (2.1 Create the project, 2.2 Add the project code, 2.3 Add the model code, 2.4 Add the inference-tool code, 2.5 Sync the code to the remote server) — 3. Model conversion (3.1 Get the weights, 3.2 Install dependencies, 3.3 Convert to onnx, 3.4 Convert to om) — 4. Offline inference (4.1 Data preprocessing, 4.2 Model inference, 4.3 Inference results) — 5. Summary

1. Model introduction
This post walks through developing offline inference for the English text detection model en_PP-OCRv3_det with the MindStudio development tool. It covers three parts: environment setup, model acquisition and conversion, and offline inference.
en_PP-OCRv3_det is an English text detection model based on PP-OCRv3. The PP-OCRv3 detector upgrades PP-OCRv2's CML collaborative mutual-learning distillation strategy for text detection, optimizing the teacher and student models separately: for the teacher model it introduces LK-PAN, a PAN structure with a large receptive field, together with the DML distillation strategy; for the student model it proposes RSE-FPN, an FPN structure with residual attention. A detailed introduction to PP-OCRv3 is available at: cid:link_0

2. Getting the code
2.1 Create the project
Select ACL Project (Python) and click Next. Enter the project name and path, then click Finish to create the project. After entering the project, close the prompt dialog. Then configure the remote environment:
Click File -> Project Structure, then SDKs -> + -> Add Python SDK.
Click SSH Interpreter and choose Deploy; the system detects the Python environment automatically. Click OK.
Click Project -> Remote Python 3.7.5, select the Python environment, and click OK.
Click Tools -> Deployment -> Configuration; under Deployment, select the remote server, click Mappings, choose the remote mapping directory, and click OK.
Click Tools -> Start SSH session and select the server to open a Remote Terminal window.
With these steps the project is created and configured.
2.2 Add the project code
The code to be developed for this project consists of:
en_PP-OCRv3_det_ais_infer.py: inference script
en_PP-OCRv3_det_postprocess.py: data postprocessing script
en_PP-OCRv3_det_preprocess.py: data preprocessing script
requirements.txt: dependency file
2.3 Add the model code
In a local terminal, fetch the model code with git:
git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
git reset --hard 274c216c6771a94807a34fb94377a1d7d674a69f
rm .\applications\
rm .\doc\imgs_en\img_12.jpg
rm .\doc\imgs_en\wandb_models.png
rm .\doc\imgs_en\model_prod_flow_en.png
cd ..
In the model configuration file ch_PP-OCRv3_det_cml.yml, set use_gpu to false. In the detection inference file infer_det.py, comment out line 53.
2.4 Add the inference-tool code
Inference uses the ais_infer tool; instructions for obtaining, building, and installing it are at: cid:link_1. After downloading, put the ais_infer tool code under the local project path.
2.5 Sync the code to the remote server
Select the project name and click Tools -> Deployment -> Upload to, then choose the server to sync the code to the remote machine.

3. Model conversion
3.1 Get the weights
The model weights are available at: https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
Download and extract them into the en_PP-OCRv3_det\inference directory and sync them to the remote directory.
3.2 Install dependencies
In the remote terminal, install the Python dependencies:
pip3 install -r requirements.txt --user
Then enter the ais_infer directory, and build and install aclruntime:
cd ais_infer/backend/
pip3 wheel ./
pip3 install aclruntime-0.0.1-cp37-cp37m-linux_x86_64.whl
cd ../../
3.3 Convert to onnx
In the remote terminal, convert the Paddle model to onnx:
paddle2onnx \
--model_dir ./inference/en_PP-OCRv3_det_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ../en_PP-OCRv3_det.onnx \
--opset_version 11 \
--enable_onnx_checker True \
--input_shape_dict="{'x':[-1,3,-1,-1]}"
Here --model_dir is the model path, --model_filename the model file name, --params_filename the parameter file name, --save_file the onnx output path, --opset_version the onnx opset version, --enable_onnx_checker whether to validate the onnx model, and --input_shape_dict the input shape of the model. On success, the onnx model is generated.
3.4 Convert to om
Click the Model Converter button and select the exported onnx model via the Model File folder icon; the conversion tool parses the onnx model. Once parsing completes, fill in the conversion settings and click Next:
1. Model Name: name under which to save the om model
2. Target SoC Version: target chip; Ascend310P3 is used here
3. Output Path: where to save the om model
4. Input Format: input data format; this model uses NCHW, i.e. image data
5. Input Nodes: the model has one input, x, with shape 1,3,-1,-1 and type FP32. Because the input shape is dynamic, Dynamic Image Size specifies the allowed input image sizes.
6. Output Nodes: not configured here; the default output nodes are used.
The tool then validates the parameters. Since this model needs no preprocessing inside the conversion, turn off Data Preprocessing and click Next. Confirm the generated command and click Finish to start the conversion. On success the om model is generated locally.

4. Offline inference
4.1 Data preprocessing
Create a run configuration: click Add Configuration, click +, choose Python, fill in the command name, script, arguments, and Python interpreter, and click OK. Run the command to start data preprocessing; the preprocessed data is saved in the pre_data directory.
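The post does not spell out what en_PP-OCRv3_det_preprocess.py does internally. As a rough sketch of what DB-style text-detection preprocessing typically looks like — resize so both sides are multiples of 32, ImageNet-style normalization, NCHW layout; these specifics are assumptions about the usual PP-OCR pipeline, not taken from this post (check the model's yml config for the real values):

import cv2
import numpy as np

def preprocess(img_path, limit=960):
    # Read a BGR image and resize so H and W are multiples of 32; DB-style
    # detectors need this because of the FPN downsampling. 'limit' caps the longer side.
    img = cv2.imread(img_path)
    h, w = img.shape[:2]
    scale = min(limit / max(h, w), 1.0)
    nh = max(32, int(h * scale / 32) * 32)
    nw = max(32, int(w * scale / 32) * 32)
    img = cv2.resize(img, (nw, nh)).astype('float32') / 255.0
    # ImageNet-style normalization (assumed; verify against the model config)
    img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return img.transpose(2, 0, 1)[None].astype('float32')  # NCHW, batch 1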
4.2 Model inference
Create and run the inference command with the following arguments:
--ais_infer=./ais-infer/ais_infer.py --model=./en_PP-OCRv3_det.om --inputs=/home/pre_data/ --batchsize=1
When inference finishes, the results are saved in the results_bs1 directory.
4.3 Inference results
Create and run the data postprocessing command with the following arguments:
-c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.infer_img="./PaddleOCR/doc/imgs_en/" Global.infer_results=/home/results_bs1/
where -c is the model configuration file and -o passes optional settings: Global.infer_img is the image path and Global.infer_results is the inference-result path.
The detection results are saved in the det_results directory. Using the remote terminal, copy det_results into the remote mapping directory and sync it back to the local machine, where the result images can be viewed.
Opening the images locally shows that, via offline inference, en_PP-OCRv3_det can locate the text in receipts and natural-scene images reasonably well.

5. Summary
Using the MindStudio tool, this post completed offline inference with the en_PP-OCRv3_det model and English text detection.
  • [Help wanted] High training loss with yolov5
    https://www.hiascend.com/zh/software/modelzoo/models/detail/C/9af06eedaa1b9577d6221a939a31747d/1
Problem: as shown in the attached screenshot, the training loss stays fairly high, around 10, and the computed recall is too low to meet the business requirement. Training with another framework previously reached the required accuracy.
Expected result: hoping someone can take a look and tell whether this is a hyperparameter problem or something else.
  • [MindX SDK] MindX SDK -- TSM video classification reference design
    MindX SDK -- TSM video classification reference design
1 Case overview
1.1 Description
The goal of this system is to develop, on the Huawei Cloud Ascend platform and based on MindX SDK, an end-to-end reference design for TSM video classification that meets the functional, offline-accuracy, and performance requirements.
1.2 Model introduction
The project uses two models: a TSM model for video classification and a jester model for gesture recognition.
The TSM model files can be downloaded here: cid:link_10
The original jester model files can be downloaded here: cid:link_6
1.3 Implementation flow
1. Base environment: Ascend 310, mxVision, Ascend-CANN-toolkit, Ascend Driver
2. Model conversion: ONNX models are converted to Ascend offline models: TSM.onnx --> TSM.om, jester.onnx --> jester.om
3. Video frame extraction: ffmpeg
4. Python inference flow development
1.4 Code location
The project code is available at: cid:link_7
2 Software design
2.1 Directory structure
The project is named TSM and laid out as follows:
├── TSM
    ├── README.md             // documentation for all models
    ├── model
        ├── onnx2om.sh        // om conversion script for the offline model
        ├── onnx2om1.sh       // om conversion script for the online model
    ├── label
        ├── kinetics_val.csv  // label file
    ├── download_data
        ├── k400_extractor.sh // dataset extraction script
    ├── offline.png           // offline inference flow diagram
    ├── online.png            // online inference flow diagram
    ├── online_infer.py       // online inference accuracy script
    ├── offline_infer.py      // offline inference accuracy script
    ├── speed.py              // offline single-video NPU performance script
    ├── speed_gpu.py          // offline single-video GPU performance script
3 Development setup
3.1 Environment dependencies
Software and versions: cmake 3.5+, mxVision 5.1RC2, Python 3.9, torch 1.10.0, ffmpeg 4.2.1
3.2 Environment setup
Before running the project, set the environment variables:
MindX SDK: . ${SDK-path}/set_env.sh
CANN: . ${ascend-toolkit-path}/set_env.sh
where SDK-path is the mxVision SDK install path and ascend-toolkit-path is the CANN install path.
Download ffmpeg, extract it, enter the directory, and build and install it:
./configure --prefix=/usr/local/ffmpeg --enable-shared
make -j
make install
After installation, export the environment variables:
export PATH=/usr/local/ffmpeg/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/ffmpeg/lib:$LD_LIBRARY_PATH
3.3 Model conversion
3.3.1 TSM model
Download the offline model TSM.onnx and place it under ${TSM project root}/model. In the model directory, generate the om model:
bash onnx2om.sh
3.3.2 jester model
Download the online model jester.onnx and place it in the model directory under the project root. In the model directory, run the script to generate the om model:
bash onnx2om1.sh
Model conversion uses the ATC tool; for more information see: cid:link_5
4 Offline accuracy
4.1 Downloading the Kinetics-400 dataset
Download the download.sh and val_link.list scripts from cid:link_2 into the /TSM/download_data directory. In /TSM/download_data, run the download script, and create a data folder under /TSM to hold the downloaded dataset:
bash download.sh val_link.list
bash k400_extractor.sh
4.2 Dataset preprocessing
Step 1: Kinetics-400 download
Following the Kinetics-400 data-preparation scripts above, place download.sh and val_link.list in the download_data directory under the project root:
├── TSM
    ├── download_data
        ├── download.sh       // dataset download script
        ├── k400_extractor.sh // dataset extraction script
        ├── val_link.list
In download_data, download the archives val_part1.tar, val_part2.tar, and val_part3.tar:
bash download.sh val_link.list
Then extract the dataset into the project root:
bash k400_extractor.sh
The dataset layout is:
├── TSM
    ├── data
        ├── abseiling
        ├── air_drumming
        ├── ...
        ├── zumba
Step 2: dataset preprocessing
1. Video frame extraction (a sketch of what the extraction amounts to follows this step)
In the project root, create the needed directories:
mkdir tools
mkdir ops
Download the temporal-shift-module-master.zip code package, upload it to the server, and extract it. Copy vid2img_kinetics.py, gen_label_kinetics.py, and kinetics_label_map.txt from the package's tools directory into the project's tools directory:
├── TSM
    ├── tools
        ├── gen_label_kinetics.py  // label generation script
        ├── vid2img_kinetics.py    // video frame extraction script
        ├── kinetics_label_map.txt
Copy basic_ops.py, dataset.py, dataset_config.py, models.py, temporal_shift.py, and transforms.py from the package's ops directory into the project's ops directory:
├── ops
    ├── basic_ops.py
    ├── dataset.py          // dataset construction script
    ├── dataset_config.py   // dataset configuration script
    ├── models.py           // model construction script
    ├── temporal_shift.py
    ├── transforms.py
In vid2img_kinetics.py under tools, comment out lines 77 and 78:
line 77: #class_name = 'test'
line 78: #class_process(dir_path, dst_dir_path, class_name)
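For intuition, frame extraction of this kind usually just shells out to ffmpeg once per video. A minimal Python sketch (my own illustration, not the vid2img_kinetics.py script from the code package):

import os
import subprocess

def extract_frames(video_path, out_dir):
    # Dump one jpg per frame (img_00001.jpg, img_00002.jpg, ...) into out_dir,
    # mirroring what a vid2img-style script does for each video file.
    os.makedirs(out_dir, exist_ok=True)
    cmd = ['ffmpeg', '-i', video_path, '-q:v', '0',
           os.path.join(out_dir, 'img_%05d.jpg')]
    subprocess.call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

The img_%05d.jpg naming matches the prefix 'img_{:05d}.jpg' that dataset_config.py expects below.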
In the project root, run the following commands to extract frames from the dataset videos:
mkdir dataset
cd ./tools
python3 vid2img_kinetics.py [video_path] [image_path]
e.g. python3 vid2img_kinetics.py ../data ../dataset/
Modify gen_label_kinetics.py under tools as follows:
# line 11: dataset_path = '../dataset'   # path of the extracted frames
# line 12: label_path = '../label'       # path for the label files
# line 25: files_input = ['kinetics_val.csv']
# line 26: files_output = ['val_videofolder.txt']
# line 37: folders.append(items[1])
# line 57: output.append('%s %d %d' % (os.path.join('../dataset/', os.path.join(categories_list[i], curFolder)), len(dir_files), curIDX))
In the tools directory, generate the label file:
python3 gen_label_kinetics.py
4.3 Accuracy test
In ${TSM project root}/ops/dataset_config.py, set the parameters root_data, filename_imglist_train, and filename_imglist_val; if you only run the offline accuracy test, filename_imglist_train can be ignored:
import os

ROOT_DATASET = './labels/'  # directory containing the label files
...
def return_kinetics(modality):
    filename_categories = 400
    if modality == 'RGB':
        root_data = ROOT_DATASET                          # dataset root
        filename_imglist_train = 'train_videofolder.txt'  # training-set labels
        filename_imglist_val = 'val_videofolder.txt'      # test-set labels
        prefix = 'img_{:05d}.jpg'
    else:
        raise NotImplementedError('no such modality:' + modality)
    return filename_categories, filename_imglist_train, filename_imglist_val, root_data, prefix
In the project root, run the accuracy test:
python3 offline_infer.py kinetics
The original model's accuracy is 71.1% and the measured accuracy is 71.01%, within the allowed deviation, so accuracy passes.
5 Offline performance test
Put a single test video, e.g. test.mp4, under /TSM/ and edit the performance script's parameters; './test.mp4' is the test video:
def main():
    cmd = 'ffmpeg -i \"{}\" -threads 1 -vf scale=-1:331 -q:v 0 \"{}/img_d.jpg\"'.format('./test.mp4', './image')
    subprocess.call(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    files = os.listdir(r"./image/")
5.1 Single-video inference performance
Put the test video, e.g. test_speed.mp4, under the project root and edit the parameters in speed_gpu.py and speed.py; './test_speed.mp4' is the test video. Its class must be one of the 400 Kinetics-400 classes, and it must be at least 3 s long.
def main():
    cmd = 'ffmpeg -i \"{}\" -threads 1 -vf scale=-1:331 -q:v 0 \"{}/img_d.jpg\"'.format('./test_speed.mp4', './image')
    subprocess.call(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    files = os.listdir(r"./image/")
GPU performance (Tesla V100S PCIE 32GB): in the project root, run the GPU performance script (see the sampling sketch at the end of this post for what test_segments means):
python3 speed_gpu.py kinetics --test_segments=8 --test_crops=1 --batch_size=1
Note: speed_gpu.py must run in a GPU environment; it cannot run on NPU.
Measured single-video pure inference time: 0.08 sec/video.
SDK performance: in the project root, run the SDK performance script:
python3 speed.py
Note: speed.py must run in an NPU environment.
Measured single-video pure inference time: 0.189 sec/video.
5.2 GPU vs. NPU performance
Model  Batch size  310 FPS/card  T4 FPS/card  T4/310
TSM    1           6.48          19.1         2.95
TSM    4           25.9          76.4         2.95
TSM    8           51.84         153.55       2.96
TSM    16          103.74        310.31       2.99
TSM    32          207.27        627.82       3.02
The comparison criterion is the NPU's best batch size against the T4's best batch size. At the quoted operating point (the batch-16 row above), the NPU reaches 103.74 fps/card and the T4 310.31 fps/card, i.e. T4 / Ascend 310 = 310.31 / 103.74 ≈ 2.99x.
6 Online gesture recognition
6.1 Install the video streaming tool (live555, used below to serve the stream).
6.2 Generate the video stream
The streaming tool only supports some video formats and does not accept .mp4 files directly, but ffmpeg can convert them. Convert the gesture video jester.mp4 into the H.264 file jester.264:
ffmpeg -i jester.mp4 -vcodec h264 -bf 0 -g 25 -r 10 -s 1280*720 -an -f h264 jester.264
// -bf: number of B-frames; -g: keyframe interval; -s: resolution; -an: drop audio; -r: frame rate
Then use live555 to serve the video stream.
6.3 Test
python3 online_infer.py
Edit the parameter 'ip:port/jester.264' to point at the test stream, where ip is the address of the machine serving the stream, port its port, and jester.264 the test video converted from jester.mp4 with ffmpeg:
def video2img():
    cmd = 'ffmpeg -i \"{}\" -threads 1 -vf scale=-1:331 -q:v 0 \"{}/img_d.jpg\"'.format('rtsp://ip:port/jester.264', './image')
    subprocess.call(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
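A closing note on the test_segments=8 setting used in the performance scripts above: TSM-style models sample a fixed number of frames per video by splitting it into equal segments and taking one frame from each. A minimal sketch of that sampling (my own illustration, assuming center-frame selection as in the common TSN/TSM test-time protocol):

def sample_segment_indices(num_frames, num_segments=8):
    # Split [0, num_frames) into num_segments equal chunks and take the
    # center frame of each, as TSN/TSM do at test time.
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

# e.g. a 90-frame clip -> [5, 16, 28, 39, 50, 61, 73, 84]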
  • [Hardware] Converting camera YUV420 frames to RGB
    My om model requires RGB input and a resize to a fixed size. On the Huawei MDC300, the documentation says the framework's HafimageResize interface only supports HAF_IMAGE_YUV420SP_NV12_UINT8 as the output image format, and the Hafimagecvtcolor interface likewise only outputs HAF_IMAGE_YUV420SP_NV12_UINT8. Is there a solution? My own ideas so far:
1. Printing imagetype shows the camera delivers RGB_uint8. Call HafimageResize to resize to the needed size; the image format then becomes HAF_IMAGE_YUV420SP_NV12_UINT8, so write my own YUV420-to-RGB conversion code (see the sketch below)?
2. Skip the provided interfaces and write my own resize code.
3. Some other approach.
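For reference on option 1, NV12 (YUV420SP) to RGB conversion is a one-liner in OpenCV. A minimal sketch, assuming a contiguous NV12 buffer of height*3/2 rows of width bytes (this is an illustration, not MDC framework code):

import cv2
import numpy as np

def nv12_to_rgb(nv12_bytes, width, height):
    # NV12 layout: 'height' rows of Y, followed by height/2 rows of interleaved UV
    yuv = np.frombuffer(nv12_bytes, dtype=np.uint8).reshape(height * 3 // 2, width)
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_NV12)  # returns (height, width, 3) uint8

So one workable path is: camera RGB -> HafimageResize (output becomes NV12 at the target size) -> a conversion like the above back to RGB for the om model.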
  • [MindX SDK] PraNet segmentation case study
    This case is based on the PraNetSegmentation project in the mindxsdk-referenceapps repository (cid:link_2).
1 Case overview
1.1 Description
PraNet is a deep neural network with parallel reverse attention, designed for the polyp segmentation task. This project implements inference for the PraNet model on the MindX SDK framework.
1.2 Features and applicable scenarios
PraNet targets polyp recognition in medical image processing. The network segments polyp images correctly and with adequate accuracy, but segmentation quality degrades in the following situations:
1. When the polyp is very small relative to the whole image, segmentation is less accurate and the edges come out blurry.
2. When a large portion of the polyp lies at the image border, segmentation may fail with some probability (test case 3.1.2).
Slight accuracy loss: the model's accuracy is slightly lower than the original model's. MindX SDK only provides jpg image decoding, while the original dataset images are png, so porting the model to MindX SDK requires converting all data to jpg. Since jpg compression is lossy, some accuracy is lost.
1.3 Model introduction
The parallel reverse-attention polyp segmentation network (PraNet) uses a parallel partial decoder (PPD) to aggregate high-level features into an initial guidance region, then uses reverse attention (RA) modules to mine boundary cues.
1.4 Implementation flow
The PraNet semantic segmentation pipeline is:
(1) The input is image data (a sequence of jpg images).
(2) The MindX SDK image decoding plugin mxpi_imagedecoder decodes the images.
(3) The MindX SDK image resizing plugin mxpi_imageresize resizes them.
(4) The MindX SDK mxpi_tensorinfer plugin feeds the resized image data to the PraNet model for inference.
(5) The project's own pranet_postprocess plugin post-processes the mxpi_tensorinfer output into the semantic segmentation result.
(6) The MindX SDK mxpi_imageencoder plugin encodes and outputs the result.
2 Software design
2.1 Technical principle
Within the SDK framework, the model's inference results are visualized: a plugin maps classes to colors, producing the final semantic segmentation image.
2.2 Pipeline architecture
The semantic segmentation SDK pipeline is:
No.  Plugin              Function
1    mxpi_imagedecoder   decoding (MindX SDK)
2    mxpi_imageresize    resizing (MindX SDK)
3    mxpi_tensorinfer    segmentation inference with the trained PraNet model
4    pranet_postprocess  post-processing plugin implemented in this project
5    mxpi_imageencoder   image encoding (MindX SDK)
3 Development environment
3.1 Environment dependencies
The recommended system is Ubuntu 18.04. Dependencies and versions: MindX SDK 2.0.4 (mxVision package), Ubuntu 18.04.1 LTS, Ascend-CANN-toolkit 5.0.4, Python 3.9.2; plus the following libraries, installable with pip or conda on the server: numpy 1.22.4 (array operations), opencv-python 4.5.5.64 (image processing), PIL 9.0.1 (image processing), onnx 1.12.0 (model conversion), tabulate 0.8.10 (formatted output).
4 Build and run
4.1 Download the dataset
Dataset download: cid:link_4
The model uses the Kvasir validation set. Obtain the Kvasir dataset yourself, upload it to the project root, and extract it:
TestDataset
├── Kvasir
    ├── images
    ├── masks
4.2 Get the OM model file
Download the OM weight file following the Huawei Ascend ModelZoo instructions. After obtaining the PraNet-19.onnx model, place it in the model directory and run, in that directory:
bash onnx2om.sh
to obtain the PraNet-19_bs1.om model file.
Note: the PraNet-19_bs1.om file from ModelZoo itself cannot be used for this project.
4.3 Build the plugin
Enter plugin/postprocess/ and run bash build.sh to build the post-processing plugin.
4.4 Inference
In the project root, run:
python main.py --pipeline_path pipeline/pranet_pipeline.json --data_path ./TestDataset/Kvasir/images/
where --pipeline_path is the path of the pipeline configuration file (provided with the project, so the relative path works as-is) and --data_path is the path of the inference images. The results appear under ./infer_result/.
4.5 Metric validation
In the project root, run:
python test_metric.py --pipeline_path pipeline/pranet_pipeline.json --data_path ./TestDataset/Kvasir/
where --pipeline_path is the pipeline configuration file and --data_path is the dataset path. When the script finishes, it prints the accuracy (a minimal sketch of these metrics appears at the end of this post):
dataset   meanDic   meanIoU
res       0.890     0.828
The original model's accuracy is:
dataset   meanDic   meanIoU
res       0.895     0.836
The accuracy meets the requirement.
5 Common problems
5.1 OM model file not generated correctly
Symptom: the om model file was not generated correctly, or its path is wrong.
Solution: check that the model is in the ./model/ folder and that the model path in ./pipeline/pranet.pipeline is correct. After checking, regenerate the om model.
5.2 Dynamic library not linked
Symptom: the dynamic library is not linked.
Solution: build the plugin following the steps above.
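For reference, the meanDic/meanIoU numbers above are averages of per-image Dice and IoU between the predicted and ground-truth masks. A minimal sketch of the two metrics (my own illustration, not the project's test_metric.py):

import numpy as np

def dice_and_iou(pred, gt, eps=1e-8):
    # pred, gt: binary masks of shape (H, W) with values in {0, 1}
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)   # 2|A∩B| / (|A|+|B|)
    iou = inter / (np.logical_or(pred, gt).sum() + eps)  # |A∩B| / |A∪B|
    return dice, iou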