• [Tech Share] Implementing automatic AI poetry with an LSTM in PyTorch (acrostic and first-line continuation) | Example 6
    Preface: Hi everyone, I'm A-Guang. This column collects "100 PyTorch Deep Learning Projects in Practice", a series of deep-learning projects, each with the underlying theory plus complete code and dataset. Updates are ongoing~ ✨
🚨 My project environment — Platform: Windows 10; Language: Python 3.7; IDE: PyCharm; PyTorch version: 1.8.1
💥 Project column: [100 PyTorch Deep Learning Projects in Practice]

1. AI poetry with an LSTM
This project uses an LSTM model to write poems automatically, in two modes: continuing a complete poem from a given opening line, and generating an acrostic poem from given head characters.

2. Dataset
The data come from chinese-poetry, the most complete database of classical Chinese poetry collections: about 55,000 Tang poems, 260,000 Song poems, 21,000 Song ci, and other classical texts, covering nearly 14,000 poets of the Tang and Song dynasties and about 1,500 ci poets of the two Song periods. The experiment uses the preprocessed binary file tang.npz as its dataset: 57,580 Tang poems, each capped at 125 tokens (shorter poems are padded with spaces; longer ones are truncated). The npz file has three parts:
- data: a (57580, 125) numpy array — 57,580 poems, 125 characters each (padded with spaces if shorter, dropped beyond 125), with every character replaced by its index in the dictionary
- ix2word: index-to-character mapping
- word2ix: character-to-index mapping

3. Algorithm flow
1. The input tensor has shape 124 × 16
2. Its label also has shape 124 × 16
The input must then be embedded: without embedding, each character of a poem is just an integer index, so to train we map each index to a continuous vector. After embedding, the data become 124 × 16 × embedding_dim. Feeding that into the LSTM module yields 124 × 16 × hidden_dim, and the final fully connected layer produces the classification logits of shape (124 × 16, vocab_size).
Be absolutely clear about the shape of every tensor at every point; these are annotated in the code. You must understand what batch_size, time_step, embedding_dim, vocab_size, num_layers, hidden_dim, and input_size mean in the code.

4. Defining the network
The model is an LSTM with three components: an embedding layer, an LSTM layer, and a fully connected layer.
- Embedding layer: maps each character to an embedding, i.e. represents each character by a continuous vector
- LSTM layer: extracts the semantics of the verse
- Linear layer: maps the result to vocab_size for classification, i.e. a probability for every character

```python
import torch
import torch.nn as nn

# num_layers is a module-level hyperparameter defined with the other config values
class LSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers)
        self.linear = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        time_step, batch_size = x.size()  # 124, 16
        if hidden is None:
            h_0 = x.data.new(num_layers, batch_size, self.hidden_dim).fill_(0).float()
            c_0 = x.data.new(num_layers, batch_size, self.hidden_dim).fill_(0).float()
        else:
            h_0, c_0 = hidden
        embeds = self.embeddings(x)
        output, (h_n, c_n) = self.lstm(embeds, (h_0, c_0))
        output = self.linear(output.reshape(time_step * batch_size, -1))
        return output, (h_n, c_n)
```

5. Continuing a poem from its first line
This function generates a poem from a given line by repeatedly producing the character after start_words: the given line is first fed through the model to build up a hidden state, then the last character together with that hidden state is fed into the model to generate the next character, and the newly generated character becomes the next input. A prefix verse can also be supplied; it influences the style of the generated poem: prefix_words is first run through the model to produce a hidden state that carries the prefix's semantics, and generation then continues from that hidden state. (A sketch of this routine appears at the end of this post.)
- start_words: the given opening line, on which the rest of the poem is conditioned
- prefix_words: a prefix verse that shapes the style; its hidden state is built first and then used, together with the start words, to generate the poem

6. Generating acrostic poems
Acrostic generation works on the same principle, except that the given head characters are fed in one at a time as inputs; whenever an end-of-sentence symbol is produced, the current line is finished and generation moves on to the next head character. This function likewise accepts a prefix verse to influence the style.

7. Training the model
The most important part of training is the model's input and output. For poem writing, the input is a verse and the label is the same verse shifted by one position, used for supervision. For example, for 床前明月光疑是地上霜, the input/output pair is:
- input:  床前明月光疑是地上
- output: 前明月光疑是地上霜
The output at each time step should be the character at the next time step. Each step's output is a vocab_size-dimensional vector, i.e. a distribution over the vocabulary, from which the character for that step is obtained by probability. Since this project uses the Tang-poem dataset, data is a (57580, 125) numpy array: each sample is one poem of 125 characters, i.e. 125 time steps. Each character is embedded, and as described above we take the first 124 characters as input, supervised by the 124 characters that follow them.

```python
import numpy as np
import torch
import torch.nn as nn
import tqdm
from torch import optim
from torch.utils.data import DataLoader
from torchnet import meter

# use_gpu, batch_size, embedding_dim, hidden_dim, lr, epochs, plot_every and
# model_prefix are module-level config values defined in the full source.
def train():
    if use_gpu:
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    # load the data
    datas = np.load("tang.npz", allow_pickle=True)
    data = datas['data']
    ix2word = datas['ix2word'].item()
    word2ix = datas['word2ix'].item()
    data = torch.from_numpy(data)
    dataloader = DataLoader(data, batch_size=batch_size, shuffle=True)
    # define the model
    model = LSTM(len(word2ix), embedding_dim=embedding_dim, hidden_dim=hidden_dim)
    Configimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    # if model_path:
    #     model.load_state_dict(torch.load(model_path, map_location='cpu'))
    # move to the compute device
    model.to(device)
    loss_meter = meter.AverageValueMeter()
    # train
    f = open('result.txt', 'w')
    for epoch in range(epochs):
        loss_meter.reset()
        for li, data_ in tqdm.tqdm(enumerate(dataloader)):
            # data_.shape: 16 * 125, i.e. batch_size * time_step
            data_ = data_.long().transpose(1, 0).contiguous()  # 125 * 16, time_step * batch_size
            # note: this is also moved to the compute device
            data_ = data_.to(device)
            Configimizer.zero_grad()
            # shift by one: the first 124 characters are the input, the last 124 the target
            input_, target = data_[:-1, :], data_[1:, :]  # both 124 * 16
            # the original carried `hidden` across batches; starting each batch from a
            # zero state avoids backpropagating through graphs that were already freed
            output, hidden = model(input_)
            # output: 1984 * 8293, i.e. [time_step * batch_size, vocab_size]; 124 is the time-step count
            # why target.view(-1)? CrossEntropyLoss takes class indices (1984,) as targets,
            # while output holds the per-class scores for every position
            loss = criterion(output, target.view(-1))
            loss.backward()
            Configimizer.step()
            loss_meter.add(loss.item())
            # periodic logging and sample generation
            if (1 + li) % plot_every == 0:
                print("training loss: %s" % (str(loss_meter.mean)))
                f.write("training loss: %s" % (str(loss_meter.mean)))
                for word in list(u"春江花朝秋月夜"):
                    gen_poetry = ''.join(generate(model, word, ix2word, word2ix))
                    print(gen_poetry)
                    f.write(gen_poetry)
                    f.write("\n\n\n")
                    f.flush()
        torch.save(model.state_dict(), '%s_%s.pth' % (model_prefix, epoch))
```

8. Generating verse
This function is used at test time to generate verse: we first load the trained model, then pass in the line to continue or the prefix to include, and the poem is produced.

9. Full source
[100 PyTorch Deep Learning Projects in Practice] — 使用pytorch实现LSTM自动AI作诗(藏头诗和首句续写)| 第6例 — CSDN blog by 咕嘟
Author: 咕嘟_  Link: https://www.jianshu.com/p/ac8c57048ec5  Source: 简书 (Jianshu). Copyright belongs to the author. Contact the author for authorization before commercial reposting; credit the source for non-commercial reposting.
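The article defers the generation code to the full-source link. For reference, here is a minimal sketch of the first-line continuation routine from section 5, assuming the '<START>'/'<EOP>' markers present in the usual tang.npz vocabulary (these token names and max_gen_len are assumptions, not taken from the author's source); the acrostic variant of section 6 differs only in feeding the next head character whenever an end-of-sentence symbol is produced:

```python
import torch

def generate(model, start_words, ix2word, word2ix, prefix_words=None, max_gen_len=125):
    # A hedged sketch, not the author's exact implementation.
    results = list(start_words)
    input_ = torch.Tensor([word2ix['<START>']]).view(1, 1).long()  # assumed start token
    hidden = None
    # warm up the hidden state on the style prefix, discarding its outputs
    if prefix_words:
        for word in prefix_words:
            output, hidden = model(input_, hidden)
            input_ = input_.data.new([word2ix[word]]).view(1, 1)
    for i in range(max_gen_len):
        output, hidden = model(input_, hidden)
        if i < len(start_words):
            # still feeding the given opening line
            w = results[i]
            input_ = input_.data.new([word2ix[w]]).view(1, 1)
        else:
            # greedy decoding: take the most probable next character
            top_index = output.data[0].topk(1)[1][0].item()
            w = ix2word[top_index]
            results.append(w)
            input_ = input_.data.new([top_index]).view(1, 1)
        if w == '<EOP>':  # assumed end-of-poem token
            del results[-1]
            break
    return results
```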
  • [Tech Share] Text-to-image generation: running text2img on your local machine
    Today I'm bringing you a text-to-image case, so everyone can be an artist: your own PC can generate pictures. This model lets billions of people create beautiful artwork in seconds. It is a breakthrough in both speed and quality, which means image-generation technology is going mainstream. The Stable Diffusion model involves two processes:
- Forward diffusion — maps data to noise by gradually perturbing the input. This is formalized as a simple stochastic process that starts from a data sample and iteratively produces noisier samples using a simple Gaussian diffusion kernel. This process is used only during training, never at inference time. (A closed-form sketch of it appears at the end of this post.)
- Parameterized reverse process — undoes the forward diffusion and performs iterative denoising. This process represents data synthesis and is trained to generate data by turning random noise into realistic data.

Model architecture: (architecture figure omitted in this post)

Below is the code for the Stable Diffusion model. Main script:

```python
from tensorflow import keras
from stable_diffusion_tf.stable_diffusion import StableDiffusion
import argparse
from PIL import Image
import os

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", type=str, nargs="?", default="Romantic lavender and sunset",
                    help="the prompt to render")
parser.add_argument("--output", type=str, nargs="?", default="output4.png",
                    help="where to save the output image")
parser.add_argument("--H", type=int, default=256, help="image height, in pixels")
parser.add_argument("--W", type=int, default=512, help="image width, in pixels")
parser.add_argument("--scale", type=float, default=7.5,
                    help="unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))")
parser.add_argument("--steps", type=int, default=50, help="number of ddim sampling steps")
parser.add_argument("--seed", type=int,
                    help="optionally specify a seed integer for reproducible results")
parser.add_argument("--mp", default=False, action="store_true",
                    help="Enable mixed precision (fp16 computation)")
args = parser.parse_args()

if args.mp:
    print("Using mixed precision.")
    keras.mixed_precision.set_global_policy("mixed_float16")

# instantiate the StableDiffusion model
generator = StableDiffusion(img_height=args.H, img_width=args.W, jit_compile=False)
img = generator.generate(
    args.prompt,
    num_steps=args.steps,
    unconditional_guidance_scale=args.scale,
    temperature=1,
    batch_size=1,
    seed=args.seed,
)
Image.fromarray(img[0]).save(args.output)
print(f"saved at {args.output}")
```

Code download: https://pan.baidu.com/s/1ES7Vr_gla5hwPmdkUdR8Qg?pwd=wqmb (extraction code: wqmb)
Run the example — it will be faster on a GPU.
Prompt: A girl carrying a bag faces the sea as the sun sets
Prompt: A sea of romantic lavender with the sun setting
Copyright notice: this is an original article by CSDN blogger 微学AI under the CC 4.0 BY-SA license; include the original link and this notice when reposting. Original: https://blog.csdn.net/weixin_42878111/article/details/127674871
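The forward process described above has a convenient closed form: x_t can be sampled directly from x_0 as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε. The sketch below illustrates that in pixel space with an illustrative linear beta schedule; Stable Diffusion itself applies this in a learned latent space with its own schedule, so treat the constants as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # illustrative linear schedule
alphas_cumprod = np.cumprod(1.0 - betas)  # cumulative product: a_bar_t

def q_sample(x0, t):
    """One-shot sample of x_t given x_0: sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*noise."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

x0 = np.ones((4, 4))
print(np.abs(q_sample(x0, 10) - x0).mean())   # small perturbation early in the chain
print(np.abs(q_sample(x0, 900) - x0).mean())  # nearly pure noise late in the chain
```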
  • [Tech Share] What you need to learn to master PyTorch
    PyTorch is a deep learning framework. Learning it requires mastering the following:
- PyTorch basics: tensors, the autograd mechanism, model building, and other core concepts and usage (a minimal example follows this post).
- Deep learning algorithms: convolutional networks, recurrent networks, generative adversarial networks, and other fundamental algorithms.
- Data preprocessing and augmentation: data loading, preprocessing, and augmentation techniques that improve a model's accuracy and generalization.
- Model training and optimization: training, evaluation, and optimizer selection.
- Model deployment and use: shipping trained models to production, and running prediction and inference with PyTorch.
- Model debugging and improvement: methods and tricks for debugging deep models and ways to improve their accuracy and performance.
- Deep learning applications: computer vision, natural language processing, speech recognition, and the latest research progress across these fields.
These are the main knowledge points; studying them systematically gives you PyTorch's fundamentals and usage and strengthens your ability to apply deep learning.

Recommended PyTorch development tools:
- Anaconda (essential): a Python distribution that makes it easy to install PyTorch and other common scientific-computing, data-analysis, and machine-learning libraries, and to create and manage Python environments.
- PyCharm: a powerful Python IDE that supports PyTorch development and debugging, with rich editing, debugging, and testing features.
- Jupyter Notebook: a web-based interactive notebook for PyTorch development and demos; quick to create and share code, analysis, and visualizations.
- Visual Studio Code: a lightweight editor that supports PyTorch development and debugging, with a rich plugin ecosystem for different needs.
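As a taste of the first item above (tensors and autograd), here is a minimal self-contained example:

```python
import torch

# tensors + autograd: build a tiny computation and differentiate it
x = torch.tensor([2.0, 3.0], requires_grad=True)
w = torch.tensor([4.0, 5.0], requires_grad=True)
y = (w * x).sum()          # y = 2*4 + 3*5 = 23
y.backward()               # populate .grad via reverse-mode autodiff
print(x.grad)              # dy/dx = w -> tensor([4., 5.])
print(w.grad)              # dy/dw = x -> tensor([2., 3.])
```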
  • [Tech Share] What is PyTorch used for?
    1. What PyTorch is for
PyTorch is an open-source Python machine learning library based on Torch, used for applications such as natural language processing. It was released in January 2017 by Facebook AI Research (FAIR). It is a Python-based scientific computing package offering two high-level features: (1) tensor computation with strong GPU acceleration (like NumPy); (2) deep neural networks built on an automatic differentiation system.
PyTorch uses Python as its development language, so developers can draw on the broad Python ecosystem of libraries and software. Its data handling feels like NumPy's arrays and its coding style resembles scikit-learn, which makes it easy for machine-learning practitioners to move into deep learning.
Many open-source frameworks (such as TensorFlow, Caffe2, CNTK, and Theano) use static computation graphs: the network must be defined first and then run — defined once, executed many times. PyTorch uses dynamic computation graphs, which can be defined and built at run time and rebuilt on every run. Static-graph code tends to be verbose and unintuitive, while dynamic-graph code is concise, elegant, and intuitive. Another notable advantage of dynamic graphs is easy debugging: you can inspect variable values at any time, which helps you build complex models quickly. PyTorch's API is simple, elegant, and easy to use. (A small demonstration of dynamic graphs follows this post.)

2. Supported environments
PyTorch runs on Windows (CUDA, CPU), macOS (CPU), and Linux (CUDA, ROCm, CPU). A basic setup is a PC, optionally a high-performance NVIDIA GPU, and Ubuntu.

3. PyTorch's strengths
PyTorch is a remarkably concise, efficient, and fast framework; its design pursues minimal wrapping and matches the way people think, letting users focus on realizing their ideas; like Google's TensorFlow, FAIR's backing guarantees continued development and updates; the forum maintained by the PyTorch authors lets users discuss and ask questions; and it is easy to get started with.

Further reading: the PyTorch library and the deep-learning workflow
- Data loading and processing: the first step of any deep-learning project. PyTorch provides these utilities in torch.utils.data; the two key classes are Dataset and DataLoader. With access to multiple machines or GPUs, torch.nn.DataParallel and torch.distributed are also available.
- Building neural networks: the torch.nn module creates networks, providing all common layers — fully connected, convolutional, activations, losses, and so on. Once the architecture is built and data are ready to be fed in, the weights and biases must be updated so the network starts to learn; those utilities live in torch.optim. Similarly, the automatic differentiation needed during the backward pass is handled by torch.autograd.
- Model inference and compatibility: after training, a model can be used to predict on test data or even new datasets — a process called model inference. PyTorch also offers TorchScript, which runs models independently of the Python runtime; think of it as a virtual machine whose instructions mainly target tensors. Models trained with PyTorch can also be converted to formats such as ONNX, so they can be used from other DL frameworks (e.g. MXNet, CNTK, Caffe2).
Source: cid:link_0
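The dynamic-graph point above is easiest to see in code: the forward pass is ordinary Python, so control flow can depend on the data, and autograd traces whatever graph was actually built on each call:

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # loop length chosen at run time -> a different graph on every call
        for _ in range(int(x.abs().sum().item()) % 3 + 1):
            x = torch.relu(self.fc(x))
        return x.sum()

net = DynamicNet()
out = net(torch.randn(1, 4))
out.backward()                     # autograd follows the graph that was just built
print(net.fc.weight.grad.shape)    # torch.Size([4, 4])
```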
  • [Tech Share] What foundations do you need to study AI (for beginners)
    Mainstream frameworks
There are two mainstream AI frameworks today: TensorFlow and PyTorch. PyTorch's interface is more stable and its learning curve less steep. Taking PyTorch as the example:

What you need to know — learning PyTorch requires:
- Python programming basics: basic syntax, core data structures, and function usage
- Machine learning fundamentals: core concepts, algorithms, models, and evaluation metrics; common tasks such as classification and regression
- Neural network basics: network structure, activation functions, loss functions, optimizers
- Deep learning frameworks: how they work in principle; you can also look at other frameworks such as TensorFlow and Keras
- Math foundations: linear algebra, calculus
- Domain basics in computer vision, natural language processing, or similar fields: core concepts and algorithms
- Practical experience: accumulate it through real projects and experiments to build applied skill

How long it takes
It varies from person to person, depending on pace, goals, and depth. A first pass over PyTorch's basic syntax and usage may take only a few weeks; studying PyTorch in depth and applying it in real projects takes longer.
  • [Algorithm Management] How to connect YOLOv5 with OBS
    How do I set data_url and train_url in YOLOv5 v6.1 so that the cloud platform exports the trained model to OBS?
  • [Tech Share] RetinaFace (face detection / PyTorch)
    RetinaFace (face detection / PyTorch)
RetinaFace is a strong single-stage face detection model that performs pixel-wise face localization across face scales using jointly supervised and self-supervised multi-task learning. This case is a hands-on reproduction of the RetinaFace paper; the model follows the architecture proposed in "RetinaFace: Single-stage Dense Face Localisation in the Wild". The algorithm loads a model pretrained on WiderFace and performs transfer learning on the user's dataset. We provide training code and trainable models for fine-tuning in real scenarios. Algorithm details: cid:link_0
Notes:
1. Framework: PyTorch 1.4.0
2. Hardware: GPU: 1*NVIDIA-V100NV32 (32GB) | CPU: 8 cores, 64GB
3. How to run the code: click the triangular Run button in the menu bar at the top of this page, or press Ctrl+Enter, to run each code cell
4. Detailed JupyterLab usage: see the [ModelArts JupyterLab User Guide](https://bbs.huaweicloud.com/forum/thread-97603-1-1.html)
5. If you hit problems: see the [ModelArts JupyterLab FAQ](https://bbs.huaweicloud.com/forum/thread-98681-1-1.html)

1. Download the data and code
Run the cell below to download and unpack the data and code. This case uses the WIDER face dataset.

```python
import os
# download data and code
!wget https://obs-aigallery-zc.obs.cn-north-4.myhuaweicloud.com/algorithm/RetinaFace.zip
# unzip
os.system('unzip RetinaFace.zip -d ./')
```

--2021-06-25 15:19:18-- https://obs-aigallery-zc.obs.cn-north-4.myhuaweicloud.com/algorithm/RetinaFace.zip
Resolving proxy-notebook.modelarts.com (proxy-notebook.modelarts.com)... 192.168.6.62
Connecting to proxy-notebook.modelarts.com (proxy-notebook.modelarts.com)|192.168.6.62|:8083... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1997846711 (1.9G) [application/zip]
Saving to: ‘RetinaFace.zip’
RetinaFace.zip 100%[===================>] 1.86G 177MB/s in 13s
2021-06-25 15:19:31 (149 MB/s) - ‘RetinaFace.zip’ saved [1997846711/1997846711]

2. Model training
2.1 Load dependencies
This may take a few minutes; please be patient.

```python
from __future__ import print_function
import os
root_path = './RetinaFace/'
os.chdir(root_path)

import torch
import torch.optim as optim
import torch.backends.cudnn as cudnn
import argparse
import torch.utils.data as data
from data import WiderFaceDetection, detection_collate, preproc, cfg_mnet, cfg_re50, cfg_re152
from layers.modules import MultiBoxLoss
from layers.functions.prior_box import PriorBox
import time
import datetime
import math
from models.retinaface import RetinaFace
from eval import eval_run
from eval_standard import run_eval_standard
from widerface_evaluate.evaluation import evaluation
```

===================Install cython_bbox successful========================

2.2 Parameter settings and dependency installation
Installing the dependencies takes a few minutes; please be patient.

```python
parser = argparse.ArgumentParser(description='Retinaface Training')
parser.add_argument('--data_url', default='./WIDER_train/', help='Training dataset directory')
parser.add_argument('--train_url', default='./output/', help='Location to save checkpoint models')
parser.add_argument('--data_format', type=str, default="zip", help='zip or dir')
parser.add_argument('--network', default='resnet50', help='Backbone network mobile0.25, resnet50 or resnet152')
parser.add_argument('--num_workers', default=1, type=int, help='Number of workers used in dataloading')
parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, help='momentum')
parser.add_argument('--load_weight', default='weight/best_model.pth', help='resume net for retraining')
parser.add_argument('--resume_epoch', default=0, type=int, help='resume iter for retraining')
parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD')
parser.add_argument('--img_size', default=1024, type=int, help='image size')
parser.add_argument('--test_origin_size', default=False, help='Whether use origin image size to evaluate')
parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold')
parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold')
parser.add_argument('--gpu_train', default=True, type=bool, help='gpu or cpu train')
parser.add_argument('--num_gpu', default=1, type=int, help='if 1, use one gpu; if more than 1, use all gpus')
parser.add_argument('--batch_size', default=16, type=int, help='train batch_size')
parser.add_argument('--epoch', default=1, type=int, help='train epoch')
parser.add_argument('--use_backbone', default='True', type=str, help='use backbone pretrain')
parser.add_argument('--is_eval_in_train', default=False, type=bool, help='Do eval on the val dataset after every train epoch')
parser.add_argument('--use_mixed', default='True', type=str, help='')
parser.add_argument('--amp_level', default='O1', help='mixed_precision level, eg: O0,O1,O2,O3')
parser.add_argument('--warmup_epoch', default=10, type=int, help='lr warm up epoch')
parser.add_argument('--decay1', default=50, type=int, help='lr first decay epoch')
parser.add_argument('--decay2', default=80, type=int, help='lr second decay epoch')
parser.add_argument('--use_cosine_decay', default='True', type=str, help='use cosine_decay_learning_rate')
parser.add_argument('--optimizer', default='sgd', help='sgd or adam')
parser.add_argument('--eval', default='False', type=str, help='')
parser.add_argument('--init_method', help='')  # ModelArts passes this argument; unused in the code

args, unknown = parser.parse_known_args()

if args.eval == 'True' and ((args.load_weight is None) or (args.load_weight == 'None') or (args.load_weight == '')):
    raise Exception('when "eval" set to True, "load_weight" must set the weight path')

try:
    from moxing.framework import file
    if args.train_url is not None:
        save_folder = args.train_url
except:
    save_folder = args.train_url
    print('Is not ModelArts platform')

if args.eval == 'True':
    args.use_mixed = False
if args.use_mixed == 'True':
    mixed_precision = True
    try:
        # Mixed precision training https://github.com/NVIDIA/apex
        from apex import amp
    except:
        print("install apex")
        os.system('pip --default-timeout=100 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ' + './apex-master')
        try:
            from apex import amp
        except:
            print('Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex')
            mixed_precision = False  # not installed
else:
    mixed_precision = False

if not os.path.exists(save_folder):
    os.makedirs(save_folder)

cfg = None
if args.network == "mobile0.25":
    cfg = cfg_mnet
elif args.network == "resnet50":
    cfg = cfg_re50
elif args.network == "resnet152":
    cfg = cfg_re152

if args.eval == 'False':
    if args.img_size is not None:
        cfg['image_size'] = args.img_size
    if args.gpu_train is not None:
        cfg['gpu_train'] = args.gpu_train
    if args.num_gpu is not None:
        cfg['ngpu'] = args.num_gpu
    if args.batch_size is not None:
        cfg['batch_size'] = args.batch_size
    if args.epoch is not None:
        cfg['epoch'] = args.epoch
    if args.decay1 is not None:
        cfg['decay1'] = args.decay1
    if args.decay2 is not None:
        cfg['decay2'] = args.decay2

    rgb_mean = (104, 117, 123)  # bgr order
    num_classes = 2
    img_dim = cfg['image_size']
    num_gpu = cfg['ngpu']
    batch_size = cfg['batch_size']
    max_epoch = cfg['epoch']
    gpu_train = cfg['gpu_train']
    num_workers = args.num_workers
    momentum = args.momentum
    weight_decay = args.weight_decay
    initial_lr = args.lr
    gamma = args.gamma
    training_dataset = os.path.join(args.data_url, 'label.txt')
```

INFO:root:Using MoXing-v2.0.0.rc0-19e4d3ab
INFO:root:Using OBS-Python-SDK-3.20.9.1
install apex

2.3 Create the model

```python
device = torch.device("cuda:0" if args.gpu_train else "cpu")

backbone_pretrain = None
if args.use_backbone == 'True':
    if args.network == 'mobile0.25':
        backbone_pretrain = './backbone_pretrain/mobilenetV1X0.25_pretrain.tar'
    if args.network == 'resnet50':
        backbone_pretrain = './backbone_pretrain/resnet50-19c8e357.pth'
    if args.network == 'resnet152':
        backbone_pretrain = './backbone_pretrain/resnet152-b121ed2d.pth'

net = RetinaFace(cfg=cfg, backbone_pretrain=backbone_pretrain).to(device)
print("Printing net...")

if args.load_weight is not None:
    print('Loading resume network...')
    state_dict = torch.load(args.load_weight)
    # create a new OrderedDict whose keys do not carry the `module.` prefix
    from collections import OrderedDict
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        head = k[:7]
        if head == 'module.':
            name = k[7:]  # remove `module.`
        else:
            name = k
        new_state_dict[name] = v
    net.load_state_dict(new_state_dict)

cudnn.benchmark = True
```

2.4 Loss function, optimizer, and training functions

```python
if args.eval == 'False':
    if args.optimizer == 'sgd':
        optimizer = optim.SGD(net.parameters(), lr=initial_lr, momentum=momentum, weight_decay=weight_decay)
    elif args.optimizer == 'adam':
        optimizer = optim.Adam(net.parameters(), lr=initial_lr, weight_decay=weight_decay)
    criterion = MultiBoxLoss(num_classes, 0.35, True, 0, True, 7, 0.35, False)
    # Mixed precision training https://github.com/NVIDIA/apex
    if mixed_precision:
        net, optimizer = amp.initialize(net, optimizer, opt_level=args.amp_level, verbosity=0)
    if num_gpu > 1 and gpu_train:
        net = torch.nn.DataParallel(net)
    priorbox = PriorBox(cfg, image_size=(img_dim, img_dim))
    with torch.no_grad():
        priors = priorbox.forward()
        priors = priors.to(device)

def train():
    net.train()
    epoch = 0 + args.resume_epoch
    print('Loading Dataset...')
    dataset = WiderFaceDetection(training_dataset, preproc(img_dim, rgb_mean))
    epoch_size = math.ceil(len(dataset) / batch_size)
    max_iter = max_epoch * epoch_size
    stepvalues = (cfg['decay1'] * epoch_size, cfg['decay2'] * epoch_size)
    step_index = 0
    if args.resume_epoch > 0:
        start_iter = args.resume_epoch * epoch_size
    else:
        start_iter = 0
    best_ap50 = 0
    for iteration in range(start_iter, max_iter):
        if iteration % epoch_size == 0:
            # create batch iterator
            batch_iterator = iter(data.DataLoader(dataset, batch_size, shuffle=True,
                                                  num_workers=num_workers, collate_fn=detection_collate))
            epoch += 1
        load_t0 = time.time()
        if iteration in stepvalues:
            step_index += 1
        if args.use_cosine_decay == 'True':
            lr = cosine_decay_learning_rate(optimizer, iteration, max_iter, epoch, epoch_size)
        else:
            lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size)
        # load train data
        images, targets = next(batch_iterator)
        images = images.to(device)
        targets = [anno.to(device) for anno in targets]
        # forward
        out = net(images)
        # backprop
        optimizer.zero_grad()
        loss_l, loss_c, loss_landm = criterion(out, priors, targets)
        loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm
        # backward
        if mixed_precision:
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            loss.backward()
        optimizer.step()
        load_t1 = time.time()
        batch_time = load_t1 - load_t0
        eta = int(batch_time * (max_iter - iteration))
        curr_epoch_iteration = iteration % epoch_size
        if curr_epoch_iteration == 0:
            print('Epoch:{}/{} || Epochiter: {}/{} || Iter: {}/{} || Loc: {:.4f} Cla: {:.4f} Landm: {:.4f} || LR: {:.18f} || Batchtime: {:.4f} s || ETA: {}'
                  .format(epoch, max_epoch, curr_epoch_iteration + 1, epoch_size, iteration + 1, max_iter,
                          loss_l.item(), loss_c.item(), loss_landm.item(), lr, batch_time,
                          str(datetime.timedelta(seconds=eta))))
    model_name = 'RetinaFace_' + cfg['name'] + '_Final.pth'
    save_model_path = os.path.join(save_folder, model_name)
    torch.save(net.state_dict(), save_model_path)

def cosine_decay_learning_rate(optimizer, iteration, max_iter, epoch, epoch_size):
    warmup_epoch = args.warmup_epoch
    if epoch <= warmup_epoch:
        lr = 1e-6 + (initial_lr - 1e-6) * iteration / (epoch_size * warmup_epoch)
    else:
        if warmup_epoch > 0:
            max_iter = max_iter - warmup_epoch * epoch_size
            iteration = iteration - warmup_epoch * epoch_size
        lf = lambda x: (((1 + math.cos(x * math.pi / max_iter)) / 2) ** 1.0)
        lr = initial_lr * lf(iteration)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
    """Sets the learning rate.
    Adapted from the PyTorch ImageNet example:
    https://github.com/pytorch/examples/blob/master/imagenet/main.py
    """
    warmup_epoch = args.warmup_epoch
    if epoch <= warmup_epoch:
        lr = 1e-6 + (initial_lr - 1e-6) * iteration / (epoch_size * warmup_epoch)
    else:
        lr = initial_lr * (gamma ** (step_index))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
```

2.5 Start training
Because the dataset is large, one training run takes a few minutes.

```python
if __name__ == '__main__':
    train()
```

Loading Dataset...
Epoch:1/1 || Epochiter: 1/805 || Iter: 1/805 || Loc: 0.2966 Cla: 0.7142 Landm: 0.6346 || LR: 0.000001000000000000 || Batchtime: 18.7019 s || ETA: 4:10:55
Epoch:1/1 || Epochiter: 11/805 || Iter: 11/805 || Loc: 0.5082 Cla: 0.9243 Landm: 2.6661 || LR: 0.000002240993788820 || Batchtime: 0.8892 s || ETA: 0:11:46
...
Epoch:1/1 || Epochiter: 791/805 || Iter: 791/805 || Loc: 0.4356 Cla: 0.7774 Landm: 0.6903 || LR: 0.000099038509316770 || Batchtime: 0.8913 s || ETA: 0:00:13
Epoch:1/1 || Epochiter: 801/805 || Iter: 801/805 || Loc: 0.7497 Cla: 0.9509 Landm: 0.6751 || LR: 0.000100279503105590 || Batchtime: 0.9403 s || ETA: 0:00:04

3. Model testing
3.1 Test functions

```python
# -*- coding: utf-8 -*-
import numpy as np
from PIL import Image, ImageDraw
import os
import torch
import torch.backends.cudnn as cudnn
import cv2
import time
from models.retinaface import RetinaFace
from utils.box_utils import decode, decode_landm
from utils.timer import Timer
from data import cfg_mnet, cfg_re50, cfg_re152
from layers.functions.prior_box import PriorBox
from utils.nms.py_cpu_nms import py_cpu_nms

class ObjectDetect():
    def __init__(self, model_path):
        torch.set_grad_enabled(False)
        self.cfg = cfg_re50
        if torch.cuda.is_available():
            use_cpu = False
        else:
            use_cpu = True
        self.device = torch.device("cpu" if use_cpu else "cuda")
        self.target_size = 1600
        self.max_size = 2150
        self.origin_size = False
        self.confidence_threshold = 0.02
        self.nms_threshold = 0.4
        self.net = RetinaFace(cfg=self.cfg, phase='test')
        self.net = load_model(self.net, model_path, use_cpu)
        self.net.eval()
        print('load model success')
        cudnn.benchmark = True
        self.net = self.net.to(self.device)

    def predict(self, file_name):
        image = Image.open(file_name).convert('RGB')
        img_rgb = np.array(image)
        img_bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)  # convert the RGB image loaded by PIL to cv2's BGR layout
        img = np.float32(img_bgr)
        im_shape = img.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        resize = float(self.target_size) / float(im_size_min)
        if np.round(resize * im_size_max) > self.max_size:
            resize = float(self.max_size) / float(im_size_max)
        if self.origin_size:
            resize = 1
        if resize != 1:
            img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_CUBIC)
        im_height, im_width, _ = img.shape
        scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
        img -= (104, 117, 123)
        img = img.transpose(2, 0, 1)  # channels-last to channels-first
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.to(self.device)
        scale = scale.to(self.device)
        loc, conf, landms = self.net(img)  # forward pass
        priorbox = PriorBox(self.cfg, image_size=(im_height, im_width))
        priors = priorbox.forward()
        priors = priors.to(self.device)
        prior_data = priors.data
        boxes = decode(loc.data.squeeze(0), prior_data, self.cfg['variance'])
        boxes = boxes * scale / resize
        boxes = boxes.cpu().numpy()
        scores = conf.squeeze(0).data.cpu().numpy()[:, 1]
        landms = decode_landm(landms.data.squeeze(0), prior_data, self.cfg['variance'])
        scale1 = torch.Tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2],
                               img.shape[3], img.shape[2], img.shape[3], img.shape[2],
                               img.shape[3], img.shape[2]])
        scale1 = scale1.to(self.device)
        landms = landms * scale1 / resize
        landms = landms.cpu().numpy()
        # ignore low scores
        inds = np.where(scores > self.confidence_threshold)[0]
        boxes = boxes[inds]
        landms = landms[inds]
        scores = scores[inds]
        # keep top-K before NMS
        order = scores.argsort()[::-1]
        # order = scores.argsort()[::-1][:args.top_k]
        boxes = boxes[order]
        landms = landms[order]
        scores = scores[order]
        # do NMS
        dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
        keep = py_cpu_nms(dets, self.nms_threshold)
        # keep = nms(dets, args.nms_threshold, force_cpu=args.cpu)
        dets = dets[keep, :]
        landms = landms[keep]
        # keep top-K after NMS
        # dets = dets[:args.keep_top_k, :]
        # landms = landms[:args.keep_top_k, :]
        dets = np.concatenate((dets, landms), axis=1)
        bboxs = dets
        for box in bboxs:
            if box[4] > 0.7:
                image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
                print(int(box[0]), int(box[1]), int(box[2]), int(box[3]))
                confidence = str(box[4])
                image = cv2.rectangle(image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
        return image

def check_keys(model, pretrained_state_dict):
    ckpt_keys = set(pretrained_state_dict.keys())
    model_keys = set(model.state_dict().keys())
    used_pretrained_keys = model_keys & ckpt_keys
    unused_pretrained_keys = ckpt_keys - model_keys
    missing_keys = model_keys - ckpt_keys
    print('Missing keys:{}'.format(len(missing_keys)))
    print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
    print('Used keys:{}'.format(len(used_pretrained_keys)))
    assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
    return True

def remove_prefix(state_dict, prefix):
    '''Old-style models store all parameter names with the shared prefix "module."'''
    print('remove prefix \'{}\''.format(prefix))
    f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x
    return {f(key): value for key, value in state_dict.items()}

def load_model(model, pretrained_path, load_to_cpu):
    print('Loading pretrained model from {}'.format(pretrained_path))
    if load_to_cpu:
        pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage)
    else:
        device = torch.cuda.current_device()
        pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device))
    if "state_dict" in pretrained_dict.keys():
        pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.')
    else:
        pretrained_dict = remove_prefix(pretrained_dict, 'module.')
    check_keys(model, pretrained_dict)
    model.load_state_dict(pretrained_dict, strict=False)
    return model
```

3.2 Run a prediction
You can change the test image path.

```python
import matplotlib.pyplot as plt

Retinaface = ObjectDetect('./output/RetinaFace_Resnet50_Final.pth')
filename = './WIDER_train/images/16--Award_Ceremony/16_Award_Ceremony_Awards_Ceremony_16_110.jpg'
result = Retinaface.predict(filename)
result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
plt.figure(figsize=(10, 10))  # set the window size
plt.imshow(result)
plt.show()
```
  • [Tech Share] CenterNet-Hourglass (object detection / PyTorch)
    CenterNet-Hourglass (object detection / PyTorch)
Object detection commonly uses anchors to enumerate possible object locations and then classifies them, which is time-consuming and inefficient and requires post-processing (e.g. NMS). CenterNet models an object as a single point, the center of its bounding box, turning detection into a keypoint-estimation problem; other object properties such as size, 3D position, orientation, and pose are then regressed relative to the estimated center. This case is a hands-on reproduction of the CenterNet paper, reproducing the results of "Objects as Points" (the last row of Table 2 in the original paper). The model uses an Hourglass backbone, is initialized from ExtremeNet as the pretrained model, and is trained for 50 epochs on the COCO dataset. The project adapts the paper's official code for training and deployment on ModelArts. Algorithm details: cid:link_0
Notes:
1. Framework: PyTorch 1.4.0
2. Hardware: GPU: 1*NVIDIA-V100NV32 (32GB) | CPU: 8 cores, 64GB
3. How to run the code: click the triangular Run button in the menu bar at the top of this page, or press Ctrl+Enter, to run each code cell
4. Detailed JupyterLab usage: see the "ModelArts JupyterLab User Guide"
5. If you hit problems: see the "ModelArts JupyterLab FAQ"

1. Download the data and code
Run the cell below to download and unpack the data and code. This case uses the COCO dataset.

```python
import os
# download data and code
!wget https://obs-aigallery-zc.obs.cn-north-4.myhuaweicloud.com/algorithm/CenterNet.zip
# unzip
os.system('unzip CenterNet.zip -d ./')
```

--2021-06-25 17:50:11-- https://obs-aigallery-zc.obs.cn-north-4.myhuaweicloud.com/algorithm/CenterNet.zip
Resolving proxy-notebook.modelarts.com (proxy-notebook.modelarts.com)... 192.168.6.62
Connecting to proxy-notebook.modelarts.com (proxy-notebook.modelarts.com)|192.168.6.62|:8083... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1529663572 (1.4G) [application/zip]
Saving to: ‘CenterNet.zip’
CenterNet.zip 100%[===================>] 1.42G 279MB/s in 5.6s
2021-06-25 17:50:16 (261 MB/s) - ‘CenterNet.zip’ saved [1529663572/1529663572]

2. Training
2.1 Load and install dependencies

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

root_path = './CenterNet/'
os.chdir(root_path)
os.system('pip install pycocotools')

import _init_paths
import torch
import torch.utils.data
from opts import opts
from models.model import create_model, load_model, save_model
from models.data_parallel import DataParallel
from logger import Logger
from datasets.dataset_factory import get_dataset
from trains.train_factory import train_factory
from evaluation import test, prefetch_test, image_infer

USE_MODELARTS = True
```

INFO:root:Using MoXing-v2.0.0.rc0-19e4d3ab
INFO:root:Using OBS-Python-SDK-3.20.9.1
NMS not imported! If you need it, do:
cd $CenterNet_ROOT/src/lib/external
make

2.2 Training function

```python
def main(opt):
    torch.manual_seed(opt.seed)
    torch.backends.cudnn.benchmark = not opt.not_cuda_benchmark and not opt.test
    Dataset = get_dataset(opt.dataset, opt.task)
    opt = opts().update_dataset_info_and_set_heads(opt, Dataset)
    logger = Logger(opt)
    os.environ['CUDA_VISIBLE_DEVICES'] = opt.gpus_str
    opt.device = torch.device('cuda' if opt.gpus[0] >= 0 else 'cpu')

    print('Creating model...')
    model = create_model(opt.arch, opt.heads, opt.head_conv)
    optimizer = torch.optim.Adam(model.parameters(), opt.lr)
    start_epoch = 0
    if opt.load_model != '':
        model, optimizer, start_epoch = load_model(model, opt.load_model, optimizer, opt.resume, opt.lr, opt.lr_step)

    Trainer = train_factory[opt.task]
    trainer = Trainer(opt, model, optimizer)
    trainer.set_device(opt.gpus, opt.chunk_sizes, opt.device)

    print('Setting up data...')
    train_loader = torch.utils.data.DataLoader(
        Dataset(opt, 'train'),
        batch_size=opt.batch_size,
        shuffle=True,
        num_workers=opt.num_workers,
        pin_memory=True,
        drop_last=True
    )

    print('Starting training...')
    best = 1e10
    for epoch in range(start_epoch + 1, opt.num_epochs + 1):
        mark = epoch if opt.save_all else 'last'
        log_dict_train, _ = trainer.train(epoch, train_loader)
        logger.write('epoch: {} |'.format(epoch))
        for k, v in log_dict_train.items():
            logger.scalar_summary('train_{}'.format(k), v, epoch)
            logger.write('{} {:8f} | '.format(k, v))
        save_model(os.path.join(opt.save_dir, 'model_last.pth'), epoch, model)
        logger.write('\n')
        if epoch in opt.lr_step:
            save_model(os.path.join(opt.save_dir, 'model_{}.pth'.format(epoch)), epoch, model, optimizer)
            lr = opt.lr * (0.1 ** (opt.lr_step.index(epoch) + 1))
            print('Drop LR to', lr)
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr
    logger.close()
```

2.3 Start training
Training takes a little while; please be patient.

```python
if __name__ == '__main__':
    opt = opts().parse()
    if USE_MODELARTS:
        pwd = os.getcwd()
        print('Copying dataset to work space...')
        print('Listing directory: ')
        print(os.listdir())
        if not os.path.exists(opt.save_dir):
            os.makedirs(opt.save_dir)
    main(opt)
    if USE_MODELARTS:
        print("Processing model checkpoints & service config for deployment...")
        if not opt.eval:
            infer_dir = os.path.join(opt.save_dir, 'model')
            os.makedirs(infer_dir)
            os.system(f'mv ./trained_model/* {infer_dir}')
            pretrained_pth = os.path.join(infer_dir, '*.pth')
            ckpt_dir = os.path.join(opt.save_dir, 'checkpoints')
            os.makedirs(ckpt_dir)
            os.system(f'mv {pretrained_pth} {ckpt_dir}')
            pth_files = os.path.join(opt.save_dir, '*.pth')
            infer_pth = os.path.join(ckpt_dir, f'{opt.model_deploy}.pth')
            os.system(f'mv {pth_files} {ckpt_dir}')
            os.system(f'mv {infer_pth} {infer_dir}')
            print(os.listdir(opt.save_dir))
        print("ModelArts post-training work is done!")
```

Fix size testing.
training chunk_sizes: [8]
The output will be saved to ./output/exp/ctdet/default
Copying dataset to work space...
Listing directory:
['pre-trained_weights', '.ipynb_checkpoints', 'coco_eval.py', 'train.py', 'coco', 'output', 'training_logs', 'trained_model', '_init_paths.py', '__pycache__', 'coco_classes.py', 'lib', 'evaluation.py']
heads {'hm': 80, 'wh': 2, 'reg': 2}
Creating model...
loaded ./trained_model/epoch_50_mAP_42.7.pth, epoch 50
Setting up data...
==> initializing coco 2017 train data.
loading annotations into memory...
Done (t=0.54s)
creating index...
index created!
Loaded train 5000 samples
Starting training...
/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead. warnings.warn(warning.format(ret))
ctdet/default| train: [1][0/625] |loss 1.7568 |hm_loss 1.3771 |wh_loss 1.9394 |off_loss 0.1857 |Data 0.384s (0.384s) |Net 5.019s (5.019s)
ctdet/default| train: [1][200/625] |loss 1.9275 |hm_loss 1.4429 |wh_loss 2.7269 |off_loss 0.2119 |Data 0.001s (0.003s) |Net 0.759s (0.779s)
ctdet/default| train: [1][400/625] |loss 1.9290 |hm_loss 1.4430 |wh_loss 2.7423 |off_loss 0.2118 |Data 0.001s (0.002s) |Net 0.760s (0.770s)
ctdet/default| train: [1][600/625] |loss 1.9276 |hm_loss 1.4397 |wh_loss 2.7623 |off_loss 0.2117 |Data 0.001s (0.002s) |Net 0.765s (0.767s)
Processing model checkpoints & service config for deployment...
['model', 'logs_2021-06-25-17-51', 'opt.txt', 'checkpoints']
ModelArts post-training work is done!

3. Model testing
3.1 Inference function

```python
# -*- coding: utf-8 -*-
# TODO: import the modules the model needs to run
import os
import torch
import numpy as np
from PIL import Image
from io import BytesIO
from collections import OrderedDict
import cv2
import sys
sys.path.insert(0, './lib')
from opts import opts
from coco_classes import coco_class_map
from detectors.detector_factory import detector_factory

class ModelClass():
    def __init__(self, model_path):
        self.model_path = model_path  # keep this line; do not modify it
        self.opt = opts().parse()
        self.opt.num_classes = 80
        self.opt.resume = True
        self.opt.keep_res = True
        self.opt.fix_res = False
        self.opt.heads = {'hm': 80, 'wh': 2, 'reg': 2}
        self.opt.load_model = model_path
        self.opt.mean = np.array([0.40789654, 0.44719302, 0.47026115], dtype=np.float32).reshape(1, 1, 3)
        self.opt.std = np.array([0.28863828, 0.27408164, 0.27809835], dtype=np.float32).reshape(1, 1, 3)
        self.opt.batch_infer = False
        # configurable variables:
        if 'BATCH_INFER' in os.environ:
            print('Batch inference mode!')
            self.opt.batch_infer = True
        if 'FLIP_TEST' in os.environ:
            print('Flip test!')
            self.opt.flip_test = True
        if 'MULTI_SCALE' in os.environ:
            print('Multi scale!')
            self.opt.test_scales = [0.5, 0.75, 1, 1.25, 1.5]
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        if not torch.cuda.is_available():
            self.opt.gpus = [-1]
        self.class_map = coco_class_map()
        torch.set_grad_enabled(False)
        Detector = detector_factory[self.opt.task]
        self.detector = Detector(self.opt)
        print('load model success')

    def predict(self, file_name):
        image = Image.open(file_name).convert('RGB')
        img = np.array(image)
        img = img[:, :, ::-1]
        results = self.detector.run(img)['results']
        image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
        if not self.opt.batch_infer:
            for c_id, dets in results.items():
                for det in dets:
                    if det[4] > self.opt.vis_thresh:
                        scores = str(round(float(det[4]), 4))
                        classes = self.class_map[c_id]
                        image = cv2.rectangle(image, (int(det[0]), int(det[1])), (int(det[2]), int(det[3])), (0, 255, 0), 2)
                        image = cv2.putText(image, classes + ':' + scores, (int(det[0]), int(det[1])),
                                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        else:
            for c_id, dets in results.items():
                for det in dets:
                    scores = str(round(float(det[4]), 4))
                    classes = self.class_map[c_id]
                    image = cv2.rectangle(image, (int(det[0]), int(det[1])), (int(det[2]), int(det[3])), (0, 255, 0), 2)
                    image = cv2.putText(image, classes + ':' + scores, (int(det[0]), int(det[1])),
                                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        return image
```

3.2 Run inference
You can change the path of the image to predict.

```python
if __name__ == '__main__':
    import matplotlib.pyplot as plt
    img_path = './coco/train/000000021903.jpg'
    model_path = './output/exp/ctdet/default/model/model_last.pth'  # saved model path; adjust as needed
    # no changes needed below
    my_model = ModelClass(model_path)
    result = my_model.predict(img_path)
    result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
    plt.figure(figsize=(10, 10))  # set the window size
    plt.imshow(result)
    plt.show()
```

Fix size testing.
training chunk_sizes: [8]
The output will be saved to ./output/exp/ctdet/default
Creating model...
loaded ./output/exp/ctdet/default/model/model_last.pth, epoch 1
load model success
  • [Application Development] ONNX-to-OM model conversion fails
    When converting a point-cloud ONNX model to an OM model, the conversion fails with: `Input shape digit should be -1 or greater than 0`. The documentation says ONNX conversion expects an NCHW shape, but my ONNX model's input is (15000, 5); since it is a point-cloud model, it does not need NCHW. How should I handle this case?
  • [Tech Share] Image style transfer with pix2pix
    pix2pix paper: cid:link_3
Many image-processing problems amount to transforming an input picture into a corresponding output picture, e.g. conversions among grayscale, gradient, and color images. Each such problem is usually tackled with a dedicated algorithm (when a CNN is used for image translation, a problem-specific loss function must be designed for it to optimize; the usual approach of training the CNN to minimize the Euclidean distance between input and output typically yields blurry results). All of these methods are in essence pixel-to-pixel mappings, so the paper proposes a general GAN-based method, pix2pix, for this whole class of problems. With pix2pix, paired image translation (Labels to Street Scene, Aerial to Map, Day to Night, and so on) produces fairly sharp results. (The objective it optimizes is written out at the end of this post.)
Notes:
- Framework: PyTorch 1.4.0
- Hardware: 8 vCPU + 64 GiB + 1 x Tesla V100-PCIE-32GB
- How to run the code: click the triangular Run button in the menu bar at the top of this page, or press Ctrl+Enter, to run each code cell
- Detailed JupyterLab usage: see the "ModelArts JupyterLab User Guide"
- If you hit problems: see the "ModelArts JupyterLab FAQ"

1. Download the code and dataset
Run the cell below to download and unpack the data and code. The facades dataset is used; the data live in pix2pix/datasets/facades/.

```python
import os
# download data and code
!wget cid:link_0
# unzip
os.system('unzip pix2pix.zip -d ./')
os.chdir('./pix2pix')
```

2. Training
2.1 Install dependencies

```python
!pip install -r requirements.txt
```

2.2 Start training
Training options can be inspected and modified in pix2pix/options/train_options.py. If you use another dataset, change the data path. The model is named facades_pix2pix.

```python
!python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
```

----------------- Options ---------------
batch_size: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
crop_size: 256
dataroot: ./datasets/facades [default: None]
dataset_mode: aligned
direction: BtoA [default: AtoB]
-----------------------------------------------
/home/ma-user/anaconda3/envs/PyTorch-1.4/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:122: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
learning rate 0.0002000 -> 0.0002000
(epoch: 1, iters: 100, time: 0.041, data: 0.195)
(epoch: 1, iters: 200, time: 0.041, data: 0.001)
End of epoch 10 / 10  Time Taken: 14 sec

3. Testing
Check whether the model facades_pix2pix generated above exists; if so, it will be under the checkpoints directory:

```python
!ls checkpoints/
```

facades_label2photo_pretrained  facades_pix2pix

Run the test with the trained facades_pix2pix model:

```python
!python test.py --dataroot ./datasets/facades --direction BtoA --model pix2pix --name facades_pix2pix
```

Display the test results, which can be found under ./results/facades_pix2pix/:

```python
import matplotlib.pyplot as plt
img = plt.imread('./results/facades_pix2pix/test_latest/images/100_fake_B.png')
plt.imshow(img)
```
<matplotlib.image.AxesImage at 0x7ff680a8cd50>

```python
img = plt.imread('./results/facades_pix2pix/test_latest/images/100_real_A.png')
plt.imshow(img)
```
<matplotlib.image.AxesImage at 0x7ff680524090>

```python
img = plt.imread('./results/facades_pix2pix/test_latest/images/100_real_B.png')
plt.imshow(img)
```
<matplotlib.image.AxesImage at 0x7ff6613ceb90>
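For reference, the objective from the paper combines the conditional-GAN loss with an L1 term that keeps outputs close to the ground truth; the L1 term is precisely what counters the blurriness of plain Euclidean-distance training mentioned above:

$$\mathcal{L}_{cGAN}(G,D)=\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}[\log(1-D(x,G(x,z)))]$$
$$G^{*}=\arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G,D)+\lambda\,\mathbb{E}_{x,y,z}\big[\lVert y-G(x,z)\rVert_{1}\big]$$

where x is the input image, y the target, z the noise, and the paper uses λ = 100.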
  • [Tech Share] Controlling an inverted pendulum with the SAC algorithm
    Controlling an inverted pendulum with SAC — assignment
You are welcome to share your finished assignment in the AI Gallery Notebook section to earn growth points; see this document for how to share.
Task: tune the training parameters in step 2 and retrain a model so it performs better in the game.
Hints: search the text below for "# implement your code here"; that comment marks where you need to modify the code. Once it is modified, run the whole case end to end to complete the assignment, then share it to AI Gallery with a title starting with "2021实战营".

Code
1. Initialization
Step 1: install the base dependencies

```python
!pip install gym pybullet
```

Step 2: import the required libraries

```python
import time
import random
import itertools

import gym
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.distributions import Normal
import pybullet_envs
```

2. Initialize the training parameters
This case sets num_steps = 30000, which reaches a score of 200 after about 5 minutes of training. (A hedged example configuration is sketched at the end of this post.)

```python
# implement your code here
```

3. Define the SAC algorithm
Step 1: define the Q networks Q1 and Q2, identical [256, 256, 256] fully connected stacks

```python
# Initialize Policy weights
def weights_init_(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight, gain=1)
        torch.nn.init.constant_(m.bias, 0)

class QNetwork(nn.Module):
    def __init__(self, num_inputs, num_actions):
        super(QNetwork, self).__init__()
        # Q1 architecture
        self.linear1 = nn.Linear(num_inputs + num_actions, 256)
        self.linear2 = nn.Linear(256, 256)
        self.linear3 = nn.Linear(256, 1)
        # Q2 architecture
        self.linear4 = nn.Linear(num_inputs + num_actions, 256)
        self.linear5 = nn.Linear(256, 256)
        self.linear6 = nn.Linear(256, 1)
        self.apply(weights_init_)

    def forward(self, state, action):
        xu = torch.cat([state, action], 1)
        x1 = F.relu(self.linear1(xu))
        x1 = F.relu(self.linear2(x1))
        x1 = self.linear3(x1)
        x2 = F.relu(self.linear4(xu))
        x2 = F.relu(self.linear5(x2))
        x2 = self.linear6(x2)
        return x1, x2
```

Step 2: the policy network, a Gaussian with two [256, 256] fully connected layers plus mean and log-std heads

```python
class GaussianPolicy(nn.Module):
    def __init__(self, num_inputs, num_actions, action_space=None):
        super(GaussianPolicy, self).__init__()
        self.linear1 = nn.Linear(num_inputs, 256)
        self.linear2 = nn.Linear(256, 256)
        self.mean_linear = nn.Linear(256, num_actions)
        self.log_std_linear = nn.Linear(256, num_actions)
        self.apply(weights_init_)
        # action rescaling
        if action_space is None:
            self.action_scale = torch.tensor(1.)
            self.action_bias = torch.tensor(0.)
        else:
            self.action_scale = torch.FloatTensor((action_space.high - action_space.low) / 2.)
            self.action_bias = torch.FloatTensor((action_space.high + action_space.low) / 2.)

    def forward(self, state):
        x = F.relu(self.linear1(state))
        x = F.relu(self.linear2(x))
        mean = self.mean_linear(x)
        log_std = self.log_std_linear(x)
        log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
        return mean, log_std

    def sample(self, state):
        mean, log_std = self.forward(state)
        std = log_std.exp()
        normal = Normal(mean, std)
        # reparameterization trick (mean + std * N(0, 1))
        x_t = normal.rsample()
        y_t = torch.tanh(x_t)
        action = y_t * self.action_scale + self.action_bias
        log_prob = normal.log_prob(x_t)
        log_prob -= torch.log(self.action_scale * (1 - y_t.pow(2)) + epsilon)
        log_prob = log_prob.sum(1, keepdim=True)
        mean = torch.tanh(mean) * self.action_scale + self.action_bias
        return action, log_prob, mean

    def to(self, device):
        self.action_scale = self.action_scale.to(device)
        self.action_bias = self.action_bias.to(device)
        return super(GaussianPolicy, self).to(device)
```

Step 3: define the SAC training logic

```python
class SAC(object):
    def __init__(self, num_inputs, action_space):
        self.alpha = alpha
        self.auto_entropy = auto_entropy
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # critic networks
        self.critic = QNetwork(num_inputs, action_space.shape[0]).to(device=self.device)
        self.critic_optim = Adam(self.critic.parameters(), lr=lr)
        # critic target networks
        self.critic_target = QNetwork(num_inputs, action_space.shape[0]).to(self.device)
        hard_update(self.critic_target, self.critic)
        # Target Entropy = -dim(A)
        if auto_entropy is True:
            self.target_entropy = -torch.prod(torch.Tensor(action_space.shape).to(self.device)).item()
            self.log_alpha = torch.zeros(1, requires_grad=True, device=self.device)
            self.alpha_optim = Adam([self.log_alpha], lr=lr)
        self.policy = GaussianPolicy(num_inputs, action_space.shape[0], action_space).to(self.device)
        self.policy_optim = Adam(self.policy.parameters(), lr=lr)

    def select_action(self, state):
        state = torch.FloatTensor(state).to(self.device).unsqueeze(0)
        action, _, _ = self.policy.sample(state)
        return action.detach().cpu().numpy()[0]

    def update_parameters(self, memory, batch_size, updates):
        # Sample a batch from memory
        state_batch, action_batch, reward_batch, next_state_batch, mask_batch = memory.sample(batch_size=batch_size)
        state_batch = torch.FloatTensor(state_batch).to(self.device)
        next_state_batch = torch.FloatTensor(next_state_batch).to(self.device)
        action_batch = torch.FloatTensor(action_batch).to(self.device)
        reward_batch = torch.FloatTensor(reward_batch).to(self.device).unsqueeze(1)
        mask_batch = torch.FloatTensor(mask_batch).to(self.device).unsqueeze(1)
        with torch.no_grad():
            # sample the next action from the policy network
            next_state_action, next_state_log_pi, _ = self.policy.sample(next_state_batch)
            # feed next_state and next_action through the target critic to get Q values
            qf1_next_target, qf2_next_target = self.critic_target(next_state_batch, next_state_action)
            min_qf_next_target = torch.min(qf1_next_target, qf2_next_target) - self.alpha * next_state_log_pi
            next_q_value = reward_batch + mask_batch * gamma * (min_qf_next_target)
        # feed the current state and action through the critic to get Q values
        qf1, qf2 = self.critic(state_batch, action_batch)
        # JQ = E(st,at)~D[0.5 (Q(st,at) - (r(st,at) + gamma * E_{st+1~p}[V(st+1)]))^2]
        qf1_loss = F.mse_loss(qf1, next_q_value)
        qf2_loss = F.mse_loss(qf2, next_q_value)
        qf_loss = qf1_loss + qf2_loss
        self.critic_optim.zero_grad()
        qf_loss.backward()
        self.critic_optim.step()

        pi, log_pi, _ = self.policy.sample(state_batch)
        qf1_pi, qf2_pi = self.critic(state_batch, pi)
        min_qf_pi = torch.min(qf1_pi, qf2_pi)
        # Jpi = E_{st~D, eps~N}[alpha * log pi(f(eps; st)|st) - Q(st, f(eps; st))]
        policy_loss = ((self.alpha * log_pi) - min_qf_pi).mean()
        self.policy_optim.zero_grad()
        policy_loss.backward()
        self.policy_optim.step()

        if self.auto_entropy:
            alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
            self.alpha_optim.zero_grad()
            alpha_loss.backward()
            self.alpha_optim.step()
            self.alpha = self.log_alpha.exp()
        else:
            alpha_loss = torch.tensor(0.).to(self.device)
        if updates % target_update_interval == 0:
            soft_update(self.critic_target, self.critic, tau)

def soft_update(target, source, tau):
    for target_param, param in zip(target.parameters(), source.parameters()):
        target_param.data.copy_(target_param.data * (1.0 - tau) + param.data * tau)

def hard_update(target, source):
    for target_param, param in zip(target.parameters(), source.parameters()):
        target_param.data.copy_(param.data)
```

Step 4: define the replay buffer, which stores [s, a, r, s_, done]

```python
class ReplayMemory:
    def __init__(self, capacity):
        random.seed(seed)
        self.capacity = capacity
        self.buffer = []
        self.position = 0

    def push(self, state, action, reward, next_state, done):
        if len(self.buffer) < self.capacity:
            self.buffer.append(None)
        self.buffer[self.position] = (state, action, reward, next_state, done)
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        state, action, reward, next_state, done = map(np.stack, zip(*batch))
        return state, action, reward, next_state, done

    def __len__(self):
        return len(self.buffer)
```

4. Train the model
Initialize the environment and the algorithm:

```python
# create the environment
env = gym.make(env_name)
# seed everything
env.seed(seed)
env.action_space.seed(seed)
torch.manual_seed(seed)
np.random.seed(seed)
# create the agent
agent = SAC(env.observation_space.shape[0], env.action_space)
# replay buffer
memory = ReplayMemory(replay_size)
# training step counters
total_numsteps = 0
updates = 0
max_reward = 0
```

Start training:

```python
print('\ntraining...')
begin_t = time.time()
for i_episode in itertools.count(1):
    episode_reward = 0
    episode_steps = 0
    done = False
    state = env.reset()
    while not done:
        if start_steps > total_numsteps:
            # random exploration phase
            action = env.action_space.sample()
        else:
            # sample from the policy
            action = agent.select_action(state)
        if len(memory) > batch_size:
            # number of updates per environment step
            for i in range(updates_per_step):
                agent.update_parameters(memory, batch_size, updates)
                updates += 1
        # take the step
        next_state, reward, done, _ = env.step(action)
        # bookkeeping
        episode_steps += 1
        total_numsteps += 1
        episode_reward += reward
        # mask = not done (except when the episode hit the step limit)
        mask = 1 if episode_steps == env._max_episode_steps else float(not done)
        # store the transition in the buffer
        memory.push(state, action, reward, next_state, mask)
        # advance the state
        state = next_state
    # stop once the step budget is exhausted
    if total_numsteps > num_steps:
        break
    if episode_reward >= max_reward:
        max_reward = episode_reward
        print("current_max_reward {}".format(max_reward))
        # save the model
        torch.save(agent.policy, "model.pt")
    print("Episode: {}, total numsteps: {}, reward: {}".format(i_episode, total_numsteps, round(episode_reward, 2)))
env.close()
print("finish! time cost is {}s".format(time.time() - begin_t))
```

5. Run the trained model in the game
The visualization in this kernel depends on OpenGL and needs a display window, which the current environment does not support, so it cannot render here. Download the code to your local machine and uncomment the env.render() lines to see the visualization.

```python
# visualization
model = torch.load("model.pt")
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
state = env.reset()
# env.render()
done = False
episode_reward = 0
while not done:
    _, _, action = model.sample(torch.FloatTensor(state).to(device).unsqueeze(0))
    action = action.detach().cpu().numpy()[0]
    next_state, reward, done, _ = env.step(action)
    episode_reward += reward
    # env.render()
    state = next_state
print(episode_reward)
```

The visualization looks like this: (animation omitted)
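The code above references a number of module-level settings (env_name, seed, lr, alpha, gamma, tau, and so on) that the assignment asks you to fill in at "# implement your code here". The values below are a hedged example based on common SAC defaults; the environment name and every number are assumptions, not the reference solution:

```python
# Example training configuration (assumed values, not the official solution).
env_name = "InvertedPendulumBulletEnv-v0"  # assumed pybullet inverted-pendulum env
seed = 123456
lr = 3e-4                    # learning rate shared by actor, critic, and alpha
alpha = 0.2                  # initial entropy temperature
auto_entropy = True          # tune alpha automatically toward the target entropy
gamma = 0.99                 # discount factor
tau = 0.005                  # soft target-update coefficient
target_update_interval = 1   # how often to soft-update the target critic
replay_size = 1000000        # replay buffer capacity
batch_size = 256
start_steps = 1000           # purely random steps before the policy takes over
updates_per_step = 1
num_steps = 30000            # total environment-step budget (per the text above)
# bounds used by the Gaussian policy's log-std clamp and log-prob correction
LOG_SIG_MAX = 2
LOG_SIG_MIN = -20
epsilon = 1e-6
```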
  • [MindX SDK] Reference design for camouflaged object segmentation
    MindX SDK -- Camouflaged Object Segmentation Reference Design

1 Case overview
1.1 Summary
The goal of this system is to develop, on top of MindX SDK on the Huawei Cloud Ascend platform, an end-to-end reference design for camouflaged object segmentation that detects camouflaged objects in images and meets the functional requirements.

1.2 Model
The project is based on the DGNet model for generic camouflaged object segmentation. For the model description and details see the original paper: cid:link_4. For implementation details see the PyTorch code: cid:link_3. The public dataset used is NC4K, downloadable here: cid:link_1. The model is the EfficientNet-B4 variant of DGNet; the original PyTorch model file can be downloaded here: cid:link_0

1.3 Pipeline
- Base environment: Ascend 310, mxVision, Ascend-CANN-toolkit, Ascend Driver
- Model conversion: convert the ONNX model (.onnx) to an Ascend offline model (.om)
- Code development for the Ascend offline-model inference pipeline

1.4 Code location
The project code lives at: cid:link_2

1.5 Features and applicable scenarios
The project targets natural-scene images that are complete and sharp, with no blur or ghosting; input images should preferably be JPEG-encoded and no larger than 10 MB. Note: due to model limitations, the project currently supports only the detection of camouflaged animals in natural scenes and must not be used for other purposes.

2 Software design
2.1 Architecture
The system is organized into functional modules. The main flow: the image enters the stream; the DGNet detection model detects the camouflaged object; the detection is output as a per-pixel probability map. Module functions are listed in Table 2.1.

Table 2.1 Module functions:
| No. | Subsystem | Function |
| --- | --- | --- |
| 1 | Image input | loads the input image with cv2's image-loading function |
| 2 | Image preprocessing | resizes the input image to 352×352 and normalizes it (see the sketch at the end of this post) |
| 3 | Camouflaged object detection | detects camouflaged objects with the DGNet detection model |
| 4 | Data distribution | forwards the per-pixel probability map produced by DGNet to the next plugin |
| 5 | Result output | outputs and saves the predicted camouflage probability map |

2.2 Directory layout
The project is named DGNet and is laid out as follows:
```
./
├── assets              # assets
│   ├── 74.jpg
│   └── 74.png
├── data                # dataset directory
│   └── NC4K
├── inference_om.py     # Python script for Ascend offline-model inference
├── README.md           # this file
├── seg_results_om
│   ├── Exp-DGNet-OM    # directory for predicted segmentation maps
├── snapshots
│   ├── DGNet           # directory for model files
```

3 Setup
3.1 Dependencies
Software dependencies and versions:
| Software | Version |
| --- | --- |
| Ubuntu | 18.04.1 LTS |
| MindX SDK | 2.0.4 |
| Python | 3.9.2 |
| CANN | 5.0.4 |
| numpy | 1.21.2 |
| opencv-python | 4.5.3.56 |
| mindspore (cpu) | 1.9.0 |

3.2 Environment setup
Set the environment variables before building and running the project:
```
# MindX SDK environment variables:
. ${SDK-path}/set_env.sh
# CANN environment variables:
. ${ascend-toolkit-path}/set_env.sh
# where
# SDK-path: the SDK mxVision install path
# ascend-toolkit-path: the CANN install path
```

3.3 Model conversion
Step 1: download the ONNX model of DGNet (EfficientNet-B4): cid:link_0
Step 2: place the downloaded onnx file at ./snapshots/DGNet/DGNet.onnx
Step 3: convert the model:
```
# enter the directory
cd ./snapshots/DGNet/
# convert the ONNX model (.onnx) into an Ascend offline model (.om)
atc --framework=5 --model=DGNet.onnx --output=DGNet --input_shape="image:1,3,352,352" --log=debug --soc_version=Ascend310
```
After the conversion script finishes, the directory contains the converted model DGNet.om (the offline model file this project uses on the Ascend platform).

4 Inference and evaluation
Example steps:
Step 0: download a test dataset as described in section 1.2: cid:link_1
Step 1: run the offline-inference Python script
```
python inference_om.py --om_path ./snapshots/DGNet/DGNet.om --save_path ./seg_results_om/Exp-DGNet-OM/NC4K/ --data_path ./data/NC4K/Imgs
```
Step 2: quantitative evaluation, using the standard evaluation code provided in the original GitHub repository:
```
# clone the original repository
git clone https://github.com/GewelsJI/DGNet.git
# move the following two items into the current project
mv ./DGNet/lib_ascend/eval ./contrib/CamouflagedObjectDetection/
mv ./DGNet/lib_ascend/evaluation.py ./contrib/CamouflagedObjectDetection/
# run the evaluation
python evaluation.py
```
This produces the metric table below. The DGNet model's Smeasure is 0.856, which exceeds the "greater than 0.84" requirement stated in the project deliverables.
```
+---------+--------------+----------+-----------+-------+-------+--------+-------+-------+--------+-------+
| Dataset | Method       | Smeasure | wFmeasure | MAE   | adpEm | meanEm | maxEm | adpFm | meanFm | maxFm |
+---------+--------------+----------+-----------+-------+-------+--------+-------+-------+--------+-------+
| NC4K    | Exp-DGNet-OM | 0.856    | 0.782     | 0.043 | 0.909 | 0.91   | 0.921 | 0.8   | 0.812  | 0.833 |
+---------+--------------+----------+-----------+-------+-------+--------+-------+-------+--------+-------+
```
Qualitative check: input camouflaged image → predicted segmentation result (figures omitted).

5 References
The three main papers:
```
@article{ji2022gradient,
  title={Deep Gradient Learning for Efficient Camouflaged Object Detection},
  author={Ji, Ge-Peng and Fan, Deng-Ping and Chou, Yu-Cheng and Dai, Dengxin and Liniger, Alexander and Van Gool, Luc},
  journal={Machine Intelligence Research},
  year={2023}
}
@article{fan2021concealed,
  title={Concealed Object Detection},
  author={Fan, Deng-Ping and Ji, Ge-Peng and Cheng, Ming-Ming and Shao, Ling},
  journal={IEEE TPAMI},
  year={2022}
}
@inproceedings{fan2020camouflaged,
  title={Camouflaged object detection},
  author={Fan, Deng-Ping and Ji, Ge-Peng and Sun, Guolei and Cheng, Ming-Ming and Shen, Jianbing and Shao, Ling},
  booktitle={IEEE CVPR},
  pages={2777--2787},
  year={2020}
}
```
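Module 2 in Table 2.1 (resize to 352×352 plus normalization) is the usual DGNet input pipeline. Here is a minimal sketch, assuming ImageNet mean/std normalization as in the PyTorch reference implementation; the exact constants and channel order used by the OM pipeline may differ, so verify against inference_om.py:

```python
import cv2
import numpy as np

def preprocess(image_path, size=352):
    """Load an image and produce a 1x3xHxW float32 blob for the DGNet .om model."""
    img = cv2.imread(image_path)                      # BGR, HxWx3, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_LINEAR)
    img = img.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet stats
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img - mean) / std
    img = img.transpose(2, 0, 1)[None, ...]           # HWC -> NCHW, add batch dim
    return np.ascontiguousarray(img)
```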
  • [API Usage] Does MindSpore have an in-place (aliasing) operator like torch's view?
    MindSpore provides a view operator, but it does not seem to behave like torch's. In torch, view is "in-place" in the sense that the viewed tensor and the original tensor still share the same underlying data. MindSpore's view operator copies the original tensor's data, so after the view the two tensors hold separate data. Does MindSpore have a view operator with torch's aliasing semantics?
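For reference, this is the torch behavior the question describes: view returns an alias over the same storage, so writes through the view are visible in the original tensor.

```python
import torch

a = torch.arange(6)
b = a.view(2, 3)
print(a.data_ptr() == b.data_ptr())  # True: same underlying buffer
b[0, 0] = 100
print(a[0])                          # tensor(100): the write is visible through a
```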
  • [API Usage] ResizeBilinear and torch.nn.Upsample outputs differ — how to implement torch.nn.Upsample as a MindSpore operator?
    Our project requires reproducing a torch implementation in MindSpore and importing torch-trained model weights into MindSpore. We found that the upsample operators produce different outputs for identical inputs. How can the torch version of the upsample operation be reproduced in MindSpore?
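Not an official answer, but a common source of exactly this mismatch is the coordinate-transformation mode: bilinear resizing produces different pixel values depending on align_corners (and, on some backends, half-pixel centers), so the two frameworks must be configured to the same mode. The torch side is shown below; whether your MindSpore version's resize operator exposes a matching setting is something to verify against its documentation.

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# The two coordinate conventions give different outputs for the same input:
up_false = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)(x)
up_true = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)(x)
print(torch.allclose(up_false, up_true))  # False
```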
  • [Help Wanted] Manually compiled PyTorch fails at runtime with an "undefined symbol" error
    For the past few days I have been trying to upgrade the PyTorch version inside a container (1.5 -> 1.8). Ascend provides a build-and-install guide for PyTorch 1.8 on Gitee, but all of my attempts have failed. Specifically, `import torch` works, but `import torch_npu` raises an error:

```
cdws@dev-614175bc-d1c1-4701-8999-a48965cd902c-d7q2z:~/userdata/Downloads/pytorch/test/test_network_ops$ python
Python 3.7.5 (default, Aug 13 2021, 09:52:41)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torch_npu
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cdws/.local/lib/python3.7/site-packages/torch_npu/__init__.py", line 26, in <module>
    import torch_npu.npu
  File "/home/cdws/.local/lib/python3.7/site-packages/torch_npu/npu/__init__.py", line 42, in <module>
    from .utils import (is_initialized, _lazy_call, _lazy_init, init, set_dump,
  File "/home/cdws/.local/lib/python3.7/site-packages/torch_npu/npu/utils.py", line 27, in <module>
    import torch_npu._C
ImportError: /usr/local/python3.7.5/lib/python3.7/site-packages/torch/lib/libc10_npu.so: undefined symbol: _ZN3c107Warning4warnENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

More details are in the issue I filed: 安装PyTorch1.8.1后,导入torch_npu报错,报undefined symbol错误 · Issue #I5U4Z2 · Ascend/pytorch - Gitee.com