http://huaweicloud.ai
Learn AI on Huawei Cloud AI Gallery

Huawei Cloud AI Classic Paper Reproduction

Activity Rules

Eligibility: Huawei Cloud telesales customers and channel partner customers may participate in the spend-and-get-gifts promotion; other customers should consult their account manager before participating.

Activity period: August 12, 2020 to September 11, 2020

During the activity period, Huawei Cloud users who purchase cloud services through the activity page, or who make new purchases of cloud services on the Huawei Cloud official website using cloud starter package coupons, can redeem the corresponding physical gifts once the cumulative amount actually paid for new purchases reaches the specified thresholds. Activity coupons can be obtained on this activity page, for example through the "cloud starter package". Amounts paid for orders placed directly on the Huawei Cloud official website (without using the mid-year cloud sale coupons) or under other activities are not counted.


Huawei Cloud AI Paper Reading Club 2021

Fast-SCNN: An Efficient Semantic Segmentation Model

2021/04/10 19:00:00

Speaker: 历天一

Huawei Cloud MVP; Huawei Cloud community expert (云享专家); Ascend HAE; 2020 Huawei Cloud developer community person of the year

Analysis and Reproduction of a Paper on d-Vectors for Speaker Change Detection

2021/04/25 10:00:00

Speakers: 蒋士强, 李朝阳

蒋士强: M.S. student in Control Science and Engineering at Harbin Institute of Technology; research interests include inertial navigation, fine-grained sentiment analysis, and knowledge graphs.

李朝阳: M.S. student in Pattern Recognition and Intelligent Systems at Beijing Institute of Technology; research interests include object detection and machine-learning-based agent decision making.

A Walkthrough of the Object Detection Model CrowdDet

2021/05/06 10:00:00

Speaker: 徐继晟

Department of Automation, Zhiyuan Honors Program in Engineering, Shanghai Jiao Tong University; participated in the national-level undergraduate innovation project "Detection and Segmentation of Bacterial Contaminants in Cancer Screening Samples Based on Deep Learning" and led the Shanghai municipal project "Design and Implementation of RoboCup Small Size League Robot Algorithms Based on Reinforcement Learning".

Dynamic R-CNN: A Dynamic Training Method That Effectively Improves R-CNN-Family Detectors

2021/05/13 19:00:00

Speaker: 白璐斌

M.S. student in Intelligence Science and Technology at Peking University; research interests include image processing and deep learning; enthusiastic about new things and cutting-edge technology.

Semantic Segmentation Based on Transfer Learning

2021/05/24 10:00:00

Speaker: 刘亚豪

M.S. in Computer Science from the University of Electronic Science and Technology of China; research interests include transfer learning and semantic segmentation.


A Walkthrough of the Knowledge Distillation Model TinyBERT

2021/06/02 19:00:00

Speaker: 李朝阳

M.S. in Software Engineering, School of Software, Northwestern Polytechnical University; research interest: brain-computer interaction.

TSD: Decoupling Classification and Regression in Object Detection

2021/06/11 10:00:00

Speaker: 江扬舟

M.S. student at the Institute of Image Communication, Department of Electronic Engineering, Shanghai Jiao Tong University; research interest: adversarial attacks; one paper at NeurIPS 2020; ranked 3rd of 2000+ in the IJCAI-19 adversarial AI competition and 5th of 3000+ in the Riiid AIEd Challenge 2020.

Exploring Language Models in Text Recognition

2021/06/24 19:00:00

Speaker: 王裕鑫

Ph.D. in Information and Communication Technology from the University of Science and Technology of China; research interests include OCR, image segmentation, and object detection; first-author papers at CVPR, IJCAI, and TMM.

MindSpore DenseNet Classification Training in a Multi-Card Ascend Environment

2021/07/05 10:00:00

Speaker: 周一锋

Computer software technology major at Soochow University; passionate about AI and deep learning; won first and second prizes in the Bronze and Silver stages of the Ascend Model King Challenge.



Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification

2021/07/23 19:00:00

Speaker: 田旭东

Ph.D. student in Computer Science at East China Normal University; research interests include machine learning, information theory, and person re-identification; one paper each at CVPR and IJCAI.

A Transition-Based Semantic Parser

2021/07/27 19:00:00

Speaker: 刘泽洋

M.S. student in Natural Language Processing (NLP) at Soochow University; research interests: semantic parsing for table-based knowledge base question answering, and the design and development of data annotation systems.

A Walkthrough of the Reading Comprehension Model AoA Reader

Speaker: 范睿

Computer science major, School of Computer Science and Technology, University of Science and Technology of China; research interest: recommender systems.


A Hands-On Session on Image Style Processing Algorithms

2021/08/18 19:00:00

分享人:Eric

Huawei Cloud ModelArts developer; M.S. in Computer Engineering, Tufts University, USA.

A Walkthrough of the Image Segmentation Model PointRend

2021/08/24 10:00:00

Speaker: 焦文科

M.S. student in Pattern Recognition and Intelligent Systems at Wuhan University; research interest: semantic segmentation of remote sensing imagery.

M-SQL: A Multi-Task Representation Learning Approach for Converting Natural Language to SQL

2021/09/02 10:00:00

Speaker: 潘名扬

Graduate student at the Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; research interests: natural language processing and text-to-SQL.


Featured Papers

Domains:
  • MindSpore
  • CV
  • NLP
  • Machine learning
  • Deep learning
  • Transfer learning
  • Semi-supervised learning
  • Few-shot learning
  • Knowledge graphs
  • Second-order optimization
  • Explainable AI
  • Graph neural networks
  • Network optimization
  • GNN
  • Human-computer interaction
  • Computer vision
  • Object detection
  • Action detection/recognition
  • Microarchitecture
  • Speech and semantics
  • Deep neural networks
  • Recommender systems
  • Generative adversarial networks
  • Other
Reproduction status:
  • Reproduced
  • To be reproduced

Fast-SCNN: Fast Semantic Segmentation Network

The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024 × 2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our ‘learning to downsample’ module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.

CV
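For readers who want to map the abstract above onto code, here is a minimal PyTorch sketch of the two-branch idea: a shared "learning to downsample" stem computes low-level features once, feeding both a deep low-resolution context branch and a high-resolution skip path. The module names and channel widths are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSegNet(nn.Module):
    """Minimal sketch of the Fast-SCNN-style two-branch idea.

    A shared "learning to downsample" stem computes low-level features
    once; a deep branch then extracts context at low resolution while a
    skip path keeps spatial detail. Channel widths are illustrative.
    """

    def __init__(self, num_classes: int = 19):
        super().__init__()
        # Shared stem: downsample the input 8x with strided convolutions.
        self.downsample = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 48, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(48, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Deep branch: further downsampling and context extraction.
        self.deep = nn.Sequential(
            nn.Conv2d(64, 96, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(64 + 128, 128, 1)
        self.classifier = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        shared = self.downsample(x)          # both branches start here
        context = self.deep(shared)          # deep, low-resolution features
        context = F.interpolate(context, size=shared.shape[2:],
                                mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([shared, context], dim=1))
        logits = self.classifier(fused)
        # Upsample logits back to the input resolution.
        return F.interpolate(logits, size=x.shape[2:],
                             mode="bilinear", align_corners=False)
```

The point of the design, as the abstract notes, is that the expensive early layers run only once and are reused by both branches.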

Attention and language ensemble for scene text recognition with convolutional sequence modeling

Recent dominant approaches for scene text recognition are mainly based on convolutional neural network (CNN) and recurrent neural network (RNN), where the CNN processes images and the RNN generates character sequences. Different from these methods, we propose an attention-based architecture which is completely based on CNNs. The distinctive characteristics of our method include: (1) the method follows encoder-decoder architecture, in which the encoder is a two-dimensional residual CNN and the decoder is a deep one-dimensional CNN. (2) An attention module that captures visual cues, and a language module that models linguistic rules are designed equally in the decoder. Therefore the attention and language can be viewed as an ensemble to boost predictions jointly. (3) Instead of using a single loss from language aspect, multiple losses from attention and language are accumulated for training the networks in an end-to-end way. We conduct experiments on standard datasets for scene text recognition, including Street View Text, IIIT5K and ICDAR datasets. The experimental results show our CNN-based method has achieved state-of-the-art performance on several benchmark datasets, even without the use of RNN.

CV
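One way to read the "ensemble" described above is as two parallel predictors, a visual attention module and a convolutional language module, whose per-step character logits are summed. The PyTorch sketch below is a loose illustration under assumed layer shapes; it is not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionLanguageEnsemble(nn.Module):
    """Sketch of the attention + language ensemble decoding idea.

    Both modules produce per-step character logits; their predictions
    are summed (ensembled). In training, each module would also get its
    own loss term, matching the "multiple losses" point in the abstract.
    All layer shapes here are illustrative assumptions.
    """

    def __init__(self, vocab_size: int, feat_dim: int = 256):
        super().__init__()
        # Attention module: scores flattened 2-D visual features and
        # classifies the attended feature vector.
        self.att_score = nn.Linear(feat_dim, 1)
        self.att_cls = nn.Linear(feat_dim, vocab_size)
        # Language module: a 1-D CNN over previously decoded embeddings.
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.lang = nn.Conv1d(feat_dim, vocab_size, kernel_size=3, padding=2)

    def forward(self, visual_feats, prev_chars):
        # visual_feats: (B, N, feat_dim) flattened CNN feature map
        # prev_chars:   (B, T) previously decoded character ids
        attn = torch.softmax(self.att_score(visual_feats), dim=1)   # (B, N, 1)
        glimpse = (attn * visual_feats).sum(dim=1)                  # (B, feat_dim)
        vis_logits = self.att_cls(glimpse)                          # (B, vocab)
        emb = self.embed(prev_chars).transpose(1, 2)                # (B, feat_dim, T)
        # With padding=2 the output at index T-1 only sees past tokens,
        # so taking that position gives a causal next-character logit.
        lang_logits = self.lang(emb)[:, :, prev_chars.size(1) - 1]  # (B, vocab)
        return vis_logits + lang_logits  # ensemble of the two modules
```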

Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training

Although two-stage object detectors have continuously advanced the state-of-the-art performance in recent years, the training process itself is far from crystal clear. In this work, we first point out the inconsistency problem between the fixed network settings and the dynamic training procedure, which greatly affects the performance. For example, the fixed label assignment strategy and regression loss function cannot fit the distribution change of proposals and thus are harmful to training high quality detectors. Consequently, we propose Dynamic R-CNN to adjust the label assignment criteria (IoU threshold) and the shape of regression loss function (parameters of SmoothL1 Loss) automatically based on the statistics of proposals during training. This dynamic design makes better use of the training samples and pushes the detector to fit more high quality samples. Specifically, our method improves upon ResNet-50-FPN baseline with 1.9% AP and 5.5% AP90 on the MS COCO dataset with no extra overhead. Codes and models are available at https://github.com/hkzhang95/DynamicRCNN.

CV, Object Detection
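The core mechanism, tracking proposal statistics to adjust the IoU threshold and the SmoothL1 beta as training progresses, fits in a few lines. A hedged sketch follows; the top-k value, the k-th value, and the smoothing are illustrative choices, not the paper's exact settings.

```python
import torch

def dynamic_iou_threshold(proposal_ious: torch.Tensor,
                          topk: int = 75,
                          prev_thresh: float = 0.5,
                          momentum: float = 0.9) -> float:
    """Sketch of Dynamic R-CNN-style dynamic label assignment.

    As training progresses and proposals improve, the k-th highest IoU
    in a batch rises, so the positive/negative threshold rises with it.
    """
    kth_iou = proposal_ious.topk(min(topk, proposal_ious.numel())).values[-1]
    return momentum * prev_thresh + (1.0 - momentum) * float(kth_iou)

def dynamic_smoothl1_beta(reg_errors: torch.Tensor, kth: int = 10) -> float:
    """Companion sketch: shrink the SmoothL1 beta as regression errors
    shrink, following the same statistics-driven idea."""
    k = min(kth, reg_errors.numel())
    return float(reg_errors.abs().flatten().kthvalue(k).values)
```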

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state-of-the-art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real-time with a single 1025 × 2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1 PQ% on test set). On Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation.

CV
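The class-agnostic instance branch reduces to: each pixel regresses an offset to its instance center, and pixels are grouped by the nearest detected center. A minimal PyTorch sketch of that grouping step (a restatement of the idea, not the official implementation):

```python
import torch

def group_pixels_by_center(centers: torch.Tensor,
                           offsets: torch.Tensor) -> torch.Tensor:
    """Sketch of Panoptic-DeepLab-style instance grouping.

    centers: (K, 2) detected instance centers as (y, x), K >= 1.
    offsets: (2, H, W) per-pixel regression toward the instance center.
    Returns an (H, W) map of instance ids (argmin over distances).
    """
    h, w = offsets.shape[1:]
    ys = torch.arange(h).view(h, 1).expand(h, w)
    xs = torch.arange(w).view(1, w).expand(h, w)
    coords = torch.stack([ys, xs], dim=0).float()        # (2, H, W)
    pred_center = coords + offsets                       # where each pixel points
    # Distance from each pixel's predicted center to each detected center.
    d = (pred_center.unsqueeze(0) - centers.view(-1, 2, 1, 1)).pow(2).sum(1)
    return d.argmin(dim=0)                               # (H, W) instance ids
```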

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their ground-truth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.

CV

Revisiting the Sibling Head in Object Detector

The "shared head for classification and localization" (sibling head), first denominated in Fast R-CNN [Girshick, 2015], has been leading the fashion of the object detection community in the past five years. This paper provides the observation that the spatial misalignment between the two object functions in the sibling head can considerably hurt the training process, but this misalignment can be resolved by a very simple operator called task-aware spatial disentanglement (TSD). Considering the classification and regression, TSD decouples them from the spatial dimension by generating two disentangled proposals for them, which are estimated by the shared proposal. This is inspired by the natural insight that for one instance, the features in some salient area may have rich information for classification while those around the boundary may be good at bounding box regression. Surprisingly, this simple design can boost all backbones and models on both MS COCO and Google OpenImage consistently by ~3% mAP. Further, we propose a progressive constraint to enlarge the performance margin between the disentangled and the shared proposals, and gain ~1% more mAP. We show that TSD breaks through the upper bound of today's single-model detectors by a large margin (mAP 49.4 with ResNet-101, 51.2 with SENet154), and is the core model of our 1st place solution on the Google OpenImage Challenge 2019.

CV, Object Detection
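The disentanglement itself is simple to state in code: from one shared proposal, two learned task-specific deltas produce separate boxes for classification and regression. A minimal sketch under illustrative assumptions (the real TSD uses deformable pooling and richer heads):

```python
import torch
import torch.nn as nn

class TaskAwareDisentangle(nn.Module):
    """Sketch of TSD's core idea: derive two task-specific proposals
    from one shared proposal so classification and regression can look
    at different spatial regions. Layer sizes are illustrative.
    """

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Predict (dx, dy, dw, dh) adjustments per task from pooled features.
        self.cls_delta = nn.Linear(feat_dim, 4)
        self.reg_delta = nn.Linear(feat_dim, 4)

    @staticmethod
    def apply_delta(boxes, deltas):
        # boxes: (N, 4) as (x1, y1, x2, y2); deltas shift/scale them.
        w = boxes[:, 2] - boxes[:, 0]
        h = boxes[:, 3] - boxes[:, 1]
        cx = boxes[:, 0] + 0.5 * w + deltas[:, 0] * w
        cy = boxes[:, 1] + 0.5 * h + deltas[:, 1] * h
        w = w * torch.exp(deltas[:, 2])
        h = h * torch.exp(deltas[:, 3])
        return torch.stack([cx - 0.5 * w, cy - 0.5 * h,
                            cx + 0.5 * w, cy + 0.5 * h], dim=1)

    def forward(self, shared_boxes, pooled_feats):
        # Two disentangled proposals estimated from the shared proposal.
        cls_boxes = self.apply_delta(shared_boxes, self.cls_delta(pooled_feats))
        reg_boxes = self.apply_delta(shared_boxes, self.reg_delta(pooled_feats))
        return cls_boxes, reg_boxes
```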

HRNet: Deep High-Resolution Representation Learning for Human Pose Estimation

This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

CV
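The repeated multi-scale fusion can be illustrated with just two parallel branches: at each fusion step, each branch absorbs the other branch's features resampled to its own resolution. A minimal PyTorch sketch; channel counts are assumptions, and HRNet uses more branches and repeated stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoResolutionFusion(nn.Module):
    """Sketch of HRNet-style multi-scale fusion for two parallel branches.

    Assumes x_low is half the spatial resolution of x_high and that the
    spatial dims of x_high are even, so the strided conv output matches.
    """

    def __init__(self, c_high: int = 32, c_low: int = 64):
        super().__init__()
        self.low_to_high = nn.Conv2d(c_low, c_high, 1)   # then upsample
        self.high_to_low = nn.Conv2d(c_high, c_low, 3, stride=2, padding=1)

    def forward(self, x_high, x_low):
        up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[2:],
                           mode="bilinear", align_corners=False)
        down = self.high_to_low(x_high)
        # Each branch keeps its own resolution but absorbs the other's info.
        return x_high + up, x_low + down
```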

Detection in Crowded Scenes: One Proposal, Multiple Predictions

We propose a simple yet effective proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes. The key of our approach is to let each proposal predict a set of correlated instances rather than a single one in previous proposal-based frameworks. Equipped with new techniques such as EMD Loss and Set NMS, our detector can effectively handle the difficulty of detecting highly overlapped objects. On a FPN-Res50 baseline, our detector can obtain 4.9% AP gains on challenging CrowdHuman dataset and 1.0% MR^(-2) improvements on CityPersons dataset, without bells and whistles. Moreover, on less crowded datasets like COCO, our approach can still achieve moderate improvement, suggesting the proposed method is robust to crowdedness. Code and pre-trained models will be released at this https URL.

CV, Object Detection
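Set NMS, mentioned in the abstract, changes exactly one rule in standard NMS: predictions that originate from the same proposal never suppress each other, so one proposal can keep several overlapping instances. A plain, unvectorized PyTorch sketch (the released implementation is vectorized):

```python
import torch

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = torch.max(a[0], b[0]), torch.max(a[1], b[1])
    x2, y2 = torch.min(a[2], b[2]), torch.min(a[3], b[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def set_nms(boxes, scores, proposal_ids, iou_thresh: float = 0.5):
    """Greedy NMS with the Set NMS exemption: boxes predicted by the
    same proposal are never allowed to suppress one another."""
    order = scores.argsort(descending=True).tolist()
    suppressed = torch.zeros(len(boxes), dtype=torch.bool)
    keep = []
    for i, idx in enumerate(order):
        if suppressed[idx]:
            continue
        keep.append(idx)
        for jdx in order[i + 1:]:
            if suppressed[jdx]:
                continue
            # Set NMS rule: same-proposal predictions are exempt.
            if proposal_ids[idx] == proposal_ids[jdx]:
                continue
            if box_iou(boxes[idx], boxes[jdx]) > iou_thresh:
                suppressed[jdx] = True
    return keep
```

The exemption is what allows the "one proposal, multiple predictions" design to survive post-processing in crowded scenes.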

PointRend: Image Segmentation as Rendering

We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network module: a module that performs point-based segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm. PointRend can be flexibly applied to both instance and semantic segmentation tasks by building on top of existing state-of-the-art models. While many concrete implementations of the general idea are possible, we show that a simple design already achieves excellent results. Qualitatively, PointRend outputs crisp object boundaries in regions that are oversmoothed by previous methods. Quantitatively, PointRend yields significant gains on COCO and Cityscapes, for both instance and semantic segmentation. PointRend’s efficiency enables output resolutions that are otherwise impractical in terms of memory or computation compared to existing approaches. Code has been made available at https://github.com/facebookresearch/detectron2/tree/master/projects/PointRend.

CV
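The adaptive selection step reduces to: pick the points where the coarse mask is least confident, then let a small point head re-predict only those. A minimal single-mask PyTorch sketch; the full method works on batches and iterates over subdivision levels.

```python
import torch

def most_uncertain_points(coarse_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Sketch of PointRend-style adaptive point selection.

    coarse_logits: (H, W) binary-mask logits. Uncertainty is taken as
    -|logit|, i.e. closeness to the decision boundary. Returns (k, 2)
    coordinates as (y, x) where a point head would re-predict labels.
    """
    uncertainty = -coarse_logits.abs().flatten()
    idx = uncertainty.topk(k).indices
    h, w = coarse_logits.shape
    ys = torch.div(idx, w, rounding_mode="floor")
    xs = idx % w
    return torch.stack([ys, xs], dim=1)
```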

Join the community and learn with the experts!

  • Scan the QR code to add the assistant and join the Huawei Cloud AI developer community

    Reply "新手福利" (newcomer benefits) for a special surprise!