• [经验分享] MindStudio模型训练场景精度比对全流程和结果分析
    一、基于MindStudio模型精度比对介绍1.1 MindStudio介绍MindStudio是一套基于华为昇腾AI处理器开发的AI全栈开发平台,包括基于芯片的算子开发、以及自定义算子开发,同时还包括网络层的网络移植、优化和分析,另外在业务引擎层提供了可视化的AI引擎拖拽式编程服务,极大的降低了AI引擎的开发门槛。MindStudio工具中的功能框架如图1所示:图1 MindStudio功能框架MindStudio工具中的主要几个功能特性如下:工程管理:为开发人员提供创建工程、打开工程、关闭工程、删除工程、新增工程文件目录和属性设置等功能。SSH管理:为开发人员提供新增SSH连接、删除SSH连接、修改SSH连接、加密SSH密码和修改SSH密码保存方式等功能。应用开发:针对业务流程开发人员,MindStudio工具提供基于AscendCL(Ascend Computing Language)和集成MindX SDK的应用开发编程方式,编程后的编译、运行、结果显示等一站式服务让流程开发更加智能化,可以让开发者快速上手。自定义算子开发:提供了基于TBE和AI CPU的算子编程开发的集成开发环境,让不同平台下的算子移植更加便捷,适配昇腾AI处理器的速度更快。离线模型转换:训练好的第三方网络模型可以直接通过离线模型工具导入并转换成离线模型,并可一键式自动生成模型接口,方便开发者基于模型接口进行编程,同时也提供了离线模型的可视化功能。日志管理:MindStudio为昇腾AI处理器提供了覆盖全系统的日志收集与日志分析解决方案,提升运行时算法问题的定位效率。提供了统一形式的跨平台日志可视化分析能力及运行时诊断能力,提升日志分析系统的易用性。性能分析:MindStudio以图形界面呈现方式,实现针对主机和设备上多节点、多模块异构体系的高效、易用、可灵活扩展的系统化性能分析,以及针对昇腾AI处理器的性能和功耗的同步分析,满足算法优化对系统性能分析的需求。设备管理:MindStudio提供设备管理工具,实现对连接到主机上的设备的管理功能。精度比对:可以用来比对自有模型算子的运算结果与Caffe、TensorFlow、ONNX标准算子的运算结果,以便用来确认神经网络运算误差发生的原因。开发工具包的安装与管理:为开发者提供基于昇腾AI处理器的相关算法开发套件包Ascend-cann-toolkit,旨在帮助开发者进行快速、高效的人工智能算法开发。开发者可以将开发套件包安装到MindStudio上,使用MindStudio进行快速开发。Ascend-cann-toolkit包含了基于昇腾AI处理器开发依赖的头文件和库文件、编译工具链、调优工具等。 1.2 精度比对介绍自有实现的算子在昇腾AI处理器上的运算结果与业界标准算子(如Caffe、ONNX、TensorFlow、PyTorch)的运算结果可能存在差异:在模型转换过程中对模型进行了优化,包括算子消除、算子融合、算子拆分,这些动作可能会造成自有实现的算子运算结果与业界标准算子(如Caffe、TensorFlow、ONNX)运算结果存在偏差。用户原始网络可以迁移到昇腾910 AI处理器上执行训练,网络迁移可能会造成自有实现的算子运算结果与用业界标准算子(如TensorFlow、PyTorch)运算结果存在偏差。为了帮助开发人员快速解决算子精度问题,需要提供比对自有实现的算子运算结果与业界标准算子运算结果之间差距的工具。精度比对工具提供Vector比对能力,包含余弦相似度、最大绝对误差、累积相对误差、欧氏相对距离、KL散度、标准差、平均绝对误差、均方根误差、最大相对误差、平均相对误差的算法比对维度。二、环境准备在进行实验之前需要配置好远端Linux服务器并下载安装MindStudio。首先在Linux服务器上安装部署好Ascend-cann-toolkit开发套件包、Ascend-cann-tfplugin框架插件包和TensorFlow 1.15.0深度学习框架。之后在Windows上安装MindStudio,安装完成后通过配置远程连接的方式建立MindStudio所在的Windows服务器与Ascend-cann-toolkit开发套件包所在的Linux服务器的连接,实现全流程开发功能。接下来配置环境变量,以运行用户登录服务器,在任意目录下执行vi ~/.bashrc命令,打开.bashrc文件,在文件最后一行后面添加以下内容(以非root用户的默认安装路径为例)。然后执行:wq!命令保存文件并退出。最后执行source 
~/.bashrc命令使其立即生效。关于MindStudio的具体安装流程可以参考Windows安装MindStudio(点我跳转),MindStudio环境搭建指导视频(点我跳转)。MindStudio官方下载地址:点我跳转。本文教程基于MindStudio5.0.RC2 x64,CANN版本5.1.RC2实现。三、准备基于GPU运行生成的原始训练网络npy数据文件3.1 获取项目代码本样例选择resnet50模型,利用git克隆代码(git clone -b r1.13.0 https://github.com/tensorflow/models.git),下载成功后如下图所示:3.2 生成数据前处理数据比对前,需要先检查并去除训练脚本内部使用到的随机处理,避免由于输入数据不一致导致数据比对结果不可用。编辑resnet_run_loop.py文件,修改如下(以下行数仅为示例,请以实际为准):注释掉第83、85行注释掉第587~594行第607行,修改为“return None”编辑cifar10_main.py文件,将train_epochs的值改为1。3.3 生成npy文件进入训练脚本所在目录(如“~/models/official/resnet”),修改训练脚本,添加tfdbg的hook。编辑resnet_run_loop.py文件,添加如下加粗字体的信息。配置环境变量执行训练脚本训练任务停止后,在命令行输入run,训练会往下执行一个step。执行lt>gpu_dump命令将所有tensor的名称暂存到自定义名称的gpu_dump文件里。命令行中会有如下回显。另外开启一个终端,在linux命令行下进入gpu_dump文件所在目录,执行下述命令,用以生成在tfdbg命令行执行的命令。timestamp=$[$(date +%s%N)/1000] ; cat gpu_dump | awk '{print "pt",$4,$4}' | awk '{gsub("/", "_", $3);gsub(":", ".", $3);print($1,$2,"-n 0 -w "$3".""'$timestamp'"".npy")}'>dump.txt将上一步生成的dump.txt文件中所有tensor存储的命令复制(所有以“pt”开头的命令),然后回到tfdbg命令行(刚才执行训练脚本的控制台)粘贴执行,即可存储所有的npy文件,存储路径为训练脚本所在目录。退出tfdbg命令行,将生成的npy文件保存到tf_resnet50_gpu_dump_data(用户可自定义)目录下。四、准备基于NPU运行生成的训练网络dump数据和计算图文件4.1 分析迁移单击菜单栏“File > New > Project...”弹出“New Project”窗口。在New Project窗口中,选择Ascend Training。输入项目的名称、CANN远程地址以及本地地址。点击Change配置CANN,如下图所示:Name:工程名称,可自定义。Description:工程描述,可按需补充关于工程的详细信息。CANN Version:CANN软件包版本,如未识别或想要更换使用的版本,可单击“Change”,在弹出界面中选择Ascend-cann-toolkit开发套件包的安装路径(注意需选择到版本号一级)。Project Location:工程目录,默认在“$HOME/AscendProjects”下创建。点击右侧 + 进行配置远程服务器,如下图所示:在出现的信息配置框输入相关配置信息,如下图所示:输入服务器的SSH信息,如果测试连接失败,建议使用CMD或XShell等工具进行排查。选择远程 CANN 安装位置,如下图所示:在Remote CANN location中选择CANN的路径,需要注意的是必须选择到CANN的版本号目录,这里选择的是5.1.RC2版本,如下图所示:点击确定后,需要等待MindStudio进行文件同步操作,这个过程会持续数分钟,期间如果遇到Sync remote CANN files error.错误,考虑是否无服务器root权限。配置完成CANN点击下一步在训练工程选择界面,选择“TensorFlow Project”,单击“Finish”。进入工程界面,单击工具栏中按钮( TensorFlow GPU2Ascend工具)。进入“TensorFlow GPU2Ascend”参数配置页,配置command fileCommand File:tfplugin插件包中的工具脚本文件。Input Path:待转换脚本文件的路径。Output 
Path:脚本转换后的输出路径。根据tfplugin文件所在路径选择/Ascend/tfplugin/5.1.RC2/python/site-packages/npu_bridge/convert_tf2npu/main.py,如下图所示同样的,选择下载的代码路径作为input path,并选择输出路径,如下图所示:点击Transplant进行转换,如下图所示:出现“Transplant success!”的回显信息,即转换成功。如下图所示:4.2 生成dump数据和计算图文件步骤一 dump前准备。编辑resnet_run_loop.py文件,修改如下(以下行数仅为示例,请以实际为准):注释掉第83、85行把max_steps设置为1。注释掉第575~582行注释掉第595行,修改为“return None”。编辑cifar10_main.py文件,将train_epochs的值改为1。步骤二 dump参数配置。为了让训练脚本能够dump出计算图,我们在训练脚本中的包引用区域引入os,并在构建模型前设置DUMP_GE_GRAPH参数。配置完成后,在训练过程中,计算图文件会保存在训练脚本所在目录中。编辑cifar10_main.py,添加如下方框中的信息。修改训练脚本(resnet_run_loop.py),开启dump功能。在相应代码中,增加如下方框中的信息。步骤三 环境配置。单击MindStudio菜单栏“Run > Edit Configurations...”。进入运行配置界面,选择迁移后的训练脚本。配置环境变量,打开下图所示界面,配置训练进程启动依赖的环境变量,参数设置完成后,单击“OK”,环境变量配置说明请参见下表。环境变量的解释如下表所示:User environment variablesJOB_ID训练任务ID,用户自定义,仅支持大小写字母,数字,中划线,下划线。不建议使用以0开始的纯数字。ASCEND_DEVICE_ID指定昇腾AI处理器的逻辑ID,单P训练也可不配置,默认为0,在0卡执行训练。RANK_ID指定训练进程在集合通信进程组中对应的rank标识序号,单P训练固定配置为0。RANK_SIZE指定当前训练进程对应的Device在本集群大小,单P训练固定配置为1。RANK_TABLE_FILE如果用户原始训练脚本中使用了hvd接口或tf.data.Dataset对象的shard接口,需要配置,否则无需配置。由于ResNet50原始训练脚本中使用了tf.data.Dataset对象的shard接口,因此需要配置,请指定训练前准备中准备好的配置文件。PYTHONPATH请在此配置项末尾追加迁移后的模型所在路径步骤四 执行训练生成dump数据。点击按钮开始训练训练时控制台输出如下所示:resnet目录下生成的数据文件展示如下:在所有以“_Build.txt”为结尾的dump图文件中,查找“Iterator”这个关键词。记住查找出的计算图文件名称,用于后续精度比对。如上图所示,“ge_proto_00000343_Build.txt”文件即是我们需要找到的计算图文件。将此文件拷贝至用户家目录下,便于在执行比对操作时选择。打开上面找到的计算图文件,记录下第一个graph中的name字段值。如下示例中,记录下“ge_default_20220926160231_NPU_61”。进入以时间戳命名的dump文件存放路径下,找到刚记录的名称为name值的文件夹,例如ge_default_20220926160231_NPU_61,则下图目录下的文件即为需要的dump数据文件:五 比对操作在MindStudio菜单栏选择“Ascend > Model Accuracy Analyzer > New Task”,进入精度比对参数配置界面。配置tookit path,点击文件标识,如下图所示:选择对应的版本,如5.1.RC2版本,单击ok:单击next进入参数配置页面:接着填写gpu和npu的数据的相关信息,如下图所示:参数解释如下所示:Output Path比对数据结果存放路径Analysis Mode精度比对分析模式,本样例选择“NPU vs GPU/CPU”Framework本样例选择“TensorFlow”NPU DumpNPU上运行生成的dump文件目录Model File TensorFlow训练场景选择计算图文件(*.txt),Ground Truth原始模型的npy文件目录点击start:结果展示:如上图所示将Vector比对结果界面分为四个区域分别进行介绍。区域区域名称说明1菜单栏从左到右分别为Open…、New 
Project、Refresh、Help四项功能。Open…为打开并展示比对结果csv文件;New Project为创建新的比对任务;Refresh用于读取并刷新File Manager中管理的文件;单击Help弹出小窗,可展示精度比对工具的使用限制(Restrictions)、使用建议、在线教程链接等。
2 File Manager(历史数据管理):显示用户指定文件夹以及文件夹下生成的整网比对csv文件,以及通过Open…单独打开的csv文件;对文件夹和csv文件提供历史数据管理功能,包括打开、删除、另存为;可在文件夹处右键删除;在空白处右键可创建新比对任务(New Task)、刷新(Refresh)和Open…(打开并展示比对结果csv文件)。
3 Model Accuracy Analysis(精度比对分析界面):默认仅显示有结果的算子。可单击列名进行排序;单击Show Invalid Data,可展示无法比对的数据。各列字段含义请参见下方表1。
4 Scatter Diagram(各项算法指标的散点分布图):横坐标表示算子的执行顺序,纵坐标为算法指标在对应Tensor上的实际取值。各字段含义请参见下方表2。

表1 精度比对分析界面字段说明:
Index:网络模型中算子的ID。
OpSequence:算子运行的序列,即全网层信息文件中算子的ID,仅配置“Operator Range”时展示。
NPUDump:表示My Output模型的算子名。光标悬浮时,可显示具体算子所在的文件路径。
DataType:表示My Output侧数据算子的数据类型。
Address:dump tensor的虚拟内存地址,用于判断算子的内存问题。仅基于昇腾AI处理器运行生成的dump数据文件在整网比对时可提取该数据。
GroundTruth:表示Ground Truth模型的算子名。光标悬浮时,可显示具体算子所在的文件路径。
DataType:表示Ground Truth侧数据算子的数据类型。
TensorIndex:表示My Output模型算子的input ID和output ID。

表2 散点分布图字段说明:
Algorithm:选择展示对应比对算法结果的散点分布图,不支持展示StandardDeviation、KullbackLeiblerDivergence和AccumulatedRelativeError。
Tensor:过滤显示Input、Output结果散点分布图。
Highlight:对算子Tensor散点进行高亮。通过拖拉游标在对应算法指标的[min, max]间滑动来设置算法指标(纵坐标)的阈值,高于或等于阈值的点显示为蓝色,低于阈值的点显示为红色。如针对余弦相似度,图中设置阈值为0.98,小于0.98的算子Tensor被标记为红色。

六、常见问题 & 解决方案汇总

Q:tfdbg复制pt命令时执行出错
A:由于tfdbg将多行的pt命令识别为了单个命令,使得命令执行失败。解决办法如下:
先退出tfdbg命令行;
安装pexpect库,命令为 pip install pexpect --user(--user仅普通用户需要,root用户无需添加);
进入resnet所在的目录:cd ~/models/official/resnet;
确保目录下有dump.txt文件,即生成的pt命令;
编写下述代码(vim auto_run.py):

import pexpect
import sys

cmd_line = 'python3 -u ./cifar10_main.py'
tfdbg = pexpect.spawn(cmd_line)
tfdbg.logfile = sys.stdout.buffer
tfdbg.expect('tfdbg>')
tfdbg.sendline('run')

pt_list = []
with open('dump.txt', 'r') as f:
    for line in f:
        pt_list.append(line.strip('\n'))

for pt in pt_list:
    tfdbg.expect('tfdbg>')
    tfdbg.sendline(pt)

tfdbg.expect('tfdbg>')
tfdbg.sendline('exit')

保存退出vim,执行python auto_run.py。

七、从昇腾官方体验更多内容
更多的疑问和信息可以在昇腾论坛进行讨论和交流:https://bbs.huaweicloud.com/forum/forum-726-1.html
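为便于理解上文1.2节列出的余弦相似度、最大绝对误差等Vector比对算法维度,下面给出一个最小化的Python示意(假设性实现,仅用于说明各指标的含义,并非精度比对工具的实际源码;函数名compare_tensors为自拟):

```python
import numpy as np

def compare_tensors(my_output, ground_truth, eps=1e-12):
    """对两个dump出的tensor按1.2节中的部分算法维度进行比对(示意)。"""
    # 展平成一维向量后逐项比较
    a = np.asarray(my_output, dtype=np.float64).ravel()
    b = np.asarray(ground_truth, dtype=np.float64).ravel()
    diff = np.abs(a - b)
    return {
        # 余弦相似度:越接近1表示两个tensor方向越一致
        "CosineSimilarity": float(np.dot(a, b) /
                                  (np.linalg.norm(a) * np.linalg.norm(b) + eps)),
        "MaxAbsoluteError": float(diff.max()),    # 最大绝对误差
        "MeanAbsoluteError": float(diff.mean()),  # 平均绝对误差
        "RootMeanSquareError": float(np.sqrt(np.mean((a - b) ** 2))),  # 均方根误差
        "MaxRelativeError": float((diff / (np.abs(b) + eps)).max()),   # 最大相对误差
    }
```

实际比对时,可分别加载GPU侧保存的npy文件与NPU侧dump出的对应数据,再逐算子调用类似的函数观察各项指标。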
  • [经验分享] 使用MindStudio进行bert-large推理 在CoNll2003上实体识别NER任务
一、MindStudio介绍

MindStudio提供了AI开发所需的一站式开发环境,支持模型开发、算子开发以及应用开发三个主流程中的开发任务。依靠模型可视化、算力测试、IDE本地仿真调试等功能,MindStudio能够实现在一个工具上高效便捷地完成AI应用开发。对推理任务而言,MindStudio提供了模型压缩工具、模型转换工具和模型可视化工具。模型转换工具将开源框架的网络模型(如Caffe、TensorFlow等)转换成昇腾AI处理器支持的离线模型,模型转换过程中可以实现算子调度的优化、权值数据重排、内存使用优化等。

二、概述

bert-large-NER是一个经过微调的BERT模型,可用于命名实体识别任务(NER),并为NER任务实现一流的性能。它可以识别四种类型的实体:位置(LOC)、组织(ORG)、人员(PER)和其他(MISC)。具体而言,此模型是一个bert-large-cased模型,在标准CoNLL-2003命名实体识别(https://www.aclweb.org/anthology/W03-0419.pdf)数据集的英文版上进行了微调。如果要在同一数据集上使用较小的BERT模型进行微调,也可以使用基于NER的BERT(https://huggingface.co/dslim/bert-base-NER/)版本。本文介绍了如何使用MindStudio将hugging face上开源的bert_large_NER模型部署到Ascend平台上,并进行数据预处理、推理脚本的开发,在CoNLL-2003命名实体识别数据集上完成推理任务。

三、推理环境准备

3.1 Linux端环境准备

1. 配置conda环境、安装依赖包。依赖名称及版本如下:
ONNX 1.9.0
onnxruntime 1.12.1
Pytorch 1.8.0
TorchVision 0.9.0
numpy 1.20.3
transformers 4.21.1
tensorflow 2.9.1
创建conda环境,并安装对应版本的项目依赖包。

2. 配置环境变量:
source /usr/local/Ascend/ascend-toolkit/set_env.sh  #root用户下
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/Ascend/driver/lib64/driver/
npu-smi info  #查看npu信息

3.2 windows端环境准备

按照MindStudio用户手册中的安装指南,本地安装依赖:Python(版本要求:3.7~3.9)、MinGW、CMake,然后安装MindStudio。

四、创建工程

4.1 创建工程

添加远程服务器,选择远程服务器中正确的CANN版本。等待本地同步远端服务器CANN文件,同步完成后,选择ACL Project(Python)项目,点击完成,完成新建项目。配置项目结构:选择add python SDK,选择ssh interpreter,并配置正确的python解释器,再配置本地文件夹和远程服务器映射。

4.2 配置本地、远端环境同步

工程目录为:
├── bert_large_NER
    └── bert-large-NER           //hugging face提供的词典等
    └── bert-large-OUT           //推理结果输出路径
    └── bert_bin                 //生成的推理用数据保存路径
    └── conll2003                //CoNLL-2003数据集
    └── bert_metric.py           //精度测试脚本
    └── bert_onnx_inference.py   //使用onnx模型推理脚本
    └── bin_create.py            //生成om模型推理用数据脚本
    └── npy_dataset_generate.py  //生成onnx模型推理用数据脚本

五、执行推理

5.1 数据预处理

获取原始数据集CoNLL-2003。数据集下载链接:https://data.deepai.org/conll2003.zip
数据集目录为:
├── conll2003
    └── valid.txt    //验证集
    └── train.txt    //训练集
    └── test.txt     //测试集

该数据集为从路透社的新闻文章中摘取的句子,并为这些单词标记人名、地名和组织名称。以测试集为例,数据集的主要形式如下:

词 词性 词块 实体
U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O

在NER任务中,只关心第1、4列,其中第一列代表单词,最后一列代表实体对应的类别。实体类别NAME_ENTITY记录为以下九类:
B-PER/I-PER表示单词对应于个人实体的开头/内部。
B-ORG/I-ORG表示单词对应于组织实体的开头/内部。
B-LOC/I-LOC表示单词对应于位置实体的开头/内部。
B-MISC/I-MISC表示单词对应于其他实体的开头/内部。
O表示单词不属于以上四类实体。

数据预处理:将原始数据集转换为模型输入的二进制数据。

数据预处理脚本开发:模型有三个输入:input_ids、attention_mask、token_type_ids。input_ids表示将输入的单词经过bert_large_NER模型生成embedding,在这个过程中,设置sequence长度为512,padding为True,实现将input_ids补全为长度为512的向量。同时,在每一条句子对应的512个位置中,属于句子实际长度的部分将其对应的attention_mask设置为1,padding的部分将对应的attention_mask设置为0。在一些任务中,存在模型的一条输入超过一句话的情况,此时借助token_type_ids来区分不同的句子;但在NER任务中,CoNLL-2003这个语料库里每个语料只有一句话,因此token_type_ids全都是0。

数据预处理的代码实现如下(bin_create.py、npy_dataset_generate.py):
首先,定义INPUT_KEYS和NAME_ENTITY两个列表,分别记录输入和实体名称;
设置生成数据的文件结构,并创建三个输入token对应的文件夹;
加载bert_large_NER模型中定义的tokenizer;
使用tokenizer中的convert_tokens_to_ids方法,将英语单词根据对应的词汇表转换成embedding。手动将每个句子的长度填充到512,并根据句子长度填写attention_mask的值。此外,处理每个单词时,记录其对应的实体类别,并将其记录在annofile中,便于后续精度的计算。
对于om模型而言,要求的输入是.bin格式的,需将每条数据对应的三条输入分别存入三个文件夹;对于onnx模型而言,要求的输入是.npy格式的,需将每条数据对应的三条输入存入三个.npy文件。因此,在脚本开发中文件保存时要注意格式要求。

执行bin_create.py脚本生成om模型需要的推理数据。运行成功后生成input_ids、attention_mask、token_type_ids三个文件夹,保存在./bert_bin/bert_bin_2022xxxx-xxxxxx/文件夹下,文件夹中存的数据格式为.bin,作为om模型的输入;.anno文件记录token对应的label,保存在./bert_bin文件夹下。

执行npy_dataset_generate.py脚本生成onnx模型需要的推理数据。运行成功后生成input_ids.npy、attention_mask.npy、token_type_ids.npy三个npy文件,保存在./bert_bin/bert_npy_2022xxxx-xxxxxx/文件夹下;.anno文件记录token对应的label,保存在./bert_bin/文件夹下。

5.2 模型转换

5.2.1 借助transformers[onnx]工具将模型转换成onnx

pip install transformers[onnx]

使用transformers.onnx进行模型转换:
python -m transformers.onnx --model=bert-large-NER --feature=token-classification onnx/

■参数说明:
--model:hugging face上下载的开源模型。
--feature:用于导出模型的特征类型。
onnx/:保存导出的onnx模型的路径。

运行结束后生成model.onnx,保存在./onnx文件夹下。

5.2.2 onnx转换成om

使用ATC命令将onnx模型转换为om模型:
atc --framework=5 --model=model.onnx --output=bert_large_bs16_seq512 --input_shape="input_ids:16,512;attention_mask:16,512;token_type_ids:16,512" --soc_version=Ascend310P3

■参数说明:
--model:为ONNX模型文件。
--framework:5代表ONNX模型。
--output:输出的OM模型。
--input_shape:输入数据的shape。输入数据有三条,均为batch*512,其中512为sequence序列长度。
--soc_version:处理器型号。

运行成功后生成bert_large_bs16_seq512.om模型文件。可对om模型进行可视化,观察模型的输入输出。

5.3 执行离线推理

转换om模型成功后,使用MindStudio remote终端执行ais_infer推理。启动ssh session,切换conda环境,切换工作目录,使用ais_infer工具进行推理:

a. 下载推理工具ais_infer:
git clone https://gitee.com/ascend/tools.git

b. 编译、安装推理工具:
cd /home/lcy/RotaE/tools/ais-bench_workload/tool/ais_infer/backend/
pip3.7 wheel ./  #编译,需根据自己的python版本选择pip命令
ls
pip install aclruntime-0.0.1-cp37-cp37m-linux_x86_64.whl

精度测试(以batchsize=16为例):
python ./tools/ais-bench_workload/tool/ais_infer/ais_infer.py --model ./bert_large_bs16_seq512_1.om --input "./bert_bin/bert_bin_20220928-061343/input_ids,./bert_bin/bert_bin_20220928-061343/attention_mask,./bert_bin/bert_bin_20220928-061343/token_type_ids" --output ./bert-large-OUT/bs16 --outfmt NPY

■参数说明:
--model:用于推理的om模型文件。
--input:模型的输入,即input_ids、attention_mask、token_type_ids三个文件夹。
--output:输出指定在./bert-large-OUT/bs16下。
--outfmt:推理结果保存格式。

执行结束后,输出保存在./bert-large-OUT/bs16下。

5.4 精度验证

推理成功后,需要通过bert_metric.py对推理结果进行后处理,验证推理结果,进行精度评估。精度验证脚本开发:首先获取./bert-large-OUT/bs16目录下的推理结果文件,再根据“预测正确的条数/总数量”得到预测的准确率acc。在MindStudio运行bert_metric.py脚本进行精度验证,运行成功后输出模型预测结果的精度为90.73%,接近hugging face中在测试集上的精度结果91.2%。

六、性能调优

使用aoe工具进行自动性能调优。“No performance improvement”表明:自动性能调优未带来模型推理性能的提升。

Q&A

由于bert_large_NER模型转换得到的onnx模型较大,且三个输入的形状均为动态的[batch, sequence],因此在使用MindStudio进行onnx模型的可视化以及onnx模型向om模型转换时出现报错,故在模型转换时直接使用ATC工具完成。

在数据预处理过程中,transformers库提供的AutoTokenizer.tokenizer方法生成的embedding存在两个问题:①对未见过的单词自动进行拆分,导致生成的input_ids与原句子相比常常会变长,此时annofile中记录的每个单词对应的实体类别就会失效;②在句子的起始处和结尾处自动增加[CLS]、[SEP]作为起始符和终止符,这在更加强调整句话语义的NLP任务中是至关重要的,但在关注每个单词对应的实体类别的NER任务中是不重要的。在测试过程中,也推测出作者在进行模型训练时未增加起始、终止符。因此,选择借助AutoTokenizer的convert_tokens_to_ids方法,先手动地对照词汇表将英语单词编码,对于词汇表中没有的单词会将其编码成100;之后再根据句子长度和sequence长度(512)对编码后的input_ids进行padding,完成input_ids、attention_mask、token_type_ids的生成和annofile中单词label的对应。bert_large_NER的vocab.txt如下所示:由tokenizer方法生成的数据如下所示,101表示[CLS],102表示[SEP]。由convert_tokens_to_ids生成的数据如下所示,对词汇表中未出现过的单词会将其编码为100。

若读者在使用MindStudio过程中或推理过程中遇到问题,可在MindStudio昇腾论坛进行提问、讨论。
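上文5.1节描述的截断、padding与attention_mask构造逻辑,可以用下面的Python片段示意(假设性示意代码,并非原项目脚本;假定单词已通过convert_tokens_to_ids编码为token id列表,函数名build_inputs为自拟):

```python
def build_inputs(token_ids, seq_len=512, pad_id=0):
    """把一条已编码的句子整理成模型的三个输入(示意)。"""
    # 截断到最大序列长度
    ids = list(token_ids)[:seq_len]
    # 句子实际长度部分attention_mask为1,padding部分为0
    attention_mask = [1] * len(ids) + [0] * (seq_len - len(ids))
    # 将input_ids补全为长度seq_len的向量
    input_ids = ids + [pad_id] * (seq_len - len(ids))
    # CoNLL-2003每条语料只有一句话,因此token_type_ids全为0
    token_type_ids = [0] * seq_len
    return input_ids, attention_mask, token_type_ids
```

实际脚本在此基础上还需按om/onnx的要求分别落盘为.bin或.npy文件,并把每个单词的实体类别写入annofile。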
  • [经验分享] 基于MindX SDK的ChineseOCR文字识别模型开发全流程
    b站视频案例链接:视频案例基于MindX SDK的ChineseOCR文字识别教程目录基于MindX SDK的ChineseOCR文字识别教程1. 任务介绍 1.1. 任务场景 1.2. 任务描述 1.3. 任务目标 1.4. 环境信息 2. 模型介绍 3. MindStudio介绍 4. 开发前准备 4.1. 环境准备 4.2. 安装MindStudio 5. 开发过程 5.1. 工程创建 5.2. 工程结构介绍 5.3. om模型文件准备 5.4. 官方模型转onnx模型 5.5. Pipeline流程编排 5.6. 主程序开发 5.6.1   代码逻辑 5.6.2   主程序实现 5.7. 数据集准备 5.8. 运行 5.9. FAQ 5.10. 推广 1 任务介绍1.1 任务场景MindX SDK应用开发1.2 任务描述本开发样例使用MindX SDK,演示中文字体识别ChineseOCR,供用户参考。 本系统基于昇腾Atlas310卡。主要为单行中文识别系统,系统将图像进行适当的仿射变化,然后送入字符识别系统中进行识别后将识别结果输出。1.3 任务目标在Ascend 310上能使模型成功识别手写文字图片1.4 环境信息开发环境:Windows 10 + MindStudio 5.0.RC2昇腾设备:Atlas 200DK昇腾芯片:Ascend 310服务器环境依赖软件和版本如下表:软件名称版本mxVision3.0.RC2Python3.9.12CANN5.1.RC1本地环境依赖软件和版本如下表:软件名称版本Python3.7.13Docker1.5-2Python第三方库依赖如下表:软件名称版本protobuf3.19.02 模型介绍ChineseOCR是一个主要识别中文字符的系统。系统可以实现将字符检测结果中的文字进行识别。本方案选择使用PaddleOCR作为字符识别模型。我们也提供了已经转换好的模型以及一些测试数据集的OBS地址:cid:link_03 MindStudio介绍MindStudio是一套基于华为自研昇腾AI处理器开发的AI全栈开发工具平台,集成了工程管理、编译器、仿真器以及命令行开发工具包,提供网络模型移植、应用开发、推理运行及自定义算子开发等功能。通过MindStudio能够进行工程管理、编译、调试、运行、性能分析等全流程开发,支持仿真环境及真实芯片运行,提高开发效率。通过MindStudio,众智团队可以基本脱离终端命令行模式,搭配昇腾AI硬件环境(实体服务器或远端环境)体验AI开发的所有功能。并通过MindStudio后端主导的独有的负载建模和专家系统,以及可视化的数据分析来更高效的完成调优等过程。4 开发前准备4.1 MindStudio环境搭建首先安装好CANN和MindX SDK,具体可参考如下链接:CANN安装指导: cid:link_2MindX SDK安装指导:cid:link_4然后开始设置环境变量,在CANN的安装目录和MindX SDK的安装目录可以分别找到set_env.sh,它们包含了MindX SDK App所需的大部分环境变量。我们可以打开它们查看内容并且运行脚本,也可以将它们加入~/.bashrc,以便每次进入bash时不用重新手动运行。编辑bashrc,vi ~/.bashrc,在bashrc中应用这两个脚本,然后重启bash。vi ~/.bashrc# 在bashrc中加入以下两行并保存source ${SDK安装路径}/set_env.shsource ${CANN安装路径}/set_env.sh# 保存后重启bashbashMindStudio的主要安装依赖项有CANN,若需开发MindX SDK应用,还需MindX SDK的支持。我们点开setting设置,点击install SDK进行CANN和SDK设置如果我们在Windows环境下开发。基于MindStudio的SDK应用开发环境搭建可以参考: cid:link_34.2 onnx模型文件准备步骤一:首先用户需下载大于等于1.8.0以上版本的paddle包和1.7.0以上版本的onnx,用户可以通过以下两种方式进行安装安装方式一:pip install paddle2onnx==0.3.1 [--user]安装方式二:git clone https://github.com/PaddlePaddle/paddle2onnx.gitpython setup.py install步骤二:进入下载的目录/models/paddleocr/执行以下命令paddle2onnx --model_dir ./ch_ppocr_server_v2.0_rec_infer/ 
--model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./ch_ppocr_server_v2.0_rec_infer.onnx --opset_version 11 --enable_onnx_checker True  如果执行成功则会生成转化成功的onnx模型,如果出现E16005错误E16005: The model has [2] [--domain_version] fields, but only one is allowed.则调用keep_default_domain这个接口修改onnx解决,参考链接网址为cid:link_75 开发过程5.1 工程创建下载MindStudio压缩包解压打开后进入MindStudio的安装目录,选择bin目录下的MindStudio64.exe,打开MindStudio。点击Ascend App,选择项目路径,然后点击下一步选择昇腾应用工程类型。选择如图所示,选择Python框架的MindX SDK应用工程,点击Finish完成创建。5.2 工程结构介绍本工程结构包含如下文件:│  README.md│  main.py                  # 主程序│  chineseocr.pipeline    # pipeline文件├─dataset               #输入图片├─output                #输出图片5.3 om模型文件准备MindX SDK支持的模型格式是om模型,因此使用之前须进行模型转换。首先需要下载官方的paddle模型文件,再使用官方的转化工具将模型转化为onnx模型。若下载我们提供的已经转化完成的om模型,则可以跳跃至5.4阅读。首先在PaddleOCR下载官方的的pdparams模型文件。5.4 onnx模型转om模型将onnx文件上传到CANN所在服务器后,打开MindStudio,在顶部菜单栏中选择“Ascend>Model Converter”,打开图形化模型转换工具。在Model File中选中上传至服务器的onnx模型。Model Name一栏可自行更改,其为输出的om模型名。Target Soc Version选中目标平台,这里我们选择Ascend310。Output Path为输出路径,模型转换工具会将转换后的一些文件拷贝至该位置。更改Input Format和Input Nodes,一般情况下,选择好PB文件后,该栏会自动和模型匹配,若是因某些原因没有自动匹配,需自行选定。旧版本的MindX SDK的已有插件中的推理插件mxpi_tensorinfer暂不支持动态分辨率模型,因此即使原模型支持动态分辨率,也需要在此步将input可能的分辨率固定下来。若需要动态分辨率支持,可以使用mxVision 3.0.RC2及以上的版本,经实验确定推理插件mxpi_tensorinfer已经支持动态分辨率模型。Input Nodes中的Shape为-1的一项表明该维度是动态的,因此可以根据需要将N取值为-1以实现动态Batch,或者将H,W取值 -1,以实现动态分辨率。但注意,动态分辨率和动态Batch是不能同时应用的。最后点击Output Nodes下方的Select,会出现可视化的模型图,找到模型的最终输出节点,然后点击OK确认。点击Next,下一步是一些数据预处理,比如转换颜色空间、裁切输入以适配模型等,可按需选择。本项目并不需要进行预处理,因此继续进行下一步。本模型不需要设置其他参数,因此直接finish结束模型转换,转换完成。5.5 Pipeline流程编排MindX SDK实现功能的最小粒度是插件,每一个插件实现特定的功能,如图片解码、图片缩放等。将这些插件按照合理的顺序编排,实现相应的功能。我们将这个配置文件叫做pipeline,以JSON格式编写,用户必须指定业务流名称、元件名称和插件名称,并根据需要,补充元件属性和下游元件名称信息。我们在MindStudio中可以进行可视化流程编排。在顶部菜单栏中选择“Ascend>MindX SDK Pipeline”,打开空白的pipeline绘制界面,可以在左方插件库中选中所需的插件,并进行插入插件、修改参数等操作。点击MindX SDK 
Pipeline新建pipeline本模型使用的插件和工作流程如下表所示序号子系统功能描述1图片输入调用appsrc接口输入图片2图像解码调用mxpi-imagedecoder接口对图像解码3模型推理调用mxpi-tensorinfer接口对图像进行推理4模型后处理将图像推理后结果进行最后的处理输出结果5数据序列化用mxpi_dataserialize插件对数据进行序列化输出结果本项目Pipeline的文本格式如下:{  "chineseocr": {    "stream_config": {      "deviceId": "0"    },    "appsrc0": {      "props": {        "blocksize": "4096000"      },      "factory": "appsrc",      "next": "mxpi_imagedecoder0"    },    "mxpi_imagedecoder0": {      "props": {        "dataSource": "appsrc0",        "deviceId": "0",        "cvProcessor": "opencv",        "dataType": "uint8",        "outputDataFormat": "RGB"      },      "factory": "mxpi_imagedecoder",      "next": "mxpi_imageresize0"    },    "mxpi_imageresize0": {      "props": {        "dataSource": "mxpi_imagedecoder0",        "resizeType": "Resizer_Stretch",        "cvProcessor": "opencv",        "resizeHeight": "32",        "resizeWidth": "320"      },      "factory": "mxpi_imageresize",      "next": "crnn_recognition"    },    "crnn_recognition": {      "props": {        "dataSource": "mxpi_imageresize0",        "modelPath": "./model/ch_ppocr_server_v2.0_rec_infer_bs1.om"      },      "factory": "mxpi_tensorinfer",      "next": "mxpi_textgenerationpostprocessor0"    },    "mxpi_textgenerationpostprocessor0": {      "props": {        "dataSource": "crnn_recognition",        "postProcessConfigPath": "./cfg/crnn.cfg",        "labelPath": "./cfg/ppocr_keys_v1.txt",        "postProcessLibPath": "./cfg/libcrnnpostprocess.so"      },      "factory": "mxpi_textgenerationpostprocessor",      "next": "mxpi_dataserialize0"    },    "mxpi_dataserialize0": {      "props": {        "outputDataKeys": "mxpi_textgenerationpostprocessor0",        "deviceId": "0"      },      "factory": "mxpi_dataserialize",      "next": "appsink0"    },    "appsink0": {      "props": {        "blocksize": "4096000"      },      "factory": "appsink"    }  }}next和dataSouce制定了各个元件之间的连接关系,om模型地址需要放在推理插件里面,推理插件输出结果不一定可以可视化,所以需要后处理元件对推理插件进行处理输出。5.6 
主程序开发

5.6.1 代码逻辑

接下来就是应用主程序的编写。本项目主程序的逻辑如下:
初始化流管理;
加载图像,对图像进行预处理以符合动态分辨率模型的档位;
向流发送图像数据,进行推理;
获取pipeline各插件输出结果,将结果写入文件;
销毁流。

5.6.2 主程序实现

下图是本程序需要的依赖库,其中StreamManagerApi是MindX SDK自带的。如果在Windows本地编辑代码,需要同步MindX SDK,将这些库文件下载到本地,才可以有代码补全提示等,并消去MindStudio对于没有找到对应库的提示。由于运行程序是在远端安装了CANN和MindX SDK的昇腾设备上进行的,因此这些错误提示可以忽略。另外,如果在昇腾设备上运行程序时报找不到OpenCV等第三方库的错误提示,可以使用pip或者conda安装;但如果报找不到StreamManagerApi等MindX SDK自带库的错误提示,此时要确认环境变量是否配置正确,${PYTHONPATH}这个环境变量用于在导入模块时搜索路径,配置正确会给程序指明MindX SDK自带模块的位置。配置环境变量请看第4节。

预先设置需要的全局变量以便后面使用。在程序开始前,应先检查pipeline文件是否存在且可以运行。设置好文件输出路径,使图片识别结果保存到txt文档中,以便后续与标签的比对;同时设置好输入路径的图片和标签。发送数据时需要将数据赋给dataInput,然后指定流名,指定输入插件的插件名称,调用SendData发送。推理结果在终端输出打印台同时显示。将打印结果写入文档,并与标签进行文字比对,输出识别结果的相似度。数据完成输出后,应当回收并销毁所创建的流。

其中发送数据和接收数据这套业务流数据对接接口共有4套,但有些接口可以不成套使用。比如本程序中的SendData是和GetResult配对的,但本程序使用的获取结果的接口是GetProtobuf,根据实际情况使用即可。详细的使用说明可以查看官方文档:cid:link_1。发送数据后的处理是对用户透明的,用户只需确认数据发送成功后,就可以尝试获取结果。在GetProtobuf这个接口中,我们需要指定流名称、对应的输入接口的编号,以及要获取结果的插件的插件名。

5.7 数据集准备

数据集为cid:link_8官方提供的OCR手写数据集,直接下载到本地即可。

5.8 运行

在前文中,按照给出的教程链接,我们已经将本地Windows的MindStudio与远程服务器连接。准备好数据集后,修改main.py里的DATA_PATH为自己放置数据集的路径。接下来,点击顶部菜单栏的Tools>Deployment>Upload,将项目与远程服务器同步。当然,也可以勾选Automatic Upload,这会让MindStudio在文件更改后就上传到远程服务器。然后点击编辑运行配置,选中main.py为Executable文件,保存配置,点击运行。这是运行成功的控制台输出:等待运行成功后,MindStudio会自动同步远程项目,但若自动同步失败或者没有运行,可以点击菜单栏中的Tools>Deployment>Download,下载服务器里的项目,其中应当包含模型的输出。

测试图片输入(放置在dataset文件夹内);测试图片输出(控制台的打印输出)。因为本项目要测试识别文字的精度,因此需要额外将识别结果写入文件,保存后与标签文件进行相似度计算。

5.9 FAQ

输入图片大小与模型不匹配问题
问题描述:运行失败,错误提示:
E20220826 10:05:45.466817 19546 MxpiTensorInfer.cpp:750] [crnn recognition][1001][General Failed] The shape of concat inputTensors[0] does not match model inputTensors[0]
解决方案:在imagedecoder插件中设定解码方式参数为opencv、输出格式为RGB,然后在imageresize插件里同样设定处理方式为opencv。

5.10 推广

昇腾(Ascend)开发者论坛面向开发者提供AI计算平台,包含计算资源、运行框架以及相关配套工具等,这里有昇腾专家在线答疑,欢迎开发者来昇腾论坛学习和交流。链接地址:华为云论坛_云计算论坛_开发者论坛_技术论坛-华为云 (huaweicloud.com)
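上文提到将识别结果写入txt后与标签做文字比对并输出相似度,但原文未给出具体的相似度算法。下面是一个基于Python标准库difflib的假设性示意实现(函数名ocr_similarity为自拟,difflib的ratio只是一种可选的相似度度量):

```python
import difflib

def ocr_similarity(recognized: str, label: str) -> float:
    # 返回0~1之间的相似度,1表示识别结果与标签完全一致
    return difflib.SequenceMatcher(None, recognized, label).ratio()
```

实际项目中也可以换用编辑距离等其他指标,只要对所有测试图片统一使用同一种度量即可。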
  • [基础知识] 【合集】吴恩达来信(连载中)
【转载】吴恩达来信2022-08-11:衡量算法一致性,减少模型偏差 cid:link_0
【转载】吴恩达来信2022-08-18:人工智能领域求职的小tips cid:link_1
【转载】吴恩达来信2022-08-25:想尝试新工作?先进行信息性面试吧! cid:link_2
【转载】吴恩达来信2022-09-01:人工智能领域的求职小tips cid:link_3
【转载】吴恩达来信2022-09-08:推动科研成果及时、免费发布 cid:link_4
【转载】吴恩达来信2022-09-15:为艺术创造打开新大门的Stable Diffusion cid:link_5
【转载】吴恩达来信2022-09-22:为“智能记忆”多加练习吧! cid:link_6
【转载】吴恩达来信2022-10-09:考虑用户和数据的不确定性 cid:link_7
【转载】吴恩达来信2022-10-13:AI, GPU和芯片的未来 cid:link_8
【转载】吴恩达来信2022-10-20:Prompt engineering的现状及未来 cid:link_9
【转载】吴恩达来信2022-10-27:人类和鬼魂都在使用AI?! cid:link_10
  • [基础知识] 【转载】吴恩达来信2022-08-11:衡量算法一致性,减少模型偏差
    【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。Dear friends,Bias in AI is a serious problem. For example, if a judge who’s deciding how to sentence a defendant relies on an AI system that routinely estimates a higher risk that offenders of a particular race will reoffend, that’s a terrible thing. As we work to reduce bias in AI models, though, it’s also worth exploring a different issue: inconsistency. Specifically, let’s consider how inconsistent human decisions are, and how AI can reduce that inconsistency.If a human judge, given two defendants who committed the same crime under identical circumstances, sentences one to three years in prison and the other to 30 days, we would consider this inconsistency blatantly unfair. Yet, as Daniel Kahneman and his co-authors document in their book, Noise: A Flaw in Human Judgment, human decision-making is extremely inconsistent (or noisy).One study found that judges systematically sentenced defendants more harshly if the local football team had suffered an upset loss (which presumably made the judge cranky). Judges are only human, and if they’re swayed by football outcomes, imagine how many other irrelevant factors may influence their decisions!Many human decisions rest on complex criteria, and humans don’t always define their criteria before weighing them. For example:In medicine, I’ve seen individual doctors make highly inconsistent diagnoses given the same input. Working on a project with a doctor whom I’ll call Alice, we measured the “inter-Alice agreement score,” which was loosely a measure of how much her diagnoses differed between morning and afternoon. (For the record, Alice is a brilliant doctor and wonderful collaborator. 
This score measured the inherent ambiguity of the task more than it measured her competence.)

In manufacturing, I’ve seen skilled inspectors make very different decisions about whether or not parts with similar flaws were defective.

In online retailing, I’ve seen human annotators make inconsistent decisions about how to tag or categorize products. (Should a fun gadget go under electronics or entertainment?)

In contrast, given the same input, a trained neural network will produce the same output every time. Given similar inputs, a trained model will also typically output similar results. Automated software tends to be highly consistent. This is one of automation’s huge advantages: Algorithms make decisions much more consistently than humans. To my mind, they offer a way to give patients more consistent and fair treatment options, make manufacturing more efficient, make retail product catalogs less confusing to shoppers, and so on.

In conversations about whether and how to build an AI system, it’s important to address how to ensure that the system doesn’t have significant bias as well as how to benchmark its bias against human bias. If you’re trying to get an AI project approved, you may find it useful to raise the issue of consistency as well.
Measuring the consistency of your algorithm relative to humans who make the same decision can add weight to arguments in favor of investing in an automated system.Keep learning!Andrew亲爱的朋友们,人工智能中存在的偏见是一个严峻的问题。例如,一名法官需要依赖人工智能系统对被告做出判决,而该系统通常会做出特定种族再次犯罪风险更高的估计,这个情况很可怕。然而,当我们致力于减少人工智能模型中的偏差时,还有一个问题也同样值得思考:不一致性。具体来说,就是考虑一下人类决策的不一致性,以及人工智能如何减少这种不一致性。如果一名人类法官,考虑对两名在相同情况下犯下相同罪行的被告做出判决,一人判处了三年监禁,另一人则是30天监禁,我们就会认为这种不一致是非常不公平的。然而,正如 Daniel Kahneman 和其合著者在他们的著作《噪音:人类判断的缺陷》(Noise: A Flaw in Human Judgment)中所阐述的那样,人类的决策极其不一致(或有噪)。一项研究发现,如果当地足球队遭遇惨败(这可能会让法官暴躁),法官会相应对被告做出更加严厉的判罚。法官也是人,如果他们被足球比赛的结果所左右,想象一下还有多少其他不相关的因素会影响他们的决定!许多人类的决策都依赖于复杂的标准,但人类并不总会在衡量标准之前定义标准。例如:在医学领域,我见过个别医生在相同的输入下做出了高度不一致的诊断。在与一位名叫 Alice 的医生合作的项目中,我们测量了“inter Alice agreement score”,这是一个粗略的测量方法,用于测量早上和下午她做出诊断的差异。(自此声明,Alice 是一位杰出的医生和出色的合作者。这个分数衡量的是任务固有的模糊性,而不是她的能力。)在制造业,我见过熟练的检验员对有类似缺陷的零件是否存在缺陷做出了非常不同的决定。在网上零售业,我遇到过人类标注员在如何标记或分类产品方面做出不一致的决定。(一个有趣的小玩意儿应该归入电子类还是娱乐类?)相反,给定相同的输入,经过训练的神经网络每次都会产生相同的输出。如果输入相似,经过训练的模型通常也会输出相似的结果。自动化软件往往具有高度一致性。这是自动化的巨大优势之一:比起人类,算法可以更一致地做出决策。在我看来,它们提供了一种方式,可以让患者获得更一致、更公平的治疗选择,也可以提高生产效率,减少零售产品目录给购物者造成的困扰,等等。在关于是否以及如何构建人工智能系统的对话中,重点是要解决如何确保系统没有明显的偏见,以及如何将其偏见与人类偏见进行对比。如果你想让人工智能项目获得批准,你可能会发现提出一致性问题也很有用。相较于做出相同决定的人,衡量算法的一致性可以增加支持投资自动化系统的论据权重。请不断学习!吴恩达发布于 2022-08-11 19:44原帖作者:吴恩达原帖标题:吴恩达来信:衡量算法一致性,减少模型偏差原帖地址:cid:link_3
  • [基础知识] 【转载】吴恩达来信2022-08-18:人工智能领域求职的小tips
    【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。Dear friends,I’ve written about how to build a career in AI and focused on tips for learning technical skills, choosing projects, and sequencing projects over a career. This time, I’d like to talk about searching for a job.A job search has a few predictable steps including selecting companies to apply to, preparing for interviews, and finally picking a job and negotiating an offer. In this letter, I’d like to focus on a framework that’s useful for many job seekers in AI, especially those who are entering AI from a different field.If you’re considering your next job, ask yourself:Are you switching roles? For example, if you’re a software engineer, university student, or physicist who’s looking to become a machine learning engineer, that’s a role switch.Are you switching industries? For example, if you work for a healthcare company, financial services company, or a government agency and want to work for a software company, that’s a switch in industries.A product manager at a tech startup who becomes a data scientist at the same company (or a different one) has switched roles. A marketer at a manufacturing firm who becomes a marketer in a tech company has switched industries. An analyst in a financial services company who becomes a machine learning engineer in a tech company has switched both roles and industries.If you’re looking for your first job in AI, you’ll probably find switching either roles or industries easier than doing both at the same time. Let’s say you’re the analyst working in financial services:If you find a data science or machine learning job in financial services, you can continue to use your domain-specific knowledge while gaining knowledge and expertise in AI. 
After working in this role for a while, you’ll be better positioned to switch to a tech company (if that’s still your goal).Alternatively, if you become an analyst in a tech company, you can continue to use your skills as an analyst but apply them to a different industry. Being part of a tech company also makes it much easier to learn from colleagues about practical challenges of AI, key skills to be successful in AI, and so on.If you’re considering a role switch, a startup can be an easier place to do it than a big company. While there are exceptions, startups usually don’t have enough people to do all the desired work. If you’re able to help with AI tasks — even if it’s not your official job — your work is likely to be appreciated. This lays the groundwork for a possible role switch without needing to leave the company. In contrast, in a big company, a rigid reward system is more likely to reward you for doing your job well (and your manager for supporting you in doing the job for which you were hired), but it’s not as likely to reward contributions outside your job’s scope.After working for a while in your desired role and industry (for example, a machine learning engineer in a tech company), you’ll have a good sense of the requirements for that role in that industry at a more senior level. You’ll also have a network within that industry to help you along. So future job searches — if you choose to stick with the role and industry — likely will be easier.When changing jobs, you’re taking a step into the unknown, particularly if you’re switching either roles or industries. One of the most underused tools for becoming more familiar with a new role and/or industry is the informational interview. 
I’ll share more about that in the next letter.亲爱的朋友们,我曾撰写过关于如何在人工智能领域建立职业生涯的文章,重点介绍了学习技术技能、选择项目以及在职业生涯中安排项目顺序的技巧。这一次,我想谈谈找工作的问题。求职有几个可预测的步骤,包括选择要申请的公司、准备面试,最后选择一份工作并进行商谈。在今天这封来信中,我想重点介绍一个对许多人工智能求职者有用的框架,特别是那些从不同领域进入人工智能的求职者。如果你正在考虑换一份工作,请先问问自己:你需要变换角色吗?例如,如果你是一名软件工程师、大学生或是一名想成为机器学习工程师的物理学家,那么这就是一次角色转换。你需要变换行业吗?例如,如果你目前在为一家医疗保健公司、金融服务公司或政府机构工作,之后想为一家软件公司工作,这就是行业的转变。科技初创公司的一名产品经理变成了同一家公司(或另一家公司)的数据科学家,他就转换了角色。一家制造公司的营销人员成为一家科技公司的营销员,他就改变了行业。如果一位金融服务公司的分析师成为了一家科技公司的机器学习工程师,那么他既转换了角色也转换了行业。如果你正在寻找人工智能领域的第一份工作,你可能会发现单独转换角色或行业比同时进行这两件事更容易。假设你是金融服务业的分析师:如果你在金融服务业找到数据科学或机器学习相关的工作,你可以沿用你再特定领域的知识,同时获得人工智能方面的知识和专业技能。在这个职位上工作一段时间后,你将更适合转投科技公司(如果这仍然是你的目标的话)。或者,如果你成为一家科技公司的分析师,你可以继续使用你作为分析师的技能,但需要将其应用到不同的行业。作为科技公司的一员,你还可以更容易地向同事学习人工智能面临的实际挑战、在人工智能领域取得成功的关键技能等等。如果你正在考虑进行角色转换,创业公司可能比大企业更容易实现这一点。当然也有例外,但初创公司通常缺乏足够的人员来完成所有的预期工作。如果你能够帮助完成人工智能任务——即使这并非你的正式工作——你的付出可能会受到赞赏。在无需离开公司的前提下,这为可能的角色转换奠定了基础。相比之下,大企业严格的奖励制度更有可能奖励你出色完成了本职工作(以及支持你完成工作的经理),但不太会奖励你在工作范围之外做出的贡献。在你理想的职位和行业(例如,科技公司的机器学习工程师)工作一段时间后,你会对该行业更高层次的职位要求有更好的了解。你还将在该行业中拥有一个关系网络来帮助你继续发展。因此,如果你选择坚守岗位和行业,未来的求职可能会更容易。更换工作是你向未知迈出的一步,特别是当你换角色或换行业时。为了更熟悉新角色和/或行业,信息性面试是最未被充分利用的工具之一。我将在下一封信中更多地分享这一点。请不断学习!吴恩达发布于 2022-08-18 16:31原帖作者:吴恩达原帖标题:吴恩达来信:人工智能领域求职的小tips原帖地址:cid:link_4
  • [基础知识] 【转载】吴恩达来信2022-08-25:想尝试新工作?先进行信息性面试吧!
【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。

Dear friends,

Last week, I wrote about switching roles, industries, or both as a framework for considering a job search. If you’re preparing to switch roles (say, taking a job as a machine learning engineer for the first time) or industries (say, working in an AI tech company for the first time), there’s a lot about your target job that you probably don’t know. A technique known as informational interviewing is a great way to learn.

An informational interview involves finding someone in a company or role you’d like to know more about and informally interviewing them about their work. Such conversations are separate from searching for a job. In fact, it’s helpful to interview people who hold positions that align with your interests well before you’re ready to kick off a job search.

Informational interviews are particularly relevant to AI. Because the field is evolving, many companies use job titles in inconsistent ways. In one company, data scientists might be expected mainly to analyze business data and present conclusions on a slide deck. In another, they might write and maintain production code. An informational interview can help you sort out what the AI people in a particular company actually do.

With the rapid expansion of opportunities in AI, many people will be taking on an AI job for the first time. In this case, an informational interview can be invaluable for learning what happens and what skills are needed to do the job well. For example, you can learn what algorithms, deployment processes, and software stacks a particular company uses.
You may be surprised — if you’re not already familiar with the data-centric AI movement — to learn how much time most machine learning engineers spend iteratively cleaning datasets.Prepare for informational interviews by researching the interviewee and company in advance, so you can arrive with thoughtful questions. You might ask:What do you do in a typical week or day?What are the most important tasks in this role?What skills are most important for success?How does your team work together to accomplish its goals?What is the hiring process?Considering candidates who stood out in the past, what enabled them to shine?Finding someone to interview isn’t always easy, but many people who are in senior positions today received help when they were new from those who had entered the field ahead of them, and many are eager to pay it forward. If you can reach out to someone who’s already in your network — perhaps a friend who made the transition ahead of you or someone who attended the same school as you — that’s great! Meetups such as Pie & AI can also help you build your network.Finally, be polite and professional, and thank the people you’ve interviewed. And when you get a chance, please pay it forward as well and help someone coming up after you. If you receive a request for an informational interview from someone in the DeepLearning.AI community, I hope you’ll lean in to help them take a step up! If you’re interested in learning more about informational interviews, I recommend this articlefrom the UC Berkeley Career Center.I’ve mentioned a few times the importance of your network and community. People you’ve met, beyond providing valuable information, can play an invaluable role by referring you to potential employers. 
Stay tuned for more on this topic.Keep learning!Andrew亲爱的朋友们,我在上周来信中阐述了关于转换角色、行业或两者皆变的内容,可作为考虑求职的框架。如果你正准备转换角色(比如第一次成为机器学习工程师)或行业(比如第一次在人工智能技术公司工作),你的目标工作有很多你可能不知道的地方。一种被称为信息性面试的技术是一种很好的学习方法。信息性面试指的是在感兴趣的公司或职位中找到你想了解更多的人,并就他们的工作进行非正式面试。这种交谈与求职是分开的。事实上,在你准备开始找工作之前,与那些职位与你的兴趣相符的人交谈是很有帮助的。信息性面试在人工智能行业尤其可以提供帮助。由于该领域在不断发展,许多公司定义职位的方式并不一致。在一家公司,数据科学家可能主要负责分析业务数据,并用幻灯片呈现结论。而在另一家公司,他们可能需要编写和维护生产代码。信息性面试可以帮助你了解特定公司的人工智能人员的实际工作。随着人工智能机会的迅速增多,许多人将有机会首次从事人工智能工作。在这种情况下,信息性面试对于了解发生了什么以及做好工作需要什么技能是非常宝贵的。例如,你可以了解特定公司使用的算法、部署过程和软件堆栈。如果你还不熟悉以数据为中心的人工智能浪潮,了解大多数机器学习工程师在迭代清理数据集上花费的时间,可能会让你感到惊讶。提前对接受交谈的人和其任职公司进行研究,为信息性面试做好准备,这样你就可以带着深思熟虑过的问题进行交谈。你可能会问:你在平常的一周或一天会做什么?这个职位中最重要的任务是什么?什么技能对成功最为重要?你的团队需要如何合作才能实现其目标?招聘流程是什么?对于那些在过去脱颖而出的候选人,是什么让他们光芒四射?找到合适的交谈对象并不总是容易的,但许多如今任职高位的人在刚入行时都曾得到先行进入该领域的人的帮助,而且许多人都乐意回馈后来者。如果你能联系到你已有人际网络中的人(也许是在你之前完成过渡的朋友,或者是和你在同一所学校就读的同学)来寻求帮助,那是极好的!诸如 Deeplearning.AI 的 Pie & AI 等聚会也可以帮助你建立网络。最后,保持礼貌和专业,并感谢与你交谈过的人。当你有机会的时候,也请帮助新入行的人。如果你收到来自 DeepLearning.AI 社区成员发出的信息性面试请求,我希望你能帮助他们迈出一步!如果你有兴趣了解更多有关信息性面试的信息,欢迎点击下方阅读原文查看一篇来自加州大学伯克利分校职业中心的文章。我已多次提到人际网络和社区的重要性。你所结识的人除了可以提供有价值的信息外,还可以在将你引荐给潜在雇主方面提供更宝贵的帮助。请继续关注有关此主题的更多信息。请不断学习!吴恩达发表于 2022-08-25 14:47原帖作者:吴恩达原帖标题:吴恩达来信:想尝试新工作?先进行信息性面试吧!原帖地址:cid:link_3
  • [基础知识] 【转载】吴恩达来信2022-09-01:人工智能领域的求职小 tips
Dear friends,I’ve devoted several recent letters to building a career in AI. In this one, I’d like to discuss some fine points of finding a job.The typical job search follows a fairly predictable path.Research roles and companies online or by talking to friends.Optionally, arrange informal informational interviews with people in companies that appeal to you.Either apply directly or, if you can, get a referral from someone on the inside.Interview with companies that give you an invitation.Receive one or more offers and pick one. Or, if you don’t receive an offer, ask for feedback from the interviewers, the human resources staff, online discussion boards, or anyone in your network who can help you plot your next move.Although the process may be familiar, every job search is different. Here are some tips to increase the odds you’ll find a position that supports your thriving and enables you to keep growing.Pay attention to the fundamentals. A compelling resume, portfolio of technical projects, and a strong interview performance will unlock doors. Even if you have a referral from someone in a company, a resume and portfolio will be your first contact with many people who don’t already know about you. Update your resume and make sure it clearly presents your education and experience relevant to the role you want. Customize your communications with each company to explain why you’re a good fit. Before an interview, ask the recruiter what to expect. Take time to review and practice answers to common interview questions, brush up key skills, and study technical materials to make sure they are fresh in your mind. Afterward, take notes to help you remember what was said.Proceed respectfully and responsibly. Approach interviews and offer negotiations with a win-win mindset.
Outrage spreads faster than reasonableness on social media, so a story about how an employer underpaid someone gets amplified, whereas stories about how an employer treated someone fairly do not. The vast majority of employers are ethical and fair, so don’t let stories about the small fraction of mistreated individuals sway your approach. If you’re leaving a job, exit gracefully. Give your employer ample notice, give your full effort through your last hour on the job, transition unfinished business as best you can, and leave in a way that honors the responsibilities you were entrusted with.Choose who to work with. It’s tempting to take a position because of the projects you’ll work on. But the teammates you’ll work with are at least equally important. We’re influenced by people around us, so your colleagues will make a big difference. For example, if your friends smoke, the odds rise that you, too, will smoke. I don’t know of a study that shows this, but I’m pretty sure that if most of your colleagues work hard, learn continuously, and build AI to benefit all people, you’re likely to do the same. (By the way, some large companies won’t tell you who your teammates will be until you’ve accepted an offer. In this case, be persistent and keep pushing to identify and speak with potential teammates. Strict policies may make it impossible to accommodate you, but in my mind, that increases the risk of accepting the offer, as it increases the odds you’ll end up with a manager or teammates who aren’t a good fit.)Get help from your community. Most of us go job hunting only a small number of times in our careers, so few of us get much practice at doing it well. Collectively, though, people in your immediate community probably have a lot of experience. Don’t be shy about calling on them. Friends and associates can provide advice, share inside knowledge, and refer you to others who may help. 
I got a lot of help from supportive friends and mentors when I applied for my first faculty position, and many of the tips they gave me were very helpful.I know that the job search process can be intimidating. Instead of viewing it as a great leap, consider an incremental approach. Start by identifying possible roles and conducting a handful of informational interviews. If these conversations tell you that you have more learning to do before you’re ready to apply, that’s great! At least you have a clear path forward. The most important part of any journey is to take the first step, and that step can be a small one.Keep learning!Andrew亲爱的朋友们,最近的几封来信致力于帮助大家在人工智能领域建立职业生涯。今天的这篇文章中,我想着重讨论一些求职要点。典型的求职会遵循一条相当可预测的路径。在线研究感兴趣的职务和公司,或线下与朋友交谈。或者,与你感兴趣的公司的工作人员进行非正式的信息性面试。可以直接申请,也可以寻求内推。与向你发出邀请的公司进行面试。接收一个或多个工作邀请并从中选择一个。即便你没有收到工作邀请,也可以向面试官、人力资源部员工、在线讨论区或关系网中任何可以帮助你规划下一步行动的人寻求反馈。你可能对这个过程感到熟悉,但每次求职都是不同的。以下是一些小tips,可以增加你找到一个能让你蓬勃发展并不断成长的职位的几率。注重基本功。一份引人注目的简历、一系列技术项目和一次出色的面试表现将为你打开大门。即使你是公司内推的,简历和项目集合将成为你与许多不了解你的人的第一次接触。更新你的简历,确保它清楚地展示了与你想要的角色相关的受教育程度和工作经验。根据每家公司的特点进行沟通,以说明为什么你适合这个职位。在面试前询问招聘人员可以期待什么。花时间演练常见面试问题的回答、复习关键技能、学习技术材料,确保对它们记忆犹新。面试结束后,做笔记以帮助自己记住谈话内容。以尊重和负责的态度行事。以双赢的心态进行面试和协商。在社交媒体上,愤怒情绪的传播速度要比理性言论快得多,因此,关于雇主如何克扣员工薪酬的故事会被放大,而雇主公平对待员工的故事则无人问津。绝大多数雇主都是道德和公平的,因此不要让一小部分受到不公待遇的人的故事影响你的做法。如果你要离职,请尽量妥善地安排离职交接。给你的雇主留出充分的应对时间,站好最后一班岗,尽力完成未完的工作,并体面地离开。选择与谁合作。我们很容易因为想从事的项目而接受一个职位,但你将与之合作的队友至少同样重要。正所谓近朱者赤,近墨者黑。如果你的朋友吸烟,那么你吸烟的可能性就会增加。我无法说清是哪项研究表明了这一观点,但我非常肯定,如果你的大多数同事都努力工作、不断学习、构建人工智能来让所有人受益,那么你很可能也会这样做。(顺便说一句,一些大公司在你正式接受一份工作之前不会告诉你你的队友是谁。在这种情况下,请坚持下去,不断努力寻找潜在的队友并与之交谈。严格的政策可能会使公司无法满足你的要求,但在我看来,这会增加接受这份工作的风险,因为这会提高你与不太“合得来”的经理或队友共事的可能性。)从你的社群获得帮助。我们中的大多数人在职业生涯中只经历过为数不多的几次求职,因此很少有人对此具有丰富的实践。不过,作为一个整体,社区里的人们可能会提供很多经验。不要羞于拜访他们,朋友和同事可以提供建议,分享内部知识,并将你引荐给可能带来帮助的其他人。当我申请第一个教师职位时,我从支持我的朋友和导师那里得到了很多帮助,他们的很多建议都非常有用。我知道求职的过程可能会令人生畏。与其将其视为一次巨大的飞跃,不如考虑一种渐进的方法。首先确定潜在的职位,并进行一些信息性面试。如果对话的内容告诉你,在申请之前还有很多准备工作要做,那就太好了!至少你有了一条明确的前进道路。任何旅程中最重要的部分都是迈出第一步,尽管这一步可能很小。请不断学习!吴恩达发布于 2022-09-01
19:33原帖作者:吴恩达原帖标题:吴恩达来信:人工智能领域的求职小 tips原帖地址:cid:link_7
  • [基础知识] 【转载】吴恩达来信2022-09-08:推动科研成果及时、免费发布
Dear friends,A few weeks ago, the White House required that research papers funded by the U.S. government be available online promptly and freely by the end of 2025. Data that underlies those publications must also be made available.I’m thrilled! Paywalled journals that block free access to scientific research are the bane of the academic community.The AI world is fortunate to have shifted years ago to free online distribution of research papers, primarily through the arXiv site. I have no doubt that this has contributed to the rapid rise of AI and am confident that, thanks to the new U.S. policy, promoting a similar shift in other disciplines will accelerate global scientific progress.In the year 2000 — before modern deep learning, and when dinosaurs still roamed the planet — AI researchers were up in arms against paywalled journals. Machine Learning Journal, a prominent journal of the time, refused to open up access. With widespread support from the AI community, MIT computer scientist Leslie Kaelbling started the free Journal of Machine Learning Research, and many researchers promptly began publishing there instead. This move led to the rapid decline of Machine Learning Journal. The Journal of Machine Learning Research remains a respected institution today, edited by David Blei and Francis Bach (both of whom are my former officemates at UC Berkeley).Before the modern internet, journal publishers played an important role by printing and disseminating hard copies of papers. It was only fair that they could charge fees to recoup their costs and make a modest profit.
But in today’s research environment, for-profit journals rely mainly on academics to review papers for free, and they harvest the journals’ reputations (as reflected in metrics such as impact factor) to extract a profit.Today, there are peer-reviewed journal papers, peer-reviewed conference papers, and non-peer-reviewed papers posted online directly by the authors. Journal articles tend to be longer and undergo peer review and careful revisions. In contrast, conference papers (such as NeurIPS, ICML and ICLR articles) tend to be shorter and less carefully edited, and thus they can be published more quickly. And papers published on arXiv aren’t peer reviewed, so they can be published and reach interested readers immediately.The benefits of rapid publication and distribution have caused a lot of the action to shift away from journals and toward conferences and arXiv. While the volume of research is overwhelming (that’s why The Batch tries to summarize the AI research that matters), the velocity at which ideas circulate has contributed to AI’s rise.By the time the new White House guidance takes effect, a quarter century will have passed since machine learning researchers took a key step toward unlocking journal access. When I apply AI to healthcare, climate change, and other topics, I occasionally bump into an annoyingly paywalled article from these other disciplines. I look forward to seeing these walls come down.Don’t underestimate the impact of freeing up knowledge. 
I wish all these changes had taken place a quarter century ago, but I’m glad we’re getting there and look forward to the acceleration of research in all disciplines!Keep learning!Andrew亲爱的朋友们,几周前,美国白宫提出要求——由美国政府资助产出的研究论文需在2025年底前及时、免费在线公开。这些出版物涉及的基础数据也必须一并公开。我对此感到很激动!阻碍人们免费获取科学研究的收费期刊是学术界的一大祸害。人工智能世界很幸运地在几年前转向了免费在线发布研究论文,主要是通过arXiv网站。毫无疑问,这促进了人工智能的迅速崛起,我相信,通过美国采取的新政策,促进其他学科进行类似转变将加速全球科学的进步。2000年时,在现代深度学习时代之前,人工智能研究人员就与付费期刊展开了激烈的斗争。《机器学习》杂志是当时的一本著名杂志,拒绝向民众开放访问。在人工智能社群的广泛支持下,麻省理工学院计算机科学家 Leslie Kaelbling 创办了免费的《机器学习研究》杂志,许多研究人员立即开始在那里发表文章。这一举动导致了《机器学习》杂志的迅速衰落。《机器学习研究》杂志至今仍是一个受人尊敬的机构,由 David Blei 和 Francis Bach(他们都是我在加州大学伯克利分校的前同事)共同主编。在现代互联网出现之前,期刊出版商通过印刷和传播纸质论文发挥了重要作用。公平的做法是,他们可以收取费用以收回成本,并获取适当利润。但在如今的研究环境中,营利性期刊主要依靠学术界免费审稿,并利用期刊声誉(如影响因子等指标所反映的)来攫取利润。如今,我们能看到同行评议的期刊论文、同行评议的会议论文和由作者直接在线发布的非同行评议论文。期刊文章往往较长,且经过同行审查和仔细修订。相反,会议论文(如 NeurIPS, ICML 和 ICLR 文章)往往较短,编辑较少,因此可以更快地发表。在 arXiv 上发表的论文没有经过同行评议,因此它们可以快速发表并接触到感兴趣的读者。快速出版和发行的好处导致许多行动从期刊转向会议和arXiv。虽然科研数量巨大(这就是为什么 The Batch 试图总结重要的人工智能研究),但思想的快速传播有效推动了人工智能的崛起。到新的白宫指南真正生效时,距机器学习研究人员在开放期刊访问方面迈出关键一步,已经过去了四分之一个世纪。当我将人工智能应用于医疗保健、气候变化和其他主题时,也偶尔会遇到来自这些其他学科的令人讨厌的付费文章。期待早日看到这些知识围墙倒塌。不要低估释放知识的影响。我希望所有这些变化都能发生在四分之一个世纪前,我也很高兴我们正在推进,并期待所有学科的研究都能加速前进!请不断学习!吴恩达发布于 2022-09-08 19:37原帖作者:吴恩达原帖标题:吴恩达来信:推动科研成果及时、免费发布原帖地址:cid:link_3
  • [基础知识] 【转载】吴恩达来信2022-09-15:为艺术创造打开新大门的 Stable Diffusion
Dear friends,Stable Diffusion, an image generation model that takes a text prompt and produces an image, was released a few weeks ago in a landmark event for AI. While similar programs like DALL·E and Craiyon can be used via API calls or a web user interface, Stable Diffusion can be freely downloaded and run on the user’s hardware.I'm excited by the artwork produced by such programs (Developer Simon Willison posted a fun tweetstorm that highlights some of the creativity they’ve unleashed), but I’m also excited by the ways in which other developers are incorporating it into their own drawing tools. Ironically, Stable Diffusion’s manner of release moves us closer to “open AI” than the way DALL·E was released by the company called OpenAI. Kudos to Emad Mostaque and his Stability AI team, which developed the program.If you want to learn about how diffusion models like Stable Diffusion work, you can find a concise description here.Image generation is still maturing, but it’s a big deal. Many people have the creativity to produce art but lack the drawing skill to do so. As an amateur illustrator (I like to draw pandas to entertain my daughter using the Procreate paint app), my meager skill limits what I can create. But sitting in front of the DALL·E or Stable Diffusion user interface, I can ask her what she wants to see a panda doing and render a picture for her.Artists who have greater skill than I can use image generators to create stunning artworks more efficiently. In fact, an image produced this way recently won an art competition at the Colorado State Fair.The rise of inexpensive smartphone cameras brought an explosion in photography, and while expensive DSLRs still have a role, they now produce a minuscule fraction of all pictures taken.
I expect AI-powered image generators to do something similar in art: Ever-improving models and user interfaces will make it much more efficient to generate art using AI than without. I see a future where most art is generated using AI, and novices who have great creativity but little drawing skill will be able to participate.My friend and collaborator Curt Langlotz, addressing the question of whether AI will replace radiologists, said that radiologists who use AI will replace radiologists who don’t. The same will be true here: Artists who use AI will (largely) replace artists who don’t. Imagine the transition in the 1800s from the time when each artist had to source their own minerals to mix shades of paint to when they could purchase ready-mixed paint in a tube. This development made it easier for any artist to paint whatever and whenever they wished. I see a similar transition ahead. What an exciting time!Separately from generating images for human consumption, these algorithms have great potential to generate images for machine consumption. A number of companies have been developing image generation techniques to produce training images for computer vision algorithms. But because of the difficulty of generating realistic images, many have focused on vertical applications that are sufficiently valuable to justify their investment, such as generating road scenes to train self-driving cars or portraits of diverse faces to train face recognition algorithms.Will image generation algorithms reduce the cost of data generation and other machine-to-machine processes? I believe so. 
It will be interesting to see this space evolve.亲爱的朋友们,Stable Diffusion 是一种采用文本提示为输入的图像生成模型,于几周前在一个人工智能里程碑活动中发布。虽然与之类似的程序,如 DALL·E 和 Craiyon 可以通过 API 调用或 web 用户界面进行使用,但 Stable Diffusion 可以免费下载并在用户的个人硬件上运行。我对这些程序制作的作品充满兴趣(开发者 Simon Willison 发布了一个有趣的 tweetstorm,展示了这类程序释放出的一些创造力),但我也对其他开发者将其融入自己的绘图工具的方式感到兴奋。具有讽刺意味的是,Stable Diffusion 的发布方式使我们更接近“Open AI”,而不是像 OpenAI 发布 DALL·E 那样。向 Emad Mostaque 和他的 Stability AI 团队致敬,是他们开发了 Stable Diffusion 程序。如果你想了解诸如 Stable Diffusion 这样的扩散模型是如何工作的,欢迎点击此处浏览一个简洁的介绍。图像生成技术仍在走向成熟,但它意义重大。许多人有创作艺术的创造力,但缺乏绘画技巧。作为一名业余插画师(我喜欢用 Procreate paint 应用程序绘制熊猫来讨女儿欢心),有限的绘画技能限制了我的创作力。但使用 DALL·E 或 Stable Diffusion 程序,我可以直接询问女儿想看熊猫做什么,并为她制作一张图片。拥有比我更高技能的艺术家可以使用图像生成器更高效地创作令人惊叹的艺术品。事实上,以这种方式制作的图像最近在科罗拉多州博览会上赢得了一场艺术比赛。廉价智能手机相机的兴起引发了人们对摄影的巨大兴趣,虽然昂贵的单反相机仍然发挥着作用,但它们在如今的摄影作品中的占比微乎其微。我预计人工智能图像生成器在艺术领域也能产生类似的影响:不断改进的模型和用户界面将让使用人工智能生成艺术更加高效。我看到了一个未来趋势,大多数艺术都是使用人工智能生成的,而那些有很大创造力但绘画技能很少的新手将有机会参与其中。我的朋友兼合作者 Curt Langlotz 在谈到人工智能是否会取代放射科医生的问题时说,使用人工智能的放射科医生将取代不使用的放射科医师。对艺术来说也是如此:使用人工智能的艺术家将(很大程度上)取代不使用这项技术的艺术家。想象一下19世纪时发生的转变,从每一位艺术家都必须自己寻找矿物来混合不同颜色的颜料,到他们可以直接购买预调好的管状颜料。这一发展使艺术家们能够更随心所欲地创作。我看到了类似的转变即将到来。多么激动人心的时刻!除了生成供人类观看的图像之外,这些算法在生成供机器使用的图像方面也有巨大潜力。许多公司一直在开发图像生成技术,为计算机视觉算法生成训练图像。但由于难以生成逼真的图像,许多人将注意力集中在了具有足够价值的垂直应用上,以证明其投资的合理性。例如生成道路场景以训练自动驾驶汽车,或生成不同的人脸肖像以训练人脸识别算法。图像生成算法是否会降低数据生成和其他机器对机器过程的成本?我相信会是这样的。期待看到这个领域的发展。请不断学习!吴恩达发布于 2022-09-15 15:35原帖作者:吴恩达原帖标题:吴恩达来信:为艺术创造打开新大门的 Stable Diffusion原帖地址:cid:link_6
  • [基础知识] 【转载】吴恩达来信2022-09-22:为“智能记忆”多加练习吧!
Dear friends,Activities such as writing code and solving math problems are often perceived as purely intellectual pursuits. But this ignores the fact that they involve the mental equivalent of muscle memory.The idea of muscle memory is a powerful concept in human learning. It has helped millions of people to understand the importance of practice in learning motor tasks. However, it’s also misleading because it excludes skills that don’t involve using muscles.I believe that a similar principle operates in learning intellectual skills. Lack of recognition of this fact has made it harder for people to appreciate the importance of practice in acquiring those skills as well.The phenomenon of muscle memory is widely acknowledged. When you repeatedly practice balancing on a bicycle, swinging a tennis racquet, or typing without looking at the keyboard, adaptations in your brain, nervous system, and muscles eventually allow you to carry out the task without having to consciously pay attention to it.The brain and nervous system are central to learning intellectual skills, and these parts of the body also respond to practice. Whether you’re writing code, solving math problems, or playing chess, practice makes you better at it. It leads your brain to form mental chunks that allow you to reason at a higher level. For example, a novice programmer has to think carefully about every parenthesis or colon, but with enough practice, coding common subroutines can take little conscious effort. Practice frees up your attention to focus on higher-level architectural issues.Of course, there are biological differences between learning motor skills and learning intellectual skills. For example, the former involves parts of the brain that specialize in movement.
And the physical world presents somewhat different challenges each time you perform an action (for example, your bicycle hits different bumps, and an opposing tennis player returns each of your serves differently). Thus practicing motor skills automatically leads you to try out your actions in different situations, which trains your brain to adapt to different problems.But I think there are more similarities than people generally appreciate. While watching videos of people playing tennis can help your game, you can’t learn to play tennis solely by watching videos. Neither can you learn to code solely by watching videos of coding. You have to write code, see it sometimes work and sometimes not, and use that feedback to keep improving. Like muscle memory, this kind of learning requires training the brain and nervous system through repetition, focused attention, making decisions, and taking breaks between practice sessions to consolidate learning. And, like muscle memory, it benefits from variation: When practicing an intellectual task, we need to challenge ourselves to work through a variety of situations rather than, say, repeatedly solving the same coding problem.All of this leads me to think that we need an equivalent term for muscle memory in the intellectual domain. As knowledge work has come to play a larger economic role relative to physical labor, the ability to learn intellectual tasks has become much more important than it was when psychologists formed the idea of muscle memory around 150 years ago. This new term would help people understand that practice is as crucial to developing intellectual skills as muscular ones.How about intellect memory? It’s not an elegant phrase, but it acknowledges this under-appreciated reality of learning.What intellectual task do you develop intellect memory for, and can you find time in your schedule to do the necessary practice? 
After all, there’s no better way to learn.Keep learning!Andrew亲爱的朋友们:编写代码和解决数学问题等行为通常被视为纯粹的智力活动。但这忽略了一个事实,即它们也涉及类似肌肉记忆的心理机制。肌肉记忆是人类学习中一个强有力的概念。它帮助数百万人理解了练习在学习运动技能中的重要性。然而,这也是一种误导,因为它排除了不需要使用肌肉的技能。我相信,在学习智力技能方面也有类似的原则。由于对这一事实的认识不足,人们也很难认识到实践在获取这些技能方面的重要性。肌肉记忆现象是公认的。当你在自行车上反复练习平衡、挥动网球拍或打字而不用看键盘时,你的大脑、神经系统和肌肉的适应最终会让你无需刻意关注就能完成任务。大脑和神经系统是学习智力技能的核心,身体的这些部位也会对练习作出反应。无论你是在编写代码、解决数学问题还是下棋,充分地练习都会让你做得更好。它会引导你的大脑形成心理语块(mental chunks),让你在更高层次上进行推理。例如,新手程序员必须仔细考虑每个括号或冒号,但经过足够的练习,编写通用子程序几乎不需要付出有意识的努力。实践可以让你将注意力放在更高级别的体系结构问题上。当然,学习运动技能和学习智力技能之间存在生物学差异。例如,前者涉及大脑中专门从事运动的部分。每当你执行一个动作时,现实世界都会呈现出一些不同的挑战(例如,你的自行车碰到不同的颠簸、对面的网球运动员对每个发球的回球都不同)。因此,练习运动技能会自动引导你在不同的情况下尝试你的动作,训练你的大脑适应不同的问题。但我认为,它们之间的相似性比人们通常所理解的要多。虽然观看人们打网球的视频有助于你的比赛,但你不能仅仅通过观看视频来学习打球;也不能仅仅通过观看编码视频来学习编码。你必须亲自编写代码,看到它有时有效有时无效,并利用反馈来不断改进。像肌肉记忆一样,这种学习需要通过重复、集中注意力、做决定和在练习之间休息来训练大脑和神经系统,以巩固学习。和肌肉记忆一样,它也能从变化中受益:在练习智力任务时,我们需要挑战自己去应对各种不同的情况,而不是重复解决同一个编码问题。这些都使我认为,我们需要一个在智力领域中与肌肉记忆等效的术语。随着知识工作相对于体力劳动发挥着更大的经济作用,学习智力任务的能力变得比150年前心理学家提出肌肉记忆概念时更为重要。这个新术语将帮助人们理解,练习对于发展智力技能和肌肉技能同样重要。我们称之为“智能记忆”怎么样?这也许不是最合适的形容,但它正视了这一关于学习的被低估的事实。你会为哪项智力任务培养“智能记忆”?你能在你的时间表中抽出时间来做必要的练习吗?毕竟,没有更好的学习方法了。请不断学习!吴恩达发布于 2022-09-22 15:44原帖作者:吴恩达原帖标题:吴恩达来信:为“智能记忆”多加练习吧!原帖地址:cid:link_2
  • [基础知识] 【转载】吴恩达来信2022-10-27:人类和鬼魂都在使用AI?!
Dear friends,Each year, AI brings wondrous advances. But, as Halloween approaches and the veil lifts between the material and ghostly realms, we see that spirits take advantage of these developments at least as much as humans do.As I wrote last week, prompt engineering, the art of writing text prompts to get an AI model to generate the output you want, is a major new trend. Did you know that the Japanese word for prompt — 呪文— also means spell or incantation? (Hat tip to natural language processing developer Paul O’Leary McCann.) The process of generating an image using a model like DALL·E 2 or Stable Diffusion does seem like casting a magic spell — not to mention these programs' apparent ability to reanimate long-dead artists like Pablo Picasso — so Japan's AI practitioners may be onto something.Some AI companies are deliberately reviving the dead. The startup HereAfter AI produces chatbots that speak, sound, and look just like your long-lost great grandma. Sure, it's a simulation. Sure, the purpose is to help the living connect with deceased loved ones. When it comes to reviving the dead — based on what I've learned by watching countless zombie movies — I'm sure nothing can go wrong.I'm more concerned by AI researchers who seem determined to conjure ghastly creatures. Consider the abundance of recent research into transformers. Every transformer uses multi-headed attention. Since when is having multiple heads natural? Researchers are sneaking multi-headed beasts into our computers, and everyone cheers for the new state of the art! If there's one thing we know about transformers, it's that there's more than meets the eye.This has also been a big year for learning from masked inputs, and approaches like Masked Autoencoders, MaskGIT, and MaskViT have achieved outstanding performance in difficult tasks.
So if you put on a Halloween mask, know that you're supporting a key idea behind AI progress.Trick or treat!Andrew亲爱的朋友们:人工智能每年都会带来惊人的进步。但是,随着万圣节的临近,物质世界和幽灵世界之间的面纱被缓缓揭开,我们看到,鬼魂世界也和人类世界一样利用了这些发展。正如我在上周的来信中所写的,prompt engineering(提示词工程)——即编写文本提示以使AI模型生成所需输出的艺术,是一个主要的新趋势。你知道日语中的“提示”一词——呪文——也有咒语或符咒的意思吗?(感谢自然语言处理开发人员Paul O'Leary McCann的提示。)使用DALL·E 2或Stable Diffusion等模型生成图像的过程确实像是施了一个魔法(更不用说这些程序明显有能力让帕勃罗·毕加索等已故艺术家“复活”),所以日本的人工智能从业者的说法或许真的有道理。一些人工智能公司正在试图复活逝者。初创公司HereAfter AI生产的聊天机器人在讲话、声音和外观上都像你许久不见的曾祖母。当然,这只是一个模拟,目的是帮助生者与已故亲人取得某种“联系”。根据我从无数僵尸电影中学到的,当谈到复活逝者时,我确信没有什么会出错。我更关心的是那些似乎决心创造恐怖生物的人工智能研究人员。想想最近对transformer的大量研究。每个transformer都用到了多头注意力(multi-headed attention)。什么时候开始有“多个头”是自然的了?研究人员正把多头怪兽偷偷带进我们的电脑,而每个人都在为新的最先进水平欢呼!如果说我们对transformer有一点了解的话,那就是事情并不像最初看到的那样简单。今年也是从掩码输入中学习大放异彩的一年,掩码自动编码器、MaskGIT和MaskViT等方法在困难任务中取得了出色的表现。所以,如果你戴上万圣节面具,那么你就支持了人工智能进步背后的一个关键想法。不给糖就捣蛋!吴恩达发布于 2022-10-27 14:20原帖作者:吴恩达原帖标题:吴恩达来信:人类和鬼魂都在使用AI?!原帖地址:cid:link_6
  • [基础知识] 【转载】吴恩达来信2022-10-20:Prompt engineering的现状及未来
Dear friends,Is prompt engineering — the art of writing text prompts to get an AI system to generate the output you want — going to be a dominant user interface for AI? With the rise of text generators such as GPT-3 and Jurassic and image generators such as DALL·E, Midjourney, and Stable Diffusion, which take text input and produce output to match, there has been growing interest in how to craft prompts to get the output you want. For example, when generating an image of a panda, how does adding an adjective such as “beautiful” or a phrase like “trending on artstation” influence the output? The response to a particular prompt can be hard to predict and varies from system to system.So is prompt engineering an important direction for AI, or is it a hack?Here’s how we got to this point:The availability of large amounts of text or text-image data enabled researchers to train text-to-text or text-to-image models.Because of this, our models expect text as input.So many people have started experimenting with more sophisticated prompts.Some people have predicted that prompt engineering jobs would be plentiful in the future. I do believe that text prompts will be an important way to tell machines what we want — after all, they’re a dominant way to tell other humans what we want. But I think that prompt engineering will be only a small piece of the puzzle, and breathless predictions about the rise of professional prompt engineers are missing the full picture.Just as a TV has switches that allow you to precisely control the brightness and contrast of the image — which is more convenient than trying to use language to describe the image quality you want — I look forward to a user interface (UI) that enables us to tell computers what we want in a more intuitive and controllable way.Take speech synthesis (also called text-to-speech).
Researchers have developed systems that allow users to specify which part of a sentence should be spoken with what emotion. Virtual knobs allow you to dial up or down the degree of different emotions. This provides fine control over the output that would be difficult to express in language. By examining an output and then fine-tuning the controls, you can iteratively improve the output until you get the effect you want.So, while I expect text prompts to remain an important part of how we communicate with image generators, I look forward to more efficient and understandable ways for us to control their output. For example, could a set of virtual knobs enable you to generate an image that is 30 percent in the style of Studio Ghibli and 70 percent the style of Disney? Drawing sketches is another good way to communicate, and I’m excited by img-to-img UIs that help turn a sketch into a drawing.Likewise, controlling large language models remains an important problem. If you want to generate empathetic, concise, or some other type of prose, is there an easier way than searching (sometimes haphazardly) among different prompts until you chance upon a good one?When I’m just playing with these models, I find prompt engineering a creative and fun activity; but when I’m trying to get to a specific result, I find it frustratingly opaque. Text prompts are good at specifying a loose concept such as “a picture of a panda eating bamboo,” but new UIs will make it easier to get the results we want. And this will help open up generative algorithms to even more applications; say, text editors that can adjust a piece of writing to a specific style, or graphics editors that can make images that look a certain way.Lots of exciting research ahead! 
I look forward to UIs that complement writing text prompts.Keep learning!Andrew亲爱的朋友们:Prompt engineering(提示工程)——即编写文本提示以使人工智能系统生成想要的输出的艺术——是否会成为人工智能的主导用户界面?随着文本生成器(如GPT-3和Jurassic)和图像生成器(如DALL·E、Midjourney和Stable Diffusion)的兴起(需要输入文本并生成匹配的输出),人们对如何创建提示以获得想要的输出越来越感兴趣。例如,在生成熊猫图像时,添加诸如“beautiful”之类的形容词或诸如“trending on artstation”之类的短语将如何影响输出?对特定提示的响应可能很难预测,并且会因系统而异。那么,prompt engineering是人工智能的一个重要方向,还是一种临时的取巧手段(hack)呢?事情是这样一步步发展到现在的:大量文本或文本-图像数据的可用性使研究人员能够训练文本-文本或文本-图像模型。因此,我们的模型期望将文本作为输入。因此,许多人开始尝试使用更为复杂的提示。一些人预测,未来会出现大量涉及prompt engineering的工作。我确实相信,文本提示将是告诉机器我们想要什么的一种重要方式——毕竟,它是告诉其他人我们需要什么的主要方式。但我认为,prompt engineering只是整幅拼图中的一小块,那些关于专业提示工程师将大量涌现的热切预测并未看到全局。正如电视上的旋钮可以让你精确控制图像的亮度和对比度(这比试图用语言描述你想要的画质更方便),我期待会有一个用户界面(UI),它使我们能够以更直观和可控的方式告诉计算机我们想要什么。以语音合成(也称为文本-语音)为例。研究人员开发了一些系统,允许用户指定句子的哪个部分应该用什么样的情感说话。虚拟旋钮允许你调高或调低不同情绪的程度。这提供了对难以用语言表达的输出的精细控制。通过检查输出,然后微调控件,我们可以反复改进输出直到获得所需的效果。因此,虽然我预计文本提示仍将是我们与图像生成器沟通的重要方式,但我期待出现更高效、更容易理解的方法来控制它们的输出。例如,一组虚拟旋钮是否可以生成一个30%是吉卜力工作室(知名日本动画工作室)风格,70%是迪斯尼风格的图像?绘制草图是另一种很好的交流方式,使用img-to-img UIs将草图转换为绘图的方式也令我感到兴奋。同样,控制大型语言模型仍然是一个重要问题。如果你想产生感同身受的、简洁的或其他类型的散文,有没有比在不同提示中进行搜索(有时只是随意浏览),直到碰巧找到一个合适的提示更简单的方法?当我只是在试用这些模型时,我发现prompt engineering是一项富有创造性和有趣的活动;但当我试图得到一个具体的结果时,却发现它不透明得令人沮丧。文本提示可以很好地指定一个松散的概念,例如“熊猫吃竹子的图片”,但新的UI可以更容易地获得我们想要的结果。这将有助于将生成算法扩展到更多的应用程序;例如,可以将一段文字调整为特定样式的文本编辑器,或者可以让图像呈现出特定效果的图形编辑器。未来还将有许多令人兴奋的研究出现!我期待着能与编写文本提示相辅相成的UI。请不断学习!吴恩达发布于 2022-10-20 19:05原帖作者:吴恩达原帖标题:吴恩达来信:Prompt engineering的现状及未来原帖地址:cid:link_3
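信中设想的“虚拟旋钮”式风格混合(例如 30% 吉卜力风格、70% 迪斯尼风格),本质上可以理解为在两个风格表示之间做线性插值。下面是一个纯示意性的 Python 小例子:其中的三元素列表只是假设的占位“嵌入”,`blend_styles` 也是假想的辅助函数;真实系统中插值的对象会是文本编码器产生的提示词嵌入(prompt embeddings)。

```python
# Illustrative sketch of the "virtual knob" idea: blend two style
# embeddings with a user-controlled weight. The short lists below are
# hypothetical placeholders, not real model embeddings.
def blend_styles(emb_a, emb_b, weight_a):
    """Linearly interpolate between two equal-length style embeddings.

    weight_a is the knob position: 1.0 means pure style A, 0.0 pure style B.
    """
    assert 0.0 <= weight_a <= 1.0
    assert len(emb_a) == len(emb_b)
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(emb_a, emb_b)]

ghibli = [1.0, 0.0, 0.5]   # hypothetical "Studio Ghibli" embedding
disney = [0.0, 1.0, 0.5]   # hypothetical "Disney" embedding

# 30% Ghibli, 70% Disney, as in the example in the letter.
mixed = blend_styles(ghibli, disney, 0.3)
```

在真实的扩散模型流水线中,这样得到的混合嵌入会代替单一提示词嵌入送入生成过程;这里只演示旋钮到权重的映射这一层。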
  • [基础知识] 【转载】吴恩达来信2022-10-13:AI, GPU和芯片的未来
Dear friends,The rise of AI over the last decade has been powered by the increasing speed and decreasing cost of GPUs and other accelerator chips. How long will this continue? The past month saw several events that might affect how GPU prices evolve.In September, Ethereum, a major blockchain that supports the cryptocurrency known as ether, completed a shift that significantly reduced the computation it requires. This shift — dubbed the Merge — should benefit the natural environment by consuming less energy. It will also decrease demand for GPUs to carry out cryptocurrency mining. (The Bitcoin blockchain remains computationally expensive.) I expect that lower demand will help lower GPU prices.On the other hand, Nvidia CEO Jensen Huang declared recently that the era in which chip prices could be expected to fall is over. Moore’s Law, the longstanding trend that has doubled the number of transistors that could fit in a given area of silicon roughly every two years, is dead, he said. It remains to be seen how accurate his prediction is. After all, many earlier reports of the death of Moore’s Law have turned out to be wrong. Intel continues to bet that it will hold up.That said, improvements in GPU performance have exceeded the pace of Moore’s Law as Nvidia has optimized its chips to process neural networks, while the pace of improvements in CPUs, which are designed to process a wider range of programming, has fallen behind. So even if chip manufacturers can’t pack silicon more densely with transistors, chip designers may be able to continue optimizing to improve the price/performance ratio for AI.I’m optimistic that AI practitioners will get the processing power they need.
While much AI progress has been — and a meaningful fraction still is — driven by using cheaper computation to train bigger neural networks on bigger datasets, other engines of innovation now drive AI as well. Data-centric AI, small data, more efficient algorithms, and ongoing work to adapt AI to thousands (millions?) of new applications will keep things moving forward.

Semiconductor startups have had a hard time in recent years because, by the time they caught up with any particular offering by market leader Nvidia, Nvidia had already moved on to a faster, cheaper product. If chip prices stop falling, they’ll have a bigger market opportunity — albeit with significant technical hurdles — to build competitive chips. The industry for AI accelerators remains dynamic. Intel and AMD are making significant investments and a growing number of companies are duking it out on the MLPerf benchmark that measures chip performance. I believe the options for training and inference in the cloud and at the edge will continue to expand.

Keep learning! Andrew

亲爱的朋友们:

近十年来,人工智能的兴起得益于GPU及其他加速器芯片速度的提高和成本的降低。这个趋势会持续多久?过去一个月发生了一些可能影响GPU价格变化的事件。

9月,支持以太加密货币的主要区块链以太坊(Ethereum)完成了一次转换,大大减少了所需的计算量。这种转变被称为“合并”(the Merge),它应该能够通过消耗更少的能源来造福自然环境。它还将减少用GPU进行加密货币挖矿的需求。(比特币区块链的计算成本仍然很高。)我预计需求的降低将有助于降低GPU价格。

另一方面,英伟达首席执行官黄仁勋近日宣布,可以预期芯片价格下跌的时代已经结束。他说,摩尔定律——这一使给定面积硅片上可容纳的晶体管数量大约每两年翻一番的长期趋势——已经过时。这一预测有多准确尚待观察。毕竟,早年间许多关于摩尔定律已死的报道都被证明是错误的。英特尔就笃定它会持续下去。

也就是说,由于英伟达针对神经网络处理优化了其芯片,GPU性能的改进速度已经超过了摩尔定律,而设计用于处理更广泛程序的CPU,其改进速度则已落后。因此,即使芯片制造商无法在硅片上更密集地封装晶体管,芯片设计者也可以持续进行优化,以提高面向AI的性价比。

尽管全球范围内对芯片的生产和需求出现波动,我对人工智能从业者将获得他们需要的处理能力依然持乐观态度。虽然使用更廉价的计算在更大的数据集上训练更大的神经网络来推动人工智能已取得了很大的进步,但现在其他创新引擎也在推动人工智能的发展。以数据为中心的人工智能、小数据、更高效的算法,以及正在进行的使人工智能适应数千(数百万?)个新应用的开发工作,将推动事情继续向前发展。

近年来,半导体初创公司经历了一段艰难的时期,因为等到他们赶上市场领导者英伟达的某款产品时,英伟达的重心已经转向研发速度更快、更便宜的产品了。如果芯片价格停止下跌,他们将获得更大的市场机会来制造具有竞争力的芯片——尽管依然存在重大的技术壁垒。人工智能加速器行业仍然充满活力。英特尔和AMD正在进行重大投资,越来越多的公司正在MLPerf基准(用以衡量芯片性能)上进行较量。我相信在云端和边缘设备上进行训练和推理的选择将继续扩大。

请不断学习!
吴恩达

发布于 2022-10-13 15:38
原帖作者:吴恩达
原帖标题:吴恩达来信:AI, GPU和芯片的未来
原帖地址:cid:link_5
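信中提到的摩尔定律“大约每两年晶体管数量翻一番”,其复利效应可以用一个简单算式直观感受。以下函数和初始值仅为示意性假设,不代表任何真实芯片的数据:

```python
def transistors_after(years: float, start: float = 1.0, doubling_period: float = 2.0) -> float:
    """假设每 doubling_period 年翻一番,推算 years 年后的相对晶体管数量。"""
    return start * 2 ** (years / doubling_period)

# 两年翻一番,十年即复合增长 2^5 = 32 倍:
print(transistors_after(10))  # 32.0
```

这也解释了信中的争论为何重要:一旦这种指数式下降的成本曲线放缓,整个行业的竞争格局都会随之改变。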
  • [基础知识] 【转载】吴恩达来信2022-10-09:考虑用户和数据的不确定性
Dear friends,

When I wrote recently about how to build a career in AI, several readers wrote to ask specifically about AI product management: the art and science of designing compelling AI products. I’ll share lessons I’ve learned about this here and in future letters.

A key concept in building AI products is iteration. As I’ve explained in past letters, developing a machine learning system is a highly iterative process. First you build something, then run experiments to see how it performs, then analyze the results, which enables you to build a better version based on what you’ve learned. You may go through this loop several times in various phases of development — collecting data, training a model, deploying the system — before you have a finished product.

Why is development of machine learning systems so iterative? Because (i) when starting on a project, you almost never know what strange and wonderful things you’ll find in the data, and discoveries along the way will help you to make better decisions on how to improve the model; and (ii) it’s relatively quick and inexpensive to try out different models.

Not all projects are iterative. For example, if you’re preparing a medical drug for approval by the U.S. government — an expensive process that can cost tens of millions of dollars and take years — you’d usually want to get the drug formulation and experimental design right the first time, since repeating the process to correct a mistake would be costly in time and money.
Or if you’re building a space telescope (such as the wonderful Webb Space Telescope) that’s intended to operate far from Earth with little hope of repair if something goes wrong, you’d think through every detail carefully before you hit the launch button on your rocket.

Iterating on projects tends to be beneficial when (i) you face uncertainty or risk, and building or launching something can provide valuable feedback that helps you reduce the uncertainty or risk, and (ii) the cost of each attempt is modest.

This is why The Lean Startup, a book that has significantly influenced my thinking, advocates building a minimum viable product (MVP) and launching it quickly. Developing software products often involves uncertainty about how users will react, which creates risk for the success of the product. Making a quick-and-dirty, low-cost implementation helps you to get valuable user feedback before you’ve invested too much in building features that users don’t want. An MVP lets you resolve questions about what users want quickly and inexpensively, so you can make decisions and investments with greater confidence.

When building AI products, I often see two major sources of uncertainty, which in turn creates risk:

Users. The considerations here are similar to those that apply to building software products. Will they like it? Are the features you’re prioritizing the ones they’ll find most valuable? Is the user interface confusing?

Data. Does your dataset have enough examples of each class? Which classes are hardest to detect? What is human-level performance on the task, and what level of AI performance is reasonable to expect?

A quick MVP or proof of concept, built at low cost, helps to reduce uncertainty about users and/or data. This enables you to uncover and address hidden issues that may hinder your success.

Many product managers are used to thinking through user uncertainty and using iteration to manage risk in that dimension. AI product managers should also consider the data uncertainty and decide on the appropriate pace and nature of iteration to enable the development team to learn the needed lessons about the data and, given the data, what level of AI functionality and performance is possible.

Keep learning! Andrew

亲爱的朋友们:

在我分享了关于如何在人工智能领域建立职业生涯的内容后,有几位读者来信想要具体了解人工智能产品管理:设计引人注目的AI产品的艺术和科学。我将在这周及以后的来信中分享我的经验。

构建AI产品的一个关键概念是迭代。正如我在过去的来信中所阐释的,开发机器学习系统是一个高度迭代的过程。首先你需要构建一些东西,然后运行实验来检验它的性能,并分析结果,这使你能够根据所学内容构建更好的版本。在开发的各个阶段(收集数据、训练模型、部署系统),你可能会多次经历此循环,然后才能得到成品。

为什么机器学习系统的开发会如此迭代?因为(i)在开始一个项目时,你几乎无法预知会在数据中发现什么奇怪而奇妙的东西,而开发过程中的这些发现将帮助你更好地决定如何改进模型;(ii)尝试不同的模型相对来说既快速又低成本。

并非所有项目都是迭代的。例如在美国,如果你正在为某种药品申请政府的批准——这是一个昂贵的过程,可能需要花费数千万美元和数年时间——你通常会希望第一次就把药物配方和实验设计做对,因为重复这一过程来纠错将耗费大量的时间和金钱。又或者,如果你正在建造一架旨在远离地球运行的太空望远镜(比如神奇的韦伯太空望远镜),一旦运行出现问题,几乎没有修复的希望,那么在按下火箭的发射按钮之前,你应该仔细考虑每个细节。

当(i)你面临不确定性或风险,而构建或发布某些东西可以提供有价值的反馈,帮助你减少这种不确定性或风险,以及(ii)每次尝试的成本比较适中时,对项目进行迭代往往是有益的。

这就是为什么对我的思想产生了重大影响的《精益创业》(The Lean Startup)一书提倡建立一个最小可行产品(minimum viable product, MVP)并迅速推出。开发软件产品通常会涉及用户将如何反应的不确定性,这为产品的成功带来了风险。创建一个快捷、低成本的实现,可以帮助你在为用户不想要的功能投入过多精力之前获得有价值的用户反馈。MVP可以让你快速、低成本地解决关于用户需求的问题,让你可以更有信心地做出决策和投资。

在构建AI产品时,我发现通常有两个主要的不确定性来源,它们会带来风险:

用户。这方面的考虑与构建软件产品时的类似。用户会喜欢吗?你优先考虑的功能是他们认为最有价值的功能吗?用户界面令人困惑吗?

数据。你的数据集中的每个类别是否都有足够的样本?哪些类别最难检测?任务的人类水平表现是什么?可以合理预期的人工智能性能水平又是什么?

低成本构建的快速MVP或概念验证,有助于减少关于用户和/或数据的不确定性,使你能够发现并解决可能阻碍成功的隐藏问题。

其他行业中的许多产品经理习惯于思考用户的不确定性,并使用迭代来管理该维度的风险。AI产品经理在此基础上还应考虑数据的不确定性,并决定适当的迭代速度和性质,以使开发团队能够学到关于数据的必要经验,并根据数据确定可实现的AI功能和性能水平。

请不断学习!
吴恩达

发布于 2022-10-09 21:19
原帖作者:吴恩达
原帖标题:吴恩达来信:考虑用户和数据的不确定性
原帖地址:cid:link_6
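信中描述的“构建—实验—分析—改进”循环,可以用下面的极简示意代码勾勒。其中 iterate 的结构以及示例里传入的 train、evaluate、improve 均为假设的占位实现,并非任何真实框架的API;真实项目中这三步分别对应训练模型、运行实验和根据分析结果调整方案:

```python
def iterate(config: dict, train, evaluate, improve, target: float, max_rounds: int = 10):
    """重复“构建—实验—分析—改进”循环,直到指标达到 target 或轮数用尽。"""
    history = []
    for round_num in range(max_rounds):
        model = train(config)        # 构建一些东西
        metric = evaluate(model)     # 运行实验、检验性能
        history.append((round_num, config.copy(), metric))
        if metric >= target:         # 足够好:可以作为MVP推出
            break
        config = improve(config, metric)  # 分析结果,调整下一版
    return model, history

# 玩具示例:用数字代替模型,每轮把“容量”加一,直到指标达标。
model, history = iterate(
    {"capacity": 5},
    train=lambda cfg: cfg["capacity"],
    evaluate=lambda m: m / 10,
    improve=lambda cfg, metric: {"capacity": cfg["capacity"] + 1},
    target=0.8,
)
print(model, len(history))  # 8 4
```

history 记录了每一轮的配置和指标,正对应信中所说的“根据所学内容构建更好的版本”:降低每轮成本、保留每轮教训,正是迭代式开发降低用户与数据不确定性的方式。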