-
[Repost] Andrew Ng's Letter, 2022-08-11: Measuring algorithm consistency, reducing model bias (cid:link_0)
[Repost] Andrew Ng's Letter, 2022-08-18: Tips for job hunting in AI (cid:link_1)
[Repost] Andrew Ng's Letter, 2022-08-25: Want to try a new job? Start with informational interviews! (cid:link_2)
[Repost] Andrew Ng's Letter, 2022-09-01: Tips for job hunting in AI (cid:link_3)
[Repost] Andrew Ng's Letter, 2022-09-08: Promoting prompt, free publication of research results (cid:link_4)
[Repost] Andrew Ng's Letter, 2022-09-15: Stable Diffusion opens new doors for artistic creation (cid:link_5)
[Repost] Andrew Ng's Letter, 2022-09-22: Practice your "intellect memory"! (cid:link_6)
[Repost] Andrew Ng's Letter, 2022-10-09: Considering uncertainty in users and data (cid:link_7)
[Repost] Andrew Ng's Letter, 2022-10-13: The future of AI, GPUs, and chips (cid:link_8)
[Repost] Andrew Ng's Letter, 2022-10-20: The present and future of prompt engineering (cid:link_9)
[Repost] Andrew Ng's Letter, 2022-10-27: Humans and ghosts alike are using AI?! (cid:link_10)
-
Editor's note: Andrew Ng is a leading figure in AI. He maintains a column on Zhihu called "Andrew Ng's Letters," in which he regularly analyzes the past, present, and future of AI in both English and Chinese, often in enlightening and thought-provoking ways. Zhang Xiaobai reposts these articles here, from newest to oldest, for learning purposes only and not for commercial use; please notify us of any infringement.

Dear friends,

Bias in AI is a serious problem. For example, if a judge who’s deciding how to sentence a defendant relies on an AI system that routinely estimates a higher risk that offenders of a particular race will reoffend, that’s a terrible thing. As we work to reduce bias in AI models, though, it’s also worth exploring a different issue: inconsistency. Specifically, let’s consider how inconsistent human decisions are, and how AI can reduce that inconsistency.

If a human judge, given two defendants who committed the same crime under identical circumstances, sentences one to three years in prison and the other to 30 days, we would consider this inconsistency blatantly unfair. Yet, as Daniel Kahneman and his co-authors document in their book, Noise: A Flaw in Human Judgment, human decision-making is extremely inconsistent (or noisy). One study found that judges systematically sentenced defendants more harshly if the local football team had suffered an upset loss (which presumably made the judge cranky). Judges are only human, and if they’re swayed by football outcomes, imagine how many other irrelevant factors may influence their decisions!

Many human decisions rest on complex criteria, and humans don’t always define their criteria before weighing them. For example:

In medicine, I’ve seen individual doctors make highly inconsistent diagnoses given the same input. Working on a project with a doctor whom I’ll call Alice, we measured the “inter-Alice agreement score,” which was loosely a measure of how much her diagnoses differed between morning and afternoon. (For the record, Alice is a brilliant doctor and wonderful collaborator.
This score measured the inherent ambiguity of the task more than it measured her competence.)

In manufacturing, I’ve seen skilled inspectors make very different decisions about whether or not parts with similar flaws were defective.

In online retailing, I’ve seen human annotators make inconsistent decisions about how to tag or categorize products. (Should a fun gadget go under electronics or entertainment?)

In contrast, given the same input, a trained neural network will produce the same output every time. Given similar inputs, a trained model will also typically output similar results. Automated software tends to be highly consistent. This is one of automation’s huge advantages: Algorithms make decisions much more consistently than humans. To my mind, they offer a way to give patients more consistent and fair treatment options, make manufacturing more efficient, make retail product catalogs less confusing to shoppers, and so on.

In conversations about whether and how to build an AI system, it’s important to address how to ensure that the system doesn’t have significant bias as well as how to benchmark its bias against human bias. If you’re trying to get an AI project approved, you may find it useful to raise the issue of consistency as well.
Measuring the consistency of your algorithm relative to humans who make the same decision can add weight to arguments in favor of investing in an automated system.

Keep learning!
Andrew

Published 2022-08-11 19:44. Original author: Andrew Ng. Original title: Andrew Ng's Letter: Measuring algorithm consistency, reducing model bias. Original link: cid:link_3
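The "agreement score" idea in this letter can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual metric from the project the letter mentions: the cases, labels, and the toy model below are all invented. The point is simply that repeated human decisions on the same inputs often disagree, while a deterministic trained model trivially agrees with itself.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of cases on which two sets of decisions agree."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# The same 8 cases, labeled by one (hypothetical) doctor in the
# morning and again in the afternoon.
morning   = ["benign", "malignant", "benign", "benign",
             "malignant", "benign", "malignant", "benign"]
afternoon = ["benign", "malignant", "malignant", "benign",
             "malignant", "benign", "benign", "benign"]

print(agreement_rate(morning, afternoon))  # 0.75: she disagrees with herself on 2 of 8 cases

# A deterministic model maps each input to exactly one output,
# so its self-agreement across repeated runs is always 1.0.
model = lambda case_id: "malignant" if case_id % 3 == 0 else "benign"
run1 = [model(i) for i in range(8)]
run2 = [model(i) for i in range(8)]
print(agreement_rate(run1, run2))  # 1.0
```

The same helper can compare an algorithm's decisions against each human rater's, which is the kind of evidence the letter suggests bringing to a project-approval discussion.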
-
Dear friends,

I’ve written about how to build a career in AI and focused on tips for learning technical skills, choosing projects, and sequencing projects over a career. This time, I’d like to talk about searching for a job.

A job search has a few predictable steps including selecting companies to apply to, preparing for interviews, and finally picking a job and negotiating an offer. In this letter, I’d like to focus on a framework that’s useful for many job seekers in AI, especially those who are entering AI from a different field.

If you’re considering your next job, ask yourself:

Are you switching roles? For example, if you’re a software engineer, university student, or physicist who’s looking to become a machine learning engineer, that’s a role switch.

Are you switching industries? For example, if you work for a healthcare company, financial services company, or a government agency and want to work for a software company, that’s a switch in industries.

A product manager at a tech startup who becomes a data scientist at the same company (or a different one) has switched roles. A marketer at a manufacturing firm who becomes a marketer in a tech company has switched industries. An analyst in a financial services company who becomes a machine learning engineer in a tech company has switched both roles and industries.

If you’re looking for your first job in AI, you’ll probably find switching either roles or industries easier than doing both at the same time. Let’s say you’re the analyst working in financial services:

If you find a data science or machine learning job in financial services, you can continue to use your domain-specific knowledge while gaining knowledge and expertise in AI.
After working in this role for a while, you’ll be better positioned to switch to a tech company (if that’s still your goal).

Alternatively, if you become an analyst in a tech company, you can continue to use your skills as an analyst but apply them to a different industry. Being part of a tech company also makes it much easier to learn from colleagues about practical challenges of AI, key skills to be successful in AI, and so on.

If you’re considering a role switch, a startup can be an easier place to do it than a big company. While there are exceptions, startups usually don’t have enough people to do all the desired work. If you’re able to help with AI tasks — even if it’s not your official job — your work is likely to be appreciated. This lays the groundwork for a possible role switch without needing to leave the company. In contrast, in a big company, a rigid reward system is more likely to reward you for doing your job well (and your manager for supporting you in doing the job for which you were hired), but it’s not as likely to reward contributions outside your job’s scope.

After working for a while in your desired role and industry (for example, a machine learning engineer in a tech company), you’ll have a good sense of the requirements for that role in that industry at a more senior level. You’ll also have a network within that industry to help you along. So future job searches — if you choose to stick with the role and industry — likely will be easier.

When changing jobs, you’re taking a step into the unknown, particularly if you’re switching either roles or industries. One of the most underused tools for becoming more familiar with a new role and/or industry is the informational interview.
I’ll share more about that in the next letter.

Keep learning!
Andrew

Published 2022-08-18 16:31. Original author: Andrew Ng. Original title: Andrew Ng's Letter: Tips for job hunting in AI. Original link: cid:link_4
-
Dear friends,

Last week, I wrote about switching roles, industries, or both as a framework for considering a job search. If you’re preparing to switch roles (say, taking a job as a machine learning engineer for the first time) or industries (say, working in an AI tech company for the first time), there’s a lot about your target job that you probably don’t know. A technique known as informational interviewing is a great way to learn.

An informational interview involves finding someone in a company or role you’d like to know more about and informally interviewing them about their work. Such conversations are separate from searching for a job. In fact, it’s helpful to interview people who hold positions that align with your interests well before you’re ready to kick off a job search.

Informational interviews are particularly relevant to AI. Because the field is evolving, many companies use job titles in inconsistent ways. In one company, data scientists might be expected mainly to analyze business data and present conclusions on a slide deck. In another, they might write and maintain production code. An informational interview can help you sort out what the AI people in a particular company actually do.

With the rapid expansion of opportunities in AI, many people will be taking on an AI job for the first time. In this case, an informational interview can be invaluable for learning what happens and what skills are needed to do the job well. For example, you can learn what algorithms, deployment processes, and software stacks a particular company uses.
You may be surprised — if you’re not already familiar with the data-centric AI movement — to learn how much time most machine learning engineers spend iteratively cleaning datasets.

Prepare for informational interviews by researching the interviewee and company in advance, so you can arrive with thoughtful questions. You might ask:

What do you do in a typical week or day?
What are the most important tasks in this role?
What skills are most important for success?
How does your team work together to accomplish its goals?
What is the hiring process?
Considering candidates who stood out in the past, what enabled them to shine?

Finding someone to interview isn’t always easy, but many people who are in senior positions today received help when they were new from those who had entered the field ahead of them, and many are eager to pay it forward. If you can reach out to someone who’s already in your network — perhaps a friend who made the transition ahead of you or someone who attended the same school as you — that’s great! Meetups such as Pie & AI can also help you build your network.

Finally, be polite and professional, and thank the people you’ve interviewed. And when you get a chance, please pay it forward as well and help someone coming up after you. If you receive a request for an informational interview from someone in the DeepLearning.AI community, I hope you’ll lean in to help them take a step up! If you’re interested in learning more about informational interviews, I recommend this article from the UC Berkeley Career Center.

I’ve mentioned a few times the importance of your network and community. People you’ve met, beyond providing valuable information, can play an invaluable role by referring you to potential employers.
Stay tuned for more on this topic.

Keep learning!
Andrew

Published 2022-08-25 14:47. Original author: Andrew Ng. Original title: Andrew Ng's Letter: Want to try a new job? Start with informational interviews! Original link: cid:link_3
-
Dear friends,

I’ve devoted several recent letters to building a career in AI. In this one, I’d like to discuss some fine points of finding a job.

The typical job search follows a fairly predictable path:

Research roles and companies online or by talking to friends.
Optionally, arrange informal informational interviews with people in companies that appeal to you.
Either apply directly or, if you can, get a referral from someone on the inside.
Interview with companies that give you an invitation.
Receive one or more offers and pick one. Or, if you don’t receive an offer, ask for feedback from the interviewers, the human resources staff, online discussion boards, or anyone in your network who can help you plot your next move.

Although the process may be familiar, every job search is different. Here are some tips to increase the odds you’ll find a position that supports your thriving and enables you to keep growing.

Pay attention to the fundamentals. A compelling resume, portfolio of technical projects, and a strong interview performance will unlock doors. Even if you have a referral from someone in a company, a resume and portfolio will be your first contact with many people who don’t already know about you. Update your resume and make sure it clearly presents your education and experience relevant to the role you want. Customize your communications with each company to explain why you’re a good fit. Before an interview, ask the recruiter what to expect. Take time to review and practice answers to common interview questions, brush up key skills, and study technical materials to make sure they are fresh in your mind. Afterward, take notes to help you remember what was said.

Proceed respectfully and responsibly. Approach interviews and offer negotiations with a win-win mindset.
Outrage spreads faster than reasonableness on social media, so a story about how an employer underpaid someone gets amplified, whereas stories about how an employer treated someone fairly do not. The vast majority of employers are ethical and fair, so don’t let stories about the small fraction of mistreated individuals sway your approach. If you’re leaving a job, exit gracefully. Give your employer ample notice, give your full effort through your last hour on the job, transition unfinished business as best you can, and leave in a way that honors the responsibilities you were entrusted with.

Choose who to work with. It’s tempting to take a position because of the projects you’ll work on. But the teammates you’ll work with are at least equally important. We’re influenced by people around us, so your colleagues will make a big difference. For example, if your friends smoke, the odds rise that you, too, will smoke. I don’t know of a study that shows this, but I’m pretty sure that if most of your colleagues work hard, learn continuously, and build AI to benefit all people, you’re likely to do the same. (By the way, some large companies won’t tell you who your teammates will be until you’ve accepted an offer. In this case, be persistent and keep pushing to identify and speak with potential teammates. Strict policies may make it impossible to accommodate you, but in my mind, that increases the risk of accepting the offer, as it increases the odds you’ll end up with a manager or teammates who aren’t a good fit.)

Get help from your community. Most of us go job hunting only a small number of times in our careers, so few of us get much practice at doing it well. Collectively, though, people in your immediate community probably have a lot of experience. Don’t be shy about calling on them. Friends and associates can provide advice, share inside knowledge, and refer you to others who may help.
I got a lot of help from supportive friends and mentors when I applied for my first faculty position, and many of the tips they gave me were very helpful.

I know that the job search process can be intimidating. Instead of viewing it as a great leap, consider an incremental approach. Start by identifying possible roles and conducting a handful of informational interviews. If these conversations tell you that you have more learning to do before you’re ready to apply, that’s great! At least you have a clear path forward. The most important part of any journey is to take the first step, and that step can be a small one.

Keep learning!
Andrew

Published 2022-09-01 19:33. Original author: Andrew Ng. Original title: Andrew Ng's Letter: Tips for job hunting in AI. Original link: cid:link_7
-
Dear friends,

A few weeks ago, the White House required that research papers funded by the U.S. government be available online promptly and freely by the end of 2025. Data that underlies those publications must also be made available.

I’m thrilled! Paywalled journals that block free access to scientific research are the bane of the academic community.

The AI world is fortunate to have shifted years ago to free online distribution of research papers, primarily through the arXiv site. I have no doubt that this has contributed to the rapid rise of AI and am confident that, thanks to the new U.S. policy, promoting a similar shift in other disciplines will accelerate global scientific progress.

In the year 2000 — before modern deep learning, and when dinosaurs still roamed the planet — AI researchers were up in arms against paywalled journals. Machine Learning Journal, a prominent journal of the time, refused to open up access. With widespread support from the AI community, MIT computer scientist Leslie Kaelbling started the free Journal of Machine Learning Research, and many researchers promptly began publishing there instead. This move led to the rapid decline of Machine Learning Journal. The Journal of Machine Learning Research remains a respected institution today, edited by David Blei and Francis Bach (both of whom are my former officemates at UC Berkeley).

Before the modern internet, journal publishers played an important role by printing and disseminating hard copies of papers. It was only fair that they could charge fees to recoup their costs and make a modest profit.
But in today’s research environment, for-profit journals rely mainly on academics to review papers for free, and they harvest the journals’ reputations (as reflected in metrics such as impact factor) to extract a profit.

Today, there are peer-reviewed journal papers, peer-reviewed conference papers, and non-peer-reviewed papers posted online directly by the authors. Journal articles tend to be longer and undergo peer review and careful revisions. In contrast, conference papers (such as NeurIPS, ICML, and ICLR articles) tend to be shorter and less carefully edited, and thus they can be published more quickly. And papers published on arXiv aren’t peer reviewed, so they can be published and reach interested readers immediately.

The benefits of rapid publication and distribution have caused a lot of the action to shift away from journals and toward conferences and arXiv. While the volume of research is overwhelming (that’s why The Batch tries to summarize the AI research that matters), the velocity at which ideas circulate has contributed to AI’s rise.

By the time the new White House guidance takes effect, a quarter century will have passed since machine learning researchers took a key step toward unlocking journal access. When I apply AI to healthcare, climate change, and other topics, I occasionally bump into an annoyingly paywalled article from these other disciplines. I look forward to seeing these walls come down.

Don’t underestimate the impact of freeing up knowledge.
I wish all these changes had taken place a quarter century ago, but I’m glad we’re getting there and look forward to the acceleration of research in all disciplines!

Keep learning!
Andrew

Published 2022-09-08 19:37. Original author: Andrew Ng. Original title: Andrew Ng's Letter: Promoting prompt, free publication of research results. Original link: cid:link_3
-
Dear friends,

Stable Diffusion, an image generation model that takes a text prompt and produces an image, was released a few weeks ago in a landmark event for AI. While similar programs like DALL·E and Craiyon can be used via API calls or a web user interface, Stable Diffusion can be freely downloaded and run on the user’s hardware.

I’m excited by the artwork produced by such programs (developer Simon Willison posted a fun tweetstorm that highlights some of the creativity they’ve unleashed), but I’m also excited by the ways in which other developers are incorporating it into their own drawing tools. Ironically, Stable Diffusion’s manner of release moves us closer to “open AI” than the way DALL·E was released by the company called OpenAI. Kudos to Emad Mostaque and his Stability AI team, which developed the program.

If you want to learn about how diffusion models like Stable Diffusion work, you can find a concise description here.

Image generation is still maturing, but it’s a big deal. Many people have the creativity to produce art but lack the drawing skill to do so. As an amateur illustrator (I like to draw pandas to entertain my daughter using the Procreate paint app), my meager skill limits what I can create. But sitting in front of the DALL·E or Stable Diffusion user interface, I can ask her what she wants to see a panda doing and render a picture for her.

Artists who have greater skill than I do can use image generators to create stunning artworks more efficiently. In fact, an image produced this way recently won an art competition at the Colorado State Fair.

The rise of inexpensive smartphone cameras brought an explosion in photography, and while expensive DSLRs still have a role, they now produce a minuscule fraction of all pictures taken.
I expect AI-powered image generators to do something similar in art: Ever-improving models and user interfaces will make it much more efficient to generate art using AI than without. I see a future where most art is generated using AI, and novices who have great creativity but little drawing skill will be able to participate.

My friend and collaborator Curt Langlotz, addressing the question of whether AI will replace radiologists, said that radiologists who use AI will replace radiologists who don’t. The same will be true here: Artists who use AI will (largely) replace artists who don’t. Imagine the transition in the 1800s from the time when each artist had to source their own minerals to mix shades of paint to when they could purchase ready-mixed paint in a tube. This development made it easier for any artist to paint whatever and whenever they wished. I see a similar transition ahead. What an exciting time!

Separately from generating images for human consumption, these algorithms have great potential to generate images for machine consumption. A number of companies have been developing image generation techniques to produce training images for computer vision algorithms. But because of the difficulty of generating realistic images, many have focused on vertical applications that are sufficiently valuable to justify their investment, such as generating road scenes to train self-driving cars or portraits of diverse faces to train face recognition algorithms.

Will image generation algorithms reduce the cost of data generation and other machine-to-machine processes? I believe so.
It will be interesting to see this space evolve.

Keep learning!
Andrew

Published 2022-09-15 15:35. Original author: Andrew Ng. Original title: Andrew Ng's Letter: Stable Diffusion opens new doors for artistic creation. Original link: cid:link_6
-
Dear friends,

Activities such as writing code and solving math problems are often perceived as purely intellectual pursuits. But this ignores the fact that they involve the mental equivalent of muscle memory.

The idea of muscle memory is a powerful concept in human learning. It has helped millions of people to understand the importance of practice in learning motor tasks. However, it’s also misleading because it excludes skills that don’t involve using muscles.

I believe that a similar principle operates in learning intellectual skills. Lack of recognition of this fact has made it harder for people to appreciate the importance of practice in acquiring those skills as well.

The phenomenon of muscle memory is widely acknowledged. When you repeatedly practice balancing on a bicycle, swinging a tennis racquet, or typing without looking at the keyboard, adaptations in your brain, nervous system, and muscles eventually allow you to carry out the task without having to consciously pay attention to it.

The brain and nervous system are central to learning intellectual skills, and these parts of the body also respond to practice. Whether you’re writing code, solving math problems, or playing chess, practice makes you better at it. It leads your brain to form mental chunks that allow you to reason at a higher level. For example, a novice programmer has to think carefully about every parenthesis or colon, but with enough practice, coding common subroutines can take little conscious effort. Practice frees up your attention to focus on higher-level architectural issues.

Of course, there are biological differences between learning motor skills and learning intellectual skills. For example, the former involves parts of the brain that specialize in movement.
And the physical world presents somewhat different challenges each time you perform an action (for example, your bicycle hits different bumps, and an opposing tennis player returns each of your serves differently). Thus practicing motor skills automatically leads you to try out your actions in different situations, which trains your brain to adapt to different problems.

But I think there are more similarities than people generally appreciate. While watching videos of people playing tennis can help your game, you can’t learn to play tennis solely by watching videos. Neither can you learn to code solely by watching videos of coding. You have to write code, see it sometimes work and sometimes not, and use that feedback to keep improving. Like muscle memory, this kind of learning requires training the brain and nervous system through repetition, focused attention, making decisions, and taking breaks between practice sessions to consolidate learning. And, like muscle memory, it benefits from variation: When practicing an intellectual task, we need to challenge ourselves to work through a variety of situations rather than, say, repeatedly solving the same coding problem.

All of this leads me to think that we need an equivalent term for muscle memory in the intellectual domain. As knowledge work has come to play a larger economic role relative to physical labor, the ability to learn intellectual tasks has become much more important than it was when psychologists formed the idea of muscle memory around 150 years ago. This new term would help people understand that practice is as crucial to developing intellectual skills as muscular ones.

How about intellect memory? It’s not an elegant phrase, but it acknowledges this under-appreciated reality of learning.

What intellectual task could you develop intellect memory for, and can you find time in your schedule to do the necessary practice?
After all, there’s no better way to learn.Keep learning!Andrew亲爱的朋友们:编写代码和解决数学问题等活动通常被视为纯粹的智力活动。但这忽略了一个事实,即它们也涉及类似“肌肉记忆”的心理机制。肌肉记忆是人类学习中一个强有力的概念。它帮助数百万人理解了练习在学习运动技能中的重要性。然而,它也有误导性,因为它排除了那些不需要使用肌肉的技能。我相信,学习智力技能时也有类似的原则在起作用。由于对这一事实认识不足,人们也很难认识到练习在掌握这些技能方面的重要性。肌肉记忆现象是公认的。当你反复练习骑自行车时保持平衡、挥动网球拍或不看键盘打字时,大脑、神经系统和肌肉的适应最终会让你无需刻意关注就能完成任务。大脑和神经系统是学习智力技能的核心,而身体的这些部位同样会对练习作出反应。无论你是在编写代码、解决数学问题还是下棋,练习都会让你做得更好。它会引导你的大脑形成心理组块(mental chunks),让你能在更高层次上进行推理。例如,新手程序员必须仔细考虑每个括号或冒号,但经过足够的练习,编写常见子程序几乎不需要付出有意识的努力。练习可以让你把注意力放在更高层次的架构问题上。当然,学习运动技能和学习智力技能之间存在生物学差异。例如,前者涉及大脑中专门负责运动的部分。每当你执行一个动作时,物理世界都会呈现出略有不同的挑战(例如,你的自行车碰到不同的颠簸,对面的网球运动员对你每次发球的回球都不同)。因此,练习运动技能会自然而然地引导你在不同情况下尝试你的动作,从而训练你的大脑去适应不同的问题。但我认为,它们之间的相似性比人们通常所认识到的要多。虽然观看别人打网球的视频有助于你的比赛,但你不能仅仅通过观看视频来学会打网球;同样,你也不能仅仅通过观看编码视频来学会编程。你必须亲自练习编写代码,看到它有时有效有时无效,并利用这些反馈不断改进。和肌肉记忆一样,这种学习需要通过重复、集中注意力、做出决定以及在练习之间休息来训练大脑和神经系统,以巩固学习成果。同样和肌肉记忆一样,它也能从变化中受益:在练习一项智力任务时,我们需要挑战自己去应对各种不同的情况,而不是反复解决同一个编码问题。这些都使我认为,我们需要一个在智力领域与肌肉记忆等效的术语。随着知识工作相对于体力劳动发挥着越来越大的经济作用,学习智力任务的能力已经变得比大约150年前心理学家提出肌肉记忆概念时重要得多。这个新术语将帮助人们理解,练习对于发展智力技能和发展肌肉技能同样重要。我们称之为“智能记忆”(intellect memory)怎么样?这不是一个优雅的说法,但它承认了这一被低估的学习现实。你会为哪项智力任务培养“智能记忆”?你能在日程中抽出时间做必要的练习吗?毕竟,没有比这更好的学习方法了。请不断学习!吴恩达发布于 2022-09-22 15:44原帖作者:吴恩达原帖标题:吴恩达来信:为“智能记忆”多加练习吧!原帖地址:cid:link_2
-
【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。Dear friends,Each year, AI brings wondrous advances. But, as Halloween approaches and the veil lifts between the material and ghostly realms, we see that spirits take advantage of these developments at least as much as humans do.As I wrote last week, prompt engineering, the art of writing text prompts to get an AI model to generate the output you want, is a major new trend. Did you know that the Japanese word for prompt — 呪文— also means spell or incantation? (Hat tip to natural language processing developer Paul O’Leary McCann.) The process of generating an image using a model like DALL·E 2 or Stable Diffusion does seem like casting a magic spell — not to mention these programs' apparent ability to reanimate long-dead artists like Pablo Picasso — so Japan's AI practitioners may be onto something.Some AI companies are deliberately reviving the dead. The startup HereAfter AIproduces chatbots that speak, sound, and look just like your long-lost great grandma. Sure, it's a simulation. Sure, the purpose is to help the living connect with deceased loved ones. When it comes to reviving the dead — based on what I've learned by watching countless zombie movies — I'm sure nothing can go wrong.I'm more concerned by AI researchers who seem determined to conjure ghastly creatures. Consider the abundance of recent research into transformers. Every transformer uses multi-headed attention. Since when is having multiple heads natural? Researchers are sneaking multi-headed beasts into our computers, and everyone cheers for the new state of the art! If there's one thing we know about transformers, it's that there's more than meets the eye.This has also been a big year for learning from masked inputs, and approaches like Masked Autoencoders, MaskGIT, and MaskViT have achieved outstanding performance in difficult tasks. 
So if you put on a Halloween mask, know that you're supporting a key idea behind AI progress.Trick or treat!Andrew亲爱的朋友们:人工智能每年都会带来惊人的进步。但是,随着万圣节的临近,物质世界和幽灵世界之间的面纱被缓缓揭开,我们看到,鬼魂对这些进展的利用至少不亚于人类。正如我在上周的来信中所写的,prompt engineering(提示词工程)——即编写文本提示以使AI模型生成所需输出的艺术——是一个重要的新趋势。你知道吗,日语中表示“提示”的词——呪文——也有咒语或符咒的意思?(感谢自然语言处理开发者Paul O'Leary McCann的提示。)使用DALL·E 2或Stable Diffusion等模型生成图像的过程确实像是施展魔法(更不用说这些程序似乎有能力让巴勃罗·毕加索等逝去已久的艺术家“复活”),所以日本的人工智能从业者可能真的发现了什么。一些人工智能公司正在刻意复活逝者。初创公司HereAfter AI制作的聊天机器人在谈吐、声音和外观上都像你逝去已久的曾祖母。当然,这只是一种模拟;当然,其目的是帮助生者与已故亲人保持联结。根据我从无数僵尸电影中学到的东西,谈到复活逝者时,我确信不会出任何问题。我更担心的是那些似乎决心召唤恐怖生物的人工智能研究人员。想想最近关于transformer的大量研究。每个transformer都使用多头注意力(multi-headed attention)。从什么时候开始,长着多个头成了一件自然的事?研究人员正在把多头怪兽偷偷放进我们的电脑,而所有人都在为新的最先进水平欢呼!如果说我们对transformer有一点了解的话,那就是事情远不止表面看上去那么简单。今年也是从掩码输入中学习大放异彩的一年,Masked Autoencoders、MaskGIT和MaskViT等方法在困难任务中取得了出色的表现。所以,如果你戴上万圣节面具,要知道你正在支持人工智能进步背后的一个关键思想。不给糖就捣蛋!吴恩达发布于 2022-10-27 14:20原帖作者:吴恩达原帖标题:吴恩达来信:人类和鬼魂都在使用AI?!原帖地址:cid:link_6
-
【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。Dear friends,Is prompt engineering — the art of writing text prompts to get an AI system to generate the output you want — going to be a dominant user interface for AI? With the rise of text generators such as GPT-3 and Jurassic and image generators such as DALL·E, Midjourney, and Stable Diffusion, which take text input and produce output to match, there has been growing interest in how to craft prompts to get the output you want. For example, when generating an image of a panda, how does adding an adjective such as “beautiful” or a phrase like “trending on artstation” influence the output? The response to a particular prompt can be hard to predict and varies from system to system.So is prompt engineering an important direction for AI, or is it a hack?Here’s how we got to this point:The availability of large amounts of text or text-image data enabled researchers to train text-to-text or text-to-image models.Because of this, our models expect text as input.So many people have started experimenting with more sophisticated prompts.Some people have predicted that prompt engineering jobs would be plentiful in the future. I do believe that text prompts will be an important way to tell machines what we want — after all, they’re a dominant way to tell other humans what we want. But I think that prompt engineering will be only a small piece of the puzzle, and breathless predictions about the rise of professional prompt engineers are missing the full picture.Just as a TV has switches that allow you to precisely control the brightness and contrast of the image — which is more convenient than trying to use language to describe the image quality you want — I look forward to a user interface (UI) that enables us to tell computers what we want in a more intuitive and controllable way.Take speech synthesis (also called text-to-speech).
Researchers have developed systems that allow users to specify which part of a sentence should be spoken with what emotion. Virtual knobs allow you to dial up or down the degree of different emotions. This provides fine control over the output that would be difficult to express in language. By examining an output and then fine-tuning the controls, you can iteratively improve the output until you get the effect you want.So, while I expect text prompts to remain an important part of how we communicate with image generators, I look forward to more efficient and understandable ways for us to control their output. For example, could a set of virtual knobs enable you to generate an image that is 30 percent in the style of Studio Ghibli and 70 percent the style of Disney? Drawing sketches is another good way to communicate, and I’m excited by img-to-img UIs that help turn a sketch into a drawing.Likewise, controlling large language models remains an important problem. If you want to generate empathetic, concise, or some other type of prose, is there an easier way than searching (sometimes haphazardly) among different prompts until you chance upon a good one?When I’m just playing with these models, I find prompt engineering a creative and fun activity; but when I’m trying to get to a specific result, I find it frustratingly opaque. Text prompts are good at specifying a loose concept such as “a picture of a panda eating bamboo,” but new UIs will make it easier to get the results we want. And this will help open up generative algorithms to even more applications; say, text editors that can adjust a piece of writing to a specific style, or graphics editors that can make images that look a certain way.Lots of exciting research ahead! 
I look forward to UIs that complement writing text prompts.Keep learning!Andrew亲爱的朋友们:Prompt engineering(提示工程)——即编写文本提示以让AI系统生成所需输出的艺术——会成为人工智能的主导用户界面吗?随着文本生成器(如GPT-3和Jurassic)以及接收文本输入并生成匹配输出的图像生成器(如DALL·E、Midjourney和Stable Diffusion)的兴起,人们对如何设计提示以获得想要的输出越来越感兴趣。例如,在生成熊猫图像时,添加诸如“beautiful”之类的形容词或诸如“trending on artstation”之类的短语会如何影响输出?对特定提示的响应可能很难预测,并且因系统而异。那么,prompt engineering是人工智能的一个重要方向,还是一种临时的取巧手段?我们是如何走到这一步的:大量文本或文本-图像数据的可用性使研究人员能够训练文本到文本或文本到图像的模型。正因为如此,我们的模型期望以文本作为输入。于是,许多人开始尝试更复杂的提示。一些人预测,未来会出现大量prompt engineering的工作岗位。我确实相信,文本提示将是告诉机器我们想要什么的一种重要方式——毕竟,它也是告诉其他人我们想要什么的主要方式。但我认为,prompt engineering只是整幅拼图中的一小块,那些关于专业提示工程师即将崛起的热切预测忽略了全局。正如电视机上的旋钮可以让你精确控制图像的亮度和对比度——这比试图用语言描述你想要的画质更方便——我期待一种用户界面(UI),使我们能够以更直观、更可控的方式告诉计算机我们想要什么。以语音合成(也称文本转语音)为例。研究人员开发了一些系统,允许用户指定句子的哪个部分应该带着何种情感朗读。虚拟旋钮允许你调高或调低不同情绪的程度。这提供了难以用语言表达的对输出的精细控制。通过检查输出、再微调这些控件,你可以迭代地改进输出,直到获得想要的效果。因此,虽然我预计文本提示仍将是我们与图像生成器沟通的重要方式,但我期待出现更高效、更易理解的方法来控制它们的输出。例如,一组虚拟旋钮能否让你生成一张30%是吉卜力工作室(知名日本动画工作室)风格、70%是迪士尼风格的图像?绘制草图是另一种很好的沟通方式,能把草图变成成品画作的img-to-img UI也令我感到兴奋。同样,控制大型语言模型仍然是一个重要问题。如果你想生成富有同理心的、简洁的或其他风格的文字,有没有比在不同提示之间(有时是漫无目的地)搜索、直到碰巧找到一个好提示更简单的方法?当我只是在把玩这些模型时,我发现prompt engineering是一项富有创造性的有趣活动;但当我试图得到一个具体的结果时,却发现它不透明得令人沮丧。文本提示很擅长指定一个宽泛的概念,例如“熊猫吃竹子的图片”,但新的UI将使我们更容易获得想要的结果。这将有助于把生成式算法开放给更多的应用;比如,可以把一段文字调整为特定风格的文本编辑器,或者可以生成具有特定外观图像的图形编辑器。未来还有许多令人兴奋的研究!我期待着与编写文本提示互为补充的UI。请不断学习!吴恩达发布于 2022-10-20 19:05原帖作者:吴恩达原帖标题:吴恩达来信:Prompt engineering的现状及未来原帖地址:cid:link_3
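信中“30%吉卜力、70%迪士尼”的虚拟旋钮设想,本质上可以理解为在某个风格表示空间里做加权插值。下面是一个假设性的玩具示例(其中的风格向量是随意选取的示意数值,并非任何真实模型的嵌入),仅用来说明“旋钮 = 插值权重”这一思路:

```python
def blend_styles(style_a, style_b, weight_a):
    """对两个风格向量做线性插值: weight_a*A + (1-weight_a)*B。"""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a 必须在 [0, 1] 区间内")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(style_a, style_b)]

# 示意: 两个假想的 4 维风格向量(数值仅为演示)
ghibli = [1.0, 0.0, 0.5, 0.2]
disney = [0.0, 1.0, 0.5, 0.8]

# “旋钮”拨到 30% 吉卜力 / 70% 迪士尼
mixed = blend_styles(ghibli, disney, 0.3)
print(mixed)
```

真实系统中这样的插值通常发生在模型学到的嵌入空间里,旋钮对应的就是这里的 weight_a。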
-
【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。Dear friends,The rise of AI over the last decade has been powered by the increasing speed and decreasing cost of GPUs and other accelerator chips. How long will this continue? The past month saw several events that might affect how GPU prices evolve.In September, Ethereum, a major blockchain that supports the cryptocurrency known as ether, completed a shift that significantly reduced the computation it requires. This shift — dubbed the Merge — should benefit the natural environment by consuming less energy. It will also decrease demand for GPUs to carry out cryptocurrency mining. (The Bitcoin blockchain remains computationally expensive.) I expect that lower demand will help lower GPU prices.On the other hand, Nvidia CEO Jensen Huang declared recently that the era in which chip prices could be expected to fall is over. Moore’s Law, the longstanding trend that has doubled the number of transistors that could fit in a given area of silicon roughly every two years, is dead, he said. It remains to be seen how accurate his prediction is. After all, many earlier reports of the death of Moore’s Law have turned out to be wrong. Intel continues to bet that it will hold up.That said, improvements in GPU performance have exceeded the pace of Moore’s Law as Nvidia has optimized its chips to process neural networks, while the pace of improvements in CPUs, which are designed to process a wider range of programming, has fallen behind. So even if chip manufacturers can’t pack silicon more densely with transistors, chip designers may be able to continue optimizing to improve the price/performance ratio for AI.I’m optimistic that AI practitioners will get the processing power they need. 
While much AI progress has been — and a meaningful fraction still is — driven by using cheaper computation to train bigger neural networks on bigger datasets, other engines of innovation now drive AI as well. Data-centric AI, small data, more efficient algorithms, and ongoing work to adapt AI to thousands (millions?) of new applications will keep things moving forward.Semiconductor startups have had a hard time in recent years because, by the time they caught up with any particular offering by market leader Nvidia, Nvidia had already moved on to a faster, cheaper product. If chip prices stop falling, they’ll have a bigger market opportunity — albeit with significant technical hurdles — to build competitive chips. The industry for AI accelerators remains dynamic. Intel and AMD are making significant investments and a growing number of companies are duking it out on the MLPerf benchmark that measures chip performance. I believe the options for training and inference in the cloud and at the edge will continue to expand.Keep learning!Andrew亲爱的朋友们:近十年来,人工智能的兴起得益于GPU及其他加速器芯片速度的提高和成本的降低。这个趋势会持续多久?过去一个月发生了一些可能影响GPU价格变化的事件。9月,支持以太加密货币的主要区块链以太坊(Ethereum)完成了一次转换,大大减少了所需的计算量。这种转变被称为“合并”(the Merge),它应该能够通过消耗更少的能源来造福自然环境。它还将减少对GPU进行加密货币挖掘的需求。(比特币区块链的计算成本仍然很高。)我预计较低的需求将有助于降低GPU价格。另一方面,英伟达首席执行官黄仁勋近日宣布,预计芯片价格下跌的时代已经结束。他说,摩尔定律已经过时,这一长期趋势使大约每两年可以在硅的某一特定区域内安装的晶体管数量翻了一番。这一预测有多准确尚待观察。毕竟,早年间许多关于摩尔定律已经过时的报道都被证明是错误的。英特尔就笃定它会持续下去。也就是说,GPU性能的改进已经超过了摩尔定律的速度,因为英伟达已经优化了其芯片以处理神经网络,而用于处理更大编程范围的CPU的改进速度已经落后。因此,即使芯片制造商无法用晶体管更密集地封装硅,芯片设计者也可以持续进行优化以提高AI的性价比。尽管全球范围内对芯片的生产和需求出现波动,我对人工智能从业者将获得他们需要的处理能力依然持乐观态度。虽然使用更廉价的计算在更大的数据集上训练更大的神经网络来推动人工智能已取得了很大的进步,但现在其他创新引擎也在推动人工智能的发展。以数据为中心的人工智能、小数据、更高效的算法,以及正在进行的使人工智能适应数千(百万?)新应用程序的开发工作将推动事情向前发展。近年来,半导体初创公司经历了一段艰难的时期,因为当他们赶上市场领导者英伟达提供的任何特定产品时,英伟达的重心却开始转向研发速度更快、更便宜的产品了。如果芯片价格停止下跌,他们将获得更大的市场机会来制造具有竞争力的芯片——尽管依然存在重大的技术壁垒。人工智能加速器行业仍然充满活力。英特尔和AMD正在进行重大投资,越来越多的公司正在MLPerf基准(用以衡量芯片性能)上进行较量。我相信在云端和边缘设备上进行训练和推理的选项将继续被扩展。请不断学习!吴恩达发布于 2022-10-13 15:38原帖作者:吴恩达原帖标题:吴恩达来信:AI, 
GPU和芯片的未来原帖地址:cid:link_5
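信中提到的摩尔定律——晶体管密度大约每两年翻一番——可以用一个简单的复利式计算来直观感受。下面是一个说明性的小例子(起始数值取 1 仅为演示相对倍数,并非任何真实芯片的数据):

```python
def transistors_after(years, initial=1.0, doubling_period=2.0):
    """按“每 doubling_period 年翻一番”的规律, 计算 years 年后的相对晶体管数量。"""
    return initial * 2 ** (years / doubling_period)

# 若该规律继续成立, 10 年后晶体管密度约为现在的 2**5 = 32 倍
print(transistors_after(10))
```

这也解释了为什么“摩尔定律是否终结”对长期算力成本的预测影响如此之大:指数曲线上哪怕几年的差异,倍数上就是数量级的差异。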
-
【编者按】吴恩达是AI界的翘楚。他在知乎上一直有个专题,叫做《吴恩达来信》。信中时常用中英文对AI的过去、现在和未来做了分析,对人醍醐灌顶或者发人深省。下面张小白从最新到最旧对这些文章做个转载,仅供学习之用,不用于商业目的,如有侵权请告知。Dear friends,When I wrote recently about how to build a career in AI, several readers wrote to ask specifically about AI product management: the art and science of designing compelling AI products. I’ll share lessons I’ve learned about this here and in future letters.A key concept in building AI products is iteration. As I’ve explained in past letters, developing a machine learning system is a highly iterative process. First you build something, then run experiments to see how it performs, then analyze the results, which enables you to build a better version based on what you’ve learned. You may go through this loop several times in various phases of development — collecting data, training a model, deploying the system — before you have a finished product.Why is development of machine learning systems so iterative? Because (i) when starting on a project, you almost never know what strange and wonderful things you’ll find in the data, and discoveries along the way will help you to make better decisions on how to improve the model; and (ii) it’s relatively quick and inexpensive to try out different models.Not all projects are iterative. For example, if you’re preparing a medical drug for approval by the U.S. government — an expensive process that can cost tens of millions of dollars and take years — you’d usually want to get the drug formulation and experimental design right the first time, since repeating the process to correct a mistake would be costly in time and money. 
Or if you’re building a space telescope (such as the wonderful Webb Space Telescope) that’s intended to operate far from Earth with little hope of repair if something goes wrong, you’d think through every detail carefully before you hit the launch button on your rocket.Iterating on projects tends to be beneficial when (i) you face uncertainty or risk, and building or launching something can provide valuable feedback that helps you reduce the uncertainty or risk, and (ii) the cost of each attempt is modest.This is why The Lean Startup, a book that has significantly influenced my thinking, advocates building a minimum viable product (MVP) and launching it quickly. Developing software products often involves uncertainty about how users will react, which creates risk for the success of the product. Making a quick-and-dirty, low-cost implementation helps you to get valuable user feedback before you’ve invested too much in building features that users don’t want. An MVP lets you resolve questions about what users want quickly and inexpensively, so you can make decisions and investments with greater confidence.When building AI products, I often see two major sources of uncertainty, which in turn creates risk:Users.The considerations here are similar to those that apply to building software products. Will they like it? Are the features you’re prioritizing the ones they’ll find most valuable? Is the user interface confusing?Data.Does your dataset have enough examples of each class? Which classes are hardest to detect? What ishuman-level performanceon the task, and what level of AI performance is reasonable to expect?A quick MVP or proof of concept, built at low cost, helps to reduce uncertainty about users and/or data. This enables you to uncover and address hidden issues that may hinder your success.Many product managers are used to thinking through user uncertainty and using iteration to manage risk in that dimension. 
AI product managers should also consider the data uncertainty and decide on the appropriate pace and nature of iteration to enable the development team to learn the needed lessons about the data and, given the data, what level of AI functionality and performance is possible.Keep learning!Andrew亲爱的朋友们:在我分享了关于如何在人工智能领域建立职业生涯的内容后,有几位读者来信想要具体了解对人工智能产品的管理:设计引人注目的AI产品的艺术和科学。我将在这周及以后的来信中分享我的经验。构建AI产品的一个关键概念是迭代。正如我在过去的来信中所阐释的,开发机器学习系统是一个高度迭代的过程。首先你需要构建一些东西,然后运行实验来检验它的性能,并分析结果,这使你能够根据所学内容构建更好的版本。在开发的各个阶段(收集数据、训练模型、部署系统),你可能会多次经历此循环,然后才能得到成品。为什么机器学习系统的开发会如此迭代?因为(i)在开始一个项目时,你几乎无法知道你会在数据中发现什么奇怪和奇妙的东西,在开发过程中发现将帮助你更好地决定如何改进模型;(ii)尝试不同的模型相对来说是一种既快速又低成本的方式。并非所有项目都是迭代的。例如在美国,如果你正在为某种药品申请政府的批准——这是一个昂贵的过程,可能需要花费数千万美元和数年时间。我们通常希望第一次尝试就正确地获得药物配方和实验设计,因为重复这一过程来纠错将耗费大量的时间和金钱。又或者,如果你正在建造一架太空望远镜(比如神奇的韦伯太空望远镜),目的是在远离地球的地方运行,一旦运行出现问题,几乎没有修复的希望,那么在按下火箭的发射按钮之前,你应该仔细检查每个细节。当(i)你面临不确定性或风险时,迭代项目往往是有益的,构建或启动某些东西可以提供有价值的反馈,帮助你减少不确定性或危险,以及(ii)每次尝试的成本比较适中。这就是为什么《精益创业》(The Lean Startup)一书对我的思想产生了重大影响,它提倡建立一个最低可行产品(minimum viable product, MVP)并迅速推出。开发软件产品通常会涉及用户反馈的不确定性,这为产品的成功创造了风险。创建一个快捷、低成本的实现可以帮助你在投入过多精力构建用户不想要的功能之前获得有价值的用户反馈。MVP可以让你快速、低成本地解决用户需求的问题,让你可以更有信心地做出决策和投资。在构建AI产品时,我发现通常有两个主要的不确定性来源,这会带来风险:用户。针对这个问题的注意事项与构建软件产品时的类似。用户会喜欢吗?你优先考虑的功能是他们认为最有价值的功能吗?用户界面令人困惑吗?数据。你的数据集中的每个类别是否都有足够的样本?哪些类别最难检测?任务的人类水平表现是什么?人工智能的水平是什么?进行快速MVP或概念验证、低成本完成构建,有助于减少用户和/或数据的不确定性。这使你能够发现并解决可能阻碍成功的隐藏问题。其他行业中的许多产品经理习惯于思考用户的不确定性,并使用迭代来管理该维度的风险。AI产品经理在此基础上还应考虑数据的不确定性,并决定适当的迭代速度和性质,以使开发团队能够学习有关数据的必要经验教训,并根据数据确定AI功能和性能的可实现级别。请不断学习!吴恩达发布于 2022-10-09 21:19原帖作者:吴恩达原帖标题:吴恩达来信:考虑用户和数据的不确定性原帖地址:cid:link_6
-
基于MindStudio的MindX SDK应用开发全流程

目录
一、MindStudio介绍与安装
  1. MindStudio介绍
  2. MindStudio安装
二、MindX SDK介绍与安装
  1. MindX SDK介绍
  2. MindX SDK安装
三、可视化流程编排介绍
  1. SDK基础概念
  2. 可视化流程编排
四、SE-Resnet介绍
五、开发过程
  1. 创建工程
  2. 代码开发
  3. 数据准备及模型准备
  4. 模型转换功能介绍
  5. 运行测试
六、遇见的问题

一、MindStudio介绍与安装
相关课程:昇腾全流程开发工具链(MindStudio)。本课程主要介绍MindStudio在昇腾AI开发中的使用。作为昇腾AI全栈中的全流程开发工具链,MindStudio提供覆盖训练模型、推理应用和自定义算子开发三个场景的端到端工具,极大提高开发效率。建议开发前学习该课程的第1章和第3章,可以少走很多弯路!

1. MindStudio介绍
MindStudio提供您在AI开发中所需的一站式开发环境,支持模型开发、算子开发以及应用开发三个主流程中的开发任务。依靠模型可视化、算力测试、IDE本地仿真调试等功能,MindStudio能够帮助您在一个工具上高效便捷地完成AI应用开发。MindStudio采用了插件化扩展机制,开发者可以通过开发插件来扩展已有功能。

功能简介:
针对安装与部署,MindStudio提供多种部署方式,支持多种主流操作系统,为开发者提供最大便利。
针对网络模型的开发,MindStudio支持TensorFlow、PyTorch、MindSpore框架的模型训练,支持多种主流框架的模型转换,并集成了训练可视化、脚本转换、模型转换、精度比对等工具,提升了网络模型移植、分析和优化的效率。
针对算子开发,MindStudio提供包含UT测试、ST测试、TIK算子调试等的全套算子开发流程,支持TensorFlow、PyTorch、MindSpore等多种主流框架的TBE和AI CPU自定义算子开发。
针对应用开发,MindStudio集成了Profiling性能调优、编译器、MindX SDK应用开发、可视化pipeline业务流编排等工具,为开发者提供了图形化的集成开发环境。通过MindStudio能够进行工程管理、编译、调试、性能分析等全流程开发,能够很大程度提高开发效率。

功能框架:
MindStudio功能框架如图1-1所示,目前含有的工具链包括:模型转换工具、模型训练工具、自定义算子开发工具、应用开发工具、工程管理工具、编译工具、流程编排工具、精度比对工具、日志管理工具、性能分析工具、设备管理工具等多种工具。
图1-1 工具链功能架构

工具功能:
MindStudio工具中的主要功能特性如下:
工程管理:为开发人员提供创建工程、打开工程、关闭工程、删除工程、新增工程文件目录和属性设置等功能。
SSH管理:为开发人员提供新增SSH连接、删除SSH连接、修改SSH连接、加密SSH密码和修改SSH密码保存方式等功能。
应用开发:针对业务流程开发人员,MindStudio工具提供基于AscendCL(Ascend Computing Language)和集成MindX SDK的应用开发编程方式,编程后的编译、运行、结果显示等一站式服务让流程开发更加智能化,可以让开发者快速上手。
自定义算子开发:提供了基于TBE和AI
CPU的算子编程开发的集成开发环境,让不同平台下的算子移植更加便捷,适配昇腾AI处理器的速度更快。离线模型转换:训练好的第三方网络模型可以直接通过离线模型工具导入并转换成离线模型,并可一键式自动生成模型接口,方便开发者基于模型接口进行编程,同时也提供了离线模型的可视化功能。日志管理:MindStudio为昇腾AI处理器提供了覆盖全系统的日志收集与日志分析解决方案,提升运行时算法问题的定位效率。提供了统一形式的跨平台日志可视化分析能力及运行时诊断能力,提升日志分析系统的易用性。性能分析:MindStudio以图形界面呈现方式,实现针对主机和设备上多节点、多模块异构体系的高效、易用、可灵活扩展的系统化性能分析,以及针对昇腾AI处理器的性能和功耗的同步分析,满足算法优化对系统性能分析的需求。设备管理:MindStudio提供设备管理工具,实现对连接到主机上的设备的管理功能。精度比对:可以用来比对自有模型算子的运算结果与Caffe、TensorFlow、ONNX标准算子的运算结果,以便用来确认神经网络运算误差发生的原因。开发工具包的安装与管理:为开发者提供基于昇腾AI处理器的相关算法开发套件包Ascend-cann-toolkit,旨在帮助开发者进行快速、高效的人工智能算法开发。开发者可以将开发套件包安装到MindStudio上,使用MindStudio进行快速开发。Ascend-cann-toolkit包含了基于昇腾AI处理器开发依赖的头文件和库文件、编译工具链、调优工具等。MindStudio安装具体安装操作请参考:MindStudio安装指南 MindStudio环境搭建指导视频场景介绍纯开发场景(分部署形态):在非昇腾AI设备上安装MindStudio和Ascend-cann-toolkit开发套件包。可作为开发环境仅能用于代码开发、编译等不依赖于昇腾设备的开发活动(例如ATC模型转换、算子和推理应用程序的纯代码开发)。如果想运行应用程序或进行模型训练等,需要通过MindStudio远程连接功能连接已部署好运行环境所需软件包的昇腾AI设备。开发运行场景(共部署形态):在昇腾AI设备上安装MindStudio、Ascend-cann-toolkit开发套件包、npu-firmware安装包、npu-driver安装包和AI框架(进行模型训练时需要安装)。作为开发环境,开发人员可以进行普通的工程管理、代码编写、编译、模型转换等功能。同时可以作为运行环境,运行应用程序或进行模型训练。软件包介绍MindStudio:提供图形化开发界面,支持应用开发、调试和模型转换功能,同时还支持网络移植、优化和分析等功能。Ascend-cann-toolkit:开发套件包。为开发者提供基于昇腾AI处理器的相关算法开发工具包,旨在帮助开发者进行快速、高效的模型、算子和应用的开发。开发套件包只能安装在Linux服务器上,开发者可以在安装开发套件包后,使用MindStudio开发工具进行快速开发。MindX SDK介绍与安装MindX SDK介绍MindX SDK提供昇腾AI处理器加速的各类AI软件开发套件(SDK),提供极简易用的API,加速AI应用的开发。应用开发旨在使用华为提供的SDK和应用案例快速开发并部署人工智能应用,是基于现有模型、使用pyACL提供的Python语言API库开发深度神经网络应用,用于实现目标识别、图像分类等功能。图2-1 MindX SDK总体结构通过MindStudio实现SDK应用开发分为基础开发与深入开发,通常情况下用户关注基础开发即可,基础开发主要包含如何通过现有的插件构建业务流并实现业务数据对接,采用模块化的设计理念,将业务流程中的各个功能单元封装成独立的插件,通过插件的串接快速构建推理业务。mxManufacture & mxVision关键特性:配置文件快速构建AI推理业务。插件化开发模式,将整个推理流程“插件化”,每个插件提供一种功能,通过组装不同的插件,灵活适配推理业务流程。提供丰富的插件库,用户可根据业务需求组合Jpeg解码、抠图、缩放、模型推理、数据序列化等插件。基于Ascend Computing Language(ACL),提供常用功能的高级API,如模型推理、解码、预处理等,简化Ascend芯片应用开发。支持自定义插件开发,用户可快速地将自己的业务逻辑封装成插件,打造自己的应用插件。MindX SDK安装步骤1 Windows场景下基于MindStuido的SDK应用开发,请先确保远端环境上MindX SDK软件包已安装完成,安装方式请参见《mxManufacture 用户指南》 和《mxVision 用户指南》 的“使用命令行方式开发”>“安装MindX SDK开发套件”章节。步骤2 
在Windows本地进入工程创建页面,工具栏点击File > Settings > Appearance & Behavior > System Settings > MindX SDK进入MindX SDK管理界面。界面中MindX SDK Location为软件包的默认安装路径,默认安装路径为“C:\Users\用户名\Ascend\mindx_sdk”。单击Install SDK进入Installation settings界面,如图2-2。图2-2 MindX SDK管理界面如图2-3所示,为MindX SDK的安装界面,各参数选择如下:Remote Connection:远程连接的用户及IP。Remote CANN Location:远端环境上CANN开发套件包的路径,请配置到版本号一级。Remote SDK Location:远端环境上SDK的路径,请配置到版本号一级。IDE将同步该层级下的include、opensource、python、samples文件夹到本地Windows环境,层级选择错误将导致安装失败。Local SDK Location:同步远端环境上SDK文件夹到本地的路径。默认安装路径为“C:\Users\用户名\Ascend\mindx_sdk”。图2-3 MindX SDK安装界面图2-4 安装完成后的MindX SDK管理界面步骤3 单击OK结束,返回SDK管理界面,可查看安装后的SDK的信息,如图2-4所示,可单击OK结束安装流程。可视化流程编排介绍SDK基础概念通过stream(业务流)配置文件,Stream manager(业务流管理模块)可识别需要构建的element(功能元件)以及element之间的连接关系,并启动业务流程。Stream manager对外提供接口,用于向stream发送数据和获取结果,帮助用户实现业务对接。Plugin(功能插件)表示业务流程中的基础模块,通过element的串接构建成一个stream。Buffer(插件缓存)用于内部挂载解码前后的视频、图像数据,是element之间传递的数据结构,同时也允许用户挂载Metadata(插件元数据),用于存放结构化数据(如目标检测结果)或过程数据(如缩放后的图像)。图3-1 SDK业务流程相关基础单元可视化流程编排MindX SDK实现功能的最小粒度是插件,每一个插件实现特定的功能,如图片解码、图片缩放等。流程编排是将这些插件按照合理的顺序编排,实现负责的功能。可视化流程编排是以可视化的方式,开发数据流图,生成pipeline文件供应用框架使用。图 4-2 为推理业务流Stream配置文件pipeline样例。配置文件以json格式编写,用户必须指定业务流名称、元件名称和插件名称,并根据需要,补充元件属性和下游元件名称信息。步骤1 进入工程创建页面,用户可通过以下方式开始流程编排。在顶部菜单栏中选择Ascend>MindX SDK Pipeline,打开空白的pipeline绘制界面绘制,也可打开用户自行绘制好的pipeline文件,如图3-3。绘制界面分为左侧插件库、中间编辑区、右侧插件属性展示区,具体参考pipeline绘制 。步骤2 在左侧编辑框选择插件,拖动至中间编辑框,按照用户的业务流程进行连接。如果拖动错误插件或者错误连线,选中错误插件或者错误连线单击键盘Del键删除。用户自定义的流水线绘制完成后,选中流水线中的所有插件,右键选择Set Stream Name设置Stream名称,如果有多条流水线则需要对每一条流水线设置Stream名称。绘制完成单击Save保存。图3-2 Detection and Classification配置pipeline样例图3-3 pipeline绘制界面SE-Resnet50介绍残差神经网络是何凯明提出的网络.在深度学习中,网络越深往往取得的效果越好,但是设计的网络过深后若干不为零的梯度相乘导致了梯度消失的现象影响了训练,在残差神经网络中借助其残差结构可以有效的避免梯度消失的问题,在imagenet数据集上取得了优异的结果.SE-Resnet50网络结构,如图4-1所示:图4-1 SE-Resnet50网络结构开发过程创建工程步骤一:安装完成后,点击 ”New Project” 创建新的项目,进入创建工程界面。选择Ascend App项目类别,然后就是常规的命名和路径修改,在CANN Version处点击change配置远程连接和远程CANN地址。图5-1-1 创建项目步骤二:点击CANN Version的Change后进入下界面进行相关配置,点击远程连接配置的最右侧添加按钮。图5-1-2 远程连接配置步骤三:在定义SSH 
配置、保存远程服务器的连接配置后返回Remote CANN Setting界面,继续配置CANN location。加载完后再点击Finish即可完成远程环境配置。图5-1-3 配置CANN location 步骤四:完成远程连接设置后,点击next会进到模板选择界面,由于我们是推理任务,此时我们选择MindX SDK Project(Python),再点击Finish。MindX SDK(昇腾行业SDK),提供面向不同行业使能的开发套件,简化了使用昇腾芯片推理业务开发的过程。SDK提供了以昇腾硬件功能为基础的功能插件,用户可以通过拼接功能插件,快速构建推理业务,也可以开发自定义功能插件。图5-1-4 MindX SDK Project(Python)代码开发代码地址:cid:link_2SDK相关工程目录结构:代码介绍:acc.py:求精度代码,在得到sdk推理结果之后会运行acc.py来求精度,具体使用会在本章节运行测试部分详细展示.data_to_bin.py:数据预处理代码,会将数据集中图片转换为二进制形式保存在路径中,具体使用会在本章节数据准备部分详细展示infer.py:里面包含了Sdk_Api这个类,其中主要使用到的函数为图5-2-1 将cv形式输入转换为sdk输入图5-2-2 得到输出结果config.py:已经写好的配置文件,运行时不需要改动Se_resnet50_ms_test.pipeline:pipeline配置文件,运行时需要修改其中的om文件路径,具体会在运行测试部分详细说明.main.py:推理时所运行的文件,会将所有经过预处理之后的二进制文件通过图5-2-1、5-2-2所示函数,得到推理结果.数据准备及模型准备数据集使用的是imagenet,在infer/sdk/目录下先创建一个文件夹“./dataset”,将910上经过数据预处理的图片保存为二进制放入,具体操作如下:在910服务器上执行文件data_to_bin.py图5-3-1 data_to_bin.py配置参数在文件中将数据集路径如图5-3-1所示改为实际路径之后,运行python data_to_bin.py.运行之后./dataset中会生成images与target两个文件夹,里面分别为图片经过预处理之后保存的二进制文件以及每个图片对应的lebel.图5-3-2 生成的images与target的二进制文件在准备好二进制文件后在910上导出onnx模型文件,保存到infer/sdk目录下。具体操作如下:按照图5-3-3所示位置修改pth路径信息,之后运行python pthtar2onnx.py图5-3-3修改pth路径信息图5-3-4 生成onnx文件运行之后会生成图5-3-4所示onnx文件.模型转换功能介绍用户使用torch框架训练好的第三方模型,在导出onnx模型后,可通过ATC工具将其转换为昇腾AI处理器支持的离线模型(*.om文件),模型转换过程中可以实现算子调度的优化、权重数据重排、内存使用优化等,可以脱离设备完成模型的预处理,详细架构如图5-4-1所示。图5-4-1 ATC工具功能架构在本项目中,要将pytorch框架下训练好的模型(*.onnx文件),转换为昇腾AI处理器支持的离线模型(*.om文件),具体步骤如下:步骤1 点击Ascend > Model Converter,进入模型转换界面,参数配置如图5-4-2所示,若没有CANN Machine,请参见第六章第一节CANN安装。图5-4-2 模型转换界面1各参数解释如下表所示:CANN MachineCANN的远程服务器Model File*.onnx文件的路径(可以在本地,也可以在服务器上)Model Name生成的om模型名字Output Path生成的om模型保存在本地的路径步骤2 点击Next进入图5-4-3界面,该项目数据不需要预处理,直接点击Next,进入图5-4-4界面,再点击Finish开始模型转换。图5-4-3 模型转换界面2图5-4-4 模型转换界面3步骤3 等待出现如图5-4-5所示的提示,模型转换成功图5-4-5模型转换成功运行测试步骤1 修改“sdk/config/SE-resnet50.pipeline”中的参数,具体操作如图5-5-1所示;图5-5-1 修改pipeline中*.om文件路径步骤2 在MindStudio工程界面,依次选择“Run > Edit Configurations...”,进入运行配置页面。选择“Ascend App > 工程名”配置应用工程运行参数,图5-5-2为配置示例。配置完成后,单击“Apply”保存运行配置,单击“OK”,关闭运行配置窗口。图5-5-2 
工程推理工程运行参数配置在本工程中,推理时运行文件选择main.py,运行参数为--img_path [LR_path] --dataset_name images --pipeline_path [pipeline_path] python3 main.py --img_path "/home/data/xd_mindx/csl/val/" --dataset_name images --pipeline_path "/home/data/xd_mindx/csl/infer/sdk/config/SE-resnet50_test.pipeline"参数解释如下表:参数解释我的设置img_path推理图片路径./val/images/pipeline_pathPipeline文件路径./config/Se_resnet50_ms_test.pipelineinfer_result_dir推理结果保存路径./infer_result/images/images/步骤3 点击运行,出现如图5-5-3所示提示,即为运行成功,infer_result文件夹中即为推理结果,保存为二进制形式。图5-5-3推理操作过程步骤4 配置后处理运行程序,在MindStudio工程界面,依次选择“Run > Edit Configurations...”,进入运行配置页面,如图5-5-4所示,点击“+”,后选择Python(后处理可以直接在本地运行),如图5-5-5所示。图5-5-4运行配置界面图5-5-5 运行后处理相关配置Script path运行文件路径Parameters运行时的参数如图5-5-5所示,运行文件为acc.py:步骤5 点击运行,出现如图5-5-6所示提示,即为运行成功。图5-5-6 运行后处理程序步骤6 结果分析,如图5-5-6所示,已经达到了标准精度。 遇见的问题在使用MindStudio时,遇到问题,可以登陆MindStudio昇腾论坛进行互动,提出问题,会有专家老师为你解答。模型转换时,没有CANN Machine图6-1 CANN管理界面解决方案:按以下步骤,重新安装CANN Machine步骤1 点击File>Settings>Appearance & Behavior > System Settings > CANN,进入CANN管理界面,如图6-1所示:步骤2 点击Change CANN,进入Remote CANN Setting界面,如图6-2所示重新安装CANN,点击Finish,安装CANN。6-2 Remote CANN Setting界面图6-3 安装CANN完成参数解释如下表:Remote Connection远程服务器IPRemote CANN location远程服务器中CANN路径步骤3 完成CANN安装,点击OK,重启MindStudio,如图6-3所示。MindStudio导入应用工程后,提示“No Python interpreter configured for the module”解决方案:步骤1 在顶部菜单栏中选择File > Project Structure,在Project Structure窗口中,点击Platform Settings > SDKs,点击上方的“+”添加Python SDK,从本地环境中导入Python,如图6-4所示。图6-4 导入Python SDK步骤2 点击Project Settings > Project,选择上一步添加的Python SDK,如图6-5所示。图6-5 设置Project SDK步骤3 点击Project Settings > Modules,选中“MyApp”,点击“+”后选择Python,为Python Interpreter选择上述添加的Python SDK。点击OK完成应用工程Python SDK配置,如图6-6所示。图6-6 选择Python SDK
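上文多次提到的 pipeline 配置文件是 json 格式:每条业务流(stream)由若干插件(element)组成,每个插件通过 factory 指定插件类型、通过 next 指定下游插件,并可在 props 中配置属性。下面是一个示意性的最小片段(业务流名称、插件属性和模型路径均为假设值,实际字段请以 mxVision 用户指南和工程中生成的 Se_resnet50_ms_test.pipeline 为准):

```json
{
    "se_resnet50": {
        "stream_config": { "deviceId": "0" },
        "appsrc0": {
            "factory": "appsrc",
            "next": "mxpi_tensorinfer0"
        },
        "mxpi_tensorinfer0": {
            "props": { "modelPath": "./models/se_resnet50.om" },
            "factory": "mxpi_tensorinfer",
            "next": "appsink0"
        },
        "appsink0": {
            "factory": "appsink"
        }
    }
}
```

运行测试步骤1中“修改 pipeline 中 *.om 文件路径”,对应的就是这里 mxpi_tensorinfer 插件的 modelPath 属性。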
-
使用MindStudio进行PSENet_MobileNetv3模型的onnx推理一.MindStudio介绍MindStudio提供在AI开发所需的一站式开发环境,支持模型开发、算子开发以及应用开发三个主流程中的开发任务。依靠模型可视化、算力测试、IDE本地仿真调试等功能,MindStudio能够在一个工具上就能高效便捷地完成AI应用开发。MindStudio采用了插件化扩展机制,开发者可以通过开发插件来扩展已有功能。 二.概述随着卷积神经网络的发展,场景文本检测取得了迅速的进展。然而,仍然存在两个挑战,阻碍了该算法在工业应用中的应用。一方面,大多数最先进的算法需要四边形边界框,这对于定位具有任意形状的文本是不准确的。另一方面,两个彼此接近的文本实例可能会导致覆盖两个实例的错误检测。传统上,基于分割的方法可以缓解第一个问题,但通常无法解决第二个问题。为了解决这两个问题,提出了一种新的渐进尺度扩展网络(PSENet),它可以精确地检测任意形状的文本实例。更具体地说,PSENet为每个文本实例生成不同尺度的内核,并逐渐将最小尺度内核扩展到具有完整形状的文本实例。由于最小尺度核之间存在较大的几何余量,此方法能够有效地分割封闭文本实例,从而更容易使用基于分割的方法来检测任意形状的文本实例。 PSENet模型如图所示,使用ResNet作为PSENet主干。将低级纹理特征与高级语义特征连接起来。这些特征图在F中进一步融合,用不同的视图编码信息,促进不同尺度的核的生成。然后将特征映射F投影到n个分支中,生成多个分割结果S1、S2、…、Sn。每个Si将是一个特定比例的所有文本实例的分段掩码。在这些掩码中,S1给出了具有最小尺度(即最小核)的文本实例的分割结果,Sn表示原始分割掩码(即最大核)。在获得这些分割掩模后,使用渐进尺度扩展算法将S1中的所有实例核逐渐扩展到Sn中的完整形状,并获得最终的检测结果R。ICDAR 2015(IC15)是文本检测的常用数据集。它总共包含1500张图片,其中1000张用于训练,其余用于测试。文本区域由四边形的4个顶点注释。本文介绍了基于MindStudio平台,将PaddlePaddle上开源的PSENet_MobileNetv3模型部署到Ascend平台上,并进行数据预处理、推理脚本的开发,在ICDAR 2015数据集上完成推理任务。三.推理环境准备本图文案例中,使用的环境为本地Windows 10安装的MindStudio和远程昇腾AI运行环境。使用的MindStudio版本为5.0.RC2,Ascend-cann-toolkit版本为远程环境下的5.1.RC1。3.1 windows端环境准备(1)根据MindStudio官方安装指南进行安装即可。 2 Linux端环境准备(1)配置conda环境,并安装项目依赖包。 (2)配置环境变量 四.创建工程(1)点击 “New Project” 创建新的项目,进入创建工程界面。 (2)选择Ascend App项目类别,然后就是常规的命名和路径修改,在CANN Version处点击change配置远程连接和远程CANN地址。 (3)当点击CANN Version的Change后进入下界面进行相关配置,点击远程连接配置的最右侧添加按钮。 (4)进入SSH Configurations后,可进行如图所示的连接配置。 (5) 完成远程连接设置后,点击next会进到模板选择界面,由于这里是推理任务,此时选择MindX SDK Project(Python),再点击Finish。 (6)配置本地和远程的项目的路径,通过选择界面菜单栏的File->Settings,找到其中Tools下的Deployment可配置本地项目路径和对应远程路径的映射。Deployment管理远程开发时的本地文件与远程文件的同步。 (7)配置本地,远端环境同步 (8)添加python解释器,选择远程环境下创建好的推理虚拟环境下的解释器。 点加号 添加好我们环境所需的解释器后,我们需要在Project和Modules下给项目配置。 五.执行推理5.1 数据预处理(1)ICDAR2015数据集下载链接:cid:link_0 数据集文件目录结构如下 (2)将原始数据集转换为模型输入的二进制数据,即bin文件,并生成数据集信息pkl文件,处理文件为pse_preprocess.py脚本文件。 (3)点击Run->Edit Configurations进入配置界面, 配置脚本运行环境。运行框中的参数为模型的配置文件。配置好后,点击运行,产生bin文件和数据集信息pkl文件,如下图所示。 5.2 模型转换5.2.1 训练模型转推理模型 首先将PSE文本检测训练过程中保存的模型,转换成inference 
model。点击run,选择Edit Configurations 运行export_model.py,配置好参数后,点击ok。 运行框中的参数分别为模型的配置文件,训练模型路径和推理模型的存储路径。 转换结果如图所示,5.2.2 导出onnx模型运行脚本paddle2onnx.sh,将inference model转换为onnx 参数分别为:推理模型路径,模型文件名,参数文件名,存储文件名,ONNX的op版本,输入形状和是否启动转换程序帮助我们检查模型。5.2.3 转换为om模型此步借助ATC(Ascend Tensor Compiler)工具,ATC工具将开源框架的网络模型(如Caffe、TensorFlow等)以及单算子Json文件,转换成昇腾AI处理器支持的离线模型,模型转换过程中可以实现算子调度的优化、权值数据重排、内存使用优化等,可以脱离设备完成模型的预处理。ATC工具功能架构如图所示: ATC工具运行流程如图所示: 首先,点击Ascend,接着点击Model Converter, 进入以下界面, Model File选择进行转换的 onnx 模型,Model Name 为模型名称,Target SoC Version 为需要转换成适配的芯片类型,并输入图片的Shape。 点击图中Model File最右边的按钮,可以生成可视化模型流程框架,帮助用户了解每层的结构与参数,如下图所示: 点击 ok,next,进行下一页的配置。 点击finish开始模型的转换。5.3 执行离线推理执行ais_infer.py脚本进行离线推理,测试推理模型的性能(包括吞吐率、时延).以batch size=1为例,batch size=4,8,16,32,64同理。参数分别为:模型路径,数据路径和推理结果存放路径。 推理结果如下图所示: batch size为1时生成的图片如图所示: 5.4精度验证使用后处理文件pse_postprocess.py,借助性能结果获取精度。其中的res_dir为不同batch size的性能结果存放目录, Info_dir为数据集配置文件存放目录。 执行脚本pse_postprocess.py即可进行模型精度验证,以batch size=64为例,其他batch size同理。 其中参数为:模型的配置文件路径 得到如下推理结果: F&Q1.在执行离线推理时,切换自己创建的虚拟环境时报错 最后发现是忘记切换到root用户下,输入su,输入密码解决。2.在执行离线推理时,报如下错误 寻求负责支撑推理的老师的帮助后,设置如下环境变量后就不会报错了。 3.在进行精度验证时,执行以下命令时,报错 最后发现是进入了tools目录下,执行cd ..进入主目录,执行以下命令就可以成功验证精度 大家遇到任何问题都可以去昇腾论坛,在帖子里提出自己关于项目的问题,会有华为内部技术人员进行答疑解惑。
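教程中提到的“将原始数据集转换为模型输入的二进制数据,即 bin 文件”,核心做法是把预处理后的浮点数组按原始字节写盘、推理前再按同样的 dtype 和 shape 读回。下面是一个与具体模型无关的最小示意(数组内容、文件名均为演示用假设值,并非 pse_preprocess.py 的原始实现):

```python
import os
import tempfile

import numpy as np

# 假设这是某张图片经过缩放、归一化后的输入张量 (NCHW)
dummy_input = np.random.rand(1, 3, 32, 32).astype(np.float32)

# 写成 .bin: 只保存原始字节, 不包含 shape/dtype 信息
bin_path = os.path.join(tempfile.gettempdir(), "demo_input.bin")
dummy_input.tofile(bin_path)

# 读回时必须显式给出与写入时一致的 dtype 和 shape
restored = np.fromfile(bin_path, dtype=np.float32).reshape(1, 3, 32, 32)
print(np.array_equal(dummy_input, restored))  # True
```

这也是为什么预处理脚本通常要额外生成一个数据集信息文件(如教程中的 pkl 文件):bin 文件本身不记录 shape 和 dtype,推理和后处理必须从别处获得这些元信息。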
-
A video tutorial for this article is available at the link below: Developing crowd density estimation for UAV scenarios with MindStudio.

1. Project introduction
Task scenario: a crowd density estimation algorithm for UAV (drone) scenarios.
Task description: deploy the model on the Ascend platform via the MindX SDK to perform online crowd density estimation from a drone.
Task goal: on the VisDrone2021 dataset, achieve a mean squared error (MSE) of no more than 35, a mean absolute error (MAE) of no more than 20, and at least 20 FPS for real-time detection.
Environment: Ascend device: Ascend 310; MindX SDK version: 2.0.4; CANN version: 5.0.4.

2. Model introduction
This project uses the DM-Count model (paper address linked). The model is built on VGG19 and developed with the PyTorch framework. Its paper shows that imposing Gaussian smoothing on the point annotations harms generalization; DM-Count needs no Gaussian processing of the ground-truth annotations, which improves performance. Since this project deploys the model on Ascend edge hardware via the MindX SDK, this article focuses on the problems met during MindX SDK deployment and their solutions, to help other MindX SDK developers avoid the same pitfalls. For training, only the code changes needed to train on the VisDrone2021 dataset are covered; the full training procedure can be found at the linked address. Download the DM-Count project and open its directory in MindStudio; the directory structure is shown in the figure. Because the server must be reached over an internal network, this article sets up a local experimental environment for DM-Count: first configure the Python environment, the SDK, and so on, following the steps in the figure below.

3. Model training
This project targets drone scenarios, so we first need to adapt the VisDrone2021 annotation files for DM-Count training. Download the VisDrone2021 dataset and move it into the DM-Count root directory; its structure is shown below. The GT_ folder holds the XML annotation files, RGB holds the drone-view RGB JPG images, and TIR holds the infrared grayscale JPG images; the images in the two folders correspond one to one. Because the VisDrone2021 test set has no ground-truth annotations (test accuracy is only available by submitting results on the official website), this project randomly splits the RGB data 4:1 into a training set of 1445 images and a test set of 362 images. Then create an xml2mat.py file in the DroneRGBT/Train directory, add the code shown in the figure, and run it; it converts the XML files into the .mat files required by preprocessing. Running it creates a mats folder in DroneRGBT/Train, whose .mat files are the annotations for the corresponding images. Next, create a spilt_tain_test.py file in the same directory to generate train.txt and val.txt, i.e., the lists of training and test images; the code and results are shown below. train.txt and val.txt contain the file names of the training and test images.

3.1 Modifying the annotation files
Training DM-Count requires preprocessing the dataset to generate density maps and training ground truth; the source code is shown below. The original code only preprocesses the QNRF and NWPU datasets, so we add lines 22-25 (the red box in the figure). VisDrone2021 images have a fixed size of 640*512, so the arguments passed to main() are: the input dataset path, the output dataset path, the image height, and the image width. In the DM-Count/preprocess/ folder, copy preprocess_dataset_nwpu.py, rename it preprocess_dataset_vis.py, and modify its main() function as shown in the figure. Then run the preprocessing file; it generates train and val folders, each containing the images, their density maps, and their ground truth: the .jpg files are the training or test images, the .npy files are the ground truth, and the files with the _densitymap.npy suffix are the density maps. The original DM-Count code is now fully modified; change train.py as shown in the figure, then click Run and wait for training to finish to obtain the PyTorch weight file.
Note during training: depending on the PyTorch version, the following problem may appear. Fix: modify the code in datasets/crowd.py as shown.

3.2 Model conversion
After training we have the PyTorch weight file, so we create pth2onnx.py in the project root to convert the weights into the ONNX file supported by the MindX SDK; the code is shown below. The weight file generated in Section 3.1 is in the ckpts folder, and the conversion produces Visdrone_CrowdCounting.onnx.

4. Development with the MindX SDK
The directory structure of the completed UAV_CrowdCounting project is shown below.

4.1 Creating a MindX SDK (Python) project
Create a new Ascend App project in the MindStudio IDE as shown: first select Ascend App, then name the project (here UAV_CrowdCounting); the default project location is fine. Next, configure the server-side CANN by clicking the Change button shown in the figure. After clicking Change, configure our server in the remote-connection dialog; once connected, locate the CANN installation folder on the server and select the version to use (my server has only version 5.0.4 installed). Click Finish and wait for MindStudio to complete the configuration. Once configured, the CANN version appears on the project-creation page; click Next. This project uses Python to develop UAV crowd density estimation on the MindX SDK; choose the template shown and open the project in a new window. The window changes as shown, completing the creation of the UAV_CrowdCounting project; click the red box to configure the executable file, selecting main.py as the program to run.

4.2 Configuring MindStudio's Python environment
Configure the lab server's development environment in MindStudio: first connect to the server and test the connection to make sure the folder mapping works. Map the local folder to a folder on the server; the path can be any directory convenient for development (this project maps to /home, red box below). Enable automatic code upload, so that pressing Ctrl+S uploads changes to the server. Then add a remote Python SDK so the server's Python environment can be run locally: add an SSH Interpreter (the remote Python interpreter used on the server), then assign the configured Python SDK to this project, following the steps in the figures.

4.3 ATC model conversion
The MindStudio environment for this project is now complete; create the directory layout shown. Before converting the model we need an aipp configuration file, named aipp.cfg, as shown. In it we use a static model, i.e., aipp_mode is set to static; the input image format is RGB888_U8; and color-space conversion and R/B channel swapping are disabled (csc_switch and rbuv_swap_switch), because we normalize the images during training. The normalization arithmetic of the aipp configuration is described at the linked reference: mean_chn_0, mean_chn_1, and mean_chn_2 are the per-channel means of the RGB image, min_chn_0, min_chn_1, and min_chn_2 are the per-channel minimums, and var_reci_chn_0, var_reci_chn_1, and var_reci_chn_2 are the reciprocals of the per-channel variances. In this project, training used both normalization and standardization, corresponding to PyTorch's transforms.ToTensor() and transforms.Normalize(): ToTensor() first scales the input image's pixel values into [0.0, 1.0] and converts the data to a tensor, while Normalize() standardizes the image per channel, here with mean = [0.485, 0.456, 0.406] (per-channel means) and std = [0.229, 0.224, 0.225] (per-channel standard deviations). The computation is shown in the figure. After finishing the aipp file, place the ONNX model trained above (or the provided download) into models. Then click the model-conversion button, add the ONNX model from the models folder, click Next, configure the model's image-preprocessing settings, and finally click Finish; the figure shows the conversion succeeding.

4.4 Configuring the plugins
In MindStudio, plugins can be arranged visually, as shown below. Create a new MindX SDK Pipeline, then search for or select the plugins the project needs, wire them together, and save the result as crowdcount.pipeline in the pipeline folder. In the new pipeline, first choose the input plugin (appsrc) to receive the image data. Then pick the image-decoding plugin (mxpi_imagedecoder) from the other plugins and connect appsrc to it; note that the image must be decoded with OpenCV and converted to the RGB color model to match the model's inference data format and avoid an accuracy drop. Next add the resizing plugin (mxpi_imageresize), again using OpenCV, to scale the input image to the model's required 640*512 input size. Then invoke the inference plugin (mxpi_tensorinfer) and load the converted om model (write a relative path, e.g. ./models/uav_crowdcounting_norm.om in this project; an absolute path causes errors or prevents the file from saving) to run inference on the input tensor. Finally use the output plugin appsink to fetch the data from the stream. Right-click each plugin to name it, as shown. The plugin configuration is now complete; save the file as crowdcount.pipeline in the pipeline folder.
Notes on plugin configuration:
1. Because inference in this project uses RGB data, the image-decoding plugin mxpi_imagedecoder must use the OpenCV processing mode, and its output must be set to the RGB data format.
2. The image normalization/standardization plugin mxpi_imagenormalize interacts with the model conversion above. In an early conversion attempt we used mxpi_imagenormalize for normalization and standardization instead of the aipp configuration file; the model converted successfully, but the inference results were all zero, for reasons still unknown. The fix was to configure the normalization and standardization in the aipp file.

4.5 Testing on an image
First configure the MindX SDK in MindStudio: click the button in the red box, open the MindX SDK configuration, and click Install SDK. The process resembles the CANN configuration: fill in the remote CANN and SDK paths, then wait for the configuration to finish; the figure shows the SDK after configuration. With the model, the pipeline, and the MindX SDK ready, we can test a single image. main.py contains the single-image test code, as shown in Figure 23. Creating the StreamManager and the pipeline is boilerplate that follows the usual workflow; the interesting part is the handling of the inference result (lines 69-78, red box). The model's final output is a 1*5120 one-dimensional tensor in which each element is a crowd density value, so the sum of the tensor is the crowd count for the image. For the density map, we reshape the 1*5120 tensor into a 64*80 array and apply min-max normalization, then save it, completing the single-image crowd density estimate. Pick any image from the val folder generated in Section 3.1, copy it into the data folder, rename it test.jpg, and click Run to estimate its crowd density. The result is shown in the figure; for the generated density map, create a file named vis_img.jpg and synchronize the server folder to view the density map locally.
Note when running the program: a "module not found" error may appear. Fix: open an SSH remote connection in MindStudio, create a .profile file in the user's home directory, and add the CANN and MindX SDK environment variables, following the steps shown.

4.6 Accuracy testing
Rename val.txt from Section 3 to visdrone_test.txt and place it in the data folder; create a VisDrone2021 folder, download the dataset from the official link, and put the dataset's RGB and GT_ directories into the new VisDrone2021 directory, as shown. After creating eval.py, write the test code into it (shown below). The core of the accuracy test is the Dataset class that loads the VisDrone test set: it first loads the image names listed in visdrone_test.txt, then reads each image's data with MxDataInput() while iterating; unlike training, testing uses the crowd counts from the XML files directly. Finally we compute the average FPS, the mean squared error (MSE), and the mean absolute error (MAE). Since main.py was set as MindStudio's run target, change the run target to eval.py before testing accuracy, as shown. Then start the accuracy test; the results are shown below. In the figure the FPS falls short of the required 20, probably because of some image-transfer overhead; when run on the server, every requirement is met, as the server results show.
Note during accuracy testing: the issue encountered is the same as in Section 4.5.

Getting help
If you have any questions, visit the MindStudio board of the Ascend forum for more information.
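The accuracy test in Section 4.6 reports MAE and MSE over predicted versus ground-truth crowd counts. As a minimal sketch of these metrics: note that crowd-counting work in the DM-Count tradition conventionally reports the root of the mean squared error under the name "MSE", and the helper name crowd_metrics and the sample counts below are invented for illustration:

```python
import math

def crowd_metrics(preds, gts):
    # MAE: mean absolute error between predicted and true crowd counts
    n = len(preds)
    mae = sum(abs(p - g) for p, g in zip(preds, gts)) / n
    # "MSE" in the crowd-counting literature is the root mean squared error
    mse = math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, gts)) / n)
    return mae, mse

# hypothetical counts for three test images
mae, mse = crowd_metrics([10, 20, 30], [12, 18, 33])
```

Because each image's predicted count is the sum of the 1*5120 density tensor, these metrics compare two scalars per image, which is why no density-map alignment is needed at evaluation time.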