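• [Discussion] How do I view the training details, i.e. the system logs from the training run?
    Complete beginner here and I can't find it: how do I view the training details, i.e. the system logs produced during training?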
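  • Huawei Cloud HCSD Campus Salon visits Ganzhou Vocational and Technical College
    On the afternoon of December 7, the Ganzhou Vocational and Technical College stop of the Huawei Cloud HCSD Campus Salon was held successfully. The event was hosted by Huawei Technologies Co., Ltd., organized by the Huawei Cloud Developer Alliance Product Department and Jiangxi Ganzhou Huawei Cloud, and co-organized by the college's Department of Information Engineering and Beijing Zhongsi Yuren Education Technology Co., Ltd. Huawei Cloud technical experts visited the college to discuss industry technology with student developers, help the campus ecosystem thrive, and support the college in cultivating innovative technical talent.
    At the start of the event, Zhang Junhua, head of the Department of Information Engineering, explained the purpose of the event and introduced the visiting guests and experts to the students. Vice President Ouyang Ronghua then said in an opening speech that school-enterprise cooperation is an innovation in how digital talent is cultivated and also carries forward a teaching model that combines theory with practice, which matters greatly for the college's talent cultivation. Holding the HCSD Campus Salon on campus strengthens students' professionalism, technical skills, and practical ability, and further strengthens collaborative innovation between the college and industry.
    Wang He, head of the Jiangxi Huawei Cloud Government Cloud Operations Department, said that Huawei Cloud and the college have long cooperated closely on talent cultivation, academic research, and innovation research in disciplines such as the Internet of Things, cloud computing, computer network technology, and electronic information engineering. He added that Huawei Cloud will work with the college to create a better teaching environment and training conditions for students and to cultivate outstanding talent for emerging engineering education.
    Online, Gao Xun, operations director of the Huawei Developer Innovation Center, gave a keynote titled "Huawei Open Capabilities Empower Talent Cultivation in the New Era", approaching the topic from the two-way drive between industry development and talent cultivation. She introduced Huawei's open capabilities and the operating model of the Huawei Developer Innovation Application Center. She highlighted Huawei Cloud's talent cultivation strategy: ecosystem measures for universities are defined at each stage of the developer journey, a rich set of resources and activities supports industry-education integration, and multi-dimensional, interactive, hands-on teaching helps colleges cultivate high-level innovative and entrepreneurial talent. Students on site took an active part in the Q&A with the experts.
    Qing Yao, a Huawei Cloud EI developer ecosystem engineer, gave an accessible online talk covering the positioning of artificial intelligence, the history of AI technology, the difficulties of getting started and the challenges AI engineers face, the ModelArts one-stop AI development and management platform, the AI Gallery community, and an introduction to Huawei Cloud developer certification and its benefits. The talk also included a step-by-step demonstration of how to develop a new AI model and how to switch an anime-style image among different styles, which improved the students' practical skills, let them experience the fun of development hands-on, and strengthened their confidence in pursuing the digital field.
    As the event drew to a close, the venue offered students exam vouchers for Huawei Cloud developer certification, with the aim of helping them study for and earn certification, work on the cloud in practice, and become capable developers, which in turn supports the talent ecosystem of the digital industry. Huawei Cloud has long worked closely with universities and industry-education integration organizations to incubate the developer talent ecosystem in Jiangxi. Going forward, Huawei Cloud will continue to take root in Jiangxi Province, accelerate full-stack innovation, provide developers with practical technical support, and, based on the Huawei Cloud ecosystem, offer campus developers a richer set of development tools, learning and exchange opportunities, and practice platforms.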
  • [Hot Event] Rippling river, blazing red maple leaves
    Keywords: A bleak night......
  • sea
    prompt: sunshine, sea, beach.
  • Young girl
    Young girl, young girl, young girl, young girl
  • Romantic rose
    Even amid ice and snow, the rose's romance cannot be hidden.
  • Training a Chinese chess AI with the AlphaZero reinforcement learning algorithm
    Training a Chinese chess AI with the AlphaZero reinforcement learning algorithm

    Case objectives
    By working through this case and the homework exercise, you will:
    - understand the AlphaZero reinforcement learning algorithm;
    - run one round of Chinese chess AI training with the AlphaZero algorithm.
    You can also share the ipynb study notes for this case in the AI Gallery Notebook section to earn growth points; see this document for how to share.

    Case overview
    AlphaZero is a reinforcement learning algorithm. AI trained with AlphaZero has recently defeated several Go and chess champions by a wide margin. AlphaZero's innovation is that it reaches superhuman performance in board games without relying on external prior knowledge (also called expert knowledge), knowing only the rules of the game.
    This case explains the core principles of the AlphaZero algorithm in detail, including neural network construction, MCTS search, and self-play training, and uses code to deepen the understanding of the algorithm. For details of the algorithm, see also the paper "Mastering the game of Go without human knowledge". The case also provides a Chinese chess reinforcement learning environment, runs one round of Chinese chess training with AlphaZero, and finally visualizes a self-play game of the chess AI.
    Because training a strong Chinese chess AI takes a large amount of time and resources, this case focuses on understanding the algorithm: it simplifies the training process and reduces the number of self-play games and searches. To train a complete Chinese chess AlphaZero AI, subscribe to the 《CChess中国象棋》 algorithm in AI Gallery and train it in ModelArts.

    Notes
    - This case runs on TensorFlow-1.13.1, and a GPU is recommended; see the "ModelArts JupyterLab Hardware Specification Guide" for how to switch hardware specifications;
    - if this is your first time using JupyterLab, see the "ModelArts JupyterLab User Guide";
    - if you hit errors while using JupyterLab, see "ModelArts JupyterLab Troubleshooting" for possible fixes;
    - run each code block below in order, one at a time.

    Experiment steps
    - Program initialization
    - Build the neural network
    - Implement MCTS
    - Implement the self-play process
    - Configure the training parameters
    - Start self-play training
    - Update the model
    - Visualize a game

    1. Program initialization

    Step 1: Install the base dependencies
    Make sure all dependencies install successfully before running the later code. If a module fails to install because of a network problem, simply retry.

    !pip install tornado==6.1.0
    !pip install tflearn==0.3.2
    !pip install tqdm
    !pip install urllib3==1.22
    !pip install threadpool==1.3.2
    !pip install xmltodict==0.12.0
    !pip install requests
    !pip install pandas==0.19.2
    !pip install numpy==1.14.5
    !pip install scipy==1.1.0
    !pip install matplotlib==2.0.0
    !pip install nest_asyncio
    !pip install gast==0.2.2

    Step 2: Download the dependency package

    import os
    import moxing as mox

    if not os.path.exists('cchess_training'):
        mox.file.copy("obs://modelarts-labs-bj4/course/modelarts/reinforcement_learning/cchess_gameplay/cchess_training/cchess_training.zip", "cchess_training.zip")
        os.system('unzip cchess_training.zip')

    Step 3: Import the required libraries

    %matplotlib notebook
    %matplotlib auto

    import os
    import sys
    import logging
    import subprocess
    import copy
    import random
    import json
    import asyncio
    import time
    import numpy as np
    import tensorflow as tf

    from multiprocessing import Process

    from cchess_training.cchess_zero import board_visualizer
    from cchess_training.gameplays import players, gameplay
    from cchess_training.config import conf
    from cchess_training.common.board import create_uci_labels
    from cchess_training.cchess_training_model_update import model_update
    from cchess_training.cchess_zero.gameboard import GameBoard
    from cchess_training.cchess_zero import cbf
    from cchess_training.utils import get_latest_weight_path

    import nest_asyncio
    nest_asyncio.apply()

    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
    logging.basicConfig(
        level=logging.INFO,
        format="[%(asctime)s] [%(levelname)s] [%(message)s]",
        datefmt='%Y-%m-%d %H:%M:%S'
    )

    2. Build the neural network
    The AlphaZero neural network is implemented here on top of ResNet. The network input is a 0-1 image derived from the current board position, with shape [10, 9, 14]: [10, 9] is the size of the Chinese chess board, and each of the 14 planes corresponds to one piece type, 7 for our side (soldier, cannon, chariot, horse, elephant, advisor, general) and 7 for the opponent. After ResNet extracts features, the network splits into two branches: a value head, which outputs the value of the current position, and a policy head, which outputs the probability the network assigns to each action.

    # resnet
    def res_block(inputx, name, training, block_num=2, filters=256, kernel_size=(3, 3)):
        net = inputx
        for i in range(block_num):
            net = tf.layers.conv2d(
                net,
                filters=filters,
                kernel_size=kernel_size,
                activation=None,
                name="{}_res_conv{}".format(name, i),
                padding='same'
            )
            net = tf.layers.batch_normalization(net, training=training, name="{}_res_bn{}".format(name, i))
            if i == block_num - 1:
                # residual (skip) connection on the last conv of the block
                net = net + inputx
            net = tf.nn.elu(net, name="{}_res_elu{}".format(name, i))
        return net


    def conv_block(inputx, name, training, block_num=1, filters=2, kernel_size=(1, 1)):
        net = inputx
        for i in range(block_num):
            net = tf.layers.conv2d(
                net,
                filters=filters,
                kernel_size=kernel_size,
                activation=None,
                name="{}_convblock_conv{}".format(name, i),
                padding='same'
            )
            net = tf.layers.batch_normalization(net, training=training, name="{}_convblock_bn{}".format(name, i))
            net = tf.nn.elu(net, name="{}_convblock_elu{}".format(name, i))
        # net shape [None,10,9,2]
        netshape = net.get_shape().as_list()
        net = tf.reshape(net, shape=(-1, netshape[1] * netshape[2] * netshape[3]))
        net = tf.layers.dense(net, 10 * 9, name="{}_dense".format(name))
        net = tf.nn.elu(net, name="{}_elu".format(name))
        return net


    def res_net_board(inputx, name, training, filters=256, num_res_layers=4):
        net = inputx
        net = tf.layers.conv2d(
            net,
            filters=filters,
            kernel_size=(3, 3),
            activation=None,
            name="{}_res_convb".format(name),
            padding='same'
        )
        net = tf.layers.batch_normalization(net, training=training, name="{}_res_bnb".format(name))
        net = tf.nn.elu(net, name="{}_res_elub".format(name))
        for i in range(num_res_layers):
            net = res_block(net, name="{}_layer_{}".format(name, i + 1), training=training, filters=filters)
        return net


    def get_scatter(name):
        with tf.variable_scope("Test"):
            ph = tf.placeholder(tf.float32, name=name)
            op = tf.summary.scalar(name, ph)
        return ph, op


    def average_gradients(tower_grads):
        """Calculate the average gradient for each shared variable across all towers.
        Note that this function provides a synchronization point across all towers.
        Args:
          tower_grads: List of lists of (gradient, variable) tuples. The outer list
            is over individual gradients. The inner list is over the gradient
            calculation for each tower.
        Returns:
           List of pairs of (gradient, variable) where the gradient has been averaged
           across all towers.
        """
        average_grads = []
        for grad_and_vars in zip(*tower_grads):
            # Note that each grad_and_vars looks like the following:
            #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
            grads = []
            for g, _ in grad_and_vars:
                # Add 0 dimension to the gradients to represent the tower.
                expanded_g = tf.expand_dims(g, 0)
                # Append on a 'tower' dimension which we will average over below.
                grads.append(expanded_g)
            # Average over the 'tower' dimension.
            grad = tf.concat(grads, 0)
            grad = tf.reduce_mean(grad, 0)
            # Keep in mind that the Variables are redundant because they are shared
            # across towers. So .. we will just return the first tower's pointer to
            # the Variable.
            v = grad_and_vars[0][1]
            grad_and_var = (grad, v)
            average_grads.append(grad_and_var)
        return average_grads


    def add_grad_to_list(opt, train_param, loss, tower_grad):
        grads = opt.compute_gradients(loss, var_list=train_param)
        grads = [i[0] for i in grads]
        tower_grad.append(zip(grads, train_param))


    def get_op_mul(tower_gradients, optimizer, gs):
        grads = average_gradients(tower_gradients)
        train_op = optimizer.apply_gradients(grads, gs)
        return train_op


    def reduce_mean(x):
        return tf.reduce_mean(x)


    def merge(x):
        return tf.concat(x, axis=0)


    def get_model_resnet(
            model_name,
            labels,
            gpu_core=[0],
            batch_size=512,
            num_res_layers=4,
            filters=256,
            extra=False,
            extrav2=False):
        tf.reset_default_graph()
        graph = tf.Graph()
        with graph.as_default():
            x_input = tf.placeholder(tf.float32, [None, 10, 9, 14])
            nextmove = tf.placeholder(tf.float32, [None, len(labels)])
            score = tf.placeholder(tf.float32, [None, 1])
            training = tf.placeholder(tf.bool, name='training_mode')
            learning_rate = tf.placeholder(tf.float32)
            global_step = tf.train.get_or_create_global_step()
            optimizer_policy = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
            optimizer_value = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
            optimizer_multitarg = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
            tower_gradients_policy, tower_gradients_value, tower_gradients_multitarg = [], [], []
            net_softmax_collection = []
            value_head_collection = []
            multitarget_loss_collection = []
            value_loss_collection = []
            policy_loss_collection = []
            accuracy_select_collection = []
            # number of samples each device receives from one batch
            part = batch_size // len(gpu_core)
            with tf.variable_scope(tf.get_variable_scope()) as vscope:
                for ind, one_core in enumerate(gpu_core):
                    devicestr = "/gpu:{}".format(one_core) if one_core is not None else '/cpu:0'
                    lo, hi = ind * part, (ind + 1) * part
                    with tf.device(devicestr):
                        body = res_net_board(
                            x_input[lo:hi],
                            "selectnet",
                            training=training,
                            filters=filters,
                            num_res_layers=num_res_layers
                        )
                        # policy head
                        with tf.variable_scope("policy_head"):
                            policy_head = tf.layers.conv2d(body, 2, 1, padding='SAME')
                            policy_head = tf.contrib.layers.batch_norm(
                                policy_head,
                                center=False,
                                epsilon=1e-5,
                                fused=True,
                                is_training=training,
                                activation_fn=tf.nn.relu
                            )
                            policy_head = tf.reshape(policy_head, [-1, 9 * 10 * 2])
                            policy_head = tf.contrib.layers.fully_connected(policy_head, len(labels), activation_fn=None)
                        # value head
                        with tf.variable_scope("value_head"):
                            value_head = tf.layers.conv2d(body, 1, 1, padding='SAME')
                            value_head = tf.contrib.layers.batch_norm(
                                value_head,
                                center=False,
                                epsilon=1e-5,
                                fused=True,
                                is_training=training,
                                activation_fn=tf.nn.relu
                            )
                            value_head = tf.reshape(value_head, [-1, 9 * 10 * 1])
                            value_head = tf.contrib.layers.fully_connected(value_head, 256, activation_fn=tf.nn.relu)
                            value_head = tf.contrib.layers.fully_connected(value_head, 1, activation_fn=tf.nn.tanh)
                            value_head_collection.append(value_head)
                        net_unsoftmax = policy_head
                        with tf.variable_scope("Loss"):
                            policy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                                labels=nextmove[lo:hi],
                                logits=net_unsoftmax))
                            value_loss = tf.losses.mean_squared_error(
                                labels=score[lo:hi],
                                predictions=value_head)
                            value_loss = tf.reduce_mean(value_loss)
                            regularizer = tf.contrib.layers.l2_regularizer(scale=1e-5)
                            regular_variables = tf.trainable_variables()
                            l2_loss = tf.contrib.layers.apply_regularization(regularizer, regular_variables)
                            multitarget_loss = value_loss + policy_loss + l2_loss
                            multitarget_loss_collection.append(multitarget_loss)
                            value_loss_collection.append(value_loss)
                            policy_loss_collection.append(policy_loss)
                        net_softmax = tf.nn.softmax(net_unsoftmax)
                        net_softmax_collection.append(net_softmax)
                        correct_prediction = tf.equal(tf.argmax(nextmove, 1), tf.argmax(net_softmax, 1))
                        with tf.variable_scope("Accuracy"):
                            accuracy_select = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
                            accuracy_select_collection.append(accuracy_select)
                        tf.get_variable_scope().reuse_variables()
                        trainable_params = tf.trainable_variables()
                        tp_policy = [i for i in trainable_params if ('value_head' not in i.name)]
                        tp_value = [i for i in trainable_params if ('policy_head' not in i.name)]
                        add_grad_to_list(optimizer_policy, tp_policy, policy_loss, tower_gradients_policy)
                        add_grad_to_list(optimizer_value, tp_value, value_loss, tower_gradients_value)
                        add_grad_to_list(optimizer_multitarg, trainable_params, multitarget_loss, tower_gradients_multitarg)
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies(update_ops):
                train_op_policy = get_op_mul(tower_gradients_policy, optimizer_policy, global_step)
                train_op_value = get_op_mul(tower_gradients_value, optimizer_value, global_step)
                train_op_multitarg = get_op_mul(tower_gradients_multitarg, optimizer_multitarg, global_step)
            net_softmax = merge(net_softmax_collection)
            value_head = merge(value_head_collection)
            multitarget_loss = reduce_mean(multitarget_loss_collection)
            value_loss = reduce_mean(value_loss_collection)
            policy_loss = reduce_mean(policy_loss_collection)
            accuracy_select = reduce_mean(accuracy_select_collection)
        with graph.as_default():
            config = tf.ConfigProto()
            config.gpu_options.allow_growth = True
            config.allow_soft_placement = True
            sess = tf.Session(config=config)
            if model_name is not None:
                # restore weights from an existing checkpoint
                with graph.as_default():
                    saver = tf.train.Saver(var_list=tf.global_variables())
                    saver.restore(sess, model_name)
            else:
                with graph.as_default():
                    sess.run(tf.global_variables_initializer())
        return (sess, graph), ((x_input, training), (net_softmax, value_head))

    3. Implement MCTS
    AlphaZero uses MCTS to generate games through self-play. In brief, MCTS search works as follows:
    - each simulation traverses the tree by selecting the edge that maximizes the action value Q plus an upper confidence bound U, where U depends on the stored prior probability P and the edge's visit count N (incremented on every visit);
    - the leaf node is expanded and the position s is evaluated by the neural network; the values of the vector P are stored on the edges expanded from the leaf node;
    - the action value Q is updated to the mean of all evaluations V in the subtree below that action;
    - once the MCTS search is complete, the move probabilities π for position s are returned.

    def softmax(x):
        probs = np.exp(x - np.max(x))
        probs /= np.sum(probs)
        return probs


    class TreeNode(object):
        """A node in the MCTS tree. Each node keeps track of its own value Q, prior probability P, and
        its visit-count-adjusted prior score u.
        """

        def __init__(self, parent, prior_p, noise=False):
            self._parent = parent
            self._children = {}  # a map from action to TreeNode
            self._n_visits = 0
            self._Q = 0
            self._u = 0
            self._P = prior_p
            self.virtual_loss = 0
            self.noise = noise

        def expand(self, action_priors):
            """Expand tree by creating new children.
            action_priors: a list of tuples of actions and their prior probability according to the policy function.
            """
            # dirichlet noise should be applied when every select action,
            # so this branch is deliberately disabled (`if False and ...`)
            if False and self.noise is True and self._parent is None:
                noise_d = np.random.dirichlet([0.3] * len(action_priors))
                for (action, prob), one_noise in zip(action_priors, noise_d):
                    if action not in self._children:
                        prob = (1 - 0.25) * prob + 0.25 * one_noise
                        self._children[action] = TreeNode(self, prob, noise=self.noise)
            else:
                for action, prob in action_priors:
                    if action not in self._children:
                        self._children[action] = TreeNode(self, prob)

        def select(self, c_puct):
            """Select action among children that gives maximum action value Q plus bonus u(P).
            Return: A tuple of (action, next_node)
            """
            if self.noise is False:
                return max(self._children.items(), key=lambda act_node: act_node[1].get_value(c_puct))
            elif self.noise is True and self._parent is not None:
                return max(self._children.items(), key=lambda act_node: act_node[1].get_value(c_puct))
            else:
                # root node with noise enabled: mix Dirichlet noise into the priors
                noise_d = np.random.dirichlet([0.3] * len(self._children))
                return max(list(zip(noise_d, self._children.items())),
                           key=lambda act_node: act_node[1][1].get_value(c_puct, noise_p=act_node[0]))[1]

        def update(self, leaf_value):
            """Update node values from leaf evaluation.
            leaf_value: the value of subtree evaluation from the current player's perspective.
            """
            # Count visit.
            self._n_visits += 1
            # Update Q, a running average of values for all visits.
            self._Q += 1.0 * (leaf_value - self._Q) / self._n_visits

        def update_recursive(self, leaf_value):
            """Like a call to update(), but applied recursively for all ancestors.
            """
            # If it is not root, this node's parent should be updated first.
            if self._parent:
                self._parent.update_recursive(-leaf_value)
            self.update(leaf_value)

        def get_value(self, c_puct, noise_p=None):
            """Calculate and return the value for this node.
            It is a combination of leaf evaluations Q, and this node's prior adjusted for its visit count, u.
            c_puct: a number in (0, inf) controlling the relative impact of value Q, and prior probability P,
            on this node's score.
            """
            if noise_p is None:
                self._u = (c_puct * self._P * np.sqrt(self._parent._n_visits) / (1 + self._n_visits))
                return self._Q + self._u + self.virtual_loss
            else:
                self._u = (c_puct * (self._P * 0.75 + noise_p * 0.25) * np.sqrt(self._parent._n_visits) / (1 + self._n_visits))
                return self._Q + self._u + self.virtual_loss

        def is_leaf(self):
            """Check if leaf node (i.e. no nodes below this have been expanded)."""
            return self._children == {}

        def is_root(self):
            return self._parent is None


    class MCTS(object):
        """An implementation of Monte Carlo Tree Search."""

        def __init__(
                self,
                policy_value_fn,
                c_puct=5,
                n_playout=10000,
                search_threads=32,
                virtual_loss=3,
                policy_loop_arg=False,
                dnoise=False,
                play=False
        ):
            """
            policy_value_fn: a function that takes in a board state and outputs a list of (action, probability)
                tuples and also a score in [-1, 1] (i.e. the expected value of the end game score from the current
                player's perspective) for the current player.
            c_puct: a number in (0, inf) that controls how quickly exploration converges to the maximum-value policy.
                A higher value means relying on the prior more.
            """
            self._root = TreeNode(None, 1.0, noise=dnoise)
            self._policy = policy_value_fn
            self._c_puct = c_puct
            self._n_playout = n_playout
            self.virtual_loss = virtual_loss
            self.loop = asyncio.get_event_loop()
            self.policy_loop_arg = policy_loop_arg
            self.sem = asyncio.Semaphore(search_threads)
            self.now_expanding = set()
            self.select_time = 0
            self.policy_time = 0
            self.update_time = 0
            self.num_proceed = 0
            self.dnoise = dnoise
            self.play = play

        async def _playout(self, state):
            """Run a single playout from the root to the leaf, getting a value at the leaf and
            propagating it back through its parents. State is modified in-place, so a copy must be provided.
            """
            async with self.sem:
                node = self._root
                road = []
                while 1:
                    while node in self.now_expanding:
                        await asyncio.sleep(1e-4)
                    start = time.time()
                    if node.is_leaf():
                        break
                    # Greedily select next move.
                    action, node = node.select(self._c_puct)
                    road.append(node)
                    node.virtual_loss -= self.virtual_loss
                    state.do_move(action)
                    self.select_time += (time.time() - start)

                # at leave node if long check or long catch then cut off the node
                if state.should_cutoff() and not self.play:
                    # cut off node
                    for one_node in road:
                        one_node.virtual_loss += self.virtual_loss
                    # now at this time, we do not update the entire tree branch, the accuracy loss is supposed to be small
                    # set virtual loss to -inf so that other threads would not
                    # visit the same node again(so the node is cut off)
                    node.virtual_loss = - np.inf
                    self.update_time += (time.time() - start)
                    # however the proceed number still goes up 1
                    self.num_proceed += 1
                    return

                start = time.time()
                self.now_expanding.add(node)
                # Evaluate the leaf using a network which outputs a list of
                # (action, probability) tuples p and also a score v in [-1, 1]
                # for the current player
                if self.policy_loop_arg is False:
                    action_probs, leaf_value = await self._policy(state)
                else:
                    action_probs, leaf_value = await self._policy(state, self.loop)
                self.policy_time += (time.time() - start)

                start = time.time()
                # Check for end of game.
                end, winner = state.game_end()
                if not end:
                    node.expand(action_probs)
                else:
                    # for end state,return the "true" leaf_value
                    if winner == -1:  # tie
                        leaf_value = 0.0
                    else:
                        leaf_value = (
                            1.0 if winner == state.get_current_player() else -1.0
                        )
                # Update value and visit count of nodes in this traversal.
                for one_node in road:
                    one_node.virtual_loss += self.virtual_loss
                node.update_recursive(-leaf_value)
                self.now_expanding.remove(node)
                # node.update_recursive(leaf_value)
                self.update_time += (time.time() - start)
                self.num_proceed += 1

        def get_move_probs(self, state, temp=1e-3, predict_workers=[], can_apply_dnoise=False, verbose=False, infer_mode=False):
            """Run all playouts sequentially and return the available actions and their corresponding probabilities.
            state: the current game state
            temp: temperature parameter in (0, 1] controls the level of exploration
            """
            if can_apply_dnoise is False:
                self._root.noise = False
            coroutine_list = []
            for n in range(self._n_playout):
                state_copy = copy.deepcopy(state)
                coroutine_list.append(self._playout(state_copy))
            coroutine_list += predict_workers
            self.loop.run_until_complete(asyncio.gather(*coroutine_list))

            # calc the move probabilities based on visit counts at the root node
            act_visits = [(act, node._n_visits) for act, node in self._root._children.items()]
            acts, visits = zip(*act_visits)
            act_probs = softmax(1.0 / temp * np.log(np.array(visits) + 1e-10))

            if infer_mode:
                info = [(act, node._n_visits, node._Q, node._P) for act, node in self._root._children.items()]
                return acts, act_probs, info
            else:
                return acts, act_probs

        def update_with_move(self, last_move, allow_legacy=True):
            """Step forward in the tree, keeping everything we already know about the subtree.
            """
            self.num_proceed = 0
            if last_move in self._root._children and allow_legacy:
                self._root = self._root._children[last_move]
                self._root._parent = None
            else:
                self._root = TreeNode(None, 1.0, noise=self.dnoise)

        def __str__(self):
            return "MCTS"

    4. Implement the self-play process
    This step implements self-play training. Both players are initialized from the same neural network, and before every move each player uses MCTS to search for the best move. The game record is saved after each self-play game ends.

    # Self-play
    class Game(object):
        def __init__(self, white, black, verbose=True):
            self.white = white
            self.black = black
            self.verbose = verbose
            self.gamestate = gameplay.GameState()

        def play_till_end(self):
            winner = 'peace'
            moves = []
            peace_round = 0
            remain_piece = gameplay.countpiece(self.gamestate.statestr)
            while True:
                start_time = time.time()
                if self.gamestate.move_number % 2 == 0:
                    player_name = 'w'
                    player = self.white
                else:
                    player_name = 'b'
                    player = self.black
                move, score = player.make_move(self.gamestate)
                if move is None:
                    # the side to move has no move: the other side wins
                    winner = 'b' if player_name == 'w' else 'w'
                    break
                moves.append(move)
                total_time = time.time() - start_time
                logging.info('move {} {} play {} use {:.2f}s'.format(
                    self.gamestate.move_number,
                    player_name,
                    move,
                    total_time,))
                game_end, winner_p = self.gamestate.game_end()
                if game_end:
                    winner = winner_p
                    break
                remain_piece_round = gameplay.countpiece(self.gamestate.statestr)
                if remain_piece_round < remain_piece:
                    remain_piece = remain_piece_round
                    peace_round = 0
                else:
                    peace_round += 1
                # draw if too many rounds pass without a capture
                if peace_round > conf.non_cap_draw_round:
                    winner = 'peace'
                    break
            return winner, moves


    class NetworkPlayGame(Game):
        def __init__(self, network_w, network_b, **xargs):
            whiteplayer = players.NetworkPlayer('w', network_w, **xargs)
            blackplayer = players.NetworkPlayer('b', network_b, **xargs)
            super(NetworkPlayGame, self).__init__(whiteplayer, blackplayer)


    class ContinousNetworkPlayGames(object):
        def __init__(
                self,
                network_w=None,
                network_b=None,
                white_name='net',
                black_name='net',
                random_switch=True,
                recoard_game=True,
                recoard_dir='data/distributed/',
                play_times=np.inf,
                distributed_dir='data/prepare_weight',
                **xargs
        ):
            self.network_w = network_w
            self.network_b = network_b
            self.white_name = white_name
            self.black_name = black_name
            self.random_switch = random_switch
            self.play_times = play_times
            self.recoard_game = recoard_game
            self.recoard_dir = recoard_dir
            self.xargs = xargs
            # self.distributed_server = distributed_server
            self.distributed_dir = distributed_dir

        def begin_of_game(self, epoch):
            pass

        def end_of_game(self, cbf_name, moves, cbfile, training_dt, epoch):
            pass

        def play(self, data_url=None, epoch=0):
            num = 0
            while num < self.play_times:
                time_one_game_start = time.time()
                num += 1
                self.begin_of_game(epoch)
                if self.random_switch and random.random() < 0.5:
                    self.network_w, self.network_b = self.network_b, self.network_w
                    self.white_name, self.black_name = self.black_name, self.white_name
                network_play_game = NetworkPlayGame(self.network_w, self.network_b, **self.xargs)
                winner, moves = network_play_game.play_till_end()
                stamp = time.strftime('%Y-%m-%d_%H-%M-%S', time.localtime(time.time()))
                date = time.strftime('%Y-%m-%d', time.localtime(time.time()))
                cbfile = cbf.CBF(
                    black=self.black_name,
                    red=self.white_name,
                    date=date,
                    site='北京',  # "Beijing"
                    name='noname',
                    datemodify=date,
                    redteam=self.white_name,
                    blackteam=self.black_name,
                    round='第一轮'  # "Round 1"
                )
                cbfile.receive_moves(moves)
                randstamp = random.randint(0, 1000)
                cbffilename = '{}_{}_mcts-mcts_{}-{}_{}.cbf'.format(
                    stamp, randstamp, self.white_name, self.black_name, winner)
                if not os.path.exists(self.recoard_dir):
                    os.makedirs(self.recoard_dir)
                cbf_name = os.path.join(self.recoard_dir, cbffilename)
                cbfile.dump(cbf_name)
                training_dt = time.time() - time_one_game_start
                self.end_of_game(cbffilename, moves, cbfile, training_dt, epoch)


    class DistributedSelfPlayGames(ContinousNetworkPlayGames):
        def __init__(self, gpu_num=0, auto_update=True, mode='train', **kwargs):
            self.gpu_num = gpu_num
            self.auto_update = auto_update
            self.model_name_in_use = None  # for tracking latest weight
            self.mode = mode
            super(DistributedSelfPlayGames, self).__init__(**kwargs)

        def begin_of_game(self, epoch):
            """
            when self playing, init network player using the latest weights
            """
            if not self.auto_update:
                return
            latest_model_name = get_latest_weight_path()
            logging.info('------------------ restoring model {}'.format(latest_model_name))
            model_path = os.path.join(self.distributed_dir, latest_model_name)
            if self.network_w is None or self.network_b is None:
                network = get_model_resnet(
                    model_path,
                    create_uci_labels(),
                    gpu_core=[self.gpu_num],
                    filters=conf.network_filters,
                    num_res_layers=conf.network_layers
                )
                self.network_w = network
                self.network_b = network
                self.model_name_in_use = model_path
            else:
                if model_path != self.model_name_in_use:
                    (sess, graph), ((X, training), (net_softmax, value_head)) = self.network_w
                    with graph.as_default():
                        saver = tf.train.Saver(var_list=tf.global_variables())
                        saver.restore(sess, model_path)
                    self.model_name_in_use = model_path

        def end_of_game(self, cbf_name, moves, cbfile, training_dt, epoch):
            played_games = len(os.listdir(conf.distributed_datadir))
            if self.mode == 'train':
                logging.info('------------------ epoch {}: trained {} games, this game used {}s'.format(
                    epoch,
                    played_games,
                    round(training_dt, 6),
                ))
            else:
                logging.info('------------------ infer {} games, this game used {}s'.format(
                    played_games,
                    round(training_dt, 6),
                ))


    def self_play_gpu(gpu_num=0, play_times=np.inf, mode='train', n_playout=50, save_dir=conf.distributed_datadir):
        logging.info('------------------ self play start')
        cn = DistributedSelfPlayGames(
            gpu_num=gpu_num,
            n_playout=n_playout,
            recoard_dir=save_dir,
            c_puct=conf.c_puct,
            distributed_dir=conf.distributed_server_weight_dir,
            dnoise=True,
            is_selfplay=True,
            play_times=play_times,
            mode=mode,
        )
        cn.play(epoch=0)
        logging.info('------------------ self play done')

    5. Configure the training parameters
    Configure the number of self-play games in one training run, the number of inference games played with the trained model after training, and the training batch_size. All parameters are kept small to simplify the training process.

    config = {
        "n_playout": 100,  # number of MCTS searches, recommended 10-1600; smaller runs faster, larger searches more accurately
        "self_play_games": 2,  # number of self-play games, recommended 5-10000; large values may exceed the free trial time of the resource
        "infer_games": 1,  # number of inference games
        "gpu_num": 0,  # index of the GPU to use
    }

    6. Start self-play training, then update the model
    The moves of both sides are printed while this runs.

    self_play_gpu(config['gpu_num'], config['self_play_games'], mode='train', n_playout=config['n_playout'])

    # model update
    model_update(gpu_num=config['gpu_num'])

    7. Visualize a game (run this step only after step 6 has finished)
    This step loads the model and plays one game, visualizes the course of the game, and finally shows the board at the end of the game.

    self_play_gpu(config['gpu_num'], config['infer_games'], mode='infer', n_playout=config['n_playout'], save_dir='./infer_res')

    Load the game file and show every move of both sides. Each move is given as the start coordinate followed by the end coordinate on the board; see the board further below for how coordinates are defined.

    %reload_ext autoreload
    %autoreload 2
    %matplotlib inline
    from matplotlib import pyplot as plt
    from cchess_training.cchess_zero.gameboard import *
    from PIL import Image
    import imageio

    gameplay_path = './infer_res'
    while not os.path.exists(gameplay_path) or len(os.listdir(gameplay_path)) == 0:
        time.sleep(5)
        logging.info('Step 6 has not finished; consider stopping and re-running the steps one by one')
    gameplays = os.listdir(gameplay_path)
    fullpath = '{}/{}'.format(gameplay_path, random.choice(gameplays))
    moves = cbf.cbf2move(fullpath)
    fname = fullpath.split('/')[-1]
    print(moves)

    ['b2e2', 'h7h5', 'b0c2', 'b7e7', 'h0g2', 'h5i5', 'h2i2', 'a9a7', 'i0i1', 'i5g5', 'c2e1', 'h9i7', 'c3c4', 'e6e5', 'a0b0', 'e7g7', 'g3g4', 'c6c5', 'i1h1', 'i7g8', 'e2e5', 'i9i8', 'g2f4', 'b9c7', 'g4g5', 'c7b9', 'f4e6', 'a7e7', 'e6g7', 'e7e5', 'b0b9', 'c5c4', 'i2e2', 'c4d4', 'e2e5', 'i8i7', 'b9c9', 'i7g7', 'h1h8', 'i6i5', 'a3a4', 'g7a7', 'i3i4', 'i5i4', 'g5g6', 'i4i3', 'h8g8', 'i3h3', 'g8f8', 'a7a8', 'f8f9', 'e9f9', 'g6f6', 'a6a5', 'a4a5', 'd4e4', 'e3e4', 'g9e7', 'g0e2', 'a8g8', 'c9d9', 'g8g1']

    Visualize the course of the game

    import cv2
    from IPython.display import clear_output, Image, display

    state = gameplay.GameState()
    statestr = 'RNBAKABNR/9/1C5C1/P1P1P1P1P/9/9/p1p1p1p1p/1c5c1/9/rnbakabnr'
    for move in moves:
        clear_output(wait=True)
        statestr = GameBoard.sim_do_action(move, statestr)
        img = board_visualizer.get_board_img(statestr)
        img_show = cv2.cvtColor(img, cv2.COLOR_RGBA2BGR)
        display(Image(data=cv2.imencode('.jpg', img_show)[1]))
        time.sleep(0.5)

    Show the final board position

    plt.figure(figsize=(8,8))
    plt.imshow(board_visualizer.get_board_img(statestr))

    This concludes the case. To train a complete Chinese chess AlphaZero AI, subscribe to the 《CChess中国象棋》 algorithm in AI Gallery and train it in ModelArts.

    8. Homework
    Adjust the training parameters in step 5 and train a new model so that it performs better in the game.
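    As a supplement to the MCTS step of the case above, the two formulas it relies on can be tried in isolation. The sketch below is not the case's code: it uses only the standard library, the function names `puct_score` and `visit_probs` are made up for illustration, and the move statistics are hypothetical. It follows the same expressions as `TreeNode.get_value` (Q plus c_puct * P * sqrt(N_parent) / (1 + N_child), without the virtual-loss term) and `MCTS.get_move_probs` (softmax of log visit counts divided by the temperature). The parent visit count is approximated here as the sum of the child visit counts, which is an assumption.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=5.0):
    """Selection score Q + U with U = c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u


def visit_probs(visits, temp=1.0):
    """Turn root visit counts into move probabilities, pi proportional to N^(1/temp)."""
    logits = [math.log(n + 1e-10) / temp for n in visits]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical root statistics for three moves: (Q, prior P, visit count N).
children = {
    "b2e2": (0.10, 0.5, 8),
    "h2e2": (0.05, 0.3, 1),
    "c3c4": (0.00, 0.2, 1),
}
# Assumption: approximate the parent's visit count by the sum over its children.
parent_visits = sum(n for _, _, n in children.values())

# The move the next simulation would descend into.
best = max(
    children,
    key=lambda a: puct_score(children[a][0], children[a][1], parent_visits, children[a][2]),
)

# Move probabilities returned after the search, at two temperatures.
probs = visit_probs([n for _, _, n in children.values()], temp=1.0)
sharp = visit_probs([n for _, _, n in children.values()], temp=1e-3)
print(best, probs, sharp)
```

    With these numbers the rarely visited move with a decent prior gets the highest selection score, which is how the U term drives exploration, while the returned probabilities still favor the most visited move. A temperature near zero, such as the default `temp=1e-3` in `get_move_probs`, makes the choice almost deterministic.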
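  • [Discussion] Can datasets subscribed to in AI Gallery be used in the SDK?
    Can datasets subscribed to in AI Gallery be used in the SDK?
  • [Discussion] Can a model trained online be deployed offline on a local machine?
    Can a model trained online be deployed offline on a local machine?
  • [Discussion] How do I fix a RuntimeError in Notebook?
    Error message: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory. Refreshing and restarting did not help. How can I fix this?
  • [Discussion] The "ModelArts Examples" tab on the Jupyter page fails to load, and creating a folder on the Jupyter page fails
    After I open a Notebook that uses a GPU, the "ModelArts Examples" tab on the Jupyter page fails to load, the engine type fails to load, and creating a folder on the Jupyter page fails.
  • [Discussion] The checkpoints folder in the notebook cannot be opened
    The checkpoints folder I created cannot be opened.
  • [Discussion] What is the relationship between HiLens and ModelArts?
    The relationship between HiLens and ModelArts
  • [Discussion] How are apis defined? Why is the parameter configuration empty?
    When creating an AI application, how should apis be defined? It always shows as empty after creation.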
  • [Other] AI Competition
    Dying polar bear standing on collapsing glacier, white brown hair, mess, melting glacier, collapse, black sky, endless white, ocean, spectacular detail, volume lighting, dramatic lighting -9:16 - Test - Creative, realistic