-
LLaMA-Factory Multi-Node Multi-GPU Training

To train large models across multiple machines and GPUs, we can use LLaMA-Factory. It supports many common models and integrates training methods including (continued) pre-training, (multimodal) supervised instruction fine-tuning, reward-model training, PPO, DPO, KTO, and ORPO. It offers both a web UI and a command-line interface, and is one of the mainstream model-training frameworks today.

1. Install LLaMA-Factory

Clone LLaMA-Factory and enter the project directory; all operations in this document are performed there:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
```

2. Create the Python environment

Create and activate a Python environment:

```bash
conda create -y -n llamafactory python=3.10
conda activate llamafactory
```

3. Install LLaMA-Factory

Install LLaMA-Factory with torch-npu support:

```bash
pip install -e ".[torch-npu,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple
```

When using a custom dataset, update the data/dataset_info.json file accordingly.

Multi-node multi-GPU training

LLaMA-Factory supports several distributed training approaches, including DDP, DeepSpeed, and FSDP. For either native DDP or DeepSpeed, the commands below are recommended; the only difference between the two engines lies in the training YAML file. On every node, first run export HCCL_SOCKET_IFNAME=eth0 to specify the NIC used for HCCL communication (replace eth0 with your actual interface name). Taking a two-machine setup as an example, run the following commands on the master and worker nodes respectively to start multi-node training:

```bash
# Run on the master node
FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 \
  llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

# Run on the worker node
FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 \
  llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```

With this approach the command must be run on every machine separately; every machine needs its own installation of LLaMA-Factory, the same conda environment, and a local copy of the model to be trained.

Reference YAML:

```yaml
### model
model_name_or_path: Qwen/Qwen3-8B-Base  # change to a local path (otherwise it downloads from the hub)

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj  # can be kept as-is, or extended to more modules
lora_rank: 64               # recommended setting for the LoRA rank
lora_dropout: 0.05          # optional: slight dropout for better generalization
lora_alpha: 16              # scaling factor, usually a multiple of the rank

### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z0_config.json  # pick a ZeRO config that matches your GPU count

### dataset
dataset: identity,alpaca_en_demo  # demo datasets; replace with your own
template: qwen    # note: Qwen3 has a newer template name, but qwen still works
cutoff_len: 8192  # Qwen3 supports up to 32768, but 8192 is a memory-friendly starting point
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/Qwen3-8B/lora/sft
logging_steps: 10
save_steps: 500
eval_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1   # adjust to GPU memory (e.g. try 2 on an A100 80G)
gradient_accumulation_steps: 4   # raise to compensate for the small batch size
learning_rate: 2e-5              # typical LoRA range is 1e-5 ~ 5e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true  # if your hardware supports bf16, use bf16: true instead

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_on_train: false  # optional: also evaluate on the training set

### additional
report_to: tensorboard  # or wandb, for training visualization
seed: 42
```
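If you replace the demo datasets with your own, the custom set must first be registered in data/dataset_info.json. A minimal sketch of such an entry is shown below; my_dataset and my_dataset.json are placeholder names, and the column mapping assumes a standard Alpaca-style file:

```json
{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```

The training YAML would then reference it with `dataset: my_dataset`.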
-
Introduction

Supervised Fine-Tuning (SFT) is the key step for adapting a large language model (LLM) to specific tasks. High-quality instruction data is the foundation of successful SFT and directly determines a model's performance and capability boundaries. This post walks through the entire workflow of building SFT instruction data, from the theoretical framework to hands-on practice, with detailed code and a systematic construction method.

1. Overview of SFT instruction data

1.1 What is SFT instruction data?

SFT instruction data is the supervised data used to teach a large model to follow human instructions. It usually takes the form:

<instruction>
<input> (optional)
<expected output>

1.2 Core principles of data construction

- Diversity: cover a wide range of task types and domains
- High quality: accurate, helpful, and safe responses
- Consistency: a consistent response style for the same instruction
- Scalability: easy to iterate on and extend later

2. The end-to-end SFT data construction workflow

Requirements analysis and task definition → data collection and generation → data cleaning and formatting → quality evaluation and filtering → data augmentation and expansion → dataset splitting and validation → final dataset output.

3. Requirements analysis and task definition

3.1 Define the target task domains

```python
import random
from typing import List, Dict, Any
from enum import Enum
from dataclasses import dataclass

class TaskType(Enum):
    """Task type enumeration"""
    QA = "question_answering"
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"
    CREATIVE_WRITING = "creative_writing"
    CODE_GENERATION = "code_generation"
    REASONING = "reasoning"
    DIALOGUE = "dialogue"

@dataclass
class TaskDefinition:
    """Task definition data structure"""
    task_type: TaskType
    domain: str
    complexity_level: str  # simple, medium, complex
    expected_skills: List[str]
    examples: List[Dict[str, str]]

def define_tasks() -> List[TaskDefinition]:
    """Define the task types that must be covered"""
    tasks = [
        TaskDefinition(
            task_type=TaskType.QA,
            domain="general",
            complexity_level="medium",
            expected_skills=["information_retrieval", "explanation"],
            examples=[
                {"instruction": "解释量子计算的基本原理", "output": "量子计算利用量子比特..."}
            ]
        ),
        TaskDefinition(
            task_type=TaskType.CODE_GENERATION,
            domain="programming",
            complexity_level="complex",
            expected_skills=["python", "algorithm"],
            examples=[
                {"instruction": "写一个Python函数计算斐波那契数列", "output": "def fibonacci(n):..."}
            ]
        ),
        # More task definitions...
    ]
    return tasks

# Example: define tasks and inspect them
tasks = define_tasks()
print(f"Number of defined tasks: {len(tasks)}")
for task in tasks[:2]:
    print(f"Task type: {task.task_type.value}, domain: {task.domain}")
```

3.2 Design an instruction template system

```python
class InstructionTemplate:
    """Instruction template system"""

    def __init__(self):
        self.templates = {
            TaskType.QA: [
                "请解释{concept}",
                "什么是{concept}?",
                "详细说明{concept}的工作原理"
            ],
            TaskType.CODE_GENERATION: [
                "写一个{language}函数来实现{functionality}",
                "用{language}编写代码解决{problem}",
                "实现一个{algorithm}算法"
            ],
            TaskType.SUMMARIZATION: [
                "总结以下文本:{text}",
                "为这篇文章生成一个摘要:{text}",
                "用一句话概括主要内容:{text}"
            ]
        }

    def generate_instruction(self, task_type: TaskType, **kwargs) -> str:
        """Generate an instruction from a randomly chosen template"""
        if task_type not in self.templates:
            return ""
        template = random.choice(self.templates[task_type])
        return template.format(**kwargs)

    def add_template(self, task_type: TaskType, template: str):
        """Register a new template"""
        if task_type not in self.templates:
            self.templates[task_type] = []
        self.templates[task_type].append(template)

# Example usage
template_system = InstructionTemplate()
instruction = template_system.generate_instruction(TaskType.QA, concept="机器学习")
print(f"Generated instruction: {instruction}")
```

4. Data collection and generation strategies

4.1 Multi-source data collection

```python
import json
from typing import List, Dict

import requests
from bs4 import BeautifulSoup

class DataCollector:
    """Multi-source data collector"""

    def __init__(self):
        self.collected_data = []

    def collect_from_existing_datasets(self, file_paths: List[str]):
        """Collect from existing dataset files"""
        for file_path in file_paths:
            if file_path.endswith('.jsonl'):
                with open(file_path, 'r', encoding='utf-8') as f:
                    for line in f:
                        data = json.loads(line)
                        self.collected_data.append(data)
            elif file_path.endswith('.json'):
                with open(file_path, 'r', encoding='utf-8') as f:
                    data = json.load(f)
                    self.collected_data.extend(data)

    def scrape_qa_websites(self, urls: List[str]):
        """Scrape data from Q&A websites"""
        for url in urls:
            try:
                response = requests.get(url, timeout=10)
                soup = BeautifulSoup(response.content, 'html.parser')
                # Hypothetical selectors; adjust to the actual page structure
                questions = soup.find_all('h2', class_='question-title')
                answers = soup.find_all('div', class_='answer-content')
                for q, a in zip(questions, answers):
                    self.collected_data.append({
                        "instruction": q.text.strip(),
                        "input": "",
                        "output": a.text.strip()[:500]  # cap the length
                    })
            except Exception as e:
                print(f"Failed to scrape {url}: {e}")

    def collect_from_api(self, api_url: str, params: Dict):
        """Fetch data from an API"""
        try:
            response = requests.get(api_url, params=params)
            data = response.json()
            for item in data.get('results', []):
                self.collected_data.append({
                    "instruction": item.get('question', ''),
                    "input": item.get('context', ''),
                    "output": item.get('answer', '')
                })
        except Exception as e:
            print(f"API request failed: {e}")

    def get_collected_data(self) -> List[Dict]:
        """Return the collected data"""
        return self.collected_data

# Example usage
collector = DataCollector()
collector.collect_from_existing_datasets(['existing_data.jsonl'])
```

4.2 Generate high-quality data with an LLM

```python
import json
import time
from typing import List, Dict

from openai import OpenAI

class DataGenerator:
    """Generate instruction data with an LLM"""

    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)
        self.generated_data = []

    def generate_qa_pairs(self, topics: List[str], num_pairs_per_topic: int = 10):
        """Generate QA pairs for each topic"""
        for topic in topics:
            print(f"Generating QA pairs for topic '{topic}'...")
            prompt = f"""
            请为以下主题生成{num_pairs_per_topic}个高质量的问答对。
            主题:{topic}
            要求:
            1. 问题要有挑战性且明确
            2. 答案要准确、详细、有用
            3. 格式为JSON列表,每个元素包含question和answer字段
            返回格式:
            [{{"question": "问题1", "answer": "答案1"}}, ...]
            """
            try:
                response = self.client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user", "content": prompt}],
                    temperature=0.7,
                    max_tokens=2000
                )
                # Parse the generated QA pairs
                content = response.choices[0].message.content
                qa_pairs = json.loads(content)
                for pair in qa_pairs:
                    self.generated_data.append({
                        "instruction": pair["question"],
                        "input": "",
                        "output": pair["answer"]
                    })
                time.sleep(1)  # avoid rate limits
            except Exception as e:
                print(f"Error while generating topic '{topic}': {e}")

    def generate_with_template(self, template: str, variations: int = 5):
        """Generate variations of an instruction template"""
        prompt = f"""
        基于以下模板生成{variations}个不同的指令-输出对:
        模板:{template}
        要求:
        1. 保持相同的意思但使用不同的表达方式
        2. 输出要高质量、准确
        3. 返回JSON格式:{{"instructions": [{{"instruction": "...", "output": "..."}}]}}
        """
        try:
            response = self.client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.8,
                max_tokens=1500
            )
            content = response.choices[0].message.content
            data = json.loads(content)
            for item in data.get("instructions", []):
                self.generated_data.append({
                    "instruction": item["instruction"],
                    "input": "",
                    "output": item["output"]
                })
        except Exception as e:
            print(f"Template generation failed: {e}")

    def get_generated_data(self) -> List[Dict]:
        """Return the generated data"""
        return self.generated_data

# Example usage (requires an API key)
# generator = DataGenerator("your-api-key")
# generator.generate_qa_pairs(["人工智能", "机器学习", "深度学习"], 5)
```

5. Data cleaning and formatting

5.1 Cleaning pipeline

```python
import re
from typing import List, Dict, Any

class DataCleaner:
    """Data cleaning processor"""

    def __init__(self):
        self.cleaning_rules = {
            'remove_html': True,
            'remove_special_chars': True,
            'min_length': 10,
            'max_length': 1000,
            'language_filter': 'zh'  # filter for Chinese
        }

    def clean_text(self, text: str) -> str:
        """Clean a single text field"""
        if not text or not isinstance(text, str):
            return ""
        # Strip HTML tags
        if self.cleaning_rules['remove_html']:
            text = re.sub(r'<[^>]+>', '', text)
        # Strip special characters
        if self.cleaning_rules['remove_special_chars']:
            text = re.sub(r'[^\w\s\u4e00-\u9fff,。!?:;()【】《》]', '', text)
        # Length filtering
        if len(text) < self.cleaning_rules['min_length']:
            return ""
        if len(text) > self.cleaning_rules['max_length']:
            text = text[:self.cleaning_rules['max_length']]
        return text.strip()

    def clean_dataset(self, dataset: List[Dict]) -> List[Dict]:
        """Clean an entire dataset"""
        cleaned_data = []
        for item in dataset:
            try:
                instruction = self.clean_text(item.get('instruction', ''))
                input_text = self.clean_text(item.get('input', ''))
                output = self.clean_text(item.get('output', ''))
                # Skip invalid records
                if not instruction or not output:
                    continue
                cleaned_data.append({
                    'instruction': instruction,
                    'input': input_text,
                    'output': output
                })
            except Exception as e:
                print(f"Error while cleaning record: {e}")
                continue
        return cleaned_data

    def remove_duplicates(self, dataset: List[Dict]) -> List[Dict]:
        """Deduplicate the dataset"""
        seen = set()
        unique_data = []
        for item in dataset:
            # Deduplicate on a hash of instruction + output
            key = hash(item['instruction'] + item['output'])
            if key not in seen:
                seen.add(key)
                unique_data.append(item)
        return unique_data

# Example usage
cleaner = DataCleaner()
raw_data = [
    {"instruction": "<html>解释机器学习</html>", "input": "", "output": "机器学习是..."},
    {"instruction": "什么是深度学习?", "input": "", "output": "深度学习是机器学习的一个子领域..."}
]
cleaned_data = cleaner.clean_dataset(raw_data)
print(f"Records after cleaning: {len(cleaned_data)}")
```

5.2 Formatting and standardization

```python
import json
from typing import List, Dict

class DataFormatter:
    """Data formatting processor"""

    def __init__(self):
        self.format_templates = {
            'alpaca': {
                'template': ("Below is an instruction that describes a task. "
                             "Write a response that appropriately completes the request.\n\n"
                             "### Instruction:\n{instruction}\n\n"
                             "### Input:\n{input}\n\n"
                             "### Response:\n{output}"),
                'description': "Alpaca format"
            },
            'simple': {
                'template': "Instruction: {instruction}\nInput: {input}\nOutput: {output}",
                'description': "Simple format"
            },
            'chatml': {
                'template': ("<|im_start|>user\n{instruction}\n{input}<|im_end|>\n"
                             "<|im_start|>assistant\n{output}<|im_end|>"),
                'description': "ChatML format"
            }
        }

    def format_to_template(self, data: Dict, template_name: str = 'alpaca') -> str:
        """Render one record with the chosen template"""
        if template_name not in self.format_templates:
            raise ValueError(f"Unsupported template: {template_name}")
        template = self.format_templates[template_name]['template']
        return template.format(
            instruction=data['instruction'],
            input=data['input'] or '',
            output=data['output']
        )

    def convert_to_training_format(self, dataset: List[Dict], format_type: str = 'alpaca') -> List[str]:
        """Convert the dataset to a training format"""
        formatted_data = []
        for item in dataset:
            try:
                formatted = self.format_to_template(item, format_type)
                formatted_data.append(formatted)
            except Exception as e:
                print(f"Formatting failed: {e}")
                continue
        return formatted_data

    def export_to_file(self, dataset: List[Dict], file_path: str, format: str = 'jsonl'):
        """Export the dataset to a file"""
        if format == 'jsonl':
            with open(file_path, 'w', encoding='utf-8') as f:
                for item in dataset:
                    f.write(json.dumps(item, ensure_ascii=False) + '\n')
        elif format == 'json':
            with open(file_path, 'w', encoding='utf-8') as f:
                json.dump(dataset, f, ensure_ascii=False, indent=2)
        elif format == 'txt':
            formatted = self.convert_to_training_format(dataset, 'alpaca')
            with open(file_path, 'w', encoding='utf-8') as f:
                f.write('\n'.join(formatted))

# Example usage
formatter = DataFormatter()
sample_data = [{
    "instruction": "解释机器学习",
    "input": "",
    "output": "机器学习是人工智能的一个子领域..."
}]
formatted = formatter.convert_to_training_format(sample_data, 'alpaca')
print("Formatted record:")
print(formatted[0])
```

6. Quality evaluation and filtering

6.1 Automated quality evaluation

```python
import re
from typing import List, Dict

class QualityEvaluator:
    """Data quality evaluator"""

    def __init__(self):
        self.metrics_weights = {
            'relevance': 0.3,
            'accuracy': 0.3,
            'clarity': 0.2,
            'safety': 0.2
        }

    def evaluate_single_example(self, instruction: str, output: str) -> Dict[str, float]:
        """Score a single example"""
        scores = {
            'relevance': self._score_relevance(instruction, output),
            'accuracy': self._score_accuracy(output),
            'clarity': self._score_clarity(output),
            'safety': self._score_safety(output)
        }
        # Weighted total
        total_score = sum(scores[metric] * weight
                          for metric, weight in self.metrics_weights.items())
        return {'scores': scores, 'total_score': total_score}

    def _score_relevance(self, instruction: str, output: str) -> float:
        """Relevance via simple keyword overlap (use something stronger in practice)"""
        instruction_words = set(instruction.lower().split())
        output_words = set(output.lower().split())
        if not instruction_words:
            return 0.0
        overlap = len(instruction_words & output_words) / len(instruction_words)
        return min(overlap * 2, 1.0)  # scale into the [0, 1] range

    def _score_accuracy(self, output: str) -> float:
        """Accuracy (simplified; in production use a fact-checking API or knowledge base)"""
        positive_indicators = ['研究表明', '根据数据', '实验证明']
        negative_indicators = ['我认为', '可能', '也许']
        score = 0.5  # base score
        for indicator in positive_indicators:
            if indicator in output:
                score += 0.1
        for indicator in negative_indicators:
            if indicator in output:
                score -= 0.1
        return max(0.0, min(1.0, score))

    def _score_clarity(self, output: str) -> float:
        """Clarity based on average sentence length"""
        sentences = re.split(r'[。!?.!?]', output)
        if not sentences:
            return 0.0
        avg_length = sum(len(sent) for sent in sentences) / len(sentences)
        if avg_length < 10:
            return 0.3
        elif avg_length < 20:
            return 0.7
        else:
            return 0.5  # overly long sentences may be less clear

    def _score_safety(self, output: str) -> float:
        """Safety: zero out anything matching harmful patterns"""
        harmful_patterns = [
            r'暴力', r'仇恨', r'歧视', r'违法', r'自杀',
            r'kill', r'hate', r'discriminate', r'illegal'
        ]
        for pattern in harmful_patterns:
            if re.search(pattern, output, re.IGNORECASE):
                return 0.0
        return 1.0

    def filter_low_quality(self, dataset: List[Dict], threshold: float = 0.6) -> List[Dict]:
        """Drop examples below the quality threshold"""
        high_quality_data = []
        for item in dataset:
            score = self.evaluate_single_example(
                item['instruction'], item['output']
            )['total_score']
            if score >= threshold:
                high_quality_data.append(item)
        return high_quality_data

# Example usage
evaluator = QualityEvaluator()
test_example = {"instruction": "解释机器学习", "output": "机器学习是人工智能的重要分支"}
score = evaluator.evaluate_single_example(
    test_example['instruction'], test_example['output'])
print(f"Quality score: {score}")
```

7. Data augmentation and expansion

7.1 Augmentation strategies

```python
import random
from typing import List, Dict

class DataAugmentor:
    """Data augmentation processor"""

    def __init__(self):
        self.augmentation_methods = [
            'paraphrase',
            'back_translation',
            'noise_injection',
            'context_expansion'
        ]

    def augment_dataset(self, dataset: List[Dict], num_variations: int = 3) -> List[Dict]:
        """Augment an entire dataset"""
        augmented_data = []
        for item in dataset:
            variations = self._create_variations(item, num_variations)
            augmented_data.extend(variations)
        return augmented_data

    def _create_variations(self, item: Dict, num_variations: int) -> List[Dict]:
        """Create variations of one record"""
        variations = [item]  # keep the original
        # Paraphrase variants
        if 'paraphrase' in self.augmentation_methods:
            for _ in range(num_variations - 1):
                paraphrased = self._paraphrase_item(item)
                if paraphrased:
                    variations.append(paraphrased)
        return variations

    def _paraphrase_item(self, item: Dict) -> Dict:
        """Simple synonym substitution (use a stronger paraphraser in practice)"""
        instruction = item['instruction']
        output = item['output']
        synonym_map = {
            '解释': ['说明', '阐述', '讲解'],
            '什么是': ['请介绍', '请说明', '请解释'],
            '如何': ['怎样', '怎么', '如何做']
        }
        for original, replacements in synonym_map.items():
            if original in instruction:
                new_instruction = instruction.replace(
                    original, random.choice(replacements))
                return {
                    'instruction': new_instruction,
                    'input': item['input'],
                    'output': output
                }
        return None

# Example usage
augmentor = DataAugmentor()
original_data = [{
    "instruction": "解释机器学习的基本概念",
    "input": "",
    "output": "机器学习是让计算机从数据中学习规律的方法..."
}]
augmented = augmentor.augment_dataset(original_data, 2)
print(f"Records after augmentation: {len(augmented)}")
for i, item in enumerate(augmented):
    print(f"Variant {i+1}: {item['instruction']}")
```

8. The complete data construction pipeline

8.1 End-to-end pipeline

```python
from typing import Dict, Any

class SFTDataPipeline:
    """End-to-end SFT data construction pipeline"""

    def __init__(self, config: Dict):
        self.config = config
        self.collector = DataCollector()
        self.generator = DataGenerator(config.get('api_key', ''))
        self.cleaner = DataCleaner()
        self.evaluator = QualityEvaluator()
        self.formatter = DataFormatter()
        self.augmentor = DataAugmentor()
        self.final_dataset = []

    def run_pipeline(self):
        """Run the full construction workflow"""
        print("Starting the SFT data construction pipeline...")
        print("\n1. Data collection")
        self._collect_data()
        print("\n2. Data generation")
        self._generate_data()
        print("\n3. Data cleaning")
        self._clean_data()
        print("\n4. Quality evaluation")
        self._evaluate_quality()
        print("\n5. Data augmentation")
        self._augment_data()
        print("\n6. Final formatting")
        self._format_final_data()
        print(f"\nDone! Final dataset size: {len(self.final_dataset)}")

    def _collect_data(self):
        if self.config.get('collect_from_files'):
            self.collector.collect_from_existing_datasets(
                self.config['collect_from_files'])
        collected = self.collector.get_collected_data()
        print(f"Collected {len(collected)} records")
        self.final_dataset.extend(collected)

    def _generate_data(self):
        if self.config.get('generate_topics'):
            self.generator.generate_qa_pairs(
                self.config['generate_topics'],
                self.config.get('pairs_per_topic', 5))
        generated = self.generator.get_generated_data()
        print(f"Generated {len(generated)} records")
        self.final_dataset.extend(generated)

    def _clean_data(self):
        self.final_dataset = self.cleaner.clean_dataset(self.final_dataset)
        self.final_dataset = self.cleaner.remove_duplicates(self.final_dataset)
        print(f"{len(self.final_dataset)} records left after cleaning")

    def _evaluate_quality(self):
        self.final_dataset = self.evaluator.filter_low_quality(
            self.final_dataset, self.config.get('quality_threshold', 0.6))
        print(f"{len(self.final_dataset)} records left after quality filtering")

    def _augment_data(self):
        if self.config.get('enable_augmentation', True):
            self.final_dataset = self.augmentor.augment_dataset(
                self.final_dataset, self.config.get('augmentation_factor', 2))
        print(f"Records after augmentation: {len(self.final_dataset)}")

    def _format_final_data(self):
        output_format = self.config.get('output_format', 'jsonl')
        output_path = self.config.get('output_path', 'sft_dataset.jsonl')
        self.formatter.export_to_file(self.final_dataset, output_path, output_format)
        print(f"Dataset exported to: {output_path}")

    def get_dataset_stats(self) -> Dict[str, Any]:
        """Basic dataset statistics"""
        instructions = [item['instruction'] for item in self.final_dataset]
        avg_instruction_len = sum(len(inst) for inst in instructions) / len(instructions)
        outputs = [item['output'] for item in self.final_dataset]
        avg_output_len = sum(len(out) for out in outputs) / len(outputs)
        return {
            'total_examples': len(self.final_dataset),
            'avg_instruction_length': avg_instruction_len,
            'avg_output_length': avg_output_len,
            'has_input': any(item.get('input') for item in self.final_dataset)
        }

# Configure and run the pipeline
config = {
    'collect_from_files': ['existing_data.jsonl'],
    'generate_topics': ['人工智能', '机器学习', '深度学习'],
    'pairs_per_topic': 3,
    'quality_threshold': 0.6,
    'enable_augmentation': True,
    'augmentation_factor': 2,
    'output_format': 'jsonl',
    'output_path': 'final_sft_dataset.jsonl',
    'api_key': 'your-api-key-here'  # replace with a real key
}

pipeline = SFTDataPipeline(config)
pipeline.run_pipeline()

# Print statistics
stats = pipeline.get_dataset_stats()
print("\nDataset statistics:")
for key, value in stats.items():
    print(f"{key}: {value}")
```

9. Best practices and caveats

9.1 Best practices

- Build incrementally: start small and scale up gradually
- Diversify sources: combine human-written and automatically generated data
- Evaluate continuously: set up automated quality monitoring
- Version control: manage dataset versions
- Safety filtering: ensure content safety and compliance

9.2 Common pitfalls and fixes

- Problem: inconsistent data quality. Fix: enforce strict quality-evaluation standards.
- Problem: insufficient domain coverage. Fix: plan task types and domains systematically.
- Problem: generated data lacks diversity. Fix: use multiple prompts and generation strategies.
- Problem: heavy compute cost. Fix: distributed processing and caching.

Conclusion

Building high-quality SFT instruction data is a systems problem spanning collection, generation, cleaning, evaluation, and augmentation. The workflow and code in this post offer a practical guide to building effective instruction-tuning datasets. Key takeaways:

- Data quality matters more than quantity
- Diversity and coverage are critical
- An automated pipeline improves efficiency
- Continuous evaluation and iteration are essential

By following these methods and best practices, you can build a high-quality, diverse SFT instruction dataset and lay a solid foundation for training a strong large language model.
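For reference, each line of the exported final_sft_dataset.jsonl is a single JSON object in the instruction/input/output schema used throughout this pipeline; the content below is illustrative:

```json
{"instruction": "解释机器学习", "input": "", "output": "机器学习是人工智能的一个子领域,研究如何让计算机从数据中学习规律。"}
```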
-
Preparing the dataset

We use the fetal head circumference dataset "Automated measurement of fetal head circumference". During pregnancy, ultrasound imaging is used to measure the fetal head circumference and monitor fetal growth. The dataset contains two-dimensional (2D) ultrasound images of the standard plane. For more information about this dataset, see: https://hc18.grand-challenge.org/ and https://zenodo.org/record/1322001#.XcX1jk9KhhE

```python
import os

path2train = "./data/training_set/"
imgsList = [pp for pp in os.listdir(path2train) if "Annotation" not in pp]
anntsList = [pp for pp in os.listdir(path2train) if "Annotation" in pp]
print("number of images:", len(imgsList))
print("number of annotations:", len(anntsList))
```

```python
import numpy as np

np.random.seed(2024)
rndImgs = np.random.choice(imgsList, 4)
rndImgs
```

```python
import matplotlib.pylab as plt
from PIL import Image
from scipy import ndimage as ndi
from skimage.segmentation import mark_boundaries
from torchvision.transforms.functional import to_tensor, to_pil_image
import torch

def show_img_mask(img, mask):
    """Overlay the mask boundary (in green) on the image."""
    if torch.is_tensor(img):
        img = to_pil_image(img)
        mask = to_pil_image(mask)
    img_mask = mark_boundaries(np.array(img), np.array(mask),
                               outline_color=(0, 1, 0), color=(0, 1, 0))
    plt.imshow(img_mask)
```

```python
for fn in rndImgs:
    path2img = os.path.join(path2train, fn)
    path2annt = path2img.replace(".png", "_Annotation.png")
    img = Image.open(path2img)
    annt_edges = Image.open(path2annt)
    # The annotation stores only the contour; fill it to obtain a solid mask
    mask = ndi.binary_fill_holes(annt_edges)
    plt.figure()
    plt.subplot(1, 3, 1)
    plt.imshow(img, cmap="gray")
    plt.subplot(1, 3, 2)
    plt.imshow(mask, cmap="gray")
    plt.subplot(1, 3, 3)
    show_img_mask(img, mask)
```

```python
plt.figure()
plt.subplot(1, 3, 1)
plt.imshow(img, cmap="gray")
plt.axis('off')
plt.subplot(1, 3, 2)
plt.imshow(mask, cmap="gray")
plt.axis('off')
plt.subplot(1, 3, 3)
show_img_mask(img, mask)
plt.axis('off')
```

```python
# conda install conda-forge/label/cf202003::albumentations
from albumentations import (
    HorizontalFlip,
    VerticalFlip,
    Compose,
    Resize,
)

h, w = 128, 192
transform_train = Compose([
    Resize(h, w),
    HorizontalFlip(p=0.5),
    VerticalFlip(p=0.5),
])
transform_val = Resize(h, w)
```

Creating a custom dataset

```python
from torch.utils.data import Dataset
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

class fetal_dataset(Dataset):
    def __init__(self, path2data, transform=None):
        imgsList = [pp for pp in os.listdir(path2data) if "Annotation" not in pp]
        self.path2imgs = [os.path.join(path2data, fn) for fn in imgsList]
        # Annotation paths are derived from the image paths
        self.path2annts = [p2i.replace(".png", "_Annotation.png")
                           for p2i in self.path2imgs]
        self.transform = transform

    def __len__(self):
        return len(self.path2imgs)

    def __getitem__(self, idx):
        path2img = self.path2imgs[idx]
        image = Image.open(path2img)
        path2annt = self.path2annts[idx]
        annt_edges = Image.open(path2annt)
        mask = ndi.binary_fill_holes(annt_edges)
        image = np.array(image)
        mask = mask.astype("uint8")
        if self.transform:
            augmented = self.transform(image=image, mask=mask)
            image = augmented['image']
            mask = augmented['mask']
        image = to_tensor(image)
        mask = 255 * to_tensor(mask)
        return image, mask
```

```python
fetal_ds1 = fetal_dataset(path2train, transform=transform_train)
fetal_ds2 = fetal_dataset(path2train, transform=transform_val)
img, mask = fetal_ds1[0]
print(img.shape, img.type(), torch.max(img))
print(mask.shape, mask.type(), torch.max(mask))

show_img_mask(img, mask)
```

Splitting the dataset

Split the data into training and validation sets with an 80:20 ratio:

```python
from sklearn.model_selection import ShuffleSplit

sss = ShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
indices = range(len(fetal_ds1))
for train_index, val_index in sss.split(indices):
    print(len(train_index))
    print("-" * 10)
    print(len(val_index))
```

```python
from torch.utils.data import Subset

train_ds = Subset(fetal_ds1, train_index)
print(len(train_ds))
val_ds = Subset(fetal_ds2, val_index)
print(len(val_ds))
```

Show a sample image from the training set:

```python
plt.figure(figsize=(5, 5))
for img, mask in train_ds:
    show_img_mask(img, mask)
    break
```

Show a sample image from the validation set:

```python
plt.figure(figsize=(5, 5))
for img, mask in val_ds:
    show_img_mask(img, mask)
    break
```

Creating the data loaders

```python
from torch.utils.data import DataLoader

train_dl = DataLoader(train_ds, batch_size=8, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=16, shuffle=False)

for img_b, mask_b in train_dl:
    print(img_b.shape, img_b.dtype)
    print(mask_b.shape, mask_b.dtype)
    break

for img_b, mask_b in val_dl:
    print(img_b.shape, img_b.dtype)
    print(mask_b.shape, mask_b.dtype)
    break

torch.max(img_b)
```

Building the model

The segmentation model is built on an encoder-decoder architecture:

```python
import torch.nn as nn
import torch.nn.functional as F

class SegNet(nn.Module):
    def __init__(self, params):
        super(SegNet, self).__init__()
        C_in, H_in, W_in = params["input_shape"]
        init_f = params["initial_filters"]
        num_outputs = params["num_outputs"]

        # Encoder: five conv blocks, each followed by 2x2 max pooling in forward()
        self.conv1 = nn.Conv2d(C_in, init_f, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(init_f, 2 * init_f, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(2 * init_f, 4 * init_f, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(4 * init_f, 8 * init_f, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(8 * init_f, 16 * init_f, kernel_size=3, padding=1)

        # Decoder: bilinear upsampling followed by conv blocks
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.conv_up1 = nn.Conv2d(16 * init_f, 8 * init_f, kernel_size=3, padding=1)
        self.conv_up2 = nn.Conv2d(8 * init_f, 4 * init_f, kernel_size=3, padding=1)
        self.conv_up3 = nn.Conv2d(4 * init_f, 2 * init_f, kernel_size=3, padding=1)
        self.conv_up4 = nn.Conv2d(2 * init_f, init_f, kernel_size=3, padding=1)
        self.conv_out = nn.Conv2d(init_f, num_outputs, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv5(x))
        x = self.upsample(x)
        x = F.relu(self.conv_up1(x))
        x = self.upsample(x)
        x = F.relu(self.conv_up2(x))
        x = self.upsample(x)
        x = F.relu(self.conv_up3(x))
        x = self.upsample(x)
        x = F.relu(self.conv_up4(x))
        x = self.conv_out(x)
        return x

params_model = {
    "input_shape": (1, h, w),
    "initial_filters": 16,
    "num_outputs": 1,
}
model = SegNet(params_model)

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
```

Print the model structure:

```python
print(model)
```

Get a model summary:

```python
from torchsummary import summary
summary(model, input_size=(1, h, w))
```

Defining the loss function

```python
def dice_loss(pred, target, smooth=1e-5):
    intersection = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    dice = 2.0 * (intersection + smooth) / (union + smooth)
    loss = 1.0 - dice
    return loss.sum(), dice.sum()

import torch.nn.functional as F

def loss_func(pred, target):
    # Combined BCE (on logits) + Dice loss
    bce = F.binary_cross_entropy_with_logits(pred, target, reduction='sum')
    pred = torch.sigmoid(pred)
    dlv, _ = dice_loss(pred, target)
    loss = bce + dlv
    return loss

for img_v, mask_v in val_dl:
    mask_v = mask_v[8:]
    break

for img_t, mask_t in train_dl:
    break

print(dice_loss(mask_v, mask_v))
loss_func(mask_v, torch.zeros_like(mask_v))
```

```python
def metrics_batch(pred, target):
    pred = torch.sigmoid(pred)
    _, metric = dice_loss(pred, target)
    return metric

def loss_batch(loss_func, output, target, opt=None):
    loss = loss_func(output, target)
    with torch.no_grad():
        pred = torch.sigmoid(output)
        _, metric_b = dice_loss(pred, target)
    if opt is not None:
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item(), metric_b
```

Defining the optimizer

```python
from torch import optim
opt = optim.Adam(model.parameters(), lr=3e-4)

from torch.optim.lr_scheduler import ReduceLROnPlateau
lr_scheduler = ReduceLROnPlateau(opt, mode='min', factor=0.5, patience=20, verbose=1)

def get_lr(opt):
    for param_group in opt.param_groups:
        return param_group['lr']

current_lr = get_lr(opt)
print('current lr={}'.format(current_lr))
```

Training and evaluating the model

```python
def loss_epoch(model, loss_func, dataset_dl, sanity_check=False, opt=None):
    running_loss = 0.0
    running_metric = 0.0
    len_data = len(dataset_dl.dataset)
    for xb, yb in dataset_dl:
        xb = xb.to(device)
        yb = yb.to(device)
        output = model(xb)
        loss_b, metric_b = loss_batch(loss_func, output, yb, opt)
        running_loss += loss_b
        if metric_b is not None:
            running_metric += metric_b
        if sanity_check is True:
            break
    loss = running_loss / float(len_data)
    metric = running_metric / float(len_data)
    return loss, metric
```

```python
import copy

def train_val(model, params):
    num_epochs = params["num_epochs"]
    loss_func = params["loss_func"]
    opt = params["optimizer"]
    train_dl = params["train_dl"]
    val_dl = params["val_dl"]
    sanity_check = params["sanity_check"]
    lr_scheduler = params["lr_scheduler"]
    path2weights = params["path2weights"]

    loss_history = {"train": [], "val": []}
    metric_history = {"train": [], "val": []}
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = float('inf')

    for epoch in range(num_epochs):
        current_lr = get_lr(opt)
        print('Epoch {}/{}, current lr={}'.format(epoch, num_epochs - 1, current_lr))

        model.train()
        train_loss, train_metric = loss_epoch(model, loss_func, train_dl, sanity_check, opt)
        loss_history["train"].append(train_loss)
        metric_history["train"].append(train_metric)

        model.eval()
        with torch.no_grad():
            val_loss, val_metric = loss_epoch(model, loss_func, val_dl, sanity_check)
        loss_history["val"].append(val_loss)
        metric_history["val"].append(val_metric)

        # Keep the weights with the best validation loss
        if val_loss < best_loss:
            best_loss = val_loss
            best_model_wts = copy.deepcopy(model.state_dict())
            torch.save(model.state_dict(), path2weights)
            print("Copied best model weights!")

        lr_scheduler.step(val_loss)
        if current_lr != get_lr(opt):
            print("Loading best model weights!")
            model.load_state_dict(best_model_wts)

        print("train loss: %.6f, dice: %.2f" % (train_loss, 100 * train_metric))
        print("val loss: %.6f, dice: %.2f" % (val_loss, 100 * val_metric))
        print("-" * 10)

    model.load_state_dict(best_model_wts)
    return model, loss_history, metric_history
```

```python
opt = optim.Adam(model.parameters(), lr=3e-4)
# When the validation loss plateaus, halve the learning rate, then wait
# 20 epochs before reducing it again
lr_scheduler = ReduceLROnPlateau(opt, mode='min', factor=0.5, patience=20, verbose=1)

path2models = "./models/"
# Create the models directory if it does not exist
if not os.path.exists(path2models):
    os.mkdir(path2models)

params_train = {
    "num_epochs": 100,
    "optimizer": opt,
    "loss_func": loss_func,
    "train_dl": train_dl,
    "val_dl": val_dl,
    "sanity_check": False,
    "lr_scheduler": lr_scheduler,
    "path2weights": path2models + "weights.pt",
}

model, loss_hist, metric_hist = train_val(model, params_train)
```

Plot the training and validation loss:

```python
num_epochs = params_train["num_epochs"]

plt.title("Train-Val Loss")
plt.plot(range(1, num_epochs + 1), loss_hist["train"], label="train")
plt.plot(range(1, num_epochs + 1), loss_hist["val"], label="val")
plt.ylabel("Loss")
plt.xlabel("Training Epochs")
plt.legend()
plt.show()
```

Plot the training and validation accuracy:

```python
# plot accuracy progress
plt.title("Train-Val Accuracy")
plt.plot(range(1, num_epochs + 1), metric_hist["train"], label="train")
plt.plot(range(1, num_epochs + 1), metric_hist["val"], label="val")
plt.ylabel("Accuracy")
plt.xlabel("Training Epochs")
plt.legend()
plt.show()
```
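After training, the saved weights can be used for a quick qualitative check. Below is a minimal inference sketch reusing the objects defined above (model, val_ds, device, show_img_mask); binarizing the sigmoid output at 0.5 is an assumption:

```python
# Load the best weights saved by train_val and predict a mask for one validation image
model.load_state_dict(torch.load("./models/weights.pt", map_location=device))
model.eval()
with torch.no_grad():
    img, mask = val_ds[0]
    pred = torch.sigmoid(model(img.unsqueeze(0).to(device)))
    pred_mask = (pred > 0.5).float().cpu().squeeze(0)  # assumed threshold
show_img_mask(img, pred_mask)
```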
-
How to upgrade CANN, PyTorch, and MindSpore on the OrangePi Studio Pro

1. Install CANN and PyTorch

In the hardware section of the Ascend resource download center, select product series: accelerator cards, product model: Atlas 300V Pro video parsing card, and CANN version: 8.2.RC1, then download the CANN packages and obtain the PyTorch sources. Once the downloads finish, install CANN and PyTorch. I am using the OrangePi-provided Ubuntu 22.04 test image with the AI environment preinstalled, so only three packages need upgrading: Ascend-cann-toolkit_8.2.RC1_linux-x86_64.run, Ascend-cann-kernels-310p_8.2.RC1_linux-x86_64.run, and torch_npu-2.1.0.post13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

First switch to the root user, update the package lists, and install g++-12:

```bash
sudo apt update
sudo apt install -y g++-12
```

Then change into the CANN package download directory and run the following commands in order:

```bash
chmod +x ./Ascend-cann-toolkit_8.2.RC1_linux-x86_64.run
./Ascend-cann-toolkit_8.2.RC1_linux-x86_64.run --full --quiet
chmod +x ./Ascend-cann-kernels-310p_8.2.RC1_linux-x86_64.run
./Ascend-cann-kernels-310p_8.2.RC1_linux-x86_64.run --install --quiet
pip3 install torch_npu-2.1.0.post13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
```

Run the following to verify that CANN and torch_npu installed correctly:

```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
python3 -c "import torch; import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
```

2. Upgrade MindSpore

Visit the MindSpore website, select the CANN 8.2.RC1 version we just installed, and pick the remaining options to match your device. Switch to root and run the install command:

```bash
sudo su
pip3 install mindspore==2.7.0 -i https://repo.mindspore.cn/pypi/simple --trusted-host repo.mindspore.cn --extra-index-url https://repo.huaweicloud.com/repository/pypi/simple
```

After installation, verify it with:

```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
python3 -c "import mindspore; mindspore.set_context(device_target='Ascend'); mindspore.run_check()"
```

If you see output like the following, MindSpore is installed successfully:

```
[WARNING] ME(1621400:139701939115840,MainProcess):2025-09-24-10:46:21.978.000 [mindspore/context.py:1412] For 'context.set_context', the parameter 'device_target' will be deprecated and removed in a future version. Please use the api mindspore.set_device() instead.
MindSpore version: 2.7.0
[WARNING] GE_ADPT(1621400,7f0e18710640,python3):2025-09-24-10:46:23.323.570 [mindspore/ops/kernel/ascend/acl_ir/op_api_exec.cc:169] GetAscendDefaultCustomPath] Checking whether the so exists or if permission to access it is available: /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize_vision/op_api/lib/libcust_opapi.so
The result of multiplication calculation is correct, MindSpore has been installed on platform [Ascend] successfully!
```

3. Summary

This post walked through the complete procedure for upgrading CANN, PyTorch, and MindSpore on the OrangePi Studio Pro development board. Following these steps, developers can easily bring these key AI components up to date and make full use of the board's AI compute capability.
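One closing tip: assuming the NPU driver from the preinstalled image is intact, the device can also be checked at the driver level before the framework-level tests above (npu-smi ships with the Ascend driver):

```bash
npu-smi info   # should list the accelerator along with its health state and version info
```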
-
1. Common error types and solutions

1.1 File path errors

Symptom:

```
FileNotFoundError: [Errno 2] No such file or directory: 'data/train'
```

Causes:
- Incorrect use of relative paths
- Data files not downloaded or placed correctly

Solution:

```python
import os

# Use an absolute path
data_dir = os.path.abspath("data/train")
if not os.path.exists(data_dir):
    raise FileNotFoundError(f"Path {data_dir} does not exist")

# Build paths dynamically
base_dir = os.path.dirname(os.path.abspath(__file__))
data_path = os.path.join(base_dir, "data", "train")
```

1.2 Multiprocess loading failures

Symptom:

```
RuntimeError: DataLoader worker (pid 4499) is killed by signal: Segmentation fault
```

Solution comparison:

| Scenario | Recommended fix | Applicable environment |
| --- | --- | --- |
| Windows/macOS | num_workers=0 | Development and debugging |
| Linux production | multiprocessing.set_start_method('spawn') | GPU training |
| Large datasets | Increase shared memory (--shm-size) | Docker containers |

Code example:

```python
import torch
from torch.utils.data import DataLoader

# Option 1: disable multiprocessing
dataloader = DataLoader(dataset, batch_size=32, num_workers=0)

# Option 2: set the process start method
import multiprocessing as mp
mp.set_start_method('spawn')
dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
```

1.3 Data format mismatches

Symptom:

```
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7]
```

Solution:

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.ToTensor(),  # convert to a CHW tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

dataset = MyDataset(transform=transform)
```

2. Advanced debugging techniques

2.1 Memory optimization

Scenario: out-of-memory errors when loading a large dataset.

Solutions:

```python
# Option 1: load in chunks
from torch.utils.data import IterableDataset

class LargeDataset(IterableDataset):
    def __iter__(self):
        for i in range(1000):
            # Load one sample at a time
            yield torch.randn(3, 224, 224)

# Option 2: use a memory map
import numpy as np
data = np.memmap("large_data.dat", dtype='float32', mode='r')
```

2.2 Debugging a custom Dataset

Recommended tools:
- The pdb debugger: set a breakpoint inside the __getitem__ method
- PyTorch built-ins:

```python
from torch.utils.data import get_worker_info

def __getitem__(self, idx):
    worker_info = get_worker_info()
    if worker_info is not None:
        print(f"Worker {worker_info.id} loading index {idx}")
    return self.data[idx]
```

3. Typical case studies

Case 1: CUDA vs. multiprocessing conflict

Symptom:

```
RuntimeError: Cannot re-initialize CUDA in forked subprocess
```

Solution:

```python
# Guard the main entry point
if __name__ == '__main__':
    # Use the file-system sharing strategy to avoid CUDA re-initialization
    # problems in forked worker processes
    torch.multiprocessing.set_sharing_strategy('file_system')

    # Select the device explicitly
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the data
    dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
```

Case 2: model checkpoint version incompatibility

Symptom:

```
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED
```

Solution:

```python
# Option 1: specify map_location
model = torch.load('model.pth', map_location=torch.device('cpu'))

# Option 2: convert the checkpoint
import torch

with open('legacy_model.pth', 'rb') as f:
    legacy_state = torch.load(f, map_location='cpu')

new_model = NewModel()
new_model.load_state_dict(legacy_state)
torch.save(new_model.state_dict(), 'converted_model.pth')
```

4. Best-practice recommendations

Path management:
- Prefer configuration files for managing paths
- Use relative paths during development; convert to absolute paths at deployment

DataLoader configuration:

```python
DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,          # speed up host-to-GPU transfers
    persistent_workers=True   # PyTorch 1.8+
)
```

Exception handling:

```python
from torch.utils.data import DataLoader

class SafeDataLoader(DataLoader):
    def __iter__(self):
        try:
            yield from super().__iter__()
        except Exception as e:
            print(f"Data loading error: {str(e)}")
            raise
```

With the solutions above, you can systematically resolve over 90% of the common problems encountered in PyTorch data loading.
-
An operator error is raised when running single-node multi-card training with transformers + DeepSpeed. Without DeepSpeed, training runs normally.

Model: Qwen2.5-VL 3B
Image: pytorch_2.1.0-cann_8.0.rc2-py_3.9-euler_2.10.7-aarch64-snt9b
Machine: 2x 910B2
Environment:
- torch 2.4.0
- torch-npu 2.4.0.post2
- torchvision 0.19.0
- transformers 4.51.3
- deepspeed 0.16.5

Launch command:

```bash
ASCEND_LAUNCH_BLOCKING=1 accelerate launch --config_file deep_config.yaml engine.py
```

deep_config.yaml:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  path: ds_config.json
debug: false
gpu_ids: "0,1"
num_processes: 2
use_cpu: false
num_machines: 1
machine_rank: 0
same_network: true
rdzv_backend: static
main_training_function: main
main_process_port: 29503
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
```

deepconfig.json:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 1.0,
  "bf16": { "enabled": true },
  "fp16": { "enabled": false },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 5e-5,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "weight_decay": 0.01
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0.0,
      "warmup_max_lr": 5e-5,
      "warmup_num_steps": 0
    }
  },
  "activation_checkpointing": {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false
  },
  "wall_clock_breakdown": false
}
```

Error output:

```
loss ar: 59.6875, computed! performing backward pass...
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/ma-user/work/DAR/engine.py", line 179, in <module>
[rank1]:     dar_trainer.train()
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
[rank1]:     return inner_training_loop(
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:   File "/home/ma-user/work/DAR/dar_trainer.py", line 35, in training_step
[rank1]:     return self.consistency_training_step(model, inputs)
[rank1]:   File "/home/ma-user/work/DAR/dar_trainer.py", line 78, in consistency_training_step
[rank1]:     self.accelerator.backward(loss_ar)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/accelerate/accelerator.py", line 2446, in backward
[rank1]:     self.deepspeed_engine_wrapped.backward(loss, **kwargs)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 266, in backward
[rank1]:     self.engine.backward(loss, **kwargs)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2187, in backward
[rank1]:     self._do_optimizer_backward(loss, retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2133, in _do_optimizer_backward
[rank1]:     self.optimizer.backward(loss, retain_graph=retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2089, in backward
[rank1]:     self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
[rank1]:     scaled_loss.backward(retain_graph=retain_graph)
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/_tensor.py", line 521, in backward
[rank1]:     torch.autograd.backward(
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/autograd/__init__.py", line 289, in backward
[rank1]:     _engine_run_backward(
[rank1]:   File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
[rank1]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank1]: RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:218 OPS function error: Conv3DBackpropFilter, error code is 500002
[rank1]: [ERROR] 2025-09-02-22:43:34 (PID:1305427, Device:1, RankID:1) ERR01100 OPS call acl api failed
[rank1]: [Error]: A GE error occurs in the system.
[rank1]: Rectify the fault based on the error information in the ascend log.
[rank1]: E69999: Inner Error!
[rank1]: E69999: [PID: 1305427] 2025-09-02-22:43:34.306.673 op[Conv3DBackpropFilter3], illegal format of x.[FUNC:Conv3DBackpropFilterInfer][FILE:nn_calculation_ops.cc][LINE:9783]
[rank1]: TraceBack (most recent call last):
[rank1]: Sessin_id 0 does not exist, graph_id 2[FUNC:GetJsonObject][FILE:analyzer.cc][LINE:155]
[rank1]: Param:graph_info is nullptr, check invalid[FUNC:DoAnalyze][FILE:analyzer.cc][LINE:253]
[rank1]: Param:graph_info is nullptr, check invalid[FUNC:SaveAnalyzerDataToFile][FILE:analyzer.cc][LINE:210]
[rank1]: Call InferShapeAndType for node:Conv3DBackpropFilter3(Conv3DBackpropFilter) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:117]
[rank1]: process pass InferShapePass on node:Conv3DBackpropFilter3 failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:563]
[rank1]: build graph failed, graph id:2, ret:1343242270[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
[rank1]: [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[rank1]: [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
[rank1]: build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
Traceback (most recent call last):
  File "/home/ma-user/work/DAR/engine.py", line 179, in <module>
    dar_trainer.train()
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
    return inner_training_loop(
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
  File "/home/ma-user/work/DAR/dar_trainer.py", line 35, in training_step
    return self.consistency_training_step(model, inputs)
  File "/home/ma-user/work/DAR/dar_trainer.py", line 78, in consistency_training_step
    self.accelerator.backward(loss_ar)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/accelerate/accelerator.py", line 2446, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 266, in backward
    self.engine.backward(loss, **kwargs)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2187, in backward
    self._do_optimizer_backward(loss, retain_graph)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2133, in _do_optimizer_backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2089, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/_tensor.py", line 521, in backward
    torch.autograd.backward(
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/autograd/__init__.py", line 289, in backward
    _engine_run_backward(
  File "/home/ma-user/anaconda3/envs/llamafactory/lib/python3.10/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:218 OPS function error: Conv3DBackpropFilter, error code is 500002
[ERROR] 2025-09-02-22:43:34 (PID:1305426, Device:0, RankID:0) ERR01100 OPS call acl api failed
[Error]: A GE error occurs in the system.
Rectify the fault based on the error information in the ascend log.
E69999: Inner Error!
E69999: [PID: 1305426] 2025-09-02-22:43:34.341.828 op[Conv3DBackpropFilter3], illegal format of x.[FUNC:Conv3DBackpropFilterInfer][FILE:nn_calculation_ops.cc][LINE:9783]
TraceBack (most recent call last):
Sessin_id 0 does not exist, graph_id 2[FUNC:GetJsonObject][FILE:analyzer.cc][LINE:155]
Param:graph_info is nullptr, check invalid[FUNC:DoAnalyze][FILE:analyzer.cc][LINE:253]
Param:graph_info is nullptr, check invalid[FUNC:SaveAnalyzerDataToFile][FILE:analyzer.cc][LINE:210]
Call InferShapeAndType for node:Conv3DBackpropFilter3(Conv3DBackpropFilter) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:117]
process pass InferShapePass on node:Conv3DBackpropFilter3 failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:563]
build graph failed, graph id:2, ret:1343242270[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
```

(Rank 0 additionally reprints the same traceback and GE error with a [rank0] prefix.)
-
Deepwave provides wave-propagation modules for PyTorch, for applications such as seismic imaging and seismic inversion. Repo: https://github.com/ar4/deepwave

When installing and importing it in a Jupyter notebook on an Ascend NPU environment, the following error occurs:

```
$ pip install deepwave
>>> import deepwave
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.10/site-packages/deepwave/__init__.py", line 25, in <module>
    dll_cpu = ctypes.CDLL(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.10/site-packages/deepwave/libdeepwave_cpu_linux_x86_64.so: cannot open shared object file: No such file or directory

(PyTorch-2.1.0) [ma-user 057]$ ls /home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.10/site-packages/deepwave/lib
libdeepwave_cpu_linux_x86_64.so    libdeepwave_cpu_macos_x86_64.dylib   libdeepwave_cuda_linux_x86_64.so     libgomp.so.1
libdeepwave_cpu_macos_arm64.dylib  libdeepwave_cpu_windows_x86_64.dll   libdeepwave_cuda_windows_x86_64.dll  libiomp5md.dll
```

The Euler system here is aarch64. How can this module be made to run on it?
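A quick architecture check confirms the mismatch: judging from the file names in the ls output above, the wheel only bundles prebuilt x86_64 (plus CUDA/Windows/macOS) binaries, none of which can be loaded on an aarch64 host:

```bash
uname -m   # -> aarch64 on this Euler system
file /home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.10/site-packages/deepwave/lib/libdeepwave_cpu_linux_x86_64.so
# expected: "ELF 64-bit LSB shared object, x86-64", which cannot be dlopen'd on aarch64;
# running Deepwave here would therefore require building its native library from source for aarch64
```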
-
1. Introduction to word embeddings

Word embedding is a core technique in natural language processing (NLP): it maps discrete words into a continuous vector space. With word embeddings, semantically similar words end up close together in that space.

Why do we need word embeddings?

- Avoiding the curse of dimensionality: one-hot encoding has a dimension equal to the vocabulary size, whereas an embedding dimension can be chosen freely
- Capturing semantic relations: distances in the vector space reflect semantic relationships between words
- Transfer learning: pre-trained embeddings can be shared across tasks

2. PyTorch's nn.Embedding in detail

2.1 Basics

nn.Embedding is PyTorch's core word-embedding module. It is essentially a lookup table that maps integer indices (word IDs) to dense vectors of a fixed dimension.

```python
import torch
import torch.nn as nn

# Basic usage
embedding = nn.Embedding(num_embeddings=10, embedding_dim=5)
# num_embeddings: vocabulary size
# embedding_dim: dimension of each word vector

input = torch.LongTensor([1, 2, 3])  # indices of 3 words
output = embedding(input)
print(output.shape)  # torch.Size([3, 5])
```

2.2 Parameters

```python
torch.nn.Embedding(
    num_embeddings,
    embedding_dim,
    padding_idx=None,
    max_norm=None,
    norm_type=2.0,
    scale_grad_by_freq=False,
    sparse=False,
    _weight=None,
    _freeze=False,
    device=None,
    dtype=None
)
```

Key parameters:

- num_embeddings (int): vocabulary size, i.e. the maximum integer index + 1
- embedding_dim (int): dimension of each word vector
- padding_idx (int, optional): if given, the vector at this index is all zeros and is not updated during training
- max_norm (float, optional): if given, vectors exceeding this norm are renormalized
- norm_type (float, optional): the p of the p-norm used when computing max_norm, default 2
- scale_grad_by_freq (bool, optional): if True, gradients are scaled by the inverse of each word's frequency in the batch
- sparse (bool, optional): if True, sparse gradients are used to update the weight matrix

2.3 Initialization and pre-trained embeddings

```python
# Random initialization
embedding = nn.Embedding(100, 50)  # 100 words, 50 dimensions each

# Using pre-trained word vectors
pretrained_weights = torch.randn(100, 50)  # stand-in for real pre-trained weights
embedding = nn.Embedding.from_pretrained(pretrained_weights)
```

2.4 Handling variable-length sequences with padding_idx

```python
embedding = nn.Embedding(100, 50, padding_idx=0)
# assume 0 is the padding index
input = torch.LongTensor([[1, 2, 3, 0], [4, 5, 0, 0]])  # batch_size=2, seq_len=4
output = embedding(input)
print(output.shape)  # torch.Size([2, 4, 50])
```

3. Practical examples

3.1 A basic text classifier

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes):
        super(TextClassifier, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # x shape: (batch_size, seq_len)
        embedded = self.embedding(x)   # (batch_size, seq_len, embed_dim)
        # Mean-pool the word vectors over the sequence
        pooled = embedded.mean(dim=1)  # (batch_size, embed_dim)
        out = self.fc(pooled)
        return out

# Example usage
model = TextClassifier(vocab_size=10000, embed_dim=300, num_classes=5)
input = torch.LongTensor([[1, 2, 3], [4, 5, 0]])  # batch_size=2, seq_len=3
output = model(input)
print(output.shape)  # torch.Size([2, 5])
```

3.2 A sequence model with an LSTM

```python
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, num_classes):
        super(LSTMModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x shape: (batch_size, seq_len)
        embedded = self.embedding(x)  # (batch_size, seq_len, embed_dim)
        lstm_out, (h_n, c_n) = self.lstm(embedded)
        # lstm_out: (batch_size, seq_len, hidden_dim)
        # Use the output of the last time step
        out = self.fc(lstm_out[:, -1, :])
        return out

# Example usage
model = LSTMModel(vocab_size=10000, embed_dim=300, hidden_dim=128,
                  num_layers=2, num_classes=5)
input = torch.LongTensor([[1, 2, 3, 4], [5, 6, 0, 0]])  # batch_size=2, seq_len=4
output = model(input)
print(output.shape)  # torch.Size([2, 5])
```

3.3 Visualizing embeddings

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_embeddings(embedding_layer, word_to_idx, words):
    # Look up the word vectors
    indices = torch.LongTensor([word_to_idx[word] for word in words])
    vectors = embedding_layer(indices).detach().numpy()

    # Reduce to 2D with t-SNE (perplexity must be smaller than the sample count)
    tsne = TSNE(n_components=2, random_state=42, perplexity=3)
    vectors_2d = tsne.fit_transform(vectors)

    # Plot
    plt.figure(figsize=(10, 8))
    for i, word in enumerate(words):
        plt.scatter(vectors_2d[i, 0], vectors_2d[i, 1])
        plt.annotate(word, xy=(vectors_2d[i, 0], vectors_2d[i, 1]))
    plt.show()

# Example vocabulary
words = ["king", "queen", "man", "woman", "computer", "data"]
word_to_idx = {word: i for i, word in enumerate(words)}

# Create an embedding layer and visualize it
embedding = nn.Embedding(len(words), 50)
visualize_embeddings(embedding, word_to_idx, words)
```

4. Advanced tips and caveats

4.1 Freezing the embedding layer

```python
# Freeze the embedding parameters (no updates during training)
embedding = nn.Embedding(1000, 300)
embedding.weight.requires_grad = False

# Or freeze directly when loading pre-trained weights
pretrained = torch.randn(1000, 300)
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)
```

4.2 Handling OOV (out-of-vocabulary) words

```python
# Option 1: use an UNK token
vocab = {"<UNK>": 0}  # ...plus the rest of the vocabulary; unknown words map to 0
embedding = nn.Embedding(len(vocab), 300, padding_idx=0)

# Option 2: random initialization
unk_vector = torch.randn(300)  # a dedicated vector for OOV words
```

4.3 Loading pre-trained word vectors

```python
def load_pretrained_embeddings(word_to_idx, embedding_file, embedding_dim):
    # Build the weight matrix
    embedding_matrix = torch.zeros(len(word_to_idx), embedding_dim)
    # Load pre-trained vectors (GloVe text format as an example)
    with open(embedding_file, 'r', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            if word in word_to_idx:
                idx = word_to_idx[word]
                vector = torch.tensor([float(val) for val in values[1:]])
                embedding_matrix[idx] = vector
    return nn.Embedding.from_pretrained(embedding_matrix)

# Example usage
word_to_idx = {"hello": 0, "world": 1}  # ...your full vocabulary
embedding = load_pretrained_embeddings(word_to_idx, 'glove.6B.100d.txt', 100)
```

5. FAQ

Q1: How do I choose the embedding dimension?
A: Common rules of thumb: 50-100 dimensions for small datasets, 200-300 for medium ones, 300-500 for large ones. You can also compare model performance across several dimensions.

Q2: When should I use pre-trained embeddings?
A: When your training data is scarce, when your task domain is similar to the pre-training corpus, or when you lack the compute to train from scratch.

Q3: What is the difference between padding_idx and masking?
A: padding_idx only zeroes out the vector at one index and stops updating it; masking excludes those positions from the computation entirely (for example in an RNN).

Q4: How do I fine-tune pre-trained embeddings?
A:

```python
embedding = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)  # set freeze=False
```
-
I am currently trying to use the PyTorch or MindSpore framework, but the documentation says only the Atlas training series is supported. Later, in the Ascend developer resource download center, I found resources for this board, including PyTorch along with NNRT and NNAE packages and a download link for torch_npu, so it seems PyTorch might be supported after all. Can the Ascend 310B run a machine learning framework like PyTorch? Can PyTorch be used for both training and online inference on it? And does native ACL support training?
-
Purchasing resources

| Resource | Specification | Operating system | Storage |
|---|---|---|---|
| ECS | Kunpeng memory-optimized km1.xlarge.8 (4 vCPUs, 32 GiB) | Huawei Cloud EulerOS 2.0 Standard 64-bit ARM | System disk: ultra-high I/O, 100 GiB |

Installing the base software

Conda

```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -f ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate
conda init --all
```

Python & pip

The preinstalled Python is 3.9.9, which is fairly old, so create a Python 3.10 virtual environment with Conda:

```bash
conda create -n QAnything python=3.10
conda activate QAnything
# Point pip at the Huawei Cloud mirror
pip config set global.index-url https://repo.huaweicloud.com/repository/pypi/simple
python -m pip install --upgrade pip
```

Docker & Docker Compose

The preinstalled Docker is 18.09.0, too old for newer features such as docker-buildx, so upgrading to the latest version is recommended. Since there is no official repo for Huawei Cloud EulerOS 2.0, install it as follows.

If Docker was installed before, remove it first:

```bash
sudo dnf remove docker docker-ce-cli docker-selinux docker-engine
```

Download the repo file:

```bash
wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.huaweicloud.com/docker-ce/linux/centos/docker-ce.repo
sudo sed -i 's+download.docker.com+mirrors.huaweicloud.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
sudo sed -i 's+$releasever+9.9+' /etc/yum.repos.d/docker-ce.repo
```

Install the new version:

```bash
sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

Enable it on boot:

```bash
sudo systemctl enable --now docker
```

Configure a registry mirror (replace the addresses with the accelerator address you applied for on Huawei Cloud):

```bash
vi /etc/docker/daemon.json
```

```json
{
    "registry-mirrors": [
        "https://docker.1ms.run",
        "https://docker.xuanyuan.me"
    ]
}
```

Restart Docker:

```bash
systemctl restart docker
```

Download the QAnything source:

```bash
git clone https://github.com/netease-youdao/QAnything.git
```

The official Docker image xixihahaliu01/qanything-linux:v1.5.1 was built on an x86 server and cannot run on ARM, so the image has to be built manually.

Building the QAnything image

Go into build_images and read the Dockerfile; it tells us a few things:

- The Dockerfile must be moved to the parent directory before building
- The models and nltk_data folders have to be prepared in advance

```dockerfile
# Copy requirements.txt into the container
COPY requirements.txt /tmp/requirements.txt
# Copy the models folder into /root
COPY models /root/models
COPY nltk_data /root/nltk_data
```

To know what data the models folder needs, look at how it is used after being copied into the container. From docker-compose-linux.yaml, the qanything-container-local container runs /bin/bash -c "cd /workspace/QAnything && bash scripts/entrypoint.sh". In scripts/entrypoint.sh, the parts touching /root/models and /root/nltk_data are:

```bash
# Create symlinks if they do not exist yet
if [ ! -L "/workspace/QAnything/qanything_kernel/dependent_server/embedding_server/embedding_model_configs_v0.0.1" ]; then
    cd /workspace/QAnything/qanything_kernel/dependent_server/embedding_server && ln -s /root/models/linux_onnx/embedding_model_configs_v0.0.1 .
fi
if [ ! -L "/workspace/QAnything/qanything_kernel/dependent_server/rerank_server/rerank_model_configs_v0.0.1" ]; then
    cd /workspace/QAnything/qanything_kernel/dependent_server/rerank_server && ln -s /root/models/linux_onnx/rerank_model_configs_v0.0.1 .
fi
if [ ! -L "/workspace/QAnything/qanything_kernel/dependent_server/ocr_server/ocr_models" ]; then
    cd /workspace/QAnything/qanything_kernel/dependent_server/ocr_server && ln -s /root/models/ocr_models .
fi
if [ ! -L "/workspace/QAnything/qanything_kernel/dependent_server/pdf_parser_server/pdf_to_markdown/checkpoints" ]; then
    cd /workspace/QAnything/qanything_kernel/dependent_server/pdf_parser_server/pdf_to_markdown/ && ln -s /root/models/pdf_models checkpoints
fi
if [ ! -L "/workspace/QAnything/nltk_data" ]; then
    cd /workspace/QAnything/ && ln -s /root/nltk_data .
fi
```

Combining the script with the official README: the models folder needs the embedding, rerank, OCR and PDF models, and nltk_data needs the NLTK data.

Data to prepare under models (run all downloads from the QAnything directory):

Install modelscope:

```bash
pip install modelscope
```

bce-embedding-base_v1:

```bash
modelscope download --model netease-youdao/bce-embedding-base_v1 --local_dir ./models/linux_onnx/embedding_model_configs_v0.0.1
```

bce-reranker-base_v1:

```bash
modelscope download --model netease-youdao/bce-reranker-base_v1 --local_dir ./models/linux_onnx/rerank_model_configs_v0.0.1
```

pdf_models and ocr_models:

```bash
modelscope download --model netease-youdao/QAnything-pdf-parser --local_dir ./models/pdf_models
```

pdf_models already contains the OCR models, so they can be reused; change the ocr_models path in entrypoint.sh accordingly:

```bash
if [ ! -L "/workspace/QAnything/qanything_kernel/dependent_server/ocr_server/ocr_models" ]; then
    cd /workspace/QAnything/qanything_kernel/dependent_server/ocr_server && ln -s /root/models/pdf_models/ocr ocr_models
fi
```

Data to prepare under nltk_data:

```bash
modelscope download --dataset CaiJichang/nltk_data --local_dir ./nltk_data
```
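Before moving on, it can be worth verifying that the downloaded folders match the layout entrypoint.sh expects. A small Python sketch; the paths are the ones inferred from the script above:

```python
# Run from the QAnything directory; paths follow the entrypoint.sh analysis above
from pathlib import Path

expected = [
    "models/linux_onnx/embedding_model_configs_v0.0.1",
    "models/linux_onnx/rerank_model_configs_v0.0.1",
    "models/pdf_models",
    "models/pdf_models/ocr",  # reused as ocr_models via the edited symlink
    "nltk_data",
]
for p in expected:
    print(f"{p}: {'OK' if Path(p).exists() else 'MISSING'}")
```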
Before building, a few files in the project need some tuning.

Dockerfile optimization

Create a new Dockerfile in the QAnything directory with the following content:

```dockerfile
# Use the official Python 3.10 image as the base
FROM python:3.10-slim

# Switch the APT sources to the Huawei Cloud mirror
RUN sed -i 's/http:\/\/deb.debian.org\//https:\/\/mirrors.huaweicloud.com\//g' /etc/apt/sources.list.d/debian.sources

# Set the timezone
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Install base tools
RUN apt-get update && apt-get install -y \
    vim \
    wget \
    htop \
    build-essential \
    procps \
    && rm -rf /var/lib/apt/lists/*

# Create the TikToken cache directory
RUN mkdir /opt/tiktoken_cache
# Pre-download the TikToken encoding into the cache
ARG TIKTOKEN_URL="https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
RUN wget -O /opt/tiktoken_cache/$(echo -n $TIKTOKEN_URL | sha1sum | head -c 40) "$TIKTOKEN_URL"
# Point TikToken at the cache directory
ENV TIKTOKEN_CACHE_DIR=/opt/tiktoken_cache

# Copy requirements.txt into the container
COPY requirements.txt /tmp/requirements.txt
RUN pip config set global.index-url https://repo.huaweicloud.com/repository/pypi/simple \
    && python -m pip install --upgrade pip
# Install Python dependencies (torch installed separately as the CPU build)
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu \
    && pip install -r /tmp/requirements.txt

# Copy the models folder into /root
COPY models /root/models
COPY nltk_data /root/nltk_data

# Set the working directory
WORKDIR /workspace

# Clean the APT cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Default command
CMD ["/bin/bash"]
```

requirements.txt optimization

The default dependency list pulls in CUDA packages; this is a CPU-only deployment, so they are not needed. Replace requirements.txt with:

```text
onnxruntime==1.17.1
xgboost-cpu==3.0.0
concurrent-log-handler==0.9.25
boto3==1.34.79
sanic==23.6.0
sanic_ext==23.6.0
langchain-openai==0.3.7
langchain_elasticsearch==0.3.2
langchain-community==0.3.18
unstructured==0.12.4
unstructured[pptx]==0.12.4
unstructured[md]==0.12.4
opencv-python-headless==4.9.0.80
python-dotenv==1.0.1
mysql-connector-python==8.2.0
pymilvus==2.5.5
aiomysql==0.2.0
PyMuPDF==1.24.4
openpyxl==3.1.2
python-docx==1.1.0
newspaper4k==0.9.3.1
newspaper4k[zh]==0.9.3.1
duckduckgo-search==5.3.0b4
html2text==2024.2.26
mistune==3.0.2
flair==0.13.0
nltk==3.8.1
pandas==2.1.1
scikit-learn==1.3.2
chardet==5.2.0
scipy==1.10.1
fastchat==0.1.0
wikipedia==1.4.0
Wikipedia-API==0.6.0
rouge-score==0.1.2
toml==0.10.2
tqdm==4.66.1
anthropic==0.25.7
streamlit==1.34.0
zhipuai==2.0.1.20240429
tiktoken==0.7.0
modelscope==1.13.0
cryptography==42.0.8
shapely==2.0.4
pyclipper==1.3.0.post5
pdfplumber==0.11.0
markdownify==0.12.1
datrie==0.8.2
hanziconv==0.3.2
PyPDF2==3.0.1
lxml_html_clean==0.1.1
docx2txt==0.8
```

Build the Docker image:

```bash
# Run from the QAnything directory
docker build -t xixihahaliu01/qanything-linux:v1.5.1 .
```
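A side note on the Dockerfile above: tiktoken resolves cached encodings by the SHA-1 hash of the download URL, which is why the file is saved under $(echo -n $TIKTOKEN_URL | sha1sum | head -c 40). The equivalent check in Python, useful if you ever need to verify the cache file name:

```python
import hashlib

url = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
# tiktoken looks up cache entries by the SHA-1 of the source URL
print(hashlib.sha1(url.encode()).hexdigest())
```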
docker-compose-linux.yaml optimization

In the volumes of qanything_local there is no need to map the whole QAnything directory into the container; mapping only the required files is enough:

```yaml
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/qanything_kernel:/workspace/QAnything/qanything_kernel
      - ${DOCKER_VOLUME_DIRECTORY:-.}/logs:/workspace/QAnything/logs
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes:/workspace/QAnything/volumes
      - ${DOCKER_VOLUME_DIRECTORY:-.}/QANY_DB:/workspace/QAnything/QANY_DB
      - ${DOCKER_VOLUME_DIRECTORY:-.}/scripts/entrypoint.sh:/workspace/QAnything/scripts/entrypoint.sh
```

Code tweaks

qanything_kernel/core/retriever/vectorstore.py:

```python
@get_time
def delete_expr(self, expr):
    # Skip the delete when the expression matches nothing
    result = self.get_local_chunks(expr)
    if result is None or len(result) == 0:
        debug_logger.info(f'expr: {expr} not found in local milvus')
        return
```

Add the following to qanything_kernel/dependent_server/embedding_server/embedding_server.py and qanything_kernel/dependent_server/rerank_server/rerank_server.py:

```python
from sanic.worker.manager import WorkerManager
WorkerManager.THRESHOLD = 600
```

Add this configuration to qanything_kernel/configs/model_config.py:

```python
DEFAULT_PROMPT_TEMPLATE = """
参考信息:
{{context}}
---
我的问题或指令:
{{question}}
---
请根据上述参考信息回答我的问题或回复我的指令。前面的参考信息可能有用,也可能没用,你需要从我给出的参考信息中选出与我的问题最相关的那些,来为你的回答提供依据。回答一定要忠于原文,简洁但不丢信息,不要胡乱编造。我的问题或指令是什么语种,你就用什么语种回复,你的回复:
"""

# Maximum length of a single matched context chunk
CHUNK_SIZE = 800
```

And change the following configuration:

```python
# Number of matches returned by knowledge-base retrieval
VECTOR_SEARCH_TOP_K = 5
# Similarity threshold for embedding retrieval (normalized L2 distance);
# larger values recall more, smaller values recall less
VECTOR_SEARCH_SCORE_THRESHOLD = 0.5
```

qanything_kernel/core/local_doc_qa.py:

```python
# Add DEFAULT_PROMPT_TEMPLATE to this import
from qanything_kernel.configs.model_config import DEFAULT_PROMPT_TEMPLATE

# Change lines 586-592 as follows
if custom_prompt:
    # prompt_template = CUSTOM_PROMPT_TEMPLATE.replace("{{custom_prompt}}", custom_prompt)
    prompt_template = custom_prompt
else:
    # system_prompt = SYSTEM.replace("{{today_date}}", today).replace("{{current_time}}", now)
    # prompt_template = PROMPT_TEMPLATE.replace("{{system}}", system_prompt).replace("{{instructions}}", INSTRUCTIONS)
    prompt_template = DEFAULT_PROMPT_TEMPLATE
```

Launching the project

Change line 93 of run.sh to:

```bash
source ./.env
```

Start it:

```bash
# Run from the QAnything directory
bash run.sh
```

Startup is successful once the corresponding banner is printed.

Accessing the project

While run.sh runs, it asks whether the deployment is local or on a cloud server; for a cloud server, enter the elastic IP address.

Trying it out

Create a knowledge base and upload documents. Supported formats are md, txt, pdf, jpg, png, jpeg, docx, xlsx, pptx, eml and csv; a single document must be under 30 MB, a single image under 5 MB, and the total upload under 125 MB. The larger the document, the longer processing takes; as the remarks show, most of the time is spent in embedding, and a document that is too large may time out and fail.

Preview the chunking results. PDF files are parsed by converting them to Markdown, which preserves the original layout and content as faithfully as possible; you can compare the result against the source file, and it supports editing.

Configuring the LLM

Here the Huawei Cloud ModelArts Studio model service is used as the LLM provider:

- Open the ModelArts Studio product page
- Apply for an API key under the API Key management menu
- Under the online inference menu, claim the free quota for the model service
- Open the invocation guide of the chosen service to get the call parameters
- Configure the model provider in QAnything

With that in place, Q&A sessions work; since an external LLM service is used, responses are quite fast.
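Before wiring the key into QAnything, it may help to verify it independently. Assuming the service exposes the OpenAI-compatible endpoint shown in its invocation guide, a minimal sketch like the following should work; the base_url, model name and key below are placeholders to be replaced with the real values:

```python
from openai import OpenAI

# All three values are placeholders; copy the real ones from the invocation guide
client = OpenAI(base_url="https://<maas-endpoint>/v1", api_key="<your-api-key>")

resp = client.chat.completions.create(
    model="<model-name>",
    messages=[{"role": "user", "content": "你好"}],
)
print(resp.choices[0].message.content)
```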
-
I want to build a Notebook environment with PyTorch-Ascend on Ascend resources, so that a model trained on an A800 can be deployed directly on Ascend accelerators. The target is Ubuntu 20.04 LTS with PyTorch 2.4.0 + CANN >= 8.0.RC3 (per the Ascend companion-software matrix). The existing public cloud images only ship CANN 8.0.RC2, which does not quite meet my needs. While reading "Building a custom image from scratch with a Dockerfile in Notebook" and "Building a custom image from scratch for training jobs (PyTorch + Ascend)" and writing my Dockerfile, I ran into the following questions: I do not have access to the commercial edition of CANN and cannot download it; can I use the CANN community edition directly? Do I need to install the NPU driver? How do I make sure the custom image correctly detects the Ascend resources? And are there any related posts or open-source Dockerfiles I could use as a reference?
-
I created an Ascend 910B4-based environment in a ModelArts notebook. Computation on the NPU is extremely slow and the NPU AI Core utilization shows 0%, yet torch.npu.is_available() in the code below returns True:

```python
print("torch.npu.is_available(): ", torch.npu.is_available())
torch.npu.set_device('npu:0')
torch_npu.npu.set_device('npu:0')
device = torch.device('npu:0' if torch.npu.is_available() else 'cpu')
print("devise:----------", device)
```

Also, the simple test script below took more than twenty seconds to run:

```python
import torch
import torch_npu

device = torch.device('npu:0' if torch.npu.is_available() else 'cpu')
torch.npu.set_device(device)

# Create simple tensors and compute on the NPU
a = torch.randn(1000, 1000, device=device)
print(a)
b = torch.randn(1000, 1000, device=device)
print(b)
c = a + b
print(c)
```

How can I fix the NPU AI Core sitting at 0% utilization while training a model?
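Not a full diagnosis, but two things commonly distort numbers like these: the first NPU op in a process pays a one-off initialization cost (device setup, operator loading), which can dominate a tiny script, and each print(a) forces a device-to-host copy. NPU kernels also run asynchronously, so wall-clock timings need an explicit synchronize. A minimal sketch of a fairer measurement, assuming torch_npu is installed:

```python
import time

import torch
import torch_npu  # noqa: F401  (registers the 'npu' device)

device = torch.device('npu:0')
a = torch.randn(1000, 1000, device=device)
b = torch.randn(1000, 1000, device=device)

_ = a + b                  # warm-up: the first op pays the one-off init cost
torch.npu.synchronize()

start = time.time()
for _ in range(100):
    c = a + b
torch.npu.synchronize()    # kernels are asynchronous; sync before reading the clock
print("avg seconds per add:", (time.time() - start) / 100)
```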
-
How do I use an optimizer in MindSpore to update parameters and zero gradients? I cannot find the relevant functionality or examples for Adam or Optimizer.
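For reference, a minimal sketch in the MindSpore 2.x functional style: gradients are returned by mindspore.value_and_grad rather than accumulated on the parameters, so there is no zero_grad step, and calling the optimizer with the fresh gradients applies the update:

```python
import numpy as np
import mindspore as ms
from mindspore import nn

net = nn.Dense(10, 1)
loss_fn = nn.MSELoss()
optimizer = nn.Adam(net.trainable_params(), learning_rate=1e-3)

def forward_fn(x, y):
    return loss_fn(net(x), y)

# Differentiate w.r.t. the optimizer's parameters; no manual zeroing needed,
# because grads are recomputed from scratch on every call
grad_fn = ms.value_and_grad(forward_fn, None, optimizer.parameters)

def train_step(x, y):
    loss, grads = grad_fn(x, y)
    optimizer(grads)  # applies the Adam update to the parameters
    return loss

x = ms.Tensor(np.random.randn(4, 10), ms.float32)
y = ms.Tensor(np.random.randn(4, 1), ms.float32)
print(train_step(x, y))
```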
-
On ARM-architecture BC-Linux, starting the ascend-pytorch image fails:

```bash
# docker run -it -e ASCEND_VISIBLE_DEVICES=0 ascendhub.huawei.com/public-ascendhub/ascend-pytorch:23.0.0-A2-1.11.0-centos7 /bin/bash
standard_init_linux.go:219: exec user process caused "exec format error"
libcontainer: container start initialization failed: standard_init_linux.go:219: exec user process caused "exec format error"
```
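An "exec format error" at container start usually means the image was built for a different CPU architecture than the host. A quick way to check the image's architecture, sketched here assuming the docker CLI is on PATH:

```python
import json
import subprocess

image = "ascendhub.huawei.com/public-ascendhub/ascend-pytorch:23.0.0-A2-1.11.0-centos7"
out = subprocess.run(
    ["docker", "image", "inspect", image],
    capture_output=True, text=True, check=True,
)
meta = json.loads(out.stdout)[0]
# On an ARM host this should report arm64; amd64 would explain the exec format error
print(meta["Architecture"], meta.get("Os"))
```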