
[Learning Competition 2021 - Disk Anomaly Detection] [Summary and Sharing] Converting a Dataflow to a pandas DataFrame and Generating a New Dataset Instance

年月日 2021/3/1 410
### Huawei Network AI Learning Competition 2021 - Disk Anomaly Detection

[Huawei Network AI Learning Competition 2021 - Disk Anomaly Detection](https://competition.huaweicloud.com/information/1000041370/introduction)

### Converting a Dataflow to a pandas DataFrame and generating a new dataset instance

``` python
# Load the data.
# `data_reference` and `data_flow` are NAIE SDK modules; this cell assumes the
# notebook environment has already imported them.
dataset = 'DatasetService' #@param {"type":"string","label":"数据集","required":true,"asyncConfig":{"type":"dataset"},"hyperParamType":"inDataset"}
dataset_entity = 'learning_training_data' #@param {"type":"string","label":"数据集实例","required":true,"asyncConfig":{"type":"data","params":{"dataset":{"key":"dataset"}}},"hyperParamType":"inData"}
data_file_list = None #@param {"type":"raw","label":"数据文件列表","asyncConfig":{"type":"datafiles","params":{"dataset":{"key":"dataset"},"data":{"key":"dataset_entity"}}},"helpTip":"选择文件列表后,只读取列表中的文件并合并。各文件的列名需要完全相同。","advanced":true}
data_file_type = None #@param {"type":"string","label":"数据文件格式","options":[{"text":"","value":"\\None"},{"text":"csv","value":"csv"},{"text":"其他","value":"other"}],"show":{"key":"data_file_list"},"helpTip":"选择CSV表示结构化数据,可调用to_pandas_dataframe方法转化为pandas的dataframe","advanced":true}
attrs = {'is_time_series': False, 'time_column': None, 'time_format': None, 'id_column': None, "do_analysis": False}
datareference = data_reference.get_data_reference(dataset, dataset_entity, file_list=data_file_list, file_type=data_file_type, attrs=attrs) #@global {"variableType":"datareference","label":"数据引用变量名","attrs":{"dataset":{"key":"dataset"},"dataset_entity":{"key":"dataset_entity"},"attrs":{"key":"attrs"}},"helpTip":"当存在多个数据引用时,可重命名数据引用对象的变量名以避免冲突","required":true,"advanced":true}
dataflow = data_flow.create_data_flow(data_reference=datareference) #@global {"variableType":"dataflow","label":"操作流变量名","attrs":{"data_reference":{"variableType":"datareference"}},"helpTip":"当存在多个数据操作流时,可重命名操作流对象的变量名以避免冲突","required":true,"advanced":true}
dataflow.show_head()
```

![naie-feature-df-dataflow-save-1.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/115921ikl6e6twyxgypshq.png)

``` python
type(dataflow)
# Output: naie_cloud.feature_processing.data_flow.Dataflow
```

``` python
# Convert the Dataflow to a pandas DataFrame
train_data_df = dataflow.to_pandas_dataframe()
```

``` python
type(train_data_df)
# Output: pandas.core.frame.DataFrame
```

``` python
# Modify the dataset: drop the column with the given label.
# inplace=True updates train_data_df itself instead of returning a copy.
train_data_df.drop(['date_g'], axis=1, inplace=True)
```

``` python
# Convert the pandas DataFrame back to a Dataflow
from naie.feature_processing import data_flow

data_flow_new = data_flow.create_dataflow_from_df(train_data_df)
```

``` python
type(data_flow_new)
# Output: naie_cloud.feature_processing.data_flow.Dataflow
```

``` python
# Generate the dataset instance.
# Give dataset and dataset_entity self-explanatory names so they can be told
# apart from the datasets of the other two competitions.
#@param {"id":"create-dataset"}
create_dataset_kwargs = {
    "dataset": "DiskDataSet",
    "dataset_entity": "Disk_20210226_1114"
}
current_dataflow = data_flow_new #@param {"id":"global.flow"}
current_dataflow.write_as_dataset(**create_dataset_kwargs)
```

#### Load the data and inspect the processed dataset

``` python
#@param {"id":"data-references"}
data_reference_kwargs = [
    {
        "dataset": "DiskDataSet",
        "dataset_entity": "Disk_20210226_1114",
        "file_list": [
            "data.csv"
        ],
        "file_type": "csv",
        "encoding": "utf-8",
        "enable_local_cache": False
    }
]
data_references = [data_reference.get_data_reference(**kwargs) for kwargs in data_reference_kwargs] #@global {"id":"global.export_data_references"}
display(data_references)
dataflows = [data_flow.create_data_flow(data_reference=datareference) for datareference in data_references]
dataflow = dataflows[0]
```

![naie-feature-df-dataflow-save-2.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/120009brmkufuvisurmkpe.png)

### Learning resources and references

- [2021 Learning Competition, Disk Anomaly Detection: slides from the February 23 live stream](https://bbs.huaweicloud.com/forum/thread-108940-1-1.html)
- [Learning Competition 2021, Disk Anomaly Detection: sample code](https://bbs.huaweicloud.com/forum/thread-107416-1-1.html)
- [Learning Competition 2021, KPI Anomaly Detection: baseline code and slides from the live stream by top contestant usstroot](https://bbs.huaweicloud.com/forum/thread-106253-1-1.html)

### Other recommended learning competitions

- [Huawei Network AI Learning Competition 2021 - KPI Anomaly Detection](https://competition.huaweicloud.com/information/1000041344/introduction)
- [Huawei Network AI Learning Competition 2021 - Log Anomaly Detection](https://competition.huaweicloud.com/information/1000041371/introduction)

### Notes

1. Thanks to the instructors for their teaching and course materials.
2. Everyone is welcome to share their competition experience ^_^
3. The competition comes with fairly rich learning materials that help newcomers get started smoothly, so taking part is recommended.
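As a supplement to the conversion steps in this post, the pandas part of the workflow (drop a column, save the result, load it again and check it) can be tried locally without the NAIE SDK. The sketch below is only an illustration under assumptions: the small DataFrame stands in for the result of `dataflow.to_pandas_dataframe()`, the columns `serial_number` and `smart_5_raw` and the helper `drop_columns` are hypothetical, and an in-memory CSV buffer stands in for writing and re-reading the dataset instance. Only the `date_g` column name comes from the post.

```python
import io

import pandas as pd


def drop_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df without the given columns (hypothetical helper).

    Column names that are not present are ignored instead of raising.
    """
    return df.drop(columns=columns, errors="ignore")


# Hypothetical stand-in for dataflow.to_pandas_dataframe().
train_data_df = pd.DataFrame(
    {
        "date_g": ["2021-02-25", "2021-02-26"],
        "serial_number": ["disk_a", "disk_b"],
        "smart_5_raw": [0, 8],
    }
)

# Non-inplace variant of the drop step: the original frame stays untouched.
cleaned_df = drop_columns(train_data_df, ["date_g"])

# Round trip through CSV in memory, standing in for saving the new
# dataset instance and loading it again.
buffer = io.StringIO()
cleaned_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded_df = pd.read_csv(buffer)

print(list(reloaded_df.columns))
```

Returning a new frame rather than using `inplace=True` keeps the original data available for comparison, which can be handy when checking that only the intended column was removed before writing a new dataset instance.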

Updated on 2021-03-01 16:23:04