### Huawei Network AI Learning Competition 2021: Disk Anomaly Detection
[Huawei Network AI Learning Competition 2021: Disk Anomaly Detection](https://competition.huaweicloud.com/information/1000041370/introduction)
### Converting a Dataflow to a pandas DataFrame and creating a new dataset entity
``` python
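# Load the data
# The data_reference import path is an assumption about the NAIE SDK;
# the platform notebook may already provide both modules.
from naie.datasets import data_reference
from naie.feature_processing import data_flow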
dataset = 'DatasetService' #@param {"type":"string","label":"数据集","required":true,"asyncConfig":{"type":"dataset"},"hyperParamType":"inDataset"}
dataset_entity = 'learning_training_data' #@param {"type":"string","label":"数据集实例","required":true,"asyncConfig":{"type":"data","params":{"dataset":{"key":"dataset"}}},"hyperParamType":"inData"}
data_file_list = None #@param {"type":"raw","label":"数据文件列表","asyncConfig":{"type":"datafiles","params":{"dataset":{"key":"dataset"},"data":{"key":"dataset_entity"}}},"helpTip":"选择文件列表后,只读取列表中的文件并合并。各文件的列名需要完全相同。","advanced":true}
data_file_type = None #@param {"type":"string","label":"数据文件格式","options":[{"text":"","value":"\\None"},{"text":"csv","value":"csv"},{"text":"其他","value":"other"}],"show":{"key":"data_file_list"},"helpTip":"选择CSV表示结构化数据,可调用to_pandas_dataframe方法转化为pandas的dataframe","advanced":true}
attrs = {'is_time_series': False, 'time_column': None, 'time_format': None, 'id_column': None, "do_analysis": False}
datareference = data_reference.get_data_reference(dataset, dataset_entity, file_list=data_file_list, file_type=data_file_type, attrs=attrs) #@global {"variableType":"datareference","label":"数据引用变量名","attrs":{"dataset":{"key":"dataset"},"dataset_entity":{"key":"dataset_entity"},"attrs":{"key":"attrs"}},"helpTip":"当存在多个数据引用时,可重命名数据引用对象的变量名以避免冲突","required":true,"advanced":true}
dataflow = data_flow.create_data_flow(data_reference=datareference) #@global {"variableType":"dataflow","label":"操作流变量名","attrs":{"data_reference":{"variableType":"datareference"}},"helpTip":"当存在多个数据操作流时,可重命名操作流对象的变量名以避免冲突","required":true,"advanced":true}
dataflow.show_head()
```

``` python
type(dataflow)
# Output: naie_cloud.feature_processing.data_flow.Dataflow
```
``` python
# Convert the Dataflow to a pandas DataFrame
train_data_df = dataflow.to_pandas_dataframe()
```
``` python
type(train_data_df)
# Output: pandas.core.frame.DataFrame
```
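Once the data is an ordinary pandas DataFrame, the usual pandas inspection tools apply. The following is a minimal sketch that uses a small made-up frame in place of `train_data_df`; the helper `summarize` and every column other than `date_g` are hypothetical and do not reflect the competition's actual schema.

```python
import pandas as pd

# Stand-in for train_data_df; columns and values are invented for illustration.
sample_df = pd.DataFrame({
    "date_g": ["2021-02-01", "2021-02-02", "2021-02-03"],
    "smart_5_raw": [0.0, None, 8.0],
    "label": [0, 0, 1],
})

def summarize(df):
    """Return basic facts about a DataFrame: shape, dtypes, missing-value counts."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }

summary = summarize(sample_df)
print(summary["shape"])
print(summary["missing"])
```

In the notebook, `summarize(train_data_df)` would give a quick overview before deciding which columns to drop.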
``` python
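# Modify the dataset: drop the column labeled 'date_g'.
# inplace=True updates train_data_df itself instead of returning a copy.
train_data_df.drop(['date_g'], axis=1, inplace=True)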
```
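Running the in-place `drop` cell a second time raises a `KeyError`, because `date_g` is already gone. A sketch of a re-runnable alternative, shown on a small made-up DataFrame (the `value` column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"date_g": ["2021-02-01", "2021-02-02"], "value": [1, 2]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
dropped = df.drop(columns=["date_g"])

# errors="ignore" makes the call safe to repeat once the column is already gone,
# which helps when a notebook cell is executed more than once.
dropped_again = dropped.drop(columns=["date_g"], errors="ignore")

print(list(df.columns))
print(list(dropped.columns))
```

Whether to use `inplace=True` or reassignment is a matter of taste; reassignment keeps the original frame available for comparison.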
``` python
# Convert the pandas DataFrame back to a Dataflow
from naie.feature_processing import data_flow
data_flow_new = data_flow.create_dataflow_from_df(train_data_df)
```
``` python
type(data_flow_new)
# Output: naie_cloud.feature_processing.data_flow.Dataflow
```
``` python
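# Create the dataset entity
# Give dataset and dataset_entity self-explanatory names so that they can be
# told apart from those of the other two competitions.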
#@param {"id":"create-dataset"}
create_dataset_kwargs = {
    "dataset": "DiskDataSet",
    "dataset_entity": "Disk_20210226_1114"
}
current_dataflow = data_flow_new #@param {"id":"global.flow"}
current_dataflow.write_as_dataset(**create_dataset_kwargs)
```
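The entity name above appears to combine a prefix with a date and time (`YYYYMMDD_HHMM`); that reading is an assumption drawn from the example name. Under that assumption, a small hypothetical helper, `make_entity_name`, can generate such names so that each written entity is distinguishable:

```python
from datetime import datetime

def make_entity_name(prefix, when=None):
    """Build a name such as 'Disk_20210226_1114' from a prefix and a timestamp."""
    when = when or datetime.now()
    # datetime supports strftime codes directly in a format spec.
    return f"{prefix}_{when:%Y%m%d_%H%M}"

example_name = make_entity_name("Disk", datetime(2021, 2, 26, 11, 14))
print(example_name)  # Disk_20210226_1114
```

The result could then be used as the `"dataset_entity"` value in `create_dataset_kwargs`.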
#### Loading the data and inspecting the processed dataset
``` python
#@param {"id":"data-references"}
data_reference_kwargs = [
    {
        "dataset": "DiskDataSet",
        "dataset_entity": "Disk_20210226_1114",
        "file_list": [
            "data.csv"
        ],
        "file_type": "csv",
        "encoding": "utf-8",
        "enable_local_cache": False
    }
]
data_references = [data_reference.get_data_reference(**kwargs) for kwargs in data_reference_kwargs] #@global {"id":"global.export_data_references"}
display(data_references)
dataflows = [data_flow.create_data_flow(data_reference=datareference) for datareference in data_references]
dataflow = dataflows[0]
```
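After reloading, it is worth confirming that the saved entity really lacks the dropped column. A minimal sketch follows; `check_columns_removed` is a hypothetical helper, and the small DataFrame stands in for the result of `dataflow.to_pandas_dataframe()` on the reloaded entity (its columns are invented).

```python
import pandas as pd

def check_columns_removed(df, removed):
    """Return the names in `removed` that are still present in df (empty means OK)."""
    return [col for col in removed if col in df.columns]

# Stand-in for dataflow.to_pandas_dataframe() on the reloaded entity.
reloaded_df = pd.DataFrame({"value": [1, 2], "label": [0, 1]})

leftover = check_columns_removed(reloaded_df, ["date_g"])
print(leftover)
```

An empty list means the written dataset no longer contains `date_g`.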

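### Learning resources and references
[2021 Learning Competition, Disk Anomaly Detection: slides from the February 23 livestream](https://bbs.huaweicloud.com/forum/thread-108940-1-1.html)
[2021 Learning Competition, Disk Anomaly Detection: sample code](https://bbs.huaweicloud.com/forum/thread-107416-1-1.html)
[2021 Learning Competition, KPI Anomaly Detection: baseline code and slides from top contestant usstroot's livestream](https://bbs.huaweicloud.com/forum/thread-106253-1-1.html)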
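### Other recommended learning competitions
[Huawei Network AI Learning Competition 2021: KPI Anomaly Detection](https://competition.huaweicloud.com/information/1000041344/introduction)
[Huawei Network AI Learning Competition 2021: Log Anomaly Detection](https://competition.huaweicloud.com/information/1000041371/introduction)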
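### Notes
1. Thanks to the instructors for their teaching and course materials.
2. Everyone is welcome to share their experience with the competition ^_^
3. The competition comes with fairly extensive learning materials that help beginners get started smoothly, so it is well worth entering.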
