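### Huawei Network AI Learning Competition 2021: Disk Anomaly Detection
[Huawei Network AI Learning Competition 2021: Disk Anomaly Detection](https://competition.huaweicloud.com/information/1000041370/introduction)
### Checking the positive/negative sample ratio of the training set with Pandas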
``` python
import os
os.chdir("/home/ma-user/work/disk")
import pandas as pd
from naie.datasets import get_data_reference
dr_train = get_data_reference("DatasetService", "learning_training_data", file_type='csv')
train_data = dr_train.to_pandas_dataframe()
```
``` python
# Inspect the label column (0 = healthy, 1 = failure)
train_data['failure']
0 0
1 0
2 0
3 0
4 0
..
558631 0
558632 0
558633 0
558634 0
558635 0
Name: failure, Length: 558636, dtype: int64
```
``` python
# crosstab
# crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name:str='All', dropna:bool=True, normalize=False) -> 'DataFrame'
# Compute a simple cross tabulation of two (or more) factors. By default
# computes a frequency table of the factors unless an array of values and an
# aggregation function are passed.
pd.crosstab(train_data.failure,"count")
```
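The counts above become class proportions if `normalize=True` is passed to `pd.crosstab`. The following is a minimal sketch on a made-up toy DataFrame (`toy` is a hypothetical stand-in, not the competition data), so it runs without the `naie` dataset service:

``` python
import pandas as pd

# Hypothetical toy labels standing in for train_data['failure']
toy = pd.DataFrame({"failure": [0] * 97 + [1] * 3})

# Absolute counts per class, same call as in the cell above
counts = pd.crosstab(toy.failure, "count")

# normalize=True divides every cell by the grand total,
# so the single column now holds the share of each class
ratios = pd.crosstab(toy.failure, "count", normalize=True)

print(counts)
print(ratios)
```

On the real data, the same call would be `pd.crosstab(train_data.failure, "count", normalize=True)`.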

``` python
type(pd.crosstab(train_data.failure,"count"))
pandas.core.frame.DataFrame
```
``` python
# Pie chart
# autopct='%.3f%%' controls how many decimal places are shown in the percentage labels
pd.crosstab(train_data.failure,"count").plot(kind="pie",subplots=True,autopct='%.3f%%')
```

``` python
# Bar chart
pd.crosstab(train_data.failure,"count").plot(kind="bar")
```

Some teaching datasets have a balanced ratio of positive to negative samples; in practice, imbalanced datasets are more common.
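One way to put a number on how imbalanced a label column is: compare the majority and minority class counts, and look at the accuracy a model would get by always predicting the majority class. The sketch below assumes a hypothetical helper `imbalance_summary` and made-up toy labels; neither comes from the competition materials.

``` python
import pandas as pd


def imbalance_summary(labels: pd.Series) -> dict:
    """Summarise class imbalance for a label column (hypothetical helper)."""
    counts = labels.value_counts()  # number of samples per class
    majority = int(counts.max())
    minority = int(counts.min())
    return {
        "majority_count": majority,
        "minority_count": minority,
        # How many majority samples there are for each minority sample
        "imbalance_ratio": majority / minority,
        # Accuracy of a trivial model that always predicts the majority class
        "majority_baseline_accuracy": majority / int(counts.sum()),
    }


# Made-up toy labels: 990 healthy disks, 10 failed disks
toy_labels = pd.Series([0] * 990 + [1] * 10, name="failure")
summary = imbalance_summary(toy_labels)
print(summary)
```

A high baseline accuracy is a hint that plain accuracy says little on such data, since a model that never flags a failure would still score well.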
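### Learning resources and references
* [[2021 Learning Competition: Disk Anomaly Detection] Slides from the February 23 livestream](https://bbs.huaweicloud.com/forum/thread-108940-1-1.html)
* [[Learning Competition 2021: Disk Anomaly Detection] Sample code](https://bbs.huaweicloud.com/forum/thread-107416-1-1.html)
* [[Learning Competition 2021: KPI Anomaly Detection] Baseline code and slides from top participant usstroot's livestream](https://bbs.huaweicloud.com/forum/thread-106253-1-1.html)
* [Simple plotting with Matplotlib (4): pandas plotting with DataFrame](https://blog.csdn.net/weixin_39778570/article/details/81143763)
### Other recommended learning competitions
* [Huawei Network AI Learning Competition 2021: KPI Anomaly Detection](https://competition.huaweicloud.com/information/1000041344/introduction)
* [Huawei Network AI Learning Competition 2021: Log Anomaly Detection](https://competition.huaweicloud.com/information/1000041371/introduction)
### Notes
1. Thanks to the instructors for their teaching and course materials.
2. Fellow participants are welcome to share their competition experience ^_^
3. The competition comes with fairly rich learning materials that help beginners get started smoothly, so taking part is recommended.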
