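### Huawei Network AI Learning Competition 2021 - Hard Disk Anomaly Detection
[Huawei Network AI Learning Competition 2021 - Hard Disk Anomaly Detection](https://competition.huaweicloud.com/information/1000041370/introduction)
### Feature Engineering: Filtering Columns by Unique Values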
``` python
# Imports and data-loading code are omitted here
# Convert the loaded dataflow object into a pandas DataFrame
train_data = dataflow.to_pandas_dataframe()
```
``` python
# Count how many distinct values each column holds (NaN is ignored by default)
# DataFrame.nunique(axis=0, dropna=True) -> pandas.Series
# "Count distinct observations over requested axis.
#  Return Series with number of distinct observations. Can ignore NaN values."
# https://blog.csdn.net/feizxiang3/article/details/93380525
train_data_nunique_dropnaTrue = train_data.nunique()
```
``` python
# Same count, but NaN is treated as one more distinct value
train_data_nunique_dropnaFalse = train_data.nunique(dropna=False)
```
``` python
type(train_data.nunique())
# -> pandas.core.series.Series
```
``` python
# Compare the two counts column by column (the index holds all 109 column names);
# a column whose two counts differ contains NaN
for i, index in enumerate(train_data_nunique_dropnaTrue.index, start=1):
    print(i, str(index).ljust(30),
          train_data_nunique_dropnaTrue[index],
          train_data_nunique_dropnaFalse[index])
```
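The effect of `dropna` can be seen on a small frame. This is a minimal sketch on hypothetical toy data (the column names `all_nan`, `constant`, `one_plus_nan` and `varied` are made up for illustration); it puts both counts side by side in one table instead of printing them with a loop.

``` python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train_data (not competition data)
df = pd.DataFrame({
    "all_nan": [np.nan] * 4,
    "constant": [5, 5, 5, 5],
    "one_plus_nan": [7, np.nan, 7, np.nan],
    "varied": [1, 2, 3, 4],
})

# One row per original column, one column per dropna setting
counts = pd.DataFrame({
    "dropna_true": df.nunique(),               # NaN ignored (default)
    "dropna_false": df.nunique(dropna=False),  # NaN counted as one extra value
})
print(counts)
```

Rows where the two counts differ are exactly the columns that contain at least one NaN; here that is `all_nan` (0 vs 1) and `one_plus_nan` (1 vs 2).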

``` python
# Select the column names whose dropna=True count is 0, i.e. columns that are entirely NaN
# https://junchu.blog.csdn.net/article/details/88791935
index_list_dT = train_data_nunique_dropnaTrue[train_data_nunique_dropnaTrue.values == 0].index.tolist()
print("list\n", index_list_dT)
print("len\n", len(index_list_dT))
# 56 such columns
```
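A column with zero distinct non-null values is simply a column in which every entry is missing, so `isna().all()` gives the same list. The sketch below checks this on a hypothetical three-column toy frame; it is an alternative formulation, not the method used in the original notebook.

``` python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not competition data)
df = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "partly_nan": [1.0, np.nan, 3.0],
    "full": [1, 2, 3],
})

# Approach used above: zero distinct non-null values
nu = df.nunique()
via_nunique = nu[nu == 0].index.tolist()

# Equivalent check: every entry in the column is missing
na = df.isna().all()
via_isna = na[na].index.tolist()

print(via_nunique, via_isna)
```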

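``` python
# Select the columns with exactly one distinct value when NaN is counted:
# either one repeated non-null value, or all NaN
index_list_dF = train_data_nunique_dropnaFalse[train_data_nunique_dropnaFalse.values == 1].index.tolist()
print("list\n", index_list_dF)
print("len\n", len(index_list_dF))
# 58 such columns: the 56 above plus model and capacity_bytes
```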

``` python
# Take the union of the two lists
# https://www.cnblogs.com/kwzblog/p/14101262.html
index_list = list(set(index_list_dF).union(set(index_list_dT)))
print(index_list)
print(len(index_list))
```
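In a non-empty frame, an all-NaN column has exactly one distinct value (NaN) when `dropna=False`, so `index_list_dT` is always contained in `index_list_dF` and the union adds nothing beyond a safeguard. The sketch below reuses the notebook's variable names on hypothetical toy data to show this; the `sorted()` call is an addition that only makes the order of the result stable.

``` python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not competition data)
df = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "constant": ["A", "A", "A"],
    "one_plus_nan": [7.0, np.nan, 7.0],
    "varied": [1, 2, 3],
})

nu_true = df.nunique(dropna=True)
nu_false = df.nunique(dropna=False)

index_list_dT = nu_true[nu_true == 0].index.tolist()    # all-NaN columns
index_list_dF = nu_false[nu_false == 1].index.tolist()  # one value, NaN included

# All-NaN columns also count as single-valued once NaN is counted
assert set(index_list_dT) <= set(index_list_dF)

# So the union equals index_list_dF; sorted() just fixes the order
index_list = sorted(set(index_list_dF) | set(index_list_dT))
print(index_list)
```

Note that `one_plus_nan` is caught by neither list: it has 1 distinct value with `dropna=True` and 2 with `dropna=False`.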

``` python
# Drop the columns in index_list: single-valued columns and all-NaN columns
train_data.drop(index_list, axis=1, inplace=True)
```
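The selection and deletion steps can also be folded into one reusable function. This is a minimal sketch, assuming a single filter `nunique(dropna=False) <= 1` is acceptable (it covers both all-NaN and single-valued columns, and `<= 1` also covers zero-length columns); the helper name `drop_constant_columns` is hypothetical and not part of pandas or the competition SDK.

``` python
from typing import List, Tuple

import numpy as np
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> Tuple[pd.DataFrame, List[str]]:
    """Return a copy of df without single-valued columns, plus the dropped names.

    Hypothetical helper: a column is dropped when nunique(dropna=False) <= 1,
    which covers all-NaN columns and columns with one repeated value.
    """
    counts = df.nunique(dropna=False)
    to_drop = counts[counts <= 1].index.tolist()
    return df.drop(columns=to_drop), to_drop


# Hypothetical toy frame (not competition data)
train_data = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "constant": [0, 0, 0],
    "one_plus_nan": [7.0, np.nan, 7.0],
    "varied": [1, 2, 3],
})

reduced, dropped = drop_constant_columns(train_data)
print(dropped)                   # columns that were removed
print(reduced.columns.tolist())  # columns that remain
```

Returning a new frame instead of using `inplace=True` leaves the original data untouched, and the returned `dropped` list can be applied to a test set so both keep the same columns.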
``` python
# Inspect the result of the deletion
train_data
```

``` python
# Recompute nunique on the reduced DataFrame
train_data_nunique_dropnaTrue = train_data.nunique(dropna=True)
train_data_nunique_dropnaFalse = train_data.nunique(dropna=False)
```
``` python
# Check the result
for i, index in enumerate(train_data_nunique_dropnaTrue.index, start=1):
    print(i, str(index).ljust(30),
          train_data_nunique_dropnaTrue[index],
          train_data_nunique_dropnaFalse[index])
```

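Note: in the output above, rows such as `smart_XXX 1 2` deserve extra attention. Such a column has one distinct non-null value (count 1 with `dropna=True`) plus missing entries (count 2 with `dropna=False`), so neither filter removed it, even though it only varies between that single value and NaN.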
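Columns of the `1 2` kind can be listed directly instead of read off the printout. The sketch below is a hypothetical illustration on made-up `smart_a`/`smart_b`/`smart_c` columns: it flags columns with exactly one non-null value plus NaN and reports their missing ratio, which is one possible input for deciding whether to keep them; that decision rule is not from the original post.

``` python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not competition data)
df = pd.DataFrame({
    "smart_a": [10.0, np.nan, 10.0, np.nan],  # one value + NaN -> "1 2"
    "smart_b": [1.0, 2.0, np.nan, 3.0],       # several values + NaN
    "smart_c": [4, 5, 6, 7],                  # several values, no NaN
})

nu_true = df.nunique(dropna=True)
nu_false = df.nunique(dropna=False)

# Exactly one distinct non-null value, and at least one NaN
suspect = nu_true[(nu_true == 1) & (nu_false == 2)].index.tolist()
print(suspect)

# Fraction of missing entries in each flagged column
missing_ratio = df[suspect].isna().mean()
print(missing_ratio)
```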
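### Learning Resources and References
1. [2021 Learning Competition, Hard Disk Anomaly Detection: slides from the February 23 livestream](https://bbs.huaweicloud.com/forum/thread-108940-1-1.html)
2. [Learning Competition 2021, Hard Disk Anomaly Detection: sample code](https://bbs.huaweicloud.com/forum/thread-107416-1-1.html)
3. [Learning Competition 2021, KPI Anomaly Detection: livestream baseline code and slides by top participant usstroot](https://bbs.huaweicloud.com/forum/thread-106253-1-1.html)
4. [Network AI Learning Competition 2021, Hard Disk Anomaly Detection: problem walkthrough](https://bbs.huaweicloud.com/live/dks_live/202102231900.html)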
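### Other Recommended Learning Competitions
1. [Huawei Network AI Learning Competition 2021 - KPI Anomaly Detection](https://competition.huaweicloud.com/information/1000041344/introduction)
2. [Huawei Network AI Learning Competition 2021 - Log Anomaly Detection](https://competition.huaweicloud.com/information/1000041371/introduction)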
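### Notes
1. Thanks to the instructors for their teaching and course materials.
2. Everyone is welcome to share their competition experience ^_^
3. The competition comes with a fairly rich set of learning materials that help newcomers get started smoothly; I recommend taking part.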

