[Learning Competition 2021: Hard Disk Anomaly Detection] [Summary and Sharing] Feature Engineering Filter (1): Unique Values

Posted by 年月日 on 2021/3/1 16:24:17
### Huawei Network AI Learning Competition 2021: Hard Disk Anomaly Detection

[Huawei Network AI Learning Competition 2021: Hard Disk Anomaly Detection](https://competition.huaweicloud.com/information/1000041370/introduction)

### Feature Engineering: Filter by Unique Values

```python
# Imports and data loading are omitted here
# dataflow -> pandas DataFrame
train_data = dataflow.to_pandas_dataframe()
```

```python
# Count how many distinct values each column contains
# DataFrame.nunique(axis=0, dropna=True) -> pandas.core.series.Series
# Count distinct observations over requested axis.
#
# Return Series with number of distinct observations. Can ignore NaN values.
# https://blog.csdn.net/feizxiang3/article/details/93380525
train_data_nunique_dropnaTrue = train_data.nunique()
```

```python
# Count NaN as a value as well
train_data_nunique_dropnaFalse = train_data.nunique(dropna=False)
```

```python
type(train_data.nunique())
# pandas.core.series.Series
```

```python
# Both Series share the same index, so compare the counts with and without NaN
# train_data_nunique_dropnaTrue.index holds the names of the 109 columns
for i, index in enumerate(train_data_nunique_dropnaTrue.index, start=1):
    print(i, str(index).ljust(30),
          train_data_nunique_dropnaTrue[index],
          train_data_nunique_dropnaFalse[index])
```

![naie-feature-filter-weiyizhi-1.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/234651umlmss4q8plvlewu.png)

```python
# Select the columns whose count is 0 when NaN is ignored, i.e. all-NaN columns
# https://junchu.blog.csdn.net/article/details/88791935
index_list_dT = train_data_nunique_dropnaTrue[train_data_nunique_dropnaTrue.values == 0].index.tolist()
print("list\n", index_list_dT)
print("len\n", len(index_list_dT))
# 56 such columns
```

![naie-feature-filter-weiyizhi-2.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/234657lyi8gdik5intuhio.png)

```python
# Select the columns whose count is 1 when NaN is counted as a value
index_list_dF = train_data_nunique_dropnaFalse[train_data_nunique_dropnaFalse.values == 1].index.tolist()
print("list\n", index_list_dF)
print("len\n", len(index_list_dF))
# 58 such columns; model and capacity_bytes are the two additional ones
```
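To make the difference between the two selections concrete, here is a minimal sketch on a toy DataFrame. The column names and values are made up for illustration and are not taken from the competition data; only `DataFrame.nunique` and its `dropna` parameter come from the post. It suggests why the second list contains the first: an all-NaN column has 0 distinct values when NaN is ignored and exactly 1 when NaN is counted.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the competition dataset
toy = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],      # no observed values at all
    "constant": ["a", "a", "a"],              # a single value, no NaN
    "constant_with_nan": [1.0, np.nan, 1.0],  # a single value plus NaN
    "varied": [1, 2, 3],                      # several distinct values
})

ignoring_nan = toy.nunique()              # dropna=True is the default
counting_nan = toy.nunique(dropna=False)  # NaN counts as one extra value

# Same selections as index_list_dT and index_list_dF in the post
all_nan_cols = ignoring_nan[ignoring_nan == 0].index.tolist()
single_value_cols = counting_nan[counting_nan == 1].index.tolist()

print(all_nan_cols)
print(single_value_cols)
```

In this sketch `constant_with_nan` lands in neither list, because it has one distinct value without NaN and two with NaN.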
![naie-feature-filter-weiyizhi-3.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/234704iglagc8dvk8nktcn.png)

```python
# Take the union of the two lists
# https://www.cnblogs.com/kwzblog/p/14101262.html
index_list = list(set(index_list_dF).union(set(index_list_dT)))
print(index_list)
print(len(index_list))
```

![naie-feature-filter-weiyizhi-4.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/234719vk00nng4wtutijo2.png)

```python
# Drop the columns listed in index_list
# These are the single-valued columns and the all-NaN columns
train_data.drop(index_list, axis=1, inplace=True)
```

```python
# Inspect the result of the drop
train_data
```

![naie-feature-filter-weiyizhi-5.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/234725xchd92wkt9okfwv5.png)

```python
# Run nunique again on the reduced DataFrame
train_data_nunique_dropnaTrue = train_data.nunique(dropna=True)
train_data_nunique_dropnaFalse = train_data.nunique(dropna=False)
```

```python
# Check the result
for i, index in enumerate(train_data_nunique_dropnaTrue.index, start=1):
    print(i, str(index).ljust(30),
          train_data_nunique_dropnaTrue[index],
          train_data_nunique_dropnaFalse[index])
```

![naie-feature-filter-weiyizhi-6.png](https://bbs-img-cbc-cn.obs.cn-north-1.myhuaweicloud.com/data/forums/attachment/forum/202102/26/234730bdqfhvbrsrcegumq.png)

Note: rows in the output above of the form `smart_XXX 1 2` deserve extra attention. Such a column has exactly one distinct non-NaN value plus some NaN entries, so it survives both filters even though it carries very little variation.

### Learning resources and references

- [2021 Learning Competition, Hard Disk Anomaly Detection: February 23 livestream slides](https://bbs.huaweicloud.com/forum/thread-108940-1-1.html)
- [Learning Competition 2021, Hard Disk Anomaly Detection: sample code](https://bbs.huaweicloud.com/forum/thread-107416-1-1.html)
- [Learning Competition 2021, KPI Anomaly Detection: livestream baseline code and slides from top participant usstroot](https://bbs.huaweicloud.com/forum/thread-106253-1-1.html)
- [Network AI Learning Competition 2021, Hard Disk Anomaly Detection: problem walkthrough](https://bbs.huaweicloud.com/live/dks_live/202102231900.html)

### Other recommended learning competitions

- [Huawei Network AI Learning Competition 2021: KPI Anomaly Detection](https://competition.huaweicloud.com/information/1000041344/introduction)
- [Huawei Network AI Learning Competition 2021: Log Anomaly Detection](https://competition.huaweicloud.com/information/1000041371/introduction)

### Remarks

1. Thanks to the instructors for their teaching and course materials.
2. Everyone is welcome to share their competition experience ^_^
3. The competition comes with fairly rich learning materials that help beginners get started smoothly, so taking part is recommended.
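As a follow-up to the note on the `smart_XXX 1 2` rows in the feature-filter section, the sketch below lists columns with that pattern, that is, a count of 1 when NaN is ignored and 2 when NaN is counted. The helper name `constant_with_nan_columns` and the demo columns are hypothetical and not part of the original post; treat this as one possible way to surface such columns, not as the author's method.

```python
import numpy as np
import pandas as pd


def constant_with_nan_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: columns with one distinct non-NaN value plus NaN."""
    ignoring_nan = df.nunique()              # NaN ignored
    counting_nan = df.nunique(dropna=False)  # NaN counted as a value
    mask = (ignoring_nan == 1) & (counting_nan == 2)
    return mask[mask].index.tolist()


# Made-up demo data, not the competition dataset
demo = pd.DataFrame({
    "smart_a": [0.0, np.nan, 0.0],  # one value plus NaN -> flagged
    "smart_b": [1.0, 2.0, np.nan],  # two values plus NaN -> not flagged
    "smart_c": [5.0, 5.0, 5.0],     # constant, no NaN -> not flagged
})

flagged = constant_with_nan_columns(demo)
print(flagged)

# Share of missing entries per flagged column, as a starting point for a decision
print(demo[flagged].isna().mean())
```

Whether to keep a flagged column is a judgment call: if the missingness itself could relate to the label, the column may still be worth keeping, for example as a missing/not-missing indicator.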
