• [Other] Sharing: Converged big data analytics: easily importing an MRS Hive data source into GaussDB(DWS)
    This is a repost of 【云小课】EI Lesson 17, "Converged big data analytics: easily importing an MRS Hive data source into GaussDB(DWS)", from the Huawei Cloud community (huaweicloud.com). The full walkthrough is reproduced under the 【云小课】 entry below.
  • [Flink] Authentication issue when Flink uses HiveCatalog to access Hive
    Problem description: Could an expert advise how to perform ZooKeeper authentication when using HiveCatalog with Flink 1.10 on the current FusionInsight version? Sample code would be much appreciated. The code from the official documentation is shown below, but it contains no authentication-related classes or methods:

```java
EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf"; // a local path
String version         = "2.3.4";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog("myhive", hive);

// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive");
```

How should authentication be done when accessing Hive (behind ZooKeeper) from the Flink cluster?
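In open-source Apache Flink, Kerberos credentials for the Hive, ZooKeeper, and Kafka clients are normally supplied through flink-conf.yaml rather than through the HiveCatalog API, which is why the snippet above has no authentication calls. A minimal sketch of the standard Flink settings follows; the keytab path, principal, and context list are placeholders, and the FusionInsight client may require additional steps (for example krb5.conf or kinit), so treat this as an assumption rather than FI-specific guidance:

```yaml
# flink-conf.yaml -- standard Apache Flink Kerberos settings
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /opt/client/user.keytab        # placeholder keytab path
security.kerberos.login.principal: developuser@HADOOP.COM      # placeholder principal
# JAAS contexts the credentials are installed into; "Client" is the context the ZooKeeper client uses
security.kerberos.login.contexts: Client,KafkaClient
```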
  • [Question] [MRS Hive component] [Using Hive to read relational database data] "ADD JAR hdfs:///tmp/ojdbc6.jar;" reports an error
    [Module] Using Hive to read data from a relational database.
[Steps & symptom]
1. In security mode, following the example in the documentation, executing "ADD JAR hdfs:///tmp/ojdbc6.jar;" fails.
2. Error message: Error: org.apache.ranger.authorization.spark.authorizer.SparkAccessControlException: Permission denied: user [pocuser] does not have [TEMPUDFADMIN] privilege on [hdfs:///tmp/ojdbc6.jar] (state=,code=0)
[Screenshots]
1. Information shown in the Ranger UI (screenshot not included)
2. Following the product manual's example, the error occurs at step 6 (screenshot not included)
3. Error when running in spark-beeline (screenshot not included)
4. Error when running in beeline (screenshot not included)
[Logs] (optional; log content or attachments)
  • [Question] How to migrate a large volume of GaussDB data to Hive
    A question for the experts: our project has about 2,000 tables in GaussDB, and we need to synchronize their data into Hive. Huawei support recommended the Loader component, but for a bulk migration of this size Loader is quite cumbersome. Has anyone done this kind of migration before? Any advice would be appreciated. Thanks in advance!
  • [Question] Merging Hive small files
    At a GA site, an ISV has produced a large number of small files while using Hive; the files are in ORC format. Is there a good way to merge these small Hive files?
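For ORC tables, one commonly used approach (a general Hive technique, not site-specific guidance; validate it on a test table first) is the built-in CONCATENATE statement, which merges small ORC files at the stripe level, optionally combined with the merge settings for future writes. The table and partition names below are placeholders:

```sql
-- Merge the small ORC files of one partition (table/partition names are placeholders)
ALTER TABLE demo.orders PARTITION (dt='2021-03-01') CONCATENATE;

-- Reduce small files produced by future INSERT ... SELECT jobs
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;   -- 128 MB average-size threshold, in bytes
SET hive.merge.size.per.task=268435456;        -- 256 MB target size per merge task, in bytes
```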
  • [Sharing] 【云小课】Converged big data analytics: easily importing an MRS Hive data source into GaussDB(DWS)
    In the era of converged big data analytics, how can GaussDB(DWS) access data held in MRS? This lesson opens the door to MRS data sources: we remotely read an ORC table stored in Hive on an MRS cluster and import its data into DWS.

Prerequisites
Make sure the MRS and DWS clusters are in the same region and availability zone and on the same VPC subnet, so that the two clusters can reach each other over the network.

Overall flow
1. Create an MRS analysis cluster (with the Hive component).
2. Upload a local txt data file to an OBS bucket, import it from OBS into Hive, and load it from the TEXTFILE table into an ORC table.
3. Create an MRS data source connection.
4. Create a foreign server.
5. Create a foreign table.
6. Import the data into a local DWS table through the foreign table.

Create an MRS analysis cluster
Log in to the Huawei Cloud console, choose "EI Enterprise Intelligence > MapReduce Service", click "Buy Cluster", select "Custom Config", fill in the software configuration, and click "Next".

Table 1 Software configuration
  Region: CN North-Beijing4
  Cluster Name: MRS01
  Cluster Version: MRS 3.0.5
  Cluster Type: Analysis cluster

Fill in the hardware configuration and click "Next".

Table 2 Hardware configuration
  Billing Mode: Pay-per-use
  AZ: AZ2
  VPC: vpc-01
  Subnet: subnet-01
  Security Group: Auto create
  EIP: 10.x.x.x
  Enterprise Project: default
  Master Nodes: enable "Cluster HA"
  Analysis Core Nodes: 3
  Analysis Task Nodes: 0

Fill in the advanced configuration, click "Buy Now", and wait about 15 minutes for the cluster to be created.

Table 3 Advanced configuration
  Tag: test01
  Agency: keep the default
  Alarm: keep the default
  Rule Name: keep the default
  Topic Name: keep the default
  Kerberos Authentication: enabled by default
  Username: admin
  Password: set a password, for example Huawei@12345; it is used to log in to the cluster management page
  Confirm Password: enter the admin password again
  Login Mode: Password
  Username: root
  Password: set a password, for example Huawei_12345; it is used to log in to the ECS nodes remotely
  Confirm Password: enter the root password again
  Secure Communications: select "Confirm"

Prepare the ORC table data in MRS
On your local PC, create a file named product_info.txt, copy the following data into it, and save the file locally.

100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good

Log in to the OBS console, click "Create Bucket", fill in the following parameters, and click "Create Now".

Table 4 Bucket parameters
  Region: CN North-Beijing4
  Data Redundancy Policy: Single-AZ storage
  Bucket Name: mrs-datasource
  Storage Class: Standard
  Bucket Policy: Private
  Default Encryption: Disabled
  Direct Reading (Archive): Disabled
  Enterprise Project: default
  Tags: -

After the bucket is created, click its name, choose "Objects > Upload Object", and upload product_info.txt to the bucket.

Switch back to the MRS console, click the name of the MRS cluster you created to open its "Dashboard" page, click "Synchronize" in the "IAM User Sync" row, and wait about 5 minutes for the synchronization to complete.

Back on the MRS cluster page, click "Nodes", click any master node to open its page, switch to the "EIPs" tab, click "Bind EIP", select an existing EIP, and click "OK" (create one if none exists). Record this public IP.

Identify the active master node. Use an SSH tool to log in to the node above as user root (password Huawei_12345) and switch to the omm user:

```
su - omm
```

Run the following command; in the output, the node whose "HAActive" value is "active" is the active master node:

```
sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh
```

Log in to the active master node as root, switch to the omm user, and go to the directory where the Hive client is installed:

```
su - omm
cd /opt/client
```

Create the TEXTFILE table product_info in Hive. In /opt/client, load the environment variables:

```
source bigdata_env
```

Start the Hive client:

```
beeline
```

Run the following SQL statements to create the demo database and the product_info table:

```sql
CREATE DATABASE demo;
USE demo;
DROP TABLE product_info;

CREATE TABLE product_info
(
    product_price              int           not null,
    product_id                 char(30)      not null,
    product_time               date,
    product_level              char(10),
    product_name               varchar(200),
    product_type1              varchar(20),
    product_type2              char(10),
    product_monthly_sales_cnt  int,
    product_comment_time       date,
    product_comment_num        int,
    product_comment_content    varchar(200)
)
row format delimited fields terminated by ','
stored as TEXTFILE;
```

Import the product_info.txt data file into Hive. Switch back to the MRS cluster page, click "Files", then click "Import Data".
OBS path: select the OBS bucket created above, locate product_info.txt, and click "Yes".
HDFS path: select /user/hive/warehouse/demo.db/product_info/ and click "Yes".
Click "OK" and wait for the import to finish; the data of table product_info is now imported.

Create the ORC table and load the data into it. Run the following SQL statement to create the ORC table:

```sql
DROP TABLE product_info_orc;

CREATE TABLE product_info_orc
(
    product_price              int           not null,
    product_id                 char(30)      not null,
    product_time               date,
    product_level              char(10),
    product_name               varchar(200),
    product_type1              varchar(20),
    product_type2              char(10),
    product_monthly_sales_cnt  int,
    product_comment_time       date,
    product_comment_num        int,
    product_comment_content    varchar(200)
)
row format delimited fields terminated by ','
stored as orc;
```

Insert the data of product_info into the Hive ORC table product_info_orc:

```sql
insert into product_info_orc select * from product_info;
```

Check that the data has been loaded into the ORC table:

```sql
select * from product_info_orc;
```

Create an MRS data source connection
Log in to the DWS management console and click the DWS cluster you created; make sure it is in the same region and availability zone as the MRS cluster and on the same VPC subnet. Switch to "MRS Data Sources" and click "Create MRS Data Source Connection". Select the "MRS01" data source created earlier, enter username admin and password Huawei@12345, and click "OK". The connection is created.

Create a foreign server
Use Data Studio to connect to the DWS cluster.

Create a user dbuser with the permission to create databases:

```sql
CREATE USER dbuser WITH CREATEDB PASSWORD "Bigdata@123";
```

Switch to the new dbuser user:

```sql
SET ROLE dbuser PASSWORD "Bigdata@123";
```

Create a new database mydatabase:

```sql
CREATE DATABASE mydatabase;
```

Switch the connection to the new mydatabase database as follows. In the "Object Browser" window of the Data Studio client, right-click the database connection name and choose "Refresh"; the new database appears. Right-click the "mydatabase" database name and choose "Open Connection". Right-click the "mydatabase" database name again and choose "Open New Terminal" to open a SQL command window connected to that database. Run all of the remaining steps in this window.

Grant dbuser the permission to create foreign servers:

```sql
GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser;
```

The FOREIGN DATA WRAPPER name must be hdfs_fdw; dbuser is the user who creates the SERVER.

Grant the user the permission to use foreign tables:

```sql
ALTER USER dbuser USEFT;
```

Switch back to the postgres system database and query the foreign server that was created automatically when the MRS data source connection was set up:

```sql
SELECT * FROM pg_foreign_server;
```

The result looks like:

```
                     srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                     srvoptions
--------------------------------------------------+----------+--------+---------+------------+--------+--------------------------------------------------------------------
 gsmpp_server                                     |       10 |  13673 |         |            |        |
 gsmpp_errorinfo_server                           |       10 |  13678 |         |            |        |
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(3 rows)
```

Switch to the mydatabase database and then to the dbuser user:

```sql
SET ROLE dbuser PASSWORD "Bigdata@123";
```

Create the foreign server. Keep the SERVER name, address, and configuration path the same as the values queried above:

```sql
CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca FOREIGN DATA WRAPPER HDFS_FDW
OPTIONS (
    -- private IPs of the active/standby Master nodes on the MRS management plane, reachable from DWS
    address '192.168.1.245:9820,192.168.1.218:9820',
    hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca',
    type 'hdfs'
);
```

Check the foreign server:

```sql
SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';
```

A result like the following indicates that it was created successfully:

```
                     srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                     srvoptions
--------------------------------------------------+----------+--------+---------+------------+--------+--------------------------------------------------------------------
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(1 row)
```

Create a foreign table
Get the file path of the Hive table product_info_orc. Log in to the MRS management console, choose "Clusters > Active Clusters", click the target cluster name to open its basic information page, click "Files", choose "HDFS File List", navigate to the directory that holds the data to be imported into the GaussDB(DWS) cluster, and record its path.

Create the foreign table. Use the name of the foreign server created above for SERVER and the recorded path for foldername:

```sql
DROP FOREIGN TABLE IF EXISTS foreign_product_info;

CREATE FOREIGN TABLE foreign_product_info
(
    product_price              integer       not null,
    product_id                 char(30)      not null,
    product_time               date,
    product_level              char(10),
    product_name               varchar(200),
    product_type1              varchar(20),
    product_type2              char(10),
    product_monthly_sales_cnt  integer,
    product_comment_time       date,
    product_comment_num        integer,
    product_comment_content    varchar(200)
)
SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca
OPTIONS (
    format 'orc',
    encoding 'utf8',
    foldername '/user/hive/warehouse/demo.db/product_info_orc/'
)
DISTRIBUTE BY ROUNDROBIN;
```

Import the data
Create the local target table:

```sql
DROP TABLE IF EXISTS product_info;

CREATE TABLE product_info
(
    product_price              integer       not null,
    product_id                 char(30)      not null,
    product_time               date,
    product_level              char(10),
    product_name               varchar(200),
    product_type1              varchar(20),
    product_type2              char(10),
    product_monthly_sales_cnt  integer,
    product_comment_time       date,
    product_comment_num        integer,
    product_comment_content    varchar(200)
)
WITH (
    orientation = column,
    compression = middle
)
DISTRIBUTE BY HASH (product_id);
```

Import the data from the foreign table into the target table:

```sql
INSERT INTO product_info SELECT * FROM foreign_product_info;
```

Check the imported data:

```sql
SELECT * FROM product_info;
```
  • [Components] [Hive] Running "set role admin;" in beeline reports an error
    [Module] Running "set role admin;" in beeline on Hive fails with: Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. dev doesn't belong to role admin (state=08S01,code=1)
[Steps & symptom]
1. Added the corresponding role as described in https://support.huaweicloud.com/trouble-mrs/mrs_03_0165.html.
2. Repeated the statement; it still fails.
3. Made the changes described in the MRS documentation and repeated the statement; it still fails.
MRS version: 8.0.2. Hive and Ranger have been restarted, but the problem persists.
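As a hedged diagnostic (standard Hive SQL-standard-based authorization statements, not MRS-specific guidance), it can help to confirm which roles the user actually holds, since SET ROLE admin only succeeds for members of the admin role; the user name dev below comes from the error message:

```sql
-- Roles active in the current session
SHOW CURRENT ROLES;

-- Roles granted to user dev (run as a user that already has admin rights)
SHOW ROLE GRANT USER dev;
```

If dev does not appear as a member of the admin role there, the role assignment has not taken effect on the Hive side yet, which matches the symptom described above.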
  • [Enablement] Example: using Hudi DeltaStreamer to consume Kafka and write to Hive
    Scenario: consume Kafka data with Hudi and write it into Hive.
For DeltaStreamer usage, see https://hudi.apache.org/cn/docs/writing_data.html
Verified environment: FusionInsight HD 8.1.0 security cluster with the Spark2X, Hive, and Kafka services installed.
Note: the Kafka server-side parameter allow.everyone.if.no.acl.found is set to true.

Steps:
1. Prepare the cluster (omitted).
2. Install the client and prepare a test user. Install the client as described in the product documentation; the default client path is /opt/client. Create the test user on the Manager page, for example username developuser with password Haosuwei123.
3. Upload the attached delta-stream-example-0.7.0.jar to /opt/haosuwei/hudi-demo/ on the client node. If you change this directory, adjust the paths in the submit commands below.
4. Log in to the client node over SSH and authenticate:
```
source /opt/client/bigdata_env
echo Haosuwei123 | kinit developuser
```
5. Make sure the HDFS path /tmp/huditest does not exist; delete it if it does:
```
hdfs dfs -rm -r /tmp/huditest
```
6. Create the Kafka topic:
```
kafka-topics.sh --create --topic delta_demo --bootstrap-server 8.5.136.1:21005
```
7. Prepare the configuration file kafka-source.properties used by DeltaStreamer. Its content is shown below; adjust the Kafka broker addresses as needed and save the file to the same directory as in step 3.
```
# Hudi configuration
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=y,ym,ymd
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.hive_style_partitioning=true
hoodie.delete.shuffle.parallelism=10
hoodie.upsert.shuffle.parallelism=10
hoodie.bulkinsert.shuffle.parallelism=10
hoodie.insert.shuffle.parallelism=10
hoodie.finalize.write.parallelism=10
hoodie.cleaner.parallelism=10
hoodie.datasource.write.precombine.field=id
hoodie.base.path = /tmp/huditest/delta_demo
hoodie.timeline.layout.version = 1
hoodie.datasource.write.payload.class=com.huawei.hudi.testcz.MyOverwriteNonDefaultsWithLatestAvroPayload
hoodie.compaction.payload.class=com.huawei.hudi.testcz.MyOverwriteNonDefaultsWithLatestAvroPayload

# Hive config
hoodie.datasource.hive_sync.table=delta_demo
hoodie.datasource.hive_sync.partition_fields=y,ym,ymd
hoodie.datasource.hive_sync.assume_date_partitioning=false
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.use_jdbc=false

# Kafka source topic
hoodie.deltastreamer.source.kafka.topic=delta_demo

# checkpoint
hoodie.deltastreamer.checkpoint.provider.path=hdfs://hacluster/tmp/huditest/checkpoint/

# Kafka props
bootstrap.servers=8.5.136.1:21005
auto.offset.reset=latest
group.id=group-1
offset.rang.limit=10000
```
8. Submit the Spark job. During development, submit in yarn-client mode; a reference command:
```
source /opt/client/bigdata_env
kinit -k -t user.keytab developuser
spark-submit --master yarn-client \
--jars /opt/client/Hudi/hudi/conf,/opt/client/Hudi/hudi/lib/*,/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--driver-class-path /opt/client/Hudi/hudi/conf:/opt/client/Hudi/hudi/lib/*:/opt/client/Spark2x/spark/jars/*:/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
spark-internal --props file:///opt/haosuwei/hudi-demo/kafka-source.properties \
--target-base-path /tmp/huditest/delta_demo \
--table-type COPY_ON_WRITE \
--target-table delta_demo \
--source-ordering-field id \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--schemaprovider-class com.huawei.hudi.deltastream.DataSchemaProviderExample \
--transformer-class com.huawei.hudi.deltastream.TransformerExample \
--enable-hive-sync --continuous
```
9. Verify inserts: use the Kafka command line to write one record to the topic in which id is 2 and the other fields are 2222 (see the producer sketch at the end of this post).
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/184535ytpfixzyuycuxwcd.png)
Check the table delta_demo that was created automatically in Hive:
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/184559terj6gfjw9gh3ocs.png)
10. Verify updates: use the Kafka command line to write another record to the topic in which id is 2 and the other fields are xxxx.
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/18470182n1hygk6wabtwlm.png)
Query the Hive table delta_demo again; the record has been updated:
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/184730wh9adnm7bapzmom5.png)

[Other]
For production, it is recommended to submit the Spark job in yarn-cluster mode. The submission changes as follows:
1. Upload kafka-source.properties to HDFS, for example to the /tmp directory:
```
hdfs dfs -put kafka-source.properties /tmp
```
2. Modify the spark-submit command (cluster deploy mode, properties file read from HDFS):
```
source /opt/client/bigdata_env
source /opt/client/Hudi/component_env
kinit -k -t user.keytab developuser
spark-submit --master yarn-cluster \
--jars /opt/client/Hudi/hudi/conf,/opt/client/Hudi/hudi/lib/*,/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--driver-class-path /opt/client/Hudi/hudi/conf:/opt/client/Hudi/hudi/lib/*:/opt/client/Spark2x/spark/jars/*:/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
spark-internal --props hdfs://hacluster/tmp/kafka-source.properties \
--target-base-path /tmp/huditest/delta_demo \
--table-type COPY_ON_WRITE \
--target-table delta_demo \
--source-ordering-field id \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--schemaprovider-class com.huawei.hudi.deltastream.DataSchemaProviderExample \
--transformer-class com.huawei.hudi.deltastream.TransformerExample \
--enable-hive-sync --continuous
```
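For the verification in steps 9 and 10, a minimal producer sketch is shown below. It assumes the PLAINTEXT port 21005 used elsewhere in this post; the JSON field names other than id are illustrative placeholders and must match the schema served by DataSchemaProviderExample:

```bash
# Start a console producer against the topic, then paste one JSON record per line on stdin
kafka-console-producer.sh --broker-list 8.5.136.1:21005 --topic delta_demo
# Example record for step 9 (field names other than "id" are placeholders):
#   {"id": "2", "name": "2222", "y": "2021", "ym": "202104", "ymd": "20210421"}
# For step 10, send the same id with changed values (e.g. "name": "xxxx") and re-query the Hive table.
```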
  • [Other] Is there a GUI tool that can connect to MRS Hive and run Hive SQL?
    Is there a GUI tool that can connect to MRS Hive and execute Hive SQL?
  • [Question] hive-1.1.0 fails to compile on Kylin V10
    [Module] Compiling hive-1.1.0 by following Huawei's porting guide.
[Steps & symptom] The steps follow https://support.huaweicloud.com/prtg-cdh-kunpengbds/kunpenghivecdh5121_02_0009.html
[Screenshots] (not included)
[Logs] (optional; log content or attachments)
  • [O&M tips] Hive msck error case
    1. Symptom
Running msck repair table table_name in Hive fails with: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1).
2. Diagnosis
Checking the HiveServer log /var/log/Bigdata/hive/hiveserver/hive.log shows that a directory name does not match the partition naming format.
3. Solution
(1) Delete the offending file or directory.
(2) set hive.msck.path.validation=skip to skip invalid directories.
(3) set hive.msck.path.validation=ignore to skip the validation entirely (not recommended).
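A minimal sketch of option (2), assuming the affected table is named demo.t1 (a placeholder):

```sql
-- Skip directories whose names do not match the partition format, then repair the metadata
SET hive.msck.path.validation=skip;
MSCK REPAIR TABLE demo.t1;

-- Confirm the recovered partitions
SHOW PARTITIONS demo.t1;
```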
  • [O&M tips] MRS 3.x: error when Hive queries an ORC file
    1. Symptom
A Hive query against an ORC file fails with: Error: java.io.IOException: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 2 kind LENGTH position: 6 length: 6 range: 0 offset: 16 limit: 16 range 0 = 0 to 6 uncompressed: 3 to 3.
2. Diagnosis
(1) Check the HiveServer log /var/log/Bigdata/hive/hiveserver/hive.log.
(2) With the default setting supports.orc.different.field.names=true, Hive matches ORC file data to the table by field count. When the table field count matches the ORC file, data with different field names can still be displayed, but when the table has more fields than the ORC file, the mismatched field types cause this error.
3. Solution
(1) Recreate the table so that it matches the ORC file.
(2) set hive.supports.orc.different.field.names=false to disable support for different field names; table fields whose names do not exist in the ORC file are then shown as null.
  • [O&M tips] Creating a table with Chinese column names fails when Hive uses an external MySQL metastore
    1. Symptom
With Hive's metadata stored in an external MySQL database, creating a table that contains Chinese column names fails with: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO "COLUMNS_V2" ("CD_ID","COMMENT","COLUMN_NAME","TYPE_NAME","INTEGER_IDX") VALUES (?,?,?,?,?) ) (state=08S01,code=1).
2. Diagnosis
The character set of the columns_v2 table in MySQL turns out to be latin1.
3. Solution
Run the following command in MySQL: alter table columns_v2 convert to character set utf8;
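A short MySQL sketch for checking the character set before and after the fix (standard MySQL statements, run against the Hive metastore database):

```sql
-- Inspect the current character set of the metastore table (expect latin1 before the fix)
SHOW CREATE TABLE columns_v2;

-- Convert the table and its string columns to utf8, then retry the CREATE TABLE in Hive
ALTER TABLE columns_v2 CONVERT TO CHARACTER SET utf8;
```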
  • [Secondary development] [HD 6.5.1] [Hive connection failure] Failed to obtain a Hive connection via JDBC
    [Module] Obtaining a Hive connection via JDBC fails.
[Steps & symptom]
1. Obtaining a Hive connection via JDBC keeps failing with Hadoop-related errors; could someone help identify the cause?
[Screenshots] (not included)
[Logs] (optional; log content or attachments)
  • [Components] Hive on MapReduce job error
    MRS 3.0.2: a Hive SQL statement fails. Could an expert help identify where the problem is?
Hive log:
```
WARN  : Shutting down task : Stage-11:MAPRED
ERROR : Ended Job = job_1616739725962_0005 with exception 'org.apache.hadoop.yarn.exceptions.YarnRuntimeException(java.lang.InterruptedException: sleep interrupted)'
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException: sleep interrupted
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:350)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:398)
    at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:904)
    at org.apache.hadoop.mapreduce.Job$6.run(Job.java:736)
    at org.apache.hadoop.mapreduce.Job$6.run(Job.java:733)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:733)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.getTaskCompletionEvents(JobClient.java:355)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.computeReducerTimeStatsPerJob(HadoopJobExecHelper.java:634)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:592)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:650)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:78)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:347)
    ... 16 more
```
YARN log excerpt:
```
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000000_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000001_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000002_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000003_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000004_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000005_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000006_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000007_0 | RMContainerAllocator.java:420
2021-03-26 14:41:56,056 | INFO  | Thread-66 | Processing the event EventType: CONTAINER_DEALLOCATE | RMContainerAllocator.java:404
2021-03-26 14:41:56,056 | ERROR | Thread-66 | Could not deallocate container for task attemptId attempt_1616739725962_0004_r_000008_0 | RMContainerAllocator.java:420
2021-03-26 14:42:02,002 | INFO  | AsyncDispatcher event handler | Num completed Tasks: 40 | JobImpl.java:2021
2021-03-26 14:42:02,002 | INFO  | Socket Reader #4 for port 27102 | Socket Reader #4 for port 27102: readAndProcess from client 10.114.10.75:45628 threw exception [java.io.IOException: Connection reset by peer] | Server.java:1383
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
    at org.apache.hadoop.ipc.Server.channelRead(Server.java:3486)
    at org.apache.hadoop.ipc.Server.access$2700(Server.java:140)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:2173)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:1376)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1232)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1203)
2021-03-26 14:42:02,002 | INFO  | AsyncDispatcher event handler | Num completed Tasks: 41 | JobImpl.java:2021
2021-03-26 14:42:02,002 | INFO  | RMCommunicator Allocator | Received completed container container_e06_1616739725962_0005_01_000014 | RMContainerAllocator.java:911
2021-03-26 14:42:02,002 | INFO  | AsyncDispatcher event handler | Diagnostics report from attempt_1616739725962_0005_m_000053_0: [2021-03-26 14:42:02.336]Container killed by the ApplicationMaster. [2021-03-26 14:42:02.617]Container killed on request. Exit code is 143 [2021-03-26 14:42:02.678]Container exited with a non-zero exit code 143. | TaskAttemptImpl.java:2604
2021-03-26 14:42:02,002 | INFO  | RMCommunicator Allocator | After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:25 AssignedReds:0 CompletedMaps:77 CompletedReds:0 ContAlloc:69 ContRel:0 HostLocal:69 RackLocal:0 | RMContainerAllocator.java:1667
2021-03-26 14:42:02,002 | INFO  | AsyncDispatcher event handler | Diagnostics report from attempt_1616739725962_0005_m_000003_0: [2021-03-26 14:42:01.253]Container killed by the ApplicationMaster. [2021-03-26 14:42:01.458]Container killed on request. Exit code is 143 [2021-03-26 14:42:01.889]Container exited with a non-zero exit code 143. | TaskAttemptImpl.java:2604
2021-03-26 14:42:04,004 | INFO  | RMCommunicator Allocator | Received completed container container_e06_1616739725962_0006_01_000015 | RMContainerAllocator.java:911
2021-03-26 14:42:04,004 | ERROR | RMCommunicator Allocator | Container complete event for unknown container container_e06_1616739725962_0006_01_000015 | RMContainerAllocator.java:919
2021-03-26 14:42:04,004 | INFO  | RMCommunicator Allocator | Received completed container container_e06_1616739725962_0006_01_000018 | RMContainerAllocator.java:911
2021-03-26 14:42:04,004 | ERROR | RMCommunicator Allocator | Container complete event for unknown container container_e06_1616739725962_0006_01_000018 | RMContainerAllocator.java:919
2021-03-26 14:42:04,004 | INFO  | RMCommunicator Allocator | Received completed container container_e06_1616739725962_0006_01_000020 | RMContainerAllocator.java:911
2021-03-26 14:42:04,004 | ERROR | RMCommunicator Allocator | Container complete event for unknown container container_e06_1616739725962_0006_01_000020 | RMContainerAllocator.java:919
2021-03-26 14:42:04,004 | INFO  | RMCommunicator Allocator | Got allocated containers 16 | RMContainerAllocator.java:1186
```