• [openEuler] Installing and Deploying Hive 2.3.1 on openEuler 20.03-LTS-SP1
    I. Check the system information
II. Install Java
Note: do not install JDK 1.7 or earlier, or Hive will fail to start. JDK download link: https://www.oracle.com/cn/java/technologies/javase/javase-jdk8-downloads.html
Download the JDK package and extract it to /usr/java:
tar zxf jdk-8u301-linux-x64.tar.gz -C /usr/java/
Configure the environment variables:
vim /etc/profile
Add the following to the file:
export JAVA_HOME=/usr/java/jdk1.8.0_301
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Apply the environment variables immediately:
source /etc/profile
Check that Java is installed correctly:
java -version
III. Install Hadoop
Download Hadoop:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Extract it to /usr/local and create a symbolic link in that directory:
tar zxf hadoop-2.9.2.tar.gz -C /usr/local
cd /usr/local
ln -sv hadoop-2.9.2 hadoop
Configure the environment variables:
vim /etc/profile
Add the following to the file:
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
Apply the environment variables:
source /etc/profile
Check that Hadoop is installed correctly:
hadoop version
IV. Install Hive
Install and start MariaDB:
yum install mariadb*
systemctl start mariadb.service
Log in to MySQL, set a password, and grant privileges:
mysql -uroot -p    # no password on first login; just press Enter
set password=password('your-password');
grant all on *.* to root@'%' identified by 'your-password';
flush privileges;
Configure my.cnf:
vim /etc/my.cnf
Add the following:
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
Restart MariaDB:
systemctl restart mariadb.service
Download the Apache Hive 2.3.1 package:
wget https://archive.apache.org/dist/hive/hive-2.3.1/apache-hive-2.3.1-bin.tar.gz
Extract it to /usr/local and create a symbolic link:
tar zxf apache-hive-2.3.1-bin.tar.gz -C /usr/local/
cd /usr/local
ln -sv apache-hive-2.3.1-bin hive
Configure the environment variables:
vim /etc/profile
Add the following:
# hive
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
Apply the environment variables:
source /etc/profile
Check that Hive is installed correctly:
hive --version
Copy the configuration file template:
cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
Add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_301
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
Copy the hive-site.xml template:
cp hive-default.xml.template hive-site.xml
sed -i 's/for&#/for/g' hive-site.xml
vim hive-site.xml
Find the following name tags and change their values:
<name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<name>javax.jdo.option.ConnectionDriverName</name> <value>org.mariadb.jdbc.Driver</value>
<name>javax.jdo.option.ConnectionUserName</name> <value>root</value>
<name>javax.jdo.option.ConnectionPassword</name> <value>your database password</value>
<name>hive.exec.local.scratchdir</name> <value>/tmp/hive</value>
<name>hive.downloaded.resources.dir</name> <value>/tmp/${hive.session.id}_resources</value>
<name>hive.querylog.location</name> <value>/tmp/hive</value>
Download the JDBC driver and place it in /usr/local/hive/lib:
cd /usr/local/hive/lib
wget https://downloads.mariadb.com/Connectors/java/connector-java-2.3.0/mariadb-java-client-2.3.0.jar
Create the Hive data directories and set the directory permissions:
/usr/local/hadoop/bin/hadoop fs -mkdir /tmp
/usr/local/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
/usr/local/hadoop/bin/hadoop fs -chmod g+w /tmp
/usr/local/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
Create the Hive log directory:
mkdir -p /usr/local/hive/log/
touch /usr/local/hive/log/hiveserver.log
touch /usr/local/hive/log/hiveserver.err
Initialize Hive:
schematool -dbType mysql -initSchema
Start Hive. At this point the Hive 2.3.1 installation is complete.
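    As an optional smoke test after starting Hive, the minimal sketch below connects over JDBC and lists the databases. It assumes HiveServer2 has been started (for example with $HIVE_HOME/bin/hiveserver2, which this post does not cover) and that the hive-jdbc driver and its dependencies are on the classpath; host, port, and user are placeholders to adjust.
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSmokeTest {
    public static void main(String[] args) throws Exception {
        // HiveServer2 listens on port 10000 by default; adjust host/port/user as needed.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                // Print each database name returned by HiveServer2.
                System.out.println("database: " + rs.getString(1));
            }
        }
    }
}
```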
  • [openEuler] Guide to Deploying Apache Hive 2.0.x on openEuler 20.03 LTS SP1
    I. Environment information
II. Install the JDK
yum install java-1.8.0-openjdk*
Configure the environment variables:
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin
Apply the environment variables:
source /etc/profile
Check the JDK version:
java -version
III. Install and deploy Hadoop
Note: Hive 2.x should be paired with Hadoop 2.x.
Download:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.6.5/hadoop-2.6.5.tar.gz
Extract to /usr/local:
tar -xzvf hadoop-2.6.5.tar.gz -C /usr/local
Configure the environment variables:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the environment variables:
source /etc/profile
Create a symbolic link:
cd /usr/local/
ln -s hadoop-2.6.5/ hadoop
Check the Hadoop version:
hadoop version
IV. Install and deploy Apache Hive
First install and start MariaDB:
yum install mariadb*
systemctl start mariadb.service
Check that it started:
systemctl status mariadb.service
Log in to MySQL, set a password (not too simple), and grant privileges:
mysql -uroot -p    # no password on first login; just press Enter
set password=password('qwer1234!@#$');
grant all on *.* to root@'%' identified by 'qwer1234!@#$';
flush privileges;
Edit my.cnf:
vim /etc/my.cnf
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
Restart MariaDB:
systemctl restart mariadb
Get the Hive 2.0.1 package:
wget https://archive.apache.org/dist/hive/hive-2.0.1/apache-hive-2.0.1-bin.tar.gz
Extract to /usr/local:
tar -xzvf apache-hive-2.0.1-bin.tar.gz -C /usr/local/
Create a symbolic link:
cd /usr/local
ln -s apache-hive-2.0.1-bin hive
Configure the environment variables:
vim /etc/profile
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
Apply the environment variables:
source /etc/profile
Check the Hive version:
hive --version
Modify the configuration files:
cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
Copy hive-site.xml so the settings take effect:
cp hive-default.xml.template hive-site.xml
sed -i 's/for&#/for/g' hive-site.xml
vim hive-site.xml
Add the following, paying attention to the format:
<property> <name>javax.jdo.option.ConnectionUserName</name> <value>root</value> </property>
<property> <name>javax.jdo.option.ConnectionPassword</name> <value>qwer1234!@#$</value> </property>
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value> </property>
<property> <name>javax.jdo.option.ConnectionDriverName</name> <value>org.mariadb.jdbc.Driver</value> </property>
<property> <name>hive.metastore.schema.verification</name> <value>false</value> </property>
If startup later reports errors, delete the other contents of the file and keep only this part.
Download the JDBC driver and place it in /usr/local/hive/lib:
cd /usr/local/hive/lib
wget https://downloads.mariadb.com/Connectors/java/connector-java-2.3.0/mariadb-java-client-2.3.0.jar
Create the Hive data directories:
/usr/local/hadoop/bin/hadoop fs -mkdir /tmp
/usr/local/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
/usr/local/hadoop/bin/hadoop fs -chmod g+w /tmp
/usr/local/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
Create the Hive log directory:
mkdir -p /usr/local/hive/log/
touch /usr/local/hive/log/hiveserver.log
touch /usr/local/hive/log/hiveserver.err
Initialize Hive:
schematool -dbType mysql -initSchema
Start Hive. At this point the Apache Hive 2.0.x deployment is complete.
V. Basic Hive operations
1. List databases:
show databases;
2. Create a database:
create database if not exists euler;
3. Create a test table:
create table test(
  name string,
  friends array<string>,
  children map<string, int>,
  address struct<street:string, city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Clause explanations:
row format delimited fields terminated by ',' -- column delimiter
collection items terminated by '_' -- delimiter between elements of MAP, STRUCT, and ARRAY values
map keys terminated by ':' -- delimiter between key and value inside a MAP
lines terminated by '\n' -- row delimiter
4. Create a local test file test.txt:
ongsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
5. Load the text data into the test table:
load data local inpath "/root/test.txt" into table test;
6. Access the data in the three collection column types; the query below shows the ARRAY, MAP, and STRUCT access syntax respectively:
select friends[1],children['xiao song'],address.city from test where name="ongsong";
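    For reference, the collection-type access shown in step 6 works the same way over JDBC. The sketch below is a minimal client, assuming HiveServer2 is running on the default port, the hive-jdbc driver is on the classpath, and the test table was created in the euler database (adjust the database in the URL if it actually lives in default).
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveComplexTypeQuery {
    public static void main(String[] args) throws Exception {
        // Database name assumed to be euler; change it if the table was created elsewhere.
        String url = "jdbc:hive2://localhost:10000/euler";
        String sql = "select friends[1], children['xiao song'], address.city "
                   + "from test where name = 'ongsong'";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // ARRAY element, MAP value, and STRUCT field all come back as column values.
                System.out.printf("friend=%s, child_age=%s, city=%s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
            }
        }
    }
}
```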
  • [Enablement Learning] HetuEngine Learning 3: Creating a DWS Data Source and Running Cross-Warehouse Converged Analysis with Hive
    Following the diagram above, this lesson covers link 4: how to configure a DWS external data source and then, via link 1 (direct client access to HetuEngine), run cross-warehouse converged analysis across links 2 and 4, i.e. across Hive and DWS.
1. Add the DWS data source on the HetuEngine HSConsole page. For the configuration, see the product documentation: "Component Operation Guide" -> "HetuEngine" -> "Configuring Data Sources on HetuEngine HSConsole" -> "Configuring a JDBC Data Source".
2. Log in to the DWS database and create a new database and test table with the following statements:
CREATE USER developuser WITH PASSWORD "Bigdata@123";
GRANT ALL PRIVILEGES TO developuser;
CREATE DATABASE test;
CREATE TABLE test.jizhan (
  rowid bigint(20),
  province varchar(20) DEFAULT '',
  jizhanid bigint(20) DEFAULT NULL,
  date varchar(20) DEFAULT NULL,
  PRIMARY KEY (`rowid`)
);
insert into jizhan (rowid, province, jizhanid, date) values('12345','湖南',1,'2021-3-29 17:23:55');
insert into jizhan (rowid, province, jizhanid, date) values('22345','湖北',2,'2021-3-28 16:23:55');
insert into jizhan (rowid, province, jizhanid, date) values('32345','河南',1,'2021-3-27 15:23:55');
insert into jizhan (rowid, province, jizhanid, date) values('42345','河北',1,'2021-3-26 14:23:55');
insert into jizhan (rowid, province, jizhanid, date) values('52345','北京',1,'2021-3-25 13:23:55');
3. Log in to the hetu client:
hetu-cli
select * from dws.public.jizhan;
select * from hive.default.people;
select * from hive.default.people h1 inner join dws.public.jizhan h2 on h1.rowid=h2.rowid;
Video
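    As an aside, the same cross-warehouse join can be issued from any JDBC client instead of hetu-cli. The sketch below is a generic JDBC runner; the HetuEngine JDBC driver JAR, URL, user, and password are deliberately passed in as command-line arguments rather than hard-coded, because they depend on the FusionInsight client package (see the product documentation), so nothing cluster-specific is assumed here.
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CrossWarehouseJoin {
    // args[0] = JDBC URL for HetuEngine, args[1] = user, args[2] = password.
    // Put the HetuEngine JDBC driver JAR from the client package on the classpath;
    // modern drivers self-register via the JDBC ServiceLoader mechanism.
    public static void main(String[] args) throws Exception {
        String sql = "select * from hive.default.people h1 "
                   + "inner join dws.public.jizhan h2 on h1.rowid = h2.rowid";
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    row.append(rs.getString(i)).append(i < cols ? " | " : "");
                }
                System.out.println(row);
            }
        }
    }
}
```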
  • [Enablement Learning] HetuEngine Learning 2: Creating Hive Sample Data and Running Cross-Source Converged Analysis with HBase
    Following the diagram above, this lesson uses link 1 (client access to HetuEngine) to run cross-source converged analysis over the Hive and HBase data sources of links 2 and 3.
1. Prepare the Hive sample data.
Log in on the client node and load the client environment:
source /opt/client/bigdata_env
Create an HDFS path:
hdfs dfs -mkdir -p /tmp/people
Upload the data file:
hdfs dfs -put people.txt /tmp/people/
Contents of people.txt:
12345|jack|m|33|
22345|gary|m|44|
32345|eddi|m|55|
42345|lina|f|66|
52345|zhou|f|77|
Log in to the Hive client with beeline and create the table:
CREATE EXTERNAL TABLE `people`(
  `rowid` string,
  `name` string,
  `sex` string,
  `age` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='|',
  'serialization.format'='|')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hacluster/tmp/people'
TBLPROPERTIES (
  'STATS_GENERATED_VIA_STATS_TASK'='workaround for potential lack of HIVE-12730',
  'bucketing_version'='2',
  'serialization.null.format'='',
  'transient_lastDdlTime'='1616490878');
2. Log in to the hetu client:
hetu-cli
select * from hbase.default.test6;
select * from hive.default.people;
select * from hive.default.people h1 inner join hbase.default.test6 h2 on h1.rowid=h2.rowid;
Video
  • [openEuler] Installing Hive 2.1.1 on openEuler 20.03-LTS-SP1 and Upgrading Hive to 2.2.0
    Introduction
Hive is a data warehouse tool built on Hadoop for data extraction, transformation, and loading; it provides a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. Hive maps structured data files to database tables and offers SQL querying, translating SQL statements into MapReduce jobs. Its advantage is a low learning curve: SQL-like statements give you quick MapReduce-based statistics without writing dedicated MapReduce programs, which makes Hive well suited to statistical analysis over a data warehouse.
Deployment notes
This document installs Hive 2.1.1 on openEuler 20.03-LTS-SP1 and then upgrades Hive (using 2.2.0 as the example). Hive requires JDK, ZooKeeper, and Hadoop to be installed first. To stay compatible with both Hive 2.1.x and 2.2.x, this guide uses the default JDK 1.8.0, ZooKeeper 3.7.0, and Hadoop 2.9.0, deployed and tested on a single node.
Environment
Property: Value
OS: openEuler-20.03-LTS-SP1
CPU architecture: x86_64
Specification: 4 vCPUs / 8 GB memory / 40 GB disk
Installation and deployment
1. Confirm the environment and architecture
cat /etc/os-release
uname -i
2. Install the JDK
yum install java-1.8.0-openjdk*
Configure the environment variables:
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin
# apply the environment variables
source /etc/profile
Check that the JDK is installed:
java -version
3. Install ZooKeeper
Download the ZooKeeper 3.7.0 package:
wget https://downloads.apache.org/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz
Extract to /usr/local:
tar -xzvf apache-zookeeper-3.7.0-bin.tar.gz -C /usr/local/
Create a link to ease version switching:
cd /usr/local/
ln -s apache-zookeeper-3.7.0-bin/ zookeeper
Configure the environment variables:
vim /etc/profile
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
# apply the environment variables
source /etc/profile
Modify the ZooKeeper configuration file:
cd /usr/local/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# change the data directory
dataDir=/usr/local/zookeeper/tmp
Create the data directory:
mkdir /usr/local/zookeeper/tmp
Start ZooKeeper:
cd /usr/local/zookeeper/bin
./zkServer.sh start
4. Install Hadoop (Hadoop 3.3.1 turned out to be incompatible with Hive 2.1.1, so version 2.9.0 is used instead)
Get the Hadoop package:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz
Extract to /usr/local:
tar -xzvf hadoop-2.9.0.tar.gz -C /usr/local
Create a link to ease upgrades:
cd /usr/local/
ln -s hadoop-2.9.0/ hadoop
Configure the environment variables:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# apply the environment variables
source /etc/profile
Check the Hadoop version:
hadoop version
5. Install Hive
Hive stores its metadata in a database, so install the database first and configure its connection details in hive-site.xml. MariaDB is used here as the example.
Install and start MariaDB:
yum install mariadb*
systemctl start mariadb.service
Log in to MySQL, set a password, and grant privileges:
mysql -uroot -p    # no password on first login; just press Enter
set password=password('xxxxxxx');
grant all on *.* to root@'%' identified by 'xxxxxxxx';
flush privileges;
Edit my.cnf:
vim /etc/my.cnf
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
Restart MariaDB:
systemctl restart mariadb
Get the Hive 2.1.1 package:
wget https://archive.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
Extract to /usr/local:
tar -xzvf apache-hive-2.1.1-bin.tar.gz -C /usr/local/
Create a link to ease later upgrades:
cd /usr/local
ln -s apache-hive-2.1.1-bin hive
Configure the environment variables:
vim /etc/profile
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
# apply the environment variables
source /etc/profile
Check the Hive version:
hive --version
Modify the configuration files:
cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
cp hive-default.xml.template hive-site.xml
sed -i 's/for&#/for/g' hive-site.xml
vim hive-site.xml
<name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<name>javax.jdo.option.ConnectionDriverName</name> <value>org.mariadb.jdbc.Driver</value>
<name>javax.jdo.option.ConnectionUserName</name> <value>root</value>
<name>javax.jdo.option.ConnectionPassword</name> <value>xxxxxxxxxxx</value>
<name>hive.exec.local.scratchdir</name> <value>/tmp/hive</value>
<name>hive.downloaded.resources.dir</name> <value>/tmp/${hive.session.id}_resources</value>
<name>hive.querylog.location</name> <value>/tmp/hive</value>
Download the JDBC driver and place it in /usr/local/hive/lib:
cd /usr/local/hive/lib
wget https://downloads.mariadb.com/Connectors/java/connector-java-2.3.0/mariadb-java-client-2.3.0.jar
Create the Hive data directories:
/usr/local/hadoop/bin/hadoop fs -mkdir /tmp
/usr/local/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
/usr/local/hadoop/bin/hadoop fs -chmod g+w /tmp
/usr/local/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
Create the Hive log directory:
mkdir -p /usr/local/hive/log/
touch /usr/local/hive/log/hiveserver.log
touch /usr/local/hive/log/hiveserver.err
Initialize Hive:
schematool -dbType mysql -initSchema
At this point the Hive 2.1.1 installation is complete.
6. Upgrade Hive to 2.2.0
After Hive 2.1.1 is installed, a later Hive version can be adopted by repointing the /usr/local/hive symbolic link and redoing the Hive configuration steps above; the environment variables need no changes. Upgrading to 2.2.0 is used as the example; the installed JDK, ZooKeeper, Hadoop, and MariaDB stay unchanged.
Download the Hive 2.2.0 package:
wget https://archive.apache.org/dist/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz
Extract to /usr/local/:
tar -xzvf apache-hive-2.2.0-bin.tar.gz -C /usr/local/
Remove the old link and create a new one:
cd /usr/local/
rm -rf hive
ln -s apache-hive-2.2.0-bin hive
Then repeat the Hive configuration steps:
cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
cp hive-default.xml.template hive-site.xml
sed -i 's/for&#/for/g' hive-site.xml
vim hive-site.xml
<name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<name>javax.jdo.option.ConnectionDriverName</name> <value>org.mariadb.jdbc.Driver</value>
<name>javax.jdo.option.ConnectionUserName</name> <value>root</value>
<name>javax.jdo.option.ConnectionPassword</name> <value>xxxxxxxxxxx</value>
<name>hive.exec.local.scratchdir</name> <value>/tmp/hive</value>
<name>hive.downloaded.resources.dir</name> <value>/tmp/${hive.session.id}_resources</value>
<name>hive.querylog.location</name> <value>/tmp/hive</value>
Back up the database before initialization, then migrate it into a fresh database:
mysqldump -uroot -pxxxxxxx hive > /tmp/my_backup.sql
drop database hive;
create database hive;
mysql -uroot -pxxxxxxxx hive < /tmp/my_backup.sql
Then initialize Hive:
schematool -dbType mysql -initSchema
At this point the upgrade to Hive 2.2.0 is complete.
  • [O&M Management] [FI Product] Hive query error
    A Hive command returns an error; screenshots are attached below:
  • [Flink] Authentication issue when Flink uses HiveCatalog to access Hive
    Problem description: Could an expert advise how Flink 1.10 on the current FI version should perform ZooKeeper authentication when using HiveCatalog? Sample code would be ideal, thanks. The official code is shown below, but it contains no authentication-related classes or methods:
EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);
String name = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir = "/opt/hive-conf"; // a local path
String version = "2.3.4";
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog("myhive", hive);
// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive");
How should the job authenticate on the Flink cluster in order to reach the Hive that sits behind ZooKeeper?
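    Not an official FusionInsight answer — the sketch below only illustrates the order of operations a generic Kerberos-secured Hadoop client follows before registering a HiveCatalog: log in from a keytab with UserGroupInformation and point the JVM at a JAAS configuration for the ZooKeeper client, then create the catalog as in the official snippet. The principal, keytab path, JAAS file path, and the zookeeper.server.principal value are placeholders; the exact property names and values must be taken from the cluster's client configuration.
```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHiveCatalogSketch {
    public static void main(String[] args) throws Exception {
        // 1. Kerberos login with the user keytab before touching Hive or ZooKeeper.
        Configuration hadoopConf = new Configuration();
        hadoopConf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(hadoopConf);
        UserGroupInformation.loginUserFromKeytab(
                "developuser@HADOOP.COM",     // hypothetical principal
                "/opt/client/user.keytab");   // hypothetical keytab path

        // 2. ZooKeeper client-side JAAS settings (values below are placeholders;
        //    check the FI client configuration for the real ones).
        System.setProperty("java.security.auth.login.config", "/opt/client/jaas.conf");
        System.setProperty("zookeeper.server.principal", "zookeeper/hadoop.hadoop.com");

        // 3. Register the HiveCatalog exactly as in the official snippet above.
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);
        HiveCatalog hive = new HiveCatalog("myhive", "mydatabase", "/opt/hive-conf", "2.3.4");
        tableEnv.registerCatalog("myhive", hive);
        tableEnv.useCatalog("myhive");
    }
}
```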
  • [Help Wanted] [MRS Hive component] [Using Hive to read relational database data] "ADD JAR hdfs:///tmp/ojdbc6.jar;" reports an error
    [Feature module] Using Hive to read data from a relational database
[Steps & symptoms]
1. In security mode, following the example, executing "ADD JAR hdfs:///tmp/ojdbc6.jar;" fails.
2. Error message: Error: org.apache.ranger.authorization.spark.authorizer.SparkAccessControlException: Permission denied: user [pocuser] does not have [TEMPUDFADMIN] privilege on [hdfs:///tmp/ojdbc6.jar] (state=,code=0)
[Screenshots]
1. Information in the Ranger UI:
2. The error occurs at step 6 of the example in the product manual:
3. Error when executed in spark-beeline:
4. Error when executed in beeline:
[Logs] (optional; upload log content or attachments)
  • [Help Wanted] How to migrate a large volume of GaussDB data to Hive
    A question for the experts: our project has roughly 2,000+ tables in GaussDB, and we need to synchronize their data into Hive. Huawei support recommended the Loader component, but for a bulk migration of this scale it is far too cumbersome. Has anyone here done this kind of migration? Any advice would be appreciated, thanks!
  • [Help Wanted] Merging small files in Hive
    At a GA site, an ISV has generated a large number of small files while using Hive; the file format is ORC. Is there a good way to merge these Hive small files?
  • [Sharing Station] [Cloud Mini-Lesson] Big Data Converged Analytics: Easily Import an MRS Hive Data Source into GaussDB(DWS)
    In the era of big data converged analytics, how can GaussDB(DWS) access an MRS data source? This Cloud Mini-Lesson opens that door: import data into DWS by remotely reading an ORC table stored in Hive on an MRS cluster.
Prepare the environment
Make sure the MRS and DWS clusters are in the same region, availability zone, and VPC subnet so that the cluster networks can communicate.
Basic flow
1. Create an MRS analysis cluster (with the Hive component).
2. Upload a local txt data file to an OBS bucket, import it into Hive from the OBS bucket, and load it from the TEXT table into an ORC table.
3. Create an MRS data source connection.
4. Create a foreign server.
5. Create a foreign table.
6. Import the data into a DWS local table through the foreign table.
Create the MRS analysis cluster
Log in to the Huawei Cloud console, choose "EI Enterprise Intelligence > MapReduce Service", click "Buy Cluster", choose "Custom Config", fill in the software settings, and click "Next".
Table 1 Software settings
Region: CN North-Beijing4
Cluster name: MRS01
Cluster version: MRS 3.0.5
Cluster type: Analysis cluster
Fill in the hardware settings and click "Next".
Table 2 Hardware settings
Billing mode: Pay-per-use
AZ: AZ2
VPC: vpc-01
Subnet: subnet-01
Security group: Auto create
EIP: 10.x.x.x
Enterprise project: default
Master nodes: enable "Cluster HA"
Analysis Core nodes: 3
Analysis Task nodes: 0
Fill in the advanced settings, click "Buy Now", and wait about 15 minutes for the cluster to be created.
Table 3 Advanced settings
Tag: test01
Agency: keep the default
Alarm: keep the default
Rule name: keep the default
Topic name: keep the default
Kerberos authentication: enabled by default
Username: admin
Password: set a password, e.g. Huawei@12345; it is used to log in to the cluster management page
Confirm password: enter the admin password again
Login mode: password
Username: root
Password: set a password, e.g. Huawei_12345; it is used to log in to the ECS nodes remotely
Confirm password: enter the root password again
Secure communications: select "Confirm"
Prepare the ORC table data source on MRS
On your local PC, create a file product_info.txt, copy the following data into it, and save it locally.
100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good
Log in to the OBS console, click "Create Bucket", fill in the following parameters, and click "Create Now".
Table 4 Bucket parameters
Region: CN North-Beijing4
Data redundancy policy: Single-AZ storage
Bucket name: mrs-datasource
Storage class: Standard
Bucket policy: Private
Default encryption: Disabled
Direct reading of archived data: Disabled
Enterprise project: default
Tags: -
Once the bucket is created, click its name, choose "Objects > Upload Object", and upload product_info.txt to the bucket.
Switch back to the MRS console, click the newly created MRS cluster name to open "Overview", click "Synchronize" in the "IAM User Sync" row, and wait about 5 minutes for synchronization to finish.
Back on the MRS cluster page, click "Nodes", click any master node to open its page, switch to "EIPs", click "Bind EIP", select an existing EIP and click "OK" (create one if none exists). Record this public IP.
Identify the active master node. Use an SSH tool to log in to the node above as root (password Huawei_12345) and switch to the omm user:
su - omm
Run the following command; in the output, the node whose "HAActive" value is "active" is the active master node:
sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh
Log in to the active master node as root, switch to the omm user, and go to the Hive client directory:
su - omm
cd /opt/client
Create the TEXTFILE table product_info in Hive.
In the /opt/client directory, load the environment variables:
source bigdata_env
Log in to the Hive client:
beeline
Run the following SQL statements to create the demo database and the product_info table:
CREATE DATABASE demo;
USE demo;
DROP TABLE product_info;
CREATE TABLE product_info
(
  product_price int not null,
  product_id char(30) not null,
  product_time date,
  product_level char(10),
  product_name varchar(200),
  product_type1 varchar(20),
  product_type2 char(10),
  product_monthly_sales_cnt int,
  product_comment_time date,
  product_comment_num int,
  product_comment_content varchar(200)
)
row format delimited fields terminated by ','
stored as TEXTFILE;
Import the product_info.txt data file into Hive.
Switch back to the MRS cluster, click "Files", then "Import Data".
OBS path: select the bucket created above, locate product_info.txt, and click "Yes".
HDFS path: select /user/hive/warehouse/demo.db/product_info/ and click "Yes".
Click "OK" and wait for the import to succeed; the product_info table data is now loaded.
Create the ORC table and load the data into it.
Run the following SQL statement to create the ORC table:
DROP TABLE product_info_orc;
CREATE TABLE product_info_orc
(
  product_price int not null,
  product_id char(30) not null,
  product_time date,
  product_level char(10),
  product_name varchar(200),
  product_type1 varchar(20),
  product_type2 char(10),
  product_monthly_sales_cnt int,
  product_comment_time date,
  product_comment_num int,
  product_comment_content varchar(200)
)
row format delimited fields terminated by ','
stored as orc;
Insert the product_info data into the Hive ORC table product_info_orc:
insert into product_info_orc select * from product_info;
Check that the ORC table data was imported:
select * from product_info_orc;
Create the MRS data source connection
Log in to the DWS management console and click the existing DWS cluster; make sure the DWS cluster is in the same region, availability zone, and VPC subnet as MRS.
Switch to "MRS Data Sources" and click "Create MRS Data Source Connection".
Select the "MRS01" data source created earlier, enter username admin and password Huawei@12345, and click "OK"; the connection is created.
Create the foreign server
Use Data Studio to connect to the DWS cluster.
Create a user dbuser with permission to create databases:
CREATE USER dbuser WITH CREATEDB PASSWORD "Bigdata@123";
Switch to the new dbuser user:
SET ROLE dbuser PASSWORD "Bigdata@123";
Create the mydatabase database:
CREATE DATABASE mydatabase;
Switch the connection to the new mydatabase database as follows: in the Data Studio "Object Browser" window, right-click the database connection name and click "Refresh" to see the new database; right-click "mydatabase" and click "Open Connection"; then right-click "mydatabase" again and click "Open New Terminal" to open a SQL window connected to that database. Run all of the remaining steps in that window.
Grant dbuser permission to create foreign servers:
GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser;
The FOREIGN DATA WRAPPER name must be hdfs_fdw; dbuser is the user who will create the SERVER.
Grant the user permission to use foreign tables:
ALTER USER dbuser USEFT;
Switch back to the postgres system database and query the foreign server that was created automatically when the MRS data source connection was added:
SELECT * FROM pg_foreign_server;
The result looks like:
 srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
---------+----------+--------+---------+------------+--------+-----------
 gsmpp_server | 10 | 13673 | | | |
 gsmpp_errorinfo_server | 10 | 13678 | | | |
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca | 16476 | 13685 | | | | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(3 rows)
Switch to the mydatabase database and to the dbuser user:
SET ROLE dbuser PASSWORD "Bigdata@123";
Create the foreign server. Keep the SERVER name, address, and configuration path the same as in the query result above:
CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca FOREIGN DATA WRAPPER HDFS_FDW
OPTIONS (
  address '192.168.1.245:9820,192.168.1.218:9820',   -- internal IPs of the MRS active/standby master nodes, reachable from DWS
  hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca',
  type 'hdfs'
);
Check the foreign server:
SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';
The following result indicates that it was created successfully:
 srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
---------+----------+--------+---------+------------+--------+-----------
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca | 16476 | 13685 | | | | {"address=192.168.1.245:9820,192.168.1.218:29820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(1 row)
Create the foreign table
Get the file path of Hive's product_info_orc table: log in to the MRS management console, choose "Clusters > Active Clusters", click the cluster name to open its basic information page, click "Files", choose "HDFS File List", browse to the directory that stores the data you want to import into the GaussDB(DWS) cluster, and record its path.
Create the foreign table. Use the foreign server name created above for SERVER and the path recorded above for foldername:
DROP FOREIGN TABLE IF EXISTS foreign_product_info;
CREATE FOREIGN TABLE foreign_product_info
(
  product_price integer not null,
  product_id char(30) not null,
  product_time date,
  product_level char(10),
  product_name varchar(200),
  product_type1 varchar(20),
  product_type2 char(10),
  product_monthly_sales_cnt integer,
  product_comment_time date,
  product_comment_num integer,
  product_comment_content varchar(200)
)
SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca
OPTIONS (
  format 'orc',
  encoding 'utf8',
  foldername '/user/hive/warehouse/demo.db/product_info_orc/'
)
DISTRIBUTE BY ROUNDROBIN;
Import the data
Create the local target table:
DROP TABLE IF EXISTS product_info;
CREATE TABLE product_info
(
  product_price integer not null,
  product_id char(30) not null,
  product_time date,
  product_level char(10),
  product_name varchar(200),
  product_type1 varchar(20),
  product_type2 char(10),
  product_monthly_sales_cnt integer,
  product_comment_time date,
  product_comment_num integer,
  product_comment_content varchar(200)
)
with (
  orientation = column,
  compression = middle
)
DISTRIBUTE BY HASH (product_id);
Import from the foreign table into the target table:
INSERT INTO product_info SELECT * FROM foreign_product_info;
Check the import result:
SELECT * FROM product_info;
To learn more about GaussDB(DWS), click here.
  • [Basic Components] [Hive] Running "set role admin;" in beeline reports an error
    [Feature module] Running set role admin; with beeline in Hive fails with:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. dev doesn't belong to role admin (state=08S01,code=1)
[Steps & symptoms]
1. Added the corresponding role as described in https://support.huaweicloud.com/trouble-mrs/mrs_03_0165.html.
2. Repeated the statement; it still fails.
3. Applied the changes described in the MRS documentation and repeated the statement; it still fails.
MRS version: 8.0.2. Hive and Ranger have already been restarted, but the problem persists.
  • [Enablement Learning] Example: Hudi Consuming Kafka via DeltaStreamer and Writing to Hive
    Scenario: use Hudi to consume Kafka data and write it to Hive. For DeltaStreamer usage, see https://hudi.apache.org/cn/docs/writing_data.html
Verification environment: FusionInsight HD 8.1.0 security cluster with the Spark2X, Hive, and Kafka services installed.
Note: the Kafka server-side setting allow.everyone.if.no.acl.found is true.
Steps:
1. Prepare the cluster (omitted).
2. Install the client and prepare a test user. Follow the product documentation; the default client installation path is /opt/client. Create the test user on the Manager page, for example user developuser with password Haosuwei123.
3. Upload the attached delta-stream-example-0.7.0.jar to /opt/haosuwei/hudi-demo/ on the client node. If you use a different directory, adjust the paths in the submit commands below.
4. SSH to the client node and run the security authentication:
```
source /opt/client/bigdata_env
echo Haosuwei123 | kinit developuser
```
5. Make sure the HDFS path /tmp/huditest does not exist; delete it if it does:
```
hdfs dfs -rm -r /tmp/huditest
```
6. Create the Kafka topic:
```
kafka-topics.sh --create --topic delta_demo --bootstrap-server 8.5.136.1:21005
```
7. Prepare the kafka-source.properties file used by DeltaStreamer. Its contents are shown below; adjust the Kafka broker address and save it to the same directory as in step 3.
```
// hudi configuration
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=y,ym,ymd
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.hive_style_partitioning=true
hoodie.delete.shuffle.parallelism=10
hoodie.upsert.shuffle.parallelism=10
hoodie.bulkinsert.shuffle.parallelism=10
hoodie.insert.shuffle.parallelism=10
hoodie.finalize.write.parallelism=10
hoodie.cleaner.parallelism=10
hoodie.datasource.write.precombine.field=id
hoodie.base.path = /tmp/huditest/delta_demo
hoodie.timeline.layout.version = 1
hoodie.datasource.write.payload.class=com.huawei.hudi.testcz.MyOverwriteNonDefaultsWithLatestAvroPayload
hoodie.compaction.payload.class=com.huawei.hudi.testcz.MyOverwriteNonDefaultsWithLatestAvroPayload
// hive config
hoodie.datasource.hive_sync.table=delta_demo
hoodie.datasource.hive_sync.partition_fields=y,ym,ymd
hoodie.datasource.hive_sync.assume_date_partitioning=false
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.use_jdbc=false
// Kafka Source topic
hoodie.deltastreamer.source.kafka.topic=delta_demo
// checkpoint
hoodie.deltastreamer.checkpoint.provider.path=hdfs://hacluster/tmp/huditest/checkpoint/
// Kafka props
bootstrap.servers=8.5.136.1:21005
auto.offset.reset=latest
group.id=group-1
offset.rang.limit=10000
```
8. Submit the Spark job. For development, submit in yarn-client mode; the command is as follows:
```
source /opt/client/bigdata_env
kinit -k -t user.keytab developuser
spark-submit --master yarn-client \
--jars /opt/client/Hudi/hudi/conf,/opt/client/Hudi/hudi/lib/*,/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--driver-class-path /opt/client/Hudi/hudi/conf:/opt/client/Hudi/hudi/lib/*:/opt/client/Spark2x/spark/jars/*:/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
spark-internal --props file:///opt/haosuwei/hudi-demo/kafka-source.properties \
--target-base-path /tmp/huditest/delta_demo \
--table-type COPY_ON_WRITE \
--target-table delta_demo \
--source-ordering-field id \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--schemaprovider-class com.huawei.hudi.deltastream.DataSchemaProviderExample \
--transformer-class com.huawei.hudi.deltastream.TransformerExample \
--enable-hive-sync --continuous
```
9. Verify inserts: use the Kafka command line to write one record to the topic with id 2 and the other fields set to 2222.
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/184535ytpfixzyuycuxwcd.png)
Check the table delta_demo that was created automatically in Hive:
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/184559terj6gfjw9gh3ocs.png)
10. Verify updates: use the Kafka command line to write one record with id 2 and the other fields set to xxxx.
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/18470182n1hygk6wabtwlm.png)
Query delta_demo in Hive again; the row has been updated:
![image.png](https://bbs-img.huaweicloud.com/data/forums/attachment/forum/202104/21/184730wh9adnm7bapzmom5.png)
[Other]
In production, submit the Spark job in yarn-cluster mode; modify the submission as follows:
1. Upload kafka-source.properties to HDFS, for example under /tmp:
```
hdfs dfs -put kafka-source.properties /tmp
```
2. Modify the spark-submit command:
```
source /opt/client/bigdata_env
source /opt/client/Hudi/component_env
kinit -k -t user.keytab developuser
spark-submit --master yarn-cluster \
--jars /opt/client/Hudi/hudi/conf,/opt/client/Hudi/hudi/lib/*,/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--driver-class-path /opt/client/Hudi/hudi/conf:/opt/client/Hudi/hudi/lib/*:/opt/client/Spark2x/spark/jars/*:/opt/haosuwei/hudi-demo/delta-stream-example-0.7.0.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
spark-internal --props hdfs://hacluster/tmp/kafka-source.properties \
--target-base-path /tmp/huditest/delta_demo \
--table-type COPY_ON_WRITE \
--target-table delta_demo \
--source-ordering-field id \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--schemaprovider-class com.huawei.hudi.deltastream.DataSchemaProviderExample \
--transformer-class com.huawei.hudi.deltastream.TransformerExample \
--enable-hive-sync --continuous
```
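    For reference, the insert and update records of steps 9 and 10 can also be produced from Java instead of the Kafka console producer. The sketch below assumes the kafka-clients JAR from the client package is on the classpath and uses the same non-authenticated broker port as above; the JSON field names other than id, y, ym, and ymd are made up for illustration and must match the Avro schema served by DataSchemaProviderExample.
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeltaDemoProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "8.5.136.1:21005"); // same broker/port as in the steps above
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical payload: id is the record key, y/ym/ymd are the partition fields
        // from kafka-source.properties; "name" stands in for the remaining schema fields.
        String record = "{\"id\":\"2\",\"name\":\"xxxx\",\"y\":\"2021\",\"ym\":\"2021-04\",\"ymd\":\"2021-04-21\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Block until the record is acknowledged so the demo fails loudly on errors.
            producer.send(new ProducerRecord<>("delta_demo", record)).get();
        }
    }
}
```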
  • [Other] Is there a GUI tool that can connect to MRS Hive and run HQL?
    Is there any graphical tool that can connect to MRS Hive and execute HQL (Hive SQL)?