  • [Core Components] During an HDFS DataNode rolling restart, RegionServer WAL writes in the HBase cluster occasionally time out and get stuck
    While the HDFS DataNodes are being restarted one at a time at a low frequency, the WAL write path on the HBase RegionServers occasionally hits the following "WAL timed out / stuck" error. How can this be resolved?

      2024-08-26 15:35:13,294 ERROR [RS_CLOSE_REGION-regionserver/cqbs028:60020-1] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
      java.lang.RuntimeException: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818811, WAL system stuck?
          at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:116)
          at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
      Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818811, WAL system stuck?
          at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:148)
          at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:711)
          at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:631)
          at org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:158)
          at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:136)
          at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:101)
          at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1145)
          at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1684)
          at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1501)
          at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)

    While a RegionServer is being stopped, stopping can also be slow because the WAL is stuck:

      java.lang.RuntimeException: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818767, WAL system stuck?
          at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:116)
          at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818767, WAL system stuck?
          at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:148)
          at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:711)
          at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:631)
          at org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:158)
          at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:136)
          at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:101)
          at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1145)
          at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1684)
          at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1501)
          at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
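    A note on mitigation: the 300000 ms in the traces matches the default of hbase.regionserver.wal.sync.timeout, and the frames go through AsyncFSWAL, i.e. the asyncfs WAL provider. A commonly tried workaround, offered here as a sketch rather than a confirmed fix for this cluster, is to switch the WAL provider to the classic filesystem (FSHLog) implementation in hbase-site.xml and restart the RegionServers:

      <!-- hbase-site.xml: a sketch, not a confirmed fix; test before production use -->
      <property>
        <name>hbase.wal.provider</name>
        <!-- default is asyncfs (AsyncFSWAL); "filesystem" selects the classic FSHLog writer -->
        <value>filesystem</value>
      </property>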
  • Enabling HDFS NodeLabel: what are the pitfalls, and which impacts need the most attention?
    Environment: FusionInsight HD 6513.
    Background:
      1. The existing DataNodes in the cluster are mostly ARM machines with high specs and fairly new hardware.
      2. A batch of low-performance, low-spec x86 hosts now needs to be added to the cluster.
    Plan: enable the HDFS NodeLabel feature, put labels on HDFS directories, and assign the newly added hosts to the directories carrying a dedicated label, so as to avoid load imbalance and similar problems caused by the heterogeneous hardware (see the CLI sketch after this item).
    Questions:
      1. Please confirm whether this plan is feasible, and whether there is a better one.
      2. If it is feasible, what should we watch out for? Are there any pitfall case studies (the more detailed the better)?
    Any help from the community experts would be much appreciated!
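    For reference, a minimal sketch of how a directory label expression is set with the HDFS NodeLabel CLI as documented for FusionInsight/MRS; the label name LabelX and the path are hypothetical placeholders, and the mapping of hosts to labels (the host2tags setting) is configured on the HDFS service side rather than via this command:

      # Bind a hypothetical directory to a hypothetical label; per the FusionInsight/MRS
      # documentation, fallback=NONE disables falling back to unlabeled nodes when the
      # labeled nodes cannot satisfy block placement
      hdfs nodelabel -setLabelExpression -expression 'LabelX[fallback=NONE]' -path /user/x86data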
  • [Help Wanted] Error when writing to GaussDB from Spark
    We use Spark to write computed results into GaussDB and get the error invalid input syntax for type oid: "xxxxx", which causes some of the rows to fail to be written. As far as I understand, oid is the system-catalog identifier for database objects, and our SQL never touches that column. What exactly does this exception mean? Could someone explain it?
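    In the PostgreSQL-compatible layer this error means a value that is not a valid oid literal is being fed into an oid-typed column or parameter; one possible cause on the Spark side is the DataFrame columns not lining up one-to-one with the target table, so a business value gets mapped onto an oid column. A minimal sketch that rules this out by writing an explicit column list over JDBC; the source path, URL, table name, and credentials are placeholders, and GaussDB is assumed to be reachable through its PostgreSQL-compatible JDBC driver:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("WriteToGaussDB").getOrCreate()

      # Hypothetical source data; replace with the real computed DataFrame
      df = spark.read.parquet("hdfs://hacluster/tmp/sandbox/result.parquet")

      # Select only the business columns, in the exact order of the target table,
      # so no value is accidentally mapped onto a system or oid-typed column
      out = df.select("name", "age", "ts")

      (out.write
          .format("jdbc")
          .option("url", "jdbc:postgresql://gauss-host:25308/mydb")  # placeholder endpoint
          .option("dbtable", "public.result_table")                  # placeholder table
          .option("user", "db_user")
          .option("password", "***")
          .option("driver", "org.postgresql.Driver")
          .mode("append")
          .save())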
  • [O&M] How long does an online upgrade from FusionInsight HD 6513 to FusionInsight HD 6517 take, and how is that estimated?
    How long does an online upgrade from FusionInsight HD 6513 to FusionInsight HD 6517 take, and how is the duration estimated?
  • [O&M] When upgrading FusionInsight HD 6513 to FusionInsight HD 6517, can some components (e.g. Kafka, ZooKeeper) be upgraded online while the rest are upgraded offline?
    When upgrading FusionInsight HD 6513 to FusionInsight HD 6517, is it supported to upgrade some components online and the remaining components offline?
  • [O&M] On an on-premises HD 6.5.1.7 cluster, what happens if the HDFS replication factor is temporarily lowered to 1 and then set back to 3?
    On an on-premises HD 6.5.1.7 cluster, what happens when the HDFS replication factor is temporarily lowered to 1 and then raised back to 3?
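    For background: changing dfs.replication in the configuration only affects files written afterwards; existing files keep their per-file replication factor until it is changed explicitly per path. Dropping existing files to 1 makes the NameNode delete the excess replicas (losing redundancy immediately), and raising them back to 3 queues re-replication work that can take a while on a large cluster. A minimal sketch of the commands involved; /tmp/testdir is a hypothetical path:

      # Temporarily drop existing files under a path to a single replica
      hdfs dfs -setrep -R 1 /tmp/testdir
      # Raise back to 3; -w waits until re-replication actually finishes
      hdfs dfs -setrep -R -w 3 /tmp/testdir
      # Inspect under-replicated blocks while the NameNode catches up
      hdfs fsck /tmp/testdir -blocks -locations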
  • [Core Components] Reading from HDFS and writing to Hudi with Python
    Python code that writes the test data to HDFS:

      import sys
      sys.path.insert(0, '/opt/140client/Spark2x/spark/python')
      sys.path.insert(0, '/opt/140client/Spark2x/spark/python/lib/py4j-0.10.9-src.zip')
      import os
      os.environ["PYSPARK_PYTHON"] = "/usr/anaconda3/bin/python3"
      import pyspark
      from pyspark.sql import SparkSession
      from pyspark import SparkConf
      from pyspark import SparkContext
      # note: this runs in a child shell and does not change this process's environment
      os.system('source /opt/140client/bigdata_env')
      from pyspark.sql.types import StructType, StructField, StringType, IntegerType

      spark = SparkSession.builder \
          .appName("Generate Parquet File") \
          .getOrCreate()

      data = [("Alice", 25, "2023-08-29"), ("Bob", 30, "2023-08-30")]
      schema = StructType([
          StructField("Name", StringType(), nullable=False),
          StructField("Age", IntegerType(), nullable=False),
          StructField("ts", StringType(), nullable=True)
      ])
      df = spark.createDataFrame(data, schema)

      output_path = "/tmp/sandbox/output.parquet"
      df.write.parquet(output_path)

    Steps: run the following commands (spark-submit options must come before the application file, otherwise they are passed to the application as arguments):

      source /opt/140client/bigdata_env
      spark-submit --master yarn --keytab /opt/sandbox/user.keytab --principal username /opt/sandbox/parquet.py

    Check the generated files, then create the Hudi table:

      create table if not exists hudi0829 (
        Name string,
        Age int,
        ts string
      ) using hudi
      location '/tmp/sandbox/hudi0829'
      options (
        type = 'mor',
        primaryKey = 'Name',
        preCombineField = 'ts'
      );

    Check the table, then read from HDFS and write into Hudi:

      import sys
      sys.path.insert(0, '/opt/140client/Spark2x/spark/python')
      sys.path.insert(0, '/opt/140client/Spark2x/spark/python/lib/py4j-0.10.9-src.zip')
      import os
      os.environ["PYSPARK_PYTHON"] = "/usr/anaconda3/bin/python3"
      sys.path.append('/opt/140client/Hudi/hudi/lib/')
      import pyspark
      from pyspark.sql import SparkSession
      from pyspark import SparkConf
      from pyspark import SparkContext
      #from hudi.config import HoodieConfig
      #from hudi.dataframe import create_hudi_dataset
      # note: this runs in a child shell and does not change this process's environment
      os.system('source /opt/140client/bigdata_env')

      spark = SparkSession.builder \
          .appName("Write Parquet to Hudi") \
          .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
          .getOrCreate()

      parquet_df = spark.read.parquet("hdfs://hacluster/tmp/sandbox/output.parquet")
      hudi_table_path = "hdfs://hacluster/tmp/sandbox/hudi0829"

      parquet_df.write \
          .format("org.apache.hudi") \
          .option("hoodie.datasource.write.recordkey.field", "Name") \
          .option("hoodie.datasource.write.partitionpath.field", "ts") \
          .option("hoodie.table.name", "hudi0829") \
          .option("hoodie.datasource.write.operation", "upsert") \
          .mode("append") \
          .save(hudi_table_path)

    Run:

      spark-submit --master yarn --keytab /opt/sandbox/user.keytab --principal username /opt/sandbox/parquet_hudi.py
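    One inconsistency worth noting in the write above, offered as a sketch rather than a verified fix: the DDL declares preCombineField = 'ts' and no PARTITIONED BY clause, while the datasource write partitions by ts and passes no precombine field. A write configuration more consistent with that DDL might look like this (the option names are standard Hudi ones; verify them against the Hudi version shipped with the cluster):

      # Sketch: align the datasource write with the non-partitioned MOR table DDL
      parquet_df.write \
          .format("org.apache.hudi") \
          .option("hoodie.datasource.write.recordkey.field", "Name") \
          .option("hoodie.datasource.write.precombine.field", "ts") \
          .option("hoodie.datasource.write.keygenerator.class",
                  "org.apache.hudi.keygen.NonpartitionedKeyGenerator") \
          .option("hoodie.table.name", "hudi0829") \
          .option("hoodie.datasource.write.operation", "upsert") \
          .mode("append") \
          .save(hudi_table_path)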
  • [O&M] On an on-premises HD 6517 cluster, does blocking port 22 between business clients and the cluster affect usage?
    On an on-premises HD 6517 cluster, does disabling port 22 between the business clients and the cluster have any impact on usage?
  • [Environment Setup] Why is it not recommended to deploy Flume and DataNode on the same node? Why is there a risk of data imbalance?
    Why is it not recommended to deploy Flume and a DataNode on the same node, and why does this create a risk of data imbalance?
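    For background: the default HDFS block placement policy writes the first replica of each block to the local DataNode when the client itself runs on a DataNode. A Flume agent colocated with a DataNode therefore lands the first replica of every ingested block on that one node, so its disks fill faster than the rest of the cluster. A short sketch for spotting and correcting the resulting skew; the threshold value is illustrative:

      # Compare per-DataNode usage to spot the skewed node
      hdfs dfsadmin -report
      # Rebalance until each DataNode is within 10% of the cluster's average utilization
      hdfs balancer -threshold 10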