During occasional HDFS DataNode restarts, the WAL write path on our HBase RegionServers sometimes gets stuck and fails with the following WAL sync timeout error. How can this be fixed?

2024-08-26 15:35:13,294 ERROR [RS_CLOSE_REGION-regionserver/cqbs028:60020-1] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
java.lang.RuntimeException: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818811, WAL system stuck?
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:116)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818811, WAL system stuck?
    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:148)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:711)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:631)
    at org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:158)
    at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:136)
    at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:101)
    at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1145)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1684)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1501)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)

Stopping a RegionServer can also be slow because the WAL gets stuck in the same way:

java.lang.RuntimeException: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818767, WAL system stuck?
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:116)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=818767, WAL system stuck?
    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:148)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:711)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:631)
    at org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:158)
    at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:136)
    at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:101)
    at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1145)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1684)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1501)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
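For context on what the traces show: while closing a region, HRegion writes a region-close marker through WALUtil, and the handler then blocks in AbstractFSWAL.blockOnSync and SyncFuture.get until the WAL reports that the marker's txid has been synced. If that sync never completes (for example, because the DataNode write pipeline stops acknowledging), the wait gives up after 300000 ms with TimeoutIOException. The sketch below is a simplified model of that wait-with-timeout pattern, assuming a lock/condition-based future. The names SyncFutureSketch, SyncTimeoutException and markDone are hypothetical, and this is not HBase's actual SyncFuture implementation.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class WalSyncSketch {

    /** Hypothetical stand-in for HBase's TimeoutIOException (not the real class). */
    static class SyncTimeoutException extends IOException {
        SyncTimeoutException(String msg) { super(msg); }
    }

    /**
     * Simplified model of a WAL sync future: the writer appends an edit with a
     * txid, then blocks until a sync thread reports the txid as durable,
     * or until the timeout expires.
     */
    static class SyncFutureSketch {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition doneCond = lock.newCondition();
        private final long txid;
        private boolean done = false;

        SyncFutureSketch(long txid) { this.txid = txid; }

        /** Called by the (hypothetical) sync thread once the pipeline has acked. */
        void markDone() {
            lock.lock();
            try {
                done = true;
                doneCond.signalAll();
            } finally {
                lock.unlock();
            }
        }

        /** Waits up to timeoutMs for the sync; throws if it never completes. */
        long get(long timeoutMs) throws IOException, InterruptedException {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            lock.lock();
            try {
                while (!done) { // loop also guards against spurious wakeups
                    if (remainingNanos <= 0) {
                        throw new SyncTimeoutException("Failed to get sync result after "
                            + timeoutMs + " ms for txid=" + txid + ", WAL system stuck?");
                    }
                    remainingNanos = doneCond.awaitNanos(remainingNanos);
                }
                return txid;
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Healthy pipeline: a background "sync thread" acks txid 1 after 50 ms.
        SyncFutureSketch ok = new SyncFutureSketch(1L);
        Thread syncer = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            ok.markDone();
        });
        syncer.start();
        System.out.println("synced txid=" + ok.get(1_000));
        syncer.join();

        // Stuck pipeline: nothing ever acks txid 2, so get() times out.
        SyncFutureSketch stuck = new SyncFutureSketch(2L);
        try {
            stuck.get(200);
        } catch (SyncTimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints `synced txid=1` for the healthy case and then the timeout message for txid=2. In the model, as in the logs, the waiting thread cannot make progress until the sync side completes, so a stalled pipeline turns directly into a blocked region close.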
yd_279396828
Posted 2024-08-28 10:28:40