2025-09-14 21:52:36 +00:00
# [WMS-UTR Project] OLAP HOS Services Down and OLAP HBase Server Down alerts raised on site

| ID | Creation Date | Assignee | Status |
|----|---------------|----------|--------|
| OMPUB-1266 | 2024-04-30T11:57:36.000+0800 | 王成成 | Closed |

---
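On April 29 (local time), OLAP HOS Services Down and OLAP HBase Server Down alerts were raised. The alert details are attached.

---

**wangchengcheng** commented on *2024-05-22T16:43:40.332+0800*:

At 15:00 on April 29, io-util on the last four TWA servers (11-14) reached 100% and stayed there. As a result, nezha could not collect metrics from the components on those servers (HOS, HBase) and repeatedly raised component-restart alerts.

!4.29号TWA站点io使用率.png|thumbnail!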
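
For context, io-util as reported by tools such as `iostat` is the share of wall-clock time during which the device had at least one request in flight. A minimal sketch of that calculation from two `/proc/diskstats` samples follows. The device name `sda`, the sample values, and the helper names are illustrative assumptions; how nezha actually collects this metric is not described in this ticket.

```python
def io_ticks_ms(diskstats_text: str, device: str) -> int:
    """Return the 'time spent doing I/Os' counter (milliseconds) for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the per-device counters.
        # The 10th counter (index 12 overall) is the time spent doing I/Os, in ms.
        if len(fields) > 12 and fields[2] == device:
            return int(fields[12])
    raise ValueError(f"device {device!r} not found")


def io_util_percent(before: str, after: str, device: str, interval_s: float) -> float:
    """Percentage of the sampling interval during which the device was busy."""
    busy_ms = io_ticks_ms(after, device) - io_ticks_ms(before, device)
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))


# Hypothetical samples taken 5 seconds apart (not data from the TWA servers).
SAMPLE_T0 = "   8       0 sda 1200 30 96000 4000 5000 70 400000 90000 0 60000 94000"
SAMPLE_T1 = "   8       0 sda 1210 30 96800 4050 9000 90 720000 150000 12 65000 154000"

print(io_util_percent(SAMPLE_T0, SAMPLE_T1, "sda", 5.0))
```

In these made-up samples the busy-time counter advances by 5000 ms over a 5 s interval, which corresponds to the sustained 100% io-util described above.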
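
Investigation findings:

1. Around April 10 (local time), the TWA site servers lost power and rebooted.
2. The HDDs on all TWA site servers are configured as 10 × 8 TB in RAID 5. Theoretical sequential write throughput is up to 2000 Mb/s, but the measured sequential write throughput was only 200 Mb/s.

!TWA服务器顺序写能力测试.png|thumbnail!

3. Engineering colleagues inspected the hardware and found no anomalies.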
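
The ticket does not say which tool produced the sequential write measurement in finding 2. As a stand-in, the following is a minimal sketch of how such a measurement could be taken: it writes a file in large sequential blocks, forces the data to the device with `fsync`, and reports MiB/s. The function name and its default sizes are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def measure_sequential_write(directory: str, total_mib: int = 1024, block_mib: int = 4) -> float:
    """Write about total_mib MiB sequentially into directory and return MiB/s."""
    block = b"\0" * (block_mib * 1024 * 1024)
    blocks = max(1, total_mib // block_mib)
    fd, path = tempfile.mkstemp(dir=directory, prefix="seqwrite_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Force the data to the device so the page cache does not inflate the result.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always remove the temporary test file, even if the write fails.
        os.remove(path)
    return blocks * block_mib / elapsed
```

A call such as `measure_sequential_write("/data", total_mib=4096)` would exercise the volume mounted at `/data`, where `/data` is a hypothetical mount point on the RAID 5 array. The total size should be large enough that controller and OS caches do not dominate the result.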

Resolution:

After engineering colleagues rebooted the servers, the disks' sequential write throughput recovered to 1700 Mb/s.

---
**caoshanfeng** commented on *2024-08-29T17:35:00.893+0800*:

Rebooted; no issues observed since.

---
2025-09-14 22:26:17 +00:00
# Attachments
Attachment: 4.29号TWA站点io使用率.png

Attachment: 4月29日告警.png

Attachment: TWA服务器顺序写能力测试.png
