2025-09-14 21:52:36 +00:00
# [E21-olap] DIR-IGW and BJR-IGW raised OLAP Kafka down alerts over the past two days
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-492 | 2022-05-19T16:15:32.000+0800 | 戚岱杰 | Closed |

---
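DIR-IGW and BJR-IGW raised OLAP Kafka down alerts on 2022-05-18 and 2022-05-19, respectively.

Progress:

DIR-IGW: On site, we checked the data in the Kafka UI and sent the Kafka logs, the memory usage, and the container memory limit from the on-site configuration file to R&D for troubleshooting. Based on the on-site data and logs, R&D suspects that the process used too much memory, exceeded the container limit, and was killed by Docker.

Following the fix provided by R&D, we raised the container memory limit from 17G to 25G, then removed and restarted the container. The alert has cleared and the fault is resolved.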
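One way to confirm the suspected out-of-memory kill is to read the container state reported by `docker inspect`. The sketch below parses that JSON and reports whether the `State.OOMKilled` flag is set, together with the limit from `HostConfig.Memory`. The function name `summarize_oom` and the sample payload are assumptions for illustration, not data from the site, and the sample treats 17G as 17 GiB.

```python
import json

GIB = 1024 ** 3


def summarize_oom(inspect_json: str) -> dict:
    """Summarize OOM state from the JSON printed by `docker inspect <container>`."""
    # `docker inspect` prints a JSON array with one object per container.
    container = json.loads(inspect_json)[0]
    state = container.get("State", {})
    host_config = container.get("HostConfig", {})
    limit_bytes = host_config.get("Memory", 0)  # 0 means no memory limit is set
    return {
        "oom_killed": bool(state.get("OOMKilled", False)),
        "exit_code": state.get("ExitCode"),
        "memory_limit_gib": limit_bytes / GIB if limit_bytes else None,
    }


# Hypothetical sample: a container killed at an assumed 17 GiB limit.
sample = json.dumps([{
    "State": {"OOMKilled": True, "ExitCode": 137},
    "HostConfig": {"Memory": 17 * GIB},
}])
print(summarize_oom(sample))
```

If `oom_killed` is false for a container that stopped unexpectedly, the memory-limit explanation is less likely and other causes are worth checking.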
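
BJR-IGW: Based on the on-site data, R&D determined that the cause is the same as at DIR-IGW. For now, the fault has been resolved by restarting Docker.

---

**qidaijie** commented on *2022-06-09T11:05:28.257+0800*:

1: The Kafka and ZooKeeper logs reported at the time show no obvious error messages.

2: Data volume monitoring shows no surge in log volume.

!BJR-IGW日志量.jpg|thumbnail!

3: After recovery, memory usage at these two sites has stayed at around 4/6 GB and has not exceeded the limit again.

!BJR-IGW和DIR-IGW Kafka内存使用.png|thumbnail!

Because the root cause has not been clearly identified, the Kafka containers at all sites will not be modified for now. We will continue to observe and analyze.

---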
**qidaijie** commented on *2022-06-27T09:39:16.254+0800*:

From the logs sent back from the site:

1: Kafka's connection to ZooKeeper timed out. Because of the timeout, Kafka could not update its metadata in time, which caused the Kafka service to terminate.

!kafka-log-timeout.png|thumbnail!

2: The machine's IO utilization shows a sudden, sustained spike around the time of the Kafka connection timeout, which differs markedly from the IO utilization curve under normal conditions.

!image-2022-06-27-10-19-08-526.png|thumbnail!
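The comparison in point 2 can be approximated in code: derive a baseline from a period known to be normal, then flag runs where IO utilization stays well above it. The following is a minimal sketch. The function name `find_sustained_spikes`, the threshold factor, the minimum run length, and the sample values are all assumptions, not measurements from the site.

```python
from statistics import mean


def find_sustained_spikes(samples, baseline, factor=2.0, min_run=3):
    """Return (start, end) index pairs where utilization exceeds
    factor * mean(baseline) for at least min_run consecutive samples."""
    threshold = factor * mean(baseline)
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new run above the threshold begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that lasts until the final sample.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - 1))
    return runs


# Hypothetical IO utilization percentages, one sample per interval.
baseline = [10, 12, 11, 9, 13]  # mean is 11, so the threshold is 22
samples = [12, 10, 85, 90, 88, 92, 15, 40, 11]
print(find_sustained_spikes(samples, baseline))
```

Requiring a minimum run length separates a sustained spike from a single noisy sample, which matches the "sudden and sustained" pattern described in the comment.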

Solution:

1: Increase the ZooKeeper timeout, and reduce the maximum time and size for which data is cached in memory before it is flushed to disk.

2: Apply the optimized Kafka configuration to all sites at the next update.
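The ticket does not say which broker settings were changed or to what values. As a sketch only, the helper below applies overrides to a `server.properties` text, using Kafka broker settings that plausibly correspond to point 1: `zookeeper.session.timeout.ms`, `zookeeper.connection.timeout.ms`, `log.flush.interval.ms`, and `log.flush.interval.messages`. That mapping, the helper name `apply_overrides`, and every value shown are assumptions. Note that `log.flush.interval.messages` counts messages rather than bytes, so it may not be the "size" the comment refers to.

```python
def apply_overrides(properties_text: str, overrides: dict) -> str:
    """Return properties text with the given keys set, replacing existing
    entries in place and appending keys that are missing."""
    remaining = dict(overrides)
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Only key=value lines are candidates; comments and blanks pass through.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key}={remaining.pop(key)}")
                continue
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Placeholder values only; the ticket does not state the values chosen.
overrides = {
    "zookeeper.session.timeout.ms": 30000,
    "zookeeper.connection.timeout.ms": 30000,
    "log.flush.interval.ms": 1000,
    "log.flush.interval.messages": 10000,
}
current = "broker.id=1\nzookeeper.session.timeout.ms=6000\n"
print(apply_overrides(current, overrides), end="")
```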

---
# Attachments
Attachment: BJR-IGW和DIR-IGW+Kafka内存使用.png

Attachment: BJR-IGW日志量.jpg

Attachment: image-2022-06-27-10-19-08-526.png

Attachment: kafka-log-timeout.png