# [E21 Site - OLAP] Recent SSM-IGW OLAP Flink TaskManager Down Alerts
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-808 | 2023-02-15T19:22:47.000+0800 | 戚岱杰 | Closed |
---
No description
---
**qidaijie** commented on *2023-02-17T10:44:05.212+0800*:
Root cause: the Flink TaskManager process hung (still running but unresponsive), so log data could not be processed and the alert fired.
 
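On-site situation:
# All IGW sites currently show a large volume of unstructured file writes; MWV-IGW and SSM-IGW have the largest volumes.
# The MWV-IGW and SSM-IGW sites write 3 TB+ of eml files per day, averaging 300 requests per second with peaks of around 1000. See !MWV和SSM-IGW磁盘存储情况.png|thumbnail! !SSM-IGW-HOS请求数量.png|thumbnail!
## At the current write rate, the remaining storage can sustain continuous writes for at most about 4 days.
# Flink is constrained by resources and by the processing latency of the national-center aggregation Kafka. Data piles up in memory and cannot be processed in time, which causes the TaskManager process to restart.
## HOS uses 50% of the total resources, so the TaskManager cannot process data in time and data accumulates. See file: [^MWV-SSM资源使用.txt]
## With SSL encryption, the processing latency of the national-center aggregation Kafka is about 1.3 times higher than with SASL user authentication. See file: [^Kafka不同认证生产者延迟情况.txt]
## Monitoring shows that the SSM-IGW TaskManager recovers automatically most of the time, but the excessive number of restarts eventually left the process hung. See !SSM-IGW-Taskmanager重启情况.png|thumbnail!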
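
The write-volume figures above can be cross-checked with some back-of-envelope arithmetic. This is only a rough sketch: it assumes 1 TB = 10**12 bytes, and that the 3 TB/day and 300 requests/s figures describe the same traffic, neither of which the ticket states. The function names are illustrative.

```python
# Back-of-envelope figures derived from the numbers quoted in the ticket.
# Assumptions (not stated in the ticket): 1 TB = 10**12 bytes, and the
# daily volume and the average request rate refer to the same traffic.

SECONDS_PER_DAY = 24 * 60 * 60


def avg_object_size_bytes(bytes_per_day, avg_requests_per_sec):
    """Average size of one written object for a given daily volume and rate."""
    return bytes_per_day / (avg_requests_per_sec * SECONDS_PER_DAY)


def implied_free_space_bytes(bytes_per_day, days_left):
    """Free space implied by a write rate and the number of days it can last."""
    return bytes_per_day * days_left


daily_bytes = 3 * 10**12                                  # "3 TB+" per day (lower bound)
avg_size = avg_object_size_bytes(daily_bytes, 300)        # 300 requests/s on average
free_space = implied_free_space_bytes(daily_bytes, 4)     # "about 4 days" of headroom

print(f"average eml size ~ {avg_size / 1000:.0f} kB")      # ~ 116 kB
print(f"implied free space ~ {free_space / 10**12:.0f} TB")  # ~ 12 TB
```

Under these assumptions, each eml object averages roughly 116 kB and the remaining capacity is on the order of 12 TB. Because the ticket quotes "3 TB+", both values are lower bounds.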
 
Temporary workaround: lower the HOS request rate limit from 2000 to 100 and continue to observe.
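
To illustrate why lowering the limit matters, here is a minimal token-bucket sketch. It is not HOS's actual limiter, whose mechanism the ticket does not describe. The class and function names are hypothetical, the limit is assumed to be in requests per second, and the offered load of 1000 requests/s is taken from the peak quoted earlier in the ticket. The simulated clock uses `Fraction` so the demo is exact.

```python
import time
from fractions import Fraction


class TokenBucket:
    """Hypothetical limiter: admits at most `rate` requests per second."""

    def __init__(self, rate, capacity=1, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admitted_in_one_second(limit, offered):
    """Count how many of `offered` evenly spaced requests pass in one second."""
    now = [Fraction(0)]  # simulated clock, exact arithmetic
    bucket = TokenBucket(rate=limit, capacity=1, clock=lambda: now[0])
    admitted = 0
    for i in range(offered):
        now[0] = Fraction(i, offered)
        if bucket.allow():
            admitted += 1
    return admitted


print(admitted_in_one_second(limit=2000, offered=1000))  # 1000: limit never binds
print(admitted_in_one_second(limit=100, offered=1000))   # 100: load is throttled
```

In this model, a limit of 2000 sits above the observed peak and so never throttles anything, while a limit of 100 caps admitted requests well below the average rate of 300 requests/s.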
---
**qidaijie** commented on *2023-02-17T10:45:04.179+0800*:
The HOS rate limit has been changed at four sites: MWV-IGW, SSM-IGW, BJR-IGW, and BOLE-IGW.
---
**qidaijie** commented on *2023-02-21T13:53:22.608+0800*:
The HOS rate limit has been changed at the DIR-IGW site.
---
**qidaijie** commented on *2023-02-27T15:21:45.085+0800*:
After the HOS rate limit was changed at the IGW sites above, a period of continued observation showed no recurrence of the issue.
!230222-27OLAP告警情况.png|thumbnail!
---
# Attachments
Attachment: 230222-27OLAP告警情况.png
![230222-27OLAP告警情况.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35526/230222-27OLAP告警情况.png)
Attachment: Kafka不同认证生产者延迟情况.txt
[Kafka不同认证生产者延迟情况.txt](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35233/Kafka不同认证生产者延迟情况.txt)
Attachment: MWV-SSM资源使用.txt
[MWV-SSM资源使用.txt](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35231/MWV-SSM资源使用.txt)
Attachment: MWV和SSM-IGW磁盘存储情况.png
![MWV和SSM-IGW磁盘存储情况.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35224/MWV和SSM-IGW磁盘存储情况.png)
Attachment: SSM-IGW-HOS请求数量.png
![SSM-IGW-HOS请求数量.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35225/SSM-IGW-HOS请求数量.png)
Attachment: SSM-IGW-Taskmanager重启情况.png
![SSM-IGW-Taskmanager重启情况.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35234/SSM-IGW-Taskmanager重启情况.png)
Attachment: 微信图片_20230215141453.png
![微信图片_20230215141453.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35153/微信图片_20230215141453.png)
Attachment: 微信图片_20230215141917.png
![微信图片_20230215141917.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35154/微信图片_20230215141917.png)
Attachment: 微信图片_20230215141928.png
![微信图片_20230215141928.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/35155/微信图片_20230215141928.png)