# [E21-OLAP] National Center Flink servers: sustained 80% memory usage alert

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-441 | 2022-04-10T03:50:07.000+0800 | 戚岱杰 | Closed |

---
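
Starting on 2022-04-04, alerts for excessive memory usage have been firing continuously.

---

**qidaijie** commented on *2022-04-11T11:51:45.490+0800*:

On-site investigation shows that the newly added APP recommendation job is heavy and consumes a large share of resources; this is expected behavior.

As a temporary measure, the alert has been silenced.

As a follow-up, we will test adding an APP whitelist to reduce resource usage.

---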

**zhengchao** commented on *2022-04-11T13:50:48.981+0800*:

Does the "VPN job" refer to VPN client IP learning?

---

**doufenghu** commented on *2022-04-11T15:07:28.919+0800*:

This is a naming issue. The job is "APP real-time recommendation of active client IPs"; it selects a subset of VPN client applications for learning.

> Does the "VPN job" refer to VPN client IP learning?

---

**zhengchao** commented on *2022-04-12T11:38:06.829+0800*:

Which feature does "APP real-time recommendation of active client IPs" serve?

---

**doufenghu** commented on *2022-04-12T11:55:02.684+0800*:

CM updates the active client IP lists for Freegate and Psiphon3 every minute.

---

**zhengchao** commented on *2022-04-12T12:46:58.558+0800*:

At the E site, how large is the client IP population for these apps?

---

**doufenghu** commented on *2022-04-13T15:31:02.009+0800*:

Currently only Psiphon3 has traffic: 32,000 unique client IPs over 24 hours, and its sessions account for 14% of all identified APP traffic (9 billion sessions in total). [~zhengchao]
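To put these figures in perspective, the following is a rough back-of-the-envelope calculation, not something taken from the ticket. It assumes that the 9 billion total covers all identified APP sessions and refers to the same 24-hour window as the unique IP count; the variable names are illustrative only.

```python
# Figures quoted in the comment above.
total_sessions = 9_000_000_000   # all identified APP sessions (assumed: same 24 h window)
psiphon3_share_percent = 14      # Psiphon3's share of those sessions
unique_client_ips = 32_000       # unique Psiphon3 client IPs over 24 hours

# Integer arithmetic avoids floating-point rounding on the percentage.
psiphon3_sessions = total_sessions * psiphon3_share_percent // 100

# Average load per client IP under the assumptions stated above.
sessions_per_ip = psiphon3_sessions / unique_client_ips

print(f"Psiphon3 sessions: {psiphon3_sessions:,}")      # Psiphon3 sessions: 1,260,000,000
print(f"Sessions per client IP: {sessions_per_ip:,.0f}")  # Sessions per client IP: 39,375
```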

[^app-top100.txt]

---

**qidaijie** commented on *2022-04-14T17:43:19.224+0800*:

At around 11:30 local time on the 13th, a configuration was added to the "APP real-time recommendation of active client IPs" program that specifies three apps to collect statistics for: *Freegate, Psiphon3, Tor*.
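The effect of such a whitelist can be illustrated with a minimal sketch. This is not the actual Flink job: the record layout (the `app` and `client_ip` keys) and the function name `active_client_ips` are assumptions made for illustration. The idea is that sessions for apps outside the whitelist are dropped before any per-app state is built, so memory grows only with the whitelisted apps.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# The three apps named in the comment above.
APP_WHITELIST = frozenset({"Freegate", "Psiphon3", "Tor"})


def active_client_ips(
    sessions: Iterable[dict],
    whitelist: frozenset = APP_WHITELIST,
) -> Dict[str, Set[str]]:
    """Collect the set of client IPs seen per whitelisted app.

    Each session is assumed (hypothetically) to be a dict with
    "app" and "client_ip" keys.
    """
    active: Dict[str, Set[str]] = defaultdict(set)
    for session in sessions:
        app = session["app"]
        if app not in whitelist:
            continue  # dropped early: no state is kept for this app
        active[app].add(session["client_ip"])
    return dict(active)


# Hypothetical sample data.
sample = [
    {"app": "Psiphon3", "client_ip": "10.0.0.1"},
    {"app": "Psiphon3", "client_ip": "10.0.0.2"},
    {"app": "SomeOtherApp", "client_ip": "10.0.0.3"},
    {"app": "Tor", "client_ip": "10.0.0.1"},
]
result = active_client_ips(sample)
print(result)
```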

After the change:

1. Judging by the metrics Flink itself exposes, both CPU and memory usage have decreased.

   Screenshots: `修改后Taskmanager CPU使用.png`, `修改后Taskmanager 内存使用.png`

2. At the server level, the memory already allocated by the Taskmanager has still not been released in the period since the change.

   Screenshot: `修改后服务器整体内存使用.png`

---

**qidaijie** commented on *2022-04-20T16:18:33.538+0800*:

All taskmanager nodes in the National Center Flink cluster have now been restarted. After the restart, memory usage is around 45%; we will continue to monitor it.

---

**liuju** commented on *2022-06-06T21:19:19.070+0800*:

Since the 2022-04-20 update, no further memory alerts have occurred on the National Center Flink servers, so this issue is being closed. [~qidaijie]

---

## Attachments

**27010/app-top100.txt**

---

**27286/Flink-taskmanager进程重启后内存.png**

---

**26926/Flink节点内存使用(自身指标).png**

---

**26932/关闭APP推荐任务后的CPU使用率.png**

---

**26933/关闭APP推荐任务后内存使用(服务器整体内存).png**

---

**27109/修改后Taskmanager+CPU使用.png**

---

**27110/修改后Taskmanager+内存使用.png**

---

**27111/修改后服务器整体内存使用.png**

---

**26929/重启taskmanager内存后内存使用.png**

---