# [E21 site] After upgrading to 23.07, Bole-IGW restarts continuously because HTTP session logging allocates large amounts of memory
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1054 | 2023-11-05T12:10:53.000+0800 | 杨威 | Closed |
---
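Bole-IGW NPB03 has been restarting continuously since around 12:00 (UTC+3) on 2023-11-03. The symptom is a restart triggered by a watchdog timeout. The cause is that some threads allocate large amounts of memory, which leads to page faults. With the watchdog disabled, the process quickly hits OOM after startup.

---
**liuyang** commented on *2023-11-05T12:11:43.168+0800*: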
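The preliminary finding is that abnormal HTTP URLs in the traffic trigger abnormal memory usage in the session plugin. As a temporary workaround, the session plugin records only the host for HTTP. Specifically, session_record.inf on the affected NPB is changed as follows: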
[HTTP]
#FUNC_FLAG=ALL
FUNC_FLAG=HTTP_HOST
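
The same change could be scripted when it has to be applied on more than one NPB. The following is a minimal sketch in Python, assuming session_record.inf is an INI-style text file whose [HTTP] section carries one active FUNC_FLAG line, as in the fragment above. The function name set_http_func_flag is hypothetical and not part of any tool mentioned in this ticket.

```python
def set_http_func_flag(text: str, new_value: str) -> str:
    """Return config text with the active FUNC_FLAG in [HTTP] set to new_value.

    Assumption: INI-style layout as in the fragment above. A previous active
    value is kept as a '#' comment so the change is easy to revert.
    """
    target = f"FUNC_FLAG={new_value}"
    out = []
    in_http = False
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [HTTP] without an active FUNC_FLAG: add one first.
            if in_http and not replaced:
                out.append(target)
                replaced = True
            in_http = stripped == "[HTTP]"
            out.append(line)
            continue
        if in_http and stripped.startswith("FUNC_FLAG="):
            if stripped != target:
                out.append("#" + stripped)  # keep the old value as a comment
            if not replaced:
                out.append(target)
                replaced = True
            continue
        out.append(line)
    if in_http and not replaced:
        out.append(target)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(set_http_func_flag("[HTTP]\nFUNC_FLAG=ALL\n", "HTTP_HOST"), end="")
```

Running the function twice gives the same result as running it once, so it is safe to re-apply after a restart or redeployment.
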
---
**yangwei** commented on *2023-11-06T14:24:02.557+0800*:
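Fault start time: UTC+3 2023-11-03 11:40

Troubleshooting window: UTC+3 2023-11-03 15:11-17:48

Restart symptom: after running for a while, per-packet processing latency becomes high and triggers a watchdog timeout.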
!anydesk00001.png|thumbnail!
State at restart: session_record calls sendlog to send a log, and the realloc call inside it takes so long that the watchdog times out.
!anydesk00000.png|thumbnail!
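Investigation:
* Disabled the watchdog and ran the process in the foreground under gdb. After a few minutes, sys (kernel-mode) CPU usage rose on some cores while memory grew rapidly and triggered OOM.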
!anydesk00002.png|thumbnail!
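* Ran perf top -C against the cores with rising sys usage and saw that memory-allocation functions dominated. The corresponding flame graph shows that the rise in sys usage is caused by page faults.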
!anydesk00003.png|thumbnail!
!image-2023-11-06-14-14-36-831.png|width=468,height=336!
* Used bt to inspect the call stacks on the cores with high sys usage; they are mainly in HTTP log processing.
!image-2023-11-06-14-16-02-180.png|width=636,height=358!
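* Disabled the HTTP data-processing entry in session_record: the memory growth did not reproduce.
* Let session_record process only HTTP URL data: the memory growth reproduced.
* Let session_record process only HTTP Host: the memory growth did not reproduce. A check of the scope of the restarts showed that, across the whole network, only Bole-IGW NPB03 was affected during that period. The preliminary conclusion is that abnormal HTTP URLs in the traffic cause abnormal memory usage when session_record concatenates log records, which triggers the watchdog timeout.

Fault handling:
* UTC+3 2023-11-03 17:48: temporarily changed the HTTP entry of session_record on Bole-IGW NPB03 to process Host only, pending further observation.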
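
The first investigation step relied on watching memory grow after startup. As a rough illustration of how that growth could be quantified, the sketch below computes a growth rate from two samples of /proc/PID/status on Linux. The helper names and the sample values are assumptions made for illustration; they are not measurements from this incident.

```python
import re


def parse_vm_rss_kib(status_text: str) -> int:
    """Extract VmRSS (resident set size, in KiB) from /proc/PID/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def rss_growth_kib_per_s(before: str, after: str, interval_s: float) -> float:
    """Average RSS growth in KiB/s between two status snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (parse_vm_rss_kib(after) - parse_vm_rss_kib(before)) / interval_s


if __name__ == "__main__":
    # Invented snapshots taken 10 seconds apart.
    first = "Name:\texample\nVmRSS:\t    1000 kB\n"
    second = "Name:\texample\nVmRSS:\t    3000 kB\n"
    print(rss_growth_kib_per_s(first, second, 10.0))
```

A sustained positive rate while traffic is steady would be consistent with the runaway allocation described above, although it does not by itself identify the allocating code path.
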
---
**yangwei** commented on *2023-11-06T14:32:12.243+0800*:
At UTC+3 2023-11-03 21:02, Bole-IGW NPB03 restarted again. The symptom was still a watchdog timeout, and PSI monitoring showed high CPU waiting during that period.
!image-2023-11-06-14-29-10-155.png|width=521,height=1098!
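Since the session_record parameter was changed at 17:48, memory usage has shown no anomaly, but the Application Drop counter has risen noticeably. The cause is presumed to still be related to the abnormal traffic.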
!image-2023-11-06-14-30-31-368.png|width=507,height=535!
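
Assuming the PSI monitoring mentioned above refers to Linux Pressure Stall Information, the CPU figures are exposed in /proc/pressure/cpu. The sketch below parses that file format; the function name parse_psi and the sample values are illustrative assumptions and do not come from the monitoring system used on site.

```python
def parse_psi(text: str) -> dict:
    """Parse the content of a /proc/pressure/* file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns {"some": {"avg10": ..., "avg60": ..., ...}, ...}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


if __name__ == "__main__":
    # Invented sample; on a live host read /proc/pressure/cpu instead.
    sample = "some avg10=12.50 avg60=3.10 avg300=1.00 total=123456\n"
    print(parse_psi(sample)["some"]["avg10"])
```

The avg10 value is the percentage of the last 10 seconds in which at least one task was stalled waiting for CPU, so it reacts fastest to a short burst such as the one around 21:02.
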
---
**yangwei** commented on *2023-11-27T09:49:58.473+0800*:
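On 11-24 the problem recurred on *Bole-IGW* *NPB04*: frequent restarts began at UTC+3 14:29, with a watchdog timeout triggered after each restart.
Applying the same workaround used on NPB03 on 11-03 restored normal operation.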
!image-2023-11-27-09-47-55-825.png|width=802,height=435!
---
**liuyang** commented on *2024-08-31T18:34:04.849+0800*:
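The system has been upgraded to TSG24.02, so this bug is closed. If a similar problem appears after the upgrade, a new bug should be created.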
---
## Attachments
**46696/anydesk00000.png**
---
**46695/anydesk00001.png**
---
**46697/anydesk00002.png**
---
**46698/anydesk00003.png**
---
**46700/image-2023-11-06-14-14-36-831.png**
---
**46701/image-2023-11-06-14-16-02-180.png**
---
**46705/image-2023-11-06-14-29-10-155.png**
---
**46706/image-2023-11-06-14-30-31-368.png**
---
**47582/image-2023-11-27-09-47-55-825.png**
---
**46699/perf+(3).svg**
---