geedge-jira/md/OMPUB-1054.md
2025-09-14 21:52:36 +00:00


[E21 site] After upgrading to 23.07, Bole-IGW restarts continuously because HTTP session logging allocates large amounts of memory

| ID | Creation Date | Assignee | Status |
|---|---|---|---|
| OMPUB-1054 | 2023-11-05T12:10:53.000+0800 | 杨威 | Closed |

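Bole-IGW NPB03 has been restarting continuously since about 12:00 (UTC+3) on 2023-11-03. The symptom is a restart triggered by a watchdog timeout. The cause is that some threads allocate large amounts of memory, which leads to page faults. If the watchdog is disabled, OOM is triggered shortly after startup.

liuyang commented on 2023-11-05T12:11:43.168+0800: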

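The cause has been tentatively traced to abnormal HTTP URLs in the traffic, which trigger abnormal memory usage in the session plugin. As a temporary measure, the session plugin now records only the host for HTTP. Specifically, session_record.inf on the affected NPB was changed as follows: [HTTP] #FUNC_FLAG=ALL FUNC_FLAG=HTTP_HOST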
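The config change above can be sketched as a small script. This is only an illustration, assuming session_record.inf is an INI-style text file in which `#` starts a comment; the helper name `restrict_http_to_host` is hypothetical, and only the `[HTTP]` section name and the two `FUNC_FLAG` values come from this ticket.

```python
def restrict_http_to_host(conf_text: str) -> str:
    """Comment out FUNC_FLAG=ALL under [HTTP] and add FUNC_FLAG=HTTP_HOST.

    Assumption: the file is INI-like, with one "[SECTION]" header per
    line and "KEY=VALUE" entries below it. Other sections are left alone.
    """
    out = []
    in_http = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Track which section we are currently inside.
        if stripped.startswith("[") and stripped.endswith("]"):
            in_http = stripped == "[HTTP]"
            out.append(line)
            continue
        # Only the active (uncommented) FUNC_FLAG=ALL line in [HTTP] is rewritten.
        if in_http and stripped.replace(" ", "") == "FUNC_FLAG=ALL":
            out.append("#" + stripped)
            out.append("FUNC_FLAG=HTTP_HOST")
            continue
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[HTTP]\nFUNC_FLAG=ALL\n"
    print(restrict_http_to_host(sample), end="")
```

Running the function a second time leaves the text unchanged, because the commented-out line no longer matches `FUNC_FLAG=ALL`.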


yangwei commented on 2023-11-06T14:24:02.557+0800:

Fault start time: UTC+3 2023-11-03 11:40

Troubleshooting window: UTC+3 2023-11-03 15:11-17:48

 

Restart symptom: after running for a while, per-packet processing latency becomes high and triggers a watchdog timeout.

!anydesk00001.png|thumbnail!

State at restart: session_record calls sendlog to send logs, and the realloc call takes so long that it causes the timeout.

!anydesk00000.png|thumbnail!

Diagnosis:

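  • With the watchdog disabled and the process running in the foreground under gdb, sys usage on some CPU cores rose after a few minutes while memory grew rapidly, triggering OOM.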

!anydesk00002.png|thumbnail!

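  • Running perf top -C on the cores with rising sys usage showed heavy calls to memory-allocation functions. The corresponding flame graph shows that the rise in sys usage is caused by page faults.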

!anydesk00003.png|thumbnail!

!image-2023-11-06-14-14-36-831.png|width=468,height=336!

  • Running bt on the cores with high sys usage showed that the call stacks are mainly in HTTP log processing.

!image-2023-11-06-14-16-02-180.png|width=636,height=358!

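  • With the HTTP data-processing entry point in session_record disabled, the memory growth did not reproduce.
  • With session_record handling only HTTP URL data, the memory growth reproduced.
  • With session_record handling only HTTP Host, the memory growth did not reproduce. A check of the restart scope showed that, across the whole network, only Bole-IGW NPB03 was affected during that period. The tentative conclusion is that abnormal HTTP URLs in the traffic cause abnormal memory usage when session_record assembles logs, which triggers the watchdog timeout.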

Mitigation:

  • UTC+3 2023-11-03 17:48: the HTTP entry point of session_record on Bole-IGW NPB03 was temporarily changed to handle Host only, pending further observation.

yangwei commented on 2023-11-06T14:32:12.243+0800:

UTC+3 2023-11-03 21:02: Bole-IGW NPB03 restarted again. The symptom was still a watchdog timeout, and PSI monitoring showed high CPU waiting during that period.

!image-2023-11-06-14-29-10-155.png|width=521,height=1098!
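For reference, PSI figures like the one above can also be sampled directly on the device. The following is a minimal sketch, assuming a Linux kernel with pressure stall information enabled, where /proc/pressure/cpu contains lines of the form `some avg10=... avg60=... avg300=... total=...`. The function names are hypothetical, and which PSI field the monitoring dashboard labels "CPU waiting" is not stated in this ticket.

```python
def parse_psi(text: str) -> dict:
    """Parse the contents of a /proc/pressure/* file into nested dicts.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    Returns e.g. {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_cpu_pressure(path: str = "/proc/pressure/cpu") -> dict:
    """Read and parse CPU pressure; requires a kernel with PSI enabled."""
    with open(path, encoding="ascii") as fh:
        return parse_psi(fh.read())
```

Polling `read_cpu_pressure()["some"]["avg10"]` at a short interval around a restart would give a rough view of how long tasks were stalled waiting for CPU.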

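In addition, since the session_record parameter change at 17:48, memory usage has shown no anomalies, but the Application Drop counter has risen noticeably. The cause is presumed to still be related to the abnormal traffic.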

!image-2023-11-06-14-30-31-368.png|width=507,height=535!


yangwei commented on 2023-11-27T09:49:58.473+0800:

11-24: the issue recurred on Bole-IGW NPB04. Frequent restarts began at UTC+3 14:29; the symptom was a watchdog timeout after each restart.

Applying the same mitigation used on NPB03 on 11-03 restored normal operation.

!image-2023-11-27-09-47-55-825.png|width=802,height=435!


liuyang commented on 2024-08-31T18:34:04.849+0800:

The system has been upgraded to TSG24.02, so this bug is being closed. If a similar issue appears after the upgrade, create a new bug.


Attachments

46696/anydesk00000.png


46695/anydesk00001.png


46697/anydesk00002.png


46698/anydesk00003.png


46700/image-2023-11-06-14-14-36-831.png


46701/image-2023-11-06-14-16-02-180.png


46705/image-2023-11-06-14-29-10-155.png


46706/image-2023-11-06-14-30-31-368.png


47582/image-2023-11-27-09-47-55-825.png


46699/perf+(3).svg